CN104537003A

CN104537003A - General-purpose high-performance data writing method for an HBase database

Info

Publication number: CN104537003A
Application number: CN201410777982.5A
Authority: CN
Inventors: 曹宇
Original assignee: BEIJING SINOIOV VEHICLE NETWORK TECHNOLOGY Co Ltd
Current assignee: BEIJING SINOIOV VEHICLE NETWORK TECHNOLOGY Co Ltd
Priority date: 2014-12-16
Filing date: 2014-12-16
Publication date: 2015-04-22
Anticipated expiration: 2034-12-16
Also published as: CN104537003B

Abstract

The invention discloses a general-purpose, high-performance data writing method for an HBase database. The method avoids data interference and concurrency hazards, prevents queued data from blocking when HTable writes are slow, and keeps the queued data flowing out continuously. The method comprises: using a data distribution mechanism to divide the incoming data evenly into multiple shares, where the data of different tables has different data processing objects and the HTable object arrays are all local variables of that object; mapping each share of data one-to-one to an HTable object array; writing each share of data into multiple buffers, with the HTable object array reading each filled buffer and performing the store operation; and, while an HTable object array is writing data, placing a thread lock on the buffer in use, so that data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.

Description

A general-purpose high-performance data writing method for an HBase database

Technical field

The present invention relates to the technical field of computer data processing, and in particular to a general-purpose high-performance data writing method for an HBase database, used mainly for writing big data into an HBase database.

Background

HBase is a distributed, column-oriented open-source database. With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. HBase is the open-source implementation of Google Bigtable. Just as Google Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS as its file storage system; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process the massive data in HBase; Google Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper as the counterpart.

When writing data into an HBase database, the prior art mostly adopts the ordinary producer-consumer pattern. Producer and consumer threads contend for a synchronization lock, and the contention is especially pronounced under multithreading, so storage efficiency suffers badly. Other prior art imports data with the import tools that ship with HBase, which is less efficient. In addition, when massive amounts of data are stored, the prior art can cause blocking at the HBase regionserver, so that the regionserver crashes or ZooKeeper times out.

Summary of the invention

The technical problem solved by the present invention is to overcome the deficiencies of the prior art by providing a general-purpose high-performance data writing method for an HBase database. The method avoids data interference and concurrency hazards, prevents queued data from blocking because HTable writes are slow, and ensures that queued data can be output continuously.

The technical solution of the present invention is as follows. The general-purpose high-performance data writing method for an HBase database uses a data distribution mechanism to divide the data evenly into multiple shares; the data of different tables has different data processing objects, and the HTable object arrays are all local variables of that object. Each share of data corresponds one-to-one to an HTable object array. Each share of data is written into a buffer, and the HTable object array then reads the filled buffer and performs the store operation. While an HTable object array is writing data, a thread lock is placed on the buffer in use; data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.

The present invention uses a data distribution mechanism to divide the data evenly into multiple shares. The data of different tables has different data processing objects, and the receivers of the distributed shares (the HTable object arrays) are all local variables of that object. This guarantees independence at the table level. The queues do not interfere with one another, which guarantees independence between queues and avoids data interference and concurrency hazards. Each share of data corresponds one-to-one to an HTable object array, and these HTable object arrays perform the actual writes. Each share of data is written into a buffer, and the HTable object array then reads the filled buffer and performs the store operation. While an HTable object is writing data, it places a thread lock on the buffer in use, so data not yet written to a buffer bypasses the locked buffer and is written to a new buffer. This prevents queued data from blocking because HTable writes are slow and ensures that queued data can be output continuously.
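The lock-and-bypass behaviour described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names LockedBuffer and BypassingWriter are hypothetical, rows are plain strings, and a binary java.util.concurrent.Semaphore stands in for the thread lock so that a buffer held by a flushing thread is skipped rather than waited on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/** A buffer guarded by its own lock (hypothetical class, not named in the patent text). */
class LockedBuffer {
    // Binary semaphore used as a non-reentrant try-lock (an assumption of this sketch).
    final Semaphore lock = new Semaphore(1);
    final List<String> rows = new ArrayList<>();
}

/** Writes each row into the first buffer that is not currently locked. */
class BypassingWriter {
    private final LockedBuffer[] buffers;

    BypassingWriter(int bufferCount) {
        buffers = new LockedBuffer[bufferCount];
        for (int i = 0; i < bufferCount; i++) {
            buffers[i] = new LockedBuffer();
        }
    }

    LockedBuffer buffer(int index) {
        return buffers[index];
    }

    /** Returns the index of the buffer that took the row, or -1 if every buffer was locked. */
    int write(String row) {
        for (int i = 0; i < buffers.length; i++) {
            LockedBuffer b = buffers[i];
            if (b.lock.tryAcquire()) {      // locked buffers are bypassed, not waited on
                try {
                    b.rows.add(row);
                    return i;
                } finally {
                    b.lock.release();
                }
            }
        }
        return -1;
    }
}
```

In this sketch a flushing thread would hold a buffer's lock while it stores that buffer's rows, and concurrent calls to write would land in another buffer instead of waiting.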

Brief description of the drawings

Fig. 1 is a schematic diagram of a preferred embodiment of the general-purpose high-performance data writing method for an HBase database according to the present invention.

Fig. 2 is a flow chart of the general-purpose high-performance data writing method for an HBase database according to the present invention.

Detailed description of the embodiments

The general-purpose high-performance data writing method for an HBase database uses a data distribution mechanism to divide the data evenly into multiple shares; the data of different tables has different data processing objects, and the HTable object arrays are all local variables of that object. Each share of data corresponds one-to-one to an HTable object array. Each share of data is written into a buffer, and the HTable object array then reads the filled buffer and performs the store operation. While an HTable object array is writing data, a thread lock is placed on the buffer in use; data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.

In addition, each HTable object array has its own buffer pool in one-to-one correspondence. The HTable object array obtains Buffer objects from the buffer pool and waits if the buffers are used up. This effectively avoids the memory overflow that unbounded buffer creation would cause. Queues and buffer pools are also in one-to-one correspondence, rather than all queues sharing one global buffer pool, so the buffer pools of the individual HTable object arrays are independent of one another, which reduces concurrency and data interference between queues. Because of this buffer mechanism, if a very large data volume exhausts the buffers, the data blocks locally rather than on the server. The blocked machine then sends data to the server at an even rate, which keeps the server load under control and avoids regionserver crashes, ZooKeeper timeouts and the like.
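A bounded buffer pool of this kind can be sketched in Java as follows. This is an illustrative sketch only: RowBuffer and BufferPool are hypothetical names, rows are plain strings, and java.util.concurrent.ArrayBlockingQueue is one possible way to make a caller wait when every buffer is in use. The patent text does not say which data structure is actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** A reusable buffer of rows (hypothetical class). */
class RowBuffer {
    final List<String> rows = new ArrayList<>();
}

/** A fixed-size pool: no buffer is ever created after construction. */
class BufferPool {
    private final BlockingQueue<RowBuffer> free;

    BufferPool(int bufferCount) {
        free = new ArrayBlockingQueue<>(bufferCount);
        for (int i = 0; i < bufferCount; i++) {
            free.add(new RowBuffer());
        }
    }

    /** Blocks the calling (client-side) thread until a buffer is free. */
    RowBuffer acquire() throws InterruptedException {
        return free.take();
    }

    /** Non-blocking variant: returns null when the pool is exhausted. */
    RowBuffer tryAcquire() {
        return free.poll();
    }

    /** Empties the buffer and returns it to the pool. */
    void release(RowBuffer buffer) {
        buffer.rows.clear();
        free.add(buffer);
    }

    int available() {
        return free.size();
    }
}
```

Because acquire blocks the producer on the client machine, back-pressure builds up locally instead of on the regionserver, which matches the behaviour the paragraph above describes.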

In addition, the number and size of the buffers inside the buffer pool are controlled by a configuration file.
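As a rough illustration of such configuration, the following Java sketch reads the two settings from properties-style text. The key names buffer.pool.count and buffer.size, the default values, and the class name BufferPoolConfig are all assumptions made for this example; the patent does not specify the file format or the keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Buffer pool settings read from properties-style text (hypothetical class and keys). */
class BufferPoolConfig {
    final int bufferCount;  // number of buffers in each pool
    final int bufferSize;   // maximum rows per buffer

    BufferPoolConfig(int bufferCount, int bufferSize) {
        if (bufferCount <= 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("buffer count and size must be positive");
        }
        this.bufferCount = bufferCount;
        this.bufferSize = bufferSize;
    }

    /** Parses the text; missing keys fall back to illustrative defaults. */
    static BufferPoolConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int count = Integer.parseInt(p.getProperty("buffer.pool.count", "8").trim());
        int size = Integer.parseInt(p.getProperty("buffer.size", "1000").trim());
        return new BufferPoolConfig(count, size);
    }
}
```

In practice the text would come from a file on disk; a string is used here only to keep the example self-contained.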

In addition, the base class of the DAO layer is associated with each corresponding HTable object array, and different table names correspond to different HTable object arrays. This DAO-layer base class is obtained from an object pool, and different base-class objects are obtained for different data. The class is a per-table singleton: data of the same table obtains the same object, and data of different tables obtains different objects. Because the object corresponds one-to-one to a group of HTable arrays, data of the same table enters the same group of HTable arrays, and data of different tables enters different groups of HTable arrays.

In addition, every row of data has a top-level interface that defines the type of the row. Whatever the source of the data (for example read from a file, generated by code, or distributed by another technology such as Redis), each row only needs to implement the top-level interface and become an object in order to call the subsequent data writing program, which increases generality.
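A top-level row interface of this kind might look like the following Java sketch. Only the interface name HBaseObject and the fact that it carries a rowkey and a tableName come from the text; the method names, the columns method, the VehiclePosition class, and the table name vehicle_position are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Top-level interface implemented by every row (method names are assumptions). */
interface HBaseObject {
    String tableName();
    String rowKey();
    /** Column values to store; this method is an assumption of the sketch. */
    Map<String, String> columns();
}

/** A hypothetical row type, as a freight platform might produce. */
class VehiclePosition implements HBaseObject {
    private final String vehicleId;
    private final long timestampMillis;
    private final double longitude;
    private final double latitude;

    VehiclePosition(String vehicleId, long timestampMillis, double longitude, double latitude) {
        this.vehicleId = vehicleId;
        this.timestampMillis = timestampMillis;
        this.longitude = longitude;
        this.latitude = latitude;
    }

    @Override
    public String tableName() {
        return "vehicle_position";
    }

    @Override
    public String rowKey() {
        return vehicleId + "_" + timestampMillis;
    }

    @Override
    public Map<String, String> columns() {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("lon", String.valueOf(longitude));
        cols.put("lat", String.valueOf(latitude));
        return cols;
    }
}
```

With an interface like this, the writing pipeline depends only on HBaseObject, so rows read from files, generated in code, or received from Redis can all be handled the same way.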

In addition, as shown in Figs. 1 and 2, the specific steps are as follows:

(1) Each row of data is serialized into an HBaseObject object;

Every row of data must implement the HBaseObject interface, which defines data such as the rowkey value and the tableName of each row.

(2) The table name is obtained from each row of data, and the corresponding data processing object is obtained from the object pool according to the table name;

Because the data to be stored may belong to different tables, the object that handles a given table is fetched from the object pool once the tableName has been obtained. This object has multiple HTable object arrays as its local variables. These are the multiple queues of step (5), with one queue corresponding to one HTable object array. The queue here is not an entity; it is only a figurative way of saying that the data is divided into multiple shares. What actually happens is that the data is divided into multiple shares that are passed to the corresponding HTable object arrays. One-to-one correspondence means that there are as many HTable object arrays as there are shares of data. No queue entity actually exists.

(3) The corresponding object is returned according to the table name passed in;

The data processing object pool returns the corresponding object according to the table name passed in. If the current data is the first row of a given table to be stored, the object for that table does not yet exist in the object pool. In that case the object pool creates the object and then returns it.

The object pool here follows the singleton pattern: rows with the same table name all obtain the same object. This guarantees that the local variables in the object (the multiple HTable object arrays) are shared by all data of the same table.
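Steps (2) and (3) can be sketched in Java as a singleton pool that creates one processing object per table on first use. The names ProcessorPool and TableProcessor are hypothetical, and ConcurrentHashMap.computeIfAbsent is one possible way to obtain create-on-first-use behaviour safely under concurrency; the patent does not describe how its object pool is implemented.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Per-table processing object; in the patent it would own the HTable object arrays. */
class TableProcessor {
    final String tableName;

    TableProcessor(String tableName) {
        this.tableName = tableName;
    }
}

/** Singleton pool that hands out exactly one TableProcessor per table name. */
class ProcessorPool {
    private static final ProcessorPool INSTANCE = new ProcessorPool();

    private final ConcurrentMap<String, TableProcessor> byTable = new ConcurrentHashMap<>();

    private ProcessorPool() {
    }

    static ProcessorPool getInstance() {
        return INSTANCE;
    }

    /** Returns the processor for the table, creating it if this is the table's first row. */
    TableProcessor forTable(String tableName) {
        return byTable.computeIfAbsent(tableName, TableProcessor::new);
    }
}
```

Every row of the same table therefore reaches the same TableProcessor and shares its state, while rows of other tables never touch it.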

(4) After the corresponding object is obtained, incoming data is handed to the different HTable object arrays (the queues in the figure) in turn, according to the rule of the allocation algorithm;

(5) The data is divided into multiple shares, and each share is passed to its one-to-one corresponding HTable object array;

The queue here is a name used for convenience. In fact the data is simply divided into multiple shares, each of which is passed to its corresponding HTable object array.
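The even distribution in steps (4) and (5) can be illustrated with a simple round-robin dispatcher in Java. Round-robin is an assumption here: the text says only that rows are handed out "in turn" according to an allocation rule, and the actual algorithm is not disclosed. The class name RoundRobinDistributor is hypothetical, and plain lists stand in for the HTable object arrays that receive each share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hands rows to a fixed number of shares in turn (round-robin is an assumption). */
class RoundRobinDistributor<T> {
    private final List<List<T>> shares = new ArrayList<>();
    private final AtomicLong next = new AtomicLong();

    RoundRobinDistributor(int shareCount) {
        if (shareCount <= 0) {
            throw new IllegalArgumentException("shareCount must be positive");
        }
        for (int i = 0; i < shareCount; i++) {
            shares.add(new ArrayList<>());
        }
    }

    /** Adds the row to the next share and returns that share's index. */
    int dispatch(T row) {
        int index = (int) (next.getAndIncrement() % shares.size());
        List<T> share = shares.get(index);
        synchronized (share) {          // each share has its own monitor
            share.add(row);
        }
        return index;
    }

    /** Returns the backing list for a share; intended for inspection in this sketch. */
    List<T> share(int index) {
        return shares.get(index);
    }
}
```

Because each share is guarded separately, threads dispatching to different shares do not contend with one another, which is in the spirit of the queue independence the description emphasizes.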

(6) A Buffer object is obtained from the buffer pool;

Because of limited space, the figure shows the subsequent processing flow for only one share of data. In fact every share of data executes this flow concurrently.

(7) The HTable object array reads the buffers and stores the data.

After the data is divided into multiple shares, each share obtains a Buffer object from the Buffer object pool and then writes its data into the buffer.

Meanwhile, each HTable object in each HTable object array reads a filled buffer, locks it, and starts writing the data to HBase.

Each share of data corresponds one-to-one to a buffer pool. That is, there are as many buffer pools as there are shares of data, and the pools are independent of one another. As noted under step (6), they all execute in parallel.

Defining a buffer pool also ensures that buffers cannot grow without limit, which avoids memory overflow.
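The fill-then-store cycle of steps (6) and (7) can be sketched in Java as follows. This is a simplified, single-share illustration: RowSink is a hypothetical stand-in for the HTable object that performs the store (with the HBase client it could plausibly delegate to a batch put, but that wiring is not shown and is an assumption), BatchingWriter is a hypothetical name, and a fresh list replaces the pooled Buffer object to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the HTable object that stores a filled buffer (hypothetical interface). */
interface RowSink {
    void putAll(List<String> rows);
}

/** Collects rows into a buffer and hands each full buffer to the sink. */
class BatchingWriter {
    private final int capacity;
    private final RowSink sink;
    private List<String> current = new ArrayList<>();

    BatchingWriter(int capacity, RowSink sink) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Adds a row; when the buffer is full it is stored and a new buffer is started. */
    synchronized void write(String row) {
        current.add(row);
        if (current.size() >= capacity) {
            flush();
        }
    }

    /** Stores whatever the current buffer holds, if anything. */
    synchronized void flush() {
        if (current.isEmpty()) {
            return;
        }
        List<String> full = current;
        current = new ArrayList<>();    // later rows go to a new buffer
        sink.putAll(full);
    }
}
```

A fuller version would take buffers from the per-share pool and store them on a separate thread, so that writers keep filling other buffers while a store is in progress.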

An application scenario of the present invention is given below:

On a freight platform, a large amount of data must be persisted to the database within a short time. Traditional storage methods cannot deliver high performance under this load and block severely.

With this solution the above problems are solved: the volume of data stored per second exceeds 800,000. For short data the TPS can reach 1,000,000, with stable operation.

The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Any simple modification, equivalent change or variation made to the above embodiment in accordance with the technical essence of the invention still falls within the scope of protection of the technical solution of the invention.

Claims

1. A general-purpose high-performance data writing method for an HBase database, characterized in that: a data distribution mechanism is used to divide the data evenly into multiple shares; the data of different tables has different data processing objects, and the HTable object arrays are all local variables of that object; each share of data corresponds one-to-one to an HTable object array; each share of data is written into multiple buffers, and the HTable object array then reads the filled buffers and performs the store operation; while an HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.

2. The general-purpose high-performance data writing method for an HBase database according to claim 1, characterized in that: each HTable object array has its own buffer pool in one-to-one correspondence; the HTable object array obtains Buffer objects from the buffer pool and waits if the buffers are used up.

3. The general-purpose high-performance data writing method for an HBase database according to claim 2, characterized in that: the number and size of the buffers inside the buffer pool are controlled by a configuration file.

4. The general-purpose high-performance data writing method for an HBase database according to claim 3, characterized in that: the base class of the DAO layer is associated with each corresponding HTable object array, and different table names correspond to different HTable object arrays; this DAO-layer base class is obtained from an object pool, and different base-class objects are obtained for different data; the class is a per-table singleton, and data of different tables enters different groups of HTable arrays.

5. The general-purpose high-performance data writing method for an HBase database according to claim 4, characterized in that: every row of data has a top-level interface that defines the type of the row.

6. The general-purpose high-performance data writing method for an HBase database according to claim 1, characterized in that the method comprises the following steps:

(1) each row of data is serialized into an HBaseObject object;

(3) the corresponding object is returned according to the table name passed in;

(4) after the corresponding object is obtained, incoming data is handed to the different HTable object arrays in turn, according to the rule of the allocation algorithm;

(6) a Buffer object is obtained from the buffer pool, and the data is written into the buffer;

(7) the HTable object array reads the buffers and stores the data.