CN104537003A - Universal high-performance data writing method for Hbase database - Google Patents

Universal high-performance data writing method for Hbase database

Info

Publication number: CN104537003A
Authority: CN (China)
Prior art keywords: data, HTable, buffer zone, different, object array
Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201410777982.5A
Other languages: Chinese (zh)
Other versions: CN104537003B (en)
Inventor: 曹宇
Current Assignee: BEIJING SINOIOV VEHICLE NETWORK TECHNOLOGY Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: BEIJING SINOIOV VEHICLE NETWORK TECHNOLOGY Co Ltd
Priority date: 2014-12-16 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2014-12-16
Publication date: 2015-04-22
Application filed by: BEIJING SINOIOV VEHICLE NETWORK TECHNOLOGY Co Ltd
Priority to: CN201410777982.5A (CN104537003B)
Publication of CN104537003A
Application granted
Publication of CN104537003B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database

Abstract

The invention discloses a universal high-performance data writing method for an HBase database. The method avoids data interference and concurrency hazards, prevents queued data from backing up when HTable writes are slow, and keeps data flowing out of the queues continuously. The method comprises: using a data distribution mechanism to distribute incoming data evenly into multiple portions, wherein data for different tables is handled by different data processing objects and the HTable object arrays are all local variables of the corresponding object; mapping each portion of data one-to-one to an HTable object array; writing each portion of data into multiple buffers, from which the HTable object array reads the filled buffers and writes them to the database; and placing a thread lock on the buffer in use while the HTable object array writes its data, so that data not yet buffered bypasses the locked buffer and is written to a new buffer.

Description

A universal high-performance data writing method for an HBase database
Technical field
The present invention relates to the technical field of computer data processing, and in particular to a universal high-performance data writing method for an HBase database, mainly used for writing big data into an HBase database.
Background art
HBase is a distributed, column-oriented open-source database; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. HBase is an open-source implementation of Google Bigtable. Just as Google Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; just as Google runs MapReduce to process the massive data in Bigtable, HBase uses Hadoop MapReduce to process the massive data in HBase; and where Google Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper as the counterpart.
For writing data into an HBase database, the prior art mostly uses the common producer-consumer pattern. Producer and consumer threads contend for a synchronization lock, and the contention becomes more pronounced as the number of threads grows, which seriously reduces write efficiency. Other prior art imports data with the import tool that ships with HBase, which is also inefficient. In addition, when a large volume of data is written, the prior art can block the HBase regionserver client, causing the regionserver to go down or ZooKeeper to time out.
Summary of the invention
The technical problem solved by the present invention is to overcome the deficiencies of the prior art by providing a universal high-performance data writing method for an HBase database that avoids data interference and concurrency hazards, prevents queued data from being blocked because HTable writes are slow, and ensures that data leaves the queues continuously.
The technical solution of the present invention is as follows. The universal high-performance data writing method for an HBase database uses a data distribution mechanism to distribute the data evenly into multiple portions; data for different tables has different data processing objects, and the HTable object arrays are all local variables of the corresponding object; each portion of data corresponds one-to-one to an HTable object array; each portion of data is written into buffers, and the HTable object array then reads the filled buffers and writes them to the database; while an HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
The present invention uses a data distribution mechanism to distribute the data evenly into multiple portions. Data for different tables has different data processing objects, and the receivers of the distributed portions (the HTable object arrays) are all local variables of the corresponding object. This guarantees independence at the table level. The queues also do not interfere with one another, which guarantees independence between queues and avoids data interference and concurrency hazards. Each portion of data corresponds one-to-one to an HTable object array, and these HTable object arrays perform the actual writes. Each portion of data is written into a buffer, and the HTable object array then reads the filled buffers and writes them to the database. While an HTable is writing data, it places a thread lock on the buffer in use, so data not yet written to a buffer bypasses the locked buffer and is written to a new one. This prevents queued data from being blocked because HTable writes are slow, and ensures that data leaves the queues continuously.
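As an illustration of the distribution step, the following Python sketch hands rows to a fixed set of receivers in turn. The patent does not name the allocation algorithm, so round-robin is an assumption here, as are the names `Distributor` and `dispatch`; the plain lists stand in for HTable object arrays, and the sketch is single-threaded.

```python
from itertools import cycle


class Distributor:
    """Hands incoming rows to a fixed set of receivers in round-robin order.

    Round-robin is an assumed allocation rule; the receivers are plain lists
    standing in for HTable object arrays.
    """

    def __init__(self, receivers):
        if not receivers:
            raise ValueError("at least one receiver is required")
        self._receivers = list(receivers)
        self._next = cycle(range(len(self._receivers)))

    def dispatch(self, row):
        # Pick the next receiver in turn and give it the row.
        index = next(self._next)
        self._receivers[index].append(row)
        return index


receivers = [[], [], []]
distributor = Distributor(receivers)
for i in range(10):
    distributor.dispatch(f"row-{i}")
sizes = [len(r) for r in receivers]
```

With ten rows and three receivers, no receiver ends up more than one row ahead of any other, which is the sense in which the distribution is even.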
Brief description of the drawings
Fig. 1 is a schematic diagram of a preferred embodiment of the universal high-performance data writing method for an HBase database according to the present invention.
Fig. 2 is a flow chart of the universal high-performance data writing method for an HBase database according to the present invention.
Detailed description of the embodiments
The universal high-performance data writing method for an HBase database uses a data distribution mechanism to distribute the data evenly into multiple portions; data for different tables has different data processing objects, and the HTable object arrays are all local variables of the corresponding object; each portion of data corresponds one-to-one to an HTable object array; each portion of data is written into buffers, and the HTable object array then reads the filled buffers and writes them to the database; while an HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
The present invention uses a data distribution mechanism to distribute the data evenly into multiple portions. Data for different tables has different data processing objects, and the receivers of the distributed portions (the HTable object arrays) are all local variables of the corresponding object. This guarantees independence at the table level. The queues also do not interfere with one another, which guarantees independence between queues and avoids data interference and concurrency hazards. Each portion of data corresponds one-to-one to an HTable object array, and these HTable object arrays perform the actual writes. Each portion of data is written into a buffer, and the HTable object array then reads the filled buffers and writes them to the database. While an HTable is writing data, it places a thread lock on the buffer in use, so data not yet written to a buffer bypasses the locked buffer and is written to a new one. This prevents queued data from being blocked because HTable writes are slow, and ensures that data leaves the queues continuously.
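The lock-bypass behaviour can be sketched with non-blocking lock acquisition. This is a minimal Python illustration, not the patented implementation: `Buffer` and `write_bypassing_locked` are hypothetical names, and holding a lock by hand simulates an HTable writer that is busy flushing a buffer.

```python
import threading


class Buffer:
    """A named buffer guarded by its own lock (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.rows = []


def write_bypassing_locked(buffers, row):
    """Append row to the first buffer whose lock can be taken without waiting.

    Returns the name of the buffer used, or None when every buffer is busy,
    in which case the caller decides whether to retry or wait.
    """
    for buf in buffers:
        if buf.lock.acquire(blocking=False):
            try:
                buf.rows.append(row)
                return buf.name
            finally:
                buf.lock.release()
    return None


buffers = [Buffer("b0"), Buffer("b1")]
buffers[0].lock.acquire()  # simulate a writer that is still flushing b0
chosen = write_bypassing_locked(buffers, "row-1")
buffers[0].lock.release()
```

Because the producer never waits on a busy buffer, a slow flush of one buffer does not stall incoming data as long as another buffer is free.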
In addition, each HTable object array has its own buffer pool, in one-to-one correspondence. The HTable object array obtains Buffer objects from the buffer pool and waits if the buffers are used up. This effectively avoids the memory overflow that would result from creating buffers without limit. Queues and buffer pools are also in one-to-one correspondence, rather than all queues sharing one global buffer pool, so the buffer pools of the HTable object arrays are independent of one another, which reduces concurrency and data interference between queues. Because of this buffer mechanism, if an extremely large data volume exhausts the buffers, the data is blocked locally rather than on the server; the machine holding the blocked data feeds data to the server at an even rate, which safeguards the server's load and avoids regionserver crashes, ZooKeeper timeouts and similar failures.
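A bounded pool with a blocking acquire is one way to get the "wait if the buffers are used up" behaviour described above. The sketch below uses Python's standard `queue.Queue`; the class name `BufferPool`, its methods, and the numbers are assumptions made for illustration.

```python
import queue


class BufferPool:
    """Fixed-size pool of reusable buffers (hypothetical helper).

    acquire() blocks while no buffer is free, so producers are throttled
    locally instead of allocating more memory.
    """

    def __init__(self, count, capacity):
        self.capacity = capacity  # intended maximum rows per buffer
        self._free = queue.Queue(maxsize=count)
        for _ in range(count):
            self._free.put([])

    def acquire(self, timeout=None):
        # Blocks until a buffer is free; raises queue.Empty after the timeout.
        return self._free.get(timeout=timeout)

    def release(self, buf):
        # Empty the buffer and return it to the pool for reuse.
        buf.clear()
        self._free.put(buf)


pool = BufferPool(count=2, capacity=1000)
a = pool.acquire()
b = pool.acquire()
try:
    pool.acquire(timeout=0.05)  # pool is empty, so this times out
    exhausted = False
except queue.Empty:
    exhausted = True
pool.release(a)
c = pool.acquire(timeout=0.05)  # succeeds now that a buffer was returned
```

Since the pool never holds more than `count` buffers, memory use stays bounded, and giving each writer group its own pool keeps the groups from contending with one another.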
In addition, the number and size of the buffers in the buffer pool are controlled by a configuration file.
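The patent does not describe the format of this configuration file. As a sketch only, the following Python code reads two assumed keys, `buffer_count` and `buffer_size`, from an INI-style section named `buffer_pool`; all three names and the values are hypothetical.

```python
import configparser

# Hypothetical configuration content; the key names are assumptions.
SAMPLE = """
[buffer_pool]
buffer_count = 8
buffer_size = 5000
"""


def load_pool_settings(text):
    """Parse the buffer count and per-buffer size from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["buffer_pool"]
    count = section.getint("buffer_count")
    size = section.getint("buffer_size")
    if count <= 0 or size <= 0:
        raise ValueError("buffer_count and buffer_size must be positive")
    return count, size


settings = load_pool_settings(SAMPLE)
```

Keeping these two values outside the code lets an operator trade memory use against throughput without rebuilding the writer.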
In addition, a base class in the DAO layer is associated with each corresponding HTable object array, and different table names correspond to different HTable object arrays. The DAO-layer base class is obtained from an object pool, and different data obtains different base classes. The class is a singleton per table: data for the same table obtains the same object, and data for different tables obtains different objects. Because the object corresponds one-to-one with its set of HTable arrays, data for the same table enters the same set of HTable arrays, and data for different tables enters different sets of HTable arrays.
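The per-table singleton can be sketched as a registry keyed by table name that creates an object on first use. The Python code below is an illustration under assumptions: `TableProcessor`, `ProcessorPool`, the table names, and the default of four writer groups are all hypothetical, and plain lists stand in for HTable object arrays.

```python
import threading


class TableProcessor:
    """Per-table object that owns the table's writer groups."""

    def __init__(self, table_name, writer_groups=4):
        self.table_name = table_name
        # Plain lists stand in for the table's HTable object arrays.
        self.writer_groups = [[] for _ in range(writer_groups)]


class ProcessorPool:
    """Returns exactly one TableProcessor per table name."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_table = {}

    def get(self, table_name):
        # The lock makes create-on-first-use safe under concurrent callers.
        with self._lock:
            processor = self._by_table.get(table_name)
            if processor is None:
                processor = TableProcessor(table_name)
                self._by_table[table_name] = processor
            return processor


pool = ProcessorPool()
first = pool.get("vehicle_track")
again = pool.get("vehicle_track")
other = pool.get("vehicle_alarm")
```

Rows for `vehicle_track` always reach the same writer groups, while rows for `vehicle_alarm` get a separate set, which is the table-level isolation the paragraph describes.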
In addition, every row of data has a top-level interface that defines the type of the row. Whatever the data source (for example, data read from files, generated by code, or distributed by other technologies such as Redis), a row only needs to implement the top-level interface to become an object that the subsequent data writing program can process, which increases generality.
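A top-level row interface of this kind might look like the following Python sketch. The patent states only that the interface defines data such as the rowkey and tableName; the `columns` method, the `GpsPoint` class, the `d:` column family, and the table name are assumptions added for illustration.

```python
from abc import ABC, abstractmethod


class HBaseObject(ABC):
    """Top-level interface that every row must implement."""

    @abstractmethod
    def row_key(self) -> bytes:
        """Return the row key under which the row is stored."""

    @abstractmethod
    def table_name(self) -> str:
        """Return the name of the table the row belongs to."""

    @abstractmethod
    def columns(self) -> dict:
        """Return column values keyed by 'family:qualifier' (assumed method)."""


class GpsPoint(HBaseObject):
    """Hypothetical row type for a vehicle position report."""

    def __init__(self, vehicle_id, timestamp, lon, lat):
        self.vehicle_id = vehicle_id
        self.timestamp = timestamp
        self.lon = lon
        self.lat = lat

    def row_key(self):
        return f"{self.vehicle_id}_{self.timestamp}".encode()

    def table_name(self):
        return "vehicle_track"

    def columns(self):
        return {"d:lon": str(self.lon), "d:lat": str(self.lat)}


def describe(row: HBaseObject):
    # The writer depends only on the interface, never on the concrete type.
    return row.table_name(), row.row_key(), sorted(row.columns())


result = describe(GpsPoint("V1", 1418688000, 116.4, 39.9))
```

Any data source that can wrap its records in such a class can feed the same writer, which is where the claimed generality comes from.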
In addition, as shown in Figs. 1 and 2, the concrete steps are as follows:
(1) Each row of data is serialized into an HBaseObject object;
Every row of data must implement the HBaseObject interface, which defines data such as the rowkey value and tableName of each row.
(2) The table name is obtained from each row of data, and the corresponding data processing object is obtained from the object pool according to the table name;
Because the data being written may belong to different tables, once tableName is obtained, the object that handles that table is fetched from the object pool. This object has multiple HTable object arrays as its local variables; these are the multiple queues of step (5), with one queue corresponding to one HTable object array. The queue here is not an actual entity. "Queue" is only a convenient, figurative name for the portions into which the data is divided; in practice the data is divided into multiple portions that are passed to the corresponding HTable object arrays. One-to-one correspondence means that there are as many HTable object arrays as there are portions of data. No queue entity exists.
(3) The corresponding object is returned according to the table name passed in;
The data processing object pool returns the corresponding object according to the table name passed in. If the current data is the first row of a given table to be written to the database, the object for that table does not yet exist in the pool; in that case the object pool creates the object and then returns it.
The object pool follows the singleton pattern: rows with the same table name all obtain the same object. This guarantees that the local variables in the object (the multiple HTable object arrays) are shared by all data for the same table.
(4) After the corresponding object is obtained, the incoming data is handed in turn to the different HTable object arrays (the queues in the figure) according to the rules of the allocation algorithm;
(5) The data is divided into multiple portions, each passed to its one-to-one corresponding HTable object array;
The queue here is a name chosen for convenience; in fact the data is simply divided into multiple portions, each passed to its corresponding HTable object array.
(6) A Buffer object is obtained from the buffer pool;
For lack of space, the figure shows the subsequent processing flow for only one portion of data; in fact every portion of data follows this flow concurrently.
(7) The HTable object array reads the buffers and writes them to the database.
After the data is divided into multiple portions, each portion obtains a Buffer object from the Buffer object pool and then writes its data into the buffer.
Meanwhile, each HTable object in each HTable object array reads a filled buffer, locks it, and begins writing the data to HBase.
Each portion of data corresponds one-to-one with a buffer pool; that is, there are as many buffer pools as portions of data, all independent of one another and, as noted in step (6), all running in parallel.
Defining a buffer pool also bounds the number of buffers so that they cannot grow without limit, which avoids memory overflow.
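Steps (1) to (7) can be tied together in a small single-threaded Python sketch. It is a simplification under stated assumptions, not the patented implementation: `FakeSink` stands in for an HBase table handle, `Lane` for one portion of data with its buffer and writer, round-robin is the assumed allocation rule, and the real method runs the lanes concurrently with pooled, locked buffers.

```python
class FakeSink:
    """Stand-in for an HBase table handle; records each batch it receives."""

    def __init__(self):
        self.batches = []

    def put_batch(self, rows):
        self.batches.append(list(rows))


class Lane:
    """One portion of a table's data: its own buffer and its own writer."""

    def __init__(self, sink, buffer_size):
        self.sink = sink
        self.buffer_size = buffer_size
        self.buffer = []

    def accept(self, row):
        # Steps (6) and (7): buffer the row, write the buffer out when full.
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink.put_batch(self.buffer)
            self.buffer = []


class TableProcessor:
    """Per-table object that spreads rows over its lanes in turn."""

    def __init__(self, lanes):
        self.lanes = lanes
        self._turn = 0

    def submit(self, row):
        # Steps (4) and (5): assumed round-robin allocation across lanes.
        lane = self.lanes[self._turn % len(self.lanes)]
        self._turn += 1
        lane.accept(row)


processors = {}


def processor_for(table, lane_count=2, buffer_size=2):
    # Steps (2) and (3): one processor per table name, created on first use.
    if table not in processors:
        lanes = [Lane(FakeSink(), buffer_size) for _ in range(lane_count)]
        processors[table] = TableProcessor(lanes)
    return processors[table]


# Step (1) is reduced here to (table, key) pairs.
rows = [("t1", f"k{i}") for i in range(5)] + [("t2", "x0")]
for table, key in rows:
    processor_for(table).submit(key)
for processor in processors.values():
    for lane in processor.lanes:
        lane.flush()  # write out any partially filled buffers

t1_batches = [lane.sink.batches for lane in processors["t1"].lanes]
```

Rows for `t1` alternate between its two lanes and are written in batches of two, while the single `t2` row goes through a separate processor, so the two tables never share a buffer or a writer.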
An application scenario of the present invention is given below:
On a freight platform, a large amount of data must be persisted to the database within a short time. Traditional write methods cannot deliver the required performance and suffer severe blocking.
With this solution the above problems are solved: more than 800,000 records are written per second, TPS for short records can reach 1,000,000, and operation is stable.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any simple modification, equivalent change or variation made to the above embodiment according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (6)

1. A universal high-performance data writing method for an HBase database, characterized in that: a data distribution mechanism is used to distribute the data evenly into multiple portions; data for different tables has different data processing objects, and the HTable object arrays are all local variables of the corresponding object; each portion of data corresponds one-to-one to an HTable object array; each portion of data is written into multiple buffers, and the HTable object array then reads the filled buffers and writes them to the database; while the HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
2. The universal high-performance data writing method for an HBase database according to claim 1, characterized in that: each HTable object array has its own buffer pool in one-to-one correspondence; the HTable object array obtains Buffer objects from the buffer pool and waits if the buffers are used up.
3. The universal high-performance data writing method for an HBase database according to claim 2, characterized in that: the number and size of the buffers in the buffer pool are controlled by a configuration file.
4. The universal high-performance data writing method for an HBase database according to claim 3, characterized in that: a base class in the DAO layer is associated with each corresponding HTable object array; different table names correspond to different HTable object arrays; the DAO-layer base class is obtained from an object pool; different data obtains different base classes; the class is a singleton per table; and data for different tables enters different sets of HTable arrays.
5. The universal high-performance data writing method for an HBase database according to claim 4, characterized in that: every row of data has a top-level interface that defines the type of the row.
6. The universal high-performance data writing method for an HBase database according to claim 1, characterized in that the method comprises the following steps:
(1) each row of data is serialized into an HBaseObject object;
(2) the table name is obtained from each row of data, and the corresponding data processing object is obtained from the object pool according to the table name;
(3) the corresponding object is returned according to the table name passed in;
(4) after the corresponding object is obtained, the incoming data is handed in turn to the different HTable object arrays according to the rules of the allocation algorithm;
(5) the data is divided into multiple portions, each passed to its one-to-one corresponding HTable object array;
(6) a Buffer object is obtained from the buffer pool, and the data is written into the buffer;
(7) the HTable object array reads the buffers and writes them to the database.
CN201410777982.5A 2014-12-16 2014-12-16 Universal high-performance data writing method for Hbase database Active CN104537003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410777982.5A CN104537003B (en) 2014-12-16 2014-12-16 Universal high-performance data writing method for Hbase database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410777982.5A CN104537003B (en) 2014-12-16 2014-12-16 Universal high-performance data writing method for Hbase database

Publications (2)

Publication Number Publication Date
CN104537003A true CN104537003A (en) 2015-04-22
CN104537003B CN104537003B (en) 2018-01-09

Family

ID=52852531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410777982.5A Active CN104537003B (en) 2014-12-16 2014-12-16 Universal high-performance data writing method for Hbase database

Country Status (1)

Country Link
CN (1) CN104537003B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957863A (en) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, device and system
CN103049556A (en) * 2012-12-28 2013-04-17 中国科学院深圳先进技术研究院 Fast statistical query method for mass medical data
US20140236960A1 (en) * 2013-02-19 2014-08-21 Futurewei Technologies, Inc. System and Method for Database Searching
CN103390038A (en) * 2013-07-16 2013-11-13 西安交通大学 HBase-based incremental index creation and retrieval method
CN103646073A (en) * 2013-12-11 2014-03-19 浪潮电子信息产业股份有限公司 Condition query optimizing method based on HBase table

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608223A (en) * 2016-01-12 2016-05-25 北京中交兴路车联网科技有限公司 Hbase database entering method and system for kafka
CN105608223B (en) * 2016-01-12 2019-04-30 北京中交兴路车联网科技有限公司 Hbase database entering method and system for kafka
CN107370797A (en) * 2017-06-30 2017-11-21 北京百度网讯科技有限公司 A kind of method and apparatus of the strongly-ordered queue operation based on HBase
CN107491314A (en) * 2017-08-30 2017-12-19 四川长虹电器股份有限公司 Barrier-free write processing method for HBASE real-time data based on a read-write lock algorithm
CN112256523A (en) * 2020-09-23 2021-01-22 贝壳技术有限公司 Service data processing method and device
CN112256523B (en) * 2020-09-23 2023-01-06 贝壳技术有限公司 Service data processing method and device
CN112445596A (en) * 2020-11-27 2021-03-05 平安普惠企业管理有限公司 Multithreading-based data import method and system and storage medium
CN112445596B (en) * 2020-11-27 2024-02-02 上海睿量私募基金管理有限公司 Data importing method, system and storage medium based on multithreading
CN114237505A (en) * 2021-12-14 2022-03-25 中国建设银行股份有限公司 Batch processing method and device of business data and computer equipment

Also Published As

Publication number Publication date
CN104537003B (en) 2018-01-09

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant