CN104537003A - Universal high-performance data writing method for Hbase database - Google Patents
Universal high-performance data writing method for Hbase database
- Publication number: CN104537003A
- Application number: CN201410777982.5A
- Authority: CN (China)
- Prior art keywords: data, htable, buffer zone, different, object array
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Abstract
The invention discloses a universal high-performance data writing method for an Hbase database. The method avoids data interference and concurrency hazards, prevents queued data from blocking when HTable writes are slow, and keeps the queued data flowing out continuously. The method comprises: using a data distribution mechanism to divide the data evenly into multiple groups, where the data of different tables have different data processing objects and all HTable object arrays are local variables of the corresponding object; mapping each group of data one-to-one to an HTable object array; writing each group of data into multiple buffers, with the HTable object array reading the filled buffers and writing them to the database; and placing a thread lock on the buffer in use while the HTable object array writes its data, so that data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
Description
Technical field
The present invention relates to the technical field of computer data processing, and in particular to a universal high-performance data writing method for an Hbase database, used mainly for writing big data into an Hbase database.
Background
HBase is a distributed, column-oriented open-source database; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. HBase is an open-source implementation of Google Bigtable. Just as Google Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; just as Google runs MapReduce to process the massive data in Bigtable, HBase uses Hadoop MapReduce to process the massive data in HBase; and where Google Bigtable uses Chubby as its coordination service, HBase uses Zookeeper as the counterpart.
When writing data into an Hbase database, the prior art mostly adopts the ordinary producer-consumer pattern. Producer and consumer threads contend for a synchronization lock, which is especially noticeable under multithreading and seriously reduces write efficiency. Other prior art imports data with the import tool bundled with the Hbase database, which is less efficient still. In addition, when a large volume of data is written, the prior art can block the Hbase regionserver client, causing regionserver crashes or zookeeper timeouts.
Summary of the invention
The technical problem solved by the present invention is to overcome the deficiencies of the prior art by providing a universal high-performance data writing method for an Hbase database that avoids data interference and concurrency hazards, prevents queued data from blocking when HTable writes are slow, and ensures that the queued data can be output continuously.
The technical solution of the present invention is as follows. In this universal high-performance data writing method for an Hbase database, a data distribution mechanism divides the data evenly into multiple groups; the data of different tables have different data processing objects, and all HTable object arrays are local variables of that object; each group of data corresponds one-to-one to an HTable object array; each group of data is written into a buffer, and the HTable object array then reads the filled buffer and writes it to the database; while the HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
The invention uses a data distribution mechanism to divide the data evenly into multiple groups. The data of different tables have different data processing objects, and the receivers of the distributed groups (the HTable object arrays) are all local variables of that object. This guarantees independence at the table level. The queues do not interfere with one another, which guarantees independence between queues and avoids data interference and concurrency hazards. Each group of data corresponds one-to-one to an HTable object array, and these HTable object arrays perform the writes. Each group of data is written into a buffer, and the HTable object array then reads the filled buffer and writes it to the database. While an HTable is writing data it places a thread lock on the buffer in use, so data not yet written to a buffer bypasses the locked buffer and goes to a new one. This prevents queued data from blocking when HTable writes are slow and ensures that the queued data can be output continuously.
Brief description of the drawings
Fig. 1 is a schematic diagram of a preferred embodiment of the universal high-performance data writing method for an Hbase database according to the present invention.
Fig. 2 is a flow chart of the universal high-performance data writing method for an Hbase database according to the present invention.
Detailed description
In this universal high-performance data writing method for an Hbase database, a data distribution mechanism divides the data evenly into multiple groups; the data of different tables have different data processing objects, and all HTable object arrays are local variables of that object; each group of data corresponds one-to-one to an HTable object array; each group of data is written into a buffer, and the HTable object array then reads the filled buffer and writes it to the database; while the HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
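The lock-and-bypass behaviour described above can be illustrated with a minimal Python sketch. It is not the patented implementation: `Buffer`, `write_row`, and `flush` are hypothetical names, a plain list stands in for the HBase write, and a non-blocking lock attempt is assumed as the way a producer detects that a buffer is in use.

```python
import threading


class Buffer:
    """A fixed-capacity row buffer guarded by its own lock (hypothetical stand-in)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = []
        self.lock = threading.Lock()


def write_row(buffers, row):
    """Append row to the first buffer that is neither locked nor full.

    Returns the index of the buffer used, or None if every buffer is busy or full.
    """
    for index, buf in enumerate(buffers):
        if not buf.lock.acquire(blocking=False):
            continue  # buffer is being flushed: bypass it instead of waiting
        try:
            if len(buf.rows) < buf.capacity:
                buf.rows.append(row)
                return index
        finally:
            buf.lock.release()
    return None


def flush(buf, sink):
    """Hold the buffer's lock while draining it into sink (stand-in for the HBase write)."""
    with buf.lock:
        sink.extend(buf.rows)
        buf.rows.clear()


buffers = [Buffer(capacity=2), Buffer(capacity=2)]
buffers[0].lock.acquire()            # simulate a flush of buffer 0 in progress
used = write_row(buffers, "row-1")   # the producer skips buffer 0
buffers[0].lock.release()
```

Because the producer never waits on a locked buffer, a slow flush delays only the buffer being flushed, which is the effect the description attributes to the thread lock.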
In addition, each HTable object array has its own buffer pool, one-to-one. The HTable object array obtains Buffer objects from its pool and waits if the buffers are used up. This effectively avoids the memory overflow that unbounded buffer creation would cause. Queues and buffer pools are also one-to-one, rather than all queues sharing a single global pool, so the buffer pools of the HTable object arrays are independent of one another, which reduces concurrency and data interference between queues. Because of this buffer mechanism, if an extremely large data volume exhausts the buffers, the data blocks locally rather than on the server. The machine holding the blocked data then delivers data to the server at an even rate, which keeps the server load under control and avoids regionserver crashes, zookeeper timeouts, and similar failures.
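A bounded pool of this kind can be sketched in Python as follows. The class name `BufferPool`, its methods, and the use of `bytearray` buffers are assumptions made for illustration; the patent does not specify them.

```python
import queue


class BufferPool:
    """Fixed-size pool of reusable buffers; acquire() waits when the pool is empty."""

    def __init__(self, count, size):
        self._free = queue.Queue(maxsize=count)
        for _ in range(count):
            self._free.put(bytearray(size))

    def acquire(self, timeout=None):
        # Blocks the local producer (not the server) until a buffer is returned.
        # Raises queue.Empty if `timeout` seconds pass without one becoming free.
        return self._free.get(timeout=timeout)

    def release(self, buf):
        # Hand the buffer back so a waiting producer can reuse it.
        self._free.put(buf)


pool = BufferPool(count=2, size=1024)
a = pool.acquire()
b = pool.acquire()
try:
    pool.acquire(timeout=0.01)       # pool is exhausted, so this wait times out
    exhausted = False
except queue.Empty:
    exhausted = True
pool.release(a)
c = pool.acquire(timeout=0.01)       # succeeds once a buffer has been released
```

Since the pool never creates buffers after construction, memory use is capped at `count * size` bytes per pool, and one pool per HTable object array keeps the groups from competing for buffers.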
In addition, the number and size of the buffers in each buffer pool are controlled by a configuration file.
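As an illustration only, such a configuration could be read as shown below. The section name `buffer_pool` and the keys `buffer_count` and `buffer_size_bytes` are invented for this sketch; the patent does not name the file format or its keys.

```python
import configparser

# Hypothetical configuration content; in practice it would live in a file.
SAMPLE = """
[buffer_pool]
buffer_count = 8
buffer_size_bytes = 1048576
"""


def load_pool_settings(text):
    """Parse the buffer count and buffer size from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["buffer_pool"]
    return section.getint("buffer_count"), section.getint("buffer_size_bytes")


count, size = load_pool_settings(SAMPLE)
```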
In addition, the base class of the DAO layer is associated with its corresponding HTable object arrays, and different table names map to different HTable object arrays. This DAO-layer base class is obtained from an object pool, and different data obtain different base-class objects. The class follows a per-table singleton pattern: rows of the same table obtain the same object, and rows of different tables obtain different objects. Because the object corresponds one-to-one with a group of HTable arrays, data of the same table enter the same group of HTable arrays, and data of different tables enter different groups of HTable arrays.
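The per-table singleton lookup can be sketched as a small registry keyed by table name. `TableProcessor` and `ProcessorPool` are hypothetical names, and plain lists stand in for the HTable object arrays; the create-on-first-use behaviour follows the description of the object pool.

```python
import threading


class TableProcessor:
    """Data processing object for one table; owns that table's writer groups."""

    def __init__(self, table_name, writer_groups):
        self.table_name = table_name
        # One list per group, standing in for an HTable object array.
        self.groups = [[] for _ in range(writer_groups)]


class ProcessorPool:
    """Returns exactly one TableProcessor per table name, creating it on first use."""

    def __init__(self, writer_groups=4):
        self._writer_groups = writer_groups
        self._by_table = {}
        self._lock = threading.Lock()

    def get(self, table_name):
        with self._lock:  # guard creation so two threads never build two objects
            processor = self._by_table.get(table_name)
            if processor is None:
                processor = TableProcessor(table_name, self._writer_groups)
                self._by_table[table_name] = processor
            return processor


pool = ProcessorPool()
orders_a = pool.get("orders")
orders_b = pool.get("orders")
tracks = pool.get("tracks")
```

Here `orders_a` and `orders_b` are the same object, so every row of the `orders` table shares one set of writer groups, while `tracks` gets its own.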
In addition, every row of data has a top-level interface that defines the type of each row. Whatever the data source (for example, data read from files, generated by code, or distributed by other technologies such as redis), once each row implements the top-level interface and thus becomes an object, it can be passed to the subsequent data writing routine, which increases generality.
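A top-level row interface of this kind might look like the following Python sketch. The description names the interface `HbaseObject` and says it defines each row's rowkey and tableName; the `columns` method, the `GpsPoint` class, and the key layout are assumptions added for illustration.

```python
from abc import ABC, abstractmethod


class HbaseObject(ABC):
    """Top-level interface that every row must implement before it can be written."""

    @abstractmethod
    def table_name(self) -> str:
        """Name of the target table."""

    @abstractmethod
    def row_key(self) -> bytes:
        """Row key of this row."""

    @abstractmethod
    def columns(self) -> dict:
        """Column values of this row (hypothetical addition)."""


class GpsPoint(HbaseObject):
    """Example row type; the fields and key layout are illustrative only."""

    def __init__(self, vehicle_id, timestamp, lat, lon):
        self.vehicle_id = vehicle_id
        self.timestamp = timestamp
        self.lat = lat
        self.lon = lon

    def table_name(self):
        return "gps_points"

    def row_key(self):
        return f"{self.vehicle_id}#{self.timestamp}".encode("utf-8")

    def columns(self):
        return {"lat": self.lat, "lon": self.lon}


point = GpsPoint("v1", 1700000000, 39.9, 116.4)
```

The writing routine then depends only on `HbaseObject`, so rows from files, generated code, or a message source can all be handled the same way.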
In addition, as shown in Figs. 1-2, the specific steps are as follows:
(1) Each row of data is serialized into an HBaseObject object;
Every row of data must implement the HbaseObject interface, which defines data such as the rowkey value and tableName of each row.
(2) The table name is obtained from each row of data, and the corresponding data processing object is obtained from the object pool according to the table name;
Because the data to be written may belong to different tables, after the tableName is obtained, the object that handles that table is fetched from the object pool. This object has multiple HTable object arrays as local variables; these are the multiple queues of step (5), one queue per HTable object array. The queue here is not a physical entity. For convenience of description, dividing the data into multiple parts is pictured as placing it into different queues; in fact the data is simply divided into multiple parts that are passed to the corresponding HTable object arrays. One-to-one means that the data is divided into as many parts as there are HTable object arrays. No actual queue object exists.
(3) The corresponding object is returned according to the table name passed in;
The data processing object pool returns the corresponding object according to the table name passed in. If the current row is the first row of a given table to be written to the database, the object for that table does not yet exist in the object pool. In that case the object pool creates the object and then returns it.
The object pool here follows the singleton pattern: different rows with the same table name obtain the same object. This ensures that the local variables of the object (the multiple HTable object arrays) are shared by all data of the same table.
(4) After the corresponding object is obtained, incoming data is handed to the different HTable object arrays (the queues in the figure) in turn, according to the allocation algorithm rule;
(5) The data is divided into multiple parts, and each part is passed to its one-to-one HTable object array;
The queue here is a name adopted for convenience; in fact the data is simply divided into multiple parts, each passed to its one-to-one HTable object array.
(6) A Buffer object is obtained from the buffer pool;
For lack of space, the figure shows the subsequent processing flow for only one part of the data; in fact every part is processed concurrently according to this same flow.
(7) The HTable object array reads the buffer and writes it to the database.
After the data is divided into multiple parts, each part obtains a Buffer object from its Buffer object pool and then writes its data into the buffer.
Meanwhile, each HTable object in each HTable object array reads a filled buffer, locks it, and begins writing the data to Hbase.
Each part of the data corresponds one-to-one to a buffer pool; that is, the data is divided into as many parts as there are buffer pools, and the pools are independent of one another. As noted under step (6), all parts execute in parallel.
Defining a buffer pool also bounds the number of buffers so that they cannot grow without limit, which avoids memory overflow.
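Steps (4) and (5) can be sketched as a simple distributor. The patent refers only to an "allocation algorithm rule" that divides the data evenly; round-robin is assumed here as one such rule, and `RoundRobinDistributor` is a hypothetical name. Plain lists stand in for the HTable object arrays.

```python
import itertools


class RoundRobinDistributor:
    """Hands incoming rows to writer groups in turn so each group gets an even share."""

    def __init__(self, group_count):
        # Each list stands in for the input of one HTable object array.
        self.groups = [[] for _ in range(group_count)]
        self._next = itertools.cycle(range(group_count))

    def dispatch(self, row):
        """Place row in the next group and return that group's index."""
        index = next(self._next)
        self.groups[index].append(row)
        return index


dist = RoundRobinDistributor(group_count=3)
for n in range(7):
    dist.dispatch(f"row-{n}")
sizes = [len(group) for group in dist.groups]
```

This sketch is single-threaded; with several producer threads, the call to `dispatch` would need a lock or a per-producer distributor.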
An application scenario of the present invention is given below:
On a shipping platform, a large amount of data must be persisted to the database within a short time. Traditional write methods cannot deliver high performance and suffer severe blocking.
With this solution the above problems are resolved: more than 800,000 records are written per second, the TPS for short records can reach 1,000,000, and operation is stable.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any simple modification, equivalent change, or adaptation made to the above embodiment according to the technical essence of the present invention still falls within the scope of protection of the technical solution of the present invention.
Claims (6)
1. A universal high-performance data writing method for an Hbase database, characterized in that: a data distribution mechanism divides the data evenly into multiple groups; the data of different tables have different data processing objects, and all HTable object arrays are local variables of that object; each group of data corresponds one-to-one to an HTable object array; each group of data is written into multiple buffers, and the HTable object array then reads the filled buffers and writes them to the database; while the HTable object array is writing data, a thread lock is placed on the buffer in use, and data not yet written to a buffer bypasses the locked buffer and is written to a new buffer.
2. The universal high-performance data writing method for an Hbase database according to claim 1, characterized in that: each HTable object array has a one-to-one buffer pool; the HTable object array obtains Buffer objects from the buffer pool and waits if the buffers are used up.
3. The universal high-performance data writing method for an Hbase database according to claim 2, characterized in that: the number and size of the buffers in the buffer pool are controlled by a configuration file.
4. The universal high-performance data writing method for an Hbase database according to claim 3, characterized in that: the base class of the DAO layer is associated with its corresponding HTable object arrays, and different table names map to different HTable object arrays; this DAO-layer base class is obtained from an object pool, and different data obtain different base-class objects; the class follows a per-table singleton pattern, and data of different tables enter different groups of HTable arrays.
5. The universal high-performance data writing method for an Hbase database according to claim 4, characterized in that: every row of data has a top-level interface that defines the type of each row.
6. The universal high-performance data writing method for an Hbase database according to claim 1, characterized in that the method comprises the following steps:
(1) each row of data is serialized into an HBaseObject object;
(2) the table name is obtained from each row of data, and the corresponding data processing object is obtained from the object pool according to the table name;
(3) the corresponding object is returned according to the table name passed in;
(4) after the corresponding object is obtained, incoming data is handed to the different HTable object arrays in turn, according to the allocation algorithm rule;
(5) the data is divided into multiple parts, and each part is passed to its one-to-one HTable object array;
(6) a Buffer object is obtained from the buffer pool, and the data is written into the buffer;
(7) the HTable object array reads the buffer and writes it to the database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410777982.5A CN104537003B (en) | 2014-12-16 | 2014-12-16 | Universal high-performance data writing method for Hbase database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410777982.5A CN104537003B (en) | 2014-12-16 | 2014-12-16 | Universal high-performance data writing method for Hbase database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104537003A true CN104537003A (en) | 2015-04-22 |
CN104537003B CN104537003B (en) | 2018-01-09 |
Family
ID=52852531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410777982.5A Active CN104537003B (en) | 2014-12-16 | 2014-12-16 | Universal high-performance data writing method for Hbase database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104537003B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608223A (en) * | 2016-01-12 | 2016-05-25 | 北京中交兴路车联网科技有限公司 | Hbase database entering method and system for kafka |
CN107370797A (en) * | 2017-06-30 | 2017-11-21 | 北京百度网讯科技有限公司 | A kind of method and apparatus of the strongly-ordered queue operation based on HBase |
CN107491314A (en) * | 2017-08-30 | 2017-12-19 | 四川长虹电器股份有限公司 | Processing method is write based on Read-Write Locks algorithm is accessible to HBASE real time datas |
CN112256523A (en) * | 2020-09-23 | 2021-01-22 | 贝壳技术有限公司 | Service data processing method and device |
CN112445596A (en) * | 2020-11-27 | 2021-03-05 | 平安普惠企业管理有限公司 | Multithreading-based data import method and system and storage medium |
CN114237505A (en) * | 2021-12-14 | 2022-03-25 | 中国建设银行股份有限公司 | Batch processing method and device of business data and computer equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101957863A (en) * | 2010-10-14 | 2011-01-26 | 广州从兴电子开发有限公司 | Data parallel processing method, device and system |
CN103049556A (en) * | 2012-12-28 | 2013-04-17 | 中国科学院深圳先进技术研究院 | Fast statistical query method for mass medical data |
CN103390038A (en) * | 2013-07-16 | 2013-11-13 | 西安交通大学 | HBase-based incremental index creation and retrieval method |
CN103646073A (en) * | 2013-12-11 | 2014-03-19 | 浪潮电子信息产业股份有限公司 | Condition query optimizing method based on HBase table |
US20140236960A1 (en) * | 2013-02-19 | 2014-08-21 | Futurewei Technologies, Inc. | System and Method for Database Searching |
-
2014
- 2014-12-16 CN CN201410777982.5A patent/CN104537003B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101957863A (en) * | 2010-10-14 | 2011-01-26 | 广州从兴电子开发有限公司 | Data parallel processing method, device and system |
CN103049556A (en) * | 2012-12-28 | 2013-04-17 | 中国科学院深圳先进技术研究院 | Fast statistical query method for mass medical data |
US20140236960A1 (en) * | 2013-02-19 | 2014-08-21 | Futurewei Technologies, Inc. | System and Method for Database Searching |
CN103390038A (en) * | 2013-07-16 | 2013-11-13 | 西安交通大学 | HBase-based incremental index creation and retrieval method |
CN103646073A (en) * | 2013-12-11 | 2014-03-19 | 浪潮电子信息产业股份有限公司 | Condition query optimizing method based on HBase table |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608223A (en) * | 2016-01-12 | 2016-05-25 | 北京中交兴路车联网科技有限公司 | Hbase database entering method and system for kafka |
CN105608223B (en) * | 2016-01-12 | 2019-04-30 | 北京中交兴路车联网科技有限公司 | For the storage method and system of the Hbase database of kafka |
CN107370797A (en) * | 2017-06-30 | 2017-11-21 | 北京百度网讯科技有限公司 | A kind of method and apparatus of the strongly-ordered queue operation based on HBase |
CN107491314A (en) * | 2017-08-30 | 2017-12-19 | 四川长虹电器股份有限公司 | Processing method is write based on Read-Write Locks algorithm is accessible to HBASE real time datas |
CN112256523A (en) * | 2020-09-23 | 2021-01-22 | 贝壳技术有限公司 | Service data processing method and device |
CN112256523B (en) * | 2020-09-23 | 2023-01-06 | 贝壳技术有限公司 | Service data processing method and device |
CN112445596A (en) * | 2020-11-27 | 2021-03-05 | 平安普惠企业管理有限公司 | Multithreading-based data import method and system and storage medium |
CN112445596B (en) * | 2020-11-27 | 2024-02-02 | 上海睿量私募基金管理有限公司 | Data importing method, system and storage medium based on multithreading |
CN114237505A (en) * | 2021-12-14 | 2022-03-25 | 中国建设银行股份有限公司 | Batch processing method and device of business data and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN104537003B (en) | 2018-01-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104537003A (en) | Universal high-performance data writing method for Hbase database | |
US9411659B2 (en) | Data processing method used in distributed system | |
US8381230B2 (en) | Message passing with queues and channels | |
US8676874B2 (en) | Data structure for tiling and packetizing a sparse matrix | |
CN105117417A (en) | Read-optimized memory database Trie tree index method | |
CN104408163A (en) | Data hierarchical storage method and device | |
CN103019861A (en) | Distribution method and distribution device of virtual machine | |
CN107291539B (en) | Cluster program scheduler method based on resource significance level | |
CN110287038A (en) | Promote the method and system of the data-handling efficiency of Spark Streaming frame | |
Liu et al. | Massive image data management using HBase and MapReduce | |
CN106406762A (en) | A repeated data deleting method and device | |
Hegeman et al. | Distributed LiDAR data processing in a high-memory cloud-computing environment | |
US9069621B2 (en) | Submitting operations to a shared resource based on busy-to-success ratios | |
Dai et al. | Improving load balance for data-intensive computing on cloud platforms | |
CN106484532B (en) | GPGPU parallel calculating method towards SPH fluid simulation | |
US8543722B2 (en) | Message passing with queues and channels | |
CN104158902A (en) | Method and device of distributing Hbase data blocks based on number of requests | |
CN103324577B (en) | Based on the extensive itemize file allocation system minimizing IO access conflict and file itemize | |
US10339052B2 (en) | Massive access request for out-of-core textures by a parallel processor with limited memory | |
CN104715349A (en) | Method and system for calculating e-commerce freight | |
JP6333371B2 (en) | Method for implementing bit arrays in cache lines | |
CN104572903B (en) | A kind of method of the control data loading of Hbase database | |
Zhu et al. | RECODS: replica consistency-on-demand store | |
CN106557430A (en) | A kind of data cached brush method and device | |
CN117785759B (en) | Data storage method, data reading method, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |