CN105589910A - HBase (Hadoop Database)-based mass transaction data retrieving method and system - Google Patents
- Publication number
- CN105589910A (application number CN201410850869.5A)
- Authority
- CN
- China
- Prior art keywords
- hbase
- line unit
- transaction data
- inquiry request
- region
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an HBase (Hadoop Database)-based method for retrieving massive transaction data. The method comprises: a. receiving a query request entered by a user through a query interface; b. sending the received query request to an HBase master node; c. the HBase master node sending the query request, according to the range defined by the start key and the end key in the query request, to the region server of the region corresponding to that range; and d. the region server completing the query in the transaction data based on the query request. The invention also provides a corresponding system.
Description
Technical field
The present invention relates to retrieval technology and, more specifically, to HBase-based retrieval of massive transaction data.
Background art
In the daily operation of a bank card association, problems arise with some transactions, and the corresponding transaction message information must be retrieved within seconds so that the source of the problem can be analyzed and investigated.
Summary of the invention
In view of this, the invention provides an HBase-based method for retrieving massive transaction data. The method comprises: a. receiving a query request entered by a user through a query interface; b. sending the received query request to an HBase master node; c. the HBase master node sending the query request, according to the range defined by the start key and the end key in the query request, to the region server of the region corresponding to that range; d. the region server completing the query in the transaction data based on the query request.
In one example of the HBase-based massive transaction data retrieval method, the row key that governs the construction of the regions is set in a predetermined format, in which the two most significant bytes of the row key are a date field representing the transaction date, and the transaction date is the Coordinated Universal Time (UTC) date taken modulo 31.
In one example of the HBase-based massive transaction data retrieval method, the condition field in the row key immediately follows the hour field, which represents the hour, and the hour field immediately follows the date field.
In one example, the HBase-based massive transaction data retrieval method further comprises building a secondary index table that records the mapping between the index value of each transaction message and the row key, the mapping being one-to-one, and step d comprises: d1. the region server, based on the query request, obtaining the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table; d2. completing the query in the transaction data based on that row key.
The present invention also provides an HBase-based massive transaction data retrieval system, comprising: a query interface through which a user enters a query request; a receiving module, arranged at the HBase master node, for receiving the query request entered by the user; a region determination module, arranged at the HBase master node, for sending the query request, according to the range defined by the start key and the end key of the query request, to the region server of the region corresponding to that range; and a plurality of retrieval modules, one arranged in each region server, for completing the query in the transaction data based on the query request received by that region server.
In one example of the HBase-based massive transaction data retrieval system, the system further comprises a setting module for setting, in a predetermined format, the row key that determines the region size, in which the two most significant bytes of the row key are a date field representing the transaction date, the transaction date is the UTC date taken modulo 31, the condition field in the row key immediately follows the hour field representing the hour, and the hour field immediately follows the date field.
In one example, the HBase-based massive transaction data retrieval system further comprises an index table building unit for building a secondary index table that records the mapping between the index value of each transaction message and the row key, the mapping being one-to-one. The retrieval modules are arranged so that the region server, based on the query request, obtains the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table, and completes the query in the transaction data based on that row key.
Brief description of the drawings
Fig. 1 is a flow chart of the HBase-based massive transaction data retrieval method according to an example of the present invention.
Fig. 2 is a schematic diagram of the HBase cluster environment according to an example of the present invention.
Fig. 3 shows the relation between the files, the HBase secondary index table, and the HBase message log table.
Fig. 4 is a structural diagram of the HBase-based massive transaction data retrieval system according to an example of the present invention.
Detailed description of the invention
Illustrative examples of the present invention are now described with reference to the accompanying drawings, in which identical reference numerals denote identical elements. The embodiments described below help those skilled in the art to understand the invention thoroughly and are intended to be illustrative, not limiting. Unless otherwise defined, the terms used herein (including scientific, technical, and industry terms) have the meaning commonly understood by those skilled in the art to which the invention belongs.
HBase, whose full name is Hadoop Database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, large-scale structured storage clusters can be built on inexpensive PC servers. HBase originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Basic introductions to HBase are available from many sources; see, for example, the introduction at http://baike.baidu.com/link url=ZZH4mj7uoS2hd77AxZDszb0TW1cqt5u0_cmFOoraNCPjg1s-IPAa GxlDyDtca0pXbfDUfhPx45zxQBp1O_J54q.
Fig. 1 is a flow chart of the HBase-based massive transaction data retrieval method according to an example of the present invention. In step 10, a query request entered by a user through a query interface is received. The received query request is sent to the HBase master node (step 12). According to the range defined by the start key (startkey) and the end key (endkey) in the query request, the HBase master node sends the query request to the region server (regionserver) of the region corresponding to that range (step 14). The data is divided into multiple regions; in the non-limiting example illustrated below, there are 30 regions. One or more regions are managed by one region server. For example, if each region server manages two regions, 30 regions require 15 region servers. The region server completes the query in the transaction data based on the query request (step 16). The retrieved transaction data is then sent to the user.
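The routing in step 14 can be sketched as a lookup over sorted region start keys. The following Python is a minimal illustration only, not the patent's implementation: the `Region` class, the server names `rs0` to `rs14`, and the one-byte split keys are hypothetical, chosen to mirror the 30-region, 15-server example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: bytes  # inclusive lower bound; b"" marks the first region
    server: str       # hypothetical region server name


def build_regions(split_keys, servers, regions_per_server=2):
    """Create one region per key range and assign regions to servers in order."""
    starts = [b""] + sorted(split_keys)
    return [Region(s, servers[i // regions_per_server]) for i, s in enumerate(starts)]


def route(regions, start_key, end_key):
    """Return the servers whose regions overlap the range [start_key, end_key)."""
    starts = [r.start_key for r in regions]
    first = bisect_right(starts, start_key) - 1  # region containing start_key
    last = bisect_left(starts, end_key) - 1      # last region starting before end_key
    servers = []
    for region in regions[first:last + 1]:
        if region.server not in servers:
            servers.append(region.server)
    return servers


# 30 regions split on a one-byte prefix, two regions per server: 15 servers.
servers = [f"rs{i}" for i in range(15)]
regions = build_regions([bytes([i]) for i in range(1, 30)], servers)
targets = route(regions, b"\x06\x00", b"\x06\xff")
```

Here the queried range falls entirely inside the seventh region (index 6), so a single server is selected; a wider range would return every server whose regions it touches.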
Fig. 2 is a schematic diagram of the HBase cluster environment according to an example of the present invention. As shown, the user enters a query request through query interface 20. The query request is sent to HBase master node 22. As described for step 14 of Fig. 1, HBase master node 22 sends the query request, according to the range defined by its start key and end key, to the region server of the region corresponding to that range. In this example, the range defined by the start key and the end key, that is, the interval holding the data to be queried, lies in the seventh region, so HBase master node 22 sends the query request to region server 242, which manages the seventh region. After receiving the query request, the region server searches the transaction messages stored in the seventh region for the requested data and finally returns the data found to the user.
According to an example of the present invention, the row key that governs region construction is set in a predetermined format. "Governs region construction" here means that, by the underlying principles of HBase, the row key (rowkey) determines the size of a region, its write speed, and so on. The two most significant bytes of the row key are a date field representing the transaction date, and the transaction date written into this field is the UTC date taken modulo 31. Compared with the conventional date representation, which occupies 8 bytes, the date field of the invention clearly saves storage space. The condition field in the row key immediately follows the hour field representing the hour, and the hour field immediately follows the date field. Table 1 shows a row key format according to an example of the present invention:
Table 1
Here, F100 denotes the receiving institution of the message, F33 denotes the sending institution of the message, and F11 denotes the system trace number.
Because the row key is set as described above, the date field is shortened from the conventional 8 positions (yyyymmdd) to 2. In addition, placing the most frequently used query condition, the F100 field, directly after the hour field lets the system retrieve messages quickly by date and F100 without any additional assignment. Under the region plan, with 350 GB of data per day and 4 GB per region, there are 29 to 30 regions. Under this rowkey distribution principle, when a day's tens of thousands of messages are parsed and loaded in a distributed fashion, the hour of each message is random, so the region each message is written to is effectively chosen at random from the multiple regions, which achieves the desired write speed.
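A row key (rowkey) with this prefix can be sketched in Python as follows. This is a hedged illustration: the text does not say how "UTC modulo 31" is computed or how wide the hour field is, so the sketch assumes the date field holds days since the Unix epoch modulo 31, assumes a two-byte hour field (consistent with the later statement that date and time occupy the first 4 bytes), and treats everything after F100 as an opaque, hypothetical suffix.

```python
import struct
from datetime import datetime, timezone


def date_field(ts: datetime) -> bytes:
    """Two-byte date field: days since the Unix epoch (UTC) modulo 31 (assumed)."""
    days = int(ts.timestamp()) // 86400
    return struct.pack(">H", days % 31)


def hour_field(ts: datetime) -> bytes:
    """Two-byte hour field (width assumed): hour of day in UTC."""
    return struct.pack(">H", ts.astimezone(timezone.utc).hour)


def make_row_key(ts: datetime, f100: bytes, suffix: bytes = b"") -> bytes:
    """Concatenate date field, hour field, F100 condition field, and a suffix."""
    return date_field(ts) + hour_field(ts) + f100 + suffix


# Hypothetical values: an 8-character F100 and a 6-digit trace number as suffix.
ts = datetime(2014, 12, 31, 13, 5, tzinfo=timezone.utc)
key = make_row_key(ts, b"00010000", b"123456")
```

Because the date and hour lead the key, all messages for one hour of one day sort together, and messages for one receiving institution within that hour form a contiguous run.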
According to an example of the present invention, a secondary index table is also provided to record the mapping between the index value of each transaction message and the row key; the mapping is one-to-one. The secondary index table is built, for example, for queries by sender F33. It records an index value for every message, that is, the index table row key shown in Table 2:
Table 2
Here, the transaction date is the same as in Table 1: the UTC date taken modulo 31.
The row key of the message log table is as shown in Table 1 above. According to an example of the present invention, the secondary index table is built by reading the raw data in text form, generating index data through a Map/Reduce process, and then writing the index data into the index table.
Fig. 3 shows the relation between the files, the HBase secondary index table, and the HBase message log table. In Fig. 3, raw data in text form is read from file 1, file 2, ..., file n; index data is produced by Map/Reduce and written into HBase secondary index table 30. In secondary index table 30, the index value built for each message corresponds one-to-one to a row key in HBase message log table 32. HBase message log table 32 in Fig. 3 corresponds to the multiple regions shown in Fig. 2. Because of inherent properties of HBase, the search conditions available when retrieving directly by rowkey are limited; introducing the secondary index table extends retrieval to multiple index conditions and can even support full-text search.
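The index-building flow of Fig. 3 can be sketched with plain Python functions standing in for the Map and Reduce phases. This is an illustration under assumptions, not the patent's job: the comma-separated record format and both key layouts are hypothetical (Tables 1 and 2 are not reproduced in this text), and a dictionary stands in for the HBase secondary index table.

```python
from itertools import groupby


def map_record(line):
    """Map step: parse one raw text record and emit (index key, log row key)."""
    date, hour, f100, f33, f11 = line.strip().split(",")
    log_key = "|".join([date, hour, f100, f11])  # assumed message log layout
    index_key = "|".join([date, f33, f11])       # assumed index table layout
    return index_key, log_key


def reduce_pairs(pairs):
    """Reduce step: group by index key and enforce the one-to-one mapping."""
    index_table = {}
    for index_key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        log_keys = [log_key for _, log_key in group]
        if len(log_keys) != 1:
            raise ValueError(f"mapping is not one-to-one for {index_key}")
        index_table[index_key] = log_keys[0]
    return index_table


# Hypothetical raw records: date mod 31, hour, F100, F33, F11.
raw_lines = [
    "05,13,RCV00001,SND00009,000123",
    "05,14,RCV00002,SND00009,000124",
]
index_table = reduce_pairs(map_record(line) for line in raw_lines)
```

Raising an error on a repeated index key is one way to make the one-to-one requirement explicit; a production job might instead deduplicate or log the conflict.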
When the secondary index table is introduced, step 16 of Fig. 1 is refined as follows: the region server, based on the query request, obtains the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table, and then completes the query in the transaction data based on that row key. With reference to Fig. 3, the rowkey of the message log table is first determined from the query request and the correspondence in secondary index table 30 between the index table rowkey and the message log table rowkey; the lookup is then performed in HBase message log table 32, in other words, in the corresponding region shown in Fig. 2.
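The two-step lookup (steps d1 and d2) can be sketched as a prefix scan over the index followed by point reads on the message log. The dictionaries below stand in for the two HBase tables, and the `date|F33|F11` index key layout is an assumption made for illustration.

```python
from bisect import bisect_left


def scan_prefix(table, prefix):
    """Return (key, value) pairs whose key starts with prefix, in key order."""
    keys = sorted(table)
    i = bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append((keys[i], table[keys[i]]))
        i += 1
    return out


def query_by_sender(index_table, log_table, date, f33):
    """Step d1: find row keys through the index; step d2: read messages by row key."""
    row_keys = [rk for _, rk in scan_prefix(index_table, f"{date}|{f33}|")]
    return [log_table[rk] for rk in row_keys]


# Hypothetical contents of the secondary index table and the message log table.
index_table = {
    "05|SND00009|000123": "05|13|RCV00001|000123",
    "05|SND00010|000200": "05|13|RCV00001|000200",
}
log_table = {
    "05|13|RCV00001|000123": "message A",
    "05|13|RCV00001|000200": "message B",
}
found = query_by_sender(index_table, log_table, "05", "SND00009")
```

The query by sender never scans the message log: the index narrows the search to exact row keys, and each of those is a single keyed read.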
When querying by the method of Fig. 1, the query conditions are typically the time, the receiving institution, the sending institution, and so on. When the query conditions are received, the time range can be determined quickly from the startkey and endkey of the query. Because data written with the rowkey format of the invention carries its date and time in the first 4 bytes of every record, the region can be located quickly. Moreover, the receiving institution immediately follows the time field, so when the query includes this commonly used receiving-institution condition, the records for that institution can in turn be located quickly.
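Deriving a startkey and endkey from such conditions amounts to a prefix scan. The sketch below assumes the same hypothetical layout as above (two-byte date modulo 31, two-byte hour, then F100) and relies on the fact that HBase scans treat the start row as inclusive and the stop row as exclusive.

```python
import struct


def prefix_stop(prefix: bytes) -> bytes:
    """Smallest key greater than every key that starts with prefix."""
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""  # no upper bound: scan to the end of the table
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_range(day_mod_31: int, hour: int, f100: bytes = b""):
    """Build (startkey, endkey) for one hour of one day, optionally for one F100."""
    start_key = struct.pack(">HH", day_mod_31, hour) + f100
    return start_key, prefix_stop(start_key)


# All messages for receiving institution "RCV1" in hour 13 of day 5 (hypothetical).
start_key, end_key = scan_range(5, 13, b"RCV1")
```

Omitting `f100` yields the range for the whole hour, which shows why placing F100 directly after the hour field narrows the scan without any extra index.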
Once the secondary index table is introduced, the query conditions can be extended, for example to include a word or keyword from the raw data. Suppose the query conditions in this example comprise the date and sender of the message sought, together with a keyword A in the message. After the query request is received, it is first processed according to steps 10, 12, and 14 of Fig. 1. Then the secondary index table is searched with the query request containing keyword A to find the rowkey derived from the raw file that contains keyword A, that is, the index table rowkey; the message log table rowkey is obtained from this index table rowkey, and the query is then carried out in the corresponding region.
Fig. 4 is a structural diagram of the HBase-based massive transaction data retrieval system according to an example of the present invention. As shown, the system comprises query interface 40, through which a user enters a query request; receiving module 42, arranged at the HBase master node, for receiving the query request entered by the user; region determination module 44, arranged at the HBase master node, for sending the query request, according to the range defined by the start key and the end key of the query request, to the region server of the region corresponding to that range; and a plurality of retrieval modules 46, one arranged in each region server, for completing the query in the transaction data based on the query request received by that region server.
The user enters a query request through query interface 40. Receiving module 42, arranged at the HBase master node, receives the transmitted query request. Region determination module 44, arranged at the same HBase master node, determines the region corresponding to the query request from the range defined by the start key (startkey) and the end key (endkey) in the query request, and then sends the query request to the region server (regionserver) of the region corresponding to that range. As described above, the data is divided into multiple regions; in the non-limiting example illustrated herein, there are 30 regions. One or more regions are managed by one region server. For example, if each region server manages two regions, 30 regions require 15 region servers. The retrieval module 46 on the determined server completes the query in the transaction data based on the query request. (Every server is equipped with a retrieval module, and all of them are denoted retrieval module 46 herein; in this context, however, retrieval module 46 refers only to the one on the determined server.) The retrieval result of retrieval module 46 can then be sent by that server to, for example, the electronic device hosting query interface 40, so that the user is informed.
According to the example of the HBase-based massive transaction data retrieval system shown in Fig. 4, the system also comprises a setting module (not shown) for setting, in a predetermined format, the row key that determines the region size. The two most significant bytes of the row key are a date field representing the transaction date, and the transaction date is the UTC date taken modulo 31; the condition field in the row key immediately follows the hour field representing the hour, and the hour field immediately follows the date field. An example row key (rowkey) layout is shown in Table 1 and is not repeated here.
According to the example of the HBase-based massive transaction data retrieval system shown in Fig. 4, the system may optionally also comprise an index table building unit (not shown) for building a secondary index table that records the mapping between the index value of each transaction message and the row key, the mapping being one-to-one. The retrieval modules are arranged so that the region server, based on the query request, obtains the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table, and completes the query in the transaction data based on that row key. The secondary index table is built, for example, for queries by sender F33; it records an index value for every message, that is, the index table row key shown in Table 2. The row key of the message log table is as shown in Table 1 above. According to an example of the present invention, the secondary index table is built by reading the raw data in text form, generating index data through a Map/Reduce process, and then writing the index data into the index table.
When the system includes the index table building unit, that is, when a secondary index table has been built, retrieval module 46 of Fig. 4 is configured so that the region server, based on the query request, obtains the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table, and completes the query in the transaction data based on that row key. With reference to Fig. 3, the rowkey of the message log table is first determined from the query request and the correspondence in secondary index table 30 between the index table rowkey and the message log table rowkey; the lookup is then performed in HBase message log table 32, in other words, in the corresponding region shown in Fig. 2.
The method of the present invention can be implemented in software, in hardware, or in a combination of software and hardware. Similarly, the HBase-based massive transaction data retrieval system of the present invention can be implemented as software, hardware, or a combination of the two.
If transaction volume grows far beyond what was estimated, storage space can be expanded by adding region servers, that is, storage is scaled horizontally by adding HBase cluster nodes. Even then, because the scheme of the invention distributes retrieval of a query request from the HBase master node to the region server of the corresponding region, retrieval remains fast. Moreover, the rowkey layout of the invention makes matching during retrieval quicker, which also speeds up queries. Further, when a secondary index table is provided, the query conditions can be extended as well.
Claims (7)
1. An HBase-based massive transaction data retrieval method, characterized in that the method comprises:
a. receiving a query request entered by a user through a query interface;
b. sending the received query request to an HBase master node;
c. the HBase master node sending the query request, according to the range defined by the start key and the end key in the query request, to the region server of the region corresponding to that range;
d. the region server completing the query in the transaction data based on the query request.
2. The HBase-based massive transaction data retrieval method of claim 1, characterized in that the row key that governs the construction of the regions is set in a predetermined format, in which the two most significant bytes of the row key are a date field representing the transaction date, and the transaction date is the UTC date taken modulo 31.
3. The HBase-based massive transaction data retrieval method of claim 2, characterized in that the condition field in the row key immediately follows the hour field representing the hour, and the hour field immediately follows the date field.
4. The HBase-based massive transaction data retrieval method of claim 3, characterized in that it further comprises building a secondary index table that records the mapping between the index value of each transaction message and the row key, the mapping being one-to-one, and step d comprises:
d1. the region server, based on the query request, obtaining the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table;
d2. completing the query in the transaction data based on that row key.
5. An HBase-based massive transaction data retrieval system, characterized in that the system comprises:
a query interface through which a user enters a query request;
a receiving module, arranged at the HBase master node, for receiving the query request entered by the user;
a region determination module, arranged at the HBase master node, for sending the query request, according to the range defined by the start key and the end key of the query request, to the region server of the region corresponding to that range;
a plurality of retrieval modules, one arranged in each region server, for completing the query in the transaction data based on the query request received by that region server.
6. The HBase-based massive transaction data retrieval system of claim 5, characterized in that the system further comprises a setting module for setting, in a predetermined format, the row key that determines the region size, in which the two most significant bytes of the row key are a date field representing the transaction date, the transaction date is the UTC date taken modulo 31, the condition field in the row key immediately follows the hour field representing the hour, and the hour field immediately follows the date field.
7. The HBase-based massive transaction data retrieval system of claim 6, characterized in that it further comprises an index table building unit for building a secondary index table that records the mapping between the index value of each transaction message and the row key, the mapping being one-to-one, the retrieval modules being arranged so that the region server, based on the query request, obtains the row key of the data to be queried in the region from the mapping between the index value of each transaction message and the row key in the secondary index table, and completes the query in the transaction data based on that row key.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410850869.5A CN105589910A (en) | 2014-12-31 | 2014-12-31 | HBase (Hadoop Database)-based mass transaction data retrieving method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105589910A true CN105589910A (en) | 2016-05-18 |
Family
ID=55929492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410850869.5A Pending CN105589910A (en) | 2014-12-31 | 2014-12-31 | HBase (Hadoop Database)-based mass transaction data retrieving method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105589910A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727465A (en) * | 2008-11-03 | 2010-06-09 | 中国移动通信集团公司 | Methods for establishing and inquiring index of distributed column storage database, device and system thereof |
CN102750356A (en) * | 2012-06-11 | 2012-10-24 | 清华大学 | Construction and management method for secondary indexes of key value library |
CN103678520A (en) * | 2013-11-29 | 2014-03-26 | 中国科学院计算技术研究所 | Multi-dimensional interval query method and system based on cloud computing |
CN103888547A (en) * | 2014-04-16 | 2014-06-25 | 中国银行股份有限公司 | Bill processing method and server |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109144995A (en) * | 2017-06-26 | 2019-01-04 | 辽宁艾特斯智能交通技术有限公司 | A kind of highway magnanimity transaction data search method |
CN107357915A (en) * | 2017-07-19 | 2017-11-17 | 郑州云海信息技术有限公司 | A kind of date storage method and system |
CN107894942A (en) * | 2017-12-04 | 2018-04-10 | 北京小度信息科技有限公司 | The monitoring method and device of tables of data visit capacity |
CN107894942B (en) * | 2017-12-04 | 2020-06-02 | 北京星选科技有限公司 | Method and device for monitoring data table access amount |
CN110287198A (en) * | 2019-07-01 | 2019-09-27 | 四川新网银行股份有限公司 | Finance data indexing means based on HBase database |
CN110347722A (en) * | 2019-07-11 | 2019-10-18 | 软通智慧科技有限公司 | Data acquisition method, device, equipment and storage medium based on HBase |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160518 |