CN105589910A - HBase (Hadoop Database)-based mass transaction data retrieving method and system - Google Patents


Info

Publication number
CN105589910A
Authority
CN
China
Prior art keywords
hbase
row key
transaction data
query request
region
Prior art date: 2014-12-31
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410850869.5A
Other languages
Chinese (zh)
Inventor
邱泽铭
戚跃民
黄明雄
陈根
覃非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2014-12-31
Filing date: 2014-12-31
Publication date: 2016-05-18
Application filed by China Unionpay Co Ltd
Priority to CN201410850869.5A
Publication of CN105589910A
Legal status: Pending (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an HBase (Hadoop Database)-based method for retrieving mass transaction data. The method comprises the following steps: a. receiving a query request entered by a user through a query interface; b. sending the received query request to an HBase master node; c. the HBase master node sending the query request to the region server of the region that corresponds to the range represented by the start key and end key in the query request; and d. the region server completing the query over the transaction data based on the query request. The invention also provides a corresponding system.

Description

HBase-based mass transaction data retrieval method and system
Technical field
The present invention relates to data retrieval technology and, more specifically, to HBase-based retrieval of mass transaction data.
Background
In the daily operation of a bank card association, problems can arise with certain transactions. The corresponding transaction message information must then be retrieved within seconds so that the cause of the problem can be analyzed and investigated.
Summary of the invention
In view of this, the invention provides an HBase-based mass transaction data retrieval method. The method comprises: a. receiving a query request entered by a user through a query interface; b. sending the received query request to an HBase master node; c. the HBase master node sending the query request to the region server of the region that corresponds to the range represented by the start key and the end key in the query request; d. the region server completing the query over the transaction data based on the query request.
In an illustrative form of this method, the row key, which governs how the regions are structured, is set according to a predetermined format. The two highest-order bytes of the row key form a date field representing the transaction date, and this transaction date is the UTC date taken modulo 31.
Illustratively, the query-condition field of the row key immediately follows the hour field, and the hour field immediately follows the date field.
Illustratively, the method further comprises building a secondary index table that records the mapping between the index value of each transaction message and its row key; this mapping is one-to-one. Step d then comprises: d1. based on the query request, the region server obtaining the row key of the data to be queried in the region from the mapping between each transaction message's index value and its row key in the secondary index table; d2. completing the query over the transaction data based on that row key.
The present invention also provides an HBase-based mass transaction data retrieval system. The system comprises: a query interface through which a user enters a query request; a receiving module, arranged at the HBase master node, for receiving the query request entered by the user; a region determination module, arranged at the HBase master node, for sending the query request to the region server of the region that corresponds to the range represented by the start key and the end key of the query request; and multiple retrieval modules, one arranged in each region server, for completing the query over the transaction data based on the query request received by that region server.
Illustratively, the system further comprises a setting module for setting the row key, which determines the region size, according to a predetermined format. The two highest-order bytes of the row key form a date field representing the transaction date, and this transaction date is the UTC date taken modulo 31. The query-condition field of the row key immediately follows the hour field, and the hour field immediately follows the date field.
Illustratively, the system further comprises an index table building unit for building a secondary index table that records the one-to-one mapping between the index value of each transaction message and its row key. The retrieval modules are configured so that, based on the query request, the region server obtains the row key of the data to be queried in the region from that mapping in the secondary index table. It then completes the query over the transaction data based on that row key.
Brief description of the drawings
Fig. 1 is a flow chart of an example HBase-based mass transaction data retrieval method according to the invention.
Fig. 2 is a schematic diagram of an example HBase cluster environment according to the invention.
Fig. 3 shows the relationship among the files, the HBase secondary index table and the HBase message log table.
Fig. 4 is a structural diagram of an example HBase-based mass transaction data retrieval system according to the invention.
Detailed description of the invention
Illustrative examples of the invention are now described with reference to the drawings, in which like reference numerals denote like elements. The embodiments described below help those skilled in the art understand the invention thoroughly; they are illustrative rather than limiting. Unless otherwise defined, the terms used herein (including scientific, technical and industry terms) have the meanings commonly understood by those skilled in the art to which the invention belongs.
HBase, short for Hadoop Database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large-scale structured storage clusters can be built on inexpensive PC servers. HBase originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Introductions to HBase are widely available, for example at http://baike.baidu.com/link url=ZZH4mj7uoS2hd77AxZDszb0TW1cqt5u0_cmFOoraNCPjg1s-IPAa GxlDyDtca0pXbfDUfhPx45zxQBp1O_J54q.
Fig. 1 is a flow chart of an example HBase-based mass transaction data retrieval method according to the invention. In step 10, a query request entered by the user through the query interface is received. The received query request is sent to the HBase master node (step 12). The HBase master node then sends the query request to the region server (regionserver) of the region that corresponds to the range represented by the start key (startkey) and the end key (endkey) in the query request (step 14). The data is divided into multiple regions; the non-limiting example illustrated below has 30 regions. Each region server manages one or more regions. For example, if each region server manages two regions, the 30 regions are served by 15 region servers. The region server then completes the query over the transaction data based on the query request (step 16), and the retrieved transaction data is returned to the user.
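The routing of steps 12 to 16 can be illustrated with a minimal in-memory sketch. It makes the following assumptions:

- 30 regions, cut at hypothetical two-character split keys "01" to "29";
- two consecutive regions per hypothetical region server (`rs1` to `rs15`);
- inclusive start keys and exclusive end keys.

The sketch models only the range-to-region lookup that the text assigns to the master node. In stock HBase, clients normally locate regions through the `hbase:meta` catalog table rather than routing reads through the master.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: str  # inclusive lower bound ("" = open start)
    end_key: str    # exclusive upper bound ("" = open end)
    server: str     # hypothetical region server name


def build_regions(split_keys, regions_per_server=2):
    """Cut the key space at sorted split keys; assign consecutive regions to servers."""
    bounds = [""] + list(split_keys) + [""]
    return [
        Region(bounds[i], bounds[i + 1], f"rs{i // regions_per_server + 1}")
        for i in range(len(bounds) - 1)
    ]


def route(regions, start_key, end_key):
    """Return the regions overlapping [start_key, end_key); end_key "" means unbounded."""
    first = bisect_right([r.start_key for r in regions], start_key) - 1
    hits = []
    for region in regions[first:]:
        if end_key and region.start_key >= end_key:
            break  # this and all later regions lie beyond the requested range
        hits.append(region)
    return hits


def dispatch(regions, start_key, end_key):
    """Group the overlapping regions by the server that manages them (step 14)."""
    plan = {}
    for region in route(regions, start_key, end_key):
        plan.setdefault(region.server, []).append(region)
    return plan


# Hypothetical layout: 29 split keys -> 30 regions served by 15 region servers.
SPLIT_KEYS = [f"{i:02d}" for i in range(1, 30)]
REGIONS = build_regions(SPLIT_KEYS)
```

With this layout, `dispatch(REGIONS, "0612", "0613")` returns a plan containing only `rs4`, which manages the seventh region `["06", "07")`.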
Fig. 2 is a schematic diagram of an example HBase cluster environment according to the invention. As shown, the user enters a query request through query interface 20, and the query request is sent to HBase master node 22. As described for step 14 of Fig. 1, HBase master node 22 sends the query request to the region server of the region that corresponds to the range represented by the request's start key and end key. In this example, that range, i.e. the interval in which the data to be queried is stored, lies in the seventh region. HBase master node 22 therefore sends the query request to region server 242, which manages the seventh region. After receiving the query request, the region server searches the transaction messages stored in the seventh region for the requested data and returns the data found to the user.
According to an example of the invention, the row key associated with the construction of the regions is set according to a predetermined format. "Associated with the construction of the regions" reflects the fact that, by HBase's underlying design, the row key (rowkey) determines the size of the regions, the write speed and so on. The two highest-order bytes of the row key form a date field representing the transaction date. The transaction date written into this field is the UTC date taken modulo 31. A conventional date representation occupies 8 bytes, so this date field saves considerable storage space. The query-condition field of the row key immediately follows the hour field, which in turn immediately follows the date field. Table 1 shows an example row key format according to the invention:
Table 1
Here, F100 denotes the receiving institution of the message, F33 the sending institution, and F11 the system trace number.
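A minimal sketch of the Table 1 row key follows. None of these details is given in the text; the sketch assumes that:

- "UTC date modulo 31" means the number of days since 1970-01-01 (UTC), modulo 31;
- the fields after the hour are F100, F33 and F11, in that order;
- these fields are zero-padded to hypothetical widths of 11, 11 and 6 characters.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical fixed widths; the source does not give field lengths.
F100_WIDTH, F33_WIDTH, F11_WIDTH = 11, 11, 6


def date_field(ts: datetime) -> str:
    """Two-character date field: UTC day number since 1970-01-01, modulo 31 (assumed reading)."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    days = (ts.astimezone(timezone.utc) - EPOCH).days
    return f"{days % 31:02d}"


def build_row_key(ts: datetime, f100: str, f33: str, f11: str) -> str:
    """date(2) + hour(2) + F100 + F33 + F11; the field order after F100 is an assumption."""
    date = date_field(ts)  # also validates that ts is timezone-aware
    hour = ts.astimezone(timezone.utc).hour
    return (
        date
        + f"{hour:02d}"
        + f100.rjust(F100_WIDTH, "0")  # most frequent query condition right after the hour
        + f33.rjust(F33_WIDTH, "0")
        + f11.rjust(F11_WIDTH, "0")
    )
```

For 2014-12-31 08:30 UTC the key starts with `0508`. That date is day 16435 since the epoch, and 16435 mod 31 = 5; the hour 08 follows. Under this reading the 8-character `yyyymmdd` date becomes 2 characters, and each two-character value recurs every 31 days.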
With the row key set as described above, the date portion shrinks from the conventional 8 characters (yyyymmdd) to 2. In addition, the most frequently used query condition, the F100 field, is placed immediately after the hour field. This lets the system retrieve messages quickly by date and F100 without any additional assignment. Under the region plan, with 350 GB of data per day and 4 GB per region, there are 29 to 30 regions in total. When tens of thousands of messages per day are parsed and loaded in a distributed manner, the hour of each message is random. Given how row keys are distributed, writes can therefore be spread over regions chosen at random from many regions, achieving the desired write speed.
According to an example of the invention, a secondary index table is also provided to record the mapping between the index value of each transaction message and its row key; this mapping is one-to-one. The secondary index table is built, for example, for queries by sender (F33). It records the index value of every message, i.e. the index table row key, as shown in Table 2:
Table 2
Here the transaction date is the same as in Table 1, i.e. the UTC date taken modulo 31.
The row key of the message log table is as shown in Table 1 above. According to an example of the invention, the secondary index table is built by reading raw data in text form, generating index data through a MapReduce process, and then writing the index data into the index table.
Fig. 3 shows the relationship among the files, the HBase secondary index table and the HBase message log table. In Fig. 3, raw data in text form is read from file 1, file 2, ..., file n. Index data is produced from it by MapReduce and written into HBase secondary index table 30. In secondary index table 30, the index value established for each message corresponds one-to-one to a row key in HBase message log table 32. The HBase message log table 32 shown in Fig. 3 corresponds to the multiple regions shown in Fig. 2. Because of HBase's inherent properties, the search conditions available when retrieving directly by rowkey are limited. Introducing the secondary index table extends the available index conditions and can even support full-text search.
When a secondary index table is introduced, step 16 of Fig. 1 is refined as follows. Based on the query request, the region server obtains the row key of the data to be queried in the region from the mapping between each transaction message's index value and its row key in the secondary index table. It then completes the query over the transaction data based on that row key. Referring to Fig. 3, the message log table rowkey is first determined from the query request, via the correspondence between index table rowkeys and message log table rowkeys in secondary index table 30. HBase message log table 32 is then searched (in other words, the corresponding region shown in Fig. 2 is searched).
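Steps d1 and d2 amount to two point lookups. In this sketch, plain dictionaries stand in for the secondary index table and for the message log table split into 30 regions. In HBase, each lookup would be a `Get` on the corresponding table. The split keys and the index key used in the usage example are hypothetical.

```python
from bisect import bisect_right

# Hypothetical split keys for the 30-region example: region i covers [SPLIT_KEYS[i-1], SPLIT_KEYS[i]).
SPLIT_KEYS = [f"{i:02d}" for i in range(1, 30)]


def region_of(row_key: str, split_keys=SPLIT_KEYS) -> int:
    """0-based index of the region whose key range contains row_key."""
    return bisect_right(split_keys, row_key)


def indexed_lookup(index_table: dict, log_regions: list, index_key: str):
    """Step d1: index row key -> message log row key; step d2: fetch the row from its region."""
    log_key = index_table.get(index_key)
    if log_key is None:
        return None  # no message matches this index value
    return log_regions[region_of(log_key)].get(log_key)
```

Because the message log row key itself identifies the region, the second lookup touches exactly one region.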
When querying with the method of Fig. 1, the query conditions are typically time, receiving institution, sending institution and so on. On receiving a query, the time range can be determined immediately from the startkey and endkey of the query condition. For data written with the rowkey scheme of the invention, the date and hour occupy the first 4 bytes of each record, so the region can be located quickly. Moreover, the receiving institution immediately follows the time fields. A query that includes the commonly used receiving-institution condition can therefore be narrowed quickly to that institution as well.
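Putting date and hour in the leading 4 bytes, followed by F100, determines the scan ranges a query can use. The sketch below assumes the Table 1 layout date(2) + hour(2) + F100, with F100 zero-padded to a hypothetical 11 characters. It also assumes that hour spans stay within one day. A date and hour span alone maps to one contiguous range. Adding F100 yields one tight prefix range per hour, because F100 comes after the hour.

```python
def prefix_stop(prefix: str) -> str:
    """Smallest key above every key that starts with `prefix`.

    Assumes ASCII keys whose last character is below U+007F (true for the digit keys used here).
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def scan_ranges(date: str, hour_from: int, hour_to: int, f100=None):
    """[start, stop) row-key ranges for one date, an hour span and optionally one F100 value.

    Assumes date(2) + hour(2) + F100 (11 chars, zero-padded; width hypothetical) and that the
    hour span does not cross midnight.
    """
    if f100 is None:
        # Date and hour are the leading 4 bytes, so the whole span is one contiguous range.
        return [(f"{date}{hour_from:02d}", prefix_stop(f"{date}{hour_to:02d}"))]
    # With F100 after the hour, each hour contributes its own contiguous prefix range.
    ranges = []
    for hour in range(hour_from, hour_to + 1):
        prefix = f"{date}{hour:02d}{f100.rjust(11, '0')}"
        ranges.append((prefix, prefix_stop(prefix)))
    return ranges


def in_ranges(row_key: str, ranges) -> bool:
    """True if row_key falls in any [start, stop) range."""
    return any(start <= row_key < stop for start, stop in ranges)
```

In HBase these pairs would serve as the start and stop rows of `Scan` objects, whose stop row is exclusive.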
Once the secondary index table is introduced, the query conditions can be extended, for example to a word or keyword contained in the raw data. Suppose the query in this example specifies the date of the message, the sender, and a keyword A appearing in the message. After this query is received, steps 10, 12 and 14 of Fig. 1 are performed first. Then, using the secondary index table, the query containing keyword A is used to find the rowkey derived from the original file that contains keyword A, i.e. the index table rowkey. The message log table rowkey is then looked up from this index table rowkey, and the query is executed in the corresponding region.
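One way to support keyword conditions on top of the secondary index is a small inverted index from words to index row keys, followed by the index-to-row-key mapping. This is a sketch of that idea, not the method the text specifies. The tokenizer is a naive `\w+` regular expression. It would treat a run of Chinese characters as one token, so real message text would need a proper segmenter. The record keys in the test data are hypothetical. The optional prefix filter plays the role of the date and sender conditions.

```python
import re
from collections import defaultdict


def build_keyword_index(raw_records: dict) -> dict:
    """Inverted index: lower-cased word -> set of index row keys.

    raw_records maps an index row key to the raw message text (hypothetical input).
    """
    inverted = defaultdict(set)
    for index_key, text in raw_records.items():
        for word in set(re.findall(r"\w+", text.lower())):
            inverted[word].add(index_key)
    return inverted


def keyword_query(inverted, index_table, keyword, index_prefix=""):
    """Keyword -> index row keys (filtered by a date/sender prefix) -> message log row keys."""
    matches = sorted(
        key for key in inverted.get(keyword.lower(), ()) if key.startswith(index_prefix)
    )
    return [index_table[key] for key in matches]
```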
Fig. 4 is a structural diagram of an example HBase-based mass transaction data retrieval system according to the invention. As shown, the system comprises: query interface 40, through which the user enters a query request; receiving module 42, arranged at the HBase master node, for receiving the query request entered by the user; region determination module 44, arranged at the HBase master node, for sending the query request to the region server of the region that corresponds to the range represented by the request's start key and end key; and multiple retrieval modules 46, one arranged in each region server, for completing the query over the transaction data based on the query request received by that region server.
The user enters a query request through query interface 40, and receiving module 42 on the HBase master node receives it. Region determination module 44, also on the HBase master node, determines the region that corresponds to the range represented by the start key (startkey) and the end key (endkey) in the query request. It then sends the query request to the region server (regionserver) of that region. As described above, the data is divided into multiple regions; the non-limiting example illustrated here has 30 regions. Each region server manages one or more regions. For example, if each region server manages two regions, the 30 regions are served by 15 region servers. Retrieval module 46 on the determined server completes the query over the transaction data based on the query request. (Every server is equipped with a retrieval module, all denoted 46 here; in this context, retrieval module 46 refers only to the one on the determined server.) The retrieval results of retrieval module 46 may further be sent via that server to, for example, the electronic device where query interface 40 is located, so that the user can see them.
In the example system of Fig. 4, the system further comprises a setting module (not shown) for setting the row key, which determines the region size, according to a predetermined format. The two highest-order bytes of the row key form a date field representing the transaction date, and this transaction date is the UTC date taken modulo 31. The query-condition field of the row key immediately follows the hour field, which immediately follows the date field. An example row key (rowkey) layout is shown in Table 1 and is not repeated here.
Optionally, the example system of Fig. 4 may also comprise an index table building unit (not shown) for building a secondary index table. This table records the one-to-one mapping between the index value of each transaction message and its row key. The retrieval modules are configured so that, based on the query request, the region server obtains the row key of the data to be queried in the region from that mapping in the secondary index table. It then completes the query over the transaction data based on that row key. The secondary index table is built, for example, for queries by sender (F33). It records the index value of every message, i.e. the index table row key, as shown in Table 2. The row key of the message log table is as shown in Table 1 above. According to an example of the invention, the secondary index table is built by reading raw data in text form, generating index data through a MapReduce process, and then writing the index data into the index table.
When the index table building unit is included, i.e. when a secondary index table has been built, retrieval module 46 in Fig. 4 is configured as follows. Based on the query request, the region server obtains the row key of the data to be queried in the region from the mapping between each transaction message's index value and its row key in the secondary index table. It then completes the query over the transaction data based on that row key. Referring to Fig. 3, the message log table rowkey is first determined from the query request, via the correspondence between index table rowkeys and message log table rowkeys in secondary index table 30. HBase message log table 32 is then searched (in other words, the corresponding region shown in Fig. 2 is searched).
The method of the invention can be implemented in software, in hardware, or in a combination of software and hardware. Likewise, the HBase-based mass transaction data retrieval system of the invention can be embodied in software, hardware, or a combination of the two.
If the trading volume grows far beyond expectations, storage space can be expanded by adding region servers, i.e. by scaling out the HBase cluster with additional nodes (hosts). Even then, retrieval remains fast, because the scheme of the invention dispatches retrieval for each query request from the HBase master node to the region server of the corresponding region. Moreover, the rowkey design makes matching during retrieval faster, which further speeds up queries. When a secondary index table is provided, the query conditions can also be extended.

Claims (7)

1. An HBase-based mass transaction data retrieval method, characterized in that the method comprises:
a. receiving a query request entered by a user through a query interface;
b. sending the received query request to an HBase master node;
c. the HBase master node sending the query request to the region server of the region that corresponds to the range represented by the start key and the end key in the query request;
d. the region server completing the query over the transaction data based on the query request.
2. The HBase-based mass transaction data retrieval method according to claim 1, characterized in that the row key associated with the structure of the regions is set according to a predetermined format, wherein the two highest-order bytes of the row key form a date field representing the transaction date, and the transaction date is the UTC date taken modulo 31.
3. The HBase-based mass transaction data retrieval method according to claim 2, characterized in that the query-condition field of the row key immediately follows the hour field, and the hour field immediately follows the date field.
4. The HBase-based mass transaction data retrieval method according to claim 3, characterized by further comprising building a secondary index table for recording the mapping between the index value of each transaction message and the row key, the mapping being one-to-one, wherein step d comprises:
d1. based on the query request, the region server obtaining the row key of the data to be queried in the region from the mapping between each transaction message's index value and the row key in the secondary index table;
d2. completing the query over the transaction data based on that row key.
5. An HBase-based mass transaction data retrieval system, characterized in that the system comprises:
a query interface, through which a user enters a query request;
a receiving module, arranged at an HBase master node, for receiving the query request entered by the user;
a region determination module, arranged at the HBase master node, for sending the query request to the region server of the region that corresponds to the range represented by the start key and the end key of the query request;
multiple retrieval modules, each arranged in a respective region server, for completing the query over the transaction data based on the query request received by that region server.
6. The HBase-based mass transaction data retrieval system according to claim 5, characterized in that the system further comprises a setting module for setting the row key, which determines the region size, according to a predetermined format, wherein the two highest-order bytes of the row key form a date field representing the transaction date, the transaction date being the UTC date taken modulo 31; the query-condition field of the row key immediately follows the hour field, and the hour field immediately follows the date field.
7. The HBase-based mass transaction data retrieval system according to claim 6, characterized by further comprising an index table building unit for building a secondary index table that records the mapping between the index value of each transaction message and the row key, the mapping being one-to-one, wherein the retrieval modules are configured so that, based on the query request, the region server obtains the row key of the data to be queried in the region from that mapping in the secondary index table, and completes the query over the transaction data based on that row key.
CN201410850869.5A 2014-12-31 2014-12-31 HBase (Hadoop Database)-based mass transaction data retrieving method and system Pending CN105589910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410850869.5A CN105589910A (en) 2014-12-31 2014-12-31 HBase (Hadoop Database)-based mass transaction data retrieving method and system

Publications (1)

Publication Number Publication Date
CN105589910A 2016-05-18

Family

ID=55929492

Country Status (1)

Country Link
CN (1) CN105589910A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727465A (en) * 2008-11-03 2010-06-09 中国移动通信集团公司 Methods for establishing and inquiring index of distributed column storage database, device and system thereof
CN102750356A (en) * 2012-06-11 2012-10-24 清华大学 Construction and management method for secondary indexes of key value library
CN103678520A (en) * 2013-11-29 2014-03-26 中国科学院计算技术研究所 Multi-dimensional interval query method and system based on cloud computing
CN103888547A (en) * 2014-04-16 2014-06-25 中国银行股份有限公司 Bill processing method and server

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144995A (en) * 2017-06-26 2019-01-04 辽宁艾特斯智能交通技术有限公司 A kind of highway magnanimity transaction data search method
CN107357915A (en) * 2017-07-19 2017-11-17 郑州云海信息技术有限公司 A kind of date storage method and system
CN107894942A (en) * 2017-12-04 2018-04-10 北京小度信息科技有限公司 The monitoring method and device of tables of data visit capacity
CN107894942B (en) * 2017-12-04 2020-06-02 北京星选科技有限公司 Method and device for monitoring data table access amount
CN110287198A (en) * 2019-07-01 2019-09-27 四川新网银行股份有限公司 Finance data indexing means based on HBase database
CN110347722A (en) * 2019-07-11 2019-10-18 软通智慧科技有限公司 Data acquisition method, device, equipment and storage medium based on HBase

Similar Documents

Publication Publication Date Title
CN103997507B (en) A kind of method for pushing and device of information
US9165085B2 (en) System and method for publishing aggregated content on mobile devices
CN105589910A (en) HBase (Hadoop Database)-based mass transaction data retrieving method and system
CN101727465B (en) Methods for establishing and inquiring index of distributed column storage database, device and system thereof
CN101276361B (en) Method and system for displaying related key words
CN104516979B (en) A kind of data query method and system based on quadratic search
WO2015172490A1 (en) Method and apparatus for providing extended search item
CN103258036A (en) Distributed real-time search engine based on p2p
CN103544261A (en) Method and device for managing global indexes of mass structured log data
CN101599886B (en) Query method, system and device in distributed structured network
JPWO2014109009A1 (en) Database management method, management computer and storage medium
CN103678491A (en) Method based on Hadoop small file optimization and reverse index establishment
CN104408044A (en) File access method and system
CN104090901A (en) Method, device and server for processing data
WO2022083436A1 (en) Data processing method and apparatus, and device and readable storage medium
CN103823846A (en) Method for storing and querying big data on basis of graph theories
CN103942344A (en) File preview method and file processing system
CN105303501A (en) Community information service system and method based on picture recommendation
CN103605778A (en) Method, device and system for locating video file
CN108319608A (en) The method, apparatus and system of access log storage inquiry
CN107636655B (en) System and method for providing data as a service (DaaS) in real time
CN104050149A (en) Contact information recognition system for external textual data displayed by in-vehicle infotainment systems
CN112052219A (en) File storage and retrieval method and device, electronic equipment and readable storage medium
CN109144951A (en) A kind of catalogue update method and meta data server based on distributed file system
CN104408084A (en) Method and device for screening big data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160518)