CN106326429A - Hbase second-level query scheme based on solr - Google Patents

Publication number
CN106326429A
Authority
CN
China
Prior art keywords
solr
hbase
index
data
rowkey
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610723701.7A
Other languages
Chinese (zh)
Inventor
童浩
杨凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Optics Valley Information Technologies Co Ltd
Original Assignee
Wuhan Optics Valley Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Optics Valley Information Technologies Co Ltd
Priority to CN201610723701.7A
Publication of CN106326429A
Current legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an HBase second-level query scheme based on Solr. The scheme comprises the following steps: raw data is inserted into the HBase column-oriented database; a MapReduce job is invoked periodically to incrementally update the index in Solr, obtaining the raw data and storing it on the Solr server in Solr's own document format; at query time, the Solr server is accessed, where an index has been built on each field to be queried; the index is searched first, rowkeys are obtained from it, and the required result data is then queried from the HBase main table by rowkey. The scheme offers fast and accurate search: by combining Solr and HBase, massive data can be retrieved within seconds. Solr's built-in paging function returns the rowkeys of one page of data at a time; because each page holds only a small number of records, the subsequent HBase query by that page's rowkeys responds quickly and can be kept at the millisecond level.

Description

An HBase second-level query scheme based on Solr
Technical field
The present invention relates to the technical field of HBase, and in particular to an HBase second-level query scheme based on Solr ("second-level" here means that queries respond within seconds).
Background
Solr is a complete search service from Apache, built on Lucene. It has two core components: an indexing component and a search component. The indexing component builds indexes over the data that the search application needs indexed, while the search component queries those indexes in response to client requests. Solr is a high-performance full-text search server developed in Java 5 and based on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a full-featured administration interface, making it an excellent full-text search engine. Documents are added to a search collection as XML over HTTP. Queries against the collection are also made over HTTP and return an XML/JSON response. Its main features include efficient and flexible caching, vertical search, highlighting of search results, improved availability through index replication, a powerful data schema for defining fields and types and configuring text analysis, and a web-based administration interface.
HBase is the Hadoop family's distributed storage solution for massive data. When we query massive data stored in HBase by rowkey, responses at the second level can be achieved, which gives a reasonably good user experience. However, in more complex scenarios, for example when multi-condition queries on the data are needed, the solutions HBase provides are not ideal.
For multi-condition queries, HBase itself currently offers two mainstream solutions:
1. Manually building an index on the table via a coprocessor when data is inserted
HBase has two kinds of coprocessor: Observer and Endpoint. An Observer is similar to a trigger in a relational database, and an Endpoint is similar to a stored procedure in a relational database.
When we use a coprocessor to index a table, we use an Observer: when data is inserted into an HBase table, an Observer operation is added so that, before each record is inserted, our custom business logic is invoked to generate, in an index table, a record for the fields that need indexing.
Thus when we run a multi-condition query against HBase, the query is split into two steps: first, query the index table according to the query conditions to obtain the rowkeys of the matching results; second, query the main table by those rowkeys to obtain the data we need.
This scheme has several significant problems:
(1) Coprocessors are rather unstable
In current HBase versions, in our own tests of generating indexes through a coprocessor, once the code threw an exception during index building, the whole Hadoop cluster went down.
(2) Indexing slows down data insertion
Because inserting data and building the index form a synchronous process, the indexing operation significantly reduces insertion speed.
(3) The fields to be indexed must be determined before data is inserted and cannot be changed later
Another problem with indexing at insertion time is that all fields needing an index must be decided once and for all. If an index is later needed on a new field, data already inserted will not be re-indexed.
(4) One index table per indexed field is inefficient
For flexibility when the indexes are used later, an index table is typically built for each individual field, with the field value as the rowkey of the index table and the rowkey of the original table as a field of the index table. Although this makes flexible multi-condition queries convenient, it increases the number of index tables, and when a query has many conditions, multiple index-table lookups are needed, which noticeably slows the query response.
2. Filtering on the server side with HBase's built-in filters
HBase ships with many types of filter, and we can also define our own. When a filter is used in a query, HBase filters the query's result data on the server side of the cluster according to the filter's logic.
This scheme has a problem as well: a filter still needs to scan the data, so it is inefficient.
Although the filter runs on the server side, all data satisfying the rowkey query condition must still be retrieved and then scanned, so that data not satisfying the filter condition can be filtered out. When the original query covers a large volume of data, this process consumes a lot of server memory and the scan takes a long time; this step alone takes too long to meet the requirement of second-level queries.
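The trade-off between the two prior-art approaches can be illustrated with a minimal in-memory sketch. Plain Python dictionaries stand in for HBase tables here; the table contents, field names, and function names are all hypothetical, and no real HBase API is used.

```python
from collections import defaultdict

# Hypothetical main table: rowkey -> record (a stand-in for an HBase table).
main_table = {
    "row1": {"city": "Wuhan", "type": "A"},
    "row2": {"city": "Wuhan", "type": "B"},
    "row3": {"city": "Beijing", "type": "A"},
}

def build_index_tables(table, fields):
    """Approach 1: one index table per field, mapping field value -> main-table rowkeys."""
    index_tables = {f: defaultdict(set) for f in fields}
    for rowkey, record in table.items():
        for f in fields:
            index_tables[f][record[f]].add(rowkey)
    return index_tables

def query_by_index(table, index_tables, conditions):
    """Two-step query: one index-table lookup per condition, then fetch rows by rowkey."""
    rowkey_sets = [index_tables[f].get(v, set()) for f, v in conditions.items()]
    rowkeys = set.intersection(*rowkey_sets) if rowkey_sets else set(table)
    return {rk: table[rk] for rk in sorted(rowkeys)}

def query_by_filter(table, conditions):
    """Approach 2: scan every candidate row and drop the ones that fail the filter."""
    scanned = 0
    result = {}
    for rowkey, record in table.items():
        scanned += 1
        if all(record.get(f) == v for f, v in conditions.items()):
            result[rowkey] = record
    return result, scanned

index_tables = build_index_tables(main_table, ["city", "type"])
conditions = {"city": "Wuhan", "type": "A"}
by_index = query_by_index(main_table, index_tables, conditions)
by_filter, rows_scanned = query_by_filter(main_table, conditions)
```

Both calls return the same row, but the index version needs one lookup per condition (the cost noted in problem (4)), while the filter version touches every candidate row, which is the scanning cost described for filters.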
Since some characteristics of both schemes above cannot meet our needs, we propose an HBase second-level query scheme based on Solr.
Summary of the invention
The object of the present invention is to overcome the shortcomings of the prior art by proposing an HBase second-level query scheme based on Solr.
An HBase second-level query scheme based on Solr comprises the following steps:
Step 1: raw data is inserted into the HBase column-oriented database, keeping HBase's original mode of operation without any other change;
Step 2: the raw data is obtained and stored on the Solr server in Solr's own document format. After a document is created, Solr automatically analyzes it; once analysis is complete, Solr builds an inverted index with the segmented terms as keys and the documents as values, which forms the index. A MapReduce job is invoked periodically to incrementally update the index built in Solr;
Step 3: at query time, the Solr server is accessed, with a separate index built on each field to be queried; the index is searched and rowkeys are obtained from it, and the HBase column-oriented database is then queried by those rowkeys to produce the required result data.
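The three steps can be sketched end to end with in-memory stand-ins. This is only an illustration under stated assumptions: a dictionary plays the role of the HBase main table, the hypothetical `MiniIndex` class plays the role of the Solr server, and the regular-expression analyzer is a deliberately simple substitute for Solr's text analysis.

```python
import re
from collections import defaultdict

def analyze(text):
    """Simplistic analyzer: lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

class MiniIndex:
    """Stand-in for the Solr server: one inverted index per field, term -> rowkeys."""

    def __init__(self, fields):
        self.postings = {f: defaultdict(set) for f in fields}

    def add_document(self, rowkey, record):
        # Step 2: analyze each indexed field and record term -> rowkey postings.
        for field, postings in self.postings.items():
            for term in analyze(str(record.get(field, ""))):
                postings[term].add(rowkey)

    def search(self, conditions):
        """Return sorted rowkeys whose indexed fields contain every queried term."""
        matches = None
        for field, term in conditions.items():
            hits = self.postings[field].get(term.lower(), set())
            matches = set(hits) if matches is None else matches & hits
        return sorted(matches or [])

def run_query(store, index, conditions):
    """Step 3: search the index for rowkeys, then fetch the rows from the main table."""
    return [(rk, store[rk]) for rk in index.search(conditions)]

# Step 1: insert raw data into the main table (a dict standing in for HBase).
store = {
    "r1": {"title": "Optical fiber sensor", "city": "Wuhan"},
    "r2": {"title": "Fiber laser", "city": "Beijing"},
    "r3": {"title": "Fiber amplifier", "city": "Wuhan"},
}

# Step 2: build the per-field inverted index from the stored rows.
index = MiniIndex(["title", "city"])
for rowkey, record in store.items():
    index.add_document(rowkey, record)

# Step 3: multi-condition query resolved through the index, then by rowkey.
results = run_query(store, index, {"title": "fiber", "city": "wuhan"})
```

The main table is never scanned at query time: the only rows read from it are those whose rowkeys the index returned.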
Preferably, after Solr builds the index, it compresses the index and stores it on the disk of the Solr server, and may also use a Map to cache part of it.
Preferably, the Solr index allows the segmenter to be optimized: word segmentation can be customized for the business scenario so that industry-specific terms are extracted.
Preferably, Solr has a built-in paging function and can return the rowkeys of one page of data at a time.
Preferably, Solr can be combined with a mature in-memory database, with the index held directly in the in-memory database.
Preferably, the Solr index-building operation can also be executed in an HBase coprocessor.
The HBase second-level query scheme based on Solr proposed by the present invention offers fast and accurate search. By combining Solr and HBase, it achieves second-level retrieval of massive data. Solr's built-in paging function returns the rowkeys of one page of data at a time; because the number of records per page is very limited, the subsequent HBase query by that page's rowkeys responds very quickly and can be kept at the millisecond level.
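One of the preferred refinements above is customizing word segmentation so that industry-specific terms survive as single index terms. The text does not say which segmentation algorithm is used; the sketch below assumes dictionary-based forward maximum matching, a common technique for unsegmented text, and the function name and sample dictionaries are hypothetical.

```python
def segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest
    dictionary term that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# The unspaced string stands in for text without word boundaries.
general_dictionary = {"fiber", "cable"}
domain_dictionary = general_dictionary | {"fiberoptic"}  # adds an industry term

with_domain_term = segment("fiberopticcable", domain_dictionary)
without_domain_term = segment("fiberopticcable", general_dictionary)
```

With the industry term in the dictionary the text splits into `fiberoptic` and `cable`; without it, the unknown span falls apart into single characters, which would make the resulting index terms far less useful for search.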
Brief description of the drawings
Fig. 1 is a flow chart of data storage;
Fig. 2 is a flow chart of data query.
Detailed description
The present invention is further explained below with reference to a specific embodiment.
With reference to Figs. 1-2, the HBase second-level query scheme based on Solr proposed by the present invention comprises the following steps:
Step 1: raw data is inserted into the HBase column-oriented database, keeping HBase's original mode of operation without any other change;
Step 2: a MapReduce job is invoked periodically to incrementally update the index in Solr. The raw data inserted into the HBase column-oriented database is first obtained and stored on the Solr server in Solr's own document format. After a document is created, Solr automatically analyzes it; this involves segmenting the document's content into words with a specific word-segmentation technique. Once segmentation is complete, Solr builds an inverted index with the segmented terms as keys and the documents as values;
Step 3: at query time, the Solr server is accessed, with a separate index built on each field to be queried. After the index is built, Solr compresses it and stores it on the disk of the Solr server, and may also use a Map to cache part of it. The index is searched and rowkeys are obtained from it; Solr's built-in paging function returns the rowkeys of one page of data at a time, and the HBase column-oriented database is then queried by those rowkeys to produce the required result data.
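Step 2 relies on a periodic incremental update, but the text does not state how the increment is determined. The sketch below assumes a write-timestamp watermark: each run indexes only rows written after the previous run and returns the new watermark. The row layout, the whitespace tokenizer, and the function name are assumptions made for illustration; a real deployment would run this logic as a MapReduce job against HBase and Solr.

```python
def incremental_update(rows, index, last_indexed_ts):
    """Index only rows written after last_indexed_ts and return the new watermark.

    rows:  rowkey -> (write_timestamp, text), a stand-in for the HBase table
    index: term -> set of rowkeys, a stand-in for the Solr index
    """
    newest = last_indexed_ts
    for rowkey, (ts, text) in rows.items():
        if ts > last_indexed_ts:
            for term in text.lower().split():
                index.setdefault(term, set()).add(rowkey)
            newest = max(newest, ts)
    return newest

rows = {"r1": (100, "solr index"), "r2": (200, "hbase rowkey")}
index = {}

# First scheduled run indexes everything written so far.
watermark = incremental_update(rows, index, 0)

# A row arrives later; the next run picks up only that row.
rows["r3"] = (300, "solr paging")
watermark = incremental_update(rows, index, watermark)
```

Because indexing runs on its own schedule rather than inside the write path, inserts are not slowed by it, at the cost of new rows being unsearchable until the next run. This sketch only adds postings; handling rows that are overwritten or deleted would need extra bookkeeping.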
In the present invention, the Solr index-building operation can also be executed in an HBase coprocessor, and Solr can be combined with a mature in-memory database, with the index held directly in the in-memory database.
The HBase second-level query scheme based on Solr proposed by the present invention offers fast and accurate search. By combining Solr and HBase, it achieves second-level retrieval of massive data. Solr's built-in paging function returns the rowkeys of one page of data at a time; because the number of records per page is very limited, the subsequent HBase query by that page's rowkeys responds very quickly and can be kept at the millisecond level.
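The paging argument can be made concrete with a short sketch: however many rows match, only one page of rowkeys is handed to the main-table lookup. The offset and page-size arguments mirror the spirit of Solr's `start` and `rows` query parameters, but the functions and data here are hypothetical in-memory stand-ins.

```python
def search_page(matching_rowkeys, start, rows):
    """Return one page of rowkeys from the full, ordered match list."""
    return matching_rowkeys[start:start + rows]

def fetch_page(table, matching_rowkeys, page, page_size):
    """Look up only the rows on the requested page (zero-based page number)."""
    rowkeys = search_page(matching_rowkeys, page * page_size, page_size)
    return [(rk, table[rk]) for rk in rowkeys]

# 25 matching rows in a stand-in main table; each page holds at most 10.
table = {f"row{i:03d}": {"n": i} for i in range(25)}
matching = sorted(table)

first_page = fetch_page(table, matching, page=0, page_size=10)
last_page = fetch_page(table, matching, page=2, page_size=10)
```

Each call reads at most `page_size` rows from the main table regardless of the total number of matches, which is why the second lookup step can stay fast.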
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (6)

1. An HBase second-level query scheme based on Solr, characterized in that it comprises the following steps:
Step 1: raw data is inserted into the HBase column-oriented database, keeping HBase's original mode of operation without any other change;
Step 2: the raw data is obtained and stored on the Solr server in Solr's own document format; after a document is created, Solr automatically analyzes it; once analysis is complete, Solr builds an inverted index with the segmented terms as keys and the documents as values, which forms the index; a MapReduce job is invoked periodically to incrementally update the index built in Solr;
Step 3: at query time, the Solr server is accessed, with a separate index built on each field to be queried; the index is searched and rowkeys are obtained from it, and the HBase column-oriented database is then queried by those rowkeys to produce the required result data.
2. The HBase second-level query scheme based on Solr according to claim 1, characterized in that after Solr builds the index, it compresses the index and stores it on the disk of the Solr server, and may also use a Map to cache part of it.
3. The HBase second-level query scheme based on Solr according to claim 1, characterized in that the Solr index allows the segmenter to be optimized, with word segmentation customized for the business scenario so that industry-specific terms are extracted.
4. The HBase second-level query scheme based on Solr according to claim 1, characterized in that Solr has a built-in paging function and can return the rowkeys of one page of data at a time.
5. The HBase second-level query scheme based on Solr according to claim 1, characterized in that Solr can be combined with a mature in-memory database, with the index held directly in the in-memory database.
6. The HBase second-level query scheme based on Solr according to claim 1, characterized in that the Solr index-building operation can also be executed in an HBase coprocessor.
CN201610723701.7A 2016-08-25 2016-08-25 Hbase second-level query scheme based on solr Pending CN106326429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610723701.7A CN106326429A (en) 2016-08-25 2016-08-25 Hbase second-level query scheme based on solr

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610723701.7A CN106326429A (en) 2016-08-25 2016-08-25 Hbase second-level query scheme based on solr

Publications (1)

Publication Number Publication Date
CN106326429A true CN106326429A (en) 2017-01-11

Family

ID=57791438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610723701.7A Pending CN106326429A (en) 2016-08-25 2016-08-25 Hbase second-level query scheme based on solr

Country Status (1)

Country Link
CN (1) CN106326429A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
KR20140012377A (en) * 2012-07-20 2014-02-03 유넷시스템주식회사 Method of forming index file, method of searching data and system for managing data using dictionary index file, recoding medium
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104834688A (en) * 2015-04-20 2015-08-12 北京奇艺世纪科技有限公司 Secondary index establishment method and device
CN105138592A (en) * 2015-07-31 2015-12-09 武汉虹信技术服务有限责任公司 Distributed framework-based log data storing and retrieving method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
施磊磊: "基于Hadoop 和HBase 的分布式索引模型的研究", 《信息技术》 *
魏勇等: "基于GeoNames和Solr的地名数据全文检索", 《测绘工程》 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909671A (en) * 2017-02-28 2017-06-30 湖南蚁坊软件股份有限公司 A kind of method and system of NoSQL databases condition query
WO2018209574A1 (en) * 2017-05-16 2018-11-22 深圳中兴力维技术有限公司 Alarm data query method and apparatus
CN107239517A (en) * 2017-05-23 2017-10-10 中国联合网络通信集团有限公司 Many condition searching method and device based on Hbase databases
CN107239517B (en) * 2017-05-23 2020-09-29 中国联合网络通信集团有限公司 Multi-condition searching method and device based on Hbase database
CN109144995A (en) * 2017-06-26 2019-01-04 辽宁艾特斯智能交通技术有限公司 A kind of highway magnanimity transaction data search method
CN107656985A (en) * 2017-09-11 2018-02-02 北京京东尚科信息技术有限公司 Web page interrogation method and its system
CN110109870A (en) * 2018-01-24 2019-08-09 江苏友上科技实业有限公司 A kind of mass data quick retrieval system based on Solr
CN108573063A (en) * 2018-04-27 2018-09-25 宁波银行股份有限公司 A kind of data query method and system
CN109471893A (en) * 2018-10-24 2019-03-15 上海连尚网络科技有限公司 Querying method, equipment and the computer readable storage medium of network data
CN109299143A (en) * 2018-11-28 2019-02-01 重庆邮电大学 The knowledge fast indexing method in the data interoperation knowledge on testing library based on Redis caching
CN109299143B (en) * 2018-11-28 2022-03-22 重庆邮电大学 Knowledge fast indexing method of data interoperation test knowledge base based on Redis cache
CN109697200A (en) * 2018-12-18 2019-04-30 厦门商集网络科技有限责任公司 A kind of HBase secondary index method and apparatus based on Solr
CN110232106A (en) * 2019-04-26 2019-09-13 安徽四创电子股份有限公司 A kind of mass data storage and method for quickly retrieving based on MongoDB and Solr
CN110347722A (en) * 2019-07-11 2019-10-18 软通智慧科技有限公司 Data capture method, device, equipment and storage medium based on HBase
CN111078731A (en) * 2019-11-25 2020-04-28 国网冀北电力有限公司 Hbase-based power grid operation data collaborative query method and device and storage medium
CN111488379A (en) * 2020-04-17 2020-08-04 焦点科技股份有限公司 Method for optimizing Hbase large data query
CN111488379B (en) * 2020-04-17 2022-07-19 焦点科技股份有限公司 Method for optimizing Hbase large data query
CN112506915A (en) * 2020-10-27 2021-03-16 百果园技术(新加坡)有限公司 Application data management system, processing method and device and server
CN112506915B (en) * 2020-10-27 2024-05-10 百果园技术(新加坡)有限公司 Application data management system, processing method and device and server
CN112463832A (en) * 2020-11-27 2021-03-09 苏州浪潮智能科技有限公司 Inquiry method and device based on hbase-indexer and electronic equipment
CN112463832B (en) * 2020-11-27 2022-10-25 苏州浪潮智能科技有限公司 Inquiry method and device based on hbase-indexer and electronic equipment
CN113297273A (en) * 2021-06-09 2021-08-24 北京百度网讯科技有限公司 Method and device for querying metadata and electronic equipment
CN113297273B (en) * 2021-06-09 2024-03-01 北京百度网讯科技有限公司 Method and device for inquiring metadata and electronic equipment
CN113407785A (en) * 2021-06-11 2021-09-17 西北工业大学 Data processing method and system based on distributed storage system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170111

RJ01 Rejection of invention patent application after publication