CN104503985A - Method for automatically creating Solr index file by Hbase data - Google Patents

Method for automatically creating Solr index file by Hbase data

Publication number
CN104503985A
Authority
CN
China
Prior art keywords
hbase
hive
solr
index
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410721633.1A
Other languages
Chinese (zh)
Inventor
金洪殿
赵仁明
辛国茂
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201410721633.1A
Publication of CN104503985A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9014: Indexing; Data structures therefor; Storage structures hash tables

Abstract

The invention provides a method for automatically creating a Solr index file from HBase data and belongs to the field of big data. Using an approach based on Solr + HBase + Hive, an index over the data in HBase can be created automatically through configuration alone. A Hive external table is created and associated with an HBase table, so that the data in HBase can be accessed through Hive. The DIH (DataImportHandler) component provided by Solr then reads the HBase data associated with the Hive external table through the JDBC interface provided by Hive. DIH's automatic index creation is thereby used to create the index for the HBase data automatically.

Description

A method for automatically creating a Solr index file from HBase data
Technical field
The present invention relates to the field of big data, and in particular to a method for automatically creating a Solr index file from HBase data.
Background art
Big data is a term commonly used for the large volumes of unstructured and semi-structured data that a company creates; loading such data into a relational database for analysis costs too much time and money. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets needs frameworks such as MapReduce and HBase to distribute the work across tens, hundreds, or even thousands of computers. Compared with traditional data warehouse applications, big data analysis is characterized by large data volumes and complex query analysis. Big data requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies suited to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
Solr is a standalone enterprise-level search application server that exposes an API similar to a web service. Users can submit XML files in a prescribed format to the search server over HTTP to generate an index, and can issue search requests with HTTP GET operations and receive the results in XML or JSON format.
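The two interactions described above (submitting an XML document for indexing and searching with HTTP GET) can be sketched as follows. This is a minimal illustration rather than part of the disclosed method: the base URL, the core name hbase_core, and the field names are assumptions chosen for the example, and the code only builds the request body and URL without contacting a server.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_add_xml(doc):
    """Serialize one document as a Solr XML update message (<add><doc>...)."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(base_url, core, query, wt="json", rows=10):
    """Build an HTTP GET search URL for the core's /select handler."""
    params = urlencode({"q": query, "wt": wt, "rows": rows})
    return f"{base_url}/{core}/select?{params}"


# Hypothetical core name and fields, for illustration only.
update_body = build_add_xml({"id": "row-001", "name": "alice"})
search_url = build_select_url(
    "http://localhost:8983/solr", "hbase_core", "name:alice"
)
```

The XML body would be sent with an HTTP POST to the core's update handler; the wt parameter selects the XML or JSON response format mentioned in the text.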
HBase is a distributed, column-oriented open-source database. The technology originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" written by Fay Chang and colleagues. HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. While HBase supports highly concurrent read and write operations, it also has some significant shortcomings. Because HBase sorts only by rowkey, it cannot provide fast lookup and retrieval on fields other than the rowkey. Nor can HBase provide query-based paginated display or page-by-page querying.
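The rowkey limitation can be made concrete with a small in-memory model. The sketch below is a toy illustration, not HBase or Solr code: the table contents and all function names are hypothetical. It contrasts a lookup by rowkey (binary search over sorted keys), a lookup by another field (a scan of every row), and the same field lookup served from a separate index of the kind an external search engine maintains, which also makes paging straightforward.

```python
from bisect import bisect_left

# Toy table kept sorted by rowkey, the only ordering HBase maintains.
rows = sorted(
    [
        ("user#003", {"name": "carol", "city": "Jinan"}),
        ("user#001", {"name": "alice", "city": "Jinan"}),
        ("user#004", {"name": "dave", "city": "Jinan"}),
        ("user#002", {"name": "bob", "city": "Beijing"}),
    ],
    key=lambda row: row[0],
)
rowkeys = [key for key, _ in rows]


def get_by_rowkey(key):
    """Fast path: binary search over the sorted rowkeys."""
    i = bisect_left(rowkeys, key)
    if i < len(rows) and rowkeys[i] == key:
        return rows[i][1]
    return None


def scan_by_field(field, value):
    """Slow path: with no index on the field, every row must be examined."""
    return [key for key, cols in rows if cols.get(field) == value]


def build_field_index(field):
    """Secondary index: field value -> list of rowkeys, built once."""
    index = {}
    for key, cols in rows:
        index.setdefault(cols.get(field), []).append(key)
    return index


def page(keys, page_no, page_size):
    """Return one page (1-based page number) of a list of matching rowkeys."""
    start = (page_no - 1) * page_size
    return keys[start:start + page_size]


city_index = build_field_index("city")
second_page = page(city_index["Jinan"], page_no=2, page_size=2)
```

In the architecture discussed here, the role of build_field_index is played by the Solr index, and the rowkeys it returns are then used for direct gets against HBase.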
Summary of the invention
A mass data query method based on Solr and HBase can effectively solve these problems.
In current big data applications, a mass data query method built on a Solr and HBase architecture is well suited to query workloads that demand high concurrency and low latency. Creating the Solr index file quickly and automatically is the key to realizing this architecture.
The invention provides a method for automatically creating a Solr index file from HBase data. A Hive external table is created and associated with HBase; using the DIH (DataImportHandler) tool provided by Solr over a Hive JDBC connection, the HBase data can be indexed automatically through configuration alone, with no separate coding or development required. Before data can be indexed with this method, the transaction-related code in Solr's JdbcDataSource class must first be dealt with: because Hive does not support transactions, its JDBC interface does not implement the corresponding functions.
The present invention adopts a method based on Solr + HBase + Hive, in which index creation for the data in HBase is automated through configuration. A Hive external table is created and associated with an HBase table, so that the data in HBase can be accessed through Hive. The DIH (DataImportHandler) component provided by Solr accesses the HBase data associated with the Hive external table through the JDBC interface provided by Hive, and DIH's automatic index creation is used to create the index for the HBase data automatically.
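The association between a Hive external table and an HBase table is normally declared with Hive's HBase storage handler. The sketch below generates such a DDL statement; the source gives no table definition, so the table name hbase_users, the column family cf, and the column names are assumptions made for illustration only.

```python
def hive_external_table_ddl(hive_table, hbase_table, key_column, columns):
    """Build a Hive DDL statement mapping an external table onto an HBase table.

    columns is a list of (hive_column, hive_type, "family:qualifier") tuples.
    The rowkey is always mapped first, through the special ":key" entry.
    """
    col_defs = [f"  `{key_column}` string"]
    col_defs += [f"  `{name}` {hive_type}" for name, hive_type, _ in columns]
    # The mapping string must not contain whitespace between entries.
    mapping = ",".join([":key"] + [target for _, _, target in columns])
    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {hive_table} (",
        ",\n".join(col_defs),
        ")",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')",
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')",
    ]
    return "\n".join(lines)


# Hypothetical table layout, for illustration only.
ddl = hive_external_table_ddl(
    hive_table="hbase_users",
    hbase_table="users",
    key_column="rowkey",
    columns=[("name", "string", "cf:name"), ("city", "string", "cf:city")],
)
print(ddl)
```

Because the table is declared EXTERNAL, dropping it in Hive removes only the Hive metadata and leaves the underlying HBase table in place.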
The specific steps are:
1. Definition and configuration of the Solr schema file
Modify the schema.xml file and add the fields to be indexed;
2. DataImportHandler configuration
In solrconfig.xml, specify the configuration file used by the dataimport handler, then create that configuration file under the conf path and configure the corresponding field mappings in it;
3. Add the Hive JDBC package
Add the Hive JDBC package to the lib directory of the corresponding Solr core;
4. Run the index creation
In the Solr console, run the DIH function of the corresponding core to complete the index creation.
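Steps 1 and 2 above amount to three small XML fragments: field definitions for schema.xml, a request handler entry for solrconfig.xml, and the DIH configuration file itself. The sketch below generates hedged examples of each. The source does not give file contents, so the configuration file name data-config.xml, the Hive table and column names, the string field type, and the HiveServer2 driver class org.apache.hive.jdbc.HiveDriver with a jdbc:hive2:// URL are all assumptions; an older HiveServer1 deployment would use a different driver class and URL scheme.

```python
from xml.etree import ElementTree as ET

# Hypothetical mapping: Hive column -> Solr field name.
FIELD_MAP = {"rowkey": "id", "name": "name", "city": "city"}


def schema_fields(field_map):
    """Step 1: <field> entries to add to schema.xml."""
    fields = []
    for solr_name in field_map.values():
        attrs = {"name": solr_name, "type": "string",
                 "indexed": "true", "stored": "true"}
        fields.append(ET.tostring(ET.Element("field", attrs), encoding="unicode"))
    return fields


def dataimport_handler(config_file):
    """Step 2a: request handler entry for solrconfig.xml."""
    handler = ET.Element("requestHandler", {
        "name": "/dataimport",
        "class": "org.apache.solr.handler.dataimport.DataImportHandler",
    })
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    ET.SubElement(defaults, "str", {"name": "config"}).text = config_file
    return ET.tostring(handler, encoding="unicode")


def dih_config(jdbc_url, hive_table, field_map):
    """Step 2b: DIH configuration file placed under the core's conf path."""
    root = ET.Element("dataConfig")
    ET.SubElement(root, "dataSource", {
        "type": "JdbcDataSource",
        "driver": "org.apache.hive.jdbc.HiveDriver",  # assumed HiveServer2 driver
        "url": jdbc_url,
    })
    document = ET.SubElement(root, "document")
    query = f"SELECT {', '.join(field_map)} FROM {hive_table}"
    entity = ET.SubElement(document, "entity",
                           {"name": hive_table, "query": query})
    for column, solr_name in field_map.items():
        ET.SubElement(entity, "field", {"column": column, "name": solr_name})
    return ET.tostring(root, encoding="unicode")


schema_xml = schema_fields(FIELD_MAP)
handler_xml = dataimport_handler("data-config.xml")
config_xml = dih_config("jdbc:hive2://localhost:10000/default",
                        "hbase_users", FIELD_MAP)
```

Step 3 is a file copy rather than configuration: the Hive JDBC jar and its dependencies go into the core's lib directory so that the driver class named in the dataSource element can be loaded.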
Description of the drawings
Fig. 1 is a schematic diagram of automated index creation with Solr + HBase + Hive.
Detailed description
The content of the present invention is described in more detail below:
A Hive external table is created and associated with HBase; using the DIH (DataImportHandler) tool provided by Solr over a Hive JDBC connection, the HBase data can be indexed automatically through configuration alone, with no separate coding or development required.
The present invention adopts a method based on Solr + HBase + Hive, in which index creation for the data in HBase is automated through configuration. A Hive external table is created and associated with an HBase table, so that the data in HBase can be accessed through Hive. The DIH (DataImportHandler) component provided by Solr accesses the HBase data associated with the Hive external table through the JDBC interface provided by Hive, and DIH's automatic index creation is used to create the index for the HBase data automatically.
1. Definition and configuration of the Solr schema file
Modify the schema.xml file and add the fields to be indexed;
2. DataImportHandler configuration
In solrconfig.xml, specify the configuration file used by the dataimport handler, then create that configuration file under the conf path and configure the corresponding field mappings in it;
3. Add the Hive JDBC package
Add the Hive JDBC package to the lib directory of the corresponding Solr core;
4. Run the index creation
In the Solr console, run the DIH function of the corresponding core to complete the index creation.
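Running DIH from the Solr console corresponds to sending a command to the core's dataimport handler over HTTP, so step 4 can also be scripted. The sketch below builds the URLs for a full import and a status check. The base URL and the core name hbase_core are assumptions, and the helper run_command is defined but deliberately not called here, since it needs a running Solr instance with the handler registered at /dataimport.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(base_url, core, command, **params):
    """Build a DataImportHandler request URL for the given core and command."""
    query = urlencode({"command": command, **params})
    return f"{base_url}/{core}/dataimport?{query}"


def run_command(url, timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical Solr location and core name, for illustration only.
BASE_URL = "http://localhost:8983/solr"
CORE = "hbase_core"

# clean=true removes existing documents first; commit=true commits at the end.
full_import_url = dataimport_url(BASE_URL, CORE, "full-import",
                                 clean="true", commit="true", wt="json")
status_url = dataimport_url(BASE_URL, CORE, "status", wt="json")
```

A full import runs asynchronously on the server, so a script would typically call run_command(full_import_url) once and then poll run_command(status_url) until the handler reports that it is idle.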

Claims (3)

1. A method for automatically creating a Solr index file from HBase data, characterized in that a Hive external table is created and associated with HBase, and the DIH tool provided by Solr is used over a Hive JDBC connection.
2. The method according to claim 1, characterized in that:
a method based on Solr + HBase + Hive is adopted, in which index creation for the data in HBase is automated through configuration;
a Hive external table is created and associated with an HBase table, so that the data in HBase can be accessed through Hive;
the DIH (DataImportHandler) component provided by Solr accesses the HBase data associated with the Hive external table through the JDBC interface provided by Hive;
DIH's automatic index creation is used to create the index for the HBase data automatically.
3. The method according to claim 2, characterized in that the specific steps are as follows:
1) Definition and configuration of the Solr schema file
Modify the schema.xml file and add the fields to be indexed;
2) DataImportHandler configuration
In solrconfig.xml, specify the configuration file used by the dataimport handler, then create that configuration file under the conf path and configure the corresponding field mappings in it;
3) Add the Hive JDBC package
Add the Hive JDBC package to the lib directory of the corresponding Solr core;
4) Run the index creation
In the Solr console, run the DIH function of the corresponding core to complete the index creation.
CN201410721633.1A 2014-12-03 2014-12-03 Method for automatically creating Solr index file by Hbase data Pending CN104503985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410721633.1A CN104503985A (en) 2014-12-03 2014-12-03 Method for automatically creating Solr index file by Hbase data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410721633.1A CN104503985A (en) 2014-12-03 2014-12-03 Method for automatically creating Solr index file by Hbase data

Publications (1)

Publication Number Publication Date
CN104503985A true CN104503985A (en) 2015-04-08

Family

ID=52945383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410721633.1A Pending CN104503985A (en) 2014-12-03 2014-12-03 Method for automatically creating Solr index file by Hbase data

Country Status (1)

Country Link
CN (1) CN104503985A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484897A (en) * 2016-10-21 2017-03-08 郑州云海信息技术有限公司 A kind of quick method connecting Hiveserver by JDBC
CN106682148A (en) * 2016-12-22 2017-05-17 北京锐安科技有限公司 Method and device based on Solr data search
CN106844716A (en) * 2017-02-08 2017-06-13 上海熙菱信息技术有限公司 A kind of mass data automated storing method based on Solr indexes and Oracle storages
CN106909671A (en) * 2017-02-28 2017-06-30 湖南蚁坊软件股份有限公司 A kind of method and system of NoSQL databases condition query
CN107402987A (en) * 2016-09-21 2017-11-28 广州特道信息科技有限公司 A kind of method of full-text search and distributed NewSQL Database Systems
CN107644050A (en) * 2016-12-22 2018-01-30 北京锐安科技有限公司 A kind of querying method and device of the Hbase based on solr
CN108319636A (en) * 2017-11-27 2018-07-24 大象慧云信息技术有限公司 Electronic invoice data querying method
CN110059091A (en) * 2019-04-22 2019-07-26 成都四方伟业软件股份有限公司 Method, apparatus, client, server and the system of index construct

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
US20140222843A1 (en) * 2013-02-01 2014-08-07 Netapp, Inc. Systems, Methods, and computer Program Products to Ingest, Process, and Output Large Data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HRISHIKESH KARAMBELKAR: "Scaling Big Data with Hadoop and Solr", 《WWW.PACKTPUB.COM》 *
HUOYUNSHEN88: "hbase+solr概念和环境搭建", 《HTTP://BLOG.CSDN.NET/HUOYUNSHEN88/ARTICLE/DETAILS/38082455》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402987A (en) * 2016-09-21 2017-11-28 广州特道信息科技有限公司 A kind of method of full-text search and distributed NewSQL Database Systems
CN107402987B (en) * 2016-09-21 2020-04-03 云润大数据服务有限公司 Full-text retrieval method and distributed NewSQL database system
CN106484897A (en) * 2016-10-21 2017-03-08 郑州云海信息技术有限公司 A kind of quick method connecting Hiveserver by JDBC
CN106682148A (en) * 2016-12-22 2017-05-17 北京锐安科技有限公司 Method and device based on Solr data search
CN107644050A (en) * 2016-12-22 2018-01-30 北京锐安科技有限公司 A kind of querying method and device of the Hbase based on solr
CN106844716A (en) * 2017-02-08 2017-06-13 上海熙菱信息技术有限公司 A kind of mass data automated storing method based on Solr indexes and Oracle storages
CN106909671A (en) * 2017-02-28 2017-06-30 湖南蚁坊软件股份有限公司 A kind of method and system of NoSQL databases condition query
CN108319636A (en) * 2017-11-27 2018-07-24 大象慧云信息技术有限公司 Electronic invoice data querying method
CN110059091A (en) * 2019-04-22 2019-07-26 成都四方伟业软件股份有限公司 Method, apparatus, client, server and the system of index construct
CN110059091B (en) * 2019-04-22 2020-08-11 成都四方伟业软件股份有限公司 Index construction method, device, client, server and system

Similar Documents

Publication Publication Date Title
CN104503985A (en) Method for automatically creating Solr index file by Hbase data
US10664471B2 (en) System and method of query processing with schema change in JSON document store
US10061823B2 (en) Multi-tenancy for structured query language (SQL) and non structured query language (NoSQL) databases
Tauro et al. Comparative study of the new generation, agile, scalable, high performance NOSQL databases
CN104102710A (en) Massive data query method
CN107038207A (en) A kind of data query method, data processing method and device
US9928266B2 (en) Method and computing device for minimizing accesses to data storage in conjunction with maintaining a B-tree
CN104112013A (en) HBase secondary indexing method and device
WO2015074477A1 (en) Path analysis method and apparatus
US20170060977A1 (en) Data preparation for data mining
CN111611268A (en) Government affair service search processing method and device
Psaila et al. J-CO: a platform-independent framework for managing geo-referenced JSON data sets
Amghar et al. Storing, preprocessing and analyzing tweets: finding the suitable noSQL system
CN104714983A (en) Generating method and device for distributed indexes
McClean et al. A comparison of mapreduce and parallel database management systems
Zhang et al. Unified SQL query middleware for heterogeneous databases
de Souza Baptista et al. Using OGC Services To Interoperate Spatial Data Stored In SQL And NoSQL Databases.
Goldfarb et al. Enhancing the Discoverability and Interoperability of Multi-Disciplinary Semantic Repositories.
CN101853307A (en) Note establishing method, corresponding network searching system and method thereof
Vissamsetti et al. Twitter Data Analysis for Live Streaming by Using Flume Technology
Huang et al. Building the distributed geographic SQL workflow in the Grid environment
CN111581173B (en) Method, device, server and storage medium for distributed storage of log system
Suciu et al. Cloud search based applications for big data-challenges and methodologies for acceleration
CN112861030B (en) CDN refreshing method and device, cache server and storage medium
Ghosh et al. NoSQL Database: An Advanced Way to Store, Analyze and Extract Results From Big Data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150408