CN104503985A - Method for automatically creating Solr index files from HBase data

Info

Publication number: CN104503985A
Application number: CN201410721633.1A
Authority: CN
Inventors: 金洪殿; 赵仁明; 辛国茂; 刘伟
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2014-12-03
Filing date: 2014-12-03
Publication date: 2015-04-08

Abstract

The invention provides a method for automatically creating Solr index files from HBase data and belongs to the field of big data. Using an approach based on Solr + HBase + Hive, indexes for data in HBase are created automatically through configuration alone. A Hive external table is created and associated with an HBase table, so that the data in HBase can be accessed through Hive. The DIH (DataImportHandler) component provided by Solr then reads the HBase data behind the Hive external table through the JDBC interface provided by Hive. Because DIH builds indexes automatically, index creation for HBase data is automated as well.

Description

A method for automatically creating Solr index files from HBase data

Technical field

The present invention relates to the field of big data, and specifically to a method for automatically creating Solr index files from HBase data.

Background

Big data commonly refers to the large volumes of unstructured and semi-structured data that a company creates, data that would cost too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires frameworks such as MapReduce and HBase to distribute the work across tens, hundreds, or even thousands of computers. Compared with traditional data warehouse applications, big data analysis is characterized by large data volumes and complex query analysis. Big data requires special technologies to process large amounts of data within a tolerable elapsed time. Technologies suited to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

Solr is a standalone enterprise search server that exposes an API similar to a web service. A user can submit XML files in a specified format to the search server over HTTP to build the index, and can issue search requests with HTTP GET and receive the results in XML or JSON format.
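The two HTTP interactions described above can be sketched as follows. This is a minimal illustration, not part of the patent: the base URL, the core name `hbase_core`, and the field names are assumptions, and the helper functions are hypothetical. The `<add><doc><field>` message shape and the `q`, `start`, `rows`, and `wt` query parameters are standard Solr conventions.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_add_xml(doc):
    """Build a Solr XML update message for one document (a dict of field -> value)."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(base_url, core, query, start=0, rows=10, wt="json"):
    """Build an HTTP GET search URL; start/rows give page-by-page results."""
    params = urlencode({"q": query, "start": start, "rows": rows, "wt": wt})
    return f"{base_url}/{core}/select?{params}"


if __name__ == "__main__":
    # Hypothetical document and core name, for illustration only.
    print(build_add_xml({"id": "row1", "name": "alice"}))
    print(build_select_url("http://localhost:8983/solr", "hbase_core",
                           "name:alice", start=20, rows=10))
```

The `start` and `rows` parameters are what make paginated queries possible on the Solr side.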

HBase is a distributed, column-oriented, open-source database. The technology originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. While HBase supports highly concurrent read and write operations, it also has some significant shortcomings. Because HBase sorts only by rowkey, it cannot provide fast lookup and retrieval on fields other than the rowkey. HBase also cannot provide query-based paginated display or page-by-page querying.

Summary of the invention

A massive-data query method based on Solr and HBase can therefore solve these problems effectively.

In current big data applications, a massive-data query method based on the Solr and HBase architecture is well suited to query workloads that demand high concurrency and low latency. Creating Solr index files quickly and automatically is the key to realizing this architecture.

The invention provides a method for automatically creating Solr index files from HBase data. An external table is created in Hive and associated with HBase; the DIH (DataImportHandler) tool provided by Solr then connects through Hive's JDBC interface, so that automated indexing of HBase data is achieved through configuration alone, with no separate coding effort. Before data can be indexed this way, the transaction-related code in Solr's JdbcDataSource class must first be handled: Hive does not support transactions, so its JDBC interface does not implement the corresponding functions.
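The patent does not say how the transaction-related code is handled. One possible idea, shown here only as a language-neutral illustration in Python, is to treat transaction-control calls as no-ops when the underlying driver rejects them. The class and the method names (`commit`, `rollback`, `set_autocommit`) are hypothetical. This is not Solr's JdbcDataSource code, which is written in Java.

```python
class NoTransactionConnection:
    """Hypothetical wrapper: forwards every call to the wrapped connection,
    but skips transaction-control calls that the driver does not support."""

    # Assumed names for the transaction-control methods of the wrapped object.
    TRANSACTION_METHODS = {"commit", "rollback", "set_autocommit"}

    def __init__(self, conn):
        self._conn = conn

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper itself.
        target = getattr(self._conn, name)
        if name not in self.TRANSACTION_METHODS:
            return target

        def tolerant(*args, **kwargs):
            try:
                return target(*args, **kwargs)
            except NotImplementedError:
                # The driver has no transaction support; skip the call.
                return None

        return tolerant
```

All other calls, and all other errors, pass through unchanged, so only the missing transaction support is masked.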

The present invention adopts an approach based on Solr + HBase + Hive, in which index creation for data in HBase is automated through configuration. A Hive external table is created and associated with an HBase table, so that the data in HBase can be accessed through Hive. The DIH (DataImportHandler) component provided by Solr accesses the HBase data behind the Hive external table through the JDBC interface provided by Hive. Because DIH creates indexes automatically, index creation for HBase data is automated as well.
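The association between a Hive external table and an HBase table is normally declared in HiveQL through Hive's HBase storage handler. The sketch below generates such a statement. The table names, column names, and the helper function are assumptions for illustration, since the patent gives no concrete DDL.

```python
def hive_external_table_ddl(hive_table, hbase_table, key_column, columns):
    """Return HiveQL that maps an existing HBase table to a Hive external table.

    columns: list of (hive_column, hive_type, "family:qualifier") tuples.
    The HBase rowkey is mapped to key_column through the special ":key" entry.
    """
    col_defs = [f"{key_column} STRING"] + [f"{name} {typ}" for name, typ, _ in columns]
    mapping = ",".join([":key"] + [hbase_col for _, _, hbase_col in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(col_defs)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


if __name__ == "__main__":
    # Hypothetical tables and columns, for illustration only.
    print(hive_external_table_ddl(
        "hive_user_info", "user_info", "rowkey",
        [("name", "STRING", "info:name"), ("age", "INT", "info:age")],
    ))
```

Once the statement has been run in Hive, an ordinary `SELECT` against the Hive table reads the rows stored in HBase, which is what lets a JDBC client such as DIH consume them.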

The specific steps are:

1. Definition and configuration of the Solr schema file

Modify the schema.xml file and add the fields that need to be indexed;

2. DataImportHandler configuration

In solrconfig.xml, specify the configuration file used by the dataimport handler, then create that configuration file under the conf path and configure the corresponding field mappings in it;

3. Add the Hive JDBC package

Add the Hive JDBC package to the lib directory of the corresponding Solr core;

4. Run the index creation

In the Solr console, run the DIH function of the corresponding core to complete index creation.
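The configuration in steps 1 and 2 can be sketched by generating the relevant XML fragments. This is a hedged illustration only: the field names, the Hive table name, the JDBC URL, and the file name `data-config.xml` are assumptions, and the driver class shown is the HiveServer2 driver, which may differ from the one used when the patent was filed.

```python
from xml.etree import ElementTree as ET


def schema_field(name, field_type="string", indexed=True, stored=True):
    """Step 1: a <field> entry to add to schema.xml."""
    el = ET.Element("field", {
        "name": name, "type": field_type,
        "indexed": str(indexed).lower(), "stored": str(stored).lower(),
    })
    return ET.tostring(el, encoding="unicode")


def dih_request_handler(config_file="data-config.xml"):
    """Step 2a: the handler registration to add to solrconfig.xml."""
    handler = ET.Element("requestHandler", {
        "name": "/dataimport",
        "class": "org.apache.solr.handler.dataimport.DataImportHandler",
    })
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    ET.SubElement(defaults, "str", {"name": "config"}).text = config_file
    return ET.tostring(handler, encoding="unicode")


def dih_data_config(jdbc_url, hive_table, column_to_field):
    """Step 2b: the DIH configuration file placed under the core's conf path."""
    root = ET.Element("dataConfig")
    ET.SubElement(root, "dataSource", {
        "type": "JdbcDataSource",
        "driver": "org.apache.hive.jdbc.HiveDriver",  # assumed: HiveServer2 driver
        "url": jdbc_url,
    })
    document = ET.SubElement(root, "document")
    entity = ET.SubElement(document, "entity", {
        "name": hive_table,
        "query": f"SELECT {', '.join(column_to_field)} FROM {hive_table}",
    })
    # Map each Hive column to the Solr field declared in schema.xml.
    for column, field in column_to_field.items():
        ET.SubElement(entity, "field", {"column": column, "name": field})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical names and URL, for illustration only.
    print(schema_field("name"))
    print(dih_request_handler())
    print(dih_data_config("jdbc:hive2://localhost:10000/default",
                          "hive_user_info", {"rowkey": "id", "name": "name"}))
```

Step 3 has no code counterpart: the Hive JDBC jar and its dependencies only need to be on the core's classpath so that the driver class named in the data source can be loaded.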

Brief description of the drawings

Fig. 1 is a schematic diagram of automated index creation with Solr + HBase + Hive.

Detailed description

The content of the invention is described in more detail below:

An external table is created in Hive and associated with HBase; the DIH (DataImportHandler) tool provided by Solr then connects through Hive's JDBC interface, so that automated indexing of HBase data is achieved through configuration alone, with no separate coding effort.

1. Definition and configuration of the Solr schema file

Modify the schema.xml file and add the fields that need to be indexed;

2. DataImportHandler configuration

3. Add the Hive JDBC package

4. Run the index creation
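Besides the Solr console, the import in step 4 can also be started over HTTP, because DIH accepts a `command` parameter on its request handler. The following sketch assumes the handler is registered at `/dataimport`; the base URL, the core name, and the helper functions are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(base_url, core, command, **params):
    """Build a DIH request URL, e.g. command="full-import" or command="status"."""
    query = urlencode({"command": command, "wt": "json", **params})
    return f"{base_url}/{core}/dataimport?{query}"


def run_full_import(base_url, core, clean=True, commit=True):
    """Start a full import and return Solr's JSON response as a dict.

    Requires a running Solr instance with DIH configured for the core.
    """
    url = dataimport_url(base_url, core, "full-import",
                         clean=str(clean).lower(), commit=str(commit).lower())
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URLs; call run_full_import(...) against a live Solr server.
    print(dataimport_url("http://localhost:8983/solr", "hbase_core", "full-import", clean="true"))
    print(dataimport_url("http://localhost:8983/solr", "hbase_core", "status"))
```

A full import runs in the background, so the `status` command is the usual way to check when indexing has finished.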

Claims

1. A method for automatically creating Solr index files from HBase data, characterized in that an external table is created in Hive and associated with HBase, and the DIH tool provided by Solr connects through Hive's JDBC interface.

2. The method according to claim 1, characterized in that:

An approach based on Solr + HBase + Hive is adopted, in which index creation for data in HBase is automated through configuration;

A Hive external table is created and associated with an HBase table, so that the data in HBase is accessed through Hive;

The DIH (DataImportHandler) component provided by Solr accesses the HBase data behind the Hive external table through the JDBC interface provided by Hive;

The automatic index creation function of DIH is used to automate index creation for HBase data.

3. The method according to claim 2, characterized in that the specific steps are as follows:

1) Definition and configuration of the Solr schema file

Modify the schema.xml file and add the fields that need to be indexed;

2) DataImportHandler configuration

3) Add the Hive JDBC package

4) Run the index creation