CN104503985A - Method for automatically creating Solr index file by Hbase data - Google Patents
- Publication number: CN104503985A
- Application number: CN201410721633.1A
- Authority: CN (China)
- Prior art keywords: hbase, hive, solr, index, data
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9014—Indexing; Data structures therefor; Storage structures hash tables
Abstract
The invention provides a method for automatically creating a Solr index file from HBase data and belongs to the field of big data. The method is based on Solr + HBase + Hive, and it builds the index over data in HBase automatically through configuration alone.

A Hive external table is created and associated with an HBase table, so the HBase data can be accessed through Hive. Solr's DIH (Data Import Handler) component then uses the JDBC interface provided by Hive to read the HBase data behind the Hive external table. DIH's built-in automatic index creation is used in this way to index the HBase data automatically.
Description
Technical field
The present invention relates to the field of big data. Specifically, it relates to a method for automatically creating a Solr index file from HBase data.
Background art
Big data commonly describes the large volumes of unstructured and semi-structured data that a company creates. Loading this data into a relational database for analysis would cost too much time and money.

Big data analysis is often associated with cloud computing. Real-time analysis of large data sets needs frameworks such as MapReduce and HBase, which distribute the work across tens, hundreds, or even thousands of computers. Compared with traditional data warehouse applications, big data analysis involves much larger data volumes and more complex queries and analysis.

Big data therefore requires special technologies to process large amounts of data within a tolerable elapsed time. Such technologies include:
- massively parallel processing (MPP) databases
- data-mining grids
- distributed file systems and distributed databases
- cloud computing platforms
- the Internet
- scalable storage systems
Solr is a standalone, enterprise-level search application server that exposes a Web-service-like API. To build an index, a user submits XML documents in a specific format to the search server over HTTP. To search, the user sends an HTTP GET request and receives the results in XML or JSON format.
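The two HTTP interactions above can be sketched with the Python standard library. This is a minimal example, assuming a hypothetical core named `item_core` and hypothetical fields `id` and `name`.

- `build_update_xml` produces Solr's XML update format (`<add><doc><field name=...>`).
- `build_select_url` builds a `/select` GET request. The `wt` parameter selects XML or JSON output, and `start`/`rows` page through the results.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_update_xml(docs):
    """Serialize dicts into Solr's XML update message: <add><doc><field name=..>..</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(base_url, core, query, rows=10, start=0, fmt="json"):
    """Build a search URL for HTTP GET; wt chooses the response format (xml or json)."""
    params = urlencode({"q": query, "start": start, "rows": rows, "wt": fmt})
    return f"{base_url}/solr/{core}/select?{params}"
```

The XML message would be POSTed to the core's `/update` handler, and the URL fetched with any HTTP client.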
HBase is a distributed, column-oriented, open-source database. The technology originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al.

HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large structured-storage clusters can be built on inexpensive PC servers.

Although HBase supports highly concurrent reads and writes, it also has some significant shortcomings. Because HBase sorts data only by rowkey, it cannot quickly look up or retrieve data by any field other than the rowkey. HBase also cannot paginate query results, that is, return the results of a query page by page.
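The rowkey limitation can be illustrated with a toy in-memory model. This is not the HBase client API; `SortedRowStore` is a hypothetical class that keeps rows ordered by rowkey only.

- A rowkey prefix scan finds its starting point by binary search.
- A match on any other column must examine every row, because nothing orders the data by that column.

```python
import bisect


class SortedRowStore:
    """Toy HBase-like table: rows are ordered by rowkey only (hypothetical, not the HBase API)."""

    def __init__(self, rows):
        # rows: dict mapping rowkey -> dict of column -> value
        self.rows = rows
        self.keys = sorted(rows)

    def scan_prefix(self, prefix):
        """Rowkey prefix scan: binary-search the first match, then read contiguous keys."""
        i = bisect.bisect_left(self.keys, prefix)
        out = []
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            out.append(self.keys[i])
            i += 1
        return out

    def find_by_column(self, column, value):
        """Lookup on a non-rowkey column: every row must be examined (full scan)."""
        examined = 0
        out = []
        for key in self.keys:
            examined += 1
            if self.rows[key].get(column) == value:
                out.append(key)
        return out, examined
```

An external index such as Solr, built over the non-rowkey fields, removes this full scan.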
Summary of the invention
A massive-data query method based on Solr and HBase can effectively address these problems.
In today's big data applications, a massive-data query method built on the Solr and HBase architecture is well suited to query workloads that require high concurrency and low latency. Creating Solr index files quickly and automatically is the key to realizing this architecture.
The invention provides a method for automatically creating a Solr index file from HBase data. A Hive external table is created and associated with HBase, and the DIH (DataImportHandler) tool provided by Solr connects through Hive's JDBC driver. Automated indexing of HBase data is thus achieved purely through configuration, without separate development work.

Before data can be indexed this way, the transaction-related code in Solr's JdbcDataSource class must first be dealt with. Hive does not support transactions, so its JDBC interface does not implement the related methods.
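The patent does not say how the transaction-related code in JdbcDataSource is handled. One possible approach is an adapter that turns transaction calls into no-ops and forwards every other call to the real connection.

The sketch below is written in Python rather than Solr's actual Java code. `HiveLikeConnection`, `NonTransactionalConnection` and `UnsupportedOperation` are hypothetical names; the first simply mimics a driver that rejects `commit` and `rollback`.

```python
class UnsupportedOperation(Exception):
    """Stands in for the 'method not supported' error raised by a non-transactional driver."""


class HiveLikeConnection:
    """Hypothetical connection that, like a driver without transaction support, rejects commit/rollback."""

    def __init__(self):
        self.closed = False

    def commit(self):
        raise UnsupportedOperation("commit not supported")

    def rollback(self):
        raise UnsupportedOperation("rollback not supported")

    def close(self):
        self.closed = True


class NonTransactionalConnection:
    """Adapter: transaction-control calls become no-ops; all other attributes are forwarded."""

    _TX_METHODS = {"commit", "rollback"}

    def __init__(self, conn):
        self._conn = conn

    def __getattr__(self, name):
        # Only called for attributes not defined on the adapter itself.
        if name in self._TX_METHODS:
            return lambda *args, **kwargs: None
        return getattr(self._conn, name)
```

Swallowing the calls is safe here only because the import reads data and never writes through this connection.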
The invention adopts an approach based on Solr + HBase + Hive, so that building the index for data in HBase is automated through configuration. A Hive external table is created and associated with the HBase table, which lets the data in HBase be accessed through Hive.

Solr's DIH (DataImportHandler) component then uses the JDBC interface provided by Hive to read the HBase data associated with the Hive external table. DIH's automatic index creation function builds the index, so indexes are created for HBase data automatically.
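The patent does not show the Hive DDL that associates the external table with HBase. The sketch below generates that statement, assuming Hive's HBase storage handler (`org.apache.hadoop.hive.hbase.HBaseStorageHandler`) is available on Hive's classpath.

- The `hbase.columns.mapping` property maps `:key` to the rowkey and `family:qualifier` pairs to Hive columns.
- The table and column names used below (`hbase_item`, `item`, `info:name`) are hypothetical.
- Every column is typed `STRING` for simplicity.

```python
def hive_hbase_external_table(hive_table, hbase_table, rowkey_col, columns):
    """Return HiveQL mapping an existing HBase table into Hive as an external table.

    columns: list of (hive_column, "family:qualifier") pairs; all columns are STRING here.
    """
    col_defs = ", ".join([f"{rowkey_col} STRING"] + [f"{name} STRING" for name, _ in columns])
    # ":key" binds the first Hive column to the HBase rowkey.
    mapping = ",".join([":key"] + [cf for _, cf in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )
```

For example, `hive_hbase_external_table("hbase_item", "item", "rowkey", [("name", "info:name"), ("price", "info:price")])` yields a statement that can be run in the Hive CLI or Beeline. Because the table is `EXTERNAL`, dropping it in Hive leaves the HBase data in place.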
The specific steps are as follows:
1. Define and configure the Solr schema file
Modify the schema.xml file and add the fields that need to be indexed;
2. Configure the DataImportHandler
In solrconfig.xml, specify the configuration file used by the dataimport handler, create that file under the conf directory, and configure the corresponding field mappings in it;
3. Add the Hive JDBC package
Add the Hive JDBC jar to the lib directory of the corresponding Solr core;
4. Run the index build
In the Solr admin console, run the DIH function of the corresponding core to complete index creation.
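Steps 1 and 2 amount to writing three XML fragments. The sketch below generates them with `xml.etree.ElementTree`, under these assumptions:

- HiveServer2 is used, whose JDBC driver class is `org.apache.hive.jdbc.HiveDriver`. Step 3 places that driver jar where the core can load it.
- The handler is registered at `/dataimport`.
- The entity name `hbase_row`, the query, and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET


def schema_fields(fields):
    """Step 1: <field> elements for schema.xml; `fields` maps field name -> Solr field type."""
    return [
        ET.tostring(ET.Element("field", name=name, type=ftype, indexed="true", stored="true"),
                    encoding="unicode")
        for name, ftype in fields.items()
    ]


def dataimport_handler(config_file="data-config.xml"):
    """Step 2a: the DataImportHandler entry for solrconfig.xml, pointing at the DIH config file."""
    handler = ET.Element("requestHandler", {
        "name": "/dataimport",
        "class": "org.apache.solr.handler.dataimport.DataImportHandler",
    })
    defaults = ET.SubElement(handler, "lst", name="defaults")
    ET.SubElement(defaults, "str", name="config").text = config_file
    return ET.tostring(handler, encoding="unicode")


def data_config(jdbc_url, query, column_to_field, user="", password=""):
    """Step 2b: data-config.xml (placed under conf/) that reads Hive rows over JDBC."""
    root = ET.Element("dataConfig")
    ET.SubElement(root, "dataSource", type="JdbcDataSource",
                  driver="org.apache.hive.jdbc.HiveDriver",
                  url=jdbc_url, user=user, password=password)
    document = ET.SubElement(root, "document")
    entity = ET.SubElement(document, "entity", name="hbase_row", query=query)
    for column, field in column_to_field.items():
        # Map each Hive result column to a Solr field declared in schema.xml.
        ET.SubElement(entity, "field", column=column, name=field)
    return ET.tostring(root, encoding="unicode")
```

A typical call is `data_config("jdbc:hive2://localhost:10000/default", "SELECT rowkey, name FROM hbase_item", {"rowkey": "id", "name": "name"})`. Here the query reads from the Hive external table that fronts the HBase table.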
Description of the drawings
Fig. 1 is a schematic diagram of automatic index creation with Solr + HBase + Hive.
Embodiment
The content of the present invention is described in more detail below:
Hive sets up appearance and associates with HBase, utilize the DIH(DataImportHandler that Solr provides) instrument, connected by the jdbc of hive, the robotization indexing service having been carried out Hbase data by configuration can be realized, and do not need to carry out separately coding exploitation.
The present invention adopts the method based on Solr+HBase+Hive, can to the data in HBase by having configured the robotization building work of index.Associate with the carrying out that Hbase shows by creating Hive appearance, thus the data that visited by Hive in HBase can be realized.Utilize the DIH(DataImportHandler that Solr provides) assembly, the jdbc interface provided by Hive, visit the data in the Hbase of Hive appearance association, utilize DIH robotization to create the function of index, thus achieve the function that HBase datamation creates index.
1. Define and configure the Solr schema file
Modify the schema.xml file and add the fields that need to be indexed;
2. Configure the DataImportHandler
In solrconfig.xml, specify the configuration file used by the dataimport handler, create that file under the conf directory, and configure the corresponding field mappings in it;
3. Add the Hive JDBC package
Add the Hive JDBC jar to the lib directory of the corresponding Solr core;
4. Run the index build
In the Solr admin console, run the DIH function of the corresponding core to complete index creation.
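Step 4 can also be triggered without the admin console, because DIH accepts commands over HTTP on the handler path registered in solrconfig.xml. This is a minimal sketch, assuming the handler is registered at `/dataimport` and a hypothetical core named `item_core`.

- `command=full-import`, `clean` and `commit` are standard DIH request parameters.
- `command=status` reports the progress of a running import.

```python
from urllib.parse import urlencode


def dih_command_url(base_url, core, command="full-import", clean=True, commit=True):
    """Build a DataImportHandler request URL for the given core and command."""
    params = {"command": command}
    if command in ("full-import", "delta-import"):
        # clean: delete the existing index first; commit: commit once the import finishes.
        params["clean"] = str(clean).lower()
        params["commit"] = str(commit).lower()
    return f"{base_url}/solr/{core}/dataimport?{urlencode(params)}"
```

Requesting `dih_command_url("http://localhost:8983", "item_core")` starts a full import. Polling the `status` URL then shows when indexing has completed.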
Claims (3)
1. A method for automatically creating a Solr index file from HBase data, characterized in that a Hive external table is created and associated with HBase, and the DIH tool provided by Solr connects through the JDBC interface of Hive.
2. The method according to claim 1, characterized in that:
an approach based on Solr + HBase + Hive is adopted, so that building the index for the data in HBase is automated through configuration;
a Hive external table is created and associated with the HBase table, so that the data in HBase is accessed through Hive;
the DIH (DataImportHandler) component provided by Solr accesses, through the JDBC interface provided by Hive, the data in the HBase table associated with the Hive external table;
and the automatic index creation function of DIH is used to create the index for HBase data automatically.
3. The method according to claim 2, characterized in that the specific steps are as follows:
1) Define and configure the Solr schema file
Modify the schema.xml file and add the fields that need to be indexed;
2) Configure the DataImportHandler
In solrconfig.xml, specify the configuration file used by the dataimport handler, create that file under the conf directory, and configure the corresponding field mappings in it;
3) Add the Hive JDBC package
Add the Hive JDBC jar to the lib directory of the corresponding Solr core;
4) Run the index build
In the Solr admin console, run the DIH function of the corresponding core to complete index creation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410721633.1A CN104503985A (en) | 2014-12-03 | 2014-12-03 | Method for automatically creating Solr index file by Hbase data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104503985A true CN104503985A (en) | 2015-04-08 |
Family ID: 52945383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410721633.1A Pending CN104503985A (en) | 2014-12-03 | 2014-12-03 | Method for automatically creating Solr index file by Hbase data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104503985A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102426609A (en) * | 2011-12-28 | 2012-04-25 | 厦门市美亚柏科信息股份有限公司 | Index generation method and index generation device based on MapReduce programming architecture |
US20140222843A1 (en) * | 2013-02-01 | 2014-08-07 | Netapp, Inc. | Systems, Methods, and computer Program Products to Ingest, Process, and Output Large Data |
Non-Patent Citations (2)
Title |
---|
HRISHIKESH KARAMBELKAR: "Scaling Big Data with Hadoop and Solr", 《WWW.PACKTPUB.COM》 * |
HUOYUNSHEN88: "hbase+solr概念和环境搭建" (hbase+solr concepts and environment setup), 《HTTP://BLOG.CSDN.NET/HUOYUNSHEN88/ARTICLE/DETAILS/38082455》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402987A (en) * | 2016-09-21 | 2017-11-28 | 广州特道信息科技有限公司 | A kind of method of full-text search and distributed NewSQL Database Systems |
CN107402987B (en) * | 2016-09-21 | 2020-04-03 | 云润大数据服务有限公司 | Full-text retrieval method and distributed NewSQL database system |
CN106484897A (en) * | 2016-10-21 | 2017-03-08 | 郑州云海信息技术有限公司 | A kind of quick method connecting Hiveserver by JDBC |
CN106682148A (en) * | 2016-12-22 | 2017-05-17 | 北京锐安科技有限公司 | Method and device based on Solr data search |
CN107644050A (en) * | 2016-12-22 | 2018-01-30 | 北京锐安科技有限公司 | A kind of querying method and device of the Hbase based on solr |
CN106844716A (en) * | 2017-02-08 | 2017-06-13 | 上海熙菱信息技术有限公司 | A kind of mass data automated storing method based on Solr indexes and Oracle storages |
CN106909671A (en) * | 2017-02-28 | 2017-06-30 | 湖南蚁坊软件股份有限公司 | A kind of method and system of NoSQL databases condition query |
CN108319636A (en) * | 2017-11-27 | 2018-07-24 | 大象慧云信息技术有限公司 | Electronic invoice data querying method |
CN110059091A (en) * | 2019-04-22 | 2019-07-26 | 成都四方伟业软件股份有限公司 | Method, apparatus, client, server and the system of index construct |
CN110059091B (en) * | 2019-04-22 | 2020-08-11 | 成都四方伟业软件股份有限公司 | Index construction method, device, client, server and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104503985A (en) | Method for automatically creating Solr index file by Hbase data | |
US10664471B2 (en) | System and method of query processing with schema change in JSON document store | |
US10061823B2 (en) | Multi-tenancy for structured query language (SQL) and non structured query language (NoSQL) databases | |
Tauro et al. | Comparative study of the new generation, agile, scalable, high performance NOSQL databases | |
CN104102710A (en) | Massive data query method | |
CN107038207A (en) | A kind of data query method, data processing method and device | |
US9928266B2 (en) | Method and computing device for minimizing accesses to data storage in conjunction with maintaining a B-tree | |
CN104112013A (en) | HBase secondary indexing method and device | |
WO2015074477A1 (en) | Path analysis method and apparatus | |
US20170060977A1 (en) | Data preparation for data mining | |
CN111611268A (en) | Government affair service search processing method and device | |
Psaila et al. | J-CO: a platform-independent framework for managing geo-referenced JSON data sets | |
Amghar et al. | Storing, preprocessing and analyzing tweets: finding the suitable noSQL system | |
CN104714983A (en) | Generating method and device for distributed indexes | |
McClean et al. | A comparison of mapreduce and parallel database management systems | |
Zhang et al. | Unified SQL query middleware for heterogeneous databases | |
de Souza Baptista et al. | Using OGC Services To Interoperate Spatial Data Stored In SQL And NoSQL Databases. | |
Goldfarb et al. | Enhancing the Discoverability and Interoperability of Multi-Disciplinary Semantic Repositories. | |
CN101853307A (en) | Note establishing method, corresponding network searching system and method thereof | |
Vissamsetti et al. | Twitter Data Analysis for Live Streaming by Using Flume Technology | |
Huang et al. | Building the distributed geographic SQL workflow in the Grid environment | |
CN111581173B (en) | Method, device, server and storage medium for distributed storage of log system | |
Suciu et al. | Cloud search based applications for big data-challenges and methodologies for acceleration | |
CN112861030B (en) | CDN refreshing method and device, cache server and storage medium | |
Ghosh et al. | NoSQL Database: An Advanced Way to Store, Analyze and Extract Results From Big Data |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150408 |