CN111159185A - Hive index method based on conditional push-down elastic search - Google Patents
- Publication number
- CN111159185A (application number CN201911378666.XA)
- Authority
- CN
- China
- Prior art keywords
- hive
- engine
- query
- elasticsearch
- original data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a Hive index method based on conditional push-down to Elasticsearch, which comprises the following steps: establishing, in the Elasticsearch engine, an index table of a Hive original data table; establishing an associated foreign key between the index table and the Hive original data table, so that the query result returned by the Elasticsearch engine after a condition is pushed down to it is an attribute of the Hive original data table; the Hive engine acquires a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes the converted statement down to the Elasticsearch engine; the Elasticsearch engine executes the query and returns the query result; the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client. With the invention, only one table needs to be established to serve multiple queries, which reduces metadata flooding.
Description
Technical Field
The invention relates to the technical field of database indexing, and in particular to a Hive index method based on conditional push-down to Elasticsearch.
Background
The Hive-on-Elasticsearch plug-in is developed by Elastic and implements Hive's StorageHandler mechanism. As with the Hive-on-HBase mechanism, once the plug-in is loaded the Hive storage data source is no longer limited to HDFS and can also be an external storage engine such as HBase or Elasticsearch.
Insert, delete, update, and query operations can be performed on the Elasticsearch engine through a Hive client or the Hive JDBC interface:
1) a CREATE TABLE statement can create a table in the Elasticsearch engine;
2) data can be inserted into the Elasticsearch engine through INSERT ... VALUES statements;
3) a table in the Elasticsearch engine can be deleted through a DROP TABLE statement;
4) data in the Elasticsearch engine can be queried and returned to the client through a SELECT query statement.
In the existing Hive system based on Elasticsearch, the query statement is expressed in DDL form, so if each query needs a different Elasticsearch query statement, a different table must be established for each one. Building a table is a heavyweight operation and requires manipulating the source database. DDL defines metadata, whereas a query is a read operation. If there are many query statements, many database tables must be built, resulting in metadata flooding.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides a Hive index method based on conditional push-down to Elasticsearch, which needs to establish only one table to serve multiple queries.
Technical scheme: to achieve the above technical effect, the invention provides the following technical scheme:
A Hive index method based on conditional push-down to Elasticsearch, comprising the following steps:
(1) establishing, in the Elasticsearch engine, an index table of the Hive original data table;
(2) establishing an associated foreign key between the index table and the Hive original data table, so that the query result returned by the Elasticsearch engine after a condition is pushed down to it is an attribute of the Hive original data table;
(3) the Hive engine acquires a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes the converted statement down to the Elasticsearch engine;
(4) the Elasticsearch engine executes the query and returns the query result;
(5) the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client.
Further, converting the DDL query statement into a query statement executable by the Elasticsearch storage engine specifically comprises: parsing the DDL query statement into an abstract syntax tree and obtaining the query expression from the abstract syntax tree; parsing the RESTful query portion of the query expression; and creating an Elasticsearch RESTful query statement through DSLBuilder.
Further, the query expression of the abstract syntax tree is obtained as follows: the query expression is acquired through the Hive UDF match(String), where the String parameter of match(String) is an Elasticsearch query statement.
Further, the method also comprises the following steps:
when an insert, delete, or update operation is performed on the Hive original data table, the Hive engine generates, from the operation information of the Hive original data table, the logic that maps the updated data of the Hive original data table to updated data of the index table, and updates the index table according to the generated logic.
Beneficial effects: compared with the prior art, the invention has the following advantages:
The invention optimizes the interaction between Hive and Elasticsearch through conditional push-down. The query condition can be pushed down to the Elasticsearch engine for execution without building a new database table, which reduces the metadata of the system and reduces the system load.
Drawings
FIG. 1 is a flowchart of a hive index method based on conditional push-down of an elastic search according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hive index flow based on an elastic search in the prior art.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments. It is to be understood that the present invention may be embodied in various forms, and that there is no intention to limit the invention to the specific embodiments illustrated, but on the contrary, the intention is to cover some exemplary and non-limiting embodiments shown in the attached drawings and described below.
FIG. 1 shows an example of the Hive index method based on conditional push-down to Elasticsearch according to the present invention, which includes the following steps:
(1) establishing, in the Elasticsearch engine, an index table of the Hive original data table;
(2) establishing an associated foreign key between the index table and the Hive original data table, so that the query result returned by the Elasticsearch engine after a condition is pushed down to it is an attribute of the Hive original data table;
(3) the Hive engine acquires a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes the converted statement down to the Elasticsearch engine;
(4) the Elasticsearch engine executes the query and returns the query result;
(5) the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client.
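The five steps can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the dictionaries stand in for the Hive original data table and the Elasticsearch index table, and the names `hive_table`, `es_index`, `row_id`, `es_search`, and `hive_select` are hypothetical, not taken from the patent.

```python
# In-memory stand-ins (hypothetical): the Hive original data table keyed by row id,
# and the Elasticsearch index table whose documents carry that id as a foreign key.
hive_table = {
    "r1": {"name": "alice", "sender": "F", "amount": 10},
    "r2": {"name": "bob", "sender": "M", "amount": 20},
    "r3": {"name": "carl", "sender": "M", "amount": 30},
}

# Steps 1 and 2: build the index table and keep the foreign key (row_id) in each document.
es_index = [{"row_id": key, "sender": row["sender"]} for key, row in hive_table.items()]


def es_search(dsl: dict) -> list:
    """Step 4: evaluate a single-field match query and return the foreign keys."""
    (field, value), = dsl["query"]["match"].items()
    return [doc["row_id"] for doc in es_index if doc.get(field) == value]


def hive_select(dsl: dict) -> list:
    """Steps 3 and 5: push the condition down, then read the matching original rows."""
    row_ids = es_search(dsl)
    return [hive_table[row_id] for row_id in row_ids]


rows = hive_select({"query": {"match": {"sender": "M"}}})
print(rows)
```

The point of the sketch is that the search engine returns only foreign keys, and the full rows are then read from the original table.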
In step (3), the DDL query statement is converted into a query statement executable by the Elasticsearch storage engine. Specifically: the DDL query statement is parsed into an abstract syntax tree and the query expression is obtained from the abstract syntax tree; the RESTful query portion of the query expression is parsed; and an Elasticsearch RESTful query statement is created through DSLBuilder. The query expression of the abstract syntax tree is obtained through the Hive UDF match(String), where the String parameter of match(String) is an Elasticsearch query statement.
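The conversion can be sketched roughly as follows. This is an illustration only: a real implementation would walk the abstract syntax tree produced by the Hive parser, whereas here a regular expression stands in for that step. The function names `extract_match_argument` and `build_dsl` and the table name `t` are hypothetical.

```python
import json
import re

# Matches match('<json>') where the JSON text itself contains no single quote.
MATCH_CALL = re.compile(r"match\s*\(\s*'([^']*)'\s*\)", re.IGNORECASE)


def extract_match_argument(sql: str) -> str:
    """Return the string argument of the match(...) condition in the statement."""
    found = MATCH_CALL.search(sql)
    if found is None:
        raise ValueError("no match('...') condition found")
    return found.group(1)


def build_dsl(sql: str) -> dict:
    """Parse the match(...) argument into the request body sent to the search engine."""
    return json.loads(extract_match_argument(sql))


sql = """SELECT * FROM t WHERE match('{"query": {"match": {"sender": "M"}}}')"""
dsl = build_dsl(sql)
print(json.dumps(dsl))
```

Parsing the argument with `json.loads` also validates it, so a malformed query string fails before anything is sent to the search engine.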
In one or more embodiments of the Hive index method based on conditional push-down to Elasticsearch according to the present invention, the method further includes: when an insert, delete, or update operation is performed on the Hive original data table, the Hive engine generates, from the operation information of the Hive original data table, the logic that maps the updated data of the Hive original data table to updated data of the index table, and updates the index table according to the generated logic.
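One way to picture this synchronization is to translate each row-level operation into the action lines of an Elasticsearch bulk request. The sketch below assumes that approach; the patent does not say how the update logic is generated, and the function name `to_bulk_actions` and the index name `hive_index` are hypothetical.

```python
import json


def to_bulk_actions(op: str, row_id: str, row=None, index: str = "hive_index") -> list:
    """Translate one change on the original table into bulk-request lines (NDJSON)."""
    meta = {"_index": index, "_id": row_id}
    if op == "insert":
        # An index action is followed by the document source on the next line.
        return [json.dumps({"index": meta}), json.dumps(row)]
    if op == "update":
        # An update action is followed by a partial document under "doc".
        return [json.dumps({"update": meta}), json.dumps({"doc": row})]
    if op == "delete":
        # A delete action has no following source line.
        return [json.dumps({"delete": meta})]
    raise ValueError(f"unsupported operation: {op}")


lines = []
lines += to_bulk_actions("insert", "r4", {"sender": "M"})
lines += to_bulk_actions("update", "r1", {"sender": "M"})
lines += to_bulk_actions("delete", "r2")
print("\n".join(lines))
```

Using the original row id as the document `_id` keeps the foreign-key association between the index table and the original data table intact across updates.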
The technical effects of the present invention will be described below in conjunction with the prior art.
The existing hive on elastic search table building statement is as follows:
If each query needs a different Elasticsearch query statement, a different table must be built for each one. As shown in FIG. 2, if the required query is {"query": {"match": {"sender": "M"}}}, that query must be specified as the query attribute when the table is built. The query is then made with SELECT * FROM table1;. Queries are read operations, so expressing a SELECT through DDL is unfriendly and can cause metadata flooding. For example, if a user wants to make two queries, the operations in the existing Hive-on-Elasticsearch system are as follows:
Create table 1 and query:
Create table 2 and query:
That is, the user has to create two tables to make two queries.
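The cost of the existing approach can be sketched by generating the table definitions it requires. The storage handler class and the `es.resource` and `es.query` table properties are those documented for the Elasticsearch-Hadoop Hive integration. The table names, column list, and index name `people` are hypothetical and are not the statements from the original patent figures.

```python
import json

# Storage handler class as documented for the Elasticsearch-Hadoop Hive integration.
ES_HANDLER = "org.elasticsearch.hadoop.hive.EsStorageHandler"


def create_table_ddl(table: str, resource: str, query: dict) -> str:
    """Build the DDL that binds one fixed Elasticsearch query to one Hive table."""
    es_query = json.dumps(query, separators=(",", ":"))
    return (
        f"CREATE EXTERNAL TABLE {table} (id STRING, sender STRING) "
        f"STORED BY '{ES_HANDLER}' "
        f"TBLPROPERTIES('es.resource' = '{resource}', 'es.query' = '{es_query}')"
    )


queries = [
    {"query": {"match": {"sender": "M"}}},
    {"query": {"match": {"sender": "F"}}},
]

# Each distinct query needs its own table definition, hence its own metadata entry.
statements = [
    create_table_ddl(f"table{number}", "people", query)
    for number, query in enumerate(queries, start=1)
]
for ddl in statements:
    print(ddl)
```

The number of table definitions grows linearly with the number of distinct queries, which is the metadata flooding the text describes.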
Taking the query statement SELECT * FROM <table> WHERE match('{"query": {"match": {"sender": "M"}}}') as an example, the flow of the Hive index method based on conditional push-down to Elasticsearch is as follows:
Step 1: the Hive engine performs SQL syntax parsing on the DDL query statement SELECT * FROM <table> WHERE match('{"query": {"match": {"sender": "M"}}}') to generate an abstract syntax tree.
Step 2: the query condition part match('{"query": {"match": {"sender": "M"}}}') is obtained from the abstract syntax tree; this query expression has a tree-shaped syntax structure.
Step 3: the query expression is parsed to obtain the sub-expression {"query": {"match": {"sender": "M"}}}.
Step 4: DSLBuilder (DSL: domain-specific language) creates an Elasticsearch RESTful query statement.
Step 5: the query statement is pushed down to the Elasticsearch engine.
Step 6: the result of the Elasticsearch engine query is returned to the Hive engine.
Step 7: the Hive engine maps the returned result set into a two-dimensional relation and presents it in the form of a data table.
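The last two steps, mapping the returned result set into a two-dimensional relation, might look like the following sketch. The response layout (`hits.hits[]._source`) follows the standard Elasticsearch search response; the function name `hits_to_rows`, the column list, and the sample data are hypothetical.

```python
def hits_to_rows(response: dict, columns: list) -> list:
    """Flatten search hits into fixed-width rows, one tuple per hit."""
    rows = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        # Fields absent from a document become None, like NULL in a table.
        rows.append(tuple(source.get(column) for column in columns))
    return rows


# Hypothetical response, trimmed to the fields the mapping needs.
response = {
    "hits": {
        "hits": [
            {"_id": "r2", "_source": {"name": "bob", "sender": "M"}},
            {"_id": "r3", "_source": {"name": "carl"}},
        ]
    }
}

columns = ["name", "sender"]
table = hits_to_rows(response, columns)
print(columns)
for row in table:
    print(row)
```

Fixing the column list up front is what turns schemaless documents into a relation with a stable shape.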
The table-building and query statements of the invention are as follows:
Therefore, when a client needs to make two queries, the invention needs to establish only one table to serve both.
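As a rough illustration of this contrast, the sketch below issues two different queries against a single table by changing only the argument of the match(String) UDF described above. The table name `t` and the helper `select_with_match` are hypothetical, and the original statements from the patent are not reproduced here.

```python
import json


def select_with_match(table: str, query: dict) -> str:
    """Build a SELECT whose condition carries the Elasticsearch query as a string."""
    return f"SELECT * FROM {table} WHERE match('{json.dumps(query)}')"


first = {"query": {"match": {"sender": "M"}}}
second = {"query": {"match": {"sender": "F"}}}

# Both statements target the same table; only the pushed-down condition differs.
selects = [select_with_match("t", query) for query in (first, second)]
for statement in selects:
    print(statement)
```

No additional table definition is needed for the second query, so the metadata stays constant as the number of queries grows.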
It is to be understood that the features listed above for the different embodiments may be combined with each other to form further embodiments within the scope of the invention, where technically feasible. Furthermore, the particular examples and embodiments of the invention described are non-limiting, and various modifications may be made in the structure, steps, and sequence set forth above without departing from the scope of the invention.
The above-described embodiments, particularly any "preferred" embodiments, are possible examples of implementations, and are presented merely for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and principles of the technology described herein, and such variations and modifications are to be considered within the scope of the invention.
Claims (4)
1. A Hive index method based on conditional push-down to Elasticsearch, characterized by comprising the following steps:
(1) establishing, in the Elasticsearch engine, an index table of a Hive original data table;
(2) establishing an associated foreign key between the index table and the Hive original data table, so that the query result returned by the Elasticsearch engine after a condition is pushed down to it is an attribute of the Hive original data table;
(3) the Hive engine acquires a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes the converted statement down to the Elasticsearch engine;
(4) the Elasticsearch engine executes the query and returns the query result;
(5) the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client.
2. The Hive index method based on conditional push-down to Elasticsearch of claim 1, wherein converting the DDL query statement into a query statement executable by the Elasticsearch storage engine specifically comprises: parsing the DDL query statement into an abstract syntax tree and obtaining the query expression from the abstract syntax tree; parsing the RESTful query portion of the query expression; and creating an Elasticsearch RESTful query statement through DSLBuilder.
3. The Hive index method based on conditional push-down to Elasticsearch of claim 2, wherein the query expression of the abstract syntax tree is obtained as follows:
the query expression of the abstract syntax tree is acquired through the Hive UDF match(String), where the String parameter of match(String) is an Elasticsearch query statement.
4. The Hive index method based on conditional push-down to Elasticsearch of claim 2, further comprising the following step:
when an insert, delete, or update operation is performed on the Hive original data table, the Hive engine generates, from the operation information of the Hive original data table, the logic that maps the updated data of the Hive original data table to updated data of the index table, and updates the index table according to the generated logic.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911378666.XA CN111159185A (en) | 2019-12-27 | 2019-12-27 | Hive index method based on conditional push-down elastic search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111159185A true CN111159185A (en) | 2020-05-15 |
Family
ID=70558631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911378666.XA Pending CN111159185A (en) | 2019-12-27 | 2019-12-27 | Hive index method based on conditional push-down elastic search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111159185A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506464A (en) * | 2017-08-30 | 2017-12-22 | 武汉烽火众智数字技术有限责任公司 | A kind of method that HBase secondary indexs are realized based on ES |
CN109299102A (en) * | 2018-10-23 | 2019-02-01 | 中国电子科技集团公司第二十八研究所 | A kind of HBase secondary index system and method based on Elastcisearch |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200515 |