CN111159185A - Hive index method based on conditional push-down elastic search - Google Patents

Info

Publication number
CN111159185A
Authority
CN
China
Prior art keywords
hive
engine
query
elasticsearch
original data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911378666.XA
Other languages
Chinese (zh)
Inventor
于伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unicloud Nanjing Digital Technology Co Ltd
Original Assignee
Unicloud Nanjing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unicloud Nanjing Digital Technology Co Ltd filed Critical Unicloud Nanjing Digital Technology Co Ltd
Priority to CN201911378666.XA priority Critical patent/CN111159185A/en
Publication of CN111159185A publication Critical patent/CN111159185A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a Hive index method based on conditional push-down to Elasticsearch, comprising the following steps: establishing, in an Elasticsearch engine, an index table for a Hive original data table; establishing an associated foreign key between the index table and the Hive original data table, so that the result the Elasticsearch engine returns for a pushed-down condition is an attribute of the Hive original data table; the Hive engine obtains a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes it down to the Elasticsearch engine; the Elasticsearch engine executes the query and returns a query result; the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client. With the invention, a single table can be established and then used for multiple queries, which reduces metadata flooding.

Description

Hive index method based on conditional push-down elastic search
Technical Field
The invention relates to the technical field of database indexing, and in particular to a Hive index method based on conditional push-down to Elasticsearch.
Background
The Hive-on-Elasticsearch plug-in is developed by Elastic and implements Hive's StorageHandler mechanism. As with the Hive-on-HBase mechanism, once the plug-in is loaded, Hive's storage data source is no longer limited to HDFS but can also be an external storage engine such as HBase or Elasticsearch.
Create, read, update, and delete operations can be performed on the Elasticsearch engine through a Hive client or the Hive JDBC interface:
1) a create table statement can create a table in the Elasticsearch engine;
2) data can be inserted into the Elasticsearch engine through insert values statements;
3) a table in the Elasticsearch engine can be deleted through a drop table statement;
4) data in the Elasticsearch engine can be queried and returned to the client through a select query statement.
In the existing Hive system based on Elasticsearch, the query statement is expressed in DDL form, so if each query needs a different Elasticsearch query statement, a different table has to be created for each one. Creating a table is a heavyweight operation that requires manipulating the source database. DDL defines metadata, whereas a query is an exploratory (learning-type) operation. If there are many query statements, many database tables have to be created, which results in metadata flooding.
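The one-table-per-query pattern described above can be sketched as follows. This is an illustration only, assuming the es-hadoop storage handler and its es.resource and es.query table properties; the table names, columns, index name, and queries are hypothetical and are not taken from the patent.

```python
import json

# The storage handler class and the es.resource / es.query properties come from
# es-hadoop; table names, columns and the index name below are hypothetical.
ES_HANDLER = "org.elasticsearch.hadoop.hive.EsStorageHandler"


def create_table_ddl(table_name, es_resource, es_query):
    """Build one Hive DDL statement whose es.query property fixes the query."""
    query_json = json.dumps(es_query, separators=(",", ":"))
    return (
        f"CREATE EXTERNAL TABLE {table_name} (id BIGINT, name STRING, gender STRING)\n"
        f"STORED BY '{ES_HANDLER}'\n"
        f"TBLPROPERTIES('es.resource' = '{es_resource}', 'es.query' = '{query_json}')"
    )


# Two different queries force two different table definitions (two metadata entries).
queries = [
    {"query": {"match": {"gender": "M"}}},
    {"query": {"match": {"gender": "F"}}},
]
ddl_statements = [
    create_table_ddl(f"table{i}", "people", query)
    for i, query in enumerate(queries, start=1)
]

for statement in ddl_statements:
    print(statement, end=";\n\n")
```

Because the query is baked into the table definition, the number of tables grows with the number of distinct queries.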
Disclosure of Invention
Purpose of the invention: to overcome the shortcomings of the prior art, the invention provides a Hive index method based on conditional push-down to Elasticsearch, with which a single table can be established and then used for multiple queries.
Technical solution: to achieve the above technical effect, the invention provides the following technical solution:
The Hive index method based on conditional push-down to Elasticsearch comprises the following steps:
(1) establishing, in an Elasticsearch engine, an index table for a Hive original data table;
(2) establishing an associated foreign key between the index table and the Hive original data table, so that the result the Elasticsearch engine returns for a pushed-down condition is an attribute of the Hive original data table;
(3) the Hive engine obtains a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes it down to the Elasticsearch engine;
(4) the Elasticsearch engine executes the query and returns a query result;
(5) the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client.
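The five steps above amount to a two-phase lookup: the index answers the condition with foreign keys, and those keys are then used to read the original table. The sketch below simulates this with in-memory data; the row layout, the row_id foreign key, and the function names are hypothetical, and the filter stands in for a real Elasticsearch query.

```python
# Hypothetical Hive original data table: primary key -> full row.
hive_rows = {
    1: {"id": 1, "name": "Li", "gender": "M", "city": "Nanjing"},
    2: {"id": 2, "name": "Zhao", "gender": "F", "city": "Wuhan"},
    3: {"id": 3, "name": "Wang", "gender": "M", "city": "Beijing"},
}

# Hypothetical index table: only the indexed field plus the foreign key (row_id).
index_docs = [
    {"row_id": 1, "gender": "M"},
    {"row_id": 2, "gender": "F"},
    {"row_id": 3, "gender": "M"},
]


def search_index(docs, field, value):
    """Stand-in for the pushed-down query: returns foreign keys only."""
    return [doc["row_id"] for doc in docs if doc.get(field) == value]


def query_original_table(rows, keys):
    """Fetch full rows from the original table by the returned keys."""
    return [rows[key] for key in keys if key in rows]


def indexed_select(field, value):
    keys = search_index(index_docs, field, value)   # steps (3) and (4)
    return query_original_table(hive_rows, keys)    # step (5)


result = indexed_select("gender", "M")
print([row["name"] for row in result])
```

The same index table and original table serve any condition passed to indexed_select, so no new table is needed per query.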
Further, converting the DDL query statement into a query statement executable by the Elasticsearch storage engine specifically comprises: parsing the DDL query statement into an abstract syntax tree to obtain the query expression of the abstract syntax tree; parsing the RESTful query portion of the query expression; and creating an Elasticsearch RESTful query statement through DSLBuilder.
Further, the query expression of the abstract syntax tree is obtained as follows: the query expression is obtained through the Hive UDF match(String), where the String parameter of match(String) is an Elasticsearch query statement.
Further, the method also comprises the following step:
when an insert, delete, or update operation is performed on the Hive original data table, the Hive engine generates, from the operation information of the Hive original data table, the logic that maps the updated data of the original data table to updated data of the index table, and updates the index table according to the generated logic.
Beneficial effects: compared with the prior art, the invention has the following advantages:
The invention optimizes Hive's interaction with Elasticsearch through conditional push-down. Query conditions can therefore be pushed down to the Elasticsearch engine for execution without re-creating database tables, which reduces both the system's metadata and its load.
Drawings
FIG. 1 is a flowchart of a Hive index method based on conditional push-down to Elasticsearch according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a prior-art Hive index flow based on Elasticsearch.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments. It is to be understood that the present invention may be embodied in various forms, and that there is no intention to limit the invention to the specific embodiments illustrated, but on the contrary, the intention is to cover some exemplary and non-limiting embodiments shown in the attached drawings and described below.
FIG. 1 shows an example of the Hive index method based on conditional push-down to Elasticsearch according to the present invention, which comprises the following steps:
(1) establishing, in an Elasticsearch engine, an index table for a Hive original data table;
(2) establishing an associated foreign key between the index table and the Hive original data table, so that the result the Elasticsearch engine returns for a pushed-down condition is an attribute of the Hive original data table;
(3) the Hive engine obtains a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes it down to the Elasticsearch engine;
(4) the Elasticsearch engine executes the query and returns a query result;
(5) the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client.
In step (3), the DDL query statement is converted into a query statement executable by the Elasticsearch storage engine. Specifically: the DDL query statement is parsed into an abstract syntax tree to obtain the query expression of the abstract syntax tree; the RESTful query portion of the query expression is parsed; and an Elasticsearch RESTful query statement is created through DSLBuilder. The query expression of the abstract syntax tree is obtained through the Hive UDF match(String), where the String parameter of match(String) is an Elasticsearch query statement.
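The conversion in step (3) can be illustrated with a minimal sketch. It assumes a regular expression as a stand-in for the abstract-syntax-tree parsing the text describes, and a plain dictionary as a stand-in for the DSLBuilder output; the table name people and index name people_index are hypothetical. Only the `_search` endpoint is taken from the Elasticsearch REST API.

```python
import json
import re

# Stand-in for AST parsing: locate the match('...') call in the statement.
MATCH_CALL = re.compile(r"\bmatch\s*\(\s*'(?P<dsl>.*)'\s*\)", re.IGNORECASE | re.DOTALL)


def extract_match_argument(sql):
    """Return the JSON query passed to match(), or None if there is no match() call."""
    found = MATCH_CALL.search(sql)
    if found is None:
        return None
    return json.loads(found.group("dsl"))


def build_search_request(index, dsl):
    """Stand-in for DSLBuilder: describe the REST search request to push down."""
    return {"method": "POST", "path": f"/{index}/_search", "body": json.dumps(dsl)}


sql = """SELECT * FROM people WHERE match('{"query": {"match": {"gender": "M"}}}')"""
dsl = extract_match_argument(sql)
request = build_search_request("people_index", dsl)
print(request["method"], request["path"])
print(request["body"])
```

A real implementation would walk Hive's parsed syntax tree rather than use a regular expression, which cannot handle nested quoting or multiple predicates reliably.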
In one or more embodiments of the Hive index method based on conditional push-down to Elasticsearch according to the present invention, the method further comprises: when an insert, delete, or update operation is performed on the Hive original data table, the Hive engine generates, from the operation information of the Hive original data table, the logic that maps the updated data of the original data table to updated data of the index table, and updates the index table according to the generated logic.
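One way to picture this synchronization is a function that translates each operation on the original table into a corresponding operation on the index table. The sketch below is a simplified in-memory model under stated assumptions: the operation names, the id key, and the dictionary used as the index table are all hypothetical.

```python
def to_index_operation(op, row, indexed_fields, key="id"):
    """Translate an operation on the original table into an index-table operation."""
    if op in ("insert", "update"):
        # Keep only the indexed fields plus the foreign key back to the original row.
        doc = {field: row[field] for field in indexed_fields if field in row}
        doc[key] = row[key]
        return {"action": "index", "id": row[key], "doc": doc}
    if op == "delete":
        return {"action": "delete", "id": row[key]}
    raise ValueError(f"unsupported operation: {op}")


def apply_to_index(index, operation):
    """Apply a generated operation to the in-memory index table."""
    if operation["action"] == "index":
        index[operation["id"]] = operation["doc"]
    else:
        index.pop(operation["id"], None)


index_table = {}
operations = [
    ("insert", {"id": 1, "name": "Li", "gender": "M", "city": "Nanjing"}),
    ("insert", {"id": 2, "name": "Zhao", "gender": "F", "city": "Wuhan"}),
    ("update", {"id": 1, "name": "Li", "gender": "F", "city": "Nanjing"}),
    ("delete", {"id": 2}),
]
for op, row in operations:
    apply_to_index(index_table, to_index_operation(op, row, ["gender"]))

print(index_table)
```

Treating insert and update identically (overwrite by key) keeps the index table idempotent with respect to replayed operations.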
The technical effects of the present invention will be described below in conjunction with the prior art.
The existing Hive-on-Elasticsearch table-creation statement is as follows:
[Figures BDA0002341691180000031 and BDA0002341691180000041: table-creation statement; images not reproduced in the text]
If each query needs a different Elasticsearch query statement, a different table has to be created for each one. As shown in FIG. 2, if the required query is {"query": {"match": {"gender": "M"}}}, the table's query attribute has to be set to that query when the table is created. The query is then run with SELECT * FROM table1;. Queries are clearly exploratory (learning-type) in nature, so expressing a SELECT through DDL in this way is unfriendly and causes metadata flooding. For example, if a user wants to run two queries, the existing Hive-on-Elasticsearch system requires the following operations:
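Create table 1 and query:
[Figure BDA0002341691180000042: statements for creating and querying table 1; image not reproduced in the text]
Create table 2 and query:
[Figures BDA0002341691180000043 and BDA0002341691180000051: statements for creating and querying table 2; images not reproduced in the text]
That is, the user has to create two tables to run two queries.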
Taking the query statement SELECT * FROM [table] WHERE match('{"query": {"match": {"gender": "M"}}}') as an example, the flow of the Hive index method based on conditional push-down to Elasticsearch is as follows:
Step 1: The Hive engine performs SQL syntax parsing on the DDL query statement SELECT * FROM [table] WHERE match('{"query": {"match": {"gender": "M"}}}') to generate an abstract syntax tree.
Step 2: The query condition part, match('{"query": {"match": {"gender": "M"}}}'), is obtained from the abstract syntax tree; this query expression has a tree-shaped syntax structure.
Step 3: The query expression is parsed to obtain the sub-expression {"query": {"match": {"gender": "M"}}}.
Step 4: DSLBuilder (DSL: domain-specific language) creates an Elasticsearch RESTful query statement.
Step 5: The query statement is pushed down to the Elasticsearch engine.
Step 6: The Elasticsearch engine's query result is returned to the Hive engine.
Step 7: The Hive engine maps the returned result set into a two-dimensional relation and presents it in the form of a data table.
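The final mapping step, from a returned result set to a two-dimensional relation, might look like the sketch below. It assumes the standard shape of an Elasticsearch search response (hits.hits[]._source); the sample response, column list, and function names are hypothetical.

```python
def hits_to_rows(response, columns):
    """Map each returned hit to one row, with one value per requested column."""
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        # Fields missing from a document become None (NULL in the table view).
        rows.append(tuple(source.get(column) for column in columns))
    return rows


def format_table(columns, rows):
    """Render the header and rows as tab-separated lines."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join("NULL" if value is None else str(value) for value in row))
    return "\n".join(lines)


# Hypothetical response in the shape returned by an Elasticsearch search request.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"id": 1, "name": "Li", "gender": "M"}},
            {"_id": "3", "_source": {"id": 3, "gender": "M"}},
        ]
    }
}

columns = ["id", "name", "gender"]
rows = hits_to_rows(sample_response, columns)
print(format_table(columns, rows))
```

Projecting every hit onto a fixed column list is what turns schema-free documents into a rectangular result that a SQL client can display.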
The table-creation and query statements of the invention are as follows:
[Figures BDA0002341691180000052 and BDA0002341691180000061: table-creation and query statements of the invention; images not reproduced in the text]
Therefore, when a client needs to run two queries, the invention needs only one table to serve both.
It is to be understood that the features listed above for the different embodiments may be combined with each other to form further embodiments within the scope of the invention, where technically feasible. Furthermore, the particular examples and embodiments of the invention described are non-limiting, and various modifications may be made in the structure, steps, and sequence set forth above without departing from the scope of the invention.
The above-described embodiments, particularly any "preferred" embodiments, are possible examples of implementations, and are presented merely for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and principles of the technology described herein, and such variations and modifications are to be considered within the scope of the invention.

Claims (4)

1. A Hive index method based on conditional push-down to Elasticsearch, characterized by comprising the following steps:
(1) establishing, in an Elasticsearch engine, an index table for a Hive original data table;
(2) establishing an associated foreign key between the index table and the Hive original data table, so that the result the Elasticsearch engine returns for a pushed-down condition is an attribute of the Hive original data table;
(3) the Hive engine obtains a DDL query statement, converts it into a query statement executable by the Elasticsearch storage engine, and pushes it down to the Elasticsearch engine;
(4) the Elasticsearch engine executes the query and returns a query result;
(5) the Hive engine queries the Hive original data table according to the attribute information returned by the Elasticsearch engine and returns the query result to the client.
2. The Hive index method based on conditional push-down to Elasticsearch of claim 1, wherein converting the DDL query statement into a query statement executable by the Elasticsearch storage engine specifically comprises: parsing the DDL query statement into an abstract syntax tree to obtain the query expression of the abstract syntax tree; parsing the RESTful query portion of the query expression; and creating an Elasticsearch RESTful query statement through DSLBuilder.
3. The Hive index method based on conditional push-down to Elasticsearch of claim 2, wherein the query expression of the abstract syntax tree is obtained as follows:
the query expression of the abstract syntax tree is obtained through the Hive UDF match(String), where the String parameter of match(String) is an Elasticsearch query statement.
4. The Hive index method based on conditional push-down to Elasticsearch of claim 2, further comprising the following step:
when an insert, delete, or update operation is performed on the Hive original data table, the Hive engine generates, from the operation information of the Hive original data table, the logic that maps the updated data of the original data table to updated data of the index table, and updates the index table according to the generated logic.
CN201911378666.XA 2019-12-27 2019-12-27 Hive index method based on conditional push-down elastic search Pending CN111159185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911378666.XA CN111159185A (en) 2019-12-27 2019-12-27 Hive index method based on conditional push-down elastic search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911378666.XA CN111159185A (en) 2019-12-27 2019-12-27 Hive index method based on conditional push-down elastic search

Publications (1)

Publication Number Publication Date
CN111159185A true CN111159185A (en) 2020-05-15

Family

ID=70558631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911378666.XA Pending CN111159185A (en) 2019-12-27 2019-12-27 Hive index method based on conditional push-down elastic search

Country Status (1)

Country Link
CN (1) CN111159185A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506464A (en) * 2017-08-30 2017-12-22 武汉烽火众智数字技术有限责任公司 A kind of method that HBase secondary indexs are realized based on ES
CN109299102A (en) * 2018-10-23 2019-02-01 中国电子科技集团公司第二十八研究所 A kind of HBase secondary index system and method based on Elastcisearch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200515)