CN109739882B - Big data query optimization method based on Presto and Elasticissearch - Google Patents

Big data query optimization method based on Presto and Elasticissearch Download PDF

Info

Publication number
CN109739882B
CN109739882B CN201910010085.4A CN201910010085A CN109739882B CN 109739882 B CN109739882 B CN 109739882B CN 201910010085 A CN201910010085 A CN 201910010085A CN 109739882 B CN109739882 B CN 109739882B
Authority
CN
China
Prior art keywords
query
node
clause
elasticissearch
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910010085.4A
Other languages
Chinese (zh)
Other versions
CN109739882A (en
Inventor
洪灿榕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linewell Software Co Ltd
Original Assignee
Linewell Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linewell Software Co Ltd filed Critical Linewell Software Co Ltd
Priority to CN201910010085.4A priority Critical patent/CN109739882B/en
Publication of CN109739882A publication Critical patent/CN109739882A/en
Application granted granted Critical
Publication of CN109739882B publication Critical patent/CN109739882B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention relates to a big data query optimization method based on Presto and Elasticissearch, which stores all queried data in an Elasticissearch cluster, receives and analyzes an SQL request through the Presto cluster to generate a corresponding abstract syntax tree and an execution plan, converts a query condition into an Elasticissearch query request and issues the Elasticissearch query request to the Elasticissearch for pre-query. The invention utilizes the advantages of Presto SQL analysis and Elasticissearch quick query to pre-filter partial data as much as possible before retrieving the data, thereby greatly reducing the reading of redundant data and improving the performance of conditional query.

Description

Big data query optimization method based on Presto and Elasticissearch
Technical Field
The invention relates to a quick query method of big data, and particularly provides a big data query optimization method based on Presto and Elasticissearch.
Background
The Elasticissearch is a search engine built in the full text, Apache LuceneTMThe distributed search engine based on real-time analysis uses Lucene as a core to realize all the functions of indexing and searching, so that the content of each document can be indexed, searched, sorted and filtered. However, the elastic search lacks the traditional SQL syntax support, so that developers are difficult to use, and data migration and docking work of a system based on a relational database is difficult to perform. While Presto can provide basic SQL syntax support for the Elasticsearch, but the memory-based query mechanism also needs to pre-read almost the whole amount of target data into the cluster memory, and the data required by the end user is only dozens or even several of massive data, so that a large amount of server resources and time are consumed in the process for reading and filtering redundant data, and the query efficiency is greatly reduced.
Disclosure of Invention
The invention aims to provide a big data query optimization method based on Presto and Elasticissearch, which uses a Presto cluster to receive and analyze SQL, and utilizes the characteristics of Elasticissearch quick query and filtering to pre-filter partial data in the Elasticissearch cluster, thereby reducing reading of redundant data and realizing quick query of data.
In order to achieve the purpose, the invention adopts the technical scheme that:
a big data query optimization method based on Presto and Elasticissearch stores the queried data in the index of Elasticissearch cluster, when the user submits SQL query request to Presto cluster, the following steps are executed:
step 1, a Presto cluster receives and analyzes an SQL query request and generates a corresponding abstract syntax tree, and a query execution plan tree is generated according to the syntax tree, wherein the query execution plan tree is of a structure without branches from top to bottom;
the SQL query request comprises one or more of a where clause, an order by clause and a limit clause;
step 2, traversing, analyzing and executing each node in the plan tree, specifically as follows:
firstly, finding out a node supported by an Elasticsearch in an execution plan tree, wherein the name and the corresponding SQL operation of the node are respectively as follows:
Node name Function(s) Corresponding SQL operations
FilterNode Screening data where clause
TopNNode Sorting takes the first N rows order by clause + limit
LimitNode Summary top N lines limit clause
TableScanNode Reading all data in the table Single select query
Then, adding a Query parameter in the tablescan node for issuing a Query condition;
and finally, traversing all nodes in the query plan tree, and reading corresponding conditions according to the node types in the traversing process to issue:
when finding the FilterNode, indicating that the SQL Query request comprises a where clause, acquiring the content of the where clause through the filtering condition of the FilterNode, and transmitting the content to a Query in a head node TableSecanNode;
when TopNNode is found, the SQL Query request is shown to contain an order by clause and a limit clause at the same time, and then the contents of the order by clause and the limit clause are obtained through the count, the offset and the order by in the TopNNode and are transmitted to the Query in the first node TableScan node;
when a limit node is found, the SQL is shown to be used independently, and page-turning parameters are obtained through the count and the offset in the limit node and are transmitted to the Query in the first node TableSecanNode;
when finding a tablescan node, it indicates that all nodes of the execution plan have been traversed, and Query in the tablescan node already contains all Query conditions;
Step 3, the Presto cluster acquires Query parameters from the TableSecanNode and generates corresponding Query request parameters according to the parameter contents, wherein the where clause is converted into a filtering condition in an elastic search Query builder, and the order by and limit clauses are converted into a sequencing and data limiting condition in the Query builder;
step 4, sending the query request generated in the step 3 to an Elasticissearch cluster, performing query and pre-filtering on the Elasticissearch cluster, and returning an obtained result to the Presto cluster;
and 5, summarizing the data returned by each node by the Presto cluster, and generating a final result to the user.
After the scheme is adopted, all the queried data are stored in an Elasticissearch cluster, then a corresponding abstract syntax tree and a query plan are generated by receiving and analyzing an SQL request through a Presto cluster, the query request corresponding to the Elasticissearch is generated by analyzing the abstract syntax tree, on the premise of being supported by the Elasticissearch, the query condition is converted into the query request which can be identified by the Elasticissearch as much as possible, the execution plan is modified, the query condition is issued to the Elasticissearch, most data are filtered in advance by the Elasticissearch, the returned data basically meet the query condition requirements, and then the data are collected uniformly by the Presto finally obtain the result data. The invention utilizes the characteristics of fast query and filtration of the Elasticissearch to pre-filter partial data in the Elasticissearch cluster, reduces the reading of redundant data and realizes the fast query of the data.
Drawings
FIG. 1 is an overall system architecture of the present invention;
FIG. 2 is a query execution plan tree and process flow generated by SQL queries in Presto cluster.
Detailed Description
The invention discloses a big data query optimization method based on Presto and Elasticissearch, as shown in FIG. 1, the required system comprises a Presto cluster and an Elasticissearch cluster, a user can submit an SQL query request to the Presto cluster through various modes such as a command line tool, a jdbc client or a graphical interface development tool, the SQL request is received and analyzed by the Presto cluster, and then the query request is issued to the Elasticissearch cluster and data is read.
According to the big data query optimization method based on Presto and Elasticissearch, queried data are stored in the index of the Elasticissearch cluster, as shown in FIG. 2, when a user submits an SQL query request to the Presto cluster, the following steps are executed:
step 1, the Presto cluster receives and analyzes the SQL query request and generates a corresponding abstract syntax tree, and generates a query execution plan tree according to the syntax tree, wherein the query execution plan tree is of a structure without branches from top to bottom.
Wherein the SQL query request contains one or more of a where clause, an order by clause and a limit clause, for example
SELECT ID,NAME FROM EMPLOYEE WHERE ID>3 ORDER BY ID LIMIT 100,10。
Step 2, traversing, analyzing and executing each node in the plan tree, specifically as follows:
firstly, finding out a node supported by an Elasticsearch in an execution plan tree, wherein the name and the corresponding SQL operation of the node are respectively as follows:
node name Function(s) Corresponding SQL operations
FilterNode Screening data where clause
TopNNode Sorting takes the first N rows order by clause + limit
LimitNode Summary top N lines limit clause
TableScanNode Reading all data in the table Single select query
Then, adding a Query parameter in the tablescan node for issuing a Query condition;
and finally, traversing all nodes in the query plan tree, and reading corresponding conditions according to the node types in the traversing process to issue:
when finding the FilterNode, indicating that the SQL Query request comprises a where clause, acquiring the content of the where clause through the filtering condition of the FilterNode, and transmitting the content to a Query in a head node TableSecanNode;
when TopNNode is found, the SQL Query request is shown to contain an order by clause and a limit clause at the same time, and then the contents of the order by clause and the limit clause are obtained through the count, the offset and the order by in the TopNNode and are transmitted to the Query in the first node TableScan node;
when a limit node is found, the SQL is shown to be used independently, and page-turning parameters are obtained through the count and the offset in the limit node and are transmitted to the Query in the first node TableSecanNode;
When a tablescan node is found, it means that all nodes executing the plan have been traversed, and Query in the tablescan node already contains all Query conditions.
Step 3, the Presto cluster acquires Query parameters from the tableScanNode and generates corresponding Query request parameters according to the parameter contents, wherein the where clause is converted into a filtering condition in an elastic search Query builder, and the order by and limit clauses are converted into a sorting and data limiting condition in the Query builder;
and 4, sending the query request query builder generated in the step 3 to the Elasticissearch cluster, converting the query and sequencing conditions into an Elasticissearch request, then querying and pre-filtering the Elasticissearch cluster, and returning the obtained result to the Presto cluster.
And 5, summarizing the data returned by each node by the Presto cluster, and generating a final result to the user.
The invention stores all the inquired data in an Elasticissearch cluster, then receives and analyzes an SQL request through a Presto cluster to generate a corresponding abstract syntax tree and an inquiry plan, generates the corresponding inquiry request of the Elasticissearch by analyzing the abstract syntax tree, converts the inquiry condition into the inquiry request which can be identified by the Elasticissearch as far as possible under the premise of being supported by the Elasticissearch, modifies an execution plan, sends the inquiry condition to the Elasticissearch, filters most data in advance by the Elasticissearch, returns the data which basically meets the requirement of the inquiry condition, and then is collected uniformly by Presto finally obtain the result data. When Presto queries one index in the elastic search, all data in the index needs to be read out (such as sorting, aggregation, and the like), while one index in the elastic search usually stores a large amount of data, reading all data out completely takes a long time, server resources consume a very large amount of resources, and only hundreds of data or even tens of data actually used may be used, which results in a very large waste of time and resources. The invention utilizes the characteristics of fast query and filtration of the Elasticissearch to pre-filter partial data in the Elasticissearch cluster, reduces the reading of redundant data and realizes the fast query of the data.
The above description is only exemplary of the present invention and is not intended to limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above exemplary embodiments according to the technical spirit of the present invention are within the technical scope of the present invention.

Claims (1)

1. A big data query optimization method based on Presto and Elasticissearch is characterized by comprising the following steps: the optimization method stores the queried data in the index of the Elasticissearch cluster, and executes the following steps when a user submits an SQL query request to the Presto cluster:
step 1, a Presto cluster receives and analyzes an SQL query request and generates a corresponding abstract syntax tree, and a query execution plan tree is generated according to the syntax tree, wherein the query execution plan tree is of a structure without branches from top to bottom;
the SQL query request comprises one or more of a where clause, an order by clause and a limit clause;
step 2, traversing, analyzing and executing each node in the plan tree, specifically as follows:
firstly, finding out a node supported by an Elasticsearch in an execution plan tree, wherein the name and the corresponding SQL operation of the node are respectively as follows:
node name Function(s) Corresponding SQL operations FilterNode Screening data where clause TopNNode Sorting takes the first N rows order by clause + limit LimitNode Summary top N lines limit clause TableScanNode Reading all data in the table Single select query
Then, adding a Query parameter in the tablescan node for issuing a Query condition;
and finally, traversing all nodes in the query plan tree, reading corresponding conditions according to the node types in the traversing process, and issuing the conditions:
when finding the FilterNode, indicating that the SQL Query request comprises a where clause, acquiring the content of the where clause through the filtering condition of the FilterNode, and transmitting the content to a Query in a head node TableSecanNode;
when TopNNode is found, the SQL Query request is shown to contain an order by clause and a limit clause at the same time, and then the contents of the order by clause and the limit clause are obtained through the count, the offset and the order by in the TopNNode and are transmitted to the Query in the first node TableScan node;
when a limit node is found, the SQL is shown to be used independently, and page-turning parameters are obtained through the count and the offset in the limit node and are transmitted to the Query in the first node TableSecanNode;
when finding a tablescan node, it indicates that all nodes of the execution plan have been traversed, and Query in the tablescan node already contains all Query conditions;
Step 3, the Presto cluster acquires Query parameters from the tableScanNode and generates corresponding Query request parameters according to the parameter contents, wherein the where clause is converted into a filtering condition in an elastic search Query builder, and the order by and limit clauses are converted into a sorting and data limiting condition in the Query builder;
step 4, sending the query request generated in the step 3 to an Elasticisearch cluster, performing query and pre-filtering on the Elasticisearch cluster, and returning an obtained result to the Presto cluster;
and 5, summarizing the data returned by each node by the Presto cluster, and generating a final result to the user.
CN201910010085.4A 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch Active CN109739882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910010085.4A CN109739882B (en) 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910010085.4A CN109739882B (en) 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch

Publications (2)

Publication Number Publication Date
CN109739882A CN109739882A (en) 2019-05-10
CN109739882B true CN109739882B (en) 2022-06-10

Family

ID=66363435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910010085.4A Active CN109739882B (en) 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch

Country Status (1)

Country Link
CN (1) CN109739882B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399386B (en) * 2019-07-15 2022-05-10 南威软件股份有限公司 Presto-based SQL UPDATE method and control system
CN114168622A (en) * 2020-09-10 2022-03-11 北京达佳互联信息技术有限公司 Data query method and device based on domain specific language
CN112800104A (en) * 2020-12-08 2021-05-14 江苏苏宁云计算有限公司 Method and device for optimizing ES query request link
CN114500543B (en) * 2022-01-13 2024-01-23 徐工汉云技术股份有限公司 Distributed elastic edge acquisition system and application method thereof

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467686B2 (en) * 2007-02-09 2019-11-05 Xcira, Inc. Integrated auctioning environment platform
US11474874B2 (en) * 2014-08-14 2022-10-18 Qubole, Inc. Systems and methods for auto-scaling a big data system
CN106528797A (en) * 2016-11-10 2017-03-22 上海轻维软件有限公司 DSL query method based on Elasticsearch
CN106649630A (en) * 2016-12-07 2017-05-10 乐视控股(北京)有限公司 Data query method and device
CN106934062B (en) * 2017-03-28 2020-05-19 广东工业大学 Implementation method and system for querying elastic search
CN107133267B (en) * 2017-04-01 2021-01-26 北京京东尚科信息技术有限公司 Method and device for querying elastic search cluster, electronic equipment and readable storage medium
CN107729428A (en) * 2017-09-28 2018-02-23 南威软件股份有限公司 A kind of SQL query method based on Presto and Elasticsearch
CN107748766B (en) * 2017-09-28 2021-08-24 南威软件股份有限公司 Big data fast query method based on Presto and elastic search

Also Published As

Publication number Publication date
CN109739882A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109739882B (en) Big data query optimization method based on Presto and Elasticissearch
Li Human-in-the-loop data integration
CN103064875B (en) A kind of spatial service data distributed enquiring method
CN106777108A (en) A kind of data query method and apparatus based on mixing storage architecture
CN102799622B (en) Distributed structured query language (SQL) query method based on MapReduce expansion framework
CN104899314B (en) A kind of parentage analysis method and apparatus of data warehouse
CN101021874B (en) Method and apparatus for optimizing request to poll SQL
CN111382226B (en) Database query and retrieval method and device and electronic equipment
US20110072008A1 (en) Query Optimization with Awareness of Limited Resource Usage
CN103927346B (en) Query connection method on basis of data volumes
WO2017170459A1 (en) Method, program, and system for automatic discovery of relationship between fields in environment where different types of data sources coexist
CN103729392A (en) Method for optimizing query and query complier
CN107301205A (en) A kind of distributed Query method in real time of big data and system
CN107748766B (en) Big data fast query method based on Presto and elastic search
CN106599052A (en) Data query system based on ApacheKylin, and method thereof
JP6159908B1 (en) Method, program, and system for automatic discovery of relationships between fields in a heterogeneous data source mixed environment
CN106815353A (en) A kind of method and apparatus of data query
Yafooz et al. Managing unstructured data in relational databases
CN102819600B (en) Keyword search methodology towards relational database of power production management system
JPWO2017170459A6 (en) Method, program, and system for automatic discovery of relationships between fields in a heterogeneous data source mixed environment
CN107729428A (en) A kind of SQL query method based on Presto and Elasticsearch
CN113934750A (en) Data blood relationship analysis method based on compiling mode
KR102345410B1 (en) Big data intelligent collecting method and device
CN104714956A (en) Comparison method and device for isomerism record sets
CN116010439A (en) Visual Chinese SQL system and query construction method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant