CN109739882B - Big data query optimization method based on Presto and Elasticissearch - Google Patents
Big data query optimization method based on Presto and Elasticissearch Download PDFInfo
- Publication number
- CN109739882B CN109739882B CN201910010085.4A CN201910010085A CN109739882B CN 109739882 B CN109739882 B CN 109739882B CN 201910010085 A CN201910010085 A CN 201910010085A CN 109739882 B CN109739882 B CN 109739882B
- Authority
- CN
- China
- Prior art keywords
- query
- node
- clause
- elasticissearch
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Abstract
The invention relates to a big data query optimization method based on Presto and Elasticissearch, which stores all queried data in an Elasticissearch cluster, receives and analyzes an SQL request through the Presto cluster to generate a corresponding abstract syntax tree and an execution plan, converts a query condition into an Elasticissearch query request and issues the Elasticissearch query request to the Elasticissearch for pre-query. The invention utilizes the advantages of Presto SQL analysis and Elasticissearch quick query to pre-filter partial data as much as possible before retrieving the data, thereby greatly reducing the reading of redundant data and improving the performance of conditional query.
Description
Technical Field
The invention relates to a quick query method of big data, and particularly provides a big data query optimization method based on Presto and Elasticissearch.
Background
The Elasticissearch is a search engine built in the full text, Apache LuceneTMThe distributed search engine based on real-time analysis uses Lucene as a core to realize all the functions of indexing and searching, so that the content of each document can be indexed, searched, sorted and filtered. However, the elastic search lacks the traditional SQL syntax support, so that developers are difficult to use, and data migration and docking work of a system based on a relational database is difficult to perform. While Presto can provide basic SQL syntax support for the Elasticsearch, but the memory-based query mechanism also needs to pre-read almost the whole amount of target data into the cluster memory, and the data required by the end user is only dozens or even several of massive data, so that a large amount of server resources and time are consumed in the process for reading and filtering redundant data, and the query efficiency is greatly reduced.
Disclosure of Invention
The invention aims to provide a big data query optimization method based on Presto and Elasticissearch, which uses a Presto cluster to receive and analyze SQL, and utilizes the characteristics of Elasticissearch quick query and filtering to pre-filter partial data in the Elasticissearch cluster, thereby reducing reading of redundant data and realizing quick query of data.
In order to achieve the purpose, the invention adopts the technical scheme that:
a big data query optimization method based on Presto and Elasticissearch stores the queried data in the index of Elasticissearch cluster, when the user submits SQL query request to Presto cluster, the following steps are executed:
step 1, a Presto cluster receives and analyzes an SQL query request and generates a corresponding abstract syntax tree, and a query execution plan tree is generated according to the syntax tree, wherein the query execution plan tree is of a structure without branches from top to bottom;
the SQL query request comprises one or more of a where clause, an order by clause and a limit clause;
step 2, traversing, analyzing and executing each node in the plan tree, specifically as follows:
firstly, finding out a node supported by an Elasticsearch in an execution plan tree, wherein the name and the corresponding SQL operation of the node are respectively as follows:
Node name | Function(s) | Corresponding SQL operations |
FilterNode | Screening data | where clause |
TopNNode | Sorting takes the first N rows | order by clause + limit |
LimitNode | Summary top N lines | limit clause |
TableScanNode | Reading all data in the table | Single select query |
Then, adding a Query parameter in the tablescan node for issuing a Query condition;
and finally, traversing all nodes in the query plan tree, and reading corresponding conditions according to the node types in the traversing process to issue:
when finding the FilterNode, indicating that the SQL Query request comprises a where clause, acquiring the content of the where clause through the filtering condition of the FilterNode, and transmitting the content to a Query in a head node TableSecanNode;
when TopNNode is found, the SQL Query request is shown to contain an order by clause and a limit clause at the same time, and then the contents of the order by clause and the limit clause are obtained through the count, the offset and the order by in the TopNNode and are transmitted to the Query in the first node TableScan node;
when a limit node is found, the SQL is shown to be used independently, and page-turning parameters are obtained through the count and the offset in the limit node and are transmitted to the Query in the first node TableSecanNode;
when finding a tablescan node, it indicates that all nodes of the execution plan have been traversed, and Query in the tablescan node already contains all Query conditions;
Step 3, the Presto cluster acquires Query parameters from the TableSecanNode and generates corresponding Query request parameters according to the parameter contents, wherein the where clause is converted into a filtering condition in an elastic search Query builder, and the order by and limit clauses are converted into a sequencing and data limiting condition in the Query builder;
step 4, sending the query request generated in the step 3 to an Elasticissearch cluster, performing query and pre-filtering on the Elasticissearch cluster, and returning an obtained result to the Presto cluster;
and 5, summarizing the data returned by each node by the Presto cluster, and generating a final result to the user.
After the scheme is adopted, all the queried data are stored in an Elasticissearch cluster, then a corresponding abstract syntax tree and a query plan are generated by receiving and analyzing an SQL request through a Presto cluster, the query request corresponding to the Elasticissearch is generated by analyzing the abstract syntax tree, on the premise of being supported by the Elasticissearch, the query condition is converted into the query request which can be identified by the Elasticissearch as much as possible, the execution plan is modified, the query condition is issued to the Elasticissearch, most data are filtered in advance by the Elasticissearch, the returned data basically meet the query condition requirements, and then the data are collected uniformly by the Presto finally obtain the result data. The invention utilizes the characteristics of fast query and filtration of the Elasticissearch to pre-filter partial data in the Elasticissearch cluster, reduces the reading of redundant data and realizes the fast query of the data.
Drawings
FIG. 1 is an overall system architecture of the present invention;
FIG. 2 is a query execution plan tree and process flow generated by SQL queries in Presto cluster.
Detailed Description
The invention discloses a big data query optimization method based on Presto and Elasticissearch, as shown in FIG. 1, the required system comprises a Presto cluster and an Elasticissearch cluster, a user can submit an SQL query request to the Presto cluster through various modes such as a command line tool, a jdbc client or a graphical interface development tool, the SQL request is received and analyzed by the Presto cluster, and then the query request is issued to the Elasticissearch cluster and data is read.
According to the big data query optimization method based on Presto and Elasticissearch, queried data are stored in the index of the Elasticissearch cluster, as shown in FIG. 2, when a user submits an SQL query request to the Presto cluster, the following steps are executed:
step 1, the Presto cluster receives and analyzes the SQL query request and generates a corresponding abstract syntax tree, and generates a query execution plan tree according to the syntax tree, wherein the query execution plan tree is of a structure without branches from top to bottom.
Wherein the SQL query request contains one or more of a where clause, an order by clause and a limit clause, for example
SELECT ID,NAME FROM EMPLOYEE WHERE ID>3 ORDER BY ID LIMIT 100,10。
Step 2, traversing, analyzing and executing each node in the plan tree, specifically as follows:
firstly, finding out a node supported by an Elasticsearch in an execution plan tree, wherein the name and the corresponding SQL operation of the node are respectively as follows:
node name | Function(s) | Corresponding SQL operations |
FilterNode | Screening data | where clause |
TopNNode | Sorting takes the first N rows | order by clause + limit |
LimitNode | Summary top N lines | limit clause |
TableScanNode | Reading all data in the table | Single select query |
Then, adding a Query parameter in the tablescan node for issuing a Query condition;
and finally, traversing all nodes in the query plan tree, and reading corresponding conditions according to the node types in the traversing process to issue:
when finding the FilterNode, indicating that the SQL Query request comprises a where clause, acquiring the content of the where clause through the filtering condition of the FilterNode, and transmitting the content to a Query in a head node TableSecanNode;
when TopNNode is found, the SQL Query request is shown to contain an order by clause and a limit clause at the same time, and then the contents of the order by clause and the limit clause are obtained through the count, the offset and the order by in the TopNNode and are transmitted to the Query in the first node TableScan node;
when a limit node is found, the SQL is shown to be used independently, and page-turning parameters are obtained through the count and the offset in the limit node and are transmitted to the Query in the first node TableSecanNode;
When a tablescan node is found, it means that all nodes executing the plan have been traversed, and Query in the tablescan node already contains all Query conditions.
Step 3, the Presto cluster acquires Query parameters from the tableScanNode and generates corresponding Query request parameters according to the parameter contents, wherein the where clause is converted into a filtering condition in an elastic search Query builder, and the order by and limit clauses are converted into a sorting and data limiting condition in the Query builder;
and 4, sending the query request query builder generated in the step 3 to the Elasticissearch cluster, converting the query and sequencing conditions into an Elasticissearch request, then querying and pre-filtering the Elasticissearch cluster, and returning the obtained result to the Presto cluster.
And 5, summarizing the data returned by each node by the Presto cluster, and generating a final result to the user.
The invention stores all the inquired data in an Elasticissearch cluster, then receives and analyzes an SQL request through a Presto cluster to generate a corresponding abstract syntax tree and an inquiry plan, generates the corresponding inquiry request of the Elasticissearch by analyzing the abstract syntax tree, converts the inquiry condition into the inquiry request which can be identified by the Elasticissearch as far as possible under the premise of being supported by the Elasticissearch, modifies an execution plan, sends the inquiry condition to the Elasticissearch, filters most data in advance by the Elasticissearch, returns the data which basically meets the requirement of the inquiry condition, and then is collected uniformly by Presto finally obtain the result data. When Presto queries one index in the elastic search, all data in the index needs to be read out (such as sorting, aggregation, and the like), while one index in the elastic search usually stores a large amount of data, reading all data out completely takes a long time, server resources consume a very large amount of resources, and only hundreds of data or even tens of data actually used may be used, which results in a very large waste of time and resources. The invention utilizes the characteristics of fast query and filtration of the Elasticissearch to pre-filter partial data in the Elasticissearch cluster, reduces the reading of redundant data and realizes the fast query of the data.
The above description is only exemplary of the present invention and is not intended to limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above exemplary embodiments according to the technical spirit of the present invention are within the technical scope of the present invention.
Claims (1)
1. A big data query optimization method based on Presto and Elasticissearch is characterized by comprising the following steps: the optimization method stores the queried data in the index of the Elasticissearch cluster, and executes the following steps when a user submits an SQL query request to the Presto cluster:
step 1, a Presto cluster receives and analyzes an SQL query request and generates a corresponding abstract syntax tree, and a query execution plan tree is generated according to the syntax tree, wherein the query execution plan tree is of a structure without branches from top to bottom;
the SQL query request comprises one or more of a where clause, an order by clause and a limit clause;
step 2, traversing, analyzing and executing each node in the plan tree, specifically as follows:
firstly, finding out a node supported by an Elasticsearch in an execution plan tree, wherein the name and the corresponding SQL operation of the node are respectively as follows:
Then, adding a Query parameter in the tablescan node for issuing a Query condition;
and finally, traversing all nodes in the query plan tree, reading corresponding conditions according to the node types in the traversing process, and issuing the conditions:
when finding the FilterNode, indicating that the SQL Query request comprises a where clause, acquiring the content of the where clause through the filtering condition of the FilterNode, and transmitting the content to a Query in a head node TableSecanNode;
when TopNNode is found, the SQL Query request is shown to contain an order by clause and a limit clause at the same time, and then the contents of the order by clause and the limit clause are obtained through the count, the offset and the order by in the TopNNode and are transmitted to the Query in the first node TableScan node;
when a limit node is found, the SQL is shown to be used independently, and page-turning parameters are obtained through the count and the offset in the limit node and are transmitted to the Query in the first node TableSecanNode;
when finding a tablescan node, it indicates that all nodes of the execution plan have been traversed, and Query in the tablescan node already contains all Query conditions;
Step 3, the Presto cluster acquires Query parameters from the tableScanNode and generates corresponding Query request parameters according to the parameter contents, wherein the where clause is converted into a filtering condition in an elastic search Query builder, and the order by and limit clauses are converted into a sorting and data limiting condition in the Query builder;
step 4, sending the query request generated in the step 3 to an Elasticisearch cluster, performing query and pre-filtering on the Elasticisearch cluster, and returning an obtained result to the Presto cluster;
and 5, summarizing the data returned by each node by the Presto cluster, and generating a final result to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910010085.4A CN109739882B (en) | 2019-01-04 | 2019-01-04 | Big data query optimization method based on Presto and Elasticissearch |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910010085.4A CN109739882B (en) | 2019-01-04 | 2019-01-04 | Big data query optimization method based on Presto and Elasticissearch |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109739882A CN109739882A (en) | 2019-05-10 |
CN109739882B true CN109739882B (en) | 2022-06-10 |
Family
ID=66363435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910010085.4A Active CN109739882B (en) | 2019-01-04 | 2019-01-04 | Big data query optimization method based on Presto and Elasticissearch |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109739882B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399386B (en) * | 2019-07-15 | 2022-05-10 | 南威软件股份有限公司 | Presto-based SQL UPDATE method and control system |
CN114168622A (en) * | 2020-09-10 | 2022-03-11 | 北京达佳互联信息技术有限公司 | Data query method and device based on domain specific language |
CN112800104A (en) * | 2020-12-08 | 2021-05-14 | 江苏苏宁云计算有限公司 | Method and device for optimizing ES query request link |
CN114500543B (en) * | 2022-01-13 | 2024-01-23 | 徐工汉云技术股份有限公司 | Distributed elastic edge acquisition system and application method thereof |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10467686B2 (en) * | 2007-02-09 | 2019-11-05 | Xcira, Inc. | Integrated auctioning environment platform |
US11474874B2 (en) * | 2014-08-14 | 2022-10-18 | Qubole, Inc. | Systems and methods for auto-scaling a big data system |
CN106528797A (en) * | 2016-11-10 | 2017-03-22 | 上海轻维软件有限公司 | DSL query method based on Elasticsearch |
CN106649630A (en) * | 2016-12-07 | 2017-05-10 | 乐视控股(北京)有限公司 | Data query method and device |
CN106934062B (en) * | 2017-03-28 | 2020-05-19 | 广东工业大学 | Implementation method and system for querying elastic search |
CN107133267B (en) * | 2017-04-01 | 2021-01-26 | 北京京东尚科信息技术有限公司 | Method and device for querying elastic search cluster, electronic equipment and readable storage medium |
CN107729428A (en) * | 2017-09-28 | 2018-02-23 | 南威软件股份有限公司 | A kind of SQL query method based on Presto and Elasticsearch |
CN107748766B (en) * | 2017-09-28 | 2021-08-24 | 南威软件股份有限公司 | Big data fast query method based on Presto and elastic search |
-
2019
- 2019-01-04 CN CN201910010085.4A patent/CN109739882B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN109739882A (en) | 2019-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109739882B (en) | Big data query optimization method based on Presto and Elasticissearch | |
Li | Human-in-the-loop data integration | |
CN103064875B (en) | A kind of spatial service data distributed enquiring method | |
CN106777108A (en) | A kind of data query method and apparatus based on mixing storage architecture | |
CN102799622B (en) | Distributed structured query language (SQL) query method based on MapReduce expansion framework | |
CN104899314B (en) | A kind of parentage analysis method and apparatus of data warehouse | |
CN101021874B (en) | Method and apparatus for optimizing request to poll SQL | |
CN111382226B (en) | Database query and retrieval method and device and electronic equipment | |
US20110072008A1 (en) | Query Optimization with Awareness of Limited Resource Usage | |
CN103927346B (en) | Query connection method on basis of data volumes | |
WO2017170459A1 (en) | Method, program, and system for automatic discovery of relationship between fields in environment where different types of data sources coexist | |
CN103729392A (en) | Method for optimizing query and query complier | |
CN107301205A (en) | A kind of distributed Query method in real time of big data and system | |
CN107748766B (en) | Big data fast query method based on Presto and elastic search | |
CN106599052A (en) | Data query system based on ApacheKylin, and method thereof | |
JP6159908B1 (en) | Method, program, and system for automatic discovery of relationships between fields in a heterogeneous data source mixed environment | |
CN106815353A (en) | A kind of method and apparatus of data query | |
Yafooz et al. | Managing unstructured data in relational databases | |
CN102819600B (en) | Keyword search methodology towards relational database of power production management system | |
JPWO2017170459A6 (en) | Method, program, and system for automatic discovery of relationships between fields in a heterogeneous data source mixed environment | |
CN107729428A (en) | A kind of SQL query method based on Presto and Elasticsearch | |
CN113934750A (en) | Data blood relationship analysis method based on compiling mode | |
KR102345410B1 (en) | Big data intelligent collecting method and device | |
CN104714956A (en) | Comparison method and device for isomerism record sets | |
CN116010439A (en) | Visual Chinese SQL system and query construction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |