CN109739882A - A kind of big data enquiring and optimizing method based on Presto and Elasticsearch - Google Patents

A kind of big data enquiring and optimizing method based on Presto and Elasticsearch Download PDF

Info

Publication number
CN109739882A
CN109739882A CN201910010085.4A CN201910010085A CN109739882A CN 109739882 A CN109739882 A CN 109739882A CN 201910010085 A CN201910010085 A CN 201910010085A CN 109739882 A CN109739882 A CN 109739882A
Authority
CN
China
Prior art keywords
elasticsearch
clause
data
query
presto
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910010085.4A
Other languages
Chinese (zh)
Other versions
CN109739882B (en
Inventor
洪灿榕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linewell Software Co Ltd
Original Assignee
Linewell Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linewell Software Co Ltd filed Critical Linewell Software Co Ltd
Priority to CN201910010085.4A priority Critical patent/CN109739882B/en
Publication of CN109739882A publication Critical patent/CN109739882A/en
Application granted granted Critical
Publication of CN109739882B publication Critical patent/CN109739882B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Abstract

The present invention relates to a kind of big data enquiring and optimizing methods based on Presto and Elasticsearch, all data that are queried are stored in Elasticsearch cluster by it, then SQL request is received and parsed through by Presto cluster and generates corresponding abstract syntax tree and executive plan, querying condition is converted into Elasticsearch inquiry request and be issued in Elasticsearch to be inquired in advance.The present invention is using the advantage of Presto SQL parsing and Elasticsearch quick search, and filtration fraction data as preparatory as possible, are greatly decreased the reading of redundant data before fetching data, to improve the performance of condition query.

Description

A kind of big data enquiring and optimizing method based on Presto and Elasticsearch
Technical field
The present invention relates to the method for quickly querying of big data, specifically provide it is a kind of based on Presto and The big data enquiring and optimizing method of Elasticsearch.
Background technique
Elasticsearch is one and establishes in full-text search engine Apache LuceneTMOn the basis of in real time analyze Distributed search engine, the function that it, which uses Lucene as core, realizes all indexes and search for, so that each document Content can be indexed, searches for, sorts, filter.But elasticsearch lacks traditional SQL syntax and supports, developer Using more difficult, system its Data Migration, mating operation based on relevant database are also not easy to carry out.And Presto can be Elasticsearch provides basic SQL syntax and supports, but its inquiry mechanism memory-based is also required to the mesh almost full dose Data pre-head is marked into cluster memory, and data required for end user are tens even several in mass data, A large amount of server resource and time are consumed in the process for the reading of redundant data and overanxious, cause search efficiency significantly under Drop.
Summary of the invention
The purpose of the present invention is to provide a kind of big data query optimization sides based on Presto and Elasticsearch Method carries out the reception and parsing of SQL using Presto cluster, using the characteristic of Elasticsearch quick search and filtering, Preparatory filtration fraction data, reduce the reading of redundant data, realize the quick search of data in Elasticsearch cluster.
To achieve the above object, the technical solution adopted by the present invention is that:
A kind of big data enquiring and optimizing method based on Presto and Elasticsearch, the optimization method will be looked into The data of inquiry are stored in the index of Elasticsearch cluster, when user submits SQL query to request to Presto cluster, Execute following steps:
Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the language Method tree generates query execution plan tree, which is the not no structure of branch from top to bottom;
Wherein, SQL query request includes one or more of where clause, order by clause, limit clause;
Each node in step 2, traversal analysis executive plan tree, specific as follows:
Firstly, the node of executive plan tree Elasticsearch support is found out, title and corresponding SQL operation difference Are as follows:
Nodename Function Corresponding SQL operation
FilterNode Garbled data Where clause
TopNNode N row before sequence takes Order by clause+limit
LimitNode Summarize N row before taking Limit clause
TableScanNode Read all data in table Single select inquiry
Then, a Query parameter is added in TableScanNode, for issuing querying condition;
Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process It issues:
When finding FilterNode, show then to pass through FilterNode comprising where clause in SQL query request Filter condition, obtain where clause's content, pass it to the Query in first node TableScanNode;
When finding TopNNode, show then to lead in SQL query request while comprising order by and limit clause Count, offset, the orderBy crossed in TopNNode obtains order by and limit clause's content, passes it to first section Query in point TableScanNode;
When finding LimitNode, show the SQL for limit clause's exclusive use, then by LimitNode Count, offset obtain page turning parameter, pass it to the Query in first node TableScanNode;
When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, TableScanNode In Query included all querying conditions;
Step 3, Presto cluster obtain Query parameter from TableScanNode, are generated according to content of parameter corresponding Inquiry request parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and Order by and limit clause are then converted to sequence and data restrictive condition in QueryBuilder;
The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to by step 4 Elasticsearch cluster carries out inquiry and in advance filtering by Elasticsearch cluster, then obtained result is returned to Presto cluster;
Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.
After adopting the above scheme, all data that are queried are stored in Elasticsearch cluster by the present invention, are then led to It crosses Presto cluster and receives and parses through the corresponding abstract syntax tree of SQL request generation and inquiry plan, pass through and analyze abstract syntax Tree generates the corresponding inquiry request of Elasticsearch, under the premise of Elasticsearch is supported, will inquire item as far as possible It is identifiable to inquiry request that part is converted to Elasticsearch, and modifies executive plan, and querying condition is issued to In Elasticsearch, most of data are filtered in advance by Elasticsearch, the data of return have substantially conformed to inquiry item Part requirement, then uniformly summarized by Presto, finally obtain result data.And the present invention utilizes Elasticsearch quick search With the characteristic of filtering, the preparatory filtration fraction data in Elasticsearch cluster reduce the reading of redundant data, realize number According to quick search.
Detailed description of the invention
Fig. 1 is overall system architecture of the invention;
Fig. 2 is SQL query query execution plan tree generated and process flow in Presto cluster.
Specific embodiment
Present invention discloses a kind of big data enquiring and optimizing methods based on Presto and Elasticsearch, such as Fig. 1 institute Show, needed for system in include Presto cluster and Elasticsearch cluster, user can by command-line tool, The various ways such as jdbc client or Development of Graphical Interfaces tool submit SQL query request to Presto cluster, by Presto collection Group receives and parses through SQL request, then issues inquiry request to Elasticsearch cluster and reads data.
The present invention's is protected the data being queried based on the big data enquiring and optimizing method of Presto and Elasticsearch There are in the index of Elasticsearch cluster, as shown in Fig. 2, when user submits SQL query to request to Presto cluster, Execute following steps:
Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the language Method tree generates query execution plan tree, which is the not no structure of branch from top to bottom.
Wherein, SQL query request includes one or more of where clause, order by clause, limit clause, example Such as
SELECT ID,NAME FROM EMPLOYEE WHERE ID>3 ORDER BY ID LIMIT 100,10。
Each node in step 2, traversal analysis executive plan tree, specific as follows:
Firstly, the node of executive plan tree Elasticsearch support is found out, title and corresponding SQL operation difference Are as follows:
Nodename Function Corresponding SQL operation
FilterNode Garbled data Where clause
TopNNode N row before sequence takes Order by clause+limit
LimitNode Summarize N row before taking Limit clause
TableScanNode Read all data in table Single select inquiry
Then, a Query parameter is added in TableScanNode, for issuing querying condition;
Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process It issues:
When finding FilterNode, show then to pass through FilterNode comprising where clause in SQL query request Filter condition, obtain where clause's content, pass it to the Query in first node TableScanNode;
When finding TopNNode, show then to lead in SQL query request while comprising order by and limit clause Count, offset, the orderBy crossed in TopNNode obtains order by and limit clause's content, passes it to first section Query in point TableScanNode;
When finding LimitNode, show the SQL for limit clause's exclusive use, then by LimitNode Count, offset obtain page turning parameter, pass it to the Query in first node TableScanNode;
When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, TableScanNode In Query included all querying conditions.
Step 3, Presto cluster obtain Query parameter from TableScanNode, are generated according to content of parameter corresponding Inquiry request parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and Order by and limit clause are then converted to sequence and data restrictive condition in QueryBuilder;
The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to by step 4 Elasticsearch cluster, since Elasticsearch supports conditions itself filter and sort, will inquire and sort item Part is converted into carrying out inquiry and filtering, then the knot that will be obtained in advance by Elasticsearch cluster after Elasticsearch request Fruit returns to Presto cluster.
Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.
All data that are queried are stored in Elasticsearch cluster by the present invention, are then connect by Presto cluster It receives and parses SQL request and generate corresponding abstract syntax tree and inquiry plan, by analyzing abstract syntax tree, generate The corresponding inquiry request of Elasticsearch as far as possible converts querying condition under the premise of Elasticsearch is supported Inquiry request is arrived for Elasticsearch is identifiable, and modifies executive plan, querying condition is issued to In Elasticsearch, most of data are filtered in advance by Elasticsearch, the data of return have substantially conformed to inquiry item Part requirement, then uniformly summarized by Presto, finally obtain result data.Due to one in Presto docking Elasticsearch When a index is inquired, need to index in all data all read (as sort, polymerization etc.), and One indexes often in store mass data in Elasticsearch, and all data are all read to time-consuming very long, server Resource consumption is very big, and practical adopted data may only have several hundred even tens, cause very big time and money The waste in source.And the present invention utilizes the characteristic of Elasticsearch quick search and filtering, in Elasticsearch cluster Preparatory filtration fraction data, reduce the reading of redundant data, realize the quick search of data.
The above is only the embodiment of the present invention, is not intended to limit the scope of the present invention, therefore all Any subtle modifications, equivalent variations and modifications to the above embodiments according to the technical essence of the invention still fall within this In the range of inventive technique scheme.

Claims (1)

1. a kind of big data enquiring and optimizing method based on Presto and Elasticsearch, it is characterised in that: the optimization side The data being queried are stored in the index of Elasticsearch cluster by method, when user submits SQL query to Presto cluster When request, following steps are executed:
Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the syntax tree Query execution plan tree is generated, which is the not no structure of branch from top to bottom;
Wherein, SQL query request includes one or more of where clause, orderby clause, limit clause;
Each node in step 2, traversal analysis executive plan tree, specific as follows:
Firstly, finding out the node of executive plan tree Elasticsearch support, title and corresponding SQL operation are respectively as follows:
Nodename Function Corresponding SQL operation FilterNode Garbled data Where clause TopNNode N row before sequence takes Orderby clause+limit LimitNode Summarize N row before taking Limit clause TableScanNode Read all data in table Single select inquiry
Then, a Query parameter is added in TableScanNode, for issuing querying condition;
Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process and issue:
When finding FilterNode, show then to pass through the mistake of FilterNode comprising where clause in SQL query request Filter condition obtains where clause's content, passes it to the Query in first node TableScanNode;
When finding TopNNode, show then to pass through in SQL query request while comprising orderby and limit clause Count, offset, orderBy in TopNNode obtain orderby and limit clause content, pass it to first node Query in TableScanNode;
When finding LimitNode, show the SQL for limit clause's exclusive use, then by count in LimitNode, Offset obtains page turning parameter, passes it to the Query in first node TableScanNode;
When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, in TableScanNode Query has included all querying conditions;
Step 3, Presto cluster obtain Query parameter from TableScanNode, generate corresponding inquiry according to content of parameter Required parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and order By and limit clause is then converted to sequence and data restrictive condition in QueryBuilder;
The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to Elasticsearch by step 4 Cluster carries out inquiry and in advance filtering by Elasticsearch cluster, then obtained result is returned to Presto cluster;
Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.
CN201910010085.4A 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch Active CN109739882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910010085.4A CN109739882B (en) 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910010085.4A CN109739882B (en) 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch

Publications (2)

Publication Number Publication Date
CN109739882A true CN109739882A (en) 2019-05-10
CN109739882B CN109739882B (en) 2022-06-10

Family

ID=66363435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910010085.4A Active CN109739882B (en) 2019-01-04 2019-01-04 Big data query optimization method based on Presto and Elasticissearch

Country Status (1)

Country Link
CN (1) CN109739882B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399386A (en) * 2019-07-15 2019-11-01 南威软件股份有限公司 A kind of SQL UPDATE method and control system based on Presto
CN112800104A (en) * 2020-12-08 2021-05-14 江苏苏宁云计算有限公司 Method and device for optimizing ES query request link
WO2022052639A1 (en) * 2020-09-10 2022-03-17 北京达佳互联信息技术有限公司 Data query method and apparatus
CN114500543A (en) * 2022-01-13 2022-05-13 徐工汉云技术股份有限公司 Distributed elastic edge acquisition system and application method thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016025924A1 (en) * 2014-08-14 2016-02-18 Qubole, Inc. Systems and methods for auto-scaling a big data system
US20160203548A1 (en) * 2007-02-09 2016-07-14 Xcira, Inc. Integrated auctioning environment platform
CN106528797A (en) * 2016-11-10 2017-03-22 上海轻维软件有限公司 DSL query method based on Elasticsearch
CN106649630A (en) * 2016-12-07 2017-05-10 乐视控股(北京)有限公司 Data query method and device
CN106934062A (en) * 2017-03-28 2017-07-07 广东工业大学 A kind of realization method and system of inquiry elasticsearch
CN107133267A (en) * 2017-04-01 2017-09-05 北京京东尚科信息技术有限公司 Inquire about method, device, electronic equipment and the readable storage medium storing program for executing of elasticsearch clusters
CN107729428A (en) * 2017-09-28 2018-02-23 南威软件股份有限公司 A kind of SQL query method based on Presto and Elasticsearch
CN107748766A (en) * 2017-09-28 2018-03-02 南威软件股份有限公司 A kind of big data method for quickly querying based on Presto and Elasticsearch

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160203548A1 (en) * 2007-02-09 2016-07-14 Xcira, Inc. Integrated auctioning environment platform
WO2016025924A1 (en) * 2014-08-14 2016-02-18 Qubole, Inc. Systems and methods for auto-scaling a big data system
CN106528797A (en) * 2016-11-10 2017-03-22 上海轻维软件有限公司 DSL query method based on Elasticsearch
CN106649630A (en) * 2016-12-07 2017-05-10 乐视控股(北京)有限公司 Data query method and device
CN106934062A (en) * 2017-03-28 2017-07-07 广东工业大学 A kind of realization method and system of inquiry elasticsearch
CN107133267A (en) * 2017-04-01 2017-09-05 北京京东尚科信息技术有限公司 Inquire about method, device, electronic equipment and the readable storage medium storing program for executing of elasticsearch clusters
CN107729428A (en) * 2017-09-28 2018-02-23 南威软件股份有限公司 A kind of SQL query method based on Presto and Elasticsearch
CN107748766A (en) * 2017-09-28 2018-03-02 南威软件股份有限公司 A kind of big data method for quickly querying based on Presto and Elasticsearch

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DEQUAN CHEN, YI CHEN, BRIAN N. BROWNLOW, PRADIP P. KANJAMALA, CA: "Real-Time or Near Real-Time Persisting Daily", 《IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS》 *
胡庆宝等: "基于Elasticsearch的实时集群日志采集和分析系统实现", 《科研信息化技术与应用》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399386A (en) * 2019-07-15 2019-11-01 南威软件股份有限公司 A kind of SQL UPDATE method and control system based on Presto
CN110399386B (en) * 2019-07-15 2022-05-10 南威软件股份有限公司 Presto-based SQL UPDATE method and control system
WO2022052639A1 (en) * 2020-09-10 2022-03-17 北京达佳互联信息技术有限公司 Data query method and apparatus
CN112800104A (en) * 2020-12-08 2021-05-14 江苏苏宁云计算有限公司 Method and device for optimizing ES query request link
CN114500543A (en) * 2022-01-13 2022-05-13 徐工汉云技术股份有限公司 Distributed elastic edge acquisition system and application method thereof
CN114500543B (en) * 2022-01-13 2024-01-23 徐工汉云技术股份有限公司 Distributed elastic edge acquisition system and application method thereof

Also Published As

Publication number Publication date
CN109739882B (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN103064875B (en) A kind of spatial service data distributed enquiring method
CN109739882A (en) A kind of big data enquiring and optimizing method based on Presto and Elasticsearch
US9798772B2 (en) Using persistent data samples and query-time statistics for query optimization
US10642832B1 (en) Reducing the domain of a subquery by retrieving constraints from the outer query
CN101021874B (en) Method and apparatus for optimizing request to poll SQL
KR101432700B1 (en) Method for optimizing query
CN111382226B (en) Database query and retrieval method and device and electronic equipment
CA2429910A1 (en) System and method of query transformation
US20110072008A1 (en) Query Optimization with Awareness of Limited Resource Usage
CN106777108A (en) A kind of data query method and apparatus based on mixing storage architecture
WO2007098320A2 (en) Apparatus and method for federated querying of unstructured data
US20110302187A1 (en) Schema definition generating device and schema definition generating method
Gruenheid et al. Query optimization using column statistics in hive
Giannakouris et al. MuSQLE: Distributed SQL query execution over multiple engine environments
CN106933906B (en) Data multi-dimensional query method and device
CN106066895A (en) A kind of intelligent inquiry system
CN109325029A (en) RDF data storage and querying method based on sparse matrix
CN106599052A (en) Data query system based on ApacheKylin, and method thereof
CN104714974A (en) Method and device for parsing and reprocessing query statement
CN112162983A (en) Database index suggestion processing method, device, medium and electronic equipment
Sun et al. Dima: A distributed in-memory similarity-based query processing system
CN111125199B (en) Database access method and device and electronic equipment
CN109815254A (en) Cross-region method for scheduling task and system based on big data
CN107729428A (en) A kind of SQL query method based on Presto and Elasticsearch
Marathe et al. Integrating the Orca Optimizer into MySQL.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant