CN109739882A

CN109739882A - A kind of big data enquiring and optimizing method based on Presto and Elasticsearch

Info

Publication number: CN109739882A
Application number: CN201910010085.4A
Authority: CN
Inventors: 洪灿榕
Original assignee: Linewell Software Co Ltd
Current assignee: Linewell Software Co Ltd
Priority date: 2019-01-04
Filing date: 2019-01-04
Publication date: 2019-05-10
Anticipated expiration: 2039-01-04
Also published as: CN109739882B

Abstract

The present invention relates to a kind of big data enquiring and optimizing methods based on Presto and Elasticsearch, all data that are queried are stored in Elasticsearch cluster by it, then SQL request is received and parsed through by Presto cluster and generates corresponding abstract syntax tree and executive plan, querying condition is converted into Elasticsearch inquiry request and be issued in Elasticsearch to be inquired in advance.The present invention is using the advantage of Presto SQL parsing and Elasticsearch quick search, and filtration fraction data as preparatory as possible, are greatly decreased the reading of redundant data before fetching data, to improve the performance of condition query.

Description

A kind of big data enquiring and optimizing method based on Presto and Elasticsearch

Technical field

The present invention relates to the method for quickly querying of big data, specifically provide it is a kind of based on Presto and The big data enquiring and optimizing method of Elasticsearch.

Background technique

Elasticsearch is one and establishes in full-text search engine Apache Lucene^TMOn the basis of in real time analyze Distributed search engine, the function that it, which uses Lucene as core, realizes all indexes and search for, so that each document Content can be indexed, searches for, sorts, filter.But elasticsearch lacks traditional SQL syntax and supports, developer Using more difficult, system its Data Migration, mating operation based on relevant database are also not easy to carry out.And Presto can be Elasticsearch provides basic SQL syntax and supports, but its inquiry mechanism memory-based is also required to the mesh almost full dose Data pre-head is marked into cluster memory, and data required for end user are tens even several in mass data, A large amount of server resource and time are consumed in the process for the reading of redundant data and overanxious, cause search efficiency significantly under Drop.

Summary of the invention

The purpose of the present invention is to provide a kind of big data query optimization sides based on Presto and Elasticsearch Method carries out the reception and parsing of SQL using Presto cluster, using the characteristic of Elasticsearch quick search and filtering, Preparatory filtration fraction data, reduce the reading of redundant data, realize the quick search of data in Elasticsearch cluster.

To achieve the above object, the technical solution adopted by the present invention is that:

A kind of big data enquiring and optimizing method based on Presto and Elasticsearch, the optimization method will be looked into The data of inquiry are stored in the index of Elasticsearch cluster, when user submits SQL query to request to Presto cluster, Execute following steps:

Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the language Method tree generates query execution plan tree, which is the not no structure of branch from top to bottom；

Wherein, SQL query request includes one or more of where clause, order by clause, limit clause；

Each node in step 2, traversal analysis executive plan tree, specific as follows:

Firstly, the node of executive plan tree Elasticsearch support is found out, title and corresponding SQL operation difference Are as follows:

Nodename	Function	Corresponding SQL operation
			FilterNode	Garbled data	Where clause
TopNNode	N row before sequence takes	Order by clause+limit
			LimitNode	Summarize N row before taking	Limit clause
TableScanNode	Read all data in table	Single select inquiry

Then, a Query parameter is added in TableScanNode, for issuing querying condition；

Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process It issues:

When finding FilterNode, show then to pass through FilterNode comprising where clause in SQL query request Filter condition, obtain where clause's content, pass it to the Query in first node TableScanNode；

When finding TopNNode, show then to lead in SQL query request while comprising order by and limit clause Count, offset, the orderBy crossed in TopNNode obtains order by and limit clause's content, passes it to first section Query in point TableScanNode；

When finding LimitNode, show the SQL for limit clause's exclusive use, then by LimitNode Count, offset obtain page turning parameter, pass it to the Query in first node TableScanNode；

When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, TableScanNode In Query included all querying conditions；

Step 3, Presto cluster obtain Query parameter from TableScanNode, are generated according to content of parameter corresponding Inquiry request parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and Order by and limit clause are then converted to sequence and data restrictive condition in QueryBuilder；

The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to by step 4 Elasticsearch cluster carries out inquiry and in advance filtering by Elasticsearch cluster, then obtained result is returned to Presto cluster；

Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.

After adopting the above scheme, all data that are queried are stored in Elasticsearch cluster by the present invention, are then led to It crosses Presto cluster and receives and parses through the corresponding abstract syntax tree of SQL request generation and inquiry plan, pass through and analyze abstract syntax Tree generates the corresponding inquiry request of Elasticsearch, under the premise of Elasticsearch is supported, will inquire item as far as possible It is identifiable to inquiry request that part is converted to Elasticsearch, and modifies executive plan, and querying condition is issued to In Elasticsearch, most of data are filtered in advance by Elasticsearch, the data of return have substantially conformed to inquiry item Part requirement, then uniformly summarized by Presto, finally obtain result data.And the present invention utilizes Elasticsearch quick search With the characteristic of filtering, the preparatory filtration fraction data in Elasticsearch cluster reduce the reading of redundant data, realize number According to quick search.

Detailed description of the invention

Fig. 1 is overall system architecture of the invention；

Fig. 2 is SQL query query execution plan tree generated and process flow in Presto cluster.

Specific embodiment

Present invention discloses a kind of big data enquiring and optimizing methods based on Presto and Elasticsearch, such as Fig. 1 institute Show, needed for system in include Presto cluster and Elasticsearch cluster, user can by command-line tool, The various ways such as jdbc client or Development of Graphical Interfaces tool submit SQL query request to Presto cluster, by Presto collection Group receives and parses through SQL request, then issues inquiry request to Elasticsearch cluster and reads data.

The present invention's is protected the data being queried based on the big data enquiring and optimizing method of Presto and Elasticsearch There are in the index of Elasticsearch cluster, as shown in Fig. 2, when user submits SQL query to request to Presto cluster, Execute following steps:

Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the language Method tree generates query execution plan tree, which is the not no structure of branch from top to bottom.

Wherein, SQL query request includes one or more of where clause, order by clause, limit clause, example Such as

SELECT ID,NAME FROM EMPLOYEE WHERE ID>3 ORDER BY ID LIMIT 100,10。

When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, TableScanNode In Query included all querying conditions.

The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to by step 4 Elasticsearch cluster, since Elasticsearch supports conditions itself filter and sort, will inquire and sort item Part is converted into carrying out inquiry and filtering, then the knot that will be obtained in advance by Elasticsearch cluster after Elasticsearch request Fruit returns to Presto cluster.

All data that are queried are stored in Elasticsearch cluster by the present invention, are then connect by Presto cluster It receives and parses SQL request and generate corresponding abstract syntax tree and inquiry plan, by analyzing abstract syntax tree, generate The corresponding inquiry request of Elasticsearch as far as possible converts querying condition under the premise of Elasticsearch is supported Inquiry request is arrived for Elasticsearch is identifiable, and modifies executive plan, querying condition is issued to In Elasticsearch, most of data are filtered in advance by Elasticsearch, the data of return have substantially conformed to inquiry item Part requirement, then uniformly summarized by Presto, finally obtain result data.Due to one in Presto docking Elasticsearch When a index is inquired, need to index in all data all read (as sort, polymerization etc.), and One indexes often in store mass data in Elasticsearch, and all data are all read to time-consuming very long, server Resource consumption is very big, and practical adopted data may only have several hundred even tens, cause very big time and money The waste in source.And the present invention utilizes the characteristic of Elasticsearch quick search and filtering, in Elasticsearch cluster Preparatory filtration fraction data, reduce the reading of redundant data, realize the quick search of data.

The above is only the embodiment of the present invention, is not intended to limit the scope of the present invention, therefore all Any subtle modifications, equivalent variations and modifications to the above embodiments according to the technical essence of the invention still fall within this In the range of inventive technique scheme.

Claims

1. a kind of big data enquiring and optimizing method based on Presto and Elasticsearch, it is characterised in that: the optimization side The data being queried are stored in the index of Elasticsearch cluster by method, when user submits SQL query to Presto cluster When request, following steps are executed:

Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the syntax tree Query execution plan tree is generated, which is the not no structure of branch from top to bottom；

Wherein, SQL query request includes one or more of where clause, orderby clause, limit clause；

Firstly, finding out the node of executive plan tree Elasticsearch support, title and corresponding SQL operation are respectively as follows:

Nodename Function Corresponding SQL operation FilterNode Garbled data Where clause TopNNode N row before sequence takes Orderby clause+limit LimitNode Summarize N row before taking Limit clause TableScanNode Read all data in table Single select inquiry

Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process and issue:

When finding FilterNode, show then to pass through the mistake of FilterNode comprising where clause in SQL query request Filter condition obtains where clause's content, passes it to the Query in first node TableScanNode；

When finding TopNNode, show then to pass through in SQL query request while comprising orderby and limit clause Count, offset, orderBy in TopNNode obtain orderby and limit clause content, pass it to first node Query in TableScanNode；

When finding LimitNode, show the SQL for limit clause's exclusive use, then by count in LimitNode, Offset obtains page turning parameter, passes it to the Query in first node TableScanNode；

When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, in TableScanNode Query has included all querying conditions；

Step 3, Presto cluster obtain Query parameter from TableScanNode, generate corresponding inquiry according to content of parameter Required parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and order By and limit clause is then converted to sequence and data restrictive condition in QueryBuilder；

The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to Elasticsearch by step 4 Cluster carries out inquiry and in advance filtering by Elasticsearch cluster, then obtained result is returned to Presto cluster；