CN109739882A - A kind of big data enquiring and optimizing method based on Presto and Elasticsearch - Google Patents
A kind of big data enquiring and optimizing method based on Presto and Elasticsearch Download PDFInfo
- Publication number
- CN109739882A CN109739882A CN201910010085.4A CN201910010085A CN109739882A CN 109739882 A CN109739882 A CN 109739882A CN 201910010085 A CN201910010085 A CN 201910010085A CN 109739882 A CN109739882 A CN 109739882A
- Authority
- CN
- China
- Prior art keywords
- elasticsearch
- clause
- data
- query
- presto
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Abstract
The present invention relates to a kind of big data enquiring and optimizing methods based on Presto and Elasticsearch, all data that are queried are stored in Elasticsearch cluster by it, then SQL request is received and parsed through by Presto cluster and generates corresponding abstract syntax tree and executive plan, querying condition is converted into Elasticsearch inquiry request and be issued in Elasticsearch to be inquired in advance.The present invention is using the advantage of Presto SQL parsing and Elasticsearch quick search, and filtration fraction data as preparatory as possible, are greatly decreased the reading of redundant data before fetching data, to improve the performance of condition query.
Description
Technical field
The present invention relates to the method for quickly querying of big data, specifically provide it is a kind of based on Presto and
The big data enquiring and optimizing method of Elasticsearch.
Background technique
Elasticsearch is one and establishes in full-text search engine Apache LuceneTMOn the basis of in real time analyze
Distributed search engine, the function that it, which uses Lucene as core, realizes all indexes and search for, so that each document
Content can be indexed, searches for, sorts, filter.But elasticsearch lacks traditional SQL syntax and supports, developer
Using more difficult, system its Data Migration, mating operation based on relevant database are also not easy to carry out.And Presto can be
Elasticsearch provides basic SQL syntax and supports, but its inquiry mechanism memory-based is also required to the mesh almost full dose
Data pre-head is marked into cluster memory, and data required for end user are tens even several in mass data,
A large amount of server resource and time are consumed in the process for the reading of redundant data and overanxious, cause search efficiency significantly under
Drop.
Summary of the invention
The purpose of the present invention is to provide a kind of big data query optimization sides based on Presto and Elasticsearch
Method carries out the reception and parsing of SQL using Presto cluster, using the characteristic of Elasticsearch quick search and filtering,
Preparatory filtration fraction data, reduce the reading of redundant data, realize the quick search of data in Elasticsearch cluster.
To achieve the above object, the technical solution adopted by the present invention is that:
A kind of big data enquiring and optimizing method based on Presto and Elasticsearch, the optimization method will be looked into
The data of inquiry are stored in the index of Elasticsearch cluster, when user submits SQL query to request to Presto cluster,
Execute following steps:
Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the language
Method tree generates query execution plan tree, which is the not no structure of branch from top to bottom;
Wherein, SQL query request includes one or more of where clause, order by clause, limit clause;
Each node in step 2, traversal analysis executive plan tree, specific as follows:
Firstly, the node of executive plan tree Elasticsearch support is found out, title and corresponding SQL operation difference
Are as follows:
Nodename | Function | Corresponding SQL operation |
FilterNode | Garbled data | Where clause |
TopNNode | N row before sequence takes | Order by clause+limit |
LimitNode | Summarize N row before taking | Limit clause |
TableScanNode | Read all data in table | Single select inquiry |
Then, a Query parameter is added in TableScanNode, for issuing querying condition;
Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process
It issues:
When finding FilterNode, show then to pass through FilterNode comprising where clause in SQL query request
Filter condition, obtain where clause's content, pass it to the Query in first node TableScanNode;
When finding TopNNode, show then to lead in SQL query request while comprising order by and limit clause
Count, offset, the orderBy crossed in TopNNode obtains order by and limit clause's content, passes it to first section
Query in point TableScanNode;
When finding LimitNode, show the SQL for limit clause's exclusive use, then by LimitNode
Count, offset obtain page turning parameter, pass it to the Query in first node TableScanNode;
When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, TableScanNode
In Query included all querying conditions;
Step 3, Presto cluster obtain Query parameter from TableScanNode, are generated according to content of parameter corresponding
Inquiry request parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and
Order by and limit clause are then converted to sequence and data restrictive condition in QueryBuilder;
The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to by step 4
Elasticsearch cluster carries out inquiry and in advance filtering by Elasticsearch cluster, then obtained result is returned to
Presto cluster;
Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.
After adopting the above scheme, all data that are queried are stored in Elasticsearch cluster by the present invention, are then led to
It crosses Presto cluster and receives and parses through the corresponding abstract syntax tree of SQL request generation and inquiry plan, pass through and analyze abstract syntax
Tree generates the corresponding inquiry request of Elasticsearch, under the premise of Elasticsearch is supported, will inquire item as far as possible
It is identifiable to inquiry request that part is converted to Elasticsearch, and modifies executive plan, and querying condition is issued to
In Elasticsearch, most of data are filtered in advance by Elasticsearch, the data of return have substantially conformed to inquiry item
Part requirement, then uniformly summarized by Presto, finally obtain result data.And the present invention utilizes Elasticsearch quick search
With the characteristic of filtering, the preparatory filtration fraction data in Elasticsearch cluster reduce the reading of redundant data, realize number
According to quick search.
Detailed description of the invention
Fig. 1 is overall system architecture of the invention;
Fig. 2 is SQL query query execution plan tree generated and process flow in Presto cluster.
Specific embodiment
Present invention discloses a kind of big data enquiring and optimizing methods based on Presto and Elasticsearch, such as Fig. 1 institute
Show, needed for system in include Presto cluster and Elasticsearch cluster, user can by command-line tool,
The various ways such as jdbc client or Development of Graphical Interfaces tool submit SQL query request to Presto cluster, by Presto collection
Group receives and parses through SQL request, then issues inquiry request to Elasticsearch cluster and reads data.
The present invention's is protected the data being queried based on the big data enquiring and optimizing method of Presto and Elasticsearch
There are in the index of Elasticsearch cluster, as shown in Fig. 2, when user submits SQL query to request to Presto cluster,
Execute following steps:
Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the language
Method tree generates query execution plan tree, which is the not no structure of branch from top to bottom.
Wherein, SQL query request includes one or more of where clause, order by clause, limit clause, example
Such as
SELECT ID,NAME FROM EMPLOYEE WHERE ID>3 ORDER BY ID LIMIT 100,10。
Each node in step 2, traversal analysis executive plan tree, specific as follows:
Firstly, the node of executive plan tree Elasticsearch support is found out, title and corresponding SQL operation difference
Are as follows:
Nodename | Function | Corresponding SQL operation |
FilterNode | Garbled data | Where clause |
TopNNode | N row before sequence takes | Order by clause+limit |
LimitNode | Summarize N row before taking | Limit clause |
TableScanNode | Read all data in table | Single select inquiry |
Then, a Query parameter is added in TableScanNode, for issuing querying condition;
Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process
It issues:
When finding FilterNode, show then to pass through FilterNode comprising where clause in SQL query request
Filter condition, obtain where clause's content, pass it to the Query in first node TableScanNode;
When finding TopNNode, show then to lead in SQL query request while comprising order by and limit clause
Count, offset, the orderBy crossed in TopNNode obtains order by and limit clause's content, passes it to first section
Query in point TableScanNode;
When finding LimitNode, show the SQL for limit clause's exclusive use, then by LimitNode
Count, offset obtain page turning parameter, pass it to the Query in first node TableScanNode;
When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, TableScanNode
In Query included all querying conditions.
Step 3, Presto cluster obtain Query parameter from TableScanNode, are generated according to content of parameter corresponding
Inquiry request parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and
Order by and limit clause are then converted to sequence and data restrictive condition in QueryBuilder;
The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to by step 4
Elasticsearch cluster, since Elasticsearch supports conditions itself filter and sort, will inquire and sort item
Part is converted into carrying out inquiry and filtering, then the knot that will be obtained in advance by Elasticsearch cluster after Elasticsearch request
Fruit returns to Presto cluster.
Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.
All data that are queried are stored in Elasticsearch cluster by the present invention, are then connect by Presto cluster
It receives and parses SQL request and generate corresponding abstract syntax tree and inquiry plan, by analyzing abstract syntax tree, generate
The corresponding inquiry request of Elasticsearch as far as possible converts querying condition under the premise of Elasticsearch is supported
Inquiry request is arrived for Elasticsearch is identifiable, and modifies executive plan, querying condition is issued to
In Elasticsearch, most of data are filtered in advance by Elasticsearch, the data of return have substantially conformed to inquiry item
Part requirement, then uniformly summarized by Presto, finally obtain result data.Due to one in Presto docking Elasticsearch
When a index is inquired, need to index in all data all read (as sort, polymerization etc.), and
One indexes often in store mass data in Elasticsearch, and all data are all read to time-consuming very long, server
Resource consumption is very big, and practical adopted data may only have several hundred even tens, cause very big time and money
The waste in source.And the present invention utilizes the characteristic of Elasticsearch quick search and filtering, in Elasticsearch cluster
Preparatory filtration fraction data, reduce the reading of redundant data, realize the quick search of data.
The above is only the embodiment of the present invention, is not intended to limit the scope of the present invention, therefore all
Any subtle modifications, equivalent variations and modifications to the above embodiments according to the technical essence of the invention still fall within this
In the range of inventive technique scheme.
Claims (1)
1. a kind of big data enquiring and optimizing method based on Presto and Elasticsearch, it is characterised in that: the optimization side
The data being queried are stored in the index of Elasticsearch cluster by method, when user submits SQL query to Presto cluster
When request, following steps are executed:
Step 1, Presto cluster receive and parse through SQL query and request and generate corresponding abstract syntax tree, according to the syntax tree
Query execution plan tree is generated, which is the not no structure of branch from top to bottom;
Wherein, SQL query request includes one or more of where clause, orderby clause, limit clause;
Each node in step 2, traversal analysis executive plan tree, specific as follows:
Firstly, finding out the node of executive plan tree Elasticsearch support, title and corresponding SQL operation are respectively as follows:
Then, a Query parameter is added in TableScanNode, for issuing querying condition;
Finally, all nodes in traversal queries plan tree, read corresponding conditions according to node type in ergodic process and issue:
When finding FilterNode, show then to pass through the mistake of FilterNode comprising where clause in SQL query request
Filter condition obtains where clause's content, passes it to the Query in first node TableScanNode;
When finding TopNNode, show then to pass through in SQL query request while comprising orderby and limit clause
Count, offset, orderBy in TopNNode obtain orderby and limit clause content, pass it to first node
Query in TableScanNode;
When finding LimitNode, show the SQL for limit clause's exclusive use, then by count in LimitNode,
Offset obtains page turning parameter, passes it to the Query in first node TableScanNode;
When finding TableScanNode, illustrates that all nodes of executive plan have traversed and finish, in TableScanNode
Query has included all querying conditions;
Step 3, Presto cluster obtain Query parameter from TableScanNode, generate corresponding inquiry according to content of parameter
Required parameter, wherein where clause is converted to the filter condition in Elasticsearch QueryBuilder, and order
By and limit clause is then converted to sequence and data restrictive condition in QueryBuilder;
The inquiry request Elasticsearch QueryBuilder generated in step 3 is sent to Elasticsearch by step 4
Cluster carries out inquiry and in advance filtering by Elasticsearch cluster, then obtained result is returned to Presto cluster;
Step 5, Presto cluster summarize the data that each node returns, and generate final result to user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910010085.4A CN109739882B (en) | 2019-01-04 | 2019-01-04 | Big data query optimization method based on Presto and Elasticissearch |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910010085.4A CN109739882B (en) | 2019-01-04 | 2019-01-04 | Big data query optimization method based on Presto and Elasticissearch |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109739882A true CN109739882A (en) | 2019-05-10 |
CN109739882B CN109739882B (en) | 2022-06-10 |
Family
ID=66363435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910010085.4A Active CN109739882B (en) | 2019-01-04 | 2019-01-04 | Big data query optimization method based on Presto and Elasticissearch |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109739882B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399386A (en) * | 2019-07-15 | 2019-11-01 | 南威软件股份有限公司 | A kind of SQL UPDATE method and control system based on Presto |
CN112800104A (en) * | 2020-12-08 | 2021-05-14 | 江苏苏宁云计算有限公司 | Method and device for optimizing ES query request link |
WO2022052639A1 (en) * | 2020-09-10 | 2022-03-17 | 北京达佳互联信息技术有限公司 | Data query method and apparatus |
CN114500543A (en) * | 2022-01-13 | 2022-05-13 | 徐工汉云技术股份有限公司 | Distributed elastic edge acquisition system and application method thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016025924A1 (en) * | 2014-08-14 | 2016-02-18 | Qubole, Inc. | Systems and methods for auto-scaling a big data system |
US20160203548A1 (en) * | 2007-02-09 | 2016-07-14 | Xcira, Inc. | Integrated auctioning environment platform |
CN106528797A (en) * | 2016-11-10 | 2017-03-22 | 上海轻维软件有限公司 | DSL query method based on Elasticsearch |
CN106649630A (en) * | 2016-12-07 | 2017-05-10 | 乐视控股(北京)有限公司 | Data query method and device |
CN106934062A (en) * | 2017-03-28 | 2017-07-07 | 广东工业大学 | A kind of realization method and system of inquiry elasticsearch |
CN107133267A (en) * | 2017-04-01 | 2017-09-05 | 北京京东尚科信息技术有限公司 | Inquire about method, device, electronic equipment and the readable storage medium storing program for executing of elasticsearch clusters |
CN107729428A (en) * | 2017-09-28 | 2018-02-23 | 南威软件股份有限公司 | A kind of SQL query method based on Presto and Elasticsearch |
CN107748766A (en) * | 2017-09-28 | 2018-03-02 | 南威软件股份有限公司 | A kind of big data method for quickly querying based on Presto and Elasticsearch |
-
2019
- 2019-01-04 CN CN201910010085.4A patent/CN109739882B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160203548A1 (en) * | 2007-02-09 | 2016-07-14 | Xcira, Inc. | Integrated auctioning environment platform |
WO2016025924A1 (en) * | 2014-08-14 | 2016-02-18 | Qubole, Inc. | Systems and methods for auto-scaling a big data system |
CN106528797A (en) * | 2016-11-10 | 2017-03-22 | 上海轻维软件有限公司 | DSL query method based on Elasticsearch |
CN106649630A (en) * | 2016-12-07 | 2017-05-10 | 乐视控股(北京)有限公司 | Data query method and device |
CN106934062A (en) * | 2017-03-28 | 2017-07-07 | 广东工业大学 | A kind of realization method and system of inquiry elasticsearch |
CN107133267A (en) * | 2017-04-01 | 2017-09-05 | 北京京东尚科信息技术有限公司 | Inquire about method, device, electronic equipment and the readable storage medium storing program for executing of elasticsearch clusters |
CN107729428A (en) * | 2017-09-28 | 2018-02-23 | 南威软件股份有限公司 | A kind of SQL query method based on Presto and Elasticsearch |
CN107748766A (en) * | 2017-09-28 | 2018-03-02 | 南威软件股份有限公司 | A kind of big data method for quickly querying based on Presto and Elasticsearch |
Non-Patent Citations (2)
Title |
---|
DEQUAN CHEN, YI CHEN, BRIAN N. BROWNLOW, PRADIP P. KANJAMALA, CA: "Real-Time or Near Real-Time Persisting Daily", 《IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS》 * |
胡庆宝等: "基于Elasticsearch的实时集群日志采集和分析系统实现", 《科研信息化技术与应用》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399386A (en) * | 2019-07-15 | 2019-11-01 | 南威软件股份有限公司 | A kind of SQL UPDATE method and control system based on Presto |
CN110399386B (en) * | 2019-07-15 | 2022-05-10 | 南威软件股份有限公司 | Presto-based SQL UPDATE method and control system |
WO2022052639A1 (en) * | 2020-09-10 | 2022-03-17 | 北京达佳互联信息技术有限公司 | Data query method and apparatus |
CN112800104A (en) * | 2020-12-08 | 2021-05-14 | 江苏苏宁云计算有限公司 | Method and device for optimizing ES query request link |
CN114500543A (en) * | 2022-01-13 | 2022-05-13 | 徐工汉云技术股份有限公司 | Distributed elastic edge acquisition system and application method thereof |
CN114500543B (en) * | 2022-01-13 | 2024-01-23 | 徐工汉云技术股份有限公司 | Distributed elastic edge acquisition system and application method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN109739882B (en) | 2022-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103064875B (en) | A kind of spatial service data distributed enquiring method | |
CN109739882A (en) | A kind of big data enquiring and optimizing method based on Presto and Elasticsearch | |
US9798772B2 (en) | Using persistent data samples and query-time statistics for query optimization | |
US10642832B1 (en) | Reducing the domain of a subquery by retrieving constraints from the outer query | |
CN101021874B (en) | Method and apparatus for optimizing request to poll SQL | |
KR101432700B1 (en) | Method for optimizing query | |
CN111382226B (en) | Database query and retrieval method and device and electronic equipment | |
CA2429910A1 (en) | System and method of query transformation | |
US20110072008A1 (en) | Query Optimization with Awareness of Limited Resource Usage | |
CN106777108A (en) | A kind of data query method and apparatus based on mixing storage architecture | |
WO2007098320A2 (en) | Apparatus and method for federated querying of unstructured data | |
US20110302187A1 (en) | Schema definition generating device and schema definition generating method | |
Gruenheid et al. | Query optimization using column statistics in hive | |
Giannakouris et al. | MuSQLE: Distributed SQL query execution over multiple engine environments | |
CN106933906B (en) | Data multi-dimensional query method and device | |
CN106066895A (en) | A kind of intelligent inquiry system | |
CN109325029A (en) | RDF data storage and querying method based on sparse matrix | |
CN106599052A (en) | Data query system based on ApacheKylin, and method thereof | |
CN104714974A (en) | Method and device for parsing and reprocessing query statement | |
CN112162983A (en) | Database index suggestion processing method, device, medium and electronic equipment | |
Sun et al. | Dima: A distributed in-memory similarity-based query processing system | |
CN111125199B (en) | Database access method and device and electronic equipment | |
CN109815254A (en) | Cross-region method for scheduling task and system based on big data | |
CN107729428A (en) | A kind of SQL query method based on Presto and Elasticsearch | |
Marathe et al. | Integrating the Orca Optimizer into MySQL. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |