CN106649503A

CN106649503A - SQL-based query method and system

Info

Publication number: CN106649503A
Application number: CN201610887292.4A
Authority: CN
Inventors: 温宗臣; 张翼; 何良均; 范卫卫; 冯森林; 李冰; 曾攀; 严亮; 张书凡
Original assignee: BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Current assignee: BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority date: 2016-10-11
Filing date: 2016-10-11
Publication date: 2017-05-10

Abstract

The invention discloses an SQL-based query method that improves query efficiency while ensuring query stability. The method comprises the steps of: (1) obtaining the query plan of an SQL statement, determining from it the amount of data involved in the computation, and identifying the computation methods; and (2) selecting one of hive, presto, and spark as the query engine according to that data volume and those computation methods. By automatically assessing the computational complexity of SQL statements and taking into account the computing scenarios each engine handles well, the method selects a computing engine intelligently, which improves computing efficiency while ensuring query stability. The invention also provides an SQL-based query system.

Description

An SQL-based query method and system

Technical field

The present invention relates to the technical field of big data processing, and in particular to an SQL-based query method and an SQL-based query system.

Background

In the big data field, SQL-like languages are commonly used for big data computation in order to lower the barrier to using a cluster. SQL (Structured Query Language) is a special-purpose programming language for querying and programming databases; it is used to access, query, update, and manage relational database systems, and .sql is also the file extension of database script files. The mainstream query engines that currently support SQL include hive, presto, and spark. Hive is a data warehouse tool built on Hadoop that maps structured data files to database tables and provides SQL-like query capability. Presto is an open-source distributed SQL query engine suited to interactive analytic queries on data volumes from gigabytes to petabytes. Spark is a general-purpose parallel framework, similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. It has the advantages of Hadoop MapReduce, but unlike MapReduce it can keep the intermediate output of a job in memory, so it no longer needs to read and write HDFS; spark is therefore better suited to algorithms that require iterative MapReduce, such as those used in data mining and machine learning.

Normally, an SQL statement must be submitted to a predetermined query engine; that is, the engine to use has to be specified manually. At submission time it is already fixed whether the statement will run on the hive query engine, the presto query engine, or the spark query engine, and the three computing platforms are completely independent of one another.

The computational complexity of each SQL statement is different: it depends on the amount of data the statement has to process and on the logical complexity of the statement itself. In practice, most SQL tasks are routine (recurring) tasks. Once a task has become routine, its complexity grows rapidly as the data volume grows, yet its execution engine cannot be adjusted dynamically. This reduces execution efficiency and can even cause the task to fail.

Summary of the invention

To overcome the shortcomings of the prior art, the technical problem to be solved by the present invention is to provide an SQL-based query method that improves query efficiency while ensuring query stability.

The technical solution of the present invention is an SQL-based query method comprising the following steps:

(1) obtain the query plan of the SQL statement, determine from it the amount of data involved in the computation, and identify the computation methods;

(2) according to the amount of data involved in the computation and the computation methods, select one of hive, presto, and spark as the query engine.

By automatically assessing the computational complexity of an SQL statement and taking into account the computing scenarios each engine handles well, the present invention selects the computing engine intelligently, which improves computing efficiency while ensuring query stability.

An SQL-based query system is also provided, the system comprising:

a data volume and computation mode identification module, configured to obtain the query plan of the SQL statement, determine from it the amount of data involved in the computation, and identify the computation methods; and

a query engine intelligent matching module, configured to select one of hive, presto, and spark as the query engine according to the amount of data involved in the computation and the computation methods.

Description of the drawings

Fig. 1 is a flow chart of the SQL-based query method according to the present invention.

Detailed description

The three query engines hive, presto, and spark each have their own strengths:

1. hive is very stable and supports batch computation over large data volumes without being error-prone, but its execution efficiency is relatively low when the data volume is small;

2. presto is very lightweight and runs entirely in memory. Its execution efficiency is very high when the data volume is small, but once the data volume exceeds a certain limit, execution can become abnormal or even fail because memory is limited;

3. spark is also a query engine that runs entirely in memory. Its execution performance lies between that of hive and presto, and it likewise becomes less stable when the data volume is very large.

This proposal automatically selects the query engine for an SQL statement by assessing the amount of data the statement has to process and the logical complexity of the statement itself, which improves query efficiency while ensuring query stability.

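As shown in Fig. 1, the SQL-based query method comprises the following steps: (1) obtain the query plan of the SQL statement, determine from it the amount of data involved in the computation, and identify the computation methods; (2) according to the amount of data involved in the computation and the computation methods, select one of hive, presto, and spark as the query engine.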

In addition, in step (1), the query plan of the SQL statement is obtained through hive's explain command.
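As a rough illustration of step (1), the sketch below builds an EXPLAIN statement and pulls a row estimate out of the resulting plan text. The patent does not say how the data volume is read from the plan, so several things here are assumptions: the helper names `build_explain` and `estimate_rows` are hypothetical, the plan is assumed to carry annotations of the form `Statistics: Num rows: N`, the largest such estimate is taken as the data volume, and `SAMPLE_PLAN` is a hand-written fragment for illustration rather than captured hive output.

```python
import re

# Assumed shape of the row-count annotation on plan operators.
_NUM_ROWS = re.compile(r"Statistics:\s*Num rows:\s*(\d+)")


def build_explain(sql: str) -> str:
    """Prefix a statement with EXPLAIN, dropping any trailing semicolon."""
    return "EXPLAIN " + sql.strip().rstrip(";").rstrip()


def estimate_rows(plan_text: str) -> int:
    """Return the largest row estimate found in the plan text, or 0 if none.

    Taking the maximum is an assumption of this sketch; the source only says
    that the data volume is "found" from the query plan.
    """
    counts = [int(m.group(1)) for m in _NUM_ROWS.finditer(plan_text)]
    return max(counts, default=0)


# Hand-written illustrative plan fragment (not real engine output).
SAMPLE_PLAN = """\
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: orders
            Statistics: Num rows: 120000000 Data size: 9600000000
            Group By Operator
              Statistics: Num rows: 60000000 Data size: 4800000000
"""
```

In practice the statement returned by `build_explain` would be sent to hive through whatever client is in use, and the returned plan text passed to `estimate_rows`.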

In addition, in step (1), the computation methods include join operations, group by operations, and distinct operations.
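The three computation methods can be detected in several ways. The patent derives them from the query plan; the sketch below is a simplification that scans the SQL text for the corresponding keywords instead. The function name `detect_operations` is hypothetical, and the approach can be fooled by keywords that appear inside string literals or comments.

```python
import re
from typing import Set

# Keyword patterns for the three operations named in the text.
_PATTERNS = {
    "join": re.compile(r"\bjoin\b", re.IGNORECASE),
    "group by": re.compile(r"\bgroup\s+by\b", re.IGNORECASE),
    "distinct": re.compile(r"\bdistinct\b", re.IGNORECASE),
}


def detect_operations(sql: str) -> Set[str]:
    """Return the subset of {join, group by, distinct} that appears in sql."""
    return {name for name, pattern in _PATTERNS.items() if pattern.search(sql)}
```

For example, `detect_operations("SELECT DISTINCT a FROM t JOIN u ON t.id = u.id")` yields a set containing `"join"` and `"distinct"`, while a plain `SELECT a FROM t` yields an empty set.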

In addition, in step (2): if the data volume is greater than 100 million records and a join, group by, or distinct operation is present, hive is selected as the query engine to submit the SQL job; if the data volume is less than 10 million records, presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, presto is used as the query engine to submit the SQL job; in all cases other than these three, spark is used as the query engine to submit the SQL job.
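The selection rule in step (2) can be written as a small decision function. This is a minimal sketch: the name `select_engine` and its parameters are hypothetical, and the treatment of the exact boundary values (10 million and 100 million are counted as "between") is an assumption, since the text does not say whether the range is inclusive.

```python
TEN_MILLION = 10_000_000
HUNDRED_MILLION = 100_000_000


def select_engine(row_count: int, has_heavy_op: bool) -> str:
    """Pick a query engine from the data volume and the operations present.

    has_heavy_op is True when the statement contains a join, group by,
    or distinct operation.
    """
    # Case 1: very large data volume with a heavy operation -> hive.
    if row_count > HUNDRED_MILLION and has_heavy_op:
        return "hive"
    # Case 2: small data volume -> presto, whatever the operations.
    if row_count < TEN_MILLION:
        return "presto"
    # Case 3: medium data volume without heavy operations -> presto.
    # Inclusive bounds are an assumption of this sketch.
    if row_count <= HUNDRED_MILLION and not has_heavy_op:
        return "presto"
    # Everything else -> spark.
    return "spark"
```

Note that under this rule two combinations fall through to spark: a medium data volume with a heavy operation, and a data volume above 100 million records with no heavy operation.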

Those of ordinary skill in the art will appreciate that all or part of the steps of the method in the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method; the storage medium can be ROM/RAM, a magnetic disk, an optical disc, a memory card, and so on. Accordingly, corresponding to the method, the present invention also includes an SQL-based query system, which is generally expressed as functional modules corresponding to the steps of the method. The system using the method comprises: a data volume and computation mode identification module, configured to obtain the query plan of the SQL statement, determine from it the amount of data involved in the computation, and identify the computation methods; and a query engine intelligent matching module, configured to select one of hive, presto, and spark as the query engine according to the amount of data involved in the computation and the computation methods.

In addition, in the data volume and computation mode identification module, the query plan of the SQL statement is obtained through hive's explain command.

In addition, in the data volume and computation mode identification module, the computation methods include join operations, group by operations, and distinct operations.

In addition, in the query engine intelligent matching module: if the data volume is greater than 100 million records and a join, group by, or distinct operation is present, hive is selected as the query engine to submit the SQL job; if the data volume is less than 10 million records, presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, presto is used as the query engine to submit the SQL job; in all cases other than these three, spark is used as the query engine to submit the SQL job.
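One way to picture how the two modules fit together is sketched below. All class and method names (`QueryProfile`, `IdentificationModule`, `MatchingModule`, `QuerySystem`, `route`) are hypothetical. The identification module receives its profiling logic as an injected callable, so the sketch does not depend on a running hive instance, and the inclusive range boundaries in the matching module are an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueryProfile:
    row_count: int       # data volume involved in the computation
    has_heavy_op: bool   # join, group by, or distinct present


class IdentificationModule:
    """Data volume and computation mode identification module."""

    def __init__(self, profiler: Callable[[str], QueryProfile]) -> None:
        # The profiler stands in for "run explain and read the plan".
        self._profiler = profiler

    def identify(self, sql: str) -> QueryProfile:
        return self._profiler(sql)


class MatchingModule:
    """Query engine intelligent matching module."""

    def match(self, profile: QueryProfile) -> str:
        if profile.row_count > 100_000_000 and profile.has_heavy_op:
            return "hive"
        if profile.row_count < 10_000_000:
            return "presto"
        if profile.row_count <= 100_000_000 and not profile.has_heavy_op:
            return "presto"
        return "spark"


class QuerySystem:
    """Wires the two modules together and returns the chosen engine name."""

    def __init__(self, identifier: IdentificationModule, matcher: MatchingModule) -> None:
        self._identifier = identifier
        self._matcher = matcher

    def route(self, sql: str) -> str:
        return self._matcher.match(self._identifier.identify(sql))
```

Keeping identification and matching separate means the thresholds can be changed without touching the plan-reading code, and the plan-reading code can be replaced with a stub in tests.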

The beneficial effects of the present invention are as follows:

1. During data modeling, it can be applied to the large number of routine SQL tasks that process business data.

2. During data analysis, query results are returned quickly and correctly.

3. The strengths of each computing engine are fully exploited and their weaknesses avoided, so that SQL task submission becomes intelligent and the overall execution efficiency of SQL statements is maximized.

The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Any simple modification, equivalent change, or variation made to the above embodiment in accordance with the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims

1. An SQL-based query method, characterized in that the method comprises the following steps:

(1) obtaining the query plan of an SQL statement, determining from it the amount of data involved in the computation, and identifying the computation methods;

(2) selecting one of hive, presto, and spark as the query engine according to the amount of data involved in the computation and the computation methods.

2. The SQL-based query method according to claim 1, characterized in that in step (1), the query plan of the SQL statement is obtained through hive's explain command.

3. The SQL-based query method according to claim 2, characterized in that in step (1), the computation methods include join operations, group by operations, and distinct operations.

4. The SQL-based query method according to claim 3, characterized in that in step (2): if the data volume is greater than 100 million records and a join, group by, or distinct operation is present, hive is selected as the query engine to submit the SQL job; if the data volume is less than 10 million records, presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, presto is used as the query engine to submit the SQL job; in all cases other than these three, spark is used as the query engine to submit the SQL job.

5. An SQL-based query system, characterized in that the system comprises:

a data volume and computation mode identification module, configured to obtain the query plan of an SQL statement, determine from it the amount of data involved in the computation, and identify the computation methods; and

a query engine intelligent matching module, configured to select one of hive, presto, and spark as the query engine according to the amount of data involved in the computation and the computation methods.

6. The SQL-based query system according to claim 5, characterized in that in the data volume and computation mode identification module, the query plan of the SQL statement is obtained through hive's explain command.

7. The SQL-based query system according to claim 6, characterized in that in the data volume and computation mode identification module, the computation methods include join operations, group by operations, and distinct operations.

8. The SQL-based query system according to claim 7, characterized in that in the query engine intelligent matching module: if the data volume is greater than 100 million records and a join, group by, or distinct operation is present, hive is selected as the query engine to submit the SQL job; if the data volume is less than 10 million records, presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, presto is used as the query engine to submit the SQL job; in all cases other than these three, spark is used as the query engine to submit the SQL job.