CN106649503A - Query method and system based on sql - Google Patents
- Publication number
- CN106649503A (application CN201610887292.4A)
- Authority
- CN
- China
- Prior art keywords
- sql
- operations
- data volume
- query engine
- presto
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a SQL-based query method that improves query efficiency while ensuring query stability. The method comprises: (1) obtaining the query plan of a SQL statement, and from it determining the volume of data involved in the computation and the computation methods used; (2) selecting one of Hive, Presto, and Spark as the query engine according to that data volume and those computation methods. By automatically evaluating the computational complexity of SQL statements and taking into account the computing scenarios each engine is good at, the computing engine can be selected intelligently, which improves computing efficiency while ensuring query stability. The invention also provides a SQL-based query system.
Description
Technical field
The present invention relates to the technical field of big data processing, and more particularly to a SQL-based query method and a SQL-based query system.
Background
In the big data field, SQL-like languages are commonly used for big data computation in order to lower the barrier to using a cluster. SQL (Structured Query Language) is a special-purpose programming language for database query and programming, used to access data and to query, update, and manage relational database systems; sql is also the file extension of database script files. The mainstream query engines that currently support SQL include Hive, Presto, and Spark, among others.
Hive is a data warehouse tool based on Hadoop. It can map structured data files to database tables and provides SQL-like query functionality.
Presto is an open-source distributed SQL query engine suited to interactive analytical queries, supporting data volumes from gigabytes to petabytes.
Spark is a general-purpose parallel framework in the style of Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab (the AMP laboratory of the University of California, Berkeley). Spark has the advantages of Hadoop MapReduce, but unlike MapReduce it can keep the intermediate output of a job in memory, so it no longer needs to read and write HDFS. Spark is therefore better suited to algorithms that require iterative MapReduce, such as data mining and machine learning.
Typically, a SQL statement must be submitted to a predetermined query engine; that is, the query engine has to be specified manually. At the moment the statement is submitted, it is already fixed whether it will run on the Hive, Presto, or Spark query engine, and these three computing platforms are completely independent of one another.
The computational complexity of SQL statements varies: it depends on the volume of data the statement has to process and on the logical complexity of the statement itself. In practice, most SQL tasks are routine (recurring) tasks. Once a task has become routine, its complexity grows rapidly as the data volume grows, yet its execution engine cannot be adjusted dynamically. This reduces execution efficiency and can even cause the task to fail.
Summary of the invention
To overcome the shortcomings of the prior art, the technical problem addressed by the invention is to provide a SQL-based query method that improves query efficiency while ensuring query stability.
The technical solution of the invention is as follows. The SQL-based query method comprises the following steps:
(1) obtaining the query plan of a SQL statement, and from it determining the volume of data involved in the computation and the computation methods used;
(2) selecting one of Hive, Presto, and Spark as the query engine according to the volume of data involved in the computation and the computation methods.
By automatically evaluating the computational complexity of a SQL statement and taking into account the computing scenarios each engine is good at, the invention can select the computing engine intelligently, improving computing efficiency while ensuring query stability.
A SQL-based query system is also provided. The system includes:
A data volume and computation pattern identification module, configured to obtain the query plan of a SQL statement, and from it determine the volume of data involved in the computation and the computation methods used;
A query engine intelligent matching module, configured to select one of Hive, Presto, and Spark as the query engine according to the volume of data involved in the computation and the computation methods.
Description of the drawings
Fig. 1 is a flow chart of the SQL-based query method according to the invention.
Specific embodiments
The three query engines Hive, Presto, and Spark each have their own strengths:
1. Hive is very stable and supports batch computation over large data volumes without being error-prone, but its execution efficiency is relatively low when the data volume is small;
2. Presto is very lightweight and runs fully in memory. Its execution efficiency is very high when the data volume is small, but once the data volume exceeds a certain limit, execution can become abnormal or even fail because memory is limited;
3. Spark is also a query engine that runs fully in memory. Its execution performance lies between that of Hive and Presto, and it likewise becomes less stable when the data volume is extremely large.
This proposal evaluates the volume of data a SQL statement has to process and the logical complexity of the statement itself, and on that basis automatically selects the query engine for the statement, improving query efficiency while ensuring query stability.
As shown in Fig. 1, the SQL-based query method comprises the following steps:
(1) obtaining the query plan of a SQL statement, and from it determining the volume of data involved in the computation and the computation methods used;
(2) selecting one of Hive, Presto, and Spark as the query engine according to the volume of data involved in the computation and the computation methods.
By automatically evaluating the computational complexity of a SQL statement and taking into account the computing scenarios each engine is good at, the invention can select the computing engine intelligently, improving computing efficiency while ensuring query stability.
In addition, in step (1), the query plan of the SQL statement is obtained through Hive's EXPLAIN.
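A minimal sketch of this step is given below, under stated assumptions. It assumes the plan text returned by Hive's EXPLAIN annotates each table scan with a statistics line containing `Num rows: N`; the helper names `explain_statement` and `scanned_rows` and the sample plan text are hypothetical and simplified, and the patent does not specify how the data volume is read from the plan.

```python
import re

# Matches a row estimate such as "Num rows: 5000000" in a plan line.
_NUM_ROWS = re.compile(r"Num rows:\s*(\d+)")


def explain_statement(sql: str) -> str:
    """Wrap a statement so that the engine returns its plan instead of running it."""
    return "EXPLAIN " + sql.strip().rstrip(";")


def scanned_rows(plan_text: str) -> int:
    """Sum the first row estimate that follows each TableScan in the plan text.

    Assumption: the row estimate of a scan appears on a later line than
    the TableScan marker and before the next operator's statistics.
    """
    total = 0
    waiting_for_stats = False
    for line in plan_text.splitlines():
        if "TableScan" in line:
            waiting_for_stats = True
            continue
        if waiting_for_stats:
            match = _NUM_ROWS.search(line)
            if match:
                total += int(match.group(1))
                waiting_for_stats = False
    return total


# Hypothetical, heavily simplified plan text used only for illustration.
SAMPLE_PLAN = """
TableScan
  alias: orders
  Statistics: Num rows: 120000000 Data size: 960000000
  Join Operator
    Statistics: Num rows: 132000000 Data size: 1056000000
TableScan
  alias: users
  Statistics: Num rows: 5000000 Data size: 40000000
"""

if __name__ == "__main__":
    print(explain_statement("SELECT * FROM orders;"))
    print(scanned_rows(SAMPLE_PLAN))
```

Summing only the scan estimates is one possible reading of "data volume involved in the computation"; other readings, such as the largest single input, would fit the text equally well.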
In addition, in step (1), the computation methods include join operations, group by operations, and distinct operations.
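The three computation methods can be illustrated with a small detector. This is a simplification and an assumption: it scans the statement text for keywords rather than walking the query plan as the method describes, and the function name `detect_operations` is hypothetical. Keywords inside string literals or comments would produce false positives.

```python
import re

# Keyword patterns for the three computation methods named in the text.
_PATTERNS = {
    "join": re.compile(r"\bjoin\b", re.IGNORECASE),
    "group_by": re.compile(r"\bgroup\s+by\b", re.IGNORECASE),
    "distinct": re.compile(r"\bdistinct\b", re.IGNORECASE),
}


def detect_operations(sql: str) -> set:
    """Return the names of the computation methods that appear in the statement."""
    return {name for name, pattern in _PATTERNS.items() if pattern.search(sql)}


if __name__ == "__main__":
    print(sorted(detect_operations(
        "SELECT DISTINCT u.id FROM users u JOIN orders o ON u.id = o.user_id"
    )))
```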
In addition, in step (2): if the data volume exceeds 100 million records and a join, group by, or distinct operation is present, Hive is selected as the query engine to submit the SQL operation; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL operation; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, Presto is used as the query engine to submit the SQL operation; in all cases other than these three, Spark is used as the query engine to submit the SQL operation.
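The selection rule in step (2) can be sketched as a single function. The function name `select_engine` and its parameters are hypothetical, and treating the boundaries of the middle range (exactly 10 million and exactly 100 million records) as inclusive is an assumption, since the text only says "between".

```python
TEN_MILLION = 10_000_000
HUNDRED_MILLION = 100_000_000


def select_engine(row_count: int,
                  has_join: bool = False,
                  has_group_by: bool = False,
                  has_distinct: bool = False) -> str:
    """Pick a query engine from the data volume and the operations present."""
    heavy = has_join or has_group_by or has_distinct

    # Case 1: very large input with a join, group by, or distinct.
    if row_count > HUNDRED_MILLION and heavy:
        return "hive"
    # Case 2: small input, regardless of the operations present.
    if row_count < TEN_MILLION:
        return "presto"
    # Case 3: medium input without any of the three operations
    # (inclusive bounds are an assumption).
    if TEN_MILLION <= row_count <= HUNDRED_MILLION and not heavy:
        return "presto"
    # Everything else, including very large inputs without heavy operations.
    return "spark"


if __name__ == "__main__":
    print(select_engine(200_000_000, has_join=True))
    print(select_engine(50_000_000, has_group_by=True))
```

Read literally, the rule sends a statement over more than 100 million records with no join, group by, or distinct operation to Spark, because it matches none of the three named cases.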
Those of ordinary skill in the art will appreciate that all or part of the steps of the method of the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiment; the storage medium may be a ROM/RAM, magnetic disk, optical disc, memory card, or the like. Accordingly, corresponding to the method of the invention, the invention also includes a SQL-based query system, which is generally expressed as functional modules corresponding to the steps of the method. A system using the method includes:
A data volume and computation pattern identification module, configured to obtain the query plan of a SQL statement, and from it determine the volume of data involved in the computation and the computation methods used;
A query engine intelligent matching module, configured to select one of Hive, Presto, and Spark as the query engine according to the volume of data involved in the computation and the computation methods.
In addition, in the data volume and computation pattern identification module, the query plan of the SQL statement is obtained through Hive's EXPLAIN.
In addition, in the data volume and computation pattern identification module, the computation methods include join operations, group by operations, and distinct operations.
In addition, in the query engine intelligent matching module: if the data volume exceeds 100 million records and a join, group by, or distinct operation is present, Hive is selected as the query engine to submit the SQL operation; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL operation; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, Presto is used as the query engine to submit the SQL operation; in all cases other than these three, Spark is used as the query engine to submit the SQL operation.
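The two-module structure of the system can be sketched as follows. All class and function names (`QueryProfile`, `EngineMatcher`, `QuerySystem`, `fake_identify`) are hypothetical, the identification module is replaced by a stand-in that returns a fixed data volume, and expressing the matching module as an ordered rule table is a design choice of this sketch rather than something the patent prescribes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class QueryProfile:
    """Output of the identification module."""
    row_count: int
    heavy_ops: bool  # True if a join, group by, or distinct is present


Rule = Tuple[Callable[[QueryProfile], bool], str]

# Ordered rules: the first matching predicate decides the engine.
RULES: List[Rule] = [
    (lambda p: p.row_count > 100_000_000 and p.heavy_ops, "hive"),
    (lambda p: p.row_count < 10_000_000, "presto"),
    # Reached only when row_count >= 10 million (inclusive bound assumed).
    (lambda p: p.row_count <= 100_000_000 and not p.heavy_ops, "presto"),
]
DEFAULT_ENGINE = "spark"


class EngineMatcher:
    """Query engine matching module."""

    def __init__(self, rules: List[Rule] = RULES, default: str = DEFAULT_ENGINE):
        self._rules = rules
        self._default = default

    def match(self, profile: QueryProfile) -> str:
        for predicate, engine in self._rules:
            if predicate(profile):
                return engine
        return self._default


class QuerySystem:
    """Wires the identification module to the matching module."""

    def __init__(self, identify: Callable[[str], QueryProfile], matcher: EngineMatcher):
        self._identify = identify
        self._matcher = matcher

    def route(self, sql: str) -> str:
        return self._matcher.match(self._identify(sql))


def fake_identify(sql: str) -> QueryProfile:
    """Stand-in identification module: fixed volume, keyword check for joins."""
    return QueryProfile(row_count=50_000_000, heavy_ops="join" in sql.lower())


if __name__ == "__main__":
    system = QuerySystem(fake_identify, EngineMatcher())
    print(system.route("SELECT a FROM t"))
    print(system.route("SELECT * FROM a JOIN b ON a.id = b.id"))
```

Keeping the thresholds in a rule table makes them adjustable without touching the routing code, which matters if the record limits need tuning for a particular cluster.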
The beneficial effects of the invention are as follows:
1. During data modeling, a large number of routine SQL tasks are used to process business data.
2. During data analysis, query results can be returned quickly and correctly.
3. The strengths of each computing engine are fully exploited and their weaknesses avoided, so that SQL tasks are submitted intelligently and the overall efficiency of SQL statement execution is highest.
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Any simple modification, equivalent change, or variation made to the above embodiment in accordance with the technical essence of the invention still falls within the scope of protection of the technical solution of the invention.
Claims (8)
1. A SQL-based query method, characterized in that the method comprises the following steps:
(1) obtaining the query plan of a SQL statement, and from it determining the volume of data involved in the computation and the computation methods used;
(2) selecting one of Hive, Presto, and Spark as the query engine according to the volume of data involved in the computation and the computation methods.
2. The SQL-based query method according to claim 1, characterized in that in step (1) the query plan of the SQL statement is obtained through Hive's EXPLAIN.
3. The SQL-based query method according to claim 2, characterized in that in step (1) the computation methods include join operations, group by operations, and distinct operations.
4. The SQL-based query method according to claim 3, characterized in that in step (2): if the data volume exceeds 100 million records and a join, group by, or distinct operation is present, Hive is selected as the query engine to submit the SQL operation; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL operation; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, Presto is used as the query engine to submit the SQL operation; in all cases other than these three, Spark is used as the query engine to submit the SQL operation.
5. A SQL-based query system, characterized in that the system includes:
a data volume and computation pattern identification module, configured to obtain the query plan of a SQL statement, and from it determine the volume of data involved in the computation and the computation methods used;
a query engine intelligent matching module, configured to select one of Hive, Presto, and Spark as the query engine according to the volume of data involved in the computation and the computation methods.
6. The SQL-based query system according to claim 5, characterized in that in the data volume and computation pattern identification module the query plan of the SQL statement is obtained through Hive's EXPLAIN.
7. The SQL-based query system according to claim 6, characterized in that in the data volume and computation pattern identification module the computation methods include join operations, group by operations, and distinct operations.
8. The SQL-based query system according to claim 7, characterized in that in the query engine intelligent matching module: if the data volume exceeds 100 million records and a join, group by, or distinct operation is present, Hive is selected as the query engine to submit the SQL operation; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL operation; if the data volume is between 10 million and 100 million records and no join, group by, or distinct operation is present, Presto is used as the query engine to submit the SQL operation; in all cases other than these three, Spark is used as the query engine to submit the SQL operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610887292.4A CN106649503A (en) | 2016-10-11 | 2016-10-11 | Query method and system based on sql |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610887292.4A CN106649503A (en) | 2016-10-11 | 2016-10-11 | Query method and system based on sql |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649503A (en) | 2017-05-10 |
Family ID: 58855154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610887292.4A Pending CN106649503A (en) | 2016-10-11 | 2016-10-11 | Query method and system based on sql |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649503A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239548A (en) * | 2017-06-05 | 2017-10-10 | 携程旅游网络技术(上海)有限公司 | Report processing method based on SQL Server and HIVE |
CN107609130A (en) * | 2017-09-18 | 2018-01-19 | 链家网(北京)科技有限公司 | A kind of method and server for selecting data query engine |
CN108170860A (en) * | 2018-01-22 | 2018-06-15 | 北京小度信息科技有限公司 | Data query method, apparatus, electronic equipment and computer readable storage medium |
CN108363819A (en) * | 2018-03-23 | 2018-08-03 | 联想(北京)有限公司 | Query engine matching method, device, server group and readable storage medium storing program for executing |
CN108549683A (en) * | 2018-04-03 | 2018-09-18 | 联想(北京)有限公司 | data query method and system |
CN108985367A (en) * | 2018-07-06 | 2018-12-11 | 中国科学院计算技术研究所 | Computing engines selection method and more computing engines platforms based on this method |
CN109033123A (en) * | 2018-05-31 | 2018-12-18 | 康键信息技术(深圳)有限公司 | Querying method, device, computer equipment and storage medium based on big data |
CN109325042A (en) * | 2018-08-14 | 2019-02-12 | 中国平安人寿保险股份有限公司 | Handle template acquisition methods, form processing method, device, equipment and medium |
CN109426983A (en) * | 2017-09-01 | 2019-03-05 | 北京京东尚科信息技术有限公司 | Dodge purchase activity automatic generation method and device, storage medium, electronic equipment |
CN109446395A (en) * | 2018-09-29 | 2019-03-08 | 上海派博软件有限公司 | A kind of method and system of the raising based on Hadoop big data comprehensive inquiry engine efficiency |
CN109684399A (en) * | 2018-12-24 | 2019-04-26 | 成都四方伟业软件股份有限公司 | Data bank access method, database access device and Data Analysis Platform |
CN109783511A (en) * | 2018-12-07 | 2019-05-21 | 成都四方伟业软件股份有限公司 | A kind of intelligence multi engine data query system and querying method |
CN110807145A (en) * | 2018-07-20 | 2020-02-18 | 中兴通讯股份有限公司 | Query engine acquisition method, device and computer-readable storage medium |
CN112307063A (en) * | 2020-10-16 | 2021-02-02 | 银盛支付服务股份有限公司 | Method and system for checking data quality of each platform by metadata |
CN112988782A (en) * | 2021-02-18 | 2021-06-18 | 新华三大数据技术有限公司 | Hive-supported interactive query method and device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729453A (en) * | 2014-01-02 | 2014-04-16 | 浪潮电子信息产业股份有限公司 | HBase table conjunctive query optimization method |
CN105426467A (en) * | 2015-11-16 | 2016-03-23 | 北京京东尚科信息技术有限公司 | SQL query method and system for Presto |
CN105787119A (en) * | 2016-03-25 | 2016-07-20 | 盛趣信息技术(上海)有限公司 | Hybrid engine based big data processing method and system |
CN105824957A (en) * | 2016-03-30 | 2016-08-03 | 电子科技大学 | Query engine system and query method of distributive memory column-oriented database |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239548A (en) * | 2017-06-05 | 2017-10-10 | 携程旅游网络技术(上海)有限公司 | Report processing method based on SQL Server and HIVE |
CN107239548B (en) * | 2017-06-05 | 2020-04-07 | 携程旅游网络技术(上海)有限公司 | Report processing method based on SQL Server and HIVE |
CN109426983A (en) * | 2017-09-01 | 2019-03-05 | 北京京东尚科信息技术有限公司 | Dodge purchase activity automatic generation method and device, storage medium, electronic equipment |
CN107609130A (en) * | 2017-09-18 | 2018-01-19 | 链家网(北京)科技有限公司 | A kind of method and server for selecting data query engine |
CN108170860A (en) * | 2018-01-22 | 2018-06-15 | 北京小度信息科技有限公司 | Data query method, apparatus, electronic equipment and computer readable storage medium |
CN108363819A (en) * | 2018-03-23 | 2018-08-03 | 联想(北京)有限公司 | Query engine matching method, device, server group and readable storage medium storing program for executing |
CN108549683A (en) * | 2018-04-03 | 2018-09-18 | 联想(北京)有限公司 | data query method and system |
CN109033123A (en) * | 2018-05-31 | 2018-12-18 | 康键信息技术(深圳)有限公司 | Querying method, device, computer equipment and storage medium based on big data |
CN109033123B (en) * | 2018-05-31 | 2023-09-22 | 康键信息技术(深圳)有限公司 | Big data-based query method and device, computer equipment and storage medium |
CN108985367A (en) * | 2018-07-06 | 2018-12-11 | 中国科学院计算技术研究所 | Computing engines selection method and more computing engines platforms based on this method |
CN110807145A (en) * | 2018-07-20 | 2020-02-18 | 中兴通讯股份有限公司 | Query engine acquisition method, device and computer-readable storage medium |
CN109325042A (en) * | 2018-08-14 | 2019-02-12 | 中国平安人寿保险股份有限公司 | Handle template acquisition methods, form processing method, device, equipment and medium |
CN109325042B (en) * | 2018-08-14 | 2023-11-24 | 中国平安人寿保险股份有限公司 | Processing template acquisition method, form processing method, device, equipment and medium |
CN109446395A (en) * | 2018-09-29 | 2019-03-08 | 上海派博软件有限公司 | A kind of method and system of the raising based on Hadoop big data comprehensive inquiry engine efficiency |
CN109783511A (en) * | 2018-12-07 | 2019-05-21 | 成都四方伟业软件股份有限公司 | A kind of intelligence multi engine data query system and querying method |
CN109684399A (en) * | 2018-12-24 | 2019-04-26 | 成都四方伟业软件股份有限公司 | Data bank access method, database access device and Data Analysis Platform |
CN112307063A (en) * | 2020-10-16 | 2021-02-02 | 银盛支付服务股份有限公司 | Method and system for checking data quality of each platform by metadata |
CN112988782A (en) * | 2021-02-18 | 2021-06-18 | 新华三大数据技术有限公司 | Hive-supported interactive query method and device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106649503A (en) | Query method and system based on sql | |
CN109977110B (en) | Data cleaning method, device and equipment | |
US9870382B2 (en) | Data encoding and corresponding data structure | |
CN109739939A (en) | The data fusion method and device of knowledge mapping | |
CN105550225A (en) | Index construction method and query method and apparatus | |
CN111241840A (en) | Named entity identification method based on knowledge graph | |
US20180357298A1 (en) | Performance of Distributed Databases and Database-Dependent Software Applications | |
CN111782826A (en) | Knowledge graph information processing method, device, equipment and storage medium | |
CN112883030A (en) | Data collection method and device, computer equipment and storage medium | |
CN108228787A (en) | According to the method and apparatus of multistage classification processing information | |
US10599614B1 (en) | Intersection-based dynamic blocking | |
CN113918605A (en) | Data query method, device, equipment and computer storage medium | |
CN115905630A (en) | Graph database query method, device, equipment and storage medium | |
CN108255852B (en) | SQL execution method and device | |
CN109656947B (en) | Data query method and device, computer equipment and storage medium | |
CN113254624B (en) | Intelligent question-answering processing method, device, equipment and medium based on artificial intelligence | |
CN117807091A (en) | Data association method and device | |
WO2016119508A1 (en) | Method for recognizing large-scale objects based on spark system | |
CN114090722B (en) | Method and device for automatically completing query content | |
CN108268620A (en) | A kind of Document Classification Method based on hadoop data minings | |
CN114780700A (en) | Intelligent question-answering method, device, equipment and medium based on machine reading understanding | |
Casals et al. | SPARQL query execution time prediction using Deep Learning | |
CN111399838A (en) | Data modeling method and device based on Spark SQL and materialized view | |
CN116755683B (en) | Data processing method and related device | |
JP7443649B2 (en) | Model update method, device, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |