CN106649503A - Query method and system based on sql - Google Patents

Info

Publication number
CN106649503A
Authority
CN
China
Prior art keywords
sql
operations
data volume
query engine
presto
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610887292.4A
Other languages
Chinese (zh)
Inventor
温宗臣
张翼
何良均
范卫卫
冯森林
李冰
曾攀
严亮
张书凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-10-11
Filing date
2016-10-11
Publication date
2017-05-10
Application filed by BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority to CN201610887292.4A
Publication of CN106649503A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an SQL-based query method that improves query efficiency while ensuring query stability. The method comprises the steps of: (1) obtaining the query plan of an SQL statement, determining from it the volume of data involved in the computation, and identifying the computation methods used; and (2) selecting one of Hive, Presto and Spark as the query engine according to that data volume and those computation methods. By automatically assessing the computational complexity of SQL statements and taking into account the computing scenarios each engine handles well, the method selects a computing engine intelligently, improving computing efficiency while ensuring query stability. The invention also provides an SQL-based query system.

Description

An SQL-based query method and system
Technical field
The present invention relates to the technical field of big data processing, and more particularly to an SQL-based query method and an SQL-based query system.
Background art
In the big data field, SQL-like languages are commonly used for big data computation in order to lower the barrier to using a cluster. (SQL, Structured Query Language, is a special-purpose database query and programming language used to access, query, update and manage relational database systems; .sql is also the file extension of database script files.) The mainstream query engines that currently support SQL include Hive, Presto and Spark. Hive is a data warehouse tool built on Hadoop that maps structured data files to database tables and provides SQL-like query capability. Presto is an open-source distributed SQL query engine suited to interactive analytic queries over data volumes from gigabytes to petabytes. Spark is a general-purpose parallel framework in the style of Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. Spark has the advantages of Hadoop MapReduce, but unlike MapReduce it can keep the intermediate output of a job in memory and so no longer needs to read and write HDFS, which makes it better suited to algorithms that require iterative MapReduce, such as data mining and machine learning.
Normally, an SQL statement must be submitted to a fixed query engine; that is, the engine has to be specified manually. At submission time it is already decided whether the statement will run on the Hive query engine or on the Presto or Spark query engine, and the three computing platforms are completely independent of one another.
The computational complexity of each SQL statement is different. It depends on the size of the data volume the statement has to process and on the logical complexity of the statement itself. In practice, most SQL tasks are routine (regularly scheduled) tasks. Once a task has been made routine, its complexity grows rapidly as the data volume grows, yet its execution engine cannot be adjusted dynamically. This reduces execution efficiency and can even cause the task to fail.
Summary of the invention
To overcome the defects of the prior art, the technical problem to be solved by the present invention is to provide an SQL-based query method that improves query efficiency while ensuring query stability.
The technical solution of the present invention is an SQL-based query method comprising the following steps:
(1) obtaining the query plan of an SQL statement, determining from it the data volume involved in the computation, and identifying the computation methods;
(2) selecting one of Hive, Presto and Spark as the query engine according to the data volume involved in the computation and the computation methods.
By automatically assessing the computational complexity of SQL statements and taking into account the computing scenarios each computing engine handles well, the present invention selects a computing engine intelligently, improving computing efficiency while ensuring query stability.
An SQL-based query system is also provided, the system comprising:
a data volume and computation mode identification module, configured to obtain the query plan of an SQL statement, determine from it the data volume involved in the computation, and identify the computation methods;
a query engine intelligent matching module, configured to select one of Hive, Presto and Spark as the query engine according to the data volume involved in the computation and the computation methods.
Brief description of the drawings
Fig. 1 is a flow chart of the SQL-based query method according to the present invention.
Detailed description of the embodiments
The three query engines Hive, Presto and Spark each have their own strengths:
1. Hive is very stable: it supports batch computation over large data volumes and rarely fails, but its execution efficiency is relatively low when the data volume is small;
2. Presto is very lightweight and runs entirely in memory: its execution efficiency is very high when the data volume is small, but once the data volume exceeds a certain limit, execution can become abnormal or even fail because memory is limited;
3. Spark is also a query engine that runs entirely in memory: its execution performance lies between that of Hive and Presto, and it likewise becomes less stable when the data volume is very large.
This proposal evaluates the data volume an SQL statement has to process and the logical complexity of the statement itself, and automatically selects the query engine for the statement, improving query efficiency while ensuring query stability.
As shown in Fig. 1, the SQL-based query method comprises the following steps:
(1) obtaining the query plan of an SQL statement, determining from it the data volume involved in the computation, and identifying the computation methods;
(2) selecting one of Hive, Presto and Spark as the query engine according to the data volume involved in the computation and the computation methods.
By automatically assessing the computational complexity of SQL statements and taking into account the computing scenarios each computing engine handles well, the present invention selects a computing engine intelligently, improving computing efficiency while ensuring query stability.
In addition, in step (1), the query plan of the SQL statement is obtained through Hive's explain.
In addition, in step (1), the computation methods include join operations, group by operations and distinct operations.
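Step (1) can be illustrated with a minimal Python sketch that scans the text of a query plan for a row estimate and for the three computation methods. The description does not say how the plan is parsed, so everything here is an assumption: the names `PlanSummary` and `summarize_plan` are hypothetical, the plan is assumed to contain operator names such as "Join Operator" and "Group By Operator" and statistics of the form "Num rows: N", and the largest row estimate is taken as the data volume. The sample plan is hand-written for illustration, and the exact layout of real explain output varies between Hive versions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanSummary:
    rows: int           # assumed proxy for the data volume: largest row estimate in the plan
    has_join: bool
    has_group_by: bool
    has_distinct: bool


# Assumption: row estimates appear in the plan text as "Num rows: <integer>".
_ROWS = re.compile(r"Num rows:\s*(\d+)")


def summarize_plan(plan_text: str) -> PlanSummary:
    """Extract a data-volume estimate and the computation methods from plan text."""
    rows = [int(n) for n in _ROWS.findall(plan_text)]
    lowered = plan_text.lower()
    return PlanSummary(
        rows=max(rows, default=0),                      # 0 when no estimate is present
        has_join="join operator" in lowered,            # also matches "Map Join Operator"
        has_group_by="group by operator" in lowered,
        has_distinct="distinct" in lowered,             # e.g. "count(DISTINCT user_id)"
    )


# Hand-written, illustrative plan text; not captured from a real cluster.
SAMPLE_PLAN = """
Map Operator Tree:
    TableScan
      alias: orders
      Statistics: Num rows: 120000000 Data size: 960000000
      Group By Operator
        aggregations: count(DISTINCT user_id)
"""

if __name__ == "__main__":
    print(summarize_plan(SAMPLE_PLAN))
```

A production implementation would more likely read structured plan output or table statistics than match substrings; the substring checks are only the simplest way to show the idea.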
In addition, in step (2): if the data volume exceeds 100 million records and a join, group by or distinct operation is present, Hive is selected as the query engine to submit the SQL job; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by or distinct operation is present, Presto is used as the query engine to submit the SQL job; in all cases other than these three, Spark is used as the query engine to submit the SQL job.
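The selection rule of step (2) can be written as a small decision function. This is a sketch under stated assumptions rather than the applicant's implementation: the name `select_engine` and its parameters are hypothetical, the data volume is assumed to be a record count, and the range "between 10 million and 100 million" is treated as inclusive at both ends, because the description does not say which engine applies at exactly 10 million or 100 million records.

```python
TEN_MILLION = 10_000_000
HUNDRED_MILLION = 100_000_000


def select_engine(rows: int, has_heavy_op: bool) -> str:
    """Return "hive", "presto" or "spark" for one SQL statement.

    rows          -- data volume involved in the computation, in records
    has_heavy_op  -- True if the plan contains a join, group by or distinct operation
    """
    # Case 1: more than 100 million records with a join / group by / distinct.
    if rows > HUNDRED_MILLION and has_heavy_op:
        return "hive"
    # Case 2: fewer than 10 million records, regardless of the operations.
    if rows < TEN_MILLION:
        return "presto"
    # Case 3: 10 million to 100 million records (assumed inclusive), no heavy operation.
    if rows <= HUNDRED_MILLION and not has_heavy_op:
        return "presto"
    # Every remaining case.
    return "spark"


if __name__ == "__main__":
    for rows, heavy in [(5_000_000, True), (50_000_000, False),
                        (50_000_000, True), (200_000_000, True),
                        (200_000_000, False)]:
        print(rows, heavy, select_engine(rows, heavy))
```

Read this way, Spark receives two groups of statements: mid-sized statements that contain a join, group by or distinct operation, and statements over more than 100 million records that contain none of them.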
A person of ordinary skill in the art will appreciate that all or part of the steps of the method of the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiment; the storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card or the like. Accordingly, corresponding to the method of the present invention, the present invention also includes an SQL-based query system, which is generally expressed as functional modules corresponding to the steps of the method. A system using the method comprises:
a data volume and computation mode identification module, configured to obtain the query plan of an SQL statement, determine from it the data volume involved in the computation, and identify the computation methods;
a query engine intelligent matching module, configured to select one of Hive, Presto and Spark as the query engine according to the data volume involved in the computation and the computation methods.
In addition, in the data volume and computation mode identification module, the query plan of the SQL statement is obtained through Hive's explain.
In addition, in the data volume and computation mode identification module, the computation methods include join operations, group by operations and distinct operations.
In addition, in the query engine intelligent matching module: if the data volume exceeds 100 million records and a join, group by or distinct operation is present, Hive is selected as the query engine to submit the SQL job; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by or distinct operation is present, Presto is used as the query engine to submit the SQL job; in all cases other than these three, Spark is used as the query engine to submit the SQL job.
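The two-module structure described above could be wired together roughly as follows. This is an illustrative sketch only: the class names, the `analyze_plan` hook that stands in for a call to Hive's explain, and the `submitters` mapping from engine name to a submit function are all hypothetical, and no real Hive, Presto or Spark client API is used. The 10 million to 100 million range is again assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Workload:
    """What the identification module reports about one SQL statement."""
    rows: int    # data volume involved in the computation, in records
    heavy: bool  # True if a join, group by or distinct operation is present


class IdentificationModule:
    """Data volume and computation mode identification module."""

    def __init__(self, analyze_plan: Callable[[str], Workload]) -> None:
        # analyze_plan is a hypothetical hook that would wrap Hive's explain.
        self._analyze_plan = analyze_plan

    def identify(self, sql: str) -> Workload:
        return self._analyze_plan(sql)


class MatchingModule:
    """Query engine intelligent matching module."""

    def match(self, w: Workload) -> str:
        if w.rows > 100_000_000 and w.heavy:
            return "hive"
        if w.rows < 10_000_000 or (w.rows <= 100_000_000 and not w.heavy):
            return "presto"
        return "spark"


class QuerySystem:
    """Runs identification, then matching, then hands the SQL to the chosen engine."""

    def __init__(self, identification: IdentificationModule, matching: MatchingModule,
                 submitters: Dict[str, Callable[[str], object]]) -> None:
        self._identification = identification
        self._matching = matching
        self._submitters = submitters  # engine name -> hypothetical submit function

    def submit(self, sql: str) -> Tuple[str, object]:
        workload = self._identification.identify(sql)
        engine = self._matching.match(workload)
        return engine, self._submitters[engine](sql)
```

Injecting the plan analysis and the submit functions keeps the matching rule testable without a running cluster, which is one reasonable way to realize the functional modules the description names.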
The beneficial effects of the present invention are as follows:
1. The invention applies to data modeling, in which a large number of routine SQL tasks are used to process business data.
2. In data analysis, query results are returned quickly and correctly.
3. The strengths of each computing engine are fully exploited and the weaknesses of each are avoided, so that SQL task submission becomes intelligent and the overall execution efficiency of SQL statements is the highest.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any simple modification, equivalent change or variation made to the above embodiment in accordance with the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (8)

1. An SQL-based query method, characterized in that the method comprises the following steps:
(1) obtaining the query plan of an SQL statement, determining from it the data volume involved in the computation, and identifying the computation methods;
(2) selecting one of Hive, Presto and Spark as the query engine according to the data volume involved in the computation and the computation methods.
2. The SQL-based query method according to claim 1, characterized in that in step (1), the query plan of the SQL statement is obtained through Hive's explain.
3. The SQL-based query method according to claim 2, characterized in that in step (1), the computation methods include join operations, group by operations and distinct operations.
4. The SQL-based query method according to claim 3, characterized in that in step (2): if the data volume exceeds 100 million records and a join, group by or distinct operation is present, Hive is selected as the query engine to submit the SQL job; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by or distinct operation is present, Presto is used as the query engine to submit the SQL job; in all cases other than these three, Spark is used as the query engine to submit the SQL job.
5. An SQL-based query system, characterized in that the system comprises:
a data volume and computation mode identification module, configured to obtain the query plan of an SQL statement, determine from it the data volume involved in the computation, and identify the computation methods;
a query engine intelligent matching module, configured to select one of Hive, Presto and Spark as the query engine according to the data volume involved in the computation and the computation methods.
6. The SQL-based query system according to claim 5, characterized in that in the data volume and computation mode identification module, the query plan of the SQL statement is obtained through Hive's explain.
7. The SQL-based query system according to claim 6, characterized in that in the data volume and computation mode identification module, the computation methods include join operations, group by operations and distinct operations.
8. The SQL-based query system according to claim 7, characterized in that in the query engine intelligent matching module: if the data volume exceeds 100 million records and a join, group by or distinct operation is present, Hive is selected as the query engine to submit the SQL job; if the data volume is below 10 million records, Presto is used as the query engine to submit the SQL job; if the data volume is between 10 million and 100 million records and no join, group by or distinct operation is present, Presto is used as the query engine to submit the SQL job; in all cases other than these three, Spark is used as the query engine to submit the SQL job.
CN201610887292.4A 2016-10-11 2016-10-11 Query method and system based on sql Pending CN106649503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610887292.4A CN106649503A (en) 2016-10-11 2016-10-11 Query method and system based on sql

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610887292.4A CN106649503A (en) 2016-10-11 2016-10-11 Query method and system based on sql

Publications (1)

Publication Number Publication Date
CN106649503A true CN106649503A (en) 2017-05-10

Family

ID=58855154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610887292.4A Pending CN106649503A (en) 2016-10-11 2016-10-11 Query method and system based on sql

Country Status (1)

Country Link
CN (1) CN106649503A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729453A (en) * 2014-01-02 2014-04-16 浪潮电子信息产业股份有限公司 HBase table conjunctive query optimization method
CN105426467A (en) * 2015-11-16 2016-03-23 北京京东尚科信息技术有限公司 SQL query method and system for Presto
CN105787119A (en) * 2016-03-25 2016-07-20 盛趣信息技术(上海)有限公司 Hybrid engine based big data processing method and system
CN105824957A (en) * 2016-03-30 2016-08-03 电子科技大学 Query engine system and query method of distributive memory column-oriented database

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239548A (en) * 2017-06-05 2017-10-10 携程旅游网络技术(上海)有限公司 Report processing method based on SQL Server and HIVE
CN107239548B (en) * 2017-06-05 2020-04-07 携程旅游网络技术(上海)有限公司 Report processing method based on SQL Server and HIVE
CN109426983A (en) * 2017-09-01 2019-03-05 北京京东尚科信息技术有限公司 Dodge purchase activity automatic generation method and device, storage medium, electronic equipment
CN107609130A (en) * 2017-09-18 2018-01-19 链家网(北京)科技有限公司 A kind of method and server for selecting data query engine
CN108170860A (en) * 2018-01-22 2018-06-15 北京小度信息科技有限公司 Data query method, apparatus, electronic equipment and computer readable storage medium
CN108363819A (en) * 2018-03-23 2018-08-03 联想(北京)有限公司 Query engine matching method, device, server group and readable storage medium storing program for executing
CN108549683A (en) * 2018-04-03 2018-09-18 联想(北京)有限公司 data query method and system
CN109033123A (en) * 2018-05-31 2018-12-18 康键信息技术(深圳)有限公司 Querying method, device, computer equipment and storage medium based on big data
CN109033123B (en) * 2018-05-31 2023-09-22 康键信息技术(深圳)有限公司 Big data-based query method and device, computer equipment and storage medium
CN108985367A (en) * 2018-07-06 2018-12-11 中国科学院计算技术研究所 Computing engines selection method and more computing engines platforms based on this method
CN110807145A (en) * 2018-07-20 2020-02-18 中兴通讯股份有限公司 Query engine acquisition method, device and computer-readable storage medium
CN109325042A (en) * 2018-08-14 2019-02-12 中国平安人寿保险股份有限公司 Handle template acquisition methods, form processing method, device, equipment and medium
CN109325042B (en) * 2018-08-14 2023-11-24 中国平安人寿保险股份有限公司 Processing template acquisition method, form processing method, device, equipment and medium
CN109446395A (en) * 2018-09-29 2019-03-08 上海派博软件有限公司 A kind of method and system of the raising based on Hadoop big data comprehensive inquiry engine efficiency
CN109783511A (en) * 2018-12-07 2019-05-21 成都四方伟业软件股份有限公司 A kind of intelligence multi engine data query system and querying method
CN109684399A (en) * 2018-12-24 2019-04-26 成都四方伟业软件股份有限公司 Data bank access method, database access device and Data Analysis Platform
CN112307063A (en) * 2020-10-16 2021-02-02 银盛支付服务股份有限公司 Method and system for checking data quality of each platform by metadata
CN112988782A (en) * 2021-02-18 2021-06-18 新华三大数据技术有限公司 Hive-supported interactive query method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170510)