CN104391895A - SQL (Structured Query Language) sentence processing system based on cloud computing - Google Patents

Info

Publication number
CN104391895A
Authority
CN
China
Prior art keywords
atomic
data
sql statement
adapter
atomic object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410636239.8A
Other languages
Chinese (zh)
Inventor
别志铭
张健明
张勇鹏
王旭
王礼
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DINGLI COMMUNICATIONS CORP Ltd
Original Assignee
DINGLI COMMUNICATIONS CORP Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DINGLI COMMUNICATIONS CORP Ltd filed Critical DINGLI COMMUNICATIONS CORP Ltd
Priority to CN201410636239.8A priority Critical patent/CN104391895A/en
Publication of CN104391895A publication Critical patent/CN104391895A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an SQL (Structured Query Language) statement processing system based on cloud computing. The system comprises a compiling resolver, a merging optimizer, an executive monitor and a data adapter. The compiling resolver converts a received SQL statement, or the text of a stored procedure, into corresponding atomic objects. The merging optimizer traverses the atomic objects of each SQL statement and extracts atomic objects with identical content as one public atomic object. The executive monitor receives the atomic objects processed by the merging optimizer, allocates an independent executable component to each atomic object, and obtains the calculation result by running the executable components. The data adapter comprises a data input adapter and a data output adapter: the data input adapter reads data from different data sources and passes it to the compiling resolver, and the data output adapter writes the calculation result generated by the executive monitor into the corresponding data source. The system increases SQL statement processing speed, optimizes CPU (Central Processing Unit) and memory usage, and reduces network traffic consumption.

Description

An SQL statement processing system based on cloud computing
Technical field
The present invention relates to the field of database processing, and in particular to an SQL statement processing system based on cloud computing.
Background
In current SQL statement systems that support cloud computing, when two or more query SQL statements are submitted together as a batch, each SQL statement is executed independently; identical expressions and sub-statements inside the SQL statements are not shared or merged. As a result, when several SQL statements execute at the same time, the system's memory and CPU usage rise sharply and execution is slower than when a single SQL statement is run. On some systems this leads directly to memory exhaustion and the task fails.
Summary of the invention
To solve the above problem, the object of the present invention is to provide an SQL statement processing system based on cloud computing that optimizes the parallel and serial execution of SQL statements, reduces memory and CPU consumption during SQL statement processing, and improves SQL statement processing efficiency.
The technical solution adopted by the present invention to solve this problem is:
An SQL statement processing system based on cloud computing, comprising:
A compiling resolver, for converting a received SQL statement or the text of a stored procedure into corresponding atomic objects;
A merging optimizer, for traversing the atomic objects of each SQL statement and extracting atomic objects with identical content as one public atomic object;
An executive monitor, for receiving the atomic objects processed by the merging optimizer, allocating an independent executable component to each atomic object, and obtaining the calculation result by running the executable components.
Further, the system also comprises a data adapter, the data adapter comprising a data input adapter and a data output adapter, wherein:
The data input adapter is used for reading data from different data sources and passing it to the compiling resolver;
The data output adapter is used for writing the calculation result generated by the executive monitor into the corresponding data source.
Further, the text of the stored procedure is SQL statement text.
Further, the compiling resolver comprises:
A syntax judging unit, for judging whether the SQL statement conforms to the syntax rules;
A parsing unit, for dividing a syntactically valid SQL statement into corresponding field expressions and/or conditional expressions;
A syntax analysis object unit, for converting the field expressions and/or conditional expressions into a syntax tree;
An atomic object unit, for further decomposing the objects in the syntax tree into atomic objects of minimum granularity, the atomic objects of minimum granularity comprising fields, table names, functions and grouping objects.
Further, the atomic objects of minimum granularity are saved in a hashmap object, and each distinct atomic object has a unique GUID.
Further, when the merging optimizer extracts public atomic objects, if the parent node of an atomic object is an expression object and the content of that expression object is identical for all the atomic objects concerned, the expression object is extracted as a public expression object; the extracted public atomic objects and expression objects are all stored in a public memory pool.
Further, the executive monitor comprises:
An executor, for allocating an independent executable component to each atomic object, each component executing independently within the executor;
A monitor, for recording each executable component's start time, end time, CPU and memory usage, and network traffic consumed, so as to optimize the serial and parallel execution of the executable components and obtain the critical path and the optimization model for the calculation.
The beneficial effects of the present invention are:
The present invention provides an SQL statement processing system based on cloud computing. SQL statements are first divided into atomic objects; the atomic objects are then merged and optimized, and public atomic objects or expression objects are extracted. An independent executable component is allocated to each atomic object or expression object, and the execution process is monitored. From the monitored start time, end time, CPU and memory usage, network traffic and other parameters of each executable component, the system determines which components can run in parallel and which must run serially, and from this computes the critical path and the optimization model for the task in order to optimize system performance.
When atomic objects are split out, each atomic object is given a unique GUID, which simplifies record keeping and the subsequent merging optimization. The public memory pool stores not only atomic objects but also their parent nodes (expression objects), which makes it possible to share large objects. Because the monitored parameters allow a timely decision on which executable components run in parallel and which run serially, the critical path and optimization model for the task can be computed, which improves running speed, optimizes CPU and memory usage, and reduces network traffic consumption.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawings and examples.
Fig. 1 is a schematic diagram of the overall structure of the system according to the preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of the syntax trees after division into atomic objects in the preferred embodiment of the present invention;
Fig. 3 is a schematic diagram of the syntax trees after merging optimization in the preferred embodiment of the present invention.
Detailed description of the embodiments
Embodiment 1:
Referring to Fig. 1, the preferred embodiment of the present invention provides an SQL statement processing system based on cloud computing, comprising:
A compiling resolver, for converting a received SQL statement or the text of a stored procedure into corresponding atomic objects;
A merging optimizer, for traversing the atomic objects of each SQL statement and extracting atomic objects with identical content as one public atomic object;
An executive monitor, for receiving the atomic objects processed by the merging optimizer, allocating an independent executable component to each atomic object, and obtaining the calculation result by running the executable components.
The data sources that SQL statements operate on are of many kinds, including relational databases, Hadoop, HBase, Hypertable and others. To support reading and writing different data sources and data formats (files, binary data and so on), the present invention provides a data adapter. The data adapter offers a unified interface over the different data sources: it can be used both to read data from different data sources and to write data into the corresponding data source. Behind the unified interface, separate inputformat and outputformat objects are written for each data source, so that each kind of data can be given its own dedicated optimizations and improvements, making full use of the characteristics of each source when reading and writing data. The data adapter comprises a data input adapter and a data output adapter. The data input adapter reads data from the different data sources and passes it to the compiling resolver. After the batch of query SQL statements has finished executing, the system calls the data output adapter to write the calculation result generated by the executive monitor into the corresponding data source.
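The unified-interface idea can be sketched as follows. This is only an illustration under stated assumptions: the class names `DataAdapter`, `CsvTextAdapter` and `ListAdapter` and the helper `copy_rows` are hypothetical and do not come from the patent, and CSV text and in-memory lists stand in for real sources such as Hadoop or a relational database.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


class DataAdapter(ABC):
    """Unified interface; each data source supplies its own implementation."""

    @abstractmethod
    def read(self) -> Iterator[Row]:
        """Input side: yield rows from the underlying source."""

    @abstractmethod
    def write(self, rows: Iterable[Row]) -> None:
        """Output side: store calculation results in the source."""


class CsvTextAdapter(DataAdapter):
    """Hypothetical adapter for CSV text held in a string."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def read(self) -> Iterator[Row]:
        yield from csv.DictReader(io.StringIO(self.text))

    def write(self, rows: Iterable[Row]) -> None:
        rows = list(rows)
        if not rows:
            self.text = ""
            return
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        self.text = buf.getvalue()


class ListAdapter(DataAdapter):
    """Hypothetical adapter for rows already held in memory."""

    def __init__(self, rows: Iterable[Row] = ()) -> None:
        self.rows = list(rows)

    def read(self) -> Iterator[Row]:
        yield from self.rows

    def write(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)


def copy_rows(source: DataAdapter, sink: DataAdapter) -> None:
    """Callers see only the unified interface, never the concrete source."""
    sink.write(source.read())
```

Because `copy_rows` depends only on `DataAdapter`, supporting a further data source means adding one subclass; source-specific optimizations stay inside that subclass.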
The stored-procedure text received by the compiling resolver is SQL statement text. The compiling resolver comprises:
A syntax judging unit, for judging whether the SQL statement conforms to the syntax rules;
A parsing unit, for dividing a syntactically valid SQL statement into corresponding field expressions and/or conditional expressions;
A syntax analysis object unit, for converting the field expressions and/or conditional expressions into a syntax tree;
An atomic object unit, for further decomposing the objects in the syntax tree into atomic objects of minimum granularity, the atomic objects of minimum granularity comprising fields, table names, functions and grouping objects.
When a query SQL statement or the text of a stored procedure is sent to the compiling resolver, the syntax judging unit first checks its syntax; if any text does not conform to the syntax, an exception is raised and processing exits immediately. Next, the parsing unit divides the syntactically valid text object (the SQL statement) into field expressions, conditional expressions and so on. The syntax analysis object unit then converts the text object into a syntax tree. Finally, the atomic object unit decomposes the objects in the syntax tree further, down to atomic objects of minimum granularity, which comprise fields, table names, functions and grouping objects. The system saves the atomic objects in a hashmap object, and each distinct atomic object has a unique GUID.
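The last step, decomposing a syntax tree into atomic objects stored in a hashmap keyed by GUID, might look like the sketch below. The tuple-based tree shape, the `AtomicObject` class and the `decompose` function are assumptions made for illustration; the patent does not specify these data structures, and the syntax check and parsing steps are omitted.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AtomicObject:
    guid: str      # unique identifier of this atomic object
    kind: str      # "field", "table", "function" or "group"
    content: str   # text of the object, for example "a" or "count(*)"


def decompose(node: tuple, registry: Dict[str, AtomicObject]) -> List[str]:
    """Walk a syntax tree and register every leaf as an atomic object.

    Assumed node shapes (not from the patent):
      leaf:        (kind, content)              e.g. ("field", "a")
      expression:  ("expr", operator, left, right)
    Returns the GUIDs of the registered leaves in left-to-right order.
    """
    if node[0] == "expr":
        _, _operator, left, right = node
        return decompose(left, registry) + decompose(right, registry)
    kind, content = node
    atom = AtomicObject(str(uuid.uuid4()), kind, content)
    registry[atom.guid] = atom   # a Python dict plays the role of the hashmap
    return [atom.guid]


# Usage: decompose the expression a+b.
registry: Dict[str, AtomicObject] = {}
guids = decompose(("expr", "+", ("field", "a"), ("field", "b")), registry)
```

In this sketch every occurrence of a leaf receives its own GUID; identifying and sharing occurrences with identical content is left to the later merging stage.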
After the compiling resolver has parsed the text into atomic objects, the merging optimization stage begins. The merging optimizer traverses the atomic objects of each SQL statement, extracts the atomic objects with identical content, merges them into one public atomic object and stores it in the public memory pool; the leaf of the syntax tree then holds only a reference to that atomic object. If the parent node of an atomic object is an expression object and the content of the whole expression object is identical, the merging optimizer extracts the identical expression objects, merges them into one expression object and stores it in the public memory object pool, which makes it possible to share large objects. The process continues in the same way until all the SQL sub-statements have been merged and optimized. For a Hadoop system, atomic objects are classified by execution stage into three types (map stage, combine stage and reduce stage), so the merged and optimized objects are processed in at most three stages.
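A minimal sketch of content-based merging is given below, assuming the same tuple-based tree shape as an illustration. `PublicPool`, `intern` and `content_of` are hypothetical names, and small sequential integers are used as reference ids for readability where the patent uses GUIDs.

```python
from typing import Dict, Tuple


def content_of(node: tuple) -> str:
    """Canonical text of a node; nodes with equal text are merged."""
    if node[0] == "expr":
        _, op, left, right = node
        return f"({content_of(left)}{op}{content_of(right)})"
    return node[1]


class PublicPool:
    """Stores each distinct atomic or expression object exactly once."""

    def __init__(self) -> None:
        self.by_content: Dict[Tuple[str, str], int] = {}
        self.objects: Dict[int, tuple] = {}

    def intern(self, node: tuple) -> int:
        """Return the reference id of node, storing it on first sight."""
        if node[0] == "expr":
            _, op, left, right = node
            # Children are pooled first, so the stored expression holds
            # references to shared children rather than copies of them.
            stored = ("expr", op, self.intern(left), self.intern(right))
        else:
            stored = node
        key = (node[0], content_of(node))
        if key not in self.by_content:
            ref = len(self.by_content) + 1
            self.by_content[key] = ref
            self.objects[ref] = stored
        return self.by_content[key]


# Usage: two statements that both contain a+b share one pooled object.
pool = PublicPool()
a_plus_b = ("expr", "+", ("field", "a"), ("field", "b"))
stmt1_refs = [pool.intern(("field", "a")), pool.intern(a_plus_b)]
stmt2_refs = [pool.intern(a_plus_b),
              pool.intern(("expr", "+", a_plus_b, ("field", "d")))]
```

After this runs, `stmt1_refs[1]` and `stmt2_refs[0]` are the same reference, and the pool holds five objects in total: a, b, a+b, d and (a+b)+d.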
After the atomic objects have been merged and optimized, they are all passed to the executive monitor together. The executive monitor comprises an executor and a monitor. In the executor, each atomic object is matched with its own executable component, and each component executes independently within the executor. The monitor records each executable component's start time, end time, CPU and memory usage, network traffic consumed and other parameters. With these parameters the system can determine which components can run in parallel and which must run serially, and from this compute the critical path and the optimization model for the task in order to optimize system performance.
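How recorded timings could feed a critical-path calculation is sketched below. The patent does not describe its algorithm, so this is one plausible reading: `run_and_record` and `critical_path` are hypothetical helpers, only start and end times are recorded (CPU, memory and network figures are left out), and the dependency graph between components is assumed to be known and acyclic.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_and_record(
    components: Dict[str, Callable[[], object]]
) -> Dict[str, Tuple[float, float]]:
    """Run each component and record its (start, end) timestamps."""
    log: Dict[str, Tuple[float, float]] = {}
    for name, component in components.items():
        start = time.perf_counter()
        component()
        log[name] = (start, time.perf_counter())
    return log


def critical_path(
    durations: Dict[str, float], deps: Dict[str, List[str]]
) -> Tuple[float, List[str]]:
    """Longest chain of dependent components, measured by total duration.

    deps[x] lists the components that must finish before x can start.
    """
    best: Dict[str, Tuple[float, List[str]]] = {}

    def finish(name: str) -> Tuple[float, List[str]]:
        if name not in best:
            before = max(
                (finish(d) for d in deps.get(name, [])), default=(0.0, [])
            )
            best[name] = (before[0] + durations[name], before[1] + [name])
        return best[name]

    return max((finish(n) for n in durations), default=(0.0, []))


# Usage with made-up durations in seconds.
durations = {"scan": 2.0, "a+b": 1.0, "sql1": 3.0, "sql2": 0.5}
deps = {"a+b": ["scan"], "sql1": ["a+b"], "sql2": ["a+b"]}
total, path = critical_path(durations, deps)
```

Here `sql1` and `sql2` do not depend on each other, so they can run in parallel; the chain scan, a+b, sql1 sets the lower bound on total running time.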
In summary, the present invention first divides SQL statements into atomic objects, then merges and optimizes the atomic objects and extracts public atomic objects or expression objects. An independent executable component is allocated to each atomic object or expression object, and the execution process is monitored. From the monitored start time, end time, CPU and memory usage, network traffic and other parameters of each executable component, the system determines which components can run in parallel and which must run serially, and from this computes the critical path and the optimization model for the task in order to optimize system performance. When atomic objects are split out, each atomic object is given a unique GUID, which simplifies record keeping and the subsequent merging optimization. The public memory pool stores not only atomic objects but also their parent nodes (expression objects), which makes it possible to share large objects. Because the monitored parameters allow a timely decision on which executable components run in parallel and which run serially, the critical path and optimization model for the task can be computed, which improves running speed, optimizes CPU and memory usage, and reduces network traffic consumption.
Embodiment 2:
This preferred embodiment further explains the present invention with a concrete example.
Take the following two SQL statements, submitted together as a batch, as the example:
SQL1: select a, (a+b) as c, count(*) from t where a+b > 0 group by a, (a+b);
SQL2: select (a+b) as c, count(*) from t where a+b+d > 100 group by (a+b);
Referring to Fig. 2, the compiling resolver first parses each SQL statement in turn into a syntax tree and decomposes the branch nodes into atomic objects of minimum granularity (fields, table names, functions, grouping objects); the result of the decomposition is shown in the figure. For SQL1, after decomposition, a and (a+b) as c are field objects and count(*) is a function object. The a+b inside (a+b) as c is a combination of the field objects a and b with the operator +. The table object is t. The search condition is a+b > 0, which is likewise a combination of the field objects a and b with the operator +. SQL2 is decomposed in the same way; see the result in the figure.
Next, referring to Fig. 3, the merging optimizer merges the identical child nodes in the syntax trees to form the public memory pool. Because (a+b) is held as an expression object, the SQL syntax trees no longer need to hold the two field objects a and b themselves. The GUIDs in the public memory pool and the atomic objects or expression objects they correspond to are also shown in the figure. Taking SQL1 again as the example, after merging optimization public atomic object 1 is a, public expression object 3 is a+b, and public table object 4 is t. The result of merging optimization for SQL2 is shown in the figure.
Next, the executive monitor stores the merged and optimized objects in memory, first executes the objects in the public memory pool, and then executes the two SQL syntax trees in parallel. In each of the three stages of Hadoop execution (map, combine and reduce), the corresponding component is found to interpret and execute each object. A group by object serves as the key of the map; count(*) increments a counter by 1; for the a+b expression, the expression object is called to perform the calculation; and for a field such as a, which resides in a data source such as Hadoop, the data adapter reads the data once and supplies it to both SQL syntax trees for calculation.
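The shared execution of SQL1 and SQL2 can be imitated in plain Python as below. This is a sketch that mimics the map, combine and reduce stages in memory and does not use any Hadoop API; the function names and the sample rows are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]
Pair = Tuple[str, tuple]  # (query name, group-by key)


def map_phase(rows: Iterable[Row]) -> Iterator[Pair]:
    """One pass over the data serves both queries; a+b is computed once per row."""
    for row in rows:
        s = row["a"] + row["b"]           # public expression object a+b
        if s > 0:                         # SQL1: where a+b > 0
            yield ("SQL1", (row["a"], s))   # key: group by a, (a+b)
        if s + row["d"] > 100:            # SQL2: where a+b+d > 100
            yield ("SQL2", (s,))            # key: group by (a+b)


def combine(pairs: Iterable[Pair]) -> Counter:
    """count(*) within one data split: add 1 to the counter for each key."""
    return Counter(pairs)


def reduce_phase(partials: Iterable[Counter]) -> Counter:
    """Sum the per-split counts into the final result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Usage with two invented data splits of table t.
split1: List[Row] = [{"a": 1, "b": 2, "d": 200}, {"a": 1, "b": 2, "d": 0}]
split2: List[Row] = [{"a": -5, "b": 1, "d": 500}, {"a": 1, "b": 2, "d": 98}]
result = reduce_phase(combine(map_phase(split)) for split in (split1, split2))
```

Each row is read once and `a + b` is evaluated once, yet both statements receive their groups; run independently, the two queries would each scan the rows and evaluate the expression separately.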
The above is only a preferred embodiment of the present invention. The present invention is not limited to the above embodiment; any implementation that achieves the technical effect of the present invention by the same means shall fall within the protection scope of the present invention.

Claims (7)

1. An SQL statement processing system based on cloud computing, characterized by comprising:
A compiling resolver, for converting a received SQL statement or the text of a stored procedure into corresponding atomic objects;
A merging optimizer, for traversing the atomic objects of each SQL statement and extracting atomic objects with identical content as one public atomic object;
An executive monitor, for receiving the atomic objects processed by the merging optimizer, allocating an independent executable component to each atomic object, and obtaining the calculation result by running the executable components.
2. The SQL statement processing system according to claim 1, characterized by further comprising a data adapter, the data adapter comprising a data input adapter and a data output adapter, wherein:
The data input adapter is used for reading data from different data sources and passing it to the compiling resolver;
The data output adapter is used for writing the calculation result generated by the executive monitor into the corresponding data source.
3. The SQL statement processing system according to claim 1, characterized in that the text of the stored procedure is SQL statement text.
4. The SQL statement processing system according to claim 1, characterized in that the compiling resolver comprises:
A syntax judging unit, for judging whether the SQL statement conforms to the syntax rules;
A parsing unit, for dividing a syntactically valid SQL statement into corresponding field expressions and/or conditional expressions;
A syntax analysis object unit, for converting the field expressions and/or conditional expressions into a syntax tree;
An atomic object unit, for further decomposing the objects in the syntax tree into atomic objects of minimum granularity, the atomic objects of minimum granularity comprising fields, table names, functions and grouping objects.
5. The SQL statement processing system according to claim 4, characterized in that the atomic objects of minimum granularity are saved in a hashmap object, and each distinct atomic object has a unique GUID.
6. The SQL statement processing system according to claim 1, characterized in that, when the merging optimizer extracts public atomic objects, if the parent node of an atomic object is an expression object and the content of that expression object is identical for all the atomic objects concerned, the expression object is extracted as a public expression object; the extracted public atomic objects and expression objects are all stored in a public memory pool.
7. The SQL statement processing system according to claim 1, characterized in that the executive monitor comprises:
An executor, for allocating an independent executable component to each atomic object, each component executing independently within the executor;
A monitor, for recording each executable component's start time, end time, CPU and memory usage, and network traffic consumed, so as to optimize the serial and parallel execution of the executable components and obtain the critical path and the optimization model for the calculation.
CN201410636239.8A 2014-11-12 2014-11-12 SQL (Structured Query Language) sentence processing system based on cloud computing Pending CN104391895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410636239.8A CN104391895A (en) 2014-11-12 2014-11-12 SQL (Structured Query Language) sentence processing system based on cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410636239.8A CN104391895A (en) 2014-11-12 2014-11-12 SQL (Structured Query Language) sentence processing system based on cloud computing

Publications (1)

Publication Number Publication Date
CN104391895A true CN104391895A (en) 2015-03-04

Family

ID=52609799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410636239.8A Pending CN104391895A (en) 2014-11-12 2014-11-12 SQL (Structured Query Language) sentence processing system based on cloud computing

Country Status (1)

Country Link
CN (1) CN104391895A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519085, No. five, No. 8, Harbour Road, Zhuhai, Guangdong

Applicant after: DINGLI CORP., LTD.

Address before: 519085, No. five, No. 8, Harbour Road, Zhuhai, Guangdong

Applicant before: Dingli Communications Corp., Ltd.

COR Change of bibliographic data
RJ01 Rejection of invention patent application after publication

Application publication date: 20150304
