CN104391895A

CN104391895A - SQL (Structured Query Language) statement processing system based on cloud computing

Info

Publication number: CN104391895A
Application number: CN201410636239.8A
Authority: CN
Inventors: 别志铭; 张健明; 张勇鹏; 王旭; 王礼; 吴楠
Original assignee: DINGLI COMMUNICATIONS CORP Ltd
Current assignee: DINGLI COMMUNICATIONS CORP Ltd
Priority date: 2014-11-12
Filing date: 2014-11-12
Publication date: 2015-03-04

Abstract

The invention discloses an SQL (Structured Query Language) statement processing system based on cloud computing. The SQL statement processing system comprises a compiling resolver, a merging optimizer, an executive monitor and a data adapter, wherein the compiling resolver is used for converting a received SQL statement or the text of a stored procedure into corresponding atomic objects; the merging optimizer is used for traversing the atomic objects of each SQL statement and extracting atomic objects with identical content as one public atomic object; the executive monitor is used for receiving the atomic objects processed by the merging optimizer, allocating an independent executable component to each atomic object and obtaining a computed result by running the executable components; the data adapter comprises a data input adapter and a data output adapter, the data input adapter being used for reading data from different data sources and passing the data to the compiling resolver, and the data output adapter being used for writing the computed result generated by the executive monitor into the corresponding data source. The SQL statement processing system increases SQL statement processing speed, optimizes CPU (Central Processing Unit) and memory usage, and reduces network traffic consumption.

Description

An SQL statement processing system based on cloud computing

Technical field

The present invention relates to the field of database processing, and in particular to an SQL statement processing system based on cloud computing.

Background art

In current SQL systems that support cloud computing, when two or more query SQL statements are submitted in a batch at the same time, each SQL statement is executed independently, and identical expressions and statements within the SQL statements are not shared or merged. As a result, when multiple SQL statements are executed, the memory and CPU usage of the system rise sharply, execution is slower than when a single SQL statement is run, and in some systems memory runs out and the task fails.

Summary of the invention

To solve the above problem, the object of the present invention is to provide an SQL statement processing system based on cloud computing that optimizes the parallel and serial execution of SQL statements, reduces memory and CPU consumption during SQL statement processing, and improves SQL statement processing efficiency.

The technical solution adopted by the present invention to solve this problem is:

An SQL statement processing system based on cloud computing, comprising:

A compiling resolver, for converting a received SQL statement or the text of a stored procedure into corresponding atomic objects;

A merging optimizer, for traversing the atomic objects of each SQL statement and extracting atomic objects with identical content as one public atomic object;

An executive monitor, for receiving the atomic objects processed by the merging optimizer, allocating an independent executable component to each atomic object, and obtaining the computed result by running the executable components.

Further, the system also comprises a data adapter, the data adapter comprising a data input adapter and a data output adapter, wherein:

The data input adapter is used for reading data from different data sources and passing the data to the compiling resolver;

The data output adapter is used for writing the computed result generated by the executive monitor into the corresponding data source.

Further, the text of the stored procedure is SQL statement text.

Further, the compiling resolver comprises:

A syntax judging unit, for judging whether an SQL statement conforms to the syntax rules;

A resolution unit, for dividing an SQL statement that conforms to the syntax rules into corresponding field expressions and/or conditional expressions;

A syntax analysis object unit, for converting the field expressions and/or conditional expressions into a syntax tree;

An atomic object unit, for further decomposing the objects in the syntax tree into atomic objects of minimum granularity, the atomic objects of minimum granularity comprising fields, table names, functions and grouping objects.

Further, the atomic objects of minimum granularity are saved in a hashmap object, and each distinct atomic object has a unique GUID.
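The storage of atomic objects in a hashmap with unique GUIDs can be illustrated with a minimal Python sketch. The names `AtomicObject` and `AtomicRegistry`, the use of `uuid.uuid4` to generate GUIDs, and the choice of keying the map by normalized content are assumptions made for illustration; the invention does not specify how GUIDs are generated or how the hashmap is keyed.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AtomicObject:
    kind: str      # "field", "table", "function" or "group"
    content: str   # normalized text, e.g. "a" or "count(*)"
    # Assumption: a random UUID serves as the GUID of the object.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()), compare=False)


class AtomicRegistry:
    """Hash map holding exactly one AtomicObject per distinct (kind, content)."""

    def __init__(self):
        self._by_key = {}

    def intern(self, kind, content):
        # Normalize so that "A " and "a" refer to the same atomic object.
        key = (kind, content.strip().lower())
        if key not in self._by_key:
            self._by_key[key] = AtomicObject(kind, key[1])
        return self._by_key[key]

    def __len__(self):
        return len(self._by_key)


registry = AtomicRegistry()
a1 = registry.intern("field", "a")
a2 = registry.intern("field", "A ")   # same content after normalization
t = registry.intern("table", "t")
```

In this sketch, a second request for the same content returns the object already stored, so its GUID stays stable, while distinct objects receive distinct GUIDs.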

Further, when the merging optimizer extracts public atomic objects, if the parent node of an atomic object is an expression object and the content of that expression object is identical wherever it occurs, the expression object is extracted as a public expression object; the extracted public atomic objects and expression objects are all stored in a public memory pool.
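One way to picture the extraction of public atomic objects and public expression objects is bottom-up interning of syntax-tree nodes, sketched below in Python. The class name `PublicPool`, the method `share`, and the representation of a syntax tree as nested tuples of the form (operator, child, child, ...) are all hypothetical; the invention does not prescribe a data structure.

```python
class PublicPool:
    """Public memory pool: one shared node per distinct content."""

    def __init__(self):
        self.nodes = {}   # node content -> the single shared node

    def share(self, tree):
        """Return the pooled equivalent of `tree`, merging bottom-up.

        A tree is either a string (an atomic object) or a tuple
        (operator, child, child, ...) standing for an expression object.
        """
        if isinstance(tree, tuple):
            op, *children = tree
            # Children are pooled first, so identical parents compare equal.
            tree = (op, *[self.share(child) for child in children])
        # setdefault keeps the first node seen and returns it for duplicates.
        return self.nodes.setdefault(tree, tree)


pool = PublicPool()
cond1 = pool.share((">", ("+", "a", "b"), "0"))
cond2 = pool.share((">", ("+", ("+", "a", "b"), "d"), "100"))
```

After both conditions are pooled, the `a+b` node inside `cond1` and the one inside `cond2` are the same object in memory, which is the sharing of a larger (expression) object described above.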

Further, the executive monitor comprises:

An executor, for allocating an independent executable component to each atomic object, each component being executed independently in the executor;

A monitor, for recording the start time, end time, CPU and memory usage and network traffic consumption of each executable component, so as to optimize the serial and parallel execution of the executable components and obtain the critical path and the optimization model of the current computation.
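The invention does not state how the critical path is derived from the recorded parameters. As a hedged illustration, the following Python sketch uses the standard longest-path computation over a dependency graph, taking each component's measured run time (end time minus start time) as its weight. The function name `critical_path`, the component names, and the assumption that dependencies between components are known are all hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path(durations, depends_on):
    """Return (total_time, path) of the longest dependency chain.

    durations:  component name -> measured run time in seconds
    depends_on: component name -> set of components that must finish first
    """
    graph = {name: depends_on.get(name, set()) for name in durations}
    finish, prev = {}, {}
    # static_order() yields every component after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        deps = graph[name]
        start = max((finish[d] for d in deps), default=0.0)
        prev[name] = max(deps, key=lambda d: finish[d], default=None)
        finish[name] = start + durations[name]
    last = max(finish, key=finish.get)
    total, path = finish[last], []
    while last is not None:          # walk back along the slowest chain
        path.append(last)
        last = prev[last]
    return total, path[::-1]


durations = {"pool:a+b": 2.0, "sql1": 3.0, "sql2": 5.0}
depends_on = {"sql1": {"pool:a+b"}, "sql2": {"pool:a+b"}}
total, path = critical_path(durations, depends_on)
```

Here `sql1` and `sql2` do not depend on each other, so they may run in parallel once the shared component has finished; the slower of the two determines the critical path.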

The beneficial effects of the present invention are:

The present invention adopts an SQL statement processing system based on cloud computing. SQL statements are first divided into atomic objects; the atomic objects are then merged and optimized, extracting public atomic objects or expression objects; an independent executable component is allocated to each atomic object or expression object, and the execution process is monitored. Based on the monitored start time, end time, CPU and memory usage, network traffic consumption and other parameters of each executable component, the system determines which components can run in parallel and which must run serially, and from this calculates the critical path and the optimization model of the current task, so as to optimize system performance.

When atomic objects are split out, each atomic object has a unique GUID, which facilitates recording and the subsequent merging optimization. The public memory pool stores not only atomic objects but also their parent nodes, namely expression objects, so that large objects can be shared. Based on the monitored start time, end time, CPU and memory usage, network traffic consumption and other parameters, the system promptly determines which executable components can run in parallel and which must run serially, and thus calculates the critical path and the optimization model of the current task, which increases running speed, optimizes CPU and memory usage, and reduces network traffic consumption.

Brief description of the drawings

The invention is further described below with reference to the accompanying drawings and examples.

Fig. 1 is a schematic diagram of the overall structure of the system according to the preferred embodiment of the present invention;

Fig. 2 is a schematic diagram of the syntax trees after decomposition into atomic objects in the preferred embodiment of the present invention;

Fig. 3 is a schematic diagram of the syntax trees after optimization and merging in the preferred embodiment of the present invention.

Detailed description of the embodiments

Embodiment 1:

With reference to Fig. 1, the preferred embodiment of the present invention provides an SQL statement processing system based on cloud computing, which is described as follows.

SQL statements may involve many kinds of data sources, including relational databases, Hadoop, HBase, Hypertable and so on. To read and write these different data sources and data formats (including files, binary data and the like), the present invention is provided with a data adapter. The data adapter provides a unified interface over the different data sources: it can be used both to read data from different data sources and to write data into the corresponding data source. Behind the unified interface, separate inputformat and outputformat objects are written for each data source, because each kind of data needs its own specific optimization and improvement so that the characteristics of each source are fully exploited for data input and output. The data adapter comprises a data input adapter and a data output adapter. The data input adapter reads data from the different data sources and passes it to the compiling resolver; after the batch of query SQL statements has finished executing, the system calls the data output adapter to write the computed result generated by the executive monitor into the corresponding data source.
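The unified interface described above can be sketched in Python as a pair of abstract base classes with one concrete class per data source. The class names below are hypothetical, and the CSV and in-memory adapters are simple stand-ins chosen so that the example runs without a database or a Hadoop cluster; real adapters for relational databases, Hadoop or HBase would implement the same two methods.

```python
import csv
import io
from abc import ABC, abstractmethod


class DataInputAdapter(ABC):
    """Unified read interface over heterogeneous data sources."""

    @abstractmethod
    def read(self):
        """Yield rows as dicts, whatever the underlying format is."""


class DataOutputAdapter(ABC):
    """Unified write interface over heterogeneous data sources."""

    @abstractmethod
    def write(self, rows):
        """Persist an iterable of result rows."""


class CsvTextInput(DataInputAdapter):
    """Hypothetical adapter for a file-like source holding CSV text."""

    def __init__(self, text):
        self._text = text

    def read(self):
        yield from csv.DictReader(io.StringIO(self._text))


class InMemoryOutput(DataOutputAdapter):
    """Hypothetical sink standing in for a database or HDFS writer."""

    def __init__(self):
        self.rows = []

    def write(self, rows):
        self.rows.extend(rows)


source = CsvTextInput("a,b\n1,2\n3,4\n")
sink = InMemoryOutput()
sink.write(source.read())   # callers only ever see read() and write()
```

The calling code depends only on `read` and `write`, so source-specific optimizations can live inside each adapter without affecting the rest of the system.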

The text of the stored procedure received by the compiling resolver is SQL statement text. The compiling resolver operates as follows:

When a query SQL statement or the text of a stored procedure is sent to the compiling resolver, the syntax judging unit first checks the syntax; if any text does not conform to the syntax, an exception is reported and processing exits immediately. Next, the resolution unit divides the syntactically valid text object (the SQL statement) into field expressions, conditional expressions and the like. The syntax analysis object unit then converts the text object into a syntax tree, and the atomic object unit further decomposes the objects in the syntax tree into atomic objects of minimum granularity, which comprise fields, table names, functions, grouping objects and the like. The system saves the atomic objects in a hashmap object, and each distinct atomic object has a unique GUID.
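The sequence of syntax check, syntax tree construction and decomposition into atomic objects can be imitated for a single conditional expression with the short Python sketch below. It is only an analogy: Python's own `ast` module is used as a stand-in parser, which works here solely because expressions such as `a+b > 0` happen to be valid Python as well; it is not an SQL parser and is not the parser of the invention. The function names `parse_expression` and `atomic_fields` are hypothetical.

```python
import ast


def parse_expression(text):
    """Syntax check plus syntax tree; raises SyntaxError for invalid text."""
    return ast.parse(text, mode="eval").body


def atomic_fields(node):
    """Collect the field names (minimum-granularity objects) in a tree."""
    return sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})


tree = parse_expression("a+b+d > 100")   # a comparison node at the root
fields = atomic_fields(tree)             # the field objects a, b and d
```

Malformed text such as `a + > 0` makes `parse_expression` raise `SyntaxError`, which corresponds to the step in which the syntax judging unit reports an exception and exits.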

After the compiling resolver has parsed the text into atomic objects, the merging optimization stage begins. The merging optimizer traverses the atomic objects of each SQL statement, extracts atomic objects with identical content and merges them into one public atomic object, which is stored in the public memory pool; the leaves of the syntax tree then hold only a reference to this atomic object. If the parent node of an atomic object is an expression object and the content of the whole expression object is identical, the merging optimizer extracts the identical expression objects and merges them into one expression object, which is also stored in the public memory pool, so that large objects can be shared. The process continues in the same way until all the sub-SQL statements have been merged and optimized. For a Hadoop system, the atomic objects are classified by execution mode into three types: the map stage, the combine process and the reduce stage. In other words, after merging optimization, the processing can be reduced to at most three processes.

After the atomic objects have been optimized and merged, they are all input into the executive monitor together. The executive monitor comprises an executor and a monitor. In the executor, each atomic object finds its own corresponding executable component, and each component is executed independently in the executor. The monitor records the start time, end time, CPU and memory usage, network traffic consumption and other parameters of each executable component. With these parameters, the system can determine which components can run in parallel and which must run serially, and thus calculate the critical path and the optimization model of the current task, so as to optimize system performance.

In summary, the present invention first divides SQL statements into atomic objects, then merges and optimizes the atomic objects, extracting public atomic objects or expression objects, allocates an independent executable component to each atomic object or expression object, and monitors the execution process. Based on the monitored start time, end time, CPU and memory usage, network traffic consumption and other parameters of each executable component, the system determines which components can run in parallel and which must run serially, and from this calculates the critical path and the optimization model of the current task, so as to optimize system performance. When atomic objects are split out, each atomic object has a unique GUID, which facilitates recording and the subsequent merging optimization. The public memory pool stores not only atomic objects but also their parent nodes, namely expression objects, so that large objects can be shared. Based on the monitored parameters, the system promptly determines which executable components can run in parallel and which must run serially, and thus calculates the critical path and the optimization model of the current task, which increases running speed, optimizes CPU and memory usage, and reduces network traffic consumption.

Embodiment 2:

This preferred embodiment further explains the present invention by means of a concrete example.

Take the following two SQL statements, submitted together in a batch, as an example:

SQL1: select a, (a+b) as c, count(*) from t where a+b > 0 group by a, (a+b);

SQL2: select (a+b) as c, count(*) from t where a+b+d > 100 group by (a+b).

With reference to Fig. 2, the compiling resolver first parses each SQL statement in turn into a syntax tree and decomposes the branch nodes into atomic objects of minimum granularity (fields, table names, functions, grouping objects); the result of the decomposition is shown in the figure. Taking SQL1 as an example, after decomposition a and (a+b) as c are field objects and count(*) is a function object; the a+b in (a+b) as c is a combination of the field objects a and b and the operator +; the table object is t; and the search condition is a+b > 0, which is likewise a combination of the field objects a and b and the operator +. SQL2 is decomposed in a similar way; see the result in the figure.

Afterwards, with reference to Fig. 3, the merging optimizer merges the identical child nodes in the syntax trees to form the public memory pool. Because the (a+b) expression object is shared as a whole, the SQL syntax trees no longer need to hold the two field objects a and b beneath it. The GUIDs in the public memory pool and the corresponding atomic objects or expression objects are also shown in the figure. Again taking SQL1 as an example, after merging optimization, public atomic object 1 is a, public expression object 3 is a+b, and public table object 4 is t. The result of merging optimization for SQL2 is shown in the figure.
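To make the merge concrete, the Python sketch below lists the objects that SQL1 and SQL2 have in common. The syntax trees are written out by hand as nested tuples, and the helper names `subtrees` and `objects` are hypothetical; the sketch reports every shared object and does not reproduce the GUID numbering used in Fig. 3.

```python
def subtrees(tree):
    """Yield every node of a nested-tuple syntax tree, root first."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:      # tree[0] is the operator
            yield from subtrees(child)


# Hand-built trees for the two statements of this embodiment.
sql1 = {
    "select": ["a", ("+", "a", "b"), "count(*)"],
    "from": ["t"],
    "where": [(">", ("+", "a", "b"), "0")],
    "group by": ["a", ("+", "a", "b")],
}
sql2 = {
    "select": [("+", "a", "b"), "count(*)"],
    "from": ["t"],
    "where": [(">", ("+", ("+", "a", "b"), "d"), "100")],
    "group by": [("+", "a", "b")],
}


def objects(statement):
    """Set of all atomic and expression objects used by one statement."""
    found = set()
    for trees in statement.values():
        for tree in trees:
            found.update(subtrees(tree))
    return found


# Objects present in both statements are candidates for the public pool.
public = objects(sql1) & objects(sql2)
```

Under these assumptions the shared set consists of the fields `a` and `b`, the function `count(*)`, the table `t` and the expression `a+b`, while the two search conditions and the field `d` remain private to their own statements.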

Afterwards, the executive monitor stores the optimized and merged objects in memory, first executes the objects in the public memory pool, and then executes the two SQL syntax trees in parallel. In the three stages of Hadoop execution (map, combine and reduce), the corresponding component is found to interpret and execute each object. For example, the group by object serves as the key of the map; count(*) increments a counter by 1; for the a+b expression, the expression object is called to compute the value; and a field such as a, which resides in a data source such as Hadoop, is read once through the data adapter and then supplied to both SQL syntax trees for computation.
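The effect of executing the shared objects once and feeding both statements can be sketched in plain Python. This is not Hadoop code: a single loop stands in for the map and combine stages, a `Counter` stands in for the grouped count, and the sample rows and the function name `run_batch` are invented for illustration.

```python
from collections import Counter

# Invented sample rows of table t.
rows = [
    {"a": 1, "b": 2, "d": 200},
    {"a": 1, "b": 2, "d": 0},
    {"a": -5, "b": 1, "d": 500},
]


def run_batch(rows):
    """Evaluate SQL1 and SQL2 in one pass over the data."""
    sql1, sql2 = Counter(), Counter()
    for row in rows:                     # each row is read only once
        c = row["a"] + row["b"]          # public expression a+b, computed once
        if c > 0:                        # SQL1: where a+b > 0
            sql1[(row["a"], c)] += 1     # key: group by a, (a+b); count(*)
        if c + row["d"] > 100:           # SQL2: where a+b+d > 100
            sql2[c] += 1                 # key: group by (a+b); count(*)
    return sql1, sql2


result1, result2 = run_batch(rows)
```

Each row is read once and `a+b` is evaluated once per row, after which both statements reuse the value, which mirrors the single read through the data adapter and the single evaluation of the public expression object described above.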

The above is only the preferred embodiment of the present invention. The present invention is not limited to the above embodiment; any implementation that achieves the technical effect of the present invention by the same means shall fall within the protection scope of the present invention.

Claims

1. An SQL statement processing system based on cloud computing, characterized by comprising:

2. The SQL statement processing system according to claim 1, characterized by further comprising a data adapter, the data adapter comprising a data input adapter and a data output adapter, wherein:

3. The SQL statement processing system according to claim 1, characterized in that the text of the stored procedure is SQL statement text.

4. The SQL statement processing system according to claim 1, characterized in that the compiling resolver comprises:

5. The SQL statement processing system according to claim 4, characterized in that the atomic objects of minimum granularity are saved in a hashmap object, and each distinct atomic object has a unique GUID.

6. The SQL statement processing system according to claim 1, characterized in that, when the merging optimizer extracts public atomic objects, if the parent node of an atomic object is an expression object and the content of that expression object is identical wherever it occurs, the expression object is extracted as a public expression object, and the extracted public atomic objects and expression objects are all stored in a public memory pool.

7. The SQL statement processing system according to claim 1, characterized in that the executive monitor comprises: