CN104504018B

CN104504018B - Big data real-time query optimization method based on bushy trees and top-down search

Info

Publication number: CN104504018B
Application number: CN201410765313.6A
Authority: CN
Inventors: 陈岭; 马骄阳
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2014-12-11
Filing date: 2014-12-11
Publication date: 2017-09-08
Anticipated expiration: 2034-12-11
Also published as: CN104504018A

Abstract

The invention discloses a big data real-time query optimization method based on bushy trees and top-down search, comprising: (1) parsing the query statement and building an initial query hypergraph from the parsed statement; (2) following the minimum-cost principle for query plan trees, decomposing the initial query hypergraph level by level from the top down until the optimal query plan tree of the initial query hypergraph is obtained, which completes the big data real-time query optimization. By building a bushy-tree search space and combining an optimized cost model with a pruning strategy, the invention accounts for disk I/O, network transmission and intermediate result size, guarantees that the optimal join order is generated and improves query efficiency. This promotes the development of big data real-time query technology, improves the quality of service of big data real-time queries, and makes people's work and daily life more convenient.

Description

Big data real-time query optimization method based on bushy trees and top-down search

Technical field

The present invention relates to the field of big data query technology, and in particular to a big data real-time query optimization method based on bushy trees and top-down search.

Background

With the arrival of the big data era, fast querying of massive data has become a pressing need for enterprises in sectors such as the Internet, telecommunications and finance. To meet this need, big data real-time query systems have emerged, such as Google Dremel, Berkeley Shark and Cloudera Impala. Big data real-time query systems typically use a distributed architecture and, by weakening support for features such as transactions, can satisfy users' real-time query demands over massive data.

Query optimization consists mainly of three parts: the search space, the search strategy and the cost model.

The search space can be represented by query trees, which fall mainly into left-deep trees (or right-deep trees) and bushy trees. Before the 1990s, researchers focused mainly on query optimization based on left-deep trees: most database products of the time ran on a single machine, a left-deep query plan can only be executed serially, the search space is small and the optimization time is short. With the development of technology and the emergence of distributed systems, research shifted to query optimization based on bushy trees, because the two subtrees of a bushy tree can be executed in parallel on different nodes, which improves query efficiency. Over n relations there are n! possible left-deep trees but (2n-2)!/(n-1)! possible bushy trees, so when many joins are involved the search space of bushy query plans is large and the optimization time is long.
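To make the size gap concrete, the short Python sketch below counts both kinds of join trees. Left-deep trees correspond to permutations of the n relations; bushy trees are counted by recursively choosing which relations go into the left subtree. The recursion and the closed form (2n-2)!/(n-1)! are standard combinatorics, and both assume that every split is allowed (no join-graph restrictions) and that swapping the left and right children yields a different tree.

```python
from functools import lru_cache
from math import comb, factorial


def left_deep_count(n: int) -> int:
    """Number of left-deep join trees over n relations: one per permutation."""
    return factorial(n)


@lru_cache(maxsize=None)
def bushy_count(n: int) -> int:
    """Number of bushy join trees over n labeled relations.

    Choose which k relations form the left subtree (left and right children
    are distinguished), then count the two subtrees recursively.
    """
    if n == 1:
        return 1
    return sum(comb(n, k) * bushy_count(k) * bushy_count(n - k) for k in range(1, n))


def bushy_closed_form(n: int) -> int:
    """Closed form (2n-2)! / (n-1)!, i.e. n! times the (n-1)th Catalan number."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in range(1, 7):
    assert bushy_count(n) == bushy_closed_form(n)
    print(n, left_deep_count(n), bushy_count(n))
```

For the four-table query used in the embodiment, this gives 24 left-deep trees versus 120 bushy trees, and the gap widens quickly (720 versus 30240 for six tables).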

Search strategies fall mainly into two classes. The first is top-down, whose main idea is to optimize from the whole to the parts. Top-down optimization methods work by repeatedly partitioning the join graph to build connected subgraphs; because strategies such as branch-and-bound and pruning are applied during partitioning, execution efficiency is greatly improved. However, these methods can only handle binary predicates and only support inner joins, which greatly limits their practical use. The other class is bottom-up, whose main idea is to optimize from the parts to the whole. Such algorithms can handle complex query predicates, but because the search space is large and pruning strategies cannot be applied effectively, optimization takes a long time when many relations are involved.

The cost model of a query optimizer usually estimates execution cost from statistics, operators, raw data and similar features, and a suitable cost model yields better optimization results. A general cost model must consider disk I/O, network transmission and CPU computation costs, but in an implementation the dominant factors can be selected as appropriate. Conventional cost models, such as Hive's optimization model, are not well suited to big data real-time query systems: they mainly consider disk I/O cost, whereas in distributed multi-table join execution the transmission cost of intermediate results has a greater influence on query efficiency. Such cost models therefore have inherent limitations, making it hard to guarantee an optimal query plan and thus affecting query efficiency.

Summary of the invention

Multi-table join order optimization is a key area of performance optimization in database management systems. To address problems such as the long optimization time of current bushy-tree-based top-down query optimization and the inherent limitations of traditional cost models, the present invention proposes a big data real-time query optimization method based on bushy trees and top-down search.

A big data real-time query optimization method based on bushy trees and top-down search comprises:

(1) parsing the query statement and building an initial query hypergraph from the parsed statement;

(2) following the minimum-cost principle for query plan trees, decomposing the initial query hypergraph level by level from the top down until the optimal query plan tree of the initial query hypergraph is obtained, which completes the big data real-time query optimization.

When data is queried, the query is executed according to the optimal query plan tree of the initial query hypergraph.

Step (1) further comprises the following steps:

(1-1) initializing an empty global optimal-tree map;

(1-2) for each vertex in the initial query hypergraph, building a corresponding query hypergraph and query plan tree, setting the cost of each such query plan tree to 0, and adding the query hypergraph of each vertex together with its query plan tree to the global optimal-tree map.
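A minimal Python sketch of steps (1-1) and (1-2), assuming that query hypergraphs are identified by frozensets of table names and that a single-table plan is a leaf node. The `Plan` class and the `init_global_maps` function are illustrative names, not taken from the patent; the empty attempt-count map is included because step (1) optionally initializes it as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """A query plan tree: a leaf (single table) or a join of two subtrees."""
    tables: frozenset
    cost: float
    left: Optional["Plan"] = None
    right: Optional["Plan"] = None


def init_global_maps(tables):
    """Steps (1-1)/(1-2): an optimal-tree map seeded with one zero-cost
    leaf plan per vertex (table) of the initial query hypergraph."""
    best_plans = {}  # global optimal-tree map: hypergraph -> Plan
    for t in tables:
        key = frozenset([t])
        best_plans[key] = Plan(tables=key, cost=0.0)
    attempts = {}  # global attempt-count map, filled during decomposition
    return best_plans, attempts


best_plans, attempts = init_global_maps(["r1", "r2", "r3", "r4"])
print(sorted(min(k) for k in best_plans))  # ['r1', 'r2', 'r3', 'r4']
```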

When decomposing level by level in step (2), a condition check is performed first: determine whether the current decomposition target is in the global optimal-tree map.

If it is, use the cost threshold to decide whether the query plan tree stored for the current decomposition target in the global optimal-tree map is optimal:

if it is optimal, use the stored query plan tree of the current decomposition target as the optimal query plan tree;

otherwise, build the optimal query plan tree of the current decomposition target and replace the stored query plan tree of the current decomposition target in the global optimal-tree map with it.

If it is not, build the optimal query plan tree of the current decomposition target and store the current decomposition target together with its optimal query plan tree in the global optimal-tree map.

The optimal query plan tree of the current decomposition target is built as follows:

(S1) Decompose the current decomposition target to obtain several subgraph pairs at the next level, and process each subgraph pair of the current decomposition target in turn. Each subgraph pair contains two query hypergraphs, and when a subgraph pair is processed, each of its query hypergraphs is processed in turn.

(S2) When a query hypergraph is processed, it becomes the current decomposition target, and the other query hypergraph of the same subgraph pair serves as the reference target for updating the cost threshold.

With the new current decomposition target and the updated cost threshold, return to the condition check and repeat until the optimal query plan tree of the current decomposition target is obtained.

Then, proceeding level by level from the bottom up, merge the subgraph pairs of each level in turn to obtain the corresponding merge results; select the lowest-cost merge result as the optimal query plan tree of the decomposition target at the level above, and store that decomposition target and its query plan tree in the global optimal-tree map.

The initial value of the cost threshold is positive infinity.
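The recursion of steps (S1) and (S2), together with the condition check against the global optimal-tree map, can be sketched as a memoized top-down enumeration. This is a simplified sketch under stated assumptions, not the patented procedure: it omits the cost-threshold pruning and attempt counts, it treats every split of a table set as a valid subgraph pair (a clique join graph instead of the Fender and Moerkotte partitioning), and `CARD`, `size` and `join` define a hypothetical toy cost, not the cost model of the invention. Plans are tuples `(cost, tree, tables)`.

```python
from itertools import combinations
from math import prod

CARD = {"r1": 100, "r2": 10, "r3": 1000, "r4": 50}  # hypothetical cardinalities


def size(tables):
    """Hypothetical result-size estimate: product of cardinalities, 0.1 selectivity per join."""
    return prod(CARD[t] for t in tables) * 0.1 ** (len(tables) - 1)


def join(left, right):
    """Toy asymmetric hash-join cost (probe left, build right); returns a plan tuple."""
    cost = left[0] + right[0] + size(left[2]) + 2 * size(right[2])
    return (cost, (left[1], right[1]), left[2] | right[2])


def splits(tables):
    """All unordered splits (S1, S2) of a table set; S1 holds the smallest name."""
    items = sorted(tables)
    first, rest = items[0], items[1:]
    for k in range(len(rest)):  # k extra tables go with `first`, S2 stays non-empty
        for extra in combinations(rest, k):
            s1 = frozenset((first, *extra))
            yield s1, tables - s1


def best_plan(tables, memo):
    """Top-down: decompose `tables`, solve each half recursively, merge bottom-up."""
    if tables in memo:  # condition check against the optimal-tree map
        return memo[tables]
    best = None
    for s1, s2 in splits(tables):  # (S1) next-level subgraph pairs
        left, right = best_plan(s1, memo), best_plan(s2, memo)  # (S2) recurse
        for cand in (join(left, right), join(right, left)):  # forward / reverse merge
            if best is None or cand[0] < best[0]:
                best = cand
    memo[tables] = best  # store the cheapest merge result in the map
    return best


memo = {frozenset([t]): (0.0, t, frozenset([t])) for t in CARD}  # steps (1-1)/(1-2)
cost, tree, _ = best_plan(frozenset(CARD), memo)
print(tree, cost)
```

Because the toy join cost depends only on the two subtrees' costs and table sets, keeping just the cheapest plan per table set loses nothing, which is what makes the memoized top-down search exact; the cost-threshold rules add pruning on top of this.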

In the present invention the initial value of the cost threshold is set to positive infinity, which prevents a too-small initial value from causing no query plan tree to be found at all.

During the level-by-level decomposition, the current decomposition target is continually updated as the decomposition proceeds downward. The cost threshold and the global optimal-tree map are also continually updated during decomposition; that is, the traces left by processing an earlier query hypergraph are retained and influence the processing of later query hypergraphs.

The present invention decomposes each decomposition target with the method disclosed in Top Down Plan Generation: From Theory to Practice (Fender P, Moerkotte G. Top down plan generation: From theory to practice. Proc of the 29th Int Conf on Data Engineering. IEEE, 2013: 1105-1116), obtaining several next-level subgraph pairs. At least one next-level subgraph pair is obtained; the exact number depends on the decomposition target.

In the level-by-level decomposition of the present invention, decomposing the target at each level is assumed by default to yield at least two subgraph pairs, so the query hypergraphs of each subgraph pair are decomposed in turn. In practice, decomposing a target may yield only one subgraph pair. In that case each query hypergraph of that pair is decomposed directly, and when merging upward level by level, that pair is merged directly and the merge result is taken as the optimal query plan tree of the decomposition target at the level above.

In step (S2), when a subgraph pair is merged, the query plan trees of its two query hypergraphs are merged in both forward and reverse order, and whichever of the two resulting query plan trees has the lower cost is taken as the merge result of the subgraph pair.

Merging in both forward and reverse order further guarantees that the optimal query plan tree obtained has the minimum cost.

Preferably, when a subgraph pair is processed, one of its two query hypergraphs is arbitrarily chosen as the left division and the other as the right division. First, with the right division as the reference target and the left division as the decomposition target, the optimal query plan tree of the left division is obtained; then, with the left division as the reference target and the right division as the decomposition target, the optimal query plan tree of the right division is obtained.

Correspondingly, in step (2) the cost threshold is updated according to the reference target as follows:

(a1) When the right division is the reference target, the cost threshold is updated as follows:

if the reference target is in the global optimal-tree map, its computed cost is used as the updated cost threshold;

otherwise, the cost lower bound of the reference target is used as the updated cost threshold.

(a2) When the left division is the reference target, the cost threshold is updated as follows:

the current cost threshold minus the cost of the optimal query plan tree of the left division is used as the updated cost threshold.
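Rules (a1) and (a2) transcribe directly into two small functions. The Python sketch below follows the text literally; `budget_for_left`, `budget_for_right` and the `cost_of`/`lower_bound` callables are hypothetical names, and the global optimal-tree map is assumed to be a dict keyed by hypergraph.

```python
import math
from typing import Callable, Dict, FrozenSet

Hypergraph = FrozenSet[str]


def budget_for_left(right: Hypergraph,
                    best_plans: Dict[Hypergraph, object],
                    cost_of: Callable[[object], float],
                    lower_bound: Callable[[Hypergraph], float]) -> float:
    """(a1) Right division is the reference target: use the cost of its stored
    plan if it is in the optimal-tree map, otherwise its cost lower bound."""
    if right in best_plans:
        return cost_of(best_plans[right])
    return lower_bound(right)


def budget_for_right(budget: float, left_plan_cost: float) -> float:
    """(a2) Left division is the reference target: budget' = budget - cost(left)."""
    return budget - left_plan_cost


# Usage with hypothetical numbers: the right division is not yet mapped, lower bound 40.
plans = {frozenset({"r1"}): 0.0}  # here a plan is represented by its cost alone
b_left = budget_for_left(frozenset({"r2", "r3"}), plans, cost_of=lambda c: c,
                         lower_bound=lambda g: 40.0)
b_right = budget_for_right(math.inf, 12.5)  # an infinite budget stays infinite
print(b_left, b_right)  # 40.0 inf
```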

In the present invention, the cost of a query plan tree is a combined cost of its disk read cost, network transmission cost and table size.

The cost of any query plan tree T is calculated according to the following formula:

where α_L + α_R + β + γ + δ = 1;

L and R are the left and right subtrees of the query plan tree, respectively;

C_L is the cost of the left subtree and C_R the cost of the right subtree;

IO_L is the disk cost of reading the data corresponding to the left subtree;

TR is the network cost, S is the data size, and S_LR is the size of the data produced by the hash join of the left and right subtrees.

In the present invention, C_L (the cost of the left subtree) and C_R (the cost of the right subtree) are computed recursively with the above formula.

Preferably, step (1) further includes initializing an empty global attempt-count map;

during the level-by-level decomposition, the following operations are also performed according to the result of the condition check:

(b1) Determine the number of attempts of the current decomposition target according to the result of the condition check:

if the current decomposition target is in the global attempt-count map, read its number of attempts from that map;

if it is not, set its number of attempts to zero, and add the current decomposition target and its number of attempts to the global attempt-count map;

(b2) Update the cost threshold according to the number of attempts so determined; after the update, increment the target's number of attempts in the global attempt-count map by 1.

Step (b2) updates the cost threshold as follows:

budget' = max(budget, lowerBound(graph) × 2^attempt),

where budget' is the updated cost threshold, budget is the cost threshold before the update, graph is the current decomposition target, attempt is the number of attempts, and lowerBound(graph) is the cost lower bound of the current decomposition target.
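Steps (b1) and (b2) together can be sketched as one Python function that reads or initializes the attempt count, applies budget' = max(budget, lowerBound(graph) × 2^attempt), and then increments the count. `update_budget` and the constant lower bound `lb` are hypothetical; a real lower bound would come from the simple cost model referenced below.

```python
import math


def update_budget(graph, budget, attempts, lower_bound):
    """(b1)+(b2): read (or initialise to 0) the attempt count of `graph`,
    raise the budget to at least lowerBound(graph) * 2**attempt,
    then count this attempt."""
    attempt = attempts.setdefault(graph, 0)  # (b1): 0 if not yet in the map
    new_budget = max(budget, lower_bound(graph) * 2 ** attempt)
    attempts[graph] += 1  # (b2): increment after the update
    return new_budget


attempts = {}
g = frozenset({"r1", "r2"})
lb = lambda graph: 10.0  # hypothetical constant lower bound
b1 = update_budget(g, 5.0, attempts, lb)  # attempt 0: max(5, 10 * 1) = 10
b2 = update_budget(g, 5.0, attempts, lb)  # attempt 1: max(5, 10 * 2) = 20
b3 = update_budget(g, 5.0, attempts, lb)  # attempt 2: max(5, 10 * 4) = 40
print(b1, b2, b3, attempts[g])  # 10.0 20.0 40.0 3
print(update_budget(frozenset({"r3"}), math.inf, attempts, lb))  # inf
```

With the initial budget of positive infinity used in the embodiment, the max keeps the budget infinite; the doubling only has an effect once a finite budget is passed in, presumably so that repeated attempts on the same hypergraph are not pruned indefinitely.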

In the present invention, the cost lower bound of a query plan tree is computed with an existing simple cost model; see Effective and Robust Pruning for Top-Down Join Enumeration Algorithms (Fender P, Moerkotte G, Neumann T, et al. Effective and robust pruning for top-down join enumeration algorithms. Proc of the 28th Int Conf on Data Engineering. IEEE, 2012: 414-425). The lower bound must satisfy the following condition:

lowerBound(graph) ≤ cost(graph),

where cost(graph) is the actual cost of the current decomposition target graph (computed with the cost model proposed by the present invention).

By building a bushy-tree search space and combining the optimized cost model with a pruning strategy, the invention accounts for disk I/O, network transmission and intermediate result size, guarantees that the optimal join order (i.e. the optimal query plan tree of the initial query hypergraph) is generated and improves query efficiency. This promotes the development of big data real-time query technology, improves the quality of service of big data real-time queries and makes people's work and daily life more convenient. Specifically, the invention has the following advantages:

To address the excessive size of the bushy-tree search space, a top-down optimization method and a pruning strategy are introduced, which greatly reduces the execution time of the optimization algorithm and improves the efficiency of bushy-tree top-down optimization;

The optimized cost model (i.e. the cost formula) is suited to distributed big data environments: by accounting for query costs in a big data environment and the characteristics of hash joins, it ensures that an optimal query plan can be generated.

Brief description of the drawings

Fig. 1 is a flow chart of the big data real-time query optimization method;

Fig. 2 shows the result of partitioning the hypergraph;

Fig. 3 is a schematic diagram of forward and reverse merging of the left and right subtrees, where (a) shows forward merging and (b) reverse merging.

Embodiment

The present invention is described in detail below with reference to the drawings and specific embodiments.

In this embodiment, the cost of each query plan tree is defined as the combined cost of its disk read cost, network transmission cost and table size, and the cost of any query plan tree T is calculated according to the following formula (the cost model):

where α_L + α_R + β + γ + δ = 1;

C_L is the cost of the left subtree and C_R the cost of the right subtree, both obtained recursively with the formula;

IO_L is the disk cost of reading the data corresponding to the left subtree.

Because the two subtrees can be executed in parallel, when neither L nor R is a single table the larger of their two costs is taken, and the network cost of the right subtree, the size of the right subtree and the size of the hash join result are also taken into account.

In this embodiment, big data real-time query optimization is performed for a four-table join query statement, as follows:

(1) After parsing, the initial query hypergraph G{r1,r2,r3,r4} is formed (r1, r2, r3 and r4 denote tables; for ease of description, the edges are omitted here).

(2) Following the minimum-cost principle for query plan trees, the initial query hypergraph is decomposed level by level from the top down until its optimal query plan tree is obtained, which completes the big data real-time query optimization.

Partitioning the query hypergraph G{r1,r2,r3,r4} yields multiple subgraphs, as shown in Fig. 2, on which the following description is based. With G{r1,r2,r3,r4} as the decomposition target, the first-level decomposition yields two subgraph pairs, G11 and G12. G11 consists of the two query hypergraphs G₁{r1} and G₅{r2,r3,r4}; G12 consists of G₇{r1,r2} and G₆{r3,r4}.

G₁{r1} is a single table and need not be decomposed further. The query hypergraphs obtained at the first level, G₅{r2,r3,r4}, G₇{r1,r2} and G₆{r3,r4}, serve as decomposition targets for the second-level decomposition. Decomposing G₅{r2,r3,r4} yields one subgraph pair G21, consisting of the query hypergraphs G₂{r2} and G₆{r3,r4}. G₆{r3,r4} is then decomposed further as a third-level decomposition target, yielding one subgraph pair G31, consisting of G₃{r3} and G₄{r4}.

Decomposing the query hypergraph G₇{r1,r2} yields one subgraph pair G22, consisting of G₁{r1} and G₂{r2}.
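The text omits the edges of G, so the Python sketch below assumes one edge set that reproduces the partitions of Fig. 2: simple edges r1-r2 and r3-r4 plus a hyperedge {r2}-{r3,r4}. `EDGES` and all function names are hypothetical. Partitions are enumerated naively (every subset containing the smallest table, checked for connectivity), which is not the partitioning algorithm of Fender and Moerkotte but yields the same subgraph pairs on this small example.

```python
from functools import lru_cache
from itertools import combinations

# Assumed hyperedges (u, v): u and v are table sets linked by one join predicate.
EDGES = [
    (frozenset({"r1"}), frozenset({"r2"})),
    (frozenset({"r3"}), frozenset({"r4"})),
    (frozenset({"r2"}), frozenset({"r3", "r4"})),
]


def linked(s1, s2):
    """True if some hyperedge has one side inside s1 and the other inside s2."""
    return any((u <= s1 and v <= s2) or (u <= s2 and v <= s1) for u, v in EDGES)


def proper_subsets(s):
    """Non-empty proper subsets of s that contain its smallest table."""
    items = sorted(s)
    for k in range(len(items) - 1):
        for extra in combinations(items[1:], k):
            yield frozenset((items[0], *extra))


@lru_cache(maxsize=None)
def connected(s):
    """A set is connected if it is one table or splits into two connected
    parts that are linked by a hyperedge."""
    if len(s) == 1:
        return True
    return any(connected(s1) and connected(s - s1) and linked(s1, s - s1)
               for s1 in proper_subsets(s))


def subgraph_pairs(s):
    """(left, right) pairs of connected halves of s linked by a hyperedge."""
    return [(s1, s - s1) for s1 in proper_subsets(s)
            if connected(s1) and connected(s - s1) and linked(s1, s - s1)]


G = frozenset({"r1", "r2", "r3", "r4"})
for left, right in subgraph_pairs(G):
    print(sorted(left), sorted(right))
```

Under this edge set, the pairs for G are ({r1}, {r2,r3,r4}) and ({r1,r2}, {r3,r4}), matching G11 and G12, and G₅{r2,r3,r4} yields only ({r2}, {r3,r4}), matching G21.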

As shown in Fig. 1, the level-by-level decomposition in this embodiment proceeds as follows:

(a) Initialize the global optimal-tree map and the global attempt-count map. As shown in Table 1, the query hypergraph G{r1,r2,r3,r4} is put into the global optimal-tree map, and its corresponding optimal solution is empty (no plan has been found yet).

The single-table hypergraphs and their single-table query plan trees are also put into the global optimal-tree map. Here T_xyz denotes a query plan tree T containing the three tables x, y and z, where x, y and z are table numbers; when several query plan trees contain the same tables, they are written T_xyz, T'_xyz, T''_xyz, and so on.

G{r1,r2,r3,r4} is put into the global attempt-count map with an attempt count of 0; the resulting global attempt-count map is shown in Table 2.

Table 1

Table 2

Query hypergraph	Number of attempts
G{r1,r2,r3,r4}	0

(b) Look up the optimal query plan tree of G{r1,r2,r3,r4} in the global optimal-tree map; it is not present. Therefore the attempt count of G{r1,r2,r3,r4} is read from the global attempt-count map, and the budget is updated with formula (1) according to that count:

budget' = max(budget, lowerBound(graph) × 2^attempt)    (1)

where graph is the query hypergraph, attempt is the number of attempts, and lowerBound returns the cost lower bound of the query hypergraph. The lower bound is usually computed with a simple cost model so that lowerBound(graph) ≤ cost(graph), where cost(graph) is the actual cost of graph; here graph = G.

In this embodiment the updated budget is b₀' (the initial budget value in this embodiment is b₀, which is positive infinity).

After the budget is updated, the corresponding attempt count in the global attempt-count map is incremented by 1. The updated global attempt-count map is shown in Table 3.

Table 3

Query hypergraph	Number of attempts
G{r1,r2,r3,r4}	1

(c) With G{r1,r2,r3,r4} as the decomposition target, it is partitioned using the method of Top Down Plan Generation: From Theory to Practice, producing the two subgraph pairs G11{G₁{r1}, G₅{r2,r3,r4}} and G12{G₇{r1,r2}, G₆{r3,r4}}, as shown in Fig. 2.

(d) Next, the subgraph pair G11{G₁{r1}, G₅{r2,r3,r4}} is analyzed, with G₅{r2,r3,r4} as the right division, G₁{r1} as the left division, and b₀' as the budget reference value of G11.

(d1) First determine whether the right division is present in the global optimal-tree map, update the budget according to the update rule based on the result, and then use the updated budget to build the left subtree corresponding to G₁{r1}.

The update rule in this embodiment is as follows:

if the right division is present in the optimal-tree map, compute the cost of its query plan tree directly and use that cost as the budget;

otherwise, compute its cost lower bound and set the budget to that lower bound.

In this embodiment the right division G₅{r2,r3,r4} is not in the global optimal-tree map, so the budget reference value of the subgraph pair (currently b₀') is updated according to this rule; the updated budget is c₁₁.

In this embodiment the optimal tree of G₁{r1}, T₁, is obtained from the global optimal-tree map and returned directly as the optimal query plan tree (the cost of a single-table query plan tree is assumed here to always be below the given budget, i.e. below c₁₁ in this embodiment).

(d2) Using the cost of the left subtree, update the budget reference value of G11{G₁{r1}, G₅{r2,r3,r4}} with formula (2), and use the updated budget to build the query plan tree of G₅{r2,r3,r4} as the right subtree.

budget' = budget - cost(left)    (2)

where cost(left) is the cost of the left subtree already built.

In this embodiment the updated budget is b₁:

b₁ = b₀' - c₁₂,

where c₁₂ is the cost of G₁{r1}.

In this embodiment, building the query plan tree of G₅{r2,r3,r4} as the right subtree comprises the following steps:

(d21) Look up the optimal tree (optimal query plan tree) of G₅{r2,r3,r4} in the global optimal-tree map; it is not present. Then look up the attempt count of G₅{r2,r3,r4} in the global attempt-count map; it is not present either, so it is initialized to 0 and substituted into formula (1) to update the budget.

The budget after this update is b₁':

b₁' = max(b₁, lowerBound(graph) × 2^attempt),

At the same time, the attempt count is incremented by 1 and saved in the global attempt-count map. The updated global attempt-count map is shown in Table 4.

Table 4

Query hypergraph	Number of attempts
G{r1,r2,r3,r4}	1
G₅{r2,r3,r4}	1

(d22) Partition the query hypergraph G₅{r2,r3,r4}. As shown in Fig. 2, this embodiment generates one subgraph pair G21{G₂{r2}, G₆{r3,r4}}, with G₆{r3,r4} as the right division, G₂{r2} as the left division, and b₁' as the budget reference value of G21; then proceed as follows:

(S1) Determine whether the right division G₆{r3,r4} is present in the global optimal-tree map, update the budget using the update rule, and then use the updated budget to build the left subtree corresponding to G₂{r2}.

In this embodiment the right division G₆{r3,r4} is not in the global optimal-tree map, so according to the update rule the budget is updated from the reference value b₁' to c₂₁.

In this embodiment the optimal tree of G₂{r2}, T₂, is obtained from the global optimal-tree map and returned directly as the optimal query plan tree, serving as the left subtree (the cost of a single-table query plan tree is assumed here to always be below the given budget, i.e. below c₂₁ in this embodiment).

(S2) Update the budget according to the cost of G₂{r2}: the budget reference value of the subgraph pair is updated with formula (2), giving

b₂ = b₁' - c₂₂,

where c₂₂ is the cost of G₂{r2} (i.e. the cost of the query plan tree T₂). Then build the right subtree corresponding to G₆{r3,r4}, as follows:

(S2-1) Look up the optimal tree of G₆{r3,r4} in the global optimal-tree map; it is not present. Then look up the attempt count of G₆{r3,r4} in the global attempt-count map; it is not present either, so the attempt count is initialized to 0 and the budget is updated from b₂ to b₂' with formula (1). At the same time, the attempt count is incremented by 1 and saved in the global attempt-count map.

The updated global attempt-count map is shown in Table 5.

Table 5

Query hypergraph	Number of attempts
G{r1,r2,r3,r4}	1
G₅{r2,r3,r4}	1
G₆{r3,r4}	1

(S2-2) Partition the query hypergraph G₆{r3,r4}, generating one subgraph pair G31{G₃{r3}, G₄{r4}}, with G₄{r4} as the right division, G₃{r3} as the left division, and b₂' as the budget reference value of G31.

(S2-21) Perform step (d1) for the subgraph pair G31{G₃{r3}, G₄{r4}}: the right division G₄{r4} is found in the global optimal-tree map, so according to the update rule the cost of G₄{r4} is computed and the budget is updated from b₂' to c₃₁; then build the left subtree corresponding to G₃{r3}:

(S2-22) The optimal tree of G₃{r3}, T₃, is obtained from the global optimal-tree map and returned directly as the optimal query plan tree (the cost of a single-table query plan tree is assumed here to always be below the given budget, i.e. below c₃₁ in this embodiment).

(S2-23) Using the left subtree just built, update the budget with formula (2); the updated budget b₃ is:

b₃ = b₂' - c₃₂,

where c₃₂ is the cost of G₃{r3}.

(S2-24) Use the updated budget (b₃) to build the right subtree corresponding to G₄{r4}, as follows:

The optimal tree of G₄{r4}, T₄, is obtained from the global optimal-tree map and returned directly as the optimal query plan tree (the cost of a single-table query plan tree is assumed here to always be below the given budget, i.e. below b₃ in this embodiment).

(S2-25) Merge T₃ and T₄ in forward and reverse order. Suppose the forward result is T₃₄, the reverse result is T'₃₄, and T₃₄ costs less than T'₃₄. The lowest-cost query plan tree is selected and the global optimal-tree map is updated; the updated map is shown in Table 6.

Table 6

(S2-26) All subgraph pairs have been processed; return T₃₄, the optimal query plan tree of G₆{r3,r4}, from the global optimal-tree map.

(S3) Merge T₂ and T₃₄ in forward and reverse order. In this embodiment the forward result is T₂₃₄, the reverse result is T'₂₃₄, and T₂₃₄ costs less than T'₂₃₄. The lowest-cost query plan tree is selected and the global optimal-tree map is updated; the updated map is shown in Table 7.

Fig. 3 shows forward and reverse merging of the subtrees T_m and T_n, where (a) is forward and (b) is reverse. Merging with T_m as the left subtree and T_n as the right subtree is called forward merging and yields the tree T; conversely, merging with T_m as the right subtree and T_n as the left subtree is called reverse merging and also yields a tree T.

Table 7

(d23) All subgraph pairs have been processed; return T₂₃₄, the optimal query plan tree of G₅{r2,r3,r4}, from the global optimal-tree map.

(d3) Merge T₁ and T₂₃₄ in forward and reverse order. Suppose the forward result is T₁₂₃₄, the reverse result is T'₁₂₃₄, and T₁₂₃₄ costs less than T'₁₂₃₄. The lowest-cost query plan tree is selected and the global optimal-tree map is updated; the updated map is shown in Table 8.

Table 8

Query hypergraph	Optimal query plan tree
G{r1,r2,r3,r4}	T₁₂₃₄
G₁{r1}	T₁
G₂{r2}	T₂
G₃{r3}	T₃
G₄{r4}	T₄
G₆{r3,r4}	T₃₄
G₅{r2,r3,r4}	T₂₃₄

(e) Next, the subgraph pair G12{G₇{r1,r2}, G₆{r3,r4}} is analyzed, with G₇{r1,r2} as the left division, G₆{r3,r4} as the right division, and b₀' as the budget reference value of G12.

(e1) First compute the cost of the right division G₆{r3,r4} and use it to update the budget. Because G₆ is present in the global optimal-tree map, its cost is computed directly and the budget is updated from the reference value b₀' to c₄₁; then build the left subtree corresponding to G₇{r1,r2}.

In this embodiment, the left subtree corresponding to G₇{r1,r2} is built as follows:

(e11) Look up the optimal tree of G₇{r1,r2} in the global optimal-tree map; it is not present.

(e12) Look up the attempt count of G₇{r1,r2} in the global attempt-count map; it is not present, so the attempt count is initialized to 0 and used to update the budget from c₄₁ to c₄₁'. At the same time, the attempt count is incremented by 1 and saved in the global attempt-count map. The updated global attempt-count map is shown in Table 9.

Table 9

Query hypergraph	Attempt count
G{r1,r2,r3,r4}	1
G₅{r2,r3,r4}	1
G₆{r3,r4}	1
G₇{r1,r2}	1

(e13) Partition the query hypergraph G₇{r1, r2}; this generates one subgraph pair, G22 {G₁{r1}, G₂{r2}}.
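A sketch of how a query hypergraph might be partitioned into subgraph pairs. It assumes, for illustration only, that hyperedges are simplified to binary join edges and that a valid pair consists of two non-empty, internally connected halves joined by at least one edge; `is_connected` and `subgraph_pairs` are hypothetical helper names, not taken from the source.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Check whether `nodes` induces a connected subgraph of the join graph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes


def subgraph_pairs(nodes, edges):
    """Enumerate unordered splits of `nodes` into two connected, joined halves."""
    nodes = sorted(nodes)
    anchor = nodes[0]  # fixing one vertex on the left emits each unordered pair once
    pairs = []
    for k in range(1, len(nodes)):
        for left in combinations(nodes, k):
            if anchor not in left:
                continue
            right = tuple(n for n in nodes if n not in left)
            linked = any((a in left and b in right) or (a in right and b in left)
                         for a, b in edges)
            if linked and is_connected(left, edges) and is_connected(right, edges):
                pairs.append((frozenset(left), frozenset(right)))
    return pairs
```

For a hypothetical chain r1-r2-r3-r4, the full vertex set yields the three pairs {r1}|{r2,r3,r4}, {r1,r2}|{r3,r4} and {r1,r2,r3}|{r4}, while {r1, r2} yields only the single pair {r1}|{r2}.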

(e14) Compute the cost of the right partition G₂{r2} and use it to update the budget from the reference value c₄₁′ to c₅₁; then build the left subtree corresponding to G₁{r1}:

The optimal tree for G₁{r1} in the global optimal tree mapping is T₁, so the optimal query plan tree T₁ is returned directly. (Here the cost of a single-table query plan tree is assumed always to be less than the given budget; in this embodiment it is less than c₅₁.)

(e15) Update the budget to b₅ = c₄₁′ - c₅₂ (where c₅₂ is the cost of G₁{r1}), then build the right subtree corresponding to G₂{r2}:

The optimal tree for G₂{r2} in the global optimal tree mapping is T₂, so the optimal query plan tree T₂ is returned directly. (Here the cost of a single-table query plan tree is assumed always to be less than the given budget; in this embodiment it is less than b₅.)

(e16) Merge T₁ and T₂ in both forward and reverse order. Suppose the forward merge yields T₁₂ and the reverse merge yields T′₁₂, and that T₁₂ has the lower cost. Select the query plan tree with the lowest cost and update the global optimal tree mapping. The updated global optimal tree mapping is shown in Table 10.

Table 10

Query hypergraph	Optimal query plan tree
G{r1,r2,r3,r4}	T₁₂₃₄
G₁{r1}	T₁
G₂{r2}	T₂
G₃{r3}	T₃
G₄{r4}	T₄
G₆{r3,r4}	T₃₄
G₅{r2,r3,r4}	T₂₃₄
G₇{r1,r2}	T₁₂

(e17) Once all subgraph pairs have been evaluated, return the optimal query plan tree T₁₂ corresponding to G₇{r1, r2} in the global optimal tree mapping.

(e2) Update the budget and build the right subtree corresponding to G₆{r3, r4}, as follows:

(e21) Update the budget from the reference value b₀′ to b₄ = b₀′ - c₄₂ (where c₄₂ is the cost of G₇{r1, r2}), and use it to build the right subtree corresponding to G₆{r3, r4}.

(e22) Look up the optimal tree for G₆{r3, r4} in the global optimal tree mapping; it is present, so its cost is compared with the given budget:

If the cost is less than the given budget, no further partitioning is performed and the corresponding query plan tree in the optimal tree mapping is returned directly;

Otherwise, the partitioning process continues.

In this embodiment, the cost of the optimal tree for G₆{r3, r4} is less than the given budget reference value b₄, so the optimal query plan tree T₃₄ corresponding to G₆{r3, r4} in the global optimal tree mapping is returned directly.
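The reuse check in step (e22) amounts to a dictionary lookup guarded by the budget. A minimal sketch, in which `CachedPlan`, `lookup_within_budget`, and the sample costs are hypothetical:

```python
from typing import Dict, FrozenSet, NamedTuple, Optional


class CachedPlan(NamedTuple):
    label: str    # e.g. "T34"
    cost: float


def lookup_within_budget(best_trees: Dict[FrozenSet[str], CachedPlan],
                         target: FrozenSet[str],
                         budget: float) -> Optional[CachedPlan]:
    """Return the cached optimal tree for `target` if it costs less than `budget`."""
    cached = best_trees.get(target)
    if cached is not None and cached.cost < budget:
        return cached   # reuse it directly: no further partitioning
    return None         # absent or over budget: the caller keeps partitioning
```

A `None` result corresponds to the "otherwise" branch above, where the caller falls back to partitioning the hypergraph.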

(e3) Merge T₁₂ and T₃₄ in both forward and reverse order. Suppose the forward merge yields T″₁₂₃₄ and the reverse merge yields T‴₁₂₃₄, and that T″₁₂₃₄ costs less than both T‴₁₂₃₄ and T₁₂₃₄. Select the query plan tree with the lowest cost and update the global optimal tree mapping. The updated global optimal tree mapping is shown in Table 11.
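Step (e3) keeps whichever of the two merge results is cheaper, and overwrites the entry for G{r1, r2, r3, r4} only if that result beats the tree recorded in step (d3). A sketch with hypothetical names (`CachedPlan`, `record_if_cheaper`, `merge_step`) and made-up costs:

```python
from typing import Dict, FrozenSet, NamedTuple


class CachedPlan(NamedTuple):
    label: str
    cost: float


def record_if_cheaper(best_trees: Dict[FrozenSet[str], CachedPlan],
                      target: FrozenSet[str],
                      candidate: CachedPlan) -> bool:
    """Store `candidate` unless the mapping already holds a cheaper-or-equal tree."""
    current = best_trees.get(target)
    if current is None or candidate.cost < current.cost:
        best_trees[target] = candidate
        return True
    return False


def merge_step(best_trees, target, forward: CachedPlan, reverse: CachedPlan) -> CachedPlan:
    # Pick the cheaper of the forward and reverse merges, then try to record it.
    winner = min(forward, reverse, key=lambda p: p.cost)
    record_if_cheaper(best_trees, target, winner)
    return best_trees[target]
```

With T₁₂₃₄ recorded at a hypothetical cost of 120, a forward result T″₁₂₃₄ at 90 replaces it, while the reverse result T‴₁₂₃₄ at 150 is discarded.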

Table 11

Query hypergraph	Optimal query plan tree
G{r1,r2,r3,r4}	T″₁₂₃₄
G₁{r1}	T₁
G₂{r2}	T₂
G₃{r3}	T₃
G₄{r4}	T₄
G₆{r3,r4}	T₃₄
G₅{r2,r3,r4}	T₂₃₄
G₇{r1,r2}	T₁₂

(f) Once all subgraph pairs have been evaluated, return the optimal query plan tree T″₁₂₃₄ corresponding to G{r1, r2, r3, r4} in the global optimal tree mapping.

The above embodiment describes the technical solution and beneficial effects of the invention in detail. It should be understood that the foregoing is only the preferred embodiment of the invention and is not intended to limit it; any modification, supplement, or equivalent substitution made within the scope of the principles of the invention shall fall within its scope of protection.

Claims

1. A big data real-time query optimization method based on a dense tree and a top-down approach, characterized by comprising:

(1-1) initializing an empty global optimal tree mapping;

(1-2) for each vertex of the initial query hypergraph, building the corresponding query hypergraph and query plan tree, setting the cost of each such query plan tree to 0, and adding the query hypergraph and query plan tree of each vertex of the initial query hypergraph to the global optimal tree mapping;

(2) decomposing the initial query hypergraph level by level, top-down, on the principle of minimum query plan tree cost, until the optimal query plan tree of the initial query hypergraph is obtained, thereby completing the optimization of the big data real-time query;

during level-by-level decomposition, a condition check is performed first to determine whether the current decomposition target is in the global optimal tree mapping:

if it is in the global optimal tree mapping, whether the query plan tree corresponding to the current decomposition target in the mapping is optimal is determined according to the cost threshold:

if it is optimal, the query plan tree corresponding to the current decomposition target in the global optimal tree mapping is used as the optimal query plan tree;

otherwise, the optimal query plan tree of the current decomposition target is built, and the query plan tree corresponding to the current decomposition target in the global optimal tree mapping is updated to this optimal query plan tree;

if it is not in the global optimal tree mapping, the optimal query plan tree of the current target is built, and the current decomposition target and its optimal query plan tree are stored in the global optimal tree mapping;

(S1) decomposing the current decomposition target to obtain the subgraph pairs of the next level, and processing each subgraph pair of the current decomposition target in turn; each subgraph pair comprises two query hypergraphs, and processing a subgraph pair means processing each of its query hypergraphs in turn;

(S2) when processing a query hypergraph, taking the current query hypergraph as the current decomposition target and the other query hypergraph of the subgraph pair as the reference target used to update the cost threshold;

returning to re-execute the condition check with the current decomposition target and the updated cost threshold, until the optimal query plan tree of the current decomposition target is obtained;

proceeding level by level from the bottom up, merging each subgraph pair of the current level in turn to obtain the corresponding merge results, selecting the lowest-cost merge result as the optimal query plan tree of the upper-level decomposition target, and storing the upper-level decomposition target and its query plan tree in the global optimal tree mapping;

the initial value of the cost threshold is positive infinity.

2. The big data real-time query optimization method based on a dense tree and a top-down approach according to claim 1, characterized in that, when a subgraph pair is processed, one query hypergraph is arbitrarily chosen as the left partition and the other as the right partition; first, the right partition serves as the reference target and the left partition as the decomposition target, until the optimal query plan tree of the left partition is obtained; then the left partition serves as the reference target and the right partition as the decomposition target, yielding the optimal query plan tree of the right partition;

(a1) when the right partition is the reference target, the cost threshold is updated from the reference target in step (2) as follows:

if the reference target is in the global optimal tree mapping, its computed cost is used as the updated cost threshold;

(a2) when the left partition is the reference target, the cost threshold is updated from the reference target in step (2) as follows:

the current cost threshold minus the cost of the optimal query plan tree of the left partition is used as the updated cost threshold.
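The two threshold updates of claim 2 reduce to a lookup and a subtraction. In the following sketch, the function names and sample costs are hypothetical. The claim does not say what happens when the right partition is not yet in the mapping, so keeping the current threshold in that case is an assumption.

```python
from typing import Dict, FrozenSet


def threshold_with_right_as_reference(costs: Dict[FrozenSet[str], float],
                                      right: FrozenSet[str],
                                      current: float) -> float:
    """(a1): a cached right partition's cost becomes the new threshold."""
    # Assumption: if the right partition is not cached, keep the current threshold.
    return costs[right] if right in costs else current


def threshold_with_left_as_reference(current: float, left_cost: float) -> float:
    """(a2): subtract the cost of the left partition's optimal tree."""
    return current - left_cost
```

This mirrors steps (e1) and (e21), where b₀′ first becomes c₄₁ and later becomes b₄ = b₀′ - c₄₂.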

3. The big data real-time query optimization method based on a dense tree and a top-down approach according to claim 1, characterized in that, when a subgraph pair is merged in step (S2), the query plan trees of its two query hypergraphs are merged in both forward and reverse order, and whichever of the two resulting query plan trees has the lower cost is taken as the merge result of the subgraph pair.

4. The big data real-time query optimization method based on a dense tree and a top-down approach according to any one of claims 1 to 3, characterized in that the cost of a query plan tree is a combined cost of its disk read cost, network cost, and table size;

where α_L + α_R + β + γ + δ = 1;

C_L is the cost of the left subtree and C_R is the cost of the right subtree;

IO_L is the cost of reading the data corresponding to the left subtree from disk;

TR is the network cost, S is the data size, and S_LR is the size of the data after the left and right subtrees are hash-joined.

5. The big data real-time query optimization method based on a dense tree and a top-down approach according to claim 4, characterized in that step (1) further comprises initializing an empty global attempt-count mapping;

if the condition check finds the current decomposition target in the global attempt-count mapping, the attempt count of the current decomposition target is obtained from that mapping;

if the condition check finds that the current decomposition target is not in the global attempt-count mapping, its attempt count is set to zero, and the current decomposition target and its attempt count are added to the global attempt-count mapping;

(b2) the cost threshold is updated according to the determined attempt count; after the update, the attempt count corresponding to the current decomposition target in the global attempt-count mapping is incremented by 1.

6. The big data real-time query optimization method based on a dense tree and a top-down approach according to claim 2, characterized in that step (a2) updates the cost threshold as follows:

budget′ = max(budget, lowerBound(graph) × 2^attempt),

where budget′ is the updated cost threshold, budget is the cost threshold before the update, graph is the current decomposition target, attempt is the attempt count, and lowerBound(graph) is the lower bound on the cost of the current decomposition target.
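Claims 5 and 6 together describe an attempt-count mapping that widens the budget geometrically each time a target is retried. The sketch below takes `lower_bound` as a plain number, because the document does not define how lowerBound(graph) is computed. The name `relax_budget` and the sample values are hypothetical.

```python
from typing import Dict, FrozenSet


def relax_budget(attempts: Dict[FrozenSet[str], int],
                 graph: FrozenSet[str],
                 budget: float,
                 lower_bound: float) -> float:
    """Read or initialize the attempt count, widen the budget, then count the attempt."""
    attempt = attempts.get(graph, 0)       # absent target: attempt count starts at 0
    attempts[graph] = attempt              # ensure the target is in the mapping
    new_budget = max(budget, lower_bound * 2 ** attempt)
    attempts[graph] = attempt + 1          # (b2): increment after the update
    return new_budget
```

Each retry doubles the floor lowerBound(graph) × 2^attempt, so a target that keeps failing is eventually given a budget large enough for a plan to be found.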