CN104504018B - Based on dense tree and top-down big data real-time query optimization method - Google Patents

Publication number
CN104504018B
Authority
CN
China
Prior art keywords
tree
cost
query
query plan
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410765313.6A
Other languages
Chinese (zh)
Other versions
CN104504018A (en)
Inventor
陈岭
马骄阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201410765313.6A priority Critical patent/CN104504018B/en
Publication of CN104504018A publication Critical patent/CN104504018A/en
Application granted granted Critical
Publication of CN104504018B publication Critical patent/CN104504018B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data real-time query optimization method based on dense trees and a top-down search strategy, comprising: (1) parsing the query statement and building an initial query hypergraph from the parsed statement; (2) decomposing the initial query hypergraph level by level, top-down, on the principle of minimizing the cost of the query plan tree, until the optimal query plan tree of the initial query hypergraph is obtained, which completes the optimization of the big data real-time query. By building a dense-tree search space and combining it with an optimized cost model and a pruning strategy, the invention takes disk I/O, network transmission, and the size of intermediate results into account and ensures that the optimal join order is generated. This improves query efficiency, promotes the development of big data real-time query technology, improves the service quality of big data real-time queries, and benefits production and everyday life.

Description

Based on dense tree and top-down big data real-time query optimization method
Technical field
The present invention relates to the field of big data query technology, and in particular to a big data real-time query optimization method based on dense trees and a top-down search strategy.
Background
With the arrival of the big data era, fast querying of massive data has become an urgent need for enterprises in fields such as the internet, telecommunications, and finance. Big data real-time query systems emerged to meet this need, for example Google Dremel, Berkeley Shark, and Cloudera Impala. Big data real-time query systems generally use a distributed architecture and, by weakening support for features such as transactions, can meet users' real-time query needs in massive data environments.
Query optimization consists mainly of three parts: the search space, the search strategy, and the cost model.
The search space can be represented by query trees, which fall mainly into left-deep trees (or right-deep trees) and dense (bushy) trees. Before the 1990s, researchers focused on query optimization based on left-deep trees: database products at that time were mostly single-machine systems, a left-deep query plan can only be executed serially, its search space is small, and the optimization time is short. With the development of technology and the appearance of distributed systems, research has shifted to query optimization based on dense trees, because the two subtrees of a dense tree can be executed in parallel on different nodes, which improves query efficiency. For n relations there are n! possible left-deep trees but (2(n-1))!/(n-1)! possible dense trees, so when many relations are joined the search space of dense-tree query plans is large and the optimization time is long.
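The growth of the two search spaces can be made concrete with a few lines of Python. This is only an illustrative sketch: it assumes the standard counts of n! left-deep trees and (2(n-1))!/(n-1)! dense (bushy) trees for n relations, and the function names are hypothetical.

```python
from math import factorial


def left_deep_count(n: int) -> int:
    """Number of left-deep join trees over n relations (n!)."""
    return factorial(n)


def dense_tree_count(n: int) -> int:
    """Number of dense (bushy) join trees over n relations, (2(n-1))!/(n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


# Compare both search spaces for a few relation counts.
for n in (2, 4, 6, 8):
    print(n, left_deep_count(n), dense_tree_count(n))
```

Even for a handful of relations the dense-tree space is far larger than the left-deep space, which is why pruning matters for the method described here.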
Search strategies fall mainly into two classes. The first is top-down, whose main idea is to optimize from the whole to the parts. A top-down optimization method works by repeatedly partitioning the join graph to build connected subgraphs; because strategies such as branch-and-bound and pruning are applied during partitioning, the execution efficiency of the algorithm improves greatly. However, it can only handle binary predicates and only supports inner joins, which is a serious limitation in practice. The second class is bottom-up, whose main idea is to optimize from the parts to the whole. Such algorithms can handle complex query predicates, but because the search space is large and pruning strategies cannot be used effectively, the optimization time is long when the number of relations is large.
The cost model of query optimization usually calculates the execution cost from features such as statistics, operators, and the raw data; a suitable cost model yields better query optimization. A general cost model needs to consider the costs of disk I/O, network transmission, and CPU computation, but an implementation can select the dominant factors according to the situation. Conventional cost models, such as the optimization model of Hive, are not well suited to big data real-time query systems: they mainly consider the cost of disk I/O, whereas in multi-table join execution in a distributed environment the transmission cost of intermediate results has a larger influence on query efficiency. Such cost models therefore have systematic limitations and cannot guarantee that the optimal query plan is obtained, which hurts query efficiency.
Summary of the invention
Multi-table join order optimization is an important area of database management system performance optimization. To address problems such as the long optimization time of existing dense-tree, top-down query optimization and the systematic limitations of traditional cost models, the present invention proposes a big data real-time query optimization method based on dense trees and a top-down search strategy.
A big data real-time query optimization method based on dense trees and a top-down search strategy, comprising:
(1) parsing the query statement and building an initial query hypergraph from the parsed statement;
(2) decomposing the initial query hypergraph level by level, top-down, on the principle of minimizing the cost of the query plan tree, until the optimal query plan tree of the initial query hypergraph is obtained, which completes the optimization of the big data real-time query.
When a data query is carried out, it is executed according to the optimal query plan tree of the initial query hypergraph.
Step (1) further comprises the following steps:
(1-1) initializing an empty global optimal tree mapping;
(1-2) for each node of the initial query hypergraph, building the corresponding query hypergraph and query plan tree, setting the cost of each such query plan tree to 0, and adding each node's query hypergraph and its query plan tree to the global optimal tree mapping.
During the level-by-level decomposition in step (2), a condition judgment is carried out first, which checks whether the current decomposition target is in the global optimal tree mapping:
If it is in the global optimal tree mapping, the cost threshold is used to judge whether the query plan tree stored for the current decomposition target in the global optimal tree mapping is optimal:
If it is optimal, the query plan tree stored for the current decomposition target in the global optimal tree mapping is used as the optimal query plan tree;
Otherwise, the optimal query plan tree of the current decomposition target is built, and the query plan tree stored for the current decomposition target in the global optimal tree mapping is updated to that optimal query plan tree;
If it is not in the global optimal tree mapping, the optimal query plan tree of the current decomposition target is built, and the current decomposition target and its optimal query plan tree are stored in the global optimal tree mapping;
The optimal query plan tree of the current decomposition target is built as follows:
(S1) the current decomposition target is decomposed to obtain several subgraph pairs of the next level, and each subgraph pair of the current decomposition target is processed in turn; each subgraph pair comprises two query hypergraphs, and processing a subgraph pair means processing each of its query hypergraphs in turn;
(S2) when a query hypergraph is processed, it becomes the current decomposition target, and the other query hypergraph of the same subgraph pair is used as the reference target to update the cost threshold;
The condition judgment is then re-executed with the current decomposition target and the updated cost threshold, until the optimal query plan tree of the current decomposition target is obtained;
Proceeding level by level from the bottom up, each subgraph pair of the current level is merged in turn to obtain the corresponding merge results, the merge result with the lowest cost is selected as the optimal query plan tree of the upper-level decomposition target, and the upper-level decomposition target and its query plan tree are stored in the global optimal tree mapping;
The initial value of the cost threshold is positive infinity.
The initial value of cost threshold value is set in the present invention to be just infinite, can prevent from being appointed because initial value is too small What query plan tree.
Signified current decomposition target is decomposed constantly downwards with top-down in the decomposable process step by step of the present invention Update.It is and cost threshold value and global optimum's tree mapping are all being constantly updated in decomposable process, i.e., right in decomposable process step by step Previous inquiry hypergraph carries out the vestige reservation during resolution process, and the result of latter inquiry hypergraph processing can be produced Influence.
Document Top Down Plan Generation are used in the present invention:From Theory to Practice (Fender P,Moerkotte G.Top down plan generation:From theory to practice.//Proc of the 29th Int Conf on Data Engineering.IEEE,2013:Method is to each disclosed in 1105-1116) Decomposition goal is decomposed, and obtains the subgraph pair of several next stage, and the quantity for decomposing obtained next stage subgraph pair is at least 1, depending on decomposition goal.
In the decomposable process step by step of the present invention, give tacit consent to and the quantity of obtained subgraph pair is decomposed extremely to the decomposition goal of every one-level It is two less.Therefore the inquiry hypergraph of each subgraph centering is decomposed successively.During practical application, to some decomposition goal The quantity for decomposing obtained subgraph pair may be 1, and now, directly each inquiry hypergraph of the subgraph centering is decomposed, and When merging upwards step by step, directly to the subgraph to merging, and decomposition goal using amalgamation result as upper level is most Excellent query plan tree.
In the step (S2) to subgraph to merging when by two of subgraph centering inquiries, hypergraph is corresponding looks into respectively The plan tree of inquiry carry out it is positive merge and reversely merging, merged with forward direction and reversely to merge cost in obtained query plan tree smaller Query plan tree as the subgraph pair amalgamation result.
Merged by forward direction and reversely merging, be further ensured that the cost for obtaining optimal query plan tree is necessarily minimum.
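A forward and reverse merge can be sketched as follows. The cost function here is a toy stand-in chosen only because it is asymmetric in its operands (it charges for the size of the right input); it is an assumption for illustration and not the cost formula of the patent. `Tree`, `join`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tree:
    name: str
    size: float                      # assumed data size of the result
    cost: float = 0.0                # single tables start with cost 0
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def join(left: Tree, right: Tree) -> Tree:
    """Build a join node with a toy, asymmetric cost (illustrative only)."""
    cost = max(left.cost, right.cost) + 2.0 * right.size
    size = max(left.size, right.size)
    return Tree(f"({left.name} JOIN {right.name})", size, cost, left, right)


def merge(a: Tree, b: Tree) -> Tree:
    """Merge forward (a left, b right) and in reverse (b left, a right),
    keeping whichever query plan tree is cheaper."""
    forward = join(a, b)
    reverse = join(b, a)
    return forward if forward.cost <= reverse.cost else reverse


t3 = Tree("r3", size=100.0)
t4 = Tree("r4", size=10.0)
t34 = merge(t3, t4)
print(t34.name, t34.cost)
```

Because both orders are always tried, the caller does not need to decide in advance which query hypergraph of the pair should become the left subtree.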
Preferably, when a subgraph pair is processed, one of its query hypergraphs is chosen arbitrarily as the left partition and the other becomes the right partition. First the right partition serves as the reference target and the left partition as the decomposition target, which yields the optimal query plan tree of the left partition; then the left partition serves as the reference target and the right partition as the decomposition target, which yields the optimal query plan tree of the right partition;
Correspondingly, in step (2) the cost threshold is updated according to the reference target as follows:
(a1) when the right partition is the reference target, the cost threshold is updated as follows:
If the reference target is in the global optimal tree mapping, its calculated cost is used as the updated cost threshold;
Otherwise, the cost lower bound of the reference target is used as the updated cost threshold;
(a2) when the left partition is the reference target, the cost threshold is updated as follows:
The current cost threshold minus the cost of the optimal query plan tree of the left partition is used as the updated cost threshold.
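The two update rules (a1) and (a2) can be written down directly. This is a hedged sketch: the function names, the dictionary of known costs, and the toy lower-bound function are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, FrozenSet

Graph = FrozenSet[str]


def threshold_with_right_reference(best_costs: Dict[Graph, float],
                                   right: Graph,
                                   lower_bound: Callable[[Graph], float]) -> float:
    """Rule (a1): the right partition is the reference target.
    Use its known cost if it is in the global optimal tree mapping,
    otherwise fall back to its cost lower bound."""
    if right in best_costs:
        return best_costs[right]
    return lower_bound(right)


def threshold_with_left_reference(threshold: float, left_cost: float) -> float:
    """Rule (a2): the left partition is the reference target.
    Subtract the cost of its optimal query plan tree from the threshold."""
    return threshold - left_cost


# Toy inputs (hypothetical numbers).
best_costs = {frozenset({"r4"}): 7.0}
toy_lower_bound = lambda g: 3.0 * len(g)

print(threshold_with_right_reference(best_costs, frozenset({"r4"}), toy_lower_bound))
print(threshold_with_right_reference(best_costs, frozenset({"r3", "r4"}), toy_lower_bound))
print(threshold_with_left_reference(10.0, 4.0))
```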
In the present invention, the cost of a query plan tree is the combined cost of its disk read cost, its network transmission cost, and the table sizes.
The cost of any query plan tree T is calculated according to the following formula:
where αLR + β + γ + δ = 1;
L and R are the left subtree and the right subtree of the query plan tree, respectively;
C_L is the cost of the left subtree and C_R is the cost of the right subtree;
IO_L is the cost of reading the data of the left subtree from disk;
TR is the network transmission cost, S is the data size, and S_LR is the size of the data obtained by hash-joining the left and right subtrees.
In the present invention, the cost C_L of the left subtree and the cost C_R of the right subtree are obtained by applying the above formula recursively.
Preferably, step (1) further comprises initializing an empty global attempt-count mapping;
During the level-by-level decomposition, the following operations are also carried out according to the result of the condition judgment:
(b1) the attempt count of the current decomposition target is determined:
If the current decomposition target is in the global attempt-count mapping, its attempt count is read from that mapping;
If the current decomposition target is not in the global attempt-count mapping, its attempt count is set to zero, and the current decomposition target and its attempt count are added to the global attempt-count mapping;
(b2) the cost threshold is updated according to the attempt count so determined, and after the update the attempt count stored for that subgraph in the global attempt-count mapping is incremented by 1.
Step (b2) updates the cost threshold as follows:
budget' = max(budget, lowerBound(graph) × 2^{attempt}),
where budget' is the cost threshold after the update, budget is the cost threshold before the update, graph is the current decomposition target, attempt is the attempt count, and lowerBound(graph) is the cost lower bound of the current decomposition target.
In the present invention the cost lower bound of a query plan tree is calculated with an existing simple cost model; see Effective and Robust Pruning for Top-Down Join Enumeration Algorithms (Fender P, Moerkotte G, Neumann T, et al. Effective and robust pruning for top-down join enumeration algorithms. // Proc of the 28th Int Conf on Data Engineering. IEEE, 2012: 414-425). The lower bound must satisfy the following condition:
lowerBound(graph) ≤ cost(graph),
where cost(graph) is the actual cost of the current decomposition target graph, that is, the cost calculated with the cost model proposed by the present invention.
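The attempt-based update can be sketched as a small helper that reads the attempt count, applies budget' = max(budget, lowerBound(graph) × 2^{attempt}), and then increments the stored count. The helper name and the constant toy lower bound are hypothetical.

```python
from typing import Callable, Dict, FrozenSet

Graph = FrozenSet[str]


def raise_budget(budget: float,
                 graph: Graph,
                 attempts: Dict[Graph, int],
                 lower_bound: Callable[[Graph], float]) -> float:
    """Apply steps (b1) and (b2) for one decomposition target."""
    # (b1) read the attempt count, initializing it to 0 if absent.
    attempt = attempts.setdefault(graph, 0)
    # (b2) the budget never drops below lowerBound(graph) * 2**attempt.
    new_budget = max(budget, lower_bound(graph) * 2 ** attempt)
    # After the update, the stored attempt count is incremented by 1.
    attempts[graph] = attempt + 1
    return new_budget


attempts: Dict[Graph, int] = {}
g = frozenset({"r2", "r3", "r4"})
toy_lower_bound = lambda graph: 10.0  # hypothetical constant lower bound

first = raise_budget(5.0, g, attempts, toy_lower_bound)
second = raise_budget(5.0, g, attempts, toy_lower_bound)
print(first, second, attempts[g])
```

Doubling the lower bound on every retry means a target that failed under a tight budget is revisited with a progressively more generous one.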
By building a dense-tree search space and combining it with an optimized cost model and a pruning strategy, the invention takes disk I/O, network transmission, and intermediate result size into account and ensures that the optimal join order (that is, the optimal query plan tree of the initial query hypergraph) is generated. This improves query efficiency, promotes the development of big data real-time query technology, improves the service quality of big data real-time queries, and benefits production and everyday life. Specifically, the invention has the following advantages:
To address the excessively large search space of dense trees, a top-down optimization method and a pruning strategy are introduced, which substantially reduces the running time of the optimization algorithm and improves the efficiency of optimization based on dense trees and a top-down strategy;
The optimized cost model (that is, the cost formula) suits distributed big data environments: by taking into account the query cost in a big data environment and the characteristics of hash joins, it ensures that the optimal query plan can be generated.
Brief description of the drawings
Fig. 1 is a flow chart of the big data real-time query optimization method;
Fig. 2 shows the hypergraph partitioning result;
Fig. 3 is a schematic diagram of merging left and right subtrees in forward and reverse order, where (a) is the forward merge and (b) is the reverse merge.
Embodiment
Below in conjunction with the drawings and specific embodiments, the present invention will be described in detail.
In this embodiment, the cost of each query plan tree is defined as the combined cost of its disk read cost, its network transmission cost, and the table sizes. The cost of any query plan tree T is calculated according to the following formula (the cost model):
where αLR + β + γ + δ = 1;
L and R are the left subtree and the right subtree of the query plan tree, respectively;
C_L is the cost of the left subtree and C_R is the cost of the right subtree, both obtained by applying the formula recursively;
IO_L is the cost of reading the data of the left subtree from disk;
TR is the network transmission cost, S is the data size, and S_LR is the size of the data obtained by hash-joining the left and right subtrees.
Because the two subtrees can be executed in parallel, when neither L nor R is a single table the maximum of their two costs is taken, and the network transmission cost of the right subtree, the size of the right subtree, and the size of the hash join result are also taken into account.
In this embodiment, big data real-time query optimization is carried out for a four-table join query statement, as follows:
(1) The query statement is parsed, and an initial query hypergraph is built from the parsed statement;
Parsing yields the initial query hypergraph G{r1,r2,r3,r4} (where r1, r2, r3, and r4 denote tables; for ease of description, the edges are omitted here).
(2) The initial query hypergraph is decomposed level by level, top-down, on the principle of minimizing the cost of the query plan tree, until the optimal query plan tree of the initial query hypergraph is obtained, which completes the optimization of the big data real-time query.
Partitioning the query hypergraph G{r1,r2,r3,r4} produces multiple subgraphs, as shown in Fig. 2, on which the following description is based. With G{r1,r2,r3,r4} as the decomposition target, the first-level decomposition yields two subgraph pairs, G11 and G12. Subgraph pair G11 comprises the two query hypergraphs G1{r1} and G5{r2,r3,r4}; subgraph pair G12 comprises the two query hypergraphs G7{r1,r2} and G6{r3,r4}.
G1{r1} is a single table and needs no further decomposition. The query hypergraphs G5{r2,r3,r4}, G7{r1,r2}, and G6{r3,r4} obtained from the first-level decomposition become decomposition targets for the second-level decomposition. Decomposing G5{r2,r3,r4} yields one subgraph pair G21, comprising the two query hypergraphs G2{r2} and G6{r3,r4}. G6{r3,r4} then becomes the third-level decomposition target; decomposing it yields one subgraph pair G31, comprising the two query hypergraphs G3{r3} and G4{r4}.
Decomposing the query hypergraph G7{r1,r2} yields one subgraph pair G22, comprising the two query hypergraphs G1{r1} and G2{r2}.
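The decomposition just described can be traced with a short script. The partition table below is copied from the subgraph pairs named in this embodiment (G11, G12, G21, G22, G31); the traversal itself is only an illustrative sketch with hypothetical names, and it omits the memoization and cost thresholds that the method uses to avoid recomputing shared subgraphs such as G6{r3,r4}.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Graph = FrozenSet[str]


def g(*tables: str) -> Graph:
    return frozenset(tables)


# Subgraph pairs per decomposition target, as listed in the embodiment.
PARTITIONS: Dict[Graph, List[Tuple[Graph, Graph]]] = {
    g("r1", "r2", "r3", "r4"): [(g("r1"), g("r2", "r3", "r4")),    # G11
                                (g("r1", "r2"), g("r3", "r4"))],   # G12
    g("r2", "r3", "r4"): [(g("r2"), g("r3", "r4"))],                # G21
    g("r1", "r2"): [(g("r1"), g("r2"))],                            # G22
    g("r3", "r4"): [(g("r3"), g("r4"))],                            # G31
}


def walk(target: Graph, depth: int = 0,
         out: Optional[List[Tuple[int, Tuple[str, ...]]]] = None):
    """Visit the target, then each query hypergraph of each subgraph pair,
    recording (depth, tables) in visiting order. Single tables have no
    entry in PARTITIONS and are not decomposed further."""
    out = [] if out is None else out
    out.append((depth, tuple(sorted(target))))
    for left, right in PARTITIONS.get(target, []):
        walk(left, depth + 1, out)
        walk(right, depth + 1, out)
    return out


trace = walk(g("r1", "r2", "r3", "r4"))
for depth, tables in trace:
    print("  " * depth + "{" + ",".join(tables) + "}")
```

In the trace, {r3,r4} appears twice (under G5 and under G12); in the method itself the second visit is answered from the global optimal tree mapping instead of being decomposed again.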
As shown in Fig. 1, the level-by-level decomposition in this embodiment proceeds as follows:
(a) The global optimal tree mapping and the global attempt-count mapping are initialized. As shown in Table 1, the query hypergraph G{r1,r2,r3,r4} is put into the global optimal tree mapping with an empty optimal solution.
The single-table hypergraphs and their single-table query plan trees are put into the global optimal tree mapping (where Txyz denotes a query plan tree T containing the three tables x, y, and z, with x, y, and z being table numbers; when several query plan trees contain the same tables, they are denoted Txyz, T'xyz, T''xyz, and so on).
The hypergraph G{r1,r2,r3,r4} is put into the global attempt-count mapping with an attempt count of 0; the resulting global attempt-count mapping is shown in Table 2.
Table 1
Table 2
G{r1,r2,r3,r4} 0
(b) The optimal query plan tree for G{r1,r2,r3,r4} is looked up in the global optimal tree mapping and found to be empty (not present). The attempt count of G{r1,r2,r3,r4} is therefore read from the global attempt-count mapping, and the budget is updated from that attempt count using formula (1):
budget = max(budget, lowerBound(graph) × 2^{attempt}) (1)
where graph is the query hypergraph, attempt is the attempt count, and lowerBound returns the cost lower bound of the query hypergraph. The lower bound is usually calculated with a simple cost model so that lowerBound(graph) ≤ cost(graph) holds, where cost(graph) is the actual cost of graph; here graph = G.
In this embodiment the updated budget is b0' (the initial value of the budget is b0, which is positive infinity).
After the budget is updated, the corresponding attempt count in the global attempt-count mapping is incremented by 1. The updated global attempt-count mapping is shown in Table 3.
Table 3
G{r1,r2,r3,r4} 1
(c) With G{r1,r2,r3,r4} as the decomposition target, it is partitioned using the partitioning method of Top Down Plan Generation: From Theory to Practice. As shown in Fig. 2, two subgraph pairs are generated: G11{G1{r1}, G5{r2,r3,r4}} and G12{G7{r1,r2}, G6{r3,r4}}.
(d) Subgraph pair G11{G1{r1}, G5{r2,r3,r4}} is analyzed first, with G5{r2,r3,r4} as the right partition, G1{r1} as the left partition, and b0' as the budget base value of subgraph pair G11.
(d1) First determine whether the right partition is in the global optimal tree mapping and update the budget according to the update rule that applies to the result; then build the left subtree corresponding to G1{r1} with the updated budget.
The update rule in this embodiment is as follows:
If the right partition is in the global optimal tree mapping, the cost of its query plan tree is calculated directly and used as the budget;
Otherwise, its cost lower bound is calculated and the budget is set to that lower bound;
In this embodiment the right partition G5{r2,r3,r4} is not in the global optimal tree mapping, so the budget base value of the subgraph pair (currently b0') is updated according to the above rule; the updated budget is c11.
In this embodiment the optimal tree for G1{r1} is obtained from the global optimal tree mapping; it is T1, and the optimal query plan tree T1 is returned directly (the cost of a single-table query plan tree is assumed to always be less than the given budget, which in this embodiment means less than c11).
(d2) The budget base value of G11{G1{r1}, G5{r2,r3,r4}} is updated from the cost of the left subtree using formula (2), and the query plan tree for G5{r2,r3,r4} is built as the right subtree with the updated budget.
budget' = budget - cost(left), (2)
where cost(left) is the cost of the left subtree that has been built.
In this embodiment the updated budget is b1:
b1 = b0' - c12,
where c12 is the cost of G1{r1}.
In this embodiment, building the query plan tree for G5{r2,r3,r4} as the right subtree comprises the following steps:
(d21) The optimal tree (optimal query plan tree) for G5{r2,r3,r4} is looked up in the global optimal tree mapping and is not present. The attempt count for G5{r2,r3,r4} is then looked up in the global attempt-count mapping; it is not present either, so it is initialized to 0, and the budget is updated by substituting this attempt count into formula (1).
In this embodiment the updated budget b1' is:
b1' = max(b1, lowerBound(graph) × 2^{attempt}),
The attempt count is then incremented by 1 and saved in the global attempt-count mapping. The updated global attempt-count mapping is shown in Table 4.
Table 4
G{r1,r2,r3,r4} 1
G5{r2,r3,r4} 1
(d22) The query hypergraph G5{r2,r3,r4} is partitioned. As shown in Fig. 2, this embodiment generates one subgraph pair G21{G2{r2}, G6{r3,r4}}, with G6{r3,r4} as the right partition, G2{r2} as the left partition, and b1' as the budget base value of subgraph pair G21. The following operations are then carried out:
(S1) Determine whether the right partition G6{r3,r4} is in the global optimal tree mapping and update the budget according to the update rule; then build the left subtree corresponding to G2{r2} with the updated budget.
In this embodiment the right partition G6{r3,r4} is not in the global optimal tree mapping, so according to the update rule the budget is updated from the base value b1' to c21.
In this embodiment the optimal tree for G2{r2} is obtained from the global optimal tree mapping; it is T2, and the optimal query plan tree T2 is returned directly as the left subtree (the cost of a single-table query plan tree is assumed to always be less than the given budget, which in this embodiment means less than c21).
(S2) The budget is updated from the cost of G2{r2}: the budget base value of the subgraph pair is updated using formula (2), giving:
b2 = b1' - c22,
where c22 is the cost of G2{r2} (that is, the cost of query plan tree T2). The right subtree for G6{r3,r4} is then built as follows:
(S2-1) The optimal tree for G6{r3,r4} is looked up in the global optimal tree mapping and is not present. The attempt count for G6{r3,r4} is then looked up in the global attempt-count mapping; it is not present either, so it is initialized to 0, and formula (1) is applied with this attempt count to update the budget from b2 to b2'. The attempt count is then incremented by 1 and saved in the global attempt-count mapping.
The updated global attempt-count mapping is shown in Table 5.
Table 5
G{r1,r2,r3,r4} 1
G5{r2,r3,r4} 1
G6{r3,r4} 1
(S2-2) The query hypergraph G6{r3,r4} is partitioned, generating one subgraph pair G31{G3{r3}, G4{r4}}, with G4{r4} as the right partition, G3{r3} as the left partition, and b2' as the budget base value of subgraph pair G31.
(S2-21) Step (d1) is carried out for subgraph pair G31{G3{r3}, G4{r4}}: the right partition G4{r4} is found to be in the global optimal tree mapping, so according to the corresponding update rule its cost is calculated, the budget is updated from b2' to c31, and the left subtree for G3{r3} is built:
(S2-22) The optimal tree for G3{r3} is obtained from the global optimal tree mapping; it is T3, and the optimal query plan tree T3 is returned directly (the cost of a single-table query plan tree is assumed to always be less than the given budget, which in this embodiment means less than c31).
(S2-23) With the left subtree so built, the budget is updated according to formula (2); the updated budget b3 is:
b3 = b2' - c32,
where c32 is the cost of G3{r3}.
(S2-24) The updated budget (that is, b3) is used to build the right subtree for G4{r4}, as follows:
The optimal tree for G4{r4} is obtained from the global optimal tree mapping; it is T4, and the optimal query plan tree T4 is returned directly (the cost of a single-table query plan tree is assumed to always be less than the given budget, which in this embodiment means less than b3).
(S2-25) T3 and T4 are merged in forward and reverse order. Let the forward merge result be T34 and the reverse merge result be T'34, and suppose the cost of T34 is less than that of T'34. The query plan tree with the lowest cost is selected and the global optimal tree mapping is updated. The updated global optimal tree mapping is shown in Table 6.
Table 6
(S2-26) All subgraph pairs have now been processed, and the optimal query plan tree T34 corresponding to G6{r3, r4} in the global optimal tree mapping is returned.
(S3) T2 and T34 are merged in forward and reverse order. In this embodiment the forward merge result is T234 and the reverse merge result is T'234, with the cost of T234 lower than that of T'234. The query plan tree with the lowest cost is selected and the global optimal tree mapping is updated. The updated global optimal tree mapping is shown in Table 7.
In this embodiment, the forward and reverse merges of a subtree Tm and a subtree Tn are shown in Fig. 3, where (a) is the forward merge and (b) is the reverse merge. Taking Tm as the left subtree and Tn as the right subtree is called a forward merge and yields the merged tree T; conversely, taking Tm as the right subtree and Tn as the left subtree is called a reverse merge and likewise yields a merged tree T.
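The forward and reverse merges described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the `Plan` class, the `join_cost` function, and its asymmetric toy cost model (the left input is charged twice as much per row as the right input) are hypothetical and exist only so that the two merge orders produce different costs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical query plan tree node (not taken from the patent)."""
    name: str
    cost: float
    rows: int
    left: Optional["Plan"] = None
    right: Optional["Plan"] = None


def join_cost(left: Plan, right: Plan) -> float:
    # Assumed toy cost model: the children's costs plus a per-row charge
    # that is twice as high for the left input as for the right input.
    return left.cost + right.cost + 2 * left.rows + right.rows


def merge(left: Plan, right: Plan) -> Plan:
    """Build one merged tree with the given left and right subtrees."""
    return Plan(
        name=f"({left.name} JOIN {right.name})",
        cost=join_cost(left, right),
        rows=max(left.rows, right.rows),  # placeholder size estimate
        left=left,
        right=right,
    )


def merge_both_orders(tm: Plan, tn: Plan) -> Plan:
    """Merge Tm and Tn forward and in reverse, and keep the cheaper tree."""
    forward = merge(tm, tn)   # Tm is the left subtree
    reverse = merge(tn, tm)   # Tm is the right subtree
    return forward if forward.cost <= reverse.cost else reverse


t3 = Plan("T3", cost=0.0, rows=100)
t4 = Plan("T4", cost=0.0, rows=10)
best = merge_both_orders(t3, t4)
print(best.name, best.cost)
```

With these toy numbers the reverse merge is the cheaper one, so it is the tree that would be stored in the global optimal tree mapping.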
Table 7
(d23) All subgraph pairs have now been processed, and the optimal query plan tree T234 corresponding to G5{r2, r3, r4} in the global optimal tree mapping is returned.
(d3) T1 and T234 are merged in forward and reverse order. Without loss of generality, let the forward merge result be T1234 and the reverse merge result be T'1234, with the cost of T1234 lower than that of T'1234. The query plan tree with the lowest cost is selected and the global optimal tree mapping is updated. The updated global optimal tree mapping is shown in Table 8.
Table 8
Query hypergraph      Optimal query plan tree
G{r1,r2,r3,r4}        T1234
G1{r1}                T1
G2{r2}                T2
G3{r3}                T3
G4{r4}                T4
G6{r3,r4}             T34
G5{r2,r3,r4}          T234
(e) Next, the subgraph pair G12{G7{r1, r2}, G6{r3, r4}} is analyzed, with G7{r1, r2} as the left partition and G6{r3, r4} as the right partition, and with b0' as the budget baseline for the subgraph pair G12{G7{r1, r2}, G6{r3, r4}}.
(e1) The cost of the right partition G6{r3, r4} is calculated first, and the budget is updated accordingly. Because G6 is present in the global optimal tree mapping, its cost is calculated directly, the budget is updated from the baseline b0' to c41, and the left subtree corresponding to G7{r1, r2} is built.
In this embodiment, the left subtree corresponding to G7{r1, r2} is built as follows:
(e11) The optimal tree corresponding to G7{r1, r2} is looked up in the global optimal tree mapping; it does not exist.
(e12) The attempt count corresponding to G7{r1, r2} is looked up in the global attempt-count mapping; it does not exist, so the attempt count is initialized to 0 and the budget is updated accordingly, from c41 to c41'. The attempt count is then increased by 1 and saved to the global attempt-count mapping. The updated global attempt-count mapping is shown in Table 9.
Table 9
Query hypergraph      Attempt count
G{r1,r2,r3,r4}        1
G5{r2,r3,r4}          1
G6{r3,r4}             1
G7{r1,r2}             1
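The attempt-count bookkeeping of step (e12) can be sketched in Python as follows. This is only an illustration: the function name `take_attempt` and the use of a `dict` keyed by `frozenset` are assumptions, and the initial map contents simply mirror the three entries that Table 9 shows besides G7{r1, r2}.

```python
def take_attempt(attempts: dict, graph: frozenset) -> int:
    """Return the attempt count to use for `graph`, then store it plus 1.

    A hypergraph that is not yet in the mapping starts at 0, as in step (e12).
    """
    current = attempts.get(graph, 0)
    attempts[graph] = current + 1
    return current


# Hypothetical state of the global attempt-count mapping before step (e12).
g = frozenset({"r1", "r2", "r3", "r4"})
g5 = frozenset({"r2", "r3", "r4"})
g6 = frozenset({"r3", "r4"})
g7 = frozenset({"r1", "r2"})
attempts = {g: 1, g5: 1, g6: 1}

used = take_attempt(attempts, g7)
print(used, attempts[g7])
```

The value returned (0 on the first visit) is the one that feeds the budget update, while the stored value is already incremented for the next visit.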
(e13) The hypergraph (i.e. the query hypergraph) G7{r1, r2} is partitioned, producing one subgraph pair G22{G1{r1}, G2{r2}}.
(e14) The cost of the right partition G2{r2} is calculated and the budget is updated accordingly, from c41' to c51, and the left subtree corresponding to G1{r1} is built:
The optimal tree corresponding to G1{r1} is looked up in the global optimal tree mapping; it is T1, so the optimal query plan tree T1 is returned directly (the cost of a single-table query plan tree is assumed here to always be less than the given budget, which in this embodiment means less than c51).
(e15) The budget is updated to b5 = c41' - c52 (where c52 is the cost of G1{r1}), and the right subtree corresponding to G2{r2} is then built:
The optimal tree corresponding to G2{r2} is looked up in the global optimal tree mapping; it is T2, so the optimal query plan tree T2 is returned directly (the cost of a single-table query plan tree is assumed here to always be less than the given budget, which in this embodiment means less than b5).
(e16) T1 and T2 are merged in forward and reverse order. Without loss of generality, let the forward merge result be T12 and the reverse merge result be T'12, with the cost of T12 lower than that of T'12. The query plan tree with the lowest cost is selected and the global optimal tree mapping is updated. The updated global optimal tree mapping is shown in Table 10.
Table 10
Query hypergraph      Optimal query plan tree
G{r1,r2,r3,r4}        T1234
G1{r1}                T1
G2{r2}                T2
G3{r3}                T3
G4{r4}                T4
G6{r3,r4}             T34
G5{r2,r3,r4}          T234
G7{r1,r2}             T12
(e17) All subgraph pairs have now been processed, and the optimal query plan tree T12 corresponding to G7{r1, r2} in the global optimal tree mapping is returned.
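The partitioning used in step (e13), where a query hypergraph is split into subgraph pairs, can be sketched in Python as below. The sketch rests on a simplifying assumption that is not in the source: every split of the relation set into two non-empty halves is treated as a valid subgraph pair, whereas a real query hypergraph would admit only splits permitted by its join edges. The function name `subgraph_pairs` is hypothetical.

```python
from itertools import combinations


def subgraph_pairs(relations):
    """Enumerate each unordered split of `relations` into two non-empty parts.

    The smallest relation name is pinned to the left part so that a split
    and its mirror image are not both produced.
    """
    items = sorted(relations)
    anchor, rest = items[0], items[1:]
    pairs = []
    # k relations join the anchor on the left; k < len(rest) keeps the
    # right part non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = frozenset((anchor, *extra))
            right = frozenset(items) - left
            pairs.append((left, right))
    return pairs


g7_pairs = subgraph_pairs({"r1", "r2"})
g_pairs = subgraph_pairs({"r1", "r2", "r3", "r4"})
print(len(g7_pairs), len(g_pairs))
```

For a two-relation hypergraph such as G7{r1, r2} the function produces a single pair, matching the one subgraph pair G22 in the walk-through. Without join-edge filtering, the four-relation case produces every possible split, which is more than an actual hypergraph would normally allow.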
(e2) The budget is updated and the right subtree corresponding to G6{r3, r4} is built, as follows:
(e21) The budget is updated from the baseline b0' to b4, where b4 = b0' - c42 (c42 being the cost of G7{r1, r2}), and this budget is used to build the right subtree corresponding to G6{r3, r4}.
(e22) The optimal tree corresponding to G6{r3, r4} is looked up in the global optimal tree mapping; it exists, so its cost is compared with the given budget:
If the cost is less than the given budget, no further partitioning is performed and the corresponding query plan tree in the optimal tree mapping is returned directly;
Otherwise, the partitioning process must continue.
In this embodiment, the cost of the optimal tree corresponding to G6{r3, r4} is less than the given budget b4, so the optimal query plan tree T34 corresponding to G6{r3, r4} in the global optimal tree mapping is returned directly.
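The check in step (e22) amounts to a cache lookup guarded by the budget. A minimal Python sketch follows, assuming that the global optimal tree mapping is a `dict` from a hypergraph to a `(tree, cost)` tuple. That layout, the function name `lookup_within_budget`, and the numeric costs are hypothetical and chosen only for illustration.

```python
def lookup_within_budget(best_trees: dict, graph: frozenset, budget: float):
    """Return the cached plan tree if its cost is strictly below `budget`.

    Returns None when the hypergraph is not cached or when its cached plan
    is too expensive, in which case the caller must continue partitioning.
    """
    entry = best_trees.get(graph)
    if entry is None:
        return None
    tree, cost = entry
    return tree if cost < budget else None


g6 = frozenset({"r3", "r4"})
g7 = frozenset({"r1", "r2"})
best_trees = {g6: ("T34", 40.0)}  # made-up cost for illustration

print(lookup_within_budget(best_trees, g6, 55.0))  # cached plan is reused
print(lookup_within_budget(best_trees, g6, 40.0))  # not below budget
print(lookup_within_budget(best_trees, g7, 55.0))  # not cached yet
```

The comparison is strict because the text requires the cost to be less than the given budget.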
(e3) T12 and T34 are merged in forward and reverse order. Without loss of generality, let the forward merge result be T''1234 and the reverse merge result be T'''1234, with the cost of T''1234 lower than that of both T'''1234 and T1234. The query plan tree with the lowest cost is selected and the global optimal tree mapping is updated. The updated global optimal tree mapping is shown in Table 11.
Table 11
Query hypergraph      Optimal query plan tree
G{r1,r2,r3,r4}        T''1234
G1{r1}                T1
G2{r2}                T2
G3{r3}                T3
G4{r4}                T4
G6{r3,r4}             T34
G5{r2,r3,r4}          T234
G7{r1,r2}             T12
(f) All subgraph pairs have now been processed, and the optimal query plan tree T''1234 corresponding to G{r1, r2, r3, r4} in the global optimal tree mapping is returned.
The embodiments described above explain the technical solution and beneficial effects of the invention in detail. It should be understood that the foregoing is only the most preferred embodiment of the invention and is not intended to limit it; any modification, supplement, or equivalent substitution made within the scope of the principles of the invention shall fall within its scope of protection.

Claims (6)

1. A top-down big data real-time query optimization method based on a dense (bushy) tree, characterized by comprising:
(1) parsing a query statement and building an initial query hypergraph from the parsed query statement;
(1-1) initializing an empty global optimal tree mapping;
(1-2) for each node in the initial query hypergraph, building a corresponding query hypergraph and query plan tree, setting the cost of each such query plan tree to 0, and adding the query hypergraph and corresponding query plan tree of each node of the initial query hypergraph to the global optimal tree mapping;
(2) decomposing the initial query hypergraph level by level, top-down, on the principle of minimizing the cost of the query plan tree, until the optimal query plan tree of the initial query hypergraph is obtained, which completes the big data real-time query optimization;
when decomposing level by level, a condition check is performed first to determine whether the current decomposition target is in the global optimal tree mapping:
if it is in the global optimal tree mapping, the cost threshold is used to determine whether the query plan tree corresponding to the current decomposition target in the global optimal tree mapping is optimal:
if it is optimal, the query plan tree corresponding to the current decomposition target in the global optimal tree mapping is used as the optimal query plan tree;
otherwise, the optimal query plan tree of the current decomposition target is built, and the query plan tree corresponding to the current decomposition target in the global optimal tree mapping is updated to that optimal query plan tree;
if it is not in the global optimal tree mapping, the optimal query plan tree of the current decomposition target is built, and the current decomposition target and its optimal query plan tree are stored in the global optimal tree mapping;
the optimal query plan tree of the current decomposition target is built as follows:
(S1) the current decomposition target is decomposed to obtain a number of next-level subgraph pairs, and each subgraph pair of the current decomposition target is processed in turn; each subgraph pair comprises two query hypergraphs, and processing a subgraph pair means processing each of its query hypergraphs in turn;
(S2) when each query hypergraph is processed, the current query hypergraph is taken as the current decomposition target, and the other query hypergraph of the subgraph pair is taken as the reference target for updating the cost threshold;
the condition check is then re-executed for the current decomposition target and the updated cost threshold, until the optimal query plan tree of the current decomposition target is obtained;
proceeding level by level from the bottom up, each subgraph pair at the current level is merged in turn to obtain the corresponding merge result, the merge result with the lowest cost is selected from all merge results as the optimal query plan tree of the upper-level decomposition target, and the upper-level decomposition target and the corresponding query plan tree are stored in the global optimal tree mapping;
the initial value of the cost threshold is positive infinity.
2. The top-down big data real-time query optimization method based on a dense tree according to claim 1, characterized in that, when a subgraph pair is processed, either of its query hypergraphs is chosen as the left partition and the other as the right partition; first the right partition is taken as the reference target and the left partition as the decomposition target, giving the optimal query plan tree of the left partition; then the left partition is taken as the reference target and the right partition as the decomposition target, giving the optimal query plan tree of the right partition;
(a1) when the right partition is the reference target, the cost threshold is updated from the reference target in step (2) as follows:
if the reference target is in the global optimal tree mapping, the cost calculated for it is used as the updated cost threshold;
otherwise, the cost lower bound of the reference target is used as the updated cost threshold;
(a2) when the left partition is the reference target, the cost threshold is updated from the reference target in step (2) as follows:
the current cost threshold minus the cost of the optimal query plan tree of the left partition is used as the updated cost threshold.
3. The top-down big data real-time query optimization method based on a dense tree according to claim 1, characterized in that, when a subgraph pair is merged in step (S2), the query plan trees corresponding to the two query hypergraphs of the subgraph pair are merged both forward and in reverse, and whichever of the two resulting query plan trees has the lower cost is taken as the merge result of the subgraph pair.
4. The top-down big data real-time query optimization method based on a dense tree according to any one of claims 1 to 3, characterized in that the cost of a query plan tree is a composite of its disk read cost, network cost, and table size;
the cost of any query plan tree T is calculated according to the following formula:
where α_L + α_R + β + γ + δ = 1;
L and R are the left subtree and right subtree of the query plan tree, respectively;
C_L is the cost of the left subtree and C_R is the cost of the right subtree;
IO_L is the cost of reading the data corresponding to the left subtree from disk;
TR is the network cost, S is the data size, and S_LR is the size of the data after the left and right subtrees are hash-joined.
5. The top-down big data real-time query optimization method based on a dense tree according to claim 4, characterized in that step (1) further comprises initializing an empty global attempt-count mapping;
the level-by-level decomposition further comprises performing the following operations according to the result of the condition check:
(b1) determining the attempt count of the current decomposition target according to the result of the condition check:
if the result of the condition check is that the current decomposition target is in the global attempt-count mapping, its attempt count is obtained from the global attempt-count mapping;
if the result of the condition check is that the current decomposition target is not in the global attempt-count mapping, the attempt count of the current decomposition target is set to zero, and the current decomposition target and its attempt count are added to the global attempt-count mapping;
(b2) updating the cost threshold according to the determined attempt count and, after the update, increasing by 1 the attempt count corresponding to the current decomposition target in the global attempt-count mapping.
6. The top-down big data real-time query optimization method based on a dense tree according to claim 2, characterized in that step (a2) updates the cost threshold as follows:
budget' = max(budget, lowerBound(graph) × 2^attempt),
where budget' is the updated cost threshold, budget is the cost threshold before the update, graph is the current decomposition target, attempt is the attempt count, and lowerBound(graph) is the cost lower bound of the current decomposition target.
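The threshold update in claim 6 can be written directly as a small Python function. The function name `escalate_budget` and the sample numbers are assumptions made for illustration; only the formula itself comes from the claim.

```python
def escalate_budget(budget: float, lower_bound: float, attempt: int) -> float:
    """budget' = max(budget, lowerBound(graph) * 2**attempt)."""
    return max(budget, lower_bound * 2 ** attempt)


# Made-up values: current threshold 10.0, cost lower bound 4.0.
for attempt in range(4):
    print(attempt, escalate_budget(10.0, 4.0, attempt))
# 0 10.0
# 1 10.0
# 2 16.0
# 3 32.0
```

Because of the `max`, the threshold never falls below its current value, while the term derived from the lower bound doubles with each attempt and eventually takes over.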
CN201410765313.6A 2014-12-11 2014-12-11 Based on dense tree and top-down big data real-time query optimization method Expired - Fee Related CN104504018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410765313.6A CN104504018B (en) 2014-12-11 2014-12-11 Based on dense tree and top-down big data real-time query optimization method

Publications (2)

Publication Number Publication Date
CN104504018A CN104504018A (en) 2015-04-08
CN104504018B true CN104504018B (en) 2017-09-08

Family

ID=52945416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410765313.6A Expired - Fee Related CN104504018B (en) 2014-12-11 2014-12-11 Based on dense tree and top-down big data real-time query optimization method

Country Status (1)

Country Link
CN (1) CN104504018B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193813B (en) * 2016-03-14 2021-05-14 阿里巴巴集团控股有限公司 Data table connection mode processing method and device
CN106446134B (en) * 2016-09-20 2019-07-09 浙江大学 Local multi-query optimization method based on predicate specification and cost estimation
CN106991116B (en) * 2017-02-10 2020-04-14 阿里巴巴集团控股有限公司 Optimization method and device for database execution plan
CN108388662A (en) * 2018-03-09 2018-08-10 重庆邮电大学 Dynamic optimization algorithm is established towards chronometer data real-time query logic plan
CN108491516B (en) * 2018-03-26 2021-09-14 哈工大大数据(哈尔滨)智能科技有限公司 Distributed multi-table connection selection method and device based on mixed integer linear programming
CN111930519B (en) 2020-09-22 2020-12-15 北京一流科技有限公司 Parallel decision system and method for distributed data processing
CN113448967B (en) * 2021-07-20 2022-02-08 威讯柏睿数据科技(北京)有限公司 Method and device for accelerating database operation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793467A (en) * 2013-09-10 2014-05-14 浙江鸿程计算机系统有限公司 Method for optimizing real-time query on big data on basis of hyper-graphs and dynamic programming

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Effective and Robust Pruning for Top一Down Join Enumeration Algorithms";Pit Fender et al.;《2012 IEEE 28th International Conference on Data Engineering》;20121231;第414-425页 *
"Top down plan generation: From theory to practice";P.Fender et al.;《IEEE:ICDE Conference》;20131231;第1105页第1栏第1段-第1116页第1栏倒数第1段 *
"基于改进DPhyp算法的Impala查询优化";周强 等;《计算机研究与发展》;20131231;第114-120页 *
"基于线性浓密树的并行数据库查询优化算法";厉阳春;《湖南理工学院学报(自然科学版)》;20060331;第19卷(第1期);第20-23页 *

Also Published As

Publication number Publication date
CN104504018A (en) 2015-04-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170908

Termination date: 20201211
