CN108874849A - A kind of optimization method and system of non-equivalent association subquery - Google Patents

Publication number
CN108874849A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810097136.7A
Other languages
Chinese (zh)
Other versions
CN108874849B (en
Inventor
何文婷
郑天祺
张志斌
程学旗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201810097136.7A priority Critical patent/CN108874849B/en
Publication of CN108874849A publication Critical patent/CN108874849A/en
Application granted granted Critical
Publication of CN108874849B publication Critical patent/CN108874849B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an optimization method and system for non-equi correlated subqueries, comprising: obtaining the value set of the outer correlation column of a correlated subquery; establishing, according to the type of the operator in the correlated subquery and the value set, a mapping from the outer correlation column to partitions of the inner correlation column; partitioning the inner table of the correlated subquery according to the resulting partition set while, according to the aggregate function that the subquery applies to the inner table, computing the intermediate result state of the subquery for each partition; and traversing the outer correlation column according to the mapping and aggregating the intermediate result states of the corresponding partitions to obtain the subquery result for each outer correlation value. The technical effect is that the inner table is partitioned once and the per-partition intermediate results are reused to produce the final subquery result set, which improves query performance.

Description

Optimization method and system for non-equi correlated subqueries
Technical field
The present invention relates to the field of relational database systems, and in particular to an optimization method and system for non-equi correlated subqueries.
Background art
A subquery is a query statement that appears as a query condition of another statement; a correlated subquery is a subquery whose condition depends on the parent (outer) query. A typical non-equi correlated subquery is:
select X.a, X.b from X where X.c >
(select avg(Y.c) from Y where Y.d 【OPERATOR】 X.d)
Here the operator relating the correlation columns of the subquery and the outer query, shown above as 【OPERATOR】, may be !=, >, >=, <, <=, in, not in, between ... and, and so on. Because the subquery uses results of the outer query, existing techniques implement it in one of the following three ways:
1. tuple-at-a-time (nested iterator): each time a value of d is read from the outer table X, it is passed to the subquery, which is then executed to obtain its result. When X.d contains repeated values, recomputation can be avoided in two ways: the subquery results can be cached per outer value, so that a repeated outer correlation value hits the cache; or the outer table can be sorted on the correlation column so that rows with the same value are adjacent and the subquery is evaluated once per value. If the inner correlation column is indexed, the index gives some further speed-up.
2. semi join / anti join / outer join: the Cartesian product of the outer table X and the inner table Y is formed, the records that satisfy the correlation condition are filtered out of it, and the result column of the subquery is then computed.
3. sort-merge join: if OPERATOR is <, <=, > or >= and the select column of the subquery is a max or min aggregate, the correlation columns of the outer and inner tables can each be sorted first, and the subquery results are then obtained by a comparison pass similar to merge sort; with a suitable caching strategy, partial results can be reused. For example, for the subquery select max(Y.c) from Y where Y.d > X.d, sort X and Y in descending order of d. Suppose that after row k-1 of X.d has been processed, Y.d has been consumed up to row m and the inner query result is maxk. Then for row k of X.d the inner query result is max(maxk, max(Y.c) over rows m to n of Y), where n is the row number of the smallest Y.d value that still satisfies Y.d > X.d for row k.
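The sort-merge approach in item 3 can be sketched as follows. This is a minimal illustration rather than the implementation of any particular engine; the function name `sort_merge_max` and the `(d, c)` tuple layout of the inner rows are assumptions made for the example.

```python
def sort_merge_max(outer_d, inner_rows):
    """For each distinct outer value d, compute max(Y.c) over inner rows with Y.d > d.

    outer_d: iterable of X.d values; inner_rows: iterable of (d, c) pairs from Y.
    Both names and layouts are hypothetical.
    """
    xs = sorted(set(outer_d), reverse=True)                      # outer values, descending
    ys = sorted(inner_rows, key=lambda r: r[0], reverse=True)    # inner rows, descending by d
    result, running_max, m = {}, None, 0
    for d in xs:
        # Consume only the inner rows not seen for earlier (larger) outer values.
        while m < len(ys) and ys[m][0] > d:
            c = ys[m][1]
            running_max = c if running_max is None else max(running_max, c)
            m += 1
        result[d] = running_max                                  # None means no matching row
    return result
```

The running maximum carried from one outer value to the next is the reuse of partial results described above. It works only because max (like min) can be extended monotonically along the sorted order, which is why this technique is tied to a single comparison operator.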
Each of these prior-art techniques has its own problems:
The first method performs poorly. It avoids recomputing the subquery for repeated outer values, but every distinct outer value still requires a full scan of the inner table, so the total time is the number of distinct values of the outer correlation column multiplied by the cost of one inner-table scan. The present invention reuses the intermediate results of the subquery and needs only one scan of the inner table, so it performs much better.
The second method has the same problem as the first.
The third method applies only under restricted conditions. First, the correlation condition between the inner and outer tables must be a single expression whose operator is one of <, <=, >, >=; the set operations in and not in, between ... and, the not-equal operator !=, and combinations of several expressions such as Y.d > X.d and Y.d < X.d + 10 are not supported. Second, the select column of the subquery may only be max/min/sum/count. Third, the technique requires both the inner and the outer table to be sorted on the correlation column, which is time-consuming. The present invention supports more kinds of correlation operators, including set operators and combinations of several comparison expressions, and is therefore more general.
While studying correlated-subquery optimization, and non-equi correlated subqueries in particular, the inventors found that this defect of the prior art is caused by looping over the inner table of the subquery and repeatedly processing most of its records. They found that it can be resolved by caching the intermediate state of the subquery computation at the smallest granularity and then merging these fine-grained intermediate states to form the final subquery result. The present invention provides a high-performance optimization method for non-equi correlated subqueries that supports general correlation expressions, never sorts the inner table, and sorts the set of outer correlation values only in specific cases. It needs no index on the correlation column, and its cost is that of one scan of the inner table plus one scan of the outer table. Compared with the prior art, it improves query performance by an order of magnitude.
Summary of the invention
The object of the present invention is to overcome the limited generality of the correlation operations supported by the above prior art and its poor query performance. To this end it proposes an optimization method for non-equi correlated subqueries, comprising:
Step 1: obtain the value set of the outer correlation column of the correlated subquery;
Step 2: according to the type of the operator in the correlated subquery and the value set, establish a mapping from the outer correlation column to partitions of the inner correlation column;
Step 3: according to the mapping, partition the inner table of the correlated subquery and, according to the aggregate function that the subquery applies to the inner table, obtain the intermediate result state of the subquery for each partition;
Step 4: according to the mapping, traverse the outer correlation column and aggregate the intermediate result states of the partitions to obtain the subquery result for each outer correlation value.
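The four steps can be sketched end to end for one concrete case, the condition Y.d > X.d with an avg aggregate. This is a minimal sketch under stated assumptions: the function name `optimize_gt_avg`, the tuple layouts, and the use of a flat sorted list with binary search in place of a partition tree are all choices made for the example, not details taken from the patent.

```python
from bisect import bisect_left

def optimize_gt_avg(outer_rows, inner_rows):
    """Evaluate: select a, b from X where X.c > (select avg(Y.c) from Y where Y.d > X.d).

    outer_rows: iterable of (a, b, c, d); inner_rows: iterable of (c, d). Hypothetical layouts.
    """
    outer_rows = list(outer_rows)
    # Step 1: distinct values of the outer correlation column (sorted, since the
    # operator is a single comparison).
    values = sorted({d for _, _, _, d in outer_rows})
    k = len(values)
    # Step 2: k values cut the Y.d axis into k+1 partitions:
    #   partition 0 = (-inf, v[0]], partition i = (v[i-1], v[i]], partition k = (v[k-1], +inf).
    # For "Y.d > v[i]" the value v[i] maps to partitions i+1 .. k.
    mapping = {v: range(i + 1, k + 1) for i, v in enumerate(values)}
    # Step 3: one scan of the inner table, keeping a [sum, count] state per partition.
    state = [[0, 0] for _ in range(k + 1)]
    for c, d in inner_rows:
        p = bisect_left(values, d)          # index of the partition that contains d
        state[p][0] += c
        state[p][1] += 1
    # Step 4: merge the mapped partition states for each distinct outer value.
    avg = {}
    for v, parts in mapping.items():
        total = sum(state[p][0] for p in parts)
        count = sum(state[p][1] for p in parts)
        avg[v] = total / count if count else None    # avg over no rows is NULL
    # Final correlation test, one pass over the outer table.
    return [(a, b) for a, b, c, d in outer_rows if avg[d] is not None and c > avg[d]]
```

The inner table is read once and the outer table once; all remaining work is over the k+1 partition states, which is where the saving over one inner scan per distinct outer value comes from.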
In the optimization method for non-equi correlated subqueries, step 2 further comprises:
Step 21: build an Auto-Merge-Partition Tree according to the type of the operator in the correlated subquery and the value set of the outer correlation column, and establish the mapping through this tree, wherein each leaf node of the tree corresponds to one partition and the set of these partitions forms the partitions of the inner correlation column.
In the optimization method for non-equi correlated subqueries, step 2 further comprises:
If the operator is the not-equal operator, divide the inner table into k+1 partitions, where k is the number of distinct values of the outer correlation column, and map each outer correlation value to the k partitions other than its own, as the mapping;
If the operator is a comparison operator, divide the inner table into k+1 partitions, where k is the number of distinct values of the outer correlation column, and map each outer correlation value to the partitions that satisfy the comparison, as the mapping;
If the operator is a set operator, divide the inner table into n+1 partitions and map each outer correlation value to its partitions according to the set operator, as the mapping, where n is the greatest common divisor of the outer correlation values, that is, the number of common subsets extracted from them.
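The not-equal case above can be sketched as follows: k distinct outer values give k+1 partitions (one per value plus one for every other Y.d), and each value is mapped to the k partitions other than its own. The function name `not_equal_states` and the `(d, c)` layout are assumptions for illustration; the state kept here is a (sum, count) pair.

```python
def not_equal_states(outer_values, inner_rows):
    """For each distinct outer value v, return (sum, count) of Y.c over rows with Y.d != v.

    outer_values: iterable of X.d values; inner_rows: iterable of (d, c) pairs. Hypothetical layouts.
    """
    distinct = sorted(set(outer_values))                 # k distinct values
    index = {v: i for i, v in enumerate(distinct)}       # partition i holds rows with Y.d == distinct[i]
    other = len(distinct)                                # partition k holds every other Y.d
    state = [[0, 0] for _ in range(len(distinct) + 1)]
    for d, c in inner_rows:                              # single scan of the inner table
        p = index.get(d, other)
        state[p][0] += c
        state[p][1] += 1
    # Each value maps to the k partitions other than its own.
    mapping = {v: [p for p in range(len(state)) if p != index[v]] for v in distinct}
    return {v: (sum(state[p][0] for p in parts), sum(state[p][1] for p in parts))
            for v, parts in mapping.items()}
```

For sum and count one could equally subtract a value's own partition from a grand total; merging the k other partitions, as done here, also works for max and min, which have no inverse.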
In the optimization method for non-equi correlated subqueries, in step 3, if the aggregate function is avg, the intermediate result state is sum + count; if the aggregate function is sum, max, min or count, the intermediate result state is sum, max, min or count respectively.
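The reason avg needs a sum + count state can be shown with a small mergeable-state sketch. The class name `AvgState` and the helper `state_of` are hypothetical; the point is only that two partition states merge exactly, whereas two partition averages do not.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Intermediate state for avg: a running sum and a row count."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        # Merging two partitions adds their sums and their counts.
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self):
        return self.total / self.count if self.count else None

def state_of(values):
    """Build the state of one partition from its Y.c values."""
    s = AvgState()
    for v in values:
        s.add(v)
    return s
```

With partitions [1, 2, 3] and [10], the merged state gives 16 / 4 = 4.0, while averaging the two partition averages (2 and 10) would give 6. The states for sum, max, min and count merge with +, max, min and + respectively, so each of them can serve as its own intermediate state.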
In the optimization method for non-equi correlated subqueries, step 4 further comprises: looping over the outer table and evaluating the correlation condition against the subquery result computed from the inner table, to obtain the final result of the correlated subquery.
The invention also provides an optimization system for non-equi correlated subqueries, comprising:
a partition mapping module, for obtaining the values of the outer correlation column of the correlated subquery, collecting them into a value set, and establishing, according to the type of the operator in the correlated subquery and the value set, a mapping from the outer correlation column to partitions of the inner correlation column;
a result merging module, for partitioning the inner table of the correlated subquery according to the mapping while, according to the aggregate function that the subquery applies to the inner table, obtaining the intermediate result state of the subquery for each partition, and for traversing the outer correlation column according to the mapping and aggregating the intermediate result states of the partitions to obtain the subquery result for each outer correlation value.
In the optimization system for non-equi correlated subqueries, the partition mapping module further builds an Auto-Merge-Partition Tree according to the type of the operator in the correlated subquery and the value set of the outer correlation column, and establishes the mapping through this tree, wherein each leaf node of the tree corresponds to one partition and the set of these partitions forms the partitions of the inner correlation column.
In the optimization system for non-equi correlated subqueries, the partition mapping module further operates as follows:
If the operator is the not-equal operator, divide the inner table into k+1 partitions, where k is the number of distinct values of the outer correlation column, and map each outer correlation value to the k partitions other than its own, as the mapping;
If the operator is a comparison operator, divide the inner table into k+1 partitions, where k is the number of distinct values of the outer correlation column, and map each outer correlation value to the partitions that satisfy the comparison, as the mapping;
If the operator is a set operator, divide the inner table into n+1 partitions and map each outer correlation value to its partitions according to the set operator, as the mapping, where n is the greatest common divisor of the outer correlation values.
In the optimization system for non-equi correlated subqueries, in the result merging module, if the aggregate function is avg, the intermediate result state is sum + count; if the aggregate function is sum, max, min or count, the intermediate result state is sum, max, min or count respectively.
In the optimization system for non-equi correlated subqueries, the result merging module further loops over the outer table and evaluates the correlation condition against the subquery result computed from the inner table, to obtain the final result of the correlated subquery.
The invention consists mainly of two stages: dynamic partition mapping and partition result merging. Its distinguishing feature is that neither the inner nor the outer table needs to be sorted (the outer correlation column is sorted only when the correlation condition is a single expression whose operator is a comparison operator). One full scan of the inner table and one of the outer table yield the reusable, minimum-granularity intermediate states of the subquery computation, and the final subquery result is obtained by combining several of these intermediate states.
Dynamic partition mapping stage: build the AMP-TREE from the distinct values of the outer correlation column and the correlation operator, obtaining the partitions of the inner correlation column and the mapping from outer column values to partitions.
Partition result merging stage: partition the inner table by its correlation column; traverse all partitions to obtain the intermediate result state of each; then, using the mapping obtained in the dynamic partition mapping stage, obtain the final subquery result for each outer correlation value.
The technical effects of the invention include strong adaptability: the method supports the great majority of correlation operations, partitions the inner table dynamically and adaptively, and reuses the intermediate results of each partition to obtain the final subquery result set, with performance considerably better than existing techniques.
Brief description of the drawings
Fig. 1 is a diagram of the model of the invention;
Fig. 2 is a schematic diagram of the partition mapping for each kind of correlation;
Fig. 3 is a schematic diagram of the construction of the AMP-TREE for numeric intervals;
Fig. 4 is a schematic diagram of the construction of the AMP-TREE for sets.
Detailed description
To address the defects of existing optimization methods for non-equi correlated subqueries, the present invention provides a new general, adaptive, high-performance optimization method. As shown in Fig. 1, the design effectively solves the poor performance of non-equi correlated subqueries across different correlation expressions and different aggregate functions in the subquery.
The invention consists mainly of the following steps:
Step 1: obtain the value set of the outer correlation column of the correlated subquery;
Step 2: according to the type of the operator in the correlated subquery and the value set, establish a mapping from the outer correlation column to partitions of the inner correlation column;
Step 3: according to the mapping, partition the inner table of the correlated subquery and, according to the aggregate function that the subquery applies to the inner table, obtain the intermediate result state of the subquery for each partition;
Step 4: according to the mapping, traverse the outer correlation column and aggregate the intermediate result states of the partitions to obtain the subquery result for each outer correlation value.
Step 2 further comprises step 21: build an Auto-Merge-Partition Tree according to the type of the operator in the correlated subquery and the value set of the outer correlation column, and establish the mapping through this tree, wherein each leaf node of the tree corresponds to one partition and the set of these partitions forms the partitions of the inner correlation column.
Step 2 further comprises: if the operator is the not-equal operator, divide the inner table into k+1 partitions, where k is the number of distinct values of the outer correlation column, and map each outer correlation value to the k partitions other than its own, as the mapping; if the operator is a comparison operator, divide the inner table into k+1 partitions, where k is the number of distinct values of the outer correlation column, and map each outer correlation value to the partitions that satisfy the comparison, as the mapping; if the operator is a set operator, divide the inner table into n+1 partitions and map each outer correlation value to its partitions according to the set operator, as the mapping, where n is the greatest common divisor of the outer correlation values.
Step 4 further comprises: looping over the outer table and evaluating the correlation condition against the subquery result computed from the inner table, to obtain the final result of the correlated subquery.
To make the above features and effects of the invention clearer, embodiments are described in detail below with reference to the accompanying drawings.
The invention supports many different non-equi correlation expressions, with operators including !=, >, >=, <, <=, in, not in, between ... and, and so on, where in and not in handle a single value tested against a set-valued column. It supports several subquery aggregate functions: max, min, sum, count, avg, and so on. It reuses subquery intermediate results to improve query performance.
Here table X is the outer table, table Y is the inner table, and operator is the correlation operator. Y.d-partition denotes the partition information of the correlation column d of the inner table Y. Concretely, the invention implements a model M that behaves like a function with two parameters, distinct x.d and operator, and two return values, y.d-partition and map(x.d, y.d-partition). That is: M(distinct x.d, operator) = (y.d-partition, map(x.d, y.d-partition)). The input is the set of all distinct d values of the outer table X (distinct d) together with the correlation operator; the output is the set of partitions of the correlation column d of the inner table Y and a map from each distinct value of x.d to partitions of Y.d. The specific steps are as follows:
S1: obtain the value set of x.d from the outer table X in the database, together with the correlation operator;
S2: based on the result of S1, order the value set of x.d by topological sorting according to the total or partial order induced by the operator, obtaining a sequence. As shown in Fig. 2, x.d1 to x.dk denote the k distinct values of x.d. The purpose of the ordering is to determine the ranges of the subsequent partitions. The topological sort here is a traversal of all outer correlation values used to build the AMP tree; the values need actually be sorted only when the outer correlation column is not set-valued and the correlation condition is a single comparison operator.
S2 further includes step S21: while ordering in S2, build an Auto-Merge-Partition Tree (AMP-TREE). Each leaf node of the tree is one partition. The mapping between intervals and outer correlation values is obtained while the AMP-TREE is being built.
For the != operator, the number of partitions is k+1: one for each of the k distinct values of X.d, plus one for all other values;
For the comparison operators >, >=, <, <=, the number of partitions is k+1: the values of the outer correlation column x.d are sorted to obtain a sequence of interval ranges, each of which is one partition. The between ... and operator and conditions combining several comparison expressions are handled similarly, except that the number of partitions is determined by the smallest intervals obtained after sorting. This process uses the AMP-TREE structure.
For the set operators in and not in, the forest at the bottom of Fig. 2 is obtained by extracting, as far as possible, the greatest common subsets of all X.distinct d. As the figure shows, the number of partitions is n+1, and each leaf node in the forest is one partition.
The correspondence between distinct x.d and y.d-partition is then obtained as follows.
For the not-equal operator !=, X.di corresponds to the k partitions other than that of X.di, where i is a positive integer not exceeding k and X.di is the i-th distinct correlation value d of the outer table X.
For >, >=, <, <=, take Y.d > X.d as an example: each value of X.d corresponds to all intervals that satisfy > X.d. If the values of X.d are 1, 5 and 10, then X.d = 1 corresponds to the intervals (1,5), (5,10) and (10, +∞); X.d = 5 corresponds to (5,10) and (10, +∞); and X.d = 10 corresponds to (10, +∞). between ... and is handled similarly.
For the set operators in and not in, Fig. 2 shows that the mapping for in is X.d1 -> (s123), X.d2 -> (s2345); for not in the mapping is the complement: X.d1 -> (non-s123), X.d2 -> (non-s2345).
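The comparison-operator mapping described above (X.d values 1, 5 and 10 under Y.d > X.d) can be reproduced with a short sketch. The function name `gt_mapping` is hypothetical, and intervals are represented as (low, high) pairs; treating each interval as open on the left and closed on the right is an assumption of this example, since the text does not fix the endpoint convention.

```python
def gt_mapping(distinct_values):
    """Build the k+1 partitions and the value-to-partition mapping for Y.d > X.d.

    Each (low, high) pair stands for the interval low < Y.d <= high (assumed convention).
    """
    vs = sorted(distinct_values)
    bounds = [float("-inf")] + vs + [float("inf")]
    partitions = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    # The i-th smallest value maps to every partition lying strictly above it.
    mapping = {v: partitions[i + 1:] for i, v in enumerate(vs)}
    return partitions, mapping
```

For the values 1, 5, 10 this yields four partitions, and 1 maps to (1,5), (5,10), (10,+inf); 5 maps to (5,10), (10,+inf); 10 maps to (10,+inf). The fourth partition, everything up to 1, is needed to hold inner rows but is mapped to by no outer value.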
S3: partition the d column of the subquery table Y, that is, the correlation column d of the inner table, according to the partition set obtained in S2 and, at the same time, compute and save the intermediate state of the subquery result avg(y.c), namely sum + count. The state kept for each aggregate function is shown in the following table:
S4: traverse every row of the outer table and, using the map from x.d to y.d-partition obtained in S2, merge the intermediate states of the mapped partitions to compute the subquery result and evaluate the correlation condition; in the embodiments below, the correlation condition is the comparison x.c > (subquery).
Note: for a subquery condition of the form X.d in Y.d, where X.d is single-valued and Y.d is set-valued, the complexity of this algorithm is the same as that of directly joining X and Y and testing X.d in Y.d. The method adds no overhead to such queries but also yields no improvement, so this case is not considered here.
Embodiments are described in detail below with reference to the accompanying drawings.
The invention is described separately for scalar comparison operators and for operators between a set and a single value.
For the non-set comparison operators >, >=, <, <=, where the inner correlation column is compared against several algebraic expressions of the outer correlation column, partitions and mapping are formed as follows. Suppose the query is
select X.a, X.b from X where X.c >
(select avg(Y.c) from Y where Y.d > X.d and Y.d < X.d+20)
S1: suppose X.distinct d = {1, 10, 20}.
S2: loop over X.distinct d to obtain all values of X.d and X.d+20, and map each value onto the partition tree. If the AMP-TREE is empty, the data are cut into several intervals; if it is not empty, the matching node is found by binary search and, if the new range partly overlaps a leaf node, that leaf is split into two new child leaves. The detailed process is shown in Fig. 3. This step also yields the mapping from each outer correlation value of the subquery to its nodes on the AMP-TREE, that is, to partitions of the inner correlation column.
During this process the mapping from outer correlation values to partitions is obtained, and the corresponding nodes on the AMP-TREE are marked (to indicate that they will be used later):
In this embodiment the partition set is (1,21), (10,21), [21,30), (20,21), [30,40).
S3: map the rows of the inner table into the partitions of the AMP-TREE according to the value of the correlation column and, at the same time, compute the intermediate state to be stored at each marked node of the AMP-TREE. The subquery result for each value in X.distinct d is computed at the same time:
Here the aggregate function is the one that produces the subquery result; for example, in select avg(id) from table, avg is the aggregate function of the query, and the corresponding state is the sum + count entry for avg in the table above. To illustrate the aggregation over mapped partitions in step 4: in this embodiment the partition set for X.d = 1 is (1,21); for X.d = 10 it is (10,21), [21,30); and for X.d = 20 it is (20,21), [21,30), [30,40).
S4: loop over the outer table, evaluate the correlation condition against the subquery results, and obtain the final result of the query.
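Steps S1 to S3 of this embodiment can be sketched for the condition Y.d > X.d and Y.d < X.d+20. Note the substitution: instead of an AMP-TREE with marked internal nodes, this sketch uses a flat sorted list of cut points and merges only the smallest intervals, so the partitions it produces are finer than the marked nodes listed above but cover the same ranges. The name `range_avg`, the `(d, c)` layout and the `width` parameter are assumptions for illustration.

```python
from bisect import bisect_right

def range_avg(distinct_d, inner_rows, width=20):
    """avg(Y.c) over rows with v < Y.d < v + width, for each distinct outer value v.

    inner_rows: iterable of (d, c) pairs. Hypothetical layout.
    """
    # A cut (x, 1) lies just after x (open lower bound Y.d > x);
    # a cut (x, 0) lies just before x (open upper bound Y.d < x).
    cuts = sorted({(v, 1) for v in distinct_d} | {(v + width, 0) for v in distinct_d})
    pos = {cut: i for i, cut in enumerate(cuts)}
    # Cell j is the stretch between cuts[j-1] and cuts[j]; cells 0 and len(cuts) are the tails.
    state = [[0, 0] for _ in range(len(cuts) + 1)]
    for d, c in inner_rows:                      # single scan of the inner table
        cell = bisect_right(cuts, (d, 0.5))      # number of cuts lying to the left of d
        state[cell][0] += c
        state[cell][1] += 1
    result = {}
    for v in distinct_d:
        # All cells between the cut after v and the cut before v + width.
        cells = range(pos[(v, 1)] + 1, pos[(v + width, 0)] + 1)
        total = sum(state[j][0] for j in cells)
        count = sum(state[j][1] for j in cells)
        result[v] = total / count if count else None
    return result
```

With distinct values 1, 10 and 20 the cuts fall at 1, 10, 20, 21, 30 and 40, giving the cells (1,10], (10,20], (20,21), [21,30) and [30,40) plus two tails. X.d = 10 then merges (10,20], (20,21) and [21,30), which is the same range as the marked nodes (10,21) and [21,30) in the embodiment.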
For the set operators in and not in, X.d is set-valued and the inner correlation column is single-valued. Suppose the query is:
select X.a, X.b from X where X.c >
(select avg(Y.c) from Y where Y.d in X.d)
S1: suppose X.distinct d = {abc}, {ab}, {cd}, {def}, {cef}, where a, b, c, d and so on each denote one element value.
S2: loop over X.distinct d and map each set onto the partition tree. If the AMP-TREE is empty, the set becomes the value of the root node; if it is not empty, search the tree for matching elements: where possible the set is expressed as a combination of several existing nodes, and if it partly overlaps a leaf node, that leaf is split into two new child leaves. The detailed process is shown in Fig. 3. This step also yields the mapping from each outer correlation value of the subquery to its partitions on the AMP-TREE.
During this process the mapping from outer correlation values to partitions is obtained, and the corresponding nodes on the AMP-TREE are marked (to indicate that they will be used later):
X.d      partition
{abc}    {ab} {c}
{ab}     {ab}
{cd}     {c} {d}
{def}    {d} {ef}
{cef}    {c} {ef}
S3: map the rows of the inner table into the partitions of the AMP-TREE according to the value of the correlation column and, at the same time, compute the intermediate state to be stored at each marked node of the AMP-TREE. Then compute the subquery result for every value in X.distinct d.
S4: loop over the outer table, evaluate the correlation condition, and obtain the final result of the query.
The greatest common divisor of the outer correlation values is illustrated by the set-operator embodiment: according to the partition column of the table above, the sets share four common parts, {ab}, {c}, {d} and {ef}. The greatest common divisor is therefore 4, so n = 4, and the collection {ab}, {c}, {d}, {ef} is the greatest common subset collection.
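One way to arrive at the common parts {ab}, {c}, {d}, {ef} is to group elements by the collection of outer sets that contain them. This grouping rule is an assumption of the sketch (the patent derives the parts by building the AMP-TREE), and the function name `common_parts` is hypothetical.

```python
from collections import defaultdict

def common_parts(outer_sets):
    """Split the elements of the outer sets into maximal groups that always occur together.

    Returns the list of parts and a mapping from each outer set to the parts it contains.
    """
    signature = defaultdict(set)              # element -> indices of the outer sets containing it
    for i, s in enumerate(outer_sets):
        for element in s:
            signature[element].add(i)
    groups = defaultdict(set)                 # signature -> elements sharing that signature
    for element, sig in signature.items():
        groups[frozenset(sig)].add(element)
    parts = [frozenset(g) for g in groups.values()]
    # An outer set maps to exactly those parts that it fully contains.
    mapping = {frozenset(s): {p for p in parts if p <= frozenset(s)} for s in outer_sets}
    return parts, mapping
```

For the five sets of this embodiment the sketch returns four parts, so n = 4, and the inner table would be divided into n+1 = 5 partitions: one per part plus one for every Y.d value that appears in none of the outer sets.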
Note: the size of the AMP-TREE is configurable. When the configured value is large enough, almost every outer correlation value maps directly onto a partition node of the AMP-TREE. When it is smaller, only some outer correlation values map directly onto nodes of the AMP-TREE, and the rest must be obtained by merging finer-grained partitions. When it is very small, most outer correlation values cannot be mapped by merging partitions; their subquery results must be obtained with an existing subquery evaluation method, and no intermediate results can be reused. In practice a larger AMP-TREE size is therefore recommended, to ensure high reuse of the subquery intermediate results.
The following is a system embodiment corresponding to the method embodiment above; the two can be implemented in cooperation. The technical details given for the method embodiment remain valid for the system embodiment and are not repeated here. Likewise, the technical details given for the system embodiment also apply to the method embodiment.
The invention also provides an optimization system for non-equivalent association subqueries, comprising:
A partition mapping module, configured to obtain the values of the outer table's associated column of the association subquery, collect these values into a value set, and, according to the type of operator in the association subquery and the value set, establish the mapping relationship from the outer table's associated column to the inner table's associated-column partitions;
A result merging module, configured to partition the inner table of the association subquery according to the mapping relationship while, according to the aggregate function applied to the inner table in the association subquery, obtaining the intermediate result state information of the association subquery in each partition; and, according to the mapping relationship, to traverse the outer table's associated column and aggregate the intermediate result state information of the partitions to obtain the subquery result corresponding to each associated-column value in the outer table.
In the optimization system for non-equivalent association subqueries, the partition mapping module is further configured to: construct an automatic merging partition tree according to the type of operator in the association subquery and the value set of the outer table's associated column, and establish the mapping relationship through the automatic merging partition tree, wherein each leaf node of the tree corresponds to one partition, and the set of these partitions constitutes the inner table's associated-column partitions.
In the optimization system for non-equivalent association subqueries, the partition mapping module is further configured such that:
If the operator is a not-equal operation, the inner table is divided into k+1 partitions according to the number k of values of the outer table's associated column, and each value of the outer table's associated column is mapped to the k partitions other than its own, as the mapping relationship;
If the operator is a comparison operation, the inner table is divided into k+1 partitions according to the number k of values of the outer table's associated column, and each value of the outer table's associated column is mapped to the corresponding partitions according to the comparison operator, as the mapping relationship;
If the operator is a set operation, the inner table is divided into n+1 partitions according to the greatest common divisor of the values of the outer table's associated column, and each value of the associated column is mapped to the corresponding partitions according to the set operator, as the mapping relationship, where n is the greatest common divisor.
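The first two cases can be sketched as below. This is a minimal illustration rather than the patented implementation: the partition numbering, the choice of `<` as the comparison operator (an inner value `y` matches an outer value `v` when `y < v`), and all function names are assumptions made for the example.

```python
import bisect


def map_not_equal(outer_values):
    """k distinct outer values -> k single-value partitions plus one 'other' partition."""
    distinct = sorted(set(outer_values))
    k = len(distinct)
    # Partition i (0 <= i < k) holds inner rows equal to distinct[i];
    # partition k holds every other inner row.
    mapping = {v: [j for j in range(k + 1) if j != i] for i, v in enumerate(distinct)}
    return distinct, mapping


def inner_partition_ne(distinct, y):
    """Partition index of an inner value y under the not-equal layout."""
    i = bisect.bisect_left(distinct, y)
    return i if i < len(distinct) and distinct[i] == y else len(distinct)


def map_less_than(outer_values):
    """k distinct outer values cut the value range into k + 1 intervals."""
    distinct = sorted(set(outer_values))
    # Partition 0 is (-inf, v0), partition i is [v(i-1), vi), partition k is [v(k-1), +inf).
    # An inner value y satisfies y < vi exactly when it lies in partitions 0..i.
    mapping = {v: list(range(i + 1)) for i, v in enumerate(distinct)}
    return distinct, mapping


def inner_partition_lt(distinct, y):
    """Partition index of an inner value y under the interval layout."""
    return bisect.bisect_right(distinct, y)


distinct, ne_map = map_not_equal([20, 10, 30, 10])
_, lt_map = map_less_than([20, 10, 30, 10])
```

With the hypothetical outer values 10, 20 and 30, the value 20 maps to partitions 0, 2 and 3 under the not-equal layout and to partitions 0 and 1 under the less-than layout.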
In the optimization system for non-equivalent association subqueries, in the result merging module, if the aggregate function of the query is avg, the intermediate result state information is sum + count; if the aggregate function is sum, max, min or count, the intermediate result state information is the sum, max, min or count itself, respectively.
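The reason avg needs a richer state than the other aggregates is that per-partition averages cannot be merged directly, whereas (sum, count) pairs can. A minimal sketch, with a hypothetical `AvgState` class and made-up partition contents:

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    """Mergeable intermediate state for avg: a running sum and a row count."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        # Merging two partitions just adds their sums and counts.
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self):
        # An empty state has no average (SQL would return NULL).
        return self.total / self.count if self.count else None


def state_of(values):
    state = AvgState()
    for v in values:
        state.add(v)
    return state


p1 = state_of([1.0, 2.0, 3.0])   # hypothetical partition 1
p2 = state_of([10.0])            # hypothetical partition 2
merged = p1.merge(p2)
naive = (p1.result() + p2.result()) / 2  # averaging the averages gives the wrong answer
```

Here `merged.result()` is the true average of the four values, while `naive` differs from it because the two partitions have different sizes. For sum, max, min and count the aggregate value is itself mergeable, so no extra state is needed.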
In the optimization system for non-equivalent association subqueries, the result merging module is further configured to: process in a loop the correlation predicate between the outer table and the subquery results of the inner table, to obtain the final result of the association subquery.
In summary, the invention adaptively partitions the inner table according to its associated column. For the entire subquery, the inner table needs to be scanned only once and no index support is required; the outer table's associated column needs to be sorted only when the correlation operation is an expression and the operator is a comparison operator. The partitioning and mapping process constructs an AMP-TREE (automatic merging partition tree). The characteristics of the AMP-TREE data structure are: a) every node is an interval; b) the intervals of all leaf nodes are pairwise disjoint and together form the universal set; c) the interval of a parent node equals the union of the intervals of all its child nodes, which assists the merging and reuse of subquery intermediate result states. The AMP-TREE is built with a partitioning algorithm from the distinct values of the outer table's associated column and the correlation operation, using topological sorting for sets and partial-order or total-order sorting for non-sets, thereby forming the mapping relationship between the outer table's associated column and the inner-table partitions. After the inner table is partitioned using its associated column as the primary key, the intermediate result state of each partition is recorded; according to the mapping relationship from the outer table's associated column to the set of inner-table partitions, the partial results obtained are reused and merged to yield the final result of the subquery.
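The interval properties listed above resemble those of a segment tree, so the idea can be sketched with one. The code below is not the patent's AMP-TREE itself; it is a simplified stand-in, assuming a hypothetical subquery `SELECT avg(Y.b) FROM Y WHERE Y.c < X.d`, in which each leaf is one inner-table partition and each parent stores the merged (sum, count) state of its children. All names and data are invented for the example.

```python
import bisect


class Node:
    def __init__(self, lo, hi, left=None, right=None):
        self.lo, self.hi = lo, hi          # covers leaf partitions [lo, hi)
        self.left, self.right = left, right
        self.total, self.count = 0.0, 0    # merged intermediate state (sum, count)


def build(lo, hi):
    """Build a balanced tree whose leaves are the partitions lo..hi-1."""
    if hi - lo == 1:
        return Node(lo, hi)
    mid = (lo + hi) // 2
    return Node(lo, hi, build(lo, mid), build(mid, hi))


def add(node, leaf, value):
    """Record one inner row in its leaf and in every ancestor of that leaf."""
    node.total += value
    node.count += 1
    if node.left is not None:
        add(node.left if leaf < node.left.hi else node.right, leaf, value)


def query(node, lo, hi):
    """Merge the stored states of leaf partitions [lo, hi)."""
    if hi <= node.lo or node.hi <= lo:
        return 0.0, 0
    if lo <= node.lo and node.hi <= hi:
        return node.total, node.count      # reuse the stored state of this node
    lt, lc = query(node.left, lo, hi)
    rt, rc = query(node.right, lo, hi)
    return lt + rt, lc + rc


outer_d = sorted({10, 20, 30})                                   # distinct values of X.d
inner_rows = [(5, 1.0), (15, 2.0), (25, 3.0), (35, 100.0)]       # (Y.c, Y.b) pairs

root = build(0, len(outer_d) + 1)                                # k + 1 leaf partitions
for c, b in inner_rows:                                          # single scan of the inner table
    add(root, bisect.bisect_right(outer_d, c), b)

results = {}
for i, d in enumerate(outer_d):
    total, count = query(root, 0, i + 1)                         # partitions with Y.c < d
    results[d] = total / count if count else None
```

The inner table is scanned once to fill the tree, and each outer value is answered by merging a logarithmic number of stored node states instead of rescanning the inner table.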
The invention may also have other embodiments. Without departing from the spirit and scope of the invention, any person skilled in the art may make refinements and modifications on the basis of the invention; the scope of protection of the invention is therefore defined by the claims.

Claims (10)

1. An optimization method for non-equivalent association subqueries, characterized by comprising:
Step 1: obtaining the value set of the outer table's associated column of the association subquery;
Step 2: according to the type of operator in the association subquery and the value set, establishing the mapping relationship from the outer table's associated column of the association subquery to the inner table's associated-column partitions;
Step 3: obtaining a partition set according to the inner table's associated-column partitions, so as to partition the inner table of the association subquery, while, according to the aggregate function applied to the inner table in the association subquery, obtaining the intermediate result state information of the association subquery in each partition;
Step 4: according to the mapping relationship, traversing the outer table's associated column and aggregating the intermediate result state information of the corresponding partition sets to obtain the subquery result corresponding to each associated-column value in the outer table.
2. The optimization method for non-equivalent association subqueries as claimed in claim 1, characterized in that step 2 further comprises:
Step 21: constructing an automatic merging partition tree according to the type of operator in the association subquery and the value set of the outer table's associated column, and establishing the mapping relationship through the automatic merging partition tree, wherein each leaf node of the automatic merging partition tree corresponds to one partition, and the set of these partitions constitutes the inner table's associated-column partitions.
3. The optimization method for non-equivalent association subqueries as claimed in claim 1, characterized in that step 2 further comprises:
If the operator is a not-equal operation, dividing the inner table into k+1 partitions according to the number k of values of the outer table's associated column, and mapping each value of the outer table's associated column to the k partitions other than its own, as the mapping relationship;
If the operator is a comparison operation, dividing the inner table into k+1 partitions according to the number k of values of the outer table's associated column, and mapping each value of the outer table's associated column to the corresponding partitions according to the comparison operator, as the mapping relationship;
If the operator is a set operation, dividing the inner table into n+1 partitions according to the greatest common divisor of the values of the outer table's associated column, and mapping each value of the associated column to the corresponding partitions according to the set operator, as the mapping relationship, where n is the greatest common divisor.
4. The optimization method for non-equivalent association subqueries as claimed in claim 1, characterized in that, in step 3, if the aggregate function of the query is avg, the intermediate result state information is sum + count; if the aggregate function is sum, max, min or count, the intermediate result state information is the sum, max, min or count itself, respectively.
5. The optimization method for non-equivalent association subqueries as claimed in claim 1, characterized in that step 4 further comprises: processing in a loop the correlation predicate between the outer table and the subquery results of the inner table, to obtain the final result of the association subquery.
6. An optimization system for non-equivalent association subqueries, characterized by comprising:
A partition mapping module, configured to obtain the values of the outer table's associated column of the association subquery, collect these values into a value set, and, according to the type of operator in the association subquery and the value set, establish the mapping relationship from the outer table's associated column of the association subquery to the inner table's associated-column partitions;
A result merging module, configured to partition the inner table of the association subquery according to the mapping relationship while, according to the aggregate function applied to the inner table in the association subquery, obtaining the intermediate result state information of the association subquery in each partition; and, according to the mapping relationship, to traverse the outer table's associated column and aggregate the intermediate result state information of the partitions to obtain the subquery result corresponding to each associated-column value in the outer table.
7. The optimization system for non-equivalent association subqueries as claimed in claim 6, characterized in that the partition mapping module is further configured to: construct an automatic merging partition tree according to the type of operator in the association subquery and the value set of the outer table's associated column, determine all partitions through the automatic merging partition tree, and establish the mapping relationship, wherein each leaf node of the automatic merging partition tree corresponds to one partition, and the set of these partitions constitutes the inner table's associated-column partitions.
8. The optimization system for non-equivalent association subqueries as claimed in claim 6, characterized in that the partition mapping module is further configured such that:
If the operator is a not-equal operation, the inner table is divided into k+1 partitions according to the number k of values of the outer table's associated column, and each value of the outer table's associated column is mapped to the k partitions other than its own, as the mapping relationship;
If the operator is a comparison operation, the inner table is divided into k+1 partitions according to the number k of values of the outer table's associated column, and each value of the outer table's associated column is mapped to the corresponding partitions according to the comparison operator, as the mapping relationship;
If the operator is a set operation, the inner table is divided into n+1 partitions according to the greatest common divisor of the values of the outer table's associated column, and each value of the associated column is mapped to the corresponding partitions according to the set operator, as the mapping relationship, where n is the greatest common divisor.
9. The optimization system for non-equivalent association subqueries as claimed in claim 6, characterized in that, in the result merging module, if the aggregate function of the query is avg, the intermediate result state information is sum + count; if the aggregate function is sum, max, min or count, the intermediate result state information is the sum, max, min or count itself, respectively.
10. The optimization system for non-equivalent association subqueries as claimed in claim 6, characterized in that the result merging module is further configured to: process in a loop the correlation predicate between the outer table and the subquery results of the inner table, to obtain the final result of the association subquery.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810097136.7A CN108874849B (en) 2018-01-31 2018-01-31 Optimization method and system for non-equivalent associated sub-query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810097136.7A CN108874849B (en) 2018-01-31 2018-01-31 Optimization method and system for non-equivalent associated sub-query

Publications (2)

Publication Number Publication Date
CN108874849A true CN108874849A (en) 2018-11-23
CN108874849B CN108874849B (en) 2020-12-25

Family

ID=64325986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810097136.7A Active CN108874849B (en) 2018-01-31 2018-01-31 Optimization method and system for non-equivalent associated sub-query

Country Status (1)

Country Link
CN (1) CN108874849B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220237191A1 (en) * 2021-01-25 2022-07-28 Salesforce.Com, Inc. System and method for supporting very large data sets in databases

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080040334A1 (en) * 2006-08-09 2008-02-14 Gad Haber Operation of Relational Database Optimizers by Inserting Redundant Sub-Queries in Complex Queries
CN103294821A (en) * 2013-06-17 2013-09-11 北京工业大学 XML data query result visiting method based on multi-level subquery result branch trees
CN104123374A (en) * 2014-07-28 2014-10-29 北京京东尚科信息技术有限公司 Method and device for aggregate query in distributed databases
CN105975617A (en) * 2016-05-20 2016-09-28 北京京东尚科信息技术有限公司 Multi-partition-table inquiring and processing method and device
CN107169033A (en) * 2017-04-17 2017-09-15 东北大学 Relation data enquiring and optimizing method with parallel framework is changed based on data pattern

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
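毛思雨等 (Mao Siyu et al.): "面向分布式数据库的相关子查询优化策略" [Correlated subquery optimization strategies for distributed databases], 《华东师范大学学报(自然科学版)》 [Journal of East China Normal University (Natural Science)] *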

Also Published As

Publication number Publication date
CN108874849B (en) 2020-12-25


Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: He Wenting, Cheng Xueqi, Zheng Tianqi, Zhang Zhibin, Guo Jiafeng, Zhao Peng
Inventor before: He Wenting, Zheng Tianqi, Zhang Zhibin, Cheng Xueqi

SE01 Entry into force of request for substantive examination
GR01 Patent grant