CN106446134B - Local multi-query optimization method based on predicate specification and cost estimation - Google Patents
- Publication number
- CN106446134B (application CN201610833428.3A)
- Authority
- CN
- China
- Prior art keywords
- node
- query
- task
- expense
- inquiry
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a local multi-query optimization method based on predicate specification and cost estimation, belonging to the field of big-data query optimization. The method first optimizes each query in the query set separately with the existing optimizer of the data query system and represents each result as a query tree, yielding a set of optimized query trees. It then applies local multi-query optimization: over multiple iterations, identical or similar subtasks across queries are merged as equivalent or reduced to one another, producing a global multi-query plan tree. Finally, using the generated global multi-query plan and the reduction relations between subtasks, it estimates the cost of reusing intermediate results with a cost model, decides whether to execute each subtask directly or reuse an intermediate result, and optimizes the global multi-query plan accordingly. The invention balances intermediate-result reuse against query concurrency, reduces repeated work, and effectively improves query performance.
Description
Technical field
The invention relates to the field of big-data query optimization, and in particular to a local multi-query optimization method based on predicate specification and cost estimation.
Background
Early research on query optimization and scheduling focused mainly on single queries. As concurrency has grown and data query systems have improved, concurrent query processing has become an essential function of modern data query systems. When the concurrent queries are related (that is, they involve identical or similar operations), traditional single-query optimization does not take the correlation between queries into account, which limits the query performance of the system.
When multiple queries are submitted to the system at the same time, multi-query optimization analyzes the batch, merges the parts that involve identical or similar operations, and generates a global multi-query plan. Executing this plan completes the queries together and improves query efficiency.
Multi-query optimization methods fall into two classes. The first is global multi-query optimization, whose input is a set of unoptimized queries. Its advantage is that it generates a large number of candidate execution plans, so its output is often better; its disadvantages are a high optimization search cost and, because it cannot use the optimizer of the data query system, greater implementation difficulty. The second is local multi-query optimization, whose input is the set of queries output by the optimizer of the data query system. Its advantages are a smaller search space and easier implementation. However, local multi-query optimization methods often ignore the cost of reusing intermediate results, so in actual execution the cost of reusing an intermediate result may exceed the cost of direct execution, which lowers query performance instead of raising it.
When existing multi-query optimization methods estimate the cost of a query plan with a cost function, they consider only I/O cost, that is, the number of disk pages touched during query processing, and ignore CPU computation cost and network transmission cost. However, when the underlying system uses a distributed computing architecture (as most big-data query systems do) and the queries contain join operations, CPU computation and network transmission costs cannot be ignored, and the original cost model can no longer estimate query cost accurately.
Summary of the invention
To address the deficiencies of the prior art described above, the invention provides a local multi-query optimization method based on predicate specification and cost estimation. The method reuses identical or similar intermediate results across queries and reduces repeated work, which improves concurrent query performance and shortens query response time.
The method is divided into three phases: preprocessing, local multi-query optimization processing, and multi-query plan optimization. The steps are as follows:
(1) Preprocessing phase:
Step (1-1): optimize each query in the query set with the existing query optimizer of the data query system, find the optimal query plan for each, and represent it as a query plan tree, obtaining a set of query plan trees;
Step (1-2): renumber the nodes in the query plan trees;
Step (1-3): define the node mapping map<Node key, Node value> M and initialize M to empty;
Step (1-4): add the nodes of all query plan trees to a global multi-query plan tree, and add a "super root node" to the global multi-query plan tree.
In step (1-2), the nodes of the query plan trees are renumbered so that the new numbers can serve as the node numbers in the global multi-query plan tree.
In step (1-3), the mapping map<Node key, Node value> M records that the key node will reuse the result produced by the value node. Here map is the data type and M is a variable of that type, used to store the mapping from node key to node value.
In step (1-4), the "super root node" points to the root node of each query in the set of query plan trees.
Because the invention uses local multi-query optimization and relies on the existing query optimizer of the data query system to find the optimized plan for each query statement, it reduces the search space of the algorithm and shortens the time spent on query optimization.
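The preprocessing steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Node` class, the `build_global_plan` function, the `"SUPER_ROOT"` label, and the post-order numbering scheme are all hypothetical, and each input tree is assumed to have already been optimized by the system's own optimizer.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """One plan-tree node; `op` and `children` are illustrative fields."""
    op: str
    children: list = field(default_factory=list)
    number: int = -1  # assigned during renumbering, step (1-2)


def build_global_plan(roots):
    """Steps (1-2) to (1-4): renumber nodes, create empty M, add a super root."""
    counter = count(1)

    def renumber(node):
        # Post-order numbering (an assumption) gives each node a unique number.
        for child in node.children:
            renumber(child)
        node.number = next(counter)

    for root in roots:
        renumber(root)

    mapping = {}  # M: key node -> value node whose result it reuses; starts empty
    super_root = Node(op="SUPER_ROOT", children=list(roots), number=0)
    return super_root, mapping


q1 = Node("join", [Node("scan t1"), Node("scan t2")])
q2 = Node("agg", [Node("scan t1")])
super_root, M = build_global_plan([q1, q2])
```

The super root simply holds the roots of all queries as children, so a single traversal from it reaches every node of every query.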
(2) Local multi-query optimization processing phase:
Step (2-1): traverse the node set $V = \{v_1, v_2, \ldots, v_n\}$ of the global multi-query plan tree. If $V$ contains nodes equivalent to the current node, select the lowest-numbered node $v_j^*$ among the equivalent nodes and perform a merge, replacing every equivalent node $v_i \in V - \{v_j^*\}$ with $v_j^*$;
Step (2-2): for each node $v_i$, find the subtask $task_j^*$ that satisfies the "strongest reduction condition", add the mapping from $v_j^*$ to $v_i$ to M, connect the two nodes with a directed edge $v_j^* \to v_i$, and change the operation description of $v_j^*$ so that the result of subtask $task_i$ (corresponding to node $v_i$) becomes the input of subtask $task_j^*$;
Step (2-3): repeat steps (2-1) and (2-2), applying equivalent replacement and reduction to the nodes of the global multi-query plan tree until the tree cannot be simplified further.
In step (2-1), whether two nodes are equivalent is judged mainly as follows:
(a) Check that the two nodes have the same type and operate on the same tables and columns. If not, return false: the two nodes are not equivalent.
(b) Otherwise judge equivalence according to the node type. If the nodes are disk scan nodes, use their predicates to determine whether both select the same range of data on the same column. If so, return true (equivalent); otherwise return false.
(c) If the nodes are join nodes, first check whether the join conditions are identical. If they are, recursively check whether the left and right children are equivalent. Because a join can appear in two tree shapes (with the left and right children swapped), both arrangements must be checked. If either arrangement is equivalent, return true; otherwise return false.
(d) If the nodes are of any other type, including aggregation nodes and data transmission nodes, recursively check whether their child nodes are equivalent.
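The equivalence test in (a) to (d) can be sketched as a recursive Python function. The `Node` fields and the tuple encoding of predicates are hypothetical choices made for this sketch; join nodes are assumed to have exactly two children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                            # "scan", "join", or another operator
    table_cols: frozenset = frozenset()  # tables/columns the node operates on
    predicate: tuple = ()                # scan range or join condition (assumed encoding)
    children: list = field(default_factory=list)


def equivalent(a, b):
    # (a) Same node type and same tables and columns.
    if a.kind != b.kind or a.table_cols != b.table_cols:
        return False
    # (b) Disk scans: both must select the same range on the same column.
    if a.kind == "scan":
        return a.predicate == b.predicate
    # (c) Joins: same condition, children equivalent in either order.
    if a.kind == "join":
        if a.predicate != b.predicate:
            return False
        (al, ar), (bl, br) = a.children, b.children
        return (equivalent(al, bl) and equivalent(ar, br)) or \
               (equivalent(al, br) and equivalent(ar, bl))
    # (d) Other nodes (aggregation, data transmission): compare children.
    return len(a.children) == len(b.children) and all(
        equivalent(x, y) for x, y in zip(a.children, b.children))


s1 = Node("scan", frozenset({"t.col1"}), ("col1", ">", 3))
s2 = Node("scan", frozenset({"t.col1"}), ("col1", ">", 3))
u = Node("scan", frozenset({"u.id"}), ())
j1 = Node("join", frozenset({"t.id", "u.id"}), ("t.id", "=", "u.id"), [s1, u])
j2 = Node("join", frozenset({"t.id", "u.id"}), ("t.id", "=", "u.id"), [u, s2])
```

Here `j1` and `j2` differ only in the order of their children, so the swapped comparison in branch (c) is what makes them equivalent.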
In step (2-2), $v_j^* \to v_i$ indicates the direction of data flow.
In step (2-2), for each subtask $task_i$, if the execution result of $task_i$ contains the result of another subtask, that subtask is considered reducible to $task_i$. Among all subtasks that can be reduced to $task_i$, the one whose result is most similar to that of $task_i$ is called the subtask $task_j^*$ satisfying the "strongest reduction condition".
In step (2-2), a result is the intermediate data obtained after a task finishes executing.
In step (2-3), whether two nodes of the global multi-query plan tree are in a reduction relation is judged mainly as follows:
First, check whether both nodes are disk scan nodes. If not, return false: the two nodes are not in a reduction relation.
Second, check whether the two nodes operate on the same table and column. If not, return false: the two nodes are not in a reduction relation.
Finally, determine the reduction relation of the two nodes from the reduction relation table. Return true if the relation holds; otherwise return false. The reduction relation table is shown in Table 1.
Table 1: Reduction relation dependency table
[Table body not present in the extracted text; only its footnote survives.]
Note 1: (a1 => b1 and a2 => b2) or (a1 => b2 and a2 => b1) or (a => b1) or (a => b2)
In the table above, m and n denote integers or floating-point numbers. When the values m and n in predicates a and b satisfy the ordering given in the relation table, a can be reduced to b. When b consists of two predicates b1 and b2 joined by "and", the reduction relation between a and each of b1 and b2 must also be checked. For example, if b consists of b1: col1 < 8 and b2: col1 > 3, and a is col1 > 2, then b2 can be reduced to a, and therefore b can be reduced to a.
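The example above can be checked with a small Python sketch. It assumes, purely for illustration, that every predicate is a strict comparison (`>` or `<`) on the same column, so a conjunction becomes an open interval and "can be reduced to" becomes interval containment. The function names and the `(op, value)` encoding are hypothetical and do not reproduce the rows of Table 1.

```python
import math


def to_interval(predicates):
    """Intersect strict comparisons (op, value) into an open interval (lo, hi)."""
    lo, hi = -math.inf, math.inf
    for op, value in predicates:
        if op == ">":
            lo = max(lo, value)
        elif op == "<":
            hi = min(hi, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return lo, hi


def reducible(inner, outer):
    """True if every row selected by `inner` is also selected by `outer`."""
    in_lo, in_hi = to_interval(inner)
    out_lo, out_hi = to_interval(outer)
    return out_lo <= in_lo and in_hi <= out_hi


a = [(">", 2)]             # col1 > 2
b = [("<", 8), (">", 3)]   # col1 < 8 and col1 > 3
print(reducible(b, a))     # True: b can be reduced to a
print(reducible(a, b))     # False: a selects rows outside b
```

Under this encoding, the scan for b could read the intermediate result of the scan for a and filter it further, instead of reading the table from disk again.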
In step (2-3), steps (2-1) and (2-2) change the structure of the global multi-query plan tree, which in turn changes the execution cost of the tasks corresponding to its nodes. Steps (2-1) and (2-2) are therefore repeated, applying equivalent replacement and reduction to the nodes of the global multi-query plan tree until the tree cannot be simplified further.
(3) Multi-query plan optimization phase:
Step (3-1): obtain the mapping and the global multi-query plan tree produced by the local multi-query optimization processing phase;
Step (3-2): traverse the nodes of the global multi-query plan tree according to the mapping map<Node key, Node value> M; when the traversal is complete, go to step (3-8);
Step (3-3): estimate the direct cost and the reuse cost according to the cost model of each task type;
Step (3-4): compare the reuse cost with the direct cost; if the reuse cost is greater than the direct cost, go to step (3-5); if the reuse cost is less than the direct cost, go to step (3-6);
Step (3-5): map the node to itself in the mapping, so that the task corresponding to the node is executed directly without using an intermediate result, then go to step (3-7);
Step (3-6): repeat steps (3-3) and (3-4) to check whether a node with a lower reuse cost exists; if so, update the mapping so that the node maps to the node with the lower reuse cost;
Step (3-7): repeat step (3-2);
Step (3-8): return the global multi-query plan tree.
In step (3-3), if a node is mapped to another node in the mapping, the execution result of the node's task depends on the execution result of the task of that other node (the depended-on node).
In step (3-3), because a query consists of multiple subtasks, the direct cost, the reuse cost, and the network transmission cost of intermediate results are estimated according to the cost model of each task type.
In step (3-3), the direct cost is the CPU computation cost of directly executing the task corresponding to a node of the global multi-query plan tree. The reuse cost is the CPU computation cost of the depending node's task plus the cost of transmitting over the network, and computing on, the result produced by the depended-on node's task.
In step (3-3), a query plan tree consists of multiple plan nodes, each corresponding to one task of the query. Models are built mainly for disk scan tasks, join tasks, and network transmission tasks, and the cost of each task is estimated according to its node type. The cost models and estimation methods are as follows:
(a) Disk scan node
A disk scan node represents reading data from disk into memory and filtering it with a predicate to keep the rows that satisfy the condition. The cost of reading data consists of the average seek time of the disk arm $t_{seek}$, the average rotational latency of the head $t_{latency}$, and the data read time $t_{read}$; the cost of predicate filtering depends on the specific predicate and the data distribution.
When a query plan tree node is a disk scan node, the cost of the disk scan task is estimated with the disk scan cost model.
(b) Join node
The time of a join operation consists of the time to compute tuple hash values $t_{hashTuple}$, the time to build a hash table in memory for the right table's data $t_{build}$, the time to insert the tuples participating in the join into the hash table $t_{insertTuple}$, and the time to join the tuples of the left and right tables $t_{joinTuple}$. Each of these times is determined mainly by the machine's CPU performance and the sizes of the left and right tables.
When a query plan tree node is a join node, the cost of the join task is estimated with the join cost model.
(c) Data transmission node
A data transmission node is mainly responsible for receiving and aggregating transmitted data. Data transmission tasks can run in parallel on different hosts, so the completion time $t_{exchange}$ of the node's operation depends on when the last data transmission task finishes. The time cost of transmission is determined mainly by the number of bytes transferred, TransferByte, and the network bandwidth of the current cluster, $Net_{band}$.
When a query plan tree node is a data transmission node, the cost of the data transmission task is estimated with the data transmission cost model.
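The three cost models can be sketched as Python functions. The text names the components of each cost but not how they combine, so the additive form, the bytes-over-bandwidth transfer time, and all function and parameter names below are assumptions of this sketch; the predicate-filtering cost of a scan is left out because the text gives no model for it.

```python
def scan_cost(t_seek, t_latency, t_read):
    """Disk scan: seek time + rotational latency + read time (assumed additive)."""
    return t_seek + t_latency + t_read


def join_cost(t_hash_tuple, t_build, t_insert_tuple, t_join_tuple):
    """Hash join: the four time components named in the text, assumed additive."""
    return t_hash_tuple + t_build + t_insert_tuple + t_join_tuple


def exchange_cost(transfer_bytes_per_host, net_band):
    """Parallel transfers complete when the slowest host finishes.

    transfer_bytes_per_host: bytes sent by each host (hypothetical input)
    net_band: network bandwidth in bytes per time unit
    """
    return max(nbytes / net_band for nbytes in transfer_bytes_per_host)


print(scan_cost(4.0, 2.0, 10.0))        # 16.0
print(exchange_cost([100, 400], 100))   # 4.0
```

Taking the maximum over hosts reflects the statement that the completion time of a data transmission node depends on the last transfer to finish.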
The invention builds cost models for query processing and defines a processing cost for each kind of operation, which improves the accuracy of query cost estimation and helps the algorithm select an efficient multi-query plan.
In step (3-6), a reuse cost lower than the cost of directly executing the node's task indicates that the intermediate result of the depended-on task can be used. Steps (3-3) and (3-4) are then repeated to check whether a node with a lower reuse cost exists; if so, the mapping is updated so that the node maps to the node with the lower reuse cost.
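The decision made in steps (3-2) to (3-7) can be sketched in Python. The function name, the `candidates` input, and the dictionary-based cost tables are hypothetical. The text does not say what happens when the two costs are equal; this sketch treats a tie as reuse.

```python
def optimize_mapping(mapping, candidates, direct_cost, reuse_cost):
    """Keep reuse only where it is no more expensive than direct execution.

    mapping:     M, node -> node whose result it currently reuses
    candidates:  node -> nodes whose results it could reuse (assumed input)
    direct_cost: node -> estimated cost of executing the node's task directly
    reuse_cost:  (node, provider) -> estimated cost of reusing provider's result
    """
    result = dict(mapping)
    for node, provider in mapping.items():
        best, best_cost = provider, reuse_cost[(node, provider)]
        if best_cost > direct_cost[node]:
            # Step (3-5): reuse is more expensive, so map the node to itself.
            result[node] = node
            continue
        # Step (3-6): look for a provider with an even lower reuse cost.
        for other in candidates.get(node, []):
            cost = reuse_cost[(node, other)]
            if cost < best_cost:
                best, best_cost = other, cost
        result[node] = best
    return result


M = {"v5": "v2", "v7": "v3"}
candidates = {"v5": ["v2", "v4"]}
direct = {"v5": 10.0, "v7": 3.0}
reuse = {("v5", "v2"): 6.0, ("v5", "v4"): 4.0, ("v7", "v3"): 5.0}
print(optimize_mapping(M, candidates, direct, reuse))
# {'v5': 'v4', 'v7': 'v7'}
```

In the example, `v7` is cheaper to run directly, so it maps to itself, while `v5` is remapped to `v4`, whose result is cheaper to reuse than that of `v2`.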
The invention uses the existing query optimizer of the data query system to find an optimized query plan for each query statement, applies equivalent replacement or reduction to identical or similar parts of the query plans over multiple iterations to generate a global multi-query plan, and, by estimating reuse cost, decides whether to execute each task directly or reuse an intermediate result. This optimizes the global multi-query plan and shortens query response time. Compared with traditional multi-query optimization methods, the invention has the following advantages:
(1) It uses local multi-query optimization and relies on the existing query optimizer of the data query system to find the optimized plan for each query statement, which reduces the search space of the algorithm and shortens the time spent on query optimization;
(2) It builds cost models for query processing and defines a processing cost for each kind of operation, which improves the accuracy of query cost estimation and helps the algorithm select an efficient multi-query plan;
(3) It optimizes the generated global multi-query plan with full consideration of the cost of reusing intermediate results, which prevents subsequent tasks from waiting too long, preserves system concurrency, and improves query execution efficiency.
Description of the drawings
Fig. 1: flow chart of the local multi-query optimization method based on predicate specification and cost estimation;
Fig. 2: schematic diagram of a query plan tree.
Detailed description of the embodiments
To describe the invention more specifically, the technical solution is described in detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the local multi-query optimization method based on predicate specification and cost estimation is divided into three phases: preprocessing, local multi-query optimization processing, and multi-query plan optimization.
(1) The main steps of the preprocessing phase are:
Step (1-1): optimize each query in the input query set with the existing query optimizer of the data query system, find the optimal query plan for each, and represent it as a query plan tree, obtaining a set of query plan trees;
A query plan tree is denoted T(V, E, D). V is the set of all nodes in the query plan tree; each subtask corresponds to one node, and a node carries operation information (such as the tables and columns involved). E is the set of all edges in the query plan tree. D is the description of the concrete operation of each query node (including the predicates involved). The leaf nodes of a query plan are disk scan nodes, which read data; non-leaf nodes represent different algebraic operations. A non-leaf node consumes data from its child nodes, and nodes are connected by edges. A query plan tree is shown in Fig. 2;
Step (1-2): renumber the nodes in the queries so that the new numbers can serve as the node numbers in the global multi-query plan tree;
Step (1-3): define the node mapping map<Node key, Node value> M and initialize M to empty. The mapping records that the key node will reuse the result produced by the value node. Then add all nodes to the global multi-query plan tree and add a "super root node" to it; this node points to the root node of each query in the set of query plan trees.
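The query plan tree T(V, E, D) described in step (1-1) can be represented with a small Python structure. This is an illustrative sketch only: the `PlanTree` class, its method names, and the use of plain strings for operation descriptions are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PlanTree:
    """T(V, E, D): nodes, edges, and per-node operation descriptions."""
    V: set = field(default_factory=set)    # node numbers
    E: set = field(default_factory=set)    # (parent, child) pairs
    D: dict = field(default_factory=dict)  # node number -> operation description

    def add_node(self, number, description, parent=None):
        self.V.add(number)
        self.D[number] = description
        if parent is not None:
            self.E.add((parent, number))

    def leaves(self):
        # Leaves have no outgoing edge; in the text these are disk scan nodes.
        parents = {p for p, _ in self.E}
        return sorted(self.V - parents)


tree = PlanTree()
tree.add_node(1, "join on t.id = u.id")
tree.add_node(2, "scan t where col1 > 3", parent=1)
tree.add_node(3, "scan u", parent=1)
print(tree.leaves())  # [2, 3]
```

Keeping D separate from V mirrors the text: merging or reducing a node changes its operation description without changing its identity in the tree.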
(2) The main steps of the local multi-query optimization processing phase are:
Step (2-1): traverse the set of query nodes $V = \{v_1, v_2, \ldots, v_n\}$. If $V$ contains nodes equivalent to the current node, select the lowest-numbered node $v_j^*$ among the equivalent nodes and perform a merge, that is, replace every equivalent node $v_i \in V - \{v_j^*\}$ with $v_j^*$;
Whether two nodes are equivalent is judged mainly as follows:
(a) Check that the two nodes have the same type and operate on the same tables and columns. If not, return false: the two nodes are not equivalent.
(b) Otherwise judge equivalence according to the node type. If the nodes are disk scan nodes, use their predicates to determine whether both select the same range of data on the same column. If so, return true (equivalent); otherwise return false.
(c) If the nodes are join nodes, first check whether the join conditions are identical. If they are, recursively check whether the left and right children are equivalent. Because a join can appear in two tree shapes (with the left and right children swapped), both arrangements must be checked. If either arrangement is equivalent, return true; otherwise return false.
(d) If the nodes are of any other type, including aggregation nodes and data transmission nodes, recursively check whether their child nodes are equivalent.
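The merge in step (2-1), which keeps the lowest-numbered node of each group of equivalent nodes, can be sketched in Python. The function name and the `is_equivalent` callback are hypothetical; the callback stands in for the equivalence test (a) to (d) and is assumed to be a true equivalence relation (in particular, transitive).

```python
def merge_equivalent(numbers, is_equivalent):
    """Map every node number to the lowest-numbered node equivalent to it."""
    representative = {}
    for n in sorted(numbers):
        # Existing representatives are the lowest-numbered members of their groups.
        for r in sorted(set(representative.values())):
            if is_equivalent(r, n):
                representative[n] = r
                break
        else:
            representative[n] = n  # first node of a new equivalence group
    return representative


# Toy equivalence: two nodes are equivalent if they perform the same operation.
signatures = {1: "scan t", 2: "scan u", 3: "scan t", 4: "scan u", 5: "join"}
merged = merge_equivalent(signatures, lambda a, b: signatures[a] == signatures[b])
print(merged)  # {1: 1, 2: 2, 3: 1, 4: 2, 5: 5}
```

Every node that maps to a different number would be replaced by that representative in the global multi-query plan tree.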
Step (2-2): for each node $v_i$, find the subtask $task_j^*$ that satisfies the "strongest reduction condition", add the mapping from $v_j^*$ to $v_i$ to M, connect the two nodes with a directed edge $v_j^* \to v_i$, and change the operation description of $v_j^*$ so that the result of subtask $task_i$ (corresponding to node $v_i$) becomes the input of subtask $task_j^*$;
For each subtask $task_i$, if the result of $task_i$ contains the result of another subtask, that subtask is considered reducible to $task_i$. Among all subtasks that can be reduced to $task_i$, the one whose result is most similar to that of $task_i$ is called the subtask $task_j^*$ satisfying the "strongest reduction condition".
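Selecting the subtask that satisfies the "strongest reduction condition" can be sketched in Python. The text does not define how similarity between results is measured. This sketch assumes that, among results contained in the target's result, the largest one is the most similar. The function name and both callbacks are hypothetical.

```python
def strongest_reduction(target, others, contains, result_size):
    """Pick the subtask reducible to `target` whose result is closest to it.

    contains(x, y): True if the result of x contains the result of y.
    result_size(x): estimated result size; the largest contained result is
    treated as "most similar" (an assumption of this sketch).
    """
    reducible = [t for t in others if t != target and contains(target, t)]
    if not reducible:
        return None
    return max(reducible, key=result_size)


# Toy tasks described by the value range they select on one column.
ranges = {"task_i": (0, 100), "a": (10, 20), "b": (5, 90), "c": (50, 150)}


def contains(x, y):
    return ranges[x][0] <= ranges[y][0] and ranges[y][1] <= ranges[x][1]


def size(x):
    return ranges[x][1] - ranges[x][0]


print(strongest_reduction("task_i", ranges, contains, size))  # b
```

Task `c` is skipped because part of its range lies outside that of `task_i`, and `b` is preferred over `a` because it covers more of the target's range.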
Whether two nodes are in a reduction relation is judged mainly as follows:
First, check whether both nodes are disk scan nodes. If not, return false: the two nodes are not in a reduction relation.
Second, check whether the two nodes operate on the same table and column. If not, return false: the two nodes are not in a reduction relation.
Finally, determine the reduction relation of the two nodes from the reduction relation table. Return true if the relation holds; otherwise return false. The reduction relation table is shown in Table 1.
Step (2-3): steps (2-1) and (2-2) change the structure of the global multi-query plan tree, which in turn changes the execution cost of the tasks corresponding to its nodes. Repeat steps (2-1) and (2-2), applying equivalent replacement and reduction to the nodes of the plan tree until the global multi-query plan tree cannot be simplified further.
Table 1: Reduction relation dependency table
[Table body not present in the extracted text; only its footnote survives.]
Note 1: (a1 => b1 and a2 => b2) or (a1 => b2 and a2 => b1) or (a => b1) or (a => b2)
(3) The multi-query plan optimization phase mainly comprises the following steps:
Step (3-1): obtain the mapping and the global multi-query plan tree produced by the local multi-query optimization processing phase;
Step (3-2): traverse the nodes of the global multi-query plan tree according to the mapping map<Node key, Node value> M; when the traversal is complete, go to step (3-8);
Step (3-3): because a query consists of multiple subtasks, estimate the direct cost, the reuse cost, and the network transmission cost of intermediate results according to the cost model of each task type;
The cost estimation methods are as follows:
(a) Disk scan node
A disk scan node represents reading data from disk into memory and filtering it with a predicate to keep the rows that satisfy the condition. The cost of reading data consists of the average seek time of the disk arm $t_{seek}$, the average rotational latency of the head $t_{latency}$, and the data read time $t_{read}$; the cost of predicate filtering depends on the specific predicate and the data distribution;
(b) Join node
The time of a join operation consists of the time to compute tuple hash values $t_{hashTuple}$, the time to build a hash table in memory for the right table's data $t_{build}$, the time to insert the tuples participating in the join into the hash table $t_{insertTuple}$, and the time to join the tuples of the left and right tables $t_{joinTuple}$. Each of these times is determined mainly by the machine's CPU performance and the sizes of the left and right tables;
(c) Data transmission node
A data transmission node is mainly responsible for receiving and aggregating transmitted data. Data transmission tasks can run in parallel on different hosts, so the completion time $t_{exchange}$ of the node's operation depends on when the last data transmission task finishes. The time cost of transmission is determined mainly by the number of bytes transferred, TransferByte, and the network bandwidth of the current cluster, $Net_{band}$.
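Written as formulas, the three estimates might take the following form. This is a hedged reading, not the patent's own equations: the text lists the components of each cost without giving the combining expressions, so the additive form is an assumption, as are the symbols $t_{scan}$ and $t_{joinOp}$, the host set $H$, and the per-host byte count $TransferByte_h$. The predicate-filtering cost of a scan is omitted because no model is given for it.

```latex
\begin{aligned}
t_{scan}     &= t_{seek} + t_{latency} + t_{read} \\
t_{joinOp}   &= t_{hashTuple} + t_{build} + t_{insertTuple} + t_{joinTuple} \\
t_{exchange} &= \max_{h \in H} \frac{TransferByte_h}{Net_{band}}
\end{aligned}
```

The maximum in the last line expresses that transfers run in parallel on different hosts and the node completes only when the slowest transfer finishes.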
Step (3-4): compare the reuse cost with the direct cost; if the reuse cost is greater than the direct cost, go to step (3-5); if the reuse cost is less than the direct cost, go to step (3-6);
Step (3-5): map the node to itself in the mapping, so that the task corresponding to the node is executed directly without using an intermediate result, then go to step (3-7);
Step (3-6): repeat steps (3-3) and (3-4) to check whether a node with a lower reuse cost exists; if so, update the mapping so that the node maps to the node with the lower reuse cost;
Step (3-7): repeat step (3-2);
Step (3-8): return the global multi-query plan tree.
The specific embodiment described above explains the technical solution and beneficial effects of the invention in detail. It should be understood that the foregoing is only the preferred embodiment of the invention and is not intended to limit it; any modification, supplement, or equivalent replacement made within the scope of the principles of the invention shall fall within its scope of protection.
Claims (4)
1. A local multi-query optimization method based on predicate specification and cost estimation, characterized in that it is divided into three phases, namely preprocessing, local multi-query optimization processing, and multi-query plan optimization, with the following steps:
(1) Preprocessing phase:
Step (1-1): optimize each query in the query set with the existing query optimizer of the data query system, find the optimal query plan for each, and represent it as a query plan tree, obtaining a set of query plan trees;
Step (1-2): renumber the nodes in the query plan trees;
Step (1-3): define the node mapping map<Node key, Node value> M and initialize M to empty;
Step (1-4): add the nodes of all query plan trees to a global multi-query plan tree, and add a "super root node" to the global multi-query plan tree, wherein the "super root node" points to the root node of each query in the set of query plan trees;
(2) Local multi-query optimization processing phase:
Step (2-1): traverse the node set $V = \{v_1, v_2, \ldots, v_n\}$ of the global multi-query plan tree; if $V$ contains nodes equivalent to the current node, select the lowest-numbered node $v_j^*$ among the equivalent nodes and perform a merge, replacing every equivalent node $v_i \in V - \{v_j^*\}$ with $v_j^*$;
Step (2-2): for each node $v_i$, find the subtask $task_j^*$ that satisfies the "strongest reduction condition", add the mapping from $v_j^*$ to $v_i$ to M, connect the two nodes with a directed edge $v_j^* \to v_i$, and change the operation description of $v_j^*$ so that the result of subtask $task_i$ (corresponding to node $v_i$) becomes the input of subtask $task_j^*$, wherein, for each subtask $task_i$, if the execution result of $task_i$ contains the result of another subtask, that subtask is considered reducible to $task_i$, and among all subtasks that can be reduced to $task_i$, the one whose result is most similar to that of $task_i$ is called the subtask $task_j^*$ satisfying the "strongest reduction condition";
Step (2-3): repeat steps (2-1) and (2-2), applying equivalent replacement and reduction replacement to the nodes of the global multi-query plan tree until the tree cannot be simplified further;
(3) Multi-query plan optimization phase:
Step (3-1): obtain the mapping and the global multi-query plan tree produced by the local multi-query optimization processing phase;
Step (3-2): traverse the nodes of the global multi-query plan tree according to the mapping map<Node key, Node value> M; when the traversal is complete, go to step (3-8);
Step (3-3): estimate the direct cost and the reuse cost according to the cost model of each task type, wherein the direct cost is the CPU computation cost of directly executing the task corresponding to a node of the global multi-query plan tree, and the reuse cost is the CPU computation cost of the depending node's task plus the cost of transmitting over the network, and computing on, the result produced by the depended-on node's task;
Step (3-4): compare the reuse cost with the direct cost; if the reuse cost is greater than the direct cost, go to step (3-5); if the reuse cost is less than the direct cost, go to step (3-6);
Step (3-5): map the node to itself in the mapping, so that the task corresponding to the node is executed directly without using an intermediate result, then go to step (3-7);
Step (3-6): repeat steps (3-3) and (3-4) to check whether a node with a lower reuse cost exists; if so, update the mapping so that the node maps to the node with the lower reuse cost;
Step (3-7): repeat step (3-2);
Step (3-8): return the global multi-query plan tree.
2. The local multi-query optimization method based on predicate specification and cost estimation according to claim 1, characterized in that: in step (2-1), the process of judging whether two nodes belong to an equivalence relation mainly comprises:
Step (a), check, as a precondition, that the two nodes are of the same type and operate on the same table and columns; if the precondition is not satisfied, return false;
Step (b), judge the equivalence relation according to the node type: if the node is a disk scan node, judge from the predicates in the nodes whether the two nodes select the same range of data in the same column; if so, return true, otherwise return false;
Step (c), if the node is a join node, judge whether the join conditions are identical; if they are, recursively judge whether the left and right children are equivalent; if one of the tree shapes satisfies the equivalence relation, return true, otherwise return false;
Step (d), if the node is any other node, including an aggregation operation node or a data transmission node, recursively judge whether its child nodes are equivalent.
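The equivalence test of steps (a) to (d) is naturally recursive, as the following Python sketch shows. The `PlanNode` fields and the node kind strings are hypothetical, and "one of the tree shapes" in step (c) is read here as "the children match either in the same order or swapped", which is an assumption rather than something the claim spells out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    kind: str                            # "scan", "join", "aggregate", "exchange", ...
    table: str = ""
    columns: tuple = ()
    predicate: Optional[tuple] = None    # scan nodes: (column, low, high)
    join_cond: Optional[frozenset] = None
    children: list = field(default_factory=list)


def equivalent(a, b):
    # Step (a): same node type, same table, same columns.
    if a.kind != b.kind or a.table != b.table or set(a.columns) != set(b.columns):
        return False
    # Step (b): scan nodes must select the same data range on the same column.
    if a.kind == "scan":
        return a.predicate == b.predicate
    # Step (c): join nodes need identical conditions and equivalent children.
    if a.kind == "join":
        if a.join_cond != b.join_cond:
            return False
        (al, ar), (bl, br) = a.children, b.children
        straight = equivalent(al, bl) and equivalent(ar, br)
        swapped = equivalent(al, br) and equivalent(ar, bl)
        return straight or swapped
    # Step (d): any other node is equivalent when its children are.
    return len(a.children) == len(b.children) and all(
        equivalent(x, y) for x, y in zip(a.children, b.children)
    )
```

Under these assumptions, two joins over the same pair of scans compare as equivalent even when their inputs are listed in opposite order.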
3. The local multi-query optimization method based on predicate specification and cost estimation according to claim 1, characterized in that: before the specification replacement in step (2-3) is carried out, it is judged whether nodes in the global multi-query plan tree belong to a specification relationship, the process mainly comprising:
First, judge whether both nodes are disk scan nodes; if not, return false;
Second, judge whether the two nodes operate on the same table and columns; if not, return false;
Finally, judge the specification relationship between the two nodes according to the specification relation table; if the specification relationship is satisfied, return true, otherwise return false.
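The three checks above can be sketched in Python as follows. The contents of the specification relation table are not given in this claim, so `SPEC_TABLE` below is a hypothetical stand-in: it covers only simple range predicates on one column and treats node `a` as related to node `b` when every row selected by `b` is also selected by `a`. The `Scan` class and its field names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    kind: str      # "scan" marks a disk scan node
    table: str
    column: str
    op: str        # comparison operator of the predicate, e.g. "<"
    const: float   # constant the column is compared against


# Hypothetical specification relation table: (op of a, op of b) -> rule on the
# two constants that holds when a's predicate covers b's predicate.
SPEC_TABLE = {
    ("<", "<"): lambda ca, cb: cb <= ca,
    ("<", "<="): lambda ca, cb: cb < ca,
    ("<=", "<"): lambda ca, cb: cb <= ca,
    ("<=", "<="): lambda ca, cb: cb <= ca,
    (">", ">"): lambda ca, cb: cb >= ca,
    (">", ">="): lambda ca, cb: cb > ca,
    (">=", ">"): lambda ca, cb: cb >= ca,
    (">=", ">="): lambda ca, cb: cb >= ca,
}


def has_spec_relation(a, b):
    # First: both nodes must be disk scan nodes.
    if a.kind != "scan" or b.kind != "scan":
        return False
    # Second: both must operate on the same table and column.
    if a.table != b.table or a.column != b.column:
        return False
    # Finally: look the operator pair up in the relation table.
    rule = SPEC_TABLE.get((a.op, b.op))
    return rule is not None and rule(a.const, b.const)
```

For example, a scan with `price < 20` covers a scan with `price < 10` on the same table, so the second scan could in principle be answered by filtering the result of the first.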
4. The local multi-query optimization method based on predicate specification and cost estimation according to claim 1, characterized in that: in step (3-3), the cost estimation methods include a disk scan node method, a join operation node method and a data transmission node method.
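One way to organise the three estimation methods is a dispatch table keyed by node type, sketched below in Python. The formulas are placeholders chosen for illustration only (a linear scan cost, a build-plus-probe join cost and a per-byte transfer cost); they are assumptions and do not reproduce the cost model of the patent. All function names, kind strings and constants are hypothetical.

```python
def scan_cost(rows, cpu_per_row=1.0):
    # Placeholder: a disk scan touches every row once.
    return rows * cpu_per_row


def join_cost(left_rows, right_rows, cpu_per_row=1.0):
    # Placeholder: build on one input and probe with the other.
    return (left_rows + right_rows) * cpu_per_row


def transfer_cost(result_bytes, net_per_byte=0.5):
    # Placeholder: network cost grows linearly with the result size.
    return result_bytes * net_per_byte


# One estimation method per node type named in the claim.
COST_METHODS = {
    "scan": scan_cost,
    "join": join_cost,
    "exchange": transfer_cost,
}


def estimate(kind, *stats):
    """Pick the estimation method that matches the node type."""
    try:
        method = COST_METHODS[kind]
    except KeyError:
        raise ValueError(f"no cost method for node type {kind!r}") from None
    return method(*stats)
```

Keeping the methods in a table makes it easy to swap a placeholder formula for a calibrated one without touching the code that walks the plan tree.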
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610833428.3A CN106446134B (en) | 2016-09-20 | 2016-09-20 | Local multi-query optimization method based on predicate specification and cost estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610833428.3A CN106446134B (en) | 2016-09-20 | 2016-09-20 | Local multi-query optimization method based on predicate specification and cost estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106446134A (en) | 2017-02-22 |
CN106446134B (en) | 2019-07-09 |
Family ID: 58166360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610833428.3A Active CN106446134B (en) | 2016-09-20 | 2016-09-20 | Local multi-query optimization method based on predicate specification and cost estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446134B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107133281B (en) * | 2017-04-14 | 2020-12-15 | Zhejiang Hongcheng Computer Systems Co., Ltd. | Global multi-query optimization method based on grouping |
CN107301205A (en) * | 2017-06-01 | 2017-10-27 | South China University of Technology | Distributed real-time big data query method and system |
CN107885865B (en) | 2017-11-22 | 2019-12-10 | Transwarp Technology (Shanghai) Co., Ltd. | Cost optimizer and cost estimation method and equipment |
CN108304517A (en) * | 2018-01-23 | 2018-07-20 | Southwest University | Efficient nested query method based on a complex event processing system |
CN110297766B (en) * | 2019-06-03 | 2023-05-30 | Hefei Yirui Communication Technology Co., Ltd. | Software testing method and software testing system based on distributed test node cluster |
WO2021007816A1 (en) * | 2019-07-17 | 2021-01-21 | Alibaba Group Holding Limited | Method and system for generating and executing query plan |
CN115599811A (en) * | 2021-07-09 | 2023-01-13 | Huawei Technologies Co., Ltd. (CN) | Data processing method and device and computing system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609493A (en) * | 2012-01-20 | 2012-07-25 | Donghua University | Join-order query optimization method based on column-store model |
CN103198133A (en) * | 2013-04-12 | 2013-07-10 | Tongfang Knowledge Network (Beijing) Technology Co., Ltd. | Query optimization method for converting XPath (XML path language) query into tree-form data structure |
CN104408106A (en) * | 2014-11-20 | 2015-03-11 | Zhejiang University | Scheduling method for big data queries in distributed file system |
CN104504018A (en) * | 2014-12-11 | 2015-04-08 | Zhejiang University | Top-down real-time big data query optimization method based on bushy tree |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150032722A1 (en) * | 2013-03-15 | 2015-01-29 | Teradata Corporation | Optimization of database queries for database systems and environments |
2016-09-20: Application CN201610833428.3A filed in CN (patent CN106446134B); status: Active
Non-Patent Citations (2)
Title |
---|
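Research and Implementation of an OLAP Multi-Query Optimization Strategy Based on Column Storage; Lu Xuchen; China Master's Theses Full-text Database, Information Science and Technology; 20130615; Chapter 3 |
Impala Query Optimization Based on an Improved DPhyp Algorithm; Zhou Qiang et al.; Journal of Computer Research and Development; 20131230; pp. 114-120 |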
Also Published As
Publication number | Publication date |
---|---|
CN106446134A (en) | 2017-02-22 |
Similar Documents
Publication | Title |
---|---|
CN106446134B (en) | Local multi-query optimization method based on predicate specification and cost estimation |
Koliousis et al. | Saber: Window-based hybrid stream processing for heterogeneous architectures |
Chen et al. | ThunderGP: HLS-based graph processing framework on FPGAs |
Wang et al. | FlexGraph: a flexible and efficient distributed framework for GNN training |
Boehm et al. | On optimizing operator fusion plans for large-scale machine learning in SystemML |
CN107111653B (en) | Query optimization of system memory load for parallel database systems |
US9535975B2 (en) | Parallel programming of in memory database utilizing extensible skeletons |
Qiao et al. | A fast parallel community discovery model on complex networks through approximate optimization |
US11023443B2 (en) | Collaborative planning for accelerating analytic queries |
US20120072412A1 (en) | Evaluating execution plan changes after a wakeup threshold time |
JP6516110B2 (en) | Multiple Query Optimization in SQL-on-Hadoop System |
Funke et al. | Data-parallel query processing on non-uniform data |
CN105224452A (en) | Prediction cost optimization method for scientific program static analysis performance |
CN104731969A (en) | Mass data join aggregation query method, device and system in distributed environment |
CN104778077A (en) | High-speed out-of-core graph processing method and system based on random and continuous disk access |
Shanoda et al. | JOMR: Multi-join optimizer technique to enhance map-reduce job |
Gao et al. | GLog: A high level graph analysis system using MapReduce |
Ye et al. | Hippie: A data-paralleled pipeline approach to improve memory-efficiency and scalability for large dnn training |
CN111984833B (en) | High-performance graph mining method and system based on GPU |
WO2018192479A1 (en) | Adaptive code generation with a cost model for jit compiled execution in a database system |
An et al. | Using index in the mapreduce framework |
Lai et al. | GLogS: Interactive Graph Pattern Matching Query At Large Scale |
CN104679521B (en) | Analysis method for accurately calculating task cache WCET |
CN110851178B (en) | Inter-process program static analysis method based on distributed graph reachable computation |
Werner et al. | Automated composition and execution of hardware-accelerated operator graphs |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |