CN106446134B - Local multi-query optimization method based on predicate specification and cost estimation - Google Patents

Local multi-query optimization method based on predicate specification and cost estimation

Info

Publication number: CN106446134B
Authority: CN (China)
Prior art keywords: node, query, task, expense, inquiry
Legal status: Active
Application number: CN201610833428.3A
Other languages: Chinese (zh)
Other versions: CN106446134A (en)
Inventors: 陈岭, 杨谊
Current Assignee: Zhejiang University ZJU
Original Assignee: Zhejiang University ZJU

Application filed by Zhejiang University ZJU
Priority to CN201610833428.3A
Publication of CN106446134A
Application granted
Publication of CN106446134B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2453: Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a local multi-query optimization method based on predicate reduction (rendered "predicate specification" in the title) and cost estimation, belonging to the field of big data query optimization. The method first uses the existing optimizer of the data query system to optimize each query in the query set individually and represents each result as a query tree, yielding a set of optimized query trees. Then, using the local multi-query optimization method, it applies equivalence replacement or reduction over multiple iterations to identical or similar subtasks across queries, generating a global multi-query plan tree. Finally, using the generated global multi-query plan and the reduction relationships between subtasks, it estimates the cost of reusing intermediate results according to a cost model, decides whether to execute each subtask directly or to reuse an intermediate result, and optimizes the global multi-query plan accordingly. The invention fully considers the trade-off between intermediate-result reuse and query concurrency, reduces repeated operations, and effectively improves query performance.

Description

Local multi-query optimization method based on predicate specification and cost estimation
Technical field
The present invention relates to the field of big data query optimization, and in particular to a local multi-query optimization method based on predicate reduction and cost estimation.
Background art
Early research on query optimization and scheduling focused mainly on single queries. As concurrency has grown and data query systems have steadily improved, concurrent query processing has become an essential function of modern data query systems. When the concurrent queries are related (that is, they involve identical or similar operations), traditional single-query optimization methods do not consider the correlation between queries, which limits improvements in system query performance.
Multi-query optimization addresses this: when multiple queries are submitted to the system at the same time, it analyzes the batch, merges the parts that involve identical or similar operations, and generates a global multi-query plan. Executing this plan completes all the queries at once and improves query efficiency.
Multi-query optimization methods fall into two classes. The first is global multi-query optimization, whose input is an unoptimized query set. Its advantage is that it generates a large number of candidate execution plans, so its output is often better; its drawbacks are a high optimization search cost and, because it cannot use the optimizer of the data query system, a higher implementation difficulty. The second is local multi-query optimization, whose input is the query set output by the optimizer of the data query system. Its advantages are a smaller search space and easier implementation. However, local multi-query optimization methods often ignore the cost of reusing intermediate results, so in actual execution the cost of reusing an intermediate result may exceed the cost of direct execution, which lowers system query performance instead of raising it.
Existing multi-query optimization methods consider only I/O cost, that is, the number of disk pages touched during query processing, when they estimate query plan cost with a cost function; they ignore CPU computation cost and network transmission cost. However, when the underlying layer uses a distributed computing architecture (as most big data query systems do) and the queries contain join operations, CPU computation and network transmission costs cannot be ignored, and the original cost model can no longer estimate query cost accurately.
Summary of the invention
To address the deficiencies of the prior art described above, the present invention provides a local multi-query optimization method based on predicate reduction and cost estimation. It reuses identical or similar intermediate results across queries and reduces repeated operations, thereby improving concurrent query performance and reducing query response time.
The local multi-query optimization method based on predicate reduction and cost estimation is divided into three phases: preprocessing, local multi-query optimization processing, and multi-query plan optimization. The specific steps are as follows:
(1) Preprocessing phase:
Step (1-1): use the existing query optimizer of the data query system to optimize each query in the query set, find the optimal query plan for each, and represent it as a query plan tree, finally obtaining a set of query plan trees;
Step (1-2): renumber the nodes in the query plan trees;
Step (1-3): define the node mapping map<Node key, Node value> M and initialize M to empty;
Step (1-4): add the nodes of all query plan trees to the global multi-query plan tree, and add a "super root node" to the global multi-query plan tree.
In step (1-2), the nodes are renumbered to produce the serial numbers of the nodes in the global multi-query plan tree.
In step (1-3), the mapping map<Node key, Node value> M indicates that the key node will reuse the result obtained by the value node. Here map is the data type and M is a variable of that type, which stores the mappings from key nodes to value nodes.
In step (1-4), the "super root node" points to the root node of each query in the query plan tree set.
By applying a local multi-query optimization method and using the existing query optimizer of the data query system to find an optimized query plan for each query statement, the invention reduces the search space of the algorithm and shortens the execution time of query optimization.
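As an illustration only, steps (1-2) to (1-4) can be sketched in Python as follows. The `Node` class, the `build_global_plan` function, and the post-order numbering scheme are assumptions made for this sketch; the original text does not specify how serial numbers are assigned.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Node:
    op: str                                   # e.g. "scan", "join", "super_root"
    children: list = field(default_factory=list)
    nid: int = -1                             # global serial number, assigned below

def build_global_plan(query_roots):
    """Renumber all nodes, create an empty mapping M, and attach a super root."""
    next_id = count(1)

    def renumber(node):
        # Assumed scheme: post-order, so children receive smaller numbers.
        for child in node.children:
            renumber(child)
        node.nid = next(next_id)

    for root in query_roots:                  # step (1-2)
        renumber(root)
    M = {}                                    # step (1-3): key node -> node whose result it reuses
    super_root = Node("super_root", list(query_roots), nid=0)  # step (1-4)
    return super_root, M

q1 = Node("join", [Node("scan"), Node("scan")])
q2 = Node("scan")
super_root, M = build_global_plan([q1, q2])
print([c.nid for c in super_root.children])   # [3, 4]
```

The super root simply holds the root of every query, so a single traversal from it reaches every node of the global plan.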
(2) Local multi-query optimization processing phase:
Step (2-1): traverse the node set V = {v_1, v_2, ..., v_n} of the global multi-query plan tree. If V contains nodes equivalent to the current node, select the node v_j* with the smallest number among the equivalent nodes in V and perform a merge operation, replacing all equivalent nodes v_i ∈ V - {v_j*} with v_j*;
Step (2-2): for each node v_i, find the subtask task_j* that satisfies the "strongest reduction condition", add the mapping from v_j* to v_i to M, connect the two nodes with a directed edge v_j* → v_i, and change the operation description of v_j* so that the result of subtask task_i, which corresponds to node v_i, becomes the input of subtask task_j*;
Step (2-3): repeat step (2-1) and step (2-2), applying equivalence replacement and reduction to the nodes of the global multi-query plan tree until it cannot be simplified further;
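The merge in step (2-1) can be sketched as follows, assuming nodes are held in a dictionary keyed by serial number and that an equivalence test is supplied by the caller. The function name `merge_equivalent` and the string node descriptions are hypothetical and used only for illustration.

```python
def merge_equivalent(nodes, equivalent):
    """Map each node number to the smallest number among its equivalent nodes."""
    representative = {}
    for nid in sorted(nodes):
        # Compare against the representatives chosen so far, smallest first.
        for rep in sorted(set(representative.values())):
            if equivalent(nodes[nid], nodes[rep]):
                representative[nid] = rep
                break
        else:
            representative[nid] = nid         # no equivalent node seen yet
    return representative

nodes = {
    1: "scan t col1 in (3, 8)",
    2: "scan u col2 in (0, 5)",
    3: "scan t col1 in (3, 8)",
}
print(merge_equivalent(nodes, lambda x, y: x == y))   # {1: 1, 2: 2, 3: 1}
```

Because nodes are visited in increasing order of serial number, the first member of each equivalence group becomes its representative, which matches the "smallest number" rule of step (2-1).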
In step (2-1), the process of judging whether two nodes are equivalent mainly includes:
(a) check that the two nodes have the same type and operate on the same tables and columns; if not, return false, indicating that the two nodes are not equivalent;
(b) judge equivalence according to node type: if the nodes are disk scan nodes, use the predicates in the nodes to judge whether both select the same range of data on the same column; if so, return true, indicating that they are equivalent; otherwise return false, indicating that they are not;
(c) if the nodes are join nodes, first judge whether the join conditions are identical; if they are, recursively judge whether their left and right children are equivalent. Because a join can take two tree shapes, with the left and right child nodes swapped, both shapes must be checked. If either shape satisfies the equivalence relation, return true; otherwise return false;
(d) if the nodes are of another type, such as aggregation nodes or data transmission nodes, recursively judge whether their child nodes are equivalent.
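Rules (a) to (d) above can be sketched as a recursive function. The `Node` fields (`kind`, `table_cols`, `predicate`, `children`) are an assumed representation chosen for this sketch, and joins are assumed to be binary.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                 # "scan", "join", "aggregate", "exchange"
    table_cols: frozenset = frozenset()       # tables and columns the node operates on
    predicate: tuple = ()                     # scan: (column, low, high); join: join condition
    children: list = field(default_factory=list)

def equivalent(a, b):
    # (a) same node type, same tables and columns
    if a.kind != b.kind or a.table_cols != b.table_cols:
        return False
    # (b) disk scans: same range on the same column
    if a.kind == "scan":
        return a.predicate == b.predicate
    # (c) joins: same condition, children equivalent in either order
    if a.kind == "join":
        if a.predicate != b.predicate:
            return False
        (al, ar), (bl, br) = a.children, b.children
        return ((equivalent(al, bl) and equivalent(ar, br))
                or (equivalent(al, br) and equivalent(ar, bl)))
    # (d) other nodes: compare children pairwise
    return len(a.children) == len(b.children) and all(
        equivalent(x, y) for x, y in zip(a.children, b.children))

s1 = Node("scan", frozenset({"t.col1"}), ("col1", 3, 8))
s2 = Node("scan", frozenset({"u.col2"}), ("col2", 0, 5))
cols = frozenset({"t.col1", "u.col2"})
j1 = Node("join", cols, ("t.col1 = u.col2",), [s1, s2])
j2 = Node("join", cols, ("t.col1 = u.col2",), [s2, s1])
print(equivalent(j1, j2))   # True: children match once swapped
print(equivalent(s1, s2))   # False: different tables and columns
```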
In step (2-2), v_j* → v_i indicates the direction of data flow.
In step (2-2), for each subtask task_i, if the execution result of task_i contains the result of another subtask, that subtask is considered reducible to task_i. Among all subtasks that can be reduced to task_i, the one whose result is most similar to that of task_i is called the subtask task_j* that satisfies the "strongest reduction condition".
In step (2-2), the "result" is the intermediate data obtained after a task finishes executing.
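One possible reading of the "strongest reduction condition" is sketched below. It assumes that every subtask is a range scan on the same column, represented as a (low, high) pair, and that "most similar" means the widest range contained in the range of task_i. Both assumptions, and the name `strongest_reduction`, are illustrative and not taken from the original text.

```python
def strongest_reduction(provider, candidates):
    """Among ranges contained in `provider`, return the widest one (or None)."""
    p_low, p_high = provider
    contained = [(low, high) for low, high in candidates
                 if p_low <= low and high <= p_high and (low, high) != provider]
    return max(contained, key=lambda r: r[1] - r[0], default=None)

provider = (0, 10)                       # task_i: 0 < col1 < 10
others = [(3, 8), (4, 5), (2, 20)]       # ranges of the other subtasks
print(strongest_reduction(provider, others))   # (3, 8)
```

Here (2, 20) is rejected because it is not contained in the provider's range, and (3, 8) is preferred over (4, 5) because it is closer to the provider's range.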
In step (2-3), the process of judging whether two nodes in the global multi-query plan tree are in a reduction relationship mainly includes:
First, judge whether both nodes are disk scan nodes; if not, return false, indicating that the two nodes are not in a reduction relationship;
Second, judge whether the two nodes operate on the same table and column; if not, return false, indicating that the two nodes are not in a reduction relationship;
Finally, judge the reduction relationship of the two nodes according to the reduction relation table; return true if it is satisfied and false otherwise. The reduction relation table is shown in Table 1.
Table 1: Reduction relation dependency table
Note 1: (a1 => b1 and a2 => b2) or (a1 => b2 and a2 => b1) or (a => b1) or (a => b2)
In the table above, m and n denote integers or floating-point numbers. When m and n in predicates a and b satisfy the size relationship given in the table, a can be reduced to b. When b consists of two predicates b1 and b2 joined by "and", the reduction relationship between a and each of b1 and b2 must also be judged. For example, if b consists of b1: col1 < 8 and b2: col1 > 3, and a is col1 > 2, then b2 can be reduced to a, and therefore b can be reduced to a.
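The example above can be checked with a small sketch that treats each predicate as an open interval on one column and tests containment. This is an assumed simplification: it handles only the operators `<` and `>`, and it does not reproduce the full reduction relation table. The names `interval` and `reducible_to` are hypothetical.

```python
import math

def interval(conjuncts):
    """Intersect conjuncts such as ('>', 3) and ('<', 8) into one open interval."""
    low, high = -math.inf, math.inf
    for op, value in conjuncts:
        if op == ">":
            low = max(low, value)
        elif op == "<":
            high = min(high, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return low, high

def reducible_to(b, a):
    """True if every row selected by b is also selected by a (b reduces to a)."""
    b_low, b_high = interval(b)
    a_low, a_high = interval(a)
    return a_low <= b_low and b_high <= a_high

a = [(">", 2)]                  # a:  col1 > 2
b = [(">", 3), ("<", 8)]        # b:  col1 > 3 and col1 < 8
print(reducible_to(b, a))       # True
print(reducible_to(a, b))       # False
```

The scan for b can therefore read the intermediate result of a and apply its own, narrower filter, whereas the reverse is not possible.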
In step (2-3), because step (2-1) and step (2-2) change the structure of the global multi-query plan tree and therefore the execution cost of the tasks corresponding to its nodes, step (2-1) and step (2-2) are repeated, applying equivalence replacement and reduction to the nodes of the global multi-query plan tree until it cannot be simplified further.
(3) Multi-query plan optimization phase:
Step (3-1): obtain the mapping and the global multi-query plan tree produced by the local multi-query optimization processing phase;
Step (3-2): traverse the nodes of the global multi-query plan tree according to the mapping map<Node key, Node value> M; if the traversal is complete, go to step (3-8);
Step (3-3): estimate the direct cost and the reuse cost according to the cost model corresponding to each task;
Step (3-4): compare the reuse cost with the direct cost; if the reuse cost is greater than the direct cost, go to step (3-5); if the reuse cost is less than the direct cost, go to step (3-6);
Step (3-5): map the node to itself in the mapping, so that its task is executed directly without using an intermediate result, and then go to step (3-7);
Step (3-6): repeat step (3-3) and step (3-4) to judge whether there is a node with a lower reuse cost; if there is, update the mapping so that the node maps to the node with the lower reuse cost;
Step (3-7): repeat step (3-2);
Step (3-8): return the global multi-query plan tree.
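The loop formed by steps (3-2) to (3-8) can be sketched as follows. The function `optimize_plan`, the cost callables, and the `candidates` dictionary of alternative providers are assumptions made for illustration; the original text does not say how alternative nodes are enumerated. The case where the two costs are equal is not covered by the original steps and falls into the reuse branch here.

```python
def optimize_plan(M, direct_cost, reuse_cost, candidates):
    """Keep, drop, or redirect each reuse mapping in M, then return M."""
    for key in list(M):                       # step (3-2): traverse mapped nodes
        provider = M[key]
        # steps (3-3) and (3-4): estimate and compare the two costs
        if reuse_cost(key, provider) > direct_cost(key):
            M[key] = key                      # step (3-5): execute directly
            continue
        # step (3-6): look for a provider that is cheaper to reuse
        best = min(candidates.get(key, [provider]),
                   key=lambda p: reuse_cost(key, p))
        if reuse_cost(key, best) < reuse_cost(key, provider):
            M[key] = best
    return M                                  # step (3-8)

direct = {"a": 5.0, "b": 9.0}
reuse = {("a", "x"): 7.0, ("b", "x"): 6.0, ("b", "y"): 4.0}
M = optimize_plan({"a": "x", "b": "x"},
                  lambda k: direct[k],
                  lambda k, p: reuse[(k, p)],
                  {"b": ["x", "y"]})
print(M)   # {'a': 'a', 'b': 'y'}
```

Node a is cheaper to execute directly, so it is mapped to itself; node b keeps reusing an intermediate result but switches to the cheaper provider y.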
In step (3-3), if a node is mapped to another node in the mapping, the execution result of the node's task depends on the execution result of the task of that other node (the depended-on node).
In step (3-3), because a query consists of multiple subtasks, the direct cost, the reuse cost, and the network transmission cost of intermediate results are estimated according to the cost model corresponding to each task.
In step (3-3), the direct cost is the CPU computation cost of a node in the global multi-query plan tree executing its task directly; the reuse cost is the CPU computation cost of the dependent node's task plus the cost of transmitting over the network, and computing on, the result produced by the task of the depended-on node.
In step (3-3), a query plan tree consists of multiple plan nodes, each corresponding to one task in the query. Models are built mainly for disk scan tasks, join tasks, and network transmission tasks, and the cost of each task is estimated according to its node type. The cost models and cost estimation methods are as follows:
(a) Disk scan node
A disk scan node represents the process of reading data from disk into memory and filtering out the data that satisfy the predicate. The data read cost consists of the average seek time of the disk arm t_seek, the average rotational latency of the head t_latency, and the data read time t_read; the cost of predicate filtering depends on the specific predicate and the data distribution;
When a query plan tree node is a disk scan node, the disk scan cost model is used to estimate the cost of the disk scan task.
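Read as a formula, the disk scan model above could be written as follows. This is a sketch that assumes the components are simply additive; C_scan and C_filter (the predicate filtering cost) are symbols introduced here for illustration and do not appear in the original text.

```latex
C_{\text{scan}} = t_{\text{seek}} + t_{\text{latency}} + t_{\text{read}} + C_{\text{filter}}
```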
(b) Join node
The time of a join operation consists of the time to compute a tuple's hash value t_hashTuple, the time to build a hash table in memory for the right table's data t_build, the time to insert a participating tuple into the hash table t_insertTuple, and the time for tuples of the left and right tables to complete the join t_joinTuple. Each of these times is determined mainly by the machine's CPU performance and the sizes of the left and right tables;
When a query plan tree node is a join node, the join cost model is used to estimate the cost of the join task.
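The four time components correspond to the phases of an in-memory hash join that builds its hash table on the right table. The following minimal sketch shows where each component would be incurred; the function `hash_join` and the sample tables are illustrative and are not the implementation of any particular query system.

```python
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    """Build a hash table on the right table, then probe it with the left table."""
    table = defaultdict(list)                  # t_build: hash table for the right table
    for row in right:
        # t_hashTuple + t_insertTuple, once per right-table tuple
        table[row[right_key]].append(row)
    joined = []
    for row in left:
        # t_hashTuple + t_joinTuple, once per left-table tuple
        for match in table.get(row[left_key], []):
            joined.append({**row, **match})
    return joined

orders = [{"oid": 1, "cid": 10}, {"oid": 2, "cid": 11}, {"oid": 3, "cid": 10}]
customers = [{"cid": 10, "name": "Ann"}, {"cid": 12, "name": "Bo"}]
result = hash_join(orders, customers, "cid", "cid")
print([r["oid"] for r in result])   # [1, 3]
```

The work in the first loop grows with the size of the right table and the work in the second loop with the size of the left table, which is why the text ties the join time to both table sizes.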
(c) Data transmission node
A data transmission node is mainly responsible for receiving and aggregating transmitted data. Data transmission tasks can run in parallel on different hosts, so the completion time t_exchange of the node's operation is determined by the moment the last data transmission task finishes. The time cost of transmission is determined mainly by the number of bytes transferred, TransferByte, and the network bandwidth of the current cluster, Net_band;
When a query plan tree node is a data transmission node, the data transmission cost model is used to estimate the cost of the data transmission task.
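A minimal sketch of the data transmission model, assuming that each host's transfer time is TransferByte divided by Net_band and that every host sees the full bandwidth. The per-host start times and the function name `exchange_completion_time` are hypothetical additions for illustration.

```python
def exchange_completion_time(transfers, net_band):
    """Return t_exchange: the moment the last parallel transfer finishes.

    transfers: (start_time_seconds, transfer_bytes) per host
    net_band:  network bandwidth in bytes per second
    """
    return max(start + nbytes / net_band for start, nbytes in transfers)

# Two hosts: one sends 2 MB from t = 0.0 s, the other 1 MB from t = 0.5 s.
print(exchange_completion_time([(0.0, 2_000_000), (0.5, 1_000_000)], 1_000_000))  # 2.0
```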
The invention builds cost models for query processing and defines a query processing cost for each kind of operation, which improves the accuracy of query cost estimation and helps the algorithm select an efficient multi-query plan.
In step (3-6), a reuse cost lower than the cost of directly executing the node's task indicates that the intermediate result of the depended-on task can be used. Step (3-3) and step (3-4) are then repeated to judge whether there is a node with a lower reuse cost; if there is, the mapping is updated so that the node maps to the node with the lower reuse cost.
The invention uses the existing query optimizer of the data query system to find an optimized query plan for each query statement, applies equivalence replacement or reduction over multiple iterations to identical or similar parts of the query plans to generate a global multi-query plan, and, by estimating the reuse cost, decides whether to execute a task directly or to reuse an intermediate result, thereby optimizing the global multi-query plan and reducing query response time. Compared with traditional multi-query optimization methods, the advantages of the invention include:
(1) by applying a local multi-query optimization method and using the existing query optimizer of the data query system to find an optimized query plan for each query statement, it reduces the search space of the algorithm and shortens the execution time of query optimization;
(2) it builds cost models for query processing and defines a query processing cost for each kind of operation, which improves the accuracy of query cost estimation and helps the algorithm select an efficient multi-query plan;
(3) it optimizes the generated global multi-query plan, fully accounting for the cost of reusing intermediate results, which prevents subsequent tasks from waiting too long, preserves system concurrency, and improves query execution efficiency.
Brief description of the drawings
Fig. 1: flow chart of the local multi-query optimization method based on predicate reduction and cost estimation;
Fig. 2: schematic diagram of a query plan tree.
Detailed description of the embodiments
To describe the invention more concretely, the technical solution of the invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the local multi-query optimization method based on predicate reduction and cost estimation is divided into three phases: preprocessing, local multi-query optimization processing, and multi-query plan optimization.
(1) The main steps of the preprocessing phase are:
Step (1-1): use the existing query optimizer of the data query system to optimize each query in the input query set, find the optimal query plan for each, and represent it as a query plan tree, finally obtaining a set of query plan trees;
A query plan tree is denoted T(V, E, D), where V is the set of all nodes in the query plan tree, E is the set of all edges, and D is the description of the nodes' concrete operations (including the predicates involved in each operation). Each subtask corresponds to one query plan tree node, and a node carries operation information such as the tables and columns the operation involves. The leaf nodes of a query plan are disk scan nodes, which are responsible for reading data; non-leaf nodes represent different algebraic operations. A non-leaf node uses the data from its child nodes, and nodes are connected by edges. A query plan tree is shown in Fig. 2;
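The triple T(V, E, D) maps naturally onto a small data structure. The sketch below is one possible encoding, assuming integer node identifiers, (child, parent) edges that follow the data flow, and free-text operation descriptions; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class QueryPlanTree:
    V: set = field(default_factory=set)      # node identifiers
    E: set = field(default_factory=set)      # (child, parent) edges: data flows child -> parent
    D: dict = field(default_factory=dict)    # node identifier -> operation description

    def add(self, node, description, children=()):
        self.V.add(node)
        self.D[node] = description
        self.E.update((child, node) for child in children)

    def leaves(self):
        """Nodes with no children, which are the disk scan nodes."""
        parents = {parent for _, parent in self.E}
        return self.V - parents

t = QueryPlanTree()
t.add(1, "scan t where col1 > 3")
t.add(2, "scan u")
t.add(3, "join t.col1 = u.col2", children=(1, 2))
print(sorted(t.leaves()))   # [1, 2]
```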
Step (1-2): renumber the nodes in the queries to produce the serial numbers of the nodes in the global multi-query plan tree;
Step (1-3): define the node mapping map<Node key, Node value> M and initialize M to empty; the mapping indicates that the key node will reuse the result obtained by the value node. Then add all nodes to the global multi-query plan tree and add a "super root node" to it, which points to the root node of each query in the query plan tree set.
(2) The main steps of the local multi-query optimization processing phase are:
Step (2-1): traverse the set of query nodes V = {v_1, v_2, ..., v_n}. If V contains nodes equivalent to the current node, select the node v_j* with the smallest number among the equivalent nodes in V and perform a merge operation, that is, replace all equivalent nodes v_i ∈ V - {v_j*} with v_j*;
The process of judging whether two nodes are equivalent mainly includes:
(a) check that the two nodes have the same type and operate on the same tables and columns; if not, return false, indicating that the two nodes are not equivalent;
(b) judge equivalence according to node type: if the nodes are disk scan nodes, use the predicates in the nodes to judge whether both select the same range of data on the same column; if so, return true, indicating that they are equivalent; otherwise return false, indicating that they are not;
(c) if the nodes are join nodes, first judge whether the join conditions are identical; if they are, recursively judge whether their left and right children are equivalent. Because a join can take two tree shapes, with the left and right child nodes swapped, both shapes must be checked. If either shape satisfies the equivalence relation, return true; otherwise return false;
(d) if the nodes are of another type, such as aggregation nodes or data transmission nodes, recursively judge whether their child nodes are equivalent.
Step (2-2): for each node v_i, find the subtask task_j* that satisfies the "strongest reduction condition", add the mapping from v_j* to v_i to M, connect the two nodes with a directed edge v_j* → v_i, and change the operation description of v_j* so that the result of subtask task_i, which corresponds to node v_i, becomes the input of subtask task_j*;
For each subtask task_i, if the result of task_i contains the result of another subtask, that subtask is considered reducible to task_i. Among all subtasks that can be reduced to task_i, the one whose result is most similar to that of task_i is called the subtask task_j* that satisfies the "strongest reduction condition".
The process of judging whether two nodes are in a reduction relationship mainly includes:
First, judge whether both nodes are disk scan nodes; if not, return false, indicating that the two nodes are not in a reduction relationship;
Second, judge whether the two nodes operate on the same table and column; if not, return false, indicating that the two nodes are not in a reduction relationship;
Finally, judge the reduction relationship of the two nodes according to the reduction relation table; return true if it is satisfied and false otherwise. The reduction relation table is shown in Table 1.
Step (2-3): because step (2-1) and step (2-2) change the structure of the global multi-query plan tree and therefore the execution cost of the tasks corresponding to its nodes, repeat step (2-1) and step (2-2), applying equivalence replacement and reduction to the nodes of the plan tree until the global multi-query plan tree cannot be simplified further.
Table 1: Reduction relation dependency table
Note 1: (a1 => b1 and a2 => b2) or (a1 => b2 and a2 => b1) or (a => b1) or (a => b2)
(3) The multi-query plan optimization phase mainly comprises the following steps:
Step (3-1): obtain the mapping and the global multi-query plan tree produced by the local multi-query optimization processing phase;
Step (3-2): traverse the nodes of the global multi-query plan tree according to the mapping map<Node key, Node value> M; if the traversal is complete, go to step (3-8);
Step (3-3): because a query consists of multiple subtasks, estimate the direct cost, the reuse cost, and the network transmission cost of intermediate results according to the cost model corresponding to each task;
The cost estimation methods are as follows:
(a) Disk scan node
A disk scan node represents the process of reading data from disk into memory and filtering out the data that satisfy the predicate. The data read cost consists of the average seek time of the disk arm t_seek, the average rotational latency of the head t_latency, and the data read time t_read; the cost of predicate filtering depends on the specific predicate and the data distribution;
(b) Join node
The time of a join operation consists of the time to compute a tuple's hash value t_hashTuple, the time to build a hash table in memory for the right table's data t_build, the time to insert a participating tuple into the hash table t_insertTuple, and the time for tuples of the left and right tables to complete the join t_joinTuple. Each of these times is determined mainly by the machine's CPU performance and the sizes of the left and right tables;
(c) Data transmission node
A data transmission node is mainly responsible for receiving and aggregating transmitted data. Data transmission tasks can run in parallel on different hosts, so the completion time t_exchange of the node's operation is determined by the moment the last data transmission task finishes. The time cost of transmission is determined mainly by the number of bytes transferred, TransferByte, and the network bandwidth of the current cluster, Net_band.
Step (3-4): compare the reuse cost with the direct cost; if the reuse cost is greater than the direct cost, go to step (3-5); if the reuse cost is less than the direct cost, go to step (3-6);
Step (3-5): map the node to itself in the mapping, so that its task is executed directly without using an intermediate result, and then go to step (3-7);
Step (3-6): repeat step (3-3) and step (3-4) to judge whether there is a node with a lower reuse cost; if there is, update the mapping so that the node maps to the node with the lower reuse cost;
Step (3-7): repeat step (3-2);
Step (3-8): return the global multi-query plan tree.
The specific embodiment described above explains the technical solution and beneficial effects of the invention in detail. It should be understood that the foregoing is only the most preferred embodiment of the invention and is not intended to limit it; any modification, supplement, or equivalent replacement made within the scope of the principles of the invention shall fall within the scope of protection of the invention.

Claims (4)

1. a kind of local multi-query optimization method based on predicate specification and cost estimation, it is characterised in that: be divided into pretreatment, office The processing of portion's multi-query optimization and more inquiry plans optimize three phases, the specific steps are as follows:
(1) pretreatment stage:
Step (1-1) carries out every inquiry in query set excellent using the existing query optimizer of data query system Change, finds optimal inquiry plan respectively, and indicate in the form of query plan tree, obtain query plan tree set;
Step (1-2), redefines the node serial number in query plan tree;
Step (1-3), definition node mapping relations map<Node key, Node value>M, and M is initialized as sky;
Node in all query plan trees is added in global more query plan trees by step (1-4), and looks into the overall situation more It askes in plan tree and adds " super root node ", wherein " super root node " is directed toward the root node of each inquiry in query plan tree set;
(2) local multi-query optimization processing stage:
Step (2-1), the node set V={ v in the global more query plan trees of traversal queries1,v2,...,vn, if in set V In the presence of the node with present node equivalence, select to number the smallest node v in equivalence query node in node set Vj *, closed And operate, use vj *Instead of all vi∈V-{vj *};
Step (2-2), for each node vi, find the subtask task for meeting " most strong reduction condition "j *, v is added in Mj * To viMapping, v is connected by directed line segmentj *→vi, and change vj *Operation description, with node viCorresponding subtask taskiResult as subtask taskj *Input, wherein for each subtask taskiIf taskiImplementing result Contain the result of other subtasks, then it is assumed that other subtasks can be by specification to taski, can be by specification to taskiInstitute Have in subtask, as a result with taskiMost similar subtask is referred to as the subtask task for meeting " most strong reduction condition "j *
Step (2-3) repeats step (2-1) and step (2-2), carries out equivalencing to the node in global more query plan trees It is replaced with specification, until can not be further simplified global more query plan trees;
(3) Multi-query plan optimization stage:
Step (3-1), obtain the mapping relation and the global multi-query plan tree produced by the local multi-query optimization processing stage;
Step (3-2), traverse the nodes in the global multi-query plan tree according to the mapping relation map<Node key, Node value> M; if the traversal is complete, execute step (3-8);
Step (3-3), estimate the direct cost and the reuse cost according to the cost model corresponding to each kind of task, where the direct cost is the CPU computation cost of the node in the global multi-query plan tree directly executing its corresponding task, and the reuse cost is the CPU computation cost of the task corresponding to the depended-on node plus the cost of network transmission and computation on the result produced by that task;
Step (3-4), compare the reuse cost with the direct cost; if the reuse cost is greater than the direct cost, execute step (3-5); if the reuse cost is less than the direct cost, execute step (3-6);
Step (3-5), map the node to itself in the mapping relation, execute the node's corresponding task directly without using intermediate results, and then execute step (3-7);
Step (3-6), repeat step (3-3) and step (3-4) to judge whether a node with a lower reuse cost exists; if so, update the mapping relation so that the node is mapped to the node with the lower reuse cost;
Step (3-7), repeat step (3-2);
Step (3-8), return the global multi-query plan tree.
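The cost comparison in steps (3-2) to (3-8) can be sketched as below. This is only an illustrative sketch, not the patented implementation: the function name `optimize_mapping`, the `candidates` dictionary of alternative reuse sources, and the two cost callables are hypothetical, and the sketch returns the updated mapping rather than the plan tree itself. The claim does not say what happens when the two costs are equal; the sketch assumes reuse is kept in that case.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Node = Hashable


def optimize_mapping(
    nodes: Iterable[Node],
    mapping: Dict[Node, Node],
    candidates: Dict[Node, List[Node]],
    direct_cost: Callable[[Node], float],
    reuse_cost: Callable[[Node, Node], float],
) -> Dict[Node, Node]:
    """Decide per node whether to reuse another node's result or run directly.

    mapping:    M, node -> node whose intermediate result it would reuse.
    candidates: hypothetical list of alternative nodes a node could reuse.
    """
    result = dict(mapping)
    for node in nodes:                          # step (3-2): traverse the nodes
        source = result.get(node, node)
        if source == node:
            continue                            # node already executes directly
        direct = direct_cost(node)              # step (3-3): estimate both costs
        reuse = reuse_cost(node, source)
        if reuse > direct:                      # steps (3-4) and (3-5)
            result[node] = node                 # map the node to itself
            continue
        # Step (3-6): look for a source with a lower reuse cost.
        # Assumption: equal costs fall through to this branch.
        for alt in candidates.get(node, []):
            alt_cost = reuse_cost(node, alt)
            if alt_cost < reuse:
                source, reuse = alt, alt_cost
        result[node] = source
    return result                               # step (3-8), as a mapping
```

In a fuller version, `reuse_cost` would add the CPU cost of the depended-on task to the cost of transmitting and processing its result, as step (3-3) describes.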
2. The local multi-query optimization method based on predicate specification and cost estimation according to claim 1, characterized in that, in step (2-1), the process of judging whether two nodes are in an equivalence relation mainly comprises:
Step (a), check, as a precondition, that the two nodes have the same type and operate on the same table and columns; if the condition is not met, return false;
Step (b), judge the equivalence relation according to the node type: if the node is a disk scan node, judge from the predicates in the nodes whether the two nodes select data of the same range on the same column; if so, return true, otherwise return false;
Step (c), if the node is a join node, judge whether the join conditions are identical; if they are, recursively judge whether the left and right children are equivalent; if any one of the tree shapes satisfies the equivalence relation, return true, otherwise return false;
Step (d), if the node is any other kind of node, including aggregation operation nodes and data transmission nodes, recursively judge whether its child nodes are equivalent.
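Steps (a) to (d) describe a recursive comparison, which might look like the following minimal sketch. The `PlanNode` class, its field names, and the `(column, low, high)` predicate encoding are assumptions made for illustration. Reading "one of the tree shapes" in step (c) as "children match either in order or swapped" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PlanNode:
    kind: str                                   # "scan", "join", or any other type
    table: str = ""
    columns: Tuple[str, ...] = ()
    # Hypothetical encoding of a scan predicate: (column, low, high).
    predicate: Optional[Tuple[str, float, float]] = None
    join_condition: str = ""
    children: Tuple["PlanNode", ...] = ()


def equivalent(a: PlanNode, b: PlanNode) -> bool:
    # Step (a): same node type, same table and columns.
    if a.kind != b.kind or a.table != b.table or a.columns != b.columns:
        return False
    # Step (b): scan nodes must select the same range on the same column.
    if a.kind == "scan":
        return a.predicate == b.predicate
    # Step (c): join nodes need identical join conditions and equivalent
    # children, either in the same order or swapped (assumed reading).
    if a.kind == "join":
        if a.join_condition != b.join_condition:
            return False
        if len(a.children) != 2 or len(b.children) != 2:
            return False
        (al, ar), (bl, br) = a.children, b.children
        return (equivalent(al, bl) and equivalent(ar, br)) or (
            equivalent(al, br) and equivalent(ar, bl)
        )
    # Step (d): other nodes (aggregation, data transmission) recurse on children.
    if len(a.children) != len(b.children):
        return False
    return all(equivalent(x, y) for x, y in zip(a.children, b.children))
```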
3. The local multi-query optimization method based on predicate specification and cost estimation according to claim 1, characterized in that, before the reduction (specification) replacement in step (2-3) is performed, it is judged whether nodes in the global multi-query plan tree are in a reduction relationship, the process mainly comprising:
First, judging whether both nodes are disk scan nodes, and returning false if they are not;
Second, judging whether the two nodes operate on the same table and column, and returning false if they do not;
Finally, judging the reduction relationship of the two nodes according to the reduction relation table, returning true if the reduction relationship holds and false otherwise.
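The three checks above, together with the "strongest reduction condition" of step (2-2), can be illustrated as follows. The reduction relation table is not reproduced in this section, so the sketch substitutes a simple range-containment test as a stand-in. The `ScanTask` class, its fields, and the use of range width as the measure of "most similar result" are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ScanTask:
    name: str
    kind: str
    table: str
    column: str
    low: float
    high: float


def reducible_to(other: ScanTask, target: ScanTask) -> bool:
    """True if `other` can be reduced to `target` (target's result contains it)."""
    # First check: both nodes must be disk scan nodes.
    if other.kind != "scan" or target.kind != "scan":
        return False
    # Second check: same table and column.
    if (other.table, other.column) != (target.table, target.column):
        return False
    # Final check. Assumption: range containment stands in for the patent's
    # reduction relation table; identical ranges count as equivalence instead.
    contained = target.low <= other.low and other.high <= target.high
    identical = (other.low, other.high) == (target.low, target.high)
    return contained and not identical


def strongest_reduction(target: ScanTask, tasks: List[ScanTask]) -> Optional[ScanTask]:
    """Among tasks reducible to `target`, pick the one closest to it (widest range)."""
    found = [t for t in tasks if t is not target and reducible_to(t, target)]
    if not found:
        return None
    return max(found, key=lambda t: t.high - t.low)


def build_mapping(tasks: List[ScanTask]) -> Dict[str, str]:
    """Build M as {reduced task name: name of the task whose result it reads}."""
    mapping: Dict[str, str] = {}
    for target in tasks:
        best = strongest_reduction(target, tasks)
        if best is not None:
            # A later target overwrites an earlier one; the source text does
            # not specify how such conflicts are resolved.
            mapping[best.name] = target.name
    return mapping
```

With scans over ranges [0, 100], [0, 50], and [10, 20] on the same column, the sketch chains the narrowest scan to the middle one and the middle one to the widest, so each scan reads the smallest available superset.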
4. The local multi-query optimization method based on predicate specification and cost estimation according to claim 1, characterized in that, in step (3-3), the cost estimation methods include a disk scan node method, a join operation node method, and a data transmission node method.
CN201610833428.3A 2016-09-20 2016-09-20 Local multi-query optimization method based on predicate specification and cost estimation Active CN106446134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610833428.3A CN106446134B (en) 2016-09-20 2016-09-20 Local multi-query optimization method based on predicate specification and cost estimation

Publications (2)

Publication Number Publication Date
CN106446134A CN106446134A (en) 2017-02-22
CN106446134B true CN106446134B (en) 2019-07-09

Family

ID=58166360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610833428.3A Active CN106446134B (en) 2016-09-20 2016-09-20 Local multi-query optimization method based on predicate specification and cost estimation

Country Status (1)

Country Link
CN (1) CN106446134B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133281B (en) * 2017-04-14 2020-12-15 浙江鸿程计算机系统有限公司 Global multi-query optimization method based on grouping
CN107301205A (en) * 2017-06-01 2017-10-27 华南理工大学 A kind of distributed Query method in real time of big data and system
CN107885865B (en) 2017-11-22 2019-12-10 星环信息科技(上海)有限公司 Cost optimizer and cost estimation method and equipment
CN108304517A (en) * 2018-01-23 2018-07-20 西南大学 Efficient nested querying method based on Complex event processing system
CN110297766B (en) * 2019-06-03 2023-05-30 合肥移瑞通信技术有限公司 Software testing method and software testing system based on distributed test node cluster
WO2021007816A1 (en) * 2019-07-17 2021-01-21 Alibaba Group Holding Limited Method and system for generating and executing query plan
CN115599811A (en) * 2021-07-09 2023-01-13 华为技术有限公司(Cn) Data processing method and device and computing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609493A (en) * 2012-01-20 2012-07-25 东华大学 Connection sequence inquiry optimizing method based on column-storage model
CN103198133A (en) * 2013-04-12 2013-07-10 同方知网(北京)技术有限公司 Query optimization method for converting XPath (XML path language) query into tree-form data structure
CN104408106A (en) * 2014-11-20 2015-03-11 浙江大学 Scheduling method for big data inquiry in distributed file system
CN104504018A (en) * 2014-12-11 2015-04-08 浙江大学 Top-down real-time big data query optimization method based on bushy tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150032722A1 (en) * 2013-03-15 2015-01-29 Teradata Corporation Optimization of database queries for database systems and environments

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
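基于列存储的OLAP多查询优化策略研究与实现 (Research and Implementation of OLAP Multi-Query Optimization Strategies Based on Column Storage); 陆戌辰 (Lu Xuchen); 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology); 20130615; Chapter 3
基于改进DPhyp算法的Impala查询优化 (Impala Query Optimization Based on an Improved DPhyp Algorithm); 周强 等 (Zhou Qiang et al.); 《计算机研究与发展》 (Journal of Computer Research and Development); 20131230; pp. 114-120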

Also Published As

Publication number Publication date
CN106446134A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN106446134B (en) Local multi-query optimization method based on predicate specification and cost estimation
Koliousis et al. Saber: Window-based hybrid stream processing for heterogeneous architectures
Chen et al. ThunderGP: HLS-based graph processing framework on FPGAs
Wang et al. FlexGraph: a flexible and efficient distributed framework for GNN training
Boehm et al. On optimizing operator fusion plans for large-scale machine learning in systemml
CN107111653B (en) Query optimization of system memory load for parallel database systems
US9535975B2 (en) Parallel programming of in memory database utilizing extensible skeletons
Qiao et al. A fast parallel community discovery model on complex networks through approximate optimization
US11023443B2 (en) Collaborative planning for accelerating analytic queries
US20120072412A1 (en) Evaluating execution plan changes after a wakeup threshold time
JP6516110B2 (en) Multiple Query Optimization in SQL-on-Hadoop System
Funke et al. Data-parallel query processing on non-uniform data
CN105224452A (en) A kind of prediction cost optimization method for scientific program static analysis performance
CN104731969A (en) Mass data join aggregation query method, device and system in distributed environment
CN104778077A (en) High-speed extranuclear graph processing method and system based on random and continuous disk access
Shanoda et al. JOMR: Multi-join optimizer technique to enhance map-reduce job
Gao et al. GLog: A high level graph analysis system using MapReduce
Ye et al. Hippie: A data-paralleled pipeline approach to improve memory-efficiency and scalability for large dnn training
CN111984833B (en) High-performance graph mining method and system based on GPU
WO2018192479A1 (en) Adaptive code generation with a cost model for jit compiled execution in a database system
An et al. Using index in the mapreduce framework
Lai et al. GLogS: Interactive Graph Pattern Matching Query At Large Scale
CN104679521B (en) A kind of accurate calculating task cache WCET analysis method
CN110851178B (en) Inter-process program static analysis method based on distributed graph reachable computation
Werner et al. Automated composition and execution of hardware-accelerated operator graphs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant