WO2016078592A1 - Batch data query method and apparatus - Google Patents

Batch data query method and apparatus

Info

Publication number
WO2016078592A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
logical
operator
query plan
subtree
Prior art date
Application number
PCT/CN2015/094948
Other languages
English (en)
French (fr)
Inventor
李丰
王蕾
张赟
冯晓兵
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2016078592A1
Priority to US15/600,199 (US10769147B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G06F16/2454 Optimisation of common expressions
    • G06F16/2452 Query translation
    • G06F16/24524 Access plan code generation and invalidation; Reuse of access plans

Definitions

  • Embodiments of the invention relate to data warehouse query optimization technology, and in particular to a batch data query method and apparatus.
  • Representative big data query systems use a single query statement as the basic unit of parsing and optimization.
  • The basic query processing flow is as follows. First, each query statement that is read is parsed into a tree-structured logical query plan tree. Next, an implementation algorithm is chosen for each operator of the logical query plan tree and the execution order of the operators is determined, which converts the logical query plan tree into a physical query plan. Finally, the query execution engine executes the physical query plan and outputs the query results.
  • A key performance measure of a big data query system is query efficiency.
  • A common way to improve the query efficiency of a big data query system is to apply equivalence transformations to the logical query plan tree so as to reduce the number of tasks in the physical query plan and the execution overhead of each task, including the frequency of file system reads and writes, the amount of data transmitted over the network, and the amount of computation for query operations.
  • The traditional processing mode, which parses and optimizes with a single query statement as the basic unit, offers too few optimization opportunities.
  • For example, the job-flow correlation optimization newly added in Hive 0.12 accelerates only three of the 22 test cases in the standard test set TPC-H, and it places specific requirements on those cases.
  • In batch query application scenarios, the inter-query optimization opportunities, that is, the optimization opportunities that exist between multiple query statements, are not exploited.
  • Query optimization is performed only for a single query statement, so big data queries are inefficient.
  • The embodiments of the invention provide a batch data query method and apparatus to improve batch data query efficiency.
  • A first aspect of the present invention provides a batch data query method, including:
  • each query statement corresponds to one logical query plan tree;
  • each of the plurality of logical query plan trees includes multiple operators;
  • each operator of each logical query plan tree includes at least one operand;
  • dividing the plurality of logical query plan trees into multiple query groups according to each operator of each logical query plan tree includes:
  • determining the inter-query write-read dependency of each logical query plan tree according to each operator of each logical query plan tree and the data identifier of each operand of each operator of each logical query plan tree includes:
  • determining the operator overlap degree of each logical query plan tree according to the number of operators of each type includes:
  • performing inter-query optimization on the logical query plan trees within each query group, to generate a plurality of inter-query-optimized logical query plan trees, includes:
  • if the execution order of the first query subtree is before that of the second query subtree, the successor of the second query subtree is connected to the first query subtree and becomes a newly added successor of the first query subtree, so that the successor of the second query subtree reuses the query result of the first query subtree;
  • if the execution order of the first query subtree is after that of the second query subtree, the successor of the first query subtree is connected to the second query subtree and becomes a newly added successor of the second query subtree, so that the successor of the first query subtree reuses the query result of the second query subtree.
  • performing inter-query optimization on the logical query plan trees within each query group includes:
  • before the corresponding multiple physical query plan trees are generated from the optimized multiple logical query plan trees, the method further includes:
  • a second aspect of the present invention provides a bulk data query apparatus, including:
  • an obtaining module, configured to obtain a plurality of logical query plan trees corresponding to a plurality of query statements, where each query statement corresponds to one logical query plan tree, each of the plurality of logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand;
  • a grouping module, configured to divide the plurality of logical query plan trees into multiple query groups according to each operator of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship;
  • an optimization module, configured to perform inter-query optimization on the logical query plan trees in each query group;
  • a generating module, configured to generate corresponding multiple physical query plan trees according to the optimized multiple logical query plan trees; and
  • a query module, configured to execute physical queries according to the multiple physical query plan trees to obtain the query results of the plurality of query statements.
  • the grouping module includes:
  • a first grouping unit, configured to determine the inter-query write-read dependency of each logical query plan tree according to each operator of each logical query plan tree and the data identifier of each operand of each operator, to divide all logical query plan trees that have inter-query write-read dependencies into a first group, and to divide all logical query plan trees that have no inter-query write-read dependency into a second group;
  • the first grouping unit is further configured to divide the logical query plan trees in the first group into at least one query group according to the inter-query write-read dependency of each logical query plan tree in the first group;
  • a second grouping unit, configured to count the number of operators of each type in each logical query plan tree in the second group, to determine the operator overlap degree of each logical query plan tree from those counts, to divide the logical query plan trees in the second group that have an operator overlap relationship into a first sub-group, and to divide the remaining logical query plan trees in the second group into a second sub-group;
  • the second grouping unit is further configured to divide the logical query plan trees in the first sub-group into at least one query group according to the operator overlap degree of each logical query plan tree in the first sub-group;
  • a third grouping unit, configured to determine the operand overlap degree of each logical query plan tree in the second sub-group according to the data identifier of each operand of each operator of each logical query plan tree in the second sub-group, and to divide the logical query plan trees in the second sub-group into at least one query group according to that operand overlap degree.
  • the first grouping unit is specifically configured to:
  • the second grouping unit is specifically configured to:
  • the optimization module is specifically configured to:
  • if the execution order of the first query subtree is before that of the second query subtree, the successor of the second query subtree is connected to the first query subtree and becomes a newly added successor of the first query subtree, so that the successor of the second query subtree reuses the query result of the first query subtree;
  • if the execution order of the first query subtree is after that of the second query subtree, the successor of the first query subtree is connected to the second query subtree and becomes a newly added successor of the second query subtree, so that the successor of the first query subtree reuses the query result of the second query subtree.
  • the optimization module is specifically configured to:
  • the optimization module is further configured to:
  • In the batch data query method and apparatus of the embodiments of the present invention, multiple logical query plan trees corresponding to multiple query statements are obtained and divided into multiple query groups according to each operator of each logical query plan tree, such that the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship. Inter-query optimization is then performed on the logical query plan trees in each query group, and the corresponding multiple physical query plan trees are generated from the optimized logical query plan trees.
  • FIG. 1 is a flowchart of a method for querying bulk data according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a method for querying bulk data according to Embodiment 2 of the present invention
  • FIG. 3 is a flowchart of a method for querying bulk data according to Embodiment 3 of the present invention.
  • FIG. 4 is a schematic structural diagram of a bulk data query apparatus according to Embodiment 4 of the present invention.
  • FIG. 5 is a schematic structural diagram of a batch data query apparatus according to Embodiment 5 of the present invention.
  • FIG. 6 is a schematic structural diagram of a bulk data query apparatus according to Embodiment 6 of the present invention.
  • The method of the embodiments of the present invention is mainly applied in batch data query scenarios, for example in a data warehouse. A data warehouse offers many opportunities for batch data queries and holds a large amount of data, which is usually stored in a distributed file system.
  • FIG. 1 is a flowchart of a batch data query method according to Embodiment 1 of the present invention.
  • The method provided in this embodiment may be performed by a batch data query apparatus, which is specifically a database server.
  • As shown in FIG. 1, the method of this embodiment may include the following steps:
  • Step 101: Acquire multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the multiple logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand.
  • Each query sent by a client is parsed on the server side into a tree-structured logical query plan tree.
  • The logical query plan tree includes multiple nodes; each node corresponds to one operator, and each operator references one or more operands. Commonly used operators include the scan operator, sort operator, select operator, aggregate operator, product operator, join operator, and file sink operator (Filesink Operator).
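As a rough illustration of the structure just described, the following Python sketch models a logical query plan tree as operator nodes that each reference one or more operands. The class names, field names, and the `name@version` form of the data identifier are assumptions made for this example only; the text does not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operand:
    name: str      # e.g. a table or an intermediate file (hypothetical naming)
    data_id: str   # data identifier; assumed here to look like "name@version"


@dataclass
class Operator:
    kind: str                 # "scan", "select", "join", "filesink", ...
    operands: List[Operand]   # each operator references one or more operands
    children: List["Operator"] = field(default_factory=list)  # operators feeding this one


def operators(root):
    """Yield every operator of the tree rooted at `root` in pre-order."""
    yield root
    for child in root.children:
        yield from operators(child)


# One query: scan a table, filter it, and write the result to a file.
scan = Operator("scan", [Operand("orders", "orders@v1")])
select = Operator("select", [Operand("orders", "orders@v1")], [scan])
sink = Operator("filesink", [Operand("tmp_result", "tmp_result@v1")], [select])

kinds = [op.kind for op in operators(sink)]
```

Here the root of the tree is the file sink operator, and `kinds` lists the operator types from the root down to the scan.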
  • Step 102: Divide the multiple logical query plan trees into multiple query groups according to each operator of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship.
  • If the data identifier of the operand of a file sink operator in one logical query plan tree is the same as the data identifier of an operand of a scan operator in another logical query plan tree, the two logical query plan trees have an inter-query write-read dependency. If all the operators of two logical query plan trees are the same, the two logical query plan trees have an operator overlap relationship. If the data identifiers of all or some of the operands of two logical query plan trees are the same, the two logical query plan trees have an operand overlap relationship.
  • The number of identical operators of two logical query plan trees with an operator overlap relationship is called the operator overlap degree, and the number of identical data identifiers of two logical query plan trees with an operand overlap relationship is called the operand overlap degree.
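The operand overlap degree defined above can be sketched as a set intersection over data identifiers. This is a minimal sketch, assuming each logical query plan tree has already been flattened into the list of data identifiers of its operands; the function names and the sample identifiers are hypothetical.

```python
def operand_overlap_degree(data_ids_a, data_ids_b):
    """Number of data identifiers that occur in both logical query plan trees."""
    return len(set(data_ids_a) & set(data_ids_b))


def has_operand_overlap(data_ids_a, data_ids_b):
    """Two trees have an operand overlap relationship if they share any identifier."""
    return operand_overlap_degree(data_ids_a, data_ids_b) > 0


# Data identifiers of the operands of three hypothetical queries.
q1_ids = ["lineitem@v3", "orders@v1"]
q2_ids = ["orders@v1", "customer@v2", "lineitem@v3"]
q3_ids = ["part@v1"]

degree_q1_q2 = operand_overlap_degree(q1_ids, q2_ids)
```

In this example the first two queries share two data identifiers, while the third shares none with either of them.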
  • The method in this embodiment builds on the result of inter-query flow analysis, which adds a data identifier to each operand of each operator of each of the multiple logical query plan trees.
  • The write-read dependencies, operator overlap relationships, and operand overlap relationships between the logical query plan trees are determined from the operators of each logical query plan tree and the data identifiers of the operands of each operator.
  • The multiple logical query plan trees are grouped according to any one of, or a combination of, the inter-query write-read dependency, the operator overlap relationship, and the operand overlap relationship, so that the logical query plan trees in each query group after grouping have at least one of these relationships.
  • The data identifier may be a version number of the operand.
  • Grouping in this way increases the optimization opportunities between the logical query plan trees in each query group, which makes the subsequent inter-query optimization of the trees in each query group easier.
  • Logical query plan trees with inter-query write-read dependencies can be divided into one query group, and inter-query optimization is then performed on them according to those dependencies to avoid frequent reads and writes of the distributed file system.
  • Logical query plan trees that share a common query subtree can be divided into one query group, so that the result of the common query subtree can be reused during inter-query optimization.
  • Logical query plan trees with a high operand overlap degree can be divided into one query group, so that during inter-query optimization the operators whose operands have the same data identifier are merged into one task, which reduces repeated data access and computation overhead.
  • After grouping, the number of logical query plan trees in each query group is small, which reduces the cost and complexity of inter-query optimization, thereby improving batch query efficiency and reducing the cost of optimizing the logical query plans.
  • Step 103: Perform inter-query optimization on the logical query plan trees in each query group.
  • Inter-query optimization targets multiple logical query plan trees in a query group; that is, it exploits optimization opportunities that exist between logical query plan trees rather than within a single logical query plan tree. There are three main optimization opportunities:
  • Common query subtree, also known as common subquery. This applies to logical query plan trees with an operator overlap relationship. If a first query subtree of one logical query plan tree in the query group has exactly the same tree structure as a second query subtree of another logical query plan tree, and the data identifier of the operand of each operator of the first query subtree is the same as the data identifier of the operand of the corresponding operator of the second query subtree, the two subtrees are common query subtrees.
  • When the execution order of the first query subtree is before that of the second query subtree, the first query subtree is called the common query subtree and the second query subtree is a copy of it. When the execution order of the first query subtree is after that of the second query subtree, the second query subtree is called the common query subtree and the first query subtree is a copy of it.
  • The optimization opportunity in this scenario is for the copy to reuse the query result of the common query subtree instead of computing it again.
  • The common query subtree optimization opportunity is first identified. Specifically, determine whether the first query subtree and the second query subtree in each query group have the same tree structure, where the first query subtree and the second query subtree belong to two different logical query plan trees of the same query group. If the tree structures are the same, further determine whether the data identifiers of the operands of the operators of the first query subtree are the same as the data identifiers of the operands of the operators of the second query subtree. If they are, the first query subtree and the second query subtree are determined to be common query subtrees, and the common query subtree optimization opportunity is identified.
  • The first query subtree and the second query subtree are then optimized. Specifically, if the execution order of the first query subtree is before that of the second query subtree, the successor of the second query subtree is connected to the first query subtree as a newly added successor of the first query subtree, and the second query subtree is deleted, so that the successor of the second query subtree reuses the query result of the first query subtree.
  • If the execution order of the first query subtree is after that of the second query subtree, the successor of the first query subtree is connected to the second query subtree as a newly added successor of the second query subtree, and the first query subtree is deleted, so that the successor of the first query subtree reuses the query result of the second query subtree.
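The identification and rewrite steps for common query subtrees can be sketched as follows. This is an illustrative sketch only: the `Node` class, the structural `signature` function, and the choice to compare data identifiers as sorted tuples are assumptions made for the example, and the earlier-executing plan is simply passed in as the first argument rather than derived from a schedule.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                    # operator type
    data_ids: Tuple[str, ...]    # data identifiers of the operator's operands
    inputs: List["Node"] = field(default_factory=list)  # predecessor subtrees


def signature(node):
    """Structural key: equal only for subtrees with the same shape and data identifiers."""
    return (node.kind, tuple(sorted(node.data_ids)),
            tuple(signature(child) for child in node.inputs))


def index_subtrees(root, table):
    """Record every subtree of the earlier-executing plan under its signature."""
    for child in root.inputs:
        index_subtrees(child, table)
    table.setdefault(signature(root), root)


def reuse_common_subtrees(earlier_root, later_root):
    """Connect successors in the later plan to matching subtrees of the earlier plan.

    The copy in the later plan is dropped simply by no longer being referenced.
    Returns the number of subtrees that were replaced.
    """
    table = {}
    index_subtrees(earlier_root, table)
    reused = 0

    def rewrite(node):
        nonlocal reused
        for i, child in enumerate(node.inputs):
            shared = table.get(signature(child))
            if shared is not None:
                node.inputs[i] = shared   # successor now reads the shared result
                reused += 1
            else:
                rewrite(child)

    rewrite(later_root)
    return reused


def make_join():
    return Node("join", ("orders@v1", "lineitem@v3"),
                [Node("scan", ("orders@v1",)), Node("scan", ("lineitem@v3",))])


q1 = Node("aggregate", ("orders@v1",), [make_join()])   # executes first
q2 = Node("sort", ("lineitem@v3",), [make_join()])      # contains a copy of the join
reused = reuse_common_subtrees(q1, q2)
```

After the call, the sort operator of the second plan takes its input from the join subtree of the first plan, so the join is computed once and its result is shared.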
  • Query inlining. This optimizes two logical query plan trees that have an inter-query write-read dependency.
  • The query inlining opportunity is first identified. Specifically, determine whether the data identifiers of the operands of a file sink operator and a scan operator in each query group are the same, where the scan operator and the file sink operator belong to two different logical query plan trees in the same query group. If the data identifiers are the same, the logical query plan trees to which the file sink operator and the scan operator belong have an inter-query write-read dependency.
  • The query inlining opportunity is then optimized by directly connecting the predecessor of the file sink operator to the successor of the scan operator and deleting the scan operator.
  • Let OPw denote a file sink operator and OPr denote a scan operator, where OPw and OPr belong to two different logical query plan trees.
  • OPw writes the output of its predecessor to the distributed file system. Without optimization, OPr must read that output back from the distributed file system before the successor of OPr can process it. Reading and writing data in the distributed file system takes a lot of time, which reduces query efficiency.
  • To reduce the read and write overhead of the distributed file system, a new predecessor-successor relationship is established between the predecessor of OPw and the successor of OPr: the successor of OPr is connected directly to the predecessor of OPw and becomes a successor of OPw's predecessor.
  • After the predecessor of OPw finishes executing, its output is passed directly as input to the successor of OPr, without first being written to the distributed file system and then read back. This reduces the read and write overhead of the distributed file system and improves query efficiency.
  • Once the predecessor of OPw is connected directly to the successor of OPr and OPr is deleted, the logical query plan tree to which OPw belongs and the logical query plan tree to which OPr belongs are merged into one logical query plan tree.
  • The merged logical query plan tree may expose new intra-query optimization opportunities, that is, opportunities to optimize a single logical query plan tree; a typical example is Job-flow Correlation Optimization. After inter-query optimization, intra-query optimization can therefore be applied to the merged logical query plan tree.
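The OPw/OPr rewrite described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `Op` class, the single `data_id` per operator, and the function `inline_query` are hypothetical names for this example, and the file sink operator is left in place in the writer plan because the text only calls for deleting the scan operator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    kind: str       # "scan", "filesink", "aggregate", ...
    data_id: str    # data identifier of the operator's operand
    inputs: List["Op"] = field(default_factory=list)  # predecessors


def walk(root):
    yield root
    for child in root.inputs:
        yield from walk(child)


def inline_query(writer_root, reader_root):
    """Connect successors of matching scans (OPr) to the predecessor of the file sink (OPw)."""
    # Map each written data identifier to the predecessor of its file sink operator.
    produced = {op.data_id: op.inputs[0]
                for op in walk(writer_root)
                if op.kind == "filesink" and op.inputs}
    inlined = 0
    for op in list(walk(reader_root)):
        for i, child in enumerate(op.inputs):
            if child.kind == "scan" and child.data_id in produced:
                op.inputs[i] = produced[child.data_id]  # bypass the file system
                inlined += 1                            # the scan is no longer referenced
    return inlined


# Query 1 aggregates orders and writes the result to a temporary file.
agg = Op("aggregate", "orders@v1", [Op("scan", "orders@v1")])
q1 = Op("filesink", "tmp@v1", [agg])

# Query 2 scans that temporary file, filters it, and writes its own output.
select = Op("select", "tmp@v1", [Op("scan", "tmp@v1")])
q2 = Op("filesink", "out@v1", [select])

inlined = inline_query(q1, q2)
```

After inlining, the select operator of the second query consumes the aggregate's output directly, and the two plans form one connected plan.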
  • Correlated query subtree, also known as correlated subquery. This applies to logical query plan trees with an operand overlap relationship. If a query group contains a first operator and a second operator that belong to two different logical query plan trees in the same query group, and the operands of the first operator and the second operator overlap, that is, they are identical or partially identical, the first operator and the second operator are called correlated subqueries.
  • The correlated subquery optimization opportunity is first identified. Specifically, determine whether the data identifiers of the operands of the first operator and the second operator in each query group are the same, where the first operator and the second operator belong to two different logical query plan trees of the same query group. If the data identifiers are the same, the first operator and the second operator are determined to have a correlated subquery optimization opportunity.
  • The correlated subquery optimization opportunity is then optimized: a merge identifier is added for the first operator and the second operator, so that when the physical query plan tree is generated, the first operator and the second operator are merged into the same task, which reduces the overhead of scanning the overlapping data.
  • The merge identifier can be added in either of two ways. In the first way, the same identification information is added directly to the first operator and the second operator; when the physical query plan is generated, two operators with the same identification information are merged into the same task. In the second way, a common successor is added for the first operator and the second operator, the original successors of the two operators become successors of the common successor, and the common successor serves as the merge identifier; when the physical query plan is generated, two operators that have a common successor are merged into the same task. The common successor can be a new operator type so that it is distinguished from existing operators. It has no actual computation function and only records the correspondence between the first operator and its original successor and between the second operator and its original successor.
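The first way of adding a merge identifier can be sketched as below. The sketch makes several simplifying assumptions that are not in the text: operators are plain dictionaries, each operator has a single `data_id`, only exact matches of the data identifier are treated as overlap (partial overlap is not handled), and the merge identifier is a small integer stored under the hypothetical key `merge_id`.

```python
from collections import defaultdict


def add_merge_identifiers(plans):
    """Give the same 'merge_id' to operators from different plans that share a data identifier.

    `plans` maps a plan name to a list of operator dicts with 'kind' and 'data_id'.
    Returns the number of merge identifiers that were assigned.
    """
    by_data_id = defaultdict(list)
    for plan_name, ops in plans.items():
        for op in ops:
            by_data_id[op["data_id"]].append((plan_name, op))

    next_id = 0
    for members in by_data_id.values():
        # Only operators from at least two different plans are correlated.
        if len({plan_name for plan_name, _ in members}) < 2:
            continue
        for _, op in members:
            op["merge_id"] = next_id
        next_id += 1
    return next_id


plans = {
    "q1": [{"kind": "scan", "data_id": "lineitem@v3"}],
    "q2": [{"kind": "scan", "data_id": "lineitem@v3"},
           {"kind": "scan", "data_id": "part@v1"}],
}
merge_groups = add_merge_identifiers(plans)
```

A physical plan generator could then place operators that carry the same `merge_id` into one task so that the shared data is scanned once.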
  • Step 104: Generate corresponding multiple physical query plan trees according to the optimized multiple logical query plan trees.
  • After inter-query optimization is performed on the logical query plan trees in each query group, multiple inter-query-optimized logical query plan trees are obtained, and multiple physical query plans are generated from them. Generating a physical query plan from a logical query plan tree is existing technology and is not described further here.
  • Step 105: Execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
  • In this embodiment, multiple logical query plan trees of multiple query statements are obtained and divided into multiple query groups according to each operator of each logical query plan tree, such that the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship.
  • Inter-query optimization is then performed on the logical query plan trees in each query group, and corresponding multiple physical query plan trees are generated according to the optimized multiple logical query plan trees.
  • Finally, physical queries are executed according to the multiple physical query plan trees, and the query results of the multiple query statements are obtained.
  • This embodiment mainly describes step 102 of Embodiment 1 in detail.
  • The method in this embodiment may include the following steps:
  • Step 201: Acquire multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the multiple logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand. Each operand has a data identifier, and identical operands have the same data identifier.
  • Step 202: Determine the inter-query write-read dependency of each logical query plan tree according to each operator of each logical query plan tree and the data identifier of each operand of each operator; divide all logical query plan trees that have inter-query write-read dependencies into a first group and all logical query plan trees that have no inter-query write-read dependency into a second group; and divide the logical query plan trees in the first group into at least one query group according to the inter-query write-read dependency of each logical query plan tree in the first group.
  • Specifically, for each logical query plan tree, determine whether the data identifier of each operand of its file sink operator is the same as the data identifier of any operand of a scan operator of another logical query plan tree among the multiple logical query plan trees. If so, the logical query plan tree containing the file sink operator and the logical query plan tree containing that scan operator have an inter-query write-read dependency.
  • In other words, if the flow analysis result includes inter-query write-read dependencies between operators from different logical query plan trees, then each pair of logical query plan trees to which such operators belong also has an inter-query write-read dependency.
  • In a specific implementation, the multiple logical query plan trees may be numbered, and the inter-query write-read dependency of each logical query plan tree is determined in numbered order. For example, if there are 10 logical query plan trees numbered 1 to 10, the inter-query write-read dependency of the first logical query plan tree is determined first. The file sink operator of the first logical query plan tree is located, and it is determined whether the data identifier of the operand of that file sink operator is the same as the data identifier of at least one of the operands of the scan operators of the second logical query plan tree. If it is, the first logical query plan tree and the second logical query plan tree have an inter-query write-read dependency. It is then determined in turn whether the first logical query plan tree has an inter-query write-read dependency with each of the remaining eight logical query plan trees.
  • After the first logical query plan tree has been processed, the inter-query write-read dependency of the second logical query plan tree is determined in the same way.
  • If a logical query plan tree contains no file sink operator, it is skipped and the inter-query write-read dependency of the next logical query plan tree is determined.
  • Each logical query plan tree may also include multiple file sink operators, so each logical query plan tree may have inter-query write-read dependencies with multiple logical query plan trees. After the inter-query write-read dependency of each logical query plan tree is determined, all logical query plan trees with inter-query write-read dependencies are divided into the first group, and the logical query plan trees without inter-query write-read dependencies are divided into the second group. The logical query plan trees within the first group are then grouped according to their inter-query write-read dependencies.
  • the first logical query plan tree and the second logical query plan tree and the sixth logical query plan tree both have inter-query write read dependencies
  • the sixth logical query plan tree and the fourth tree The logical query plan tree also has a write-read dependency between queries.
  • the first logical query plan tree and the second logical query plan tree can be divided into one query group
  • the sixth logical query plan tree is The fourth logical query plan tree is divided into a query group, and the first logical query plan tree, the second logical query plan tree, the sixth logical query plan tree, and the fourth logical query plan tree can also be divided into one.
  • the query group does not limit the specific grouping manner in the embodiment of the present invention.
  • the logical query plan tree with inter-query write-read dependencies is divided into one query group, thereby improving the optimization opportunities in the query group.
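  • Because the embodiment leaves the grouping manner open, the sketch below picks just one of the permitted options: every set of trees linked by inter-query write-read dependencies becomes a single query group. It uses a small union-find structure; the function name `group_by_dependency` and the pair-list input format are assumptions made for illustration.

```python
def group_by_dependency(pairs):
    """Group tree numbers that are connected by (writer, reader) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for writer, reader in pairs:
        root_w, root_r = find(writer), find(reader)
        if root_w != root_r:
            parent[root_r] = root_w

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in groups.values())


# Dependencies from the example: tree 1 with trees 2 and 6, tree 6 with tree 4.
groups = group_by_dependency([(1, 2), (1, 6), (6, 4)])
print(groups)
```

  • A finer split, such as trees 1 and 2 in one group and trees 6 and 4 in another, is equally allowed by the text; it would need an extra policy (for example a maximum group size) that the source does not specify.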
  • Step 203: Count the number of operators of each type in each logical query plan tree in the second group, and determine the operator overlap degree of each logical query plan tree from those counts. Place the logical query plan trees in the second group that have an operator overlap relationship into the first sub-group, and place the remaining logical query plan trees of the second group into the second sub-group. Then divide the logical query plan trees in the first sub-group into at least one query group according to the operator overlap degree of each logical query plan tree in the first sub-group.
  • Operators are of several types, such as join operators, aggregate operators and scan operators; the number of operators of each type is counted for each logical query plan tree in the second group.
  • One comparison process is the comparison of one logical query plan tree in the second group with any one other logical query plan tree in that group. If the two logical query plan trees have the same number of operators of every type, the sum of those numbers is taken as their operator overlap degree in that comparison process; if they differ in the number of operators of any type, their operator overlap degree in that comparison process is 0.
  • For example, suppose the second group includes six logical query plan trees: the third, fifth, seventh, eighth, ninth and tenth. The per-type operator counts of the third logical query plan tree are compared with those of each of the other five trees. Assume that the third logical query plan tree has 3 join operators, 5 aggregate operators and 6 scan operators, that the eighth logical query plan tree has the same per-type counts, and that the other four trees have different counts. The operator overlap degree of the third logical query plan tree with each of those four trees is then 0. The operator overlap degrees of the remaining five logical query plan trees in each comparison process are then determined in turn.
  • Finally, the third and eighth logical query plan trees are placed in the first sub-group and in the same query group.
  • Query groups are formed according to the operator overlap degree of each logical query plan tree: logical query plan trees with the same operator overlap degree are placed in one group. For example, suppose the first sub-group contains four logical query plan trees, where the first and second trees have the same per-type operator counts with an operator overlap degree of 11, and the third and fourth trees have the same per-type operator counts with an operator overlap degree of 13. The first and second trees are then placed in one query group and the third and fourth trees in another.
  • If all four logical query plan trees have the same per-type operator counts, and therefore the same operator overlap degree, the four trees can be placed in one query group.
  • If, however, the per-type operator counts of the first and second trees are not exactly the same as those of the third and fourth trees, the four trees cannot be placed in one query group. For example, if the first and second trees each have 2 join operators, 5 scan operators and 4 aggregate operators, while the third and fourth trees each have 4 join operators, 5 scan operators and 2 aggregate operators, both pairs total 11 operators but the per-type counts differ; the first and second trees therefore form one query group and the third and fourth trees form another.
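  • The operator overlap rule can be sketched as follows, assuming each tree has already been reduced to a dictionary of per-type operator counts. The function names and the counts given for trees 5 and 7 are hypothetical; the counts for trees 3 and 8 follow the example above, and the rule that the overlap degree is the sum of the counts when all per-type counts match follows the third possible implementation in the summary.

```python
def operator_overlap(counts_a, counts_b):
    """Operator overlap degree of two trees in one comparison process."""
    if counts_a != counts_b:
        return 0  # any per-type difference means no operator overlap
    return sum(counts_a.values())


def split_by_operator_counts(trees):
    """trees maps tree number -> {operator type: count}.

    Returns (query_groups, second_subgroup): trees sharing identical per-type
    counts form a query group; trees matching no other tree are left over.
    """
    buckets = {}
    for number, counts in trees.items():
        key = tuple(sorted(counts.items()))
        buckets.setdefault(key, []).append(number)
    query_groups = sorted(sorted(m) for m in buckets.values() if len(m) > 1)
    second_subgroup = sorted(m[0] for m in buckets.values() if len(m) == 1)
    return query_groups, second_subgroup


trees = {
    3: {"join": 3, "aggregate": 5, "scan": 6},
    8: {"join": 3, "aggregate": 5, "scan": 6},
    5: {"join": 1, "aggregate": 1, "scan": 2},  # hypothetical counts
    7: {"join": 2, "aggregate": 1, "scan": 3},  # hypothetical counts
}
groups, leftovers = split_by_operator_counts(trees)
print(operator_overlap(trees[3], trees[8]), groups, leftovers)
```

  • Bucketing by the full per-type count signature, rather than by the overlap total alone, keeps apart trees whose totals coincide while their per-type counts differ.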
  • Step 204: Determine the operand overlap degree of each logical query plan tree in the second sub-group according to the data identifier of each operand of each operator of each logical query plan tree in the second sub-group, and divide the logical query plan trees in the second sub-group into at least one query group according to those operand overlap degrees.
  • Specifically, the data identifiers of the operands of the operators of each logical query plan tree in the second sub-group are compared with the data identifiers of the operands of the operators of every other logical query plan tree in the second sub-group, giving the operand overlap degree of each logical query plan tree in each comparison process. One comparison process is the comparison of one logical query plan tree in the second sub-group with any one other logical query plan tree in the second sub-group.
  • For example, suppose the second sub-group contains the fifth, seventh, ninth and tenth logical query plan trees. First, the data identifier of each operand of each operator of the fifth logical query plan tree is compared with the data identifiers of the operands of the operators of the seventh logical query plan tree. Assume that the fifth tree has six distinct data identifiers, the seventh tree has five, and three data identifiers of the fifth tree are the same as data identifiers of the seventh tree; the operand overlap degree of the fifth and seventh trees is then 3. The operand overlap degrees of the fifth tree with the ninth and tenth trees are determined next. Then the operand overlap degrees of the seventh tree with the other trees are determined: because the overlap between the fifth and seventh trees is already known, only the overlaps of the seventh tree with the ninth and tenth trees need to be determined. Finally, the operand overlap degree of the ninth and tenth trees is determined.
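  • The pairwise comparison order described above amounts to visiting each unordered pair of trees once. A minimal sketch, assuming each tree is represented by the set of data identifiers of its operands (the function names and the identifiers chosen for trees 9 and 10 are hypothetical):

```python
from itertools import combinations


def operand_overlap(ids_a, ids_b):
    """Number of data identifiers that the two trees have in common."""
    return len(set(ids_a) & set(ids_b))


def pairwise_operand_overlaps(trees):
    """trees maps tree number -> iterable of data identifiers.

    Each unordered pair is compared exactly once, so the overlap of
    trees 7 and 5 is not recomputed after trees 5 and 7 were compared.
    """
    return {
        (a, b): operand_overlap(trees[a], trees[b])
        for a, b in combinations(sorted(trees), 2)
    }


trees = {
    5: {"A", "B", "C", "D", "E", "F"},  # six identifiers
    7: {"A", "B", "C", "G", "H"},       # five identifiers, three shared with tree 5
    9: {"A", "X"},                       # hypothetical
    10: {"Y"},                           # hypothetical
}
overlaps = pairwise_operand_overlaps(trees)
print(overlaps[(5, 7)])
```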
  • When grouping, the logical query plan trees with the largest operand overlap degree are preferentially placed in one group. For example, if the fifth logical query plan tree has its highest operand overlap degree, 5, with the seventh logical query plan tree, the fifth and seventh trees are placed in one query group. If the operand overlap degree of the fifth and ninth trees is also 5, while that of the ninth and tenth trees is 3, the ninth tree can also be placed in that query group, which then contains three logical query plan trees. Because only the tenth logical query plan tree remains, it can either form a query group on its own or also be placed in that query group.
  • Two pairs of logical query plan trees having the same number of overlapping operands does not mean that the overlapping operands themselves are the same. Therefore, after it is determined that the operand overlap degrees are the same, it is further determined whether the data identifiers of the overlapping operands are the same. For example, the operand overlap degree of the fifth and seventh logical query plan trees is 5, with overlapping data identifiers A, B, C, D and E; the operand overlap degree of the fifth and ninth trees is also 5, but the overlapping data identifiers differ, being A, B, C, E and F; and the operand overlap degree of the ninth and tenth trees is 3. When grouping, the fifth and seventh trees can therefore be placed in one query group, and the ninth and tenth trees in another.
  • Alternatively, the operand overlap degree of each logical query plan tree in the second sub-group may be determined from the data identifiers of the operands of the scan operators only. Specifically, the data identifiers of the operands of the scan operators of each logical query plan tree in the second sub-group are compared with the data identifiers of the operands of the scan operators of every other logical query plan tree in the second sub-group, giving the operand overlap degree of each logical query plan tree in each comparison process, where one comparison process is the comparison of one logical query plan tree in the second sub-group with any one other logical query plan tree in the second sub-group.
  • Step 205: Perform inter-query optimization on the logical query plan trees in each query group.
  • Step 206: Generate the corresponding multiple physical query plan trees from the optimized logical query plan trees.
  • Step 207: Execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
  • In this embodiment, the multiple logical query plan trees are first grouped according to the inter-query write-read dependencies of each logical query plan tree, so that logical query plan trees with inter-query write-read dependencies fall into the same query group. The remaining logical query plan trees, which have no inter-query write-read dependencies, are then divided according to the operator overlap relationship. Finally, the logical query plan trees left over after division by inter-query write-read dependency and operator overlap relationship are divided according to the operand overlap relationship.
  • This embodiment describes grouping based on three parameters of the logical query plan trees: the operand overlap relationship, the operator overlap relationship and the inter-query write-read dependency. The present invention is not limited to this; grouping may also be based on any one or two of the three parameters. For example, when grouping only by inter-query write-read dependency, the logical query plan trees that have inter-query write-read dependencies are placed in one query group and those that do not are placed in another. If the group of logical query plan trees without inter-query write-read dependencies is too large, those trees can be randomly divided into several query groups.
  • When grouping only by the operator overlap relationship, the logical query plan trees with the same per-type operator counts are placed in one query group and the remaining logical query plan trees in another.
  • When grouping by inter-query write-read dependency and the operator overlap relationship, the trees are first grouped by inter-query write-read dependency, and the remaining trees without inter-query write-read dependencies are grouped by the operator overlap relationship.
  • When grouping by inter-query write-read dependency and the operand overlap relationship, the trees are first grouped by inter-query write-read dependency, and the remaining trees without inter-query write-read dependencies are grouped by the operand overlap relationship.
  • For how the inter-query write-read dependency, the operand overlap degree and the operator overlap degree are determined, refer to the related description in Embodiment 2; details are not repeated here.
  • FIG. 3 is a flowchart of the batch data query method according to Embodiment 3 of the present invention. This embodiment differs from Embodiment 1 in that, after inter-query optimization is performed on each query group, the logical query plan trees are further optimized as a whole. As shown in FIG. 3, the method of this embodiment may include the following steps:
  • Step 301: Acquire multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree.
  • Step 302: Divide the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship or an operand overlap relationship.
  • Step 303: Perform inter-query optimization on the logical query plan trees in each query group.
  • Step 304: Create a virtual operation node, use it as the root node, and make the root nodes of the inter-query-optimized logical query plan trees predecessors of the virtual operation node, generating one virtual logical query plan tree.
  • The purpose of this step is to merge the inter-query-optimized logical query plan trees into a single logical query plan tree. Specifically, a virtual operation node is created and a successor-predecessor relationship is established between it and the root node of each inter-query-optimized logical query plan tree; that is, the virtual operation node becomes the common root above the roots of those trees, forming one virtual logical query plan tree. The operator type represented by this virtual operation node is a new type distinct from the existing operator types.
  • Step 305: Perform intra-query optimization on the virtual logical query plan tree to generate an intra-query-optimized virtual logical query plan tree.
  • Because the inter-query-optimized logical query plan trees have been merged into one virtual logical query plan tree, existing intra-query optimization techniques can be applied to it directly. Commonly used intra-query optimization techniques include predicate pushdown (Predicate Pushdown), column pruning (Column Pruning) and task-flow correlation optimization. Whether to apply an optimization to the virtual logical query plan tree can be decided based on optimization rules or a cost model.
  • Step 306: Delete the virtual operation node from the intra-query-optimized virtual logical query plan tree to obtain the optimized multiple logical query plan trees.
  • Step 307: Generate the corresponding multiple physical query plan trees from the optimized logical query plan trees.
  • Step 308: Execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
  • In this embodiment, the inter-query-optimized logical query plan trees are merged into one logical query plan tree, and intra-query optimization is then performed on the merged tree. The logical query plan trees are thereby optimized further, which improves the efficiency of batch queries.
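  • Steps 304 to 306 can be illustrated with a small sketch. The `Node` class, the operator-type name `VirtualRoot` and the placeholder `intra_query_optimize` are all hypothetical; a real system would plug its existing intra-query optimizer in at that point. Following the terminology of this description, the inputs of a node are called its predecessors.

```python
from dataclasses import dataclass, field

VIRTUAL_OP = "VirtualRoot"  # hypothetical name for the new operator type


@dataclass
class Node:
    op: str
    predecessors: list = field(default_factory=list)  # input nodes of this operator


def merge_under_virtual_root(roots):
    """Step 304: make every tree root a predecessor of one virtual node."""
    return Node(VIRTUAL_OP, list(roots))


def intra_query_optimize(tree):
    """Step 305 placeholder: a real pass (e.g. predicate pushdown) would rewrite the tree."""
    return tree


def remove_virtual_root(virtual_tree):
    """Step 306: drop the virtual node and recover the individual trees."""
    if virtual_tree.op != VIRTUAL_OP:
        raise ValueError("expected a virtual root node")
    return list(virtual_tree.predecessors)


q1 = Node("Filesink", [Node("Select", [Node("Scan")])])
q2 = Node("Filesink", [Node("Aggregate", [Node("Scan")])])

virtual_tree = merge_under_virtual_root([q1, q2])
optimized_roots = remove_virtual_root(intra_query_optimize(virtual_tree))
print([root.op for root in optimized_roots])
```

  • Giving the virtual node its own operator type lets an existing optimizer traverse the merged tree as if it were a single query while making the node easy to recognize and remove afterwards.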
  • The apparatus provided in this embodiment includes: an obtaining module 11, a grouping module 12, an optimization module 13, a generating module 14 and a query module 15.
  • The obtaining module 11 is configured to acquire multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the multiple logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand;
  • the grouping module 12 is configured to divide the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship or an operand overlap relationship;
  • the optimization module 13 is configured to perform inter-query optimization on the logical query plan trees in each query group;
  • the generating module 14 is configured to generate the corresponding multiple physical query plan trees from the optimized multiple logical query plan trees;
  • the query module 15 is configured to execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
  • the optimization module 13 is specifically configured to:
  • the successor of the second query subtree is connected to the first query subtree, with the second query subtree serving as a new successor of the first query subtree, so that the successor of the second query subtree reuses the query result of the first query subtree;
  • or, when the first query subtree executes after the second query subtree, the successor of the first query subtree is connected to the second query subtree, with the successor of the first query subtree serving as a new successor of the second query subtree, so that the successor of the first query subtree reuses the query result of the second query subtree.
  • the optimization module 13 is specifically configured to:
  • the optimization module 13 is specifically configured to:
  • The apparatus in this embodiment can be used to perform the method of method Embodiment 1; the implementation principles and technical effects are similar, and details are not repeated here.
  • FIG. 5 is a schematic structural diagram of a batch data query apparatus according to Embodiment 5 of the present invention. The apparatus of this embodiment is based on the apparatus structure shown in FIG. 4; in addition, the grouping module 12 includes a first grouping unit 121, a second grouping unit 122 and a third grouping unit 123.
  • The first grouping unit 121 is configured to determine the inter-query write-read dependency of each logical query plan tree according to the operators of each logical query plan tree and the data identifiers of the operands of those operators, place all logical query plan trees that have inter-query write-read dependencies in the first group, and place all logical query plan trees that have no inter-query write-read dependencies in the second group;
  • the first grouping unit 121 is further configured to divide the logical query plan trees in the first group into at least one query group according to the inter-query write-read dependency of each logical query plan tree in the first group;
  • the second grouping unit 122 is configured to count the number of operators of each type in each logical query plan tree in the second group, determine the operator overlap degree of each logical query plan tree from those counts, place the logical query plan trees in the second group that have an operator overlap relationship in the first sub-group, and place the other logical query plan trees of the second group in the second sub-group;
  • the second grouping unit 122 is further configured to divide the logical query plan trees in the first sub-group into at least one query group according to the operator overlap degree of each logical query plan tree in the first sub-group;
  • the third grouping unit 123 is configured to determine the operand overlap degree of each logical query plan tree in the second sub-group according to the data identifiers of the operands of the operators of each logical query plan tree in the second sub-group, and divide the logical query plan trees in the second sub-group into at least one query group according to those operand overlap degrees.
  • the first grouping unit 121 is specifically configured to:
  • the second grouping unit 122 is specifically configured to:
  • optimization module 13 is further configured to:
  • the device in this embodiment may be used to perform the methods in the second embodiment and the third embodiment, and the implementation principles and technical effects are similar, and details are not described herein again.
  • FIG. 6 is a schematic structural diagram of a batch data query apparatus according to Embodiment 6 of the present invention. The batch data query apparatus 300 of this embodiment includes a processor 31, a memory 32 and a system bus 33. The processor 31 and the memory 32 are connected through the system bus 33 and communicate with each other; the memory 32 is configured to store computer-executable instructions 321; and the processor 31 is configured to run the computer-executable instructions 321 to perform the method described below:
  • each query statement corresponds to one logical query plan tree, each of the multiple logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand;
  • the dividing of the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree includes:
  • the determining of the inter-query write-read dependency of each logical query plan tree according to the operators of each logical query plan tree and the data identifiers of the operands of those operators includes:
  • the determining of the operator overlap degree of each logical query plan tree according to the number of operators of each type in that logical query plan tree includes:
  • the performing inter-query optimization on the logical query plan tree in each query group includes:
  • the successor of the second query subtree is connected to the first query subtree, with the second query subtree serving as a new successor of the first query subtree, so that the successor of the second query subtree reuses the query result of the first query subtree;
  • or the successor of the first query subtree is connected to the second query subtree, with the successor of the first query subtree serving as a new successor of the second query subtree, so that the successor of the first query subtree reuses the query result of the second query subtree.
  • the performing inter-query optimization on the logical query plan tree in each query group includes:
  • the performing inter-query optimization on the logical query plan tree in each query group includes:
  • before the generating of the corresponding multiple physical query plan trees from the optimized multiple logical query plan trees, the method further includes:
  • The apparatus of this embodiment may be used to perform the technical solution of any one of method Embodiments 1 to 3; the implementation principles and technical effects are similar, and details are not repeated here.
  • The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

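Embodiments of the present invention provide a batch data query method and apparatus. Multiple logical query plan trees corresponding to multiple query statements are acquired, and the logical query plan trees are divided into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship or an operand overlap relationship. Inter-query optimization is then performed on the logical query plan trees in each query group, the corresponding multiple physical query plan trees are generated from the optimized logical query plan trees, and finally physical queries are executed according to the physical query plan trees to obtain the query results of the multiple query statements. By grouping the logical query plan trees, the method increases the optimization opportunities among the logical query plan trees within a query group, and by performing inter-query optimization on the logical query plan trees within a query group, it improves the efficiency of batch data queries.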

Description

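Batch Data Query Method and Apparatus
Technical Field
Embodiments of the present invention relate to data warehouse query optimization technologies, and in particular to a batch data query method and apparatus.
Background
Representative big data query systems today (such as Hive, Shark and Impala) all use a single query statement as the basic unit of parsing and optimization. Their basic query processing flow is as follows: first, a single query statement that has been read in is parsed into a tree-structured logical query plan tree; then an implementation algorithm is selected for each operator of the logical query plan tree and the execution order of the operators is determined, which converts the logical query plan tree into a physical query plan; finally, the query execution engine executes the physical query plan and outputs the query result.
A key performance measure of a big data query system is query efficiency. The methods commonly used today to improve the query efficiency of big data query systems mainly apply equivalence transformations to the logical query plan tree in order to reduce the number of tasks in the physical query plan and lower the execution cost of the tasks (including reducing the frequency of reads and writes to the file system, and limiting the amount of data transmitted over the network and the amount of computation in query operations). However, in the batch query scenario of a data warehouse (Data Warehouse), the traditional processing model that takes a single query statement as the basic unit of parsing and optimization suffers from insufficient optimization opportunities. For example, the task-flow correlation optimization added in Hive 0.12 speeds up only 3 of the 22 query cases in the standard benchmark TPC-H, and places specific requirements on how the cases are written.
In sharp contrast to the shortage of intra-query optimization opportunities, batch query scenarios in data warehouses present abundant inter-query optimization opportunities, that is, optimization opportunities that exist across multiple query statements. In batch query scenarios, query statements are quite likely to resemble one another, so there are many query optimization opportunities. The prior art, however, optimizes only single query statements, which makes big data queries inefficient.
Summary of the Invention
Embodiments of the present invention provide a batch data query method and apparatus to improve the efficiency of batch data queries.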
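A first aspect of the present invention provides a batch data query method, including:
acquiring multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the multiple logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand;
dividing the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship or an operand overlap relationship;
performing inter-query optimization on the logical query plan trees in each query group;
generating the corresponding multiple physical query plan trees from the optimized multiple logical query plan trees; and
executing physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.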
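With reference to the first aspect of the present invention, in a first possible implementation of the first aspect, the dividing of the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree includes:
determining the inter-query write-read dependency of each logical query plan tree according to the operators of each logical query plan tree and the data identifiers of the operands of those operators, placing all logical query plan trees that have inter-query write-read dependencies in a first group, and placing all logical query plan trees that have no inter-query write-read dependencies in a second group;
dividing the logical query plan trees in the first group into at least one query group according to the inter-query write-read dependency of each logical query plan tree in the first group;
counting the number of operators of each type in each logical query plan tree in the second group, determining the operator overlap degree of each logical query plan tree from the number of operators of each type, placing the logical query plan trees in the second group that have an operator overlap relationship in a first sub-group, and placing the other logical query plan trees of the second group in a second sub-group;
dividing the logical query plan trees in the first sub-group into at least one query group according to the operator overlap degree of each logical query plan tree in the first sub-group; and
determining the operand overlap degree of each logical query plan tree in the second sub-group according to the data identifiers of the operands of the operators of each logical query plan tree in the second sub-group, and dividing the logical query plan trees in the second sub-group into at least one query group according to those operand overlap degrees.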
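With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the determining of the inter-query write-read dependency of each logical query plan tree according to the operators of each logical query plan tree and the data identifiers of the operands of those operators includes:
determining, for each logical query plan tree, whether the data identifier of each operand defined by its file sink operator is the same as the data identifier of an operand of a scan operator of another of the multiple logical query plan trees; and
if they are the same, determining that the logical query plan trees containing the file sink operator and the scan operator have an inter-query write-read dependency.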
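With reference to the first and second possible implementations of the first aspect, in a third possible implementation of the first aspect, the determining of the operator overlap degree of each logical query plan tree from the number of operators of each type in each logical query plan tree includes:
comparing the number of operators of each type in each logical query plan tree in the second group with the number of operators of each type in each other logical query plan tree in the second group, and, if two logical query plan trees in the second group have the same number of operators of every type, taking the sum of those numbers as the operator overlap degree of each of the two logical query plan trees in one comparison process, where one comparison process is the comparison of one logical query plan tree in the second group with any one other logical query plan tree.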
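With reference to the first aspect and the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the performing of inter-query optimization on the logical query plan trees in each query group to generate the inter-query-optimized multiple logical query plan trees includes:
determining whether a first query subtree and a second query subtree in each query group have the same tree structure, where the first query subtree and the second query subtree belong to two different logical query plan trees in the same query group;
if the first query subtree and the second query subtree have the same tree structure, determining whether the data identifiers of the operands of the operators of the first query subtree are the same as the data identifiers of the operands of the operators of the second query subtree;
if those data identifiers are the same and the first query subtree executes before the second query subtree, connecting the successor of the second query subtree to the first query subtree, with the second query subtree serving as a newly added successor of the first query subtree, so that the successor of the second query subtree reuses the query result of the first query subtree;
or, if those data identifiers are the same and the first query subtree executes after the second query subtree, connecting the successor of the first query subtree to the second query subtree, with the successor of the first query subtree serving as a newly added successor of the second query subtree, so that the successor of the first query subtree reuses the query result of the second query subtree.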
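The subtree-reuse rule above can be sketched in Python as follows. The sketch rests on several assumptions that are not in the source: the `Op` class and its field names are hypothetical, inputs are compared in order, and two operators are treated as equivalent when their kind and data identifiers match (a real system would also have to compare predicates and other operator parameters). The caller is assumed to pass the subtree that executes first as `earlier`.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Op:
    """Hypothetical operator node; data flows from inputs to successors."""
    kind: str
    data_ids: frozenset = frozenset()
    inputs: list = field(default_factory=list)
    successors: list = field(default_factory=list)


def connect(producer, consumer):
    producer.successors.append(consumer)
    consumer.inputs.append(producer)


def same_structure(a, b):
    """First check: the two subtrees have the same tree structure."""
    return (a.kind == b.kind and len(a.inputs) == len(b.inputs)
            and all(same_structure(x, y) for x, y in zip(a.inputs, b.inputs)))


def same_data_ids(a, b):
    """Second check: corresponding operators reference the same data identifiers."""
    return (a.data_ids == b.data_ids
            and all(same_data_ids(x, y) for x, y in zip(a.inputs, b.inputs)))


def reuse_subtree(earlier, later):
    """Redirect the successors of `later` to `earlier` if both subtrees match."""
    if not (same_structure(earlier, later) and same_data_ids(earlier, later)):
        return False
    for consumer in later.successors:
        consumer.inputs = [earlier if i is later else i for i in consumer.inputs]
        earlier.successors.append(consumer)
    later.successors = []  # `later` is no longer needed by any consumer
    return True


# Two queries that both start by scanning the hypothetical table "orders".
scan1, agg1 = Op("Scan", frozenset({"orders"})), Op("Aggregate")
scan2, sort2 = Op("Scan", frozenset({"orders"})), Op("Sort")
connect(scan1, agg1)
connect(scan2, sort2)
reused = reuse_subtree(scan1, scan2)
print(reused, [op.kind for op in scan1.successors])
```

After the call, the sort operator of the second query reads from the first query's scan, so the shared subtree is evaluated once.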
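With reference to the first aspect and the first to third possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the performing of inter-query optimization on the logical query plan trees in each query group includes:
determining whether the data identifiers of the operands corresponding to a pair consisting of a file sink operator and a scan operator in each query group are the same, where the scan operator and the file sink operator belong to two different logical query plan trees in the same query group; and
if the data identifiers of the operands corresponding to the file sink operator and the scan operator are the same, determining that the logical query plan trees to which the file sink operator and the scan operator belong have a write-read dependency, connecting the predecessor of the file sink operator directly to the successor of the scan operator, and deleting the scan operator.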
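The write-read shortcut above can be sketched as follows: when one query writes a data set through a file sink operator and another query scans the same data set, the scan's successors are fed directly from the file sink's predecessor. The `Op` class, its field names and the single-predecessor assumption for the file sink are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Op:
    """Hypothetical operator node; data flows from inputs to successors."""
    kind: str
    data_id: str = ""
    inputs: list = field(default_factory=list)
    successors: list = field(default_factory=list)


def connect(producer, consumer):
    producer.successors.append(consumer)
    consumer.inputs.append(producer)


def bypass_sink_and_scan(sink, scan):
    """Feed the scan's successors from the sink's predecessor and drop the scan."""
    if sink.kind != "Filesink" or scan.kind != "Scan" or sink.data_id != scan.data_id:
        return False
    producer = sink.inputs[0]  # assumption: a file sink has exactly one predecessor
    for consumer in scan.successors:
        consumer.inputs = [producer if i is scan else i for i in consumer.inputs]
        producer.successors.append(consumer)
    scan.successors = []  # the scan is now detached, i.e. deleted from the plan
    return True


# Query 1: Select -> Filesink("tmp"); query 2: Scan("tmp") -> Aggregate.
select1, sink1 = Op("Select"), Op("Filesink", "tmp")
scan2, agg2 = Op("Scan", "tmp"), Op("Aggregate")
connect(select1, sink1)
connect(scan2, agg2)
done = bypass_sink_and_scan(sink1, scan2)
print(done, [op.kind for op in select1.successors])
```

The file sink itself stays in place in this sketch, since the source only calls for deleting the scan operator.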
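With reference to the first aspect and the first to third possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the performing of inter-query optimization on the logical query plan trees in each query group includes:
determining whether the data identifiers of the operands of a first operator and a second operator in each query group are the same, where the first operator and the second operator belong to two different logical query plan trees in the same query group; and
if the data identifiers of the operands of the first operator and the second operator are the same, adding a merge flag to the first operator and the second operator, where the merge flag is used to merge the first operator and the second operator when the physical query plan trees are generated.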
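With reference to the first aspect and the first to sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, before the generating of the corresponding multiple physical query plan trees from the optimized multiple logical query plan trees, the method further includes:
creating a virtual operation node, using the virtual operation node as a root node, and making the root nodes of the inter-query-optimized multiple logical query plan trees predecessors of the virtual operation node, to generate one virtual logical query plan tree;
performing intra-query optimization on the virtual logical query plan tree to generate an intra-query-optimized virtual logical query plan tree; and
deleting the virtual operation node from the intra-query-optimized virtual logical query plan tree to obtain the optimized multiple logical query plan trees.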
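A second aspect of the present invention provides a batch data query apparatus, including:
an obtaining module, configured to acquire multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the multiple logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand;
a grouping module, configured to divide the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship or an operand overlap relationship;
an optimization module, configured to perform inter-query optimization on the logical query plan trees in each query group;
a generating module, configured to generate the corresponding multiple physical query plan trees from the optimized multiple logical query plan trees; and
a query module, configured to execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.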
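With reference to the second aspect of the present invention, in a first possible implementation of the second aspect, the grouping module includes:
a first grouping unit, configured to determine the inter-query write-read dependency of each logical query plan tree according to the operators of each logical query plan tree and the data identifiers of the operands of those operators, place all logical query plan trees that have inter-query write-read dependencies in a first group, and place all logical query plan trees that have no inter-query write-read dependencies in a second group;
the first grouping unit being further configured to divide the logical query plan trees in the first group into at least one query group according to the inter-query write-read dependency of each logical query plan tree in the first group;
a second grouping unit, configured to count the number of operators of each type in each logical query plan tree in the second group, determine the operator overlap degree of each logical query plan tree from the number of operators of each type, place the logical query plan trees in the second group that have an operator overlap relationship in a first sub-group, and place the other logical query plan trees of the second group in a second sub-group;
the second grouping unit being further configured to divide the logical query plan trees in the first sub-group into at least one query group according to the operator overlap degree of each logical query plan tree in the first sub-group; and
a third grouping unit, configured to determine the operand overlap degree of each logical query plan tree in the second sub-group according to the data identifiers of the operands of the operators of each logical query plan tree in the second sub-group, and divide the logical query plan trees in the second sub-group into at least one query group according to those operand overlap degrees.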
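With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the first grouping unit is specifically configured to:
determine, for each logical query plan tree, whether the data identifier of each operand defined by its file sink operator is the same as the data identifier of an operand of a scan operator of another of the multiple logical query plan trees; and
if they are the same, determine that the logical query plan trees containing the file sink operator and the scan operator have an inter-query write-read dependency.
With reference to the first and second possible implementations of the second aspect, in a third possible implementation of the second aspect, the second grouping unit is specifically configured to:
compare the number of operators of each type in each logical query plan tree in the second group with the number of operators of each type in each other logical query plan tree in the second group, and, if two logical query plan trees in the second group have the same number of operators of every type, take the sum of those numbers as the operator overlap degree of each of the two logical query plan trees in one comparison process, where one comparison process is the comparison of one logical query plan tree in the second group with any one other logical query plan tree.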
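With reference to the second aspect and the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the optimization module is specifically configured to:
determine whether a first query subtree and a second query subtree in each query group have the same tree structure, where the first query subtree and the second query subtree belong to two different logical query plan trees in the same query group;
if the first query subtree and the second query subtree have the same tree structure, determine whether the data identifiers of the operands of the operators of the first query subtree are the same as the data identifiers of the operands of the operators of the second query subtree;
if those data identifiers are the same and the first query subtree executes before the second query subtree, connect the successor of the second query subtree to the first query subtree, with the second query subtree serving as a newly added successor of the first query subtree, so that the successor of the second query subtree reuses the query result of the first query subtree;
or, if those data identifiers are the same and the first query subtree executes after the second query subtree, connect the successor of the first query subtree to the second query subtree, with the successor of the first query subtree serving as a newly added successor of the second query subtree, so that the successor of the first query subtree reuses the query result of the second query subtree.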
结合本发明第二方面以及第二方面的第一种至第三种可能的实现方式,在本发明第二方面的第五种可能的实现方式中,所述优化模块具体用于:
判断所述每个查询组内的一对文件定值操作符和扫描操作符对应的操作数的数据标识是否相同,其中,所述扫描操作符和所述文件定值操作符属于同一个查询组内的两棵不同的逻辑查询计划树;
若所述文件定值操作符和所述扫描操作符对应的操作数的数据标识相同,则确定所述文件定值操作符和所述扫描操作符所属的逻辑查询计划树具有写读依赖关系,将所述文件定值操作符的前驱与所述扫描操作符的后继直接连接,并删除所述扫描操作符。
结合本发明第二方面以及第二方面的第一种至第三种可能的实现方式,在本发明第二方面的第六种可能的实现方式中,所述优化模块具体用于:
判断所述每个查询组内的第一操作符和第二操作符的操作数的数据标识是否相同,其中,所述第一操作符和所述第二操作符属于同一个查询组的两棵不同的逻辑查询计划树;
若所述第一操作符和所述第二操作符的操作数的数据标识相同,则为所述第一操作符和所述第二操作符添加合并标识,所述合并标识用于在生成物理查询计划树时将所述第一操作符和所述第二操作符进行合并。
结合本发明第二方面以及第二方面的第一种至第六种可能的实现方式,在本发明第二方面的第七种可能的实现方式中,所述优化模块还用于:
创建一个虚拟操作节点,以所述虚拟操作节点为根节点,将查询间优化后的多棵逻辑查询计划树的根节点作为所述虚拟操作节点的前驱,生成一棵虚拟逻辑查询计划树;
对所述虚拟逻辑查询计划树进行查询内优化,生成查询内优化后的虚拟逻辑查询计划树;
删除所述查询内优化后的虚拟逻辑查询计划树的所述虚拟操作节点,得到所述优化后的多棵逻辑查询计划树。
本发明实施例的批量数据查询方法和装置,通过获取多条查询语句对应的多棵逻辑查询计划树,根据每棵逻辑查询计划树的各操作符将多棵逻辑查询计划树划分为多个查询组,对每个查询组内的逻辑查询计划树进行查询间优化以生成查询间优化后的多棵逻辑查询计划树,划分后的每个查询组中的逻辑查询计划树之间具有下述关系:查询间写读依赖关系、操作符重叠关系或操作数重叠关系,然后,对每个查询组内的各逻辑查询计划树进行查询间优化,根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树,最后,根据多棵物理查询计划树执行物理查询,得到多条查询语句的查询结果。上述方法通过对多棵逻辑查询计划树进行分组,可以提高查询组内的逻辑查询计划树之间的优化机会,进一步通过对查询组内的逻辑查询计划树进行查询间优化,提高了批量数据查询的查询效率。
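Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art.
FIG. 1 is a flowchart of a batch data query method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a batch data query method according to Embodiment 2 of the present invention;
FIG. 3 is a flowchart of a batch data query method according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of a batch data query apparatus according to Embodiment 4 of the present invention;
FIG. 5 is a schematic structural diagram of a batch data query apparatus according to Embodiment 5 of the present invention;
FIG. 6 is a schematic structural diagram of a batch data query apparatus according to Embodiment 6 of the present invention.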
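Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention.
The methods in the embodiments of the present invention are mainly applied to batch data query scenarios, for example, in a data warehouse. A data warehouse holds a large amount of data, usually stored in a distributed file system, and offers many opportunities for batch data queries.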
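FIG. 1 is a flowchart of a batch data query method according to Embodiment 1 of the present invention. The method provided in this embodiment may be performed by a batch data query apparatus, which is specifically a database server. As shown in FIG. 1, the method in this embodiment may include the following steps: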
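Step 101: Obtain multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the logical query plan trees includes multiple operators, and each operator of each logical query plan tree includes at least one operand.
Each query statement sent by a client is parsed on the server side into a tree-structured logical query plan tree. The tree includes multiple nodes, each node corresponds to one operator, and each operator references one or more operands. Common operators include the scan operator (Scan Operator), sort operator (Sort Operator), select operator (Select Operator), aggregate operator (Aggregate Operator), product operator (Product Operator), join operator (Join Operator), and file sink operator (Filesink Operator).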
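The tree structure described above can be sketched as follows. This is only an illustrative sketch, not the data structure used by the patent: the class name `Operator`, the field names, and the `name@version` form of the data identifiers are all assumptions made for the example. The inputs of a node are its predecessors, that is, the operators that execute before it.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One node of a logical query plan tree (hypothetical representation)."""
    kind: str                                      # e.g. "scan", "join", "filesink"
    operands: list                                 # data identifiers of the operands
    inputs: list = field(default_factory=list)     # predecessors (run before this node)


def walk(op):
    """Yield every operator of the tree rooted at `op`, root first."""
    yield op
    for child in op.inputs:
        yield from walk(child)


# A query that joins two tables and writes the result to a file.
scan1 = Operator("scan", ["t1@v1"])
scan2 = Operator("scan", ["t2@v1"])
join = Operator("join", ["t1@v1", "t2@v1"], [scan1, scan2])
root = Operator("filesink", ["/out/a@v1"], [join])

kinds = [op.kind for op in walk(root)]
```

Here the file sink operator is the root because it executes last; every other operator is reachable from it through the `inputs` links.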
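Step 102: Divide the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship.
In this embodiment, if the data identifier of the operand written by a file sink operator in one logical query plan tree is the same as the data identifier of an operand of a scan operator in another logical query plan tree, the two trees have a write-read dependency. If all operators of two logical query plan trees are the same, the two trees have an operator overlap relationship. If the data identifiers of all or some operands of two logical query plan trees are the same, the two trees have an operand overlap relationship. The number of identical operators in two trees that have an operator overlap relationship is called the operator overlap degree, and the number of identical data identifiers in two trees that have an operand overlap relationship is called the operand overlap degree.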
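The method in this embodiment builds on the result of inter-query flow analysis. Flow analysis adds a data identifier to each operand of each operator of each logical query plan tree. When the multiple trees are optimized, the write-read dependencies, operator overlap relationships, and operand overlap relationships between the trees are determined from the operators of each tree and the data identifiers of their operands. The trees are then grouped according to any one or a combination of these three relationships, so that the logical query plan trees in each resulting query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship. Optionally, the data identifier may be a version number of the operand.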
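In this embodiment, grouping the logical query plan trees increases the optimization opportunities between the trees inside each query group and prepares for the inter-query optimization performed on each group in the next step. For example, grouping by inter-query write-read dependency places trees that have such a dependency in one query group, after which inter-query optimization based on the dependency avoids frequent reads and writes on the distributed file system. Alternatively, grouping by operator overlap relationship places trees that have a common query subtree in one query group, so that the result of the common query subtree can be reused during inter-query optimization. Alternatively, grouping by operand overlap relationship places trees with a high operand overlap degree in one group; during inter-query optimization, operators with the same data identifier are merged into one task, which reduces the overhead of accessing or operating on the same data.
In addition, grouping leaves fewer logical query plan trees in each group, which reduces the cost and complexity of inter-query optimization. Batch query efficiency therefore improves while the cost of optimizing the logical query plans falls.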
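A rough way to see why smaller groups lower the optimization cost is to count tree pairs. The sketch below assumes, purely for illustration, that inter-query optimization examines every pair of trees within a group; the source does not state a cost model, and the function name and group sizes are hypothetical.

```python
from math import comb


def pairwise_comparisons(group_sizes):
    """Number of tree pairs examined when only trees in the same group are compared."""
    return sum(comb(n, 2) for n in group_sizes)


ungrouped = pairwise_comparisons([10])        # all 10 trees in a single group
grouped = pairwise_comparisons([4, 2, 2, 2])  # the same 10 trees split into 4 groups
```

Under this assumption the ungrouped case examines 45 pairs, while the grouped case examines 6 + 1 + 1 + 1 = 9.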
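Step 103: Perform inter-query optimization on the logical query plan trees in each query group.
Inter-query optimization optimizes the multiple logical query plan trees in a query group as a whole; that is, the optimization opportunities lie between logical query plan trees rather than inside a single tree. Inter-query optimization mainly offers the following three optimization opportunities:
The first optimization opportunity is the common query subtree (Common Query Subtree), also called a common subquery. It applies to logical query plan trees that have an operator overlap relationship. It arises when a first query subtree of one tree in a query group has the same tree structure as a second query subtree of another tree, and the data identifiers of the operands of every operator in the first query subtree are the same as those of the corresponding operator in the second query subtree. If the first query subtree executes before the second, the first is called the common query subtree and the second is a copy of it; if the first executes after the second, the second is called the common query subtree and the first is a copy of it.
Because the two query subtrees have the same tree structure and all of their operands are the same, their query results are the same. The optimization opportunity in this case is therefore to compute the result once and reuse it for both subtrees.
During inter-query optimization, the common query subtree opportunity is identified first. Specifically, determine whether a first query subtree and a second query subtree in a query group have the same tree structure, where the two subtrees belong to two different logical query plan trees of the same query group. If the tree structures are the same, further determine whether the data identifiers of the operands of the operators in the first query subtree are the same as those in the second query subtree. If they are, the first and second query subtrees are determined to be common query subtrees, and the common query subtree opportunity is identified.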
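After the common query subtree opportunity is identified, the first and second query subtrees are optimized. Specifically, if the first query subtree executes before the second, the successor of the second query subtree is connected to the first query subtree, so that it becomes a newly added successor of the first query subtree, and the second query subtree (the copy) is deleted; the successor of the second query subtree then reuses the query result of the first query subtree. Alternatively, if the first query subtree executes after the second, the successor of the first query subtree is connected to the second query subtree, so that it becomes a newly added successor of the second query subtree, and the first query subtree (the copy) is deleted; the successor of the first query subtree then reuses the query result of the second query subtree.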
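Exploiting the common query subtree opportunity avoids evaluating the same subquery repeatedly, which reduces the overhead of database queries and improves query efficiency.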
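The identification and rewiring steps for a common query subtree can be sketched as follows. This is a minimal sketch under stated assumptions: the `Node` class, the `signature` helper, and the `name@version` data identifiers are hypothetical, and the first subtree is assumed to execute before the second.

```python
class Node:
    def __init__(self, kind, operands, inputs=()):
        self.kind = kind
        self.operands = tuple(operands)   # data identifiers
        self.inputs = list(inputs)        # predecessors (run before this node)


def signature(node):
    """Tree structure plus operand data identifiers of a subtree."""
    return (node.kind, node.operands, tuple(signature(i) for i in node.inputs))


def reuse_common_subtree(first, second, second_successor):
    """If `second` duplicates `first`, feed the successor of `second` from `first`."""
    if signature(first) != signature(second):
        return False
    idx = second_successor.inputs.index(second)
    second_successor.inputs[idx] = first  # `second` is no longer referenced
    return True


def make_join():
    return Node("join", ["t1@v1", "t2@v1"],
                [Node("scan", ["t1@v1"]), Node("scan", ["t2@v1"])])


first, second = make_join(), make_join()
agg_a = Node("aggregate", ["t1@v1"], [first])   # successor in query A
sort_b = Node("sort", ["t2@v1"], [second])      # successor in query B
reused = reuse_common_subtree(first, second, sort_b)
```

After the call, both `agg_a` and `sort_b` take their input from the same `first` subtree, so the join is planned once.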
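The second optimization opportunity is query inlining (Query inlining). It applies to two logical query plan trees that have an inter-query write-read dependency. In a distributed storage system, when a large amount of data is written, the file sink operator and the scan operator are expensive. To reduce this overhead, this embodiment optimizes the two trees that have the dependency.
The query inlining opportunity is identified first. Specifically, determine whether the data identifiers of the operands of a file sink operator and a scan operator in a query group are the same, where the scan operator and the file sink operator belong to two different logical query plan trees of the same query group. If the data identifiers are the same, the trees to which the file sink operator and the scan operator belong are determined to have an inter-query write-read dependency. After the opportunity is identified, it is optimized as follows: the predecessor of the file sink operator is connected directly to the successor of the scan operator, and the scan operator is deleted.
For example, consider a pair of operators <OPw, OPr> that have an inter-query write-read dependency, where OPw is a file sink operator, OPr is a scan operator, and the two belong to two different logical query plan trees. Without inter-query optimization, after the predecessor of OPw finishes, OPw writes the output of that predecessor to the distributed file system. Later, when OPr executes, the same output has to be read back from the distributed file system before the successor of OPr can process it. Reading and writing the distributed file system takes a long time and lowers query efficiency. In this embodiment, to reduce this read and write overhead, a new predecessor-successor relationship is established between the predecessor of OPw and the successor of OPr; that is, the successor of OPr is connected directly to the predecessor of OPw and becomes a successor of it. After the predecessor of OPw finishes, its output is passed directly to the successor of OPr as input, without first being written to the distributed file system and then read back. This reduces the read and write overhead on the distributed file system and improves query efficiency.
During query inlining, the predecessor of OPw is connected directly to the successor of OPr and OPr is deleted, so the tree to which OPw belongs and the tree to which OPr belongs are merged into one logical query plan tree. The merged tree may expose new intra-query optimization opportunities. Intra-query optimization targets a single logical query plan tree; a typical example is job-flow correlation optimization (Job-flow Correlation Optimization). When such opportunities appear, intra-query optimization can be applied to the merged tree.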
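The <OPw, OPr> rewiring can be sketched as follows. The `Node` class, the function name `inline`, and the example paths are assumptions made for illustration; the sketch only shows the edge rewiring and leaves the file sink operator in place, since the text deletes only the scan operator.

```python
class Node:
    def __init__(self, kind, operands, inputs=()):
        self.kind = kind
        self.operands = tuple(operands)   # data identifiers
        self.inputs = list(inputs)        # predecessors (run before this node)


def inline(op_w, op_r, op_r_successor):
    """Connect the predecessor of OPw directly to the successor of OPr; drop OPr."""
    if op_w.kind != "filesink" or op_r.kind != "scan":
        return False
    if not set(op_w.operands) & set(op_r.operands):
        return False                      # no shared data identifier, no dependency
    producer = op_w.inputs[0]             # predecessor of OPw
    idx = op_r_successor.inputs.index(op_r)
    op_r_successor.inputs[idx] = producer # OPr is no longer referenced
    return True


join = Node("join", ["t1@v1", "t2@v1"])
op_w = Node("filesink", ["/tmp/stage1@v1"], [join])   # query A writes the file
op_r = Node("scan", ["/tmp/stage1@v1"])               # query B reads it back
agg = Node("aggregate", ["/tmp/stage1@v1"], [op_r])   # successor of OPr
inlined = inline(op_w, op_r, agg)
```

After the call, `agg` consumes the output of `join` directly, so the two trees are connected through `join` and form a single plan.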
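The third optimization opportunity is the correlated subquery (Correlation Query Subtree). It applies to logical query plan trees that have an operand overlap relationship. Suppose a query group contains a first operator and a second operator that belong to two different logical query plan trees of the same query group, and their operands overlap, that is, some or all of the data identifiers of the operands of the first operator are the same as those of the second operator. The first and second operators are then called correlated subqueries of each other.
The correlated subquery opportunity is identified first. Specifically, determine whether the data identifiers of the operands of a first operator and a second operator in a query group are the same, where the two operators belong to two different logical query plan trees of the same query group. If they are the same, the first and second operators are determined to have a correlated subquery opportunity.
After the opportunity is identified, it is optimized as follows: a merge tag is added to the first operator and the second operator, so that when the physical query plan tree is generated, the two operators are merged into the same task, which reduces the overhead of scanning the overlapping data.
The merge tag can be added in either of two ways. In the first way, the same tag information is added directly to the first and second operators; when the physical query plan is generated, two operators with the same tag information are merged into the same task. In the second way, a common successor is added to the first and second operators, and their original successors become successors of the common successor. The common successor serves as the merge tag: when the physical query plan is generated, two operators that share a common successor are merged into the same task. The common successor may be a new operator type, to distinguish it from existing operators. It performs no actual operation and only records the correspondence between the first operator and its original successor and between the second operator and its original successor.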
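The first tagging approach (identical tag information on both operators) can be sketched as follows. The class, the `merge_tag` attribute, and the task-building function are hypothetical names introduced for the example; how a real physical planner consumes the tag is not specified in the source.

```python
from collections import defaultdict
from itertools import count


class Operator:
    def __init__(self, name, operands):
        self.name = name
        self.operands = set(operands)   # data identifiers
        self.merge_tag = None


_tags = count(1)


def tag_if_correlated(a, b):
    """Give both operators the same merge tag when their data identifiers overlap."""
    if not (a.operands & b.operands):
        return False
    tag = a.merge_tag or b.merge_tag or next(_tags)
    a.merge_tag = b.merge_tag = tag
    return True


def plan_tasks(operators):
    """Operators sharing a tag go into one task; untagged operators stay alone."""
    tasks = defaultdict(list)
    for op in operators:
        key = ("tag", op.merge_tag) if op.merge_tag else ("solo", op.name)
        tasks[key].append(op.name)
    return sorted(tasks.values())


scan_a = Operator("scanA", ["orders@v3"])
scan_b = Operator("scanB", ["orders@v3", "users@v1"])
scan_c = Operator("scanC", ["items@v2"])
tag_if_correlated(scan_a, scan_b)
tag_if_correlated(scan_a, scan_c)
tasks = plan_tasks([scan_a, scan_b, scan_c])
```

Here `scanA` and `scanB` share the identifier `orders@v3` and end up in one task, while `scanC` stays in its own task. The sketch does not reconcile two operators that already carry different tags; a fuller version would need to merge the two tag sets.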
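Step 104: Generate multiple corresponding physical query plan trees according to the optimized logical query plan trees.
After inter-query optimization is performed on the logical query plan trees in each query group, the inter-query optimized logical query plan trees are obtained, and the physical query plan trees are generated from them. Generating a physical query plan from a logical query plan tree is prior art and is not described in detail here.
Step 105: Execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
In this embodiment, multiple logical query plan trees of multiple query statements are obtained and divided into multiple query groups according to the operators of each tree, where the trees in each resulting query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship. Inter-query optimization is then performed on the trees in each query group, and the corresponding physical query plan trees are generated from the optimized logical query plan trees. Finally, physical queries are executed according to the physical query plan trees to obtain the query results of the query statements. Grouping the logical query plan trees increases the optimization opportunities between trees in a query group, and the inter-query optimization inside each group improves the efficiency of batch data queries.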
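FIG. 2 is a flowchart of a batch data query method according to Embodiment 2 of the present invention. This embodiment mainly describes step 102 of Embodiment 1 in detail. As shown in FIG. 2, the method in this embodiment may include the following steps:
Step 201: Obtain multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree, each of the logical query plan trees includes multiple operators, each operator of each logical query plan tree includes at least one operand, each operand has a data identifier, and identical operands have the same data identifier.
For the specific implementation of this step, refer to the related description in Embodiment 1; details are not repeated here.
Step 202: Determine the inter-query write-read dependencies of each logical query plan tree according to its operators and the data identifiers of the operands of those operators; place all trees that have an inter-query write-read dependency in a first group and all trees that do not in a second group; and divide the trees in the first group into at least one query group according to their inter-query write-read dependencies.
In one implementation, if the flow analysis result contains the data identifiers of the operands, the file sink operators in each logical query plan tree are identified first. Then, for each tree, determine whether the data identifier of each operand written by its file sink operator is the same as the data identifier of an operand of a scan operator in another tree. If so, the trees containing the file sink operator and the scan operator are determined to have an inter-query write-read dependency.
In another implementation, if the flow analysis result contains the inter-query write-read dependencies between operators from different logical query plan trees, then the trees to which each such pair of operators belongs also have an inter-query write-read dependency.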
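In this embodiment, the logical query plan trees may be numbered and their inter-query write-read dependencies determined in numerical order. For example, suppose there are 10 trees numbered 1 to 10. The dependencies of tree 1 are determined first. Check whether tree 1 contains a file sink operator whose destination path is non-empty. If it does, determine whether the data identifier of the operand written by that file sink operator is the same as the data identifier of at least one operand of a scan operator in tree 2; if so, trees 1 and 2 have an inter-query write-read dependency. Then determine in turn whether tree 1 has an inter-query write-read dependency with each of the remaining eight trees. After the dependencies of tree 1 are determined, those of tree 2 are determined, and so on for every tree. If a tree contains no file sink operator, it is skipped and the next tree is examined.
A logical query plan tree may contain multiple file sink operators, so a tree may have inter-query write-read dependencies with several other trees. After the dependencies of every tree are determined, all trees that have an inter-query write-read dependency are placed in the first group and all trees that do not are placed in the second group. The trees in the first group are then grouped according to their dependencies. For example, suppose tree 1 has an inter-query write-read dependency with both tree 2 and tree 6, and tree 6 also has one with tree 4. Trees 1 and 2 may then be placed in one query group and trees 6 and 4 in another, or trees 1, 2, 6, and 4 may all be placed in one query group. The embodiments of the present invention do not limit the specific grouping manner.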
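The first-stage grouping can be sketched as follows. The sketch picks the coarser of the two groupings mentioned above (every chain of dependencies ends up in one query group) by using a small union-find; that choice, the dictionary layout with `writes` and `reads` sets, and the identifiers are assumptions for illustration only.

```python
def dependency_groups(trees):
    """trees: {tree_id: {"writes": set of ids, "reads": set of ids}}.
    Returns (query_groups_of_first_group, second_group)."""
    parent = {t: t for t in trees}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path halving
            t = parent[t]
        return t

    dependent = set()
    for w, wt in trees.items():
        if not wt["writes"]:                # no file sink output: skip this tree
            continue
        for r, rt in trees.items():
            if r != w and wt["writes"] & rt["reads"]:
                parent[find(w)] = find(r)   # same query group
                dependent.update((w, r))

    groups = {}
    for t in sorted(dependent):
        groups.setdefault(find(t), []).append(t)
    second_group = sorted(set(trees) - dependent)
    return sorted(groups.values()), second_group


# Ten trees: tree 1 feeds trees 2 and 6, and tree 6 feeds tree 4.
trees = {i: {"writes": set(), "reads": set()} for i in range(1, 11)}
trees[1]["writes"] = {"a@v1", "b@v1"}
trees[2]["reads"] = {"a@v1"}
trees[6]["reads"] = {"b@v1"}
trees[6]["writes"] = {"c@v1"}
trees[4]["reads"] = {"c@v1"}
query_groups, second_group = dependency_groups(trees)
```

With this input, trees 1, 2, 4, and 6 form one query group and trees 3, 5, 7, 8, 9, and 10 make up the second group.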
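Through this step, the logical query plan trees that have inter-query write-read dependencies are placed in the same query group, which increases the optimization opportunities inside the query group.
Step 203: Count the number of operators of each type in each logical query plan tree in the second group; determine the operator overlap degree of each tree according to these counts; place the trees in the second group that have an operator overlap relationship in a first subgroup and the other trees in the second group in a second subgroup; and divide the trees in the first subgroup into at least one query group according to their operator overlap degrees.
Operators are of multiple types, for example, join operators, aggregate operators, and scan operators. In this embodiment, counting the number of operators of each type in each tree in the second group means counting how many times the join operator, aggregate operator, scan operator, and so on appear in each tree. The operator overlap degree of each tree is determined from these counts as follows: the per-type operator counts of each tree in the second group are compared with those of each other tree in the second group. If two trees have identical counts for every operator type, the sum of the counts is used as the operator overlap degree of the two trees in that comparison, where one comparison is the comparison between one tree in the second group and any other tree in the second group. If the counts of two trees differ for any operator type, the operator overlap degree of the two trees in that comparison is 0.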
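Continuing the example in step 202, the second group contains six logical query plan trees: trees 3, 5, 7, 8, 9, and 10. First count the number of operators of each type in these six trees; assume that only join, aggregate, and scan operators are counted. Then compare the per-type counts of tree 3 with those of the other five trees. Suppose tree 3 has 3 join operators, 5 aggregate operators, and 6 scan operators, that its counts match those of tree 8, and that they differ from those of the other four trees. Then the operator overlap degree between tree 3 and tree 8 is 3 + 5 + 6 = 14, and the operator overlap degree between tree 3 and each of the other four trees is 0. Next, determine in turn the operator overlap degree of the remaining five trees in each comparison. Suppose that in the second group only trees 3 and 8 have identical per-type counts; then trees 3 and 8 are placed in the first subgroup and in the same query group.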
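When the first subgroup contains multiple logical query plan trees, query groups are formed according to the operator overlap degrees, and trees with the same operator overlap degree are placed in one group. For example, suppose the first subgroup contains four trees, where trees 1 and 2 have identical per-type counts and an operator overlap degree of 11, and trees 3 and 4 have identical per-type counts and an operator overlap degree of 13. Trees 1 and 2 are then placed in one query group and trees 3 and 4 in another. If all four trees have identical per-type counts and therefore the same operator overlap degree, they may all be placed in one query group. In another case, the four trees have the same operator overlap degree, but the per-type counts of trees 1 and 2 are not identical to those of trees 3 and 4. For example, trees 1 and 2 each have 2 join operators, 5 scan operators, and 4 aggregate operators, while trees 3 and 4 each have 4 join operators, 5 scan operators, and 2 aggregate operators. The four trees then cannot be placed in one query group; trees 1 and 2 are placed in one query group and trees 3 and 4 in another.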
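The operator overlap degree and the grouping by per-type counts can be sketched as follows, using `collections.Counter`. The function names and the list-of-kinds representation of a tree are assumptions for the example; the data reproduce the last case above, where all four trees have degree 11 but fall into two query groups.

```python
from collections import Counter, defaultdict


def operator_overlap_degree(kinds_a, kinds_b):
    """Sum of the per-type counts if they match for every type, else 0."""
    ca, cb = Counter(kinds_a), Counter(kinds_b)
    return sum(ca.values()) if ca == cb else 0


def group_by_operator_counts(trees):
    """trees: {tree_id: list of operator kinds}.
    Returns (query_groups, second_subgroup)."""
    buckets = defaultdict(list)
    for tree_id, kinds in trees.items():
        buckets[frozenset(Counter(kinds).items())].append(tree_id)
    groups = sorted(sorted(b) for b in buckets.values() if len(b) > 1)
    rest = sorted(b[0] for b in buckets.values() if len(b) == 1)
    return groups, rest


def kinds(join, scan, aggregate):
    return ["join"] * join + ["scan"] * scan + ["aggregate"] * aggregate


trees = {1: kinds(2, 5, 4), 2: kinds(2, 5, 4),
         3: kinds(4, 5, 2), 4: kinds(4, 5, 2)}
degree_1_2 = operator_overlap_degree(trees[1], trees[2])
degree_1_3 = operator_overlap_degree(trees[1], trees[3])
groups, rest = group_by_operator_counts(trees)
```

Bucketing on the full set of per-type counts, rather than on the degree alone, keeps trees 1 and 2 apart from trees 3 and 4 even though both pairs have a degree of 11.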
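Step 204: Determine the operand overlap degree of each logical query plan tree in the second subgroup according to the data identifiers of the operands of its operators, and divide the trees in the second subgroup into at least one query group according to their operand overlap degrees.
In one implementation, the operand overlap degree of each tree in the second subgroup is determined from the data identifiers of the operands of all of its operators. Specifically, the data identifiers of the operands of the operators of each tree in the second subgroup are compared with those of each other tree in the second subgroup, to obtain the operand overlap degree of each tree in each comparison, where each comparison is the comparison between one tree in the second subgroup and any other tree in the second subgroup.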
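Continuing the foregoing example, after the trees are divided according to inter-query write-read dependencies and operator overlap relationships, the second subgroup contains the remaining trees: trees 5, 7, 9, and 10. First compare the data identifiers of the operands of the operators of tree 5 with those of tree 7. Suppose tree 5 has 6 distinct data identifiers, tree 7 has 5, and 3 of the data identifiers of tree 5 are the same as data identifiers of tree 7; the operand overlap degree between tree 5 and tree 7 is then 3. Next, determine the operand overlap degree between tree 5 and tree 9 and between tree 5 and tree 10. Then determine the operand overlap degrees of tree 7. Because the operand overlap degree between tree 5 and tree 7 has already been determined, it does not need to be determined again; only the operand overlap degrees between tree 7 and tree 9 and between tree 7 and tree 10 are needed. Finally, determine the operand overlap degree between tree 9 and tree 10.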
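After the operand overlap degree of each tree in the second subgroup in each comparison is determined, the trees are grouped by operand overlap degree, and the trees with the highest overlap degree are grouped first. For example, suppose trees 5 and 7 have the highest operand overlap degree, 5; trees 5 and 7 are then placed in one query group. If the overlap degree between tree 5 and tree 9 is also 5, and the overlap degree between tree 9 and tree 10 is 3, tree 9 may also be placed in that query group, which then contains three trees. Because only tree 10 is left, tree 10 may be placed in a query group of its own or in the same query group.
It should be noted that two pairs of trees having the same number of overlapping operands does not mean that the overlapping operands themselves are the same. Therefore, after several trees are found to have the same overlap degree, it may be further determined whether the data identifiers of their overlapping operands are the same. For example, the operand overlap degree between tree 5 and tree 7 is 5, and the data identifiers of the overlapping operands are A, B, C, D, and E. The operand overlap degree between tree 5 and tree 9 is also 5, but the data identifiers of the overlapping operands differ: they are A, B, C, E, and F. The operand overlap degree between tree 9 and tree 10 is 3. In this case, trees 5 and 7 may be placed in one query group and trees 9 and 10 in another.
In another implementation, the operand overlap degree of each tree in the second subgroup is determined from the data identifiers of the operands of its scan operators only. Specifically, the data identifiers of the operands of the scan operators of each tree in the second subgroup are compared with those of the scan operators of each other tree in the second subgroup, to obtain the operand overlap degree of each tree in each comparison, where each comparison is the comparison between one tree in the second subgroup and any other tree in the second subgroup.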
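Both ways of computing the operand overlap degree can be sketched with set intersection; a `scan_only` flag (a hypothetical parameter) switches between all operators and scan operators only. Each unordered pair of trees is compared once, as in the example above. The tree contents are invented so that tree 5 has 6 identifiers, tree 7 has 5, and they share 3.

```python
from itertools import combinations


def data_ids(tree, scan_only=False):
    """tree: list of (operator_kind, [data identifiers])."""
    return {d for kind, ids in tree
            if not scan_only or kind == "scan"
            for d in ids}


def operand_overlap_degrees(trees, scan_only=False):
    """Compare each unordered pair once; returns {(id_a, id_b): degree}."""
    ids = {t: data_ids(tree, scan_only) for t, tree in trees.items()}
    return {(a, b): len(ids[a] & ids[b])
            for a, b in combinations(sorted(ids), 2)}


trees = {
    5: [("scan", ["A", "B", "C"]), ("join", ["D", "E", "F"])],
    7: [("scan", ["A", "B"]), ("select", ["C"]), ("sort", ["X", "Y"])],
    9: [("scan", ["Z"])],
}
degrees = operand_overlap_degrees(trees)
best_pair = max(degrees, key=degrees.get)
scan_degrees = operand_overlap_degrees(trees, scan_only=True)
```

The pair with the highest degree, here trees 5 and 7, would be grouped first. Restricting the comparison to scan operators lowers the degree for the same pair because identifier C is then only counted for tree 5.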
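Step 205: Perform inter-query optimization on the logical query plan trees in each query group.
For the specific implementation of this step, refer to the related description in Embodiment 1; details are not repeated here.
Step 206: Generate multiple corresponding physical query plan trees according to the optimized logical query plan trees.
Step 207: Execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
In this embodiment, the logical query plan trees are first grouped by inter-query write-read dependency, so that trees having such a dependency are placed in the same query group. The remaining trees, which have no inter-query write-read dependency, are then divided by operator overlap relationship. Finally, the trees that remain after these two divisions are divided by operand overlap relationship. This increases the optimization opportunities between trees in a query group, and the inter-query optimization inside each group improves the efficiency of batch data queries. In addition, grouping leaves fewer logical query plans in each group, which reduces the cost and complexity of inter-query optimization and lowers the overall cost of query optimization.
Embodiment 2 describes the grouping in detail using all three criteria: operand overlap relationship, operator overlap relationship, and inter-query write-read dependency. The present invention is not limited thereto, and the trees may also be divided according to any one of the three criteria. For example, when grouping only by inter-query write-read dependency, the trees that have such a dependency are placed in one query group and the trees that do not are placed in another; if there are many trees without a dependency and that group becomes too large, those trees may be randomly divided into several query groups. When grouping only by operator overlap relationship, the trees with identical per-type operator counts are placed in one query group and the remaining trees in another. When grouping by inter-query write-read dependency and operator overlap relationship, the trees are first grouped by dependency, and the remaining trees without a dependency are grouped by operator overlap relationship. When grouping by inter-query write-read dependency and operand overlap relationship, the trees are first grouped by dependency, and the remaining trees without a dependency are grouped by operand overlap relationship. For the specific methods of grouping by inter-query write-read dependency, operand overlap degree, or operator overlap degree, refer to the related descriptions in this embodiment; details are not repeated here.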
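FIG. 3 is a flowchart of a batch data query method according to Embodiment 3 of the present invention. This embodiment differs from Embodiment 1 in that, after inter-query optimization is performed on each query group, the groups are also optimized as a whole. As shown in FIG. 3, the method in this embodiment may include the following steps: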
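Step 301: Obtain multiple logical query plan trees corresponding to multiple query statements, where each query statement corresponds to one logical query plan tree.
Step 302: Divide the multiple logical query plan trees into multiple query groups according to the operators of each logical query plan tree, where the logical query plan trees in each query group have one of the following relationships: an inter-query write-read dependency, an operator overlap relationship, or an operand overlap relationship.
Step 303: Perform inter-query optimization on the logical query plan trees in each query group.
For the specific implementation of steps 301 to 303, refer to the related descriptions in Embodiment 1 and Embodiment 2; details are not repeated here.
Step 304: Create a virtual operation node, use the virtual operation node as the root node, and use the root nodes of the inter-query optimized logical query plan trees as predecessors of the virtual operation node, to generate one virtual logical query plan tree.
The purpose of this step is to merge the inter-query optimized logical query plan trees into one logical query plan tree. Specifically, a virtual operation node is created, and a successor-predecessor relationship is established between the virtual operation node and the root node of each inter-query optimized tree; that is, the virtual operation node becomes the new root above the root nodes of those trees, thereby forming one virtual logical query plan tree. The operator type represented by the virtual operation node is a new type that differs from all existing operator types.
Step 305: Perform intra-query optimization on the virtual logical query plan tree to generate an intra-query optimized virtual logical query plan tree.
Because step 304 merges the inter-query optimized logical query plan trees into one virtual logical query plan tree, existing intra-query optimization techniques can be applied to it. Common intra-query optimization techniques include predicate pushdown (Predicate Pushdown), column pruning (Column Pruning), and job-flow correlation optimization. Whether to optimize the virtual logical query plan tree may be decided according to optimization rules or a cost-based optimization model.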
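Steps 304 to 306 can be sketched as a wrap, optimize, and unwrap sequence. Everything named here is hypothetical: the `Node` class, the `"virtual_root"` kind, and in particular `drop_noops`, which is a toy stand-in for a real intra-query pass such as predicate pushdown and is used only so that the example has something to run.

```python
class Node:
    def __init__(self, kind, inputs=()):
        self.kind = kind
        self.inputs = list(inputs)   # predecessors (run before this node)


VIRTUAL = "virtual_root"             # a kind that no real operator uses


def optimize_all(roots, intra_query_optimize):
    """Merge the trees under a virtual root, run a single-tree optimizer,
    then remove the virtual root again."""
    virtual = Node(VIRTUAL, roots)                 # step 304
    virtual = intra_query_optimize(virtual)        # step 305
    assert virtual.kind == VIRTUAL                 # the pass must keep the virtual root
    return list(virtual.inputs)                    # step 306


def drop_noops(node):
    """Toy intra-query pass: splice out single-input 'noop' nodes."""
    node.inputs = [drop_noops(i) for i in node.inputs]
    if node.kind == "noop" and len(node.inputs) == 1:
        return node.inputs[0]
    return node


q1 = Node("filesink", [Node("noop", [Node("scan")])])
q2 = Node("filesink", [Node("scan")])
optimized = optimize_all([q1, q2], drop_noops)
```

Because the virtual node has its own operator kind, an optimizer written for a single tree can process the merged tree without treating that node as a real operation, and removing it afterwards returns one root per original query.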
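Step 306: Delete the virtual operation node from the intra-query optimized virtual logical query plan tree to obtain the optimized logical query plan trees.
Step 307: Generate multiple corresponding physical query plan trees according to the optimized logical query plan trees.
Step 308: Execute physical queries according to the multiple physical query plan trees to obtain the query results of the multiple query statements.
In this embodiment, the inter-query optimized logical query plan trees are merged into one logical query plan tree, and intra-query optimization is then performed on the merged tree. The logical query plan trees are thereby further optimized, which improves the efficiency of batch queries.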
图4为本发明实施例四提供的批量数据查询装置的结构示意图,如图4所示,本实施例提供的装置包括:获取模块11、分组模块12、优化模块13、生成模块14和查询模块15。
其中,获取模块11,用于获取多条查询语句对应的多棵逻辑查询计划树,其中,每条查询语句对应一棵逻辑查询计划树,所述多棵逻辑查询计划树中的每棵逻辑查询计划树包括多个操作符,所述每棵逻辑查询计划树的每个操作符包括至少一个操作数;
分组模块12,用于根据所述每棵逻辑查询计划树的各操作符将所述多棵逻辑查询计划树划分为多个查询组,其中,每个查询组中的逻辑查询计划树之间具有下述关系:查询间写读依赖关系、操作符重叠关系或操作数重叠关系;
优化模块13,用于对所述每个查询组内的各逻辑查询计划树进行查询间优化;
生成模块14,用于根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树;
查询模块15,用于根据所述多棵物理查询计划树执行物理查询,得到所述多条查询语句的查询结果。
可选地,所述优化模块13具体用于:
判断所述每个查询组内的第一查询子树和第二查询子树的树结构是否相同,其中,所述第一查询子树和所述第二查询子树属于同一个查询组的两棵不同的逻辑查询计划树;
若所述第一查询子树和所述第二查询子树的树结构相同,判断所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识是否相同;
若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述第二查询子树之前,则将所述第二查询子树的后继连接到所述第一查询子树上,所述第二查询子树作为所述第一查询子树的新增后继,以使所述第二查询子树的后继复用所述第一查询子树的查询结果;
或者,若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述第二查询子树之后,则将所述第一查询子树的后继连接到所述第二查询子树上,所述第一查询子树的后继作为所述第二查询子树的新增后继,以使所述第一查询子树的后继复用所述第二查询子树的查询结果。
可选地,所述优化模块13具体用于:
判断所述每个查询组内的一对文件定值操作符和扫描操作符对应的操作数的数据标识是否相同,其中,所述扫描操作符和所述文件定值操作符属于同一个查询组内的两棵不同的逻辑查询计划树;
若所述文件定值操作符和所述扫描操作符对应的操作数的数据标识相同,则确定所述文件定值操作符和所述扫描操作符所属的逻辑查询计划树具有写读依赖关系,将所述文件定值操作符的前驱与所述扫描操作符的后继直接连接,并删除所述扫描操作符。
可选地,所述优化模块13具体用于:
判断所述每个查询组内的第一操作符和第二操作符的操作数的数据标识是否相同,其中,所述第一操作符和所述第二操作符属于同一个查询组的两 棵不同的逻辑查询计划树;
若所述第一操作符和所述第二操作符的操作数的数据标识相同,则为所述第一操作符和所述第二操作符添加合并标识,所述合并标识用于在生成物理查询计划树时将所述第一操作符和所述第二操作符进行合并。
本实施例的装置,可用于执行方法实施例一的方法,具体实现方式和技术效果类似,这里不再赘述。
图5为本发明实施例五提供的批量数据查询装置的结构示意图,本实施例的装置在图4所示装置结构的基础上,进一步地,所述分组模块12包括:第一分组单元121、第二分组单元122和第三分组单元123。
其中,第一分组单元121,用于根据所述每棵逻辑查询计划树的各操作符和所述每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述每棵逻辑查询计划树的查询间写读依赖关系,将所有具有查询间写读依赖关系的逻辑查询计划树划分到第一分组中,将所有不具有查询间写读依赖关系的逻辑查询计划树划分到第二分组中;
所述第一分组单元121,还用于根据所述第一分组中的每棵逻辑查询计划树的查询间写读依赖关系将所述第一分组中的逻辑查询计划树划分为至少一个查询组;
第二分组单元122,用于统计所述第二分组中的每棵逻辑查询计划树的各类操作符的数量,根据所述每棵逻辑查询计划树的各类操作符的数量,确定所述每棵逻辑查询计划树的操作符重叠度,将所述第二分组中具有操作符重叠关系的逻辑查询计划树划分到第一子分组中,将所述第二分组中除所述第一子分组中的逻辑查询计划树之外的其他逻辑查询计划树划分到第二子分组中;
所述第二分组单元122,还用于根据所述第一子分组中每棵逻辑查询计划树的操作符重叠度将所述第一子分组中逻辑查询计划树划分为至少一个查询组;
第三分组单元123,用于根据所述第二子分组中的每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述第二子分组中的每棵逻辑查询计划的操作数重叠度,根据所述第二子分组中的每棵逻辑查询计划的操作数重叠度,将所述第二子分组中的逻辑查询计划树划分为至少一个查询组。
可选地,所述第一分组单元121具体用于:
分别判断所述每棵逻辑查询计划树的文件定值操作符所定值的各操作数的数据标识,是否与所述多棵逻辑查询计划树中的其他逻辑查询计划树的扫描操作符的各操作数的数据标识相同;
若相同,则确定包含所述文件定值操作符和所述扫描操作符的逻辑查询计划树具有查询间写读依赖关系。
可选地,所述第二分组单元122具体用于:
将所述第二分组中的每棵逻辑查询计划树的各类操作符的数量分别与所述第二分组中的其他逻辑查询计划树的各类操作符的数量进行比较,若所述第二分组中的两棵逻辑查询计划树的各类操作符的数量均相同,则将所述各类操作符的数量的总和分别作为所述两棵逻辑查询计划树在一次比较过程中的操作符重叠度,其中,所述一次比较过程包括所述第二分组中的一棵逻辑查询计划树与其他任意一棵逻辑查询计划树的比较过程。
进一步地,所述优化模块13还用于:
创建一个虚拟操作节点,以所述虚拟操作节点为根节点,将查询间优化后的多棵逻辑查询计划树的根节点作为所述虚拟操作节点的前驱,生成一棵虚拟逻辑查询计划树;
对所述虚拟逻辑查询计划树进行查询内优化,生成查询内优化后的虚拟逻辑查询计划树;
删除所述查询内优化后的虚拟逻辑查询计划树的所述虚拟操作节点,得到所述优化后的多棵逻辑查询计划树。
本实施例的装置,可以用于执行实施例二和实施例三的方法,其实现原理和技术效果类似,此处不再赘述。
图6为本发明实施例六提供的批量数据查询装置的结构示意图,如图6所示,本实施例的批量数据查询装置300包括:处理器31、存储器32和系统总线33,所述处理器31和所述存储器32之间通过所述系统总线33连接并完成相互间的通信;所述存储器32,用于存储计算机执行指令321;所述处理器31,用于运行所述计算机执行指令321,以执行如下所述的方法:
获取多条查询语句对应的多棵逻辑查询计划树,其中,每条查询语句对应一棵逻辑查询计划树,所述多棵逻辑查询计划树中的每棵逻辑查询计划树 包括多个操作符,所述每棵逻辑查询计划树的每个操作符包括至少一个操作数;
根据所述每棵逻辑查询计划树的各操作符将所述多棵逻辑查询计划树划分为多个查询组,其中,每个查询组中的逻辑查询计划树之间具有下述关系:查询间写读依赖关系、操作符重叠关系或操作数重叠关系;
对所述每个查询组内的各逻辑查询计划树进行查询间优化;
根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树;
根据所述多棵物理查询计划树执行物理查询,得到所述多条查询语句的查询结果。
可选地,所述根据所述每棵逻辑查询计划树的各操作符将所述多棵逻辑查询计划树划分为多个查询组,包括:
根据所述每棵逻辑查询计划树的各操作符和所述每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述每棵逻辑查询计划树的查询间写读依赖关系,将所有具有查询间写读依赖关系的逻辑查询计划树划分到第一分组中,将所有不具有查询间写读依赖关系的逻辑查询计划树划分到第二分组中;
根据所述第一分组中的每棵逻辑查询计划树的查询间写读依赖关系将所述第一分组中的逻辑查询计划树划分为至少一个查询组;
统计所述第二分组中的每棵逻辑查询计划树的各类操作符的数量,根据所述每棵逻辑查询计划树的各类操作符的数量,确定所述每棵逻辑查询计划树的操作符重叠度,将所述第二分组中具有操作符重叠关系的逻辑查询计划树划分到第一子分组中,将所述第二分组中除所述第一子分组中的逻辑查询计划树之外的其他逻辑查询计划树划分到第二子分组中;
根据所述第一子分组中每棵逻辑查询计划树的操作符重叠度将所述第一子分组中逻辑查询计划树划分为至少一个查询组;
根据所述第二子分组中的每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述第二子分组中的每棵逻辑查询计划的操作数重叠度,根据所述第二子分组中的每棵逻辑查询计划的操作数重叠度,将所述第二子分组中的逻辑查询计划树划分为至少一个查询组。
本实施例中,所述根据每棵逻辑查询计划树的各操作符和所述每棵逻辑 查询计划树的各操作符的各操作数的数据标识,确定所述每棵逻辑查询计划树的查询间写读依赖关系,包括:
分别判断所述每棵逻辑查询计划树的文件定值操作符所定值的各操作数的数据标识,是否与所述多棵逻辑查询计划树中的其他逻辑查询计划树的扫描操作符的各操作数的数据标识相同;
若相同,则确定包含所述文件定值操作符和所述扫描操作符的逻辑查询计划树具有查询间写读依赖关系。
本实施例中,所述根据所述每棵逻辑查询计划树的各类操作符的数量,确定所述每棵逻辑查询计划树的操作符重叠度,包括:
将所述第二分组中的每棵逻辑查询计划树的各类操作符的数量分别与所述第二分组中的其他逻辑查询计划树的各类操作符的数量进行比较,若所述第二分组中的两棵逻辑查询计划树的各类操作符的数量均相同,则将所述各类操作符的数量的总和分别作为所述两棵逻辑查询计划树在一次比较过程中的操作符重叠度,其中,所述一次比较过程包括所述第二分组中的一棵逻辑查询计划树与其他任意一棵逻辑查询计划树的比较过程。
可选地,所述对所述每个查询组内的逻辑查询计划树进行查询间优化,包括:
判断所述每个查询组内的第一查询子树和第二查询子树的树结构是否相同,其中,所述第一查询子树和所述第二查询子树属于同一个查询组的两棵不同的逻辑查询计划树;
若所述第一查询子树和所述第二查询子树的树结构相同,判断所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识是否相同;
若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述第二查询子树之前,则将所述第二查询子树的后继连接到所述第一查询子树上,所述第二查询子树作为所述第一查询子树的新增后继,以使所述第二查询子树的后继复用所述第一查询子树的查询结果;
或者,若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序 在所述第二查询子树之后,则将所述第一查询子树的后继连接到所述第二查询子树上,所述第一查询子树的后继作为所述第二查询子树的新增后继,以使所述第一查询子树的后继复用所述第二查询子树的查询结果。
可选地,所述对所述每个查询组内的逻辑查询计划树进行查询间优化,包括:
判断所述每个查询组内的一对文件定值操作符和扫描操作符对应的操作数的数据标识是否相同,其中,所述扫描操作符和所述文件定值操作符属于同一个查询组内的两棵不同的逻辑查询计划树;
若所述文件定值操作符和所述扫描操作符对应的操作数的数据标识相同,则确定所述文件定值操作符和所述扫描操作符所属的逻辑查询计划树具有写读依赖关系,将所述文件定值操作符的前驱与所述扫描操作符的后继直接连接,并删除所述扫描操作符。
可选地,所述对所述每个查询组内的逻辑查询计划树进行查询间优化,包括:
判断所述每个查询组内的第一操作符和第二操作符的操作数的数据标识是否相同,其中,所述第一操作符和所述第二操作符属于同一个查询组的两棵不同的逻辑查询计划树;
若所述第一操作符和所述第二操作符的操作数的数据标识相同,则为所述第一操作符和所述第二操作符添加合并标识,所述合并标识用于在生成物理查询计划树时将所述第一操作符和所述第二操作符进行合并。
可选地,所述根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树之前,所述方法还包括:
创建一个虚拟操作节点,以所述虚拟操作节点为根节点,将查询间优化后的多棵逻辑查询计划树的根节点作为所述虚拟操作节点的前驱,生成一棵虚拟逻辑查询计划树;
对所述虚拟逻辑查询计划树进行查询内优化,生成查询内优化后的虚拟逻辑查询计划树;
删除所述查询内优化后的虚拟逻辑查询计划树的所述虚拟操作节点,得到所述优化后的多棵逻辑查询计划树。
本实施例的装置,可用于执行实施例一至实施例三的任一方法实施例的 技术方案,其实现原理和技术效果类似,此处不再赘述。
本领域普通技术人员可以理解:实现上述各方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成。前述的程序可以存储于一计算机可读取存储介质中。该程序在执行时,执行包括上述各方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上各实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述各实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (16)

  1. 一种批量数据查询方法,其特征在于,包括:
    获取多条查询语句对应的多棵逻辑查询计划树,其中,每条查询语句对应一棵逻辑查询计划树,所述多棵逻辑查询计划树中的每棵逻辑查询计划树包括多个操作符,所述每棵逻辑查询计划树的每个操作符包括至少一个操作数;
    根据所述每棵逻辑查询计划树的各操作符将所述多棵逻辑查询计划树划分为多个查询组,其中,每个查询组中的逻辑查询计划树之间具有下述关系:查询间写读依赖关系、操作符重叠关系或操作数重叠关系;
    对所述每个查询组内的各逻辑查询计划树进行查询间优化;
    根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树;
    根据所述多棵物理查询计划树执行物理查询,得到所述多条查询语句的查询结果。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述每棵逻辑查询计划树的各操作符将所述多棵逻辑查询计划树划分为多个查询组,包括:
    根据所述每棵逻辑查询计划树的各操作符和所述每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述每棵逻辑查询计划树的查询间写读依赖关系,将所有具有查询间写读依赖关系的逻辑查询计划树划分到第一分组中,将所有不具有查询间写读依赖关系的逻辑查询计划树划分到第二分组中;
    根据所述第一分组中的每棵逻辑查询计划树的查询间写读依赖关系将所述第一分组中的逻辑查询计划树划分为至少一个查询组;
    统计所述第二分组中的每棵逻辑查询计划树的各类操作符的数量,根据所述每棵逻辑查询计划树的各类操作符的数量,确定所述每棵逻辑查询计划树的操作符重叠度,将所述第二分组中具有操作符重叠关系的逻辑查询计划树划分到第一子分组中,将所述第二分组中除所述第一子分组中的逻辑查询计划树之外的其他逻辑查询计划树划分到第二子分组中;
    根据所述第一子分组中每棵逻辑查询计划树的操作符重叠度将所述第一子分组中逻辑查询计划树划分为至少一个查询组;
    根据所述第二子分组中的每棵逻辑查询计划树的各操作符的各操作数的 数据标识,确定所述第二子分组中的每棵逻辑查询计划的操作数重叠度,根据所述第二子分组中的每棵逻辑查询计划的操作数重叠度,将所述第二子分组中的逻辑查询计划树划分为至少一个查询组。
  3. 根据权利要求2所述的方法,其特征在于,所述根据每棵逻辑查询计划树的各操作符和所述每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述每棵逻辑查询计划树的查询间写读依赖关系,包括:
    分别判断所述每棵逻辑查询计划树的文件定值操作符所定值的各操作数的数据标识,是否与所述多棵逻辑查询计划树中的其他逻辑查询计划树的扫描操作符的各操作数的数据标识相同;
    若相同,则确定包含所述文件定值操作符和所述扫描操作符的逻辑查询计划树具有查询间写读依赖关系。
  4. 根据权利要求2或3所述的方法,其特征在于,所述根据所述每棵逻辑查询计划树的各类操作符的数量,确定所述每棵逻辑查询计划树的操作符重叠度,包括:
    将所述第二分组中的每棵逻辑查询计划树的各类操作符的数量分别与所述第二分组中的其他逻辑查询计划树的各类操作符的数量进行比较,若所述第二分组中的两棵逻辑查询计划树的各类操作符的数量均相同,则将所述各类操作符的数量的总和分别作为所述两棵逻辑查询计划树在一次比较过程中的操作符重叠度,其中,所述一次比较过程包括所述第二分组中的一棵逻辑查询计划树与其他任意一棵逻辑查询计划树的比较过程。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述对所述每个查询组内的逻辑查询计划树进行查询间优化,包括:
    判断所述每个查询组内的第一查询子树和第二查询子树的树结构是否相同,其中,所述第一查询子树和所述第二查询子树属于同一个查询组的两棵不同的逻辑查询计划树;
    若所述第一查询子树和所述第二查询子树的树结构相同,判断所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识是否相同;
    若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述 第二查询子树之前,则将所述第二查询子树的后继连接到所述第一查询子树上,所述第二查询子树作为所述第一查询子树的新增后继,以使所述第二查询子树的后继复用所述第一查询子树的查询结果;
    或者,若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述第二查询子树之后,则将所述第一查询子树的后继连接到所述第二查询子树上,所述第一查询子树的后继作为所述第二查询子树的新增后继,以使所述第一查询子树的后继复用所述第二查询子树的查询结果。
  6. 根据权利要求1-4中任一项所述的方法,其特征在于,所述对所述每个查询组内的逻辑查询计划树进行查询间优化,包括:
    判断所述每个查询组内的一对文件定值操作符和扫描操作符对应的操作数的数据标识是否相同,其中,所述扫描操作符和所述文件定值操作符属于同一个查询组内的两棵不同的逻辑查询计划树;
    若所述文件定值操作符和所述扫描操作符对应的操作数的数据标识相同,则确定所述文件定值操作符和所述扫描操作符所属的逻辑查询计划树具有写读依赖关系,将所述文件定值操作符的前驱与所述扫描操作符的后继直接连接,并删除所述扫描操作符。
  7. 根据权利要求1-4中任一项所述的方法,其特征在于,所述对所述每个查询组内的逻辑查询计划树进行查询间优化,包括:
    判断所述每个查询组内的第一操作符和第二操作符的操作数的数据标识是否相同,其中,所述第一操作符和所述第二操作符属于同一个查询组的两棵不同的逻辑查询计划树;
    若所述第一操作符和所述第二操作符的操作数的数据标识相同,则为所述第一操作符和所述第二操作符添加合并标识,所述合并标识用于在生成物理查询计划树时将所述第一操作符和所述第二操作符进行合并。
  8. 根据权利要求1-7中任一项所述的方法,其特征在于,所述根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树之前,所述方法还包括:
    创建一个虚拟操作节点,以所述虚拟操作节点为根节点,将查询间优化后的多棵逻辑查询计划树的根节点作为所述虚拟操作节点的前驱,生成一棵 虚拟逻辑查询计划树;
    对所述虚拟逻辑查询计划树进行查询内优化,生成查询内优化后的虚拟逻辑查询计划树;
    删除所述查询内优化后的虚拟逻辑查询计划树的所述虚拟操作节点,得到所述优化后的多棵逻辑查询计划树。
  9. 一种批量数据查询装置,其特征在于,包括:
    获取模块,用于获取多条查询语句对应的多棵逻辑查询计划树,其中,每条查询语句对应一棵逻辑查询计划树,所述多棵逻辑查询计划树中的每棵逻辑查询计划树包括多个操作符,所述每棵逻辑查询计划树的每个操作符包括至少一个操作数;
    分组模块,用于根据所述每棵逻辑查询计划树的各操作符将所述多棵逻辑查询计划树划分为多个查询组,其中,每个查询组中的逻辑查询计划树之间具有下述关系:查询间写读依赖关系、操作符重叠关系或操作数重叠关系;
    优化模块,用于对所述每个查询组内的各逻辑查询计划树进行查询间优化;
    生成模块,用于根据优化后的多棵逻辑查询计划树生成对应的多棵物理查询计划树;
    查询模块,用于根据所述多棵物理查询计划树执行物理查询,得到所述多条查询语句的查询结果。
  10. 根据权利要求9所述的装置,其特征在于,所述分组模块包括:
    第一分组单元,用于根据所述每棵逻辑查询计划树的各操作符和所述每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述每棵逻辑查询计划树的查询间写读依赖关系,将所有具有查询间写读依赖关系的逻辑查询计划树划分到第一分组中,将所有不具有查询间写读依赖关系的逻辑查询计划树划分到第二分组中;
    所述第一分组单元,还用于根据所述第一分组中的每棵逻辑查询计划树的查询间写读依赖关系将所述第一分组中的逻辑查询计划树划分为至少一个查询组;
    第二分组单元,用于统计所述第二分组中的每棵逻辑查询计划树的各类操作符的数量,根据所述每棵逻辑查询计划树的各类操作符的数量,确定所 述每棵逻辑查询计划树的操作符重叠度,将所述第二分组中具有操作符重叠关系的逻辑查询计划树划分到第一子分组中,将所述第二分组中除所述第一子分组中的逻辑查询计划树之外的其他逻辑查询计划树划分到第二子分组中;
    所述第二分组单元,还用于根据所述第一子分组中每棵逻辑查询计划树的操作符重叠度将所述第一子分组中逻辑查询计划树划分为至少一个查询组;
    第三分组单元,用于根据所述第二子分组中的每棵逻辑查询计划树的各操作符的各操作数的数据标识,确定所述第二子分组中的每棵逻辑查询计划的操作数重叠度,根据所述第二子分组中的每棵逻辑查询计划的操作数重叠度,将所述第二子分组中的逻辑查询计划树划分为至少一个查询组。
  11. 根据权利要求10所述的装置,其特征在于,所述第一分组单元具体用于:
    分别判断所述每棵逻辑查询计划树的文件定值操作符所定值的各操作数的数据标识,是否与所述多棵逻辑查询计划树中的其他逻辑查询计划树的扫描操作符的各操作数的数据标识相同;
    若相同,则确定包含所述文件定值操作符和所述扫描操作符的逻辑查询计划树具有查询间写读依赖关系。
  12. 根据权利要求10或11所述的装置,其特征在于,所述第二分组单元具体用于:
    将所述第二分组中的每棵逻辑查询计划树的各类操作符的数量分别与所述第二分组中的其他逻辑查询计划树的各类操作符的数量进行比较,若所述第二分组中的两棵逻辑查询计划树的各类操作符的数量均相同,则将所述各类操作符的数量的总和分别作为所述两棵逻辑查询计划树在一次比较过程中的操作符重叠度,其中,所述一次比较过程包括所述第二分组中的一棵逻辑查询计划树与其他任意一棵逻辑查询计划树的比较过程。
  13. 根据权利要求9-12中任一项所述的装置,其特征在于,所述优化模块具体用于:
    判断所述每个查询组内的第一查询子树和第二查询子树的树结构是否相同,其中,所述第一查询子树和所述第二查询子树属于同一个查询组的两棵 不同的逻辑查询计划树;
    若所述第一查询子树和所述第二查询子树的树结构相同,判断所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识是否相同;
    若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述第二查询子树之前,则将所述第二查询子树的后继连接到所述第一查询子树上,所述第二查询子树作为所述第一查询子树的新增后继,以使所述第二查询子树的后继复用所述第一查询子树的查询结果;
    或者,若所述第一查询子树的操作符的操作数的数据标识与所述第二查询子树的操作符的操作数的数据标识相同,且所述第一查询子树的执行顺序在所述第二查询子树之后,则将所述第一查询子树的后继连接到所述第二查询子树上,所述第一查询子树的后继作为所述第二查询子树的新增后继,以使所述第一查询子树的后继复用所述第二查询子树的查询结果。
  14. 根据权利要求9-12中任一项所述的装置,其特征在于,所述优化模块具体用于:
    判断所述每个查询组内的一对文件定值操作符和扫描操作符对应的操作数的数据标识是否相同,其中,所述扫描操作符和所述文件定值操作符属于同一个查询组内的两棵不同的逻辑查询计划树;
    若所述文件定值操作符和所述扫描操作符对应的操作数的数据标识相同,则确定所述文件定值操作符和所述扫描操作符所属的逻辑查询计划树具有写读依赖关系,将所述文件定值操作符的前驱与所述扫描操作符的后继直接连接,并删除所述扫描操作符。
  15. 根据权利要求9-12中任一项所述的装置,其特征在于,所述优化模块具体用于:
    判断所述每个查询组内的第一操作符和第二操作符的操作数的数据标识是否相同,其中,所述第一操作符和所述第二操作符属于同一个查询组的两棵不同的逻辑查询计划树;
    若所述第一操作符和所述第二操作符的操作数的数据标识相同,则为所述第一操作符和所述第二操作符添加合并标识,所述合并标识用于在生成物 理查询计划树时将所述第一操作符和所述第二操作符进行合并。
  16. 根据权利要求9-15中任一项所述的装置,其特征在于,所述优化模块还用于:
    创建一个虚拟操作节点,以所述虚拟操作节点为根节点,将查询间优化后的多棵逻辑查询计划树的根节点作为所述虚拟操作节点的前驱,生成一棵虚拟逻辑查询计划树;
    对所述虚拟逻辑查询计划树进行查询内优化,生成查询内优化后的虚拟逻辑查询计划树;
    删除所述查询内优化后的虚拟逻辑查询计划树的所述虚拟操作节点,得到所述优化后的多棵逻辑查询计划树。
PCT/CN2015/094948 2014-11-21 2015-11-18 批量数据查询方法和装置 WO2016078592A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/600,199 US10769147B2 (en) 2014-11-21 2017-05-19 Batch data query method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410673523.2 2014-11-21
CN201410673523.2A CN105677683B (zh) 2014-11-21 2014-11-21 批量数据查询方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/600,199 Continuation US10769147B2 (en) 2014-11-21 2017-05-19 Batch data query method and apparatus

Publications (1)

Publication Number Publication Date
WO2016078592A1 true WO2016078592A1 (zh) 2016-05-26

Family

ID=56013300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/094948 WO2016078592A1 (zh) 2014-11-21 2015-11-18 批量数据查询方法和装置

Country Status (3)

Country Link
US (1) US10769147B2 (zh)
CN (1) CN105677683B (zh)
WO (1) WO2016078592A1 (zh)

Also Published As

Publication number Publication date
CN105677683B (zh) 2018-12-14
US20170255673A1 (en) 2017-09-07
CN105677683A (zh) 2016-06-15
US10769147B2 (en) 2020-09-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15860966

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15860966

Country of ref document: EP

Kind code of ref document: A1