CN116975098A - Query plan construction method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116975098A
Authority
CN
China
Prior art keywords
sub
query
expression
data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310575987.9A
Other languages
Chinese (zh)
Inventor
石志林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd filed Critical Shenzhen Tencent Computer Systems Co Ltd
Priority to CN202310575987.9A priority Critical patent/CN116975098A/en
Publication of CN116975098A publication Critical patent/CN116975098A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Abstract

The embodiment of the application discloses a query plan construction method, a query plan construction device, electronic equipment and a storage medium; the embodiment of the application can acquire the data query expression, wherein the data query expression comprises keywords; according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained; determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings; determining a target sub-expression from the sub-expression and the equivalent sub-expression; and constructing a data query plan through the target sub-expression and the association relation. In the embodiment of the application, the target sub-expressions can construct the query plan according to the association relation, so that the sub-expressions do not conflict with each other when constructing the data query plan. Therefore, the scheme can be convenient for constructing a data query plan and improves query performance.

Description

Query plan construction method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a method and apparatus for constructing a query plan, an electronic device, and a storage medium.
Background
A database management system is a software system for managing and organizing data that can assist users in creating, managing, storing, and retrieving data. When retrieving data, the database management system can expand data query through a query optimization technology, construct a query plan through the expanded data query, and query data according to the query plan.
However, the operation steps in a query plan constructed in this way may conflict with each other, so that the data returned by executing the plan may not match expectations, which impairs query efficiency.
Disclosure of Invention
The embodiment of the application provides a query plan construction method, a query plan construction device, electronic equipment and a storage medium, which can be used for conveniently constructing a data query plan and improving query efficiency.
The embodiment of the application provides a query plan construction method, which comprises the following steps:
acquiring a data query expression, wherein the data query expression comprises keywords;
according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained;
determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings;
determining a target sub-expression from the sub-expression and the equivalent sub-expression;
and constructing a data query plan through the target sub-expression and the association relation.
The embodiment of the application also provides a query plan construction device, which comprises:
the acquisition unit is used for acquiring a data query expression, wherein the data query expression comprises keywords;
the key unit is used for obtaining sub-expressions in the data query expression and the association relation between the sub-expressions according to the key words;
the equivalent unit is used for determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings;
a determining unit configured to determine a target sub-expression from the sub-expression and the equivalent sub-expression;
and the construction unit is used for constructing the data query plan through the target sub-expression and the association relation.
In some embodiments, determining the target sub-expression from the sub-expression and the equivalent sub-expression includes:
acquiring a first condition number and a second condition number, wherein the first condition number is the condition number of the data connection condition associated with the sub-expression, and the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression;
the target sub-expression is determined from the sub-expression and the equivalent sub-expression according to the first condition number and the second condition number.
In some embodiments, determining the target sub-expression from the sub-expression and the equivalent sub-expression includes:
acquiring metadata information of data to be queried associated with the sub-expressions;
acquiring the data quantity of the data to be queried from the metadata information;
the target sub-expression is determined from the sub-expression and the equivalent sub-expression according to the data amount.
In some embodiments, constructing the data query plan from the target sub-expressions and the associations includes:
acquiring a query optimization request associated with a target sub-expression from a data query expression;
constructing an initial sub-query plan through a target sub-expression;
determining a sub-query plan from the initial sub-query plan according to the query optimization request and the association relation associated with the target sub-expression;
and merging the sub-query plans to obtain the data query plan.
In some embodiments, constructing the initial sub-query plan from the target sub-expression includes:
acquiring a query operator associated with a target sub-expression;
constructing a connection relation between query operators according to the target sub-expression;
and controlling the query operators to be arranged according to the connection relation to obtain an initial sub-query plan.
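The arrangement of query operators according to their connection relation can be sketched as a topological ordering. This is only a minimal illustration, assuming that the connection relation is given as a mapping from each operator to the operators whose output it consumes; the function name `arrange_operators` and the operator labels are hypothetical and do not come from the application.

```python
from graphlib import TopologicalSorter


def arrange_operators(connections):
    """Order operators so that every operator comes after its inputs.

    `connections` maps an operator name to the set of operator names
    whose output it consumes (an assumed representation).
    """
    return list(TopologicalSorter(connections).static_order())


# Hypothetical operators for "SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a"
connections = {
    "Sort(T1.a)": {"InnerJoin(T1.a = T2.b)"},
    "InnerJoin(T1.a = T2.b)": {"Scan(T1)", "Scan(T2)"},
}
initial_sub_query_plan = arrange_operators(connections)
```

In this sketch the two scans are ordered before the join and the join before the sort; the relative order of the two scans is not fixed by the connection relation.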
In some embodiments, determining a sub-query plan from the initial sub-query plan based on the query optimization request and the association relationship associated with the target sub-expression, includes:
Acquiring metadata information of data to be queried associated with the target sub-expression according to a query optimization request associated with the target sub-expression;
analyzing the calculated amount required by the execution of the initial sub-query plan through metadata information of the data to be queried;
and determining a sub-query plan from the initial sub-query plans according to the calculated quantity and the association relation.
In some embodiments, the target sub-expression includes a first expression and a second expression, the first expression is used for querying data to be queried, the second expression is used for screening the data to be queried, and determining a sub-query plan from the initial sub-query plan according to a query optimization request and an association relation associated with the target sub-expression includes:
determining a first sub-query plan from the first initial sub-query plan according to the query optimization request associated with the first expression, wherein the first initial sub-query plan is an initial sub-query plan constructed through the first expression;
determining an unsatisfied request from the query optimization requests associated with the first expression based on the query optimization requests satisfied by the first sub-query plan;
determining a target optimization request from the query optimization requests associated with the second expression according to the unsatisfied requests and the association relation, wherein the target optimization request comprises the unsatisfied requests;
and determining a second sub-query plan from the second initial sub-query plan according to the target optimization request, wherein the second initial sub-query plan is an initial sub-query plan constructed through the second expression.
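The hand-off of unsatisfied requests from the first expression to the second expression can be sketched as follows. This is a minimal illustration under stated assumptions: query optimization requests are modelled as sets of string labels, each candidate plan advertises the requests it satisfies together with a cost, and the selection rule (most requests satisfied, then lowest cost) is an assumption made here for illustration. The names `Plan`, `pick_plan`, and `plan_pair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float
    satisfies: frozenset  # optimization requests this plan fulfils


def pick_plan(candidates, requests):
    """Assumed rule: satisfy as many requests as possible, then minimise cost."""
    return min(candidates, key=lambda p: (-len(p.satisfies & requests), p.cost))


def plan_pair(first_candidates, first_requests, second_candidates, second_requests):
    # First sub-query plan, chosen against the first expression's requests.
    first = pick_plan(first_candidates, first_requests)
    # Requests the first plan could not satisfy are passed on.
    unsatisfied = first_requests - first.satisfies
    # The target optimization request includes the unsatisfied requests.
    target_requests = second_requests | unsatisfied
    second = pick_plan(second_candidates, target_requests)
    return first, unsatisfied, second


first_candidates = [
    Plan("scan", 1.0, frozenset()),
    Plan("index_scan", 5.0, frozenset({"sorted_by_a"})),
]
second_candidates = [
    Plan("filter", 1.0, frozenset()),
    Plan("filter_then_gather", 3.0, frozenset({"singleton"})),
]
first, unsatisfied, second = plan_pair(
    first_candidates, frozenset({"sorted_by_a", "singleton"}),
    second_candidates, frozenset(),
)
```

Here the first plan satisfies only the ordering request, so the remaining request is carried into the selection of the second plan.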
In some embodiments, constructing the initial sub-query plan from the target sub-expression includes:
acquiring a historical query optimization request associated with a target sub-expression and a historical query plan corresponding to the historical query optimization request;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests;
if the target history request exists, taking a history query plan corresponding to the target history request as a sub-query plan;
if the target history request does not exist, an initial sub-query plan is constructed through the target sub-expression.
In some embodiments, obtaining a historical query optimization request associated with the target sub-expression, and a historical query plan corresponding to the historical query optimization request, includes:
acquiring a hash table associated with the target sub-expression, wherein the hash table comprises the historical query optimization request together with a first tag and a plan calling link associated with the historical query optimization request, and the first tag is a hash value calculated from the historical query optimization request by a hash function;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests includes:
determining a second tag, wherein the second tag is a hash value calculated from the query optimization request by the hash function;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests according to the first tag and the second tag;
if the target history request exists, taking a history query plan corresponding to the target history request as a sub-query plan, including:
if the target historical request exists, a plan calling link associated with the target historical request is adopted, and a historical query plan corresponding to the historical query optimization request is called as a sub-query plan.
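The reuse of historical query plans through a hash table can be sketched as follows. This is a minimal illustration, assuming that a query optimization request can be serialised to a string, that SHA-256 stands in for the unspecified hash function, and that the plan calling link is modelled as a Python callable returning the stored plan. The names `request_tag`, `PlanCache`, and `sub_query_plan` are hypothetical.

```python
import hashlib


def request_tag(request: str) -> str:
    """Tag of a request: its hash value (SHA-256 is an assumed choice)."""
    return hashlib.sha256(request.encode("utf-8")).hexdigest()


class PlanCache:
    """Hash table: first tag -> (historical request, plan calling link)."""

    def __init__(self):
        self._table = {}

    def remember(self, request, plan_link):
        self._table[request_tag(request)] = (request, plan_link)

    def lookup(self, request):
        # The second tag is computed from the incoming request and
        # compared with the stored first tags via the dictionary lookup.
        entry = self._table.get(request_tag(request))
        if entry is None or entry[0] != request:
            return None
        return entry[1]()  # follow the plan calling link


def sub_query_plan(cache, request, build_initial_plan):
    """Return (plan, reused) for the given optimization request."""
    plan = cache.lookup(request)
    if plan is not None:
        return plan, True
    plan = build_initial_plan()
    cache.remember(request, lambda: plan)
    return plan, False


cache = PlanCache()
plan_a, reused_a = sub_query_plan(cache, "order by T1.a", lambda: ["Scan", "Sort"])
plan_b, reused_b = sub_query_plan(cache, "order by T1.a", lambda: ["unused"])
```

The second call finds a matching tag and reuses the stored plan instead of building a new initial sub-query plan.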
In some embodiments, obtaining the data query expression includes:
acquiring a data query command, wherein the data query command comprises keywords;
according to the keywords, carrying out grammar analysis processing on the data query command to obtain a grammar tree of the data query command;
and carrying out format conversion processing on the grammar tree by adopting a preset data exchange language to obtain a data query expression.
In some embodiments, after constructing the data query plan by the target sub-expression, further comprising:
Determining a query instruction corresponding to the data query plan and a copy instruction of the query instruction;
and sending the copy instruction to a plurality of data storage nodes of the database so that the data storage nodes return query data according to the copy instruction.
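Sending copies of the query instruction to several data storage nodes can be sketched as a simple fan-out. This is a minimal illustration, assuming that the query instruction is a plain dictionary and that each data storage node is modelled as a callable that takes an instruction and returns rows; a real system would send the copies over a network. The name `dispatch` and the node names are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def dispatch(query_instruction, storage_nodes):
    """Send a copy of the instruction to every node and collect the results.

    `storage_nodes` maps a node name to a callable(instruction) -> rows
    (an assumed stand-in for a remote data storage node).
    """
    def run(item):
        node_name, execute = item
        # Each node receives its own copy of the query instruction.
        return node_name, execute(copy.deepcopy(query_instruction))

    with ThreadPoolExecutor(max_workers=max(1, len(storage_nodes))) as pool:
        return dict(pool.map(run, storage_nodes.items()))


# Hypothetical shards held by two nodes.
shards = {"node1": [1, 2], "node2": [2, 3]}
nodes = {
    name: (lambda ins, rows=rows: [r for r in rows if r >= ins["min"]])
    for name, rows in shards.items()
}
result = dispatch({"min": 2}, nodes)
```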
The embodiment of the application also provides electronic equipment, which comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions from the memory to perform the steps in any of the query plan construction methods provided by the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, which stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor to execute the steps in any query plan construction method provided by the embodiment of the application.
The embodiments of the present application also provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps in any of the query plan construction methods provided by the embodiments of the present application.
The embodiment of the application can acquire the data query expression, wherein the data query expression comprises keywords; according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained; determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings; determining a target sub-expression from the sub-expression and the equivalent sub-expression; and constructing a data query plan through the target sub-expression and the association relation.
In the application, the data query expression can have a plurality of sub-expressions with association relations among them, and the sub-expressions can be expanded through equivalent sub-expressions. Because an equivalent sub-expression has the same expression meaning as its sub-expression, it inherits the association relations of that sub-expression. Target sub-expressions, each of which may be a sub-expression or an equivalent sub-expression, can then be screened from the sub-expressions and the equivalent sub-expressions, and the query plan is constructed from the target sub-expressions according to the association relations. As a result, the sub-expressions do not conflict with one another when the query plan is constructed, which makes the data query plan convenient to construct and improves query performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic view of a scenario of a query plan construction method provided by an embodiment of the present application;
FIG. 1b is a flowchart illustrating a query plan construction method according to an embodiment of the present application;
FIG. 2a is a schematic diagram of a query system according to an embodiment of the present application;
FIG. 2b is a schematic block diagram of a query optimizer provided in an embodiment of the present application;
FIG. 2c is a schematic diagram of creating a sub-expression provided by an embodiment of the present application;
FIG. 2d is a schematic diagram of acquiring metadata information according to an embodiment of the present application;
FIG. 2e is a schematic diagram of a structure for constructing a data query plan according to an embodiment of the present application;
FIG. 2f is a flow chart diagram of constructing a data query plan provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a query plan construction device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides a query plan construction method, a query plan construction device, electronic equipment and a storage medium.
The query plan construction device may be integrated in an electronic device, which may be a terminal, a server, or other devices. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer (Personal Computer, PC) or the like; the server may be a single server or a server cluster composed of a plurality of servers.
In some embodiments, the query plan construction apparatus may also be integrated in a plurality of electronic devices, for example, the query plan construction apparatus may be integrated in a plurality of servers, and the query plan construction method of the present application is implemented by the plurality of servers.
In some embodiments, the server may also be implemented in the form of a terminal.
For example, referring to FIG. 1a, the electronic device may obtain a data query expression, the data query expression including keywords; according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained; determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings; determining a target sub-expression from the sub-expression and the equivalent sub-expression; and constructing a data query plan through the target sub-expression and the association relation.
The data query expression can have a plurality of sub-expressions with association relations among them, and the sub-expressions can be expanded through equivalent sub-expressions. Because an equivalent sub-expression has the same expression meaning as its sub-expression, it inherits the association relations of that sub-expression. Target sub-expressions, each of which may be a sub-expression or an equivalent sub-expression, can then be screened from the sub-expressions and the equivalent sub-expressions, and the query plan is constructed from the target sub-expressions according to the association relations. As a result, the sub-expressions do not conflict with one another when the query plan is constructed, which makes the query plan convenient to construct and improves query performance.
The following will describe in detail. The numbers of the following examples are not intended to limit the preferred order of the examples.
In this embodiment, a query plan construction method is provided, as shown in fig. 1b, and the specific flow of the query plan construction method may be as follows:
110. a data query expression is obtained, the data query expression including keywords.
The data query expression is a syntactic structure for querying and filtering data in a plurality of database management systems. For example, the language of the data query expression may be the Expression Language (EL), a query domain-specific language (Query Domain Specific Language, Query DSL), the Java Persistence Query Language (Java Persistence API Query Language, JPQL), the Spring Expression Language (SpEL) of the Spring framework, and so on.
A keyword is a special word or character in the data query expression that specifies a query condition; a data query expression may contain multiple query conditions. For example, keywords may include SELECT, which selects the columns to be queried from the database; FROM, which specifies the tables to be queried; WHERE, which specifies the filtering conditions for the query result; ORDER BY, which sorts the query result by the specified columns; and so on.
It will be appreciated that the data query expression in the EL language can interact with multiple database management systems because it does not directly access the database, but rather accesses the database through a Java database connection (Java Database Connectivity, JDBC), which is a Java interface for executing SQL statements in Java applications and interacting with multiple database management systems. In this way, applications can be easily connected to various types and versions of database management systems through data query expressions in the EL language, thereby achieving wider functionality and interoperability. In addition, the data query expression of the EL language has good portability and cross-platform performance, and can run on a plurality of platforms and application servers.
The method for acquiring the data query expression can be as follows:
(1) If the query optimizer runs outside the database management systems, one query optimizer can be used by a plurality of database management systems, after a user inputs a data query command into any one of the database management systems, any one of the database management systems can convert the data query command into a data query expression, and then the data query expression is sent to the query optimizer;
(2) If the query optimizer runs in the database management system, after a user inputs a data query command to the database management system, the database management system can convert the data query command into a data query expression, and then the data query expression is transmitted to the query optimizer;
wherein the query optimizer is a component that provides a query optimization technique.
In some embodiments, considering that the types of database management systems may include relational database management systems, non-relational database management systems, in-memory database management systems, distributed database management systems, and data warehouses, etc., to enable a query optimizer to be disposed in a plurality of different database management systems, obtaining a data query expression includes:
Acquiring a data query command, wherein the data query command comprises keywords;
according to the keywords, carrying out grammar analysis processing on the data query command to obtain a grammar tree of the data query command;
and carrying out format conversion processing on the grammar tree by adopting a preset data exchange language to obtain a data query expression.
The data query command is a command input by a user when querying data from the database management system. For example, the data query command may be a query field input by a user, a voice command of the user, a command output by a user writing a data query script when running, and so on.
The keywords correspond to query conditions in the data query command.
For example, suppose the data query command screens for data that is the same in the a-th column of table T1 and the b-th column of table T2. The query conditions in the data query command may then include querying the a-th column of table T1, querying the b-th column of table T2, and screening the data common to the two columns, and the keywords indicate these three query conditions.
The syntax tree represents query logic for data query commands, wherein the syntax tree includes a plurality of nodes representing different operations.
For example, in the syntax tree, the SELECT node represents the query operation, the column1 and column2 nodes represent the columns to be queried, the FROM node indicates which tables are queried, table1 and table2 are the names of those tables, the WHERE node represents the conditions that restrict the query result, and so on.
The preset data exchange language enables the query optimizer to interact with different types of database management systems. For example, the preset data exchange language may be the EL language, the Query DSL language, the JPQL language, and so on.
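The path from a data query command to a syntax tree and then to an exchange format can be sketched as follows. This is only a minimal illustration: it splits a restricted command on the keywords SELECT, FROM, WHERE, and ORDER BY rather than performing full grammar analysis, and it uses JSON as a stand-in for the preset language, which is an assumption made here (the application names EL, Query DSL, and JPQL as examples). The names `parse_command` and `to_expression` are hypothetical.

```python
import json
import re

KEYWORDS = ["SELECT", "FROM", "WHERE", "ORDER BY"]


def parse_command(command: str) -> dict:
    """Split a simple command on its keywords into a flat syntax tree."""
    pattern = r"\b(" + "|".join(k.replace(" ", r"\s+") for k in KEYWORDS) + r")\b"
    # re.split with a capturing group keeps the keywords in the result:
    # ['', 'SELECT', ' T1.a ', 'FROM', ' T1, T2 ', ...]
    parts = re.split(pattern, command.strip(), flags=re.IGNORECASE)
    children = []
    for keyword, body in zip(parts[1::2], parts[2::2]):
        children.append({
            "node": " ".join(keyword.upper().split()),  # normalise spacing
            "value": body.strip(),
        })
    return {"node": "QUERY", "children": children}


def to_expression(tree: dict) -> str:
    """Serialise the tree; JSON is an assumed stand-in for the preset language."""
    return json.dumps(tree, sort_keys=True)


tree = parse_command("SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a")
expression = to_expression(tree)
```

Because the serialised form does not depend on any one database dialect, the same optimizer input could in principle be produced by different database management systems.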
120. And obtaining sub-expressions in the data query expression and association relations among the sub-expressions according to the keywords.
A sub-expression implements a query condition indicated by a keyword in the data query expression. For example, a sub-expression may select columns of data from the tables to be queried in a database management system, may specify filtering conditions for the query result, may sort the query result by specified columns, and so on.
The association relationship indicates the logical order in which the sub-expressions are executed. For example, the association relationship between sub-expressions may be a parallel relationship, a progressive relationship, an AND relationship, and so on, where the AND relationship restricts multiple query conditions to the same query, so that data is returned only when all the conditions are satisfied at the same time.
For example, suppose the data query expression is "SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a". The expression joins tables T1 and T2 with the join condition T1.a = T2.b, meaning that only rows in which the value of column a in table T1 equals the value of column b in table T2 are returned; the rows filtered by the join condition are then sorted by the value of column T1.a.
By locating the keywords SELECT, FROM, and WHERE in the data query expression, that is, the fragments "SELECT T1.a", "FROM T1, T2", and "WHERE T1.a = T2.b", sub-expression 1, sub-expression 2, and sub-expression 3 can be obtained. Sub-expression 1 retrieves the column a data of table T1, sub-expression 2 retrieves the column b data of table T2, and sub-expression 3 indicates an inner join (Inner Join (T1.a = T2.b)) between table T1 and table T2, which filters the query results of sub-expression 1 and sub-expression 2.
Sub-expression 1 and sub-expression 2 are in a parallel relationship, and each of them is in an AND relationship with sub-expression 3; that is, after sub-expression 1 and sub-expression 2 have each obtained their query results, those results are filtered through sub-expression 3.
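The three sub-expressions and their association relations in this example can be sketched as follows. This is a minimal illustration that handles only a query with a single equality join condition; the data structures and the names `SubExpression` and `split_query` are hypothetical, and the relation labels "parallel" and "and" are simply strings chosen here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SubExpression:
    name: str
    op: str   # "Get" or "InnerJoin" in this sketch
    arg: str


QUERY_PATTERN = re.compile(
    r"\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<tables>.+?)\s+WHERE\s+(?P<cond>.+?)"
    r"(?:\s+ORDER\s+BY\s+(?P<order>.+))?\s*$",
    flags=re.IGNORECASE,
)


def split_query(query: str):
    """Return (sub_expressions, relations) for a single-equality join query."""
    match = QUERY_PATTERN.match(query)
    if match is None:
        raise ValueError("unsupported query shape for this sketch")
    cond = match.group("cond")
    # One Get sub-expression per column referenced by the join condition.
    columns = [c.strip() for c in cond.split("=")]
    gets = [SubExpression(f"sub{i + 1}", "Get", col) for i, col in enumerate(columns)]
    join = SubExpression(f"sub{len(gets) + 1}", "InnerJoin", cond)
    relations = []
    for i, left in enumerate(gets):
        for right in gets[i + 1:]:
            relations.append((left.name, "parallel", right.name))
    for get in gets:
        relations.append((get.name, "and", join.name))
    return gets + [join], relations


subs, rels = split_query("SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a")
```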
130. Determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings.
Equivalent sub-expressions may be used in place of sub-expressions, with the same query result as sub-expressions, but with different query logic, where the sub-expressions correspond to one or more equivalent sub-expressions.
Equivalent expression meanings are used to define that the equivalent sub-expression and the sub-expression have the same semantics, so that the query result of the equivalent sub-expression is the same as the query result of the sub-expression.
For example, if the sub-expression reads (Get) the a-th column data of table T1, an equivalent sub-expression may scan (Scan) the a-th column data of table T1, or may scan the a-th column data of table T1 and then re-allocate the scanned data, and so on; the sub-expression and each equivalent sub-expression all retrieve the a-th column data of table T1.
In some embodiments, to facilitate obtaining the equivalent sub-expression, determining the equivalent sub-expression includes:
acquiring a preset expression set, wherein the preset expression set comprises a plurality of preset expressions, and the plurality of preset expressions have equivalent expression meanings;
And determining the equivalent sub-expression from the preset expression set according to the sub-expression.
The preset expression set is a set of a plurality of preset expressions with equivalent expression meanings.
For example, the preset expression set records preset expressions with different query logic. Because every preset expression in the set has the same expression meaning, each yields the same query result, so the equivalent sub-expressions of a sub-expression can be conveniently determined from the set.
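Looking up equivalent sub-expressions in preset expression sets can be sketched as follows. This is a minimal illustration, assuming that sub-expressions are identified by an operator label and that each preset set is a group of mutually equivalent labels; the contents of the sets and the name `equivalent_sub_expressions` are hypothetical.

```python
# Each frozenset groups operator labels assumed to have the same meaning.
PRESET_EXPRESSION_SETS = [
    frozenset({"Get", "Scan", "Scan+Reallocate"}),
    frozenset({"InnerJoin", "InnerHashJoin"}),
]


def equivalent_sub_expressions(op: str) -> list:
    """Return the other members of the preset set that contains `op`."""
    for group in PRESET_EXPRESSION_SETS:
        if op in group:
            return sorted(group - {op})
    return []  # no preset set records this operator


equivalents_of_get = equivalent_sub_expressions("Get")
```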
140. The target sub-expression is determined from the sub-expression and the equivalent sub-expression.
The target sub-expression is an expression with query cost meeting preset conditions in the sub-expression and the equivalent sub-expression. For example, the target sub-expression may be an expression with the smallest query cost among the sub-expressions and the equivalent sub-expressions, or any one of the expressions with the query cost satisfying a preset condition, and so on.
In some embodiments, a sub-expression may indicate a table join, which combines data from two or more tables so that information can be queried across them. To select, from the sub-expression and the equivalent sub-expression, the one with the lower query cost, determining the target sub-expression from the sub-expression and the equivalent sub-expression includes:
Acquiring a first condition number and a second condition number, wherein the first condition number is the condition number of the data connection condition associated with the sub-expression, and the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression;
the target sub-expression is determined from the sub-expression and the equivalent sub-expression according to the first condition number and the second condition number.
The data connection condition specifies the connection relationship between at least two tables so that the data in those tables can be combined; data connection conditions may be built from comparison operators (=, <=, etc.) and combined with logical operators (AND, OR, etc.).
It is understood that the number of data connection conditions may be a plurality.
For example, assume that there are two tables of employee data (employee) and department data (department) tables, which contain employee and department information, respectively, the employee table and the department table are connected using an Inner Join, and two data connection conditions are used: employee department identification (employee_id) =department identification (division_id), and employee city (employee.city) =department city (division.city), so that it is ensured that only employees and department records in the same city and in the same department are returned, and required employee names, department names, and address information are acquired.
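The two-condition Inner Join above can be tried out with Python's built-in `sqlite3` module. This is only an illustrative sketch: the column names (`department_id`, `address`) and the sample rows are assumptions chosen for the demonstration and do not come from the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, department_id INTEGER, city TEXT);
CREATE TABLE department (department_id INTEGER, name TEXT, city TEXT, address TEXT);
INSERT INTO employee VALUES
    ('Ann', 1, 'Shenzhen'), ('Bob', 1, 'Beijing'), ('Cid', 2, 'Beijing');
INSERT INTO department VALUES
    (1, 'Ads', 'Shenzhen', 'Road 1'), (2, 'Finance', 'Beijing', 'Road 2');
""")

# Inner Join with two data connection conditions combined by AND:
# same department AND same city.
rows = conn.execute("""
SELECT e.name, d.name, d.address
FROM employee AS e
INNER JOIN department AS d
  ON e.department_id = d.department_id AND e.city = d.city
ORDER BY e.name
""").fetchall()
conn.close()
```

In this sample data, the employee whose city differs from the department's city is excluded by the second condition even though the department identification matches.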
The first condition number is the number of data connection conditions through which the sub-expression expresses the connection relationship between at least two tables.
The second condition number is the number of data connection conditions through which the equivalent sub-expression expresses the connection relationship between at least two tables.
For example, an Inner Join generally refers to an operation that connects two tables based on a comparison operator (e.g., equal to, "="); that is, corresponding rows are connected depending on whether the corresponding column values in the two tables are equal. Therefore, an Inner Join operation generally has only one or a few data connection conditions.
An Inner Hash Join is a hash-table-based connection method that joins tables and performs well when handling large-scale data. Its main steps are as follows: the data in the two tables to be connected is stored in hash tables in memory, with a hash function mapping the data into the corresponding hash buckets; then, for the data in each hash bucket, the rows are connected by comparing the connection conditions between them.
Since the Inner Hash Join needs to build an additional data structure such as a Hash table, more memory and computing resources are usually required in practical situations.
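The bucket-then-compare steps described for the Inner Hash Join can be sketched as follows. This is a simplified, single-key, in-memory illustration; the function name, the row representation (dictionaries), and the bucket count are assumptions, and a production engine would typically build the hash table on one input and probe it with the other.

```python
from collections import defaultdict


def inner_hash_join(left_rows, right_rows, left_key, right_key, num_buckets=8):
    """Join two lists of dict rows on left_key == right_key via hash buckets."""
    left_buckets = defaultdict(list)
    right_buckets = defaultdict(list)
    # Map the rows of both tables into hash buckets with the same hash function.
    for row in left_rows:
        left_buckets[hash(row[left_key]) % num_buckets].append(row)
    for row in right_rows:
        right_buckets[hash(row[right_key]) % num_buckets].append(row)

    joined = []
    # Only rows that landed in the same bucket are compared against each other.
    for bucket_id, bucket_rows in left_buckets.items():
        for left in bucket_rows:
            for right in right_buckets.get(bucket_id, []):
                if left[left_key] == right[right_key]:
                    joined.append((left, right))
    return joined
```

The two bucket dictionaries are the extra data structures that account for the additional memory use mentioned above.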
The target sub-expression may be determined from the sub-expression and the equivalent sub-expression according to the first condition number and the second condition number as follows:
When the amount of data in the table is small, the smaller of the first condition number and the second condition number is selected. If the smaller one is the first condition number, the sub-expression is taken as the target sub-expression; if the smaller one is the second condition number, the equivalent sub-expression corresponding to the second condition number is taken as the target sub-expression.
When large-scale data is recorded in the table, the larger of the first condition number and the second condition number is selected. If the larger one is the first condition number, the sub-expression is taken as the target sub-expression; if the larger one is the second condition number, the equivalent sub-expression corresponding to the second condition number is taken as the target sub-expression. In this way, the data query can be executed efficiently, with better performance when processing large-scale data.
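The selection rule above can be sketched as a small function. The `JoinExpression` type, the row-count threshold that separates "small" from "large-scale" data, and the tie-breaking choice (keep the original sub-expression) are all assumptions made for illustration; the text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinExpression:
    name: str
    conditions: tuple  # data connection conditions, e.g. ("e.dept = d.dept",)


LARGE_TABLE_ROWS = 1_000_000  # hypothetical boundary for "large-scale" data


def choose_target(sub_expression, equivalent, table_rows,
                  large_threshold=LARGE_TABLE_ROWS):
    """Pick the target sub-expression from the two condition numbers."""
    first = len(sub_expression.conditions)   # first condition number
    second = len(equivalent.conditions)      # second condition number
    if table_rows < large_threshold:
        # Small data: prefer the expression with fewer connection conditions.
        return sub_expression if first <= second else equivalent
    # Large-scale data: prefer the expression with more connection conditions.
    return sub_expression if first >= second else equivalent
```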
In some embodiments, the same data set may be queried with multiple expressions, each having different query logic and therefore a different query cost. In order to obtain the expression with the lower query cost from the sub-expression and the equivalent sub-expression, determining the target sub-expression from the sub-expression and the equivalent sub-expression includes:
Acquiring metadata information of data to be queried associated with the sub-expressions;
acquiring the data quantity of the data to be queried from the metadata information;
the target sub-expression is determined from the sub-expression and the equivalent sub-expression according to the data amount.
The data to be queried is the data in the table that the sub-expression needs to query. For example, if the sub-expression is used to query table A, the data to be queried is the data in table A.
The metadata information is data information describing the data to be queried.
For example, the metadata information describes the data to be queried, including information such as the format, scale, encoding mode, and access protocol of the data.
The data amount is the size of the data to be queried, usually expressed as a number of records or a file size. For example, the data amount of the data to be queried can be acquired from the metadata information.
For example, the sub-expression is Get(T1) and the equivalent sub-expression is Scan(T1); both Get(T1) and Scan(T1) are used to query the data to be queried in table T1.
Scenarios in which the sub-expression Get(T1) is used may include:
1. a single record of a designated row key needs to be acquired;
2. only a small amount of data records need to be queried, such as basic information of a certain user and the like;
3. Multiple version records of a row need to be queried.
Scenarios in which the sub-expression Scan(T1) is used may include:
1. a large number of data records, such as batch statistical analysis data, need to be queried;
2. filtering or sorting according to specific conditions, such as time, numerical value and the like;
3. multiple rows of data within an entire table or specified range need to be traversed, such as data backup, archiving, etc.
If the data amount is small, the sub-expression Get(T1) may be adopted as the target sub-expression; if the data amount is large, the equivalent sub-expression Scan(T1) may be adopted as the target sub-expression.
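A minimal sketch of this data-amount rule is shown below. The metadata layout (a `record_count` field per table) and the threshold value are assumptions for illustration only.

```python
def choose_access_expression(table, metadata, small_threshold=1_000):
    """Choose Get(...) for small data amounts and Scan(...) for large ones."""
    # The data amount is read from the metadata information of the table.
    record_count = metadata[table]["record_count"]
    if record_count <= small_threshold:
        return f"Get({table})"
    return f"Scan({table})"
```

For instance, with `metadata = {"T1": {"record_count": 50}}` the sketch selects `Get(T1)`, while a table with millions of records would be read with `Scan(T1)`.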
150. A data query plan is constructed through the target sub-expression and the association relationship.
The data query plan is an executable data query scheme constructed from the target sub-expressions according to the association relationship.
For example, if the data query expression contains three sub-expressions, namely sub-expression 1, sub-expression 2, and sub-expression 3, the association relationship between sub-expression 1 and sub-expression 2 may be a parallel relationship, and the association relationships between sub-expression 1 and sub-expression 3 and between sub-expression 2 and sub-expression 3 may each be an AND relationship.
The target sub-expression may be a sub-expression or an equivalent sub-expression. That is, target sub-expression 1 is determined from sub-expression 1 and equivalent sub-expression 1, target sub-expression 2 is determined from sub-expression 2 and equivalent sub-expression 2, target sub-expression 3 is determined from sub-expression 3 and equivalent sub-expression 3, and each target sub-expression has an association relationship with the other target sub-expressions.
Because an equivalent sub-expression and its sub-expression have equivalent expression meanings, the association relationship between equivalent sub-expression 1 and sub-expression 2 or equivalent sub-expression 2 is a parallel relationship; the association relationship between equivalent sub-expression 1 and sub-expression 3 or equivalent sub-expression 3 may be an AND relationship; and the association relationship between equivalent sub-expression 2 and sub-expression 3 or equivalent sub-expression 3 may be an AND relationship.
If the association relationship between target sub-expression 1 and target sub-expression 2 is a parallel relationship, and the association relationships of target sub-expression 1 and target sub-expression 2 with target sub-expression 3 are each an AND relationship, then target sub-expression 1 and target sub-expression 2 each independently construct a query method according to the parallel relationship between them, and a query scheme for screening the query results of target sub-expression 1 and target sub-expression 2 is constructed according to the association relationship between target sub-expression 1 and target sub-expression 3 and the association relationship between target sub-expression 2 and target sub-expression 3.
In some embodiments, considering that the sub-expressions can construct a plurality of query plans with the same query function, in order to screen a query plan with low query cost from the plurality of query plans, constructing a data query plan through the target sub-expressions and the association relation includes:
Acquiring a query optimization request associated with a target sub-expression from a data query expression;
constructing an initial sub-query plan through a target sub-expression;
determining a sub-query plan from the initial sub-query plan according to the query optimization request and the association relation associated with the target sub-expression;
and merging the sub-query plans to obtain the data query plan.
The query optimization request is a request formed from the query conditions associated with the target sub-expression in the data query expression. The query conditions may be the data distribution, index type, and the like associated with the target sub-expression in the data query expression, and the query optimization request can be used to optimize the query plan constructed from the target sub-expression.
For example, the query optimization request may be a request for quickly acquiring data to be queried, a request for reducing query costs, a request for satisfying a certain range of query speeds and query costs, and so on.
The initial sub-query plans are the unoptimized sub-query plans constructed for the target sub-expression. For example, the unoptimized sub-query plans include sub-query plans with high query cost, sub-query plans with low query cost, sub-query plans with slow query speed, sub-query plans with fast query speed, and so forth.
A sub-query plan is an initial sub-query plan that is selected from the initial sub-query plans according to the association relationship and that meets the query optimization request.
For example, the data query expression includes target sub-expression 1, target sub-expression 2, and target sub-expression 3. Target sub-expression 1 is used to query table T1, target sub-expression 2 is used to query the data of column b in table T2, and target sub-expression 3 is used to screen the data obtained by target sub-expression 1 and target sub-expression 2. The data query expression further requires sorting according to column a of table T1. The association relationship between target sub-expression 1 and target sub-expression 2 is a parallel relationship, and the association relationships of target sub-expression 1 and target sub-expression 2 with target sub-expression 3 are each an AND relationship.
The query optimization request associated with target sub-expression 1 may be used to cause target sub-expression 1 to quickly query table T1 and to cause the data queried from table T1 to be sorted according to column a of the table. The query optimization request associated with target sub-expression 2 may be used to cause target sub-expression 2 to quickly query the data of column b in table T2. The query optimization request associated with target sub-expression 3 may be used to merge the sorted screening data (sorted according to column a of table T1) that target sub-expression 3 collects from all database nodes.
The initial sub-query plans constructed by target sub-expression 1 may include Get(T1), Scan(T1), and Scan(T1) + Sort(T1.a), and the initial sub-query plans constructed by target sub-expression 2 may include Scan(T2) + Redistribute(T2.b) and Get(T2.b).
Determining the sub-query plan from the initial sub-query plans according to the query optimization request and the association relationship associated with the target sub-expression may proceed as follows:
Because of the parallel relationship, the initial sub-query plans constructed by target sub-expression 1 and target sub-expression 2 can be screened independently. Scan(T1) can be screened from the initial sub-query plans Get(T1), Scan(T1), and Scan(T1) + Sort(T1.a), and Scan(T2) + Redistribute(T2.b) can be screened from the initial sub-query plans Scan(T2) + Redistribute(T2.b) and Get(T2.b) through the query optimization request associated with target sub-expression 2.
Because the association relationships of target sub-expression 1 and target sub-expression 2 with target sub-expression 3 are each an AND relationship, and because target sub-expression 1 does not sort the data in table T1 according to column a of the table, the initial sub-query plan constructed by target sub-expression 3 may sort the data screened by Inner Hash Join(T1.a=T1.b) according to column a of table T1, so as to realize the data distribution required by the data query expression.
That is, the initial sub-query plans constructed by target sub-expression 3 may be Inner Hash Join(T1.a=T1.b) + Sort(T1.a) + GatherMerge(T1.a), Inner Hash Join(T1.a=T1.b) + Gather(T1.a) + Sort(T1.a), and so on, where the GatherMerge operator is used to collect ordered data from all segments of the database to the master node, and the Gather operator is used to collect data from all segments to the master node. One of Inner Hash Join(T1.a=T1.b) + Sort(T1.a) + GatherMerge(T1.a), Inner Hash Join(T1.a=T1.b) + Gather(T1.a) + Sort(T1.a), and so on may be selected as the sub-query plan through the query optimization request associated with target sub-expression 3.
The sub-query plan Scan(T1) constructed by target sub-expression 1, the sub-query plan Scan(T2) + Redistribute(T2.b) constructed by target sub-expression 2, and the sub-query plan Inner Hash Join(T1.a=T1.b) + Sort(T1.a) + GatherMerge(T1.a) constructed by target sub-expression 3 are merged, thereby obtaining the data query plan.
In some embodiments, to enable an expression to construct a query plan, constructing an initial sub-query plan from a target sub-expression includes:
acquiring a query operator associated with a target sub-expression;
constructing a connection relation between query operators according to the target sub-expression;
And controlling the query operators to be arranged according to the connection relation to obtain an initial sub-query plan.
The query operators support retrieving the required data from the database. For example, the query operators may be Get, Scan, Sort, GatherMerge, Redistribute, Replicate (copy), and so on.
The connection relationship is used to define the order in which the query operators are executed. For example, target sub-expression 2 is used to obtain the data of column b in table T2, and a plurality of query operators (Get, Scan, and Redistribute) are associated with target sub-expression 2. The connection relationship between the query operators constructed according to target sub-expression 2 may be Scan first and then Redistribute, or Get only, so that arranging the query operators according to the connection relationship yields the initial sub-query plan Scan(T2) + Redistribute(T2.b) or Get(T2.b).
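Arranging operators according to a connection relationship can be sketched as follows. Representing a connection relationship as an ordered list of `(operator, argument)` pairs, and rendering a plan as a string, are assumptions made for this illustration.

```python
def build_initial_plans(operator_orders):
    """Arrange query operators in each given execution order to form plans."""
    plans = []
    for order in operator_orders:
        # Operators are executed left to right in the rendered plan.
        plans.append(" + ".join(f"{op}({arg})" for op, arg in order))
    return plans


# Hypothetical connection relationships for target sub-expression 2:
# Scan first and then Redistribute, or Get only.
orders_for_expression_2 = [
    [("Scan", "T2"), ("Redistribute", "T2.b")],
    [("Get", "T2.b")],
]
initial_plans_2 = build_initial_plans(orders_for_expression_2)
```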
In some embodiments, to calculate a query cost of a query plan constructed by an expression, determining a sub-query plan from an initial sub-query plan based on a query optimization request and an association relationship associated with a target sub-expression, comprising:
acquiring metadata information of data to be queried associated with the target sub-expression according to a query optimization request associated with the target sub-expression;
Analyzing the calculated amount required by the execution of the initial sub-query plan through metadata information of the data to be queried;
and determining a sub-query plan from the initial sub-query plans according to the calculated quantity and the association relation.
The metadata information is data information describing data to be queried. For example, metadata information records various attributes and characteristics about data, such as data type, data source, data storage mode, data access rights, etc., wherein the various attributes of data include attributes and characteristics describing tables, such as table name, column data type, column constraint, index information, data size, etc.
The amount of computation is the required computational operations and resource consumption in executing the initial sub-query plan, which can be understood as the query cost. For example, the amount of computation may include the size or time of scanning the data file, the size or time of reading the index, and so forth.
When the association relationship between target sub-expression 1 and target sub-expression 2 is a parallel relationship, and the association relationships of target sub-expression 1 and target sub-expression 2 with target sub-expression 3 are each an AND relationship, determining the sub-query plan from the initial sub-query plans according to the calculated amount and the association relationship may proceed as follows:
Because the association relationship between target sub-expression 1 and target sub-expression 2 is a parallel relationship, the sub-query plans can be screened directly, according to the calculated amount, from the initial sub-query plans constructed by target sub-expression 1 and target sub-expression 2.
Because the association relationships of target sub-expression 1 and target sub-expression 2 with target sub-expression 3 are each an AND relationship, when the initial sub-query plans constructed by target sub-expression 3 are screened according to the calculated amount, sub-query plan 1 constructed by target sub-expression 1 and sub-query plan 2 constructed by target sub-expression 2 also need to be considered. This prevents sub-query plan 3 constructed by target sub-expression 3 from overlapping with sub-query plan 1 and sub-query plan 2, and prevents sub-query plan 1, sub-query plan 2, and sub-query plan 3 from leaving some query conditions unexecuted.
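For the independent (parallel) case, screening by calculated amount can be sketched as picking the plan with the lowest estimated cost. The per-row operator weights, the linear cost model, and the metadata layout are all hypothetical; a real optimizer would use a far more detailed cost model.

```python
# Hypothetical per-row cost weights for each query operator.
OPERATOR_COST_PER_ROW = {"Get": 0.1, "Scan": 1.0, "Sort": 2.0, "Redistribute": 1.5}


def estimate_cost(plan, metadata):
    """Estimate the calculated amount of a plan given as (operator, table) pairs."""
    return sum(
        OPERATOR_COST_PER_ROW[op] * metadata[table]["record_count"]
        for op, table in plan
    )


def cheapest_plan(initial_plans, metadata):
    """Screen the initial sub-query plan with the smallest calculated amount."""
    return min(initial_plans, key=lambda plan: estimate_cost(plan, metadata))
```

Plans that are tied by an AND relationship would additionally need the check described above, so that the chosen plans neither overlap nor leave a query condition unexecuted.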
In some embodiments, in order that the query plans constructed by the sub-expressions do not contain conflicting or repeated logic, the target sub-expression includes a first expression for querying the data to be queried and a second expression for screening the data to be queried, and determining the sub-query plan from the initial sub-query plans according to the query optimization request and the association relationship associated with the target sub-expression includes:
Determining a first sub-query plan from the first initial sub-query plan according to the query optimization request associated with the first expression, wherein the first initial sub-query plan is an initial sub-query plan constructed through the first expression;
determining an unsatisfied request from the query optimization requests associated with the first expression based on the query optimization requests satisfied by the first sub-query plan;
determining a target optimization request from the query optimization requests associated with the second expression according to the unsatisfied requests and the association relation, wherein the target optimization request comprises the unsatisfied requests;
and determining a second sub-query plan from the second initial sub-query plan according to the target optimization request, wherein the second initial sub-query plan is an initial sub-query plan constructed through a second expression.
The first expression is used for querying the data to be queried. For example, the first expression may be the above-described target sub-expression 1 or target sub-expression 2, where target sub-expression 1 is used to query table T1 and target sub-expression 2 is used to query the data of column b in table T2.
The query optimization request associated with the first expression is used to filter the initial sub-query plan required in retrieving the data to be queried.
The first initial sub-query plan is an initial sub-query plan constructed by a first expression and is used for querying data to be queried, wherein the first initial sub-query plan constructed by the first expression can be one or more.
The first sub-query plan is a first initial sub-query plan, screened according to the query optimization request associated with the first expression, whose query cost satisfies the condition.
The query optimization request satisfied by the first sub-query plan is a query condition that the first sub-query plan can satisfy when executing.
The unsatisfied request is a query optimization request not implemented by the first sub-query plan in the query optimization requests associated with the first expression.
For example, when the first expression is target sub-expression 1, it follows from the above example that the query optimization request associated with target sub-expression 1 may be used to enable target sub-expression 1 to quickly query table T1 and to sort the data queried from table T1 according to column a of the table. If the first sub-query plan selected according to the query optimization request associated with the first expression is Scan(T1), the query optimization request satisfied by the first sub-query plan only includes querying table T1, and the unsatisfied request is that the data queried from table T1 be sorted according to column a of the table.
The target optimization request is a query optimization request associated with the second expression obtained according to the unsatisfied request, so that all query conditions in the data query expression can be realized through the target optimization request and the query optimization request satisfied by the first sub-query plan.
For example, as can be seen from the foregoing example, when the second expression is target sub-expression 3, the query optimization request associated with target sub-expression 3 may be used to merge the sorted screening data (sorted according to column a of table T1) that target sub-expression 3 collects from all database nodes. Since the first sub-query plan constructed by target sub-expression 1 only queries table T1 and does not sort the data queried from table T1 according to column a of the table (the unsatisfied request), the query optimization request associated with the second expression needs to include the unsatisfied request in order to implement the query conditions in the data query expression.
The second initial sub-query plan is an initial sub-query plan constructed by the second expression.
The second sub-query plan is a second initial sub-query plan, screened according to the target optimization request, whose query cost satisfies the condition.
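The hand-off of unsatisfied requests from the first expression to the second expression can be sketched as follows. The `Plan` type, the string form of the requests, the cost numbers, and the choice of "cheapest plan" as the screening condition are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float
    satisfies: frozenset  # query optimization requests this plan fulfils


def pick_first_plan(first_initial_plans, first_requests):
    """Pick the cheapest first plan and report the requests it leaves unsatisfied."""
    plan = min(first_initial_plans, key=lambda p: p.cost)
    unsatisfied = frozenset(first_requests) - plan.satisfies
    return plan, unsatisfied


def pick_second_plan(second_initial_plans, second_requests, unsatisfied):
    """Pick the cheapest second plan that covers the target optimization request."""
    # The target optimization request includes the unsatisfied requests.
    target_request = frozenset(second_requests) | unsatisfied
    candidates = [p for p in second_initial_plans if target_request <= p.satisfies]
    if not candidates:
        raise ValueError("no second initial sub-query plan meets the target request")
    return min(candidates, key=lambda p: p.cost)
```

With the running example, choosing Scan(T1) for the first expression leaves "sorted by T1.a" unsatisfied, which steers the second expression toward a plan that includes Sort(T1.a) even if a cheaper unsorted plan exists.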
In some embodiments, to shorten the time to construct a query plan, constructing an initial sub-query plan by a target sub-expression includes:
acquiring a historical query optimization request associated with a target sub-expression and a historical query plan corresponding to the historical query optimization request;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests;
if the target historical request exists, taking the historical query plan corresponding to the target historical request as the sub-query plan;
if the target historical request does not exist, constructing an initial sub-query plan through the target sub-expression.
Wherein the historical query optimization request is a query optimization request used by the target sub-expression during the historical time period.
The historical query plan is a query plan constructed by the target sub-expression according to the historical query optimization request.
The target historical request is the same historical query optimization request as the current query optimization request.
For example, when sub-query plans are constructed, the same target sub-expression may face a plurality of different query optimization requests in different query time periods. If the current query optimization request is the same as a historical query optimization request, that is, a target historical request exists, the historical query plan corresponding to the target historical request is used as the sub-query plan. If no target historical request exists, the initial sub-query plans are constructed directly through the target sub-expression and then screened through the query optimization request, thereby obtaining the sub-query plan.
In some embodiments, to facilitate obtaining a historical sub-query plan corresponding to a query optimization request, obtaining a historical query optimization request associated with a target sub-expression, and a historical query plan corresponding to the historical query optimization request, includes:
acquiring a hash table associated with the target sub-expression, wherein the hash table comprises the historical query optimization request, and a first tag and a plan call link that are associated with the historical query optimization request, the first tag being a hash value obtained by computing a hash function over the historical query optimization request;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests, wherein the target historical request comprises:
determining a second tag, wherein the second tag is a hash value obtained by computing the hash function over the query optimization request;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests according to the first tag and the second tag;
if the target historical request exists, taking the historical query plan corresponding to the target historical request as the sub-query plan, including:
if the target historical request exists, using the plan call link associated with the target historical request to call the historical query plan corresponding to the historical query optimization request as the sub-query plan.
The hash table maps hash values to historical query optimization requests, and each historical query optimization request is associated with a historical query plan. The target historical request that is the same as the query optimization request can therefore be quickly screened from the historical query optimization requests, and the historical query plan corresponding to the target historical request can be obtained.
The first tag is a hash value obtained by computing a hash function over the historical query optimization request.
The plan call link is a link used to call the historical query plan corresponding to the target historical request.
The second tag is a hash value obtained by computing the same hash function over the query optimization request.
For example, because the first tag is a hash value computed from the historical query optimization request and the second tag is a hash value computed from the query optimization request with the same hash function, the first tag with the same hash value as the second tag can be quickly looked up using the second tag. The historical query optimization request of that tag can then be taken as the target historical request that is the same as the query optimization request, which avoids traversing the historical query optimization requests.
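A minimal sketch of such a tag-based lookup is given below. Modeling a query optimization request as a string, using SHA-256 as the hash function, and modeling the plan call link as a Python callable are all assumptions for illustration.

```python
import hashlib


def request_tag(request):
    """Compute the tag (hash value) of a query optimization request."""
    return hashlib.sha256(request.encode("utf-8")).hexdigest()


class PlanCache:
    """Hash table: first tag -> (historical request, plan call link)."""

    def __init__(self):
        self._table = {}

    def remember(self, historical_request, plan_call_link):
        self._table[request_tag(historical_request)] = (historical_request, plan_call_link)

    def lookup(self, request):
        """Return the historical query plan for an identical request, else None."""
        entry = self._table.get(request_tag(request))  # second tag vs. first tags
        if entry is not None and entry[0] == request:  # guard against collisions
            return entry[1]()  # follow the plan call link
        return None
```

When `lookup` returns `None`, no target historical request exists, and the initial sub-query plans would be constructed from the target sub-expression as described above.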
In some embodiments, considering that data may be stored in a plurality of nodes in a distributed manner, in order to quickly obtain query data, after constructing a data query plan by the target sub-expression, the method further includes:
determining a query instruction corresponding to the data query plan and a copy instruction of the query instruction;
and sending the copy instruction to a plurality of data storage nodes of the database so that the data storage nodes return query data according to the copy instruction.
The query instruction is a representation of the data query plan, so that the database can return query data according to the query instruction. The copy instruction is an instruction obtained by copying the query instruction.
The data storage nodes are nodes for storing data for the database, and each data storage node returns data according to the received copy instruction.
For example, as can be seen from the above description, the target sub-expression 3 is used for screening data from the data obtained from the target sub-expression 1 and the target sub-expression 2, so that after receiving the data returned by each data storage node according to the copy instruction, the returned data is screened according to the target sub-expression 3.
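The dispatch-then-screen flow can be sketched with simulated data storage nodes. Here each node is just an in-memory list of rows, the query instruction is modeled as a row predicate, and a thread pool stands in for network calls; all of these are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_node(node_rows, copy_instruction):
    """Simulated data storage node: return its rows matching the copy instruction."""
    return [row for row in node_rows if copy_instruction(row)]


def dispatch(query_instruction, nodes, final_filter):
    """Send a copy of the instruction to every node, then screen the merged rows."""
    copies = [query_instruction] * len(nodes)  # one copy instruction per node
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partial_results = list(pool.map(run_on_node, nodes, copies))
    merged = [row for part in partial_results for row in part]
    # Screening step, playing the role of target sub-expression 3.
    return [row for row in merged if final_filter(row)]
```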
It can be understood that the data queried through the data query plan can be applied to fields such as data analysis, for example advertising content query, financial risk control analysis, and information recommendation.
In some embodiments, when querying for advertisement content, advertisements can be quickly queried through a data query plan and presented in the content played by the application.
For example, suppose advertisements of a target type need to be displayed in the content played by an application program. An existing query optimizer adopts multi-stage query optimization in which each stage is optimized independently, so the optimization strategies of the stages may conflict, and the uncertainty and error rate of the query process are high. As a result, the query efficiency is low when the existing query optimizer queries advertisements of the target type.
In order to improve the efficiency of querying advertisements of the target type, the sub-expressions in the data query expression can be expanded through equivalent sub-expressions, and target sub-expressions with high query efficiency are screened out from the equivalent sub-expressions and the sub-expressions. A data query plan is then constructed from the plurality of target sub-expressions according to the association relationships among the sub-expressions, which avoids conflicting or overlapping steps in the data query plan and further improves the query efficiency, so that advertisements of the target type can be rapidly displayed in the content played by the application program.
In some embodiments, query plan construction may also be applied to the field of financial risk control analysis.
For example, when a user analyzes a financial product, the financial risk control analysis can quickly query data related to the financial product through the data query plan constructed by the present application, and recommend the data related to the financial product to the user.
In some embodiments, query plan construction may also be applied to the field of information recommendation.
For example, when a user inquires information, the user can quickly inquire the information through the data inquiry plan constructed by the application and recommend the inquired information to the user.
From the above, the embodiment of the application can: obtain a data query expression, wherein the data query expression comprises keywords; obtain, according to the keywords, the sub-expressions in the data query expression and the association relationships between the sub-expressions; determine equivalent sub-expressions, wherein an equivalent sub-expression and its sub-expression have equivalent expression meanings; determine target sub-expressions from the sub-expressions and the equivalent sub-expressions; and construct a data query plan from the target sub-expressions and the association relationships.
Therefore, in this scheme the data query expression can have a plurality of sub-expressions, and the sub-expressions have association relationships with one another. The sub-expressions can be expanded with equivalent sub-expressions. Because an equivalent sub-expression has the same expression meaning as its sub-expression, it inherits the association relationships of that sub-expression. Target sub-expressions, each of which may be a sub-expression or an equivalent sub-expression, can then be selected from the sub-expressions and the equivalent sub-expressions, and the query plan can be constructed from the target sub-expressions according to the association relationships. As a result, the sub-expressions do not conflict with one another when the query plan is constructed, which makes the data query plan easier to construct and improves query performance.
The method described in the above embodiments will be described in further detail below.
In this embodiment, a query system will be taken as an example, and a method according to an embodiment of the present application will be described in detail.
As shown in fig. 2a, a query system includes a database management system and a query optimizer, wherein the query optimizer is located outside the database management system, and the database management system specifically includes:
the parser is used for carrying out grammar analysis on the received data query command so as to obtain a grammar tree of the data query command;
the query converter is used for receiving the grammar tree sent by the parser, adopting a preset data exchange language, and carrying out format conversion processing on the grammar tree to obtain a data query expression, so that the query optimizer constructs a data query plan according to the data query expression;
the metadata provider is used for receiving a metadata information acquisition request sent by the query optimizer when the query optimizer constructs a data query plan;
the metadata provider is an interface for exchanging metadata information between the query optimizer and the database management system. The metadata provider is a system-specific plug-in for retrieving metadata information from a database management system.
the catalog is used for receiving the acquisition request sent by the metadata provider and forwarding it, through the metadata provider, to the metadata exchanger, so that the metadata information is acquired from the catalog through the metadata exchanger and sent to the parser;
the catalog is a database for storing metadata information in the database management system, and the metadata information is data information describing data in the database.
the metadata exchanger is used for exchanging metadata information in a preset data exchange language (EL language). The EL language is a data exchange language in XML format that carries metadata information of the database management system, such as table definitions and operator definitions. Through acquisition requests written in the EL language, the query optimizer can interact with different back-end database management systems and acquire the required metadata information. Conversely, metadata information can be extracted from the database system through the metadata exchanger and converted into the EL language, which decouples the query optimizer from the database management system;
preset data exchange language (EL language): refers to a framework for exchanging information between a query optimizer and a database management system. Separating the query optimizer from the database management system requires the construction of a communication mechanism that handles the query. The framework encodes necessary information required for communication using XML-based language, such as input queries, output plans, and metadata. A simple communication protocol is used on the data exchange language framework for acquiring metadata information and sending data query plans;
A plan converter for converting the data query plan into a query instruction;
and the executor is used for receiving the query instruction sent by the plan converter and returning query data according to the query instruction.
The executor works as follows: a copy of the final query instruction (copy instruction) is dispatched to each segment (data storage node). During distributed query execution, the allocation executor on each segment acts as both a sender and a receiver of data. For example, the instance of Redistribute (T2.b) running on segment S sends the tuples on S to other segments based on the hash value of T2.b, and also receives tuples from the instances of Redistribute (T2.b) running on other segments.
Here, T2.b in Redistribute (T2.b) denotes column b of table T2.
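The redistribution step described above can be illustrated with a minimal sketch. It assumes that segments are numbered from 0 and that a tuple is placed on the segment given by the hash of its key column modulo the number of segments; the function name `redistribute` and this placement rule are illustrative assumptions and are not taken from the application.

```python
from collections import defaultdict


def redistribute(rows_by_segment, key, num_segments):
    """Send every tuple to the segment selected by hashing its key column.

    Assumption: placement is hash(value) modulo the number of segments.
    """
    received = defaultdict(list)
    for rows in rows_by_segment.values():
        for row in rows:
            target = hash(row[key]) % num_segments
            received[target].append(row)
    return dict(received)


# Table T2 is initially spread over two segments without regard to column b.
t2 = {
    0: [{"a": 1, "b": 4}, {"a": 3, "b": 5}],
    1: [{"a": 2, "b": 4}, {"a": 4, "b": 7}],
}
placed = redistribute(t2, "b", 2)
```

After the call, all tuples with the same value of b sit on the same segment, which is what a join on T2.b needs.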
The input to the query optimizer in FIG. 2a is a data query expression (EL query), and its output is a data query plan (EL plan). While building a data query plan, the query optimizer may query the database management system for metadata information (e.g., table definitions). The query optimizer abstracts the details of metadata access by allowing the database management system to register with the metadata provider, which is responsible for serializing metadata information into the EL format before it is sent to the query optimizer. Metadata information may also be read from files containing metadata objects serialized in the EL format. The database management system needs to include converters that consume and emit data in the EL format: the query converter converts the syntax tree into a data query expression, and the plan converter converts the data query plan into an executable plan. Such a plan converter is implemented entirely outside the query optimizer, which allows multiple database management systems to use the query optimizer by providing an appropriate plan converter.
By locating the query optimizer outside the database management system, the architecture of the query optimizer may be highly scalable, and all components in the query optimizer and database management system may be individually replaced and configured separately.
As shown in fig. 2b, the query optimizer specifically comprises:
a memo, in which the data query plans constructed by the query optimizer are encoded and stored as a compact in-memory data structure;
wherein the memo consists of a set of containers called expression groups (also referred to below as expression sets), and each expression group contains logically equivalent expressions. Each expression group captures a different sub-query expression of the data query expression (e.g., a filter on a table or a join of two tables), and the members of an expression group (a sub-expression and its equivalent sub-expressions) implement the same data query in different logical ways (e.g., different join orders). Each sub-expression or equivalent sub-expression within an expression group is built from query operators whose inputs are other expression groups, and this recursive structure allows the memo to compactly encode a very large space of possible data query plans.
The searcher is used for constructing an initial sub-query plan through target sub-expressions in the expression group by using a search mechanism, and estimating a sub-query plan with minimum query cost from the initial sub-query plan;
A task scheduler for starting the search mechanism. The task scheduler creates parallel work units to perform the three main steps of query optimization: exploration (generating equivalent logical expressions), implementation (constructing initial sub-query plans), and optimization (optimizing the initial sub-query plans, during which required physical attributes such as sort order are enforced);
an optimization tool for providing a search mechanism for a searcher, the optimization tool comprising:
the transformation component is used for determining an equivalent sub-expression, and the equivalent sub-expression and the sub-expression have equivalent expression meanings;
for example, plan alternatives are generated by applying transformation rules. These rules may produce equivalent logical expressions (e.g., Inner Join (T1, T2) → Inner Join (T2, T1)) or physical implementations of existing sub-expressions (e.g., Join (T1, T2) → Hash Join (T2, T1)). The results of applying transformation rules are copied into the memo, which may create new expression sets and/or add new expressions to existing expression sets. Each transformation rule is an independent component that can be explicitly activated and deactivated in the system configuration.
Here, a join (Join) combines the rows of two or more tables according to specified conditions to produce a new, larger table containing the information of the source tables. A hash join (Hash Join) maps the two joined tables into a number of buckets through a hash function, performs the join operation on each bucket, and finally combines the per-bucket join results.
The attribute enforcement component is used for inserting the physical attributes that must be enforced into the initial sub-query plan;
it is understood that the query optimizer includes an extensible framework for describing query requirements and plan characteristics based on formal attribute specifications. Attributes are of different types, including logical attributes (e.g., output columns), physical attributes (e.g., sort order and data distribution), and scalar attributes (e.g., columns used in join conditions). When the query optimizer builds a data query plan, each query operator may request particular attributes from the target sub-expressions beneath it. An optimized initial sub-query plan (sub-query plan) may satisfy the query optimization request associated with a target sub-expression on its own (e.g., an index scan (IndexScan) plan provides sorted data), but in many cases some attributes have to be enforced by inserting an operator into the initial sub-query plan (e.g., if the initial sub-query plan requires sorted output, a Sort operator must be inserted into the plan). The framework allows each query operator to control the placement of enforcers according to the attributes of the sub-query plan and the local behavior of the query operator. Its purpose is to ensure that the query data meets specific requirements, such as the order of the output columns and the distribution of the data.
The metadata cache is used for caching metadata information of the data to be queried;
and the query cost estimation is used for carrying out query cost estimation on the initial sub-query plan constructed by the searcher through the metadata information in the metadata cache to obtain the query cost of the initial sub-query plan.
For example, since metadata (e.g., table definitions) rarely changes, sending metadata information with every query would incur significant overhead. The query system therefore caches metadata information on the query optimizer side, and queries metadata information from the catalog only if it is not found in the cache or has changed.
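The caching behaviour described above can be sketched as follows. This is a minimal illustration that assumes each metadata object carries a version number that changes whenever the object changes; the class `MetadataCache` and the two callables passed to it are hypothetical names introduced only for this example.

```python
class MetadataCache:
    """Optimizer-side cache that goes to the catalog only on a miss or a change."""

    def __init__(self, fetch_from_catalog, current_version):
        self._fetch = fetch_from_catalog          # hypothetical catalog lookup
        self._current_version = current_version   # hypothetical version lookup
        self._entries = {}                        # object id -> (version, metadata)
        self.catalog_reads = 0

    def get(self, object_id):
        version = self._current_version(object_id)
        cached = self._entries.get(object_id)
        if cached is not None and cached[0] == version:
            return cached[1]                      # cache hit, catalog untouched
        self.catalog_reads += 1                   # miss or stale entry
        metadata = self._fetch(object_id)
        self._entries[object_id] = (version, metadata)
        return metadata


catalog = {"T1": {"columns": ["a"]}}
versions = {"T1": 1}
cache = MetadataCache(lambda oid: dict(catalog[oid]), lambda oid: versions[oid])

cache.get("T1")
cache.get("T1")                 # served from the cache
reads_before_change = cache.catalog_reads

catalog["T1"] = {"columns": ["a", "c"]}
versions["T1"] = 2              # the table definition changed
refreshed = cache.get("T1")     # triggers a second catalog read
```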
For example, the query entered by the user is converted into a structured query language (Structured Query Language, SQL) statement (i.e., a data query command), where SQL is a standard computer language for working with relational databases. The data query command may be (SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a), where table T1 is distributed by the hash value of column a of table T1, which may be represented as Hashed (T1.a), and table T2 is distributed by the hash value of column a of table T2, which may be represented as Hashed (T2.a). The query optimizer constructs a data query plan as in the following example:
(I) Converting the data query command into a data query expression in the EL format by adopting the preset data exchange language.
For example, the data query command is converted into the EL format, which contains the required output columns, sort order, data distribution, and query logic in XML form. Metadata information (e.g., table and operator definitions) is tagged with metadata identifiers (IDs) so that more information can be requested when the query optimizer builds a data query plan. A metadata ID is a unique identifier composed of a database system identifier, an object identifier, and a version number. The data query expression in the EL format is sent to the query optimizer, where it is parsed and converted into an in-memory logical expression tree and then copied into the memo.
(II) Obtaining the sub-expressions in the data query expression and the association relationships among the sub-expressions according to the keywords.
For example, as shown in fig. 2c, from the data query command (SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a), the keywords in the data query expression can be obtained, namely retrieve table T1, retrieve table T2, and inner join (Inner Join (T1.a = T2.b)), and three sub-expressions are created from retrieve table T1, retrieve table T2, and Inner Join (T1.a = T2.b).
The join condition is omitted for brevity. Expression set 0 (Group 0) is called the root set because it corresponds to the root of the logical plan. Dependencies between operators in the logical plan are captured as references between expression sets: the sub-expression corresponding to Inner Join belongs to the root set, and the sub-expressions corresponding to tables T1 and T2 belong to child sets. For example, Inner Join [T1, T2] references expression set 1 (Group 1) and expression set 2 (Group 2) as its child sets and joins them. Query optimization is performed according to the following steps:
(1) Exploration: transformation rules that produce other equivalent logical plan expressions are triggered (i.e., equivalent sub-expressions are determined, where an equivalent sub-expression and its sub-expression have equivalent expression meanings).
For example, Inner Join [T2, T1] is generated from Inner Join [T1, T2]. The newly generated expression is added to the existing root set, and new expression sets may be created. The memo has a built-in duplicate detection mechanism, based on expression topology, for detecting and eliminating duplicate expressions created by different transformations.
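The exploration step can be illustrated with a small sketch of a memo that holds expression groups, applies a join commutativity rule, and rejects duplicates. It is a simplified illustration under stated assumptions: an expression is identified by its operator name plus the ids of its child groups, and the names `GroupExpr`, `Memo`, `join_commutativity`, and `explore` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GroupExpr:
    op: str
    children: tuple = ()          # ids of the child groups this operator reads


@dataclass
class Memo:
    groups: dict = field(default_factory=dict)   # group id -> list of GroupExpr
    seen: dict = field(default_factory=dict)     # GroupExpr -> group id

    def insert(self, expr, group_id):
        """Add expr to a group unless an identical expression already exists."""
        if expr in self.seen:
            return False                          # duplicate detected
        self.groups.setdefault(group_id, []).append(expr)
        self.seen[expr] = group_id
        return True


def join_commutativity(expr):
    """Transformation rule: InnerJoin[l, r] -> InnerJoin[r, l]."""
    if expr.op == "InnerJoin":
        left, right = expr.children
        yield GroupExpr("InnerJoin", (right, left))


def explore(memo, group_id):
    worklist = list(memo.groups[group_id])
    while worklist:
        expr = worklist.pop()
        for alternative in join_commutativity(expr):
            if memo.insert(alternative, group_id):
                worklist.append(alternative)      # new expressions are explored too


memo = Memo()
memo.insert(GroupExpr("Get(T1)"), 1)
memo.insert(GroupExpr("Get(T2)"), 2)
memo.insert(GroupExpr("InnerJoin", (1, 2)), 0)    # root group references groups 1 and 2
explore(memo, 0)
```

Applying the rule to Inner Join [2, 1] would regenerate Inner Join [1, 2], which the memo rejects as a duplicate, so exploration terminates with two members in the root group.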
(2) Statistics derivation: at the end of exploration, the memo holds the complete logical space of the given query. The statistics derivation mechanism of the query optimizer is then triggered to compute statistics objects for the memo groups. A statistics object is mainly a set of column histograms used to estimate cardinality and data skew (i.e., the metadata information of the data to be queried). Statistics derivation is performed on the compact memo structure to avoid expanding the search space. The system selects the expression most likely to yield reliable statistics as the target sub-expression, and statistics are derived from that target sub-expression.
For example, an Inner Join expression (sub-expression) with a small number of join conditions (a first number of conditions) has a lower query cost than another equivalent Inner Join expression (equivalent sub-expression) with a larger number of join conditions (a second number of conditions); this situation may arise when multiple join orders are generated. The reason is that the more join conditions there are, the more estimation errors may be propagated and amplified. Computing cardinality confidence scores is very challenging, because confidence scores have to be aggregated across all nodes of a given target sub-expression. After the target sub-expression with the smallest query cost is selected in the root set, the system recursively triggers statistics derivation on the child groups of that target sub-expression. Finally, the statistics object of the root group (the metadata information of the data to be queried) is constructed by combining the statistics objects of the child groups.
Fig. 2d shows the statistics derivation mechanism for the running example. First, a top-down traversal is performed, in which the sub-expression of the root group requests statistics (the metadata information of column a of table T1 (T1.a) and of column b of table T2 (T2.b)) from the sub-expressions of its child groups (Group 1 and Group 2). For example, Inner Join (T1.a = T2.b) requests the histograms of T1.a and T2.b. The requested histograms are loaded from the catalog on demand by the metadata provider, parsed into EL queries (data query expressions), and stored in the metadata cache to serve future requests. Next, a bottom-up traversal is performed to combine the statistics objects of the child groups into the statistics object of the root group (the metadata information of the data to be queried). This combines the (possibly modified) histograms of T1.a and T2.b, since the join condition may affect the column histograms. The constructed statistics objects are attached to individual groups and may be updated incrementally during optimization (e.g., by adding new histograms). This is critical to keeping the cost of statistics derivation manageable.
(3) Implementation: transformation rules that create physical implementations of logical expressions are triggered. For example, a scan rule is triggered to generate a physical table scan (Scan) from the logical Get operator.
(4) Optimization: attribute enforcement and query cost estimation are performed in this step. Optimization begins by submitting a query optimization request to the root expression set of the memo, specifying query requirements such as result distribution and sort order. Submitting a request to the root expression set is equivalent to requesting, among the physical operators in the root expression set, the plan with the minimum query cost that satisfies the request.
For each query optimization request, a sub-expression in the root expression set passes corresponding requests to the sub-expressions in its child sets according to the incoming query optimization request and the local requirements of its operator. During optimization, the same group may receive multiple identical query optimization requests, so the system caches query optimization requests in a hash table, and an incoming request is computed (that is, a sub-query plan is constructed for it) only if it is not already in the hash table. In addition, each sub-expression maintains a local hash table that maps query optimization requests to the corresponding historical query requests. The local hash table provides the plan call link used when extracting the historical query plan from the memo.
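The request caching described above can be sketched as a per-group hash table keyed by the optimization request. The sketch below assumes a request is just a (distribution, order) pair and that the work of optimizing a request is supplied as a callable; `Request`, `Group`, and `fake_optimizer` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Assumption: a request is fully described by a distribution and a sort order.
Request = namedtuple("Request", ["distribution", "order"])


class Group:
    """Expression group that computes each distinct request only once."""

    def __init__(self, name, optimize_fn):
        self.name = name
        self._optimize = optimize_fn   # hypothetical: builds the best sub-query plan
        self._best = {}                # group hash table: request -> best plan
        self.computed = 0

    def optimize(self, request):
        if request not in self._best:  # only unseen requests are computed
            self.computed += 1
            self._best[request] = self._optimize(request)
        return self._best[request]


def fake_optimizer(request):
    # Stand-in for the real search; returns a label instead of a plan.
    return f"plan[{request.distribution}, {request.order}]"


group0 = Group("Group 0", fake_optimizer)
first = group0.optimize(Request("Singleton", "T1.a"))
second = group0.optimize(Request("Singleton", "T1.a"))   # answered from the hash table
third = group0.optimize(Request("Any", "Any"))
```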
Fig. 2e illustrates the query optimization requests in the running example. The query optimization request associated with the first expression specifies that the query results need to be gathered on the master node in the order of column a of table T1. FIG. 2e also shows the group hash tables with the corresponding best group expressions, and the Sort, Redistribute, and Replicate operators inserted into the memo. The Gather operator gathers data from all data storage nodes to the master node. The GatherMerge operator gathers sorted data from all data storage nodes to the master node while preserving the sort order. The Redistribute operator distributes tuples across segments according to the hash value of a given parameter.
FIG. 2e includes the query operators in Group 0 (Inner NestLoopJoin [Group 1, Group 2], Inner NestLoopJoin [Group 2, Group 1], Inner Hash Join [Group 1, Group 2], Inner Hash Join [Group 2, Group 1], Sort (T1.a), Gather, and GatherMerge (T1.a), where NestLoopJoin denotes a nested loop join), the query operators in Group 1 (Scan (T1), Sort (T1.a), and Replicate), and the query operators in Group 2 (Scan (T2), Replicate, and Redistribute (T2.b)); the right side shows the data query plan constructed from the query operators in Group 0, Group 1, and Group 2 according to the query optimization request. The optimization of Inner Hash Join [T1, T2] is shown in FIG. 2f. For this query optimization request, one alternative is to align the distributions of the child nodes according to the join condition. This is achieved by requesting a Hashed (T1.a) distribution from first expression 1 and a Hashed (T2.b) distribution from first expression 2; both first expressions may deliver any sort order. After the best plans (first sub-query plans) of the child groups (Group 1 and Group 2) are found, Inner Hash Join combines the attributes of its children to determine the distribution and sort order it provides (target optimization request). The best plan of first expression 2 requires hash-redistributing T2 on T2.b, because T2 is originally hash-distributed on T2.a, whereas the best plan of first expression 1 is a simple Scan, because T1 is already hash-distributed on T1.a. When the provided attributes are determined not to meet the initial requirements (unsatisfied requests), the unsatisfied attributes (target optimization requests) must be enforced. Attribute enforcement in the system uses a flexible framework that allows each operator to define how the required attributes are enforced according to its sub-plans and the operator's local behavior.
For example, an order-preserving nested loop join (Nested Loops Join, NL Join) operator may not need to enforce the sort order on top of the join if the sort order is already provided by its outer child node.
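The enforcement decision in this example can be sketched as follows. The sketch covers only the sort-order attribute and represents plans as nested tuples; the function `enforce_order` and the plan encoding are illustrative assumptions rather than the framework described in the application.

```python
def enforce_order(plan, delivered_order, required_order):
    """Add a Sort enforcer only when the plan does not already deliver the order."""
    if required_order is None or delivered_order == required_order:
        return plan                               # attribute already satisfied
    return ("Sort", required_order, plan)         # enforcer placed on top of the plan


# An order-preserving nested loop join passes on the order of its outer child,
# so nothing has to be added on top of the join.
nl_join = ("NLJoin", "IndexScan(T1)", "Scan(T2)")
kept = enforce_order(nl_join, delivered_order="T1.a", required_order="T1.a")

# A hash join delivers no particular order, so a Sort has to be inserted.
hash_join = ("HashJoin", "Scan(T1)", "Scan(T2)")
sorted_plan = enforce_order(hash_join, delivered_order=None, required_order="T1.a")
```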
FIG. 2f illustrates two possible initial sub-query plans for Group 0 that satisfy the query optimization request through attribute enforcement. The data query plan on the left sorts the query data on the segments, and then gathers and merges the sorted results on the master node. The data query plan on the right gathers the query data from the segments to the master node, where it is then sorted. These alternatives are encoded in the memo, and the query cost (amount of computation) of each initial sub-query plan is calculated by the cost model from the metadata information of the data to be queried. Finally, a data query plan is extracted from the memo based on the query structure given by the query optimizer. FIG. 2f illustrates plan extraction for the running example. Each relevant expression has a corresponding local hash table, and each local hash table maps incoming optimization requests to the corresponding child optimization requests. First, the best group expression for the request is looked up among the target sub-expressions of the root expression group and found to be the GatherMerge operator. In the local hash table of GatherMerge, the best group expression found is Sort, so GatherMerge is linked to Sort. In the local hash table of Sort, the corresponding best group expression is Inner Hash Join [T1, T2], so Sort is linked to Inner Hash Join. The same procedure is repeated to complete plan extraction and obtain the final data query plan, as shown in FIG. 2f. The extracted optimal data query plan is serialized in the EL format and sent to the database management system for execution.
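Plan extraction by following these links can be sketched as a short recursion. In the sketch, each group maps a request to its best operator together with the child requests that operator issued; the request strings (written as "distribution, order") and the `memo` layout are illustrative assumptions chosen to mirror the running example.

```python
ANY = "Any, Any"

# memo[group][request] = (best operator, [(child group, child request), ...])
memo = {
    0: {
        "Singleton, <T1.a>": ("GatherMerge(T1.a)", [(0, "Any, <T1.a>")]),
        "Any, <T1.a>": ("Sort(T1.a)", [(0, ANY)]),
        ANY: ("InnerHashJoin", [(1, "Hashed(T1.a), Any"), (2, "Hashed(T2.b), Any")]),
    },
    1: {"Hashed(T1.a), Any": ("Scan(T1)", [])},
    2: {
        "Hashed(T2.b), Any": ("Redistribute(T2.b)", [(2, ANY)]),
        ANY: ("Scan(T2)", []),
    },
}


def extract(memo, group, request):
    """Follow the links from a request to the best operator and its child requests."""
    operator, child_links = memo[group][request]
    return (operator, [extract(memo, g, r) for g, r in child_links])


def operators(plan):
    """List the operators of an extracted plan in pre-order."""
    operator, children = plan
    return [operator] + [op for child in children for op in operators(child)]


plan = extract(memo, 0, "Singleton, <T1.a>")
```

Here `operators(plan)` lists GatherMerge(T1.a), Sort(T1.a), InnerHashJoin, Scan(T1), Redistribute(T2.b), and Scan(T2), matching the chain GatherMerge, Sort, Inner Hash Join described in the text.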
The invention is mainly oriented to important business scenarios such as data analysis, for example large-scale query systems for real-time and offline data. The internet generates huge amounts of data every day, which poses great challenges for big data query systems. Commercial data management and query systems have made tremendous progress in scalability, availability, and processing performance, so that large data sets of terabytes (TB) and even petabytes (PB) can be analyzed and queried more quickly through structured query language (Structured Query Language, SQL) or SQL-like interfaces. Internet data is typically stored in a distributed computing framework (Hadoop), and the query engine compiles the original SQL query into Spark or MapReduce, where Spark is a distributed computing framework based on in-memory computation and MapReduce is a distributed computing framework based on disk read/write I/O. Existing query optimizers generally use multi-stage query optimization, and the compiled Spark or MapReduce performs poorly at run time. The invention therefore designs a query plan construction method based on a waterfall tree model to accelerate the coding of complex queries. In tests on real data, it achieves a query speedup of several times compared with existing query optimizers.
For SQL queries on big data systems, a common solution in the industry is to use Hive to convert the query into MapReduce tasks, where Hive is a Hadoop-based data warehouse tool and Hadoop is a distributed computing framework, but this approach may lead to poor interactive analysis performance. To address this problem, the industry has developed a number of specialized query engines. These engines are often applicable only to a specific host system and optimized for it, and cannot support the query requirements of multiple distributed systems at the same time. Because the query optimizer is a major factor in analytical query processing performance, and business scenarios require a large amount of complex query processing over big data, the invention designs a novel query optimizer for distributed big data query architectures, on which new optimization techniques and advanced query functions can be developed rapidly. The invention accelerates queries through techniques such as the metadata cache component, the memo, and attribute enforcement, and improves optimization speed by using an efficient multi-core scheduler.
The invention designs a new query optimizer whose optimization framework is based on the waterfall flow model. The query optimizers of existing data query systems are tightly coupled to the entire database management system, whereas one distinctive feature of the invention is that the designed optimizer can run outside the database system as a separate component. This makes it possible for products with different computing architectures to share the same optimizer. In addition, the optimizer can be deployed and run as a stand-alone product and can be tested and tuned in detail on its own, without being coupled to the structure of a database system, which reduces the difficulty of deployment and testing.
Therefore, the application is convenient for constructing the data query plan and improves the query efficiency.
In order to better implement the method, the embodiment of the application also provides a query plan construction device, which can be integrated in electronic equipment, wherein the electronic equipment can be a terminal, a server and the like. The terminal can be a mobile phone, a tablet personal computer, an intelligent Bluetooth device, a notebook computer, a personal computer and other devices; the server may be a single server or a server cluster composed of a plurality of servers.
For example, in the present embodiment, a method according to an embodiment of the present application will be described in detail by taking an example in which a query plan construction apparatus is specifically integrated in an electronic device.
For example, as shown in fig. 3, the query plan construction device may include an acquisition unit 310, a key unit 320, an equivalent unit 330, a determination unit 340, and a construction unit 350, as follows:
(I) Acquisition unit 310.
The acquisition unit 310 is configured to obtain a data query expression, where the data query expression includes keywords.
In some embodiments, obtaining the data query expression includes:
acquiring a data query command, wherein the data query command comprises keywords;
according to the keywords, carrying out grammar analysis processing on the data query command to obtain a grammar tree of the data query command;
and carrying out format conversion processing on the grammar tree by adopting a preset data exchange language to obtain a data query expression.
(II) Key unit 320.
The key unit 320 is configured to obtain the sub-expressions in the data query expression and the association relationships between the sub-expressions according to the keywords.
(III) Equivalent unit 330.
The equivalent unit 330 is configured to determine an equivalent sub-expression, where the equivalent sub-expression and the sub-expression have equivalent expression meanings.
(IV) Determination unit 340.
The determination unit 340 is configured to determine the target sub-expression from the sub-expression and the equivalent sub-expression.
In some embodiments, determining the target sub-expression from the sub-expression and the equivalent sub-expression includes:
acquiring a first number of conditions and a second number of conditions, wherein the first number of conditions is the number of data connection conditions associated with the sub-expression, and the second number of conditions is the number of data connection conditions associated with the equivalent sub-expression;
determining the target sub-expression from the sub-expression and the equivalent sub-expression according to the first number of conditions and the second number of conditions.
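The selection by number of connection conditions can be sketched as below. The sketch assumes that each expression lists its data connection conditions explicitly, that the expression with fewer conditions is preferred (as in the Inner Join example of the description), and that the original sub-expression is kept on a tie; the tie-breaking rule and the function name `pick_target` are assumptions.

```python
def pick_target(sub_expression, equivalent_sub_expression):
    """Return the candidate with fewer connection conditions (ties keep the original)."""
    first = len(sub_expression["conditions"])               # conditions of the sub-expression
    second = len(equivalent_sub_expression["conditions"])   # conditions of the equivalent one
    return sub_expression if first <= second else equivalent_sub_expression


sub = {"name": "InnerJoin[T1, T2]", "conditions": ["T1.a = T2.b", "T1.c = T2.d"]}
equivalent = {"name": "InnerJoin[T2, T1]", "conditions": ["T2.b = T1.a"]}
target = pick_target(sub, equivalent)
```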
In some embodiments, determining the target sub-expression from the sub-expression and the equivalent sub-expression includes:
acquiring metadata information of data to be queried associated with the sub-expressions;
acquiring the data quantity of the data to be queried from the metadata information;
the target sub-expression is determined from the sub-expression and the equivalent sub-expression according to the data amount.
(V) Construction unit 350.
The construction unit 350 is configured to construct a data query plan through the target sub-expression and the association relationship.
In some embodiments, constructing the data query plan from the target sub-expressions and the associations includes:
acquiring a query optimization request associated with a target sub-expression from a data query expression;
constructing an initial sub-query plan through a target sub-expression;
determining a sub-query plan from the initial sub-query plan according to the query optimization request and the association relation associated with the target sub-expression;
and merging the sub-query plans to obtain the data query plan.
In some embodiments, constructing the initial sub-query plan from the target sub-expression includes:
acquiring a query operator associated with a target sub-expression;
constructing a connection relation between query operators according to the target sub-expression;
and controlling the query operators to be arranged according to the connection relation to obtain an initial sub-query plan.
In some embodiments, determining a sub-query plan from the initial sub-query plan based on the query optimization request and the association relationship associated with the target sub-expression, includes:
acquiring metadata information of data to be queried associated with the target sub-expression according to a query optimization request associated with the target sub-expression;
analyzing the calculated amount required by the execution of the initial sub-query plan through metadata information of the data to be queried;
and determining a sub-query plan from the initial sub-query plans according to the calculated quantity and the association relation.
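The choice of a sub-query plan by amount of computation can be sketched with a toy cost model. Everything about the model is an assumption made for illustration: the metadata is reduced to a row count and a segment count, sorting n rows is charged n·log2(n), gathering is charged one unit per row, merging is charged log2(segments) per row, and the association relation is left out. The two candidate names correspond to the two alternatives of FIG. 2f.

```python
import math


def sort_cost(rows):
    # Assumed comparison-sort cost of n * log2(n).
    return rows * math.log2(rows) if rows > 1 else 0.0


def estimate(plan, total_rows, segments):
    """Toy amount-of-computation estimate for the two candidate plans."""
    if plan == "sort_then_gather_merge":
        # Segments sort their share in parallel, then the master merges the runs.
        return sort_cost(total_rows / segments) + total_rows * math.log2(segments)
    if plan == "gather_then_sort":
        # The master receives every row and sorts the whole result alone.
        return total_rows + sort_cost(total_rows)
    raise ValueError(f"unknown plan: {plan}")


metadata = {"rows": 1_000_000, "segments": 4}   # hypothetical statistics
candidates = ["sort_then_gather_merge", "gather_then_sort"]
costs = {p: estimate(p, metadata["rows"], metadata["segments"]) for p in candidates}
best = min(candidates, key=costs.get)
```

With these assumed numbers the plan that sorts on the segments first is selected; a different cost model or different statistics could reverse the choice.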
In some embodiments, the target sub-expression includes a first expression and a second expression, the first expression is used for querying data to be queried, the second expression is used for screening the data to be queried, and determining a sub-query plan from the initial sub-query plan according to a query optimization request and an association relation associated with the target sub-expression includes:
determining a first sub-query plan from the first initial sub-query plan according to the query optimization request associated with the first expression, wherein the first initial sub-query plan is an initial sub-query plan constructed through the first expression;
determining, based on the query optimization requests satisfied by the first sub-query plan, an unsatisfied request from among the query optimization requests associated with the first expression;
determining a target optimization request from the query optimization requests associated with the second expression according to the unsatisfied requests and the association relation, wherein the target optimization request comprises the unsatisfied requests;
and determining a second sub-query plan from the second initial sub-query plan according to the target optimization request, wherein the second initial sub-query plan is an initial sub-query plan constructed through a second expression.
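The passing of unsatisfied requests from the first expression to the second expression can be sketched with set operations. This sketch assumes that each query optimization request is a plain string label and that each candidate plan advertises the set of requests it satisfies; the function names and labels are hypothetical.

```python
def unsatisfied(requested: frozenset, satisfied: frozenset) -> frozenset:
    """Requests of the first expression that the first sub-query plan misses."""
    return requested - satisfied


def target_request(second_requested: frozenset, leftover: frozenset) -> frozenset:
    """Target optimization request: the second expression's own requests
    plus the unsatisfied requests handed over from the first expression."""
    return second_requested | leftover


def choose_plan(candidates: dict, request: frozenset):
    """Return the first candidate whose satisfied set covers the request.

    candidates maps a plan name to the set of requests that plan satisfies.
    """
    for name, provides in candidates.items():
        if request <= provides:
            return name
    return None
```

In this reading, a second sub-query plan is acceptable only if it covers both the requests of the second expression and whatever the first sub-query plan left unsatisfied.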
In some embodiments, constructing the initial sub-query plan from the target sub-expression includes:
acquiring a historical query optimization request associated with a target sub-expression and a historical query plan corresponding to the historical query optimization request;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests;
if the target historical request exists, taking the historical query plan corresponding to the target historical request as a sub-query plan;
if the target historical request does not exist, constructing an initial sub-query plan through the target sub-expression.
In some embodiments, obtaining a historical query optimization request associated with the target sub-expression, and a historical query plan corresponding to the historical query optimization request, includes:
obtaining a hash table associated with the target sub-expression, wherein the hash table comprises the historical query optimization request, and a first label and a plan calling link associated with the historical query optimization request, the first label being a hash value calculated from the historical query optimization request by a hash function;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests includes:
determining a second label, wherein the second label is a hash value calculated from the query optimization request by the hash function;
determining the target historical request corresponding to the query optimization request from the historical query optimization requests according to the first label and the second label;
taking, if the target historical request exists, the historical query plan corresponding to the target historical request as a sub-query plan includes:
if the target historical request exists, using the plan calling link associated with the target historical request to call the historical query plan corresponding to the historical query optimization request as the sub-query plan.
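The hash-table lookup of historical requests can be sketched as follows. The sketch assumes that a query optimization request is a JSON-serializable dictionary, that SHA-256 serves as the hash function, and that the plan calling link is modeled as a zero-argument callable returning the stored plan. `PlanCache`, `tag_of`, and `sub_query_plan` are hypothetical names, not part of the embodiments.

```python
import hashlib
import json


def tag_of(request: dict) -> str:
    """Label of a request: hash of a canonical (key-sorted) serialization."""
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PlanCache:
    """Hash table kept per target sub-expression (assumed layout)."""

    def __init__(self):
        self._table = {}  # first label -> (historical request, plan link)

    def remember(self, request: dict, plan_link):
        self._table[tag_of(request)] = (request, plan_link)

    def lookup(self, request: dict):
        entry = self._table.get(tag_of(request))  # compare second label
        if entry is None or entry[0] != request:  # guard against collisions
            return None
        return entry[1]()  # follow the plan calling link


def sub_query_plan(cache: PlanCache, request: dict, build_initial_plan):
    """Reuse a historical plan if one matches, otherwise build a new one."""
    plan = cache.lookup(request)
    if plan is not None:
        return plan
    plan = build_initial_plan(request)
    cache.remember(request, lambda: plan)
    return plan
```

Comparing labels first keeps the lookup cheap, and the equality check on a hit is a precaution added in this sketch so that two different requests with the same label are not confused.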
In some embodiments, after the data query plan is constructed from the target sub-expression, the method further includes:
determining a query instruction corresponding to the data query plan and a copy instruction of the query instruction;
and sending the copy instruction to a plurality of data storage nodes of the database, so that the data storage nodes return query data according to the copy instruction.
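A minimal sketch of distributing the copy instruction is given below. It assumes that each data storage node can be modeled as a callable that accepts an instruction and returns a list of rows, and it uses a thread pool to contact the nodes concurrently; the function name `dispatch` and the instruction layout are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def dispatch(query_instruction: dict, nodes: list) -> list:
    """Send a copy of the query instruction to every data storage node
    and collect the returned query data in node order."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [
            # Each node receives its own copy, so no node can alter
            # the instruction seen by another node.
            pool.submit(node, copy.deepcopy(query_instruction))
            for node in nodes
        ]
        return [row for future in futures for row in future.result()]
```

Collecting results in submission order keeps the output deterministic in this sketch; an actual system may instead merge results as they arrive.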
In specific implementations, each of the above units may be implemented as an independent entity, or the units may be combined arbitrarily and implemented as one or several entities. For the specific implementation of each unit, reference may be made to the foregoing method embodiments; details are not repeated here.
As can be seen from the above, in the query plan construction device of this embodiment, the acquisition unit acquires a data query expression including keywords; the key unit obtains, according to the keywords, the sub-expressions in the data query expression and the association relations among the sub-expressions; the equivalent unit determines an equivalent sub-expression, the equivalent sub-expression and the sub-expression having equivalent expression meanings; the determining unit determines a target sub-expression from the sub-expression and the equivalent sub-expression; and the construction unit constructs the data query plan from the target sub-expression and the association relations.
Therefore, the embodiments of the present application make it easier to construct the data query plan and improve query efficiency.
The embodiments of the present application also provide an electronic device, which may be a terminal, a server, or another device. The terminal may be a mobile phone, a tablet computer, a smart Bluetooth device, a notebook computer, a personal computer, or the like; the server may be a single server, a server cluster composed of a plurality of servers, or the like.
In some embodiments, the query plan construction apparatus may also be integrated in a plurality of electronic devices, for example, the query plan construction apparatus may be integrated in a plurality of servers, and the query plan construction method of the present application is implemented by the plurality of servers.
In this embodiment, a detailed description is given taking the case where the electronic device is a server as an example. Fig. 4 shows a schematic structural diagram of the server according to an embodiment of the present application. Specifically:
The server may include components such as a processor 410 having one or more processing cores, a memory 420 comprising one or more computer-readable storage media, a power supply 430, an input module 440, and a communication module 450. Those skilled in the art will appreciate that the server structure shown in Fig. 4 does not limit the server, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
the processor 410 is a control center of the server, connects various parts of the entire server using various interfaces and lines, performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 420, and calling data stored in the memory 420. In some embodiments, processor 410 may include one or more processing cores; in some embodiments, processor 410 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 410.
The memory 420 may be used to store software programs and modules, and the processor 410 may perform various functional applications and data processing by executing the software programs and modules stored in the memory 420. The memory 420 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the server, etc. In addition, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, memory 420 may also include a memory controller to provide processor 410 with access to memory 420.
The server also includes a power supply 430 that provides power to the various components, and in some embodiments, the power supply 430 may be logically connected to the processor 410 via a power management system, such that charge, discharge, and power consumption management functions are performed by the power management system. Power supply 430 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The server may also include an input module 440, which input module 440 may be used to receive entered numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The server may also include a communication module 450, and in some embodiments the communication module 450 may include a wireless module, through which the server may wirelessly transmit over short distances, thereby providing wireless broadband internet access to the user. For example, the communication module 450 may be used to assist a user in e-mail, browsing web pages, accessing streaming media, and the like.
Although not shown, the server may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 410 in the server loads executable files corresponding to the processes of one or more application programs into the memory 420 according to the following instructions, and the processor 410 executes the application programs stored in the memory 420, so as to implement various functions as follows:
acquiring a data query expression, wherein the data query expression comprises keywords;
according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained;
determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings;
determining a target sub-expression from the sub-expression and the equivalent sub-expression;
and constructing a data query plan through the target sub-expression and the association relation.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
As can be seen from the above, the query plan is constructed from the target sub-expressions according to the association relation, so that the sub-expressions do not conflict with one another during plan construction; this makes the data query plan easier to construct and improves query performance.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the query plan construction methods provided by embodiments of the present application. For example, the instructions may perform the steps of:
acquiring a data query expression, wherein the data query expression comprises keywords;
according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained;
determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings;
determining a target sub-expression from the sub-expression and the equivalent sub-expression;
and constructing a data query plan through the target sub-expression and the association relation.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
According to one aspect of the present application, there is provided a computer program product comprising a computer program/instructions stored in a computer readable storage medium. The computer program/instructions are read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the methods provided in the various alternative implementations of the query plan construction aspects provided in the above-described embodiments.
Since the instructions stored in the storage medium can execute the steps of any query plan construction method provided by the embodiments of the present application, they can achieve the beneficial effects achievable by any such method; for details, see the preceding embodiments, which are not repeated here.
The query plan construction method, apparatus, electronic device, and storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, those skilled in the art may, in light of the ideas of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (15)

1. A query plan construction method, comprising:
acquiring a data query expression, wherein the data query expression comprises keywords;
according to the keywords, sub-expressions in the data query expression and the association relation between the sub-expressions are obtained;
determining an equivalent sub-expression, wherein the equivalent sub-expression and the sub-expression have equivalent expression meanings;
determining a target sub-expression from the sub-expression and the equivalent sub-expression;
and constructing a data query plan through the target sub-expression and the association relation.
2. The query plan construction method of claim 1, wherein said determining a target sub-expression from said sub-expressions and said equivalent sub-expressions comprises:
acquiring a first condition number and a second condition number, wherein the first condition number is the condition number of the data connection condition associated with the sub-expression, and the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression;
and determining a target sub-expression from the sub-expression and the equivalent sub-expression according to the first condition number and the second condition number.
3. The query plan construction method of claim 1, wherein said determining a target sub-expression from said sub-expressions and said equivalent sub-expressions comprises:
acquiring metadata information of data to be queried associated with the sub-expression;
acquiring the data volume of the data to be queried from the metadata information;
and determining a target sub-expression from the sub-expression and the equivalent sub-expression according to the data quantity.
4. The query plan construction method as claimed in claim 1, wherein said constructing a data query plan by said target sub-expression and said association relation comprises:
acquiring a query optimization request associated with the target sub-expression from the data query expression;
constructing an initial sub-query plan through the target sub-expression;
determining a sub-query plan from the initial sub-query plan according to the query optimization request associated with the target sub-expression and the association relation;
and merging the sub-query plans to obtain a data query plan.
5. The query plan construction method as claimed in claim 4, wherein said constructing an initial sub-query plan from said target sub-expression comprises:
acquiring a query operator associated with the target sub-expression;
constructing a connection relation between the query operators according to the target sub-expression;
and controlling the query operators to be arranged according to the connection relation to obtain an initial sub-query plan.
6. The query plan construction method as claimed in claim 4, wherein said determining a sub-query plan from said initial sub-query plan based on said query optimization request associated with said target sub-expression and said association relation comprises:
acquiring metadata information of data to be queried associated with the target sub-expression according to the query optimization request associated with the target sub-expression;
analyzing, from the metadata information of the data to be queried, the amount of computation required to execute the initial sub-query plan;
and determining a sub-query plan from the initial sub-query plan according to the calculated quantity and the association relation.
7. The query plan construction method as claimed in claim 4, wherein the target sub-expression includes a first expression for querying data to be queried and a second expression for filtering the data to be queried, and the determining a sub-query plan from the initial sub-query plan according to a query optimization request associated with the target sub-expression and the association relation includes:
determining a first sub-query plan from a first initial sub-query plan according to a query optimization request associated with the first expression, wherein the first initial sub-query plan is the initial sub-query plan constructed through the first expression;
determining an unsatisfied request from the query optimization requests associated with the first expression based on the query optimization requests satisfied by the first sub-query plan;
determining a target optimization request from the query optimization requests associated with the second expression according to the unsatisfied requests and the association relation, wherein the target optimization request comprises the unsatisfied requests;
and determining a second sub-query plan from a second initial sub-query plan according to the target optimization request, wherein the second initial sub-query plan is the initial sub-query plan constructed through the second expression.
8. The query plan construction method as claimed in claim 4, wherein said constructing an initial sub-query plan from said target sub-expression comprises:
acquiring a historical query optimization request associated with the target sub-expression and a historical query plan corresponding to the historical query optimization request;
determining a target historical request corresponding to the query optimization request from the historical query optimization requests;
if the target historical request exists, taking the historical query plan corresponding to the target historical request as a sub-query plan;
if the target historical request does not exist, constructing an initial sub-query plan through the target sub-expression.
9. The query plan construction method of claim 8, wherein the obtaining the historical query optimization request associated with the target sub-expression and the historical query plan corresponding to the historical query optimization request comprises:
obtaining a hash table associated with the target sub-expression, wherein the hash table comprises the historical query optimization request, and a first label and a plan calling link associated with the historical query optimization request, and the first label is a hash value obtained from the historical query optimization request through hash function calculation;
the determining, from the historical query optimization requests, a target historical request corresponding to the query optimization request comprises:
determining a second label, wherein the second label is a hash value obtained from the query optimization request through the hash function calculation;
determining the target historical request corresponding to the query optimization request from the historical query optimization requests according to the first label and the second label;
the taking, if the target historical request exists, a historical query plan corresponding to the target historical request as a sub-query plan comprises:
if the target historical request exists, using the plan calling link associated with the target historical request to call the historical query plan corresponding to the historical query optimization request as the sub-query plan.
10. The query plan construction method of claim 1, wherein the obtaining a data query expression comprises:
acquiring a data query command, wherein the data query command comprises keywords;
according to the keywords, carrying out grammar analysis processing on the data query command to obtain a grammar tree of the data query command;
and carrying out format conversion processing on the grammar tree by adopting a preset data exchange language to obtain a data query expression.
11. The query plan construction method as claimed in claim 1, further comprising, after said constructing a data query plan by said target sub-expression:
determining a query instruction corresponding to the data query plan and a copy instruction of the query instruction;
and sending the copy instruction to a plurality of data storage nodes of a database, so that the data storage nodes return query data according to the copy instruction.
12. A query plan construction apparatus, comprising:
an acquisition unit, configured to acquire a data query expression, wherein the data query expression comprises keywords;
a key unit, configured to obtain sub-expressions in the data query expression and the association relation between the sub-expressions according to the keywords;
an equivalent unit, configured to determine an equivalent sub-expression, the equivalent sub-expression and the sub-expression having equivalent expression meanings;
a determining unit, configured to determine a target sub-expression from the sub-expression and the equivalent sub-expression;
and a construction unit, configured to construct a data query plan through the target sub-expression and the association relation.
13. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions; the processor loads instructions from the memory to perform the steps in the query plan construction method as claimed in any one of claims 1 to 11.
14. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the query plan construction method of any one of claims 1 to 11.
15. A computer program product comprising computer programs/instructions which when executed by a processor implement the steps in the query plan construction method of any one of claims 1 to 11.
CN202310575987.9A 2023-05-19 2023-05-19 Query plan construction method, device, electronic equipment and storage medium Pending CN116975098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310575987.9A CN116975098A (en) 2023-05-19 2023-05-19 Query plan construction method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310575987.9A CN116975098A (en) 2023-05-19 2023-05-19 Query plan construction method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116975098A true CN116975098A (en) 2023-10-31

Family

ID=88475650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310575987.9A Pending CN116975098A (en) 2023-05-19 2023-05-19 Query plan construction method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116975098A (en)

Legal Events

Date Code Title Description
PB01 Publication