CN114036188B

CN114036188B - Method for optimizing and processing union in relational database management system

Info

Publication number: CN114036188B
Application number: CN202111429533.8A
Authority: CN
Inventors: 余鹏; 何小栋
Original assignee: Guangzhou Mass Database Technology Co ltd
Current assignee: Guangzhou Mass Database Technology Co ltd
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2024-08-23
Anticipated expiration: 2041-11-29
Also published as: CN114036188A

Abstract

The invention belongs to the technical field of database management and operating systems, and relates in particular to a method for optimizing the processing of union in a relational database management system. The invention provides, for the first time, a method applied in the query plan optimization and generation stage of a relational database management system that introduces a new operator for the union operation union all. By merging sub-queries that access the base tables in the same way, the new operator passes the result of a single base-table access to the upper-layer sub-queries, which perform projection or expression calculation, and the query result is finally output.

Description

Method for optimizing and processing union in relational database management system

Technical Field

The invention belongs to the technical field of database management and operating systems, and relates in particular to a method for optimizing the processing of union in a relational database management system, and to its application.

Background

Query processing in a typical relational database management system such as openGauss generally falls into the following three phases:

(1) Lexical and syntax analysis: the SQL text entered by the user is converted into an internal data structure, generally called a parse tree, and its syntactic correctness is verified, finally yielding a parse tree that represents the SQL.

(2) Query optimization and plan generation: the parse tree obtained in the previous phase is optimized using rule-based and cost-based optimization, and an optimal query plan is generated.

(3) Query plan execution: the query plan generated in the previous phase is executed (generally in the iterator model), and the query result is obtained and returned to the user.
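Phase (3) refers to the iterator model. The following minimal Python sketch illustrates that model under assumed names: each plan node exposes `open`, `next` and `close`, and `next` returns `None` once its input is exhausted. The classes `SeqScan` and `Project` and the function `run` are hypothetical and do not reproduce the openGauss executor interface.

```python
from typing import Callable, Optional

Row = dict


class SeqScan:
    """Leaf node: returns the rows of an in-memory table one at a time."""

    def __init__(self, table: list[Row]):
        self.table = table
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.table):
            return None  # end of input
        row = self.table[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Project:
    """Applies a projection or expression function to each row of its child."""

    def __init__(self, child, fn: Callable[[Row], Row]):
        self.child = child
        self.fn = fn

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else self.fn(row)

    def close(self) -> None:
        self.child.close()


def run(plan) -> list[Row]:
    """Drive a plan to completion by repeatedly pulling rows from its root."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


t = [{"stid": 1, "english": 90, "chinese": 85}]
plan = Project(SeqScan(t), lambda r: {"stid": r["stid"], "score": r["english"]})
print(run(plan))  # [{'stid': 1, 'score': 90}]
```

Each call to `next` on the root pulls exactly one row through the tree, which is why the execution steps described below are phrased as "obtain a record, then pass it upward".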

Under the conventional processing strategy of current relational database management systems, consider phase (2), query optimization and plan generation, for an SQL statement such as: select t.stid, 'english' as cousename, t.english as score from t union all select t.stid, 'chinese' as cousename, t.chinese as score from t. Here table t has the int primary key stid (student number) and the columns english (the student's English score) and chinese (the student's Chinese score); the SQL returns each student's number together with the English score and the Chinese score. Such SQL contains several sub-queries that obtain their output columns from the same single table or set of tables, and their results are finally combined with union all. Currently, when the query plan for such SQL is generated, one sub-query plan is generated for each sub-query, and the sub-query plans are finally combined under an Append operator (see FIG. 1). For the example above, an Append operator is generated that contains two sub-query plans, representing select t.stid, 'english' as cousename, t.english as score from t and select t.stid, 'chinese' as cousename, t.chinese as score from t, respectively.

Consequently, in the following phase (3), query plan execution, the sub-query plans are executed from bottom to top: the sequential scan (seqscan) of t is performed at the bottom of each sub-query plan, then the Append operator is executed, and the final result (result) is obtained. The main drawback of this procedure is that table t must be scanned repeatedly (at least twice) during execution, which wastes resources. More generally, if the SQL contains N sub-queries whose results are combined, table t is scanned N times; the more sub-queries the SQL contains, the larger N becomes and the more resources are wasted, significantly increasing time, hardware, and labor costs.
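The cost of the prior-art plan can be made concrete with a small Python sketch. It assumes an in-memory table whose hypothetical `seqscan` method counts how many scans are started, and it models the Append operator as running one scan plus projection per union all branch; with the two branches of the example, table t is scanned twice.

```python
from typing import Callable, Iterator

Row = dict


class CountingTable:
    """In-memory table that records how many sequential scans were started."""

    def __init__(self, rows: list[Row]):
        self.rows = rows
        self.scans = 0

    def seqscan(self) -> Iterator[Row]:
        self.scans += 1  # executes when iteration over this scan begins
        yield from self.rows


def append_plan(table: CountingTable,
                branches: list[Callable[[Row], tuple]]) -> Iterator[tuple]:
    """Prior-art plan shape: Append over one SeqScan + projection per branch."""
    for project in branches:
        for row in table.seqscan():  # every branch rescans the base table
            yield project(row)


t = CountingTable([{"stid": 1, "english": 90, "chinese": 85},
                   {"stid": 2, "english": 70, "chinese": 95}])
branches = [lambda r: (r["stid"], "english", r["english"]),
            lambda r: (r["stid"], "chinese", r["chinese"])]
rows = list(append_plan(t, branches))
print(t.scans)  # 2, one full scan of t per union all branch
```

With N branches the same code starts N scans, which is the resource waste described above.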

Disclosure of Invention

The invention overcomes the defect of the prior art that, for SQL containing multiple sub-query plans, the data table must be scanned repeatedly, which wastes considerable resources. The scheme designs a new operator that merges the scans (a scan of a single table or a join scan over multiple tables) in all lower-layer sub-queries combined by union all, directly executes the projection or expression calculation required by each sub-query, and finally outputs the query result. This can greatly improve SQL execution efficiency and save considerable time, hardware, and manpower resources.

For example, consider the SQL described above:

select t.stid, 'english' as cousename, t.english as score from t union all select t.stid, 'chinese' as cousename, t.chinese as score from t

In this case, the sub-queries concatenated by union all (record merging) are processed as follows: 1. the sub-queries are extracted and grouped, where sub-queries placed in the same group must contain no ordering and must access the base tables in the same way (for a single table, the same base table; for a join over multiple tables, the same base tables joined in the same way); 2. for the sub-queries in each group, a new operator is designed that represents the access to the base table; this operator scans each row, and a query result is output for each sub-query; 3. the groups are combined by union all under an Append operator.
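Grouping step 1 can be sketched in Python as follows. The `SubQuery` record, its fields, and the choice to plan ineligible sub-queries (those with aggregation or ordering) as singleton groups are assumptions for illustration; the method itself only states which sub-queries may share a group. The grouping key is the pair (base tables, join mode), so two sub-queries land in the same group only when they access the base tables in the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubQuery:
    name: str
    tables: tuple[str, ...]          # base tables, in join order
    join_mode: Optional[str] = None  # join description; None for a single table
    has_aggregation: bool = False
    has_ordering: bool = False


def access_key(q: SubQuery) -> tuple:
    """Same base tables and same join mode give the same key."""
    return (q.tables, q.join_mode)


def group_subqueries(queries: list[SubQuery]) -> list[list[SubQuery]]:
    groups: dict[tuple, list[SubQuery]] = {}
    singletons: list[list[SubQuery]] = []
    for q in queries:
        if q.has_aggregation or q.has_ordering:
            singletons.append([q])  # assumption: not merged, planned on its own
        else:
            groups.setdefault(access_key(q), []).append(q)
    return list(groups.values()) + singletons


qs = [SubQuery("query1", ("t",)),
      SubQuery("query2", ("t",)),
      SubQuery("query3", ("t", "s"), "inner join on t.stid = s.stid"),
      SubQuery("query4", ("t",), has_ordering=True)]
print([[q.name for q in g] for g in group_subqueries(qs)])
# [['query1', 'query2'], ['query3'], ['query4']]
```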

As shown in FIG. 2, the lowest layer is the megescan operator (node); the query1 and query2 operators sit side by side on the layer above it; and the result operator sits at the top, above query1 and query2.

Execution also proceeds from bottom to top: the megescan node is executed first, scanning table t and passing each scanned row to query1 and query2 in turn; query1 and query2 perform the necessary computation on the rows output by megescan, such as projection or expression calculation, and submit their results to the topmost result node; the result node returns the query result to the user.
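A minimal Python sketch of the FIG. 2 execution order, under the same kind of assumptions (an in-memory table with a hypothetical scan counter, and each sub-query modelled as a function from a row to an output tuple): megescan reads table t once, and every row it produces is handed to query1 and query2 in turn. The output contains the same rows as the two-branch union all in a different order, which union all does not constrain.

```python
from typing import Callable, Iterator

Row = dict


class CountingTable:
    """In-memory table that records how many sequential scans were started."""

    def __init__(self, rows: list[Row]):
        self.rows = rows
        self.scans = 0

    def seqscan(self) -> Iterator[Row]:
        self.scans += 1
        yield from self.rows


def megescan_plan(table: CountingTable,
                  queries: list[Callable[[Row], tuple]]) -> Iterator[tuple]:
    """FIG. 2 shape: one megescan over the base table feeding N query nodes."""
    for row in table.seqscan():  # megescan: a single pass over t
        for query in queries:    # query1 ... queryN receive the same row
            yield query(row)     # projection / expression per sub-query


t = CountingTable([{"stid": 1, "english": 90, "chinese": 85},
                   {"stid": 2, "english": 70, "chinese": 95}])
queries = [lambda r: (r["stid"], "english", r["english"]),   # query1
           lambda r: (r["stid"], "chinese", r["chinese"])]   # query2
rows = list(megescan_plan(t, queries))
print(t.scans)  # 1
print(rows)
# [(1, 'english', 90), (1, 'chinese', 85), (2, 'english', 70), (2, 'chinese', 95)]
```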

Specifically, the invention provides a method for optimizing the processing of union in a relational database management system, applied in the query plan optimization and generation stage. The method designs an operator that merges the scans in all lower-layer sub-queries combined by union all, directly executes the projection or expression calculation required by each sub-query, and finally outputs the query result, thereby improving SQL execution efficiency and saving time, hardware, and manpower resources;

the method for optimizing union processing comprises the following steps:

(1) In the query plan optimization and generation stage, the sub-queries connected by union all operators are grouped;

(2) After grouping, when the query plan is generated, the first sub-query in each group is processed to produce a two-layer query plan: the lower layer is a megescan (merged scan) node representing the sub-query's base-table scan or join operation, and the upper layer is a query node representing the sub-query's projection and calculation operations. Each remaining sub-query in the group uses the lower-layer megescan node of the first sub-query as its lower-layer plan, and its upper-layer plan is its own projection and calculation operation. This yields 1 megescan node and N upper-layer query nodes for N sub-queries, and all N query nodes have the same megescan node as their lower-layer node. After the sub-queries in every group have been processed in this way, all groups are combined under an Append node to form the final execution plan;

(3) In the query plan execution stage, all nodes are first initialized one by one from bottom to top; for the megescan node and the N upper-layer query nodes of a group, the megescan node, as the lower-layer node, is initialized first, and then the N upper-layer query nodes are initialized one by one;

(4) In the query plan execution stage, the whole query plan is executed in the iterator model. For the 1 lower-layer megescan node and the N upper-layer query nodes generated from the N sub-queries of a group, the megescan node is executed first to obtain a record; the record is submitted to the N upper-layer query nodes, which process it and return their results one by one to the topmost result node; finally, the result node returns the query result to the user;

(5) After the query plan has been executed, the execution nodes are destroyed one by one from bottom to top to reclaim resources; for the 1 lower-layer megescan node and the N upper-layer query nodes of a group, the megescan node is destroyed first to release its resources, and then the N query nodes are destroyed one by one to release theirs.
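Steps (3) to (5) fix the order in which a group's nodes are initialized, executed and destroyed. The Python sketch below records that order in a trace; the classes `MegeScanNode` and `QueryNode`, their method names, and the use of a plain list as the result node are hypothetical.

```python
from typing import Callable, Optional

Row = dict


class MegeScanNode:
    """Hypothetical lower-layer node shared by the N query nodes of one group."""

    def __init__(self, rows: list[Row], trace: list[str]):
        self.rows, self.trace, self.pos = rows, trace, 0

    def init(self) -> None:
        self.trace.append("init megescan")
        self.pos = 0

    def fetch(self) -> Optional[Row]:
        """Return the next record, or None when the scan is exhausted."""
        if self.pos >= len(self.rows):
            return None
        self.pos += 1
        return self.rows[self.pos - 1]

    def destroy(self) -> None:
        self.trace.append("destroy megescan")


class QueryNode:
    """Hypothetical upper-layer node: projection / expression for one sub-query."""

    def __init__(self, name: str, fn: Callable[[Row], tuple], trace: list[str]):
        self.name, self.fn, self.trace = name, fn, trace

    def init(self) -> None:
        self.trace.append(f"init {self.name}")

    def process(self, row: Row) -> tuple:
        return self.fn(row)

    def destroy(self) -> None:
        self.trace.append(f"destroy {self.name}")


def execute_group(scan: MegeScanNode, queries: list[QueryNode]) -> list[tuple]:
    # Step (3): initialize bottom-up, the shared megescan before the N query nodes.
    scan.init()
    for q in queries:
        q.init()
    # Step (4): each record is fetched once and submitted to all N query nodes;
    # the list `result` stands in for the topmost result node.
    result: list[tuple] = []
    while (row := scan.fetch()) is not None:
        for q in queries:
            result.append(q.process(row))
    # Step (5): destroy bottom-up, megescan first, then the N query nodes.
    scan.destroy()
    for q in queries:
        q.destroy()
    return result


trace: list[str] = []
scan = MegeScanNode([{"stid": 1, "english": 90, "chinese": 85}], trace)
nodes = [QueryNode("query1", lambda r: (r["stid"], "english", r["english"]), trace),
         QueryNode("query2", lambda r: (r["stid"], "chinese", r["chinese"]), trace)]
out = execute_group(scan, nodes)
print(out)  # [(1, 'english', 90), (1, 'chinese', 85)]
print(trace)
# ['init megescan', 'init query1', 'init query2', 'destroy megescan', 'destroy query1', 'destroy query2']
```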

Further, the scans in the sub-queries include scans of a single table and join scans over multiple tables.

Further, in step (1) of the method, the sub-queries connected by union all operators are grouped such that the sub-queries in the same group satisfy the following conditions: they contain no aggregation or ordering operation, and they access the base tables in the same way.

Further, accessing the base tables in the same way means that, when a single table is accessed, it must be the same base table; when a join over multiple tables is accessed, the base tables must be identical and the join modes must also be identical.

Further, in step (2) of the method, when there is only one group, no Append node is created, and each upper-layer query node in the group is used directly as a lower-layer node of the result node.

Further, in step (4) of the method, when the megescan node obtains no record to process, it returns directly and passes an empty result to the N upper-layer query nodes; the N upper-layer query nodes then return the empty result directly to the topmost result node, and finally the result node returns an empty query result to the user;

Executing the megescan node in step (4) means performing the base-table scan or join operation; processing a record by an upper-layer query node means that the query node performs projection or expression calculation on the record.
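When the accessed base tables form a join, megescan performs the join once and each joined record is submitted to all query nodes, which may compute expressions as well as projections. In the Python sketch below, the inner equi-join on a single key, its hash-join implementation, the table `s`, and the helper names `megescan_join` and `run_group` are all assumptions for illustration. An empty join yields an empty result, matching the case in which megescan obtains no record.

```python
from typing import Callable, Iterator

Row = dict


def megescan_join(left: list[Row], right: list[Row], key: str) -> Iterator[Row]:
    """Hypothetical megescan over an inner equi-join, performed once per group.
    A hash join is used purely for illustration."""
    index: dict = {}
    for right_row in right:                     # build phase
        index.setdefault(right_row[key], []).append(right_row)
    for left_row in left:                       # probe phase
        for right_row in index.get(left_row[key], []):
            yield {**left_row, **right_row}     # one joined record


def run_group(records: Iterator[Row],
              queries: list[Callable[[Row], tuple]]) -> list[tuple]:
    """Submit each joined record to every query node; no records, empty result."""
    out: list[tuple] = []
    for rec in records:
        for query in queries:
            out.append(query(rec))
    return out


t = [{"stid": 1, "english": 90, "chinese": 85},
     {"stid": 2, "english": 70, "chinese": 95}]
s = [{"stid": 1, "name": "Li"}, {"stid": 3, "name": "Wang"}]
queries = [lambda r: (r["name"], "english", r["english"]),               # projection
           lambda r: (r["name"], "total", r["english"] + r["chinese"])]  # expression
print(run_group(megescan_join(t, s, "stid"), queries))
# [('Li', 'english', 90), ('Li', 'total', 175)]
print(run_group(megescan_join(t, [], "stid"), queries))  # []
```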

In addition, the invention also relates to the application of the method for optimizing union processing in a relational database management system to relational database management or operating systems.

In summary, the invention provides, for the first time, a method in the query plan optimization and generation stage of a relational database management system that introduces a new operator (node) for the union operation union all: by merging sub-queries that access the base tables in the same way, the new operator passes the result of the base-table access to the upper-layer sub-queries, which perform projection or expression calculation, and the query result is finally output.

Drawings

In order to more clearly illustrate the technical solutions of the prior art and the embodiments of the present invention, the following brief description is given of the drawings needed in the prior art and the embodiments of the present invention, and it is obvious that the following drawings are only some embodiments described in the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of the query plan generated by the existing approach for SQL containing multiple sub-queries.

FIG. 2 is a schematic diagram of the query plan generated by the method of the present invention for SQL containing multiple sub-queries.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments and corresponding drawings. It is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and the present invention may be implemented or applied by different specific embodiments, and that various modifications or changes may be made in the details of the present description based on different points of view and applications without departing from the spirit of the present invention.

Meanwhile, it should be understood that the scope of the present invention is not limited to the following specific embodiments; it is also to be understood that the terminology used in the examples of the invention is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the invention.

Example 1: in the query plan optimization and generation stage of a relational database management system, the method designs a new operator (node) for the union operation union all and merges sub-queries (scans of a single table or join scans over several tables) that access the base tables in the same way; the new operator passes the result of the base-table access to the upper-layer sub-queries, which perform projection or expression calculation, and the query result is finally output. The method specifically includes the following steps:

(1) In the query plan optimization and generation stage, the sub-queries (including single-table scans and multi-table join scans) connected by union all operators are grouped, and the sub-queries in the same group satisfy the following conditions: there is no aggregation or ordering operation, and the base tables are accessed in the same way (when a single table is accessed, it must be the same base table; when a join over multiple tables is accessed, the base tables must be identical and the join modes must also be identical).

(2) After grouping, when the query plan is generated, the first sub-query in each group is processed to produce a two-layer query plan: the lower layer is a megescan node representing the sub-query's base-table scan or join operation, and the upper layer is a query node representing the sub-query's projection and calculation operations. Each remaining sub-query in the group uses the lower-layer megescan node of the first sub-query as its lower-layer plan, and its upper-layer plan is its own projection and calculation operation. This yields 1 megescan node and N upper-layer query nodes for N sub-queries, and all N query nodes have the same megescan node as their lower-layer node. After the sub-queries in every group have been processed in this way, all groups are combined under an Append node to form the final execution plan; when there is only one group, no Append node is created, and each upper-layer query node in the group is used directly as a lower-layer node of the result node.

(3) In the query plan execution stage, the nodes are initialized one by one from bottom to top; for the megescan node and the N upper-layer query nodes of a group, the megescan node, as the lower-layer node, is initialized first, and then the N upper-layer query nodes are initialized one by one.

(4) In the query plan execution stage, the whole query plan is executed in the iterator model. For the 1 lower-layer megescan node and the N upper-layer query nodes generated from the N sub-queries of a group, the megescan node is executed first (performing the base-table scan or join operation) to obtain a record; the record is submitted to the N upper-layer query nodes, which process it (performing projection or expression calculation on the record) and return their results one by one to the topmost result node; finally, the result node returns the query result to the user. When the megescan node obtains no record to process, it returns directly and passes an empty result to the N upper-layer query nodes; the N upper-layer query nodes then return the empty result directly to the topmost result node, and finally the result node returns an empty query result to the user.
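Step (2) of Example 1 can be sketched as plan construction over groups that have already been formed. The Python dataclasses `MegeScan`, `Query`, `Append` and `Result`, and the choice to place all query nodes of all groups directly under the Append node, are assumptions; the sketch shows that the query nodes of a group share one megescan object and that no Append node is created when there is only one group.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class MegeScan:
    access: tuple  # (base tables, join mode) shared by every sub-query in the group


@dataclass(eq=False)
class Query:
    name: str
    child: MegeScan  # lower-layer node; the identical object across a group


@dataclass(eq=False)
class Append:
    children: list


@dataclass(eq=False)
class Result:
    children: list


def build_plan(groups: list[list[tuple[str, tuple]]]) -> Result:
    """Each group is a list of (sub-query name, access key) pairs."""
    per_group = []
    for group in groups:
        _, access = group[0]
        scan = MegeScan(access)  # one megescan node per group
        # the first and all remaining sub-queries reuse the same megescan node
        per_group.append([Query(name, scan) for name, _ in group])
    if len(per_group) == 1:
        # only one group: query nodes sit directly under the result node
        return Result(per_group[0])
    # several groups: combine them under a single Append node
    return Result([Append([q for nodes in per_group for q in nodes])])


single = build_plan([[("query1", (("t",), None)), ("query2", (("t",), None))]])
print(type(single.children[0]).__name__)                      # Query
print(single.children[0].child is single.children[1].child)   # True

multi = build_plan([[("query1", (("t",), None)), ("query2", (("t",), None))],
                    [("query3", (("t", "s"), "inner join on stid"))]])
print(type(multi.children[0]).__name__)                       # Append
```

The number of distinct megescan objects equals the number of groups, so each base-table access pattern is scanned once per execution.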

Example 2: a method for optimizing the processing of union in a relational database management system (see FIG. 2).

SQL: select t.stid, 'english' as cousename, t.english as score from t union all select t.stid, 'chinese' as cousename, t.chinese as score from t. Table t has the int primary key stid (student number) and the columns english (the student's English score) and chinese (the student's Chinese score). The SQL contains several sub-queries that obtain each student's number and scores; the sub-queries obtain their output columns from the same single table or set of tables, and union all is finally applied to their results.
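That merging the scans preserves the union all result can be illustrated on the Example 2 query with Python's built-in `sqlite3` module. The two sample rows are invented for illustration, and SQLite merely serves as a reference engine for the union all result. The merged evaluation reads t once and applies both projections to every row; the two results are compared as multisets because union all guarantees no row order.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("create table t (stid int primary key, english int, chinese int)")
conn.executemany("insert into t values (?, ?, ?)", [(1, 90, 85), (2, 70, 95)])

# Reference: the Example 2 query evaluated by SQLite.
sql = ("select t.stid, 'english' as cousename, t.english as score from t "
       "union all "
       "select t.stid, 'chinese' as cousename, t.chinese as score from t")
expected = conn.execute(sql).fetchall()

# Merged evaluation: one pass over t, each row submitted to both sub-queries.
merged = []
for stid, english, chinese in conn.execute("select stid, english, chinese from t"):
    merged.append((stid, "english", english))  # query1
    merged.append((stid, "chinese", chinese))  # query2

assert Counter(expected) == Counter(merged)
print(sorted(merged))
# [(1, 'chinese', 85), (1, 'english', 90), (2, 'chinese', 95), (2, 'english', 70)]
```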

The embodiments of the invention are described progressively; for identical or similar parts, the embodiments may be referred to one another.

The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, replacement, etc. that comes within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A method for optimizing union processing in a relational database management system, characterized in that the method is applied in the query plan optimization and generation stage of a relational database, designs an operator that merges the scans in all lower-layer sub-queries combined by union all, directly executes the projection or expression calculation required by each sub-query, and finally outputs the query result;

the method for optimizing union processing comprises the following steps:

(1) In the query plan optimization and generation stage, the sub-queries connected by union all operators are grouped, and the sub-queries in the same group satisfy the following conditions: there is no aggregation or ordering operation, and the base tables are accessed in the same way;

(2) After grouping, when the query plan is generated, the first sub-query in each group is processed to produce a two-layer query plan: the lower layer is a megescan node representing the sub-query's base-table scan or join operation, and the upper layer is a query node representing the sub-query's projection and calculation operations. Each remaining sub-query in the group uses the lower-layer megescan node of the first sub-query as its lower-layer plan, and its upper-layer plan is its own projection and calculation operation. This yields 1 megescan node and N upper-layer query nodes for N sub-queries, and all N query nodes have the same megescan node as their lower-layer node. After the sub-queries in every group have been processed in this way, all groups are combined under an Append node to form the final execution plan;

2. The method for optimizing union processing according to claim 1, wherein the scans in the sub-queries include scans of a single table and join scans over multiple tables.

3. The method for optimizing union processing according to claim 1, wherein accessing the base tables in the same way means that, when a single table is accessed, it must be the same base table; when a join over multiple tables is accessed, the base tables must be identical and the join modes must also be identical.

4. The method for optimizing union processing according to claim 1, wherein in step (2), when there is only one group, no Append node is created, and each upper-layer query node in the group is used directly as a lower-layer node of the result node.

5. The method for optimizing union processing according to claim 1, wherein in step (4), when the megescan node obtains no record to process, it returns directly and passes an empty result to the N upper-layer query nodes; the N upper-layer query nodes then return the empty result directly to the topmost result node, and finally the result node returns an empty query result to the user;

6. Use of the method for optimizing union processing in a relational database management system according to any one of claims 1-5 in relational database management or operating systems.