CN114036188B - Method for optimizing and processing union in relational database management system - Google Patents

Method for optimizing and processing union in relational database management system Download PDF

Info

Publication number
CN114036188B
CN114036188B CN202111429533.8A CN202111429533A CN114036188B CN 114036188 B CN114036188 B CN 114036188B CN 202111429533 A CN202111429533 A CN 202111429533A CN 114036188 B CN114036188 B CN 114036188B
Authority
CN
China
Prior art keywords
query
nodes
layer
sub
megescan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111429533.8A
Other languages
Chinese (zh)
Other versions
CN114036188A (en
Inventor
余鹏
何小栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Mass Database Technology Co ltd
Original Assignee
Guangzhou Mass Database Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Mass Database Technology Co ltd filed Critical Guangzhou Mass Database Technology Co ltd
Priority to CN202111429533.8A priority Critical patent/CN114036188B/en
Publication of CN114036188A publication Critical patent/CN114036188A/en
Application granted granted Critical
Publication of CN114036188B publication Critical patent/CN114036188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24549Run-time optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of database management and operating systems, and particularly relates to a method for optimizing and processing union in a relational database management system. The invention provides a method for generating a query plan for optimization of a relational database management system for the first time, aiming at a new operator of union operation union all, the new operator can submit an access result of a basic table to sub-queries of an upper layer to execute projection operation or expression calculation by merging sub-queries with the same access mode of the basic table, and finally output the query result.

Description

Method for optimizing and processing union in relational database management system
Technical Field
The invention belongs to the technical field of database management and operating systems, and particularly relates to a method for optimizing and processing union in a relational database management system and application thereof.
Background
Processing a query in a typical relational database management system such as OpenGauss generally falls into the following three phases:
(1) Lexical grammar analysis is mainly to convert the input text (SQL) of a user into an internal data structure, generally called a grammar analysis tree, and verify the correctness of the grammar to finally obtain the grammar analysis tree representing SQL.
(2) Optimizing and generating a query plan, optimizing rules-based and physical cost-based grammar analysis trees obtained in the previous stage, and generating an optimal query plan.
(3) Executing the query plan, executing the query plan generated in the previous stage (generally adopting an iterator mode), obtaining a query result and returning the query result to the user.
In accordance with the conventional processing strategy in the present relational database management system, in the step (2) of optimizing and generating the query plan stage, for the SQL to be processed, such as SQL:select t.stid,'english'as cousename,t.english as score from t union all select t.stid,'chinese'as cousename,t.chinese as score from t,, in which the structure of the table t is the primary key stid (student's number), the types are int, english (student english score) and chinese (student's english score), the SQL obtains the student's number and english score, such SQL contains a plurality of sub-queries, these sub-queries obtain output columns for the same single or multiple tables, and finally the result is subjected to union all operations. In current operations, for such SQL, when generating a query plan, one sub-query plan is generated for each sub-query, and finally multiple sub-query plans are organized using an add operator (see FIG. 1). For the above example of obtaining the English score of the student, an end operator is generated, where the operator includes two sub-query plans, which represent select t.stind, 'english' as cousename, t.engish as score from t and select t.stind, 'chinese' as cousename, and t.Chinese as score from t, respectively.
Thus, in the next step (3) execution of the query plan phase, the sub-query plans need to be executed from bottom to top when executing the query plan, i.e. the sequential scanning operation (seqscan) of t is performed at the bottom, then the add operator is executed, and the final result (result) is obtained. However, the greatest disadvantage of the above processing procedure is that in the process of executing the plan, the table t needs to be scanned repeatedly (at least twice), which causes resource waste, in other words, if the SQL contains N sub-query plans, and the N sub-queries need to be combined, the table t needs to be scanned N times, and obviously, the larger the number of sub-query plans contained in the SQL, the larger the N, the more wasted resources, and the time, hardware and labor costs will be significantly increased.
Disclosure of Invention
The invention provides a solution to overcome the defect that the prior art scheme needs to scan a data table for SQL containing a plurality of sub-query plans, thereby causing larger resource waste. According to the scheme, an operator is designed, scanning (scanning of a single table or linking scanning of a plurality of tables) in all sub-queries at the lower layer is combined union all, projection operation or expression calculation is directly executed according to requirements of each sub-query, and finally a query result is output, so that SQL execution efficiency can be greatly improved, and a large amount of time, hardware and manpower resources are saved.
For example, for the SQL previously described:
select t.stid,'english'as cousename,t.english as score from t union all select t.stid,'chinese'as cousename,t.chinese as score from t.
For this case, we process union all (record merging) concatenated multiple sub-queries as follows: 1. extracting each sub-query to group, wherein the sub-queries grouped into the same group are required to meet the following conditions: the base tables that are not ordered and accessed are the same (for a single table, the same base table is required; for a link to multiple tables, the base tables are required to be the same and the link mode is also the same); 2. designing a new operator aiming at a plurality of sub-queries in each group to represent access to the basic table, wherein the operator scans each row and outputs a query result according to each sub-query; 3. each group is connected by union all and joined by an add operator.
As shown in FIG. 2, the lowest layer is a megescan operator (node), the query1 operator and the query2 operator on the upper layer are positioned on the same layer, and the query1 operator and the query2 operator are positioned on the uppermost layer of result operators.
The process is also performed from bottom to top: firstly executing megescan nodes, scanning a table t, and outputting the scanning results to query1 and query2 one by one; performing necessary computation, such as projection operation or expression computation, according to the result output by megescan, and submitting the result to the result node at the uppermost layer; and the result node returns the query result to the user.
Specifically, the invention provides a method for optimizing and processing union in a relational database management system, which provides a method for optimizing and processing union for relational database in the optimizing and generating query plan stage, and the method comprises the steps of designing an operator, merging union all and scanning in all sub-queries at the lower layer, directly executing projection operation or expression calculation according to the requirements of each sub-query, and finally outputting query results, thereby improving SQL execution efficiency and saving time, hardware and manpower resources;
the method for optimizing the union comprises the following steps:
(1) Grouping union all the operator-connected sub-queries in an optimization generation query plan stage;
(2) After the grouping is completed, when a query plan is generated, for the sub-queries in the same grouping, a first sub-query is processed to generate a two-layer query plan for the sub-query, wherein the lower layer is megescan (mege scanning) nodes, represents a scanning base table or a link operation of the sub-query, and the upper layer is query nodes, represents projection and calculation operations of the sub-query; for the rest sub-queries in the same group, using a lower-layer megescan node of the first sub-query as a lower-layer plan, wherein the upper-layer plan is the projection and calculation operation corresponding to the sub-query; after the processing, 1 megescan nodes and N upper-layer query nodes corresponding to N sub-queries are formed, and the lower-layer nodes of the N upper-layer query nodes are all the same megescan nodes; after the sub-queries in each group are processed in the same way, all the groups are connected by using an add mode to form a final execution plan;
(3) In the stage of executing query planning, firstly initializing all the nodes one by one from bottom to top, and for megescan in the same group and N query nodes at the upper layer, firstly initializing megescan nodes serving as lower nodes, and then initializing N query nodes at the upper layer one by one;
(4) In the stage of executing the query plan, executing the whole query plan in an iterator mode; for 1 lower-layer megescan nodes and N upper-layer query nodes generated by N sub-queries in the same group, firstly executing megescan nodes to obtain a record, submitting the obtained record to N upper-layer query nodes, executing the record by the N upper-layer query nodes, returning execution results to a result node at the uppermost layer one by one, and finally returning the query result to a user by the result node;
(5) After the query plan is executed, when the execution nodes are destroyed to recover resources, the nodes are destroyed one by one in a bottom-up mode, and for 1 lower-layer megescan nodes and N upper-layer query nodes in the same group, megescan nodes are destroyed firstly to release resources, and then N query nodes are destroyed one by one to release the resources.
Further, the scanning in the sub-queries in the method includes scanning of a single table and linking scanning of multiple tables.
Further, in the step (1) of the method, the plurality of subqueries connected by union all operators are grouped, and the subqueries grouped into the same group meet the following conditions: there is no aggregation, ordering operation, and the underlying table pattern of access is the same.
Further, the above-mentioned access basic table modes are the same, namely when the accessed basic table is a single table, the same basic table is required; when the accessed base table is a link of a plurality of tables, the base tables are required to be identical and the link modes are also identical.
Further, in the step (2) of the method, when there is only one packet, the application node is not set any more, and each upper-layer query node in the group is directly used as a lower-layer node of result.
Further, in the step (4) of the method, when the megescan node does not obtain the record to be processed, the record is directly returned, the blank is submitted to N upper-layer query nodes, at this time, the N upper-layer query nodes directly return the blank to a result node positioned at the uppermost layer, and finally, the result node returns the blank query result to the user;
The executing megescan node in the step (4) refers to scanning a basic table or a link operation; the execution of the record by the upper-layer query node means that the upper-layer query node performs projection operation or expression calculation on the record.
In addition, the invention also relates to the application of the method for optimizing the union processing in the relational database management system in the relational database management or operating system.
In summary, the invention provides a method for generating a query plan for optimizing a relational database management system for the first time, aiming at a new operator (node) of union operation union all, by merging sub-queries with the same access mode to a base table, the new operator submits an access result of the base table to a sub-query of an upper layer to execute projection operation or expression calculation and finally outputs the query result.
Drawings
In order to more clearly illustrate the technical solutions of the prior art and the embodiments of the present invention, the following brief description is given of the drawings needed in the prior art and the embodiments of the present invention, and it is obvious that the following drawings are only some embodiments described in the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a query plan strategy generated by the current approach for optimization of SQL containing multiple sub-queries.
FIG. 2 is a schematic diagram of a query plan strategy generated by the method of the present invention for optimization of SQL containing multiple sub-queries.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments and corresponding drawings. It is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and the present invention may be implemented or applied by different specific embodiments, and that various modifications or changes may be made in the details of the present description based on different points of view and applications without departing from the spirit of the present invention.
Meanwhile, it should be understood that the scope of the present invention is not limited to the following specific embodiments; it is also to be understood that the terminology used in the examples of the invention is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the invention.
Example 1: in the optimizing generation inquiry plan stage of relational database management system, the method designs a new operator (node) aiming at union operation union all, merges sub-inquiry (scanning for single table or linking scanning for several tables) with same access mode to basic table, and the new operator submits the access result of basic table to upper sub-inquiry to execute projection operation or expression calculation, and finally outputs inquiry result, and the method specifically includes the following steps:
(1) During the optimization generation query plan stage, a plurality of subqueries (including single table scans and multiple table linked scans) connected by union all operators are grouped, and the subqueries grouped into the same group meet the following conditions: there is no aggregation and ordering operation, and the basic tables are accessed in the same way (when the accessed basic table is a single table, the same basic table is required, and when the accessed basic table is a link of a plurality of tables, the basic tables are required to be the same, and the link ways are also the same).
(2) After the grouping is completed, when a query plan is generated, for sub-queries in the same grouping, a first sub-query is processed to generate a two-layer query plan for the sub-query, wherein the lower layer is megescan nodes which represent a scanning basic table or a link operation of the sub-query, and the upper layer is query nodes which represent projection and calculation operations of the sub-query; for the rest sub-queries in the same group, using a lower-layer megescan node of the first sub-query as a lower-layer plan, wherein the upper-layer plan is the projection and calculation operation corresponding to the sub-query; after the processing, 1 megescan nodes and N upper-layer query nodes corresponding to N sub-queries are formed, and the lower-layer nodes of the N upper-layer query nodes are all the same megescan nodes; after the sub-queries in each group are processed in the same way, all the groups are connected by using an application way to form a final execution plan; when only one packet exists, an application node is not set any more, and each upper-layer query node in the group is directly used as a lower-layer node of result.
(3) In the stage of executing query planning, each node is initialized one by one from bottom to top, for megescan and N query nodes at the upper layer in the same group, megescan nodes serving as the lower layer nodes are initialized first, and then N query nodes at the upper layer are initialized one by one.
(4) In the stage of executing the query plan, executing the whole query plan in an iterator mode; for 1 lower-layer megescan nodes and N upper-layer query nodes generated by N sub-queries in the same group, firstly executing megescan nodes (scanning a basic table or linking operation) to obtain a record, submitting the obtained record to N upper-layer query nodes, executing the record by the N upper-layer query nodes (performing projection operation or expression calculation on the record), returning execution results to a result node positioned at the uppermost layer one by one, and finally returning the query result to a user by the result node; when the megescan node does not obtain the record to be processed, the record is directly returned, the blank is submitted to N upper-layer query nodes, at the moment, the N upper-layer query nodes directly return the blank to a result node positioned at the uppermost layer, and finally, the result node returns the blank query result to the user.
(5) After the query plan is executed, when the execution nodes are destroyed to recover resources, the nodes are destroyed one by one in a bottom-up mode, and for 1 lower-layer megescan nodes and N upper-layer query nodes in the same group, megescan nodes are destroyed firstly to release resources, and then N query nodes are destroyed one by one to release the resources.
Example 2: a method of optimizing the processing of a union in a relational database management system (see figure 2),
SQL:select t.stid,'english'as cousename,t.english as score from t union all select t.stid,'chinese'as cousename,t.chinese as score from t, The structure of the table t is a primary key stid (student's number), the types are int, english (student's english score) and chinese (student's chinese score), and the SQL contains a plurality of sub-queries, which obtain the student's number and english score, the sub-queries obtain output columns on the same single or multiple tables, and finally union all is performed on the results.
As shown in FIG. 2, the lowest layer is a megescan operator (node), the query1 operator and the query2 operator on the upper layer are positioned on the same layer, and the query1 operator and the query2 operator are positioned on the uppermost layer of result operators.
The process is also performed from bottom to top: firstly executing megescan nodes, scanning a table t, and outputting the scanning results to query1 and query2 one by one; performing necessary computation, such as projection operation or expression computation, according to the result output by megescan, and submitting the result to the result node at the uppermost layer; and the result node returns the query result to the user.
Various embodiments of the invention are described in an incremental manner, with identical or similar parts being found in each other.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, replacement, etc. that comes within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (6)

1. A method for optimizing union in a relational database management system is characterized in that the method provides a method for optimizing union in a query plan generation stage of relational database optimization, and a query result is finally output by designing an operator, combining union all and scanning in all sub-queries at the lower layer, and directly executing projection operation or expression calculation according to the requirements of each sub-query;
the method for optimizing the union comprises the following steps:
(1) In the optimization generation query plan stage, a plurality of subqueries connected by union all operators are grouped, and the subqueries grouped into the same group meet the following conditions: no aggregation and ordering operation exists, and the accessed basic table mode is the same;
(2) After the grouping is completed, when a query plan is generated, for sub-queries in the same grouping, a first sub-query is processed to generate a two-layer query plan for the sub-query, wherein the lower layer is megescan nodes which represent a scanning basic table or a link operation of the sub-query, and the upper layer is query nodes which represent projection and calculation operations of the sub-query; for the rest sub-queries in the same group, using a lower-layer megescan node of the first sub-query as a lower-layer plan, wherein the upper-layer plan is the projection and calculation operation corresponding to the sub-query; after the processing, 1 megescan nodes and N upper-layer query nodes corresponding to N sub-queries are formed, and the lower-layer nodes of the N upper-layer query nodes are all the same megescan nodes; after the sub-queries in each group are processed in the same way, all the groups are connected by using an application way to form a final execution plan;
(3) In the stage of executing query planning, firstly initializing all the nodes one by one from bottom to top, and for megescan in the same group and N query nodes at the upper layer, firstly initializing megescan nodes serving as lower nodes, and then initializing N query nodes at the upper layer one by one;
(4) In the stage of executing the query plan, executing the whole query plan in an iterator mode; for 1 lower-layer megescan nodes and N upper-layer query nodes generated by N sub-queries in the same group, firstly executing megescan nodes to obtain a record, submitting the obtained record to N upper-layer query nodes, executing the record by the N upper-layer query nodes, returning execution results to a result node at the uppermost layer one by one, and finally returning the query result to a user by the result node;
(5) After the query plan is executed, when the execution nodes are destroyed to recover resources, the nodes are destroyed one by one in a bottom-up mode, and for 1 lower-layer megescan nodes and N upper-layer query nodes in the same group, megescan nodes are destroyed firstly to release resources, and then N query nodes are destroyed one by one to release the resources.
2. The method of optimizing a union of processing of claim 1, wherein the scans in the sub-queries include a single table scan and a linked scan of multiple tables.
3. The method for optimizing a union according to claim 1, wherein the same manner of accessing the base tables means that when the accessed base tables are single tables, the same base tables are required; when the accessed base table is a link of a plurality of tables, the base tables are required to be identical and the link modes are also identical.
4. The method of optimizing a union of claim 1, wherein in step (2), when there is only one packet, an application node is not set any more, and each upper-layer query node in the group is directly used as a lower-layer node of result.
5. The method for optimizing the union of processing according to claim 1, wherein in the step (4), when the megescan nodes do not obtain the record to be processed, the method returns directly, and submits the blank to N upper-layer query nodes, at this time, the N upper-layer query nodes return the blank directly to the result node located at the uppermost layer, and finally the result node returns the blank query result to the user;
The executing megescan node in the step (4) refers to scanning a basic table or a link operation; the execution of the record by the upper-layer query node means that the upper-layer query node performs projection operation or expression calculation on the record.
6. Use of the method of optimizing a union in a relational database management system according to any one of claims 1-5 in a relational database management or operating system.
CN202111429533.8A 2021-11-29 2021-11-29 Method for optimizing and processing union in relational database management system Active CN114036188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111429533.8A CN114036188B (en) 2021-11-29 2021-11-29 Method for optimizing and processing union in relational database management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111429533.8A CN114036188B (en) 2021-11-29 2021-11-29 Method for optimizing and processing union in relational database management system

Publications (2)

Publication Number Publication Date
CN114036188A CN114036188A (en) 2022-02-11
CN114036188B true CN114036188B (en) 2024-08-23

Family

ID=80139033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111429533.8A Active CN114036188B (en) 2021-11-29 2021-11-29 Method for optimizing and processing union in relational database management system

Country Status (1)

Country Link
CN (1) CN114036188B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163195A (en) * 2010-02-22 2011-08-24 北京东方通科技股份有限公司 Query optimization method based on unified view of distributed heterogeneous database
CN105718593A (en) * 2016-01-28 2016-06-29 长春师范大学 Database query optimization method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8635206B2 (en) * 2011-06-30 2014-01-21 International Business Machines Corporation Database query optimization
CN106777054B (en) * 2016-12-09 2020-03-27 北京华胜信泰数据技术有限公司 Semi-connection merging method and semi-connection merging device
CN109739879A (en) * 2018-12-14 2019-05-10 广州优态科技有限公司 Inquire the output method of data hierarchy
CN110688393B (en) * 2019-09-29 2021-01-29 星环信息科技(上海)股份有限公司 Query statement optimization method and device, computer equipment and storage medium
CN111506613A (en) * 2020-04-22 2020-08-07 支付宝(杭州)信息技术有限公司 Method, system, device and equipment for querying incidence relation of data record

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163195A (en) * 2010-02-22 2011-08-24 北京东方通科技股份有限公司 Query optimization method based on unified view of distributed heterogeneous database
CN105718593A (en) * 2016-01-28 2016-06-29 长春师范大学 Database query optimization method and system

Also Published As

Publication number Publication date
CN114036188A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN103064875B (en) A kind of spatial service data distributed enquiring method
Simitsis et al. State-space optimization of ETL workflows
US7343370B2 (en) Plan generation in database query optimizers through specification of plan patterns
Yang et al. A framework for designing materialized views in data warehousing environment
CN107169033A (en) Relation data enquiring and optimizing method with parallel framework is changed based on data pattern
Hölsch et al. An algebra and equivalences to transform graph patterns in neo4j
US20080040317A1 (en) Decomposed query conditions
US8554760B2 (en) System and method for optimizing queries
JP2005521954A (en) Method and apparatus for querying a relational database
JPH07319923A (en) Method and equipment for processing of parallel database of multiprocessor computer system
Evrendilek et al. Multidatabase query optimization
CN106777343A (en) increment distributed index system and method
CN112732752A (en) Query statement optimization method, device, equipment and storage medium
US6598044B1 (en) Method for choosing optimal query execution plan for multiple defined equivalent query expressions
CN114036188B (en) Method for optimizing and processing union in relational database management system
CN101719162A (en) Multi-version open geographic information service access method and system based on fragment pattern matching
CN116383247A (en) Large-scale graph data efficient query method
US6421657B1 (en) Method and system for determining the lowest cost permutation for joining relational database tables
CN112487015B (en) Distributed RDF system based on incremental repartitioning and query optimization method thereof
Yang et al. Tackling the challenges of materialized view design in data warehousing environment
CN118113723B (en) Multi-query shared ordering merge connection method, device, storage medium and program product
Ngu et al. Heterogeneous Query Optimization Using Maximal Sub-Queries
Li et al. Automatic generation of mediated schemas through reasoning over data dependencies
Lee et al. Minimization of resource consumption for multidatabase query optimization
Meng et al. The processing and improvement of multi-statement queries in Chiql

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant