CN110413651B

CN110413651B - Connection query method and device for relational database management system

Info

Publication number: CN110413651B
Application number: CN201910742186.0A
Authority: CN
Inventors: 江树浩; 鄢贵海; 黄彩虹
Original assignee: Yusur Technology Co ltd
Current assignee: Yusur Technology Co ltd
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2020-12-08
Anticipated expiration: 2039-08-13
Also published as: CN110413651A

Abstract

The invention provides a connection query method and a connection query device for a relational database management system. The method comprises the following steps: parsing, from the related structures of the left plan execution node and the right plan execution node respectively, the position information of the attributes participating in the connection and of the attributes to be returned in the outer table and in the inner table; according to that position information, pulling all tuple data of the attributes participating in the connection and all tuple data of the attributes to be returned from the left plan subtree node and the right plan subtree node, and storing the outer-table data in a first table and a second table and the inner-table data in a third table and a fourth table, respectively; matching and comparing the data in the first table with the data in the third table according to the connection condition of the outer table and the inner table by using a uniform function; and returning a connection query result according to the matching comparison result, the preset connection type, the second table and the fourth table. This scheme reduces overhead, accelerates the connection operation of the relational database management system, and improves system execution efficiency.

Description

Connection query method and device for relational database management system

Technical Field

The invention relates to the technical field of database management, in particular to a connection query method and a connection query device of a relational database management system (DBMS).

Background

With the advent of the digital society, massive data is accessed more and more frequently in various application scenarios, and the demand for accelerating database processing is growing.

In relational databases, the relational model is mainly used to organize data, which is stored in rows and columns. Complex relationships exist among multiple data tables; as the number of data tables increases, data management becomes more complex, and bottlenecks in data operations tend to occur in operations that span multiple tables. Among these, the join (connection) operation is a typical inter-table operation that retrieves, from two or more tables, the data in which a row of one table matches a row of another table. Connection operations can be further divided into various types, including the inner connection (INNER JOIN), outer connection (OUTER JOIN), left connection (LEFT JOIN), right connection (RIGHT JOIN), anti-connection (ANTI JOIN) and the like. The essential purpose of a join operation is to create a new table based on certain data matching relationships, but different types of join operations return different result tables. For example, an inner join returns only the data rows of the two tables that match successfully, while a left join returns the specified attributes for all rows of the outer table, and the inner-table portion is shown as empty for rows that fail to match.
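The difference between the inner join and the left join described above can be illustrated with a minimal Python sketch. The sample rows and the function names `inner_join` and `left_join` are hypothetical and are not taken from any DBMS; the empty inner-table portion is modeled as `None`.

```python
# Hypothetical sample rows; table and column names are illustrative only.
outer_rows = [("Ann", 1), ("Bob", 2), ("Cid", 3)]   # (name, sno)
inner_rows = [(1, "c01"), (1, "c02"), (3, "c03")]   # (sno, cno)


def inner_join(outer, inner):
    """Return only the (name, cno) pairs whose sno values match."""
    return [(name, cno)
            for name, o_sno in outer
            for i_sno, cno in inner
            if o_sno == i_sno]


def left_join(outer, inner):
    """Return every outer row; unmatched rows get None for the inner part."""
    result = []
    for name, o_sno in outer:
        matches = [cno for i_sno, cno in inner if i_sno == o_sno]
        if matches:
            result.extend((name, cno) for cno in matches)
        else:
            result.append((name, None))
    return result


inner_result = inner_join(outer_rows, inner_rows)
left_result = left_join(outer_rows, inner_rows)
```

In this sketch the row for "Bob" has no matching `sno` in the inner rows, so it is absent from `inner_result` but appears in `left_result` with an empty inner part.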

In many relational DBMSs today, there is a fixed query compilation and execution mechanism for join operations. However, these mechanisms have some drawbacks, such as difficulty in interfacing with external acceleration, and low efficiency of the overall join operation when no row index exists to assist lookup on the attributes to be joined.

Disclosure of Invention

The invention provides a connection query method and a connection query device for a relational database management system, which are used for accelerating the connection operation of the relational database management system and improving the execution efficiency of the system.

In order to achieve the purpose, the invention is realized by adopting the following scheme:

according to an aspect of the embodiments of the present invention, there is provided a connection query method for a relational database management system, including:

acquiring information of a left plan execution node and information of a right plan execution node of a relational database management system, wherein the information of the left plan execution node comprises attributes to be returned and attributes participating in connection in an outer table, and the information of the right plan execution node comprises attributes to be returned and attributes participating in connection in an inner table;

analyzing the position information of the attribute participating in the connection in the outer table and the position information of the attribute to be returned, and the position information of the attribute participating in the connection in the inner table and the position information of the attribute to be returned from the related structure of the left plan execution node and the related structure of the right plan execution node respectively;

according to the position information of the attributes participating in the connection and the position information of the attributes to be returned in the outer table, pulling all tuple data of the attributes to participate in the connection and all tuple data of the attributes to be returned in all tuple slots in the outer table from a subtree node in the left plan execution node, and respectively storing the tuple data in a first table and a second table;

according to the position information of the attributes participating in the connection and the position information of the attributes to be returned in the inner table, pulling all tuple data of the attributes to participate in the connection and all tuple data of the attributes to be returned in all tuple slots in the inner table from the subtree node in the right plan execution node, and respectively storing the tuple data in a third table and a fourth table;

matching and comparing the tuple data in the first table and the tuple data in the third table according to the connection conditions of the outer table and the inner table by using a uniform function, and storing a matching and comparing result in a fifth table;

and returning a connection query result according to the fifth table, the preset connection type, the second table and the fourth table.

According to another aspect of the embodiments of the present invention, there is provided a connection query apparatus for a relational database management system, including:

a node information obtaining unit, configured to obtain information of a left plan execution node and information of a right plan execution node of a relational database management system, where the information of the left plan execution node includes an attribute to be returned in an outer table and an attribute to participate in a connection, and the information of the right plan execution node includes an attribute to be returned in an inner table and an attribute to participate in a connection;

a position information acquiring unit, configured to analyze position information of the attribute participating in the connection in the outer table and position information of the attribute to be returned, and position information of the attribute participating in the connection in the inner table and position information of the attribute to be returned, from the correlation structure of the left plan execution node and the correlation structure of the right plan execution node, respectively;

a left plan sub-tree node data pulling unit, configured to pull, according to the position information of the attribute participating in the connection and the position information of the attribute to be returned in the outer table, all tuple data of the attribute to participate in the connection and all tuple data of the attribute to be returned in all tuple slots in the outer table from the sub-tree node in the left plan execution node, and store the data in the first table and the second table, respectively;

a right plan sub-tree node data pulling unit, configured to pull, from the sub-tree node in the right plan execution node, all tuple data of the attributes to participate in the connection and all tuple data of the attributes to be returned in all tuple slots in the inner table according to the location information of the attributes to participate in the connection and the location information of the attributes to be returned in the inner table, and store the tuple data in a third table and a fourth table, respectively;

a join operation unit for matching and comparing the tuple data in the first table and the tuple data in the third table according to the join conditions of the outer table and the inner table using a uniform function, and storing a matching and comparing result in a fifth table;

and the result output unit is used for returning a connection query result according to the fifth table, the preset connection type, the second table and the fourth table.

According to a further aspect of embodiments of the present invention, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method of the above embodiments.

According to the connection query method of the relational database management system, the connection query device of the relational database management system and the computer-readable storage medium, all tuple data of the attributes to be compared in the outer table and the inner table is taken out at one time, the tuple data is compared by using a uniform function, and all tuple serial numbers meeting the connection condition are returned at one time. This saves the overhead of acquiring plan node information and the overhead of function calls, and improves the execution efficiency of the connection operation of the DBMS.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:

FIG. 1 is a flow chart of a connection query method of a relational database management system according to an embodiment of the invention;

FIG. 2 is a flowchart illustrating a connection query method of a relational database management system according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a key data structure NestLoopState involved in the NestJoinLoop algorithm in the PostgreSQL kernel in an embodiment of the present invention;

FIG. 4 is a diagram illustrating the execution of a kernel function according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a connection query apparatus of a relational database management system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

The inventors found that a defect of the prior art is that, when no row index exists to assist lookup on the attributes to be connected, the DBMS looks up the tuple values of an attribute by sequential scan, and each step of the sequential scan fetches only one tuple for comparison. Acquiring the actual tuple data from an execution plan node of the database is relatively expensive, because the data has to be obtained through functions of the database kernel. When executed in the database kernel, this incurs high time and function call costs, and determining whether data rows match can reach quadratic time complexity. As a result, it is difficult to interface with an external acceleration scheme, and the overall connection operation is inefficient.
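The cost argument above can be made concrete with a small Python sketch that counts function calls in a tuple-at-a-time comparison. The function `compare_one` is a hypothetical stand-in for a per-tuple comparison call inside a database kernel; it is not an actual DBMS function.

```python
call_count = 0


def compare_one(outer_value, inner_value):
    """Hypothetical stand-in for a per-tuple comparison call in the kernel."""
    global call_count
    call_count += 1
    return outer_value == inner_value


def tuple_at_a_time_join(outer_keys, inner_keys):
    """Sequential-scan style: one function call per (outer, inner) pair."""
    matches = []
    for i, outer_value in enumerate(outer_keys):
        for j, inner_value in enumerate(inner_keys):
            if compare_one(outer_value, inner_value):
                matches.append((i, j))
    return matches


outer_keys = [1, 2, 3, 4]
inner_keys = [1, 1, 3]
pairs = tuple_at_a_time_join(outer_keys, inner_keys)
```

With 4 outer values and 3 inner values the sketch performs 4 × 3 = 12 calls, which is the quadratic growth in function calls that the paragraph above refers to.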

In order to solve the above defects in the prior art, the inventors propose a connection query method for a relational database management system based on the above findings by studying how to optimize a method of sequentially scanning data for each table.

The following describes a detailed embodiment of the connection query method of the relational database management system according to the present invention.

FIG. 1 is a flowchart illustrating a connection query method of a relational database management system according to an embodiment of the present invention. As shown in FIG. 1, the connection query method of the relational database management system according to some embodiments may include the following steps S110 to S160. The steps may optimize the query process of performing join operations in an existing relational database management system (DBMS).

Step S110: the method comprises the steps of obtaining information of a left plan execution node and information of a right plan execution node of a relational database management system, wherein the information of the left plan execution node comprises attributes to be returned and attributes participating in connection in an outer table, and the information of the right plan execution node comprises the attributes to be returned and the attributes participating in connection in an inner table.

When a query related to a join operation is executed in an existing relational database management system (e.g., PostgreSQL, MySQL, Oracle, etc.), related information of the left plan execution node and the right plan execution node, such as the corresponding table and the corresponding structure, may be acquired. The left plan execution node corresponds to the outer table, and the right plan execution node corresponds to the inner table. In the outer and inner tables, tuples and attributes refer to the rows and columns of the tables, respectively.

The attributes to be returned and the attributes participating in the connection may be identified by identification information of the attributes, such as a name, a code number, or a position index of the attributes; wherein, the attribute participating in the connection may refer to an attribute capable of associating an outer table and an inner table, for example, an attribute existing in both the inner table and the outer table; the attribute to be returned in the outer table and the attribute to be returned in the inner table may be an attribute that a user needs to acquire, or may be an attribute that needs to be compared or corresponded.

For example, suppose an outer table student comprises an attribute name and an attribute sno, and an inner table sc comprises an attribute sno and an attribute cno. If the outer table student and the inner table sc are connected, the attribute to be returned in the outer table student is name, the attribute to be returned in the inner table sc is cno, the attribute participating in the connection in both the outer table student and the inner table sc is sno, and the connection condition is that the attribute sno in the outer table student is equal to the attribute sno in the inner table sc.

Step S120: parsing, from the related structure of the left plan execution node and the related structure of the right plan execution node respectively, the position information of the attribute participating in the connection and the position information of the attribute to be returned in the outer table, and the position information of the attribute participating in the connection and the position information of the attribute to be returned in the inner table.

In other words, the position information of the attribute participating in the connection in the outer table and the position information of the attribute to be returned in the outer table are parsed from the related structure of the left plan execution node, and the position information of the attribute participating in the connection in the inner table and the position information of the attribute to be returned in the inner table are parsed from the related structure of the right plan execution node. The order in which the two parsing processes are executed is not limited.

For both the outer table and the inner table, the position information of the attributes participating in the connection and of the attributes to be returned may be represented by position indexes. Taking PostgreSQL as an example DBMS, the related structure of the left plan execution node and the related structure of the right plan execution node may both be the structure ps_ProjInfo. In the outer table student, the position information of the attribute student.sno participating in the connection is represented by an index position, e.g., 0, and the position information of the attribute student.name to be returned is represented by an index position, e.g., 1; in the inner table sc, the position information of the attribute sc.sno participating in the connection is represented by an index position, e.g., 0, and the position information of the attribute sc.cno to be returned is represented by an index position, e.g., 1.
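As a rough illustration of position indexes, the following Python sketch models a tuple slot as a plain list and reads attributes by index. The dictionaries `OUTER_POS` and `INNER_POS`, the helper `read_attr`, and the sample values are hypothetical; they do not reproduce the layout of ps_ProjInfo or any PostgreSQL structure.

```python
# Hypothetical position indexes, following the example values above:
# index 0 holds the attribute participating in the connection,
# index 1 holds the attribute to be returned.
OUTER_POS = {"join": 0, "ret": 1}   # outer table: sno, name
INNER_POS = {"join": 0, "ret": 1}   # inner table: sno, cno

# A tuple slot is modeled here as a plain list of attribute values.
outer_slot = [1001, "Ann"]
inner_slot = [1001, "c01"]


def read_attr(slot, positions, role):
    """Read the attribute with the given role ('join' or 'ret') by position."""
    return slot[positions[role]]


outer_join_value = read_attr(outer_slot, OUTER_POS, "join")
outer_return_value = read_attr(outer_slot, OUTER_POS, "ret")
inner_join_value = read_attr(inner_slot, INNER_POS, "join")
inner_return_value = read_attr(inner_slot, INNER_POS, "ret")
```

Once the positions are known, later steps only need the indexes and never have to look up attributes by name again.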

Before step S120, the method of each embodiment may further include a step of determining whether a row index scan is unavailable in both the left and right plan execution nodes (in which case a sequential scan is generally required). Specifically, whether a row index scan exists in the left plan execution node may be determined by checking whether an attribute to be returned in the outer table is a primary key, and whether a row index scan exists in the right plan execution node may be determined by checking whether an attribute to be returned in the inner table is a primary key.

In some embodiments, step S120 is executed only when neither the left nor the right plan execution node can perform a row index scan, so that the sequential scan that an existing DBMS would otherwise use is optimized and the efficiency of the connection operation is improved. Before step S120, the method of each embodiment may further include: S170, determining that no row index scan exists in the left plan execution node or the right plan execution node. In this embodiment, because the subsequent steps are executed only when a row index scan cannot be performed, the low connection efficiency caused by the sequential scan of existing DBMSs is addressed specifically.

In other embodiments, if at least one of the left and right plan execution nodes can perform a row index scan, the connection operation is still performed using the row index scan of the existing DBMS. Before step S120, the method of each embodiment may further include: S180, when it is determined that a row index scan exists in the left plan execution node and/or the right plan execution node, performing a row index scan according to the information of the left plan execution node and the information of the right plan execution node to obtain a connection query result. In this embodiment, whenever the connection operation can be performed by row index scan, the query is preferentially executed that way, so that the efficiency of the connection query is improved as much as possible.
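The choice between the two paths (S170 and S180) can be sketched as a simple dispatch. The function name `choose_join_path` and the returned labels are hypothetical, and the two boolean inputs are assumed to have been determined beforehand, for example by the primary key check mentioned above.

```python
def choose_join_path(left_has_index_scan, right_has_index_scan):
    """Pick the execution path for the connection query.

    Returns "row_index_scan" when at least one plan execution node can use a
    row index scan (the S180 case), otherwise "batch_compare" (the S170 case,
    after which steps S120 to S160 are executed).
    """
    if left_has_index_scan or right_has_index_scan:
        return "row_index_scan"
    return "batch_compare"


path_no_index = choose_join_path(False, False)
path_left_index = choose_join_path(True, False)
path_both_index = choose_join_path(True, True)
```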

Step S130: pulling all tuple data of the attributes to participate in the connection and all tuple data of the attributes to be returned in all tuple slots in the outer table from the subtree node in the left plan execution node according to the position information of the attributes to participate in the connection and the position information of the attributes to be returned in the outer table, and respectively storing the tuple data in a first table and a second table.

In step S130, tuple data in the tuple slots is pulled mainly from the left plan subtree node (the subtree node in the left plan execution node). All tuple data of the attributes participating in the connection, in all tuple slots of the outer table, can be pulled from the left plan subtree node according to the position information of those attributes in the outer table, and stored in a first table (for example, a table named dblib_input_output). Likewise, all tuple data of the attributes to be returned, in all tuple slots of the outer table, can be pulled from the left plan subtree node according to the position information of those attributes in the outer table, and stored in a second table (for example, a table named dblib_outer_output_data). The tuple slots from which the tuple data of the attributes participating in the connection is pulled correspond to the tuple slots from which the tuple data of the attributes to be returned is pulled.

The outer table can include a plurality of tuple slots. In a loop, the tuple data of the attributes participating in the connection can be pulled from each tuple slot of the outer table, followed by the tuple data of the attributes to be returned from the same tuple slot.

In this embodiment, all the tuple data in the outer table that needs to participate in the comparison, that is, the tuple data of the attributes participating in the connection in the outer table, can be obtained at one time. This facilitates comparing the tuple data of the inner table and the outer table by using a uniform function in the subsequent step S150.
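The loop that pulls the outer-table data can be sketched as follows. This is a minimal Python illustration that assumes tuple slots are already available as lists; the function name `pull_outer_table` is hypothetical, and a real implementation would fetch each slot from the subtree node through DBMS kernel functions instead.

```python
def pull_outer_table(tuple_slots, join_pos, return_pos):
    """Pull join-attribute and return-attribute data from every tuple slot.

    The two output lists are index-aligned: entry k of both lists comes
    from the same tuple slot k of the outer table.
    """
    first_table = []    # data of the attribute participating in the connection
    second_table = []   # data of the attribute to be returned
    for slot in tuple_slots:
        first_table.append(slot[join_pos])
        second_table.append(slot[return_pos])
    return first_table, second_table


# Hypothetical outer-table slots laid out as [sno, name].
outer_slots = [[1, "Ann"], [2, "Bob"], [3, "Cid"]]
first_table, second_table = pull_outer_table(outer_slots, join_pos=0, return_pos=1)
```

The same loop, applied to the tuple slots of the inner table, would yield the third and fourth tables of step S140.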

Step S140: and pulling all tuple data of the attributes to participate in the connection and all tuple data of the attributes to be returned in all tuple slots in the inner table from the subtree node in the right plan execution node according to the position information of the attributes to participate in the connection and the position information of the attributes to be returned in the inner table, and respectively storing the tuple data in a third table and a fourth table.

In step S140, tuple data in the tuple slots is pulled mainly from the right plan subtree node (the subtree node in the right plan execution node). All tuple data of the attributes participating in the connection, in all tuple slots of the inner table, can be pulled from the right plan subtree node according to the position information of those attributes in the inner table, and stored in a third table (for example, a table named dblib_input_inner). Likewise, all tuple data of the attributes to be returned, in all tuple slots of the inner table, can be pulled from the right plan subtree node according to the position information of those attributes in the inner table, and stored in a fourth table (for example, a table named dblib_inner_output_data). The tuple slots from which the tuple data of the attributes participating in the connection is pulled correspond to the tuple slots from which the tuple data of the attributes to be returned is pulled.

The inner table may include a plurality of tuple slots. In a loop, the tuple data of the attributes participating in the connection may be pulled from each tuple slot of the inner table, followed by the tuple data of the attributes to be returned from the same tuple slot.

In this embodiment, all the tuple data in the inner table that needs to participate in the comparison, that is, the tuple data of the attributes participating in the connection in the inner table, can be obtained at one time. This facilitates comparing the tuple data of the inner table and the outer table by using a uniform function in the subsequent step S150.

Step S150: matching and comparing the tuple data in the first table and the tuple data in the third table according to the connection condition of the outer table and the inner table by using a uniform function, and storing the matching comparison result in a fifth table.

In this case, the join condition of the outer table and the inner table may be directly obtained when a query related to a join operation is prepared or started to be executed in an existing DBMS, for example, one or more of the attributes existing in both the outer table and the inner table may be used as attributes participating in the join, and the attribute participating in the join in the outer table and the attribute participating in the join in the inner table may be equal to each other as the join condition of the outer table and the inner table. The matching result may include information of tuple pair indexes formed by tuple data in the first table and tuple data in the third table, which are successfully matched.

In step S150, all the required tuple data of the attributes participating in the connection in the outer table has already been pulled and stored in the first table, and all the required tuple data of the attributes participating in the connection in the inner table has already been pulled and stored in the third table. In addition, a uniform function (for example, a uniform kernel function) is used for the matching comparison, so the comparison of all tuple data in the first table and the third table can be completed with a single function call, instead of one function call per comparison. The scheme therefore reduces the function call overhead and the time it consumes, which addresses the low execution efficiency of connection operations in existing DBMSs.

In some embodiments, a kernel function may be utilized for match comparison. Illustratively, the step S150, more specifically, may include the steps of: and S151, inputting the tuple data in the first table and the tuple data in the third table into a kernel function, and comparing the tuple data in the first table and the tuple data in the third table in a one-to-one matching manner according to the connection conditions of the outer table and the inner table to obtain tuple pair index information corresponding to the tuple data pairs meeting the connection conditions, and storing the tuple pair index information in a fifth table as a matching comparison result.

The kernel function may be a kernel function added to the DBMS source code by the method, and may call a kernel for comparison calculation. In this embodiment, a kernel function may be called once, and then the kernel function compares the tuple data in the first table and the tuple data in the third table in sequence, and the comparison process is performed circularly, specifically, for example, one tuple data in the first table is read and compared with each tuple data in the third table once, and then the next tuple data in the first table is read and compared with each tuple data in the third table once. Therefore, all data can be compared by calling the kernel function once, and the expense of continuously calling the comparison function of the DBMS is greatly reduced.
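The single-call comparison described above can be sketched in Python as follows. The function name `join_kernel` is hypothetical, and the sketch assumes an equality connection condition; it illustrates the idea of one call that loops over all pairs, not the actual function added to any DBMS source code.

```python
def join_kernel(first_table, third_table):
    """Compare every outer join value with every inner join value in one call.

    Returns the fifth table: a list of (outer_index, inner_index) pairs for
    the tuples that satisfy the (assumed) equality connection condition.
    """
    fifth_table = []
    for outer_idx, outer_val in enumerate(first_table):
        # One outer value is compared once with each inner value, then the
        # next outer value is read, as described in the text.
        for inner_idx, inner_val in enumerate(third_table):
            if outer_val == inner_val:
                fifth_table.append((outer_idx, inner_idx))
    return fifth_table


first_table = [1, 2, 3]   # join-attribute data of the outer table
third_table = [1, 1, 3]   # join-attribute data of the inner table
fifth_table = join_kernel(first_table, third_table)
```

The number of comparisons is unchanged, but they all happen inside one function invocation, so the per-comparison call overhead is paid only once.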

In other embodiments, acceleration of the match comparison may be performed using external software or hardware. Illustratively, the step S151 or the step S150 may further include: s1511, the processes of matching and comparing the tuple data in the first table and the tuple data in the third table according to the connection conditions of the outer table and the inner table are integrated into a kernel function interface, so as to execute kernel function matching and comparison by using external software or hardware acceleration, and obtain tuple pair index information corresponding to the tuple data pairs meeting the connection conditions.

The external software may be a parallel execution algorithm, and the hardware may be a plurality of CPUs. With such external software or hardware, for example multi-CPU multi-threading or parallel computing on a single CPU, the comparisons of multiple tuple pairs can be processed simultaneously. The integrated kernel function interface mainly accommodates the port conditions and data input modes that the external software or hardware expects for its input data when performing the comparison calculation. Through the integrated interface, the required data, such as the tuple data in the first table and the tuple data in the third table, together with the connection condition, is input into the external software or hardware, and the interface can also receive the calculation result output by the external software or hardware.

In this embodiment, by integrating the kernel function interface, the kernel function can be utilized to execute the comparison calculation process in a faster calculation manner (for example, parallel execution), so as to further improve the comparison efficiency.
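One possible way to picture the parallel variant is to split the first table into chunks and match each chunk against the third table in a separate worker. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` purely as an illustration; the function names `match_chunk` and `parallel_join_kernel`, the chunking strategy, and the equality condition are assumptions, not the interface described in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def match_chunk(start, outer_chunk, third_table):
    """Match one chunk of outer join values against all inner join values."""
    return [(start + i, j)
            for i, outer_val in enumerate(outer_chunk)
            for j, inner_val in enumerate(third_table)
            if outer_val == inner_val]


def parallel_join_kernel(first_table, third_table, workers=2):
    """Split the first table into chunks and match the chunks concurrently."""
    chunk = max(1, -(-len(first_table) // workers))   # ceiling division
    starts = range(0, len(first_table), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda s: match_chunk(s, first_table[s:s + chunk], third_table),
            starts,
        )
        # pool.map yields results in input order, so pairs stay sorted.
        return [pair for part in parts for pair in part]


first_table = [1, 2, 3, 4]
third_table = [1, 1, 3]
fifth_table = parallel_join_kernel(first_table, third_table, workers=2)
```

Threads are used here only to keep the sketch short; in CPython they do not give true CPU parallelism for this kind of loop, so a real accelerator would more likely be separate processes, multiple CPUs, or dedicated hardware behind the same interface.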

Step S160: and returning a connection query result according to the fifth table, the preset connection type, the second table and the fourth table.

The fifth table stores the matching comparison result, and specifically may include index information, such as tuple numbers, of one or more tuple pairs. Each tuple pair corresponds to two pieces of tuple data: the tuple data of the attribute participating in the connection in the outer table, and the tuple data of the attribute participating in the connection in the inner table that matches it. In other words, for data that satisfies the connection condition, the fifth table stores information on the tuple in the first table and information on the tuple in the third table in which that data is located.

The connection types may include inner connections, left connections, right connections, anti-connections, etc., and may differ from DBMS to DBMS or from connection algorithm to connection algorithm. In some embodiments, in a case where the fifth table stores the tuple pair information corresponding to the tuple data pairs meeting the connection condition, for example, in an embodiment to which step S151 is applied, step S160 may more specifically include the steps of: S161, when the connection type is an inner connection, obtaining the corresponding tuple data from the second table and the fourth table according to the tuple pair index information in the fifth table corresponding to the tuple data pairs meeting the connection condition, and returning the obtained tuple data; and S162, when the connection type is a left connection, a right connection or an anti-connection, supplementing the fifth table with information on the outer-table tuples that do not meet the connection condition and fail to match, setting a matching failure identifier at the corresponding position used for storing the inner-table tuple information so as to update the fifth table, and then obtaining the tuple data to be returned from the second table and the fourth table according to the updated fifth table to generate the connection query result to be returned.

Steps S161 and S162 are not performed in sequence. They are alternatives defined for different connection types, and the manner of returning the query result may differ accordingly.

For example, in step S161, an inner connection must return the tuple data of the attributes to be returned for the inner table and outer table tuples that matched successfully. Since the fifth table already contains the information on this data, the tuple data of the attributes to be returned can be read from the second table and the fourth table directly according to the index information in the fifth table.

For another example, in step S162, a left connection or a right connection must return the tuple data of the attributes to be returned for all tuples in the outer table, together with the tuple data of the attributes to be returned in the corresponding inner table. For data that matched successfully, the inner table tuple data to be returned is the original tuple data; for data pairs that failed to match, specific data (indicating that matching failed) or a null value may be returned. For a reverse connection, the tuple data of the attributes to be returned must be returned for the outer table tuples that failed to match. In short, for the left connection, the right connection, and the reverse connection, the current fifth table, which contains only the tuple pair information meeting the condition, may not satisfy the return requirement, so the fifth table may be updated through step S162 to complete the required data. Specifically, information on the outer table tuples that failed to match is supplemented into the fifth table, and the inner table information corresponding to each supplemented outer table entry is set to a matching failure identifier. For example, the unified identifier may be -1: if the inner table information in an entry of the fifth table is -1, the entry is recognized as a matching failure.

In a further embodiment, in step S162, obtaining the tuple data to be returned from the second table and the fourth table according to the updated fifth table to generate the connection query result for returning may more specifically include the steps of: S1621, when the connection type is left connection or right connection, obtaining all tuple data from the second table according to the updated fifth table, correspondingly obtaining the tuple data meeting the connection condition from the fourth table, returning a null value for the attribute to be returned for each tuple in the second table that was not successfully matched, and generating the connection query result for returning; and S1622, when the connection type is reverse connection, obtaining from the second table, according to the updated fifth table, the outer table tuple data that failed to match because it does not meet the connection condition, and generating the connection query result for returning.

In step S1621, the updated fifth table includes information on all the attribute data that the outer table needs to return and, correspondingly, either information on the successfully matched tuple data in the inner table or a matching failure identifier. For a left connection or a right connection, all the tuple data to be returned from the outer table can therefore be returned together with the corresponding tuple data from the inner table, or with a null value where the matching failure identifier is present. In step S1622, the updated fifth table includes information on all the tuple data that failed to match, so the tuple data of the attributes to be returned for the tuples that failed to match can be returned in sequence.
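Steps S161, S162, S1621, and S1622 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the string labels for connection types, and the sample rows are assumptions, and for a right connection the sketch assumes the table whose rows must all be returned has already been placed in the outer role.

```python
NO_MATCH = -1  # matching failure identifier, as in the -1 example in the text


def update_fifth_table(pairs, outer_count, join_type):
    """Return the fifth table adjusted for the connection type (S161/S162)."""
    if join_type == "inner":
        return list(pairs)
    matched = {outer_idx for outer_idx, _ in pairs}
    unmatched = [(i, NO_MATCH) for i in range(outer_count) if i not in matched]
    if join_type in ("left", "right"):
        return list(pairs) + unmatched   # keep matches, add failed outer tuples
    if join_type == "anti":
        return unmatched                 # only outer tuples that failed to match
    raise ValueError(f"unsupported connection type: {join_type}")


def build_result(fifth_table, second_table, fourth_table, join_type, inner_width):
    """Read the attributes to be returned by index (S1621/S1622)."""
    rows = []
    for outer_idx, inner_idx in fifth_table:
        outer_row = second_table[outer_idx]
        if join_type == "anti":
            rows.append(outer_row)
        elif inner_idx == NO_MATCH:
            rows.append(outer_row + (None,) * inner_width)  # null for inner side
        else:
            rows.append(outer_row + fourth_table[inner_idx])
    return rows


# Illustrative data: attributes to be returned from the outer and inner tables.
second_table = [("Ann",), ("Bob",), ("Cy",)]
fourth_table = [("c2",), ("c1",), ("c3",)]
pairs = [(0, 1), (0, 2), (1, 0)]  # matches produced by the unified function

left_index = update_fifth_table(pairs, len(second_table), "left")
left_rows = build_result(left_index, second_table, fourth_table, "left", 1)
anti_index = update_fifth_table(pairs, len(second_table), "anti")
anti_rows = build_result(anti_index, second_table, fourth_table, "anti", 1)
```

Keeping the connection-type handling in the index table means the comparison step itself is identical for every connection type.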

In other embodiments, if a matching comparison result is already stored in the fifth table, step S160 may be performed directly to output the connection query result. For example, before step S120, the method shown in fig. 1 may further include the step of: S190, determining that no matching comparison result exists for the tuple data in the first table and the tuple data in the third table. In this embodiment, determining that no matching comparison result exists indicates that the main function of the connection algorithm has not yet been executed for the current connection operation, so steps S120 to S160 need to be executed to perform the connection query using the method of the embodiments of the present invention.

In the embodiments, all tuple data of the attributes to be compared in the outer table and the inner table is taken out at one time, a unified function is used to compare the tuple data, and all tuple sequence numbers meeting the connection condition are returned at one time. This saves the overhead of acquiring plan node information and the overhead of function calls, and improves the efficiency of the connection operation. In addition, because the form of the data interface is simplified, the function computation can be accelerated by interfacing with a software or hardware acceleration scheme, which speeds up the connection operation and improves execution efficiency.

Some embodiments aim to optimize the query execution process for connection operations in an existing DBMS. To address the low efficiency caused by pulling tuple data through sequential scanning and comparing it tuple by tuple, the method improves on the existing scheme and can support optimization and acceleration of inner connection, left connection, right connection, reverse connection, and similar processes. In these embodiments, an acceleration scheme for database connection operations is provided, which may include the following steps S1 to S9.

S1, judging whether optimization acceleration is needed in the specific situation. Specifically, when neither the left nor the right execution plan node uses row index scanning, step S2 may be entered to perform the improved connection process; otherwise the original connection process may be performed.

S2, judging whether the main function of the connection algorithm is executed for the first time, if so, executing the step S3, otherwise, executing the step S9.

S3, parsing, from the structures related to the current execution plan node, the information on the attributes to be returned in each of the outer table and the inner table, and the information on the attributes participating in the connection in each of the outer table and the inner table.

S4, cyclically pulling tuple slots from the left plan subtree node. Specifically, according to the information on the attributes participating in the connection obtained in step S3, all tuple data of those attributes is taken out of each tuple slot as input to the subsequent kernel function; at the same time, all tuple data of the attributes to be returned is taken out of the tuple slots and may be stored in the dblib_outer_output_data table. This continues until all tuple slots in the outer table have been pulled, after which step S5 is executed.

S5, similarly to step S4, cyclically pulling tuple slots from the right plan subtree node. Specifically, according to the information on the attributes participating in the connection obtained in step S3, all tuple data of those attributes is taken out of each tuple slot as input to the subsequent kernel function; at the same time, all tuple data of the attributes to be returned is taken out of the tuple slots and stored in the dblib_inner_output_data table. This continues until all tuple slots in the inner table have been pulled, after which step S6 is executed.

S6, comparing, one by one and according to the connection condition, all the tuple data of the attributes participating in the connection in the outer table and the inner table obtained in steps S4 and S5, recording the sequence numbers of the tuple pairs that meet the connection condition, and storing them in the dblib_output_qual_index table as the basis for connecting the two tables. Alternatively, the comparison process may be gathered into a kernel function interface that interfaces with a software or hardware acceleration scheme, so that the computation performed in the kernel function is accelerated and the whole connection operation is sped up further.

S7, a result slot structure is generated that is suitable for return to the database output.

S8, processing the sequence of successfully matched tuple pairs returned in step S6 according to the connection type. For a left or right connection, all entries in the outer table are returned. Outer tuple sequence numbers that are absent from the dblib_output_qual_index table because matching failed are therefore added to the table, with the corresponding inner tuple sequence number set to -1 to indicate that no inner table value matches the entry; the value at the corresponding position is returned as null during the subsequent data pulling. For a reverse connection, only the outer tuples with no matching inner table tuple are returned. The dblib_output_qual_index table is therefore updated to store the outer tuple sequence numbers that were absent from the original table because matching failed, and the corresponding inner tuple sequence number may be set to -1.

S9, according to the latest dblib_output_qual_index table, taking out, from the data to be returned obtained in steps S4 and S5, the data rows that satisfy the connection condition and the connection type. If a data sequence number is -1, the corresponding value does not exist and an empty slot may be used for the return; otherwise, the data is assigned to the result slot according to its specific position in the result slot. One result slot is returned at a time until all the data recorded in the dblib_output_qual_index table has been returned.
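The control flow of steps S2 and S9 (run the batch work once on the first call, then hand back one result slot per call) can be sketched as below. The class `JoinState`, the `prepare` callback, and the sample data are hypothetical names chosen for this illustration and do not correspond to any DBMS API.

```python
NO_MATCH = -1  # inner tuple sequence number used when matching failed


class JoinState:
    """Keeps the batch results between calls, like an executor node state."""

    def __init__(self, prepare):
        # prepare() stands in for steps S3 to S8 and is assumed to return
        # (qual_index, outer_output_data, inner_output_data).
        self._prepare = prepare
        self._prepared = None   # None means the main function has not run yet
        self._cursor = 0

    def next_slot(self):
        """Return one result slot per call, or None when exhausted (S9)."""
        if self._prepared is None:          # S2: first execution only
            self._prepared = self._prepare()
        qual_index, outer_out, inner_out = self._prepared
        if self._cursor >= len(qual_index):
            return None
        outer_idx, inner_idx = qual_index[self._cursor]
        self._cursor += 1
        inner_part = None if inner_idx == NO_MATCH else inner_out[inner_idx]
        return (outer_out[outer_idx], inner_part)


calls = []


def prepare():
    calls.append("batch")                   # records how often the batch ran
    qual_index = [(0, 1), (1, NO_MATCH)]    # illustrative, already updated (S8)
    return qual_index, [("Ann",), ("Bob",)], [("c1",), ("c2",)]


state = JoinState(prepare)
slots = []
slot = state.next_slot()
while slot is not None:
    slots.append(slot)
    slot = state.next_slot()
```

However many slots are requested, the batch preparation in this sketch runs only once, which is the saving the steps above describe.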

It should be noted that the improvements of the embodiments are applicable to a variety of connection types including inner connection, left connection, right connection, reverse connection, etc. Currently, the comparison of two attributes and the return of multiple attributes can be supported during connection. The method can support the situation when the connecting node is a leaf node, and can also support the connecting operation of non-leaf nodes.

In the embodiments, during early data preparation, the tuple data corresponding to the attributes is not extracted immediately. Instead, the position information of the relevant attributes in the actual storage structure is recorded first, the data is then extracted in batches according to that position information, unified comparison processing is performed, and the sequence numbers of the tuples meeting the condition are returned. The unified comparison processing can be further accelerated by software or hardware methods, for example by using multiple CPUs (central processing units) with multithreading, or parallel computation on a single CPU, to process the comparison of multiple tuple pairs simultaneously. Besides saving the overhead of fetching data, the data involved in the comparison is only the tuple data of a given attribute rather than the data of the whole slot. This simplified data form makes it easier to interface with a software or hardware acceleration scheme. It should be noted that, since hardware can directly process only simple data types, when hardware such as an FPGA (field-programmable gate array) is introduced for accelerated computation, the related data structures need to be converted into a relatively simple structure such as a char array. In summary, without requiring an additional row index to assist the query, the embodiments optimize the execution function logic in the query execution stage of an existing DBMS and provide an interface function that can interface with an external accelerated computation scheme, replacing the comparison process performed in the original DBMS kernel during connection operations. This further improves the execution efficiency of database connection operations and is of significance for increasingly large database processing requirements.
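One way to picture the multithreaded comparison is to split the outer column into chunks and compare each chunk against the inner column in a separate worker. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function names and the chunking policy are assumptions for illustration, and in CPython the threads mainly demonstrate the partitioning rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_chunk(start, outer_chunk, inner_values):
    """Compare one chunk of outer values against all inner values."""
    return [(start + i, j)
            for i, outer_val in enumerate(outer_chunk)
            for j, inner_val in enumerate(inner_values)
            if outer_val == inner_val]


def parallel_match(outer_values, inner_values, workers=4):
    """Return (outer_index, inner_index) pairs for an equality condition."""
    if not outer_values:
        return []
    chunk = max(1, -(-len(outer_values) // workers))   # ceiling division
    starts = range(0, len(outer_values), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so pairs stay sorted
        # by outer tuple sequence number.
        parts = pool.map(
            lambda s: compare_chunk(s, outer_values[s:s + chunk], inner_values),
            starts,
        )
        return [pair for part in parts for pair in part]


pairs = parallel_match([1, 2, 3, 4, 5], [5, 1, 3], workers=2)
```

Because each worker receives only a flat list of attribute values, the same partitioning could in principle be handed to another accelerator in place of the thread pool.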

In order that those skilled in the art will better understand the present invention, embodiments of the present invention will be described below with reference to specific examples.

The proposed optimization method is described in further detail below, taking PostgreSQL as an example, with reference to the drawings and a connection operation example. The specific example is an inner connection of two simple tables, student and sc, in PostgreSQL. The attributes of table student are name and sno, and the attributes of table sc are sno and cno. The two tables are joined by an inner connection on the condition that sno is equal. The attributes to be returned are name in table student and cno in table sc. The present embodiment is described below with reference to the accompanying drawings.

Fig. 2 is a schematic flow chart of a connection query method of a relational database management system according to an embodiment of the present invention, fig. 3 is a schematic diagram of a key data structure NestLoopState related to a NestJoinLoop algorithm in a PostgreSQL kernel according to an embodiment of the present invention, and referring to fig. 2 and fig. 3, a specific flow of a function-based docking acceleration scheme includes the following processes:

(1) First, whether the acceleration condition is satisfied is determined. Specifically, since neither sno of table student nor sno of table sc is a primary key, no row index exists, and the subsequent processing flow is adopted.

(2) PostgreSQL uses the NestJoinLoop algorithm to execute the connection operation, so it is determined whether the main function of the NestJoinLoop algorithm is being executed for the first time; if so, the following steps are executed in sequence.

(3) The structure joinqual (a pointer to a List structure, which contains the head address, the tail address, and the length of a linked list) is parsed to obtain the following: the attribute participating in the connection in the outer table student is sno, and its position index in the outer table is 0; the attribute participating in the connection in the inner table sc is sno, and its position index in the inner table is 1; and the connection condition is equality.

(4) The attribute to be returned, student.name, is at position 1 in the result slot and position 1 in the outer table slot; the attribute to be returned, sc.cno, is at position 0 in the result slot and position 0 in the inner table slot. In addition, when PostgreSQL pulls data from a data table, it uses the position of an attribute as a reference and can pull the data of that attribute together with all attributes at smaller positions. Therefore, the reference position can be determined when ps_ProjInfo is parsed, and it may be chosen as the maximum position over all execution steps to ensure that the subsequently pulled data is complete.

(5) Tuple slots are pulled cyclically from the left plan subtree node outerPlan. All tuple data of the attribute participating in the connection is taken from each tuple slot and stored in the dblib_input_output table as input to the later kernel function. Then all tuple data of the attributes to be returned is taken from the tuple slots and stored in the dblib_outer_output_dates table. This continues until all tuples in the outer table have been pulled.

(6) Tuple slots are pulled cyclically from the right plan subtree node innerPlan. All tuple data of the attribute participating in the connection is taken from each tuple slot and stored in the dblib_input_inner table as input to the later kernel function. All tuple data of the attributes to be returned is taken from the tuple slots and stored in the dblib_inner_output_data table. This continues until all tuples in the inner table have been pulled.

(7) The batch of data to be compared obtained in the preceding steps is processed uniformly by a kernel function. This unified processing can be accelerated by software or hardware means, such as FPGA acceleration or multithreading. Specifically, multithreading or parallel computation can be used to compare multiple tuple pairs at the same time and judge whether each pair meets the connection condition; if the condition is matched, the tuple sequence numbers of the pair are recorded in the dblib_output_qual_index table. The detailed execution of the kernel function is shown in fig. 4. Since this example describes an inner connection, after all comparisons are completed the record dblib_output_qual_index of the sequence numbers of the matching tuple pairs is returned without further processing. For other connection types, the dblib_output_qual_index table needs to be updated differently according to the type. For example, for a left connection, all data in the outer table must be returned whether or not matching succeeded, so the outer table tuple sequence numbers that were not recorded because matching failed need to be supplemented in dblib_output_qual_index, and the corresponding inner table tuple sequence number can be marked as -1, which makes it easy to recognize that no corresponding data exists and to return a null tuple when the data is subsequently returned. For a reverse connection, dblib_output_qual_index needs to be updated to contain the outer tuple sequence numbers of all matching failures, and the corresponding inner tuple sequence number can also be marked as -1.

(8) After the rescan operation on the left and right plan subtree nodes is completed, the current node is projected to obtain the structure dblib_result_tupleslot, which is suitable for returning the final output result.

(9) According to the sequence number information in the dblib_output_qual_index table, the qualifying data to be returned is taken out of the output_dates table and assigned to dblib_result_tuple for output. Only one tupleslot is returned at a time. Since this example is an inner connection, only the data that matched successfully in both tables is returned. Other connection types return different results.
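The student and sc example in steps (3) to (9) can be traced with a small Python sketch. The sample rows, constant names, and helper functions are hypothetical; only the attribute positions (sno at 0 and name at 1 in the outer slot, cno at 0 and sno at 1 in the inner slot, cno at 0 and name at 1 in the result slot) are taken from the example above.

```python
# Hypothetical sample rows laid out by the slot positions given in the example.
student = [(1, "Li"), (2, "Wang"), (3, "Zhao")]   # (sno, name)
sc = [("c1", 1), ("c2", 1), ("c3", 3)]            # (cno, sno)

OUTER_JOIN_POS, OUTER_RETURN_POS = 0, 1   # student.sno, student.name
INNER_JOIN_POS, INNER_RETURN_POS = 1, 0   # sc.sno, sc.cno
RESULT_CNO_POS, RESULT_NAME_POS = 0, 1    # positions in the result slot


def pull_columns(rows, join_pos, return_pos):
    """Steps (5)/(6): pull the join column and the return column in one pass."""
    ref = max(join_pos, return_pos)   # reference position: pull attributes 0..ref
    join_col, return_col = [], []
    for row in rows:
        prefix = row[:ref + 1]
        join_col.append(prefix[join_pos])
        return_col.append(prefix[return_pos])
    return join_col, return_col


outer_join, outer_out = pull_columns(student, OUTER_JOIN_POS, OUTER_RETURN_POS)
inner_join, inner_out = pull_columns(sc, INNER_JOIN_POS, INNER_RETURN_POS)

# Step (7): unified comparison on the equality condition.
qual_index = [(i, j)
              for i, outer_val in enumerate(outer_join)
              for j, inner_val in enumerate(inner_join)
              if outer_val == inner_val]


def result_slots(index, outer_values, inner_values):
    """Step (9): fill one result slot per matching pair and yield it."""
    for outer_idx, inner_idx in index:
        slot = [None, None]
        slot[RESULT_CNO_POS] = inner_values[inner_idx]
        slot[RESULT_NAME_POS] = outer_values[outer_idx]
        yield tuple(slot)


results = list(result_slots(qual_index, outer_out, inner_out))
```

The generator mirrors the one-slot-at-a-time return; an inner connection needs no update of the index table before this step.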

In this embodiment, all tuple data of the attributes to be compared in the outer table and the inner table is taken out at one time, a unified kernel function is used to compare the tuple data, and all tuple sequence numbers meeting the connection condition are returned at one time. The method saves the cost of acquiring plan node information and the cost of function calls, and accelerates the function computation by simplifying the data interface form and interfacing with a software or hardware acceleration scheme, thereby speeding up the connection operation and improving execution efficiency.

Based on the same inventive concept as the connection query method of the relational database management system shown in fig. 1, the embodiment of the present invention further provides a connection query system of the relational database management system, as described in the following embodiments. Because the principle by which the connection query system solves the problem is similar to that of the connection query method, the implementation of the system can refer to the implementation of the method, and repeated description is omitted.

Fig. 5 is a schematic structural diagram of a connection query apparatus of a relational database management system according to an embodiment of the present invention. As shown in fig. 5, the connection query apparatus of the relational database management system according to some embodiments may include:

a node information obtaining unit 210, configured to obtain information of a left-plan execution node and information of a right-plan execution node of a relational database management system, where the information of the left-plan execution node includes an attribute to be returned and an attribute participating in a connection in an outer table, and the information of the right-plan execution node includes an attribute to be returned and an attribute participating in a connection in an inner table;

a position information obtaining unit 220, configured to analyze, from the structural body related to the left plan execution node and the structural body related to the right plan execution node, position information of an attribute participating in connection in the outer table and position information of an attribute to be returned, and position information of an attribute participating in connection in the inner table and position information of an attribute to be returned, respectively;

a left plan sub-tree node data pulling unit 230, configured to pull, according to the position information of the attribute participating in the connection in the outer table and the position information of the attribute to be returned, all tuple data of the attribute to participate in the connection and all tuple data of the attribute to be returned in all tuple slots in the outer table from the sub-tree node in the left plan execution node, and store the tuple data in the first table and the second table, respectively;

a right plan sub-tree node data pulling unit 240, configured to pull, from the sub-tree node in the right plan execution node, all tuple data of the attributes to participate in the connection and all tuple data of the attributes to be returned in all tuple slots in the inner table according to the location information of the attributes to participate in the connection and the location information of the attributes to be returned in the inner table, and store the tuple data in a third table and a fourth table, respectively;

a join operation unit 250 for matching and comparing the tuple data in the first table and the tuple data in the third table according to the join conditions of the outer table and the inner table using a uniform function, and storing a matching and comparing result in a fifth table;

and a result output unit 260, configured to return a connection query result according to the fifth table, a predetermined connection type, the second table, and the fourth table.

In some embodiments, the connection querying device of the relational database management system shown in fig. 5 may further include an applicable condition judging unit, configured to determine that no row index scanning mode exists in the left plan execution node and the right plan execution node. The applicable condition judging unit is connected between the node information obtaining unit 210 and the position information obtaining unit 220.

In some embodiments, the connection operation unit 250 may include a connection operation module, configured to input the tuple data in the first table and the tuple data in the third table into a kernel function, compare the tuple data in the first table and the tuple data in the third table in a one-to-one matching manner according to the connection conditions of the outer table and the inner table to obtain tuple pair index information corresponding to the tuple data pairs meeting the connection conditions, and store the tuple pair index information in a fifth table as the matching comparison result.

In some embodiments, the connection operation module may include a kernel function interface generating module, configured to gather the process of matching and comparing the tuple data in the first table and the tuple data in the third table according to the connection conditions of the outer table and the inner table into a kernel function interface, so that the kernel function for matching and comparing is executed with external software or hardware acceleration to obtain the tuple pair index information corresponding to the tuple data pairs meeting the connection conditions.

In some embodiments, the result output unit 260 may include a connection query result output module, configured to: when the connection type is inner connection, obtain the corresponding tuple data from the second table and the fourth table according to the tuple pair index information in the fifth table corresponding to the tuple data pairs meeting the connection condition, and return the obtained tuple data; and when the connection type is left connection, right connection, or reverse connection, supplement the fifth table with information on the outer tuples that failed to match because they do not meet the connection condition, set a matching failure identifier at the corresponding position for storing inner tuple information so as to update the fifth table, and then obtain the tuple data to be returned from the second table and the fourth table according to the updated fifth table to generate the connection query result for returning.

In some embodiments, the connection query result output module may include a connection query result generation module, configured to: when the connection type is left connection or right connection, obtain all tuple data from the second table according to the updated fifth table, correspondingly obtain the tuple data meeting the connection condition from the fourth table, return a null value for the attribute to be returned for each tuple in the second table that was not successfully matched, and generate the connection query result for returning; and when the connection type is reverse connection, obtain from the second table, according to the updated fifth table, the outer table tuple data that failed to match because it does not meet the connection condition, and generate the connection query result for returning.

In some embodiments, the connection querying device of the relational database management system shown in fig. 5 may further include: a connection operation execution condition judgment unit, configured to determine that there is no matching comparison result of the tuple data in the first table and the tuple data in the third table. Wherein, the connection operation execution condition judgment unit may be connected with the position information acquisition unit 220.

In some embodiments, the connection querying device of the relational database management system shown in fig. 5 may further include a row index scanning matching unit, configured to scan the row indexes according to the information of the left plan execution node and the information of the right plan execution node when it is determined that the left plan execution node and/or the right plan execution node has a row index scanning mode, so as to obtain the connection query result. The row index scanning matching unit may be connected to the position information obtaining unit 220.

Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method described in the above embodiments.

In summary, according to the connection query method of the relational database management system, the connection query device of the relational database management system, and the computer-readable storage medium of the embodiments of the present invention, all tuple data of attributes to be compared in the outer table and the inner table are taken out at one time, and then a unified function is used to compare the tuple data, and all tuple numbers meeting the connection condition are returned at one time, so that the overhead of acquiring the plan node information and the overhead of function call can be saved, and the execution efficiency of the connection operation can be improved.

In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the various embodiments is provided to schematically illustrate the practice of the invention, and the sequence of steps is not limited and can be suitably adjusted as desired.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above embodiments further illustrate the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention shall fall within the scope of the present invention.

Claims

1. A connection query method for a relational database management system, characterized by comprising the following steps:

matching and comparing the tuple data in the first table and the tuple data in the third table according to the connection conditions of the outer table and the inner table by using a uniform function, and storing a matching and comparing result in a fifth table; the uniform function is a kernel function;

2. The relational database management system connection query method according to claim 1, wherein before analyzing, from the related structure of the left plan execution node and the related structure of the right plan execution node respectively, the position information of the attribute participating in the connection and of the attribute to be returned in the outer table, and the position information of the attribute participating in the connection and of the attribute to be returned in the inner table, the method further comprises:

and determining that no row index scanning mode exists in the left plan execution node and the right plan execution node.

3. The relational database management system join query method according to claim 1, wherein matching and comparing the tuple data in the first table and the tuple data in the third table according to the join conditions of the outer table and the inner table using a uniform function, and storing the matching and comparing result in a fifth table, comprises:

inputting the tuple data in the first table and the tuple data in the third table into a kernel function, and comparing the tuple data in the first table and the tuple data in the third table in a one-to-one matching manner according to the connection condition of the outer table and the inner table to obtain tuple pair index information corresponding to the tuple data pair meeting the connection condition, and storing the tuple pair index information in a fifth table as a matching comparison result.
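The matching step of claim 3 can be illustrated with a minimal sketch. It assumes an equality join condition and models the first and third tables as plain Python lists of join-attribute values. The names `match_kernel`, `first_table`, and `third_table` are hypothetical, and the nested loop only stands in for whatever kernel function an actual implementation uses.

```python
from typing import Callable, List, Sequence, Tuple


def match_kernel(
    first_table: Sequence,
    third_table: Sequence,
    condition: Callable[[object, object], bool] = lambda a, b: a == b,
) -> List[Tuple[int, int]]:
    """Compare every outer join value with every inner join value.

    Returns the "fifth table": (outer_index, inner_index) pairs for the
    tuple pairs that satisfy the join condition.
    """
    fifth_table = []
    for i, outer_value in enumerate(first_table):
        for j, inner_value in enumerate(third_table):
            if condition(outer_value, inner_value):
                fifth_table.append((i, j))
    return fifth_table


# Join-attribute values pulled from the outer and inner tables (made-up data).
first_table = [1, 2, 3]
third_table = [2, 3, 3]
fifth_table = match_kernel(first_table, third_table)
print(fifth_table)  # [(1, 0), (2, 1), (2, 2)]
```

Only index pairs are stored here, so the attributes to be returned (the second and fourth tables) are not touched until the result is assembled.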

4. The join query method for a relational database management system according to claim 3, wherein the inputting the tuple data in the first table and the tuple data in the third table to a kernel function to compare the tuple data in the first table and the tuple data in the third table one-to-one according to the join condition of the outer table and the inner table to obtain the tuple pair index information corresponding to the tuple data pair meeting the join condition comprises:

and combining the processes of matching and comparing the tuple data in the first table and the tuple data in the third table according to the connection conditions of the outer table and the inner table into a kernel function interface, and executing a kernel function for matching and comparing by using external software or hardware acceleration to obtain tuple pair index information corresponding to the tuple data pairs meeting the connection conditions.
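One way to picture the kernel function interface of claim 4 is a single entry point behind which different matching implementations can be swapped. The sketch below is an assumption-laden illustration: `register_backend`, `run_kernel`, and both kernels are hypothetical names, and the hash-based kernel merely stands in for an external software or hardware accelerator, which is not modeled here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PairIndex = List[Tuple[int, int]]
Kernel = Callable[[Sequence, Sequence], PairIndex]

# Registry of interchangeable kernels behind one interface (hypothetical).
_BACKENDS: Dict[str, Kernel] = {}


def register_backend(name: str, kernel: Kernel) -> None:
    _BACKENDS[name] = kernel


def nested_loop_kernel(first_table: Sequence, third_table: Sequence) -> PairIndex:
    """Reference kernel: compare every outer value with every inner value."""
    return [
        (i, j)
        for i, a in enumerate(first_table)
        for j, b in enumerate(third_table)
        if a == b
    ]


def hash_kernel(first_table: Sequence, third_table: Sequence) -> PairIndex:
    """Stand-in for an accelerated kernel: bucket inner values, then probe."""
    buckets: Dict[object, List[int]] = {}
    for j, b in enumerate(third_table):
        buckets.setdefault(b, []).append(j)
    return [(i, j) for i, a in enumerate(first_table) for j in buckets.get(a, [])]


def run_kernel(first_table: Sequence, third_table: Sequence,
               backend: str = "nested_loop") -> PairIndex:
    """Single entry point; the caller does not depend on the chosen backend."""
    return _BACKENDS[backend](first_table, third_table)


register_backend("nested_loop", nested_loop_kernel)
register_backend("hash", hash_kernel)
```

Because both kernels return the same tuple pair index information for an equality condition, the code that assembles the join result does not need to know which backend ran.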

5. The relational database management system join query method according to claim 3, wherein returning the join query result based on the fifth table, the predetermined join type, the second table, and the fourth table comprises:

under the condition that the join type is inner join, acquiring corresponding tuple data from the second table and the fourth table according to the tuple pair index information in the fifth table corresponding to the tuple data pairs meeting the join condition, and returning the acquired tuple data;

and under the condition that the join type is left join, right join, or anti join, adding to the fifth table information on each outer tuple that does not meet the join condition and fails to be matched, setting a matching failure identifier at the corresponding position for storing inner tuple information so as to update the fifth table, and then acquiring the tuple data to be returned from the second table and the fourth table according to the updated fifth table to generate a join query result for returning.

6. The relational database management system join query method according to claim 5, wherein the step of obtaining tuple data to be returned from the second table and the fourth table according to the updated fifth table to generate join query results for returning comprises:

under the condition that the join type is left join or right join, acquiring all tuple data from the second table according to the updated fifth table, correspondingly acquiring the tuple data meeting the join condition from the fourth table, returning a null value for the attribute to be returned for each tuple in the second table that is not successfully matched, and generating a join query result for returning;

and under the condition that the join type is anti join, acquiring from the second table, according to the updated fifth table, the outer-table tuple data that does not meet the join condition and fails to be matched, and generating a join query result for returning.
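The result assembly described in claims 5 and 6 can be sketched as follows for the inner, left, and anti (reverse) join types. This is a minimal illustration under stated assumptions: the function name `assemble_result`, the sentinel `NO_MATCH` (playing the role of the matching failure identifier), and the use of `None` as the null value are all hypothetical. The right join case is not shown.

```python
from typing import List, Sequence, Tuple

NO_MATCH = -1  # hypothetical "matching failure identifier" for the inner slot


def assemble_result(
    fifth_table: Sequence[Tuple[int, int]],
    second_table: Sequence,   # outer-table attributes to be returned
    fourth_table: Sequence,   # inner-table attributes to be returned
    join_type: str,
) -> List:
    if join_type == "inner":
        # Matched pairs only; the fifth table is used as-is.
        return [(second_table[i], fourth_table[j]) for i, j in fifth_table]

    # Update the fifth table: add every unmatched outer tuple with NO_MATCH.
    matched_outer = {i for i, _ in fifth_table}
    updated = list(fifth_table) + [
        (i, NO_MATCH) for i in range(len(second_table)) if i not in matched_outer
    ]

    if join_type == "left":
        # All outer tuples; null (None) where no inner tuple matched.
        return [
            (second_table[i], None if j == NO_MATCH else fourth_table[j])
            for i, j in sorted(updated)
        ]
    if join_type == "anti":
        # Only the outer tuples that failed to match.
        return [second_table[i] for i, j in updated if j == NO_MATCH]
    raise ValueError(f"unsupported join type: {join_type}")


# Made-up data: outer row 0 has no match; outer row 2 matches two inner rows.
second_table = ["a", "b", "c"]
fourth_table = ["x", "y", "z"]
fifth_table = [(1, 0), (2, 1), (2, 2)]

print(assemble_result(fifth_table, second_table, fourth_table, "inner"))
print(assemble_result(fifth_table, second_table, fourth_table, "left"))
print(assemble_result(fifth_table, second_table, fourth_table, "anti"))
```

Keeping the match result as index pairs means all three join types reuse one matching pass and differ only in how the second and fourth tables are read afterwards.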

7. The relational database management system connection query method according to claim 1, wherein before analyzing, from the related structure of the left plan execution node and the related structure of the right plan execution node respectively, the position information of the attribute participating in the connection and of the attribute to be returned in the outer table, and the position information of the attribute participating in the connection and of the attribute to be returned in the inner table, the method further comprises:

determining that the tuple data in the first table and the tuple data in the third table have not yet been matched and compared.

8. The connection query method of the relational database management system according to claim 2, wherein before analyzing, from the related structure of the left plan execution node and the related structure of the right plan execution node respectively, the position information of the attribute participating in the connection and of the attribute to be returned in the outer table, and the position information of the attribute participating in the connection and of the attribute to be returned in the inner table, the method further comprises:

and under the condition that the left plan execution node and/or the right plan execution node are determined to have a row index scanning mode, performing row index scanning according to the information of the left plan execution node and the information of the right plan execution node to obtain a connection query result.
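The choice between the two execution paths in claims 2 and 8 amounts to a simple check on the scan mode of the two plan execution nodes. The sketch below is only illustrative: the function name, the boolean flags, and the returned labels are hypothetical and do not correspond to any real database API.

```python
def choose_join_path(left_uses_row_index_scan: bool,
                     right_uses_row_index_scan: bool) -> str:
    """Pick the execution path for a join node (labels are hypothetical)."""
    if left_uses_row_index_scan or right_uses_row_index_scan:
        # Claim 8: fall back to the row index scan when either node uses it.
        return "row_index_scan"
    # Claim 2: neither node uses a row index scan, so tuples can be pulled
    # in bulk and matched by the kernel function.
    return "batched_kernel_join"
```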

9. A relational database management system join query apparatus, comprising:

a join operation unit for matching and comparing the tuple data in the first table and the tuple data in the third table according to the join conditions of the outer table and the inner table using a uniform function, and storing a matching and comparing result in a fifth table; the uniform function is a kernel function;

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.