CN115587114A

CN115587114A - System and query method

Info

Publication number: CN115587114A
Application number: CN202211167016.2A
Authority: CN
Inventors: 邵大明
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-01-10

Abstract

An embodiment of the present specification provides a system and a query method. A coordinating node of the system includes a coordination service unit and a connection unit, a data node of the system includes a data service unit, and the connection unit includes a reusable connection. The query method includes: when a data service unit receives a preparation request carrying a sub-query statement, the data service unit generates query optimization data by using the sub-query statement and writes the query optimization data and information of the sub-query statement into a cache region in correspondence with each other; the coordination service unit establishes a connection with the data service unit based on the connection in the connection unit, determines a target sub-query statement of a target query statement when receiving the target query statement, uses target information of the target sub-query statement to find a target data service unit in which the query optimization data of the target sub-query statement is cached, and sends a use request to the target data service unit; and the data service unit reuses the query optimization data in response to receiving the use request.

Description

System and query method

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a query method.

Background

In database systems, query optimization is an important underlying component. Query optimization data refers to the various data generated during the query optimization process. For example, in a database query, the query optimization data includes a query tree generated by parsing. Through logical equivalence transformation and physical execution path screening, the cost of each path is calculated according to information such as statistics and data distribution so as to select an optimal execution path; that is, query optimization converts the query tree into a query execution plan of the target query statement. In a multi-machine database system, the optimization process is more complex and more computationally expensive than in a single-machine database. If a query statement is executed repeatedly, the query optimization process is also repeated each time, which causes unnecessary performance loss. Therefore, a query method capable of effectively improving query performance is needed.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a query method. One or more embodiments of the present specification also relate to a system, a query apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical problems in the related art.

According to a first aspect of the embodiments of the present specification, there is provided a query method applied to a system including a coordinating node and a data node, where the coordinating node includes a coordination service unit and a connection unit, the data node includes a data service unit, and the connection unit includes a connection that can be reused among coordination service units. The method includes: when the data service unit receives a preparation request carrying a sub-query statement, the data service unit generates query optimization data by using the sub-query statement and writes the query optimization data and the information of the sub-query statement into a cache region in correspondence with each other, where the sub-query statement is generated based on a query optimization result of a query statement; the coordination service unit establishes a connection with the data service unit based on the connection in the connection unit, determines a target sub-query statement of a target query statement when receiving the target query statement, uses target information of the target sub-query statement to find a target data service unit in which query optimization data of the target sub-query statement is cached, and sends a use request to the target data service unit; and the data service unit, in response to receiving the use request, reuses the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

According to a second aspect of the embodiments of the present specification, there is provided a query method applied to a coordination service unit, where the coordination service unit is configured in a coordinating node of a system, the system further includes a data node, the coordinating node further includes a connection unit, the data node includes data service units, the connection unit includes connections that can be reused among coordination service units, and one connection corresponds to one data service unit in the data node. The method includes: establishing a connection with a data service unit based on a connection in the connection unit, where the data service unit caches a correspondence between information of a sub-query statement and query optimization data, the query optimization data is generated based on the sub-query statement, and the sub-query statement is generated based on a query optimization result of a query statement; determining a target sub-query statement of a target query statement when the target query statement is received; using target information of the target sub-query statement to find a target data service unit in which the query optimization data of the target sub-query statement is cached; and sending a use request to the target data service unit, so that the target data service unit, in response to receiving the use request, reuses the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

According to a third aspect of the embodiments of the present specification, there is provided a query method applied to a data service unit, where the data service unit is configured in a data node of a system, the system further includes a coordinating node, the coordinating node includes a connection unit, the connection unit includes connections that can be reused among coordination service units, and one connection corresponds to one data service unit in the data node. The method includes: when a preparation request carrying a sub-query statement is received, generating query optimization data by using the sub-query statement and writing the query optimization data and the information of the sub-query statement into a cache region in correspondence with each other, where the sub-query statement is generated based on a query optimization result of a query statement; receiving a use request sent by the coordination service unit, where the use request is sent to a target data service unit after the coordination service unit receives a target query statement, determines a target sub-query statement of the target query statement, and uses target information of the target sub-query statement to find the target data service unit in which query optimization data of the target sub-query statement is cached; and reusing, according to the use request, the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

According to a fourth aspect of the embodiments of the present specification, there is provided a query apparatus configured in a coordination service unit, including: a connection module configured to establish a connection with a data service unit based on a connection in the connection unit, where the data service unit caches a correspondence between information of a sub-query statement and query optimization data, the query optimization data is generated based on the sub-query statement, and the sub-query statement is generated based on a query optimization result of a query statement; a statement determination module configured to determine, upon receiving a target query statement, a target sub-query statement of the target query statement; a search cache module configured to use target information of the target sub-query statement to find a target data service unit in which the query optimization data of the target sub-query statement is cached; and a multiplexing query module configured to send a use request to the target data service unit, so that the target data service unit, in response to receiving the use request, reuses the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

According to a fifth aspect of the embodiments of the present specification, there is provided a query apparatus configured in a data service unit, including: a preparation module configured to, when receiving a preparation request carrying a sub-query statement, generate query optimization data by using the sub-query statement and write the query optimization data into a cache region in correspondence with information of the sub-query statement, where the sub-query statement is generated based on a query optimization result of a query statement; a request receiving module configured to receive a use request sent by the coordination service unit, where the use request is sent to a target data service unit after the coordination service unit receives a target query statement, determines a target sub-query statement of the target query statement, and uses target information of the target sub-query statement to find the target data service unit in which query optimization data of the target sub-query statement is cached; and a multiplexing execution module configured to reuse, according to the use request, the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

According to a sixth aspect of embodiments herein, there is provided a computing device comprising: a memory and a processor; the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, and the computer-executable instructions realize the steps of the query method when being executed by the processor.

According to a seventh aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the above-described query method.

According to an eighth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-mentioned query method.

According to a ninth aspect of embodiments herein, there is provided a system for implementing the query method as described in any one of the above, including: a coordinating node and a data node.

An embodiment of the present specification implements a query method in which the connection unit includes a connection that can be reused among coordination service units, and the connection has a mapping relationship with a data service unit of the data node. When a data service unit receives a preparation request carrying a sub-query statement, it generates query optimization data by using the sub-query statement and writes the query optimization data into a cache region in correspondence with information of the sub-query statement. The coordination service unit of the coordinating node can therefore establish a connection with the data service unit based on the connection in the connection unit and, when a target query statement is received, determine a target sub-query statement of the target query statement, use target information of the target sub-query statement to find a target data service unit that has cached the query optimization data of the target sub-query statement, and send a use request to the target data service unit, so that the data service unit, according to the use request, reuses the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

It can be seen that the method provides a global cache implemented on the basis of the connection unit. The data service unit caches the correspondence between information of sub-query statements and query optimization data based on the sub-query statements issued by the coordination service unit, and the sub-query statements are generated based on query optimization results of query statements, so the generated sub-query statements are standardized. Even if query statements differ, as long as sub-plans in their query plans have the same function, the sub-plans corresponding to the sub-query statements can be reused in queries of different query plans, which decouples the query optimization data from the coordination service unit. Moreover, the coordinating node provides a globally shared connection unit, so that, through global reuse of connections within the coordinating node, the query optimization data cached by the data service unit can be reused among coordination service units to execute optimized queries, thereby reducing redundancy, saving resources, and improving query efficiency.

Drawings

FIG. 1 is a schematic diagram of a cluster architecture for a system provided by one embodiment of the present description;

FIG. 2 is a block diagram of a system provided by one embodiment of the present description;

FIG. 3 is a block diagram of a system provided by another embodiment of the present description;

FIG. 3a is a flow diagram of a query method provided by one embodiment of the present specification;

FIG. 4 is a flowchart of a query method applied to a coordination service unit according to an embodiment of the present disclosure;

FIG. 5a is a signaling interaction schematic diagram of a query method provided by one embodiment of the present specification;

FIG. 5b is a schematic diagram of a synchronization cache provided by one embodiment of the present description;

FIG. 5c is a signaling interaction schematic diagram of a query method provided by another embodiment of the present description;

FIG. 6 is a schematic diagram of a query execution plan for the target query statement, as provided in one embodiment of the present specification;

FIG. 7 is a schematic structural diagram of a query apparatus according to an embodiment of the present specification;

FIG. 8 is a flow diagram of a query method applied to a data service unit as provided by one embodiment of the present specification;

FIG. 9 is a schematic structural diagram of a query apparatus according to another embodiment of the present specification;

FIG. 10 is a system diagram provided in accordance with another embodiment of the present disclosure;

FIG. 11 is a block diagram of a computing device according to an embodiment of the present specification.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present specification. The word "if," as used herein, may be interpreted as "when," "upon," or "in response to a determination," depending on the context.

First, the terms used in one or more embodiments of the present specification are explained.

The coordinating node is a server in the system that runs the coordination service unit.

The data node is a server in the system that runs the data service unit.

The coordination service unit is a program execution entity running in the coordinating node. It is responsible for processing query statements sent by a querying party such as a client, generating and executing a query execution plan, and distributing sub-query statements to the data service unit for execution. The coordinating node may include one or more coordination service units. In practical applications, the coordination service unit may take the form of any program execution entity, such as a service process, and each Session communicating with the querying party corresponds to one coordination service unit.

The data service unit is a program execution entity running in the data node and is responsible for processing the sub-query statements sent by the coordination service unit.

The connection unit may be understood as a buffering process that creates and manages a connection.

The query statement is a program statement for describing a query condition, and causes a main body executing the query to return data that meets the query condition based on the query condition. The query language of the query statement is specifically determined according to the environment of the query system in the practical application scenario, for example, in an SQL database query, the query statement is represented as an SQL query statement.

The information of the sub query statement is information determined based on the characteristics of the sub query statement, and is used for accessing the sub query statement. For example, the information of the sub-query statement may be a hash value obtained by performing a hash calculation on the sub-query statement.

The query optimization data is data generated during the query optimization process and is used to assist the query and improve query speed. For example, in a database query, the query optimization data includes a query tree generated by parsing and a query execution plan.

The query tree is generated by parsing the query statement and is used for describing the internal expression form of the query statement.

The query execution plan is an optimized execution path obtained by performing logical equivalence transformation and physical execution path screening on the basis of a query tree and calculating the cost of the path according to information such as statistical information and data distribution.
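
As noted in the definitions above, the information of a sub-query statement may be a hash value of that statement. The following minimal sketch illustrates the idea. It assumes SHA-256 as the hash function and whitespace normalization before hashing; neither choice is specified in this description, and the function name `statement_info` is hypothetical.

```python
import hashlib


def statement_info(sub_query: str) -> str:
    """Return a hex digest that serves as the information of a sub-query statement."""
    # Assumption: collapse runs of whitespace so that layout differences
    # in the same statement do not produce different keys.
    normalized = " ".join(sub_query.split())
    # Assumption: SHA-256; the description only says "a hash calculation".
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = statement_info("SELECT id FROM t1 WHERE id > $1")
    second = statement_info("SELECT id  FROM t1\n WHERE id > $1")
    # Both spellings normalize to the same text, so the keys match.
    print(first == second)
```

A fixed-length digest of this kind is cheap to compare and to store in the cache regions described later, which is presumably why a hash is suggested as the lookup key.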

In a database system, node types are often divided by role into coordinating nodes and data nodes: the coordinating node is responsible for generating a query execution plan, and the data node is responsible for executing it. Specifically, the coordinating node generates the query execution plan of the target query statement through query optimization according to metadata, statistical information, and distribution information. By issuing the query execution plan of the target query statement, the coordinating node pushes the computation of sub-plans down to the data nodes, so that computation happens closer to the data and inefficient data transmission is reduced. Conventional queries currently have the following problems:

The data nodes do not cache the query execution plan. The coordinating node generates the query execution plan, distributes it to each data node for execution, and finally collects the results. The issued query execution plan requires serialization and deserialization and is controlled by the parent execution plan on the coordinating node; that is, the sub-plan pushed down to the data node is a part of the query execution plan of the target query statement in the coordinating node, is strongly coupled to it, and cannot be reused on the data node. Therefore, a conventionally pushed-down sub-plan can only serve the query execution plan of the target query statement that issued it for the current computation. Logically, the execution plans issued by repeated queries are usually identical and reusable, so the inability to reuse them causes performance loss.

In addition, in a query system such as a database, query optimization data is generally divided into two parts: one part is the query tree generated by the parse, analyze, and rewrite operations on a query statement; the other part is the executable query execution plan generated through logical optimization and physical optimization. However, in the conventional scheme the query tree is not cached and reused, so the operation of generating the query tree must be repeated every time the same query statement is executed, which causes unnecessary performance loss.

In some conventional schemes, the data service unit generally adopts a service shared by the coordination service unit, so the information cached in the data service unit cannot serve additional coordination service units. As a result, more processes cache redundant query optimization data, which consumes excessive data node resources and ultimately reduces the processing capability of the data cluster.

In view of the above, the present specification provides a query method, and also relates to a system, a query apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

Referring to fig. 1, fig. 1 illustrates a schematic diagram of a cluster architecture of a system provided according to an embodiment of the present description. As shown in fig. 1, in the cluster architecture, the node types are divided into a coordinating node (abbreviated CN), a data node (abbreviated DN), and a central timing node.

The coordinating node is a server that runs the coordination service unit in the system. The data node is a server that runs the data service unit in the system. The central timing node is used for time synchronization of the nodes in the cluster. The client sends a query statement to the CN, and the CN performs the query according to the query method provided by the embodiments of the present specification.

Referring to fig. 2, fig. 2 is a block diagram illustrating a system provided according to an embodiment of the present disclosure, which specifically includes: a coordinating node 210, and a data node 220.

The coordinating node 210 includes a coordinating service unit 2102 and a connection unit 2104, the data node 220 includes a data service unit 2202, and the connection unit 2104 includes a connection that can be multiplexed among the coordinating service units, and the connection has a mapping relationship with the data service unit 2202. For example, the connection and the data service unit may be in a one-to-one mapping relationship, or may be in a mapping relationship of other corresponding manners set according to actual application scenarios, which is not limited in this specification.

The data service unit 2202 is configured to, when receiving a preparation request carrying a sub query statement, generate query optimization data by using the sub query statement, and write the query optimization data into a cache region in correspondence with information of the sub query statement;

the coordination service unit 2102 is configured to establish a connection with the data service unit 2202 based on the connection in the connection unit 2104, determine a target sub-query statement of the target query statement when the target query statement is received, find a target data service unit in which query optimization data of the target sub-query statement is cached by using target information of the target sub-query statement, and send a use request to the target data service unit;

the data service unit 2202 is further configured to execute, in response to receiving the usage request, the query of the target sub-query statement by multiplexing the query optimization data corresponding to the target information in the cache region.

The system provides a global cache implemented on the basis of the connection unit. The data service unit caches the correspondence between information of sub-query statements and query optimization data based on the sub-query statements issued by the coordination service unit, and the sub-query statements are generated based on query optimization results of query statements, so the generated sub-query statements are standardized. Even if query statements differ, as long as sub-plans in their query plans have the same function, the sub-plans corresponding to the sub-query statements can be reused in queries of different query plans, which decouples the query optimization data from the coordination service unit. Moreover, the coordinating node provides a globally shared connection unit, so that, through global reuse of connections within the coordinating node and with the information of the sub-query statement used as the lookup key, the query optimization data cached by the data service unit can be reused among coordination service units to execute optimized queries, thereby reducing redundancy, saving resources, and improving query efficiency.
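
The interaction between the coordination service unit, the shared connections, and the data service unit can be sketched as follows. This is a simplified illustration, not the implementation of the embodiment: the class and method names (`DataServiceUnit`, `Connection`, `CoordinationServiceUnit`, `prepare`, `use`, `run`) are hypothetical, SHA-256 stands in for the unspecified hash, a string placeholder stands in for the query tree and plan, and falling back to the first connection when no cached entry exists is an assumption.

```python
import hashlib
from dataclasses import dataclass, field


def stmt_hash(sub_query: str) -> str:
    # Assumption: SHA-256 stands in for the unspecified hash calculation.
    return hashlib.sha256(sub_query.encode("utf-8")).hexdigest()


@dataclass
class DataServiceUnit:
    name: str
    cache: dict = field(default_factory=dict)  # statement hash -> optimization data
    optimizations: int = 0  # how many times optimization actually ran

    def prepare(self, sub_query: str) -> str:
        """Handle a preparation request: optimize once and cache the result."""
        key = stmt_hash(sub_query)
        if key not in self.cache:
            self.optimizations += 1
            self.cache[key] = f"plan<{sub_query}>"  # placeholder for tree and plan
        return key

    def use(self, key: str) -> str:
        """Handle a use request: execute with the cached optimization data."""
        return f"{self.name} ran {self.cache[key]}"


@dataclass
class Connection:
    unit: DataServiceUnit
    cached_hashes: set = field(default_factory=set)  # hashes prepared over this connection


class CoordinationServiceUnit:
    def __init__(self, connections: list):
        self.connections = connections  # shared with other coordination service units

    def run(self, sub_query: str) -> str:
        key = stmt_hash(sub_query)
        for conn in self.connections:
            if key in conn.cached_hashes:  # target data service unit found
                return conn.unit.use(key)
        conn = self.connections[0]  # assumption: fall back to the first connection
        conn.cached_hashes.add(conn.unit.prepare(sub_query))
        return conn.unit.use(key)


if __name__ == "__main__":
    dn = DataServiceUnit("dn1")
    shared = [Connection(dn)]
    first, second = CoordinationServiceUnit(shared), CoordinationServiceUnit(shared)
    first.run("SELECT * FROM t1 WHERE a = $1")
    second.run("SELECT * FROM t1 WHERE a = $1")
    print(dn.optimizations)  # the second unit reuses what the first one prepared
```

In this sketch the second coordination service unit never triggers optimization, because the hash recorded on the shared connection tells it that the data service unit already holds the optimization data.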

Referring to fig. 3, fig. 3 illustrates a block diagram of a system provided in accordance with another embodiment of the present description. In this embodiment, the query optimization data includes a query tree and a query execution plan, and the information of the sub-query statement is a hash value obtained by performing hash calculation on the sub-query statement.

The coordinating node 210 (denoted by CN in FIG. 3) includes one or more coordination service units 2102 (denoted by CN Backend in FIG. 3), and the data node 220 (denoted by DN in FIG. 3) includes one or more data service units 2202 (denoted by DN Backend in FIG. 3).

The coordination service unit 2102 has a mapping relationship with a first cache region. The first cache region includes a first statement information cache region and a first plan cache region. The first statement information cache region (illustrated by prepared stmts hash in FIG. 3) is configured to cache a correspondence between a first hash value and a first plan primary key, where the first hash value includes the hash values of the sub-query statements in the caches of all data service units currently connected to the coordination service unit. The first plan cache region is configured to cache a correspondence between a second plan primary key and a first plan value. The first plan primary key is a plan name determined from a hash value of the query statement. The second plan primary key is a hash value determined according to a query tree generated from the query statement, and the first plan value is the query execution plan of the target query statement generated on the coordination service unit.

The connection cache region (illustrated in FIG. 3 by the cached stmt hash pointed to by c) is used for recording the hash values of the sub-query statements whose query optimization data is cached in the corresponding data service unit. The connection unit is illustrated as pooler in FIG. 3. The coordination service unit 2102 has a mapping relationship with an agent (the connection pool agent is illustrated by agent in FIG. 3), and the agent has a mapping relationship with an agent cache region. The agent cache region (illustrated by cached stmts hash in FIG. 3) is configured to cache the hash values of the sub-query statements cached by all data service units currently connected to the coordination service unit. It can be understood that, before the coordination service unit applies for a new connection, the agent cache region holds the hash values of the sub-query statements that the coordination service unit cached in its last connection state. When the coordination service unit applies for a new connection, the agent cache region can therefore serve as the reference for updating the cache of the coordination service unit, so that only the updated part is synchronized to the coordination service unit.
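
Synchronizing only the updated part can be pictured as a set difference between the hashes the agent (proxy) cache region remembers from the last connection state and the hashes recorded on the connections currently held. The sketch below is an illustration under that assumption; the function names `sync_delta` and `apply_delta` are hypothetical, and whether stale hashes are removed as well as new ones added is not stated in this description.

```python
def sync_delta(agent_cache: set, connected_hashes: set) -> tuple:
    """Compute what changed since the last connection state."""
    added = connected_hashes - agent_cache
    removed = agent_cache - connected_hashes  # assumption: stale entries are dropped
    return added, removed


def apply_delta(cache: set, added: set, removed: set) -> None:
    """Apply only the changed part to a cache, in place."""
    cache.difference_update(removed)
    cache.update(added)


if __name__ == "__main__":
    backend_cache = {"h1", "h2"}       # hashes known to the coordination service unit
    agent_cache = set(backend_cache)   # agent's copy from the last connection state
    now_connected = {"h2", "h3"}       # hashes recorded on the connections held now

    added, removed = sync_delta(agent_cache, now_connected)
    apply_delta(backend_cache, added, removed)
    apply_delta(agent_cache, added, removed)
    print(sorted(backend_cache))
```

Only the entries in `added` and `removed` need to cross the boundary between the connection unit and the coordination service unit, rather than the full set of hashes.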

The data service unit 2202 has a mapping relationship with a second cache region. The second cache region includes a second statement information cache region (represented by prepared stmt hash in FIG. 3) and a second plan cache region (illustrated by Plan Cache in FIG. 3). The second statement information cache region is used for caching the correspondence between the hash value of a sub-query statement and the query tree of the sub-query statement. The second plan cache region is used for caching the correspondence among a third plan primary key, a fourth plan primary key, and a second plan value. The third plan primary key is a plan name determined from the hash value of the sub-query statement. The fourth plan primary key is a hash value determined from the query tree of the sub-query statement. The second plan value is the query execution plan generated from the query tree of the sub-query statement.
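
The two cache regions on the data service unit can be sketched as two dictionaries consulted in sequence. This is a minimal illustration, not the embodiment's data layout: the class `DataNodeCache` and its members are hypothetical, tuples stand in for the real query tree and execution plan, SHA-256 stands in for the unspecified hash, and combining the plan name and the tree hash into one composite key is an assumption about how the third and fourth plan primary keys are used together.

```python
import hashlib


def digest(text: str) -> str:
    # Assumption: SHA-256 stands in for the unspecified hash calculation.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DataNodeCache:
    def __init__(self):
        self.stmt_cache = {}  # statement hash -> query tree
        self.plan_cache = {}  # (plan name, tree hash) -> query execution plan
        self.parses = 0       # how many times a query tree was built
        self.plannings = 0    # how many times a plan was generated

    def get_plan(self, sub_query: str):
        stmt_key = digest(sub_query)
        tree = self.stmt_cache.get(stmt_key)
        if tree is None:
            self.parses += 1
            tree = ("tree", sub_query)  # placeholder for parse/analyze/rewrite
            self.stmt_cache[stmt_key] = tree

        # Plan name derived from the statement hash, plus a hash of the tree.
        plan_key = ("plan_" + stmt_key[:8], digest(repr(tree)))
        plan = self.plan_cache.get(plan_key)
        if plan is None:
            self.plannings += 1
            plan = ("plan", tree)  # placeholder for logical/physical optimization
            self.plan_cache[plan_key] = plan
        return plan


if __name__ == "__main__":
    cache = DataNodeCache()
    cache.get_plan("SELECT * FROM t1 WHERE a = $1")
    cache.get_plan("SELECT * FROM t1 WHERE a = $1")
    print(cache.parses, cache.plannings)  # the second call hits both caches
```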

Next, the architecture shown in fig. 3 will be described in more detail by taking an example in which the query statement is an SQL statement and the service unit is a service process. As shown in fig. 3, the overall architecture is divided into CN nodes and DN nodes, where:

CN node: The coordination service unit is represented as a Backend service process, which is responsible for processing the query statement sent by the client and for generating and executing the query execution plan of the query statement. Each Session corresponds to one Backend service process. The first statement information cache region, represented as prepared stmts hash, is responsible for recording the sub-SQL statements for which the prepare request has been completed on the DN. The first plan cache region, represented as Plan Cache, is responsible for caching the query execution plan of the target query statement generated on the CN. The connection unit is represented as a Pooler process, which is responsible for managing all connections from the CN to the DNs.

Each Agent is a proxy of a CN Backend in the connection unit and is responsible for interacting with that CN Backend.

Slots (extended slots) are used for recording the connections currently established for the corresponding CN Backend. Each element slot[] (the slot of an element is shown by c in FIG. 3) corresponds to one connection, and each connection corresponds to one Backend on the DN. Each element records the connection information of its corresponding data service unit.

The connection cache region (represented in FIG. 3 as the prepared stmt hash pointed to by c) is responsible for recording information about the sub-SQL statements for which the preparation request has been completed in the DN Backend corresponding to the connection.

The conn pool is used to record connections released by one Agent so that other Agents can use them.

The agent cache region of the Agent (the prepared stmts hash below the Agent in FIG. 3) is used for synchronously recording the prepared stmts hash information of the corresponding CN Backend. The agent cache can thus be used directly as the reference for updates, so that only the changed part is synchronized to the CN Backend, which reduces the synchronization overhead between the CN Backend and the pooler and speeds up synchronization.

DN node: the data service unit is expressed as a backup service process and is responsible for processing the sub SQL transmitted by the CN backup.

The second statement information cache region prepended stmt cache is responsible for caching the prepared sub SQL transmitted by the CN Back.

And the second plan cache area plan cache is responsible for caching a local query execution plan generated by the sub SQL transmitted on the CN backup.

It should be noted that, in an SQL database, query optimization generates an optimized query execution plan for each SQL statement. To reduce the performance loss caused by query optimization, the system provided in the above embodiment caches reusable intermediate results, such as the query tree and the query execution plan. Because a global cache is adopted, the query execution plan cached in the coordination service unit is decoupled from the data service unit, and two cache layers are implemented on the data node: the first layer caches the query tree and the second layer caches the query execution plan. Each cache layer is thus decoupled from the coordination service unit, which enables multiplexing among multiple service processes in the coordination node, improves multiplexing efficiency, reduces repeated operations during SQL execution, and improves SQL execution efficiency.

Based on the system architecture shown in fig. 3, in order to further save resources and improve query efficiency, the caches of the CN and DN may use a doubly linked list. For example, a global variable PlanCacheHead may be set to represent the head of a doubly linked list used to store query execution plans, and a global variable max_play_cache may be set to represent the maximum number of cached plans. When the number of cached plans exceeds max_play_cache, the LRU eviction algorithm is executed: the plan at the tail of the PlanCacheHead linked list is removed, and the corresponding plan in plan_cache_hash is removed. When the coordination service unit inserts a new record into plan_cache_hash, the record is inserted at the head of the PlanCacheHead doubly linked list. When a plan is accessed and the needed plan is found, the corresponding record is moved to the head of the PlanCacheHead list. In this embodiment, memory resources in the CN and DN are thus controlled by LRU, and the hit rate of commonly used execution plans can be improved.
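
The LRU control described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: collections.OrderedDict stands in for the combination of the PlanCacheHead doubly linked list and plan_cache_hash, and the class name PlanCache and its method names are hypothetical and do not come from this specification.

```python
from collections import OrderedDict


class PlanCache:
    """LRU cache of query execution plans (illustrative sketch only)."""

    def __init__(self, max_play_cache):
        self.max_play_cache = max_play_cache
        # The OrderedDict plays the role of PlanCacheHead + plan_cache_hash:
        # its first entry is the list head (most recently used plan).
        self._plans = OrderedDict()

    def put(self, key, plan):
        self._plans[key] = plan
        self._plans.move_to_end(key, last=False)   # insert at the head
        if len(self._plans) > self.max_play_cache:
            self._plans.popitem(last=True)         # evict the tail (LRU)

    def get(self, key):
        plan = self._plans.get(key)
        if plan is not None:
            self._plans.move_to_end(key, last=False)  # hit: move to head
        return plan

    def keys(self):
        return list(self._plans)


cache = PlanCache(max_play_cache=2)
cache.put("plan_a", "scan t1")
cache.put("plan_b", "scan t2")
cache.get("plan_a")             # plan_a becomes the most recently used
cache.put("plan_c", "scan t3")  # exceeds the limit, so plan_b is evicted
```

In this sketch a lookup and an insertion both move the record to the head, so the record at the tail is always the least recently used one.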

Referring to fig. 3a, fig. 3a illustrates a flow diagram of a query method provided according to an embodiment of the present description. The method is applied to a system comprising a coordination node and a data node, wherein the coordination node comprises a coordination service unit and a connection unit, the data node comprises a data service unit, the connection unit comprises a connection capable of multiplexing among the coordination service units, and the method specifically comprises the following steps:

Step 302: when the data service unit receives a preparation request carrying sub-query statements, the data service unit generates query optimization data by using the sub-query statements, and correspondingly writes the query optimization data and the information of the sub-query statements into a cache region, wherein the sub-query statements are generated based on query optimization results of the query statements.

Step 304: the coordination service unit establishes connection with the data service unit based on the connection in the connection unit, determines a target sub-query statement of the target query statement when receiving the target query statement, finds out a target data service unit in which query optimization data of the target sub-query statement is cached by using target information of the target sub-query statement, and sends a use request to the target data service unit.

Step 306: in response to receiving the use request, the data service unit multiplexes the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

Because the sub-query statements are generated in a standardized way, the sub-plans corresponding to the sub-query statements can be reused across different query plans even when the query statements differ, as long as the sub-plans in their query plans have the same function. The query optimization data is thereby decoupled from the coordination service unit. Since the coordination node provides a globally shared connection unit, the connections can be reused globally on the coordination node, and, with the information of the sub-query statements as the search key, the query optimization data cached by the data service units can be reused among the coordination service units to execute optimized queries, which reduces redundancy, saves resources and improves query efficiency.

Referring to fig. 4, fig. 4 is a flowchart illustrating a query method applied to a coordination service unit according to an embodiment of the present disclosure. The coordination service unit is configured in a coordination node of the system, the system further comprises a data node, the coordination node further comprises a connection unit, the data node comprises a data service unit, the connection unit comprises a connection capable of multiplexing among the coordination service units, and one connection corresponds to one data service unit in the data node. The method specifically comprises the following steps.

Step 402: establish a connection with the data service unit based on a connection in the connection unit.

The data service unit caches the corresponding relation between the information of the sub-query statement and query optimization data, the query optimization data is generated based on the sub-query statement, and the sub-query statement is generated based on a query optimization result of the query statement.

Taking SQL as an example, the method provided in this specification may abstract a sub-plan in the query execution plan of an SQL statement into sub-SQL, and cache the correspondence between the hash value of the sub-SQL and its query execution plan (i.e., the sub-plan) at the data service node. The sub-plan is thereby decoupled from the coordination node, so that pushing down the sub-SQL can take the place of issuing the sub-plan, which accelerates the query. Specifically, for example, the abstraction of sub-SQL includes: inputting the result of the query optimization stage of the SQL into a uniform preset interface to generate the sub-SQL. Because the generated sub-SQL is standardized, different SQL statements can reuse the query execution plan of the same sub-SQL. For example, if the sub-SQL is a full-table scan of a certain table, and the query execution plans of multiple SQL statements each include a sub-plan for the full-table scan of that table, the sub-plan can be reused by issuing the sub-SQL. It can be understood that the pushed-down sub-SQL statements are completely independent and can be expanded independently in the DN; any SQL query execution plan can reference a sub-SQL as long as the generated sub-SQL is the same, so that the query execution plans of SQL statements on different coordination service units of the coordination node can multiplex the sub-plans. Through the SQL abstraction, a sub-plan can also be optimized independently.

Step 404: when a target query statement is received, a target sub-query statement of the target query statement is determined.

It should be noted that, the method provided in the embodiment of the present specification is not limited to how the cache of the data service node obtains the corresponding relationship between the information of the sub query statement and the query optimization data. For example, the known corresponding relationship between the information of some sub-query statements and the query optimization data may be imported in advance, and for example, when the coordination service node receives a query statement for the first time, the sub-query statements abstracted by a plurality of sub-plans in the query execution plan of the query statement and the query optimization data corresponding to the sub-query statements may be issued to the data service node for caching, so as to perform subsequent global multiplexing.

Specifically, for example, the method further includes: when the coordination service unit receives the target query statement for the first time, determining a target sub-query statement of the target query statement; sending a preparation request for the target sub-query statement to the target data service unit, wherein the preparation request carries the target sub-query statement, so that the target data service unit generates query optimization data by using the target sub-query statement according to the preparation request, and records a first corresponding relation between the query optimization data and the target information in a cache region of the target data service unit.

Taking the information of the sub-query statement as a hash value, and taking the query optimization data comprising a query tree and a query execution plan as an example: the determining a target sub-query statement of the target query statement when the target query statement is received for the first time may include: when the target query statement is received for the first time, generating a query tree and a query execution plan by using the target query statement; serializing the query tree of the target query statement and then performing hash calculation to obtain a hash value of the query tree; converting a sub-plan in the query execution plan of the target query statement into a target sub-query statement; and taking the plan name determined by the hash value of the target query statement and the hash value of the query tree as main keys, taking the query execution plan of the target query statement as a value corresponding to the main keys, and writing the corresponding relation between the main keys and the corresponding values into a cache region.

Accordingly, the determining, upon receiving a target query statement, a sub-query statement of the target query statement comprises: when the target query statement is received again, taking a plan name determined by the hash value of the target query statement and the hash value of the query tree of the target query statement as main keys, and querying whether a corresponding relation between the main keys and the corresponding values exists in a cache region; if yes, obtaining a value corresponding to the main key as a query execution plan of the target query statement; and executing the query execution plan of the target query statement by calling a plan executor to determine a target sub-query statement of the target query statement.
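
The first-receipt and repeated-receipt handling described above can be sketched in Python as follows. This is a hedged illustration, not the implementation of this specification: parse and optimize are stand-in functions, the use of SHA-256 and JSON serialization is an assumption, and the "plan_" name prefix is hypothetical.

```python
import hashlib
import json

plan_cache = {}       # (plan name, query-tree hash) -> query execution plan
optimize_calls = []   # records every run of the stand-in optimizer


def stmt_hash(text):
    """Short hash of a string (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def parse(sql):
    # Stand-in parser: a real system would build a full query tree.
    return {"type": "query", "tokens": sql.lower().split()}


def optimize(tree):
    # Stand-in optimizer: a real system would choose an execution path.
    optimize_calls.append(tree)
    return {"plan_for": tree["tokens"]}


def get_plan(sql):
    tree = parse(sql)
    # Serialize the query tree, then hash it.
    tree_hash = stmt_hash(json.dumps(tree, sort_keys=True))
    # Primary keys: plan name from the statement hash + query-tree hash.
    key = ("plan_" + stmt_hash(sql), tree_hash)
    plan = plan_cache.get(key)
    if plan is None:              # first receipt: optimize and cache
        plan = optimize(tree)
        plan_cache[key] = plan
    return plan                   # repeated receipt: cached plan is reused


first = get_plan("SELECT * FROM t")
second = get_plan("SELECT * FROM t")
```

On the second call the optimizer is not invoked again, which is the saving the cache is intended to provide.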

For example, fig. 5a shows a signaling interaction diagram of the query method. Taking SQL as an example, when the coordination service unit receives an SQL statement for the first time, it sends a preparation request to the data service unit so that the data service unit writes the query optimization data into its cache. This processing procedure (i.e., the preparation phase) may include:

S5002: the coordination service unit sends the sub-SQL abstracted from the query execution plan of the SQL to the data service unit through an extension protocol.

S5004: the data service unit generates a query tree and a sub-plan from the sub-SQL. It records the primary key consisting of the hash value and version number of the sub-SQL, together with the query tree, in a local cache, namely the prepared stmt cache shown in fig. 3. It also records the primary key consisting of the hash value and version number of the sub-SQL and the hash value of the query tree, together with the sub-plan, in a local cache, namely the plan cache shown in fig. 3.

S5006: after the coordination service unit receives the message that the data service unit has successfully completed the caching, it records the hash value and version number of the sub-SQL in its local first statement information cache region (prepared stmts hash) for the subsequent multiplexing process.

In addition, after the transaction of the coordination service unit is committed, the corresponding connection is released and handed back to the connection unit for management, so that it can be provided to other coordination service units that need it. In this process, the hash values and version numbers newly inserted by preparation requests are synchronized to the corresponding connection in the connection unit. This specifically includes the following steps:

S5008: the coordination service unit sends the update information of its local prepared_stmts_hash to the connection unit.

S5010: the connection unit updates the prepared_stmts_hash in the corresponding agent cache according to the update information.

S5012: the connection unit generates a new record from the hash value and version number of the sub-SQL contained in the update information and the unique identifier of the coordination service unit corresponding to the agent, and inserts the new record into the connected cache region.
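
The preparation phase and the subsequent synchronization (S5002 to S5012) can be sketched in Python as follows. All class and method names here are hypothetical, the query tree and sub-plan are stand-in tuples, and zlib.crc32 is used only as an assumed 32-bit hash function.

```python
import zlib


def stmt_hash(text):
    # 32-bit hash of a string (crc32 is an assumption of this sketch).
    return zlib.crc32(text.encode("utf-8"))


class DataService:
    def __init__(self):
        self.prepared_stmt_cache = {}  # (hash, version) -> query tree
        self.plan_cache = {}           # (hash, version, tree hash) -> sub-plan

    def prepare(self, sub_sql, h, version):
        tree = ("tree", sub_sql)       # stand-in for parsing the sub-SQL
        self.prepared_stmt_cache[(h, version)] = tree
        self.plan_cache[(h, version, stmt_hash(repr(tree)))] = ("plan", tree)
        return True                    # report successful caching


class Connection:
    def __init__(self, dn):
        self.dn = dn
        self.prepared_stmt_hash = set()   # records (hash, cn_id, version)


class Agent:
    def __init__(self, cn_id):
        self.cn_id = cn_id
        self.prepared_stmts_hash = set()  # records (hash, version)


class Coordinator:
    def __init__(self, cn_id, agent):
        self.cn_id, self.agent = cn_id, agent
        self.prepared_stmts_hash = set()  # records (hash, version)
        self.pending = []                 # updates not yet synchronized

    def prepare(self, conn, sub_sql, version=1):
        h = stmt_hash(sub_sql)
        if conn.dn.prepare(sub_sql, h, version):         # S5002, S5004
            self.prepared_stmts_hash.add((h, version))   # S5006
            self.pending.append((conn, h, version))
        return h

    def commit(self):
        for conn, h, version in self.pending:            # S5008
            self.agent.prepared_stmts_hash.add((h, version))       # S5010
            conn.prepared_stmt_hash.add((h, self.cn_id, version))  # S5012
        self.pending = []


dn = DataService()
conn = Connection(dn)
cn = Coordinator("cn1", Agent("cn1"))
h = cn.prepare(conn, "SELECT * FROM t")
cn.commit()
```

After commit, the record left on the connection carries the identifier of the coordination service unit that prepared the sub-SQL, which is what later allows another unit to tell its own records from foreign ones.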

The method provided in the embodiments of the present specification does not limit how requests such as the preparation request are transmitted. For example, the data service unit may process various messages of the extension protocol, such as a P message indicating a preparation request, a B message carrying binding information, and an E message indicating execution of a query execution plan. Through these messages, the query tree and the query execution plan cached in the data service unit can be multiplexed. For example, in step S5002, the coordination service unit may, by sending a P message, have the query tree and the query execution plan of the sub-SQL recorded in the cache of the data service unit.

Step 406: find, by using the target information of the target sub-query statement, the target data service unit in which the query optimization data of the target sub-query statement is cached.

The search method is not limited. For example, the cache of the coordination service unit records the information of the sub-query statements whose query optimization data is cached by the data service units connected to it, together with the corresponding data service unit (such as the DN number), so the target data service unit in which the query optimization data of the target sub-query statement is cached can be determined directly from the information recorded in the cache of the CN. For another example, the coordination service unit may send a query request to the connection unit, and the connection unit searches, based on the target information, for the target data service unit in which the query optimization data of the target sub-query statement is cached, and returns the search result to the coordination service unit.

Step 408: send a use request to the target data service unit, so that the target data service unit, in response to receiving the use request, multiplexes the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

It should be noted that sub-query statements may contain constants. If sub-query statements of the same pattern have different constant values, the generated query execution plans may differ, which makes the plans in the cache difficult to reuse. In view of this, the method provided in this specification further includes, before calling the plan executor to execute the query execution plan of the target query statement: binding the parameters of the target query statement to the query plan of the target query statement. Accordingly, the sending a use request to the target data service unit includes: sending a use request to the target data service unit, wherein the use request carries the target information and parameter binding information of the target sub-query statement, so that the target data service unit, according to the use request, finds the query tree in the cache region by using the target information, parameterizes the query tree by using the parameter binding information to obtain a parameterized query tree, finds the corresponding parameterized query execution plan by using the parameterized query tree, and executes the parameterized query execution plan.

In the above embodiment, a sub-query statement containing constants is parameterized in its parsed query tree, and the parameterized query tree is handed to the query optimizer for query optimization, so as to obtain a parameterized query execution plan, which is then cached. When a sub-query statement of the same pattern containing constants accesses the database again, its query tree is parameterized again, the parameterized query execution plan is looked up in the cache, and that plan is executed. This avoids the cost of repeating query optimization for query statements containing constants and improves execution efficiency.
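
The effect of parameterization on cache reuse can be illustrated with the following Python sketch. It is a simplification under stated assumptions: the specification parameterizes the parsed query tree, whereas this sketch replaces literals in the SQL text with a regular expression, and the function names and the "$n" placeholder style are hypothetical.

```python
import re

# Matches single-quoted string literals or standalone numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

plan_cache = {}  # parameterized statement -> parameterized plan (stand-in)


def parameterize(sql):
    """Replace each constant with a numbered placeholder; return both."""
    params = []

    def replace(match):
        params.append(match.group(0))
        return "$" + str(len(params))

    return _LITERAL.sub(replace, sql), params


def get_plan(sql):
    template, params = parameterize(sql)
    if template not in plan_cache:
        # Stand-in for running the optimizer on the parameterized tree.
        plan_cache[template] = {"template": template}
    return plan_cache[template], params


plan_a, params_a = get_plan("SELECT * FROM t WHERE id = 42")
plan_b, params_b = get_plan("SELECT * FROM t WHERE id = 7")
```

Both statements map to the same parameterized form, so the second call reuses the plan cached by the first and only the bound parameters differ.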

For example, fig. 5a shows a signaling interaction diagram of the query method. The processing procedure in which the coordination service unit executes the query execution plan of the target query statement and multiplexes the sub-plans cached in the data service unit (i.e., the execution phase) may include:

S5014: when the same SQL statement is received again by the coordination service unit, the coordination service unit looks up its cached query execution plan of the SQL according to the plan name provided by the client.

S5016: the coordination service unit binds the parameters into the query execution plan of the SQL.

S5018: the coordination service unit calls the executor to execute the query execution plan of the SQL.

Because the sub-plans in the query execution plan of the SQL have been converted into sub-SQL, and the query tree and the query execution plan are cached in the corresponding data service unit, when the coordination service unit executes each sub-SQL, the data service unit looks up the sub-plan of the sub-SQL according to the cached plan name, and the query tree generation stage is skipped. During execution by the executor, the B message and the E message are sent to the data service unit.

S5020: in response to receiving the B message, the data service unit queries its cache according to the plan name of the transmitted sub-SQL to find the cached query execution plan, and initializes the query execution plan with the parameters.

S5022: in response to receiving the E message, the data service unit executes the query execution plan and returns the query result to the coordination service unit.
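
The handling of the P, B and E messages on the data service unit can be sketched in Python as follows. The class, its method names and the representation of a cached sub-plan as a Python callable are assumptions made purely for illustration.

```python
class DataServiceUnit:
    """Handles P, B and E messages against a local plan cache (sketch)."""

    def __init__(self):
        self.plan_cache = {}  # plan name -> cached sub-plan (a callable here)
        self.bound = {}       # plan name -> (plan, bound parameters)

    def on_prepare(self, plan_name, plan):
        # P message: cache the sub-plan under its plan name.
        self.plan_cache[plan_name] = plan

    def on_bind(self, plan_name, params):
        # B message (S5020): find the cached plan and bind the parameters.
        plan = self.plan_cache.get(plan_name)
        if plan is None:
            raise KeyError("no cached plan named " + plan_name)
        self.bound[plan_name] = (plan, tuple(params))

    def on_execute(self, plan_name):
        # E message (S5022): run the bound plan and return the result.
        plan, params = self.bound.pop(plan_name)
        return plan(*params)


dn = DataServiceUnit()
table = [(1, "a"), (2, "b"), (3, "c")]
dn.on_prepare("plan_scan_t",
              lambda min_id: [row for row in table if row[0] >= min_id])
dn.on_bind("plan_scan_t", [2])
result = dn.on_execute("plan_scan_t")
```

No parsing or optimization happens in on_bind or on_execute, which mirrors the point above that the query tree generation stage is skipped when a cached sub-plan is multiplexed.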

The traditional scheme has no connections shared among processes, and a connection can only be multiplexed within one session. Because the system of this method provides connections shared among processes, each data service unit can be decoupled from, and multiplexed among, the coordination service units, and a global cache is realized based on the connection unit. The data service unit caches the correspondence between the information of a sub-query statement and the query optimization data based on the sub-query statement sent by the coordination service unit, and the sub-query statement is generated based on the query optimization result of the query statement. The generated sub-query statements are therefore standardized: even when the query statements differ, the sub-plans corresponding to the sub-query statements can be multiplexed across different query plans as long as the sub-plans in those query plans have the same function, so that the query optimization data is decoupled from the coordination service units. Since the coordination node provides a globally shared connection unit, the connections can be multiplexed globally on the coordination node, and, with the information of the sub-query statements as the search key, the query optimization data cached by the data service units can be multiplexed among the coordination service units to execute optimized queries, thereby reducing redundancy, saving resources and improving query efficiency.

In order to further improve query efficiency, in one or more embodiments of the present specification, each connection may be provided with a corresponding cache region. The connected cache region records the information of the sub-query statements corresponding to the query optimization data cached in the corresponding data service unit. As a result, on the one hand, the coordination service unit can more quickly find the target data service unit in which the query optimization data of the target sub-query statement is cached; on the other hand, the connection unit can speed up the synchronization between the cache of the coordination service unit and the caches of the data service units connected to it, which reduces the synchronization overhead of the coordination service unit.

The finding, by using the target information of the target sub-query statement, of the target data service unit in which the query optimization data of the target sub-query statement is cached comprises: searching, by using the target information of the target sub-query statement and the information of the sub-query statements recorded in the connected cache regions, for the target data service unit in which the query optimization data of the target sub-query statement is cached. Through this step, the coordination service unit can more quickly find the target data service unit in which the query optimization data of the target sub-query statement is cached, which improves query efficiency.
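
A minimal Python sketch of this lookup is given below. It assumes that each connected cache region is modeled as a set of sub-query hash values indexed by slot, and that a miss falls back to a preparation request. The function names and the ("use", slot) and ("prepare", None) return values are hypothetical.

```python
def find_target_slots(target_hash, connection_caches):
    """Return the slots whose connected cache region records target_hash."""
    return [slot for slot, cache in enumerate(connection_caches)
            if target_hash in cache]


def choose_request(target_hash, connection_caches):
    """Pick a use request on a hit, or a preparation request on a miss."""
    slots = find_target_slots(target_hash, connection_caches)
    if slots:
        return ("use", slots[0])
    return ("prepare", None)


# One set of cached sub-query hash values per connection slot.
caches = [{0x11}, {0x22, 0x33}, {0x22}]
hit = choose_request(0x22, caches)
miss = choose_request(0x99, caches)
```

Because the lookup only consults records held in the connection unit, no round trip to a data service unit is needed to decide where the use request should be sent.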

Correspondingly, the first corresponding relationship may further include first version information, where the first version information is used to distinguish sub-query statements having the same information. The method may further comprise: in response to the target data service unit completing the preparation request, recording the target information and the first version information, as the plan name of the target sub-query statement, in the cache region of the coordination service unit; sending a first update request to the connection unit, wherein the first update request carries the target information and the first version information, so that the connection unit, according to the first update request, records a second corresponding relationship between the unique identifier of the coordination service unit, the target information and the first version information in the connected cache region corresponding to the target data service unit; and receiving a second update request from the connection unit. Through these steps, when the cache of the coordination service unit is updated, the connected cache region is updated in time, so that, when the connection is reused by another coordination service unit, that unit's cache can be synchronized with the cache of the data service unit directly from the records in the connected cache region, which reduces the synchronization overhead.

Correspondingly, the second update request carries second update information. The second update information includes the information of the sub-query statements, among all connected cache regions of the coordination service unit, that needs to be updated to the cache region of the coordination service unit, and the corresponding second version information. The second version information takes the value of the first version information when the unique identifier of the coordination service unit exists in the second corresponding relationship; otherwise it takes the value indicating version information to be checked. The method further comprises: updating the second update information carried by the second update request to the cache region of the coordination service unit. Through this step, when a connection is multiplexed by another coordination service unit, the cache of that unit can be synchronized with the cache of the data service unit directly from the records in the connected cache region, which reduces the synchronization overhead.

In addition, in the above embodiment, when the cache of the coordination service unit is synchronized with the connected cache regions, the records in which the two differ retain the unique identifier of the coordination service unit and the corresponding version information, which indicate whether the sub-query statement needs to be checked for consistency later. On top of achieving cache synchronization, this makes it possible to distinguish sub-query statements that can be reused directly from those that still need a consistency check.

In order to further improve synchronization efficiency, in one or more embodiments of the present specification, the connection unit includes an agent corresponding to the coordination service unit, and the cache region of the agent records the information of the sub-query statements, and the corresponding version information, recorded in the cache region of the coordination service unit. The second update information is the information that needs to be updated to the cache region of the coordination service unit, as determined by the connection unit from a comparison between the cache region of the agent corresponding to the coordination service unit and all connected cache regions of the coordination service unit.

In the above embodiment, the cache of the agent in the connection unit mirrors the records of the cache of the coordination service unit, so the cache of the agent can be used directly as the reference for updating and only the updated part needs to be synchronized, which reduces the synchronization overhead between the coordination service unit and the connection unit.

In the following, the above embodiment is exemplarily described by taking the information of the sub query statement as the hash value and the query statement as SQL as an example, with reference to the schematic diagram of the synchronous cache shown in fig. 5 b:

As shown in fig. 5b, the prepared_stmt_hash of connection c records the hash values of the sub-SQL corresponding to all the query trees cached in the currently connected data service node. Specifically, in each record, the primary key consists of a hash value of several bits generated by hashing the sub-SQL (such as a 32-bit hash value), the unique identifier of the currently connected coordination service unit, and a version number. The version number starts from 1.

The proxy cache region of the Agent records the hash values of the sub-query statements cached by the corresponding coordination service unit. The prepared_stmts_hash below the Agent in fig. 5b denotes the proxy cache region. The proxy cache region assists in synchronizing the cache of the coordination service unit with the connected cache regions. Specifically, in each record in the proxy cache region, the primary key consists of a hash value of several bits generated by hashing the sub-SQL (such as a 32-bit hash value) and a version number. The version number starts from 0, where 0 indicates that the sub-SQL was issued by another coordination service unit and has not yet been checked in the process of this coordination service unit. Checking means verifying whether the sub-SQL to be executed by this process is consistent with the sub-SQL of the query tree cached by the data service unit. In addition, each record in the proxy cache region has check markers, denoted by dn_list in fig. 5b, where max_dn_num represents the maximum data service unit sequence number and each bit indicates whether the corresponding data service unit has cached the sub-SQL. Each marker is a Boolean: true indicates cached and false indicates not cached.

The first statement information cache region of the coordination service unit, such as the prepared_stmts_hash below the CN Backend in fig. 5b, records the hash values of the sub-query statements cached in all the data service units to which the current coordination service unit is connected. The records of the first statement information cache region are kept consistent with those of the proxy cache region through synchronization with the agent.

The second statement information cache region of the data service unit, such as the prepared_stmt_hash below the DN Backend in fig. 5b, records the hash values of the sub-query statements for which the preparation request has been completed. Specifically, each record in the second statement information cache region includes: a primary key (the hash value of the sub-SQL), max_version (the current maximum version number), version (a version number starting from 1), and sql (the query tree of the sub-SQL corresponding to this version), where version and sql are stored in one-to-one correspondence.
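
The record layout of the second statement information cache region can be sketched in Python as follows. The dataclass, the register function and the rule of reusing an existing version when the same query tree is already stored are assumptions of this sketch; the field names follow the record description above.

```python
from dataclasses import dataclass, field


@dataclass
class PreparedStmtRecord:
    """One record keyed by the hash value of the sub-SQL (sketch)."""
    max_version: int = 0
    trees: dict = field(default_factory=dict)  # version -> query tree (sql)


def register(cache, stmt_hash, tree):
    """Store a query tree under stmt_hash and return its version number.

    Different sub-SQL with the same hash value get different versions;
    an already stored query tree keeps its existing version.
    """
    record = cache.setdefault(stmt_hash, PreparedStmtRecord())
    for version, cached_tree in record.trees.items():
        if cached_tree == tree:
            return version
    record.max_version += 1            # version numbers start from 1
    record.trees[record.max_version] = tree
    return record.max_version


prepared_stmt_hash = {}
v1 = register(prepared_stmt_hash, 0xAB, "scan t1")
v2 = register(prepared_stmt_hash, 0xAB, "scan t2")  # same hash, other sub-SQL
v3 = register(prepared_stmt_hash, 0xAB, "scan t1")  # already stored
```

The version number is what lets two different sub-SQL statements that happen to collide on the same hash value coexist in the cache.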

It should be noted that, when a connection multiplexes a sub-SQL, the coordination service unit needs to check, through the primary key of the sub-SQL, whether the corresponding sub-SQL in the query execution plan is consistent with the query tree cached in the data service unit. Because the primary key is generated by hashing the sub-SQL, identical hash values are not guaranteed to represent the same sub-SQL statement, so the coordination service unit needs to check whether the sub-SQL is consistent when it is executed for the first time. To this end, on the basis of the above embodiment, the coordination service unit and the connection unit synchronize their caches through the following process. Specifically, fig. 5c shows a signaling interaction diagram of the query method, in which the process of synchronizing the caches includes:

S5024: The connection unit traverses the cache region of each connection currently connected to the coordination service unit.

Of course, before the connection unit traverses the cache region of each connection currently connected to the coordination service unit, the method further includes: the coordination service unit applies to the connection unit for a connection. Once the connection unit has obtained the connection required by the coordination service unit, it can enter S5024 to start the process of synchronizing the cache.

S5026: For every record whose name contains the unique identifier of the coordination service unit, the hash value and the version number of the sub-SQL are taken from the record and inserted as the primary key into the proxy cache region of the proxy of the coordination service unit, and dn_list[slot] is marked as true.

S5028: For every record whose name does not contain the unique identifier of the coordination service unit, the hash value of the sub-SQL and the version number "0" are inserted as the primary key into the proxy cache region of the proxy of the coordination service unit, and dn_list[slot] is marked as true.

After the proxy cache region is updated, the acquired connection is returned to the coordination service unit, so that the coordination service unit can use the corresponding connection and synchronize its local cache according to S5030 to S5032 below.

S5030: The connection unit places the hash value and the version number of the sub-SQL from each updated record of the proxy cache region into an update request and sends the request to the corresponding coordination service unit.

S5032: The coordination service unit receives the update request and synchronizes the update information to its local cache region, namely the local cached_stmts_hash.
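The synchronization steps S5024 to S5032 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function names, the tuple layout (name, hash, version) of a connection's cached records, and the dictionaries standing in for the proxy cache region and the local cache are all hypothetical, chosen only to make the flow concrete.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (sub-SQL hash value, version number)


def sync_proxy_cache(cn_id: str,
                     connection_caches: Dict[int, List[Tuple[str, int, int]]],
                     proxy_cache: Dict[Key, Dict[int, bool]]) -> List[Key]:
    """Update the proxy cache from the connections' caches; return the updated keys."""
    updated: List[Key] = []
    for slot, records in connection_caches.items():        # S5024: traverse connections
        for name, hash_value, version in records:
            if cn_id in name:                               # S5026: issued by this unit
                key = (hash_value, version)
            else:                                           # S5028: issued elsewhere,
                key = (hash_value, 0)                       # version 0 = to be checked
            proxy_cache.setdefault(key, {})[slot] = True    # dn_list[slot] = true
            if key not in updated:
                updated.append(key)
    return updated                                          # payload of S5030


def apply_update(local_cache: Dict[Key, Dict[int, bool]],
                 proxy_cache: Dict[Key, Dict[int, bool]],
                 updated: List[Key]) -> None:
    """S5032: copy the updated records into the coordination unit's local cache."""
    for key in updated:
        local_cache[key] = dict(proxy_cache[key])
```

Records that originate from another coordination service unit enter the local cache with version 0, which is what later triggers the consistency check before the cached query tree is reused.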

Based on the above manner of synchronous caching, an embodiment of the present specification further provides a processing procedure for checking whether sub-query statements are consistent. Specifically, the finding out the target data service unit, in which the query optimization data of the target sub-query statement is cached, by using the target information of the target sub-query statement includes:

under the condition that the plan name of the target sub-query statement exists in the cache region of the coordination service unit, finding out a target data service unit corresponding to any connection caching the target information;

under the condition that the plan name of the target sub-query statement does not exist in the cache region of the coordination service unit, judging whether the version information corresponding to the target information is the version information to be checked;

if so, sending an inspection request to a target data service unit corresponding to any connection caching the target information, wherein the inspection request carries the target sub-query statement and the target information, and enabling the target data service unit to inspect whether the statement is the same as the target sub-query statement or not by using query optimization data corresponding to the target information;

receiving the checking result returned by the target data service unit;

and determining that the target data service unit caches the query optimization data of the target sub-query statement under the condition that the same statement is determined according to the checking result.

In the above embodiment, the to-be-checked version information left by the cache synchronization process is used to send the inspection request to the target data service unit, so that the data service unit checks whether the sub-query statement that is to multiplex the query execution plan is consistent with the locally cached sub-query statement.

Correspondingly, according to the inspection result returned in the foregoing embodiment, in the method provided in this embodiment of the present specification, when it is determined that the statements are the same according to the inspection result, determining that the target data service unit caches the query optimization data of the target sub-query statement includes:

comparing the version information carried by the inspection result with the latest first version information of the target sub-query statement recorded in the cache region of the coordination service unit, wherein the version information carried by the inspection result is the first version information of the same statement recorded in the target data service unit under the condition that the target data service unit determines that the statement is the same statement, and the version information carried by the inspection result is the first version information updated on the basis of the first version information under the condition that the target data service unit determines that the statement is different;

if the latest first version information is not more updated than the version information carried by the checking result, determining that the target data service unit caches the query optimization data of the target sub-query statement, and recording the target information and the version information carried by the checking result as the plan name of the target sub-query statement in a cache region of the coordination service unit;

otherwise, the step of sending a preparation request for the target sub-query statement to the target data service unit is carried out, so that the target data service unit caches the query optimization data of the target sub-query statement according to the preparation request.

In the above embodiment, the latest version information is further used to determine whether the sub-query statement whose query optimization data is to be multiplexed is a newly added statement, and the cache is updated accordingly, so that more query optimization data can be cached in the data service unit for multiplexing.

In one or more other embodiments of the present specification, on the basis of the inspection result returned in the foregoing embodiments, whether the inspection result is reusable is further determined by means of a cached flag. Specifically, the second update information further includes a cached flag corresponding to the information of the sub-query statement, where the cached flag indicates that the data service node has cached the corresponding query optimization data. Correspondingly, in a case that the plan name of the target sub-query statement exists in the cache region of the coordination service unit, finding out a target data service unit corresponding to any connection where the target information is cached includes:

under the condition that the plan name of the target sub-query statement exists in a cache region of the coordination service unit, judging whether the target information has a corresponding cached mark;

if yes, finding out a target data service unit corresponding to any connection caching the target information;

the method further comprises the following steps:

if the target information does not have the corresponding cached marks, entering the step of sending a preparation request for the target sub-query statement to the data service unit, and enabling the target data service unit to cache the query optimization data of the target sub-query statement according to the preparation request;

correspondingly, sending an inspection request to a target data service unit corresponding to any connection in which the target information is cached includes:

judging whether the target information has a corresponding cached mark;

if yes, sending an inspection request to a target data service unit corresponding to any connection caching the target information;

the method further comprises the following steps:

and if the target information does not have the corresponding cached marks, the step of sending a preparation request for the target sub-query statement to the data service unit is carried out, so that the target data service unit caches the query optimization data of the target sub-query statement according to the preparation request.

In the above embodiment, for the sub-query statements with cached marks, the corresponding query optimization data can be correctly multiplexed, and for the sub-query statements without cached marks, a preparation request is issued, and the cache of the data service unit is correspondingly updated, so that more query optimization data are added in the data service unit to be multiplexed.

It can be understood that, for the sub-query statement of the data service node that has cached the query optimization data, the cache of the coordination service unit has a cached flag, and for the case that there is no cached flag (for example, the value of dn _ list is false or there is no dn _ list information), it indicates that the data service node does not cache the corresponding query optimization data, and a preparation request needs to be sent to cause the data service unit to cache.
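The decision described in this embodiment can be condensed into a small function. The sketch below is illustrative only: the Action enum, the function name, and the parameters are hypothetical, and the final fallback (no plan name and a version that is not the to-be-checked version 0) is an assumption, since this embodiment does not spell out that branch.

```python
from enum import Enum


class Action(Enum):
    REUSE = "send use request"           # multiplex cached query optimization data
    CHECK = "send inspection request"    # ask the data service unit to compare statements
    PREPARE = "send preparation request"


def decide(has_plan_name: bool, version: int, has_cached_flag: bool) -> Action:
    """Choose the next request for a target sub-query statement."""
    if not has_cached_flag:
        # No cached flag: the data service unit holds no query optimization data.
        return Action.PREPARE
    if has_plan_name:
        # Plan name present and cached flag set: reuse directly.
        return Action.REUSE
    if version == 0:
        # Version 0 marks a statement issued elsewhere that still needs checking.
        return Action.CHECK
    # Assumption: any remaining case falls back to a preparation request.
    return Action.PREPARE
```

For example, decide(True, 3, True) selects reuse, whereas decide(False, 0, True) selects the inspection request.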

Next, taking the information of the sub-query statement as the hash value and the query statement as SQL as an example, the processing procedure of the above embodiment is exemplarily described with reference to the signaling interaction schematic diagram of the query method shown in fig. 5 c. Specifically, the method comprises the following steps:

S5034: The coordination service unit checks whether the corresponding sub-SQL in the query execution plan has been named (that is, whether the record corresponding to the hash value of the sub-SQL contains a plan name); if not, S5038 is executed.

S5036: If the plan name exists and dn_list[current_dn] is true (current_dn denotes the currently connected data service unit), the corresponding data service unit has named the sub-SQL and the check has been completed, so the multiplexing process can be executed directly. If dn_list[current_dn] is false, the coordination service unit has not completed the check operation, and S5038 is entered.

S5038: The coordination service unit uses the hash value computed from the sub-SQL, together with the initial version 0, as the query primary key to query the local first statement information cache region cached_stmts_hash.

S5040: If the record exists, the coordination service unit checks whether dn_list[current_dn] is true; if true, the query execution plan of the sub-SQL is cached in the corresponding data service unit, and S5046 is entered to determine whether the two are the same statement.

S5042: If the record does not exist, the hash value and the version number of the sub-SQL are used as the primary key, and the query execution plan of the newly added sub-SQL is cached.

S5044: The coordination service unit sends a check_value message to the corresponding data service unit.

S5046: The data service unit queries its local second statement information cache region according to the received hash value and checks whether the sub-SQL is consistent with any cached version.

S5048: If they are consistent, the returned check result carries the corresponding version number.

S5050: If they are inconsistent, the returned check result carries the current maximum version number + 1.

S5052: After the coordination service unit receives the check result, if the version number carried in the check result is less than the current maximum version number recorded by the coordination service unit, the version number in the check result is recorded, and the hash value of the sub-SQL and the recorded version number form a primary key that is recorded in the cache. Because the data service unit has cached the query tree and the query execution plan, the stage of generating the query tree is skipped when the sub-SQL is executed.

S5054: If the version number carried in the check result is greater than the maximum version number, the new version number returned in the check result and the hash value of the sub-SQL form a primary key, and the query execution plan of the newly added sub-SQL is cached.
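The data service unit side of the check (S5046 to S5050) can be sketched as follows. This is a simplified illustration: the function name is hypothetical, and each cached version is represented by the sub-SQL text it was built from, standing in for the cached query tree.

```python
from typing import Dict, Tuple


def check_sub_sql(cached_versions: Dict[int, str],
                  max_version: int,
                  sub_sql: str) -> Tuple[bool, int]:
    """Compare an incoming sub-SQL with every cached version for the same hash value.

    Returns (consistent, version): the matching version number when a cached
    version is consistent (S5048), otherwise the current maximum version
    number + 1 (S5050).
    """
    for version, cached_sql in sorted(cached_versions.items()):  # S5046
        if cached_sql == sub_sql:
            return True, version                                 # S5048
    return False, max_version + 1                                # S5050
```

Returning the maximum version number + 1 on a mismatch gives the coordination service unit a fresh key under which a statement that collides on the hash value can be prepared without overwriting the existing entry.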

The steps for caching the query execution plan of a newly added sub-SQL follow the preparation phase shown in FIG. 5a and are briefly repeated below:

the coordination service unit sends a preparation request for the sub-SQL to a data service unit;

the data service unit receives the sub-SQL, generates a query tree, and caches the query tree as the value under a primary key formed by the sub-SQL hash value and the version number;

and after the coordination service unit is informed that the preparation request has been executed successfully, it records the sub-SQL hash value and the version number in the local first statement information cache region for subsequent execution of the multiplexing process.
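The preparation exchange repeated above might look like the following in outline. The class and method names are hypothetical, zlib.crc32 is an assumed stand-in for the hash function, a formatted string stands in for the generated query tree, and the version number is assumed to be supplied by the coordination service unit.

```python
import zlib
from typing import Dict, Set, Tuple

Key = Tuple[int, int]  # (sub-SQL hash value, version number)


def make_key(sub_sql: str, version: int) -> Key:
    """Build the primary key; CRC32 is an assumed stand-in for the hash."""
    return (zlib.crc32(sub_sql.encode("utf-8")) & 0xFFFFFFFF, version)


class DataServiceUnit:
    def __init__(self) -> None:
        self.cache: Dict[Key, str] = {}  # primary key -> query tree

    def prepare(self, sub_sql: str, version: int) -> bool:
        """Generate a query tree (placeholder) and cache it under the key."""
        self.cache[make_key(sub_sql, version)] = f"QueryTree({sub_sql})"
        return True  # report that the preparation request succeeded


class CoordinationServiceUnit:
    def __init__(self) -> None:
        self.cached_stmts_hash: Set[Key] = set()  # first statement information cache

    def prepare_on(self, dn: DataServiceUnit, sub_sql: str, version: int) -> Key:
        """Send a preparation request and record the key once it succeeds."""
        key = make_key(sub_sql, version)
        if dn.prepare(sub_sql, version):
            self.cached_stmts_hash.add(key)
        return key
```

Recording the key only after the data service unit reports success keeps the first statement information cache region from advertising a query tree that was never cached.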

When the transaction of the coordination service unit is committed, the connection to the data service unit needs to be released and returned to the connection unit for management, so that it can be provided to another coordination service unit that needs it, and the newly inserted sub-SQL hash values and version numbers are synchronized into the cache region of the corresponding connection in the connection unit.

For example, each connection corresponds to a data service unit, and the query tree cached in the data service unit can be reused among the coordination service units by multiplexing the connections.

It will be appreciated that if the query execution plan in the data service unit was prepared at an earlier time, the meaning of the current query may have changed. For example, a queried table may have been dropped and another table with the same name created, possibly with different column names and types, or a column used by the query may have been deleted, or another operation affecting the meta-information of the queried table may have been performed. In such cases, the coordination node re-executes the preparation, parsing, analysis and verification of the query statement when it executes the query. Accordingly, a new query tree and a new query execution plan are generated in the coordination node, and the abstracted sub-query statements in the query execution plan are generated from the latest table information. The issued sub-query statements therefore differ from the original ones, the previously prepared query tree in the data node is not selected, and a new query tree and a new execution plan are generated in the data node, which guarantees query correctness.

The effect of plan reuse in the method provided by the embodiments of the present specification is further explained below with reference to FIG. 6, taking two SQL query execution plans as an example. As shown in FIG. 6, there are two SQL query execution plans in the coordination service unit; the query execution plan on the left is referred to herein as execution plan A, and the one on the right as execution plan F. The part of execution plan A that accesses C and the part of execution plan F that accesses C are the same. According to the method provided by the embodiments of the present specification, C is abstracted into a sub-query statement and issued to the data node. When A executes for the first time, the query tree and the execution plan are cached. When F executes C again, all the information used in the execution stage has already been cached thanks to the two-layer caching technique, so query optimization is not needed and C can be executed directly, which realizes the ability to multiplex sub-plans within execution plans.
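The reuse effect can be illustrated with a toy model in which two plans share the sub-query that accesses C. Everything here is hypothetical: the DataNode class, the counter, and the sub-query strings (including the parts "scan B" and "scan G", which are invented placeholders and are not taken from FIG. 6).

```python
from typing import Dict, List


class DataNode:
    """Toy data node that caches one execution plan per sub-query statement."""

    def __init__(self) -> None:
        self.plan_cache: Dict[str, str] = {}
        self.optimizations = 0  # how many times query optimization actually ran

    def execute(self, sub_query: str) -> str:
        if sub_query not in self.plan_cache:
            self.optimizations += 1                       # optimize only on a miss
            self.plan_cache[sub_query] = f"plan<{sub_query}>"
        return self.plan_cache[sub_query]                 # reuse on a hit


def run_plan(node: DataNode, sub_queries: List[str]) -> List[str]:
    """Issue every sub-query statement of one execution plan to the data node."""
    return [node.execute(q) for q in sub_queries]


# Hypothetical decomposition: both plans contain the same sub-query on C.
PLAN_A = ["scan B", "scan C"]
PLAN_F = ["scan C", "scan G"]
```

Running PLAN_A and then PLAN_F against one DataNode triggers optimization three times rather than four, because the shared sub-query on C is served from the cache the second time.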

Corresponding to the above method embodiment, the present specification further provides an embodiment of an inquiry apparatus configured in the coordination service unit, and fig. 7 illustrates a schematic structural diagram of an inquiry apparatus configured in the coordination service unit according to an embodiment of the present specification. As shown in fig. 7, the apparatus includes:

a connection module 702 configured to establish a connection with a data service unit based on a connection in the connection unit.

The data service unit caches the corresponding relation between the information of the sub-query statement and query optimization data, wherein the query optimization data is generated based on the sub-query statement, and the sub-query statement is generated based on a query optimization result of the query statement;

a statement determination module 704 configured to, upon receiving a target query statement, determine a target sub-query statement of the target query statement;

a search cache module 706 configured to search a target data service unit in which the query optimization data of the target sub-query statement is cached, by using the target information of the target sub-query statement;

a multiplexing query module 708 configured to send a usage request to the target data service unit, so that the target data service unit, in response to receiving the usage request, multiplexes query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.

The above is an exemplary scheme of the querying device configured in the coordination service unit according to the embodiment. It should be noted that the technical solution of the query device configured in the coordination service unit belongs to the same concept as the technical solution of the query method applied to the coordination service unit, and details of the technical solution of the query device configured in the coordination service unit, which is not described in detail, can be referred to the description of the technical solution of the query method applied to the coordination service unit.

Referring to fig. 8, fig. 8 is a flowchart illustrating a query method applied to a data service unit according to an embodiment of the present specification. The data service unit is configured in a data node of the system, the system further comprises a coordination node, the coordination node further comprises a connection unit, the connection unit comprises a connection capable of multiplexing among the coordination service units, and one connection corresponds to one data service unit in the data node. The method specifically comprises the following steps.

Step 802: When a preparation request carrying a sub-query statement is received, query optimization data is generated using the sub-query statement, and the query optimization data and the information of the sub-query statement are written into a cache region in correspondence with each other.

Wherein the sub-query statement is generated based on a query optimization result for the query statement.

Step 804: The use request sent by the coordination service unit is received.

The use request is sent to the target data service unit when the coordination service unit receives a target query statement, determines a target sub-query statement of the target query statement, and finds out the target data service unit in which the query optimization data of the target sub-query statement is cached by using the target information of the target sub-query statement.

Step 806: According to the use request, the query optimization data corresponding to the target information in the cache region is multiplexed to execute the query of the target sub-query statement.

Because the system to which the method applies provides a global cache implemented on the basis of the connection unit, the data service unit caches the correspondence between the information of a sub-query statement and the query optimization data according to the sub-query statement issued by the coordination service unit. The sub-query statement is generated from the query optimization result of the query statement, so the generated sub-query statement is standardized: even if two query statements differ, as long as sub-plans in their query plans have the same function, the sub-plan corresponding to the sub-query statement can be reused across the queries of different query plans, which decouples the query optimization data from the coordination service unit. In addition, the coordination node provides a globally shared connection unit, so that, through global reuse of connections at the coordination node, the query optimization data cached by the data service unit can be multiplexed among the coordination service units to execute optimized queries, thereby reducing redundancy, saving resources and improving query efficiency.

The above is an illustrative scheme of the query method applied to the data service unit in this embodiment. It should be noted that the technical solution of the query method applied to the data service unit and the above technical solution of the query method applied to the coordination service unit belong to the same concept, and details of the technical solution of the query method applied to the data service unit, which are not described in detail, can be referred to the above description of the technical solution of the query method applied to the coordination service unit.

Corresponding to the above method embodiments, the present specification further provides an embodiment of an inquiry apparatus configured in a data service unit, and fig. 9 illustrates a schematic structural diagram of an inquiry apparatus configured in a data service unit according to an embodiment of the present specification. As shown in fig. 9, the apparatus includes:

a preparing module 902, configured to, when receiving a preparation request carrying a sub-query statement, generate query optimization data using the sub-query statement, and write the query optimization data into a cache region in correspondence with information of the sub-query statement, where the sub-query statement is generated based on a query optimization result of the query statement;

a request receiving module 904, configured to receive a usage request sent by the coordination service unit, where the usage request is sent to a target data service unit when the coordination service unit receives a target query statement, determines a target sub-query statement of the target query statement, and finds the target data service unit in which query optimization data of the target sub-query statement is cached by using target information of the target sub-query statement;

a multiplexing execution module 906 configured to multiplex the query optimization data corresponding to the target information in the cache region according to the usage request to execute the query of the target sub-query statement.

The above is an exemplary scheme of the querying device configured in the data service unit according to this embodiment. It should be noted that the technical solution of the query device configured in the data service unit and the technical solution of the query method applied to the data service unit belong to the same concept, and details of the technical solution of the query device configured in the data service unit, which are not described in detail, can be referred to the description of the technical solution of the query method applied to the data service unit.

Corresponding to the above query method embodiment, this specification further provides a system embodiment for implementing the query method according to any of the above embodiments, and fig. 10 shows a schematic structural diagram of a system provided by an embodiment of this specification. As shown in fig. 10, the system includes: a coordinating node 1010, and a data node 1020.

The above is a schematic scheme of a system of the present embodiment. It should be noted that the technical solution of the system and the technical solution of the query method belong to the same concept, and details of the technical solution of the system, which are not described in detail, can be referred to the description of the technical solution of the query method.

FIG. 11 illustrates a block diagram of a computing device 1100 provided in accordance with one embodiment of the present description. The components of the computing device 1100 include, but are not limited to, memory 1110 and a processor 1120. The processor 1120 is coupled to the memory 1110 via a bus 1130 and the database 1150 is used to store data.

The computing device 1100 also includes an access device 1140, the access device 1140 enabling the computing device 1100 to communicate via one or more networks 1160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 1140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 1100, as well as other components not shown in FIG. 11, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device structure shown in FIG. 11 is for illustration purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 1100 can be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1100 can also be a mobile or stationary server.

The processor 1120 is configured to execute computer-executable instructions, which when executed by the processor, implement the steps of the above-described query method.

The foregoing is a schematic diagram of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the query method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the query method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the steps of the above-mentioned query method.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above query method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above query method.

An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the above query method.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the above query method belong to the same concept, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the above query method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiments are not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present embodiments. Furthermore, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the embodiments of this specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. The alternative embodiments do not describe every detail exhaustively, nor do they limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, and thereby to enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A query method, applied to a system comprising a coordination node and a data node, wherein the coordination node comprises a coordination service unit and a connection unit, the data node comprises a data service unit, and the connection unit comprises a connection capable of being multiplexed among coordination service units, the method comprising:

when the data service unit receives a preparation request carrying sub query statements, the data service unit generates query optimization data by using the sub query statements, and correspondingly writes the query optimization data and the information of the sub query statements into a cache region, wherein the sub query statements are generated based on query optimization results of the query statements;

the coordination service unit establishes connection with the data service unit based on the connection in the connection unit, determines a target sub-query statement of the target query statement when receiving the target query statement, finds out a target data service unit in which query optimization data of the target sub-query statement is cached by using target information of the target sub-query statement, and sends a use request to the target data service unit;

and the data service unit responds to the received use request, and multiplexes the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement.
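
By way of illustration only, the prepare-then-use flow recited above might be sketched as follows. The class names, the use of a SHA-256 digest as the "information" of a sub-query statement, the placeholder plan string, and the fallback of preparing on the first connected unit when no unit has cached the statement are all assumptions made for this sketch and are not taken from the specification.

```python
import hashlib
from dataclasses import dataclass, field


def statement_info(sql: str) -> str:
    """Hypothetical 'information of the sub-query statement': a SHA-256 hex digest."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


@dataclass
class DataServiceUnit:
    name: str
    cache: dict = field(default_factory=dict)  # statement info -> optimization data
    optimizations: int = 0                     # how many times optimization ran

    def prepare(self, sub_query: str) -> str:
        """Handle a preparation request: optimize once and cache the result."""
        info = statement_info(sub_query)
        if info not in self.cache:
            self.optimizations += 1
            # Placeholder standing in for a real query tree and execution plan.
            self.cache[info] = {"plan": f"PLAN({sub_query})"}
        return info

    def use(self, info: str) -> str:
        """Handle a use request: reuse cached data instead of re-optimizing."""
        return f"executed {self.cache[info]['plan']}"


@dataclass
class CoordinationServiceUnit:
    units: list  # data service units reachable through the connection unit

    def query(self, sub_query: str) -> str:
        info = statement_info(sub_query)
        # Find a data service unit that has already cached this sub-query.
        target = next((u for u in self.units if info in u.cache), None)
        if target is None:
            # Assumed fallback: prepare on the first unit when nothing is cached.
            target = self.units[0]
            target.prepare(sub_query)
        return target.use(info)
```

In this sketch a second `query` call for the same sub-query statement reaches the same data service unit and does not increment `optimizations`, which is the reuse effect the claim describes.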

2. The method of claim 1, wherein the query optimization data comprises a query tree and a query execution plan, and the information of the sub-query statement is a hash value computed from the sub-query statement;

the coordinating node comprises one or more coordinating service units, and the data node comprises one or more data service units;

the coordination service unit and a first cache region have a mapping relationship, the first cache region includes a first statement information cache region and a first plan cache region, the first statement information cache region is used for caching a corresponding relationship between a first hash value and a first plan main key, the first hash value includes hash values of sub query statements in caches of all data service units currently connected to the coordination service unit, the first plan cache region is used for caching a corresponding relationship between a second plan main key and a first plan value, the first plan main key is a plan name determined according to the hash values of the query statements, the second plan main key is a hash value determined according to a query tree generated by the query statements, and the first plan value is a query execution plan corresponding to the target query statement generated on the coordination service unit;

the connected cache region is used for recording hash values of sub-query statements corresponding to query optimization data cached in the corresponding data service unit, the coordination service unit and the proxy have a mapping relation, the proxy and the proxy cache region have a mapping relation, and the proxy cache region is used for caching the hash values of the sub-query statements cached in all the data service units currently connected with the corresponding coordination service unit;

the data service unit and a second cache region have a mapping relationship, the second cache region includes a second statement information cache region and a second plan cache region, the second statement information cache region is used for caching a corresponding relationship between the hash value of the sub-query statement and the query tree of the sub-query statement, the second plan cache region is used for caching a corresponding relationship between a third plan main key, a fourth plan main key and a second plan value, the third plan main key is a plan name determined according to the hash value of the sub-query statement, the fourth plan main key is a hash value determined according to the query tree of the sub-query statement, and the second plan value is a query execution plan generated according to the query tree of the sub-query statement.

3. The method of claim 1, further comprising:

when the coordination service unit receives the target query statement for the first time, determining a target sub-query statement of the target query statement;

sending a preparation request for the target sub-query statement to the target data service unit, wherein the preparation request carries the target sub-query statement, so that the target data service unit generates query optimization data by using the target sub-query statement according to the preparation request, and records a first corresponding relation between the query optimization data and the target information in a cache region of the target data service unit.

4. The method of claim 3, wherein the coordination service unit determining a target sub-query statement of the target query statement when the target query statement is first received comprises:

when the coordination service unit receives the target query statement for the first time, the coordination service unit generates a query tree and a query execution plan of the target query statement by using the target query statement;

serializing the query tree of the target query statement and then performing hash calculation to obtain a hash value of the query tree;

converting a sub-plan in the query execution plan of the target query statement into a target sub-query statement;

taking the plan name determined by the hash value of the target query statement and the hash value of the query tree as main keys, taking the query execution plan of the target query statement as a value corresponding to the main keys, and writing the corresponding relation between the main keys and the corresponding values into a cache region;

correspondingly, when receiving a target query statement, the coordination service unit determines a sub-query statement of the target query statement, including:

when the coordination service unit receives the target query statement again, the plan name determined by the hash value of the target query statement and the hash value of the query tree of the target query statement are used as main keys, and whether the corresponding relation between the main keys and the corresponding values exists or not is queried in a cache region;

if yes, obtaining a value corresponding to the main key as a query execution plan of a target query statement;

and executing the query execution plan of the target query statement by calling a plan executor to determine a target sub-query statement of the target query statement.
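
As a non-limiting sketch, the coordinator-side plan cache keyed by a plan name and a query tree hash might look as follows. The toy `parse` function, the JSON serialization of the tree, the `plan_` name prefix, and the shape of the cached plan are hypothetical choices made for illustration; the specification does not prescribe them.

```python
import hashlib
import json


def digest(text: str) -> str:
    """SHA-256 hex digest, used here as the hash function (an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def parse(sql: str) -> dict:
    """Toy parser: a token list stands in for a real query tree."""
    return {"tokens": sql.split()}


class CoordinatorPlanCache:
    def __init__(self):
        self._plans = {}   # (plan name, query tree hash) -> query execution plan
        self.misses = 0    # number of times a plan had to be generated

    def get_plan(self, sql: str) -> dict:
        tree = parse(sql)
        # Serialize the tree deterministically, then hash the serialized form.
        tree_hash = digest(json.dumps(tree, sort_keys=True))
        plan_name = "plan_" + digest(sql)[:16]
        key = (plan_name, tree_hash)
        if key not in self._plans:
            # First receipt: generate the plan and record key -> plan.
            self.misses += 1
            self._plans[key] = {"sub_queries": [f"/* sub-plan */ {sql}"]}
        # Repeated receipt: the cached plan is returned without re-optimizing.
        return self._plans[key]
```

Using both the statement-derived plan name and the tree hash as the key means that two statements are treated as the same cached entry only when both values match.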

5. The method of claim 4, further comprising, before the plan executor is called to execute the query execution plan of the target query statement:

the coordination service unit binds the parameters of the target query statement to the query execution plan of the target query statement;

accordingly, the sending a use request to the target data service unit comprises:

the coordination service unit sends a use request to the target data service unit, wherein the use request carries target information and parameter binding information of the target sub-query statement, so that the target data service unit searches a query tree from a cache region by using the target information according to the use request, parameterizes the query tree based on the parameter binding information to obtain a parameterized query tree, queries a corresponding parameterized query execution plan by using the parameterized query tree, and executes the parameterized query execution plan.
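
One possible reading of the use-request handling above is sketched below, purely for illustration. The flat dictionary used as a query tree, the `$1` placeholder convention, the plan string, and the decision to generate a plan on a cache miss are assumptions of this sketch rather than features stated in the specification.

```python
import hashlib
import json


def tree_key(tree: dict) -> str:
    """Hash a query tree after deterministic serialization."""
    serialized = json.dumps(tree, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


class DataServiceUnit:
    def __init__(self):
        self.trees = {}    # statement info -> query tree with "$n" placeholders
        self.plans = {}    # hash of parameterized tree -> execution plan
        self.planned = 0   # number of plans generated

    def handle_use(self, info: str, bindings: dict) -> str:
        # Look up the cached query tree by the target information.
        tree = self.trees[info]
        # Parameterize: substitute each placeholder with its bound value.
        bound = {
            k: bindings.get(v, v) if isinstance(v, str) else v
            for k, v in tree.items()
        }
        # Look up the plan by the parameterized tree (assumed: plan on a miss).
        key = tree_key(bound)
        if key not in self.plans:
            self.planned += 1
            self.plans[key] = (
                f"scan {bound['table']} where {bound['column']} = {bound['value']!r}"
            )
        return self.plans[key]
```

The cached tree is never mutated, so the same entry can serve later use requests that carry different parameter bindings.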

6. The method according to claim 3, wherein the connected cache region is used for recording information of sub-query statements corresponding to the query optimization data cached in the corresponding data service unit;

the finding out the target data service unit in which the query optimization data of the target sub-query statement is cached by using the target information of the target sub-query statement comprises:

the coordination service unit finds out a target data service unit which caches the query optimization data of the target sub-query statement by using the target information of the target sub-query statement and the information of the sub-query statement recorded in the connected cache region;

the first corresponding relationship further includes first version information, and the first version information is used for distinguishing sub-query statements having the same information, and the method further includes:

the coordination service unit responds to the target data service unit to complete the preparation request, and records the target information and the first version information as the plan name of the target sub-query statement to a cache region of the coordination service unit;

sending a first update request to a connection unit, where the first update request carries the target information and the first version information, so that the connection unit records a unique identifier of the coordination service unit, the target information, and a second correspondence between the first version information in a connected cache region corresponding to the target data service unit according to the first update request;

receiving a second update request from the connection unit;

the second update request carries second update information, the second update information includes information of a sub-query statement that needs to be updated to a cache region of the coordination service unit and second version information corresponding to the information of the sub-query statement in all connected cache regions of the coordination service unit, and the second version information takes a value according to the first version information when the unique identifier of the coordination service unit exists in the second corresponding relationship, otherwise takes the value as the version information to be checked;

and updating the second updating information carried by the second updating request to a cache region of the coordination service unit.

7. The method according to claim 6, wherein the connection unit includes an agent corresponding to the coordination service unit, and the cache region of the agent is used for recording the information of the sub-query statements recorded in the cache region of the coordination service unit and the corresponding version information;

the second update information is the information that the connection unit, by comparing the information in the cache region of the agent corresponding to the coordination service unit with the information in all the connected cache regions of the coordination service unit, determines from the comparison result needs to be updated to the cache region of the coordination service unit.
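
The comparison performed by the connection unit might, under simplifying assumptions, be sketched as follows. The function name, the `TO_BE_CHECKED` sentinel, and the model in which each connected cache region records a single owning coordination service unit per statement are hypothetical and serve only to illustrate the idea.

```python
TO_BE_CHECKED = "to-be-checked"  # hypothetical sentinel for "version information to be checked"


def second_update_info(coordinator_id, agent_cache, connected_caches):
    """Compute the entries that the coordination service unit's cache lacks or holds stale.

    agent_cache: {statement info: version}, mirroring the coordinator's cache region.
    connected_caches: list of {statement info: (owner coordinator id, first version)},
        one dictionary per connected cache region (simplified model).
    """
    updates = {}
    for cache in connected_caches:
        for info, (owner, version) in cache.items():
            # Entries recorded by another coordinator must be checked before reuse.
            seen = version if owner == coordinator_id else TO_BE_CHECKED
            # Only differences from the agent's mirror need to be sent.
            if agent_cache.get(info) != seen:
                updates[info] = seen
    return updates
```

Because only the difference is returned, a coordination service unit whose cache region is already up to date receives an empty update.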

8. The method of claim 6, wherein the finding a target data service unit that has cached the query optimization data of the target sub-query statement using the target information of the target sub-query statement comprises:

the coordination service unit finds out a target data service unit corresponding to any connection caching the target information under the condition that the plan name of the target sub-query statement exists in a cache region of the coordination service unit;

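receiving the checking result returned by the target data service unit;

and determining that the target data service unit has cached the query optimization data of the target sub-query statement if it is determined to be the same statement according to the checking result.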

9. The method of claim 8, wherein the determining that the target data service unit has cached the query optimization data of the target sub-query statement if it is determined to be the same statement according to the checking result comprises:

the coordination service unit compares the version information carried by the checking result with the latest first version information of the target sub-query statement recorded in the cache region of the coordination service unit, wherein the version information carried by the checking result is the first version information of the same statement recorded in the target data service unit in the case that the target data service unit determines it to be the same statement, and is first version information updated on the basis of that first version information in the case that the target data service unit determines it to be a different statement;

if the latest first version information is not newer than the version information carried by the checking result, determining that the target data service unit has cached the query optimization data of the target sub-query statement, and recording the target information and the version information carried by the checking result as the plan name of the target sub-query statement to a cache region of the coordination service unit;

10. The method according to claim 8, wherein the second update information further includes a cached flag corresponding to the information of the sub-query statement, and the cached flag is used to indicate that the data service unit has cached the corresponding query optimization data;

the finding out a target data service unit corresponding to any connection in which the target information is cached when the plan name of the target sub-query statement exists in the cache region of the coordination service unit includes:

the coordination service unit judges, under the condition that the plan name of the target sub-query statement exists in a cache region of the coordination service unit, whether the target information has a corresponding cached flag;

the method further comprises:

if the target information does not have a corresponding cached flag, entering the step of sending a preparation request for the target sub-query statement to the data service unit, so that the target data service unit caches the query optimization data of the target sub-query statement according to the preparation request.

11. A query method applied to a coordination service unit, wherein the coordination service unit is configured in a coordination node of a system, the system further includes a data node, the coordination node further includes a connection unit, the data node includes data service units, the connection unit includes connections that can be multiplexed among coordination service units, and one connection corresponds to one data service unit in the data node, the method comprising:

establishing connection with a data service unit based on the connection in the connection unit, wherein the data service unit caches a corresponding relation between information of a sub-query statement and query optimization data, the query optimization data is generated based on the sub-query statement, and the sub-query statement is generated based on a query optimization result of the query statement;

determining a target sub-query statement of a target query statement when the target query statement is received;

finding out a target data service unit in which the query optimization data of the target sub-query statement is cached by using the target information of the target sub-query statement;

and sending a use request to the target data service unit, so that the target data service unit multiplexes the query optimization data corresponding to the target information in the cache region to execute the query of the target sub-query statement in response to receiving the use request.

12. A query method applied to a data service unit, wherein the data service unit is configured in a data node of a system, the system further includes a coordination node, the coordination node includes a coordination service unit and a connection unit, the connection unit includes connections that can be multiplexed among coordination service units, and one connection corresponds to one data service unit in the data node, the method comprising:

when a preparation request carrying a sub-query statement is received, generating query optimization data by using the sub-query statement, and correspondingly writing the query optimization data and the information of the sub-query statement into a cache region, wherein the sub-query statement is generated based on a query optimization result of a query statement;

receiving a use request sent by the coordination service unit, wherein the use request is sent to a target data service unit when the coordination service unit receives a target query statement, determines a target sub-query statement of the target query statement, and finds out the target data service unit in which query optimization data of the target sub-query statement is cached by using target information of the target sub-query statement;

and multiplexing the query optimization data corresponding to the target information in the cache region according to the use request to execute the query of the target sub-query statement.

13. A system for implementing the query method of any one of claims 1-10, comprising: a coordinating node and a data node.

14. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, which when executed by the processor implement the steps of the query method of any one of claims 1 to 12.

15. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the query method of any one of claims 1 to 12.