CN116028528A

CN116028528A - Database query optimization method, system, electronic equipment and storage medium

Info

Publication number: CN116028528A
Application number: CN202210324703.4A
Authority: CN
Inventors: 姬莉霞; 赵润哲; 周洪鑫; 张晗; 赵耀; 徐冲; 李鹏
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2023-04-28

Abstract

The invention discloses a database query optimization method, a database query optimization system, electronic equipment and a storage medium, and aims to solve the technical problems of low efficiency, large calculated amount and low accuracy of the existing query method. The invention adopts a dynamic double DQN connection sequence optimization method (Dynamic Double DQN order select, DDOS), firstly models connection inquiry as a Markov Decision Process (MDP), and trains a neural network model by using a weighted double-depth Q network to improve the prediction precision of the training network. By selecting actions through a dynamic progressive search strategy, the randomness and depth of exploration are improved to accumulate exploration of higher information gain. And after carrying out cost estimation on each query plan, selecting a connection plan which accords with data distribution and has balanced query load. The beneficial technical effects of the invention are as follows: the query performance can be effectively improved, and the method has good generalization capability and robustness.

Description

Database query optimization method, system, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of deep learning, in particular to a database query optimization method, a database query optimization system, electronic equipment and a storage medium.

Background

The connection order problem can be expressed as finding the best rearrangement problem in a multi-table connection, which is one of the most important problems in query optimization studies. There are different connection orders for the same query, and thus various connection execution plans that have different execution efficiencies. In the connection order problem, attention is paid to controlling the range of the search space to reduce the search overhead because the search space grows exponentially as the table relation number increases, for example, when n=20, the search space reaches 2.4e18. When connections of more than 1000 relationships are faced, even a reasonable optimization order cannot be given, and the execution overhead cannot be controlled through an efficient connection plan.

The rapid and accurate establishment of the efficient query plan is a key for improving the performance and the query efficiency of the database system. The multi-table connection query sequence selection based on cost in the traditional database is mainly based on a dynamic planning method and a heuristic method, and the core idea based on the dynamic planning method is to reduce the size of a search space by combining redundant states. The search space is still close to the exhaustive global space and therefore only applicable to queries containing fewer relationships. Heuristic connection methods, which typically perform heuristic searches on a portion of the state space through a particular search strategy, reduce the search space for potential query plans compared to traversing the search space, which reduces the time overhead spent optimizing queries, but also reduces the chance of finding the best query plan in the search space.

With the continued maturation of Deep Learning (DL) and reinforcement Learning (Reinforce Learning, RL) methods, some research has focused on fusing these two approaches together to solve the problem in the past few years, and the DQN (Deep Q Network) model proposed by Mnih et al, which is a DRL (Deep reinforcement Learning Deep Reinforcement Learning) pioneering work, learns the Q matrix through a Deep neural Network. And two major mechanisms of experience playback (learning memory library) and target network are introduced, so that the problem that the convergence of the neural network has up-and-down fluctuation when approaching the Q function is solved. Because the deep neural network provides rich high-dimensional characterization, end-to-end learning can be realized, and the combination of the deep neural network and the end-to-end learning can enable the RL to process some action decision problems in a high-dimensional state space. However, some researchers believe that integration of the online RL with the deep neural network is unstable. Hasselt et al propose a DDQN algorithm to reduce the overestimated case. However, DDQN sometimes exhibits underestimated motion values. In order to reduce training data and ensure sampling efficiency in the RL learning process, mahajan et al propose a new framework combining a symmetry detection program and a symmetry implementation mechanism to find symmetry in the environment, apply the symmetry to a function approximation process and accelerate a symmetry strategy learning process. Furthermore, the DQN of a simple discretization process grows exponentially with increasing degrees of freedom in the continuous domain, resulting in an excessively large action space, which is extremely difficult to converge. Thus, lillicrap et al propose a model-free algorithm DDPG based on DPG and AC, which applies the idea of DQN to continuous action domains, which can be applied to more complex problems and networks, however, requires a relatively slow change of the respective network parameters of the Actor and Critic compared to DDQN, and therefore requires a lot of training to adapt to the connection order operation.

Disclosure of Invention

The invention provides a database query optimization method, a database query optimization system, electronic equipment and a storage medium, which are used for solving the technical problems of low efficiency, large calculated amount and low accuracy of the existing query method.

In order to solve the technical problems, the invention adopts the following technical scheme:

the first aspect of the present invention is to design a database query optimization method, inputting the state and action information of database query into a dynamic dual-DQN network model, wherein the dynamic dual-DQN network model includes two parts of an evaluation network and a target network, the evaluation network is used for mining actions corresponding to the maximum value of Q in the dual-DQN network, the target network is used for evaluating the maximum action value in the dual-DQN network, and Q (s', a ^* The method comprises the steps of carrying out a first treatment on the surface of the θ) and Q (s', a) ^* ；θ ^- ) The target value is calculated as follows:

y ^WDDQN ＝r+γ[βQ(s',a ^* ；θ)+(1-β)Q(s',a ^* ；θ ^- )]

wherein Q (s', a) ^* ,θ ^- ) Representing the maximum action value, Q (s', a) ^L ,θ ^- ) The minimum action value representing the state s' in the target network, beta being defined as the weight, the beta range belonging to [0,1]]When the beta value is 0, the network is equivalent to a DDQN network, and when the beta value is 1, the model completely ignores Double DQN evaluation, and only uses the DQN network to select actions, wherein the weight beta calculation formula is as follows:

wherein action a ^* Denoted as evaluating an action of the network having a maximum action value, a _L Representing an action of evaluating the network with a minimum action value,

a ^* ＝arg maxQ(s',a；θ)

a _L ＝arg minQ(s',a,θ)

wherein c is a super parameter used for calculating the weight beta;

deriving a Q value from the cost model and the execution delay to act:

updated value of Q:

Q(s,a)←Q(s,a)+α[r+γmax _a' Q(s',a')-Q(s,a)]

wherein r is a reward, alpha is a learning rate, and the magnitude gamma for determining error learning is the learning rate, max _a' Q (s ', a') represents a target Q value, Q (s, a) represents an estimated Q value,

the loss function is defined as the difference between the estimated Q value and the actual Q value, i.e. the mean square error L (θ):

L(θ)＝E[||r+γmax _a' Q(s',a'；θ)-Q(s,a；θ)||]。

furthermore, for the super parameter c, in the problems with different characteristics, the optimal value of c is different, the super parameter c affects the network allocation weight, the size of the super parameter is set according to the characteristics of the connection task, and the value of c is selected on the basis of multiple experiments.

Further, the state and action information of database query are used as the input of the model, vectorization expression is adopted when query coding, a tree structure representation method is adopted when an intelligent agent is designed to select actions, connection query is represented by a query tree, connection between tables is defined as adding to a connection tree, leaf nodes are defined as adding to the connected tables, when new leaf nodes are added, connection is carried out with corresponding subtrees, and sub-connection structure information among the tables is reserved.

Further, the evaluation network selects actions through a dynamic progressive search strategy, firstly evaluates uncertainty, takes the uncertain and unexplored actions as the key points of training periods when unexplored, can realize the exploration to utilization stages through the process of increasing epsilon value from 0 to 1, realizes dynamic switching among stages through adjusting parameter epsilon, and can control the expression of parameter epsilon change to be

Where H is a continuous variable, the probability of the Q value difference between two alternative actions in a certain state is epsilon (H), and the value range of epsilon (H) is (0, 1).

Further, an adjustable threshold p is set, wherein U _i (a) The method is characterized in that the method is a boundary of action in an s state, when initial training data are fewer, the boundary is set relatively larger, estimation judgment of the action is more accurate along with the increase of training data, the accurate boundary is also smaller, and the calculation formula is as follows:

furthermore, in the model training stage, only after cost training, the data collected by the previous cost model training is combined as a guide, the network weight of the output layer is reset, the execution delay is used for fine tuning, and then the query data with the running time is retrained.

A second aspect of the present invention is to devise a database query optimization system comprising:

the database query statement input module is used for characterizing the actions and states of database query;

the network environment module is used for data interaction inside the model;

the evaluation network module is used for interactively acquiring sample data of the reward value with the network environment module so as to update an evaluation network;

the target network module is used for acquiring parameters of the evaluation network module, further calculating an action value of the maximum Q value in the current state, selecting a connected action according to the exploration strategy, acting on the network environment module, and obtaining a new state;

and the output module is used for giving the final connection plan to an executor for execution.

A third aspect of the present invention is to design an electronic device comprising

At least one actuator; the method comprises the steps of,

a memory communicatively coupled to the at least one processor; wherein,,

the memory stores instructions executable by the at least one executor to enable the at least one executor to perform the database query optimization method described above.

A fourth aspect of the present invention is to devise a storage medium, which when executed by an actuator of a computer, enables the computer to perform the above-described database query optimization method.

Compared with the prior art, the invention has the beneficial technical effects that:

(1) The method for optimizing the connection sequence by combining the weighted double-depth Q network and the dynamic search strategy effectively solves the problem of the influence of the overestimation of the DQN and the low estimation of the DDQN on the query performance, adopts the progressive greedy strategy to train the connected action, can enable an intelligent agent to learn the strategy more accurately, improves the efficiency of the mining action in the training stage, and is closer to an optimal plan.

(2) According to the invention, the network model DQN and the double DQN are integrated into the query optimizer in a weighting way, so that the problem of model estimation errors in query connection can be solved, and the quality of making a query plan is improved.

(3) The invention selects the action by using the dynamic progressive search strategy, can estimate the information gain (information gain) of the network uncertainty data, can accelerate the learning speed and realize effective potential exploration.

(4) Experiments in the JOB show that compared with other methods, the connection query plan generated by the DDOS method not only reduces the time for optimizing delay, but also improves the quality of the connection query plan.

Drawings

FIG. 1 is a schematic workflow diagram of a query plan of the present invention.

FIG. 2 is a diagram illustrating a motion vector representation process according to the present invention.

FIG. 3 is a schematic diagram of a network prediction model architecture according to the present invention.

Detailed Description

The following examples are given to illustrate the invention in detail, but are not intended to limit the scope of the invention in any way.

The procedures involved or relied on in the following embodiments are conventional procedures or simple procedures in the technical field, and those skilled in the art can make routine selections or adaptation according to specific application scenarios.

Example 1: a database query optimization method is characterized in that connection operation is carried out in a relational database, the connection types are more common, the number of connection tables is usually not more than 20, but in practical application, connection queries of tens or even hundreds are possible, the existing method generally depends on scale estimation of intermediate results of the queries, and in order to reduce delay time for planning, the optimization effect of a certain degree of planning is generally affected.

The MySQL database, when implementing the query optimizer, joins heuristic optimization at CBO (Cost Based Optimizer). The Oracle optimizer is also heuristic query optimization based on built-in rules, and the method can always penetrate through the whole query optimization stage, and a large amount of tuning and maintenance work is needed in the process. Dynamic planning is used in System R query optimization by parsing the planning structure into a Join Tree form of left-deep (left-deep). However, the above method is generally limited by a cost model, and under a nonlinear cost curve, a plan with poor query quality is generated by mistake. The above method cannot obtain feedback from the past actually performed query connection plan, resulting in that the optimizer cannot obtain valid information from past experience.

In response to the above problems, a skilled person applies different reinforcement learning algorithms to the connection order optimization. This approach can fundamentally reduce the algorithm overhead and improve query performance by using feedback of past query execution. Ortiz et al predict cardinality estimates of join query plans by training models, propose to incrementally learn a state representation of sub-queries over a deep neural network, and refine the join order with reinforcement learning. But selecting the query scale as a cost function, query exploration has certain limitations, which can lead to suboptimal global query plans. The neural optimizer (Neo) proposed by Marcus et al iteratively searches for an execution plan with minimal relative overhead by means of a best-first search algorithm. However, it is difficult to implement generalization of optimizers across different databases. They then propose a selection strategy for the connection order enumerator ReJOIN to learn the connection actions using a strategy gradient algorithm. And firstly, all the connections are completed, and then, the whole connection plan rewarding value is calculated according to the cost model. This strategy requires training of large amounts of data and the connection query plan is greatly affected by cost estimation. Trummer et al employ an adaptive query processing strategy, do not maintain data statistics, and do not use cost or radix models. Krishnan et al uses the DQN model to optimize the connection order and uses less data to train, thus obtaining better performance than the traditional method, but the coding mode used cannot capture the structure information of the connection tree. Guo et al propose to introduce display representation self-join coding and beam search into join order optimization, improving model robustness, reducing the drawbacks of the model itself, but there is an overestimation that results in a poor query plan.

The DQN model is a DRL creative work, learns a Q matrix through a deep neural network, introduces two large mechanisms of experience playback (learning memory bank) and a target network, relieves the problem that the neural network converges to generate up-and-down fluctuation when approaching a Q function, and can solve the problem of DDQN algorithm for overestimate possibly generated by DQN. Therefore, the DQN and DDQN models are fused to optimize the connection sequence in the query, and the performance of the connection query can be further improved.

The main function of the query optimizer is to improve the execution efficiency of SQL sentences, the optimization process is divided into two stages of logic query optimization and physical query optimization, and the embodiment aims at the stage of generating a physical query execution plan by the optimizer. The modified DRL model in DDOS (Dynamic Double DQN order select, DDOS) is used to guide the selection of connection orders, outputting a predictive connection plan for a given SQL query. Referring to fig. 1, first, connection operation information represented by SQL statement query information and left and right side states is initialized, and a connection state in which a query is received by a neural network can be input. And then entering a training stage, selecting a connected action by adopting a progressive search strategy in the process, and carrying out connection action evaluation on the subsequent query by using the maximum rewards as a target according to a DRL model obtained by training by DDOS according to different network parameters and rewards for different SQL sentences, thereby realizing the establishment of an SQL connection plan. And then, adding Agg, group and other plan nodes to complete a physical query execution plan, and finally, executing the query by an executor according to the physical execution plan.

DDOS is still consistent with DQN in terms of solving the problem, requiring formalization of the connection order problem into a series of definitions of value-based RL, requiring modeling of the connection order problem as a markov decision process (Markov Decision Process, MDP). Using Q-learning to solve the Markov problem of connection order, in MDP model, the agent goes from initial connection state S ₀ Starting, interacting with the environment, observing the prize value after making action, learning to obtain more rewarding strategies by changing own behaviors, and finally making a series of connection actions (a ₁ ，a ₂ ，a ₃ ，…，a _r ) And connect relationships among all queries in order to minimize connection operation costs and execution time. As shown in FIG. 1, after SQL statement input, the SQL statement is vectorized first to complete initialization without adding additional information. Through interaction with the environment, the evaluation network event_net obtains sample data of the reward value, the sample data is used for updating the evaluation network, after a fixed time step number, parameters of the evaluation network are copied to the target network target_net, the target network is utilized to calculate an action value of the maximum Q value in the current state, the action of connection is selected to act on the environment according to the exploration strategy, and a new state is obtained. This process is repeated to arrive at a final connection plan, which is completed by adding plan nodes before being handed to the executor. When the gradient descent function is used for updating the evaluation network, the weighting method is used for balancing the parameters of the DQN and DDQN networks, so that the estimation error is effectively reduced, and the stability of the neural network is improved.

Expressing the connection order as a problem that DRL can solve requires encoding the database and query information as inputs to the neural network. When the query is coded, vectorization expression is adopted, and when the selection action of the intelligent agent is designed, a representation method of a tree structure is adopted, and each connection query can use a defined query treeTo indicate that a table-to-table join is defined as a join to a join tree and a leaf node is defined as a table joining the join. A query is a binary tree, where each table is a leaf node, and t= { K, R } is a query tree. One join of the table is denoted as r= { K, R }, each table join joins a tree with correspondence inserted into the leaf and leaf node. And r is _i And r _j The corresponding leaf node is removed from the action selection after insertion. When there is a new leaf addition, it will be equal to (r) _i ,r _j ) The corresponding subtrees are connected first. This allows preserving the sub-connection structure information between tables.

Each table to be connected is used as a connection candidate, and all the candidate items are connected to be connected by adopting all the combinations of all the candidate item connection. When the total number of the database tables is n, n-dimensional vector motion selection is used for expressing the motion space, in the connection process, vector values are used for connecting corresponding subtrees, and when the tables are added into the connection, the motion values of the corresponding positions are changed into the subtrees

If the corresponding connection table is not added in the subtree, the value is 0, and h is the height value when the subtree is in relation in connection.

As shown in FIG. 2, in state s3, line 3, the correspondence represents a connection

Wherein the value of 3 rows corresponding to the o table is 1/4, the height of the subtree where the o table is positioned is 2, and the value corresponding to n in a connectionless manner is 0. Compared with the data representation method of the existing learning type optimizer (for example, DQ, wherein each column of a database is used as a single feature, a state is expressed as a binary one-hot vector, and each number represents one column of the database), the action behavior can well store hierarchical information among tables, the structure information of a connection tree is conveniently captured, the cost of the optimizer in cost estimation is reduced, and meanwhile, after training data containing local connection, the data can be combined and expanded to multi-table connection with larger connection number. This embodiment incorporates Multiple 1-hot pairs to implement self-join query encoding. With respect to the connection movementThe left-right relationship, physical operation and intermediate relationship are expressed as

A _L And A _R The left-right relation is represented, n-dimensional vectors are adopted, one-hot coding is adopted for physical connection operation, and c is the radix estimation of the connection intermediate relation.

For the construction of the DDOS network model, as shown in fig. 3, in the standard DQN network, the selection and evaluation actions are based on the same parameters, so that the Q value is overestimated, which causes the defect that the deep Q network has an excessively high estimated action value, and the double DQN proposed by Haselt et al, the estimation strategy is performed according to the online Q network, and the target network is used for estimating the Q value. The method can greatly relieve the problem of over fitting, but also brings the problem of underestimated action values, and the over or under estimation influences the learned strategy to further reduce the performance. In the logic optimization stage, in order to reduce the negative effects caused by the overestimation problem of the DQN and the underestimation problem of the DDQN, the weighted double estimator idea is applied to double DQN to construct a network which can be accurately screened and has great improvement on the current query plan performance, the neural network is trained to return the Q value by calculating the 'action-state' pair, and the error of the target estimated value is reduced by balancing the weights of the DQN and the DDQN on the basis of retaining the Q value, so that the more accurate estimated Q value is realized. The vectorized query plan tree, the state information of the input layer and two hidden layers through the neural network is input to the fully connected linear layer, and then the vector is mapped to the prediction of the performance cost, using the ReLU activation function and layer normalization.

The structure of the dual DQN network is still used in this embodiment, including both the evaluation network and the target network, without the need to add additional networks. Wherein the purpose of evaluating the network is to mine actions corresponding to the maximum values. And the target network is responsible for the evaluation of the maximum action value. Using Q (s', a) ^* The method comprises the steps of carrying out a first treatment on the surface of the θ) and Q (s', a) ^* The method comprises the steps of carrying out a first treatment on the surface of the The linear relation of the composition of theta-) is used for calculating the target value, and the calculation formula is as follows:

y ^WDDQN ＝r+γ[βQ(s',a ^* ；θ)+(1-β)Q(s',a ^* ；θ ^- )] (1)

wherein, beta is defined as weight, action selection calculates action value by evaluating the weight of both network and target network, wherein beta range belongs to [0,1], when beta is 0, the network is equivalent to DDQN network, when beta is 1, algorithm completely ignores Double DQN evaluation, only uses DQN network to select action, weight beta calculation formula is that

Action a ^* Denoted as evaluating an action of the network having a maximum action value, a _L Representing an action of the evaluation network with a minimum action value calculated as

a ^* ＝arg maxQ(s',a；θ) (3)

a _L ＝arg minQ(s',a,θ) (4)

Thus, Q (s', a) ^* ,θ ^- ) Representing the maximum action value, Q (s', a) ^L ,θ ^- ) Representing the minimum action value of the state s' in the target network. Where c is a superparameter used to calculate the weight β. In the process of calculation, in the problems with different characteristics, the optimal value of c is different, the super parameter c can influence the network distribution weight, the size of the super parameter is set according to the characteristics of the connection task, and the value of c is selected on the basis of multiple experiments.

When the query optimization is solved, the traditional bottom-up optimization concept is used, and when the Q value is obtained, the Q value is obtained according to the cost model and the execution delay generated on actions.

/>

When the Q value is updated, the difference value of the loss function is calculated through mean square error, the cost model of the traditional optimizer sometimes has deviation, but still is a good choice for training the neural network model, and the method is executed according to the cost modelQuerying rewards corresponding to the actions. Wherein r is a reward, alpha is a learning rate, and the magnitude gamma for determining error learning is the learning rate, max _a' Q (s ', a') represents a target Q value, Q (s, a) represents an estimated Q value, and the Q value updating process is as follows,

Q(s,a)←Q(s,a)+α[r+γmax _a' Q(s',a')-Q(s,a)] (6)

L(θ)＝E[||r+γmax _a' Q(s',a'；θ)-Q(s,a；θ||)] (7)

the dynamic progressive search strategy is adopted for training the DDOS network model, the epsilon-greedy strategy adds random action selection with a certain probability, and according to the function of the parameter epsilon in the greedy strategy in the Q algorithm, the adaptation of an intelligent agent is reduced by using a fixed value for each update, and when the intelligent agent faces a complex scene, the exploration in a long distance is difficult to achieve under the condition that the epsilon is unchanged. If the epsilon value is set lower, the learning speed of the network is reduced, and if the epsilon value is set higher, the intelligent agent can exponentially decrease according to the increase of the distance from the initial state, so that the deep exploration is not achieved, and the possibility of local optimization is increased. In order to avoid meaningless exploration, firstly, uncertainty needs to be evaluated, in a new unexplored state, the uncertainty and unexplored actions generally have higher information gain (information gain), during exploration, the part is used as the key of a training period, the randomness executed at each time of exploration is increased by adding noise, so that a more efficient exploration environment is realized, the exploration to utilization stage can be realized through the process of increasing epsilon value from 0 to 1, and dynamic switching between stages is realized through adjusting parameter epsilon. Thereby designing an expression for controlling the variation of the parameter epsilon

Wherein H is a continuous variable, the probability of the Q value difference of two alternative actions in a certain state is epsilon (H), and when H gradually becomes smaller, epsilon (H) of the Q value difference also becomes larger; conversely, ε (H) becomes smaller. Due to d [ Q(s) _k ,a _i+1 )-Q(s _k ,a _i )]< 0, the value range of ε (H) in formula (8) is (0, 1). And the random jump of the Q algorithm in the exploring-learning-utilizing 3 stages can be realized by adjusting the parameter H to realize the change of epsilon.

First, the action participation alternatives within the following action value range are determined, and an adjustable threshold p is set, wherein U _i (a) The boundary of the motion in the s state is set relatively larger when the initial training data is less, and the estimated value of the motion is judged more and more accurately along with the increase of the training data, so that the accurate boundary is also smaller and smaller. The calculation formula is as follows:

the dynamic progressive search strategy controls the jump of greedy degree from the exploration stage to the utilization stage through the change of greedy parameter epsilon, and potential available actions are mined as widely as possible in the learning process, so that the algorithm can be prevented from falling into a local optimal state to a great extent.

The training process of DDOS is as follows: due to the cost model, the cost to be estimated can be promptly given in a short time, and although there is a case of partial deviation from the actual delay in terms of calculation accuracy, the calculated cost is still referenceable. In addition, in order to alleviate the estimated deviation caused by the cost model, the embodiment adds the training of the execution delay, and each query execution needs a long time to obtain the actual feedback, so that only after the cost training, the network weight of the output layer is reset by combining the data collected by the previous cost model training as a guide, the execution delay is used for fine tuning, and the training is retrained on a small amount of data (the query with the running time is actually executed). After training is completed, the model can be used for optimizing database query, SQL statement query information and connection operation information expressed by left and right side states are input into the model to enable the model to receive the connection state of the query, and then the model outputs a physical query execution plan and gives the physical query execution plan to an executor for execution, so that the database query is completed.

While the present invention has been described in detail with reference to the drawings and the embodiments, those skilled in the art will understand that various specific parameters in the above embodiments may be changed without departing from the spirit of the invention, and a plurality of specific embodiments are common variation ranges of the present invention, and will not be described in detail herein.

Claims

1. A database query optimization method is characterized in that the state and action information of database query are input into a dynamic double DQN network model, wherein the dynamic double DQN network model comprises an evaluation network and a target network, the evaluation network is used for mining actions corresponding to the maximum value of Q in the double DQN network, the target network is used for evaluating the maximum action value in the double DQN network, and Q (s', a ^* The method comprises the steps of carrying out a first treatment on the surface of the θ) and Q (s', a) ^* The method comprises the steps of carrying out a first treatment on the surface of the The linear relationship of the composition of θ -) to calculate the target value is as follows:

y ^WDDQN ＝r+γ[βQ(s',a ^* ；θ)+(1-β)Q(s',a ^* ；θ ^- )]

a ^* ＝arg maxQ(s',a；θ)

a _L ＝arg minQ(s',a,θ)

wherein c is a super parameter used for calculating the weight beta;

deriving a Q value from the cost model and the execution delay to act:

updated value of Q:

Q(s,a)←Q(s,a)+α[r+γmax _a' Q(s',a')-Q(s,a)]

L(θ)＝E[||r+γmax _a' Q(s',a'；θ)-Q(s,a；θ)||]。

2. the database query optimization method according to claim 1, wherein for the super parameter c, in the problems with different characteristics, the optimal value of c is different, the super parameter c affects the network allocation weight, the size of the super parameter is set according to the connection task characteristics, and the value of c is selected on the basis of multiple experiments.

3. The method of optimizing database query according to claim 1, wherein the state and action information of the database query are used as the input of the model, vectorization expression is adopted during query encoding, a tree structure characterization method is adopted during selection of the actions by the design agent, the connection query is represented by a query tree, the table-to-table connection is defined as adding to the connection tree, the leaf nodes are defined as adding to the connected table, and when a new leaf node is added, the connection is performed with the corresponding subtree in advance for retaining the sub-connection structure information between the tables.

4. The database query optimization method as claimed in claim 1, wherein the evaluation network selects actions by dynamic progressive search strategy, first evaluates uncertainty, uses uncertainty and unexplored actions as the key points of training period when unexplored, realizes the exploration to utilization stage by increasing epsilon value from 0 to 1, realizes dynamic switching between stages by adjusting parameter epsilon, and controls the expression of parameter epsilon change to be

5. The method of claim 4, wherein an adjustable threshold p is set, wherein U _i (a) The method is characterized in that the method is a boundary of action in an s state, when initial training data are fewer, the boundary is set relatively larger, estimation judgment of the action is more accurate along with the increase of training data, the accurate boundary is also smaller, and the calculation formula is as follows:

6. the database query optimization method of claim 1, wherein the model training stage resets the network weights of the output layer using execution delays for fine tuning only after cost training in combination with data collected by previous cost model training, and retrains the query data actually executed with runtime.

7. A database query optimization system, comprising:

the network environment module is used for data interaction inside the model;

8. An electronic device, comprising

At least one actuator; the method comprises the steps of,

a memory communicatively coupled to the at least one processor; wherein,,

the memory stores instructions executable by the at least one executor to enable the at least one executor to perform the database query optimization method of any of claims 1-6.

9. A storage medium, wherein instructions in the storage medium, when executed by an actuator of a computer, enable the computer to perform the database query optimization method of any one of claims 1-6.