CN114328669B - Deep learning-based automatic time sequence database index recommendation method and equipment - Google Patents

Deep learning-based automatic time sequence database index recommendation method and equipment

Info

Publication number
CN114328669B
Authority
CN
China
Prior art keywords
index
action
state
database
deep learning
Prior art date
Legal status
Active
Application number
CN202111662250.8A
Other languages
Chinese (zh)
Other versions
CN114328669A (en)
Inventor
王宏志
李同欣
张凯欣
郑博
梁栋
叶天生
燕钰
丁小欧
Current Assignee
Beijing Nosi Spacetime Technology Co ltd
Harbin Institute of Technology
Original Assignee
Beijing Nosi Spacetime Technology Co ltd
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Beijing Nosi Spacetime Technology Co ltd, Harbin Institute of Technology filed Critical Beijing Nosi Spacetime Technology Co ltd
Priority to CN202111662250.8A priority Critical patent/CN114328669B/en
Publication of CN114328669A publication Critical patent/CN114328669A/en
Application granted granted Critical
Publication of CN114328669B publication Critical patent/CN114328669B/en

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A deep learning-based automated time series database index recommendation method, storage medium, and device, belonging to the technical field of databases. No effective and feasible automated index recommendation method currently exists for time series databases. In this method, index recommendation for a time series database is performed through a reinforcement learning model whose agent is responsible for the decision-making process; the agent interacts with an environment model of the database, and the environment model computes the state transitions and the cost the agent incurs for each decision; a DBMS interface is responsible for executing index creation, upgrade, or deletion actions on the database and for obtaining statistics on the current index configuration. The method is mainly used for index recommendation in time series databases.

Description

Deep learning-based automatic time sequence database index recommendation method and equipment
Technical Field
The invention relates to a database index recommendation method, a storage medium, and a device, and belongs to the technical field of databases.
Background
The selection of indexes is very important to database technology, and the complexity of the problem is very high. Research on index selection dates back to the 1970s, and many algorithms based on a variety of methods have been developed since. The first algorithms were deletion-based (drop) algorithms, after which algorithms such as Extend, machine learning-based algorithms, DTA, and others appeared. Most algorithms were proposed in academic papers, and some are applied in commercial systems, such as AutoAdmin, DB2Advis, and DTA. In current research, index recommendation algorithms fall roughly into the following categories:
1. Iterative index-addition algorithms that start from an empty index configuration, such as AutoAdmin, DB2Advis, DTA, and Extend. The Extend algorithm by Schlosser et al., for example, iteratively selects compound indexes, focusing on performance improvement without limiting the set of index candidates a priori.
2. Algorithms that gradually delete indexes from a large initial index configuration, such as Drop and Relaxation. The Drop heuristic of Whang, for example, is one of the earliest index selection algorithms.
3. Algorithms based on linear programming, such as CoPhy: Dash et al. propose CoPhy, a sophisticated linear programming algorithm for the database index selection problem. Linear programming is a common method for solving optimization problems: the optimization objective and constraints are specified as linear equations, and an off-the-shelf solver then finds the optimal solution. The algorithm optimizes the solver input and discards invalid and suboptimal solutions as early as possible.
4. Deep learning-based algorithms, a new class that has emerged in recent years. Kraska et al. replace traditional index structures with machine-learned index structures, which perform well in some situations and outperform the traditional data structures used in current database management systems. Pavlo et al. developed Peloton, an architecture that automatically optimizes a database system for incoming workloads and prepares for likely future workloads.
However, because time series data differs in its characteristics from data in conventional relational databases (large write volume, little support for updates and deletions, etc.), it often cannot be maintained in a relational database and must instead be managed with a dedicated time series database. For time series databases, which have surged in popularity in recent years, no effective and feasible automated index recommendation method currently exists.
Disclosure of Invention
The invention addresses the problem that no effective and feasible automated index recommendation method currently exists for time series databases.
The deep learning-based automated time series database index recommendation method performs index recommendation for the time series database through a reinforcement learning model and specifically comprises the following:
the agent (RL Agent) of the reinforcement learning model is responsible for the decision-making process; the agent interacts with an environment model (Environment) of the database, and the environment model computes the state transitions and the cost incurred by the agent's decisions; a DBMS interface is responsible for executing index creation, upgrade, or deletion actions on the database and obtaining statistics on the current index configuration;
the environment model comprises states, actions, and costs;
a state is represented as a two-dimensional vector s ∈ {0, 1}^(I×C), whose two dimensions are the index number I and the column number C; the index number is the number of current indexes, and the column number is the number of columns;
an action is represented as a vector a ∈ {0, 1}^(C+1);
the action behaviors comprise: creating an index, upgrading an index, deleting an index, and no operation; in each action, a bit among the first C bits is 0 if the corresponding column is not indexed and 1 if it is; if the first C bits are all 0, a last bit of 0 indicates no operation and a last bit of 1 indicates deleting an index; if some of the first C bits are 1, a last bit of 0 indicates creating an index and a last bit of 1 indicates upgrading an index;
the cost function is as follows:
the estimated cost of the execution engine in CnosDB is used as the cost of a legal action and is obtained by executing the EXPLAIN command through a virtual index query; if the agent selects an illegal action in the current state space, a large penalty r_c is given so that the illegal action in that state is recorded;
cost function:

    r(s, a) = cost(s, a)   if mask(a) = 1
    r(s, a) = r_c          if mask(a) = 0

where mask is an array recording whether action a is legal: mask(a) = 1 if action a is legal in the current state, otherwise mask(a) = 0; cost(s, a) is the hypopg() estimate obtained by executing EXPLAIN.
Further, an index upgrade can only extend an existing index by one dimension.
Further, the action vector a is not recorded as a matrix but is encoded into a corresponding decimal number.
Further, this encoding is a digital compression: the action vector is treated as a binary number and converted into the corresponding decimal number. Further, after the vector is encoded as a decimal number, an array is used to record which actions are legal and which are illegal.
Further, the penalty r_c for an illegal action is set to 800.
Further, the reinforcement learning model is a Double DQN model.
Further, the reinforcement learning model for index recommendation of the time series database is trained in advance, and the training process comprises the following steps:
1. a new task is submitted to the DBMS, the state is initialized, two neural networks are initialized, and index recommendation starts;
2. starting from the current state, the deep learning agent selects an action according to experience or the greedy strategy of the neural networks;
3. the action is executed; its cost is obtained from whether the action is legal and from the estimated cost returned by spoofing the optimizer in CnosDB with a virtual index; the state after the action is observed, and the <state, action, cost, next state> tuple is stored in the replay buffer;
4. a small batch is sampled from the replay buffer, and the two neural networks are trained on it;
5. the current state is updated to the next state;
6. steps 2 to 5 are repeated until the neural networks are trained;
7. starting from the initial state, the Q values of all current actions are predicted with the two neural networks, the Q value being defined as the smaller of the two networks' predictions; the action with the largest Q value is selected greedily each time until no further improvement is possible, and the resulting action sequence gives the indexes to be created.
A storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the deep learning-based automated time series database index recommendation method.
An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to implement the deep learning-based automated time series database index recommendation method.
The beneficial effects of the invention are as follows:
the invention provides an effective and feasible automatic index recommendation method, and experimental results show that the query performance of the database configured with the corresponding index combination after training by the invention is always better than that of a comparable similar baseline algorithm. The agent of the invention can select a strategy with better performance under the condition of not occupying the space of index configuration too much. Meanwhile, because strategies of agents under databases of different sizes have similar performances, training time can be saved by a method of training under a small-scale database and then applying the method to a large-scale database under the condition that absolute accuracy of optimization is not pursued.
Drawings
FIG. 1 is a functional architecture diagram of an automated time series database index recommendation.
Detailed Description
The first embodiment is as follows:
This embodiment is a deep learning-based automated time series database index recommendation method. The recommendation process is realized by a reinforcement learning agent, an environment model of the time series database, and an interface that applies the actions selected by the agent to the database's DBMS;
the reinforcement learning agent (RL Agent) is responsible for the decision-making process, and the agent interacts with the environment model (Environment) of the database, which computes the state transitions and the costs the agent incurs for its decisions. To persist changes, a DBMS interface (DBMS interface) is responsible for executing index creation, upgrade, or deletion actions on the database and obtaining statistics on the current index configuration. The functional architecture is shown in FIG. 1, where Action a denotes the action a, State s the state s, and Reward r the reward r; Structured DB stats denotes structured database statistics, Index option the index options, Database stats the database statistics, and Changes to apply the changes to be applied.
Reinforcement learning agent:
The agent of the invention is realized with Double DQN. Algorithm 1 is a Q-learning reinforcement learning algorithm that uses neural networks for function approximation, with an experience pool for experience replay. The neural networks approximate the Q values of actions and are trained by experience replay, sampling memories at random from the experience pool. In each iteration, the agent performs one action transition in the environment: it selects an action through the empirical method or the neural network, applies the action to the environment, the environment returns a cost and the next state, and finally the action, state, and cost are stored in the experience pool for later replay.
The empirical method sets a window value H and returns a creation or upgrade action over the H most recently used columns, or a deletion action over existing columns. Since the trial-and-error action space is huge, to avoid selecting a large number of invalid actions in the initial stage of iteration, the agent initially tends to choose actions by the empirical method, obtaining actions from the neural network's greedy strategy with only a small probability. As iteration proceeds and the neural network learns more, the probability of using the network grows, and eventually the network-based policy dominates, as the sketch below illustrates.
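As a concrete illustration of this schedule, the following sketch (Python) chooses between the empirical window and the network's greedy pick with a probability that shifts toward the network over time; the decay rate, the default H = 5, and all helper names are assumptions, not taken from the patent:

    import math
    import random

    def choose_action(state, step, legal_actions, recent_cols, greedy_action,
                      h=5, rate=1e-3):
        # Probability of trusting the network grows toward 1 with the step count.
        p_network = 1.0 - math.exp(-rate * step)
        if random.random() < p_network:
            return greedy_action(state)  # network's greedy pick
        # Empirical method: prefer create/upgrade actions touching the last H
        # used columns, or deletions of existing indexes.
        window = set(recent_cols[-h:])
        empirical = [(op, cols) for (op, cols) in legal_actions
                     if op == "delete" or set(cols) & window]
        return random.choice(empirical or legal_actions)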
Meanwhile, the invention accounts for a known defect of DQN itself: because the Q value is estimated from the maximum Q value of the next state s', and the estimate for the next state in turn depends on Q values further ahead, Q values tend to be overestimated. To alleviate this, the invention adopts a simple idea: two Q networks are trained, and since their parameters differ, their evaluations of the same action also differ; the smaller of the two Q values is therefore selected each time. Combining the two networks' errors in this way effectively curbs the upward drift of the Q values and makes the predictions more reasonable.
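A minimal sketch of this min-of-two-networks target (PyTorch): the discount factor and the convention that costs have been negated into rewards are assumptions, not specified in the patent:

    import torch

    def td_target(reward, next_state, q_net_a, q_net_b, gamma=0.99):
        # Evaluate the next state with both Q networks and keep the smaller
        # estimate per action, which curbs the upward drift of Q values.
        with torch.no_grad():
            q_a = q_net_a(next_state)        # per-action Q estimates, net A
            q_b = q_net_b(next_state)        # per-action Q estimates, net B
            q_min = torch.minimum(q_a, q_b)  # pessimistic combined estimate
            return reward + gamma * q_min.max(dim=-1).values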
(Algorithm 1: Double DQN training with experience replay; given as an image in the original document.)
Environment:
The environment is responsible for computing state transitions in the algorithm and for recording actions. To realize state transitions, the invention implements an environment model of the database, models (state, action, cost) according to the database's index features, and implements a transition function that changes the state according to the action selected by the agent. The environment feeds the cost of each action back to the agent together with the next state; through this feedback, the agent learns the optimal action for each state. The quality of the environment description is critical to the algorithm: the state must represent the current condition of the database well, encoding neither too much nor too little information, and the environment must map actions to state transitions correctly.
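The environment's contract can be sketched as follows (Python); the class and parameter names are illustrative placeholders, with the database-backed transition and cost functions injected rather than implemented here:

    class Environment:
        # Applies the agent's action via a transition function and feeds the
        # cost back together with the next state.
        def __init__(self, initial_state, transition, cost_of):
            self.state = initial_state
            self.transition = transition  # (state, action) -> next state
            self.cost_of = cost_of        # (state, action) -> cost (EXPLAIN estimate or penalty)

        def step(self, action):
            cost = self.cost_of(self.state, action)
            self.state = self.transition(self.state, action)
            return self.state, cost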
State representation:
The state is the agent's formal representation of the environment information during learning; a proper state representation is therefore important to the whole algorithm. The amount of information in a state encoding strongly affects the reinforcement learning agent: if the encoding carries too little, the agent cannot learn the environment; if too much, the agent has difficulty converging. For the database optimization task of the invention, a state is represented as a two-dimensional vector s ∈ {0, 1}^(I×C), whose two dimensions are the index number I and the column number C; the index number is the number of current indexes, and the column number is the number of columns.
For example, for a data table with 4 columns,
when using an index for a query, an index is expressed as: 10 1 0;
when two indexes are used for query, the two indexes are expressed as:
Figure GDA0003889023970000053
wherein, the first index is built on the 2 nd column, the second index is built on the 1 st column and the 3 rd column, and the 1 st column and the 3 rd column are corresponding compound indexes; according to the number of the index and the number of the current column (i.e. two-dimensional vector +.>
Figure GDA0003889023970000061
) The corresponding state can be determined;
The action representation:
In the context of the invention, an action is defined as a vector a ∈ {0, 1}^(C+1). Since support for compound index recommendation is explored, each action of the invention has four possible behaviors: creating an index, upgrading an index, deleting an index, and no operation, where an upgrade can only extend an existing index by one dimension. In each action, a bit among the first C bits is 0 if the corresponding column is not indexed and 1 if it is; if the first C bits are all 0, a last bit of 0 indicates no operation and a last bit of 1 indicates deleting an index; if some of the first C bits are 1, a last bit of 0 indicates creating an index and a last bit of 1 indicates upgrading an index. Under this coding scheme the zero vector represents no operation. For example, when C is 4, 00000 indicates no operation, 01000 indicates creating an index on the second column, and 01001 indicates upgrading an index with the second column. This arrangement not only makes data processing more efficient but also lets the reinforcement learning process determine action operations quickly.
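The decoding of this scheme can be sketched as follows (Python; the function name is an assumption), reproducing the C = 4 examples above:

    def interpret_action(bits):
        # First C bits mark the columns involved; the last bit selects the
        # operation, as described above.
        cols = [i + 1 for i, b in enumerate(bits[:-1]) if b]  # 1-based columns
        if not cols:
            return ("delete index", cols) if bits[-1] else ("no operation", cols)
        return ("upgrade index", cols) if bits[-1] else ("create index", cols)

    assert interpret_action([0, 0, 0, 0, 0]) == ("no operation", [])
    assert interpret_action([0, 1, 0, 0, 0]) == ("create index", [2])
    assert interpret_action([0, 1, 0, 0, 1]) == ("upgrade index", [2])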
In practical application of the present invention, in order to save the space cost of the algorithm, the present invention does not record the vector in a matrix manner, but encodes the vector into corresponding decimal numbers, and simultaneously uses an array to store which actions are legal and which actions are illegal. Legal and illegal is from the nature of the index itself, such as deleting an index that does not exist, creating an existing index, upgrading an index that does not exist, etc., corresponding to illegal actions of the following cost function.
The invention implements an encoding and decoding method that quickly converts decimal numbers into actual actions. Although this encoding has little effect on the results of algorithm execution, in practical testing it reduces training time significantly. In practice it is simply digital compression: storing 00001110 as a matrix, for example, requires 8 slots, whereas a single int stores it as the binary number 0b00001110.
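A minimal sketch of this compression (Python; the function names are assumptions) packs the bit vector into one int and unpacks it again:

    def encode_action(bits):
        # Read the action vector as a binary number, e.g. 00001110 -> 14.
        code = 0
        for b in bits:
            code = (code << 1) | b
        return code

    def decode_action(code, length):
        # Recover the fixed-length bit vector from its integer code.
        return [(code >> (length - 1 - i)) & 1 for i in range(length)]

    assert encode_action([0, 0, 0, 0, 1, 1, 1, 0]) == 0b00001110
    assert decode_action(0b00001110, 8) == [0, 0, 0, 0, 1, 1, 1, 0]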
Cost function:
A suitable cost function is critical to the agent's learning. The invention expects its cost function to give the agent accurate feedback: whether the index combination reached by the currently executed action improves database performance, and whether a currently executed illegal action teaches the agent that the action is illegal in this state. To obtain the cost accurately, the invention uses the estimated cost of the execution engine in CnosDB as the cost of a correct action. In a time series database, when a specific statement is executed, the execution engine estimates the statement's execution time in order to find the best execution plan and efficiency. An index can be created virtually, and the statement's estimated execution time is obtained through the EXPLAIN function without actually executing the statement; this estimate is taken as the cost of a legal action.
At the same time, an executed action may turn out to be illegal after verification, such as upgrading an index that does not exist, creating an index that already exists in the state, or deleting a group of indexes that does not exist. If the agent selects such an action in the current state space, the invention imposes a large penalty on it so that the illegal action in that state is recorded.
The cost function implemented by the algorithm is as follows:
Figure GDA0003889023970000071
where mask is an array that records whether action a is legal, if action a is legal in the current state, mask (a) = 1, otherwise mask (a) = 0. Meanwhile, according to experience of a test in an actual experiment, the cost from hypopg () obtained by performing expain is generally in the range of [10,200], so the penalty of illegal action is set to 800. Since the present invention expects that the penalty of illegal actions should be more than twice that of normal cases, the penalty is set to a number much greater than the maximum cost.
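In code, the cost function reduces to a masked lookup; in the sketch below (Python), the explain_cost callable stands in for the CnosDB EXPLAIN/hypopg estimate and is an assumption:

    R_C = 800  # penalty for illegal actions; legal costs land in roughly [10, 200]

    def cost(state, action, mask, explain_cost):
        if mask[action] == 1:                   # legal in the current state
            return explain_cost(state, action)  # optimizer's estimate via virtual index
        return R_C                              # illegal: large recorded penalty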
Description of the flow:
1. A new task is submitted to the DBMS, the state is initialized, two neural networks are initialized, and index recommendation starts.
2. Starting from the current state, the deep learning agent selects an action according to experience or the greedy strategy of the neural networks, favoring experience in the early stage and the networks' greedy strategy later.
3. The action is executed; its cost is obtained from whether the action is legal and from the estimated cost returned by spoofing the optimizer in CnosDB with a virtual index; the state after the action is observed, and the <state, action, cost, next state> tuple is stored in the replay buffer.
4. A small batch is sampled from the replay buffer, and the two neural networks are trained on it.
5. The current state is updated to the next state.
6. Steps 2 to 5 are repeated until the neural networks are trained.
7. Starting from the initial state, the Q values of all current actions are predicted with the two neural networks, the Q value being defined as the smaller of the two networks' predictions; the action with the largest Q value is selected greedily each time until no further improvement is possible, and the resulting action sequence gives the indexes to be created.
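Step 7 can be sketched as a greedy rollout (Python); stopping when the no-op action wins is one reading of "cannot be improved any more", and all helper names are assumptions:

    NO_OP = 0  # encoded zero vector = "no operation"

    def recommend_indexes(initial_state, actions, q_value_a, q_value_b, transition):
        # From the initial state, repeatedly take the action whose Q value
        # (the smaller of the two networks' predictions) is largest.
        state, chosen = initial_state, []
        while True:
            q_min = {a: min(q_value_a(state, a), q_value_b(state, a)) for a in actions}
            best = max(q_min, key=q_min.get)
            if best == NO_OP:
                break
            chosen.append(best)
            state = transition(state, best)
        return chosen  # the action sequence defines the indexes to create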
The key points of the invention are as follows: the invention models the index selection problem as a Markov decision process and designs specialized encodings of index states and index recommendation actions for the CnosDB time series database. Through these encodings and the cost function, the index selection problem is converted into a reinforcement learning problem; feedback obtained through virtual indexes assists neural network training, and a set of optimized indexes is found. Meanwhile, to accommodate a larger state space and reduce the estimation bias of the cost function, the invention trains with the Double DQN algorithm, which effectively improves the quality of index recommendation.
For the method of the invention, experimental results show that the query performance of a database configured with the corresponding index combination after training is consistently better than that of comparable baseline algorithms. The agent of the invention can select a well-performing strategy without the index configuration occupying excessive space. Moreover, because the agent's strategies perform similarly across databases of different sizes, training time can be saved by training on a small-scale database and then applying the result to a large-scale database when absolute optimality is not required. In conclusion, the algorithm of the invention has practical value in real production.
The second embodiment is as follows:
This embodiment is a storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the deep learning-based automated time series database index recommendation method.
This embodiment includes, but is not limited to, a hard disk storing the commands corresponding to the deep learning-based automated time series database index recommendation method.
The third embodiment is as follows:
This embodiment is an apparatus comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to implement the deep learning-based automated time series database index recommendation method.
This embodiment includes, but is not limited to, PCs, workstations, and mobile devices that execute the commands corresponding to the deep learning-based automated time series database index recommendation method.
The present invention is capable of other embodiments, and its details may be modified and varied in many ways, as will be apparent to those skilled in the art, without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A deep learning-based automated time series database index recommendation method, characterized by comprising the following:
the agent (RL Agent) of the reinforcement learning model is responsible for the decision-making process; the agent interacts with an environment model (Environment) of the database, and the environment model computes the state transitions and the cost incurred by the agent's decisions; a DBMS interface is responsible for executing index creation, upgrade, or deletion actions on the database and obtaining statistics on the current index configuration;
the environment model comprises states, actions, and costs;
a state is represented as a two-dimensional vector s ∈ {0, 1}^(I×C), whose two dimensions are the index number I and the column number C; the index number is the number of current indexes, and the column number is the number of columns;
an action is represented as a vector a ∈ {0, 1}^(C+1);
the action behaviors comprise: creating an index, upgrading an index, deleting an index, and no operation; in each action, a bit among the first C bits is 0 if the corresponding column is not indexed and 1 if it is; if the first C bits are all 0, a last bit of 0 indicates no operation and a last bit of 1 indicates deleting an index; if some of the first C bits are 1, a last bit of 0 indicates creating an index and a last bit of 1 indicates upgrading an index;
the cost function is as follows:
the estimated cost of the execution engine in CnosDB is used as the cost of a legal action and is obtained after executing the EXPLAIN command through a virtual index query; if the agent selects an illegal action in the current state space, a large penalty r_c is given so that the illegal action in that state is recorded;
cost function:

    r(s, a) = cost(s, a)   if mask(a) = 1
    r(s, a) = r_c          if mask(a) = 0

where mask records whether action a is legal in the current state;
the reinforcement learning model for index recommendation of the time series database is trained in advance, and the training process comprises the following steps:
1. a new task is submitted to the DBMS, the state is initialized, two neural networks are initialized, and index recommendation starts;
2. starting from the current state, the deep learning agent selects an action according to experience or the greedy strategy of the neural networks;
3. the action is executed; its cost is obtained from whether the action is legal and from the estimated cost returned by spoofing the optimizer in CnosDB with a virtual index; the state after the action is observed, and the <state, action, cost, next state> tuple is stored in the replay buffer;
4. a small batch is sampled from the replay buffer, and the two neural networks are trained on it;
5. the current state is updated to the next state;
6. steps 2 to 5 are repeated until the neural networks are trained;
7. starting from the initial state, the Q values of all current actions are predicted with the two neural networks, the Q value being defined as the smaller of the two networks' predictions; the action with the largest Q value is selected greedily each time until no further improvement is possible, and the resulting action sequence gives the indexes to be created.
2. The deep learning-based automated time series database index recommendation method of claim 1, wherein an index upgrade extends an existing index by only one dimension.
3. The deep learning-based automated time series database index recommendation method of claim 2, wherein the action vector a is not recorded as a matrix but is encoded into a corresponding decimal number.
4. The method of claim 3, wherein the encoding is a digital compression: in encoding the vector into the corresponding decimal number, the action vector is treated as a binary number and then converted into the corresponding decimal number.
5. The method of claim 3, wherein, after the vector is encoded into the corresponding decimal number, an array is used to record which actions are legal and which are illegal.
6. The deep learning-based automated time series database index recommendation method of claim 5, wherein the penalty r_c for an illegal action is set to 800.
7. The deep learning-based automated time series database index recommendation method according to any one of claims 1 to 6, wherein the reinforcement learning model is a Double DQN model.
8. A storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the deep learning-based automated time series database index recommendation method of any one of claims 1 to 7.
9. An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to implement the deep learning-based automated time series database index recommendation method of any one of claims 1 to 6.
CN202111662250.8A 2021-12-30 2021-12-30 Deep learning-based automatic time sequence database index recommendation method and equipment Active CN114328669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111662250.8A CN114328669B (en) 2021-12-30 2021-12-30 Deep learning-based automatic time sequence database index recommendation method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111662250.8A CN114328669B (en) 2021-12-30 2021-12-30 Deep learning-based automatic time sequence database index recommendation method and equipment

Publications (2)

Publication Number Publication Date
CN114328669A CN114328669A (en) 2022-04-12
CN114328669B true CN114328669B (en) 2023-05-16

Family

ID=81020559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111662250.8A Active CN114328669B (en) 2021-12-30 2021-12-30 Deep learning-based automatic time sequence database index recommendation method and equipment

Country Status (1)

Country Link
CN (1) CN114328669B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146141A (en) * 2022-07-18 2022-10-04 上海跬智信息技术有限公司 Index recommendation method and device based on data characteristics

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360497A (en) * 2021-05-26 2021-09-07 华中科技大学 Multi-load-oriented automatic recommendation method and system for secondary indexes of cloud database

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109413707B (en) * 2018-08-03 2021-10-08 南京工业大学 Intelligent routing method based on deep reinforcement learning technology in wireless network environment
WO2020254400A1 (en) * 2019-06-17 2020-12-24 Deepmind Technologies Limited Robust reinforcement learning for continuous control with model misspecification
CN112712385B (en) * 2019-10-25 2024-01-12 北京达佳互联信息技术有限公司 Advertisement recommendation method and device, electronic equipment and storage medium
US11403525B2 (en) * 2020-06-01 2022-08-02 Dell Products, L.P. Using reinforcement learning to dynamically tune cache policy parameters
US20210397959A1 (en) * 2020-06-22 2021-12-23 Google Llc Training reinforcement learning agents to learn expert exploration behaviors from demonstrators
CN111723076A (en) * 2020-06-24 2020-09-29 苏州松鼠山人工智能科技有限公司 Method and device for generating database index
CN113157694A (en) * 2021-03-22 2021-07-23 浙江大学 Database index generation method based on reinforcement learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360497A (en) * 2021-05-26 2021-09-07 华中科技大学 Multi-load-oriented automatic recommendation method and system for secondary indexes of cloud database

Also Published As

Publication number Publication date
CN114328669A (en) 2022-04-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant