CN112347104B - Column storage layout optimization method based on deep reinforcement learning - Google Patents

Column storage layout optimization method based on deep reinforcement learning

Info

Publication number
CN112347104B
CN112347104B
Authority
CN
China
Prior art keywords
columns
query
column
data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011228158.6A
Other languages
Chinese (zh)
Other versions
CN112347104A (en
Inventor
覃雄派
陈跃国
杜小勇
赵丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202011228158.6A priority Critical patent/CN112347104B/en
Publication of CN112347104A publication Critical patent/CN112347104A/en
Application granted granted Critical
Publication of CN112347104B publication Critical patent/CN112347104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a column storage layout optimization method based on deep reinforcement learning, which comprises the following steps: receiving a query load; analyzing the query load to generate query characteristics; acquiring feature data of the data columns according to the query characteristics; determining the output sequence of the columns based on a strategy for the output sequence of the columns and the feature data of the data columns; performing a quantitative evaluation of the output sequence, wherein the quantitative evaluation strategy is adjusted based on rewards from the system; and adjusting the strategy for the output sequence of the columns according to the quantitative evaluation result. With the present application, the model parameters can be continuously adjusted in the direction expected to reduce disk skip time, so that the neural network automatically learns the optimal column ordering from the feature data of the columns; incremental training is supported, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.

Description

Column storage layout optimization method based on deep reinforcement learning
Technical Field
The application relates to the field of computers, in particular to a column storage layout optimization method based on deep reinforcement learning, which mainly performs layout optimization on column storage of big data so as to improve data reading performance.
Background
OLAP (On-Line Analytical Processing) over relational data plays a vital role in many analysis and decision-support applications. In the big data age, many big data analysis systems, such as Hive and Spark SQL, use HDFS (Hadoop Distributed File System) as the underlying storage; a large amount of data is continuously accumulated and stored on HDFS, and the real-time requirements of data analysis grow ever higher. HDFS, as the de facto standard for low-cost distributed big data storage and processing, provides fault-tolerant, portable, scalable, high-throughput unified data storage for big data analysis systems. Big data analysis systems on HDFS are typically used to support batch and interactive query analysis over massive data.
In these systems, data tables are typically stored in a column storage format such as RCFile, ORC, Parquet, or CarbonData. Column storage provides flexible and efficient data encoding and compression and makes it possible to read only the necessary data columns, avoiding unnecessary I/O; nevertheless, we have found that query analysis performance over data on HDFS can be further improved by optimizing the storage layout. When a query accesses data columns within a horizontal slice of an HDFS data block, multiple disk skips are required, and an optimal column order can minimize the disk skip cost. The column ordering problem has been proven NP-Hard in the academic literature. How to design an efficient column ordering algorithm that finds a near-optimal column order for a given query load is therefore a challenge. Existing heuristic search is strongly random in its optimization, easily falls into suboptimal solutions, and must recompute the column ordering at every optimization, incurring a high computational cost.
Disclosure of Invention
The present application has been made in view of the above problems, and it is an object of the present application to provide a solution to overcome or at least partially solve the above problems. Accordingly, in one aspect of the present application, there is provided a column storage layout optimization method based on deep reinforcement learning, the method comprising:
receiving a query load;
analyzing the query load to generate query characteristics;
acquiring feature data of a data column according to the query feature;
determining the output sequence of the columns based on the strategy of the output sequence of the columns and the characteristic data of the data columns;
performing quantitative evaluation on the output sequence, wherein the quantitative evaluation strategy is adjusted based on rewards of the system;
and a strategy for adjusting the output sequence of the columns according to the quantitative evaluation result.
Optionally, the output sequence is quantitatively evaluated according to the disk skip time.
Optionally, the strategy for the output sequence of the columns under deep reinforcement learning is implemented using an Actor-Critic algorithm, and the quantitative evaluation strategy is adjusted based on rewards from the system, which includes adjusting parameters in the Critic neural network according to the rewards given by the system.
Optionally, a Pointer Net neural network is used to make the output-sequence decision, including the mapping from one sequence to another.
Optionally, determining the output order of the columns based on the policy of the output order of the columns and the characteristic data of the data columns includes:
obtaining the weight associated with each position of the input sequence by using an attention mechanism;
and combining the input sequence with the weight to calculate the element with the maximum relation between the current output and the input sequence, and taking the element of the input sequence as an output element.
Optionally, the method further comprises: the method for uniformly coding the input query load specifically comprises the following steps:
initializing each query in the input query load to a set;
determining a corresponding column access characteristic for each query;
and carrying out binary coding on the elements in the set corresponding to the query according to the column access characteristics.
The technical solution provided by the application has at least the following technical effects or advantages: the application implements a column storage layout optimization method based on deep reinforcement learning and, in experimental comparison with existing heuristic column ordering algorithms, further reduces the disk skip cost. The model parameters can be continuously adjusted in the direction expected to reduce disk skip time, so that the neural network automatically learns the optimal column ordering from the feature data of the columns. Incremental training is supported: the latest query load is fed directly into the model, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.
The foregoing description is only an overview of the technical solutions of the present application, and may be implemented according to the content of the specification in order to make the technical means of the present application more clearly understood, and in order to make the technical solutions of the present application and the objects, features and advantages thereof more clearly understood, the following specific embodiments of the present application will be specifically described.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a column storage layout optimization method based on deep reinforcement learning according to the present application;
FIG. 2 shows a map of a skip cost model for a disk in a wide-table based column storage layout optimization scheme;
FIG. 3 illustrates an overall framework of the deep reinforcement learning based column storage layout optimization proposed by the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
In big data analysis systems, I/O is often the main performance bottleneck, and the design and optimization of the storage system is critical to improving big data analysis performance. In terms of data organization, column storage (e.g., ORC, Parquet) provides flexible and efficient data encoding and compression and makes it possible to read only the necessary data columns, thus avoiding unnecessary I/O. Under a column storage layout, how to adjust the physical data layout to suit changing query loads and system environments is a challenge. The application aims to design and implement, in a Hadoop environment, a column storage layout optimization method based on deep reinforcement learning, DRL-COA (Deep Reinforcement Learning based Column Ordering Algorithm), applied to adaptive column storage layout optimization; compared with existing heuristic column ordering algorithms, the method further reduces the read cost of the disk.
In the DRL-COA provided by the application, reinforcement learning is performed using an Actor-Critic model, and a Pointer Net network structure is applied in the Actor neural network. The Actor continuously outputs new actions (column orders) from the initial input, while the Critic neural network evaluates each action according to the 'benefit' obtained after it is taken, so that new actions are continuously selected. Since an action is a column order, the choice of the column at each position matters; here, the weight relating the element at a given output position to each position of the input sequence is obtained through the attention mechanism, and the selection is made accordingly.
In one aspect of the present application, there is provided a column storage layout optimization method based on deep reinforcement learning, which solves a sequential decision problem of columns by using a deep reinforcement learning technique, optimizes a storage layout by training a model, and in particular, as shown in fig. 1, the method includes:
receiving a query load;
analyzing the query load to generate query characteristics;
acquiring feature data of a data column according to the query feature;
determining the output sequence of the columns based on the strategy of the output sequence of the columns and the characteristic data of the data columns;
performing quantitative evaluation on the output sequence, wherein the quantitative evaluation strategy is adjusted based on rewards of the system;
and a strategy for adjusting the output sequence of the columns according to the quantitative evaluation result.
As a preferred embodiment, the strategy of implementing the output sequence of the deep reinforcement learning column by adopting an Actor-Critic algorithm, and the quantitative evaluation strategy is adjusted based on the rewards of the system, wherein the quantitative evaluation strategy comprises the adjustment of parameters in the Critic neural network according to the rewards given by the system.
The specific process implemented by the Actor-Critic algorithm is described in detail below. The input to the DRL-COA network model in this scheme can be represented as a 1×n matrix [c_1, c_2, ..., c_n], where c_i indicates whether the corresponding data column appears in the query (1 for present, 0 for absent) and n denotes the number of queries; this input is a mathematical representation of the query load Q. The output of the Actor-Critic model is the order of the data columns, denoted O (Order). Meanwhile, the output of each iteration of the model is evaluated using the disk skip time. The flow is as follows:
(1) According to the current state, the Actor outputs a column ordering as its action;
(2) The Critic scores the action just taken, based on the state and the action;
(3) According to the Critic's score, the Actor adjusts its current policy (i.e., the parameters in the Actor neural network) and executes the next action;
(4) The Critic likewise adjusts its current scoring policy (i.e., the parameters in the Critic neural network) according to the reward given by the system;
(5) Initially, the Actor acts randomly and the Critic scores randomly. However, because of the reward feedback, the Critic's scoring becomes more and more accurate, and the Actor in turn performs better and better.
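The five steps above can be sketched as a single training iteration. The `actor`, `critic`, and `skip_time` objects and their method names below are illustrative assumptions, not interfaces from the patent:

```python
# Schematic sketch of one Actor-Critic iteration (names are hypothetical).
# `actor` maps the encoded load to a column order; `critic` scores a
# (state, action) pair; `skip_time` plays the role of the system reward.
def train_step(actor, critic, skip_time, state):
    order = actor.act(state)              # (1) actor outputs a column order
    score = critic.score(state, order)    # (2) critic scores the action taken
    reward = -skip_time(order)            # lower skip time -> higher reward
    actor.update(state, order, score)     # (3) actor adjusts its policy
    critic.update(state, order, reward)   # (4) critic adjusts its scoring
    return order, reward                  # (5) repeat: both improve over time
```

Repeated calls to `train_step` correspond to step (5): both networks start out random and improve as rewards accumulate.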
In this method, the output order is preferably quantitatively evaluated based on the disk skip time. The stochastic policy of the network model, p_θ(o|c), can be expressed as follows: when the input is c (columns) and the output is o (order), the model evaluation SC(o|c) is the corresponding disk skip time. The goal of model training is that the smaller the evaluated value SC(o|c), the greater the probability with which the output o is selected. During training, the output of each iteration of the model is evaluated by the disk skip time; this design ensures that the model parameters are continuously adjusted in the direction expected to reduce the disk skip time.
FIG. 2 shows the disk skip cost model in a wide-table-based column storage layout optimization scheme. In this scheme, a disk-based skip time cost model is designed for the characteristics of data access on traditional disks: skip operations of equal distance are executed multiple times within a series of HDFS files, and the average skip time at skip distance d is taken as the statistical skip cost corresponding to d. After the skip costs at different skip distances have been obtained, a piecewise skip cost function is constructed by linear fitting. FIG. 2 shows the skip cost functions obtained with this method on three different types of disks.
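A minimal sketch of how such a piecewise skip cost function might be built from measurements, assuming linear interpolation between a few measured (distance, average time) points; all numbers below are invented for illustration:

```python
import bisect

def make_skip_cost(distances, avg_times):
    """distances: sorted skip distances; avg_times: measured average
    skip time at each distance. Returns cost(d) by piecewise-linear
    interpolation, clamped at the two ends."""
    def cost(d):
        if d <= distances[0]:
            return avg_times[0]
        if d >= distances[-1]:
            return avg_times[-1]
        i = bisect.bisect_right(distances, d)   # segment containing d
        d0, d1 = distances[i - 1], distances[i]
        t0, t1 = avg_times[i - 1], avg_times[i]
        return t0 + (t1 - t0) * (d - d0) / (d1 - d0)
    return cost

# Hypothetical measurements: skip distance (bytes) -> average time (ms)
cost = make_skip_cost([1024, 65536, 1048576], [0.1, 0.4, 2.0])
print(cost(1024))    # 0.1 (a measured point)
print(cost(33280))   # 0.25 (midway between the first two points)
```

In practice the measured points would come from timing repeated equal-distance skips on the target disk, as the paragraph above describes.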
In this patent, the method adopts the Actor-Critic algorithm to implement deep reinforcement learning, which can both handle discrete values and perform single-step updates. By comparison, value-function-based algorithms can only handle discrete values, although they can update at every step of an episode, while policy-based algorithms can handle both discrete and continuous values but cannot update until the end of each episode. With the Actor-Critic deep learning algorithm adopted in the application, the method can handle discrete values and update at every single step.
FIG. 3 is a diagram of the overall framework of the deep reinforcement learning based column storage layout optimization proposed by the application. The figure shows the components of the DRL-COA model, from the collection and analysis of the query load to the feature input and training of the model. The agent (a deep learning agent) learns the feature data of the columns through continuous interaction with the environment, and the Critic neural network evaluates the output of each iteration of the Actor neural network according to a skip estimator component, so that the model parameters can be continuously adjusted in the direction expected to reduce disk skip time. Compared with the stochastic optimization of heuristic search, this is more directed and less likely to fall into suboptimal solutions; at the same time, as a deep reinforcement learning model, the column ordering does not need to be recalculated at each optimization, and the incremental training mode greatly reduces the computational cost.
In this patent, determining the output order of the columns based on the policy of the output order of the columns, the characteristic data of the data columns may include:
obtaining the weight associated with each position of the input sequence by using an attention mechanism;
and combining the input sequence with the weight to calculate the element with the maximum relation between the current output and the input sequence, and taking the element of the input sequence as an output element.
Preferably, a Pointer Net neural network is used to make the output-sequence decision, including the mapping from one sequence to another.
The scheme treats the column ordering problem by analogy with a related combinatorial optimization problem: decisions between sequences must be made to adjust the column order. The DRL-COA model adopts a Pointer Net neural network to solve the sequence decision problem in column ordering, i.e., the mapping from one sequence to another. Meanwhile, when computing the output sequence, the attention mechanism is used to obtain the weight between an element at a given output position and each position of the input sequence, and the input sequence is then combined with these weights to influence the output. In this way, the input element most strongly related to the current output can be computed and taken as the output element; each output element points to an input element like a pointer. This design ensures that each input element is pointed to by exactly one output element, so that no input element appears more than once.
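The pointer selection step described above can be sketched as follows, under the assumption of dot-product attention scores and a mask that forbids re-selecting columns already output (the patent does not give the exact network details, so this is an illustrative simplification):

```python
import math

def pointer_step(decoder_state, encoder_states, selected):
    """Return (index, weights): the chosen input position and its
    attention weights. States are plain float vectors; `selected`
    holds indices already output, which are masked out so each
    input element is pointed to exactly once."""
    scores = []
    for i, h in enumerate(encoder_states):
        if i in selected:
            scores.append(float("-inf"))  # forbid repeated columns
        else:
            scores.append(sum(a * b for a, b in zip(decoder_state, h)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights

# Three encoder positions (columns); position 2 is already selected.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, w = pointer_step([0.5, 1.0], enc, selected={2})
print(idx)  # position 1 scores highest among the unmasked positions
```

Calling `pointer_step` once per output position, adding each chosen index to `selected`, yields a permutation of the input columns — the column order the Actor emits.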
In the present application, the input query load samples are encoded before model training. This is because the columns accessed by a query may be only a subset of the columns, while the output is the set of all columns; the Pointer Net network structure requires that the content of the output sequence be identical to the content of the input sequence, with only the order changed, and this requirement can be met through input encoding. In FIG. 3, C1, C2, C3, C4, C5 are the data columns input to the encoder, and <g>, C4, C5, C1, C2 are the data columns output by the decoder.
Assuming that the number of queries in the load Q is N and the length of the accessed data column set is n, we initialize each query q to a set N' = {c_1 = 0, c_2 = 0, ..., c_n = 0}.
Thus, the input query load is uniformly encoded, which may specifically include:
initializing each query in the input query load to a set;
determining a corresponding column access characteristic for each query; specifically, for each query q in load Q (involving only m data columns), its column access characteristic is C_q = {c_q,1, c_q,2, ..., c_q,m}.
The elements in the set corresponding to the query are binary-coded according to the column access characteristic: specifically, the positions in N' indexed by the data columns {1, 2, ..., m} accessed by query q are set to 1, while the other positions remain 0 (indicating that query q does not access those columns). In this way, the load is uniformly encoded into {1, 0, ..., 1} patterns.
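As a concrete sketch of this encoding, each query becomes a length-n binary vector with 1 at the positions of the columns it accesses (function and variable names here are illustrative, not from the patent):

```python
def encode_load(queries, n_columns):
    """Encode a query load as binary vectors.
    queries: one set of 0-based accessed column indices per query.
    Returns a list of length-n_columns 0/1 vectors (one per query)."""
    load = []
    for accessed in queries:
        vec = [0] * n_columns        # initialize N' with all c_i = 0
        for col in accessed:         # mark the columns the query touches
            vec[col] = 1
        load.append(vec)
    return load

# Example: 5 columns; two queries accessing {0, 3} and {1, 2, 4}
print(encode_load([{0, 3}, {1, 2, 4}], 5))
# [[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]]
```

Stacking these vectors gives the matrix representation of the load Q that is fed to the DRL-COA model.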
According to this method, the input load samples are effectively encoded through an Actor-Critic deep reinforcement learning algorithm, a Pointer Net neural network, an attention mechanism, and a disk skip cost simulation; the column order is taken as the output, and the disk skip cost is used to evaluate the output of each model iteration, so that the model parameters can be continuously adjusted in the direction expected to reduce disk skip time. Under this implementation, the neural network automatically learns the optimal column ordering from the feature data of the columns, incremental training of the DRL-COA model is possible, the latest query load is fed directly into the model, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.
The technical solution provided by the application has at least the following technical effects or advantages: the application implements a column storage layout optimization method based on deep reinforcement learning and, in experimental comparison with existing heuristic column ordering algorithms, further reduces the disk skip cost. The model parameters can be continuously adjusted in the direction expected to reduce disk skip time, so that the neural network automatically learns the optimal column ordering from the feature data of the columns. Incremental training is supported: the latest query load is fed directly into the model, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.

Claims (2)

1. A column storage layout optimization method based on deep reinforcement learning is characterized by comprising the following steps:
receiving a query load;
analyzing the query load to generate query characteristics, which specifically comprises the following steps: the query load is uniformly coded, specifically: initializing each query in the query load to a set; determining a corresponding column access characteristic for each query; performing binary coding on elements in the set corresponding to the query according to the column access characteristics;
acquiring feature data of a data column according to the query feature;
adopting an Actor-Critic algorithm to realize a strategy of the output sequence of the deep reinforcement learning column;
determining an output order of the columns based on a strategy of the output order of the columns, characteristic data of the data columns, wherein a neural network of the Pointer Net is adopted to make a decision of the output order, including mapping from one sequence to another sequence;
carrying out quantitative evaluation on the output sequence, wherein a quantitative evaluation strategy is adjusted based on rewards of the system, and the quantitative evaluation strategy based on the rewards adjustment of the system comprises the steps of adjusting parameters in a critic neural network according to rewards given by the system;
a strategy for adjusting the output sequence of the columns according to the quantitative evaluation result;
wherein determining the output order of the columns based on the policy of the output order of the columns, the characteristic data of the data columns, comprises:
obtaining the weight associated with each position of the input sequence by using an attention mechanism; and combining the input sequence with the weight to calculate the element with the maximum relation between the current output and the input sequence, and taking the element of the input sequence as an output element.
2. The column storage layout optimization method of claim 1, further characterized by quantitatively evaluating the output order based on disk skip time.
CN202011228158.6A 2020-11-06 2020-11-06 Column storage layout optimization method based on deep reinforcement learning Active CN112347104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011228158.6A CN112347104B (en) 2020-11-06 2020-11-06 Column storage layout optimization method based on deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN112347104A CN112347104A (en) 2021-02-09
CN112347104B true CN112347104B (en) 2023-09-29

Family

ID=74429231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011228158.6A Active CN112347104B (en) 2020-11-06 2020-11-06 Column storage layout optimization method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112347104B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332229B (en) * 2023-09-27 2024-05-10 天津大学 Fault diagnosis-oriented inter-satellite interaction information optimization method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609493A (en) * 2012-01-20 2012-07-25 Donghua University Join-order query optimization method based on a column-storage model
CN103294831A (en) * 2013-06-27 2013-09-11 Renmin University of China Multidimensional-array-based grouping and aggregation calculation method in a column-store database
CN103324765A (en) * 2013-07-19 2013-09-25 Xidian University Multi-core synchronous data query optimization method based on column storage
CN106528737A (en) * 2016-10-27 2017-03-22 Zhongqi Dongli Technology Co., Ltd. Website navigation display method and system
CN108197275A (en) * 2018-01-08 2018-06-22 Renmin University of China Distributed file row-storage indexing method
CN108804473A (en) * 2017-05-04 2018-11-13 Huawei Technologies Co., Ltd. Data query method, apparatus, and database system
CN110084375A (en) * 2019-04-26 2019-08-02 Southeast University Hierarchical partitioning framework based on deep reinforcement learning
CN110114783A (en) * 2016-11-04 2019-08-09 DeepMind Technologies Ltd. Reinforcement learning with auxiliary tasks
CN110278149A (en) * 2019-06-20 2019-09-24 Nanjing University Multipath TCP packet scheduling method based on deep reinforcement learning
CN111612126A (en) * 2020-04-18 2020-09-01 Huawei Technologies Co., Ltd. Method and device for reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9325344B2 (en) * 2010-12-03 2016-04-26 International Business Machines Corporation Encoding data stored in a column-oriented manner
CN106250381B (en) * 2015-06-04 2020-11-17 Microsoft Technology Licensing, LLC System and method for determining column layout of tabular storage
KR20230151047A (en) * 2017-05-23 2023-10-31 Google LLC Attention-based sequence transduction neural networks
CN110032604B (en) * 2019-02-02 2021-12-07 Beijing OceanBase Technology Co., Ltd. Data storage device, translation device and database access method
US20200311585A1 (en) * 2019-03-31 2020-10-01 Palo Alto Networks Multi-model based account/product sequence recommender
CN111797860B (en) * 2019-04-09 2023-09-26 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Feature extraction method and device, storage medium and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jin Guodong et al.; A Survey of HDFS Storage and Optimization Techniques; Journal of Software; Vol. 31, No. 1; pp. 137-161 *
Haoqiong Bian et al.; Wide Table Layout Optimization based on Column Ordering and Duplication; ACM International Conference on Management of Data; pp. 299-314 *

Also Published As

Publication number Publication date
CN112347104A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN111247537B (en) Method and system for effectively storing sparse neural network and sparse convolutional neural network
US9235651B2 (en) Data retrieval apparatus, data storage method and data retrieval method
US8650144B2 (en) Apparatus and methods for lossless compression of numerical attributes in rule based systems
CN110851620B (en) Knowledge representation method based on text embedding and structure embedding combination
US11544542B2 (en) Computing device and method
EP3738080A1 (en) Learning compressible features
CN111382307A (en) Video recommendation method, system and storage medium based on deep neural network
CN112347104B (en) Column storage layout optimization method based on deep reinforcement learning
CN113312505A (en) Cross-modal retrieval method and system based on discrete online hash learning
CN113487028A (en) Knowledge distillation method, knowledge distillation device, knowledge distillation terminal equipment and knowledge distillation medium
CN115511071A (en) Model training method and device and readable storage medium
CN113934851A (en) Data enhancement method and device for text classification and electronic equipment
CN115577144A (en) Cross-modal retrieval method based on online multi-hash code joint learning
JP6795721B1 (en) Learning systems, learning methods, and programs
CN112836794B (en) Method, device, equipment and storage medium for determining image neural architecture
CN106802787A (en) MapReduce optimization methods based on GPU sequences
CN115905546B (en) Graph convolution network literature identification device and method based on resistive random access memory
CN116467281A (en) Database management system parameter tuning and model training method, system and equipment
KR102597184B1 (en) Knowledge distillation method and system specialized for lightweight pruning-based deep neural networks
CN112840358B (en) Cursor-based adaptive quantization for deep neural networks
CN116737607B (en) Sample data caching method, system, computer device and storage medium
KR102466482B1 (en) System and method for accelerating deep neural network training using adaptive batch selection
WO2024078376A1 (en) Model pruning method and related apparatus
CN115033669A (en) New question mining method and terminal for FAQ question-answering system
Wei et al. Research on Deep Neural Network Model Compression Based on Quantification Pruning and Huffmann Encoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant