CN108932671A

CN108932671A - An LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network

Info

Publication number: CN108932671A
Application number: CN201810575699.2A
Authority: CN
Inventors: 赵坤; 张挺
Original assignee: Shanghai University of Electric Power
Current assignee: Shanghai University of Electric Power; University of Shanghai for Science and Technology
Priority date: 2018-06-06
Filing date: 2018-06-06
Publication date: 2018-12-04

Abstract

The present invention relates to an LSTM wind power load forecasting method in which hyperparameters are tuned by a deep Q-network (DQN). The method comprises the following steps: 1) acquiring raw data of the power system environment and selecting a training set and a prediction set; 2) using an LSTM as the prediction model and adjusting the hyperparameters of the prediction model with a DQN, where this adjustment specifically includes environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate; 3) feeding the training set into the prediction model with adjusted parameters and returning the training results to the DQN by experience replay for parameter optimization, obtaining the optimal LSTM prediction model; 4) performing wind power load forecasting with the optimal LSTM prediction model. Compared with the prior art, the invention does not require specialists to retune the model for different regions, which greatly improves forecasting efficiency.

Description

An LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network

Technical field

The present invention relates to the technical field of electric power, and more particularly to an LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network.

Background

Wind power load forecasting is an important part of power dispatching, and forecast quality directly determines whether wind power can be connected to the grid. Wind power load is a time series that is continuously updated as time passes. An RNN (Recurrent Neural Network) with an LSTM (Long Short-Term Memory) structure can effectively address the problem of gradients vanishing over time in RNNs, and the special network structure of an RNN gives it a distinct advantage on time series data.

A recurrent neural network has a special network structure: the input to the hidden layer includes not only the input-layer input at the current time step but also the input carried over from the previous time step, as shown in Fig. 1. In Fig. 1, x, x1 and x2 are the inputs at different time nodes, o, o1 and o2 are the outputs at the corresponding times, and U and V are linear relation matrices shared across the entire RNN. Data related to the wind power load, including time, wind speed at the wind farm, real-time power, frequency, wind direction and outdoor temperature, are used as the input of the prediction model. The network computes the output o, which is compared with the corresponding wind load to obtain the error. The model is then trained using gradient descent and BPTT (Back-Propagation Through Time); BPTT uses backpropagation to compute gradients and update the network weights. When the loop in the RNN is unrolled, each layer of the network passes information on to the next, which is why RNNs are well suited to processing time series data. It is not necessary to train separate parameters for every unrolled layer; only one layer is trained, and its parameters are shared.
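The parameter sharing described above can be illustrated with a minimal sketch of an unrolled RNN forward pass. It is not the patent's implementation: the recurrent matrix `W`, the `tanh` activation, the layer sizes and the random data are all assumptions made for illustration; only the shared matrices U and V and the six input features come from the text.

```python
import numpy as np

def rnn_forward(xs, U, W, V, h0):
    """Unroll a simple RNN over a sequence, reusing U, W and V at every step."""
    h = h0
    outputs = []
    for x in xs:
        # Hidden state mixes the current input with the previous hidden state.
        h = np.tanh(U @ x + W @ h)
        # Output o_t for this time step.
        outputs.append(V @ h)
    return np.array(outputs), h

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 6, 4, 1  # six input features, one load output (sizes are illustrative)
U = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input-to-hidden weights
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights (assumed)
V = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden-to-output weights
xs = rng.normal(size=(5, n_in))                       # five time steps of synthetic data
outputs, h_last = rnn_forward(xs, U, W, V, np.zeros(n_hidden))
```

The same three matrices are applied at every time step, so the number of trainable parameters does not grow with the sequence length.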

An ordinary RNN may suffer from vanishing or exploding gradients over long time spans. An LSTM can preserve the error as it is propagated backward through time and layers. By keeping the error at a more constant level, the LSTM lets the recurrent network learn over many time steps (more than 1000), opening a channel for linking causes and effects that are far apart.

An LSTM stores information in gated cells outside the normal information flow of the recurrent network. These cells can store, write and read information, much like data in a computer's memory. A cell decides which information to store, and when to allow reading, writing or erasing, by opening and closing its gates. Unlike the digital memory in a computer, however, these gates are analog: they are implemented as element-wise multiplication by sigmoid functions whose outputs all lie between 0 and 1. The advantage of analog over digital values is that they are differentiable and therefore suited to backpropagation. The gates open and close according to the signals they receive and, like the nodes of a neural network, filter information with their own sets of weights, deciding whether to let information through on the basis of its strength and content. These weights, like the weights that modulate the input and the hidden state, are adjusted during the learning process of the recurrent network. That is, the memory cell learns when to let data enter, leave or be deleted through the iterative process of making guesses, backpropagating the error and adjusting weights by gradient descent. The structure is shown in Fig. 2. The three arrows at the bottom of Fig. 2 indicate that information flows into the memory cell from multiple points. The combination of the current input and the past cell state is fed not only to the memory cell itself but also to its three gates, which decide how the input is handled. The black dots are the gates, which determine, through multiplication by their respective coefficients, when new input is allowed to enter (y^in), when the current cell state is cleared, and when the cell state is allowed to affect the network output at the current time step (y^out). S_c is the current state of the memory cell, and g y^in is the current input. Each gate can be open or closed, and the gates recombine their open and closed states at every time step. At each time step the memory cell can decide whether to forget its state, whether to allow writing, and whether to allow reading.

Whether an LSTM predicts accurately depends directly on its hyperparameters, so suitable hyperparameters let the prediction model reach, or come very close to, the global optimum. The prior art generally uses the Q-Learning algorithm, whose flow is as follows:

Initialize Q(s, a) for all states s and all a ∈ A(s) with arbitrary values, and set Q(terminal-state) = 0;

Repeat (for each episode):

Initialize state S;

Repeat (for each step in the episode):

Using some policy, such as ε-greedy, choose an action according to state S and execute it;

After executing the action, observe the reward and the new state S′;

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( R_{t+1} + \lambda \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right)$$

S ← S′

Loop until the episode terminates.
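The algorithm flow above can be sketched as a small tabular Q-learning routine. The toy chain environment (five states, actions left and right, reward 1 on reaching the right end), the random tie-breaking and all numeric settings are assumptions chosen only to make the sketch runnable; the update line follows the rule given above, with `alpha` as the learning rate and `lam` as the discount factor λ.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, lam=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: move left or right, reward 1 at the right end."""
    rng = random.Random(seed)
    actions = (0, 1)  # 0 = left, 1 = right
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q(terminal-state) stays 0
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0  # initialize state S
        while s != terminal:
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[s])
                a = rng.choice([act for act in actions if Q[s][act] == best])
            # execute the action, then observe the reward and the new state
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # Q(S,A) <- Q(S,A) + alpha * (R + lam * max_a Q(S',a) - Q(S,A))
            Q[s][a] += alpha * (r + lam * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

Calling `q_learning()` returns a table in which moving right has the higher Q value in every non-terminal state, which is the behaviour the update rule is meant to produce on this toy problem.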

In the algorithm, α is the learning rate, which controls how much of the difference between the previous Q value and the newly proposed Q value is taken into account. Q denotes the corresponding Q value, and λ is the discount factor. When the discount factor is 0, the prediction model tends to make decisions by consulting the existing table; when it is 1, the model tends to try actions it has not taken before in order to expand the Q table. In general, the discount factor is set to a value between 0 and 1 to balance immediate reward against exploration. $R_{t+1} + \lambda \max_a Q(S_{t+1}, a)$ is the target Q value, and the Q-Learning algorithm mainly drives $Q(S_t, A_t)$ toward this target. This helps optimize the wind power prediction model so that it adapts to different regions. However, wind power load is strongly affected by the regional environment, and model parameters differ considerably between regions. When the prediction model is applied in a different region, specialists must retune it, and tuning the prediction model is labor-intensive and inconvenient.

Summary of the invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing an LSTM wind power load forecasting method, with hyperparameters tuned by a deep Q-network, that tunes parameters automatically, improves forecasting efficiency, and adapts to different regions.

The object of the present invention can be achieved by the following technical solution:

An LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network, comprising the following steps:

S1: Acquire raw data of the power system environment, and select a training set and a prediction set;

S2: Use an LSTM as the prediction model, and adjust the hyperparameters of the prediction model with a DQN;

Adjusting the parameters of the prediction model with the DQN specifically includes environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate. Environment parameter adjustment combines the LSTM prediction model with a series of actions to form a Markov decision model; state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate are all implemented on the basis of this Markov decision model.

The specific content of environment parameter adjustment is:

The learning rate adjustment function f(x) is used to adjust the adaptive learning rate, and the regularization parameter adjustment function g(x) is used to adjust the adaptive regularization parameter. Let (p, y) be a training sample, where p is the input, including the learning rate x_t and the regularization parameter z_t, y is the desired output, and a is the actual output; then:

In the formula, n is the number of samples.

The specific content of state adjustment is:

The state is represented by a feature vector of six state features. The state features comprise the hyperparameter to be adjusted, the candidate iterate's objective value, the maximum objective value over the past M steps, the dot product between the descent direction and the gradient, the MI/MAX encoding, the number of function evaluations, and the alignment measure. Then:

Given the list of the M lowest objective values obtained up to time t-1, the encoding of state [S_t] is determined by the following formula:

In the formula, the encoding is 1 when f(x_t) is less than the minimum of that list, 0 when it lies within the preceding M values of F, and -1 otherwise;

This gives the state alignment measure [s_t]_alignment as:

The expression for the descent direction is:

In the formula, the two terms are a gradient and the mean value of the learning rate.

The specific content of action selection is:

For a given state, action selection uses the strategy of resetting the learning rate or the regularization parameter to its initial value after an iterate is accepted. When controlling the learning rate, there are two actions: keep the learning rate or halve it. When adjusting the regularization coefficient, a third option is allowed in addition to these two: increasing it by a quarter.

The expression for the reinforcement learning reward r_id(f, x_t) for adjusting the learning rate is:

In the formula, f_lb is the target lower bound of the function value, and c is the target lower-bound value.

S3: Feed the training set into the prediction model with adjusted parameters, and return the training results to the DQN for parameter optimization, obtaining the optimal LSTM prediction model;

Experience replay is applied in the training part: each time the parameters of the neural network are updated, a portion of earlier training results is drawn at random from the data and used to update the DQN, thereby obtaining the optimal LSTM prediction model.

S4: Perform wind power load forecasting with the optimal LSTM prediction model.

Compared with the prior art, the present invention uses a DQN so that the prediction model learns to adjust its hyperparameters by itself. The resulting wind power prediction model adapts to different regions without requiring specialists to retune it for each region, which greatly improves forecasting efficiency.

Brief description of the drawings

Fig. 1 is a structure diagram of an RNN;

Fig. 2 is a structure diagram of an LSTM;

Fig. 3 is a flow diagram of the method of the present invention;

Fig. 4 shows the convergence of the DQN in the embodiment of the present invention with a learning rate of 0.05 and e_greedy = 0.01;

Fig. 5 compares the prediction model accuracy obtained with Q gradient descent and with ordinary gradient descent in the embodiment of the present invention;

Fig. 6 compares the convergence of the prediction model error obtained with Q gradient descent and with ordinary gradient descent in the embodiment of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the drawings and a specific embodiment.

Embodiment

As shown in Fig. 3, the present invention relates to an LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network. The main content of the method is:

1) Acquire raw data of the power system environment, and select a training set and a prediction set.

2) Use an LSTM as the prediction model, dynamically adapt the hyperparameters of the prediction model with a DQN, and obtain the output value of the prediction model. Dynamically adapting the hyperparameters with the DQN specifically includes environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate.

3) Feed the training set into the prediction model with adjusted parameters, and return the training results to the DQN for parameter optimization, obtaining the optimal LSTM prediction model;

4) Perform wind power load forecasting with the optimal LSTM prediction model.

With an LSTM as the prediction model, a deep Q-network (DQN) is used to dynamically adapt the hyperparameters of the prediction model. Whenever the DQN takes an action that selects a learning-rate value, that value is applied in the prediction model, which then produces an output, and a reward is estimated for it. The DQN then records the action with its corresponding reward estimate in a Q table. Because the volume of data is huge, a deep network is needed to record the results of previous attempts, so that the DQN can learn from the table how to tune the hyperparameters. The environment, actions and rewards are defined as follows:

1. Environment

In the formulas, the learning rate adjustment function f(x_t) and the regularization parameter adjustment function g(z_t) serve to narrow the gap between the output of the prediction model and the expected output under different learning rates and regularization parameters; x_t and z_t denote the learning rate and the regularization parameter respectively. (p, y) is a training sample, n is the number of samples, and p is the input, including the learning rate x_t and the regularization parameter z_t; y is the desired output and a is the actual output. The adjustment functions f(x_t) and g(z_t) use the cross-entropy cost function, with which weights update quickly when the error is large and slowly when the error is small. f(x) is used to adjust the adaptive learning rate, and g(x) to adjust the adaptive regularization parameter. The environment combines the prediction model with a series of actions and other necessary elements to form a Markov decision model, in which the choice of action depends only on the current state and not on earlier behavior. State adjustment, action selection and the reinforcement learning reward for adjusting the learning rate are all carried out on the basis of this Markov decision model.
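The formulas for f and g are not reproduced in this text, so the following is only a sketch of the standard binary cross-entropy cost over n samples, using the symbols y (desired output) and a (actual output) from the paragraph above. It assumes the outputs are scaled into (0, 1) and that the output unit is a sigmoid; neither assumption is stated in the patent.

```python
import math

def cross_entropy_cost(targets, outputs, eps=1e-12):
    """Mean binary cross-entropy between desired outputs y and actual outputs a."""
    n = len(targets)
    total = 0.0
    for y, a in zip(targets, outputs):
        a = min(max(a, eps), 1.0 - eps)  # clip so that log() stays finite
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total / n

def output_gradient(y, a):
    """Gradient of the cost w.r.t. the pre-activation of a sigmoid output unit: a - y."""
    return a - y
```

Under these assumptions the gradient at the output is simply a - y, so its magnitude grows with the error. This matches the behaviour described above: weights update quickly when the error is large and slowly when it is small.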

2. State

The state is represented by a feature vector with six state features. The state features are, respectively, the hyperparameter to be adjusted, the candidate iterate's objective value, the maximum objective value over the past M steps, the dot product between the descent direction and the gradient, the MI/MAX encoding, the number of function evaluations, and the alignment measure. The first four features can be obtained directly; for the last two:

Given the list of the M lowest objective values obtained up to time t-1, the encoding of state [S_t] is determined by the following formula:

In the formula, the encoding is 1 when f(x_t) is smaller than the previous minimum, 0 when it lies within the preceding M values of F, and -1 otherwise. The descent direction is given by the expression:

In the formula, the two terms are a gradient and the mean value of the learning rate.

This gives the state alignment measure [s_t]_alignment as:

In addition, to make the state features independent of the specific objective function, all features are transformed into the interval [-1, 1].
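The encoding rule and the scaling into [-1, 1] can be sketched as follows. The encoding formula itself is missing from this text, so the middle case (value no larger than the largest of the M stored values) is one possible reading of the description, and the linear min-max scaling with clipping is likewise an assumption; both function names are hypothetical.

```python
def encode_objective(f_xt, lowest_values):
    """Return 1, 0 or -1 by comparing f(x_t) with the M lowest past objective values."""
    if f_xt < min(lowest_values):
        return 1   # new best objective value
    if f_xt <= max(lowest_values):
        return 0   # within the range of the M lowest values seen so far
    return -1      # worse than all of them

def scale_to_unit_interval(value, low, high):
    """Linearly map value from [low, high] to [-1, 1], clipping values outside the range."""
    if high == low:
        return 0.0
    scaled = 2.0 * (value - low) / (high - low) - 1.0
    return max(-1.0, min(1.0, scaled))
```

For example, with stored values `[0.2, 0.3, 0.5]`, a new objective value of 0.1 is encoded as 1, 0.4 as 0, and 0.9 as -1.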

3. Action

For a given state, an action is a way of changing the combination of learning rate and regularization parameter. In general, the learning rate and the regularization parameter are very small. Therefore, the strategy adopted is to reset the learning rate or the regularization parameter to its initial value after an iterate is accepted. Accordingly, when controlling the learning rate there are two actions: keep the learning rate or halve it. When adjusting the regularization coefficient, a third option is allowed in addition to these two: increasing it by a quarter.
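The action set described above can be sketched as a pair of multiplier tables. The action names and the function `apply_action` are hypothetical, and reading "increase by a quarter" as multiplication by 1.25 is an assumption.

```python
# Multipliers for each action; the names are illustrative, not taken from the patent.
LR_ACTIONS = {"keep": 1.0, "halve": 0.5}
REG_ACTIONS = {"keep": 1.0, "halve": 0.5, "increase_quarter": 1.25}

def apply_action(learning_rate, reg_param, lr_action, reg_action):
    """Return the new (learning rate, regularization parameter) after one action pair."""
    return (learning_rate * LR_ACTIONS[lr_action],
            reg_param * REG_ACTIONS[reg_action])

# Example: halve a learning rate of 0.05 and raise a regularization parameter of 0.01 by a quarter.
new_lr, new_reg = apply_action(0.05, 0.01, "halve", "increase_quarter")
```

With two choices for the learning rate and three for the regularization parameter, this sketch gives the agent six joint actions per step.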

4. Reward

For adjusting the learning rate, the reward is defined as the inverse distance of the training loss from the target lower bound. The reinforcement learning reward r_id(f, x_t) for adjusting the learning rate is given by:

In the formula, f_lb is the target lower bound of the function value; in practice it can generally be set to zero, as the target for the sum of the loss functions. c is the target lower-bound value.
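The reward formula is not reproduced in this text. A minimal sketch consistent with the verbal definition (the inverse of the distance between the training loss and its lower bound) is given below. The form c / (f(x_t) - f_lb), the treatment of c as a positive scaling constant, and the small `eps` guard are all assumptions.

```python
def inverse_distance_reward(f_xt, f_lb=0.0, c=1.0, eps=1e-8):
    """Reward that grows as the training loss f(x_t) approaches the lower bound f_lb."""
    distance = max(f_xt - f_lb, eps)  # guard against division by zero
    return c / distance
```

With the default `f_lb = 0`, a loss of 0.5 gives a reward of 2.0, and any smaller loss gives a larger reward, so the agent is pushed toward hyperparameter choices that reduce the training loss.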

5. Experience replay

Experience replay is applied in the training part: each time the parameters of the neural network are updated, a small batch of earlier training results is drawn at random from the data to help train the neural network.

An experience is a tuple $(s_i, a_i, r_{i+1}, s_{i+1}, \text{label})^j$, where i denotes the time step and j denotes the e_greedy value. These tuples are stored in the experience memory E. In addition to updating the DQN with the most recent experience, a subset S ⊂ E is drawn from the memory for mini-batch updates of the DQN.
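A replay memory of this kind can be sketched with a bounded queue and uniform random sampling. The class name `ReplayMemory`, the fixed capacity and the uniform sampling are assumptions for illustration; the tuple fields follow the experience tuple described above.

```python
import random
from collections import deque, namedtuple

# One stored experience: (s_i, a_i, r_{i+1}, s_{i+1}, label).
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "label"])

class ReplayMemory:
    """Fixed-size experience memory E; the oldest experiences are discarded first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, label):
        self.buffer.append(Experience(state, action, reward, next_state, label))

    def sample(self, batch_size):
        """Draw a random mini-batch S from E without replacement."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Sampling at random from stored experiences, rather than training only on the latest transition, reduces the correlation between consecutive updates of the DQN.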

6. Training results

In this embodiment, the LSTM prediction model has 6 inputs (time, wind speed at the wind farm, real-time power, frequency, wind direction, outdoor temperature) and one output (load). The recurrent neural network has 3 layers with 128 hidden units, and the softsign activation function is used. The experiment uses 8932 load data records, of which 80% serve as training data and 20% as the test set. The discount factor λ is initially set to 0.99, and the exploration probability is set to 1 and decays uniformly to 0.1 within 100 steps. With a DQN learning rate of 0.05 and e_greedy of 0.01, the loss of the DQN, shown on the vertical axis of Fig. 4, begins to converge markedly. After 8 hours of training, corresponding to 100 iteration steps, the DQN reaches an accuracy of about 40%, which is 10% higher than the baseline, and its loss also converges faster than the baseline (gradient descent). In Fig. 5, the horizontal axis is the number of iteration steps and the vertical axis is the prediction model accuracy; compared with the gradient descent method, the accuracy of the Q gradient descent model has a clear advantage in both the speed and the magnitude of its rise. In Fig. 6, the horizontal axis is the number of iteration steps and the vertical axis is the prediction model error for Q gradient descent and for gradient descent; the error of the Q gradient descent model falls more quickly. Owing to limited computing capacity, only 100 steps were iterated, but the results show that the learning ability of the DQN makes the accuracy of the target network rise quickly and its error converge well.
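Two of the settings above can be sketched in a few lines: the exploration probability that decays uniformly from 1 to 0.1 within 100 steps, and the 80%/20% split of the 8932 records. The linear decay and the chronological split without shuffling are assumptions, since the text does not say how either was implemented.

```python
def exploration_probability(step, start=1.0, end=0.1, decay_steps=100):
    """Linearly decay the exploration probability from start to end over decay_steps."""
    frac = min(max(step, 0), decay_steps) / decay_steps
    return start + (end - start) * frac

def split_train_test(samples, train_fraction=0.8):
    """Split a sequence into training and test parts, keeping the original order."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# Placeholder indices standing in for the 8932 load records.
train, test = split_train_test(list(range(8932)))
```

With these assumptions the split yields 7145 training records and 1787 test records, and the exploration probability stays at 0.1 after step 100.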

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. An LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network, characterized in that the method comprises the following steps:

1) acquiring raw data of the power system environment, and selecting a training set and a prediction set;

2) using an LSTM as the prediction model, and adjusting the hyperparameters of the prediction model with a DQN;

3) feeding the training set into the prediction model with adjusted parameters, and returning the training results to the DQN for parameter optimization, obtaining the optimal LSTM prediction model;

2. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 1, characterized in that, in step 2), adjusting the parameters of the prediction model with the DQN specifically includes environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate; environment parameter adjustment combines the LSTM prediction model with a series of actions to form a Markov decision model, and state adjustment, action selection and the reinforcement learning reward for adjusting the learning rate are implemented on the basis of this Markov decision model.

3. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 2, characterized in that the specific content of environment parameter adjustment is:

the learning rate adjustment function f(x) is used to adjust the adaptive learning rate, and the regularization parameter adjustment function g(x) is used to adjust the adaptive regularization parameter; let (p, y) be a training sample, where p is the input, including the learning rate x_t and the regularization parameter z_t, y is the desired output, and a is the actual output; then:

In the formula, n is the number of samples.

4. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 3, characterized in that the specific content of state adjustment is:

the state is represented by a feature vector of six state features, the state features comprising the hyperparameter to be adjusted, the candidate iterate's objective value, the maximum objective value over the past M steps, the dot product between the descent direction and the gradient, the MI/MAX encoding, the number of function evaluations, and the alignment measure; then:

In the formula, the encoding is 1 when f(x_t) is less than the minimum of the list of the M lowest objective values, 0 when it lies within the preceding M values of F, and -1 otherwise;

This gives the state alignment measure [s_t]_alignment as:

5. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 4, characterized in that the expression for the descent direction is:

In the formula, the two terms are a gradient and the mean value of the learning rate.

6. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 5, characterized in that the specific content of action selection is:

for a given state, action selection uses the strategy of resetting the learning rate or the regularization parameter to its initial value after an iterate is accepted; when controlling the learning rate, there are two actions: keep the learning rate or halve it; when adjusting the regularization coefficient, a third option is allowed in addition to these two: increasing it by a quarter.

7. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 6, characterized in that the expression for the reinforcement learning reward r_id(f, x_t) for adjusting the learning rate is:

8. The LSTM wind power load forecasting method with hyperparameters tuned by a deep Q-network according to claim 1, characterized in that the specific content of step 3) is:

experience replay is applied in the training part: each time the parameters of the neural network are updated, a portion of earlier training results is drawn at random from the data and used to update the DQN, thereby obtaining the optimal LSTM prediction model.