CN114865620B

CN114865620B - Wind power plant generating capacity prediction method based on machine learning algorithm

Info

Publication number: CN114865620B
Application number: CN202210467269.5A
Authority: CN
Inventors: Qi Fangzhong (綦方中); Zhuo Kexiang (卓可翔); Cao Cong (曹聪)
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2022-04-29
Filing date: 2022-04-29
Publication date: 2023-01-10
Anticipated expiration: 2042-04-29
Also published as: CN114865620A

Abstract

The invention discloses a method for predicting the generating capacity of a wind power plant based on a machine learning algorithm, comprising the following steps: acquiring and inputting historical meteorological data to obtain vectors that represent the meteorological data features; using a recurrent highway network to learn the time-series-related features among the variables of the meteorological feature sequence; re-screening the feature vectors along different dimensions with a recurrent highway network encoder and a multi-layer spatio-temporal attention mechanism to obtain a time-dimension attention vector and a network-layer-dimension attention vector, respectively; acquiring and inputting historical wind power generation data, and obtaining the wind power generation prediction through the decoding operation of a recurrent highway network decoder followed by dimension matching in a fully connected layer; and calculating a confidence interval for the wind power generation prediction. The method not only improves the accuracy of wind power generation prediction but also provides confidence interval information for the prediction, enlarging the decision space available to power grid operators.

Description

Wind power plant generating capacity prediction method based on machine learning algorithm

Technical Field

The invention belongs to the technical field of wind power generation prediction, and particularly relates to a method for predicting the power generation of a wind farm based on a machine learning algorithm.

Background

The generation of a wind power plant is highly unstable because it depends on meteorological conditions such as wind speed and air pressure. Predicting wind power generation on an hourly or daily basis is important for integrated power grid scheduling and for the operation and maintenance of wind turbines, but it is very difficult. Machine learning algorithms can effectively represent deep features of input data and are widely used for wind power prediction. However, machine learning, and deep learning in particular, suffers from performance degradation caused by loss of input sequence information and by vanishing gradients as network layers are stacked, which reduces training efficiency and prediction accuracy. Moreover, while existing methods aim at more accurate prediction of wind power generation, they do not provide confidence interval information for the prediction result, although such information is important for more efficient power grid decision-making.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a method for predicting the power generation of a wind power plant based on a machine learning algorithm.

To achieve this purpose, the method uses a recurrent highway network combined with a multi-layer spatio-temporal attention mechanism to predict wind power generation and then calculates the probability distribution of the predicted generation. The technical solution comprises the following steps:

step 1, acquiring and inputting historical meteorological data to obtain vectors that represent the meteorological data features:

step 1.1, acquiring historical meteorological data to obtain a historical meteorological data sequence that can be input into a convolutional neural network for feature extraction: $(x_1, x_2, \ldots, x_t, \ldots, x_{T-1})$

where $t \in \{1, 2, \ldots, T-1\}$, $x_t \in \mathbb{R}^n$ is the n-dimensional real vector of meteorological data at time t, and T is the time step of the target to be predicted;

step 1.2, inputting the sequence into a convolutional neural network and performing the convolution operation to obtain a sequence that represents the meteorological data features:

$(w_1, w_2, \ldots, w_t, \ldots, w_{T-1})$

where $w_t \in \mathbb{R}^m$ is the processed m-dimensional real vector of meteorological features at time t;
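Step 1.2 does not specify the kernel size, padding or activation of the convolution. The following minimal sketch shows one plausible reading in Python with NumPy: a 1-D convolution along the time axis that maps each n-dimensional input $x_t$ to an m-dimensional feature vector $w_t$. The function name `conv1d_features`, the odd kernel width, the zero ("same") padding and the ReLU activation are all assumptions for illustration, not details taken from the patent.

```python
import numpy as np


def conv1d_features(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map a meteorological sequence x of shape (L, n) to features w of shape (L, m).

    kernels has shape (k, n, m) with odd width k; zero padding keeps one
    feature vector per time step. ReLU is an assumed activation.
    """
    k, n, m = kernels.shape
    if x.shape[1] != n or k % 2 == 0:
        raise ValueError("need x of shape (L, n) and an odd kernel width")
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the time axis only
    w = np.empty((x.shape[0], m))
    for t in range(x.shape[0]):
        window = xp[t:t + k]  # (k, n) window centred on step t
        w[t] = np.einsum("kn,knm->m", window, kernels) + bias
    return np.maximum(w, 0.0)  # assumed ReLU


# Usage: 5 hourly records with n = 3 variables mapped to m = 4 features.
x = np.ones((5, 3))
w = conv1d_features(x, np.ones((3, 3, 4)), np.zeros(4))
print(w.shape)  # (5, 4)
```

Padding keeps the output aligned with the input time index, so $w_t$ still corresponds to hour t.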

step 2, training and learning the time-series-related features among the variables of the meteorological feature sequence by using a recurrent highway network: let $S' = S'(w, W_{S'})$, $T' = T'(w, W_{T'})$ and $C' = C'(w, W_{C'})$ denote the outputs of the nonlinear transformations by the tanh function S', the sigmoid function T' and the sigmoid function C', respectively, and let $w^{[t]}$ be the meteorological feature vector input at time t; the hidden state of the recurrent highway network is updated as

where

the l-dimensional hidden state vector is output by the recurrent highway network unit of the k-th layer at time t, $k \in \{1, 2, \ldots, K\}$, K is the number of network layers, and for k = 0 it is specified that

where $W_{S'}, W_{T'}, W_{C'} \in \mathbb{R}^{l \times m}$, $R_{S'k}, R_{C'k}, R_{T'k} \in \mathbb{R}^{l \times l}$ and $b_{Hk}, b_{C'k}, b_{T'k} \in \mathbb{R}^{l}$ are the weight matrices and bias units of the S', T' and C' transformations of the k-th layer, respectively, and the indicator function $I\{\cdot\}$ indicates that the feature vector $w^{[t]}$ participates in the computation only at layer 1 (k = 1) of the recurrent highway network;

indicates that all of the original input information is retained, and

indicates that all of the input information is transformed;
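The update equations of step 2 are not reproduced in this text. Its notation (a tanh transform S', sigmoid gates T' and C', an input that enters only at layer 1 through $I\{\cdot\}$, and per-layer recurrent matrices) matches the recurrent highway network of Zilly et al. (ICML 2017), so the sketch below implements that standard cell, assuming the patent follows it: $s_t^{[k]} = S' \odot T' + s_t^{[k-1]} \odot C'$, with $s_t^{[0]}$ assumed to be the top-layer state of the previous time step. The names `rhn_step`, `make_params` and the dictionary keys are hypothetical, and `b_S` stands in for the bias the text calls $b_{Hk}$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rhn_step(w_t: np.ndarray, s_prev: np.ndarray, p: dict) -> list:
    """One time step of a recurrent highway network with K stacked layers.

    w_t:    feature vector at time t, shape (m,)
    s_prev: top-layer state of time t-1, shape (l,), used as s_t^[0] (assumed)
    p:      W_S, W_T, W_C of shape (l, m); lists R_S, R_T, R_C of (l, l)
            matrices and b_S, b_T, b_C of (l,) vectors, one entry per layer
    Returns [s_t^[1], ..., s_t^[K]].
    """
    s = s_prev
    states = []
    for k in range(len(p["R_S"])):
        first = k == 0  # indicator I{k = 1}: the input enters only the first layer
        in_S = p["W_S"] @ w_t if first else 0.0
        in_T = p["W_T"] @ w_t if first else 0.0
        in_C = p["W_C"] @ w_t if first else 0.0
        S = np.tanh(in_S + p["R_S"][k] @ s + p["b_S"][k])  # candidate S'
        T = sigmoid(in_T + p["R_T"][k] @ s + p["b_T"][k])  # transform gate T'
        C = sigmoid(in_C + p["R_C"][k] @ s + p["b_C"][k])  # carry gate C'
        s = S * T + s * C
        states.append(s)
    return states


def make_params(l, m, K, b_T=0.0, b_C=0.0, seed=0):
    """Small random weights for illustration; b_T and b_C set the gate biases."""
    rng = np.random.default_rng(seed)
    p = {name: rng.normal(0.0, 0.1, (l, m)) for name in ("W_S", "W_T", "W_C")}
    for g in ("S", "T", "C"):
        p["R_" + g] = [rng.normal(0.0, 0.1, (l, l)) for _ in range(K)]
    p["b_S"] = [np.zeros(l)] * K
    p["b_T"] = [np.full(l, b_T)] * K
    p["b_C"] = [np.full(l, b_C)] * K
    return p


# Gates saturated at T' = 0, C' = 1: the state passes through unchanged.
s0 = np.array([0.1, -0.2, 0.3, 0.4])
carried = rhn_step(np.ones(3), s0, make_params(4, 3, 2, b_T=-50.0, b_C=50.0))
```

The two extreme gate settings correspond to the two special cases named in the text: T' = 0, C' = 1 retains all of the original information, and T' = 1, C' = 0 replaces it with the transformed candidate.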

step 3, re-screening the feature vectors along different dimensions through the encoding operation of the encoder and a multi-layer spatio-temporal attention mechanism to obtain a time-dimension attention vector and a network-layer-dimension attention vector, respectively:

step 3.1, screening the different encoded features with the spatio-temporal attention mechanism to obtain the time-dimension attention vector:

let the hidden state vector of the k-th decoder layer at time T-1 be

$D_{T-1}$ is reshaped to obtain the query vector Query1; between Query1 and the hidden vector of the k-th encoder layer at time t ($1 < t \le T-1$)

the attention weight is expressed as

where $\mathrm{Query1} \in \mathbb{R}^p$, $V_k \in \mathbb{R}^l$, $T'_k \in \mathbb{R}^{l \times p}$, $U_k \in \mathbb{R}^{l \times l}$; $V_k$, $T'_k$ and $U_k$ are the nonlinear transformation matrices of the k-th layer, p is the dimension of the query vector, and the normalized attention weight is expressed as

The time-dimension local attention vector of the k-th layer is then obtained as

The local attention vectors of all layers are concatenated to obtain the time-dimension attention vector
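The attention formulas of step 3.1 are also missing from this text. The listed shapes ($\mathrm{Query1} \in \mathbb{R}^p$, $V_k \in \mathbb{R}^l$, $T'_k \in \mathbb{R}^{l \times p}$, $U_k \in \mathbb{R}^{l \times l}$) fit an additive (Bahdanau-style) score $e_t = V_k^\top \tanh(T'_k\,\mathrm{Query1} + U_k h_t^{[k]})$ followed by softmax normalization, so the sketch below assumes that form. It computes one context vector per encoder layer by attending over time and then concatenates the K context vectors. The symbol $h_t^{[k]}$ and the function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def time_attention(query, H_k, V_k, T_k, U_k):
    """Additive attention over time steps for encoder layer k.

    query: (p,)  H_k: (L, l) states of layer k  V_k: (l,)  T_k: (l, p)  U_k: (l, l)
    Returns the weights alpha (L,) and the local context vector (l,).
    """
    scores = np.tanh(T_k @ query + H_k @ U_k.T) @ V_k  # one score per time step
    alpha = softmax(scores)
    return alpha, alpha @ H_k


def time_dimension_vector(query, H, layer_params):
    """H: (K, L, l) encoder states; layer_params[k] = (V_k, T_k, U_k)."""
    parts = [time_attention(query, H[k], *layer_params[k])[1] for k in range(H.shape[0])]
    return np.concatenate(parts)  # shape (K * l,)


# Usage with random states: K = 2 layers, L = 6 steps, l = 4, p = 3.
rng = np.random.default_rng(1)
K, L, l, p = 2, 6, 4, 3
H = rng.normal(size=(K, L, l))
q = rng.normal(size=p)
params = [(rng.normal(size=l), rng.normal(size=(l, p)), rng.normal(size=(l, l))) for _ in range(K)]
v_time = time_dimension_vector(q, H, params)
```

With $V_k = 0$ every score is equal, so the weights become uniform and each local vector reduces to the time average of that layer's states, which is a convenient sanity check.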

step 3.2, screening the different encoded features with the spatio-temporal attention mechanism to obtain the network-layer-dimension attention vector:

For the hidden state vector of the k-th decoder layer at time T-1,

a reshape operation yields the query vector Query2; the attention weight between Query2 and

is expressed as

where $\mathrm{Query2} \in \mathbb{R}^p$, $V_t \in \mathbb{R}^l$, $T'_t \in \mathbb{R}^{l \times p}$, $U_t \in \mathbb{R}^{l \times l}$; $V_t$, $T'_t$ and $U_t$ are the nonlinear transformation matrices for time t, and the normalized attention weight is expressed as

The network-layer-dimension local attention vector at time t is then obtained as

The local attention vectors of all layers are concatenated to obtain the network-layer-dimension attention vector
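Step 3.2 applies the same kind of score across layers instead of across time: at each time t, Query2 attends over the K layer states at that step using $V_t$, $T'_t$ and $U_t$. The sketch below assumes the same additive form as in step 3.1 and assumes that the per-time local vectors are concatenated in time order (the text indexes the local vectors by t). Both assumptions, and all names, are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def layer_attention(query, H_t, V_t, T_t, U_t):
    """Additive attention over the K layers at a single time step t.

    query: (p,)  H_t: (K, l) states of all layers at time t
    V_t: (l,)  T_t: (l, p)  U_t: (l, l)
    """
    beta = softmax(np.tanh(T_t @ query + H_t @ U_t.T) @ V_t)  # (K,) layer weights
    return beta, beta @ H_t


def layer_dimension_vector(query, H, time_params):
    """H: (K, L, l) encoder states; time_params[t] = (V_t, T_t, U_t)."""
    parts = [layer_attention(query, H[:, t, :], *time_params[t])[1] for t in range(H.shape[1])]
    return np.concatenate(parts)  # shape (L * l,)


# Usage with random states: K = 3 layers, L = 5 steps, l = 4, p = 2.
rng = np.random.default_rng(2)
K, L, l, p = 3, 5, 4, 2
H = rng.normal(size=(K, L, l))
q = rng.normal(size=p)
tparams = [(rng.normal(size=l), rng.normal(size=(l, p)), rng.normal(size=(l, l))) for _ in range(L)]
v_layer = layer_dimension_vector(q, H, tparams)
```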

step 4, acquiring and inputting historical wind power generation data, and obtaining the wind power generation prediction through the decoding operation of the decoder and dimension matching in the fully connected layer:

step 4.1, acquiring and inputting historical wind power generation data to obtain the generation data sequence:

$(y_1, y_2, \ldots, y_t, \ldots, y_{T-1})$

step 4.2, matching the dimensions of the time-dimension and network-layer-dimension attention vectors and the historical wind power generation data through a fully connected layer to obtain a representation vector

where

and

denote the weight matrices of the fully connected layer, d is the weight matrix dimension, and

is a bias unit;
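The exact arrangement of inputs to the fully connected layer in step 4.2 cannot be recovered from this text. The sketch below assumes the simplest reading: concatenate the time-dimension vector, the network-layer-dimension vector and the observed generation $y_t$, then apply one affine map $Wz + b$ into d dimensions. `fc_match` is a hypothetical name.

```python
import numpy as np


def fc_match(c_time, c_layer, y_t, W, b):
    """Affine projection of [c_time; c_layer; y_t] to a d-dimensional vector.

    W: (d, len(c_time) + len(c_layer) + 1), b: (d,)
    """
    z = np.concatenate([c_time, c_layer, np.atleast_1d(y_t)])
    if W.shape[1] != z.size:
        raise ValueError(f"W expects {W.shape[1]} inputs, got {z.size}")
    return W @ z + b


# Usage: 8-dim time vector, 20-dim network-layer vector and a scalar generation value.
rng = np.random.default_rng(0)
c_time, c_layer = rng.normal(size=8), rng.normal(size=20)
d = 16
out = fc_match(c_time, c_layer, 3.2, rng.normal(size=(d, 29)), np.zeros(d))
print(out.shape)  # (16,)
```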

step 4.3, updating the hidden state vector of the encoder:

The obtained

is used as the input of the k-th layer of the recurrent highway network, and the encoder state vector is updated as

where

and

denote the weight matrices of the different layers of the recurrent highway network in the encoder stage,

denote the different bias units, and q is the weight matrix dimension;

step 4.4, obtaining and outputting the predicted wind power generation at time T:

where

is the hidden state vector of the K-th decoder layer at time T-1, W, V and H are learnable weight matrices, and b is a bias unit;

step 5, calculating the confidence interval of the wind power generation prediction:

if the distribution function of the random variable Y is $F(y) = P(Y \le y)$, its τ-th quantile is defined as

$Q(\tau) = \inf\{y : F(y) \ge \tau\}, \quad \tau \in (0, 1)$
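The quantile definition and the pinball loss used in the next step can be checked numerically. The Python sketch below computes $Q(\tau) = \inf\{y : F(y) \ge \tau\}$ for the empirical distribution of a sample. It then shows that, among candidate constant predictions, the one with the smallest average pinball loss is that same quantile, which is why minimizing the pinball loss trains a model to output quantiles. The standard pinball loss $\rho_\tau(u) = \max(\tau u, (\tau - 1)u)$ is assumed because the patent's own formula is not reproduced here.

```python
import numpy as np


def empirical_quantile(samples, tau):
    """Q(tau) = inf{y : F(y) >= tau} under the empirical CDF of samples."""
    ys = np.sort(np.asarray(samples, dtype=float))
    F = np.arange(1, ys.size + 1) / ys.size  # F(ys[i]) = (i + 1) / n
    return ys[np.searchsorted(F, tau)]       # first index with F >= tau


def pinball_loss(y, q, tau):
    """Average pinball (quantile) loss of the prediction q for observations y."""
    u = np.asarray(y, dtype=float) - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))


samples = np.arange(1.0, 11.0)  # 1, 2, ..., 10
tau = 0.25
losses = [pinball_loss(samples, q, tau) for q in samples]
print(empirical_quantile(samples, tau), samples[int(np.argmin(losses))])  # 3.0 3.0
```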

The quantile regression model is optimized by neural network backpropagation, with minimization of the pinball loss function as the criterion

where N is the number of predicted quantile levels, $X_i$ (i = 1, 2, …, N) are samples from the density function f(x), and

are the output values at the different quantiles. By continually adjusting W and b during learning, the influence of the explanatory variables on the conditional quantiles of the response variable at different quantile levels is measured. Once the optimal parameter vectors

and

are obtained, the optimal estimate of Y is

The quantiles of the predicted wind power generation at the different quantile levels are used as the input of a Gaussian kernel, and an estimate of the probability distribution is obtained by choosing an appropriate window width (bandwidth)

where h is the window width, and the Gaussian kernel function K(·) is expressed as
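The kernel density step can be sketched as follows. Given N predicted quantiles $q_1, \ldots, q_N$ for one hour, a Gaussian kernel estimate is $\hat f(y) = \frac{1}{Nh}\sum_i K\!\left(\frac{y - q_i}{h}\right)$ with $K(u) = e^{-u^2/2}/\sqrt{2\pi}$. The patent only says that an "appropriate" window width is chosen; Silverman's rule of thumb $h = 1.06\,\hat\sigma N^{-1/5}$ is used here as one common default, not as the patent's choice. The quantile values in the usage lines are invented for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kde(y_grid, quantile_preds, h):
    """Gaussian kernel density estimate built from predicted quantiles."""
    q = np.asarray(quantile_preds, dtype=float)
    u = (np.asarray(y_grid, dtype=float)[:, None] - q[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (q.size * h)


def silverman_bandwidth(q):
    """Rule-of-thumb window width (an assumed choice, not from the patent)."""
    q = np.asarray(q, dtype=float)
    return 1.06 * q.std(ddof=1) * q.size ** (-0.2)


# Illustrative quantile predictions for one hour; the values are invented.
q_pred = [4.1, 4.8, 5.2, 5.5, 5.9, 6.3, 7.0]
h = silverman_bandwidth(q_pred)
grid = np.linspace(-5.0, 15.0, 4001)
density = kde(grid, q_pred, h)
print(round(float(density.sum() * (grid[1] - grid[0])), 3))  # 1.0
```

The printed value is a Riemann-sum check that the estimated density integrates to one over a grid wide enough to cover all kernels.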

Beneficial effects of the invention: machine learning, and deep learning in particular, has advantages for short-term (hourly or daily) prediction of highly unstable wind power generation. To further improve the accuracy and performance of short-term wind power prediction, a recurrent highway network and a multi-layer spatio-temporal attention mechanism are introduced; they better represent the features of the learned data sequence, reduce the loss of input sequence information, and improve the ability to select and use information. To provide more valuable decision information about the prediction result, quantile regression is combined with a Gaussian kernel method to obtain the complete probability distribution interval of the short-term wind power generation prediction.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings, but the scope of protection of the invention is not limited thereto.

where $t \in \{1, 2, \ldots, T-1\}$, $x_t \in \mathbb{R}^n$ is the n-dimensional real vector of meteorological data at time t, and T is the time step of the target to be predicted;

The historical meteorological data mainly include wind speed at 2 m, 10 m and 50 m, roughness, ground-level radiation, atmospheric-level radiation, air temperature, air density, air pressure, onshore wind speed profile and offshore wind speed profile. Table 1 shows the hourly historical meteorological data of each meteorological point:

Table 1 Historical meteorological data sequence

Step 1.2, inputting the sequence into a convolutional neural network; after the convolution operation, a sequence representing the meteorological data features is obtained: $(w_1, w_2, \ldots, w_t, \ldots, w_{T-1})$

where $w_t \in \mathbb{R}^m$ is the processed m-dimensional real vector of meteorological features at time t;

After the convolution operation, a sequence representing the features of the historical meteorological data is obtained; Table 2 shows the meteorological feature data sequence:

Table 2 Meteorological feature data sequence

Step 2, training and learning the time-series-related features among the variables of the meteorological feature sequence by using a recurrent highway network: let $S' = S'(w, W_{S'})$, $T' = T'(w, W_{T'})$ and $C' = C'(w, W_{C'})$ denote the outputs of the nonlinear transformations by the tanh function S', the sigmoid function T' and the sigmoid function C', respectively, and let $w^{[t]}$ be the meteorological feature vector input at time t; the hidden state of the recurrent highway network is updated as

where

where $W_{S'}, W_{T'}, W_{C'} \in \mathbb{R}^{l \times m}$, $R_{S'k}, R_{C'k}, R_{T'k} \in \mathbb{R}^{l \times l}$ and $b_{Hk}, b_{C'k}, b_{T'k} \in \mathbb{R}^{l}$ are the weight matrices and bias units of the S', T' and C' transformations of the k-th layer, respectively, and the indicator function $I\{\cdot\}$ indicates that the feature vector $w^{[t]}$ participates in the computation only at layer 1 (k = 1) of the recurrent highway network;

indicates that all of the original input information is retained, and

indicates that all of the input information is transformed;

The data from January 1 to October 17 are used as the training set of the model and the data from October 18 to November 23 as the test set; after the relevant parameters are set, the updated internal hidden states are obtained, as shown in Table 3:

Table 3 Updated hidden state vectors
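The train/test split described above is chronological rather than random, so the model is never trained on hours that come after the test period. A minimal sketch with NumPy `datetime64` follows. The year is not stated in the text; 2020 is used purely as a hypothetical leap year, consistent with the 8784 hourly samples (366 × 24) reported for the generation data.

```python
import numpy as np

YEAR = 2020  # hypothetical: the text gives only month/day boundaries

# One timestamp per hour over the whole year.
hours = np.arange(
    np.datetime64(f"{YEAR}-01-01T00"),
    np.datetime64(f"{YEAR + 1}-01-01T00"),
    np.timedelta64(1, "h"),
)

# Training: January 1 to October 17; test: October 18 to November 23 (both inclusive).
train_mask = hours < np.datetime64(f"{YEAR}-10-18")
test_mask = (hours >= np.datetime64(f"{YEAR}-10-18")) & (hours < np.datetime64(f"{YEAR}-11-24"))

print(hours.size, train_mask.sum(), test_mask.sum())  # 8784 6984 888
```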

Let the hidden state vector of the k-th decoder layer at time T-1 be

$D_{T-1}$ is reshaped to obtain the query vector Query1; between Query1 and the hidden vector of the k-th encoder layer at time t ($1 < t \le T-1$)

the attention weight is expressed as

where $\mathrm{Query1} \in \mathbb{R}^p$, $V_k \in \mathbb{R}^l$, $T'_k \in \mathbb{R}^{l \times p}$, $U_k \in \mathbb{R}^{l \times l}$; $V_k$, $T'_k$ and $U_k$ are the nonlinear transformation matrices of the k-th layer, p is the dimension of the query vector, and the normalized attention weight is expressed as

The time-dimension local attention vector of the k-th layer is then obtained as

Table 4 shows the time-dimension attention vector after concatenation:

Table 4 Time-dimension attention vector

Step 3.2, screening the different encoded features with the spatio-temporal attention mechanism to obtain the network-layer-dimension attention vector:

For the hidden state vector of the k-th decoder layer at time T-1,

a reshape operation yields the query vector Query2; the attention weight between Query2 and

is expressed as

where $\mathrm{Query2} \in \mathbb{R}^p$, $V_t \in \mathbb{R}^l$, $T'_t \in \mathbb{R}^{l \times p}$, $U_t \in \mathbb{R}^{l \times l}$; $V_t$, $T'_t$ and $U_t$ are the nonlinear transformation matrices for time t, and the normalized attention weight is expressed as

Table 5 shows the attention vector obtained by concatenating the network-layer local attention vectors:

Table 5 Network-layer-dimension attention vector

Index	0	1	2	3	4	…	253	254	255
0	0.1864	0.1060	-0.1316	-0.0669	0.0009	…	0.1204	0.0836	-0.0913
1	0.1980	0.0854	-0.1102	-0.0839	-0.0078	…	0.1217	0.0739	-0.1037
…	…	…	…	…	…	…	…	…	…
334	0.1997	0.0821	-0.1207	-0.0778	-0.0080	…	0.1180	0.0839	-0.1028
335	0.2072	0.0967	-0.0942	-0.0817	-0.0165	…	0.1171	0.0812	-0.1058

$(y_1, y_2, \ldots, y_t, \ldots, y_{T-1})$

The historical wind power generation data consist of 8784 hourly samples spanning one year; Table 6 shows part of the sample data:

Table 6 Historical wind power generation data

and

denote the weight matrices of the fully connected layer, d is the weight matrix dimension, and

is a bias unit;

Step 4.3, updating the hidden state vector of the encoder:

The obtained

where

denote the weight matrices of the different layers of the recurrent highway network in the encoder stage,

denote the different bias units, and q is the weight matrix dimension;

where

The prediction results of the method at time T for different numbers of network layers are evaluated with the mean absolute percentage error (MAPE) and the root mean square error (RMSE); the results are shown in Table 7, and the point prediction results are shown in Table 8:

Table 7 Prediction results

Table 8 Point prediction results
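MAPE and RMSE, as used for Tables 7 and 8, can be computed with the standard definitions below. MAPE is undefined for hours with zero actual generation, which are common in wind data. The sketch therefore skips such hours; that exclusion rule is an assumption, not something stated in the text.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, skipping zero actuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    nz = y_true != 0  # assumption: zero-generation hours are excluded
    return float(100.0 * np.mean(np.abs((y_true[nz] - y_pred[nz]) / y_true[nz])))


def rmse(y_true, y_pred):
    """Root mean square error in the units of the data."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


print(mape([100, 200, 0], [110, 180, 5]), rmse([100, 200], [110, 180]))
```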

If the distribution function of the random variable Y is $F(y) = P(Y \le y)$, its τ-th quantile can be defined as

$Q(\tau) = \inf\{y : F(y) \ge \tau\}, \quad \tau \in (0, 1)$

and

then the optimal estimate of Y is

The quantiles of the predicted wind power generation at the different quantile levels are used as the input of a Gaussian kernel, and an estimate of the probability distribution is obtained by choosing an appropriate window width (bandwidth)

The prediction interval coverage probability (PICP) and the prediction interval normalized average width (PINAW) are selected as the evaluation indices for interval prediction. The wind power generation over one complete week, from November 3 to November 10, is predicted, and prediction intervals are constructed at the 85%, 90% and 95% confidence levels; the results are shown in Table 9:

Table 9 Comparison of interval prediction indices

Confidence level	PICP	PINAW
85%	0.997	0.078
90%	1	0.097
95%	1	0.116

As Table 9 shows, the algorithm achieves a high PICP and a small PINAW at the 85%, 90% and 95% confidence levels. This indicates that it can provide narrow, high-coverage interval information for the wind power generation forecast, which widens the decision space for power grid operation.
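PICP is the fraction of observations that fall inside their prediction interval, and PINAW is the mean interval width normalized by the range of the observed target. The sketch below implements both under these standard definitions. Normalizing by the range of the evaluated observations is an assumption, since the text does not state which range was used.

```python
import numpy as np


def picp(y, lower, upper):
    """Prediction interval coverage probability."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def pinaw(y, lower, upper):
    """Prediction interval normalized average width (normalized by the range of y)."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))


y = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.5, 3.5]
upper = [1.5, 3.5, 3.5, 4.5]
print(picp(y, lower, upper), pinaw(y, lower, upper))  # 0.75 0.3333333333333333
```

A good interval forecast pairs a PICP at or above the nominal confidence level with a PINAW that is as small as possible; the two indices pull in opposite directions, which is why Table 9 reports both.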

Claims

1. A method for predicting the power generation of a wind power plant based on a machine learning algorithm, characterized by comprising the following steps:

step 1: acquiring and inputting meteorological historical data to obtain a vector capable of representing meteorological data characteristics;

step 2: training and learning the time-series-related features among the variables of the meteorological feature sequence by using a recurrent highway network;

step 3: re-screening the feature vectors along different dimensions through the encoding operation of a recurrent highway network encoder and a multi-layer spatio-temporal attention mechanism to obtain a time-dimension attention vector and a network-layer-dimension attention vector, respectively;

step 4: acquiring and inputting historical wind power generation data, and obtaining the wind power generation prediction through the decoding operation of a recurrent highway network decoder and dimension matching in a fully connected layer;

step 5: calculating a confidence interval for the wind power generation prediction.

2. The method for predicting the power generation of a wind power plant based on a machine learning algorithm according to claim 1, characterized in that the specific operation of step 1 comprises the following steps:

step 1.1, acquiring historical meteorological data to obtain a historical meteorological data sequence that can be input into a convolutional neural network for feature extraction: $(x_1, x_2, \ldots, x_t, \ldots, x_{T-1})$, where $t \in \{1, 2, \ldots, T-1\}$, $x_t \in \mathbb{R}^n$ is the n-dimensional real vector of meteorological data at time t, and T is the time step of the target to be predicted;

step 1.2, inputting the sequence into a convolutional neural network and obtaining, after the convolution operation, the sequence $(w_1, w_2, \ldots, w_t, \ldots, w_{T-1})$ that represents the meteorological data features, where $w_t \in \mathbb{R}^m$ is the processed m-dimensional real vector of meteorological features at time t.

3. The method for predicting the power generation of a wind power plant based on a machine learning algorithm according to claim 2, characterized in that the specific operation of step 2 comprises: let $S' = S'(w, W_{S'})$, $T' = T'(w, W_{T'})$ and $C' = C'(w, W_{C'})$ denote the outputs of the nonlinear transformations by the tanh function S', the sigmoid function T' and the sigmoid function C', respectively, and let $w^{[t]}$ be the meteorological feature vector input at time t; the hidden state of the recurrent highway network is updated as

where

is the l-dimensional hidden state vector output by the recurrent highway network unit of the k-th layer at time t, $k \in \{1, 2, \ldots, K\}$, K is the number of network layers, and for k = 0 it is specified that

where $W_{S'}, W_{T'}, W_{C'} \in \mathbb{R}^{l \times m}$, $R_{S'k}, R_{C'k}, R_{T'k} \in \mathbb{R}^{l \times l}$ and $b_{Hk}, b_{C'k}, b_{T'k} \in \mathbb{R}^{l}$ are the weight matrices and bias units of the S', T' and C' transformations of the k-th layer, respectively, and the indicator function $I\{\cdot\}$ indicates that the feature vector $w^{[t]}$ participates in the computation only at layer 1 of the recurrent highway network;

indicates that all of the original input information is retained, and

indicates that all of the input information is transformed.

4. The method for predicting the power generation of a wind power plant based on a machine learning algorithm according to claim 3, characterized in that the specific operation of step 3 comprises the following steps:

let the hidden state vector of the k-th decoder layer at time T-1 be

$D_{T-1}$ is reshaped to obtain the query vector Query1; between Query1 and the hidden vector of the k-th encoder layer at time t

the attention weight is expressed as

the time-dimension local attention vector of the k-th layer is obtained as

the local attention vectors of all layers are concatenated to obtain the time-dimension attention vector

for the hidden state vector of the k-th decoder layer at time T-1,

the corresponding attention weight is expressed as

the network-layer-dimension local attention vector at time t is obtained as

the local attention vectors of all layers are concatenated to obtain the network-layer-dimension attention vector

5. The method for predicting the power generation of a wind power plant based on a machine learning algorithm according to claim 4, characterized in that the specific operation of step 4 comprises the following steps:

step 4.1, acquiring and inputting historical wind power generation data to obtain the generation data sequence:

$(y_1, y_2, \ldots, y_t, \ldots, y_{T-1})$

where

and

is a bias unit;

step 4.3, updating the hidden state vector of the encoder:

the obtained

is used as the input of the k-th layer of the recurrent highway network, and the encoder state vector is updated as

where

denote the different bias units, and q is the weight matrix dimension;

W, V and H are learnable weight matrices applied together with the hidden state vector of the K-th decoder layer at time T-1, and b is a bias unit.

6. The method for predicting the power generation of a wind power plant based on a machine learning algorithm according to claim 5, characterized in that the specific operation of step 5 is as follows: if the distribution function of the random variable Y is $F(y) = P(Y \le y)$, its τ-th quantile is defined as

$Q(\tau) = \inf\{y : F(y) \ge \tau\}, \quad \tau \in (0, 1)$

the quantile regression model is optimized by neural network backpropagation, with minimization of the pinball loss function as the criterion

where N is the number of predicted quantile levels, $X_i$ is a sample of the density function f(x), i = 1, 2, …, N, and

are the output values at the different quantiles; by continually adjusting W and b during learning, the influence of the explanatory variables on the conditional quantiles of the response variable at different quantile levels is measured, and the optimal parameter vectors

and

are obtained; the optimal estimate of Y is then

the quantiles of the predicted wind power generation at the different quantile levels are used as the input of a Gaussian kernel, and the probability distribution is estimated as