CN114912577A - Wind power plant short-term wind speed prediction method combining VMD and attention mechanism - Google Patents

Info

Publication number: CN114912577A
Application number: CN202210425233.0A
Authority: CN (China)
Legal status: Pending
Inventors: 季培远, 赵英男, 陈飞, 季冠岚
Original and current assignee: Nanjing University of Information Science and Technology
Application filed by Nanjing University of Information Science and Technology

Classifications

    • G06N3/045 — Combinations of networks
    • G06N3/048 — Activation functions
    • G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q50/06 — Energy or water supply
    • Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a wind power plant short-term wind speed prediction method combining VMD (variational mode decomposition) and an attention mechanism. The method first obtains spatio-temporal wind speed data of a target station and establishes a series of spatial wind speed matrices (SWSMs); the SWSMs are decomposed by VMD into sub-SWSMs, from which an SEnet model combining CNN with an attention mechanism extracts spatial features; the GRU model at the top layer then extracts the wind speed time-domain features and obtains the respective prediction results; all prediction results are accumulated to obtain the final predicted wind speed. The method makes full use of the spatio-temporal correlation of the wind speed, combines VMD with the attention mechanism to improve on the non-stationary character of the original wind speed, and optimizes the CNN-GRU model with the attention mechanism so that the model more easily captures long-range interdependent features in the sequence, effectively improving the wind speed prediction accuracy and ensuring reliable operation of the power system.

Description

Wind power plant short-term wind speed prediction method combining VMD and attention mechanism
Technical Field
The invention relates to the technical field of new energy power generation and deep learning, and in particular to a wind power plant short-term wind speed prediction method combining VMD (variational mode decomposition) and an attention mechanism.
Background
Demand for renewable energy as a solution to future energy shortages is increasing, and many conventional power generation systems are being replaced by renewable energy systems. Wind energy, as one of the most promising, practical, abundant and environmentally friendly renewable resources in the world, has gained wide attention and utilization worldwide. Further development of wind power generation technology is therefore required.
Accurate short-term wind speed prediction is of great significance for the operation and control of the power system: it helps to reasonably schedule wind power integration, reduces the voltage and frequency fluctuations caused by wind power variation, and improves the operational reliability of the power grid. At present, wind speed prediction technologies can be divided into three categories: physical models, statistical models and artificial intelligence models. The physical model is represented by numerical weather prediction, which uses real-time weather conditions for forecasting; because the modeling process requires a large amount of computation, it is generally used for long-term wind speed prediction over a specific area and is not suitable for short-term and ultra-short-term wind speed prediction. The statistical method learns a nonlinear mapping over historical wind speed data and realizes time series prediction. Artificial intelligence models are based on machine learning techniques; they describe the complex nonlinear relationship between system input and output from a large amount of wind speed time series data. With the rapid development of deep learning, deep learning techniques have also been quickly applied to short-term wind speed prediction; these methods combine existing wind speed prediction techniques with hybrid neural network models and obtain good prediction results.
Wind power generation is intermittent, volatile and uncertain. In practical applications, a decomposition method with reasonably controlled convergence conditions is therefore usually applied to obtain relatively stable subsequences. In addition, when the input time sequence is long, networks such as LSTM and GRU easily lose sequence information and have difficulty modeling the structural information between data, which also affects the accuracy of wind speed prediction.
Disclosure of Invention
The invention discloses a wind power plant short-term wind speed prediction method combining VMD (variational mode decomposition) and an attention mechanism, and aims to solve the technical problem that, when the input time sequence is long, networks such as LSTM and GRU easily lose sequence information and have difficulty modeling the structural information between data, which affects the accuracy of wind speed prediction.
In order to achieve the purpose, the invention adopts the following technical scheme:
A wind power plant short-term wind speed prediction method combining VMD and an attention mechanism specifically comprises the following steps:
Step 1: establishing a wind speed data set with two dimensions, time and space;
Step 2: establishing a series of SWSMs from the original data, the SWSMs comprising the time-domain and space-domain characteristics of the wind speed data set;
Step 3: performing wind speed decomposition on the SWSM of each time sequence by using the VMD to obtain sub-SWSMs consisting of the IMFs;
Step 4: combining the CNN model with the attention mechanism to obtain the SEnet model;
Step 5: applying the SEnet model obtained in step 4 to each sub-SWSM to extract the spatial-domain features of the wind speed;
Step 6: processing the spatial-domain features obtained in step 5 with a GRU model to extract time-domain features and obtain each prediction component of the wind speed;
Step 7: giving different weights to the input features through an attention layer based on the attention mechanism;
Step 8: combining the prediction results to obtain the final predicted wind speed.
In a preferred scheme, in step 1, the wind speed data set is established to include two dimensions, time and space. For the original wind speed data set, the wind speed at the predicted time and position is set as the label wind speed, and the data set and the label wind speeds are then proportionally divided, in time order, into a training set, a validation set and a test set.
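The chronological split described in this preferred scheme can be sketched as follows. This is a minimal illustration: the 70/15/15 ratio and the helper name `chronological_split` are assumptions for the example, not values fixed by the patent.

```python
import numpy as np

def chronological_split(samples, labels, ratios=(0.7, 0.15, 0.15)):
    """Split samples/labels into train/val/test sets in time order
    (no shuffling, so the test set is strictly later than training data).
    The ratio 0.7/0.15/0.15 is illustrative only."""
    n = len(samples)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return ((samples[:n_train], labels[:n_train]),
            (samples[n_train:n_train + n_val], labels[n_train:n_train + n_val]),
            (samples[n_train + n_val:], labels[n_train + n_val:]))

# toy example: 100 time-ordered samples; the label is the wind speed
# at the next (predicted) time step
X = np.arange(100)
y = np.arange(100) + 1
train_set, val_set, test_set = chronological_split(X, y)
print(len(train_set[0]), len(val_set[0]), len(test_set[0]))  # 70 15 15
```

Splitting in time order rather than at random keeps the evaluation honest for forecasting: the model is always validated and tested on data that lies after its training window.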
In a preferred embodiment, in step 2, the SWSM is established as follows:
Assume the object of study is an array of M rows and N columns over a spatial region, which can be represented by an M × N grid; the position of each site in the array is indexed by two-dimensional rectangular coordinates (i, j) (1 ≤ i ≤ M, 1 ≤ j ≤ N), and for each site the wind speed is a one-dimensional time series. At time t, the spatial wind speed matrix (SWSM) over the sites, $X_t \in \mathbb{R}^{M \times N}$ with entries $x_t(i,j)$, is defined as

$$X_t = \begin{pmatrix} x_t(1,1) & x_t(1,2) & \cdots & x_t(1,N) \\ x_t(2,1) & x_t(2,2) & \cdots & x_t(2,N) \\ \vdots & \vdots & \ddots & \vdots \\ x_t(M,1) & x_t(M,2) & \cdots & x_t(M,N) \end{pmatrix}$$

The wind speed sequences are converted to SWSMs by the above method.
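The conversion from per-site series to SWSMs can be sketched as below. The function name and the dictionary-based input layout are assumptions for illustration; the patent only fixes the resulting M × N matrix per time step.

```python
import numpy as np

def build_swsms(site_series, M, N):
    """Assemble SWSMs from per-site one-dimensional wind speed series.

    site_series: dict mapping site coordinates (i, j), 1 <= i <= M and
                 1 <= j <= N, to a length-T wind speed sequence.
    Returns an array of shape (T, M, N): X[t] is the SWSM at time t,
    with X[t][i-1, j-1] = x_t(i, j).
    """
    T = len(next(iter(site_series.values())))
    X = np.empty((T, M, N))
    for (i, j), series in site_series.items():
        X[:, i - 1, j - 1] = series
    return X

# toy 2x2 array of sites, 3 time steps each (values in m/s)
sites = {(1, 1): [5.0, 5.2, 5.1], (1, 2): [4.8, 4.9, 5.0],
         (2, 1): [6.1, 6.0, 5.9], (2, 2): [5.5, 5.6, 5.4]}
swsm = build_swsms(sites, M=2, N=2)
print(swsm.shape)  # (3, 2, 2): one 2x2 SWSM per time step
```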
In a preferred embodiment, in step 3, the main steps of the VMD decomposition are as follows:
S21: first, construct the variational problem: each decomposed sequence should be a modal component with limited bandwidth around a center frequency, while the sum of the estimated bandwidths of all modes is minimized. Denoting the preprocessed wind speed signal by $f(t)$, the corresponding constrained variational expression is

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where K is the number of modes to be decomposed (a positive integer), $\{u_k\}$ and $\{\omega_k\}$ are the k-th modal component and its center frequency after decomposition, $\delta(t)$ is the Dirac function, and $*$ denotes convolution;
S22: to solve S21, introduce a Lagrangian multiplier $\lambda$ and convert the constrained problem into an unconstrained one, obtaining the augmented Lagrangian expression

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\ f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where $\alpha$ is a penalty factor used to reduce the influence of Gaussian noise;
S23: finally, solve the unconstrained variational problem with the alternating direction method of multipliers (ADMM) iterative algorithm, obtain each modal component and center frequency by optimization, search for the saddle point of the augmented Lagrangian function, and iteratively update the parameters $\{u_k\}$, $\{\omega_k\}$ and $\lambda$:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \gamma \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right)$$

where $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$ and $\hat{u}_k^{n+1}(\omega)$ are the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$ and $u_k^{n+1}(t)$ respectively, n is the iteration number, and $\gamma$ is a noise tolerance used to meet the fidelity requirement of the signal decomposition;
S24: finally, for a given decision accuracy $e > 0$, the iteration stops when

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < e$$

is satisfied; otherwise it returns to S23. Finally, the K decomposed IMF components are obtained. Decomposing each SWSM by the VMD method yields sub-SWSMs composed of the IMFs: at time t, the sub-SWSM formed by the component $\mathrm{IMF}_k$ of the sites is defined as

$$X_t^{(k)} = \begin{pmatrix} \mathrm{IMF}_k^t(1,1) & \cdots & \mathrm{IMF}_k^t(1,N) \\ \vdots & \ddots & \vdots \\ \mathrm{IMF}_k^t(M,1) & \cdots & \mathrm{IMF}_k^t(M,N) \end{pmatrix}$$
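The ADMM loop of S21–S24 can be sketched compactly in the frequency domain. This is a deliberately simplified sketch, not a production VMD: it omits boundary mirroring, tracks center frequencies on the positive half-spectrum only, uses an aggregate convergence check instead of the per-mode sum, and with `gamma=0` the Lagrangian update is disabled. All parameter defaults are illustrative assumptions.

```python
import numpy as np

def vmd(f, K=3, alpha=2000.0, gamma=0.0, e=1e-6, max_iter=500):
    """Minimal VMD sketch implementing the ADMM updates of S21-S24."""
    T = len(f)
    omega_grid = np.fft.fftfreq(T)                    # frequency axis (cycles/sample)
    f_hat = np.fft.fft(np.asarray(f, dtype=float))
    u_hat = np.zeros((K, T), dtype=complex)           # mode spectra
    omega = np.linspace(0.0, 0.5, K, endpoint=False)  # initial center frequencies
    lam_hat = np.zeros(T, dtype=complex)              # Lagrangian multiplier spectrum
    half = slice(0, T // 2)                           # positive half-spectrum
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k (first S23 formula)
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1.0 + 2.0 * alpha * (omega_grid - omega[k]) ** 2)
            # power-weighted mean -> new center frequency (second S23 formula)
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(omega_grid[half] * power) / (power.sum() + 1e-14)
        # dual ascent on the reconstruction constraint (third S23 formula)
        lam_hat = lam_hat + gamma * (f_hat - u_hat.sum(axis=0))
        # aggregate version of the S24 stopping criterion
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-14)
        if diff < e:
            break
    return np.real(np.fft.ifft(u_hat, axis=1)), omega  # IMFs, center frequencies

# decompose a two-tone test signal into K=2 modes
t = np.linspace(0, 1, 256, endpoint=False)
sig = np.sin(2 * np.pi * 4 * t) + 0.5 * np.sin(2 * np.pi * 24 * t)
imfs, centers = vmd(sig, K=2, alpha=500.0)
print(imfs.shape)  # (2, 256)
```

Each row of `imfs` is one band-limited component; applying this per site and per time window is what turns an SWSM series into the sub-SWSM series used downstream.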
In a preferred scheme, in step 4, the core idea of combining CNN with attention, as used in SEnet, is to automatically learn the importance of each feature channel and then, according to that importance, promote useful features and suppress features that are of little use to the current task. This function is implemented by SE blocks: the SEnet layer comprises a convolution layer, two SE block layers and then another convolution layer, and each SE block contains a convolution layer, a global average pooling layer, two activation layers and a fusion layer. The SEnet layer is constructed as follows:
S41: first, $F_{tr}$ is a transformation operation, a standard convolution with input X and output U, defined as

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s$$

where $v_c$ denotes the c-th convolution kernel, $x^s$ the s-th input channel, $*$ the convolution operation, $u_c$ the c-th 2D matrix in the 3D matrix U, and $v_c^s$ the 2D spatial kernel of $v_c$ acting on the corresponding channel of X;
S42: next is the Squeeze operation, which is in fact a global average pooling operation that compresses the spatial features, converting the W × H × C input into a 1 × 1 × C output, where W denotes the width of a channel, H its height, and C the number of channels:

$$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j)$$

S43: then comes the Excitation operation:

$$s = \sigma(g(z, W)) = \sigma\left(W_2\, \delta(W_1 z)\right)$$

where $W_1 z$ is a fully connected layer operation; $W_1$ has dimension C/r × C, with r a scaling parameter, and since z has dimension 1 × 1 × C the result $W_1 z$ has dimension 1 × 1 × C/r; $\delta$ is the ReLU function and does not change the dimension of its output; the result is then multiplied by $W_2$, also a fully connected layer operation, and since $W_2$ has dimension C × C/r the output dimension is 1 × 1 × C; finally $\sigma$, the sigmoid function, yields the weight matrix s;
S44: finally, the 3D matrix U is re-weighted through the weight matrix s:

$$H_S = s_c \cdot u_c$$

where $u_c$ is a two-dimensional matrix and $s_c$ is a weight; this equation multiplies each value in the matrix $u_c$ by $s_c$.
In a preferred scheme, this model is used to extract the spatial-domain features of the wind speed. The sub-SWSM matrices obtained by VMD decomposition are input to the SEnet layer. In the convolution layer, a convolution kernel convolves and activates the input image to obtain the feature map of the convolution block. This feature map is input to the convolution layer in the SE block, the two activation layers assign weights to the channels of the convolution kernels, and global average pooling produces a new feature map. The new feature map is flattened to reduce its dimension while keeping the spatial features; the reduced feature map can then be used as the input of the GRU layer.
In a preferred scheme, in step 6, feature extraction on the time series is implemented by the GRU layer. The GRU has two gates: a reset gate, which determines how to combine the new input information with the previous memory, and an update gate, which defines how much of the previous memory is kept at the current time step. The two gating vectors determine which information is ultimately used as the output of the gated recurrent unit; the unit can hold information over long sequences without it being cleared over time or removed for being irrelevant to the prediction.
The gates and activation functions of the GRU are computed as follows:
S61: activation function Sigmoid:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

S62: activation function tanh:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

S63: update gate: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$
S64: reset gate: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$
S65: new memory (using the reset gate):

$$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])$$

S66: output value:

$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

where $\sigma$ is the Sigmoid activation function, tanh is the tanh activation function, $z_t$ and $r_t$ are the update gate and reset gate respectively, $x_t$ is the input, $h_t$ is the output of the hidden layer, and $\tilde{h}_t$ summarizes the input $x_t$ and the past hidden layer state $h_{t-1}$; $W_z$, $W_r$ and W are the weights of the update gate, reset gate and candidate output respectively.
In a preferred scheme, in step 7, an attention layer is added after the GRU layer to give different weights to the model's input features, strengthening the influence of important information so as to avoid the problem of long-range information loss and making it easier for the model to capture long-range interdependent features in the sequence. The input of the attention layer is the activated output vector $h_t$ of the GRU layer; the probabilities corresponding to the different feature vectors are computed according to the weight allocation principle, and a better weight parameter matrix is obtained by continuous updating and iteration. The weight coefficients of the attention layer are computed as

$$e_t = u \tanh(w h_t + b)$$

$$a_t = \frac{\exp(e_t)}{\sum_{j=1}^{t} \exp(e_j)}$$

$$s_t = \sum_{i=1}^{t} a_i h_i$$

where $e_t$ is the attention probability distribution value determined by the output vector $h_t$ of the GRU network layer at time t, u and w are weight coefficients, b is a bias coefficient, and the output of the attention layer at time t is denoted $s_t$.
In a preferred embodiment, in step 8, the prediction results are combined in the output layer and computed by the fully connected layer, giving the output $Y = [y_1, y_2, \cdots, y_m]^T$ with prediction step size m. During prediction, an early stopping mechanism monitors the model: when the training error has not improved within a certain number of training epochs, training stops; otherwise training continues until the originally set number of epochs finishes. The prediction formula is

$$y_t = \mathrm{Sigmoid}(w_o s_t + b_o)$$

where $y_t$ is the predicted output value at time t, $w_o$ is a weight matrix, $b_o$ is a bias vector, and the activation function is the Sigmoid function.
Compared with the prior art, the invention has the following advantages:
(1) The VMD method is used to process the wind speed data, converting the non-stationary wind speed sequence into relatively stable subsequences and improving the wind speed prediction accuracy.
(2) Considering that irrelevant features in the data can degrade the model's performance, the attention mechanism is used to reallocate the feature weights and improve the performance of the model.
(3) The underlying architecture of the algorithm adopts a CNN-GRU model, which can process the spatio-temporal characteristics of the wind speed and predict the wind speed using its spatio-temporal correlation, improving the prediction accuracy.
Drawings
FIG. 1 is a wind speed prediction flow chart of a wind farm short-term wind speed prediction method combining a VMD and an attention mechanism according to the present invention.
FIG. 2 is a schematic diagram of a wind speed sequence converted into SWSM according to the wind farm short-term wind speed prediction method combining a VMD and an attention mechanism.
FIG. 3 is a schematic diagram of a VMD decomposed SWSM of a wind farm short-term wind speed prediction method combining a VMD and an attention mechanism according to the present invention.
Fig. 4 is an architecture diagram of a GRU network of a wind farm short-term wind speed prediction method combining a VMD and an attention mechanism according to the present invention.
FIG. 5 is a structural diagram of SENET of a wind farm short-term wind speed prediction method combining a VMD and an attention mechanism.
FIG. 6 is a schematic diagram of a wind speed prediction result with a prediction time of 20 minutes of the wind farm short-term wind speed prediction method combining a VMD and an attention mechanism provided by the invention.
FIG. 7 is a comparison graph of different model prediction errors RMSE (m/s) of the wind power plant short-term wind speed prediction method combining the VMD and the attention mechanism.
FIG. 8 is a comparison graph of different model prediction errors MAPE (m/s) of the wind farm short-term wind speed prediction method combining the VMD and the attention mechanism.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in FIGS. 1-8, the wind farm short-term wind speed prediction method combines VMD and an attention mechanism, adopting variational mode decomposition and an attention mechanism to improve the accuracy of wind speed prediction. First, the wind speed is denoised with the variational mode decomposition technique to obtain an optimized wind speed matrix. Then, using the wind speeds at the same time instant across the wind farm, the spatial information is converted into visual information through a gray image and processed with a deep convolutional neural network well suited to visual information; spatial features are extracted using an SEnet layer, formed by combining CNN with an attention mechanism, as a reinforcing network. Next, the GRU is combined with an attention mechanism to extract temporal features. Finally, the wind speed prediction result is obtained. The method comprises the following steps:
step 1: and establishing a wind speed data set with two dimensions of time and space.
Step 2: a series of SWSMs are built from the raw data, the SWSMs including time-domain and space-domain characteristics of the wind speed data set.
And step 3: and performing wind speed decomposition on the SWSM on each time sequence by using the VMD, and decomposing to obtain sub-SWSM consisting of the IMFs.
And 4, step 4: combining the CNN model with the attention mechanism to obtain the SENEt model.
And 5: and (4) aiming at each sub SWSM, applying the step 4 to obtain a SEnet model, and extracting the spatial domain characteristics of the wind speed.
Step 6: and (5) processing the spatial domain characteristics obtained in the step (5) by applying a GRU model, extracting time domain characteristics and obtaining each prediction component of the wind speed.
And 7: the features of the input are given different weights by the attention layer based on the attention mechanism.
And 8: and combining the prediction results and obtaining the final predicted wind speed.
In a preferred embodiment, in step 1, the established wind speed data set comprises both time and space dimensions. For the original wind speed data set, the wind speed at the predicted time and position is set as the label wind speed; the data set and label wind speeds are then proportionally divided, in time order, into a training set, a validation set and a test set.
As shown in detail in fig. 3, in a preferred embodiment, in step 2, the SWSM is established by the following method: assume the object of study is an array of M rows and N columns over a spatial region, which can be represented by an M × N grid; the position of each site in this array is indexed by two-dimensional rectangular coordinates (i, j) (1 ≤ i ≤ M, 1 ≤ j ≤ N), and the wind speed is a one-dimensional time series for each site. Thus, at time t, the spatial wind speed matrix (SWSM) over the sites, $X_t \in \mathbb{R}^{M \times N}$ with entries $x_t(i,j)$, is defined as

$$X_t = \begin{pmatrix} x_t(1,1) & \cdots & x_t(1,N) \\ \vdots & \ddots & \vdots \\ x_t(M,1) & \cdots & x_t(M,N) \end{pmatrix}$$
In a preferred embodiment, in step 3, the main steps of the VMD decomposition are as follows:
(1) First, construct the variational problem: each decomposed sequence should be a modal component with limited bandwidth around a center frequency, while the sum of the estimated bandwidths of all modes is minimized. Denoting the preprocessed wind speed signal by $f(t)$, the corresponding constrained variational expression is

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where K is the number of modes to be decomposed (a positive integer), $\{u_k\}$ and $\{\omega_k\}$ are the k-th modal component and its center frequency after decomposition, $\delta(t)$ is the Dirac function, and $*$ denotes convolution;
(2) To solve (1), introduce a Lagrangian multiplier $\lambda$ and convert the constrained problem into an unconstrained one, obtaining the augmented Lagrangian expression

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\ f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where $\alpha$ is a penalty factor used to reduce the effect of Gaussian noise.
(3) Finally, solve the unconstrained variational problem with the alternating direction method of multipliers (ADMM) iterative algorithm, obtain each modal component and center frequency by optimization, search for the saddle point of the augmented Lagrangian function, and iteratively update the parameters $\{u_k\}$, $\{\omega_k\}$ and $\lambda$:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \gamma \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right)$$

where $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$ and $\hat{u}_k^{n+1}(\omega)$ are the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$ and $u_k^{n+1}(t)$ respectively; n is the number of iterations; $\gamma$ is a noise tolerance for meeting the fidelity requirements of the signal decomposition.
(4) Finally, for a given decision accuracy $e > 0$, the iteration stops if

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < e$$

is satisfied; otherwise it returns to step (3). Finally, the K decomposed IMF components are obtained.
Decomposing each SWSM by the VMD method yields sub-SWSMs composed of the IMFs: at time t, the sub-SWSM formed by the component $\mathrm{IMF}_k$ of the sites is defined as

$$X_t^{(k)} = \begin{pmatrix} \mathrm{IMF}_k^t(1,1) & \cdots & \mathrm{IMF}_k^t(1,N) \\ \vdots & \ddots & \vdots \\ \mathrm{IMF}_k^t(M,1) & \cdots & \mathrm{IMF}_k^t(M,N) \end{pmatrix}$$
In a preferred embodiment, in the step 4, the core idea of CNN and attention combination used in the SENet model is to automatically acquire the importance degree of each feature channel by means of learning, and then to promote useful features and suppress features that are not useful for the current task according to the importance degree. This function is implemented by the SE block. Whereas in the sense layer one convolutional layer is included, two SE block layers are included, followed by another convolutional layer. In each SE block, a convolutional layer, a global averaging pooling layer, two active layers, and a fusion layer are provided. The structure of the SE block is shown in fig. 4.
The following is the procedure for the SENet layer construction:
(1) first F tr Is a conversion operation, which is a standard convolution operation with an input of X and an output of U, and its defining formula is as follows:
Figure BDA0003608236460000141
in the formula, v c Denotes the c-th convolution kernel, X s Represents the s-th input, represents the convolution operation. u. of c Represents the c-th 2D matrix in the 3D matrix U,
Figure BDA0003608236460000143
then the 2D spatial kernel for the X corresponding channel, which represents v c Of the single channel of (a).
(2) Then is the Squeeze operation, which is actually a global average pooling operation to compress the spatial features, converting the W × H × C input to a 1 × 1 × C output, where W denotes the width of the channel and H denotes the height of the channel, for a total of C channels. The Squeeze operation formula is as follows:
Figure BDA0003608236460000142
(3) the following is the Excitation operation, which has the following formula:
s=σ(g(z,W))=σ(W 2 δ(W 1 z))
in the formula, W 1 z is a full link layer operation, W 1 Is C/r C, where r is a scaling parameter, and W is the z dimension 1C 1 The result of z is 1 x 1C/r; δ is the ReLU function, and does not change the dimensionality of the output; then W is further mixed with 2 Multiply by and W 2 The multiplication process is also a full link layer process, W 2 The dimension of (d) is C/r, so the output dimension is 1C; finally, sigma, namely a sigmoid function, is used to obtain a weight matrix s.
(4) Finally, the 3D matrix U is reweighted by the weight matrix s according to the formula:

H_S = s_c · u_c

In the formula, u_c is a two-dimensional matrix and s_c is a weight; this equation multiplies each value in the matrix u_c by s_c.
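Under the definitions above, the Squeeze, Excitation, and reweighting steps can be sketched in a few lines of NumPy (a minimal illustration only: the function name, the random weights, and the channel-first layout are assumptions; the reduction ratio r and the shapes of W_1 and W_2 follow the formulas):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(U, W1, W2):
    """Squeeze-and-Excitation over a feature map U of shape (C, H, W)."""
    # Squeeze: global average pooling gives one descriptor z_c per channel
    z = U.mean(axis=(1, 2))                    # shape (C,)
    # Excitation: s = sigmoid(W2 · ReLU(W1 · z)); W1: (C/r, C), W2: (C, C/r)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))  # shape (C,), each s_c in (0, 1)
    # Reweight: multiply every value of the 2D matrix u_c by its weight s_c
    return U * s[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
U = rng.standard_normal((C, H, W))
out = se_block(U, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
print(out.shape)  # (8, 4, 4)
```

Because each s_c lies strictly between 0 and 1, channels judged unimportant are attenuated while important ones are largely preserved.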
According to the SENet model described in step 4, the spatial features of the wind speed are extracted with this model. The sub-SWSM matrices obtained by VMD decomposition are input to the SENet layer, and the convolution kernels in the convolutional layer perform convolution and activation on the input to obtain the feature map of the convolution block. This feature map is input to the convolutional layer in the SE block, the two activation layers assign weights to the channels of the convolution kernels, and global average pooling yields a new feature map. The new feature map is flattened to reduce its dimensionality while preserving the spatial features; the reduced feature map serves as the input of the GRU layer.
In a preferred embodiment, in step 6, feature extraction along the time series is implemented by the GRU layer. The GRU has two gates: a reset gate, which determines how new input information is combined with the previous memory, and an update gate, which defines how much of the previous memory is carried to the current time step. These two gating vectors determine which information is ultimately output by the gated recurrent unit; they allow information in long sequences to be retained rather than cleared over time or removed for being irrelevant to the prediction.
The gates and activation functions of the GRU are calculated as follows:
(1) Activation function Sigmoid:

σ(x) = 1 / (1 + e^{-x})

(2) Activation function tanh:

tanh(x) = (e^x − e^{-x}) / (e^x + e^{-x})

(3) Update gate: z_t = σ(W_z · [h_{t-1}, x_t])

(4) Reset gate: r_t = σ(W_r · [h_{t-1}, x_t])

(5) New memory (using the reset gate): h̃_t = tanh(W · [r_t ⊙ h_{t-1}, x_t])

(6) Output value: h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
In the formulas, σ is the Sigmoid activation function and tanh is the tanh activation function. The update gate and the reset gate are z_t and r_t, respectively. x_t is the input and h_t is the output of the hidden layer. h̃_t is the candidate state summarizing the input x_t and the past hidden state h_{t-1}; W_z, W_r, and W are the weights of the update gate, the reset gate, and the candidate output, respectively.
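The six formulas above can be traced in a small NumPy sketch of one GRU step (illustrative only: bias terms are omitted and each weight matrix is assumed to act on the concatenation [h_{t-1}, x_t]):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, Wz, Wr, W):
    """One GRU time step following the gate equations above (biases omitted)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                                      # update gate z_t
    r = sigmoid(Wr @ hx)                                      # reset gate r_t
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))  # new memory
    return (1.0 - z) * h_prev + z * h_tilde                   # output h_t

rng = np.random.default_rng(1)
n_in, n_hid = 3, 5
x_t = rng.standard_normal(n_in)
h_prev = np.zeros(n_hid)
Wz, Wr, W = (rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(3))
h_t = gru_cell(x_t, h_prev, Wz, Wr, W)
print(h_t.shape)  # (5,)
```

The update gate z_t interpolates between keeping the previous state and adopting the candidate h̃_t, which is how the unit decides how much past memory to carry forward.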
In a preferred embodiment, in step 7, an attention layer is added after the GRU layer. By assigning different weights to the input features of the model, the influence of important information is strengthened and the loss of long-range information in the sequence is avoided, so that the model can more easily capture long-range dependencies within the sequence. The input of the attention layer is the output vector h_t produced by the GRU layer activation; the probabilities corresponding to the different feature vectors are computed according to a weight-distribution principle, and a better weight parameter matrix is obtained by continuous updating and iteration. The weight coefficients of the attention layer are computed as follows:
e_t = u tanh(w h_t + b)

a_t = exp(e_t) / Σ_{j=1}^{T} exp(e_j)

s_t = Σ_{t=1}^{T} a_t · h_t
In the formulas, e_t denotes the attention probability distribution value determined by the output vector h_t of the GRU network layer at time t; u and w are weight coefficients; b is a bias coefficient; a_t is the normalized attention weight; the output of the attention layer at time t is denoted by s_t.
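A NumPy sketch of this weighting (the shapes and parameter names are illustrative assumptions; here the normalization runs over all T time steps):

```python
import numpy as np

def attention_pool(H, u, w, b):
    """Weight GRU outputs H (T, d) by attention scores e_t = u·tanh(w·h_t + b)."""
    e = np.tanh(H @ w.T + b) @ u  # scores e_t, shape (T,)
    a = np.exp(e - e.max())
    a /= a.sum()                  # normalized weights a_t (sum to 1)
    return a @ H, a               # context vector s = sum_t a_t * h_t

rng = np.random.default_rng(2)
T, d = 6, 4
H = rng.standard_normal((T, d))
s, a = attention_pool(H, rng.standard_normal(d),
                      rng.standard_normal((d, d)), rng.standard_normal(d))
print(s.shape)  # (4,)
```

Subtracting e.max() before the exponential is a standard numerical-stability trick and does not change the normalized weights.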
In a preferred embodiment, in step 8, the prediction results are combined in the output layer and computed through the fully connected layer, predicting an output Y = [y_1, y_2, …, y_m]^T with step size m. During prediction, an early-stopping mechanism monitors the model: training stops when the training error has not improved within a set number of training epochs, and otherwise continues until the originally set number of epochs is reached. The prediction formula is as follows:

y_t = Sigmoid(w_o s_t + b_o)

In the formula, y_t denotes the predicted output value at time t; w_o is a weight matrix; b_o is the bias vector; the activation function is the Sigmoid function.
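The early-stopping monitor described above can be sketched as follows (a hypothetical helper: step_fn is assumed to run one training epoch and return the monitored error, and the patience threshold plays the role of the "set number of training epochs"):

```python
def train_with_early_stopping(step_fn, max_epochs, patience):
    """Stop training once the error has not improved for `patience` epochs."""
    best_err, wait = float("inf"), 0
    for _ in range(max_epochs):
        err = step_fn()               # one training epoch, returns monitored error
        if err < best_err:
            best_err, wait = err, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:      # no improvement for `patience` epochs
                break
    return best_err

# toy error sequence: improves, then stagnates, so training halts early
errs = iter([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 2.0])
best = train_with_early_stopping(lambda: next(errs), max_epochs=10, patience=3)
print(best)  # 3.0 (the 2.0 is never reached because training stopped)
```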
Compared with the prior art, the invention has the following advantages:
(4) The wind speed data are processed with the VMD method, converting the unstable wind speed sequence into relatively stable subsequences and improving the wind speed prediction accuracy.
(5) Considering that irrelevant features in the data degrade model performance, the attention mechanism is used to redistribute the feature weights and improve the performance of the model.
(6) The underlying architecture of the algorithm adopts a CNN-GRU model, which can process the spatio-temporal characteristics of the wind speed and exploit spatio-temporal correlation for prediction, improving the prediction accuracy.
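As a rough illustration of the VMD processing in advantage (4), the ADMM update loop can be sketched in NumPy (a simplified sketch under stated assumptions: no mirror extension of the signal, the multiplier update is effectively disabled with γ = 0, and the initial center frequencies are arbitrary; this is not the exact implementation of the invention):

```python
import numpy as np

def vmd(f, K, alpha=2000.0, gamma=0.0, tol=1e-7, n_iter=500):
    """Simplified VMD: iterate the Wiener-filter, center-frequency, and
    multiplier updates in the frequency domain until convergence."""
    T = len(f)
    f_hat = np.fft.fft(f)
    freqs = np.fft.fftfreq(T)                # normalized frequencies
    u_hat = np.zeros((K, T), dtype=complex)  # mode spectra
    omega = np.linspace(0.05, 0.45, K)       # initial center frequencies
    lam = np.zeros(T, dtype=complex)         # multiplier spectrum
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            pos = freqs >= 0                 # power-weighted mean frequency
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = (freqs[pos] @ power) / (power.sum() + 1e-12)
        lam = lam + gamma * (f_hat - u_hat.sum(axis=0))
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if diff < tol:
            break
    return np.real(np.fft.ifft(u_hat, axis=1))  # each row is one IMF

t = np.arange(256) / 256
f = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs = vmd(f, K=2)
print(imfs.shape)  # (2, 256)
```

On this two-tone test signal, each IMF concentrates around one of the two frequencies, which is the property the invention exploits to stabilize the wind speed subsequences.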
FIG. 6 shows the partial wind speed prediction result and the residual at a prediction time interval of 20 minutes.
As can be seen from fig. 6, the VCGA model fits the data closely and accurately reflects the trend of the true values. The residual analysis shows that the prediction residuals of the model are uniformly and randomly distributed on both sides of the zero baseline, indicating that there is no systematic error in the modeling process. The model is therefore feasible for short-term wind speed prediction.
Fig. 7 and 8 show the RMSE and MAPE results at different prediction times for different prediction models, respectively.
To verify the effect of the invention, its algorithm is compared experimentally with other traditional machine learning algorithms and with its own sub-algorithms, using two representative evaluation indexes, RMSE and MAPE. The comparison shows that, compared with other traditional deep-learning-based methods, the method of the invention achieves higher prediction accuracy and more accurate results.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or modification of the technical solutions and inventive concepts of the present invention that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of the present invention.

Claims (9)

1. A wind power plant short-term wind speed prediction method combining a VMD and an attention mechanism is characterized by comprising the following steps:
step 1: establishing a wind speed data set with two dimensions of time and space;
step 2: establishing a series of SWSMs according to the original data, wherein the SWSMs comprise time domain and space domain characteristics of a wind speed data set;
step 3: performing wind speed decomposition on the SWSM over each time series using the VMD to obtain sub-SWSMs composed of the IMFs;
step 4: combining the CNN model with an attention mechanism to obtain the SENet model;
step 5: for each sub-SWSM, applying the SENet model obtained in step 4 to extract the spatial-domain features of the wind speed;
step 6: processing the spatial-domain features obtained in step 5 with the GRU model, extracting the time-domain features and obtaining each predicted component of the wind speed;
step 7: assigning different weights to the input features through an attention layer based on the attention mechanism;
step 8: combining the prediction results to obtain the final predicted wind speed.
2. The method for predicting the short-term wind speed of the wind farm combining the VMD and the attention mechanism according to claim 1, wherein in step 1 a wind speed data set comprising the two dimensions of time and space is established; for the original wind speed data set, the wind speed at the predicted time and position is set as the label wind speed, and the data set and the label wind speed are then divided proportionally, in time order, into a training set, a validation set, and a test set.
3. The wind farm short-term wind speed prediction method combining the VMD and the attention mechanism according to claim 1, wherein in the step 2, the establishment of the SWSM comprises the following process:
assuming that the object of study is an array of M rows and N columns over a spatial region, which can be represented by an M × N grid, the position of each site in the array can be indexed by two-dimensional rectangular coordinates (i, j) (1 ≤ i ≤ M, 1 ≤ j ≤ N); for each site, the wind speed is a one-dimensional time series; at time t, the spatial wind speed matrix SWSM formed by the sites can be defined as X_t ∈ R^{M×N}, with entries x(i, j)_t:

X_t = [ x(1,1)_t  x(1,2)_t  …  x(1,N)_t
        x(2,1)_t  x(2,2)_t  …  x(2,N)_t
        …         …         …  …
        x(M,1)_t  x(M,2)_t  …  x(M,N)_t ]

the wind speed sequences are converted to SWSMs by the above method.
4. The wind farm short-term wind speed prediction method combining the VMD and the attention mechanism according to claim 1, characterized in that in step 3, the main steps of VMD decomposition are as follows:
S21: first, the variational problem is constructed, ensuring that each decomposed sequence is a modal component with limited bandwidth around a center frequency, while minimizing the sum of the estimated bandwidths of all modes; the wind speed data are preprocessed by computing the analytic signal of each mode and shifting its one-sided spectrum to baseband:

[ (δ(t) + j/(πt)) * u_k(t) ] e^{-jω_k t}

the corresponding constrained variational expression is

min_{{u_k},{ω_k}} { Σ_{k=1}^{K} ‖ ∂_t [ (δ(t) + j/(πt)) * u_k(t) ] e^{-jω_k t} ‖_2^2 }

s.t. Σ_{k=1}^{K} u_k(t) = f(t)

In the formula, K is the number of modes to be decomposed, a positive integer; {u_k} and {ω_k} are the k-th modal component after decomposition and its center frequency, respectively; δ(t) is the Dirac function; * denotes the convolution operation;
S22: to solve S21, a Lagrangian multiplier λ is introduced to convert the constrained problem into an unconstrained one, giving the augmented Lagrangian expression:

L({u_k}, {ω_k}, λ) = α Σ_{k=1}^{K} ‖ ∂_t [ (δ(t) + j/(πt)) * u_k(t) ] e^{-jω_k t} ‖_2^2 + ‖ f(t) − Σ_{k=1}^{K} u_k(t) ‖_2^2 + ⟨ λ(t), f(t) − Σ_{k=1}^{K} u_k(t) ⟩

In the formula, α is a penalty factor used to reduce the influence of Gaussian noise;
S23: finally, the unconstrained variational problem is solved with the alternating direction method of multipliers (ADMM) iterative algorithm; each modal component and center frequency are obtained by optimization, the saddle point of the augmented Lagrangian function is sought, and the parameters {u_k}, {ω_k}, and λ are updated iteratively by the formulas:

û_k^{n+1}(ω) = [ f̂(ω) − Σ_{i≠k} û_i(ω) + λ̂(ω)/2 ] / [ 1 + 2α(ω − ω_k)^2 ]

ω_k^{n+1} = ∫_0^∞ ω |û_k(ω)|^2 dω / ∫_0^∞ |û_k(ω)|^2 dω

λ̂^{n+1}(ω) = λ̂^n(ω) + γ [ f̂(ω) − Σ_k û_k^{n+1}(ω) ]

In the formulas, f̂(ω), û_i(ω), λ̂(ω), and û_k^{n+1}(ω) denote the Fourier transforms of f(t), u_i(t), λ(t), and u_k^{n+1}(t), respectively; n is the number of iterations; γ is the noise tolerance, used to satisfy the fidelity requirement of the signal decomposition;
S24: finally, for a given discrimination accuracy e > 0, the iteration stops when

Σ_k ( ‖ û_k^{n+1} − û_k^n ‖_2^2 / ‖ û_k^n ‖_2^2 ) < e

is satisfied, and otherwise returns to S23; finally, the K decomposed IMF components are obtained; the SWSM is decomposed by the VMD method to obtain the sub-SWSMs composed of the IMFs; at time t, the components IMF_k of the sites form a sub-SWSM that can be defined as IMF_k,t ∈ R^{M×N}:

IMF_k,t = [ imf_k(1,1)_t  …  imf_k(1,N)_t
            …              …  …
            imf_k(M,1)_t  …  imf_k(M,N)_t ]
5. The method for predicting the short-term wind speed of the wind farm combining the VMD and the attention mechanism according to claim 1, wherein in step 4 the core idea of the CNN-attention combination used in SENet is to automatically learn the importance of each feature channel and then, according to that importance, promote useful features and suppress features of little use to the current task; this function is implemented by the SE block; the SENet layer comprises a convolutional layer, two SE block layers, and then another convolutional layer, and each SE block contains a convolutional layer, a global average pooling layer, two activation layers, and a fusion layer; the SENet layer is constructed as follows:
S41: first, F_tr is a transformation operation, a standard convolution with input X and output U, defined by the formula:

u_c = v_c * X = Σ_{s=1}^{C'} v_c^s * x^s

in the formula, v_c denotes the c-th convolution kernel, x^s denotes the s-th input channel, * denotes the convolution operation, u_c denotes the c-th 2D matrix in the 3D matrix U, and v_c^s is the 2D spatial kernel of v_c acting on the corresponding channel of X;
S42: next is the Squeeze operation, in fact a global average pooling operation that compresses the spatial features, converting the W × H × C input into a 1 × 1 × C output, where W denotes the width of a channel, H denotes the height of a channel, and there are C channels in total; the Squeeze operation formula is as follows:

z_c = F_sq(u_c) = (1/(W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)
S43: next is the Excitation operation, whose formula is:

s = σ(g(z, W)) = σ(W_2 δ(W_1 z))

in the formula, W_1 z is a fully connected layer operation; the dimension of W_1 is C/r × C, where r is a scaling parameter, and z has dimension 1 × 1 × C, so the result of W_1 z is 1 × 1 × C/r; δ is the ReLU function, which does not change the dimensionality of the output; the result is then multiplied by W_2, also a fully connected layer operation; the dimension of W_2 is C × C/r, so the output dimension is 1 × 1 × C; finally, σ, the sigmoid function, yields the weight matrix s;
S44: finally, the 3D matrix U is reweighted by the weight matrix s according to the formula:

H_S = s_c · u_c

in the formula, u_c is a two-dimensional matrix and s_c is a weight; this equation multiplies each value in the matrix u_c by s_c.
6. The wind farm short-term wind speed prediction method combining the VMD and the attention mechanism is characterized in that the sub-SWSM matrices obtained by the VMD decomposition are input to the SENet layer; the convolution kernels in the convolutional layer perform convolution and activation on the input to obtain the feature map of the convolution block; this feature map is input to the convolutional layer in the SE block, the two activation layers assign weights to the channels of the convolution kernels, and global average pooling yields a new feature map; the new feature map is flattened to reduce its dimensionality while preserving the spatial features, and the reduced feature map serves as the input of the GRU layer.
7. The wind farm short-term wind speed prediction method combining the VMD and the attention mechanism according to claim 1, wherein in step 6 the feature extraction along the time series is implemented by the GRU layer; the GRU has two gates, a reset gate and an update gate; the reset gate determines how new input information is combined with the previous memory, and the update gate defines how much of the previous memory is carried to the current time step; the two gating vectors determine which information is ultimately output by the gated recurrent unit, allowing information in long sequences to be retained rather than cleared over time or removed for being irrelevant to the prediction;
the gates and activation functions of the GRU are calculated as follows:
S61: activation function Sigmoid:

σ(x) = 1 / (1 + e^{-x})

S62: activation function tanh:

tanh(x) = (e^x − e^{-x}) / (e^x + e^{-x})

S63: update gate: z_t = σ(W_z · [h_{t-1}, x_t])

S64: reset gate: r_t = σ(W_r · [h_{t-1}, x_t])

S65: new memory (using the reset gate): h̃_t = tanh(W · [r_t ⊙ h_{t-1}, x_t])

S66: output value: h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where σ is the Sigmoid activation function and tanh is the tanh activation function; the update gate and the reset gate are z_t and r_t, respectively; x_t is the input and h_t is the output of the hidden layer; h̃_t is the candidate state summarizing the input x_t and the past hidden state h_{t-1}; W_z, W_r, and W are the weights of the update gate, the reset gate, and the candidate output, respectively.
8. The method for predicting the short-term wind speed of the wind farm combining the VMD and the attention mechanism according to claim 1, wherein in step 7 an attention layer is added after the GRU layer, assigning different weights to the input features of the model; the input of the attention layer is the output vector h_t produced by the GRU layer activation; the probabilities corresponding to the different feature vectors are computed according to a weight-distribution principle, and a better weight parameter matrix is obtained by continuous updating and iteration; the weight coefficients of the attention layer are computed as follows:
e_t = u tanh(w h_t + b)

a_t = exp(e_t) / Σ_{j=1}^{T} exp(e_j)

s_t = Σ_{t=1}^{T} a_t · h_t
in the formulas, e_t denotes the attention probability distribution value determined by the output vector h_t of the GRU network layer at time t; u and w are weight coefficients; b is a bias coefficient; the output of the attention layer at time t is denoted by s_t.
9. The method for predicting the short-term wind speed of the wind farm combining the VMD and the attention mechanism according to claim 1, wherein in step 8 the prediction results are combined in the output layer and computed through the fully connected layer, predicting an output Y = [y_1, y_2, …, y_m]^T with step size m; during prediction, an early-stopping mechanism monitors the model: training stops when the training error has not improved within a set number of training epochs, and otherwise continues until the originally set number of epochs is reached; the prediction formula is as follows:
y_t = Sigmoid(w_o s_t + b_o)
in the formula, y_t denotes the predicted output value at time t; w_o is a weight matrix; b_o is the bias vector; the activation function is the Sigmoid function.
CN202210425233.0A 2022-04-21 2022-04-21 Wind power plant short-term wind speed prediction method combining VMD and attention mechanism Pending CN114912577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210425233.0A CN114912577A (en) 2022-04-21 2022-04-21 Wind power plant short-term wind speed prediction method combining VMD and attention mechanism


Publications (1)

Publication Number Publication Date
CN114912577A true CN114912577A (en) 2022-08-16

Family

ID=82764917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210425233.0A Pending CN114912577A (en) 2022-04-21 2022-04-21 Wind power plant short-term wind speed prediction method combining VMD and attention mechanism

Country Status (1)

Country Link
CN (1) CN114912577A (en)

Similar Documents

Publication Publication Date Title
CN109102126B (en) Theoretical line loss rate prediction model based on deep migration learning
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
Tian et al. Multi-step short-term wind speed prediction based on integrated multi-model fusion
CN110070226B (en) Photovoltaic power prediction method and system based on convolutional neural network and meta-learning
Tian Modes decomposition forecasting approach for ultra-short-term wind speed
Wang et al. The study and application of a novel hybrid forecasting model–A case study of wind speed forecasting in China
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
CN112348271A (en) Short-term photovoltaic power prediction method based on VMD-IPSO-GRU
CN112529282A (en) Wind power plant cluster short-term power prediction method based on space-time graph convolutional neural network
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
Wu et al. Promoting wind energy for sustainable development by precise wind speed prediction based on graph neural networks
CN112766078B (en) GRU-NN power load level prediction method based on EMD-SVR-MLR and attention mechanism
CN111861013B (en) Power load prediction method and device
Shao et al. Wind speed forecast based on the LSTM neural network optimized by the firework algorithm
CN111967679A (en) Ionized layer total electron content forecasting method based on TCN model
CN113762338B (en) Traffic flow prediction method, equipment and medium based on multiple graph attention mechanism
CN112418482A (en) Cloud computing energy consumption prediction method based on time series clustering
CN115511177A (en) Ultra-short-term wind speed prediction method based on INGO-SWGMN hybrid model
CN112613657A (en) Short-term wind speed prediction method for wind power plant
Wang et al. An approach for day-ahead interval forecasting of photovoltaic power: A novel DCGAN and LSTM based quantile regression modeling method
CN114169251A (en) Ultra-short-term wind power prediction method
Huang et al. Short-Term PV Power Forecasting Based on CEEMDAN and Ensemble DeepTCN
Gungor et al. Lenard: Lightweight ensemble learner for medium-term electricity consumption prediction
Peng et al. Meteorological satellite operation prediction using a BiLSTM deep learning model
CN114912577A (en) Wind power plant short-term wind speed prediction method combining VMD and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination