CN113642767A

CN113642767A - Multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC

Info

Publication number: CN113642767A
Application number: CN202110781945.1A
Authority: CN
Inventors: 廖雪超; 伍杰平; 陈才圣
Original assignee: Wuhan University of Science and Engineering WUSE
Current assignee: Wuhan University of Science and Engineering WUSE; Wuhan University of Science and Technology WHUST
Priority date: 2021-07-09
Filing date: 2021-07-09
Publication date: 2021-11-12

Abstract

The invention discloses a multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC, comprising: S1, performing mutual information (MI) feature selection on the original feature sequence x(t) to obtain the wind power, wind speed and temperature sequences; S2, performing VMD decomposition on the wind power, wind speed and temperature sequences respectively to obtain modal components; S3, training and predicting on the modal components obtained by VMD decomposition with DA-EDLSTM, an encoder-decoder model using a two-stage attention mechanism, to obtain an initial prediction sequence $\hat{y}(t)$, from which the original prediction error $e(t) = y(t) - \hat{y}(t)$ is obtained; S4, preprocessing the original prediction error e(t) by VMD decomposition, then retraining and predicting with a single-layer LSTM model to obtain an error prediction sequence $\hat{e}(t)$, which is used to correct the original prediction result and obtain the final prediction result $\hat{y}(t) + \hat{e}(t)$.

The invention provides a multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC, which improves decision-making capability when data such as wind power change rapidly, predicts peaks and troughs more accurately, and achieves the best prediction precision together with stronger prediction stability.

Description

Multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC

Technical Field

The invention relates to the field of wind power generation prediction models, in particular to a multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC.

Background

With the change in the global energy structure, the share of wind energy in future energy supply is gradually increasing. The rapid development of wind power generation poses a number of challenges to power system reliability and safety. Because wind speed is intermittent and random, voltage and frequency fluctuations in the power system are large, making the wind power generation process highly uncertain. Accurate and effective wind power prediction is therefore vital for ensuring uninterrupted power system operation and full utilization of wind energy resources. At present, short-term wind power prediction methods can be classified into 3 types: 1) model-driven; 2) data-driven; 3) hybrid, integrating model and data driving.

Model-driven methods predict meteorological variables by computing three-dimensional spatial and temporal information from thermodynamic and computational fluid dynamics (CFD) models based on numerical weather prediction (NWP) models, and convert them to wind power using the wind power curve of the given wind farm. The accuracy of this approach depends largely on the NWP model, which requires meteorological data and the physical characteristics of the wind turbines; these data are not always available, and the model demands a lot of computation and time. These methods may therefore be unsuitable for practical wind power forecasting (WPF) applications.

Data-driven methods make predictions by finding a mapping between the input variables and wind power, and many statistical and machine learning models have been developed for this purpose. Statistical models include the Persistence Model (PM), the autoregressive integrated moving average model (ARIMA), the Gaussian Process (GP) and the Kalman Filter (KF); machine learning models include the Artificial Neural Network (ANN), the Extreme Learning Machine (ELM) and the Support Vector Machine (SVM). With the advent of the big data age, multivariate time series data have grown explosively. Such data are usually high-dimensional, spatio-temporally correlated, dynamic and nonlinear, and may contain noise, so traditional statistical and machine learning models have difficulty processing them, whereas deep learning algorithms can mine deeper features of the data. Thus, although conventional methods are still in use, deep-learning-based prediction methods are becoming increasingly popular. Common deep learning models include the Convolutional Neural Network (CNN), the stacked autoencoder (SAE), the deep belief network (DBN) and the Recurrent Neural Network (RNN).

The hybrid approach combines model driving and data driving. It integrates signal preprocessing techniques, such as wavelet transform and empirical mode decomposition; optimization algorithms, such as particle swarm optimization and Grid Search (GS); and prediction models, such as Extreme Learning Machines (ELMs), Back Propagation Neural Networks (BPNNs) and Support Vector Machines (SVMs).

Because wind power and wind speed data are random and complex, and signal preprocessing algorithms can effectively extract signal characteristics, many researchers combine signal preprocessing with machine learning for wind power prediction. Examples include wavelet packet decomposition (WTD), Empirical Mode Decomposition (EMD), complete ensemble empirical mode decomposition (CEEMD), EEMD + AWNN (ensemble empirical mode decomposition combined with an adaptive wavelet neural network), and complete ensemble empirical mode decomposition with adaptive noise. The results show that models combining signal decomposition with machine learning achieve higher stability and prediction accuracy than single models without signal preprocessing.

Deep learning is regarded as one of the most powerful representation learning approaches, and in recent years research combining signal preprocessing with deep learning for wind power prediction has steadily increased. For example, the WD-VMD-DLSTM-AT combined prediction model, based on two-stage decomposition by Wavelet Decomposition (WD) and Variational Mode Decomposition (VMD) together with an attention mechanism (AT), achieves higher precision and stability in short-term wind speed prediction.

LSTM may accumulate errors during recursive multi-step prediction. The encoder-decoder (codec) framework can solve the error accumulation problem, but its prediction accuracy may drop significantly when the input time window is long. Many researchers have therefore introduced attention mechanisms to improve the model's ability to select relevant time dependencies.

In summary, the conventional ARMA model cannot handle nonlinear and non-stationary complex data; single SVR and LSTM models suffer from prediction lag and error accumulation in multi-step prediction; and although the encoder-decoder model largely solves the error accumulation problem, it cannot capture long-term correlations between the input features.

Disclosure of Invention

To overcome the shortcomings of the above techniques, the invention provides a multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC for short-term wind power prediction. Both the original feature sequence and the prediction error sequence are decomposed by Variational Mode Decomposition (VMD). The method uses a long short-term memory neural network built on a two-stage attention mechanism and an encoder-decoder framework, together with an error correction module that preprocesses the error by decomposition. The resulting wind power prediction model achieves the best performance, and decomposing the error sequence by VMD improves the correction effect.

The technical scheme adopted by the invention for overcoming the technical problems is as follows:

The invention provides a multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC, which specifically comprises the following steps: S1, performing MI feature selection on the original feature sequence x(t) to obtain the wind power, wind speed and temperature sequences; S2, performing VMD decomposition on the wind power, wind speed and temperature sequences respectively to obtain modal components; S3, training and predicting on the modal components obtained by VMD decomposition with the encoder-decoder model DA-EDLSTM using the two-stage attention mechanism to obtain an initial prediction sequence $\hat{y}(t)$, from which the original prediction error $e(t) = y(t) - \hat{y}(t)$ is obtained.

Further, step S1 specifically includes: S11, calculating the mutual information between each original feature sequence X and the target sequence Y based on formula (1), and ranking the features by mutual information, wherein the original feature sequence comprises wind power, wind speed, temperature, air pressure, density and wind direction:

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{1}$$

wherein p(x,y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal density functions; if X and Y are completely unrelated, p(x,y) equals p(x)p(y) and the mutual information equals 0, and the larger I(X;Y), the stronger the correlation between the two variables; and S12, based on the mutual information ranking between the feature sequences X and the target sequence Y, selecting the 3-dimensional feature sequence with the largest mutual information, comprising wind power, wind speed and temperature.

Further, step S2 specifically includes: decomposing the wind power, wind speed and temperature sequences respectively into K modal components $u_k$ ($k = 1, 2, \dots, K$), each with a center frequency $\omega_k$, such that the sum of the estimated bandwidths of the modal components is minimized, comprising the steps of: S21, applying the Hilbert transform to each modal component $u_k$ to obtain its analytic signal and one-sided frequency spectrum; S22, mixing each $u_k$ with an exponential tuned to its estimated center frequency $\omega_k$, which shifts its spectrum to baseband; S23, estimating the bandwidth of $u_k$ from the Gaussian smoothness of the demodulated signal, i.e. the squared $L^2$ norm of its gradient.

Further, the encoder-decoder model DA-EDLSTM with the two-stage attention mechanism in step S3 comprises an input layer, an encoding layer, a decoding layer and an output layer connected in sequence, wherein the encoding layer applies a feature attention mechanism to the input features and the decoding layer applies a temporal attention mechanism.

Further, applying the attention mechanism to the input features in the encoding layer specifically includes: S31, constructing an attention mechanism for the k-th dimension feature sequence $X^k$ of the input observation sequence X based on formulas (2) and (3):

$$e_t^k = v_e^\top \tanh\left(W_e[h_{t-1}; s_{t-1}] + U_e X^k\right) \tag{2}$$

$$\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n}\exp(e_t^i)} \tag{3}$$

wherein $v_e$, $W_e$ and $U_e$ are parameters to be learned by the model, $h_{t-1} \in \mathbb{R}^m$ and $s_{t-1} \in \mathbb{R}^m$ are the hidden state and cell state of the encoding layer, m is the size of the hidden layer, and T is the window length of the observation time series; S32, normalizing $e_t^k$ with the softmax function in formula (3) so that the attention weights sum to 1; S33, assigning each influence factor of the input $X_t$ at each time an attention weight $\alpha_t^k$, so that the attention-weighted output of the first (encoding) stage is $\tilde{X}_t = (\alpha_t^1 X_t^1, \alpha_t^2 X_t^2, \dots, \alpha_t^n X_t^n)^\top$; S34, feeding $\tilde{X}_t$ into the encoding layer to obtain $h_t = f_1(h_{t-1}, \tilde{X}_t)$, wherein the function $f_1$ is an LSTM network.

Further, applying the temporal attention mechanism in the decoding layer specifically includes: computing, based on $e_i = \tanh(W_d[h_i; s_{t-1}] + b_d)$, an unnormalized attention score $e_i$ that represents the importance of the i-th input, wherein $W_d$ and $b_d$ are the weight and bias parameters to be learned by the model; normalizing with $\beta_i^t = \exp(e_i) / \sum_{j=1}^{T}\exp(e_j)$ to obtain the attention weight of the input sequence at each time; and computing the context vector at time t by the weighted summation $x_{t1} = \sum_{i=1}^{T}\beta_i^t h_i$, which is the vector finally fed into the gating units of the LSTM.

Furthermore, the method also comprises the step of carrying out prediction performance evaluation based on root mean square error RMSE, mean absolute error MAE and symmetric mean absolute percentage error SMAPE.

Further, the LSTM models in steps S3 and S4 each comprise at least gating units and a memory cell, wherein the gating units comprise at least a forget gate, an input gate and an output gate that control how the memory cell information is updated and used.

Further, the forget gate controls which information of the previous cell state $c_{t-1}$ is forgotten when the memory cell is updated: $f_t = \sigma(W_f[h_{t-1}; x_t] + b_f)$; the input gate controls which information is input to the cell: $i_t = \sigma(W_i[h_{t-1}; x_t] + b_i)$; $c_t$ is selectively updated through the forget gate and the input gate:

$$\tilde{c}_t = \tanh(W_c[h_{t-1}; x_t] + b_c), \qquad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

the output gate activates $c_t$ and controls the degree to which it is filtered into the output: $o_t = \sigma(W_o[h_{t-1}; x_t] + b_o)$, $h_t = o_t \odot \tanh(c_t)$; wherein $W_*$ is a weight matrix, $b_*$ is a bias term, $\odot$ denotes the element-wise product, and $\sigma$ is the sigmoid activation function.

The output layer obtains the final predicted value based on $y_t = \sigma(W_y h_t + b_y)$.

The invention has the beneficial effects that:

1) For multi-dimensional feature time series, the mutual information method selects the feature sequences strongly correlated with the target sequence, reducing the interference of redundant and irrelevant features with the prediction model;

2) VMD decomposition extracts the different frequency-domain characteristics of highly complex, non-stationary signals, alleviating the prediction lag of the LSTM model and improving its prediction accuracy;

3) VMD-DA-EDLSTM improves the model's decision-making capability when wind power and other data change rapidly;

4) Peaks and troughs of wind power are predicted more accurately.

5) The two-stage attention mechanism, applied to the inputs of the encoding layer and to the hidden states used by the decoding layer, selects key information: the first-stage AT focuses on important feature dimensions, while the second-stage AT focuses on important time steps in the long time series. The model therefore captures long-term temporal dependencies and important time steps while also selecting important feature factors, which mitigates the poor performance of encoder-decoder models as the input sequence grows longer and further improves model performance.

6) The error correction mechanism with signal preprocessing further improves prediction precision: VMD decomposition addresses the non-stationarity and high complexity of the error sequence and extracts its features well, further improving the prediction performance of the model.

Drawings

FIG. 1 is a schematic model flow diagram of an embodiment of the present invention;

FIG. 2 is a diagram of the overall architecture of the model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the internal structure of an LSTM according to an embodiment of the present invention;

FIG. 4 is a DA-EDLSTM model structure according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an encoding layer feature attention mechanism according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a decoding layer time attention mechanism according to an embodiment of the present invention;

FIG. 7 is a wind power trend graph and a frequency spectrum graph according to an embodiment of the present invention;

FIG. 8 is a diagram of wind power ACF and PACF according to an embodiment of the present invention;

FIG. 9 is a mutual information ranking of the 6-dimensional features according to an embodiment of the invention;

FIG. 10 is a graph of wind speed and temperature trends according to an embodiment of the present invention;

FIG. 11 shows a result of VMD decomposition of wind power and wind speed sequence according to an embodiment of the present invention;

FIG. 12 is a model error comparison of a BOV station and a CCRTA station according to an embodiment of the present invention;

FIG. 13 is an attention mechanism weighting diagram according to an embodiment of the present invention;

FIG. 14 is a comparison of model prediction results for different p and K according to embodiments of the present invention;

FIG. 15 shows a comparison of the stability of 20 sets of experiments in accordance with the present invention.

Detailed Description

In order to facilitate a better understanding of the invention for those skilled in the art, the invention will be described in further detail with reference to the accompanying drawings and specific examples, which are given by way of illustration only and do not limit the scope of the invention.

Before describing the multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC of the present invention, some technical terms are explained first:

MI: mutual information method for feature selection;

VMD: variational mode decomposition;

ARMA: autoregressive moving average model;

SVR: support vector regression model;

EDLSTM: encoder-decoder LSTM neural network;

Double-Attention: two-stage attention mechanism;

Encoder-Decoder: encoding-decoding architecture;

LSTM: long short-term memory neural network;

Error-Correction: error correction.

Fig. 1 is a flowchart of an embodiment of the multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC of the present invention, and Fig. 2 is the overall model architecture of that embodiment. The method specifically includes:

S1, performing MI feature selection on the original feature sequence x(t) to obtain the wind power, wind speed and temperature sequences.

The mutual information method is a filter method that captures arbitrary relationships (both linear and nonlinear) between each feature and the label [36]. Mutual information is a measure between two random variables X and Y that quantifies how much information one random variable provides about the other. For continuous variables, mutual information is calculated by equation (1):

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{1}$$

where p(x,y) is the joint probability density function of x and y, and p(x) and p(y) are the marginal density functions. Mutual information measures how far the joint distribution p(x,y) is from the product of the marginal distributions p(x)p(y). If x and y are completely unrelated, p(x,y) equals p(x)p(y) and the mutual information equals 0; the larger I(X;Y), the stronger the correlation between the two variables. MI is used to compute the mutual information between each feature and the label (wind power); the features are then ranked and the relevant ones are selected for subsequent model training and prediction.

The original feature sequence x(t) is a 6-dimensional feature sequence comprising wind power, wind speed, temperature, air pressure, density and wind direction. Based on the mutual information ranking between x(t) and the target sequence Y, the 3-dimensional feature sequence with the largest mutual information is selected, comprising wind power, wind speed and temperature.
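
The ranking step can be sketched as follows. This is a minimal sketch, assuming a simple equal-width histogram (plug-in) estimate of equation (1); the text does not say which MI estimator is used, and the names `mutual_information` and `rank_features`, the bin count and the synthetic data are hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(X;Y) in nats from an equal-width 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()              # empirical joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = p_xy > 0                           # skip empty cells (0 * log 0 = 0)
    ratio = p_xy[nz] / (p_x * p_y)[nz]
    return float(np.sum(p_xy[nz] * np.log(ratio)))


def rank_features(features, target, k=3, bins=20):
    """Score every feature against the target and keep the k highest."""
    scores = {name: mutual_information(values, target, bins)
              for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    power = rng.normal(size=5000)
    features = {
        "power": power,
        "speed_like": power + 0.5 * rng.normal(size=5000),
        "unrelated": rng.normal(size=5000),
    }
    top, scores = rank_features(features, power, k=2)
    print(top, {name: round(s, 3) for name, s in scores.items()})
```

Histogram estimates are biased upward by roughly (bins - 1)^2 / (2N), so an unrelated feature scores slightly above zero; comparing features with the same bin count keeps the ranking fair.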

S2, performing VMD decomposition on the wind power, the wind speed and the temperature sequence respectively to obtain modal components;

VMD offers high computational efficiency and strong robustness, and can mitigate mode mixing. Applying VMD decomposes the signal x(t) into K subsequences, or modal components, $u_k$ ($k = 1, 2, \dots, K$) such that the sum of the estimated bandwidths of the modal components is minimized. The variational problem is constructed and solved as follows: 1) apply the Hilbert transform to each mode $u_k$ to obtain its analytic signal and one-sided frequency spectrum; 2) mix each $u_k$ with an exponential tuned to its estimated center frequency $\omega_k$, shifting its spectrum to baseband; 3) estimate the bandwidth of $u_k$ from the Gaussian smoothness (the squared $L^2$ norm of the gradient) of the demodulated signal.

The constrained variational problem is given by equation (2):

$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = x(t) \tag{2}$$

where $\delta(t)$ is the Dirac delta function, $*$ denotes convolution and $j$ is the imaginary unit.

The top 3 features selected by MI are decomposed with the VMD algorithm into modal components, each with a specific center frequency.

In one embodiment of the invention, VMD decomposition is performed on the wind power (power), wind speed (speed) and temperature (temperature) sequences respectively to obtain 3 sets of modal components: Power IMFs $[IMF_1, IMF_2, \dots, IMF_p]$; Speed IMFs $[IMF_1, IMF_2, \dots, IMF_s]$; Temp IMFs $[IMF_1, IMF_2, \dots, IMF_t]$.
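
How problem (2) can be solved is sketched below. This is a simplified version of the ADMM scheme from the original VMD paper, not the implementation used in the invention: it works on the one-sided FFT spectrum without the mirror extension of the original algorithm, starts from uniformly spaced center frequencies, measures frequency in cycles per sample, and the function name `vmd` and its defaults are hypothetical. In practice each of the three sequences would be passed through a maintained VMD implementation.

```python
import numpy as np


def vmd(signal, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD: returns K modes (rows) and their centre frequencies.

    Frequencies are normalised to cycles/sample in [0, 0.5]; alpha is the
    bandwidth penalty and tau the dual-ascent step (0 = noise-tolerant mode).
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    f_hat = np.fft.rfft(f)                   # one-sided spectrum of the signal
    freqs = np.fft.rfftfreq(n)               # bin frequencies in cycles/sample
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K           # uniform initial centre frequencies
    lam = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # What the other modes currently explain (Gauss-Seidel order)
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter-like mode update centred on omega[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Centre frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the constraint sum_k u_k = x (no-op when tau == 0)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)  # back to the time domain
    order = np.argsort(omega)                 # sort modes from low to high frequency
    return modes[order], omega[order]


if __name__ == "__main__":
    t = np.arange(500)
    x = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.25 * t)
    modes, omega = vmd(x, K=2)
    print(modes.shape, np.round(omega, 3))
```

The number of modes K (p, s and t above) has to be chosen per sequence; too small a K merges frequency bands, too large a K splits one band across several modes.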

S3, training and predicting on the modal components obtained by VMD decomposition with the encoder-decoder model DA-EDLSTM using the two-stage attention mechanism to obtain an initial prediction sequence $\hat{y}(t)$, from which the original prediction error $e(t) = y(t) - \hat{y}(t)$ is obtained;

The standard Recurrent Neural Network (RNN) suffers from vanishing and exploding gradients during backpropagation, which makes it difficult to keep optimizing the network parameters. The long short-term memory network (LSTM) effectively solves this problem: by adding gating units that control historical and current information, LSTM can store and transmit information over long periods. The LSTM memory cell is controlled by three gates (forget gate, input gate and output gate), so that useful memory cell information can be updated and used. The LSTM cell structure is shown in FIG. 3. The formulas for the three gates and the cell update are as follows:

1) Forget gate: controls which information of the previous cell state $c_{t-1}$ is forgotten during the cell update.

$$f_t = \sigma(W_f[h_{t-1}; x_t] + b_f) \tag{3}$$

2) Input gate: controls which information is input to the cell.

$$i_t = \sigma(W_i[h_{t-1}; x_t] + b_i) \tag{4}$$

3) Cell update: $c_t$ is selectively updated through the forget gate and the input gate.

$$\tilde{c}_t = \tanh(W_c[h_{t-1}; x_t] + b_c) \tag{5}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{6}$$

4) Output gate: activates $c_t$ and controls the degree to which it is filtered into the hidden state.

$$o_t = \sigma(W_o[h_{t-1}; x_t] + b_o) \tag{7}$$

$$h_t = o_t \odot \tanh(c_t) \tag{8}$$

where $W_*$ is a weight matrix, $b_*$ is a bias term, $\odot$ denotes the element-wise product, $\sigma$ is the sigmoid activation function and tanh is the hyperbolic tangent activation function, defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{9}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{10}$$

Finally, the output layer obtains the final predicted value according to the following formula.

$$y_t = \sigma(W_y h_t + b_y) \tag{11}$$
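
The gate equations (3), (4), (7) and (8), the standard candidate-state and cell-update steps, and the output layer (11) can be traced in one forward pass. This NumPy sketch only illustrates the computation, not training; the random initialisation, the hypothetical `init_params` helper and the one-dimensional output are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, m, rng):
    """Hypothetical random initialisation; every gate acts on [h_{t-1}; x_t]."""
    p = {}
    for gate in ("f", "i", "c", "o"):
        p["W_" + gate] = rng.normal(scale=0.1, size=(m, m + n_in))
        p["b_" + gate] = np.zeros(m)
    p["W_y"] = rng.normal(scale=0.1, size=(1, m))
    p["b_y"] = np.zeros(1)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    z = np.concatenate([h_prev, x_t])                 # [h_{t-1}; x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])            # forget gate, eq. (3)
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])            # input gate, eq. (4)
    c_hat = np.tanh(p["W_c"] @ z + p["b_c"])          # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                  # selective cell update
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])            # output gate, eq. (7)
    h_t = o_t * np.tanh(c_t)                          # hidden state, eq. (8)
    return h_t, c_t


def output_layer(h_t, p):
    return sigmoid(p["W_y"] @ h_t + p["b_y"])         # eq. (11)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_in, m = 3, 8                          # 3 input features, hidden size m
    p = init_params(n_in, m, rng)
    h, c = np.zeros(m), np.zeros(m)
    for x_t in rng.normal(size=(24, n_in)):   # run over a 24-step window
        h, c = lstm_step(x_t, h, c, p)
    print(output_layer(h, p))
```

Because (11) applies a sigmoid, the output lies in (0, 1), which suggests the wind power target is scaled to that range before training.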

For a given n-dimensional feature wind power time series

$$X = (X^1, X^2, \dots, X^n)^\top = (X_1, X_2, \dots, X_T) \in \mathbb{R}^{n \times T},$$

where T is the window length of the observation time series, $X^k = (X_1^k, X_2^k, \dots, X_T^k)^\top \in \mathbb{R}^T$ denotes the k-th dimension feature sequence, and $X_t = (X_t^1, X_t^2, \dots, X_t^n)^\top \in \mathbb{R}^n$ denotes the n-dimensional feature vector at time t.

For the wind power prediction problem, given the historical values of the observation sequence $(X_1, X_2, \dots, X_{T-1})$ with $X_t \in \mathbb{R}^n$, the aim is to find a nonlinear mapping function $F(\cdot)$ between the observed feature variables and the target prediction variable $y_T$ such that $y_T = F(X_1, X_2, \dots, X_{T-1})$.
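
The mapping $y_T = F(X_1, \dots, X_{T-1})$ implies that training samples are cut from the series with a sliding window. A minimal sketch, assuming the features are stored as an array of shape (N, n) aligned in time with the target; the helper name `make_windows` is hypothetical.

```python
import numpy as np


def make_windows(features, target, T):
    """Pair each target y_T with the T-1 feature vectors X_1..X_{T-1} before it."""
    F = np.asarray(features, dtype=float)            # shape (N, n)
    y = np.asarray(target, dtype=float)              # shape (N,)
    ends = range(T - 1, len(y))                      # index of y_T in the series
    X = np.stack([F[j - (T - 1):j] for j in ends])   # (samples, T-1, n)
    Y = y[T - 1:]                                    # (samples,)
    return X, Y


if __name__ == "__main__":
    feats = np.arange(20, dtype=float).reshape(10, 2)   # N=10 steps, n=2 features
    X, Y = make_windows(feats, np.arange(10), T=4)
    print(X.shape, Y.shape)   # (7, 3, 2) (7,)
```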

Some theories of human attention hold that a first stage selects elementary stimulus features and a second stage uses categorical information to decode the stimulus. Inspired by this, an attention mechanism is applied to the input features in the model's encoding layer, so that the encoder can adaptively focus on the relevant features, which is of practical significance for time series prediction.

The invention therefore combines the attention mechanism with the encoder-decoder model and proposes a wind power prediction method based on a two-stage attention encoder-decoder model (DA-EDLSTM), whose structure is shown in FIG. 4. The traditional attention mechanism only generates different context vectors for the decoding-layer inputs at different times; here the attention mechanism is also introduced into the encoding layer and applied to the different features at each time, so that important feature factors are selected while long-term temporal dependencies are captured.

The feature attention mechanism of the encoding layer is shown in FIG. 5. For the k-th dimension feature sequence $X^k$ of the input observation sequence X, the attention mechanism is constructed from the encoder's hidden state $h_{t-1}$ and cell state $s_{t-1}$ at the previous time, as shown in equations (12) and (13):

$$e_t^k = v_e^\top \tanh\left(W_e[h_{t-1}; s_{t-1}] + U_e X^k\right) \tag{12}$$

$$\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n}\exp(e_t^i)} \tag{13}$$

where $v_e$, $W_e$ and $U_e$ are parameters to be learned by the model, $h_{t-1} \in \mathbb{R}^m$ and $s_{t-1} \in \mathbb{R}^m$ are the hidden state and cell state of the encoding layer, and m is the size of the hidden layer. After $e_t^k$ is obtained, it is normalized by the softmax function in equation (13), which ensures that the attention weights sum to 1. Each influence factor of the input $X_t$ at each time is thereby given an attention weight $\alpha_t^k$, which measures the importance of the k-th dimension feature at time t. Since every feature at every time has its own weight, the first-stage attention-weighted output is given by equation (14):

$$\tilde{X}_t = (\alpha_t^1 X_t^1, \alpha_t^2 X_t^2, \dots, \alpha_t^n X_t^n)^\top \tag{14}$$

Thus $\tilde{X}_t$ can be input into the encoding layer in place of $X_t$:

$$h_t = f_1(h_{t-1}, \tilde{X}_t) \tag{15}$$

where the function $f_1$ is an LSTM network. With this input attention mechanism, the encoding layer can focus on important feature factors instead of treating all feature attributes equally.
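
The feature attention of equations (12) to (14) can be sketched for a single time step. This sketch assumes the scoring form of the dual-stage attention RNN (DA-RNN) with parameters $v_e$, $W_e$, $U_e$ and the shapes given in the docstring; the function names and random parameters are illustrative only.

```python
import numpy as np


def softmax(z):
    z = np.exp(z - np.max(z))    # shift for numerical stability
    return z / z.sum()


def input_attention(X, h_prev, s_prev, v_e, W_e, U_e):
    """Feature weights alpha_t^k for one time step.

    X: (n, T) window, row k is the feature series X^k.
    h_prev, s_prev: (m,) encoder hidden and cell state at t-1.
    Assumed shapes: v_e (T,), W_e (T, 2m), U_e (T, T).
    """
    query = W_e @ np.concatenate([h_prev, s_prev])         # W_e [h_{t-1}; s_{t-1}]
    scores = np.array([v_e @ np.tanh(query + U_e @ X[k])   # e_t^k, eq. (12)
                       for k in range(X.shape[0])])
    return softmax(scores)                                  # alpha_t^k, eq. (13)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, T, m = 3, 30, 16
    X = rng.normal(size=(n, T))
    alpha = input_attention(X, np.zeros(m), np.zeros(m),
                            rng.normal(size=T), rng.normal(size=(T, 2 * m)),
                            rng.normal(size=(T, T)))
    x_tilde = alpha * X[:, 0]      # weighted input at the first step, eq. (14)
    print(alpha, x_tilde)
```

The weights depend on the whole window $X^k$ and on the previous encoder state, so the same feature can receive different weights at different time steps.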

The temporal attention mechanism of the decoding layer is shown in FIG. 6. In the conventional encoder-decoder structure, the encoding layer outputs the same context vector at all times; however, the input $X_t$ and the encoding-layer hidden state $h_t$ do not contribute equally to the context vector at every time. A temporal attention mechanism is therefore used to selectively focus on the relevant input sequence. As shown in the figure, the attention mechanism calculates the attention score $e_i$ according to equation (16), which represents the unnormalized importance of each input.

$$e_i = \tanh(W_d[h_i; s_{t-1}] + b_d) \tag{16}$$

where $W_d$ and $b_d$ are the weight and bias parameters to be learned by the model. The attention weight of the input sequence at each time is obtained by normalization with equation (17):

$$\beta_i^t = \frac{\exp(e_i)}{\sum_{j=1}^{T}\exp(e_j)} \tag{17}$$

The context vector at time t is then obtained by the weighted sum in equation (18), giving the vector $x_{t1}$ that finally enters the LSTM gating units:

$$x_{t1} = \sum_{i=1}^{T} \beta_i^t h_i \tag{18}$$

This vector represents the importance of the encoded input variables at different times for predicting the output. In short, a temporal attention mechanism is implemented in the decoding structure: it learns the intrinsic temporal correlation between hidden states at different times, and the prediction uses the attention-weighted information. In addition, the attention mechanism is a feedforward neural network whose parameters are learned jointly with the rest of the model.
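
The temporal attention of equations (16) to (18) can be sketched as follows. This is a literal reading of (16), assuming $W_d$ is a vector of length 2m and $b_d$ a scalar so that each $e_i$ is a single score, and assuming the decoder state has the same size m as the encoder state; the function name is hypothetical.

```python
import numpy as np


def temporal_attention(H, s_prev, W_d, b_d):
    """Attention over encoder hidden states h_1..h_T.

    H: (T, m) encoder hidden states; s_prev: (m,) previous decoder state.
    W_d is taken as a (2m,) vector and b_d as a scalar (assumption) so that
    eq. (16) yields one scalar score e_i per time step.
    """
    e = np.array([np.tanh(W_d @ np.concatenate([h_i, s_prev]) + b_d)  # eq. (16)
                  for h_i in H])
    beta = np.exp(e - e.max())
    beta /= beta.sum()                  # attention weights, eq. (17)
    context = beta @ H                  # weighted sum of the h_i, eq. (18)
    return beta, context


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    T, m = 30, 16
    H = rng.normal(size=(T, m))
    beta, ctx = temporal_attention(H, np.zeros(m), rng.normal(size=2 * m), 0.0)
    print(beta.round(3), ctx.shape)
```

Because tanh bounds every score to [-1, 1], the weights in this literal reading can differ by at most a factor of e^2 from one time step to another; the DA-RNN formulation adds a projection vector after the tanh, which removes that limit.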

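S4, preprocessing the original prediction error e(t) by VMD decomposition and retraining a single-layer LSTM model on the resulting components to obtain the error prediction sequence $\hat{e}(t)$, which is used to correct the initial prediction, giving the final prediction $\hat{y}(t) + \hat{e}(t)$.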
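
The correction in S4 comes down to a subtraction and an addition. A minimal sketch, assuming the sign convention e(t) = y(t) - ŷ(t); the VMD + single-layer LSTM error model is replaced here by a hypothetical one-step persistence forecast of the error, purely to keep the example self-contained.

```python
import numpy as np


def error_correct(y_true, y_init, predict_error):
    """Final prediction = initial prediction + predicted error."""
    y_true = np.asarray(y_true, dtype=float)
    y_init = np.asarray(y_init, dtype=float)
    e = y_true - y_init                # original prediction error e(t)
    e_hat = predict_error(e)           # stand-in for the VMD + LSTM error model
    return y_init + e_hat, e, e_hat


def persistence_error_model(e):
    """Hypothetical stand-in: forecast e(t) by e(t-1), with 0 at the first step."""
    return np.concatenate([[0.0], e[:-1]])


if __name__ == "__main__":
    final, e, e_hat = error_correct([1, 2, 3, 4], [0.5, 1.5, 2.5, 3.5],
                                    persistence_error_model)
    print(final)   # [0.5 2.  3.  4. ]
```

The error model only ever sees past errors, so in a real forecast the correction for time t must be computed before y(t) is observed.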

S5, evaluating the prediction performance based on the root mean square error RMSE, the mean absolute error MAE and the symmetric mean absolute percentage error SMAPE.

In one embodiment of the invention, the data set used for prediction performance evaluation is observation station data provided by the National Renewable Energy Laboratory (NREL). The original data set covers all of 2013, contains 6 feature dimensions (wind power, wind direction, wind speed, air temperature, surface air pressure and density) and is sampled every 5 minutes. The experimental data were taken from Bartlett's Ocean View Wind Farm (BOV) in the west of Nantucket Island, Massachusetts, and from the CCRTA site in Dennis, Barnstable County.

The NREL data set is sampled every 5 minutes from January 2013 to December 2013; the 12 samples within each hour are averaged to obtain hourly data. In the experimental evaluation, 3 data subsets (training, validation and test sets) were used, divided at a fixed ratio of 7:1:2. There were thus 8760 hourly samples in total: 6132 for training, 876 for validation and 1752 for testing.
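
The hourly averaging and the 7:1:2 split can be sketched in a few lines. This assumes the 5-minute samples are complete and aligned to the hour, and that the split is chronological (the text gives only the ratio); the function names and the random data are hypothetical.

```python
import numpy as np


def to_hourly(samples_5min):
    """Average each block of 12 consecutive 5-minute samples into one hourly value."""
    x = np.asarray(samples_5min, dtype=float)
    if x.size % 12:
        raise ValueError("length must be a multiple of 12 five-minute samples")
    return x.reshape(-1, 12).mean(axis=1)


def split_7_1_2(series):
    """Chronological 7:1:2 split into training, validation and test sets."""
    n = len(series)
    n_train = int(round(0.7 * n))
    n_val = int(round(0.1 * n))
    return series[:n_train], series[n_train:n_train + n_val], series[n_train + n_val:]


if __name__ == "__main__":
    raw = np.random.default_rng(4).random(365 * 24 * 12)   # one year of 5-min data
    hourly = to_hourly(raw)
    train, val, test = split_7_1_2(hourly)
    print(len(hourly), len(train), len(val), len(test))   # 8760 6132 876 1752
```

A chronological split keeps the test period strictly after the training period, which avoids leaking future information into model fitting.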

In one embodiment of the present invention, 3 evaluation indices are used to evaluate the performance of the prediction model: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Symmetric Mean Absolute Percentage Error (SMAPE). RMSE is a standard measure of model error when predicting quantitative data; SMAPE is an accuracy measure based on percentage error. The 3 indices are calculated by the following formulas, where y(t) is the actual wind power value at time t, $\hat{y}(t)$ is the predicted wind power value at time t, and N is the number of data samples in the test set.

1) Mean absolute error:

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N} \left| y(t) - \hat{y}(t) \right|$$

2) Root mean square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} \left( y(t) - \hat{y}(t) \right)^2}$$

3) Symmetric mean absolute percentage error:

$$\mathrm{SMAPE} = \frac{100\%}{N}\sum_{t=1}^{N} \frac{\left| \hat{y}(t) - y(t) \right|}{\left( |y(t)| + |\hat{y}(t)| \right)/2}$$
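
The three indices can be computed as below. This sketch assumes the common SMAPE variant with (|y| + |ŷ|)/2 in the denominator and treats a term as 0 when both values are 0, which matters because hourly wind power can be exactly zero; the function names are illustrative.

```python
import numpy as np


def mae(y, y_hat):
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))


def smape(y, y_hat):
    """SMAPE in percent; a term is taken as 0 when y(t) and y_hat(t) are both 0."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    num = np.abs(y_hat - y)
    den = (np.abs(y) + np.abs(y_hat)) / 2
    ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(100.0 * np.mean(ratio))


if __name__ == "__main__":
    y, y_hat = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
    print(mae(y, y_hat), rmse(y, y_hat), smape(y, y_hat))
```

RMSE squares the residuals and so penalizes large misses at power peaks more than MAE does, while SMAPE is scale-free and comparable across the two stations.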

To verify the prediction performance of the proposed MI-VMD-DA-EDLSTM-VEC combined model, it is compared with the autoregressive moving average model (ARMA), the support vector regression model (SVR), the long short-term memory network (LSTM), VMD + encoder-decoder LSTM, VMD + decoding-layer attention + EDLSTM, and VMD + two-stage attention + EDLSTM + error correction (VMD). A wind power time series sample from the BOV wind farm is shown in FIG. 7: the series has a large amplitude of variation and a high fluctuation frequency, which is one of the reasons this wind farm's data were chosen to verify the validity and generalization ability of the proposed method. The spectrum shows no dominant frequency of the wind power and varies sharply, indicating that components of different frequencies in the wind power interfere with each other and are strongly affected by noise.

To determine the optimal order of the time series prediction model, the autocorrelation of the time series data must be analyzed. The ACF and PACF plots of wind power in FIG. 8 show that the ACF tails off while the PACF cuts off, so the wind power signal exhibits the characteristics of an autoregressive (AR) model. The ACF plot also shows that the values lie entirely within the confidence interval beyond a lag of 30 steps, so the optimal order p of the time series prediction model is initially set to 30.
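The lag-selection rule can be sketched as follows. This is an illustration, not the authors' procedure: it ASSUMES the large-sample white-noise band ±1.96/√N as the confidence interval (plotting tools often use Bartlett's formula, which gives wider bands), and `order_from_acf` is a hypothetical helper that returns the last lag whose autocorrelation lies outside that band.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def order_from_acf(x, max_lag, z=1.96):
    """Last lag whose |ACF| exceeds the approximate band z / sqrt(N).

    Beyond the returned lag, every ACF value up to max_lag lies inside the band.
    Returns 0 if no lag is outside, and None if the ACF has not settled by max_lag.
    """
    r = acf(x, max_lag)
    band = z / np.sqrt(len(x))
    outside = np.flatnonzero(np.abs(r[1:]) > band) + 1  # lags 1..max_lag outside the band
    if outside.size == 0:
        return 0
    p = int(outside[-1])
    return p if p < max_lag else None
```

Applied to the hourly wind power series with, say, `max_lag=60`, a return value of 30 would match the order reported above.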

MI feature selection is performed first.

The original data are 6-dimensional: wind power, wind direction, wind speed, air temperature, surface air pressure and density. To eliminate or reduce the noise that redundant and irrelevant features introduce into model prediction, MI is used as the feature selection method, and the features most strongly correlated with wind power (PW) are retained as the main features for model prediction. FIG. 9 shows the mutual information of each feature dimension calculated by the MI method; wind power, wind speed and air temperature are the three most relevant features. FIG. 10 shows the wind speed and temperature trends. The subsequent model uses these three features for signal decomposition, model training and prediction.
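The MI ranking step can be sketched with a simple histogram (plug-in) estimator. The estimator, the bin count of 20 and the names `mutual_information` and `rank_by_mi` are assumptions for illustration; the document does not say how the densities in the MI formula were estimated. The synthetic features below stand in for the six NREL columns.

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(X;Y) in nats from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, bins)
    nz = pxy > 0  # empty cells contribute 0 to the sum
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def rank_by_mi(features, target, bins=20):
    """Return feature names ordered by decreasing MI with the target, plus the scores."""
    scores = {name: mutual_information(col, target, bins) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Synthetic stand-in data: one copy of the target, one noisy version, one unrelated series.
rng = np.random.default_rng(0)
target = rng.normal(size=5000)
features = {
    "copy": target,
    "noisy": target + rng.normal(size=5000),
    "independent": rng.normal(size=5000),
}
ranking, scores = rank_by_mi(features, target)
top3 = ranking[:3]  # in the document's setting: wind power, wind speed, air temperature
```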

A single LSTM model was used to make predictions with feature sets of different dimensionality. The 3-dimensional features selected by MI give better predictions than the original 6-dimensional feature data, but are still slightly worse than the single-dimensional data (PW alone). The accuracy drops because the added wind speed signal is non-stationary and complex; the VMD algorithm is therefore used next to decompose the signals into more stationary components of different frequencies, which improves model accuracy.

VMD feature extraction: the VMD algorithm decomposes the wind power, wind speed and air temperature data into 20, 10 and 1 IMFs, respectively, so that each feature is split into components distributed across different frequency bands. FIG. 11 shows the VMD decomposition results for wind power and wind speed: the resulting IMFs are evenly distributed in the frequency domain, which indicates that VMD extracts the frequency-domain characteristics of the signals well.

The prediction metrics are compared in FIG. 12. After VMD decomposition, the RMSE and MAE of the model are 1.117 and 0.799, respectively, a significant reduction compared with the RMSE of 2.229 obtained when the 3-dimensional features are predicted directly with LSTM. The predictions after VMD decomposition also lie closer to the reference line than those of the direct LSTM prediction, indicating higher accuracy. Compared with the other models, VMD-LSTM outperforms both the statistical model ARMA and the machine learning model SVR. The MAE and RMSE of VMD-EDLSTM are 0.266 and 0.568, respectively, improvements of 0.533 and 0.549 over VMD-LSTM, which uses a single-layer LSTM. The VMD-AT-EDLSTM model applies an attention mechanism at the decoding layer so that different context vectors are used and the hidden states of important time steps receive more focus; it is therefore more accurate than VMD-EDLSTM, improving MAE and RMSE by 0.034 and 0.049, respectively. Building on VMD-AT-EDLSTM, the VMD-DA-EDLSTM model also applies an attention mechanism at the input stage of the encoding layer, so that the model not only captures long-term temporal dependencies and attends to important time steps but also selects the important feature factors. As the figure shows, without the error correction module this model has the best prediction performance: its MAE and RMSE are 0.218 and 0.381, respectively, which are 0.048 and 0.187 lower than those of VMD-EDLSTM and 0.014 and 0.138 lower than those of VMD-AT-EDLSTM.
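The reported improvements can be cross-checked with a few lines of arithmetic. The values below are the MAE and RMSE figures stated in the paragraph above. VMD-AT-EDLSTM is given only as an improvement over VMD-EDLSTM, so its absolute values are derived here rather than quoted.

```python
# (MAE, RMSE) values stated in the text for FIG. 12.
reported = {
    "VMD-LSTM": (0.799, 1.117),
    "VMD-EDLSTM": (0.266, 0.568),
    "VMD-DA-EDLSTM": (0.218, 0.381),
}
# Derived: VMD-AT-EDLSTM improves MAE by 0.034 and RMSE by 0.049 over VMD-EDLSTM.
ed_mae, ed_rmse = reported["VMD-EDLSTM"]
reported["VMD-AT-EDLSTM"] = (ed_mae - 0.034, ed_rmse - 0.049)

def reduction(better, worse):
    """(MAE reduction, RMSE reduction) of `better` relative to `worse`, rounded to 3 decimals."""
    b, w = reported[better], reported[worse]
    return tuple(round(wv - bv, 3) for bv, wv in zip(b, w))

for better, worse in [("VMD-EDLSTM", "VMD-LSTM"),
                      ("VMD-DA-EDLSTM", "VMD-EDLSTM"),
                      ("VMD-DA-EDLSTM", "VMD-AT-EDLSTM")]:
    print(better, "vs", worse, reduction(better, worse))
```

The derived VMD-AT-EDLSTM values (about 0.232 and 0.519) agree with the 20-run averages of 0.23 and 0.52 reported for FIG. 15.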

FIG. 13 shows the attention weights of the two attention stages (encoding and decoding). In the first stage, the attention mechanism selects the key IMF components; wind power, wind speed and air temperature have 20, 10 and 1 IMFs, respectively, and FIG. 13(a) shows that the model focuses mainly on the low-frequency trend component within each group of IMFs. In the second stage, the attention mechanism focuses on the hidden states of key time steps; FIG. 13(b) shows that the 24th step has the largest attention weight and that later time steps generally receive larger weights. This indicates that the second-stage attention at the decoding layer can select the key information from long-range dependencies, and that this information is concentrated at later time steps.

In one embodiment of the invention, an error correction module is added to the encoder-decoder model with the two-stage attention mechanism: the original prediction errors are used as data samples, a single-layer LSTM model predicts the error, and the resulting error prediction sequence is used to correct the original prediction. Because the prediction error is non-stationary and highly volatile, the error sequence is first preprocessed by VMD decomposition, and the experimental results show that this VMD-based preprocessing further improves the prediction accuracy of the model. FIG. 12 shows that the RMSE and MAE of the VMD-DA-EDLSTM-VEC model reach 0.179 and 0.121, which are 0.175 and 0.076 lower than those of VMD-DA-EDLSTM-EC (error correction without VMD preprocessing), and 0.202 and 0.097 lower than those of the VMD-DA-EDLSTM model.
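The correction step itself reduces to simple array arithmetic, sketched below. Two points are ASSUMPTIONS, since the text does not reproduce the formulas: the error is taken as e(t) = y(t) minus the initial prediction, and the error prediction sequence is the sum of the per-IMF predictions. The single-layer LSTM that predicts each IMF of e(t) is omitted, and the helper names are hypothetical.

```python
import numpy as np

def prediction_error(y_true, y_init):
    """Original prediction error e(t), ASSUMING e(t) = y(t) - y_init(t)."""
    return np.asarray(y_true, dtype=float) - np.asarray(y_init, dtype=float)

def correct_forecast(y_init, imf_error_preds):
    """Add the predicted error back to the initial forecast.

    imf_error_preds: array of shape (n_imfs, T), one predicted series per VMD
    component of e(t); their sum is taken as the error prediction sequence.
    """
    e_hat = np.asarray(imf_error_preds, dtype=float).sum(axis=0)
    return np.asarray(y_init, dtype=float) + e_hat
```

With this sign convention, a perfect error prediction recovers the actual series exactly; the opposite convention would require subtracting instead.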

In some embodiments, the number of VMD decomposition layers K affects the distribution of the IMFs in the frequency domain and therefore the prediction results of the model; the lag (number of time steps) also affects prediction accuracy. The VMD-DA-EDLSTM model is used below to determine the autocorrelation order p and the number of decomposition layers K.

1) Determination of the number of decomposition layers K: with p fixed at 30, the numbers of decomposition layers K_p, K_s and K_t for wind power, wind speed and temperature are searched with a greedy algorithm. First, a K_p with relatively low error is found by searching between 15 and 25; K_s is then determined on that basis, and finally K_t is determined.

2) Determination of the autocorrelation order p: p is searched after K has been determined. In the ACF and PACF plots of FIG. 8, the values lie entirely within the confidence interval from a lag of about 30 onwards; FIG. 13(a) shows a consistent result, with RMSE and MAE relatively smallest at an autocorrelation order p of 30.
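The greedy search over K_p, K_s and K_t can be sketched as a one-coordinate-at-a-time search. In practice, `evaluate` would be a hypothetical callback that decomposes the data with the given K values, trains VMD-DA-EDLSTM with p = 30 and returns a validation error; here a toy quadratic error stands in for it. The starting values and the search ranges for K_s and K_t are assumptions, because the text only gives 15 to 25 for K_p.

```python
from typing import Callable, Dict, Iterable

def greedy_k_search(evaluate: Callable[[Dict[str, int]], float],
                    candidates: Dict[str, Iterable[int]],
                    initial: Dict[str, int]) -> Dict[str, int]:
    """Tune one decomposition number at a time, in dict order, keeping earlier choices fixed."""
    best = dict(initial)
    for name, values in candidates.items():
        scores = {v: evaluate({**best, name: v}) for v in values}  # error for each trial value
        best[name] = min(scores, key=scores.get)                   # keep the lowest-error value
    return best

def toy_error(k):
    """Stand-in for a trained model's validation error (minimum at K_p=20, K_s=10, K_t=1)."""
    return (k["K_p"] - 20) ** 2 + (k["K_s"] - 10) ** 2 + (k["K_t"] - 1) ** 2

result = greedy_k_search(
    toy_error,
    {"K_p": range(15, 26), "K_s": range(2, 16), "K_t": range(1, 6)},  # search order K_p, K_s, K_t
    {"K_p": 15, "K_s": 5, "K_t": 3},
)
```

A greedy search needs only the sum of the range sizes in model trainings rather than their product, at the cost of possibly missing interactions between the three K values.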

Five models (VMD-LSTM, VMD-EDLSTM, VMD-AT-EDLSTM, VMD-DA-EDLSTM and VMD-DA-EDLSTM-VEC) were each used to run 20 groups of prediction experiments on the BOV site data, and the error indices MAE and RMSE of the results are compared in FIG. 15.

As the averages over the 20 experiments in FIG. 15 show, VMD-AT-EDLSTM, which adds the decoding-layer attention mechanism, has an average MAE of 0.23 and RMSE of 0.52, better than VMD-LSTM and VMD-EDLSTM. VMD-DA-EDLSTM, which further adds the encoding-layer attention mechanism, has an average MAE of 0.22 and RMSE of 0.38, better than the single-attention model. With the error correction module, the average error is further reduced to an MAE of 0.12 and RMSE of 0.17, and the model is more stable.

The invention provides a novel short-term wind power prediction model that combines MI, VMD, an encoder-decoder LSTM model and a two-stage attention mechanism; together with an error correction module, this yields the MI-VMD-DA-EDLSTM-VEC multi-dimensional feature combination model.

The foregoing has described only the basic principles and preferred embodiments of the present invention and numerous changes and modifications may be made by those skilled in the art in light of the foregoing description and are to be included within the scope of the present invention.

Claims

1. A multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC is characterized by specifically comprising the following steps:

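S1, performing MI feature selection on the original feature sequence x(t) to obtain the wind power, wind speed and temperature sequences;

S2, performing VMD decomposition on the wind power, wind speed and temperature sequences respectively to obtain modal components;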

S3, performing model training and prediction on the modal components obtained by VMD decomposition with the encoder-decoder model DA-EDLSTM, which uses a two-layer attention mechanism, to obtain an initial prediction sequence, and comparing it with the actual sequence to obtain the original prediction error e(t);

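S4, performing VMD decomposition preprocessing on the original prediction error e(t), retraining and predicting with a single-layer LSTM model to obtain an error prediction sequence, and correcting the original prediction result with the error prediction sequence to obtain the final prediction result.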

2. The method for predicting the combination of multidimensional features based on MI-VMD-DA-EDLSTM-VEC as claimed in claim 1, wherein the step S1 specifically comprises:

S11, calculating the mutual information between the original feature sequence x(t) and the target sequence Y based on formula 1, and ranking the features by mutual information, wherein the original feature sequence comprises wind power, wind speed, temperature, air pressure, density and wind direction;

$$I(X;Y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (1)$$

where p(x, y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal density functions. If X and Y are completely unrelated (independent), p(x, y) equals p(x)p(y) and the mutual information equals 0; the larger I(X;Y), the stronger the correlation between the two variables;

S12, based on the mutual information ranking between the feature sequence x(t) and the target sequence Y, selecting the 3-dimensional feature sequence with the largest mutual information, namely wind power, wind speed and temperature.

3. The MI-VMD-DA-EDLSTM-VEC based multi-dimensional feature combination prediction method of claim 2, wherein step S2 specifically comprises: decomposing the wind power, wind speed and temperature sequences respectively into K modal components u_k (k = 1, 2, ..., K), each with a center frequency ω_k, such that the sum of the estimated bandwidths of the modal components is minimized, comprising the following steps:

S21, applying the Hilbert transform to each modal component u_k to obtain its analytic signal and hence its one-sided frequency spectrum;

S22, shifting the spectrum of each u_k to baseband by mixing it with an exponential tuned to its estimated center frequency ω_k;

S23, estimating the bandwidth of u_k from the demodulated signal using the Gaussian smoothness criterion, i.e. the squared L2 norm of its gradient.
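Steps S21 to S23 are usually solved with the ADMM iterations of the original VMD method (Dragomiretskiy and Zosso), sketched below in simplified form. The simplifications and defaults are ASSUMPTIONS, not the patent's settings: the one-sided spectrum comes from `np.fft.rfft` instead of an explicit Hilbert transform, there is no boundary mirroring, and `alpha`, `tau`, the initial center frequencies and the stopping rule are illustrative. The bandwidth penalty of S22/S23 appears as the Wiener-filter denominator 1 + 2α(ω − ω_k)².

```python
import numpy as np

def vmd(signal, K, alpha=2000.0, tau=0.0, n_iter=500, tol=1e-7):
    """Simplified VMD: returns K modes (sorted by center frequency) and their centers."""
    f = np.asarray(signal, dtype=float)
    N = len(f)
    freqs = np.fft.rfftfreq(N)              # normalized frequencies 0..0.5 (cycles/sample)
    f_hat = np.fft.rfft(f)                  # one-sided spectrum (analytic-signal view, S21)
    u_hat = np.zeros((K, len(freqs)), dtype=complex)
    omega = np.linspace(0, 0.5, K + 2)[1:-1]  # assumed init: centers spread uniformly
    lam = np.zeros(len(freqs), dtype=complex)  # Lagrange multiplier (dual variable)
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Signal minus all other modes (Gauss-Seidel: uses already-updated modes).
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Narrow-band filter around omega_k penalizes bandwidth (S22, S23).
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / max(np.sum(power), 1e-300)  # spectral centroid
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))  # dual ascent (tau=0 disables it)
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / max(np.sum(np.abs(u_prev) ** 2), 1e-300)
        if change < tol:
            break
    modes = np.fft.irfft(u_hat, n=N, axis=1)  # back to real-valued time-domain modes
    order = np.argsort(omega)
    return modes[order], omega[order]

t = np.arange(1000)
sig = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.25 * t)
modes, centers = vmd(sig, K=2)
```

For production use, an established VMD implementation with mirroring is preferable; this sketch only shows how the center frequencies and bandwidth penalty interact.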

4. The MI-VMD-DA-EDLSTM-VEC based multi-dimensional feature combination prediction method of claim 2, wherein the encoder-decoder model DA-EDLSTM using the two-layer attention mechanism in step S3 comprises an input layer, a coding layer, a decoding layer and an output layer coupled in sequence, wherein the coding layer applies an attention mechanism to the input features and the decoding layer applies a temporal attention mechanism.

5. The combined model of MI-VMD-DA-EDLSTM-VEC for short-term wind power prediction as claimed in claim 4, wherein the coding layer applying an attention mechanism to the input features specifically comprises:

S31, constructing an attention mechanism for the k-th feature sequence x^k of the observation sequence X based on formula 2 and formula 3;

wherein

and

are the parameters that the model needs to learn,

and

are the hidden state and cell state of the coding layer, m is the size of the hidden layer, and T is the window length of the observation time series;

S32, normalizing with the softmax function

so that the attention weights sum to 1;

S33, for the input x_t at each time step, assigning an attention weight to each influencing factor

so that the attention-weighted output of the encoding stage is

S34, feeding

into the coding layer to obtain

where the function f1 is an LSTM network.
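Because formulas 2 and 3 are not reproduced in the text, the sketch below ASSUMES the input-attention form of the dual-stage attention RNN (DA-RNN), which matches the description above: a score e_t^k = v_eᵀ tanh(W_e[h_{t-1}; s_{t-1}] + U_e x^k) for each input series, followed by a softmax. The parameter names and shapes are illustrative, and in a real model they would be learned by backpropagation rather than sampled at random.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax; the weights sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def input_attention(X, h_prev, s_prev, v_e, W_e, U_e):
    """Input attention over n driving series (assumed DA-RNN form).

    X: (n, T) window of the n input series (e.g. the VMD components) over T steps
    h_prev, s_prev: (m,) coding-layer hidden state and cell state at time t-1
    v_e: (T,), W_e: (T, 2m), U_e: (T, T) learnable parameters
    Returns alpha: (n,) attention weights, one per input series.
    """
    hs = np.concatenate([h_prev, s_prev])  # [h_{t-1}; s_{t-1}], shape (2m,)
    scores = np.array([v_e @ np.tanh(W_e @ hs + U_e @ X[k]) for k in range(X.shape[0])])
    return softmax(scores)

def weighted_input(X, t, alpha):
    """Attention-weighted input at step t: (alpha^1 x_t^1, ..., alpha^n x_t^n)."""
    return alpha * X[:, t]

# 31 series = 20 + 10 + 1 IMFs, window T = 30, hidden size m = 8 (illustrative values).
rng = np.random.default_rng(1)
n, T, m = 31, 30, 8
X = rng.normal(size=(n, T))
alpha = input_attention(X, np.zeros(m), np.zeros(m),
                        rng.normal(size=T), rng.normal(size=(T, 2 * m)), rng.normal(size=(T, T)))
x_tilde = weighted_input(X, 0, alpha)  # this vector would be fed to f1 (the LSTM)
```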

6. The MI-VMD-DA-EDLSTM-VEC based multi-dimensional feature combination prediction method of claim 4, wherein the decoding layer using a temporal attention mechanism specifically comprises:

computing, based on $e_i=\tanh(W_d[h_i;s_{t-1}]+b_d)$, the unnormalized attention score e_i, which represents the importance of each input, where W_d and b_d are parameters learned by the model;

normalizing based on

to obtain the attention weight of the input sequence at each time step;

and computing the context vector at time t based on
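The temporal attention of the decoding layer can be sketched directly from the score formula e_i = tanh(W_d[h_i; s_{t-1}] + b_d). The normalization and context formulas are not reproduced in the text, so this sketch ASSUMES the usual softmax over time steps and the context vector c_t = Σ_i β_i h_i. Here `w_d` is taken as a row vector and `b_d` as a scalar so that each e_i is a scalar; these shapes are assumptions.

```python
import numpy as np

def temporal_attention(H, s_prev, w_d, b_d):
    """Decoding-layer temporal attention (score from claim 6; softmax and context assumed).

    H: (T, m) coding-layer hidden states h_1..h_T
    s_prev: (q,) decoding-layer state s_{t-1}
    w_d: (m + q,) weights, b_d: scalar bias
    Returns (beta, context): weights over the T steps and the context vector c_t.
    """
    T = H.shape[0]
    concat = np.hstack([H, np.tile(s_prev, (T, 1))])  # row i is [h_i; s_{t-1}]
    e = np.tanh(concat @ w_d + b_d)                    # unnormalized scores e_i, shape (T,)
    beta = np.exp(e - e.max())
    beta /= beta.sum()                                 # softmax over the T time steps
    context = beta @ H                                 # c_t = sum_i beta_i h_i
    return beta, context
```

Because tanh bounds each score to [-1, 1], the largest and smallest weights in this form can differ by a factor of at most e².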

7. The multi-dimensional feature combination prediction method based on MI-VMD-DA-EDLSTM-VEC as claimed in claim 2, further comprising performing prediction performance evaluation based on Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Symmetric Mean Absolute Percentage Error (SMAPE).

8. The MI-VMD-DA-EDLSTM-VEC based multi-dimensional feature combination prediction method of claim 2, wherein the LSTM model in steps S3 and S4 comprises a control unit and a storage unit, the control unit comprising a forget gate, an input gate and an output gate that control how the information in the storage unit is updated and used.

9. The MI-VMD-DA-EDLSTM-VEC based multi-dimensional feature combination prediction method of claim 8, wherein the forget gate controls how much information of the previous cell state c_{t-1} is forgotten when the storage unit is updated, according to $f_t=\sigma(W_f[h_{t-1};x_t]+b_f)$;

the input gate controls the information input to the unit, according to $i_t=\sigma(W_i[h_{t-1};x_t]+b_i)$;

the output gate activates c_t and controls the degree to which it is passed to the output, according to $o_t=\sigma(W_o[h_{t-1};x_t]+b_o)$ and $h_t=o_t\odot\tanh(c_t)$;

where W_* are weight matrices, b_* are bias terms, ⊙ denotes the element-wise product, and σ is the sigmoid activation function.
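One LSTM step with the gate equations of claim 9 is sketched below. The claim does not reproduce the candidate state and cell update, so the standard forms c̃_t = tanh(W_c[h_{t-1}; x_t] + b_c) and c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t are ASSUMED (key `"c"` below). The dict-of-matrices layout is an illustrative choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; gates follow claim 9, candidate/cell update are the assumed standard form.

    W: dict with keys 'f', 'i', 'o', 'c', each a weight matrix of shape (m, m + d)
    b: dict with the same keys, each a bias vector of shape (m,)
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}; x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state (assumed)
    c_t = f_t * c_prev + i_t * c_tilde      # cell update (assumed)
    h_t = o_t * np.tanh(c_t)                # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t
```

With all weights and biases at zero, every gate equals 0.5 and the candidate is 0, so the step simply halves the previous cell state. This makes a convenient sanity check.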