CN114218870A

CN114218870A - Wind speed prediction method based on variational modal decomposition and attention mechanism

Info

Publication number: CN114218870A
Application number: CN202111583769.7A
Authority: CN
Inventors: 章靖凯; 顾宏; 余向军; 秦攀
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2022-03-22

Abstract

The invention belongs to the technical field of wind speed prediction methods and provides a wind speed prediction method based on variational modal decomposition and an attention mechanism. The wind speed prediction problem under multivariable conditions is studied by considering the influence of natural factors. The acquired data are preprocessed, subjected to variational modal decomposition, and used as the input sequence of a deep neural network; the deep neural network comprises a position encoder, an encoder and a decoder; the position code provides relative position information for the input signal; the encoder comprises an attention mechanism layer, a feedforward network layer and a normalized residual connection layer, and is used by the neural network model to capture the interrelations among the time-series features; the decoder is composed of a two-layer fully connected network and decodes the time-series feature information acquired by the encoder to output a predicted value. The method can effectively improve ultra-short-term wind speed prediction accuracy; it is not restricted to any operating platform, is flexible and convenient to use, is highly portable, and has excellent wind speed prediction performance.

Description

Wind speed prediction method based on variational modal decomposition and attention mechanism

Technical Field

The invention relates to the technical field of wind speed prediction methods, in particular to a wind speed prediction method based on variational modal decomposition and attention mechanism.

Background

In the field of ocean navigation, the optimal air route can be designed for ship navigation by effectively and accurately predicting the wind speed and forecasting the sea condition information, and the navigation safety is guaranteed while the time and the oil consumption are reduced. In the military field, in the aspects of operation planning formulation and task deployment arrangement, the meteorological condition is an extremely important reference factor, and military commanders can pertinently assemble troops, formulate a marching scheme and set a proper transportation mode by means of accurate meteorological prediction, so that the success rate of military operation is improved. Therefore, the wind speed prediction method has great practical significance and application value.

Traditional numerical weather prediction (NWP) relies on physical models such as WRF, which simulate real-time meteorological processes well; however, because the modeling process is complex and depends heavily on the accuracy of environmental information, the results of short-term, high-precision prediction remain uncertain. Emerging prediction techniques, unlike classical physical models based on the mechanisms of meteorological change, treat the problem as time series processing and have developed in two main directions: statistical theory and machine learning. In prediction models based on statistical theory, it is common to classify time series data with methods such as fuzzy clustering and then perform short-term prediction for each type of series with methods such as the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) methods. However, the accuracy of statistical wind speed prediction is easily limited by the random fluctuation and non-stationarity of the time series, so more and more researchers apply machine learning theory and models to time series prediction, such as the support vector machine, the Kalman filter and the artificial neural network. In the field of deep learning, Lidameng et al. proposed a deep learning model based on a convolutional recurrent neural network, which greatly improves short-term prediction accuracy compared with general machine learning algorithms. Shi et al. proposed a convolutional long short-term memory neural network and verified it on Doppler radar echo time series data, achieving higher prediction accuracy than the optical flow method, a traditional physical model.

Currently, some studies have begun to apply noise reduction and complexity reduction to the original data in the data preprocessing stage, adopting the idea of decomposition, prediction and reconstruction. Santhosh M et al. used an ensemble empirical mode decomposition algorithm to decompose the original data and fed the decomposed signals into a neural network for training, which greatly improved the prediction accuracy for time series data. However, this decomposition algorithm introduces an end effect and mode mixing to a certain extent, which interfere with the actual prediction result. In view of these problems, a wind speed prediction method with higher accuracy is needed.

Disclosure of Invention

The invention provides a wind speed prediction method using a deep neural network that combines variational modal decomposition with an attention mechanism. To overcome the defects of the prior art, the invention studies the wind speed prediction problem under multivariable conditions, taking into account natural factors such as wind direction, wind speed, gust, air pressure, air temperature and water surface temperature. Variational modal decomposition decomposes the wind speed time series and extracts its features while filtering noise and reducing data complexity, which effectively improves noise robustness. Meanwhile, mirror extension avoids the end effect and spurious components in the iteration process. The Transformer, a deep neural network model built on the attention mechanism, can capture global internal correlation information and long-distance dependencies without excessive accumulation, redundancy or loss of information. Comparative tests show that the model effectively improves ultra-short-term wind speed prediction accuracy. The design of the deep learning model focuses on signal noise reduction, feature extraction and the attention mechanism, and the superiority of the method is verified by theoretical analysis and comparative experiments.

The technical scheme of the invention is as follows: a wind speed prediction method based on variational modal decomposition and attention mechanism comprises the following steps:

step one: collecting historical meteorological data and constructing a data set of the original wind speed and related covariates; when data are missing or abnormal, i.e. the value is greater than or equal to 999, the data are filled by a weighted linear interpolation method;

step two: data preprocessing; in order to eliminate the influence of the different dimensions (units) of the feature variables in the data set on the experiment, the data are scaled to the same scale by mean-variance normalization (mean 0, variance 1); the data set is divided into a training set, a verification set and a test set in the ratio 8:1:1;

step three: variational modal decomposition; a sequence of a certain step length from the data set is taken as the input sequence u(t) of the deep neural network, and variational modal decomposition is applied to u(t) to obtain K mode sequences u_k(t), k = 1, 2, ..., K; mirror extension avoids the end effect and spurious components in the iteration process; variational modal decomposition filters noise, reduces data complexity and effectively improves noise robustness;

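(3.1) the analytic signal of each mode of the input sequence u(t), obtained by the Hilbert transform, is mixed with an exponential tuned to the estimated center frequency of that mode:

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t}$$

where δ(t) is the Dirac function, * denotes convolution, and ω_k is the center frequency of the k-th mode;

the K modes sum to the input sequence u(t) and the sum of the estimated bandwidths of the modes is minimal, i.e.:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = u(t) \qquad (2)$$

where {u_k} = {u₁, …, u_K} and {ω_k} = {ω₁, …, ω_K};

(3.2) the constrained variational problem of equation (2) is converted into an unconstrained variational problem:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| u(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\ u(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where α is the quadratic penalty factor, which ensures the reconstruction accuracy of the input sequence u(t) in the presence of noise, and λ(t) is the Lagrange multiplier, which keeps the constraint strictly enforced;

the alternating direction method of multipliers is used to update u_k^{n+1}, ω_k^{n+1} and λ^{n+1} alternately and iteratively, which yields the optimal solution of equation (2); the center frequency of the new component of each mode is given by:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}$$

the new component û_k^{n+1}(ω) of each mode after decomposition is:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{u}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k\right)^2}$$

where n is the iteration number of the algorithm, and û(ω), û_i(ω) and λ̂ⁿ(ω) are the Fourier transforms of u(t), u_i(t) and λⁿ(t), respectively;

the decomposed components are obtained by taking the real part of the inverse Fourier transform:

$$u_k(t) = \mathrm{Real}\left(F^{-1}\left(\hat{u}_k(\omega)\right)\right)$$

where F^{-1}(·) denotes the inverse Fourier transform and Real(·) denotes taking the real part of a complex number;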

step four: the real parts of the K decomposed components {u_k(t)} are used as the input signal of the deep neural network with the attention mechanism, and the actual wind speed is used as the prediction target for parameter training, verification or prediction of the deep neural network; the deep neural network comprises a position encoder, an encoder and a decoder; the position code provides relative position information for the input signal; the encoder comprises an attention mechanism layer, a feedforward network layer and a normalized residual connection layer, and is used by the neural network model to capture the interrelations among the time-series features; the decoder is composed of a two-layer fully connected network and decodes the time-series feature information acquired by the encoder to output a predicted value.

The weighted linear interpolation method is as follows:

$$u(t_{miss}) = \frac{t_{next} - t_{miss}}{t_{next} - t_{prev}}\, u(t_{prev}) + \frac{t_{miss} - t_{prev}}{t_{next} - t_{prev}}\, u(t_{next})$$

where u(t_miss) is the missing or abnormal value to be filled at time t_miss of the sequence, u(t_prev) is the valid value at t_prev, the nearest time before the missing or abnormal value, and u(t_next) is the valid value at t_next, the nearest time after it.

The position code is constructed as follows: a specific vector is added to each single-point datum of the input sequence u(t), injecting position information into the whole input sequence signal; a matrix with the same dimensions as the input sequence u(t) is constructed and added to u(t) to obtain the input of the following attention mechanism layer:

$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

$$PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where pos is the position of the single-point datum in the whole input sequence u(t), d_model is the dimension of the position vector, and i indexes the components of the position vector.

In the attention mechanism layer, three matrices W_Q, W_K and W_V of the same dimension are initialized randomly, and the input sequence is linearly transformed by each of the three matrices; the results are denoted the query vector Q, the key vector K and the value vector V and are mapped to an output matrix, in which the weight assigned to each value is calculated by a softmax function of the compatibility function of the query vector Q with the corresponding key vector K; the output matrix is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where √d_k is the square root of the column dimension of the matrix K; dividing by it prevents unstable training.

A multi-head attention mechanism layer is formed by arranging h attention mechanism layers.

Model output: the predicted wind speed at a specified future moment; the prediction result is stored in a file.

The invention has the beneficial effects that: after acquiring meteorological observation data of the buoy station, the method carries out preprocessing, variational modal decomposition and neural network prediction on the data, and finally obtains a predicted value of the wind speed at a future designated moment.

The method has no limitation on the operation platform, is flexible and convenient to use, has strong transportability and excellent wind speed prediction performance.

Drawings

FIG. 1 is a general flow chart of a wind speed prediction method based on a variational modal decomposition and attention mechanism;

FIG. 2 is an attention mechanism layer structure;

FIG. 3 is a multi-head attention mechanism configuration;

FIG. 4 is a result diagram of a time series wind speed decomposed by a variational modal decomposition algorithm.

Detailed Description

The following detailed description of the embodiments of the present invention is provided in conjunction with the summary of the invention:

the process of predicting wind speed using a deep neural network model based on a variational modal decomposition with a fused attention mechanism is described in detail below. The use and method of use of the present invention are further illustrated by the following examples, but the invention is not limited thereto.

1. Experimental Environment configuration

Operating system: Windows 10

Programming language: Python 3.7

Deep learning framework: PyTorch 1.10

2. Experimental methods

(1) Acquiring wind speed and related meteorological data: the historical wind speed and related meteorological data observed by a meteorological buoy are stored as csv files, and when the observed data are missing or abnormal (the value is greater than or equal to 999) the data are filled by a weighted linear interpolation method. The weighted linear interpolation expression is as follows:

$$u(t_{miss}) = \frac{t_{next} - t_{miss}}{t_{next} - t_{prev}}\, u(t_{prev}) + \frac{t_{miss} - t_{prev}}{t_{next} - t_{prev}}\, u(t_{next})$$

where u(t_miss) is the missing or abnormal value to be filled at time t_miss of the sequence, u(t_prev) is the valid value at t_prev, the nearest time before the missing or abnormal value, and u(t_next) is the valid value at t_next, the nearest time after it. The data set is divided into a training set, a verification set and a test set in the ratio 8:1:1;
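The filling rule of step (1) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `fill_missing`, the constant `MISSING_THRESHOLD`, and the handling of gaps at the very start or end of the record (copying the nearest valid value) are assumptions, since the text does not say what happens when no valid neighbour exists on one side.

```python
from typing import List

MISSING_THRESHOLD = 999.0  # per the text, values >= 999 mark missing or abnormal data


def fill_missing(times: List[float], values: List[float]) -> List[float]:
    """Fill missing/abnormal samples by time-weighted linear interpolation."""
    # NaN also fails the `v < threshold` test, so it is treated as missing too.
    valid = [i for i, v in enumerate(values) if v < MISSING_THRESHOLD]
    if not valid:
        raise ValueError("no valid observations to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v < MISSING_THRESHOLD:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is None:
            filled[i] = values[nxt]    # assumption: leading gap copies next valid value
        elif nxt is None:
            filled[i] = values[prev]   # assumption: trailing gap copies previous valid value
        else:
            # weight of the later neighbour grows as t_miss approaches t_next
            w = (times[i] - times[prev]) / (times[nxt] - times[prev])
            filled[i] = (1.0 - w) * values[prev] + w * values[nxt]
    return filled


if __name__ == "__main__":
    hours = [0.0, 0.5, 1.0, 1.5, 2.0]
    wind = [4.0, 999.0, 999.0, 7.0, 8.0]
    print(fill_missing(hours, wind))
```

Both neighbours are always taken from the original valid samples, so a run of consecutive gaps is filled along one straight line rather than from previously interpolated values.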

(2) Data preprocessing: in order to eliminate the influence of the different dimensions (units) of the feature variables in the data set on the experiment, the invention uses mean-variance normalization to scale the data to the same scale (mean 0, variance 1) and applies it to the training set, the verification set and the test set respectively; the conversion formula is as follows:

$$x' = \frac{x - \mu}{\sigma}$$

where x is an original value, x' is the normalized value, μ is the mean of the data set and σ is the standard deviation of the data set.
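A minimal sketch of the normalization for one feature column follows. The helper names are hypothetical, and estimating μ and σ on the training split only, then reusing them for the verification and test splits, is an assumption made here to avoid information leakage; the text itself only states that μ and σ are the mean and standard deviation of the data set.

```python
import math
from typing import List, Tuple


def fit_standardizer(train: List[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) estimated from the training values."""
    mu = sum(train) / len(train)
    variance = sum((x - mu) ** 2 for x in train) / len(train)  # population variance
    return mu, math.sqrt(variance)


def standardize(values: List[float], mu: float, sigma: float) -> List[float]:
    """Apply x' = (x - mu) / sigma to every value."""
    if sigma == 0.0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(x - mu) / sigma for x in values]


if __name__ == "__main__":
    train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    mu, sigma = fit_standardizer(train)
    print(mu, sigma)
    print(standardize([5.0, 7.0, 9.0], mu, sigma))
```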

The wind direction, wind speed, gust, atmospheric pressure, air temperature and water temperature measured by the station are taken as the 6 features that may influence wind speed prediction. For each feature dimension, 36 consecutive observations taken at 30-minute intervals over 18 hours are used as data input, forming an input matrix of shape [Batch Size, 36, 6], where Batch Size is the number of samples selected for a single training step and is set to 256.
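The construction of 36-step input windows can be illustrated as below. This is a sketch under assumptions: the function name `make_windows`, the `horizon` parameter (how many steps ahead the target lies) and the choice of column index 1 for wind speed (following the order in which the features are listed) are not specified in the text.

```python
from typing import List, Tuple

Row = List[float]  # one observation holding the 6 feature values


def make_windows(rows: List[Row], window: int = 36, horizon: int = 1,
                 target_col: int = 1) -> Tuple[List[List[Row]], List[float]]:
    """Slice a record into [window, features] inputs and wind speed targets."""
    inputs: List[List[Row]] = []
    targets: List[float] = []
    last_start = len(rows) - window - horizon
    for start in range(last_start + 1):
        inputs.append(rows[start:start + window])
        # the target lies `horizon` steps after the last observation in the window
        targets.append(rows[start + window + horizon - 1][target_col])
    return inputs, targets


if __name__ == "__main__":
    record = [[float(i)] * 6 for i in range(40)]  # toy record with 40 time steps
    x, y = make_windows(record)
    print(len(x), len(x[0]), len(x[0][0]), y)
```

Grouping 256 such windows gives one batch of shape [256, 36, 6].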

(3) Variational modal decomposition: variational modal decomposition is applied to the original input sequence u(t) to find K mode sequences u_k(t), k = 1, 2, ..., K, such that the sum of the estimated bandwidths of the modes is minimal, subject to the constraint that the K modes sum to the input sequence u(t). The target number of modes K is chosen so that the center frequencies of the modes do not overlap and the center frequency of the last component remains relatively stable. If K is too large, mode overlap and additional noise may result; if K is too small, the decomposition may be incomplete; either case affects prediction accuracy. The specific steps are as follows:

(3.1) For the wind speed input sequence u(t), the analytic signal of each mode, obtained by the Hilbert transform, is mixed with an exponential tuned to the estimated center frequency, thereby shifting the spectrum of each mode to the corresponding baseband:

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t}$$

where δ(t) is the Dirac function, * denotes convolution, and ω_k is the center frequency of the k-th mode.

The constrained variational problem can be described as follows: the K modes sum to the input time series and the sum of the estimated bandwidths of the modes is minimal, i.e.:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = u(t)$$

where {u_k} = {u₁, …, u_K} and {ω_k} = {ω₁, …, ω_K}.

(3.2) On this basis, a quadratic penalty factor α is introduced to ensure the reconstruction accuracy of the time series in the presence of noise, and a Lagrange multiplier λ(t) is introduced to keep the constraint strictly enforced, so that the constrained variational problem becomes an unconstrained variational problem. The augmented Lagrangian expression is as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| u(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\ u(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

The quantities u_k^{n+1}, ω_k^{n+1} and λ^{n+1} (n is the iteration number) are updated alternately to seek the saddle point of the augmented Lagrangian expression and thereby obtain the optimal solution of the constrained problem in formula (2). The center frequency of the new component of each mode after decomposition is given by:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}$$

The new component û_k^{n+1}(ω) of each mode after decomposition can be expressed as:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{u}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k\right)^2}$$

where n is the iteration number of the algorithm, and û(ω), û_i(ω) and λ̂ⁿ(ω) are the Fourier transforms of u(t), u_i(t) and λⁿ(t), respectively. The real part of the inverse Fourier transform of each decomposed component is taken as model input, which can be expressed as:

$$u_k(t) = \mathrm{Real}\left(F^{-1}\left(\hat{u}_k(\omega)\right)\right)$$

where F^{-1}(·) denotes the inverse Fourier transform and Real(·) denotes taking the real part of a complex number.

With the K decomposed components {u_k(t)} as model input, the dimension of the original input data becomes [256, 36, 6K].

The hyper-parameters involved in the variational modal decomposition are set to empirical values: penalty factor α = 1000, fidelity coefficient τ = 1 × 10^-6 and convergence stop condition ε = 1 × 10^-9. A schematic diagram of the result of decomposing the wind speed time series with the variational modal decomposition algorithm is shown in fig. 4.
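The building blocks of step (3) can be illustrated with a small, dependency-free sketch: mirror extension of the signal, one spectral update of a mode, and the power-weighted center-frequency update. It is not a complete variational modal decomposition (there is no loop over modes, no multiplier update and no convergence test), and several details are assumptions: the function names, the naive O(N²) DFT, mirroring half of the signal at each end, and measuring frequency in cycles per sample, so the scale of α here is only illustrative.

```python
import cmath
import math
from typing import List


def mirror_extend(signal: List[float]) -> List[float]:
    """Reflect half of the signal onto each end to soften boundary effects."""
    half = len(signal) // 2
    left = signal[:half][::-1]
    right = signal[len(signal) - half:][::-1]
    return left + signal + right


def dft(signal: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (adequate for a short illustration)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def update_mode(f_hat: List[complex], others_hat: List[complex],
                lam_hat: List[complex], omega_k: float, alpha: float) -> List[complex]:
    """One update of mode k on the non-negative half of the spectrum."""
    n = len(f_hat)
    return [
        (f_hat[k] - others_hat[k] + lam_hat[k] / 2)
        / (1 + 2 * alpha * (k / n - omega_k) ** 2)
        for k in range(n // 2)
    ]


def center_frequency(mode_half: List[complex], n: int) -> float:
    """Power-weighted mean frequency (cycles per sample) of a half spectrum."""
    power = [abs(c) ** 2 for c in mode_half]
    return sum((k / n) * p for k, p in enumerate(power)) / sum(power)


if __name__ == "__main__":
    n = 64
    tone = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
    zeros = [0j] * n
    mode = update_mode(dft(tone), zeros, zeros, omega_k=0.1, alpha=1000.0)
    print(center_frequency(mode, n))  # moves from the guess 0.1 toward 5/64
    print(mirror_extend([1.0, 2.0, 3.0, 4.0]))
```

For a single tone, one update already pulls the center frequency onto the tone; with several modes the same two updates are repeated for each mode, together with the multiplier update, until the change between iterations falls below the stop condition.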

(4) Neural network with the attention mechanism: the Transformer model is based entirely on the attention mechanism and requires neither recurrence nor convolution, so the output at the current position no longer has to be computed from the output of the previous position together with the current input; serial computation is thereby converted into parallel computation, which effectively saves running time. In addition, traditional recurrent neural network algorithms extract global information poorly and face a significant bottleneck in feature extraction and prediction over long time scales, whereas the Transformer extracts information from the whole sequence and therefore has a clear advantage in feature extraction and prediction over long time scales.

As shown in fig. 1, the model is divided into two parts, encoding and decoding. The encoder consists of a plurality of identical unit blocks, each unit block consists of two subunits, including a multi-head attention mechanism layer and a fully-connected feedforward network layer, and residual connection and standardization are added to each subunit. The decoder part adopts a double-layer fully-connected neural network.

The time series data obtained from the variational modal decomposition are first position-encoded. To make use of the position information of the time series, the Transformer introduces a position code that records the relative position of each datum in the sequence. The Transformer adds a specific vector to each input data point; the vector follows a fixed pattern, which helps determine the position of each data point and the distance between different data points, so that the distance between data points is better expressed in subsequent calculations. The position code is constructed as a matrix with the same dimensions as the input data and added to the input to obtain the input of the following attention layer:

$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

$$PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where pos is the position of the data point in the whole sequence, d_model is the dimension of the input vector and i indexes the components of the encoding vector. Position information is thus injected into the whole input sequence signal, and the codes of different positions differ in frequency and phase.
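A sketch of the sinusoidal position code in plain Python is shown below. The function names are hypothetical; the constant 10000 is the value used in the standard Transformer formulation and is assumed here, since the equation image is not available in this text.

```python
import math
from typing import List


def positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Build a [seq_len, d_model] matrix of sine/cosine position codes."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for j in range(d_model):
            i = j // 2  # components 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            pe[pos][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return pe


def add_position(x: List[List[float]], pe: List[List[float]]) -> List[List[float]]:
    """Add the position code element-wise to a [seq_len, d_model] input."""
    return [[a + b for a, b in zip(row, code)] for row, code in zip(x, pe)]


if __name__ == "__main__":
    pe = positional_encoding(seq_len=36, d_model=6)
    print(pe[0])
    print(pe[1])
```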

The input signal sequence u(t) is fed into the attention layer after the position code has been added, as shown in fig. 2. The attention mechanism is the core of the Transformer and allows the corresponding position information features to be learned in different subspaces. The attention function can be described as mapping a set of query vectors Q of dimension d_k, key vectors K of dimension d_k and value vectors V of dimension d_v to an output, which is calculated as a weighted sum of the values, where the weight assigned to each value is calculated by a softmax function of the compatibility function of the query with the corresponding key. The output matrix is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

in which the dot product of the query with all the keys is calculated and each product is divided by √d_k; this prevents an overly large dot product from making the softmax gradient extremely small, which would make training unstable.
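The scaled dot-product attention described above can be written out without any framework. This is an illustrative sketch with hypothetical helper names; a practical model would use a deep learning library, and the learned projections W_Q, W_K and W_V are omitted, so Q, K and V are passed in directly.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a [n, m] and a [m, p] matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) of softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    out, w = attention([[1.0, 0.0]], keys, values)
    print(w)    # the first key receives the larger weight
    print(out)
```

Each row of the weight matrix sums to 1, so every output row is a convex combination of the value vectors.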

As shown in fig. 3, the multi-head attention mechanism applies h different linear transformations to Q, K and V through h attention layers, i.e. the model contains multiple subspaces; the self-attention mechanism is applied in each subspace, and the outputs of the different subspaces are finally concatenated and converted into a result with the same dimension as that of a single self-attention mechanism. Adding the multi-head mechanism further improves the self-attention layer and extends the model's ability to focus on different positions.

Each layer in the encoder contains a fully connected feedforward network consisting of two fully connected layers; the activation function of the first layer is ReLU, whose mathematical expression is f(x) = max(0, x), and the second layer uses a linear activation function; together they are expressed as:

FFW(x) = W₂·max(0, W₁x + b₁) + b₂ (20)

where x is the output of the multi-head attention layer, W₁ and W₂ are the weight matrices, and b₁ and b₂ are the bias values. The two linear transformations together form the feedforward neural network; the feedforward network parameters differ from layer to layer in the Transformer, so that different sequence data can be matched flexibly and the prediction accuracy is improved.
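Equation (20) for a single input vector can be sketched as follows. The function name and the toy weights are illustrative assumptions; in the described model the hidden layer has 64 units and the weights are learned.

```python
from typing import List

Matrix = List[List[float]]


def feed_forward(x: List[float], w1: Matrix, b1: List[float],
                 w2: Matrix, b2: List[float]) -> List[float]:
    """Compute W2 * max(0, W1 x + b1) + b2 for one input vector."""
    # first layer: linear transformation followed by ReLU
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # second layer: linear transformation only
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


if __name__ == "__main__":
    x = [1.0, -2.0]
    w1 = [[1.0, 0.0], [0.0, 1.0]]
    b1 = [0.0, 0.0]
    w2 = [[2.0, 3.0]]
    b2 = [0.5]
    print(feed_forward(x, w1, b1, w2, b2))  # the negative component is zeroed by ReLU
```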

Residual connection and normalization are applied after the attention layer. The residual connection links the input and the output of each multi-head self-attention or feedforward sublayer inside every layer. The processed time series data are then fed into the decoder, i.e. the fully connected neural network, and the model is optimized.

The invention can perform one-step to multi-step wind speed prediction. The mean square error (MSE) and the coefficient of determination R² are selected as indexes for evaluating the accuracy of the prediction result. The performance indexes are defined as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

where N is the total number of samples in each batch of the wind speed time series test set, ŷ_i and y_i are respectively the predicted value and the actual observed value of the wind speed time series, and ȳ is the mean of the actual observations. The smaller the MSE, the closer the predicted values are to the actual values and the higher the prediction accuracy of the model. The coefficient of determination R² reflects the linear relation between the two variables and expresses how well the predicted values fit the actual observations; it measures whether the model predicts the wind speed accurately, and the higher the coefficient of determination, the better the prediction performance of the model.
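The two evaluation indexes can be computed as in the following sketch. The function names are hypothetical, and the R² form used here (one minus the ratio of the residual sum of squares to the total sum of squares) is the standard definition, assumed because the equation image is not available in this text.

```python
from typing import List


def mse(y_true: List[float], y_pred: List[float]) -> float:
    """Mean square error between observations and predictions."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("observations are constant: R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 3.0, 3.5]
    print(mse(observed, predicted), r2_score(observed, predicted))
```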

In the invention, the number of encoder layers in the attention-based neural network is set to 2, the multi-head attention mechanism layer uses two heads, the number of hidden units of the feedforward network in each encoder layer is set to 64, and the decoder is a two-layer fully connected neural network with 36 and 18 hidden units respectively. The model takes 256 groups of 36 consecutive observations as one batch of input to the network, i.e. Batch Size is 256. The network uses the mean square error as the loss function, updates the network parameters with the back propagation algorithm, and optimizes the model parameters with the adaptive optimizer Adam (adaptive moment estimation). The Dropout ratio used for model regularization is set to 0.2 and the learning rate of the Adam optimizer is set to 0.002.

3. Results of the experiment

(1) Comparative experiment:

The model is trained on the training set and verified on the verification set after every training round; training stops when the performance of the model on the verification set has not improved for 5 consecutive rounds, and the model parameters with the best verification result are saved for predicting the wind speed sequence. The effect of the invention compared with other classical wind speed prediction models is as follows.
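The stopping rule described above can be sketched independently of any training framework. The function name `early_stopping` is hypothetical, and the validation losses are passed in as a plain sequence for illustration; in a real training loop each loss would come from evaluating the model on the verification set, and the parameters would be saved whenever a new best value appears.

```python
from typing import Iterable, Tuple


def early_stopping(val_losses: Iterable[float], patience: int = 5) -> Tuple[int, float]:
    """Return (best_round, best_loss), stopping after `patience` rounds without improvement."""
    best_round, best_loss, stale = -1, float("inf"), 0
    for round_index, loss in enumerate(val_losses):
        if loss < best_loss:
            # a real loop would save the model parameters here
            best_round, best_loss, stale = round_index, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_round, best_loss


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.8, 0.1]
    print(early_stopping(losses))  # stops before the final value is ever evaluated
```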

TABLE 1 comparison of model predicted effects (MSE index)

TABLE 2 comparison of model predicted effects (R² index)

The VMD-Transformer model is compared with other classical wind speed prediction models such as LSTM, VMD-LSTM and the Persistence Model. The experimental data set consists of 84944 meteorological records measured at 30-minute intervals from 31/2015 to 23/2020/11/2015 by buoy station 46087 (latitude 48.493 north, longitude 124.726 west) of the National Data Buoy Center of the National Oceanic and Atmospheric Administration (https://www.ndbc.noaa.gov/). The mean square error MSE and the coefficient of determination R² are used as evaluation indexes, and the experimental results are shown in Tables 1 and 2. The comparison in the tables shows that, compared with processing the original data directly without variational modal decomposition, the models combined with VMD predict better: VMD effectively extracts the features of the input time series data, so that the training of the neural network parameters concentrates on the key feature information. Comparing the different neural network models shows that VMD-LSTM performs better for 1-step prediction, whereas VMD-Transformer performs better for 2 to 18 steps; the Transformer, with its attention mechanism, therefore learns global information features better than the recurrent neural network LSTM and performs better in the prediction of relatively longer time series.
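For reference, the persistence baseline named above can be sketched in a few lines. The text does not define the Persistence Model; the common definition assumed here is that every future step is predicted to equal the last observed value, and the function name is hypothetical.

```python
from typing import List


def persistence_forecast(history: List[float], steps: int) -> List[float]:
    """Predict that the last observed value persists for all future steps."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * steps


if __name__ == "__main__":
    print(persistence_forecast([3.1, 4.0, 4.4], steps=3))
```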

Claims

1. A wind speed prediction method based on variational modal decomposition and attention mechanism is characterized by comprising the following steps:

step one: collecting historical meteorological data and constructing a data set of the original wind speed and related covariates; when data are missing or abnormal, the data are filled by a weighted linear interpolation method;

step two: data preprocessing, namely scaling the data in the data set to the same scale by mean-variance normalization; dividing the data set into a training set, a verification set and a test set in the ratio 8:1:1;

step three: variational modal decomposition; a sequence of a certain step length from the data set is taken as the input sequence u(t) of the deep neural network, and variational modal decomposition is applied to the input sequence u(t) to obtain K mode sequences u_k(t), k = 1, 2, ..., K;

wherein {u_k} = {u₁, …, u_K}, {ω_k} = {ω₁, …, ω_K};

And

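the new component û_k^{n+1}(ω) of each mode after decomposition is:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{u}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k\right)^2}$$

wherein n is the iteration number of the algorithm, and û(ω), û_i(ω) and λ̂ⁿ(ω) are the Fourier transforms of u(t), u_i(t) and λⁿ(t), respectively;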

2. The method of claim 1, wherein the weighted linear interpolation is as follows:

$$u(t_{miss}) = \frac{t_{next} - t_{miss}}{t_{next} - t_{prev}}\, u(t_{prev}) + \frac{t_{miss} - t_{prev}}{t_{next} - t_{prev}}\, u(t_{next})$$

wherein u(t_miss) is the missing or abnormal value to be filled at time t_miss of the sequence, u(t_prev) is the valid value at t_prev, the nearest time before the missing or abnormal value, and u(t_next) is the valid value at t_next, the nearest time after it.

3. The wind speed prediction method based on the variational modal decomposition and attention mechanism according to claim 1 or 2, wherein the position code is constructed by the following specific method: adding a special vector to each single-point data of an input sequence u (t), and injecting position information into the whole input sequence signal; obtaining the input of the next attention mechanism layer by constructing a matrix which is consistent with the dimension of the input sequence u (t) and adding the matrix with the input sequence u (t);

4. The wind speed prediction method based on the variational modal decomposition and attention mechanism according to claim 1 or 2, wherein in the attention mechanism layer, three matrices W_Q, W_K and W_V of the same dimension are initialized randomly, and the input sequence u(t) is linearly transformed by each of the three matrices; the results are denoted the query vector Q, the key vector K and the value vector V and are mapped to an output matrix, in which the weight assigned to each value is calculated by a softmax function of the compatibility function of the query vector Q with the corresponding key vector K; the output matrix is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein d_k is the column dimension of the matrix K.

5. The wind speed prediction method based on the variational modal decomposition and attention mechanism according to claim 1 or 2, wherein a multi-head attention mechanism layer is formed by setting h attention mechanism layers.