CN117874483A - Bidirectional trapezoidal attention prediction method for data compression perception reconstruction task - Google Patents

Bidirectional trapezoidal attention prediction method for data compression perception reconstruction task

Info

Publication number
CN117874483A
CN117874483A
Authority
CN
China
Prior art keywords
model
data
layer
attention
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311613466.4A
Other languages
Chinese (zh)
Inventor
王立辉 (Wang Lihui)
张仲禹 (Zhang Zhongyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202311613466.4A priority Critical patent/CN117874483A/en
Publication of CN117874483A publication Critical patent/CN117874483A/en
Pending legal-status Critical Current


Abstract

The invention provides a bidirectional trapezoidal attention prediction method for a data compressed sensing reconstruction task, comprising the steps of: (1) compressed sensing sampling of a photovoltaic terminal data set; (2) data set division and data preprocessing; (3) construction, training and optimization of the improved model; and (4) model evaluation. The improved bidirectional trapezoidal attention prediction model is applied to the compressed sensing reconstruction method; compared with the traditional basis pursuit (BP) and orthogonal matching pursuit (OMP) algorithms, the model achieves the best signal reconstruction performance. The reconstruction method avoids the high computational complexity of traditional schemes and improves the quality of data reconstruction.

Description

Bidirectional trapezoidal attention prediction method for data compression perception reconstruction task
Technical Field
The invention relates to a bidirectional trapezoidal attention prediction method for a data compression perception reconstruction task, belonging to the technical field of data science and information optimization.
Background
Currently, with the continuous expansion of the power grid, power systems are becoming more diversified and complex. To meet the requirements of power management and control, it is important to record massive power data effectively; these data serve data analysis, fault monitoring, wide-area measurement and other applications. However, the large volume of data generated places a great burden on computing speed and storage space, and may even hamper the intelligent development of the power grid.
The massive data generated by photovoltaic terminal equipment are time-series data with strong volatility and noise, which puts great pressure on the communication transmission and data storage of the smart grid. To compress power-system data effectively and reduce the demands on storage space and data transmission, an effective time-series compression method is studied, supported by related big-data compression techniques, for the large-scale steady-state data produced during power-system operation.
(1) Data compression technique
Data compression aims to improve the efficiency of data transmission, storage and processing by reducing the data volume, or by reorganizing the data according to certain rules to cut redundancy and storage space, on the premise that effective information transmission is guaranteed. Applications of data compression technology include the local collection, analysis and processing of massive data, and intelligent analysis and decision-making based on them. The technology is an important means of releasing the value of data resources, breaking through the bottleneck of large-scale data communication and transmission, and speeding up decisions in rapidly changing environments.
Research on and application of time-series data compression can relieve the pressure of data storage and communication transmission to some extent. For large-scale time series, common lossless compression algorithms include Huffman coding, the Burrows-Wheeler transform, and the LZ coding family. General time-series data are usually compressed losslessly without differential processing, byte by byte. To achieve higher compression rates in scenarios that can tolerate data loss, however, lossy compression is often employed and the original signal is reconstructed as faithfully as possible. Common lossy methods include predictive coding, JPEG, and wavelet compression.
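As a minimal illustration of byte-level lossless compression on repetitive time-series text, the sketch below uses Python's standard zlib (a DEFLATE codec, standing in for the Huffman/LZ family named above); the sample byte stream is a hypothetical stand-in for steady-state telemetry:

```python
import zlib

# Highly repetitive telemetry-like byte stream (hypothetical sample values).
raw = b"12.50,12.50,12.51,12.50,12.49," * 200
packed = zlib.compress(raw, level=9)

# Lossless round trip: decompression restores the data bit-for-bit.
assert zlib.decompress(packed) == raw
ratio = len(raw) / len(packed)
```

On such repetitive steady-state data the ratio is large; on noisy series it drops sharply, which is one motivation for the lossy and compressed-sensing approaches discussed in this document.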
(2) Compressed sensing technology
Conventional signal acquisition systems follow the Nyquist sampling theorem: a large amount of data is acquired first and part of the samples are then discarded. Such operation wastes processing and storage resources, and may affect the stability and real-time performance of the system when handling large volumes of data. In the compressed sensing method, the sampling of the target signal is itself a compression-coding process: compressed signal data are obtained directly, skipping the redundant intermediate data of the compression process. This is an important advantage for acquiring and storing large-scale time-series data. When needed, the compressed signal can be restored to the original signal by a reconstruction algorithm. A conventional signal sampling compression process and a signal sampling and reconstruction process based on compressed sensing theory are shown in fig. 1.
At present, research on compressed sensing theory focuses on the signal reconstruction algorithm, i.e. the process of recovering the original sparse signal from the observations. An efficient reconstruction algorithm can recover the original signal at a lower sampling rate and thus achieve a higher compression ratio. Conventional compressed sensing reconstruction algorithms mainly comprise convex optimization algorithms such as basis pursuit (BP) and greedy iterative algorithms such as Orthogonal Matching Pursuit (OMP). Basis pursuit finds an approximation of the signal by converting the non-convex problem into a convex one, while orthogonal matching pursuit iteratively finds the support set of the sparse vector and reconstructs the signal by least squares constrained to that support.
However, the conventional compressed sensing reconstruction algorithm has the problems of high computational complexity and long signal reconstruction time, which limits the application of the compressed sensing technology in the field of signal processing to a certain extent.
(3) Time series prediction model
In the long sequence time-series forecasting (LSTF) task, the Transformer model proposed by Google researchers in 2017 is the most common model. The basic unit of the Transformer model is the Attention neuron. Compared with Recurrent Neural Networks (RNN), the Attention training process can be computed in parallel, and its feature-extraction capability is superior to that of RNN. Compared with Convolutional Neural Networks (CNN), the advantage of Attention is that it sees the data globally, whereas a CNN can only reach global context by enlarging the receptive field layer by layer.
Long-sequence time-series forecasting (LSTF) requires the model to capture precise long-range dependencies between output and input. However, the Transformer model has several serious problems that limit its application to time-series prediction tasks: high time complexity, high memory usage, and inherent limitations of the encoder-decoder architecture.
Comparison with patent CN116541435A, a long-time-series data prediction method based on a bidirectional-sparsification Transformer
In patent CN116541435A, time-sequence information is attached to individual data by a timing-information coding module, helping the Transformer model capture the temporal association between different time-series data more effectively and improving precision. A self-attention mechanism based on bidirectional sparsification is proposed in the timing-information extraction module: the first u largest q_i and k_i in S_q and S_k are retained and the remaining parts of Q and K are zeroed out, completing the sparsification of Q and K and reducing the space-time overhead of the computation.
The present invention instead improves the bidirectional attention mechanism by computing the mean of each layer in the encoder and decoder and halving the input; after multi-layer processing, the dimensions of the encoder and decoder take on a trapezoidal shape that highlights the attention neurons carrying strong features.
In the multi-head trapezoidal self-attention mechanism designed by the invention, multiple QKV matrices are computed only for the parts of the data with salient features, extracting information from the different dimensions of the data. By computing the absolute difference between the maximum value and the mean of each Query matrix, the Query matrices with smaller absolute differences are replaced by the global mean. This ensures that the unretained Query and Key matrices do not participate in training optimization, the mean standing in for their computation, which reduces the amount of calculation.
Patent CN116541435A concentrates on improving the structure of the Transformer model to raise prediction accuracy on long time series. The present invention applies the improved model to the compressed sensing reconstruction task in place of traditional reconstruction algorithms, aiming to recover the original signal from a limited number of high-dimensional feature points and to further improve the data compression ratio.
In patent CN116541435A, the authors evaluate the model on the public data sets ETT, ETTh and ECL and optimize it with Mean Square Error (MSE) and Mean Absolute Error (MAE) as evaluation indices. The present patent uses the Critical Success Index (CSI), hit rate (POD) and False Alarm Rate (FAR) on a photovoltaic terminal status data set (PEDS) to evaluate and optimize the model. This evaluation method has high practical value.
Disclosure of Invention
Aiming at the multivariate time sequence data compressed sensing reconstruction, the invention provides a bidirectional trapezoidal attention prediction method for a data compressed sensing reconstruction task, so as to achieve the purposes of further improving the compression ratio, improving the accuracy of compressed sensing reconstruction and restoring an original signal by using limited high-dimensional characteristic data points.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the bidirectional trapezoidal attention prediction method for the data compression perception reconstruction task comprises the following specific steps:
step 1: constructing a state time sequence data set of the photovoltaic terminal equipment;
respectively taking the total output power per square meter, the three-phase voltage, the three-phase current, the three-phase output power per square meter, and the recording time of the photovoltaic terminal, and constructing the state time-series data set of the photovoltaic terminal equipment;
step 2: partitioning and sparse sampling of data sets:
dividing the original photovoltaic terminal equipment state data set into a training set and a validation set for training the improved bidirectional cyclic Transformer model; performing compressed sensing sparse sampling on the data set; and constructing test sets with a partial Hadamard matrix and a Gaussian random matrix, respectively, as the measurement matrix;
step 3: construction, training and optimization of the improved bidirectional cyclic Transformer model:
building the improved bidirectional cyclic Transformer model: an encoder-decoder architecture is built from Attention neurons, each of which incorporates a redesigned multi-head self-attention mechanism; bidirectional recurrent structures combining forward and reverse passes are built in the encoder and the decoder to extract the forward and reverse features of the data; after the model is built, the network hyperparameters are set, the network structure parameters are configured, and the model is optimized continuously until convergence;
step 4: model evaluation indexes;
the model is loaded and verified on a verification set, and the network model is comprehensively evaluated on a test set by using various model evaluation indexes.
As a further improvement of the invention, the construction of the state time sequence data set of the photovoltaic terminal equipment in the step 1 adopts compressed sensing sampling:
compressive sensing has three main steps:
(1) Sparse representation of the signal;
obtaining a sparse representation base of a signal by adopting discrete cosine transform and fast Fourier transform;
(2) Designing a measurement matrix;
multiplying the measurement matrix with the original signal to obtain the measurement values and achieve dimensionality reduction; designing an effective measurement matrix so that the measurements retain the important information of the original signal and improve reconstruction accuracy; a partial Hadamard matrix and a Gaussian random matrix are used as the measurement matrix;
(3) Designing a reconstruction algorithm;
an improved bidirectional trapezoidal attention prediction model is proposed based on the Transformer model and used as the reconstruction algorithm of compressed sensing.
As a further improvement of the present invention, the partitioning and sparse sampling of the data set in step 2 comprises the steps of:
the date index is rebuilt with the Pandas toolkit and the data are reordered by index. Noise elements in a time series can cause serious problems, so denoising is performed before any model is built; the process of minimizing noise is called denoising. A rolling-average method is used to remove noise from the time series: the rolling average is the mean of the preceding observation window, where a window is a run of consecutive values of the time-series data, and the mean is computed for each ordered window.
As a further improvement of the invention, the improved bidirectional cyclic Transformer model is constructed in step 3:
the overall structure of the network model comprises an Input Layer, an Embedding Layer, a Positional Encoding Layer, an Encoder, a Decoder, a Fully-connected Layer and an Output Layer; the encoder structure comprises an attention layer (Sparse Attention Layer), whose dimension decreases progressively, and a residual regularization layer (Residual Normalize Layer).
As a further improvement of the invention, the specific steps of the model evaluation index in the step 4 are as follows;
in order to comprehensively evaluate the model on time-series prediction, the prediction and reconstruction capability of the time-series prediction model is assessed with the root mean square error, the mean absolute error and the confusion matrix, and convergence is judged from the trend of the loss-function curve. The MSE and MAE loss functions are selected for predicting the target sequence; the loss propagates forward through the whole model from the encoder input and backward from the decoder output. The smaller the loss function, the better the model;
root mean square error, the mean of the sum of squares of the errors of the observed and predicted values, SSE/n. The method is a second moment of error, and comprises the variance of the estimated quantity and the deviation thereof, which are indexes for measuring the quality of the estimated quantity, and the formula is as follows:
the mean absolute value error is a common index of prediction error in time sequence analysis, and because MAE uses the same scale as measured data, it cannot be used for comparing sequences of two different scales, and MAE is also called L1 norm loss function, and is the mean value of absolute values of differences between real data and predicted data, where the formula is:
as a further improvement of the invention, the model evaluation index in the step 4 uses critical success index, hit rate and false alarm rate to evaluate the effect and precision of the time sequence prediction task;
the critical success index, with formula: $\mathrm{CSI}=\frac{a}{a+b+c}$
the hit rate, with formula: $\mathrm{POD}=\frac{a}{a+c}$
the false alarm rate, with formula: $\mathrm{FAR}=\frac{b}{a+b}$
wherein:
a is the number of hits, i.e. both the predicted value and the actual value exceed the discrimination threshold;
b is the number of false alarms, i.e. the predicted value exceeds the discrimination threshold while the actual value does not;
c is the number of misses, i.e. the predicted value is below the discrimination threshold while the actual value exceeds it;
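Given counts a, b and c as defined above, the three indices follow directly; a NumPy sketch (the threshold and data are illustrative, and it is assumed each denominator is nonzero):

```python
import numpy as np

def csi_pod_far(y_true, y_pred, threshold):
    """Compute CSI, POD and FAR from a discrimination threshold.

    Assumes at least one hit, one event and one alarm are present,
    so no denominator is zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    a = np.sum((y_pred > threshold) & (y_true > threshold))   # hits
    b = np.sum((y_pred > threshold) & (y_true <= threshold))  # false alarms
    c = np.sum((y_pred <= threshold) & (y_true > threshold))  # misses
    return a / (a + b + c), a / (a + c), b / (a + b)

csi, pod, far = csi_pod_far([1, 1, 0, 1, 0], [1, 0, 1, 1, 0], 0.5)
```

For the toy arrays above there are 2 hits, 1 false alarm and 1 miss, so CSI = 0.5, POD = 2/3 and FAR = 1/3.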
Among the three indexes, the closer the values of CSI and POD are to 1 and the closer the value of FAR is to 0, the better the model performs. The main improvements of the method are as follows:
(1) The conventional sequential prediction task forecasts a continuous sequence: the earlier part of a sequence is fed into a trained model, which outputs the following part. The sequence produced by compressed sensing sparse sampling, however, is not a complete continuous sequence. The improved bidirectional recurrent model performs one forward pass and one backward pass, jointly considering past and future information: the data features before and after each missing position in the signal are extracted and fused to predict the missing values.
(2) The attention mechanism used in traditional time-series prediction models computes a Query matrix, a Key matrix and a Value matrix for every input. This works well for feature extraction but greatly increases computation. It was found that computing QKV matrices for data segments with insignificant features does not improve the model's prediction accuracy. For the data points of compressed sensing sparse sampling, a multi-head trapezoidal self-attention mechanism is designed: multiple QKV matrices are computed only for the salient parts of the data (segments that differ markedly from the mean); in the multi-layer encoder and decoder, a residual normalization technique gives the data dimension a progressively decreasing trapezoidal shape; and information is extracted from the different dimensional features of the data, improving model accuracy while reducing the consumption of computing resources.
(3) The improved bidirectional trapezoidal attention model is applied to the compressed sensing reconstruction task to replace the traditional compressed sensing reconstruction algorithm, the verification is carried out on the existing data set to obtain an effect which is obviously superior to that of the existing method, the time complexity and the memory requirement are further reduced, and a new solution is provided for the compressed sensing reconstruction task.
Drawings
FIG. 1 is a flow chart comparing Nyquist sampling with compressed sensing;
FIG. 2 is a flow chart of the disclosed technique;
FIG. 3 is a diagram of a compressed sensing mathematical expression process;
FIG. 4 is a block diagram of an improved bi-directional trapezoidal attention prediction model;
FIG. 5 is a schematic diagram of a bi-directional recurrent neural network deployed over time;
fig. 6 is a data set evaluation index line graph.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and detailed description:
a conventional signal sampling compression process and a signal sampling and reconstruction process based on compressed sensing theory are shown in fig. 1.
The invention discloses a compressed sensing reconstruction method based on an improved bidirectional trapezoidal attention prediction model, wherein a flow chart of the technical scheme disclosed by the invention is shown in fig. 2, and the method comprises the following steps:
step 1: and respectively taking the total output power of the unit square meter, the three voltages of the unit square meter, the three-phase current of the unit square meter, the three output powers of the unit square meter and the recording time of the photovoltaic terminal, and constructing a state time sequence data set of the photovoltaic terminal equipment.
Step 2: partitioning and sparse sampling of data sets: and dividing the original photovoltaic terminal equipment state data set into a training set and a verification set, and using the training set and the verification set as model training. And performing compressed sensing sparse sampling on the data set, and constructing a test set by using the measurement matrix by using a partial Hadamard matrix and a Gaussian random matrix respectively.
Step 3: the construction, training and optimizing process of the model: an Encoder-Decoder (Encoder-Decoder) architecture is built using Attention neurons, each incorporating a redesigned multi-headed trapezoidal self-Attention mechanism. And a bidirectional circulating structure is respectively built in the encoder and the decoder, and a forward and reverse combined structure can extract the forward characteristic and the reverse characteristic of data. After the model is built, setting network super parameters and configuring network model structure parameters, and continuously optimizing the model through grading indexes to enable the model to achieve convergence.
Step 4: and (5) evaluating indexes by using a model. The model is loaded and verified on a verification set, and the network model is comprehensively evaluated on a test set by using various model evaluation indexes.
The compressed sensing sampling process of the photovoltaic terminal data set comprises the following steps:
When processing a signal with compressed sensing theory, the first step is to find a sparse representation of the signal. Given a one-dimensional random signal x of length N, if there exists a set of orthogonal bases $\Psi=(\psi_1,\psi_2,\psi_3,\dots,\psi_N)$ such that the following formula holds:
$x=\Psi\alpha$
in the formula, if alpha has K which is less than or equal to N atoms which are non-zero and have larger absolute values, then X is called as having sparsity on a transform domain, the sparsity is K, and psi is called as a sparse basis. If x is originally sparse, then ψ is an identity matrix.
After the sparse representation is obtained, the signal can be observed in reduced dimension with an observation matrix Φ of size M×N (M < N), giving an observation y of length M, namely:
y = Φx = ΦΨα = Aα
where A = ΦΨ is the sensing matrix. Compressed sensing reconstruction is essentially the recovery of the high-dimensional signal x from the low-dimensional observation y through a system of linear equations, or through new schemes such as a prediction model.
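The observation step y = Φx = ΦΨα can be sketched numerically; here the signal is taken to be sparse in the identity basis, so Ψ = I and α = x, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 64, 16, 3
# x is K-sparse in the identity basis, so Psi = I and alpha = x.
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # observation matrix
y = Phi @ x                                     # length-M observation
```

The 64-dimensional signal is captured by only 16 linear measurements; a reconstruction algorithm then recovers x from y when needed.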
The design of the observation matrix is the second step of compressed sensing. The observation matrix must satisfy the Restricted Isometry Property (RIP) so that the measurements contain all the important information of the original signal and effective reconstruction is possible. RIP theory states that, for a sparse signal, if there exists a constant δ ∈ (0, 1), the sensing matrix A satisfies
$(1-\delta)\|\theta\|_2^2\le\|A\theta\|_2^2\le(1+\delta)\|\theta\|_2^2$
then the observation matrix satisfies the restricted isometry property and preserves the key features of the original signal. In engineering practice, however, verifying the RIP with the formula above is computationally cumbersome, and it is difficult to design a suitable observation matrix directly from it. To simplify the design, an equivalent condition of the RIP can be used: the observation matrix should be incoherent with the sparse representation basis. The coherence is defined as follows:
$\mu(\Phi,\Psi)=\sqrt{N}\cdot\max_{1\le i,j\le N}\left|\langle\varphi_i,\psi_j\rangle\right|,\qquad \mu\in[1,\sqrt{N}]$
The smaller μ is, the less correlated Φ is with Ψ. The less correlated the observation matrix and the sparse representation basis, the more features the sparsely sampled signal retains, and the higher the probability of recovering the original signal.
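Under the coherence definition just given, a small numerical check can be made; the spike/DCT pairing below is a classical low-coherence example, and the construction is illustrative:

```python
import numpy as np

def coherence(Phi, Psi):
    """mu(Phi, Psi) = sqrt(N) * max |<phi_i, psi_j>| over unit-norm rows/columns."""
    N = Psi.shape[0]
    Phi_n = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    Psi_n = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    return float(np.sqrt(N) * np.max(np.abs(Phi_n @ Psi_n)))

# Spike (identity) measurement rows vs. an orthonormal DCT-II sparse basis.
N = 64
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / N)
C[0, :] /= np.sqrt(2.0)      # orthonormal DCT-II matrix (rows = atoms)
Phi = np.eye(N)[:8]          # 8 spike-basis measurement rows
mu = coherence(Phi, C.T)     # columns of C.T are the basis atoms
```

For this pair μ stays near its lower bound (about √2 here), far below the worst case √N = 8, which is why spike sampling works well for DCT-sparse signals.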
The mathematical expression of compressed sensing is illustrated in fig. 3, where x is the input signal, Φ the observation matrix, and y the observation vector.
The high-dimensional signal x is projected into a low-dimensional space by the observation matrix. x is sparsely represented on the sparse basis Ψ as x = Ψs, where Ψ is the sparse basis matrix and s the sparse coefficient vector.
The invention adopts a partial Hadamard matrix and a random Gaussian matrix as the observation matrix. The partial Hadamard matrix satisfies the RIP and is a common compressed sensing measurement matrix. Its construction is: first generate an N×N Hadamard matrix, then randomly select M row vectors from it to form an M×N measurement matrix. Because the Hadamard matrix is orthogonal, the M×N partial Hadamard matrix obtained by taking M of its rows retains strong incoherence and partial orthogonality; compared with other deterministic measurement matrices, it therefore needs fewer measurements for accurate reconstruction, that is, the partial Hadamard matrix reconstructs better at the same number of measurements. However, the dimension N of a Hadamard matrix must be a power of two, i.e. N = 2^k (k = 1, 2, 3, …), which greatly limits the matrix's range of application.
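The partial Hadamard construction above can be sketched with the Sylvester recursion (which is why N must be a power of two); sizes are illustrative:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(2)
N, M = 32, 8
H = hadamard(N)
rows = rng.choice(N, size=M, replace=False)
Phi = H[rows, :] / np.sqrt(M)   # partial Hadamard measurement matrix
G = Phi @ Phi.T                 # selected rows stay mutually orthogonal
```

Since the full Hadamard matrix satisfies H·Hᵀ = N·I, any M selected rows remain orthogonal to each other, which is the partial orthogonality the text refers to.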
Random Gaussian matrices also typically satisfy the RIP and are commonly used as measurement matrices. The design method is: construct an M×N matrix Φ whose elements independently follow a Gaussian distribution with mean 0 and variance 1/M, namely:
$\Phi_{i,j}\sim N\!\left(0,\frac{1}{M}\right)$
the measurement matrix has strong randomness, and can prove that when the measurement number M of the random Gaussian measurement matrix is larger than or equal to cKlog (N/k), the RIP condition can be met with great probability. The random gaussian measurement matrix is uncorrelated with most orthogonal basis or orthogonal dictionaries and requires a relatively small number of measurements to reconstruct accurately.
The method for data set division, data preprocessing and improved model building comprises the following steps:
common problems with time series data preprocessing are unordered time stamps, missing time stamps, outliers and noise in the data. Regarding the related issues of time stamping, the date index is rebuilt by the Pandas toolkit, reordered by index. Noise elements in a time series may cause serious problems, and a rolling average method is used to remove noise in a time series. The rolling average is the average of the previous observation window, where a window is a series of values of time series data, each ordered window being averaged. This can greatly help reduce noise in the time series data.
The network model of the improved bidirectional cyclic Transformer is schematically shown in fig. 4. The overall structure includes an Input Layer, an Embedding Layer, a Positional Encoding Layer, an Encoder, a Decoder, a Fully-connected Layer, and an Output Layer. The encoder structure contains an attention layer (Sparse Attention Layer) of progressively decreasing dimension and a residual regularization layer (Residual Normalize Layer). The decoder architecture is approximately the same as the encoder's; studies show that the Encoder-Decoder structure extracts data features well.
The photovoltaic terminal state data set is a multivariate time series with several state variables per time stamp, so the input and output layers must be redesigned: the input layer of the model is a 1×10 vector, followed by a 10×128 Embedding layer whose main role is to raise the dimension of the input data so that more features can be extracted. Studies show that the Embedding layer can scale up certain data features and separate some generic features into more detailed feature vectors.
For time-series data, the positional encoding layer (Positional Encoding Layer) is an indispensable network layer. It provides the position of each data point in the whole sequence, extracts the temporal features of the data, and from the current position computes its own positional value together with those of the preceding and following positions.
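One common concrete realization of such a position-coding layer is the sinusoidal scheme of the original Transformer paper; the patent does not state which encoding it uses, so this NumPy sketch is illustrative:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position codes: sin on even channels, cos on odd ones."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = positional_encoding(50, 128)
```

Because each position maps to a unique phase pattern, nearby positions receive similar codes and relative offsets are recoverable by linear operations, which is what lets the model locate a data point in the sequence.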
In a conventional Attention neuron, a Query matrix, a Key matrix and a Value matrix are computed for each input; extracting multiple sequence features through the trainable parameters of the QKV matrices is the key of the attention mechanism. It was found that, although computing the QKV matrices extracts a large number of features, most of their values stay close to the mean: only part of the Query matrices extract many features and contribute positively to model optimization, while Query matrices that stay near the mean contribute little yet consume large amounts of computation and memory. Therefore, the trapezoidal attention layer (Sparse Attention Layer) of the invention improves the conventional Attention neuron by computing the mean of each layer's Query matrix. The expression for the mean is:
$\bar{Q}=\frac{1}{n}\sum_{i=1}^{n}Q_i$
By calculating the absolute value of the difference between the maximum value and the mean of each Query matrix, the Query matrices with larger absolute values are retained, while those with smaller absolute values are replaced by the calculated mean; neither these Query matrices nor their corresponding Key matrices participate in the training optimization of the model. The expression for calculating the difference is as follows:
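The query-pruning rule described above can be sketched as follows. This is one plausible reading of the trapezoidal attention layer, not the patent's exact code; the `keep_ratio` parameter and the per-row score `|max − mean|` are assumptions made for illustration.

```python
import numpy as np

def prune_queries(Q: np.ndarray, keep_ratio: float = 0.5) -> tuple[np.ndarray, np.ndarray]:
    """Keep queries whose |max - mean| score is large; replace the rest by the mean query.

    Q: (n_queries, d) query matrix. Returns the pruned matrix and a mask of kept rows.
    """
    score = np.abs(Q.max(axis=1) - Q.mean(axis=1))    # per-query informativeness score
    k = max(1, int(keep_ratio * Q.shape[0]))
    kept = np.zeros(Q.shape[0], dtype=bool)
    kept[np.argsort(score)[-k:]] = True               # keep the top-k scoring queries
    Q_pruned = Q.copy()
    Q_pruned[~kept] = Q.mean(axis=0)                  # inactive queries collapse to the mean
    return Q_pruned, kept

rng = np.random.default_rng(1)
Q = rng.standard_normal((8, 16))
Qp, kept = prune_queries(Q, keep_ratio=0.5)
print(kept.sum())  # 4
```

Rows replaced by the mean (and their Key counterparts) would be frozen out of gradient updates, which is what saves computation and memory.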
The residual regularization layer Residual Normalize Layer is used to prevent the degradation problem in deep neural network training, where degradation means that as the number of network layers increases, the loss function of the deep neural network first decreases gradually, then plateaus at saturation, and finally increases again when more layers are added. By introducing residual blocks, the weighted residual value obtained in the previous computation is added to the next layer, so that the trained parameter values do not become too small. Normalizing the input data before training the neural network speeds up training and improves its stability.
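A minimal sketch of such an Add & Norm block, assuming the common layer-normalization form (the patent does not specify the exact normalization):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each feature row to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_normalize(x: np.ndarray, sublayer_out: np.ndarray) -> np.ndarray:
    """Add & Norm: the residual skip keeps useful signal flowing through deep stacks."""
    return layer_norm(x + sublayer_out)

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 128))
out = residual_normalize(x, rng.standard_normal((4, 128)))
print(out.shape)  # (4, 128)
```

The skip connection `x + sublayer_out` is what counteracts the degradation behavior described above.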
The input to the decoder is the sparse signal obtained by compressed-sensing sparse sampling, in which only a small number of data points carry the significant features of the data. The decoder is similar in structure to the encoder, including a trapezoidal attention layer and a residual regularization layer. As the number of layers of the network model increases, the attention layers in the decoder retain fewer Query matrices, giving the model its trapezoidal structure. The output of the decoder undergoes one linear transformation through a fully-connected layer Fully-connected Layer to obtain the complete prediction output.
The attention layer Sparse Attention Layer of the present design uses bidirectional recurrent attention neurons, and the two attention neurons are not completely independent. An ordinary neural network predicts the output at the next moment only from the sequence information of previous moments, whereas a bidirectional recurrent neural network considers not only the preceding states but also the subsequent state information. The bidirectional attention network therefore adds a backward (reverse-time) pass on top of the forward pass, and both passes are connected to one output layer. The calculation process and structure are shown in fig. 5.
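The forward-plus-backward pass can be sketched generically as below; the recurrent `step` function here is an arbitrary placeholder standing in for the patent's attention neuron, and concatenating the two directions is one common way to merge them into a single output layer.

```python
import numpy as np

def bidirectional_pass(x, step):
    """Run a recurrent step forward and backward over time and concatenate both passes."""
    fwd, bwd = [], []
    h = np.zeros_like(x[0])
    for t in range(len(x)):                # forward direction: past -> future
        h = step(h, x[t]); fwd.append(h)
    h = np.zeros_like(x[0])
    for t in reversed(range(len(x))):      # backward direction: future -> past
        h = step(h, x[t]); bwd.append(h)
    bwd.reverse()                          # realign backward states with time order
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=-1)

step = lambda h, xt: np.tanh(0.5 * h + xt)   # placeholder recurrent cell
x = np.random.default_rng(3).standard_normal((6, 8))
out = bidirectional_pass(x, step)
print(out.shape)  # (6, 16)
```

Each output timestep thus sees both the preceding and the subsequent state information, as the text describes.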
The model provided by the invention is optimized with the Adam optimizer. The learning rate starts from 1e-4 and is decayed by a factor of 0.5 at fixed intervals. A total of 8 epochs were trained, with appropriate early stopping to prevent gradient explosion or gradient vanishing problems. The batch size is set to 32.
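The step-decay schedule can be written down directly; the decay interval of 2 epochs below is an assumption, since the patent says only "at intervals".

```python
def lr_at_epoch(epoch: int, base_lr: float = 1e-4,
                decay: float = 0.5, interval: int = 2) -> float:
    """Step-decayed learning rate: halved every `interval` epochs (interval assumed)."""
    return base_lr * decay ** (epoch // interval)

# Learning rate over the 8 training epochs described above
schedule = [lr_at_epoch(e) for e in range(8)]
print(schedule)
```

With this assumed interval, the rate falls from 1e-4 at epoch 0 to 1.25e-5 at epoch 7.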
Model evaluation indexes;
In order to comprehensively evaluate the model on time series prediction, the predictive reconstruction capability of the time series prediction model is judged using the mean squared error, the mean absolute error and the confusion matrix, and the trend of the loss function curve is used to judge whether the network has converged.
Mean squared error (Mean Squared Error, MSE) is the mean of the sum of squared errors between the observed values and the predicted values, SSE/n. It is the second moment of the error, incorporating both the variance and the bias of the estimator, and is an index for measuring the quality of the estimator. Its formula is:
Mean absolute error (Mean Absolute Error, MAE) is a common indicator of prediction error in time series analysis; because MAE uses the same scale as the measured data, it cannot be used to compare sequences of two different scales. MAE, also known as the L1 norm loss function, is the mean of the absolute value of the difference between the real data and the predicted data. The formula is:
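Both metrics follow directly from their definitions above; a minimal sketch with an illustrative toy example:

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: sum of squared errors divided by n (SSE/n)."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error (L1 loss): mean of |y_true - y_pred|."""
    return float(np.mean(np.abs(y_true - y_pred)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(mse(y_true, y_pred), mae(y_true, y_pred))
```

Since MAE keeps the units of the data, it is interpretable on one series but, as noted, not comparable across series of different scales.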
On the time series prediction task, the invention uses the critical success index, hit rate and false alarm rate to evaluate the effect and accuracy of the prediction task.
Critical success index (Critical Success Index, CSI), the formula is:
Hit rate (Probability of Detection, POD), the formula is:
false Alarm Rate (FAR), the formula is:
wherein:
a is the number of hits, i.e. both the predicted value and the actual value are greater than the discrimination threshold.
b is the number of false alarms, i.e. the predicted value is greater than the discrimination threshold while the actual value is less than it.
c is the number of misses, i.e. the predicted value is less than the discrimination threshold while the actual value is greater than it.
Among the three indexes, the closer the values of CSI and POD are to 1 and the closer the value of FAR is to 0, the better the model effect.
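The patent's formula images are not reproduced here, but the standard definitions of these three indexes in terms of the counts a, b, c above can be sketched as follows (the example counts are illustrative only):

```python
def csi(a: int, b: int, c: int) -> float:
    """Critical Success Index: hits / (hits + false alarms + misses)."""
    return a / (a + b + c)

def pod(a: int, c: int) -> float:
    """Probability of Detection (hit rate): hits / (hits + misses)."""
    return a / (a + c)

def far(a: int, b: int) -> float:
    """False Alarm Rate: false alarms / (hits + false alarms)."""
    return b / (a + b)

a, b, c = 90, 5, 5   # hits, false alarms, misses (example counts)
print(round(csi(a, b, c), 3), round(pod(a, c), 3), round(far(a, b), 3))
```

With these counts the indexes come out near their ideal values (CSI and POD near 1, FAR near 0), matching the interpretation given above.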
The beneficial effects are that: the time series prediction data set constructed from the output signals verifies the accuracy and effectiveness of the time series prediction model. An improved bidirectional trapezoidal attention prediction model is built on the constructed photovoltaic terminal state data set. Trained on the photovoltaic terminal data set, the overall prediction hit rate reaches 97.9%, the mean squared error reaches 0.577, and the mean absolute error reaches 0.549. The method completes time series compressed sensing reconstruction and improves the accuracy of data reconstruction.
Table 1 and fig. 6 show that the model was optimized with the Adam optimizer on the photovoltaic end-device status (Photovoltaic End Device Status, PEDS) dataset, with the learning rate set to 1e-4 and decayed by a factor of 0.5 at fixed intervals, training 8 epochs in total with the batch size set to 32, from which the mean squared error and mean absolute error were obtained.
Table 1 five dataset predictions
The above description is only of the preferred embodiment of the present invention, and is not intended to limit the present invention in any other way, but is intended to cover any modifications or equivalent variations according to the technical spirit of the present invention, which fall within the scope of the present invention as defined by the appended claims.

Claims (6)

1. The bidirectional trapezoidal attention prediction method for the data compression perception reconstruction task is characterized by comprising the following specific steps of:
step 1: constructing a state time sequence data set of the photovoltaic terminal equipment;
taking respectively the total output power per unit square meter, the three voltages per unit square meter, the three-phase current per unit square meter, the three output powers per unit square meter and the recording time of the photovoltaic terminal, and constructing a state time sequence data set of the photovoltaic terminal equipment;
step 2: partitioning and sparse sampling of data sets:
dividing the original photovoltaic terminal equipment state data set into a training set and a verification set, which serve to train the improved bidirectional recurrent Transformer model; performing compressed sensing sparse sampling on the data set, and constructing test sets whose measurement matrices are respectively a partial Hadamard matrix and a Gaussian random matrix;
step 3: the construction, training and optimization process of the improved bidirectional recurrent Transformer model:
building an improved bidirectional recurrent Transformer model: an encoder-decoder architecture is built with Attention neurons, a redesigned multi-head self-Attention mechanism is added to each Attention neuron, and bidirectional recurrent structures combining forward and reverse directions are built in the encoder and the decoder respectively to extract the forward and reverse features of the data; after the model is built, network hyperparameters are set and network model structure parameters are configured, and the model is continuously optimized until convergence;
step 4: model evaluation indexes;
the model is loaded and verified on a verification set, and the network model is comprehensively evaluated on a test set by using various model evaluation indexes.
2. The bi-directional trapezoidal attention prediction method for a data compressed aware reconstruction task of claim 1, wherein: in the step 1, a state time sequence data set of the photovoltaic terminal equipment is constructed by adopting compressed sensing sampling:
compressive sensing has three main steps:
(1) Sparse representation of the signal;
obtaining a sparse representation base of a signal by adopting discrete cosine transform and fast Fourier transform;
(2) Designing a measurement matrix;
multiplying the measurement matrix with the original signal to obtain a measurement value, realizing data dimension reduction, designing an effective measurement matrix, enabling the measurement value to keep important information in the original signal, improving reconstruction accuracy, and using a part of Hadamard matrix and Gaussian random matrix as the measurement matrix;
(3) Designing a reconstruction algorithm;
an improved bidirectional trapezoidal attention prediction model is proposed based on the Transformer model and used as the reconstruction algorithm for compressed sensing.
3. The bi-directional trapezoidal attention prediction method for a data compressed aware reconstruction task of claim 1, wherein: the partitioning and sparse sampling of the data set in step 2 includes the steps of:
reconstructing the date index with the Pandas toolkit and reordering by index; noise elements in the time series may cause serious problems, so denoising is performed before any model is built; this process of minimizing noise is called denoising, and a rolling average method is used to remove noise from the time series, wherein the rolling average is the average of a previous observation window, the window being a series of values of the time series data, and the average is calculated over each ordered window.
4. The bi-directional trapezoidal attention prediction method for a data compressed aware reconstruction task of claim 1, wherein: the improved bidirectional recurrent Transformer model in step 3 is built as follows:
the overall structure of the network model of the improved bidirectional recurrent Transformer model comprises an Input Layer, an embedding layer Embedding Layer, a position coding layer Positional Encoding Layer, an Encoder, a Decoder, a fully-connected layer Fully-connected Layer and an Output Layer, wherein the structure of the Encoder comprises an attention layer Sparse Attention Layer, whose dimension is gradually reduced, and a residual regularization layer Residual Normalize Layer.
5. The bi-directional trapezoidal attention prediction method for a data compressed aware reconstruction task of claim 1, wherein: the specific steps of the model evaluation index in the step 4 are as follows;
in order to comprehensively evaluate the model on time series prediction, the predictive reconstruction capability of the time series prediction model is evaluated using the mean squared error, the mean absolute error and the confusion matrix, and whether the network has converged is judged from the trend of the loss function curve; the loss function selects the MSE loss function and the MAE loss function for predicting the target sequence, the loss propagating forward through the whole model from the input of the encoder and backward from the output of the decoder; the smaller the loss function, the better the model effect;
the mean squared error is the mean of the sum of squared errors between the observed and predicted values, SSE/n; it is the second moment of the error, incorporating the variance of the estimator and its bias, and is an index for measuring the quality of the estimator, with the formula:
the mean absolute error is a common index of prediction error in time series analysis; because MAE uses the same scale as the measured data, it cannot be used to compare sequences of two different scales; MAE, also called the L1 norm loss function, is the mean of the absolute values of the differences between the real data and the predicted data, with the formula:
6. The bi-directional trapezoidal attention prediction method for a data compressed aware reconstruction task of claim 4, wherein: the model evaluation indexes of step 4 evaluate the effect and accuracy of the time series prediction task using the critical success index, the hit rate and the false alarm rate;
the critical success index, the formula:
hit rate, the formula is:
the false alarm rate is expressed as:
wherein:
a is the number of hits, namely both the predicted value and the actual value are greater than the discrimination threshold;
b is the number of false alarms, namely the predicted value is greater than the discrimination threshold and the actual value is less than it;
c is the number of misses, namely the predicted value is less than the discrimination threshold and the actual value is greater than it;
among the three indexes, the closer the values of CSI and POD are to 1 and the closer the value of FAR is to 0, the better the model effect.
CN202311613466.4A 2023-11-29 2023-11-29 Bidirectional trapezoidal attention prediction method for data compression perception reconstruction task Pending CN117874483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311613466.4A CN117874483A (en) 2023-11-29 2023-11-29 Bidirectional trapezoidal attention prediction method for data compression perception reconstruction task


Publications (1)

Publication Number Publication Date
CN117874483A true CN117874483A (en) 2024-04-12

Family

ID=90593765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311613466.4A Pending CN117874483A (en) 2023-11-29 2023-11-29 Bidirectional trapezoidal attention prediction method for data compression perception reconstruction task

Country Status (1)

Country Link
CN (1) CN117874483A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination