Disclosure of Invention
In order to solve the problem that long-term information is easily lost when a neural network processes long-sequence information, the invention provides a Long- and Short-term Time-series Network (LSTNet) model. Thermal load is a typical time-series problem, and the hourly thermal load is periodic. The long- and short-term time-series network model provided by the invention exploits this characteristic and introduces the idea of recurrent skipping, thereby effectively solving the problem of information loss. First, the model uses a Random Forest (RF) algorithm to screen the features and reduce their dimensionality; then a thermal load prediction model based on the long- and short-term time-series network is established: short- and long-term feature information is captured by the convolutional layer and the recurrent layer, the concept of a recurrent-skip layer is introduced to capture even longer-term feature information, and an autoregressive component adds linear processing capability to the model, enhancing its robustness.
The invention adopts the following technical scheme and implementation steps:
S1, selecting meteorological data and heating data over a certain time period and constructing a data set as the input variables $X_n$;
S2, preprocessing the data, including identifying and correcting missing values and outliers, and standardizing the data;
S3, screening the input variables with the RF method and performing dimensionality reduction on the data set to obtain $X_m$; the data set is then divided into a training set and a test set at a ratio of 8:2;
S4, inputting the training set into the LSTNet model item by item and training the weights and biases of the model:
S401, first capturing short-term local feature information with the convolutional layer;
S402, utilizing the recurrent layer to capture long-term macroscopic information, with output $h_t^R$; meanwhile, the recurrent-skip layer exploits the periodic characteristics of the sequence to capture longer-term information, with output $h_t^S$;
S403, connecting the outputs of the recurrent layer and the recurrent-skip layer through a fully connected layer to obtain the output $y_t^D$;
S404, then adding a linear component for prediction by combining the output of the AR process; this also allows the model to capture scale changes of the input and enhances its robustness, giving the output $y_t^A$;
S405, the output module integrating the output of the neural network part and the output of the AR model to obtain the final prediction model;
S5, inputting the test set into the trained LSTNet model one by one to obtain the predicted values.
Advantageous effects
Compared with the prior art, the method fully considers the periodic characteristic of the ultra-short-term thermal load and compensates for the information loss that conventional neural networks suffer from vanishing gradients by introducing the concept of the recurrent-skip layer. Unlike traditional neural network algorithms, the method fully exploits the periodic characteristic of the hourly heat load; this characteristic is more representative, so the ultra-short-term heat load prediction task can be completed better.
Detailed Description
The technical features and advantages of the present invention will become more apparent from the following detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
S1, selecting as many related characteristic variables as possible, which may include meteorological data, operating-condition data, heat load data and the like, to construct a heat load data set $X_n = \{x_1, x_2, \dots, x_n\}$, where n is the number of characteristic variables;
S2, after the data set is constructed, preprocessing the data:
S201, compensating for missing values, i.e. data points whose value is 0 or null, using the following formula:
$x_i = 0.4x_{i-1} + 0.4x_{i+1} + 0.2x_{i+2}$ (1)

where $x_i$ is the current missing value and $x_{i-1}$, $x_{i+1}$ and $x_{i+2}$ are the values at the previous moment, the next moment and the moment after next, respectively;
S202, treating outliers, i.e. values exceeding three or more times the predetermined range, as missing values, as in the sketch below;
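A minimal Python sketch of S201–S202, assuming a one-dimensional series and a hypothetical `limit` parameter standing in for the predetermined range; consecutive gaps would need extra handling:

```python
import numpy as np

def fill_missing(x, i):
    # Formula (1): weighted blend of the previous, next and next-next values.
    return 0.4 * x[i - 1] + 0.4 * x[i + 1] + 0.2 * x[i + 2]

def compensate(series, limit):
    # Zeros, NaNs and values beyond 3x the predetermined range are all
    # treated as missing (S202) and then compensated per S201.
    x = np.asarray(series, dtype=float).copy()
    bad = np.isnan(x) | (x == 0) | (np.abs(x) > 3 * limit)
    for i in np.where(bad)[0]:
        if 1 <= i < len(x) - 2:  # formula (1) needs three valid neighbours
            x[i] = fill_missing(x, i)
    return x
```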
S203, standardizing each dimension of the input variables with the following formula:

$y_i = \frac{x_i - \bar{x}}{s}$ (2)

where $y_i$ is the normalized value, $x_i$ is the original value, and $\bar{x}$ and $s$ are the mean and standard deviation of the raw data, respectively. The normalized data have a mean of 0, a variance of 1 and no dimension.
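The standardization of S203, sketched in NumPy; the per-feature statistics are returned so that predictions can later be mapped back to physical units:

```python
import numpy as np

def standardize(X):
    # Formula (2): column-wise z-score, zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std
```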
S3, screening the characteristic variables and reducing their dimensionality with the RF algorithm. The idea of evaluating feature importance with a random forest is simple: determine how much each feature contributes to each tree in the forest, average these contributions, and compare the contributions of different features. The importance of a feature x is denoted IMP and is calculated as follows:
S301, for each decision tree in the random forest, calculating its error on the corresponding out-of-bag (OOB) data, denoted errOOB1: the O out-of-bag samples are fed into the tree as input, the classifier's results are compared with the true labels, and the number of misclassified samples is counted as X, giving

$\mathrm{errOOB1} = X / O$ (3)
S302, randomly adding noise interference to feature x in all out-of-bag samples and computing the out-of-bag error again, denoted errOOB2;
S303, assuming there are N trees in the random forest, the importance IMP of feature x is given by formula (4):

$\mathrm{IMP} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{errOOB2}_i - \mathrm{errOOB1}_i\right)$ (4)

If the out-of-bag accuracy drops sharply after noise is randomly added to a feature, that feature has a large influence on the classification result of the samples, i.e. its importance is high.
The invention sorts the characteristic variables by importance in descending order using the random forest, determines a deletion ratio, and removes that proportion of unimportant indicators from the current characteristic variables, obtaining a new feature set $X_m = \{x_1, x_2, \dots, x_m\}$ with m < n; the deletion ratio is determined by the number of characteristic variables in the original data set. After dimensionality reduction, the data set is divided into a training set and a test set at a ratio of 8:2, as sketched below.
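A sketch of S3 in scikit-learn. Since the library does not expose per-tree OOB samples directly, `permutation_importance` is used here as a stand-in for the OOB computation of formulas (3)–(4); the placeholder data, `drop_ratio` and sizes are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the preprocessed heat load data set.
rng = np.random.default_rng(0)
X, y = rng.random((500, 12)), rng.random(500)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Permuting one feature at a time and measuring the score drop mirrors
# the noise-injection idea of S301-S303 (here over the full sample).
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
order = np.argsort(imp.importances_mean)[::-1]   # descending importance

drop_ratio = 0.25                                # assumed deletion ratio
keep = order[: int(len(order) * (1 - drop_ratio))]
X_m = X[:, keep]                                 # reduced feature set X_m

# 8:2 split; shuffle=False preserves the time order of the series.
X_train, X_test, y_train, y_test = train_test_split(
    X_m, y, test_size=0.2, shuffle=False)
```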
S4, inputting the training set data into the LSTNet model item by item in time order and training the weights and biases of the model; the overall structure of the LSTNet model is shown in FIG. 1:
S401, the first module of the network is the convolutional layer, whose function is to extract features and capture local short-term feature information. The convolutional layer module consists of a number of filters of width ω and height m, where m equals the number of features. The output of the i-th filter is:
$h_i = \mathrm{ReLU}(W_i * X + b_i)$ (5)

where the output $h_i$ is a vector, ReLU is the activation function with $\mathrm{ReLU}(x) = \max(0, x)$, $*$ denotes the convolution operation, and $W_i$ and $b_i$ are the weight matrix and bias, respectively.
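A PyTorch sketch of the convolutional module with assumed sizes; each filter spans all m features and slides along the time axis:

```python
import torch
import torch.nn as nn

m, T, n_filters, omega = 12, 168, 32, 6   # assumed features, window, filters, width
conv = nn.Conv2d(1, n_filters, kernel_size=(omega, m))

x = torch.randn(8, 1, T, m)               # (batch, channel, time, features)
h = torch.relu(conv(x)).squeeze(3)        # formula (5): (batch, filters, T-omega+1)
```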
S402, the output of the convolutional layer module is fed simultaneously into the second module: the recurrent layer and the recurrent-skip layer. The recurrent layer uses the Gated Recurrent Unit (GRU), with ReLU as the activation function of the hidden-state update. The hidden-state output $h_t^R$ of the unit at time t is:

$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$

$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$

$c_t = \mathrm{ReLU}(W_c x_t + U_c (r_t \odot h_{t-1}) + b_c)$

$h_t^R = (1 - z_t) \odot h_{t-1} + z_t \odot c_t$ (6)

where $z_t$ and $r_t$ are the outputs of the update gate and the reset gate of the GRU neuron, respectively, $c_t$ is the intermediate-state output, σ is the sigmoid activation function, $x_t$ is the input of this layer at time t, $\odot$ denotes the element-wise product, and W, U and b are the weight matrices and biases of the respective gate units. The output of this layer is the hidden state at each time step.
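A sketch of the recurrent layer in PyTorch. Note that `nn.GRU` uses tanh in its internal update, so this approximates the ReLU-updated GRU of formula (6); a faithful version would require a custom cell:

```python
import torch
import torch.nn as nn

n_filters, hidden = 32, 64
gru = nn.GRU(input_size=n_filters, hidden_size=hidden, batch_first=True)

c = torch.randn(8, 163, n_filters)   # convolutional output, time along dim 1
out, _ = gru(c)                      # hidden state at every time step
h_R = out[:, -1, :]                  # h_t^R: (batch, hidden)
```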
The GRU network can capture long-term historical information, but because of vanishing gradients it cannot retain all earlier information, so correlations with longer-term information are lost. The LSTNet model solves this problem with a skipping idea: since the data are periodic, information from much earlier times is reached through the period hyperparameter p. When predicting time t, the data from one period earlier, two periods earlier and even earlier periods carry useful information. Because one period is long, such dependencies are difficult for an ordinary recurrent unit to capture; introducing a recurrent network structure with skip connections extends the time span of the information flow and obtains longer-term data information. Its output $h_t^S$ at time t is:

$h_t^S = \mathrm{GRU}(x_t, h_{t-p}^S)$ (7)

i.e. the same update as formula (6) with the hidden state $h_{t-1}$ replaced by the hidden state $h_{t-p}$ from one period earlier.
the input to this layer is the same as the recycle layer and is the output of the convolutional layer. Where p is the number of skipped hidden units, i.e. the period. The general period is easily determined, and according to engineering experience or data trend, if the data is non-periodic or the periodicity is dynamically changed, attention mechanism method can be cited to dynamically update the period p.
S403, connecting the two layers: the model combines the outputs of the two layers through a fully connected layer. The output of this layer at time t is:

$y_t^D = W^R h_t^R + \sum_{i=0}^{p-1} W_i^S h_{t-i}^S + b$ (8)

where $W^R$ and $W^S$ are the weights assigned to the recurrent layer and the recurrent-skip layer, respectively, and b is the bias.
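The combination of formula (8) reduces to a single linear layer over the concatenated states; a sketch using the shapes from the previous snippets:

```python
import torch
import torch.nn as nn

hidden, p = 64, 24
combine = nn.Linear(hidden + p * hidden, 1)   # W^R, W^S and b in one layer

h_R = torch.randn(8, hidden)                  # recurrent-layer state
h_S = torch.randn(8, p * hidden)              # recurrent-skip states
y_D = combine(torch.cat([h_R, h_S], dim=1))   # formula (8): (batch, 1)
```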
S404, in real data sets the input scale changes aperiodically, and because the neural network is insensitive to scale changes of the inputs and outputs, this significantly reduces the prediction accuracy of the neural model. To remedy this deficiency, a linear part is added to the model: a classical autoregressive (AR) model is adopted to enhance robustness. The output $y_t^A$ of the AR model at time t is:

$y_t^A = \sum_{k=0}^{q^A-1} W_k^A y_{t-k} + b^A$ (9)

where $q^A$ is the size of the input window over the input matrix.
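Likewise, the AR component of formula (9) is a linear map over the last $q^A$ values of the target series; $q^A$ here is an assumed window size:

```python
import torch
import torch.nn as nn

q_A = 24                          # assumed AR window size
ar = nn.Linear(q_A, 1)            # weights W^A and bias b^A

y_hist = torch.randn(8, q_A)      # most recent q_A load values
y_A = ar(y_hist)                  # formula (9): (batch, 1)
```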
S405, the output module integrates the output of the neural network part and the output of the AR model; the final output of the LSTNet model is:

$\hat{y}_t = y_t^D + y_t^A$ (10)

where $\hat{y}_t$ is the final predicted value of the model at time t.
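Putting the modules together, a compact sketch of the whole forward pass following S401–S405. The hyperparameter values are illustrative, the input window is assumed to cover at least one full period, and the GRU cells use PyTorch's default tanh update rather than ReLU:

```python
import torch
import torch.nn as nn

class LSTNet(nn.Module):
    """Sketch: conv -> GRU / skip-GRU -> linear combination + AR shortcut."""
    def __init__(self, m, omega=6, n_filters=32, hidden=64, p=24, q_A=24):
        super().__init__()
        self.p = p
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(omega, m))
        self.gru = nn.GRU(n_filters, hidden, batch_first=True)
        self.skip_gru = nn.GRU(n_filters, hidden, batch_first=True)
        self.combine = nn.Linear(hidden + p * hidden, 1)
        self.ar = nn.Linear(q_A, 1)

    def forward(self, x, y_hist):
        # x: (batch, T, m) input window; y_hist: (batch, q_A) recent loads.
        b = x.size(0)
        c = torch.relu(self.conv(x.unsqueeze(1))).squeeze(3)  # (b, F, T')
        c = c.permute(0, 2, 1)                                # (b, T', F)
        h_R = self.gru(c)[0][:, -1, :]                        # recurrent state
        k = c.size(1) // self.p                               # whole periods
        s = c[:, -k * self.p:, :].reshape(b, k, self.p, -1)
        s = s.permute(0, 2, 1, 3).reshape(b * self.p, k, -1)
        h_S = self.skip_gru(s)[1].view(b, -1)                 # skip states
        y_D = self.combine(torch.cat([h_R, h_S], dim=1))      # formula (8)
        return y_D + self.ar(y_hist)                          # formula (10)
```

For example, `LSTNet(m=12)(torch.randn(8, 168, 12), torch.randn(8, 24))` returns an (8, 1) tensor of predictions.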
S406, during model training the mean square error (MSE) is used as the loss function:

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$ (11)

where n is the number of valid data points, and $\hat{y}_i$ and $y_i$ are the predicted and actual test values, respectively.
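A minimal training-loop sketch with the MSE loss of formula (11), assuming the `LSTNet` sketch above and a hypothetical `train_loader` yielding (x, y_hist, y) batches:

```python
import torch

model = LSTNet(m=12)                      # LSTNet sketch from above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()              # formula (11)

for epoch in range(50):
    for x, y_hist, y in train_loader:     # hypothetical DataLoader
        opt.zero_grad()
        loss = loss_fn(model(x, y_hist), y)
        loss.backward()
        opt.step()
```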
S5, inputting the test set into the trained LSTNet model one by one to obtain the predicted values.
To verify the effectiveness of the method, normal data from one heating season are used. The data were obtained by simulating a 120-day heating process of a heat-exchange station in a residential district of Zhengzhou, Henan, with the EnergyPlus software; the data are plotted in FIG. 2. Comparison experiments were performed with the AR, Autoregressive Integrated Moving Average (ARIMA), MLR, SVR and GRU methods; the experimental results are shown in FIG. 3, and the evaluation index results of each model are listed in Table 1.
TABLE 1 Comparison of evaluation indexes of the thermal load prediction models

Model  | RMSE (×10³) | MAE (×10³) | R-Squared
-------|-------------|------------|----------
AR     | 40.815      | 27.213     | 76.724%
ARIMA  | 33.892      | 19.028     | 83.951%
MLR    | 31.631      | 20.857     | 86.020%
SVR    | 29.220      | 18.662     | 88.070%
GRU    | 24.994      | 17.249     | 91.268%
LSTNet | 15.833      | 12.341     | 96.501%
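For reference, the three evaluation indexes of Table 1 can be computed as follows; `y_true` and `y_pred` stand for the test-set targets and model predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true, y_pred):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)         # R-Squared, closer to 1 is better
    return rmse, mae, r2
```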
The experimental results show that for hourly thermal load prediction the LSTNet model used here outperforms the other models, with an R-squared index closest to 1. Compared with the GRU model, the RMSE of the LSTNet model is reduced by 36.7% and the MAE by 28.5%, a marked improvement in model accuracy.