CN113962454A - LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization - Google Patents
LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization
- Publication number
- CN113962454A (application CN202111213171.9A)
- Authority
- CN
- China
- Prior art keywords
- lstm
- prediction
- model
- particle
- energy consumption
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization. The method comprises the following steps. Step one: perform correlation analysis on the time and feature dimensions of the original data set with the MI mutual information method, and select the top N'-dimensional features that are most effective for the energy consumption prediction target value. Step two: perform secondary feature selection on the N'-dimensional features to obtain the N''-dimensional features after PMI feature selection. Step three: train and predict with an LSTM model on the data after PMI dual feature selection to obtain an initial prediction sequence y(t). Step four: optimize the LSTM hyperparameters units, dropout and batch size with the PSO algorithm, thereby improving the prediction accuracy of the LSTM model and finally obtaining the PMI-LSTM-PSO model. The method offers high prediction accuracy, high algorithmic efficiency and stable prediction performance.
Description
Technical Field
The invention relates to the technical field of building energy consumption prediction, in particular to an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization.
Background
With the wide application of increasingly complex science and technology products, the demand for electric power is growing worldwide, and the power grid must be managed to achieve sustainable development of the electric power sector. In the artificial intelligence era, the power internet of things is gradually entering daily life, the development of the smart grid requires adaptive sensing and testing capability, and the smart electricity meter has emerged accordingly. The continuous expansion of smart meter infrastructure around the world also lays the foundation for introducing active electric energy systems into the smart grid. Since the 'strong smart grid' plan was introduced in 2009, grid companies in China have been deploying smart meters, power distribution automation, embedded intelligence and other technologies on a large scale.
For residential and enterprise buildings, energy consumption prediction improves the efficiency of energy use and reduces consumption, and therefore has great practical significance. Commercial and residential buildings account for 30% to 40% of total energy consumption, current trends indicate that this share may increase in the near future, and global energy consumption continues to rise. Short-term energy consumption prediction is therefore crucial, yet it is a challenging problem because of the complexity and uncertainty of building infrastructure behavior and the drawbacks of the traditional power grid: low efficiency, serious waste of electric energy, weak information interaction capability and a low degree of automation.
In view of this, researchers have developed many prediction methods to improve grid quality and optimize energy usage. In much of the related research, time-series models such as ARIMA are used as reference models to verify whether newly proposed methods offer superior prediction performance. Researchers now often combine historical data with machine learning and deep learning algorithms, such as artificial neural networks (ANN), support vector machines (SVM), adaptive neuro-fuzzy inference systems (ANFIS) and extreme learning machines (ELM). Convolutional neural networks, BP neural networks and similar models have also been studied in the field of power consumption, but such prediction methods are still at an early stage.
In data preprocessing, the accuracy of the model is largely determined by the quality of the feature selection applied to the original data. The prediction model is improved if the number of input features can be reduced by selecting the most efficient and useful inputs. Feature selection methods include correlation analysis and numerical sensitivity analysis, but these are linear input selection methods, whereas energy consumption data are nonlinear. The mutual information feature selection method is therefore more effective, and the correlation between input and output data can be computed efficiently. Feature variable selection based on mutual information is a variable selection method in which mutual information is quantified and the correlation between different related variables is calculated.
1) MI mutual information algorithm
Mutual Information (MI), which represents the interdependence between two variables X and Y.
Mutual information I(X; Y) between X and Y is defined as:
I(X; Y) = Σ_x Σ_y p(x, y) log[ p(x, y) / (p(x) p(y)) ]  (1)
where p(x, y) is the joint probability density function and p(x), p(y) are the marginal probability density functions of x and y, respectively. MI measures how much the occurrence of one event contributes to the occurrence of another. The MI feature selection method computes the mutual information between every feature and the target feature, ranks the features, and keeps the N' features with the highest correlation, thereby achieving feature selection.
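As an illustration only, this MI ranking step can be sketched as follows. The sketch assumes a feature matrix X of shape (samples, features) and a target vector y, and it uses scikit-learn's mutual_info_regression estimator in place of whatever MI estimator the invention actually employs; the data below are random stand-ins.

```python
# Hypothetical sketch of MI-based feature ranking (not the patented implementation).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_top_features_by_mi(X, y, n_keep):
    """Rank the columns of X by estimated mutual information with y and keep the top n_keep."""
    mi = mutual_info_regression(X, y)      # one MI estimate per feature column
    order = np.argsort(mi)[::-1]           # sort indices by descending MI
    selected = order[:n_keep]
    return selected, mi[selected]

# Example with random data standing in for the real energy data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 480))
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=500)
top_idx, top_scores = select_top_features_by_mi(X, y, n_keep=60)
```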
2) Pearson correlation coefficient
r = Σ_i (X_i − X̄)(Y_i − Ȳ) / sqrt( Σ_i (X_i − X̄)² · Σ_i (Y_i − Ȳ)² )  (2)
where X̄ and Ȳ are the mean values of X and Y, respectively. If r ≥ 0.5, the correlation between X and Y is considered strong; otherwise it is weak. The features can be further reduced by a secondary feature selection based on the Pearson correlation coefficient.
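A corresponding sketch of this secondary Pearson filter, again purely illustrative: it treats the candidate features as columns of a NumPy array and applies the r ≥ 0.5 rule stated above (the absolute value is used here so that strong negative correlations would also be kept, which is an assumption rather than something stated in the text).

```python
# Hypothetical sketch of the secondary Pearson filter (threshold 0.5 as in the text).
import numpy as np

def pearson_filter(X, y, threshold=0.5):
    """Keep only the columns of X whose |Pearson r| with y meets the threshold."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    )
    keep = np.abs(r) >= threshold
    return X[:, keep], np.where(keep)[0], r[keep]
```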
3) LSTM model
LSTM is a deep learning model that can efficiently process long time series and automatically learn deeper features from the data. However, as with other neural network models, the setting of some hyperparameters of an LSTM network often depends on the experience of the researcher, which lacks scientific rigor. PSO has the advantages of being simple to implement, converging quickly and requiring few parameters to be tuned; genetic algorithms, ant colony algorithms and the like lack such a guiding mechanism.
The long short-term memory network (LSTM) was proposed by Hochreiter to address the vanishing and exploding gradient problems of back-propagation through time (BPTT). Through successive refinements the model has evolved into the LSTM network architecture that is widely used today. Internally it consists of three gate structures and one cell-state module for storing memory. The structure of an LSTM cell is shown in Fig. 1, where C_t is the state information stored in the current LSTM cell, h_t is the output of the hidden layer of this cell, f_t is the forget gate, i_t is the input gate, C̃_t is the candidate information at the current time, o_t is the output gate, '×' denotes element-wise matrix multiplication and '+' denotes matrix addition.
Forget gate: controls the degree to which the previous cell state C_(t-1) is forgotten:
f_t = σ(W_f · [h_(t-1), x_t] + b_f)  (3)
Input gate: controls which new information is added to the cell:
i_t = σ(W_i · [h_(t-1), x_t] + b_i)  (4)
Cell state update: new information is selectively written to C_t according to f_t and i_t:
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)  (5)
C_t = f_t × C_(t-1) + i_t × C̃_t  (6)
Output gate: activates C_t and controls the degree to which C_t is filtered through to the output:
o_t = σ(W_o · [h_(t-1), x_t] + b_o)  (7)
h_t = o_t × tanh(C_t)  (8)
W_f, W_i, W_o are the weight matrices of the corresponding modules, b_f, b_i, b_o are bias terms, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function, defined as
σ(x) = 1/(1 + e^(−x))  (9)
tanh(x) = (e^x − e^(−x))/(e^x + e^(−x))  (10)
The output layer passes h_t through a fully connected (dense) layer to obtain the final predicted value y_t:
y_t = σ(W_y · h_t + b_y)  (11)
where W_y and b_y are the weight matrix and bias term, respectively.
The LSTM controls the flow of historical information through its gate functions and therefore has solid time-series processing and prediction capability.
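For clarity, one LSTM cell step implementing formulas (3)-(11) can be written directly in NumPy. This is a didactic sketch only: the weight shapes, the dictionary-based parameter layout and the concatenation [h_(t-1), x_t] follow the formulas above, and none of it reflects the actual implementation of the invention.

```python
# Didactic NumPy sketch of a single LSTM cell step, following formulas (3)-(11).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                 # formula (9)

def lstm_cell_step(x_t, h_prev, C_prev, W, b):
    """W and b are dicts holding the parameters of the f, i, c, o blocks and the output layer."""
    z = np.concatenate([h_prev, x_t])               # [h_(t-1), x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])              # forget gate, formula (3)
    i_t = sigmoid(W["i"] @ z + b["i"])              # input gate, formula (4)
    C_hat = np.tanh(W["c"] @ z + b["c"])            # candidate state, formula (5)
    C_t = f_t * C_prev + i_t * C_hat                # cell state update, formula (6)
    o_t = sigmoid(W["o"] @ z + b["o"])              # output gate, formula (7)
    h_t = o_t * np.tanh(C_t)                        # hidden output, formula (8)
    y_t = sigmoid(W["y"] @ h_t + b["y"])            # dense output layer, formula (11)
    return h_t, C_t, y_t
```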
4) PSO particle swarm optimization algorithm
The basic idea of particle swarm optimization is as follows: a flock of birds searches for food at a random location in an area, and each bird knows only its own distance to the food and the positions of the other birds. When a bird moves from its current position to a new one, it relies on two pieces of information: its own flying experience (the best position it has found so far) and the neighbourhood of the bird currently nearest to the food.
PSO is initialized with a population of random particles (random solutions) and then searches for the optimum by iteration. In each iteration, every particle updates itself by tracking two 'extrema': the individual best solution pbest and the global best solution gbest. After these two best values are found, the particle updates its velocity and position with the following formulas.
v_i = v_i + c_1 × rand() × (pbest_i − x_i) + c_2 × rand() × (gbest_i − x_i)  (12)
x_i = x_i + v_i
where i = 1, 2, …, N and N is the total number of particles in the swarm;
v_i: current velocity of the i-th particle;
rand(): a random number in (0, 1);
x_i: current position of the i-th particle;
c_1 and c_2: learning factors;
pbest_i and gbest_i: the individual best position and the global best position of the current particle swarm, respectively.
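The update of formula (12) can be sketched as follows; the sphere function used as fitness, the swarm size, the learning factors and the iteration count are illustrative placeholders only, not the settings of the invention.

```python
# Minimal PSO sketch implementing the velocity/position updates of formula (12).
import numpy as np

def pso(fitness, dim, n_particles=20, iters=50, c1=2.0, c2=2.0, lo=-5.0, hi=5.0):
    rng = np.random.default_rng(1)
    x = rng.uniform(lo, hi, size=(n_particles, dim))          # particle positions
    v = np.zeros_like(x)                                      # particle velocities
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x) # formula (12)
        x = np.clip(x + v, lo, hi)
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

best_x, best_val = pso(lambda p: float((p ** 2).sum()), dim=3)  # sphere test function
```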
However, the existing MI mutual information algorithm, LSTM model and PSO particle swarm optimization algorithm, used on their own, give low accuracy and unstable performance on energy consumption prediction and do not meet the requirements of building energy consumption prediction. It is therefore necessary to develop an energy consumption prediction method for buildings that has high prediction accuracy and stable prediction performance.
Disclosure of Invention
The invention aims to provide an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization, that is, an energy consumption prediction method for buildings with high prediction accuracy and stable prediction performance.
To achieve this purpose, the technical scheme of the invention is as follows: an energy consumption prediction method based on MI-LSTM-PSO which, as shown in Fig. 2, comprises the following steps.
Step one: perform correlation analysis on the time and feature dimensions of the original data set with the MI mutual information method and select the top N'-dimensional features most effective for the energy consumption prediction target value, thereby eliminating redundant data and improving the efficiency of the model algorithm;
Step two: calculate the Pearson correlation coefficient between each of the top N'-dimensional features selected by the MI mutual information method and the predicted sequence, and keep the N''-dimensional features whose Pearson correlation coefficient is greater than or equal to 0.5;
Step three: perform model training and prediction on the N''-dimensional feature data obtained after PMI dual feature selection with an LSTM model to obtain an initial prediction sequence y(t);
Step four: optimize the hyperparameters units, dropout and batch size of the LSTM model with the particle swarm optimization (PSO) algorithm, thereby improving the prediction accuracy of the LSTM model and finally obtaining the MI-LSTM-PSO model.
In the above technical solution, N' in step one is 60, that is, the 60-dimensional features most effective for the energy consumption prediction target value are selected.
In the above technical solution, the first step specifically includes the following steps,
S11: using a sliding window, assemble the M-dimensional (here 20-dimensional) feature data of the preceding 24 hours into 24M-dimensional (i.e. 480-dimensional) feature components, where the original data sequences comprise the photovoltaic power generation of 2 areas, the energy consumption of 17 areas and different facilities, and the total electricity input from the system power grid (the sequences may differ for other scenario data sets);
S12: apply the MI mutual information method of formula (1) to the above 24M-dimensional (480-dimensional) feature components:
I(X; Y) = Σ_x Σ_y p(x, y) log[ p(x, y) / (p(x) p(y)) ]  (1)
where p(X, Y) is the joint probability density function of X and Y and p(X), p(Y) are the marginal density functions; if X and Y are completely unrelated, p(X, Y) equals p(X)p(Y) and the mutual information is 0; the larger I(X; Y) is, the stronger the correlation between the two variables;
S13: determine the optimal parameter N of the MI feature-selection dimension by experimental tuning; if N is too large, the model training data set contains too much redundant information and noise, which degrades prediction performance, while if N is too small, the training data set contains too little information, which also degrades the prediction result; in general the optimal value of N lies between 3M and 6M, and the feature dimension that combines good prediction performance with a small N is chosen;
S14: based on the mutual-information ranking of the feature sequences x(t) against the target sequence Y, combine the time and feature dimensions and select the 60-dimensional features most effective for the energy consumption prediction target value as the training data set of the subsequent model.
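A possible sketch of S11-S14 is given below: 24 hourly rows of M = 20 features are flattened into one 24M = 480-dimensional component, and an MI ranking keeps the top N' = 60 dimensions. The array layout, the assumption that the target Gi is column 0, and the random stand-in data are illustrative choices, not details given by the invention.

```python
# Hypothetical sketch of S11-S14: sliding-window construction plus MI ranking.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def build_windows(series, window=24):
    """series: array of shape (n_hours, n_features); the target is column 0 (Gi) one hour ahead."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t].ravel())   # 24 x M rows flattened to a 24M-dim component
        y.append(series[t, 0])                   # Gi at the hour to be predicted
    return np.asarray(X), np.asarray(y)

def mi_top_n(X, y, n_prime=60):
    mi = mutual_info_regression(X, y)            # MI of every component with the target
    return np.argsort(mi)[::-1][:n_prime]        # indices of the N' kept dimensions

hours = np.random.default_rng(2).normal(size=(2000, 20))   # stand-in for the real data set
X, y = build_windows(hours)                                 # X has shape (1976, 480)
top_idx = mi_top_n(X, y)
X_selected = X[:, top_idx]                                  # training data for the later model
```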
In the above technical solution, the second step specifically comprises the following steps,
S21: compute the Pearson correlation coefficient of each of the above 60-dimensional feature components with the target sequence Y (i.e. Gi) according to formula (2), where X̄ and Ȳ are the mean values of X and Y, respectively; if r ≥ 0.5 the correlation between X and Y is strong, otherwise it is weak;
S22: since a Pearson correlation coefficient below 0.5 indicates weak correlation with the target, keep only the 37-dimensional feature data whose Pearson correlation coefficient is greater than or equal to 0.5.
In the above technical solution, the LSTM network includes three gate structures and a state module for storing memory, as shown in fig. 1, the third step specifically includes the following steps:
S31: let C_t be the state information stored in the current LSTM cell, x_t the input of the input layer, h_t the output of the hidden layer of this cell, f_t the forget gate, i_t the input gate, C̃_t the candidate information at the current time and o_t the output gate; '×' denotes element-wise matrix multiplication, '+' denotes addition, and σ is the sigmoid function;
S32: forget gate, used to control the degree to which the previous cell state C_(t-1) is forgotten:
f_t = σ(W_f · [h_(t-1), x_t] + b_f)  (3)
S33: input gate, used to control which new information is added to the cell:
i_t = σ(W_i · [h_(t-1), x_t] + b_i)  (4)
S34: cell state update, which selectively writes new information to C_t according to f_t and i_t:
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)  (5)
C_t = f_t × C_(t-1) + i_t × C̃_t  (6)
S35: output gate, which activates C_t and controls the degree to which C_t is filtered:
o_t = σ(W_o · [h_(t-1), x_t] + b_o)  (7)
h_t = o_t × tanh(C_t)  (8)
where h_t is the output of the hidden layer of this cell, h_(t-1) is the output of the hidden layer of the previous cell, W_f, W_i, W_o are the weight matrices of f_t, i_t, o_t, b_f, b_i, b_o are the corresponding bias terms, and tanh is the hyperbolic tangent activation function, defined as follows:
σ(x) = 1/(1 + e^(−x))  (9)
tanh(x) = (e^x − e^(−x))/(e^x + e^(−x))  (10)
S36: the output layer passes h_t through a fully connected layer to obtain the final predicted value y_t:
y_t = σ(W_y · h_t + b_y)  (11)
where W_y and b_y are the weight matrix and the bias term, respectively.
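At the implementation level, the step-three training could look roughly like the Keras sketch below. It assumes the PMI-selected data have been arranged as a (samples, timesteps, features) tensor, here arbitrarily (1000, 37, 1) for the 37 selected dimensions, and uses placeholder layer sizes, a linear output instead of the sigmoid of formula (11), and random stand-in data; none of these choices come from the text.

```python
# Illustrative Keras sketch of the step-three LSTM training (all settings are placeholders).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def build_lstm(timesteps, n_features, units=64, dropout=0.2):
    model = Sequential([
        LSTM(units, input_shape=(timesteps, n_features)),  # recurrent layer
        Dropout(dropout),                                   # regularization
        Dense(1),                                           # output layer (linear here; formula (11) uses a sigmoid)
    ])
    model.compile(optimizer="adam", loss="mae")
    return model

# Random stand-in for the 37 PMI-selected dimensions, reshaped as 37 steps of 1 feature.
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 37, 1)).astype("float32")
y = rng.normal(size=(1000,)).astype("float32")
model = build_lstm(timesteps=37, n_features=1)
model.fit(X, y, epochs=5, batch_size=64, verbose=0)
y_pred = model.predict(X).ravel()                           # initial prediction sequence y(t)
```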
In the above technical solution, the step four specifically includes the following steps,
S41: initialize the parameters to be optimized, setting the ranges units ∈ [20, 300], dropout ∈ [0, 1], batch size ∈ [20, 300];
S42: randomly initialize a particle swarm (20 particles) within the initial ranges, compute the fitness value (mean absolute error, MAE) of each particle from the fitness function (the LSTM model fitting result), and determine the best position of this iteration's swarm (pbest) and the historical best position of the swarm (gbest) from the current MAE of each particle;
S43: update the position and velocity of each current particle according to the best particle's position and velocity, fit the updated particles with the LSTM model, compute the MAE of each particle, and update pbest and gbest according to the MAE;
v_i = v_i + c_1 × rand() × (pbest_i − x_i) + c_2 × rand() × (gbest_i − x_i)  (12)
x_i = x_i + v_i
in formula (12): i = 1, 2, …, N, where N is the total number of particles in the swarm;
v_i: the current velocity of the i-th particle;
rand(): a random number in (0, 1);
x_i: the current position of the i-th particle;
c_1 and c_2: learning factors;
pbest_i and gbest_i: the individual best position and the global best position of the current particle swarm, respectively;
S44: after the updated particles are trained through the LSTM model, compute the fitness value of each particle and update the best position of this iteration's swarm and the historical global best position of the swarm accordingly;
S45: when the fitness value of the best particle no longer changes, or the number of iterations reaches the upper limit, the algorithm is considered to have converged; if it has not converged, the flow returns to S43 to update the particles;
and S46, substituting the obtained optimal particle parameters units, dropout and batchsize into the LSTM model, and performing model prediction on the data in the first step to obtain a final prediction result.
In the above formulas, '*' denotes element-wise multiplication.
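Putting S41-S46 together, a purely illustrative sketch of PSO over the LSTM hyperparameters is shown below. Each particle encodes (units, dropout, batch size), the fitness is the validation MAE of an LSTM trained with those values, and it reuses the build_lstm helper from the Keras sketch above; the training budget, data split and every constant not stated in the text are assumptions.

```python
# Illustrative sketch of S41-S46: PSO over the LSTM hyperparameters units, dropout, batch size.
# Reuses build_lstm from the earlier Keras sketch; all settings here are assumptions.
import numpy as np

LOW  = np.array([20.0, 0.0, 20.0])     # lower bounds for units, dropout, batch size (S41)
HIGH = np.array([300.0, 1.0, 300.0])   # upper bounds (S41)

def fitness(particle, X_tr, y_tr, X_val, y_val):
    units, dropout, batch = int(particle[0]), float(particle[1]), int(particle[2])
    model = build_lstm(X_tr.shape[1], X_tr.shape[2], units=units, dropout=dropout)
    model.fit(X_tr, y_tr, epochs=5, batch_size=batch, verbose=0)
    pred = model.predict(X_val).ravel()
    return float(np.mean(np.abs(pred - y_val)))               # MAE as the fitness value (S42)

def pso_lstm(X_tr, y_tr, X_val, y_val, n_particles=20, iters=10, c1=2.0, c2=2.0):
    rng = np.random.default_rng(4)
    x = rng.uniform(LOW, HIGH, size=(n_particles, 3))          # random initial swarm (S42)
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([fitness(p, X_tr, y_tr, X_val, y_val) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):                                     # S43-S45
        r1, r2 = rng.random((2, n_particles, 3))
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # formula (12)
        x = np.clip(x + v, LOW, HIGH)
        vals = np.array([fitness(p, X_tr, y_tr, X_val, y_val) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest                                               # best units, dropout, batch size (S46)
```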
The invention has the following advantages:
(1) The invention is an energy consumption prediction method for buildings with high prediction accuracy and stable prediction performance;
(2) MI feature selection removes 87.5% of the redundant features, which greatly improves the efficiency of the model algorithm;
(3) the PSO algorithm is used to optimize the hyperparameters units, dropout and batch size of the LSTM model, which improves the prediction accuracy of the LSTM model and yields a good model fit;
(4) the predictions of the PMI-PSO-LSTM model lie essentially within the confidence interval of the true values, the predicted trend is close to the true values, and the prediction accuracy is high;
(5) the MAE and SMAPE of the PMI-PSO-LSTM combined model are better than all results of the other models, and the combined model is more robust and has more stable prediction performance.
Drawings
Fig. 1 is a schematic view of the internal structure of a conventional LSTM.
FIG. 2 is a schematic structural diagram of the PMI-PSO-LSTM model of the present invention.
FIG. 3 is a graph comparing the predicted results of the basic model according to the embodiment of the present invention.
FIG. 4 is a scatter plot comparing the prediction results of the base model in accordance with the present invention.
FIG. 5 is a comparison graph of the combined model prediction results according to the embodiment of the present invention.
FIG. 6 is a comparison scatter plot of combined model prediction results according to an embodiment of the present invention.
FIG. 7 is a comparison chart of evaluation indexes of the model according to the embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings; they are merely exemplary and are not intended to limit the invention. The advantages of the invention will become clear and readily understood from this description.
Examples
The invention is described in detail by taking the prediction of the electricity consumption of a building as an example; this also serves as guidance for applying the invention to energy consumption prediction of other buildings.
The embodiment treats the historical electricity consumption of a building as a time series and performs short-term single-step prediction of the electricity consumption 1 h ahead.
In this embodiment, the prediction of the power consumption of a certain building includes the following contents:
1. experimental data set and MI feature selection
The data set used in this embodiment is the electricity consumption of a building from October 15, 2019 to June 4, 2019 and contains 20 features in total. These features are described in Table 1, where the data in column 5 are the Pearson correlation coefficient between the current feature and the Gi feature.
Table 1 data set description
In the present embodiment, the data of the preceding 24 hours are used to predict the value of Gi in the next hour, so a sliding window is used to assemble the 24-hour data of the 20 features into 480 feature components. The MI mutual information method then selects, among the 480 components formed by the sliding-window method, the 60-dimensional features with the largest MI values.
The selection results are shown in Table 2;
here a selected feature such as Gi(t-1) denotes the electricity input from the public power grid of the industrial plant one hour before the current time;
TABLE 2 characteristics of MI selection
The MI value is the mutual information between the current feature component X and the Gi component at the current time, i.e. I(X; Gi(t)). As can be seen from Table 2, most features of the preceding four hours have large mutual information with the Gi feature at the current time, and the Gi, Ao, Co and A2 features over the preceding 24 hours also have relatively large mutual information with it. MI feature selection therefore removes 87.5% of the redundant features and markedly improves the efficiency of the model algorithm.
The data set used in this example has 20-dimensional features, and the data of the previous 24 hours are used to predict the data of the 25th hour.
Experiments were performed on the 20-dimensional feature data set in this example, and the experimental results show that:
1) the prediction result obtained by selecting the top 60-dimensional features is almost the same as that obtained with 100 dimensions;
2) when the feature data dimension is increased (namely, the feature data dimension is selected to be more than 100), the prediction result is deteriorated;
3) when the feature data dimension is reduced (i.e. the feature data dimension is selected to be less than 60), the data set contains too little information, which also degrades the prediction result.
Therefore, the present embodiment selects the top 60-dimensional feature having the largest MI value among the 480 feature components formed using the sliding window method using the MI mutual information method.
2. Evaluation index
4 evaluation indexes are used for evaluating the quality of the model.
Root mean square error RMSE = sqrt( (1/n) Σ_i (ŷ_i − y_i)² )  (13): the smaller the value, the better the model fit.
Mean absolute error MAE = (1/n) Σ_i |ŷ_i − y_i|  (14): the smaller the value, the better the model fit.
Symmetric mean absolute percentage error SMAPE = (100%/n) Σ_i |ŷ_i − y_i| / ((|ŷ_i| + |y_i|)/2)  (15): the smaller the value, the better the model fit.
Coefficient of determination R² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)²  (16): the larger the value, the better the model fit.
In formulas (13), (14), (15) and (16), ŷ_i is the predicted value, y_i is the true value, ȳ is the mean of the true values, and n is the number of data points.
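A straightforward NumPy rendering of the four evaluation indexes under the common definitions of formulas (13)-(16) is sketched below.

```python
# NumPy sketch of the four evaluation indexes, formulas (13)-(16).
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))                  # (13)

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_pred - y_true)))                          # (14)

def smape(y_true, y_pred):
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_pred - y_true) / denom))          # (15), in percent

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)                                     # (16)
```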
3. Model parameter setting
To verify the prediction effect of the proposed MI + PSO-LSTM combined model, this example compares the two groups of 6 experimental models (M1-M6) listed in Table 3; the main parameters of the models are given in Tables 4 and 5.
TABLE 3 experimental reference model
No | Model | Description |
M1 | ARIMA | Autoregressive integrated moving average model |
M2 | KNR | K-nearest-neighbour (regression) model |
M3 | LSTM | LSTM model |
M4 | MI-LSTM | Mutual information method + LSTM model |
M5 | PMI-LSTM | Mutual information method + Pearson correlation coefficient + LSTM model |
M6 | PMI-LSTM-PSO | Dual feature selection + PSO-optimized LSTM model |
Table 4 comparative model main parameters 1
Table 5 comparative model principal parameters 2
4. Analysis of model Experimental data
4.1 analysis of basic model test results
In this embodiment, the basic models M1-M3 of Table 3 are used for single-step prediction comparison of the total grid input electricity Gi, using features 1-20.
From the experimental comparison results (Table 6), judged on the four prediction evaluation indexes, namely the coefficient of determination, root mean square error, mean absolute error and symmetric mean absolute percentage error, the LSTM model gives the best prediction results.
TABLE 6 comparison of basic model experiments
Model | R2 | RMSE | MAE | SMAPE |
ARIMA | 0.872609 | 12.1688 | 7.496174 | 8.548175 |
KNR | 0.849556 | 13.21612 | 8.155453 | 9.543262 |
LSTM | 0.889503 | 11.211024 | 6.622012 | 7.594866 |
The comparison of the 1 h power consumption predicted by ARIMA, K-nearest-neighbour regression and LSTM with the true values is shown in Figs. 3 and 4. It can be seen that the predicted trend of the LSTM model is closest to the true values, and only the LSTM predictions lie within the confidence interval of the true values. The result curves predicted by the ARIMA and K-nearest-neighbour models do not lie within the confidence interval of the true values and suffer from prediction lag. In summary, the LSTM model predicts best among the three, so LSTM is chosen as the experimental base model.
4.2 analysis of the results of the LSTM combined model experiment
In this embodiment, 20 groups of single-step prediction comparison experiments on the total grid input electricity Gi are carried out with the combined models M3-M6 of Table 3, using features 1-20.
The comparison of the predicted results of the four models for predicting 1h electricity consumption Gi with the true values is shown in fig. 5 and 6. It can be seen from fig. 5 and 6 that the predicted values of the four models are substantially within the confidence interval of the true values, and the predicted trend of the PMI-PSO-LSTM model is closest to the true values. As can be seen from fig. 7, the evaluation indexes of the PMI-PSO-LSTM model are all optimal (in fig. 7, M3, M4, M5, and M6 are combination models M3-M6 in table 3, respectively, in this embodiment).
Table 7 shows the averages of 20 groups of experimental results for the four combined models; the first four columns are the four evaluation indexes of the prediction models and the fifth column is the training time. As can be seen from Table 7, the MI + PSO-LSTM model does not improve significantly on R2, but improves MAE and SMAPE by about 20%, 10% and 5% compared with the LSTM, MI-LSTM and PMI-LSTM models, respectively. Compared with the LSTM model, MI-LSTM does not improve performance significantly, but after MI feature selection the dimension of the input data is reduced by 87.5% and the model training time is reduced by about 63%. Compared with the MI-LSTM model, PMI-LSTM hardly improves performance, but after the secondary feature selection the dimension of the input data is reduced by about 40%, so the model training time is reduced by about 20%.
TABLE 7 comparison of evaluation indexes of combination models
Model | R2 | RMSE | MAE | SMAPE | Training time |
LSTM | 0.88724 | 11.18282 | 7.19766 | 8.56986 | 159 s |
MI-LSTM | 0.89722 | 10.67590 | 6.66639 | 7.82360 | 59 s |
PMI-LSTM | 0.92301 | 10.73256 | 6.42070 | 7.49299 | 46 s |
MI-PSO-LSTM | 0.90482 | 10.27717 | 6.12843 | 6.87869 | 44 s |
Fig. 7 shows box plots of the four evaluation indexes over the 20 experiments of M3-M6; the '+' symbols outside the boxes are outliers and can be ignored. As can be seen from Fig. 7, the four evaluation indexes of the MI-PSO-LSTM model are clearly better than those of the other three models: its MAE and SMAPE are better than all results of the other models, and its R2 and RMSE are better than about 95% of the other models' results. The four evaluation indexes of MI-LSTM partially overlap with those of LSTM, but the overall trend of MI-LSTM is better than that of the LSTM model. Fig. 7 also shows that the box shape (interquartile range) of the MI-PSO-LSTM model is the smallest compared with the LSTM, MI-LSTM and PMI-LSTM models, indicating that the MI-PSO-LSTM model is more stable than the other models.
In summary, the invention provides a short-term energy consumption combined prediction model based on PMI, PSO and LSTM. First, in the data preprocessing stage, dual feature selection is applied to the original data with the mutual information method and the Pearson coefficient, and redundant features are removed. PSO is then used to optimize the LSTM network architecture so that the LSTM topology best fits the current input data, and finally the feature-selected data are fed into the optimized LSTM for short-term prediction of the energy consumption data. To verify the effect of the MI-PSO-LSTM model on short-term energy consumption prediction, a multi-dimensional single-step prediction comparison experiment is carried out on the energy consumption time-series data set of a building. Taken together, the experimental results show that all four evaluation indexes of the MI-PSO-LSTM combined model are optimal, that is, the MI-PSO-LSTM model has higher prediction accuracy and robustness and more stable prediction performance. The MI-PSO-LSTM combined model offers a useful research direction for exploring time-series prediction with deep learning. Nevertheless, the combined model still leaves considerable room for optimization, for example the noise-filtering problem and the dynamic intelligent feature-selection problem studied in time-series research, through which the model prediction accuracy can be further improved.
Other parts not described belong to the prior art.
Claims (5)
1. An LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization, characterized by comprising the following steps:
step one: performing correlation analysis on the time and feature dimensions of the original data set with the MI mutual information method, and selecting the top N'-dimensional features most effective for the energy consumption prediction target value;
step two: performing secondary feature selection on the N'-dimensional features selected in step one with the Pearson correlation coefficient to obtain the N''-dimensional features after PMI feature selection;
step three: performing model training and prediction on the N''-dimensional feature data after PMI feature selection with an LSTM model to obtain an initial prediction sequence y(t);
step four: optimizing the hyperparameters units, dropout and batch size of the LSTM model with the particle swarm optimization (PSO) algorithm, thereby improving the prediction accuracy of the LSTM model and finally obtaining the PMI-LSTM-PSO model.
2. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 1, wherein: the first step specifically comprises the following steps of,
S11: forming the M-dimensional feature data of the preceding 24 hours into 24M-dimensional feature components using a sliding window, wherein the original data sequences include the photovoltaic power generation of 2 areas, the energy consumption of 17 different facilities of the areas, and the total electricity input by the system power grid;
S12: selecting features from the 24M-dimensional feature components with the MI mutual information method according to formula (1):
I(X; Y) = Σ_x Σ_y p(x, y) log[ p(x, y) / (p(x) p(y)) ]  (1)
in formula (1): p(X, Y) is the joint probability density function of X and Y and p(X), p(Y) are the marginal density functions; if X and Y are completely unrelated, p(X, Y) equals p(X)p(Y) and the mutual information is 0; the larger I(X; Y) is, the stronger the correlation between the two variables;
S13: determining the optimal parameter N of the MI feature-selection dimension by experimental tuning; if N is too large, the model training data set contains too much redundant information and noise, which degrades prediction performance, while if N is too small, the training data set contains too little information, which also degrades the prediction result; in general the optimal value of N lies between 3M and 6M, and the feature dimension that combines good prediction performance with a small N is selected;
and S14: based on the mutual-information ranking of the feature sequence x(t) and the target sequence Y, integrating the time and feature dimension data and selecting the top N'-dimensional features most effective for the energy consumption prediction target value as the training data set of the subsequent model.
3. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 2, wherein: the second step specifically comprises the following steps:
S21: calculating the Pearson correlation coefficient of each N'-dimensional feature component with the target sequence Y according to formula (2):
r = Σ_i (X_i − X̄)(Y_i − Ȳ) / sqrt( Σ_i (X_i − X̄)² · Σ_i (Y_i − Ȳ)² )  (2)
where X̄ and Ȳ are the mean values of X and Y; if r ≥ 0.5, the correlation between X and Y is strong, otherwise it is weak;
and S22: selecting the N''-dimensional feature data whose Pearson correlation coefficient is greater than or equal to 0.5.
4. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 3, wherein: the LSTM network internally comprises three gate structures and a state module for storing and memorizing, and the third step specifically comprises the following steps:
S31: let C_t be the state information stored in the current LSTM cell, x_t the input of the input layer, h_t the output of the hidden layer of this cell, f_t the forget gate, i_t the input gate, C̃_t the candidate information at the current time and o_t the output gate; '×' denotes element-wise matrix multiplication, '+' denotes addition, and σ is the sigmoid function;
S32: forget gate, used to control the degree to which the previous cell state C_(t-1) is forgotten:
f_t = σ(W_f · [h_(t-1), x_t] + b_f)  (3)
S33: input gate, used to control which new information is added to the cell:
i_t = σ(W_i · [h_(t-1), x_t] + b_i)  (4)
S34: cell state update, which selectively writes new information to C_t according to f_t and i_t:
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)  (5)
C_t = f_t × C_(t-1) + i_t × C̃_t  (6)
S35: output gate, which activates C_t and controls the degree to which C_t is filtered:
o_t = σ(W_o · [h_(t-1), x_t] + b_o)  (7)
h_t = o_t × tanh(C_t)  (8)
in formulas (3) to (8): W_f, W_i, W_o are the weight matrices of f_t, i_t, o_t, b_f, b_i, b_o are the corresponding bias terms, and tanh is the hyperbolic tangent activation function, defined as follows:
σ(x) = 1/(1 + e^(−x))  (9)
tanh(x) = (e^x − e^(−x))/(e^x + e^(−x))  (10)
S36: the output layer passes h_t through a fully connected layer to obtain the final predicted value y_t:
y_t = σ(W_y · h_t + b_y)  (11)
in formula (11): W_y and b_y are the weight matrix and the bias term, respectively.
5. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 4, wherein: the fourth step specifically comprises the following steps of,
S41: initializing the parameters to be optimized, setting the ranges units ∈ [20, 300], dropout ∈ [0, 1], batch size ∈ [20, 300];
S42: randomly initializing the particle swarm within the initial ranges, computing the fitness value of each particle according to the fitness function, and determining pbest of this iteration's swarm and gbest of the historical swarm from the prediction index MAE of each current particle;
S43: updating the position and velocity of each current particle according to the best particle's position and velocity, fitting the updated particles with the LSTM model, computing the MAE of each particle, and updating pbest and gbest according to the MAE;
v_i = v_i + c_1 × rand() × (pbest_i − x_i) + c_2 × rand() × (gbest_i − x_i)  (12)
x_i = x_i + v_i
in formula (12): i = 1, 2, …, N, where N is the total number of particles in the swarm;
v_i: the current velocity of the i-th particle;
rand(): a random number in (0, 1);
x_i: the current position of the i-th particle;
c_1 and c_2: learning factors;
pbest_i and gbest_i: the individual best position and the global best position of the current particle swarm, respectively;
S44: after the updated particles are trained through the LSTM model, computing the fitness value of each particle, and updating the best position of this iteration's swarm and the historical global best position of the swarm according to the fitness values;
S45: when the fitness value of the best particle no longer changes, or the number of iterations reaches the upper limit, the algorithm is considered to have converged; if it has not converged, the flow returns to S43 to update the particles;
and S46, substituting the obtained optimal particle parameters units, dropout and batchsize into the LSTM model, and performing model prediction on the data in the first step to obtain a final prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111213171.9A CN113962454A (en) | 2021-10-18 | 2021-10-18 | LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111213171.9A CN113962454A (en) | 2021-10-18 | 2021-10-18 | LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113962454A true CN113962454A (en) | 2022-01-21 |
Family
ID=79464357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111213171.9A Pending CN113962454A (en) | 2021-10-18 | 2021-10-18 | LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113962454A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108986470A (en) * | 2018-08-20 | 2018-12-11 | 华南理工大学 | The Travel Time Estimation Method of particle swarm algorithm optimization LSTM neural network |
CN111783953A (en) * | 2020-06-30 | 2020-10-16 | 重庆大学 | 24-point power load value 7-day prediction method based on optimized LSTM network |
CN111985706A (en) * | 2020-08-15 | 2020-11-24 | 西北工业大学 | Scenic spot daily passenger flow volume prediction method based on feature selection and LSTM |
- 2021-10-18: Application CN202111213171.9A filed in China; published as CN113962454A; status: pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116561554A (en) * | 2023-04-18 | 2023-08-08 | 南方电网电力科技股份有限公司 | Feature extraction method, system, equipment and medium of boiler soot blower |
CN117455053A (en) * | 2023-10-31 | 2024-01-26 | 郑州轻工业大学 | Random configuration network prediction building energy consumption method based on search interval reconstruction |
CN118249408A (en) * | 2024-05-29 | 2024-06-25 | 浙江禹贡信息科技有限公司 | Grid-connected hybrid renewable energy system based on combination optimization and machine learning algorithm |
CN118249408B (en) * | 2024-05-29 | 2024-08-02 | 浙江禹贡信息科技有限公司 | Grid-connected hybrid renewable energy system based on combination optimization and machine learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||