CN113962454A

CN113962454A - LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization

Info

Publication number: CN113962454A
Application number: CN202111213171.9A
Authority: CN
Inventors: 谌东海; 王宁; 刘杰; 王伟; 刘畅
Original assignee: Changjiang Institute of Survey Planning Design and Research Co Ltd
Current assignee: Changjiang Institute of Survey Planning Design and Research Co Ltd
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2022-01-21

Abstract

The invention discloses an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization. The method comprises the following steps. Step one: perform correlation analysis over the time and feature dimensions of the original data set using the MI (mutual information) method, and select the top N′ features that are most effective for the energy consumption prediction target value. Step two: perform a secondary feature selection on these N′ features using the Pearson correlation coefficient, obtaining the N″ features retained after PMI dual feature selection. Step three: train an LSTM model on the data retained by PMI dual feature selection and use it for prediction, obtaining an initial prediction sequence y(t). Step four: optimize the LSTM hyperparameters units, dropout and batchsize with the PSO algorithm, thereby improving the prediction accuracy of the LSTM model and finally obtaining the PMI-LSTM-PSO model. The method offers high prediction accuracy, high algorithm efficiency and stable prediction performance.

Description

LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization

Technical field

The invention relates to the technical field of building energy consumption prediction; more specifically, it is an LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization.

Background

With the widespread use of increasingly complex technological products, demand for electricity is growing worldwide, and the power grid must be controlled to keep the electricity supply sustainable. In the era of artificial intelligence, the electric power Internet of Things has gradually entered daily life, and the development of smart grids requires matching metering capabilities, which gave rise to smart meters. The continued global expansion of smart metering infrastructure has also laid the foundation for introducing active energy systems into the smart grid. Since launching its "Strong Smart Grid" program in 2009, the State Grid Corporation of China has deployed technologies such as smart meters, distribution automation and embedded intelligence on a large scale.

For residential and commercial buildings, predicting energy consumption in order to use energy more efficiently and reduce consumption is of great practical significance. Commercial and residential buildings account for 30% to 40% of total energy consumption in smart buildings. Current trends suggest that this share is likely to grow in the near future, as global energy consumption and penetration keep increasing. Short-term energy consumption prediction is therefore essential, yet it is a challenging problem because of the complexity and many uncertainties of building infrastructure behavior, and because traditional power grids suffer from low efficiency, serious power waste, weak information exchange and a low degree of automation.

In view of this, researchers have developed many forecasting methods to improve grid quality and optimize energy use. In related studies, time series models such as ARIMA are often used as reference models to check whether a newly proposed method predicts better. Researchers currently often combine historical data with machine learning and deep learning algorithms such as artificial neural networks (ANN), support vector machines (SVM), adaptive neuro-fuzzy inference systems (ANFIS) and extreme learning machines (ELM) to make predictions. Convolutional neural networks and BP neural networks have also been studied for electricity consumption, but as prediction methods in this field they are still at an early stage.

During data preprocessing, the quality of feature selection on the original data largely determines the accuracy of the model. A prediction model improves when the number of input features is reduced by keeping only the most effective and useful inputs. Feature selection methods include correlation analysis and numerical sensitivity analysis, but these are linear input selection methods, whereas energy consumption data are nonlinear. The mutual information feature selection method is therefore more effective, and it computes the dependence between input and output data efficiently. Feature variable selection based on mutual information is a newer variable selection approach in which mutual information quantifies the dependence between different related variables.

1) MI mutual information algorithm

Mutual information (MI) measures the interdependence between two variables X and Y.

The mutual information I(X; Y) between X and Y is defined as:

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{1}$$

where p(x, y) is the joint probability density function and p(x), p(y) are the marginal probability density functions of x and y, respectively. MI evaluates how much information the occurrence of one event contributes to the occurrence of another. The MI method performs feature selection by computing the mutual information between every feature and the target feature, ranking the features, and keeping the N′ features with the highest dependence.

2) Pearson correlation coefficient

The Pearson correlation coefficient r between X and Y is:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2}$$

where X̄ and Ȳ are the mean values of X and Y, respectively. If r ≥ 0.5, X and Y are strongly correlated; otherwise the correlation between X and Y is weak. A secondary feature selection using the Pearson correlation coefficient further reduces the number of features.

3) LSTM model

LSTM is a deep learning model that can efficiently process long time series, learning from the data automatically and extracting deeper features. However, as with other neural network models, some LSTM hyperparameters are usually set from the researcher's experience, and such models lack scientific rigor. PSO is attractive because it is simple and easy to implement, converges quickly, and has few parameters to tune. Its particles are guided toward good solutions by their own best positions and by the best position of the whole swarm; genetic algorithms and ant colony algorithms do not have this kind of guidance mechanism.

The long short-term memory (LSTM) network was proposed by Hochreiter to solve the vanishing-gradient and exploding-gradient problems of back-propagation through time (BPTT). As the model was continually improved, it evolved into the widely used LSTM network architecture. Internally it consists of three distinct gate structures and one state module that stores memory. The structure of the LSTM unit is shown in Figure 1, where C_t is the state information stored by the LSTM unit, h_t is the output of the unit's hidden layer, f_t is the forget gate, i_t is the input gate, C̃_t is the candidate information at the current moment, o_t is the output gate, ⊙ denotes element-wise multiplication of matrices, and ⊕ denotes matrix addition.

Forget gate: controls the degree to which the previous unit state C_{t-1} is forgotten:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{3}$$

Input gate: controls which information is added to this unit:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{4}$$

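Cell state update: new information is selectively recorded into C_t according to f_t and i_t:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{5}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{6}$$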

Output gate: activates C_t and controls how much of C_t is passed through to the output:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{7}$$

$$h_t = o_t \odot \tanh(C_t) \tag{8}$$

W_f, W_i, W_C and W_o are the weight matrices of the corresponding modules, b_f, b_i, b_C and b_o are the bias terms, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function. The two activation functions are defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{9}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{10}$$

The output layer passes h_t through a fully connected (dense) layer according to Eq. (11) to obtain the final predicted value y_t:

$$y_t = \sigma(W_y \cdot h_t + b_y) \tag{11}$$

where W_y and b_y are the weight matrix and the bias term, respectively.

Through its gate functions, the LSTM controls how historical information is passed on, which gives it a degree of time series processing and forecasting capability.

4) PSO particle swarm optimization algorithm

The basic idea of the particle swarm algorithm: a flock of birds searches randomly for food within an area, and each bird knows only its own distance to the food and the positions of the other birds. When a bird moves from its current position to a new one, it relies on two pieces of information: the area around the bird that is currently closest to the food, and its own flying experience about where the food is.

PSO is initialized with a group of random particles (random solutions) and then searches iteratively for the optimal solution. In each iteration, every particle updates itself by tracking two "extremes": the local (personal) best solution pbest and the global best solution gbest. Having found these two values, the particle updates its velocity and position with the following formulas:

$$v_i = v_i + c_1 \times \mathrm{rand}() \times (pbest_i - x_i) + c_2 \times \mathrm{rand}() \times (gbest_i - x_i) \tag{12}$$

$$x_i = x_i + v_i$$

where i = 1, 2, ..., N, and N is the total number of particles in the swarm;

v_i: the current velocity of the i-th particle;

rand(): a random number in (0, 1);

x_i: the current position of the i-th particle;

c₁ and c₂: learning factors;

pbest_i and gbest_i: the local (personal) best position of particle i and the global best position of the swarm, respectively.

However, the existing MI mutual information algorithm, LSTM model and PSO particle swarm optimization algorithm give low accuracy and unstable performance for energy consumption prediction and do not meet the requirements of building energy consumption prediction. It is therefore necessary to develop a building energy consumption prediction method with high prediction accuracy and stable prediction performance.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide an LSTM energy consumption prediction method based on multi-dimensional feature selection + particle swarm optimization, an energy consumption prediction method for buildings with high prediction accuracy and stable prediction performance.

To achieve the above purpose, the technical solution of the present invention is an energy consumption prediction method based on MI-LSTM-PSO, characterized in that, as shown in FIG. 2, it comprises the following steps:

Step 1: Using the MI mutual information method, analyze the correlation over the time and feature dimensions of the original data set and select the top N′ features that are most effective for the energy consumption prediction target value, thereby eliminating redundant data and improving the efficiency of the model algorithm;

Step 2: Compute the Pearson correlation coefficient between each of the top N′ features selected by the MI method and the predicted sequence, and select the N″ features whose Pearson correlation coefficient is greater than or equal to 0.5;

Step 3: Use the LSTM model to train on and predict from the N″-dimensional feature data retained after PMI dual feature selection, obtaining the initial prediction sequence y(t);

Step 4: Use the PSO particle swarm optimization algorithm to optimize the LSTM hyperparameters units, dropout and batchsize, thereby improving the prediction accuracy of the LSTM model, and finally obtain the MI-LSTM-PSO model.

In the above technical solution, in steps 1 and 2, N′ is 60; that is, the top 60 features that are most effective for the energy consumption prediction target value are selected.

In the above technical solution, step 1 specifically comprises the following steps:

S11: Use a sliding window to turn the 20-dimensional (M = 20) feature data of the previous 24 hours into 24M (i.e., 480) feature components. The original data sequences comprise photovoltaic generation in 2 areas, energy consumption of different facilities in 17 areas, and the total electricity input from the grid to the system (the sequences differ between data sets for different scenarios);
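The sliding-window construction of S11 can be sketched as follows, assuming hourly records stored as lists in a fixed column order. The function name, the hour-major flattening order and the column names in the usage example are illustrative assumptions; only the shapes (24 lags × 20 features = 480 components, with next-hour Gi as the target) come from the text.

```python
def make_windows(rows, columns, target, lag=24):
    """Flatten `lag` hours of M features into one vector per forecast hour.

    rows:    hourly records, each a list of M values ordered like `columns`
    columns: feature names (M of them); `target` must be one of them
    Returns X (each row has lag * M values), y (the target value of the next
    hour) and names such as "Gi(t-1)" for every flattened component.
    """
    ti = columns.index(target)
    X, y = [], []
    for t in range(lag, len(rows)):
        window = rows[t - lag:t]                        # hours t-lag .. t-1
        X.append([value for hour in window for value in hour])
        y.append(rows[t][ti])                           # value to predict at hour t
    names = [f"{c}(t-{lag - h})" for h in range(lag) for c in columns]
    return X, y, names


if __name__ == "__main__":
    cols = ["Gi"] + [f"feat{j}" for j in range(1, 20)]  # 20 hypothetical column names
    hourly = [[hour * 100 + j for j in range(20)] for hour in range(30)]
    X, y, names = make_windows(hourly, cols, "Gi")
    print(len(X), len(X[0]), names[0], names[-1])       # 6 480 Gi(t-24) feat19(t-1)
```

The component names follow the Gi(t-1) convention used for Table 2, which makes it easy to read off which lag of which feature survives the MI ranking.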

S12: Use the MI mutual information method to perform feature selection on the above 24M (i.e., 480) feature components:

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{1}$$

where p(x, y) is the joint probability density function of x and y, and p(x) and p(y) are the marginal density functions. If x and y are completely independent, p(x, y) equals p(x)p(y) and the mutual information is 0; the larger I(X; Y), the stronger the dependence between the two variables;

S13: Determine the optimal MI feature selection dimension N′ experimentally. If N′ is too large, the training data set contains too much redundant information and noise, which degrades prediction performance; if N′ is too small, the training data set contains too little information, which also degrades the predictions. The optimal value usually lies between 3M and 6M (60 to 120 for M = 20); choose a feature dimension that gives good prediction performance with a relatively small N′;

S14: Based on the mutual information ranking between the feature sequences x(t) and the target sequence Y, and combining the time and feature dimensions, select the top 60 features that are most effective for the energy consumption prediction target value as the training data set of the subsequent model.

In the above technical solution, step 2 specifically comprises the following steps:

S21: Compute the Pearson correlation coefficient between each of the above 60 feature components and the target sequence Y (i.e., Gi):

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2}$$

where X̄ and Ȳ are the mean values of X and Y, respectively; if r ≥ 0.5, X and Y are strongly correlated, otherwise the correlation between X and Y is weak;

S22: Since a Pearson correlation coefficient below 0.5 indicates a weak correlation, keep the 37 feature dimensions whose Pearson correlation coefficient is greater than or equal to 0.5 (i.e., N″ = 37).
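The screening of S21 and S22 amounts to computing Eq. (2) for each retained feature and applying the 0.5 threshold. A minimal sketch follows; the helper names are hypothetical, and a constant series (for which r is undefined) is treated as uncorrelated by assumption. As written, the rule r ≥ 0.5 keeps only positively correlated features; the text does not say whether strongly negative correlations should also be kept (a variant using |r| would keep them).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of Eq. (2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant series; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def pearson_filter(features, target, threshold=0.5):
    """Keep the named features whose r with the target is >= threshold (S22)."""
    return [name for name, col in features.items() if pearson(col, target) >= threshold]


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0]
    candidates = {"up": [2.0, 4.0, 6.0, 8.0], "down": [4.0, 3.0, 2.0, 1.0], "flat": [1.0] * 4}
    print(pearson_filter(candidates, target))  # ['up']
```

Applied to the 60 MI-selected components, this filter is what reduces the set to the N″ features passed to the LSTM.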

In the above technical solution, the LSTM network contains three gate structures and one state module for storing memory, as shown in Figure 1; step 3 specifically comprises the following steps:

S31: Let C_t be the state information stored by the LSTM unit, x_t the input of the input layer, h_t the output of the unit's hidden layer, f_t the forget gate, i_t the input gate, C̃_t the candidate information at the current moment and o_t the output gate; "×" denotes element-wise multiplication of matrices, "+" denotes addition, and σ is the sigmoid function;

S32: Forget gate, which controls the degree to which the previous unit state C_{t-1} is forgotten. Its expression is:

$$f_t = \sigma(W_f * [h_{t-1}, x_t] + b_f) \tag{3}$$

S33: Input gate, which controls which information is added to this unit. Its expression is:

$$i_t = \sigma(W_i * [h_{t-1}, x_t] + b_i) \tag{4}$$

S34: State information stored by the unit, used to selectively record new information into C_t according to f_t and i_t. Its expressions are:

$$\tilde{C}_t = \tanh(W_C * [h_{t-1}, x_t] + b_C) \tag{5}$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{6}$$

S35: Output gate, which activates C_t and controls the degree to which C_t is filtered. Its expressions are:

$$o_t = \sigma(W_o * [h_{t-1}, x_t] + b_o) \tag{7}$$

$$h_t = o_t * \tanh(C_t) \tag{8}$$

where h_t is the output of the unit's hidden layer and h_{t-1} is the output of the previous unit's hidden layer; W_f, W_i, W_C and W_o are the weight matrices corresponding to f_t, i_t, C̃_t and o_t, respectively; b_f, b_i, b_C and b_o are the corresponding bias terms; σ is the sigmoid function and tanh is the hyperbolic tangent activation function, defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{9}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{10}$$

S36: The output layer passes h_t through a fully connected layer to obtain the final predicted value y_t according to:

$$y_t = \sigma(W_y * h_t + b_y) \tag{11}$$

In the above formula, W_y and b_y are the weight matrix and the bias term, respectively.

In the above technical solution, step 4 specifically comprises the following steps:

S41: Initialize the parameters to be optimized and set their ranges: units ∈ [20, 300], dropout ∈ [0, 1], batchsize ∈ [20, 300];

S42: Within the initial ranges, randomly initialize a swarm of 20 particles. Using the fitness function (the LSTM model fitting result), compute each particle's fitness value (mean absolute error, MAE), and from the current MAE of each particle determine the best position found in this iteration (pbest) and the best position in the swarm's history (gbest);

S43: Update the velocity and position of each particle according to the positions of the best particles, fit the LSTM model with the updated particles, compute each particle's MAE, and update pbest and gbest according to the MAE:

$$v_i = v_i + c_1 \times \mathrm{rand}() \times (pbest_i - x_i) + c_2 \times \mathrm{rand}() \times (gbest_i - x_i) \tag{12}$$

$$x_i = x_i + v_i$$

In Eq. (12): i = 1, 2, ..., N, and N is the total number of particles in the swarm;

v_i: the current velocity of the i-th particle;

rand(): a random number in (0, 1);

x_i: the current position of the i-th particle;

c₁ and c₂: learning factors;

pbest_i and gbest_i: the local (personal) best position of particle i and the global best position of the swarm, respectively;

S44: After the LSTM model has been trained with the updated particles, compute each particle's fitness value and, according to the fitness values, update the best position of the swarm in this iteration and the best position in the swarm's history;

S45: When the fitness value of the best particle no longer changes, or the number of iterations reaches the upper limit, the algorithm is considered to have converged; if it has not converged, return to S43 and update the particles again;

S46: Substitute the optimal particle parameters units, dropout and batchsize into the LSTM model, run the model prediction on the data from step 1, and obtain the final prediction result.
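Steps S41 to S46 can be sketched as the loop below. It is a hedged illustration under several assumptions not stated in the text: learning factors c₁ = c₂ = 2, a velocity clamp of 20% of each range, positions clipped to the S41 ranges, units and batchsize rounded to integers, and "no longer changes" in S45 interpreted as no improvement of gbest for `patience` consecutive iterations. `toy_fitness` is a hypothetical stand-in; in the method described, the fitness call would train the LSTM with the decoded hyperparameters and return its MAE.

```python
import random

# S41: search ranges for the three LSTM hyperparameters
BOUNDS = {"units": (20, 300), "dropout": (0.0, 1.0), "batchsize": (20, 300)}
INTEGER_PARAMS = {"units", "batchsize"}


def decode(position):
    """Clip a continuous particle position to BOUNDS and round integer hyperparameters."""
    params = {}
    for (name, (lo, hi)), value in zip(BOUNDS.items(), position):
        value = min(max(value, lo), hi)
        params[name] = int(round(value)) if name in INTEGER_PARAMS else value
    return params


def pso_search(fitness, n_particles=20, max_iter=50, patience=10, c1=2.0, c2=2.0, seed=0):
    """S42-S46: PSO over (units, dropout, batchsize); fitness returns an MAE to minimize."""
    rng = random.Random(seed)
    bounds = list(BOUNDS.values())
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]       # velocity clamp (assumption)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [fitness(**decode(x)) for x in xs]      # S42: score the initial swarm
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    stale = 0
    for _ in range(max_iter):
        improved = False
        for k, x in enumerate(xs):
            for d, (lo, hi) in enumerate(bounds):       # S43: Eq. (12) update
                vs[k][d] += (c1 * rng.random() * (pbest[k][d] - x[d])
                             + c2 * rng.random() * (gbest[d] - x[d]))
                vs[k][d] = max(-vmax[d], min(vmax[d], vs[k][d]))
                x[d] = min(max(x[d] + vs[k][d], lo), hi)
            value = fitness(**decode(x))                # S44: re-evaluate the particle
            if value < pbest_val[k]:
                pbest[k], pbest_val[k] = x[:], value
            if value < gbest_val:
                gbest, gbest_val, improved = x[:], value, True
        stale = 0 if improved else stale + 1
        if stale >= patience:                           # S45: best fitness stopped changing
            break
    return decode(gbest), gbest_val                     # S46: settings for the final LSTM


def toy_fitness(units, dropout, batchsize):
    """Hypothetical stand-in for 'train the LSTM and return its MAE'."""
    return abs(units - 128) / 280 + abs(dropout - 0.2) + abs(batchsize - 64) / 280


if __name__ == "__main__":
    best, mae = pso_search(toy_fitness)
    print(best, round(mae, 4))
```

Since every fitness call would mean a full LSTM training run, the swarm size and iteration cap dominate the cost of this step; keeping the best position as a continuous vector and decoding only for evaluation lets the Eq. (12) update act on smooth coordinates.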

Throughout the above, "*" denotes multiplication.

The present invention has the following advantages:

(1) It is an energy consumption prediction method for buildings with high prediction accuracy and stable prediction performance;

(2) MI feature selection discards 87.5% of the features as redundant, which markedly improves the efficiency of the model algorithm;

(3) The PSO algorithm optimizes the LSTM hyperparameters units, dropout and batchsize, which improves the prediction accuracy of the LSTM model and gives a good model fit;

(4) The predicted values of the PMI-PSO-LSTM model lie essentially within the confidence interval of the true values, and the predicted trend closely follows the true values, giving high prediction accuracy;

(5) The MAE and SMAPE of the PMI-PSO-LSTM combined model are better than all results of the other models, showing higher robustness and more stable prediction performance.

Description of drawings

Figure 1 is a schematic diagram of the internal structure of an existing LSTM unit.

Figure 2 is a schematic diagram of the structure of the PMI-PSO-LSTM model of the present invention.

Figure 3 is a curve comparing the prediction results of the basic models in an embodiment of the present invention.

Figure 4 is a scatter plot comparing the prediction results of the basic models in an embodiment of the present invention.

Figure 5 is a curve comparing the prediction results of the combined models in an embodiment of the present invention.

Figure 6 is a scatter plot comparing the prediction results of the combined models in an embodiment of the present invention.

Figure 7 is a chart comparing the model evaluation indicators in an embodiment of the present invention.

Detailed description of embodiments

The implementation of the present invention is described in detail below with reference to the accompanying drawings; the description is illustrative only and does not limit the invention. It also makes the advantages of the invention clearer and easier to understand.

Example

The present invention is described in detail below using electricity consumption prediction for a particular building as an example; the example also serves as guidance for applying the invention to energy consumption prediction for other buildings.

In this implementation, the historical electricity consumption of the building is treated as a time series for short-term, single-step (1 h ahead) electricity consumption prediction.

In this embodiment, the electricity consumption prediction for the building comprises the following:

1. Experimental data set and MI feature selection

The data set used in this example is the electricity consumption of a building from October 15, 2019 to June 4, 2019, with 20 features in total. The features are described in Table 1, whose fifth column gives the Pearson correlation coefficient between each feature and the Gi feature.

Table 1 Dataset description

In this embodiment, the data of the previous 24 hours are used to predict the value of Gi in the next hour, so a sliding window turns the data of the 20 features over 24 hours into 480 feature components. The MI method then selects, from these 480 feature components, the top 60 features with the largest MI values.

The selection results are shown in Table 2.

Table 2 Features selected by MI

Here a selected feature such as Gi(t-1) denotes the electricity input from the industrial plant's public power grid one hour before the current time. The MI value is the mutual information between the feature component X and the Gi component at the current time, i.e., I(X; Gi(t)). Table 2 shows that most features from the previous four hours have large mutual information with the current Gi, and that the Gi, Ao, Co and A2 features over the previous 24 hours also have relatively large mutual information with the current Gi. MI selection discards 87.5% of the features (420 of 480) as redundant, which substantially improves the efficiency of the model algorithm.

The data set used in this embodiment has 20 features, and the data of the previous 24 hours are used to predict the data of the 25th hour.

Experiments on this 20-feature data set show that:

1) Selecting the top 60 features gives prediction results similar to selecting 100;

2) Increasing the feature dimension further (beyond 100) degrades the predictions;

3) Reducing the feature dimension (below 60) leaves the data set with too little information, which also degrades the predictions.

Therefore, this embodiment uses the MI method to select the top 60 features with the largest MI values among the 480 feature components formed by the sliding window method.

2.评价指标2. Evaluation indicators

使用4种评价指标来评判模型的好坏程度。Use 4 evaluation indicators to judge the quality of the model.

均方根误差：RMSE，数值越小，表示模型拟合效果越好。Root mean square error: RMSE, the smaller the value, the better the model fitting effect.

平均绝对误差：MAE，数值越小，表示模型拟合效果越好。Mean absolute error: MAE, the smaller the value, the better the model fitting effect.

对称平均绝对百分比误差：SMAPE，数值越小，表示模型拟合效果越好。Symmetric mean absolute percentage error: SMAPE, the smaller the value, the better the model fitting effect.

可决系数：R2，数值越大，表示模型拟合效果越好。Coefficient of determination: R2, the larger the value, the better the model fitting effect.

In formulas (13), (14), (15) and (16), $\hat{y}_i$ is the predicted value, $y_i$ is the true value, $\bar{y}$ is the mean of the true values, and n is the number of data points.
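Formulas (13)-(16) themselves are not reproduced in this text. The sketch below uses the standard definitions of RMSE, MAE and R2; SMAPE has several variants in the literature, and the one used here (|ŷ - y| divided by (|y| + |ŷ|)/2, averaged and expressed in percent) is an assumption.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error: lower is better."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error: lower is better."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def smape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant with (|y| + |y_hat|) / 2 in the
    denominator); undefined when a true value and its prediction are both 0."""
    terms = (abs(b - a) / ((abs(a) + abs(b)) / 2) for a, b in zip(y, y_hat))
    return 100.0 * sum(terms) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: higher is better (1 is a perfect fit)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y_true = [100.0, 110.0, 120.0]
y_pred = [98.0, 112.0, 120.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), smape(y_true, y_pred), r2(y_true, y_pred))
```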

3. Model parameter settings

To verify the prediction performance of the proposed PMI-LSTM-PSO combined model, this embodiment compares the six experimental models (M1-M6) listed in Table 3, in two groups. The main parameters of the models are shown in Tables 4 and 5.

Table 3 Benchmark models for experimental comparison

| No | Model | Description |
|---|---|---|
| M1 | ARIMA | Autoregressive integrated moving average model |
| M2 | KNR | K-nearest neighbor regression model |
| M3 | LSTM | LSTM model |
| M4 | MI-LSTM | Mutual information feature selection + LSTM model |
| M5 | PMI-LSTM | Dual feature selection (mutual information + Pearson) + LSTM model |
| M6 | PMI-LSTM-PSO | Dual feature selection (mutual information + Pearson) + PSO-optimized LSTM model |

Table 4 Main parameters of the comparison models (part 1)

Table 5 Main parameters of the comparison models (part 2)

4. Analysis of model experiment data

4.1 Analysis of experimental results of the basic models

In this embodiment, the basic models M1-M3 in Table 3 are compared in single-step prediction of the total electricity Gi input from the public grid, using features 1-20.

The comparison results (Table 6) show that the LSTM model gives the best predictions on all four evaluation indicators: coefficient of determination, RMSE, MAE and SMAPE.

Table 6 Comparison of basic model experiments

| Model | R2 | RMSE | MAE | SMAPE |
|---|---|---|---|---|
| ARIMA | 0.872609 | 12.1688 | 7.496174 | 8.548175 |
| KNR | 0.849556 | 13.21612 | 8.155453 | 9.543262 |
| LSTM | 0.889503 | 11.211024 | 6.622012 | 7.594866 |

Figures 3 and 4 compare the 1 h electricity consumption predicted by ARIMA, K-nearest neighbors and LSTM with the true values. The LSTM prediction follows the trend of the true values most closely, and only the LSTM prediction lies within the confidence interval of the original values. The ARIMA and K-nearest neighbor prediction curves not only fall outside the confidence interval of the true values but also lag behind them. In summary, LSTM gives the best predictions of the three models, so it is chosen as the base model for the experiments.

4.2 Analysis of experimental results of the LSTM combined models

In this embodiment, the combined models M3-M6 in Table 3 are compared in 20 groups of single-step prediction experiments on the total electricity Gi input from the public grid, using features 1-20.

Figures 5 and 6 compare the four models' 1 h predictions of Gi with the true values. The predictions of all four models lie essentially within the confidence interval of the true values, and the trend predicted by the PMI-LSTM-PSO model is closest to the true values. Figure 7 shows that the PMI-LSTM-PSO model is best on every evaluation indicator (in Figure 7, M3, M4, M5 and M6 denote the combined models M3-M6 of Table 3).

Table 7 gives the mean of the 20 experimental runs for each of the four combined models; the first four columns are the evaluation indicators and the fifth is the training time. Compared with LSTM, MI-LSTM and PMI-LSTM, the PMI-LSTM-PSO model shows no clear gain in R2 (its mean R2 of 0.90482 exceeds that of LSTM and MI-LSTM but is below the 0.92301 of PMI-LSTM), but it reduces MAE by about 15%, 8% and 5% and SMAPE by about 20%, 12% and 8%, respectively. Compared with LSTM, MI-LSTM does not improve performance significantly, but MI feature selection reduces the input dimension by 87.5%, cutting training time by about 63% (159 s to 59 s). Compared with MI-LSTM, PMI-LSTM brings little performance gain, but the secondary feature selection reduces the input dimension by a further 40% or so, cutting training time by about 20% (59 s to 46 s).

Table 7 Comparison of evaluation indicators of the combined models

| Model | R2 | RMSE | MAE | SMAPE | t (s) |
|---|---|---|---|---|---|
| LSTM | 0.88724 | 11.18282 | 7.19766 | 8.56986 | 159 |
| MI-LSTM | 0.89722 | 10.67590 | 6.66639 | 7.82360 | 59 |
| PMI-LSTM | 0.92301 | 10.73256 | 6.42070 | 7.49299 | 46 |
| PMI-LSTM-PSO | 0.90482 | 10.27717 | 6.12843 | 6.87869 | 44 |
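The relative changes quoted in the text can be checked directly from the Table 7 means with a few lines of arithmetic (a verification sketch, not part of the patented method). It gives MAE reductions of about 14.9%, 8.1% and 4.6% and SMAPE reductions of about 19.7%, 12.1% and 8.2% for the last row of Table 7 (model M6) relative to LSTM, MI-LSTM and PMI-LSTM, and training-time reductions of about 62.9% (LSTM to MI-LSTM) and 22.0% (MI-LSTM to PMI-LSTM).

```python
# Mean results of the 20 runs in Table 7: (R2, RMSE, MAE, SMAPE, training time in s).
# "M6" is the last row of the table.
table7 = {
    "LSTM": (0.88724, 11.18282, 7.19766, 8.56986, 159),
    "MI-LSTM": (0.89722, 10.67590, 6.66639, 7.82360, 59),
    "PMI-LSTM": (0.92301, 10.73256, 6.42070, 7.49299, 46),
    "M6": (0.90482, 10.27717, 6.12843, 6.87869, 44),
}
MAE, SMAPE, TIME = 2, 3, 4  # column indices into each tuple


def reduction(old: float, new: float) -> float:
    """Relative reduction (old - new) / old as a fraction."""
    return (old - new) / old


for name in ("LSTM", "MI-LSTM", "PMI-LSTM"):
    base, m6 = table7[name], table7["M6"]
    print(f"M6 vs {name}: MAE -{reduction(base[MAE], m6[MAE]):.1%}, "
          f"SMAPE -{reduction(base[SMAPE], m6[SMAPE]):.1%}")

print(f"time LSTM -> MI-LSTM: -{reduction(table7['LSTM'][TIME], table7['MI-LSTM'][TIME]):.1%}")
print(f"time MI-LSTM -> PMI-LSTM: -{reduction(table7['MI-LSTM'][TIME], table7['PMI-LSTM'][TIME]):.1%}")
```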

Figure 7 shows box plots of the four evaluation indicators over the 20 runs of M3-M6; the '+' symbols outside the boxes are outliers and can be ignored. The PMI-LSTM-PSO model is clearly better than the other three models on all four indicators: in every run its MAE and SMAPE are better than all results of the other models, and about 95% of its R2 and RMSE values are better than those of the other models. Although the four indicators of MI-LSTM partially overlap with those of LSTM, MI-LSTM is better overall. Compared with LSTM, MI-LSTM and PMI-LSTM, the PMI-LSTM-PSO model also has the narrowest boxes (smallest interquartile range), indicating that it is more stable than the other models.

In summary, the present invention proposes a combined short-term energy consumption prediction model based on PMI, PSO and LSTM. First, in the data preprocessing stage, dual feature selection with the mutual information method and the Pearson correlation coefficient removes redundant features from the original data. PSO then tunes the LSTM network configuration so that its topology best fits the current input data. Finally, the feature-selected data are fed into the optimized LSTM for short-term prediction of energy consumption. To verify the effect of the PMI-LSTM-PSO model on short-term energy consumption prediction, multi-dimensional single-step prediction comparison experiments were carried out on the energy consumption time series of a building. The results show that the PMI-LSTM-PSO combined model is optimal on all four evaluation indicators, indicating higher prediction accuracy, robustness and more stable prediction performance. The PMI-LSTM-PSO combined model offers a useful research direction for time series prediction with deep learning. However, it still leaves considerable room for improvement, for example noise filtering of the time series and dynamic intelligent feature selection, which could further improve prediction accuracy.

Parts not described herein belong to the prior art.

Claims

1. An LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization, characterized by comprising the following steps:

step one: performing correlation analysis on the time and feature dimensions of the original data set using the MI mutual information method, and selecting the top N features most informative about the energy consumption prediction target;

step two: performing secondary feature selection on the N features selected in step one using the Pearson correlation coefficient, to obtain the N' features after PMI feature selection;

step three: training an LSTM model on the N'-dimensional feature data after PMI feature selection and using it for prediction, to obtain an initial prediction sequence y(t);

step four: optimizing the LSTM hyperparameters units, dropout and batchsize with the particle swarm optimization (PSO) algorithm, thereby improving the prediction accuracy of the LSTM model and finally obtaining the PMI-LSTM-PSO model.

2. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 1, wherein step one specifically comprises the following steps:

S11: using a sliding window, forming the M-dimensional feature data of the previous 24 hours into 24M feature components, wherein the original data sequence includes the photovoltaic generation of 2 areas, the energy consumption of 17 different facilities in the areas, and the total electricity input from the system grid;

S12: selecting features from the 24M feature components using the MI mutual information method, where the mutual information of two variables X and Y is

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (1)$$

in formula (1): p(x, y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal density functions; if X and Y are completely unrelated (independent), p(x, y) equals p(x)p(y) and the mutual information equals 0; the larger I(X;Y), the stronger the correlation between the two variables;

S13: determining the optimal MI feature-selection dimension N experimentally: if N is too large, the training data set contains too much redundant information and noise, which degrades prediction performance; if N is too small, it contains too little information, which also degrades the prediction result; generally the optimal N lies between 3M and 6M, and among the dimensions with good prediction performance the smaller N is selected;

S14: ranking the feature sequences x(t) by their mutual information with the target sequence Y across both the time and feature dimensions, and selecting the top N features most informative about the energy consumption prediction target as the training data set for the subsequent model.
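S13 leaves the meaning of "better prediction performance and smaller N" unspecified. A minimal sketch of the search over 3M ≤ N ≤ 6M, assuming a 5% relative tolerance on validation error (a hypothetical choice) and a user-supplied evaluation function; the error curve in the example is synthetic and is not experimental data.

```python
from typing import Callable, Iterable


def choose_n(candidates: Iterable[int], evaluate: Callable[[int], float],
             rel_tol: float = 0.05) -> int:
    """Pick the smallest N whose validation error is within rel_tol of the best.

    `evaluate(n)` is a hypothetical callback that trains the model on the top-n
    MI features and returns a positive validation error (e.g. MAE).
    """
    scores = {n: evaluate(n) for n in candidates}
    best = min(scores.values())
    return min(n for n, err in scores.items() if err <= best * (1 + rel_tol))


def toy_error(n: int) -> float:
    # Synthetic U-shaped curve for illustration only (not experimental data).
    return (n - 80) ** 2 / 1000 + 5.0


M = 20
print(choose_n(range(3 * M, 6 * M + 1, 10), toy_error))  # → 70
```

Preferring the smallest N within tolerance reflects the trade-off in S13: fewer inputs mean shorter training, as long as accuracy is not noticeably worse.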

3. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 2, wherein step two specifically comprises the following steps:

S21: calculating the Pearson correlation coefficient between each of the N feature components and the target sequence Y:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \qquad (2)$$

in formula (2): $\bar{X}$ and $\bar{Y}$ are the mean values of X and Y respectively, and n is the number of samples;

if r ≥ 0.5, the correlation between X and Y is regarded as strong; otherwise it is regarded as weak;

S22: selecting the N' features whose Pearson correlation coefficient is greater than or equal to 0.5.
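Steps S21-S22 amount to computing r for each MI-selected column and keeping those at or above the threshold. A minimal sketch in plain Python; the function names and the column-index interface are illustrative assumptions, not taken from the patent.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r of formula (2); 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def pearson_filter(X: Sequence[Sequence[float]], y: Sequence[float],
                   columns: Sequence[int], threshold: float = 0.5) -> List[int]:
    """Keep the MI-selected columns whose r with the target is >= threshold (S22)."""
    return [j for j in columns if pearson([row[j] for row in X], y) >= threshold]


y = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[2 * v, -v, float(k % 2 == 0)] for k, v in enumerate(y)]
print(pearson_filter(X, y, columns=[0, 1, 2]))  # → [0]
```

Because S22 tests r ≥ 0.5 rather than |r| ≥ 0.5, a strongly negatively correlated feature (column 1 in the example, r = -1) is discarded; a variant using `abs(r)` would keep it.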

4. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 3, wherein the LSTM network contains three gate structures and a state module for storing memory, and step three specifically comprises the following steps:

S31: letting $C_t$ be the state information stored in the current LSTM cell, $x_t$ the input of the input layer, $h_t$ the output of the cell's hidden layer, $f_t$ the forget gate, $i_t$ the input gate, $\tilde{C}_t$ the candidate information at the current time, and $o_t$ the output gate; "×" denotes element-wise multiplication, "+" denotes addition, and σ is the sigmoid function;

S32: forget gate, which controls the degree to which the previous cell state $C_{t-1}$ is forgotten:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (3)$$

S33: input gate, which controls which information is added to the cell:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (4)$$

S34: cell state: according to $f_t$ and $i_t$, new information is selectively recorded into $C_t$:

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (5)$$

$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \qquad (6)$$

S35: output gate, which activates $C_t$ and controls how much of it is passed to the output:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (7)$$

$$h_t = o_t \times \tanh(C_t) \qquad (8)$$

in formulas (3) to (8): $W_f$, $W_i$, $W_C$ and $W_o$ are the weight matrices corresponding to $f_t$, $i_t$, $\tilde{C}_t$ and $o_t$ respectively, and $b_f$, $b_i$, $b_C$ and $b_o$ are the corresponding bias terms; σ is the sigmoid function and tanh is the hyperbolic tangent activation function, defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (9)$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (10)$$

S36: the output layer passes $h_t$ through a fully connected layer to obtain the final predicted value $y_t$:

$$y_t = \sigma\left(W_y \cdot h_t + b_y\right) \qquad (11)$$

in formula (11): $W_y$ and $b_y$ are the weight matrix and the bias term respectively.
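To make formulas (3)-(11) concrete, here is a single forward step in plain Python. It is an illustrative sketch only (no training, no batching); the parameter names `W_C`/`b_C` for the candidate-state weights and the dictionary layout are assumptions, and a practical implementation would use a deep learning framework's LSTM layer.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Formula (9), written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W: Matrix, v: Sequence[float], b: Sequence[float]) -> Vector:
    """Matrix-vector product W·v plus bias b (one row of W per output unit)."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float],
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step following formulas (3)-(8)."""
    z = list(h_prev) + list(x_t)                                     # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]          # (3) forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]          # (4) input gate
    c_tilde = [math.tanh(a) for a in affine(p["W_C"], z, p["b_C"])]  # (5) candidate
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]  # (6)
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]          # (7) output gate
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                 # (8)
    return h, c


def output_layer(h: Sequence[float], W_y: Matrix, b_y: Sequence[float]) -> Vector:
    """Fully connected output layer with sigmoid, formula (11)."""
    return [sigmoid(a) for a in affine(W_y, h, b_y)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Hidden size 2, input size 3, all-zero weights: every gate evaluates to 0.5.
H, D = 2, 3
params = {k: zeros(H, H + D) for k in ("W_f", "W_i", "W_C", "W_o")}
params.update({k: [0.0] * H for k in ("b_f", "b_i", "b_C", "b_o")})
h, c = lstm_step([0.3, -0.1, 0.8], [0.0, 0.0], [1.0, -2.0], params)
print(c, output_layer(h, zeros(1, H), [0.0]))  # → [0.5, -1.0] [0.5]
```

With all weights zero, the gates are 0.5 and the candidate state is 0, so the cell state is simply halved; this makes the example easy to check by hand.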

5. The LSTM energy consumption prediction method based on dual feature selection + particle swarm optimization according to claim 4, wherein step four specifically comprises the following steps:

S41: initializing the parameters to be optimized, with the ranges units ∈ [20, 300], dropout ∈ [0, 1] and batchsize ∈ [20, 300];

S42: randomly initializing the particle swarm within the initial ranges, calculating the fitness value of each particle with the fitness function, and determining the best position of the current iteration (pbest) and the historical best position of the swarm (gbest) from the prediction index MAE of each particle;

S43: updating the velocity and position of each particle according to the positions of the optimal particles, fitting each updated particle with the LSTM model, calculating its MAE, and updating pbest and gbest accordingly:

$$v_i = v_i + c_1 \times \mathrm{rand}() \times (pbest_i - x_i) + c_2 \times \mathrm{rand}() \times (gbest_i - x_i) \qquad (12)$$

$$x_i = x_i + v_i$$

in formula (12): i = 1, 2, …, N, where N is the total number of particles in the swarm;

$v_i$: the current velocity of the i-th particle;

rand(): a random number in (0, 1);

$x_i$: the current position of the i-th particle;

$c_1$ and $c_2$: learning factors;

$pbest_i$ and $gbest_i$: the local optimal position and the global optimal position of the current particle swarm, respectively;

S44: after training the LSTM model with the updated particles, calculating the fitness value of each particle and updating the best position of the swarm in this iteration and the historical best position of the swarm accordingly;

S45: when the fitness value of the optimal particle no longer changes or the number of iterations reaches the upper limit, the algorithm is considered to have converged; otherwise, the flow returns to S43 to update the particles;

S46: substituting the optimal particle parameters units, dropout and batchsize into the LSTM model and predicting on the feature-selected data to obtain the final prediction result.
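Steps S41-S46 can be sketched as a small PSO loop. This is an illustrative sketch, not the patent's implementation: the toy fitness function stands in for training the LSTM and returning its validation MAE, the swarm size, iteration limit, patience and learning factors c1 = c2 = 2 are hypothetical defaults, and pbest is taken as each particle's own best position. Formula (12) has no inertia weight, and without some velocity limit the particles tend to diverge, so this sketch assumes velocities are clamped to the width of each range and positions are clipped to the S41 bounds.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def pso(fitness: Callable[[List[float]], float], bounds: Bounds, n_particles: int = 10,
        max_iter: int = 30, c1: float = 2.0, c2: float = 2.0, patience: int = 10,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `fitness` with the velocity/position update of formula (12)."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [fitness(p) for p in x]                      # S42: initial fitness
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    stale = 0
    for _ in range(max_iter):
        improved = False
        for k in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):            # S43: formula (12)
                v[k][d] += (c1 * rng.random() * (pbest[k][d] - x[k][d])
                            + c2 * rng.random() * (gbest[d] - x[k][d]))
                v[k][d] = max(lo - hi, min(hi - lo, v[k][d]))  # clamp velocity
                x[k][d] = max(lo, min(hi, x[k][d] + v[k][d]))  # clip position
            val = fitness(x[k])                              # S44: re-evaluate
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = x[k][:], val
            if val < gbest_val:
                gbest, gbest_val, improved = x[k][:], val, True
        stale = 0 if improved else stale + 1
        if stale >= patience:                                # S45: no further change
            break
    return gbest, gbest_val


BOUNDS = [(20.0, 300.0), (0.0, 1.0), (20.0, 300.0)]  # units, dropout, batchsize (S41)


def decode(p: Sequence[float]) -> Tuple[int, float, int]:
    """Round units and batchsize to integers before building the model."""
    return int(round(p[0])), p[1], int(round(p[2]))


def toy_fitness(p: List[float]) -> float:
    # Stand-in for "train the LSTM with these hyperparameters, return validation MAE";
    # a synthetic bowl with its minimum at (128, 0.2, 64), not real data.
    units, dropout, batch = decode(p)
    return ((units - 128) / 280) ** 2 + (dropout - 0.2) ** 2 + ((batch - 64) / 280) ** 2


best, best_val = pso(toy_fitness, BOUNDS)
print(decode(best), best_val)  # S46: these values would configure the final model
```

Because gbest is only replaced by strictly better positions, the returned fitness can never be worse than the best particle of the random initial swarm.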