CN106709820A - Power system load prediction method and device based on deep belief network - Google Patents


Info

Publication number
CN106709820A
CN106709820A (application CN201710021315.8A)
Authority
CN
China
Prior art keywords
training
layer
power system
visible
hidden layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710021315.8A
Other languages
Chinese (zh)
Inventor
吴争荣
董旭柱
陆锋
刘志文
陶文伟
谢雄威
陈立明
何锡祺
俞小勇
陈根军
禤亮
苏颜
李瑾
陶凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China South Power Grid International Co ltd
China Southern Power Grid Co Ltd
NR Electric Co Ltd
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Nanning Power Supply Bureau of Guangxi Power Grid Co Ltd
Power Grid Technology Research Center of China Southern Power Grid Co Ltd
Original Assignee
China South Power Grid International Co ltd
China Southern Power Grid Co Ltd
NR Electric Co Ltd
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Nanning Power Supply Bureau of Guangxi Power Grid Co Ltd
Power Grid Technology Research Center of China Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China South Power Grid International Co ltd, China Southern Power Grid Co Ltd, NR Electric Co Ltd, Electric Power Research Institute of Guangxi Power Grid Co Ltd, Nanning Power Supply Bureau of Guangxi Power Grid Co Ltd, Power Grid Technology Research Center of China Southern Power Grid Co Ltd filed Critical China South Power Grid International Co ltd
Priority to CN201710021315.8A priority Critical patent/CN106709820A/en
Publication of CN106709820A publication Critical patent/CN106709820A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The embodiment of the invention provides a method and a device for predicting a load of a power system based on a deep belief network, relates to the field of power systems, and can improve convergence speed and reduce prediction errors. The specific scheme comprises the following steps: acquiring a training sample and a test sample; constructing an energy function of the RBM model; training the at least one hidden layer and the visible layer by utilizing the training sample to obtain the weight of the training sample between the nodes of the at least one hidden layer and the visible layer; and inputting the output data obtained by the training sample and the test sample into the trained DBN to obtain a predicted value of the power system load. The method is used for predicting the load of the power system.

Description

Power system load prediction method and device based on deep belief network
Technical Field
The embodiment of the invention relates to the field of power systems, in particular to a power system load prediction method and device based on a deep belief network.
Background
Load prediction is an important component of power system planning and the basis of economical power system operation. Starting from the known power demand, it predicts future demand while fully considering relevant factors such as politics, economy, and climate. Accurate load prediction data benefit power grid dispatching control and safe operation, support the formulation of reasonable power supply construction plans, and improve the economic and social benefits of the power system. Methods such as regression models, time series prediction techniques, and grey theory prediction techniques are often used for load prediction. In recent years, with the development of artificial neural network research, neural networks have been used to predict power system load, greatly reducing the prediction error and attracting the attention of load prediction practitioners.
In the prior art, a traditional neural network is used for power system load prediction. However, a conventional neural network is a typical global approximation network, in which one or more weights affect each output; on the other hand, the network determines its weights with randomness, so the relationship between input and output is uncertain after each training run, and prediction results differ between runs. The prior art therefore suffers from slow convergence and large prediction errors.
Disclosure of Invention
The embodiment of the invention provides a power system load prediction method and device based on a deep belief network, which can improve convergence speed and reduce prediction errors.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a method for predicting a load of a power system based on a DBN is provided. The DBN is composed of multilayer restricted Boltzmann machines (RBMs) and comprises at least one hidden layer and one visible layer. The method for predicting the load of the power system comprises the following steps:
acquiring a training sample and a test sample; constructing an energy function of the RBM model;
training the at least one hidden layer and the visible layer by utilizing the training sample to obtain the weight of the training sample between the nodes of the at least one hidden layer and the visible layer;
and inputting the output data obtained by the training sample and the test sample into the trained DBN to obtain a predicted value of the power system load.
In a second aspect, a DBN-based power system load prediction apparatus is provided for executing the method provided in the first aspect.
According to the method and device for predicting power system load based on a deep belief network provided by the embodiments of the invention, the deep belief network is formed from multilayer restricted Boltzmann machines and comprises several hidden layers and one visible layer; the RBMs can be pre-trained layer by layer with a layered unsupervised greedy pre-training method, and the result is used as the initial values of a probability model for supervised learning and training, which greatly improves the learning performance, increases the convergence speed, and reduces the prediction error.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for predicting a load of an electrical power system based on a deep belief network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an RBM including a visible layer and a hidden layer;
FIG. 3 is a schematic diagram illustrating the optimal error recognition rate for three models with different structures according to an embodiment of the present invention;
FIG. 4 is a graph of average absolute percentage error as a function of iteration number for an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a DBN-based power system load prediction apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the problems of slow convergence and large prediction error of traditional neural network load prediction, the invention provides a power system load prediction scheme based on a Deep Belief Network (DBN). As shown in Fig. 1, the method comprises the following steps:
101. training samples and test samples are obtained.
102. And constructing an energy function of the RBM model.
103. And training at least one hidden layer and one visible layer by utilizing the training samples to obtain the weight of the training samples between the nodes of the at least one hidden layer and the visible layer.
104. And inputting the output data obtained by the training sample and the testing sample into the trained DBN to obtain a predicted value of the power system load.
In the specific embodiment of the invention, the DBN is composed of a multilayer Restricted Boltzmann Machine (RBM), comprises at least one hidden layer and one visible layer, pre-trains the RBM in a layered manner by adopting a layered unsupervised greedy pre-training method, and takes the obtained result as the initial value of a probability model for supervised learning and training, thereby greatly improving the learning performance, which is illustrated as follows:
First, data preprocessing
S1, preprocessing the training sample:
let all training samples be X ═ X1,X2,...,XMIn which X isiRepresenting a set of load data, M representing the number of load data, and for the input samples, the algorithm takes 4 stepsAnd (3) carrying out normalization:
calculating an average value:
where u represents the mean of the samples.
② Calculating the variance:

σ² = (1/M) Σ_{i=1}^{M} (X_i − u)²

where σ² represents the variance of the samples.
③ Whitening:

X_i′ = (X_i − u) / σ

where X_i′ denotes the whitened value of the sample data.
④ Normalization:

X_{i,n} = (X_i′ − min X′) / (max X′ − min X′)

where X_{i,n} represents the normalized value of the sample data, and min X′ and max X′ are the minimum and maximum of the whitened training samples.
s2, preprocessing the test sample:
The preprocessing of a test sample whitens it according to the mean and variance of the training samples, and then normalizes it uniformly to between 0 and 1 according to the maximum and minimum of the training samples. Let a single test sample be T; the normalization steps are:
① Whitening:

T′ = (T − u) / σ

where T′ represents the whitened value of the test sample data, T is a single test sample, and u and σ² are the mean and variance of the training samples.
② Normalization:

T_n = (T′ − min X′) / (max X′ − min X′)

where T_n represents the normalized value of the test sample data.
Here, the reason for the whitening is that there is a large correlation between adjacent elements of the natural data, and thus the redundancy of the data can be reduced by the whitening, similarly to the dimensionality reduction of Principal Component Analysis (PCA).
Second, basic principle and constitution of the deep belief network
The DBN is an energy-based model with very strong feature learning capability and is one of the earliest deep networks to be studied. It was proposed by Hinton et al.; its essence is to learn more useful features by constructing a machine learning model with many hidden layers and massive training data, thereby finally improving the accuracy of classification or prediction.
The DBN can be seen as a complex neural network formed by stacking multiple layers of RBMs; it comprises one visible layer and several hidden layers, and the preset values of the RBMs are trained by the contrastive divergence method. The RBMs are pre-trained layer by layer with a layered unsupervised greedy pre-training method for the deep neural network, and the obtained result is used as the initial values of a probability model for supervised learning and training, which greatly improves the learning performance.
The RBM is a generative stochastic neural network used to learn a probability distribution over its input data; it can be trained in either a supervised or an unsupervised way as required and is widely used. As shown in FIG. 2, the RBM has two layers: the vector h = (h1, h2, …, hm) represents the hidden layer units, m elements in total; the vector v = (v1, v2, …, vn) represents the visible layer, n elements in total. Because units within a layer are not connected, they are conditionally independent of each other, namely:
p(h|v)=p(h1|v)p(h2|v)…p(hm|v);
p(v|h)=p(v1|h)p(v2|h)…p(vn|h)。
In general, the distributions of the visible layer and hidden layer units satisfy the Bernoulli form: each unit takes the value 0 or 1, so the conditional distribution of every unit is a Bernoulli distribution whose parameter is determined by the network weights (the explicit form is derived below from the energy function).
It can be seen that the probability distribution of the entire model can be obtained from the probability distribution of each layer, so unsupervised training can be performed layer by layer: the output of a lower RBM is used as the input of the RBM above it, until the last hidden layer is reached; the last hidden layer is then fed to a softmax classifier, and the estimation error is computed for backward fine-tuning.
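For a binary RBM, the Bernoulli conditionals take the standard sigmoid form (they are derived from the energy function later in the text). A small sketch, with NumPy assumed and zero parameters chosen purely for demonstration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij); factorizes over j
    because hidden units are conditionally independent given v."""
    return sigmoid(b + v @ W)

def p_v_given_h(h, W, a):
    """P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j)."""
    return sigmoid(a + h @ W.T)

W = np.zeros((3, 2))                 # 3 visible units, 2 hidden units
a, b = np.zeros(3), np.zeros(2)      # zero biases for the demonstration
ph = p_h_given_v(np.array([1.0, 0.0, 1.0]), W, b)
pv = p_v_given_h(np.array([1.0, 0.0]), W, a)
```

With zero weights and biases every conditional probability is 0.5, which makes the factorized form easy to check by hand.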
S3, constructing an energy function of the RBM model:
The energy function of the RBM model, with Bernoulli-distributed hidden and visible layers, is defined as:

E(v, h) = − Σ_{i=1}^{n} a_i v_i − Σ_{j=1}^{m} b_j h_j − Σ_{i=1}^{n} Σ_{j=1}^{m} v_i w_ij h_j

where a_i and b_j represent bias terms and w_ij represents the connection weight between visible unit i and hidden unit j. θ = (a, b, w) are the weight parameters of the RBM model; the vector h = (h1, h2, …, hm) represents the hidden layer units (m elements in total) and the vector v = (v1, v2, …, vn) represents the visible layer (n elements in total).
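The energy function can be transcribed directly (NumPy assumed; the parameter values in the demonstration are arbitrary):

```python
import numpy as np

def rbm_energy(v, h, a, b, W):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} v_i w_ij h_j."""
    return float(-(a @ v) - (b @ h) - (v @ W @ h))

# Tiny example: 2 visible units, 1 hidden unit.
E = rbm_energy(np.array([1.0, 0.0]), np.array([1.0]),
               np.array([0.5, 0.2]), np.array([0.1]),
               np.array([[0.3], [0.4]]))
```

Here E = −(0.5) − (0.1) − (0.3) = −0.9, so configurations with active, positively weighted units have lower energy and hence higher probability.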
S4, calculating the joint probability distribution of the visible layer unit and the hidden layer unit:
Given the model parameters, the joint probability distribution of the visible and hidden layer units can be expressed as:

P(v, h) = e^{−E(v, h)} / Z,   Z = Σ_v Σ_h e^{−E(v, h)}

where Z is the normalization factor (partition function), E(v, h) is the energy function of the RBM model over the hidden and visible layers, and the vector h = (h1, h2, …, hm) represents the hidden layer units, m elements in total.
From Equations (7) and (8) it can be deduced that:

P(h_j = 1 | v) = σ(b_j + Σ_i w_ij v_i)
P(v_i = 1 | h) = σ(a_i + Σ_j w_ij h_j)

where σ(x) = 1/(1 + e^{−x}) is the sigmoid function.
s5, calculating the edge distribution of the visible layer and the hidden layer:
According to the joint probability distribution P(v, h) of the visible and hidden layers, the marginal (edge) distribution can be obtained:

P(v) = (1/Z) Σ_h e^{−E(v, h)}

where Z is the normalization factor, E(v, h) is the energy function of the RBM model over the hidden and visible layers, and the vector h = (h1, h2, …, hm) represents the hidden layer units, m elements in total.
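For a toy RBM, the marginal P(v) and the normalization factor Z can be evaluated by brute-force enumeration of all binary states, which makes it easy to verify that the marginals sum to 1. This is an illustrative sketch only: enumeration is feasible just for tiny sizes, which is precisely why training methods such as contrastive divergence avoid computing Z.

```python
import itertools
import numpy as np

def marginal_v(v, a, b, W):
    """P(v) = (1/Z) * sum_h exp(-E(v, h)), by exhaustive enumeration."""
    n, m = len(a), len(b)
    def unnorm(vv, hh):                 # exp(-E(v, h))
        return np.exp(a @ vv + b @ hh + vv @ W @ hh)
    def states(k):
        return (np.array(s, dtype=float)
                for s in itertools.product([0, 1], repeat=k))
    Z = sum(unnorm(vv, hh) for vv in states(n) for hh in states(m))
    num = sum(unnorm(np.asarray(v, dtype=float), hh) for hh in states(m))
    return float(num / Z)

# Arbitrary small parameters: 2 visible units, 1 hidden unit.
a = np.array([0.2, -0.1]); b = np.array([0.3]); W = np.array([[0.5], [-0.4]])
total = sum(marginal_v([v1, v2], a, b, W) for v1 in (0, 1) for v2 in (0, 1))
```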
S6, constructing a log-likelihood function as follows:
lnL(θ) = Σ_{i=1}^{l} ln P(v^{(i)})

where l is the number of training samples, θ = (a, b, w) are the weight parameters of the RBM model, and P(v^{(i)}) is the marginal distribution of the visible layer evaluated at the i-th training sample.
S7, calculating the gradient of the log-likelihood function:
Differentiating the log-likelihood function according to the gradient method gives:

∂lnL(θ)/∂θ = Σ_{i=1}^{l} [ Σ_h P(h | v^{(i)}) · ∂(−E(v^{(i)}, h))/∂θ − Σ_{v,h} P(v, h) · ∂(−E(v, h))/∂θ ]

Arranging terms, for the connection weights this gives:

∂lnL(θ)/∂w_ij = Σ_{t=1}^{l} [ P(h_j = 1 | v^{(t)}) v_i^{(t)} − Σ_v P(v) P(h_j = 1 | v) v_i ]

where lnL(θ) is the log-likelihood function, l is the number of training samples, θ = (a, b, w) are the weight parameters of the RBM model, P(h | v^{(t)}) is the conditional distribution of the hidden layer given the t-th training sample, E(v, h) is the energy function of the RBM model over the hidden and visible layers, and P(v, h) is the joint probability distribution of the visible and hidden layer units.
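The second (model-expectation) term of this gradient is intractable for realistic sizes; contrastive divergence, used later in the text to train the RBM preset values, approximates it with statistics taken after a single Gibbs step. A minimal CD-1 sketch (not the patent's code; names and hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, a, b, W, lr=0.1, rng=None):
    """One CD-1 step: replace the model expectation in the gradient with
    'reconstruction' statistics after one Gibbs step, then ascend."""
    if rng is None:
        rng = np.random.default_rng(0)
    ph0 = sigmoid(b + v0 @ W)                   # positive phase: P(h | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample hidden units
    pv1 = sigmoid(a + h0 @ W.T)                 # reconstruct visible layer
    ph1 = sigmoid(b + pv1 @ W)                  # hidden probs at reconstruction
    W = W + lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a = a + lr * (v0 - pv1)
    b = b + lr * (ph0 - ph1)
    return W, a, b

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((3, 2))
a, b = np.zeros(3), np.zeros(2)
W, a, b = cd1_update(np.array([1.0, 0.0, 1.0]), a, b, W)
```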
Third, predicting the power system load by using the deep belief network
The idea of deep learning is to stack multiple layers: the output of one layer serves as the input of the next. For example, consider a system S with n layers (S1, …, Sn), input I, and output O, expressed as I → S1 → S2 → … → Sn → O. By adjusting the parameters in the system so that the output O reproduces the input I, a series of hierarchical features of the input I, namely S1, …, Sn, can be obtained automatically. In this way a hierarchical representation of the input information is achieved. Predicting the power system load with a deep belief network is essentially such a deep learning process.
The process of predicting the power system load with the deep belief network trains the preset values of the RBMs by the contrastive divergence method. Based on analysis, the algorithm adopts three hidden layers and pre-trains the RBMs layer by layer with a layered unsupervised greedy pre-training method, using the result as the initial values of a probability model for supervised learning and training, which improves the learning performance. Simulation data show that, compared with traditional neural network algorithms, the algorithm converges fast and has a small prediction error.
Firstly, the preset values of the RBM are trained by the contrastive divergence method. KL-divergence is an asymmetric measure, so the values of KL(Q||P) and KL(P||Q) generally differ, and the larger the difference between the two distributions, the larger the KL-divergence. Maximizing the log-likelihood function of the RBM finally evolves into calculating the difference between the KL-divergences of two probability distributions:
KL(Q||P) − KL(P_k||P)   (formula 15)
where Q is the prior distribution and P_k is the distribution after k steps of Gibbs sampling. If the Gibbs chain has reached its stationary state (the initial state v^(0) is already a steady state), then P_k = P, i.e. KL(P_k||P) = 0, and the estimation error obtained by the contrastive divergence algorithm equals 0.
Secondly, the obtained result is used as the initial values of the probability model for supervised learning and training. Relative entropy, also called KL-divergence, measures the distance between two probability distributions. The KL-divergence between two probability distributions Q and P over the state space is defined as:

KL(Q||P) = Σ_x Q(x) ln( Q(x) / P(x) )   (formula 16)

where KL(Q||P) is the KL-divergence of the two probability distributions Q and P over the state space.
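The definition and the asymmetry noted above can be checked numerically with a short sketch (NumPy assumed; the two-point distributions are arbitrary examples):

```python
import numpy as np

def kl(Q, P):
    """KL(Q || P) = sum_x Q(x) ln(Q(x)/P(x)). Nonnegative, zero iff Q == P,
    and asymmetric: KL(Q||P) generally differs from KL(P||Q)."""
    Q, P = np.asarray(Q, dtype=float), np.asarray(P, dtype=float)
    mask = Q > 0                         # terms with Q(x) = 0 contribute 0
    return float(np.sum(Q[mask] * np.log(Q[mask] / P[mask])))

d_qp = kl([0.9, 0.1], [0.5, 0.5])
d_pq = kl([0.5, 0.5], [0.9, 0.1])
```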
Thirdly, the obtained preset values and initial values are passed as input data to the next layer, where training continues. One layer of the network is trained at a time, layer by layer. Specifically, the first layer is trained with unlabeled data, learning its parameters (this layer can be regarded as the hidden layer of a three-layer neural network that minimizes the difference between output and input). After layer n−1 has been learned, its output is used as the input of layer n, and layer n is trained, so the parameters of each layer are obtained in turn. The training process is thus a repeated, iterative one. When all layers have been trained, tuning is carried out with the wake-sleep algorithm.
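The greedy layer-by-layer procedure described above can be sketched as follows: train one RBM with CD-1, feed its hidden activations to the next RBM, and repeat. This is a toy stand-in under stated assumptions (NumPy, CD-1, random demo data), not the patent's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_stack(data, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the output
    of the one below it; returns per-layer (W, a, b) and the top output."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    weights = []
    for n_hid in layer_sizes:
        n_vis = x.shape[1]
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        a, b = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            for v0 in x:                          # CD-1 on each sample
                ph0 = sigmoid(b + v0 @ W)
                h0 = (rng.random(n_hid) < ph0) * 1.0
                pv1 = sigmoid(a + h0 @ W.T)
                ph1 = sigmoid(b + pv1 @ W)
                W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
                a += lr * (v0 - pv1)
                b += lr * (ph0 - ph1)
        weights.append((W, a, b))
        x = sigmoid(b + x @ W)                    # output feeds the next RBM
    return weights, x

data = (np.random.default_rng(1).random((8, 6)) > 0.5) * 1.0
weights, top = pretrain_stack(data, [4, 3])
```

The top-layer output would then initialize the supervised fine-tuning stage described next.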
Wake stage: the cognitive process. An abstract representation (node states) of each layer is produced from the external features and the upward (cognitive) weights, while the downward (generative) weights between layers are modified by gradient descent.
Sleep stage: the generative process. The state of the bottom layer is generated from the top-layer representation and the downward weights, while the upward weights between layers are modified.
After pre-training, the DBN can use labeled data and the BP algorithm to adjust its discriminative performance. A label set is added to the top layer (promoting associative memory), and a classification surface of the network is obtained through the bottom-up, learned recognition weights. This can perform better than a network trained solely by the BP algorithm, which can be interpreted intuitively: the BP algorithm of a DBN only needs to search a local region of the weight-parameter space, so training is faster and convergence time shorter than for a feed-forward neural network.
After the training and tuning steps are finished, the weights connect the top two layers (namely the last hidden layer and the visible layer) together, so that the output of the lower layers provides reference clues or associations to the top layer, which the top layer links with its memorized content. In this way the load value of the power system is predicted.
In a specific embodiment: S8. calculating the KL-divergence of the two probability distributions Q and P of the visible and hidden layers over the state space according to formula 16. S9. reducing the maximization of the log-likelihood function of the RBM model to calculating the difference of the KL-divergences of the two probability distributions Q and P according to formula 15. S10. performing layered training of the training samples through the RBM model by S1-S9 to obtain the weights of the training samples between the nodes of the visible layer and the hidden layer; the weights are fed from the hidden layer into the visible layer, which can memorize their content and produce a load prediction value for the training samples, used in the power system load prediction process. S11. adding the average load value, the maximum load value, and the average air temperature of the training samples obtained in S10, together with the predicted daily average air temperature, to the test samples as input data of the deep neural network algorithm; the output, after the calculation of S1-S9, is the predicted load value of the power system.
Fourth, the experimental setup
The load test uses load data of the East Slovakia electric power company provided by the EUNITE competition: data from January to October 2007 as training data and data from November to December as prediction comparison data. The input data of the neural network comprise: the 48 sampling points of the previous day (average load every half hour), the average temperature of the previous day, the load peak of the previous day, the load valley of the previous day, the average load of the previous day, and the predicted average temperature of the day. The prediction algorithm predicts the 48 data points of the next day in one go.
In load prediction applications, load data every half hour within 0-24 h are generally needed. The algorithm takes the 48 historical data points of 0-24 h of the day before the prediction day, and adds the average load value, the maximum load value, and the average air temperature of that day, together with the average air temperature of the prediction day from the numerical weather forecast, as input data of the deep neural network algorithm. The output data are the half-hourly load data from 0 to 24 h of the prediction day.
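Assembling one input vector from the quantities listed in the experimental setup can be sketched as follows (an illustrative sketch; the function name, feature ordering, and the 53-dimensional layout are assumptions consistent with the 48 + 5 features the text describes):

```python
import numpy as np

def build_input_vector(prev_day_loads, prev_avg_temp, pred_avg_temp):
    """One input vector: the previous day's 48 half-hourly loads plus its
    mean, peak, valley, the previous day's average temperature, and the
    forecast average temperature of the prediction day."""
    L = np.asarray(prev_day_loads, dtype=float)
    assert L.shape == (48,), "expects 48 half-hourly load samples"
    extras = [L.mean(), L.max(), L.min(), prev_avg_temp, pred_avg_temp]
    return np.concatenate([L, extras])

x = build_input_vector(np.arange(48, dtype=float), 10.0, 12.0)
```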
To compare the performance of the algorithms, a BP neural network, a Self-Organizing Fuzzy Neural Network (hereinafter SOFNN) and the deep belief network are selected for comparison. The BP neural network captures nonlinear laws well: with enough hidden-layer neurons, a 3-layer perceptron model can approximate any nonlinear function. The main advantage of the SOFNN algorithm is that it can determine the network structure automatically and give model parameters with good prediction accuracy.
The criterion is the Mean Absolute Percentage Error (MAPE), defined as:

MAPE = (1/n) Σ_{i=1}^{n} |D_i − D̂_i| / D_i × 100%

where D_i and D̂_i are the real value and the predicted value of the maximum load on the i-th day of a certain month, and n is the number of days of that month.
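The criterion is a one-liner in practice (illustrative sketch; NumPy assumed):

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent, as defined above."""
    D = np.asarray(actual, dtype=float)
    Dp = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(D - Dp) / D) * 100.0)

err = mape([100.0, 200.0], [110.0, 180.0])   # 10% error on each day
```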
Fifth, evaluation of performance
(1) Effect of network architecture on algorithm Performance
Firstly, the influence of the network structure (number of hidden layers and of neurons) on the prediction effect is studied experimentally: load data of January to October 2007 are taken as training data and load data of November as test data. The network structure uses 1, 2, or 3 hidden layers, with 20, 50, 100, 200, or 400 neurons; for ease of comparison, every layer has the same number of neurons. For the initial parameter settings, the following values were tried manually to obtain the optimal recognition rate: BP learning rate (0.1, 0.05, 0.02, 0.01, 0.005) and pre-training learning rate (0.01, 0.005, 0.002, 0.001). FIG. 3 shows the optimal error recognition rate of the three models with different structures. As can be seen from FIG. 3, when the number of hidden layers or neurons is small, the BP network without pre-training performs better; with only 1 hidden layer, the error rate of the SOFNN matches that of BP once the number of neurons reaches 200, while with 2 or 3 hidden layers the SOFNN approaches and then exceeds BP at 100 and 50 neurons respectively. Similarly, the SOFNN outperforms the DBN when the number of neurons is small, and the opposite holds when it is large. A similar effect appears for a fixed number of hidden layers because of the over-fitting problem.
The performance of the BP network does not keep increasing with the number of nodes but decreases when there are too many, whereas the depth model keeps improving and then stabilizes, showing that the performance of a pre-trained deep network is gradually optimized as the network scale grows (more hidden layers or more nodes). Too few hidden layers or hidden nodes can thus degrade the performance of the depth model. The reason can be explained as follows: the pre-trained model extracts the core features of the input, and under the sparsity constraint, if the number of neurons is too small, only a few neurons are activated for some input samples; the features then cannot represent the original input, some information is lost, and performance drops. Although a larger network gives a better depth model, the training time also lengthens, so performance and training time need to be balanced.
(2) Algorithm convergence comparison
Figure 4 plots the mean absolute percentage error against the number of iterations. The algorithm uses a model with three hidden layers of 100 neurons each. The learning rate during pre-training is 0.01; during fine-tuning it is 0.1, with a momentum coefficient of 0.5. Load data of January to November 2007 are taken as training data and load data of December as test data. The changes in mean absolute percentage error of BP, SOFNN and DBN over 1000 iterations are compared.
Analyzing Fig. 4, the error rates of the three models gradually decrease as the number of iterations increases, because the distribution of the network parameters moves closer to the minimum point as training proceeds. However, after a certain number of training iterations the error rate of the BP network oscillates and tends to increase, while the error rate of the pre-trained DBN decreases steadily; this indirectly proves that pre-training places the initial parameters of the network in a region closer to the minimum point and effectively avoids local oscillation.
The load prediction provides important basis for power distribution network management decision and operation mode. The invention provides a load prediction algorithm adopting a deep belief network, and solves the problems of slow learning rate and low prediction efficiency of the existing neural network algorithm. Simulation results show that compared with a traditional neural network algorithm, the power load prediction effect of the algorithm based on the deep belief network is obvious.
An embodiment of the present invention further provides a DBN-based power system load prediction apparatus for performing the power system load prediction method described in the foregoing embodiment, and as shown in fig. 5, the power system load prediction apparatus includes:
a data processing unit 501, configured to obtain a training sample and a test sample; constructing an energy function of the RBM model;
a training unit 502, configured to train at least one hidden layer and one visible layer by layer using a training sample, to obtain a weight of the training sample between nodes of the at least one hidden layer and the visible layer;
and a prediction unit 503, configured to input the output data obtained by the training sample and the test sample into the trained DBN, so as to obtain a predicted value of the power system load.
Optionally, the data processing unit 501 is further configured to normalize the training sample X = {X1, X2, ..., XM} formed from the load data Xi; wherein Xi represents a group of load data, and M represents the number of load data groups; and is further configured to whiten the test sample according to the mean and variance of the training sample, and to uniformly normalize the test sample to between 0 and 1 according to the maximum and minimum values of the training sample.
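The preprocessing described above can be sketched as follows (a minimal illustration, assuming the load data are held in NumPy arrays; `normalize_samples`, `train`, and `test` are illustrative names, not from the original):

```python
import numpy as np

def normalize_samples(train, test):
    """Normalize the training sample to [0, 1] by its own extrema, whiten
    the test sample with the training mean/variance, then rescale it to
    [0, 1] using the (whitened) training extrema."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    train_norm = (train - lo) / (hi - lo)
    # Whitening with the *training* statistics avoids test-set leakage.
    mu, sd = train.mean(axis=0), train.std(axis=0)
    test_w = (test - mu) / sd
    # Min-max rescaling in the whitened coordinates; algebraically this
    # reduces to (test - lo) / (hi - lo), since whitening is affine.
    lo_w, hi_w = (lo - mu) / sd, (hi - mu) / sd
    test_norm = (test_w - lo_w) / (hi_w - lo_w)
    return train_norm, test_norm
```

Because whitening is an affine map, composing it with min-max rescaling over the whitened training extrema yields the same [0, 1] values as rescaling the raw test data directly, so the two steps are consistent.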
Optionally, the distributions of the visible layer and hidden layer units satisfy the Bernoulli form;
the energy function of the RBM model, defined by the data processing unit 501 for the visible layer and the hidden layer satisfying the Bernoulli distribution, is:

E(v, h) = -Σ_{i=1}^{n} Σ_{j=1}^{m} v_i w_{ij} h_j - Σ_{i=1}^{n} a_i v_i - Σ_{j=1}^{m} b_j h_j ;

wherein a_i and b_j represent the bias terms, and w_{ij} represents the connection weight between the visible unit and the hidden layer unit; θ = (a, b, w) are the weight parameters of the RBM model; the vector h = (h_1, h_2, ..., h_m) represents the hidden layer units, with m elements in total, and the vector v = (v_1, v_2, ..., v_n) represents the visible layer, with n elements in total.
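The energy function can be evaluated directly from its definition (a minimal sketch; `rbm_energy` is an illustrative name, not from the original):

```python
import numpy as np

def rbm_energy(v, h, a, b, W):
    """E(v, h) = -sum_ij v_i w_ij h_j - sum_i a_i v_i - sum_j b_j h_j
    for a Bernoulli-Bernoulli RBM: v is the visible vector (n,), h the
    hidden vector (m,), a and b the bias terms, W the (n, m) weights."""
    return -(v @ W @ h) - a @ v - b @ h
```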
Optionally, the training unit 502 is specifically configured to pre-train the weights of the RBM by a contrastive divergence method, and to use the obtained result as the initial value of the probability model for supervised learning and training; the obtained weights and initial values are passed as input data to the next layer of the network, where training is performed; the network is trained one layer at a time, training the at least one hidden layer and the visible layer layer by layer.
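A single contrastive-divergence (CD-1) update for one Bernoulli RBM can be sketched as follows (an illustrative simplification, not the original implementation; shapes follow the energy function, and `cd1_step` is a hypothetical name):

```python
import numpy as np

def cd1_step(v0, W, a, b, rng, lr=0.1):
    """One CD-1 weight update for a Bernoulli RBM: v0 is a visible data
    vector (n,), W the (n, m) weights, a and b the visible/hidden biases."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Positive phase: hidden probabilities and a binary sample from them.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(W @ h0 + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient: data correlations minus reconstruction ones.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b
```

The hidden activations obtained after training one RBM then serve as the visible input of the next RBM, which is the layer-by-layer scheme the passage describes.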
Optionally, the output data obtained from the training samples at least include an average load value, a maximum load value, an average air temperature, and a predicted daily average air temperature of a numerical weather forecast of the training samples.
According to the method and device for predicting the load of the power system based on the deep belief network provided by the embodiments of the present invention, the deep belief network is formed from multiple layers of restricted Boltzmann machines and comprises a plurality of hidden layers and a visible layer. The RBMs can be pre-trained layer by layer using a layered unsupervised greedy pre-training method, and the obtained result is used as the initial value of the probability model for supervised learning and training. This greatly improves learning performance, increases the convergence speed, and reduces the prediction error, thereby solving the problems of the slow learning rate and low prediction efficiency of existing neural network algorithms. Simulation results show that, compared with traditional neural network algorithms, the power load prediction effect of the algorithm based on the deep belief network is significantly better.
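The layered unsupervised greedy pre-training can be sketched end to end (a simplified illustration, not the original implementation: a mean-field CD-1 update is used per layer, and `pretrain_dbn` and `hidden_sizes` are hypothetical names):

```python
import numpy as np

def pretrain_dbn(data, hidden_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activations of the previous one, and the collected weights would
    initialize the supervised fine-tuning stage."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    rng = np.random.default_rng(seed)
    weights, x = [], data
    for m in hidden_sizes:
        n = x.shape[1]
        W = 0.01 * rng.standard_normal((n, m))
        a, b = np.zeros(n), np.zeros(m)
        for _ in range(epochs):
            for v0 in x:
                ph0 = sigmoid(v0 @ W + b)      # positive phase
                v1 = sigmoid(W @ ph0 + a)      # mean-field reconstruction
                ph1 = sigmoid(v1 @ W + b)      # negative phase
                W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
                a += lr * (v0 - v1)
                b += lr * (ph0 - ph1)
        weights.append((W, a, b))
        x = sigmoid(x @ W + b)                 # input for the next RBM
    return weights
```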
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and such changes or substitutions shall be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A power system load prediction method based on a Deep Belief Network (DBN), wherein the DBN is composed of a plurality of layers of Restricted Boltzmann Machines (RBMs) and comprises at least one hidden layer and one visible layer, and the power system load prediction method comprises the following steps:
acquiring a training sample and a test sample; constructing an energy function of the RBM model;
training the at least one hidden layer and the visible layer by utilizing the training sample to obtain the weight of the training sample between the nodes of the at least one hidden layer and the visible layer;
and inputting the output data obtained by the training sample and the test sample into the trained DBN to obtain a predicted value of the power system load.
2. The method of claim 1, wherein, before the constructing of the energy function of the RBM model, the method further comprises:
normalizing the training sample X = {X1, X2, ..., XM} formed from the load data Xi; wherein Xi represents a group of load data, and M represents the number of load data groups;
and whitening the test sample according to the mean value and the variance of the training sample, and uniformly normalizing the test sample to be between 0 and 1 according to the maximum value and the minimum value of the training sample.
3. The power system load prediction method of claim 1, wherein the distributions of the visible layer and hidden layer units satisfy the Bernoulli form, and the constructing of the energy function of the RBM model comprises:
the energy function of the RBM model, in which the visible and hidden layers satisfy the Bernoulli distribution, is defined as:

E(v, h) = -Σ_{i=1}^{n} Σ_{j=1}^{m} v_i w_{ij} h_j - Σ_{i=1}^{n} a_i v_i - Σ_{j=1}^{m} b_j h_j ;

wherein a_i and b_j represent the bias terms, and w_{ij} represents the connection weight between the visible unit and the hidden layer unit; θ = (a, b, w) are the weight parameters of the RBM model; the vector h = (h_1, h_2, ..., h_m) represents the hidden layer units, with m elements in total, and the vector v = (v_1, v_2, ..., v_n) represents the visible layer, with n elements in total.
4. The method according to claim 3, wherein the training the at least one hidden layer and the visible layer by using the training samples layer by layer to obtain the weight of the training samples between the at least one hidden layer and the visible layer node comprises:
pre-training the weights of the RBM by a contrastive divergence method, and using the obtained result as the initial value of the probability model for supervised learning and training;
passing the obtained weights and initial values as input data to the next layer of the network, and performing training in the next layer of the network;
and training the network one layer at a time, thereby training the at least one hidden layer and the visible layer layer by layer.
5. The power system load prediction method according to claim 3,
the output data obtained from the training samples at least comprise the average load value, the maximum load value, the average air temperature and the predicted daily average air temperature of the numerical weather forecast of the training samples.
6. An electric power system load prediction device based on a Deep Belief Network (DBN), wherein the DBN is composed of a plurality of layers of Restricted Boltzmann Machines (RBMs) and comprises at least one hidden layer and one visible layer, and the electric power system load prediction device comprises:
the data processing unit is used for acquiring a training sample and a test sample; constructing an energy function of the RBM model;
the training unit is used for training the at least one hidden layer and the visible layer by utilizing the training samples to obtain the weight of the training samples between the nodes of the at least one hidden layer and the visible layer;
and the prediction unit is used for inputting the output data obtained by the training sample and the test sample into the trained DBN to obtain a predicted value of the power system load.
7. The power system load prediction device of claim 6,
the data processing unit is further configured to normalize the training sample X = {X1, X2, ..., XM} formed from the load data Xi; wherein Xi represents a group of load data, and M represents the number of load data groups; and is further configured to whiten the test sample according to the mean and variance of the training sample, and to uniformly normalize the test sample to between 0 and 1 according to the maximum and minimum values of the training sample.
8. The power system load prediction device of claim 6,
the distribution of the visible layer and the hidden layer units meets the Bernoulli form;
the energy function of the RBM model, defined by the data processing unit for the visible layer and the hidden layer satisfying the Bernoulli distribution, is:

E(v, h) = -Σ_{i=1}^{n} Σ_{j=1}^{m} v_i w_{ij} h_j - Σ_{i=1}^{n} a_i v_i - Σ_{j=1}^{m} b_j h_j ;

wherein a_i and b_j represent the bias terms, and w_{ij} represents the connection weight between the visible unit and the hidden layer unit; θ = (a, b, w) are the weight parameters of the RBM model; the vector h = (h_1, h_2, ..., h_m) represents the hidden layer units, with m elements in total, and the vector v = (v_1, v_2, ..., v_n) represents the visible layer, with n elements in total.
9. The power system load prediction device according to claim 8,
the training unit is specifically configured to pre-train the weights of the RBM by a contrastive divergence method, and to use the obtained result as the initial value of the probability model for supervised learning and training; to pass the obtained weights and initial values as input data to the next layer of the network, where training is performed; and to train the network one layer at a time, training the at least one hidden layer and the visible layer layer by layer.
10. The power system load prediction device according to claim 8,
the output data obtained from the training samples at least comprise the average load value, the maximum load value, the average air temperature and the predicted daily average air temperature of the numerical weather forecast of the training samples.
CN201710021315.8A 2017-01-11 2017-01-11 Power system load prediction method and device based on deep belief network Pending CN106709820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710021315.8A CN106709820A (en) 2017-01-11 2017-01-11 Power system load prediction method and device based on deep belief network


Publications (1)

Publication Number Publication Date
CN106709820A true CN106709820A (en) 2017-05-24

Family

ID=58907274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710021315.8A Pending CN106709820A (en) 2017-01-11 2017-01-11 Power system load prediction method and device based on deep belief network

Country Status (1)

Country Link
CN (1) CN106709820A (en)

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862384A (en) * 2017-11-16 2018-03-30 国家电网公司 A kind of method for building up of distribution network load disaggregated model
CN107947156A (en) * 2017-11-24 2018-04-20 国网辽宁省电力有限公司 Based on the electric network fault critical clearing time method of discrimination for improving Softmax recurrence
CN107947156B (en) * 2017-11-24 2021-02-05 国网辽宁省电力有限公司 Power grid fault critical clearing time discrimination method based on improved Softmax regression
CN107993012A (en) * 2017-12-04 2018-05-04 国网湖南省电力有限公司娄底供电分公司 A kind of adaptive electric system on-line transient stability appraisal procedure of time
CN107993012B (en) * 2017-12-04 2022-09-30 国网湖南省电力有限公司娄底供电分公司 Time-adaptive online transient stability evaluation method for power system
WO2019141040A1 (en) * 2018-01-22 2019-07-25 佛山科学技术学院 Short term electrical load predication method
CN108303624A (en) * 2018-01-31 2018-07-20 舒天才 A kind of method for detection of partial discharge of switch cabinet based on voice signal analysis
CN110119826A (en) * 2018-02-06 2019-08-13 天津职业技术师范大学 A kind of power-system short-term load forecasting method based on deep learning
CN108537337A (en) * 2018-04-04 2018-09-14 中航锂电技术研究院有限公司 Lithium ion battery SOC prediction techniques based on optimization depth belief network
CN108549960A (en) * 2018-04-20 2018-09-18 国网重庆市电力公司永川供电分公司 A kind of 24 hours Methods of electric load forecasting
CN108646149A (en) * 2018-04-28 2018-10-12 国网江苏省电力有限公司苏州供电分公司 Fault electric arc recognition methods based on current characteristic extraction
CN109214503A (en) * 2018-08-01 2019-01-15 华北电力大学 Project of transmitting and converting electricity cost forecasting method based on KPCA-LA-RBM
CN109214503B (en) * 2018-08-01 2021-09-10 华北电力大学 Power transmission and transformation project cost prediction method based on KPCA-LA-RBM
CN109343951A (en) * 2018-08-15 2019-02-15 南京邮电大学 Mobile computing resource allocation methods, computer readable storage medium and terminal
CN109358962A (en) * 2018-08-15 2019-02-19 南京邮电大学 The autonomous distributor of mobile computing resource
CN109358962B (en) * 2018-08-15 2022-02-11 南京邮电大学 Mobile computing resource autonomous allocation device
CN109343951B (en) * 2018-08-15 2022-02-11 南京邮电大学 Mobile computing resource allocation method, computer-readable storage medium and terminal
CN109190820A (en) * 2018-08-29 2019-01-11 东北电力大学 A kind of electricity market electricity sales amount depth prediction approach considering churn rate
CN109190820B (en) * 2018-08-29 2022-03-18 东北电力大学 Electric power market electricity selling quantity depth prediction method considering user loss rate
CN109358230A (en) * 2018-10-29 2019-02-19 国网甘肃省电力公司电力科学研究院 A kind of micro-capacitance sensor is fallen into a trap and the Intelligent electric-energy metering method of m-Acetyl chlorophosphonazo
CN109579896A (en) * 2018-11-27 2019-04-05 佛山科学技术学院 Underwater robot sensor fault diagnosis method and device based on deep learning
CN109655711A (en) * 2019-01-10 2019-04-19 国网福建省电力有限公司漳州供电公司 Power distribution network internal overvoltage kind identification method
CN109871622A (en) * 2019-02-25 2019-06-11 燕山大学 A kind of low-voltage platform area line loss calculation method and system based on deep learning
CN110033128A (en) * 2019-03-18 2019-07-19 西安科技大学 Drag conveyor loaded self-adaptive prediction technique based on limited Boltzmann machine
CN110033128B (en) * 2019-03-18 2023-01-31 西安科技大学 Self-adaptive prediction method for scraper conveyor load based on limited Boltzmann machine
CN109949180A (en) * 2019-03-19 2019-06-28 山东交通学院 A kind of the cool and thermal power load forecasting method and system of ship cooling heating and power generation system
CN110119837A (en) * 2019-04-15 2019-08-13 天津大学 A kind of Spatial Load Forecasting method based on urban land property and development time
CN110119837B (en) * 2019-04-15 2023-01-03 天津大学 Space load prediction method based on urban land property and development time
CN110084413A (en) * 2019-04-17 2019-08-02 南京航空航天大学 Safety of civil aviation risk index prediction technique based on PCA Yu depth confidence network
CN110263995A (en) * 2019-06-18 2019-09-20 广西电网有限责任公司电力科学研究院 Consider the distribution transforming heavy-overload prediction technique of load growth rate and user power utilization characteristic
CN110263995B (en) * 2019-06-18 2022-03-22 广西电网有限责任公司电力科学研究院 Distribution transformer overload prediction method considering load increase rate and user power utilization characteristics
CN110543656A (en) * 2019-07-12 2019-12-06 华南理工大学 LED fluorescent powder glue coating thickness prediction method based on deep learning
CN110782074A (en) * 2019-10-09 2020-02-11 深圳供电局有限公司 Method for predicting user power monthly load based on deep learning
CN110852522B (en) * 2019-11-19 2024-03-29 南京工程学院 Short-term power load prediction method and system
CN110852522A (en) * 2019-11-19 2020-02-28 南京工程学院 Short-term power load prediction method and system
CN111667090A (en) * 2020-03-25 2020-09-15 国网天津市电力公司 Load prediction method based on deep belief network and weight sharing
CN112232547B (en) * 2020-09-09 2023-12-12 国网浙江省电力有限公司营销服务中心 Special transformer user short-term load prediction method based on deep confidence neural network
CN112232547A (en) * 2020-09-09 2021-01-15 国网浙江省电力有限公司营销服务中心 Special transformer user short-term load prediction method based on deep belief neural network
CN112465664B (en) * 2020-11-12 2022-05-03 贵州电网有限责任公司 AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN112465664A (en) * 2020-11-12 2021-03-09 贵州电网有限责任公司 AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN112381297A (en) * 2020-11-16 2021-02-19 国家电网公司华中分部 Method for predicting medium-term and long-term electricity consumption in region based on social information calculation
CN112308342A (en) * 2020-11-25 2021-02-02 广西电网有限责任公司北海供电局 Daily load prediction method based on deep time decoupling and application
CN113011645A (en) * 2021-03-15 2021-06-22 国网河南省电力公司电力科学研究院 Power grid strong wind disaster early warning method and device based on deep learning
CN113177355A (en) * 2021-04-28 2021-07-27 南方电网科学研究院有限责任公司 Power load prediction method
CN113177355B (en) * 2021-04-28 2024-01-12 南方电网科学研究院有限责任公司 Power load prediction method
CN116365519A (en) * 2023-06-01 2023-06-30 国网山东省电力公司微山县供电公司 Power load prediction method, system, storage medium and equipment
CN116365519B (en) * 2023-06-01 2023-09-26 国网山东省电力公司微山县供电公司 Power load prediction method, system, storage medium and equipment

Similar Documents

Publication Publication Date Title
CN106709820A (en) Power system load prediction method and device based on deep belief network
Shen et al. Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network
Lima et al. Evolving fuzzy modeling using participatory learning
CN109359786A (en) A kind of power station area short-term load forecasting method
CN108445752B (en) Random weight neural network integrated modeling method for self-adaptively selecting depth features
CN108985515B (en) New energy output prediction method and system based on independent cyclic neural network
CN110580543A (en) Power load prediction method and system based on deep belief network
CN109063911A (en) A kind of Load aggregation body regrouping prediction method based on gating cycle unit networks
CN112434848B (en) Nonlinear weighted combination wind power prediction method based on deep belief network
CN110969290A (en) Runoff probability prediction method and system based on deep learning
CN112232561A (en) Power load probability prediction method based on constrained parallel LSTM quantile regression
Liu et al. Model fusion and multiscale feature learning for fault diagnosis of industrial processes
Wei et al. K-NN based neuro-fuzzy system for time series prediction
CN115186803A (en) Data center computing power load demand combination prediction method and system considering PUE
Li et al. Weighted fuzzy production rule extraction using modified harmony search algorithm and BP neural network framework
CN111141879A (en) Deep learning air quality monitoring method, device and equipment
CN117574776A (en) Task planning-oriented model self-learning optimization method
CN113361776A (en) Power load probability prediction method based on user power consumption behavior clustering
CN113033898A (en) Electrical load prediction method and system based on K-means clustering and BI-LSTM neural network
Gu et al. Fuzzy time series forecasting based on information granule and neural network
CN116632834A (en) Short-term power load prediction method based on SSA-BiGRU-Attention
CN109033413B (en) Neural network-based demand document and service document matching method
CN114254828B (en) Power load prediction method based on mixed convolution feature extractor and GRU
Wang et al. Tcn-gawo: Genetic algorithm enhanced weight optimization for temporal convolutional network
CN103198357A (en) Optimized and improved fuzzy classification model construction method based on nondominated sorting genetic algorithm II (NSGA- II)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20170524)