CN109102126B - Theoretical line loss rate prediction model based on deep transfer learning


Publication number
CN109102126B
Authority
CN
China
Prior art keywords
data
dnn
deep
model
loss rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810999797.9A
Other languages
Chinese (zh)
Other versions
CN109102126A (en)
Inventor
卢志刚
杨英杰
丁艺楠
顾媛媛
杨宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University
Priority to CN201810999797.9A
Publication of CN109102126A
Application granted
Publication of CN109102126B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Operations Research (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a theoretical line loss rate prediction model based on deep transfer learning, and relates to the technical field of application of artificial intelligence algorithms in power systems. First, a DBN-DNN deep learning model is built by stacking RBM models into a DBN deep belief network and connecting its output layer to a DNN, and the model is trained on source data. Then, in order to avoid being trapped in a local optimum during deep learning model training, the concept of transfer learning is introduced: combined with the data to be predicted, the distribution difference between the source and target data is measured by the MMD method, the source training data are screened, and the screened training data are used to fine-tune the trained DNN deep neural network, finally yielding the TDBN-DNN-based deep transfer learning model. Finally, power grid operation data are used as model input to predict the line loss rate. The method meets the demands of a strong power grid, efficient operation, energy conservation and environmental protection, and smart grid construction.

Description

Theoretical line loss rate prediction model based on deep transfer learning
Technical Field
The invention relates to the technical field of application of artificial intelligence algorithms in power systems, in particular to a method for predicting the theoretical line loss rate based on deep transfer learning.
Background
At the 208th scientific conference, themed 'Directions and Key Points of Artificial Intelligence Application and Research in the Electric Power Field', held by the national electric academy of sciences on December 6, 2017, experts gave special reports on the technical progress of artificial intelligence and on its application practices and prospects in power systems, reached a preliminary consensus, and focused on theoretical and technical applications such as deep learning, reinforcement learning, transfer learning, and small-sample learning. At present, driven by the demands of a strong power grid, efficient operation, energy conservation and environmental protection, and smart grid construction, power grid operation generates a large amount of data, so finding a new fast calculation method is an urgent problem in the field of line loss research. The difficulty lies in improving the speed and precision of multidimensional data processing so as to ensure the timeliness and accuracy of theoretical line loss rate calculation. Deep learning theory, as the latest research achievement in the fields of pattern recognition and machine learning, has achieved great results in big data processing in fields such as image and speech processing through its strong modeling and representation capabilities.
However, conventional machine learning methods typically rest on two premises: (1) the samples in the training dataset and the testing dataset must follow the same probability distribution; (2) sufficient training samples (on the order of tens of thousands) are required. These two preconditions often do not hold in practical applications. Transfer learning emerged to overcome these limitations of conventional machine learning.
In summary, it is necessary to provide a line loss rate algorithm based on deep transfer learning.
Disclosure of Invention
The invention aims to provide a method for predicting the theoretical line loss rate based on deep transfer learning. Transfer learning is utilized to overcome the two problems of conventional machine learning described above, with the goal of meeting the demands of a strong power grid, efficient operation, energy conservation and environmental protection, and smart grid construction.
To solve the above technical problem against the background that power grid operation generates a large amount of data, the technical scheme adopted by the invention is as follows: a method for predicting the theoretical line loss rate based on deep transfer learning, characterized by comprising the following steps:
step 1, establishing a DBN deep belief network formed by stacking a plurality of RBM models, where the parameter θ of each RBM model is obtained by taking derivatives of the log-likelihood function with the contrastive divergence algorithm;
step 2, connecting the output layer of the DBN deep belief network to the input layer of the DNN model to form the DBN-DNN deep learning model; the DNN model consists of several layers of ordinary neural networks, the last layer being the output layer;
step 3, freezing the lower DBN in the DBN-DNN deep learning model, measuring the distribution distance between the source data and the task prediction data with the MMD method, and migrating the source samples with $\rho_i > 0$; based on the migrated data, a model that fine-tunes the DNN structure in the DBN-DNN is obtained, namely the TDBN-DNN transfer deep learning model;
step 4, using the TDBN-DNN transfer deep learning model to simulate the nonlinear mapping relation between the load data, generator output data and bus voltage data of the operating power grid and the line loss rate, and predicting the line loss rate.
In a further technical scheme, the specific steps of step 1 are as follows:
firstly, building a DBN deep confidence network
The RBM model is an energy model. For a set of known states (v, h), its energy function is defined as $E_\theta(v,h)$, where θ is the set of network parameters $\theta = \{a_i, b_j, w_{ij}\}$, and the joint probability distribution of the visible and hidden layers is defined as $p(v,h)$, i.e.:

$$E_\theta(v,h) = -\sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j - \sum_{i=1}^{m}\sum_{j=1}^{n} v_i w_{ij} h_j \quad (1)$$

$$p(v,h) = \frac{1}{Z_\theta}\, e^{-E_\theta(v,h)} \quad (2)$$

In formula (1), m is the number of input units, n is the number of output units, $a_i$ is the real-valued bias of each input unit, $b_j$ is the real-valued bias of each hidden unit, $v_i$ represents the input data, $h_j$ is the hidden-unit output data, and $w_{ij}$ is a real-valued weight matrix;

in formula (2), $Z_\theta$ is the partition function,

$$Z_\theta = \sum_{v,h} e^{-E_\theta(v,h)}$$
From the conditional independence between the layers of the restricted Boltzmann machine, when the input data are given, the node values of the output layer satisfy the conditional probability:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{m} v_i w_{ij}\Big) \quad (3)$$

In formula (3), $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid activation function. After the output-layer data are determined, the conditional probability of the input-node values is:

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{n} w_{ij} h_j\Big) \quad (4)$$
Given a set of training samples $G = \{g_1, g_2, \ldots, g_s\}$, training the RBM model means adjusting the parameter θ to fit the given training samples, i.e., obtaining θ by maximizing the likelihood function L(θ) of the network:

$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{t=1}^{s} p(g_t) \quad (5)$$

To simplify the calculation, this is written in logarithmic form:

$$\ln L(\theta) = \sum_{t=1}^{s} \ln p(g_t) = \sum_{t=1}^{s}\Big(\ln \sum_{h} e^{-E_\theta(g_t,h)} - \ln \sum_{v,h} e^{-E_\theta(v,h)}\Big) \quad (6)$$
two, CD training algorithm
The contrastive divergence algorithm takes the derivatives of the log-likelihood function with respect to the parameter θ as follows:

$$\frac{\partial \ln L(\theta)}{\partial w_{ij}} = \langle v_i h_j\rangle_{data} - \langle v_i h_j\rangle_{model} \quad (7)$$

$$\frac{\partial \ln L(\theta)}{\partial a_i} = \langle v_i\rangle_{data} - \langle v_i\rangle_{model} \quad (8)$$

$$\frac{\partial \ln L(\theta)}{\partial b_j} = \langle h_j\rangle_{data} - \langle h_j\rangle_{model} \quad (9)$$

The update rule for each parameter is:

$$\Delta w_{ij} = \varepsilon_{CD}\big(\langle v_i h_j\rangle_{data} - \langle v_i h_j\rangle_{recon}\big) \quad (10)$$

$$\Delta a_i = \varepsilon_{CD}\big(\langle v_i\rangle_{data} - \langle v_i\rangle_{recon}\big) \quad (11)$$

$$\Delta b_j = \varepsilon_{CD}\big(\langle h_j\rangle_{data} - \langle h_j\rangle_{recon}\big) \quad (12)$$

In formulas (10)-(12), $\varepsilon_{CD}$ is the learning step size, $\langle \cdot \rangle_{data}$ denotes the expectation under the data distribution, and $\langle \cdot \rangle_{recon}$ denotes the expectation under the distribution defined by the model after one-step reconstruction.
In a further technical scheme, the specific steps of step 2 are as follows:
Connect the input layer of the DNN to the output layer of the DBN to establish the DBN-DNN deep learning model. The training process of the DNN comprises forward propagation and backward propagation, as follows:
one, forward propagation process
$$\delta(p) = \frac{1}{1 + e^{-p}} \quad (13)$$

$$f(p) = p \quad (14)$$

$$h_{w,b}(p) = \delta(wp + b) \quad (15)$$

$$a_j^{(l)} = \delta\Big(\sum_{i} w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}\Big) \quad (16)$$

$$h_j^{(k_l)} = f\Big(\sum_{i} w_{ij}^{(k_l)} a_i^{(k_l-1)} + b_j^{(k_l)}\Big) \quad (17)$$

In formula (16), $w_{ij}^{(l)}$ represents the connection weight between the jth node of the lth layer and the ith node of the (l-1)th layer, and $b_j^{(l)}$ represents the intercept term of the jth node of the lth layer; in formula (17), $h_j^{(k_l)}$ represents the output value of the jth node of the $k_l$-th (output) layer.
Two, backward propagation process
For the k-th layer as the output layerlThe residual error calculation formula of the output unit i of the layer is as follows:
Figure BDA0001782695150000052
for l ═ kl-1,kl-2,kl-3The residual error calculation for each layer …,2, i-th node of the l-th layer is as follows:
Figure BDA0001782695150000053
for a fixed training sample set { (u)1,y1),…,(uc,yc) And (4) solving the deep neural network by using a small batch gradient descent method, wherein the loss function of the deep neural network comprises c samples:
Figure BDA0001782695150000054
in the equation (20), the weight attenuation parameter λ is used for the importance of the two terms in the control formula (20); thus, the update for parameters w and b for each iteration in the small batch gradient descent method is as follows:
Figure BDA0001782695150000055
Figure BDA0001782695150000056
the further technical scheme is that the specific process of the step 3 is as follows:
firstly, measuring the distribution difference of source data and task data:
Suppose there is a source dataset $X^{(s)} = \{x_1^{(s)}, x_2^{(s)}, \ldots, x_{N_s}^{(s)}\}$ satisfying the distribution d, and a target dataset $X^{(t)} = \{x_1^{(t)}, x_2^{(t)}, \ldots, x_{N_t}^{(t)}\}$ satisfying the distribution q. The maximum mean discrepancy (MMD) of $X^{(s)}$ and $X^{(t)}$ can be expressed as:

$$\mathrm{MMD}\big(X^{(s)}, X^{(t)}\big) = \bigg\| \frac{1}{N_s}\sum_{i=1}^{N_s} \phi\big(x_i^{(s)}\big) - \frac{1}{N_t}\sum_{j=1}^{N_t} \phi\big(x_j^{(t)}\big) \bigg\|_{H} \quad (23)$$

In formula (23), H denotes the reproducing kernel Hilbert space (RKHS), and $\phi(\cdot): X \to H$ denotes the nonlinear feature mapping from the original feature space into the RKHS. Since the RKHS is typically a high-dimensional or even infinite-dimensional space, the corresponding kernel is chosen as the Gaussian kernel, which represents an infinite dimension:

$$K(x, x') = \exp\bigg(-\frac{\|x - x'\|^2}{2\sigma^2}\bigg) \quad (24)$$

In formula (24), σ is the width parameter of the kernel function. For computational convenience, the squared form of the MMD, i.e., $\mathrm{MMD}^2$, is used; it expands as:

$$\mathrm{MMD}^2 = \frac{1}{N_s^2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_s} \big\langle \phi(x_i^{(s)}), \phi(x_j^{(s)})\big\rangle - \frac{2}{N_s N_t}\sum_{i=1}^{N_s}\sum_{j=1}^{N_t} \big\langle \phi(x_i^{(s)}), \phi(x_j^{(t)})\big\rangle + \frac{1}{N_t^2}\sum_{i=1}^{N_t}\sum_{j=1}^{N_t} \big\langle \phi(x_i^{(t)}), \phi(x_j^{(t)})\big\rangle \quad (25)$$

Introducing the kernel trick,

$$\big\langle \phi(x), \phi(x')\big\rangle_{H} = \big\langle K(x,\cdot), K(\cdot,x')\big\rangle_{H} = K(x, x') \quad (26)$$

equation (25) reduces to:

$$\mathrm{MMD}^2 = \frac{1}{N_s^2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_s} K\big(x_i^{(s)}, x_j^{(s)}\big) - \frac{2}{N_s N_t}\sum_{i=1}^{N_s}\sum_{j=1}^{N_t} K\big(x_i^{(s)}, x_j^{(t)}\big) + \frac{1}{N_t^2}\sum_{i=1}^{N_t}\sum_{j=1}^{N_t} K\big(x_i^{(t)}, x_j^{(t)}\big) \quad (27)$$
Second, migrating source data by the maximum mean discrepancy contribution coefficient

In order to find the data in the source data whose distribution is irrelevant or weakly relevant to the target data, the source data are screened by defining a maximum mean discrepancy contribution coefficient (CCMMD) for each source sample.

Let $\rho_i$ denote the maximum mean discrepancy contribution coefficient of the ith sample, and suppose the MMD with the ith source data sample missing, $\mathrm{MMD}_{\gamma\neq i}$, is:

$$\mathrm{MMD}_{\gamma\neq i} = \bigg\| \frac{1}{N_s - 1}\sum_{\gamma\neq i} \phi\big(x_\gamma^{(s)}\big) - \frac{1}{N_t}\sum_{j=1}^{N_t} \phi\big(x_j^{(t)}\big) \bigg\|_{H} \quad (28)$$

In formula (28), $\mathrm{MMD}_{\gamma\neq i}$ represents the maximum mean distance with the ith source data sample missing, where $\gamma = 1, 2, \ldots, N_s$; then

$$\rho_i = \mathrm{MMD}_{\gamma\neq i}^2 - \mathrm{MMD}^2 \quad (29)$$

In formula (29), $\rho_i$ represents the maximum mean discrepancy contribution coefficient of the ith sample. If $\rho_i > 0$, the ith sample makes a positive contribution to the MMD (its removal would enlarge the distribution distance); conversely, if $\rho_i \leq 0$, the ith sample makes a "negative" contribution to the MMD. By calculating $\rho_i$ for each source sample and migrating the samples with $\rho_i > 0$, source data whose distribution is closer to that of the target data are obtained;
Finally, the DBN layer of the DBN-DNN is frozen, and based on the migrated data a model that fine-tunes the DNN structure in the DBN-DNN is obtained, namely the TDBN-DNN transfer deep learning model.
In a further technical scheme, the specific process of step 4 is as follows:
(1) data processing
Input data load data, power output data and bus voltage data are subjected to standardization treatment, and the formula is (32):
Figure BDA0001782695150000072
in the formula (32), αi,jThe ith sample data representing the jth data characteristic, i ═ 1,2, …, MC,
j=1,2,…,2r,α′i,jdata normalized by the ith sample data representing the jth data feature, αmin,jRepresents the minimum value of the j-th data characteristic in the sample data, alphamax,jRepresenting the maximum value of the jth data characteristic in the sample data;
(2) Simulating, with the TDBN-DNN transfer deep learning model, the nonlinear mapping relation between the load data, generator output data and bus voltage data of the operating power grid and the line loss rate

Determining the number of hidden layers and nodes of the TDBN-DNN transfer deep learning model according to the scale and real-time requirements of the samples, and initializing the network parameters; then taking the normalized load data, generator output data and bus voltage sample data as input data and the corresponding section line loss rate as label data, and training the whole DBN deep belief network model layer by layer with a greedy unsupervised learning algorithm; finally, using the feature vector output by the DBN as a good initial value for the DNN, and training the DNN with the BP algorithm, supervised by the corresponding section line loss rate, to fit the label data;

(3) Line loss rate prediction of the TDBN-DNN transfer deep learning model

After the input data of the task target are normalized, they are input into the TDBN-DNN line loss rate prediction model to obtain the predicted line loss rate value.
Compared with the prior art, the invention has the following beneficial effects:
1. The considered factors are comprehensive: the sample input data are consistent with the data used in power flow calculation, so the model can fit the nonlinear mapping relation between input and output more effectively.
2. The line loss rate prediction model needs no power flow calculation process in the line loss calculation; it computes the result directly through the model's nonlinear fitting capacity, which improves the operation speed.
3. A deep learning line loss rate prediction model is provided which, compared with a shallow network, can effectively simulate the complex nonlinear mapping relation between power grid operation data and the line loss rate.
4. Before the task to be predicted is handled, the distribution distance between the source training data and the target data is measured by the MMD method from transfer learning, and the training data in the source data whose distribution is closer to the target data are screened by the maximum mean discrepancy contribution coefficient (CCMMD) to fine-tune the DNN. This alleviates the problems that deep learning requires a large amount of sample data and that inconsistent distributions of source and task data easily cause over-fitting, and it improves the prediction accuracy of the line loss rate.
Drawings
FIG. 1 is a structural diagram of the RBM, the basic building block of the DBN, in the method of the present invention;
FIG. 2 is a DBN-DNN model structure of the method of the present invention;
fig. 3 is a flow chart of the method of the present invention.
Detailed description of the preferred embodiments
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
As shown in fig. 3, the method of the present invention comprises the following steps:
Step 1, establishing the DBN deep belief network model and training it with the CD algorithm; the specific modeling is as follows:
one, deep belief network
The DBN is a generative graphical model with multiple hidden layers, formed by stacking multiple restricted Boltzmann machines (RBMs), and has strong feature extraction capability. The structure of the RBM is shown in FIG. 1.
The RBM model is an energy model. For a set of known states (v, h), its energy function can be defined as $E_\theta(v,h)$, where θ is the set of network parameters $\theta = \{a_i, b_j, w_{ij}\}$, and the joint probability distribution of the visible and hidden layers can be defined as $p(v,h)$, i.e.:

$$E_\theta(v,h) = -\sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j - \sum_{i=1}^{m}\sum_{j=1}^{n} v_i w_{ij} h_j \quad (1)$$

$$p(v,h) = \frac{1}{Z_\theta}\, e^{-E_\theta(v,h)} \quad (2)$$

In formula (1), m is the number of input units, n is the number of output units, $a_i$ is the real-valued bias of each input unit, $b_j$ is the real-valued bias of each hidden unit, $v_i$ represents the input data, $h_j$ is the hidden-unit output data, and $w_{ij}$ is a real-valued weight matrix;

in formula (2), $Z_\theta$ is the partition function,

$$Z_\theta = \sum_{v,h} e^{-E_\theta(v,h)}$$
From the conditional independence between the layers of the restricted Boltzmann machine, when the input data are given, the node values of the output layer satisfy the conditional probability:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{m} v_i w_{ij}\Big) \quad (3)$$

In formula (3), $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid activation function. Correspondingly, after the output-layer data are determined, the conditional probability of the input-node values is:

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{n} w_{ij} h_j\Big) \quad (4)$$
Given a set of training samples $G = \{g_1, g_2, \ldots, g_s\}$, training the RBM means adjusting the parameter θ to fit the given training samples, i.e., obtaining θ by maximizing the likelihood function L(θ) of the network:

$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{t=1}^{s} p(g_t) \quad (5)$$

To simplify the calculation, this is written in logarithmic form:

$$\ln L(\theta) = \sum_{t=1}^{s} \ln p(g_t) = \sum_{t=1}^{s}\Big(\ln \sum_{h} e^{-E_\theta(g_t,h)} - \ln \sum_{v,h} e^{-E_\theta(v,h)}\Big) \quad (6)$$
two, CD training algorithm
The contrastive divergence (CD) algorithm takes the derivatives of the log-likelihood function with respect to the parameter θ as follows:

$$\frac{\partial \ln L(\theta)}{\partial w_{ij}} = \langle v_i h_j\rangle_{data} - \langle v_i h_j\rangle_{model} \quad (7)$$

$$\frac{\partial \ln L(\theta)}{\partial a_i} = \langle v_i\rangle_{data} - \langle v_i\rangle_{model} \quad (8)$$

$$\frac{\partial \ln L(\theta)}{\partial b_j} = \langle h_j\rangle_{data} - \langle h_j\rangle_{model} \quad (9)$$

The update rule for each parameter is:

$$\Delta w_{ij} = \varepsilon_{CD}\big(\langle v_i h_j\rangle_{data} - \langle v_i h_j\rangle_{recon}\big) \quad (10)$$

$$\Delta a_i = \varepsilon_{CD}\big(\langle v_i\rangle_{data} - \langle v_i\rangle_{recon}\big) \quad (11)$$

$$\Delta b_j = \varepsilon_{CD}\big(\langle h_j\rangle_{data} - \langle h_j\rangle_{recon}\big) \quad (12)$$

In formulas (10)-(12), $\varepsilon_{CD}$ is the learning step size, $\langle \cdot \rangle_{data}$ denotes the expectation under the data distribution, and $\langle \cdot \rangle_{recon}$ denotes the expectation under the distribution defined by the model after one-step reconstruction.
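The CD-1 procedure above can be summarized in a short sketch. The following Python code (a minimal illustration with our own class and variable names, not code from the patent) applies formulas (3), (4) and (10)-(12) to train one Bernoulli RBM on a single sample per call:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, m, n, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = 0.01 * self.rng.standard_normal((m, n))  # real-valued weights w_ij
        self.a = np.zeros(m)                              # input (visible) biases a_i
        self.b = np.zeros(n)                              # hidden biases b_j

    def cd1_update(self, v0, eps_cd=0.1):
        # Positive phase: p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij), formula (3)
        ph0 = sigmoid(self.b + v0 @ self.w)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # One-step reconstruction: p(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j), formula (4)
        pv1 = sigmoid(self.a + h0 @ self.w.T)
        ph1 = sigmoid(self.b + pv1 @ self.w)
        # Update rules (10)-(12): <.>_data minus <.>_recon, scaled by the step eps_CD
        self.w += eps_cd * (np.outer(v0, ph0) - np.outer(pv1, ph1))
        self.a += eps_cd * (v0 - pv1)
        self.b += eps_cd * (ph0 - ph1)
```

Stacking several such RBMs, each trained on the hidden activations of the one below it, yields the greedy layer-wise pre-training of the DBN used in step 2.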
Step 2, building a DBN-DNN deep learning model on the basis of the DBN, as shown in fig. 2, wherein the specific process of DNN model construction is as follows:
Connect the input layer of the DNN to the output layer of the DBN to establish the DBN-DNN deep learning model. The training process of the DNN comprises forward propagation and backward propagation, as follows:
one, forward propagation process
$$\delta(p) = \frac{1}{1 + e^{-p}} \quad (13)$$

$$f(p) = p \quad (14)$$

$$h_{w,b}(p) = \delta(wp + b) \quad (15)$$

$$a_j^{(l)} = \delta\Big(\sum_{i} w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}\Big) \quad (16)$$

$$h_j^{(k_l)} = f\Big(\sum_{i} w_{ij}^{(k_l)} a_i^{(k_l-1)} + b_j^{(k_l)}\Big) \quad (17)$$

In formula (16), $w_{ij}^{(l)}$ represents the connection weight between the jth node of the lth layer and the ith node of the (l-1)th layer, and $b_j^{(l)}$ represents the intercept term of the jth node of the lth layer; in formula (17), $h_j^{(k_l)}$ represents the output value of the jth node of the $k_l$-th (output) layer.
Two, backward propagation process
For the k-thlThe residual error calculation formula of the output unit i of the layer (output layer) is as follows:
Figure BDA0001782695150000123
for l ═ kl-1,kl-2,kl-3The residual error calculation for each layer …,2, i-th node of the l-th layer is as follows:
Figure BDA0001782695150000124
for a fixed training sample set { (u)1,y1),…,(uc,yc) And (4) solving the deep neural network by using a small batch gradient descent method, wherein the loss function of the deep neural network comprises c samples:
Figure BDA0001782695150000125
in which (20) the weight decay parameter λ is used to control the importance of both terms in (20); thus, the update for parameters w and b for each iteration in the small batch gradient descent method is as follows:
Figure BDA0001782695150000126
Figure BDA0001782695150000127
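To make the update concrete, the following Python sketch (illustrative only; the one-hidden-layer architecture and names are our own assumptions) performs one mini-batch gradient-descent step for a regression network with the sigmoid hidden activation (13), identity output (14), weight-decay loss (20) and updates (21)-(22):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_step(w1, b1, w2, b2, U, Y, alpha=0.05, lam=1e-4):
    """One mini-batch step: U is a (c, d) input batch, Y a (c,) target vector."""
    c = U.shape[0]
    # Forward propagation: sigmoid hidden layer, identity output f(p) = p
    z1 = U @ w1 + b1                          # (c, n_hidden) pre-activations
    a1 = sigmoid(z1)
    yhat = a1 @ w2 + b2                       # (c,) predicted line loss rates
    # Residuals: output layer (18) with f'(z) = 1, hidden layer (19)
    d2 = yhat - Y                             # (c,)
    d1 = np.outer(d2, w2) * a1 * (1.0 - a1)   # (c, n_hidden)
    # Gradients of loss (20): batch-averaged data term plus weight decay
    gw2 = a1.T @ d2 / c + lam * w2
    gb2 = d2.mean()
    gw1 = U.T @ d1 / c + lam * w1
    gb1 = d1.mean(axis=0)
    # Parameter updates (21)-(22) with learning rate alpha
    return w1 - alpha * gw1, b1 - alpha * gb1, w2 - alpha * gw2, b2 - alpha * gb2
```

In the DBN-DNN model, the same back-propagation updates fine-tune the stacked layers after the DBN pre-training has supplied their initial weights.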
Further, in step 3, the specific process of the transfer deep learning model is as follows:

The DBN-DNN deep learning model is first trained with the source data, the training process being as described in steps 1 and 2 above.
Firstly, measuring the distribution difference of source data and task data:
Suppose there is a source dataset $X^{(s)} = \{x_1^{(s)}, x_2^{(s)}, \ldots, x_{N_s}^{(s)}\}$ satisfying the distribution d, and a target dataset $X^{(t)} = \{x_1^{(t)}, x_2^{(t)}, \ldots, x_{N_t}^{(t)}\}$ satisfying the distribution q. The maximum mean discrepancy (MMD) of $X^{(s)}$ and $X^{(t)}$ can be expressed as:

$$\mathrm{MMD}\big(X^{(s)}, X^{(t)}\big) = \bigg\| \frac{1}{N_s}\sum_{i=1}^{N_s} \phi\big(x_i^{(s)}\big) - \frac{1}{N_t}\sum_{j=1}^{N_t} \phi\big(x_j^{(t)}\big) \bigg\|_{H} \quad (23)$$

In formula (23), H denotes the reproducing kernel Hilbert space (RKHS), and $\phi(\cdot): X \to H$ denotes the nonlinear feature mapping from the original feature space into the RKHS. Since the RKHS is typically a high-dimensional or even infinite-dimensional space, the corresponding kernel is typically chosen as the Gaussian kernel, which represents an infinite dimension:

$$K(x, x') = \exp\bigg(-\frac{\|x - x'\|^2}{2\sigma^2}\bigg) \quad (24)$$

In formula (24), σ is the width parameter of the kernel function. For computational convenience, the squared form of the MMD, i.e., $\mathrm{MMD}^2$, is generally used; it expands as:

$$\mathrm{MMD}^2 = \frac{1}{N_s^2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_s} \big\langle \phi(x_i^{(s)}), \phi(x_j^{(s)})\big\rangle - \frac{2}{N_s N_t}\sum_{i=1}^{N_s}\sum_{j=1}^{N_t} \big\langle \phi(x_i^{(s)}), \phi(x_j^{(t)})\big\rangle + \frac{1}{N_t^2}\sum_{i=1}^{N_t}\sum_{j=1}^{N_t} \big\langle \phi(x_i^{(t)}), \phi(x_j^{(t)})\big\rangle \quad (25)$$

Introducing the kernel trick,

$$\big\langle \phi(x), \phi(x')\big\rangle_{H} = \big\langle K(x,\cdot), K(\cdot,x')\big\rangle_{H} = K(x, x') \quad (26)$$

equation (25) reduces to:

$$\mathrm{MMD}^2 = \frac{1}{N_s^2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_s} K\big(x_i^{(s)}, x_j^{(s)}\big) - \frac{2}{N_s N_t}\sum_{i=1}^{N_s}\sum_{j=1}^{N_t} K\big(x_i^{(s)}, x_j^{(t)}\big) + \frac{1}{N_t^2}\sum_{i=1}^{N_t}\sum_{j=1}^{N_t} K\big(x_i^{(t)}, x_j^{(t)}\big) \quad (27)$$
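A direct implementation of formula (27) only needs the three Gram-matrix blocks. The following Python sketch (the helper names are our own; σ is a free parameter) computes the squared MMD with the Gaussian kernel (24):

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix K(a, b) = exp(-||a - b||^2 / (2 sigma^2)), formula (24)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(Xs, Xt, sigma=1.0):
    """Squared maximum mean discrepancy between source Xs and target Xt, formula (27)."""
    Ns, Nt = len(Xs), len(Xt)
    return (gaussian_gram(Xs, Xs, sigma).sum() / Ns**2
            - 2.0 * gaussian_gram(Xs, Xt, sigma).sum() / (Ns * Nt)
            + gaussian_gram(Xt, Xt, sigma).sum() / Nt**2)
```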
Second, migrating source data by the maximum mean discrepancy contribution coefficient

In order to find the data in the source data whose distribution is irrelevant or weakly relevant to the target data, the source data are screened here by defining a maximum mean discrepancy contribution coefficient (CCMMD) for each source sample.

Let $\rho_i$ denote the maximum mean discrepancy contribution coefficient of the ith sample, and suppose the MMD with the ith source data sample missing, $\mathrm{MMD}_{\gamma\neq i}$, is:

$$\mathrm{MMD}_{\gamma\neq i} = \bigg\| \frac{1}{N_s - 1}\sum_{\gamma\neq i} \phi\big(x_\gamma^{(s)}\big) - \frac{1}{N_t}\sum_{j=1}^{N_t} \phi\big(x_j^{(t)}\big) \bigg\|_{H} \quad (28)$$

In formula (28), $\mathrm{MMD}_{\gamma\neq i}$ represents the maximum mean distance with the ith source data sample missing, where $\gamma = 1, 2, \ldots, N_s$; then

$$\rho_i = \mathrm{MMD}_{\gamma\neq i}^2 - \mathrm{MMD}^2 \quad (29)$$

In formula (29), $\rho_i$ represents the maximum mean discrepancy contribution coefficient of the ith sample. If $\rho_i > 0$, the ith sample makes a positive contribution to the MMD (its removal would enlarge the distribution distance); conversely, if $\rho_i \leq 0$, the ith sample makes a "negative" contribution to the MMD. By calculating $\rho_i$ for each source sample and migrating the samples with $\rho_i > 0$, source data whose distribution is closer to that of the target data are obtained.
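Built on the mmd2 helper above, a leave-one-out sketch of the CCMMD screening might look as follows (the sign convention for ρ_i and the function names reflect our reading of formulas (28)-(29); they are not code from the patent):

```python
import numpy as np

def ccmmd_screen(Xs, Xt, sigma=1.0):
    """Return the mask of source samples with rho_i > 0 and the rho values."""
    base = mmd2(Xs, Xt, sigma)                     # MMD^2 over all source samples
    rho = np.empty(len(Xs))
    for i in range(len(Xs)):
        Xs_wo = np.delete(Xs, i, axis=0)           # leave the i-th source sample out
        rho[i] = mmd2(Xs_wo, Xt, sigma) - base     # contribution coefficient rho_i, (29)
    return rho > 0, rho
```

The samples selected by the mask (rho > 0) are the migrated source data used to fine-tune the DNN.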
Finally, the DBN layer of the DBN-DNN is frozen, and the migrated source data are used to fine-tune the higher-layer DNN neural network, obtaining the TDBN-DNN deep transfer learning model.
Further, in step 4, the theoretical line loss prediction model based on TDBN-DNN is established by the following specific process:
first, theoretical line loss problem description
Using the deep learning model, the nonlinear mapping relation between the load data, generator output data and bus voltage data of the operating power grid and the line loss rate is simulated. Considering the strong feature extraction and fitting capabilities of a deep learning model, the active power data, reactive power data and bus voltage data of each node in the equivalent model of the power grid are taken as the input matrix X of the deep model, and the theoretical line loss rate of the corresponding network is taken as the output matrix T. Assume the power grid has r nodes, including e PQ nodes, r-e-1 PV nodes and one balance node, as shown in formulas (30) and (31):

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,2r} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,2r} \\ \vdots & \vdots & & \vdots \\ x_{MC,1} & x_{MC,2} & \cdots & x_{MC,2r} \end{bmatrix} \quad (30)$$

$$T = [\Delta\xi_1, \Delta\xi_2, \ldots, \Delta\xi_{MC}] \quad (31)$$

In formula (30), each row collects the 2r input features (active power, reactive power and bus voltage data) of one section, MC is the number of sections, r is the number of nodes in the power grid, and e is the number of PQ load nodes; in formula (31), $\Delta\xi_i$ is the line loss rate label data of the ith sample.
second, TDBN-DNN model construction and solution
The method comprises the following specific steps:
(1) Data processing. Because the value ranges and units of the load data, generator output data and bus voltage data differ, the input data are normalized to avoid the influence of dimension and to prevent features with large absolute values from dominating features with small ones, as shown in formula (32):

$$\alpha'_{i,j} = \frac{\alpha_{i,j} - \alpha_{min,j}}{\alpha_{max,j} - \alpha_{min,j}} \quad (32)$$

In formula (32), $\alpha_{i,j}$ represents the ith sample of the jth data feature, i = 1, 2, …, MC, j = 1, 2, …, 2r; $\alpha'_{i,j}$ represents the normalized value of the ith sample of the jth data feature; $\alpha_{min,j}$ represents the minimum value of the jth data feature in the sample data; and $\alpha_{max,j}$ represents the maximum value of the jth data feature in the sample data.
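A sketch of formula (32) in Python, applied column-wise over the 2r features (the function name is our own):

```python
import numpy as np

def normalize_features(X):
    """Min-max normalize each column of the (MC, 2r) sample matrix, formula (32)."""
    x_min = X.min(axis=0)                          # alpha_min,j for each feature j
    x_max = X.max(axis=0)                          # alpha_max,j for each feature j
    return (X - x_min) / (x_max - x_min), x_min, x_max
```

The returned x_min and x_max should be reused to normalize the task-target input data in step (4), so that source and target features share the same scale.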
(2) Pre-training the deep learning model. First, the number of hidden layers and nodes of the deep learning network is determined according to the scale and real-time requirements of the samples, and the network parameters are initialized. Then, the normalized sample data of the power grid (generator output, load, voltage, etc.) are taken as input data, the corresponding section line loss rate is taken as label data, and the whole DBN deep belief network model is trained layer by layer with a greedy unsupervised learning algorithm. Finally, the feature vector output by the DBN provides a good initial value for the DNN, and the DNN is trained with the BP algorithm, supervised by the corresponding section line loss rate, to fit the label data.
(3) Transferring the deep learning model. First, the bottom layers of the deep learning model, generally regarded as a generic network, are frozen. Then, the distribution difference between the source training data and the target data is measured with the maximum mean discrepancy (MMD) method, the maximum mean discrepancy contribution coefficient (CCMMD) of each sample in the source data is calculated, and the sample data with $\rho_i > 0$ are selected. Finally, the pre-trained DNN is fine-tuned with the selected source sample data to obtain the deep transfer learning model TDBN-DNN.
(4) Predicting the line loss rate. After the input data of the task target are normalized, they are input into the TDBN-DNN line loss rate prediction model to obtain the predicted line loss rate value.
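Putting steps (1)-(4) together, the overall workflow can be sketched as follows; train_dbn_layerwise, train_dnn, and finetune_dnn are hypothetical placeholders for the DBN pre-training, BP training and fine-tuning routines described above, not an API defined by the patent:

```python
def predict_line_loss_rate(X_source, T_source, X_target):
    # Step (1): min-max normalization, formula (32); reuse source statistics for the target
    Xs, x_min, x_max = normalize_features(X_source)
    Xt = (X_target - x_min) / (x_max - x_min)
    # Step (2): greedy layer-wise DBN pre-training, then supervised BP training of the DNN
    dbn = train_dbn_layerwise(Xs)
    dnn = train_dnn(dbn.transform(Xs), T_source)
    # Step (3): freeze the DBN, screen source samples by CCMMD, fine-tune the DNN (TDBN-DNN)
    keep, _ = ccmmd_screen(Xs, Xt)
    dnn = finetune_dnn(dnn, dbn.transform(Xs[keep]), T_source[keep])
    # Step (4): predict the line loss rate for the task-target sections
    return dnn.predict(dbn.transform(Xt))
```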
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention by those skilled in the art should fall within the protection scope defined by the claims of the present invention without departing from the spirit of the present invention.

Claims (4)

1. A method for predicting the theoretical line loss rate based on deep transfer learning, characterized by comprising the following steps:
step 1, establishing a DBN deep belief network formed by stacking a plurality of RBM models, where the parameter θ of each RBM model is obtained by taking derivatives of the log-likelihood function with the contrastive divergence algorithm;
step 2, connecting the output layer of the DBN deep belief network to the input layer of the DNN model to form the DBN-DNN deep learning model, the DNN model consisting of several layers of ordinary neural networks with the last layer as the output layer;
step 3, freezing the lower DBN in the DBN-DNN deep learning model, measuring the distribution distance between the source data and the task prediction data with the MMD method, and migrating the source samples with $\rho_i > 0$; based on the migrated data, a model that fine-tunes the DNN structure in the DBN-DNN is obtained, namely the TDBN-DNN transfer deep learning model;
step 4, using the TDBN-DNN transfer deep learning model to simulate the nonlinear mapping relation between the load data, generator output data and bus voltage data of the operating power grid and the line loss rate, and predicting the line loss rate;
the specific steps of step 4 are as follows:
(1) data processing
The input load data, generator output data and bus voltage data are normalized according to formula (32):

$$\alpha'_{i,j} = \frac{\alpha_{i,j} - \alpha_{min,j}}{\alpha_{max,j} - \alpha_{min,j}} \quad (32)$$

In formula (32), $\alpha_{i,j}$ represents the ith sample of the jth data feature, i = 1, 2, …, MC, j = 1, 2, …, 2r; $\alpha'_{i,j}$ represents the normalized value of the ith sample of the jth data feature; $\alpha_{min,j}$ represents the minimum value of the jth data feature in the sample data; and $\alpha_{max,j}$ represents the maximum value of the jth data feature in the sample data;
(2) Simulating, with the TDBN-DNN transfer deep learning model, the nonlinear mapping relation between the load data, generator output data and bus voltage data of the operating power grid and the line loss rate

Determining the number of hidden layers and nodes of the TDBN-DNN transfer deep learning model according to the scale and real-time requirements of the samples, and initializing the network parameters; then taking the normalized load data, generator output data and bus voltage sample data as input data and the corresponding section line loss rate as label data, and training the whole DBN deep belief network model layer by layer with a greedy unsupervised learning algorithm; finally, using the feature vector output by the DBN as a good initial value for the DNN, and training the DNN with the BP algorithm, supervised by the corresponding section line loss rate, to fit the label data;

(3) Line loss rate prediction of the TDBN-DNN transfer deep learning model

After the input data of the task target are normalized, they are input into the TDBN-DNN line loss rate prediction model to obtain the predicted line loss rate value.
2. The method for predicting the theoretical line loss rate based on deep transfer learning according to claim 1, wherein the specific steps of step 1 are as follows:
firstly, building a DBN deep confidence network
The RBM model is an energy model. For a set of known states (v, h), its energy function is defined as $E_\theta(v,h)$, where θ is the set of network parameters $\theta = \{a_i, b_j, w_{ij}\}$, and the joint probability distribution of the visible and hidden layers is defined as $p(v,h)$, i.e.:

$$E_\theta(v,h) = -\sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j - \sum_{i=1}^{m}\sum_{j=1}^{n} v_i w_{ij} h_j \quad (1)$$

$$p(v,h) = \frac{1}{Z_\theta}\, e^{-E_\theta(v,h)} \quad (2)$$

In formula (1), m is the number of input units, n is the number of output units, $a_i$ is the real-valued bias of each input unit, $b_j$ is the real-valued bias of each hidden unit, $v_i$ represents the input data, $h_j$ is the hidden-unit output data, and $w_{ij}$ is a real-valued weight matrix;

in formula (2), $Z_\theta$ is the partition function,

$$Z_\theta = \sum_{v,h} e^{-E_\theta(v,h)}$$
From the conditional independence between the layers of the restricted Boltzmann machine, when the input data are given, the node values of the output layer satisfy the conditional probability:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{m} v_i w_{ij}\Big) \quad (3)$$

In formula (3), $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid activation function; after the output-layer data are determined, the conditional probability of the input-node values is:

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{n} w_{ij} h_j\Big) \quad (4)$$
Given a set of training samples $G = \{g_1, g_2, \ldots, g_s\}$, training the RBM model means adjusting the parameter θ to fit the given training samples, i.e., obtaining θ by maximizing the likelihood function L(θ) of the network:

$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{t=1}^{s} p(g_t) \quad (5)$$

To simplify the calculation, this is written in logarithmic form:

$$\ln L(\theta) = \sum_{t=1}^{s} \ln p(g_t) = \sum_{t=1}^{s}\Big(\ln \sum_{h} e^{-E_\theta(g_t,h)} - \ln \sum_{v,h} e^{-E_\theta(v,h)}\Big) \quad (6)$$
two, CD training algorithm
The contrastive divergence algorithm takes the derivatives of the log-likelihood function with respect to the parameter θ as follows:

$$\frac{\partial \ln L(\theta)}{\partial w_{ij}} = \langle v_i h_j\rangle_{data} - \langle v_i h_j\rangle_{model} \quad (7)$$

$$\frac{\partial \ln L(\theta)}{\partial a_i} = \langle v_i\rangle_{data} - \langle v_i\rangle_{model} \quad (8)$$

$$\frac{\partial \ln L(\theta)}{\partial b_j} = \langle h_j\rangle_{data} - \langle h_j\rangle_{model} \quad (9)$$

The update rule for each parameter is:

$$\Delta w_{ij} = \varepsilon_{CD}\big(\langle v_i h_j\rangle_{data} - \langle v_i h_j\rangle_{recon}\big) \quad (10)$$

$$\Delta a_i = \varepsilon_{CD}\big(\langle v_i\rangle_{data} - \langle v_i\rangle_{recon}\big) \quad (11)$$

$$\Delta b_j = \varepsilon_{CD}\big(\langle h_j\rangle_{data} - \langle h_j\rangle_{recon}\big) \quad (12)$$

In formulas (10)-(12), $\varepsilon_{CD}$ is the learning step size, $\langle \cdot \rangle_{data}$ denotes the expectation under the data distribution, and $\langle \cdot \rangle_{recon}$ denotes the expectation under the distribution defined by the model after one-step reconstruction.
3. The method for predicting the theoretical line loss rate based on deep transfer learning according to claim 1, wherein the specific steps of step 2 are as follows:
one, forward propagation process
$$\delta(p) = \frac{1}{1 + e^{-p}} \quad (13)$$

$$f(p) = p \quad (14)$$

$$h_{w,b}(p) = \delta(wp + b) \quad (15)$$

$$a_j^{(l)} = \delta\Big(\sum_{i} w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}\Big) \quad (16)$$

$$h_j^{(k_l)} = f\Big(\sum_{i} w_{ij}^{(k_l)} a_i^{(k_l-1)} + b_j^{(k_l)}\Big) \quad (17)$$

In formula (16), $w_{ij}^{(l)}$ represents the connection weight between the jth node of the lth layer and the ith node of the (l-1)th layer, and $b_j^{(l)}$ represents the intercept term of the jth node of the lth layer; in formula (17), $h_j^{(k_l)}$ represents the output value of the jth node of the $k_l$-th (output) layer;
Two, backward propagation process
For the k-th layer as the output layerlThe residual error calculation formula of the output unit i of the layer is as follows:
Figure FDA0003292996110000047
for l ═ kl-1,kl-2,kl-3In each layer L,2, the residual calculation formula of the ith node in the L-th layer is as follows:
Figure FDA0003292996110000051
for a fixed training sample set { (u)1,y1),L,(uc,yc) Is composed of cThe method comprises the following steps of solving a deep neural network by using a small batch gradient descent method, wherein a loss function of the deep neural network is as follows:
Figure FDA0003292996110000052
in the equation (20), the weight attenuation parameter λ is used for the importance of the two terms in the control formula (20); thus, the update for parameters w and b for each iteration in the small batch gradient descent method is as follows:
Figure FDA0003292996110000053
Figure FDA0003292996110000054
4. The method for predicting the theoretical line loss rate based on deep transfer learning according to claim 1, wherein the specific process of step 3 is as follows:
firstly, measuring the distribution difference of source data and task data:
Suppose there is a source dataset $X^{(s)} = \{x_1^{(s)}, x_2^{(s)}, \ldots, x_{N_s}^{(s)}\}$ satisfying the distribution d, and a target dataset $X^{(t)} = \{x_1^{(t)}, x_2^{(t)}, \ldots, x_{N_t}^{(t)}\}$ satisfying the distribution q. The maximum mean discrepancy MMD of $X^{(s)}$ and $X^{(t)}$ can be expressed as:

$$\mathrm{MMD}\big(X^{(s)}, X^{(t)}\big) = \bigg\| \frac{1}{N_s}\sum_{i=1}^{N_s} \phi\big(x_i^{(s)}\big) - \frac{1}{N_t}\sum_{j=1}^{N_t} \phi\big(x_j^{(t)}\big) \bigg\|_{H} \quad (23)$$

In formula (23), H denotes the reproducing kernel Hilbert space RKHS, and $\phi(\cdot): X \to H$ denotes the nonlinear feature mapping from the original feature space into the RKHS. Since the RKHS is typically a high-dimensional or even infinite-dimensional space, the corresponding kernel is chosen as the Gaussian kernel, which represents an infinite dimension:

$$K(x, x') = \exp\bigg(-\frac{\|x - x'\|^2}{2\sigma^2}\bigg) \quad (24)$$

In formula (24), σ is the width parameter of the kernel function. Here, the squared form of the MMD, i.e., $\mathrm{MMD}^2$, is used; it expands as:

$$\mathrm{MMD}^2 = \frac{1}{N_s^2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_s} \big\langle \phi(x_i^{(s)}), \phi(x_j^{(s)})\big\rangle - \frac{2}{N_s N_t}\sum_{i=1}^{N_s}\sum_{j=1}^{N_t} \big\langle \phi(x_i^{(s)}), \phi(x_j^{(t)})\big\rangle + \frac{1}{N_t^2}\sum_{i=1}^{N_t}\sum_{j=1}^{N_t} \big\langle \phi(x_i^{(t)}), \phi(x_j^{(t)})\big\rangle \quad (25)$$

Introducing the kernel trick,

$$\big\langle \phi(x), \phi(x')\big\rangle_{H} = \big\langle K(x,\cdot), K(\cdot,x')\big\rangle_{H} = K(x, x') \quad (26)$$

equation (25) reduces to:

$$\mathrm{MMD}^2 = \frac{1}{N_s^2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_s} K\big(x_i^{(s)}, x_j^{(s)}\big) - \frac{2}{N_s N_t}\sum_{i=1}^{N_s}\sum_{j=1}^{N_t} K\big(x_i^{(s)}, x_j^{(t)}\big) + \frac{1}{N_t^2}\sum_{i=1}^{N_t}\sum_{j=1}^{N_t} K\big(x_i^{(t)}, x_j^{(t)}\big) \quad (27)$$
Second, migrating source data by the maximum mean discrepancy contribution coefficient

In order to find the data in the source data whose distribution is irrelevant or weakly relevant to the target data, the source data are screened by defining a maximum mean discrepancy contribution coefficient CCMMD for each source sample.

Let $\rho_i$ denote the maximum mean discrepancy contribution coefficient of the ith sample, and suppose the MMD with the ith source data sample missing, $\mathrm{MMD}_{\gamma\neq i}$, is:

$$\mathrm{MMD}_{\gamma\neq i} = \bigg\| \frac{1}{N_s - 1}\sum_{\gamma\neq i} \phi\big(x_\gamma^{(s)}\big) - \frac{1}{N_t}\sum_{j=1}^{N_t} \phi\big(x_j^{(t)}\big) \bigg\|_{H} \quad (28)$$

In formula (28), $\mathrm{MMD}_{\gamma\neq i}$ represents the maximum mean distance with the ith source data sample missing, where $\gamma = 1, 2, \ldots, N_s$; then

$$\rho_i = \mathrm{MMD}_{\gamma\neq i}^2 - \mathrm{MMD}^2 \quad (29)$$

In formula (29), $\rho_i$ represents the maximum mean discrepancy contribution coefficient of the ith sample. If $\rho_i > 0$, the ith sample makes a positive contribution to the MMD (its removal would enlarge the distribution distance); conversely, if $\rho_i \leq 0$, the ith sample makes a "negative" contribution to the MMD. By calculating $\rho_i$ for each source sample and migrating the samples with $\rho_i > 0$, source data whose distribution is closer to that of the target data are obtained;

Finally, the DBN layer of the DBN-DNN is frozen, and based on the migrated data a model that fine-tunes the DNN structure in the DBN-DNN is obtained, namely the TDBN-DNN transfer deep learning model.
CN201810999797.9A 2018-08-30 2018-08-30 Theoretical line loss rate prediction model based on deep transfer learning Active CN109102126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810999797.9A CN109102126B (en) 2018-08-30 2018-08-30 Theoretical line loss rate prediction model based on deep transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810999797.9A CN109102126B (en) 2018-08-30 2018-08-30 Theoretical line loss rate prediction model based on deep transfer learning

Publications (2)

Publication Number Publication Date
CN109102126A CN109102126A (en) 2018-12-28
CN109102126B true CN109102126B (en) 2021-12-10

Family

ID=64864268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810999797.9A Active CN109102126B (en) 2018-08-30 2018-08-30 Theoretical line loss rate prediction model based on deep transfer learning

Country Status (1)

Country Link
CN (1) CN109102126B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800526B (en) * 2019-01-30 2022-11-04 华侨大学 Intelligent design method and system for customizing children's garment paper pattern
CN109871622A (en) * 2019-02-25 2019-06-11 燕山大学 A kind of low-voltage platform area line loss calculation method and system based on deep learning
CN110059802A (en) * 2019-03-29 2019-07-26 阿里巴巴集团控股有限公司 For training the method, apparatus of learning model and calculating equipment
CN110473634B (en) * 2019-04-23 2021-10-08 浙江大学 Genetic metabolic disease auxiliary screening method based on multi-domain fusion learning
CN110348713A (en) * 2019-06-28 2019-10-18 广东电网有限责任公司 A kind of platform area line loss calculation method based on association analysis and data mining
CN110378035A (en) * 2019-07-19 2019-10-25 南京工业大学 Hydrocracking soft measurement modeling method based on deep learning
CN110536257B (en) * 2019-08-21 2022-02-08 成都电科慧安科技有限公司 Indoor positioning method based on depth adaptive network
CN110535146B (en) * 2019-08-27 2022-09-23 哈尔滨工业大学 Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning
CN110399796A (en) * 2019-09-02 2019-11-01 国网上海市电力公司 A kind of electrical energy power quality disturbance recognition methods based on improvement deep learning algorithm
CN110705029B (en) * 2019-09-05 2021-09-07 西安交通大学 Flow field prediction method of oscillating flapping wing energy acquisition system based on transfer learning
CN110879917A (en) * 2019-11-08 2020-03-13 北京交通大学 Electric power system transient stability self-adaptive evaluation method based on transfer learning
CN110969293B (en) * 2019-11-22 2023-07-21 上海交通大学 Short-term generalized power load prediction method based on transfer learning
CN110990135B (en) * 2019-11-28 2023-05-12 中国人民解放军国防科技大学 Spark job time prediction method and device based on deep migration learning
CN110910969A (en) * 2019-12-04 2020-03-24 云南锡业集团(控股)有限责任公司研发中心 Tin-bismuth alloy performance prediction method based on transfer learning
CN111612029B (en) * 2020-03-30 2023-08-04 西南电子技术研究所(中国电子科技集团公司第十研究所) Airborne electronic product fault prediction method
CN111652264B (en) * 2020-04-13 2023-08-18 湖北华中电力科技开发有限责任公司 Negative migration sample screening method based on maximum mean value difference
CN111522290A (en) * 2020-04-24 2020-08-11 大唐环境产业集团股份有限公司 Denitration control method and system based on deep learning method
CN114004328A (en) * 2020-07-27 2022-02-01 华为技术有限公司 AI model updating method, device, computing equipment and storage medium
CN112560079B (en) * 2020-11-03 2024-04-19 浙江工业大学 Hidden false data injection attack method based on deep belief network and migration learning
CN112330488B (en) * 2020-11-05 2022-07-05 贵州电网有限责任公司 Power grid frequency situation prediction method based on transfer learning
CN113393051B (en) * 2021-06-30 2023-07-18 四川大学 Power distribution network investment decision-making method based on deep migration learning
CN113537244B (en) * 2021-07-23 2024-03-15 深圳职业技术学院 Livestock image target detection method and device based on lightweight YOLOv4
CN113505847B (en) * 2021-07-26 2023-04-14 云南电网有限责任公司电力科学研究院 Energy-saving online measuring system and method based on transfer learning
CN114239114A (en) * 2021-12-21 2022-03-25 浙江工业大学台州研究院 Truss stress prediction and lightweight method based on transfer learning fusion model
CN115879569B (en) * 2023-03-08 2023-05-23 齐鲁工业大学(山东省科学院) Online learning method and system for IoT observation data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794534A (en) * 2015-04-16 2015-07-22 国网山东省电力公司临沂供电公司 Power grid security situation predicting method based on improved deep learning model
CN107769972A (en) * 2017-10-25 2018-03-06 武汉大学 A kind of power telecom network equipment fault Forecasting Methodology based on improved LSTM
CN108108858A (en) * 2018-01-22 2018-06-01 佛山科学技术学院 A kind of Short-Term Load Forecasting Method
CN108304927A (en) * 2018-01-25 2018-07-20 清华大学 Bearing fault modality diagnostic method and system based on deep learning

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10796220B2 (en) * 2016-05-24 2020-10-06 Marvell Asia Pte, Ltd. Systems and methods for vectorized FFT for multi-dimensional convolution operations
CN106199174A (en) * 2016-07-01 2016-12-07 广东技术师范学院 Extruder energy consumption predicting abnormality method based on transfer learning
US10452899B2 (en) * 2016-08-31 2019-10-22 Siemens Healthcare Gmbh Unsupervised deep representation learning for fine-grained body part recognition
CN107679859B (en) * 2017-07-18 2020-08-25 中国银联股份有限公司 Risk identification method and system based on migration deep learning
CN107506590A (en) * 2017-08-26 2017-12-22 郑州大学 A kind of angiocardiopathy forecast model based on improvement depth belief network
CN108256556A (en) * 2017-12-22 2018-07-06 上海电机学院 Wind-driven generator group wheel box method for diagnosing faults based on depth belief network
CN108154223B (en) * 2017-12-22 2022-04-15 北京映翰通网络技术股份有限公司 Power distribution network working condition wave recording classification method based on network topology and long time sequence information
CN108446711B (en) * 2018-02-01 2022-04-22 南京邮电大学 Software defect prediction method based on transfer learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794534A (en) * 2015-04-16 2015-07-22 国网山东省电力公司临沂供电公司 Power grid security situation predicting method based on improved deep learning model
CN107769972A (en) * 2017-10-25 2018-03-06 武汉大学 A kind of power telecom network equipment fault Forecasting Methodology based on improved LSTM
CN108108858A (en) * 2018-01-22 2018-06-01 佛山科学技术学院 A kind of Short-Term Load Forecasting Method
CN108304927A (en) * 2018-01-25 2018-07-20 清华大学 Bearing fault modality diagnostic method and system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on transfer reinforcement learning optimization algorithms for power systems; Zhang Xiaoshun; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 2018-07-15; C042-35 *

Also Published As

Publication number Publication date
CN109102126A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN109102126B (en) Theoretical line loss rate prediction model based on deep transfer learning
Zhang et al. Modeling and simulating of reservoir operation using the artificial neural network, support vector regression, deep learning algorithm
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
Li et al. An intelligent transient stability assessment framework with continual learning ability
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
CN112990556A (en) User power consumption prediction method based on Prophet-LSTM model
CN108985515B (en) New energy output prediction method and system based on independent cyclic neural network
CN110298434B (en) Integrated deep belief network based on fuzzy partition and fuzzy weighting
CN113158572A (en) Short-term load prediction method and device
CN112488452B (en) Energy system management multi-time scale optimal decision method based on deep reinforcement learning
CN111525587A (en) Reactive load situation-based power grid reactive voltage control method and system
CN108182490A (en) A kind of short-term load forecasting method under big data environment
CN116610416A (en) Load prediction type elastic expansion system and method based on Kubernetes
Na et al. A novel heuristic artificial neural network model for urban computing
Samsudin et al. A hybrid least squares support vector machines and GMDH approach for river flow forecasting
CN112183721B (en) Construction method of combined hydrological prediction model based on self-adaptive differential evolution
CN117610689A (en) Training method of dynamic neural network model based on information entropy integrated learning
CN110705756B (en) Electric power energy consumption optimization control method based on input convex neural network
Guo et al. Short-Term Water Demand Forecast Based on Deep Neural Network:(029)
CN116663745A (en) LSTM drainage basin water flow prediction method based on PCA_DWT
CN116681159A (en) Short-term power load prediction method based on whale optimization algorithm and DRESN
CN114254828B (en) Power load prediction method based on mixed convolution feature extractor and GRU
CN110059871A (en) Photovoltaic power generation power prediction method
CN115907000A (en) Small sample learning method for optimal power flow prediction of power system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant