CN110490659A - A GAN-based customer load curve generation method - Google Patents

A GAN-based customer load curve generation method

Info

Publication number
CN110490659A
CN110490659A (application CN201910775634.7A)
Authority
CN
China
Prior art keywords
network
curve
follows
load
training
Prior art date
Legal status
Granted
Application number
CN201910775634.7A
Other languages
Chinese (zh)
Other versions
CN110490659B (en)
Inventor
陈建福
曹安瑛
李建标
甘德树
裴星宇
唐捷
刘嘉宁
温柏坚
蔡徽
陈勇
周建明
邹国惠
黄培专
杨昆
唐小川
钱兴博
萧展辉
裴求根
江疆
游雪峰
王大鹏
黄明磊
黄剑文
彭泽武
魏理豪
谢瀚阳
凌华明
廖志戈
皮霄林
Current Assignee
Guangdong Power Grid Co Ltd
Zhuhai Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Zhuhai Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd and Zhuhai Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN201910775634.7A
Publication of CN110490659A
Application granted
Publication of CN110490659B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S50/00Market activities related to the operation of systems integrating technologies related to power network operation or related to communication or information technologies
    • Y04S50/14Marketing, i.e. market research and analysis, surveying, promotions, advertising, buyer profiling, customer management or rewards


Abstract

The present invention relates to a GAN-based customer load curve generation method, belonging to the field of power system demand-side response. The method comprises: load curve normalization, building the generator network, building the discriminator network, alternately training the two networks, and load curve generation. The method is based on a GAN (Generative Adversarial Network): the GAN captures the implicit statistical features of real data and generates large volumes of user loads from those features, so that the generated load data follow the users' intrinsic distribution characteristics and diversity. With this method, massive amounts of virtual load data can be generated when real data are insufficient; the generated data reflect users' real electricity-usage characteristics, embody the statistical features of real loads, and can simulate diverse user consumption behavior, so that they can substitute for real data in distribution system analysis and demand-side response studies.

Description

A GAN-based customer load curve generation method
Technical field
The present invention relates to a GAN-based customer load curve generation method and belongs to the field of power system demand-side response.
Background technique
At present, power grids are developing toward intelligent, distributed, and clean operation. Along with this profound transformation, the objects of power system analysis and research are extending from the system and busbar levels down to terminal distribution networks and end users. Customer load, as an important component of terminal distribution, is receiving more and more attention; many distribution system analysis methods and deployments of demand-side response technology cannot do without the support of customer load data. However, owing to privacy-protection legislation and the limited penetration of smart meters, the real user load data currently available are severely insufficient, which seriously hinders the advancement of related research in this field.
Therefore, a customer load generation method is needed that can produce massive virtual load data when data are scarce. The generated data should reflect users' real electricity-usage characteristics, embody the statistical features of real loads, and simulate diverse user consumption behavior, so that they can substitute for real data in distribution system analysis and demand-side response studies.
Summary of the invention
The purpose of the present invention is to propose a customer load curve generation method based on a GAN (Generative Adversarial Network), which supplements the data set with generated data when real load data are insufficient.
The GAN-based customer load curve generation method proposed by the present invention comprises the following steps:
Step 1: load normalization:
Let the user's existing historical load curve set be X = {x1, x2, …, xM}, where each load curve is written as xi = (xi^1, xi^2, …, xi^T) and T is the load curve length. First find the maximum and minimum of the user's historical loads:
x_max = max over i, t of xi^t, x_min = min over i, t of xi^t
Then adjust every load value into the interval [−1, 1] according to the maximum and minimum loads, as follows:
xi^t ← 2·(xi^t − x_min) / (x_max − x_min) − 1
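The min-max rescaling of step 1, and the inverse mapping used later when curves are generated, can be sketched in Python as follows. The function names and the sample data are illustrative, not from the patent.

```python
import numpy as np

def normalize(curves: np.ndarray):
    """Min-max rescale all historical load curves into [-1, 1]."""
    x_min, x_max = curves.min(), curves.max()
    scaled = 2.0 * (curves - x_min) / (x_max - x_min) - 1.0
    return scaled, x_min, x_max

def denormalize(scaled: np.ndarray, x_min: float, x_max: float) -> np.ndarray:
    """Inverse mapping: recover real load values from [-1, 1]."""
    return (scaled + 1.0) / 2.0 * (x_max - x_min) + x_min

# M = 3 historical curves, each sampled at T = 96 points (assumed resolution)
loads = np.random.uniform(50.0, 400.0, size=(3, 96))
scaled, lo, hi = normalize(loads)
restored = denormalize(scaled, lo, hi)
```

Round-tripping through `denormalize` recovers the original curves, which is what allows generated curves in [−1, 1] to be mapped back to physical load values in step 5.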
Step 2: build the generator network:
Denote the generator network by G. Its input is a noise vector z that follows a Gaussian distribution, and its output is the generated simulated load curve xg, written as:
xg = G(z; θg)
where θg are the parameters of G to be learned;
Since a load curve is a one-dimensional vector, the generator actually implements a mapping from one fixed-length one-dimensional vector to another. In keeping with the characteristics of neural networks, the hidden layers of the generator are chosen to be fully connected layers, as follows:
The curve to be generated has dimension 1×N, so the Gaussian noise vector z also has length 1×N, where each element of z follows the standard normal distribution, i.e. zi ~ N(0, 1);
The first layer of the generator is a fully connected layer f1 with 2N neurons. Denoting its output by p1, this layer is defined as:
p1 = f1(z), where z ∈ R^(1×N), p1 ∈ R^(1×2N)
f1 maps z into a space of twice the original dimension;
The second layer of the generator is a fully connected layer f2 with 4N neurons. Denoting its output by p2, this layer is defined as:
p2 = f2(p1), where p1 ∈ R^(1×2N), p2 ∈ R^(1×4N)
f2 further maps p1 into a space of twice that dimension;
The last layer of the generator is a fully connected layer f3 with N neurons, whose output is the generated curve xg. This layer is defined as:
xg = f3(p2), where p2 ∈ R^(1×4N), xg ∈ R^(1×N)
f3 maps the abstract features in the high-dimensional space back to the original space;
f1, f2 and f3 are essentially linear affine operations whose parameter matrices are A1 ∈ R^(N×2N), A2 ∈ R^(2N×4N) and A3 ∈ R^(4N×N) respectively, i.e. the fully connected layers f1, f2, f3 implement the mappings:
p1 = z·A1, p2 = p1·A2, xg = p2·A3
However, affine transformations alone leave the model linear, while load curves contain substantial nonlinearity. Nonlinear elements must therefore be added to the generator so that it can model nonlinear relationships: a nonlinear activation function is applied after each affine transformation. After the first and second affine transformations the relu activation function is used, as follows:
p′1 = relu(p1), p′2 = relu(p2)
where the relu activation function is:
relu(x) = max(0, x)
After the last affine transformation the tanh activation function is used, that is:
x′g = tanh(xg)
where the tanh activation function is:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
The tanh activation adjusts the range of the generated curve xg into the interval [−1, 1], the same as the real (normalized) curves;
In the generator G, the elements of the affine matrices are the network parameters θg, which are obtained by training;
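A minimal NumPy sketch of the generator's forward pass as described above: three fully connected layers N→2N→4N→N with relu, relu, and tanh activations. The weight matrices here are small random stand-ins for the trained parameters θg, and N = 96 is an assumed curve length.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 96  # load curve length (assumed)

# Affine parameter matrices A1, A2, A3 (random stand-ins for trained values)
A1 = rng.normal(0.0, 0.01, size=(N, 2 * N))
A2 = rng.normal(0.0, 0.01, size=(2 * N, 4 * N))
A3 = rng.normal(0.0, 0.01, size=(4 * N, N))

def relu(x):
    return np.maximum(0.0, x)

def generator(z):
    """G: R^(1xN) noise -> R^(1xN) simulated load curve in [-1, 1]."""
    p1 = relu(z @ A1)        # f1: map to 2N-dimensional space
    p2 = relu(p1 @ A2)       # f2: map to 4N-dimensional space
    return np.tanh(p2 @ A3)  # f3: map back to N, squash into [-1, 1]

z = rng.standard_normal((1, N))  # each z_i ~ N(0, 1)
x_g = generator(z)
```

The tanh on the last layer guarantees the output lies in [−1, 1], matching the normalized real curves of step 1.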
Step 3: build the discriminator network:
Denote the discriminator network by D. Its input is a load curve x, and its output is the probability P that the load curve comes from the real sample set: the more strongly D believes the curve is real, the closer its output is to 1; the more strongly it believes the curve is generated, the closer its output is to 0. This is written as:
P = D(x; θd)
where θd are the parameters of D to be learned;
Since a load curve is a one-dimensional vector, the discriminator actually implements a mapping from a fixed-length one-dimensional vector to the numerical interval [0, 1]. In keeping with the characteristics of neural networks, the hidden layers of the discriminator are chosen to be fully connected layers, as follows:
The curve to be discriminated has dimension 1×N. The first layer of the discriminator is a fully connected layer d1 with 4N neurons. Denoting its output by q1, this layer is defined as:
q1 = d1(x), where x ∈ R^(1×N), q1 ∈ R^(1×4N)
d1 maps x into a space of four times the original dimension;
The second layer of the discriminator is a fully connected layer d2 with 2N neurons. Denoting its output by q2, this layer is defined as:
q2 = d2(q1), where q1 ∈ R^(1×4N), q2 ∈ R^(1×2N)
d2 maps q1 down into a space of half that dimension;
The last layer of the discriminator is a fully connected layer d3 with a single neuron, whose output is the probability P of the curve's origin. This layer is defined as:
P = d3(q2), where q2 ∈ R^(1×2N), P ∈ [0, 1]
d1, d2 and d3 are essentially linear affine operations whose parameter matrices are B1 ∈ R^(N×4N), B2 ∈ R^(4N×2N) and B3 ∈ R^(2N×1) respectively, i.e. the fully connected layers d1, d2, d3 implement the mappings:
q1 = x·B1, q2 = q1·B2, P = q2·B3
Similarly, nonlinear elements are added to the discriminator: after the first and second affine transformations the leaky-relu activation function is used, as follows:
q′1 = leaky-relu(q1), q′2 = leaky-relu(q2)
where the leaky-relu activation function is (with a small positive leak slope k, e.g. 0.01):
leaky-relu(x) = x if x ≥ 0, k·x if x < 0
After the last affine transformation the Sigmoid activation function is used, that is:
P′ = Sigmoid(P)
where the Sigmoid activation function is:
Sigmoid(x) = 1 / (1 + e^(−x))
The Sigmoid activation adjusts the range of P into the interval [0, 1], giving it the character of a probability;
In the discriminator D, the elements of the affine matrices are the network parameters θd, which are obtained by training;
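The discriminator's forward pass can be sketched in the same way: fully connected layers N→4N→2N→1 with leaky-relu, leaky-relu, and sigmoid. The 0.01 leak slope, the random weights, and N = 96 are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 96  # load curve length (assumed)

B1 = rng.normal(0.0, 0.01, size=(N, 4 * N))
B2 = rng.normal(0.0, 0.01, size=(4 * N, 2 * N))
B3 = rng.normal(0.0, 0.01, size=(2 * N, 1))  # single output neuron

def leaky_relu(x, slope=0.01):
    return np.where(x >= 0.0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(x):
    """D: R^(1xN) load curve -> probability in (0, 1) that it is real."""
    q1 = leaky_relu(x @ B1)   # d1: map to 4N dimensions
    q2 = leaky_relu(q1 @ B2)  # d2: map down to 2N dimensions
    return sigmoid(q2 @ B3)   # d3: scalar score squashed into (0, 1)

x = rng.uniform(-1.0, 1.0, size=(1, N))  # a normalized curve in [-1, 1]
p = discriminator(x)
```

The sigmoid on the final scalar gives the output its probability interpretation, exactly as the step 3 text requires.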
Step 4: network training:
A neural network obtains the optimal estimate of its parameters by gradient descent. The model of this method contains two networks that are trained alternately, one side being held fixed while the other trains, so an optimization algorithm based on gradient descent is likewise used;
First, define the training objective of the discriminator: it should drive its output for real curves toward 1 and its output for generated curves toward 0. The binary cross-entropy function is introduced here:
L(ŷ, y) = −y·ln(ŷ) − (1 − y)·ln(1 − ŷ)
where ŷ is the predicted value and y the true value. For a real curve, the variables in the formula above are:
ŷ = D(xi; θd), y = 1
For a generated curve, the variables in the formula above are:
ŷ = D(xg; θd), y = 0
Substituting the real-curve and generated-curve variables into the binary cross-entropy separately and adding the two resulting terms gives:
lossD = −ln(D(xi; θd)) − ln(1 − D(xg; θd))
The training objective of the discriminator is to minimize this loss function. Considering the difficulty of differentiating the second term on the right-hand side, it is further simplified to:
lossD = −ln(D(xi; θd)) + ln(D(xg; θd))
where xg = G(z; θg), so:
lossD = −ln(D(xi; θd)) + ln(D(G(z; θg); θd))
Meanwhile, an objective function is needed for the generator. The generator's goal is for its generated curves to pass the discriminator's inspection: when the discriminator can no longer tell generated curves from real ones, the generator is considered fully trained, and generated curves can replace real curves in subsequent analysis. The generator should therefore drive the discriminator's output for generated curves toward 1. Introducing the binary cross-entropy in the same way gives:
lossG = −ln(D(xg; θd)) = −ln(D(G(z; θg); θd))
The discriminator and generator are trained alternately, each aiming to minimize its own loss function lossD or lossG; while one network is training, the parameters of the other network are held fixed;
When training starts, the round counter is 0, and the parameters of the generator and discriminator are obtained by sampling from the normal distribution N(0, 0.01), denoted θg^0 and θd^0. At the start of round t the parameters of the generator and discriminator are θg^(t−1) and θd^(t−1), and the corresponding loss functions are lossG^(t−1) and lossD^(t−1). At the end of round t the parameters are:
θd^t = θd^(t−1) − α·∂lossD^(t−1)/∂θd, θg^t = θg^(t−1) − α·∂lossG^(t−1)/∂θg
where α is the learning rate, set between 0.0001 and 0.0002;
The convergence index ε for training completion is set to 0.0001: if after T rounds of training the change in the loss functions between consecutive rounds falls below ε, training is considered complete and the current network parameters are saved, denoted θg* and θd*;
At this point the generator can be used to produce massive virtual user curves:
xg = G(z; θg*)
where z ∈ R^(1×N) with zi ~ N(0, 1), and θg* are the optimal parameters obtained by training. Each point of the curve output by the generator is then de-normalized to obtain the final generated curve:
x ← (x + 1)·(x_max − x_min)/2 + x_min
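The two loss functions of step 4 can be written directly from the discriminator's outputs. This sketch only evaluates the losses in their binary cross-entropy form (before the patent's simplification); the probability values are illustrative, and the gradient updates themselves would be supplied by an autodiff framework.

```python
import numpy as np

def loss_d(d_real: float, d_fake: float) -> float:
    """Discriminator loss: -ln D(x_real) - ln(1 - D(x_fake))."""
    return -np.log(d_real) - np.log(1.0 - d_fake)

def loss_g(d_fake: float) -> float:
    """Generator loss: -ln D(G(z))."""
    return -np.log(d_fake)

# An untrained discriminator outputs roughly 0.5 for everything:
ld0 = loss_d(0.5, 0.5)  # = 2 ln 2
lg0 = loss_g(0.5)       # = ln 2

# As D improves (real -> 1, fake -> 0) its loss falls, and G's loss rises,
# which is the adversarial tension that drives the alternating training:
ld1 = loss_d(0.9, 0.1)
lg1 = loss_g(0.1)
```

When the generator catches up and D(G(z)) is again pushed toward 0.5, neither loss can improve further, which matches the convergence criterion of step 4.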
The beneficial effects of the present invention are:
The method is realized with a GAN, which captures the implicit statistical features of real data and generates large volumes of user loads from those features, so that the generated load data follow the users' intrinsic distribution characteristics and diversity. With this method, massive virtual load data can be generated when real data are insufficient; the generated data reflect users' real electricity-usage characteristics, embody the statistical features of real loads, and can simulate diverse user consumption behavior, so that they can substitute for real data in distribution system analysis and demand-side response studies.
Specific embodiment
Embodiment 1:
The GAN-based customer load curve generation method proposed by the present invention comprises the following steps:
Step 1: load normalization:
Let the user's existing historical load curve set be X = {x1, x2, …, xM}, where each load curve is written as xi = (xi^1, xi^2, …, xi^T) and T is the load curve length. First find the maximum and minimum of the user's historical loads:
x_max = max over i, t of xi^t, x_min = min over i, t of xi^t
Then adjust every load value into the interval [−1, 1] according to the maximum and minimum loads, as follows:
xi^t ← 2·(xi^t − x_min) / (x_max − x_min) − 1
Step 2: build the generator network:
Denote the generator network by G. Its input is a noise vector z that follows a Gaussian distribution, and its output is the generated simulated load curve xg, written as:
xg = G(z; θg)
where θg are the parameters of G to be learned;
Since a load curve is a one-dimensional vector, the generator actually implements a mapping from one fixed-length one-dimensional vector to another. In keeping with the characteristics of neural networks, the hidden layers of the generator are chosen to be fully connected layers, as follows:
The curve to be generated has dimension 1×N, so the Gaussian noise vector z also has length 1×N, where each element of z follows the standard normal distribution, i.e. zi ~ N(0, 1);
The first layer of the generator is a fully connected layer f1 with 2N neurons. Denoting its output by p1, this layer is defined as:
p1 = f1(z), where z ∈ R^(1×N), p1 ∈ R^(1×2N)
f1 maps z into a space of twice the original dimension;
The second layer of the generator is a fully connected layer f2 with 4N neurons. Denoting its output by p2, this layer is defined as:
p2 = f2(p1), where p1 ∈ R^(1×2N), p2 ∈ R^(1×4N)
f2 further maps p1 into a space of twice that dimension;
The last layer of the generator is a fully connected layer f3 with N neurons, whose output is the generated curve xg. This layer is defined as:
xg = f3(p2), where p2 ∈ R^(1×4N), xg ∈ R^(1×N)
f3 maps the abstract features in the high-dimensional space back to the original space;
f1, f2 and f3 are essentially linear affine operations whose parameter matrices are A1 ∈ R^(N×2N), A2 ∈ R^(2N×4N) and A3 ∈ R^(4N×N) respectively, so the three-layer structure of the generator can be summarized in the following table:

Fully connected layer | Affine matrix | Input vector | Output vector | Mapping
f1 | A1 ∈ R^(N×2N) | z ∈ R^(1×N) | p1 ∈ R^(1×2N) | p1 = z·A1
f2 | A2 ∈ R^(2N×4N) | p1 ∈ R^(1×2N) | p2 ∈ R^(1×4N) | p2 = p1·A2
f3 | A3 ∈ R^(4N×N) | p2 ∈ R^(1×4N) | xg ∈ R^(1×N) | xg = p2·A3

It can be seen that the fully connected layers f1, f2, f3 implement the mappings:
p1 = z·A1, p2 = p1·A2, xg = p2·A3
However, affine transformations alone leave the model linear, while load curves contain substantial nonlinearity. Nonlinear elements must therefore be added to the generator so that it can model nonlinear relationships: a nonlinear activation function is applied after each affine transformation. After the first and second affine transformations the relu activation function is used, as follows:
p′1 = relu(p1), p′2 = relu(p2)
where the relu activation function is:
relu(x) = max(0, x)
After the last affine transformation the tanh activation function is used, that is:
x′g = tanh(xg)
where the tanh activation function is:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
The tanh activation adjusts the range of the generated curve xg into the interval [−1, 1], the same as the real (normalized) curves;
In the generator G, the elements of the affine matrices are the network parameters θg, which are obtained by training;
Step 3: build the discriminator network:
Denote the discriminator network by D. Its input is a load curve x, and its output is the probability P that the load curve comes from the real sample set: the more strongly D believes the curve is real, the closer its output is to 1; the more strongly it believes the curve is generated, the closer its output is to 0. This is written as:
P = D(x; θd)
where θd are the parameters of D to be learned;
Since a load curve is a one-dimensional vector, the discriminator actually implements a mapping from a fixed-length one-dimensional vector to the numerical interval [0, 1]. In keeping with the characteristics of neural networks, the hidden layers of the discriminator are chosen to be fully connected layers, as follows:
The curve to be discriminated has dimension 1×N. The first layer of the discriminator is a fully connected layer d1 with 4N neurons. Denoting its output by q1, this layer is defined as:
q1 = d1(x), where x ∈ R^(1×N), q1 ∈ R^(1×4N)
d1 maps x into a space of four times the original dimension;
The second layer of the discriminator is a fully connected layer d2 with 2N neurons. Denoting its output by q2, this layer is defined as:
q2 = d2(q1), where q1 ∈ R^(1×4N), q2 ∈ R^(1×2N)
d2 maps q1 down into a space of half that dimension;
The last layer of the discriminator is a fully connected layer d3 with a single neuron, whose output is the probability P of the curve's origin. This layer is defined as:
P = d3(q2), where q2 ∈ R^(1×2N), P ∈ [0, 1]
d1, d2 and d3 are essentially linear affine operations whose parameter matrices are B1 ∈ R^(N×4N), B2 ∈ R^(4N×2N) and B3 ∈ R^(2N×1) respectively, so the three-layer structure of the discriminator can be summarized in the following table:

Fully connected layer | Affine matrix | Input vector | Output vector | Mapping
d1 | B1 ∈ R^(N×4N) | x ∈ R^(1×N) | q1 ∈ R^(1×4N) | q1 = x·B1
d2 | B2 ∈ R^(4N×2N) | q1 ∈ R^(1×4N) | q2 ∈ R^(1×2N) | q2 = q1·B2
d3 | B3 ∈ R^(2N×1) | q2 ∈ R^(1×2N) | P ∈ [0, 1] | P = q2·B3

It can be seen that the fully connected layers d1, d2, d3 implement the mappings:
q1 = x·B1, q2 = q1·B2, P = q2·B3
Similarly, nonlinear elements are added to the discriminator: after the first and second affine transformations the leaky-relu activation function is used, as follows:
q′1 = leaky-relu(q1), q′2 = leaky-relu(q2)
where the leaky-relu activation function is (with a small positive leak slope k, e.g. 0.01):
leaky-relu(x) = x if x ≥ 0, k·x if x < 0
After the last affine transformation the Sigmoid activation function is used, that is:
P′ = Sigmoid(P)
where the Sigmoid activation function is:
Sigmoid(x) = 1 / (1 + e^(−x))
The Sigmoid activation adjusts the range of P into the interval [0, 1], giving it the character of a probability;
In the discriminator D, the elements of the affine matrices are the network parameters θd, which are obtained by training;
Step 4: network training:
A neural network obtains the optimal estimate of its parameters by gradient descent. The model of this method contains two networks that are trained alternately, one side being held fixed while the other trains, so an optimization algorithm based on gradient descent is likewise used;
First, define the training objective of the discriminator: it should drive its output for real curves toward 1 and its output for generated curves toward 0. The binary cross-entropy function is introduced here:
L(ŷ, y) = −y·ln(ŷ) − (1 − y)·ln(1 − ŷ)
where ŷ is the predicted value and y the true value. For a real curve, the variables in the formula above are:
ŷ = D(xi; θd), y = 1
For a generated curve, the variables in the formula above are:
ŷ = D(xg; θd), y = 0
Substituting the real-curve and generated-curve variables into the binary cross-entropy separately and adding the two resulting terms gives:
lossD = −ln(D(xi; θd)) − ln(1 − D(xg; θd))
The training objective of the discriminator is to minimize this loss function. Considering the difficulty of differentiating the second term on the right-hand side, it is further simplified to:
lossD = −ln(D(xi; θd)) + ln(D(xg; θd))
where xg = G(z; θg), so:
lossD = −ln(D(xi; θd)) + ln(D(G(z; θg); θd))
Meanwhile, an objective function is needed for the generator. The generator's goal is for its generated curves to pass the discriminator's inspection: when the discriminator can no longer tell generated curves from real ones, the generator is considered fully trained, and generated curves can replace real curves in subsequent analysis. The generator should therefore drive the discriminator's output for generated curves toward 1. Introducing the binary cross-entropy in the same way gives:
lossG = −ln(D(xg; θd)) = −ln(D(G(z; θg); θd))
The discriminator and generator are trained alternately, each aiming to minimize its own loss function lossD or lossG; while one network is training, the parameters of the other network are held fixed;
When training starts, the round counter is 0, and the parameters of the generator and discriminator are obtained by sampling from the normal distribution N(0, 0.01), denoted θg^0 and θd^0. At the start of round t the parameters of the generator and discriminator are θg^(t−1) and θd^(t−1), and the corresponding loss functions are lossG^(t−1) and lossD^(t−1). At the end of round t the parameters are:
θd^t = θd^(t−1) − α·∂lossD^(t−1)/∂θd, θg^t = θg^(t−1) − α·∂lossG^(t−1)/∂θg
where α is the learning rate, set between 0.0001 and 0.0002;
The convergence index ε for training completion is set to 0.0001: if after T rounds of training the change in the loss functions between consecutive rounds falls below ε, training is considered complete and the current network parameters are saved, denoted θg* and θd*;
At this point the generator can be used to produce massive virtual user curves:
xg = G(z; θg*)
where z ∈ R^(1×N) with zi ~ N(0, 1), and θg* are the optimal parameters obtained by training. Each point of the curve output by the generator is then de-normalized to obtain the final generated curve:
x ← (x + 1)·(x_max − x_min)/2 + x_min
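The alternating gradient-descent schedule of step 4 can be sketched as follows. The gradients here are random placeholders for the true ∂lossD/∂θd and ∂lossG/∂θg that an autodiff framework would supply; only the update rule θ^t = θ^(t−1) − α·∂loss/∂θ and the alternation are from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)

def sgd_step(theta: np.ndarray, grad: np.ndarray, alpha: float = 0.0002) -> np.ndarray:
    """One gradient-descent update: theta_t = theta_{t-1} - alpha * dLoss/dtheta."""
    return theta - alpha * grad

# Parameters initialized by sampling from N(0, 0.01), as in step 4
theta_d = rng.normal(0.0, 0.01, size=10)
theta_g = rng.normal(0.0, 0.01, size=10)

for t in range(1, 4):  # a few illustrative rounds
    theta_d = sgd_step(theta_d, rng.standard_normal(10))  # train D, G held fixed
    theta_g = sgd_step(theta_g, rng.standard_normal(10))  # train G, D held fixed

out = sgd_step(np.zeros(3), np.ones(3), alpha=0.1)
```

In practice the loop would also track lossD and lossG per round and stop once their round-to-round change falls below the convergence index ε.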
Obviously, the above embodiments are merely examples given to clearly illustrate the present invention and are not intended to limit its implementations. Those of ordinary skill in the art may make other variations or changes on the basis of the above description; it is neither necessary nor possible to exhaust all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (1)

1. A GAN-based customer load curve generation method, characterized by comprising the following steps:
Step 1: load normalization:
Let the user's existing historical load curve set be X = {x1, x2, …, xM}, where each load curve is written as xi = (xi^1, xi^2, …, xi^T) and T is the load curve length. First find the maximum and minimum of the user's historical loads:
x_max = max over i, t of xi^t, x_min = min over i, t of xi^t
Adjust every load value into the interval [−1, 1] according to the maximum and minimum loads, as follows:
xi^t ← 2·(xi^t − x_min) / (x_max − x_min) − 1
Step 2: build generation network:
It is G that note, which generates network, and the input for generating network is the noise vector z of Gaussian distributed, and it is raw for generating the output of network At simulation load curve xg, it is denoted as:
xg=G (z;θg)
Wherein, θgFor the parameter to be asked in G;
Since load curve is one-dimensional vector, what generation network was actually accomplished is one-dimensional from fixed length one-dimensional vector to fixed length Mapping between vector, the hidden layer for generating network directly selects full articulamentum, specific as follows:
The dimension of curve to be generated is 1 × N, therefore the noise vector z length for inputting Gaussian Profile is also 1 × N, wherein in z Each element Normal Distribution, i.e. zi~N (0,1);
The first layer for generating network is the full articulamentum f containing 2N neuron1, remember that its output is p1, then this layer of operation definition Are as follows:
p1=f1(z), wherein z ∈ R1×N, p1∈R1×2N
f1Z is mapped to a twice original higher dimensional space;
The second layer for generating network is the full articulamentum f containing 4N neuron2, remember that its output is p2, then this layer of operation definition Are as follows:
p2=f2(p1), wherein p1∈R1×2N, p2∈R1×4N
f2Further by p1It is mapped to a twice original higher dimensional space;
The last layer for generating network is the full articulamentum f containing N number of neuron3, output is formation curve xg, then this layer is transported It calculates is defined as:
xg=f3(p2), wherein p2∈R1×4N, xg∈R1×N
f3Abstract characteristics in higher dimensional space are mapped to real space;
f_1, f_2 and f_3 are essentially linear affine operations, with parameter matrices A_1 ∈ R^{N×2N}, A_2 ∈ R^{2N×4N} and A_3 ∈ R^{4N×N} respectively; that is, the fully connected layers f_1, f_2, f_3 realize the mappings:
p_1 = z·A_1, p_2 = p_1·A_2, x_g = p_2·A_3
Affine transformations alone, however, leave the model linear, while load curves contain a large amount of nonlinearity. Nonlinear elements must therefore be added to the generator so that it can model nonlinear behavior; this is done by applying a nonlinear activation function after each affine transformation. After the first and second affine transformations, the ReLU activation function is used:
p'_1 = relu(p_1), p'_2 = relu(p_2)
where the ReLU activation function is:
relu(x) = max(0, x)
After the last affine transformation, the tanh activation function is used:
x'_g = tanh(x_g)
where the tanh activation function is:
tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
The tanh activation function restricts the range of the generated curve x_g to the interval [-1, 1], the same range as the real (normalized) curves.
In the generator G, the elements of the affine matrices are the network parameters θ_g, which must be obtained by training.
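The three generator layers just described can be sketched as a plain forward pass. This is a minimal numpy illustration; the curve length N, the random weight initialization, and the omission of bias terms are assumptions for the sketch:

```python
import numpy as np

N = 96  # e.g. a daily curve sampled every 15 minutes (illustrative)
rng = np.random.default_rng(0)

# Affine matrices A_1, A_2, A_3 as described in the text.
A1 = rng.normal(0, 0.01, (N, 2 * N))
A2 = rng.normal(0, 0.01, (2 * N, 4 * N))
A3 = rng.normal(0, 0.01, (4 * N, N))

def relu(v):
    return np.maximum(0.0, v)

def generator(z):
    # Forward pass: N -> 2N (ReLU) -> 4N (ReLU) -> N (tanh).
    p1 = relu(z @ A1)
    p2 = relu(p1 @ A2)
    return np.tanh(p2 @ A3)

z = rng.standard_normal((1, N))  # Gaussian noise input, z_i ~ N(0, 1)
x_g = generator(z)
```

In practice the matrices A_1, A_2, A_3 would be the parameters θ_g learned in Step 4, not random values.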
Step 3: Build the discriminator network:
Denote the discriminator network by D. Its input is a load curve x, and its output is the probability P that the curve comes from the real sample set: the more strongly the discriminator believes the curve is real, the closer its output is to 1; the more strongly it believes the curve is generated, the closer its output is to 0. This is written as:
P = D(x; θ_d)
where θ_d is the set of parameters of D to be learned.
Since a load curve is a one-dimensional vector, the discriminator actually realizes a mapping from a fixed-length one-dimensional vector to the interval [0, 1], so its hidden layers are likewise fully connected layers, as follows:
The curve to be discriminated has dimension 1 × N. The first layer of the discriminator is a fully connected layer d_1 with 4N neurons; denoting its output by q_1, the layer is defined as:
q_1 = d_1(x), where x ∈ R^{1×N}, q_1 ∈ R^{1×4N}
d_1 maps x into a space of four times the original dimension.
The second layer of the discriminator is a fully connected layer d_2 with 2N neurons; denoting its output by q_2, the layer is defined as:
q_2 = d_2(q_1), where q_1 ∈ R^{1×4N}, q_2 ∈ R^{1×2N}
d_2 maps q_1 into a lower-dimensional space of half that dimension.
The last layer of the discriminator is a fully connected layer d_3 with a single neuron; its output is the discrimination probability P of the curve's origin, and the layer is defined as:
P = d_3(q_2), where q_2 ∈ R^{1×2N}, P ∈ [0, 1]
d_1, d_2 and d_3 are essentially linear affine operations, with parameter matrices B_1 ∈ R^{N×4N}, B_2 ∈ R^{4N×2N} and B_3 ∈ R^{2N×1} respectively; that is, the fully connected layers d_1, d_2, d_3 realize the mappings:
q_1 = x·B_1, q_2 = q_1·B_2, P = q_2·B_3
Nonlinear elements are likewise added to the discriminator: after the first and second affine transformations, the leaky-ReLU activation function is used:
q'_1 = leaky-relu(q_1), q'_2 = leaky-relu(q_2)
where the leaky-ReLU activation function is:
leaky-relu(x) = x for x ≥ 0, a·x for x < 0, with a a small positive slope constant
After the last affine transformation, the sigmoid activation function is used:
P' = Sigmoid(P)
where the sigmoid activation function is:
Sigmoid(x) = 1 / (1 + e^{-x})
The sigmoid activation function restricts the range of P to the interval [0, 1], so that it can be interpreted as a probability.
In the discriminator D, the elements of the affine matrices are the network parameters θ_d, which must be obtained by training.
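The discriminator layers can be sketched the same way. This is a minimal numpy illustration; the leaky-ReLU slope a = 0.01, the single output neuron (B_3 ∈ R^{2N×1}), the random weights, and the omission of bias terms are assumptions for the sketch:

```python
import numpy as np

N = 96
rng = np.random.default_rng(1)
B1 = rng.normal(0, 0.01, (N, 4 * N))
B2 = rng.normal(0, 0.01, (4 * N, 2 * N))
B3 = rng.normal(0, 0.01, (2 * N, 1))  # single output neuron for P

def leaky_relu(v, a=0.01):  # slope a assumed; not given in the text
    return np.where(v > 0, v, a * v)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def discriminator(x):
    # Forward pass: N -> 4N (leaky-ReLU) -> 2N (leaky-ReLU) -> 1 (sigmoid).
    q1 = leaky_relu(x @ B1)
    q2 = leaky_relu(q1 @ B2)
    return sigmoid(q2 @ B3)

x = rng.uniform(-1, 1, (1, N))  # a normalized curve in [-1, 1]
P = discriminator(x)
```

In practice the matrices B_1, B_2, B_3 would be the parameters θ_d learned in Step 4.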
Step 4: Network training:
Neural networks obtain optimal parameter estimates by gradient descent. The model in this method contains two networks that are trained alternately, with one held fixed while the other trains, so a gradient-descent-based optimization algorithm is likewise used.
First, define the training objective of the discriminator: its output should tend to 1 for real curves and to 0 for generated curves. The binary cross-entropy function is introduced here:
loss = -[y·ln(ŷ) + (1 - y)·ln(1 - ŷ)]
where ŷ is the predicted value and y the true value. For a real curve, the variables in the formula are:
ŷ = D(x_i; θ_d), y = 1
and for a generated curve they are:
ŷ = D(x_g; θ_d), y = 0
Substituting the variables for real and generated curves into the binary cross-entropy function and adding the two resulting terms gives:
loss_D = -ln(D(x_i; θ_d)) - ln(1 - D(x_g; θ_d))
The training objective of the discriminator is to minimize this loss function. Considering that the second term on the right is difficult to differentiate, it is further simplified to:
loss_D = -ln(D(x_i; θ_d)) + ln(D(x_g; θ_d))
where x_g = G(z; θ_g), so that:
loss_D = -ln(D(x_i; θ_d)) + ln(D(G(z; θ_g); θ_d))
Meanwhile needing to make a living into network objective function, the target of generating function is that formation curve is enabled to pass through differentiation The detection of network is thought to generate network instruction when differentiating that network can not differentiate the difference between formation curve and real curve White silk finishes, and real curve can be replaced to be used as subsequent analysis with formation curve, therefore generating network should to differentiate network to life Tend to 1 at the output of curve, it is same to introduce two classification intersection entropy functions, have at this time:
lossG=-ln (D (xg;θd))=- ln (D (G (z;θg);θd))
The discriminator and generator are trained alternately, each aiming to minimize its own loss function loss_D or loss_G; while one network trains, the parameters of the other are held fixed.
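The two loss functions, in the simplified form given above, can be written directly from the discriminator's outputs on real and generated curves. A minimal sketch, where p_real and p_fake stand for D(x_i; θ_d) and D(G(z; θ_g); θ_d) respectively:

```python
import numpy as np

def loss_d(p_real, p_fake):
    # Simplified discriminator loss from the text:
    # -ln D(x_i) + ln D(G(z)).
    return -np.log(p_real) + np.log(p_fake)

def loss_g(p_fake):
    # Generator loss: -ln D(G(z)).
    return -np.log(p_fake)

# A discriminator that is confident and correct (real near 1, fake
# near 0) attains a lower loss_d than an unsure one; a generator
# that fools the discriminator (fake near 1) attains a lower loss_g.
d_confident = loss_d(0.99, 0.01)
d_unsure = loss_d(0.6, 0.4)
```

During alternating training, each network descends the gradient of its own loss while the other network's parameters stay fixed.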
At the start of training, the round counter is 0, and the parameters of the generator and discriminator are sampled from the normal distribution N(0, 0.01), denoted θ_g^0 and θ_d^0. At the start of round t, the parameters of the generator and discriminator are θ_g^{t-1} and θ_d^{t-1}, and the corresponding loss functions are loss_G^{t-1} and loss_D^{t-1}. The parameters at the end of round t are then:
θ_g^t = θ_g^{t-1} - α·∂loss_G^{t-1}/∂θ_g, θ_d^t = θ_d^{t-1} - α·∂loss_D^{t-1}/∂θ_d
where α is the learning rate, set between 0.0001 and 0.0002.
The convergence threshold for training completion is set to ε = 0.0001. If the convergence criterion is satisfied after T rounds of training, training is considered complete and the current network parameters are saved, denoted:
θ_g^* = θ_g^T, θ_d^* = θ_d^T
At this point, the generator can be used to produce a large volume of virtual user curves:
x_g = G(z; θ_g^*)
where z ∈ R^{1×N} with z_i ~ N(0, 1), and θ_g^* is the optimal parameter set obtained by training. The generator's output curves are then denormalized to obtain the final generated curves:
x = (x_g + 1)/2 · (P_max - P_min) + P_min
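Once training has converged, mass generation reduces to sampling a batch of noise vectors and denormalizing the generator output. A minimal self-contained sketch, in which `generator_stub` is a hypothetical stand-in for the trained G(z; θ_g^*) and the load bounds are illustrative:

```python
import numpy as np

N, batch = 96, 1000
rng = np.random.default_rng(2)

def generator_stub(z):
    # Stand-in for the trained generator G(z; theta_g^*): tanh of a
    # random affine map, so the sketch runs without a trained model.
    A = rng.normal(0, 0.01, (z.shape[1], z.shape[1]))
    return np.tanh(z @ A)

p_min, p_max = 60.0, 180.0          # illustrative load bounds
z = rng.standard_normal((batch, N))  # one noise vector per curve
# Denormalize from [-1, 1] back to the physical load range.
curves = (generator_stub(z) + 1.0) / 2.0 * (p_max - p_min) + p_min
```

Every row of `curves` is one virtual load curve in the original physical units.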
CN201910775634.7A 2019-08-21 2019-08-21 GAN-based user load curve generation method Active CN110490659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910775634.7A CN110490659B (en) 2019-08-21 2019-08-21 GAN-based user load curve generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910775634.7A CN110490659B (en) 2019-08-21 2019-08-21 GAN-based user load curve generation method

Publications (2)

Publication Number Publication Date
CN110490659A true CN110490659A (en) 2019-11-22
CN110490659B CN110490659B (en) 2022-01-11

Family

ID=68552637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910775634.7A Active CN110490659B (en) 2019-08-21 2019-08-21 GAN-based user load curve generation method

Country Status (1)

Country Link
CN (1) CN110490659B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105826921A (en) * 2016-05-26 2016-08-03 广东电网有限责任公司佛山供电局 Distribution network load prediction method and distribution network load prediction system based on transformer operation data
CN107220600A (en) * 2017-05-17 2017-09-29 清华大学深圳研究生院 A kind of Picture Generation Method and generation confrontation network based on deep learning
CN108765319A (en) * 2018-05-09 2018-11-06 大连理工大学 A kind of image de-noising method based on generation confrontation network
CN109214565A (en) * 2018-08-30 2019-01-15 广东电网有限责任公司 A kind of subregion system loading prediction technique suitable for the scheduling of bulk power grid subregion
CN109359815A (en) * 2018-09-10 2019-02-19 华北电力大学 Based on the smart grid deep learning training sample generation method for generating confrontation network
US20190149425A1 (en) * 2017-11-16 2019-05-16 Verizon Patent And Licensing Inc. Method and system for virtual network emulation and self-organizing network control using deep generative models
CN109919921A (en) * 2019-02-25 2019-06-21 天津大学 Based on the influence degree modeling method for generating confrontation network
CN109998500A (en) * 2019-04-30 2019-07-12 陕西师范大学 A kind of pulse signal generation method and system based on production confrontation network
CN110135634A (en) * 2019-04-29 2019-08-16 广东电网有限责任公司电网规划研究中心 Long-medium term power load forecasting device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张宇帆 et al.: "Load Sequence Stochastic Scenario Generation Method Based on Generative Adversarial Networks", 《供用电》 (Distribution & Utilization) *
王守相: "Reconstruction Method for Missing Power System Measurement Data Using an Improved Generative Adversarial Network", 《中国电机工程学报》 (Proceedings of the CSEE) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126700A (en) * 2019-12-25 2020-05-08 远景智能国际私人投资有限公司 Energy consumption prediction method, device, equipment and storage medium
CN111126700B (en) * 2019-12-25 2023-09-15 远景智能国际私人投资有限公司 Energy consumption prediction method, device, equipment and storage medium
CN111709672A (en) * 2020-07-20 2020-09-25 国网黑龙江省电力有限公司 Virtual power plant economic dispatching method based on scene and deep reinforcement learning
CN111709672B (en) * 2020-07-20 2023-04-18 国网黑龙江省电力有限公司 Virtual power plant economic dispatching method based on scene and deep reinforcement learning
CN111950868A (en) * 2020-07-28 2020-11-17 国网电力科学研究院有限公司 Comprehensive energy system load scene generation method based on generation countermeasure network
CN111950868B (en) * 2020-07-28 2022-11-15 国网电力科学研究院有限公司 Comprehensive energy system load scene generation method based on generation countermeasure network
CN112465184A (en) * 2020-10-21 2021-03-09 广西大学 Cloud energy storage system control method of small-sample generation type counterstudy network

Also Published As

Publication number Publication date
CN110490659B (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN110490659A (en) A kind of customer charge curve generation method based on GAN
Yang et al. A gradient-guided evolutionary approach to training deep neural networks
CN110334806A (en) A kind of confrontation sample generating method based on production confrontation network
CN108647614A (en) The recognition methods of electrocardiogram beat classification and system
CN107341432A (en) A kind of method and apparatus of micro- Expression Recognition
CN106597154B (en) Transformer fault diagnosis method for improving based on DAG-SVM
CN110263236A (en) Social network user multi-tag classification method based on dynamic multi-view learning model
CN105956252B (en) Based on the multiple dimensioned Forecast model method of ultra-short term wind speed for generating moldeed depth belief network
CN103793747B (en) A kind of sensitive information template construction method in network content security management
Guo et al. 3d-pruning: A model compression framework for efficient 3d action recognition
CN114743133A (en) Lightweight small sample video classification and identification method and system
CN116306780A (en) Dynamic graph link generation method
Ni et al. Research of face image recognition based on probabilistic neural networks
Chandra et al. Competitive two-island cooperative coevolution for real parameter global optimisation
Li et al. MEID: mixture-of-experts with internal distillation for long-tailed video recognition
Qi et al. Adaptive time window convolutional neural networks concerning multiple operation modes with applications in energy efficiency predictions
CN113343876A (en) Household equipment appliance fingerprint data generation method based on countermeasure generation network
Yan-hong et al. Wavelet neural network optimization applied to intrusion detection
Song et al. An Improved Differential Evolution Algorithm for Solving High Dimensional Optimization Problem
Fukumi et al. A new rule generation method from neural networks formed using a genetic algorithm with virus infection
Nand et al. Modified Neuron-Synapse Level Problem Decomposition Method for Cooperative Coevolution of Feedforward Networks for Time Series Prediction
CN110084122B (en) Dynamic human face emotion recognition method based on deep learning
Hao et al. Linguistic weighted standard deviation
Tao et al. Classification of angiosperms by gray-level co-occurrence matrix and combination of feedforward neural network with particle swarm optimization
Ren Comparison and Analysis of the Monte Carlo Simulation and GO Method for the Reliability of Equipment System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant