CN110880773B

CN110880773B - Power grid frequency modulation control method based on combination of data driving and physical model driving

Info

Publication number: CN110880773B
Application number: CN201911129495.7A
Authority: CN
Inventors: 李富盛; 余涛
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2023-09-15
Anticipated expiration: 2039-11-18
Also published as: CN110880773A

Abstract

The invention discloses a power grid frequency modulation control method based on a combination of data driving and physical model driving. The method comprises: determining a state space set S and a control action set A of the power grid according to historical frequency modulation samples of the power grid; clustering the elements of the state space set and taking the clustering result as the sample labels of a conditional generative adversarial network (CGAN); training the CGAN to generate new samples whose distribution is similar to that of the historical frequency modulation samples, and augmenting the historical samples with the new samples; introducing a multi-layer perceptron (MLP) to establish a mapping model; controlling a physical model of power grid frequency modulation with a Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the physical model; and outputting the optimal solution of the power grid frequency modulation strategy, namely the power adjustment amount corresponding to the power grid frequency deviation at a given moment, and performing frequency modulation on the power grid accordingly. By introducing a generative adversarial network for data augmentation, the method improves the efficiency of the initial iteration process of existing model-driven power grid frequency modulation strategies.

Description

Power grid frequency modulation control method based on combination of data driving and physical model driving

Technical Field

The invention relates to the technical field of frequency modulation control of power systems, in particular to a power grid frequency modulation control method based on combination of data driving and physical model driving.

Background

Over time, a large number of power grid frequency modulation control methods have been applied to actual power grids, and a massive body of historical decision schemes has accumulated. These schemes have been verified in engineering practice and corrected by dispatchers according to actual conditions, so they have high engineering application value. However, big data technology was previously not developed enough to process such a volume of historical decision schemes, so data-driven frequency modulation methods remain comparatively scarce. In recent years, with the rapid development of artificial intelligence and the rapid progress of big data hardware, software and algorithms, it has become possible to mine useful scheduling experience from massive historical frequency modulation control data.

Traditional power grid frequency modulation control methods are essentially model-driven and place high demands on the quality of the data model and the algorithm. Their initial values are generally selected randomly, which makes the initial stage of the algorithm time-consuming and prone to local convergence. They also make insufficient use of historical decision schemes, which wastes data resources: when a previously handled problem is encountered again, the complete computation must be repeated, costing time and effort. Therefore, how to combine data driving with physical model driving so as to improve physical-model-driven methods is a problem that needs to be studied for the new power grid of the big data era.

Disclosure of Invention

Therefore, the invention aims to provide a power grid frequency modulation control method based on a combination of data driving and physical model driving. A generative adversarial network is introduced for data augmentation, which improves the diversity and robustness of the historical samples. Data driving and model driving are combined so that the historical frequency modulation data of the power grid are fully used, the initial value of the physical model is selected more effectively, and the initial iteration process of the model-driven power grid frequency modulation strategy becomes more efficient.

The object of the invention is achieved by at least one of the following technical solutions.

A power grid frequency modulation control method based on combination of data driving and physical model driving comprises the following steps:

S1, determining a state space set S and a control action set A of the power grid according to historical frequency modulation samples of the power grid;

S2, clustering the elements of the state space set by using the K-means algorithm, the clustering result being used as the sample labels of a conditional generative adversarial network (CGAN);

S3, taking the noise Z, the state space set S, the control action set A and the sample label Y as the input of the CGAN, training the CGAN with the minimized Wasserstein distance as the objective function, and producing generated samples whose distribution is similar to that of the historical frequency modulation samples;

S4, augmenting the historical frequency modulation samples with the generated samples to obtain enhanced samples, and feeding the enhanced samples into a multi-layer perceptron (MLP) to establish a mapping model from S to A;

S5, controlling a physical model of power grid frequency modulation with a Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the Q learning controller, and outputting the optimal solution of the power grid frequency modulation control strategy to obtain the power grid frequency deviation of each period and the corresponding power adjustment amount;

S6, performing frequency modulation on the power grid according to the power grid frequency deviation of each period and the corresponding power adjustment amount.

Further, in step S1, the state space set S includes the frequency deviation $|\Delta f|$ of the regional power grid, the area control error $|ACE|$, and the control performance standard value CPS1, i.e. $S_{it} = \{|\Delta f_{it}|, |ACE_{it}|, CPS1_{it}\}$, where $S_{it}$ is the state space set of the regional power grid in the t-th power adjustment period of day i; $|\Delta f_{it}|$, $|ACE_{it}|$ and $CPS1_{it}$ are the frequency deviation, the area control error and the control performance standard value of the regional power grid in that period; i is the day index of the historical frequency modulation samples and p is their total number of days, $i \in [1, p]$; t is the power adjustment period index and T is the maximum number of power adjustment periods, $t \in [1, T]$. The control action set A includes the power adjustment amount $\Delta P$ of the regional power grid.
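As an illustration of step S1, the state $S_{it}$ and its paired control action can be held in a small record type. This is a minimal sketch only: the class name `GridState`, the helper `build_state`, the field names and the units in the comments are assumptions made for the example and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GridState:
    """State S_it = {|Δf_it|, |ACE_it|, CPS1_it} for day i and period t."""
    abs_delta_f: float  # |Δf|, frequency deviation (Hz assumed)
    abs_ace: float      # |ACE|, area control error (MW assumed)
    cps1: float         # CPS1 control performance value (percent assumed)

    def as_vector(self) -> Tuple[float, float, float]:
        """Return the state as a plain tuple, e.g. for clustering."""
        return (self.abs_delta_f, self.abs_ace, self.cps1)


def build_state(delta_f: float, ace: float, cps1: float) -> GridState:
    """Take absolute values of Δf and ACE, as the state definition requires."""
    return GridState(abs(delta_f), abs(ace), cps1)


# Historical samples keyed by (day i, period t); the value pairs a state
# with the power adjustment ΔP that was applied (the control action).
history: Dict[Tuple[int, int], Tuple[GridState, float]] = {}
history[(1, 1)] = (build_state(-0.03, 42.0, 195.0), -20.0)
```

Keying the samples by (i, t) mirrors the index ranges $i \in [1, p]$ and $t \in [1, T]$ given above.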

Further, in step S2, clustering the elements of the state space set by using the K-means algorithm includes the following steps:

S2.1, initializing K clusters $C_k$, $k \in [1, K]$, whose cluster centers are $\{u_1, u_2, \ldots, u_k, \ldots, u_K\}$; the k-th cluster center $u_k$ is the mean of all historical frequency modulation samples assigned to the k-th cluster $C_k$, calculated as:

$$u_k = \frac{1}{n_{C_k}} \sum_{S_{it} \in C_k} S_{it}$$

where $n_{C_k}$ is the number of historical frequency modulation samples assigned to the k-th cluster $C_k$;

S2.2, calculating the Euclidean distance between each state space set in $\{S_{1t}, S_{2t}, \ldots, S_{it}, \ldots, S_{pt}\}$ and each cluster center in $\{u_1, u_2, \ldots, u_k, \ldots, u_K\}$; the Euclidean distance is:

$$\lVert S_{it} - u_k \rVert_2$$

S2.3, assigning each historical frequency modulation sample of the power grid to the cluster whose center is nearest in Euclidean distance, and calculating the clustering objective function:

$$J = \sum_{k=1}^{K} \sum_{S_{it} \in C_k} \lVert S_{it} - u_k \rVert_2^2$$

S2.4, reducing the clustering objective function J by adjusting the number K of cluster centers, and, when J reaches its minimum, clustering the historical frequency modulation samples of the power grid according to the current cluster centers.
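Steps S2.1 to S2.4 can be sketched as a plain K-means loop. The sketch below is an assumption-laden illustration, not the patented implementation: the function names, the deterministic initialization from the first K samples, and the use of the sum of squared Euclidean distances for J are all choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between a state vector and a cluster center."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Cluster center: component-wise mean of the samples in the cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(samples: List[Point], k: int,
           iters: int = 100) -> Tuple[List[Point], List[int], float]:
    """Return (centers, labels, J) for k clusters."""
    centers = list(samples[:k])  # assumed initialization: first k samples
    labels: List[int] = []
    for _ in range(iters):
        # Assign every sample to its nearest center (step S2.3).
        new_labels = [min(range(k), key=lambda j: euclidean(s, centers[j]))
                      for s in samples]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Recompute each center as the mean of its members (step S2.1).
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    # Objective J, assumed here to be the sum of squared distances.
    j_value = sum(euclidean(s, centers[lab]) ** 2
                  for s, lab in zip(samples, labels))
    return centers, labels, j_value
```

The returned labels would play the role of the sample labels Y passed to the CGAN. Because J generally keeps shrinking as K grows, a practical implementation would probably need a stopping rule for K, such as an elbow criterion; the source does not state which rule is used.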

Further, the step S3 specifically includes the following steps:

S3.1, constructing the conditional generative adversarial network;

S3.2, adopting the minimized Wasserstein distance as the objective function of the conditional generative adversarial network:

$$\min_G \max_D \; E_{x \sim p(x)}[D(x)] - E_{Z \sim p(Z)}[D(G(Z))]$$

where p(x) is the distribution of the historical frequency modulation samples, p(Z) is the distribution of the input noise Z from which the generated samples G(Z) are produced, D(x) is the output of the discriminator for a historical frequency modulation sample, D(G(Z)) is the output of the discriminator for a generated sample, $E_{x \sim p(x)}[D(x)]$ is the expected value of the discriminator output over the historical frequency modulation samples, and $E_{Z \sim p(Z)}[D(G(Z))]$ is the expected value of the discriminator output over the generated samples;

S3.3, training according to the objective function in step S3.2 to obtain an optimal generator, and feeding it random noise Z to produce generated samples whose distribution is similar to that of the historical frequency modulation samples.

Further, the conditional generative adversarial network includes a generator and a discriminator; the input of the generator is noise and sample labels, and the output of the generator is a generated sample carrying a sample label; the input of the discriminator is either a generated sample with a sample label or a historical frequency modulation sample with a sample label, and its output is the probability that the input belongs to the historical frequency modulation samples, which is used to distinguish historical samples from generated samples: if the input of the discriminator is a generated sample with a sample label, the output of the discriminator is close to 0, and if the input is a historical frequency modulation sample with a sample label, the output is close to 1;

the probability output by the discriminator is passed to the generator through the objective function of the conditional generative adversarial network; the generator is updated by minimizing the objective function, and the discriminator is updated by maximizing the objective function.
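The opposing updates described above can be made concrete with batch estimates of the Wasserstein objective. The following is a minimal sketch that assumes the discriminator scores for a batch of historical samples and a batch of generated samples have already been computed; the function names are hypothetical, and no network or optimizer is included.

```python
from typing import Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Sample estimate of an expectation E[.] over a batch."""
    return sum(values) / len(values)


def wasserstein_objective(d_real: Sequence[float],
                          d_fake: Sequence[float]) -> float:
    """Estimate E_x[D(x)] - E_Z[D(G(Z))] from discriminator scores."""
    return batch_mean(d_real) - batch_mean(d_fake)


def discriminator_loss(d_real: Sequence[float],
                       d_fake: Sequence[float]) -> float:
    """The discriminator maximizes the objective, i.e. minimizes its negative."""
    return -wasserstein_objective(d_real, d_fake)


def generator_loss(d_fake: Sequence[float]) -> float:
    """The generator minimizes the objective; only the second term depends
    on G, so this is equivalent to minimizing -E_Z[D(G(Z))]."""
    return -batch_mean(d_fake)
```

In the conditional setting, both score batches would come from discriminator calls that also receive the sample label Y. Standard Wasserstein GAN training additionally constrains the discriminator (by weight clipping or a gradient penalty); the source does not say which constraint, if any, is applied.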

Further, the step S4 specifically includes the following steps:

S4.1, merging the historical frequency modulation samples with the generated samples of similar distribution, separately for the state space set S and the control action set A, to obtain the enhanced samples; the merging method is to append the generated samples directly to the end of the state space set of the historical frequency modulation samples and to the end of the control action set of the historical frequency modulation samples, respectively;

S4.2, clustering the state space set S of the enhanced samples with the K-means algorithm to form $K_{new}$ cluster centers and hence $K_{new}$ clusters;

S4.3, using the multi-layer perceptron MLP to establish a mapping model from S to A for each of the $K_{new}$ clusters formed by the clustering.
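The merging rule of step S4.1 amounts to appending the generated samples to the historical ones while keeping states and actions aligned. A minimal sketch, with a hypothetical function name `augment` and an added length check that the source does not mention:

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float, float]  # (|Δf|, |ACE|, CPS1)


def augment(hist_states: Sequence[State], hist_actions: Sequence[float],
            gen_states: Sequence[State], gen_actions: Sequence[float]
            ) -> Tuple[List[State], List[float]]:
    """Append generated samples after the historical ones (step S4.1)."""
    if (len(hist_states) != len(hist_actions)
            or len(gen_states) != len(gen_actions)):
        raise ValueError("each state needs exactly one paired action")
    # Appending both lists in the same order keeps index n of the state
    # list paired with index n of the action list.
    return (list(hist_states) + list(gen_states),
            list(hist_actions) + list(gen_actions))
```

The enhanced state list would then be re-clustered as in step S4.2 before one mapping model per cluster is fitted.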

Further, in step S4.3, the multi-layer perceptron MLP consists of an input layer, hidden layers and an output layer; each layer is fully connected to the next, and the output of each layer is passed through an activation function before serving as the input of the next layer; except for the input layer, every layer uses the sigmoid nonlinear activation function;

in the input layer, the $K_{new}$ clusters $C_k$ are selected in sequence; the state space set of cluster $C_k$ is used as the input of the input layer of the multi-layer perceptron, the width of the input layer is the number of states in the state space set of cluster $C_k$, and each input state corresponds to a unique control action $t_k$;

the labels of the control action set of cluster $C_k$ are used as the output of the output layer of the multi-layer perceptron, and the width of the output layer is the number of selectable control actions in the control action set of cluster $C_k$; if the unique control action corresponding to an input state is consistent with the output control action, the loss $e_n$ is 0; otherwise the loss $e_n$ is 1;

the multi-layer perceptron of cluster $C_k$ has 3 hidden layers, of widths 128, 128 and 64 respectively; the weight parameters of the multi-layer perceptron of cluster $C_k$ are updated by minimizing the loss function $E_k$; the loss function of the multi-layer perceptron of cluster $C_k$ is defined as:
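Separately from the definition of $E_k$, the layer structure of step S4.3 (three sigmoid hidden layers of widths 128, 128 and 64, fully connected) and the per-sample 0/1 loss $e_n$ can be sketched in plain Python. Everything beyond those stated facts is an assumption made for illustration: the function names, the random weight initialization, treating the input as a feature vector, and picking the output action by the largest output unit.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def build_mlp(n_in: int, n_out: int,
              hidden: Sequence[int] = (128, 128, 64),
              seed: int = 0) -> List[Layer]:
    """Fully connected layers with small random weights (assumed init)."""
    rng = random.Random(seed)
    widths = [n_in, *hidden, n_out]
    layers: List[Layer] = []
    for fan_in, fan_out in zip(widths, widths[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(mlp: List[Layer], x: Sequence[float]) -> List[float]:
    """Every non-input layer applies sigmoid to its weighted sum."""
    h = list(x)
    for weights, biases in mlp:
        h = [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


def zero_one_losses(mlp: List[Layer], inputs: Sequence[Sequence[float]],
                    target_actions: Sequence[int]) -> List[int]:
    """e_n = 0 if the predicted action index matches the target, else 1."""
    losses = []
    for x, target in zip(inputs, target_actions):
        out = forward(mlp, x)
        predicted = max(range(len(out)), key=out.__getitem__)
        losses.append(0 if predicted == target else 1)
    return losses
```

A 0/1 loss has no useful gradient, so an actual training loop would most likely minimize a differentiable surrogate; the source does not specify one.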

Further, in step S5, the value function of the Q learning controller is:

$$Q(s, a) = \sum_{s' \in S} P(s' \mid s, a)\,\bigl[R(s, s', a) + \gamma \max_{a \in A} Q(s', a)\bigr]$$

where a, s and s' are the selected action, the current state and the next state respectively, with $a \in A$, $s \in S$, $s' \in S$; the action of the Q learning controller is the power adjustment amount $\Delta P$ of the regional power grid, and the real-time state space $S_t$ of the Q learning controller at a given moment consists of the frequency deviation $|\Delta f_t|$ of the regional power grid, the area control error $|ACE_t|$ and the control performance standard value $CPS1_t$ at that moment; Q(s, a) is the iteratively computable state-action value function for taking action a in the current state s; R(s, s', a) is the immediate reward after the current state s transitions to state s' following action a; P(s'|s, a) is the probability that the environment transitions from the current state s to state s' after action a; $\gamma$ is the discount rate; Q(s', a) is the iteratively computable state-action value function for taking action a in the next state s'; P(s'|s, a) is updated synchronously with Q(s, a), in proportion to the ratio of Q(s, a) before and after the update, giving P(s'|s, a) for the next iteration; the power grid frequency modulation control strategy is to select, in any state s, the action a with the largest Q(s, a).
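The quantities defined above (P, R, the discount rate and the greedy action choice) fit the standard Bellman optimality backup. The sketch below assumes that standard form and uses hypothetical dictionary-based tables; it is an illustration, not the controller of the patent.

```python
from typing import Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
QTable = Dict[Tuple[State, Action], float]


def bellman_backup(q: QTable,
                   p: Dict[Tuple[State, Action, State], float],
                   r: Dict[Tuple[State, State, Action], float],
                   s: State, a: Action,
                   states: Sequence[State], actions: Sequence[Action],
                   gamma: float) -> float:
    """Sum over s' of P(s'|s,a) * [R(s,s',a) + gamma * max_a' Q(s',a')]."""
    total = 0.0
    for s_next in states:
        prob = p.get((s, a, s_next), 0.0)
        if prob == 0.0:
            continue  # unreachable next states contribute nothing
        best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
        total += prob * (r.get((s, s_next, a), 0.0) + gamma * best_next)
    return total


def greedy_action(q: QTable, s: State, actions: Sequence[Action]) -> Action:
    """Control strategy: pick the action with the largest Q(s, a)."""
    return max(actions, key=lambda a: q.get((s, a), 0.0))
```

In this toy encoding the actions would be discretized power adjustments $\Delta P$ and the states discretized values of $(|\Delta f|, |ACE|, CPS1)$; the discretization itself is an assumption of the example.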

Further, the iterative update formula of the value function of the Q learning controller is:

$$Q^{k+1}(s_k, a_k) = Q^k(s_k, a_k) + \alpha \bigl[ R(s_k, a_k, s_{k+1}) + \gamma \max_{a' \in A} Q^k(s_{k+1}, a') - Q^k(s_k, a_k) \bigr]$$

where $Q^{k+1}$ is the approximation of the ideal value Q obtained in the (k+1)-th iteration and $Q^k$ is the approximation obtained in the k-th iteration; in the (k+1)-th iteration the Q learning controller obtains a sample $[s_k, a, r, s_{k+1}]$; $R(s_k, a_k, s_{k+1})$ is the immediate reward after the current state $s_k$ transitions to state $s_{k+1}$ following action $a_k$; $\alpha$ is the learning rate, $0 < \alpha < 1$, which expresses how much the improvement between two successive iterations is trusted; a' denotes any control action in the action set A.
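A tabular version of this update, together with one possible way of seeding the Q table from the mapping model, might look as follows. The update is the standard Q-learning rule; the seeding scheme (a fixed `bonus` on the action the mapping model suggests) and all function names are assumptions for illustration, since the text only states that the mapping model's decision serves as the initial value.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
QTable = Dict[Tuple[State, Action], float]


def warm_start_q(states: Sequence[State], actions: Sequence[Action],
                 mapping_model: Callable[[State], Action],
                 bonus: float = 1.0) -> QTable:
    """Assumed seeding: favor the action suggested by the S-to-A mapping."""
    q: QTable = {}
    for s in states:
        suggested = mapping_model(s)
        for a in actions:
            q[(s, a)] = bonus if a == suggested else 0.0
    return q


def q_update(q: QTable, s: State, a: Action, reward: float, s_next: State,
             actions: Sequence[Action], alpha: float, gamma: float) -> float:
    """One Q-learning step on the sample [s, a, reward, s_next]."""
    td_target = reward + gamma * max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]
```

With such a warm start, the greedy policy already reproduces the mapping model's decisions before any update, which is the sense in which the data-driven result shortens the initial iterations.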

Further, the immediate reward function R(s, s', a) of the Q learning controller in the t-th power adjustment period is:

where ACE(s, s', a) and CPS1(s, s', a) are the real-time measured values after the current state s transitions to state s' following action a; $ACE^*(s, s', a)$ and $CPS1^*(s, s', a)$ are, respectively, the ideal ACE control value and the ideal CPS1 index control value after that transition, where $ACE^*(s, s', a)$ is taken as the ACE regulation dead-band value and $CPS1^*(s, s', a)$ takes values in the interval [180, 220].

Compared with the prior art, the invention has the following beneficial effects:

The power grid frequency modulation control method based on the combination of data driving and physical model driving combines data driving with model driving and makes full use of the historical frequency modulation control data of the power grid. The historical data are augmented by a generative adversarial network, which improves the diversity and robustness of the data. The decision result obtained by the data-driven method is used as the initial value of the physical-model-driven method, which makes the selection of the initial value more effective and improves the computational efficiency of the physical model method in the initial stage of the algorithm. In addition, because the historical frequency modulation data mined by the data-driven method have been verified in engineering practice and are therefore close to the optimal solution, the global convergence speed and accuracy of the physical-model-driven method can also be improved.

Drawings

Fig. 1 is a schematic flow chart of the power grid frequency modulation control method based on the combination of data driving and physical model driving;

Fig. 2 is a schematic diagram of the conditional generative adversarial network according to an embodiment of the invention;

Fig. 3 is a schematic diagram of the multi-layer perceptron according to an embodiment of the invention;

Fig. 4 is a schematic diagram of the Q learning controller according to an embodiment of the invention.

Detailed Description

For a better understanding of the invention, a specific embodiment is described in detail below with reference to the accompanying drawings. The described embodiment is only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Examples:

As shown in Fig. 1, the power grid frequency modulation control method based on the combination of data driving and physical model driving comprises the following steps:

S1, determining a state space set S and a control action set A of the power grid according to historical frequency modulation samples of the power grid; the state space set S comprises the frequency deviation $|\Delta f|$ of the regional power grid, the area control error $|ACE|$, and the control performance standard value CPS1, i.e. $S_{it} = \{|\Delta f_{it}|, |ACE_{it}|, CPS1_{it}\}$, where $S_{it}$ is the state space set of the regional power grid in the t-th power adjustment period of day i; $|\Delta f_{it}|$, $|ACE_{it}|$ and $CPS1_{it}$ are the frequency deviation, the area control error and the control performance standard value of the regional power grid in that period; i is the day index of the historical frequency modulation samples and p is their total number of days, $i \in [1, p]$; t is the power adjustment period index and T is the maximum number of power adjustment periods, $t \in [1, T]$. The control action set A includes the power adjustment amount $\Delta P$ of the regional power grid.

S2, clustering the elements of the state space set by using the K-means algorithm, the clustering result being used as the sample labels of a conditional generative adversarial network (CGAN); clustering the elements of the state space set by using the K-means algorithm comprises the following steps:

where $n_{C_k}$ is the number of historical frequency modulation samples assigned to the k-th cluster $C_k$;

S3, taking the noise Z, the state space set S, the control action set A and the sample label Y as the input of the conditional generative adversarial network, training the CGAN with the minimized Wasserstein distance as the objective function, and producing generated samples whose distribution is similar to that of the historical frequency modulation samples; this specifically comprises the following steps:

S3.1, constructing the conditional generative adversarial network;

as shown in Fig. 2, the conditional generative adversarial network includes a generator and a discriminator; the input of the generator is noise and sample labels, and the output of the generator is a generated sample carrying a sample label; the input of the discriminator is either a generated sample with a sample label or a historical frequency modulation sample with a sample label, and its output is the probability that the input belongs to the historical frequency modulation samples, which is used to distinguish historical samples from generated samples: if the input of the discriminator is a generated sample with a sample label, the output of the discriminator is close to 0, and if the input is a historical frequency modulation sample with a sample label, the output is close to 1;

the probability output by the discriminator is passed to the generator and the discriminator through the objective function of the conditional generative adversarial network; the generator is updated by minimizing the objective function, and the discriminator is updated by maximizing the objective function;

S4, augmenting the historical frequency modulation samples with the generated samples to obtain enhanced samples, and feeding the enhanced samples into a multi-layer perceptron (MLP) to establish a mapping model from S to A; this specifically comprises the following steps:

As shown in Fig. 3, the multi-layer perceptron MLP consists of an input layer, hidden layers and an output layer; each layer is fully connected to the next, and the output of each layer is passed through an activation function before serving as the input of the next layer; except for the input layer, every layer uses the sigmoid nonlinear activation function;

S5, controlling the physical model of power grid frequency modulation with the Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the Q learning controller, and outputting the optimal solution of the power grid frequency modulation control strategy to obtain the power adjustment amount corresponding to the power grid frequency deviation of each period.

As shown in Fig. 4, the value function of the Q learning controller is:

where a, s and s' are the selected action, the current state and the next state respectively, with $a \in A$, $s \in S$, $s' \in S$; the action of the Q learning controller is the power adjustment amount $\Delta P$ of the regional power grid, and the real-time state space $S_t$ of the Q learning controller at a given moment consists of the frequency deviation $|\Delta f_t|$ of the regional power grid, the area control error $|ACE_t|$ and the control performance standard value $CPS1_t$ at that moment; Q(s, a) is the iteratively computable state-action value function for taking action a in the current state s; R(s, s', a) is the immediate reward after the current state s transitions to state s' following action a; P(s'|s, a) is the probability that the environment transitions from the current state s to state s' after action a; $\gamma$ is the discount rate; Q(s', a) is the iteratively computable state-action value function for taking action a in the next state s'; P(s'|s, a) is updated synchronously with Q(s, a), in proportion to the ratio of Q(s, a) before and after the update, giving P(s'|s, a) for the next iteration; the power grid frequency modulation control strategy is to select, in any state s, the action a with the largest Q(s, a).

The iterative formula of the updated value function of the Q learning controller is as follows:

The immediate reward function R(s, s', a) of the Q learning controller in the t-th power adjustment period is:

where ACE(s, s', a) and CPS1(s, s', a) are the real-time measured values after the current state s transitions to state s' following action a; $ACE^*(s, s', a)$ and $CPS1^*(s, s', a)$ are, respectively, the ideal ACE control value and the ideal CPS1 index control value after that transition, where $ACE^*(s, s', a)$ is taken as the ACE regulation dead-band value and $CPS1^*(s, s', a)$ takes values in the interval [180, 220].

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The power grid frequency modulation control method based on the combination of data driving and physical model driving is characterized by comprising the following steps of:

S1, determining a state space set S and a control action set A of the power grid according to historical frequency modulation samples of the power grid; the state space set S comprises the frequency deviation $|\Delta f|$ of the regional power grid, the area control error $|ACE|$, and the control performance standard value CPS1, i.e. $S_{it} = \{|\Delta f_{it}|, |ACE_{it}|, CPS1_{it}\}$, where $S_{it}$ is the state space set of the regional power grid in the t-th power adjustment period of day i; $|\Delta f_{it}|$, $|ACE_{it}|$ and $CPS1_{it}$ are the frequency deviation, the area control error and the control performance standard value of the regional power grid in that period; i is the day index of the historical frequency modulation samples and p is their total number of days, $i \in [1, p]$; t is the power adjustment period index and T is the maximum number of power adjustment periods, $t \in [1, T]$; the control action set A comprises the power adjustment amount $\Delta P$ of the regional power grid;

S2.4, reducing the clustering objective function J by adjusting the number K of cluster centers, and, when J reaches its minimum, clustering the historical frequency modulation samples of the power grid according to the current cluster centers;

S3.1, constructing the conditional generative adversarial network;

where p(x) is the distribution of the historical frequency modulation samples, p(Z) is the distribution of the generated samples, D(x) is the output of the discriminator for a historical frequency modulation sample, D(G(Z)) is the output of the discriminator for a generated sample, $E_{x \sim p(x)}[D(x)]$ is the expected value of the discriminator output over the historical frequency modulation samples, and $E_{Z \sim p(Z)}[D(G(Z))]$ is the expected value of the discriminator output over the generated samples;

S3.3, training according to the objective function in step S3.2 to obtain an optimal generator, and feeding it random noise Z to produce generated samples whose distribution is similar to that of the historical frequency modulation samples;

S5, controlling a physical model of power grid frequency modulation with a Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the Q learning controller, and outputting the optimal solution of the power grid frequency modulation control strategy to obtain the power grid frequency deviation of each period and the corresponding power adjustment amount; the value function of the Q learning controller is:

where a, s and s' are the selected action, the current state and the next state respectively, with $a \in A$, $s \in S$, $s' \in S$; the action of the Q learning controller is the power adjustment amount $\Delta P$ of the regional power grid, and the real-time state space $S_t$ of the Q learning controller at a given moment consists of the frequency deviation $|\Delta f_t|$ of the regional power grid, the area control error $|ACE_t|$ and the control performance standard value $CPS1_t$ at that moment; Q(s, a) is the iteratively computable state-action value function for taking action a in the current state s; R(s, s', a) is the immediate reward after the current state s transitions to state s' following action a; P(s'|s, a) is the probability that the environment transitions from the current state s to state s' after action a; $\gamma$ is the discount rate; Q(s', a) is the iteratively computable state-action value function for taking action a in the next state s'; P(s'|s, a) is updated synchronously with Q(s, a), in proportion to the ratio of Q(s, a) before and after the update, giving P(s'|s, a) for the next iteration; the power grid frequency modulation control strategy is to select, in any state s, the action with the largest Q(s, a);

2. The grid frequency modulation control method based on a combination of data driving and physical model driving according to claim 1, wherein the condition generating type countermeasure network comprises a generator and a discriminator; the input of the generator is noise and sample labels, and the output of the generator is a generated sample with the sample labels; the input of the discriminator is a generated sample with a sample tag or a history frequency modulation sample with a sample tag, the output is the probability of attributing to the history frequency modulation sample and is used for distinguishing the history frequency modulation sample from the generated sample, if the input of the discriminator is the generated sample with the sample tag, the output of the discriminator is close to 0, and if the input of the discriminator is the history frequency modulation sample with the sample tag, the output of the discriminator is close to 1;

3. The grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 1, wherein step S4 specifically comprises the following steps:

4. The power grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 3, wherein in step S4.3 the multi-layer perceptron MLP consists of an input layer, hidden layers and an output layer; each layer is fully connected to the next, and the output of each layer is passed through an activation function before serving as the input of the next layer; except for the input layer, every layer uses the sigmoid nonlinear activation function;

the multi-layer perceptron of cluster $C_k$ has 3 hidden layers, the width of the 1st hidden layer being 128, the width of the 2nd hidden layer being 128, and the width of the 3rd hidden layer being 64; the weight parameters of the multi-layer perceptron of cluster $C_k$ are updated by minimizing the loss function $E_k$; the loss function of the multi-layer perceptron of cluster $C_k$ is defined as:

5. The power grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 1, wherein the iterative update formula of the value function of the Q learning controller is:

6. The power grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 1, wherein the immediate reward function R(s, s', a) of the Q learning controller in the t-th power adjustment period is:

where ACE(s, s', a) and CPS1(s, s', a) are the real-time measured values after the current state s transitions to state s' following action a; $ACE^*(s, s', a)$ and $CPS1^*(s, s', a)$ are, respectively, the ideal ACE control value and the ideal CPS1 index control value after that transition, where $ACE^*(s, s', a)$ is taken as the ACE regulation dead-band value and $CPS1^*(s, s', a)$ takes values in the interval [180, 220].