CN110880773B - Power grid frequency modulation control method based on combination of data driving and physical model driving - Google Patents
- Publication number
- CN110880773B (application number CN201911129495.7A)
- Authority
- CN
- China
- Prior art keywords
- frequency modulation
- sample
- power grid
- layer
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/24—Arrangements for preventing or reducing oscillations of power in networks
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses a power grid frequency modulation control method that combines data driving and physical model driving. The method determines a state space set S and a control action set A of the power grid from historical frequency modulation samples of the power grid, then clusters the elements of the state space set and uses the clustering result as the sample labels of a conditional generative adversarial network (CGAN). The CGAN is trained to generate new samples whose distribution is similar to that of the historical frequency modulation samples, and the new samples are used to enhance the historical samples. A multi-layer perceptron (MLP) is then introduced to establish a mapping model. A Q learning controller controls the physical model of power grid frequency modulation, with the scheduling decision result of the mapping model taken as the initial value of the physical model, and outputs the optimal solution of the power grid frequency modulation strategy, namely the power adjustment quantity corresponding to the power grid frequency deviation at a given moment, which is used to regulate the power grid. The method introduces a generative adversarial network for data enhancement and improves the efficiency of the initial iteration process of existing model-driven power grid frequency modulation strategies.
Description
Technical Field
The invention relates to the technical field of frequency modulation control of power systems, in particular to a power grid frequency modulation control method based on combination of data driving and physical model driving.
Background
Over time, a large number of power grid frequency modulation control methods have been applied to actual power grids, and a massive body of historical decision schemes has accumulated. These schemes have been verified in engineering practice and corrected by dispatchers according to actual conditions, so they have high engineering application value. However, big data technology was previously immature and could not process massive historical decision schemes, so data-driven frequency modulation methods are relatively scarce. In recent years, with the rapid development of artificial intelligence and the rapid progress of big data technology in hardware, software and algorithms, it has become possible to mine useful scheduling experience from massive historical frequency modulation control data.
Traditional power grid frequency modulation control methods are essentially model-driven and place high demands on the quality of the data model and the algorithm. Their initial values are generally selected randomly, so the initial stage of the algorithm is time-consuming and prone to local convergence. Historical decision schemes are underused, which wastes data resources, and when a previously handled problem is encountered again, the complete computation must be rerun, which costs time and effort. How to combine data driving and physical model driving to improve physical-model-driven methods is therefore a problem that needs to be studied for power grids in the big data era.
Disclosure of Invention
Therefore, the invention aims to provide a power grid frequency modulation control method based on the combination of data driving and physical model driving. A generative adversarial network is introduced for data enhancement, which improves the diversity and robustness of the historical samples. Combining data driving with model driving makes full use of the historical frequency modulation data of the power grid, improves the effectiveness of initial value selection for the physical model, and improves the efficiency of the initial iteration process of the model-driven power grid frequency modulation strategy.
The object of the invention is achieved by at least one of the following technical solutions.
A power grid frequency modulation control method based on combination of data driving and physical model driving comprises the following steps:
S1, determining a state space set S and a control action set A of the power grid according to historical frequency modulation samples of the power grid;
S2, clustering the elements of the state space set using the K-means algorithm, the clustering result being used as the sample labels of a conditional generative adversarial network (CGAN);
S3, taking the noise Z, the state space set S, the control action set A and the sample labels Y as the inputs of the CGAN, training the CGAN with minimization of the Wasserstein distance as the objective function, and generating generated samples whose distribution is similar to that of the historical frequency modulation samples;
S4, adding the generated samples to the historical frequency modulation samples to obtain enhanced samples, and feeding the enhanced samples into a multi-layer perceptron (MLP) to establish a mapping model from S to A;
S5, controlling a physical model of power grid frequency modulation with a Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the Q learning controller, and outputting the optimal solution of the power grid frequency modulation control strategy to obtain the power grid frequency deviation of each period and the corresponding power adjustment quantity;
S6, performing frequency modulation on the power grid according to the power grid frequency deviation of each period and the corresponding power adjustment quantity.
Further, in step S1, the state space set S includes the frequency deviation |Δf| of the regional power grid, the area control error |ACE|, and the control performance standard value CPS1, namely S_it = {|Δf_it|, |ACE_it|, CPS1_it}, where S_it is the state space set of the regional power grid in the t-th power adjustment period of day i; |Δf_it|, |ACE_it| and CPS1_it are respectively the frequency deviation, the area control error and the control performance standard value of the regional power grid in that period; i is the day index of the historical frequency modulation samples and p is their total number of days, i ∈ [1, p]; t is the power adjustment period and T is the maximum number of power adjustment periods, t ∈ [1, T]. The control action set A includes the power adjustment quantity ΔP of the regional power grid.
Further, in step S2, clustering the elements of the state space set using the K-means algorithm includes the following steps:
S2.1, initializing K clusters C_k, k ∈ [1, K], with cluster centers {u_1, u_2, …, u_k, …, u_K}, where the k-th cluster center u_k is determined by the mean of all historical frequency modulation samples assigned to the k-th cluster C_k:
$$u_k = \frac{1}{n_{C_k}} \sum_{S_{it} \in C_k} S_{it}$$
where n_{C_k} is the number of historical frequency modulation samples assigned to the k-th cluster C_k;
S2.2, calculating the Euclidean distance between each state space set in {S_1t, S_2t, …, S_it, …, S_pt} and each cluster center in {u_1, u_2, …, u_k, …, u_K}:
$$d(S_{it}, u_k) = \lVert S_{it} - u_k \rVert_2$$
S2.3, assigning each historical frequency modulation sample of the power grid to the cluster center with the smallest Euclidean distance, and calculating the clustering objective function:
$$J = \sum_{k=1}^{K} \sum_{S_{it} \in C_k} \lVert S_{it} - u_k \rVert_2^2$$
S2.4, reducing the clustering objective function J by adjusting the number K of cluster centers, and, when J reaches its minimum, clustering the historical frequency modulation samples of the power grid according to the current cluster centers.
Further, step S3 specifically includes the following steps:
S3.1, constructing the conditional generative adversarial network;
S3.2, adopting minimization of the Wasserstein distance as the objective function of the conditional generative adversarial network:
$$\min_G \max_D \; E_{x \sim p(x)}[D(x)] - E_{Z \sim p(Z)}[D(G(Z))]$$
where p(x) is the distribution of the historical frequency modulation samples, p(Z) is the distribution of the noise Z from which the generated samples are produced, D(x) is the discriminator output for a historical frequency modulation sample, D(G(Z)) is the discriminator output for a generated sample, E_{x~p(x)}[D(x)] is the expected discriminator output over the historical frequency modulation samples, and E_{Z~p(Z)}[D(G(Z))] is the expected discriminator output over the generated samples;
S3.3, training according to the objective function in step S3.2 to obtain the optimal generator, and feeding it random noise Z to generate generated samples whose distribution is similar to that of the historical frequency modulation samples.
Further, the conditional generative adversarial network includes a generator and a discriminator. The inputs of the generator are the noise and the sample labels, and its output is a generated sample with a sample label. The input of the discriminator is either a generated sample with a sample label or a historical frequency modulation sample with a sample label; its output is the probability that the input belongs to the historical frequency modulation samples, which is used to distinguish historical frequency modulation samples from generated samples. If the input is a generated sample with a sample label, the discriminator output is close to 0; if the input is a historical frequency modulation sample with a sample label, the discriminator output is close to 1;
the probability output by the discriminator is passed to the generator through the objective function of the conditional generative adversarial network; the generator is updated by minimizing the objective function, and the discriminator is updated by maximizing it.
Further, step S4 specifically includes the following steps:
S4.1, merging the historical frequency modulation samples with the generated samples of similar distribution, separately for the state space set S and the control action set A, to obtain the enhanced samples; the merging is a direct append, in which the state space sets of the generated samples are added after the last state space set of the historical frequency modulation samples, and the control action sets of the generated samples are added after the last control action set of the historical frequency modulation samples;
S4.2, clustering the state space set S of the enhanced samples with the K-means algorithm to form K_new cluster centers and hence K_new clusters;
S4.3, using the multi-layer perceptron (MLP) to establish a mapping model from S to A for each of the K_new clusters formed by the clustering.
Further, in step S4.3, the multi-layer perceptron (MLP) consists of an input layer, hidden layers and an output layer. Each layer is fully connected to the next, and the output of each layer is passed through an activation function before serving as the input of the next layer. Except for the input layer, every layer uses the sigmoid nonlinear activation function;
in the input layer, the K_new clusters C_k are selected in order, and the state space set of cluster C_k is used as the input of the input layer of the multi-layer perceptron. The width of the input layer is the number of states in the state space set of cluster C_k, and each of these input states corresponds to a unique control action t_k;
the labels of the control action set of cluster C_k are used as the output of the output layer of the multi-layer perceptron. The width of the output layer is the number of selectable control actions in the control action set of cluster C_k. If the unique control action corresponding to an input state is consistent with the output control action, the loss e_n is 0; otherwise, the loss e_n is 1;
the multi-layer perceptron of cluster C_k has 3 hidden layers, of width 128, 128 and 64 respectively. The weight parameters of the multi-layer perceptron of cluster C_k are updated by minimizing the loss function E_k, which is defined as:
Further, in step S5, the value function of the Q learning controller is:
$$Q(s, a) = \sum_{s' \in S} P(s' \mid s, a)\left[R(s, s', a) + \gamma \max_{a' \in A} Q(s', a')\right]$$
where a, s and s' are the selected action, the current state and the next state respectively, with a ∈ A, s ∈ S and s' ∈ S. The action selected by the Q learning controller is the power adjustment quantity ΔP of the regional power grid, and the real-time state S_t of the Q learning controller at a given moment consists of the frequency deviation |Δf_t|, the area control error |ACE_t| and the control performance standard value CPS1_t of the regional power grid at that moment. Q(s, a) is the iteratively computable state-action value function for taking action a in the current state s; R(s, s', a) is the immediate reward received when the current state s transitions to state s' after action a; P(s'|s, a) is the probability that the environment transitions from the current state s to state s' after action a; γ is the discount rate; Q(s', a') is the iteratively computable state-action value function for taking action a' in the next state s'. P(s'|s, a) is updated synchronously with Q(s, a): it is adjusted according to the ratio of Q(s, a) before and after the update, giving the P(s'|s, a) of the next iteration. The power grid frequency modulation control strategy is to select, in any state s, the action a with the largest Q(s, a).
Further, the iterative update formula of the value function of the Q learning controller is:
$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha\left[R(s_k, a_k, s_{k+1}) + \gamma \max_{a' \in A} Q_k(s_{k+1}, a') - Q_k(s_k, a_k)\right]$$
where Q_{k+1} is the approximation of the ideal value Q obtained at the (k+1)-th iteration and Q_k is the approximation obtained at the k-th iteration; at the (k+1)-th iteration the Q learning controller obtains the sample [s_k, a_k, r, s_{k+1}]; R(s_k, a_k, s_{k+1}) is the immediate reward received when the current state s_k transitions to state s_{k+1} after action a_k; α is the learning rate, 0 < α < 1, which sets how much confidence is placed in the improvement term between two successive iterations; a' denotes any control action in the action set A.
Further, the immediate reward function R(s, s', a) of the Q learning controller in the t-th power adjustment period is:
where ACE(s, s', a) and CPS1(s, s', a) are the real-time measured values after the current state s transitions to state s' following action a; ACE*(s, s', a) and CPS1*(s, s', a) are respectively the ideal ACE control value and the ideal CPS1 index control value for that transition, where ACE*(s, s', a) is taken as the ACE regulation dead-zone value and CPS1*(s, s', a) takes values in the interval [180, 220].
Compared with the prior art, the invention has the following beneficial effects:
The power grid frequency modulation control method based on the combination of data driving and physical model driving combines data driving with model driving and makes full use of the historical frequency modulation control data of the power grid. The historical data are enhanced with a generative adversarial network, which improves the diversity and robustness of the data. The decision result obtained by the data-driven method is used as the initial value of the physical-model-driven method, which improves the effectiveness of initial value selection for the physical model and the computational efficiency of the physical model method in the initial stage of the algorithm. Moreover, because the data-driven method mines historical frequency modulation data that have been verified in engineering practice, its result is close to the optimal solution, so the global convergence speed and accuracy of the physical-model-driven method can be improved.
Drawings
Fig. 1 is a schematic flow chart of the power grid frequency modulation control method based on the combination of data driving and physical model driving;
Fig. 2 is a schematic diagram of the conditional generative adversarial network according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the multi-layer perceptron according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the Q learning controller according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, a specific embodiment is described clearly and completely below with reference to the accompanying drawings. The described embodiment is only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Examples:
As shown in Fig. 1, the power grid frequency modulation control method based on the combination of data driving and physical model driving comprises the following steps:
S1, determining a state space set S and a control action set A of the power grid according to historical frequency modulation samples of the power grid;
The state space set S includes the frequency deviation |Δf| of the regional power grid, the area control error |ACE|, and the control performance standard value CPS1, namely S_it = {|Δf_it|, |ACE_it|, CPS1_it}, where S_it is the state space set of the regional power grid in the t-th power adjustment period of day i; |Δf_it|, |ACE_it| and CPS1_it are respectively the frequency deviation, the area control error and the control performance standard value of the regional power grid in that period; i is the day index of the historical frequency modulation samples and p is their total number of days, i ∈ [1, p]; t is the power adjustment period and T is the maximum number of power adjustment periods, t ∈ [1, T]. The control action set A includes the power adjustment quantity ΔP of the regional power grid.
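The sample layout described in step S1 can be pictured with a small data structure. The sketch below is only an illustration: the names `GridState`, `Sample` and `build_samples` and the tuple layout of the raw records are assumptions, not part of the patent, and no particular units are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridState:
    """State S_it for day i and power adjustment period t (field names are illustrative)."""
    abs_delta_f: float  # |Δf_it|, frequency deviation of the regional grid
    abs_ace: float      # |ACE_it|, area control error
    cps1: float         # CPS1_it, control performance standard value

    def as_vector(self):
        # Tuple form used when the state is clustered or fed to a model.
        return (self.abs_delta_f, self.abs_ace, self.cps1)


@dataclass(frozen=True)
class Sample:
    """One historical frequency modulation sample: a state and its control action."""
    state: GridState
    delta_p: float  # ΔP, power adjustment quantity (the control action)


def build_samples(records):
    """Build samples from (delta_f, ace, cps1, delta_p) tuples holding signed raw values."""
    return [
        Sample(GridState(abs(delta_f), abs(ace), cps1), delta_p)
        for delta_f, ace, cps1, delta_p in records
    ]
```

Taking absolute values at construction time mirrors the definition of S_it, which stores |Δf| and |ACE| rather than the signed quantities.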
S2, clustering the elements of the state space set using the K-means algorithm, the clustering result being used as the sample labels of a conditional generative adversarial network (CGAN); clustering the elements of the state space set using the K-means algorithm includes the following steps:
S2.1, initializing K clusters C_k, k ∈ [1, K], with cluster centers {u_1, u_2, …, u_k, …, u_K}, where the k-th cluster center u_k is determined by the mean of all historical frequency modulation samples assigned to the k-th cluster C_k:
$$u_k = \frac{1}{n_{C_k}} \sum_{S_{it} \in C_k} S_{it}$$
where n_{C_k} is the number of historical frequency modulation samples assigned to the k-th cluster C_k;
S2.2, calculating the Euclidean distance between each state space set in {S_1t, S_2t, …, S_it, …, S_pt} and each cluster center in {u_1, u_2, …, u_k, …, u_K}:
$$d(S_{it}, u_k) = \lVert S_{it} - u_k \rVert_2$$
S2.3, assigning each historical frequency modulation sample of the power grid to the cluster center with the smallest Euclidean distance, and calculating the clustering objective function:
$$J = \sum_{k=1}^{K} \sum_{S_{it} \in C_k} \lVert S_{it} - u_k \rVert_2^2$$
S2.4, reducing the clustering objective function J by adjusting the number K of cluster centers, and, when J reaches its minimum, clustering the historical frequency modulation samples of the power grid according to the current cluster centers.
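Steps S2.1 to S2.4 can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the function names `kmeans` and `sweep_k`, the random initialization from the samples, the iteration cap and the use of squared Euclidean distance in J are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Cluster state vectors; returns (centers, labels, J)."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct samples as the starting centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # S2.2 and S2.3: assign every sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # S2.1: each center becomes the mean of the samples assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
    # Clustering objective J: sum of squared distances to the assigned center.
    j_value = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, j_value


def sweep_k(points, k_values, seed=0):
    """S2.4: evaluate J for several candidate K so the caller can compare them."""
    return {k: kmeans(points, k, seed=seed)[2] for k in k_values}
```

The resulting `labels` play the role of the sample labels Y passed to the CGAN in step S3. How the final K is chosen from the values returned by `sweep_k` is left to the caller.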
S3, taking the noise Z, the state space set S, the control action set A and the sample labels Y as the inputs of the conditional generative adversarial network (CGAN), training the CGAN with minimization of the Wasserstein distance as the objective function, and generating generated samples whose distribution is similar to that of the historical frequency modulation samples; this specifically includes the following steps:
S3.1, constructing the conditional generative adversarial network;
As shown in Fig. 2, the conditional generative adversarial network includes a generator and a discriminator. The inputs of the generator are the noise and the sample labels, and its output is a generated sample with a sample label. The input of the discriminator is either a generated sample with a sample label or a historical frequency modulation sample with a sample label; its output is the probability that the input belongs to the historical frequency modulation samples, which is used to distinguish historical frequency modulation samples from generated samples. If the input is a generated sample with a sample label, the discriminator output is close to 0; if the input is a historical frequency modulation sample with a sample label, the discriminator output is close to 1;
S3.2, adopting minimization of the Wasserstein distance as the objective function of the conditional generative adversarial network:
$$\min_G \max_D \; E_{x \sim p(x)}[D(x)] - E_{Z \sim p(Z)}[D(G(Z))]$$
where p(x) is the distribution of the historical frequency modulation samples, p(Z) is the distribution of the noise Z from which the generated samples are produced, D(x) is the discriminator output for a historical frequency modulation sample, D(G(Z)) is the discriminator output for a generated sample, E_{x~p(x)}[D(x)] is the expected discriminator output over the historical frequency modulation samples, and E_{Z~p(Z)}[D(G(Z))] is the expected discriminator output over the generated samples;
The probability output by the discriminator is passed to the generator and the discriminator through the objective function of the conditional generative adversarial network; the generator is updated by minimizing the objective function, and the discriminator is updated by maximizing it;
S3.3, training according to the objective function in step S3.2 to obtain the optimal generator, and feeding it random noise Z to generate generated samples whose distribution is similar to that of the historical frequency modulation samples.
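The roles of the two networks in step S3.2 can be made concrete by estimating the objective on a batch of discriminator outputs. The sketch below assumes the objective has the form E[D(x)] - E[D(G(Z))] described in the text; the function names are hypothetical, and the networks themselves and their training loop are not shown.

```python
def wasserstein_objective(d_real, d_fake):
    """Batch estimate of E[D(x)] - E[D(G(Z))].

    d_real: discriminator outputs for historical samples.
    d_fake: discriminator outputs for generated samples.
    """
    mean_real = sum(d_real) / len(d_real)
    mean_fake = sum(d_fake) / len(d_fake)
    return mean_real - mean_fake


def discriminator_loss(d_real, d_fake):
    # The discriminator maximizes the objective, i.e. minimizes its negative.
    return -wasserstein_objective(d_real, d_fake)


def generator_loss(d_fake):
    # The generator minimizes the objective; only the -E[D(G(Z))] term
    # depends on the generator, so that term alone serves as its loss.
    return -sum(d_fake) / len(d_fake)
```

A large objective value means the discriminator separates historical and generated samples well. As the generator improves, the outputs in `d_fake` rise toward those in `d_real` and the objective shrinks.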
S4, adding the generated samples to the historical frequency modulation samples to obtain enhanced samples, and feeding the enhanced samples into a multi-layer perceptron (MLP) to establish a mapping model from S to A; this specifically includes the following steps:
S4.1, merging the historical frequency modulation samples with the generated samples of similar distribution, separately for the state space set S and the control action set A, to obtain the enhanced samples; the merging is a direct append, in which the state space sets of the generated samples are added after the last state space set of the historical frequency modulation samples, and the control action sets of the generated samples are added after the last control action set of the historical frequency modulation samples;
S4.2, clustering the state space set S of the enhanced samples with the K-means algorithm to form K_new cluster centers and hence K_new clusters;
S4.3, using the multi-layer perceptron (MLP) to establish a mapping model from S to A for each of the K_new clusters formed by the clustering.
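The append-style merge of step S4.1 and the per-cluster split that steps S4.2 and S4.3 rely on can be sketched as follows. The helper names `augment` and `group_by_cluster` are hypothetical, and the cluster labels are assumed to come from a separate K-means run on the merged states.

```python
def augment(hist_states, hist_actions, gen_states, gen_actions):
    """S4.1: append generated samples after the historical ones, set by set."""
    if len(hist_states) != len(hist_actions) or len(gen_states) != len(gen_actions):
        raise ValueError("each state needs exactly one control action")
    states = list(hist_states) + list(gen_states)
    actions = list(hist_actions) + list(gen_actions)
    return states, actions


def group_by_cluster(states, actions, labels):
    """Split the enhanced samples by cluster label so that each cluster
    can get its own S-to-A mapping model (S4.3)."""
    clusters = {}
    for state, action, label in zip(states, actions, labels):
        cluster_states, cluster_actions = clusters.setdefault(label, ([], []))
        cluster_states.append(state)
        cluster_actions.append(action)
    return clusters
```

Because both lists are extended in the same order, index n of the merged states still lines up with index n of the merged actions, which is what the supervised S-to-A mapping needs.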
As shown in Fig. 3, the multi-layer perceptron (MLP) consists of an input layer, hidden layers and an output layer. Each layer is fully connected to the next, and the output of each layer is passed through an activation function before serving as the input of the next layer. Except for the input layer, every layer uses the sigmoid nonlinear activation function;
in the input layer, the K_new clusters C_k are selected in order, and the state space set of cluster C_k is used as the input of the input layer of the multi-layer perceptron. The width of the input layer is the number of states in the state space set of cluster C_k, and each of these input states corresponds to a unique control action t_k;
the labels of the control action set of cluster C_k are used as the output of the output layer of the multi-layer perceptron. The width of the output layer is the number of selectable control actions in the control action set of cluster C_k. If the unique control action corresponding to an input state is consistent with the output control action, the loss e_n is 0; otherwise, the loss e_n is 1;
the multi-layer perceptron of cluster C_k has 3 hidden layers, of width 128, 128 and 64 respectively. The weight parameters of the multi-layer perceptron of cluster C_k are updated by minimizing the loss function E_k, which is defined as:
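The layer structure described above (hidden widths 128, 128 and 64, sigmoid activations, one output per selectable control action) can be sketched as a forward pass in plain Python. Everything beyond those widths is an assumption for illustration: the function names, the uniform weight initialization, the toy input and output widths, and the choice of the largest output as the predicted action. Training of the weights is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_mlp(n_in, n_out, hidden=(128, 128, 64), seed=0):
    """Create (weights, biases) for every layer after the input layer."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(a)] for _ in range(b)], [0.0] * b)
        for a, b in zip(sizes, sizes[1:])
    ]


def forward(layers, x):
    """Fully connected forward pass with a sigmoid after every layer."""
    for weights, biases in layers:
        x = [
            sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return x


def predict_action(layers, x):
    """Index of the control action with the largest output (an assumed decoding)."""
    out = forward(layers, x)
    return max(range(len(out)), key=out.__getitem__)


def zero_one_losses(layers, inputs, targets):
    """Per-sample loss e_n: 0 if the predicted action matches the label, else 1."""
    return [0 if predict_action(layers, x) == t else 1 for x, t in zip(inputs, targets)]
```

For example, `init_mlp(3, 5)` builds a network for a three-component state vector and five candidate power adjustments, and `zero_one_losses` then gives the e_n values from which a cluster-level loss E_k would be computed.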
S5, controlling a physical model of power grid frequency modulation with the Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the Q learning controller, and outputting the optimal solution of the power grid frequency modulation control strategy to obtain the power adjustment quantity corresponding to the power grid frequency deviation of each period.
As shown in Fig. 4, the value function of the Q learning controller is:
$$Q(s, a) = \sum_{s' \in S} P(s' \mid s, a)\left[R(s, s', a) + \gamma \max_{a' \in A} Q(s', a')\right]$$
where a, s and s' are the selected action, the current state and the next state respectively, with a ∈ A, s ∈ S and s' ∈ S. The action selected by the Q learning controller is the power adjustment quantity ΔP of the regional power grid, and the real-time state S_t of the Q learning controller at a given moment consists of the frequency deviation |Δf_t|, the area control error |ACE_t| and the control performance standard value CPS1_t of the regional power grid at that moment. Q(s, a) is the iteratively computable state-action value function for taking action a in the current state s; R(s, s', a) is the immediate reward received when the current state s transitions to state s' after action a; P(s'|s, a) is the probability that the environment transitions from the current state s to state s' after action a; γ is the discount rate; Q(s', a') is the iteratively computable state-action value function for taking action a' in the next state s'. P(s'|s, a) is updated synchronously with Q(s, a): it is adjusted according to the ratio of Q(s, a) before and after the update, giving the P(s'|s, a) of the next iteration. The power grid frequency modulation control strategy is to select, in any state s, the action a with the largest Q(s, a).
The iterative update formula of the value function of the Q learning controller is:
$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha\left[R(s_k, a_k, s_{k+1}) + \gamma \max_{a' \in A} Q_k(s_{k+1}, a') - Q_k(s_k, a_k)\right]$$
where Q_{k+1} is the approximation of the ideal value Q obtained at the (k+1)-th iteration and Q_k is the approximation obtained at the k-th iteration; at the (k+1)-th iteration the Q learning controller obtains the sample [s_k, a_k, r, s_{k+1}]; R(s_k, a_k, s_{k+1}) is the immediate reward received when the current state s_k transitions to state s_{k+1} after action a_k; α is the learning rate, 0 < α < 1, which sets how much confidence is placed in the improvement term between two successive iterations; a' denotes any control action in the action set A.
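The iterative update can be sketched with a tabular Q function. The update line is the standard Q-learning rule; the warm start is an assumption, since the text says the mapping model's decision initializes the Q learning controller but does not say how. Here the suggested action simply receives a hypothetical head-start value `bonus`, and all function and parameter names are illustrative.

```python
def init_q_table(states, actions, policy_hint, bonus=1.0):
    """Warm start (assumed scheme): the action suggested by the S-to-A
    mapping model for each state starts with a higher Q value."""
    q = {(s, a): 0.0 for s in states for a in actions}
    for s in states:
        q[(s, policy_hint[s])] = bonus
    return q


def q_update(q, actions, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


def greedy_action(q, actions, s):
    """Control strategy: pick the action with the largest Q(s, a)."""
    return max(actions, key=lambda a: q[(s, a)])
```

With a warm-started table, `greedy_action` returns the mapping model's suggestion before any learning has taken place, so early iterations begin from historically plausible power adjustments rather than from random ones.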
The immediate reward function R (s, s', a) of the Q learning controller at the t-th power adjustment period is:
wherein: ACE(s, s', a) and CPS1(s, s', a) are the real-time measured values after action a taken in the current state s causes a transition to state s'; ACE*(s, s', a) and CPS1*(s, s', a) are respectively the ideal ACE control value and the ideal CPS1 index control value after that transition, where ACE*(s, s', a) is taken as the ACE regulation dead-zone value and CPS1*(s, s', a) takes values in the interval [180, 220].
S6, frequency modulation is carried out on the power grid according to the power grid frequency deviation of each period and the corresponding power adjustment quantity.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. The power grid frequency modulation control method based on the combination of data driving and physical model driving is characterized by comprising the following steps of:
S1, determining a state space set S and a control action set A of the power grid from the historical frequency modulation samples of the power grid; the state space set S comprises the frequency deviation |Δf| of the regional power grid, the regional control deviation |ACE| and the control performance standard value CPS1, namely S_it = {|Δf_it|, |ACE_it|, CPS1_it}, wherein S_it is the state space set of the regional power grid in the t-th power adjustment period of day i, |Δf_it| is the frequency deviation of the regional power grid in that period, |ACE_it| is the regional control deviation of the regional power grid in that period, and CPS1_it is the control performance standard value of the regional power grid in that period; i is the day index of the historical frequency modulation samples, p is the total number of days of the historical frequency modulation samples, i ∈ [1, p]; t is the power adjustment period, T is the maximum number of power adjustment periods, t ∈ [1, T]; the control action set A comprises the power adjustment ΔP of the regional power grid;
S2, clustering the elements of the state space set using the K-means algorithm, the clustering result being used as the sample label of the conditional generative adversarial network (CGAN); clustering the elements of the state space set using the K-means algorithm comprises the following steps:
S2.1, initializing K clusters C_k, k ∈ [1, K], whose cluster centers are {u_1, u_2, …, u_k, …, u_K} respectively; u_k, the k-th cluster center, is determined by the mean of all historical frequency modulation samples belonging to the k-th cluster C_k, and the cluster center is calculated as follows:
$$u_k = \frac{1}{|C_k|} \sum_{S_{it} \in C_k} S_{it}$$
wherein |C_k| is the number of all historical frequency modulation samples belonging to the k-th cluster C_k;
S2.2, calculating the Euclidean distance between each of the state space sets {S_1t, S_2t, …, S_it, …, S_pt} and the k-th cluster center of {u_1, u_2, …, u_k, …, u_K}; the Euclidean distance is:
S2.3, assigning each historical frequency modulation sample of the power grid to the cluster whose center is nearest in Euclidean distance, and calculating the clustering objective function as follows:
S2.4, reducing the clustering objective function J by adjusting the number K of cluster centers and, when J reaches its minimum, clustering the historical frequency modulation samples of the power grid according to the current cluster centers;
S3, taking the noise Z, the state space set S, the control action set A and the sample label Y as inputs of the conditional generative adversarial network, training the CGAN with minimization of the Wasserstein distance as the objective function, and generating samples whose distribution is similar to that of the historical frequency modulation samples; specifically comprising the following steps:
S3.1, constructing the conditional generative adversarial network;
S3.2, taking minimization of the Wasserstein distance as the objective function of the conditional generative adversarial network, as follows:
wherein p(x) is the distribution of the historical frequency modulation samples, p(Z) is the distribution of the generated samples, D(x) is the discriminator output for a historical frequency modulation sample, D(G(Z)) is the discriminator output for a generated sample, E_{x~p(x)}[D(x)] is the expected value of the discriminator output for the historical frequency modulation samples, and E_{Z~p(Z)}[D(G(Z))] is the expected value of the discriminator output for the generated samples;
S3.3, training according to the objective function in step S3.2 to obtain the optimal generator, and feeding it random noise Z to generate samples whose distribution is similar to that of the historical frequency modulation samples;
S4, augmenting the historical frequency modulation samples with the generated samples to obtain enhanced samples, and feeding the enhanced samples into a multi-layer perceptron (MLP) to establish a mapping model from S to A;
S5, using a physical model of grid frequency modulation controlled by a Q learning controller, taking the scheduling decision result of the mapping model as the initial value of the Q learning controller, and outputting the optimal solution of the grid frequency modulation control strategy to obtain the grid frequency deviation and the corresponding power adjustment for each period; the value function of the Q learning controller is as follows:
wherein: a, s and s' are the selected action, the current state and the next state respectively, with a ∈ A, s ∈ S and s' ∈ S; the action of the Q learning controller is the power adjustment ΔP of the regional power grid, and the real-time state S_t of the Q learning controller at time t consists of the frequency deviation |Δf_t| of the regional power grid, the regional control deviation |ACE_t| and the control performance standard value CPS1_t at that time; Q(s, a) is the iteratively computable state-action value function after action a is taken in the current state s; R(s, s', a) is the immediate reward obtained when action a taken in the current state s causes a transition to state s'; P(s'|s, a) is the probability that the environment moves from the current state s to state s' after action a is taken; γ is the discount rate; Q(s', a) is the iteratively computable state-action value function after action a is taken in the next state s'; P(s'|s, a) is updated synchronously with Q(s, a), in proportion to the ratio of Q(s, a) before and after its update, which gives the P(s'|s, a) used in the next iteration; the grid frequency modulation control strategy is to select, in any state s, the action with the largest Q(s, a);
S6, performing frequency modulation on the power grid according to the grid frequency deviation of each period and the corresponding power adjustment.
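The clustering in steps S2.1 to S2.3 can be sketched as follows, assuming each sample is a vector (|Δf|, |ACE|, CPS1) and that the objective J is the usual sum of squared Euclidean distances to the assigned centers. That form of J is an assumption, since the claim's formula is not reproduced in this text. The function name `kmeans`, the random initialization and the toy data are hypothetical.

```python
import numpy as np

def kmeans(samples, k, iters=50, seed=0):
    """Plain K-means on an (n, d) array; returns (centers, labels, J)."""
    rng = np.random.default_rng(seed)
    # S2.1: initialize K centers from randomly chosen samples (assumed choice).
    centers = samples[rng.choice(len(samples), size=k, replace=False)].copy()
    for _ in range(iters):
        # S2.2: Euclidean distance from every sample to every center.
        dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        # S2.3: assign each sample to its nearest center.
        labels = np.argmin(dist, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = samples[labels == j]
            if len(members) > 0:
                # Center = mean of the samples belonging to the cluster.
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute assignments and the objective for the final centers.
    dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dist, axis=1)
    objective = float(np.sum(dist[np.arange(len(samples)), labels] ** 2))
    return centers, labels, objective

# Hypothetical toy states: columns are |Δf|, |ACE|, CPS1.
states = np.array([[0.01, 5.0, 190.0], [0.02, 6.0, 192.0],
                   [0.20, 60.0, 120.0], [0.22, 62.0, 118.0]])
centers, labels, J = kmeans(states, k=2)
```

Step S2.4 would repeat this call for several values of K and compare the resulting J; the resulting `labels` play the role of the sample labels Y fed to the CGAN.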
2. The grid frequency modulation control method based on a combination of data driving and physical model driving according to claim 1, wherein the conditional generative adversarial network comprises a generator and a discriminator; the input of the generator is noise and sample labels, and the output of the generator is a generated sample carrying a sample label; the input of the discriminator is either a generated sample with a sample label or a historical frequency modulation sample with a sample label, and its output is the probability that the input belongs to the historical frequency modulation samples, which is used to distinguish historical frequency modulation samples from generated samples: if the input of the discriminator is a generated sample with a sample label, the output of the discriminator is close to 0, and if the input is a historical frequency modulation sample with a sample label, the output is close to 1;
the probability output by the discriminator is passed to the generator through the objective function of the conditional generative adversarial network; the generator is updated by minimizing the objective function, and the discriminator is updated by maximizing it.
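A minimal numeric sketch of the objective described in claims 1 and 2, assuming the usual Wasserstein GAN form E_{x~p(x)}[D(x)] − E_{z~p(z)}[D(G(z))], which the discriminator maximizes and the generator minimizes. The toy linear discriminator (returning an unbounded score rather than a probability), the toy generator and all function names are hypothetical; conditioning on the sample label and the Lipschitz constraint normally required for Wasserstein training are omitted.

```python
import numpy as np

def wasserstein_objective(d_real, d_fake):
    """Estimate E[D(x)] - E[D(G(z))] from batches of discriminator outputs."""
    return float(np.mean(d_real) - np.mean(d_fake))

def discriminator(x, w):
    # Toy linear critic (assumption): one score per sample.
    return x @ w

def generator(z, shift):
    # Toy generator (assumption): shifts the noise toward the data.
    return z + shift

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=0.1, size=(256, 1))   # stand-in for historical samples
noise = rng.normal(loc=0.0, scale=0.1, size=(256, 1))  # noise Z
w = np.array([1.0])

# Generator far from the data: the objective is large.
gap_untrained = wasserstein_objective(
    discriminator(real, w), discriminator(generator(noise, 0.0), w))
# Generator matching the data mean: the objective shrinks toward zero.
gap_matched = wasserstein_objective(
    discriminator(real, w), discriminator(generator(noise, 2.0), w))
```

The drop from `gap_untrained` to `gap_matched` illustrates why minimizing this quantity over the generator pulls the generated distribution toward the historical one.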
3. The grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 1, wherein step S4 specifically comprises the following steps:
S4.1, merging the historical frequency modulation samples with the generated samples whose distribution is similar to theirs, separately for the state space set S and the control action set A, to obtain enhanced samples, wherein the merging appends the state space set of the generated samples directly to the end of the state space set of the historical frequency modulation samples, and appends the control action set of the generated samples directly to the end of the control action set of the historical frequency modulation samples;
S4.2, clustering the state space set S of the enhanced samples with the K-means algorithm to form K_new cluster centers and thus K_new clusters;
S4.3, using the multi-layer perceptron (MLP) to establish a mapping model from S to A for each of the K_new clusters formed by the clustering.
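Step S4.1 can be sketched as a row-wise concatenation, assuming states and actions are stored as arrays with one row per sample and that the generated samples are appended after the historical ones. The function name `merge_samples`, the array names and the toy values are hypothetical.

```python
import numpy as np

def merge_samples(hist_states, hist_actions, gen_states, gen_actions):
    """Append generated samples after the historical ones (S and A separately)."""
    if len(hist_states) != len(hist_actions) or len(gen_states) != len(gen_actions):
        raise ValueError("each state row needs exactly one action")
    states = np.concatenate([hist_states, gen_states], axis=0)
    actions = np.concatenate([hist_actions, gen_actions], axis=0)
    return states, actions

# Hypothetical toy data: columns of the state arrays are |Δf|, |ACE|, CPS1;
# actions are power adjustments ΔP.
hist_s = np.array([[0.01, 5.0, 190.0], [0.20, 60.0, 120.0]])
hist_a = np.array([10.0, -40.0])
gen_s = np.array([[0.02, 6.0, 191.0]])
gen_a = np.array([12.0])
enh_s, enh_a = merge_samples(hist_s, hist_a, gen_s, gen_a)
```

Keeping states and actions in matching row order preserves the pairing that the later S-to-A mapping model is trained on.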
4. The grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 3, wherein in step S4.3 the multi-layer perceptron (MLP) consists of an input layer, hidden layers and an output layer; each layer is fully connected to the next layer, the output of each layer is passed through an activation function before being used as the input of the next layer, and every layer except the input layer uses the sigmoid nonlinear activation function;
in the input layer, the clusters C_k of the K_new clusters are selected in order; the state space set of cluster C_k is used as the input of the input layer of the multi-layer perceptron, and the width of the input layer is the number of states in the state space set of cluster C_k; each state of the input layer corresponds to a unique control action t_k;
the label of the control action set of cluster C_k is used as the output of the output layer of the multi-layer perceptron, and the width of the output layer is the number of selectable control actions in the control action set of cluster C_k; if the unique control action corresponding to an input state is consistent with the output control action, the loss e_n is 0, and otherwise the loss e_n is 1;
the multi-layer perceptron of cluster C_k has 3 hidden layers, the 1st hidden layer having a width of 128, the 2nd a width of 128 and the 3rd a width of 64; the weight parameters of the multi-layer perceptron of cluster C_k are updated by minimizing the loss function E_k, which is defined as:
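The network in claim 4 can be sketched as a forward pass with hidden widths 128, 128 and 64 and sigmoid activations. The input width of 3, the output width of 5, the random weights and the use of the mean 0/1 loss as E_k are assumptions for illustration, since the claim's loss formula is not reproduced in this text; training in practice would need a differentiable surrogate loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(n_in, n_out, hidden=(128, 128, 64), seed=0):
    """Create (weight, bias) pairs for a fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, n_out]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Apply sigmoid after every layer except the input layer."""
    h = x
    for w, b in params:
        h = sigmoid(h @ w + b)
    return h

def zero_one_loss(scores, target_actions):
    """e_n = 0 when the predicted action matches the label, else 1; return the mean."""
    predicted = np.argmax(scores, axis=1)
    return float(np.mean(predicted != target_actions))

# Hypothetical sizes: 3 state features in, 5 selectable control actions out.
params = init_mlp(n_in=3, n_out=5)
x = np.array([[0.01, 5.0, 190.0], [0.20, 60.0, 120.0]])
scores = forward(params, x)
loss = zero_one_loss(scores, np.array([0, 1]))
```

One such network would be built per cluster C_k, so that each mapping model only has to cover the states and actions that occur within its own cluster.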
5. The grid frequency modulation control method based on the combination of data driving and physical model driving according to claim 1, wherein the iterative update formula for the value function of the Q learning controller is:
wherein: Q_{k+1} is the approximation of the ideal value Q obtained at the (k+1)-th iteration, and Q_k is the approximation of the ideal value Q obtained at the k-th iteration; at the (k+1)-th iteration the Q learning controller obtains a sample [s_k, a, r, s_{k+1}]; R(s_k, a_k, s_{k+1}) is the immediate reward received when action a_k taken in the current state s_k causes a transition to state s_{k+1}; α is the learning rate, 0 < α < 1, expressing the confidence placed in the improvement between two successive iterations; a' denotes any control action in the action set A.
6. The grid frequency modulation control method based on a combination of data driving and physical model driving according to claim 1, wherein the immediate reward function R(s, s', a) of the Q learning controller at the t-th power adjustment period is:
wherein: ACE(s, s', a) and CPS1(s, s', a) are the real-time measured values after action a taken in the current state s causes a transition to state s'; ACE*(s, s', a) and CPS1*(s, s', a) are respectively the ideal ACE control value and the ideal CPS1 index control value after that transition, where ACE*(s, s', a) is taken as the ACE regulation dead-zone value and CPS1*(s, s', a) takes values in the interval [180, 220].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911129495.7A CN110880773B (en) | 2019-11-18 | 2019-11-18 | Power grid frequency modulation control method based on combination of data driving and physical model driving |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110880773A (en) | 2020-03-13 |
CN110880773B (en) | 2023-09-15 |
Family
ID=69729087
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911129495.7A Active CN110880773B (en) | 2019-11-18 | 2019-11-18 | Power grid frequency modulation control method based on combination of data driving and physical model driving |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110880773B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461977B (en) * | 2020-03-26 | 2022-07-26 | 华南理工大学 | Power data super-resolution reconstruction method based on improved generation type countermeasure network |
CN111555368B (en) * | 2020-05-15 | 2022-12-06 | 广西大学 | Deep generation type countermeasure network scheduling and control method of comprehensive energy system |
CN113434286B (en) * | 2021-05-15 | 2024-10-01 | 南京逸智网络空间技术创新研究院有限公司 | Energy efficiency optimization method suitable for mobile application processor |
CN114662850A (en) * | 2022-02-22 | 2022-06-24 | 大连海事大学 | Electric energy prediction distribution system based on LoRaWAN network cloud monitoring |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106899026A (en) * | 2017-03-24 | 2017-06-27 | 三峡大学 | Intelligent power generation control method based on the multiple agent intensified learning with time warp thought |
CN107766937A (en) * | 2017-09-11 | 2018-03-06 | 重庆大学 | Feature based chooses and the wind power ultra-short term prediction method of Recognition with Recurrent Neural Network |
Also Published As
Publication number | Publication date |
---|---|
CN110880773A (en) | 2020-03-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||