CN103683337B - Interconnected-grid CPS instruction dynamic allocation optimization method - Google Patents

Interconnected-grid CPS instruction dynamic allocation optimization method

Info

Publication number
CN103683337B
CN103683337B (application CN201310656811.2A)
Authority
CN
China
Prior art keywords
unit
cps
value
state
dynamic allocation
Prior art date
Legal status
Active
Application number
CN201310656811.2A
Other languages
Chinese (zh)
Other versions
CN103683337A (en)
Inventor
Yu Tao (余涛)
Zhang Xiaoshun (张孝顺)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201310656811.2A priority Critical patent/CN103683337B/en
Publication of CN103683337A publication Critical patent/CN103683337A/en
Application granted granted Critical
Publication of CN103683337B publication Critical patent/CN103683337B/en


Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a dynamic allocation optimization method for interconnected-grid CPS instructions, comprising the following steps: step 1, determine the control objective; step 2, determine the discrete state set S; step 3, select the balancing unit and determine the joint-action discrete set A; step 4, calculate the instantaneous values of the area control error ACE(k) and of CPS(k) for this area; step 5, obtain the immediate reward R_i(k) of each agent; step 6, solve for the correlated-equilibrium joint strategy from the linear equilibrium constraints and the equilibrium selection function; step 7, have all units j execute the corresponding actions; step 8, when the next control cycle arrives, return to step 4. The method effectively reduces the number of frequent adjustments of the various classes of units and improves the CPS control performance of the AGC system; it is especially suited to dynamic allocation optimization of CPS instructions in thermal-power-dominated interconnected grids with complex unit combinations.

Description

Interconnected-grid CPS instruction dynamic allocation optimization method
Technical field
The present invention relates to the technical field of automatic generation control (AGC, i.e., frequency regulation) of electric power systems, and in particular to an interconnected-grid CPS instruction dynamic allocation optimization method. The dynamic allocation optimization method is suited to CPS instruction dynamic allocation optimization in thermal-power-dominated interconnected grids with complex unit combinations.
Background technology
Since the Control Performance Standard (CPS) was introduced for Automatic Generation Control (AGC) of interconnected grids, the CPS compliance rate has become the key factor shaping AGC control strategies. Allocating the total CPS regulation command to each AGC unit according to an optimization algorithm is one of the critical steps of an AGC control system.
Traditional AGC adopts an equal-share scheme when allocating regulation power; it ignores the differences among units and cannot meet the needs of CPS regulation. Apart from reinforcement learning, most existing designs of CPS control strategies are classical PI control structures, all of which can improve the CPS indices; among them, NARX neural-network prediction and fuzzy-control principles have been introduced into the study of CPS control strategies, reducing frequent unit movements to a certain degree while raising the CPS compliance rate. Conventional PI control, NARX neural-network prediction, and fuzzy control are robust to the model uncertainty of the controlled object, but they still have shortcomings in optimization design. Existing theoretical research shows that the strong self-learning and self-optimizing abilities of reinforcement learning give better coordination and robustness in solving dispatch-side optimal generation control. Yu Tao, Wang Yuming, and Liu Qianjin proposed a Q-learning-based dynamic optimal allocation method for CPS instructions in "Q-learning algorithm for dynamic optimal allocation of interconnected-grid CPS regulation commands" (Proceedings of the CSEE); it adapts well to changes in the operating environment, its allocation behavior is not fixed, and it improves the control adaptability and robustness of the whole AGC system. Single-step Q-learning, when applied to dynamic optimization of AGC power instructions in thermal-power-dominated grids whose units have large time delays, exhibits deficiencies such as slow convergence that hinder acquisition of the optimal policy; Yu Tao, Wang Yuming, Zhen Weiguo, et al. introduced eligibility traces in "Dynamic optimal allocation algorithm for automatic generation control instructions based on multi-step backtracking Q-learning" (Control Theory & Applications) to solve the delayed-reward problem brought by the long time-delay links of thermal units, accelerating the convergence of the algorithm, meeting the real-time requirements of on-site application, and saving system regulation cost while keeping the high AGC compliance rate. To solve the curse-of-dimensionality problem in Q-learning-based allocation across multiple units, Yu Tao, Wang Yuming, Ye Wenjia, and Liu Qianjin, in "Multi-objective dynamic optimal allocation algorithm for CPS instructions based on improved hierarchical reinforcement learning", first classified all units of the grid by their regulation time delay, allocated the CPS instructions layer by layer to form a task hierarchy, and introduced a coordination factor between the layers of the hierarchical Q-learning algorithm; the improved hierarchical Q-learning effectively speeds up the convergence of the original algorithm. Although classical reinforcement learning can converge to an equilibrium point under the grid CPS assessment standard, the distribution-factor action policies adopted during allocation limit the combination space of unit outputs, so the equilibrium found is not necessarily the optimal one; the various classes of units adjust rather frequently, the number of steps to convergence is relatively large, and the CPS1 and ACE real-time curves after convergence are not smooth. In addition, Q-learning, Q(λ)-learning, and hierarchical Q-learning are all in essence single-agent reinforcement learning algorithms: they involve no cooperative learning among agents, and the combined actions of the agents are not necessarily the optimal joint action. The method of the invention, CEQ(λ) (Correlated-Equilibrium-Q(λ)), can, through the game among multiple agents under correlated-equilibrium reinforcement learning, form equilibrium points superior to those of single-agent Q-learning, conventional PI control, NARX neural-network prediction, and fuzzy control; it is better suited to dynamic optimal allocation of CPS instructions in coal-power-dominated interconnected grids with complex unit combinations and effectively improves the adaptability and robustness of the system.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing an interconnected-grid CPS instruction dynamic allocation optimization method based on CEQ(λ) multi-agent cooperative learning. The CEQ(λ) learning algorithm is an improvement on the CEQ algorithm and an important watershed in the development of reinforcement learning from single agent to multi-agent: the dynamic action policy of each agent is no longer decided merely by its own historical action policy and rewards, but by the dynamic equilibrium point formed by the action probabilities of the other agents. In addition, in the application of CEQ(λ) to CPS instruction dynamic allocation, the command-allocation action of each type of AGC unit is no longer the proportionality coefficient adopted in the literature mentioned above, but the actual increase or decrease of unit output; the joint-action combination space of all types of AGC units is thus much larger than in that literature, which raises the probability of finding a better equilibrium point.
The object of the present invention is achieved through the following technical solution: an interconnected-grid CPS instruction dynamic allocation optimization method, comprising the following steps:
Step 1: determine the control objective;
Step 2: determine the discrete state set S;
Step 3: select one class of units as the balancing unit, with the other units participating in CEQ(λ) cooperative learning, and at the same time determine the joint-action discrete set A;
Step 4: at the start of each control cycle, collect the real-time operating data of the controlled regional grid, comprising the frequency deviation Δf, the power deviation ΔP, and the actual regulation output ΔP_{Gi} of each unit; calculate the instantaneous values of the area control error ACE(k) and of the control performance standard CPS(k) for this area;
Step 5: from the current state s, obtain the immediate reward R_i(k) of each unit i;
Step 6: from the linear equilibrium constraints
$$\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i(s,a)\;\ge\;\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i\big(s,(a_{-i},a_i')\big)$$
and the equilibrium selection function f, solve for the optimal correlated-equilibrium joint strategy π_s*;
where A_{-i} = ∏_{j≠i} A_j, A_i is the action set of agent i, s is the current state, a_i is the action of agent i, -i denotes the set of agents other than agent i, π is the equilibrium strategy, and Q_i(s,a) is the state-action value function of agent i;
Step 7: for every learning unit j, update the state-action value function Q_j(s,ā) and the eligibility-trace matrix e_j(s,ā) over all state-action pairs (s,ā), solve again from the updated Q values for the stochastic optimal equilibrium joint strategy under the current state s, select each unit's cooperative action by π_s*, and update the state s and the action a;
Step 8: when the next control cycle arrives, return to step 4.
The control objective in step 1 is selected as minimum area control error ACE, minimum generation cost, or highest control performance standard CPS.
The discrete state set S in step 2 can specifically be determined by partitioning the ranges of the area control error ACE(k) of the controlled regional grid, the value of the control performance standard CPS(k), and the power deviation |ΔP_{error-i}| of each of its units.
The balancing unit in step 3 is generally a coal-fired unit, while units such as hydropower and liquefied natural gas (LNG) units, which have narrower regulation-capacity bounds but smaller time delay, higher regulation rate, and lower regulation cost, are selected to participate in the equilibrium learning.
The expression of the joint-action discrete set A in step 3 is:
A = A_1 × A_2 × … × A_i × … × A_{n-1},
where A_i is the output discrete action set of agent i and n is the number of agents.
The real-time operating data in step 4 are collected by the computer and the supervisory control (SCADA) system.
R_i(k) in step 5 is generally designed as a linear combination of the kth-step difference values of ACE and CPS1 of the controlled regional grid and the power deviation ΔP_{error-i} of each unit.
Step 6 introduces the core idea of correlated equilibrium, namely the linear constraints of the correlated-equilibrium strategy and the uCEQ equilibrium selection function suited to CPS instruction dynamic allocation optimization, so that the coordinated joint action among the agents can reach the optimum.
The iterative update formula of the Q value $Q_j(s,\vec a)$ in step 7 is:
$$Q_j(s,\vec a) = Q_j(s,\vec a) + \alpha\times\delta_j\times e_j(s,\vec a),$$
where $Q_j(s,\vec a)$ is the state-action value function of agent j at the state-action pair $(s,\vec a)$, $\delta_j$ is the learning error, and $e_j(s,\vec a)$ is the eligibility-trace matrix;
$$\delta_j = (1-\gamma)\times R_j(s,\vec a) + \gamma\times V_j(s') - Q_j(s,\vec a),\qquad V_i^{t+1}(s) = \sum_a \pi_s^t(a)\,Q_i^t(s,a),$$
where γ is the discount factor, with 0 ≤ γ ≤ 1; α is the learning rate, with 0 ≤ α ≤ 1; $R_j(s,\vec a)$ is the reward obtained by agent j after executing the action $\vec a$ in the current state s; $V_j(s')$ is the value function of agent j at the next state s'; $Q_i^t(s,a)$ is the state-action value function of agent i at the state-action pair (s,a) at time t; $\pi_s^t(a)$ is the equilibrium strategy; and $V_i^{t+1}(s)$ is the value function of agent i at state s at time t+1.
The iterative update formula of the eligibility-trace matrix $e_j(s,\vec a)$ in step 7 is:
$$e_j(s,\vec a) = \gamma\times\lambda\times e_j(s,\vec a),$$
where $e_j(s,\vec a)$ is the eligibility-trace matrix, γ is the discount factor, with 0 ≤ γ ≤ 1, and λ is the trace-decay factor, with 0 ≤ λ ≤ 1.
The concrete scheme of the present invention comprises the following steps:
1. Select the control objective.
The objective of the allocation-control task admits several choices: minimum area control error (ACE), minimum generation cost, or highest CPS index. It is specifically described as:
$$\min E = \sum_{t=1}^{T} e(t)$$
$$\text{s.t.}\quad \Delta P_{order\text{-}\Sigma}(t) = \sum_{i=1}^{n}\Delta P_{order\text{-}i}(t),$$
$$0 \le \Delta P_{order\text{-}i}(t) - \Delta P_{order\text{-}i}(t-1) \le P_{rate}^{+},$$
$$P_{rate}^{-} \le \Delta P_{order\text{-}i}(t) - \Delta P_{order\text{-}i}(t-1) \le 0,$$
$$\Delta P_{Gi}^{\min} \le \Delta P_{Gi}(t) \le \Delta P_{Gi}^{\max},$$
where t is the discrete time instant; e is the deviation between the control-objective value and the actual control output; E is the accumulated deviation of e over the time period T; ΔP_{order-Σ} is the CPS command value of the AGC system, in MW; ΔP_{order-i} is the regulation command assigned to the ith unit, in MW; P_{rate}^{+} is the ramp-up rate limit of the ith unit, in MW/min; P_{rate}^{-} is the ramp-down rate limit of the ith unit, in MW/min; ΔP_{Gi} is the actual regulation output of the ith unit, in MW; and ΔP_{Gi}^{min}, ΔP_{Gi}^{max} are respectively the lower and upper regulation-capacity limits of the ith unit, in MW.
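As a small illustration of how the ramp-rate and capacity constraints above act on a candidate command, the following sketch clips a proposed regulation command for the ith unit; the helper name and signature are assumptions for illustration, not from the patent.

```python
def clip_command(dp_new, dp_prev, rate_up, rate_dn, p_min, p_max):
    """Clip a candidate command dp_new (MW) against the previous command
    dp_prev, the ramp limits rate_dn <= 0 <= rate_up (MW per cycle), and
    the regulation-capacity limits [p_min, p_max] (MW)."""
    step = min(max(dp_new - dp_prev, rate_dn), rate_up)   # ramp-rate limits
    return min(max(dp_prev + step, p_min), p_max)         # capacity limits
```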
2. Select the balancing unit and the joint-action space.
Since the total command is a known constraint, cooperative learning need only be carried out for n-1 of the n classes of AGC units; the CPS instruction regulation amount of the nth class of units is then
$$\Delta P_{order\text{-}n}(t) = \Delta P_{order\text{-}\Sigma}(t) - \sum_{i=1}^{n-1}\Delta P_{order\text{-}i}(t).$$
The method of the invention defines the nth unit as the balancing unit.
Determine the joint-action discrete set A, where A = A_1 × A_2 × … × A_i × … × A_{n-1} and A_i is the output discrete action set of agent i.
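A minimal sketch of building the joint-action discrete set and deriving the balancing unit's command from the known total follows, using the action sets of the embodiment described later; the function names are illustrative, not from the patent.

```python
from itertools import product

def joint_action_set(action_sets):
    """A = A_1 x A_2 x ... x A_{n-1}, one tuple per joint action (MW)."""
    return list(product(*action_sets))

def balancing_command(dp_order_total, joint_action):
    """dP_order_n = dP_order_total - sum of the n-1 learning units' commands."""
    return dp_order_total - sum(joint_action)

# With the two learning units of the embodiment (LNG and hydro), 11 actions
# each, the joint-action set has 11 * 11 = 121 elements.
A1 = A2 = [-100, -50, -20, -10, -5, 0, 5, 10, 20, 50, 100]
A = joint_action_set([A1, A2])
assert len(A) == 121
```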
3. Solve the correlated equilibrium.
In a Markov decision process, when each agent maximizes its own cumulative reward without relying on the action probability distributions of the other agents, the dynamic equilibrium state formed is a Nash equilibrium. A correlated equilibrium is the opposite: it is the dynamic equilibrium point formed when each agent, in maximizing its own reward, depends on the action probability distributions of the other agents. The mathematical description of a correlated equilibrium is:
$$\sum_{a_{-i}\in A_{-i}}\pi(a_{-i},a_i)\,R_i(a_{-i},a_i)\;\ge\;\sum_{a_{-i}\in A_{-i}}\pi(a_{-i},a_i)\,R_i(a_{-i},a_i'),$$
where A_{-i} = ∏_{j≠i} A_j, π is the equilibrium strategy, and R_i is the immediate reward function of agent i. If a strategy π satisfies the above inequality for every agent i and all actions a_i, a_i' ∈ A_i with π(a_i) > 0, that strategy is a correlated-equilibrium point. A correlated equilibrium can be found straightforwardly by linear programming. For a Markov game (MG) with n agents, each having m actions, there are m^n joint actions in total and nm(m-1) linear constraints of the above form.
4. CEQ(λ) multi-agent cooperative learning algorithm.
Given the Q values Q_i^t(s,a) at time t for all agents i ∈ N, all states s ∈ S, and all actions a ∈ A(s), an equilibrium strategy π^t, and an equilibrium selection function f, the value functions Q_i^{t+1}(s,a) and V_i^{t+1}(s) of agent i at time t+1 are defined under the correlated-equilibrium condition by the Markov-game rule:
$$V_i^{t+1}(s) = \sum_{a\in A(s)} \pi_s^t(a)\,Q_i^t(s,a),$$
$$Q_i^{t+1}(s,a) = (1-\gamma)\,R_i(s,a) + \gamma\sum_{s'\in S} P[s'\mid s,a]\,V_i^{t+1}(s'),$$
$$\pi_s^{t+1} \in f\big(Q^{t+1}(s)\big).$$
The linear constraints of the correlated-equilibrium strategy state that for every agent i and all actions a_i, a_i' ∈ A_i with π_s(a_i) > 0,
$$\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i(s,a)\;\ge\;\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i\big(s,(a_{-i},a_i')\big).$$
The number of correlated-equilibrium strategies satisfying this inequality grows as the number of agents increases; by solving it for the optimal joint strategy π_s*, the optimal joint action can be executed among the AGC units.
In addition, the equilibrium selection function f used by the method of the invention is the uCEQ mentioned by Greenwald A, Hall K, and Zinkevich M in "Correlated Q-learning", namely:
$$f = \arg\max_{\pi_s\in CE}\;\sum_{i\in N}\sum_{\vec a\in A(s)} \pi_s(\vec a)\,Q_i(s,\vec a).$$
The physical meaning of uCEQ is maximizing the sum of the rewards of all agents: it "treats" every class of AGC unit fairly, improves the regional grid's CPS assessment compliance rate, and reduces the CPS power-regulation deviation, making it suitable for the CPS instruction dynamic allocation process with its very high real-time requirements.
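Putting the linear constraints and the uCEQ objective together, the equilibrium at a given state reduces to a single linear program over the joint-action probabilities. The following is a minimal sketch for two learning agents, assuming SciPy's linprog is available; the function name and matrix layout are illustrative, not from the patent.

```python
import numpy as np
from scipy.optimize import linprog

def uceq(Q1, Q2):
    """Solve the uCEQ correlated equilibrium at one state for two agents.

    Q1, Q2 : m x m Q-matrices at the current state; entry [j, k] is the
    value of joint action (a1_j, a2_k) for agent 1 and agent 2 respectively.
    Returns pi_s*(a1, a2) as an m x m probability matrix."""
    m = Q1.shape[0]
    nv = m * m                                 # one variable per joint action
    c = -(Q1 + Q2).reshape(nv)                 # uCEQ: maximize the total Q value

    # Deviation constraints in <= 0 form: for each agent and each ordered
    # action pair (j, j'), switching from j to j' must not pay off.
    rows = []
    for j in range(m):
        for jp in range(m):
            if jp == j:
                continue
            r1 = np.zeros((m, m))
            r1[j, :] = Q1[jp, :] - Q1[j, :]    # agent 1 deviating j -> j'
            rows.append(r1.reshape(nv))
            r2 = np.zeros((m, m))
            r2[:, j] = Q2[:, jp] - Q2[:, j]    # agent 2 deviating j -> j'
            rows.append(r2.reshape(nv))

    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, nv)), b_eq=[1.0], bounds=(0, 1))
    return res.x.reshape(m, m)
```

For m = 11 actions per agent, this linear program has exactly the 121 variables and 2 × 11 × 10 = 220 deviation constraints counted in the embodiment below.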
Relative to the prior art, the present invention has the following advantages and effects:
1. Under the CEQ(λ) algorithm the outputs of the individual units vary more continuously and smoothly, which makes the total actual generation of the units and the CPS1 curve smoother.
2. Under the CEQ(λ) algorithm the joint-action space among the units is larger, so a better equilibrium point can be found and the CPS assessment compliance rate can be effectively improved.
3. Under the CEQ(λ) algorithm the coal-fired units bear a larger proportion of the load disturbance while being less affected by the regulation-capacity limits of the hydropower units, which makes the method better suited to CPS instruction dynamic allocation in coal-power-dominated interconnected grids with scarce hydropower resources.
Brief description of the drawings
Fig. 1 shows the dynamic optimal allocation process of the AGC system load.
Fig. 2 shows the CEQ(λ) control decision process.
Fig. 3 shows the load-frequency control model of the two-area interconnected system.
Embodiments
The present invention is described in further detail below in conjunction with an embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
This embodiment takes the load-frequency control model of the typical IEEE two-area interconnected system as the research object. The original model has only one unit simulating the generation link per area. Area A is first selected for pre-learning simulation in this example, so three unit models are used in area A in place of the original single unit: a coal-fired unit, a liquefied natural gas (LNG) unit, and a hydropower unit; area B still uses its original single-unit model. For the concrete model parameters and simulation design principles, refer to "Q-learning algorithm for dynamic optimal allocation of interconnected-grid CPS regulation commands" (Proceedings of the CSEE) by Yu Tao, Wang Yuming, and Liu Qianjin, as shown in Fig. 3. With improving the CPS compliance rate as the control objective, load disturbances are added to area A of the simulation model, comprising periodic load disturbances and random load disturbances. The CPS power allocator of regional grid A seeks the optimal joint action strategy among the units by synthesizing the ACE of this regional grid, the CPS1 instantaneous value, and the power deviation ΔP_{error-i} of each unit; the AGC control cycle is chosen as 8 s, and the modeling and simulation study is carried out in Simulink.
As shown in Fig. 1, in each AGC control cycle the grid dispatch center obtains the CPS instantaneous values, the plant generation schedules, and the related historical values from the SCADA database of the energy management system (EMS), sends them to the CPS controller, and calculates the total unit regulation command, the CPS instruction P_{order-Σ}. The dispatch center then combines the actual operating state of each unit and the grid conditions (mainly the instantaneous values of CPS1, ACE, and ΔP_{error-i}) and allocates the total command to the various classes of AGC units through the multi-agent CEQ(λ), thereby obtaining the target regulation output P_{order-i} of each unit. The CPS instructions are sent to the plant generation control units through the information transmission system. Meanwhile, each power plant delivers the actual regulation output P_{Gi} of its units and the related operating information into the EMS of the grid dispatch center through the information transmission system.
The CEQ(λ) adopted by the method of the invention in the CPS instruction allocator can make up for the lack of joint-strategy optimization among the units of a regional grid in traditional intelligent generation control: by obtaining the instantaneous ACE value of the regional grid, the CPS rolling mean, and the actual generation of each unit, it seeks the optimal joint action strategy online so as to maximize the long-term CPS return. As shown in Fig. 2, the CEQ(λ) control decision process divides into three stages:
1) iteratively update the Q-value matrix and the eligibility-trace matrix e(s) under the current state;
2) solve the correlated equilibrium by linear programming under the given equilibrium objective function uCEQ;
3) execute the optimal joint action strategy, observe the system response, and return the reward and the current state.
The design of this control method is affected very little by the regional grid and the models of the various units, and its multi-agent online self-learning character makes it highly suitable for uncertain stochastic AGC systems.
The CPS instruction dynamic allocation optimal control method under CEQ(λ) multi-agent cooperative learning is as follows:
1) take the highest CPS index as the control objective; for the constraints of the various classes of units, see "Q-learning algorithm for dynamic optimal allocation of interconnected-grid CPS regulation commands" by Yu Tao, Wang Yuming, and Liu Qianjin;
2) analyze the real-time power deviations of the units to determine the discrete state set S: this example divides the |ΔP_{error-i}| value into 10 states, [0,5), [5,10), [10,20), [20,50), [50,100), [100,200), [200,500), [500,1000), [1000,1500), [1500,+∞), so each learning unit defines 10 states (a discretization sketch follows this step);
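A minimal sketch of this state discretization, mapping a unit's absolute power deviation (in MW) to one of the 10 state indices; the names are illustrative, not from the patent.

```python
import bisect

STATE_EDGES = [5, 10, 20, 50, 100, 200, 500, 1000, 1500]  # right-open bin edges

def state_index(dp_error_i):
    """Return 0..9 for |dp_error_i| in [0,5), [5,10), ..., [1500, +inf)."""
    return bisect.bisect_right(STATE_EDGES, abs(dp_error_i))
```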
3) select the coal-fired unit as the balancing unit, with the LNG unit and the hydropower unit participating in the correlated-equilibrium reinforcement learning; the output-action discrete sets are A_1 = A_2 = {-100, -50, -20, -10, -5, 0, 5, 10, 20, 50, 100} MW, the number of joint-action values is |A| = |A_1| × |A_2| = 11 × 11 = 121, and there are 2 × 11 × (11-1) = 220 correlated-equilibrium constraint equations in total;
4) measure and collect, at the start of each control cycle, the real-time operating data of the controlled regional grid and of all AGC regulation units: Δf, ΔP, ΔP_{Gi}, where Δf is the system frequency deviation, ΔP the tie-line power deviation, and ΔP_{Gi} the actual regulation output of the ith unit. According to the international assessment formulas, compute the instantaneous ACE(k) and CPS(k) of this area:
$$ACE = T_a - T_s - 10B(F_a - F_s),$$
where T_a and T_s are respectively the actual and scheduled tie-line power flows, B is the frequency bias coefficient, and F_a and F_s are respectively the actual and scheduled system frequency values;
$$CPS1 = (2 - CF_1)\times 100\%,\qquad CF_1 = \frac{1}{n}\sum\left[\frac{ACE_{AVE\text{-}min}}{-10B}\cdot\frac{\Delta F_{AVE\text{-}min}}{\varepsilon_1^2}\right],$$
where B is the frequency bias coefficient of this regional grid, ε_1 is the interconnected grid's control-objective value for the annual root mean square of the 1-minute average frequency deviation, and n is the number of minutes in the assessment period;
$$CPS2 = (1 - R)\times 100\%,$$
where R is the proportion of 10-minute periods whose average ACE exceeds the limit $L_{10} = 1.65\,\varepsilon_{10}\sqrt{(-10B)(-10B_{net})}$, ε_{10} is the interconnected grid's control-objective value for the annual root mean square of the 10-minute average frequency deviation, and B_{net} is the frequency bias coefficient of the whole interconnected grid (a computation sketch follows this step);
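A minimal sketch of the two assessment quantities, under the standard ACE and CPS1 definitions given above, assuming B in MW/0.1 Hz and ε_1 in Hz; the function names are illustrative, not from the patent.

```python
import numpy as np

def ace(Ta, Ts, Fa, Fs, B):
    """ACE = (Ta - Ts) - 10 * B * (Fa - Fs), in MW."""
    return (Ta - Ts) - 10.0 * B * (Fa - Fs)

def cps1(ace_avg_min, df_avg_min, B, eps1):
    """CPS1 = (2 - CF1) * 100 %, from arrays of 1-minute averages."""
    cf1 = np.mean((ace_avg_min / (-10.0 * B)) * df_avg_min) / eps1 ** 2
    return (2.0 - cf1) * 100.0
```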
5) the total CPS power allocator determines the current state s from the instantaneous values ACE(k) and CPS(k) of this regional grid and ΔP_{error-i}(k) of each unit, then obtains the immediate reward R_i(k) of each unit from the state s. The reward function is designed as follows (a sketch of this reward follows this step):
$$R_i(k)=\begin{cases}
\eta_i, & \eta_i\ge 0,\; C_{CPS1}(k)\ge 200\\
10\times\big[E_{ACE}(k)-E_{ACE}(k-1)\big]-\Delta P_{error\text{-}i}^2(k), & E_{ACE}(k)\le 0,\; C_{CPS1}(k)\in[100,200)\\
10\times\big[E_{ACE}(k-1)-E_{ACE}(k)\big]-\Delta P_{error\text{-}i}^2(k), & E_{ACE}(k)>0,\; C_{CPS1}(k)\in[100,200)\\
20\times\big[C_{CPS1}(k)-C_{CPS1}(k-1)\big]-2\times\Delta P_{error\text{-}i}^2(k), & E_{ACE}(k)\le 0,\; C_{CPS1}(k)<100\\
20\times\big[C_{CPS1}(k-1)-C_{CPS1}(k)\big]-2\times\Delta P_{error\text{-}i}^2(k), & E_{ACE}(k)>0,\; C_{CPS1}(k)<100
\end{cases}$$
where η_i is the historical maximum reward of unit i, initially 0; E_ACE(k) and C_CPS1(k) are respectively the ACE and CPS1 instantaneous values of the regional grid at the kth iteration step; and ΔP_{error-i} is the difference between the target regulation output ΔP_{order-i} of unit i and its actual regulation output ΔP_{Gi}, i.e. ΔP_{error-i}(k) = ΔP_{order-i}(k-1) - ΔP_{Gi}(k);
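A minimal sketch of this piecewise reward, reading the two conditions of each branch as holding jointly; the argument names are illustrative, not from the patent.

```python
def reward_i(ace_k, ace_prev, cps1_k, cps1_prev, dp_err_i, eta_i):
    """Immediate reward R_i(k) for learning unit i at iteration k."""
    if cps1_k >= 200:
        return eta_i                        # eta_i >= 0: historical best reward
    if 100 <= cps1_k < 200:                 # reward driven by the ACE trend
        trend = ace_k - ace_prev if ace_k <= 0 else ace_prev - ace_k
        return 10.0 * trend - dp_err_i ** 2
    # cps1_k < 100: reward driven by the CPS1 trend, deviation penalty doubled
    trend = cps1_k - cps1_prev if ace_k <= 0 else cps1_prev - cps1_k
    return 20.0 * trend - 2.0 * dp_err_i ** 2
```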
6) from the linear equilibrium constraints
$$\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i(s,a)\;\ge\;\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i\big(s,(a_{-i},a_i')\big)$$
(π_s being the joint strategy under state s) and the equilibrium selection function f, solve for the optimal correlated-equilibrium joint strategy π_s*;
7) for every learning unit j, execute:
① update the state value function $V_i^{t+1}(s) = \sum_a \pi_s^t(a)\,Q_i^t(s,a)$;
② estimate the value-function error $\delta_j = (1-0.2)\times R_j(s,\vec a) + 0.2\times V_j(s') - Q_j(s,\vec a)$, the discount factor γ being taken as 0.2 in this example;
③ update the eligibility-trace element $e_j(s,\vec a) = e_j(s,\vec a) + 1$;
④ for all state-action pairs $(s,\vec a)$, execute:
◆ update the Q-value function $Q_j(s,\vec a) = Q_j(s,\vec a) + 0.2\times\delta_j\times e_j(s,\vec a)$, the learning rate α being taken as 0.2;
◆ update the eligibility-trace matrix $e_j(s,\vec a) = 0.2\times 0.4\times e_j(s,\vec a)$, the trace-decay factor λ being taken as 0.4;
⑤ if the current state s and the next state s' are the same state, solve again here for the stochastic-equilibrium optimal joint strategy π_s* from the updated Q values;
⑥ select each unit's cooperative action by the optimal equilibrium joint strategy π_s*;
⑦ s = s';
8) when the next control cycle arrives, return to step 4).
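Tying the earlier sketches together, one AGC control cycle of steps 4) to 8) could look like the following skeleton. The measure and dispatch callbacks stand in for the SCADA/EMS interfaces, all names are illustrative rather than from the patent, and state_index, reward_i, uceq, and ceq_lambda_update refer to the sketches given above; for simplicity the current-state equilibrium is used as the value estimate, in the spirit of sub-step ⑤ when s' equals s.

```python
import numpy as np

m, n_states = 11, 10                                  # actions per agent, states per unit
Q = [np.zeros((n_states, m * m)) for _ in range(2)]   # Q tables: LNG and hydro units
e = [np.zeros((n_states, m * m)) for _ in range(2)]   # matching eligibility traces

def run_cycle(measure, dispatch, prev, a_idx):
    """One AGC control cycle; measure() returns (ace, cps1, per-unit dp_errs)
    and dispatch() sends the chosen joint-action index to the plant side."""
    ace_k, cps1_k, dp_errs = measure()                             # step 4: acquire data
    s = [state_index(d) for d in dp_errs]                          # one state per unit
    pi = uceq(Q[0][s[0]].reshape(m, m), Q[1][s[1]].reshape(m, m))  # step 6: equilibrium
    for j in range(2):                                             # step 7: CEQ(lambda)
        r_j = reward_i(ace_k, prev["ace"], cps1_k, prev["cps1"],
                       dp_errs[j], prev["eta"][j])                 # step 5: reward
        v_next = float(pi.reshape(-1) @ Q[j][s[j]])                # V_j under pi
        ceq_lambda_update(Q[j], e[j], s[j], a_idx, r_j, v_next,
                          alpha=0.2, gamma=0.2, lam=0.4)
    p = np.clip(pi.reshape(-1), 0.0, None)
    p /= p.sum()                                                   # sanitize LP round-off
    a_idx = int(np.random.choice(m * m, p=p))                      # sample joint action
    dispatch(a_idx)                                                # send commands out
    prev.update(ace=ace_k, cps1=cps1_k)                            # step 8: next cycle
    return a_idx
```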
The core of the present invention lies in the selection of the balancing unit, the improvement of the action space, the design of the reward function, the solving of the optimal coordination strategy, and the updating of each unit's Q-value matrix. Among these, the introduction of the balancing unit, the enlargement of the optimized action space, and the linear constraints and equilibrium selection function of the correlated-equilibrium strategy are the key innovations. The implementation of this method and its related techniques keeps the multi-unit power allocation within a regional grid in a state of optimal coordination at all times, with each agent's action depending on the states and actions of all the agents; it improves the ability of cooperative power regulation among the units, effectively reduces the number of frequent adjustments of the various classes of units, is especially suited to dynamic optimal allocation of CPS instructions in coal-power-dominated interconnected grids with complex unit combinations, and effectively improves the adaptability, robustness, and CPS assessment compliance rate of the system.
The application of the CEQ(λ) method to CPS command allocation proposed by the present invention mainly comprises: the selection of the control objective, the design of the reward function, the determination of the balancing unit and the action space, the introduction of eligibility traces, the selection of the equilibrium function, and the solving of the correlated equilibrium. As a method newly proposed by the present invention, CEQ(λ) has not previously been applied to a complex nonlinear system with real-time requirements as stringent as those of an electric power system.
The control method of the present invention can be fully described as follows:
(1) Selection of the control objective: the objective of the allocation-control task admits several choices: minimum area control error (ACE), minimum generation cost, or highest CPS index. It is specifically described as:
$$\min E = \sum_{t=1}^{T} e(t)$$
$$\text{s.t.}\quad \Delta P_{order\text{-}\Sigma}(t) = \sum_{i=1}^{n}\Delta P_{order\text{-}i}(t),$$
$$0 \le \Delta P_{order\text{-}i}(t) - \Delta P_{order\text{-}i}(t-1) \le P_{rate}^{+},$$
$$P_{rate}^{-} \le \Delta P_{order\text{-}i}(t) - \Delta P_{order\text{-}i}(t-1) \le 0,$$
$$\Delta P_{Gi}^{\min} \le \Delta P_{Gi}(t) \le \Delta P_{Gi}^{\max},$$
where t is the discrete time instant; e is the deviation between the control-objective value and the actual control output; E is the accumulated deviation of e over the time period T; ΔP_{order-Σ} is the CPS command value of the AGC system, in MW; ΔP_{order-i} is the regulation command assigned to the ith unit, in MW; P_{rate}^{+} is the ramp-up rate limit of the ith unit, in MW/min; P_{rate}^{-} is the ramp-down rate limit of the ith unit, in MW/min; ΔP_{Gi} is the actual regulation output of the ith unit, in MW; and ΔP_{Gi}^{min}, ΔP_{Gi}^{max} are respectively the lower and upper regulation-capacity limits of the ith unit, in MW.
(2) analyze the ACE(k) and CPS(k) values of this regional grid and the |ΔP_{error-i}| values of the units to determine the discrete state set S;
(3) determine the balancing unit and the action space: generally, units such as hydropower and liquefied natural gas units, which have narrower regulation-capacity bounds but smaller time delay, higher regulation rate, and lower regulation cost, are selected to participate in the equilibrium learning, and a coal-fired unit is generally selected as the balancing unit. In addition, the present invention proposes the joint-action discrete set A = A_1 × A_2 × … × A_i × … × A_{n-1}, where A_i is the output discrete action set of agent i;
(4) measure and collect, at the start of each control cycle, the real-time operating data of the controlled regional grid and of all AGC regulation units, namely Δf, ΔP, and ΔP_{Gi}, and compute the instantaneous values of ACE(k) and CPS(k) of this area, where Δf is the system frequency deviation, ΔP the tie-line power deviation, and ΔP_{Gi} the actual regulation output of the ith unit;
(5) determine the current state s from the instantaneous values ACE(k) and CPS(k) of this regional grid and ΔP_{error-i}(k) of each unit, then obtain the immediate reward R_i(k) of each unit from the state s; R_i(k) is generally designed as a linear combination of the kth-step difference values of ACE and CPS1 of this regional grid and the ΔP_{error-i} values;
(6) from the linear equilibrium constraints
$$\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i(s,a)\;\ge\;\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i\big(s,(a_{-i},a_i')\big)$$
(π_s being the joint strategy under state s) and the equilibrium selection function f, solve for the optimal correlated-equilibrium joint strategy π_s*;
(7) for every learning unit j, execute:
① update the state value function $V_i^{t+1}(s) = \sum_a \pi_s^t(a)\,Q_i^t(s,a)$;
② estimate the value-function error $\delta_j = (1-\gamma)\times R_j(s,\vec a) + \gamma\times V_j(s') - Q_j(s,\vec a)$, where γ is the discount factor, 0 ≤ γ ≤ 1;
③ update the eligibility-trace element $e_j(s,\vec a) = e_j(s,\vec a) + 1$;
④ for all state-action pairs $(s,\vec a)$, execute:
◆ update the Q-value function $Q_j(s,\vec a) = Q_j(s,\vec a) + \alpha\times\delta_j\times e_j(s,\vec a)$, where α is the learning rate, 0 ≤ α ≤ 1;
◆ update the eligibility-trace matrix $e_j(s,\vec a) = \gamma\times\lambda\times e_j(s,\vec a)$, where λ is the trace-decay factor, 0 ≤ λ ≤ 1;
⑤ if the current state s and the next state s' are the same state, solve again here for the stochastic-equilibrium optimal joint strategy π_s* from the updated Q values;
⑥ select each unit's cooperative action by the optimal equilibrium joint strategy π_s*;
⑦ s = s';
(8) when the next control cycle arrives, return to step (4).
The above embodiment is a preferred implementation of the present invention, but the implementations of the present invention are not restricted to the above embodiment; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and shall be included within the protection scope of the present invention.

Claims (9)

1. An interconnected-grid CPS instruction dynamic allocation optimization method, characterized by comprising the following steps:
Step 1: determine the control objective;
Step 2: determine the discrete state set S;
Step 3: select one class of units as the balancing unit, with the other units participating in CEQ(λ) cooperative learning, and at the same time determine the joint-action discrete set A;
Step 4: at the start of each control cycle, collect the real-time operating data of the controlled regional grid, comprising the frequency deviation Δf, the power deviation ΔP, and the actual regulation output ΔP_{Gi} of each unit; calculate the instantaneous values of the area control error ACE(k) and of the control performance standard CPS(k) for this area;
Step 5: from the current state s, obtain the immediate reward R_i(k) of each unit i;
Step 6: from the linear equilibrium constraints
$$\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i(s,a)\;\ge\;\sum_{a_{-i}\in A_{-i}}\pi_s(a)\,Q_i\big(s,(a_{-i},a_i')\big)$$
and the equilibrium selection function f, solve for the optimal correlated-equilibrium joint strategy π_s*;
where A_{-i} = ∏_{j≠i} A_j, A_i is the output discrete action set of unit i, s is the current state, a_i is the action of unit i, -i denotes the set of agents other than unit i, π is the equilibrium strategy, and Q_i(s,a) is the state-action value function of unit i;
Step 7: for every learning unit j, update the state-action value function Q_j(s,ā) and the eligibility-trace matrix e_j(s,ā) over all state-action pairs (s,ā), solve again from the updated Q values for the stochastic optimal equilibrium joint strategy under the current state s, select each unit's cooperative action by π_s*, and update the state s and the action a;
Step 8: when the next control cycle arrives, return to step 4;
the iterative update formula of the Q value $Q_j(s,\vec a)$ in step 7 being:
$$Q_j(s,\vec a) = Q_j(s,\vec a) + \alpha\times\delta_j\times e_j(s,\vec a),$$
where $Q_j(s,\vec a)$ is the state-action value function of agent j at the state-action pair $(s,\vec a)$, $\delta_j$ is the learning error, and $e_j(s,\vec a)$ is the eligibility-trace matrix;
$$\delta_j = (1-\gamma)\times R_j(s,\vec a) + \gamma\times V_j(s') - Q_j(s,\vec a),\qquad V_i^{t+1}(s) = \sum_a \pi_s^t(a)\,Q_i^t(s,a),$$
where γ is the discount factor, with 0 ≤ γ ≤ 1; α is the learning rate, with 0 ≤ α ≤ 1; $R_j(s,\vec a)$ is the reward obtained by agent j after executing the action $\vec a$ in the current state s; $V_j(s')$ is the value function of agent j at the next state s'; $Q_i^t(s,a)$ is the state-action value function of unit i at the state-action pair (s,a) at time t; $\pi_s^t(a)$ is the equilibrium strategy; and $V_i^{t+1}(s)$ is the value function of unit i at state s at time t+1.
2. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the control objective in step 1 is selected as minimum area control error ACE, minimum generation cost, or highest control performance standard CPS.
3. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the discrete state set S in step 2 can specifically be determined by partitioning the ranges of the area control error ACE(k) of the controlled regional grid, the value of the control performance standard CPS(k), and the absolute value |ΔP_{error-i}| of the power deviation of each of its units.
4. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the balancing unit in step 3 is a coal-fired unit, and that hydropower and liquefied natural gas units, whose regulation-capacity bounds are narrower but whose time delay is smaller, regulation rate higher, and regulation cost lower, are selected to participate in the equilibrium learning.
5. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the expression of the joint-action discrete set A in step 3 is:
A = A_1 × A_2 × … × A_i × … × A_{n-1},
where A_i is the output discrete action set of unit i and n is the number of agents.
6. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the real-time operating data in step 4 are collected by the computer and the supervisory control system.
7. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that R_i(k) in step 5 is designed as a linear combination of the kth-step difference values of ACE and CPS1 of the controlled regional grid and the power deviation ΔP_{error-i} of each unit.
8. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the linear constraints of the correlated-equilibrium strategy and the uCEQ equilibrium selection function suited to CPS instruction dynamic allocation optimization are introduced in step 6, so that the coordinated joint action among the agents reaches the optimum, the physical meaning of uCEQ being the maximization of the sum of the rewards of all agents.
9. The interconnected-grid CPS instruction dynamic allocation optimization method of claim 1, characterized in that the iterative update formula of the eligibility-trace matrix $e_j(s,\vec a)$ in step 7 is:
$$e_j(s,\vec a) = \gamma\times\lambda\times e_j(s,\vec a),$$
where $e_j(s,\vec a)$ is the eligibility-trace matrix, γ is the discount factor, with 0 ≤ γ ≤ 1, and λ is the trace-decay factor, with 0 ≤ λ ≤ 1.
CN201310656811.2A 2013-12-05 2013-12-05 Interconnected-grid CPS instruction dynamic allocation optimization method Active CN103683337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310656811.2A CN103683337B (en) Interconnected-grid CPS instruction dynamic allocation optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310656811.2A CN103683337B (en) Interconnected-grid CPS instruction dynamic allocation optimization method

Publications (2)

Publication Number Publication Date
CN103683337A CN103683337A (en) 2014-03-26
CN103683337B true CN103683337B (en) 2016-01-06

Family

ID=50320008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310656811.2A Active CN103683337B (en) Interconnected-grid CPS instruction dynamic allocation optimization method

Country Status (1)

Country Link
CN (1) CN103683337B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104037761B (en) * 2014-06-25 2017-01-11 南方电网科学研究院有限责任公司 AGC power multi-objective random optimization distribution method
CN104932267B (en) * 2015-06-04 2017-10-03 曲阜师范大学 A kind of neural network lea rning control method of use eligibility trace
CN107154635B (en) * 2017-05-22 2019-11-05 国电南瑞科技股份有限公司 A kind of AGC frequency regulation capacity calculation method suitable for frequency modulation service market
CN107367929B (en) * 2017-07-19 2021-05-04 北京上格云技术有限公司 Method for updating Q value matrix, storage medium and terminal equipment
CN107605548B (en) * 2017-08-18 2019-08-16 华电电力科学研究院 A kind of control method improving 300MW coal unit ACE response performance
CN107589672A (en) * 2017-09-27 2018-01-16 三峡大学 The intelligent power generation control method of isolated island intelligent power distribution virtual wolf pack control strategy off the net
CN108307510A (en) * 2018-02-28 2018-07-20 北京科技大学 A kind of power distribution method in isomery subzone network
CN108512220A (en) * 2018-03-22 2018-09-07 华南理工大学 A kind of network of ship dynamic reconfiguration method based on artificial intelligence
CN108845492A (en) * 2018-05-23 2018-11-20 上海电力学院 A kind of AGC system Intelligent predictive control method based on CPS evaluation criterion
CN110348681B (en) * 2019-06-04 2022-02-18 国网浙江省电力有限公司衢州供电公司 Power CPS dynamic load distribution method
CN110471297B (en) * 2019-07-30 2020-08-11 清华大学 Multi-agent cooperative control method, system and equipment
CN112186811B (en) * 2020-09-16 2022-03-25 北京交通大学 AGC unit dynamic optimization method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8543714B2 (en) * 2010-03-05 2013-09-24 Delta Electronics, Inc. Local power management unit and power management system employing the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Stochastic optimal control of interconnected power grid AGC based on reinforcement learning algorithms; Yuan Ye; China Master's Theses Full-text Database; 2011-12-31; pp. 1-65 *

Also Published As

Publication number Publication date
CN103683337A (en) 2014-03-26

Similar Documents

Publication Publication Date Title
CN103683337B (en) Interconnected-grid CPS instruction dynamic allocation optimization method
Hossain et al. Energy scheduling of community microgrid with battery cost using particle swarm optimisation
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN106682810B (en) Long-term operation method of cross-basin cascade hydropower station group under dynamic production of giant hydropower station
Russell et al. Reservoir operating rules with fuzzy programming
CN102270309B (en) Short-term electric load prediction method based on ensemble learning
CN111242443B (en) Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
CN103490413A Intelligent generation control method based on an agent equilibrium algorithm
CN105337310B Economic operation system and method for cascaded photovoltaic-storage multi-microgrids
CN103699941A (en) Method for making annual dispatching operation plan for power system
CN104537428B Economic operation assessment method accounting for the uncertainty of wind power integration
CN116247648A (en) Deep reinforcement learning method for micro-grid energy scheduling under consideration of source load uncertainty
Forouzandehmehr et al. Stochastic dynamic game between hydropower plant and thermal power plant in smart grid networks
CN105654224A (en) Provincial power-grid monthly electricity purchasing risk management method considering wind power uncertainty
CN106505624A Regulation and control system and method for determining the optimal dispatchable capability of distributed generation in a distribution network
CN115207977A (en) Active power distribution network deep reinforcement learning real-time scheduling method and system
CN107026462B (en) Energy storage device control strategy formulating method for the tracking of wind-powered electricity generation unscheduled power
Jin et al. A deep neural network coordination model for electric heating and cooling loads based on IoT data
Zhang et al. Real-time optimal operation of integrated electricity and heat system considering reserve provision of large-scale heat pumps
CN103633641B Medium- and long-term transaction operation plan acquisition method considering wind power accommodation
Jiang et al. Research on short-term optimal scheduling of hydro-wind-solar multi-energy power system based on deep reinforcement learning
Li et al. Generation scheduling in a system with wind power
CN116995682B (en) Adjustable load participation active power flow continuous adjustment method and system
Ding et al. Long-term operation rules of a hydro–wind–photovoltaic hybrid system considering forecast information
CN111799793B (en) Source-grid-load cooperative power transmission network planning method and system

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant