CN106026084A - AGC power dynamic distribution method based on virtual generation tribe - Google Patents
- Publication number
- CN106026084A (application CN201610479264.9A)
- Authority
- CN
- China
- Prior art keywords
- clan
- power
- delta
- virtual
- agc
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
Abstract
The invention discloses an AGC dynamic power allocation method based on virtual generation tribes. The method first establishes an automatic generation control (AGC) power allocation framework based on virtual generation tribes (VGT) and determines a discrete state set and a discrete action set. The real-time total generation power command of the regional grid for the current control period is collected, and transfer learning is used to initialize the value-function matrix and the selected action of each virtual generation tribe. An action is then selected according to the leader's strategy and each tribe's power is calculated; the instantaneous reward of each virtual generation tribe is obtained; the consistency weight, self-learning weight, and value-function matrix of each tribe in the current state are updated; and the algorithm is checked for convergence. Under the VGT-based AGC power allocation framework, and by combining QD-learning with transfer learning, the method satisfies the time requirements of AGC control and is well suited to the dynamic AGC power allocation problem of large-scale complex power grids with strong randomness and uncertainty.
Description
Technical field
The present invention relates to the technical field of automatic generation control (AGC) for power systems, and in particular to a virtual generation tribe AGC dynamic power allocation method based on consensus transfer Q-learning. The method is applicable to the stochastic optimization of dynamic power allocation in decentralized AGC control of interconnected power grids.
Background technology
AGC (Automatic Generation Control), one of the key functions of an EMS (Energy Management System), keeps the frequency of an interconnected grid and its tie-line exchange power at their rated values. AGC broadly comprises two processes: 1) tracking of the total power command, for which actual dispatch centers mainly use PI controllers, although intelligent methods such as fuzzy control and reinforcement learning have also been proposed; 2) allocation of the total power command, which in real systems is usually performed according to engineering experience or in fixed proportions of adjustable capacity. To reduce regulation cost and improve CPS (Control Performance Standard) scores, scholars such as Yu Tao have applied a series of reinforcement learning algorithms, including Q-learning, multi-step backtracking Q(λ), and improved hierarchical Q-learning, to optimize dynamic power allocation. All of these allocation algorithms, however, are centralized: as the number of AGC units grows, their optimization quality degrades and their convergence time increases, making it difficult to meet AGC's time-scale requirement of 4 to 16 seconds. Moreover, a centralized optimizer must gather the operating data of every AGC unit, which easily causes communication congestion.
To address this, the applicant, in "Virtual generation tribe based consensus algorithm for dynamic AGC power allocation in interconnected power grids" (Proceedings of the CSEE), proposed a dynamic power allocation consensus algorithm based on VGT (Virtual Generation Tribe), taking regulation cost and ramp time respectively as the consistency state variables, which effectively solves the decentralized-autonomy problem of AGC power allocation. Its consensus algorithm, however, is a simple first-order one that uses a two-layer power allocation, so it depends strongly on the optimization model and is prone to falling into local optima. To improve the adaptability of consensus algorithms in dynamic stochastic environments, Prof. Moura, in "QD-Learning: a collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations" (IEEE Transactions on Signal Processing), organically fused a consensus algorithm with classical Q-learning and proposed QD-learning, a brand-new distributed multi-agent reinforcement learning algorithm. But QD-learning, like conventional machine learning algorithms, does not reuse past learning experience and results when learning a new task; each new optimization task must be explored and learned from scratch, which takes a great deal of time, so it cannot be applied to AGC power allocation on ultra-short control time scales.

In recent years, in order to make full use of the effective behaviors and knowledge of historical optimization tasks and thereby improve the learning efficiency of reinforcement learning algorithms, many scholars have studied transfer learning in depth. Fachantidis A., Partalas I., Tsoumakas G., et al., in "Transferring task models in reinforcement learning agents" (Neurocomputing), point out that transfer learning aims to use the legacy information of past optimization tasks to handle new ones, and that the quality of the transfer depends entirely on the similarity between the new task and the historical tasks; in practice, distinct optimization tasks often turn out to be interrelated.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by proposing a virtual generation tribe AGC dynamic power allocation method based on consensus transfer Q-learning (CTQ). Under the virtual generation tribe control framework, after each tribe exchanges its value-function matrix with its neighboring tribes and performs the interactive consistency calculation, the tribe leaders can coordinate the generation power of each tribe in a self-organizing way, achieving the effect of "decentralized autonomy with centralized coordination"; historical optimization information is used effectively for fast dynamic power allocation, satisfying the control time-scale requirement of AGC.

The virtual generation tribe AGC dynamic power allocation method based on consensus transfer Q-learning of the present invention comprises two processes: 1) allocating the total power command of the regional grid to each virtual generation tribe VGT; 2) allocating the total power command of each VGT to its group of generating units. The CTQ algorithm is applied to the first process. Because the second process involves a large number of units, and to improve optimization speed, it still uses the simple unit ramp-time consistency algorithm of the applicant's "Virtual generation tribe based consensus algorithm for dynamic AGC power allocation in interconnected power grids" (Proceedings of the CSEE). Based on behavior transfer in transfer learning, the present invention proposes a CTQ algorithm with a linear similarity factor between power allocation optimization tasks and applies it organically to dynamic AGC power allocation among virtual generation tribes. It effectively solves the distributed optimization problem of dynamic AGC power allocation in complex large-scale power grids, reducing the regulation cost of AGC units while improving the control performance standard of the regional grid.
The purpose of the present invention is achieved through the following technical solutions:
The virtual generation tribe AGC dynamic power allocation method based on CTQ comprises the following steps:
(1) Build the AGC power allocation framework based on virtual generation tribes (VGT) and determine the leader among the VGTs;
(2) Determine the state discrete set S and the action discrete set A_Q;
(3) Collect the real-time total generation power command ΔP_Σ of the regional grid for the current control period as the current state, and use transfer learning to initialize the value-function matrix and selected action of each virtual generation tribe;
(4) Select an action according to the strategy issued by the leader and calculate the regulation power ΔP_i of the i-th virtual generation tribe VGT_i;
(5) Perform the reward mapping ΔP_i → R_i to obtain the instantaneous reward R_i of each virtual generation tribe, where ΔP_i is the regulation power and R_i the reward function of the i-th virtual generation tribe VGT_i;
(6) Update the consistency weight and self-learning weight of each virtual generation tribe in the current state;
(7) Update the value-function matrix Q_i of each virtual generation tribe according to the instantaneous reward of each tribe in the current control period;
(8) Check whether the two-norm of the difference between the leader's value-function matrix at iteration k+1 and at iteration k is smaller than an infinitesimal positive number ε, i.e. ||Q_i^{k+1} − Q_i^k||_2 < ε; if not, return to step (5); if so, continue to the next step;
(9) Solve for the selected action in the current state and thereby obtain the regulation power ΔP_i of the i-th virtual generation tribe VGT_i;
(10) From ΔP_i, obtain the regulation power ΔP_iw of the w-th unit of VGT_i through the ramp-time consistency algorithm, and when the next control period arrives, return to step (3).
In the VGT-based AGC power allocation framework of step (1), the traditional regional grid is divided by geographical distribution into several district grids, each of which is a virtual generation tribe VGT. In effect, a new generation dispatch and control layer is inserted between AGC and plant-level power control (Plant Controller, PLC); each tribe is the group of generating resources of a district grid, consisting of large-plant PLCs, active distribution network AGC, microgrid AGC, and load control systems. The VGTs communicate and cooperate in a leader-follower pattern: the leader is the control center of the regional grid and is responsible for balancing power disturbances; the followers are the dispatch terminals of the ordinary VGTs, whose main responsibility is collaborative interaction with the leader.
The state discrete set S of step (2) is composed of the total generation power commands ΔP_Σ, each command constituting one state.
The action discrete set A_Q of step (2) consists of several action combinations. With y action strategies and n virtual generation tribes, A_Q is a y × n matrix, which can be expressed as

A_Q = [a_1 a_2 … a_y] = [(λ_11, λ_12, …, λ_1n), (λ_21, λ_22, …, λ_2n), …, (λ_y1, λ_y2, …, λ_yn)]

where a_y is the y-th action of A_Q and λ_yn is the allocation factor assigned to virtual generation tribe n when the y-th action strategy is taken.
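As a concrete illustration, the action matrix A_Q can be enumerated programmatically. The sketch below assumes allocation factors on a fixed grid of precision `step` that must sum to 1; the function name and the precision default are illustrative, not part of the invention.

```python
from itertools import product

def build_action_set(n_tribes, step=0.1):
    """Enumerate allocation-factor vectors (lambda_1, ..., lambda_n)
    that sum to 1. Each vector is one action, i.e. one row of the
    y-by-n action matrix A_Q described in the text."""
    levels = round(1.0 / step)  # factors are integer multiples of `step`
    actions = []
    for combo in product(range(levels + 1), repeat=n_tribes):
        if sum(combo) == levels:  # factors must sum to exactly 1
            actions.append(tuple(c * step for c in combo))
    return actions

A_Q = build_action_set(3)  # small example with 3 virtual tribes
```

With a precision of 0.1 the enumeration is simply all compositions of 10 into n nonnegative parts, which is why the action count grows quickly with the number of tribes.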
The initially selected action in step (3) is an arbitrary action in the action discrete set A_Q.
Before transfer learning can initialize the value-function matrix of each virtual generation tribe for the current optimization task in step (3), value-function source matrices must be established. The method of the invention takes the closeness of the total generation power commands ΔP_Σ as the correlation index between different optimization tasks. First, the range of the total generation power command ΔP_Σ is divided into intervals

[ΔP_Σ,η^left, ΔP_Σ,η^right), η = 1, 2, …, z

where ΔP_Σ,η^left and ΔP_Σ,η^right denote the left and right endpoints of the η-th interval of the total generation power command ΔP_Σ, and z is the number of intervals.

The total generation power command ΔP_Σ of the current control period is collected; supposing it falls in the η-th load-disturbance interval, the correlations between the current optimization task and the optimization tasks of the two endpoints are

r_left = (ΔP_Σ,η^right − ΔP_Σ) / (ΔP_Σ,η^right − ΔP_Σ,η^left), r_right = 1 − r_left

where r_left and r_right denote the correlation of the current optimization task with the optimization task of the left and right interval endpoint respectively, with 0 ≤ r_left ≤ 1, 0 ≤ r_right ≤ 1, and r_left + r_right = 1.

The initial value-function matrix of the optimization task in the current state is therefore

Q_i = r_left · Q_i,left + r_right · Q_i,right, i = 1, 2, …, n

where Q_i,left and Q_i,right denote the optimal value-function matrices of virtual generation tribe i at the left and right interval endpoints, which can be obtained by learning in advance (see Fig. 2).
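The initialization step can be sketched as follows. The linear similarity factors are an assumed form consistent with the stated constraint r_left + r_right = 1; the endpoint matrices stand for the pre-learned source matrices, and all names are illustrative.

```python
import numpy as np

def init_value_matrix(dp_total, interval, Q_left, Q_right):
    """Transfer-learning initialization of a tribe's value matrix.

    `interval` is the (left, right) endpoint pair of the load-disturbance
    interval containing the total power command `dp_total`; `Q_left` and
    `Q_right` are the pre-learned optimal value matrices at the two
    endpoints. Commands closer to an endpoint weight that endpoint's
    source matrix more heavily."""
    left, right = interval
    r_left = (right - dp_total) / (right - left)
    r_right = 1.0 - r_left
    return r_left * Q_left + r_right * Q_right

# usage: a command of 1100 MW in an assumed interval [1000, 1250) MW
Q0 = init_value_matrix(1100.0, (1000.0, 1250.0),
                       Q_left=np.zeros((1, 4)), Q_right=np.ones((1, 4)))
```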
The action-selection strategy issued by the leader in step (4) is the greedy strategy π*:

π*(s_k) = argmax_{a′∈A_Q} Q_leader(s_k, a′)

where Q_leader is the leader's state-action value-function matrix, s_k is the state at the k-th iteration, and a′ is any action in the action space A_Q.

The regulation power ΔP_i of the i-th virtual generation tribe VGT_i in step (4) is then calculated as

ΔP_i = λ_yi · ΔP_Σ

where λ_yi is the tribe allocation factor assigned to virtual generation tribe i when the y-th action strategy is taken, subject to the constraint Σ_{i=1}^n λ_yi = 1.
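A minimal sketch of the leader's greedy selection and the tribe power calculation, with a toy two-tribe action set; all names are illustrative assumptions.

```python
import numpy as np

def leader_greedy_action(Q_leader, state_idx):
    """Greedy strategy pi*: pick the action maximizing the leader's
    state-action value in the current state (one row of Q_leader)."""
    return int(np.argmax(Q_leader[state_idx]))

def tribe_powers(action, A_Q, dp_total):
    """Delta P_i = lambda_yi * Delta P_Sigma for every tribe i."""
    factors = A_Q[action]
    return [lam * dp_total for lam in factors]

A_Q = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]   # toy 2-tribe action set
Q_leader = np.array([[0.1, 0.9, 0.3]])        # one state, three actions
a = leader_greedy_action(Q_leader, 0)         # best action in state 0
powers = tribe_powers(a, A_Q, 200.0)          # split a 200 MW command
```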
The instantaneous reward R_i(s_k, a_k, s_{k+1}) of step (5) is designed as a weighted combination of a regulation-cost objective and a ramp-time objective with weight coefficients μ_1 and μ_2, μ_1 ≥ 0, μ_2 ≥ 0. The regulation-cost objective is to minimize the sum of the adjustment costs of all AGC units in the whole control-area grid; the ramp-time objective is to minimize the largest ramp time among all units. Here C_iw is the adjustment cost coefficient of the w-th unit of tribe i; ΔP_iw is the generation power command of the w-th unit of tribe i; ΔP_iw^rate denotes the ramp rate of the w-th unit of tribe i; n is the total number of tribes; m_i is the total number of units of tribe i; and Ω_i is the set of tribes adjacent to tribe i.

Because computing the reward value requires first determining the regulation power of each unit inside a tribe, and because under the ramp-time consistency algorithm the unit regulation powers are uniquely determined by the tribe powers, a least-squares fit of the mapping from the regulation power ΔP_i of tribe i to R_i can be used to speed up the reward computation.
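The least-squares fitting of the ΔP_i → R_i mapping might look like the following sketch; the quadratic degree and the toy samples are assumptions for illustration only, since the exact reward shape depends on the unit parameters.

```python
import numpy as np

def fit_reward_map(dp_samples, r_samples, degree=2):
    """Least-squares polynomial fit of the tribe-power -> reward mapping,
    used so that the inner unit-level allocation need not be rerun every
    time a reward value is needed during learning."""
    coeffs = np.polyfit(dp_samples, r_samples, degree)
    return np.poly1d(coeffs)

# toy samples: rewards fall off quadratically with regulation power
dp = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
r = -0.001 * dp**2
reward_of = fit_reward_map(dp, r)   # callable reward approximation
```

In practice the sample pairs would come from running the ramp-time consistency algorithm offline at several tribe-power levels and recording the resulting cost and ramp-time terms.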
The consistency weight β_k(s_k, a_k) and self-learning weight α_k(s_k, a_k) of each virtual generation tribe in step (6) are updated as

α_k(s_k, a_k) = o_1 / [N_O(s_k, a_k)]^{τ_1}, β_k(s_k, a_k) = o_2 / [N_O(s_k, a_k)]^{τ_2}

where N_O(s_k, a_k) is the number of times the state-action pair (s_k, a_k) has occurred during the algorithm's exploration; o_1, o_2, τ_1, τ_2 are positive constants satisfying τ_1 ∈ (1/2, 1) and 0 < τ_2 < τ_1 − 1/(2 + ε_1), where ε_1 is an infinitesimal positive number.
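The weight update can be sketched as below. The source does not reproduce the exact decay law at this point, so the power-law decay in the visit count N_O(s_k, a_k) is an assumption modelled on the QD-learning weight sequences the text cites; the constants are the values used in the embodiment.

```python
def update_weights(visit_count, o1=0.2, o2=0.8, tau1=0.55, tau2=0.005):
    """Self-learning weight alpha and consistency weight beta as decaying
    power laws of the visit count NO(s_k, a_k). The slower decay of beta
    (tau2 << tau1) keeps neighbor agreement influential for longer."""
    alpha = o1 / (visit_count ** tau1)   # self-learning weight
    beta = o2 / (visit_count ** tau2)    # consistency weight
    return alpha, beta

alpha1, beta1 = update_weights(1)     # first visit of a pair
alpha100, beta100 = update_weights(100)  # weights shrink on revisits
```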
The value-function matrix Q_i of each virtual generation tribe in step (7) is updated in the current control period as

Q_i^{k+1}(s_k, a_k) = Q_i^k(s_k, a_k) − C_Q(s_k, a_k) + I_Q(s_k, a_k)

where C_Q(s_k, a_k) is the consensus update term between virtual generation tribe i and its adjacent tribes when action a_k is executed in state s_k, which is also the feature that most distinguishes CTQ from single-tribe Q-learning, and I_Q(s_k, a_k) is the self-learning update term, whose mechanism is the same as in single-tribe Q-learning. The two terms are iterated as

C_Q(s_k, a_k) = β_k(s_k, a_k) Σ_{l∈Ω_i(k)} [Q_i^k(s_k, a_k) − Q_l^k(s_k, a_k)]
I_Q(s_k, a_k) = α_k(s_k, a_k) [R_i + γ max_{a′} Q_i^k(s_{k+1}, a′) − Q_i^k(s_k, a_k)]

where s_k and s_{k+1} denote the states of the k-th and (k+1)-th iterations; Q_i^k(s_k, a_k) is the Q-value of virtual generation tribe i for executing action a_k in state s_k; γ is the discount factor; and Ω_i(k) is the set of virtual generation tribes adjacent to tribe i at the k-th iteration.
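A sketch of one CTQ value update, written in the consensus-plus-innovations shape of the QD-learning algorithm the text cites; the discount factor `gamma` and all names are assumptions for illustration.

```python
import numpy as np

def qd_update(Q_i, neighbor_Qs, s, a, s_next, reward, alpha, beta, gamma=0.9):
    """One value update for tribe i: a consensus term C_Q pulling Q_i
    toward its neighbors' values, minus-signed as in the text, plus a
    self-learning (innovation) term I_Q identical in form to the
    classical Q-learning update."""
    C_Q = beta * sum(Q_i[s, a] - Qj[s, a] for Qj in neighbor_Qs)
    I_Q = alpha * (reward + gamma * np.max(Q_i[s_next]) - Q_i[s, a])
    Q_new = Q_i.copy()
    Q_new[s, a] = Q_i[s, a] - C_Q + I_Q
    return Q_new
```

When all tribes agree (C_Q = 0) the update degenerates to ordinary Q-learning, which matches the remark that I_Q follows the single-tribe update mechanism.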
The ramp-time consistency algorithm of step (10) comprises the following steps (for details see the applicant's "Virtual generation tribe based consensus algorithm for dynamic AGC power allocation in interconnected power grids", Proceedings of the CSEE):

① Initialize the ramp time t_iw of the w-th unit of the i-th virtual generation tribe VGT_i and the power deviation ΔP_error-i of VGT_i.

② Collect the real-time operating data of the regional grid for the current control period, including the total generation power command ΔP_Σ, the regulation power ΔP_i of VGT_i, and the real-time active power output of each unit; calculate the power deviation ΔP_error-i of VGT_i; and determine the ramp direction from the regulation power ΔP_i of VGT_i, choosing for each unit either its upward ramp-rate limit (the rising regulation-rate limit of the w-th AGC unit of VGT_i) or its downward ramp-rate limit (the falling regulation-rate limit of the w-th AGC unit of VGT_i).

③ Perform the consistency calculation on the unit ramp times t_iw of VGT_i for the current control period together with the power deviation ΔP_error-i: the ramp times of the AGC units of VGT_i are driven to consensus while, to guarantee the power balance of the VGT, the ramp time of the tribe head is corrected with the power deviation, where μ_i > 0 is the power-error regulation factor of tribe i.

④ Calculate the unit power ΔP_iw from the consensus ramp time and the unit's ramp rate.

⑤ Check whether any unit power ΔP_iw exceeds its limits. If ΔP_iw is out of limits, recalculate the unit power ΔP_iw and the unit ramp time t_iw, clamping ΔP_iw between the minimum and maximum regulating spare capacities of the w-th unit of VGT_i, set all connection weights of unit w to zero (a_wj = 0, j = 1, 2, …, m_i), and update the corresponding row-stochastic matrix elements; otherwise continue to the next step.

⑥ Update the power deviation ΔP_error-i of VGT_i.

⑦ Check the accuracy: if |ΔP_error-i| < ε_i, where ε_i is an infinitesimal positive number, the unit powers ΔP_iw are obtained; otherwise return to step ③.
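At the consensus fixed point every unit of a tribe shares the same ramp time, so a closed-form sketch of the resulting allocation is possible: the common ramp time is the tribe power divided by the sum of ramp rates, and each unit receives rate times time. The limit handling below is a simplified stand-in for the out-of-limit step ⑤; all names are illustrative.

```python
def allocate_by_ramp_time(dp_tribe, ramp_rates, p_max=None):
    """Fixed point of the ramp-time consensus: every unit in the tribe
    ends with the same ramp time t, so unit w receives rate_w * t and
    the shared ramp time is dp_tribe / sum(rates)."""
    t = dp_tribe / sum(ramp_rates)          # common (consensus) ramp time
    powers = [r * t for r in ramp_rates]
    if p_max is not None:                   # simplified out-of-limit step:
        for w, cap in enumerate(p_max):     # clamp capped units and let the
            if powers[w] > cap:             # head (unit 0) absorb the
                powers[0] += powers[w] - cap  # resulting imbalance
                powers[w] = cap
    return powers

powers = allocate_by_ramp_time(90.0, [1.0, 2.0, 3.0])  # faster units get more
```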
Compared with the prior art, the present invention has the following advantages and effects:
(1) The CTQ method of the invention is a multi-tribe reinforcement learning algorithm with the "decentralized autonomy, centralized coordination" property; it integrates well into the virtual generation tribe control framework and provides a new line of thought for decentralized coordinated AGC control of the future energy Internet, in which information networks and energy networks are deeply integrated.
(2) After introducing transfer learning based on task correlations, the CTQ method can rapidly perform optimization while guaranteeing the stability of global convergence; it satisfies the control time-scale requirement of AGC, improves CPS control performance, and reduces the regulation cost of AGC.
(3) As a multi-tribe reinforcement learning algorithm with both self-learning and cooperative-learning abilities, the CTQ method is well suited to the dynamic AGC power allocation problem of large-scale complex power grids with strong randomness and uncertainty.
Brief description of the drawings
Fig. 1 is the flow chart of the VGT-based AGC dynamic power allocation method of the present invention.
Fig. 2 is the VGT communication network topology of the Guangdong Power Grid in this embodiment.
Detailed description of the invention
For a better understanding of the present invention, the invention is further described below with reference to the embodiment and the accompanying drawings, but embodiments of the invention are not limited thereto.
Embodiment
This embodiment is based on a load-frequency control model of the Guangdong Power Grid, in which the 93 units participating in AGC frequency regulation are divided into 6 VGTs; the VGT communication network topology is shown in Fig. 2, and the relevant parameters of the 93 units are listed in Table 1. In the example, the connection weight a_ij of every pair of tribes with an information link is set to 1. In this embodiment, the virtual generation tribe AGC dynamic power allocation method based on consensus transfer Q-learning comprises the following steps (Fig. 1):

(1) Based on the Guangdong Power Grid load-frequency control model and the geographical distribution of Guangdong, taking the high-voltage tie-lines inside the regional grid as the district-grid boundaries, the 93 units participating in AGC frequency regulation are divided into 6 VGTs, building the VGT-based AGC power allocation framework. VGT_1 to VGT_6 represent 6 district grids, including northern Guangdong, western Guangdong, the Pearl River Delta, and eastern Guangdong, with VGT_4, the generation and load core, serving as the leader.

(2) Determine the state discrete set S and the action discrete set A_Q.

The state discrete set S in this embodiment is composed of the total generation power commands ΔP_Σ, each command constituting one state, in MW.

The action discrete set A_Q determined in this embodiment is

A_Q = [(0, 0, 0, 0, 0, 1), (0.1, 0, 0, 0, 0, 0.9), …, (1, 0, 0, 0, 0, 0)]

with 2568 discrete actions in total. (The action discrete set simply enumerates all allocation possibilities; because of the computational burden, the precision of the allocation factors is taken as 0.1.)
(3) Collect the real-time total generation power command ΔP_Σ issued by the AGC total-power PI controller of the regional grid for the current control period as the current state, initialize the selected action, and, according to the current optimization task, use transfer learning to initialize the value-function matrix of each virtual generation tribe.

In this embodiment the initially selected action is an arbitrary action in the action discrete set A_Q.

Before transfer learning can initialize the value-function matrices according to the current optimization task, sufficient value-function source matrices must be established. The method of the invention takes the closeness of the total power command magnitudes as the correlation index between different optimization tasks.

In this embodiment the total power command range is divided into the intervals {[-1500, -1250), [-1250, -1000), [-1000, -750), …, (750, 1000], (1000, 1250], (1250, 1500]} MW, 12 transfer intervals in total, i.e. 12 optimization source tasks. Here 1500 MW corresponds to the load disturbance under the largest single contingency of the Guangdong Power Grid (blocking of one pole of a DC link).

The total generation power command ΔP_Σ of the current control period is collected; supposing it falls in the η-th load-disturbance interval, the correlations of the current optimization task with the optimization tasks of the left and right interval endpoints are r_left and r_right respectively, with 0 ≤ r_left ≤ 1, 0 ≤ r_right ≤ 1, and r_left + r_right = 1. In this embodiment the control period is 8 s.

The initial value-function matrix of the optimization task in the current state is therefore

Q_i = r_left · Q_i,left + r_right · Q_i,right, i = 1, 2, …, n

where Q_i,left and Q_i,right denote the optimal value-function matrices of virtual generation tribe i at the left and right interval endpoints, which can be obtained by learning in advance (see Fig. 2).
(4) Select an action according to the strategy issued by the leader and calculate the tribe regulation power ΔP_i.

In this embodiment, the action-selection strategy issued by the leader is the greedy strategy π*:

π*(s_k) = argmax_{a′∈A_Q} Q_leader(s_k, a′)

where Q_leader is the leader's state-action value-function matrix, s_k is the state at the k-th iteration, and a′ is any action in the action space A_Q.

In this embodiment, the tribe regulation power ΔP_i is obtained from the action selected under the leader's greedy strategy π* and calculated as

ΔP_i = λ_yi · ΔP_Σ

where λ_yi is the allocation factor assigned to virtual generation tribe i when the y-th action strategy is taken, subject to the constraint Σ_{i=1}^n λ_yi = 1.

(The reinforcement learning algorithm simply enumerates all allocation possibilities as the action discrete set; each action is in fact a set of proportional factors.)
(5) Perform the reward mapping ΔP_i → R_i to obtain the instantaneous reward R_i(s_k, a_k, s_{k+1}) of each virtual generation tribe.

In this embodiment, the instantaneous reward R_i(s_k, a_k, s_{k+1}) is designed as a weighted combination of a regulation-cost objective and a ramp-time objective with weight coefficients μ_1 ≥ 0 and μ_2 ≥ 0. The regulation-cost objective is to minimize the sum of the adjustment costs of all AGC units in the whole control-area grid; the ramp-time objective is to minimize the largest ramp time among all units. In this embodiment the two objectives are given equal preference, so μ_1 and μ_2 are both set to 0.5. C_iw is the adjustment cost coefficient of the w-th unit of tribe i; ΔP_iw is the generation power command of the w-th unit of tribe i; ΔP_iw^rate denotes the ramp rate of the w-th unit of tribe i; n is the total number of tribes, 6 in this embodiment; m_i is the total number of units of tribe i; and Ω_i is the set of tribes adjacent to tribe i.

Because computing the reward value requires first determining the regulation power of each unit inside a tribe, and because under the ramp-time consistency algorithm the unit regulation powers are uniquely determined by the tribe powers, a least-squares fit of the mapping from the tribe power ΔP_i to R_i can be used to speed up the reward computation.
(6) Update the consistency weight and self-learning weight of each virtual generation tribe in the current state.

The consistency weight β_k(s_k, a_k) and self-learning weight α_k(s_k, a_k) of each virtual generation tribe are updated as decaying functions of N_O(s_k, a_k), the number of times the state-action pair (s_k, a_k) has occurred during the algorithm's exploration; o_1, o_2, τ_1, τ_2 are positive constants satisfying τ_1 ∈ (1/2, 1) and 0 < τ_2 < τ_1 − 1/(2 + ε_1), where ε_1 is an infinitesimal positive number. In this embodiment, after extensive simulation verification, o_1 is taken as 0.2, o_2 as 0.8, τ_1 as 0.55, and τ_2 as 0.005.
(7) Update the value-function matrix Q_i of each virtual generation tribe according to the instantaneous rewards of the current control period.

In this embodiment, the value-function matrix Q_i of each virtual generation tribe is updated in the current control period by a consensus term C_Q between virtual generation tribe i and its adjacent tribes, which is also the feature that most distinguishes CTQ from single-tribe Q-learning, plus a self-learning term I_Q whose update mechanism is the same as in single-tribe Q-learning. In the iterations of C_Q and I_Q, s_k and s_{k+1} denote the states of the k-th and (k+1)-th iterations; Q_i^k(s_k, a_k) is the Q-value of virtual generation tribe i for executing action a_k in state s_k; and Ω_i(k) is the set of virtual generation tribes adjacent to tribe i at the k-th iteration.
(8) Check whether the two-norm of the difference between the leader's value-function matrix at iteration k+1 and at iteration k is smaller than an infinitesimal positive number ε, i.e. ||Q_i^{k+1} − Q_i^k||_2 < ε. In this embodiment ε is taken as 0.001.
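The convergence test of this step is simple to state in code; `np.linalg.norm` with `ord=2` gives the two-norm used in the criterion, and `eps = 0.001` is the embodiment's value.

```python
import numpy as np

def converged(Q_new, Q_old, eps=1e-3):
    """Two-norm convergence test ||Q^{k+1} - Q^k||_2 < eps on the
    leader's value matrix."""
    return np.linalg.norm(Q_new - Q_old, 2) < eps
```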
(9) solve the selection action under current state, and then obtain clan regulation power Δ Pi。
(10) power Δ P is regulated according to clani, by time-to-climb consistency algorithm try to achieve power of the assembling unit Δ Piw, and
When the next control cycle arrives, return step (3).Time-to-climb the specifically comprising the following steps that of consistency algorithm
1. Initialize the unit ramp time tiw of the w-th unit of the i-th virtual generation tribe VGTi and the power deviation value ΔPerror-i of the i-th virtual generation tribe VGTi, wherein
2. Collect the real-time operating data of the regional power grid in the current control period, including the total generation power command ΔPΣ, the regulation power ΔPi of the i-th virtual generation tribe VGTi, and the real-time active power output of each unit; calculate the power deviation value ΔPerror-i of the i-th virtual generation tribe VGTi, and determine the regulation-rate direction according to the regulation power ΔPi of the i-th virtual generation tribe VGTi;
The regulation-rate direction is determined according to the following formula:
In the formula, the first limit denotes the ramp-up rate limit of the w-th AGC unit of the i-th virtual generation tribe VGTi, and the second denotes its ramp-down rate limit; the unit rate-limit values in the present embodiment are listed in Table 1.
3. Perform the consensus calculation using the unit ramp time tiw of the i-th virtual generation tribe VGTi and the power deviation value ΔPerror-i of the i-th virtual generation tribe VGTi in the current control period;
In the present embodiment, VGT1 is taken as the object of study; the analysis of the other VGTs is similar. As shown in Table 1, it is assumed that G1 is the leader of VGT1, and the other units are family heads and family members.
The consensus update of the ramp time of the w-th AGC unit power of virtual generation tribe VGT1 is as follows:
Meanwhile, to ensure the power balance of the VGT, the ramp time of the leader G1 should be updated as follows:
Wherein, μi denotes the power-error regulation factor of tribe i, μi > 0. In the present embodiment, μi = 0.001.
4. Calculate the unit power ΔPiw. The unit power ΔPiw can be calculated by the following formula:
5. Judge whether the unit power ΔPiw exceeds its limits. If the unit power ΔPiw is out of limits, recalculate the unit power ΔPiw and the unit ramp time tiw respectively, and update the row-stochastic matrix elements; otherwise, proceed to the next step.
If the unit power ΔPiw is out of limits, the unit power ΔPiw and the ramp time tiw can be recalculated as follows:
Wherein, the minimum and maximum reserve capacities of the power of the w-th unit of VGTi are denoted respectively. The unit values in the present embodiment are listed in Table 1.
If the unit power ΔPiw is out of limits, all connection weights associated with unit w are set to zero, i.e. awj = 0, j = 1, 2, …, mi, and the corresponding row-stochastic matrix elements are updated accordingly.
6. Update the power deviation value ΔPerror-i of VGTi. In the present embodiment, the power deviation value of VGT1 is updated according to its definition:
7. Judge the precision. If |ΔPerror-i| < εi, the unit power ΔPiw is obtained; otherwise, return to step 3. In the present embodiment, the maximum power deviation |ΔPerror-1| < 0.1 MW is taken as the convergence condition of VGT1.
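The allocation loop of steps 1 to 7 above can be sketched as follows. This is a minimal illustration, not the patent's exact formulas (which appear as images in the filing): it assumes a uniform row-stochastic weight matrix, treats unit power as ramp rate × ramp time, assumes a positive regulation direction, and omits the limit handling of step 5. All names and data are illustrative.

```python
def ramp_time_consensus(delta_p_tribe, rates, mu=0.001, eps=0.1, max_iter=10000):
    """Allocate tribe regulation power among units by consensus on ramp time.

    delta_p_tribe : tribe regulation power dP_i (MW), assumed > 0 here
    rates         : ramp-up rate limits of each unit (MW/s); unit 0 is the leader
    mu            : power-error regulation factor (mu_i > 0 in the patent)
    eps           : convergence threshold on the tribe power deviation (MW)
    Returns (unit powers dP_iw, ramp times t_iw).
    """
    m = len(rates)
    t = [0.0] * m                              # step 1: initialise ramp times
    a = [[1.0 / m] * m for _ in range(m)]      # uniform row-stochastic matrix
    p = [0.0] * m
    for _ in range(max_iter):
        # step 3: first-order consensus update of ramp times
        t = [sum(a[w][j] * t[j] for j in range(m)) for w in range(m)]
        # step 4: unit power = ramp rate x ramp time
        p = [rates[w] * t[w] for w in range(m)]
        # step 6: tribe power deviation
        err = delta_p_tribe - sum(p)
        # step 7: convergence check |dP_error| < eps
        if abs(err) < eps:
            break
        # leader (unit 0) absorbs the residual error to keep the tribe balanced
        t[0] += mu * err
    return p, t
```

Because the weight matrix is uniform, each consensus step equalizes all ramp times, so the returned unit powers are proportional to the units' ramp rates while summing to the tribe's regulation power.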
Table 1 Guangdong Power Grid AGC unit parameter statistical table
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by the above embodiment. Any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be deemed an equivalent substitute and shall be included within the protection scope of the present invention.
Claims (8)
1. An AGC power dynamic allocation method based on virtual generation tribes, characterized by comprising the following steps:
(1) constructing an automatic generation control power allocation framework based on virtual generation tribes (VGTs), and determining the leader of each tribe;
(2) determining the discrete state set S and the discrete action set AQ;
(3) collecting the real-time total generation power command ΔPΣ of the regional power grid in the current control period as the current state, and, according to the current optimization task, initializing the value-function matrix and the selected action of each virtual generation tribe by transfer learning;
(4) selecting an action according to the policy issued by the leader and calculating the tribe regulation power ΔPi;
(5) obtaining the immediate reward function value Ri of each virtual generation tribe through the mapping calculation between the tribe regulation power ΔPi and the immediate reward function value Ri;
(6) updating the consistency weight and the self-learning weight of each virtual generation tribe in the current state;
(7) updating the value-function matrix Qi of each virtual generation tribe in the current control period according to the immediate reward value of each virtual generation tribe in the current control period;
(8) judging whether the two-norm of the difference between the leader's value-function matrix Qi^(k+1) at the (k+1)-th iteration and the value-function matrix Qi^k at the k-th iteration in the current control period is less than an infinitesimal positive number ε, i.e. ||Qi^(k+1) − Qi^k||2 < ε; if not, returning to step (5); if so, proceeding to the next step;
(9) according to the tribe regulation power ΔPi, obtaining the unit power ΔPiw through the ramp-time consensus algorithm, and returning to step (3) when the next control period arrives.
2. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that, in the automatic generation control power allocation framework based on virtual generation tribes (VGTs) described in step (1), the traditional regional power grid is divided into several manor grids (VGTs) according to geographical distribution, and each manor grid VGT is composed of generating unit groups constituted by large power plant PLC, grid-connected AGC within the manor grid, microgrid AGC, and load control systems; the leader-follower pattern is adopted for communication and cooperation between the manor grid VGTs, wherein the leader is the control center of the regional power grid and is responsible for power disturbance balance, and a follower is the dispatching terminal of an ordinary VGT and is mainly responsible for interaction and cooperation with the leader.
3. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that the discrete state set S in step (2) is constituted by taking each total generation power command ΔPΣ as a state.
4. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that, in step (3), before the value-function matrix and the selected action of each virtual generation tribe are initialized by transfer learning according to the current optimization task, a value-function source matrix needs to be established; the closeness in magnitude of the total generation power command ΔPΣ is used as the relevance evaluation index between different optimization tasks; first, the magnitude of the total generation power command ΔPΣ is divided into interval ranges, the intervals being respectively:
Wherein, the left and right endpoints of the η-th total generation power command ΔPΣ interval are denoted respectively; z is the number of total generation power command ΔPΣ intervals;
The total generation power command ΔPΣ of the current control period is collected; assuming that it falls within the η-th load disturbance interval, the relevances between the current optimization task and the optimization tasks of the left and right endpoints are respectively:
Wherein, rleft and rright denote the relevances between the current optimization task and the optimization tasks corresponding to the left and right interval endpoints respectively, with 0 ≤ rleft ≤ 1, 0 ≤ rright ≤ 1, and rleft + rright = 1;
The initial value-function matrix of the optimization task in the current state is:
Qi = rleft·Qi,left + rright·Qi,right, i = 1, 2, …, n
In the formula, Qi,left and Qi,right denote the optimal value-function matrices of virtual generation tribe i corresponding to the left and right interval endpoints respectively, which can be obtained through pre-learning.
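The interval-blending initialization of claim 4 can be sketched as follows. The linear relevance measure below is an assumption, since the patent's formulas for rleft and rright appear as images in the filing; only the blending rule Qi = rleft·Qi,left + rright·Qi,right is stated in the text.

```python
def init_q_matrix(dp_sigma, left, right, q_left, q_right):
    """Blend pre-learned endpoint Q-matrices by closeness of dp_sigma
    to each endpoint of the interval [left, right] containing it.

    Assumed linear relevance: the closer endpoint receives the larger
    weight, and r_left + r_right = 1.
    """
    r_left = (right - dp_sigma) / (right - left)
    r_right = 1.0 - r_left
    # element-wise blend: Qi = r_left * Qi,left + r_right * Qi,right
    return [[r_left * ql + r_right * qr for ql, qr in zip(row_l, row_r)]
            for row_l, row_r in zip(q_left, q_right)]
```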
5. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that the initial selected action in step (3) is the action selected after prior pre-learning, performed before the CTQ algorithm carries out transfer learning.
6. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that the immediate reward function value Ri(sk, ak, sk+1) in step (5) is designed as:
Wherein, μ1 and μ2 are the weight coefficients of the regulation-cost objective and the ramp-time objective respectively, with μ1 ≥ 0 and μ2 ≥ 0; the regulation-cost objective refers to minimizing the sum of the regulation costs of all AGC units in the whole control-area grid, and the ramp-time objective refers to minimizing the maximum ramp time among all units; Ciw is the regulation-cost coefficient of the w-th unit in tribe i; ΔPiw denotes the generation power command of the w-th unit in tribe i; the regulation rate of the w-th unit in tribe i is also denoted; n is the total number of tribes; mi is the total number of units in tribe i; Ωi denotes the set of tribes adjacent to tribe i; the least-squares method is used to fit the mapping relationship between the tribe power ΔPi and Ri, so as to accelerate the calculation of the reward function value.
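The two objectives of claim 6 can be sketched as a single-tribe reward; the exact functional form is an assumption (the patent's formula is an image in the filing), here taken as a linear regulation cost plus the tribe's maximum unit ramp time, both entering the reward with a negative sign since both are to be minimized.

```python
def immediate_reward(dp, costs, rates, mu1=0.5, mu2=0.5):
    """Sketch of R_i for one tribe.

    dp    : generation power commands dP_iw of the tribe's units (MW)
    costs : regulation-cost coefficients C_iw of the units
    rates : regulation rates of the units (MW/s)
    mu1, mu2 : weight coefficients of the cost and ramp-time objectives
    """
    # regulation-cost objective: sum of unit regulation costs (assumed linear)
    cost = sum(c * abs(p) for c, p in zip(costs, dp))
    # ramp-time objective: the slowest unit's ramp time |dP| / rate
    ramp = max(abs(p) / r for p, r in zip(dp, rates))
    # both objectives are minimised, so the reward is their negative blend
    return -(mu1 * cost + mu2 * ramp)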
7. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that the consistency weight βk(sk, ak) and the self-learning weight αk(sk, ak) of each virtual generation tribe in step (6) are updated as follows:
Wherein, NO(sk, ak) denotes the number of times the state-action pair (sk, ak) has been visited during the algorithm's exploration and optimization; o1, o2, τ1, and τ2 are positive constants satisfying the constraints τ1 ∈ (1/2, 1) and 0 < τ2 < τ1 − 1/(2 + ε1), where ε1 is an infinitesimal positive number.
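The weight updates of claim 7 appear as images in the filing; a plausible sketch, following the usual consensus-plus-innovations schedules in which both weights decay with the visit count and the consensus weight decays more slowly (τ2 ≪ τ1), is given below. The pairing of o1 with α and o2 with β is an assumption.

```python
def learning_weights(n_visits, o1=0.2, o2=0.8, tau1=0.55, tau2=0.005):
    """Decaying self-learning weight alpha and consistency weight beta.

    n_visits : visit count N(s, a) of the current state-action pair
    Defaults are the embodiment's values o1=0.2, o2=0.8, tau1=0.55, tau2=0.005.
    """
    alpha = o1 / (n_visits + 1) ** tau1   # self-learning (innovation) weight
    beta = o2 / (n_visits + 1) ** tau2    # consistency (consensus) weight
    return alpha, beta
```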
8. The AGC power dynamic allocation method based on virtual generation tribes according to claim 1, characterized in that the value-function matrix Qi of each virtual generation tribe in the current control period in step (7) is updated as follows:
Wherein, CQ(sk, ak) denotes the consistency update term between virtual generation tribe i and its adjacent tribes when executing action ak in state sk; IQ(sk, ak) denotes the self-learning update term of virtual generation tribe i when executing action ak in state sk, which follows the single-tribe Q-learning update mechanism; CQ(sk, ak) and IQ(sk, ak) are iteratively updated as follows:
Wherein, sk and sk+1 denote the states at the k-th and (k+1)-th iterations, respectively; Qi^k(sk, ak) is the Q-value of virtual generation tribe i when executing action ak in state sk; Ωi(k) is the set of virtual generation tribes adjacent to tribe i at the k-th iteration.
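One CTQ value-function update in the sense of claim 8 can be sketched as follows: the consensus term CQ pulls tribe i's Q-value toward its neighbours' values, while the innovation term IQ is the ordinary Q-learning temporal-difference step. The exact formulas are images in the filing; the additive combination and the discount factor γ are assumptions.

```python
def ctq_update(q_all, i, neighbours, s, a, s_next, reward, alpha, beta, gamma=0.9):
    """Return the updated Q-value for tribe i at state-action (s, a).

    q_all      : per-tribe Q tables, indexed q_all[tribe][state][action]
    neighbours : indices of the tribes adjacent to tribe i (the set Omega_i)
    alpha, beta: self-learning and consistency weights for this (s, a)
    """
    q_i = q_all[i][s][a]
    # consensus term C_Q: weighted disagreement with adjacent tribes
    c_q = beta * sum(q_all[j][s][a] - q_i for j in neighbours)
    # self-learning term I_Q: standard temporal-difference (Q-learning) step
    i_q = alpha * (reward + gamma * max(q_all[i][s_next]) - q_i)
    return q_i + c_q + i_q
```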
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610479264.9A CN106026084B (en) | 2016-06-24 | 2016-06-24 | A kind of AGC power dynamic allocation methods based on virtual power generation clan |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106026084A true CN106026084A (en) | 2016-10-12 |
CN106026084B CN106026084B (en) | 2018-10-09 |
Family
ID=57083887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610479264.9A Active CN106026084B (en) | 2016-06-24 | 2016-06-24 | A kind of AGC power dynamic allocation methods based on virtual power generation clan |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106026084B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104037761A (en) * | 2014-06-25 | 2014-09-10 | 南方电网科学研究院有限责任公司 | AGC power multi-objective random optimization distribution method |
Non-Patent Citations (2)
Title |
---|
SOUMMYA KAR等: "QD-Learning:A Collaborative Distributed Strategy for Multi-agent Reinforcement Learning Through Consensus+Innovations", 《IEEE TRANSACTIONS ON SIGNAL PROCESSING》 * |
张孝顺等: "互联电网AGC功率动态分配的虚拟发电部落协同一致性算法", 《中国电机工程学报》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107045655A (en) * | 2016-12-07 | 2017-08-15 | 三峡大学 | Wolf pack clan strategy process based on the random consistent game of multiple agent and virtual generating clan |
CN106779248A (en) * | 2017-01-11 | 2017-05-31 | 南方电网科学研究院有限责任公司 | A kind of Economic Dispatch dispersion Q methods based on limit transfer learning |
CN107069835A (en) * | 2017-01-25 | 2017-08-18 | 国网冀北电力有限公司电力科学研究院 | New energy power station distribution method active in real time and distributor |
CN107069835B (en) * | 2017-01-25 | 2020-02-18 | 国网冀北电力有限公司电力科学研究院 | Real-time active distribution method and device for new energy power station |
CN107094321A (en) * | 2017-03-31 | 2017-08-25 | 南京邮电大学 | A kind of vehicle-carrying communication MAC layer channel access method learnt based on multiple agent Q |
CN107094321B (en) * | 2017-03-31 | 2020-04-28 | 南京邮电大学 | Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method |
CN107591847A (en) * | 2017-08-04 | 2018-01-16 | 西安五常电力技术有限公司 | A kind of method that mode using variable element adjusts Hydropower Unit AGC |
CN107591847B (en) * | 2017-08-04 | 2020-05-01 | 西安五常电力技术有限公司 | Method for adjusting Automatic Gain Control (AGC) of hydroelectric generating set by using variable parameter mode |
CN108320080A (en) * | 2018-01-05 | 2018-07-24 | 上海电力学院 | The real-time dynamic power allocation method in energy internet based on two layers of consistency algorithm |
CN108320080B (en) * | 2018-01-05 | 2021-09-07 | 上海电力学院 | Energy internet real-time dynamic power distribution method based on two-layer consistency algorithm |
CN114239439A (en) * | 2021-12-24 | 2022-03-25 | 浙江金乙昌科技股份有限公司 | Automatic filter design method based on tribal algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN106026084B (en) | 2018-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112615379B (en) | Power grid multi-section power control method based on distributed multi-agent reinforcement learning | |
CN106026084A (en) | AGC power dynamic distribution method based on virtual generation tribe | |
CN111242443B (en) | Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet | |
Xi et al. | A wolf pack hunting strategy based virtual tribes control for automatic generation control of smart grid | |
CN103683337B (en) | A kind of interconnected network CPS instruction dynamic assignment optimization method | |
CN104537428B (en) | One kind meter and the probabilistic economical operation appraisal procedure of wind power integration | |
CN105023056B (en) | The optimal carbon energy combined-flow acquisition methods of power grid based on colony intelligence intensified learning | |
CN106682808A (en) | Online rolling optimization scheduling model | |
CN105391090B (en) | A kind of intelligent grid multiple agent multiple target uniformity optimization method | |
CN105389633A (en) | Optimization planning method of substation considering distributed power supplies | |
CN108494022B (en) | Accurate scheduling method based on distributed power supply in micro-grid | |
CN104333047B (en) | Real-time rolling planning method applied to wind power integration of power system | |
CN109149648A (en) | A kind of adaptive width Dynamic Programming intelligent power generation control method | |
CN107092991A (en) | A kind of adaptive economic load dispatching distribution method of intelligent grid | |
CN104536304A (en) | Electric system load multi-agent control method based on Matlab and Netlogo | |
CN107612045A (en) | A kind of Power Plant generated energy intelligent dispensing system and method | |
CN116207739B (en) | Optimal scheduling method and device for power distribution network, computer equipment and storage medium | |
CN105912767B (en) | Multistage power grid distributed collaborative combined calculation method based on B/S framework | |
CN105787650A (en) | Simulation calculation method for Nash equilibrium point of electricity market including multiple load agents | |
CN110474353A (en) | Layer-stepping energy-storage system and its power grid frequency modulation control method for coordinating of participation | |
Liu et al. | Spatial assessment of China’s green governance efficiency in the period of high-quality development | |
CN107392350B (en) | Comprehensive optimization method for power distribution network extension planning containing distributed energy and charging stations | |
CN111259315B (en) | Decentralized scheduling method of multi-subject coordinated pricing mode | |
CN107357988A (en) | Distributed photovoltaic cluster dynamic modelling method based on IEC61850 | |
CN106651136A (en) | Day-ahead power generation plan compilation method of bilateral transaction and apparatus thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||