CN106372366A - Intelligent power generation control method based on hill-climbing algorithm - Google Patents


Info

Publication number
CN106372366A
Authority
CN
China
Prior art keywords
value
function
hill
ace
wolf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610866538.XA
Other languages
Chinese (zh)
Inventor
席磊
陈建峰
杨苹
许志荣
柳浪
李玉丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University CTGU filed Critical China Three Gorges University CTGU
Priority to CN201610866538.XA priority Critical patent/CN106372366A/en
Publication of CN106372366A publication Critical patent/CN106372366A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/36Circuit design at the analogue level
    • G06F30/367Design verification, e.g. using simulation, simulation program with integrated circuit emphasis [SPICE], direct methods or relaxation methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E60/00Enabling technologies; Technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention discloses an intelligent power generation control method based on a wolf hill-climbing (WoLF-PHC(λ)) algorithm. The method comprises the following steps: determining a state discrete set S; determining a joint-action discrete set A; at the beginning of each control period, collecting the real-time operating data of each power grid, such as the frequency deviation Δf and the power deviation ΔP, and calculating for each area the instantaneous value of the area control error ACEi(k) and of the control performance standard CPSi(k); determining the current state s and acquiring a short-term reward signal Ri(k) of the power grid of area i from the current state and a reward function; calculating and estimating the value-function errors ρk and δk; solving for the optimal target value function and policy; performing the corresponding updates for the power grids j of all areas; and returning to step 3. With the disclosed method, an optimal average strategy can be obtained during the control process, the closed-loop system performance is excellent, and the problem of coordinated automatic generation control in the complex interconnected power system environments introduced by new-energy grid connection can be solved; compared with conventional intelligent algorithms, the disclosed method has stronger learning ability and a faster convergence rate.

Description

An intelligent power generation control method based on the wolf hill-climbing algorithm
Technical field
The present invention relates to intelligent power generation control technology for power systems, and particularly to an intelligent power generation control method based on the wolf hill-climbing algorithm.
Background technology
The modern power grid has developed into an interconnected system of many control areas operating under electricity-market mechanisms. Automatic generation control (AGC) of the interconnected grid is one of the most basic functions of the energy management system and the basic means of ensuring active-power balance and frequency stability in the power system; its control effect directly affects grid power quality. In an interconnected power system, the tie-line power deviation and the frequency both vary with changes in user-side load. Controlling generator output to follow random load changes and thereby improve the grid's frequency quality is a current hot issue in the control research field. Automatic generation control is a closed-loop control system composed of dispatch and monitoring computers, communication channels, remote terminals, execution (dispatch) devices and generating-unit automation devices, and is one of the main components of power system dispatch automation.
At present, against the overall background of vigorous smart-grid development, developing intelligent power generation control with autonomous learning ability and plant-grid coordination ability has gradually become a major trend. In recent years, multi-agent reinforcement learning algorithms have become a major focus of the machine learning field, and the algorithmic framework based on classical Q-learning in particular has been continuously enriched and developed. In this research field, many application examples have demonstrated that, in multi-agent reinforcement learning, each agent can track the decisions of the other agents and dynamically coordinate its own actions. Accordingly, several distributed reinforcement learning methods based on game theory and realized with Q-learning have been proposed in succession, the better known being minimax-Q, Nash-Q and friend-or-foe Q (FF-Q). However, minimax-Q is restricted to zero-sum games, Nash-Q requires large storage, and each FF-Q agent must know whether every other agent is friend or foe, so that FF-Q possesses only individual rationality; these defects limit the application of these algorithms.
Subsequently, a distributed multi-agent learning algorithm based on correlated equilibrium, the DCEQ(λ) algorithm, was proposed to solve coordinated AGC control of interconnected grids, and achieved relatively satisfactory control performance. However, as the number of agents increases, the time DCEQ(λ) spends searching for the multi-agent equilibrium solution grows geometrically, which limits its wider application in larger-scale grid systems. In 2002, Bowling and Veloso developed the "win or learn fast" policy hill-climbing algorithm, in which each agent learns a mixed strategy and preserves only its own Q-value table. On the one hand, this avoids the exploration-exploitation dilemma that generally has to be solved in Q-learning; on the other hand, it can solve the asynchronous decision problem of multi-agent systems. On this basis, a distributed WoLF-PHC(λ) algorithm, i.e. the wolf hill-climbing algorithm, is proposed here. It merges the WoLF-PHC algorithm, eligibility traces and the SARSA algorithm, and is applied to solving the equilibrium in multi-agent intelligent power generation control. Two case studies, on a standard two-area load frequency control power system model and on a model of the China Southern Power Grid, verify the effectiveness of this algorithm. Because the WoLF learning rate adapts to changes in the environment, the wolf hill-climbing algorithm converges faster than other intelligent power generation control methods.
Under the wolf hill-climbing algorithm, each regional agent does not need to exchange information with the other agents, but perceives at every moment the state changes caused by their actions. The control system is a multi-agent system in which the wolf hill-climbing algorithm is embedded in every region; compared with the CEQ algorithm it resembles a single-agent algorithm such as Q-learning, with only one agent per algorithm instance, while the actions of the other agents influence the current state and the next state. This is the so-called joint action of agents, and each agent can change its learning rate at any time as the state changes, which is where wolf hill-climbing is superior to Q-learning. In fact, multi-agent learning algorithms such as the minimax-Q, Nash-Q, friend-or-foe Q and DCEQ cited above inherently belong to games between multiple agents and can be summarized as Nash-equilibrium games. Unlike static game scenarios, however, for a control process belonging to dynamic games, the speed of searching for the Nash equilibrium solution within each control interval does not necessarily satisfy the real-time control requirement. The proposed wolf hill-climbing method replaces the equilibrium-point solution of the multi-agent dynamic game by an average strategy; from the viewpoint of game theory it can therefore be regarded as an efficient, autonomous and independent game, reducing both the real-time information exchange with other agents and the difficulty of solving the joint control strategy. In general, the wolf hill-climbing algorithm can effectively solve stochastic games and the application problems in non-Markovian environments. Moreover, by introducing a suitable win-or-lose criterion for the stochastic dynamic game, together with the variable learning rate and the average strategy, the dynamic performance of wolf hill-climbing can be improved. Based on the standard two-area load frequency control power system model and the Southern Grid model, simulation studies of coordinated intelligent generation control were carried out for the multi-agent algorithm. The simulation results show that, compared with other intelligent algorithms, wolf hill-climbing obtains faster convergence and learning efficiency, and possesses high adaptability and robustness in multi-area, strongly stochastic, interconnected complex grid environments.
Content of the invention
The present invention provides an intelligent power generation control method based on the wolf hill-climbing algorithm, which can obtain an optimal average strategy during the control process; the closed-loop system performance is excellent, and the method can solve coordinated automatic generation control in the complex interconnected power system environments brought about by new-energy plant-grid connection. Compared with existing intelligent algorithms, it has higher learning ability and a fast convergence rate.
The technical solution adopted in the present invention is:
An intelligent power generation control method based on the wolf hill-climbing algorithm comprises the following steps:
Step 1: determine the state discrete set s;
Step 2: determine the joint-action discrete set a;
Step 3: when each control cycle starts, collect the real-time operating data of each grid, the data including the frequency deviation Δf and the power deviation Δp, and calculate for each area the instantaneous value of the area control error ace_i(k) and of the control performance standard cps_i(k);
Step 4: determine the current state s, then obtain from the current state s and the reward function a short-term reward signal r_i(k) of regional grid i;
Step 5: obtain the value function errors ρ_k and δ_k by calculation and estimation;
Step 6: obtain the optimal target value function and policy;
Step 7: for all regional grids j, update the q-function table for all state-action pairs (s, a) and the eligibility trace matrix e_j(s, a); update the mixed strategy u_k(s_k, a_k) under the current state s with the updated q values; then use the mixed strategy u_k(s_k, a_k) to update the value function q_{k+1}(s_k, a_k), the eligibility trace element e(s, a), the variable learning rate φ and the average mixed strategy table;
Step 8: return to step 3.
The state discrete set s of step 1 is determined by dividing the value range of the control performance standards cps1/cps2.
In step 2, the interval actions are determined according to an action-fuzzification rule.
The real-time operating data of step 3 are gathered by the computer and monitoring system.
In step 3, the instantaneous value of the area control error ace_i(k) of region i is calculated as follows:
ace = t_a − t_s − 10b(f_a − f_s),
where t_a is the actual tie-line power flow, t_s the scheduled tie-line power flow, b the frequency bias coefficient, f_a the actual system frequency and f_s the scheduled system frequency;
the instantaneous value of the control performance standard 1, cps1_i(k), of region i is calculated as follows:
cps1 = (2 − cf1) × 100%,
where cf1 = (1/n)·Σ ace_{ave-1min}·Δf_{ave}/(−10·b_i·ε_1²); b_i is the frequency bias coefficient of control area i; ε_1 is the root-mean-square control target of the interconnected grid for annual 1-minute average frequency deviation; n is the number of minutes of the assessment period; ace_{ave-1min} is the 1-minute average of the area control error ace; Δf_{ave} is the 1-minute average of the frequency deviation Δf;
the instantaneous value of the control performance standard 2, cps2_i(k), of region i is calculated as follows:
cps2 = (1 − r) × 100%,
where r is the proportion of 10-minute periods in which |ace_{ave-10min}| exceeds the bound l_10 = 1.65·ε_10·√((−10b_i)(−10b_net)); ε_10 is the root-mean-square control target of the interconnected grid for annual 10-minute average frequency deviation; b_net is the frequency bias coefficient of the whole interconnected grid; ace_{ave-10min} is the 10-minute average of the area control error ace.
The short-term reward signal r_i(k) of step 4 is obtained by the following formula:
r_i(s_{k−1}, s_k, a_{k−1}) =
  σ_i − μ_{1i}·Δp_i(k)², if cps1_i(k) ≥ 200;
  −η_{1i}·[|ace_i(k)| − |ace_i(k−1)|] − μ_{1i}·Δp_i(k)², if 100 ≤ cps1_i(k) < 200;
  −η_{2i}·[|cps1_i(k) − 200| − |cps1_i(k−1) − 200|] − μ_{2i}·Δp_i(k)², if cps1_i(k) < 100;
where r_i(s_{k−1}, s_k, a_{k−1}) is the agent's reward for the state transition from s_{k−1} to s_k under the selected action a_{k−1}, ace_i(k) and cps1_i(k) are the instantaneous values of ace and cps1 at the k-th iteration of regional grid i, and σ_i is the historical maximum reward of region i.
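The three-band reward design can be sketched in Python as follows; the parameter names mirror the symbols above, while the function signature itself is an assumption of this sketch:

```python
def reward(cps1_k, cps1_km1, ace_k, ace_km1, dp_k,
           sigma_i, eta1, eta2, mu1, mu2):
    """Short-term reward r_i(s_{k-1}, s_k, a_{k-1}) for area i,
    following the three cps1 bands given above."""
    if cps1_k >= 200.0:
        return sigma_i - mu1 * dp_k ** 2
    if 100.0 <= cps1_k < 200.0:
        return -eta1 * (abs(ace_k) - abs(ace_km1)) - mu1 * dp_k ** 2
    return (-eta2 * (abs(cps1_k - 200.0) - abs(cps1_km1 - 200.0))
            - mu2 * dp_k ** 2)
```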
The value function errors ρ_k and δ_k of step 5 are obtained from the formulas:
ρ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k)
and
δ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k),
where r(s_k, s_{k+1}, a_k) is the agent's reward for the transition from s_k to s_{k+1} under the selected action a_k, γ is the discount factor with 0 < γ < 1, and a_g is the greedy action policy.
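A minimal sketch of these two errors, assuming a dict-based q table keyed by (state, action); note that, exactly as in the text above, both errors use the greedy action a_g and therefore coincide:

```python
def value_errors(r_k, q, s_k, a_k, s_k1, actions, gamma=0.9):
    """Compute rho_k and delta_k as defined above; q is assumed to be
    a dict keyed by (state, action)."""
    a_g = max(actions, key=lambda a: q[(s_k1, a)])  # greedy action at s_{k+1}
    target = r_k + gamma * q[(s_k1, a_g)]
    rho_k = target - q[(s_k, a_k)]
    delta_k = target - q[(s_k, a_k)]
    return rho_k, delta_k
```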
In step 6, the optimal target value function v^{π*}(s) and the optimal policy π*(s) are
v^{π*}(s) = max_{a∈A} q(s, a),
π*(s) = arg max_{a∈A} q(s, a),
where A denotes the action set.
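The two expressions above translate directly into code; this small Python sketch (with assumed names and dict layout) returns both the optimal value and the greedy policy at a state:

```python
def optimal_value_and_policy(q, s, actions):
    """v*(s) = max_a q(s, a) and pi*(s) = argmax_a q(s, a)."""
    v_star = max(q[(s, a)] for a in actions)
    pi_star = max(actions, key=lambda a: q[(s, a)])
    return v_star, pi_star
```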
In step 7, the eligibility trace matrix is updated by the formula:
e_{k+1}(s, a) ← γλ·e_k(s, a),
and the q function table by the formula:
q_{k+1}(s, a) = q_k(s, a) + α·δ_k·e_k(s, a),
where e_k(s, a) is the eligibility trace of the k-th iteration under state s and action a, γ is the discount factor with 0 < γ < 1, λ is the trace-decay factor with 0 < λ < 1, and α is the q learning rate with 0 < α < 1.
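These two table-wide updates can be combined in one backward sweep, as in this hedged Python sketch; the default parameter values and the per-pair ordering (TD correction with the current trace, then decay) are assumptions of the sketch:

```python
def sweep_q_and_traces(q, e, delta_k, states, actions,
                       alpha=0.1, gamma=0.9, lam=0.9):
    """Backward sweep over all (s, a): apply q <- q + alpha*delta_k*e
    using the current trace, then decay e <- gamma*lam*e."""
    for s in states:
        for a in actions:
            q[(s, a)] += alpha * delta_k * e[(s, a)]
            e[(s, a)] *= gamma * lam
```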
The mixed strategy u_k(s_k, a_k) in step 7 is updated according to the standard WoLF-PHC hill-climbing rule:
u_{k+1}(s_k, a) ← u_k(s_k, a) + Δ_{s_k a},
with Δ_{s_k a} = −δ_{s_k a} if a ≠ arg max_{a'} q_k(s_k, a'), and Δ_{s_k a} = Σ_{a'≠a} δ_{s_k a'} otherwise, where δ_{s_k a} = min(u_k(s_k, a), φ_i/(|A|−1)) and φ_i is the variable learning rate.
In step 7, the value function q_{k+1}(s_k, a_k) is updated according to the formula:
q_{k+1}(s_k, a_k) = q_{k+1}(s_k, a_k) + α·ρ_k;
the eligibility trace element is updated according to the formula:
e_{k+1}(s, a) = γλ·e_k(s, a) + 1 if (s, a) = (s_k, a_k), and e_{k+1}(s, a) = γλ·e_k(s, a) otherwise,
i.e. e(s_k, a_k) ← e(s_k, a_k) + 1; the variable learning rate is selected by the WoLF criterion:
φ = φ_win if Σ_a u(s_k, a)·q(s_k, a) > Σ_a ū(s_k, a)·q(s_k, a), and φ = φ_lose otherwise;
and the average mixed strategy table is updated according to the formula:
ū(s_k, a_i) ← ū(s_k, a_i) + (u(s_k, a_i) − ū(s_k, a_i))/visit(s_k), ∀ a_i ∈ A,
where φ_win and φ_lose are two learning parameters representing the agent's winning and losing, and visit(s_k) is the number of times state s_k has been visited from the initial state to the current state.
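A sketch of the per-step WoLF bookkeeping of step 7 in Python; the φ values, the function name and the dict containers are assumptions of this sketch, while the win/lose test and the running-average update follow the formulas above:

```python
def wolf_bookkeeping(q, e, u, u_bar, visit, s_k, a_k, rho_k, actions,
                     alpha=0.1, phi_win=0.05, phi_lose=0.2):
    """Value correction by alpha*rho_k, trace increment for the visited
    pair, win/lose learning-rate choice, and average-strategy update."""
    q[(s_k, a_k)] += alpha * rho_k
    e[(s_k, a_k)] += 1.0
    # "Winning" when the current mixed strategy outscores the average one.
    cur = sum(u[(s_k, a)] * q[(s_k, a)] for a in actions)
    avg = sum(u_bar[(s_k, a)] * q[(s_k, a)] for a in actions)
    phi = phi_win if cur > avg else phi_lose
    # Average mixed-strategy table: incremental mean over visit counts.
    visit[s_k] = visit.get(s_k, 0) + 1
    for a in actions:
        u_bar[(s_k, a)] += (u[(s_k, a)] - u_bar[(s_k, a)]) / visit[s_k]
    return phi
```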
Compared with the prior art, the intelligent power generation control method based on the wolf hill-climbing algorithm of the present invention has the following advantages and effects:
1. In the design of the inventive method, the agent can change its learning rate at any time as the state changes, which improves the dynamic performance of the system and gives it a faster convergence rate.
2. The inventive method replaces the equilibrium-point solution of the multi-agent dynamic game with an average strategy, reducing both the real-time information exchange with the other agents and the difficulty of solving the joint control strategy.
3. The inventive method is based on the average strategy and mixed strategies, so it has high adaptability in non-Markovian environments and long-time-delay systems, and can solve coordinated automatic generation control in the complex interconnected power system environments brought about by new-energy plant-grid connection.
Brief description of the drawings
Fig. 1 is the AGC multi-agent control framework.
Fig. 2 is the load frequency control model diagram of the China Southern Power Grid.
Specific embodiment
An intelligent power generation control method based on the wolf hill-climbing algorithm: the framework of this method is formed by three classes of agents, namely measurement agents, a centralized-control agent and decentralized-control agents, and this control framework uses the wolf hill-climbing algorithm to realize centralized and decentralized AGC control respectively. The wolf hill-climbing algorithm is a new multi-agent algorithm with multi-step backtracking and a variable learning rate, proposed to solve coordinated automatic generation control in complex interconnected power system environments. On the basis of WoLF-PHC, this algorithm merges SARSA(λ) and eligibility traces, and can effectively solve stochastic games and the application problems in non-Markovian environments. Compared with multi-agent learning algorithms such as Q-learning, Q(λ)-learning and DCEQ(λ), the wolf hill-climbing algorithm has a faster convergence rate and learning efficiency, and possesses high adaptability and robustness in multi-area, strongly stochastic, interconnected complex grid environments.
The data input of each measurement agent is the tie-line power deviation and the frequency deviation of its region, and its outputs are the region's control error and rolling cps value. The ace and cps values of each region are then transferred to the centralized AGC controller. If the data of every region are complete and the centralized AGC controller works normally, the output is the action value of each region, and the method used is cwolf-phc(λ) (centralized WoLF-PHC(λ)); otherwise, the centralized controller transmits all gathered data to the decentralized AGC controllers of the regions. If the data are complete, each decentralized AGC controller distributes the action it computes independently of the others; if the data are incomplete, each decentralized controller calls in the most recent complete region-wide normal data, recalculates the action value and distributes the action again, the method used being dwolf-phc(λ) (decentralized WoLF-PHC(λ)); a minimal sketch of this fallback logic follows below. The whole interconnected grid has one and only one centralized AGC controller, while every regional grid has one measurement agent and one decentralized AGC controller.
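This hypothetical Python sketch of the centralized/decentralized fallback uses the mode labels above; the data layout and the completeness test are assumptions, not the patent's specified protocol:

```python
def choose_agc_mode(regional_data, central_ok):
    """Run cwolf-phc(lambda) when the centralized controller is healthy
    and every region reported complete (ace, cps) data; otherwise fall
    back to the decentralized dwolf-phc(lambda) controllers."""
    complete = all(d.get("ace") is not None and d.get("cps") is not None
                   for d in regional_data.values())
    return "cwolf-phc(lambda)" if central_ok and complete else "dwolf-phc(lambda)"

# Example: one region reported, central controller healthy.
mode = choose_agc_mode({"guangdong": {"ace": 1.2, "cps": 180.0}}, True)
```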
The control decision process of the method cwolf-phc(λ) of the present invention is divided into three phases (a schematic sketch follows this list):
1) use the wolf hill-climbing algorithm to update the q values of all agents' state-action pairs;
2) derive the optimal average strategy;
3) execute the optimal average strategy, observe the system response, and return the reward value and the current state.
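A schematic Python sketch of these three phases; the agent interface and the `execute` callback are placeholders assumed for this sketch, not the patent's API:

```python
def control_cycle(agents, state, execute):
    """One cwolf-phc(lambda) decision cycle in three phases."""
    for agent in agents:
        agent.update_q(state)                  # 1) update q for every (s, a)
    joint_action = {agent.name: agent.average_strategy_action(state)
                    for agent in agents}       # 2) optimal average strategy
    reward_value, next_state = execute(joint_action)  # 3) act, observe response
    return reward_value, next_state
```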
An intelligent power generation control method based on the wolf hill-climbing algorithm comprises the following steps:
Step 1: determine the state discrete set s;
Step 2: determine the joint-action discrete set a;
Step 3: when each control cycle starts, collect the real-time operating data of each grid, the data including the frequency deviation Δf and the power deviation Δp, and calculate for each area the instantaneous value of the area control error ace_i(k) and of the control performance standard cps_i(k);
Step 4: determine the current state s, then obtain from the current state s and the reward function a short-term reward signal r_i(k) of regional grid i;
Step 5: obtain the value function errors ρ_k and δ_k by calculation and estimation;
Step 6: obtain the optimal target value function and policy;
Step 7: for all regional grids j, update the q-function table for all state-action pairs (s, a) and the eligibility trace matrix e_j(s, a); update the mixed strategy u_k(s_k, a_k) under the current state s with the updated q values; then use the mixed strategy u_k(s_k, a_k) to update the value function q_{k+1}(s_k, a_k), the eligibility trace element e(s, a), the variable learning rate φ and the average mixed strategy table;
Step 8: return to step 3.
The state discrete set s of step 1 is determined by dividing the value range of the control performance standards cps1/cps2.
In step 2, the interval actions are determined according to an action-fuzzification rule.
The real-time operating data of step 3 are gathered by the computer and monitoring system.
In step 3, the instantaneous value of the area control error ace_i(k) of region i is calculated as follows:
ace = t_a − t_s − 10b(f_a − f_s),
where t_a is the actual tie-line power flow, t_s the scheduled tie-line power flow, b the frequency bias coefficient, f_a the actual system frequency and f_s the scheduled system frequency;
the instantaneous value of the control performance standard 1, cps1_i(k), of region i is calculated as follows:
cps1 = (2 − cf1) × 100%,
where cf1 = (1/n)·Σ ace_{ave-1min}·Δf_{ave}/(−10·b_i·ε_1²); b_i is the frequency bias coefficient of control area i; ε_1 is the root-mean-square control target of the interconnected grid for annual 1-minute average frequency deviation; n is the number of minutes of the assessment period; ace_{ave-1min} is the 1-minute average of the area control error ace; Δf_{ave} is the 1-minute average of the frequency deviation Δf;
the instantaneous value of the control performance standard 2, cps2_i(k), of region i is calculated as follows:
cps2 = (1 − r) × 100%,
where r is the proportion of 10-minute periods in which |ace_{ave-10min}| exceeds the bound l_10 = 1.65·ε_10·√((−10b_i)(−10b_net)); ε_10 is the root-mean-square control target of the interconnected grid for annual 10-minute average frequency deviation; b_net is the frequency bias coefficient of the whole interconnected grid; ace_{ave-10min} is the 10-minute average of the area control error ace.
The short-term reward signal r_i(k) of step 4 is obtained by the following formula:
r_i(s_{k−1}, s_k, a_{k−1}) =
  σ_i − μ_{1i}·Δp_i(k)², if cps1_i(k) ≥ 200;
  −η_{1i}·[|ace_i(k)| − |ace_i(k−1)|] − μ_{1i}·Δp_i(k)², if 100 ≤ cps1_i(k) < 200;
  −η_{2i}·[|cps1_i(k) − 200| − |cps1_i(k−1) − 200|] − μ_{2i}·Δp_i(k)², if cps1_i(k) < 100;
where r_i(s_{k−1}, s_k, a_{k−1}) is the agent's reward for the state transition from s_{k−1} to s_k under the selected action a_{k−1}, ace_i(k) and cps1_i(k) are the instantaneous values of ace and cps1 at the k-th iteration of regional grid i, and σ_i is the historical maximum reward of region i.
The value function errors ρ_k and δ_k of step 5 are obtained from the formulas:
ρ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k)
and
δ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k),
where r(s_k, s_{k+1}, a_k) is the agent's reward for the transition from s_k to s_{k+1} under the selected action a_k, γ is the discount factor with 0 < γ < 1, and a_g is the greedy action policy.
In step 6, the optimal target value function v^{π*}(s) and the optimal policy π*(s) are
v^{π*}(s) = max_{a∈A} q(s, a),
π*(s) = arg max_{a∈A} q(s, a),
where A denotes the action set.
In step 7, the eligibility trace matrix is updated by the formula:
e_{k+1}(s, a) ← γλ·e_k(s, a),
and the q function table by the formula:
q_{k+1}(s, a) = q_k(s, a) + α·δ_k·e_k(s, a),
where e_k(s, a) is the eligibility trace of the k-th iteration under state s and action a, γ is the discount factor with 0 < γ < 1, λ is the trace-decay factor with 0 < λ < 1, and α is the q learning rate with 0 < α < 1.
The mixed strategy u_k(s_k, a_k) in step 7 is updated according to the standard WoLF-PHC hill-climbing rule:
u_{k+1}(s_k, a) ← u_k(s_k, a) + Δ_{s_k a},
with Δ_{s_k a} = −δ_{s_k a} if a ≠ arg max_{a'} q_k(s_k, a'), and Δ_{s_k a} = Σ_{a'≠a} δ_{s_k a'} otherwise, where δ_{s_k a} = min(u_k(s_k, a), φ_i/(|A|−1)) and φ_i is the variable learning rate.
In step 7, the value function q_{k+1}(s_k, a_k) is updated according to the formula:
q_{k+1}(s_k, a_k) = q_{k+1}(s_k, a_k) + α·ρ_k;
the eligibility trace element is updated according to the formula:
e_{k+1}(s, a) = γλ·e_k(s, a) + 1 if (s, a) = (s_k, a_k), and e_{k+1}(s, a) = γλ·e_k(s, a) otherwise,
i.e. e(s_k, a_k) ← e(s_k, a_k) + 1; the variable learning rate is selected by the WoLF criterion:
φ = φ_win if Σ_a u(s_k, a)·q(s_k, a) > Σ_a ū(s_k, a)·q(s_k, a), and φ = φ_lose otherwise;
and the average mixed strategy table is updated according to the formula:
ū(s_k, a_i) ← ū(s_k, a_i) + (u(s_k, a_i) − ū(s_k, a_i))/visit(s_k), ∀ a_i ∈ A,
where φ_win and φ_lose are two learning parameters representing the agent's winning and losing, and visit(s_k) is the number of times state s_k has been visited from the initial state to the current state.
The operation principle of the present invention:
The present invention is an intelligent power generation control method based on the wolf hill-climbing algorithm, and its main working process is as follows: at the start of each control cycle, collect the real-time operating data of the regional grids to be controlled; based on the reward-function design and the current state, obtain the reward signal; obtain the optimal target value function and policy; update the q values, eligibility traces, variable learning rate and mixed strategy of all control-area grids; and obtain the latest action. The present invention can obtain the optimal average strategy during the control process, the closed-loop system performance is excellent, it can solve coordinated automatic generation control in the complex interconnected power system environments brought about by new-energy plant-grid connection, and it has higher learning ability and faster convergence than existing algorithms. The whole control method does not need a mathematical model of the external environment: the system control performance index can be converted into an evaluation index, so that the system is rewarded when its performance meets the requirement and penalized otherwise. Through its own learning, the controller obtains the optimal control action, making the method highly suitable for strongly stochastic multi-area interconnected-grid intelligent power generation systems. The relevant theory of the present invention includes:
1. The WoLF principle:
Scholars have studied in depth the application of the heuristic WoLF ("win or learn fast") principle to problems with opponents: learn faster when losing, and more slowly when winning. A player is winning if it prefers its current strategy to the average strategy played against the other agents' current strategies, or if its current expected reward is larger than the equilibrium value of the game. However, the WoLF principle places strict requirements on the knowledge the player needs, which limits its universality.
2. PHC:
The policy hill-climbing (PHC) algorithm was proposed as an extension of the WoLF principle to give it more universality. Following the hill-climbing algorithm, Q-learning can obtain a mixed strategy while preserving its q values. Because PHC possesses rationality and convergence, it obtains the optimal solution when the other agents choose fixed strategies. The literature has verified that, with a suitable exploration strategy, the q values converge to the optimal value q*, and that an optimal solution is obtained from q* through the greedy strategy u. Although the method is rational and can obtain a mixed strategy, its convergence is not conspicuous.
3. WoLF-PHC:
Bowling and Veloso proposed the WoLF-PHC algorithm with the variable learning rate φ in 2002; it satisfies rationality and convergence at the same time. The two learning parameters φ_lose and φ_win indicate the agent's losing and winning. WoLF-PHC is based on fictitious play: it replaces the unknown equilibrium policy by an average greedy policy of approximate equilibrium.
For a given agent with mixed strategy set u_k(s_k, a_k), which transitions from state s_k to s_{k+1} and executes the exploratory action a_k with reward function r, the q function is updated according to the formulas q_{k+1}(s, a) = q_k(s, a) + α·δ_k·e_k(s, a) and q_{k+1}(s_k, a_k) = q_{k+1}(s_k, a_k) + α·ρ_k, while u(s_k, a_k) is updated by the hill-climbing rule of step 7,
where φ_i is the variable learning rate and φ_lose > φ_win. If the average mixed strategy value is lower than the current strategy value, the agent is winning and φ_win is selected; otherwise φ_lose is selected. The selection rule is the WoLF criterion:
φ = φ_win if Σ_a u(s_k, a)·q(s_k, a) > Σ_a ū(s_k, a)·q(s_k, a), and φ = φ_lose otherwise,
where ū is the average mixed strategy.
After executing action a_k, the mixed strategy table of all actions under state s_k is updated by
ū(s_k, a_i) ← ū(s_k, a_i) + (u(s_k, a_i) − ū(s_k, a_i))/visit(s_k), ∀ a_i ∈ A,
where visit(s_k) is the number of times s_k has been visited from the initial state to the current state.
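As a concrete illustration of the PHC hill-climbing step described above, the following Python sketch shifts probability mass toward the greedy action at rate φ; the clip-and-renormalise treatment of the distribution is an assumption of this sketch, not a verbatim transcription of the patent's formula:

```python
def phc_update(u, q, s_k, actions, phi):
    """One PHC step on the mixed strategy u(s_k, .)."""
    a_star = max(actions, key=lambda a: q[(s_k, a)])  # greedy action
    for a in actions:
        if a == a_star:
            u[(s_k, a)] = min(1.0, u[(s_k, a)] + phi)
        else:
            u[(s_k, a)] = max(0.0, u[(s_k, a)] - phi / (len(actions) - 1))
    total = sum(u[(s_k, a)] for a in actions)  # renormalise after clipping
    for a in actions:
        u[(s_k, a)] /= total
```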
Embodiment:
The present embodiment is set under the overall framework of the China Southern Power Grid, with the Guangdong grid as the main study object. The simulation model is the detailed full dynamic simulation model built for the practical engineering project of the Guangdong Electric Power Dispatching Center; for the detailed model parameters and the simulation design principles, refer to Xi Lei, Yu Tao and Zhang Xiaoshun, "Intelligent power generation control of power systems based on the fast wolf hill-climbing multi-agent learning strategy" (Transactions of China Electrotechnical Society). In this simulation model, the Southern Grid is divided into the four regional grids of Guangdong, Guangxi, Yunnan and Guizhou, and Simulink simulation studies are carried out both with nominal parameters and with a 10% white-noise parameter perturbation added, to evaluate the model performance. The variable learning rate is designed so as to obtain coordinated intelligent generation control, with the multi-agent intelligent generation control providing the average strategy values.
The intelligent power generation control design based on the wolf hill-climbing algorithm is as follows:
1): analyze the system behavior to discretize the state set s. This example divides the states according to the cps assessment criterion of the Guangdong power grid dispatching center: the cps1/cps2 value is divided into the 6 states (−∞, 0), [0, 100%), [100%, 150%), [150%, 180%), [180%, 200%) and [200%, +∞), and ace is divided into 2 states (positive and negative), so that each agent can determine 12 states; a discretization sketch is given after this list. The ace states serve mainly to distinguish the causes of cps index fluctuation;
2): determine the joint-action discrete set a using fuzzification of the action interval; the interval fuzzification has 49 action rules in total, and each rule has 7 discrete actions;
3): when each control cycle starts, collect the real-time operating data of each regional grid: Δf and Δp, where Δf denotes the system frequency deviation and Δp the tie-line power deviation. According to the international assessment method, ace = t_a − t_s − 10b(f_a − f_s) (t_a is the actual tie-line power flow, t_s the scheduled tie-line power flow, b the frequency bias coefficient, f_a the actual system frequency, f_s the scheduled system frequency); cps1 = (2 − cf1) × 100% with cf1 = (1/n)·Σ ace_{ave-1min}·Δf_{ave}/(−10·b_i·ε_1²) (b_i is the frequency bias coefficient of control area i; ε_1 is the root-mean-square control target of the interconnected grid for annual 1-minute average frequency deviation; n is the number of minutes of the assessment period; ace_{ave-1min} is the 1-minute average of ace; Δf_{ave} is the 1-minute average of Δf); and cps2 = (1 − r) × 100% (ε_10 is the root-mean-square control target of the interconnected grid for annual 10-minute average frequency deviation; b_net is the frequency bias coefficient of the whole interconnected grid; ace_{ave-10min} is the 10-minute average of ace). From these, calculate the instantaneous values of ace_i(k) and cps_i(k);
4): determine the current state s according to the instantaneous values ace_i(k) and cps_i(k) of each region, then obtain from the state s and the reward function a short-term reward signal r_i(k) of the regional grid; the reward function is designed as follows:
r_i(s_{k−1}, s_k, a_{k−1}) =
  σ_i − μ_{1i}·Δp_i(k)², if cps1_i(k) ≥ 200;
  −η_{1i}·[|ace_i(k)| − |ace_i(k−1)|] − μ_{1i}·Δp_i(k)², if 100 ≤ cps1_i(k) < 200;
  −η_{2i}·[|cps1_i(k) − 200| − |cps1_i(k−1) − 200|] − μ_{2i}·Δp_i(k)², if cps1_i(k) < 100;
where r_i(s_{k−1}, s_k, a_{k−1}) is the agent's reward for the state transition from s_{k−1} to s_k under the selected action a_{k−1}, ace_i(k) and cps1_i(k) are the instantaneous values of ace and cps1 at the k-th iteration of regional grid i, and σ_i is the historical maximum reward of region i;
5): for all regional grids, calculate the value function error
ρ_k = r(s_k, s_{k+1}, a_k) + 0.9·q_k(s_{k+1}, a_g) − q_k(s_k, a_k)
and the estimated value function error
δ_k = r(s_k, s_{k+1}, a_k) + 0.9·q_k(s_{k+1}, a_g) − q_k(s_k, a_k)
(γ is the discount factor, taken as 0.9; a_g is the greedy action policy);
6): for all regional grids, determine the optimal target value function v^{π*}(s) = max_{a∈A} q(s, a) and the optimal policy π*(s) = arg max_{a∈A} q(s, a) (A is the action set);
7): for all regional grids, update the eligibility trace matrix by e_{k+1}(s, a) ← 0.9 × 0.9 × e_k(s, a); update the q function table by q_{k+1}(s, a) = q_k(s, a) + 0.1·δ_k·e_k(s, a); update the mixed strategy by the two hill-climbing formulas of step 7; update the value function by q_{k+1}(s_k, a_k) = q_{k+1}(s_k, a_k) + 0.1·ρ_k; update the eligibility trace element by
e_{k+1}(s, a) = 0.9 × 0.9 × e_k(s, a) + 1 if (s, a) = (s_k, a_k), and e_{k+1}(s, a) = 0.9 × 0.9 × e_k(s, a) otherwise,
i.e. e(s_k, a_k) ← e(s_k, a_k) + 1; update the variable learning rate φ; and update the average mixed strategy table by
ū(s_k, a_i) ← ū(s_k, a_i) + (u(s_k, a_i) − ū(s_k, a_i))/visit(s_k), ∀ a_i ∈ A;
8): when the next control cycle arrives, return to step 3. A consolidated sketch of one control cycle with this embodiment's parameter values follows below.
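For illustration, a Python sketch of the state discretization of step 1) and of one control-cycle update with the embodiment's parameters (γ = 0.9, λ = 0.9, α = 0.1); the function names and the dict-of-(s, a) containers are assumptions of this sketch:

```python
GAMMA, LAM, ALPHA = 0.9, 0.9, 0.1   # parameter values used in this embodiment

def discretize_state(cps1, ace):
    """Map (cps1, sign of ace) onto the 12 states of step 1):
    six cps1 bands times two ace signs."""
    bands = [0.0, 100.0, 150.0, 180.0, 200.0]   # percent thresholds
    band = sum(cps1 >= b for b in bands)         # band index 0..5
    return band * 2 + (1 if ace >= 0 else 0)     # state index 0..11

def embodiment_step(q, e, r_k, s_k, a_k, s_k1, states, actions):
    """One control-cycle update with the embodiment's parameters."""
    a_g = max(actions, key=lambda a: q[(s_k1, a)])        # greedy action
    rho = r_k + GAMMA * q[(s_k1, a_g)] - q[(s_k, a_k)]    # equals delta here
    e[(s_k, a_k)] += 1.0                                  # trace increment
    for s in states:
        for a in actions:
            q[(s, a)] += ALPHA * rho * e[(s, a)]          # q-table sweep
            e[(s, a)] *= GAMMA * LAM                      # 0.9 x 0.9 decay
    q[(s_k, a_k)] += ALPHA * rho                          # + alpha * rho_k
```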
The core of the present invention lies in the choice of the reward function, the fuzzification of the action interval, and the parameter design. Merging SARSA(λ) and eligibility traces on the basis of WoLF-PHC is the key innovation of this patent. Implementation of this method or related techniques effectively solves stochastic games and the application problems in non-Markovian environments, yielding faster convergence and learning efficiency, and in multi-area, strongly stochastic, interconnected complex grid environments it has high adaptability and robustness, meeting the need for coordinated optimal generation control among multi-region grids.
The control method of the present invention can be fully described as follows:
1): the state discrete set s is determined by dividing the value range of the control performance standards cps1/cps2;
2): determine the joint-action discrete set a according to the action-fuzzification rule;
3): when each control cycle starts, collect the real-time operating data of each grid: the frequency deviation Δf and the power deviation Δp, and calculate the instantaneous values of ace_i(k) and cps_i(k) for each region;
4): determine the current state s, then obtain from the current state s and the reward function a short-term reward signal r_i(k) of regional grid i;
5): obtain the value function errors ρ_k and δ_k from
ρ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k)
and
δ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k);
6): obtain the optimal target value function v^{π*}(s) = max_{a∈A} q(s, a) and the optimal policy π*(s) = arg max_{a∈A} q(s, a);
7): for all regional grids, execute:
update the eligibility trace matrix by e_{k+1}(s, a) ← γλ·e_k(s, a);
update the q function table by q_{k+1}(s, a) = q_k(s, a) + α·δ_k·e_k(s, a);
update the mixed strategy u_k(s_k, a_k);
update the value function by q_{k+1}(s_k, a_k) = q_{k+1}(s_k, a_k) + α·ρ_k;
update the eligibility trace element by e_{k+1}(s, a) = γλ·e_k(s, a) + 1 if (s, a) = (s_k, a_k), and e_{k+1}(s, a) = γλ·e_k(s, a) otherwise, i.e. e(s_k, a_k) ← e(s_k, a_k) + 1;
update the variable learning rate φ;
update the average mixed strategy table;
8): when the next control cycle arrives, return to step 3.
The above embodiment is a preferred implementation of the present invention, but the implementations of the present invention are not limited by the above embodiment; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and shall be included within the protection scope of the present invention.

Claims (10)

1. An intelligent power generation control method based on the wolf hill-climbing algorithm, characterized in that it comprises the following steps:
Step 1: determine the state discrete set s;
Step 2: determine the joint-action discrete set a;
Step 3: when each control cycle starts, collect the real-time operating data of each grid, the data including the frequency deviation Δf and the power deviation Δp, and calculate for each area the instantaneous value of the area control error ace_i(k) and of the control performance standard cps_i(k);
Step 4: determine the current state s, then obtain from the current state s and the reward function a short-term reward signal r_i(k) of regional grid i;
Step 5: obtain the value function errors ρ_k and δ_k by calculation and estimation;
Step 6: obtain the optimal target value function and policy;
Step 7: for all regional grids j, update the q-function table for all state-action pairs (s, a) and the eligibility trace matrix e_j(s, a); update the mixed strategy u_k(s_k, a_k) under the current state s with the updated q values; then use the mixed strategy u_k(s_k, a_k) to update the value function q_{k+1}(s_k, a_k), the eligibility trace element e(s, a), the variable learning rate φ and the average mixed strategy table;
Step 8: return to step 3.
2. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that the state discrete set s of step 1 is determined by dividing the value range of the control performance standards cps1/cps2.
3. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that in step 2 the interval actions are determined according to an action-fuzzification rule.
4. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that the real-time operating data of step 3 are gathered by the computer and monitoring system.
5. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that in step 3 the instantaneous value of the area control error ace_i(k) of region i is calculated as follows:
ace = t_a − t_s − 10b(f_a − f_s),
where t_a is the actual tie-line power flow, t_s the scheduled tie-line power flow, b the frequency bias coefficient, f_a the actual system frequency and f_s the scheduled system frequency;
the instantaneous value of the control performance standard 1, cps1_i(k), of region i is calculated as follows:
cps1 = (2 − cf1) × 100%,
where cf1 = (1/n)·Σ ace_{ave-1min}·Δf_{ave}/(−10·b_i·ε_1²); b_i is the frequency bias coefficient of control area i; ε_1 is the root-mean-square control target of the interconnected grid for annual 1-minute average frequency deviation; n is the number of minutes of the assessment period; ace_{ave-1min} is the 1-minute average of the area control error ace; Δf_{ave} is the 1-minute average of the frequency deviation Δf;
the instantaneous value of the control performance standard 2, cps2_i(k), of region i is calculated as follows:
cps2 = (1 − r) × 100%,
where r is the proportion of 10-minute periods in which |ace_{ave-10min}| exceeds the bound l_10 = 1.65·ε_10·√((−10b_i)(−10b_net)); ε_10 is the root-mean-square control target of the interconnected grid for annual 10-minute average frequency deviation; b_net is the frequency bias coefficient of the whole interconnected grid; ace_{ave-10min} is the 10-minute average of the area control error ace.
6. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that the short-term reward signal r_i(k) of step 4 is obtained by the following formula:
r_i(s_{k−1}, s_k, a_{k−1}) =
  σ_i − μ_{1i}·Δp_i(k)², if cps1_i(k) ≥ 200;
  −η_{1i}·[|ace_i(k)| − |ace_i(k−1)|] − μ_{1i}·Δp_i(k)², if 100 ≤ cps1_i(k) < 200;
  −η_{2i}·[|cps1_i(k) − 200| − |cps1_i(k−1) − 200|] − μ_{2i}·Δp_i(k)², if cps1_i(k) < 100;
where r_i(s_{k−1}, s_k, a_{k−1}) is the agent's reward for the state transition from s_{k−1} to s_k under the selected action a_{k−1}, ace_i(k) and cps1_i(k) are the instantaneous values of ace and cps1 at the k-th iteration of regional grid i, and σ_i is the historical maximum reward of region i.
7. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that the value function errors ρ_k and δ_k of step 5 are obtained from the formulas:
ρ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k)
and
δ_k = r(s_k, s_{k+1}, a_k) + γ·q_k(s_{k+1}, a_g) − q_k(s_k, a_k),
where r(s_k, s_{k+1}, a_k) is the agent's reward for the transition from s_k to s_{k+1} under the selected action a_k, γ is the discount factor with 0 < γ < 1, and a_g is the greedy action policy.
8. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that in step 6 the optimal target value function v^{π*}(s) and the optimal policy π*(s) are
v^{π*}(s) = max_{a∈A} q(s, a),
π*(s) = arg max_{a∈A} q(s, a),
where A denotes the action set.
9. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that in step 7 the eligibility trace matrix is updated by the formula:
e_{k+1}(s, a) ← γλ·e_k(s, a),
and the q function table by the formula:
q_{k+1}(s, a) = q_k(s, a) + α·δ_k·e_k(s, a),
where e_k(s, a) is the eligibility trace of the k-th iteration under state s and action a, γ is the discount factor with 0 < γ < 1, λ is the trace-decay factor with 0 < λ < 1, and α is the q learning rate with 0 < α < 1.
10. The intelligent power generation control method based on the wolf hill-climbing algorithm according to claim 1, characterized in that the mixed strategy u_k(s_k, a_k) in step 7 is updated according to the standard WoLF-PHC hill-climbing rule:
u_{k+1}(s_k, a) ← u_k(s_k, a) + Δ_{s_k a},
with Δ_{s_k a} = −δ_{s_k a} if a ≠ arg max_{a'} q_k(s_k, a'), and Δ_{s_k a} = Σ_{a'≠a} δ_{s_k a'} otherwise, where δ_{s_k a} = min(u_k(s_k, a), φ_i/(|A|−1)) and φ_i is the variable learning rate;
in step 7, the value function q_{k+1}(s_k, a_k) is updated according to the formula:
q_{k+1}(s_k, a_k) = q_{k+1}(s_k, a_k) + α·ρ_k;
the eligibility trace element is updated according to the formula:
e_{k+1}(s, a) = γλ·e_k(s, a) + 1 if (s, a) = (s_k, a_k), and e_{k+1}(s, a) = γλ·e_k(s, a) otherwise,
i.e. e(s_k, a_k) ← e(s_k, a_k) + 1; the variable learning rate is selected by the WoLF criterion:
φ = φ_win if Σ_a u(s_k, a)·q(s_k, a) > Σ_a ū(s_k, a)·q(s_k, a), and φ = φ_lose otherwise;
and the average mixed strategy table is updated according to the formula:
ū(s_k, a_i) ← ū(s_k, a_i) + (u(s_k, a_i) − ū(s_k, a_i))/visit(s_k), ∀ a_i ∈ A,
where φ_win and φ_lose are two learning parameters representing the agent's winning and losing, and visit(s_k) is the number of times state s_k has been visited from the initial state to the current state.
CN201610866538.XA 2016-09-30 2016-09-30 Intelligent power generation control method based on hill-climbing algorithm Pending CN106372366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610866538.XA CN106372366A (en) 2016-09-30 2016-09-30 Intelligent power generation control method based on hill-climbing algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610866538.XA CN106372366A (en) 2016-09-30 2016-09-30 Intelligent power generation control method based on hill-climbing algorithm

Publications (1)

Publication Number Publication Date
CN106372366A true CN106372366A (en) 2017-02-01

Family

ID=57898558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610866538.XA Pending CN106372366A (en) 2016-09-30 2016-09-30 Intelligent power generation control method based on hill-climbing algorithm

Country Status (1)

Country Link
CN (1) CN106372366A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172125A1 (en) * 2012-09-29 2014-06-19 Operation Technology, Inc. Dynamic parameter tuning using particle swarm optimization
CN103490413A (en) * 2013-09-27 2014-01-01 华南理工大学 Intelligent electricity generation control method based on intelligent body equalization algorithm
CN103904641A (en) * 2014-03-14 2014-07-02 华南理工大学 Method for controlling intelligent power generation of island micro grid based on correlated equilibrium reinforcement learning
CN104037761A (en) * 2014-06-25 2014-09-10 南方电网科学研究院有限责任公司 AGC power multi-objective random optimization distribution method
CN107045655A (en) * 2016-12-07 2017-08-15 三峡大学 Wolf pack clan strategy process based on the random consistent game of multiple agent and virtual generating clan

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xi Lei et al.: "Intelligent power generation control of power systems based on the fast wolf hill-climbing multi-agent learning strategy", Transactions of China Electrotechnical Society (《电工技术学报》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106899026A (en) * 2017-03-24 2017-06-27 三峡大学 Intelligent power generation control method based on the multiple agent intensified learning with time warp thought
CN107589672A (en) * 2017-09-27 2018-01-16 三峡大学 The intelligent power generation control method of isolated island intelligent power distribution virtual wolf pack control strategy off the net
CN110994620A (en) * 2019-11-16 2020-04-10 国网浙江省电力有限公司台州供电公司 Q-Learning algorithm-based power grid power flow intelligent adjustment method
CN111612162A (en) * 2020-06-02 2020-09-01 中国人民解放军军事科学院国防科技创新研究院 Reinforced learning method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN103490413B (en) A kind of intelligent power generation control method based on intelligent body equalization algorithm
CN106899026A (en) Intelligent power generation control method based on the multiple agent intensified learning with time warp thought
CN106372366A (en) Intelligent power generation control method based on hill-climbing algorithm
CN107045655A (en) Wolf pack clan strategy process based on the random consistent game of multiple agent and virtual generating clan
CN109726503A (en) Missing data complementing method and device
Li et al. Probabilistic charging power forecast of EVCS: Reinforcement learning assisted deep learning approach
CN103683337A (en) Interconnected power system CPS instruction dynamic allocation and optimization method
CN104537428B (en) One kind meter and the probabilistic economical operation appraisal procedure of wind power integration
US20230281459A1 (en) Method for calibrating parameters of hydrology forecasting model based on deep reinforcement learning
Wu et al. Power system flow adjustment and sample generation based on deep reinforcement learning
AU2021106780A4 (en) Virtual power plant self-optimisation load track control method
CN106532691A (en) Adaptive dynamic programming-based frequency compound control method of single-region power system
Konstantakopoulos et al. Smart building energy efficiency via social game: a robust utility learning framework for closing–the–loop
CN105787650A (en) Simulation calculation method for Nash equilibrium point of electricity market including multiple load agents
CN116454920A (en) Power distribution network frequency modulation method, device, equipment and storage medium
CN107589672A (en) The intelligent power generation control method of isolated island intelligent power distribution virtual wolf pack control strategy off the net
CN105914752A (en) Pilot node selection method based on clustering by fast search and density peaks
CN104731709A (en) Software defect predicting method based on JCUDASA_BP algorithm
CN117172097A (en) Power distribution network dispatching operation method based on cloud edge cooperation and multi-agent deep learning
Zamani-Gargari et al. Application of particle swarm optimization algorithm in power system problems
CN116933619A (en) Digital twin distribution network fault scene generation method and system based on reinforcement learning
CN104537224B (en) Multi-state System Reliability analysis method and system based on adaptive learning algorithm
CN111428903A (en) Interruptible load optimization method based on deep reinforcement learning
CN116470511A (en) Circuit power flow control method based on deep reinforcement learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170201)