CN116260143A - Automatic control method and system for power distribution network switch based on reinforcement learning theory - Google Patents
- Publication number
- CN116260143A (application number CN202310099843.0A)
- Authority
- CN
- China
- Prior art keywords
- distribution network
- power
- power distribution
- model
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/28—The renewable source being wind energy
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E10/00—Energy generation through renewable energy sources
- Y02E10/50—Photovoltaic [PV] energy
- Y02E10/56—Power conversion systems, e.g. maximum power point trackers
Abstract
The invention discloses a method and system for automatic control of power distribution network switches based on reinforcement learning theory, wherein the method comprises the following steps: establishing a Distflow power flow optimization constraint model and determining an approximate dynamic programming reinforcement learning algorithm; acquiring historical output data of the distributed generation units and establishing a distributed generation unit output power fluctuation and conversion model; determining the distribution network topology, obtaining the active output data of the distributed generation units, the state information of the controllable sectionalizing switches and tie switches of the distribution network, and the calculation result information of the Distflow power flow optimization constraint model, and establishing an MDP model according to the distributed generation unit output power fluctuation and conversion model, the distribution network topology and the obtained information; and solving the MDP model with the reinforcement learning algorithm and outputting the optimal automatic switch control strategy of the distribution network in real time. The invention can address the output power fluctuation, outages and faults of the distributed generation units in the distribution network, improve the power supply reliability of the distribution system and increase the investment benefit of the distribution system.
Description
Technical Field
The invention relates to the technical field of power grid dispatching, in particular to a power distribution network switch automatic control method and system based on reinforcement learning theory.
Background
The distribution network undertakes the demanding task of receiving and distributing electric energy in the power system; it directly faces end users and is closely tied to people's daily production and life. The traditional distribution network has a tree-shaped, radial structure with several weaknesses: large damage from faults, poor mutual backup capability, a low degree of automation, and so on. Although its construction cost is low, its reliability is not high. In recent years, more and more distributed generation (Distributed Generation, DG) systems, such as wind power and photovoltaic new energy generation systems, have been connected to the distribution network; to a certain extent this meets the development requirements of a clean, environment-friendly, low-cost, efficient and reliable power industry.
However, access of these renewable energy sources also has a number of adverse effects on the distribution network. The biggest problem is that new energy generation modes such as wind and solar power are affected by the environment and exhibit considerable random fluctuation, uncertainty and instability; secondly, every node and branch of the distribution network is prone to failure under natural and man-made disasters. These adverse factors make the operating state of the distribution system complex and variable, directly affecting its safe operation.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent. Therefore, a first object of the present invention is to provide an automatic control method for power distribution network switches based on reinforcement learning theory, by which the problems of output power fluctuation, outages and faults of distributed generation units in the distribution network can be addressed, the power supply reliability of the distribution system can be improved, and the investment benefit of the distribution system can be increased.
The second object of the invention is to provide an automatic control system for the power distribution network switch based on the reinforcement learning theory.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
an automatic control method for a power distribution network switch based on reinforcement learning theory comprises the following steps:
step S1: establishing a Distflow power flow optimization constraint model, and determining an approximate dynamic programming reinforcement learning algorithm;
step S2: acquiring historical output data of a photovoltaic and wind power distributed generation unit, and establishing a distributed generation unit output power fluctuation and conversion model according to the historical output data;
step S3: determining the distribution network topology, acquiring the active output data of the distributed generation units, the state information of the controllable sectionalizing switches and tie switches of the distribution network, and the calculation result information of the Distflow power flow optimization constraint model, and establishing a distribution network Markov decision process MDP model according to the distributed generation unit output power fluctuation and conversion model, the distribution network topology and the information acquired in step S3;
Step S4: and solving the MDP model of the power distribution network Markov decision process by adopting the reinforcement learning algorithm of the approximate dynamic programming, and outputting the automatic control optimal strategy of the power distribution network switch in real time.
Optionally, the step S1 includes:
step S11: establishing the Distflow power flow optimization constraint model according to the power distribution network topological structure constraint and the power distribution network power flow calculation basic theory;
step S12: determining the approximate dynamic programming reinforcement learning algorithm according to reinforcement learning theory and the Bellman optimality equation.
Optionally, in step S2, before the step of establishing the distributed generation unit output power fluctuation and conversion model, the method further includes: analyzing and quantifying the output power fluctuation and uncertainty of each distributed generation unit, so as to take the quantified values as inputs of the distribution network Markov decision process MDP model.
Optionally, fluctuation and uncertainty of the output power of each distributed power generation unit can be simulated through the fluctuation and transformation model of the output power of the distributed power generation unit.
Optionally, the step S3 includes:
step S31: modeling output power fluctuation status quantized values of each distributed generation unit and corresponding power distribution network topological structures as state parameters of a power distribution network Markov decision process MDP model;
Step S32: modeling the action state of a switch connected with each branch line of a power distribution network as an action combination parameter of an MDP model of a Markov decision process of the power distribution network;
step S33: selecting the load shedding in the Distflow power flow optimization constraint model that considers distributed generation faults, together with the line operation cost, and modeling them as the reward function reference indices of the distribution network Markov decision process MDP model;
step S34: the state transition probabilities of the distribution network markov decision process MDP model are defined so as to take into account the uncertainty caused by the probability of change in the generated power output level of each distributed generation unit.
Optionally, in step S4, before outputting the automatic control optimal strategy of the power distribution network switch in real time, the method further includes: and performing offline iterative training on the MDP model of the power distribution network Markov decision process.
Optionally, the step of performing offline iterative training on the power distribution network markov decision process MDP model includes: and inputting active output data of the distributed power generation unit, outputting an automatic control strategy of a power distribution network switch, and feeding back an evaluation index of the real-time running condition of the power distribution network.
Optionally, in step S4, after outputting the power distribution network switch automatic control optimal policy in real time, the method further includes: and displaying decision result characters and displaying the network topology structure of the real-time power distribution network on the man-machine interaction interface.
Optionally, the action state of the switch includes: the switch changes the switch state at the current time and maintains the switch state at the current time.
In order to achieve the above object, a second aspect of the present invention provides an automatic control system for a power distribution network switch based on reinforcement learning theory, including:
the establishing module is used for establishing a Distflow power flow optimization constraint model;
the determining module is used for determining a reinforcement learning algorithm approximate to dynamic programming;
the acquisition module is used for acquiring historical output data of the photovoltaic and wind power distributed generation units so that the establishment module establishes an output power fluctuation and conversion model of the distributed generation units according to the historical output data;
the establishing module is also used for establishing a distribution network Markov decision process MDP model according to the distribution network topology determined by the determining module, the active output data of the distributed generation units, the state information of the controllable sectionalizing switches and tie switches of the distribution network, the calculation result information of the Distflow power flow optimization constraint model, and the distributed generation unit output power fluctuation and conversion model;
And the calculation module is used for solving the MDP model of the power distribution network Markov decision process according to the reinforcement learning algorithm of the approximate dynamic programming and outputting the automatic control optimal strategy of the power distribution network switch in real time.
The invention has at least the following technical effects:
1. According to the invention, the distributed generation unit output power fluctuation and conversion model is built from the historical output data of distributed generation units such as photovoltaic and wind power, and the Markov decision process MDP model is built from this fluctuation and conversion model. Because the fluctuation and conversion model fully accounts for the environmentally driven output fluctuation of new energy generation modes such as wind and solar power, the MDP model built from it can focus on the power output fluctuation of each distributed generation unit and provide an efficient and stable optimization strategy for automatic control of the distribution network switches.
2. The invention establishes a Distflow power flow optimization constraint model, which can handle problems such as optimal reconfiguration and fault reconfiguration of the distribution network. Because distribution network reconfiguration can optimize the power flow distribution and improve supply reliability and economy by selecting users' power supply paths, the Markov decision process MDP model obtained with the Distflow power flow optimization constraint model can handle distribution system faults and improve system reliability; on the basis of the MDP model, an optimal strategy can be formulated dynamically in each time interval according to the real-time state and the state transition probabilities.
3. The invention adopts an Approximate Dynamic Programming (ADP) algorithm to solve the MDP model, and can solve the problem of dimension disaster.
4. The invention aims to optimally solve the problems of output power fluctuation, outages and faults of the distributed generation units in the distribution network, improve the power supply reliability of the distribution system and increase the investment benefit of the distribution system.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a flowchart of a method for automatically controlling a power distribution network switch based on reinforcement learning theory according to an embodiment of the present invention.
Fig. 2 is an overall frame diagram of a system corresponding to an automatic control method for a power distribution network switch based on approximate dynamic programming according to an embodiment of the present invention.
FIG. 3 is a flow chart of an approximate dynamic programming algorithm according to an embodiment of the present invention.
FIG. 4 is a block diagram of an offline computing process according to an embodiment of the present invention.
FIG. 5 is a block diagram of an online computing process according to an embodiment of the invention.
Fig. 6 is an overall flowchart of a method for automatically controlling a power distribution network switch based on reinforcement learning theory according to an embodiment of the present invention.
Fig. 7 is a block diagram of a power distribution network switch automatic control system based on reinforcement learning theory according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the present invention, and should not be construed as limiting it.
The method and system for automatically controlling the switch of the power distribution network based on the reinforcement learning theory of the embodiment are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for automatically controlling a power distribution network switch based on reinforcement learning theory according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step S1: a Distflow power flow optimization constraint model is established, and an approximate dynamic programming reinforcement learning algorithm is determined.
The step S1 includes:
step S11: and establishing a Distflow power flow optimization constraint model according to the power distribution network topological structure constraint and the power distribution network power flow calculation basic theory.
Specifically, the network node structure of the distribution network is mainly tree-shaped and radial, supplemented by ring structures, and it is widely used in areas with light and medium load density. Considering both economy and ease of management, most distribution grids adopt a radial structure.
Assuming that a distribution network has n nodes and m connecting lines, the basic judgment equation of the "tree" structure of the distribution network is:
m=n-1 (1)
The above formula describes the basic requirement of a tree structure, but a spanning tree must also satisfy connectivity; a tree structure with connectivity is called a spanning tree. In the spanning tree, every node except the root node (the substation node) must have exactly one parent node. This requirement can be enforced with the following equations:
\beta_{ij} + \beta_{ji} = \alpha_l, \quad l = 1, 2, \ldots, m \quad (2)

\sum_{j \in N(i)} \beta_{ij} = 1, \quad i = 1, 2, \ldots, n \quad (3)

\beta_{0j} = 0, \quad j \in N(0) \quad (4)

\beta_{ij} \in \{0, 1\} \quad (5)

0 \le \alpha_l \le 1 \quad (6)
Two binary variables β_ij and β_ji are introduced for each connecting line of the tree structure, where β_ij = 1 means that node j is the parent node of node i and otherwise β_ij = 0; Σ denotes summation and N(i) denotes the set of all nodes connected to node i. In addition, the connection state (connected or disconnected) of any two nodes in the network is expressed by the variable α_l (also written α_ij), which ensures that the distribution network corresponds to a spanning tree connected to the main substation. The above formulas state, respectively, that line l actually exists in the spanning tree, that every node except the root node has exactly one parent node, and that the substation node, i.e. the root node, has no parent node. These five equations guarantee the connectivity of the tree structure, making it a spanning tree that models the structure of the distribution network.
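Purely for illustration, the following sketch (not part of the patent; the small five-node feeder data and the use of the networkx library are assumptions) checks the spanning-tree conditions above: m = n - 1, connectivity, and one parent per non-root node.

```python
import networkx as nx  # assumed helper library for the connectivity check

def is_spanning_tree(n_nodes, closed_lines, root=0):
    """Check the radial ("tree") conditions: m = n - 1, connectivity,
    and exactly one parent for every node except the root (substation)."""
    m = len(closed_lines)
    if m != n_nodes - 1:                      # basic judgement equation (1)
        return False
    g = nx.Graph()
    g.add_nodes_from(range(n_nodes))
    g.add_edges_from(closed_lines)
    if not nx.is_connected(g):                # a spanning tree must be connected
        return False
    # orient edges away from the root; every non-root node gets one parent
    parents = dict(nx.bfs_predecessors(g, root))
    return len(parents) == n_nodes - 1

# hypothetical 5-node feeder with 4 closed branches
print(is_spanning_tree(5, [(0, 1), (1, 2), (1, 3), (3, 4)]))  # True
```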
The automatic control method of the power distribution network switch meets the operation constraint conditions of the power distribution network in each decision time t, wherein the operation constraint conditions comprise network topology constraint, power balance constraint, power flow constraint, voltage limit constraint and line capacity constraint. The theoretical basis for meeting the constraint is a DistFlow power distribution network power flow calculation model. The power flow calculation of the power system can utilize parameters such as physical structures of nodes, voltage phasors, active power distribution, reactive power distribution, line loss and the like in the network system as operation conditions to determine the operation state of the whole power system.
The following formulas are obtained according to Kirchhoff's voltage law, the law of energy conservation and Ohm's law:
S_1 = S_0 - S_{loss1} - S_{L1} \quad (7)

V_1 \angle \theta = V_0 - z_1 I_0 \quad (9)
where S_i (i = 1, 2, ..., n) denotes the injection power of node i; for example, the injection power of node 0 is S_0 = P_0 + jQ_0, i.e. the injection power S_0 equals the complex sum of the injected active power P_0 and reactive power Q_0. S_{loss1} denotes the energy loss on the line from node 0 to node 1, and S_{L1} denotes the load on node 1; the injection power at node 1 equals the injection power at node 0 minus the line loss power and the load demand power. z_1 = r_1 + jx_1 denotes the impedance of the line connecting node 0 to node 1, with r_1 and x_1 the line impedance variables; the relationship between impedance and loss power follows from Ohm's law. V_1∠θ denotes the voltage phasor at node 1, whose relation to the voltage at node 0 is V_1∠θ = V_0 - z_1 I_0, where I_0 is the line current and θ is the phase angle.
And the following formula is obtained according to the power calculation formula:
where the line current satisfies I_0 = (P_0 - jQ_0)/V_0, with (P_0 - jQ_0) the conjugate form of S_0; substituting this together with z_1 = r_1 + jx_1 into V_1∠θ = V_0 - z_1 I_0 gives:

V_1 \angle \theta = V_0 - (r_1 + jx_1)(P_0 - jQ_0)/V_0 \quad (11)
Taking the modulus of both sides yields the following formula:
after simplification, the following formula can be obtained:
From the above analysis, the recurrence can be extended by analogy to any node i, giving the following DistFlow equations:

To reflect the actual conditions of power system operation, lowercase symbols are introduced into the above formulas for the total power consumed by the load at node j. Here p_j and q_j are the injected active and reactive power of node j respectively; the load terms denote the active and reactive power consumed by the electrical load at node j; the generation terms denote the active and reactive power supplied by a generator or DG source connected at node j; v_j is the voltage of node j; and r_j and x_j are the line impedance variables. The load power and the generator/DG output power at each node deserve particular attention.
A nonlinear Distflow equation has now been obtained. To better apply the above formulas in practical research, the following two assumptions are made:

Assumption 1: the nonlinear terms in the DistFlow model are very small and can be taken as 0.

Assumption 2: taking V_j ≈ V_0, the following formula can be obtained:
based on the two assumptions above, the nonlinear Distflow equation obtained above can be converted into a system of linear equations:
Thus a linearized Distflow power flow calculation model is obtained. Automatically controlling each branch switch of the distribution network essentially amounts to planning and designing the distribution network topology, and the following analysis and research are based on this linearized Distflow power flow calculation model.
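As a purely illustrative sketch (assuming a single radial feeder, per-unit quantities and made-up data; not the patent's implementation), the linearized DistFlow recursion can be evaluated node by node: branch flows accumulate the downstream net loads, and voltages drop by (r·P + x·Q)/V0.

```python
def linear_distflow(loads_p, loads_q, r, x, v0=1.0):
    """Linearized DistFlow on a radial feeder: node 0 is the substation,
    branch i connects node i to node i+1.  Loss terms are neglected
    (assumption 1) and V_j ~= V_0 is used in the voltage drop (assumption 2)."""
    n = len(loads_p)                      # number of load nodes 1..n
    P = [0.0] * n                         # active power on branch i -> i+1
    Q = [0.0] * n
    v = [v0] * (n + 1)                    # per-unit node voltages
    # branch flow = sum of all downstream net loads (no losses)
    for i in reversed(range(n)):
        P[i] = loads_p[i] + (P[i + 1] if i + 1 < n else 0.0)
        Q[i] = loads_q[i] + (Q[i + 1] if i + 1 < n else 0.0)
    # voltage drop along each branch, linearized around v0
    for i in range(n):
        v[i + 1] = v[i] - (r[i] * P[i] + x[i] * Q[i]) / v0
    return P, Q, v

# hypothetical 3-load feeder, per-unit data
print(linear_distflow([0.02, 0.03, 0.01], [0.01, 0.01, 0.005],
                      r=[0.05, 0.04, 0.03], x=[0.04, 0.03, 0.02]))
```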
Based on the Distflow power flow calculation model, the following power flow optimization constraint model is considered:
\min \sum_{j} p_{sd,j} \quad (21)

where the new physical quantities p_sd,i and q_sd,i denote the active and reactive load shedding at node i respectively. Load shedding, i.e. reducing load, refers to disconnecting part of the load from the grid in order to maintain the power balance and stability of the power system, typically when line faults or natural disasters occur. In addition, P_ik and Q_ik denote the active and reactive power flowing from node i to any node k; the generation terms denote the active and reactive power supplied to node i by a generator or DG source; P_d,i and q_d,i denote the active and reactive load demands at node i; the flow limits denote the maximum values of P_ji and Q_ji; and the voltage limits denote the minimum and maximum values of V_i. The relationship between p_sd,i and q_sd,i is expressed through the load power factor tan β. Using the constraint relations of this power flow optimization constraint model, for a distribution network of n nodes, given the voltage V_1 of the root node (node 1), the load demand P_d,i of each node, the distribution network topology and the impedance r_ji + jx_ji of each branch, the node voltages V_i and V_j, the branch power flows P_ji and Q_ji, and the active and reactive load shedding p_sd,i and q_sd,i are solved for optimally.
The on/off state of some lines in the distribution network can be controlled, so the following Distflow power flow optimization constraint model is obtained:

\min \sum_{j} p_{sd,j} \quad (30)

A variable μ_ji is introduced into the above model to represent the on/off state of a line: μ_ji = 1 indicates that the line is closed, otherwise μ_ji = 0.
Network reconfiguration is the essence of the automatic switch control strategy. It refers to changing the combined state of the sectionalizing switches and tie switches while the network operates normally, i.e. selecting users' power supply paths, in order to optimize the power flow distribution and improve supply reliability and economy. Problems such as optimal reconfiguration and fault reconfiguration of the distribution network are solved with this model.
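As an illustrative sketch only, the load-shedding objective and the line on/off variables μ_ji can be cast as a small mixed-integer program; Gurobi is the optimizer the patent later mentions for such subproblems, but the toy two-line network, the capacity figures, the variable names and the simplified constraints below are assumptions and omit the full DistFlow and radiality constraints.

```python
import gurobipy as gp
from gurobipy import GRB

def min_load_shedding(demand, line_cap):
    """Toy MILP: choose line states mu and shed amounts p_sd so that the
    served load on each line respects its capacity; the objective follows
    min sum(p_sd) from the flow optimization model above."""
    m = gp.Model("toy_reconfig")
    mu = m.addVars(len(line_cap), vtype=GRB.BINARY, name="mu")      # line on/off
    shed = m.addVars(len(demand), lb=0.0, name="p_sd")              # active load shedding
    # each load b is fed by line b in this toy example (assumption)
    for b, d in enumerate(demand):
        m.addConstr(d - shed[b] <= line_cap[b] * mu[b], name=f"cap_{b}")
    # crude stand-in for a topology rule: at least one of the two lines stays open
    m.addConstr(mu.sum() <= len(line_cap) - 1, name="open_one_line")
    m.setObjective(shed.sum(), GRB.MINIMIZE)
    m.optimize()
    return ([mu[i].X for i in range(len(line_cap))],
            [shed[b].X for b in range(len(demand))])

# hypothetical data: two loads of 0.8 and 0.5 p.u., two lines rated 1.0 p.u.
print(min_load_shedding([0.8, 0.5], [1.0, 1.0]))
```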
Step S12: a reinforcement learning algorithm of approximate dynamic programming is determined according to reinforcement learning theory and the Bellman optimality equation.
Reinforcement learning is a class of problems in the field of machine learning that aims to let an agent take optimal actions to maximize its return; it is used to find the best action to take in a given environment. Reinforcement learning differs from supervised learning in that, in supervised learning, the training data carry label values, so the model is trained with correct answers, whereas in reinforcement learning, although there are no labels, the agent can gradually learn the optimal actions or paths through trial and error.
Dynamic programming (Dynamic Programming, DP) originates from problems in engineering and finance, which tend to focus on continuous states and control decisions. In the artificial intelligence field, by contrast, DP is mainly concerned with discrete states and decisions. It is a model-based algorithm in reinforcement learning: under this algorithm the environment and model are known to the agent, and given a complete Markov decision process model the optimal strategy can be learned, so it is classified as a model-based approach.

DP involves many high-dimensional problems, such as the one studied here, which are typically tackled with mathematical programming tools; most of this work has focused on deterministic problems using linear, nonlinear or integer programming.
As with other reinforcement learning algorithms, the core idea of DP is still to find the optimal decision based on the state-value function. Under the DP algorithm, however, the state-value function is based on the Bellman optimality equation, from which the following can be derived:
where v_*(S_t), marked with an asterisk subscript, denotes the state-value function that satisfies value maximization; R_t is the immediate reward at time t; γ is the discount factor with γ ∈ [0,1], where the closer γ is to 1 the more weight is given to future returns and the closer γ is to 0 the more weight is given to the current return; G_{t+1} is the sum of all rewards from time t+1 onward; S_t is the state set and s the state at time t; A_t is the action set and a the action at time t; and E[·] denotes the expected value. Using the concept of expectation from probability theory, the equation can be rewritten as:

v_*(S_t) = \max_a \sum_{s',r} p(s', r \mid s, a) \, [r + \gamma G_{t+1}] \quad (40)

where max denotes maximization, s' is the state at the next time, r is the reward value fed back by the environment, and p(s', r | s, a) is the probability that the environment feeds back reward r and transitions to the next state s' given state s and action a.
Noting that G_{t+1} and v_*(S_{t+1}) are interchangeable here, the following are obtained:

v_*(S_t) = \max_a \sum_{s',r} p(s', r \mid s, a) \, [r + \gamma v_*(S_{t+1})] \quad (41)

v_*(S_t) = \max_a \{ R_t + \sum_{s',r} p(s', r \mid s, a) \, \gamma v_*(S_{t+1}) \} \quad (42)

The substituted equation is the prototype of the DP algorithm: it turns the Bellman equation into a recursively updated equation that approximates the ideal value function. On this basis every dynamic program can be written recursively, the recursion linking the state value v_t(S_t) at a given time t with the state value v_{t+1}(S_{t+1}) at the next time.
However, the DP algorithm has three "curses" in dimensions, complicating the solution process, mainly in three aspects: the state space S is too large to calculate the value function v in an acceptable time * (S t ) The method comprises the steps of carrying out a first treatment on the surface of the The decision space A is too large, the arrangement and combination of actions are too many, and the actions tend to rise exponentially, so that the optimal actions cannot be found quickly; the resulting space is too large to calculate the expected value for the future rewards.
Approximate dynamic programming is based on an algorithmic strategy that advances step by step through time and is therefore also called forward dynamic programming. The DP algorithm needs to convert the expectation to be solved into probability-distribution form, giving Σ_{s',r} p(s', r | s, a)[r + γ v_*(S_{t+1})]. Since an excessive number of MDP (Markov Decision Process) states makes the solution slow or even intractable, an approximate dynamic programming algorithm is proposed to deal with this: the state-value function v_*(S_t) is approximated and a new concept is introduced:

The post-decision state refers to the system state after the decision has been made but before any new information arrives; it is the state lying between S_t and S_{t+1} once decision a_t has been taken, and is recorded as the post-decision state variable.
After this concept is put forward, the above formula can be rewritten as:
Step S2: and acquiring historical output data of the photovoltaic and wind power distributed generation units, and establishing a distributed generation unit output power fluctuation and conversion model according to the historical output data.
In the step S2, before the distributed generation unit output power fluctuation and conversion model is established, the method further includes: the output power fluctuations and uncertainty conditions of each distributed generation unit are analyzed and quantified to take the quantified value as one of the state inputs of the power distribution network markov decision process MDP model.
The relevant historical data show that the active output of wind farms and photovoltaic plants is strongly random over long time scales, which motivates the study of how the output of new energy generation units such as wind and photovoltaic generation fluctuates with the natural environment.
The fluctuation and randomness of a wind farm's active power are affected by several objective factors, such as the region and the climate conditions where the wind farm is located and the spatial distribution and layout of the wind turbines. Using wind generation active output data with a 15 min sampling period, the daily average output of the wind generation is calculated with the following formula:

where P_daily-average and W_day denote the daily average output and the daily generation of the wind power respectively, and P(t) is the active output at time t; the daily average output of the wind farm is analyzed on this basis. The active power of the wind farm is affected by natural weather conditions in different seasons, and its fluctuation over the year is severe. Therefore, to select representative wind DG active output fluctuation data, the output data of a typical output day (daily generation = annual generation / 365 days) of the wind farm are chosen. On such a day the output peaks generally occur from 3:00 to 10:00 in the daytime and from 22:00 to 24:00 at night, while in the remaining periods the output is close to zero.
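For example, with a 15-minute sampling period the daily generation and daily average output can be computed as follows; the flat 2 MW series is made-up data and the calculation paraphrases the description above (the original formula was given as an image).

```python
def daily_average_output(p_15min_kw):
    """Daily generation W_day (kWh) and daily average output (kW) from
    96 active-power samples taken every 15 minutes."""
    assert len(p_15min_kw) == 96, "one sample every 15 min over 24 h"
    w_day = sum(p * 0.25 for p in p_15min_kw)   # each sample covers 0.25 h
    return w_day, w_day / 24.0                  # average power over the day

# made-up flat 2 MW wind output for illustration
w, p_avg = daily_average_output([2000.0] * 96)
print(w, p_avg)   # 48000.0 kWh, 2000.0 kW
```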
The fluctuation and randomness of a photovoltaic plant's active power are mainly caused by the sunshine hours, the altitude of the photovoltaic power plant and natural disasters (drought, heavy rain, frost). Likewise, actual photovoltaic active output data of a typical output day with sampling period t = 15 min are used. Weather conditions directly cause severe fluctuation of the photovoltaic plant's active output; the output on sunny days is far better than on overcast days; and the photovoltaic DG output is intermittent, with long continuous periods of zero output at night.
In this embodiment, after the distributed generation unit output power fluctuation and transformation model is established, the fluctuation and uncertainty of the output power of each distributed generation unit can be simulated through the distributed generation unit output power fluctuation and transformation model.
Specifically, the quantized value of the fluctuation of the output power of each distributed generation unit (DG) and the corresponding topology structure of the distribution network are regarded as the MDP state, and the generated output power of DG has high uncertainty.
A DG has k output levels in each time period; the larger k is, the finer the discretization of the DG output power analog quantity and the smaller the error. The probability of moving from output level k at time t to output level k' at time t+1 is denoted π_kk'; this transition probability can be obtained by Monte Carlo treatment of the DG historical data (for example, of the wind generation DG). If, for instance, output level k occurs m times at time t in the history and the transition from level k to level k' at time t+1 occurs n times, then π_kk' equals n/m.
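A minimal sketch of the counting estimate π_kk' = n/m from a discretized historical output sequence; the level boundaries, the data and the function names are assumed.

```python
from collections import defaultdict

def estimate_transition_matrix(levels, k):
    """Estimate pi_kk' = n/m: n = transitions from level k to k' at t+1,
    m = occurrences of level k at t, counted over a historical level sequence."""
    counts = defaultdict(int)
    occur = defaultdict(int)
    for a, b in zip(levels[:-1], levels[1:]):
        occur[a] += 1
        counts[(a, b)] += 1
    return [[counts[(i, j)] / occur[i] if occur[i] else 0.0
             for j in range(k)] for i in range(k)]

# hypothetical DG output history discretized into k = 3 levels
history = [0, 0, 1, 2, 1, 1, 0, 1, 2, 2, 1, 0]
for row in estimate_transition_matrix(history, 3):
    print(row)
```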
Step S3: the distribution network topology is determined; the active output data of the distributed generation units, the state information of the controllable sectionalizing switches and tie switches of the distribution network, and the calculation result information of the Distflow power flow optimization constraint model are acquired; and a distribution network Markov decision process MDP model is established according to the distributed generation unit output power fluctuation and conversion model, the distribution network topology and the information acquired in step S3.

The overall system framework corresponding to the automatic distribution network switch control method based on approximate dynamic programming is shown in fig. 2. Fig. 2 consists of two parts, the reinforcement learning algorithm and the distributed power distribution network. At each operating moment of the distributed power distribution network, external conditions such as weather and natural disasters together with the network topology are taken as the state parameter S_t, i.e. the external environment state input to the MDP process (concretely, external conditions such as weather and natural disasters affect the output of photovoltaic and wind generation), and the switching strategy A_t is then output. According to preset parameters such as the network line operation cost and the load shedding, the current reward value, i.e. the immediate reward R_t, is fed back. The reinforcement learning ADP algorithm uses the post-decision state and the forward dynamic algorithm to perform iterative calculation on these data, the real-time operating state of the distribution network and the accumulated reward value, and the calculation result is fed back to the agent. Each complete time horizon is one training run; after hundreds of training runs each decision yields a converged value, these values are stored in a table, and finally they are compared to find the optimal strategy at each moment.
The step of establishing a power distribution network markov decision process MDP model in step S3 includes:
step S31: and modeling the quantized value of the fluctuation condition of the output power of each distributed power generation unit and the corresponding distribution network topological structure into the state parameters of a distribution network Markov decision process MDP model.
State set S_{i,t}: the switches on each line within the distribution network topology are defined in turn as the network topology:

\Xi_t = [\,swt_1, swt_2, swt_3, \ldots, swt_n\,] \quad (45)

where swt_n denotes the current state of switch n; each switch has an open state and a closed state, represented by the binary digits 0 and 1 respectively. The topology of the distribution network at this moment can therefore be represented by the n-bit binary number Ξ_t. In the distribution network Markov decision model, the network topology at time t and the power output levels of the distributed generation units are defined as the state set:

S_{i,t} = [\,\Xi_t \mid k_{1,t}, k_{2,t}, k_{3,t}, \ldots, k_{dg,t}, \ldots, k_{DG,t}\,] \quad (46)

where Ξ_t denotes the network topology of the distribution network at time t and k_{dg,t} denotes the power output level of DG dg at time t.
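Illustratively, a state S_{i,t} can be stored as the n-bit switch word Ξ_t together with the DG output levels; the class and field names below are assumptions, not the patent's data structures.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class MdpState:
    """State set element: network topology word and DG output levels."""
    switch_word: Tuple[int, ...]   # Xi_t, one 0/1 entry per switch swt_1..swt_n
    dg_levels: Tuple[int, ...]     # k_{dg,t} for every distributed generation unit

    def topology_as_int(self) -> int:
        """Pack Xi_t into the n-bit binary number described above."""
        value = 0
        for bit in self.switch_word:
            value = (value << 1) | bit
        return value

# hypothetical 5-switch feeder with two DGs at output levels 2 and 0
s = MdpState(switch_word=(1, 1, 0, 1, 1), dg_levels=(2, 0))
print(s.topology_as_int())   # 27
```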
Step S32: the action states of the switches connected with each branch line of the power distribution network are modeled as action combination parameters of an MDP model of a Markov decision process of the power distribution network.
The action state of the switch comprises switching state of the switch at the current moment and maintaining the switching state at the current moment.
Action set A_t(S_{i,t}): each of the switches [swt_1, swt_2, swt_3, ..., swt_n] connecting the nodes has exactly two possible actions at each time t, namely changing the switch state held at time t-1 (from closed to open, or from open to closed) or keeping the switch state of time t-1; the action of a single switch is denoted a_swt1 = a_1 or a_2. Since every switch in the tree network has these two actions at any time, A_t(S_{i,t}) can be expressed as:

A_t(S_{i,t}) = [\,a_{swt1}, a_{swt2}, \ldots, a_{swti}\,] \quad (47)

where a_{swti} denotes the action of the i-th switch and S_{i,t} denotes the state set.
Step S33: the load shedding in the Distflow power flow optimization constraint model that considers distributed generation faults, together with the line operation cost, is selected and modeled as the reward function reference index of the distribution network Markov decision process MDP model.

Immediate reward function R(S_{i,t}, a_t): when a specific action a_t ∈ A_t(S_{i,t}) is applied at time t, the state of the network changes from s_t ∈ S_{i,t} to s_{t+1} ∈ S_{i,t}, where s_t and s_{t+1} denote the states at times t and t+1 respectively, and the system observes an immediate reward value R(S_{i,t}, a_t). This reward value is measured through the power flow calculation of the whole distribution network.

Specifically, the load shedding in the distribution network power flow calculation model (Distflow) that considers distributed generation faults can be selected as the reference index of the reward function R(S_{i,t}, a_t). R(S_{i,t}, a_t) represents the load-shedding cost and the line operation cost in the distribution system; its values are constant and negative, and the larger R(S_{i,t}, a_t) is, the smaller the required load-shedding cost. The amount of load shedding reflects the input/output power balance and the stability of the distribution system: the smaller it is, the better the operating requirements and economic budget of the distribution network are met.
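A small sketch of such a reward, combining a load-shedding cost term and a line operation cost term; the cost coefficients and the data are assumed and chosen negative so that R stays negative, as stated above.

```python
def immediate_reward(shed_by_node, closed_lines, c_shed=-10.0, c_line=-0.1):
    """R(S_{i,t}, a_t) as a sum of load-shedding cost and line operation cost;
    negative coefficients keep R negative, a larger R meaning less shedding."""
    shedding_cost = sum(c_shed * p for p in shed_by_node.values())
    line_cost = sum(c_line for mu in closed_lines if mu == 1)
    return shedding_cost + line_cost

# hypothetical: 0.05 p.u. shed at node 7, four lines closed
print(immediate_reward({7: 0.05}, [1, 1, 0, 1, 1]))   # -0.9
```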
Step S34: the state transition probabilities of the distribution network markov decision process MDP model are defined so as to take into account the uncertainty caused by the probability of change in the generated power output level of each distributed generation unit.
In this embodiment, the uncertainty caused by the change probability of the DG power output level is considered, and the state transition probability of the Markov decision process MDP model, i.e. the probability of moving from one state to another under a given action, is defined.
State transition probability P(S_{j,t+1} | S_{i,t}, a_t): taking into account the uncertainty caused by the change probability π_kk' of the DG power output level from time t to t+1, the probability that state S_{i,t} transitions to state S_{j,t+1} under action a_t can be expressed as:

P(S_{j,t+1} \mid S_{i,t}, a_t) = \pi_{kk'} \times P(\Xi_{t+1} \mid \Xi_t, a_t) \quad (48)

where P(Ξ_{t+1} | Ξ_t, a_t) denotes the probability that the distribution network topology Ξ_t at time t is transformed into Ξ_{t+1} under action a_t. Since the change of Ξ_t into Ξ_{t+1} is caused by network reconfiguration and the topology transformation is deterministic under a given reconfiguration operation, this probability is either 100% or 0%. π_kk' is the probability of moving from output level k at time t to output level k' at time t+1 when there is only one DG; if there are n DGs, the level-transition probabilities of all the DGs should be multiplied, i.e. the π_kk' term is written as a product over the n DGs.
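A sketch of equation (48) for several DGs: the topology factor is 1 or 0 because reconfiguration is deterministic, and the DG level-transition probabilities multiply; the matrices, topology encodings and names below are assumed.

```python
def transition_probability(dg_pis, levels_now, levels_next,
                           topo_next, expected_topo_next):
    """P(S_{j,t+1} | S_{i,t}, a_t) = prod_dg pi_kk' * P(Xi_{t+1} | Xi_t, a_t);
    the topology factor is 1 if reconfiguration a_t maps the current topology
    to topo_next (i.e. topo_next == expected_topo_next), otherwise 0."""
    prob = 1.0 if topo_next == expected_topo_next else 0.0
    for pi, k, k2 in zip(dg_pis, levels_now, levels_next):
        prob *= pi[k][k2]          # pi is the per-DG level transition matrix
    return prob

# hypothetical: two DGs sharing the same 2-level transition matrix
pi = [[0.8, 0.2], [0.3, 0.7]]
print(transition_probability([pi, pi], (0, 1), (0, 0),
                             topo_next=0b101, expected_topo_next=0b101))
# 0.8 * 0.3 * 1.0 = 0.24
```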
Step S4: and solving an MDP model of a power distribution network Markov decision process by adopting a reinforcement learning algorithm of approximate dynamic programming, and outputting an automatic control optimal strategy of a power distribution network switch in real time.
In the step S4, before outputting the automatic control optimal strategy of the power distribution network switch in real time, the method further includes: and performing offline iterative training on the MDP model of the Markov decision process of the power distribution network.
The step of performing offline iterative training on the power distribution network Markov decision process MDP model comprises the following steps: and inputting active output data of the distributed power generation unit, outputting an automatic control strategy of a power distribution network switch, and feeding back an evaluation index of the real-time running condition of the power distribution network.
According to the Markov model of the distribution network established in step S3 and the Bellman optimality equation, and considering the uncertainty and fluctuation of the DG output, the dynamics of the whole system are given by the four-argument probability distribution P(S_{j,t+1} | S_{i,t}, a_t). The recursively optimized state-value function can be expressed as:
where γ is the discount factor with γ ∈ [0,1], the closer γ is to 1 the more weight is given to future returns and the closer γ is to 0 the more weight is given to the current return; R(S_{i,t}, a_t) is the immediate reward fed back by the environment in the current state S_{i,t} under action a_t, and v_{t+1}(S_{j,t+1}) is the state-value function of the next-time state S_{j,t+1}. Setting γ = 1 and moving R(S_{i,t}, a_t) outside the brackets gives:
wherein,,representing the state S at the current time i,t And action a t Under the precondition, the next time state S j,t+1 The expected value of the state-cost function of (a), the instant benefit function R (S i,t ,a t ) Consists of two parts: the cost of load shedding and the line operation cost in the power distribution system are constant. So R (S) i,t ,a t ) Can be expressed as:
R(S i,t ,a t )=∑ b∈B (c b *p sd,b )+∑ l∈L (c 1 *μ b,b′ ) (52)
wherein B is node set, c b To cut off the load cost factor, p sd,b For active load shedding at node b, c 1 Mu, as a line operation cost coefficient b,b′ Introducing The concept of The post-decision state for The line operation cost from node b to node b', S i,t ,a t Rewriting into form of state variable after decisionWill beDefined as->Thereby realizing the form conversion of the logarithmic expected solving process, and obtaining the following formula:
wherein,,representing post-decision state->Is used to determine the state-value of the state-value of the value,indicating the last time state S j,t-1 And action a t-1 Under the precondition, the current moment state S i,t Expected value of state-cost function, R t+1 (S j,t+1 ,a t+1 ) Representing the next time state S j,t+1 And action a t+1 Immediate prize value for lower environmental feedback, +.>Representing post-decision state->The above equation converts multi-periodic and large scale MDP-based stochastic models into single-period deterministic models for each state in each decision period and can be solved by iteration +.>
A variable n is introduced to denote the nth iteration, giving the following formula:

where the terms denote the state-value function of state S_{i,t} in the nth iteration and the post-decision state in the nth iteration, respectively. Combining the post-decision state-value function with the recursive form of the state-value function, the forward dynamic algorithm is then used to update the state-value function values:

where the terms denote the state-value of the post-decision state in the nth iteration, the state-value function of that post-decision state, and the state S_{j,t} in the nth iteration, respectively; the coefficient α is a smoothing parameter smaller than 1. By applying this update formula repeatedly over a sufficient number of iterations, the state-value of each post-decision state converges to a corresponding value. The values of the different switching actions in each Markov state are thus obtained, and finally the optimal actions and decisions are found by comparing these values.
The pseudo-code of the ADP algorithm is as follows (an illustrative code sketch follows these steps):

Step 1a: set the network topology state and the DG fluctuating output state.

Step 1c: set the step size α ∈ (0,1) and the number of iteration rounds N.

Step 2: for t = 1, 2, ..., T, do:

Step 2a: solve R(S_{i,t}, a_t) as a mixed-integer linear program with the Gurobi mathematical programming optimizer and obtain the post-decision value, substituting each candidate action a_t to solve the maximization problem.

Step 2c: obtain the post-decision state from state S_{i,t} and action a_t; judge whether t is smaller than T and, if so, execute the following step;

Step 2d: according to the DG fluctuation and the network topology, move from the post-decision state to the Markov state S_{j,t+1} of the next moment.
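The steps above can be sketched as the following offline training loop; the callback names, the smoothing update and the toy usage are assumptions (e.g. solve_decision standing in for the Gurobi single-period subproblem of Step 2a), not the patent's exact implementation.

```python
import random

def adp_offline(initial_state, horizon, n_iters, alpha,
                solve_decision, sample_next_state):
    """Forward ADP training loop sketched from the pseudo-code above:
    at each t the single-period problem is solved greedily against the
    current post-decision value table, and that table is smoothed with
    step size alpha.  solve_decision and sample_next_state are assumed
    user-supplied callbacks (e.g. an MILP solve and a DG-level sampler)."""
    v_post = {}                                     # post-decision state values
    for n in range(n_iters):
        state = initial_state
        for t in range(horizon):
            # Step 2a: pick the action maximising reward + current value estimate
            action, value_hat, post_state = solve_decision(state, v_post)
            # forward dynamic update with smoothing parameter alpha
            old = v_post.get(post_state, 0.0)
            v_post[post_state] = (1.0 - alpha) * old + alpha * value_hat
            # Step 2d: sample the next Markov state from the DG fluctuation model
            state = sample_next_state(post_state)
    return v_post

# toy usage with dummy callbacks just to show the calling convention
dummy = lambda s, v: ("keep", random.random(), s)
step = lambda s: s
print(len(adp_offline("s0", horizon=4, n_iters=10, alpha=0.5,
                      solve_decision=dummy, sample_next_state=step)))  # 1
```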
The approximate dynamic programming algorithm flow chart is shown in fig. 3.
The learning and training of the automatic distribution network switch control MDP model under DG output power fluctuation is an iterative offline calculation process, whose flow chart is shown in fig. 4. The purpose of the offline computation is to obtain converged post-decision state values, which is achieved by the approximate dynamic programming algorithm. The input of the offline computing part comprises the network topology of the distribution system and the uncertainty and fluctuation of the DG output power. This information is fed into the ADP algorithm, which returns the post-decision state values; this is a multi-period process that iterates repeatedly and finally converges.
In step S4, after outputting the power distribution network switch automatic control optimal strategy in real time, the method further includes: and displaying decision result characters and displaying the network topology structure of the real-time power distribution network on the man-machine interaction interface.
After the training and learning of the automatic distribution network switch control MDP model under DG output power fluctuation are finished, the optimal switching strategy needs to be predicted automatically. The prediction and decision stage is an online calculation process, whose flow chart is shown in fig. 5.

Online computing enables the agent to obtain the optimal switching strategy of the distribution network from the real-time Markov state observed at each moment. As shown in fig. 5, the post-decision state values output by the offline computation and the Markov state S_{i,t} observed in each time period are taken as the input of the single-period deterministic model; the objective function is still R(S_{i,t}, a_t), the constraints are the distribution network structure and power flow constraints proposed in step S1, and the model then returns the post-decision state value and the best action a_t(S_{i,t}). In this way the online computing process obtains, at each decision moment, the value-maximizing strategy based on the real-time Markov state S_{i,t}.
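A sketch of this online phase: at each decision time the observed Markov state and the converged post-decision values feed a single-period choice whose best action is returned; the callbacks and the toy call are assumed.

```python
def online_decision(observed_state, v_post, candidate_actions,
                    reward_fn, post_state_fn):
    """Single-period deterministic choice: maximise R(S_{i,t}, a_t) plus the
    converged post-decision value of the resulting post-decision state."""
    best_action, best_value = None, float("-inf")
    for a in candidate_actions:
        post = post_state_fn(observed_state, a)
        value = reward_fn(observed_state, a) + v_post.get(post, 0.0)
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value

# toy call: two candidate switch actions on an abstract state
print(online_decision("s", {("s", "open"): 1.0}, ["open", "keep"],
                      reward_fn=lambda s, a: -0.2,
                      post_state_fn=lambda s, a: (s, a)))   # ('open', 0.8)
```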
Fig. 6 is the overall flowchart of the automatic distribution network switch control method based on reinforcement learning theory. As shown in fig. 6, it consists of four parts: deriving the relevant theory, establishing the automatic distribution network switch control MDP model under DG fluctuation, solving the MDP model, and organizing the analysis results into a human-machine interaction APP. The theory derivation part derives the Bellman equation from the ideas of reinforcement learning and the Markov decision model, and introduces the basic principles of dynamic programming and approximate dynamic programming based on it. The MDP modeling part accounts for how new energy generation units such as wind and photovoltaic generation fluctuate with the natural environment and establishes the Markov decision model for automatic distribution network switch control: first a fluctuation and conversion model between the different output levels of the distributed generation units is established, then the distribution network MDP model is built following the MDP modeling idea of reinforcement learning theory, and finally the corresponding program is written according to the structural constraints of the distribution network and the Distflow power flow calculation constraints. The MDP solving part addresses the problems and difficulties that arise when traditional reinforcement learning algorithms solve the MDP model and proposes the automatic distribution network switch control ADP algorithm; its core is to introduce the post-decision state and the forward dynamic algorithm, so as to convert the multi-period, large-scale MDP-based stochastic model into a single-period deterministic model for each state in each decision period. Finally, a human-machine interaction APP (Application) is written, realizing interaction between the user and the automatic distribution network switch control software system, so that the text and picture results of the optimal decisions can be viewed directly.
Fig. 7 is a block diagram of a power distribution network switch automatic control system based on reinforcement learning theory according to an embodiment of the present invention. As shown in fig. 7, the power distribution network switch automatic control system 100 based on reinforcement learning theory includes an establishing module 10, a determining module 20, an obtaining module 30, and a calculating module 40, where the establishing module 10 is connected to the determining module 20 and the obtaining module 30, and the calculating module 40 is also connected to the determining module 20.
The establishing module 10 is configured to establish the Distflow power flow optimization constraint model. The determining module 20 is configured to determine the reinforcement learning algorithm of approximate dynamic programming. The obtaining module 30 is configured to obtain historical output data of the photovoltaic and wind power distributed generation units, so that the establishing module 10 builds the output power fluctuation and conversion model of the distributed generation units from the historical output data. The establishing module 10 is further configured to establish the power distribution network Markov decision process MDP model according to the power distribution network topology determined by the determining module 20, the active power output data of the distributed generation units, the status information of the controllable sectionalizing switches and tie switches of the power distribution network, the calculation results of the Distflow power flow optimization constraint model, and the output power fluctuation and conversion model of the distributed generation units. The calculating module 40 is configured to solve the power distribution network Markov decision process MDP model with the approximate dynamic programming reinforcement learning algorithm and to output the optimal strategy for automatic control of the power distribution network switches in real time.
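The output power fluctuation and conversion model that the obtaining module and the establishing module cooperate on can be pictured, assuming the historical output of each photovoltaic or wind unit is discretized into a small number of output levels, as estimating a level-to-level Markov transition matrix from the historical series. This is a simplified illustration under that assumption, not the patent's exact procedure:

```python
import numpy as np

def estimate_transition_matrix(historical_output, level_edges):
    """Estimate an output-level transition matrix from historical DG output.

    historical_output: 1-D array of measured output power over consecutive periods.
    level_edges: bin edges used to discretize the output into discrete levels.
    Returns a row-stochastic matrix P where P[i, j] approximates the probability
    of moving from output level i to output level j in one period.
    """
    levels = np.digitize(historical_output, level_edges)   # map each sample to a level index
    n = len(level_edges) + 1
    counts = np.zeros((n, n))
    for current, nxt in zip(levels[:-1], levels[1:]):
        counts[current, nxt] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

Estimating one such matrix per distributed generation unit gives the change probabilities of the power output levels from which the state transition probabilities of the MDP model can be assembled.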
It should be noted that, to avoid redundancy, the specific implementation of the power distribution network switch automatic control system based on reinforcement learning theory may refer to the specific implementation of the power distribution network switch automatic control method described above, and it is not described again here.
In summary, the invention provides a method and a system for automatic control of power distribution network switches based on reinforcement learning theory. The method fully considers the output power fluctuation of renewable energy sources such as wind power and photovoltaic generation, establishes a Markov decision process MDP model for automatic control of the power distribution network switches, combines a general distribution network configuration model with power supply reliability, tracks the output power fluctuation of each distributed generation unit, and provides an efficient and stable optimization strategy for automatic switch control. The MDP process is used to improve economy and reliability, reduce losses and balance loads, completing network reconfiguration of a power distribution network containing distributed generation units; on the basis of this model, an optimal strategy is established dynamically in each time interval according to the real-time state and the state transition probabilities. The approximate dynamic programming ADP algorithm is adopted to solve the MDP model, which overcomes the curse of dimensionality. The invention thus addresses output power fluctuation, outages and faults of the distributed generation units in the power distribution network, improves the power supply reliability of the power distribution system, and improves its investment benefit.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the present invention has been described in detail through the foregoing description of the preferred embodiment, it should be understood that the foregoing description is not to be considered as limiting the invention. Many modifications and substitutions of the present invention will become apparent to those of ordinary skill in the art upon reading the foregoing. Accordingly, the scope of the invention should be limited only by the attached claims.
Claims (10)
1. A power distribution network switch automatic control method based on reinforcement learning theory, characterized by comprising the following steps:
step S1: establishing a Distflow power flow optimization constraint model, and determining a reinforcement learning algorithm of approximate dynamic programming;
step S2: acquiring historical output data of photovoltaic and wind power distributed generation units, and establishing a distributed generation unit output power fluctuation and conversion model according to the historical output data;
step S3: determining the power distribution network topology, acquiring active power output data of the distributed generation units, status information of the controllable sectionalizing switches and tie switches of the power distribution network, and calculation result information of the Distflow power flow optimization constraint model, and establishing a power distribution network Markov decision process MDP model according to the distributed generation unit output power fluctuation and conversion model, the power distribution network topology and the information acquired in this step;
step S4: solving the power distribution network Markov decision process MDP model with the reinforcement learning algorithm of approximate dynamic programming, and outputting the optimal strategy for automatic control of the power distribution network switches in real time.
2. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 1, wherein the step S1 comprises:
step S11: establishing the Distflow power flow optimization constraint model according to the power distribution network topological structure constraints and the basic theory of power distribution network power flow calculation;
step S12: determining the reinforcement learning algorithm of approximate dynamic programming according to reinforcement learning theory and the Bellman optimality equation.
3. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 1, wherein in the step S2, before establishing the distributed generation unit output power fluctuation and conversion model, the method further comprises: analyzing and quantifying the output power fluctuation and uncertainty of each distributed generation unit, so that the quantified values can be taken as inputs to the power distribution network Markov decision process MDP model.
4. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 3, wherein the fluctuation and uncertainty of the output power of each distributed generation unit can be simulated through the distributed generation unit output power fluctuation and conversion model.
5. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 4, wherein the step S3 comprises:
step S31: modeling the quantized output power fluctuation status of each distributed generation unit and the corresponding power distribution network topology as the state parameters of the power distribution network Markov decision process MDP model;
step S32: modeling the action states of the switches connected to each branch line of the power distribution network as the action combination parameters of the power distribution network Markov decision process MDP model;
step S33: selecting the cut load in the Distflow power flow optimization constraint model considering distributed generation faults, and modeling it together with the line operation cost as the reward function reference index of the power distribution network Markov decision process MDP model;
step S34: defining the state transition probabilities of the power distribution network Markov decision process MDP model so as to account for the uncertainty caused by the change probabilities of the power output level of each distributed generation unit.
6. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 1, wherein in step S4, before outputting the optimal strategy for automatic control of the power distribution network switches in real time, the method further comprises: performing offline iterative training on the power distribution network Markov decision process MDP model.
7. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 6, wherein the step of performing offline iterative training on the power distribution network Markov decision process MDP model comprises: inputting the active power output data of the distributed generation units, outputting a power distribution network switch automatic control strategy, and feeding back an evaluation index of the real-time operating condition of the power distribution network.
8. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 1, wherein in the step S4, after outputting the optimal strategy for automatic control of the power distribution network switches in real time, the method further comprises: displaying the decision result as text and displaying the real-time network topology of the power distribution network on the human-machine interaction interface.
9. The method for automatically controlling a power distribution network switch based on reinforcement learning theory according to claim 5, wherein the action states of a switch comprise: the switch changing its state at the current moment and the switch maintaining its state at the current moment.
10. An automatic control system for power distribution network switches based on reinforcement learning theory, characterized by comprising:
an establishing module, used for establishing a Distflow power flow optimization constraint model;
a determining module, used for determining a reinforcement learning algorithm of approximate dynamic programming;
an obtaining module, used for obtaining historical output data of photovoltaic and wind power distributed generation units, so that the establishing module establishes a distributed generation unit output power fluctuation and conversion model according to the historical output data;
the establishing module being further used for establishing a power distribution network Markov decision process MDP model according to the power distribution network topology determined by the determining module, the active power output data of the distributed generation units, the status information of the controllable sectionalizing switches and tie switches of the power distribution network, the calculation result information of the Distflow power flow optimization constraint model, and the distributed generation unit output power fluctuation and conversion model; and
a calculating module, used for solving the power distribution network Markov decision process MDP model according to the reinforcement learning algorithm of approximate dynamic programming and outputting the optimal strategy for automatic control of the power distribution network switches in real time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310099843.0A CN116260143A (en) | 2023-02-06 | 2023-02-06 | Automatic control method and system for power distribution network switch based on reinforcement learning theory |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116260143A (en) | 2023-06-13 |
Family
ID=86687421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310099843.0A Pending CN116260143A (en) | 2023-02-06 | 2023-02-06 | Automatic control method and system for power distribution network switch based on reinforcement learning theory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116260143A (en) |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |