CN116542137A - Multi-agent reinforcement learning method for distributed resource cooperative scheduling - Google Patents
- Publication number
- CN116542137A (application CN202310401017.7A)
- Authority
- CN
- China
- Prior art keywords
- power
- distributed
- ess
- reinforcement learning
- agent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/04—Constraint-based CAD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/04—Power grid distribution networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
Abstract
The invention discloses a multi-agent reinforcement learning method for distributed resource cooperative scheduling, comprising: establishing a simulation environment of a distribution network with distributed equipment access; constructing reinforcement learning agents for the different distributed devices; training the agents through interaction with the simulation environment; and making decisions with the trained agents. By training on historical data and exploiting the strong data-fitting capability of neural networks, decisions can be made accurately and quickly without knowing all parameters of the distributed-equipment aggregation models. The invention enables bidirectional interaction between users and the power grid through electric vehicle aggregators, distributed photovoltaic devices and energy storage, and overcomes the inaccurate decisions caused by the excessive optimization times and incomplete parameter perception of traditional optimization methods.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a multi-agent reinforcement learning method for distributed resource collaborative scheduling.
Background
At present, the power distribution network, as the main carrier of new-energy consumption, has many branches and a complex line structure. Connecting large numbers of distributed controllable resources inevitably makes the grid's operation modes diverse and complex, while allowing users to interact bidirectionally with the grid through the distributed controllable devices. Most current research is based on building aggregation models of the distributed devices and studying electricity-price incentive mechanisms. When the grid cannot fully perceive all parameters of the underlying aggregation model, decision making becomes very difficult and an optimal decision for the current state is hard to obtain; meanwhile, the non-convexity and high uncertainty of coordinating and optimizing the grid's distributed photovoltaic devices and electric vehicles lead to excessive solution times that cannot meet regulation requirements. It is therefore worth exploring whether an intelligent method can overcome these shortcomings of distributed optimization methods.
In recent years, with the rise and development of artificial intelligence, reinforcement learning has become an important scientific paradigm for solving sequential decision problems: value estimates and policy choices are updated through continuous trial-and-error interaction with the environment. In particular, deep reinforcement learning (Deep Reinforcement Learning, DRL), which combines deep neural networks with reinforcement learning, offers strong adaptive learning capability and optimal decision-making capability for non-convex, nonlinear problems, providing a new approach to the cooperative scheduling of distributed controllable resources in complex power systems.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract and in the title of the application to avoid obscuring their purpose; such simplifications and omissions should not be used to limit the scope of the invention.
The present invention has been made in view of the above-described problems occurring in the prior art.
Therefore, the invention provides a multi-agent reinforcement learning method for distributed resource collaborative scheduling that solves the inaccurate decisions caused by the excessive optimization times and incomplete parameter perception of traditional optimization methods.
In order to solve the technical problems, the invention provides a multi-agent reinforcement learning method for distributed resource collaborative scheduling, which comprises the following steps:
establishing a simulation environment of the distributed equipment access distribution network;
constructing intelligent agents for reinforcement learning of different distributed equipment;
the intelligent agent and the simulation environment are trained interactively;
and making a decision through the trained agent.
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the simulation environment specifically comprises:
the distribution network accessed by the distributed equipment needs to satisfy the power system power flow equation constraints, the voltage safety and stability constraints, the energy storage device operation constraints, the distributed photovoltaic device constraints and the electric vehicle aggregator constraints; after the distributed devices are connected, the decisions they give are evaluated and returned to the agents in the form of reward values.
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the power system power flow equation constraints are the node power balance equations

P_mt,i,t + P_pv,i,t + P_ess,i,t + P_EVA,i,t - P_load,i,t = U_i,t ∑_j U_j,t (G_ij cos θ_ij,t + B_ij sin θ_ij,t)
Q_mt,i,t - Q_load,i,t = U_i,t ∑_j U_j,t (G_ij sin θ_ij,t - B_ij cos θ_ij,t)

where P_mt,i,t and Q_mt,i,t are the active and reactive power of the generator set at node i at time t; P_load,i,t and Q_load,i,t are the active and reactive load of node i at time t; P_pv,i,t, P_ess,i,t and P_EVA,i,t are respectively the active power of the distributed photovoltaic, the energy storage and the electric vehicle aggregator at node i at time t; U_i,t and U_j,t are the voltage magnitudes of nodes i and j; θ_ij,t is the phase-angle difference between the two nodes; G_ij and B_ij are respectively the conductance and susceptance between nodes i and j;

the energy storage device operating constraints:

Soc_ess,i,min ≤ Soc_ess,i,t ≤ Soc_ess,i,max
P_ess,i,t² + Q_ess,i,t² ≤ S_ess,i,max², |P_ess,i,t| ≤ P_ess,i,max, |Q_ess,i,t| ≤ Q_ess,i,max
E_ess,i,t+Δt = E_ess,i,t + (η_c·P_ess,i,t^ch - P_ess,i,t^dis/η_d)·Δt, Soc_ess,i,t = E_ess,i,t/E_ess,i

where E_ess,i is the energy storage capacity at node i; S_ess,i,max, P_ess,i,max and Q_ess,i,max are respectively the apparent power, active power and reactive power upper limits at node i; Soc_ess,i,max and Soc_ess,i,min are the upper and lower limits of the energy storage state of charge, and Soc_ess,i,t is its value at node i at time t; η_c and η_d are the charging and discharging efficiencies of the energy storage; E_ess,i,t is the energy stored at node i at time t, P_ess,i,t^ch and P_ess,i,t^dis are the charging and discharging powers; and Δt denotes the length of one time step;
the distributed photovoltaic device constraints are such that,
P pv,i,min <P pv,i,t <P pv,i,max
p in the formula pv,i,max And P pv,i,min Respectively representing the maximum power and the minimum power which can be output by the distributed photovoltaic equipment of the node i at the t moment, P pv,i,t, Representing the output power of the distributed photovoltaic device at node i at time t;
the electric vehicle aggregator constraints are that,
p in the formula up,t And P down,t Respectively represents the adjustable capacity of the electric automobile polymerizer participating in the power down-regulation and up-regulation control at the t-th moment,maximum output power, P, of electric automobile polymerizer ev,t The output power of the electric automobile aggregator at the time t is obtained.
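The operating constraints above lend themselves to simple feasibility checks. The following sketch captures the voltage, photovoltaic and energy-storage conditions; the function names, default efficiencies and limit values are illustrative assumptions, not taken from the patent:

```python
def voltage_ok(v, v_nominal=1.0):
    """Voltage safety constraint: 0.95*v_N < v_i < 1.05*v_N (per unit)."""
    return 0.95 * v_nominal < v < 1.05 * v_nominal

def pv_ok(p_pv, p_min, p_max):
    """Distributed PV output constraint: P_pv,min < P_pv,t < P_pv,max."""
    return p_min < p_pv < p_max

def soc_next(e_now, p_charge, p_discharge, eta_c=0.95, eta_d=0.95, dt=1.0):
    """Energy-storage dynamics: stored energy gains eta_c per unit charged
    and loses 1/eta_d per unit discharged over a step of length dt."""
    return e_now + eta_c * p_charge * dt - p_discharge * dt / eta_d
```

A scheduling step would evaluate these checks after every joint action and switch the safety reward to a negative value whenever one of them fails.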
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the agent construction comprises:
the reinforcement learning agents of the different distributed devices, the states they acquire from the simulation environment, their output action spaces, and their reward functions.
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the agents for reinforcement learning of the different distributed devices further satisfy:
the reinforcement learning agents of the different distributed devices have their own state and action spaces, and each agent updates its parameters according to its own objective to achieve adaptive learning.
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the state space comprises:

S = {P_load, P_pv,max, P_EVA,max, P_mt, SOC_ess, a, t}

where P_load, P_pv,max, P_EVA,max, P_mt and SOC_ess are respectively the electric-load powers, the output upper limits of the distributed photovoltaic devices, the output upper limits of the electric vehicle aggregators, the outputs of the conventional units and the states of charge of the energy storage (each a vector over the corresponding set of devices), a is the grid electricity price at the current moment, and t is the scheduling time slot.
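For illustration, the state vector can be assembled by concatenating the per-device quantities with the scalar price and time slot; the function name and argument layout below are assumptions, not the patent's own code:

```python
import numpy as np

def build_state(p_load, p_pv_max, p_eva_max, p_mt, soc_ess, price, t):
    """Concatenate S = {P_load, P_pv,max, P_EVA,max, P_mt, SOC_ess, a, t}
    into one flat observation vector for an agent."""
    return np.concatenate([
        np.asarray(p_load, dtype=float),     # load power at each load node
        np.asarray(p_pv_max, dtype=float),   # PV output upper limits
        np.asarray(p_eva_max, dtype=float),  # EV-aggregator output limits
        np.asarray(p_mt, dtype=float),       # conventional unit outputs
        np.asarray(soc_ess, dtype=float),    # storage states of charge
        [float(price), float(t)],            # current price and time slot
    ])
```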
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the action space includes:
a 1 =a ess,ess
a 2 =a pv,pv
a 3 =a EVA,EVA
wherein a is ess,|ess| ,a pv,|pv| ,a EVA,|EVA| Respectively representing the real-time energy storage output of the model, the output of the distributed photovoltaic equipment, the output of an electric automobile polymerizer and the output of the neural network, wherein the range of the value of the neural network output is [ -1,1]It is necessary to map back to the real action space according to the real physical constraints.
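The mapping of a network output in [-1, 1] back to the physical action range can be sketched as a simple affine rescaling; this is an assumed implementation of the mapping step, not code from the patent:

```python
def scale_action(a_norm, a_min, a_max):
    """Map a neural-network output in [-1, 1] to the physical range
    [a_min, a_max]; out-of-range outputs are clipped for safety."""
    a_norm = max(-1.0, min(1.0, a_norm))
    return a_min + (a_norm + 1.0) * 0.5 * (a_max - a_min)
```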
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the interactive training comprises:
historical source-load data in the simulation environment are used as samples for interaction with the agents; each distributed-equipment reinforcement learning agent takes actions and learns according to the current state of the distribution network, and updates its policy by gradient descent on the reward values fed back by the simulation environment of the distribution network, exploring the decisions that maximize the reward value.
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: the reward value comprises setting a distributed photovoltaic device reward function, an energy storage device reward function and an electric vehicle aggregator reward function;
the distributed photovoltaic device reward function is

r_1 = r_normal + a·P_pv,out + b·P_pv,delta

where r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_pv,delta represents the reserve capacity of the photovoltaic cluster, P_pv,out represents the output power of the photovoltaic cluster, a represents the time-of-use electricity price, and b represents a discount coefficient;
the energy storage device reward function is

r_2 = r_normal + a_1·η_1·P_ess,in + a_2·η_2·P_ess,out

where r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_ess,in represents the charging power (expressed as a negative value), a_1 represents the electricity purchase price, η_1 the charging efficiency, a_2 the electricity selling price, and η_2 the discharging efficiency;
the electric vehicle aggregator reward function is

r_3 = r_normal + a_1·P_EVA,in + r_DSO

where r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_EVA,in represents the charging power the electric vehicle aggregator purchases from the grid (expressed as a negative value), a_1 represents the electricity purchase price, and r_DSO represents the reward the grid gives the electric vehicle aggregator for participating in peak shaving and frequency regulation.
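The three reward expressions translate directly into code. The sign convention (charging powers negative) follows the description; every numerical value in the sketch is made up for illustration:

```python
def pv_reward(r_normal, price_a, p_pv_out, discount_b, p_pv_delta):
    """r1 = r_normal + a*P_pv,out + b*P_pv,delta."""
    return r_normal + price_a * p_pv_out + discount_b * p_pv_delta

def ess_reward(r_normal, buy_a1, eta1, p_in, sell_a2, eta2, p_out):
    """r2 = r_normal + a1*eta1*P_ess,in + a2*eta2*P_ess,out (P_ess,in <= 0)."""
    return r_normal + buy_a1 * eta1 * p_in + sell_a2 * eta2 * p_out

def eva_reward(r_normal, buy_a1, p_eva_in, r_dso):
    """r3 = r_normal + a1*P_EVA,in + r_DSO (P_EVA,in <= 0)."""
    return r_normal + buy_a1 * p_eva_in + r_dso
```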
As a preferable scheme of the multi-agent reinforcement learning method for the distributed resource collaborative scheduling, the invention comprises the following steps: making a decision by the trained agent comprises:
connecting the trained agents to the distribution network environment, analyzing the data collected by the grid data-acquisition system in real time, and making decisions according to the current state information of the distribution network, including the outputs of the conventional units, the predicted new-energy power, and the current output power, load and energy storage state-of-charge quantities.
The invention has the following beneficial effects: the proposed algorithm first obtains the operating state of the distribution network, including source-grid-load data, from the distribution network side and inputs it into the different agents; each agent corrects its policy according to its own reward value so as to maximize the value of the evaluation network. By learning through continuous interaction with the environment, while requiring only the grid state at the application stage, bidirectional interaction and coordinated operation of user-side resources and the grid are achieved. Reasonable decisions can thus be made without knowing the internal parameters of the electric vehicle aggregators or the photovoltaic energy storage clusters, and in online application fast, accurate decisions are completed using only the real-time grid operating state at the current moment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flowchart of a multi-agent reinforcement learning method for distributed resource collaborative scheduling according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a multi-agent reinforcement learning method for collaborative scheduling of distributed resources according to an embodiment of the present invention, wherein the upper part of the diagram is a multi-agent reinforcement learning model, and the lower part of the diagram is a bidirectional interaction environment between a power distribution network and distributed equipment;
FIG. 3 is a schematic diagram of a multi-agent reinforcement learning model input state, decision output and policy update process of a multi-agent reinforcement learning method for distributed resource collaborative scheduling according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an IEEE30 node simulation environment of a multi-agent reinforcement learning method for distributed resource co-scheduling according to embodiment 2 of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention will be described in detail with reference to the drawings, for convenience of description the cross-sectional views of device structures are not partially enlarged to a common scale; the drawings are merely illustrative and should not limit the scope of the invention. In addition, the three dimensions of length, width and depth should be taken into account in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1 and 2, a first embodiment of the present invention provides a multi-agent reinforcement learning method for distributed resource cooperative scheduling, including:
s1: constructing a distribution network simulation environment:
establishing a simulation environment of the distribution network with distributed equipment access, wherein the distribution network needs to satisfy the power system power flow equation constraints, the voltage safety and stability constraints, the energy storage device operation constraints, the distributed photovoltaic device constraints and the electric vehicle aggregator constraints; after the distributed devices are connected, the decisions they give are evaluated and returned to the agents in the form of reward values;
the constraints are as follows:
constraint of power flow equation of power distribution network
Wherein P is mt,i,t And Q mt,i,t The active power and the reactive power of the node i generator set at the time t are calculated; p (P) load,i,t ,Q load,i,t The active load and the reactive load of the node i at the moment t are obtained; p (P) pv,i,t ,P ess,i,t ,P EVA,i,t Active power of distributed photovoltaic, energy storage and electric automobile aggregators of the node i at the time t is respectively; u (U) i,t The voltage modulus of the node i; θ i,t Is the phase angle difference between two nodes; g ij ,B ij Conductivity and susceptance between nodes i, j, respectively 。
Power distribution network voltage safety and stability constraint
In order to ensure the quality of the power supply voltage, voltage safety and stability constraints are set as follows:

v_i,min < v_i < v_i,max

where v_i,max and v_i,min respectively represent the upper and lower limits of the safe and stable voltage at node i, set to 1.05 v_N and 0.95 v_N respectively, and v_N is the rated voltage.
Energy storage device operation constraints:

Soc_ess,i,min ≤ Soc_ess,i,t ≤ Soc_ess,i,max
P_ess,i,t² + Q_ess,i,t² ≤ S_ess,i,max², |P_ess,i,t| ≤ P_ess,i,max, |Q_ess,i,t| ≤ Q_ess,i,max
E_ess,i,t+Δt = E_ess,i,t + (η_c·P_ess,i,t^ch - P_ess,i,t^dis/η_d)·Δt, Soc_ess,i,t = E_ess,i,t/E_ess,i

where E_ess,i is the energy storage capacity at node i; S_ess,i,max, P_ess,i,max and Q_ess,i,max are respectively the apparent power, active power and reactive power upper limits at node i; Soc_ess,i,max and Soc_ess,i,min are the upper and lower limits of the energy storage state of charge; η_c and η_d are the charging and discharging efficiencies of the energy storage; E_ess,i,t is the energy stored at node i at time t, and P_ess,i,t^ch and P_ess,i,t^dis are the charging and discharging powers;
distributed photovoltaic device constraints
P pv,i,min <P pv,i,t <P pv,i,max
P in the formula pv,i,t,max And P pv,i,t,min Respectively representing the maximum power and the minimum power which can be output by the distributed photovoltaic equipment of the node i at the t moment, P pv,i,t, Representing the output power of the distributed photovoltaic device at node i at time t.
The electric vehicle aggregator constraints:

0 ≤ P_ev,t ≤ P_EVA,max

where P_up and P_down respectively represent the adjustable capacity of the electric vehicle aggregator participating in power up-regulation and down-regulation control at time t, P_EVA,max is the maximum output power of the electric vehicle aggregator, and P_ev,t is the output power of the electric vehicle aggregator at time t.
It should be noted that if the distribution network environment is simulated, different types of distributed devices, including electric vehicle aggregators, distributed photovoltaics and energy storage, need to be connected into the simulated environment, and bidirectional interaction between the grid and the users is realized through mechanisms such as electricity-price response, thereby achieving demand-side response. In the invention, the distribution network environment provides training samples for the multi-agent reinforcement learning algorithm: the agents obtain observations from the environment, and after making decisions they are given rewards in time as feedback. If an open, real distribution network system is available for testing the algorithm, there is no need to construct the simulated environment of distributed resources interacting with the distribution network; the real system can interact with the agents directly.
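The environment interface described in this note can be sketched as a minimal step function that takes the agents' joint actions, checks voltage safety, and returns per-agent rewards. The class name, penalty value and reward shape are illustrative assumptions:

```python
class DistributionNetEnv:
    """Toy stand-in for the distribution network environment."""

    def __init__(self, v_nominal=1.0, penalty=-10.0):
        self.v_nominal = v_nominal
        self.penalty = penalty

    def step(self, actions, bus_voltages):
        # r_normal is positive only when every bus stays within 0.95-1.05 p.u.
        safe = all(0.95 * self.v_nominal < v < 1.05 * self.v_nominal
                   for v in bus_voltages)
        r_normal = 1.0 if safe else self.penalty
        # A full environment would add each agent's price/output terms here.
        return [r_normal for _ in actions]
```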
S2: building the agents:
constructing reinforcement learning agents for the different distributed devices, together with the states they acquire from the simulation environment, their output action spaces and their reward functions;
Different agents have their own state and action spaces, and each agent updates its parameters according to its own objective to achieve adaptive learning.
Further, the state space includes:
S = {P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a, t}
where P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a and t respectively denote the electric load power, the output upper limit of the distributed photovoltaic devices, the output upper limit of the electric vehicle aggregators, the output of the conventional units, the state of charge of the energy storage, the grid electricity price at the current moment, and the scheduling time step;
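As an illustration only, the state S above could be assembled into a flat vector in the order given in the definition; the helper name is an assumption for this sketch:

```python
def build_state(load_power, pv_upper, eva_upper, mt_output, ess_soc, price, t):
    """Concatenate per-device observations into the state vector
    S = {P_load, P_pv,max, P_EVA,max, P_mt, SOC_ess, a, t} defined above.
    Device quantities are lists (one entry per node of that type)."""
    state = []
    for group in (load_power, pv_upper, eva_upper, mt_output, ess_soc):
        state.extend(group)         # flatten each device group in order
    state.append(price)             # current grid electricity price a
    state.append(t)                 # scheduling time step
    return state
```

For example, two load nodes, one PV node, one aggregator, one conventional unit and one storage unit yield an 8-dimensional state.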
Still further, the action space comprises:
a_1 = a_ess,|ess|
a_2 = a_pv,|pv|
a_3 = a_EVA,|EVA|
where a_ess,|ess|, a_pv,|pv| and a_EVA,|EVA| respectively represent the model's real-time energy storage output, distributed photovoltaic device output, and electric vehicle aggregator output. Each is produced by the neural network, whose output value lies in the range [-1, 1] and must be mapped back to the real action space according to the real physical constraints.
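The mapping from the network output in [-1, 1] back to the feasible action range can be sketched as an affine rescaling with a defensive clip, for example onto the photovoltaic bounds P_pv,min ≤ P ≤ P_pv,max; the function below is an illustrative assumption, not the patent's implementation:

```python
def map_action(a_raw, p_min, p_max):
    """Map a neural-network output a_raw in [-1, 1] onto the physical
    action range [p_min, p_max], clipping out-of-range outputs for safety."""
    a_raw = max(-1.0, min(1.0, a_raw))                 # defensive clip to [-1, 1]
    return p_min + (a_raw + 1.0) * 0.5 * (p_max - p_min)
```

With this convention, -1 maps to the lower bound, +1 to the upper bound, and 0 to the midpoint of the feasible range.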
It should be noted that the multi-agent reinforcement learning part includes different types of agents, each with its own optimization objective, which learn by trial and error while continuously interacting with the distribution network environment to maximize their respective reward objectives. Each agent obtains a different state from the distribution network environment and takes an action according to that state; the environment computes rewards from the joint actions of all agents and returns them to the agents; each agent then updates its model parameters according to its reward and adjusts its policy so that, through continuous learning, it maximizes its cumulative reward.
S3: interactive training of the intelligent agent and the simulation environment:
the distributed-device reinforcement learning agents interact with the power distribution network environment: first, the distribution network provides the current grid operating state as input to the agents; the agents make decisions according to the grid operating state and interact with the grid to obtain rewards; finally, each agent's value estimate and policy are updated according to the reward fed back by the environment, so that the different agents maximize their reward values.
It should be noted that the historical source-load data in the environment serve as samples for interaction with the agents, and the model learns its actions according to the current state of the distribution network. During the interaction between agent and environment, the agent's policy is updated by gradient descent on the reward value fed back by the environment, exploring decisions that maximize the reward. This meets the requirement of cooperative operation between distributed devices and the distribution network, and addresses the overlong solution times and ambiguous model parameters that the non-convexity and high uncertainty of the scheduling problem cause for current grids.
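The gradient-based policy update described above can be sketched in a simplified REINFORCE style: the policy parameters move along the reward-weighted score function. This is a hedged stand-in for the full actor-critic updates a complete implementation would use; the Gaussian-policy parameterization below is an assumption for illustration:

```python
def reinforce_step(theta, states, actions, rewards, lr=0.01):
    """One REINFORCE update for a scalar Gaussian policy with mean theta*s
    and unit variance, where grad log pi(a|s) = (a - theta*s) * s.
    Returns the updated parameter after one reward-weighted gradient step."""
    grad = 0.0
    for s, a, r in zip(states, actions, rewards):
        grad += (a - theta * s) * s * r   # score function weighted by reward
    return theta + lr * grad / len(states)
```

Actions that earned reward pull the policy mean toward themselves; zero-reward actions contribute nothing to the update.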
Further, different agents have their own reward values, which are set as follows:
distributed photovoltaic device bonus function settings:
because the output of the distributed photovoltaic devices is random, their output power must account for the impact on safe and stable grid operation and keep a certain reserve capacity; the reward is therefore given by:
r_1 = r_normal + a·P_pv,out + b·P_pv,delta
where r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_pv,out represents the output power of the photovoltaic cluster, P_pv,delta represents the reserve capacity of the photovoltaic cluster, a represents the time-of-use electricity price, and b represents the discount coefficient.
The charging and discharging of the energy storage device have different efficiencies, and its output power must also account for the impact on safe and stable grid operation. Electricity is purchased from the grid when charging and sold to the grid when discharging, and the maximum daily benefit must be considered; the reward is therefore given by:
r_2 = r_normal + a_1·η_1·P_ess,in + a_2·η_2·P_ess,out (10)
where r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_ess,in represents the charging power (expressed as a negative value), P_ess,out represents the discharging power, a_1 represents the electricity purchase price, η_1 the charging efficiency, a_2 the electricity selling price, and η_2 the discharging efficiency.
In essence, an electric vehicle is equivalent to a battery energy storage device, and its idle time can be fully used for grid regulation provided the owner's charging demand is met. Thus, the electric vehicle aggregator's reward function is:
r_3 = r_normal + a_1·P_EVA,in + r_DSO (11)
where r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_EVA,in represents the charging power the electric vehicle aggregator purchases from the grid (expressed as a negative value), a_1 represents the electricity purchase price, and r_DSO represents the reward the grid gives the electric vehicle aggregator for participating in peak regulation and frequency regulation.
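Under the definitions above, the three reward functions can be written down directly; the sign conventions follow the text (charging power negative, r_normal negative when the grid is unsafe), while the function names are illustrative assumptions:

```python
def r_pv(r_normal, a, p_pv_out, b, p_pv_delta):
    """Distributed PV reward: r1 = r_normal + a*P_pv,out + b*P_pv,delta."""
    return r_normal + a * p_pv_out + b * p_pv_delta

def r_ess(r_normal, a1, eta1, p_in, a2, eta2, p_out):
    """Energy storage reward: r2 = r_normal + a1*eta1*P_ess,in + a2*eta2*P_ess,out.
    P_ess,in (charging power, bought from the grid) is negative by convention,
    so charging reduces the reward and discharging (selling) increases it."""
    return r_normal + a1 * eta1 * p_in + a2 * eta2 * p_out

def r_eva(r_normal, a1, p_eva_in, r_dso):
    """EV aggregator reward: r3 = r_normal + a1*P_EVA,in + r_DSO.
    P_EVA,in (charging power bought from the grid) is negative."""
    return r_normal + a1 * p_eva_in + r_dso
```

With r_normal = 0 (safe operation), each agent's reward reduces to its economic terms alone.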
The three agents each optimize their policy to maximize their own reward function, thereby achieving bidirectional interaction between the grid and the adjustable user-side resources.
S4: on-line application decision:
the trained reinforcement learning agents are connected to the distribution network environment, the data collected by the grid data-acquisition system are analyzed in real time, and decisions are made according to the current distribution network state information, including the conventional-unit output, the predicted new-energy power, and state quantities such as the current output power, load, and energy storage state of charge.
Furthermore, each agent acts so as to maximize its own reward function, meeting the requirement of bidirectional interaction between the distributed devices and the grid.
It should be noted that the invention addresses the shortcomings of existing methods in the bidirectional interaction between the grid and user-side resources. Under electricity-price incentives, and while maintaining the safety and stability of the distribution network, it realizes bidirectional interaction between the grid and the distributed devices, thereby maximizing the consumption of distributed photovoltaic power and enabling the energy storage and electric vehicle aggregators to participate in peak regulation and frequency regulation of the distribution network;
by quantifying the bidirectional interaction between the grid and user-side resources through the reward functions, the different agents can learn from their respective rewards, updating their policies by gradient descent to maximize the reward functions; the energy storage and distributed photovoltaic devices thus obtain benefits while satisfying grid safety and stability, and participate in peak regulation and frequency regulation of the grid;
by training the reinforcement learning agent composed of representation, dynamics, and prediction neural networks offline, the agent continuously learns the rules of the environment, internally infers their influence on the future, and learns and explores through continual trial and error, finally achieving reasonable decision-making without requiring knowledge of the internal parameters of the electric vehicle aggregator or of the photovoltaic and energy storage clusters. In online application, fast and accurate decisions can be completed solely from the real-time grid operating state at the current moment.
Example 2
Referring to fig. 3, for one embodiment of the present invention, a multi-agent reinforcement learning method for distributed resource collaborative scheduling is provided, and in order to verify the beneficial effects of the present invention, scientific demonstration is performed through experiments.
An IEEE 30-node distribution network simulation environment is built: electric vehicle load aggregators are connected at nodes 3 and 10, distributed photovoltaics at nodes 20 and 28, and an energy storage power station at node 5. The simulation environment is built on a Python power-flow calculation, and the specific environment is shown in figure 3.
The environment is initialized and source-load data are drawn from the historical data, with the initial energy storage state set to 0. The observation obs is formed and input to the agents, and the three agents respectively output the actions a_ess,|ess|, a_pv,|pv| and a_EVA,|EVA|. The actions and the source-load data are input together into the Python-based simulation environment for power-flow calculation, yielding the branch power flows and node voltages; a try-except guard is used, and if the power flow does not converge, r_normal = -100.
r_normal is calculated from the branch power flows, the node voltages, the branch power upper limits, and the node voltage upper and lower limits.
According to r_normal, the rewards r_1, r_2 and r_3 of the three agents are calculated and returned to the agents, and gradient descent updates the agents' parameters.
Fourth, the next time step is entered according to the agents' actions and the environment state, the source-load data at the next moment are input, and the process repeats.
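The four steps of the embodiment can be sketched as one training loop. The environment and agent interfaces below are assumptions for illustration, not the patent's actual implementation; the non-convergence penalty r_normal = -100 from the text is kept:

```python
def train_episode(env, agents, horizon):
    """One episode of the Example 2 loop: observe -> act -> power flow ->
    reward -> gradient update, repeated over the scheduling horizon.
    `agents` maps a device name (ess, pv, eva) to an object with
    act(obs) and update(reward) methods. Returns the total episode reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        # Each agent chooses its action from the shared grid observation.
        joint_action = {name: agent.act(obs) for name, agent in agents.items()}
        try:
            # env.step runs the power-flow calculation on the joint action.
            obs, rewards, done = env.step(joint_action)
        except RuntimeError:
            # Non-converged power flow: every agent gets r_normal = -100.
            rewards = {name: -100.0 for name in agents}
            done = True
        for name, agent in agents.items():
            agent.update(rewards[name])   # per-agent gradient step
        total += sum(rewards.values())
        if done:
            break
    return total
```

Repeating this loop over the historical source-load samples corresponds to the offline training phase of step S3.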
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.
Claims (10)
1. A multi-agent reinforcement learning method for distributed resource cooperative scheduling, characterized in that it comprises:
establishing a simulation environment of the distributed equipment access distribution network;
constructing intelligent agents for reinforcement learning of different distributed equipment;
the intelligent agent and the simulation environment are trained interactively;
and making a decision through the trained agent.
2. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 1, wherein: the simulation environment specifically comprises:
the distribution network accessed by the distributed devices must satisfy the power system power-flow equation constraints, voltage safety and stability constraints, energy storage device operating constraints, distributed photovoltaic device constraints, and electric vehicle aggregator constraints; after the distributed devices are connected, their decisions are evaluated and the evaluation is returned to the agents in the form of reward values.
3. A multi-agent reinforcement learning method for distributed resource co-scheduling according to claim 1 or 2, wherein the power system power-flow equation constraints are:
wherein P_mt,i,t and Q_mt,i,t are the active and reactive power of the generator set at node i at time t; P_load,i,t and Q_load,i,t are the active and reactive load at node i at time t; P_pv,i,t, P_ess,i,t and P_EVA,i,t are respectively the active power of the distributed photovoltaic, energy storage and electric vehicle aggregator at node i at time t; U_i,t is the voltage magnitude at node i; U_j,t is the voltage magnitude at node j; θ_ij,t is the phase-angle difference between the two nodes; G_ij and B_ij are respectively the conductance and susceptance between nodes i and j;
the energy storage device operating constraints:
wherein E_ess,i is the energy storage capacity at node i; S_ess,i,max, P_ess,i,max and Q_ess,i,max are respectively the apparent, active and reactive power upper limits at node i; Soc_ess,i,max and Soc_ess,i,min are the upper and lower limits of the energy storage state of charge; Soc_ess,i,t is the state of charge of the energy storage at the node; η_c and η_d are the charging and discharging efficiencies of the energy storage; E_ess,i,t is the energy stored at node i at time t; and Δt represents the time increment;
the distributed photovoltaic device constraints are such that,
P_pv,i,min < P_pv,i,t < P_pv,i,max
wherein P_pv,i,max and P_pv,i,min respectively represent the maximum and minimum power that the distributed photovoltaic device at node i can output at time t, and P_pv,i,t represents the output power of the distributed photovoltaic device at node i at time t;
the electric vehicle aggregator constraints are that,
wherein P_up,t and P_down,t respectively represent the adjustable capacity of the electric vehicle aggregator participating in power down-regulation and up-regulation control at time t, bounded by the maximum output power of the electric vehicle aggregator, and P_ev,t is the output power of the electric vehicle aggregator at time t.
4. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 1, wherein: the agent comprises:
the reinforcement learning agents of the different distributed devices, the states acquired from the simulation environment, the output action spaces, and the reward functions.
5. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 4, wherein: the agent for reinforcement learning of different distributed devices further comprises:
the reinforcement learning agents of the different distributed devices have their own state spaces and action spaces, and each agent updates its parameters according to its own objective, achieving adaptive learning.
6. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 5, wherein: the state space comprises:
S = {P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a, t}
wherein P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a and t respectively denote the electric load power, the output upper limit of the distributed photovoltaic devices, the output upper limit of the electric vehicle aggregators, the output of the conventional units, the state of charge of the energy storage, the grid electricity price at the current moment, and the scheduling time step.
7. A multi-agent reinforcement learning method for distributed resource co-scheduling as claimed in claim 5 or 6, wherein said action space comprises:
a_1 = a_ess,|ess|
a_2 = a_pv,|pv|
a_3 = a_EVA,|EVA|
wherein a_ess,|ess|, a_pv,|pv| and a_EVA,|EVA| respectively represent the model's real-time energy storage output, distributed photovoltaic device output, and electric vehicle aggregator output; each is produced by the neural network, whose output value lies in the range [-1, 1] and must be mapped back to the real action space according to the real physical constraints.
8. The multi-agent reinforcement learning method for distributed resource co-scheduling of claim 1, wherein the interactive training comprises:
the historical source-load data in the simulation environment serve as samples for interaction with the agents; the distributed-device reinforcement learning agents learn actions according to the current state of the distribution network, and their policies are updated by gradient descent on the reward values fed back by the distributed-device distribution network simulation environment, exploring decisions that maximize the reward value.
9. The multi-agent reinforcement learning method of distributed resource co-scheduling according to any of claims 2 and 8, wherein the reward value comprises: setting a distributed photovoltaic equipment rewarding function, setting an energy storage equipment rewarding function and setting a rewarding function of an electric automobile aggregator;
the distributed photovoltaic device bonus function setting includes:
r_1 = r_normal + a·P_pv,out + b·P_pv,delta
wherein r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_pv,delta represents the reserve capacity of the photovoltaic cluster, P_pv,out represents the output power of the photovoltaic cluster, a represents the time-of-use electricity price, and b represents the discount coefficient;
the energy storage device reward function setting comprises:
r_2 = r_normal + a_1·η_1·P_ess,in + a_2·η_2·P_ess,out
wherein r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_ess,in represents the charging power (expressed as a negative value), P_ess,out represents the discharging power, a_1 represents the electricity purchase price, η_1 the charging efficiency, a_2 the electricity selling price, and η_2 the discharging efficiency;
the electric automobile aggregator's rewards function setting includes:
r_3 = r_normal + a_1·P_EVA,in + r_DSO
wherein r_normal represents the reward for safe and stable grid operation and is negative when the grid is unsafe, P_EVA,in represents the charging power the electric vehicle aggregator purchases from the grid (expressed as a negative value), a_1 represents the electricity purchase price, and r_DSO represents the reward the grid gives the electric vehicle aggregator for participating in peak regulation and frequency regulation.
10. The multi-agent reinforcement learning method of co-scheduling of distributed resources according to any one of claims 1, 4, 8, wherein making decisions by trained agents comprises:
the trained agents are connected to the distribution network environment, the data collected by the grid data-acquisition system are analyzed in real time, and decisions are made according to the current distribution network state information, including the conventional-unit output, the predicted new-energy power, and the current output power, load and energy storage state-of-charge state quantities.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310401017.7A CN116542137A (en) | 2023-04-14 | 2023-04-14 | Multi-agent reinforcement learning method for distributed resource cooperative scheduling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116542137A true CN116542137A (en) | 2023-08-04 |
Family
ID=87456815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310401017.7A Pending CN116542137A (en) | 2023-04-14 | 2023-04-14 | Multi-agent reinforcement learning method for distributed resource cooperative scheduling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116542137A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117335439A (en) * | 2023-11-30 | 2024-01-02 | 国网浙江省电力有限公司 | Multi-load resource joint scheduling method and system |
CN117335439B (en) * | 2023-11-30 | 2024-02-27 | 国网浙江省电力有限公司 | Multi-load resource joint scheduling method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492815B (en) | Energy storage power station site selection and volume fixing optimization method for power grid under market mechanism | |
CN109103912B (en) | Industrial park active power distribution system scheduling optimization method considering power grid peak regulation requirements | |
CN111884213A (en) | Power distribution network voltage adjusting method based on deep reinforcement learning algorithm | |
CN112186743A (en) | Dynamic power system economic dispatching method based on deep reinforcement learning | |
CN112287463A (en) | Fuel cell automobile energy management method based on deep reinforcement learning algorithm | |
CN107453381B (en) | Electric car cluster power regulating method and system based on two stages cross-over control | |
CN112821465B (en) | Industrial microgrid load optimization scheduling method and system containing cogeneration | |
CN116345577B (en) | Wind-light-storage micro-grid energy regulation and optimization method, device and storage medium | |
CN112217195B (en) | Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology | |
CN112633571A (en) | LSTM-based ultrashort-term load prediction method under source network load interaction environment | |
CN112491094B (en) | Hybrid-driven micro-grid energy management method, system and device | |
CN113326994A (en) | Virtual power plant energy collaborative optimization method considering source load storage interaction | |
CN114331059A (en) | Electricity-hydrogen complementary park multi-building energy supply system and coordinated scheduling method thereof | |
CN116542137A (en) | Multi-agent reinforcement learning method for distributed resource cooperative scheduling | |
CN115986834A (en) | Near-end strategy optimization algorithm-based optical storage charging station operation optimization method and system | |
CN112821432A (en) | Double-layer multi-position configuration method of energy storage system under wind and light access | |
CN115395539A (en) | Shared energy storage operation control method considering customized power service | |
CN114169916A (en) | Market member quotation strategy making method suitable for novel power system | |
CN114123256A (en) | Distributed energy storage configuration method and system adaptive to random optimization decision | |
CN111799820B (en) | Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system | |
CN116454920A (en) | Power distribution network frequency modulation method, device, equipment and storage medium | |
CN110599032A (en) | Deep Steinberg self-adaptive dynamic game method for flexible power supply | |
Hu et al. | Energy management for microgrids using a reinforcement learning algorithm | |
CN112564151B (en) | Multi-microgrid cloud energy storage optimization scheduling method and system considering privacy awareness | |
CN111668879A (en) | High-permeability power distribution network scheduling model based on C-PSO algorithm and calculation method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||