CN116542137A - Multi-agent reinforcement learning method for distributed resource cooperative scheduling - Google Patents

Multi-agent reinforcement learning method for distributed resource cooperative scheduling

Info

Publication number
CN116542137A
Authority
CN
China
Prior art keywords
power
distributed
ess
reinforcement learning
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310401017.7A
Other languages
Chinese (zh)
Inventor
谈竹奎
刘斌
张俊玮
冯圣勇
潘旭辉
何龙
王秀境
徐长宝
张秋雁
徐玉韬
唐赛秋
徐宏伟
陈敦辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd filed Critical Guizhou Power Grid Co Ltd
Priority to CN202310401017.7A priority Critical patent/CN116542137A/en
Publication of CN116542137A publication Critical patent/CN116542137A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H02J2300/22 The renewable source being solar energy
    • H02J2300/24 The renewable source being solar energy of photovoltaic origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a multi-agent reinforcement learning method for distributed resource cooperative scheduling, which comprises: establishing a simulation environment in which distributed equipment is connected to the distribution network; constructing reinforcement learning agents for the different types of distributed equipment; training the agents through interaction with the simulation environment; and making decisions through the trained agents. With the invention, through training on historical data and the strong data-fitting capability of neural networks, decisions can be made accurately and quickly without knowing all parameters of the aggregation models of the distributed equipment. The invention enables bidirectional interaction between users and the power grid via electric vehicle aggregators, distributed photovoltaic equipment and energy storage, and solves the problems of overlong optimization time and inaccurate decisions caused by incomplete parameter perception in traditional optimization methods.

Description

Multi-agent reinforcement learning method for distributed resource cooperative scheduling
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a multi-agent reinforcement learning method for distributed resource collaborative scheduling.
Background
At present, the power distribution network, as the main carrier of new energy consumption, has many branches and a complex line structure. The connection of a large number of distributed controllable resources inevitably makes the operation modes of the power grid diverse and complex, while users can interact bidirectionally with the grid through distributed controllable equipment. Most current research is based on building aggregation models of the distributed equipment and studying electricity-price incentive mechanisms. When the power grid cannot comprehensively perceive all parameters of the underlying aggregation models, decision making becomes very difficult and an optimal decision can hardly be made for the current state. At the same time, the non-convexity and high uncertainty of the coordinated optimization of distributed photovoltaic equipment and electric vehicles lead to overlong solution times that cannot meet regulation and control requirements. It is therefore worth exploring whether an intelligent method can overcome the shortcomings of such distributed optimization methods.
In recent years, with the rise and development of artificial intelligence technology, reinforcement learning (RL) has become an important scientific paradigm for solving sequential decision problems: value estimates and policies are updated through continuous trial-and-error interaction with the environment, making it an effective technique for sequential decision making. In particular, deep reinforcement learning (Deep Reinforcement Learning, DRL), which combines deep neural networks with reinforcement learning, has good adaptive learning ability and optimal decision-making ability for non-convex, nonlinear problems, and provides a new idea for handling the cooperative scheduling of distributed controllable resources in complex power systems.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract and in the title of the application in order to avoid obscuring their purpose; such simplifications or omissions should not be used to limit the scope of the invention.
The present invention has been made in view of the above-described problems occurring in the prior art.
Therefore, the invention provides a multi-agent reinforcement learning method for distributed resource collaborative scheduling, which can solve the problems of overlong optimization time and inaccurate decisions caused by incomplete parameter perception in traditional optimization methods.
In order to solve the technical problems, the invention provides a multi-agent reinforcement learning method for distributed resource collaborative scheduling, which comprises the following steps:
establishing a simulation environment in which distributed equipment is connected to the distribution network;
constructing reinforcement learning agents for different distributed equipment;
training the agents through interaction with the simulation environment;
and making decisions through the trained agents.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the simulation environment specifically comprises:
the distribution network to which the distributed equipment is connected must satisfy the power flow equation constraints of the power system, the voltage safety and stability constraints, the operating constraints of the energy storage equipment, the constraints of the distributed photovoltaic equipment and the constraint conditions of the electric vehicle aggregator; after the distributed equipment is connected, the decisions given by the distributed equipment are evaluated and returned to the agents in the form of reward values.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the power system power flow equation constraints:
wherein P_mt,i,t and Q_mt,i,t are the active power and reactive power of the generator set at node i at time t; P_load,i,t and Q_load,i,t are the active load and reactive load of node i at time t; P_pv,i,t, P_ess,i,t and P_EVA,i,t are respectively the active power of the distributed photovoltaic, the energy storage and the electric vehicle aggregator at node i at time t; U_i,t is the voltage magnitude of node i; U_j,t is the voltage magnitude of node j; θ_ij,t is the phase angle difference between the two nodes; G_ij and B_ij are respectively the conductance and susceptance between nodes i and j;
the energy storage device operating constraints:
wherein E_ess,i is the energy storage capacity at node i; S_ess,i,max, P_ess,i,max and Q_ess,i,max are respectively the apparent power, active power and reactive power upper limits at node i; Soc_ess,i,max and Soc_ess,i,min are the upper and lower limits of the energy storage state of charge; Soc_ess,i,t is the state of charge of the energy storage at node i; η_c and η_d are the charging and discharging efficiencies of the energy storage; E_ess,i,t is the energy stored at node i at time t; Δt denotes the time increment;
the distributed photovoltaic device constraints:
P_pv,i,min < P_pv,i,t < P_pv,i,max
wherein P_pv,i,max and P_pv,i,min respectively represent the maximum and minimum power that the distributed photovoltaic equipment at node i can output at time t, and P_pv,i,t represents the output power of the distributed photovoltaic device at node i at time t;
the electric vehicle aggregator constraints:
wherein P_up,t and P_down,t respectively represent the adjustable capacity of the electric vehicle aggregator participating in power down-regulation and up-regulation control at time t, limited by the maximum output power of the electric vehicle aggregator, and P_ev,t is the output power of the electric vehicle aggregator at time t.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the agents comprise:
the reinforcement learning agents of the different distributed equipment, the states acquired from the simulation environment, the output action spaces and the reward functions.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the agents for reinforcement learning of the different distributed equipment further comprise:
the reinforcement learning agents of the different distributed equipment have their own state spaces and action spaces, and each agent can update its parameters according to its own objective to achieve adaptive learning.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the state space comprises:
S = {P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a, t}
wherein P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a and t are respectively the power of the electric loads, the output upper limit of the distributed photovoltaic equipment, the output of the electric vehicle aggregator, the output of the conventional units, the SOC of the energy storage, the grid electricity price at the current moment and the scheduling time step.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the action space comprises:
a_1 = a_ess,|ess|
a_2 = a_pv,|pv|
a_3 = a_EVA,|EVA|
wherein a_ess,|ess|, a_pv,|pv| and a_EVA,|EVA| respectively represent the real-time outputs of the energy storage, the distributed photovoltaic equipment and the electric vehicle aggregator given by the model; the neural network outputs values in the range [-1, 1], which must be mapped back to the real action space according to the actual physical constraints.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the interactive training comprises:
the historical source-load data in the simulation environment are used as samples to interact with the agents; the reinforcement learning agents of the distributed equipment take actions and learn according to the current state of the power distribution network, and update their policies by gradient descent on the reward values fed back by the simulation environment of the distribution network with distributed equipment, so as to explore decisions that maximize the reward value.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: the reward value includes: a reward function for the distributed photovoltaic equipment, a reward function for the energy storage equipment and a reward function for the electric vehicle aggregator;
the distributed photovoltaic device reward function is:
r_1 = r_normal + a·P_pv,out + b·P_pv,delta
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_pv,delta represents the spare capacity of the photovoltaic cluster, P_pv,out represents the output power of the photovoltaic cluster, a represents the time-of-use electricity price, and b represents the discount coefficient;
the energy storage device reward function is:
r_2 = r_normal + a_1·η_1·P_ess,in + a_2·η_2·P_ess,out
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_ess,in represents the charging power and is expressed as a negative value, P_ess,out represents the discharging power, a_1 represents the electricity purchase price, η_1 represents the charging efficiency, a_2 represents the electricity selling price, and η_2 represents the discharging efficiency;
the electric vehicle aggregator reward function is:
r_3 = r_normal + a_1·P_EVA,in + r_DSO
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_EVA,in represents the charging power purchased from the grid by the electric vehicle aggregator and is expressed as a negative value, a_1 represents the electricity purchase price, and r_DSO represents the reward given by the grid for the electric vehicle aggregator's participation in grid peak regulation and frequency modulation.
As a preferable scheme of the multi-agent reinforcement learning method for distributed resource collaborative scheduling of the invention: making decisions through the trained agents comprises:
connecting the trained agents to the power distribution network environment, analyzing in real time the data collected by the grid data acquisition system, and making decisions according to the state information of the current distribution network, including the output of the conventional units, the predicted power of the new energy sources, and the current output power, load and energy storage state-of-charge quantities.
The invention has the beneficial effects that: the proposed algorithm first acquires the operating state of the power distribution network from the grid side, including the source-network-load data, and inputs it into the different agents; the agents correct their policies according to their respective reward values so as to maximize the value of the evaluation network; by learning while constantly interacting with the environment, and only acquiring the grid state in the application stage, bidirectional interaction and coordinated operation of user-side resources and the power grid can be realized. Reasonable decisions can thus be made without knowing the internal parameters of the electric vehicle aggregator or of the photovoltaic and energy storage clusters, and in online application fast and accurate decisions can be completed only according to the real-time operating state of the grid at the current moment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort. Wherein:
FIG. 1 is a flowchart of a multi-agent reinforcement learning method for distributed resource collaborative scheduling according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a multi-agent reinforcement learning method for collaborative scheduling of distributed resources according to an embodiment of the present invention, wherein the upper part of the diagram is a multi-agent reinforcement learning model, and the lower part of the diagram is a bidirectional interaction environment between a power distribution network and distributed equipment;
FIG. 3 is a schematic diagram of a multi-agent reinforcement learning model input state, decision output and policy update process of a multi-agent reinforcement learning method for distributed resource collaborative scheduling according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an IEEE30 node simulation environment of a multi-agent reinforcement learning method for distributed resource co-scheduling according to embodiment 2 of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention will be described in detail with reference to the drawings, the cross-sectional view of the device structure will not be partially enlarged to general scale for convenience of description, and the drawings are merely illustrative and should not limit the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1 and 2, a first embodiment of the present invention provides a multi-agent reinforcement learning method for distributed resource cooperative scheduling, including:
s1: constructing a distribution network simulation environment:
establishing a simulation environment in which the distributed equipment is connected to the distribution network; the distribution network to which the distributed equipment is connected must satisfy the power flow equation constraints of the power system, the voltage safety and stability constraints, the operating constraints of the energy storage equipment, the constraints of the distributed photovoltaic equipment and the constraint conditions of the electric vehicle aggregator; after the distributed equipment is connected, the decisions given by the distributed equipment are evaluated and returned to the agents in the form of reward values;
the constraints are as follows:
constraint of power flow equation of power distribution network
Wherein P is mt,i,t And Q mt,i,t The active power and the reactive power of the node i generator set at the time t are calculated; p (P) load,i,t ,Q load,i,t The active load and the reactive load of the node i at the moment t are obtained; p (P) pv,i,t ,P ess,i,t ,P EVA,i,t Active power of distributed photovoltaic, energy storage and electric automobile aggregators of the node i at the time t is respectively; u (U) i,t The voltage modulus of the node i; θ i,t Is the phase angle difference between two nodes; g ij ,B ij Conductivity and susceptance between nodes i, j, respectively
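The power flow expressions themselves are carried in the original figures and do not reproduce here; the standard AC nodal power balance form consistent with the variables defined above (stated as an assumption, not as the patent's literal equations) is:
P_mt,i,t + P_pv,i,t + P_ess,i,t + P_EVA,i,t - P_load,i,t = U_i,t · Σ_j U_j,t · (G_ij·cos θ_ij,t + B_ij·sin θ_ij,t)
Q_mt,i,t - Q_load,i,t = U_i,t · Σ_j U_j,t · (G_ij·sin θ_ij,t - B_ij·cos θ_ij,t)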
Power distribution network voltage safety and stability constraint
In order to ensure the quality of the supply voltage, the voltage safety and stability constraints are set as follows:
v_i,min < v_i < v_i,max
wherein v_i,max and v_i,min respectively represent the upper and lower limits of the safe and stable voltage at node i, set to 1.05 v_N and 0.95 v_N respectively, and v_N is the rated voltage.
Energy storage device operation constraints:
wherein E_ess,i is the energy storage capacity at node i; S_ess,i,max, P_ess,i,max and Q_ess,i,max are respectively the apparent power, active power and reactive power upper limits at node i; Soc_ess,i,max and Soc_ess,i,min are the upper and lower limits of the energy storage state of charge; η_c and η_d are the charging and discharging efficiencies of the energy storage; E_ess,i,t is the energy stored at node i at time t;
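The storage constraint expressions are likewise carried in the original figures; a typical form consistent with the variables defined above (an assumed sketch, in which P_ch,i,t and P_dis,i,t are introduced here only to denote the charging and discharging powers) is:
Soc_ess,i,min ≤ Soc_ess,i,t ≤ Soc_ess,i,max
-P_ess,i,max ≤ P_ess,i,t ≤ P_ess,i,max,  -Q_ess,i,max ≤ Q_ess,i,t ≤ Q_ess,i,max,  P_ess,i,t² + Q_ess,i,t² ≤ S_ess,i,max²
E_ess,i,t+Δt = E_ess,i,t + η_c·P_ch,i,t·Δt - (P_dis,i,t / η_d)·Δt,  Soc_ess,i,t = E_ess,i,t / E_ess,i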
distributed photovoltaic device constraints
P pv,i,min <P pv,i,t <P pv,i,max
P in the formula pv,i,t,max And P pv,i,t,min Respectively representing the maximum power and the minimum power which can be output by the distributed photovoltaic equipment of the node i at the t moment, P pv,i,t, Representing the output power of the distributed photovoltaic device at node i at time t.
The electric vehicle aggregator constraints:
wherein P_up and P_down respectively represent the adjustable capacity of the electric vehicle aggregator participating in power down-regulation and up-regulation control at time t, limited by the maximum output power of the electric vehicle aggregator, and P_ev,t is the output power of the electric vehicle aggregator at time t.
It should be noted that, since the power distribution network environment is a simulated one, different types of distributed equipment, including electric vehicle aggregators, distributed photovoltaics and energy storage, need to be connected to the simulated environment, and bidirectional interaction between the grid and the users is realized through mechanisms such as electricity price response, so as to achieve demand-side response. In the invention, the distribution network environment provides training samples for the multi-agent reinforcement learning algorithm; the agents acquire observations from the environment, and after the agents make decisions the environment gives rewards as timely feedback. If an open, real distribution network system is available for testing the algorithm, there is no need to construct the simulated environment of distributed resources interacting with the distribution network, and the agents can interact with it directly.
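For illustration only, a minimal sketch of such an environment interface is given below. All class, field and variable names are hypothetical, and the AC power flow is replaced by a crude power-balance check; the patent itself does not prescribe an implementation.

import numpy as np

class DistributionNetworkEnv:
    # Illustrative sketch, not the patent's actual simulation environment:
    # the load flow calculation is replaced by a simple active-power balance check.
    def __init__(self, source_load_data):
        self.data = source_load_data   # list of dicts of historical source-load samples
        self.t = 0                     # current scheduling time step
        self.soc = 0.5                 # energy storage state of charge

    def observe(self):
        row = self.data[self.t]
        # state = {electric load, PV output upper limit, EVA output limit,
        #          conventional unit output, storage SOC, electricity price, time step}
        return np.array([row["p_load"], row["p_pv_max"], row["p_eva_max"],
                         row["p_mt"], self.soc, row["price"], self.t], dtype=float)

    def step(self, p_ess, p_pv, p_eva):
        row = self.data[self.t]
        # stand-in for the power flow: penalise any active-power imbalance
        imbalance = row["p_mt"] + p_pv + p_ess + p_eva - row["p_load"]
        r_normal = 0.0 if abs(imbalance) < 1e-3 else -abs(imbalance)
        # charging power is negative by the sign convention used in the reward functions
        self.soc = float(np.clip(self.soc - p_ess / 100.0, 0.0, 1.0))
        self.t = (self.t + 1) % len(self.data)
        return self.observe(), r_normal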
S2: building an intelligent agent:
constructing reinforcement learning agents for the different distributed equipment, and defining the states acquired from the simulation environment, the output action spaces and the reward functions;
the different agents have their own state spaces and action spaces, and each agent can update its parameters according to its own objective to achieve adaptive learning.
Further, the state space includes:
S = {P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a, t}
wherein P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a and t are respectively the power of the electric loads, the output upper limit of the distributed photovoltaic equipment, the output of the electric vehicle aggregator, the output of the conventional units, the SOC of the energy storage, the grid electricity price at the current moment and the scheduling time step;
still further, an action space comprising:
a 2 =a pv,|pv|
wherein a is ess,|ess| ,a pv,|pv| ,a EVA,|EVA| Respectively representing the real-time energy storage output of the model, the output of the distributed photovoltaic equipment, the output of an electric automobile polymerizer and the output of the neural network, wherein the range of the value of the neural network output is [ -1,1]It is necessary to map back to the real action space according to the real physical constraints.
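As a concrete illustration of the mapping just described (function and limit names are hypothetical; the patent does not prescribe a particular mapping), a normalized output in [-1, 1] can be rescaled linearly to the feasible physical range:

def map_action(a_norm, p_min, p_max):
    # a_norm: raw neural network output in [-1, 1]
    # p_min, p_max: physical lower and upper limits of the device at the current time step
    a_norm = max(-1.0, min(1.0, a_norm))   # clip in case the network output drifts slightly
    return p_min + (a_norm + 1.0) / 2.0 * (p_max - p_min)

# for example, map the photovoltaic agent's output onto an assumed range [0, 50] kW
p_pv = map_action(0.3, p_min=0.0, p_max=50.0)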
It should be noted that the multi-agent reinforcement learning part contains different types of agents with their own optimization objectives, which learn by trial and error while constantly interacting with the distribution network environment so as to maximize their respective rewards. Each agent obtains its own state from the distribution network environment and takes an action according to that state; the environment calculates rewards according to the actions of all agents and returns them to the agents; each agent then updates its model parameters according to the rewards and adjusts its policy so as to obtain the maximum cumulative reward through continuous learning.
S3: interactive training of the intelligent agent and the simulation environment:
the distributed equipment reinforcement learning intelligent agent interacts with the power distribution network environment, firstly, the power distribution network gives out the running state of the current power grid and inputs the running state into the intelligent agent, and the intelligent agent makes a decision according to the running state of the power grid and interacts with the power grid to acquire rewards; and finally updating the value evaluation and strategy of the model according to the feedback rewarding value of the environment, so as to achieve the maximum rewarding value of different agents.
It should be noted that, the source load data accessed to the history in the environment is used as a sample to interact with the intelligent agent, and the model performs action learning according to the current state of the power distribution network. In the process of interaction between the intelligent agent and the environment, the strategy of updating the intelligent agent is reduced according to the gradient of the rewarding value fed back by the environment, and the decision of maximization of the rewarding value is explored, so that the requirement of cooperative operation of distributed equipment and a power distribution network can be met, and the problems that the current power grid is overlong in solving time and fuzzy in model parameters due to non-convexity and high uncertainty of solving problems are solved.
Further, the different agents have their own reward values, which are set as follows:
distributed photovoltaic device bonus function settings:
because the output of the distributed photovoltaic equipment has randomness, the output power of the distributed photovoltaic equipment needs to consider the influence on the safe and stable operation of the power grid and has certain reserve capacity, and therefore, the rewards are given by the following formula:
r 1 =r normal +aP pv,out +bP pv,delta
wherein r is normal Representing the rewards of safe and stable operation of the power grid, wherein the rewards are negative P when the power grid is unsafe pv,delta Representing the spare capacity of the photovoltaic cluster, a representing the time-of-use electricity price and b representing the discount coefficient.
The charging and discharging of the energy storage device have different efficiencies; its output power also needs to consider the influence on the safe and stable operation of the grid, electricity is purchased from the grid when charging and sold to the grid when discharging, and the maximum profit over one day needs to be considered. The reward is therefore given by:
r_2 = r_normal + a_1·η_1·P_ess,in + a_2·η_2·P_ess,out
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_ess,in represents the charging power and is expressed as a negative value, P_ess,out represents the discharging power, a_1 represents the electricity purchase price, η_1 represents the charging efficiency, a_2 represents the electricity selling price, and η_2 represents the discharging efficiency.
In essence, the electric vehicle is equivalent to a battery energy storage device, and its idle time can be fully utilized to participate in grid regulation and control on the premise of meeting the owner's charging requirements. Thus, the electric vehicle aggregator's reward function is:
r_3 = r_normal + a_1·P_EVA,in + r_DSO
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_EVA,in represents the charging power purchased from the grid by the electric vehicle aggregator and is expressed as a negative value, a_1 represents the electricity purchase price, and r_DSO represents the reward given by the grid for the electric vehicle aggregator's participation in grid peak regulation and frequency modulation.
The three agents each optimize their policies according to their respective reward functions so as to maximize them, thereby achieving bidirectional interaction between the power grid and the adjustable user-side resources.
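Once r_normal and the device powers are known, the three reward functions above can be computed directly; a minimal sketch (hypothetical function names, with the sign conventions stated above) is:

def pv_reward(r_normal, p_pv_out, p_pv_delta, a, b):
    # r_1 = r_normal + a * P_pv,out + b * P_pv,delta
    return r_normal + a * p_pv_out + b * p_pv_delta

def ess_reward(r_normal, p_ess_in, p_ess_out, a1, eta1, a2, eta2):
    # r_2 = r_normal + a_1 * eta_1 * P_ess,in + a_2 * eta_2 * P_ess,out
    # p_ess_in is negative (electricity purchased when charging), p_ess_out is the discharge power
    return r_normal + a1 * eta1 * p_ess_in + a2 * eta2 * p_ess_out

def eva_reward(r_normal, p_eva_in, a1, r_dso):
    # r_3 = r_normal + a_1 * P_EVA,in + r_DSO
    # p_eva_in is negative (charging power purchased from the grid)
    return r_normal + a1 * p_eva_in + r_dso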
S4: on-line application decision:
and accessing the trained reinforcement learning intelligent agent into a power distribution network environment, analyzing data collected by a power grid data acquisition system in real time, and deciding according to the state information of the current power distribution network, including the output of a traditional unit, the predicted power of the new energy and the state quantities of the current output power, load, energy storage charge state and the like.
Furthermore, different agents act according to maximization of respective rewarding functions, and the requirement of bidirectional interaction between the distributed equipment and the power grid is met.
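Reusing the hypothetical interfaces sketched above (DistributionNetworkEnv, map_action and trained agent objects), the online decision step reduces to reading the current grid state and querying the trained policies; the limits below are illustrative only:

def online_decision(env, agents):
    # read the real-time state collected by the grid data acquisition system
    obs = env.observe()
    limits = [(-50.0, 50.0), (0.0, obs[1]), (-30.0, 0.0)]   # illustrative ESS / PV / EVA ranges
    # each trained agent outputs a normalized action, which is mapped to its physical range
    return [map_action(agent.act(obs), lo, hi) for agent, (lo, hi) in zip(agents, limits)]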
It should be noted that the invention aims at overcoming the defects of existing methods in the bidirectional interaction between the power grid and user-side resources: under the condition of ensuring the safety and stability of the distribution network, bidirectional interaction between the grid and the distributed equipment can be realized under electricity price incentives, thereby maximizing the consumption of distributed photovoltaic power and enabling the energy storage and the electric vehicle aggregator to participate in peak regulation and frequency modulation of the distribution network;
by quantifying the bidirectional interaction between the grid and user-side resources through the reward functions, the different agents can learn through their respective reward functions and update their policies by gradient descent to maximize them, so that the energy storage and the distributed photovoltaic equipment obtain benefits while satisfying the safety and stability of the grid and participate in its peak regulation and frequency modulation;
through offline training of the reinforcement learning agent composed of representation, dynamics and prediction neural networks, the agent continuously learns the rules of the environment, internally infers their influence on the future, and keeps learning and exploring through trial and error, finally achieving the goal of making reasonable decisions without knowing the internal parameters of the electric vehicle aggregator or of the photovoltaic and energy storage clusters. In online application, fast and accurate decisions can be completed only according to the real-time operating state of the grid at the current moment.
Example 2
Referring to fig. 3, for one embodiment of the present invention, a multi-agent reinforcement learning method for distributed resource collaborative scheduling is provided, and in order to verify the beneficial effects of the present invention, scientific demonstration is performed through experiments.
An IEEE 30-node distribution network simulation environment is built: electric vehicle load aggregators are connected to nodes 3 and 10, distributed photovoltaics are connected to nodes 20 and 28, and an energy storage power station is connected to node 5. The simulation environment is built on a Python-based power flow calculation package, and the specific environment is shown in fig. 4.
Step 1: Initialize the environment and extract source-load data from the historical data, with the initial state of the energy storage set to 0; form the observation obs and input it to the agents respectively; the three agents respectively give the actions a_ess,|ess|, a_pv,|pv| and a_EVA,|EVA|; the actions and the source-load data are input together into the simulation environment built on the Python power flow package for load flow calculation to obtain the branch power flows and the node voltages; try/except is used, and if the power flow does not converge, r_normal = -100.
Step 2: Calculate r_normal according to the branch power flows, the node voltages, the branch power upper limits and the node voltage upper and lower limits.
Step 3: According to r_normal, calculate the rewards r_1, r_2 and r_3 of the three agents respectively and return them to the agents; gradient descent updates the agents' parameters.
Step 4: According to the agents' actions and the environment state, move to the next moment, input the source-load data of the next moment and repeat the process.
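Putting the steps together, the interaction loop of this embodiment can be sketched as follows. The sketch reuses the DistributionNetworkEnv, map_action and reward helpers given earlier; the RandomAgent stand-in, the action limits and the price and efficiency values are placeholders for whatever DRL algorithm and parameters are actually used, and non-convergence of the power flow is signalled here by an exception:

import random

class RandomAgent:
    # stand-in for a DRL agent: outputs a normalized action and performs a no-op update
    def act(self, obs):
        return random.uniform(-1.0, 1.0)

    def update(self, obs, reward, obs_next):
        pass   # a real agent would perform a gradient-descent policy/value update here

def run_episode(env, agents, price=0.5, steps=24):
    obs = env.observe()
    for _ in range(steps):
        a_ess, a_pv, a_eva = (agent.act(obs) for agent in agents)
        p_ess = map_action(a_ess, -50.0, 50.0)       # storage output range (illustrative)
        p_pv = map_action(a_pv, 0.0, obs[1])         # PV output bounded by its current upper limit
        p_eva = map_action(a_eva, -30.0, 0.0)        # EVA charging power (negative)
        try:
            obs_next, r_normal = env.step(p_ess, p_pv, p_eva)
        except RuntimeError:                         # power flow did not converge
            obs_next, r_normal = obs, -100.0
        rewards = (ess_reward(r_normal, min(p_ess, 0.0), max(p_ess, 0.0), price, 0.95, price, 0.95),
                   pv_reward(r_normal, p_pv, obs[1] - p_pv, price, 0.1),
                   eva_reward(r_normal, p_eva, price, 0.0))
        for agent, r in zip(agents, rewards):
            agent.update(obs, r, obs_next)
        obs = obs_next

run_episode(DistributionNetworkEnv([{"p_load": 80.0, "p_pv_max": 40.0, "p_eva_max": 30.0,
                                     "p_mt": 60.0, "price": 0.5}]),
            [RandomAgent() for _ in range(3)])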
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (10)

1. A multi-agent reinforcement learning method for distributed resource cooperative scheduling, characterized by comprising:
establishing a simulation environment in which distributed equipment is connected to the distribution network;
constructing reinforcement learning agents for different distributed equipment;
training the agents through interaction with the simulation environment;
and making decisions through the trained agents.
2. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 1, characterized in that the simulation environment specifically comprises:
the distribution network to which the distributed equipment is connected must satisfy the power flow equation constraints of the power system, the voltage safety and stability constraints, the operating constraints of the energy storage equipment, the constraints of the distributed photovoltaic equipment and the constraint conditions of the electric vehicle aggregator; after the distributed equipment is connected, the decisions given by the distributed equipment are evaluated and returned to the agents in the form of reward values.
3. The multi-agent reinforcement learning method for distributed resource co-scheduling according to claim 1 or 2, wherein the power system power flow equation constraints:
wherein P_mt,i,t and Q_mt,i,t are the active power and reactive power of the generator set at node i at time t; P_load,i,t and Q_load,i,t are the active load and reactive load of node i at time t; P_pv,i,t, P_ess,i,t and P_EVA,i,t are respectively the active power of the distributed photovoltaic, the energy storage and the electric vehicle aggregator at node i at time t; U_i,t is the voltage magnitude of node i; U_j,t is the voltage magnitude of node j; θ_ij,t is the phase angle difference between the two nodes; G_ij and B_ij are respectively the conductance and susceptance between nodes i and j;
the energy storage device operating constraints:
wherein E_ess,i is the energy storage capacity at node i; S_ess,i,max, P_ess,i,max and Q_ess,i,max are respectively the apparent power, active power and reactive power upper limits at node i; Soc_ess,i,max and Soc_ess,i,min are the upper and lower limits of the energy storage state of charge; Soc_ess,i,t is the state of charge of the energy storage at node i; η_c and η_d are the charging and discharging efficiencies of the energy storage; E_ess,i,t is the energy stored at node i at time t; Δt denotes the time increment;
the distributed photovoltaic device constraints:
P_pv,i,min < P_pv,i,t < P_pv,i,max
wherein P_pv,i,max and P_pv,i,min respectively represent the maximum and minimum power that the distributed photovoltaic equipment at node i can output at time t, and P_pv,i,t represents the output power of the distributed photovoltaic device at node i at time t;
the electric vehicle aggregator constraints:
wherein P_up,t and P_down,t respectively represent the adjustable capacity of the electric vehicle aggregator participating in power down-regulation and up-regulation control at time t, limited by the maximum output power of the electric vehicle aggregator, and P_ev,t is the output power of the electric vehicle aggregator at time t.
4. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 1, characterized in that the agents comprise:
the reinforcement learning agents of the different distributed equipment, the states acquired from the simulation environment, the output action spaces and the reward functions.
5. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 4, characterized in that the agents for reinforcement learning of the different distributed equipment further comprise:
the reinforcement learning agents of the different distributed equipment have their own state spaces and action spaces, and each agent can update its parameters according to its own objective to achieve adaptive learning.
6. The multi-agent reinforcement learning method for distributed resource collaborative scheduling according to claim 5, characterized in that the state space comprises:
S = {P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a, t}
wherein P_load,|load|, P_pv,|pv|,max, P_EVA,|EVA|,max, P_mt,|mt|, SOC_ess,|ess|, a and t are respectively the power of the electric loads, the output upper limit of the distributed photovoltaic equipment, the output of the electric vehicle aggregator, the output of the conventional units, the SOC of the energy storage, the grid electricity price at the current moment and the scheduling time step.
7. The multi-agent reinforcement learning method for distributed resource co-scheduling according to claim 5 or 6, wherein the action space comprises:
a_1 = a_ess,|ess|
a_2 = a_pv,|pv|
a_3 = a_EVA,|EVA|
wherein a_ess,|ess|, a_pv,|pv| and a_EVA,|EVA| respectively represent the real-time outputs of the energy storage, the distributed photovoltaic equipment and the electric vehicle aggregator given by the model; the neural network outputs values in the range [-1, 1], which must be mapped back to the real action space according to the actual physical constraints.
8. The multi-agent reinforcement learning method for distributed resource co-scheduling of claim 1, wherein the interactive training comprises:
the historical source-load data in the simulation environment are used as samples to interact with the agents; the reinforcement learning agents of the distributed equipment take actions and learn according to the current state of the power distribution network, and update their policies by gradient descent on the reward values fed back by the simulation environment of the distribution network with distributed equipment, so as to explore decisions that maximize the reward value.
9. The multi-agent reinforcement learning method for distributed resource co-scheduling according to claim 2 or 8, wherein the reward value comprises: a reward function for the distributed photovoltaic equipment, a reward function for the energy storage equipment and a reward function for the electric vehicle aggregator;
the distributed photovoltaic device reward function is:
r_1 = r_normal + a·P_pv,out + b·P_pv,delta
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_pv,delta represents the spare capacity of the photovoltaic cluster, P_pv,out represents the output power of the photovoltaic cluster, a represents the time-of-use electricity price, and b represents the discount coefficient;
the energy storage device reward function is:
r_2 = r_normal + a_1·η_1·P_ess,in + a_2·η_2·P_ess,out
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_ess,in represents the charging power and is expressed as a negative value, P_ess,out represents the discharging power, a_1 represents the electricity purchase price, η_1 represents the charging efficiency, a_2 represents the electricity selling price, and η_2 represents the discharging efficiency;
the electric vehicle aggregator reward function is:
r_3 = r_normal + a_1·P_EVA,in + r_DSO
wherein r_normal represents the reward for safe and stable operation of the power grid and is negative when the grid is unsafe, P_EVA,in represents the charging power purchased from the grid by the electric vehicle aggregator and is expressed as a negative value, a_1 represents the electricity purchase price, and r_DSO represents the reward given by the grid for the electric vehicle aggregator's participation in grid peak regulation and frequency modulation.
10. The multi-agent reinforcement learning method for co-scheduling of distributed resources according to any one of claims 1, 4 and 8, wherein making decisions through the trained agents comprises:
connecting the trained agents to the power distribution network environment, analyzing in real time the data collected by the grid data acquisition system, and making decisions according to the state information of the current distribution network, including the output of the conventional units, the predicted power of the new energy sources, and the current output power, load and energy storage state-of-charge quantities.
CN202310401017.7A 2023-04-14 2023-04-14 Multi-agent reinforcement learning method for distributed resource cooperative scheduling Pending CN116542137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310401017.7A CN116542137A (en) 2023-04-14 2023-04-14 Multi-agent reinforcement learning method for distributed resource cooperative scheduling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310401017.7A CN116542137A (en) 2023-04-14 2023-04-14 Multi-agent reinforcement learning method for distributed resource cooperative scheduling

Publications (1)

Publication Number Publication Date
CN116542137A true CN116542137A (en) 2023-08-04

Family

ID=87456815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310401017.7A Pending CN116542137A (en) 2023-04-14 2023-04-14 Multi-agent reinforcement learning method for distributed resource cooperative scheduling

Country Status (1)

Country Link
CN (1) CN116542137A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117335439A (en) * 2023-11-30 2024-01-02 国网浙江省电力有限公司 Multi-load resource joint scheduling method and system
CN117335439B (en) * 2023-11-30 2024-02-27 国网浙江省电力有限公司 Multi-load resource joint scheduling method and system

Similar Documents

Publication Publication Date Title
CN109492815B (en) Energy storage power station site selection and volume fixing optimization method for power grid under market mechanism
CN109103912B (en) Industrial park active power distribution system scheduling optimization method considering power grid peak regulation requirements
CN111884213A (en) Power distribution network voltage adjusting method based on deep reinforcement learning algorithm
CN112186743A (en) Dynamic power system economic dispatching method based on deep reinforcement learning
CN112287463A (en) Fuel cell automobile energy management method based on deep reinforcement learning algorithm
CN107453381B (en) Electric car cluster power regulating method and system based on two stages cross-over control
CN112821465B (en) Industrial microgrid load optimization scheduling method and system containing cogeneration
CN116345577B (en) Wind-light-storage micro-grid energy regulation and optimization method, device and storage medium
CN112217195B (en) Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology
CN112633571A (en) LSTM-based ultrashort-term load prediction method under source network load interaction environment
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
CN113326994A (en) Virtual power plant energy collaborative optimization method considering source load storage interaction
CN114331059A (en) Electricity-hydrogen complementary park multi-building energy supply system and coordinated scheduling method thereof
CN116542137A (en) Multi-agent reinforcement learning method for distributed resource cooperative scheduling
CN115986834A (en) Near-end strategy optimization algorithm-based optical storage charging station operation optimization method and system
CN112821432A (en) Double-layer multi-position configuration method of energy storage system under wind and light access
CN115395539A (en) Shared energy storage operation control method considering customized power service
CN114169916A (en) Market member quotation strategy making method suitable for novel power system
CN114123256A (en) Distributed energy storage configuration method and system adaptive to random optimization decision
CN111799820B (en) Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system
CN116454920A (en) Power distribution network frequency modulation method, device, equipment and storage medium
CN110599032A (en) Deep Steinberg self-adaptive dynamic game method for flexible power supply
Hu et al. Energy management for microgrids using a reinforcement learning algorithm
CN112564151B (en) Multi-microgrid cloud energy storage optimization scheduling method and system considering privacy awareness
CN111668879A (en) High-permeability power distribution network scheduling model based on C-PSO algorithm and calculation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination