CN117833285A - Micro-grid energy storage optimization scheduling method based on deep reinforcement learning - Google Patents

Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Info

Publication number
CN117833285A
CN117833285A
Authority
CN
China
Prior art keywords
energy storage
grid
micro
power
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311618193.2A
Other languages
Chinese (zh)
Inventor
邓立
环加飞
王伟
张伟韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Grid Co Ltd
Original Assignee
North China Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Grid Co Ltd filed Critical North China Grid Co Ltd
Priority to CN202311618193.2A priority Critical patent/CN117833285A/en
Publication of CN117833285A publication Critical patent/CN117833285A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32 Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a micro-grid energy storage optimization scheduling method based on deep reinforcement learning, and belongs to the technical field of electrical engineering. A micro-grid energy storage optimization scheduling model based on deep reinforcement learning is proposed: micro-grid scheduling with energy storage optimal operation is modeled, economy is taken as the scheduling objective, and energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid are considered. The micro-grid scheduling model is converted into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, then introduces the state transition probability and the discount factor; together these elements define a complete Markov decision process. Policy optimization process: after training, an optimized strategy for actual micro-grid dispatch is obtained. The DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward.

Description

Micro-grid energy storage optimization scheduling method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of electrical engineering, and particularly relates to a micro-grid energy storage optimization scheduling method based on deep reinforcement learning.
Background
A micro-grid is a small power system that can either run in isolation from the main grid or connect to it and exchange energy. The uncertainty of renewable energy sources and fluctuations in load demand cause energy fluctuation problems for micro-grids.
Optimal dispatching of the micro-grid is an important strategy: it can effectively mitigate power fluctuations and maintain the balance between renewable generation and load within the micro-grid. In previous studies, load demand over a future period is predicted with advanced load forecasting models, and these forecasts are used to optimize scheduling decisions and ensure adequate power supply. Similarly, accurate forecasts of renewable output (e.g., solar and wind) are essential; they determine when and how much renewable energy will be available and thereby guide the allocation of power. However, forecasts contain errors, which degrade the accuracy of scheduling decisions. Energy supply and demand can also be balanced by starting standby generators, adjusting loads, or operating the energy storage system.
Conventional algorithms can solve the micro-grid optimal scheduling problem, especially when complexity is low or computational resources are limited. They include linear programming, integer programming, greedy algorithms and genetic algorithms, which may be used alone or in combination depending on the particular requirements and constraints of the micro-grid. In practice, mathematical modeling and algorithm design usually have to be combined to guarantee good performance under different conditions; this process requires building complex mathematical models, and the solution efficiency is low.
Deep reinforcement learning offers significant benefits for the micro-grid optimal scheduling problem. The operating environment of a micro-grid involves constantly changing external factors such as weather, energy prices and load demand; deep reinforcement learning can adapt to these changes, automatically learning and updating its policy to meet the scheduling requirements of different scenarios. Micro-grid scheduling problems typically contain a large number of device state variables and decision variables, and the problem itself can be very complex; deep neural networks can handle high-dimensional state and decision spaces and capture nonlinear relationships, modeling the micro-grid system more faithfully. A deep reinforcement learning agent makes autonomous decisions without manual intervention, so the micro-grid can act in a real-time environment without predefined complex rules and policies. During learning, the agent explores to find new strategies while exploiting known effective ones, which helps overcome local optima in the micro-grid problem. Once trained, the agent can be deployed in the micro-grid system to make scheduling decisions in real time, reducing the need for manual intervention.
However, existing micro-grid optimal scheduling methods do not consider making energy storage optimal operation applicable across diverse scenarios, nor do they address the limitations imposed by micro-grid prediction errors and by traditional algorithms.
Disclosure of Invention
The invention aims to provide a micro-grid energy storage optimization scheduling method based on deep reinforcement learning, so as to solve the above technical problems of existing micro-grid optimal scheduling.
In order to achieve the above purpose, the technical solution of the invention is as follows:
a micro-grid energy storage optimization scheduling method based on deep reinforcement learning comprises the following steps:
modeling micro-grid scheduling with energy storage optimal operation, taking economy as the planning objective, and considering energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid, a micro-grid energy storage optimization scheduling model based on deep reinforcement learning is proposed;
converting the micro-grid scheduling model into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, and introduces the state transition probability and the discount factor; these elements are combined to define a complete Markov decision process;
after training, an optimized strategy for actual micro-grid dispatch is obtained; the DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward; the Actor and Critic networks are trained iteratively until the policy converges or a predetermined number of training steps is reached.
Further, the objective function of the micro-grid energy storage optimization scheduling model is as follows:
Optimal scheduling of the micro-grid must satisfy economy, i.e., the operating cost is minimized. The operating cost comprises the wind/solar curtailment cost, the gas turbine operating cost, the pollutant emission cost, and the cost of electricity the micro-grid purchases from the main grid; energy storage optimal operation is also taken into account.
1) Operating cost;
where P_R(t) is the power output of the renewable energy sources in the system during period t; P_R^max(t) is the maximum renewable generation output, obtained from day-ahead prediction; C_R is the wind/solar curtailment cost per unit power; to fully accommodate renewable energy, C_R can be set large enough that curtailment occurs only when the constraints cannot otherwise be satisfied; P_G,k(t) is the active power output of the k-th dispatchable generator in the system during period t; N_G is the number of dispatchable generators; a_k, b_k and c_k are the generation cost coefficients of the k-th controllable generator; α_em is the emission cost per unit generated power; P_line(t) is the power on the tie line between the micro-grid and the main grid during period t; E_p(t) is the real-time electricity price for period t;
Formula (1) defines the operating cost as the sum of the wind/solar curtailment cost, the operating and emission costs of the controllable generators, and the cost of electricity purchased from the upstream grid;
2) Energy storage system operation penalty;
SOC(t+1)=SOC(t)+ΔSOC(t) (2)
where SOC(t) is the state of charge of the energy storage system during period t; ΔSOC(t) is the change in state of charge over period t; P_B(t) is the charge/discharge power of the energy storage system during period t; η_dis and η_ch are the discharging and charging efficiencies, respectively; E_B is the capacity of the energy storage system; P_B(t) < 0 denotes the discharging state and P_B(t) > 0 the charging state; C_soc(t) is the energy storage system operation penalty; SOC_max and SOC_min are the upper and lower state-of-charge limits of the optimal operating range of the energy storage system; λ_1, λ_2, λ_3 and λ_4 are penalty coefficients.
Further, constraint conditions of the micro-grid energy storage optimization scheduling model are as follows:
1) Wind and solar constraint;
2) Controllable generator constraint;
where P_G,k^max and P_G,k^min are the upper and lower output limits of the controllable generator, respectively; CR_I and CR_D denote the maximum allowed increase and decrease of controllable generator output per unit time, i.e., the ramp-rate limits;
3) Energy storage system constraint;
SOC_min ≤ SOC(t) ≤ SOC_max (8)
-P_N ≤ P_B(t) ≤ P_N (9)
where P_N is the upper limit of the charge/discharge power of the energy storage system;
4) Power flow constraints;
where P_line^max and P_line^min are the upper and lower limits of the power on the tie line, respectively; U_j is the voltage at node j; U_N is the nominal node voltage.
Further, the energy storage optimization scheduling problem of the micro-grid is converted into an MDP, and the core elements of the reinforcement learning problem are defined: the state space, the action space and the reward function. The MDP provides a formal way to represent the problem and is used to build the mathematical model of the micro-grid optimization problem. The scheduling agent is trained by interacting with the micro-grid environment and learns the optimal policy with the DDPG algorithm. During training, various scenarios are simulated, including different load demands, renewable output variations and electricity price fluctuations, to improve the robustness of the agent;
wherein the state space comprises the relevant electrical quantities in the micro-grid: the scheduling period t; the real-time electricity price E_p(t), the maximum renewable output P_R^max(t) and the load demand P_load(t) for period t; and the battery state of charge SOC(t-1) and controllable generator output P_G(t-1) at time t-1; the state space is defined as s_t.
The action space comprises the available control actions: the renewable output P_R(t), the energy storage charge/discharge power P_B(t) and the controllable generator set output P_G(t) for period t; the action space is defined as a_t:
a_t = (P_R(t), P_B(t), P_G(t))^T (13)
The reward function is used to evaluate system performance; it equals the negative of the sum of the operating cost and the energy storage system operation penalty. The instant reward is defined as r_{t+1}(s_t, a_t), and the cumulative reward from the start to the end of training is defined as R_t:
r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t) (16)
R_t = r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ^2 r(s_{t+2}, a_{t+2}) + … + γ^{T-t} r(s_T, a_T) (17)
where γ ∈ (0, 1) is the reward discount factor, which also reflects the relative importance of current versus future rewards;
Furthermore, for a policy π, a single exploration trajectory starts from state s_t and executes action a_t; the expected cumulative reward obtained during the exploration is described by the action-value function V_π(s, a);
Further, in the DDPG algorithm, the fitting capability of deep neural networks is used to map states to action policies and state-action pairs to value functions. The DDPG algorithm from deep reinforcement learning is adopted to solve the micro-grid scheduling optimization problem: background data processing collects the current state information s_{t+1}, the return value r_t and the previous state information s_t to form a sample tuple, which is stored in a data pool; M sample tuples (s_t, a_t, r_t, s_{t+1}), t = 1, 2, …, M, are resampled from the data pool and stored in the experience pool for training, i.e., experience replay;
In the optimization process, first, the action prediction and the corresponding target evaluation value are computed with the not-yet-updated target network parameters, yielding the loss function L for training the evaluation network, and the parameters of the main evaluation network are updated; second, the parameters of the main policy network and of the target policy and evaluation networks are updated by training the deep neural networks; the current action value is then obtained from the updated target network and output to the micro-grid control module; finally, the state information of the micro-grid at time t+1 is collected as a new sample for the next round of learning and computation.
Compared with the prior art, the invention has the following beneficial effects:
the micro-grid energy storage optimization scheduling model based on deep reinforcement learning is provided by taking the micro-grid scheduling modeling of energy storage optimization operation into consideration, taking economy as a planning target and taking the energy storage optimization operation, equipment constraint and tide constraint in the micro-grid into consideration; the problem that the energy storage optimization operation is suitable for various scenes is considered, and meanwhile, the limitation of the prediction data error of the micro-grid and the limitation of the traditional algorithm are also considered.
Drawings
FIG. 1 is a schematic diagram of an interactive process of dispatching agents and micro grid energy systems based on reinforcement learning.
Fig. 2 is a schematic diagram of a network framework for scheduling policy optimization using DDPG algorithm.
Detailed Description
For the purpose of making the technical solution and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention. It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The invention provides a micro-grid energy storage optimization scheduling method based on deep reinforcement learning, which comprises the following steps:
modeling micro-grid scheduling with energy storage optimal operation, taking economy as the planning objective, and considering energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid, a micro-grid energy storage optimization scheduling model based on deep reinforcement learning is proposed;
converting the micro-grid scheduling model into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, and introduces the state transition probability and the discount factor; these elements are combined to define a complete Markov decision process;
after training, an optimized strategy for actual micro-grid dispatch is obtained; the DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward; the Actor and Critic networks are trained iteratively until the policy converges or a predetermined number of training steps is reached.
Further, the objective function of the micro-grid energy storage optimization scheduling model is as follows:
Optimal scheduling of the micro-grid must satisfy economy, i.e., the operating cost is minimized. The operating cost comprises the wind/solar curtailment cost, the gas turbine operating cost, the pollutant emission cost, and the cost of electricity the micro-grid purchases from the main grid; energy storage optimal operation is also taken into account.
1) Operating cost;
where P_R(t) is the power output of the renewable energy sources in the system during period t; P_R^max(t) is the maximum renewable generation output, obtained from day-ahead prediction; C_R is the wind/solar curtailment cost per unit power; to fully accommodate renewable energy, C_R can be set large enough that curtailment occurs only when the constraints cannot otherwise be satisfied; P_G,k(t) is the active power output of the k-th dispatchable generator in the system during period t; N_G is the number of dispatchable generators; a_k, b_k and c_k are the generation cost coefficients of the k-th controllable generator; α_em is the emission cost per unit generated power; P_line(t) is the power on the tie line between the micro-grid and the main grid during period t; E_p(t) is the real-time electricity price for period t;
Formula (1) defines the operating cost as the sum of the wind/solar curtailment cost, the operating and emission costs of the controllable generators, and the cost of electricity purchased from the upstream grid;
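For reference, a plausible reconstruction of the operating cost Z_0 of formula (1) from the variable definitions above; the quadratic generation-cost form a_k P^2 + b_k P + c_k is an assumption:

Z_0 = \sum_{t=1}^{T} \left[ C_R \left( P_R^{max}(t) - P_R(t) \right) + \sum_{k=1}^{N_G} \left( a_k P_{G,k}^2(t) + b_k P_{G,k}(t) + c_k \right) + \alpha_{em} \sum_{k=1}^{N_G} P_{G,k}(t) + E_p(t) P_{line}(t) \right]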
2) Energy storage system operation penalty;
SOC(t+1)=SOC(t)+ΔSOC(t) (2)
where SOC(t) is the state of charge of the energy storage system during period t; ΔSOC(t) is the change in state of charge over period t; P_B(t) is the charge/discharge power of the energy storage system during period t; η_dis and η_ch are the discharging and charging efficiencies, respectively; E_B is the capacity of the energy storage system; P_B(t) < 0 denotes the discharging state and P_B(t) > 0 the charging state; C_soc(t) is the energy storage system operation penalty; SOC_max and SOC_min are the upper and lower state-of-charge limits of the optimal operating range of the energy storage system; λ_1, λ_2, λ_3 and λ_4 are penalty coefficients.
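A plausible explicit form of the state-of-charge change ΔSOC(t), implied by the sign convention and efficiency definitions above (Δt is the scheduling interval), together with a simple range-penalty form of C_soc(t); the exact piecewise structure of the penalty and the roles of λ_3 and λ_4 are assumptions:

\Delta SOC(t) = \begin{cases} \eta_{ch} P_B(t) \Delta t / E_B, & P_B(t) \ge 0 \text{ (charging)} \\ P_B(t) \Delta t / (\eta_{dis} E_B), & P_B(t) < 0 \text{ (discharging)} \end{cases}

C_{soc}(t) \approx \lambda_1 \max(0, SOC(t) - SOC_{max}) + \lambda_2 \max(0, SOC_{min} - SOC(t))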
Further, constraint conditions of the micro-grid energy storage optimization scheduling model are as follows:
1) Wind and solar constraint;
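A plausible reconstruction of this constraint, bounding the scheduled renewable output by its day-ahead predicted maximum:

0 \le P_R(t) \le P_R^{max}(t)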
2) Controllable generator constraint;
where P_G,k^max and P_G,k^min are the upper and lower output limits of the controllable generator, respectively; CR_I and CR_D denote the maximum allowed increase and decrease of controllable generator output per unit time, i.e., the ramp-rate limits;
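A plausible reconstruction of the controllable generator constraints, combining the output limits with the ramp-rate limits (Δt is the scheduling interval):

P_{G,k}^{min} \le P_{G,k}(t) \le P_{G,k}^{max}

-CR_D \Delta t \le P_{G,k}(t) - P_{G,k}(t-1) \le CR_I \Delta t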
3) Energy storage system constraint;
SOC_min ≤ SOC(t) ≤ SOC_max (8)
-P_N ≤ P_B(t) ≤ P_N (9)
where P_N is the upper limit of the charge/discharge power of the energy storage system;
4) Power flow constraints;
where P_line^max and P_line^min are the upper and lower limits of the power on the tie line, respectively; U_j is the voltage at node j; U_N is the nominal node voltage.
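A plausible reconstruction of the power flow constraints; the node-voltage band ΔU around the nominal value U_N is an assumption:

P_{line}^{min} \le P_{line}(t) \le P_{line}^{max}

|U_j - U_N| \le \Delta U \cdot U_N \quad \text{for every node } j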
Further, the energy storage optimization scheduling problem of the micro-grid is converted into an MDP (as shown in fig. 1), and the core elements of the reinforcement learning problem are defined: the state space, the action space and the reward function. The MDP provides a formal way to represent the problem and is used to build the mathematical model of the micro-grid optimization problem. The scheduling agent is trained by interacting with the micro-grid environment and learns the optimal policy with the DDPG algorithm. During training, various scenarios are simulated, including different load demands, renewable output variations and electricity price fluctuations, to improve the robustness of the agent; a minimal environment sketch is given after this paragraph.
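A minimal sketch of such a training environment in Python, assuming a gym-style reset/step interface; the class name MicrogridEnv, all numeric parameters and the simplified single-bus power balance are illustrative assumptions, not the patented implementation:

import numpy as np

class MicrogridEnv:
    """Gym-style micro-grid environment sketch; state, action and reward
    follow the definitions of eqs. (12), (13) and (16) below."""

    def __init__(self, prices, renew_max, load):
        self.prices = np.asarray(prices, dtype=np.float32)        # E_p(t)
        self.renew_max = np.asarray(renew_max, dtype=np.float32)  # P_R^max(t)
        self.load = np.asarray(load, dtype=np.float32)            # P_load(t)
        self.horizon = len(self.load)
        # illustrative parameters (all numeric values are assumed)
        self.c_r, self.a, self.b, self.c = 5.0, 0.02, 0.5, 1.0    # C_R, a_k, b_k, c_k
        self.alpha_em, self.eta_ch, self.eta_dis = 0.1, 0.95, 0.95
        self.e_b, self.soc_min, self.soc_max, self.lam = 10.0, 0.2, 0.8, 10.0

    def reset(self):
        self.t, self.soc, self.p_g = 0, 0.5, 0.0
        return self._state()

    def _state(self):
        # s_t = (t, E_p(t), P_R^max(t), P_load(t), SOC(t-1), P_G(t-1))
        return np.array([self.t, self.prices[self.t], self.renew_max[self.t],
                         self.load[self.t], self.soc, self.p_g], dtype=np.float32)

    def step(self, action):
        p_r, p_b, p_g = action      # a_t = (P_R(t), P_B(t), P_G(t))
        # tie-line power balances load against local generation and storage
        p_line = self.load[self.t] - p_r - p_g + p_b
        cost = (self.c_r * max(self.renew_max[self.t] - p_r, 0.0)  # curtailment
                + self.a * p_g ** 2 + self.b * p_g + self.c        # generation
                + self.alpha_em * p_g                              # emissions
                + self.prices[self.t] * max(p_line, 0.0))          # purchase
        # SOC update (eq. (2)); P_B > 0 charges, P_B < 0 discharges
        self.soc += (self.eta_ch * p_b if p_b >= 0 else p_b / self.eta_dis) / self.e_b
        penalty = self.lam * (max(self.soc - self.soc_max, 0.0)
                              + max(self.soc_min - self.soc, 0.0))
        reward = -cost - penalty    # eq. (16): negative cost and storage penalty
        self.t += 1
        self.p_g = p_g
        done = self.t >= self.horizon
        return (None if done else self._state()), reward, done

Different price, renewable and load profiles can be passed to the constructor to realize the scenario variation described above.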
wherein the state space comprises the relevant electrical quantities in the micro-grid: the scheduling period t; the real-time electricity price E_p(t), the maximum renewable output P_R^max(t) and the load demand P_load(t) for period t; and the battery state of charge SOC(t-1) and controllable generator output P_G(t-1) at time t-1; the state space is defined as s_t.
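Written out from the definitions above, the state vector is:

s_t = \left( t, E_p(t), P_R^{max}(t), P_{load}(t), SOC(t-1), P_G(t-1) \right)^T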
The action space comprises the available control actions: the renewable output P_R(t), the energy storage charge/discharge power P_B(t) and the controllable generator set output P_G(t) for period t; the action space is defined as a_t:
a_t = (P_R(t), P_B(t), P_G(t))^T (13)
The reward function is used to evaluate system performance; it equals the negative of the sum of the operating cost and the energy storage system operation penalty. The instant reward is defined as r_{t+1}(s_t, a_t), and the cumulative reward from the start to the end of training is defined as R_t:
r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t) (16)
R_t = r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ^2 r(s_{t+2}, a_{t+2}) + … + γ^{T-t} r(s_T, a_T) (17)
where γ ∈ (0, 1) is the reward discount factor, which also reflects the relative importance of current versus future rewards;
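A short sketch of the instant reward (16) and the discounted return (17) in Python; z0 and c_soc stand for the period values of Z_0 and C_soc(t) and are assumed inputs:

def instant_reward(z0, c_soc):
    # r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t), eq. (16)
    return -z0 - c_soc

def discounted_return(rewards, gamma=0.99):
    # R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., eq. (17)
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret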
Furthermore, for a policy π, a single exploration trajectory starts from state s_t and executes action a_t; the expected cumulative reward obtained during the exploration is described by the action-value function V_π(s, a);
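In standard notation, this action-value function is the expected cumulative reward obtained by starting from state s, executing action a and following policy π thereafter:

V_\pi(s, a) = \mathbb{E}_\pi \left[ R_t \mid s_t = s, a_t = a \right]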
Further, as shown in fig. 2, in the DDPG algorithm, the fitting capability of deep neural networks is used to map states to action policies and state-action pairs to value functions. The DDPG algorithm from deep reinforcement learning is adopted to solve the micro-grid scheduling optimization problem: background data processing collects the current state information s_{t+1}, the return value r_t and the previous state information s_t to form a sample tuple, which is stored in a data pool; M sample tuples (s_t, a_t, r_t, s_{t+1}), t = 1, 2, …, M, are resampled from the data pool and stored in the experience pool for training, i.e., experience replay, as sketched below;
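A minimal experience replay sketch in Python; the capacity and batch size M are illustrative:

import random
from collections import deque

class ReplayBuffer:
    """Stores transition tuples (s_t, a_t, r_t, s_{t+1}) in a bounded data
    pool and resamples M of them uniformly for training."""

    def __init__(self, capacity=100000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, m=64):
        # uniform resampling of M sample tuples, i.e. experience replay
        return random.sample(list(self.pool), m)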
In the optimization process, first, the action prediction and the corresponding target evaluation value are computed with the not-yet-updated target network parameters, yielding the loss function L for training the evaluation network, and the parameters of the main evaluation network are updated; second, the parameters of the main policy network and of the target policy and evaluation networks are updated by training the deep neural networks; the current action value is then obtained from the updated target network and output to the micro-grid control module; finally, the state information of the micro-grid at time t+1 is collected as a new sample for the next round of learning and computation; a sketch of one update round follows.
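A sketch of one DDPG update round in PyTorch under the procedure just described; the network architectures, hidden sizes and the soft-update rate tau are assumptions, not the patented implementation:

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, h), nn.ReLU(),
                                 nn.Linear(h, a_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, h), nn.ReLU(),
                                 nn.Linear(h, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch  # tensors assembled from M sampled transitions
    # 1) Critic: target value from the not-yet-updated target networks,
    #    then minimize the loss L of the main evaluation network
    with torch.no_grad():
        y = r + gamma * critic_targ(s_next, actor_targ(s_next))
    loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); loss.backward(); critic_opt.step()
    # 2) Actor: deterministic policy gradient on the main policy network
    policy_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); policy_loss.backward(); actor_opt.step()
    # 3) Soft update of the target policy and evaluation networks
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_t in zip(net.parameters(), targ.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# usage sketch (dimensions from the state/action definitions above):
# actor, actor_targ = Actor(6, 3), Actor(6, 3)
# critic, critic_targ = Critic(6, 3), Critic(6, 3)
# actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
# critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

In a full training loop this update is repeated after each environment step, with exploration noise added to the actor's output.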
The above is a preferred embodiment of the present invention; all changes made according to the technical solution of the present invention whose functional effects do not exceed the scope of that technical solution belong to the protection scope of the present invention.

Claims (5)

1. The micro-grid energy storage optimization scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
modeling micro-grid scheduling with energy storage optimal operation, taking economy as the scheduling objective, and considering energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid, proposing a micro-grid energy storage optimization scheduling model based on deep reinforcement learning;
converting the micro-grid scheduling model into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, and introduces the state transition probability and the discount factor; these elements are combined to define a complete Markov decision process;
after training, obtaining an optimized strategy for actual micro-grid dispatch; the DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward; the Actor and Critic networks are trained iteratively until the policy converges or a predetermined number of training steps is reached.
2. The deep reinforcement learning-based micro-grid energy storage optimization scheduling method according to claim 1, wherein an objective function of a micro-grid energy storage optimization scheduling model is as follows:
the optimal scheduling of the micro-grid is required to meet the economy, namely the operation cost is minimized; the operation cost comprises the wind-discarding/light-emitting cost, the gas turbine operation cost, the pollution gas emission cost and the cost of purchasing electricity from the main power grid by the micro power grid; wherein, the energy storage optimization operation is also considered;
1) Running cost;
wherein P is R (t) generating output in t time period for renewable energy sources in the system;for period tThe maximum value of the power generated by the new energy is obtained by prediction before the day; c (C) R The wind and light discarding cost is the unit power; to fully eliminate new energy, C can be used R Is set large enough to ensure that the power discarding is generated only when the constraint condition cannot be met; p (P) G,k (t) is the active power output of the kth adjustable generator in the system during period t; n (N) G The number of the generators can be adjusted; a, a k 、b k 、c k The power generation cost coefficient of the kth controllable generator is set; alpha em The discharge cost is the unit generated power; p (P) line (t) is the power on the tie line between the microgrid and the main grid for period t; e (E) p (t) is the real-time electricity price of the period t;
the formula (1) defines the operation cost as the sum of the wind and light discarding cost, the operation cost and the emission cost of the controllable generator and the electricity purchasing cost of the upper power grid;
2) Energy storage system operation penalty;
SOC(t+1)=SOC(t)+ΔSOC(t) (2)
where SOC(t) is the state of charge of the energy storage system during period t; ΔSOC(t) is the change in state of charge over period t; P_B(t) is the charge/discharge power of the energy storage system during period t; η_dis and η_ch are the discharging and charging efficiencies, respectively; E_B is the capacity of the energy storage system; P_B(t) < 0 denotes the discharging state and P_B(t) > 0 the charging state; C_soc(t) is the energy storage system operation penalty; SOC_max and SOC_min are the upper and lower state-of-charge limits of the optimal operating range of the energy storage system; λ_1, λ_2, λ_3 and λ_4 are penalty coefficients.
3. The deep reinforcement learning-based micro-grid energy storage optimization scheduling method according to claim 2, wherein constraint conditions of a micro-grid energy storage optimization scheduling model are as follows:
1) Wind and solar constraint;
2) Controllable generator constraint;
where P_G,k^max and P_G,k^min are the upper and lower output limits of the controllable generator, respectively; CR_I and CR_D denote the maximum allowed increase and decrease of controllable generator output per unit time, i.e., the ramp-rate limits;
3) Energy storage system constraint;
SOC_min ≤ SOC(t) ≤ SOC_max (8)
-P_N ≤ P_B(t) ≤ P_N (9)
where P_N is the upper limit of the charge/discharge power of the energy storage system;
4) Power flow constraints;
where P_line^max and P_line^min are the upper and lower limits of the power on the tie line, respectively; U_j is the voltage at node j; U_N is the nominal node voltage.
4. The micro-grid energy storage optimization scheduling method based on deep reinforcement learning according to claim 3, wherein the energy storage optimization scheduling problem of the micro-grid is converted into an MDP and the core elements of the reinforcement learning problem are defined: the state space, the action space and the reward function; the MDP provides a formal way to represent the problem and is used to build the mathematical model of the micro-grid optimization problem; the scheduling agent is trained by interacting with the micro-grid environment and learns the optimal policy with the DDPG algorithm; during training, various scenarios are simulated, including different load demands, renewable output variations and electricity price fluctuations, to improve the robustness of the agent;
wherein the state space comprises the relevant electrical quantities in the micro-grid: the scheduling period t; the real-time electricity price E_p(t), the maximum renewable output P_R^max(t) and the load demand P_load(t) for period t; and the battery state of charge SOC(t-1) and controllable generator output P_G(t-1) at time t-1; the state space is defined as s_t;
the action space comprises the available control actions: the renewable output P_R(t), the energy storage charge/discharge power P_B(t) and the controllable generator set output P_G(t) for period t; the action space is defined as a_t:
a_t = (P_R(t), P_B(t), P_G(t))^T (13)
the reward function is used to evaluate system performance; it equals the negative of the sum of the operating cost and the energy storage system operation penalty; the instant reward is defined as r_{t+1}(s_t, a_t), and the cumulative reward from the start to the end of training is defined as R_t:
r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t) (16)
R_t = r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ^2 r(s_{t+2}, a_{t+2}) + … + γ^{T-t} r(s_T, a_T) (17)
where γ ∈ (0, 1) is the reward discount factor, which also reflects the relative importance of current versus future rewards;
furthermore, for a policy π, a single exploration trajectory starts from state s_t and executes action a_t; the expected cumulative reward obtained during the exploration is described by the action-value function V_π(s, a);
5. The micro-grid energy storage optimization scheduling method based on deep reinforcement learning according to claim 4, wherein in the DDPG algorithm the fitting capability of deep neural networks is used to map states to action policies and state-action pairs to value functions; the DDPG algorithm from deep reinforcement learning is adopted to solve the micro-grid scheduling optimization problem: background data processing collects the current state information s_{t+1}, the return value r_t and the previous state information s_t to form a sample tuple, which is stored in a data pool; M sample tuples (s_t, a_t, r_t, s_{t+1}), t = 1, 2, …, M, are resampled from the data pool and stored in the experience pool for training, i.e., experience replay;
in the optimization process, first, the action prediction and the corresponding target evaluation value are computed with the not-yet-updated target network parameters, yielding the loss function L for training the evaluation network, and the parameters of the main evaluation network are updated; second, the parameters of the main policy network and of the target policy and evaluation networks are updated by training the deep neural networks; the current action value is then obtained from the updated target network and output to the micro-grid control module; finally, the state information of the micro-grid at time t+1 is collected as a new sample for the next round of learning and computation.
CN202311618193.2A 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning Pending CN117833285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618193.2A CN117833285A (en) 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311618193.2A CN117833285A (en) 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN117833285A true CN117833285A (en) 2024-04-05

Family

ID=90504796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311618193.2A Pending CN117833285A (en) 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117833285A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118316024A (en) * 2024-04-07 2024-07-09 国家电网有限公司 Electric power intelligent scheduling method based on artificial intelligence
CN118473021A (en) * 2024-07-10 2024-08-09 格瓴新能源科技(杭州)有限公司 Micro-grid optimal scheduling method and system combining CMA-ES algorithm and DDPG algorithm
CN118504417A (en) * 2024-07-09 2024-08-16 暨南大学 Reinforced learning optimization scheduling method and system considering scheduling experience
CN118572795A (en) * 2024-07-10 2024-08-30 格瓴新能源科技(杭州)有限公司 Micro-grid group optimal scheduling method and system based on MADDPG and pareto front edge combination
CN118657264A (en) * 2024-08-21 2024-09-17 中国电力科学研究院有限公司 Energy Internet optimization operation method and related device
CN118710067A (en) * 2024-08-29 2024-09-27 中国特种设备检测研究院 Information multiplexing method and system based on deep learning


Similar Documents

Publication Publication Date Title
Machlev et al. A review of optimal control methods for energy storage systems-energy trading, energy balancing and electric vehicles
CN117833285A (en) Micro-grid energy storage optimization scheduling method based on deep reinforcement learning
CN114725936A (en) Power distribution network optimization method based on multi-agent deep reinforcement learning
Vosoogh et al. An intelligent day ahead energy management framework for networked microgrids considering high penetration of electric vehicles
CN113098007B (en) Distributed online micro-grid scheduling method and system based on layered reinforcement learning
Varzaneh et al. Optimal energy management for PV‐integrated residential systems including energy storage system
CN117277327A (en) Grid-connected micro-grid optimal energy management method based on intelligent agent
Chen et al. Optimal control strategy for solid oxide fuel cell‐based hybrid energy system using deep reinforcement learning
Zhang et al. Physical-model-free intelligent energy management for a grid-connected hybrid wind-microturbine-PV-EV energy system via deep reinforcement learning approach
CN115544871A (en) Distributed robust energy storage planning method considering renewable power supply space correlation
Khoubseresht et al. An analytical optimum method for simultaneous integration of PV, wind turbine and BESS to maximize technical benefits
CN115423153A (en) Photovoltaic energy storage system energy management method based on probability prediction
Harrold et al. Battery control in a smart energy network using double dueling deep q-networks
CN115099590A (en) Active power distribution network economic optimization scheduling method and system considering light load uncertainty
Sun et al. Interval mixed-integer programming for daily unit commitment and dispatch incorporating wind power
CN106339773A (en) Sensitivity-based active power distribution network distributed power source constant-capacity planning method
Zhang et al. Two-Step Diffusion Policy Deep Reinforcement Learning Method for Low-Carbon Multi-Energy Microgrid Energy Management
Mohammadi et al. Ai-based optimal scheduling of renewable ac microgrids with bidirectional lstm-based wind power forecasting
Sun et al. Optimal Scheduling of Wind-Photovoltaic-Pumped Storage Joint Complementary Power Generation System Based on Improved Firefly Algorithm
CN113988403A (en) Electric vehicle charging load prediction method and system
Sage et al. Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations
CN112290535A (en) Online scheduling method of electricity-gas integrated energy system based on deep strategy optimization
Jain et al. Battery optimization in microgrids using Markov decision process integrated with load and solar forecasting
Wang et al. A comprehensive survey of the application of swarm intelligent optimization algorithm in photovoltaic energy storage systems
CN117650553B (en) Multi-agent deep reinforcement learning-based 5G base station energy storage battery charge and discharge scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination