CN117833285A - Micro-grid energy storage optimization scheduling method based on deep reinforcement learning - Google Patents

Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Info

Publication number
CN117833285A
CN117833285A
Authority
CN
China
Prior art keywords
energy storage
grid
micro
power
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311618193.2A
Other languages
Chinese (zh)
Inventor
邓立
环加飞
王伟
张伟韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Grid Co Ltd
Original Assignee
North China Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Grid Co Ltd filed Critical North China Grid Co Ltd
Priority to CN202311618193.2A priority Critical patent/CN117833285A/en
Publication of CN117833285A publication Critical patent/CN117833285A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32 Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a micro-grid energy storage optimization scheduling method based on deep reinforcement learning, and belongs to the technical field of electrical engineering. A micro-grid energy storage optimization scheduling model based on deep reinforcement learning is proposed: micro-grid scheduling with energy storage optimal operation is modeled, economy is taken as the scheduling objective, and energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid are considered. The micro-grid scheduling model is converted into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, then introduces the state transition probability and the discount factor; together these elements define a complete Markov decision process. Policy optimization process: after training, an optimized strategy for actual micro-grid dispatch is obtained. The DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward.

Description

Micro-grid energy storage optimization scheduling method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of electrical engineering, and particularly relates to a micro-grid energy storage optimization scheduling method based on deep reinforcement learning.
Background
A micro-grid is a small power system that can either run in isolation from the main grid or connect to it and exchange energy. The uncertainty of renewable energy sources and fluctuations in load demand cause energy fluctuation problems for micro-grids.
Optimal dispatching of the micro-grid is an important strategy: it can effectively mitigate power fluctuations and maintain the balance between renewable generation and load within the micro-grid. In previous studies, load demand over a future period is predicted with advanced load forecasting models, and these forecasts are used to optimize scheduling decisions and ensure adequate power supply. Similarly, accurate forecasts of renewable output (e.g., solar and wind) are essential; they determine when and how much renewable energy will be available and thereby guide the allocation of power. However, forecasts contain errors, which degrade the accuracy of scheduling decisions. Energy supply and demand can also be balanced by starting standby generators, adjusting loads, or operating the energy storage system.
Conventional algorithms can solve the micro-grid optimal scheduling problem, especially when complexity is low or computational resources are limited. They include linear programming, integer programming, greedy algorithms and genetic algorithms, which may be used alone or in combination depending on the particular requirements and constraints of the micro-grid. In practice, mathematical modeling and algorithm design usually have to be combined to guarantee good performance under different conditions; this process requires building complex mathematical models, and the solution efficiency is low.
Deep reinforcement learning offers significant benefits for the micro-grid optimal scheduling problem. The operating environment of a micro-grid involves constantly changing external factors such as weather, energy prices and load demand; deep reinforcement learning can adapt to these changes, automatically learning and updating its policy to meet the scheduling requirements of different scenarios. Micro-grid scheduling problems typically contain a large number of device state variables and decision variables, and the problem itself can be very complex; deep neural networks can handle high-dimensional state and decision spaces and capture nonlinear relationships, modeling the micro-grid system more faithfully. A deep reinforcement learning agent makes autonomous decisions without manual intervention, so the micro-grid can act in a real-time environment without predefined complex rules and policies. During learning, the agent explores to find new strategies while exploiting known effective ones, which helps overcome local optima in the micro-grid problem. Once trained, the agent can be deployed in the micro-grid system to make scheduling decisions in real time, reducing the need for manual intervention.
However, existing micro-grid optimal scheduling methods do not consider making energy storage optimal operation applicable across diverse scenarios, nor do they address the limitations imposed by micro-grid prediction errors and by traditional algorithms.
Disclosure of Invention
The invention aims to provide a micro-grid energy storage optimization scheduling method based on deep reinforcement learning, so as to solve the above technical problems of existing micro-grid optimal scheduling.
In order to achieve the above purpose, the technical solution of the invention is as follows:
a micro-grid energy storage optimization scheduling method based on deep reinforcement learning comprises the following steps:
modeling micro-grid scheduling with energy storage optimal operation, taking economy as the planning objective, and considering energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid, a micro-grid energy storage optimization scheduling model based on deep reinforcement learning is proposed;
converting the micro-grid scheduling model into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, and introduces the state transition probability and the discount factor; these elements are combined to define a complete Markov decision process;
after training, an optimized strategy for actual micro-grid dispatch is obtained; the DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward; the Actor and Critic networks are trained iteratively until the policy converges or a predetermined number of training steps is reached.
Further, the objective function of the micro-grid energy storage optimization scheduling model is as follows:
Optimal scheduling of the micro-grid must satisfy economy, i.e., the operating cost is minimized. The operating cost comprises the wind/solar curtailment cost, the gas turbine operating cost, the pollutant emission cost, and the cost of electricity the micro-grid purchases from the main grid; energy storage optimal operation is also taken into account.
1) Operating cost;
where P_R(t) is the power output of the renewable energy sources in the system during period t; P_R^max(t) is the maximum renewable generation output, obtained from day-ahead prediction; C_R is the wind/solar curtailment cost per unit power; to fully accommodate renewable energy, C_R can be set large enough that curtailment occurs only when the constraints cannot otherwise be satisfied; P_G,k(t) is the active power output of the k-th dispatchable generator in the system during period t; N_G is the number of dispatchable generators; a_k, b_k and c_k are the generation cost coefficients of the k-th controllable generator; α_em is the emission cost per unit generated power; P_line(t) is the power on the tie line between the micro-grid and the main grid during period t; E_p(t) is the real-time electricity price for period t;
Formula (1) defines the operating cost as the sum of the wind/solar curtailment cost, the operating and emission costs of the controllable generators, and the cost of electricity purchased from the upstream grid;
2) Energy storage system operation penalty;
SOC(t+1)=SOC(t)+ΔSOC(t) (2)
where SOC(t) is the state of charge of the energy storage system during period t; ΔSOC(t) is the change in state of charge over period t; P_B(t) is the charge/discharge power of the energy storage system during period t; η_dis and η_ch are the discharging and charging efficiencies, respectively; E_B is the capacity of the energy storage system; P_B(t) < 0 denotes the discharging state and P_B(t) > 0 the charging state; C_soc(t) is the energy storage system operation penalty; SOC_max and SOC_min are the upper and lower state-of-charge limits of the optimal operating range of the energy storage system; λ_1, λ_2, λ_3 and λ_4 are penalty coefficients.
Further, constraint conditions of the micro-grid energy storage optimization scheduling model are as follows:
1) Wind and solar constraint;
2) Controllable generator constraint;
where P_G,k^max and P_G,k^min are the upper and lower output limits of the controllable generator, respectively; CR_I and CR_D denote the maximum allowed increase and decrease of controllable generator output per unit time, i.e., the ramp-rate limits;
3) Energy storage system constraint;
SOC_min ≤ SOC(t) ≤ SOC_max (8)
-P_N ≤ P_B(t) ≤ P_N (9)
where P_N is the upper limit of the charge/discharge power of the energy storage system;
4) Power flow constraints;
where P_line^max and P_line^min are the upper and lower limits of the power on the tie line, respectively; U_j is the voltage at node j; U_N is the nominal node voltage.
Further, the energy storage optimization scheduling problem of the micro-grid is converted into an MDP, and the core elements of the reinforcement learning problem are defined: the state space, the action space and the reward function. The MDP provides a formal way to represent the problem and is used to build the mathematical model of the micro-grid optimization problem. The scheduling agent is trained by interacting with the micro-grid environment and learns the optimal policy with the DDPG algorithm. During training, various scenarios are simulated, including different load demands, renewable output variations and electricity price fluctuations, to improve the robustness of the agent;
wherein the state space comprises the relevant electrical quantities in the micro-grid: the scheduling period t; the real-time electricity price E_p(t), the maximum renewable output P_R^max(t) and the load demand P_load(t) for period t; and the battery state of charge SOC(t-1) and controllable generator output P_G(t-1) at time t-1; the state space is defined as s_t.
The action space comprises the available control actions: the renewable output P_R(t), the energy storage charge/discharge power P_B(t) and the controllable generator set output P_G(t) for period t; the action space is defined as a_t:
a_t = (P_R(t), P_B(t), P_G(t))^T (13)
The reward function is used to evaluate system performance; it equals the negative of the sum of the operating cost and the energy storage system operation penalty. The instant reward is defined as r_{t+1}(s_t, a_t), and the cumulative reward from the start to the end of training is defined as R_t:
r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t) (16)
R_t = r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ^2 r(s_{t+2}, a_{t+2}) + … + γ^{T-t} r(s_T, a_T) (17)
where γ ∈ (0, 1) is the reward discount factor, which also reflects the relative importance of current versus future rewards;
Furthermore, for a policy π, a single exploration trajectory starts from state s_t and executes action a_t; the expected cumulative reward obtained during the exploration is described by the action-value function V_π(s, a);
Further, in the DDPG algorithm, the fitting capability of deep neural networks is used to map states to action policies and state-action pairs to value functions. The DDPG algorithm from deep reinforcement learning is adopted to solve the micro-grid scheduling optimization problem: background data processing collects the current state information s_{t+1}, the return value r_t and the previous state information s_t to form a sample tuple, which is stored in a data pool; M sample tuples (s_t, a_t, r_t, s_{t+1}), t = 1, 2, …, M, are resampled from the data pool and stored in the experience pool for training, i.e., experience replay;
In the optimization process, first, the action prediction and the corresponding target evaluation value are computed with the not-yet-updated target network parameters, yielding the loss function L for training the evaluation network, and the parameters of the main evaluation network are updated; second, the parameters of the main policy network and of the target policy and evaluation networks are updated by training the deep neural networks; the current action value is then obtained from the updated target network and output to the micro-grid control module; finally, the state information of the micro-grid at time t+1 is collected as a new sample for the next round of learning and computation.
Compared with the prior art, the invention has the following beneficial effects:
the micro-grid energy storage optimization scheduling model based on deep reinforcement learning is provided by taking the micro-grid scheduling modeling of energy storage optimization operation into consideration, taking economy as a planning target and taking the energy storage optimization operation, equipment constraint and tide constraint in the micro-grid into consideration; the problem that the energy storage optimization operation is suitable for various scenes is considered, and meanwhile, the limitation of the prediction data error of the micro-grid and the limitation of the traditional algorithm are also considered.
Drawings
FIG. 1 is a schematic diagram of an interactive process of dispatching agents and micro grid energy systems based on reinforcement learning.
Fig. 2 is a schematic diagram of a network framework for scheduling policy optimization using DDPG algorithm.
Detailed Description
For the purpose of making the technical solution and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention. It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The invention provides a micro-grid energy storage optimization scheduling method based on deep reinforcement learning, which comprises the following steps:
modeling micro-grid scheduling with energy storage optimal operation, taking economy as the planning objective, and considering energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid, a micro-grid energy storage optimization scheduling model based on deep reinforcement learning is proposed;
converting the micro-grid scheduling model into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, and introduces the state transition probability and the discount factor; these elements are combined to define a complete Markov decision process;
after training, an optimized strategy for actual micro-grid dispatch is obtained; the DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward; the Actor and Critic networks are trained iteratively until the policy converges or a predetermined number of training steps is reached.
Further, the objective function of the micro-grid energy storage optimization scheduling model is as follows:
Optimal scheduling of the micro-grid must satisfy economy, i.e., the operating cost is minimized. The operating cost comprises the wind/solar curtailment cost, the gas turbine operating cost, the pollutant emission cost, and the cost of electricity the micro-grid purchases from the main grid; energy storage optimal operation is also taken into account.
1) Operating cost;
where P_R(t) is the power output of the renewable energy sources in the system during period t; P_R^max(t) is the maximum renewable generation output, obtained from day-ahead prediction; C_R is the wind/solar curtailment cost per unit power; to fully accommodate renewable energy, C_R can be set large enough that curtailment occurs only when the constraints cannot otherwise be satisfied; P_G,k(t) is the active power output of the k-th dispatchable generator in the system during period t; N_G is the number of dispatchable generators; a_k, b_k and c_k are the generation cost coefficients of the k-th controllable generator; α_em is the emission cost per unit generated power; P_line(t) is the power on the tie line between the micro-grid and the main grid during period t; E_p(t) is the real-time electricity price for period t;
Formula (1) defines the operating cost as the sum of the wind/solar curtailment cost, the operating and emission costs of the controllable generators, and the cost of electricity purchased from the upstream grid;
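For reference, a plausible reconstruction of the operating cost Z_0 of formula (1) from the variable definitions above; the quadratic generation-cost form a_k P^2 + b_k P + c_k is an assumption:

Z_0 = \sum_{t=1}^{T} \left[ C_R \left( P_R^{max}(t) - P_R(t) \right) + \sum_{k=1}^{N_G} \left( a_k P_{G,k}^2(t) + b_k P_{G,k}(t) + c_k \right) + \alpha_{em} \sum_{k=1}^{N_G} P_{G,k}(t) + E_p(t) P_{line}(t) \right]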
2) Energy storage system operation penalty;
SOC(t+1)=SOC(t)+ΔSOC(t) (2)
where SOC(t) is the state of charge of the energy storage system during period t; ΔSOC(t) is the change in state of charge over period t; P_B(t) is the charge/discharge power of the energy storage system during period t; η_dis and η_ch are the discharging and charging efficiencies, respectively; E_B is the capacity of the energy storage system; P_B(t) < 0 denotes the discharging state and P_B(t) > 0 the charging state; C_soc(t) is the energy storage system operation penalty; SOC_max and SOC_min are the upper and lower state-of-charge limits of the optimal operating range of the energy storage system; λ_1, λ_2, λ_3 and λ_4 are penalty coefficients.
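A plausible explicit form of the state-of-charge change ΔSOC(t), implied by the sign convention and efficiency definitions above (Δt is the scheduling interval), together with a simple range-penalty form of C_soc(t); the exact piecewise structure of the penalty and the roles of λ_3 and λ_4 are assumptions:

\Delta SOC(t) = \begin{cases} \eta_{ch} P_B(t) \Delta t / E_B, & P_B(t) \ge 0 \text{ (charging)} \\ P_B(t) \Delta t / (\eta_{dis} E_B), & P_B(t) < 0 \text{ (discharging)} \end{cases}

C_{soc}(t) \approx \lambda_1 \max(0, SOC(t) - SOC_{max}) + \lambda_2 \max(0, SOC_{min} - SOC(t))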
Further, constraint conditions of the micro-grid energy storage optimization scheduling model are as follows:
1) Wind and solar constraint;
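A plausible reconstruction of this constraint, bounding the scheduled renewable output by its day-ahead predicted maximum:

0 \le P_R(t) \le P_R^{max}(t)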
2) Controllable generator constraint;
where P_G,k^max and P_G,k^min are the upper and lower output limits of the controllable generator, respectively; CR_I and CR_D denote the maximum allowed increase and decrease of controllable generator output per unit time, i.e., the ramp-rate limits;
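A plausible reconstruction of the controllable generator constraints, combining the output limits with the ramp-rate limits (Δt is the scheduling interval):

P_{G,k}^{min} \le P_{G,k}(t) \le P_{G,k}^{max}

-CR_D \Delta t \le P_{G,k}(t) - P_{G,k}(t-1) \le CR_I \Delta t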
3) Energy storage system constraint;
SOC_min ≤ SOC(t) ≤ SOC_max (8)
-P_N ≤ P_B(t) ≤ P_N (9)
where P_N is the upper limit of the charge/discharge power of the energy storage system;
4) Power flow constraints;
where P_line^max and P_line^min are the upper and lower limits of the power on the tie line, respectively; U_j is the voltage at node j; U_N is the nominal node voltage.
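A plausible reconstruction of the power flow constraints; the node-voltage band ΔU around the nominal value U_N is an assumption:

P_{line}^{min} \le P_{line}(t) \le P_{line}^{max}

|U_j - U_N| \le \Delta U \cdot U_N \quad \text{for every node } j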
Further, the energy storage optimization scheduling problem of the micro-grid is converted into an MDP (as shown in fig. 1), and the core elements of the reinforcement learning problem are defined: the state space, the action space and the reward function. The MDP provides a formal way to represent the problem and is used to build the mathematical model of the micro-grid optimization problem. The scheduling agent is trained by interacting with the micro-grid environment and learns the optimal policy with the DDPG algorithm. During training, various scenarios are simulated, including different load demands, renewable output variations and electricity price fluctuations, to improve the robustness of the agent; a minimal environment sketch is given after this paragraph.
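A minimal sketch of such a training environment in Python, assuming a gym-style reset/step interface; the class name MicrogridEnv, all numeric parameters and the simplified single-bus power balance are illustrative assumptions, not the patented implementation:

import numpy as np

class MicrogridEnv:
    """Gym-style micro-grid environment sketch; state, action and reward
    follow the definitions of eqs. (12), (13) and (16) below."""

    def __init__(self, prices, renew_max, load):
        self.prices = np.asarray(prices, dtype=np.float32)        # E_p(t)
        self.renew_max = np.asarray(renew_max, dtype=np.float32)  # P_R^max(t)
        self.load = np.asarray(load, dtype=np.float32)            # P_load(t)
        self.horizon = len(self.load)
        # illustrative parameters (all numeric values are assumed)
        self.c_r, self.a, self.b, self.c = 5.0, 0.02, 0.5, 1.0    # C_R, a_k, b_k, c_k
        self.alpha_em, self.eta_ch, self.eta_dis = 0.1, 0.95, 0.95
        self.e_b, self.soc_min, self.soc_max, self.lam = 10.0, 0.2, 0.8, 10.0

    def reset(self):
        self.t, self.soc, self.p_g = 0, 0.5, 0.0
        return self._state()

    def _state(self):
        # s_t = (t, E_p(t), P_R^max(t), P_load(t), SOC(t-1), P_G(t-1))
        return np.array([self.t, self.prices[self.t], self.renew_max[self.t],
                         self.load[self.t], self.soc, self.p_g], dtype=np.float32)

    def step(self, action):
        p_r, p_b, p_g = action      # a_t = (P_R(t), P_B(t), P_G(t))
        # tie-line power balances load against local generation and storage
        p_line = self.load[self.t] - p_r - p_g + p_b
        cost = (self.c_r * max(self.renew_max[self.t] - p_r, 0.0)  # curtailment
                + self.a * p_g ** 2 + self.b * p_g + self.c        # generation
                + self.alpha_em * p_g                              # emissions
                + self.prices[self.t] * max(p_line, 0.0))          # purchase
        # SOC update (eq. (2)); P_B > 0 charges, P_B < 0 discharges
        self.soc += (self.eta_ch * p_b if p_b >= 0 else p_b / self.eta_dis) / self.e_b
        penalty = self.lam * (max(self.soc - self.soc_max, 0.0)
                              + max(self.soc_min - self.soc, 0.0))
        reward = -cost - penalty    # eq. (16): negative cost and storage penalty
        self.t += 1
        self.p_g = p_g
        done = self.t >= self.horizon
        return (None if done else self._state()), reward, done

Different price, renewable and load profiles can be passed to the constructor to realize the scenario variation described above.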
wherein the state space comprises the relevant electrical quantities in the micro-grid: the scheduling period t; the real-time electricity price E_p(t), the maximum renewable output P_R^max(t) and the load demand P_load(t) for period t; and the battery state of charge SOC(t-1) and controllable generator output P_G(t-1) at time t-1; the state space is defined as s_t.
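Written out from the definitions above, the state vector is:

s_t = \left( t, E_p(t), P_R^{max}(t), P_{load}(t), SOC(t-1), P_G(t-1) \right)^T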
The action space comprises the available control actions: the renewable output P_R(t), the energy storage charge/discharge power P_B(t) and the controllable generator set output P_G(t) for period t; the action space is defined as a_t:
a_t = (P_R(t), P_B(t), P_G(t))^T (13)
The reward function is used to evaluate system performance; it equals the negative of the sum of the operating cost and the energy storage system operation penalty. The instant reward is defined as r_{t+1}(s_t, a_t), and the cumulative reward from the start to the end of training is defined as R_t:
r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t) (16)
R_t = r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ^2 r(s_{t+2}, a_{t+2}) + … + γ^{T-t} r(s_T, a_T) (17)
where γ ∈ (0, 1) is the reward discount factor, which also reflects the relative importance of current versus future rewards;
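A short sketch of the instant reward (16) and the discounted return (17) in Python; z0 and c_soc stand for the period values of Z_0 and C_soc(t) and are assumed inputs:

def instant_reward(z0, c_soc):
    # r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t), eq. (16)
    return -z0 - c_soc

def discounted_return(rewards, gamma=0.99):
    # R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., eq. (17)
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret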
Furthermore, for a policy π, a single exploration trajectory starts from state s_t and executes action a_t; the expected cumulative reward obtained during the exploration is described by the action-value function V_π(s, a);
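In standard notation, this action-value function is the expected cumulative reward obtained by starting from state s, executing action a and following policy π thereafter:

V_\pi(s, a) = \mathbb{E}_\pi \left[ R_t \mid s_t = s, a_t = a \right]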
Further, as shown in fig. 2, in the DDPG algorithm, the fitting capability of deep neural networks is used to map states to action policies and state-action pairs to value functions. The DDPG algorithm from deep reinforcement learning is adopted to solve the micro-grid scheduling optimization problem: background data processing collects the current state information s_{t+1}, the return value r_t and the previous state information s_t to form a sample tuple, which is stored in a data pool; M sample tuples (s_t, a_t, r_t, s_{t+1}), t = 1, 2, …, M, are resampled from the data pool and stored in the experience pool for training, i.e., experience replay, as sketched below;
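A minimal experience replay sketch in Python; the capacity and batch size M are illustrative:

import random
from collections import deque

class ReplayBuffer:
    """Stores transition tuples (s_t, a_t, r_t, s_{t+1}) in a bounded data
    pool and resamples M of them uniformly for training."""

    def __init__(self, capacity=100000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, m=64):
        # uniform resampling of M sample tuples, i.e. experience replay
        return random.sample(list(self.pool), m)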
In the optimization process, first, the action prediction and the corresponding target evaluation value are computed with the not-yet-updated target network parameters, yielding the loss function L for training the evaluation network, and the parameters of the main evaluation network are updated; second, the parameters of the main policy network and of the target policy and evaluation networks are updated by training the deep neural networks; the current action value is then obtained from the updated target network and output to the micro-grid control module; finally, the state information of the micro-grid at time t+1 is collected as a new sample for the next round of learning and computation; a sketch of one update round follows.
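A sketch of one DDPG update round in PyTorch under the procedure just described; the network architectures, hidden sizes and the soft-update rate tau are assumptions, not the patented implementation:

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, h), nn.ReLU(),
                                 nn.Linear(h, a_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, h), nn.ReLU(),
                                 nn.Linear(h, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch  # tensors assembled from M sampled transitions
    # 1) Critic: target value from the not-yet-updated target networks,
    #    then minimize the loss L of the main evaluation network
    with torch.no_grad():
        y = r + gamma * critic_targ(s_next, actor_targ(s_next))
    loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); loss.backward(); critic_opt.step()
    # 2) Actor: deterministic policy gradient on the main policy network
    policy_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); policy_loss.backward(); actor_opt.step()
    # 3) Soft update of the target policy and evaluation networks
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_t in zip(net.parameters(), targ.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# usage sketch (dimensions from the state/action definitions above):
# actor, actor_targ = Actor(6, 3), Actor(6, 3)
# critic, critic_targ = Critic(6, 3), Critic(6, 3)
# actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
# critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

In a full training loop this update is repeated after each environment step, with exploration noise added to the actor's output.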
The above is a preferred embodiment of the present invention; all changes made according to the technical solution of the present invention whose functional effects do not exceed the scope of that technical solution belong to the protection scope of the present invention.

Claims (5)

1. The micro-grid energy storage optimization scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
modeling micro-grid scheduling with energy storage optimal operation, taking economy as the scheduling objective, and considering energy storage optimal operation, equipment constraints and power flow constraints within the micro-grid, proposing a micro-grid energy storage optimization scheduling model based on deep reinforcement learning;
converting the micro-grid scheduling model into a Markov decision process: reinforcement learning first defines the state space, the action space and the reward function, and introduces the state transition probability and the discount factor; these elements are combined to define a complete Markov decision process;
after training, obtaining an optimized strategy for actual micro-grid dispatch; the DDPG algorithm used combines a deep neural network with the deterministic policy gradient method, improving the policy iteratively so as to train a deterministic policy that maximizes the cumulative reward; the Actor and Critic networks are trained iteratively until the policy converges or a predetermined number of training steps is reached.
2. The deep reinforcement learning-based micro-grid energy storage optimization scheduling method according to claim 1, wherein an objective function of a micro-grid energy storage optimization scheduling model is as follows:
the optimal scheduling of the micro-grid is required to meet the economy, namely the operation cost is minimized; the operation cost comprises the wind-discarding/light-emitting cost, the gas turbine operation cost, the pollution gas emission cost and the cost of purchasing electricity from the main power grid by the micro power grid; wherein, the energy storage optimization operation is also considered;
1) Running cost;
wherein P is R (t) generating output in t time period for renewable energy sources in the system;for period tThe maximum value of the power generated by the new energy is obtained by prediction before the day; c (C) R The wind and light discarding cost is the unit power; to fully eliminate new energy, C can be used R Is set large enough to ensure that the power discarding is generated only when the constraint condition cannot be met; p (P) G,k (t) is the active power output of the kth adjustable generator in the system during period t; n (N) G The number of the generators can be adjusted; a, a k 、b k 、c k The power generation cost coefficient of the kth controllable generator is set; alpha em The discharge cost is the unit generated power; p (P) line (t) is the power on the tie line between the microgrid and the main grid for period t; e (E) p (t) is the real-time electricity price of the period t;
the formula (1) defines the operation cost as the sum of the wind and light discarding cost, the operation cost and the emission cost of the controllable generator and the electricity purchasing cost of the upper power grid;
2) Energy storage system operation penalty;
SOC(t+1)=SOC(t)+ΔSOC(t) (2)
where SOC(t) is the state of charge of the energy storage system during period t; ΔSOC(t) is the change in state of charge over period t; P_B(t) is the charge/discharge power of the energy storage system during period t; η_dis and η_ch are the discharging and charging efficiencies, respectively; E_B is the capacity of the energy storage system; P_B(t) < 0 denotes the discharging state and P_B(t) > 0 the charging state; C_soc(t) is the energy storage system operation penalty; SOC_max and SOC_min are the upper and lower state-of-charge limits of the optimal operating range of the energy storage system; λ_1, λ_2, λ_3 and λ_4 are penalty coefficients.
3. The deep reinforcement learning-based micro-grid energy storage optimization scheduling method according to claim 2, wherein constraint conditions of a micro-grid energy storage optimization scheduling model are as follows:
1) Wind and solar constraint;
2) Controllable generator constraint;
where P_G,k^max and P_G,k^min are the upper and lower output limits of the controllable generator, respectively; CR_I and CR_D denote the maximum allowed increase and decrease of controllable generator output per unit time, i.e., the ramp-rate limits;
3) Energy storage system constraint;
SOC_min ≤ SOC(t) ≤ SOC_max (8)
-P_N ≤ P_B(t) ≤ P_N (9)
where P_N is the upper limit of the charge/discharge power of the energy storage system;
4) Power flow constraints;
where P_line^max and P_line^min are the upper and lower limits of the power on the tie line, respectively; U_j is the voltage at node j; U_N is the nominal node voltage.
4. The micro-grid energy storage optimization scheduling method based on deep reinforcement learning according to claim 3, wherein the energy storage optimization scheduling problem of the micro-grid is converted into an MDP and the core elements of the reinforcement learning problem are defined: the state space, the action space and the reward function; the MDP provides a formal way to represent the problem and is used to build the mathematical model of the micro-grid optimization problem; the scheduling agent is trained by interacting with the micro-grid environment and learns the optimal policy with the DDPG algorithm; during training, various scenarios are simulated, including different load demands, renewable output variations and electricity price fluctuations, to improve the robustness of the agent;
wherein the state space comprises the relevant electrical quantities in the micro-grid: the scheduling period t; the real-time electricity price E_p(t), the maximum renewable output P_R^max(t) and the load demand P_load(t) for period t; and the battery state of charge SOC(t-1) and controllable generator output P_G(t-1) at time t-1; the state space is defined as s_t;
the action space comprises the available control actions: the renewable output P_R(t), the energy storage charge/discharge power P_B(t) and the controllable generator set output P_G(t) for period t; the action space is defined as a_t:
a_t = (P_R(t), P_B(t), P_G(t))^T (13)
the reward function is used to evaluate system performance; it equals the negative of the sum of the operating cost and the energy storage system operation penalty; the instant reward is defined as r_{t+1}(s_t, a_t), and the cumulative reward from the start to the end of training is defined as R_t:
r_{t+1}(s_t, a_t) = -Z_0 - C_soc(t) (16)
R_t = r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ^2 r(s_{t+2}, a_{t+2}) + … + γ^{T-t} r(s_T, a_T) (17)
where γ ∈ (0, 1) is the reward discount factor, which also reflects the relative importance of current versus future rewards;
furthermore, for a policy π, a single exploration trajectory starts from state s_t and executes action a_t; the expected cumulative reward obtained during the exploration is described by the action-value function V_π(s, a);
5. The micro-grid energy storage optimization scheduling method based on deep reinforcement learning according to claim 4, wherein in the DDPG algorithm the fitting capability of deep neural networks is used to map states to action policies and state-action pairs to value functions; the DDPG algorithm from deep reinforcement learning is adopted to solve the micro-grid scheduling optimization problem: background data processing collects the current state information s_{t+1}, the return value r_t and the previous state information s_t to form a sample tuple, which is stored in a data pool; M sample tuples (s_t, a_t, r_t, s_{t+1}), t = 1, 2, …, M, are resampled from the data pool and stored in the experience pool for training, i.e., experience replay;
in the optimization process, first, the action prediction and the corresponding target evaluation value are computed with the not-yet-updated target network parameters, yielding the loss function L for training the evaluation network, and the parameters of the main evaluation network are updated; second, the parameters of the main policy network and of the target policy and evaluation networks are updated by training the deep neural networks; the current action value is then obtained from the updated target network and output to the micro-grid control module; finally, the state information of the micro-grid at time t+1 is collected as a new sample for the next round of learning and computation.
CN202311618193.2A 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning Pending CN117833285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618193.2A CN117833285A (en) 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311618193.2A CN117833285A (en) 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN117833285A true CN117833285A (en) 2024-04-05

Family

ID=90504796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311618193.2A Pending CN117833285A (en) 2023-11-29 2023-11-29 Micro-grid energy storage optimization scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117833285A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118316024A (en) * 2024-04-07 2024-07-09 国家电网有限公司 Electric power intelligent scheduling method based on artificial intelligence
CN118473021A (en) * 2024-07-10 2024-08-09 格瓴新能源科技(杭州)有限公司 Micro-grid optimal scheduling method and system combining CMA-ES algorithm and DDPG algorithm
CN118504417A (en) * 2024-07-09 2024-08-16 暨南大学 Reinforced learning optimization scheduling method and system considering scheduling experience
CN118572795A (en) * 2024-07-10 2024-08-30 格瓴新能源科技(杭州)有限公司 Micro-grid group optimal scheduling method and system based on MADDPG and pareto front edge combination
CN118657264A (en) * 2024-08-21 2024-09-17 中国电力科学研究院有限公司 Energy Internet optimization operation method and related device
CN118710067A (en) * 2024-08-29 2024-09-27 中国特种设备检测研究院 Information multiplexing method and system based on deep learning


Similar Documents

Publication Publication Date Title
Machlev et al. A review of optimal control methods for energy storage systems-energy trading, energy balancing and electric vehicles
CN117833285A (en) Micro-grid energy storage optimization scheduling method based on deep reinforcement learning
CN114725936A (en) Power distribution network optimization method based on multi-agent deep reinforcement learning
Vosoogh et al. An intelligent day ahead energy management framework for networked microgrids considering high penetration of electric vehicles
CN113098007B (en) Distributed online micro-grid scheduling method and system based on layered reinforcement learning
Varzaneh et al. Optimal energy management for PV‐integrated residential systems including energy storage system
CN117277327A (en) Grid-connected micro-grid optimal energy management method based on intelligent agent
Chen et al. Optimal control strategy for solid oxide fuel cell‐based hybrid energy system using deep reinforcement learning
Zhang et al. Physical-model-free intelligent energy management for a grid-connected hybrid wind-microturbine-PV-EV energy system via deep reinforcement learning approach
CN115544871A (en) Distributed robust energy storage planning method considering renewable power supply space correlation
Khoubseresht et al. An analytical optimum method for simultaneous integration of PV, wind turbine and BESS to maximize technical benefits
CN115423153A (en) Photovoltaic energy storage system energy management method based on probability prediction
Harrold et al. Battery control in a smart energy network using double dueling deep q-networks
CN115099590A (en) Active power distribution network economic optimization scheduling method and system considering light load uncertainty
Sun et al. Interval mixed-integer programming for daily unit commitment and dispatch incorporating wind power
CN106339773A (en) Sensitivity-based active power distribution network distributed power source constant-capacity planning method
Zhang et al. Two-Step Diffusion Policy Deep Reinforcement Learning Method for Low-Carbon Multi-Energy Microgrid Energy Management
Mohammadi et al. Ai-based optimal scheduling of renewable ac microgrids with bidirectional lstm-based wind power forecasting
Sun et al. Optimal Scheduling of Wind-Photovoltaic-Pumped Storage Joint Complementary Power Generation System Based on Improved Firefly Algorithm
CN113988403A (en) Electric vehicle charging load prediction method and system
Sage et al. Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations
CN112290535A (en) Online scheduling method of electricity-gas integrated energy system based on deep strategy optimization
Jain et al. Battery optimization in microgrids using Markov decision process integrated with load and solar forecasting
Wang et al. A comprehensive survey of the application of swarm intelligent optimization algorithm in photovoltaic energy storage systems
CN117650553B (en) Multi-agent deep reinforcement learning-based 5G base station energy storage battery charge and discharge scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination