CN117277327A - Grid-connected micro-grid optimal energy management method based on intelligent agent - Google Patents

Grid-connected micro-grid optimal energy management method based on intelligent agent

Info

Publication number
CN117277327A
CN117277327A CN202311206909.8A CN202311206909A
Authority
CN
China
Prior art keywords
grid
micro
power
constraint
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311206909.8A
Other languages
Chinese (zh)
Inventor
杨志淳
姚志荣
沈煜
杨帆
李进扬
崔世常
闵怀东
雷杨
胡伟
吴畏
姚金林
操燕春
方石磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Original Assignee
Huazhong University of Science and Technology
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology, Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd filed Critical Huazhong University of Science and Technology
Priority to CN202311206909.8A priority Critical patent/CN117277327A/en
Publication of CN117277327A publication Critical patent/CN117277327A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/007Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources
    • H02J3/0075Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources for providing alternative feeding paths between load and source according to economic or energy efficiency considerations, e.g. economic dispatch
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/48Controlling the sharing of the in-phase component
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/50Controlling the sharing of the out-of-phase component
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/28The renewable source being wind energy

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Power Engineering (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

An agent-based grid-connected micro-grid optimal energy management method, comprising: (1) modeling the sequential energy management process of a grid-connected micro-grid, which contains different types of distributed power sources, energy storage devices and user loads and can buy and sell electricity with the distribution network, as a Markov decision process; (2) designing the reward function of the Markov decision process based on the exchange power constraint, power flow constraint and voltage constraint between the micro-grid and the distribution network; (3) solving the optimal stationary policy of the established Markov decision process by a deep Q-learning method, this policy being the optimal energy management strategy of the micro-grid. Compared with the prior art, the invention has the following beneficial effects: uncertainty is learned from historical data, and an optimal policy is learned through continuous interaction with the environment; the output actions achieve an operating cost very close to that of the optimal solution obtained by mixed integer quadratic programming under perfectly accurate forecasts of the uncertain factors, with shorter computation time, so the operating cost of the micro-grid can be effectively reduced.

Description

Grid-connected micro-grid optimal energy management method based on intelligent agent
Technical Field
The invention relates to the field of electrical engineering, in particular to an optimal energy management method for a micro-grid.
Background
Developing renewable energy is a necessary path for China's energy development. By the end of 2022, China's installed renewable energy capacity had exceeded 1.2 billion kW, reaching 1.213 billion kW, accounting for 47.3% of the country's total installed generation capacity, an increase of 2.5 percentage points over 2021; of this, wind power accounted for 365 GW and solar for 393 GW. However, the output of distributed energy sources such as photovoltaics and wind turbines depends on the distribution characteristics of the renewable resource, with pronounced randomness and fluctuation, and large-scale distributed access poses challenges to the planning, operation and management of the distribution network. Connecting distributed power sources such as photovoltaics and wind turbines to the grid in the form of micro-grids is an effective way to enable large-scale application of distributed renewable energy and further increase installed renewable capacity.
A micro-grid is a small generation-distribution-consumption system integrating distributed power sources, an energy storage system, energy conversion devices, monitoring and protection devices, loads, and so on. It can be regarded as a small power system with complete generation and distribution functions that can effectively realize energy optimization within the system. A micro-grid can operate independently in remote areas or on islands, or operate grid-connected to a distribution network, meeting its own load demand while providing auxiliary services such as power support and reserve to the distribution network.
Effective energy management of a micro-grid system can optimize operation and reduce cost. Existing methods such as mixed integer quadratic programming depend heavily on the prediction accuracy of the uncertain factors in the system; since future wind, photovoltaic power and load demand cannot be predicted accurately in practice, the solutions of these methods are difficult to apply directly. In addition, as the scale of micro-grids grows and the uncertainty of the system changes, conventional methods struggle to provide a general solution framework.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides an agent-based grid-connected micro-grid optimal energy management method that learns an optimal policy through continuous interaction with the environment and can effectively reduce the operating cost of the micro-grid.
The agent-based grid-connected micro-grid optimal energy management method disclosed by the invention comprises the following steps:
(1) Modeling the grid-connected micro-grid energy management process as a Markov decision process, in which the agent's state variables comprise the output power of the different distributed power sources in the micro-grid, the active and reactive power demand of the residential load, the node electricity price, and the stored energy of the energy storage device; the agent's action consists of the active and reactive power of the conventional distributed power sources and the charge/discharge power of the energy storage device;
(2) Designing a reward function that accounts for the operating constraints of the micro-grid: the agent's reward comprises the operating cost of the micro-grid, namely the generation cost of the conventional distributed power sources and the cost of buying and selling electricity between the micro-grid and the distribution network; the operating constraints of the micro-grid, including the exchange power constraint, power flow constraint, voltage constraint and energy storage constraint between the micro-grid and the distribution network, are also considered in the reward function, so that the optimal policy learned from this reward function does not output an energy management scheme that violates the constraints;
(3) Solving the optimal stationary policy of the established Markov decision process by a deep Q-learning method: each interaction between the agent and the micro-grid environment yields a sample comprising the current state, the agent's action, the obtained reward and the next state; these samples are used to learn an optimal action value network and thus an optimal policy, which outputs the optimal energy management scheme.
In step (1) of the invention, the state variables of the agent in the grid-connected micro-grid energy management problem satisfy the Markov property. The method targets a micro-grid system comprising conventional distributed power sources, wind turbines, distributed photovoltaics, an energy storage device, and residential loads.
the state of the constructed Markov decision process isWherein->Respectively represents the output power of photovoltaic and fan in the past 24 hours, +.>Respectively representing the power requirements of the past 24 hours load, R t Represents node electricity prices for the past 24 hours, E t Representing the stored energy of the energy storage device over the past 24 hours. Action as-> Is a vector of active power output by a t-period conventional distributed generator, +.>Respectively representing the active power output by the kth conventional distributed motor in t period, +.>The charge and discharge power of the energy storage device at the time t is represented, the charge state is represented when the charge and discharge power is positive, and the discharge state is represented when the charge and discharge power is negative; furthermore, conventional distributed generators and energy storage devices need to satisfy the following constraints, respectively:
and->Representing maximum and minimum output power, respectively, < > of a conventional distributed power supply>Maximum charge and discharge power of the energy storage device; the action space of the markov decision process is therefore: />
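As a concrete illustration, the following minimal Python sketch assembles the state vector and projects a raw action onto the action space described above; the array names, the fixed 24-step history, and the clipping helper are illustrative assumptions rather than part of the patent text.

```python
import numpy as np

H = 24  # length of the observation history in hours (as described above)

def build_state(pv_hist, wt_hist, load_p_hist, load_q_hist, price_hist, energy_hist):
    """Concatenate the past-24h PV/wind output, active/reactive load demand,
    node electricity price and stored energy into the MDP state s_t."""
    parts = [pv_hist, wt_hist, load_p_hist, load_q_hist, price_hist, energy_hist]
    assert all(len(p) == H for p in parts)
    return np.concatenate(parts).astype(np.float32)

def project_action(p_gen, p_ess, p_min, p_max, p_ess_max):
    """Project a raw action onto the feasible set: P^d_min <= P^d_t <= P^d_max
    for each conventional generator and |P^ESS_t| <= P^ESS_max."""
    p_gen = np.clip(np.asarray(p_gen, dtype=float), p_min, p_max)
    p_ess = float(np.clip(p_ess, -p_ess_max, p_ess_max))
    return p_gen, p_ess
```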
In step (2) of the invention, for a micro-grid comprising conventional distributed power sources, wind turbines, distributed photovoltaics, an energy storage device and residential loads, the operating cost of the micro-grid is included in the reward function, and the operating constraints of the micro-grid are also considered in it. When the action output by the agent cannot satisfy the constraints, a much smaller reward value is obtained, so that the optimal actions output by an agent trained with this reward function will not violate the constraints.
the invention optimizes the operation cost of the micro-grid, and when the operation constraint of the micro-grid is satisfied, the rewarding function of the intelligent agent is as follows:
wherein r is t Indicating the rewards of the t-th decision,and->The cost of the kth conventional distributed power supply and the electricity purchasing cost of the micro-grid in the t period are respectively calculated according to the following formulas:
wherein a is d ,b d ,c d As a factor of the cost of the material,to exchange power with the distribution network, when->For positive value, purchasing electricity to the power distribution network, and for negative value, selling electricity to the power distribution network, R t For real-time electricity prices, Δt is the running step. r is (r) t Is a negative number of costs, thus maximizing r t Meaning that the cost is minimized;
the formula is a reward function when the constraint condition is met, and the following constraint condition is specifically considered when the constraint reward function is designed and considered:
(1) Tidal current constraint
Wherein,and->Respectively representing the active power and the reactive power flowing through the branch ij in the t period,/>Indicating the maximum apparent power allowed by branch ij.
(2) Exchange power constraint

$\left|P^{grid}_t\right| \le P^{grid}_{\max}$

where $P^{grid}_{\max}$ is the maximum exchange power allowed on the tie line between the micro-grid and the distribution network.
(3) Voltage constraint

$V_{\min} \le V_{i,t} \le V_{\max}$

where $V_{i,t}$, $V_{\min}$ and $V_{\max}$ respectively denote the voltage of node $i$ in period $t$ and the minimum and maximum allowable node voltages.
(4) Energy storage constraint

$E_{\min} \le E_t \le E_{\max}$ (10)

$E_{t+1} = E_t + \left(u_t\,\eta_c + \dfrac{1-u_t}{\eta_d}\right) P^{ESS}_t\,\Delta t$

where $E_t$ is the stored energy of the energy storage device in period $t$ and $P^{ESS}_t$ its charge/discharge power in period $t$; $u_t = 1$ indicates the device is charging and $u_t = 0$ that it is discharging, and charging and discharging cannot occur simultaneously in the same period; $\eta_c$ and $\eta_d$ respectively denote the charging and discharging efficiency of the system; $E_{\min}$ and $E_{\max}$ are respectively the minimum and maximum energy stored by the device.
After the agent outputs the action $a_t$, it is first checked whether the constraints are satisfied; if so, the reward is computed according to the formula above, and otherwise according to

$r_t = -\zeta$ (11)

where $\zeta$ is a very large positive number.
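The reward logic above can be condensed into a short Python sketch; the quadratic generation-cost form and the penalty follow the reconstructed formulas, while the function signatures and the constraint-checking helper are hypothetical.

```python
def constraints_satisfied(p_ij, q_ij, s_max, p_grid, p_grid_max,
                          voltages, v_min, v_max, energy, e_min, e_max):
    """Check the power-flow, exchange-power, voltage and storage constraints."""
    flow_ok = all((p * p + q * q) ** 0.5 <= s
                  for p, q, s in zip(p_ij, q_ij, s_max))
    return (flow_ok
            and abs(p_grid) <= p_grid_max
            and all(v_min <= v <= v_max for v in voltages)
            and e_min <= energy <= e_max)

def reward(p_gen, p_grid, price, dt, coeffs, zeta, ok):
    """r_t = -(sum_d C^d_t + C^grid_t) when all constraints hold,
    and r_t = -zeta otherwise (eq. (11))."""
    if not ok:                      # constraint violated: large negative reward
        return -zeta
    gen_cost = sum(a * p * p + b * p + c
                   for p, (a, b, c) in zip(p_gen, coeffs))
    grid_cost = price * p_grid * dt     # positive p_grid: buying from the grid
    return -(gen_cost + grid_cost)
```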
Step 3: solving an optimal strategy of the established Markov decision process by adopting a deep Q learning method;
the cumulative return of the Markov decision process is:
wherein, gamma E [0,1] is discount rate, which is used for reducing the return of long-term income; the state cost function of the markov decision process is:
the intelligent agent and the micro-grid environment in the deep Q learning are interactively sampled to obtain samples (state, action, rewarding and next time state), and parameters of a cost function are updated by using the samples, so that an optimal strategy with the maximum state cost function for all states s is obtained;
The action value function $Q_{\pi}(s,a)$ denotes the expected return obtained by selecting action $a$ in state $s$ and thereafter following policy $\pi$. In deep Q-learning the action value function is modeled as a multi-layer neural network $Q_w(s,a)$ whose input is the state $s$ and whose output is the Q value of each action. To address the instability of neural network training, a target network $Q_{w^-}(s,a)$ is used to compute the TD error; the target network has the same structure as the training network but different parameters. A replay buffer stores the four-tuples sampled from the environment, allowing training data to be reused and decorrelated.
When solving the optimal policy of the constructed Markov decision process with deep Q-learning, the state $s_t$ is first input to the agent; the agent outputs an action $a_t$ according to its current policy and applies it to the micro-grid environment. The environment first judges whether the constraints are satisfied and computes the reward value, then feeds the reward $r_t$ and the next state $s_{t+1}$ back to the agent, yielding a sample $(s_t, a_t, r_t, s_{t+1})$ that is stored in the replay buffer. The agent then continues interactive sampling with the environment from $s_{t+1}$.
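In Python, one such interaction step might look as follows; the environment interface (`env.step`) and the epsilon-greedy exploration rule are assumed implementation details that the patent does not specify.

```python
import random
from collections import deque

import numpy as np

replay_buffer = deque(maxlen=50000)   # capacity taken from the embodiment below

def interact_once(env, q_values, s_t, epsilon, n_actions):
    """One agent-environment interaction: choose a_t, apply it, and store
    the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer."""
    if random.random() < epsilon:             # exploration (assumed eps-greedy)
        a_t = random.randrange(n_actions)
    else:
        a_t = int(np.argmax(q_values(s_t)))   # greedy action under Q_w
    r_t, s_next = env.step(a_t)  # env checks constraints and computes the reward
    replay_buffer.append((s_t, a_t, r_t, s_next))
    return s_next
```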
When the data in the replay buffer are sufficient, updating of the Q network begins. Each update draws $N$ sample tuples $\{(s_i, a_i, r_i, s_{i+1})\}_{i=1,\dots,N}$ from the buffer and computes the TD error for each:

$\delta_i = r_i + \gamma \max_{a'} Q_{w^-}\left(s_{i+1}, a'\right) - Q_w\left(s_i, a_i\right)$

The target loss over this batch is

$L(w) = \frac{1}{N} \sum_{i=1}^{N} \delta_i^{2}$

and the parameters are learned by minimizing this target loss.
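A minimal PyTorch sketch of this update step follows, assuming a discretized action space; the TD error and the squared loss come from the formulas above, while the tensor layout and optimizer handling are implementation assumptions.

```python
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, batch, gamma):
    """One gradient step on L(w) = (1/N) * sum_i delta_i^2, where
    delta_i = r_i + gamma * max_a' Q_{w-}(s_{i+1}, a') - Q_w(s_i, a_i)."""
    s, a, r, s_next = batch        # shapes: (N, ds), (N,) long, (N,), (N, ds)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q_w(s_i, a_i)
    with torch.no_grad():          # the target network is held fixed
        td_target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, td_target)            # mean of delta^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```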
Using the sampled data, gradient descent learns the parameters of $Q_w(s,a)$, which eventually converge to an optimal value $w^{*}$. Inputting the state $s_t$ into the optimal action value function yields the action $a^{*}_t = \arg\max_a Q_{w^{*}}(s_t, a)$, which is the optimal energy management scheme of the micro-grid in period $t$.
Compared with the prior art, the technical solution of the invention has the following beneficial effects: the invention provides a data-driven, model-free method that learns the uncertainty from historical data and learns an optimal policy through continuous interaction with the environment; the output actions achieve an operating cost very close to that of the optimal solution obtained by mixed integer quadratic programming under perfectly accurate forecasts of the uncertain factors, with shorter computation time, and can effectively reduce the operating cost of the micro-grid.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a microgrid system diagram;
FIG. 3 is a schematic diagram of an agent interacting with a microgrid environment;
FIG. 4 compares the operating cost of the agent-based method with that of the optimal solution found by mixed integer quadratic programming.
The invention will be further described in detail below with reference to the accompanying drawings.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
As shown in FIG. 1, the method of the present invention comprises: establishing a Markov decision process model of the micro-grid energy management process; designing a reward function that considers the micro-grid constraints; and solving the optimal policy by a deep Q-learning method. The details are as follows.
1. Modeling the grid-connected micro-grid energy management process as a Markov decision process:

The present invention targets a micro-grid system comprising conventional distributed power sources, a wind generator, distributed photovoltaics, an energy storage device and residential loads. The micro-grid system used in this example is shown in FIG. 2 and contains a diesel generator, photovoltaics, a wind turbine, an energy storage system and a load. The state of the Markov decision process is

$s_t = \left(P^{pv}_{t-23:t},\ P^{wt}_{t-23:t},\ P^{L,p}_{t-23:t},\ P^{L,q}_{t-23:t},\ R_{t-23:t},\ E_{t-23:t}\right)$

where $P^{pv}$ and $P^{wt}$ respectively denote the output power of the photovoltaics and the wind turbine in the micro-grid system over the past 24 hours, $P^{L,p}$ and $P^{L,q}$ denote the active and reactive power demand of the load over the past 24 hours, $R_t$ denotes the node electricity price over the past 24 hours, and $E_t$ denotes the stored energy of the energy storage device over the past 24 hours. The action is

$a_t = \left(\mathbf{P}^{G}_t,\ P^{ESS}_t\right),\qquad \mathbf{P}^{G}_t = \left(P^{1}_t,\dots,P^{K}_t\right)$

where $\mathbf{P}^{G}_t$ is the vector of active power output by the conventional distributed generators in period $t$, $P^{k}_t$ denotes the active power output by the $k$-th conventional distributed generator in period $t$, and $P^{ESS}_t$ denotes the charge/discharge power of the energy storage device at time $t$, a positive value indicating charging and a negative value indicating discharging.
furthermore, conventional distributed generators and energy storage devices need to satisfy the following constraints, respectively:
and->Representing maximum and minimum output power, respectively, < > of a conventional distributed power supply>The maximum charge and discharge power of the energy storage device is obtained. The action space of the markov decision process is therefore: />
2. Designing the reward function of the agent's Markov decision process based on the exchange power constraint, power flow constraint, voltage constraint and energy storage constraint between the micro-grid and the distribution network. The system has the following constraints:
(1) Power flow constraint

$\sqrt{\left(P_{ij,t}\right)^2 + \left(Q_{ij,t}\right)^2} \le S^{\max}_{ij}$

where $P_{ij,t}$ and $Q_{ij,t}$ respectively denote the active and reactive power flowing through branch $ij$ in period $t$, and $S^{\max}_{ij}$ denotes the maximum apparent power allowed on branch $ij$.
(2) Exchange power constraint

$\left|P^{grid}_t\right| \le P^{grid}_{\max}$

where $P^{grid}_{\max}$ is the maximum exchange power allowed on the tie line between the micro-grid and the distribution network.
(3) Voltage constraint

$V_{\min} \le V_{i,t} \le V_{\max}$

where $V_{i,t}$, $V_{\min}$ and $V_{\max}$ respectively denote the voltage of node $i$ in period $t$ and the minimum and maximum allowable node voltages.
(4) Energy storage constraint

$E_{\min} \le E_t \le E_{\max}$ (7)

$E_{t+1} = E_t + \left(u_t\,\eta_c + \dfrac{1-u_t}{\eta_d}\right) P^{ESS}_t\,\Delta t$

where $E_t$ is the stored energy of the energy storage device in period $t$ and $P^{ESS}_t$ its charge/discharge power in period $t$; $u_t = 1$ indicates the device is charging and $u_t = 0$ that it is discharging, and charging and discharging cannot occur simultaneously in the same period; $\eta_c$ and $\eta_d$ respectively denote the charging and discharging efficiency of the system; $E_{\min}$ and $E_{\max}$ are respectively the minimum and maximum energy stored by the device.
when the agent outputs the actionAfter that, it is first checked whether the constraint is satisfied, and if so, the reward is:
wherein r is i Indicating the rewards of the t-th decision,and->The cost of the kth conventional distributed power supply and the electricity purchasing cost of the micro-grid in the t period are respectively calculated according to the following formulas:
wherein a is d ,b d ,c d As a factor of the cost of the material,to exchange power with the distribution network, when->For positive value, purchasing electricity to the power distribution network, and for negative value, selling electricity to the power distribution network, R t For real-time electricity prices Δt is the running step length, here 1 hour.
If the constraints are not satisfied, the reward is computed as

$r_t = -\zeta$ (11)

where $\zeta$ is set to $10^6$.
3. Solving the optimal policy of the established Markov decision process by the deep Q-learning method.

The cumulative return of the Markov decision process is

$G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}$

where $\gamma \in [0,1]$ is the discount rate, used to discount long-term returns, and is set to 0.9. The state value function of the Markov decision process is

$V_{\pi}(s) = \mathbb{E}_{\pi}\left[\,G_t \mid s_t = s\,\right]$
deep Q learning obtains samples (states, actions, rewards and next states) through interactive sampling with the environment and utilizes sample learning parameters, so that an optimal strategy with the largest state value function for all states s is obtained;
action cost function Q π (s, a) represents the expected return that would be expected to be obtainable if action a was selected in state s, and then policy pi was followed. Modeling an action cost function as a multi-layer neural network Q w The input of the neural network is the state s, the output is the Q value of each action, the larger the Q value is, the better the action is, the neural network comprises 3 hidden layers and input layer output layers, the number of the hidden layer neurons is 512, and the target network and the training network have the same structure.
The interaction process between the agent and the micro-grid environment is shown in FIG. 3. The state $s_t$ is first input to the agent, and the agent outputs an action $a_t$ that acts on the environment; the micro-grid environment then first judges whether the constraints are satisfied and computes the reward value, and feeds the reward $r_t$ and the next state $s_{t+1}$ back to the agent, yielding a sample $(s_t, a_t, r_t, s_{t+1})$. The agent continues sampling with the environment from $s_{t+1}$, storing each sampled transition in the replay buffer.
When the data in the replay buffer are sufficient, updating of the Q network begins. Each update draws $N$ sample tuples $\{(s_i, a_i, r_i, s_{i+1})\}_{i=1,\dots,N}$ from the buffer and computes the TD error for each:

$\delta_i = r_i + \gamma \max_{a'} Q_{w^-}\left(s_{i+1}, a'\right) - Q_w\left(s_i, a_i\right)$

The target loss over this batch is

$L(w) = \frac{1}{N} \sum_{i=1}^{N} \delta_i^{2}$
parameters are learned by minimizing target losses. The playback buffer size is set to 50000, parameter learning is performed when the number of samples exceeds 1000, and N is set to 256.
FIG. 4 compares the operating cost of the agent-based method with that of the optimal solution obtained by mixed integer quadratic programming; the difference between the two is very small, but solving with mixed integer quadratic programming requires accurate forecasts of wind power, photovoltaic output and load, i.e., it assumes the uncertainty away.
In practice, however, future wind, photovoltaic power and load demand cannot be predicted accurately, so the solution of the mixed integer quadratic programming algorithm is difficult to apply directly. The agent-based method is a data-driven, model-free method that learns the uncertainty from historical data; its output actions come very close to the operating cost of the optimal solution obtained by mixed integer quadratic programming under perfectly accurate forecasts, with shorter computation time, making it an effective way to reduce the operating cost of a micro-grid.
The foregoing is merely illustrative embodiments of the present invention, and the present invention is not limited thereto, and any changes or substitutions that may be easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (7)

1. An agent-based grid-connected micro-grid optimal energy management method, characterized by comprising the following steps: (1) modeling the energy management process of a grid-connected micro-grid, which comprises different types of distributed power sources, energy storage devices and user loads and can buy and sell electricity with the distribution network, as a Markov decision process;
(2) designing the reward function of the Markov decision process based on the operating constraints of the micro-grid and the distribution network;
(3) solving the optimal stationary policy of the established Markov decision process by a deep Q-learning method, this policy being the optimal energy management strategy of the micro-grid.
2. The agent-based grid-connected micro-grid optimal energy management method of claim 1, wherein in step (1) the state variables of the agent comprise the output power of the different distributed power sources in the micro-grid, the active and reactive power demand of the residential loads, the node electricity price, and the stored energy of the energy storage device; the agent's action consists of the active and reactive power of the conventional distributed power sources and the charge/discharge power of the energy storage device.
3. The agent-based grid-connected micro-grid optimal energy management method of claim 1 or 2, wherein in the Markov decision process framework constructed in step (1), the agent's state is

$s_t = \left(P^{pv}_{t-23:t},\ P^{wt}_{t-23:t},\ P^{L,p}_{t-23:t},\ P^{L,q}_{t-23:t},\ R_{t-23:t},\ E_{t-23:t}\right)$

where $P^{pv}$ and $P^{wt}$ respectively denote the output power of the distributed photovoltaics and the wind generator over the past 24 hours, $P^{L,p}$ and $P^{L,q}$ denote the active and reactive power demand of the residential load over the past 24 hours, $R_t$ denotes the node electricity price over the past 24 hours, and $E_t$ denotes the stored energy of the energy storage device over the past 24 hours; the agent's action is

$a_t = \left(\mathbf{P}^{G}_t,\ P^{ESS}_t\right),\qquad \mathbf{P}^{G}_t = \left(P^{1}_t,\dots,P^{K}_t\right)$

where $\mathbf{P}^{G}_t$ is the vector of active power output by the conventional distributed generators in period $t$, $P^{k}_t$ denotes the active power output by the $k$-th conventional distributed generator in period $t$, and $P^{ESS}_t$ denotes the charge/discharge power of the energy storage device at time $t$, a positive value indicating charging and a negative value indicating discharging;

furthermore, the conventional distributed generators and the energy storage device must respectively satisfy the following constraints:

$P^{d}_{\min} \le P^{d}_t \le P^{d}_{\max},\qquad \left|P^{ESS}_t\right| \le P^{ESS}_{\max}$

where $P^{d}_{\max}$ and $P^{d}_{\min}$ respectively denote the maximum and minimum output power of a conventional distributed generator, $P^{d}_t$ denotes the active power output by the $d$-th conventional distributed generator in period $t$, and $P^{ESS}_{\max}$ is the maximum charge/discharge power of the energy storage device; the action space of the Markov decision process is therefore: $\mathcal{A} = \prod_{d=1}^{K}\left[P^{d}_{\min},\ P^{d}_{\max}\right] \times \left[-P^{ESS}_{\max},\ P^{ESS}_{\max}\right]$.
4. The agent-based grid-connected micro-grid optimal energy management method of claim 1, wherein the reward function of step (2) includes the operating cost of the micro-grid, namely the generation cost of the conventional distributed power sources and the cost of buying and selling electricity between the micro-grid and the distribution network; the operating constraints of the micro-grid, including the exchange power constraint, power flow constraint, voltage constraint and energy storage constraint between the micro-grid and the distribution network, are also considered in the reward function; the optimal policy learned from this reward function does not output an energy management scheme that violates the constraints.
5. The agent-based grid-connected micro-grid optimal energy management method of claim 1 or 4, wherein the reward function of step (2) considers the operating constraints of the micro-grid; when the operating constraints are satisfied, the agent's reward function is

$r_t = -\left(\sum_{d=1}^{K} C^{d}_t + C^{grid}_t\right)$

where $r_t$ denotes the reward obtained by the agent after the $t$-th decision, and $C^{d}_t$ and $C^{grid}_t$ are respectively the generation cost of the $d$-th conventional distributed power source in period $t$ and the cost of buying and selling electricity between the micro-grid and the distribution network, calculated as

$C^{d}_t = a_d \left(P^{d}_t\right)^2 + b_d P^{d}_t + c_d,\qquad C^{grid}_t = R_t\, P^{grid}_t\, \Delta t$

where $a_d, b_d, c_d$ are cost coefficients, $P^{grid}_t$ is the power exchanged with the distribution network (a positive value meaning electricity is bought from the distribution network and a negative value meaning electricity is sold to it), $R_t$ is the real-time electricity price, and $\Delta t$ is the running step; $r_t$ is the negative of the cost, so maximizing $r_t$ means minimizing the cost;

the formula above is the reward function when the constraints are satisfied; the design of the reward function considering the micro-grid operating constraints specifically accounts for the following constraints:

(1) Power flow constraint

$\sqrt{\left(P_{ij,t}\right)^2 + \left(Q_{ij,t}\right)^2} \le S^{\max}_{ij}$

where $P_{ij,t}$ and $Q_{ij,t}$ respectively denote the active and reactive power flowing through branch $ij$ in period $t$, and $S^{\max}_{ij}$ denotes the maximum apparent power allowed on branch $ij$;

(2) Exchange power constraint

$\left|P^{grid}_t\right| \le P^{grid}_{\max}$

where $P^{grid}_{\max}$ is the maximum exchange power allowed on the tie line between the micro-grid and the distribution network;

(3) Voltage constraint

$V_{\min} \le V_{i,t} \le V_{\max}$

where $V_{i,t}$, $V_{\min}$ and $V_{\max}$ respectively denote the voltage of node $i$ in period $t$ and the minimum and maximum allowable node voltages;

(4) Energy storage constraint

$E_{\min} \le E_t \le E_{\max}$ (10)

$E_{t+1} = E_t + \left(u_t\,\eta_c + \dfrac{1-u_t}{\eta_d}\right) P^{ESS}_t\,\Delta t$

where $E_t$ is the stored energy of the energy storage device in period $t$ and $P^{ESS}_t$ its charge/discharge power in period $t$; $u_t = 1$ indicates the device is charging and $u_t = 0$ that it is discharging, and charging and discharging cannot occur simultaneously in the same period; $\eta_c$ and $\eta_d$ respectively denote the charging and discharging efficiency of the system; $E_{\min}$ and $E_{\max}$ are respectively the minimum and maximum energy stored by the device;

after the agent outputs the action $a_t$, it is first checked whether the constraints are satisfied; if so, the reward is computed according to the formula above, and otherwise according to

$r_t = -\zeta$ (11)

where $\zeta$ is a very large positive number.
6. The agent-based grid-connected micro-grid optimal energy management method of claim 1, wherein step (3) solves the optimal stationary policy of the established Markov decision process by a deep Q-learning method: each interaction between the agent and the micro-grid environment yields a sample comprising the current state, the agent's action, the obtained reward and the next state; these samples are used to learn an optimal action value network and thus an optimal policy, which outputs the optimal energy management scheme.
7. The agent-based grid-connected micro-grid optimal energy management method of claim 1 or 6, wherein step (3) solves the optimal stationary policy of the established Markov decision process by a deep Q-learning method, the cumulative return of the Markov decision process being

$G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}$

where $\gamma \in [0,1]$ is the discount rate, used to discount long-term returns, $\gamma^{k}$ being the discount applied to the reward $k$ periods ahead; the state value function of the Markov decision process is

$V_{\pi}(s) = \mathbb{E}_{\pi}\left[\,G_t \mid s_t = s\,\right];$

in deep Q-learning the agent samples the micro-grid environment interactively to obtain tuples (state, action, reward, next state) and uses these samples to update the parameters of the value function, thereby obtaining an optimal policy that maximizes the state value function for every state $s$;

the action value function $Q_{\pi}(s,a)$ denotes the expected return obtained by selecting action $a$ in state $s$ and thereafter following policy $\pi$; in deep Q-learning the action value function is modeled as a multi-layer neural network $Q_w(s,a)$ whose input is the state $s$ and whose output is the Q value of each action; to address the instability of neural network training, a target network $Q_{w^-}(s,a)$ is used to compute the TD error, and a replay buffer stores the four-tuples sampled from the environment so that training data can be reused and decorrelated;

when solving the optimal policy of the constructed Markov decision process with deep Q-learning, the state $s_t$ is first input to the agent; the agent outputs an action $a_t$ according to its current policy and applies it to the micro-grid environment; the micro-grid environment first judges whether the constraints are satisfied and computes the reward value, then feeds the reward $r_t$ and the next state $s_{t+1}$ back to the agent, yielding a sample $(s_t, a_t, r_t, s_{t+1})$ that is stored in the replay buffer; the agent then continues interactive sampling with the environment from $s_{t+1}$;

when the data in the replay buffer are sufficient, updating of the Q network begins; each update draws $N$ sample tuples $\{(s_i, a_i, r_i, s_{i+1})\}_{i=1,\dots,N}$ from the buffer and computes the TD error for each:

$\delta_i = r_i + \gamma \max_{a'} Q_{w^-}\left(s_{i+1}, a'\right) - Q_w\left(s_i, a_i\right);$

the target loss over this batch is

$L(w) = \frac{1}{N} \sum_{i=1}^{N} \delta_i^{2}$

and the parameters are learned by minimizing this target loss.
CN202311206909.8A 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent Pending CN117277327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311206909.8A CN117277327A (en) 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311206909.8A CN117277327A (en) 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent

Publications (1)

Publication Number Publication Date
CN117277327A true CN117277327A (en) 2023-12-22

Family

ID=89200106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311206909.8A Pending CN117277327A (en) 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent

Country Status (1)

Country Link
CN (1) CN117277327A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726143A (en) * 2024-02-07 2024-03-19 山东大学 Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning
CN117726143B (en) * 2024-02-07 2024-05-17 山东大学 Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning
CN118646019A (en) * 2024-08-13 2024-09-13 国网江西省电力有限公司电力科学研究院 Grid-connected point voltage flexible self-healing method and system with edge-end coordination

Similar Documents

Publication Publication Date Title
Fontenot et al. Modeling and control of building-integrated microgrids for optimal energy management–a review
CN112186743B (en) Dynamic power system economic dispatching method based on deep reinforcement learning
CN110119886B (en) Active distribution network dynamic planning method
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
Machlev et al. A review of optimal control methods for energy storage systems-energy trading, energy balancing and electric vehicles
CN117277327A (en) Grid-connected micro-grid optimal energy management method based on intelligent agent
CN117833285A (en) Micro-grid energy storage optimization scheduling method based on deep reinforcement learning
Varzaneh et al. Optimal energy management for PV‐integrated residential systems including energy storage system
Shi et al. Research on energy management of hydrogen electric coupling system based on deep reinforcement learning
Zhu et al. Optimal scheduling of a wind energy dominated distribution network via a deep reinforcement learning approach
CN115940289A (en) Operation method of light storage and charging integrated station for power balance and new energy consumption of power grid
CN115169723A (en) Power generation power prediction method, load prediction method and model training method
CN112072643A (en) Light-storage system online scheduling method based on depth certainty gradient strategy
CN117154778A (en) Distributed energy storage optimal configuration method and system for power distribution network
Miah et al. Energy storage controllers and optimization schemes integration to microgrid: an analytical assessment towards future perspectives
Yang et al. Data-driven optimal dynamic dispatch for Hydro-PV-PHS integrated power systems using deep reinforcement learning approach
Chang et al. Model predictive control based energy collaborative optimization management for energy storage system of virtual power plant
Dou et al. Double‐deck optimal schedule of micro‐grid based on demand‐side response
Zhang et al. Two-Step Diffusion Policy Deep Reinforcement Learning Method for Low-Carbon Multi-Energy Microgrid Energy Management
Mohammadi et al. Ai-based optimal scheduling of renewable ac microgrids with bidirectional lstm-based wind power forecasting
CN114336597A (en) Light storage inverter energy scheduling based on self-adaptive penalty function particle swarm algorithm
Saadaoui et al. Hybridization and energy storage high efficiency and low cost
Seane et al. Modelling and optimizing microgrid systems with the utilization of real-time residential data: a case study for Palapye, Botswana
Wang et al. Dynamic Economic Scheduling with Self-Adaptive Uncertainty in Distribution Network Based on Deep Reinforcement Learning.
Kabir Data-Driven Integration of Renewable Energy in Smart Grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination