CN115333143B - Deep learning multi-agent micro-grid cooperative control method based on double neural networks - Google Patents
- Publication number
- CN115333143B (application CN202210797934.7A)
- Authority
- CN
- China
- Prior art keywords
- micro
- grid
- agent
- reinforcement learning
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/12—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
- H02J3/16—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/24—Arrangements for preventing or reducing oscillations of power in networks
- H02J3/241—The oscillation concerning frequency
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/48—Controlling the sharing of the in-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/50—Controlling the sharing of the out-of-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/28—The renewable source being wind energy
Abstract
The invention provides a deep learning multi-agent cooperative control method based on a double neural network, comprising the following steps: establishing a voltage and frequency control model of the micro-grid; designing a multi-agent deep reinforcement learning framework by constructing the Markov decision process of the multi-agent reinforcement learning environment, including its action space, state space and reward function; designing the flow of a double-neural-network deep reinforcement learning algorithm, and training the defined reinforcement learning environment repeatedly with the neural networks until the reward value converges and the optimal Q value is learned; and, based on the Q value learned by reinforcement learning, adjusting the frequency deviation of the distributed power supplies while solving the overestimation problem of the reinforcement learning algorithm, so as to improve the stability of the multi-agent system. The micro-grid system then applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid.
Description
Technical Field
The invention relates to the technical field of micro-grid frequency control, and in particular to a deep learning multi-agent micro-grid cooperative control method based on a double neural network.
Background
With China's rapid economic development, energy consumption rises year by year. As non-renewable resources such as fossil fuels are over-exploited and the environmental impact of conventional power generation grows, China, in response to worldwide calls, is vigorously developing renewable sources such as wind, solar and biomass energy, making an important contribution to environmental protection and opening a new mode of energy development.
At present, to overcome the shortcomings of conventional control methods in micro-grid systems, distributed control has been introduced. This strategy is realized on a multi-agent system framework, and multi-agent micro-grids based on distributed generation are widely applied thanks to their unique flexibility, short construction period and high energy-utilization rate. How to operate in grid-connected or islanded mode so as to obtain high economic benefit, while reducing the generation cost and the losses of long-distance energy transmission, is the problem to be solved at present.
Disclosure of Invention
(I) Technical problem to be solved
The invention provides a deep learning multi-agent micro-grid cooperative control method based on a double neural network, aiming to overcome defects of the prior art such as high power-generation cost and high energy loss.
(II) technical scheme
In order to solve the problems, the invention provides a deep learning multi-agent micro-grid cooperative control method based on a double neural network, which comprises the following steps:
step S1, establishing a voltage and frequency control model of a micro-grid;
Step S2, training by adopting a micro-grid model under a deep reinforcement learning framework, searching an optimal Q value network, and specifically comprising the following steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency-deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the scheduling time information Δt forms the time part of the state space;
step S22, constructing an environment action space for reinforcement learning: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function: the reward is used to guide the agents toward the preset micro-grid optimization target;
step S24, setting a backup controller of the energy storage system, so that actions generated by the schedulable agent and the agent of the energy storage system do not exceed the power range of the system;
Step S3, establishing a double-neural network deep reinforcement learning algorithm flow: training the reinforcement learning environment defined in the step S2 for a plurality of times by adopting a neural network to achieve convergence of the rewarding value;
A neural network Q(s, a; ω) is adopted as a function approximator to estimate the Q(s, a) function; according to the input state and action, the neural network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weight ω of the deep neural network represents the mapping of the system state to the Q value, and a loss function Li (ω) is defined to update the neural network weight ω with the corresponding Q value:
L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2]   (4)
where y_t is the target value:

y_t = r_t + γ max_a' Q(s_{t+1}, a'; ω_t)   (5)

The agent weights are updated by taking the gradient of the loss function and performing stochastic gradient descent.
An estimation network and a target network are constructed. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network learns continuously, updating its parameters at every iteration, while the target network copies the parameters of the estimation network once every period T. One set of parameters is used to select actions and the other to evaluate the current state; they are denoted ω_t and ω_t^- respectively, giving the double-network target:

y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a; ω_t); ω_t^-)
the multiple agents in the micro-grid system select random actions with a certain probability to explore the environment and obtain better feedback, searching for the action that maximizes the reward in a given state; as training continues, the policy finally converges to the optimal one, until the action with the maximum Q value is always taken;
And S4, based on the Q value trained by reinforcement learning, realizing frequency deviation adjustment of the distributed power supply.
Preferably, the alternating-current micro-grid is based on a synchronous generator control theory, and the droop control method is adopted to regulate the active power and the reactive power of the micro-grid;
wherein the droop-control law for active power is:

f = f_0 - k_p (P - P*)   (1)

where f_0 is the rated frequency, P* is the rated active power, and k_p is the droop coefficient.
Preferably, step S24 specifically includes:
According to the Markov decision principle, a Q table is used to store the value function Q(s, a) corresponding to the system state and action; that is, the cumulative return R_t obtained when the system takes action a_t in state s_t at time t is expressed as the expected return, with γ the discount factor:

Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + ...]   (2)

During training, the Q-value training module takes the energy-storage tuple (s_t, a_t, r_t, s_{t+1}) as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state, and t the time step; the recursive Q-function update rule is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]   (3)

where α is the learning rate and γ is the discount factor.
Preferably, the step S4 includes:
the deep reinforcement learning algorithm is adopted to train the control strategy of the step S2 and the step S3 for a plurality of times, and the deep reinforcement learning algorithm is utilized to train the Q value so as to optimize the stability of the multi-agent system;
According to the step S2, the intelligent agent randomly selects actions according to a certain probability according to the self state to explore the environment, selects actions with maximum rewards according to the self state, and reduces the exploration probability to select actions with maximum Q value along with the increase of training times so as to achieve the optimal convergence strategy;
according to the deep reinforcement learning algorithm in the step S3, data (st, at, rt, st+1) are stored in a preferential experience playback mode, characteristic vectors of the data are recorded, an intelligent agent randomly takes action in the initial training stage to generate enough training data to be stored in an experience pool, the data are randomly selected to update parameters of a neural network after the memory unit is filled, and new data with poor updating correlation are continuously obtained in the strategy training process.
(III) beneficial effects
The deep learning multi-agent micro-grid cooperative control method based on the double neural network provided by the invention guarantees the stability of the micro-grid system and the cost of power dispatching when the energy scheduling of the multi-agent micro-grid system flexibly accesses renewable energy sources and the energy exchange of the micro-grid group encounters problems.
Drawings
FIG. 1 is a flow chart of a deep learning multi-agent micro-grid cooperative control method based on a dual neural network in an embodiment of the invention;
FIG. 2 is a system model of a micro grid and a main grid;
FIG. 3 is a flow chart of a reinforcement learning algorithm;
FIG. 4 is a reinforcement learning algorithm reward-value comparison.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1-4, the invention provides a deep learning multi-agent micro-grid cooperative control method based on a dual neural network, which comprises the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid; in this step, the micro-grid frequency control method is based on the synchronous-generator control theory of the AC micro-grid, and the droop control method is commonly adopted to regulate the active power and reactive power of the micro-grid.
In general, each distributed power supply of the micro-grid corresponds to an agent of the multi-agent system; the multi-level energy-management scheme improves the system's capacity to absorb renewable energy and its operating efficiency.
The droop control active power control method of the distributed power supply is as follows:
f = f_0 - k_p (P - P*)   (1)

where f_0 is the rated frequency, P* is the rated active power, and k_p is the droop coefficient;
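The droop law of equation (1) can be sketched directly as a short function; the numeric rated values and droop coefficient below are illustrative assumptions, not values taken from the patent.

```python
def droop_frequency(p_active, p_rated=10.0, f_rated=50.0, k_p=0.05):
    """Droop control of equation (1): f = f_0 - k_p * (P - P*).

    p_active: measured active power output (kW)
    p_rated:  rated active power P* (kW)     -- illustrative value
    f_rated:  rated frequency f_0 (Hz)
    k_p:      droop coefficient (Hz per kW)  -- illustrative value
    """
    return f_rated - k_p * (p_active - p_rated)

# When output exceeds the rated power, the frequency droops below 50 Hz:
print(droop_frequency(12.0))  # 50 - 0.05 * 2 = 49.9
```

The droop coefficient trades off power-sharing accuracy against frequency deviation, which is exactly the deviation the reinforcement learning agents later correct.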
S2, designing a reinforcement learning framework based on multiple agents;
the control strategy is to train by adopting a micro-grid model under a deep reinforcement learning framework to find an optimal Q value network, and comprises the following sub-steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency-deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the scheduling time information Δt forms the time part of the state space;
step S22, constructing a reinforcement learning action space of the multi-agent: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function: the reward is used to guide the agents toward the preset micro-grid optimization target;
step S24, setting an energy storage system backup controller: to ensure that actions generated by the schedulable agent and the energy storage system do not exceed the power range of the system;
The frequency-control objective of the micro-grid is realized by optimizing the frequency deviation of the distributed power supplies; the deviation is discretized, i.e. {Δf_1, Δf_2, Δf_3, ... Δf_n} corresponds to the environment states {s_1, s_2, s_3, ... s_n};
The width of the state intervals affects the convergence speed and accuracy of the controller; the frequency-adjustment range of the power system is 50 ± 0.1 Hz, and the state set S is designed within this range;
A reward function based on the frequency distribution in S is set, where μ_1 ~ μ_4 are reward factors;
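A minimal sketch of the state discretization and piecewise reward described above. The interval granularity, the thresholds, and the values of the reward factors μ_1 ~ μ_4 are assumptions, since the patent leaves them unspecified.

```python
import numpy as np

# Discretize the frequency deviation into environment states s_1..s_n,
# within the regulation band 50 +/- 0.1 Hz described in the text.
BINS = np.linspace(-0.1, 0.1, 11)  # state-interval edges (Hz) -- assumed granularity

def state_of(delta_f):
    """Map a frequency deviation (Hz) to a discrete state index."""
    return int(np.clip(np.digitize(delta_f, BINS), 0, len(BINS)))

def reward(delta_f, mu=(10.0, 5.0, 1.0, -10.0)):
    """Piecewise reward weighted by reward factors mu_1..mu_4.

    Thresholds and factor values are illustrative assumptions; the patent
    only states that mu_1..mu_4 weight the frequency distribution in S.
    """
    d = abs(delta_f)
    if d < 0.01:
        return mu[0]   # essentially at rated frequency
    if d < 0.05:
        return mu[1]   # small deviation
    if d <= 0.1:
        return mu[2]   # still inside the regulation band
    return mu[3]       # outside the band: penalty
```

A steeper reward near zero deviation pushes the agents to hold the frequency tightly, while the out-of-band penalty enforces the 50 ± 0.1 Hz limit.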
The agent acts on the environment and changes the state s, whereupon the environment feeds a reward R back to the agent; this cycle repeats continuously as a Markov decision process. A Q table is used to store the value function Q(s, a) corresponding to the system state and action; that is, the cumulative return R_t obtained when the system takes action a_t in state s_t can be expressed as the expected return:

Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + ...]   (4)

In this training process, the Q-value training module takes the energy-storage tuple (s_t, a_t, r_t, s_{t+1}) as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state, and t the time step. The recursive Q-function update rule is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]   (5)

where α is the learning rate and γ is the discount factor.
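The tabular Q-learning update above can be written in a few lines; the state and action counts and the hyper-parameters are placeholders, not values from the patent.

```python
import numpy as np

N_STATES, N_ACTIONS = 12, 5          # assumed sizes of the discretized spaces
alpha, gamma = 0.1, 0.9              # learning rate and discount factor
Q = np.zeros((N_STATES, N_ACTIONS))  # the Q table storing Q(s, a)

def q_update(s_t, a_t, r_t, s_next):
    """Recursive Q-function update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r_t + gamma * Q[s_next].max()
    Q[s_t, a_t] += alpha * (td_target - Q[s_t, a_t])

# One sample transition (s_t, a_t, r_t, s_{t+1}) as used by the training module.
q_update(0, 2, 1.0, 3)
print(Q[0, 2])  # 0.1 after the first update (td_target = 1.0, old Q = 0)
```

Each stored tuple drives one such update, which is exactly the recursion the deep networks in step S3 approximate once the state space grows too large for a table.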
Step S3, designing the flow of the double DQN deep reinforcement learning algorithm with two neural networks: the reinforcement learning environment defined in step S2 is trained repeatedly with the neural networks until the reward value converges;
In a general reinforcement learning algorithm, the state and action of the Q function pose a high-dimensional, complex problem; to solve it, a neural network Q(s, a; ω) can be introduced as a function approximator to estimate the Q(s, a) function. According to the input state and action, the neural network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weight ω of the deep neural network represents the mapping from the system state to the Q value, so a loss function L_i(ω) needs to be defined to update the network weight ω and the corresponding Q value:

L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2]   (6)
where y_t is the target value:

y_t = r_t + γ max_a' Q(s_{t+1}, a'; ω_t)   (7)

The agent weights are updated by taking the gradient of the loss function and performing stochastic gradient descent.
To make the algorithm performance more stable, an estimation network and a target network are constructed on the basis of the deep learning framework. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network learns continuously, updating its parameters at every iteration, while the target network copies the parameters of the estimation network once every period T. One set of parameters is used to select actions and the other to evaluate the value of the current state; they are denoted ω_t and ω_t^- respectively, giving the double-network target:

y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a; ω_t); ω_t^-)
The multiple agents in the micro-grid system select random actions with a certain probability to explore the environment and obtain better feedback, searching for the action that maximizes the reward in a given state; as training continues, the policy finally converges to the optimal one, until the action with the maximum Q value is always taken.
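A minimal sketch of the double-network mechanism described in step S3: the estimation network (ω_t) selects the action and the target network (ω_t^-) evaluates it. The tiny linear "networks" below stand in for the real deep Q networks; sizes and the sync period are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, N_ACTIONS = 4, 3

# Two networks with identical structure but different parameters:
w_est = rng.normal(size=(N_FEATURES, N_ACTIONS))  # estimation network, omega_t
w_tgt = w_est.copy()                              # target network, omega_t^-

def q_values(w, state):
    return state @ w  # linear stand-in for the deep Q network

def double_dqn_target(r_t, s_next, gamma=0.9):
    """y_t = r_t + gamma * Q(s', argmax_a Q(s', a; w_est); w_tgt):
    the estimation network picks the action, the target network scores it,
    which curbs the overestimation of a single max over one network."""
    a_star = int(np.argmax(q_values(w_est, s_next)))
    return r_t + gamma * q_values(w_tgt, s_next)[a_star]

def sync_target():
    """Every period T, the target network copies the estimation network."""
    global w_tgt
    w_tgt = w_est.copy()

y = double_dqn_target(1.0, rng.normal(size=N_FEATURES))
```

In training, `w_est` is updated by gradient descent on the loss (6) with this `y_t`, while `sync_target()` runs only every T steps to keep the target stable.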
S4, based on the Q value trained by reinforcement learning, realizing frequency deviation adjustment of the distributed power supply;
The control strategy of step S2 and step S3 is trained multiple times with the deep reinforcement learning algorithm, which trains the Q value while solving the overestimation problem of the algorithm, so as to improve the stability of the multi-agent system.
And the micro-grid system performs relevant operation on each distributed power supply to complete optimal energy management optimization strategy selection, so that cooperative control of the micro-grid is realized.
According to the micro-grid energy-scheduling method of the double DQN network in step S2, the agent selects random actions with a certain probability according to its own state to explore the environment, otherwise selecting the action with the maximum reward for its state; finally, as the number of training episodes increases, the exploration probability is reduced in favor of the action with the maximum Q value, so as to reach the optimal convergence strategy.
According to the deep reinforcement learning algorithm of step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored via prioritized experience replay and their feature vectors are recorded; in the initial training stage the agent takes random actions to generate enough training data for the experience pool; after the memory unit is full, data are sampled at random to update the parameters of the neural network, and new, weakly correlated data are continuously obtained during policy training, avoiding worthless iterations and improving the convergence rate.
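The experience pool described above can be sketched as a simple replay buffer. This version samples uniformly, omitting the priority weights of full prioritized replay; the capacity and batch size are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool for transitions (s_t, a_t, r_t, s_{t+1}).

    A uniform-sampling sketch; the patent mentions prioritized replay,
    whose importance weights are left out here for brevity.
    """
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)  # oldest data dropped when full

    def push(self, s_t, a_t, r_t, s_next):
        self.memory.append((s_t, a_t, r_t, s_next))

    def __len__(self):
        return len(self.memory)

    def sample(self, batch_size):
        """Randomly select stored transitions to decorrelate network updates."""
        return random.sample(self.memory, batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(150):                  # early random phase fills the pool
    buf.push(t % 12, t % 5, 1.0, (t + 1) % 12)
batch = buf.sample(32)
print(len(buf), len(batch))  # 100 32 (capacity caps the pool at 100)
```

Sampling random mini-batches from the pool is what breaks the temporal correlation between consecutive transitions, which is the stated reason for the experience pool.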
In summary: a voltage and frequency control model of the micro-grid is established, in which active power is regulated by controlling the grid frequency and reactive power by the voltage amplitude, realizing droop control; a multi-agent deep reinforcement learning framework is designed by constructing the Markov decision process of the multi-agent reinforcement learning environment, including its action space, state space and reward function; the flow of the double-neural-network deep reinforcement learning algorithm is designed, and the defined reinforcement learning environment is trained repeatedly with the neural networks until the reward value converges and the optimal Q value is learned; based on this Q value, the frequency deviation of the distributed power supplies is adjusted and the overestimation problem of the reinforcement learning algorithm is solved, improving the stability of the multi-agent system. The micro-grid system applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid. The proposed method guarantees the stability of the micro-grid system and the cost of power dispatching when the energy scheduling of the multi-agent micro-grid system flexibly accesses renewable energy sources and the energy exchange of the micro-grid group encounters problems.
The above embodiments are only for illustrating the present invention, not for limiting the present invention, and various changes and modifications may be made by one of ordinary skill in the relevant art without departing from the spirit and scope of the present invention, and therefore, all equivalent technical solutions are also within the scope of the present invention, and the scope of the present invention is defined by the claims.
Claims (2)
1. A deep learning multi-agent micro-grid cooperative control method based on a double neural network is characterized by comprising the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid; the micro-grid frequency control method is based on the synchronous-generator control theory of the AC micro-grid, and the active power and reactive power of the micro-grid are regulated by the droop control method;
wherein the droop-control law for active power is:

f = f_0 - k_p (P - P*)   (1)

where: f_0 is the rated frequency, P* is the rated active power, and k_p is the droop coefficient;
Step S2, training by adopting a micro-grid model under a deep reinforcement learning framework, searching an optimal Q value network, and specifically comprising the following steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency-deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the scheduling time information Δt forms the time part of the state space;
step S22, constructing an environment action space for reinforcement learning: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function: the reward is used to guide the agents toward the preset micro-grid optimization target;
Step S24, setting a backup controller of the energy storage system so that actions generated by the schedulable agent and the agent of the energy storage system do not exceed the power range of the system, and specifically comprising the following steps:
According to the Markov decision principle, a Q table is used to store the value function Q(s, a) corresponding to the system state and action; that is, the cumulative return R_t obtained when the system takes action a_t in state s_t at time t is expressed as the expected return, with γ the discount factor:

Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + ...]   (2)

During training, the Q-value training module takes the energy-storage tuple (s_t, a_t, r_t, s_{t+1}) as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state, and t the time step; the recursive Q-function update rule is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]   (3)

where α is the learning rate and γ is the discount factor;
Step S3, establishing a double-neural network deep reinforcement learning algorithm flow: training the reinforcement learning environment defined in the step S2 for a plurality of times by adopting a neural network to achieve convergence of the rewarding value;
A neural network Q(s, a; ω) is adopted as a function approximator to estimate the Q(s, a) function; according to the input state and action, the neural network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weight ω of the deep neural network represents the mapping of the system state to the Q value, and a loss function Li (ω) is defined to update the neural network weight ω with the corresponding Q value:
L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2]   (4)
where y_t is the target value:

y_t = r_t + γ max_a' Q(s_{t+1}, a'; ω_t)   (5)

The agent weights are updated by taking the gradient of the loss function and performing stochastic gradient descent.
An estimation network and a target network are constructed. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network learns continuously, updating its parameters at every iteration, while the target network copies the parameters of the estimation network once every period T. One set of parameters is used to select actions and the other to evaluate the current state; they are denoted ω_t and ω_t^- respectively, giving the double-network target:

y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a; ω_t); ω_t^-)
the multiple agents in the micro-grid system select random actions with a certain probability to explore the environment and obtain feedback, searching for the action that maximizes the reward in a given state; as the number of training iterations increases, the policy converges to the optimum, until the action that maximizes the Q value is always taken;
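The exploration schedule described above can be sketched as an ε-greedy rule with decaying ε; the Q values, decay rate, and floor below are illustrative assumptions:

```python
import random

# Sketch of epsilon-greedy action selection with annealed exploration: with
# probability epsilon a random action is taken, otherwise the action with the
# largest Q value; epsilon decays toward a small floor as training proceeds.
random.seed(0)

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

epsilon, decay, eps_min = 1.0, 0.99, 0.05
for episode in range(500):
    action = epsilon_greedy([0.2, 0.8, 0.1], epsilon)
    epsilon = max(eps_min, epsilon * decay)                     # anneal toward greedy
```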
Step S4, based on the Q value trained by reinforcement learning, realizing frequency-deviation regulation of the distributed power supplies.
2. The deep learning multi-agent micro grid cooperative control method based on the dual neural network as set forth in claim 1, wherein the step S4 includes:
the control strategies of step S2 and step S3 are trained repeatedly with the deep reinforcement learning algorithm, and the Q value is trained by the deep reinforcement learning algorithm to optimize the stability of the multi-agent system;
According to step S2, each agent randomly selects actions with a certain probability based on its own state to explore the environment, and otherwise selects the action with the maximum reward for its state; as the number of training iterations increases, the exploration probability is reduced so that the action with the maximum Q value is selected, reaching the optimal convergence strategy;
according to the deep reinforcement learning algorithm of step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored by prioritized experience replay and their feature vectors are recorded; in the early stage of training the agent takes random actions to generate enough training data to fill the experience pool; once the memory unit is full, data are selected at random to update the neural network parameters, and new, weakly correlated data are continuously obtained during policy training.
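The replay memory described above can be sketched as a fixed-size buffer of (s_t, a_t, r_t, s_{t+1}) tuples from which minibatches are drawn at random. Plain uniform sampling is shown; prioritized replay would additionally weight samples, e.g. by TD error. The capacity, batch size, and stored values are illustrative assumptions:

```python
import random
from collections import deque

# Sketch of an experience-replay buffer: a bounded deque of transitions; random
# minibatch draws break the temporal correlation of consecutive samples.
CAPACITY, BATCH = 100, 8

buffer = deque(maxlen=CAPACITY)    # oldest transitions are evicted automatically

def store(s, a, r, s_next):
    buffer.append((s, a, r, s_next))

def sample():
    return random.sample(list(buffer), BATCH)  # uniform random minibatch

random.seed(1)
for t in range(250):               # early phase: act randomly to fill the pool
    store(t, random.randrange(2), 0.0, t + 1)
batch = sample()
```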
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210797934.7A CN115333143B (en) | 2022-07-08 | 2022-07-08 | Deep learning multi-agent micro-grid cooperative control method based on double neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115333143A (en) | 2022-11-11
CN115333143B (en) | 2024-05-07
Family
ID=83917405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210797934.7A Active CN115333143B (en) | 2022-07-08 | 2022-07-08 | Deep learning multi-agent micro-grid cooperative control method based on double neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115333143B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115499849B (en) * | 2022-11-16 | 2023-04-07 | 国网湖北省电力有限公司信息通信公司 | Wireless access point and reconfigurable intelligent surface cooperation method |
CN116307440B (en) * | 2022-11-21 | 2023-11-17 | 暨南大学 | Workshop scheduling method based on reinforcement learning and multi-objective weight learning, device and application thereof |
CN115796364A (en) * | 2022-11-30 | 2023-03-14 | 南京邮电大学 | Intelligent interactive decision-making method for discrete manufacturing system |
CN116488154B (en) * | 2023-04-17 | 2024-07-26 | 海南大学 | Energy scheduling method, system, computer equipment and medium based on micro-grid |
CN116594358B (en) * | 2023-04-20 | 2024-01-02 | 暨南大学 | Multi-layer factory workshop scheduling method based on reinforcement learning |
CN116629128B (en) * | 2023-05-30 | 2024-03-29 | 哈尔滨工业大学 | Method for controlling arc additive forming based on deep reinforcement learning |
CN116934050A (en) * | 2023-08-10 | 2023-10-24 | 深圳市思特克电子技术开发有限公司 | Electric power intelligent scheduling system based on reinforcement learning |
CN117172163B (en) * | 2023-08-15 | 2024-04-12 | 重庆西南集成电路设计有限责任公司 | Amplitude and phase two-dimensional optimization method and system of amplitude and phase control circuit, medium and electronic equipment |
CN117350515B (en) * | 2023-11-21 | 2024-04-05 | 安徽大学 | Ocean island group energy flow scheduling method based on multi-agent reinforcement learning |
CN117713202B (en) * | 2023-12-15 | 2024-08-13 | 嘉兴正弦电气有限公司 | Distributed power supply self-adaptive control method and system based on deep reinforcement learning |
CN117474295B (en) * | 2023-12-26 | 2024-04-26 | 长春工业大学 | Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method |
CN117764360A (en) * | 2023-12-29 | 2024-03-26 | 中海油信息科技有限公司 | Paint workshop intelligent scheduling method based on graphic neural network |
CN117578466B (en) * | 2024-01-17 | 2024-04-05 | 国网山西省电力公司电力科学研究院 | Power system transient stability prevention control method based on dominant function decomposition |
CN117807895B (en) * | 2024-02-28 | 2024-06-04 | 中国电建集团昆明勘测设计研究院有限公司 | Magnetorheological damper control method and device based on deep reinforcement learning |
CN117808174B (en) * | 2024-03-01 | 2024-05-28 | 山东大学 | Micro-grid operation optimization method and system based on reinforcement learning under network attack |
CN117973233B (en) * | 2024-03-29 | 2024-06-18 | 合肥工业大学 | Converter control model training and oscillation suppression method based on deep reinforcement learning |
CN118092195B (en) * | 2024-04-26 | 2024-06-25 | 山东工商学院 | Multi-agent cooperative control method for improving IQL based on cooperative training model |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106410808A (en) * | 2016-09-27 | 2017-02-15 | 东南大学 | General distributed control method comprising constant-power control and droop control for microgrid group |
CN109347149A (en) * | 2018-09-20 | 2019-02-15 | 国网河南省电力公司电力科学研究院 | Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning |
CN111200285A (en) * | 2020-02-12 | 2020-05-26 | 燕山大学 | Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory |
CN111371112A (en) * | 2020-04-15 | 2020-07-03 | 苏州科技大学 | Distributed finite time control method for island microgrid heterogeneous battery energy storage system |
CN111431216A (en) * | 2020-03-18 | 2020-07-17 | 国网浙江嘉善县供电有限公司 | High-proportion photovoltaic microgrid reactive power sharing control method adopting Q learning |
CN112117760A (en) * | 2020-08-13 | 2020-12-22 | 国网浙江省电力有限公司台州供电公司 | Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning |
CN114400704A (en) * | 2022-01-24 | 2022-04-26 | 燕山大学 | Island micro-grid multi-mode switching strategy based on double Q learning consideration economic regulation |
CN114421479A (en) * | 2021-11-30 | 2022-04-29 | 国网浙江省电力有限公司台州供电公司 | Voltage control method for AC/DC micro-grid group cooperative mutual supply |
WO2022135066A1 (en) * | 2020-12-25 | 2022-06-30 | 南京理工大学 | Temporal difference-based hybrid flow-shop scheduling method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694365B (en) * | 2020-07-01 | 2021-04-20 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
- 2022-07-08 CN CN202210797934.7A patent/CN115333143B/en active Active
Non-Patent Citations (1)
Title |
---|
Distributed secondary optimization control of multiple microgrids based on reinforcement learning; Shen Jun; Liu Wei; Li Hucheng; Li Na; Wen Zhen; Yin Minghui; Automation of Electric Power Systems; 2020-03-05 (Issue 05); full text *
Also Published As
Publication number | Publication date |
---|---|
CN115333143A (en) | 2022-11-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115333143B (en) | Deep learning multi-agent micro-grid cooperative control method based on double neural networks | |
CN109347149B (en) | Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning | |
CN114362196B (en) | Multi-time-scale active power distribution network voltage control method | |
CN108565874B (en) | Source-load cooperative frequency modulation method based on load frequency control model | |
CN114217524A (en) | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning | |
CN110138019B (en) | Method for optimizing start and stop of unit | |
CN113872213B (en) | Autonomous optimization control method and device for power distribution network voltage | |
Tsang et al. | Autonomous household energy management using deep reinforcement learning | |
CN110445186B (en) | Self-synchronizing microgrid control system and secondary frequency modulation control method | |
CN117039981A (en) | Large-scale power grid optimal scheduling method, device and storage medium for new energy | |
CN117578466B (en) | Power system transient stability prevention control method based on dominant function decomposition | |
CN117117989A (en) | Deep reinforcement learning solving method for unit combination | |
CN115459320B (en) | Intelligent decision-making method and device for aggregation control of multipoint distributed energy storage system | |
CN115133540B (en) | Model-free real-time voltage control method for power distribution network | |
CN114400675B (en) | Active power distribution network voltage control method based on weight mean value deep double-Q network | |
Tang et al. | Voltage Control Strategy of Distribution Networks with Distributed Photovoltaic Based on Multi-agent Deep Reinforcement Learning | |
CN110289643B (en) | Rejection depth differential dynamic planning real-time power generation scheduling and control algorithm | |
CN114421470B (en) | Intelligent real-time operation control method for flexible diamond type power distribution system | |
CN117713202B (en) | Distributed power supply self-adaptive control method and system based on deep reinforcement learning | |
CN118508416A (en) | Urban level micro-grid control method | |
CN117674160A (en) | Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning | |
Song et al. | Research on Cooperative Control Algorithm Based on Distributed Multi-Region Integrated Energy System | |
Ai et al. | Power flow rebalancing in optimal scheduling of smart distribution systems based on Deep Deterministic Policy Gradient | |
Chen et al. | Research on Flexible Resource Dynamic Interactive Regulation Technology for Microgrids with High Permeable New Energy | |
CN117350423A (en) | Distributed energy system cluster collaborative optimization method based on multi-agent reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||