CN115333143B - Deep learning multi-agent micro-grid cooperative control method based on double neural networks - Google Patents


Info

Publication number
CN115333143B
CN115333143B
Authority
CN
China
Prior art keywords
micro
grid
agent
reinforcement learning
value
Prior art date
Legal status
Active
Application number
CN202210797934.7A
Other languages
Chinese (zh)
Other versions
CN115333143A (en)
Inventor
马兴明
郎宇宁
杨东海
王佳兴
毛新宇
周义民
张冬
孟庆宇
徐凤霞
仝书林
Current Assignee
Daqing Power Supply Co Of State Grid Heilongjiang Electric Power Co ltd
State Grid Corp of China SGCC
Qiqihar University
Original Assignee
Daqing Power Supply Co Of State Grid Heilongjiang Electric Power Co ltd
State Grid Corp of China SGCC
Qiqihar University
Priority date
Filing date
Publication date
Application filed by Daqing Power Supply Co Of State Grid Heilongjiang Electric Power Co ltd, State Grid Corp of China SGCC, Qiqihar University filed Critical Daqing Power Supply Co Of State Grid Heilongjiang Electric Power Co ltd
Priority to CN202210797934.7A
Publication of CN115333143A
Application granted
Publication of CN115333143B


Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04 Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06 Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12 Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/16 Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/24 Arrangements for preventing or reducing oscillations of power in networks
    • H02J3/241 The oscillation concerning frequency
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/48 Controlling the sharing of the in-phase component
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/50 Controlling the sharing of the out-of-phase component
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H02J2300/22 The renewable source being solar energy
    • H02J2300/24 The renewable source being solar energy of photovoltaic origin
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H02J2300/28 The renewable source being wind energy

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention provides a deep learning multi-agent cooperative control method based on a double neural network, which comprises the following steps: establishing a voltage and frequency control model of the micro-grid; designing a multi-agent-based deep reinforcement learning framework, i.e. constructing the action space, state space and reward function of the Markov decision process for the multi-agent reinforcement learning environment; designing the flow of the double-neural-network deep reinforcement learning algorithm, in which the neural networks are trained repeatedly on the defined reinforcement learning environment until the reward value converges and the optimal Q value is obtained; and, based on the Q value trained by reinforcement learning, adjusting the frequency deviation of the distributed power supplies while solving the overestimation problem of the reinforcement learning algorithm, so as to optimize the stability of the multi-agent system. The micro-grid system then applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid.

Description

Deep learning multi-agent micro-grid cooperative control method based on double neural networks
Technical Field
The invention relates to the technical field of micro-grid frequency control, in particular to a deep learning multi-agent micro-grid cooperative control method based on a double-neural network.
Background
With the rapid economic development of China, energy consumption has increased year by year. As non-renewable resources such as fossil fuels are over-exploited and the environmental impact of conventional power generation grows, China, in response to worldwide calls, has been vigorously developing renewable energy sources such as wind energy, solar energy and bio-energy, making an important contribution to environmental protection and opening a new development path for new energy.
At present, distributed control has been introduced to overcome the shortcomings of conventional control methods in micro-grid systems. This strategy is implemented on a multi-agent system framework, and multi-agent micro-grids based on distributed generation are widely applied thanks to their flexibility, short construction period, high energy utilization and other advantages. How to operate micro-grids in parallel with the main grid or independently so as to obtain high economic benefit, while reducing the generation cost and the losses of long-distance energy transmission, is a problem that currently needs to be solved.
Disclosure of Invention
(I) Technical problem to be solved
The invention provides a deep learning multi-agent micro-grid cooperative control method based on a double neural network, aiming to overcome defects of the prior art such as high power generation cost and high energy loss.
(II) Technical scheme
In order to solve the problems, the invention provides a deep learning multi-agent micro-grid cooperative control method based on a double neural network, which comprises the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid;
Step S2, training the micro-grid model under a deep reinforcement learning framework to search for the optimal Q-value network, specifically comprising the following steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the time information Δt of each scheduling step forms the time part of the state space;
Step S22, constructing the environment action space for reinforcement learning: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function, which guides the agents to achieve the preset micro-grid optimization target;
Step S24, setting a backup controller of the energy storage system, so that the actions generated by the schedulable agents and the energy storage system agent do not exceed the power range of the system;
Step S3, establishing the double-neural-network deep reinforcement learning algorithm flow: the neural networks are trained repeatedly on the reinforcement learning environment defined in step S2 until the reward value converges;
A neural network Q(s, a; ω) is adopted as a function approximator to estimate the Q(s, a) function; given the state and action inputs, the network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weights ω of the deep neural network represent the mapping from the system state to the Q value, and a loss function L_i(ω) is defined to update the network weights ω and the corresponding Q value:
L_i(ω_t) = E_s[(y_t − Q(s, a; ω_t))²]   (4)
where y_t is the objective (target) function:
The weights of the agents are updated by taking the gradient of the loss function and performing stochastic gradient descent:
An estimation network and a target network are constructed; the two networks have the same structure but different parameters, and the value given by the estimation network is smaller than that of the target network. The estimation network learns iteratively and continuously updates its parameters, while the target network copies the parameters of the estimation network every period T. One set of parameters is used to select actions and the other to evaluate the current state; they are denoted ω_t and ω_t⁻, respectively:
The multiple agents in the micro-grid system select random actions with a certain probability so as to better explore the environment and obtain feedback, searching for the action that maximizes the reward in a given state; as the number of training episodes increases, the policy finally converges to the optimum, until only the action that maximizes the Q value is taken;
And S4, based on the Q value trained by reinforcement learning, realizing frequency deviation adjustment of the distributed power supply.
Preferably, the AC micro-grid is controlled on the basis of synchronous generator control theory, and the droop control method is adopted to regulate the active and reactive power of the micro-grid;
wherein the droop-controlled active power relation is:
f = f_0 − k_p(P − P*)   (1)
where f_0 is the rated frequency, P* is the rated active power, P is the output active power, and k_p is the droop coefficient.
Preferably, step S24 specifically includes:
According to the Markov decision principle, a Q table is used to store the value function Q(s, a) corresponding to the system states and actions, i.e. the cumulative return R_t obtained when the system takes an action in a given state at time t, expressed as an expected return, with γ denoting the discount factor:
Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γQ(s_{t+1}, a_{t+1}) + γ²Q(s_{t+2}, a_{t+2}) + ...]   (2)
During training, the Q-value training module uses the transition tuple (s_t, a_t, r_t, s_{t+1}) of the energy storage device as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state and t the time step; the recursive update strategy of the Q function is:
Where α is the learning rate and γ is the discount factor.
Preferably, the step S4 includes:
the control strategy of steps S2 and S3 is trained multiple times with the deep reinforcement learning algorithm, and the Q value is trained with this algorithm so as to optimize the stability of the multi-agent system;
according to step S2, each agent, based on its own state, selects random actions with a certain probability to explore the environment and selects the action with the maximum reward according to its state; as the number of training episodes increases, the exploration probability is reduced and the action with the maximum Q value is selected, so as to reach the optimal convergence strategy;
according to the deep reinforcement learning algorithm of step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored in a prioritized experience replay mode and their feature vectors are recorded; at the initial stage of training the agent takes random actions to generate enough training data for the experience pool, and after the memory unit is filled, data are selected at random to update the parameters of the neural network, while new, weakly correlated data are continuously acquired during policy training.
(III) Beneficial effects
The deep learning multi-agent micro-grid cooperative control method based on the double neural network provided by the invention ensures the stability of the micro-grid system and controls the cost of power dispatching when the energy scheduling of the multi-agent micro-grid system flexibly integrates renewable energy sources and handles energy exchange within a micro-grid group.
Drawings
FIG. 1 is a flow chart of a deep learning multi-agent micro-grid cooperative control method based on a dual neural network in an embodiment of the invention;
FIG. 2 is a system model of a micro grid and a main grid;
FIG. 3 is a flow chart of a reinforcement learning algorithm;
FIG. 4 is a comparison of reinforcement learning algorithm reward values.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1-4, the invention provides a deep learning multi-agent micro-grid cooperative control method based on a dual neural network, which comprises the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid; in this step, the micro-grid frequency control method is based on the synchronous generator control theory of the AC micro-grid, and the droop control method is commonly adopted to regulate the active and reactive power of the micro-grid.
In general, the distributed power supplies of the micro-grid correspond to the individual agents of the multi-agent system, and this multi-layer energy management mode improves the capacity for absorbing renewable energy and the operating efficiency of the system.
The droop-control active power relation of the distributed power supply is as follows:
f = f_0 − k_p(P − P*)   (1)
where f_0 is the rated frequency, P* is the rated active power, P is the output active power, and k_p is the droop coefficient;
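As an illustration of equation (1), a minimal sketch of the droop relation is given below; the 50 Hz rating, the power set-point and the droop coefficient are assumed example values, not taken from the patent.

```python
# Minimal sketch of the droop relation f = f0 - kp * (P - P*) from equation (1).
# The numeric values (50 Hz rating, droop coefficient, power set-point) are
# illustrative assumptions.

def droop_frequency(p_out_kw: float,
                    f_rated_hz: float = 50.0,
                    p_rated_kw: float = 100.0,
                    k_p: float = 0.001) -> float:
    """Frequency commanded by a droop-controlled distributed generator."""
    return f_rated_hz - k_p * (p_out_kw - p_rated_kw)

if __name__ == "__main__":
    # Output power above the set-point pulls the frequency below 50 Hz.
    print(droop_frequency(150.0))  # 49.95 Hz
```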
Step S2, designing a multi-agent-based reinforcement learning framework;
The control strategy trains the micro-grid model under a deep reinforcement learning framework to find the optimal Q-value network, and comprises the following sub-steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the time information Δt of each scheduling step forms the time part of the state space;
Step S22, constructing the reinforcement learning action space of the multi-agent system: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function, which guides the agents to achieve the preset micro-grid optimization target;
step S24, setting an energy storage system backup controller: to ensure that actions generated by the schedulable agent and the energy storage system do not exceed the power range of the system;
The frequency control objective of the micro-grid is to optimize the frequency deviation of the distributed power supplies. The frequency deviation is discretized, so that the deviations {Δf_1, Δf_2, Δf_3, ..., Δf_n} correspond to the environment states {s_1, s_2, s_3, ..., s_n};
The width of the environment state intervals affects the convergence speed and the precision of the controller. The frequency adjustment range of the power system is 50 ± 0.1 Hz, and the state set S can be designed as follows:
A reward function is set based on the frequency bands in S as:
where μ_1–μ_4 are reward factors;
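A hypothetical sketch of the state discretization and reward shaping described above is given below. Since the concrete state design and reward formula are not reproduced in the text here, the band edges and the reward factors μ_1–μ_4 used in the sketch are placeholder assumptions.

```python
import numpy as np

# Hypothetical sketch of state discretisation and reward shaping for the
# frequency deviation Δf. Band edges and reward factors mu1..mu4 are assumed.

FREQ_BANDS_HZ = np.array([0.02, 0.05, 0.1])   # |Δf| band edges (assumed)
MU = [1.0, 0.5, -0.5, -1.0]                    # reward factors mu1..mu4 (assumed)

def discretise_state(delta_f_hz: float) -> int:
    """Map a frequency deviation Δf to a discrete environment state index."""
    return int(np.searchsorted(FREQ_BANDS_HZ, abs(delta_f_hz)))

def reward(delta_f_hz: float) -> float:
    """Piecewise reward: small deviations are rewarded, large ones penalised."""
    return MU[discretise_state(delta_f_hz)]

print(discretise_state(0.03), reward(0.03))   # state 1, reward 0.5
```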
An agent acts on the environment and changes the state s, and the environment feeds a reward R back to the agent; this continuously repeated cycle is the Markov decision process. A Q table is used to store the value function Q(s, a) corresponding to the system states and actions, i.e. the cumulative return R_t obtained when the system takes action a_t in state s_t, which can be expressed as an expected return:
Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γQ(s_{t+1}, a_{t+1}) + γ²Q(s_{t+2}, a_{t+2}) + ...]   (4)
During this training process, the Q-value training module uses the transition tuple (s_t, a_t, r_t, s_{t+1}) of the energy storage device as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state and t the time step; the recursive update strategy of the Q function is:
Where α is the learning rate and γ is the discount factor.
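The recursive update formula itself is not reproduced in the text above; the sketch below assumes the standard tabular Q-learning temporal-difference update, with α and γ as defined here and with illustrative variable names.

```python
from collections import defaultdict

# Sketch of the assumed tabular Q-learning recursion:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

ALPHA, GAMMA, N_ACTIONS = 0.1, 0.95, 5
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def q_update(s, a, r, s_next):
    """One temporal-difference update of the Q table for transition (s, a, r, s')."""
    td_target = r + GAMMA * max(q_table[s_next])
    q_table[s][a] += ALPHA * (td_target - q_table[s][a])

q_update(s=0, a=2, r=1.0, s_next=1)
```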
Step S3, designing the double-DQN (double neural network) deep reinforcement learning algorithm flow: the neural networks are trained repeatedly on the reinforcement learning environment defined in step S2 until the reward value converges;
In a general reinforcement learning algorithm the states and actions of the Q function can be high-dimensional and complex; to solve this problem, a neural network Q(s, a; ω) can be introduced as a function approximator to estimate the Q(s, a) function. Given the state and action inputs, the network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weights ω of the deep neural network represent the mapping from the system state to the Q value, so a loss function L_i(ω) needs to be defined to update the network weights ω and the corresponding Q value:
L_i(ω_t) = E_s[(y_t − Q(s, a; ω_t))²]   (6)
where y_t is the objective (target) function:
The weights of the agents are updated by taking the gradient of the loss function and performing stochastic gradient descent.
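A minimal sketch of the function approximator and the loss of equation (6) trained by stochastic gradient descent is given below; the layer sizes are assumptions, and the target y_t is taken in its standard one-step bootstrap form because the target and gradient-update formulas are not reproduced in the text here. The two-network refinement described in the next paragraph is sketched separately after it.

```python
import torch
import torch.nn as nn

# Sketch of Q(s, a; ω) as a small network and of one SGD step on the loss
# L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2] from equation (6).
# STATE_DIM, N_ACTIONS and the hidden width are assumed values; the target y_t
# below is the standard one-step bootstrap target (an assumption).

STATE_DIM, N_ACTIONS, GAMMA = 4, 5, 0.95

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def sgd_step(s, a, r, s_next):
    """One gradient step on a batch of transitions (s, a, r, s')."""
    with torch.no_grad():                                   # bootstrap target y_t
        y = r + GAMMA * q_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a; ω_t)
    loss = nn.functional.mse_loss(q_sa, y)                  # L_i(ω_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```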
To make the algorithm more stable, an estimation network and a target network are constructed on top of the deep learning framework. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network therefore learns iteratively and continuously updates its parameters, while the target network copies the parameters of the estimation network every period T. One set of parameters is used to select actions and the other to evaluate the value of the current state; they are denoted ω_t and ω_t⁻, respectively.
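A sketch of this estimation-network / target-network arrangement, under the usual double-DQN reading (the estimation network selects the action, the target network evaluates it, and ω_t⁻ is overwritten by ω_t every T learning steps), might look as follows; the network sizes and the period T are assumed values.

```python
import copy
import torch
import torch.nn as nn

# Estimation network (ω_t) and target network (ω_t⁻): same structure,
# separately held parameters, periodic synchronisation every T steps.

STATE_DIM, N_ACTIONS, GAMMA, SYNC_PERIOD_T = 4, 5, 0.95, 200

estimate_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = copy.deepcopy(estimate_net)

def double_dqn_target(r, s_next):
    """y_t = r + γ · Q(s', argmax_a Q(s', a; ω_t); ω_t⁻): select with ω_t, evaluate with ω_t⁻."""
    with torch.no_grad():
        best_a = estimate_net(s_next).argmax(dim=1, keepdim=True)          # selection: ω_t
        return r + GAMMA * target_net(s_next).gather(1, best_a).squeeze(1)  # evaluation: ω_t⁻

def maybe_sync(step: int) -> None:
    """Copy the estimation network's parameters into the target network every T steps."""
    if step % SYNC_PERIOD_T == 0:
        target_net.load_state_dict(estimate_net.state_dict())
```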
The multiple agents in the micro-grid system select random actions with a certain probability so as to better explore the environment and obtain feedback, searching for the action that maximizes the reward in a given state; as the number of training episodes increases, the policy finally converges to the optimum, until only the action that maximizes the Q value is taken.
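The exploration behaviour described above can be sketched as an ε-greedy rule with a decaying exploration probability; the decay rate and floor below are assumed values.

```python
import random

# ε-greedy action selection with a decaying exploration probability.
# EPS_START, EPS_END and EPS_DECAY are assumed example values.

EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 0.995

def epsilon(episode: int) -> float:
    """Exploration probability after a given number of training episodes."""
    return max(EPS_END, EPS_START * EPS_DECAY ** episode)

def select_action(q_values, episode: int, n_actions: int) -> int:
    """Explore with probability ε, otherwise exploit the maximum Q value."""
    if random.random() < epsilon(episode):
        return random.randrange(n_actions)
    return int(max(range(n_actions), key=lambda a: q_values[a]))
```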
Step S4, based on the Q value trained by reinforcement learning, realizing the frequency deviation adjustment of the distributed power supplies;
The control strategy of steps S2 and S3 is trained multiple times with the deep reinforcement learning algorithm, and the Q value is trained with this algorithm, solving the overestimation problem of the algorithm so as to optimize the stability of the multi-agent system.
The micro-grid system then applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid.
According to the micro-grid energy scheduling method of the double-DQN network in step S2, each agent, based on its own state, selects random actions with a certain probability to explore the environment and selects the action with the maximum reward according to its state; as the number of training episodes increases, the exploration probability is finally reduced and the action with the maximum Q value is selected, so as to reach the optimal convergence strategy.
According to the deep reinforcement learning algorithm described in step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored in a prioritized experience replay mode and their feature vectors are recorded; at the initial stage of training the agent takes random actions to generate enough training data for the experience pool, and after the memory unit is filled, data are selected at random to update the parameters of the neural network, while new, weakly correlated data are continuously acquired during policy training, avoiding valueless iterations and improving the convergence rate.
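A minimal sketch of the experience pool described above is given below; it stores the transition tuples and draws random mini-batches once enough experience has been collected. The priority weighting implied by the prioritized replay mode is omitted for brevity, and the capacity and batch size are assumed values.

```python
import random
from collections import deque

# Experience pool for transitions (s_t, a_t, r_t, s_{t+1}) with uniform random
# mini-batch sampling. CAPACITY and BATCH_SIZE are assumed example values.

CAPACITY, BATCH_SIZE = 10_000, 64
replay_buffer = deque(maxlen=CAPACITY)   # old, correlated samples fall out automatically

def store(s, a, r, s_next):
    """Append one transition tuple to the experience pool."""
    replay_buffer.append((s, a, r, s_next))

def sample_batch():
    """Draw a random mini-batch once enough experience has been collected."""
    if len(replay_buffer) < BATCH_SIZE:
        return None
    return random.sample(replay_buffer, BATCH_SIZE)
```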
In summary: a voltage and frequency control model of the micro-grid is established, in which active power is regulated by controlling the grid frequency and reactive power by the voltage amplitude, realizing droop control; a multi-agent-based deep reinforcement learning framework is designed by constructing the action space, state space and reward function of the Markov decision process for the multi-agent reinforcement learning environment; the flow of the double-neural-network deep reinforcement learning algorithm is designed, and the neural networks are trained repeatedly on the defined reinforcement learning environment until the reward value converges and the optimal Q value is obtained; based on the Q value trained by reinforcement learning, the frequency deviation of the distributed power supplies is adjusted, and the overestimation problem of the reinforcement learning algorithm is solved so as to optimize the stability of the multi-agent system. The micro-grid system applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid. The deep learning multi-agent micro-grid cooperative control method based on the double neural network provided by the invention thus ensures the stability of the micro-grid system and controls the cost of power dispatching when the energy scheduling of the multi-agent micro-grid system flexibly integrates renewable energy sources and handles energy exchange within a micro-grid group.
The above embodiments are only for illustrating the present invention, not for limiting the present invention, and various changes and modifications may be made by one of ordinary skill in the relevant art without departing from the spirit and scope of the present invention, and therefore, all equivalent technical solutions are also within the scope of the present invention, and the scope of the present invention is defined by the claims.

Claims (2)

1. A deep learning multi-agent micro-grid cooperative control method based on a double neural network is characterized by comprising the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid; the micro-grid frequency control method is based on the synchronous generator control theory of the AC micro-grid, and the active and reactive power of the micro-grid are regulated by the droop control method;
wherein the droop-controlled active power relation is:
f = f_0 − k_p(P − P*)   (1)
where f_0 is the rated frequency, P* is the rated active power, P is the output active power, and k_p is the droop coefficient;
Step S2, training by adopting a micro-grid model under a deep reinforcement learning framework, searching an optimal Q value network, and specifically comprising the following steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the time information Δt of each scheduling step forms the time part of the state space;
step S22, constructing an environment action space for reinforcement learning: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function, which guides the agents to achieve the preset micro-grid optimization target;
Step S24, setting a backup controller of the energy storage system so that actions generated by the schedulable agent and the agent of the energy storage system do not exceed the power range of the system, and specifically comprising the following steps:
according to the Markov decision principle, a Q table is used to store the value function Q(s, a) corresponding to the system states and actions, i.e. the cumulative return R_t obtained when the system takes an action in a given state at time t, expressed as an expected return, with γ denoting the discount factor:
Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γQ(s_{t+1}, a_{t+1}) + γ²Q(s_{t+2}, a_{t+2}) + ...]   (2)
during training, the Q-value training module uses the transition tuple (s_t, a_t, r_t, s_{t+1}) of the energy storage device as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state and t the time step; the recursive update strategy of the Q function is:
where α is the learning rate and γ is the discount factor;
Step S3, establishing the double-neural-network deep reinforcement learning algorithm flow: the neural networks are trained repeatedly on the reinforcement learning environment defined in step S2 until the reward value converges;
a neural network Q(s, a; ω) is adopted as a function approximator to estimate the Q(s, a) function; given the state and action inputs, the network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
the weights ω of the deep neural network represent the mapping from the system state to the Q value, and a loss function L_i(ω) is defined to update the network weights ω and the corresponding Q value:
L_i(ω_t) = E_s[(y_t − Q(s, a; ω_t))²]   (4)
where y_t is the objective (target) function:
the weights of the agents are updated by taking the gradient of the loss function and performing stochastic gradient descent:
an estimation network and a target network are constructed; the two networks have the same structure but different parameters, and the value given by the estimation network is smaller than that of the target network. The estimation network learns iteratively and continuously updates its parameters, while the target network copies the parameters of the estimation network every period T. One set of parameters is used to select actions and the other to evaluate the current state; they are denoted ω_t and ω_t⁻, respectively:
the multiple agents in the micro-grid system select random actions with a certain probability so as to better explore the environment and obtain feedback, searching for the action that maximizes the reward in a given state; as the number of training episodes increases, the policy finally converges to the optimum, until only the action that maximizes the Q value is taken;
And S4, based on the Q value trained by reinforcement learning, realizing frequency deviation adjustment of the distributed power supply.
2. The deep learning multi-agent micro grid cooperative control method based on the dual neural network as set forth in claim 1, wherein the step S4 includes:
the control strategy of steps S2 and S3 is trained multiple times with the deep reinforcement learning algorithm, and the Q value is trained with this algorithm so as to optimize the stability of the multi-agent system;
according to step S2, each agent, based on its own state, selects random actions with a certain probability to explore the environment and selects the action with the maximum reward according to its state; as the number of training episodes increases, the exploration probability is reduced and the action with the maximum Q value is selected, so as to reach the optimal convergence strategy;
according to the deep reinforcement learning algorithm of step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored in a prioritized experience replay mode and their feature vectors are recorded; at the initial stage of training the agent takes random actions to generate enough training data for the experience pool, and after the memory unit is filled, data are selected at random to update the parameters of the neural network, while new, weakly correlated data are continuously acquired during policy training.
CN202210797934.7A 2022-07-08 2022-07-08 Deep learning multi-agent micro-grid cooperative control method based on double neural networks Active CN115333143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210797934.7A CN115333143B (en) 2022-07-08 2022-07-08 Deep learning multi-agent micro-grid cooperative control method based on double neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210797934.7A CN115333143B (en) 2022-07-08 2022-07-08 Deep learning multi-agent micro-grid cooperative control method based on double neural networks

Publications (2)

Publication Number Publication Date
CN115333143A CN115333143A (en) 2022-11-11
CN115333143B true CN115333143B (en) 2024-05-07

Family

ID=83917405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210797934.7A Active CN115333143B (en) 2022-07-08 2022-07-08 Deep learning multi-agent micro-grid cooperative control method based on double neural networks

Country Status (1)

Country Link
CN (1) CN115333143B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499849B (en) * 2022-11-16 2023-04-07 国网湖北省电力有限公司信息通信公司 Wireless access point and reconfigurable intelligent surface cooperation method
CN116307440B (en) * 2022-11-21 2023-11-17 暨南大学 Workshop scheduling method based on reinforcement learning and multi-objective weight learning, device and application thereof
CN115796364A (en) * 2022-11-30 2023-03-14 南京邮电大学 Intelligent interactive decision-making method for discrete manufacturing system
CN116488154B (en) * 2023-04-17 2024-07-26 海南大学 Energy scheduling method, system, computer equipment and medium based on micro-grid
CN116594358B (en) * 2023-04-20 2024-01-02 暨南大学 Multi-layer factory workshop scheduling method based on reinforcement learning
CN116629128B (en) * 2023-05-30 2024-03-29 哈尔滨工业大学 Method for controlling arc additive forming based on deep reinforcement learning
CN116934050A (en) * 2023-08-10 2023-10-24 深圳市思特克电子技术开发有限公司 Electric power intelligent scheduling system based on reinforcement learning
CN117172163B (en) * 2023-08-15 2024-04-12 重庆西南集成电路设计有限责任公司 Amplitude and phase two-dimensional optimization method and system of amplitude and phase control circuit, medium and electronic equipment
CN117350515B (en) * 2023-11-21 2024-04-05 安徽大学 Ocean island group energy flow scheduling method based on multi-agent reinforcement learning
CN117713202B (en) * 2023-12-15 2024-08-13 嘉兴正弦电气有限公司 Distributed power supply self-adaptive control method and system based on deep reinforcement learning
CN117474295B (en) * 2023-12-26 2024-04-26 长春工业大学 Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method
CN117764360A (en) * 2023-12-29 2024-03-26 中海油信息科技有限公司 Paint workshop intelligent scheduling method based on graphic neural network
CN117578466B (en) * 2024-01-17 2024-04-05 国网山西省电力公司电力科学研究院 Power system transient stability prevention control method based on dominant function decomposition
CN117807895B (en) * 2024-02-28 2024-06-04 中国电建集团昆明勘测设计研究院有限公司 Magnetorheological damper control method and device based on deep reinforcement learning
CN117808174B (en) * 2024-03-01 2024-05-28 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117973233B (en) * 2024-03-29 2024-06-18 合肥工业大学 Converter control model training and oscillation suppression method based on deep reinforcement learning
CN118092195B (en) * 2024-04-26 2024-06-25 山东工商学院 Multi-agent cooperative control method for improving IQL based on cooperative training model

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106410808A (en) * 2016-09-27 2017-02-15 东南大学 General distributed control method comprising constant-power control and droop control for microgrid group
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN111200285A (en) * 2020-02-12 2020-05-26 燕山大学 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
CN111371112A (en) * 2020-04-15 2020-07-03 苏州科技大学 Distributed finite time control method for island microgrid heterogeneous battery energy storage system
CN111431216A (en) * 2020-03-18 2020-07-17 国网浙江嘉善县供电有限公司 High-proportion photovoltaic microgrid reactive power sharing control method adopting Q learning
CN112117760A (en) * 2020-08-13 2020-12-22 国网浙江省电力有限公司台州供电公司 Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN114400704A (en) * 2022-01-24 2022-04-26 燕山大学 Island micro-grid multi-mode switching strategy based on double Q learning consideration economic regulation
CN114421479A (en) * 2021-11-30 2022-04-29 国网浙江省电力有限公司台州供电公司 Voltage control method for AC/DC micro-grid group cooperative mutual supply
WO2022135066A1 (en) * 2020-12-25 2022-06-30 南京理工大学 Temporal difference-based hybrid flow-shop scheduling method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106410808A (en) * 2016-09-27 2017-02-15 东南大学 General distributed control method comprising constant-power control and droop control for microgrid group
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN111200285A (en) * 2020-02-12 2020-05-26 燕山大学 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
CN111431216A (en) * 2020-03-18 2020-07-17 国网浙江嘉善县供电有限公司 High-proportion photovoltaic microgrid reactive power sharing control method adopting Q learning
CN111371112A (en) * 2020-04-15 2020-07-03 苏州科技大学 Distributed finite time control method for island microgrid heterogeneous battery energy storage system
CN112117760A (en) * 2020-08-13 2020-12-22 国网浙江省电力有限公司台州供电公司 Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
WO2022135066A1 (en) * 2020-12-25 2022-06-30 南京理工大学 Temporal difference-based hybrid flow-shop scheduling method
CN114421479A (en) * 2021-11-30 2022-04-29 国网浙江省电力有限公司台州供电公司 Voltage control method for AC/DC micro-grid group cooperative mutual supply
CN114400704A (en) * 2022-01-24 2022-04-26 燕山大学 Island micro-grid multi-mode switching strategy based on double Q learning consideration economic regulation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于强化学习的多微电网分布式二次优化控制 [Distributed secondary optimal control of multiple micro-grids based on reinforcement learning]; 沈珺; 柳伟; 李虎成; 李娜; 温镇; 殷明慧; 电力系统自动化 [Automation of Electric Power Systems]; 2020-03-05 (No. 05); full text *

Also Published As

Publication number Publication date
CN115333143A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN115333143B (en) Deep learning multi-agent micro-grid cooperative control method based on double neural networks
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
CN114362196B (en) Multi-time-scale active power distribution network voltage control method
CN108565874B (en) Source-load cooperative frequency modulation method based on load frequency control model
CN114217524A (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN110138019B (en) Method for optimizing start and stop of unit
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
Tsang et al. Autonomous household energy management using deep reinforcement learning
CN110445186B (en) Self-synchronizing microgrid control system and secondary frequency modulation control method
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
CN117578466B (en) Power system transient stability prevention control method based on dominant function decomposition
CN117117989A (en) Deep reinforcement learning solving method for unit combination
CN115459320B (en) Intelligent decision-making method and device for aggregation control of multipoint distributed energy storage system
CN115133540B (en) Model-free real-time voltage control method for power distribution network
CN114400675B (en) Active power distribution network voltage control method based on weight mean value deep double-Q network
Tang et al. Voltage Control Strategy of Distribution Networks with Distributed Photovoltaic Based on Multi-agent Deep Reinforcement Learning
CN110289643B (en) Rejection depth differential dynamic planning real-time power generation scheduling and control algorithm
CN114421470B (en) Intelligent real-time operation control method for flexible diamond type power distribution system
CN117713202B (en) Distributed power supply self-adaptive control method and system based on deep reinforcement learning
CN118508416A (en) Urban level micro-grid control method
CN117674160A (en) Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning
Song et al. Research on Cooperative Control Algorithm Based on Distributed Multi-Region Integrated Energy System
Ai et al. Power flow rebalancing in optimal scheduling of smart distribution systems based on Deep Deterministic Policy Gradient
Chen et al. Research on Flexible Resource Dynamic Interactive Regulation Technology for Microgrids with High Permeable New Energy
CN117350423A (en) Distributed energy system cluster collaborative optimization method based on multi-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant