CN115333143B - Deep learning multi-agent micro-grid cooperative control method based on double neural networks - Google Patents
- Publication number
- CN115333143B (application CN202210797934.7A)
- Authority
- CN
- China
- Prior art keywords
- micro
- grid
- agent
- reinforcement learning
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/12—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
- H02J3/16—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/24—Arrangements for preventing or reducing oscillations of power in networks
- H02J3/241—The oscillation concerning frequency
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/48—Controlling the sharing of the in-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/50—Controlling the sharing of the out-of-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/28—The renewable source being wind energy
Abstract
The invention provides a deep learning multi-agent cooperative control method based on a double neural network, comprising the following steps: establishing a voltage and frequency control model of the micro-grid; designing a multi-agent deep reinforcement learning framework by constructing the Markov decision process of the multi-agent reinforcement learning environment, including its action space, state space and reward function; designing the flow of a double-neural-network deep reinforcement learning algorithm, and training the defined reinforcement learning environment repeatedly with the neural networks until the reward value converges and the optimal Q value is learned; and, based on the Q value learned by reinforcement learning, adjusting the frequency deviation of the distributed power supplies while solving the overestimation problem of the reinforcement learning algorithm, so as to improve the stability of the multi-agent system. The micro-grid system then applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid.
Description
Technical Field
The invention relates to the technical field of micro-grid frequency control, and in particular to a deep learning multi-agent micro-grid cooperative control method based on a double neural network.
Background
With China's rapid economic development, energy consumption rises year by year. As non-renewable resources such as fossil fuels are over-exploited and the environmental impact of conventional power generation grows, China, in response to worldwide calls, is vigorously developing renewable sources such as wind, solar and biomass energy, making an important contribution to environmental protection and opening a new mode of energy development.
At present, to overcome the shortcomings of conventional control methods in micro-grid systems, distributed control has been introduced. This strategy is realized on a multi-agent system framework, and multi-agent micro-grids based on distributed generation are widely applied thanks to their unique flexibility, short construction period and high energy-utilization rate. How to operate in grid-connected or islanded mode so as to obtain high economic benefit, while reducing the generation cost and the losses of long-distance energy transmission, is the problem to be solved at present.
Disclosure of Invention
(I) Technical problem to be solved
The invention provides a deep learning multi-agent micro-grid cooperative control method based on a double neural network, aiming to overcome defects of the prior art such as high power-generation cost and high energy loss.
(II) technical scheme
In order to solve the problems, the invention provides a deep learning multi-agent micro-grid cooperative control method based on a double neural network, which comprises the following steps:
step S1, establishing a voltage and frequency control model of a micro-grid;
Step S2, training by adopting a micro-grid model under a deep reinforcement learning framework, searching an optimal Q value network, and specifically comprising the following steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency-deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the scheduling time information Δt forms the time part of the state space;
step S22, constructing an environment action space for reinforcement learning: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function: the reward is used to guide the agents toward the preset micro-grid optimization target;
step S24, setting a backup controller of the energy storage system, so that actions generated by the schedulable agent and the agent of the energy storage system do not exceed the power range of the system;
Step S3, establishing a double-neural network deep reinforcement learning algorithm flow: training the reinforcement learning environment defined in the step S2 for a plurality of times by adopting a neural network to achieve convergence of the rewarding value;
A neural network Q(s, a; ω) is adopted as a function approximator to estimate the Q(s, a) function; according to the input state and action, the neural network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weight ω of the deep neural network represents the mapping of the system state to the Q value, and a loss function Li (ω) is defined to update the neural network weight ω with the corresponding Q value:
L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2]   (4)
where y_t is the target value:

y_t = r_t + γ max_a' Q(s_{t+1}, a'; ω_t)   (5)

The agent weights are updated by taking the gradient of the loss function and performing stochastic gradient descent.
An estimation network and a target network are constructed. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network learns continuously, updating its parameters at every iteration, while the target network copies the parameters of the estimation network once every period T. One set of parameters is used to select actions and the other to evaluate the current state; they are denoted ω_t and ω_t^- respectively, giving the double-network target:

y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a; ω_t); ω_t^-)
the multiple agents in the micro-grid system select random actions with a certain probability to explore the environment and obtain better feedback, searching for the action that maximizes the reward in a given state; as training continues, the policy finally converges to the optimal one, until the action with the maximum Q value is always taken;
And S4, based on the Q value trained by reinforcement learning, realizing frequency deviation adjustment of the distributed power supply.
Preferably, the alternating-current micro-grid is based on a synchronous generator control theory, and the droop control method is adopted to regulate the active power and the reactive power of the micro-grid;
wherein the droop-control law for active power is:

f = f_0 - k_p (P - P*)   (1)

where f_0 is the rated frequency, P* is the rated active power, and k_p is the droop coefficient.
Preferably, step S24 specifically includes:
According to the Markov decision principle, a Q table is used to store the value function Q(s, a) corresponding to the system state and action; that is, the cumulative return R_t obtained when the system takes action a_t in state s_t at time t is expressed as the expected return, with γ the discount factor:

Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + ...]   (2)

During training, the Q-value training module takes the energy-storage tuple (s_t, a_t, r_t, s_{t+1}) as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state, and t the time step; the recursive Q-function update rule is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]   (3)

where α is the learning rate and γ is the discount factor.
Preferably, the step S4 includes:
the deep reinforcement learning algorithm is adopted to train the control strategy of the step S2 and the step S3 for a plurality of times, and the deep reinforcement learning algorithm is utilized to train the Q value so as to optimize the stability of the multi-agent system;
According to the step S2, the intelligent agent randomly selects actions according to a certain probability according to the self state to explore the environment, selects actions with maximum rewards according to the self state, and reduces the exploration probability to select actions with maximum Q value along with the increase of training times so as to achieve the optimal convergence strategy;
according to the deep reinforcement learning algorithm in the step S3, data (st, at, rt, st+1) are stored in a preferential experience playback mode, characteristic vectors of the data are recorded, an intelligent agent randomly takes action in the initial training stage to generate enough training data to be stored in an experience pool, the data are randomly selected to update parameters of a neural network after the memory unit is filled, and new data with poor updating correlation are continuously obtained in the strategy training process.
(III) beneficial effects
The deep learning multi-agent micro-grid cooperative control method based on the double neural network provided by the invention guarantees the stability of the micro-grid system and the cost of power dispatching when the energy scheduling of the multi-agent micro-grid system flexibly accesses renewable energy sources and the energy exchange of the micro-grid group encounters problems.
Drawings
FIG. 1 is a flow chart of a deep learning multi-agent micro-grid cooperative control method based on a dual neural network in an embodiment of the invention;
FIG. 2 is a system model of a micro grid and a main grid;
FIG. 3 is a flow chart of a reinforcement learning algorithm;
FIG. 4 is a reinforcement learning algorithm reward-value comparison.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1-4, the invention provides a deep learning multi-agent micro-grid cooperative control method based on a dual neural network, which comprises the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid; in this step, the micro-grid frequency control method is based on the synchronous-generator control theory of the AC micro-grid, and the droop control method is commonly adopted to regulate the active power and reactive power of the micro-grid.
In general, each distributed power supply of the micro-grid corresponds to an agent of the multi-agent system; the multi-level energy-management scheme improves the system's capacity to absorb renewable energy and its operating efficiency.
The droop control active power control method of the distributed power supply is as follows:
f = f_0 - k_p (P - P*)   (1)

where f_0 is the rated frequency, P* is the rated active power, and k_p is the droop coefficient;
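The droop law of equation (1) can be sketched directly as a short function; the numeric rated values and droop coefficient below are illustrative assumptions, not values taken from the patent.

```python
def droop_frequency(p_active, p_rated=10.0, f_rated=50.0, k_p=0.05):
    """Droop control of equation (1): f = f_0 - k_p * (P - P*).

    p_active: measured active power output (kW)
    p_rated:  rated active power P* (kW)     -- illustrative value
    f_rated:  rated frequency f_0 (Hz)
    k_p:      droop coefficient (Hz per kW)  -- illustrative value
    """
    return f_rated - k_p * (p_active - p_rated)

# When output exceeds the rated power, the frequency droops below 50 Hz:
print(droop_frequency(12.0))  # 50 - 0.05 * 2 = 49.9
```

The droop coefficient trades off power-sharing accuracy against frequency deviation, which is exactly the deviation the reinforcement learning agents later correct.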
S2, designing a reinforcement learning framework based on multiple agents;
the control strategy is to train by adopting a micro-grid model under a deep reinforcement learning framework to find an optimal Q value network, and comprises the following sub-steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency-deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the scheduling time information Δt forms the time part of the state space;
step S22, constructing a reinforcement learning action space of the multi-agent: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function: the reward is used to guide the agents toward the preset micro-grid optimization target;
step S24, setting an energy storage system backup controller: to ensure that actions generated by the schedulable agent and the energy storage system do not exceed the power range of the system;
The frequency-control objective of the micro-grid is realized by optimizing the frequency deviation of the distributed power supplies; the deviation is discretized, i.e. {Δf_1, Δf_2, Δf_3, ... Δf_n} corresponds to the environment states {s_1, s_2, s_3, ... s_n};
The width of the state intervals affects the convergence speed and accuracy of the controller; the frequency-adjustment range of the power system is 50 ± 0.1 Hz, and the state set S is designed within this range;
A reward function based on the frequency distribution in S is set, where μ_1 ~ μ_4 are reward factors;
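A minimal sketch of the state discretization and piecewise reward described above. The interval granularity, the thresholds, and the values of the reward factors μ_1 ~ μ_4 are assumptions, since the patent leaves them unspecified.

```python
import numpy as np

# Discretize the frequency deviation into environment states s_1..s_n,
# within the regulation band 50 +/- 0.1 Hz described in the text.
BINS = np.linspace(-0.1, 0.1, 11)  # state-interval edges (Hz) -- assumed granularity

def state_of(delta_f):
    """Map a frequency deviation (Hz) to a discrete state index."""
    return int(np.clip(np.digitize(delta_f, BINS), 0, len(BINS)))

def reward(delta_f, mu=(10.0, 5.0, 1.0, -10.0)):
    """Piecewise reward weighted by reward factors mu_1..mu_4.

    Thresholds and factor values are illustrative assumptions; the patent
    only states that mu_1..mu_4 weight the frequency distribution in S.
    """
    d = abs(delta_f)
    if d < 0.01:
        return mu[0]   # essentially at rated frequency
    if d < 0.05:
        return mu[1]   # small deviation
    if d <= 0.1:
        return mu[2]   # still inside the regulation band
    return mu[3]       # outside the band: penalty
```

A steeper reward near zero deviation pushes the agents to hold the frequency tightly, while the out-of-band penalty enforces the 50 ± 0.1 Hz limit.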
The agent acts on the environment and changes the state s, whereupon the environment feeds a reward R back to the agent; this cycle repeats continuously as a Markov decision process. A Q table is used to store the value function Q(s, a) corresponding to the system state and action; that is, the cumulative return R_t obtained when the system takes action a_t in state s_t can be expressed as the expected return:

Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + ...]   (4)

In this training process, the Q-value training module takes the energy-storage tuple (s_t, a_t, r_t, s_{t+1}) as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state, and t the time step. The recursive Q-function update rule is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]   (5)

where α is the learning rate and γ is the discount factor.
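The tabular Q-learning update above can be written in a few lines; the state and action counts and the hyper-parameters are placeholders, not values from the patent.

```python
import numpy as np

N_STATES, N_ACTIONS = 12, 5          # assumed sizes of the discretized spaces
alpha, gamma = 0.1, 0.9              # learning rate and discount factor
Q = np.zeros((N_STATES, N_ACTIONS))  # the Q table storing Q(s, a)

def q_update(s_t, a_t, r_t, s_next):
    """Recursive Q-function update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r_t + gamma * Q[s_next].max()
    Q[s_t, a_t] += alpha * (td_target - Q[s_t, a_t])

# One sample transition (s_t, a_t, r_t, s_{t+1}) as used by the training module.
q_update(0, 2, 1.0, 3)
print(Q[0, 2])  # 0.1 after the first update (td_target = 1.0, old Q = 0)
```

Each stored tuple drives one such update, which is exactly the recursion the deep networks in step S3 approximate once the state space grows too large for a table.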
Step S3, designing the flow of the double DQN deep reinforcement learning algorithm with two neural networks: the reinforcement learning environment defined in step S2 is trained repeatedly with the neural networks until the reward value converges;
In a general reinforcement learning algorithm, the state and action of the Q function pose a high-dimensional, complex problem; to solve it, a neural network Q(s, a; ω) can be introduced as a function approximator to estimate the Q(s, a) function. According to the input state and action, the neural network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weight ω of the deep neural network represents the mapping from the system state to the Q value, so a loss function L_i(ω) needs to be defined to update the network weight ω and the corresponding Q value:

L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2]   (6)
where y_t is the target value:

y_t = r_t + γ max_a' Q(s_{t+1}, a'; ω_t)   (7)

The agent weights are updated by taking the gradient of the loss function and performing stochastic gradient descent.
To make the algorithm performance more stable, an estimation network and a target network are constructed on the basis of the deep learning framework. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network learns continuously, updating its parameters at every iteration, while the target network copies the parameters of the estimation network once every period T. One set of parameters is used to select actions and the other to evaluate the value of the current state; they are denoted ω_t and ω_t^- respectively, giving the double-network target:

y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a; ω_t); ω_t^-)
The multiple agents in the micro-grid system select random actions with a certain probability to explore the environment and obtain better feedback, searching for the action that maximizes the reward in a given state; as training continues, the policy finally converges to the optimal one, until the action with the maximum Q value is always taken.
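A minimal sketch of the double-network mechanism described in step S3: the estimation network (ω_t) selects the action and the target network (ω_t^-) evaluates it. The tiny linear "networks" below stand in for the real deep Q networks; sizes and the sync period are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, N_ACTIONS = 4, 3

# Two networks with identical structure but different parameters:
w_est = rng.normal(size=(N_FEATURES, N_ACTIONS))  # estimation network, omega_t
w_tgt = w_est.copy()                              # target network, omega_t^-

def q_values(w, state):
    return state @ w  # linear stand-in for the deep Q network

def double_dqn_target(r_t, s_next, gamma=0.9):
    """y_t = r_t + gamma * Q(s', argmax_a Q(s', a; w_est); w_tgt):
    the estimation network picks the action, the target network scores it,
    which curbs the overestimation of a single max over one network."""
    a_star = int(np.argmax(q_values(w_est, s_next)))
    return r_t + gamma * q_values(w_tgt, s_next)[a_star]

def sync_target():
    """Every period T, the target network copies the estimation network."""
    global w_tgt
    w_tgt = w_est.copy()

y = double_dqn_target(1.0, rng.normal(size=N_FEATURES))
```

In training, `w_est` is updated by gradient descent on the loss (6) with this `y_t`, while `sync_target()` runs only every T steps to keep the target stable.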
S4, based on the Q value trained by reinforcement learning, realizing frequency deviation adjustment of the distributed power supply;
The control strategy of step S2 and step S3 is trained multiple times with the deep reinforcement learning algorithm, which trains the Q value while solving the overestimation problem of the algorithm, so as to improve the stability of the multi-agent system.
And the micro-grid system performs relevant operation on each distributed power supply to complete optimal energy management optimization strategy selection, so that cooperative control of the micro-grid is realized.
According to the micro-grid energy-scheduling method of the double DQN network in step S2, the agent selects random actions with a certain probability according to its own state to explore the environment, otherwise selecting the action with the maximum reward for its state; finally, as the number of training episodes increases, the exploration probability is reduced in favor of the action with the maximum Q value, so as to reach the optimal convergence strategy.
According to the deep reinforcement learning algorithm of step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored via prioritized experience replay and their feature vectors are recorded; in the initial training stage the agent takes random actions to generate enough training data for the experience pool; after the memory unit is full, data are sampled at random to update the parameters of the neural network, and new, weakly correlated data are continuously obtained during policy training, avoiding worthless iterations and improving the convergence rate.
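The experience pool described above can be sketched as a simple replay buffer. This version samples uniformly, omitting the priority weights of full prioritized replay; the capacity and batch size are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool for transitions (s_t, a_t, r_t, s_{t+1}).

    A uniform-sampling sketch; the patent mentions prioritized replay,
    whose importance weights are left out here for brevity.
    """
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)  # oldest data dropped when full

    def push(self, s_t, a_t, r_t, s_next):
        self.memory.append((s_t, a_t, r_t, s_next))

    def __len__(self):
        return len(self.memory)

    def sample(self, batch_size):
        """Randomly select stored transitions to decorrelate network updates."""
        return random.sample(self.memory, batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(150):                  # early random phase fills the pool
    buf.push(t % 12, t % 5, 1.0, (t + 1) % 12)
batch = buf.sample(32)
print(len(buf), len(batch))  # 100 32 (capacity caps the pool at 100)
```

Sampling random mini-batches from the pool is what breaks the temporal correlation between consecutive transitions, which is the stated reason for the experience pool.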
In summary: a voltage and frequency control model of the micro-grid is established, in which active power is regulated by controlling the grid frequency and reactive power by the voltage amplitude, realizing droop control; a multi-agent deep reinforcement learning framework is designed by constructing the Markov decision process of the multi-agent reinforcement learning environment, including its action space, state space and reward function; the flow of the double-neural-network deep reinforcement learning algorithm is designed, and the defined reinforcement learning environment is trained repeatedly with the neural networks until the reward value converges and the optimal Q value is learned; based on this Q value, the frequency deviation of the distributed power supplies is adjusted and the overestimation problem of the reinforcement learning algorithm is solved, improving the stability of the multi-agent system. The micro-grid system applies the corresponding operations to each distributed power supply, completing the selection of the optimal energy management strategy and realizing cooperative control of the micro-grid. The proposed method guarantees the stability of the micro-grid system and the cost of power dispatching when the energy scheduling of the multi-agent micro-grid system flexibly accesses renewable energy sources and the energy exchange of the micro-grid group encounters problems.
The above embodiments are only for illustrating the present invention, not for limiting the present invention, and various changes and modifications may be made by one of ordinary skill in the relevant art without departing from the spirit and scope of the present invention, and therefore, all equivalent technical solutions are also within the scope of the present invention, and the scope of the present invention is defined by the claims.
Claims (2)
1. A deep learning multi-agent micro-grid cooperative control method based on a double neural network is characterized by comprising the following steps:
Step S1, establishing a voltage and frequency control model of the micro-grid; the micro-grid frequency control method is based on the synchronous-generator control theory of the AC micro-grid, and the active power and reactive power of the micro-grid are regulated by the droop control method;
wherein the droop-control law for active power is:

f = f_0 - k_p (P - P*)   (1)

where: f_0 is the rated frequency, P* is the rated active power, and k_p is the droop coefficient;
Step S2, training by adopting a micro-grid model under a deep reinforcement learning framework, searching an optimal Q value network, and specifically comprising the following steps:
Step S21, constructing the environment state space for reinforcement learning: the reinforcement learning environment is the micro-grid system, which feeds rewards back to the agents; the frequency-deviation states of the micro-grid multi-agent system controllers form the controllable part of the state space, and the scheduling time information Δt forms the time part of the state space;
step S22, constructing an environment action space for reinforcement learning: controlling the frequency deviation of each scheduling agent;
Step S23, defining a reward function: the reward is used to guide the agents toward the preset micro-grid optimization target;
Step S24, setting a backup controller of the energy storage system so that actions generated by the schedulable agent and the agent of the energy storage system do not exceed the power range of the system, and specifically comprising the following steps:
According to the Markov decision principle, a Q table is used to store the value function Q(s, a) corresponding to the system state and action; that is, the cumulative return R_t obtained when the system takes action a_t in state s_t at time t is expressed as the expected return, with γ the discount factor:

Q(s, a) = E[R_t | s_t = s, a_t = a] = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + ...]   (2)

During training, the Q-value training module takes the energy-storage tuple (s_t, a_t, r_t, s_{t+1}) as a sample, where s_t is the current state, a_t the current action, r_t the immediate reward after the action is executed, s_{t+1} the next state, and t the time step; the recursive Q-function update rule is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]   (3)

where α is the learning rate and γ is the discount factor;
Step S3, establishing a double-neural network deep reinforcement learning algorithm flow: training the reinforcement learning environment defined in the step S2 for a plurality of times by adopting a neural network to achieve convergence of the rewarding value;
A neural network Q(s, a; ω) is adopted as a function approximator to estimate the Q(s, a) function; according to the input state and action, the neural network outputs the Q value of the action, and the action with the maximum Q value is selected as the next action;
The weight ω of the deep neural network represents the mapping of the system state to the Q value, and a loss function Li (ω) is defined to update the neural network weight ω with the corresponding Q value:
L_i(ω_t) = E_s[(y_t - Q(s, a; ω_t))^2]   (4)
where y_t is the target value:

y_t = r_t + γ max_a' Q(s_{t+1}, a'; ω_t)   (5)

The agent weights are updated by taking the gradient of the loss function and performing stochastic gradient descent.
An estimation network and a target network are constructed. The two networks have the same structure but different parameters, and the value given by the estimation network is generally smaller than that of the target network; the estimation network learns continuously, updating its parameters at every iteration, while the target network copies the parameters of the estimation network once every period T. One set of parameters is used to select actions and the other to evaluate the current state; they are denoted ω_t and ω_t^- respectively, giving the double-network target:

y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a; ω_t); ω_t^-)
the multiple agents in the micro-grid system select random actions with a certain probability to explore the environment and obtain feedback, searching for the action that maximizes the reward in a given state; as the number of training iterations increases, the policy converges to the optimum, until the action that maximizes the Q value is always taken;
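The exploration schedule described above can be sketched as an ε-greedy rule with decaying ε; the Q values, decay rate, and floor below are illustrative assumptions:

```python
import random

# Sketch of epsilon-greedy action selection with annealed exploration: with
# probability epsilon a random action is taken, otherwise the action with the
# largest Q value; epsilon decays toward a small floor as training proceeds.
random.seed(0)

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

epsilon, decay, eps_min = 1.0, 0.99, 0.05
for episode in range(500):
    action = epsilon_greedy([0.2, 0.8, 0.1], epsilon)
    epsilon = max(eps_min, epsilon * decay)                     # anneal toward greedy
```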
Step S4, based on the Q value trained by reinforcement learning, realizing frequency-deviation regulation of the distributed power supplies.
2. The deep learning multi-agent micro grid cooperative control method based on the dual neural network as set forth in claim 1, wherein the step S4 includes:
the control strategies of step S2 and step S3 are trained repeatedly with the deep reinforcement learning algorithm, and the Q value is trained by the deep reinforcement learning algorithm to optimize the stability of the multi-agent system;
According to step S2, each agent randomly selects actions with a certain probability based on its own state to explore the environment, and otherwise selects the action with the maximum reward for its state; as the number of training iterations increases, the exploration probability is reduced so that the action with the maximum Q value is selected, reaching the optimal convergence strategy;
according to the deep reinforcement learning algorithm of step S3, the data (s_t, a_t, r_t, s_{t+1}) are stored by prioritized experience replay and their feature vectors are recorded; in the early stage of training the agent takes random actions to generate enough training data to fill the experience pool; once the memory unit is full, data are selected at random to update the neural network parameters, and new, weakly correlated data are continuously obtained during policy training.
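The replay memory described above can be sketched as a fixed-size buffer of (s_t, a_t, r_t, s_{t+1}) tuples from which minibatches are drawn at random. Plain uniform sampling is shown; prioritized replay would additionally weight samples, e.g. by TD error. The capacity, batch size, and stored values are illustrative assumptions:

```python
import random
from collections import deque

# Sketch of an experience-replay buffer: a bounded deque of transitions; random
# minibatch draws break the temporal correlation of consecutive samples.
CAPACITY, BATCH = 100, 8

buffer = deque(maxlen=CAPACITY)    # oldest transitions are evicted automatically

def store(s, a, r, s_next):
    buffer.append((s, a, r, s_next))

def sample():
    return random.sample(list(buffer), BATCH)  # uniform random minibatch

random.seed(1)
for t in range(250):               # early phase: act randomly to fill the pool
    store(t, random.randrange(2), 0.0, t + 1)
batch = sample()
```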
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210797934.7A CN115333143B (en) | 2022-07-08 | 2022-07-08 | Deep learning multi-agent micro-grid cooperative control method based on double neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115333143A (en) | 2022-11-11
CN115333143B (en) | 2024-05-07
Family
ID=83917405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210797934.7A Active CN115333143B (en) | 2022-07-08 | 2022-07-08 | Deep learning multi-agent micro-grid cooperative control method based on double neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115333143B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115499849B (en) * | 2022-11-16 | 2023-04-07 | 国网湖北省电力有限公司信息通信公司 | Wireless access point and reconfigurable intelligent surface cooperation method |
CN116307440B (en) * | 2022-11-21 | 2023-11-17 | 暨南大学 | Workshop scheduling method based on reinforcement learning and multi-objective weight learning, device and application thereof |
CN115796364A (en) * | 2022-11-30 | 2023-03-14 | 南京邮电大学 | Intelligent interactive decision-making method for discrete manufacturing system |
CN116488154B (en) * | 2023-04-17 | 2024-07-26 | 海南大学 | Energy scheduling method, system, computer equipment and medium based on micro-grid |
CN116594358B (en) * | 2023-04-20 | 2024-01-02 | 暨南大学 | Multi-layer factory workshop scheduling method based on reinforcement learning |
CN116629128B (en) * | 2023-05-30 | 2024-03-29 | 哈尔滨工业大学 | Method for controlling arc additive forming based on deep reinforcement learning |
CN116934050A (en) * | 2023-08-10 | 2023-10-24 | 深圳市思特克电子技术开发有限公司 | Electric power intelligent scheduling system based on reinforcement learning |
CN117172163B (en) * | 2023-08-15 | 2024-04-12 | 重庆西南集成电路设计有限责任公司 | Amplitude and phase two-dimensional optimization method and system of amplitude and phase control circuit, medium and electronic equipment |
CN117350515B (en) * | 2023-11-21 | 2024-04-05 | 安徽大学 | Ocean island group energy flow scheduling method based on multi-agent reinforcement learning |
CN117713202B (en) * | 2023-12-15 | 2024-08-13 | 嘉兴正弦电气有限公司 | Distributed power supply self-adaptive control method and system based on deep reinforcement learning |
CN117474295B (en) * | 2023-12-26 | 2024-04-26 | 长春工业大学 | Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method |
CN117764360A (en) * | 2023-12-29 | 2024-03-26 | 中海油信息科技有限公司 | Paint workshop intelligent scheduling method based on graphic neural network |
CN117578466B (en) * | 2024-01-17 | 2024-04-05 | 国网山西省电力公司电力科学研究院 | Power system transient stability prevention control method based on dominant function decomposition |
CN117807895B (en) * | 2024-02-28 | 2024-06-04 | 中国电建集团昆明勘测设计研究院有限公司 | Magnetorheological damper control method and device based on deep reinforcement learning |
CN117808174B (en) * | 2024-03-01 | 2024-05-28 | 山东大学 | Micro-grid operation optimization method and system based on reinforcement learning under network attack |
CN117973233B (en) * | 2024-03-29 | 2024-06-18 | 合肥工业大学 | Converter control model training and oscillation suppression method based on deep reinforcement learning |
CN118092195B (en) * | 2024-04-26 | 2024-06-25 | 山东工商学院 | Multi-agent cooperative control method for improving IQL based on cooperative training model |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106410808A (en) * | 2016-09-27 | 2017-02-15 | 东南大学 | General distributed control method comprising constant-power control and droop control for microgrid group |
CN109347149A (en) * | 2018-09-20 | 2019-02-15 | 国网河南省电力公司电力科学研究院 | Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning |
CN111200285A (en) * | 2020-02-12 | 2020-05-26 | 燕山大学 | Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory |
CN111371112A (en) * | 2020-04-15 | 2020-07-03 | 苏州科技大学 | Distributed finite time control method for island microgrid heterogeneous battery energy storage system |
CN111431216A (en) * | 2020-03-18 | 2020-07-17 | 国网浙江嘉善县供电有限公司 | High-proportion photovoltaic microgrid reactive power sharing control method adopting Q learning |
CN112117760A (en) * | 2020-08-13 | 2020-12-22 | 国网浙江省电力有限公司台州供电公司 | Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning |
CN114400704A (en) * | 2022-01-24 | 2022-04-26 | 燕山大学 | Island micro-grid multi-mode switching strategy based on double Q learning consideration economic regulation |
CN114421479A (en) * | 2021-11-30 | 2022-04-29 | 国网浙江省电力有限公司台州供电公司 | Voltage control method for AC/DC micro-grid group cooperative mutual supply |
WO2022135066A1 (en) * | 2020-12-25 | 2022-06-30 | 南京理工大学 | Temporal difference-based hybrid flow-shop scheduling method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694365B (en) * | 2020-07-01 | 2021-04-20 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
- 2022-07-08 CN CN202210797934.7A patent/CN115333143B/en active Active
Non-Patent Citations (1)
Title |
---|
Distributed secondary optimization control of multiple microgrids based on reinforcement learning; Shen Jun; Liu Wei; Li Hucheng; Li Na; Wen Zhen; Yin Minghui; Automation of Electric Power Systems; 2020-03-05 (Issue 05); full text *
Also Published As
Publication number | Publication date |
---|---|
CN115333143A (en) | 2022-11-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115333143B (en) | Deep learning multi-agent micro-grid cooperative control method based on double neural networks | |
CN109347149B (en) | Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning | |
CN114362196B (en) | Multi-time-scale active power distribution network voltage control method | |
CN108565874B (en) | Source-load cooperative frequency modulation method based on load frequency control model | |
CN114217524A (en) | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning | |
CN110138019B (en) | Method for optimizing start and stop of unit | |
CN113872213B (en) | Autonomous optimization control method and device for power distribution network voltage | |
Tsang et al. | Autonomous household energy management using deep reinforcement learning | |
CN110445186B (en) | Self-synchronizing microgrid control system and secondary frequency modulation control method | |
CN117039981A (en) | Large-scale power grid optimal scheduling method, device and storage medium for new energy | |
CN117578466B (en) | Power system transient stability prevention control method based on dominant function decomposition | |
CN117117989A (en) | Deep reinforcement learning solving method for unit combination | |
CN115459320B (en) | Intelligent decision-making method and device for aggregation control of multipoint distributed energy storage system | |
CN115133540B (en) | Model-free real-time voltage control method for power distribution network | |
CN114400675B (en) | Active power distribution network voltage control method based on weight mean value deep double-Q network | |
Tang et al. | Voltage Control Strategy of Distribution Networks with Distributed Photovoltaic Based on Multi-agent Deep Reinforcement Learning | |
CN110289643B (en) | Rejection depth differential dynamic planning real-time power generation scheduling and control algorithm | |
CN114421470B (en) | Intelligent real-time operation control method for flexible diamond type power distribution system | |
CN117713202B (en) | Distributed power supply self-adaptive control method and system based on deep reinforcement learning | |
CN118508416A (en) | Urban level micro-grid control method | |
CN117674160A (en) | Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning | |
Song et al. | Research on Cooperative Control Algorithm Based on Distributed Multi-Region Integrated Energy System | |
Ai et al. | Power flow rebalancing in optimal scheduling of smart distribution systems based on Deep Deterministic Policy Gradient | |
Chen et al. | Research on Flexible Resource Dynamic Interactive Regulation Technology for Microgrids with High Permeable New Energy | |
CN117350423A (en) | Distributed energy system cluster collaborative optimization method based on multi-agent reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||