CN114362187B - Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning

Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning

Info

Publication number
CN114362187B
CN114362187B (application CN202111415562.9A)
Authority
CN
China
Prior art keywords
distributed power
power
network
voltage
time slot
Prior art date
Legal status
Active
Application number
CN202111415562.9A
Other languages
Chinese (zh)
Other versions
CN114362187A (en)
Inventor
余亮
毕刚
岳东
窦春霞
张廷军
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202111415562.9A
Publication of CN114362187A
Application granted
Publication of CN114362187B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00: Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/30: Reactive power compensation

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses an active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning, which comprise the steps of: obtaining a high-proportion renewable energy power distribution network cooperative voltage control model; designing the cooperative voltage control model as a Markov game problem related to the control of each distributed power inverter; and solving the Markov game problem by adopting a multi-agent attention near-end strategy optimization algorithm and expert knowledge, finally obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter. Compared with existing methods, the method provided by the invention has a stronger renewable energy accommodation capability on the premise of ensuring the voltage safety of the power distribution network.

Description

Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
Technical Field
The invention relates to the cross field of power distribution network voltage regulation and artificial intelligence, in particular to an active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning.
Background
In a traditional power distribution network, power flows from the head node of a feeder to the loads at all nodes along the feeder in a radial pattern, and the voltage gradually decreases along the direction of the feeder power flow. Grid connection of distributed power sources changes this power flow distribution: a distributed power source supplies power to its local node or nearby nodes, which raises the local node voltage. Therefore, it is highly necessary to cooperatively control a power distribution network containing distributed power sources in real time, so as to keep the node voltages within a safe range while minimizing the reduction of the active power of the distributed power sources.
Traditional active power distribution network cooperative voltage regulation methods mainly include empirical rule based methods and methods based on security-constrained optimal power flow (for example, model predictive control). The former uses preset thresholds as the decision basis and has a small computation burden, but easily causes unnecessary load shedding. The latter requires accurate knowledge of the system model and is computationally expensive. To reduce the dependence on an accurate model, some data-driven methods have been proposed, such as reinforcement learning methods. These methods can learn an end-to-end strategy, that is, a control decision is obtained directly from the feedback information of the power grid. However, conventional reinforcement learning methods cannot effectively cope with large state spaces: they lack stability or even fail to converge. Therefore, existing research has proposed voltage control methods based on deep reinforcement learning, for example multi-agent deep reinforcement learning methods such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Although these methods can control the voltage effectively, their stability and scalability are weak, and they cannot realize efficient cooperation among large-scale distributed power sources, so the active power curtailment cannot be effectively reduced.
Disclosure of Invention
The invention aims to provide an active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning, which combine the stability brought by the multi-agent near-end strategy optimization (proximal policy optimization) algorithm and expert knowledge with the high scalability brought by the attention mechanism.
The invention adopts the following technical scheme for realizing the aim of the invention:
the invention provides an active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning, which comprises the following steps:
acquiring a high-proportion renewable energy power distribution network collaborative voltage control model;
designing the cooperative voltage control model as a Markov game problem related to the control of each distributed power inverter;
solving the Markov game problem by adopting a multi-agent attention near-end strategy optimization algorithm and expert knowledge, and finally obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter;
and deploying the optimal control strategy obtained by training for online cooperative voltage regulation.
Further, the collaborative voltage control model comprises an objective function, decision variables and constraint conditions;
if the number of the nodes of the power distribution network is M, the number of the accessed distributed power supplies is N, and the target function is expressed as follows:
min Σ_{j=1}^{M} ([V_{j,t} - V_max]^+ + [V_min - V_{j,t}]^+) + α Σ_{i=1}^{N} Δp_{i,t} + β Σ_{i=1}^{N} |Δq_{i,t}|   (1)

In formula (1): [·]^+ = max(·, 0), |·| denotes the absolute value, V_min and V_max respectively represent the lowest and highest voltage values acceptable to a node, V_{j,t} represents the voltage of node j in time slot t, M represents the number of distribution network nodes, Δp_{i,t} represents the active power reduction amount of the ith distributed power supply in time slot t, Δq_{i,t} represents the reactive compensation amount of the ith distributed power inverter in time slot t, N represents the number of distributed power supplies connected to the distribution network, α is the importance coefficient of the active reduction cost of the distributed power supplies relative to the penalty cost caused by the voltage deviation, and β is the importance coefficient of the reactive compensation of the distributed power inverters relative to the penalty cost caused by the voltage deviation;
The decision variables and constraints are as follows:

Δq_{i,t}^{min} ≤ Δq_{i,t} ≤ Δq_{i,t}^{max}   (2)

0 ≤ Δp_{i,t} ≤ p_{i,t}^{max}   (3)

(P_{i,t} - Δp_{i,t})^2 + (Q_{i,t} + Δq_{i,t})^2 ≤ S_i^2   (4)

In formula (2): Δq_{i,t}^{min} and Δq_{i,t}^{max} are the minimum and maximum reactive compensation amounts of the ith distributed power inverter. In formula (3): p_{i,t}^{max} is the maximum active power of the ith distributed power supply at time slot t. In formula (4): P_{i,t} is the active power of the ith distributed power supply in time slot t before adjustment, Q_{i,t} is the reactive power of the ith distributed power supply in time slot t before adjustment, and S_i is the apparent power of the ith distributed power supply, which is a fixed value;
after the distributed power supply is subjected to reactive power compensation and active power reduction, the whole power distribution network meets the tidal current equality constraint, and the formula is as follows:
Figure BDA0003375209180000028
Figure BDA0003375209180000031
in formulae (5) and (6):
Figure BDA0003375209180000032
and
Figure BDA0003375209180000033
is that the load demands the access node i at t time slotActive and reactive power, G ij,t And B ij,t Are the real and imaginary parts of the admittance element between node i and node j.
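By way of illustration only, the following sketch evaluates the per-time-slot cost that formula (1) minimizes, given node voltages, curtailments Δp and reactive compensations Δq; the function name, per-unit voltage limits and coefficient values are assumptions made for this example and are not specified by the invention.

```python
# Illustrative sketch of the per-time-slot cost in formula (1); the voltage
# limits and the coefficients alpha/beta below are assumed example values.
import numpy as np

def slot_cost(v_nodes, dp, dq, v_min=0.95, v_max=1.05, alpha=0.1, beta=0.01):
    """Voltage-violation penalty plus weighted curtailment and reactive-compensation costs.

    v_nodes : node voltage magnitudes V_{j,t} (per unit), length M
    dp      : active power curtailments Δp_{i,t}, length N
    dq      : reactive compensations Δq_{i,t}, length N
    """
    v_nodes = np.asarray(v_nodes, dtype=float)
    over = np.maximum(v_nodes - v_max, 0.0)      # [V_{j,t} - V_max]^+
    under = np.maximum(v_min - v_nodes, 0.0)     # [V_min - V_{j,t}]^+
    voltage_cost = np.sum(over + under)
    curtail_cost = np.sum(np.asarray(dp, dtype=float))
    reactive_cost = np.sum(np.abs(np.asarray(dq, dtype=float)))
    return voltage_cost + alpha * curtail_cost + beta * reactive_cost

if __name__ == "__main__":
    print(slot_cost(v_nodes=[1.02, 1.07, 0.93], dp=[0.1, 0.0], dq=[0.05, -0.02]))
```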
Further, the Markov game problem is characterized by three parts of environment state, action and reward function;
The environment state s_t is represented by the following tuple:

s_t = (o_{1,t}, o_{2,t}, …, o_{n,t})   (7)

In formula (7): o_{i,t} = (P_{i,t}, Q_{i,t}, V_{i,t}), where P_{i,t} represents the active power of the distributed power access node i in time slot t, Q_{i,t} represents the reactive power of the distributed power access node i in time slot t, and V_{i,t} represents the voltage of the distributed power access node in time slot t;
The action a_t is represented by the following tuple:

a_t = (Δq_{i,t}, Δp_{i,t})   (8)

In formula (8): a_t is the action of the distributed power inverter in time slot t, Δq_{i,t} is the reactive compensation amount of the ith distributed power inverter in time slot t, and Δp_{i,t} is the active power reduction amount of the ith distributed power supply;
The reward function r_t is expressed as follows:

r_t = c_{1,t} + α·c_{2,t} + β·c_{3,t}   (9)

In formula (9): c_{1,t} is the penalty cost caused by all nodes violating the safe voltage range in time slot t, c_{2,t} is the sum of the active power reductions of all distributed power supplies in time slot t, c_{3,t} is the sum of the reactive compensation amounts of all distributed power inverters in time slot t, α is the importance coefficient of the active reduction cost of the distributed power supplies relative to the penalty cost caused by the voltage deviation, and β is the importance coefficient of the reactive compensation of the distributed power inverters relative to the penalty cost caused by the voltage deviation.
Further, the multi-agent attention near-end strategy optimization algorithm comprises an actor network, a critic network and an attention network; for distributed power source node i, the critic network and the attention network are characterized in that:
V_i(o_{i,t}) = f_i(g_i(o_i), x_i)   (10)

x_i = Σ_{j≠i} w_{ij} v_j   (11)

In formula (10): V_i(o_{i,t}) is the state value function, g_i is the state encoder of distributed power supply i, f_i is a two-layer fully connected neural network, and x_i is the weighted contribution obtained from the state information of the remaining distributed power supplies; in formula (11), w_{ij} is the attention weight assigned by distributed power supply i to distributed power supply j, and v_j is the encoded state of distributed power supply j;
A strategy parameter θ = (θ_1, θ_2, …, θ_n) is initialized for all distributed power inverters. In each iteration, the agents interact with the power distribution network environment, collect states and actions, and calculate the corresponding advantage functions. When learning from the batch data collected by interaction, the states stored in the batch data are first input into the critic networks and the attention network, the outputs of the attention network are then fed into the critic networks to compute the advantage functions, and finally the parameters of the actor networks and the critic networks are optimized multiple times by using the advantage functions of the batch data, with the following objective function:

J(θ) = E[ min( ρ_t(θ) A_t, clip(ρ_t(θ), 1 - ε, 1 + ε) A_t ) ]   (12)

In formula (12): ρ_t(θ) = π_θ(a_t|o_t) / π_θ'(a_t|o_t) is the probability ratio, π_θ' is the strategy used for sampling and the collected samples are used to train θ, A_t is the advantage function, and the clip(·, 1 - ε, 1 + ε) function takes the value 1 + ε when the ratio is greater than 1 + ε and 1 - ε when the ratio is less than 1 - ε, where ε is a hyperparameter.
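As a concrete reference, the following PyTorch sketch implements the clipped surrogate objective of formula (12) in the spirit of proximal policy optimization; the tensor shapes, the value of ε and the example inputs are assumptions for illustration.

```python
# Sketch of the clipped surrogate objective in formula (12).
import torch

def clipped_surrogate_loss(log_prob_new, log_prob_old, advantage, eps=0.2):
    """Negative of J(theta) in formula (12), suitable for gradient descent.

    log_prob_new : log pi_theta(a_t | o_t) under the policy being optimized
    log_prob_old : log pi_theta'(a_t | o_t) under the sampling policy (treated as constant)
    advantage    : advantage estimate A_t
    """
    ratio = torch.exp(log_prob_new - log_prob_old.detach())
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.mean(torch.min(unclipped, clipped))

if __name__ == "__main__":
    lp_new = torch.tensor([-1.0, -0.8], requires_grad=True)
    lp_old = torch.tensor([-1.1, -0.7])
    adv = torch.tensor([0.5, -0.2])
    loss = clipped_surrogate_loss(lp_new, lp_old, adv)
    loss.backward()
    print(loss.item())
```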
Further, the expert knowledge expression is as follows:
V_min ≤ V_{i,t} ≤ V_max   (13)

Δq_{i,t} ≤ 0, Δp_{i,t} ≥ 0, if V_{i,t} > V_max   (14)

Δq_{i,t} ≥ 0, Δp_{i,t} = 0, if V_{i,t} < V_min   (15)

In formula (13): V_min and V_max respectively represent the lowest and highest voltage values acceptable to the node;

Formulas (14) and (15) express that when the current voltage is higher than the tolerable maximum voltage, the distributed power inverter may only reduce its reactive output and curtail active power; when the current voltage is lower than the acceptable minimum voltage, the distributed power inverter may only increase its reactive power and cannot increase its active power, which is consistent with the actual physical scenario.
Further, the method for obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter is as follows:

Step 1: acquiring the current environment state of the power distribution network and inputting it into the actor network to obtain a mean value and a variance;

Step 2: constructing the distribution of the distributed power inverter actions according to the mean value and the variance, and then obtaining the current action of the distributed power inverter through sampling;

Step 3: pruning the current action according to the expert knowledge and outputting the reactive compensation amount and the active reduction amount of the distributed power inverter that actually act on the environment; inputting them into the environment of the digital twin simulator of the power distribution network system to obtain the reward and the state of the next time slot; storing the current environment state information, the current action and the reward in an experience pool; then inputting the power distribution network state of the next time slot into the actor network, and repeating steps 1-3 a certain number of times;
Step 4: inputting the state of the last time slot after the loop of steps 1-3 into the critic network embedded with the attention mechanism to obtain a state value function V', and calculating the discounted rewards through formula (16) to obtain R = [R_1, R_2, …, R_T];

R_t = r_t + γ·r_{t+1} + γ^2·r_{t+2} + … + γ^{T-t}·V'   (16)

In formula (16): γ is the discount factor and T is the last time slot;
Step 5: inputting all stored state combinations into the critic network to obtain all state value functions V, and calculating the distributed power inverter advantage function through formula (17);

A_t = R - V   (17)
Step 6: calculating the loss function of the critic network, and then updating the critic network through back propagation;

Step 7: calculating the loss function of the actor network through formula (12), and then updating the actor network through back propagation;

Step 8: repeating steps 6-7 to perform multiple updates;

Step 9: looping steps 1-8 until the training reward curve tends to be stable; training then ends, and the optimal control strategy of the local active power and reactive power of each distributed power inverter is obtained.
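For the return and advantage computation of steps 4-5, a small array-based sketch is shown below; the array shapes and the discount factor are assumptions for illustration and the recursion bootstraps with the critic value V' of the last state, as in formula (16).

```python
# Sketch of steps 4-5: discounted returns per formula (16) and advantages per formula (17).
import numpy as np

def discounted_returns(rewards, v_last, gamma=0.99):
    """R_t = r_t + gamma*r_{t+1} + ... + gamma^(T-t) * V'."""
    returns = np.zeros(len(rewards))
    running = v_last
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(returns, values):
    """A_t = R - V, formula (17)."""
    return np.asarray(returns) - np.asarray(values)

if __name__ == "__main__":
    r = [0.1, -0.2, 0.05]
    R = discounted_returns(r, v_last=0.3)
    print(R, advantages(R, values=[0.2, 0.1, 0.15]))
```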
Further, the method for deploying the trained optimal control strategy for online cooperative voltage regulation is as follows:
collecting environment state information of distributed power nodes;
sending the acquired environmental state information to a distributed power inverter at a corresponding node;
and the distributed power inverter outputs the reactive compensation amount and the active reduction amount of the distributed power inverter in the current time slot by using the obtained optimal control strategy, and performs online cooperative voltage regulation.
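An illustrative sketch of one online control decision is given below; the callable `actor` and the use of the policy mean as the deterministic online action are assumptions for this example, and the same expert-knowledge pruning used during training is applied.

```python
# Sketch of one online deployment step for a single distributed power inverter.
def online_control_step(actor, observation, v_node, v_min=0.95, v_max=1.05):
    """Map the local observation to (Δq, Δp) for the current time slot."""
    dq, dp = actor(observation)          # deterministic action: use the policy mean online
    # apply the same expert-knowledge pruning used during training
    if v_node > v_max:
        dq, dp = min(dq, 0.0), max(dp, 0.0)
    elif v_node < v_min:
        dq, dp = max(dq, 0.0), 0.0
    return dq, dp

if __name__ == "__main__":
    dummy_actor = lambda obs: (0.02, 0.05)   # placeholder policy for illustration
    print(online_control_step(dummy_actor, observation=(0.4, 0.1, 1.06), v_node=1.06))
```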
The invention provides an active power distribution network cooperative voltage regulation system based on multi-agent deep reinforcement learning, which comprises:
an acquisition module: the method is used for acquiring a high-proportion renewable energy power distribution network coordinated voltage control model;
designing a module: the method comprises the steps of designing a cooperative voltage control model as a Markov game problem related to control of each distributed power inverter;
a solving module: the method is used for solving the Markov game problem by adopting a multi-agent attention near-end strategy optimization algorithm and expert knowledge, and finally obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter;
a deployment module: used for deploying the optimal control strategy obtained by training to perform online cooperative voltage regulation.
The invention has the following beneficial effects:
compared with traditional methods, the method of the invention does not require accurate knowledge of the system model, supports end-to-end control in an uncertain environment, and has millisecond-level response with low computational complexity;
compared with deep reinforcement learning based voltage regulation methods such as DQN and MADDPG, the method adopts the multi-agent attention near-end strategy optimization algorithm and expert knowledge, and therefore has higher stability;
compared with the prior art, the method can keep all node voltages within the safe range, obviously reduce the active power curtailment of the distributed power supplies, and has a stronger renewable energy accommodation capability.
Drawings
Fig. 1 is a schematic flow chart of an active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning according to an embodiment of the present invention;
fig. 2 is a voltage comparison diagram of an active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning and other methods provided by the embodiment of the invention;
fig. 3 is a comparison diagram of the active power reduction of the active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning provided by the embodiment of the invention and that of other methods.
Detailed Description
In order to further explain the technical means and effects of the present invention, the following description will be clearly and completely provided for the technical solution of the present invention with reference to the accompanying drawings and preferred embodiments.
As shown in fig. 1, a design flow chart of the active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning provided by the invention comprises the following steps:
step 1: establishing a high-proportion renewable energy power distribution network cooperative voltage control model by aiming at minimizing the reduction of the active power of the distributed power supply under the premise of controlling the voltages of all the nodes within a set safety range;
Step 2: designing the established cooperative voltage control model as a Markov game problem related to each renewable energy inverter;
Step 3: solving the Markov game problem by adopting the multi-agent attention near-end strategy optimization algorithm and expert knowledge, and finally obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter;
Step 4: in the actual application scenario, deploying the optimal control strategy obtained by training for online cooperative voltage regulation, namely: obtaining the reactive compensation amount and the active reduction amount of each distributed power inverter immediately according to the collected environment state information.
An IEEE 33-bus system with 6 distributed power supplies connected is selected as the simulation object. In the above step 1, the cooperative voltage control model comprises three parts, namely the objective function, the decision variables and the constraint conditions, which are specifically as follows:
if the number of the nodes of the power distribution network is M, the number of the accessed distributed power supplies is N, and the target function is expressed as follows:
min Σ_{j=1}^{M} ([V_{j,t} - V_max]^+ + [V_min - V_{j,t}]^+) + α Σ_{i=1}^{N} Δp_{i,t}   (1)

In formula (1): [·]^+ = max(·, 0), |·| denotes the absolute value, V_min and V_max respectively represent the lowest and highest voltage values acceptable to a node, V_{j,t} represents the voltage of node j in time slot t, M represents the number of distribution network nodes, Δp_{i,t} represents the active power reduction amount of the ith distributed power supply in time slot t, N represents the number of distributed power supplies connected to the distribution network, and α is the importance coefficient of the active reduction cost of the distributed power supplies relative to the penalty cost caused by the voltage deviation.
Decision variables and constraints are as follows:
and the decision variables comprise reactive compensation quantity and active reduction quantity of the distributed power supply inverter. The active power reduction amount of the distributed power supply is smaller than the maximum output of the distributed power supply in the current time slot. No matter the distributed power supply inverter selects to increase the reactive power or reduce the reactive power output, the reactive power regulation range meets the selectable reactive power range of the distributed power supply inverter, and the constraint conditions of active power, reactive power and apparent power are met, and the specific formula is as follows:
Δq_{i,t}^{min} ≤ Δq_{i,t} ≤ Δq_{i,t}^{max}   (2)

0 ≤ Δp_{i,t} ≤ p_{i,t}^{max}   (3)

(P_{i,t} - Δp_{i,t})^2 + (Q_{i,t} + Δq_{i,t})^2 ≤ S_i^2   (4)

In formula (2): Δq_{i,t} is the reactive compensation amount of the ith distributed power inverter in time slot t, and Δq_{i,t}^{min} and Δq_{i,t}^{max} are the minimum and maximum reactive compensation amounts of the ith distributed power inverter. In formula (3): p_{i,t}^{max} is the maximum active power of the ith distributed power supply at time slot t. In formula (4): P_{i,t} is the active power of the ith distributed power supply in time slot t before adjustment, Q_{i,t} is the reactive power of the ith distributed power supply in time slot t before adjustment, and S_i is the apparent power of the ith distributed power supply.
After the reactive power compensation and the active power reduction of the distributed power supplies, the whole power distribution network must satisfy the power flow equality constraints, with the specific formulas as follows:

P_{i,t} - Δp_{i,t} - P_{i,t}^L = V_{i,t} Σ_{j=1}^{M} V_{j,t} (G_{ij,t} cos θ_{ij,t} + B_{ij,t} sin θ_{ij,t})   (5)

Q_{i,t} + Δq_{i,t} - Q_{i,t}^L = V_{i,t} Σ_{j=1}^{M} V_{j,t} (G_{ij,t} sin θ_{ij,t} - B_{ij,t} cos θ_{ij,t})   (6)

In formulas (5) and (6): P_{i,t}^L and Q_{i,t}^L respectively represent the active and reactive power demanded by the load at access node i in time slot t, G_{ij,t} and B_{ij,t} are the real and imaginary parts of the admittance element between node i and node j, and θ_{ij,t} is the voltage phase angle difference between node i and node j.
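For illustration, the sketch below evaluates the power-flow residuals behind formulas (5)-(6): given voltage magnitudes, angles and the nodal admittance matrix Y = G + jB, it returns how far the adjusted injections are from balancing the network; the two-node data in the example are assumed values.

```python
# Sketch of the power-flow residuals of formulas (5)-(6); residuals are zero when satisfied.
import numpy as np

def power_flow_residuals(v, theta, G, B, p_inj, q_inj):
    """p_inj/q_inj are net injections, e.g. (P - Δp - P^L) and (Q + Δq - Q^L) per node."""
    n = len(v)
    dp = np.zeros(n)
    dq = np.zeros(n)
    for i in range(n):
        for j in range(n):
            ang = theta[i] - theta[j]
            dp[i] += v[i] * v[j] * (G[i, j] * np.cos(ang) + B[i, j] * np.sin(ang))
            dq[i] += v[i] * v[j] * (G[i, j] * np.sin(ang) - B[i, j] * np.cos(ang))
    return p_inj - dp, q_inj - dq

if __name__ == "__main__":
    G = np.array([[1.0, -1.0], [-1.0, 1.0]]); B = np.array([[-5.0, 5.0], [5.0, -5.0]])
    res_p, res_q = power_flow_residuals([1.0, 0.99], [0.0, -0.01], G, B,
                                        p_inj=np.zeros(2), q_inj=np.zeros(2))
    print(res_p, res_q)
```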
In the above step 2, the established cooperative voltage control model is designed as a Markov game problem associated with the control of each distributed power inverter. Specifically, in the Markov game problem, each agent selects an action based on its current local state information to maximize its own expected reward. Since multi-agent deep reinforcement learning does not need information about the state transition function, in this embodiment the environment state, the action and the reward function are mainly designed as follows:
(1) Environment state: the environment state s_t of the distributed power access nodes is represented by the following tuple:

s_t = (o_{1,t}, o_{2,t}, …, o_{n,t})   (7)

In formula (7): o_{i,t} = (P_{i,t}, Q_{i,t}, V_{i,t}), where P_{i,t} represents the active power of the distributed power access node i in time slot t, Q_{i,t} represents the reactive power of the distributed power access node i in time slot t, and V_{i,t} represents the voltage of the distributed power access node in time slot t.
(2) Action: the distributed power inverter action a_t is represented by the following tuple:

a_t = (Δq_{i,t}, Δp_{i,t})   (8)

In formula (8): a_t is the action of the distributed power inverter in time slot t, Δq_{i,t} is the reactive compensation amount of the ith distributed power inverter in time slot t, and Δp_{i,t} is the active power reduction amount of the ith distributed power supply.
(3) Reward function: r_t is expressed as follows:

r_t = c_{1,t} + α·c_{2,t}   (9)

In formula (9): c_{1,t} is the penalty cost caused by all nodes violating the safe voltage range in time slot t, c_{2,t} is the sum of the active power reductions of all distributed power supplies in time slot t, and α is the importance coefficient of the active reduction cost of the distributed power supplies relative to the penalty cost caused by the voltage deviation.
The cost c_{1,t} of the voltage deviating from the safety range and the active power reduction cost c_{2,t} of the distributed power supplies in reward expression (9) are specifically expressed as follows:

c_{1,t} = -Σ_{j=1}^{M} ([V_{j,t} - V_max]^+ + [V_min - V_{j,t}]^+)

c_{2,t} = -Σ_{i=1}^{N} Δp_{i,t}

where the negative signs ensure that maximizing the reward minimizes the voltage violation and the active power curtailment.
in step 3, the multi-agent attention near-end strategy optimization algorithm is structured as follows:
the framework includes actor network, commentsA network of the student and an attention network. Each distributed power supply-related agent has the same network structure, namely an actor network, a critic network, and an attention network common to all critic networks. The number of neurons of the actor network input layer corresponds to the number of components of the local observed state information, and the number of neurons of the output layer corresponds to the number of continuous actions. In particular, the input layer of the actor network corresponds to the local observed state s of the agent i,t The output layer of the actor network corresponds to the reactive compensation amount and the active reduction amount of the distributed power inverter.
For distributed power source node i, the critic network and the attention network are characterized in that:
V_i(o_{i,t}) = f_i(g_i(o_i), x_i)   (10)

x_i = Σ_{j≠i} w_{ij} v_j   (11)

In formula (10): V_i(o_{i,t}) is the state value function, g_i is the state encoder of distributed power supply i, and f_i is a two-layer fully connected neural network. In formula (11): x_i is the weighted contribution obtained from the state information of the remaining distributed power supplies, w_{ij} is the attention weight assigned by distributed power supply i to distributed power supply j, and v_j is the encoded state of distributed power supply j.
A strategy parameter θ = (θ_1, θ_2, …, θ_n) is initialized for all distributed power inverter agents, which interact with the power distribution network environment in each iteration, collect states and actions, and calculate the corresponding advantage functions. When learning from the batch data collected by interaction, the states stored in the batch data are first input into the critic networks and the attention network, the outputs of the attention network are then fed into the critic networks to compute the advantage functions, and finally the parameters of the actor networks and the critic networks are optimized multiple times by using the advantage functions of the batch data, with the following objective function:

J(θ) = E[ min( ρ_t(θ) A_t, clip(ρ_t(θ), 1 - ε, 1 + ε) A_t ) ]   (12)

In formula (12): ρ_t(θ) = π_θ(a_t|o_t) / π_θ'(a_t|o_t) is the probability ratio, π_θ' is the strategy used to perform the sampling and the collected samples are used to train θ, A_t is the advantage function, and the clip(·, 1 - ε, 1 + ε) function takes the value 1 + ε when the ratio is greater than 1 + ε and 1 - ε when the ratio is less than 1 - ε, where ε is a hyperparameter.
In step 3, the expert knowledge is specifically expressed as follows:
V_min ≤ V_{i,t} ≤ V_max   (13)

Δq_{i,t} ≤ 0, Δp_{i,t} ≥ 0, if V_{i,t} > V_max   (14)

Δq_{i,t} ≥ 0, Δp_{i,t} = 0, if V_{i,t} < V_min   (15)

In formula (13): V_min and V_max respectively represent the lowest and highest voltage values acceptable to the node.

The significance of formulas (14) and (15) is that when the current voltage is higher than the tolerable maximum voltage, the distributed power inverter may only reduce its reactive output and curtail active power; when the current voltage is lower than the acceptable minimum voltage, the distributed power inverter may only increase its reactive power while avoiding active power curtailment.
In the above step 3, the solving process essentially consists of training the deep neural network model in each distributed power inverter agent, and the specific training steps are as follows:
(1) Each distributed power inverter intelligent agent acquires the current environment state of the power distribution network, inputs the current environment state into the actor network of the intelligent agent, and obtains a mean value and a variance;
(2) Each distributed power inverter agent constructs the distribution of its inverter actions according to the mean value and the variance, and then obtains the current action of the distributed power inverter through sampling;
(3) Each distributed power inverter agent prunes the current action according to the expert knowledge and then outputs the reactive compensation amount and the active reduction amount of the distributed power inverter that actually act on the environment. These actions are then input into the environment of the digital twin simulator of the power distribution network system to obtain the reward and the environment state of the next time slot, and the current environment state information, the current action and the reward are stored in the experience pool. The power distribution network state of the next time slot is then input into the actor network of each distributed power inverter agent, and steps 1-3 are repeated a certain number of times.
(4) The state of the last time slot after the loop of steps 1-3 is input into the critic network embedded with the attention mechanism to obtain a state value function V', and the discounted rewards are calculated through formula (16) to obtain R = [R_1, R_2, …, R_T];

R_t = r_t + γ·r_{t+1} + γ^2·r_{t+2} + … + γ^{T-t}·V'   (16)

In formula (16): γ is the discount factor and T is the last time slot.
(5) All stored state combinations are input into the critic networks to obtain all state value functions V, and the advantage function related to each distributed power inverter agent is calculated through formula (17);

A_t = R - V   (17)
(6) The loss functions of the critic networks of all agents are calculated, and the critic networks are then updated through back propagation;
(7) The loss functions of the actor networks of all agents are calculated through formula (12), and the actor networks are then updated through back propagation;
(8) Steps (6)-(7) are repeated to perform multiple updates;
(9) Steps (1)-(8) are looped until the training reward curve tends to be stable; training then ends, and the optimal control strategy related to each distributed power inverter agent is finally obtained.
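For reference, the following runnable mini-example illustrates one round of the updates in steps (6)-(8) on dummy batch data: a critic update toward the discounted returns and an actor update with the clipped surrogate of formula (12); the network sizes, hyperparameters and random data are assumptions for illustration, and the critic here omits the attention mechanism for brevity.

```python
# Runnable mini-example of steps (6)-(8) on dummy batch data.
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 3, 2, 32
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
log_std = torch.zeros(act_dim, requires_grad=True)
actor_opt = torch.optim.Adam(list(actor.parameters()) + [log_std], lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

obs = torch.randn(batch, obs_dim)          # stored observations
act = torch.randn(batch, act_dim)          # stored actions
returns = torch.randn(batch, 1)            # discounted returns R from formula (16)
old_log_prob = torch.randn(batch)          # log-probabilities under the sampling policy

for _ in range(10):                        # repeated updates, step (8)
    # step (6): critic update toward the discounted returns
    critic_loss = nn.functional.mse_loss(critic(obs), returns)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # step (7): actor update with the clipped surrogate objective of formula (12)
    adv = (returns - critic(obs).detach()).squeeze(-1)           # A_t = R - V, formula (17)
    dist = torch.distributions.Normal(actor(obs), log_std.exp())
    new_log_prob = dist.log_prob(act).sum(-1)
    ratio = torch.exp(new_log_prob - old_log_prob)
    actor_loss = -torch.mean(torch.min(ratio * adv,
                                       torch.clamp(ratio, 0.8, 1.2) * adv))
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```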
Further, the step 4 of deploying the optimal control strategy obtained by training includes the following steps:
(1) Collecting environment state information of a distributed power supply access node;
(2) Sending the acquired environmental state information to a distributed power inverter intelligent agent at a corresponding node;
(3) The distributed power inverter agent outputs the reactive compensation amount and the active reduction amount of the distributed power inverter in the current time slot by using the obtained optimal control strategy.
FIG. 2 is a voltage comparison graph of an embodiment of the method of the present invention with other methods. Comparison scheme one adopts the multi-agent near-end strategy optimization algorithm (MAPPO), comparison scheme two adopts the multi-agent deep deterministic policy gradient algorithm (MADDPG), and the proposed method adopts the multi-agent attention near-end strategy optimization algorithm (AMAPPO) together with expert knowledge. Specifically, the simulation environment is based on the standard IEEE 33-bus model with distributed power supplies connected at 6 nodes; the cooperative voltage regulation model is trained with MADDPG, MAPPO, and AMAPPO with expert knowledge respectively, and the test results show that the proposed method can regulate the voltage into the safety range.
As shown in fig. 2 and 3, in an environment of 6 distributed power nodes, the active power distribution network cooperative voltage regulation control method based on the multi-agent attention mechanism near-end strategy optimization algorithm and expert knowledge has an optimal voltage regulation effect, and meanwhile, the reduction of the active power of the distributed power supply can be effectively reduced, compared with the method in the first comparison scheme, the reduction of the active power is reduced by 11.17%, and compared with the method in the second comparison scheme, the reduction of the active power is reduced by 35.38%.
The invention provides an active power distribution network cooperative voltage regulation system based on multi-agent deep reinforcement learning, which comprises:
an acquisition module: the method is used for acquiring a high-proportion renewable energy power distribution network collaborative voltage control model;
designing a module: the cooperative voltage control model is designed to be a Markov game problem related to the control of each distributed power inverter;
a solution module: the method is used for solving the Markov game problem by adopting a multi-agent attention near-end strategy optimization algorithm and expert knowledge, and finally obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter;
a deployment module: and the method is used for carrying out online cooperative voltage regulation on the optimal control strategy deployment obtained by training.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should be considered as the protection scope of the present invention.

Claims (3)

1. An active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning is characterized by comprising the following steps:
acquiring a high-proportion renewable energy power distribution network collaborative voltage control model;
designing the cooperative voltage control model as a Markov game problem related to the control of each distributed power inverter;
solving the Markov game problem by adopting a multi-agent attention near-end strategy optimization algorithm and expert knowledge, and finally obtaining a local active power and reactive power optimal control strategy of each distributed power inverter;
deploying the optimal control strategy obtained by training for online cooperative voltage regulation;
the cooperative voltage control model comprises an objective function, a decision variable and a constraint condition;
if the number of the nodes of the power distribution network is M, the number of the accessed distributed power supplies is N, and the target function is expressed as follows:
min Σ_{j=1}^{M} ([V_{j,t} - V_max]^+ + [V_min - V_{j,t}]^+) + α Σ_{i=1}^{N} Δp_{i,t} + β Σ_{i=1}^{N} |Δq_{i,t}|   (1)

in formula (1): [·]^+ = max(·, 0), |·| denotes the absolute value, V_min and V_max respectively represent the lowest and highest voltage values acceptable to a node, V_{j,t} represents the voltage of node j in time slot t, M represents the number of distribution network nodes, Δp_{i,t} represents the active power reduction amount of the ith distributed power supply in time slot t, Δq_{i,t} represents the reactive compensation amount of the ith distributed power inverter in time slot t, N represents the number of distributed power supplies connected to the distribution network, α is the importance coefficient of the active reduction cost of the distributed power supplies relative to the penalty cost caused by the voltage deviation, and β is the importance coefficient of the reactive compensation of the distributed power inverters relative to the penalty cost caused by the voltage deviation;
the decision variables and constraints are as follows:

Δq_{i,t}^{min} ≤ Δq_{i,t} ≤ Δq_{i,t}^{max}   (2)

0 ≤ Δp_{i,t} ≤ p_{i,t}^{max}   (3)

(P_{i,t} - Δp_{i,t})^2 + (Q_{i,t} + Δq_{i,t})^2 ≤ S_i^2   (4)

in formula (2): Δq_{i,t}^{min} and Δq_{i,t}^{max} are the minimum and maximum reactive compensation amounts of the ith distributed power inverter; in formula (3): p_{i,t}^{max} is the maximum active power of the ith distributed power supply at time slot t; in formula (4): P_{i,t} is the active power of the ith distributed power supply in time slot t before adjustment, Q_{i,t} is the reactive power of the ith distributed power supply in time slot t before adjustment, and S_i is the apparent power of the ith distributed power supply, which is a fixed value;
after the reactive power compensation and the active power reduction of the distributed power supplies, the whole power distribution network satisfies the power flow equality constraints, with the formulas as follows:

P_{i,t} - Δp_{i,t} - P_{i,t}^L = V_{i,t} Σ_{j=1}^{M} V_{j,t} (G_{ij,t} cos θ_{ij,t} + B_{ij,t} sin θ_{ij,t})   (5)

Q_{i,t} + Δq_{i,t} - Q_{i,t}^L = V_{i,t} Σ_{j=1}^{M} V_{j,t} (G_{ij,t} sin θ_{ij,t} - B_{ij,t} cos θ_{ij,t})   (6)

in formulas (5) and (6): P_{i,t}^L and Q_{i,t}^L are the active and reactive power demanded by the load at access node i in time slot t, G_{ij,t} and B_{ij,t} are the real and imaginary parts of the admittance element between node i and node j, θ_{ij,t} is the voltage phase angle difference between node i and node j, abs represents the absolute value function, and V_{i,t} represents the voltage of node i in time slot t;
the Markov game problem is characterized by three parts of an environment state, an action and a reward function;
the environment state s_t is represented by the following tuple:

s_t = (o_{1,t}, o_{2,t}, …, o_{n,t})   (7)

in formula (7): o_{i,t} = (P_{i,t}, Q_{i,t}, V_{i,t}), where P_{i,t} represents the active power of the distributed power access node i in time slot t, Q_{i,t} represents the reactive power of the distributed power access node i in time slot t, and V_{i,t} represents the voltage of the distributed power access node in time slot t;
the action a_t is represented by the following tuple:

a_t = (Δq_{i,t}, Δp_{i,t})   (8)

in formula (8): a_t is the action of the distributed power inverter in time slot t, Δq_{i,t} is the reactive compensation amount of the ith distributed power inverter in time slot t, and Δp_{i,t} is the active power reduction amount of the ith distributed power supply;
the reward function r_t is expressed as follows:

r_t = c_{1,t} + α·c_{2,t} + β·c_{3,t}   (9)

in formula (9): c_{1,t} is the penalty cost caused by all nodes violating the safe voltage range in time slot t, c_{2,t} is the sum of the active power reductions of all distributed power supplies in time slot t, c_{3,t} is the sum of the reactive compensation amounts of all distributed power inverters in time slot t, α is the importance coefficient of the active reduction cost of the distributed power supplies relative to the penalty cost caused by the voltage deviation, and β is the importance coefficient of the reactive compensation of the distributed power inverters relative to the penalty cost caused by the voltage deviation;
the multi-agent attention near-end strategy optimization algorithm comprises an actor network, a critic network and an attention network;
for distributed power node i, the critic network and the attention network thereof are characterized together as follows:
V_i(o_{i,t}) = f_i(g_i(o_i), x_i)   (10)

x_i = Σ_{j≠i} w_{ij} v_j   (11)

in formula (10): V_i(o_{i,t}) is the state value function, g_i is the state encoder of distributed power supply i, f_i is a two-layer fully connected neural network, and x_i is the weighted contribution obtained from the state information of the remaining distributed power supplies; in formula (11), w_{ij} is the attention weight assigned by distributed power supply i to distributed power supply j, and v_j is the encoded state of distributed power supply j;
a strategy parameter θ = (θ_1, θ_2, …, θ_n) is initialized for all distributed power inverters, which interact with the power distribution network environment in each iteration, collect states and actions, and calculate the corresponding advantage functions; when learning from the batch data collected by interaction, the states stored in the batch data are first input into the critic networks and the attention network, the outputs of the attention network are then fed into the critic networks to compute the advantage functions, and finally the parameters of the actor networks and the critic networks are optimized multiple times by using the advantage functions of the batch data, with the following objective function:

J(θ) = E[ min( ρ_t(θ) A_t, clip(ρ_t(θ), 1 - ε, 1 + ε) A_t ) ]   (12)

in formula (12): ρ_t(θ) = π_θ(a_t|o_t) / π_θ'(a_t|o_t) is the probability ratio, π_θ' is the strategy used for sampling and the collected samples are used to train θ, A_t is the advantage function, and the clip(·, 1 - ε, 1 + ε) function takes the value 1 + ε when the ratio is greater than 1 + ε and 1 - ε when the ratio is less than 1 - ε, where ε is a hyperparameter;
the expert knowledge expression is as follows:
V_min ≤ V_{i,t} ≤ V_max   (13)

Δq_{i,t} ≤ 0, Δp_{i,t} ≥ 0, if V_{i,t} > V_max   (14)

Δq_{i,t} ≥ 0, Δp_{i,t} = 0, if V_{i,t} < V_min   (15)

in formula (13): V_min and V_max respectively represent the lowest and highest voltage values acceptable to the node;

formulas (14) and (15) express that when the current voltage is higher than the tolerable maximum voltage, the distributed power inverter may only reduce its reactive output and curtail active power; when the current voltage is lower than the acceptable minimum voltage, the distributed power inverter may only increase its reactive power and cannot increase its active power in the actual physical scenario;
the method for obtaining the optimal control strategy of the local active power and the local reactive power of each distributed power supply inverter comprises the following steps:
step 1: acquiring the current environment state of the power distribution network, and inputting the current environment state into the actor network to obtain a mean value and a variance;
step 2: constructing the distribution of the action of the distributed power inverter according to the mean value and the variance, and then obtaining the current action of the distributed power inverter through sampling;
step 3: pruning the current action according to the expert knowledge, and then outputting the reactive compensation amount and the active reduction amount of the distributed power inverter that actually act on the environment; inputting them into the environment of the digital twin simulator of the power distribution network system to obtain the reward and the state of the next time slot; storing the current environment state information, the current action and the reward in an experience pool; then inputting the power distribution network state of the next time slot into the actor network, and repeating steps 1-3 a certain number of times;
step 4: inputting the state of the last time slot after the loop of steps 1-3 into the critic network embedded with the attention mechanism to obtain a state value function V', and calculating the discounted rewards through formula (16) to obtain R = [R_1, R_2, …, R_T];

R_t = r_t + γ·r_{t+1} + γ^2·r_{t+2} + … + γ^{T-t}·V'   (16)

in formula (16): γ is the discount factor and T is the last time slot;
step 5: inputting all stored state combinations into the critic network to obtain all state value functions V, and calculating the distributed power inverter advantage function through formula (17);

A_t = R - V   (17)
step 6: calculating the loss function of the critic network, and then updating the critic network through back propagation;
step 7: calculating the loss function of the actor network through formula (12), and then updating the actor network through back propagation;
step 8: repeating steps 6-7 to perform multiple updates;
step 9: looping steps 1-8 until the training reward curve tends to be stable; training then ends, and the optimal control strategy of the local active power and reactive power of each distributed power inverter is obtained.
2. The active power distribution network cooperative voltage regulation method based on multi-agent deep reinforcement learning of claim 1, wherein the method for online cooperative voltage regulation of optimal control strategy deployment obtained by training is as follows:
collecting environment state information of distributed power supply nodes;
sending the acquired environmental state information to a distributed power inverter at a corresponding node;
and the distributed power inverter outputs the reactive compensation amount and the active reduction amount of the distributed power inverter in the current time slot by using the obtained optimal control strategy to perform online coordinated voltage regulation.
3. An active power distribution network cooperative voltage regulation system based on multi-agent deep reinforcement learning, implementing the active power distribution network cooperative voltage regulation method according to claim 1, and comprising:
an acquisition module: the method is used for acquiring a high-proportion renewable energy power distribution network collaborative voltage control model;
designing a module: the method comprises the steps of designing a cooperative voltage control model as a Markov game problem related to control of each distributed power inverter;
a solving module: the method is used for solving the Markov game problem by adopting a multi-agent attention near-end strategy optimization algorithm and expert knowledge, and finally obtaining the optimal control strategy of the local active power and reactive power of each distributed power inverter;
a deployment module: and the method is used for carrying out online cooperative voltage regulation on the optimal control strategy deployment obtained by training.
CN202111415562.9A 2021-11-25 2021-11-25 Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning Active CN114362187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111415562.9A CN114362187B (en) 2021-11-25 2021-11-25 Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111415562.9A CN114362187B (en) 2021-11-25 2021-11-25 Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114362187A CN114362187A (en) 2022-04-15
CN114362187B true CN114362187B (en) 2022-12-09

Family

ID=81095633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111415562.9A Active CN114362187B (en) 2021-11-25 2021-11-25 Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114362187B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114725936B (en) * 2022-04-21 2023-04-18 电子科技大学 Power distribution network optimization method based on multi-agent deep reinforcement learning
CN115544899B (en) * 2022-11-23 2023-04-07 南京邮电大学 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
CN116581766B (en) * 2023-07-11 2023-11-28 南京理工大学 Virtual power plant strengthening online voltage control method considering sagging characteristic
CN118017523A (en) * 2024-04-09 2024-05-10 杭州鸿晟电力设计咨询有限公司 Voltage control method, device, equipment and medium for electric power system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103199565A (en) * 2013-03-29 2013-07-10 华南理工大学 Multi-zone automatic generation control coordination method based on differential game theory
CN111144793A (en) * 2020-01-03 2020-05-12 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN111564849A (en) * 2020-05-15 2020-08-21 清华大学 Two-stage deep reinforcement learning-based power grid reactive voltage control method
CN112186743A (en) * 2020-09-16 2021-01-05 北京交通大学 Dynamic power system economic dispatching method based on deep reinforcement learning
CN112290536A (en) * 2020-09-23 2021-01-29 电子科技大学 Online scheduling method of electricity-heat comprehensive energy system based on near-end strategy optimization
CN113285485A (en) * 2021-07-23 2021-08-20 南京邮电大学 Power distribution network source network charge storage multi-terminal cooperative voltage regulation method under long, short and multi-time scales

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820864B (en) * 2015-03-31 2018-05-08 浙江工业大学 Intelligent distribution network panorama fault recovery game method containing distributed generation resource
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103199565A (en) * 2013-03-29 2013-07-10 华南理工大学 Multi-zone automatic generation control coordination method based on differential game theory
CN111144793A (en) * 2020-01-03 2020-05-12 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN111564849A (en) * 2020-05-15 2020-08-21 清华大学 Two-stage deep reinforcement learning-based power grid reactive voltage control method
CN112186743A (en) * 2020-09-16 2021-01-05 北京交通大学 Dynamic power system economic dispatching method based on deep reinforcement learning
CN112290536A (en) * 2020-09-23 2021-01-29 电子科技大学 Online scheduling method of electricity-heat comprehensive energy system based on near-end strategy optimization
CN113285485A (en) * 2021-07-23 2021-08-20 南京邮电大学 Power distribution network source network charge storage multi-terminal cooperative voltage regulation method under long, short and multi-time scales

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liang Yu et al., "A Review of Deep Reinforcement Learning for Smart Building Energy Management," IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12046-12063, 10 May 2021 *
Zhou Cheng et al., "Security policy reinforcement of electric power information optical networks based on a Markov attack-defense model," Science Technology and Engineering, vol. 17, no. 11, pp. 79-83, 30 Apr. 2017 *

Also Published As

Publication number Publication date
CN114362187A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN114362187B (en) Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN113363997B (en) Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning
CN107437813B (en) Power distribution network reactive power optimization method based on cuckoo-particle swarm
CN113363998B (en) Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN114362196B (en) Multi-time-scale active power distribution network voltage control method
CN114217524B (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
Liu et al. A distributed iterative learning framework for DC microgrids: Current sharing and voltage regulation
CN110163540B (en) Power system transient stability prevention control method and system
CN114784823A (en) Micro-grid frequency control method and system based on depth certainty strategy gradient
CN115293052A (en) Power system active power flow online optimization control method, storage medium and device
CN116760047A (en) Power distribution network voltage reactive power control method and system based on safety reinforcement learning algorithm
CN115313403A (en) Real-time voltage regulation and control method based on deep reinforcement learning algorithm
Zhang et al. Deep reinforcement learning for load shedding against short-term voltage instability in large power systems
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN115345380A (en) New energy consumption electric power scheduling method based on artificial intelligence
CN116826762B (en) Intelligent power distribution network voltage safety control method, device, equipment and medium thereof
CN117833263A (en) New energy power grid voltage control method and system based on DDPG
CN117154845A (en) Power grid operation adjustment method based on generation type decision model
CN117200213A (en) Power distribution system voltage control method based on self-organizing map neural network deep reinforcement learning
CN115133540B (en) Model-free real-time voltage control method for power distribution network
CN114063438B (en) Data-driven multi-agent system PID control protocol self-learning method
CN115276067A (en) Distributed energy storage voltage adjusting method adaptive to topological dynamic change of power distribution network
CN114400675B (en) Active power distribution network voltage control method based on weight mean value deep double-Q network
CN114841595A (en) Deep-enhancement-algorithm-based hydropower station plant real-time optimization scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant