CN117833263A - New energy power grid voltage control method and system based on DDPG - Google Patents

New energy power grid voltage control method and system based on DDPG

Info

Publication number
CN117833263A
Authority
CN
China
Prior art keywords: network, actor, reactive, critic, node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311537485.3A
Other languages
Chinese (zh)
Inventor
任冲
程林
柯贤波
卫琳
张文朝
杜金欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Branch Of State Grid Corp Of China
Beijing Kedong Electric Power Control System Co Ltd
Original Assignee
Northwest Branch Of State Grid Corp Of China
Beijing Kedong Electric Power Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest Branch Of State Grid Corp Of China, Beijing Kedong Electric Power Control System Co Ltd filed Critical Northwest Branch Of State Grid Corp Of China
Priority to CN202311537485.3A
Publication of CN117833263A
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E 40/00: Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E 40/30: Reactive power compensation

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to the technical field of power system stability control, and in particular to a DDPG-based method and system for controlling the voltage of a new energy power grid. The technical scheme can effectively coordinate the actions of the power system's reactive power control equipment, reduce active power loss in the grid, improve grid voltage quality and voltage stability, ensure safe and stable grid operation, and provide stable voltage support for new energy consumption and outward power transmission in systems containing energy storage and pumped storage.

Description

New energy power grid voltage control method and system based on DDPG
Technical Field
The invention relates to the technical field of stable control of power systems, in particular to a new energy power grid voltage control method and system based on DDPG.
Background
Under the dual-carbon background, installed new energy capacity in China continues to grow rapidly; in the Northwest power grid, new energy now accounts for nearly 50% of total installed capacity, making it the largest class of installed power source. In addition, with the rapid development of new technologies and equipment such as energy storage and pumped storage, power supply and load characteristics have changed greatly, and the uncertainty of grid operation has increased markedly. Several large AC-DC hybrid sending-end grids have been built in succession in the Northwest power grid; because the dynamic reactive power reserve of these sending-end grids is insufficient, voltage control conflicts are prominent and reactive power optimization is difficult.
With the rapid development of big data processing and information transmission technologies, voltage control technology has advanced: existing algorithms are used to solve for and optimally combine the reactive compensation point variables. Problems remain, however: if the reactive compensation quantity cannot be effectively controlled, overall active power loss rises, optimization cost increases, and some node voltages cannot be kept within a reasonable range. These problems reduce the efficiency and lifetime of electrical equipment and may cause voltage collapse.
At present, traditional solution methods for voltage control mainly include linear programming, nonlinear programming and heuristic algorithms. Linear programming requires an explicit mathematical model of the system, yet accurate line parameters and topology are often difficult to obtain in a large-scale power system. Heuristic algorithms cannot guarantee optimality of the solution, and their computational burden grows exponentially with the number of variables and constraints. Nonlinear programming algorithms are widely used in power system control because they are flexible and efficient; among them, voltage control based on deep reinforcement learning converts the continuous reactive power optimization of the distribution network into a multi-step Markov decision process and solves it with an actor-critic deep reinforcement learning algorithm. However, such methods handle only discrete action control, consume large amounts of resources and time in actual system operation, and are difficult to keep in continuous service.
Reactive voltage control of the system is a multi-constraint nonlinear programming problem: the reactive power injected into the grid is controlled by jointly adjusting the AVR, on-load transformer taps and reactive compensation devices, so that the system reaches minimum active power loss and the grid remains in a safe and efficient operating state. A new high-proportion new energy grid voltage control strategy based on the deep deterministic policy gradient algorithm is therefore needed to solve the above problems.
Disclosure of Invention
The present invention has been made in view of the above-mentioned problems occurring in the prior art.
The invention provides a DDPG-based new energy power grid voltage control method and system, which establish a high-proportion new energy grid voltage control strategy based on the deep deterministic policy gradient algorithm. Taking minimum grid loss of the power system as the objective function, the reactive power optimization problem of a power system containing energy storage and pumped storage is converted into a Markov decision process, and an "Actor-Critic" structure is used to train a deep neural network model, fitting the value-evaluation and policy-improvement processes. This enables more effective learning over continuous actions and finally converges to a globally optimal solution for voltage control, effectively controlling the reactive compensation quantity, reducing active power loss, and stabilizing the voltage.
In order to solve the above technical problems, the invention provides a DDPG-based new energy power grid voltage control method, comprising the following steps: step 1, constructing a new energy power grid system containing energy storage and pumped storage as a number of agents, wherein the agents observe the reactive state of the system nodes at the current moment and input the state S_t to the policy network Actor;
step 2, the Actor current network selecting an action A according to the input state S_t, changing the reactive power injected into the new energy power grid system containing energy storage and pumped storage so as to adjust the voltage;
step 3, after the Actor current network executes the action, the agent observing the reward r fed back by the environment and storing the quadruple of current-moment state S_t, action A output by the Actor current network, next-moment node reactive state S_{t+1} and reward r in the experience pool, then observing the next-moment reactive state S_{t+1} and carrying out the next round of data sampling into the experience pool until the set sampling threshold is reached; at the same time, the next-moment node reactive state S_{t+1} is input to the Actor target network to select the corresponding action A′;
step 4, after sampling is completed, entering the reactive power optimization training module and collecting N samples from the experience pool; meanwhile, the agent feeds the action A output by the Actor current network together with the node reactive state S into the value network Critic, which calculates the value of the Actor network policy; after the value is evaluated, gradient information for updating the policy weights is provided to the policy network to guide the action direction of the Actor network, and the loss is calculated;
and step 5, soft-updating the Actor current and target network parameters and the Critic current and target network parameters based on the minimized loss function, and updating the Actor network model.
As a preferable scheme of the DDPG-based new energy power grid voltage control method of the present invention: in step 1, constructing a number of agents includes that the Actor and Critic networks of each trained agent have the same structure, each containing two hidden layers with 400 and 300 neurons respectively, and the action output is within the range [-1, 1].
As a preferable scheme of the DDPG-based new energy power grid voltage control method of the present invention: in step 2, reactive voltage control of the system is a multi-constraint nonlinear programming problem; the reactive power injected into the grid is controlled by jointly adjusting the AVR, on-load transformer taps and reactive compensation devices so that the system reaches minimum active power loss, and the Actor network automatically selects actions according to the input state.
As a preferable scheme of the DDPG-based new energy grid voltage control method of the present invention, in step 3,
Actor current network output: a_t = μ(s_t ∣ θ^μ);
Actor target network output: a′_{t+1} = μ′(s_{t+1} ∣ θ^{μ′});
where s_t is the node reactive state input at the current moment, s_{t+1} is the node reactive state input at the next moment, θ^μ is the Actor model parameter, a_t is the output decision, and μ is the optimal behavior policy function.
As a preferable scheme of the DDPG-based new energy power grid voltage control method of the present invention: step 4 includes that the Critic current value network calculates the value of the Actor current network action A, and the Critic target value network calculates the value of action A′, recorded via the label function y_t; after the action values are calculated, the minimized loss function L is further calculated;
Critic current network output: q_t = Q(s_t, a_t ∣ θ^Q)
Critic target network output: q′_{t+1} = Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′})
The label function y_t and the minimized loss function L are expressed as
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′})
L = (1/N) Σ_t [y_t − Q(s_t, a_t ∣ θ^Q)]²
where γ is the attenuation factor, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter;
wherein the loss function is calculated with reference to the minimum network loss of the power system under the following constraint conditions:
The constraint conditions include the equality constraints of the power system power-flow equations,
P_i = U_i Σ_{j=1..n} U_j (G_ij cos ω_ij + B_ij sin ω_ij)
Q_i = U_i Σ_{j=1..n} U_j (G_ij sin ω_ij − B_ij cos ω_ij)
where P_i and Q_i are the active power (MW) and reactive power (MVar) injected at node i; U_i and U_j are the voltage magnitudes (kV) of nodes i and j; ω_ij is the voltage phase-angle difference between nodes i and j; G_ij and B_ij are the real and imaginary parts of the node admittance matrix; and n is the total number of system nodes;
and the variable constraints, which constitute the inequality constraints,
Q_{Gimin} ≤ Q_{Gi} ≤ Q_{Gimax}
Q_{Cimin} ≤ Q_{Ci} ≤ Q_{Cimax}
T_{imin} ≤ T_i ≤ T_{imax}
U_{imin} ≤ U_i ≤ U_{imax}
where Q_{Gimax} and Q_{Gimin} are the upper and lower limits of reactive power output of the pumped-storage and energy-storage units; Q_{Cimax} and Q_{Cimin} are the upper and lower capacity limits of the reactive compensation equipment; T_{imax} and T_{imin} define the adjustment range of the transformer taps; and U_{imax} and U_{imin} represent the safe operating limits of the node voltage.
As a preferable scheme of the DDPG-based new energy power grid voltage control method of the present invention: in step 5, the Actor network updates its parameters by the gradient descent method,
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a ∣ θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i}
where θ^μ is the Actor model parameter, θ^Q is the Critic model parameter, N is the number of samples for batch gradient descent, and ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i} is the gradient of the optimal behavior function μ with respect to θ^μ taken at s_i;
the Actor and Critic model parameters θ^μ, θ^Q, θ^{μ′}, θ^{Q′} are soft-updated, i.e. only part of each parameter is updated at every optimization training step, ensuring that the parameters update slowly,
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}
where τ is the soft update coefficient, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter.
Another object of the present invention is to provide a DDPG-based new energy power grid voltage control system which, by optimizing reactive power control, can reduce grid loss and improve overall efficiency. Accurate voltage control helps maintain grid stability, especially when a large amount of new energy participates in power generation. Using a deep learning algorithm, the system automatically learns and adapts to continuously changing grid conditions, reducing manual intervention. By minimizing grid loss, operating costs are reduced and economic benefits improved.
As a preferable scheme of the novel energy power grid voltage control system based on DDPG, the invention comprises the following steps: the system comprises a sampling module, a reactive power optimization training module and a parameter updating module;
the sampling module is used for the agent to store the current-moment state S, the action A output by the Actor network, the node reactive state S_{t+1} at the next moment, and the reward R obtained by the Actor network from the environment into the experience pool until a preset threshold is reached, after which the data are transferred to the training module;
the reactive power optimization training module is used for the Critic value network and its target network to calculate the values of the Actor network action A and the target network action A′, further calculate the loss, and find the optimal policy model;
the parameter updating module is used for updating part of the parameters at each optimization training step, so that the parameters update slowly and learning stability is improved.
A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of any one of the above DDPG-based new energy grid voltage control methods.
A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of any one of the above DDPG-based new energy grid voltage control methods.
The invention has the beneficial effects that: for voltage regulation of the energy storage and pumped-storage system, the DDPG algorithm adopts the update mode of deep reinforcement learning DQN and uses a deterministic policy gradient algorithm to make voltage regulation of the energy-storage and pumped-storage power system converge to the optimal policy, so that the optimal action can be searched in a continuous action space and the algorithm converges on the globally optimal action with less computation, achieving continuous control of otherwise discrete equipment. The validity of the method is verified on the IEEE 30-node standard example: under the set constraint conditions, network loss can be optimized by adjusting reactive power control equipment, for example by changing the generator terminal voltage, changing the transformer ratio, or switching capacitor capacity. The method can meet reactive voltage operation control requirements in actual energy-storage and pumped-storage systems, and improves the voltage quality and voltage stability of the new power system containing energy storage and pumped storage.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is a schematic flow chart of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the Markov decision process (MDP) of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 3 is a diagram of the deep reinforcement learning Actor-Critic (AC) architecture of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 4 is a diagram of the deep deterministic policy gradient (DDPG) architecture of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 5 is an electrical wiring diagram of the IEEE 30-node standard example of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 6 is an iteration graph of the objective function (reward value recorded every 20 steps) of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 7 is a DDPG reactive power optimization training curve (reward value recorded every 20 steps) of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of generator terminal voltage action values of a new energy grid voltage control method based on the deep gradient algorithm according to an embodiment of the present invention.
Fig. 9 is a schematic flow chart of a new energy grid voltage control system based on the deep gradient algorithm according to an embodiment of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, a method for controlling voltage of a new energy power grid based on DDPG is provided in a first embodiment of the present invention, including:
s1, constructing a new energy power grid system containing energy storage and pumping storage into a plurality of intelligent agents, observing the reactive state of a system node at the current moment by the intelligent agents, and randomly taking the state S t Inputs to the policy network Actor;
it should be noted that the safe operating range of the voltage of each node of the power system is set between 0.95p.u. and 1.10 p.u.. In the network structure, the Actor and Critic network structures of each training agent are the same, each training agent comprises two layers of hidden networks, the number of neurons of each layer is 400 and 300, and the operation output of each layer is within the range of [ -1,1 ].
S2, the Actor current network selects an action A according to the input state S_t, changing the reactive power injected into the new energy power grid system containing energy storage and pumped storage so as to adjust the voltage;
it should be noted that, in the power system, the network loss can be optimized by adjusting reactive power control equipment, for example, the voltage is controlled by means of changing the generator terminal voltage, changing the transformer transformation ratio and switching capacitor capacity.
S3, after the Actor current network executes the action, the agent observes the reward r fed back by the environment and stores the quadruple of current-moment state S_t, action A output by the Actor current network, next-moment node reactive state S_{t+1} and reward r in the experience pool; it then observes the next-moment reactive state S_{t+1} and carries out the next round of data sampling into the experience pool until the set sampling threshold is reached; at the same time, the next-moment node reactive state S_{t+1} is input to the Actor target network to select the corresponding action A′;
it should be noted that the Actor target network selects the corresponding action a' for the Critic target network to calculate the Actor value in the subsequent reactive training process. The Actor network expression is:
actor network output, a t =μ(s t ∣θ μ )
Actor target network output, a t+1 '=μ′(s t+1 ∣θ μ′ )
Wherein s is t Inputting the reactive state of the node at the current moment, s t+1 For the reactive state input of the node at the next moment, theta u A is an Actor model parameter t For output decisions, μ is the optimal behavior policy function.
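The experience pool of step S3 stores quadruples (S_t, A, r, S_{t+1}) and later yields random batches for training; a minimal sketch, with an assumed capacity value, is:

```python
import random
from collections import deque

class ExperiencePool:
    """Stores (s, a, r, s_next) quadruples and yields random training batches."""
    def __init__(self, capacity: int = 100_000):  # capacity is an illustrative value
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, n: int):
        s, a, r, s_next = zip(*random.sample(self.buffer, n))
        return s, a, r, s_next

    def __len__(self):  # used to test whether the sampling threshold has been reached
        return len(self.buffer)
```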
S4, in the reactive power optimization training process, N samples are collected from the experience pool; meanwhile, the agent feeds the action A output by the Actor current network together with the node reactive state S into the value network Critic, which calculates the value of the Actor network policy; after the value is evaluated, gradient information for updating the policy weights is provided to the policy network to guide the action direction of the Actor network, and the loss is calculated. As long as enough tuples (s, a, r, s_{t+1}) are available, a good evaluation model can be obtained that gives a fair evaluation of any specific (s, a) pair, i.e. a reactive power optimization voltage control strategy meeting minimum network loss can be found automatically. This comprises the following:
Critic current network output: q_t = Q(s_t, a_t ∣ θ^Q)
Critic target network output: q′_{t+1} = Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′})
The label function y_t and the minimized loss function L are expressed as
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′})
L = (1/N) Σ_t [y_t − Q(s_t, a_t ∣ θ^Q)]²
where γ is the attenuation factor, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter.
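Assuming the network sketch given earlier and a sampled batch of tensors, the label function y_t and the loss L translate to roughly the following; the discount value 0.99 is only a placeholder for γ, and r is assumed to be a column tensor matching the critic output shape:

```python
import torch
import torch.nn.functional as F

def critic_loss(critic, critic_target, actor_target, s, a, r, s_next, gamma=0.99):
    """L = mean((y_t - Q(s_t, a_t | theta_Q))^2) over the sampled batch."""
    with torch.no_grad():
        a_next = actor_target(s_next)                  # A' = mu'(s_{t+1} | theta_mu')
        y = r + gamma * critic_target(s_next, a_next)  # label y_t
    return F.mse_loss(critic(s, a), y)                 # minimized loss L
```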
The loss function is calculated with reference to the minimum network loss of the power system under the following constraint conditions:
The constraint conditions include the equality constraints of the power system power-flow equations,
P_i = U_i Σ_{j=1..n} U_j (G_ij cos ω_ij + B_ij sin ω_ij)
Q_i = U_i Σ_{j=1..n} U_j (G_ij sin ω_ij − B_ij cos ω_ij)
where P_i and Q_i are the active power (MW) and reactive power (MVar) injected at node i; U_i and U_j are the voltage magnitudes (kV) of nodes i and j; ω_ij is the voltage phase-angle difference between nodes i and j; G_ij and B_ij are the real and imaginary parts of the node admittance matrix; and n is the total number of system nodes.
The variable constraints constitute the inequality constraints,
Q_{Gimin} ≤ Q_{Gi} ≤ Q_{Gimax}
Q_{Cimin} ≤ Q_{Ci} ≤ Q_{Cimax}
T_{imin} ≤ T_i ≤ T_{imax}
U_{imin} ≤ U_i ≤ U_{imax}
where Q_{Gimax} and Q_{Gimin} are the upper and lower limits of reactive power output of the pumped-storage and energy-storage units; Q_{Cimax} and Q_{Cimin} are the upper and lower capacity limits of the reactive compensation equipment; T_{imax} and T_{imin} define the adjustment range of the transformer taps; and U_{imax} and U_{imin} represent the safe operating limits of the node voltage.
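For illustration, the inequality constraints can be written as a feasibility check; all limit values below are hypothetical placeholders except the 0.95-1.10 p.u. voltage band stated earlier:

```python
def within_limits(u, q_g, q_c, taps,
                  u_lim=(0.95, 1.10),    # node-voltage band stated in the text
                  q_g_lim=(-0.5, 0.5),   # placeholder reactive-output limits (p.u.)
                  q_c_lim=(0.0, 0.3),    # placeholder compensation-capacity limits (p.u.)
                  tap_lim=(0.9, 1.1)):   # placeholder tap-adjustment range
    """True if all node voltages, reactive outputs, compensation capacities
    and transformer taps respect their upper/lower limits."""
    def ok(values, lim):
        return all(lim[0] <= v <= lim[1] for v in values)
    return ok(u, u_lim) and ok(q_g, q_g_lim) and ok(q_c, q_c_lim) and ok(taps, tap_lim)
```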
S5, the Actor current and target network parameters and the Critic current and target network parameters are soft-updated based on the minimized loss function, and the Actor network model is updated.
It should be noted that the Actor network updates its parameters by the gradient descent method,
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a ∣ θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i}
where θ^μ is the Actor model parameter, θ^Q is the Critic model parameter, N is the number of samples for batch gradient descent, and ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i} is the gradient of the optimal behavior function μ with respect to θ^μ taken at s_i.
The Actor and Critic model parameters θ^μ, θ^Q, θ^{μ′}, θ^{Q′} are soft-updated, i.e. only part of each parameter is updated at every optimization training step, ensuring that the parameters update slowly,
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}
where τ is the soft update coefficient, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter.
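The S5 updates can be sketched as follows, reusing the networks above; the τ value is illustrative:

```python
import torch

def actor_loss(critic, actor, s):
    """Gradient descent on -Q(s, mu(s)) ascends the action value."""
    return -critic(s, actor(s)).mean()

@torch.no_grad()
def soft_update(target_net, net, tau: float = 0.005):  # tau value is illustrative
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter pair."""
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.mul_(1.0 - tau).add_(tau * p)
```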
According to the high-proportion new energy grid voltage control strategy based on the deep deterministic policy gradient algorithm, the reactive power adjustment action corresponding to minimum power system loss is found, so that the reactive power injected into the grid gradually approaches the optimal value required by the grid, the whole grid carries a reactive voltage flow close to optimal, and the voltage stability of the system is improved.
Example 2
Figs. 2-8 illustrate a second embodiment of the present invention, which provides a DDPG-based new energy power grid voltage control method; to verify the beneficial effects of the invention, scientific demonstration is carried out through experiments.
Aiming at the defects of traditional voltage control methods, a deep reinforcement learning model for power system voltage control is constructed as a Markov decision process based on the DDPG algorithm.
Reinforcement learning (RL) is trial-and-error learning through interaction with the environment, with the goal of allowing the agent to obtain the maximum cumulative reward from that interaction. The Markov decision process (Markov Decision Process, MDP) is commonly used to model reinforcement learning problems and mainly contains the following four elements:
(a) State set: s is a set of environmental states, where the state of the agent at time t is S t ∈S;
(b) Action set: a is a set of actions of the agent, wherein the action of the agent at time t is a t ∈A;
(c) State transition process: state transition processT(s t ,a t ,s t+1 )~p r (s t+1 |s t ,a t ) Representing that the agent is in state s t Lower execution action a t After transition to the next time state s t+1 Probability of (2);
(d) Bonus function: reward function r t Refers to that the intelligent agent is in state s t Lower execution action a t The instant rewards obtained later. In each round, the agent first observes the current state of the environment s t And making decision a based on the state t When the action is executed, the environment feeds back a rewarding value r to the intelligent body t The environment then transitions to the next state s t+1 This is a Markov decision process, and a schematic diagram is shown in FIG. 2.
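These four elements amount to the interaction loop sketched below, where env and agent are hypothetical stand-ins for the grid simulation and the DDPG agent:

```python
def run_episode(env, agent, pool, max_steps: int = 200):  # max_steps is illustrative
    """One MDP rollout: observe s_t, act a_t, receive r_t, move to s_{t+1}."""
    s = env.reset()
    for _ in range(max_steps):
        a = agent.act(s)             # decision a_t made from state s_t
        s_next, r = env.step(a)      # environment feeds back reward r_t
        pool.store(s, a, r, s_next)  # quadruple into the experience pool
        s = s_next
```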
DDPG is an algorithm that outputs deterministic actions. Unlike general DRL methods, DDPG combines the AC mechanism with neural networks to construct a deterministic policy (Deterministic Policy); to address exploration of the deterministic policy and the stability of policy updates, exploration noise and soft target updates are introduced. The DDPG algorithm adopts the update mode of deep reinforcement learning DQN and uses a deterministic policy gradient algorithm to converge to the optimal policy, so that the optimal action can be searched in a continuous action space and the algorithm converges on the globally optimal action with less computation. The DDPG algorithm adopts the AC architecture shown in Fig. 3; the DDPG architecture is shown in Fig. 4.
The Actor-Critic architecture of the DDPG algorithm is implemented by four fully connected neural networks, as shown in Table 1, where θ^μ is the Actor network parameter and θ^Q is the Critic network parameter.
Table 1 DDPG algorithm network structure
The parameters of the Actor network are updated through the gradient descent strategy:
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a ∣ θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i}
The parameters of the Critic network are updated by minimizing the loss function L built on the label function y_t, i.e.
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′})
L = (1/N) Σ_t [y_t − Q(s_t, a_t ∣ θ^Q)]²
The target network parameters θ^{μ′}, θ^{Q′} are updated by:
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}
where τ is the soft update coefficient, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter.
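Putting these update rules together, one training step over a sampled batch could be sketched as below; the optimizers, batch size, γ and τ values are illustrative assumptions rather than the settings of Table 2:

```python
import torch
import torch.nn.functional as F

def train_step(actor, actor_t, critic, critic_t, actor_opt, critic_opt,
               pool, batch_size=64, gamma=0.99, tau=0.005):
    s, a, r, s_next = (torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in col])
                       for col in pool.sample(batch_size))
    r = r.reshape(-1, 1)  # match the critic's (N, 1) output

    # Critic update: minimize L = mean((y_t - Q(s_t, a_t | theta_Q))^2)
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_opt.zero_grad()
    F.mse_loss(critic(s, a), y).backward()
    critic_opt.step()

    # Actor update: gradient descent on -Q(s, mu(s | theta_mu))
    actor_opt.zero_grad()
    (-critic(s, actor(s)).mean()).backward()
    actor_opt.step()

    # Soft updates: theta' <- tau * theta + (1 - tau) * theta'
    with torch.no_grad():
        for t_net, net in ((actor_t, actor), (critic_t, critic)):
            for p_t, p in zip(t_net.parameters(), net.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)
```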
The IEEE 30-node system is used as an example to simulate and analyze the voltage control method based on deep reinforcement learning. The simulation test is implemented in Python with training on the PyTorch 1.12.1 neural network framework; the hardware platform is an AMD Ryzen 7 6800H CPU at 3.2 GHz.
The IEEE 30-node system has five generators and four transformers, and the system has 30 branches; node 1 is set as the balance node, node 27 is a PQ node connected to pumped storage, node 28 is a PQ node connected to energy storage, and the remaining nodes serve as load nodes. The node system topology is shown in Fig. 5.
The safe operating range of each node voltage of the IEEE 30-node system is set between 0.95 pu and 1.10 pu. In the network structure, the Actor and Critic networks of each training agent are identical, each containing two hidden layers with 400 and 300 neurons. The activation function of the last layer of the Actor network is the hyperbolic tangent tanh, so that the output is within the range [-1, 1]. The DDPG network structure and hyperparameters are detailed in Table 2.
Table 2 Actor and Critic network architecture and hyperparameter settings
Neural network training with these parameter settings converges in about 20 rounds; the converged voltage magnitudes are concentrated between 1.0 pu and 1.1 pu, satisfying the voltage constraint. With the established objective function and 60 rounds, the objective-function iteration curve shown in Fig. 6 is obtained.
As can be seen from Fig. 6, after about 10 to 20 rounds the active power loss of the system under the initial power flow of the IEEE 30-node system is 2.8 MW, while the average loss after DDPG optimization is 2.25 MW, a reduction of 19.64%, achieving the reactive power optimization effect, as shown in Fig. 7.
The dark curve in Fig. 7 shows the average reward over all rounds and the light curve shows the error band; the error fluctuates considerably before 400 rounds and stabilizes afterwards, from which it can be concluded that the reward function can cause unstable DDPG learning performance.
Under the condition that the constraints are satisfied, network loss can be optimized by adjusting reactive power control equipment, i.e. by changing the generator terminal voltage, changing the transformer ratio, switching capacitor capacity, and similar means; here the voltage control effect is achieved by changing the generator terminal voltage, as shown in Table 3.
Table 3 Generator terminal voltage actions (unit: pu)
As verified in Fig. 8, reactive power optimization and voltage control of the power system can be realized by changing the generator terminal voltage. In addition, according to the average output voltage of each round, the baseline voltage regulation success rate is 94%, whereas the method provided by the invention reaches 99.76%. It can be concluded that the priority of voltage violations needs to be considered in the reward design, so that the learning performance of DDPG under the reward function is more stable.
The voltage control strategy based on the DDPG algorithm herein is compared, on the same IEEE 30-node standard example, with traditional heuristic algorithms such as classical particle swarm optimization (PSO), adaptive particle swarm optimization (APSO) and vector-evaluated particle swarm optimization (VEPSO). The minimized network-loss reduction rate is used as the comparison index; the results are shown in Table 4.
Table 4 Comparison of different optimization algorithms on the IEEE 30-node system
Because the DDPG algorithm presented herein more easily obtains samples of high reference value during training, it reduces network loss more markedly than the heuristic algorithms, reflecting the superiority of its reactive power optimization strategy.
Example 3
A third embodiment of the present invention, which differs from the first two embodiments, is as follows:
the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Example 4
Referring to fig. 9, a fourth embodiment of the present invention provides a DDPG-based new energy grid voltage control system, which is characterized in that: the system comprises a sampling module, a reactive power optimization training module and a parameter updating module;
the sampling module is used for the intelligent agent to store the output action A of the network in the current moment state S, actor, the reactive state S' of the node at the next moment and the rewards R data obtained by the Actor network from the environment into the experience pool, and the intelligent agent is transferred into the training module after reaching a preset threshold;
the reactive power optimization training module is used for calculating the value of an Actor network action A and a target network A' of the Critic value network and the target network thereof, further calculating the loss and finding an optimal strategy model; the reactive voltage control of the system is a multi-constraint nonlinear programming problem, and the means for controlling the voltage in a new energy system containing energy storage and extraction are usually to comprehensively adjust AVR, change the generator terminal voltage, load transformer tap and reactive compensation device to control the reactive power injected into the power grid, so that the system achieves minimum active loss, and better economy and voltage stability are expected to be achieved.
The parameter updating module is used for updating part of the parameters at each optimization training step, so that the parameters update slowly and the learning stability of the agent is improved.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.

Claims (9)

1. A DDPG-based new energy power grid voltage control method, characterized by comprising:
step 1, constructing a new energy power grid system containing energy storage and pumped storage as a number of agents, wherein the agents observe the reactive state of the system nodes at the current moment and input the state S_t to the policy network Actor;
step 2, the Actor current network selecting an action A according to the input state S_t, changing the reactive power injected into the new energy power grid system containing energy storage and pumped storage so as to adjust the voltage;
step 3, after the Actor current network executes the action, the agent observing the reward r fed back by the environment, storing the quadruple of current-moment state S_t, action A output by the Actor current network, next-moment node reactive state S_{t+1} and reward r in the experience pool, then observing the next-moment reactive state S_{t+1} and carrying out the next round of data sampling into the experience pool until the set sampling threshold is reached; at the same time, inputting the next-moment node reactive state S_{t+1} to the Actor target network to select the corresponding action A′;
step 4, after sampling is completed, entering the reactive power optimization training module and collecting N samples from the experience pool; meanwhile, the agent feeding the action A output by the Actor current network together with the node reactive state S into the value network Critic, calculating the value of the Actor network policy, and after the value is evaluated, providing gradient information for updating the policy weights to the policy network to guide the action direction of the Actor network, and calculating the loss;
and step 5, soft-updating the Actor current and target network parameters and the Critic current and target network parameters based on the minimized loss function, and updating the Actor network model.
2. The DDPG-based new energy power grid voltage control method as set forth in claim 1, characterized in that: in step 1, constructing a number of agents includes that the Actor and Critic networks of each trained agent have the same structure, each containing two hidden layers with 400 and 300 neurons respectively, and the action output is within the range [-1, 1].
3. The DDPG-based new energy power grid voltage control method as set forth in claim 1, characterized in that: in step 2, reactive voltage control of the system is a multi-constraint nonlinear programming problem; the reactive power injected into the grid is controlled by jointly adjusting the AVR, on-load transformer taps and reactive compensation devices so that the system reaches minimum active power loss, and the Actor network automatically selects actions according to the input state.
4. The DDPG-based new energy power grid voltage control method as set forth in claim 1, characterized in that: in step 3,
Actor current network output: a_t = μ(s_t ∣ θ^μ)
Actor target network output: a′_{t+1} = μ′(s_{t+1} ∣ θ^{μ′})
where s_t is the node reactive state input at the current moment, s_{t+1} is the node reactive state input at the next moment, θ^μ is the Actor model parameter, a_t is the output decision, and μ is the optimal behavior policy function.
5. The DDPG-based new energy power grid voltage control method as set forth in claim 1, characterized in that: step 4 includes that the Critic current value network calculates the value of the Actor current network action A, and the Critic target value network calculates the value of action A′, recorded via the label function y_t; after the action values are calculated, the minimized loss function L is further calculated;
Critic current network output: q_t = Q(s_t, a_t ∣ θ^Q);
Critic target network output: q′_{t+1} = Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′});
the label function y_t and the minimized loss function L are expressed as
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1} ∣ θ^{μ′}) ∣ θ^{Q′})
L = (1/N) Σ_t [y_t − Q(s_t, a_t ∣ θ^Q)]²
where γ is the attenuation factor, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter;
wherein the loss function is calculated with reference to the minimum network loss of the power system under the following constraint conditions:
the constraint conditions include the equality constraints of the power system power-flow equations,
P_i = U_i Σ_{j=1..n} U_j (G_ij cos ω_ij + B_ij sin ω_ij)
Q_i = U_i Σ_{j=1..n} U_j (G_ij sin ω_ij − B_ij cos ω_ij)
where P_i and Q_i are the active and reactive power injected at node i; U_i and U_j are the voltage magnitudes of nodes i and j; ω_ij is the voltage phase-angle difference between nodes i and j; G_ij and B_ij are the real and imaginary parts of the node admittance matrix; and n is the total number of system nodes;
and the variable constraints, which constitute the inequality constraints,
Q_{Gimin} ≤ Q_{Gi} ≤ Q_{Gimax}
Q_{Cimin} ≤ Q_{Ci} ≤ Q_{Cimax}
T_{imin} ≤ T_i ≤ T_{imax}
U_{imin} ≤ U_i ≤ U_{imax}
where Q_{Gimax} and Q_{Gimin} are the upper and lower limits of reactive power output of the pumped-storage and energy-storage units; Q_{Cimax} and Q_{Cimin} are the upper and lower capacity limits of the reactive compensation equipment; T_{imax} and T_{imin} define the adjustment range of the transformer taps; and U_{imax} and U_{imin} represent the safe operating limits of the node voltage.
6. The DDPG-based new energy power grid voltage control method as set forth in claim 1, characterized in that: in step 5, the Actor network updates its parameters by the gradient descent method,
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a ∣ θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i}
where θ^μ is the Actor model parameter, θ^Q is the Critic model parameter, N is the number of samples for batch gradient descent, and ∇_{θ^μ} μ(s ∣ θ^μ)|_{s_i} is the gradient of the optimal behavior function μ with respect to θ^μ taken at s_i;
the Actor and Critic model parameters θ^μ, θ^Q, θ^{μ′}, θ^{Q′} are soft-updated, i.e. only part of each parameter is updated at every optimization training step, ensuring that the parameters update slowly,
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}
where τ is the soft update coefficient, θ^{μ′} is the Actor target network update parameter, and θ^{Q′} is the Critic target network update parameter.
7. A system based on the DDPG-based new energy grid voltage control method according to any one of claims 1-6, characterized in that: the system comprises a sampling module, a reactive power optimization training module and a parameter updating module;
the sampling module is used for the agent to store the current-moment state S, the action A output by the Actor network, the node reactive state S_{t+1} at the next moment, and the reward R obtained by the Actor network from the environment into the experience pool until a preset threshold is reached, after which the data are transferred to the training module;
the reactive power optimization training module is used for the Critic value network and its target network to calculate the values of the Actor network action A and the target network action A′, further calculate the loss, and find the optimal policy model;
the parameter updating module is used for updating part of the parameters at each optimization training step, so that the parameters update slowly and learning stability is improved.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311537485.3A 2023-11-17 2023-11-17 New energy power grid voltage control method and system based on DDPG Pending CN117833263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311537485.3A CN117833263A (en) 2023-11-17 2023-11-17 New energy power grid voltage control method and system based on DDPG

Publications (1)

Publication Number Publication Date
CN117833263A 2024-04-05

Family

ID=90506648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311537485.3A Pending CN117833263A (en) 2023-11-17 2023-11-17 New energy power grid voltage control method and system based on DDPG

Country Status (1)

Country Link
CN (1) CN117833263A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118017523A (en) * 2024-04-09 2024-05-10 杭州鸿晟电力设计咨询有限公司 Voltage control method, device, equipment and medium for electric power system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination