CN112465664B

CN112465664B - AVC intelligent control method based on artificial neural network and deep reinforcement learning

Info

Publication number: CN112465664B
Application number: CN202011263523.7A
Authority: CN
Inventors: 朱勇; 陶用伟; 王常沛; 蒋宏荣; 徐坤; 李泽群; 张韵; 杨键; 黄琼; 杨晓燕; 邓钦; 郑华; 高卫华; 王秀境; 时敏; 李明宏; 刘岑俐; 肖彬; 肖浩宇; 王寅
Original assignee: Guizhou Power Grid Co Ltd; Kaili Power Supply Bureau of Guizhou Power Grid Co Ltd
Current assignee: Guizhou Power Grid Co Ltd; Kaili Power Supply Bureau of Guizhou Power Grid Co Ltd
Priority date: 2020-11-12
Filing date: 2020-11-12
Publication date: 2022-05-03
Anticipated expiration: 2040-11-12
Also published as: CN112465664A

Abstract

The invention discloses an AVC intelligent control method based on an artificial neural network and deep reinforcement learning, which comprises the steps of dividing a transformer substation into different sub-control areas by combining a situation prediction result of reactive load of a power grid and a reactive load change rule of a new energy grid-connected point; optimizing an action utility function based on a Bellman equation and a minimized loss function, and obtaining a decision metric function by combining the action utility function; training the agent by optimizing decision model parameters of the agent using the gradient of the decision metric function; and inputting the situation prediction results of different sub-regions and the reactive change rule of the new energy into the intelligent agent, and calculating the voltage control quantity of the power system through the intelligent agent to control the reactive voltage of the power grid. The invention trains the intelligent agent by combining the artificial neural network and the multi-agent reinforcement learning algorithm of the deterministic strategy, thereby improving the active control capability of the reactive voltage.

Description

AVC intelligent control method based on artificial neural network and deep reinforcement learning

Technical Field

The invention relates to the technical field of power control, in particular to an AVC intelligent control method based on an artificial neural network and deep reinforcement learning.

Background

In recent years, large-scale blackouts caused by insufficient situation awareness during power system operation and control have become increasingly frequent around the world, and wide-area situation awareness of power systems has attracted growing attention. Wide-area situation awareness of a power system consists of collecting steady-state and dynamic, electrical and non-electrical information over the wide-area grid (equipment state information, grid steady-state data, grid dynamic data, grid transient fault information, grid operating environment information and the like); analyzing, understanding and evaluating this information by means of wide-area dynamic security monitoring, data mining, dynamic parameter identification, faster-than-real-time simulation, visualization and the like; and thereby predicting how the state of the grid will develop. The application of situation awareness technology in power systems is still at an early stage, and bodies such as the U.S. Federal Energy Regulatory Commission and the National Institute of Standards and Technology have listed situation awareness as one of the priority technical fields for smart grids.

With the rapid development of large-scale new energy integration and AC/DC hybrid power grids, uncertainty on both the source side and the load side has increased, the reactive voltage problem of the system has become increasingly prominent, and the safe operation of the grid is challenged. At present, reactive power optimization control is a system-wide optimization over a short time scale: control decisions are neither proactive nor predictive, and the influence of the uncertainty of new energy and reactive load on reactive voltage control over a long time scale is not fully considered. As a result, reactive power equipment is adjusted frequently and the overall control effect over a long time scale is not ideal.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The present invention has been made in view of the above-mentioned conventional problems.

Therefore, the invention provides an AVC intelligent control method based on an artificial neural network and deep reinforcement learning, which can avoid reactive voltage risks and solve the problem of poor reactive voltage active control effect.

In order to solve the technical problems, the invention provides the following technical scheme: the method comprises the steps of dividing the transformer substation into different sub-control areas by combining a situation prediction result of the reactive load of a power grid and a reactive load change rule of a new energy grid-connected point; optimizing an action utility function based on a Bellman equation and a minimized loss function, and obtaining a decision metric function by combining the action utility function; training the agent by optimizing decision model parameters of the agent using the gradient of the decision metric function; and inputting the situation prediction results of different sub-regions and the reactive change rule of the new energy into the intelligent agent, and calculating the voltage control quantity of the power system through the intelligent agent to control the reactive voltage of the power grid.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the situation prediction result comprises the steps of constructing a deep neural network regression model based on a deep artificial neural network, and integrating a plurality of regression load results of the deep neural network regression model to obtain the situation prediction result of the reactive load.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: constructing the deep neural network regression model comprises constructing a regression model structure based on the characteristics of the reactive load data and taking into account climate environment, season, regional distribution, user load and power grid dispatching control strategies,

wherein k is the order; $x^{(k)}$ is the k-order hidden layer node unit vector; $y^{(k)}$ is the k-order output node vector; $u^{(k)}$ is the k-order input vector; the structure further uses a k-order feedback state input vector, a k-order feedback state vector and a k-order hidden layer output vector; $\omega_i$ ($i = 1, 2, 3, 4, 5, 6$) is the connection weight matrix of each layer; $g(\cdot)$ is the transfer function of the output neurons; and $f(\cdot)$ is the transfer function of the middle layer neurons.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the regression model structure further comprises recursive expressions for the feedback states,

wherein $x^{(k-1)}$ is the (k-1)-order hidden layer node unit vector; $u^{(k-1)}$ is the (k-1)-order input vector; $y^{(k-1)}$ is the (k-1)-order output node vector; the (k-1)-order feedback state vector, the (k-1)-order feedback state input vector and the (k-1)-order hidden layer output vector are defined in the same way as their k-order counterparts; and $\eta$ and the other coefficients are self-feedback gain factors.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the minimized loss function is defined as:

$$L(\theta^{Q}) = \mathbb{E}_{(s,a,s') \sim \mathcal{D}}\left[\left(Q_i(s,a \mid \theta^{Q}) - y_i\right)^{2}\right]$$

wherein $L(\theta^{Q})$ is the minimized loss function with the training parameter $\theta^{Q}$ as its independent variable, $\mathbb{E}$ is the expected value, $s$ is the current system state, $s'$ is the environment state at the next moment, $a$ is the action selected in the corresponding state, $\mathcal{D}$ is the experience pool, and $y_i$ is the estimate of the true value of $Q_i(s,a)$ obtained through the Bellman equation.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the estimate of the true value comprises,

$$y_i = r_i + \gamma\, Q_i'\left(s', a' \mid \theta^{Q'}\right)\Big|_{a' = \mu'(s' \mid \theta^{\mu'})}$$

wherein $r_i$ is the return value obtained in the i-th iteration; $\mu$ is the decision value, with $\mu'$ denoting the decision function of the target Actor network; $\gamma$ is the decay rate, with $\gamma \in [0, 1]$; $Q_i'$ is the Q-value function of the target Critic network for the next state; $s'$ is the next state entered by taking action $a$ in system state $s$; $a'$ is the action selected by the target Actor network $\mu'(s' \mid \theta^{\mu'})$ in system state $s'$; $\theta^{\mu'}$ is the parameter of the target Actor network; and $\theta^{Q'}$ is the parameter of the target Critic network.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the parameters of the target Actor network are updated from the parameters $\theta^{\mu}$ of the actual Actor network:

$$\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}$$

and the parameters of the target Critic network are updated from the parameters $\theta^{Q}$ of the actual Critic network:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$

where $\tau$ controls the update rate.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the action utility function $Q_i(s, a)$ is defined as the expectation of the sum of the rewards subsequently obtained by the agent of the i-th area after action $a$ is executed in system state $s$, and the decision metric function is then:

$$J(\theta^{\mu}) = \mathbb{E}_{s \sim \mathcal{D}}\left[Q_i(s, a)\Big|_{a = \mu(s \mid \theta^{\mu})}\right]$$

wherein $\theta^{\mu}$ is the decision function parameter of the i-th area agent and $\mathcal{D}$ is the experience pool.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the gradient of the decision metric function $J$ with respect to the decision function parameter $\theta^{\mu}$ of the i-th area agent is:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{s \sim \mathcal{D}}\left[\nabla_{\theta^{\mu}}\mu(s \mid \theta^{\mu})\,\nabla_{a_i} Q_i(s, a_i)\Big|_{a_i = \mu(s \mid \theta^{\mu})}\right]$$

wherein $\nabla$ is the gradient operator; $\mathcal{D}$ is the experience pool; $a_i$ is the action value of the i-th iteration; $\nabla_{a_i} Q_i$ is the gradient of the action utility function at the i-th iteration; and $\nabla_{\theta^{\mu}}\mu$ is the gradient of the Actor network with respect to $\theta^{\mu}$ at the i-th iteration.

As a preferred scheme of the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning, the AVC intelligent control method comprises the following steps: the voltage control quantity comprises a calculation formula based on Newton-Raphson power flow, wherein the calculation formula of the voltage control quantity is as follows:

wherein U is the voltage control quantity, M_i is the modulation index of the voltage source converter, and U_d is the fundamental voltage of the DC node.

The invention has the beneficial effects that: the reactive voltage future situation prediction is formed based on the analysis of data samples of new energy and reactive load, the reactive voltage of a power grid is controlled through an intelligent agent, and meanwhile, the intelligent agent is trained by combining an artificial neural network and a multi-agent reinforcement learning algorithm of a deterministic strategy, so that the active control capability of the reactive voltage is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:

FIG. 1 is a schematic flowchart of an AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to a first embodiment of the present invention;

fig. 2 is a schematic diagram of a transformer substation and a substation system region division of the AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to the first embodiment of the present invention;

fig. 3 is a schematic diagram of an Actor network structure of an AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to a first embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a Critic network of the AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to the first embodiment of the present invention;

FIG. 5 is a schematic diagram of an agent training process of an AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to a first embodiment of the present invention;

FIG. 6 is a schematic diagram of the operation flow of an agent in an AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to a first embodiment of the present invention;

fig. 7 is a schematic diagram of a loss function curve of an Actor network of an AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to a second embodiment of the present invention;

FIG. 8 is a schematic diagram of a loss function curve of a Critic network of an AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to a second embodiment of the present invention;

FIG. 9 is a diagram illustrating the variation of the total reward function and the action times of the AVC intelligent control method based on artificial neural network and deep reinforcement learning according to the second embodiment of the present invention with the training process;

fig. 10 is a schematic diagram of voltage amplitudes of nodes before and after the intelligent agent controls in a certain operating state according to the AVC intelligent control method based on the artificial neural network and the deep reinforcement learning according to the second embodiment of the present invention;

fig. 11 is a schematic diagram of a loss function curve of an Actor network in consideration of new energy output fluctuation according to a second embodiment of the AVC intelligent control method based on an artificial neural network and deep reinforcement learning;

fig. 12 is a schematic diagram of a loss function curve of the Critic network in consideration of new energy output fluctuation in the AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to the second embodiment of the present invention;

fig. 13 is a schematic diagram of the action times of each agent in consideration of new energy fluctuation according to the AVC intelligent control method based on an artificial neural network and deep reinforcement learning according to the second embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.

Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example 1

Referring to fig. 1 to 6, a first embodiment of the present invention provides an AVC intelligent control method based on an artificial neural network and deep reinforcement learning, including:

S1: Divide the transformer substations into different sub-control areas by combining the situation prediction result of the reactive load of the power grid with the reactive load change pattern of the new energy grid-connected points.

(1) Construct a deep neural network regression model based on a deep artificial neural network, and integrate a plurality of regression load results of the deep neural network regression model to obtain the situation prediction result of the reactive load.

It should be noted that, before the deep neural network regression model is constructed, the load data are preprocessed by methods including denoising, normalization and whitening: the massive reactive load data are consolidated and erroneous load data are removed, producing a reactive load data set with a complete structure, a standard format and a low error rate.
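The preprocessing described above can be illustrated with a minimal sketch. The outlier rule (z-score threshold), the moving-average window and the sample values are assumptions made for illustration and are not taken from the source; in one dimension, standardizing to zero mean and unit variance plays the role of whitening.

```python
import statistics

def remove_outliers(values, z_max=2.5):
    """Drop samples whose z-score magnitude exceeds z_max (assumed error criterion)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / std <= z_max]

def moving_average(values, window=3):
    """Simple denoising: centered moving average, with a shorter window at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(statistics.fmean(values[lo:hi]))
    return smoothed

def standardize(values):
    """Normalize to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical reactive load readings (Mvar) containing one erroneous sample.
raw_mvar = [10.2, 10.5, 9.8, 10.1, 250.0, 10.4, 9.9, 10.3, 10.0, 10.6, 9.7, 10.2]
clean = remove_outliers(raw_mvar)
smooth = moving_average(clean)
normalized = standardize(smooth)
```

The order used here (error removal, then denoising, then normalization) is one reasonable choice; the source does not specify an order.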

Specifically, constructing the deep neural network regression model comprises:

constructing a regression model structure based on the characteristics of the reactive load data and taking into account climate environment, season, regional distribution, user load and the power grid dispatching control strategy. The inputs to the middle (hidden) layer of the structure come from the input layer, the input context layer, the middle context layer and the output context layer; the inputs to the output layer come from the middle layer and the middle context layer. The regression model consists of the expressions for the hidden layer node vector $x^{(k)}$ and the output node vector $y^{(k)}$, together with recursive expressions for the k-order feedback state vector, the k-order feedback state input vector and the k-order hidden layer output vector,

wherein $x^{(k-1)}$ is the (k-1)-order hidden layer node unit vector; the (k-1)-order hidden layer output vector is defined in the same way as its k-order counterpart; and $\eta$ ($\eta \ge 0$) and the other coefficients are self-feedback gain factors.

It should be noted that, in this embodiment, $g(\cdot)$ is a linear function and $f(\cdot)$ is the Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Further, a plurality of regression load results of the regression model are integrated on the basis of the preprocessed data set to obtain the situation prediction value of the reactive load.
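A minimal sketch of such a regression model follows. It assumes an Elman-type wiring consistent with the description: the hidden layer receives the input, the input context, the hidden context and the output context; the output layer receives the hidden layer and the hidden context; and each context holds the previous value plus a self-feedback term. The class name `ElmanRegressor`, the weight names, the gain values, the random initialization and the use of simple averaging to integrate the regression results are all assumptions, not the patent's exact formulas.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    return [sum(parts) for parts in zip(*vectors)]

class ElmanRegressor:
    """Elman-type network with input, hidden and output context layers (assumed wiring)."""

    def __init__(self, n_in, n_hid, n_out, gains=(0.5, 0.5, 0.5), seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Six weight matrices: four feed the hidden layer, two feed the output layer.
        self.w_u, self.w_uc = mat(n_hid, n_in), mat(n_hid, n_in)
        self.w_xc, self.w_yc = mat(n_hid, n_hid), mat(n_hid, n_out)
        self.w_x, self.w_out_xc = mat(n_out, n_hid), mat(n_out, n_hid)
        self.gains = gains  # self-feedback gain factors (assumed values)
        self.u_c, self.x_c, self.y_c = [0.0] * n_in, [0.0] * n_hid, [0.0] * n_out

    def step(self, u):
        g_u, g_x, g_y = self.gains
        pre = add(matvec(self.w_u, u), matvec(self.w_uc, self.u_c),
                  matvec(self.w_xc, self.x_c), matvec(self.w_yc, self.y_c))
        x = [sigmoid(z) for z in pre]                                   # f: Sigmoid
        y = add(matvec(self.w_x, x), matvec(self.w_out_xc, self.x_c))   # g: linear
        # Each context stores the latest value plus self-feedback of its old content.
        self.u_c = [a + g_u * c for a, c in zip(u, self.u_c)]
        self.x_c = [a + g_x * c for a, c in zip(x, self.x_c)]
        self.y_c = [a + g_y * c for a, c in zip(y, self.y_c)]
        return y

def ensemble_predict(models, sequence):
    """Average the final outputs of several regressors run over the same sequence."""
    outputs = []
    for model in models:
        for u in sequence:
            y = model.step(u)
        outputs.append(y)
    return [sum(col) / len(col) for col in zip(*outputs)]

models = [ElmanRegressor(3, 4, 1, seed=s) for s in range(5)]
sequence = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2], [0.1, 0.6, 0.3]]  # hypothetical features
forecast = ensemble_predict(models, sequence)
```

The networks here are untrained; the sketch only shows the forward pass and the integration of several regression outputs.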

(2) Because the reactive power compensation devices installed at different new energy stations differ, and their reactive voltage control methods differ, the reactive power characteristics of the different reactive power sources are first analyzed based on the actual conditions in the vicinity of the new energy stations. The output and load characteristics of the new energy are then aggregated by cluster analysis to obtain the equivalent load characteristic of the area near the new energy stations, from which the fluctuation pattern of the node voltage under that equivalent load characteristic is obtained.

Specifically, Clustering Analysis (Clustering Analysis) is an Analysis method for grouping according to the principle of maximizing intra-class similarity and minimizing inter-class similarity of objects, and also belongs to a descriptive mining task.

The embodiment adopts K-means to perform partition clustering on the data.

First, the data set D is divided into K classes, and the cluster quality is evaluated by the sum of squared errors, defined as:

$$E = \sum_{i=1}^{K}\sum_{p \in C_i} \operatorname{dist}(p, m_i)^{2}$$

wherein E is the sum of squared errors over all objects of the data set; p is a point in space representing a given data object; $C_i$ is the i-th cluster and $m_i$ is its centroid; and dist(x, y) is the Euclidean distance in space from point x to point y.

Second, the number of clusters K is determined by the elbow method: the SSE (sum of the squared errors) is computed for a range of K values, and the K at the inflection point of the SSE curve is the value taken. The SSE is calculated as:

$$\mathrm{SSE} = \sum_{i=1}^{K}\sum_{p \in C_i} \lVert p - m_i \rVert^{2}$$

wherein $C_i$ is the i-th cluster, p is a sample point in $C_i$, and $m_i$ is the centroid of $C_i$.

The region division is shown in fig. 2; the resulting regions are controlled by 2 different agents.
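The K-means partitioning and the elbow check described above can be sketched as follows. The two-dimensional sample points are hypothetical, and seeding the centroids with the first K points is a simplification chosen to keep the example deterministic; practical implementations usually use random or k-means++ initialization.

```python
import math

def assign(points, centroids):
    """Group points by their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def sse(centroids, clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(math.dist(p, m) ** 2 for m, cl in zip(centroids, clusters) for p in cl)

# Hypothetical feature points, e.g. (new energy output, load) in per unit.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9),
          (0.8, 1.1), (4.8, 5.1), (1.1, 1.2), (5.1, 5.2)]

# SSE for K = 1..4; the elbow is where the curve stops dropping sharply.
sse_by_k = {k: sse(*kmeans(points, k)) for k in range(1, 5)}
for k, value in sse_by_k.items():
    print(k, round(value, 4))
```

For these points the SSE falls sharply from K = 1 to K = 2 and only slightly afterwards, so the elbow method would select K = 2.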

S2: Optimize the action utility function based on the Bellman equation and the minimized loss function, and obtain the decision metric function from the action utility function.

The minimized loss function is defined as:

$$L(\theta^{Q}) = \mathbb{E}_{(s,a,s') \sim \mathcal{D}}\left[\left(Q_i(s,a \mid \theta^{Q}) - y_i\right)^{2}\right]$$

wherein $L(\theta^{Q})$ is the minimized loss function with the training parameter $\theta^{Q}$ as its independent variable, $\mathbb{E}$ is the expected value, $s$ is the current system state, $s'$ is the environment state at the next moment, $a$ is the action selected in the corresponding state, $\mathcal{D}$ is the experience pool, and $y_i$ is the estimate of the true value of $Q_i(s,a)$ obtained through the Bellman equation.

In particular,

$$y_i = r_i + \gamma\, Q_i'\left(s', a' \mid \theta^{Q'}\right)\Big|_{a' = \mu'(s' \mid \theta^{\mu'})}$$

wherein $r_i$ is the return value obtained in the i-th iteration; $\mu$ is the decision value, with $\mu'$ denoting the decision function of the target Actor network; $\gamma$ is the decay rate, with $\gamma \in [0, 1]$: when $\gamma = 0$ only the immediate return is considered and the long-term return is ignored, and when $\gamma = 1$ the long-term return and the immediate return are treated as equally important; $Q_i'$ is the Q-value function of the target Critic network for the next state; $s'$ is the next state entered by taking action $a$ in system state $s$; $a'$ is the action selected by the target Actor network $\mu'(s' \mid \theta^{\mu'})$ in system state $s'$; $\theta^{\mu'}$ is the parameter of the target Actor network; and $\theta^{Q'}$ is the parameter of the target Critic network.
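As a worked illustration of the Bellman target and the mean squared loss, the following sketch evaluates both on a tiny hypothetical minibatch. The rewards and Q values are made-up numbers; in an actual agent the current Q values would come from the Critic network and the next-state Q values from the target Critic evaluated at the target Actor's action.

```python
def td_target(reward, next_q, gamma):
    """Bellman estimate of the true value: immediate return plus discounted next-state value."""
    return reward + gamma * next_q

def critic_loss(q_values, targets):
    """Mean squared difference between Critic outputs and their Bellman targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)

# Hypothetical minibatch sampled from the experience pool.
batch = [
    {"r": 1.0, "q": 2.0, "next_q": 2.0},
    {"r": -0.5, "q": 0.5, "next_q": 1.0},
]
gamma = 0.9

targets = [td_target(b["r"], b["next_q"], gamma) for b in batch]
loss = critic_loss([b["q"] for b in batch], targets)

# With gamma = 0 the target reduces to the immediate return only.
myopic_targets = [td_target(b["r"], b["next_q"], 0.0) for b in batch]
```

Here the targets are 2.8 and 0.4, giving a loss of (0.64 + 0.01) / 2 = 0.325.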

Further, the parameters of the target Critic network and the parameters of the target Actor network are updated.

This embodiment uses the Adaptive Moment Estimation (Adam) optimization algorithm to update the parameters.

It should be noted that the momentum gradient descent part of the Adam optimization algorithm (an exponentially weighted average of the gradients) is:

v_dw = β₁v_dw + (1 - β₁)dW

v_db = β₁v_db + (1 - β₁)db

and the RMSprop part of the Adam optimization algorithm (an exponentially weighted average of the squared gradients) is:

S_dw = β₂S_dw + (1 - β₂)dW²

S_db = β₂S_db + (1 - β₂)db²

wherein β₁ is the decay coefficient of the first moment, β₂ is the decay coefficient of the second moment, and dW and db are the gradients with respect to the weights and the biases.
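The two moment estimates above can be combined into a parameter update as in the following sketch. The bias correction and the final update rule follow the standard Adam formulation and are not spelled out in the text above; the learning rate, the default coefficients and the quadratic test function are assumptions chosen for illustration.

```python
import math

def adam_step(w, grad, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w at step t (t starts at 1)."""
    v = beta1 * v + (1 - beta1) * grad          # first moment (momentum part)
    s = beta2 * s + (1 - beta2) * grad ** 2     # second moment (RMSprop part)
    v_hat = v / (1 - beta1 ** t)                # bias correction (standard Adam)
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * v_hat / (math.sqrt(s_hat) + eps)
    return w, v, s

# Minimize the hypothetical loss f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w, v, s = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, v, s = adam_step(w, grad, v, s, t, lr=0.05)
```

For network training the same update is applied element-wise to every weight and bias.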

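Thus, the parameters of the target Actor network are updated from the parameters $\theta^{\mu}$ of the actual Actor network:

$$\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}$$

and the parameters of the target Critic network are updated from the parameters $\theta^{Q}$ of the actual Critic network:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$

wherein $\tau$ controls the update rate, and $\tau < 1$ is usually satisfied.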
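A minimal sketch of this soft update on flat parameter lists follows; the parameter values and the choice of τ are hypothetical.

```python
def soft_update(target_params, actual_params, tau=0.01):
    """Move each target-network parameter a fraction tau toward the actual network."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, actual_params)]

actual = [1.0, 2.0]   # hypothetical actual-network parameters
target = [0.0, 0.0]   # hypothetical target-network parameters

one_step = soft_update(target, actual, tau=0.1)

# Repeated updates make the target network track the actual network slowly.
tracked = target
for _ in range(500):
    tracked = soft_update(tracked, actual, tau=0.1)
```

A small τ makes the target networks change slowly, which keeps the Bellman targets stable while the actual networks are being trained.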

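Still further, the action utility function $Q_i(s, a)$ is defined as the expectation of the sum of the rewards subsequently obtained by the agent of the i-th area after executing action $a$ in system state $s$:

Q_i(s,a) = E(r(s,a) + γ max Q_i(s′,a′))

wherein r(s, a) is the return value after executing action a in system state s, and Q_i(s′, a′) is the value of taking action a′ in system state s′.

The decision metric function is then:

$$J(\theta^{\mu}) = \mathbb{E}_{s \sim \mathcal{D}}\left[Q_i(s, a)\Big|_{a = \mu(s \mid \theta^{\mu})}\right]$$

wherein $\theta^{\mu}$ is the decision function parameter of the i-th area agent and $\mathcal{D}$ is the experience pool.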
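The recursive definition of the action utility function can be illustrated with a small tabular example. The two states, two actions, rewards and deterministic transitions below are hypothetical; with deterministic transitions the expectation reduces to a single term, and repeated backups converge to the fixed point of the equation.

```python
GAMMA = 0.9

# Hypothetical model: (state, action) -> (reward, next state).
transitions = {
    ("over_limit", "hold"): (-1.0, "over_limit"),
    ("over_limit", "adjust"): (-0.2, "normal"),
    ("normal", "hold"): (1.0, "normal"),
    ("normal", "adjust"): (0.8, "normal"),
}

def backup(q):
    """One sweep of Q(s, a) = r(s, a) + gamma * max over a' of Q(s', a')."""
    new_q = {}
    for (s, a), (r, s_next) in transitions.items():
        best_next = max(value for (s2, _), value in q.items() if s2 == s_next)
        new_q[(s, a)] = r + GAMMA * best_next
    return new_q

q = {key: 0.0 for key in transitions}
for _ in range(200):
    q = backup(q)
```

At the fixed point, Q(normal, hold) = 1 / (1 - 0.9) = 10 and Q(over_limit, adjust) = -0.2 + 0.9 × 10 = 8.8, so adjusting is preferred whenever the voltage is out of limit.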

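S3: Train the agent by optimizing its decision model parameters using the gradient of the decision metric function.

The decision model of the agent in the i-th area is optimized through the gradient of the decision metric function to complete the training of the agent.

In particular, the gradient of the decision metric function $J$ with respect to the decision function parameter $\theta^{\mu}$ of the i-th area agent is:

$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{s \sim \mathcal{D}}\left[\nabla_{\theta^{\mu}}\mu(s \mid \theta^{\mu})\,\nabla_{a_i} Q_i(s, a_i)\Big|_{a_i = \mu(s \mid \theta^{\mu})}\right]$$

wherein $\mathcal{D}$ is the experience pool, $\nabla_{a_i} Q_i$ is the gradient of the action utility function at the i-th iteration, and $\nabla_{\theta^{\mu}}\mu$ is the gradient of the Actor network with respect to $\theta^{\mu}$ at the i-th iteration.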
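The chain-rule structure of this gradient can be shown on a scalar toy problem. The linear actor a = θ·s and the known quadratic utility Q(s, a) = -(a - 2s)² are assumptions made so that the result can be checked by hand; in the actual method both functions are neural networks.

```python
def actor(theta, s):
    """Hypothetical linear decision function: a = theta * s."""
    return theta * s

def dq_da(s, a):
    """Gradient of the hypothetical utility Q(s, a) = -(a - 2 s)^2 with respect to a."""
    return -2.0 * (a - 2.0 * s)

def policy_gradient(theta, states):
    """Sample mean of (d mu / d theta) * (d Q / d a) evaluated at a = mu(s)."""
    grads = [s * dq_da(s, actor(theta, s)) for s in states]  # d mu / d theta = s
    return sum(grads) / len(grads)

states = [0.5, 1.0, 1.5, 2.0]   # hypothetical states drawn from the experience pool
theta = 0.0
for _ in range(200):
    theta += 0.1 * policy_gradient(theta, states)  # gradient ascent on J
```

Because Q is maximized at a = 2s, gradient ascent drives θ toward 2.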

S4: Input the situation prediction results of the different sub-regions and the reactive power change pattern of the new energy into the agents, and calculate the voltage control quantity of the power system through the agents to control the reactive voltage of the power grid.

Based on Newton Raphson power flow calculation, the calculation formula of the voltage control quantity is as follows:

Example 2

To verify and illustrate the technical effect of the method, this embodiment carries out comparative voltage control tests on new energy power stations without output fluctuation and with output fluctuation, and compares the test results to verify the actual effect of the method.

(1) Analysis of voltage control result when no output fluctuation occurs in new energy power station

Firstly, analyzing the effect of the invention on the voltage control of the power system under the condition that the output of the new energy power station is relatively stable; under the condition, the active output and the load of each generator set (including a new energy generator set) in the power system are kept near relatively stable values in the whole voltage real-time control process, so that the active output and the load of the generator are considered to be kept unchanged in the process of interaction of each agent and the power system environment, and only the change of the generator terminal voltage caused by excitation regulation of the generator is considered.

The power system operating state data samples are generated by random sampling. The first 70% of the operating states, in which the node loads vary within 0.8 to 1.2 times the rated load, are used to train the agents of the two areas; the last 30% of the operating states, in which the node loads vary within 0.7 to 1.3 times the rated load, are used as the verification set of the regression model.
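A minimal sketch of this sampling scheme follows. The rated loads, the number of samples, the uniform distribution and the random seed are assumptions; the source only states the 70/30 split and the two load ranges.

```python
import random

def sample_operating_states(rated_load, n_states, low, high, rng):
    """Scale every node's rated load by an independent uniform factor in [low, high]."""
    return [[p * rng.uniform(low, high) for p in rated_load] for _ in range(n_states)]

rng = random.Random(42)
rated_load = [1.0, 0.8, 1.2]        # hypothetical rated node loads (per unit)
n_total = 1000
n_train = n_total * 7 // 10         # first 70% for training

train_states = sample_operating_states(rated_load, n_train, 0.8, 1.2, rng)
val_states = sample_operating_states(rated_load, n_total - n_train, 0.7, 1.3, rng)
```

Using a wider load range for the verification set checks whether the trained agents generalize to operating states slightly outside those seen in training.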

As can be seen from fig. 7 and fig. 8, as training proceeds the loss function of each agent's Actor network first rises markedly, then falls, and finally converges to a stable value. This is because the neural network parameters are initialized randomly, so in the early stage of training the output of the Actor network cannot regulate the generator terminal voltage effectively; the power system voltage goes out of limit and the loss function is high. As the neural network parameters are continually updated, setting the generator terminal voltage according to the output of the Actor network brings the voltage level of the power system under effective control and the loss function keeps decreasing, which shows that the training algorithm provided by the method can train the regression model effectively.

Comparing the loss function curves of the two agents shows that the Actor network loss function of agent 1 falls significantly faster than that of agent 2, and that the Critic network loss function of agent 1 fluctuates significantly less than that of agent 2 once it has converged. The reason is that area 1, controlled by agent 1, has fewer nodes than area 2, controlled by agent 2, and that the control actions of agent 2 also help control the node voltages of area 1, whereas the node voltages of area 2 are controlled by agent 2 alone. This indicates that, under the training strategy proposed by the method, the fewer the nodes and the more controllable voltage units in an area, the easier the corresponding agent model is to train.

In fig. 9, the gray line in the left graph is the total reward obtained by each agent in each episode of interaction, and the black line is the smoothed total reward curve; the black dots in the right graph are the number of actions each agent needs in each episode to bring the voltage back within limits. As fig. 9 shows, during training the total reward obtained by each agent keeps increasing and the number of actions each agent needs to bring the voltage within limits keeps decreasing. In other words, with continued training each agent learns to remove voltage violations in as few actions as possible; in testing after training is complete, an agent needs only one or two actions to keep the voltage within limits.

Taking one operating state from the test, the node voltages in each agent's control area before and after control, together with their average values, are calculated and displayed in fig. 10. In fig. 10, the dotted lines indicate the upper and lower limits of the node voltage, the scattered gray dots indicate the voltage of each node before control, the black dashed line indicates the average node voltage before control, and the line with triangular markers indicates the average node voltage after control. Before control, the node voltages are high overall and exceed the upper limit; after control by the agents, the node voltages move toward the voltage reference value of 1.0 and their average is close to 1.0, which shows that the out-of-limit node voltages are effectively brought back under control.

(2) Voltage control result analysis considering output fluctuation of new energy power station

When new energy fluctuation is considered, the output of the new energy units becomes more uncertain. In real-time voltage control the output of the new energy units should therefore be treated as a variable: during the interaction between each agent and the power system environment, the active output of the new energy units is allowed to change, and the change in generator terminal voltage caused by generator excitation regulation is also taken into account.

As before, operating-state data samples of the power system are generated by random sampling. During the interaction between the agent and the environment, however, the load power of node 2 and the wind turbine output of node 3 are adjusted dynamically: at every interaction step the randomly generated load and wind turbine output are drawn from a wider range than was used for sample generation, namely 0.5 to 1.3 times the rated power, to reflect the stronger uncertainty of the new energy output.
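The per-step random adjustment can be sketched as below. The source gives only the range of 0.5 to 1.3 times rated power; the uniform distribution, the function name `sample_operating_point`, and the rated power values are assumptions made for illustration.

```python
import random


def sample_operating_point(rated_load, rated_wind, low=0.5, high=1.3, rng=random):
    """Draw the node 2 load and the node 3 wind output as multiples of rated power.

    A uniform distribution over [low, high] is assumed; the source does not
    specify the distribution.
    """
    load = rated_load * rng.uniform(low, high)
    wind = rated_wind * rng.uniform(low, high)
    return load, wind


# Hypothetical rated powers in MW, and a seeded generator for repeatability.
rated_load_mw = 100.0
rated_wind_mw = 50.0
rng = random.Random(0)

# One new operating point per interaction step of an episode.
trajectory = [
    sample_operating_point(rated_load_mw, rated_wind_mw, rng=rng)
    for _ in range(5)
]
```

Resampling at every step, rather than once per episode, is what makes the environment non-stationary from the agent's point of view and is consistent with the slower convergence reported for this case.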

Fig. 11 and 12 show the loss function curves of the Actor network and the Critic network when new energy output fluctuation is considered. Compared with the convergence behavior in fig. 7 and 8, the Actor network loss falls more slowly and converges to a higher value than when new energy fluctuation is not considered. The Critic network loss decreases after a certain number of training iterations but struggles to converge to a stable value and keeps fluctuating with a large amplitude, which shows that the model is harder to train when new energy fluctuation is considered.

Fig. 13 shows the number of actions each agent needs to bring the node voltages of its control area back within limits when new energy fluctuation is considered. Compared with fig. 9, each agent needs more control actions, and the number can reach 50 or more; as training proceeds, however, the number of actions needed decreases steadily and is finally kept essentially below 5. This shows that although model training is harder when new energy fluctuation is considered, a model with an effective control capability can still be obtained through training.

It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. An AVC intelligent control method based on an artificial neural network and deep reinforcement learning, characterized by comprising:

dividing the transformer substation into different sub-control areas by combining the situation prediction result of the reactive load of the power grid and the reactive load change rule of the new energy grid-connected point;

optimizing an action utility function based on a Bellman equation and a minimized loss function, and obtaining a decision metric function by combining the action utility function;

training the agent by optimizing decision model parameters of the agent using the gradient of the decision metric function;

inputting situation prediction results of different sub-control areas and a new energy reactive power change rule into the intelligent agent, and calculating the voltage control quantity of the power system through the intelligent agent to control the reactive voltage of the power grid;

wherein the step of obtaining the situation prediction result comprises:

constructing a deep neural network regression model based on a deep artificial neural network, and integrating a plurality of regression load results of the deep neural network regression model to further obtain a situation prediction result of the reactive load;

the constructing of the deep neural network regression model comprises the following steps:

based on the reactive load data characteristics, considering the climate environment, season, regional distribution, user load and power grid dispatching control strategy, constructing the deep neural network regression model:

is the k-order feedback state input vector;

is the k-order feedback state vector;

is the k-order hidden layer output vector; $\omega_i$ ($i = 1, 2, 3, 4, 5, 6$) is the connection weight matrix of each layer; $g(\cdot)$ is the transfer function of the output neurons; $f(\cdot)$ is the transfer function of the middle-layer neurons;

the deep neural network regression model further includes,

wherein $x^{(k-1)}$ is the (k-1)-order hidden layer node unit vector;

is the (k-1)-order hidden layer output vector; $\eta$, $b$,

are self-feedback gain factors.

2. The AVC intelligent control method based on artificial neural network and deep reinforcement learning of claim 1, wherein: the function for minimizing the loss comprises,

defining the minimized loss function:

$$L(\theta_i^{Q}) = \mathbb{E}_{s, a, s' \sim \mathcal{D}}\left[\left(Q_i(s, a \mid \theta_i^{Q}) - y_i\right)^{2}\right]$$

wherein $L(\theta_i^{Q})$ is the minimized loss function with the training parameter $\theta_i^{Q}$ as its independent variable, $\mathbb{E}$ is the expected value, $s$ is the current system state, $s'$ is the environment state at the next moment, $a$ is the action selected in the corresponding state, $\mathcal{D}$ is the experience pool, and $y_i$ is the estimate of the true value of $Q_i(s, a)$ obtained through the Bellman equation.

3. The AVC intelligent control method based on artificial neural network and deep reinforcement learning of claim 2, wherein: the estimated true value comprises,

$$y_i = r_i + \gamma\, Q_i'\left(s', a' \mid \theta_i^{Q'}\right), \qquad a' = \mu_i'\left(s' \mid \theta_i^{\mu'}\right)$$

wherein $r_i$ is the return value obtained in the i-th iteration; $\mu$ is the decision function; $\gamma$ is the decay rate, $\gamma \in [0, 1]$; $Q_i'$ is the Q-value function of the target Critic network for the next state; $s'$ is the next state entered by taking action $a$ in system state $s$; $a'$ is the action selected by the target Actor network $\mu_i'$ in state $s'$; $\theta_i^{\mu'}$ is the parameter of the target Actor network; and $\theta_i^{Q'}$ is the parameter of the target Critic network.

4. The AVC intelligent control method based on artificial neural network and deep reinforcement learning of claim 3, wherein: the parameters of the target Critic network and the parameters of the target Actor network comprise,

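the parameters of the target Actor network are updated through the parameters $\theta_i^{\mu}$ of the actual Actor network:

$$\theta_i^{\mu'} \leftarrow \tau\,\theta_i^{\mu} + (1 - \tau)\,\theta_i^{\mu'}$$

the parameters of the target Critic network are updated through the parameters $\theta_i^{Q}$ of the actual Critic network:

$$\theta_i^{Q'} \leftarrow \tau\,\theta_i^{Q} + (1 - \tau)\,\theta_i^{Q'}$$

where $\tau$ controls the update rate.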

5. The AVC intelligent control method based on artificial neural network and deep reinforcement learning of claim 4, wherein: the decision metric function comprises,

defining the action utility function $Q_i(s, a)$ as the expectation of the sum of the rewards subsequently obtained by the agent in the i-th area after action $a$ is executed in system state $s$, the decision metric function is:

$$J(\mu_i) = \mathbb{E}_{s \sim \mathcal{D}}\left[\left. Q_i(s, a) \right|_{a = \mu_i(s)}\right]$$

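6. The AVC intelligent control method based on artificial neural network and deep reinforcement learning of claim 5, wherein: the gradient of the decision metric function comprises,

the gradient of the decision metric function with respect to the decision function parameters $\theta_i^{\mu}$ of the i-th regional agent is:

$$\nabla_{\theta_i^{\mu}} J(\mu_i) = \mathbb{E}_{s \sim \mathcal{D}}\left[\left. \nabla_{a} Q_i(s, a) \right|_{a = \mu_i(s)} \nabla_{\theta_i^{\mu}} \mu_i\left(s \mid \theta_i^{\mu}\right)\right]$$

wherein $\nabla_{a} Q_i(s, a)$ is the gradient of the action utility function at the i-th iteration, and $\nabla_{\theta_i^{\mu}} \mu_i(s \mid \theta_i^{\mu})$ is the gradient of the Actor network with respect to $\theta_i^{\mu}$ at the i-th iteration.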

7. The AVC intelligent control method based on artificial neural network and deep reinforcement learning of claim 6, wherein: the voltage control quantity comprises,

wherein $U$ is the voltage control quantity, $M_i$ is the modulation index of the voltage source converter, and $U_d$ is the fundamental voltage of the DC node.
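Claims 2 to 4 refer to a Bellman-based estimate of the true value, a loss function minimized over an experience pool, and a target-network update whose rate is controlled by τ. The sketch below assumes these follow the usual deep deterministic policy gradient form; that is an interpretation rather than something the claims confirm. The function names, the list-based parameters, and all numeric values are hypothetical.

```python
def td_target(reward, gamma, next_q):
    """Bellman estimate y = r + gamma * Q'(s', a') for one transition."""
    return reward + gamma * next_q


def critic_loss(q_values, targets):
    """Mean squared error between Critic outputs and Bellman targets."""
    squared_errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(squared_errors) / len(squared_errors)


def soft_update(target_params, actual_params, tau):
    """Move each target parameter a fraction tau toward the actual parameter."""
    return [
        tau * actual + (1.0 - tau) * target
        for target, actual in zip(target_params, actual_params)
    ]


# Hypothetical mini-batch sampled from the experience pool.
rewards = [1.0, 0.0]
next_q_values = [2.0, 4.0]   # target Critic outputs at (s', a')
q_values = [1.0, 3.0]        # actual Critic outputs at (s, a)
gamma = 0.5

targets = [td_target(r, gamma, q) for r, q in zip(rewards, next_q_values)]
loss = critic_loss(q_values, targets)

# tau = 0.5 keeps the arithmetic easy to follow; practical values are much smaller.
new_target_params = soft_update([0.0, 1.0], [1.0, 3.0], tau=0.5)
```

Keeping separate, slowly updated target networks is the usual way to stop the Bellman target from shifting as fast as the network being trained, which tends to stabilize the Critic loss.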