CN112787331A - Deep reinforcement learning-based automatic power flow convergence adjusting method and system - Google Patents

Deep reinforcement learning-based automatic power flow convergence adjusting method and system Download PDF

Info

Publication number
CN112787331A
Authority
CN
China
Prior art keywords
state
power
node
reinforcement learning
convergence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110114628.4A
Other languages
Chinese (zh)
Other versions
CN112787331B (en)
Inventor
李烨
蒲天骄
王新迎
顾雨嘉
田蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Electric Power Research Institute of State Grid Ningxia Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Electric Power Research Institute of State Grid Ningxia Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, Electric Power Research Institute of State Grid Ningxia Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110114628.4A priority Critical patent/CN112787331B/en
Publication of CN112787331A publication Critical patent/CN112787331A/en
Application granted granted Critical
Publication of CN112787331B publication Critical patent/CN112787331B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The embodiment of the application discloses a method and a system for automatically adjusting power flow convergence based on deep reinforcement learning. The method comprises the following steps: acquiring power grid data to form data samples; generating a state space for the data samples according to the constructed reinforcement learning model, wherein, when the reinforcement learning model is constructed, control actions are selected to form an action space, and the state space is constructed according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed from the action space and the resulting operating states; and acquiring the current operating state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed from the current operating state, and outputting an adjustment scheme. According to the technical scheme, automatic adjustment of large-scale power grid power flow convergence can be achieved, the working intensity of operation mode setting personnel is reduced, and working efficiency is improved.

Description

Deep reinforcement learning-based automatic power flow convergence adjusting method and system
Technical Field
The embodiment of the application relates to the technical field of automatic adjustment of power flow convergence, in particular to a method and a system for automatically adjusting power flow convergence based on deep reinforcement learning.
Background
With the continuous development of science and technology, the operation mode of the power grid system has also changed greatly. Load flow calculation is the basis for simulation analysis of power grid operation modes and planning schemes. An actual large power grid has tens of thousands of nodes, a complex structure and numerous adjustable variables, so power flow convergence adjustment is difficult. The traditional power flow adjustment approach relies heavily on manual experience and consumes a great amount of manpower and time, so the efficiency of operation mode formulation is low. With the development of artificial intelligence technology, there has been some exploration and application of artificial intelligence in the power field. An artificial intelligence method for automatically adjusting power flow convergence is urgently needed in the power system to improve the efficiency of operation mode formulation work.
At present, there are initial attempts to apply artificial intelligence technology to power flow analysis, mainly involving two aspects: operation mode arrangement and management based on intelligent methods, and power flow adjustment based on expert systems. Examples include simulating operation mode arrangement based on a knowledge base and developing an intelligent decision support system for regional power grid operation mode arrangement; extracting and representing characteristic variables of the power grid operating state through cluster analysis to realize fine management of the power grid operation mode; and combining static network equivalence with an expert system to provide a wide-area power grid reactive voltage adjustment method. Such research mainly integrates expert knowledge into a knowledge base and solves practical problems by using existing expert experience; it still depends heavily on manual effort and lacks a self-exploration mechanism.
Disclosure of Invention
The embodiment of the application provides a method and a system for automatically adjusting power flow convergence based on deep reinforcement learning, so that the automatic adjustment of large-scale power grid power flow convergence is realized, the working intensity of operation mode setting personnel is reduced, and the working efficiency is improved.
In a first aspect, an embodiment of the present application provides an automatic power flow convergence adjustment method based on deep reinforcement learning, including:
acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed from the action space and the resulting operating states;
and acquiring the current operation state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme.
In the embodiment, in the construction of the reinforcement learning model, an action space construction method based on a sensitivity method is provided, and a reward function which meets the operation constraint requirement of an actual power grid is designed by combining knowledge and experience. By means of the mode of combining the reinforcement learning model and the deep learning model, automatic adjustment of large-scale power grid load flow convergence can be achieved, working intensity of operation mode setting personnel is reduced, and working efficiency is improved.
Preferably, when the control action is selected, the forming of the action space includes the following steps: according to the voltage sensitivity calculation of the load node, the sensitivity of the voltage state variable to the node injection reactive power, the transformer tap and the voltage of the generator terminal is respectively obtained; respectively sorting the voltage sensitivity of each node to obtain a sorting result of the voltage sensitivity of each node; calculating the sensitivity of the node injection power to the line power; obtaining and sequencing power sensitivity of each branch circuit; selecting adjustment quantities according to the voltage of each node and the power sensitivity sequencing result of each branch circuit to respectively form an adjustment quantity matrix of each action; determining a capacitor/reactor to be adjusted according to the calculation result of the voltage sensitivity of the load node and the sensitivity of the node injection power to the line power and the position of the node with insufficient reactive power; generating each validity matrix according to a capacitor/reactor to be adjusted and the added PV nodes obtained from the power grid topological structure; and combining the action adjustment quantity matrixes and the effectiveness matrixes into a set to form an action space.
In the embodiment, a reinforcement learning executable action set is constructed based on a sensitivity analysis method, so that the search space is reduced, and the adjustment efficiency is improved.
Preferably, in any one of the above embodiments, each action adjustment quantity matrix includes an active output adjustment quantity of the generator, an active output adjustment quantity of the generator that has a large influence on line active power, a generator terminal voltage adjustment quantity that has a large influence on node voltage, and a transformer tap adjustment quantity that has a large influence on node voltage;
each validity matrix comprises validity of an adjustable reactor which has a large influence on the node voltage, validity of an adjustable capacitor which has a large influence on the node voltage, and validity of an additional PV node.
Preferably, in any one of the above embodiments, the state space is a set of power flow states reflecting observable variables of the system state in the data samples; the observable variables include the generator output, the switching validity of the capacitors/reactors, the validity of added PV nodes, and the inter-regional exchange power.
In any of the above embodiments, preferably, when the reinforcement learning model is constructed, the method further includes designing a reward function according to the following method:
setting a reward value of load flow convergence and a penalty value of load flow non-convergence according to the load flow calculation result;
and setting a reward value when the output of the generator exceeds the limit and a reward value when the output of the balancing machine exceeds the limit according to the output constraints of the generator and the balancing machine.
Preferably, in any one of the above embodiments, the training of the deep learning model includes:
Setting a target Q function in the reinforcement learning model, constructing a target Q network and an estimation Q′ network according to the target Q function, and setting the same network activation function and loss function for the two networks respectively;
and (5) performing iterative training by adopting a DDQN algorithm until iterative convergence is stable to obtain Q network parameters.
Further preferably, the activation function is a PReLU function. In the present embodiment, the PReLU function is proposed as an improvement for the problem that the activation function gradient is 0 during back propagation in the deep neural network.
Further preferably, the target Q function is represented by the following formula:
y_j = r_j + γ·Q′(s′_j, argmax_{a′} Q(s′_j, a′, ω), ω′)
where Q(s, a, ω) is the target Q network, Q′(s′, a′, ω′) is the estimation Q network, γ is the discount factor, and r_j is the reward value.
Further preferably, the modification of the loss function is an L1 regularization term formed by the L1 norm; the L1 norm is selected as the regularization term of the loss function to alleviate model over-fitting. The loss function is represented by the following formula:
L(ω) = (1/N)·Σ_{i=1..N} [y_i − f(x_i; ω)]² + λ·‖ω‖_1
where y_i is the target Q function, f(x_i; ω) is the activation function of the initial state, λ·‖ω‖_1 is the regularization term, and N is the total number of iterations.
Another embodiment of the present invention further provides a deep reinforcement learning-based automatic power flow convergence adjustment system, including:
the environment establishing module is used for acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed from the action space and the resulting operating states;
and the deep learning model calculation module is used for acquiring the current operation state of the power grid system from the state space, adjusting the load flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme.
According to the technical scheme provided by the embodiment of the application, automatic adjustment of large power grid load flow convergence is realized by adopting a deep reinforcement learning model; an action space construction method based on a sensitivity method is provided in the construction of the reinforcement learning model, and a reward function that meets the actual power grid operation constraint requirements is designed by combining knowledge and experience. Aiming at the problem that the gradient is 0 during back propagation of the deep neural network activation function, the PReLU function is proposed as an improvement; and the L1 norm is selected as the regularization term of the loss function to alleviate model over-fitting. The method can realize automatic adjustment of large-scale power grid power flow convergence, reduce the working intensity of operation mode setting personnel and improve working efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an automatic power flow convergence adjustment method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a line power flow profile provided by an embodiment of the present application;
fig. 3 is a schematic diagram of a transformer branch power flow distribution provided by an embodiment of the present application;
FIG. 4 is a flow chart of the construction of an action space provided by an embodiment of the present application;
FIG. 5 is a flow chart of a DDQN algorithm provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of an automatic power flow convergence adjustment system according to an embodiment of the present application.
Detailed Description
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
With the development of artificial intelligence technology, machine learning algorithms make it possible for a machine to learn the solution to a problem by itself, with little or no human intervention. Power flow adjustment is a discrete-action problem, and optimal-value algorithms represented by Q-learning in reinforcement learning are suitable for handling discrete-action problems. In addition, combining deep learning with reinforcement learning can improve the fitting efficiency and fitting accuracy of the reinforcement learning value function, thereby providing a possible solution for realizing large-scale automatic power grid power flow adjustment.
Fig. 1 is a flowchart of a method for automatically adjusting power flow convergence based on deep reinforcement learning according to an embodiment of the present application, where the present embodiment is applicable to a case of automatically adjusting power flow, and the method may be executed by an apparatus for automatically adjusting power flow convergence, which may be implemented by software and/or hardware and may be integrated in an electronic device.
As shown in fig. 1, the method for automatically adjusting power flow convergence includes:
s110, acquiring power grid data to form a data sample;
s120, generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions by using a sensitivity analysis method to form an action space; constructing a state space according to the trend states of a plurality of data samples in the data set and the mapping relation between the execution control action in the action space and the formed working state;
firstly, a reinforcement learning model environment is constructed, and the construction mainly comprises the construction of a state space, the construction of an action space, the design of a reward function, the design of a target Q function and the like.
In this embodiment, optionally, the process of constructing the state space includes:
selecting observable variables reflecting the system state; the observable variables include: the output of the generator, the switching effectiveness of the capacitor/reactor, the effectiveness of adding PV nodes and the exchange power among the areas;
in the power flow convergence adjustment, the state space needs to be capable of reflecting the active power and reactive power balance characteristics of the system and the relevant characteristics of the region. By combining the above considerations, the observable variables capable of reflecting the system state are selected to mainly comprise the output of the generator, the switching effectiveness of the capacitor/reactor, the effectiveness of the added PV node, the exchange power between the areas and the like. The tidal current states of a plurality of samples jointly form a state space, as shown in formula (1):
S = {s_1, s_2, …, s_M},  s_i = [P_{G,1}^i, …, P_{G,x}^i, E_{CR,1}^i, …, E_{CR,N_CR}^i, E_{PV,1}^i, …, E_{PV,N_PV}^i, P_{EX,1}^i, …, P_{EX,N_E}^i]   (1)
where P_{G,j}^i is the active output of generator j in the ith sample; E_{CR,k}^i is the validity of capacitor/reactor k in the ith sample; E_{PV,l}^i is the validity of added PV node l in the ith sample; and P_{EX,m}^i is the exchange power of region m in the ith sample (j = 1, …, x; k = 1, …, N_CR; l = 1, …, N_PV; m = 1, …, N_E). Here x, N_CR, N_PV and N_E are respectively the numbers of generators, capacitors/reactors, added PV nodes and regional exchange channels, and M is the number of samples.
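By way of illustration only, the sketch below assembles such a state vector s_i and stacks M samples into the state space S of equation (1). The container PowerFlowCase and its field names (gen_p, cap_status, pv_added, tie_flow) are hypothetical placeholders, not part of the patent.

```python
import numpy as np

def build_state(case):
    """Assemble the observable state vector s_i of one sample (equation (1)).
    `case` is a hypothetical container holding, for one solved power-flow sample:
      gen_p      : active output of each generator             (length x)
      cap_status : 0/1 validity of each capacitor/reactor      (length N_CR)
      pv_added   : 0/1 validity of each added PV node          (length N_PV)
      tie_flow   : exchange power of each regional tie channel (length N_E)"""
    return np.concatenate([
        np.asarray(case.gen_p, dtype=float),
        np.asarray(case.cap_status, dtype=float),
        np.asarray(case.pv_added, dtype=float),
        np.asarray(case.tie_flow, dtype=float),
    ])

def build_state_space(cases):
    """Stack the power flow states of M samples into the state space S."""
    return np.vstack([build_state(c) for c in cases])
```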
As shown in fig. 4, in this embodiment, optionally, the constructing process of the action space includes:
selecting, by a sensitivity analysis method, the generator terminal voltages, transformer taps and node injected reactive powers that have a large influence on the node voltages, and the node injected active powers that have a large influence on line active power changes;
and determining the capacitor/reactor to be adjusted according to the reactive compensation device near the node and the sensitivity analysis result.
For the active power of a given line, not every generator's active output adjustment has an obvious influence on it; likewise, not every generator terminal voltage, capacitor/reactor or transformer tap has a significant effect on the voltage of a given node when adjusted. Therefore, to adjust the power flow more efficiently, appropriate adjustable variables must be selected to form the action space of the system, thereby improving the search efficiency of the learning process.
By adopting a sensitivity analysis method, the generator terminal voltages, transformer taps and node injected reactive powers with a large influence on node voltages, and the node injected powers with a large influence on line active power changes, are selected. For the selection of capacitors/reactors, the nodes with insufficient reactive power are mostly heavily loaded buses with multiple outgoing lines, so the reactive power compensation devices near these nodes are of primary concern, and the capacitors/reactors to be adjusted are determined in combination with the sensitivity analysis results.
The forming of the action space comprises the following steps:
s410, calculating according to the voltage sensitivity of the load node, and respectively obtaining the sensitivity of the voltage state variable to the node injection reactive power, the voltage of a transformer tap and the voltage of the generator terminal; respectively sorting the voltage sensitivity of each node to obtain a sorting result of the voltage sensitivity of each node;
s420, calculating the sensitivity of the node injection power to the line power; obtaining and sequencing power sensitivity of each branch circuit;
s430, selecting adjustment quantities according to the voltage of each node and the power sensitivity sequencing result of each branch, and respectively forming an adjustment quantity matrix of each action;
s440, determining a capacitor/reactor to be adjusted according to the calculation result of the voltage sensitivity of the load node and the sensitivity of the node injection power to the line power and the position of the node with insufficient reactive power;
s450, generating each validity matrix according to the capacitor/reactor to be adjusted and the added PV nodes obtained from the power grid topological structure;
and S460, combining the action adjustment quantity matrixes and the effectiveness matrixes into a set to form an action space.
The sensitivities are calculated as follows. As shown in fig. 2, the power transmitted on the transmission line is derived from its complex power expression (equation (2), image not reproduced), which is transformed into polar form (equation (3)) and whose real part and imaginary part are separated to give the line active power (equation (4)) and reactive power (equation (5)). For transmission lines, G_ij << B_ij, θ_ij ≈ 0 and G_ij·sinθ_ij << B_ij·cosθ_ij, so equation (5) is simplified to equation (6).
Fig. 3 is a schematic diagram of a power flow distribution of a transformer branch according to an embodiment of the present application, and as shown in fig. 3, power transmitted by the transformer branch is derived as follows:
(equation (7), image not reproduced)
Substituting the branch voltage relation and Y_T = G_ij + jB_ij into equation (7) and expressing it in polar coordinates gives equation (8), whose real part and imaginary part are separated into the branch active power (equation (9)) and reactive power (equation (10)). Since θ_ij ≈ 0 and hence G_ij·sinθ_ij ≈ 0, equation (10) can be simplified to equation (11).
The reactive power injected into each node is obtained from equations (6) and (11) (equation (12)), which is expressed in partitioned form as:
Q = [Q_D, Q_C]^T   (13)
where Q_D is the reactive power of the load and generator nodes, and Q_C is the reactive power injected at the reactive power compensation nodes.
With the generator terminal voltage U_G and the transformer tap T taken as control variables and the load node voltage U_L as the state variable, the respective sensitivities are obtained as follows: the sensitivity of the voltage state variables to node injected reactive power (equation (14)), the sensitivity of the voltage state variables to the transformer taps (equation (15)), and the sensitivity of the voltage state variables to the generator terminal voltage (equation (16)); the sensitivity matrix images are not reproduced here.
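A minimal sketch of the node-ranking step that uses these sensitivities is given below; it assumes the sensitivity matrices of equations (14)-(16) have already been computed (for example from the reduced power-flow Jacobian), which is an assumption of this illustration rather than a procedure specified by the patent.

```python
import numpy as np

def rank_controls_for_node(S, node_idx, top_k=50):
    """Rank control variables by their influence on one load-node voltage.
    S is one of the sensitivity matrices of equations (14)-(16): rows are
    load-node voltages, columns are control variables (injected Q, taps or
    generator terminal voltages). Returns the column indices of the top_k
    controls by |dV/du| (the patent keeps the top 50)."""
    return np.argsort(-np.abs(S[node_idx, :]))[:top_k]
```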
(1) Derivation of node injected power to line power sensitivity
For a high-voltage transmission network, the reactance is far larger than the resistance, so the resistance is neglected, and the generators and loads in the power grid are represented as node injection currents. The node voltage equation of the network is
I_N = Y_N·U_N   (17)
where I_N is the node injected current column vector (with current flowing into the node taken as positive), U_N is the node voltage column vector, and Y_N is the node susceptance matrix.
The relationship between the network branch currents and the node voltages is
I_B = Y_B·A^T·U_N   (18)
where I_B is the branch current column vector, Y_B is the branch susceptance matrix, and A is the node-branch incidence matrix.
From formulas (17) and (18):
I_B = Y_B·A^T·Y_N^(-1)·I_N   (19)
The grid correlation coefficient matrix C(λ) is defined as:
C(λ) = Y_B·A^T·Y_N^(-1)   (20)
Taking branch k as an example, its current vector I_{k,B} is a linear combination of the injected currents at each node:
I_{k,B} = λ_{k-1}·I_{1,N} + … + λ_{k-i}·I_{i,N} + … + λ_{k-n}·I_{n,N}   (21)
where the correlation between I_{k,B} and the injected current I_{i,N} of node i is λ_{k-i}; the larger |λ_{k-i}| is, the larger the influence of a change in the injected current of node i on the current of branch k.
Processing equation (21) yields the branch power relation (equation (22), image not reproduced), where U_{k,B} is the head-end voltage vector of branch k and U_{i,N} is the voltage vector of node i. Expanding equation (22) gives equation (23), in which P_{k,B} and Q_{k,B} are respectively the active power and reactive power of branch k; P_{i,N} and Q_{i,N} are respectively the injected active power and injected reactive power of node i; U_{k,B} and θ_{k,B} are respectively the voltage magnitude and phase angle at the head end of branch k; and U_{i,N} and θ_{i,N} are respectively the voltage magnitude and phase angle of node i.
Taking the real part of the expansion of equation (23) and writing it in incremental form, and considering that, in the emergency control, the node injected reactive power variation is 0, equation (24) is obtained (image not reproduced). The power sensitivity β_{k-i} between the power variation of branch k and the injected power variation of node i is then given by equation (25) (image not reproduced).
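Under the lossless-network assumptions of equations (17)-(21), the correlation matrix C(λ) and a branch-to-node influence ranking could be sketched as follows; this illustrates only the ranking idea and does not reproduce the exact expressions (22)-(25).

```python
import numpy as np

def branch_node_correlation(Y_B, A, Y_N):
    """Grid correlation coefficient matrix C(lambda) = Y_B * A^T * Y_N^(-1)
    (equation (20)); entry (k, i) is lambda_{k-i}, the weight of node i's
    injection in the current of branch k. Assumes Y_N is invertible
    (e.g. with the slack node removed)."""
    return Y_B @ A.T @ np.linalg.inv(Y_N)

def rank_nodes_for_branch(C, branch_k, top_k=50):
    """Nodes whose injected power most affects branch k, ranked by |lambda_{k-i}|."""
    return np.argsort(-np.abs(C[branch_k, :]))[:top_k]
```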
and sequencing the results obtained by the sensitivity analysis method, and taking the variable of 50 th before sequencing of the voltage of each node and the power sensitivity of each branch as a corresponding adjusting variable. And determining the comprehensive sensitivity analysis result of the adjustable capacitor/reactor and the distribution of heavy-load and multi-outgoing-line buses. Additionally, the effectiveness of adding PV nodes is also considered. The action space is constructed as follows:
A = {ΔP_G, ΔP_{Gk}, ΔV_i, ΔT_i, E_{L,i}, E_{C,i}, E_{PV}}   (26)
where ΔP_G is the active output adjustment of the generators and N_G is the number of generators in the system; ΔP_{Gk} is the active output adjustment of the generators that have a large influence on the active power of line k; ΔV_i is the terminal voltage adjustment of the generators that have a large influence on the voltage of node i; ΔT_i is the tap adjustment of the transformers that have a large influence on the voltage of node i; E_{L,i} is the validity of the adjustable reactors that have a large influence on the voltage of node i; E_{C,i} is the validity of the adjustable capacitors that have a large influence on the voltage of node i; and E_{PV} is the validity of each of the N_PV added PV nodes.
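A hedged sketch of steps S410-S460 is shown below: it combines the sensitivity rankings into a discrete action set in the spirit of equation (26). All argument names, the per-action step size and the candidate lists are assumptions for illustration, not details given by the patent.

```python
import numpy as np

def build_action_space(S_VQ, S_VT, S_VUG, C, weak_nodes, heavy_lines,
                       cap_candidates, pv_candidates, step=0.1, top_k=50):
    """Sketch of steps S410-S460: rank the sensitivities, keep the top-ranked
    variables and assemble a discrete action set in the spirit of equation (26).
    S_VQ/S_VT/S_VUG are assumed to be the matrices of equations (14)-(16), C the
    matrix of equation (20), weak_nodes the nodes with insufficient reactive
    power, heavy_lines the lines to be relieved, and `step` an assumed
    per-action adjustment increment (p.u.)."""
    top = lambda row: np.argsort(-np.abs(row))[:top_k]
    actions = []
    for i in weak_nodes:
        actions += [("delta_Q",  int(c), +step) for c in top(S_VQ[i])]    # node injected reactive power
        actions += [("delta_T",  int(c), +step) for c in top(S_VT[i])]    # transformer taps
        actions += [("delta_Vg", int(c), +step) for c in top(S_VUG[i])]   # generator terminal voltages
    for k in heavy_lines:
        actions += [("delta_Pg", int(c), +step) for c in top(C[k])]       # generator active output
    actions += [("toggle_cap", c, None) for c in cap_candidates]          # capacitor/reactor validity
    actions += [("toggle_pv",  p, None) for p in pv_candidates]           # added PV node validity
    return actions
```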
In this embodiment, the designing of the reward function includes:
setting a reward value of load flow convergence and a penalty value of load flow non-convergence according to the load flow calculation result;
and setting a reward value when the output of the generator exceeds the limit and a reward value when the output of the balancing machine exceeds the limit according to the output constraints of the generator and the balancing machine.
Specifically, the reward function is designed with terms r_1 to r_6 covering power flow convergence, the generator and balancing machine output constraints and the regional exchange power constraint (equation (27), image not reproduced), where P_{Gi} and P_{Li} are respectively the active power of the ith generator and the ith load in the region; N_G and N_L are respectively the numbers of generators and loads; P_{EX} is the regional exchange power; P_{EXmax} is the upper limit of the regional exchange power; and P_{Gimax} and P_{Bimax} are respectively the output upper limits of the ith generator and the ith balancing machine.
The final reward is
r = r_1 + r_2 + r_3 + r_4 + r_5 + r_6   (28)
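The numerical coefficients of r_1 to r_6 appear only in the equation image and are not reproduced here, so the sketch below uses placeholder values; only the structure (a convergence reward plus penalties for generator, balancing machine and exchange power limit violations, summed as in equation (28)) follows the description above.

```python
def reward(converged, gen_p, gen_p_max, bal_p, bal_p_max, p_ex, p_ex_max):
    """Hedged sketch of equations (27)-(28); the +1/-1/-0.5 coefficients are
    placeholders, not the patent's actual values."""
    r1 = 1.0 if converged else -1.0                                        # power flow convergence
    r2 = -0.5 if any(p > pm for p, pm in zip(gen_p, gen_p_max)) else 0.0   # generator output limits
    r3 = -0.5 if any(p > pm for p, pm in zip(bal_p, bal_p_max)) else 0.0   # balancing machine limits
    r4 = -0.5 if abs(p_ex) > p_ex_max else 0.0                             # regional exchange power limit
    r5 = r6 = 0.0                                                          # remaining terms of (27), not reproduced
    return r1 + r2 + r3 + r4 + r5 + r6                                     # equation (28)
```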
S130, acquiring the current operating state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed from the current operating state, and outputting an adjustment scheme. The current operating state is input into the trained deep learning model, whose final network parameters are formed through iterative training.
In a specific embodiment, the deep learning network is trained as follows.
A target Q network and an estimation Q′ network are constructed according to the target Q function, and the same network activation function and loss function are set for the two networks. The target Q function determines the expected value of the reward obtained by taking a particular action in a particular state. To avoid over-estimation, a Double Q-learning target is used:
y_j = r_j + γ·Q′(s′_j, argmax_{a′} Q(s′_j, a′, ω), ω′)   (29)
In Double Q-learning, the optimal action selection and the optimal action evaluation are performed by different value functions, so two Q functions are adopted: Q(s, a, ω) and Q′(s′, a′, ω′). γ is the discount factor.
Iterative training is then performed by adopting the DDQN algorithm until the iteration converges stably, to obtain the Q network parameters.
The deep learning model is constructed by adopting the PReLU function as the activation function and a loss function with an L1-norm regularization term.
The construction of the deep learning model mainly concerns the selection of the activation function and the modification of the loss function. The activation function adopts the PReLU function:
f(x) = max(αx, x),  α ∈ (0, 1)   (30)
The modification of the loss function is the L1 regularization term formed by the L1 norm; the loss function L(ω) is:
L(ω) = (1/N)·Σ_{i=1..N} [y_i − f(x_i; ω)]² + λ·‖ω‖_1   (31)
where λ is the weight of the regularization term.
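A minimal PyTorch-style sketch of equations (29)-(31) is given below; the framework, layer sizes and the γ and λ values are assumptions of this illustration (the patent does not prescribe them).

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q network with PReLU activations (equation (30)); layer sizes are arbitrary."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.PReLU(),
            nn.Linear(hidden, hidden), nn.PReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def ddqn_loss(q_net, q_prime_net, batch, gamma=0.95, lam=1e-4):
    """Double-DQN target (equation (29)) with an L1-regularized loss (equation (31)).
    The network with parameters w (q_net) selects the action, the network with
    parameters w' (q_prime_net) evaluates it."""
    s, a, r, s_next, done = batch                   # a: int64 action indices, done: 0/1 floats
    with torch.no_grad():
        a_star = q_net(s_next).argmax(dim=1, keepdim=True)                # argmax_a' Q(s', a', w)
        y = r + gamma * (1.0 - done) * q_prime_net(s_next).gather(1, a_star).squeeze(1)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    l1 = sum(p.abs().sum() for p in q_net.parameters())                   # ||w||_1 regularization term
    return ((y - q_sa) ** 2).mean() + lam * l1
```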
As shown in fig. 5, training proceeds as follows.
3.1 Set the total number of training iteration rounds to N, initialize the experience replay space D, and initialize the target Q network and estimation Q′ network parameters ω and ω′, with ω′ = ω. Starting from power flow adjustment step j (j = 1), the state variable of the system environment is s_j, and action a_j is generated according to an ε-greedy strategy.
3.2 Execute action a_j to obtain the reward r_j, the new state s_{j+1}, and the termination flag is_end (the process terminates if the power flow converges, and continues otherwise).
3.3 Store the 5-tuple {s_j, a_j, r_j, s_{j+1}, is_end} (the current state s_j, action a_j, reward r_j, next state s_{j+1} and termination flag is_end) into the experience replay space D.
3.4 Randomly draw m samples {s_j, a_j, r_j, s_{j+1}, is_end} from the experience replay space D and check whether is_end indicates the final state. If so, exit the loop with y_j = r_j; if not, calculate the target Q value according to equation (29) and update the Q network parameters with the loss function of equation (31).
3.5 Update the state and set j = j + 1.
3.6 Continue the above process until the iteration converges stably, then save the model.
3.7 Use the trained model to automatically adjust the power flow convergence of the power grid: the operating state of the system is input into the trained Q network, which quickly generates an adjustment scheme to achieve power flow convergence.
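Steps 3.1-3.7 could be organized as in the sketch below, reusing the hypothetical ddqn_loss from the previous sketch; the environment interface (reset/step wrapping the power flow calculation and the reward of equation (28)), the ε value and the target-network refresh interval are assumptions for illustration.

```python
import random
from collections import deque

import torch

def train_ddqn(env, q_net, q_prime_net, optimizer, ddqn_loss,
               episodes=200, batch_size=32, eps=0.1, sync_every=50):
    """Sketch of steps 3.1-3.6. `env` is an assumed wrapper around the power flow
    calculation: reset() returns an initial state, step(a) applies one adjustment
    action and returns (next_state, reward, is_end)."""
    replay = deque(maxlen=10_000)                     # experience replay space D
    updates = 0
    for _ in range(episodes):
        s, is_end = env.reset(), False
        while not is_end:
            # 3.1 epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(env.n_actions)
            else:
                with torch.no_grad():
                    a = int(q_net(torch.as_tensor(s, dtype=torch.float32)).argmax())
            # 3.2-3.3 execute the action and store the 5-tuple in D
            s_next, r, is_end = env.step(a)
            replay.append((s, a, r, s_next, float(is_end)))
            s = s_next
            # 3.4 sample a mini-batch and update the Q network parameters
            if len(replay) >= batch_size:
                cols = list(zip(*random.sample(replay, batch_size)))
                s_b, sn_b = (torch.as_tensor(c, dtype=torch.float32) for c in (cols[0], cols[3]))
                a_b = torch.as_tensor(cols[1], dtype=torch.int64)
                r_b, d_b = (torch.as_tensor(c, dtype=torch.float32) for c in (cols[2], cols[4]))
                loss = ddqn_loss(q_net, q_prime_net, (s_b, a_b, r_b, sn_b, d_b))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                updates += 1
                if updates % sync_every == 0:         # periodically copy w into w'
                    q_prime_net.load_state_dict(q_net.state_dict())
    # 3.7 inference: feed the current operating state to the trained Q network and
    # greedily pick adjustment actions until the power flow converges.
```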
The power balance comprises an active power balance part and a reactive power balance part. Active power balance is adjusted by adjusting the active output of the generators; reactive power balance can be adjusted by adjusting the generator terminal voltages (that is, the injected reactive power of the generator nodes), adding PV nodes and switching capacitors/reactors. For the rationality of the power flow distribution, out-of-limit branch power is brought back within a reasonable range by adjusting the generators that have a larger influence on the line power.
The technical scheme provided by the embodiment of the application is applied to the field of automatic power flow convergence adjustment. Aiming at the low efficiency and high labor intensity of manually formulating operation modes for an actual power grid, an automatic power flow convergence adjustment method based on deep reinforcement learning is provided. Variables capable of reflecting the system state are selected based on manual experience to form the state space; a reinforcement-learning executable action set is established by using a sensitivity analysis method, reducing the action space; and a reward function is established that considers operating experience and actual power grid constraints, thereby forming the reinforcement learning environment. The improved PReLU function is adopted as the activation function, avoiding the problem that the gradient is 0 when the input is negative; and an L1-norm regularization term is added to the loss function to alleviate over-fitting, thereby forming the deep learning model. The DDQN algorithm, which combines the deep learning and reinforcement learning models, is adopted to automatically adjust the power flow convergence, which can greatly improve the efficiency of operation mode adjustment work; and the executable action set selection based on the sensitivity method reduces the action space and improves the search efficiency.
Fig. 6 is a schematic structural diagram of an automatic power flow convergence adjusting device based on deep reinforcement learning according to an embodiment of the present application. As shown in fig. 6, the apparatus includes:
the environment establishing module 610 is used for acquiring power grid data to form a data sample; generating a state space and an action space for the data set according to the constructed reinforcement learning model; calculating the reward value of the control action in the action space according to the set reward function; when the reinforcement learning model is constructed, selecting control actions by using a sensitivity analysis method to form an action space; constructing a state space according to the trend states of a plurality of data samples in the data set; setting a reward function to calculate a reward expected value after the action is executed; and setting a target Q function, and calculating the network parameters of each iteration by using the loss function.
The deep learning model calculation module 620 is used for acquiring the current operation state of the power grid system from the state space, inputting the current operation state into the trained deep learning model, adjusting load flow convergence by using the final network parameters formed by iterative training, and outputting an adjustment scheme;
When the deep learning model is trained, the current state s is input into the deep learning model, a control action a is selected from the action space according to an ε-greedy strategy and executed, and the new state s′ formed after executing action a is obtained through power flow calculation;
whether the state s′ is a final state is judged, a reward r is given according to the reward function, and the data are stored into the experience replay space D as a vector (s, a, r, s′); if the final state is not reached, the expected reward value is calculated by using the target Q function, the state is updated, and adjustment actions continue to be selected until the iteration converges stably; if the final state is reached, the power flow convergence adjustment is finished and an adjustment scheme is output.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.

Claims (10)

1. A power flow convergence automatic adjustment method based on deep reinforcement learning is characterized by comprising the following steps:
acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed from the action space and the resulting operating states;
and acquiring the current operation state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme.
2. The method for automatically adjusting power flow convergence based on deep reinforcement learning according to claim 1, wherein the selecting of the control actions and forming of the action space comprises the following steps:
according to the voltage sensitivity calculation of the load node, the sensitivity of the voltage state variable to the node injection reactive power, the transformer tap and the voltage of the generator terminal is respectively obtained; respectively sorting the voltage sensitivity of each node to obtain a sorting result of the voltage sensitivity of each node;
calculating the sensitivity of the node injection power to the line power; obtaining and sequencing power sensitivity of each branch circuit;
selecting adjustment quantities according to the voltage of each node and the power sensitivity sequencing result of each branch circuit to respectively form an adjustment quantity matrix of each action;
determining a capacitor/reactor to be adjusted according to the calculation result of the voltage sensitivity of the load node and the sensitivity of the node injection power to the line power and the position of the node with insufficient reactive power;
generating each validity matrix according to a capacitor/reactor to be adjusted and the added PV nodes obtained from the power grid topological structure;
and combining the action adjustment quantity matrixes and the effectiveness matrixes into a set to form an action space.
3. The method according to claim 2, wherein the action adjustment matrix comprises generator active output adjustment quantities, generator active output adjustment quantities with a large influence on line active power, generator terminal voltage adjustment quantities with a large influence on node voltage, and transformer tap adjustment quantities with a large influence on node voltage;
each validity matrix comprises validity of an adjustable reactor which has a large influence on the node voltage, validity of an adjustable capacitor which has a large influence on the node voltage, and validity of an additional PV node.
4. The method for automatically adjusting power flow convergence based on deep reinforcement learning according to claim 1, wherein the state space is a set of power flow states reflecting observable variables of the system state in the data samples; the observable variables comprise the generator output, the switching validity of the capacitors/reactors, the validity of added PV nodes and the inter-regional exchange power.
5. The method for automatically adjusting power flow convergence based on deep reinforcement learning according to claim 1, wherein the construction of the reinforcement learning model further comprises designing a reward function according to the following method:
setting a reward value of load flow convergence and a penalty value of load flow non-convergence according to the load flow calculation result;
and setting a reward value when the output of the generator exceeds the limit and a reward value when the output of the balancing machine exceeds the limit according to the output constraints of the generator and the balancing machine.
6. The method for automatically adjusting power flow convergence based on deep reinforcement learning according to claim 5, wherein the deep learning model is trained by a method comprising:
Setting a target Q function in the reinforcement learning model, constructing a target Q network and an estimation Q′ network according to the target Q function, and setting the same network activation function and loss function for the two networks respectively;
selecting and executing a control action a from the action space; acquiring a current state s from the state space and a new state s' formed after the control action a is executed; a reward value r for performing the control action a; and a judgment result is _ end for judging whether the state s' is the final state; forming a power flow data vector quintuple (s, a, r, s', is _ end);
inputting the power flow data vector quintuple (s, a, r, s′, is_end) into the deep learning model for iterative training, wherein each iteration comprises the following steps: judging whether s′ is the final state; if not, calculating an expected reward value by using the target Q function, and updating the Q network parameters by using the loss function; updating the new state s′ to the current state, and continuing to select a control action to form a new power flow data vector quintuple until the iteration converges stably; and if the final state is reached, finishing the power flow convergence adjustment and outputting the final network parameters.
7. The method for automatically adjusting power flow convergence based on deep reinforcement learning according to claim 6, wherein the target Q function is represented by the following formula:
y_j = r_j + γ·Q′(s′_j, argmax_{a′} Q(s′_j, a′, ω), ω′)
where Q(s, a, ω) is the target Q network, Q′(s′, a′, ω′) is the estimation Q network, γ is the discount factor, and r_j is the reward value.
8. The method for automatically adjusting power flow convergence based on deep reinforcement learning according to claim 6, wherein the activation function is a PReLU function.
9. The method according to claim 6, wherein the modification of the loss function is an L1 regularization term formed by the L1 norm, the loss function being represented by the following formula:
L(ω) = (1/N)·Σ_{i=1..N} [y_i − f(x_i; ω)]² + λ·‖ω‖_1
where y_i is the target Q function; f(x_i; ω) is the activation function of the initial state; λ·‖ω‖_1 is the regularization term; and N is the total number of iterations.
10. An automatic power flow convergence adjusting system based on deep reinforcement learning is characterized by comprising:
the environment establishing module is used for acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed from the action space and the resulting operating states;
and the deep learning model calculation module is used for acquiring the current operation state of the power grid system from the state space, adjusting the load flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme.
CN202110114628.4A 2021-01-27 2021-01-27 Deep reinforcement learning-based automatic power flow convergence adjusting method and system Active CN112787331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110114628.4A CN112787331B (en) 2021-01-27 2021-01-27 Deep reinforcement learning-based automatic power flow convergence adjusting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110114628.4A CN112787331B (en) 2021-01-27 2021-01-27 Deep reinforcement learning-based automatic power flow convergence adjusting method and system

Publications (2)

Publication Number Publication Date
CN112787331A true CN112787331A (en) 2021-05-11
CN112787331B CN112787331B (en) 2022-06-14

Family

ID=75759183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110114628.4A Active CN112787331B (en) 2021-01-27 2021-01-27 Deep reinforcement learning-based automatic power flow convergence adjusting method and system

Country Status (1)

Country Link
CN (1) CN112787331B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880932A (en) * 2022-05-12 2022-08-09 中国电力科学研究院有限公司 Power grid operating environment simulation method, system, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209710A (en) * 2020-01-07 2020-05-29 中国电力科学研究院有限公司 Automatic adjustment method and device for load flow calculation convergence
CN111224404A (en) * 2020-04-07 2020-06-02 江苏省电力试验研究院有限公司 Power flow rapid control method for electric power system with controllable phase shifter
US20200257971A1 (en) * 2019-01-11 2020-08-13 Chongqing University Full-linear model for optimal power flow of integrated power and natural-gas system based on deep learning methods
CN112001066A (en) * 2020-07-30 2020-11-27 四川大学 Deep learning-based method for calculating limit transmission capacity

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200257971A1 (en) * 2019-01-11 2020-08-13 Chongqing University Full-linear model for optimal power flow of integrated power and natural-gas system based on deep learning methods
CN111209710A (en) * 2020-01-07 2020-05-29 中国电力科学研究院有限公司 Automatic adjustment method and device for load flow calculation convergence
CN111224404A (en) * 2020-04-07 2020-06-02 江苏省电力试验研究院有限公司 Power flow rapid control method for electric power system with controllable phase shifter
CN112001066A (en) * 2020-07-30 2020-11-27 四川大学 Deep learning-based method for calculating limit transmission capacity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王甜婧 et al.: "Automatic adjustment method for power flow calculation convergence of large power grids based on knowledge and experience and deep reinforcement learning", Proceedings of the CSEE *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880932A (en) * 2022-05-12 2022-08-09 中国电力科学研究院有限公司 Power grid operating environment simulation method, system, equipment and medium

Also Published As

Publication number Publication date
CN112787331B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN107437813B (en) Power distribution network reactive power optimization method based on cuckoo-particle swarm
CN106295880A (en) A kind of method and system of power system multi-objective reactive optimization
CN112507614B (en) Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN114362196B (en) Multi-time-scale active power distribution network voltage control method
El Helou et al. Fully decentralized reinforcement learning-based control of photovoltaics in distribution grids for joint provision of real and reactive power
CN114217524A (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
Zou Design of reactive power optimization control for electromechanical system based on fuzzy particle swarm optimization algorithm
CN113300380B (en) Load curve segmentation-based power distribution network reactive power optimization compensation method
CN107516892A (en) The method that the quality of power supply is improved based on processing active optimization constraints
CN114784823A (en) Micro-grid frequency control method and system based on depth certainty strategy gradient
CN116629461B (en) Distributed optimization method, system, equipment and storage medium for active power distribution network
CN115940294B (en) Multi-stage power grid real-time scheduling strategy adjustment method, system, equipment and storage medium
CN115313403A (en) Real-time voltage regulation and control method based on deep reinforcement learning algorithm
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN112787331B (en) Deep reinforcement learning-based automatic power flow convergence adjusting method and system
CN109449994B (en) Power regulation and control method for active power distribution network comprising flexible interconnection device
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
CN108964099B (en) Distributed energy storage system layout method and system
CN117335494A (en) Cluster division-based power distribution network source-network-load multi-level collaborative planning method
CN116436003A (en) Active power distribution network risk constraint standby optimization method, system, medium and equipment
CN113270869B (en) Reactive power optimization method for photovoltaic power distribution network
CN116523327A (en) Method and equipment for intelligently generating operation strategy of power distribution network based on reinforcement learning
Li et al. Research on dynamic switch migration strategy based on fmopso
Wang Grid Voltage Control Method Based on Generator Reactive Power Regulation Using Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant