CN112787331B - Deep reinforcement learning-based automatic power flow convergence adjusting method and system - Google Patents
- Publication number
- CN112787331B (application CN202110114628.4A)
- Authority
- CN
- China
- Prior art keywords
- power
- node
- state
- voltage
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
Abstract
The embodiment of the application discloses a method and a system for automatically adjusting power flow convergence based on deep reinforcement learning. The method comprises the following steps: acquiring power grid data to form data samples; generating a state space for the data samples according to the constructed reinforcement learning model, wherein, when the reinforcement learning model is constructed, control actions are selected to form an action space, and the state space is constructed according to the power flow states of a plurality of data samples in the data set and the mapping relation between executing a control action in the action space and the resulting working state; and acquiring the current operating state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed from the current operating state, and outputting an adjustment scheme. According to the technical scheme, automatic adjustment of large-scale power grid power flow convergence can be achieved, the working intensity of operation-mode setting personnel is reduced, and the working efficiency is improved.
Description
Technical Field
The embodiment of the application relates to the technical field of automatic adjustment of power flow convergence, in particular to a method and a system for automatically adjusting power flow convergence based on deep reinforcement learning.
Background
With the continuous development of science and technology, the operation mode of the power grid system has also changed greatly. Load flow calculation is the basis for simulation analysis of power grid operation modes and planning schemes. An actual large power grid has tens of thousands of nodes, a complex structure and numerous adjustable variables, which makes load flow convergence adjustment difficult. The traditional power flow adjustment approach depends heavily on manual experience and consumes a great amount of manpower and time, so the efficiency of formulating operation modes is low. With the development of artificial intelligence technology, artificial intelligence has seen some exploration and application in the field of electric power. An artificial intelligence method for automatically adjusting power flow convergence is urgently needed in power systems to improve the efficiency of operation-mode formulation work.
At present, there have been initial attempts to apply artificial intelligence technology to power flow analysis, mainly involving two aspects: operation mode arrangement and management based on intelligent methods, and power flow adjustment based on expert systems. Examples include simulating operation mode arrangement based on a knowledge base and developing an intelligent decision support system for regional power grid operation mode arrangement; extracting and representing characteristic variables of the power grid operating state through cluster analysis to realize fine management of the power grid operation mode; and combining static network equivalence with an expert system to provide a wide-range power grid reactive voltage adjustment method. This research mainly integrates expert knowledge into a knowledge base and solves practical problems using existing expert experience; it still depends heavily on manpower and lacks a self-exploration mechanism.
Disclosure of Invention
The embodiment of the application provides a method and a system for automatically adjusting power flow convergence based on deep reinforcement learning, so that the automatic adjustment of large-scale power grid power flow convergence is realized, the working intensity of operation mode setting personnel is reduced, and the working efficiency is improved.
In a first aspect, an embodiment of the present application provides an automatic power flow convergence adjustment method based on deep reinforcement learning, including:
acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; and constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between executing a control action in the action space and the resulting working state;
and acquiring the current operation state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme.
In this embodiment, in the construction of the reinforcement learning model, an action space construction method based on a sensitivity method is provided, and a reward function meeting the operation constraint requirements of an actual power grid is designed by combining knowledge and experience. By combining the reinforcement learning model with the deep learning model, automatic adjustment of large-scale power grid load flow convergence can be achieved, the working intensity of operation-mode setting personnel is reduced, and the working efficiency is improved.
Preferably, when the control actions are selected, forming the action space comprises the following steps: according to the voltage sensitivity calculation of the load nodes, respectively obtaining the sensitivity of the voltage state variable to node-injected reactive power, to the transformer taps and to the generator terminal voltages; sorting the voltage sensitivities of each node to obtain the node voltage sensitivity ranking; calculating the sensitivity of node injected power to line power, and obtaining and sorting the power sensitivity of each branch; selecting adjustment quantities according to the ranking results of the node voltage sensitivities and the branch power sensitivities to respectively form the adjustment quantity matrix of each action; determining the capacitors/reactors to be adjusted according to the load-node voltage sensitivity results, the sensitivity of node injected power to line power, and the positions of the nodes with insufficient reactive power; generating each validity matrix according to the capacitors/reactors to be adjusted and the added PV nodes obtained from the power grid topology; and combining the action adjustment quantity matrices and the validity matrices into a set to form the action space.
In the embodiment, a reinforcement learning executable action set is constructed based on a sensitivity analysis method, so that the search space is reduced, and the adjustment efficiency is improved.
Preferably, in any one of the above embodiments, the action adjustment quantity matrices comprise the active output adjustment quantities of the generators, the active output adjustment quantities of the generators that have a large influence on line active power, the generator terminal voltage adjustment quantities that have a large influence on node voltages, and the transformer tap adjustment quantities that have a large influence on node voltages;
the validity matrices comprise the validity of the adjustable reactors that have a large influence on node voltages, the validity of the adjustable capacitors that have a large influence on node voltages, and the validity of the added PV nodes.
Preferably, in any one of the above embodiments, the state space is a set of power flow states reflecting the observable variables of the system state in the data samples; the observable variables include the generator outputs, the capacitor/reactor switching validity, the validity of added PV nodes, and the inter-area exchange power.
In any of the above embodiments, preferably, when the reinforcement learning model is constructed, the method further includes designing a reward function according to the following method:
setting a reward value of load flow convergence and a penalty value of load flow non-convergence according to the load flow calculation result;
and setting a reward value when the output of the generator exceeds the limit and a reward value when the output of the balancing machine exceeds the limit according to the output constraints of the generator and the balancing machine.
Preferably, in any of the above embodiments, training the deep learning model includes:
setting a target Q function in the reinforcement learning model, constructing a target Q network and an estimated Q′ network according to the target Q function, and setting the same network activation function and loss function for the two networks respectively;
and (5) performing iterative training by adopting a DDQN algorithm until iterative convergence is stable to obtain Q network parameters.
Further preferably, the activation function is a PReLU function. In the present embodiment, the PReLU function is proposed as an improvement for the problem that the gradient of the deep neural network activation function is 0 during back propagation.
Further preferably, the target Q function is represented by the following formula:
y_j = r_j + γ·Q′(s′_j, argmax_{a′} Q(s′_j, a′, ω), ω′)
where Q(s, a, ω) is the target Q network, Q′(s′, a′, ω′) is the estimated Q network, γ is the discount factor, and r_j is the reward value.
Further preferably, the modification to the loss function is an L1 regularization term formed by the 1-norm, the loss function being represented by the following equation:
L(ω) = (1/N)·Σ_{i=1}^{N} [y_i − f(x_i; ω)]² + λ‖ω‖₁
The 1-norm is selected as the regularization term of the loss function to alleviate the over-fitting problem of the model. Here, y_i is the target Q function value; f(x_i; ω) is the activation function output for the initial state; λ‖ω‖₁ is the regularization term; and N is the total number of iterations.
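As an illustrative sketch (not taken from the patent), the Double Q-learning target above can be computed as follows; the function and variable names, the toy action values, and γ = 0.9 are all assumptions:

```python
import numpy as np

def double_q_target(r, q_row, q_prime_row, gamma=0.9, done=False):
    """Double Q-learning target y_j = r_j + gamma * Q'(s', argmax_a' Q(s', a', w), w').
    Q (q_row) selects the greedy action; Q' (q_prime_row) evaluates it."""
    if done:
        return r
    a_star = int(np.argmax(q_row))          # argmax_a' Q(s'_j, a', w)
    return r + gamma * q_prime_row[a_star]  # evaluated by the other network

# Toy action values for the next state s'_j over 3 candidate actions
q_row = np.array([1.0, 3.0, 2.0])        # Q(s'_j, ., w) selects action 1
q_prime_row = np.array([0.5, 1.5, 4.0])  # Q'(s'_j, ., w') evaluates it
y = double_q_target(r=1.0, q_row=q_row, q_prime_row=q_prime_row)
```

Decoupling selection from evaluation in this way is what avoids the over-estimation that a single maximizing network produces.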
Another embodiment of the present invention further provides a deep reinforcement learning-based automatic power flow convergence adjustment system, including:
the environment establishing module is used for acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; and constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between executing a control action in the action space and the resulting working state;
and the deep learning model calculation module is used for acquiring the current operation state of the power grid system from the state space, adjusting the load flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme.
According to the technical scheme provided by the embodiment of the application, automatic adjustment of large power grid load flow convergence is realized by adopting a deep reinforcement learning model; an action space construction method based on a sensitivity method is provided in the construction of the reinforcement learning model, and a reward function meeting the actual power grid operation constraint requirements is designed by combining knowledge and experience. Aiming at the problem that the gradient is 0 during back propagation of the deep neural network activation function, a PReLU function is proposed as an improvement, and the 1-norm is selected as the regularization term of the loss function to alleviate the over-fitting problem of the model. The method can realize automatic adjustment of large-scale power grid power flow convergence, reduce the working intensity of operation-mode setting personnel and improve the working efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an automatic power flow convergence adjustment method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a line power flow profile provided by an embodiment of the present application;
fig. 3 is a schematic diagram of a transformer branch power flow distribution provided by an embodiment of the present application;
FIG. 4 is a flow chart of the construction of an action space provided by an embodiment of the present application;
FIG. 5 is a flow chart of a DDQN algorithm provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of an automatic power flow convergence adjustment system according to an embodiment of the present application.
Detailed Description
The present invention will be described in detail below through the embodiments with reference to the attached drawings. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict.
The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
With the development of artificial intelligence technology, machine learning algorithms make it possible for the machine itself to learn the solution to a problem, with little or no human intervention. Power flow adjustment is a discrete-action problem, and optimal-value algorithms represented by Q-learning among the reinforcement learning algorithms are suitable for handling discrete-action problems. In addition, combining deep learning with reinforcement learning can improve the fitting efficiency and fitting precision of the reinforcement learning value function, thereby providing a possible solution for achieving large-scale automatic power grid load flow adjustment.
Fig. 1 is a flowchart of a method for automatically adjusting power flow convergence based on deep reinforcement learning according to an embodiment of the present application, where the present embodiment is applicable to a case of automatically adjusting power flow, and the method may be executed by an apparatus for automatically adjusting power flow convergence, which may be implemented by software and/or hardware and may be integrated in an electronic device.
As shown in fig. 1, the method for automatically adjusting power flow convergence includes:
s110, acquiring power grid data to form a data sample;
s120, generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions by using a sensitivity analysis method to form an action space; constructing a state space according to the trend states of a plurality of data samples in the data set and the mapping relation between the execution control action in the action space and the formed working state;
firstly, constructing a reinforcement learning model environment, which mainly comprises the steps of constructing a state space, constructing an action space, designing a reward function, designing a target Q function and the like.
In this embodiment, optionally, the process of constructing the state space includes:
selecting observable variables reflecting the system state; the observable variables include: the output of the generator, the switching effectiveness of the capacitor/reactor, the effectiveness of adding PV nodes and the exchange power among the areas;
in the power flow convergence adjustment, the state space needs to be capable of reflecting the active power and reactive power balance characteristics of the system and the relevant characteristics of the region. By combining the above considerations, the observable variables capable of reflecting the system state are selected to mainly comprise the output of the generator, the switching effectiveness of the capacitor/reactor, the effectiveness of the added PV node, the exchange power between the areas and the like. The tidal current states of a plurality of samples jointly form a state space, as shown in formula (1):
in the formula (I), the compound is shown in the specification,the active output of the generator j of the ith sample is obtained;validity of capacitor/reactor k for the ith sample;the effectiveness of the PV node l added for the ith sample;power is exchanged for the region m of the ith sample. (j 1.. times.x; k.1.. times.N)CR;l=1,...,NPV;m=1,...,NE. Wherein, x and NCR、NPVAnd NEThe number of the generator, the capacitor/reactor, the added PV node and the area exchange channel is respectively, and M is the number of samples. )
As shown in fig. 4, in this embodiment, optionally, the constructing process of the action space includes:
using a sensitivity analysis method, selecting the generator terminal voltages, transformer taps and node reactive power injections of the system that have a large influence on node voltages, and the node active power injections that have a large influence on line active power changes;
and determining the capacitors/reactors to be adjusted according to the reactive power compensation devices near the relevant nodes and the sensitivity analysis results.
Consider that, for the active power of a given line, not every generator's active power adjustment has an obvious influence on it; likewise, not every generator terminal voltage, capacitor/reactor or transformer tap has a significant effect on the voltage of a given node through regulation. Therefore, to adjust the power flow more efficiently, appropriate adjustable variables must be selected to form the action space of the system, so as to improve the search efficiency of the learning process.
A sensitivity analysis method is adopted to select the generator terminal voltages, transformer taps and node reactive power injections that have a large influence on node voltages, and the node power injections that have a large influence on line active power changes. For the selection of capacitors/reactors, the nodes with insufficient reactive power are mostly heavily loaded buses with multiple outgoing lines; the reactive power compensation devices near these nodes are of primary concern, and the capacitors/reactors to be adjusted are determined in combination with the sensitivity analysis results.
The forming of the motion space comprises the following steps:
s410, calculating according to the voltage sensitivity of the load node, and respectively obtaining the sensitivity of the voltage state variable to the node injection reactive power, the voltage of a transformer tap and the voltage of the generator terminal; respectively sorting the voltage sensitivity of each node to obtain a sorting result of the voltage sensitivity of each node;
s420, calculating the sensitivity of the node injection power to the line power; obtaining and sequencing power sensitivity of each branch circuit;
s430, selecting adjustment quantities according to the voltage of each node and the power sensitivity sequencing result of each branch, and respectively forming an adjustment quantity matrix of each action;
s440, determining a capacitor/reactor to be adjusted according to the calculation result of the voltage sensitivity of the load node and the sensitivity of the node injection power to the line power and the position of the node with insufficient reactive power;
s450, generating each validity matrix according to the capacitor/reactor to be adjusted and the added PV nodes obtained from the power grid topological structure;
and S460, combining the action adjustment quantity matrixes and the effectiveness matrixes into a set to form an action space.
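Steps S410-S430 amount to ranking candidate controls by sensitivity magnitude and keeping the strongest ones. A minimal sketch of that ranking step follows; the top-50 cutoff comes from the description below, while the function name and toy sensitivity values are assumptions:

```python
import numpy as np

def top_k_by_sensitivity(sensitivities, k=50):
    """Rank candidate controls by |sensitivity| (descending) and keep the
    indices of the top k, as in steps S410-S430 (illustrative sketch)."""
    idx = np.argsort(-np.abs(sensitivities))
    return idx[:min(k, len(sensitivities))]

# Toy voltage sensitivities of one node to 6 candidate controls
dv_dq = np.array([0.02, -0.15, 0.07, 0.30, -0.01, 0.11])
selected = top_k_by_sensitivity(dv_dq, k=3)  # indices of the 3 strongest
```

The selected indices would then address the corresponding rows of the action adjustment quantity matrices in S430.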
The sensitivities are calculated as follows. As shown in fig. 2, the power transmitted on the transmission line is
S_ij = U̇_i·İ_ij*    (2)
Conversion to polar form yields:
S_ij = P_ij + jQ_ij    (3)
Separating the real part and the imaginary part of equation (3):
P_ij = U_i²G_ij − U_iU_j(G_ij cos θ_ij + B_ij sin θ_ij)    (4)
Q_ij = −U_i²B_ij + U_iU_j(B_ij cos θ_ij − G_ij sin θ_ij)    (5)
For transmission lines, G_ij ≪ B_ij, θ_ij ≈ 0 and G_ij sin θ_ij ≪ B_ij cos θ_ij, so equation (5) simplifies to
Q_ij ≈ −U_i²B_ij + U_iU_jB_ij cos θ_ij    (6)
Fig. 3 is a schematic diagram of the power flow distribution of a transformer branch according to an embodiment of the present application. As shown in fig. 3, the power transmitted by the transformer branch is derived in the same way, giving equations (7) and (8). Separating the real part and the imaginary part of equation (8) gives equations (9) and (10). Since θ_ij ≈ 0 and therefore G_ij sin θ_ij ≈ 0, equation (10) can be simplified to obtain equation (11). Combining equations (6) and (11) then gives the reactive power injected into each node, equation (12).
equation (12) is expressed as:
Q = [Q_D, Q_C]^T    (13)
where Q_D is the reactive power of the load and generator nodes, and Q_C is the reactive power injected at the reactive power compensation nodes.
Taking the generator terminal voltage U_G and the transformer tap T as control variables and the load node voltage U_L as the state variable, the respective sensitivities are obtained as:
the sensitivity of the voltage state variable to node-injected reactive power, S_UQ = ∂U_L/∂Q    (14)
the sensitivity of the voltage state variable to the transformer taps, S_UT = ∂U_L/∂T    (15)
the sensitivity of the voltage state variable to the generator terminal voltage, S_UU = ∂U_L/∂U_G    (16)
(1) Derivation of node injected power to line power sensitivity
For a high-voltage transmission network, the reactance is far larger than the resistance, so the resistance is omitted, and the generators and loads in the power grid are represented as node injected currents. The node voltage equation of the network is
I_N = Y_N·U_N    (17)
where I_N is the node injected-current column vector (with current flowing into a node taken as positive); U_N is the node voltage column vector; and Y_N is the node susceptance matrix.
The relationship between the network branch currents and the node voltages is
I_B = Y_B·A^T·U_N    (18)
where I_B is the branch current column vector; Y_B is the branch susceptance matrix; and A is the node incidence matrix.
From equations (17) and (18):
I_B = Y_B·A^T·Y_N⁻¹·I_N    (19)
The grid correlation coefficient matrix C(λ) is defined as:
C(λ) = Y_B·A^T·Y_N⁻¹    (20)
take branch k as an example, whichCurrent vector Ik,BIs a linear combination of the injected currents at each node:
Ik,B=λk-1I1,N+...+λk-iIi,N+...+λk-nIn,N (21)
in the formula Ik,BAnd node I injects a current Ii,NHas a correlation of λk-i,|λk-iThe larger is i, the larger is the influence of the change in the injection current of the node i on the current on the branch k.
Processing equation (21) gives equation (22), in which U_k,B is the voltage vector at the head end of the k-th branch and U_i,N is the voltage vector of the i-th node. Expanding equation (22) gives equation (23), where P_k,B and Q_k,B are respectively the active and reactive power of branch k; P_i,N and Q_i,N are respectively the active and reactive power injected at node i; U_k,B and θ_k,B are respectively the voltage magnitude and phase angle at the head end of branch k; and U_i,N and θ_i,N are respectively the voltage magnitude and phase angle of node i.
Expanding equation (23), taking the real part, and writing it in incremental form, and considering that in emergency control the node-injected reactive power variation is 0, gives equation (24). The power sensitivity β_{k-i} between the power variation of branch k and the injected power variation of node i is then
β_{k-i} = ΔP_k,B / ΔP_i,N    (25)
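The matrix pipeline of equations (17)-(21) can be sketched numerically. In the toy example below, the 3-node ring network, the branch susceptance values, and the choice of node 3 as the grounded reference are all assumptions for illustration:

```python
import numpy as np

# 3-node, 3-branch ring network (lossless, per-unit susceptances).
# Branches: 1-2, 2-3, 1-3; A is the node-branch incidence matrix.
A = np.array([[ 1.0,  0.0,  1.0],
              [-1.0,  1.0,  0.0],
              [ 0.0, -1.0, -1.0]])
y_branch = np.array([10.0, 5.0, 8.0])       # branch susceptances (j omitted)
YB = np.diag(y_branch)                       # branch susceptance matrix Y_B
YN = A @ YB @ A.T                            # node susceptance matrix Y_N
# YN is singular for a lossless network; ground node 3 as the reference.
YN_red = YN[:2, :2]
C = YB @ A.T[:, :2] @ np.linalg.inv(YN_red)  # C(lambda) = Y_B A^T Y_N^-1
# Entry C[k, i] is lambda_{k-i}: how strongly a node-i injection
# drives the current on branch k, as in equation (21).
```

Kirchhoff's current law provides a sanity check: an injection at node 1 must split entirely over branches 1-2 and 1-3, so the first column of C sums to 1 over those two rows.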
The results obtained by the sensitivity analysis method are sorted, and the top 50 variables in the ranking of each node voltage sensitivity and each branch power sensitivity are taken as the corresponding adjustment variables. The adjustable capacitors/reactors are determined by combining the sensitivity analysis results with the distribution of heavily loaded, multi-outgoing-line buses. In addition, the validity of adding PV nodes is also considered. The action space is constructed as:
A = {ΔP_G, ΔP_Gk, ΔV_i, ΔT_i, e_L, e_C, e_PV}    (26)
where ΔP_G is the active output adjustment of the generators, with N_G the number of generators in the system; ΔP_Gk is the active output adjustment of the generators that have a large influence on the active power of line k; ΔV_i is the generator terminal voltage adjustment that has a large influence on the voltage of node i; ΔT_i is the transformer tap adjustment that has a large influence on the voltage of node i; e_L is the validity of the adjustable reactors that have a large influence on the voltage of node i; e_C is the validity of the adjustable capacitors that have a large influence on the voltage of node i; and e_PV is the validity of the N_PV added PV nodes.
In this embodiment, the designing of the reward function includes:
setting a reward value of power flow convergence and a penalty value of power flow non-convergence according to a power flow calculation result;
according to the output constraints of the generator and the balancing machine, a reward value when the output of the generator exceeds the limit and a reward value when the output of the balancing machine exceeds the limit are set.
Specifically, the reward function is designed according to equation (27), where P_Gi and P_Li are respectively the active power of the i-th generator and of the i-th load in the region; N_G and N_L are respectively the numbers of generators and loads; P_EX is the regional exchange power; P_EXmax is the upper limit of the regional exchange power; and P_Gimax and P_Bimax are respectively the output upper limits of the i-th generator and the i-th balancing machine.
The final reward is
r = r_1 + r_2 + r_3 + r_4 + r_5 + r_6    (28)
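Since the text gives the structure of the reward but equation (27)'s numerical values are not reproduced here, the following sketch shows one way a composite reward of this kind could be assembled; all component values, limits, and the reduction to four terms are assumptions:

```python
def reward(converged, p_gen, p_gen_max, p_bal, p_bal_max, p_ex, p_ex_max,
           r_conv=100.0, r_div=-50.0, penalty=-10.0):
    """Sketch of a composite reward r = r_1 + ...: convergence bonus/penalty
    plus constraint-violation penalties (all numbers are assumptions)."""
    r1 = r_conv if converged else r_div                                # convergence
    r2 = penalty if any(p > pm for p, pm in zip(p_gen, p_gen_max)) else 0.0
    r3 = penalty if abs(p_bal) > p_bal_max else 0.0                    # balancing machine
    r4 = penalty if abs(p_ex) > p_ex_max else 0.0                      # area exchange
    return r1 + r2 + r3 + r4

# Converged case with one exchange-power violation
r = reward(True, [300.0], [350.0], p_bal=80.0, p_bal_max=100.0,
           p_ex=120.0, p_ex_max=100.0)
```

Shaping the reward this way lets the agent trade off convergence against operating-constraint violations during exploration.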
S130, acquiring the current operating state of the power grid system from the state space, adjusting the power flow convergence based on the final network parameters formed from the current operating state, and outputting an adjustment scheme. The current operating state is input into the trained deep learning model, and the final network parameters are formed through iterative training.
In a specific embodiment, the deep learning network is trained as follows.
A target Q network and an evaluation Q′ network are constructed according to the target Q function, and the same network activation function and loss function are set for the two networks. The target Q function gives the expected value of the reward for taking a particular action in a particular state. To avoid over-estimation, a Double Q-learning target is used, with the formula:
y_j = r_j + γQ′(s′_j, argmax_{a′} Q(s′_j, a′, ω), ω′) (29)
In the Double Q-learning target, optimal action selection and optimal action evaluation are performed by different value functions, using two Q functions: Q(s, a, ω) and Q′(s′, a′, ω′); γ is the discount factor.
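Equation (29) can be computed as in the following sketch, where the online network selects the greedy action and the target network evaluates it (the array inputs are an assumed data layout):

```python
import numpy as np

def double_dqn_target(r, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double Q-learning target of equation (29): Q(·, ω) selects the
    action, Q'(·, ω') evaluates it, avoiding over-estimation."""
    if done:
        return r
    a_star = int(np.argmax(q_next_online))    # argmax_a' Q(s', a', ω)
    return r + gamma * q_next_target[a_star]  # r + γ Q'(s', a*, ω')
```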
Iterative training is performed using the DDQN algorithm until the iteration converges stably, obtaining the Q network parameters.
The deep learning model is constructed using the PReLU function as the activation function and a loss function augmented with a norm-based regularization term.
The construction of the deep learning model mainly concerns the choice of the activation function and the modification of the loss function. The activation function is the PReLU function, given by:
f(x) = max(αx, x), α ∈ (0, 1) (30)
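A minimal sketch of the PReLU function of equation (30), here with a fixed α rather than a learned per-channel parameter:

```python
import numpy as np

def prelu(x, alpha=0.25):
    """PReLU, equation (30): f(x) = max(alpha*x, x) with 0 < alpha < 1,
    so the slope for negative inputs is alpha instead of 0."""
    return np.maximum(alpha * np.asarray(x, dtype=float), x)
```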
The modified part of the loss function is an L1 regularization term formed by the norm of the network weights; the loss function L(ω) is:
L(ω) = (1/N) Σ_{i=1}^{N} [y_i − f(x_i; ω)]² + λ‖ω‖₁ (31)
where λ is the weight of the regularization term.
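A sketch of an L1-regularized loss in the spirit of equation (31); the mean-squared base loss is an assumption, since the exact base term is not reproduced here:

```python
import numpy as np

def l1_regularized_loss(y, y_hat, w, lam=1e-3):
    """Mean-squared error between targets y_i and outputs f(x_i; ω),
    plus the L1 penalty λ‖ω‖₁ on the network weights w."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2) + lam * np.sum(np.abs(w))
```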
As shown in fig. 5, the training proceeds as follows. 3.1 Set the total number of training iteration rounds to N, initialize the experience replay space D, and initialize the target Q network and evaluation Q′ network parameters ω and ω′, with ω′ = ω. Starting from power flow calculation adjustment j (j = 1), the state variable of the system environment is s_j, and action a_j is generated according to an ε-greedy strategy.
3.2 Execute action a_j to obtain the reward r_j, the new state s_{j+1}, and the termination flag is_end (the process terminates if the power flow converges, and continues if it does not).
3.3 Store the 5-tuple {s_j, a_j, r_j, s_{j+1}, is_end} of the current state s_j, action a_j, reward r_j, next state s_{j+1}, and termination flag is_end into the experience replay space D.
3.4 Randomly draw m samples {s_j, a_j, r_j, s_{j+1}, is_end} from the experience replay space D and determine whether is_end marks the final state. If so, exit the loop and set y_j = r_j. If not, calculate the target Q function according to equation (29) and update the Q network parameters with the loss function of equation (31).
3.5 Update the state: j = j + 1.
3.6 Continue the above process until the iteration converges stably, and save the model.
3.7 Use the trained model to automatically adjust the power flow convergence of the power grid: input the operating state of the system into the trained Q network to quickly generate an adjustment scheme that achieves power flow convergence.
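Steps 3.1-3.7 can be sketched as the following loop. The `env`, `q`, and `q_target` objects are hypothetical: `env.reset()`/`env.step(a)` would wrap the power flow calculation, `q.predict(s)` returns action values, and `q.update(...)` applies equations (29) and (31).

```python
import random
from collections import deque

def train_ddqn(env, q, q_target, episodes=100, eps=0.1, batch=32, gamma=0.99):
    """Sketch of the DDQN loop of steps 3.1-3.7 (the periodic copy
    ω' <- ω is left to q.update for brevity)."""
    replay = deque(maxlen=10000)  # experience replay space D
    q_target.load_from(q)         # 3.1: initialize ω' = ω
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            vals = q.predict(s)   # ε-greedy action selection
            if random.random() < eps:
                a = env.sample_action()
            else:
                a = int(max(range(len(vals)), key=lambda i: vals[i]))
            s2, r, done = env.step(a)          # 3.2: done = power flow converged
            replay.append((s, a, r, s2, done)) # 3.3: store the 5-tuple
            if len(replay) >= batch:           # 3.4: sample m tuples, update Q
                q.update(random.sample(replay, batch), q_target, gamma)
            s = s2                             # 3.5: advance the state
    return q                                   # 3.6: save / return the model
```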
The power balance comprises an active power balance part and a reactive power balance part. Active power balance is adjusted via the active power output of the generators; reactive power balance is adjusted via the generator terminal voltage (i.e., the reactive power injected at the generator node), adding a PV node, and switching capacitors/reactors. For the rationality of the power flow distribution, out-of-limit branch power is brought back within a reasonable range by adjusting the generators with a large influence on the line power.
The technical scheme provided by the embodiment of the application is applied to the field of automatic power flow convergence adjustment. To address the low efficiency and high labor intensity of manually formulating operation modes for an actual power grid, an automatic power flow convergence adjustment method based on deep reinforcement learning is provided. Variables that reflect the system state are selected, based on manual experience, to form the state space; a set of executable reinforcement learning actions is established using a sensitivity analysis method, reducing the action space; and a reward function is established considering operational experience and actual power grid constraints, thereby forming the reinforcement learning environment. The PReLU function is adopted as the activation function, avoiding the problem of a zero gradient for negative inputs; a norm regularization term is added to the loss function to address over-fitting, forming the deep learning model. The DDQN algorithm, combining deep learning and reinforcement learning, automatically adjusts the power flow convergence and can greatly improve the efficiency of operation mode adjustment work; the sensitivity-based selection of the executable action set reduces the action space and improves search efficiency.
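The sensitivity-based reduction of the action space described above can be sketched as follows; the dictionary layout and control names are illustrative assumptions:

```python
def top_k_controls(sensitivity, k=3):
    """Keep only the k candidate controls whose sensitivity to the target
    state variable (e.g. dV_i/du for node voltage V_i) has the largest
    magnitude, shrinking the reinforcement learning action space."""
    ranked = sorted(sensitivity, key=lambda c: abs(sensitivity[c]), reverse=True)
    return ranked[:k]
```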
Fig. 6 is a schematic structural diagram of an automatic power flow convergence adjusting device based on deep reinforcement learning according to an embodiment of the present application. As shown in fig. 6, the apparatus includes:
the environment establishing module 610 is used for acquiring power grid data to form a data sample; generating a state space and an action space for the data set according to the constructed reinforcement learning model; calculating the reward value of the control action in the action space according to the set reward function; when the reinforcement learning model is constructed, selecting control actions by using a sensitivity analysis method to form an action space; constructing a state space according to the trend states of a plurality of data samples in the data set; setting a reward function to calculate a reward expected value after the action is executed; and setting a target Q function, and calculating the network parameters of each iteration by using the loss function.
The deep learning model calculation module 620 is used for acquiring the current operation state of the power grid system from the state space, inputting the current operation state into the trained deep learning model, adjusting load flow convergence by using the final network parameters formed by iterative training, and outputting an adjustment scheme;
When the deep learning model is trained, the current state s is input into the trained deep learning model, a control action a is selected from the action space with an ε-greedy strategy and executed, and the new state s′ formed after executing action a is obtained from the power flow calculation;
judging whether the state s′ is the final state, giving a reward r according to the reward function, and storing the data into the experience replay space D as a vector (s, a, r, s′); if it is not the final state, calculating the reward expected value with the target Q function, updating the state, and continuing to select adjustment actions until the iteration converges stably; if it is the final state, finishing the power flow convergence adjustment and outputting the adjustment scheme.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or are equivalent to the scope of the invention are intended to be embraced therein.
Claims (8)
1. A power flow convergence automatic adjustment method based on deep reinforcement learning, characterized by comprising the following steps:
acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed in the action space and the resulting operating states;
acquiring the current operation state of the power grid system from a state space, adjusting the power flow convergence based on the final network parameters formed by the current operation state, and outputting an adjustment scheme;
the selecting the control action and forming the action space comprises the following steps:
according to the voltage sensitivity calculation of the load node, the sensitivity of the voltage state variable to the node injection reactive power, the transformer tap and the voltage of the generator terminal is respectively obtained; respectively sorting the voltage sensitivity of each node to obtain a sorting result of the voltage sensitivity of each node;
calculating the sensitivity of the node injection power to the line power; obtaining and sequencing power sensitivity of each branch circuit;
selecting adjustment quantities according to the voltage of each node and the power sensitivity sequencing result of each branch circuit to respectively form an adjustment quantity matrix of each action;
determining a capacitor/reactor to be adjusted according to the calculation result of the voltage sensitivity of the load node and the sensitivity of the node injection power to the line power and the position of the node with insufficient reactive power;
generating each validity matrix according to a capacitor/reactor to be adjusted and the added PV nodes obtained from the power grid topological structure;
and combining the action adjustment quantity matrixes and the effectiveness matrixes into a set to form an action space.
2. The method according to claim 1, wherein the action adjustment matrix comprises generator active output adjustment quantities, generator active output adjustment quantities with a large influence on line active power, generator terminal voltage adjustment quantities with a large influence on node voltage, and transformer tap adjustment quantities with a large influence on node voltage;
each validity matrix comprises validity of an adjustable reactor which has a large influence on the node voltage, validity of an adjustable capacitor which has a large influence on the node voltage, and validity of an additional PV node.
3. The method for automatically adjusting power flow convergence based on deep reinforcement learning as claimed in claim 1, wherein the state space is a set of power flow states reflecting observable variables of the system state in the data samples; the observable variables comprise the output of the generator, the switching validity of the capacitor/reactor, the validity of the added PV node, and the exchange power between areas.
4. The method for automatically adjusting power flow convergence based on deep reinforcement learning of claim 1, wherein the construction of the reinforcement learning model further comprises designing a reward function according to the following method:
setting a reward value of power flow convergence and a penalty value of power flow non-convergence according to a power flow calculation result;
according to the output constraints of the generator and the balancing machine, a reward value when the output of the generator exceeds the limit and a reward value when the output of the balancing machine exceeds the limit are set.
5. The method for automatically adjusting power flow convergence based on deep reinforcement learning of claim 4, wherein the deep learning model is trained by a method comprising:
setting a target Q function in the reinforcement learning model, constructing a target Q network and an evaluation Q′ network according to the target Q function, and setting the same network activation function and loss function for the two networks respectively;
selecting and executing a control action a from the action space; acquiring a current state s from the state space and a new state s' formed after the control action a is executed; a reward value r for performing the control action a; and a judgment result is _ end for judging whether the state s' is the final state; forming a power flow data vector quintuple (s, a, r, s', is _ end);
inputting the power flow data vector quintuple (s, a, r, s′, is_end) into the deep learning model for iterative training, wherein each iteration comprises: judging whether s′ is the final state; if not, calculating the reward expected value with the target Q function and updating the Q network parameters with the loss function; updating the new state s′ to the current state, and continuing to select a control action to form a new power flow data vector quintuple until the iteration converges stably; if it is the final state, finishing the power flow convergence adjustment and outputting the final network parameters.
6. The method for automatically adjusting power flow convergence based on deep reinforcement learning of claim 5, wherein the activation function is a PReLU function.
7. The method according to claim 5, wherein the modified part of the loss function is an L1 regularization term formed by the norm of the network weights, the loss function being represented by the following equation:
L(ω) = (1/N) Σ_{i=1}^{N} [y_i − f(x_i; ω)]² + λ‖ω‖₁
wherein y_i is the target Q function; f(x_i; ω) is the output of the activation function for the initial state; λ‖ω‖₁ is the regularization term; and N is the total number of iterations.
8. An automatic power flow convergence adjusting system based on deep reinforcement learning is characterized by comprising:
the environment establishing module is used for acquiring power grid data to form a data sample;
generating a state space for the data sample according to the constructed reinforcement learning model; when the reinforcement learning model is constructed, selecting control actions to form an action space; constructing the state space according to the power flow states of a plurality of data samples in the data set and the mapping relation between the control actions executed in the action space and the resulting operating states;
the deep learning model calculation module is used for acquiring the current operation state of the power grid system from a state space, adjusting load flow convergence based on the final network parameters formed by the current operation state and outputting an adjustment scheme;
the selecting the control action and forming the action space comprises the following steps:
according to the voltage sensitivity calculation of the load node, the sensitivity of the voltage state variable to the node injection reactive power, the transformer tap and the voltage of the generator terminal is respectively obtained; respectively sorting the voltage sensitivity of each node to obtain a sorting result of the voltage sensitivity of each node;
calculating the sensitivity of the node injection power to the line power; obtaining and sequencing power sensitivity of each branch circuit;
selecting adjustment quantities according to the voltage of each node and the power sensitivity sequencing result of each branch, and respectively forming an adjustment quantity matrix of each action;
determining a capacitor/reactor to be adjusted according to the calculation result of the voltage sensitivity of the load node and the sensitivity of the node injection power to the line power and the position of the node with insufficient reactive power;
generating each validity matrix according to a capacitor/reactor to be adjusted and the added PV nodes obtained from the power grid topological structure;
and combining the action adjustment quantity matrixes and the effectiveness matrixes into a set to form an action space.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110114628.4A CN112787331B (en) | 2021-01-27 | 2021-01-27 | Deep reinforcement learning-based automatic power flow convergence adjusting method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112787331A CN112787331A (en) | 2021-05-11 |
CN112787331B true CN112787331B (en) | 2022-06-14 |
Family
ID=75759183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110114628.4A Active CN112787331B (en) | 2021-01-27 | 2021-01-27 | Deep reinforcement learning-based automatic power flow convergence adjusting method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112787331B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113489010B (en) * | 2021-06-21 | 2024-05-28 | 清华大学 | Power system power flow sample convergence adjustment method |
CN114336632B (en) * | 2021-12-28 | 2024-09-06 | 西安交通大学 | Method for correcting alternating current power flow based on model information assisted deep learning |
CN114880932B (en) * | 2022-05-12 | 2023-03-10 | 中国电力科学研究院有限公司 | Power grid operating environment simulation method, system, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111209710A (en) * | 2020-01-07 | 2020-05-29 | 中国电力科学研究院有限公司 | Automatic adjustment method and device for load flow calculation convergence |
CN111224404A (en) * | 2020-04-07 | 2020-06-02 | 江苏省电力试验研究院有限公司 | Power flow rapid control method for electric power system with controllable phase shifter |
CN112001066A (en) * | 2020-07-30 | 2020-11-27 | 四川大学 | Deep learning-based method for calculating limit transmission capacity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902854B (en) * | 2019-01-11 | 2020-11-27 | 重庆大学 | Method for constructing optimal power flow full-linear model of electric-gas interconnection system |
- 2021-01-27 CN CN202110114628.4A patent/CN112787331B/en active Active
Non-Patent Citations (1)
Title |
---|
Automatic Adjustment Method of Power Flow Calculation Convergence for Large Power Grids Based on Knowledge Experience and Deep Reinforcement Learning; Wang Tianjing et al.; Proceedings of the CSEE; 2020-04-20; Vol. 40, No. 8; pp. 2396-2403 *
Also Published As
Publication number | Publication date |
---|---|
CN112787331A (en) | 2021-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112787331B (en) | Deep reinforcement learning-based automatic power flow convergence adjusting method and system | |
Li et al. | Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning | |
CN110535146B (en) | Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning | |
CN112507614B (en) | Comprehensive optimization method for power grid in distributed power supply high-permeability area | |
CN114217524A (en) | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning | |
CN108462184B (en) | Power system line series compensation optimization configuration method | |
Zou | Design of reactive power optimization control for electromechanical system based on fuzzy particle swarm optimization algorithm | |
CN105449675A (en) | Power network reconfiguration method for optimizing distributed energy access point and access proportion | |
El Helou et al. | Fully decentralized reinforcement learning-based control of photovoltaics in distribution grids for joint provision of real and reactive power | |
CN113872213B (en) | Autonomous optimization control method and device for power distribution network voltage | |
CN113300380B (en) | Load curve segmentation-based power distribution network reactive power optimization compensation method | |
CN114784823A (en) | Micro-grid frequency control method and system based on depth certainty strategy gradient | |
CN107516892A (en) | The method that the quality of power supply is improved based on processing active optimization constraints | |
CN115313403A (en) | Real-time voltage regulation and control method based on deep reinforcement learning algorithm | |
CN116629461B (en) | Distributed optimization method, system, equipment and storage medium for active power distribution network | |
CN115588998A (en) | Graph reinforcement learning-based power distribution network voltage reactive power optimization method | |
CN109449994B (en) | Power regulation and control method for active power distribution network comprising flexible interconnection device | |
CN108964099B (en) | Distributed energy storage system layout method and system | |
CN110445130A (en) | Consider the air extract computing device of OPTIMAL REACTIVE POWER support | |
CN115133540B (en) | Model-free real-time voltage control method for power distribution network | |
CN116436003A (en) | Active power distribution network risk constraint standby optimization method, system, medium and equipment | |
CN114048576B (en) | Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid | |
Wang | Grid Voltage Control Method Based on Generator Reactive Power Regulation Using Reinforcement Learning | |
CN106099912A (en) | A kind of active distribution network partial power coordinated control system and method | |
CN113270869B (en) | Reactive power optimization method for photovoltaic power distribution network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||