CN111209710B - Automatic adjustment method and device for load flow calculation convergence

Automatic adjustment method and device for load flow calculation convergence

Info

Publication number
CN111209710B
CN111209710B (application CN202010015091.1A)
Authority
CN
China
Prior art keywords: power, load, capacitor, convergence, reactor
Legal status: Active
Application number
CN202010015091.1A
Other languages
Chinese (zh)
Other versions
CN111209710A (en)
Inventor
汤涌
王甜婧
郭强
黄彦浩
陈兴雷
文晶
李文臣
张松涛
黄河凯
王宏志
Current Assignee
Harbin Institute of Technology
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
Harbin Institute of Technology
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Application filed by Harbin Institute of Technology, State Grid Corp of China SGCC and China Electric Power Research Institute Co Ltd CEPRI
Priority to CN202010015091.1A
Publication of CN111209710A
Application granted
Publication of CN111209710B


Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses an automatic adjustment method for load flow calculation convergence, which comprises the following steps: designing the states, action space and rewards of a deep reinforcement learning network for power flow calculation convergence; constructing the deep reinforcement learning network for load flow calculation convergence according to the states, action space and rewards; and adding knowledge and experience to the deep reinforcement learning network and simulating the process of manually adjusting the power flow, thereby constructing a power flow adjustment strategy and using it to adjust the power grid toward power flow convergence. This solves the problems that existing convergence adjustment for large power grids is inefficient, inaccurate, and consumes excessive labor.

Description

Automatic adjustment method and device for load flow calculation convergence
Technical Field
The application relates to the technical field of power systems, in particular to an automatic adjustment method for load flow calculation convergence, and also relates to an automatic adjustment device for load flow calculation convergence.
Background
Power flow convergence adjustment is the most basic task in power grid simulation analysis and provides the foundation for designing power grid operation and planning schemes. From a mathematical point of view, power flow calculation essentially solves a set of nonlinear equations; if no solution exists, or the calculation falls into an ill-conditioned solution that makes the power flow equations difficult to converge, the power flow is said to be non-convergent. At present, non-convergent power flows are mainly adjusted manually; the adjustment process is tedious, depends on manual experience, and is inefficient. For a large power grid, the power flow equations have high-dimensional variables and numerous adjustable parameters, which further reduces adjustment efficiency. A great deal of manpower is spent every year on preparing power grid operation modes, and adjusting a large power grid to convergence takes a long time and involves much repetitive work. In summary, the conventional adjustment mode has significant disadvantages: it consumes a large amount of labor, depends heavily on expert experience, is prone to error, and its analysis errors are difficult to control. Therefore, an automatic adjustment method for power flow convergence is needed to make the work more efficient and accurate and to reduce the manpower required.
Disclosure of Invention
The application provides an automatic adjustment method for load flow calculation convergence, which solves the problems that existing convergence adjustment for large power grids is inefficient, inaccurate, and consumes excessive labor.
The application provides an automatic adjustment method for load flow calculation convergence, which comprises the following steps:
designing the states, action space and rewards of a deep reinforcement learning network for power flow calculation convergence;
constructing a deep reinforcement learning network for load flow calculation convergence according to the state, the action space and the reward;
and adding knowledge experience into the deep reinforcement learning network, and simulating the process of manually adjusting the power flow so as to construct a power flow adjustment strategy, and adjusting the power flow convergence of the power grid by using the power flow adjustment strategy.
Preferably, the states, action spaces and rewards of the deep reinforcement learning network include:
the states of the deep reinforcement learning network include: the output states of the generators, the switching validity (in-service status) of the capacitors/reactors, and the exchange power among regions;
the action space of the deep reinforcement learning network includes: the amount of change in a generator's output or its validity, and the validity of a particular capacitor/reactor;
the rewards of the deep reinforcement learning network include: rewards obtained from the power flow calculation result, rewards for conforming to knowledge and experience, and rewards for conforming to constraint conditions.
Preferably, the capacitors/reactors are the capacitors that are not yet switched in and the reactors that are already switched in near heavy-load, multi-outgoing-line buses.
Preferably, the not-yet-switched-in capacitors and already-switched-in reactors near heavy-load, multi-outgoing-line buses are obtained as follows:
counting the load and outgoing-line conditions of each bus to obtain the positions of heavy-load, multi-outgoing-line buses;
recording the positions of each branch, capacitor and reactor;
obtaining the capacitors and reactors near the heavy-load, multi-outgoing-line buses by using the Dijkstra algorithm and an optimal-path method;
and finally obtaining the not-yet-switched-in capacitors and already-switched-in reactors near the heavy-load, multi-outgoing-line buses by checking the validity flag bits.
Preferably, the reward obtained from the power flow calculation result is set to a large positive number when the calculation converges, and to a small negative number when it does not converge.
Preferably, the reward obtained according to knowledge experience comprises:
active power balance in the region: if the total active power of the generators is slightly larger than the total active power of the loads, a positive reward is obtained;
inter-area exchange power constraint: if the exchange power exceeds the limit, a negative reward is obtained.
Preferably, the reward obtained according to the constraint conditions includes:
rated power constraint of the generators: if a generator's output power exceeds its limit, a negative reward is obtained;
output constraint of the balancing machine: if the balancing machine's output power exceeds its limit, a negative reward is obtained.
Preferably, the knowledge and experience comprises:
the order of adjusting the power flow convergence is that the active power is balanced in the first step, and the reactive power is balanced in the second step.
Preferably, adding knowledge experience to the deep reinforcement learning network, and simulating a process of manually adjusting the power flow, so as to construct a power flow adjustment strategy, including:
according to knowledge and experience, simulating the process of manually adjusting the power flow to form a conventional balanced power flow adjustment strategy and a PV node-added balanced power flow adjustment strategy;
the conventional balanced power flow adjustment strategy balances active power by acting on generators, and balances reactive power by acting on capacitors or reactors near the positions where reactive power may be unbalanced;
the PV-node-added balanced power flow adjustment strategy adds PV nodes at the positions where reactive power may be unbalanced; after the power flow converges and the reactive shortage is obtained, equivalent capacitors or reactors are switched at nearby buses to balance the reactive power.
The application also provides an automatic adjustment device for load flow calculation convergence, comprising:
a design unit, used for designing the states, action space and rewards of a deep reinforcement learning network for power flow calculation convergence;
the network construction unit is used for constructing a deep reinforcement learning network for load flow calculation convergence according to the state, the action space and the reward;
and the adjusting unit is used for adding knowledge experience into the deep reinforcement learning network and simulating the process of manually adjusting the power flow so as to construct a power flow adjusting strategy, and adjusting the power flow convergence of the power grid by using the power flow adjusting strategy.
By constructing a deep reinforcement learning network for load flow calculation convergence, adding knowledge and experience to the network, and simulating the process of manually adjusting the power flow, a power flow adjustment strategy is constructed and used to adjust the power grid toward power flow convergence, which solves the problems that existing convergence adjustment for large power grids is inefficient, inaccurate, and consumes excessive labor.
Drawings
Fig. 1 is a schematic flow chart of an automatic adjustment method for load flow calculation convergence provided in the present application;
fig. 2 is a flow chart of DQN-based algorithm to which the present application relates;
FIG. 3 is a power balancing scheme for power flow regulation to which the present application relates;
fig. 4 is the active and reactive balancing process of embodiment 1 of the present application;
FIG. 5 shows the results of 36-node system test samples according to embodiment 1 of the present application;
fig. 6 is a northeast grid test sample result of embodiment 2 of the present application;
fig. 7 is a schematic diagram of an automatic adjustment device for load flow calculation convergence according to the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
Fig. 1 is a schematic flow chart of an automatic adjustment method for load flow calculation convergence provided by the present application, and the method provided by the embodiment of the present application is described in detail below with reference to fig. 1.
Step S101, designing the states, action space and rewards of the deep reinforcement learning network for power flow calculation convergence.
The states of the deep reinforcement learning network include: the output states of the generators, the switching validity (in-service status) of the capacitors/reactors, and the exchange power among regions. The action space of the deep reinforcement learning network includes: the amount of change in a generator's output or its validity, and the validity of a particular capacitor/reactor. The rewards of the deep reinforcement learning network include: rewards obtained from the power flow calculation result, rewards for conforming to knowledge and experience, and rewards for conforming to constraint conditions.
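As an illustration of how such a state and action space might be encoded for the agent, the following sketch builds a flat state vector and enumerates discrete adjustment actions. The function names, the fixed adjustment step, and the data layout are assumptions made for illustration and are not taken from the patent.

```python
import numpy as np

def build_state(gen_output_mw, device_in_service, interarea_exchange_mw):
    """State vector: generator outputs, capacitor/reactor validity flags, inter-area exchange power."""
    return np.concatenate([
        np.asarray(gen_output_mw, dtype=float),         # active output of each generator
        np.asarray(device_in_service, dtype=float),     # 1.0 if the capacitor/reactor is switched in
        np.asarray(interarea_exchange_mw, dtype=float)  # exchange power for each area pair
    ])

def enumerate_actions(n_generators, n_devices, step_mw=50.0):
    """Discrete actions: raise/lower or toggle a generator, or toggle a selected capacitor/reactor."""
    actions = []
    for g in range(n_generators):
        actions += [("gen_up", g, step_mw), ("gen_down", g, -step_mw), ("gen_toggle", g, 0.0)]
    for d in range(n_devices):
        actions.append(("device_toggle", d, 0.0))
    return actions
```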
The capacitors/reactors here are the capacitors that are not yet switched in and the reactors that are already switched in near heavy-load, multi-outgoing-line buses. They are obtained as follows: count the load and outgoing-line conditions of each bus to obtain the positions of heavy-load, multi-outgoing-line buses; record the positions of each branch, capacitor and reactor; obtain the capacitors and reactors near the heavy-load, multi-outgoing-line buses by using the Dijkstra algorithm and an optimal-path method; and finally obtain the not-yet-switched-in capacitors and already-switched-in reactors near these buses by checking the validity flag bits.
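The device-screening step can be pictured with a short sketch that runs Dijkstra shortest paths on the branch graph. The data structures (branch tuples, device tuples), the thresholds for "heavy load" and "multiple outgoing lines", and the search radius are illustrative assumptions; networkx is used here only for convenience.

```python
import networkx as nx

def nearby_actionable_devices(branches, bus_load_mw, devices,
                              load_threshold_mw=500.0, min_outgoing=4, radius=3.0):
    """branches: iterable of (from_bus, to_bus, impedance); devices: iterable of (bus, kind, in_service)."""
    graph = nx.Graph()
    for f, t, z in branches:
        graph.add_edge(f, t, weight=z)

    # heavy-load buses with many outgoing lines
    key_buses = [b for b in graph.nodes
                 if bus_load_mw.get(b, 0.0) >= load_threshold_mw and graph.degree(b) >= min_outgoing]

    candidates = []
    for bus in key_buses:
        dist = nx.single_source_dijkstra_path_length(graph, bus, cutoff=radius)
        for dev_bus, kind, in_service in devices:
            if dev_bus not in dist:
                continue
            # capacitors not yet switched in and reactors already switched in are the actionable ones
            if (kind == "capacitor" and not in_service) or (kind == "reactor" and in_service):
                candidates.append((bus, dev_bus, kind, dist[dev_bus]))
    return candidates
```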
The reward obtained from the power flow calculation result is set to a large positive number when the calculation converges, and to a small negative number when it does not converge. The rewards obtained according to knowledge and experience include: active power balance within a region (if the total active power of the generators is slightly larger than the total active power of the loads, a positive reward is obtained) and the inter-area exchange power constraint (if the exchange power exceeds the limit, a negative reward is obtained). The rewards obtained according to the constraint conditions include: the rated power constraint of the generators (if a generator's output power exceeds its limit, a negative reward is obtained) and the output constraint of the balancing machine (if the balancing machine's output power exceeds its limit, a negative reward is obtained).
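A compact sketch of how these three reward components might be combined is given below. The specific numerical values (+10 for convergence, -1 for non-convergence, the 0-5% generation margin, the -2 penalties) are illustrative assumptions, not values taken from the patent.

```python
def compute_reward(converged, gen_total_mw, load_total_mw,
                   exchange_mw, exchange_limit_mw,
                   gen_outputs_mw, gen_limits_mw, slack_mw, slack_range_mw):
    # reward from the power flow calculation result
    reward = 10.0 if converged else -1.0

    # knowledge-and-experience rewards
    if load_total_mw > 0 and 1.00 <= gen_total_mw / load_total_mw <= 1.05:
        reward += 1.0                                   # generation slightly above load in the region
    if abs(exchange_mw) > exchange_limit_mw:
        reward -= 2.0                                   # inter-area exchange power out of limit

    # constraint-condition rewards
    for p, (p_min, p_max) in zip(gen_outputs_mw, gen_limits_mw):
        if not p_min <= p <= p_max:
            reward -= 2.0                               # generator rated power out of limit
    lo, hi = slack_range_mw
    if not lo <= slack_mw <= hi:
        reward -= 2.0                                   # balancing machine output out of limit
    return reward
```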
The knowledge and experience include: the order of adjusting for power flow convergence is to balance active power in the first step and reactive power in the second step.
Step S102, constructing the deep reinforcement learning network for power flow calculation convergence according to the states, the action space and the rewards.
Reinforcement learning is a form of semi-supervised learning. Unlike general supervised learning, it guides behavior through rewards obtained by interacting with the environment and seeks the behaviors with large reward values. Compared with dynamic programming, reinforcement learning can handle Markov decision processes whose transition probability model is not known explicitly. Depending on the object of evaluation, reinforcement learning is divided into value-based algorithms and policy-gradient algorithms. Typical value-based algorithms include the Monte Carlo method and the temporal-difference method; temporal-difference algorithms are further divided into SARSA and Q-learning according to whether the current interaction policy is followed, and Q-learning is currently the most widely used reinforcement learning algorithm. In contrast to SARSA, Q-learning is a value-based, off-policy algorithm: instead of following the interaction sequence, it updates toward the action that maximizes the value at the next step.
Deep reinforcement learning combines deep learning with reinforcement learning. The application adopts DQN (Deep Q-Network), which combines Q-learning with deep learning and uses a deep network to fit the reinforcement learning value function. The constructed deep reinforcement learning network for load flow calculation convergence uses PReLU as the activation function, an improvement on the ReLU function: when the input is negative, PReLU keeps a small slope, avoiding the problem of a zero gradient. An L1 regularization term, built from the L1 norm of the weights, is used to prevent overfitting.
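A minimal sketch of such a network, assuming PyTorch: PReLU activations, and an L1 penalty on the weights added to the temporal-difference loss. The layer sizes and the regularization weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Q-value network: state in, one Q value per discrete adjustment action out."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.PReLU(),   # PReLU keeps a small slope for negative inputs
            nn.Linear(hidden, hidden), nn.PReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.layers(state)

def loss_with_l1(q_pred, q_target, model, l1_weight=1e-4):
    """Temporal-difference loss plus an L1-norm regularization term to limit overfitting."""
    td_loss = F.smooth_l1_loss(q_pred, q_target)
    l1_term = sum(p.abs().sum() for p in model.parameters())
    return td_loss + l1_weight * l1_term
```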
Step S103, adding knowledge experience into the deep reinforcement learning network, simulating the process of manually adjusting the power flow, thereby constructing a power flow adjustment strategy, and adjusting the power flow convergence of the power grid by using the power flow adjustment strategy.
Knowledge and experience show that, in actual large power grid calculations, to make the adjusted power flow converge, active power is generally balanced first after data errors have been checked, and reactive power is balanced afterwards.
Active power balance is a global balance: the active power of all generators and loads must balance and the transmission channels must not exceed their limits, so it is relatively easy to adjust. The specific method is as follows: a) Collect active power statistics by partition and check whether the active power exchange between regions is too large. If the exchange power exceeds the limit, increase the generation started in the receiving-end partition and decrease the generation started in the sending-end partition, so as to reduce the transmission power on the tie lines. b) Check whether the active power borne by the balancing machine is reasonable. If the active output of the balancing machine exceeds its acceptable range, increase the output of other generators so that the balancing machine returns to a reasonable range. c) During adjustment, check whether generator outputs have reached their limits. If a generator is at full output (maximum allowable active power), increase generation by starting an additional generator; if a generator has reached its minimum output (minimum allowable active power), reduce generation by shutting the generator down.
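Steps a) to c) can be pictured with the following sketch, which re-dispatches active power on a highly simplified area/generator model. The dataclasses, the proxy used for tie-line exchange, and the fixed adjustment step are illustrative assumptions, not the patent's data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Generator:
    output_mw: float
    min_mw: float
    max_mw: float

@dataclass
class Area:
    generators: List[Generator] = field(default_factory=list)
    load_mw: float = 0.0

    def generation_mw(self):
        return sum(g.output_mw for g in self.generators)

    def export_mw(self):
        # crude proxy for the tie-line exchange of the area
        return self.generation_mw() - self.load_mw

def shift_generation(area, delta_mw):
    """Spread a re-dispatch over the area's generators while respecting their limits (step c)."""
    for g in area.generators:
        room = (g.max_mw - g.output_mw) if delta_mw > 0 else (g.output_mw - g.min_mw)
        move = min(abs(delta_mw), max(room, 0.0))
        g.output_mw += move if delta_mw > 0 else -move
        delta_mw -= move if delta_mw > 0 else -move
        if abs(delta_mw) < 1e-6:
            break
    return delta_mw  # any remainder would require starting or shutting down a unit

def balance_active_power(areas, tie_limit_mw, slack: Generator, step_mw=100.0):
    # a) if inter-area exchange is out of limit, raise receiving-end and lower sending-end generation
    for area in areas:
        if area.export_mw() > tie_limit_mw:
            shift_generation(area, -step_mw)
        elif area.export_mw() < -tie_limit_mw:
            shift_generation(area, +step_mw)

    # b) keep the balancing machine inside its acceptable range by moving the mismatch onto other units
    if slack.output_mw > slack.max_mw:
        shift_generation(areas[0], slack.output_mw - slack.max_mw)
    elif slack.output_mw < slack.min_mw:
        shift_generation(areas[0], slack.output_mw - slack.min_mw)
```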
Reactive power balance is performed by layer and by partition. The location of the reactive imbalance is difficult to pinpoint; only the places where reactive power may be short can be considered, so reactive balance is harder to adjust. The specific method is as follows: a) Estimate the reactive shortage by voltage level or by partition, and switch nearby capacitors or reactors to perform reactive balance adjustment. b) Check the places where reactive power may be unbalanced, such as heavy-load substations with many incoming and outgoing lines and large active power exchange, new energy access substations, DC converter stations, and locations near inter-area transmission channels, to determine whether the reactive power compensation configured there is sufficient. If the reactive compensation is insufficient, switch nearby capacitors or reactors. c) Set a node where reactive power may be unbalanced as a PV node; after the power flow calculation converges, obtain the reactive capacity the node needs to be compensated, switch in a reactive compensation device of the required capacity, set the node back to a PQ node, and perform an equivalent calculation of the switching location based on the node and the reactive compensation around it.
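Step c), the PV-node trick, can be sketched as follows. The power flow solver interface (run_power_flow, set_pv, set_pq, switch_shunt) is an assumed abstraction; a real implementation would map these onto whatever simulation program is in use.

```python
def compensate_via_pv_node(case, bus, run_power_flow, set_pv, set_pq, switch_shunt,
                           v_setpoint_pu=1.0):
    """Reveal the reactive shortage at a suspect bus by treating it as a PV node,
    then switch in the equivalent capacitor (or reactor) and restore it to a PQ node."""
    set_pv(case, bus, v_setpoint_pu)                 # temporarily fix the voltage magnitude
    result = run_power_flow(case)
    if not result.converged:
        return False                                 # the PV node did not make the case solvable

    q_needed_mvar = result.reactive_injection_mvar[bus]   # compensation the bus actually required
    set_pq(case, bus)                                     # restore the original node type
    switch_shunt(case, bus, q_needed_mvar)                # capacitor (+Q) or reactor (-Q) of equivalent size
    return run_power_flow(case).converged
```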
The DQN-based algorithm flow is shown in fig. 2.
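The DQN loop of fig. 2 (experience replay, a periodically synchronized target network, epsilon-greedy exploration) could look roughly like the sketch below. It reuses the QNetwork and loss_with_l1 sketched earlier; the power flow adjustment environment `env` (reset/step/n_actions) is an assumed interface, and all hyperparameter values are illustrative.

```python
import random
from collections import deque
import torch

def train_dqn(env, q_net, target_net, optimizer, episodes=500, max_steps=100,
              gamma=0.99, epsilon=0.1, batch_size=32, sync_every=50):
    replay = deque(maxlen=10000)
    target_net.load_state_dict(q_net.state_dict())

    for _ in range(episodes):
        state = env.reset()                                   # a non-converged operating mode
        for step in range(max_steps):
            # epsilon-greedy choice among the discrete adjustment actions
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                with torch.no_grad():
                    q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
                    action = int(q_values.argmax())

            next_state, reward, done = env.step(action)       # apply adjustment, run power flow, score it
            replay.append((state, action, reward, next_state, float(done)))
            state = next_state

            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                s, a, r, s2, d = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
                q_pred = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    q_next = target_net(s2).max(dim=1).values
                q_target = r + gamma * q_next * (1.0 - d)
                loss = loss_with_l1(q_pred, q_target, q_net)  # TD loss plus the L1 term from the sketch above
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step % sync_every == 0:
                target_net.load_state_dict(q_net.state_dict())
            if done:                                          # the adjusted power flow converged
                break
```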
Then, according to knowledge and experience, the process of manually adjusting the power flow is simulated to form two schemes, as shown in fig. 3: one is the conventional balanced power flow adjustment strategy and the other is the PV-node-added balanced power flow adjustment strategy. The conventional balanced power flow adjustment strategy balances active power by acting on generators and balances reactive power through capacitors or reactors near the positions where reactive power may be unbalanced. The PV-node-added balanced power flow adjustment strategy differs in that PV nodes are added at the positions where reactive power may be unbalanced; after the power flow converges and the reactive shortage is obtained, equivalent capacitors or reactors are switched at nearby buses to balance the reactive power.
The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments, not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
In this embodiment, the method is verified on a small example: the power flow is adjusted by balancing active and reactive power, so that operating modes whose power flow initially fails to converge are adjusted until the calculation converges. The test results demonstrate the effectiveness of the invention.
Furthermore, in the small example, to approximate practical engineering, capacitors and reactors are added to the low-voltage buses of the CEPRI 36-node system so that the system can carry heavier power flows and larger reactive imbalances. Starting from the system's initial converged power flow, generator outputs and loads are randomly scaled between 0 and 4 times their base values and the switching states of the capacitors and reactors are varied, producing 9711 groups of data. Power flow calculation shows that, of the 9711 groups, 4310 converge and 5401 do not. 4000 groups of non-converged data are used as the training set and 1000 groups as the test set.
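A sketch of how such a sample set could be generated from a converged base case is shown below. The case-manipulation helpers (copy_case, scale_injection, toggle_shunt, run_power_flow) are assumed interfaces standing in for whatever power flow program is used; the 0-4 scaling range follows the description above, while the 50% toggling probability is an assumption.

```python
import random

def generate_samples(base_case, generators, loads, shunts,
                     copy_case, scale_injection, toggle_shunt, run_power_flow,
                     n_samples=9711, toggle_prob=0.5):
    """Randomly perturb a converged base case and split the results by convergence."""
    converged, non_converged = [], []
    for _ in range(n_samples):
        case = copy_case(base_case)
        for g in generators:
            scale_injection(case, g, random.uniform(0.0, 4.0))   # scale generator output 0-4x
        for l in loads:
            scale_injection(case, l, random.uniform(0.0, 4.0))   # scale load 0-4x
        for s in shunts:
            if random.random() < toggle_prob:
                toggle_shunt(case, s)                            # change capacitor/reactor switching state
        (converged if run_power_flow(case).converged else non_converged).append(case)
    return non_converged, converged   # non-converged cases become training/test samples
```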
Further, the active and reactive power are balanced; referring to fig. 4, after the active power is balanced in step 1, no active power problem remains. Step 6 adds PV nodes, with an upper limit of two PV nodes. The sum of the reactive imbalance at step 7 is therefore larger than at step 8, because step 8 reduces the number of PV nodes. After different PV nodes have been added and removed repeatedly, if the power flow converges, equivalent capacitors or reactors are switched in near the added PV nodes.
Further, referring to fig. 5, 100 steps and 200 steps denote the number of steps contained in one episode of reinforcement learning, that is, the maximum number of steps allowed for adjusting the same power flow. It can be seen that, whether the limit is 100 or 200 steps, fewer samples are difficult to converge under scheme 2 than under scheme 1. Under scheme 2, when each episode contains 100 steps, 52% of the samples converge within 10 steps and 5% fail to converge within 100 steps. When each episode contains 200 steps, only 2% fail to converge. This means that increasing the number of steps per episode, i.e. the number of adjustment steps allowed per sample, can raise the probability that the power flow adjustment converges, but the effect is very limited while the training time increases.
Example 2
In this embodiment, the method is verified on the Northeast China power grid: the power flow is adjusted by balancing active and reactive power, so that operating modes whose power flow initially fails to converge are adjusted until the calculation converges. The test results demonstrate the effectiveness of the invention on a practical system.
Further, referring to fig. 6, compared with the 36-node system, the result of scheme 1 on the Northeast China grid is much worse, mainly because the capacitors to be switched in a large power grid are spread over a wider range and the reactive power is less likely to be balanced effectively. Under scheme 2, when each episode contains 100 steps, 43% of the samples converge within 10 steps and 8% fail to converge within 100 steps. When each episode contains 200 steps, only 3% fail to converge; the convergence adjustment effect is somewhat lower than for the 36-node system. In a real system, reactive power balancing becomes more complex because, in addition to heavy-load substations, the influence of new energy and DC must also be considered. However, in the generated samples, the cases that cannot be adjusted to converge all correspond to heavily loaded power flows that are rarely encountered in actual engineering, so the method remains feasible in practice.
The working principle of the application is as follows: knowledge and experience are added to the reinforcement learning, which restricts its action space and state space, reduces the search space, and lowers the complexity of the neural network learning. The human adjustment behavior is also imitated, balancing active power first and reactive power second, so that the search has directionality.
The present application also provides an automatic adjustment device 700 for load flow calculation convergence, as shown in fig. 7, including:
a design unit 710 for designing a state, an action space, and a reward of a deep reinforcement learning network for convergence of load flow calculation;
a network construction unit 720, which constructs a deep reinforcement learning network for load flow calculation convergence according to the state, the action space and the reward;
and the adjusting unit 730 adds knowledge experience to the deep reinforcement learning network and simulates the process of manually adjusting the power flow, so as to construct a power flow adjusting strategy, and the power flow convergence of the power grid is adjusted by using the power flow adjusting strategy.
By constructing a deep reinforcement learning network for load flow calculation convergence, adding knowledge and experience to the network, and simulating the process of manually adjusting the power flow, a power flow adjustment strategy is constructed and used to adjust the power grid toward power flow convergence, which solves the problems that existing convergence adjustment for large power grids is inefficient, inaccurate, and consumes excessive labor.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, each embodiment does not necessarily contain only a single independent technical solution; this manner of description is adopted for clarity only, and those skilled in the art should take the description as a whole, since the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (7)

1. An automatic adjustment method for load flow calculation convergence is characterized by comprising the following steps:
designing states, an action space and rewards of a deep reinforcement learning network for power flow calculation convergence, wherein the states of the deep reinforcement learning network include: the output states of the generators, the switching validity of the capacitors/reactors, and the exchange power among regions; the capacitors/reactors are the capacitors that are not yet switched in and the reactors that are already switched in near heavy-load, multi-outgoing-line buses; the not-yet-switched-in capacitors and already-switched-in reactors near the heavy-load, multi-outgoing-line buses are obtained by: counting the load and outgoing-line conditions of each bus to obtain the positions of heavy-load, multi-outgoing-line buses; recording the positions of each branch, capacitor and reactor; obtaining the capacitors and reactors near the heavy-load, multi-outgoing-line buses by using the Dijkstra algorithm and an optimal-path method; and finally obtaining the not-yet-switched-in capacitors and already-switched-in reactors near these buses by checking the validity flag bits; the action space of the deep reinforcement learning network includes: the amount of change in a generator's output or its validity, and the validity of a particular capacitor/reactor; the rewards of the deep reinforcement learning network include: rewards obtained from the power flow calculation result, rewards for conforming to knowledge and experience, and rewards for conforming to constraint conditions;
constructing a deep reinforcement learning network for load flow calculation convergence according to the state, the action space and the reward;
and adding knowledge and experience into the deep reinforcement learning network, and simulating the process of manually adjusting the power flow so as to construct a power flow adjustment strategy, and adjusting the power flow convergence of the power grid by using the power flow adjustment strategy.
2. The method according to claim 1, wherein the reward obtained from the power flow calculation result is set to a large positive number when the calculation converges, and to a small negative number when it does not converge.
3. The method of claim 1, wherein said reward derived in accordance with knowledge experience comprises:
active power balance in the region: if the total active power of the generators is slightly larger than the total active power of the loads, a positive reward is obtained;
inter-area exchange power constraint: if the exchange power exceeds the limit, a negative reward is obtained.
4. The method of claim 1, wherein the reward obtained according to the constraint conditions comprises:
rated power constraint of the generators: if a generator's output power exceeds its limit, a negative reward is obtained;
output constraint of the balancing machine: if the balancing machine's output power exceeds its limit, a negative reward is obtained.
5. The method of claim 1, wherein the knowledge-based experience comprises:
the order of adjusting the power flow convergence is that the active power is balanced in the first step, and the reactive power is balanced in the second step.
6. The method of claim 1, wherein adding knowledge and experience to the deep reinforcement learning network and simulating the process of manually adjusting the power flow so as to construct the power flow adjustment strategy comprises:
according to knowledge and experience, simulating the process of manually adjusting the power flow to form a conventional balanced power flow adjustment strategy and a PV node-added balanced power flow adjustment strategy;
the conventional balanced power flow adjustment strategy balances active power by acting on generators, and balances reactive power by acting on capacitors or reactors near the positions where reactive power may be unbalanced;
the PV-node-added balanced power flow adjustment strategy adds PV nodes at the positions where reactive power may be unbalanced; after the power flow converges and the reactive shortage is obtained, equivalent capacitors or reactors are switched at nearby buses to balance the reactive power.
7. An automatic adjustment device for load flow calculation convergence, comprising:
a design unit, which designs states, an action space and rewards of a deep reinforcement learning network for power flow calculation convergence, wherein the states of the deep reinforcement learning network include: the output states of the generators, the switching validity of the capacitors/reactors, and the exchange power among regions; the capacitors/reactors are the capacitors that are not yet switched in and the reactors that are already switched in near heavy-load, multi-outgoing-line buses; the not-yet-switched-in capacitors and already-switched-in reactors near the heavy-load, multi-outgoing-line buses are obtained by: counting the load and outgoing-line conditions of each bus to obtain the positions of heavy-load, multi-outgoing-line buses; recording the positions of each branch, capacitor and reactor; obtaining the capacitors and reactors near the heavy-load, multi-outgoing-line buses by using the Dijkstra algorithm and an optimal-path method; and finally obtaining the not-yet-switched-in capacitors and already-switched-in reactors near these buses by checking the validity flag bits; the action space of the deep reinforcement learning network includes: the amount of change in a generator's output or its validity, and the validity of a particular capacitor/reactor; the rewards of the deep reinforcement learning network include: rewards obtained from the power flow calculation result, rewards for conforming to knowledge and experience, and rewards for conforming to constraint conditions;
the network construction unit is used for constructing a deep reinforcement learning network for load flow calculation convergence according to the state, the action space and the reward;
and the adjusting unit is used for adding knowledge experience into the deep reinforcement learning network and simulating the process of manually adjusting the power flow so as to construct a power flow adjusting strategy, and adjusting the power flow convergence of the power grid by using the power flow adjusting strategy.
CN202010015091.1A 2020-01-07 2020-01-07 Automatic adjustment method and device for load flow calculation convergence Active CN111209710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015091.1A CN111209710B (en) 2020-01-07 2020-01-07 Automatic adjustment method and device for load flow calculation convergence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010015091.1A CN111209710B (en) 2020-01-07 2020-01-07 Automatic adjustment method and device for load flow calculation convergence

Publications (2)

Publication Number Publication Date
CN111209710A CN111209710A (en) 2020-05-29
CN111209710B (en) 2022-07-01 (granted)

Family

ID=70785009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010015091.1A Active CN111209710B (en) 2020-01-07 2020-01-07 Automatic adjustment method and device for load flow calculation convergence

Country Status (1)

Country Link
CN (1) CN111209710B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287972A (en) * 2020-09-28 2021-01-29 清华大学 Power system power flow adjusting method based on reinforcement learning and multi-source data integration
CN112653151B (en) * 2020-11-19 2023-08-25 中国电力科学研究院有限公司 Method and system for adjusting power grid power flow based on reactive compensation device
CN112787331B (en) * 2021-01-27 2022-06-14 中国电力科学研究院有限公司 Deep reinforcement learning-based automatic power flow convergence adjusting method and system
CN112994016B (en) * 2021-03-08 2024-05-14 中国电力科学研究院有限公司 Method and system for adjusting restoration resolvable property of power flow of power system
CN113517684B (en) * 2021-03-17 2023-08-25 中国电力科学研究院有限公司 Method and system for establishing parallel deep reinforcement learning model for tide state adjustment
CN113270867B (en) * 2021-03-31 2023-08-18 中国电力科学研究院有限公司 Automatic adjustment method for weak power grid tide without solution
CN114239252B (en) * 2021-12-06 2022-10-11 佛山电力设计院有限公司 Method and system for generating operation mode of power system, computer and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005224050A (en) * 2004-02-06 2005-08-18 Tokyo Electric Power Co Inc:The Method of calculating tidal current
CN101447671A (en) * 2008-12-08 2009-06-03 中国电力科学研究院 Automatic integrating and adjusting method for flow data
CN109873425A (en) * 2017-12-01 2019-06-11 中国电力科学研究院有限公司 Electric system based on deep learning and user behavior adjusts trend method and system
CN110443447A (en) * 2019-07-01 2019-11-12 中国电力科学研究院有限公司 A kind of method and system learning adjustment electric power system tide based on deeply

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Deep reinforcement learning with double Q-learning"; van Hasselt H et al.; https://arxiv.org/abs/1509.06461v1; 2015-12-31; entire document *
《大规模电网运行方式调整潮流计算及病态诊断》 [Power flow calculation and ill-conditioning diagnosis for operation mode adjustment of large-scale power grids]; 彭慧敏 et al.; 《电力系统自动化》 (Automation of Electric Power Systems); 2018-02-10; vol. 42, no. 3; Section 5 of the text *
《深度强化学习在需求响应中的应用》 [Application of deep reinforcement learning in demand response]; 孙毅 et al.; 《电力系统自动化》 (Automation of Electric Power Systems); 2019-03-11; entire document *

Also Published As

Publication number Publication date
CN111209710A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
CN111209710B (en) Automatic adjustment method and device for load flow calculation convergence
US7979239B2 (en) System and method for monitoring and managing electrical power transmission and distribution networks
Aghaei et al. Distribution expansion planning considering reliability and security of energy using modified PSO (Particle Swarm Optimization) algorithm
Lata et al. Reliability improvement of radial distribution system by optimal placement and sizing of energy storage system using TLBO
Kumar et al. Enhancement of dynamic stability by optimal location and capacity of UPFC: A hybrid approach
CN103036230A (en) Dynamic equivalence method of alternating-current-direct-current serial-parallel large power system based on engineering application
CN106329523A (en) Active power distribution network intelligent soft switch robust optimization modeling method taking uncertainty into consideration
Henry et al. Gym-ANM: Reinforcement learning environments for active network management tasks in electricity distribution systems
CN108462210B (en) Photovoltaic open capacity calculation method based on data mining
CN110264110B (en) Energy storage power station site selection and volume fixing method based on multiple application scenes of power distribution network
Mitchell et al. Using a neural network to predict the dynamic frequency response of a power system to an under-frequency load shedding scenario
Liu et al. Rapid assessment of maximum distributed generation output based on security distance for interconnected distribution networks
Tong et al. Dynamic equivalence of large-scale power systems based on boundary measurements
CN105226650A (en) Based on the micro-capacitance sensor reliability calculation method of miniature combustion engine-energy storage cooperation strategy
Hassanzadeh et al. Intelligent fuzzy control strategy for battery energy storage system considering frequency support, SoC management, and C-rate protection
Hota et al. Active power loss allocation in radial distribution networks with different load models and DGs
CN112186764B (en) Access optimization method and device for power distribution network equipment and electronic equipment
CN107179688B (en) Power system reliability analysis method considering Monte Carlo state sampling truncation
CN116488267B (en) Modeling-based wind farm reactive capacity limit simulation calculation method and device
CN115133540B (en) Model-free real-time voltage control method for power distribution network
Yang et al. Multi-agent reinforcement learning for active voltage control on multi-hybrid microgrid interconnection system
CN115276067A (en) Distributed energy storage voltage adjusting method adaptive to topological dynamic change of power distribution network
Sun et al. Assessing wind curtailment under different wind capacity considering the possibilistic uncertainty of wind resources
CN108416459B (en) Site selection method for battery energy storage power station
Chen et al. The optimal planning and dynamic operation of distributed generation method based on modified multiobjective optimization in power distribution system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant