CN115765035A

CN115765035A - Flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction

Info

Publication number: CN115765035A
Application number: CN202211308934.2A
Authority: CN
Inventors: 霍现旭; 董雷; 张磐; 郑悦; 梁海深; 李占一; 吴怡; 张涛
Original assignee: State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd; North China Electric Power University; Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd; Baodi Power Supply Co of State Grid Tianjin Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd; North China Electric Power University; Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd; Baodi Power Supply Co of State Grid Tianjin Electric Power Co Ltd
Priority date: 2022-10-25
Filing date: 2022-10-25
Publication date: 2023-03-07

Abstract

The invention relates to a flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction, comprising the following steps: step 1, initializing to generate an initial network structure and importing photovoltaic, wind power and load data; step 2, based on the network structure generated in step 1, generating the set of network topology structures that can be adopted after a line of the flexible distribution network fails; step 3, based on the feasible network topology structures generated in step 2, establishing a mathematical model for optimized operation after fault recovery of the flexible power distribution network; step 4, based on the model of step 3, establishing a multi-type variable double-agent reinforcement learning collaborative optimization model; and step 5, outputting, through online decision-making, the dynamic reconstruction strategy and the regulation strategy for the controllable active equipment. The invention addresses the problems that training the agents is extremely difficult, inefficient and hard to converge.

Description

Flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction

Technical Field

The invention belongs to the technical field of optimized operation of power distribution networks containing renewable distributed generation, and relates to a flexible power distribution network disturbance recovery method, in particular to one suitable for full-time dynamic reconstruction.

Background

With economic development and technological progress, users place increasingly high demands on power supply reliability. As the terminal of the power system, the distribution network directly determines whether users can obtain electricity safely and reliably. Because distribution networks are designed closed-loop but operated open-loop, they can satisfy the N-1 security criterion: when a distribution line fails and is disconnected, the system can return to stable operation without load shedding by adjusting its topology. However, this adjustment only restores supply to all loads; it does not optimize the topology, so operating economy is neglected while supply reliability is ensured. Meanwhile, the integration of renewable distributed generation, whose output is volatile and uncertain, inevitably disturbs distribution network operation and degrades the power quality delivered to users. Therefore, by selecting the topology reasonably during fault recovery, the voltage problems caused by connecting renewable distributed generation to the distribution network can be effectively mitigated.

With the development of power electronics, the soft open point (SOP), also called an intelligent soft switch, has attracted increasing attention because of its flexible control. An SOP is a power electronic device installed on a tie line. Compared with a traditional tie switch, it provides continuous, smooth reactive power compensation and fast, accurate active power control, making it an effective means of system fault recovery. However, because SOPs are expensive, they can only partially replace conventional tie switches in the short term, and relatively little research has addressed the coordinated optimization of SOPs and conventional tie switches. Moreover, this coordinated optimization problem is large in scale and contains continuous variables together with a large number of discrete variables, so it is difficult to solve and solution efficiency is low. Existing methods, such as mathematical programming and heuristic algorithms, must simplify the physical model to reduce the number and size of the problems. For example, merging time periods with a particular clustering method, or with indices set manually before and after network reconstruction, makes the procedure cumbersome, and the subjectivity of the clustering prevents the optimization potential from being fully exploited. In addition, the simplified models are still slow to solve, prone to local optima, and hard to converge.

Generally speaking, for the technical field of optimized operation of a power distribution network containing renewable distributed power sources, a method for flexible power distribution network disturbance recovery suitable for full-time dynamic reconstruction is still lacking at present.

A search of the prior art found no patent documents identical or similar to the invention.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art, which relies on experience rather than theoretical support, and provides a flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction.

The invention solves the practical problem by adopting the following technical scheme:

A flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction comprises the following steps:

step 1, initializing to generate an initial network structure, and importing photovoltaic, wind power and load data;

step 2, based on the network structure generated in step 1, generating the set of network topology structures that can be adopted after a line of the flexible distribution network fails;

step 3, based on the feasible network topology structures generated in step 2, establishing a mathematical model for optimized operation after fault recovery of the flexible power distribution network;

step 4, based on the optimized operation model established in step 3, establishing a multi-type variable double-agent reinforcement learning collaborative optimization model;

step 5, outputting, through online decision-making, the dynamic reconstruction strategy and the regulation strategy for the controllable active equipment.

Further, the specific steps of step 2 include:

step 2.1, deleting the faulted line and simplifying the network: delete the faulted line from the initial network structure G0, and merge the two branches incident to every node of degree 2 into a single equivalent branch, yielding the simplified network G1;

step 2.2, generating all spanning trees of the simplified network by polynomial multiplication: based on the node-branch information of G1, choose a root node; for each non-root node, form one basic term of the polynomial by adding the labels of all branches incident to that node. Multiply the basic terms, expand the product and combine like terms, then delete the high-power terms and the terms whose coefficient exceeds 1 after combination, so that each remaining term is a connected, radial spanning tree;

step 2.3, mapping the spanning trees of the simplified network to spanning trees of the original network: each spanning tree of the simplified network corresponds to several spanning trees of the original network. For every tree branch of a simplified spanning tree, all corresponding original branches are kept closed; for every link branch (a simplified branch not in the spanning tree), any one of the corresponding original branches may be opened. This yields the set of network topology structures that can be adopted after a line of the flexible distribution network fails.

Further, the specific steps of step 3 include:

step 3.1, establishing an optimized operation objective function after the fault of the flexible power distribution network is recovered;

the objective function considers the accumulated active power loss value of the system in the optimization period after the disturbance occurs, the switching action times in the network reconstruction process and the renewable energy consumption level, and is shown as formula (1):

In the formula: T is the simulation duration, set to one day in this method and optimized in hourly intervals; n is the number of system nodes; P_i(t) is the active power injected at node i during period t; T_j(t) is the state of switch j during period t, equal to 1 when the switch is closed and 0 when it is open; P_DGi(t) is the active power injected by the distributed power supply at node i during period t, and the objective also uses the predicted value of this injection; n_DG is the total number of distributed power supplies. The network topology and the injected power at each node are assumed constant within each unit time interval. w_loss, w_t and w_r are the weighting coefficients expressing the importance of network loss, switching actions and renewable energy consumption, respectively, in the objective function.

Step 3.2, establishing constraint conditions for optimizing operation after fault recovery of the flexible power distribution network

1) Radial constraint

G(t) ∈ G (2)

In the formula: G(t) is the network structure adopted in period t; G is the set of network topologies that strictly satisfy the radial constraint and exclude the faulted line, with the branches containing SOPs not considered.

2) Switch action times constraint

In the formula: T_max is the maximum total number of switching actions allowed for network reconfiguration in the optimization period, and T_j,max is the maximum number of actions allowed for switch j in the optimization period. In the invention, the optimization period is 24 hours, the maximum total number of switching actions is 15, and the maximum number of actions of a single switch is 3; Ω_node is the set of system nodes.

3) SOP constraint

The SOP is subject to a transmitted active power constraint and capacity constraints on both of its sides.

In the formula: S_k1max and S_k2max are the maximum capacities of the converters on the two sides of the SOP; Ω_SOP is the set of branches where SOPs are located.

4) Flow balance constraints

In the formula: P_i(t) and Q_i(t) are the active and reactive power injected into node i during period t; V_i(t) is the voltage magnitude of node i during period t; G_ij and B_ij are the conductance and susceptance elements of the node admittance matrix; θ_ij is the phase angle difference between nodes i and j; P_DGi(t) and Q_DGi(t) are the active and reactive power injected by the distributed power supply at node i during period t; P_SOPi(t) and Q_SOPi(t) are the active and reactive power injected by the SOP at node i during period t; P_LDi(t) and Q_LDi(t) are the active and reactive power consumed by the load at node i during period t.

5) Node voltage constraint

V_min ≤ V_i(t) ≤ V_max, i ∈ Ω_node (8)

In the formula: V_min and V_max are the lower and upper limits of the node voltage magnitude during system operation.

6) Branch current constraint

I_ij(t) ≤ I_ijmax (9)

In the formula: I_ij(t) is the magnitude of the current flowing in branch ij during period t, and I_ijmax is the maximum current magnitude allowed on branch ij.

7) Distributed power supply output constraints

P_DGi,min ≤ P_DGi(t) ≤ P_DGi,max, i ∈ Ω_DG (10)

Q_DGi,min ≤ Q_DGi(t) ≤ Q_DGi,max, i ∈ Ω_DG (11)

P_DGi,min and P_DGi,max are the minimum and maximum active power output of the distributed power supply connected to node i; Q_DGi,min and Q_DGi,max are the minimum and maximum reactive power output of the distributed power supply connected to node i.

Further, the specific steps of step 4 include:

step 4.1: constructing the action space of the discrete agent for network topology optimization, namely the set of network topologies that can be adopted after a line of the flexible power distribution network fails; the action space of the discrete agent is {G(t)};

step 4.2: constructing the action space of the continuous agent for optimizing the controllable active devices; the action space of the continuous agent is {P_k1(t), P_k2(t), Q_k1(t), Q_k2(t), P_DG(t)};

step 4.3: constructing the state spaces of the discrete and continuous agents, describing the system operating state through the source-network-load state; the state space is {P_k1(t), P_k2(t), Q_k1(t), Q_k2(t), P_DG(t), G(t), P_L(t)};

step 4.4: constructing the reward functions of the discrete and continuous agents:

In the formula: w_p and w_g are the weighting coefficients of network loss and the number of switching actions, respectively; M is a large positive number that imposes a heavy penalty when the number of switching actions exceeds its limit.

The reward of the continuous agent is calculated as follows: based on the network topology selected by the discrete agent, the continuous agent optimizes controllable devices such as distributed power supplies and SOPs to reduce network loss and the curtailment of photovoltaic and wind power, with the reward function shown in formula (15).

step 4.5: setting the agent hyperparameters;

step 4.6: training offline the double-agent reinforcement learning model for optimized operation after a fault in the flexible power distribution network;

Moreover, the specific method of step 5 is as follows:

after the load and the wind and photovoltaic forecast curves are obtained, the trained multi-type variable double-agent reinforcement learning collaborative optimization model directly outputs the dynamic reconstruction strategy and the regulation strategy of the controllable active devices, without solving an optimization problem.

The invention has the advantages and beneficial effects that:

1. The invention provides a flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction. It mainly addresses the problems that arise when a large amount of fluctuating distributed generation is connected to the distribution network and causes disturbances beyond the regulation capability of the traditional distribution network: large network loss, low renewable energy consumption, and frequent relay protection switching.

2. To further accelerate convergence, the DQN algorithm with sampling priority is adopted. In this algorithm, the larger a sample's TD error, the greater its influence on the backpropagated gradient and the higher its probability of being sampled. This prevents uniform sampling from burying the scarce but valuable experiences in the experience pool, which could otherwise cause DQN to converge to an undesired suboptimal solution.

3. The invention trains multi-type variable double agents to generate the optimal strategy, with different agents coordinately trained on the integer variables and the continuous variables. At specific time points each agent outputs an action to the same environment, and the resulting change in the environment state influences the agents' next actions (cooperative interaction). Assigning the actions to different agents also reduces the action dimension of each agent, so that the agents converge more reliably.

Drawings

FIG. 1 is a schematic diagram of a simplified network structure based on polynomial multiplication of the present invention;

FIG. 2 is a diagram of the location and topology of the SOP of the present invention in a power distribution network;

FIG. 3 is a diagram of a multi-type variable dual agent reinforcement learning collaborative optimization model of the present invention;

FIG. 4 is a topology diagram of an improved IEEE 33-node power distribution network of the present invention.

Detailed Description

The embodiments of the invention are further described in the following with reference to the drawings:

Step 1, initializing to generate an initial network structure, and importing photovoltaic and wind power historical data and load data; the fluctuation and uncertainty of the active power actually output by wind and photovoltaic generation are described by superimposing prediction errors on the predicted power.
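The forecast-plus-error description of wind and photovoltaic output can be illustrated with a minimal sketch. The source does not state the error distribution, so the independent zero-mean Gaussian error with a standard deviation proportional to the forecast, the clipping to [0, capacity], the function name `sample_actual_output` and the 24-hour PV profile below are all assumptions for illustration.

```python
import random


def sample_actual_output(forecast, rel_sigma, capacity, seed=None):
    """Superimpose a random prediction error on a forecast power curve.

    Assumption (not from the source): errors are independent, zero-mean Gaussian
    with standard deviation rel_sigma * forecast, and the resulting output is
    clipped to the physical range [0, capacity].
    """
    rng = random.Random(seed)
    actual = []
    for p_hat in forecast:
        error = rng.gauss(0.0, rel_sigma * p_hat)  # zero spread when the forecast is zero
        actual.append(min(max(p_hat + error, 0.0), capacity))
    return actual


# Hypothetical 24-hour PV forecast in MW: zero at night, peak at noon.
pv_forecast = [0.0] * 6 + [0.1, 0.3, 0.5, 0.7, 0.85, 0.95, 1.0, 0.95, 0.85, 0.7, 0.5, 0.3, 0.1] + [0.0] * 5
pv_actual = sample_actual_output(pv_forecast, rel_sigma=0.1, capacity=1.2, seed=42)
```

Repeating the sampling with different seeds gives a set of scenarios over which the agents can be trained.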

Step 2, as shown in FIG. 1: based on the initial network structure generated in step 1, the network is reconfigured by the polynomial multiplication method to generate radial network topologies that satisfy network connectivity and relay protection setting requirements. The feasible network topologies after a line fault are formed as follows:

The specific steps of step 2 include:

step 2.1, deleting the faulted line and simplifying the network: delete the faulted line from the initial network structure G0, and merge the two branches incident to every node of degree 2 into a single equivalent branch, yielding the simplified network G1;
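Step 2.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the branch dictionary format, the function name `simplify_network` and the optional `keep` set (nodes, such as the substation, that should never be merged away) are assumptions. Each simplified branch records the chain of original branches it replaces, which is what step 2.3 needs.

```python
from collections import defaultdict


def simplify_network(branches, faulted=(), keep=()):
    """Delete faulted branches, then merge the two branches at every degree-2 node.

    branches: dict mapping original branch id -> (u, v).
    Returns a dict mapping simplified branch id -> (u, v, chain), where chain
    lists the original branch ids merged into that simplified branch.
    """
    edges = {b: (u, v, [b]) for b, (u, v) in branches.items() if b not in faulted}
    merged = True
    while merged:
        merged = False
        incident = defaultdict(list)
        for b, (u, v, _) in edges.items():
            incident[u].append(b)
            incident[v].append(b)
        for node, ids in incident.items():
            if node in keep or len(ids) != 2 or ids[0] == ids[1]:
                continue  # not a mergeable degree-2 node (a lone self-loop is listed twice)
            b1, b2 = ids
            u1, v1, c1 = edges.pop(b1)
            u2, v2, c2 = edges.pop(b2)
            end1 = u1 if v1 == node else v1  # far end of the first branch
            end2 = u2 if v2 == node else v2  # far end of the second branch
            edges[b1] = (end1, end2, c1 + c2)
            merged = True
            break  # degrees changed, so rebuild the incidence lists
    return edges


# Four-node ring 0-1-2-3-0 with an extra branch 5 between nodes 0 and 2.
ring = {1: (0, 1), 2: (1, 2), 3: (2, 3), 4: (3, 0), 5: (0, 2)}
g1 = simplify_network(ring)
```

Here nodes 1 and 3 have degree 2, so G1 collapses to three parallel branches between nodes 0 and 2, with chains [1, 2], [3, 4] and [5].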

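step 2.2, generating all spanning trees of the simplified network by polynomial multiplication: choose a root node and form, for each non-root node, a basic term equal to the sum of the labels of its incident branches; multiply the basic terms, expand the product and combine like terms, then delete the high-power terms and the terms whose coefficient exceeds 1 after combination, which guarantees the connectivity and radiality of every remaining spanning tree;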
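The polynomial-multiplication rule can be checked with a short sketch. Each term of the product picks one incident branch per non-root node. A term that picks the same branch twice is a high-power term. A branch set containing a loop leaves at least one component cut off from the root, and each such component can be produced in two ways (one per direction around its loop), so its coefficient is at least 2; a spanning tree is produced exactly once. The function name `spanning_trees`, the dictionary input format and the choice to ignore self-loops are assumptions. Brute-force expansion grows exponentially with the number of nodes, which is one reason to simplify the network first.

```python
from collections import Counter
from itertools import product


def spanning_trees(edges, root):
    """Enumerate spanning trees by expanding a product of per-node branch sums.

    edges: dict mapping branch label -> (u, v, ...); extra tuple items are ignored.
    Returns a list of frozensets of branch labels, one per spanning tree.
    """
    factors = {}  # non-root node -> labels of its incident branches (basic term)
    nodes = set()
    for label, (u, v, *_) in edges.items():
        nodes.update((u, v))
        if u == v:
            continue  # a self-loop can never belong to a spanning tree
        for node in (u, v):
            if node != root:
                factors.setdefault(node, []).append(label)
    nodes.discard(root)
    if any(node not in factors for node in nodes):
        return []  # some node has no usable branch, so no spanning tree exists
    coefficients = Counter()
    for term in product(*(factors[node] for node in sorted(nodes))):
        branch_set = frozenset(term)
        if len(branch_set) < len(term):
            continue  # high-power term: the same branch was picked twice
        coefficients[branch_set] += 1  # combine like terms
    # Coefficient 1 <=> the branch set is a spanning tree (connected and radial).
    return [branch_set for branch_set, c in coefficients.items() if c == 1]


# Triangle with branches a, b, c and root node 0:
# (a + b)(b + c) = ab + ac + b^2 + bc -> trees {a, b}, {a, c}, {b, c}.
triangle = {"a": (0, 1), "b": (1, 2), "c": (0, 2)}
trees = spanning_trees(triangle, root=0)
```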

step 2.3, mapping the spanning trees of the simplified network to spanning trees of the original network: each spanning tree of the simplified network corresponds to several spanning trees of the original network. For every tree branch of a simplified spanning tree, all corresponding original branches are kept closed; for every link branch (a simplified branch not in the spanning tree), any one of the corresponding original branches may be opened. This yields the set of network topology structures that can be adopted after a line of the flexible distribution network fails.
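Step 2.3 turns each simplified spanning tree into concrete switch configurations. In the minimal sketch below, the function name `open_branch_sets` and the data format (simplified branch id -> (u, v, chain of original branch ids)) are assumptions. Every original branch on a tree branch stays closed and one original branch is opened on each link branch, so one simplified tree expands into the product of the chain lengths of its link branches.

```python
from itertools import product


def open_branch_sets(simplified, tree):
    """Yield the set of original branches to open for each radial configuration.

    simplified: dict simplified id -> (u, v, chain of original branch ids).
    tree: set of simplified ids forming one spanning tree of the simplified network.
    """
    links = [sid for sid in simplified if sid not in tree]  # link (chord) branches
    for opened in product(*(simplified[sid][2] for sid in links)):
        yield frozenset(opened)  # branches in tree-branch chains all stay closed


# Ring 0-1-2-3-0 plus branch 5 (0-2), simplified to three parallel branches 0-2.
g1 = {5: (0, 2, [5]), 1: (0, 2, [1, 2]), 3: (2, 0, [3, 4])}
configs = {t: set(open_branch_sets(g1, {t})) for t in g1}
```

The three simplified trees expand into 4 + 2 + 2 = 8 configurations, which matches the number of spanning trees of the original five-branch, four-node network: each configuration opens two branches and keeps all nodes connected.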

Step 3, based on the feasible network topology structure generated in the step 2, establishing an optimized operation mathematical model after the fault of the flexible power distribution network is recovered;

The specific steps of step 3 include:

The main purpose of optimizing operation after fault recovery of the flexible power distribution network is as follows: after a fault occurs and the circuit breakers on both sides of the fault point are opened, the network is reconfigured so that power supply to users is affected as little as possible, and the operating economy of the distribution network is optimal under the changed network structure. The objective function considers the cumulative active power loss of the system over the optimization period after the disturbance, the number of switching actions during network reconstruction, and the renewable energy consumption level, as expressed by formula (1).
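Formula (1) can be written in the following form, built only from the variable definitions given next. The absolute-value counting of switch state changes, the switch set Ω_sw and the notation P̂_DGi(t) for the predicted DG output are assumptions, not the patent's own notation. The first term is the cumulative network loss, because the sum of the active power injected at all nodes equals the total active power loss; with hourly intervals each power term also equals energy in that hour.

```latex
\min f = w_{loss}\sum_{t=1}^{T}\sum_{i=1}^{n} P_i(t)
       + w_t\sum_{t=1}^{T}\sum_{j\in\Omega_{sw}} \left|T_j(t)-T_j(t-1)\right|
       + w_r\sum_{t=1}^{T}\sum_{i=1}^{n_{DG}} \left(\hat{P}_{DGi}(t)-P_{DGi}(t)\right),
\qquad w_{loss}+w_t+w_r = 1
```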

In the formula: T is the simulation duration, set to one day in this method and optimized in hourly intervals; n is the number of system nodes; P_i(t) is the active power injected at node i during period t; T_j(t) is the state of switch j during period t, equal to 1 when the switch is closed and 0 when it is open; P_DGi(t) is the active power injected by the distributed power supply at node i during period t, and the objective also uses the predicted value of this injection; n_DG is the total number of distributed power supplies. The network topology and the injected power at each node are assumed constant within each unit time interval. w_loss, w_t and w_r are the weighting coefficients expressing the importance of network loss, switching actions and renewable energy consumption, respectively, in the objective function.

The invention uses linear weighting to convert the multi-objective optimization into a single-objective one. The weighting coefficients can be adjusted according to the dispatching requirements of the distribution system, their specific values can be determined by the analytic hierarchy process, and they satisfy w_loss + w_t + w_r = 1.0.

1) Radial constraint

G(t)∈G (2)

2) Switch action times constraint

In dynamic reconfiguration of the distribution network, frequent changes of the network topology degrade circuit breaker performance, shorten breaker service life and adversely affect the transient stability of the system, so the number of breaker operations must be reasonably limited within the optimization period.

In the formula: T_max is the maximum total number of switching actions allowed for network reconfiguration in the optimization period, and T_j,max is the maximum number of actions allowed for switch j in the optimization period. In the invention, the optimization period is 24 hours, the maximum total number of switching actions is 15, and the maximum number of actions of a single switch is 3; Ω_node is the set of system nodes.
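The switching limits can be checked with a small helper. As a sketch under stated assumptions: switch states are given as 0/1 sequences, one entry per hourly period, whose first entry is the state before the optimization period, and one action is counted whenever consecutive states differ. The function names are hypothetical; the limits 15 and 3 are the values stated above.

```python
def switch_actions(states):
    """Count actions per switch: one action whenever consecutive 0/1 states differ."""
    return {j: sum(a != b for a, b in zip(seq, seq[1:])) for j, seq in states.items()}


def within_switch_limits(states, total_max=15, per_switch_max=3):
    """Check the total (T_max) and per-switch (T_j,max) action limits."""
    counts = switch_actions(states)
    return sum(counts.values()) <= total_max and all(c <= per_switch_max for c in counts.values())


# Hypothetical, shortened schedule: S1 opens and later recloses (2 actions), S2 closes once.
schedule = {"S1": [1, 0, 0, 1], "S2": [0, 0, 1, 1]}
```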

3) SOP constraint

As shown in FIG. 2, the SOP consists of fully controlled power electronic devices. Because SOPs are expensive to manufacture, generally only some of the tie switches in a distribution network are replaced by SOPs.

The SOP is assumed to operate normally in all scenarios. Its control variables are the active powers P_k1 and P_k2 transmitted by the SOP and the reactive power support Q_k1 and Q_k2 provided at the connection nodes on its two sides; power losses during SOP operation are neglected. The SOP is subject to a transmitted active power constraint and capacity constraints on both sides.
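With SOP losses neglected, a standard lossless form of these constraints is shown below as a sketch, not necessarily the patent's exact formulation. It assumes that P_k1 and P_k2 are both measured as power injected into the respective connection node, so lossless transfer makes them sum to zero; S_k1max and S_k2max are the converter capacities on the two sides and Ω_SOP is the set of SOP branches.

```latex
P_{k1}(t) + P_{k2}(t) = 0, \qquad k \in \Omega_{SOP}
\sqrt{P_{k1}(t)^2 + Q_{k1}(t)^2} \le S_{k1\max}, \qquad k \in \Omega_{SOP}
\sqrt{P_{k2}(t)^2 + Q_{k2}(t)^2} \le S_{k2\max}, \qquad k \in \Omega_{SOP}
```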

4) Flow balance constraints
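A standard polar-form statement of the nodal power balance, consistent with the symbols defined below, is given here as a sketch; the net injection at node i is written as distributed generation plus SOP injection minus load.

```latex
P_i(t) = P_{DGi}(t) + P_{SOPi}(t) - P_{LDi}(t)
       = V_i(t)\sum_{j=1}^{n} V_j(t)\left(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\right)
Q_i(t) = Q_{DGi}(t) + Q_{SOPi}(t) - Q_{LDi}(t)
       = V_i(t)\sum_{j=1}^{n} V_j(t)\left(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\right)
```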

In the formula: P_i(t) and Q_i(t) are the active and reactive power injected into node i during period t; V_i(t) is the voltage magnitude of node i during period t; G_ij and B_ij are the conductance and susceptance elements of the node admittance matrix; θ_ij is the phase angle difference between nodes i and j; P_DGi(t) and Q_DGi(t) are the active and reactive power injected by the distributed power supply at node i during period t; P_SOPi(t) and Q_SOPi(t) are the active and reactive power injected by the SOP at node i during period t; P_LDi(t) and Q_LDi(t) are the active and reactive power consumed by the load at node i during period t.

5) Node voltage constraint

V_min ≤ V_i(t) ≤ V_max, i ∈ Ω_node (8)

In the formula: V_min and V_max are the lower and upper limits of the node voltage magnitude during system operation.

6) Branch current constraint

I_ij(t) ≤ I_ijmax (9)

In the formula: I_ij(t) is the magnitude of the current flowing in branch ij during period t, and I_ijmax is the maximum current magnitude allowed on branch ij.

7) Distributed power supply output constraints

P_DGi,min ≤ P_DGi(t) ≤ P_DGi,max, i ∈ Ω_DG (10)

Q_DGi,min ≤ Q_DGi(t) ≤ Q_DGi,max, i ∈ Ω_DG (11)

P_DGi,min and P_DGi,max are the minimum and maximum active power output of the distributed power supply connected to node i; Q_DGi,min and Q_DGi,max are the minimum and maximum reactive power output of the distributed power supply connected to node i. In the invention, the renewable wind generating units operate at a constant power factor and training uses forecast data; the output of the controllable distributed power supplies satisfies the above constraints.

As the model above shows, the network reconstruction decision variables are 0-1 variables, the regulation quantities of the active controllable equipment, including the SOP, are continuous variables, and the power flow constraints, SOP operating constraints and related constraints are nonlinear. The multi-period collaborative optimization problem is therefore a large-scale mixed-integer nonlinear programming problem, which poses a great challenge to the solving capability and efficiency of the algorithm.

Step 4, establishing a multi-type variable double-agent reinforcement learning collaborative optimization model based on the model of step 3. Because the model established in step 3 is a large-scale mixed-integer nonlinear programming problem, combining discrete 0-1 reconstruction variables with continuous variables such as distributed generation output and SOP regulation quantities under nonlinear constraints, solving it directly requires a large amount of computation, is slow, and cannot guarantee a globally optimal solution.

Existing research shows that reinforcement learning, with its strong capacity for autonomous learning and exploration, does not depend on accurate forecast data or an exact physical model, adapts to changes in the environment, and handles complex sequential decision problems effectively; it can therefore be applied to mixed-integer programming problems. As shown in FIG. 3, the mixed-integer problem is handled by training two agents to generate the optimal strategy, with different agents coordinately trained on the integer variables and the continuous variables. Accordingly, the invention constructs two agents acting on different time scales: one optimizes the network topology, and the other optimizes the output of the controllable active equipment, including the SOP.

The specific steps of step 4 include:

Step 4.1: constructing the action space of the discrete agent for network topology optimization, namely the set of network topologies that can be adopted after a line of the flexible power distribution network fails; the action space of the discrete agent is {G(t)}.

Step 4.2: constructing the action space of the continuous agent for optimizing the controllable active devices. The continuous agent optimizes system operation by optimizing the reactive power provided on both sides of the SOP and the active power it transmits, and by regulating the controllable distributed power supplies (wind power, photovoltaic and micro gas turbines). The action space of the continuous agent is therefore {P_k1(t), P_k2(t), Q_k1(t), Q_k2(t), P_DG(t)}.
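One way to keep the continuous agent's SOP actions feasible is to scale bounded actor outputs into the lossless SOP region. This is a minimal sketch under assumptions not stated in the source: the actor emits three values in [-1, 1] (the active transfer and the two reactive outputs), P_k2 is set to -P_k1, and each reactive output is scaled into the capacity left on its converter. `sop_setpoints` is a hypothetical name; DG set-points could be scaled into [P_DGi,min, P_DGi,max] in the same way.

```python
import math


def _clip(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))


def sop_setpoints(a_p, a_q1, a_q2, s1_max, s2_max):
    """Map actor outputs in [-1, 1] to SOP set-points (P_k1, P_k2, Q_k1, Q_k2).

    By construction P_k1 + P_k2 = 0 (losses neglected) and
    sqrt(P_k^2 + Q_k^2) <= S_k,max holds on both converters.
    """
    p1 = _clip(a_p) * min(s1_max, s2_max)  # transfer limited by the smaller converter
    p2 = -p1
    q1 = _clip(a_q1) * math.sqrt(max(s1_max**2 - p1**2, 0.0))  # remaining headroom, side 1
    q2 = _clip(a_q2) * math.sqrt(max(s2_max**2 - p2**2, 0.0))  # remaining headroom, side 2
    return p1, p2, q1, q2


p1, p2, q1, q2 = sop_setpoints(1.0, 1.0, -1.0, s1_max=0.5, s2_max=0.4)
```

Enforcing the constraints in the action mapping, rather than through reward penalties, means every action the agent tries is physically realizable.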

Step 4.3: constructing the state spaces of the discrete and continuous agents. The two agents share the distribution network state information and therefore use the same state variables. The system operating state is described by the source-network-load state, so the state space is {P_k1(t), P_k2(t), Q_k1(t), Q_k2(t), P_DG(t), G(t), P_L(t)}.

Step 4.4: constructing the reward functions of the discrete and continuous agents. The immediate reward of the discrete agent consists of network loss and the number of switching actions. Because the algorithm converges toward the maximum reward, the reward is set to the negative of the weighted network loss and number of switching actions:

In the formula: w_p and w_g are the weighting coefficients of network loss and the number of switching actions, respectively; M is a large positive number that imposes a heavy penalty when the number of switching actions exceeds its limit.
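The reward structure can be sketched as below. Only the ingredients come from the text: weighted network loss and switching actions for the discrete agent, a large penalty M when the switching limits are exceeded, and network loss plus renewable curtailment for the continuous agent (formula (15)). The exact functional forms, the default value of M and the curtailment weight `w_c` are hypothetical.

```python
def discrete_reward(p_loss, n_switch, limits_ok, w_p, w_g, big_m=1e4):
    """Negative weighted loss and switching count, minus M if the limits are violated."""
    reward = -(w_p * p_loss + w_g * n_switch)
    if not limits_ok:
        reward -= big_m  # heavy penalty for exceeding T_max or T_j,max
    return reward


def continuous_reward(p_loss, forecast, dispatched, w_p, w_c):
    """Negative weighted loss and renewable curtailment (forecast minus dispatched output)."""
    curtailment = sum(f - d for f, d in zip(forecast, dispatched))
    return -(w_p * p_loss + w_c * curtailment)


# Example: 0.1 MW loss and 2 switch actions within limits -> reward -(0.1 + 1.0).
r_topology = discrete_reward(0.1, 2, True, w_p=1.0, w_g=0.5)
```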

Step 4.5: setting the agent hyperparameters.

The discrete agent uses the DQN algorithm, with an experience pool of 20000, a discount factor γ of 0.9, a batch size of 32 and a learning rate β of 0.001, and its target network is updated once every 200 steps. The continuous agent uses the actor-critic (AC) algorithm, with learning rates α = 0.001 for the actor network and β = 0.01 for the critic network and a discount factor γ of 0.9. The neural networks are fully connected, with 128 and 256 neurons in the two hidden layers, respectively, and use the ReLU activation function.
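The stated hyperparameters can be collected in a configuration sketch. The values are those listed above; the class and field names, and the assumption that the DQN output layer has one Q-value per candidate topology, are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteAgentConfig:
    """DQN topology agent."""
    replay_size: int = 20000
    gamma: float = 0.9
    batch_size: int = 32
    learning_rate: float = 0.001  # beta in the text
    target_sync_steps: int = 200  # copy online weights to the target network every 200 steps


@dataclass(frozen=True)
class ContinuousAgentConfig:
    """Actor-critic device agent."""
    actor_lr: float = 0.001  # alpha
    critic_lr: float = 0.01  # beta
    gamma: float = 0.9


HIDDEN_LAYERS = (128, 256)  # fully connected, ReLU after each hidden layer


def layer_sizes(input_dim, output_dim, hidden=HIDDEN_LAYERS):
    return [input_dim, *hidden, output_dim]


def parameter_count(sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Example: a 10-dimensional state and 5 candidate topologies for the DQN head.
dqn_layers = layer_sizes(10, 5)
```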

Step 4.6: offline training of the double-agent reinforcement learning model for optimized operation after a fault in the flexible power distribution network.

To further accelerate convergence and improve training efficiency, the discrete agent for network topology selection uses the DQN algorithm, and each sample is stored in the experience pool with a priority δ proportional to the absolute value of its temporal-difference error (TD error). Because the topology of the power network cannot change frequently, this agent acts on a longer time scale: its action interval is d1, generally set in hours.
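The prioritized experience pool of the discrete agent can be sketched as a proportional scheme. This list-based version samples in O(N) time; practical implementations usually use a sum tree. The small constant `eps` (so zero-error samples can still be drawn), the exponent `alpha` (1.0 gives priorities exactly proportional to |TD error| + eps) and the omission of importance-sampling weights are assumptions taken from the common proportional variant of prioritized replay, not details stated in the source.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized experience replay for the DQN topology agent."""

    def __init__(self, capacity=20000, alpha=1.0, eps=1e-6, seed=None):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the pool is full
        self.rng = random.Random(seed)

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(self._priority(td_error))
        else:  # ring buffer: overwrite the oldest transition
            self.data[self.pos] = transition
            self.priorities[self.pos] = self._priority(td_error)
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size=32):
        """Draw indices with probability proportional to priority (with replacement)."""
        idx = self.rng.choices(range(len(self.data)), weights=self.priorities, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update(self, indices, td_errors):
        """Refresh priorities with the TD errors recomputed after a learning step."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = self._priority(err)


pool = PrioritizedReplay(capacity=3, seed=0)
pool.add("low-error transition", td_error=0.0)
pool.add("high-error transition", td_error=10.0)
indices, batch = pool.sample(batch_size=1000)
```

With these two entries, almost every draw returns the high-error transition, which is the behavior the text relies on to keep rare but informative samples from being drowned out.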

The continuous agent that regulates the controllable active equipment can adjust quickly; its action interval is d2, generally set in minutes.

During training, the two agents are trained collaboratively and share the state information of the distribution network. At their respective decision points, the discrete and continuous agents select actions based on the current state of the distribution network dispatching center; each action changes the operating state of the distribution network and thereby influences the action selection of the other agent.
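The two-time-scale interaction can be illustrated with a rollout skeleton. It is a minimal sketch: `ToyGridEnv`, the placeholder policies and the values d1 = 60 min and d2 = 15 min are hypothetical (the source only says d1 is set in hours and d2 in minutes), power-flow simulation and learning updates are omitted, and d1 is assumed to be a whole multiple of d2.

```python
class ToyGridEnv:
    """Hypothetical placeholder for the distribution-network simulator (no power flow)."""

    def reset(self):
        self.state = {"minute": 0, "topology": None, "setpoints": None}
        return dict(self.state)

    def apply_topology(self, topology):
        self.state["topology"] = topology
        return dict(self.state)

    def apply_setpoints(self, setpoints, minutes):
        self.state["setpoints"] = setpoints
        self.state["minute"] += minutes
        return dict(self.state)


def run_episode(env, topology_policy, device_policy, horizon_min=1440, d1_min=60, d2_min=15):
    """Roll out one day: discrete agent every d1 minutes, continuous agent every d2."""
    if d1_min % d2_min:
        raise ValueError("d1 must be an integer multiple of d2")
    state = env.reset()
    events = []
    for minute in range(0, horizon_min, d2_min):
        if minute % d1_min == 0:
            # Slow time scale: pick a radial topology G(t) from the candidate set.
            state = env.apply_topology(topology_policy(state))
            events.append((minute, "discrete"))
        # Fast time scale: SOP and DG set-points under the topology now in the shared state.
        state = env.apply_setpoints(device_policy(state), d2_min)
        events.append((minute, "continuous"))
    return events


events = run_episode(ToyGridEnv(), topology_policy=lambda s: 0, device_policy=lambda s: (0.0, 0.0))
```

Because both policies read the same state, each topology change is immediately visible to the device agent, and the device set-points form part of the state the topology agent sees at its next decision.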

Step 5: making online decisions to output the dynamic reconstruction and the regulation strategy of the controllable active equipment;

the specific method of step 5 is as follows:

The working principle of the invention is as follows:

the invention provides a disturbance recovery optimization method for flexible power distribution networks based on dynamic reconstruction. It addresses the poor economic performance that arises when a disturbance inside a distribution network containing a soft open point (SOP) forces a change in the network structure. The method specifically comprises the following steps:

step 1: initialize and generate the initial network structure; import historical photovoltaic and wind power data and load data; import the impedance data of each branch of the improved IEEE 33-node distribution network (as shown in figure 4), convert them to per-unit values, and define the active and reactive power of each node; import the day-ahead load curve data and calculate the total load; set the capacities of the converters on both sides of the SOP, and add controllable distributed power supplies to the improved IEEE 33-node network.
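The per-unit conversion and total-load calculation of step 1 can be sketched as follows. The base values used here (12.66 kV and 10 MVA) are a common choice for the IEEE 33-node system but are an assumption, since the source does not state them; the function names are hypothetical. Impedances are divided by Z_base = V_base² / S_base, and loads given in kW/kvar are divided by S_base expressed in kVA.

```python
def impedance_base(v_base_kv: float, s_base_mva: float) -> float:
    """Base impedance in ohms: (kV)^2 / MVA."""
    return v_base_kv ** 2 / s_base_mva


def branch_to_pu(branches, v_base_kv=12.66, s_base_mva=10.0):
    """Convert (from, to, r_ohm, x_ohm) tuples to per-unit (assumed bases)."""
    z_base = impedance_base(v_base_kv, s_base_mva)
    return [(i, j, r / z_base, x / z_base) for i, j, r, x in branches]


def loads_to_pu(loads_kw_kvar, s_base_mva=10.0):
    """Convert {node: (P_kW, Q_kvar)} to per-unit on the assumed power base."""
    s_base_kva = s_base_mva * 1000.0
    return {n: (p / s_base_kva, q / s_base_kva) for n, (p, q) in loads_kw_kvar.items()}


def total_load(loads):
    """Total active and reactive load of a {node: (P, Q)} mapping."""
    return sum(p for p, _ in loads.values()), sum(q for _, q in loads.values())
```

With these assumed bases, Z_base ≈ 16.03 Ω, so a branch of 0.0922 Ω becomes roughly 0.00575 p.u.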

Step 2: based on the initial network structure generated in step 1, reconstruct the network using the polynomial multiplication method to generate radial network topologies that satisfy the network connectivity and relay protection setting requirements. If each switch were treated as an individual variable and the radial constraint were added directly to the reinforcement learning reward function to constrain the agent's actions, the number of variables would be large, the action space formed by their combinations would be huge and full of infeasible solutions, and the limit on the number of switching actions would also have to be handled; together, these factors would make the agent extremely difficult to optimize, inefficient, and hard to converge. The invention instead guarantees radial topologies through the polynomial multiplication reduction technique, which shrinks the action space while still yielding accurate solutions.
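The polynomial multiplication method can be sketched as follows: each non-root node contributes a basic term equal to the sum of the labels of its incident branches, and the product of all basic terms is expanded. The filtering rule used here (discard any term in which a branch label appears squared, then keep only terms with coefficient 1, since a term containing a cycle arises from two orientations of that cycle and therefore has an even coefficient) is the usual form of the method but is stated as an assumption, because the source only describes how the basic terms are formed. The function name and data layout are hypothetical, and self-loops are assumed absent.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (node_a, node_b, branch label)


def spanning_trees(nodes: List[Hashable], edges: List[Edge], root: Hashable) -> List[FrozenSet[str]]:
    """Enumerate spanning trees by expanding the product of node polynomials."""
    # Basic term of each node: the labels of all branches connected to it.
    incident = defaultdict(list)
    for a, b, label in edges:
        incident[a].append(label)
        incident[b].append(label)
    # Polynomial stored as {monomial (set of labels): coefficient}; start from 1.
    poly: Dict[FrozenSet[str], int] = {frozenset(): 1}
    for v in nodes:
        if v == root:
            continue
        nxt: Dict[FrozenSet[str], int] = defaultdict(int)
        for term, coef in poly.items():
            for label in incident[v]:
                if label in term:  # label would appear squared: drop the term
                    continue
                nxt[term | {label}] += coef
        poly = nxt
    # Coefficient 1 <=> the label set is a spanning tree (cycles give 2^k).
    return [term for term, coef in poly.items() if coef == 1]
```

For a triangle with branches a, b, c this returns the three two-branch trees, and for the complete graph on four nodes it returns 16 trees, matching Cayley's formula.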

Step 3: based on the feasible network topologies generated in step 2, take the distributed power output, the SOP power and the load recovery state as decision variables, and establish the optimal operation mathematical model of the flexible distribution network after fault recovery, whose objective is to minimize the weighted sum of the system's accumulated active power loss over the optimization period after the disturbance, the number of switching actions during network reconstruction, and the renewable energy consumption term. Add the radial constraint, switching action count constraint, SOP constraints, power flow balance constraint, node voltage constraint, branch current constraint and distributed power output constraints.

Step 4: the model established in step 3 for optimal operation of the flexible distribution network after fault recovery is a large-scale mixed-integer nonlinear programming problem: the network reconstruction decision variables are 0-1 (discrete) variables, the regulation quantities of the active controllable equipment including the SOP are continuous variables, and the power flow constraints, SOP operating constraints and related constraints are nonlinear. Two agents of different types, acting on different time scales, are therefore constructed to optimize, respectively, the network topology and the output of the controllable active equipment including the soft open point.

It should be emphasized that the embodiments described herein are illustrative rather than restrictive. The present invention therefore includes, but is not limited to, the embodiments described in this detailed description; other embodiments that those skilled in the art can derive from the teachings herein also fall within the scope of the present invention.

Claims

1. A flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction, characterized by comprising the following steps:

step 2, generating, based on the network structure initialized in step 1, the set of network topologies that can be adopted after a flexible distribution line fails;

step 4, based on the optimal operation mathematical model of the flexible power distribution network after fault recovery established in step 3, establishing a dual-agent reinforcement learning collaborative optimization model for multiple variable types;

step 5, outputting the dynamic reconstruction and the regulation strategy of the controllable active equipment through online decision-making.

2. The flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction according to claim 1, wherein step 2 specifically comprises:

step 2.2, generating all spanning trees of the simplified network based on the polynomial multiplication method: based on the node-branch information of G1, a root node is chosen, and one basic term of the polynomial is formed for each non-root node, each basic term being the sum of the labels of all branches connected to the corresponding node;

step 2.3, mapping the spanning trees of the simplified network to spanning trees of the original network: each spanning tree of the simplified network corresponds to several spanning trees of the original network. For each tree branch of a simplified spanning tree, all corresponding branches of the initial network structure are retained; for each link (a branch of the simplified network that is not in the spanning tree), any one of the corresponding branches of the initial network structure may be opened. This generates the set of network topologies that can be adopted after a flexible distribution line fails.
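The mapping in step 2.3 can be sketched as a Cartesian product: for one simplified spanning tree, every link contributes a choice of exactly one original branch to open, and tree branches keep all their original branches closed. The data layout (a dictionary `chains` from each simplified branch label to the original branches it aggregates) and the function name are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence


def expand_tree(tree: FrozenSet[str], chains: Dict[str, Sequence[str]]) -> List[FrozenSet[str]]:
    """Return the sets of *open* original branches for one simplified spanning tree.

    chains maps each simplified branch label to the original branches it
    represents (hypothetical layout). Tree branches keep all of their original
    branches closed; each link opens exactly one of its original branches.
    """
    links = sorted(label for label in chains if label not in tree)  # co-tree branches
    return [frozenset(choice) for choice in product(*(chains[l] for l in links))]
```

Each simplified tree therefore yields as many original topologies as the product of the chain lengths of its links; a tree with no links yields a single topology with no branch opened.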

3. The flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction according to claim 1, wherein step 3 specifically comprises:

the objective function considers the system's accumulated active power loss over the optimization period after the disturbance, the number of switching actions during network reconstruction, and the renewable energy consumption level, as shown in formula (1):

in the formula: $T$ is the simulation duration, set to one day and optimized hour by hour; $n$ is the number of system nodes; $P_i(t)$ is the active power injected at node $i$ in period $t$; $T_j(t)$ is the state of switch $j$ in period $t$, equal to 1 when the switch is closed and 0 when it is open; $P_{DGi}(t)$ is the active power injected by the distributed power supply at node $i$ in period $t$, and the predicted value of this injection in period $t$ is also used; $N_{DG}$ is the total number of distributed power supplies; the network topology and the injected power of each node are assumed to remain unchanged within each unit time interval; $w_{loss}$, $w_t$ and $w_r$ are the weighting coefficients expressing the importance of the network loss, the switching actions and the renewable energy consumption in the objective function;
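Formula (1) itself is not reproduced here, so the sketch below assumes one plausible form consistent with the variable definitions: network loss as the sum of nodal active injections, switching actions as the number of state changes of each switch between consecutive periods, and a renewable term as the gap between forecast and dispatched distributed generation. This assumed form, the function name and the argument layout are illustrative only and may differ from the patented expression.

```python
def objective(p_inj, switch_state, p_dg, p_dg_pred, w_loss, w_t, w_r, initial_state=None):
    """Assumed weighted objective (formula (1) form not confirmed by the text).

    p_inj[t][i]        active power injected at node i in hour t; summing over
                       all nodes gives the network loss of that hour
    switch_state[t][j] 1 if switch j is closed in hour t, else 0
    p_dg[t][k]         dispatched active power of DG k in hour t
    p_dg_pred[t][k]    forecast active power of DG k in hour t
    """
    loss = sum(sum(row) for row in p_inj)
    # Count switching actions as state changes between consecutive hours.
    prev = initial_state if initial_state is not None else switch_state[0]
    switching = 0
    for row in switch_state:
        switching += sum(abs(a - b) for a, b in zip(row, prev))
        prev = row
    # Assumed renewable term: forecast output that was not used.
    curtail = sum(
        sum(f - d for d, f in zip(drow, frow)) for drow, frow in zip(p_dg, p_dg_pred)
    )
    return w_loss * loss + w_t * switching + w_r * curtail
```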

1) Radial constraint

$G(t) \in \mathcal{G}$ (2)

In the formula: $G(t)$ is the network structure adopted in period $t$; $\mathcal{G}$ is the set of network topologies that strictly satisfy the radial constraint when the branches containing SOPs are disregarded, and that do not contain the faulted line;

2) Switching action count constraint

In the formula: $T_{\max}$ is the maximum total number of switching actions allowed for network reconfiguration in the optimization period, and $T_{j,\max}$ is the maximum number of actions allowed for switch $j$ in the optimization period; in the invention, the optimization period is 24 hours, the maximum total number of switching actions is 15, and the maximum number of actions of a single switch is 3; $\Omega_{node}$ is the set of system nodes;
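Because the constraint formula itself did not survive, the check below is a sketch built only from the stated limits (15 actions in total and 3 per switch over 24 hours). It counts an action whenever a switch changes state between consecutive periods; the function names are hypothetical.

```python
def switch_actions(states):
    """states[t][j] in {0, 1}; return the number of actions of each switch."""
    counts = [0] * len(states[0])
    for prev, cur in zip(states, states[1:]):
        for j, (a, b) in enumerate(zip(prev, cur)):
            counts[j] += int(a != b)  # one action per state change
    return counts


def switching_feasible(states, t_max=15, tj_max=3):
    """Check the total and per-switch limits quoted in the text."""
    counts = switch_actions(states)
    return sum(counts) <= t_max and all(c <= tj_max for c in counts)
```

For example, a switch that opens and closes alternately over five consecutive periods performs four actions and violates the per-switch limit of 3.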

3) SOP constraints

The SOP is subject to a transmitted active power constraint and capacity constraints on both sides, as follows:

in the formula: $S_{k1,\max}$ and $S_{k2,\max}$ are the maximum capacities of the converters on the two sides of the SOP; $\Omega_{SOP}$ is the set of branches where SOPs are installed;
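The SOP constraint formulas are not reproduced here, so the feasibility check below assumes a common simplified model: lossless active power transfer between the two converters ($P_{k1} + P_{k2} = 0$) and an apparent-power limit on each converter ($\sqrt{P^2 + Q^2} \le S_{\max}$). Converter losses, which some formulations include, are ignored; the function name and tolerance are hypothetical.

```python
import math


def sop_feasible(p1, q1, p2, q2, s1_max, s2_max, tol=1e-6):
    """Check an SOP operating point under an assumed lossless model."""
    balance = abs(p1 + p2) <= tol               # assumed: P_k1 + P_k2 = 0
    cap1 = math.hypot(p1, q1) <= s1_max + tol   # converter 1 capacity
    cap2 = math.hypot(p2, q2) <= s2_max + tol   # converter 2 capacity
    return balance and cap1 and cap2
```

The reactive powers of the two converters are independent in this model, which is why SOPs can support voltage on both feeders while transferring active power.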

4) Power flow balance constraint

In the formula: $P_i(t)$ and $Q_i(t)$ are the active and reactive power injected into node $i$ in period $t$; $V_i(t)$ is the voltage magnitude of node $i$ in period $t$; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance entries of the node admittance matrix; $\theta_{ij}$ is the phase angle difference between nodes $i$ and $j$; $P_{DGi}(t)$ and $Q_{DGi}(t)$ are the active and reactive power injected by the distributed power supply at node $i$ in period $t$; $P_{SOPi}(t)$ and $Q_{SOPi}(t)$ are the active and reactive power injected by the SOP at node $i$ in period $t$; $P_{LDi}(t)$ and $Q_{LDi}(t)$ are the active and reactive power consumed by the load at node $i$ in period $t$;
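Formula (7) is not reproduced here; the sketch below assumes the standard polar-form AC power flow equations that match the listed symbols, $P_i = V_i \sum_j V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij})$ and $Q_i = V_i \sum_j V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij})$, together with the nodal balance $P_{DGi} + P_{SOPi} - P_{LDi} = P_i$ (and likewise for $Q$). The function names are hypothetical; a zero mismatch means the balance constraint holds at every node.

```python
import math


def injections(V, theta, G, B):
    """Nodal P and Q from voltages and the admittance matrix (polar form)."""
    n = len(V)
    P, Q = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            th = theta[i] - theta[j]  # theta_ij
            P[i] += V[i] * V[j] * (G[i][j] * math.cos(th) + B[i][j] * math.sin(th))
            Q[i] += V[i] * V[j] * (G[i][j] * math.sin(th) - B[i][j] * math.cos(th))
    return P, Q


def mismatch(V, theta, G, B, p_dg, q_dg, p_sop, q_sop, p_ld, q_ld):
    """Specified net injection (DG + SOP - load) minus computed injection."""
    P, Q = injections(V, theta, G, B)
    dP = [p_dg[i] + p_sop[i] - p_ld[i] - P[i] for i in range(len(V))]
    dQ = [q_dg[i] + q_sop[i] - q_ld[i] - Q[i] for i in range(len(V))]
    return dP, dQ
```

For a single line, the sum of the two computed active injections equals the line loss, which is why the sum of nodal injections can serve as the network loss measure.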

5) Node voltage constraint

$V_{\min} \le V_i(t) \le V_{\max}, \quad i \in \Omega_{node}$ (8)

In the formula: $V_{\min}$ and $V_{\max}$ are the lower and upper limits of the node voltage magnitude during system operation;

6) Branch current constraint

In the formula: $I_{ij}(t)$ is the magnitude of the current flowing in branch $ij$ in period $t$, and $I_{ij,\max}$ is the maximum current magnitude allowed in branch $ij$;

7) Distributed power supply output constraints

$P_{DGi,\min} \le P_{DGi}(t) \le P_{DGi,\max}, \quad i \in \Omega_{DG}$ (10)

$Q_{DGi,\min} \le Q_{DGi}(t) \le Q_{DGi,\max}, \quad i \in \Omega_{DG}$ (11)

In the formula: $P_{DGi,\min}$ and $P_{DGi,\max}$ are the minimum and maximum active power output of the distributed power supply connected to node $i$; $Q_{DGi,\min}$ and $Q_{DGi,\max}$ are the minimum and maximum reactive power output of the distributed power supply connected to node $i$.
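The box constraints (8), (10) and (11), together with the branch current limit (assumed here to take the form $I_{ij}(t) \le I_{ij,\max}$, since formula (9) is not reproduced), can all be screened with one bounds check. The function names and the dictionary keys are hypothetical; bounds may be scalars or per-element lists.

```python
def violations(values, lower, upper):
    """Indices whose value lies outside [lower, upper].

    lower/upper may be scalars or per-element lists.
    """
    n = len(values)
    lo = lower if isinstance(lower, (list, tuple)) else [lower] * n
    hi = upper if isinstance(upper, (list, tuple)) else [upper] * n
    return [k for k, v in enumerate(values) if v < lo[k] or v > hi[k]]


def check_limits(v, v_min, v_max, i_branch, i_max,
                 p_dg, p_dg_min, p_dg_max, q_dg, q_dg_min, q_dg_max):
    """Group violations by constraint for one time period."""
    return {
        "voltage (8)": violations(v, v_min, v_max),
        "branch current (9)": violations(i_branch, 0.0, i_max),  # assumed upper limit only
        "DG active (10)": violations(p_dg, p_dg_min, p_dg_max),
        "DG reactive (11)": violations(q_dg, q_dg_min, q_dg_max),
    }
```

The voltage limits used in any example (such as 0.95 to 1.05 p.u.) would be an assumption, since the text does not specify $V_{\min}$ and $V_{\max}$.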

4. The flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction according to claim 1, wherein step 4 specifically comprises:

step 4.1: constructing the discrete agent action space for network topology optimization, namely the set of network topologies that can be adopted after a line of the flexible power distribution network fails; the action space of the discrete agent is $\{G(t)\}$;

Step 4.2: constructing the continuous agent action space for optimizing the controllable active equipment; the action space of the continuous agent is $\{P_{k1}(t), P_{k2}(t), Q_{k1}(t), Q_{k2}(t), P_{DG}(t)\}$;
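In practice the discrete action is an index into the topology set $\{G(t)\}$, while a continuous actor typically emits values in [-1, 1] that must be mapped to physical ranges. The sketch below assumes a tanh-bounded actor output and a linear mapping to per-variable bounds for $[P_{k1}, P_{k2}, Q_{k1}, Q_{k2}, P_{DG}]$; neither the squashing function nor the bounds are specified in the source, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def scale_continuous(raw: Sequence[float], bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Map actor outputs in [-1, 1] (assumed tanh output) to physical ranges.

    bounds holds one (lower, upper) pair per action component, e.g. for
    [P_k1, P_k2, Q_k1, Q_k2, P_DG].
    """
    out = []
    for a, (lo, hi) in zip(raw, bounds):
        a = max(-1.0, min(1.0, a))  # clip in case the actor overshoots
        out.append(lo + (a + 1.0) * (hi - lo) / 2.0)
    return out
```

Scaling $P_{k1}$ and $P_{k2}$ independently does not by itself enforce the SOP active power balance; one possible design choice is to let the agent output only $P_{k1}$ and derive $P_{k2}$ from the balance relation, at the cost of departing from the five-component action space listed above.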

in the formula: $w_p$ and $w_g$ are the weighting coefficients of the network loss and the number of switching actions, respectively; $M$ is a large positive number that imposes a heavy penalty when the number of switching actions exceeds its limit;
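The reward expression itself is not reproduced in this text. Assuming it penalizes the weighted network loss and switching count and subtracts the large constant $M$ whenever the switching limit is violated, a per-step reward could be sketched as below; the function name is hypothetical and the weights must be supplied by the user, since their values are not given.

```python
def reward(p_loss, n_switch, switch_ok, w_p, w_g, big_m):
    """Assumed reward: negative weighted cost, minus M on switching violation.

    p_loss    network active power loss in the step
    n_switch  number of switching actions in the step
    switch_ok False if the switching-count limits are exceeded
    """
    r = -(w_p * p_loss + w_g * n_switch)
    if not switch_ok:
        r -= big_m  # large penalty when the switching limit is exceeded
    return r
```

Choosing $M$ much larger than any achievable weighted cost makes violating the switching limit strictly worse than every feasible action, which steers the agent back into the feasible region during training.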

step 4.5: setting the agent hyper-parameters;

step 4.6: offline training of the dual-agent reinforcement learning model for optimal operation of the flexible power distribution network after a fault.

5. The flexible power distribution network disturbance recovery method suitable for full-time dynamic reconstruction according to claim 1, wherein the specific method of step 5 comprises: