CN114282330A - Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network - Google Patents

Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network

Info

Publication number
CN114282330A
CN114282330A (application CN202111625067.0A)
Authority
CN
China
Prior art keywords
network
distribution network
depth
dynamic
power distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111625067.0A
Other languages
Chinese (zh)
Inventor
张玉敏
吉兴全
尹孜阳
张旋
于一潇
杨子震
刘志强
朱应业
赵国航
刘小虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Science and Technology
Priority to CN202111625067.0A
Publication of CN114282330A
Legal status: Pending

Abstract

The present disclosure provides a real-time dynamic reconfiguration method and system for a distribution network based on a branched dual-depth Q network. The reconfiguration method comprises the following steps: acquiring the real-time node loads and distributed generation output of the distribution network; feeding the acquired data into a dynamic DNR model constructed on the basis of a Markov Decision Process (MDP), the dynamic DNR model taking minimization of the network loss cost and the switching action cost as its objective function; obtaining a branched dual-depth Q network based on loop decomposition of the distribution network, solving the dynamic DNR model with a Q-learning algorithm to obtain the switching action set that maximizes the return output by the branched dual-depth Q network, and updating the distribution network topology according to that switching action set. By mining the time-series dynamic relationship between the dynamic DNR decision variables and the decision results, the method requires neither power flow modeling nor stage-by-stage decisions during online application, does not rely on day-ahead forecasts of load and distributed generation output, and can greatly improve the operating performance of the distribution network.

Description

Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network
Technical Field
The disclosure relates to the technical field of power distribution network reconstruction, in particular to a method and a system for dynamically reconstructing a power distribution network in real time based on a branched dual-depth Q network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Distribution Network Reconfiguration (DNR) is an important function in a distribution management system and generally aims to minimize network loss and improve power quality and supply reliability. DNR falls into two categories: static reconfiguration and dynamic reconfiguration. Dynamic reconfiguration can guarantee the safe, high-quality and economic operation of the distribution network and, compared with static reconfiguration, better matches the actual operation and dispatch requirements of the Distribution Network (DN).
A dynamic reconfiguration algorithm based on a Long Short-Term Memory (LSTM) network model and a switching action function can effectively solve the dynamic Distribution Network Reconfiguration (DNR) problem, but it must solve the dynamic reconfiguration problem in two separate steps and requires power flow calculation during execution, while real-time power flow modeling is often difficult for distribution networks with limited observability.
Disclosure of Invention
To solve the above problems, the present disclosure provides a distribution network real-time dynamic reconfiguration method and system based on a branched dual-depth Q network. By mining the time-series dynamic relationship between the dynamic DNR decision variables and the decision results, the method requires neither power flow modeling nor stage-by-stage decisions during online application, does not depend on day-ahead forecasts of load and distributed generation output, and can greatly improve the operating performance of the distribution network.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
one or more embodiments provide a distribution network real-time dynamic reconstruction method based on a branched dual-depth Q network, including the following processes:
acquiring real-time node load and distributed power supply output of a power distribution network;
transmitting the acquired data to a dynamic DNR model constructed based on a Markov Decision Process (MDP); the dynamic DNR model takes the minimum network loss cost and the switch action cost as objective functions;
and obtaining a branch dual-depth Q network based on the loop decomposition of the power distribution network, solving the dynamic DNR model by adopting a Q learning algorithm to obtain a switching action set which enables the output return of the branch dual-depth Q network to be maximum, and updating the topological structure of the power distribution network according to the switching action set.
One or more embodiments provide a distribution network real-time dynamic reconfiguration system based on a branched dual-depth Q network, including:
an acquisition module: configured for acquiring real-time node loads and distributed power generation output of a power distribution network;
a Markov decision building module: configured for transferring the acquired data to a dynamic DNR model constructed based on a Markov decision process MDP; the dynamic DNR model takes the minimum network loss cost and the switch action cost as objective functions;
the power distribution network dynamic reconfiguration module: the method is configured to obtain a branched dual-depth Q network based on power distribution network loop decomposition, solve the dynamic DNR model by adopting a Q learning algorithm, obtain a switching action set enabling the branched dual-depth Q network to output maximum return, and update the topological structure of the power distribution network according to the switching action set.
An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions, when executed by the processor, performing the steps of the above method.
Compared with the prior art, the beneficial effect of this disclosure is:
the method is based on the loop decomposition of the power distribution network, improves the network structure of the Q learning algorithm, and obtains the deep reinforcement learning algorithm of the Branched Double Deep Q Network (BDDQN) to realize the solution of the dynamic DNR model. The BDDQN algorithm can search for the optimal decision of the distribution network Markov dynamic reconstruction model in an iterative mode, load flow calculation is not needed in the execution process, less operation cost can be generated through a dynamic reconstruction solution given by the BDDQN, and the operation performance of the system is greatly improved.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.
Fig. 1 is a flow chart of a reconstruction method of embodiment 1 of the present disclosure;
fig. 2 is a markov dynamic reconfiguration decision framework diagram of the power distribution network according to embodiment 1 of the present disclosure;
fig. 3 is a dual depth Q network prior to modification of embodiment 1 of the present disclosure;
fig. 4 is a branched dual-depth Q network based on a loop improvement of a distribution network according to embodiment 1 of the present disclosure;
FIG. 5(a) is node load data of an IEEE 33 node system simulation test set in an example analysis of embodiment 1 of the present disclosure;
fig. 5(b) is DG output sequence data of the IEEE 33 node system simulation test set in the example analysis of embodiment 1 of the present disclosure;
FIG. 6 is a comparison of simulation results of an IEEE 33 node system in an example analysis of embodiment 1 of the present disclosure;
fig. 7(a) is a comparison graph of the comprehensive operating cost of the BDDQN algorithm and the static reconstruction algorithm of the present embodiment after reconstruction in the IEEE 33 node system simulation of the embodiment 1 of the present disclosure;
fig. 7(b) is a graph comparing the loss of the network generated by the BDDQN algorithm and the static reconfiguration algorithm in the simulation of the IEEE 33 node system in embodiment 1 of the disclosure;
fig. 7(c) is a graph comparing the lowest node voltage after the BDDQN algorithm and the static reconstruction algorithm of the present embodiment are reconstructed in the IEEE 33 node system simulation of embodiment 1 of the present disclosure;
FIG. 8(a) is node load data of 185 node system simulation test set in the example analysis of embodiment 1 of the present disclosure;
fig. 8(b) is DG output sequence data of 185 node system simulation test set in the example analysis of embodiment 1 of the present disclosure;
fig. 9(a) is a comparison graph of the comprehensive operating cost of the BDDQN algorithm and the static reconstruction algorithm of the present embodiment after reconstruction in the 185-node system simulation of embodiment 1 of the present disclosure;
fig. 9(b) is a graph comparing the loss of the network generated by the BDDQN algorithm and the static reconstruction algorithm in the 185-node system simulation of embodiment 1 of the present disclosure;
fig. 9(c) is a comparison diagram of the number of switching operations after the BDDQN algorithm and the static reconstruction algorithm of the embodiment are reconstructed in the 185-node system simulation of embodiment 1 of the present disclosure.
Detailed Description
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments in the present disclosure may be combined with each other. The embodiments will be described in detail below with reference to the accompanying drawings.
The method first establishes a Markov dynamic reconfiguration decision process for the distribution network, based on the mathematical model of dynamic distribution network reconfiguration combined with the Markov decision process, with the objective of minimizing the network loss cost and the switching action cost. Then, considering the characteristics of a distribution network that has a multi-loop structure but is operated radially, the traditional Double Deep Q Network (DDQN) algorithm is improved and a Branched Double Deep Q Network (BDDQN) deep reinforcement learning algorithm based on loop decomposition is proposed to solve the dynamic DNR. Compared with traditional methods, the method of the present disclosure uses the BDDQN algorithm to mine the time-series dynamic relationship between the dynamic DNR decision variables and the decision results, requires neither power flow modeling nor stage-by-stage decisions during online application, and does not depend on day-ahead forecasts of load and Distributed Generation (DG) output. The example analysis shows that the method can effectively improve system operating performance and economy. The following embodiments are given by way of illustration.
Example 1
In the technical solutions disclosed in one or more embodiments, as shown in fig. 1, a method for dynamically reconstructing a distribution network in real time based on a branched dual-depth Q network includes the following steps:
step 1, acquiring real-time node load and distributed power supply output of a power distribution network;
step 2, transmitting the acquired data to a dynamic DNR model constructed based on the Markov decision process MDP; the dynamic DNR model takes the minimum network loss cost and the switch action cost as objective functions;
step 3, improving the structure of the Q-learning algorithm network based on the distribution network loop decomposition to obtain a branched dual-depth Q network, solving the dynamic DNR model with the Q-learning algorithm to obtain the switching action set that maximizes the return output by the branched dual-depth Q network, and updating the topological structure of the distribution network according to that switching action set.
In this embodiment, a network structure of a Q learning algorithm is improved based on loop decomposition of a power distribution network, and a deep reinforcement learning algorithm of a Branch Double Deep Q Network (BDDQN) is obtained to implement solution of a dynamic DNR model. The BDDQN algorithm can search for the optimal decision of the distribution network Markov dynamic reconstruction model in an iterative mode, load flow calculation is not needed in the execution process, less operation cost can be generated through a dynamic reconstruction solution given by the BDDQN, and the operation performance of the system is greatly improved.
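As an illustration only, the sketch below shows how the three steps above could be chained during online application, assuming an already trained branched dual-depth Q network; the function and variable names (q_net, flm, etc.) are hypothetical placeholders rather than part of the disclosure.

```python
import numpy as np
import torch


def online_reconfiguration_step(q_net, p_inj, q_inj, open_switches, flm):
    """One real-time decision: build the MDP state, query the trained
    branched dual-depth Q network, and pick one switch per basic loop.

    q_net         : trained branched Q network returning an L x d value matrix
    p_inj, q_inj  : real-time node injected active/reactive power (1-D arrays)
    open_switches : encoding of the currently open switches (1-D array)
    flm           : fundamental loop matrix; 0 marks padded positions
    """
    # Steps 1-2: assemble the state expected by the dynamic DNR model
    state = np.concatenate([p_inj, q_inj, open_switches]).astype(np.float32)
    with torch.no_grad():
        q_matrix = q_net(torch.from_numpy(state).unsqueeze(0)).squeeze(0)  # L x d
    # Step 3: ignore padded positions and take the best switch of every loop
    q_matrix = q_matrix.masked_fill(torch.as_tensor(flm) == 0, float("-inf"))
    action_set = q_matrix.argmax(dim=1).tolist()  # one open switch per loop
    return action_set  # used to update the network topology
```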
The details will be described below.
(1) And constructing a dynamic DNR model based on the Markov decision process MDP.
The principle of the Markov decision process MDP is as follows:
MDP is a decision-making process for a stochastic dynamic system based on Markov process theory, and mainly comprises 5 elements:

(S, A, R, P, γ)

In the formula, S represents the state set, i.e. the set of all environment states that the Agent can perceive. A represents the action set, the set of all possible actions of the Agent. R is the return set, i.e. the set of immediate returns fed back to the Agent by the environment according to the state and action, and is the index used to evaluate the quality of an action. P represents the state transition probability matrix, which is determined by the environment; during the MDP the state transitions satisfy the Markov property, i.e. the current state and the action taken affect only the state at the next moment. γ ∈ [0,1] is the attenuation (discount) factor.
In an MDP, the mapping between the state S and the optimal action A is called the policy π:

π: S → A

If action A_t ∈ A is deterministic, then π is a deterministic policy A_t = π(S_t); if action A_t obeys a probability distribution, then π is a stochastic policy π(A_t | S_t). The ultimate goal of the MDP is to find an optimal policy that maximizes the cumulative return G_t:
G_t = R_t + γR_{t+1} + γ^2 R_{t+2} + …    (2)

In the formula, R_t represents the immediate return at time t.
Written in the form of a mathematical expectation, this gives the state value function:

V_π(s) = E(G_t | S_t = s)    (3)
where the policy π determines the distribution of actions and thus affects the state transition probabilities and the immediate return. The mathematical expectation of the return obtained after the Agent performs action a in state s is defined as the action value Q_π(s, a), namely:

Q_π(s, a) = E(G_t | S_t = s, A_t = a)    (4)
According to equations (3) and (4), the advantage function of policy π can be written as:

A_π(s, a) = Q_π(s, a) − V_π(s)    (5)
This function represents the return obtained when the system takes action a in state s. Note that this return is not an immediate return but a long-term return. It can be shown theoretically that an optimal policy π* exists in the MDP; π* generates a sequence of actions A_0, A_1, A_2, …, A_t that maximizes the cumulative return obtained by the system starting from state S_0. The goal of the RL algorithm is to search for this optimal policy π* through continuous interaction between the environment and the agent. The Bellman optimality equations provide the theoretical basis for finding the optimal policy:

V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]

Q*(s, a) = R(s, a) + γ Σ_{s'} P(s'|s, a) max_{a'} Q*(s', a')

In the formulas, P(s'|s, a) denotes the probability that the system state becomes s' after action a is performed when the system is in state s, and R(s, a) is the immediate return generated after the system performs action a in state s.
The dynamic DNR model is constructed based on the Markov decision process MDP, and the construction process is as follows:
1.1 construct the state set and action set of the MDP.
The node loads and DG outputs are integrated into node injection powers, and the node injection powers together with the set of open switches are used as the state set S^DR of the MDP, namely:

S_t^DR = { P_t^inj, Q_t^inj, Ω_t }

where P_t^inj and Q_t^inj respectively denote the injected active power and reactive power of all nodes of the distribution network at time t, and Ω_t denotes the set of open switches at time t.

The set of action switches is defined as the action set A^DR, in which each action switch is coded according to its position in its basic loop.
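For illustration, a small sketch of this state and action construction is given below, assuming the node injections are obtained as DG output minus load; the helper name and the example loop data are assumptions, not values from the disclosure.

```python
import numpy as np


def build_state(p_load, q_load, p_dg, q_dg, open_switches):
    """Integrate node load and DG output into node injection power and
    append the set of currently open switches, giving the MDP state."""
    p_inj = p_dg - p_load          # injected active power at every node
    q_inj = q_dg - q_load          # injected reactive power at every node
    return np.concatenate([p_inj, q_inj, open_switches]).astype(np.float32)


# Action switches are coded by their position inside each basic loop.
# Each row lists the switch numbers of one loop (hypothetical data);
# shorter loops are padded with 0, as in the loop matrix introduced later.
loops = np.array([
    [33,  8,  9, 10, 11],   # loop 1: tie switch S33 plus sectionalizers
    [34, 14,  0,  0,  0],   # loop 2 (padded)
])
action = [2, 1]                                        # chosen position within each loop
chosen = [loops[i, a] for i, a in enumerate(action)]   # actual switch numbers
```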
1.2, constructing a distribution network state transition probability function matrix according to the state set and the action set;
Based on the state set and the action set, the state transition probability function matrix between the current state S_t^DR and the next state S_{t+1}^DR can be defined as:

P^DR = P( S_{t+1}^DR | S_t^DR, A_t^DR )
1.3 In this embodiment, optionally, the reciprocal of the distribution network operating cost is set as the immediate return of the MDP, and a penalty term is added to penalize switching action strategies that do not satisfy the system security constraints, namely:

R_t^DR( S_t^DR, A_t^DR ) = β′ / [ C_t^op + (1 − λ^DR) · M ]

In the formula, S_t^DR denotes the state at time t and A_t^DR the action switches at time t; C_t^op is the operating cost at time t (network loss cost plus switching action cost). λ^DR is a binary variable: λ^DR = 1 when the action switches satisfy the system constraints, and conversely λ^DR = 0. M is a large positive real number; when the action does not satisfy the system security constraints, the denominator term in the formula is enlarged by M so that the return approaches 0. β′ is a positive integer.
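A minimal sketch of this immediate return, assuming the operating cost is the sum of the network loss cost and the switching action cost; the function name and the numeric defaults are illustrative assumptions.

```python
def immediate_return(loss_cost, switch_cost, feasible, M=1e6, beta=1.0):
    """Reciprocal-of-cost return with a penalty for infeasible actions.

    feasible : True when the switching action satisfies the system security
               constraints (lambda_DR = 1), False otherwise.
    M        : large positive number that inflates the denominator when the
               constraints are violated, driving the return towards 0.
    beta     : positive scaling constant in the numerator (assumption).
    """
    lam = 1 if feasible else 0
    operating_cost = loss_cost + switch_cost
    return beta / (operating_cost + (1 - lam) * M)
```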
Through the above transformation, the dynamic reconfiguration decision problem can be converted into an MDP, and the Bellman optimality equations of distribution network dynamic reconfiguration, i.e. the constructed model, can be expressed as:

V*( S_t^DR ) = max_{A_t^DR} [ R( S_t^DR, A_t^DR ) + γ Σ P( S_{t+1}^DR | S_t^DR, A_t^DR ) V*( S_{t+1}^DR ) ]

Q*( S_t^DR, A_t^DR ) = R( S_t^DR, A_t^DR ) + γ Σ P( S_{t+1}^DR | S_t^DR, A_t^DR ) max_{A_{t+1}^DR} Q*( S_{t+1}^DR, A_{t+1}^DR )

where P( S_{t+1}^DR | S_t^DR, A_t^DR ) denotes the probability that the system state becomes S_{t+1}^DR after action A_t^DR is performed in state S_t^DR, and R( S_t^DR, A_t^DR ) is the immediate return generated after action A_t^DR is performed in state S_t^DR.
As shown in fig. 2, in the Markov dynamic reconfiguration decision framework of the distribution network, the training Agent acquires action information and state information by interacting with the distribution network and selects the optimal switching action that can achieve the preset target. After the Agent makes a decision, the system feeds back a return on the execution of the current action; the Agent learns from the quality of this feedback signal and finally finds the optimal policy.
(2) And solving the dynamic DNR model by adopting a Q learning algorithm based on the loop decomposition of the power distribution network.
The Q-learning algorithm is a typical value-function-based reinforcement learning algorithm: in a given state it seeks the action that maximizes the expected profit obtainable afterwards, and it learns the optimal strategy by maximizing the Q value. The Q value is updated as follows:

Q(s_t, a_t) ← Q(s_t, a_t) + δ [ R_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

In the formula, s_t denotes the state at time t, a_t the action performed at time t, δ ∈ [0,1] the learning rate, and γ the discount factor; R_t is the immediate return obtained after performing action a_t at time t, and Q(s_{t+1}, a) is the Q estimate of the state s_{t+1} reached after performing action a_t.
The conventional Q learning algorithm stores the Q value by updating the Q table, but the number of states and actions tends to be huge when solving the dynamic reconstruction problem, and thus the learning efficiency of the Q learning algorithm may be low.
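For reference, the update rule above corresponds to the classical tabular Q-learning step sketched below; it is exactly this Q-table form whose learning efficiency becomes low for large state and action spaces.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, delta=0.1, gamma=0.99):
    """One tabular update: move Q[s, a] towards r + gamma * max_a' Q[s_next, a']."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += delta * (td_target - Q[s, a])
    return Q
```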
In the dynamic reconstruction DNR problem of the power distribution network, the action states of the switches are correlated, and the switches in each loop act in a coordinated manner to achieve the optimal system operation state.
In this embodiment, the Double Deep Q Network (DDQN) algorithm is adopted. The algorithm uses two Q networks with the same structure: one is the current Q network Q_predict, used for selecting actions and updating the model parameters, and the other is the target Q network Q_target, used for calculating the target Q value. The over-estimation problem is eliminated by decoupling the selection of the target-Q action from the calculation of the target Q value.
As shown in fig. 3, the experience pool is used to store the state features, action vectors and immediate returns generated by the Agent interacting with the environment during the iterative process of the algorithm.
In the present method, the dynamic DNR model is solved with a Q-learning algorithm whose network structure is improved based on loop decomposition of the distribution network, and the solution is then carried out with the improved network, specifically including the following steps:
2.1, taking the number of loops in the distribution network as the number of output dimensions of the Q-learning network, and taking the length of each dimension's output vector as the number of switches in the loop to which that dimension belongs, the structure of the Q-learning network is improved to obtain the branched dual-depth Q network;
optionally, in this embodiment, the Q network with one-dimensional output is improved to a Q network with multidimensional output, and the output dimension of the Q network is equal to the number of basic loops in the DNR problem. Each dimension of the Q network is output as a vector, the dimension of each vector being determined by the number of switches that can be actuated in the loop. As shown in fig. 4, the structure of the Q network is improved based on the number of the power distribution network loops, that is, the Q network is a branched dual-depth Q network.
Assuming that a distribution network has L loops, i.e. L branches, and that the maximum number of actionable switches in a loop is d, the Q network outputs an L × d state value matrix. This state value matrix has the same dimensions as the switch action matrix (FLM): each value in it is the value produced by actuating the switch at the corresponding position in the FLM. If an element in the FLM is 0, the element at the corresponding position of the state value matrix is set to 0.

The FLM is the fundamental loop matrix formed from the basic loops, based on the fact that in a radial distribution network each basic loop contains one tie switch and several sectionalizing switches; each row of the FLM lists all the tie switches and sectionalizing switches contained in one loop.
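To make the branched structure of fig. 4 concrete, the PyTorch sketch below builds one possible network with a shared feature trunk and one output head per basic loop, so a single forward pass yields the L × d state value matrix; the layer sizes, the example dimensions and all names are illustrative assumptions rather than the exact architecture of the disclosure.

```python
import torch
import torch.nn as nn


class BranchedDualDeepQNet(nn.Module):
    """Shared trunk followed by one output head per basic loop, so the
    network emits an L x d state value matrix in a single forward pass."""

    def __init__(self, state_dim, num_loops, max_switches, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # one head per loop; each head scores the switches of that loop
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, max_switches) for _ in range(num_loops)]
        )

    def forward(self, state):
        h = self.trunk(state)
        # stack the per-loop scores into the L x d state value matrix
        return torch.stack([head(h) for head in self.heads], dim=1)


# predict and target networks share the same structure, as in DDQN
q_predict = BranchedDualDeepQNet(state_dim=70, num_loops=5, max_switches=6)
q_target = BranchedDualDeepQNet(state_dim=70, num_loops=5, max_switches=6)
q_target.load_state_dict(q_predict.state_dict())
```

The example sizes (5 loops, at most 6 candidate switches per loop) loosely mirror the IEEE 33-node case discussed later; state_dim is an arbitrary placeholder.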
2.2 Based on the improved branched dual-depth Q network, obtain the switching action set A_t^max that maximizes the return output by each dimension of the Q-learning network.

According to A_t^max and a greedy selection strategy, the switch with the largest return in each dimension is selected as the decision action switch. With the ε-greedy selection method, the selected actions and the value function are:

A_t^max = [ argmax_{a_1} Q_1(S_t^DR, a_1), argmax_{a_2} Q_2(S_t^DR, a_2), …, argmax_{a_L} Q_L(S_t^DR, a_L) ]

where A_t^max collects, for each row of the state value matrix, the action with the maximum value; together these actions form a group of switching action strategies. The value function therefore selects from each row of the state value matrix the action that yields the greatest value, forming a complete reconfiguration strategy.
The improved Q learning algorithm network, namely the branched double-depth Q network, decomposes the original one-dimensional complex decision problem into multi-dimensional simple decisions.
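A small sketch of the per-row selection just described: in each row of the L × d state value matrix the switch with the largest value is taken, while positions padded with 0 in the FLM are ignored; the variable names are assumptions.

```python
import numpy as np


def select_action_per_loop(q_matrix, flm):
    """Row-wise argmax of the state value matrix, restricted to real switch
    positions (FLM != 0), giving one decision action switch per loop."""
    masked = np.where(flm != 0, q_matrix, -np.inf)
    return masked.argmax(axis=1)      # one column index per loop
```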
Further, the method also comprises a step of training the branched dual-depth Q network; the training process and the application process can be separated. The training comprises the following steps:
step 3.1, acquiring historical operation data of the power distribution network and network structure parameters of the power distribution network, and constructing a training set;
the historical operation data comprises historical node loads of the power distribution network and output of the distributed power supply;
step 3.2, initializing the structure and parameters of the branched dual-depth Q network, generating an initial experience pool, and starting algorithm iteration;
step 3.3, initializing a state vector, collecting samples from the experience pool and inputting the samples into the branched dual-depth Q network;
step 3.4, calculating the output and the loss function of the branched dual-depth Q network according to the switching action set A_t^max in the collected samples, and updating the experience pool;

After the action switch with the maximum return has been selected in each dimension, the action A_{t+1}^max is input into Q_target to calculate Q_target( S_{t+1}^DR, A_{t+1}^max ), and the reward R_t of the current action is then added to obtain the target value Q*, namely:

Q* = R_t · Z + γ · Q_target( S_{t+1}^DR, A_{t+1}^max )

where Z is a one-dimensional row vector of all 1's whose number of columns equals the number of loops.
At this point, Q* and Q_predict( S_t^DR, A_t^DR ) are no longer single values but one-dimensional vectors, so in order to calculate the average loss the modified loss function is:

L(β) = E[ (1/L) Σ_{l=1}^{L} ( Q*_l − Q_predict,l( S_t^DR, A_t^DR; β ) )² ]

where the average is taken over the L loop dimensions (a sketch of this calculation is given below).
step 3.5, optimizing the parameters of the branched dual-depth Q network by a gradient descent method, and performing the next iteration until the data of the training set has been traversed.
In this embodiment, the initial data of the experience pool may be directly replaced with a historical data set of the power distribution network, that is, a DNR data set, which may increase the convergence rate of the algorithm.
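As a hedged illustration of steps 3.4 and 3.5, the PyTorch sketch below computes the target Q value and the branch-averaged loss for one sampled batch and then takes one gradient descent step; the tensor shapes, the choice of the greedy action via the predict network, and all names are assumptions rather than the exact implementation of the disclosure.

```python
import torch
import torch.nn.functional as F


def bddqn_training_step(q_predict, q_target, optimizer, batch, gamma=0.99):
    """One iteration of steps 3.4-3.5: per-loop target Q, averaged loss,
    and a gradient descent update of the predict network."""
    s, a, r, s_next = batch            # a: (B, L) chosen switch index per loop
    with torch.no_grad():
        a_next = q_predict(s_next).argmax(dim=2)                             # greedy action per loop
        q_next = q_target(s_next).gather(2, a_next.unsqueeze(2)).squeeze(2)  # Q_target(S_{t+1}, A^max)
        target = r.unsqueeze(1) + gamma * q_next                             # R_t broadcast over the loops
    q_taken = q_predict(s).gather(2, a.unsqueeze(2)).squeeze(2)              # Q_predict(S_t, A_t)
    loss = F.mse_loss(q_taken, target)     # squared error averaged over loops and batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```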
In step 3.4 the experience pool is updated. In each iteration, assuming the traversal has reached time t, the steps for updating the experience pool of the branched dual-depth Q network (BDDQN) are as follows (a sketch of these steps is given after step three):

Step one: in the branched dual-depth Q network, use the state S_t^DR at the current moment as input and select the corresponding switching action A_t^DR by the ε-greedy selection method:

A_t^DR = argmax_A Q_predict( S_t^DR, A; β ) if ρ < ε, otherwise a randomly chosen action

where β is the parameter of Q_predict, ρ is a random number, and ε ∈ [0,1] is the greedy selection probability. This action selection method selects the strategy with the maximum Q value as the optimal action with probability ε, or randomly chooses an action with probability 1 − ε to achieve diversity in the policy search.
Step two: perform the corresponding action A_t^DR in the current state S_t^DR, carry out the power flow calculation to obtain the immediate return R_t^DR, and obtain the state S_{t+1}^DR at the next moment by indexing into the node load and DG output data set of the distribution network.

Step three: store the resulting 5-tuple (containing S_t^DR, A_t^DR, R_t^DR and S_{t+1}^DR) into the experience pool, judge whether the capacity of the experience pool has reached its upper limit, and if so, delete the oldest data in the pool, i.e. delete the earliest data according to the storage time of the data in the experience pool.
Example analysis
To verify the validity of the proposed algorithm, this section performs simulation verification on IEEE 33 node system and 185 node system, respectively.
The DRL method solves the dynamic DNR problem through the interaction between a decision-making agent and the environment. Therefore, in the simulation, MATLAB is used to simulate the training environment through power flow calculation, Python is used to train the algorithm agent, and the interaction is realized through an interface between the two. The CPU of the test equipment is a 4-core i5-8250U at 1.6 GHz, and the GPU is an Nvidia GeForce GTX 1060. In addition, the hyper-parameters of the DRL algorithm provided in this embodiment are set as follows: discount factor 0.99, learning rate 0.001, experience pool capacity 50000, number of training rounds 9000, batch size 128, greedy selection probability 0.8. The historical data and parameters of the distribution network are set according to the following principle: the load data come from the literature "Real-time power system state estimation and forecasting via deep unrolled neural networks", and the DG output data come from the 2014 Global Energy Forecasting Competition; a DNR data set is obtained by simulating system operation with a second-order cone programming and a heuristic DNR method, 8760 groups of consecutive load and DG data are taken as a unit to generate the training set, and 4000 groups of consecutive load and DG data are taken to generate the test set.
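For convenience, the hyper-parameter values listed in this paragraph can be gathered in one place; the dictionary below simply restates them and does not introduce any new settings.

```python
bddqn_hyperparameters = {
    "discount_factor": 0.99,
    "learning_rate": 0.001,
    "experience_pool_capacity": 50000,
    "training_rounds": 9000,
    "batch_size": 128,
    "greedy_selection_probability": 0.8,
}
```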
IEEE 33 node system
In order to analyze the learning effect of model learning, the weights of the strategic neural network are recorded every 30 rounds of training in the training process of the algorithm and are used for evaluating the decision convergence performance of the algorithm on a test set. The test set is 100 groups of real-time continuous node loads and DG output sequences as shown in FIG. 5, the system operation cost in the test set is used as an evaluation criterion, and the evaluation criterion is compared with the traditional DDQN algorithm, and the result is shown in FIG. 6.
As can be seen in fig. 6, BDDQN and DDQN begin to converge around iterations 110 and 200, respectively. Since the output of the Q network is set to a multidimensional form in this embodiment, the representation of a reconfiguration solution in the decision process is simplified and the search efficiency over the conventional DDQN optimal decision is improved, so in fig. 6 BDDQN converges to the optimal decision at a faster speed. At convergence, the comprehensive running costs of BDDQN and DDQN are 1.2076 × 10³ USD and 1.2155 × 10³ USD, respectively. From all possible action switches counted in Table 1, it can be seen that if the strategy is encoded in the one-dimensional output manner of DDQN, the number of candidate strategies is 1200, whereas encoded in the multidimensional manner of BDDQN the candidate strategies decompose into 5, 2, 6, 4 and 5 per dimension, and each dimension becomes a simple combination of strategies. The strategy search spaces of the two algorithms therefore differ significantly, and DDQN has difficulty searching out the optimal dynamic reconfiguration strategy, which is why the BDDQN algorithm in the figure can finally reduce the operating cost to a lower level.
Table 1 reconstruction results statistics
To further verify the effect of the method proposed in this embodiment, under the load condition of fig. 5, the network loss, the lowest node voltage, and the comprehensive operating cost generated by the static reconstruction and the reconstruction scheme determined by BDDQN in each time interval are compared.
First, as can be seen from fig. 7(a), since static reconfiguration minimizes network loss by frequently actuating switches, the cost of those switching actions makes its cost curve fluctuate strongly, whereas BDDQN, which takes the switching action cost into account, gives a dynamic reconfiguration strategy aimed at minimizing the operating cost. Under the load condition of fig. 5, the reconfiguration strategy given by BDDQN keeps branches S33, S14, S9, S36 and S27 open, so the switching action cost is 0 over the 100 reconfiguration periods; the trend of its cost curve therefore coincides with that of the non-reconfigured case and shows no large fluctuations. The sums of the running costs of the dynamic, static and non-reconfigured cases in fig. 7(a) are 1.2076 × 10³ USD, 1.6377 × 10³ USD and 1.7737 × 10³ USD, respectively. From the total operating cost it can be seen that although the switches act frequently under static reconfiguration, its cost is still lower than the initial non-reconfigured cost, while the dynamic reconfiguration solution given by BDDQN can greatly reduce the operating cost.
Observing fig. 7(b), the system loss after static reconfiguration is smaller than that of dynamic reconfiguration, but the difference between the two loss curves is small: the sums of the static and dynamic reconfiguration losses are 7.3858 × 10³ kW and 7.5494 × 10³ kW, a difference of only 163.6 kW. Therefore, once the switching action cost is considered, the running cost of dynamic reconfiguration is 26.38% lower than that of static reconfiguration.
As can be seen from fig. 7(c), the lowest node voltages after dynamic reconfiguration are all greater than 0.95p.u., and the system operation constraint is satisfied. In addition, since the reduced voltage deviation is not set as the objective function in the present embodiment, the lowest node voltage after the partial period dynamic reconstruction is higher than the static reconstruction.
In order to compare the dynamic reconstruction method based on the LSTM model and the switching action function and the reconstruction optimization effect of the dynamic reconstruction algorithm based on the BDDQN model provided in this embodiment, the first 24 time periods in fig. 5 are taken as the dynamic reconstruction optimization cycles, and four different algorithms are compared, and the results are shown in table 2.
TABLE 2
As can be seen from table 2, the static reconfiguration during this optimization period, although the total network loss is the lowest, is the highest in operating cost, even higher than the original, unreconfigured state, because it frequently actuates the switch. In all dynamic reconstruction strategies, the loss reduction rate of the MISOCP algorithm is the lowest, but the switching action times of the reconstruction strategy given by the algorithm are 4 times, so the comprehensive operation cost is slightly higher than BDDQN. In terms of cost reduction ratio, the reconstruction scheme given by BDDQN is the highest, because the reconstruction scheme of BDDQN is only that the branch [ S33, S14, S9, S36, S27] is disconnected, and the reconstruction scheme can ensure that the total network loss in the whole optimization period is at a lower level and the number of switching actions is the least. Therefore, it can be seen that the distribution network dynamic reconfiguration method based on BDDQN provided by the embodiment can effectively reduce the comprehensive operation cost of the IEEE 33 node system, and is superior to the existing algorithm.
185 node system
The proposed algorithm is then validated on an actual 185-node system. The test is first performed under the load condition of fig. 8.
The comprehensive operating cost, network loss and number of switching actions produced in each time interval by static reconfiguration and by the optimal reconfiguration scheme decided by BDDQN are compared, as shown in fig. 9.
First, as can be seen from fig. 9(a), in such a large-scale system static reconfiguration operates the switches even more frequently, so the losses caused by the switching actions make its cost curve fluctuate strongly, at times even exceeding the cost without reconfiguration. Meanwhile, in a small number of periods such as 12, 14-16, 60, 61, 82 and 83, the cost of dynamic reconfiguration is larger than that of static reconfiguration. This is because, in order to reduce the number of switching actions over the optimization cycle, the dynamic reconfiguration strategy in some periods cannot guarantee minimum network loss; if in those periods the switching action cost of static reconfiguration is smaller than the gain from loss reduction, the running cost of static reconfiguration is lower than that of dynamic reconfiguration. The sums of the running costs of the dynamic, static and non-reconfigured cases in fig. 9(a) are 1.6772 × 10³ USD, 1.7841 × 10³ USD and 2.1663 × 10³ USD, respectively. From the total operating cost it can be seen that static reconfiguration can significantly reduce the operating cost even though it operates the switches frequently, but this affects the lifetime of the switches, while the dynamic reconfiguration solution given by BDDQN produces a lower operating cost and thus improves the economy of system operation.
Then, observing fig. 9(b), although the system loss after static reconfiguration is smaller than that of dynamic reconfiguration, the difference between the two loss curves is small: the sums of the static and dynamic reconfiguration losses are 1.0083 × 10⁴ kW and 9.7145 × 10³ kW, a difference of only 368.5 kW. In addition, since the system is complex and has 20 loops, the numbers of switching actions for static and dynamic reconfiguration in fig. 9(c) are 120 and 32, respectively. After the switching action cost is considered, the running cost of dynamic reconfiguration is 5.99% lower than that of static reconfiguration. Although the cost reduction ratio of the BDDQN algorithm on this complex system is smaller than on the IEEE 33-node system, the number and frequency of switching actions after dynamic reconfiguration are clearly smaller than for static reconfiguration, so the dynamic reconfiguration strategy decided by BDDQN can effectively prolong switch life and reduce the operating cost of the system.
To further verify the superiority of the algorithm proposed in this embodiment, the first 24 time periods in fig. 8 are taken as a dynamic reconstruction optimization cycle, and three different algorithms are compared, with the results shown in table 3.
TABLE 3 comparison of dynamic reconstruction effects
As can be seen from Table 3, over the optimization cycle of this complex system the total network loss is again lowest under static reconfiguration; because the network loss in the initial system state is very high, static reconfiguration operates the switches frequently, yet unlike the result in Table 2 its running cost is not higher than the original non-reconfigured cost. It can also be seen that the reconfiguration scheme given by BDDQN again has the highest cost reduction ratio; however, since its decision scheme involves 8 switching actions, the improvement in the cost reduction rate is smaller than that of the algorithm's decision scheme on the IEEE 33-node system.
Finally, in order to verify the superiority of the method provided by the embodiment in the calculation speed, the dynamic reconstruction decision time of several different algorithms is tested, and since both algorithms provided by the embodiment are trained in an off-line manner, only the decision time is counted in the table. The results are shown in Table 4.
TABLE 4 comparison of computational efficiencies of different dynamic reconstruction methods
The results for the MISOCP algorithm are the computation times for three different dynamic reconfiguration periods, T = 1, T = 5 and T = 24. First, it can be seen that the decision times of the two data-driven dynamic reconfiguration methods considered in this embodiment are the shortest, and as the scale of the distribution network increases the advantage of the data-driven algorithms becomes more obvious. The dynamic reconfiguration method based on LSTM model decisions needs to perform two power flow calculations, so its computational efficiency is slightly lower than that of BDDQN.
It can also be seen that the MISOCP computation time grows exponentially with the number of reconfiguration periods. When T = 1 it corresponds to a single-period static reconfiguration. When T = 5, the computation time on the 33-node system increases by a factor of 96.67 relative to T = 1, and on the 185-node system by a factor of 591.71. This is because the constraint dimension of MISOCP grows with the dynamic reconfiguration period, and a commercial solver cannot efficiently obtain the optimal solution when the constraint dimension is too high. When T = 24, the constraint dimensions of the 33-node and 185-node systems are 11160 and 61440 respectively; the MISOCP algorithm takes 3.37 hours to obtain the optimal solution on the 33-node system and cannot obtain the optimal solution within an acceptable time on the 185-node system.
Therefore, in conclusion, the dynamic reconstruction method based on data driving is obviously superior to the traditional method in computational efficiency, and due to the advantage, the method can rapidly make reconstruction decisions after the real-time running state of the distribution network is obtained, so that the dependence on high prediction accuracy of load and DG output is reduced.
The results of the examples show that the BDDQN algorithm provided by the embodiment can effectively learn the dynamic reconstruction strategy and can provide an ideal dynamic reconstruction scheme of the current reconstruction period within millisecond time according to the real-time running state of the system. Compared with the traditional day-ahead dynamic reconstruction method, the method provided by the embodiment has the advantages that the cost reduction rate is remarkably improved, the real-time state of the system is directly utilized for decision making, the day-ahead prediction of the load and the DG output is not relied on, and the prediction precision requirements on the load and the DG output are not high.
Example 2
Based on embodiment 1, this embodiment provides a distribution network real-time dynamic reconfiguration system based on a branched dual-depth Q network, including:
an acquisition module: configured for acquiring real-time node loads and distributed power generation output of a power distribution network;
a Markov decision building module: configured for transferring the acquired data to a dynamic DNR model constructed based on the Markov decision process MDP; the dynamic DNR model takes the minimum network loss cost and the switch action cost as objective functions;
the power distribution network dynamic reconfiguration module: the method is configured to obtain a branched dual-depth Q network based on power distribution network loop decomposition, solve the dynamic DNR model by adopting a Q learning algorithm, obtain a switching action set enabling the branched dual-depth Q network to output maximum return, and update the topological structure of the power distribution network according to the switching action set.
Example 3
The embodiment is an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor, where the computer instructions, when executed by the processor, implement the steps of the above method.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A distribution network real-time dynamic reconstruction method based on a branched dual-depth Q network is characterized by comprising the following processes:
acquiring real-time node load and distributed power supply output of a power distribution network;
transmitting the acquired data to a dynamic DNR model constructed based on a Markov Decision Process (MDP); the dynamic DNR model takes the minimum network loss cost and the switch action cost as objective functions;
and obtaining a branched double-depth Q network based on the loop decomposition and improvement of the power distribution network, solving the dynamic DNR model by adopting a Q learning algorithm to obtain a switching action set which enables the branched double-depth Q network to output maximum return, and updating the topological structure of the power distribution network according to the switching action set.
2. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 1, wherein: constructing a dynamic DNR model based on a Markov decision process, specifically: and integrating the node load and the distributed power output into node injection power, and taking the node injection power and a set of disconnected switches as a state set of a Markov decision process.
3. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 1, wherein: constructing a dynamic DNR model based on a Markov decision process, specifically: and setting the reciprocal of the operation cost of the power distribution network as the instant return of a Markov decision process, wherein the instant return also comprises a punishment item which is used for punishment of a switch action strategy which does not accord with the system safety constraint.
4. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 1, wherein: based on the loop decomposition of the power distribution network, the obtained structure of the branched dual-depth Q network is as follows: and taking the number of loops in the power distribution network as the output dimensionality of the Q learning algorithm network, wherein the dimensionality of each dimensionality output vector is the switch number of the loop to which the dimensionality belongs.
5. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 4, wherein: and selecting the switch with the largest return in the output dimensionality of the Q learning algorithm network by adopting a greedy selection strategy to serve as a decision action switch.
6. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 4, wherein: the method also comprises a step of training the branched dual-depth Q network, which comprises the following steps:
acquiring historical operation data of the power distribution network and network structure parameters of the power distribution network, and constructing a training set;
initializing the structure and parameters of the branched dual-depth Q network, generating an initial experience pool, and starting algorithm iteration;
initializing a state vector, collecting samples from an experience pool and inputting the samples into a branched dual-depth Q network;
calculating the output and loss function of the branched dual-depth Q network according to a switching action set in a collected sample, and updating an experience pool;
and optimizing parameters of the branched double-depth Q network by using a gradient descent method, and performing the next iteration until data of the training set is traversed.
7. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 6, wherein: the data of the initial experience pool is replaced by a historical data set of the power distribution network.
8. The method for dynamically reconstructing a distribution network in real time based on the branched dual-depth Q network as claimed in claim 6, wherein: the method for updating the experience pool, having traversed to time t, comprises the steps of:
using the state S_t^DR at the current moment as the input of the branched dual-depth Q network and selecting the corresponding switching action A_t^DR by the ε-greedy selection method;
performing the corresponding action A_t^DR in the current state S_t^DR, carrying out the power flow calculation to obtain the immediate return R_t^DR, and obtaining the state S_{t+1}^DR at the next moment by indexing into the node load and DG output data of the distribution network;
storing the resulting 5-tuple into the experience pool, judging whether the capacity of the experience pool has reached its upper limit, and if so, deleting the earliest data according to the storage time of the data in the experience pool.
9. Distribution network real-time dynamic reconfiguration system based on branch dual depth Q network, characterized by includes:
an acquisition module: configured for acquiring real-time node loads and distributed power generation output of a power distribution network;
a Markov decision building module: configured for transmitting the acquired data to a dynamic DNR model constructed based on the Markov decision process MDP; the dynamic DNR model takes the minimum network loss cost and the switch action cost as objective functions;
the power distribution network dynamic reconfiguration module: the method is configured to obtain a branched dual-depth Q network based on power distribution network loop decomposition, solve the dynamic DNR model by adopting a Q learning algorithm, obtain a switching action set enabling the branched dual-depth Q network to output maximum return, and update the topological structure of the power distribution network according to the switching action set.
10. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any one of claims 1 to 8.
CN202111625067.0A 2021-12-28 2021-12-28 Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network Pending CN114282330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111625067.0A CN114282330A (en) 2021-12-28 2021-12-28 Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111625067.0A CN114282330A (en) 2021-12-28 2021-12-28 Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network

Publications (1)

Publication Number Publication Date
CN114282330A true CN114282330A (en) 2022-04-05

Family

ID=80877708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111625067.0A Pending CN114282330A (en) 2021-12-28 2021-12-28 Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network

Country Status (1)

Country Link
CN (1) CN114282330A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662982A (en) * 2022-04-15 2022-06-24 四川大学 Urban power distribution network multi-stage dynamic reconstruction method based on machine learning
CN114662982B (en) * 2022-04-15 2023-07-14 四川大学 Multistage dynamic reconstruction method for urban power distribution network based on machine learning

Similar Documents

Publication Publication Date Title
CN110163410B (en) Line loss electric quantity prediction method based on neural network-time sequence
Zhou et al. Maintenance optimisation of multicomponent systems using hierarchical coordinated reinforcement learning
CN110414719A (en) A kind of vehicle flowrate prediction technique based on Multi-variable Grey Model time series
CN107992976B (en) Hot topic early development trend prediction system and prediction method
Zheng et al. An accurate GRU-based power time-series prediction approach with selective state updating and stochastic optimization
CN105631528A (en) NSGA-II and approximate dynamic programming-based multi-objective dynamic optimal power flow solving method
Yang et al. Condition-based maintenance strategy for redundant systems with arbitrary structures using improved reinforcement learning
CN111917134B (en) Power distribution network dynamic autonomous reconstruction method and system based on data driving
CN114282330A (en) Distribution network real-time dynamic reconstruction method and system based on branch dual-depth Q network
CN115358461A (en) Natural gas load prediction method, device, equipment and medium
CN115018191A (en) Carbon emission prediction method based on small sample data
CN113821983A (en) Engineering design optimization method and device based on proxy model and electronic equipment
Poczeta et al. Application of fuzzy cognitive maps to multi-step ahead prediction of electricity consumption
CN112257348B (en) Method for predicting long-term degradation trend of lithium battery
CN111913887B (en) Software behavior prediction method based on beta distribution and Bayesian estimation
Wang et al. Transmission network dynamic planning based on a double deep-Q network with deep ResNet
CN116451577B (en) Product change path multi-target optimization method integrating reinforcement learning and differential evolution
Balazs et al. Hierarchical-interpolative fuzzy system construction by genetic and bacterial memetic programming approaches
CN115577647B (en) Power grid fault type identification method and intelligent agent construction method
CN116596105A (en) Charging station load prediction method considering power distribution network development
Mahootchi et al. Opposition-based reinforcement learning in the management of water resources
Cao et al. Probabilistic electricity demand forecasting with transformer-guided state space model
Zhao et al. Short‐term wind power prediction based on combined long short‐term memory
Sun et al. Imitation learning‐based online optimal scheduling for microgrids: An approach integrating input clustering and output classification
CN115577872B (en) Structured data prediction optimization method based on multi-energy agent deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination