CN114662982B

CN114662982B - Multistage dynamic reconstruction method for urban power distribution network based on machine learning

Info

Publication number: CN114662982B
Application number: CN202210399965.7A
Authority: CN
Inventors: 高红均; 王子晗; 贺帅佳; 马望
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2022-04-15
Filing date: 2022-04-15
Publication date: 2023-07-14
Anticipated expiration: 2042-04-15
Also published as: CN114662982A

Abstract

The invention relates to a machine learning-based multistage dynamic reconstruction method for an urban power distribution network, and belongs to the technical field of dynamic reconstruction of urban power distribution networks. A multi-agent reinforcement learning model is established to jointly optimize the different reconstruction subjects in each period. A deep Q network with parameter freezing and experience replay mechanisms learns environmental information such as the forecast load and photovoltaic output. The learned strategy set is then used to dynamically reconstruct and optimize the distribution network, with the objectives of optimizing operating cost, voltage offset and load balance. The invention enables efficient, safe and economic operation of urban power distribution networks.

Description

Multistage dynamic reconstruction method for urban power distribution network based on machine learning

Technical Field

The invention belongs to the technical field of dynamic reconstruction of urban power distribution networks, and particularly relates to a multistage dynamic reconstruction method of an urban power distribution network based on machine learning.

Background

The integration of high-penetration distributed generation and the rapid growth of unbalanced loads in urban areas make the spatio-temporal distribution of net load in urban distribution networks extremely uneven, which poses new challenges to their safe and economic operation. Distribution network reconstruction is one of the active management measures of the distribution network. It adjusts the network topology by changing the on-off states of tie switches and sectionalizing switches in order to reduce network losses, balance load, eliminate line overloads and improve the accommodation of clean energy. However, traditional optimization methods rely on an explicit model, forecasting techniques and an optimization solver, so solving is time-consuming and online decision-making is difficult to achieve. Moreover, the uncertainty introduced by large-scale integration of distributed sources such as wind and photovoltaic power further increases the difficulty of solving. Therefore, in increasingly complex grid environments, how to select a reconstruction strategy for urban distribution networks, how to make online decisions on the reconstruction level, and how to handle the uncertainty of distributed generation have become important research questions in the context of new-type power systems.

Disclosure of Invention

The invention aims to provide a machine learning-based multistage dynamic reconstruction method for an urban power distribution network that solves the technical problems of the prior art. Traditional optimization methods rely on an explicit model, forecasting techniques and an optimization solver, so solving is time-consuming and online decision-making is difficult to achieve, while the uncertainty introduced by large-scale integration of distributed sources such as wind and photovoltaic power further increases the difficulty of solving. In increasingly complex grid environments, it is therefore important to determine how to select a reconstruction strategy for urban distribution networks, how to make online decisions on the reconstruction level, and how to handle the uncertainty of distributed generation.

In order to achieve the above purpose, the technical scheme of the invention is as follows:

A multistage dynamic reconstruction method for an urban power distribution network based on machine learning comprises the following steps:

S1: fitting the relationship between historical operating data of the distribution network and reconstruction level decisions with a neural network multi-label classification method, so as to make online decisions on the reconstruction level of the distribution network;

S2: constructing a reinforcement learning-based reconstruction model of the urban distribution network, in which the power and current characteristics of each node at each moment form the agent state space, the branch on-off states and the photovoltaic curtailment and load shedding rates form the action space, the reward function jointly considers the operating cost, the voltage offset index and the load balance degree of the distribution network, and the state transition accounts for the uncertainty of photovoltaic output;

S3: performing joint optimization training of the different agents assigned across the whole optimization horizon, and constructing a multi-agent reinforcement learning joint optimization model.

Further, step S1 specifically includes:

the node loads, the power exchanged with the upstream grid and the photovoltaic output measured across the whole distribution system are used as the input features of the neural network, and the reconstruction level of each subject is used as its output; the neural network has 4 layers: an input layer, a hidden part formed by two fully connected layers, and an output layer;

the neural network structure is as follows:

$$h^{(1)} = f_h\left(W^{(1)} x + b^{(1)}\right),\quad h^{(2)} = f_h\left(W^{(2)} h^{(1)} + b^{(2)}\right),\quad o = f_o\left(W^{(3)} h^{(2)} + b^{(3)}\right)$$

wherein: $W = \{W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}, W^{(3)}, b^{(3)}\}$ is the set of network parameters; $f_o$ denotes the sigmoid activation function and $f_h$ denotes the linear rectification (ReLU) activation function. Each of the two hidden layers has $F$ neurons. The input-layer neurons $x \in \mathbb{R}^{D\times 1}$ are connected to the first hidden layer $h^{(1)} \in \mathbb{R}^{F\times 1}$ through the weight $W^{(1)} \in \mathbb{R}^{F\times D}$ and bias term $b^{(1)} \in \mathbb{R}^{F\times 1}$; the first hidden layer is connected to the second hidden layer through the weight $W^{(2)} \in \mathbb{R}^{F\times F}$ and bias term $b^{(2)} \in \mathbb{R}^{F\times 1}$; finally, the second hidden layer is connected to the output-layer neurons $o \in \mathbb{R}^{L\times 1}$ through a sigmoid activation function, which limits the output range to $[0, 1]$.
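The forward pass described above can be sketched in plain Python as follows. This is a minimal illustration, not the trained model of the invention: the sizes D = 6, F = 8 and L = 3, the random initialization, the example input and the 0.5 threshold that turns each sigmoid output into a 0/1 reconstruction-level label are all assumptions chosen for readability.

```python
import math
import random


def relu(v):
    # Linear rectification: max(0, x) element-wise
    return [max(0.0, x) for x in v]


def sigmoid(v):
    # Numerically stable logistic function; maps each entry into (0, 1)
    out = []
    for x in v:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            z = math.exp(x)
            out.append(z / (1.0 + z))
    return out


def affine(W, x, b):
    # W is a list of rows (out_dim x in_dim); returns W x + b
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(params, x):
    """Input (D) -> ReLU hidden (F) -> ReLU hidden (F) -> sigmoid output (L)."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = relu(affine(W1, x, b1))
    h2 = relu(affine(W2, h1, b2))
    return sigmoid(affine(W3, h2, b3))


def init_params(D, F, L, seed=0):
    # Hypothetical uniform initialization, for illustration only
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return (mat(F, D), [0.0] * F, mat(F, F), [0.0] * F, mat(L, F), [0.0] * L)


params = init_params(D=6, F=8, L=3)
o = forward(params, [0.4, 0.7, 0.1, 0.9, 0.3, 0.5])
levels = [1 if p >= 0.5 else 0 for p in o]  # one independent 0/1 decision per label
```

Because each output passes through its own sigmoid, the L labels are decided independently, which is what lets several reconstruction levels be flagged at once in a multi-label setting.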

Further, in step S2,

the state space consists of the photovoltaic generation of each photovoltaic node in period $t$, the load reduction of each load-reducible node, the power purchased from the upstream grid, and the branch currents $I_{ij,t}$, where $I_{ij,t}$ is the current flowing from node $i$ to node $j$ at time $t$;

the action space consists of the branch switch states $w_{ij}$, where $w_{ij}$ is a 0-1 variable giving the connection state of the switch on branch $ij$, together with the photovoltaic generation of each photovoltaic node in period $t$ and the load reduction of each load-reducible node, both discretized with granularity $d$;
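To illustrate how the discretization granularity d shapes the action space, the sketch below enumerates joint actions for a small hypothetical case. It assumes, following step S2, that the photovoltaic curtailment and load shedding rates lie in [0, 1], and that 1/d is an integer. It deliberately ignores the switch-set and radiality constraints, which would prune infeasible switch combinations; the count 2^(switches) x (1/d + 1)^(photovoltaic + load nodes) shows why a single agent's action space grows exponentially as more subjects are optimized together.

```python
from itertools import product


def rate_levels(d):
    """Discretize a rate in [0, 1] with granularity d (assumes 1/d is an integer)."""
    n = round(1.0 / d)
    return [round(k * d, 10) for k in range(n + 1)]


def enumerate_actions(n_switches, n_pv, n_lr, d):
    """All joint actions: (switch states, curtailment rates, load shedding rates).

    Radiality and switch-set constraints are NOT applied here.
    """
    levels = rate_levels(d)
    return list(product(product((0, 1), repeat=n_switches),
                        product(levels, repeat=n_pv),
                        product(levels, repeat=n_lr)))


def action_count(n_switches, n_pv, n_lr, d):
    # 2^(switches) * (1/d + 1)^(photovoltaic nodes + load-reducible nodes)
    return 2 ** n_switches * len(rate_levels(d)) ** (n_pv + n_lr)


# Hypothetical small case: 3 switches, 1 photovoltaic node, 1 load-reducible node, d = 0.5
actions = enumerate_actions(3, 1, 1, 0.5)
```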

the state transition function is:

$$s_{t+1} = f(s_t, a_t, \rho)$$

wherein: $\rho$ is a random quantity, indicating that the state transition is affected not only by the action $a_t$ but also by randomness; in each training round, normally distributed noise is added to the photovoltaic output forecast baseline to simulate the uncertainty of photovoltaic output.
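A minimal sketch of this noisy transition is given below. The document only states that normal noise is added to the photovoltaic forecast baseline; the choice of a standard deviation equal to a fixed fraction `sigma_ratio` of the forecast, the clipping at zero, and the `apply_action` callable standing in for the deterministic network update are assumptions made for illustration.

```python
import random


def sample_pv(forecast, sigma_ratio, rng):
    """One stochastic realization of photovoltaic output around the forecast baseline.

    Assumption: zero-mean normal noise with standard deviation sigma_ratio * forecast,
    clipped so that photovoltaic output is never negative.
    """
    return [max(0.0, p + rng.gauss(0.0, sigma_ratio * p)) for p in forecast]


def transition(state, action, pv_forecast_next, sigma_ratio, rng, apply_action):
    """s_{t+1} = f(s_t, a_t, rho): deterministic effect of the action plus photovoltaic noise rho."""
    next_state = apply_action(state, action)  # hypothetical topology / power-flow update
    next_state["pv"] = sample_pv(pv_forecast_next, sigma_ratio, rng)
    return next_state
```

Drawing a fresh realization in every training episode exposes the agent to many plausible photovoltaic trajectories around the same forecast, which is how the uncertainty enters training without an explicit scenario model.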

Further, in step S2, the reward function combines the economic operating cost, the voltage offset index and the load balance degree of the distribution network, wherein: $\lambda_1$ and $\lambda_2$ are the reward weight coefficients;

a corresponding penalty is given to the agent if a branch power exceeds its allowable upper limit; the economic operating cost of the distribution network in period $t$ comprises the line loss cost in period $t$, the switching cost, the load curtailment cost and the photovoltaic curtailment cost;

$\Delta t$ is the time interval; $c^{loss}$ is the price per unit of network loss; $r_{ij}$ is the resistance of branch $ij$; $\Omega^{l}$ is the set of all branches in the optimization area; separate single-operation costs apply when changing the on-off state of a feeder tie switch, a substation tie switch and a branch sectionalizing switch, and each kind of tie switch in the reconstruction area has a state flag, where 0 means open and 1 means closed; $\Omega^{LR}$ is the set of curtailable loads; $c^{LR}$ is the unit load shedding cost; the load power shed at node $i$ in period $t$ enters the load curtailment cost; $\Omega^{PV}$ is the set of photovoltaic nodes; the photovoltaic curtailment cost uses the unit photovoltaic curtailment penalty cost at time $t$, the photovoltaic output power of node $i$ at time $t$, and the actual photovoltaic power injected into the grid at node $i$ in period $t$;

$V_i^N$ and $V_{i,t}$ are the rated voltage of node $i$ and its actual value in period $t$; $\Omega$ is the set of all nodes; $R_{i,t}$ is the load rate of node $i$ in period $t$, used together with the average load rate of the distribution network in period $t$; $P_{i,t}$ is the active power injected at node $i$ in period $t$; $P_i^{max}$ is the maximum allowable active power injection at node $i$; $N$ is the number of nodes of the distribution network.
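The exact cost expressions are not legible in this text, so the sketch below assumes common forms for each term: line loss cost as $c^{loss}\Delta t\sum r_{ij}I_{ij,t}^2$, a type-specific cost for every switch whose state changes, load shedding cost plus a curtailment penalty on the gap between available and injected photovoltaic power, voltage offset as the sum of per-unit deviations, load balance as the standard deviation of node load rates $R_{i,t} = P_{i,t}/P_i^{max}$, and $\lambda_1$, $\lambda_2$ weighting voltage offset and load balance respectively. All function names and these formulas are hypothetical.

```python
import math


def line_loss_cost(c_loss, dt, r, i_branch):
    # Assumed form: c_loss * dt * sum_ij r_ij * I_ij,t^2 over branches in Omega^l
    return c_loss * dt * sum(r[b] * i_branch[b] ** 2 for b in r)


def switching_cost(unit_cost, prev_state, state):
    # Each switch whose 0/1 state changes incurs its type-specific single-operation cost
    return sum(unit_cost[s] * abs(state[s] - prev_state[s]) for s in state)


def shed_and_curtail_cost(c_lr, p_shed, c_pv, p_avail, p_actual, dt):
    # Load shedding cost plus photovoltaic curtailment penalty (available minus injected power)
    shed = c_lr * dt * sum(p_shed.values())
    curtail = c_pv * dt * sum(p_avail[i] - p_actual[i] for i in p_avail)
    return shed + curtail


def voltage_offset(v, v_rated):
    # Sum of per-unit deviations |V_i,t - V_i^N| / V_i^N over all nodes
    return sum(abs(v[i] - v_rated[i]) / v_rated[i] for i in v)


def load_balance(p, p_max):
    # Standard deviation of node load rates R_i,t = P_i,t / P_i^max around their mean
    rates = [p[i] / p_max[i] for i in p]
    mean = sum(rates) / len(rates)
    return math.sqrt(sum((x - mean) ** 2 for x in rates) / len(rates))


def reward(op_cost, v_off, balance, overloaded, lam1, lam2, penalty):
    # Assumption: lam1 weights voltage offset, lam2 weights load balance;
    # a fixed penalty is subtracted when any branch exceeds its power limit
    r = -(op_cost + lam1 * v_off + lam2 * balance)
    return r - penalty if overloaded else r
```

Keeping each term in its own function makes it easy to swap in the exact expressions once they are available, without touching the reward aggregation.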

Further, in the multi-agent reinforcement learning joint optimization model of step S3, the optimization subjects over a 24-hour horizon are first determined through step S1; different agents are assigned to different optimization subjects in different periods, and the same agent is assigned to the same optimization subject in different periods; the network topology changed by the actions executed by the agents in the current period, combined with the state transition function, forms the agent state space of the next period.
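The assignment rule and the period-to-period chaining can be sketched as follows. The subject labels, the integer agent IDs and the `policies` / `transition` callables are hypothetical stand-ins: in the invention the subjects come from the reconstruction level decision of step S1 and the policies are trained Q networks.

```python
def assign_agents(subjects_by_period):
    """Map each optimization subject to one agent; reuse it whenever the subject recurs."""
    agent_of = {}
    schedule = []
    for subjects in subjects_by_period:
        ids = []
        for subject in subjects:
            if subject not in agent_of:
                agent_of[subject] = len(agent_of)  # new subject -> new agent
            ids.append(agent_of[subject])
        schedule.append(ids)
    return agent_of, schedule


def rollout(initial_state, subjects_by_period, policies, transition):
    """Chain the periods: the topology changed in period t feeds the state of period t+1."""
    _, schedule = assign_agents(subjects_by_period)
    state = initial_state
    for t, agents in enumerate(schedule):
        actions = {a: policies[a](state, t) for a in agents}
        state = transition(state, actions, t)
    return state


# Hypothetical 3-period schedule of optimization subjects
agent_of, schedule = assign_agents([["sub:1"], ["sub:1", "trans:2"], ["trans:2"]])
```

Reusing one agent per subject lets experience collected in different hours train the same network, while the number of agents stays bounded by the number of distinct subjects rather than by the number of periods.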

A machine learning-based multistage dynamic reconfiguration system for an urban power distribution network comprises a reconstruction level fast decision model and a reinforcement learning-based distribution network optimization operation model;

the reconstruction level fast decision model comprises a distribution network state fast sensing module, a reconstruction level decision module and a first information interaction module;

the distribution network state fast sensing module is used for monitoring in real time the photovoltaic generation of each photovoltaic node of the distribution network, the load reduction of the load-reducible nodes, the power exchanged with the upstream grid and the load demand;

the reconstruction level decision module is used for deciding the reconstruction level of the distribution network according to the operating state of the urban distribution network and for limiting the range of optimization subjects of the reinforcement learning agents;

the first information interaction module is used for transmitting the reconstruction level decision result to the reinforcement learning-based distribution network optimization operation model;

the reinforcement learning-based distribution network optimization operation model comprises a second information interaction module, a distribution network state accurate sensing module, an experience pool module, a tie switch action module, a photovoltaic output decision module, a load reduction decision module, an agent joint operation module and a reconstruction decision module;

the second information interaction module is used for receiving the reconstruction level decision result of the reconstruction level fast decision model;

the distribution network state accurate sensing module is used for accurately sensing the real-time photovoltaic generation of the distribution network, the load reduction of the load-reducible nodes, the power exchanged with the upstream grid, the load demand and the branch currents;

the experience pool module is used for storing historical operating states of the distribution network together with the reward values obtained after the corresponding agents made decisions in those states;

the tie switch action module is used for remotely controlling the opening and closing of each branch tie switch according to the corresponding reconstruction scheme;

the photovoltaic output decision module is used for deciding the photovoltaic curtailment of each photovoltaic node under the corresponding reconstruction scheme;

the load reduction decision module is used for deciding the load shedding of the load-reducible nodes under the corresponding reconstruction scheme;

the agent joint operation module is used for making optimal operation decisions for the distribution network by combining all agents;

and the reconstruction decision module is used for controlling each module accordingly, following the machine learning-based multistage dynamic reconstruction method for an urban power distribution network.

Compared with the prior art, the invention has the following beneficial effects:

The scheme provides a machine learning-based multistage dynamic reconstruction method for urban distribution networks, establishing a reconstruction level fast decision model and a multi-agent deep reinforcement learning model to make real-time decisions on both the reconstruction level and optimal operation. First, the neural network-based reconstruction level fast decision model makes online decisions on the reconstruction level and provides a real-time reference for dispatchers; at the same time, dividing the optimization subjects solves the problem that the action space of a traditional single-agent deep reinforcement learning model grows exponentially when several subjects are optimized together. Second, photovoltaic uncertainty is simulated through the state transition function, and training accuracy converges after extensive training, which addresses the difficulty of solving problems that contain uncertainty. Finally, a multi-agent joint solution model is established to solve the multistage dynamic reconstruction problem of the distribution network considering uncertainty; the model does not need to be re-solved under similar operating states of the distribution network, which makes it practical.

Drawings

Fig. 1 is a schematic diagram of a multistage dynamic reconstruction method of an urban power distribution network based on machine learning.

FIG. 2 is a schematic diagram of the operation of the reconstruction level fast decision model of the present invention.

Fig. 3 is a working principle diagram of a reconstruction optimization operation model of the power distribution network based on reinforcement learning.

FIG. 4 is a block diagram of a multi-agent reinforcement learning architecture of the present invention.

FIG. 5 is a graph of the reconstruction level fast decision model fitting results of the present invention.

FIG. 6 is a graph of the results of the multi-agent reinforcement learning model optimization of the present invention.

Detailed Description

For the purpose of making the technical solution and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention. It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

The features and capabilities of the present invention are described in further detail below in connection with the examples.

Examples:


As shown in Fig. 1, based on the measured spatio-temporal big data of the power system, the invention uses the node loads, the power exchanged with the upstream grid and the photovoltaic output measured across the whole system as input features, so that the sample data carry spatio-temporal characteristics. Labels are provided for a neural network multi-label classification fitting model trained with binary cross-entropy as the loss function, yielding a reconstruction level fast decision model that quickly decides the reconstruction level of the distribution network and reduces the dimension of the subsequent reinforcement learning action space. For the distribution network whose reconstruction level has been decided, a single-agent reinforcement learning model with parameter freezing and experience replay mechanisms is built, aiming at optimal economic operating cost, voltage offset and load balance while considering constraints such as the radiality of the network structure and the power flow. Then, the reconstruction optimization subjects of the distribution network in each period are divided according to the 24-hour reconstruction level decision results, a multi-agent reinforcement learning model is established and jointly optimized, and the validity of the method is verified on a test system.
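Parameter freezing and experience replay are the two standard stabilizers of a deep Q network. The sketch below shows them in isolation, using hypothetical class and function names: a bounded experience pool with uniform random sampling, a target copy of the parameters that is refreshed only every `sync_every` updates, and the resulting temporal-difference targets. The neural Q function itself is left abstract as a callable that returns one value per action.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Experience pool: stores (s, a, r, s_next, done) and samples uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest experience is dropped when full
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between consecutive periods
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


class FrozenTarget:
    """Parameter freezing: the target copy is refreshed only every sync_every updates."""

    def __init__(self, online_params, sync_every):
        self.params = copy.deepcopy(online_params)
        self.sync_every = sync_every
        self.updates = 0

    def after_update(self, online_params):
        self.updates += 1
        if self.updates % self.sync_every == 0:
            self.params = copy.deepcopy(online_params)


def td_targets(batch, q_target, gamma):
    """DQN targets y = r + gamma * max_a' Q_target(s', a'); no bootstrap at terminal states."""
    return [r if done else r + gamma * max(q_target(s_next))
            for (_s, _a, r, s_next, done) in batch]
```

Freezing the target keeps the regression target fixed for several updates, so the online network is not chasing a target that moves with every gradient step.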

As shown in Fig. 2, the reconstruction level fast decision model consists of a distribution network state fast sensing module, a reconstruction level decision module and an information interaction module. The state fast sensing module monitors in real time the photovoltaic generation of each photovoltaic node of the urban distribution network, the load reduction of the load-reducible nodes, the power exchanged with the upstream grid and the load demand; the reconstruction level decision module decides the reconstruction level of the distribution network according to the operating state of the urban distribution network and limits the range of optimization subjects of the reinforcement learning agents; the information interaction module transmits the reconstruction level decision result to the reinforcement learning-based distribution network optimization operation model;

as shown in Fig. 3, the reinforcement learning-based distribution network optimization operation model consists of an information interaction module, a distribution network state accurate sensing module, an experience pool module, a tie switch action module, a photovoltaic output decision module, a load reduction decision module, an agent joint operation module and a reconstruction decision module. The information interaction module receives the reconstruction level decision result of the reconstruction level fast decision model; the state accurate sensing module accurately senses the real-time photovoltaic generation of the distribution network, the load reduction of the load-reducible nodes, the power exchanged with the upstream grid, the load demand and the branch currents; the experience pool module stores historical operating states of the distribution network together with the reward values obtained after the corresponding agents made decisions in those states; the tie switch action module remotely controls the opening and closing of each branch tie switch according to the corresponding reconstruction scheme; the photovoltaic output decision module decides the photovoltaic curtailment of each photovoltaic node under the corresponding reconstruction scheme; the load reduction decision module decides the load shedding of the load-reducible nodes under the corresponding reconstruction scheme; the agent joint operation module makes optimal operation decisions for the distribution network by combining all agents;

the reconstruction decision module controls each module accordingly, following the machine learning-based multistage dynamic reconstruction method for distribution networks established by the invention.

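In the reconstruction level fast decision model,

the neural network structure is:

$$h^{(1)} = f_h\left(W^{(1)} x + b^{(1)}\right),\quad h^{(2)} = f_h\left(W^{(2)} h^{(1)} + b^{(2)}\right),\quad o = f_o\left(W^{(3)} h^{(2)} + b^{(3)}\right)$$

where $f_o$ is the sigmoid activation function and $f_h$ is the linear rectification (ReLU) activation function. Each of the two hidden layers has $F$ neurons. The input-layer neurons $x \in \mathbb{R}^{D\times 1}$ are connected to the first hidden layer $h^{(1)} \in \mathbb{R}^{F\times 1}$ through the weight $W^{(1)} \in \mathbb{R}^{F\times D}$ and bias term $b^{(1)} \in \mathbb{R}^{F\times 1}$. The first hidden layer is connected to the second hidden layer through the weight $W^{(2)} \in \mathbb{R}^{F\times F}$ and bias term $b^{(2)} \in \mathbb{R}^{F\times 1}$, and the second hidden layer is finally connected to the output-layer neurons $o \in \mathbb{R}^{L\times 1}$ through a sigmoid activation function, which limits the output range to $[0, 1]$.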

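The binary cross-entropy:

$$Loss = -\frac{1}{NL}\sum_{n=1}^{N}\sum_{l=1}^{L}\left[y_{nl}\log o_{nl} + \left(1 - y_{nl}\right)\log\left(1 - o_{nl}\right)\right]$$

where $Loss$ is the binary cross-entropy loss function, which measures the error between the given true labels and the predicted results in the multi-label classification task; $o_{nl}$ and $y_{nl}$ are the predicted result and the true label for label $l$ of sample $n$; $N$ is the number of samples and $L$ the number of labels.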
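Binary cross-entropy for the multi-label fit can be computed as below. The sketch averages over N samples and L labels and clips predictions away from 0 and 1 to avoid log(0); both the averaging convention and the clipping constant `eps` are assumptions rather than details given in the document.

```python
import math


def bce_loss(o, y, eps=1e-12):
    """Mean binary cross-entropy over N samples x L labels.

    o[n][l] is the sigmoid output for label l of sample n, y[n][l] its 0/1 true label.
    """
    n_samples = len(o)
    n_labels = len(o[0])
    total = 0.0
    for o_n, y_n in zip(o, y):
        for o_nl, y_nl in zip(o_n, y_n):
            p = min(max(o_nl, eps), 1.0 - eps)  # keep log() finite
            total += y_nl * math.log(p) + (1 - y_nl) * math.log(1.0 - p)
    return -total / (n_samples * n_labels)
```

Each label contributes its own independent log-likelihood term, which matches the independent sigmoid outputs of the network.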

Meanwhile, the reconstruction level decisions of the distribution network are subject to coupling constraints (3)-(5), in which substations $m$ and $n$ are associated with each other.

Here, the 0-1 variables (for substation $i$, $\beta_i^{SR}$ and $\beta_i^{TR}$, plus a feeder-level indicator for each transformer) indicate whether each subject requires substation-level, transformer-level or feeder-level reconstruction in period $t$; $\Omega^{Sub}$ is the set of substation nodes; $N_i^{Trans}$ is the total number of transformers in substation $i$; $f \in \mu(i)$ means that transformer $f$ belongs to substation $i$. Equation (3) states that a substation can perform at most one of transformer-level reconstruction and substation-level reconstruction at a time; equation (4) states that associated substations $m$ and $n$ take the same decision on whether to perform substation-level reconstruction; equation (5) states that when the upper-level substation $i$ performs transformer-level or substation-level reconstruction, no feeder-level reconstruction can be performed inside it at the same time.
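The three coupling rules can be expressed as a feasibility check. The sketch below encodes them as they are stated in words: at most one of substation-level and transformer-level reconstruction per substation, equal substation-level decisions for associated substations, and no feeder-level reconstruction inside a substation that is already reconstructing at transformer or substation level. The function name and dictionary layout are hypothetical, and the original inequalities may be written differently.

```python
def check_level_coupling(beta_sr, beta_tr, beta_fr, transformers_of, associated_pairs):
    """Return True if the 0-1 reconstruction-level decisions satisfy rules (3)-(5).

    beta_sr[i], beta_tr[i]: substation-level / transformer-level decision of substation i
    beta_fr[f]: feeder-level decision of transformer f
    transformers_of[i]: transformers f in mu(i); associated_pairs: (m, n) pairs
    """
    for i in beta_sr:
        if beta_sr[i] + beta_tr[i] > 1:  # rule (3): at most one of the two levels
            return False
        busy = beta_sr[i] + beta_tr[i] >= 1
        if busy and any(beta_fr[f] for f in transformers_of[i]):  # rule (5)
            return False
    for m, n in associated_pairs:  # rule (4): associated substations act together
        if beta_sr[m] != beta_sr[n]:
            return False
    return True
```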

In the reinforcement learning-based distribution network optimization operation model,

the state space consists of the photovoltaic generation of each photovoltaic node in period $t$, the load reduction of each load-reducible node, the power purchased from the upstream grid, and the branch currents $I_{ij,t}$, where $I_{ij,t}$ is the current flowing from node $i$ to node $j$ at time $t$.

The action space consists of the branch switch states together with the photovoltaic generation of each photovoltaic node in period $t$ and the load reduction of each load-reducible node, the latter two discretized with granularity $d$.

Meanwhile, the action space must satisfy the switch set selection constraints, in which $w_{ij}$ is a 0-1 variable giving the connection state of the switch on branch $ij$; $\Omega_i^{SW,Sub}$, $N_i^{SW,Sub}$, $\Omega_i^{SW,Trans}$ and $N_i^{SW,Trans}$ denote, respectively, the set and number of substation tie switches and the set and number of transformer tie switches of substation $i$; two further sets and counts denote the feeder tie switches and the branch sectionalizing switches of the transformers belonging to substation $i$. The constraints imply the following: 1) when $\beta_i^{SR} = 1$, $\beta_i^{TR} = 0$ and the feeder-level indicators are 0, substation $i$ performs substation-level reconstruction, and the on-off states of its substation tie switches, together with the transformer tie switches, feeder tie switches and branch sectionalizing switches of the associated substations, can be freely adjusted; 2) when $\beta_i^{SR} = 0$, $\beta_i^{TR} = 1$ and the feeder-level indicators are 0, substation $i$ performs transformer-level reconstruction, and the on-off states of its transformer tie switches, feeder tie switches and branch sectionalizing switches can be freely adjusted; 3) when $\beta_i^{SR} = 0$, $\beta_i^{TR} = 0$ and the feeder-level indicator of transformer $f$ is 1, feeder-level reconstruction is performed for transformer $f$ belonging to substation $i$, and the on-off states of the feeder tie switches and branch sectionalizing switches belonging to that transformer can be adjusted.
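The three cases above amount to masking the switch action by reconstruction level. In the hypothetical sketch below, each substation's switches are grouped into substation tie switches, transformer tie switches and per-transformer feeder switch sets (feeder tie plus branch sectionalizing switches). For substation-level reconstruction it assumes the adjustable group covers substation i together with its associated substations; switches outside the allowed set keep their current state $w_{ij}$.

```python
def adjustable_switches(level, sub, net, associated=(), transformer=None):
    """Set of switches whose 0/1 state may change under the given reconstruction level.

    net[i] = {"sub_tie": set, "trans_tie": set,
              "feeders": {f: set of feeder tie and branch sectionalizing switches of f}}
    """
    if level == "substation":
        allowed = set(net[sub]["sub_tie"])
        for k in (sub, *associated):
            allowed |= net[k]["trans_tie"]
            for feeder_switches in net[k]["feeders"].values():
                allowed |= feeder_switches
        return allowed
    if level == "transformer":
        allowed = set(net[sub]["trans_tie"])
        for feeder_switches in net[sub]["feeders"].values():
            allowed |= feeder_switches
        return allowed
    if level == "feeder":
        return set(net[sub]["feeders"][transformer])
    raise ValueError(f"unknown reconstruction level: {level}")


def mask_action(current, proposed, allowed):
    # Switches outside the allowed set keep their current state w_ij
    return {s: (proposed[s] if s in allowed else current[s]) for s in current}
```

Masking in this way is what shrinks the agent's effective action space once the reconstruction level is fixed.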

The state transition function:

$$s_{t+1} = f(s_t, a_t, \rho)$$

where $\rho$ is a random quantity. This indicates that the state transition is affected not only by the action $a_t$ but also by randomness: in each training round, normally distributed noise is added to the photovoltaic output forecast baseline to simulate the uncertainty of photovoltaic output.

The reward function:

a larger penalty is given to the agent if a branch power exceeds its allowable upper limit; the cost terms include the line loss cost in period $t$, the switching cost and the load curtailment cost; each kind of tie switch in the reconstruction area has a state flag, where 0 means open and 1 means closed; $\Omega^{LR}$ is the set of curtailable loads; $c^{LR}$ is the unit load shedding cost; the unit photovoltaic curtailment penalty cost at time $t$ and the photovoltaic output power of node $i$ at time $t$ are also used.

The multi-agent reinforcement learning model:

As shown in Fig. 4, the optimization subjects over a 24-hour horizon are first determined by the reconstruction level fast decision model; different agents are assigned to different optimization subjects in different periods, and the same agent is assigned to the same optimization subject in different periods; the network topology changed by the actions executed by the agents in the current period, combined with the state transition function, forms the agent state space of the next period.

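The constraint conditions are as follows:

(1) Power flow constraints

$$P_{i,t} = V_{i,t}\sum_{j\in\Omega} V_{j,t}\left(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\right)$$

$$Q_{i,t} = V_{i,t}\sum_{j\in\Omega} V_{j,t}\left(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\right)$$

where $P_{i,t}$ and $Q_{i,t}$ are the active and reactive power of node $i$ at time $t$; $V_{i,t}$ is the voltage of node $i$ at time $t$; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance between nodes $i$ and $j$; $\theta_{ij}$ is the voltage phase angle difference between nodes $i$ and $j$; $\Omega$ is the set of all nodes.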
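The polar power flow equations can be evaluated directly to obtain node injections from voltage magnitudes and angles. The sketch below assumes that G and B are the real and imaginary parts of the full bus admittance matrix in per unit (so the sum includes j = i); it only evaluates the equations and does not solve the power flow or model switch states.

```python
import math


def injections(V, theta, G, B):
    """Active and reactive node injections from the polar power flow equations.

    V[i]: voltage magnitude, theta[i]: voltage angle (rad),
    G[i][j], B[i][j]: real / imaginary parts of the bus admittance matrix.
    """
    n = len(V)
    P, Q = [], []
    for i in range(n):
        p = q = 0.0
        for j in range(n):
            th = theta[i] - theta[j]  # theta_ij
            p += V[j] * (G[i][j] * math.cos(th) + B[i][j] * math.sin(th))
            q += V[j] * (G[i][j] * math.sin(th) - B[i][j] * math.cos(th))
        P.append(V[i] * p)
        Q.append(V[i] * q)
    return P, Q
```

As a sanity check, for a single line between two nodes the sum of the active injections equals the line's I²R loss, which is the quantity priced by the line loss cost.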

(2) Safe operation constraints

where $w_{ij}$ is the switch state of branch $ij$: $w_{ij} = 1$ means that the switch on branch $ij$ is closed; $\Omega^{SW}$ is the set of switchable branches.

(3) Radiality constraints of the distribution network

where $E^{Always}$ is the total number of branches in the network that are always closed and cannot be switched; $L^s(i)$ is the set of end nodes of branches starting at node $i$; $L^e(i)$ is the set of start nodes of branches ending at node $i$; $N^{Sub}$ is the number of substations in the optimization subject: $N^{Sub} = 1$ if feeder-level or transformer-level reconstruction is performed, and it equals the number of the substation plus its associated substations if substation-level reconstruction is performed. However, a distribution network containing DG may still form islands under constraint (32) alone, so supplementary constraints (33)-(35) are required: a small power $\varepsilon$ is injected at every non-substation node, and the simplified power flow constraints keep all nodes connected.

Here, the auxiliary flow variable denotes the auxiliary active power flow on branch $ij$ at time $t$.
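As a quick plausibility check of a candidate topology, radiality can also be tested with a union-find structure instead of the constraint formulation (32)-(35). With N nodes and N^Sub substations, the closed branches (always-closed ones plus switches with w_ij = 1) must number N - N^Sub, must form no loop, and must leave exactly one substation in every tree, so that no island exists. This is a substitute graph check for illustration, not the mixed-integer constraints of the optimization model; all names are hypothetical.

```python
def is_radial(n_nodes, always_closed, switch_states, substations):
    """True if the closed branches form a forest with exactly one substation per tree."""
    closed = list(always_closed) + [br for br, w in switch_states.items() if w == 1]
    if len(closed) != n_nodes - len(substations):
        return False
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in closed:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # this branch closes a loop
        parent[ri] = rj
    # A loop-free graph with n - k edges has exactly k trees; each must hold one substation
    return len({find(s) for s in substations}) == len(substations)
```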

Case study verification and analysis:

The method is verified on a modified practical 145-node system: the reconstruction level fast decision model is trained on an existing data set, and the effectiveness of the multi-agent reinforcement learning model is analyzed using forecast data of the initial load and photovoltaic output.

As shown in Fig. 5, the loss of the neural network on the validation set decreases continuously, approaches its minimum and gradually converges after 15 epochs, with no overfitting, and the prediction accuracy is 99%-100%. This indicates that the neural network has learned to reproduce the reconstruction level decisions of the mathematical method, so that it perceives the distribution network environment accurately, shortens the reconstruction level decision time, and can quickly decide the reconstruction level of the distribution network.

As shown in Fig. 6, the multi-agent reinforcement learning model for dynamic multistage reconstruction of the urban distribution network reaches the vicinity of its maximum reward after 15,000 training episodes. The reward oscillates because the exploration setting makes the joint agents keep trying new actions in order to avoid getting trapped in a local optimum. The reward trend shows that the optimization effect improves continuously and then stabilizes, which verifies the effectiveness of the model.
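The oscillation described above comes from exploration. A common way to realize exploration in a deep Q network is epsilon-greedy action selection; the sketch below assumes that scheme with a linear decay to a floor `eps_end`, since the document does not state the exploration rule or its parameters. A nonzero floor keeps some random actions throughout training, which is one reason a reward curve can keep fluctuating near its maximum.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


def decayed_epsilon(episode, eps_start=1.0, eps_end=0.05, decay_episodes=10000):
    # Hypothetical linear schedule from eps_start down to a floor eps_end
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```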

Considering the spatio-temporal flexibility requirements of the net load of the urban distribution network and the different regulation capabilities of multiple types of interconnection switches, the invention mainly studies machine-learning-based strategies for fast reconstruction-level decision and optimal operation. First, the reconstruction-level fast decision model determines the reconstruction level of the distribution network quickly, which reduces the dimension of the action space for the subsequent reinforcement learning. Second, a reinforcement-learning-based optimal operation model of the distribution network is established with economic operating cost, voltage deviation and load balance as objectives, subject to constraints such as network radiality and power flow. Finally, the reconstruction optimization subjects of the distribution network in each period are divided according to the 24-hour reconstruction-level decision results, and a multi-agent reinforcement learning model is established and jointly optimized.

The above is a preferred embodiment of the present invention. All changes made according to the technical solution of the present invention whose functional effects do not exceed the scope of that technical solution fall within the protection scope of the present invention.

Claims

1. A machine-learning-based multistage dynamic reconfiguration method for an urban power distribution network, characterized by comprising the following steps:

S3: performing joint optimization training on the different agents distributed over the whole optimization period, and constructing a multi-agent reinforcement learning joint optimization model;

Step S1 is specifically as follows:

The neural network structure is as follows:

The hidden-layer activation function is the rectified linear unit (ReLU), and both hidden layers have F neurons. The input-layer neurons are connected to the first hidden layer h ∈ R^(F×1) through the weight W⁽¹⁾ ∈ R^(F×D) and the bias term b⁽¹⁾ ∈ R^(F×1); h is connected to the second hidden layer through the weight W⁽²⁾ ∈ R^(F×F) and the bias term b⁽²⁾ ∈ R^(F×1); finally, the second hidden layer is connected to the output-layer neurons o ∈ R^(L×1) through a sigmoid activation function, which limits the output range to [0, 1];
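A minimal NumPy sketch of the forward pass of the network described above: D inputs, two ReLU hidden layers of F neurons each, and L sigmoid outputs. The output-layer parameters `W3` and `b3`, the initialisation in `init_params`, and all function names are assumptions, since the text only says that the second hidden layer is connected to the output layer through a sigmoid.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(D, F, L, seed=0):
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (F, D)), "b1": np.zeros((F, 1)),
        "W2": rng.normal(0.0, 0.1, (F, F)), "b2": np.zeros((F, 1)),
        # W3/b3: assumed output-layer parameters, shape (L, F) and (L, 1).
        "W3": rng.normal(0.0, 0.1, (L, F)), "b3": np.zeros((L, 1)),
    }


def forward(x, params):
    """x: input column vector of shape (D, 1); returns o of shape (L, 1)."""
    h1 = relu(params["W1"] @ x + params["b1"])        # first hidden layer, (F, 1)
    h2 = relu(params["W2"] @ h1 + params["b2"])       # second hidden layer, (F, 1)
    return sigmoid(params["W3"] @ h2 + params["b3"])  # outputs in (0, 1), (L, 1)


params = init_params(D=8, F=16, L=3)
o = forward(np.ones((8, 1)), params)
```

How the L outputs are mapped to feeder-, transformer- or substation-level decisions is not specified here; thresholding each output at 0.5 would be one possible choice.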

In step S2:

State space:

where:

Action space:

State transition function:

$$s_{t+1} = f(s_t, a_t, \rho)$$

where ρ denotes a random variable, indicating that the state transition depends not only on the action a_t but also on random factors: on top of the photovoltaic output forecast baseline, normally distributed noise is added in each training episode to simulate the uncertainty of photovoltaic output;
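The following sketch illustrates one possible form of the transition f(s_t, a_t, ρ): the switch states produced by action a_t are carried into the next state, and ρ is zero-mean normal noise added to the next-period PV forecast baseline. Because the state-space formula is not reproduced here, the state contents (switch states, load, PV output), the noise scale proportional to the baseline (`noise_frac`) and the function name `next_state` are all assumptions.

```python
import numpy as np


def next_state(switches_after_action, load_forecast, pv_forecast, t,
               noise_frac=0.1, rng=None):
    """Hypothetical sketch of s_{t+1} = f(s_t, a_t, rho).

    switches_after_action: 0/1 switch states after executing action a_t
    load_forecast, pv_forecast: arrays of shape (T, n_nodes), forecast baselines
    noise_frac: assumed std of the PV noise as a fraction of the baseline
    """
    if rng is None:
        rng = np.random.default_rng()
    baseline = np.asarray(pv_forecast[t + 1], dtype=float)
    # rho: zero-mean normal noise added to the PV forecast baseline.
    rho = rng.normal(0.0, noise_frac * baseline)
    pv = np.clip(baseline + rho, 0.0, None)  # PV output cannot be negative
    return {
        "switches": np.asarray(switches_after_action),
        "load": load_forecast[t + 1],
        "pv": pv,
    }
```

Scaling the noise with the baseline keeps night-time PV output at zero; with a fixed noise standard deviation, clipping at zero would instead produce spurious positive output at night.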

In step S2, the reward function is:

where the quantities in the reward function are:

the line-loss cost in period t;

the switching-operation cost;

the load-shedding cost;

the unit PV curtailment penalty cost at time t;

the photovoltaic output power of node i at time t.
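A sketch of a per-period reward built from the cost terms listed above, assuming the reward is the negative of their sum and that the curtailment term is the unit penalty multiplied by the total curtailed PV power (forecast available minus absorbed). The exact formula, any weighting, and whether voltage-deviation and load-balance terms also enter the reward are not shown here, so the function and its arguments are hypothetical.

```python
def reward(line_loss_cost, switching_cost, load_shed_cost,
           pv_available, pv_used, curtail_price):
    """Hypothetical per-period reward: negative total operating cost.

    pv_available / pv_used: per-node PV power that could be produced
        vs. actually absorbed by the network
    curtail_price: unit PV curtailment penalty cost at time t
    """
    # Curtailed PV power summed over nodes (never negative per node).
    curtailed = sum(max(a - u, 0.0) for a, u in zip(pv_available, pv_used))
    curtail_cost = curtail_price * curtailed
    # Higher cost -> lower reward (assumed sign convention).
    return -(line_loss_cost + switching_cost + load_shed_cost + curtail_cost)
```

For example, with costs 10, 2 and 0, available PV [5, 3] kW, absorbed PV [4, 3] kW and a penalty of 0.5 per kW, 1 kW is curtailed and the reward is -12.5.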

2. The machine-learning-based multistage dynamic reconfiguration method for an urban power distribution network according to claim 1, wherein in the multi-agent reinforcement learning joint optimization model of step S3, the optimization subjects of each period over 24 hours are first determined through step S1; different agents are assigned to different optimization subjects in different periods, and the same agent is assigned to the same optimization subject across periods; the distribution network structure changed by the action decided by an agent in the current period, together with the state transition function, forms that agent's state space in the next period.
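The agent-allocation rule in claim 2 can be sketched as a lookup table: a new agent is created the first time an optimization subject appears, and that agent is reused whenever the same subject reappears in a later period. The subject identifiers and the function name `assign_agents` are hypothetical.

```python
def assign_agents(subjects_by_period):
    """Map each (period, optimization subject) pair to an agent id.

    subjects_by_period: one entry per period (typically 24); entry t lists
        the hashable subject identifiers determined by step S1 for period t.
    Returns (assignment dict, number of agents created).
    """
    agent_of_subject = {}
    assignment = {}
    for t, subjects in enumerate(subjects_by_period):
        for subject in subjects:
            if subject not in agent_of_subject:
                # First appearance of this subject: create a new agent.
                agent_of_subject[subject] = len(agent_of_subject)
            # Same subject in any period -> same agent.
            assignment[(t, subject)] = agent_of_subject[subject]
    return assignment, len(agent_of_subject)
```

For example, if step S1 yields the subjects ["feeder_A"], ["feeder_A", "sub_1"] and ["sub_1"] for three periods, two agents are created: agent 0 handles feeder_A in periods 0 and 1, and agent 1 handles sub_1 in periods 1 and 2.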

3. A machine-learning-based multistage dynamic reconfiguration system for an urban power distribution network, characterized by comprising: a reconstruction-level fast decision model and a reinforcement-learning-based distribution network optimal operation model;

and a reconstruction decision module, which controls each module according to the machine-learning-based multistage dynamic reconfiguration method for an urban power distribution network of any one of claims 1 to 2.