CN114298429A - Power distribution network scheme aided decision-making method, system, device and storage medium - Google Patents

Power distribution network scheme aided decision-making method, system, device and storage medium

Info

Publication number
CN114298429A
CN114298429A (application CN202111661200.8A)
Authority
CN
China
Prior art keywords
power distribution
distribution network
decision
model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111661200.8A
Other languages
Chinese (zh)
Inventor
齐小伟
陈秀海
李昕
李永勋
姚巍
韩爽
关鹏
陈佳博
彭博
张育臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Beijing Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202111661200.8A priority Critical patent/CN114298429A/en
Publication of CN114298429A publication Critical patent/CN114298429A/en
Pending legal-status Critical Current

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a power distribution network scheme aided decision-making method, system, device and storage medium. The method comprises the following steps: acquiring online operation data of the power distribution network; inputting the online operation data into a preset power distribution network scheme aided decision model; and the model outputs a network structure that ensures safe and stable operation of the power distribution network. The power distribution network scheme aided decision model is trained with the DQN reinforcement learning algorithm. The topological structure of the power distribution network can thus be reconfigured simply by inputting the current state of the online power distribution network. Because the decision process does not require enumerating and evaluating every possible reconfiguration scheme of the whole network, the computational burden and time consumption are small, so the reinforcement-learning-based scheme aided decision-making method offers high speed and high efficiency.

Description

Power distribution network scheme aided decision-making method, system, device and storage medium
Technical Field
The invention belongs to the technical field of power grid operation safety, and particularly relates to a power distribution network scheme aided decision-making method, system, device and storage medium.
Background
With the rapid development of urban power distribution network technology, distribution network construction is gradually entering a high-reliability stage. According to statistics, more than three quarters of user power-failure incidents are caused by faults in the power distribution network. Meanwhile, with the spread of distributed renewable generation (DRG), distribution networks face changes in the relationship between supply and demand, so ensuring the safe and economic operation of the distribution network is becoming increasingly important. Distribution Network Reconfiguration (DNR) adjusts the topology of the distribution network by controlling the open/closed states of tie switches, which reduces network losses, improves the voltage quality of the distribution network, and ensures the safe and stable operation of the power grid.
Traditional methods for optimizing power distribution network scheduling through network reconfiguration mainly include brute-force search and heuristic algorithms such as the genetic algorithm. However, the topology of a power distribution network is complex, its equipment is geographically dispersed and of many types, and equipment operating states are easily affected by external factors. As a result, traditional reconfiguration methods require a large amount of computation, take a long time, and yield low reconfiguration efficiency.
Disclosure of Invention
The invention aims to provide a power distribution network scheme aided decision-making method, system, device and storage medium, so as to solve the problem that existing heuristic methods, such as the traditional genetic algorithm, take too long to compute, leading to low decision-making efficiency and untimely decisions.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a power distribution network scheme aided decision method based on a deep neural network, comprising the following steps:
acquiring online operation data of the power distribution network;
inputting the online operation data of the power distribution network into a preset power distribution network scheme aided decision model; the model outputs a network structure that ensures safe and stable operation of the power distribution network;
the power distribution network scheme aided decision model is trained on the basis of a DQN reinforcement learning algorithm.
Optionally, the online operation data of the power distribution network include the real-time or predicted switch states, wind turbine power generation, photovoltaic power generation, and load sizes.
Optionally, the power distribution network scheme aided decision model is obtained in the following manner:
acquiring historical operation data of the power distribution network;
setting an ε-greedy strategy for action selection;
establishing a power distribution network system environment model based on the historical operation data of the power distribution network, and establishing a deep reinforcement learning model of an intelligent agent;
and based on the preset ε-greedy strategy, performing offline training and learning with the power distribution network system environment model and the deep reinforcement learning model to obtain a power distribution network scheme aided decision model that meets the error requirement.
Optionally, after the historical operation data of the power distribution network are obtained, they are preprocessed and converted into an original sample set suitable for the reinforcement learning algorithm.
Optionally, the offline training and learning with the power distribution network system environment model and the deep reinforcement learning model specifically proceed as follows:
each time the environment model executes an action given by the deep reinforcement learning model, it returns a new system state and calculates the corresponding reward value; given the current state, the deep reinforcement learning model takes the control action that maximizes the expected reward as its objective, and continuously learns and improves its action policy while interacting with the environment model.
Optionally, establishing the power distribution network system environment model includes setting the agent state space, the action space, and the agent reward/penalty mechanism.
Optionally, when the deep reinforcement learning model of the agent is constructed, two neural networks are used: a real network that generates the current Q value and a target network that generates the target Q value; the two networks have the same initial weights and parameters but different parameter updating speeds.
In a second aspect of the present invention, a system for the power distribution network scheme aided decision-making method based on the deep neural network is provided, comprising:
the data acquisition module is used for acquiring the online operation data of the power distribution network;
the prediction module is used for inputting the online operation data of the power distribution network into the preset power distribution network scheme aided decision model; the power distribution network scheme aided decision model outputs a network structure for ensuring safe and stable operation of the power distribution network.
In a third aspect of the present invention, a computer apparatus is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the power distribution network scheme aided decision-making method based on the deep neural network.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, it implements the power distribution network scheme aided decision-making method based on the deep neural network.
The invention has the following beneficial effects:
(A) The invention provides a neural-network-based power distribution network scheme aided decision method that uses an artificial intelligence algorithm: firstly, the risk early-warning problem of the power distribution network is cast as a Markov decision process, with the network structure, generation and load of the power distribution network selected as the state, the open/closed positions of all adjustable line switches as the actions, and preservation of radiality together with safe and stable operation as the reward; a DQN reinforcement learning algorithm is then used to train the power distribution network aided decision model. The topology of the power distribution network can therefore be reconfigured simply by inputting the current state of the online network. Because the decision process does not require enumerating every possible reconfiguration scheme of the whole network, the computation and time required are small, so the reinforcement-learning-based scheme aided decision method is fast and efficient.
(B) The DQN algorithm adopted by the invention is widely used in the field of reinforcement learning and performs well; it integrates a deep neural network with the Q-learning reinforcement learning algorithm.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of the power distribution network scheme aided decision method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the interaction of a reinforcement learning agent with the environment in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a training process of a DQN reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 4 is a flow chart of a Markov Decision Process (MDP) in an embodiment of the present invention;
FIG. 5 is a diagram of the DQN reinforcement learning algorithm neural network in an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the embodiments and the attached drawings. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
The embodiments of the invention provide a power distribution network scheme aided decision-making method, system, device and storage medium, which solve the current problems of low decision-making efficiency and untimely decisions caused by the excessive computation time of heuristic methods such as the traditional genetic algorithm.
As shown in fig. 1, the first aspect of the present invention provides a power distribution network scheme aided decision-making method based on a deep neural network. The aided decision-making process is learned autonomously by a reinforcement learning model, and the trained model can make a decision immediately from the current power distribution network state to maintain stable operation of the network. The method includes the following steps:
step 1: and (3) data acquisition and preprocessing, namely selecting historical operation data of the power distribution network in a certain area as a data source, and converting the historical operation data of the power distribution network related to power distribution network aid decision into an original sample set suitable for a reinforcement learning algorithm.
Specifically, the historical operating data of the power distribution network comprise data of the power distribution system that can be simulated (corresponding to the "environment" in RL) and various input data (corresponding to parts of the "state" in RL, such as the wind turbine outputs, photovoltaic generation, loads, and switch operation records).
The data preprocessing process mainly comprises the following steps:
step 11: the numbers of WT, PV and load units in the distribution system are denoted N_WT, N_PV and N_D respectively. The required data then comprise the data sets of distributed renewable generation:

P_t^WT = { P_{1,t}^WT , P_{2,t}^WT , … , P_{N_WT,t}^WT }

P_t^PV = { P_{1,t}^PV , P_{2,t}^PV , … , P_{N_PV,t}^PV }

and the load data set:

D_t = { P_{1,t}^D , P_{2,t}^D , … , P_{N_D,t}^D }

step 12: the time interval between the last operation of each switch and the current moment is recorded at the same time:

SW^tr = { sw_{tr,1} , sw_{tr,2} , … , sw_{tr,N_SW} }
When the time interval is smaller than the preset value, the switch should not be actuated again within that window. Once the interval reaches T_SW the recorded value does not increase further, which is sufficient to indicate that the switch is not being over-used. Thus SW^tr is not prepared in advance but, after being assigned an initial value, is derived from the agent's actions.
Step 13: the data P_t^WT, P_t^PV and D_t are acquired from various online or offline sources and then preprocessed, including data cleaning, normalization and the like.
Specifically, data cleaning refers to eliminating problem data such as missing values and erroneous measurements (zero or negative values) and replacing them by linearly connecting the nearest valid data points.
Specifically, data normalization scales the raw data: P_t^WT, P_t^PV and D_t are normalized to X_WT, X_PV and X_D respectively by removing all negative values and dividing all data by their maximum value. Before being input to the DQN, SW^tr is normalized to X_tr by dividing each element by T_SW.
Step 14: the normalized data is divided into a training set and a test set.
Specifically, in ordinary reinforcement learning the state is updated purely through the agent's interaction with the environment and does not depend on special external data. In this embodiment, however, the state depends on the power distribution system (the environment) and on the distributed renewable generation and load data, which vary with external conditions such as the weather. If the data were not separated, the DQN might overfit the training data set, possibly leading to reconfiguration failures. A test set is therefore set aside in this embodiment to verify that the learned DQN still performs correctly under relatively unfavourable conditions.
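As an illustrative sketch of this preprocessing step (the function and parameter names, the 80/20 split ratio, and the array layout are assumptions rather than part of the patent text), the normalization and train/test split might look as follows:

```python
import numpy as np

def preprocess(p_wt, p_pv, p_d, sw_tr, t_sw, train_ratio=0.8):
    """Normalize the raw series and split them into training / test sets.

    p_wt, p_pv, p_d : arrays of shape (T, N_WT), (T, N_PV), (T, N_D).
    sw_tr           : time-since-last-actuation record for each switch.
    t_sw            : the preset minimum switching interval T_SW.
    """
    def scale(x):
        x = np.clip(x, 0.0, None)      # treat negative (erroneous) samples as zero
        return x / x.max()             # divide by the maximum value

    x_wt, x_pv, x_d = scale(p_wt), scale(p_pv), scale(p_d)
    x_tr = np.asarray(sw_tr) / t_sw    # SW^tr scaled by T_SW before entering the DQN

    split = int(train_ratio * len(x_d))   # hold out a test set to check generalization
    train = (x_wt[:split], x_pv[:split], x_d[:split])
    test = (x_wt[split:], x_pv[split:], x_d[split:])
    return train, test, x_tr
```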
Step 2: set the ε-greedy strategy for action selection.
Specifically, in reinforcement learning the agent's action-selection rule is called a policy, and the goal is to determine the policy that achieves the highest reward. Before training, the DQN's weights and biases are randomly initialized, so it initially cannot identify the action that delivers the highest reward. Instead of relying on the DQN output from the outset, the agent takes random actions to "explore" which actions can earn high rewards. This embodiment therefore uses the intuitive and simple ε-greedy exploration strategy.
With probability ε the agent selects the action with the maximum Q value in the current state, and with probability 1 − ε it selects a random action a_t, so that the action space is explored as fully as possible. The ε-greedy rule is:

a_t = argmax_a Q(s_t, a)   with probability ε
a_t = a random action in A   with probability 1 − ε
In the ε-greedy strategy, the larger the value of ε, the faster the convergence but the easier it is to fall into a local optimum. Therefore, in this embodiment the agent explores the action space with a high probability in the early stage, when it lacks effective information; as learning continues and the accumulated experience becomes more accurate, the exploration probability is gradually reduced.
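A minimal sketch of this ε-greedy rule, following the convention above in which ε is the probability of taking the greedy action (the function and variable names are illustrative):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=np.random.default_rng()):
    """epsilon-greedy action selection: with probability epsilon take the
    greedy action (largest Q value), with probability 1 - epsilon take a
    uniformly random action to keep exploring the action space."""
    if rng.random() < epsilon:
        return int(np.argmax(q_values))        # exploit
    return int(rng.integers(len(q_values)))    # explore
```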
Step 3: establish the power distribution network system environment model from the preprocessed data, and build the agent's deep reinforcement learning model. Set the corresponding state space, action space and reward (penalty) function, and perform offline training and learning with the DQN reinforcement learning model and the simulation environment to obtain a power distribution network scheme aided decision model that meets the error requirement.
FIG. 2 shows the process of the reinforcement learning agent interacting with the system environment. Each time the environment executes the action given by the agent, it returns the new system state and calculates the corresponding reward value; given the current state, the agent aims to select the control action that maximizes the expected reward, continuously learning and improving its policy while interacting with the environment.
As an example applied to the present invention, a Deep Q Network (DQN) fits the action-value function with a neural network, learning the Q values of state-action pairs from a limited number of interactions with the environment and thereby learning the optimal policy.
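For reference, the action-value fitting described here corresponds to the standard DQN temporal-difference target and loss (stated explicitly for clarity; the patent text introduces it only in steps (15)–(16) of the algorithm flow below):

y_t = r_t + γ · max_{a′} Q(s_{t+1}, a′; ω′)

L(ω) = ( y_t − Q(s_t, a_t; ω) )²

where ω are the weights of the real (online) network and ω′ those of the target network.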
Fig. 3 shows a DQN algorithm training process, wherein step 3 specifically comprises the following steps:
step 31: constructing a power distribution network system environment model;
The power distribution network system environment model is the power system environment that interacts with the agent: for each action given by the reinforcement learning agent, it calculates whether a risk exists under that policy and feeds back the reward, after which the policy is updated; iteration continues until the optimal policy is learned.
In the present embodiment, the environment is formalized as a Markov decision process (Markov de)Precision process, MDP). MD P can be defined as a tuple (, a, P, R, γ) representing state space, action space, state transition probability, reward function and discount factor, respectively. Agent observes state s from the environmenttE.g. S, and take action a at time step ttE.g. A, agent with probability P(s)t+1|st,at) To a new state st+1While receiving the prize r(s)t,at,st+1). The state transition process is shown in fig. 4.
Step 32: agent action space
A random action consists of an array of 0s (open switch) and 1s (closed switch). When an action is extracted from the DQN, the output layer is arranged in the same array form: the entries for switches to be closed are set to 1 and the remaining entries to 0. The determined action is then applied to the test system to open or close each switch, and SW^tr is updated by comparing the new switch states with the states stored after the previous action.
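The following sketch illustrates only this bookkeeping (applying a 0/1 switch array and updating the switching record); the actual opening/closing of switches and the power-flow evaluation are performed by the simulation environment. The names and the step-based time unit are assumptions:

```python
import numpy as np

def apply_action(action_bits, prev_bits, sw_tr, t_sw):
    """Update the stored switch states and the switching record SW^tr.

    action_bits : 0/1 array from the DQN output layer (1 = switch closed).
    prev_bits   : switch states after the previous action.
    sw_tr       : steps elapsed since each switch last changed state.
    t_sw        : cap on the recorded interval (see step 12).
    """
    changed = action_bits != prev_bits
    # Reset the record for switches that just changed; otherwise count up, capped at t_sw.
    sw_tr = np.where(changed, 0, np.minimum(sw_tr + 1, t_sw))
    return action_bits.copy(), sw_tr
```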
Step 33: agent reward (penalty) mechanism
The agent's primary purpose in this application is to preserve the radiality of the network by maximizing its cumulative reward through continuous learning over time. Furthermore, the reconfigured network must meet general power network constraints;
as an example of the invention, line loading and bus voltages are kept within given ranges for a given amount of generation and load. The reward (penalty) mechanism of the DQN model is as follows:
Given state s and action a, the total reward r_t at each time step may be expressed as follows:

r_t = r_init − p_line − p_bus − p_sw

p_line = α · Σ_{i=1…L} max(0, I_i − 1)

p_bus = β · Σ_{j=1…B} [ max(0, V_j − 1.05) + max(0, 0.95 − V_j) ]

p_sw = γ · Σ_k max(0, T_SW − sw_{tr,k})

where L is the number of lines in the network; I_i is the loading of the i-th line as a fraction of its ampacity; α is the line overload penalty weight; B is the total number of buses in the network; V_j is the per-unit voltage of the j-th bus; β is the bus voltage penalty weight; and γ is the switch over-use penalty weight.
If the reconfigured network is radial, the agent receives the base reward r_init at each time step; otherwise it receives a negative reward (a penalty) r_fail, and the simulated episode terminates immediately.
If the reconfigured network is radial but violates the line capacity or bus voltage constraints, the agent is penalized through p_line and p_bus according to the degree of violation and the corresponding weighting factors.
p_line is calculated by taking the allowed line loading as 1 (100 % of ampacity) and multiplying the excess of each violating line i by the weight coefficient α. Likewise, p_bus takes the acceptable bus voltage range as 0.95–1.05 p.u. and multiplies the violation of each offending bus j by the weight factor β. In addition, the penalty term p_sw prevents frequent operation of the sectionalizing switches: this embodiment uses the switching record SW^tr to identify how much time has elapsed since each switch was last actuated, and whenever sw_{tr,k} < T_SW, p_sw adds the difference between T_SW and sw_{tr,k} multiplied by the weight factor γ. In other embodiments, users of the model may adjust the penalty weight factors according to the distribution network environment in which it is applied.
Step 34: DQN algorithm process;
In this embodiment, DQN turns the Q-table update into a function-fitting problem: a function is fitted in place of the Q-table to produce the Q values, so that similar states yield similar output actions. Deep neural networks are good at extracting complex features, and combining deep learning with reinforcement learning in this way gives the DQN algorithm. The neural network of the DQN reinforcement learning algorithm is constructed as shown in fig. 5.
Another innovation of the DQN algorithm is that it addresses the problems of sample correlation and non-stationary distributions with experience replay (an experience pool). The agent's experience at time t, (s_t, a_t, r_t, s_{t+1}), is stored in a replay memory D of size N; mini-batches of a certain size are then sampled at random from D for parameter learning. The size of the experience pool is limited both by memory constraints and by the need to train on reasonably recent data. This approach reduces the number of interactions required with the environment, improves data efficiency, removes the bias caused by correlation between consecutive training samples, and improves generalization. In addition, two neural networks are used: a real network that produces the current Q value and a target network that produces the target Q value; they start with the same weights and parameters but are updated at different rates, which further reduces data correlation.
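A minimal sketch of these two ingredients (the experience pool and the Q network), written in PyTorch; the layer sizes and the use of a discrete action index over candidate switch configurations are assumptions, not part of the patent text:

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q network: state -> one Q value per candidate action."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class ReplayMemory:
    """Experience pool D of size N storing (s_t, a_t, r_t, s_{t+1}, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```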
The specific algorithm flow is as follows:
(1) initializing the replay memory D, which can hold N transitions;
(2) initializing the real Q network with randomly generated weights ω;
(3) generating the target Q network with the same structure and parameters, setting its weights ω′ = ω;
(4) looping over the episodes 1, 2, …, M (where M is the total number of days);
(5) initializing the preprocessed first state s_1;
(6) looping over the steps t = 1, 2, …, T of each episode (at 15-minute intervals one day is divided into 96 points, so T = 96);
(7) generating the action a_t with the ε-greedy policy: with a small probability selecting a random action a_t; otherwise greedily selecting the action with the largest current value function, a_t = argmax_a Q(s_t, a; ω);
(8) executing switching action in a power distribution network simulation environment;
(9) if the reconfigured network is radial:
(10) solving the power flow, saving the loadings {I_1, …, I_L} of all lines and the per-unit voltages {V_1, …, V_B} of all buses, updating the switching record, and calculating the reward r_t according to the first formula of step 33;
(11) otherwise setting r_t = r_fail;
(12) receiving the reward r_t and the new state s_{t+1};
(13) storing the transition (s_t, a_t, r_t, s_{t+1}) in D;
(14) uniformly and randomly sampling a mini-batch of transitions (s_j, a_j, r_j, s_{j+1}) from D;
(15) judging whether the episode has terminated: if so, setting the target to r_j; otherwise calculating the TD target r_j + γ · max_{a′} Q(s_{j+1}, a′; ω′) with the target network parameters ω′;
(16) executing a gradient descent step on the squared error between the target and Q(s_j, a_j; ω) to update the network parameters;
(17) updating the action-value network parameters as ω ← ω + Δω;
(18) updating the target Q network (ω′ ← ω) every C steps;
(19) ending the loop over the steps of each episode;
(20) ending the loop over episodes.
After the training and learning described above, a power distribution network scheme aided decision model based on the reinforcement learning algorithm is obtained, allowing the power distribution network structure to be regulated and controlled accurately.
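Putting the pieces together, the offline training loop of steps (1)–(20) might be sketched as follows. It reuses the QNetwork, ReplayMemory and epsilon_greedy sketches above and assumes a simulation-environment object exposing reset() and step(action); that interface, like the hyperparameter values, is an assumption rather than part of the patent text:

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_dqn(env, state_dim, n_actions, episodes=50, steps_per_day=96,
              memory_size=10_000, batch_size=32, gamma=0.95, lr=1e-3,
              target_update=200, greedy_prob=0.9):
    """Offline training sketch of algorithm steps (1)-(20)."""
    q_net = QNetwork(state_dim, n_actions)           # real network, weights ω
    target_net = QNetwork(state_dim, n_actions)      # target network, weights ω'
    target_net.load_state_dict(q_net.state_dict())   # (3): ω' = ω
    memory = ReplayMemory(memory_size)                # (1): replay memory D of size N
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)

    step_count = 0
    for _ in range(episodes):                         # (4): one episode per day, M in total
        state = env.reset()                           # (5): preprocessed first state s_1
        for _ in range(steps_per_day):                # (6): 96 points at 15-minute resolution
            with torch.no_grad():
                q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
            action = epsilon_greedy(q_values.numpy(), greedy_prob)         # (7)
            next_state, reward, done = env.step(action)                    # (8)-(12)
            memory.push((state, action, reward, next_state, float(done)))  # (13)
            state = next_state

            if len(memory.buffer) >= batch_size:
                batch = memory.sample(batch_size)                          # (14)
                s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                                  for x in zip(*batch))
                q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():                                      # (15): TD target
                    y = r + gamma * target_net(s2).max(1).values * (1.0 - d)
                loss = F.mse_loss(q_sa, y)                                 # (16)-(17)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            step_count += 1
            if step_count % target_update == 0:                            # (18): every C steps
                target_net.load_state_dict(q_net.state_dict())
            if done:
                break
    return q_net
```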
Step 4: use the online operation data of the power distribution network directly, and perform online aided decision-making with the power distribution network scheme aided decision model. This specifically comprises the following steps:
Step 41: input the online operation data of the power distribution network, as the state, into the power distribution network scheme aided decision model;
as an example of the invention, the online operation data of the power distribution network is real-time or predicted switch state, fan power generation, photovoltaic power generation and load size.
Step 42: the power distribution network scheme aided decision model directly outputs a network structure that ensures safe and stable operation of the power distribution network.
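At this point the online aided decision step reduces to a single forward pass through the trained network; a short sketch (the state layout and function names are assumptions):

```python
import torch

def online_decision(q_net, online_state):
    """Return the index of the recommended switch configuration for the current
    online state (real-time or predicted switch states, WT / PV generation and
    loads), using the trained Q network."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(online_state, dtype=torch.float32))
    return int(torch.argmax(q_values))
```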
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.

Claims (10)

1. A power distribution network scheme aided decision-making method based on a deep neural network, characterized by comprising the following steps:
acquiring online operation data of the power distribution network;
inputting the online operation data of the power distribution network into a preset power distribution network scheme aided decision model; the power distribution network scheme aided decision model outputs a network structure for ensuring safe and stable operation of the power distribution network;
the power distribution network scheme aided decision model is trained on the basis of a DQN reinforcement learning algorithm.
2. The power distribution network scheme aided decision-making method based on the deep neural network according to claim 1, wherein the power distribution network online operation data comprise real-time or predicted switch states, wind turbine power generation, photovoltaic power generation and load sizes.
3. The power distribution network scheme aided decision-making method based on the deep neural network as claimed in claim 1, wherein the power distribution network scheme aided decision model is obtained in the following manner:
acquiring historical operation data of the power distribution network;
setting an ε-greedy strategy for action selection;
establishing a power distribution network system environment model based on the historical operation data of the power distribution network, and establishing a deep reinforcement learning model of an intelligent agent;
and based on the preset ε-greedy strategy, performing offline training and learning with the power distribution network system environment model and the deep reinforcement learning model to obtain a power distribution network scheme aided decision model that meets the error requirement.
4. The power distribution network scheme aided decision-making method based on the deep neural network according to claim 3, characterized in that after the power distribution network historical operation data are obtained, they are preprocessed and converted into an original sample set suitable for the reinforcement learning algorithm.
5. The power distribution network scheme aided decision-making method based on the deep neural network as claimed in claim 3, wherein the offline training and learning are performed with the power distribution network system environment model and the deep reinforcement learning model, specifically comprising:
the power distribution network system environment model returns a new system state and calculates a corresponding reward value every time the power distribution network system environment model executes the action given by the deep reinforcement learning model; and the deep reinforcement learning model continuously learns and improves the action strategy in the interaction process with the power distribution network system environment model by taking the control action capable of maximizing the reward expectation value as the target according to the current state.
6. The power distribution network scheme aided decision-making method based on the deep neural network according to claim 3, wherein establishing the power distribution network system environment model comprises: setting an agent state space, an action space, and an agent reward/penalty mechanism.
7. The power distribution network scheme aided decision-making method based on the deep neural network as claimed in claim 3, wherein, when the deep reinforcement learning model of the agent is constructed, two neural networks are used: a real network that generates the current Q value and a target network that generates the target Q value, the two networks having the same initial weights and parameters but different parameter updating speeds.
8. A system for the power distribution network scheme aided decision-making method based on the deep neural network, characterized by comprising:
the data acquisition module is used for acquiring the online operation data of the power distribution network;
the prediction module is used for inputting the online operation data of the power distribution network into the preset power distribution network scheme aided decision model; the power distribution network scheme aided decision model outputs a network structure for ensuring safe and stable operation of the power distribution network.
9. A computer apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the power distribution network scheme aided decision-making method based on the deep neural network according to any one of claims 1 to 7.
10. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the power distribution network scheme aided decision-making method based on the deep neural network according to any one of claims 1 to 7.
CN202111661200.8A 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium Pending CN114298429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111661200.8A CN114298429A (en) 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111661200.8A CN114298429A (en) 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN114298429A true CN114298429A (en) 2022-04-08

Family

ID=80974083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111661200.8A Pending CN114298429A (en) 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN114298429A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662982A (en) * 2022-04-15 2022-06-24 四川大学 Urban power distribution network multi-stage dynamic reconstruction method based on machine learning
CN114662982B (en) * 2022-04-15 2023-07-14 四川大学 Multistage dynamic reconstruction method for urban power distribution network based on machine learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination