CN111401769A - Intelligent power distribution network fault first-aid repair method and device based on deep reinforcement learning
- Publication number
- CN111401769A (application CN202010218227.9A)
- Authority
- CN
- China
- Prior art keywords
- repair
- distribution network
- reinforcement learning
- deep reinforcement
- fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/20—Administration of product repair or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
A method and a device for intelligent power distribution network fault emergency repair based on deep reinforcement learning are disclosed. The method comprises the following steps: 1) construct a deep reinforcement learning model, and combine the distance between each fault point and the emergency repair center with the emergency repair task amount into a system state serving as the model's input data; 2) train the neural network on the input data to obtain the system action, namely a distribution network emergency repair resource allocation strategy; 3) substitute the system state and the system action into a reward function to obtain the reward value of the system action, and update the neural network parameters according to the magnitude of the reward value; 4) repeat the above steps until the reward value stabilizes, thereby completing the training process, and allocate distribution network fault emergency repair resources according to the final system action. The invention can greatly reduce fault repair time and improve users' satisfaction with the power supply.
Description
Technical Field
The invention relates to the technical field of power grids, in particular to a power distribution network fault intelligent first-aid repair method and device based on deep reinforcement learning.
Background
In recent years, network technology has developed rapidly, and nearly every industry has modernized to some degree as a result. Existing power supply systems therefore face new challenges in modern power delivery, and power emergency repair systems have emerged to handle sudden problems. In such a system, when a sudden power system fault occurs, the power supply command center receives the alarm and notifies the resource distribution center to schedule resources. This greatly reduces the time from the occurrence of an accident to the start of its handling, optimizes the accident-handling process, improves users' satisfaction with the power supply system, and provides solid power-supply assurance for modernization.
With the continuous expansion of China's power grid and the rapid growth in the number of power consumers, higher demands are placed on the service level of power supply enterprises. Existing power emergency repair scheduling systems, owing to technical and strategic shortcomings, cannot schedule repair resources quickly and effectively, which increases economic losses.
Disclosure of Invention
To address these problems, the invention provides an intelligent power distribution network fault emergency repair method and device based on deep reinforcement learning that simplify resource allocation and reduce emergency repair time.
The technical scheme of the invention comprises the following steps:
Step S1: construct a power distribution network fault emergency repair deep reinforcement learning model, formulate the power distribution network fault emergency repair tasks, and take the system state as the input of the deep neural network;
Step S2: train the neural network on the input system state to obtain the system action;
Step S3: substitute the system state and the system action into the reward function to obtain the reward value of the system action, and update the neural network parameters according to the magnitude of the reward value;
Step S4: repeat steps S1-S3 until the reward value stabilizes, thereby completing the training process, and allocate distribution network fault emergency repair resources according to the final system action.
The system state comprises the distance between a fault point and an emergency repair center and the emergency repair task amount.
In the step S1, the deep reinforcement learning model is built by a neural network, and parameters of the neural network in the model include weight w, bias b, learning rate L and the number of hidden layers of the neural network.
The distribution network fault emergency repair task in step S1 is defined as R_u(d_u, n_u), where d_u denotes the distance of the fault point from the emergency repair center and n_u denotes the emergency repair task amount.
In step S1, the distances between the fault points and the emergency repair center and the emergency repair task amounts are combined into the system state, expressed as S = {d_1, d_2, …, d_U, n_1, n_2, …, n_U}.
The system action in step S2 is defined as a = {f_1, f_2, …, f_U}, where f_u is the amount of resources allocated by the emergency repair center to fault point u, including emergency repair personnel, vehicles, and tools.
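As a concrete illustration of these state and action definitions, the two vectors can be assembled as follows (a minimal sketch; the task list, distances, and allocation values are hypothetical, not taken from the patent):

```python
# System state S = {d_1..d_U, n_1..n_U} and action a = {f_1..f_U} as flat
# lists. All concrete numbers below are illustrative assumptions.

# Each emergency repair task R_u(d_u, n_u): distance to the repair center
# and repair task amount.
tasks = [(4.0, 2), (7.5, 3), (1.2, 1)]  # hypothetical (d_u, n_u) pairs

# System state: all distances first, then all task amounts.
state = [d for d, _ in tasks] + [n for _, n in tasks]

# System action: resources f_u allocated to fault point u (personnel,
# vehicles, and tools aggregated into one scalar per fault here).
action = [3, 5, 1]  # hypothetical allocation

assert len(state) == 2 * len(tasks) and len(action) == len(tasks)
print(state)  # [4.0, 7.5, 1.2, 2, 3, 1]
```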
The reward function in step S3 is defined as r = r_max - T_all, where r_max represents the maximum value of the emergency repair time and T_all represents the time actually taken by the emergency repair.
The emergency repair time T_all consists of two parts, journey time and repair time. The journey time is the time for the vehicle to reach the fault point and can be expressed as d_u / v_u, where v_u indicates the travel speed of the vehicle assigned to fault u; the repair time is the time it takes the repair crew to resolve the fault and depends on the task amount n_u and the allocated resources f_u.
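The reward computation can be sketched numerically. The journey-time term d_u / v_u follows directly from the definitions above; modelling the repair-time term as n_u / f_u (task amount divided by allocated resources) is an assumption introduced here for illustration, as the patent does not give an explicit form:

```python
# Reward r = r_max - T_all, with T_all summed over all fault points.
# The repair-time model n_u / f_u is an illustrative assumption.

def total_repair_time(tasks, speeds, action):
    """T_all: journey time plus repair time over all fault points."""
    t_all = 0.0
    for (d_u, n_u), v_u, f_u in zip(tasks, speeds, action):
        t_all += d_u / v_u  # journey time of the vehicle assigned to u
        t_all += n_u / f_u  # assumed repair time under f_u resources
    return t_all

def reward(tasks, speeds, action, r_max=100.0):
    return r_max - total_repair_time(tasks, speeds, action)

tasks = [(4.0, 2), (7.5, 3)]  # hypothetical (d_u, n_u)
speeds = [40.0, 50.0]         # hypothetical vehicle speeds v_u
action = [2, 3]               # hypothetical resource allocations f_u
print(round(reward(tasks, speeds, action), 2))  # 100 - 2.25 = 97.75
```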
An intelligent power distribution network fault emergency repair device based on deep reinforcement learning comprises:
the data input module is used for inputting a power distribution network fault first-aid repair task;
the system state module is used for establishing a system state formed by combining the distance between the fault point and the emergency repair center and the emergency repair task amount and used as the input of the deep reinforcement learning model;
the system action module is used for carrying out neural network training on the input system state to obtain system action;
the reward module is used for bringing the system state and the system action into a reward function to obtain a reward value of the system action, and updating the neural network parameters according to the magnitude of the reward value;
and the distribution module is used for allocating distribution network fault emergency repair resources according to the final system action.
According to this technical scheme, a deep reinforcement learning model that achieves the training objective can be obtained; by the characteristics of deep reinforcement learning algorithms, the model has strong perception and decision-making capability. Through interaction with the actual emergency repair environment, it provides a solution for complex emergency repair tasks.
The method can find the optimal distribution network fault emergency repair resource allocation method based on the deep reinforcement learning algorithm under the condition of a plurality of fault emergency repair tasks, so as to reduce the fault emergency repair time.
The invention greatly reduces the time for rush repair of faults and improves the power utilization satisfaction of users; meanwhile, the economic loss is reduced.
Drawings
FIG. 1 is a flowchart of a method for intelligently repairing a power distribution network fault based on deep reinforcement learning according to an embodiment of the present invention.
fig. 2 is a schematic structural diagram of a power distribution network fault intelligent first-aid repair device based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
The invention comprises the following steps:
Step S1: construct a power distribution network fault emergency repair deep reinforcement learning model, formulate the power distribution network fault emergency repair tasks, and take the system state as the input of the deep reinforcement learning model;
Step S2: train the neural network on the input system state to obtain the system action;
Step S3: substitute the system state and the system action into the reward function to obtain the reward value of the system action, and update the neural network parameters according to the magnitude of the reward value;
Step S4: repeat steps S1-S3 until the reward value stabilizes, thereby completing the training process, and allocate distribution network fault emergency repair resources according to the final system action.
The deep reinforcement learning model is an end-to-end perception and decision system that obtains an optimal strategy through interaction with the environment. At each time step, the model interacts with the environment to obtain a high-dimensional observation and perceives it with reinforcement learning to obtain a specific state feature representation. A value function of each action is evaluated according to the expected emergency repair effect, and the current state is mapped to a corresponding action through a policy. The environment reacts to this action and produces the next observation. Repeating this cycle eventually yields the optimal strategy for achieving the objective.
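The interaction cycle just described, combined with step S4's "repeat until the reward value tends to be stable" stopping rule, can be sketched as below. The environment and the update rule are toy stand-ins (a hill climb on a quadratic reward), not the patent's actual neural network model:

```python
# Observe -> act -> reward -> update loop, stopped once the reward value
# tends to be stable. Environment and update rule are illustrative only.

def env_reward(action):
    # Hypothetical environment: reward is maximal at action = 5.
    return 100.0 - (action - 5.0) ** 2

action, step = 0.0, 0.05
history = []
for episode in range(100000):
    r = env_reward(action)
    history.append(r)
    # Parameter update: move the action in the direction that raises reward.
    action += step if env_reward(action + step) > r else -step
    # Training ends when the reward is near-constant over a recent window.
    if len(history) >= 20 and max(history[-20:]) - min(history[-20:]) < 0.5:
        break

print(round(action, 2))  # settles near the optimal action, 5.0
```

The stability window (20 episodes, tolerance 0.5) is an arbitrary choice standing in for "the reward value tends to be stable".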
In the step S3, the parameter value of the neural network directly affects the distribution strategy of the first-aid repair resources of the power distribution network. Therefore, in order to better adapt to the actual dynamic scene and obtain the optimal resource allocation strategy, the neural network parameters are updated according to the size of the reward value.
In step S4, the reward value is r, which can sufficiently reflect the quality of the resource allocation policy, and the larger the value of r, the better the allocation policy. Meanwhile, the change of the r value reflects the result of the neural network training, and when the r value tends to be stable (namely constant), the neural network training is finished, and the optimal distribution strategy of the emergency repair resources of the power distribution network is obtained.
The system state comprises the distance between a fault point and an emergency repair center and the emergency repair task amount.
In the step S1, the deep reinforcement learning model is built by a neural network, and parameters of the neural network in the model include weight w, bias b, learning rate L and the number of hidden layers of the neural network.
In step S1, the deep reinforcement learning model is built from a neural network. The model interacts with the power distribution network emergency repair environment, through which the neural network training is completed; after training, a corresponding resource allocation strategy is produced according to the actual emergency repair environment.
The number of hidden layers, the per-layer weights w and biases b, the initial value of the learning rate L, and the choice of activation function all affect the training result of the whole neural network and, in turn, the allocation of distribution network fault emergency repair resources.
The distribution network fault emergency repair task in step S1 is defined as R_u(d_u, n_u), where d_u denotes the distance of the fault point from the emergency repair center and n_u denotes the emergency repair task amount.
In step S1, the distances between the fault points and the emergency repair center and the emergency repair task amounts are combined into the system state, expressed as S = {d_1, d_2, …, d_U, n_1, n_2, …, n_U}.
The system action in step S2 is defined as a = {f_1, f_2, …, f_U}, where f_u is the amount of resources allocated by the emergency repair center to fault point u, including emergency repair personnel, vehicles, and tools.
The reward function in step S3 is defined as r = r_max - T_all, where r_max represents the maximum value of the emergency repair time and T_all represents the time actually taken by the emergency repair.
The emergency repair time T_all consists of two parts, journey time and repair time. The journey time is the time for the vehicle to reach the fault point and can be expressed as d_u / v_u, where v_u indicates the travel speed of the vehicle assigned to fault u; the repair time is the time it takes the repair crew to resolve the fault and depends on the task amount n_u and the allocated resources f_u.
Fig. 1 is a flowchart of a power distribution network fault intelligent first-aid repair method based on deep reinforcement learning, as shown in fig. 1, the method includes the following steps:
step 101: the power distribution network emergency repair environment is an actual emergency repair scene. The distribution network fault first-aid repair task is defined as follows: ru(du,nu) (ii) a Wherein d isuIndicating the distance of the repair point from the repair center, nuRepresenting the amount of a first-aid repair task;
step 102: the distance between the fault point and the emergency repair center and the emergency repair task amount are combined into a system state, which can be expressed as S ═ d1,d2,…,dU,n1,n2,…,nU};
Step 103: the reward value is derived from the time spent on emergency repair. The total time T_all consists of two parts, journey time and repair time. The journey time is the time for the vehicle to reach the fault point and can be expressed as d_u / v_u, where v_u indicates the travel speed of the vehicle assigned to fault u; the repair time is the time it takes the repair crew to resolve the fault and depends on the task amount n_u and the allocated resources f_u.
The reward function is defined as r = r_max - T_all, where r_max represents the maximum value of the emergency repair time;
step 104: the system action is output as a neural network, namely a power distribution network first-aid repair resource distribution strategy; system action, defined as a ═ f1,f2,…,fUIn which fuThe resource amount distributed to a fault point u by an emergency repair center specifically comprises emergency repair personnel, emergency repair vehicles, emergency repair tools and the like;
Step 105: establish the deep reinforcement learning model and initialize the deep neural network parameters, such as the weights w, biases b, learning rate L, activation function, and number of hidden layers.
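Step 105's parameter initialization can be sketched as follows. The layer sizes, the scaled-Gaussian initialization scheme, and the ReLU activation are illustrative assumptions, not values specified by the patent:

```python
import random

# Initialise the deep network's parameters: weights w, biases b,
# learning rate L, activation function, and number of hidden layers.

random.seed(42)

U = 3                             # number of fault points (hypothetical)
layer_sizes = [2 * U, 16, 16, U]  # input S = {d, n}, two hidden layers,
                                  # output a = {f_1..f_U}

weights, biases = [], []
for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
    # One weight matrix and one bias vector per layer.
    weights.append([[random.gauss(0.0, fan_in ** -0.5) for _ in range(fan_out)]
                    for _ in range(fan_in)])
    biases.append([0.0] * fan_out)

learning_rate = 0.001  # L

def relu(x):           # assumed activation function
    return x if x > 0.0 else 0.0

print(len(weights), len(weights[0]), len(weights[0][0]))  # 3 6 16
```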
Reinforcement learning obtains the optimal decision by learning directly from interaction with the environment. Applied to a power distribution network emergency repair system, it can adaptively learn the optimal strategy according to the repair conditions and improve the repair scheme. Fig. 2 is a schematic structural diagram of a power distribution network fault intelligent emergency repair device based on deep reinforcement learning provided in an embodiment of the present invention; as shown in fig. 2, the device includes:
the data input module is used for inputting a power distribution network fault first-aid repair task;
the system state module is used for establishing a system state formed by combining the distance between the fault point and the emergency repair center and the emergency repair task amount and used as the input of the deep reinforcement learning model;
the system action module is used for carrying out neural network training on the input system state to obtain system action;
the reward module is used for bringing the system state and the system action into a reward function to obtain a reward value of the system action, and updating the neural network parameters according to the magnitude of the reward value;
and the distribution module is used for allocating distribution network fault emergency repair resources according to the final system action.
According to the invention, a distribution network fault emergency repair task passes sequentially through the system state module, the system action module, and the reward module, and finally through the distribution module, realizing reliable resource allocation and reducing emergency repair time.
The disclosure of the present application also includes the following points:
(1) the drawings of the embodiments disclosed herein only relate to the structures related to the embodiments disclosed herein, and other structures can refer to general designs;
(2) in case of conflict, the embodiments and features of the embodiments disclosed in this application can be combined with each other to arrive at new embodiments;
the above embodiments are only embodiments disclosed in the present disclosure, but the scope of the disclosure is not limited thereto, and the scope of the disclosure should be determined by the scope of the claims.
Claims (9)
1. A power distribution network fault intelligent first-aid repair method based on deep reinforcement learning is characterized by comprising the following steps:
s1, constructing a power distribution network fault first-aid repair deep reinforcement learning model, formulating a power distribution network fault first-aid repair task, and taking a system state as input of the deep reinforcement learning model;
step S2, training the neural network according to the input system state to obtain the system action;
step S3, bringing the system state and the system action into a reward function to obtain a reward value of the system action, and updating the neural network parameters according to the magnitude of the reward value;
and S4, repeating the steps S1-S3 until the reward value tends to be stable, thereby completing the training process and carrying out distribution network fault emergency repair resource allocation according to the final system action.
2. The intelligent power distribution network fault emergency repair method based on deep reinforcement learning of claim 1, wherein the system state comprises a distance between a fault point and an emergency repair center and an emergency repair task amount.
3. The intelligent power distribution network fault emergency repair method based on deep reinforcement learning of claim 1, wherein in step S1, the deep reinforcement learning model is built by a neural network, and parameters of the neural network in the model include weight w, bias b, learning rate L and the number of hidden layers of the neural network.
4. The intelligent power distribution network fault emergency repair method based on deep reinforcement learning of claim 2, wherein the power distribution network fault emergency repair task in step S1 is defined as R_u(d_u, n_u), where d_u denotes the distance of the fault point from the emergency repair center and n_u denotes the emergency repair task amount.
5. The method for intelligently repairing power distribution network faults based on deep reinforcement learning of claim 2, wherein in step S1 the distance between the fault point and the emergency repair center and the emergency repair task amount are expressed as the system state S = {d_1, d_2, …, d_U, n_1, n_2, …, n_U}.
6. The intelligent power distribution network fault emergency repair method based on deep reinforcement learning of claim 1, wherein the system action in step S2 is defined as a = {f_1, f_2, …, f_U}, where f_u is the amount of resources allocated by the emergency repair center to fault point u, including emergency repair personnel, vehicles, and tools.
7. The intelligent power distribution network fault emergency repair method based on deep reinforcement learning of claim 1, wherein the reward function in step S3 is defined as r = r_max - T_all, where r_max represents the maximum value of the emergency repair time and T_all represents the time taken by the emergency repair.
8. The intelligent emergency repair method for power distribution network faults based on deep reinforcement learning of claim 7, wherein the emergency repair time T_all consists of two parts, journey time and repair time; the journey time is the time for the vehicle to reach the fault point and can be expressed as d_u / v_u, where v_u indicates the travel speed of the vehicle assigned to fault u; the repair time is the time it takes the repair crew to resolve the fault and depends on the task amount n_u and the allocated resources f_u.
9. An intelligent power distribution network fault emergency repair device based on deep reinforcement learning, characterized by comprising:
the data input module is used for inputting a power distribution network fault first-aid repair task;
the system state module is used for establishing a system state formed by combining the distance between the fault point and the emergency repair center and the emergency repair task amount and used as the input of the deep reinforcement learning model;
the system action module is used for carrying out neural network training on the input system state to obtain system action;
the reward module is used for bringing the system state and the system action into a reward function to obtain a reward value of the system action, and updating the neural network parameters according to the magnitude of the reward value;
and the distribution module is used for allocating distribution network fault emergency repair resources according to the final system action.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010218227.9A CN111401769A (en) | 2020-03-25 | 2020-03-25 | Intelligent power distribution network fault first-aid repair method and device based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111401769A (en) | 2020-07-10 |
Family
ID=71413546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010218227.9A Pending CN111401769A (en) | 2020-03-25 | 2020-03-25 | Intelligent power distribution network fault first-aid repair method and device based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401769A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149347A (en) * | 2020-09-16 | 2020-12-29 | 北京交通大学 | Power distribution network load transfer method based on deep reinforcement learning |
CN113627733A (en) * | 2021-07-16 | 2021-11-09 | 深圳供电局有限公司 | Post-disaster power distribution network dynamic first-aid repair method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110392377A (en) * | 2019-07-19 | 2019-10-29 | 北京信息科技大学 | A kind of 5G super-intensive networking resources distribution method and device |
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN calculation unloading and resource allocation method based on deep reinforcement learning |
WO2020040763A1 (en) * | 2018-08-23 | 2020-02-27 | Siemens Aktiengesellschaft | Real-time production scheduling with deep reinforcement learning and monte carlo tree search |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020040763A1 (en) * | 2018-08-23 | 2020-02-27 | Siemens Aktiengesellschaft | Real-time production scheduling with deep reinforcement learning and monte carlo tree search |
CN110392377A (en) * | 2019-07-19 | 2019-10-29 | 北京信息科技大学 | A kind of 5G super-intensive networking resources distribution method and device |
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN calculation unloading and resource allocation method based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Deng Zhilong et al.: "A scheduling optimization method based on deep reinforcement learning", Journal of Northwestern Polytechnical University *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149347A (en) * | 2020-09-16 | 2020-12-29 | 北京交通大学 | Power distribution network load transfer method based on deep reinforcement learning |
CN112149347B (en) * | 2020-09-16 | 2023-12-26 | 北京交通大学 | Power distribution network load transfer method based on deep reinforcement learning |
CN113627733A (en) * | 2021-07-16 | 2021-11-09 | 深圳供电局有限公司 | Post-disaster power distribution network dynamic first-aid repair method and system |
CN113627733B (en) * | 2021-07-16 | 2024-08-06 | 深圳供电局有限公司 | Post-disaster power distribution network dynamic rush-repair method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200710 |