CN114710792A - Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning - Google Patents
- Publication number
- CN114710792A (application number CN202210330896.4A)
- Authority
- CN
- China
- Prior art keywords
- distribution network
- training
- round
- protection
- mth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04W—WIRELESS COMMUNICATION NETWORKS; H04W24/00—Supervisory, monitoring or testing arrangements; H04W24/02—Arrangements for optimising operational condition
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks; G06N3/061—Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit
- G06N3/08—Learning methods
Abstract
The invention discloses a reinforcement-learning-based method for optimally arranging the distributed protection devices of a 5G distribution network, comprising the following steps: 1. build the 5G distribution network protection system environment; 2. establish a reinforcement learning model of the 5G distribution network protection system, consisting of a policy body and an executive body; 3. train the reinforcement learning model in the 5G distribution network protection system environment; 4. arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$. The invention ensures that the master station protection device establishes communication with the distributed protection devices of the distribution network and finds the optimal arrangement of protection devices for the 5G distribution network distributed protection system, thereby ensuring that the distribution network operates safely and efficiently.
Description
Technical Field
The invention belongs to the field of distribution network protection, and particularly relates to an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning.
Background
A distribution network has many voltage levels, a complex network structure, diverse equipment types, numerous and widely scattered operating points, and a comparatively poor safety environment, so it carries relatively many safety risk factors. To supply electric energy to all kinds of users, higher requirements are placed on its safe and reliable operation, and protection devices must therefore be arranged to protect it. However, because distribution network nodes are numerous and widely distributed, and because of technical constraints, the protection devices are difficult to arrange. At present, most distribution network protection schemes still follow the traditional approach and cannot achieve an optimal arrangement of protection devices within the required reliability range.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a reinforcement-learning-based method for optimally arranging the distributed protection devices of a 5G distribution network. It aims to find the optimal arrangement of protection devices for the 5G distribution network distributed protection system on the premise of meeting reliability, thereby ensuring safe and efficient operation of the distribution network.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning, which is characterized by comprising the following steps of:
step 1, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, \ldots, l_i, \ldots, l_n]$ indicate whether the n nodes of the 5G distribution network are provided with protection devices: $l_i = 0$ means that the i-th node has no protection device, and $l_i = 1$ means that a protection device is arranged at the i-th node; at most one protection device can be arranged at each node, $i = 1, 2, \ldots, n$;
Let $D = [d_1, d_2, \ldots, d_i, \ldots, d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the i-th node and the 5G base station; when $l_i = 0$, $d_i = 0$; $i = 1, 2, \ldots, n$;
Let $S = (L, D)$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initialize all n nodes of the 5G distribution network with protection devices, i.e. $\{l_i = 1, i = 1, 2, \ldots, n\}$;
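As a concrete illustration of this state representation, the sketch below builds L, D, and S in Python; the node count and the distance values are hypothetical placeholders, not values from the invention.

```python
import numpy as np

n = 10                                    # number of distribution network nodes (hypothetical)
L = np.ones(n, dtype=int)                 # step 1 initialization: every node starts with a device
rng = np.random.default_rng(0)
D = rng.uniform(0.1, 5.0, size=n)         # distance of each node to the 5G base station in km (hypothetical)
D[L == 0] = 0.0                           # convention above: d_i = 0 whenever l_i = 0

S = (L, D)                                # arrangement information of the protection system
```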
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a policy body and an executive body:
The policy body consists of a two-layer neural network. Its input layer takes the arrangement information S of the protection devices of the 5G distribution network protection system, and its output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the policy $\pi(\theta)$ and a SoftMax function, and $\theta$ is the set of neural network parameters;
The executive body executes actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and a reward module inside the executive body calculates rewards;
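A minimal sketch of the two-layer policy network of step 2, written in PyTorch. The hidden width, the size of the action set, and the flattening of S into a single input vector are assumptions; the patent fixes none of these.

```python
import torch
import torch.nn as nn

class PolicyBody(nn.Module):
    """Two-layer network: arrangement information S in, action probabilities pi(A) out."""
    def __init__(self, n_nodes: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # S = (L, D) is flattened into 2 * n_nodes input features (an assumption)
        self.net = nn.Sequential(
            nn.Linear(2 * n_nodes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # the SoftMax function turns the logits of pi(theta) into the probabilities pi(A)
        return torch.softmax(self.net(s), dim=-1)
```

The executive body then samples an action from these probabilities, as sketched in the training loop after step 4.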
step 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the number of rounds as m and initialize m = 1;
Step 3.2, define the training count within each round as t and initialize t = 1;
Step 3.3, define the arrangement information of the protection devices of the protection system under the t-th training of the m-th round as $S_m^t$, and initialize $S_m^1 = S$;
Step 3.4, the policy body outputs the probability $\pi_m^t(A)$ of all actions a through the policy $\pi(\theta_m^t)$ and the SoftMax function, where $\pi(\theta_m^t)$ is the policy under the t-th training of the m-th round, $\theta_m^t$ is the set of neural network parameters under the t-th training of the m-th round, and $\pi_m^t(A)$ is the probability of all actions a under the t-th training of the m-th round;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_m^t$ according to the probability $\pi_m^t(A)$ of all actions a output by the policy body and executes it under the t-th training of the m-th round, thereby changing the number of protection devices on the m adjacent points of the i-th node, i.e. changing the arrangement information $S_m^t$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_m^{t+1}$ of the (t+1)-th training of the m-th round;
Only when $l_i = 0$ does the executive body add one protection device to each of the m adjacent points of the i-th node: $l_i + 1$, $i = 1, 2, \ldots, n$;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the i-th node: $l_i - 1$, $i = 1, 2, \ldots, n$;
Step 3.6, judging whether the formula (1) is established, if so, executing step 3.7, otherwise, returning to execute step 3.4:
in the formula (1), s isReliability of the protection system, piIs the probability of failure of the protection device on the ith node, sexThe reliability is expected when the 5G distribution network operates normally;
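Formula (1) is rendered only as an image in the source; the text identifies its ingredients as the system reliability s, the per-node failure probabilities $p_i$, and the threshold $s_{ex}$. The sketch below assumes independent device failures, which is a common modeling choice, not the patent's stated formula.

```python
import numpy as np

def reliability(L: np.ndarray, p: np.ndarray) -> float:
    """System reliability s under the assumption of independent failures:
    the probability that every installed device survives."""
    installed = L == 1
    return float(np.prod(1.0 - p[installed]))

def meets_constraint(L: np.ndarray, p: np.ndarray, s_ex: float) -> bool:
    # step 3.6: continue to step 3.7 only if the reliability expectation is met
    return reliability(L, p) >= s_ex
```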
Step 3.7, the reward module of the executive body calculates the reward $r_m^t$ under the t-th training of the m-th round through formulas (2) to (5);
In formulas (2) to (5), the component rewards under the t-th training of the m-th round are the reward for the distance between the protection devices and the 5G base station, the reward for the number of protection devices, and the reward for the reliability of the protection system; $r_m^t$ is the total reward combining them under the t-th training of the m-th round;
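Formulas (2) to (5) are likewise only images in the source; all that survives is the decomposition into distance, count, and reliability rewards. The sketch below (reusing `reliability` from the previous sketch) is therefore only structural: the specific functional forms and the equal weights are assumptions.

```python
def reward(L, D, p, s_ex, w=(1.0, 1.0, 1.0)):
    """Structural stand-in for formulas (2)-(5): reward shorter distances,
    fewer devices, and reliability margin above s_ex (all forms assumed)."""
    r_d = -D[L == 1].mean() if (L == 1).any() else 0.0  # distance reward
    r_n = -L.sum() / len(L)                             # device-count reward
    r_s = reliability(L, p) - s_ex                      # reliability reward
    return w[0] * r_d + w[1] * r_n + w[2] * r_s         # total reward r_m^t
```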
Step 3.8, the policy body updates the parameters $\theta_m^t$ under the t-th training of the m-th round through formulas (6) and (7), thereby obtaining the parameters $\theta_m^{t+1}$ under the (t+1)-th training of the m-th round;
In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_m^t}$ denotes the partial derivative with respect to the parameters $\theta_m^t$, $V(\theta_m^t)$ is the value function under the policy $\pi(\theta_m^t)$, and $E[V(\theta_m^t)]$ is the expectation of the value function;
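Formulas (6) and (7) describe a learning-rate-$\alpha$ gradient step on $\theta$ driven by the expected value of the policy, which is consistent with a REINFORCE-style policy gradient; the sketch below is written under that assumed reading.

```python
import torch

def policy_gradient_step(policy, optimizer, s, action, ret):
    """One REINFORCE-style update, theta <- theta + alpha * ret * grad log pi(a|s);
    an assumed reading of formulas (6)-(7), not their verbatim content."""
    probs = policy(s)                 # pi(A) from the policy body (step 3.4)
    log_prob = torch.log(probs[action])
    loss = -ret * log_prob            # minimizing this ascends the expected value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here `optimizer` would be e.g. `torch.optim.SGD(policy.parameters(), lr=alpha)`, so that `lr` plays the role of the learning rate $\alpha$ in formula (6).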
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if yes, end the training of the current m-th round, obtain the optimal arrangement information $S_m^*$ under the current m-th round, store $S_m^*$ in a set $S^*$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; here $C_1$ is the maximum number of iterations per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if yes, end all training and obtain the optimal arrangement information $S^*_{max}$ from the set $S^*$; otherwise, return to step 3.2 and execute in sequence; here $C_2$ is the maximum number of iteration rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$.
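Assembling steps 3.1 to 3.10 into one picture, the sketch below shows the double loop over rounds m (up to $C_2$) and trainings t (up to $C_1$), reusing the helpers and variables from the earlier sketches. All numeric constants are hypothetical, `apply_action` is a hypothetical helper standing in for the device-adding and device-removing rules of step 3.5, and the loop control is simplified (a failed reliability check consumes a training step here, whereas the patent returns to step 3.4).

```python
import numpy as np
import torch

C1, C2, alpha = 50, 20, 1e-2                  # iteration limits and learning rate (hypothetical)
p = np.full(n, 0.02)                          # per-node failure probabilities (hypothetical)
s_ex = 0.90                                   # expected reliability (hypothetical)
policy = PolicyBody(n_nodes=n, n_actions=n)   # one action per node, an assumption
optimizer = torch.optim.SGD(policy.parameters(), lr=alpha)

best_per_round = []                           # the set S* of step 3.9
for m in range(1, C2 + 1):                    # step 3.10: rounds
    L = np.ones(n, dtype=int)                 # step 3.3: every node starts protected
    best_r, best_L = -np.inf, L.copy()
    for t in range(1, C1 + 1):                # step 3.9: trainings within the round
        s_in = torch.tensor(np.concatenate([L, D]), dtype=torch.float32)
        probs = policy(s_in)                  # step 3.4: pi(A)
        a = int(torch.multinomial(probs, 1))  # step 3.5: sample an action
        L = apply_action(L, a)                # hypothetical helper: add/remove devices
        if not meets_constraint(L, p, s_ex):  # step 3.6: reliability gate
            continue
        r = reward(L, D, p, s_ex)             # step 3.7
        policy_gradient_step(policy, optimizer, s_in, a, r)  # step 3.8
        if r > best_r:
            best_r, best_L = r, L.copy()
    best_per_round.append(best_L)             # store S_m^* in S*
# step 4: S*_max is the best-scoring arrangement across best_per_round
```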
Compared with the prior art, the invention has the beneficial effects that:
1. By utilizing the ability of reinforcement learning to learn through continuous interaction with the environment, and considering that distribution network nodes are numerous and widely distributed, the invention changes the number and arrangement of protection devices in a multi-dimensional manner in the 5G distribution network protection system environment and finds the optimal arrangement of the protection devices through iterative reinforcement learning, thereby protecting the distribution network and ensuring its safe and reliable operation;
2. By utilizing 5G communication technology, the invention provides a low-latency, high-reliability information channel for distribution network protection services, thereby overcoming the weak selectivity, inaccurate fault location, long fault clearing time, and lack of self-healing after fault clearing that afflict traditional protection of distribution network lines.
Drawings
Fig. 1 is a flowchart of an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning according to the present invention;
FIG. 2 is an environmental diagram of a 5G distribution network protection system according to the present invention;
FIG. 3 is a diagram of the reinforcement learning training process of the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, a method for optimally arranging 5G distribution network distributed protection devices based on reinforcement learning includes the following steps:
step 1, as shown in fig. 2, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, \ldots, l_i, \ldots, l_n]$ indicate whether the n nodes of the 5G distribution network are provided with protection devices: $l_i = 0$ means that the i-th node has no protection device, and $l_i = 1$ means that a protection device is arranged at the i-th node; at most one protection device can be arranged at each node, $i = 1, 2, \ldots, n$;
Let $D = [d_1, d_2, \ldots, d_i, \ldots, d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the i-th node and the 5G base station; when $l_i = 0$, $d_i = 0$; $i = 1, 2, \ldots, n$;
Let $S = (L, D)$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initialize all n nodes of the 5G distribution network with protection devices, i.e. $\{l_i = 1, i = 1, 2, \ldots, n\}$;
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a policy body and an executive body:
The policy body consists of a two-layer neural network. Its input layer takes the arrangement information S of the protection devices of the 5G distribution network protection system, and its output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the policy $\pi(\theta)$ and a SoftMax function, and $\theta$ is the set of neural network parameters;
The executive body executes actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and a reward module inside the executive body calculates rewards;
step 3, as shown in fig. 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the number of rounds as m and initialize m = 1;
Step 3.2, define the training count within each round as t and initialize t = 1;
Step 3.3, define the arrangement information of the protection devices of the protection system under the t-th training of the m-th round as $S_m^t$, and initialize $S_m^1 = S$;
Step 3.4, the policy body outputs the probability $\pi_m^t(A)$ of all actions a through the policy $\pi(\theta_m^t)$ and the SoftMax function, where $\pi(\theta_m^t)$ is the policy under the t-th training of the m-th round, $\theta_m^t$ is the set of neural network parameters under the t-th training of the m-th round, and $\pi_m^t(A)$ is the probability of all actions a under the t-th training of the m-th round;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_m^t$ according to the probability $\pi_m^t(A)$ of all actions a output by the policy body and executes it under the t-th training of the m-th round, thereby changing the number of protection devices on the m adjacent points of the i-th node, i.e. changing the arrangement information $S_m^t$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_m^{t+1}$ of the (t+1)-th training of the m-th round;
Only when $l_i = 0$ does the executive body add one protection device to each of the m adjacent points of the i-th node: $l_i + 1$, $i = 1, 2, \ldots, n$;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the i-th node: $l_i - 1$, $i = 1, 2, \ldots, n$;
Simply changing the number of protection devices cannot reflect the actual conditions of a distribution network whose nodes are numerous and widely distributed; considering these complicated actual conditions, different actions are executed in each round, so that the protection effect is improved on the premise of meeting the reliability of the distribution network;
Step 3.6, judge whether formula (1) holds; if yes, execute step 3.7, otherwise return to step 3.4:

$$s \ge s_{ex} \tag{1}$$

In formula (1), $s$ is the reliability of the protection system, $p_i$ is the failure probability of the protection device on the i-th node, and $s_{ex}$ is the expected reliability when the 5G distribution network operates normally;
Step 3.7, the reward module of the executive body calculates the reward $r_m^t$ under the t-th training of the m-th round through formulas (2) to (5);
In formulas (2) to (5), the component rewards under the t-th training of the m-th round are the reward for the distance between the protection devices and the 5G base station, the reward for the number of protection devices, and the reward for the reliability of the protection system; $r_m^t$ is the total reward combining them under the t-th training of the m-th round. The distance between a protection device and the 5G base station affects the reliability of the protection system;
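The embodiment notes that the distance to the base station affects reliability, but does not reproduce the relationship. One hedged way to capture this dependence, purely as a modeling assumption and not the patent's formula, is to inflate the failure probability with distance:

```python
import numpy as np

def effective_failure_prob(p: np.ndarray, D: np.ndarray, k: float = 0.01) -> np.ndarray:
    """Assumed model: failure probability grows with distance to the 5G
    base station at rate k per km; the rate k is hypothetical."""
    return np.clip(p * (1.0 + k * D), 0.0, 1.0)
```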
Step 3.8, the policy body updates the parameters $\theta_m^t$ under the t-th training of the m-th round through formulas (6) and (7), thereby obtaining the parameters $\theta_m^{t+1}$ under the (t+1)-th training of the m-th round;
In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_m^t}$ denotes the partial derivative with respect to the parameters $\theta_m^t$, $V(\theta_m^t)$ is the value function under the policy $\pi(\theta_m^t)$, and $E[V(\theta_m^t)]$ is the expectation of the value function;
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if yes, end the training of the current m-th round, obtain the optimal arrangement information $S_m^*$ under the current m-th round, store $S_m^*$ in a set $S^*$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; here $C_1$ is the maximum number of iterations per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if yes, end all training and obtain the optimal arrangement information $S^*_{max}$ from the set $S^*$; otherwise, return to step 3.2 and execute in sequence; here $C_2$ is the maximum number of iteration rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$.
Claims (1)
1. An optimized arrangement method of 5G distribution network distributed protection devices based on reinforcement learning is characterized by comprising the following steps:
step 1, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, \ldots, l_i, \ldots, l_n]$ indicate whether the n nodes of the 5G distribution network are provided with protection devices: $l_i = 0$ means that the i-th node has no protection device, and $l_i = 1$ means that a protection device is arranged at the i-th node; at most one protection device can be arranged at each node, $i = 1, 2, \ldots, n$;
Let $D = [d_1, d_2, \ldots, d_i, \ldots, d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the i-th node and the 5G base station; when $l_i = 0$, $d_i = 0$; $i = 1, 2, \ldots, n$;
Let $S = (L, D)$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initialize all n nodes of the 5G distribution network with protection devices, i.e. $\{l_i = 1, i = 1, 2, \ldots, n\}$;
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a policy body and an executive body:
The policy body consists of a two-layer neural network. Its input layer takes the arrangement information S of the protection devices of the 5G distribution network protection system, and its output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the policy $\pi(\theta)$ and a SoftMax function, and $\theta$ is the set of neural network parameters;
The executive body executes actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and a reward module inside the executive body calculates rewards;
step 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the number of rounds as m and initialize m = 1;
Step 3.2, define the training count within each round as t and initialize t = 1;
Step 3.3, define the arrangement information of the protection devices of the protection system under the t-th training of the m-th round as $S_m^t$, and initialize $S_m^1 = S$;
Step 3.4, the policy body outputs the probability $\pi_m^t(A)$ of all actions a through the policy $\pi(\theta_m^t)$ and the SoftMax function, where $\pi(\theta_m^t)$ is the policy under the t-th training of the m-th round, $\theta_m^t$ is the set of neural network parameters under the t-th training of the m-th round, and $\pi_m^t(A)$ is the probability of all actions a under the t-th training of the m-th round;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_m^t$ according to the probability $\pi_m^t(A)$ of all actions a output by the policy body and executes it under the t-th training of the m-th round, thereby changing the number of protection devices on the m adjacent points of the i-th node, i.e. changing the arrangement information $S_m^t$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_m^{t+1}$ of the (t+1)-th training of the m-th round;
Only when $l_i = 0$ does the executive body add one protection device to each of the m adjacent points of the i-th node: $l_i + 1$, $i = 1, 2, \ldots, n$;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the i-th node: $l_i - 1$, $i = 1, 2, \ldots, n$;
Step 3.6, judge whether formula (1) holds; if yes, execute step 3.7, otherwise return to step 3.4:

$$s \ge s_{ex} \tag{1}$$

In formula (1), $s$ is the reliability of the protection system, $p_i$ is the failure probability of the protection device on the i-th node, and $s_{ex}$ is the expected reliability when the 5G distribution network operates normally;
Step 3.7, the reward module of the executive body calculates the reward $r_m^t$ under the t-th training of the m-th round through formulas (2) to (5);
In formulas (2) to (5), the component rewards under the t-th training of the m-th round are the reward for the distance between the protection devices and the 5G base station, the reward for the number of protection devices, and the reward for the reliability of the protection system; $r_m^t$ is the total reward combining them under the t-th training of the m-th round;
Step 3.8, the policy body updates the parameters $\theta_m^t$ under the t-th training of the m-th round through formulas (6) and (7), thereby obtaining the parameters $\theta_m^{t+1}$ under the (t+1)-th training of the m-th round;
In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_m^t}$ denotes the partial derivative with respect to the parameters $\theta_m^t$, $V(\theta_m^t)$ is the value function under the policy $\pi(\theta_m^t)$, and $E[V(\theta_m^t)]$ is the expectation of the value function;
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if yes, end the training of the current m-th round, obtain the optimal arrangement information $S_m^*$ under the current m-th round, store $S_m^*$ in a set $S^*$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; here $C_1$ is the maximum number of iterations per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if yes, end all training and obtain the optimal arrangement information $S^*_{max}$ from the set $S^*$; otherwise, return to step 3.2 and execute in sequence; here $C_2$ is the maximum number of iteration rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210330896.4A | 2022-03-30 | 2022-03-30 | Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210330896.4A | 2022-03-30 | 2022-03-30 | Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN114710792A | 2022-07-05
Family
ID=82170813
Family Applications (1)
Application Number | Publication | Priority Date | Filing Date
---|---|---|---
CN202210330896.4A | CN114710792A (Pending) | 2022-03-30 | 2022-03-30
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114710792A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116233895A (en) * | 2023-05-04 | 2023-06-06 | 合肥工业大学 | 5G distribution network node communication optimization method, equipment and medium based on reinforcement learning |
CN116233895B (en) * | 2023-05-04 | 2023-07-18 | 合肥工业大学 | 5G distribution network node communication optimization method, equipment and medium based on reinforcement learning |
Similar Documents
Publication | Title
---|---
CN103219743B | Pilot node selecting method based on wind electric power fluctuation probability characters
CN112288326B | Fault scene set reduction method suitable for toughness evaluation of power transmission system
CN114217524A | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN105429185A | Economic dispatching method with robust collaborative consistency
CN104102956A | Distribution network expansion planning method based on strategy adaption differential evolution
CN114710792A | Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Wang et al. | Design and analysis of genetic algorithm and BP neural network based PID control for boost converter applied in renewable power generations
CN114666204A | Fault root cause positioning method and system based on cause and effect reinforcement learning
CN111160716A | Large power grid vulnerability assessment method based on tidal current betweenness
CN105552895A | Multilevel elicitation method dynamic planning based power system dynamic equivalent method
CN116316637A | Dynamic topology identification method, system, equipment and storage medium for power distribution network
CN111130053B | Power distribution network overcurrent protection method based on deep reinforcement learning
CN112464575A | Dam group risk assessment method and equipment based on Bayesian network
CN114123178B | Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method
CN111900720B | Transmission network fragile line identification method based on double-layer webpage sorting algorithm
CN105406517A | Finite time average consistency algorithm-based economic dispatching method for power system
CN106991229B | Wind power plant equivalent modeling method for complex topology
CN116401572A | Power transmission line fault diagnosis method and system based on CNN-LSTM
CN105629101B | A kind of method for diagnosing faults of more power module parallel systems based on ant group algorithm
Jin et al. | Cyber-physical risk driven routing planning with deep reinforcement-learning in smart grid communication networks
CN114583696A | Power distribution network reactive power optimization method and system based on BP neural network and scene matching
CN115001978A | Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN114219125A | High-elasticity urban power grid multi-dimensional intelligent partitioning method
CN114697200B | Protection device proportion optimization method of 5G distribution network distributed protection system
CN114417710A | Overload dynamic decision generation method and related device for power transmission network
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination