CN114710792A - Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning - Google Patents

Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning

Info

Publication number
CN114710792A
CN114710792A
Authority
CN
China
Prior art keywords
distribution network
training
round
protection
mth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210330896.4A
Other languages
Chinese (zh)
Inventor
孙伟
王文浩
戴宇
于洋
王吉文
李端超
王同文
汪伟
俞斌
张骏
戴长春
李奇越
李帷韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
State Grid Anhui Electric Power Co Ltd
Original Assignee
Hefei University of Technology
State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology, State Grid Anhui Electric Power Co Ltd filed Critical Hefei University of Technology
Priority to CN202210330896.4A priority Critical patent/CN114710792A/en
Publication of CN114710792A publication Critical patent/CN114710792A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a reinforcement learning based method for optimally arranging the distributed protection devices of a 5G distribution network, which comprises the following steps: 1. building a 5G distribution network protection system environment; 2. establishing a reinforcement learning model of the 5G distribution network protection system, the model consisting of a strategy body and an executive body; 3. training the reinforcement learning model in the 5G distribution network protection system environment; 4. arranging the 5G distribution network distributed protection devices according to the optimal arrangement information $S^{*max}$. The invention ensures that the master station protection device and the distribution network distributed protection devices can establish communication, and finds the optimal arrangement of the protection devices of the 5G distribution network distributed protection system, thereby ensuring safe and efficient operation of the distribution network.

Description

Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Technical Field
The invention belongs to the field of distribution network protection, and particularly relates to an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning.
Background
The distribution network is characterized by multiple voltage levels, a complex network structure, diverse equipment types, numerous and widely spread operation points, and a relatively poor safety environment, so it carries relatively many safety risk factors. To supply electric energy to all kinds of users, higher requirements are placed on the safe and reliable operation of the distribution network, and protection devices therefore need to be arranged to protect it. However, distribution network nodes are numerous and widely distributed, and technical constraints make protection devices difficult to arrange. At present, most distribution network protection schemes still follow the traditional approach and cannot achieve an optimal arrangement of protection devices within the required reliability range.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a reinforcement learning based method for optimally arranging the distributed protection devices of a 5G distribution network, which is expected to find the optimal arrangement of the protection devices of the 5G distribution network distributed protection system on the premise of meeting the reliability requirement, thereby ensuring safe and efficient operation of the distribution network.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention discloses an optimized arrangement method of 5G distribution network distributed protection devices based on reinforcement learning, which comprises the following steps:
Step 1, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, ..., l_i, ..., l_n]$ indicate whether each of the n nodes of the 5G distribution network is provided with a protection device: $l_i = 0$ means the ith node has no protection device, and $l_i = 1$ means a protection device is arranged at the ith node; at most one protection device can be arranged at each node, i = 1, 2, ..., n;
Let $D = [d_1, d_2, ..., d_i, ..., d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the ith node and the 5G base station; when $l_i = 0$, $d_i = 0$; i = 1, 2, ..., n;
Let $S = [L, D]$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initially, all n nodes of the 5G distribution network are provided with protection devices, i.e., $\{l_i = 1, i = 1, 2, ..., n\}$;
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a strategy body and an executive body:
The strategy body consists of a two-layer neural network; the input layer of the strategy body receives the arrangement information S of the protection devices of the 5G distribution network protection system, and the output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the strategy $\pi(\theta)$ and the SoftMax function, and $\theta$ is the set of neural network parameters;
the executive body is used for executing actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and the reward module in the executive body calculates rewards;
step 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the round number as m and initialize m = 1;
Step 3.2, define the training step within each round as t and initialize t = 1; define the arrangement information of the protection devices of the protection system at the tth training step of the mth round as $S_t^m$, and initialize $S_1^m = S$;
Step 3.3, the input layer of the strategy body inputs the arrangement information
Figure BDA0003573059550000023
Step 3.4, the strategy body passes the strategy
Figure BDA0003573059550000024
And the probability of the SoftMax function outputting all actions a
Figure BDA0003573059550000025
Wherein
Figure BDA0003573059550000026
Is the mth round and the tth training strategy,
Figure BDA0003573059550000027
is the set of parameters for the tth training neural network of the mth round,
Figure BDA0003573059550000028
is the probability of training all actions a for the mth round, the tth time;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_t^m$ according to the probability $\pi_t^m(A)$ of all actions a output by the strategy body; the executive body executes the action $a_t^m$ at the tth training step of the mth round, thereby changing the number of protection devices on the m adjacent points of the ith node, i.e., changing the arrangement information $S_t^m$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_{t+1}^m$ at the (t+1)th training step of the mth round;
Only when $l_i = 0$ does the executive body add one protection device at each of the m adjacent points of the ith node: $l_i + 1$, i = 1, 2, ..., n;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the ith node: $l_i - 1$, i = 1, 2, ..., n;
Step 3.6, judging whether the formula (1) is established, if so, executing step 3.7, otherwise, returning to execute step 3.4:
Figure BDA00035730595500000214
in the formula (1), s isReliability of the protection system, piIs the probability of failure of the protection device on the ith node, sexThe reliability is expected when the 5G distribution network operates normally;
Step 3.7, the reward module of the executive body calculates the reward $r_t^m$ at the tth training step of the mth round through formulas (2) to (5) (formulas (2) to (5) are reproduced in the original only as equation images);
In formulas (2) to (5), $r_{d,t}^m$ is the reward for the distance between the protection devices and the 5G base station at the tth training step of the mth round, $r_{n,t}^m$ is the reward for the number of protection devices at the tth training step of the mth round, $r_{s,t}^m$ is the reward for the reliability of the protection system at the tth training step of the mth round, and $r_t^m$ is the overall reward at the tth training step of the mth round;
Step 3.8, the strategy body updates the parameters $\theta_t^m$ at the tth training step of the mth round through formulas (6) and (7), thereby obtaining the parameters $\theta_{t+1}^m$ at the (t+1)th training step of the mth round:

$$\theta_{t+1}^m = \theta_t^m + \alpha \nabla_{\theta_t^m} J(\theta_t^m) \quad (6)$$

$$\nabla_{\theta_t^m} J(\theta_t^m) = \mathbb{E}\left[ v(S_t^m)\, \nabla_{\theta_t^m} \ln \pi(\theta_t^m) \right] \quad (7)$$

In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_t^m}$ denotes the partial derivative with respect to the parameters $\theta_t^m$, $v(S_t^m)$ is the value function under $S_t^m$, $\pi(\theta_t^m)$ is the strategy under $\theta_t^m$, and $J(\theta_t^m)$ is the expectation of the value function;
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if so, end the training of the current mth round, obtain the optimal arrangement information $S^{m*}$ of the current mth round, store $S^{m*}$ in the collection $\{S^{m*}\}$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; where $C_1$ is the maximum number of training steps per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if so, end all training and obtain the optimal arrangement information $S^{*max}$ from the collection $\{S^{m*}\}$; otherwise, return to step 3.2 and execute in sequence, where $C_2$ is the maximum number of rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^{*max}$.
Compared with the prior art, the invention has the beneficial effects that:
1. By exploiting the ability of reinforcement learning to learn through continuous interaction with the environment, and considering that distribution network nodes are numerous and widely distributed, the invention changes the number and arrangement of protection devices in a multi-dimensional manner within the 5G distribution network protection system environment and finds the optimal arrangement of protection devices through iterative reinforcement learning, thereby protecting the distribution network and ensuring its safe and reliable operation;
2. By using 5G communication technology, the invention provides a low-delay, high-reliability information channel for distribution network protection services, thereby solving the problems that traditional distribution network protection has weak selectivity, inaccurate fault location, and long fault removal time, and cannot achieve self-healing after a distribution network line fault is removed.
Drawings
Fig. 1 is a flowchart of an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning according to the present invention;
FIG. 2 is an environmental diagram of a 5G distribution network protection system according to the present invention;
FIG. 3 is a diagram of the reinforcement learning training process of the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, an optimized arrangement method of 5G distribution network distributed protection devices based on reinforcement learning comprises the following steps:
Step 1, as shown in fig. 2, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, ..., l_i, ..., l_n]$ indicate whether each of the n nodes of the 5G distribution network is provided with a protection device: $l_i = 0$ means the ith node has no protection device, and $l_i = 1$ means a protection device is arranged at the ith node; at most one protection device can be arranged at each node, i = 1, 2, ..., n;
Let $D = [d_1, d_2, ..., d_i, ..., d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the ith node and the 5G base station; when $l_i = 0$, $d_i = 0$; i = 1, 2, ..., n;
Let $S = [L, D]$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initially, all n nodes of the 5G distribution network are provided with protection devices, i.e., $\{l_i = 1, i = 1, 2, ..., n\}$;
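For concreteness, the following is a minimal Python sketch of this environment encoding; the class name `ProtectionEnv` and the example distances are illustrative assumptions, not part of the patent.

```python
import numpy as np

# Minimal sketch of the 5G distribution network protection environment (step 1).
# The class name and the example distances are illustrative assumptions.
class ProtectionEnv:
    def __init__(self, distances_to_base_station):
        # distances_to_base_station[i]: actual distance of node i from the 5G base station
        self.base_distances = np.asarray(distances_to_base_station, dtype=float)
        self.n = len(self.base_distances)
        self.reset()

    def reset(self):
        # Initially every node carries a protection device: l_i = 1 for all i.
        self.L = np.ones(self.n, dtype=int)
        # d_i equals the actual distance when a device is present, and 0 when l_i = 0.
        self.D = np.where(self.L == 1, self.base_distances, 0.0)
        return self.state()

    def state(self):
        # Arrangement information S = [L, D].
        return np.concatenate([self.L.astype(float), self.D])

env = ProtectionEnv([120.0, 340.0, 85.0, 410.0, 260.0])  # 5 hypothetical nodes
print(env.state())
```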
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a strategy body and an executive body:
The strategy body consists of a two-layer neural network; the input layer of the strategy body receives the arrangement information S of the protection devices of the 5G distribution network protection system, and the output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the strategy $\pi(\theta)$ and the SoftMax function, and $\theta$ is the set of neural network parameters;
the executive body is used for executing actions so as to change the arrangement information of the protection device of the 5G distribution network protection system, and the reward module in the executive body calculates rewards;
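A minimal sketch of the strategy body follows, assuming a plain-NumPy two-layer network; the hidden width, the initialization, and the action count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the strategy body (step 2): a two-layer network whose input is
# the arrangement information S and whose output is a SoftMax probability over all
# actions a. Hidden size and action count are illustrative assumptions.
class StrategyBody:
    def __init__(self, state_dim, n_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def action_probabilities(self, S):
        h = np.tanh(S @ self.W1 + self.b1)   # first layer
        logits = h @ self.W2 + self.b2       # second layer
        z = np.exp(logits - logits.max())    # numerically stable SoftMax
        return z / z.sum()                   # pi(A): probability of every action

probs = StrategyBody(state_dim=10, n_actions=4).action_probabilities(np.ones(10))
print(probs)
```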
step 3, as shown in fig. 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the round number as m and initialize m = 1;
Step 3.2, define the training step within each round as t and initialize t = 1; define the arrangement information of the protection devices of the protection system at the tth training step of the mth round as $S_t^m$, and initialize $S_1^m = S$;
Step 3.3, input layer of strategy body inputs layout information
Figure BDA0003573059550000053
Step 3.4, strategy body passing strategy
Figure BDA0003573059550000054
And the probability of the SoftMax function outputting all actions a
Figure BDA0003573059550000055
Wherein
Figure BDA0003573059550000056
Is the mth round and the tth training strategy,
Figure BDA0003573059550000057
is the set of parameters for the tth training neural network of the mth round,
Figure BDA0003573059550000058
is the probability of training all actions a in the mth round, the tth time;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_t^m$ according to the probability $\pi_t^m(A)$ of all actions a output by the strategy body; the executive body executes the action $a_t^m$ at the tth training step of the mth round, thereby changing the number of protection devices on the m adjacent points of the ith node, i.e., changing the arrangement information $S_t^m$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_{t+1}^m$ at the (t+1)th training step of the mth round;
Only when $l_i = 0$ does the executive body add one protection device at each of the m adjacent points of the ith node: $l_i + 1$, i = 1, 2, ..., n;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the ith node: $l_i - 1$, i = 1, 2, ..., n;
Simply changing the number of protection devices cannot match the actual conditions of a distribution network whose nodes are numerous and widely distributed; considering the complexity of a real distribution network, different actions are executed in each round, so that the protection effect is improved on the premise of meeting distribution network reliability, as sketched below;
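The add/remove rule of step 3.5 can be sketched as follows; the adjacency map, and the interpretation of "adding/removing one device at each adjacent point" as setting $l_j$ to 1 or 0 (since each node carries at most one device), are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of the executive body's action rule (step 3.5): an action picks a
# node i; devices are then added at (l_i = 0) or removed from (l_i = 1) each of the
# m adjacent points of node i. The adjacency map is a hypothetical example.
def execute_action(L, i, adjacency):
    L = L.copy()
    for j in adjacency[i]:      # the m adjacent points of node i
        if L[i] == 0:
            L[j] = 1            # add a protection device at the adjacent point
        else:
            L[j] = 0            # remove the protection device at the adjacent point
    return L

adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}  # hypothetical topology
L = np.ones(5, dtype=int)
print(execute_action(L, 0, adjacency))  # removes devices at nodes 1 and 2
```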
Step 3.6, judge whether formula (1) holds; if so, execute step 3.7, otherwise return to step 3.4:

$$s \geq s_{ex} \quad (1)$$

In formula (1), $s$ is the reliability of the protection system, computed from the failure probabilities of the arranged devices; $p_i$ is the failure probability of the protection device on the ith node; $s_{ex}$ is the expected reliability for normal operation of the 5G distribution network;
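Since formula (1) survives only as the inequality $s \geq s_{ex}$, the sketch below assumes, purely for illustration, a series reliability model in which $s$ is the product of $(1 - p_i)$ over nodes carrying a device; this assumed model is not taken from the patent.

```python
import numpy as np

# Minimal sketch of the reliability check in formula (1). The exact expression for s
# is given only as an image in the original; the product form below is an assumption.
def reliability_ok(L, p, s_ex):
    s = np.prod(1.0 - p[L == 1])  # assumed reliability model (not from the patent)
    return s >= s_ex              # formula (1): proceed to step 3.7 only if it holds

L = np.array([1, 1, 0, 1, 1])
p = np.array([0.01, 0.02, 0.05, 0.01, 0.03])  # hypothetical failure probabilities
print(reliability_ok(L, p, s_ex=0.90))
```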
Step 3.7, the reward module of the executive body calculates the reward $r_t^m$ at the tth training step of the mth round through formulas (2) to (5) (formulas (2) to (5) are reproduced in the original only as equation images);
In formulas (2) to (5), $r_{d,t}^m$ is the reward for the distance between the protection devices and the 5G base station at the tth training step of the mth round, $r_{n,t}^m$ is the reward for the number of protection devices at the tth training step of the mth round, $r_{s,t}^m$ is the reward for the reliability of the protection system at the tth training step of the mth round, and $r_t^m$ is the overall reward at the tth training step of the mth round; the distance between a protection device and the 5G base station affects the reliability of the protection system;
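Because formulas (2) to (5) are given only as images, the component rewards and their combination below are illustrative assumptions: shorter distances to the 5G base station, fewer devices, and a larger reliability margin are each rewarded and then combined by a weighted sum.

```python
import numpy as np

# Minimal sketch of the reward module (step 3.7). The reward forms and weights are
# assumptions standing in for the unrecoverable formulas (2)-(5).
def reward(L, D, s, s_ex, w=(1.0, 1.0, 1.0)):
    r_d = -D.sum()   # distance reward: closer to the 5G base station is better
    r_n = -L.sum()   # count reward: fewer protection devices is better
    r_s = s - s_ex   # reliability reward: margin above the expected reliability
    return w[0] * r_d + w[1] * r_n + w[2] * r_s  # assumed combination (formula (5))

L = np.array([1, 1, 0, 1, 1])
D = np.array([120.0, 340.0, 0.0, 410.0, 260.0])
print(reward(L, D, s=0.93, s_ex=0.90))
```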
Step 3.8, the strategy body updates the parameters $\theta_t^m$ at the tth training step of the mth round through formulas (6) and (7), thereby obtaining the parameters $\theta_{t+1}^m$ at the (t+1)th training step of the mth round:

$$\theta_{t+1}^m = \theta_t^m + \alpha \nabla_{\theta_t^m} J(\theta_t^m) \quad (6)$$

$$\nabla_{\theta_t^m} J(\theta_t^m) = \mathbb{E}\left[ v(S_t^m)\, \nabla_{\theta_t^m} \ln \pi(\theta_t^m) \right] \quad (7)$$

In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_t^m}$ denotes the partial derivative with respect to the parameters $\theta_t^m$, $v(S_t^m)$ is the value function under $S_t^m$, $\pi(\theta_t^m)$ is the strategy under $\theta_t^m$, and $J(\theta_t^m)$ is the expectation of the value function;
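Read as a REINFORCE-style policy gradient, formulas (6) and (7) can be sketched as below; the single-matrix parameterization stands in for the two-layer strategy body and is an assumption made for brevity.

```python
import numpy as np

# Minimal sketch of the parameter update in formulas (6)-(7), reconstructed as a
# REINFORCE-style step: theta <- theta + alpha * v * grad log pi(a | S; theta).
def policy_gradient_step(theta, S, action, value, alpha=0.01):
    logits = theta.T @ S
    z = np.exp(logits - logits.max())
    pi = z / z.sum()                         # SoftMax policy pi(theta)
    grad_log = -np.outer(S, pi)              # d log pi(a) / d theta for SoftMax
    grad_log[:, action] += S
    return theta + alpha * value * grad_log  # formula (6) with a sampled estimate of (7)

theta = np.zeros((10, 4))                    # 10 state features, 4 hypothetical actions
theta = policy_gradient_step(theta, S=np.ones(10), action=2, value=1.5)
print(theta[:, 2])
```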
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if so, end the training of the current mth round, obtain the optimal arrangement information $S^{m*}$ of the current mth round, store $S^{m*}$ in the collection $\{S^{m*}\}$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; where $C_1$ is the maximum number of training steps per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if so, end all training and obtain the optimal arrangement information $S^{*max}$ from the collection $\{S^{m*}\}$; otherwise, return to step 3.2 and execute in sequence, where $C_2$ is the maximum number of rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^{*max}$.
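Finally, a self-contained sketch of the overall loop of steps 3.1 to 3.10 and the selection in step 4; the random action choice, the toggle action, and the reward here are trivial stand-ins for the components sketched earlier, and $C_1$, $C_2$ and all numbers are hypothetical.

```python
import numpy as np

# Minimal end-to-end sketch of the training loop (steps 3.1-3.10) and the final
# selection in step 4. Transitions, rewards, and updates are trivial stand-ins.
rng = np.random.default_rng(0)
n, C1, C2 = 5, 50, 20
collection = []                            # the set {S^{m*}} of per-round optima

for m in range(1, C2 + 1):                 # outer loop over rounds (step 3.10)
    L = np.ones(n, dtype=int)              # step 3.2: all nodes start protected
    best_L, best_r = L.copy(), float("-inf")
    for t in range(1, C1 + 1):             # inner loop over training steps (step 3.9)
        i = int(rng.integers(n))           # stand-in for sampling an action from pi(A)
        L[i] = 1 - L[i]                    # stand-in for the add/remove action
        r = -L.sum() + rng.normal()        # stand-in for the reward r_t^m
        if r > best_r:                     # track the round optimum S^{m*}
            best_r, best_L = r, L.copy()
    collection.append((best_r, best_L))    # store S^{m*}

best_overall = max(collection, key=lambda br: br[0])[1]  # step 4: S^{*max}
print(best_overall)
```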

Claims (1)

1. An optimized arrangement method of 5G distribution network distributed protection devices based on reinforcement learning is characterized by comprising the following steps:
step 1, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, ..., l_i, ..., l_n]$ indicate whether each of the n nodes of the 5G distribution network is provided with a protection device: $l_i = 0$ means the ith node has no protection device, and $l_i = 1$ means a protection device is arranged at the ith node; at most one protection device can be arranged at each node, i = 1, 2, ..., n;
Let $D = [d_1, d_2, ..., d_i, ..., d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the ith node and the 5G base station; when $l_i = 0$, $d_i = 0$; i = 1, 2, ..., n;
Let $S = [L, D]$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initially, all n nodes of the 5G distribution network are provided with protection devices, i.e., $\{l_i = 1, i = 1, 2, ..., n\}$;
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a strategy body and an executive body:
the strategy body consists of a two-layer neural network; the input layer of the strategy body receives the arrangement information S of the protection devices of the 5G distribution network protection system, and the output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the strategy $\pi(\theta)$ and the SoftMax function, and $\theta$ is the set of neural network parameters;
the executive body is used for executing actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and the reward module in the executive body calculates rewards;
step 3, training a reinforcement learning model in a 5G distribution network protection system environment;
step 3.1, define the round number as m and initialize m = 1;
step 3.2, define the training step within each round as t and initialize t = 1; define the arrangement information of the protection devices of the protection system at the tth training step of the mth round as $S_t^m$, and initialize $S_1^m = S$;
Step 3.3, the input layer of the strategy body inputs the arrangement information
Figure FDA0003573059540000013
Step 3.4, the strategy body passes the strategy
Figure FDA0003573059540000014
And the probability of the SoftMax function outputting all actions a
Figure FDA0003573059540000015
Wherein
Figure FDA0003573059540000018
Is the mth round and the tth training strategy,
Figure FDA0003573059540000016
is the set of parameters for the tth training neural network of the mth round,
Figure FDA0003573059540000017
is the probability of training all actions a for the mth round, the tth time;
step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_t^m$ according to the probability $\pi_t^m(A)$ of all actions a output by the strategy body; the executive body executes the action $a_t^m$ at the tth training step of the mth round, thereby changing the number of protection devices on the m adjacent points of the ith node, i.e., changing the arrangement information $S_t^m$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_{t+1}^m$ at the (t+1)th training step of the mth round;
only when $l_i = 0$ does the executive body add one protection device at each of the m adjacent points of the ith node: $l_i + 1$, i = 1, 2, ..., n;
only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the ith node: $l_i - 1$, i = 1, 2, ..., n;
Step 3.6, judging whether the formula (1) is established, if so, executing step 3.7, otherwise, returning to execute step 3.4:
Figure FDA0003573059540000026
in the formula (1), s is the reliability of the protection system, and piIs the probability of failure of the protection device on the ith node, sexThe reliability is expected when the 5G distribution network operates normally;
step 3.7, the reward module of the executive body calculates the reward $r_t^m$ at the tth training step of the mth round through formulas (2) to (5) (formulas (2) to (5) are reproduced in the original only as equation images);
in formulas (2) to (5), $r_{d,t}^m$ is the reward for the distance between the protection devices and the 5G base station at the tth training step of the mth round, $r_{n,t}^m$ is the reward for the number of protection devices at the tth training step of the mth round, $r_{s,t}^m$ is the reward for the reliability of the protection system at the tth training step of the mth round, and $r_t^m$ is the overall reward at the tth training step of the mth round;
step 3.8, the strategy body updates the parameters $\theta_t^m$ at the tth training step of the mth round through formulas (6) and (7), thereby obtaining the parameters $\theta_{t+1}^m$ at the (t+1)th training step of the mth round:

$$\theta_{t+1}^m = \theta_t^m + \alpha \nabla_{\theta_t^m} J(\theta_t^m) \quad (6)$$

$$\nabla_{\theta_t^m} J(\theta_t^m) = \mathbb{E}\left[ v(S_t^m)\, \nabla_{\theta_t^m} \ln \pi(\theta_t^m) \right] \quad (7)$$

in formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_t^m}$ denotes the partial derivative with respect to the parameters $\theta_t^m$, $v(S_t^m)$ is the value function under $S_t^m$, $\pi(\theta_t^m)$ is the strategy under $\theta_t^m$, and $J(\theta_t^m)$ is the expectation of the value function;
step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if so, end the training of the current mth round, obtain the optimal arrangement information $S^{m*}$ of the current mth round, store $S^{m*}$ in the collection $\{S^{m*}\}$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; where $C_1$ is the maximum number of training steps per round;
step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if so, end all training and obtain the optimal arrangement information $S^{*max}$ from the collection $\{S^{m*}\}$; otherwise, return to step 3.2 and execute in sequence, where $C_2$ is the maximum number of rounds;
step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^{*max}$.
CN202210330896.4A 2022-03-30 2022-03-30 Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning Pending CN114710792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210330896.4A CN114710792A (en) 2022-03-30 2022-03-30 Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning


Publications (1)

Publication Number Publication Date
CN114710792A true CN114710792A (en) 2022-07-05

Family

ID=82170813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210330896.4A Pending CN114710792A (en) 2022-03-30 2022-03-30 Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114710792A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116233895A (en) * 2023-05-04 2023-06-06 合肥工业大学 5G distribution network node communication optimization method, equipment and medium based on reinforcement learning
CN116233895B (en) * 2023-05-04 2023-07-18 合肥工业大学 5G distribution network node communication optimization method, equipment and medium based on reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination