CN114710792A - Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning - Google Patents
- Publication number
- CN114710792A (application number CN202210330896.4A)
- Authority
- CN
- China
- Prior art keywords
- distribution network
- training
- round
- protection
- mth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04W—WIRELESS COMMUNICATION NETWORKS; H04W24/00—Supervisory, monitoring or testing arrangements; H04W24/02—Arrangements for optimising operational condition
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks; G06N3/061—Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit
- G06N3/08—Learning methods
Abstract
The invention discloses a reinforcement-learning-based method for optimally arranging the distributed protection devices of a 5G distribution network, comprising the following steps: 1. build the 5G distribution network protection system environment; 2. establish a reinforcement learning model of the 5G distribution network protection system, consisting of a policy body and an executive body; 3. train the reinforcement learning model in the 5G distribution network protection system environment; 4. arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$. The invention ensures that the master station protection device establishes communication with the distributed protection devices of the distribution network and finds the optimal arrangement of protection devices for the 5G distribution network distributed protection system, thereby ensuring that the distribution network operates safely and efficiently.
Description
Technical Field
The invention belongs to the field of distribution network protection, and particularly relates to an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning.
Background
A distribution network has many voltage levels, a complex network structure, diverse equipment types, numerous and widely scattered operating points, and a comparatively poor safety environment, so it carries relatively many safety risk factors. To supply electric energy to all kinds of users, higher requirements are placed on its safe and reliable operation, and protection devices must therefore be arranged to protect it. However, because distribution network nodes are numerous and widely distributed, and because of technical constraints, the protection devices are difficult to arrange. At present, most distribution network protection schemes still follow the traditional approach and cannot achieve an optimal arrangement of protection devices within the required reliability range.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a reinforcement-learning-based method for optimally arranging the distributed protection devices of a 5G distribution network. It aims to find the optimal arrangement of protection devices for the 5G distribution network distributed protection system on the premise of meeting reliability, thereby ensuring safe and efficient operation of the distribution network.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning, which is characterized by comprising the following steps of:
step 1, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, \ldots, l_i, \ldots, l_n]$ indicate whether the n nodes of the 5G distribution network are provided with protection devices: $l_i = 0$ means that the i-th node has no protection device, and $l_i = 1$ means that a protection device is arranged at the i-th node; at most one protection device can be arranged at each node, $i = 1, 2, \ldots, n$;
Let $D = [d_1, d_2, \ldots, d_i, \ldots, d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the i-th node and the 5G base station; when $l_i = 0$, $d_i = 0$; $i = 1, 2, \ldots, n$;
Let $S = (L, D)$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initialize all n nodes of the 5G distribution network with protection devices, i.e. $\{l_i = 1, i = 1, 2, \ldots, n\}$;
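As a concrete illustration of this state representation, the sketch below builds L, D, and S in Python; the node count and the distance values are hypothetical placeholders, not values from the invention.

```python
import numpy as np

n = 10                                    # number of distribution network nodes (hypothetical)
L = np.ones(n, dtype=int)                 # step 1 initialization: every node starts with a device
rng = np.random.default_rng(0)
D = rng.uniform(0.1, 5.0, size=n)         # distance of each node to the 5G base station in km (hypothetical)
D[L == 0] = 0.0                           # convention above: d_i = 0 whenever l_i = 0

S = (L, D)                                # arrangement information of the protection system
```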
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a policy body and an executive body:
The policy body consists of a two-layer neural network. Its input layer takes the arrangement information S of the protection devices of the 5G distribution network protection system, and its output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the policy $\pi(\theta)$ and a SoftMax function, and $\theta$ is the set of neural network parameters;
The executive body executes actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and a reward module inside the executive body calculates rewards;
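A minimal sketch of the two-layer policy network of step 2, written in PyTorch. The hidden width, the size of the action set, and the flattening of S into a single input vector are assumptions; the patent fixes none of these.

```python
import torch
import torch.nn as nn

class PolicyBody(nn.Module):
    """Two-layer network: arrangement information S in, action probabilities pi(A) out."""
    def __init__(self, n_nodes: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # S = (L, D) is flattened into 2 * n_nodes input features (an assumption)
        self.net = nn.Sequential(
            nn.Linear(2 * n_nodes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # the SoftMax function turns the logits of pi(theta) into the probabilities pi(A)
        return torch.softmax(self.net(s), dim=-1)
```

The executive body then samples an action from these probabilities, as sketched in the training loop after step 4.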
step 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the number of rounds as m and initialize m = 1;
Step 3.2, define the training count within each round as t and initialize t = 1;
Step 3.3, define the arrangement information of the protection devices of the protection system under the t-th training of the m-th round as $S_m^t$, and initialize $S_m^1 = S$;
Step 3.4, the policy body outputs the probability $\pi_m^t(A)$ of all actions a through the policy $\pi(\theta_m^t)$ and the SoftMax function, where $\pi(\theta_m^t)$ is the policy under the t-th training of the m-th round, $\theta_m^t$ is the set of neural network parameters under the t-th training of the m-th round, and $\pi_m^t(A)$ is the probability of all actions a under the t-th training of the m-th round;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_m^t$ according to the probability $\pi_m^t(A)$ of all actions a output by the policy body and executes it under the t-th training of the m-th round, thereby changing the number of protection devices on the m adjacent points of the i-th node, i.e. changing the arrangement information $S_m^t$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_m^{t+1}$ of the (t+1)-th training of the m-th round;
Only when $l_i = 0$ does the executive body add one protection device to each of the m adjacent points of the i-th node: $l_i + 1$, $i = 1, 2, \ldots, n$;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the i-th node: $l_i - 1$, $i = 1, 2, \ldots, n$;
Step 3.6, judging whether the formula (1) is established, if so, executing step 3.7, otherwise, returning to execute step 3.4:
in the formula (1), s isReliability of the protection system, piIs the probability of failure of the protection device on the ith node, sexThe reliability is expected when the 5G distribution network operates normally;
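Formula (1) is rendered only as an image in the source; the text identifies its ingredients as the system reliability s, the per-node failure probabilities $p_i$, and the threshold $s_{ex}$. The sketch below assumes independent device failures, which is a common modeling choice, not the patent's stated formula.

```python
import numpy as np

def reliability(L: np.ndarray, p: np.ndarray) -> float:
    """System reliability s under the assumption of independent failures:
    the probability that every installed device survives."""
    installed = L == 1
    return float(np.prod(1.0 - p[installed]))

def meets_constraint(L: np.ndarray, p: np.ndarray, s_ex: float) -> bool:
    # step 3.6: continue to step 3.7 only if the reliability expectation is met
    return reliability(L, p) >= s_ex
```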
Step 3.7, the reward module of the executive body calculates the reward $r_m^t$ under the t-th training of the m-th round through formulas (2) to (5);
In formulas (2) to (5), the component rewards under the t-th training of the m-th round are the reward for the distance between the protection devices and the 5G base station, the reward for the number of protection devices, and the reward for the reliability of the protection system; $r_m^t$ is the total reward combining them under the t-th training of the m-th round;
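Formulas (2) to (5) are likewise only images in the source; all that survives is the decomposition into distance, count, and reliability rewards. The sketch below (reusing `reliability` from the previous sketch) is therefore only structural: the specific functional forms and the equal weights are assumptions.

```python
def reward(L, D, p, s_ex, w=(1.0, 1.0, 1.0)):
    """Structural stand-in for formulas (2)-(5): reward shorter distances,
    fewer devices, and reliability margin above s_ex (all forms assumed)."""
    r_d = -D[L == 1].mean() if (L == 1).any() else 0.0  # distance reward
    r_n = -L.sum() / len(L)                             # device-count reward
    r_s = reliability(L, p) - s_ex                      # reliability reward
    return w[0] * r_d + w[1] * r_n + w[2] * r_s         # total reward r_m^t
```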
Step 3.8, the policy body updates the parameters $\theta_m^t$ under the t-th training of the m-th round through formulas (6) and (7), thereby obtaining the parameters $\theta_m^{t+1}$ under the (t+1)-th training of the m-th round;
In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_m^t}$ denotes the partial derivative with respect to the parameters $\theta_m^t$, $V(\theta_m^t)$ is the value function under the policy $\pi(\theta_m^t)$, and $E[V(\theta_m^t)]$ is the expectation of the value function;
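Formulas (6) and (7) describe a learning-rate-$\alpha$ gradient step on $\theta$ driven by the expected value of the policy, which is consistent with a REINFORCE-style policy gradient; the sketch below is written under that assumed reading.

```python
import torch

def policy_gradient_step(policy, optimizer, s, action, ret):
    """One REINFORCE-style update, theta <- theta + alpha * ret * grad log pi(a|s);
    an assumed reading of formulas (6)-(7), not their verbatim content."""
    probs = policy(s)                 # pi(A) from the policy body (step 3.4)
    log_prob = torch.log(probs[action])
    loss = -ret * log_prob            # minimizing this ascends the expected value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here `optimizer` would be e.g. `torch.optim.SGD(policy.parameters(), lr=alpha)`, so that `lr` plays the role of the learning rate $\alpha$ in formula (6).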
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if yes, end the training of the current m-th round, obtain the optimal arrangement information $S_m^*$ under the current m-th round, store $S_m^*$ in a set $S^*$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; here $C_1$ is the maximum number of iterations per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if yes, end all training and obtain the optimal arrangement information $S^*_{max}$ from the set $S^*$; otherwise, return to step 3.2 and execute in sequence; here $C_2$ is the maximum number of iteration rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$.
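Assembling steps 3.1 to 3.10 into one picture, the sketch below shows the double loop over rounds m (up to $C_2$) and trainings t (up to $C_1$), reusing the helpers and variables from the earlier sketches. All numeric constants are hypothetical, `apply_action` is a hypothetical helper standing in for the device-adding and device-removing rules of step 3.5, and the loop control is simplified (a failed reliability check consumes a training step here, whereas the patent returns to step 3.4).

```python
import numpy as np
import torch

C1, C2, alpha = 50, 20, 1e-2                  # iteration limits and learning rate (hypothetical)
p = np.full(n, 0.02)                          # per-node failure probabilities (hypothetical)
s_ex = 0.90                                   # expected reliability (hypothetical)
policy = PolicyBody(n_nodes=n, n_actions=n)   # one action per node, an assumption
optimizer = torch.optim.SGD(policy.parameters(), lr=alpha)

best_per_round = []                           # the set S* of step 3.9
for m in range(1, C2 + 1):                    # step 3.10: rounds
    L = np.ones(n, dtype=int)                 # step 3.3: every node starts protected
    best_r, best_L = -np.inf, L.copy()
    for t in range(1, C1 + 1):                # step 3.9: trainings within the round
        s_in = torch.tensor(np.concatenate([L, D]), dtype=torch.float32)
        probs = policy(s_in)                  # step 3.4: pi(A)
        a = int(torch.multinomial(probs, 1))  # step 3.5: sample an action
        L = apply_action(L, a)                # hypothetical helper: add/remove devices
        if not meets_constraint(L, p, s_ex):  # step 3.6: reliability gate
            continue
        r = reward(L, D, p, s_ex)             # step 3.7
        policy_gradient_step(policy, optimizer, s_in, a, r)  # step 3.8
        if r > best_r:
            best_r, best_L = r, L.copy()
    best_per_round.append(best_L)             # store S_m^* in S*
# step 4: S*_max is the best-scoring arrangement across best_per_round
```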
Compared with the prior art, the invention has the beneficial effects that:
1. By utilizing the ability of reinforcement learning to learn through continuous interaction with the environment, and considering that distribution network nodes are numerous and widely distributed, the invention changes the number and arrangement of protection devices in a multi-dimensional manner in the 5G distribution network protection system environment and finds the optimal arrangement of the protection devices through iterative reinforcement learning, thereby protecting the distribution network and ensuring its safe and reliable operation;
2. By utilizing 5G communication technology, the invention provides a low-latency, high-reliability information channel for distribution network protection services, thereby overcoming the weak selectivity, inaccurate fault location, long fault clearing time, and lack of self-healing after fault clearing that afflict traditional protection of distribution network lines.
Drawings
Fig. 1 is a flowchart of an optimized arrangement method of a 5G distribution network distributed protection device based on reinforcement learning according to the present invention;
FIG. 2 is an environmental diagram of a 5G distribution network protection system according to the present invention;
FIG. 3 is a diagram of the reinforcement learning training process of the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, a method for optimally arranging 5G distribution network distributed protection devices based on reinforcement learning includes the following steps:
step 1, as shown in fig. 2, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, \ldots, l_i, \ldots, l_n]$ indicate whether the n nodes of the 5G distribution network are provided with protection devices: $l_i = 0$ means that the i-th node has no protection device, and $l_i = 1$ means that a protection device is arranged at the i-th node; at most one protection device can be arranged at each node, $i = 1, 2, \ldots, n$;
Let $D = [d_1, d_2, \ldots, d_i, \ldots, d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the i-th node and the 5G base station; when $l_i = 0$, $d_i = 0$; $i = 1, 2, \ldots, n$;
Let $S = (L, D)$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initialize all n nodes of the 5G distribution network with protection devices, i.e. $\{l_i = 1, i = 1, 2, \ldots, n\}$;
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a policy body and an executive body:
The policy body consists of a two-layer neural network. Its input layer takes the arrangement information S of the protection devices of the 5G distribution network protection system, and its output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the policy $\pi(\theta)$ and a SoftMax function, and $\theta$ is the set of neural network parameters;
The executive body executes actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and a reward module inside the executive body calculates rewards;
step 3, as shown in fig. 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the number of rounds as m and initialize m = 1;
Step 3.2, define the training count within each round as t and initialize t = 1;
Step 3.3, define the arrangement information of the protection devices of the protection system under the t-th training of the m-th round as $S_m^t$, and initialize $S_m^1 = S$;
Step 3.4, the policy body outputs the probability $\pi_m^t(A)$ of all actions a through the policy $\pi(\theta_m^t)$ and the SoftMax function, where $\pi(\theta_m^t)$ is the policy under the t-th training of the m-th round, $\theta_m^t$ is the set of neural network parameters under the t-th training of the m-th round, and $\pi_m^t(A)$ is the probability of all actions a under the t-th training of the m-th round;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_m^t$ according to the probability $\pi_m^t(A)$ of all actions a output by the policy body and executes it under the t-th training of the m-th round, thereby changing the number of protection devices on the m adjacent points of the i-th node, i.e. changing the arrangement information $S_m^t$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_m^{t+1}$ of the (t+1)-th training of the m-th round;
Only when $l_i = 0$ does the executive body add one protection device to each of the m adjacent points of the i-th node: $l_i + 1$, $i = 1, 2, \ldots, n$;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the i-th node: $l_i - 1$, $i = 1, 2, \ldots, n$;
Simply changing the number of protection devices cannot reflect the actual conditions of a distribution network whose nodes are numerous and widely distributed; considering these complicated actual conditions, different actions are executed in each round, so that the protection effect is improved on the premise of meeting the reliability of the distribution network;
Step 3.6, judge whether formula (1) holds; if yes, execute step 3.7, otherwise return to step 3.4:

$$s \ge s_{ex} \tag{1}$$

In formula (1), $s$ is the reliability of the protection system, $p_i$ is the failure probability of the protection device on the i-th node, and $s_{ex}$ is the expected reliability when the 5G distribution network operates normally;
Step 3.7, the reward module of the executive body calculates the reward $r_m^t$ under the t-th training of the m-th round through formulas (2) to (5);
In formulas (2) to (5), the component rewards under the t-th training of the m-th round are the reward for the distance between the protection devices and the 5G base station, the reward for the number of protection devices, and the reward for the reliability of the protection system; $r_m^t$ is the total reward combining them under the t-th training of the m-th round. The distance between a protection device and the 5G base station affects the reliability of the protection system;
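The embodiment notes that the distance to the base station affects reliability, but does not reproduce the relationship. One hedged way to capture this dependence, purely as a modeling assumption and not the patent's formula, is to inflate the failure probability with distance:

```python
import numpy as np

def effective_failure_prob(p: np.ndarray, D: np.ndarray, k: float = 0.01) -> np.ndarray:
    """Assumed model: failure probability grows with distance to the 5G
    base station at rate k per km; the rate k is hypothetical."""
    return np.clip(p * (1.0 + k * D), 0.0, 1.0)
```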
Step 3.8, the policy body updates the parameters $\theta_m^t$ under the t-th training of the m-th round through formulas (6) and (7), thereby obtaining the parameters $\theta_m^{t+1}$ under the (t+1)-th training of the m-th round;
In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_m^t}$ denotes the partial derivative with respect to the parameters $\theta_m^t$, $V(\theta_m^t)$ is the value function under the policy $\pi(\theta_m^t)$, and $E[V(\theta_m^t)]$ is the expectation of the value function;
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if yes, end the training of the current m-th round, obtain the optimal arrangement information $S_m^*$ under the current m-th round, store $S_m^*$ in a set $S^*$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; here $C_1$ is the maximum number of iterations per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if yes, end all training and obtain the optimal arrangement information $S^*_{max}$ from the set $S^*$; otherwise, return to step 3.2 and execute in sequence; here $C_2$ is the maximum number of iteration rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$.
Claims (1)
1. An optimized arrangement method of 5G distribution network distributed protection devices based on reinforcement learning is characterized by comprising the following steps:
step 1, building a 5G distribution network protection system environment;
Let $L = [l_1, l_2, \ldots, l_i, \ldots, l_n]$ indicate whether the n nodes of the 5G distribution network are provided with protection devices: $l_i = 0$ means that the i-th node has no protection device, and $l_i = 1$ means that a protection device is arranged at the i-th node; at most one protection device can be arranged at each node, $i = 1, 2, \ldots, n$;
Let $D = [d_1, d_2, \ldots, d_i, \ldots, d_n]$ represent the actual distances of the n nodes of the 5G distribution network from the 5G base station, where $d_i$ is the actual distance between the i-th node and the 5G base station; when $l_i = 0$, $d_i = 0$; $i = 1, 2, \ldots, n$;
Let $S = (L, D)$ denote the arrangement information of the protection devices of the 5G distribution network protection system;
Initialize all n nodes of the 5G distribution network with protection devices, i.e. $\{l_i = 1, i = 1, 2, \ldots, n\}$;
Step 2, establishing a reinforcement learning model of the 5G distribution network protection system, wherein the reinforcement learning model consists of a policy body and an executive body:
The policy body consists of a two-layer neural network. Its input layer takes the arrangement information S of the protection devices of the 5G distribution network protection system, and its output layer outputs the probability $\pi(A)$ of all actions a, where $\pi(A)$ is obtained through the policy $\pi(\theta)$ and a SoftMax function, and $\theta$ is the set of neural network parameters;
The executive body executes actions so as to change the arrangement information of the protection devices of the 5G distribution network protection system, and a reward module inside the executive body calculates rewards;
step 3, training a reinforcement learning model in a 5G distribution network protection system environment;
Step 3.1, define the number of rounds as m and initialize m = 1;
Step 3.2, define the training count within each round as t and initialize t = 1;
Step 3.3, define the arrangement information of the protection devices of the protection system under the t-th training of the m-th round as $S_m^t$, and initialize $S_m^1 = S$;
Step 3.4, the policy body outputs the probability $\pi_m^t(A)$ of all actions a through the policy $\pi(\theta_m^t)$ and the SoftMax function, where $\pi(\theta_m^t)$ is the policy under the t-th training of the m-th round, $\theta_m^t$ is the set of neural network parameters under the t-th training of the m-th round, and $\pi_m^t(A)$ is the probability of all actions a under the t-th training of the m-th round;
Step 3.5, in the 5G distribution network protection system environment, the executive body selects an action $a_m^t$ according to the probability $\pi_m^t(A)$ of all actions a output by the policy body and executes it under the t-th training of the m-th round, thereby changing the number of protection devices on the m adjacent points of the i-th node, i.e. changing the arrangement information $S_m^t$ of the protection devices of the 5G distribution network protection system and outputting the arrangement information $S_m^{t+1}$ of the (t+1)-th training of the m-th round;
Only when $l_i = 0$ does the executive body add one protection device to each of the m adjacent points of the i-th node: $l_i + 1$, $i = 1, 2, \ldots, n$;
Only when $l_i = 1$ does the executive body remove one protection device from each of the m adjacent points of the i-th node: $l_i - 1$, $i = 1, 2, \ldots, n$;
Step 3.6, judge whether formula (1) holds; if yes, execute step 3.7, otherwise return to step 3.4:

$$s \ge s_{ex} \tag{1}$$

In formula (1), $s$ is the reliability of the protection system, $p_i$ is the failure probability of the protection device on the i-th node, and $s_{ex}$ is the expected reliability when the 5G distribution network operates normally;
Step 3.7, the reward module of the executive body calculates the reward $r_m^t$ under the t-th training of the m-th round through formulas (2) to (5);
In formulas (2) to (5), the component rewards under the t-th training of the m-th round are the reward for the distance between the protection devices and the 5G base station, the reward for the number of protection devices, and the reward for the reliability of the protection system; $r_m^t$ is the total reward combining them under the t-th training of the m-th round;
Step 3.8, the policy body updates the parameters $\theta_m^t$ under the t-th training of the m-th round through formulas (6) and (7), thereby obtaining the parameters $\theta_m^{t+1}$ under the (t+1)-th training of the m-th round;
In formulas (6) and (7), $\alpha$ is the learning rate, $\nabla_{\theta_m^t}$ denotes the partial derivative with respect to the parameters $\theta_m^t$, $V(\theta_m^t)$ is the value function under the policy $\pi(\theta_m^t)$, and $E[V(\theta_m^t)]$ is the expectation of the value function;
Step 3.9, after assigning t+1 to t, judge whether $t > C_1$ holds; if yes, end the training of the current m-th round, obtain the optimal arrangement information $S_m^*$ under the current m-th round, store $S_m^*$ in a set $S^*$, and then execute step 3.10; otherwise, return to step 3.3 and execute in sequence; here $C_1$ is the maximum number of iterations per round;
Step 3.10, after assigning m+1 to m, judge whether $m > C_2$ holds; if yes, end all training and obtain the optimal arrangement information $S^*_{max}$ from the set $S^*$; otherwise, return to step 3.2 and execute in sequence; here $C_2$ is the maximum number of iteration rounds;
Step 4, arrange the 5G distribution network distributed protection devices according to the optimal arrangement information $S^*_{max}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210330896.4A | 2022-03-30 | 2022-03-30 | Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210330896.4A | 2022-03-30 | 2022-03-30 | Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN114710792A | 2022-07-05
Family
ID=82170813
Family Applications (1)
Application Number | Publication | Priority Date | Filing Date
---|---|---|---
CN202210330896.4A | CN114710792A (Pending) | 2022-03-30 | 2022-03-30
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114710792A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116233895A (en) * | 2023-05-04 | 2023-06-06 | 合肥工业大学 | 5G distribution network node communication optimization method, equipment and medium based on reinforcement learning |
CN116233895B (en) * | 2023-05-04 | 2023-07-18 | 合肥工业大学 | 5G distribution network node communication optimization method, equipment and medium based on reinforcement learning |
Similar Documents
Publication | Title
---|---
CN103219743B | Pilot node selecting method based on wind electric power fluctuation probability characters
CN112288326B | Fault scene set reduction method suitable for toughness evaluation of power transmission system
CN114217524A | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN105429185A | Economic dispatching method with robust collaborative consistency
CN104102956A | Distribution network expansion planning method based on strategy adaption differential evolution
CN114710792A | Optimal arrangement method of 5G distribution network distributed protection devices based on reinforcement learning
Wang et al. | Design and analysis of genetic algorithm and BP neural network based PID control for boost converter applied in renewable power generations
CN114666204A | Fault root cause positioning method and system based on cause and effect reinforcement learning
CN111160716A | Large power grid vulnerability assessment method based on tidal current betweenness
CN105552895A | Multilevel elicitation method dynamic planning based power system dynamic equivalent method
CN116316637A | Dynamic topology identification method, system, equipment and storage medium for power distribution network
CN111130053B | Power distribution network overcurrent protection method based on deep reinforcement learning
CN112464575A | Dam group risk assessment method and equipment based on Bayesian network
CN114123178B | Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method
CN111900720B | Transmission network fragile line identification method based on double-layer webpage sorting algorithm
CN105406517A | Finite time average consistency algorithm-based economic dispatching method for power system
CN106991229B | Wind power plant equivalent modeling method for complex topology
CN116401572A | Power transmission line fault diagnosis method and system based on CNN-LSTM
CN105629101B | A kind of method for diagnosing faults of more power module parallel systems based on ant group algorithm
Jin et al. | Cyber-physical risk driven routing planning with deep reinforcement-learning in smart grid communication networks
CN114583696A | Power distribution network reactive power optimization method and system based on BP neural network and scene matching
CN115001978A | Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN114219125A | High-elasticity urban power grid multi-dimensional intelligent partitioning method
CN114697200B | Protection device proportion optimization method of 5G distribution network distributed protection system
CN114417710A | Overload dynamic decision generation method and related device for power transmission network
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination