CN114298429A - Power distribution network scheme aided decision-making method, system, device and storage medium - Google Patents

Power distribution network scheme aided decision-making method, system, device and storage medium

Info

Publication number
CN114298429A
CN114298429A (application CN202111661200.8A)
Authority
CN
China
Prior art keywords
power distribution
distribution network
decision
model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111661200.8A
Other languages
Chinese (zh)
Inventor
齐小伟
陈秀海
李昕
李永勋
姚巍
韩爽
关鹏
陈佳博
彭博
张育臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Beijing Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202111661200.8A priority Critical patent/CN114298429A/en
Publication of CN114298429A publication Critical patent/CN114298429A/en
Pending legal-status Critical Current

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a power distribution network scheme aided decision-making method, system, device and storage medium. The method comprises the following steps: acquiring online operation data of the power distribution network; inputting the online operation data into a preset power distribution network scheme aided decision model; and the model outputs a network structure that ensures safe and stable operation of the power distribution network. The power distribution network scheme aided decision model is trained with the DQN reinforcement learning algorithm. The topological structure of the power distribution network can thus be reconfigured simply by inputting the current state of the online power distribution network. Because the decision process does not require enumerating and evaluating every possible reconfiguration scheme of the whole network, the computational burden and time consumption are small, so the reinforcement-learning-based scheme aided decision-making method offers high speed and high efficiency.

Description

Power distribution network scheme aided decision-making method, system, device and storage medium
Technical Field
The invention belongs to the technical field of power grid operation safety, and particularly relates to a power distribution network scheme aided decision-making method, system, device and storage medium.
Background
With the rapid development of urban power distribution network technology, distribution network construction is gradually entering a high-reliability stage. According to statistics, more than three quarters of user power-failure incidents are caused by faults in the power distribution network. Meanwhile, with the spread of distributed renewable generation (DRG), distribution networks face changes in the relationship between supply and demand, so ensuring the safe and economic operation of the distribution network is becoming increasingly important. Distribution Network Reconfiguration (DNR) adjusts the topology of the distribution network by controlling the open/closed states of tie switches, which reduces network losses, improves the voltage quality of the distribution network, and ensures the safe and stable operation of the power grid.
Traditional methods for optimizing power distribution network scheduling through network reconfiguration mainly include brute-force search and heuristic algorithms such as the genetic algorithm. However, the topology of a power distribution network is complex, its equipment is geographically dispersed and of many types, and equipment operating states are easily affected by external factors. As a result, traditional reconfiguration methods require a large amount of computation, take a long time, and yield low reconfiguration efficiency.
Disclosure of Invention
The invention aims to provide a power distribution network scheme aided decision-making method, system, device and storage medium, so as to solve the problem that existing heuristic methods, such as the traditional genetic algorithm, take too long to compute, leading to low decision-making efficiency and untimely decisions.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a power distribution network scheme aided decision method based on a deep neural network, comprising the following steps:
acquiring online operation data of the power distribution network;
inputting the online operation data of the power distribution network into a preset power distribution network scheme aided decision model; the model outputs a network structure that ensures safe and stable operation of the power distribution network;
the power distribution network scheme aided decision model is trained on the basis of a DQN reinforcement learning algorithm.
Optionally, the online operation data of the power distribution network include the real-time or predicted switch states, wind turbine power generation, photovoltaic power generation, and load sizes.
Optionally, the power distribution network scheme aided decision model is obtained in the following manner:
acquiring historical operation data of the power distribution network;
setting an ε-greedy strategy for action selection;
establishing a power distribution network system environment model based on the historical operation data of the power distribution network, and establishing a deep reinforcement learning model of an intelligent agent;
and based on the preset ε-greedy strategy, performing offline training and learning with the power distribution network system environment model and the deep reinforcement learning model to obtain a power distribution network scheme aided decision model that meets the error requirement.
Optionally, after the historical operation data of the power distribution network are obtained, they are preprocessed and converted into an original sample set suitable for the reinforcement learning algorithm.
Optionally, the offline training and learning with the power distribution network system environment model and the deep reinforcement learning model specifically proceed as follows:
each time the environment model executes an action given by the deep reinforcement learning model, it returns a new system state and calculates the corresponding reward value; given the current state, the deep reinforcement learning model takes the control action that maximizes the expected reward as its objective, and continuously learns and improves its action policy while interacting with the environment model.
Optionally, establishing the power distribution network system environment model includes setting the agent state space, the action space, and the agent reward/penalty mechanism.
Optionally, when the deep reinforcement learning model of the agent is constructed, two neural networks are used: a real network that generates the current Q value and a target network that generates the target Q value; the two networks have the same initial weights and parameters but different parameter updating speeds.
In a second aspect of the present invention, a system for the power distribution network scheme aided decision-making method based on the deep neural network is provided, comprising:
the data acquisition module is used for acquiring the online operation data of the power distribution network;
the prediction module is used for inputting the online operation data of the power distribution network into the preset power distribution network scheme aided decision model; the power distribution network scheme aided decision model outputs a network structure for ensuring safe and stable operation of the power distribution network.
In a third aspect of the present invention, a computer apparatus is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the power distribution network scheme aided decision-making method based on the deep neural network.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, it implements the power distribution network scheme aided decision-making method based on the deep neural network.
The invention has the following beneficial effects:
(A) The invention provides a neural-network-based power distribution network scheme aided decision method that uses an artificial intelligence algorithm: firstly, the risk early-warning problem of the power distribution network is cast as a Markov decision process, with the network structure, generation and load of the power distribution network selected as the state, the open/closed positions of all adjustable line switches as the actions, and preservation of radiality together with safe and stable operation as the reward; a DQN reinforcement learning algorithm is then used to train the power distribution network aided decision model. The topology of the power distribution network can therefore be reconfigured simply by inputting the current state of the online network. Because the decision process does not require enumerating every possible reconfiguration scheme of the whole network, the computation and time required are small, so the reinforcement-learning-based scheme aided decision method is fast and efficient.
(B) The DQN algorithm adopted by the invention is widely used in the field of reinforcement learning and performs well; it integrates a deep neural network with the Q-learning reinforcement learning algorithm.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of the power distribution network scheme aided decision method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the interaction of a reinforcement learning agent with the environment in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a training process of a DQN reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 4 is a flow chart of a Markov Decision Process (MDP) in an embodiment of the present invention;
FIG. 5 is a diagram of the DQN reinforcement learning algorithm neural network in an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the embodiments and the attached drawings. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
The embodiments of the invention provide a power distribution network scheme aided decision-making method, system, device and storage medium, which solve the current problems of low decision-making efficiency and untimely decisions caused by the excessive computation time of heuristic methods such as the traditional genetic algorithm.
As shown in fig. 1, the first aspect of the present invention provides a power distribution network scheme aided decision-making method based on a deep neural network. The aided decision-making process is learned autonomously by a reinforcement learning model, and the trained model can make a decision immediately from the current power distribution network state to maintain stable operation of the network. The method includes the following steps:
step 1: and (3) data acquisition and preprocessing, namely selecting historical operation data of the power distribution network in a certain area as a data source, and converting the historical operation data of the power distribution network related to power distribution network aid decision into an original sample set suitable for a reinforcement learning algorithm.
Specifically, the historical operating data of the power distribution network comprise data of the power distribution system that can be simulated (corresponding to the "environment" in RL) and various input data (corresponding to parts of the "state" in RL, such as the wind turbine outputs, photovoltaic generation, loads, and switch operation records).
The data preprocessing process mainly comprises the following steps:
step 11: the numbers of WT, PV and load units in the distribution system are denoted N_WT, N_PV and N_D respectively. The required data then comprise the data sets of distributed renewable generation:

P_t^WT = { P_{1,t}^WT , P_{2,t}^WT , … , P_{N_WT,t}^WT }

P_t^PV = { P_{1,t}^PV , P_{2,t}^PV , … , P_{N_PV,t}^PV }

and the load data set:

D_t = { P_{1,t}^D , P_{2,t}^D , … , P_{N_D,t}^D }

step 12: the time interval between the last operation of each switch and the current moment is recorded at the same time:

SW^tr = { sw_{tr,1} , sw_{tr,2} , … , sw_{tr,N_SW} }
When the time interval is smaller than the preset value, the switch should not be actuated again within that window. Once the interval reaches T_SW the recorded value does not increase further, which is sufficient to indicate that the switch is not being over-used. Thus SW^tr is not prepared in advance but, after being assigned an initial value, is derived from the agent's actions.
Step 13: the data P_t^WT, P_t^PV and D_t are acquired from various online or offline sources and then preprocessed, including data cleaning, normalization and the like.
Specifically, data cleaning refers to eliminating problem data such as missing values and erroneous measurements (zero or negative values) and replacing them by linearly connecting the nearest valid data points.
Specifically, data normalization scales the raw data: P_t^WT, P_t^PV and D_t are normalized to X_WT, X_PV and X_D respectively by removing all negative values and dividing all data by their maximum value. Before being input to the DQN, SW^tr is normalized to X_tr by dividing each element by T_SW.
Step 14: the normalized data is divided into a training set and a test set.
Specifically, in ordinary reinforcement learning the state is updated purely through the agent's interaction with the environment and does not depend on special external data. In this embodiment, however, the state depends on the power distribution system (the environment) and on the distributed renewable generation and load data, which vary with external conditions such as the weather. If the data were not separated, the DQN might overfit the training data set, possibly leading to reconfiguration failures. A test set is therefore set aside in this embodiment to verify that the learned DQN still performs correctly under relatively unfavourable conditions.
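As an illustrative sketch of this preprocessing step (the function and parameter names, the 80/20 split ratio, and the array layout are assumptions rather than part of the patent text), the normalization and train/test split might look as follows:

```python
import numpy as np

def preprocess(p_wt, p_pv, p_d, sw_tr, t_sw, train_ratio=0.8):
    """Normalize the raw series and split them into training / test sets.

    p_wt, p_pv, p_d : arrays of shape (T, N_WT), (T, N_PV), (T, N_D).
    sw_tr           : time-since-last-actuation record for each switch.
    t_sw            : the preset minimum switching interval T_SW.
    """
    def scale(x):
        x = np.clip(x, 0.0, None)      # treat negative (erroneous) samples as zero
        return x / x.max()             # divide by the maximum value

    x_wt, x_pv, x_d = scale(p_wt), scale(p_pv), scale(p_d)
    x_tr = np.asarray(sw_tr) / t_sw    # SW^tr scaled by T_SW before entering the DQN

    split = int(train_ratio * len(x_d))   # hold out a test set to check generalization
    train = (x_wt[:split], x_pv[:split], x_d[:split])
    test = (x_wt[split:], x_pv[split:], x_d[split:])
    return train, test, x_tr
```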
Step 2: set the ε-greedy strategy for action selection.
Specifically, in reinforcement learning the agent's action-selection rule is called a policy, and the goal is to determine the policy that achieves the highest reward. Before training, the DQN's weights and biases are randomly initialized, so it initially cannot identify the action that delivers the highest reward. Instead of relying on the DQN output from the outset, the agent takes random actions to "explore" which actions can earn high rewards. This embodiment therefore uses the intuitive and simple ε-greedy exploration strategy.
With probability ε the agent selects the action with the maximum Q value in the current state, and with probability 1 − ε it selects a random action a_t, so that the action space is explored as fully as possible. The ε-greedy rule is:

a_t = argmax_a Q(s_t, a)   with probability ε
a_t = a random action in A   with probability 1 − ε
In the ε-greedy strategy, the larger the value of ε, the faster the convergence but the easier it is to fall into a local optimum. Therefore, in this embodiment the agent explores the action space with a high probability in the early stage, when it lacks effective information; as learning continues and the accumulated experience becomes more accurate, the exploration probability is gradually reduced.
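A minimal sketch of this ε-greedy rule, following the convention above in which ε is the probability of taking the greedy action (the function and variable names are illustrative):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=np.random.default_rng()):
    """epsilon-greedy action selection: with probability epsilon take the
    greedy action (largest Q value), with probability 1 - epsilon take a
    uniformly random action to keep exploring the action space."""
    if rng.random() < epsilon:
        return int(np.argmax(q_values))        # exploit
    return int(rng.integers(len(q_values)))    # explore
```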
Step 3: establish the power distribution network system environment model from the preprocessed data, and build the agent's deep reinforcement learning model. Set the corresponding state space, action space and reward (penalty) function, and perform offline training and learning with the DQN reinforcement learning model and the simulation environment to obtain a power distribution network scheme aided decision model that meets the error requirement.
FIG. 2 shows the process of the reinforcement learning agent interacting with the system environment. Each time the environment executes the action given by the agent, it returns the new system state and calculates the corresponding reward value; given the current state, the agent aims to select the control action that maximizes the expected reward, continuously learning and improving its policy while interacting with the environment.
As an example applied to the present invention, a Deep Q Network (DQN) fits the action-value function with a neural network, learning the Q values of state-action pairs from a limited number of interactions with the environment and thereby learning the optimal policy.
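For reference, the action-value fitting described here corresponds to the standard DQN temporal-difference target and loss (stated explicitly for clarity; the patent text introduces it only in steps (15)–(16) of the algorithm flow below):

y_t = r_t + γ · max_{a′} Q(s_{t+1}, a′; ω′)

L(ω) = ( y_t − Q(s_t, a_t; ω) )²

where ω are the weights of the real (online) network and ω′ those of the target network.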
Fig. 3 shows a DQN algorithm training process, wherein step 3 specifically comprises the following steps:
step 31: constructing a power distribution network system environment model;
The power distribution network system environment model is the power system environment that interacts with the agent: for each action given by the reinforcement learning agent, it calculates whether a risk exists under that policy and feeds back the reward, after which the policy is updated; iteration continues until the optimal policy is learned.
In the present embodiment, the environment is formalized as a Markov decision process (Markov de)Precision process, MDP). MD P can be defined as a tuple (, a, P, R, γ) representing state space, action space, state transition probability, reward function and discount factor, respectively. Agent observes state s from the environmenttE.g. S, and take action a at time step ttE.g. A, agent with probability P(s)t+1|st,at) To a new state st+1While receiving the prize r(s)t,at,st+1). The state transition process is shown in fig. 4.
Step 32: agent action space
A random action consists of an array of 0s (open switch) and 1s (closed switch). When an action is extracted from the DQN, the output layer is arranged in the same array form: the entries for switches to be closed are set to 1 and the remaining entries to 0. The determined action is then applied to the test system to open or close each switch, and SW^tr is updated by comparing the new switch states with the states stored after the previous action.
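The following sketch illustrates only this bookkeeping (applying a 0/1 switch array and updating the switching record); the actual opening/closing of switches and the power-flow evaluation are performed by the simulation environment. The names and the step-based time unit are assumptions:

```python
import numpy as np

def apply_action(action_bits, prev_bits, sw_tr, t_sw):
    """Update the stored switch states and the switching record SW^tr.

    action_bits : 0/1 array from the DQN output layer (1 = switch closed).
    prev_bits   : switch states after the previous action.
    sw_tr       : steps elapsed since each switch last changed state.
    t_sw        : cap on the recorded interval (see step 12).
    """
    changed = action_bits != prev_bits
    # Reset the record for switches that just changed; otherwise count up, capped at t_sw.
    sw_tr = np.where(changed, 0, np.minimum(sw_tr + 1, t_sw))
    return action_bits.copy(), sw_tr
```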
Step 33: agent reward (penalty) mechanism
The agent's primary purpose in this application is to preserve the radiality of the network by maximizing its cumulative reward through continuous learning over time. Furthermore, the reconfigured network must meet general power network constraints;
as an example of the invention, line loading and bus voltages are kept within given ranges for a given amount of generation and load. The reward (penalty) mechanism of the DQN model is as follows:
Given state s and action a, the total reward r_t at each time step may be expressed as follows:

r_t = r_init − p_line − p_bus − p_sw

p_line = α · Σ_{i=1…L} max(0, I_i − 1)

p_bus = β · Σ_{j=1…B} [ max(0, V_j − 1.05) + max(0, 0.95 − V_j) ]

p_sw = γ · Σ_k max(0, T_SW − sw_{tr,k})

where L is the number of lines in the network; I_i is the loading of the i-th line as a fraction of its ampacity; α is the line overload penalty weight; B is the total number of buses in the network; V_j is the per-unit voltage of the j-th bus; β is the bus voltage penalty weight; and γ is the switch over-use penalty weight.
If the reconfigured network is radial, the agent receives the base reward r_init at each time step; otherwise it receives a negative reward (a penalty) r_fail, and the simulated episode terminates immediately.
If the reconfigured network is radial but violates the line capacity or bus voltage constraints, the agent is penalized through p_line and p_bus according to the degree of violation and the corresponding weighting factors.
p_line is calculated by taking the allowed line loading as 1 (100 % of ampacity) and multiplying the excess of each violating line i by the weight coefficient α. Likewise, p_bus takes the acceptable bus voltage range as 0.95–1.05 p.u. and multiplies the violation of each offending bus j by the weight factor β. In addition, the penalty term p_sw prevents frequent operation of the sectionalizing switches: this embodiment uses the switching record SW^tr to identify how much time has elapsed since each switch was last actuated, and whenever sw_{tr,k} < T_SW, p_sw adds the difference between T_SW and sw_{tr,k} multiplied by the weight factor γ. In other embodiments, users of the model may adjust the penalty weight factors according to the distribution network environment in which it is applied.
Step 34: DQN algorithm process;
In this embodiment, DQN turns the Q-table update into a function-fitting problem: a function is fitted in place of the Q-table to produce the Q values, so that similar states yield similar output actions. Deep neural networks are good at extracting complex features, and combining deep learning with reinforcement learning in this way gives the DQN algorithm. The neural network of the DQN reinforcement learning algorithm is constructed as shown in fig. 5.
Another innovation of the DQN algorithm is that it addresses the problems of sample correlation and non-stationary distributions with experience replay (an experience pool). The agent's experience at time t, (s_t, a_t, r_t, s_{t+1}), is stored in a replay memory D of size N; mini-batches of a certain size are then sampled at random from D for parameter learning. The size of the experience pool is limited both by memory constraints and by the need to train on reasonably recent data. This approach reduces the number of interactions required with the environment, improves data efficiency, removes the bias caused by correlation between consecutive training samples, and improves generalization. In addition, two neural networks are used: a real network that produces the current Q value and a target network that produces the target Q value; they start with the same weights and parameters but are updated at different rates, which further reduces data correlation.
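A minimal sketch of these two ingredients (the experience pool and the Q network), written in PyTorch; the layer sizes and the use of a discrete action index over candidate switch configurations are assumptions, not part of the patent text:

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q network: state -> one Q value per candidate action."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class ReplayMemory:
    """Experience pool D of size N storing (s_t, a_t, r_t, s_{t+1}, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```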
The specific algorithm flow is as follows:
(1) initializing the replay memory D, which can hold N transitions;
(2) initializing the real Q network with randomly generated weights ω;
(3) generating the target Q network with the same structure and parameters, setting its weights ω′ = ω;
(4) looping over the episodes 1, 2, …, M (where M is the total number of days);
(5) initializing the preprocessed first state s_1;
(6) looping over the steps t = 1, 2, …, T of each episode (at 15-minute intervals one day is divided into 96 points, so T = 96);
(7) generating the action a_t with the ε-greedy policy: with a small probability selecting a random action a_t; otherwise greedily selecting the action with the largest current value function, a_t = argmax_a Q(s_t, a; ω);
(8) executing switching action in a power distribution network simulation environment;
(9) if the reconfigured network is radial:
(10) solving the power flow, saving the loadings {I_1, …, I_L} of all lines and the per-unit voltages {V_1, …, V_B} of all buses, updating the switching record, and calculating the reward r_t according to the first formula of step 33;
(11) otherwise setting r_t = r_fail;
(12) receiving the reward r_t and the new state s_{t+1};
(13) storing the transition (s_t, a_t, r_t, s_{t+1}) in D;
(14) uniformly and randomly sampling a mini-batch of transitions (s_j, a_j, r_j, s_{j+1}) from D;
(15) judging whether the episode has terminated: if so, setting the target to r_j; otherwise calculating the TD target r_j + γ · max_{a′} Q(s_{j+1}, a′; ω′) with the target network parameters ω′;
(16) executing a gradient descent step on the squared error between the target and Q(s_j, a_j; ω) to update the network parameters;
(17) updating the action-value network parameters as ω ← ω + Δω;
(18) updating the target Q network (ω′ ← ω) every C steps;
(19) ending the loop over the steps of each episode;
(20) ending the loop over episodes.
After the training and learning described above, a power distribution network scheme aided decision model based on the reinforcement learning algorithm is obtained, allowing the power distribution network structure to be regulated and controlled accurately.
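Putting the pieces together, the offline training loop of steps (1)–(20) might be sketched as follows. It reuses the QNetwork, ReplayMemory and epsilon_greedy sketches above and assumes a simulation-environment object exposing reset() and step(action); that interface, like the hyperparameter values, is an assumption rather than part of the patent text:

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_dqn(env, state_dim, n_actions, episodes=50, steps_per_day=96,
              memory_size=10_000, batch_size=32, gamma=0.95, lr=1e-3,
              target_update=200, greedy_prob=0.9):
    """Offline training sketch of algorithm steps (1)-(20)."""
    q_net = QNetwork(state_dim, n_actions)           # real network, weights ω
    target_net = QNetwork(state_dim, n_actions)      # target network, weights ω'
    target_net.load_state_dict(q_net.state_dict())   # (3): ω' = ω
    memory = ReplayMemory(memory_size)                # (1): replay memory D of size N
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)

    step_count = 0
    for _ in range(episodes):                         # (4): one episode per day, M in total
        state = env.reset()                           # (5): preprocessed first state s_1
        for _ in range(steps_per_day):                # (6): 96 points at 15-minute resolution
            with torch.no_grad():
                q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
            action = epsilon_greedy(q_values.numpy(), greedy_prob)         # (7)
            next_state, reward, done = env.step(action)                    # (8)-(12)
            memory.push((state, action, reward, next_state, float(done)))  # (13)
            state = next_state

            if len(memory.buffer) >= batch_size:
                batch = memory.sample(batch_size)                          # (14)
                s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                                  for x in zip(*batch))
                q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():                                      # (15): TD target
                    y = r + gamma * target_net(s2).max(1).values * (1.0 - d)
                loss = F.mse_loss(q_sa, y)                                 # (16)-(17)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            step_count += 1
            if step_count % target_update == 0:                            # (18): every C steps
                target_net.load_state_dict(q_net.state_dict())
            if done:
                break
    return q_net
```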
Step 4: use the online operation data of the power distribution network directly, and perform online aided decision-making with the power distribution network scheme aided decision model. This specifically comprises the following steps:
Step 41: input the online operation data of the power distribution network, as the state, into the power distribution network scheme aided decision model;
as an example of the invention, the online operation data of the power distribution network is real-time or predicted switch state, fan power generation, photovoltaic power generation and load size.
Step 42: the power distribution network scheme aided decision model directly outputs a network structure that ensures safe and stable operation of the power distribution network.
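At this point the online aided decision step reduces to a single forward pass through the trained network; a short sketch (the state layout and function names are assumptions):

```python
import torch

def online_decision(q_net, online_state):
    """Return the index of the recommended switch configuration for the current
    online state (real-time or predicted switch states, WT / PV generation and
    loads), using the trained Q network."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(online_state, dtype=torch.float32))
    return int(torch.argmax(q_values))
```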
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.

Claims (10)

1. A power distribution network scheme aided decision-making method based on a deep neural network, characterized by comprising the following steps:
acquiring online operation data of the power distribution network;
inputting the online operation data of the power distribution network into a preset power distribution network scheme aided decision model; the power distribution network scheme aided decision model outputs a network structure for ensuring safe and stable operation of the power distribution network;
the power distribution network scheme aided decision model is trained on the basis of a DQN reinforcement learning algorithm.
2. The power distribution network scheme aided decision-making method based on the deep neural network according to claim 1, wherein the power distribution network online operation data comprise real-time or predicted switch states, wind turbine power generation, photovoltaic power generation and load sizes.
3. The power distribution network scheme aided decision-making method based on the deep neural network as claimed in claim 1, wherein the power distribution network scheme aided decision model is obtained in the following manner:
acquiring historical operation data of the power distribution network;
setting an ε-greedy strategy for action selection;
establishing a power distribution network system environment model based on the historical operation data of the power distribution network, and establishing a deep reinforcement learning model of an intelligent agent;
and based on the preset ε-greedy strategy, performing offline training and learning with the power distribution network system environment model and the deep reinforcement learning model to obtain a power distribution network scheme aided decision model that meets the error requirement.
4. The power distribution network scheme aided decision-making method based on the deep neural network according to claim 3, characterized in that after the power distribution network historical operation data are obtained, they are preprocessed and converted into an original sample set suitable for the reinforcement learning algorithm.
5. The power distribution network scheme aided decision-making method based on the deep neural network as claimed in claim 3, wherein the offline training and learning are performed with the power distribution network system environment model and the deep reinforcement learning model, specifically comprising:
the power distribution network system environment model returns a new system state and calculates a corresponding reward value every time the power distribution network system environment model executes the action given by the deep reinforcement learning model; and the deep reinforcement learning model continuously learns and improves the action strategy in the interaction process with the power distribution network system environment model by taking the control action capable of maximizing the reward expectation value as the target according to the current state.
6. The power distribution network scheme aided decision-making method based on the deep neural network according to claim 3, wherein establishing the power distribution network system environment model comprises: setting an agent state space, an action space, and an agent reward/penalty mechanism.
7. The power distribution network scheme aided decision-making method based on the deep neural network as claimed in claim 3, wherein, when the deep reinforcement learning model of the agent is constructed, two neural networks are used: a real network that generates the current Q value and a target network that generates the target Q value, the two networks having the same initial weights and parameters but different parameter updating speeds.
8. A system for the power distribution network scheme aided decision-making method based on the deep neural network, characterized by comprising:
the data acquisition module is used for acquiring the online operation data of the power distribution network;
the prediction module is used for inputting the online operation data of the power distribution network into the preset power distribution network scheme aided decision model; the power distribution network scheme aided decision model outputs a network structure for ensuring safe and stable operation of the power distribution network.
9. A computer apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the power distribution network scheme aided decision-making method based on the deep neural network according to any one of claims 1 to 7.
10. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the power distribution network scheme aided decision-making method based on the deep neural network according to any one of claims 1 to 7.
CN202111661200.8A 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium Pending CN114298429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111661200.8A CN114298429A (en) 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111661200.8A CN114298429A (en) 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN114298429A true CN114298429A (en) 2022-04-08

Family

ID=80974083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111661200.8A Pending CN114298429A (en) 2021-12-30 2021-12-30 Power distribution network scheme aided decision-making method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN114298429A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662982A (en) * 2022-04-15 2022-06-24 四川大学 Urban power distribution network multi-stage dynamic reconstruction method based on machine learning
CN114662982B (en) * 2022-04-15 2023-07-14 四川大学 Multistage dynamic reconstruction method for urban power distribution network based on machine learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination