CN114400675A - Active power distribution network voltage control method based on weight mean value deep double-Q network - Google Patents
Active power distribution network voltage control method based on weight mean value deep double-Q network
- Publication number
- CN114400675A CN114400675A CN202210074238.3A CN202210074238A CN114400675A CN 114400675 A CN114400675 A CN 114400675A CN 202210074238 A CN202210074238 A CN 202210074238A CN 114400675 A CN114400675 A CN 114400675A
- Authority
- CN
- China
- Prior art keywords
- value
- action
- network
- state
- adjustable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/12—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
- H02J3/16—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/12—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
- H02J3/14—Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by switching loads on to, or off from, network, e.g. progressively balanced loading
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/28—Arrangements for balancing of the load in a network by storage of energy
- H02J3/32—Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
- H02J3/322—Arrangements for balancing of the load in a network by storage of energy using batteries with converting means the battery being on-board an electric or hybrid vehicle, e.g. vehicle to grid arrangements [V2G], power aggregation, use of the battery for network load balancing, coordinated or cooperative battery charging
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/48—Controlling the sharing of the in-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/50—Controlling the sharing of the out-of-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02B—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
- Y02B70/00—Technologies for an efficient end-user side electric power management and consumption
- Y02B70/30—Systems integrating technologies related to power network operation and communication or information technologies for improving the carbon footprint of the management of residential or tertiary loads, i.e. smart grids as climate change mitigation technology in the buildings sector, including also the last stages of power distribution and the control, monitoring or operating management systems at local level
- Y02B70/3225—Demand response systems, e.g. load shedding, peak shaving
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/30—Reactive power compensation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
- Y02T90/10—Technologies relating to charging of electric vehicles
- Y02T90/16—Information or communication technologies improving the operation of electric vehicles
- Y02T90/167—Systems integrating technologies related to power network operation and communication or information technologies for supporting the interoperability of electric or hybrid vehicles, i.e. smartgrids as interface for battery charging of electric vehicles [EV] or hybrid vehicles [HEV]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S20/00—Management or operation of end-user stationary applications or the last stages of power distribution; Controlling, monitoring or operating thereof
- Y04S20/20—End-user application control systems
- Y04S20/222—Demand response systems, e.g. load shedding, peak shaving
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S30/00—Systems supporting specific end-user applications in the sector of transportation
- Y04S30/10—Systems supporting the interoperability of electric or hybrid vehicles
- Y04S30/14—Details associated with the interoperability, e.g. vehicle recognition, authentication, identification or billing
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses an active power distribution network voltage control method based on a weight mean value deep double-Q network, a deep reinforcement learning method applied in the field of distribution network/microgrid voltage regulation. The voltage distribution of the distribution network/microgrid and the adjustable capacity/power of the electric vehicle clusters form the state set, the output power of the adjustable micro-sources forms the action set, and the corrected node voltage fluctuation serves as the instant reward for deep reinforcement learning training, so that the agent learns the output actions most beneficial to voltage regulation under different voltage distributions and adjustable-resource environments. Designing the reward target value with a weighted mean combines the characteristics of the reward-target designs of the deep Q network and the deep double-Q network, thereby avoiding the overestimation of the reward target value in the deep Q network and its underestimation in the deep double-Q network.
Description
Technical Field
The invention relates to an active power distribution network voltage regulation method based on a weight-mean-value deep double-Q network (WDDQN), and in particular to an intelligent voltage regulation method for distribution network/microgrid voltage fluctuation and limit violations caused by fluctuating renewable-energy output, voltage drops caused by the network topology, and other factors, applied in environments where electric vehicles are connected to the distribution network and microgrid on a large scale.
Background
With the continuous development of renewable energy and the growth of electric vehicles, a variety of distributed energy resources, including renewables, distributed generation units (DG) and energy storage, are being connected to the distribution network/microgrid on a large scale. However, the intermittency and randomness of renewables such as photovoltaics and wind turbines, coupled with load fluctuations, create source-load mismatch and hence voltage fluctuation and limit-violation problems. In addition, the distribution network/microgrid is generally a low-voltage grid; the line reactance of a low-voltage line is not far greater than its resistance, so the coupling between active power and voltage cannot be ignored. Therefore, adjusting both the active and reactive power output of the adjustable micro-sources in the distribution network/microgrid affects the voltage distribution, which makes voltage control in this scenario more difficult.
Various methods exist for distribution network/microgrid voltage regulation, such as local control based on droop control and hierarchical control through mathematical optimization or intelligent algorithms. However, these methods generally require modelling the multiple types of micro-sources before solving, and commonly suffer from heavy modelling workload, large computational burden, difficulty of online control and a tendency to fall into local optima. Moreover, for electric vehicles participating in voltage regulation, since their charging and discharging is an active-power exchange, a dedicated voltage-regulation model and algorithm must be designed when they are regulated jointly with reactive power sources, making the optimization even harder. For these reasons, voltage control methods based on reinforcement learning have attracted attention. For example, Q-learning has been used for voltage control, but its state and action sets are discrete and limited, so it is difficult to handle large-scale network structures. Deep Q-learning (DQN), which combines Q-learning with deep learning, makes the state set continuous, but DQN often overestimates the reward value, which leads to unreasonable selection of voltage-regulation actions.
Disclosure of Invention
To avoid the defects of the prior art, the invention provides an active power distribution network voltage control method based on a weight mean value deep double-Q network. A weighted-mean method is adopted in the design of the reward target value, combining the characteristics of the reward-target designs of the deep Q network and the deep double-Q network; this avoids the overestimation of the reward target value in the deep Q network and its underestimation in the deep double-Q network, and achieves a reasonable evaluation of the reward value. For the coupling of voltage with both active and reactive power in a low-voltage distribution network/microgrid, a suitable action-set design enables joint control of the output power of active and reactive micro-sources, so that the optimal output of the adjustable micro-sources is finally achieved under different voltage-distribution states and adjustable-resource conditions of the electric vehicle clusters.
The invention adopts the following technical scheme for solving the technical problems:
the active power distribution network voltage control method based on the weight mean value deep double-Q network is characterized by comprising the following steps:
Step 1, determining a state set S and an action set A according to the voltage distribution and the adjustable power-supply conditions in the power grid, setting an instant reward r, and obtaining the reward function Q(s, a) of action a in state s, characterized by formula (1);
in formula (1):
s ∈ S; a ∈ A; E(·) is the expected value; γ is the discount factor;
s' represents the new state reached by taking action a in state s, and a' represents the new action taken in state s';
p(s, s') is the probability of state s transitioning to the new state s';
Q(s', a') is the reward function of the new action a' in the new state s';
Step 2, designing the network structure and the network loss function L(θ) of the weight mean value deep double-Q neural network:
the network structure comprises an input layer, an output layer and a plurality of hidden layers;
the input layer takes the current state s_t of the state set S at the current time t as input; the current state s_t is a state vector;
the output layer outputs the reward function estimates Q_t(s_t, A|θ) of all actions in the action set A in the current state s_t at the current time t, where θ denotes the parameters of the reward function estimate;
each hidden layer of the plurality of hidden layers comprises a plurality of neurons; the activation function of the neurons is ReLU;
the network loss function L(θ) is characterized by formula (2):
L(θ) = E[(y_WDDQN - Q(s, a|θ))²]   (2)
in formula (2):
Q(s, a|θ) is the reward function estimate of action a in state s;
y_WDDQN is the reward target value, obtained by the weighted-mean method of formula (3):
y_WDDQN = r + γ(βQ(s', a*|θ) + (1 - β)Q(s', a*|θ⁻))   (3)
in formula (3):
β is a weight; a* is the action that maximizes the current reward function estimate in state s';
Q(s', a*|θ⁻) is the target reward function value of action a* in state s', with θ⁻ as the parameters of the target reward function value;
Q(s', a*|θ) is the reward function estimate of action a* in state s', i.e. the current reward function estimate, with θ as the parameters of the reward function value;
the current reward function estimate Q(s', a*|θ) is produced by the output layer of the neural network and is continuously updated in the neural network, forming the online network;
the target reward function value Q(s', a*|θ⁻) is produced by the target network; the target network has the same structure as the online network, and the target network parameters are copied from the online network every set number of steps;
the weight β is obtained by formula (4);
in formula (4):
a_L is the action that minimizes the reward function estimate in state s'; c is a hyperparameter used to adjust the weight;
Q(s', a_L|θ⁻) is the target reward function value of action a_L in state s', with θ⁻ as the parameters of the target reward function value;
Step 3, designing a dynamic ε-greedy strategy:
the dynamic ε-greedy strategy randomly selects an action with probability ε and selects the action with the maximum current reward value with probability (1 - ε); ε is characterized by formula (5):
in formula (5):
δ is an adjustment coefficient whose value is a constant less than 1;
Step is the number of steps; X_0 is the initial exploration value, a positive number; a_r is an action randomly selected in state s;
Q(s, a_r|θ) is the reward function estimate of action a_r in state s, with θ as the parameters of the reward function value;
Q(s, a|θ) is the reward function estimate of action a in state s, with θ as the parameters of the reward function value;
Step 4, implementing the voltage control of the distribution network/microgrid as follows:
4.1, establishing the weight mean value deep double-Q neural network according to Step 2, initializing the parameters θ of the neural network, initializing the capacity of the memory set D and the sample size of the sampling set, reading the prediction result of the adjustable power of the electric vehicles, and setting the step number Step to 0;
4.2, reading the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtaining the current state s_t;
4.3, inputting the current state s_t into the online network to obtain the reward function estimates Q_t(s_t, A|θ) of all actions;
4.4, selecting the current action a_t from the action set A of Step 1 according to the dynamic ε-greedy strategy obtained in Step 3, and applying the current action to the power grid for load-flow calculation to obtain the new state s_{t+1};
4.5, calculating the instant reward r from the new state s_{t+1};
4.6, storing {s_t, a_t, s_{t+1}, r} in the memory set D, and then judging whether the memory set D is full;
if the memory set D is not full, returning to the step 4.2;
if the memory set D is full, go to step 4.7;
4.7, sampling from the memory set D into the online network and the target network, computing Q(s, a|θ) and y_WDDQN respectively, calculating the loss function L(θ), and updating the online network parameters by stochastic gradient descent;
4.8, incrementing Step by 1, and copying the online network parameters to the target network every fixed number of steps C;
4.9, judging whether the value of Step is the maximum;
if the value of Step is not the maximum value, returning to the Step 4.7;
if the value of Step is the maximum value, the action with the maximum reward value is output by the current online network, the reinforcement learning process is completed, and the voltage control is realized.
The active power distribution network voltage control method based on the weight mean value deep double-Q network is also characterized in that, in Step 1, the state set S, the action set A and the instant reward r are set as follows:
the state set S is the set of all state vectors; the state vector s_t characterizes the node voltage distribution of the power grid and the adjustable condition of the electric vehicle clusters at the current time t, and is given by formula (6):
s_t = {U_{1,t}, ..., U_{N,t}, ..., P_{El,t,min}, ..., P_{El,t,max}, ..., C_{El,t,min}, ..., C_{El,t,max}, ...}   (6)
in formula (6):
i denotes a node, i = 1, 2, …, N, where N is the number of nodes in the voltage regulation area;
U_{i,t} denotes the voltage amplitude of the i-th node at the current time t;
U_{1,t} is the voltage amplitude of the 1st node at the current time t, and U_{N,t} is the voltage amplitude of the N-th node at the current time t;
P_{El,t,min} is the minimum adjustable power of the l-th electric vehicle cluster at the current time t;
P_{El,t,max} is the maximum adjustable power of the l-th electric vehicle cluster at the current time t;
C_{El,t,min} is the minimum adjustable capacity of the l-th electric vehicle cluster at the current time t;
C_{El,t,max} is the maximum adjustable capacity of the l-th electric vehicle cluster at the current time t;
L denotes the number of adjustable electric vehicle clusters, so the state vector s_t has N + 4L elements;
the action set A is the set of all action vectors; in the current state s_t, the action vector a_t characterizes the output actions of the adjustable micro-sources and is given by formula (7);
in formula (7):
j denotes an adjustable micro-source, j = 1, 2, …, m, where m is the total number of adjustable micro-sources;
k denotes an adjustable action, k = 0, 1, …, K - 1, where K is the total number of adjustable actions of each unit;
the action set A therefore has K^m elements;
the k-th output action of the j-th adjustable micro-source at the current time t is defined in terms of the following quantities:
Q_{j,min} is the minimum adjustable reactive power of the j-th adjustable micro-source;
Q_{j,max} is the maximum adjustable reactive power of the j-th adjustable micro-source;
P_{j,t,min} is the minimum adjustable active power of the j-th adjustable micro-source at the current time t;
P_{j,t,max} is the maximum adjustable active power of the j-th adjustable micro-source at the current time t;
the instant reward r is characterized by formula (9);
in formula (9):
U_i is the node voltage; λ_i is a reward coefficient used to correct the magnitude of the instant reward;
the instant reward r is based on the power-system specifications for voltage deviation and preferentially schedules out-of-limit nodes.
Compared with the prior art, the invention has the following beneficial effects:
1. In the field of distribution network/microgrid voltage regulation, the active power distribution network/microgrid voltage control method based on the weight mean value deep double-Q network effectively avoids the overestimation of the reward target value by traditional deep Q-learning and its underestimation by double-Q deep learning, and achieves a more reasonable evaluation of the reward target value, so that the optimal output actions of the adjustable micro-sources are determined under different voltage-distribution states and adjustable-resource conditions of the electric vehicle clusters;
2. Electric vehicles can only provide active-power regulation, while the distribution network/microgrid voltage is coupled with both active and reactive power; for the case where electric vehicles and other adjustable loads are connected to the distribution network/microgrid, the invention designs suitable action and state sets, so that various adjustable resources are used effectively for voltage regulation;
3. The instant reward function is designed in consideration of the voltage-deviation regulations of the administrative authority, so that nodes with larger deviations are scheduled first when scheduling resources are limited, which greatly helps prevent voltage limit violations.
Drawings
FIG. 1 is a flow chart of an active power distribution network voltage control method based on a weight mean depth double Q network in the invention;
FIG. 2 illustrates a training process of a neural network according to the present invention;
FIG. 3 is an IEEE-33 distribution network topology used in the testing of the present invention;
FIG. 4a is a graph of the renewable energy output of the distribution network used in the tests of the present invention;
FIG. 4b shows the adjustable power data of the electric vehicle clusters connected to the distribution network used in the tests of the present invention;
FIG. 5 shows the voltage distribution before adjustment in the test case of the present invention;
FIG. 6 shows the training processes of different algorithms in the test case of the present invention;
FIG. 7 shows the voltage distribution after adjustment in the test case of the present invention.
Detailed description of the invention
The invention belongs to deep reinforcement learning, a branch of machine learning that combines reinforcement learning with deep neural networks. The defining feature of deep reinforcement learning is that the agent learns through interaction: according to the rewards or penalties it obtains, the agent continuously acquires knowledge while interacting with the environment and thus adapts to it. The range of environment states is the state set, and the range of agent actions is the action set; the rewards obtained after actions are divided into the instant reward and the overall return, the latter being accumulated from the instant rewards. Therefore, one core of a deep reinforcement learning design is the design of the state set, the action set and the instant reward.
The active power distribution network voltage control method based on the weight mean value deep double-Q network in the embodiment comprises the following steps:
Through the design of the state set, the action set and the instant reward, the instant reward of action a in state s can be obtained. For the whole system, however, what matters more is the overall return that taking action a in state s yields over the whole process, and this overall return is difficult to obtain directly. Therefore, the reward function Q(s, a) of action a in state s is characterized by the following formula (1):
in formula (1):
s ∈ S; a ∈ A; E(·) is the expected value; γ is the discount factor, typically a constant less than 1;
s' represents the new state reached by taking action a in state s, and a' represents the new action taken in state s';
p(s, s') is the probability of state s transitioning to the new state s';
Q(s', a') is the reward function of the new action a' in the new state s'.
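Formula (1) itself is not reproduced in the text above. For readability, a standard Bellman-style expression that is consistent with the quantities defined here is given below; the exact notation of the patented formula is an assumption of this sketch.

```latex
Q(s,a) = \mathbb{E}\Big[\, r + \gamma \sum_{s'} p(s,s') \max_{a'} Q(s',a') \Big]
```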
Step 2, designing the network structure and the network loss function L(θ) of the weight mean value deep double-Q neural network:
the network structure comprises an input layer, an output layer and a plurality of hidden layers;
the input layer takes the current state s_t of the state set S at the current time t as input; the current state s_t is a state vector;
the output layer outputs the reward function estimates Q_t(s_t, A|θ) of all actions in the action set A in the current state s_t at the current time t, where θ denotes the parameters of the reward function estimate;
each hidden layer of the plurality of hidden layers comprises a plurality of neurons; the activation function of the neurons is ReLU;
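For illustration, a minimal sketch of such a Q network is given below (Python/PyTorch). The layer sizes follow the embodiment described later in the text (two hidden layers of 48 ReLU neurons), while the input width N + 4L and output width K^m are filled in with the example values N = 33, L = 2, K = 8, m = 3; the class and variable names are illustrative, not part of the patent.

```python
# Hedged sketch of the described Q neural network; an illustration, not the patented implementation.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),  # hidden layer 1, ReLU activation
            nn.Linear(hidden, hidden), nn.ReLU(),     # hidden layer 2, ReLU activation
            nn.Linear(hidden, n_actions),             # one estimate Q_t(s_t, a | theta) per action
        )

    def forward(self, s_t: torch.Tensor) -> torch.Tensor:
        return self.net(s_t)                          # shape: (batch, n_actions)

# Illustrative sizes: state vector of N + 4L = 33 + 4*2 elements, action set of K**m = 8**3 actions.
online_net = QNetwork(state_dim=33 + 4 * 2, n_actions=8 ** 3)
target_net = QNetwork(state_dim=33 + 4 * 2, n_actions=8 ** 3)
target_net.load_state_dict(online_net.state_dict())  # theta- initialised as a copy of theta
```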
the network loss function L(θ) is characterized by formula (2):
L(θ) = E[(y_WDDQN - Q(s, a|θ))²]   (2)
in formula (2):
Q(s, a|θ) is the reward function estimate of action a in state s;
y_WDDQN is the reward target value, obtained by the weighted-mean method of formula (3):
y_WDDQN = r + γ(βQ(s', a*|θ) + (1 - β)Q(s', a*|θ⁻))   (3)
in formula (3):
β is a weight; a* is the action that maximizes the current reward function estimate in state s';
Q(s', a*|θ⁻) is the target reward function value of action a* in state s', with θ⁻ as the parameters of the target reward function value;
Q(s', a*|θ) is the reward function estimate of action a* in state s', i.e. the current reward function estimate, with θ as the parameters of the reward function value;
the current reward function estimate Q(s', a*|θ) is produced by the output layer of the neural network and is continuously updated in the neural network, forming the online network;
the target reward function value Q(s', a*|θ⁻) is produced by the target network; the target network has the same structure as the online network, and the target network parameters are copied from the online network every set number of steps;
the weight β is obtained by formula (4);
in formula (4):
a_L is the action that minimizes the reward function estimate in state s'; c is a hyperparameter used to adjust the weight;
Q(s', a_L|θ⁻) is the target reward function value of action a_L in state s', with θ⁻ as the parameters of the target reward function value;
In Q-learning, each state-action pair corresponds to a reward function value; as the numbers of states and actions grow, there are too many values to compute. Deep reinforcement learning therefore uses a deep neural network to approximate the true reward function with a reward function estimate, i.e. the Q neural network. The network structure of the weight mean value deep double-Q neural network is similar to that of the Q neural network; the main differences are the design of the network loss function and of the reward target value.
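As a concrete illustration of that difference, the target and loss computation can be sketched as follows (reusing the QNetwork sketch above). Formula (4) is not reproduced in the text, so the expression for β below is the one commonly used in weighted double Q-learning, built from the quantities a*, a_L and c defined above; treat it as an assumption rather than the patent's exact formula.

```python
# Hedged sketch of the reward target value (formula (3)) and the loss (formula (2)).
import torch

def wddqn_target(r, s_next, online_net, target_net, gamma=0.9, c=1.0):
    """r: (batch, 1) instant rewards; s_next: (batch, state_dim) new states s'."""
    q_online = online_net(s_next)                  # Q(s', . | theta), online network
    q_target = target_net(s_next)                  # Q(s', . | theta-), target network
    a_star = q_online.argmax(dim=1, keepdim=True)  # a*: maximises the current estimate in s'
    a_low = q_online.argmin(dim=1, keepdim=True)   # aL: minimises the current estimate in s'
    q_on_star = q_online.gather(1, a_star)         # Q(s', a* | theta)
    q_tg_star = q_target.gather(1, a_star)         # Q(s', a* | theta-)
    q_tg_low = q_target.gather(1, a_low)           # Q(s', aL | theta-)
    gap = (q_tg_star - q_tg_low).abs()
    beta = gap / (c + gap)                         # ASSUMED form of formula (4)
    y = r + gamma * (beta * q_on_star + (1.0 - beta) * q_tg_star)  # formula (3)
    return y.detach()                              # the target is not back-propagated through

def wddqn_loss(s, a, y, online_net):
    """s: (batch, state_dim); a: (batch, 1) action indices; y: (batch, 1) targets."""
    q_sa = online_net(s).gather(1, a)              # Q(s, a | theta)
    return ((y - q_sa) ** 2).mean()                # formula (2): L(theta) = E[(y_WDDQN - Q)^2]
```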
Step 3, designing a dynamic ε-greedy strategy:
the dynamic ε-greedy strategy randomly selects an action with probability ε and selects the action with the maximum current reward value with probability (1 - ε); ε is characterized by formula (5):
in formula (5):
δ is an adjustment coefficient whose value is a constant less than 1;
Step is the number of steps; X_0 is the initial exploration value, a positive number; a_r is an action randomly selected in state s;
Q(s, a_r|θ) is the reward function estimate of action a_r in state s, with θ as the parameters of the reward function value;
Q(s, a|θ) is the reward function estimate of action a in state s, with θ as the parameters of the reward function value;
On the one hand, ε gradually decreases as the number of iterations increases, so the goal of exploring first and converging later is realized; on the other hand, ε is also related to the relative size of the reward values: if the reward of the randomly chosen action differs little from the current maximum reward, indicating that the action currently yielding the maximum reward may not be good enough, ε becomes larger and new actions tend to be explored, and vice versa.
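A sketch of this selection rule follows. Since formula (5) is not reproduced in the text, the ε schedule here only mimics the behaviour just described (decay with Step through δ < 1 and X_0, growth when the randomly drawn action's estimate is close to the current maximum); it is an illustrative placeholder, not the patent's formula.

```python
# Hedged sketch of dynamic epsilon-greedy action selection; the epsilon schedule is illustrative only.
import random

def dynamic_epsilon(step, q_random, q_max, delta=0.99, x0=1e6, eps_min=0.01):
    decay = min(1.0, x0 * (delta ** step))             # decays with step; NOT the patent's formula (5)
    closeness = 1.0 - abs(q_max - q_random) / (abs(q_max) + 1e-9)
    return max(eps_min, min(1.0, decay * closeness))   # larger when the random action looks competitive

def select_action(q_values, step, delta=0.99, x0=1e6):
    """q_values: Q(s, a | theta) for every action a in the action set A."""
    a_random = random.randrange(len(q_values))                       # candidate random action a_r
    a_greedy = max(range(len(q_values)), key=lambda a: q_values[a])  # action with maximum reward value
    eps = dynamic_epsilon(step, q_values[a_random], q_values[a_greedy], delta, x0)
    return a_random if random.random() < eps else a_greedy
```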
Step 4, the voltage control of the distribution network/microgrid is implemented as follows:
4.1, establishing the weight mean value deep double-Q neural network according to Step 2, initializing the parameters θ of the neural network, initializing the capacity of the memory set D and the sample size of the sampling set, reading the prediction result of the adjustable power of the electric vehicles, and setting the step number Step to 0;
4.2, reading the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtaining the current state s_t;
4.3, inputting the current state s_t into the online network to obtain the reward function estimates Q_t(s_t, A|θ) of all actions;
4.4, selecting the current action a_t from the action set A of Step 1 according to the dynamic ε-greedy strategy obtained in Step 3, and applying the current action to the power grid for load-flow calculation to obtain the new state s_{t+1};
4.5, calculating the instant reward r from the new state s_{t+1};
4.6, storing {s_t, a_t, s_{t+1}, r} in the memory set D, and then judging whether the memory set D is full;
if the memory set D is not full, returning to the step 4.2;
if the memory set D is full, go to step 4.7;
4.7, sampling from the memory set D into the online network and the target network, computing Q(s, a|θ) and y_WDDQN respectively, calculating the loss function L(θ), and updating the online network parameters by stochastic gradient descent, as shown in FIG. 2;
4.8, incrementing Step by 1, and copying the online network parameters to the target network every fixed number of steps C;
4.9, judging whether the value of Step is the maximum;
if the value of Step is not the maximum value, returning to the Step 4.7;
if the value of Step is the maximum value, the action with the maximum reward value is output by the current online network, the reinforcement learning process is completed, and the voltage control is realized.
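Putting steps 4.1 to 4.9 together, the training loop might be skeletonised as below; it reuses QNetwork, wddqn_target, wddqn_loss and select_action from the sketches above, while read_state(), run_power_flow() and instant_reward() are hypothetical stand-ins for reading the grid state, the load-flow calculation and formula (9).

```python
# Hedged skeleton of steps 4.1-4.9; helper names are illustrative, not the patent's implementation.
import random
from collections import deque
import torch

def train(read_state, run_power_flow, instant_reward, actions, state_dim,
          max_steps=30000, mem_capacity=10000, batch_size=96, sync_every=100):
    online_net = QNetwork(state_dim, len(actions))                 # 4.1: initialise theta
    target_net = QNetwork(state_dim, len(actions))
    target_net.load_state_dict(online_net.state_dict())
    optimizer = torch.optim.SGD(online_net.parameters(), lr=1e-3)  # stochastic gradient descent
    memory = deque(maxlen=mem_capacity)                            # memory set D
    step = 0
    while step < max_steps:
        s_t = read_state()                                         # 4.2: grid voltages + EV forecast
        with torch.no_grad():
            q_t = online_net(torch.tensor(s_t, dtype=torch.float32).unsqueeze(0))[0]  # 4.3
        a_t = select_action(q_t.tolist(), step)                    # 4.4: dynamic eps-greedy choice
        s_next = run_power_flow(s_t, actions[a_t])                 # 4.4: load flow -> new state
        r = instant_reward(s_next)                                 # 4.5: instant reward
        memory.append((s_t, a_t, s_next, r))                       # 4.6: store experience in D
        if len(memory) < mem_capacity:
            continue                                               # D not yet full: back to 4.2
        batch = random.sample(list(memory), batch_size)            # 4.7: sample from D
        bs = torch.tensor([b[0] for b in batch], dtype=torch.float32)
        ba = torch.tensor([[b[1]] for b in batch])
        bs2 = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        br = torch.tensor([[b[3]] for b in batch], dtype=torch.float32)
        loss = wddqn_loss(bs, ba, wddqn_target(br, bs2, online_net, target_net), online_net)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        step += 1                                                  # 4.8: Step <- Step + 1
        if step % sync_every == 0:
            target_net.load_state_dict(online_net.state_dict())    # copy theta to theta- every C steps
    return online_net                                              # 4.9: greedy actions of the online net
```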
In a specific implementation, in Step 1, the state set S, the action set A and the instant reward r are set as follows:
the state set S is the set of all state vectors; the state vector s_t characterizes the node voltage distribution of the power grid and the adjustable condition of the electric vehicle clusters at the current time t, and is given by formula (6):
s_t = {U_{1,t}, ..., U_{N,t}, ..., P_{El,t,min}, ..., P_{El,t,max}, ..., C_{El,t,min}, ..., C_{El,t,max}, ...}   (6)
in formula (6):
i denotes a node, i = 1, 2, …, N, where N is the number of nodes in the voltage regulation area;
U_{i,t} denotes the voltage amplitude of the i-th node at the current time t;
U_{1,t} is the voltage amplitude of the 1st node at the current time t, and U_{N,t} is the voltage amplitude of the N-th node at the current time t;
P_{El,t,min} is the minimum adjustable power of the l-th electric vehicle cluster at the current time t;
P_{El,t,max} is the maximum adjustable power of the l-th electric vehicle cluster at the current time t;
C_{El,t,min} is the minimum adjustable capacity of the l-th electric vehicle cluster at the current time t;
C_{El,t,max} is the maximum adjustable capacity of the l-th electric vehicle cluster at the current time t;
L denotes the number of adjustable electric vehicle clusters, so the state vector s_t has N + 4L elements;
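A minimal sketch of assembling this state vector is given below; the field names of the cluster records are illustrative, not taken from the patent.

```python
# Hedged sketch of the state vector of formula (6): N node voltages followed by the
# min/max adjustable power and min/max adjustable capacity of each of the L EV clusters.
def build_state(node_voltages, ev_clusters):
    """node_voltages: [U_1t, ..., U_Nt]; ev_clusters: one dict per cluster l = 1..L."""
    s_t = list(node_voltages)                 # U_{1,t} ... U_{N,t}
    for cl in ev_clusters:                    # cluster l
        s_t += [cl["p_min"], cl["p_max"],     # P_{El,t,min}, P_{El,t,max}
                cl["c_min"], cl["c_max"]]     # C_{El,t,min}, C_{El,t,max}
    return s_t                                # length N + 4L

# usage: 33 node voltages and 2 clusters give a 41-element state vector
state = build_state([1.0] * 33,
                    [{"p_min": -0.2, "p_max": 0.3, "c_min": 0.5, "c_max": 2.0},
                     {"p_min": -0.1, "p_max": 0.2, "c_min": 0.4, "c_max": 1.5}])
```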
the action set A is the set of all action vectors; in the current state s_t, the action vector a_t characterizes the output actions of the adjustable micro-sources and is given by formula (7);
in formula (7):
j denotes an adjustable micro-source, j = 1, 2, …, m, where m is the total number of adjustable micro-sources;
k denotes an adjustable action, k = 0, 1, …, K - 1, where K is the total number of adjustable actions of each unit;
the action set A therefore has K^m elements;
the k-th output action of the j-th adjustable micro-source at the current time t is defined in terms of the following quantities:
Q_{j,min} is the minimum adjustable reactive power of the j-th adjustable micro-source;
Q_{j,max} is the maximum adjustable reactive power of the j-th adjustable micro-source;
P_{j,t,min} is the minimum adjustable active power of the j-th adjustable micro-source at the current time t;
P_{j,t,max} is the maximum adjustable active power of the j-th adjustable micro-source at the current time t;
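One way to realise such an action set is to index the K^m combinations and decode each index into one discretised setpoint per micro-source, as sketched below; the even spacing between the minimum and maximum output is an assumption of this sketch, since the defining formula for the per-source action is not reproduced in the text.

```python
# Hedged sketch of decoding an action index from the K**m-element action set A.
def decode_action(index, micro_sources, K):
    """micro_sources: list of m dicts with the 'min' and 'max' adjustable output of each source."""
    setpoints = []
    for src in micro_sources:                 # j = 1..m
        k = index % K                         # k-th level of source j
        index //= K
        frac = k / (K - 1) if K > 1 else 0.0
        setpoints.append(src["min"] + frac * (src["max"] - src["min"]))  # assumed even spacing
    return setpoints

# usage: K = 8 levels and m = 3 micro-sources give 8**3 = 512 possible actions
sources = [{"min": -1.0, "max": 1.0}, {"min": 0.0, "max": 0.5}, {"min": -0.3, "max": 0.3}]
setpoints = decode_action(300, sources, K=8)
```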
the instant reward r is characterized by formula (9);
in formula (9):
U_i is the node voltage; λ_i is a reward coefficient used to correct the magnitude of the instant reward;
the instant reward r is based on the power-system specifications for voltage deviation and preferentially schedules out-of-limit nodes.
In the invention, the maximum and minimum power of the reactive power sources are regarded as constant during scheduling, while the adjustable power of the active power sources varies. The reactive power sources include capacitors, static var compensators and synchronous generators; the active power sources include electric vehicle clusters, energy storage units, temperature-controlled loads and micro gas turbines.
Based on the weight mean value deep double-Q network method, the invention takes the grid node voltages and the adjustable condition of the electric vehicles as the state set, the adjustable micro-source reactive and active power outputs as the action set, and the corrected voltage fluctuation as the reward value for deep reinforcement learning training, so that the agent learns the output actions most beneficial to voltage regulation under different grid and adjustable-resource environments.
Referring to FIG. 3, a distribution network system based on the modified IEEE-33 node system was tested. Nodes 8, 15 and 25 of the system are connected to a photovoltaic unit rated at 2.5 MW, a wind turbine rated at 3 MW and a photovoltaic unit rated at 2.5 MW, respectively. The day-long output of the wind turbine and the photovoltaic units is shown in FIG. 4a. Node 1 is the balance node; because of the fluctuation of the photovoltaic and wind-turbine output and the voltage drop of the distribution network, the system voltage fluctuates and exceeds its limits, as shown in FIG. 5. Voltage control of the distribution network is realized in this system according to the following steps:
step a, determining a state set S and an action set A, and setting an instant reward r.
The number of nodes N in the voltage regulation area is 33 and L = 2, giving 35 state quantities; the maximum and minimum power of the reactive power sources are regarded as constant during scheduling, while the adjustable power of active power sources such as the electric vehicles varies; the adjustable micro-source conditions are shown in Table 1:
TABLE 1
In summary, with K = 8 and m = 3 in the action set, there are 8³ = 512 action combinations in total.
Step b, establishing the weight mean value deep double-Q neural network, with 2 hidden layers, 48 neurons in each hidden layer and the ReLU activation function; the hyperparameter c for adjusting the weight is set to 1.
Step c, designing the dynamic ε-greedy strategy, with the adjustment coefficient δ = 0.99, the iteration number i, and the initial exploration value X_0 = 10^6.
Step d, implementing the distribution network/microgrid voltage control according to the following process, as shown in FIG. 1:
Step d1, establishing the weight mean value deep double-Q neural network according to Step b, initializing the parameters θ of the neural network, initializing the capacity of the memory set D to 10000 and the sample size of the sampling set to 96, reading the prediction result of the adjustable power of the electric vehicles, and setting the step number Step to 0;
Step d2, reading the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtaining the current state s_t;
Step d3, inputting the current state s_t into the online network to obtain the reward function estimates Q_t(s_t, A|θ) of all actions;
Step d4, selecting the current action a_t from the action set A of Step a according to the dynamic ε-greedy strategy obtained in Step c, and applying the current action to the power grid for load-flow calculation to obtain the new state s_{t+1};
Step d5, calculating the instant reward r from the new state s_{t+1};
Step d6, storing {s_t, a_t, s_{t+1}, r} in the memory set D and judging whether the memory set D is full;
if the memory set D is not full, returning to step d2; if the memory set D is full, going to step d7;
Step d7, sampling from the memory set D into the online network and the target network, computing Q(s, a|θ) and y_WDDQN respectively, calculating the loss function L(θ), and updating the online network parameters by stochastic gradient descent, as shown in FIG. 2;
Step d8, incrementing Step by 1 and, every fixed number of steps C, with C taken as 100, copying the online network parameters to the target network;
Step d9, judging whether the value of Step is the maximum;
if the value of Step is not the maximum, returning to step d7;
if the value of Step is the maximum, the action with the maximum reward value is output by the current online network, the reinforcement learning process is completed, and the voltage control is realized. In this embodiment, the maximum value of Step is 30000.
FIG. 6 compares the average reward of the method of the invention (WDDQN) with that of traditional deep Q-learning (DQN) during training: the performance of both methods gradually improves and eventually stabilizes as training progresses. However, the stable value of WDDQN is larger than that of DQN, and DQN becomes trapped in a local optimum. The experimental results show that the invention selects action values better than DQN does.
The result of using the trained agent for distribution network voltage control is shown in FIG. 7. Compared with the voltage distribution before adjustment in FIG. 5, the distribution network voltage range changes from [0.926, 1.073] before adjustment to [0.951, 1.046] after adjustment, so the distribution network voltage is kept all day within the range [0.95, 1.05] required by the national standard. Meanwhile, the voltage offset is expressed by formula (11):
the voltage offset is reduced from 0.0412 before adjustment to 0.0152 after adjustment, showing that the method can effectively control the distribution network voltage.
Claims (2)
1. A voltage control method of an active power distribution network based on a weight mean value deep double-Q network is characterized by comprising the following steps:
Step 1, determining a state set S and an action set A according to the voltage distribution and the adjustable power-supply conditions in the power grid, setting an instant reward r, and obtaining the reward function Q(s, a) of action a in state s, which is represented by formula (1):
in formula (1):
s ∈ S; a ∈ A; E(·) is the expected value; γ is the discount factor;
s' represents the new state reached by taking action a in state s, and a' represents the new action taken in state s';
p(s, s') is the probability of state s transitioning to the new state s';
Q(s', a') is the reward function of the new action a' in the new state s';
Step 2, designing the network structure and the network loss function L(θ) of the weight mean value deep double-Q neural network:
the network structure comprises an input layer, an output layer and a plurality of hidden layers;
the input layer takes the current state s_t of the state set S at the current time t as input; the current state s_t is a state vector;
the output layer outputs the reward function estimates Q_t(s_t, A|θ) of all actions in the action set A in the current state s_t at the current time t, where θ denotes the parameters of the reward function estimate;
each hidden layer of the plurality of hidden layers comprises a plurality of neurons; the activation function of the neurons is ReLU;
the network loss function L(θ) is characterized by formula (2):
L(θ) = E[(y_WDDQN - Q(s, a|θ))²]   (2)
in formula (2):
Q(s, a|θ) is the reward function estimate of action a in state s;
y_WDDQN is the reward target value, obtained by the weighted-mean method of formula (3):
y_WDDQN = r + γ(βQ(s', a*|θ) + (1 - β)Q(s', a*|θ⁻))   (3)
in formula (3):
β is a weight; a* is the action that maximizes the current reward function estimate in state s';
Q(s', a*|θ⁻) is the target reward function value of action a* in state s', with θ⁻ as the parameters of the target reward function value;
Q(s', a*|θ) is the reward function estimate of action a* in state s', i.e. the current reward function estimate, with θ as the parameters of the reward function value;
the current reward function estimate Q(s', a*|θ) is produced by the output layer of the neural network and is continuously updated in the neural network, forming the online network;
the target reward function value Q(s', a*|θ⁻) is produced by the target network; the target network has the same structure as the online network, and the target network parameters are copied from the online network every set number of steps;
the weight β is obtained by formula (4);
in formula (4):
a_L is the action that minimizes the reward function estimate in state s'; c is a hyperparameter used to adjust the weight;
Q(s', a_L|θ⁻) is the target reward function value of action a_L in state s', with θ⁻ as the parameters of the target reward function value;
Step 3, designing a dynamic ε-greedy strategy;
the dynamic ε-greedy strategy randomly selects an action with probability ε and selects the action with the maximum current reward value with probability (1 - ε); ε is characterized by formula (5):
in formula (5):
δ is an adjustment coefficient whose value is a constant less than 1;
Step is the number of steps; X_0 is the initial exploration value, a positive number; a_r is an action randomly selected in state s;
Q(s, a_r|θ) is the reward function estimate of action a_r in state s, with θ as the parameters of the reward function value;
Q(s, a|θ) is the reward function estimate of action a in state s, with θ as the parameters of the reward function value;
step 4, the voltage control of the distribution network/microgrid is implemented as follows:
4.1, establishing the weight mean value deep double-Q neural network according to Step 2, initializing the parameters θ of the neural network, initializing the capacity of the memory set D and the sample size of the sampling set, reading the prediction result of the adjustable power of the electric vehicles, and setting the step number Step to 0;
4.2, reading the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtaining the current state s_t;
4.3, inputting the current state s_t into the online network to obtain the reward function estimates Q_t(s_t, A|θ) of all actions;
4.4, selecting the current action a_t from the action set A of Step 1 according to the dynamic ε-greedy strategy obtained in Step 3, and applying the current action to the power grid for load-flow calculation to obtain the new state s_{t+1};
4.5, calculating the instant reward r from the new state s_{t+1};
4.6, storing {s_t, a_t, s_{t+1}, r} in the memory set D, and then judging whether the memory set D is full;
if the memory set D is not full, returning to the step 4.2;
if the memory set D is full, go to step 4.7;
4.7, sampling from the memory set D into the online network and the target network, computing Q(s, a|θ) and y_WDDQN respectively, calculating the loss function L(θ), and updating the online network parameters by stochastic gradient descent;
4.8, incrementing Step by 1, and copying the online network parameters to the target network every fixed number of steps C;
4.9, judging whether the value of Step is the maximum;
if the value of Step is not the maximum value, returning to the Step 4.7;
if the value of Step is the maximum value, the action with the maximum reward value is output by the current online network, the reinforcement learning process is completed, and the voltage control is realized.
2. The active power distribution network voltage control method based on the weight mean value deep double-Q network as claimed in claim 1, wherein, in Step 1, the state set S, the action set A and the instant reward r are set as follows:
the state set S is the set of all state vectors; the state vector s_t characterizes the node voltage distribution of the power grid and the adjustable condition of the electric vehicle clusters at the current time t, and is given by formula (6):
s_t = {U_{1,t}, ..., U_{N,t}, ..., P_{El,t,min}, ..., P_{El,t,max}, ..., C_{El,t,min}, ..., C_{El,t,max}, ...}   (6)
in formula (6):
i denotes a node, i = 1, 2, …, N, where N is the number of nodes in the voltage regulation area;
U_{i,t} denotes the voltage amplitude of the i-th node at the current time t;
U_{1,t} is the voltage amplitude of the 1st node at the current time t, and U_{N,t} is the voltage amplitude of the N-th node at the current time t;
P_{El,t,min} is the minimum adjustable power of the l-th electric vehicle cluster at the current time t;
P_{El,t,max} is the maximum adjustable power of the l-th electric vehicle cluster at the current time t;
C_{El,t,min} is the minimum adjustable capacity of the l-th electric vehicle cluster at the current time t;
C_{El,t,max} is the maximum adjustable capacity of the l-th electric vehicle cluster at the current time t;
L denotes the number of adjustable electric vehicle clusters, so the state vector s_t has N + 4L elements;
the action set A is the set of all action vectors; in the current state s_t, the action vector a_t characterizes the output actions of the adjustable micro-sources and is given by formula (7);
in formula (7):
j denotes an adjustable micro-source, j = 1, 2, …, m, where m is the total number of adjustable micro-sources;
k denotes an adjustable action, k = 0, 1, …, K - 1, where K is the total number of adjustable actions of each unit;
the action set A therefore has K^m elements;
the k-th output action of the j-th adjustable micro-source at the current time t is defined in terms of the following quantities:
Q_{j,min} is the minimum adjustable reactive power of the j-th adjustable micro-source;
Q_{j,max} is the maximum adjustable reactive power of the j-th adjustable micro-source;
P_{j,t,min} is the minimum adjustable active power of the j-th adjustable micro-source at the current time t;
P_{j,t,max} is the maximum adjustable active power of the j-th adjustable micro-source at the current time t;
the instant reward r is characterized by formula (9);
in formula (9):
U_i is the node voltage; λ_i is a reward coefficient used to correct the magnitude of the instant reward;
the instant reward r is based on the power-system specifications for voltage deviation and preferentially schedules out-of-limit nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210074238.3A CN114400675B (en) | 2022-01-21 | 2022-01-21 | Active power distribution network voltage control method based on weight mean value deep double-Q network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114400675A true CN114400675A (en) | 2022-04-26 |
CN114400675B CN114400675B (en) | 2023-04-07 |
Family
ID=81233698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210074238.3A Active CN114400675B (en) | 2022-01-21 | 2022-01-21 | Active power distribution network voltage control method based on weight mean value deep double-Q network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114400675B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200327411A1 (en) * | 2019-04-14 | 2020-10-15 | Di Shi | Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning |
CN111478326A (en) * | 2020-05-12 | 2020-07-31 | 南方电网科学研究院有限责任公司 | Comprehensive energy optimization method and device based on model-free reinforcement learning |
CN112117760A (en) * | 2020-08-13 | 2020-12-22 | 国网浙江省电力有限公司台州供电公司 | Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning |
CN112465664A (en) * | 2020-11-12 | 2021-03-09 | 贵州电网有限责任公司 | AVC intelligent control method based on artificial neural network and deep reinforcement learning |
CN113036772A (en) * | 2021-05-11 | 2021-06-25 | 国网江苏省电力有限公司南京供电分公司 | Power distribution network topology voltage adjusting method based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Liu Junqi et al.: "Evaluation of the Dispatchable Capability of Electric Vehicle Clusters Based on Spatio-Temporal Coupling Correlation Analysis", Electric Power Construction *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116599061A (en) * | 2023-07-18 | 2023-08-15 | 国网浙江省电力有限公司宁波供电公司 | Power grid operation control method based on reinforcement learning |
CN116599061B (en) * | 2023-07-18 | 2023-10-24 | 国网浙江省电力有限公司宁波供电公司 | Power grid operation control method based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN114400675B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112186743B (en) | Dynamic power system economic dispatching method based on deep reinforcement learning | |
Li et al. | Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning | |
CN114362196B (en) | Multi-time-scale active power distribution network voltage control method | |
CN105846461B (en) | Control method and system for large-scale energy storage power station self-adaptive dynamic planning | |
CN108964050A (en) | Micro-capacitance sensor dual-layer optimization dispatching method based on Demand Side Response | |
CN109256810B (en) | Multi-objective optimization method considering uncertain cost of fan output | |
CN108565874B (en) | Source-load cooperative frequency modulation method based on load frequency control model | |
CN109034587B (en) | Active power distribution system optimal scheduling method for coordinating multiple controllable units | |
CN113872213B (en) | Autonomous optimization control method and device for power distribution network voltage | |
CN113300380B (en) | Load curve segmentation-based power distribution network reactive power optimization compensation method | |
CN110165714B (en) | Micro-grid integrated scheduling and control method based on extreme dynamic programming algorithm and computer readable storage medium | |
CN112507614A (en) | Comprehensive optimization method for power grid in distributed power supply high-permeability area | |
CN114784823A (en) | Micro-grid frequency control method and system based on depth certainty strategy gradient | |
Yin et al. | Hybrid multi-agent emotional deep Q network for generation control of multi-area integrated energy systems | |
CN118174355A (en) | Micro-grid energy optimization scheduling method | |
CN108539797A (en) | A kind of secondary frequency of isolated island micro-capacitance sensor and voltage control method considering economy | |
CN113675890A (en) | TD 3-based new energy microgrid optimization method | |
CN114566971A (en) | Real-time optimal power flow calculation method based on near-end strategy optimization algorithm | |
CN114400675B (en) | Active power distribution network voltage control method based on weight mean value deep double-Q network | |
CN117039981A (en) | Large-scale power grid optimal scheduling method, device and storage medium for new energy | |
Liu et al. | An AGC dynamic optimization method based on proximal policy optimization | |
CN117674160A (en) | Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning | |
CN116799856A (en) | Energy control method, controller and system for multi-microgrid system | |
CN114336704A (en) | Regional energy Internet multi-agent distributed control and efficiency evaluation method | |
CN110289643B (en) | Rejection depth differential dynamic planning real-time power generation scheduling and control algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||