CN111523737A - Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network - Google Patents

Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network

Info

Publication number
CN111523737A
Authority
CN
China
Prior art keywords
state
data
power grid
action
generator
Prior art date
Legal status
Granted
Application number
CN202010478336.4A
Other languages
Chinese (zh)
Other versions
CN111523737B (en)
Inventor
刘友波
刘季昂
刘俊勇
田蓓
顾雨嘉
李宏强
Current Assignee
Sichuan University
Electric Power Research Institute of State Grid Ningxia Electric Power Co Ltd
Original Assignee
Sichuan University
Electric Power Research Institute of State Grid Ningxia Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan University, Electric Power Research Institute of State Grid Ningxia Electric Power Co Ltd filed Critical Sichuan University
Priority to CN202010478336.4A priority Critical patent/CN111523737B/en
Publication of CN111523737A publication Critical patent/CN111523737A/en
Application granted granted Critical
Publication of CN111523737B publication Critical patent/CN111523737B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses an automatic optimization-approaching adjustment method for the operation mode of a power system driven by a deep Q network. A typical operation mode is taken as the adjustment reference mode, the load fluctuation range is determined, and a large amount of target-mode sample data for training and testing is generated with the Latin hypercube sampling method; all feasible single control actions in the power grid model are determined, numbered, and set as the action space; the power grid model is initialized and checked for untrained samples: if one exists, its load data are assigned to the power grid model and the generator output data of the current operation mode undergo convergence optimization processing, otherwise training is terminated; and so on. The method maintains calculation speed while making up for the difficulty the optimal power flow method has in converging when solving the multi-objective optimal power flow, ensures that no index of the adjusted mode deviates excessively, and provides a methodological reference for applying deep reinforcement learning to power grid optimization and control problems.

Description

Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network
Technical Field
The invention relates to the technical field of power system automation, in particular to a method for automatically optimizing and adjusting a running mode of a power system driven by a deep Q network.
Background
The power grid operation mode, compiled by the grid operation regulation and control department as the general technical scheme for grid operation and production, guides work such as grid planning and design, generation plan arrangement, real-time dispatching and maintenance scheduling. Its compilation must fully consider complex factors such as the grid structure, the distribution of power sources and loads, and the operating capability of equipment, meet the load demand to the greatest extent, and ensure safe, stable, reliable, flexible and economic operation of the whole grid; it therefore involves many influencing factors, complex interrelations and a large computational workload. Within the compilation process, load flow calculation is the most important part: its results provide a quantitative basis for judging the grid operation mode, and calculations of static stability, transient stability and the like are also based on it. However, as the grid scale expands and the load level rises, the adjustment of the operation mode involves many controllable variables and multiple objectives that are difficult to balance; non-convergence of the load flow calculation occurs frequently during mode compilation, and adjusting the power flow while accounting for multiple indexes makes the compilation of the grid operation mode time-consuming and tedious, so the traditional adjustment method relying on manual experience can no longer meet the requirements. The commonly used optimal power flow method also tends to fall into local optima and suffers from non-convergent power flow in multi-objective optimization when adjusting the operation mode of a large power grid. On this basis, the invention provides an automatic optimization-approaching adjustment method for the operation mode of a power system driven by a deep Q network, making full use of the advantages of deep reinforcement learning in high-dimensional data perception and multi-objective optimization.
Disclosure of Invention
Aiming at the above defects in the prior art, the automatic optimization-approaching adjustment method for the operation mode of a deep-Q-network-driven power system solves the problems that the commonly used optimal power flow method easily falls into local optima and that the power flow does not readily converge when the operation mode of a large power grid is adjusted. The deep Q network method is introduced into the grid operation-mode adjustment problem: data such as generator output, node voltages and line power are used as the driving data, and after offline training the method can give an adjustment strategy that reaches the target mode while satisfying multiple adjustment targets, converges stably, realizes automatic adjustment of the grid operation mode, and forms a mapping from mode data to adjustment strategy.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
the automatic optimization-seeking adjusting method for the operation mode of the power system driven by the deep Q network comprises the following steps of:
s1: determining a load fluctuation range by taking a typical operation mode as an adjustment reference mode, and generating a large amount of target mode sample data for training and testing by combining a Latin hypercube sampling method;
s2: determining all feasible single control actions in the power grid model, numbering the control actions, and setting the control actions as an action space;
s3: initializing a power grid model, judging whether an untrained sample exists, if so, assigning load data in the sample to the power grid model, performing convergence optimization processing on output data of a generator in a current operation mode, and if not, terminating training;
s4: carrying out load flow calculation, carrying out normalization processing calculation to obtain state data, and storing the state data into a state vector s;
s5: building a deep neural network and training, and fitting various data in the current power grid state s and action values of various adjustment actions in an action space;
s6: selecting an adjusting action a from the action space according to a greedy strategy and executing it, and performing load flow calculation to obtain a new state vector s';
s7: judging whether the state s' meets the constraint conditions; if so, giving a reward r according to the reward function and storing the data into the memory unit D as the vector (s, a, r, s'); if not, giving a penalty;
s8: sampling a plurality of samples from the memory unit D to train a deep neural network, and updating a parameter theta of the deep neural network by using a random gradient descent method;
s9: it is determined whether the state S' satisfies the termination condition, and if so, the process returns to S3, and if not, the process returns to S5.
Further, the step S1 is specifically:
the Latin hypercube sampling method comprises the following steps: dividing the value range of the samples into N equal parts according to the number of the samples, and selecting one sample in each part to enable the sample to be distributed in the whole sample space and have certain randomness;
the load data of a typical operation mode of a power grid is taken as a reference, the random load fluctuation is 80% -120%, disturbance is added to the original data, and finally N sample data are generated.
Further, the step S2 is specifically:
selecting all feasible single control actions in the power grid model and setting them as the action space A, where A comprises the generator output action a_G, the transformer tap action a_T and the reactive compensation action a_C; the generator output action is divided into the two states +Δ and −Δ, where Δ represents the adjustment step of the generator power; the transformer tap action is divided into the two states of moving up one position and moving down one position; the reactive compensation action comprises the two states of switched in and switched out, namely:
A = {a_G, a_T, a_C}
and numbering all single control actions in the power grid model, and forming mapping with the adjustment strategy.
Further, the step S3 is specifically:
carrying out convergence optimization processing on the output of the generator in the current power grid operation mode: the total variation of the load data is obtained, and the variation is uniformly distributed to each generator; at this point, the agent may obtain an initial operating mode that is closer to the target mode feasible region to begin training.
Further, the step S4 is specifically:
The related data are normalized:
η_k = (x_k − x_k,min) / (x_k,max − x_k,min)
where η_k denotes the result after data normalization; x_k denotes the k-th data value of the grid mode state data; n denotes the number of data items; x_k,max and x_k,min denote the upper and lower limits of this data item;
in the case where structural parameters and load conditions of the power grid are already given, the state vector s is expressed as:
s = {P_G, V, P_line, T_p_pos}
where P_G represents the generator power in the current state; V represents the node voltages; P_line represents the line active power; T_p_pos represents the transformer tap positions.
Further, the step S5 is specifically:
fitting various data in the current power grid state s and action values of various adjustment actions in an action space by using a deep neural network to approximate a value function in reinforcement learning, wherein a state feature vector consisting of generator output, node voltage and line power grid data is used as input of the deep neural network, and the action values of discretization adjustment actions are output;
The Q-value function in Q-learning is approximated by the deep neural network, and the formula for updating the Q-value function becomes:
Q(s, a) ← Q(s, a) + α·[r + γ·max_a' Q(s', a') − Q(s, a)]
where α represents the learning rate and γ the discount factor;
a deep neural network is built with the Keras framework based on TensorFlow; it has a double-hidden-layer architecture comprising 1 input layer, 2 hidden layers and 1 output layer; the input of the deep neural network is the state quantities of the current power grid operation mode, comprising generator output, transformer tap positions, node voltages, line load rates and the switching states of reactive compensation, so that the total number of input-layer nodes is 116; the output-layer nodes correspond to the 82 discrete action values; each hidden layer is set to 200 nodes, the ReLU function is selected as the activation function, the inter-layer weights ω are initialized from a normal distribution, and the initial bias b is set to 0.01; for the hyper-parameters, a value range can first be set and the hyper-parameters then optimized with a particle swarm method, taking the accuracy of the deep neural network as the criterion for judging hyper-parameter performance, so that the optimal hyper-parameters are found and the network achieves the best fitting effect.
Further, the step S6 is specifically:
the learning-rate-decay approach is adopted in the training process, which can raise the learning speed in the early stage of training and the evaluation accuracy in the later stage; that is, the exploration rate in the greedy strategy should be dynamically adjusted, gradually decreasing within the interval [ε_min, ε_ini] as the iterations proceed.
Further, the step S7 is specifically:
the operation mode which can meet all the constraint conditions is found by adjusting available control variables, and the adjustment targets are as follows:
(1) minimizing the average fluctuation of the system node voltage;
min f_V = (1/N_1)·Σ_{k=1}^{N_1} |V_k − V_k,base|
(2) maximizing the utilization rate of the system line load;
max f_line = (1/N_2)·Σ_{k=1}^{N_2} P_line,k / P_line,k,lim
(3) the power generation cost of the generator is minimized;
min f_cost = Σ_{k=1}^{N_3} [F(P_G,k) + S_k·u_k]
F(P_G,k) = m_k·P_G,k² + n_k·P_G,k + l_k
where N_1 represents the number of power grid nodes; N_2 represents the number of power grid lines; N_3 represents the total number of generator sets; V_k represents the per-unit voltage of node k in the current state, obtained through load flow calculation; V_k,base represents the reference per-unit value of node k; P_line,k represents the active power of line k in the current state; P_line,k,lim represents the upper active power limit of line k; F(P_G,k) represents the generation cost of the generator set; S_k represents the start-up and shut-down cost of the generator set; u_k indicates the change control quantity of the generator set start/stop state: u_k = 1 when the start/stop state of the set changes, otherwise u_k = 0; m_k, n_k and l_k are the cost coefficients of the generator set;
the constraint conditions are the same as those of the optimal power flow and comprise equality constraints and inequality constraints; the operation mode obtained by the adjustment must satisfy the basic power flow equations, i.e. the equality constraints; the inequality constraints include: upper and lower limits on the active power output of the generators, the adjustment range of the transformer tap positions, upper and lower limits on the node voltage magnitudes, the maximum current or apparent power through a transmission line or transformer element, and the maximum active or reactive power flow through a line;
in correspondence with the control objectives, the single step rewards earned in the exploration and training of the agent should include three aspects involved in adjusting the objectives: average fluctuation of node voltage, load safety margin of a fragile line in the system and power generation cost of a generator set; and forming a comprehensive reward function by linear weighting of the three indexes, and defining the reward r obtained after selecting the action a in a given state s as:
r = λ·r_V + ω·r_line + (1 − λ − ω)·r_cost, when the state s' obtained after action a satisfies the constraint conditions
r = r_done, when the state s' violates the constraint conditions
where r_V, r_line and r_cost are the reward terms for the node-voltage fluctuation, the line-load safety margin and the generation cost, respectively (their detailed expressions are given as equation images in the original publication); λ and ω respectively represent the reward weights considering the voltage stability index and the line-load safety margin index, with λ, ω ∈ (0,1) and λ + ω ∈ (0,1); r_done is a negative constant.
Further, the step S8 is specifically:
the deep Q network also establishes another, identical network for generating the Q value of the target state; the agent updates the neural network parameters θ by minimizing the mean square error between the Q function value of the current state and that of the target state; in addition, after every N rounds of iteration, the parameters of the current-state Q-value network are copied to the target-state Q-value network, and an action strategy achieving the expected target is finally obtained in the process of continuous cyclic training; an experience replay mechanism is adopted in the deep Q network, that is, at each time step t, the sample e = (s, a, r, s') generated by the interaction is stored in the memory unit D, and during training a small batch of samples is randomly drawn from D and added to the training set each time.
Further, the step S9 is specifically:
and (3) taking the performance difference criterion under the variable reward function as the ending condition of the intelligent agent training, namely, taking the difference between each item of data in the state s and each item of data in the state s ', solving the state change under a single action, and judging that s' is in a termination state when the state change is smaller than a set value.
The invention has the beneficial effects that:
the method takes the data of the power generator output, the node voltage, the line power and the like of the power grid as the drive, can provide the adjustment strategy of the mode reaching the target on the premise of meeting a plurality of adjustment targets after offline training, is stable and convergent, realizes the automatic adjustment of the power grid operation mode, and forms the mapping from the mode data to the adjustment strategy. The problems of large workload, low adjustment efficiency, high convergence difficulty and the like in the traditional manual modulation method are solved, the problem that the optimal power flow method is difficult to converge when the multi-target optimal power flow is solved while the calculation speed is ensured, various indexes of the adjusted mode have no overlarge deviation, a new tool is provided for operation mode compilation work, and method reference is provided for applying deep reinforcement learning to power grid optimization and control problems.
Drawings
FIG. 1 is a topology diagram of an IEEE39 node system in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of an average cumulative prize of one embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the step size required to adjust the operation mode according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating runtime test results according to one embodiment of the present invention;
FIG. 5 is a diagram illustrating the step size required for a single iteration of a test set, in accordance with one embodiment of the present invention;
FIG. 6 is a flowchart illustrating steps according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the invention by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of the embodiments; for those skilled in the art, various changes are possible as long as they remain within the spirit and scope of the invention as defined in the appended claims, and all matters produced using the inventive concept of the present invention are within the scope of protection.
As shown in fig. 6, an automatic optimization-approaching adjusting method for a deep Q-network-driven power system operation mode includes the following steps:
s1: determining a load fluctuation range by taking a typical operation mode as an adjustment reference mode, and generating a large amount of target mode sample data for training and testing by combining a Latin hypercube sampling method;
s2: determining all feasible single control actions in the power grid model, numbering the control actions, and setting the control actions as an action space;
s3: initializing a power grid model, judging whether an untrained sample exists, if so, assigning load data in the sample to the power grid model, performing convergence optimization processing on output data of a generator in a current operation mode, and if not, terminating training;
s4: carrying out load flow calculation, carrying out normalization processing calculation to obtain state data, and storing the state data into a state vector s;
s5: building a deep neural network and training, and fitting various data in the current power grid state s and action values of various adjustment actions in an action space;
s6: selecting an adjusting action a from the action space according to a greedy strategy and executing it, and performing load flow calculation to obtain a new state vector s';
s7: judging whether the state s' meets the constraint conditions; if so, giving a reward r according to the reward function and storing the data into the memory unit D as the vector (s, a, r, s'); if not, giving a penalty;
s8: sampling a plurality of samples from the memory unit D to train a deep neural network, and updating a parameter theta of the deep neural network by using a random gradient descent method;
s9: it is determined whether the state S' satisfies the termination condition, and if so, the process returns to S3, and if not, the process returns to S5.
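To make the interplay of steps S3-S9 concrete, the following is a minimal Python sketch of the training loop. It assumes illustrative wrapper objects `grid` (a power-flow model exposing reset, set_loads, redistribute_generation, apply_action, state, feasible, reward and penalty methods) and `agent` (the deep Q network exposing select_action, store and train_on_minibatch); none of these names or interfaces come from the patent itself.

```python
import numpy as np

def train_operating_mode_agent(agent, grid, load_samples, max_steps=200, tol=1e-3):
    """Skeleton of the S3-S9 loop; `grid` and `agent` are assumed wrappers."""
    for loads in load_samples:                        # S3: take the next untrained sample
        grid.reset()
        grid.set_loads(loads)
        grid.redistribute_generation()                # convergence pre-processing of generator output
        s = grid.state()                              # S4: load flow + normalised state vector
        for step in range(max_steps):
            a = agent.select_action(s, step)          # S6: epsilon-greedy choice in the action space
            grid.apply_action(a)
            s_next = grid.state()                     # load flow for the adjusted mode
            r = grid.reward() if grid.feasible() else grid.penalty()   # S7: reward or penalty
            agent.store(s, a, r, s_next)              # experience replay memory D
            agent.train_on_minibatch()                # S8: stochastic gradient step on theta
            if np.max(np.abs(s_next - s)) < tol:      # S9: performance-difference termination
                break
            s = s_next
```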
Further, the step S1 includes: determining a load fluctuation range by taking a typical operation mode as an adjusting reference mode, and generating a large amount of target mode sample data for training and testing by combining a Latin hypercube sampling method, which specifically comprises the following steps:
the principle of the Latin hypercube sampling method is as follows: the value range of the samples is divided into N equal parts according to the number of the samples, and one sample is selected from each part, so that the samples can be distributed in the whole sample space and have certain randomness.
The load data of a typical operation mode of a power grid is taken as a reference, the random load fluctuation is 80-120%, and in order to ensure the performance and generalization effect of the method, disturbance is added to the original data, and finally N sample data are generated.
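To make this step concrete, here is a minimal Latin hypercube sampling sketch in Python; it draws load scaling factors in the 80%-120% band around a reference load vector. The function name, the NumPy implementation and the seed handling are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

def lhs_load_samples(base_loads, n_samples, low=0.8, high=1.2, seed=0):
    """Latin hypercube sampling of load scaling factors in [low, high].

    base_loads : 1-D array of the reference-mode loads.
    Returns an (n_samples, len(base_loads)) array of perturbed load data.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base_loads, dtype=float)
    n_loads = base.size
    # Divide [0, 1) into n_samples equal strata and draw one point per stratum,
    # independently for every load, then shuffle the strata within each load.
    edges = np.arange(n_samples) / n_samples
    u = edges[:, None] + rng.random((n_samples, n_loads)) / n_samples
    for j in range(n_loads):
        rng.shuffle(u[:, j])
    factors = low + (high - low) * u
    return base[None, :] * factors
```

For the IEEE39 embodiment described later, base_loads would be the reference-mode loads of the 39-bus system and n_samples = 15000, after which the samples are split into training and test sets.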
Further, the step S2 includes: determining all feasible single control actions in the power grid model, numbering the control actions, and setting the control actions as an action space, specifically:
selecting all feasible single control actions in the power grid model, and setting the control actions as an action space A, wherein the action space A comprises the output action a of the generatorGTransformer tap action aTAnd reactive compensation action aCThe output action of the generator can be divided into two states of + △ and- △, wherein △ represents the adjusting amplitude of the power of the generator, the tap action of the transformer can be divided into two states of ascending one gear and descending one gear, and the reactive compensation action comprises two states of switching in and switching out, namely:
A={aG,aT,aC}
in addition, for convenience of expression, all the single control actions in the power grid model need to be numbered and mapped with the adjustment strategy.
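As an illustration of this numbering, a small look-up-table builder is sketched below; the device lists, the tuple encoding and the step size delta are assumptions used only for this example, not values defined in the patent.

```python
from itertools import count

def build_action_space(generators, transformers, shunts, delta=0.05):
    """Enumerate and number every feasible single control action (step S2).

    Each entry maps an integer action index to a (device, kind, value) tuple,
    e.g. ('G1', 'P', +delta) for raising a generator output by delta,
    ('T1', 'tap', +1) for moving a transformer tap up one position, and
    ('C1', 'shunt', 1) / ('C1', 'shunt', 0) for switching compensation in or out.
    """
    actions = {}
    idx = count()
    for g in generators:                       # generator output actions a_G: +delta / -delta
        actions[next(idx)] = (g, 'P', +delta)
        actions[next(idx)] = (g, 'P', -delta)
    for t in transformers:                     # transformer tap actions a_T: up / down one position
        actions[next(idx)] = (t, 'tap', +1)
        actions[next(idx)] = (t, 'tap', -1)
    for c in shunts:                           # reactive compensation a_C: switch in / out
        actions[next(idx)] = (c, 'shunt', 1)
        actions[next(idx)] = (c, 'shunt', 0)
    return actions
```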
Further, the step S3 includes: initializing a power grid model, judging whether an untrained sample exists, if so, assigning load data in the sample to the power grid model, performing convergence optimization processing on output data of a generator in a current operation mode, and if not, terminating training, specifically:
When the total change in the load data (i.e., the difference between the target-mode load data and the original-mode load data) is too large, the original mode may lie outside the feasible region of the target mode. In that case, whatever action is selected and executed in the original mode, the adjusted state data may fail to satisfy the constraint conditions and only a large penalty is obtained; the mode then cannot be adjusted effectively into the feasible region, which may ultimately make the method fail to converge. Therefore the method performs "convergence optimization processing on the generator output of the current power grid operation mode": the total change in the load data is obtained and distributed evenly to each generator. The agent then starts training from an initial operation mode closer to the feasible region of the target mode, which improves the convergence of the method, shortens the number of steps needed to explore towards the target mode, improves efficiency, and prevents the generator output from being concentrated on only a few generators.
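A minimal sketch of this convergence pre-processing, assuming NumPy arrays of generator outputs and of the target- and reference-mode loads (the function name is illustrative):

```python
import numpy as np

def pre_adjust_generation(gen_output, target_loads, base_loads):
    """Convergence pre-processing of step S3: the total change in load between the
    target mode and the reference mode is spread evenly over all generators, so
    training starts closer to the feasible region of the target mode."""
    gen_output = np.asarray(gen_output, dtype=float)
    delta_total = float(np.sum(target_loads) - np.sum(base_loads))
    return gen_output + delta_total / gen_output.size
```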
Further, the step S4 includes: carrying out load flow calculation, storing state data obtained after normalization processing calculation into a state vector s, and specifically comprising the following steps:
in order to ensure that the dimensions of each index can be unified, normalization processing needs to be carried out on related data:
η_k = (x_k − x_k,min) / (x_k,max − x_k,min)
where η_k denotes the result after data normalization; x_k denotes the k-th data value of the grid mode state data; n denotes the number of data items; x_k,max and x_k,min denote the upper and lower limits of this data item.
In the case where structural parameters and load conditions of the power grid are already given, the state vector s is expressed as:
s = {P_G, V, P_line, T_p_pos}
where P_G represents the generator power in the current state; V represents the node voltages; P_line represents the line active power; T_p_pos represents the transformer tap positions.
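The normalisation and the assembly of the state vector s = {P_G, V, P_line, T_p_pos} could look as follows; the `limits` dictionary of per-quantity bounds is an assumed interface, not something specified in the patent.

```python
import numpy as np

def normalise(x, x_min, x_max):
    """Min-max normalisation: eta_k = (x_k - x_k,min) / (x_k,max - x_k,min)."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def build_state(p_g, v, p_line, tap_pos, limits):
    """Assemble the state vector from power-flow results; `limits` holds the
    (min, max) pair used to normalise each block of quantities."""
    parts = []
    for name, values in (('P_G', p_g), ('V', v), ('P_line', p_line), ('T_pos', tap_pos)):
        lo, hi = limits[name]
        parts.append(normalise(values, lo, hi))
    return np.concatenate(parts)   # concatenated blocks form the state fed to the Q network
```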
Further, the step S5 includes: the method comprises the following steps of building a deep neural network, training, fitting various data in the current power grid state s and action values of various adjustment actions in an action space, and specifically:
and fitting various data in the current power grid state s and action values of various adjustment actions in an action space by using a deep neural network to approximate a value function in reinforcement learning, wherein a state characteristic vector formed by power grid data such as generator output, node voltage, line power and the like is used as input of the deep neural network, and the action values of the discretization adjustment actions are output.
The deep neural network can approximate the function without depending on any analytical equation and automatically learn the low-dimensional feature representation of the high-dimensional data; meanwhile, the method has strong growth performance, and continuous improvement and continuous updating are realized only by adjusting network parameters so as to achieve the optimal approximate effect; it is also possible to quickly give an output from an input. In the deep Q network method, a Q value function in Q learning is approximated by a deep neural network, and the formula of the Q value function is updated as follows:
Q(s, a) ← Q(s, a) + α·[r + γ·max_a' Q(s', a') − Q(s, a)]
where s' represents the next state, α represents the learning rate, and γ the discount factor.
In the method of the invention, a deep neural network is built with the Keras framework based on TensorFlow; it has a double-hidden-layer architecture comprising 1 input layer, 2 hidden layers and 1 output layer. The input of the deep neural network is the state quantities of the current power grid operation mode, comprising generator output, transformer tap positions, node voltages, line load rates and the switching states of reactive compensation, so that the total number of input-layer nodes is 116; the output-layer nodes correspond to the 82 discrete action values. Each hidden layer is set to 200 nodes, the ReLU function is selected as the activation function, the inter-layer weights ω are initialized from a normal distribution, and the initial bias b is set to 0.01. For the hyper-parameters, a value range can first be set and the hyper-parameters then optimized with an optimization method such as particle swarm optimization, taking the accuracy of the deep neural network as the criterion for judging hyper-parameter performance, so that the optimal hyper-parameters are found and the network achieves the best fitting effect.
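A hedged Keras/TensorFlow sketch of this architecture is given below: 116 inputs, two ReLU hidden layers of 200 nodes, 82 linear outputs, normally initialised weights and a bias of 0.01. The choice of the SGD optimiser follows step S8; its learning rate, the weight-initialisation standard deviation and the MSE loss are assumptions, since the patent only fixes the architecture.

```python
import tensorflow as tf

def build_q_network(n_inputs=116, n_actions=82, n_hidden=200):
    """Q-value network: one action value per discrete adjustment action."""
    init_w = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.1)  # stddev assumed
    init_b = tf.keras.initializers.Constant(0.01)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(n_hidden, activation='relu', input_shape=(n_inputs,),
                              kernel_initializer=init_w, bias_initializer=init_b),
        tf.keras.layers.Dense(n_hidden, activation='relu',
                              kernel_initializer=init_w, bias_initializer=init_b),
        tf.keras.layers.Dense(n_actions, activation='linear',
                              kernel_initializer=init_w, bias_initializer=init_b),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss='mse')
    return model
```

A network built this way can serve both as the current Q network and as the structurally identical target network used in step S8.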
Further, the step S6 includes: selecting an adjustment action a from the action space according to the greedy strategy and executing it, then performing load flow calculation to obtain a new state vector s', specifically:
To ensure the convergence of the method, a learning-rate-decay approach is adopted in the training process; it raises the learning speed in the early stage of training and the evaluation accuracy in the later stage. That is, the exploration rate in the greedy strategy should be dynamically adjusted, gradually decreasing within the interval [ε_min, ε_ini] as the iterations proceed.
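One way to realise the decaying exploration rate and the greedy selection of step S6 is sketched below; the initial and minimum exploration rates, the decay horizon and the linear schedule are illustrative assumptions, since the patent only states that the rate falls within [ε_min, ε_ini].

```python
import numpy as np

def epsilon(step, eps_ini=1.0, eps_min=0.05, decay_steps=10000):
    """Linearly decay the exploration rate from eps_ini to eps_min (values assumed)."""
    frac = min(step / decay_steps, 1.0)
    return eps_ini + frac * (eps_min - eps_ini)

def select_action(q_network, state, step, n_actions=82, rng=None):
    """Epsilon-greedy action selection over the discrete action space."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < epsilon(step):
        return int(rng.integers(n_actions))          # explore: random action index
    q_values = q_network.predict(state[None, :], verbose=0)
    return int(np.argmax(q_values[0]))               # exploit: highest action value
```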
Further, the step S7 includes: judging whether the state s' meets the constraint conditions; if so, giving a reward r according to the reward function and storing the data into the memory unit D as the vector (s, a, r, s'); if not, giving a penalty, specifically:
The automatic adjustment of the power grid operation mode is in essence an optimization problem: an operation mode that satisfies all constraint conditions is sought by adjusting the available control variables (such as generator output and transformer taps), with the following adjustment targets:
(1) the average fluctuation of the system node voltage is minimized.
min f_V = (1/N_1)·Σ_{k=1}^{N_1} |V_k − V_k,base|
(2) The utilization rate of the system line load is maximized.
max f_line = (1/N_2)·Σ_{k=1}^{N_2} P_line,k / P_line,k,lim
(3) The cost of generating electricity by the generator is minimized.
min f_cost = Σ_{k=1}^{N_3} [F(P_G,k) + S_k·u_k]
F(P_G,k) = m_k·P_G,k² + n_k·P_G,k + l_k
Where N_1 represents the number of power grid nodes; N_2 represents the number of power grid lines; N_3 represents the total number of generator sets; V_k represents the per-unit voltage of node k in the current state, obtained through load flow calculation; V_k,base represents the reference per-unit value of node k; P_line,k represents the active power of line k in the current state; P_line,k,lim represents the upper active power limit of line k; F(P_G,k) represents the generation cost of the generator set; S_k represents the start-up and shut-down cost of the generator set; u_k indicates the change control quantity of the generator set start/stop state: u_k = 1 when the start/stop state of the set changes, otherwise u_k = 0; m_k, n_k and l_k are the cost coefficients of the generator set.
The constraint conditions are the same as those of the optimal power flow and comprise equality constraints and inequality constraints. The operation mode obtained by the adjustment must satisfy the basic power flow equations, i.e. the equality constraints. The inequality constraints include: upper and lower limits on the active power output of the generators, the adjustment range of the transformer tap positions, upper and lower limits on the node voltage magnitudes, the maximum current or apparent power through a transmission line or transformer element, and the maximum active or reactive power flow through a line.
In correspondence with the control objectives, the single step rewards earned in the exploration and training of the agent should include three aspects involved in adjusting the objectives: average fluctuations in node voltage, load safety margins for fragile lines in the system, and power generation costs for the generator set. And forming a comprehensive reward function by linear weighting of the three indexes, and defining the reward r obtained after selecting the action a in a given state s as:
r = λ·r_V + ω·r_line + (1 − λ − ω)·r_cost, when the state s' obtained after action a satisfies the constraint conditions
r = r_done, when the state s' violates the constraint conditions
where r_V, r_line and r_cost are the reward terms for the node-voltage fluctuation, the line-load safety margin and the generation cost, respectively (their detailed expressions are given as equation images in the original publication); V_k, V_k,base, P_line,k, P_line,base and p_G,k are the normalized data; λ and ω respectively represent the reward weights considering the voltage stability index and the line-load safety margin index, with λ, ω ∈ (0,1) and λ + ω ∈ (0,1); r_done is a negative constant representing a large penalty.
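A sketch of such a composite reward is shown below. Because the component expressions appear only as equation images in the original, the negated-deviation forms used for r_V, r_line and r_cost are one plausible reading, and the default weights and penalty value are assumptions; all inputs are taken to be normalised.

```python
import numpy as np

def reward(v, v_base, p_line, p_line_base, gen_cost,
           lam=0.4, omega=0.4, feasible=True, r_done=-10.0):
    """Composite single-step reward: linear weighting of the voltage-fluctuation,
    line-load-margin and generation-cost terms, with a fixed penalty r_done
    when the adjusted state violates the constraints (forms are assumptions)."""
    if not feasible:
        return r_done
    v, v_base = np.asarray(v, dtype=float), np.asarray(v_base, dtype=float)
    p_line, p_line_base = np.asarray(p_line, dtype=float), np.asarray(p_line_base, dtype=float)
    r_v = -float(np.mean(np.abs(v - v_base)))             # voltage-fluctuation term
    r_line = -float(np.mean((p_line - p_line_base) ** 2))  # line-load safety-margin term
    r_cost = -float(np.sum(gen_cost))                      # generation-cost term
    return lam * r_v + omega * r_line + (1.0 - lam - omega) * r_cost
```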
Further, the step S8 includes: sampling a plurality of samples from a memory unit D to train a deep neural network, and updating a parameter theta of the deep neural network by using a random gradient descent method, wherein the method specifically comprises the following steps:
In addition to approximating the Q function with a deep neural network, the deep Q network establishes another, structurally identical network for generating the Q value of the target state. The agent updates the neural network parameters θ by minimizing the mean square error between the Q function value of the current state and that of the target state. Moreover, after every N rounds of iteration, the parameters of the current-state Q-value network are copied to the target-state Q-value network, and an action strategy achieving the expected target is finally obtained in the process of continuous cyclic training. It is worth mentioning that, to alleviate the instability caused by representing the value function with a nonlinear network, an experience replay mechanism is adopted in the deep Q network: at each time step t, the sample e = (s, a, r, s') generated by the interaction is stored in the memory unit D, and during training a small batch of samples is randomly drawn from D and added to the training set each time. This reduces the correlation between samples and improves the stability of the method.
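The experience replay memory, the mini-batch update of θ and the periodic copy to the target network can be sketched as follows; the memory size, batch size, discount factor γ and synchronisation period N are assumed values, and `q_net`/`target_net` are two structurally identical Keras models such as the one sketched above.

```python
import random
from collections import deque
import numpy as np

class ReplayDQN:
    """Experience replay plus a separate target network (step S8); settings assumed."""

    def __init__(self, q_net, target_net, memory_size=20000,
                 batch_size=32, gamma=0.95, sync_every=200):
        self.q_net, self.target_net = q_net, target_net
        self.memory = deque(maxlen=memory_size)          # memory unit D
        self.batch_size, self.gamma, self.sync_every = batch_size, gamma, sync_every
        self.updates = 0

    def store(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))

    def train_on_minibatch(self):
        if len(self.memory) < self.batch_size:
            return
        batch = random.sample(self.memory, self.batch_size)
        s = np.array([e[0] for e in batch])
        a = np.array([e[1] for e in batch])
        r = np.array([e[2] for e in batch])
        s_next = np.array([e[3] for e in batch])
        q_target = self.q_net.predict(s, verbose=0)
        q_next = self.target_net.predict(s_next, verbose=0)     # target-state Q values
        q_target[np.arange(self.batch_size), a] = r + self.gamma * q_next.max(axis=1)
        self.q_net.fit(s, q_target, epochs=1, verbose=0)        # gradient step on theta
        self.updates += 1
        if self.updates % self.sync_every == 0:                  # copy theta to target network
            self.target_net.set_weights(self.q_net.get_weights())
```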
Further, the step S9 includes: judging whether the state S' meets the termination condition, if so, returning to S3, and if not, returning to S5, specifically:
Because the change of the target mode is random and the optimal value of each performance index is difficult to calculate, the termination state of the agent's reinforcement learning is difficult to determine. Therefore, the method takes a performance-difference criterion under the variable reward function as the end condition of the agent's training: the difference between each item of data in the state s and the corresponding item in the state s' is taken to obtain the state change under a single action, and when this change is smaller than a set value, s' can be judged to be a terminal state.
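A one-line realisation of this criterion, with an illustrative threshold:

```python
import numpy as np

def is_terminal(s, s_next, tol=1e-3):
    """Performance-difference criterion of step S9: the episode ends when the largest
    element-wise change between s and s' under a single action falls below tol."""
    return float(np.max(np.abs(np.asarray(s_next) - np.asarray(s)))) < tol
```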
The IEEE39 node system will be described as an example. The IEEE39 node system is shown in fig. 1, and the method flow is shown in table 1.
Table 1 pseudo code of automatic optimization-approaching adjusting method for operation mode of deep Q network driven power system
The original load data of an IEEE39 node system is used as a reference, random load fluctuation is 80% -120%, in order to guarantee performance and generalization effect of the method, disturbance is added to the original data, 15000 sample data are finally generated, 10000 sample data are randomly selected to serve as a training set, and the rest 5000 sample data serve as a testing set.
In the training process, for convenience of description, all the single control actions in the example are numbered and form a mapping with the adjustment strategy, and the mapping relation is shown in table 2.
TABLE 2 action space LUT
The input of the Q-value network is the state quantities of the power grid in the current operation mode, comprising generator output, transformer tap positions, node voltages, line load rates and the switching states of reactive compensation, so that the total number of input-layer nodes is 116; the output-layer nodes correspond to the 82 discrete action values. A double-hidden-layer architecture is constructed, each hidden layer is set to 200 nodes, the ReLU function is selected as the activation function, the inter-layer weights ω are initialized from a normal distribution, and the initial bias b is set to 0.01. Furthermore, to ensure convergence, the exploration rate in the greedy strategy is dynamically adjusted, i.e. it gradually decreases within the interval [ε_min, ε_ini] as the iterations proceed. The hyper-parameter settings used in the iterative training are shown in table 3.
Table 3 intelligent agent parameters in the examples
10000 load data samples are used as the training set and 5000 as the test set. Every 100 iterations, the step length N_step required to adjust the initial operation mode to the target operation mode in the current iteration is recorded; at the 300th iteration, the accumulated reward values of the next 10 iterations are recorded and their average r_ave is calculated. After the 15000 load data samples have been trained and tested, the recorded convergence of the average accumulated reward and the distribution of the step length required in a single iteration are shown in fig. 2 and fig. 3, respectively.
When the 5000 samples of the test set are tested, the time consumed by the round of iteration and the step length N_step required to adjust the initial operation mode to the target operation mode are recorded every 100 iterations; the run-time test results of the test set and the step length required for a single iteration are shown in fig. 4 and fig. 5.
As can be seen from fig. 2 and fig. 3, the Q-value network and the method as a whole are convergent. Comparing fig. 4 and fig. 5 shows that the running time of the method is related to the step length required in a single iteration, and that automatic adjustment of the power grid operation mode can be achieved relatively quickly after sufficient offline training.
In addition, the method and the interior-point method for optimal power flow are both tested with the test-set samples, and the evaluation indexes of the target modes obtained after adjustment are calculated and compared. The control targets of the two adjustment methods are kept consistent; the evaluation indexes are selected to correspond to the objective functions of the method, and a voltage fluctuation index I_V, a line load utilization index I_line and a power generation cost index I_cost are defined as:
I_V = (1/N_1)·Σ_{k=1}^{N_1} |V_k − V_k,base|
I_line = (1/N_2)·Σ_{k=1}^{N_2} (P_line,k − P_line,base)²
I_cost = Σ_{k=1}^{N_3} F(P_G,k)
Here the index I_V reflects the voltage fluctuation before and after the operation-mode adjustment by calculating the average voltage change of each node, and the smaller the value the better; the index I_line evaluates the utilization rate of the line load by calculating the variance between each line load and a reference value, and the smaller the value the closer the line loads are to the reference value, i.e. the higher the line-load utilization while the line-load safety margin is preserved; the smaller I_cost is, the lower the generation cost of the current operation mode.
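Following the verbal definitions above (the exact expressions appear only as equation images in the original), the three evaluation indices can be computed as in the sketch below; the concrete forms are assumptions consistent with those definitions.

```python
import numpy as np

def evaluation_indices(v, v_base, p_line, p_line_base, gen_cost):
    """I_V: mean node-voltage deviation; I_line: variance of line loads around
    their reference values; I_cost: total generation cost (assumed forms)."""
    i_v = float(np.mean(np.abs(np.asarray(v) - np.asarray(v_base))))
    i_line = float(np.mean((np.asarray(p_line) - np.asarray(p_line_base)) ** 2))
    i_cost = float(np.sum(gen_cost))
    return i_v, i_line, i_cost
```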
Randomly selecting a sample in the test set to test the invention, and calculating and recording each evaluation index of the current operation mode in each step of iteration (namely after each action is executed), wherein the result is shown in table 3.
TABLE 3 evaluation index Change in operation mode adjustment
The evaluation indexes of the adjusted operation modes of 7 samples are selected from the test results for display; for the three samples numbered 2499, 2502 and 5000, the optimal power flow interior-point method did not converge, so their evaluation indexes are shown as "-".
TABLE 4 evaluation index of the operating mode adjusted
As can be seen from table 3, as the operation mode is adjusted step by step, the generator output gradually shifts towards the distribution with the minimum generation cost, the utilization rate of the line load gradually increases, and the node voltages also change, so that the average voltage fluctuation gradually increases; the trend of the change is the same as that of the three indexes in table 3. Table 4 shows that, when facing the multi-objective operation-mode adjustment problem, the convergence of the method is significantly better than that of the optimal power flow method.

Claims (10)

1. The automatic optimization-seeking adjusting method of the operation mode of the power system driven by the deep Q network is characterized by comprising the following steps of:
s1: determining a load fluctuation range by taking a typical operation mode as an adjustment reference mode, and generating a large amount of target mode sample data for training and testing by combining a Latin hypercube sampling method;
s2: determining all feasible single control actions in the power grid model, numbering the control actions, and setting the control actions as an action space;
s3: initializing a power grid model, judging whether an untrained sample exists, if so, assigning load data in the sample to the power grid model, performing convergence optimization processing on output data of a generator in a current operation mode, and if not, terminating training;
s4: carrying out load flow calculation, carrying out normalization processing calculation to obtain state data, and storing the state data into a state vector s;
s5: building a deep neural network and training, and fitting various data in the current power grid state s and action values of various adjustment actions in an action space;
s6: selecting an adjusting action a from the action space according to a greedy strategy and executing it, and performing load flow calculation to obtain a new state vector s';
s7: judging whether the state s' meets the constraint conditions; if so, giving a reward r according to the reward function and storing the data into the memory unit D as the vector (s, a, r, s'); if not, giving a penalty;
s8: sampling a plurality of samples from the memory unit D to train a deep neural network, and updating a parameter theta of the deep neural network by using a random gradient descent method;
s9: it is determined whether the state S' satisfies the termination condition, and if so, the process returns to S3, and if not, the process returns to S5.
2. The method of claim 1, wherein the method comprises the steps of,
the step S1 specifically includes:
the Latin hypercube sampling method comprises the following steps: dividing the value range of the samples into N equal parts according to the number of the samples, and selecting one sample in each part to enable the sample to be distributed in the whole sample space and have certain randomness;
the load data of a typical operation mode of a power grid is taken as a reference, the random load fluctuation is 80% -120%, disturbance is added to the original data, and finally N sample data are generated.
3. The method of claim 2, wherein the method comprises the steps of,
the step S2 specifically includes:
selecting all feasible single control actions in the power grid model and setting them as the action space A, where A comprises the generator output action a_G, the transformer tap action a_T and the reactive compensation action a_C; the generator output action is divided into the two states +Δ and −Δ, where Δ represents the adjustment step of the generator power; the transformer tap action is divided into the two states of moving up one position and moving down one position; the reactive compensation action comprises the two states of switched in and switched out, namely:
A = {a_G, a_T, a_C}
and numbering all single control actions in the power grid model, and forming mapping with the adjustment strategy.
4. The method of claim 3, wherein the method comprises the steps of,
the step S3 specifically includes:
carrying out convergence optimization processing on the output of the generator in the current power grid operation mode: the total variation of the load data is obtained, and the variation is uniformly distributed to each generator; at this point, the agent may obtain an initial operating mode that is closer to the target mode feasible region to begin training.
5. The method of claim 4, wherein the method comprises the steps of,
the step S4 specifically includes:
The related data are normalized:
η_k = (x_k − x_k,min) / (x_k,max − x_k,min)
where η_k denotes the result after data normalization; x_k denotes the k-th data value of the grid mode state data; n denotes the number of data items; x_k,max and x_k,min denote the upper and lower limits of this data item;
in the case where structural parameters and load conditions of the power grid are already given, the state vector s is expressed as:
s = {P_G, V, P_line, T_p_pos}
where P_G represents the generator power in the current state; V represents the node voltages; P_line represents the line active power; T_p_pos represents the transformer tap positions.
6. The method of claim 5, wherein the method comprises the steps of,
the step S5 specifically includes:
fitting various data in the current power grid state s and action values of various adjustment actions in an action space by using a deep neural network to approximate a value function in reinforcement learning, wherein a state feature vector consisting of generator output, node voltage and line power grid data is used as input of the deep neural network, and the action values of discretization adjustment actions are output;
The Q-value function in Q-learning is approximated by the deep neural network, and the formula for updating the Q-value function becomes:
Q(s, a) ← Q(s, a) + α·[r + γ·max_a' Q(s', a') − Q(s, a)]
where α represents the learning rate and γ the discount factor;
a deep neural network is built with the Keras framework based on TensorFlow; it has a double-hidden-layer architecture comprising 1 input layer, 2 hidden layers and 1 output layer; the input of the deep neural network is the state quantities of the current power grid operation mode, comprising generator output, transformer tap positions, node voltages, line load rates and the switching states of reactive compensation, so that the total number of input-layer nodes is 116; the output-layer nodes correspond to the 82 discrete action values; each hidden layer is set to 200 nodes, the ReLU function is selected as the activation function, the inter-layer weights ω are initialized from a normal distribution, and the initial bias b is set to 0.01; for the hyper-parameters, a value range can first be set and the hyper-parameters then optimized with a particle swarm method, taking the accuracy of the deep neural network as the criterion for judging hyper-parameter performance, so that the optimal hyper-parameters are found and the network achieves the best fitting effect.
7. The method of claim 6, wherein the method comprises the steps of,
the step S6 specifically includes:
the learning-rate-decay approach is adopted in the training process, which can raise the learning speed in the early stage of training and the evaluation accuracy in the later stage; that is, the exploration rate in the greedy strategy should be dynamically adjusted, gradually decreasing within the interval [ε_min, ε_ini] as the iterations proceed.
8. The method of claim 7, wherein the method comprises the steps of,
the step S7 specifically includes:
the operation mode which can meet all the constraint conditions is found by adjusting available control variables, and the adjustment targets are as follows:
(1) minimizing the average fluctuation of the system node voltage;
min f_V = (1/N_1)·Σ_{k=1}^{N_1} |V_k − V_k,base|
(2) maximizing the utilization rate of the system line load;
max f_line = (1/N_2)·Σ_{k=1}^{N_2} P_line,k / P_line,k,lim
(3) the power generation cost of the generator is minimized;
min f_cost = Σ_{k=1}^{N_3} [F(P_G,k) + S_k·u_k]
F(P_G,k) = m_k·P_G,k² + n_k·P_G,k + l_k
where N_1 represents the number of power grid nodes; N_2 represents the number of power grid lines; N_3 represents the total number of generator sets; V_k represents the per-unit voltage of node k in the current state, obtained through load flow calculation; V_k,base represents the reference per-unit value of node k; P_line,k represents the active power of line k in the current state; P_line,k,lim represents the upper active power limit of line k; F(P_G,k) represents the generation cost of the generator set; S_k represents the start-up and shut-down cost of the generator set; u_k indicates the change control quantity of the generator set start/stop state: u_k = 1 when the start/stop state of the set changes, otherwise u_k = 0; m_k, n_k and l_k are the cost coefficients of the generator set;
the constraint conditions are the same as those of the optimal power flow and comprise equality constraints and inequality constraints; the operation mode obtained by the adjustment must satisfy the basic power flow equations, i.e. the equality constraints; the inequality constraints include: upper and lower limits on the active power output of the generators, the adjustment range of the transformer tap positions, upper and lower limits on the node voltage magnitudes, the maximum current or apparent power through a transmission line or transformer element, and the maximum active or reactive power flow through a line;
in correspondence with the control objectives, the single step rewards earned in the exploration and training of the agent should include three aspects involved in adjusting the objectives: average fluctuation of node voltage, load safety margin of a fragile line in the system and power generation cost of a generator set; and forming a comprehensive reward function by linear weighting of the three indexes, and defining the reward r obtained after selecting the action a in a given state s as:
r = λ·r_V + ω·r_line + (1 − λ − ω)·r_cost, when the state s' obtained after action a satisfies the constraint conditions
r = r_done, when the state s' violates the constraint conditions
where r_V, r_line and r_cost are the reward terms for the node-voltage fluctuation, the line-load safety margin and the generation cost, respectively (their detailed expressions are given as equation images in the original publication); λ and ω respectively represent the reward weights considering the voltage stability index and the line-load safety margin index, with λ, ω ∈ (0,1) and λ + ω ∈ (0,1); r_done is a negative constant.
9. The method of claim 8, wherein the method comprises the steps of,
the step S8 specifically includes:
the deep Q network also establishes another, identical network for generating the Q value of the target state; the agent updates the neural network parameters θ by minimizing the mean square error between the Q function value of the current state and that of the target state; in addition, after every N rounds of iteration, the parameters of the current-state Q-value network are copied to the target-state Q-value network, and an action strategy achieving the expected target is finally obtained in the process of continuous cyclic training; an experience replay mechanism is adopted in the deep Q network, that is, at each time step t, the sample e = (s, a, r, s') generated by the interaction is stored in the memory unit D, and during training a small batch of samples is randomly drawn from D and added to the training set each time.
10. The method of claim 9, wherein the step S9 specifically includes:
the performance difference criterion under the variable reward function is taken as the ending condition of agent training, namely, the difference between each item of data in state s and the corresponding item in state s' is computed to obtain the state change produced by a single action, and s' is judged to be a terminal state when this change is smaller than a set value.
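A minimal sketch of this termination test, assuming the state is a numeric vector and the threshold `eps` stands in for the set value:

```python
import numpy as np

def is_terminal(s, s_next, eps=1e-3):
    """True when a single action changes every state item by less than eps."""
    return np.max(np.abs(np.asarray(s_next) - np.asarray(s))) < eps
```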
CN202010478336.4A 2020-05-29 2020-05-29 Automatic optimization-seeking adjustment method for operation mode of deep Q network-driven power system Active CN111523737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010478336.4A CN111523737B (en) 2020-05-29 2020-05-29 Automatic optimization-seeking adjustment method for operation mode of deep Q network-driven power system


Publications (2)

Publication Number Publication Date
CN111523737A true CN111523737A (en) 2020-08-11
CN111523737B CN111523737B (en) 2022-06-28

Family

ID=71911232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010478336.4A Active CN111523737B (en) 2020-05-29 2020-05-29 Automatic optimization-seeking adjustment method for operation mode of deep Q network-driven power system

Country Status (1)

Country Link
CN (1) CN111523737B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108964042A (en) * 2018-07-24 2018-12-07 合肥工业大学 Regional power grid operating point method for optimizing scheduling based on depth Q network
CN110535146A (en) * 2019-08-27 2019-12-03 哈尔滨工业大学 The Method for Reactive Power Optimization in Power of Policy-Gradient Reinforcement Learning is determined based on depth

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGJUN GAO ET AL.: "Cutting planes based relaxed optimal power flow in active distribution systems", 《ELECTRIC POWER SYSTEMS RESEARCH》 *
JIAJUN DUAN ET AL.: "Deep-reinforcement-learning-based autonomous voltage control for power grid operations", 《IEEE TRANSACTIONS ON POWER SYSTEMS》 *
ZHU YILUN ET AL.: "A power grid power flow feature extraction method based on deep reinforcement learning", 《POWER SYSTEM AND CLEAN ENERGY》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112600221A (en) * 2020-12-08 2021-04-02 深圳供电局有限公司 Reactive compensation device configuration method, device, equipment and storage medium
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
CN112615379B (en) * 2020-12-10 2022-05-13 浙江大学 Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN112564189A (en) * 2020-12-15 2021-03-26 深圳供电局有限公司 Active and reactive power coordinated optimization control method
CN112798901A (en) * 2020-12-29 2021-05-14 成都沃特塞恩电子技术有限公司 Equipment calibration system and method
CN112798901B (en) * 2020-12-29 2023-01-10 成都沃特塞恩电子技术有限公司 Equipment calibration system and method
CN112818588A (en) * 2021-01-08 2021-05-18 南方电网科学研究院有限责任公司 Optimal power flow calculation method and device for power system and storage medium
CN112818588B (en) * 2021-01-08 2023-05-02 南方电网科学研究院有限责任公司 Optimal power flow calculation method, device and storage medium of power system
CN113315131A (en) * 2021-05-18 2021-08-27 国网浙江省电力有限公司 Intelligent power grid operation mode adjusting method and system
CN113268933A (en) * 2021-06-18 2021-08-17 大连理工大学 Rapid structural parameter design method of S-shaped emergency robot based on reinforcement learning
CN114355776A (en) * 2022-01-04 2022-04-15 神华神东电力有限责任公司 Control method and control system for generator set

Also Published As

Publication number Publication date
CN111523737B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN111523737B (en) Automatic optimization-seeking adjustment method for operation mode of deep Q network-driven power system
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN114362196B (en) Multi-time-scale active power distribution network voltage control method
CN112465664B (en) AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN103683337B (en) A kind of interconnected network CPS instruction dynamic assignment optimization method
CN104181900B (en) Layered dynamic regulation method for multiple energy media
CN104037761B (en) AGC power multi-objective random optimization distribution method
CN110414725B (en) Wind power plant energy storage system scheduling method and device integrating prediction and decision
CN105896575B (en) Hundred megawatt energy storage power control method and system based on self-adaptive dynamic programming
CN114784823A (en) Micro-grid frequency control method and system based on depth certainty strategy gradient
CN115907191B (en) Self-adaptive building photovoltaic epidermis model prediction control method
CN115986845A (en) Power distribution network double-layer optimization scheduling method based on deep reinforcement learning
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN111324167A (en) Photovoltaic power generation maximum power point tracking control method and device
CN116963461A (en) Energy saving method and device for machine room air conditioner
CN105207220B (en) A kind of tapping voltage regulation and control method based on progressive learning
CN114400675B (en) Active power distribution network voltage control method based on weight mean value deep double-Q network
CN115526504A (en) Energy-saving scheduling method and system for water supply system of pump station, electronic equipment and storage medium
CN111293703A (en) Power grid reactive voltage regulation and control method and system based on time sequence reinforcement learning
CN111563699B (en) Power system distribution robust real-time scheduling method and system considering flexibility requirement
CN115912367A (en) Intelligent generation method for operation mode of power system based on deep reinforcement learning
CN111082442B (en) Energy storage capacity optimal configuration method based on improved FPA
CN116755409B (en) Coal-fired power generation system coordination control method based on value distribution DDPG algorithm
CN113051774B (en) Model and data drive-based wind power plant generated power optimization method
CN112564133B (en) Intelligent power generation control method based on deep learning full-state optimal feedback and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant