CN114707613A - Power grid regulation method based on a hierarchical deep policy gradient network - Google Patents
Power grid regulation method based on a hierarchical deep policy gradient network
- Publication number
- CN114707613A (application CN202210435606.2A)
- Authority
- CN
- China
- Prior art keywords
- power grid
- network
- action
- state
- regulation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
A power grid regulation method based on a hierarchical deep policy gradient network. A state representation vector and an action representation vector are designed for the power grid, and the action space is clustered so that each cluster contains the same number of actions. Based on a hierarchical policy gradient network, with the state representation vector as the network input, a grid regulation model is designed using a policy gradient algorithm. The model has two layers, each an independent policy gradient model: the first layer selects an action cluster, and the second layer then selects a specific action within that cluster, making decisions by two consecutive selections. Based on a discretized grid operation data set of the simulated grid environment, the model interacts with the simulated grid operating environment, obtaining the current state from it and handing the grid action to be executed back to it, thereby realizing grid regulation.
Description
Technical Field
The invention belongs to the technical field of smart grids, relates to artificial-intelligence-based enhancement of power grid flow regulation, and particularly relates to a power grid regulation method based on a hierarchical deep policy gradient network.
Background
The power grid is a core infrastructure of national economic operation and a high-dimensional, tightly coupled, complex dynamic system. By supplying reliable electricity to industry, services and consumers, it plays a central economic and social role. Grid operation, dispatching and regulation rely heavily on automatic safety-and-stability devices as the first line of defense; once that line fails, the final safety of the whole system rests on human dispatchers' experience-based cognitive decisions about the grid.
Traditional grid regulation systems adjust their control strategies slowly, and the strategy-setting cycle is long. Existing regulation strategies typically rest on grid security and stability analysis: through computer simulation, dispatchers must fully grasp the characteristics and rules of secure grid operation and quickly, accurately identify the grid's weak points so as to formulate fault strategies offline, which makes existing regulation highly dependent on human experience. With the continuing integration of renewable energy into modern grids, the complexity and time variability of grid operating modes keep increasing. Dispatchers can no longer master all the characteristic information and rules of secure operation, their capacity to handle the uncertainty introduced by renewables and other system emergencies is limited, and traditional regulation rules no longer fully apply, posing serious adaptability and robustness problems and increasing the operational risk of the grid.
Existing intelligent grid regulation methods mine the massive data and information generated in power system operation and management to extract key grid features that help dispatchers obtain key information more effectively, or build elaborate decision trees that assist decisions according to the grid's operating state. However, when the grid structure changes, the established regulation model must be redesigned and retrained; such methods cannot determine a regulation strategy from the overall condition of the grid, and the reliability and agility of global grid decisions are hard to guarantee. A grid regulation model with stronger generalization ability and higher efficiency is therefore urgently needed.
The document [Intelligent machine dispatcher for dispatching decisions [J]. Power System Technology, 2020, 44(1): 1-8] proposes a domain knowledge model for power grid dispatching operation.
The document [Lan T, Duan J, Zhang B, et al. AI-based autonomous line flow control via topology adjustment for maximizing time-series ATCs [C]// 2020 IEEE Power & Energy Society General Meeting (PESGM). IEEE, 2020: 1-5] proposes a method combining imitation learning and deep reinforcement learning, effectively improving the fault tolerance and robustness of the system.
The document [Kim B G, Zhang Y, Van Der Schaar M, et al. Dynamic pricing and energy consumption scheduling with reinforcement learning [J]. IEEE Transactions on Smart Grid, 2015, 7(5): 2187-2198] applies reinforcement learning to dynamic pricing and energy-consumption scheduling in the smart grid.
The document [Huang Tian'en, Sun Hongbin, Guo Qinglai, et al. [J]. Automation of Electric Power Systems, 2016, 40(4): 32-40] proposes an online distributed security feature selection method based on correlation grouping of grid characteristic quantities, adapted to power grid operation big data.
The document [Duan J, Shi D, Diao R, et al. Deep-reinforcement-learning-based autonomous voltage control for power grid operations [J]. IEEE Transactions on Power Systems, 2019, 35(1): 814-817] proposes a grid autonomous optimization control and decision framework with online learning, the "Grid Mind" system, which uses two DRL algorithms — deep Q-network (DQN) and deep deterministic policy gradient (DDPG) — to solve the automatic voltage control problem; the AI agent learns to set the output of each generating unit while respecting the grid's various constraints and practical limitations.
Research based on traditional machine learning algorithms therefore cannot cope with the complexity of grid operating modes or deliver the reliability and agility that global grid decisions require, and deep reinforcement learning has become an effective approach to the grid regulation problem. Accordingly, for the high-dimensional continuous state space and high-dimensional discrete action space of the power grid, the invention proposes a more effective decision method that addresses the efficiency of training and exploration of deep learning models in grid regulation and improves performance in practical grid applications.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a power grid regulation method based on a hierarchical deep policy gradient network. Through interaction between a deep reinforcement learning agent and a simulated grid environment, it learns a large amount of grid regulation knowledge and the mapping between grid states and regulation actions, provides a feasible means for real-time grid regulation, and designs the algorithm specifically for the high-dimensional state and action spaces of this complex problem.
To achieve this purpose, the invention adopts the following technical scheme.
A power grid regulation method based on a hierarchical deep policy gradient network comprises the following steps:
Step 1: acquire power grid information and construct the state space and the action space, both consisting of continuous and discrete space variables; the continuous variables of the state space include time, generator power and terminal voltage, load power, node voltage, and line power flow values and voltages, while the discrete variables comprise the network topology; the continuous variables of the action space include generator output adjustment and load power adjustment, while the discrete variables comprise the on/off states of transmission lines and the connection topology of the double busbars and the components within substation nodes;
Step 2: cluster the action space so that each cluster contains the same number of actions;
Step 3: design a state representation vector S and an action representation vector A for the power grid;
Step 4: design the grid regulation model based on a hierarchical policy gradient network; the model has two layers, each an independent policy gradient network, with the state representation vector S as the input of each layer; the model is trained with a policy gradient algorithm and makes two consecutive selections, the first layer selecting an action cluster and the second layer selecting a specific action within that cluster, so that the probability of outputting a specific grid action A_t in a given state S_t is the product of the probabilities of the two selections;
Step 5: simulate the grid operating environment from a discretized grid operation data set and let the grid regulation model interact with it; the model obtains the current state from the simulated environment and determines the final action to execute, the simulated environment executes that action, realizing grid regulation, and feeds back an immediate reward, and the grid state, the regulation action and the reward obtained through feedback are combined and collected as experience sample data;
Step 6: estimate the value of actions from the collected experience sample data and the returned rewards, update the network parameters, and return to step 5, continuously interacting with the simulated grid operating environment until the grid regulation model is trained.
In step 2, a simulation-based exploration mechanism is introduced to reduce the dimensionality of the action space. In the reduced action space, the grid state information before and after each grid action is executed in the grid environment — namely the current value in each transmission line of the grid — is taken as the feature vector representing that action, and the feature vectors are then clustered.
The clustering uses the K-means algorithm: the feature vectors of K grid actions are first selected at random from the action space as the initial cluster centers; for each remaining feature vector, the distance to every cluster center is computed and the vector is assigned to the nearest one; the cluster centers are then updated iteratively until clusters of equal size are obtained, i.e., objects within the same cluster are highly similar while objects in different clusters have low similarity.
In step 3, the components contained in the power grid — substation nodes, generator nodes and load nodes — and the transmission lines are numbered and identified by those numbers; the variables contained in the components and transmission lines then form the one-dimensional state representation vector S.
The specific power increase/decrease values of generator output adjustment and load power adjustment are placed at the corresponding numbered positions of a one-dimensional action vector; the on/off switching of a transmission line is represented by 1 and 0; and the connection state of each component to the double busbars within a substation node is represented by 0, 1 and 2, where 0 means the component is disconnected from all busbars, 1 means it is connected to busbar No. 1, and 2 means it is connected to busbar No. 2. This yields the action representation vector A.
In step 4, the current state representation vector S_t is used as the input of each layer's policy gradient network. The policy is initialized as θ = (θ1, θ2), where θ1 and θ2 are the parameter vectors of the target policies of the first-layer and second-layer policy gradient networks, respectively. p_t denotes the path at time step t from the state input of the first-layer network to the target policy output of the second-layer network; the path consists of two selections, each first-layer selection being represented by an integer from 1 to c1 and each second-layer selection by an integer from 1 to c2, where c1 is the number of clusters after action clustering and c2 is the number of specific actions within a cluster.
In step 5, the discounted return is computed from the obtained rewards:
Q̂(S_t, A_t) = Σ_{k=0}^{n-t-1} γ^k r_{t+k}
and the policy function is calculated as the product of the two selections:
π_θ(A_t | S_t) = π_{θ1}(p_t1 | S_t) · π_{θ2}(p_t2 | S_t, p_t1)
The network parameters are updated, the network's loss function being:
Δθ = (1/n) Σ_{t=1}^{n} Q̂(S_t, A_t) ∇_θ log π_θ(A_t | S_t)
where Q̂(S_t, A_t) is the value estimate of the grid action A_t selected after the policy network's output for the current state representation vector S_t; γ ∈ [0, 1] is the discount reward coefficient; n is the length of one sequence, i.e., the number of samples; θ is the policy gradient network parameter vector; ∇_θ log π_θ(A_t | S_t) is the gradient of the policy network's log-output at the current input; S_t and A_t are the state and action representation vectors at moment t; and π_θ(A_t | S_t) is the policy network's output for the current state representation vector S_t.
The parameters of the policy gradient network are updated as:
θ = θ + αΔθ
where α ∈ [0, 1] is the update step size, i.e., the learning rate.
Compared with the prior art, the method learns a large amount of grid regulation knowledge and the mapping between grid states and regulation actions, providing a feasible means for real-time grid regulation; its design strongly affects training and convergence speed in high-dimensional spaces, and theory and experiments show that it is applicable to real, complex grid regulation scenarios.
Drawings
FIG. 1 is an overall flow diagram of the present invention.
Fig. 2 is a schematic diagram of the numbering of the power grid structure in the embodiment of the present invention.
Fig. 3 is a structural diagram of the power grid regulation model designed based on the hierarchical policy gradient network in the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in Fig. 1, the present invention is a power grid regulation method based on a hierarchical deep policy gradient network, comprising the following steps.
Step 1: acquire power grid information and construct the state space and the action space.
Both the state space and the action space of the power grid consist of continuous and discrete space variables. In general, the continuous variables of the state space include time, generator power and terminal voltage, load power, node voltage, line power flow values and voltages, and the like, while the discrete variables mainly comprise the network topology. The continuous variables of the action space include generator output adjustment, load power adjustment and the like, while the discrete variables comprise the on/off states of transmission lines, the connection topology of the double busbars and the components within substation nodes, and the like.
Step 2: cluster the action space so that each cluster contains the same number of actions.
The action space of the power grid contains a large number of topology-adjustment actions with no practical significance. In one embodiment of the invention, a simulation-based exploration mechanism is therefore introduced to reduce the dimensionality of the action space. Concretely, each scenario in the grid seed data set is simulated (the data set contains discretized grid-operation seed data for different years, months and dates, each scenario being a different operating scenario), and the actions in the action space are executed in turn; each action's ability to resolve faults is recorded and quantized as the immediate reward obtained, and the cycle (state input, action selection, action execution, reward feedback, next state) is repeated until the number of explored grid scenarios reaches a proportion n (a hyper-parameter between 0 and 1) of the number of scenarios in the training data set. If an action's average reward value is negative, its potential value is considered negative and the action is deleted from the action space, realizing the dimensionality reduction. This simplifies the action space and improves the efficiency of network exploration.
In the dimension-reduced action space, the grid state information before and after each grid action is executed in the grid environment — namely the current value in each transmission line of the grid — is used as the feature vector representing that action. Clustering these feature vectors then groups actions that resolve faults of the grid environment in similar ways into the same cluster, forming action classes.
Illustratively, the clustering of the invention uses the K-means algorithm: the feature vectors of K grid actions are first selected at random from the action space as the initial cluster centers; for each remaining feature vector, the distance to every cluster center is computed and the vector is assigned to the nearest one; the cluster centers are then updated iteratively until clusters of equal size are obtained, i.e., objects within the same cluster are highly similar while objects in different clusters have low similarity. A sketch of one possible realization follows.
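Standard K-means does not guarantee equal cluster sizes, so the sketch below adds a per-cluster capacity to the assignment step; this capacity-constrained variant is one illustrative way to obtain equally sized clusters and is not prescribed by the patent.

```python
import numpy as np

def equal_size_kmeans(X, k, n_iter=50, seed=0):
    """Cluster action feature vectors X (shape [N, D]) into k clusters of
    (nearly) equal size: each point goes to its nearest centroid that still
    has capacity, then centroids are recomputed."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    capacity = int(np.ceil(len(X) / k))                     # equal-size constraint
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        counts = np.zeros(k, dtype=int)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # assign the most "decided" points first (smallest nearest-distance)
        for i in np.argsort(dists.min(axis=1)):
            for c in np.argsort(dists[i]):                  # nearest feasible cluster
                if counts[c] < capacity:
                    labels[i], counts[c] = c, counts[c] + 1
                    break
        centers = np.stack([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels, centers
```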
Step 3: design a state representation vector S and an action representation vector A for the power grid.
For the specific grid structure to which the method is applied, as shown in Fig. 2, the substation nodes, generator nodes, load nodes, transmission lines and other elements contained in the grid are counted and numbered, and those numbers identify the components (substation nodes, generator nodes, load nodes and the like) and the transmission lines. The variables contained in the components and transmission lines are then placed at the appropriate positions to form the one-dimensional state representation vector S; for example, a generator node contributes its generated power and terminal voltage variables, a load node its load power variable, and the substations and transmission lines the topology represented by the number connections (a sketch of this assembly follows the component list below). The specific power increase/decrease values of generator output adjustment and load power adjustment are placed at the corresponding numbered positions of a one-dimensional action vector; the on/off switching of a transmission line is represented by 1 and 0; and the connection state of each component to the double busbars within a substation node is represented by 0, 1 and 2, where 0 means the component is disconnected from all busbars, 1 means it is connected to busbar No. 1, and 2 means it is connected to busbar No. 2. This yields the action representation vector A.
The components of the state are explained as follows:
Time: the real time of grid operation, specifically the year, month, day and time of day;
Generated power of the generators: at the current time, the active power P and reactive power Q produced by each generator;
Terminal voltage: at the current time, the outlet voltage of each generator;
Load power: at the current time, the total power (active and reactive) of each load node (e.g., an electricity-consuming region aggregated as a whole);
Node voltage: at the current time, the voltage value of each substation node;
Line current value and voltage: at the current time, the current in each transmission line and the voltages at its two ends;
Network topology: at the current time, the connection relationships and states of all components in the grid.
Step 4: design the grid regulation model based on a hierarchical policy gradient network. The model has two layers, each an independent policy gradient network, with the state representation vector S as the input of each layer (optionally preceded by data preprocessing such as normalization). The model is trained with a policy gradient algorithm and makes two consecutive selections: the first layer selects an action cluster, and the second layer selects a specific action within that cluster, so that the probability of outputting a specific grid action A_t in a given state S_t is the product of the probabilities of the two selections.
The grid regulation model thus makes two policy selections: the first selects the class A(s) in which the action lies, and the second selects a specific action A from the cluster chosen by the first. Fig. 3 shows the hierarchical grid regulation model, in which the primary network is the first layer of the model and the secondary network is the second layer.
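A minimal PyTorch sketch of such a two-level policy is given below. Conditioning the second network on a one-hot encoding of the chosen cluster, and the layer sizes, are illustrative design choices for the sketch rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    """Primary net picks a cluster i in {0..c1-1}; secondary net picks an
    action j in {0..c2-1} inside it. The joint log-probability is the sum
    of the two log-probabilities (i.e., the log of their product)."""

    def __init__(self, state_dim, c1, c2, hidden=256):
        super().__init__()
        self.c1 = c1
        self.primary = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, c1))
        self.secondary = nn.Sequential(
            nn.Linear(state_dim + c1, hidden), nn.ReLU(), nn.Linear(hidden, c2))

    def forward(self, s):                      # s: (batch, state_dim)
        d1 = torch.distributions.Categorical(logits=self.primary(s))
        i = d1.sample()                        # selected cluster
        onehot = nn.functional.one_hot(i, self.c1).float()
        d2 = torch.distributions.Categorical(
            logits=self.secondary(torch.cat([s, onehot], dim=-1)))
        j = d2.sample()                        # selected action within the cluster
        return i, j, d1.log_prob(i) + d2.log_prob(j)
```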
Step 5: simulate the grid operating environment from the discretized grid operation data set and let the grid regulation model interact with it. The model obtains the current state from the simulated environment and determines the final action to execute; the simulated environment executes that action, realizing grid regulation, and feeds back an immediate reward; the grid state, the regulation action and the reward obtained through feedback are combined and collected as experience sample data.
The goal of network training is to maximize the expected discounted cumulative reward, i.e.
J(θ) = E_{π_θ} [ Σ_t γ^t r_t ]
The gradient with respect to the parameters is computed as:
∇_θ J(θ) = E_{π_θ} [ ∇_θ log π_θ(a | s) Q^π(s, a) ]
where π_θ(a | s) is the probability of taking action a in state s, Q^π(s, a) is the expected discounted cumulative reward starting from s and a, and the expectation can be estimated empirically by sampling trajectories that follow the policy π_θ.
Specifically, the grid regulation model is designed and trained as follows.
Step 3.1: determine the structural hyper-parameters of the deep hierarchical policy gradient network, such as the numbers of neurons in the input, hidden and output layers, the activation functions, and the parameter initialization.
Step 3.2: the current grid state representation vector S_t is used as the input of each layer's policy gradient network. The policy is initialized as θ = (θ1, θ2), where θ1 and θ2 are the parameter vectors of the target policies of the first-layer and second-layer policy gradient networks, respectively. p_t denotes the path at time step t from the state input of the first-layer network to the target policy output of the second-layer network; the path consists of two selections, each first-layer selection being represented by an integer from 1 to c1 and each second-layer selection by an integer from 1 to c2, where c1 is the number of clusters after action clustering and c2 is the number of specific actions within a cluster. The output grid action corresponds to the two selections of p_t: following p_t through the two policy gradient networks leads to the output of the second-layer network, so path p_t is mapped to an action A_t in the grid environment. Thus, in a given state S_t, the probability of selecting one grid action output is the product of the probabilities of the two selections along p_t, yielding the specific action A_t at the second layer. The action is executed in the grid environment, which feeds back the immediate reward value r_t and the state representation vector S_{t+1} at the next moment; the grid state, the regulation action and the reward are combined into the tuple <S_t, A_t, r_t, S_{t+1}> and collected as experience sample data.
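Continuing the PyTorch sketch, one episode of interaction and experience collection might look as follows; the environment interface is again an assumption of the sketch.

```python
import torch

def rollout(env, policy, n_steps):
    """Collect one on-policy episode of tuples <S_t, A_t, r_t, S_{t+1}>,
    keeping the joint log-probabilities needed for the update."""
    s = torch.as_tensor(env.reset(), dtype=torch.float32).unsqueeze(0)
    log_probs, rewards, transitions = [], [], []
    for _ in range(n_steps):
        i, j, log_prob = policy(s)             # cluster, then action within it
        s_next, r, done = env.step((int(i), int(j)))
        transitions.append((s, (int(i), int(j)), r, s_next))
        log_probs.append(log_prob.squeeze(0))
        rewards.append(r)
        if done:                               # fault resolved or terminal state
            break
        s = torch.as_tensor(s_next, dtype=torch.float32).unsqueeze(0)
    return log_probs, rewards, transitions
```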
Step 3.3: compute the discounted return from the collected rewards,
Q̂(S_t, A_t) = Σ_{k=0}^{n-t-1} γ^k r_{t+k}
and compute the policy function; since the policy makes two selections, it is the product
π_θ(A_t | S_t) = π_{θ1}(p_t1 | S_t) · π_{θ2}(p_t2 | S_t, p_t1)
where Q̂(S_t, A_t) is the value estimate of the grid action A_t selected after the policy network's output for the current state representation vector S_t, γ ∈ [0, 1] is the discount reward coefficient, and n is the length of one sequence, i.e., the number of samples.
Step 3.4: update the network parameters; the network's loss function is
Δθ = (1/n) Σ_{t=1}^{n} Q̂(S_t, A_t) ∇_θ log π_θ(A_t | S_t)
where θ is the policy gradient network parameter vector, ∇_θ log π_θ(A_t | S_t) is the gradient of the policy network's log-output at the current input, S_t and A_t are the state and action representation vectors at moment t, and π_θ(A_t | S_t) is the policy network's output for the current state representation vector S_t.
Step 3.5: update the parameters of the policy gradient network as:
θ = θ + αΔθ
where α ∈ [0, 1] is the update step size, i.e., the learning rate.
The above is the design process of the hierarchical policy gradient network, corresponding to the flow shown in Fig. 3.
Step 6: using the sampled data, compute the loss according to the designed network loss function and optimization objective, and update the network parameters by back-propagating the gradient. With the updated parameters, the network keeps interacting with the simulated grid environment to collect new and more diverse grid samples, estimates the value of actions from the collected experience samples and returned rewards, updates the parameters again, and returns to step 5, continuously interacting with the simulated grid operating environment until the network converges, thereby completing the training of the grid regulation model. When a fault occurs, the converged model can directly output a grid action that resolves it, achieving fast fault response and resolution.
The above is the design process of the grid regulation model based on the hierarchical policy network. In the invention, because the grid's action space consists of parts such as adjustments to the connection topology of the double busbars and the components within substation nodes, it is a discrete space variable: owing to the grid's physical structure, the topology can only be adjusted through a fixed set of permutations and combinations, and components cannot be added or removed at will to change the topology continuously.
This satisfies the applicability condition of the hierarchical policy gradient network for the grid power flow regulation problem, namely that both the input and the output of the network are discrete spaces. Regarding decision reasoning in grid power flow regulation, the invention observes that in actual grid regulation the effective regulation action for a given state at a given time is not unique — one-to-many situations can exist; conversely, an adjustment is not valid for only one state, and many-to-one situations are entirely possible.
The overall process of the invention can be summarized as the following algorithm:
Input: number of iteration rounds T, grid state representation vector S, action representation vector A, decay coefficient γ, update coefficient α, number of action clusters c1, number of actions within a cluster c2, batch_size = n, policy network parameters θ;
Output: the optimal policy network parameters θ;
Initialization: perform K-means clustering on the grid action space to obtain c1 action clusters {A1, A2, …, Ac1}; randomly initialize the parameters θ of each policy gradient network;
For each round, loop:
  For each time step of the current round, loop:
    Step 3: from the two selections of p_t, obtain the corresponding grid action A_t = (i, j);
    Step 4: according to the current grid state S_t, obtain the grid action A_t through the policy network, execute it in the grid simulation environment, and obtain the reward R_{t+1} and the new state S_{t+1} of the grid environment;
  For each step t = 0, 1, …, T-1, loop:
    Step 6: compute the Q value: Q̂(S_t, A_t) = Σ_k γ^k R_{t+k+1};
    Step 7: from the two selections of p_t, obtain p_t1 and p_t2 and compute the policy function π_θ(A_t | S_t) = π_{θ1}(p_t1 | S_t) · π_{θ2}(p_t2 | S_t, p_t1), where each policy is a softmax policy that weighs the probability of an action by a linear combination of a feature vector φ(s, a) describing the state and action with the parameters θ: π_θ(s, a) = e^{φ(s,a)ᵀθ} / Σ_b e^{φ(s,b)ᵀθ};
    Step 8: update the network parameters θ by back-propagating the loss Δθ = (1/n) Σ_t Q̂(S_t, A_t) ∇_θ log π_θ(A_t | S_t);
    Step 9: update the parameters of the global network: θ = θ + αΔθ;
  Step 10: when the termination state S is reached, end the current round.
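Wiring the sketches above together, the overall algorithm might read as follows; the environment, the dimensions and the episode length are placeholders chosen for the sketch.

```python
import torch

def train(env, state_dim, c1, c2, rounds=1000, alpha=1e-3, gamma=0.99):
    """End-to-end sketch of the algorithm above: in each round, roll out one
    episode with the two-level policy, then apply the policy gradient update
    θ = θ + αΔθ (plain SGD matches the update rule in the text)."""
    policy = HierarchicalPolicy(state_dim, c1, c2)
    optimizer = torch.optim.SGD(policy.parameters(), lr=alpha)
    for _ in range(rounds):                          # T iteration rounds
        log_probs, rewards, _ = rollout(env, policy, n_steps=288)
        reinforce_update(optimizer, log_probs, rewards, gamma)
    return policy
```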
After training, when a fault occurs in the grid, the model can directly output a grid action that resolves it, achieving fast response and realizing grid regulation.
Claims (6)
1. A power grid regulation method based on a hierarchical deep policy gradient network, characterized by comprising the following steps:
Step 1: acquire power grid information and construct a state space and an action space, both consisting of continuous and discrete space variables; the continuous variables of the state space include time, generator power and terminal voltage, load power, node voltage, and line power flow values and voltages, and its discrete variables comprise the network topology; the continuous variables of the action space include generator output adjustment and load power adjustment, and its discrete variables comprise the on/off states of transmission lines and the connection topology of the double busbars and the components within substation nodes;
Step 2: cluster the action space so that each cluster contains the same number of actions;
Step 3: design a state representation vector S and an action representation vector A for the power grid;
Step 4: design a grid regulation model based on a hierarchical policy gradient network, the model having two layers, each layer being an independent policy gradient network with the state representation vector S as its input; train the model with a policy gradient algorithm to make two consecutive selections, the first layer selecting an action cluster and the second layer selecting a specific action within that cluster, wherein the probability of outputting a specific grid action A_t in a given state S_t is the product of the probabilities of the two selections;
Step 5: simulate the grid operating environment from a discretized grid operation data set and let the grid regulation model interact with it: the model obtains the current state from the simulated environment and determines the final action to execute, the simulated environment executes that action, realizing grid regulation, and feeds back an immediate reward, and the grid state, the regulation action and the reward obtained through feedback are combined and collected as experience sample data;
Step 6: estimate the value of actions from the collected experience sample data and the returned rewards, update the network parameters, and return to step 5, continuously interacting with the simulated grid operating environment until the grid regulation model is trained.
2. The power grid regulation method based on a hierarchical deep policy gradient network according to claim 1, characterized in that in step 2 a simulation-based exploration mechanism is introduced to reduce the dimensionality of the action space; in the reduced action space, the grid state information before and after each grid action is executed in the grid environment — namely the current value in each transmission line of the grid — is taken as the feature vector representing that action, and the feature vectors are then clustered.
3. The power grid regulation method based on a hierarchical deep policy gradient network according to claim 2, characterized in that the clustering uses the K-means algorithm: the feature vectors of K grid actions are first selected at random from the action space as the initial cluster centers; for each remaining feature vector, the distance to every cluster center is computed and the vector is assigned to the nearest one; the cluster centers are then updated iteratively until clusters of equal size are obtained, i.e., objects within the same cluster are highly similar while objects in different clusters have low similarity.
4. The power grid regulation method based on a hierarchical deep policy gradient network according to claim 1, characterized in that in step 3 the components contained in the power grid — substation nodes, generator nodes and load nodes — and the transmission lines are numbered and identified by those numbers, and the variables contained in the components and transmission lines then form the one-dimensional state representation vector S;
the specific power increase/decrease values of generator output adjustment and load power adjustment are placed at the corresponding numbered positions of a one-dimensional action vector; the on/off switching of a transmission line is represented by 1 and 0; and the connection state of each component to the double busbars within a substation node is represented by 0, 1 and 2, where 0 means the component is disconnected from all busbars, 1 means it is connected to busbar No. 1, and 2 means it is connected to busbar No. 2, yielding the action representation vector A.
5. The power grid regulation method based on a hierarchical deep policy gradient network according to claim 1, characterized in that in step 4 the current state representation vector S_t is used as the input of each layer's policy gradient network; the policy is initialized as θ = (θ1, θ2), where θ1 and θ2 are the parameter vectors of the target policies of the first-layer and second-layer policy gradient networks, respectively; p_t denotes the path at time step t from the state input of the first-layer network to the target policy output of the second-layer network, the path consisting of two selections, each first-layer selection being represented by an integer from 1 to c1 and each second-layer selection by an integer from 1 to c2, where c1 is the number of clusters after action clustering and c2 is the number of specific actions within a cluster.
6. The power grid regulation method based on a hierarchical deep policy gradient network according to claim 1, characterized in that in step 5 the discounted return is computed from the obtained rewards:
Q̂(S_t, A_t) = Σ_{k=0}^{n-t-1} γ^k r_{t+k}
and the policy function is calculated:
π_θ(A_t | S_t) = π_{θ1}(p_t1 | S_t) · π_{θ2}(p_t2 | S_t, p_t1)
the network parameters are updated, the network's loss function being:
Δθ = (1/n) Σ_{t=1}^{n} Q̂(S_t, A_t) ∇_θ log π_θ(A_t | S_t)
where Q̂(S_t, A_t) is the value estimate of the grid action A_t selected after the policy network's output for the current state representation vector S_t; γ ∈ [0, 1] is the discount reward coefficient; n is the length of one sequence, i.e., the number of samples; θ is the policy gradient network parameter vector; ∇_θ log π_θ(A_t | S_t) is the gradient of the policy network's log-output at the current input; S_t and A_t are the state and action representation vectors at moment t; and π_θ(A_t | S_t) is the policy network's output for the current state representation vector S_t;
the parameters of the policy gradient network are updated as:
θ = θ + αΔθ
where α ∈ [0, 1] is the update step size, i.e., the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210435606.2A CN114707613B (en) | 2022-04-24 | 2022-04-24 | Power grid regulation method based on a hierarchical deep policy gradient network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210435606.2A CN114707613B (en) | 2022-04-24 | 2022-04-24 | Power grid regulation method based on a hierarchical deep policy gradient network
Publications (2)
Publication Number | Publication Date |
---|---|
CN114707613A true CN114707613A (en) | 2022-07-05 |
CN114707613B CN114707613B (en) | 2024-03-12 |
Family
ID=82174223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210435606.2A Active CN114707613B (en) | 2022-04-24 | 2022-04-24 | Power grid regulation method based on a hierarchical deep policy gradient network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114707613B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115545197A (en) * | 2022-11-24 | 2022-12-30 | 中国电力科学研究院有限公司 | Power grid regulation and control decision knowledge model construction method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113141012A (en) * | 2021-04-24 | 2021-07-20 | 西安交通大学 | Power grid power flow regulation and control decision reasoning method based on deep deterministic strategy gradient network |
US20220004191A1 (en) * | 2020-07-01 | 2022-01-06 | Wuhan University Of Technology | Usv formation path-following method based on deep reinforcement learning |
CN114048903A (en) * | 2021-11-11 | 2022-02-15 | 天津大学 | Intelligent optimization method for power grid safe operation strategy based on deep reinforcement learning |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220004191A1 (en) * | 2020-07-01 | 2022-01-06 | Wuhan University Of Technology | Usv formation path-following method based on deep reinforcement learning |
CN113141012A (en) * | 2021-04-24 | 2021-07-20 | 西安交通大学 | Power grid power flow regulation and control decision reasoning method based on deep deterministic strategy gradient network |
CN114048903A (en) * | 2021-11-11 | 2022-02-15 | 天津大学 | Intelligent optimization method for power grid safe operation strategy based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Wu Xiaming; Li Mingqiu; Chen Enzhi; Wang Chunyang: "Deep Q-network learning based on action space noise" (基于动作空间噪声的深度Q网络学习), Journal of Changchun University of Science and Technology (Natural Science Edition), no. 04, 31 August 2020 (2020-08-31) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115545197A (en) * | 2022-11-24 | 2022-12-30 | 中国电力科学研究院有限公司 | Power grid regulation and control decision knowledge model construction method and device |
CN115545197B (en) * | 2022-11-24 | 2023-04-28 | 中国电力科学研究院有限公司 | Power grid regulation and control decision knowledge model construction method and device |
Also Published As
Publication number | Publication date |
---|---|
CN114707613B (en) | 2024-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Saleh et al. | A data mining based load forecasting strategy for smart electrical grids | |
CN113141012B (en) | Power grid power flow regulation and control decision reasoning method | |
Sharma | Designing and modeling fuzzy control Systems | |
CN111325315A (en) | Distribution transformer power failure and power loss prediction method based on deep learning | |
Laouafi et al. | One-hour ahead electric load forecasting using neuro-fuzzy system in a parallel approach | |
Zhou et al. | Action set based policy optimization for safe power grid management | |
CN113887141A (en) | Micro-grid group operation strategy evolution method based on federal learning | |
CN114707613B (en) | Power grid regulation method based on a hierarchical deep policy gradient network | |
CN111784019A (en) | Power load processing method and device | |
Hu et al. | Finite-time stabilization of fuzzy spatiotemporal competitive neural networks with hybrid time-varying delays | |
CN114384931A (en) | Unmanned aerial vehicle multi-target optimal control method and device based on strategy gradient | |
Wang et al. | Transfer-Reinforcement-Learning-Based rescheduling of differential power grids considering security constraints | |
Arseniev et al. | The Model of a Cyber-Physical System for Hybrid Renewable Energy Station Control | |
Bin et al. | A short-term power load forecasting method based on eemd-abgru | |
Yuanyuan et al. | Artificial intelligence and learning techniques in intelligent fault diagnosis | |
Wang et al. | Energy management strategy for HEV based on KFCM and neural network | |
Wang et al. | Design and Research of Smart Grid Based on Artificial Intelligence | |
Kundacina et al. | Supporting future electrical utilities: Using deep learning methods in ems and dms algorithms | |
Ding et al. | Review of Machine Learning for Short Term Load Forecasting | |
Kasar et al. | Recent Trends in Electrical Power System by using Computational Intelligence Techniques | |
Vaščák | Automatic design and optimization of fuzzy inference systems | |
Ono et al. | Operation Planning Method Using Convolutional Neural Network for Combined Heat and Power System | |
Wang et al. | Summary of Fault Diagnosis Technology in Smart Grid | |
Dong et al. | Adaptive electric load forecaster | |
Cai et al. | Data-Driven Tie-line Scheduling Method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |