CN114707613A - Power grid regulation and control method based on layered depth strategy gradient network - Google Patents


Info

Publication number
CN114707613A
CN114707613A (application CN202210435606.2A)
Authority
CN
China
Prior art keywords
power grid
network
action
state
regulation
Prior art date
Legal status (the legal status is an assumption and is not a legal conclusion)
Granted
Application number
CN202210435606.2A
Other languages
Chinese (zh)
Other versions
CN114707613B (en)
Inventor
杜友田 (Du Youtian)
解圣源 (Xie Shengyuan)
王晨希 (Wang Chenxi)
郭子豪 (Guo Zihao)
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202210435606.2A
Publication of CN114707613A
Application granted
Publication of CN114707613B
Legal status: Active

Classifications

    • G06F18/23213 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with a fixed number of clusters, e.g. K-means clustering
    • G06Q50/06 — ICT specially adapted for specific business sectors; energy or water supply
    • Y04S10/50 — Smart grids; systems or methods supporting power network operation or management, involving a certain degree of interaction with load-side end-user applications


Abstract

A power grid regulation and control method based on a hierarchical deep policy gradient network. A state characterization vector and an action characterization vector are designed for the power grid, and the action space is clustered so that each cluster contains an equal number of actions. A grid regulation model is then designed as a two-layer hierarchical policy gradient network trained with a policy gradient algorithm, taking the state characterization vector as network input: each layer is an independent policy gradient model, the first layer selects an action cluster, and the second layer selects a concrete action within that cluster, so that decisions are made in two successive steps. Based on a discretized data set of simulated grid operation, the model interacts with a simulated grid operating environment: it obtains the current state from the environment and hands the grid action to be executed back to the environment, thereby realizing grid regulation.

Description

Power grid regulation and control method based on layered depth strategy gradient network
Technical Field
The invention belongs to the technical field of intelligent power grids, relates to artificial intelligence enhancement of power grid flow regulation and control, and particularly relates to a power grid regulation and control method based on a layered deep strategy gradient network.
Background
The power grid is a core infrastructure of national economic operation and a high-dimensional, tightly coupled, complex dynamic system. By supplying reliable electricity to industry, services and consumers, it plays a central economic and social role. Grid operation, dispatching and regulation rely heavily on automatic safety-and-stability devices as the first line of defense; once this line fails, the ultimate safety of the whole system depends on dispatchers' experience-based understanding of the grid and their decisions.
Traditional grid regulation systems adjust their control strategies slowly, and the cycle for formulating a strategy is long. Existing practice relies on grid security and stability analysis: through computer simulation, dispatchers must fully grasp the characteristics and rules of safe grid operation, quickly and accurately identify the grid's weak points, and formulate fault strategies offline, which makes regulation heavily dependent on manual experience. With the continuing integration of renewable energy into modern grids, the complexity and time variability of grid operating modes keep increasing, so dispatchers can no longer master the characteristic information and rules of safe operation. Their ability to handle the uncertainty introduced by renewables and other system emergencies is limited, traditional regulation rules no longer fully apply, adaptability and robustness become serious problems, and the risk of grid operation rises.
Existing intelligent grid regulation methods use the massive data and information generated in power system operation and management to extract key grid features that help dispatchers obtain key information more effectively, or build fine-grained decision trees that assist decisions based on the grid's operating state. However, when the grid structure changes, such regulation models must be redesigned and retrained; they cannot determine a regulation strategy from the overall condition of the grid, and the reliability and agility of global grid decisions are hard to guarantee. A grid regulation model with stronger generalization ability and higher efficiency is therefore urgently needed.
The document [An intelligent machine dispatcher for dispatching decisions [J]. Power System Technology, 2020, 44(1): 1-8] proposes a domain knowledge model of grid dispatching operation.
The document [Lan T, Duan J, Zhang B, et al. AI-based autonomous line flow control via topology adjustment for maximizing time-series ATCs [C]// 2020 IEEE Power & Energy Society General Meeting (PESGM). IEEE, 2020: 1-5] proposes a method combining imitation learning and deep reinforcement learning, effectively improving the fault tolerance and robustness of the system.
The document [Kim B G, Zhang Y, Van Der Schaar M, et al. Dynamic pricing and energy consumption scheduling with reinforcement learning [J]. IEEE Transactions on Smart Grid, 2016, 7(5): 2187-2198] applies reinforcement learning to dynamic pricing and energy consumption scheduling.
The document [Huang Tian'en, Sun Hong, Guo Qing, et al. Automation of Electric Power Systems, 2016, 40(4): 32-40] proposes an online distributed security feature selection method based on correlation grouping of grid feature quantities, adapted to grid operation big data.
The document [Duan J, Shi D, Diao R, et al. Deep-reinforcement-learning-based autonomous voltage control for power grid operations [J]. IEEE Transactions on Power Systems, 2019, 35(1): 814-817] proposes a grid autonomous optimization control and decision framework with online learning, a "grid brain" system, which uses two DRL algorithms, Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), to solve the automatic voltage control problem; the AI agent can learn the output setting of each generation device under the grid's various practical constraints.
Research based on traditional machine learning algorithms therefore cannot cope with the complexity of grid operating modes or deliver the reliability and agility that global grid decisions require, and deep reinforcement learning has become an effective approach to the grid regulation problem. Accordingly, targeting the training and exploration efficiency of deep learning models in grid regulation when deep reinforcement learning is applied to the grid's high-dimensional continuous state space and high-dimensional discrete action space, the invention provides a more effective decision method and improves performance in practical grid applications.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a power grid regulation and control method based on a hierarchical deep policy gradient network. Through interaction between a deep reinforcement learning agent and a simulated grid environment, it learns a large amount of grid regulation knowledge and the mapping between grid states and regulation actions, provides a feasible means for real-time grid regulation, and designs algorithms for the high-dimensional state and action spaces of this complex problem.
In order to achieve the purpose, the invention adopts the technical scheme that:
a power grid regulation and control method based on a layered depth strategy gradient network comprises the following steps:
step 1, acquire grid information and construct a state space and an action space, each consisting of continuous and discrete variables. The continuous variables of the state space include time, generator output power, generator terminal voltage, load power, node voltage, and line power flow and voltage; the discrete variables include the network topology. The continuous variables of the action space include generator output adjustment and load power adjustment; the discrete variables include the on/off state of each transmission line and the connection topology between the double busbars and each component within a substation node;
step 2, cluster the action space so that each cluster contains an equal number of actions;
step 3, designing a state characterization vector S and an action characterization vector A for the power grid;
step 4, design the grid regulation model based on a hierarchical policy gradient network. The model has two layers; each layer is an independent policy gradient network taking the state characterization vector S as input, and the model is trained with a policy gradient algorithm. Selection proceeds in two stages: the first layer selects an action cluster, and the second layer selects a concrete action within the cluster. Given a state S_t, the probability of the model outputting the concrete grid action a_t is the product of the probabilities of the two selections;
step 5, simulate the grid operating environment from a discretized grid operation data set and let the grid regulation model interact with it: the model obtains the current state from the simulated environment, determines the final action to execute, and hands this action to the simulated environment for execution, realizing grid regulation; the environment feeds back an immediate reward. Combine the grid state, the regulation action, and the reward obtained by feedback, and collect them as experience sample data;
and step 6, estimate the value of actions from the collected experience samples and the returned rewards, update the network parameters, and return to step 5, continuing the interaction with the simulated grid operating environment until the grid regulation model is trained.
In step 2, a simulation-environment exploration mechanism is introduced to reduce the dimensionality of the action space. The grid state information before and after each grid action is executed in the environment, namely the current value in each transmission line, is used as the feature vector representing that action in the reduced action space, and the feature vectors are then clustered.
Clustering uses the K-means algorithm: K feature vectors of grid actions are first randomly selected from the action space as initial cluster centers; for each remaining feature vector, the distance to every center is computed and the vector is assigned to the nearest one; the centers are then updated iteratively until clusters of equal size are obtained, so that objects within a cluster are highly similar and objects in different clusters are dissimilar.
In step 3, the components contained in the grid (substation nodes, generator nodes and load nodes) and the transmission lines are represented and indexed by numbers; the variables contained in the components and transmission lines then form the one-dimensional state characterization vector S.
The specific power increase/decrease values of generator output adjustment and load power adjustment are placed at the corresponding numbered positions of the one-dimensional action vector; the on/off switching of a transmission line is represented by 1 and 0; and the connection state between each component and the double busbars in a substation node is represented by 0, 1 and 2, where 0 means the component is disconnected from all busbars, 1 means it is connected to busbar No. 1, and 2 means it is connected to busbar No. 2. This yields the action characterization vector A.
In step 4, the current state characterization vector S_t is used as the input of each layer's policy gradient network. The policy is initialized as θ = (θ_1, θ_2), where θ_1 and θ_2 are the parameter vectors of the target policies of the first-layer and second-layer policy gradient networks, respectively. p_t denotes the path, at time step t, from the state input of the first-layer network to the target-policy output of the second-layer network; the path consists of two choices, the first-layer choice represented by an integer from 1 to c_1 and the second-layer choice by an integer from 1 to c_2, where c_1 is the number of clusters after action clustering and c_2 is the number of concrete actions within a cluster.
In step 5, the discounted return is computed from the obtained rewards:

R_t = Σ_{k=t}^{n} γ^{k-t} r_k

and the policy function, the product of the two selection probabilities, is computed:

π_θ(A_t | S_t) = π_{θ_1}(a_t^{(1)} | S_t) · π_{θ_2}(a_t^{(2)} | S_t, a_t^{(1)})

The network parameters are then updated; the update (loss gradient) of the network is:

Δθ = (1/n) Σ_{t=1}^{n} ∇_θ log π_θ(A_t | S_t) · Q̂(S_t, A_t)

where π_θ(A_t | S_t) is the probability that the policy network, given the current state characterization vector S_t, outputs and selects the grid action A_t; γ ∈ [0, 1] is the discount reward coefficient; n is the length of one trajectory, i.e. the number of samples; θ is the policy gradient network parameter; ∇_θ log π_θ(A_t | S_t) is the gradient of the policy network's log-output at the current input; S_t and A_t are the state and action characterization vectors at time t; and Q̂(S_t, A_t) is the value estimate of the action A_t selected after the policy network's output in state S_t.

The parameters of the policy gradient network are then updated as:

θ = θ + αΔθ

where θ is the policy gradient network parameter and α ∈ [0, 1] is the update step size, i.e. the learning rate.
Compared with the prior art, the method learns a large amount of grid regulation knowledge and the mapping between grid states and regulation actions, providing a feasible means for real-time grid regulation. The hierarchical design significantly improves model training and convergence speed in high-dimensional spaces, and theory and experiments show that the method suits real, complex grid regulation scenarios.
Drawings
FIG. 1 is an overall flow diagram of the present invention.
Fig. 2 is a schematic diagram of the numbering of the power grid structure in the embodiment of the present invention.
Fig. 3 is a structural diagram of a power grid regulation and control model designed based on a hierarchical policy gradient network in the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the present invention is a power grid regulation method based on a hierarchical depth policy gradient network, comprising the following steps:
step 1: and acquiring power grid information, and constructing a state space and an action space.
The state space and action space of the grid both consist of continuous and discrete variables. Typically, the continuous variables of the state space include time, generator output power and terminal voltage, load power, node voltage, and line power flow and voltage; the discrete variables mainly comprise the network topology. The continuous variables of the action space include generator output adjustment and load power adjustment; the discrete variables include the on/off state of transmission lines and the connection topology between double busbars and each element in a substation node.
And 2, clustering the motion space to ensure that the motion number of each cluster is equal.
In the grid's action space, a large number of topology-adjustment actions have no practical significance. In one embodiment of the invention, a simulation-environment exploration mechanism is therefore introduced to reduce the dimensionality of the action space. Concretely, each scenario in a grid seed data set is simulated (the data set contains discretized grid operation seed data for different years, months and dates, each scenario being a different operating scenario); an action in the action space is executed in traversal; its fault-solving ability is recorded and quantified as the immediate reward obtained; and the steps (state input, action selection, action execution, reward feedback, next state) are repeated until the number of explored grid scenarios reaches a proportion n (a hyperparameter between 0 and 1) of the number of scenarios in the training data set. If an action's average reward value is negative, its potential value is considered negative and it is deleted from the action space, achieving dimensionality reduction. This simplifies the action space and improves network exploration efficiency.
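The exploration-based pruning described above can be sketched as follows; `simulate` stands in for the grid simulation step (state input, action execution, reward feedback) and is an assumed interface, as are all names here:

```python
import random

def prune_action_space(actions, scenarios, simulate, explore_ratio=0.3):
    """Delete actions whose average immediate reward over the explored
    scenarios is negative (i.e. negative 'potential value').

    simulate(scenario, action) -> immediate reward; an assumed interface
    wrapping one grid-simulation step.
    """
    n_explore = max(1, int(explore_ratio * len(scenarios)))
    explored = random.sample(scenarios, n_explore)
    kept = []
    for action in actions:
        avg_reward = sum(simulate(s, action) for s in explored) / n_explore
        if avg_reward >= 0:          # keep only non-negative potential value
            kept.append(action)
    return kept
```

With a toy `simulate` that rewards only even-numbered actions, `prune_action_space(list(range(6)), scenarios, sim)` keeps `[0, 2, 4]`, regardless of which scenarios are sampled.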
In the action space after the dimensionality reduction, the state information of the power grid before and after the action of each power grid in the power grid environment is executed, namely the magnitude of the current value in each power transmission line in the power grid is used as a characteristic vector for representing the action of the power grid, and then clustering operation is carried out on the characteristic vector, so that actions similar to the fault action solving of the power grid environment can be divided into a cluster to form an action class.
Illustratively, clustering uses the K-means algorithm: K feature vectors of grid actions are first randomly selected from the action space as initial cluster centers; for each remaining feature vector, the distance to every center is computed and the vector is assigned to the nearest one; the centers are then updated iteratively until clusters of equal size are obtained, so that objects within a cluster are highly similar and objects in different clusters are dissimilar.
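Note that standard K-means does not guarantee equal cluster sizes; one way to obtain the equal-size clusters described above is to add a capacity constraint to the assignment step. The following is a minimal sketch under that assumption (the greedy assignment order and all names are illustrative, not the patent's exact procedure):

```python
import numpy as np

def balanced_kmeans(X, k, iters=20, seed=0):
    """K-means variant with equal-size clusters: after each center update,
    points are assigned greedily (most decisive points first) to the
    nearest center that still has capacity n/k."""
    rng = np.random.default_rng(seed)
    n = len(X)
    assert n % k == 0, "equal-size clusters need n divisible by k"
    cap = n // k
    centers = X[rng.choice(n, k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, k)
        labels = np.full(n, -1)
        counts = np.zeros(k, dtype=int)
        # assign points in order of distance to their nearest center
        for i in np.argsort(d.min(axis=1)):
            for c in np.argsort(d[i]):
                if counts[c] < cap:
                    labels[i] = c
                    counts[c] += 1
                    break
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centers
```

With well-separated toy feature vectors and n divisible by k, each cluster ends up with exactly n/k members, matching the equal-count requirement.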
And 3, designing a state characterization vector S and an action characterization vector A for the power grid.
For the specific grid structure to be used, as shown in fig. 2, the substation nodes, generator nodes, load nodes, transmission lines and other elements contained in the grid are counted and numbered, and these numbers represent and index the components and lines. The variables contained in the components and transmission lines are then placed at the appropriate positions to form the one-dimensional state characterization vector S. For example, a generator node contributes generated-power and terminal-voltage variables, a load node contributes a load-power variable, and substations and transmission lines contribute the topology represented by numbered connections. The specific power increase/decrease values of generator output adjustment and load power adjustment are placed at the corresponding numbered positions of the one-dimensional action vector; the on/off switching of a transmission line is represented by 1 and 0; and the connection state between each component and the double busbars in a substation node is represented by 0, 1 and 2 (0: disconnected from all busbars; 1: connected to busbar No. 1; 2: connected to busbar No. 2). This yields the action characterization vector A.
Wherein the components in the state are explained as follows:
time: the real-time of the operation of the power grid, particularly the time of year, month and day;
generated power of the generator: at the current time, the active power P and the reactive power Q sent by each generator;
terminal voltage: at the present time, the outlet voltage of each generator;
load power: at the present time, the total power (including active power and reactive power) of each load node (e.g., a power utilization region is equivalent to a whole);
node voltage: at the current time, the voltage value of each substation node;
line current value and voltage: at the current time, the current value in each power transmission line and the voltage values at the two ends of each power transmission line;
the network topology structure is as follows: at the current time, the connection relationship and the state of all components in the power grid.
Step 4: design the grid regulation model based on a hierarchical policy gradient network. The model has two layers; each layer is an independent policy gradient network, and the state characterization vector S is used as the input of each layer's network (optionally after data preprocessing such as normalization). The model is trained with a policy gradient algorithm and makes two successive selections: the first layer selects an action cluster, and the second layer selects a concrete action within the cluster. Given a state s_t, the probability of outputting the concrete grid action a_t is the product of the probabilities of the two selections.
The model therefore makes two policy selections: the first selects the class A(s) containing the action, and the second selects a concrete action A from the chosen cluster. Fig. 3 shows the hierarchical grid regulation model, in which the primary network is the model's first layer and the secondary network its second layer.
And 5, simulating a power grid operation environment based on the discretized power grid operation data set, interacting a power grid regulation and control model with the simulated power grid operation environment, acquiring the current state and the final action to be executed by the power grid regulation and control model from the simulated power grid operation environment, handing the final action to be executed to the simulated power grid operation environment for executing, realizing the purpose of power grid regulation and control, feeding back instant rewards, combining the state of the power grid, the action of power grid regulation and control and the rewards obtained by feedback, and collecting experience sample data.
The goal of network training is to maximize the expected discounted cumulative reward:

J(θ) = E_{π_θ} [ Σ_t γ^t · r_t ]

The gradient with respect to the parameters is computed as:

∇_θ J(θ) = E_{π_θ} [ ∇_θ log π_θ(a | s) · Q^π(s, a) ]

where π_θ(a | s) is the probability of taking action a in state s, and Q^π(s, a) is the expected discounted cumulative reward starting from s and a; the expectation can be estimated empirically by sampling trajectories that follow the policy π_θ.
Specifically, the design and training method of the power grid regulation and control model comprises the following steps:
and 3.1, determining structural parameters of the deep hierarchical strategy gradient network, such as hyperparameters of the number of neurons of an input layer, a hidden layer and an output layer, an activation function, parameter initialization and the like.
Step 3.2: the current grid state characterization vector S_t is used as the input of each layer's policy gradient network. The policy is initialized as θ = (θ_1, θ_2), where θ_1 and θ_2 are the parameter vectors of the target policies of the first-layer and second-layer policy gradient networks, respectively. p_t denotes the path, at time step t, from the state input of the first-layer network to the target-policy output of the second-layer network; the path consists of two choices, the first-layer choice represented by an integer from 1 to c_1 and the second-layer choice by an integer from 1 to c_2, where c_1 is the number of clusters after action clustering and c_2 is the number of concrete actions within a cluster. The output grid action corresponds to the two choices along p_t: the path traverses the two policy gradient networks and ends at the output of the second-layer network, so p_t maps to an action a_t in the grid environment. Given a state S_t, the probability of selecting one grid action output is thus the product of the probabilities of the two choices along p_t, which yields the concrete action A_t at the second layer. The action is executed in the grid environment, which feeds back an immediate reward value r_t and the next-moment state characterization vector S_{t+1}. The grid state, the regulation action and the reward obtained by feedback are combined into tuples <S_t, A_t, r_t, S_{t+1}> and collected as experience sample data.
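The two-stage selection of step 3.2 — first a cluster, then a concrete action within it, with the overall action probability being the product of the two selection probabilities — can be sketched with simple linear-softmax layers standing in for the two policy gradient networks (a hedged illustration; the actual network structure is what step 3.1 determines):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class TwoLevelPolicy:
    """Linear-softmax stand-in for the two-layer model: the first head
    picks one of c1 action clusters, the second head (one per cluster)
    picks one of c2 concrete actions; the probability of the concrete
    action is the product of the two selection probabilities."""
    def __init__(self, state_dim, c1, c2):
        self.c1, self.c2 = c1, c2
        self.W1 = rng.normal(0.0, 0.1, (c1, state_dim))      # cluster head
        self.W2 = rng.normal(0.0, 0.1, (c1, c2, state_dim))  # per-cluster heads

    def act(self, s):
        p1 = softmax(self.W1 @ s)          # P(cluster k | S_t)
        k = rng.choice(self.c1, p=p1)
        p2 = softmax(self.W2[k] @ s)       # P(action a | S_t, cluster k)
        a = rng.choice(self.c2, p=p2)
        return k, a, p1[k] * p2[a]         # pi_theta(A_t | S_t)

policy = TwoLevelPolicy(state_dim=4, c1=3, c2=5)
k, a, p = policy.act(np.ones(4))
print(0 <= k < 3, 0 <= a < 5, 0.0 < p <= 1.0)  # True True True
```

The returned probability is exactly the product described in the text, which is what the policy gradient update differentiates through.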
Step 3.3: from the rewards obtained, calculate the discounted return

Qt = Σ_{k=t}^{n} γ^(k−t) · r_k

and calculate the policy function; since the policy makes two successive selections, it is computed as

π_θ(At|St) = π_θ1(p_{t,1}|St) · π_θ2(p_{t,2}|St)

where Qt, i.e. Q(St, At), is the value of the power grid action At selected after the policy network output under the current state characterization vector St; γ ∈ [0,1] is the discount reward coefficient, and n is the length of one episode, i.e. the number of samples.
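The discounted return Qt defined above can be computed for a whole episode in a single backward pass; a small illustrative helper (the function name is assumed, not from the patent):

```python
def discounted_returns(rewards, gamma):
    """Q_t = sum_{k=t}^{n} gamma^(k-t) * r_k, computed backwards in O(n)."""
    q = 0.0
    out = [0.0] * len(rewards)
    for t in range(len(rewards) - 1, -1, -1):
        q = rewards[t] + gamma * q   # Q_t = r_t + gamma * Q_{t+1}
        out[t] = q
    return out

print(discounted_returns([1.0, 0.0, 2.0], 0.5))  # → [1.5, 1.0, 2.0]
```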
Step 3.4: update the network parameters; the network update loss function is

L(θ) = −(1/n) · Σ_{t=1}^{n} log π_θ(At|St) · Q(St, At)

whose gradient gives the update direction

Δθ = (1/n) · Σ_{t=1}^{n} ∇_θ log π_θ(At|St) · Q(St, At)

where θ is the policy gradient network parameter, ∇_θ log π_θ(At|St) is the gradient of the log-output of the policy network at the current input, St and At are the state characterization vector and action characterization vector at moment t, π_θ(At|St) is the output of the policy network under the current state characterization vector St, and Q(St, At) is the value estimate of the action At selected after the policy network output under St.
Step 3.5: updating network parameters of the policy gradient network as follows:
θ=θ+αΔθ
where θ is the policy gradient network parameter, and α ∈ [0,1] is the update step size, i.e. the learning rate.
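Steps 3.3–3.5 amount to a REINFORCE-style update θ ← θ + αΔθ. The sketch below shows that update for a single linear softmax policy layer (the function and dimensions are illustrative assumptions; in the patent the gradient is obtained by back-propagation through a deep network, once per layer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_update(theta, states, actions, returns, alpha):
    """One ascent step: theta + alpha * (1/n) * sum_t grad log pi(a_t|s_t) * Q_t."""
    grad = np.zeros_like(theta)
    for s, a, q in zip(states, actions, returns):
        p = softmax(theta @ s)
        d = -p
        d[a] += 1.0                  # d = onehot(a) - pi(.|s): grad of log softmax
        grad += q * np.outer(d, s)   # weighted by the return Q_t
    return theta + alpha * grad / len(states)

theta = np.zeros((3, 2))             # 3 actions, 2 state features
s = np.array([1.0, 0.5])
theta_new = reinforce_update(theta, [s], [1], [2.0], alpha=0.1)
print(softmax(theta_new @ s))        # probability of action 1 has increased
```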
The above is the design process of the hierarchical policy gradient network; its flow is shown in fig. 3.
Step 6: using the sampled data, calculate the loss according to the designed network loss function and optimization objective, and update the network parameters by back-propagating the gradient. Based on the updated parameters, the network keeps interacting with the simulated grid environment to collect new and more diversified power grid samples; the value of each action is estimated from the collected experience samples and the returned rewards, the network parameters are updated, and the procedure returns to step 5, realizing continuous interaction with the simulated power grid operating environment until the network converges, which completes the training of the power grid regulation model. When a fault occurs, the converged model can directly output a power grid action that resolves it, achieving the goal of fast fault response and resolution.
The above is the design process of the power grid regulation model based on the hierarchical policy network; the logic flow is as shown in the figure. In the invention, since the action space of the power grid consists of parts such as the connection topology between the double buses and the components within a substation node, it is a discrete space variable: owing to the physical structure of the grid, the topology can only be adjusted through a fixed set of permutations and combinations, and components cannot be added or deleted at will to change the topology continuously.
Therefore the application condition of the hierarchical policy gradient network for the power grid flow regulation problem is satisfied, namely that both the input and the output of the network are discrete spaces. Regarding decision reasoning in the power flow regulation problem, the invention recognizes that in actual grid regulation the effective regulation action for a given state need not be unique, so one-to-many situations can exist; conversely, one adjustment may be valid for more than one state, so many-to-one situations are equally possible.
The overall process of the invention can be summarized as the following algorithm:
inputs: number of iteration rounds T, power grid state characterization vector S, action characterization vector A, decay coefficient γ, update coefficient α, number of action clusters c1, number of actions in a cluster c2, batch_size = n, policy network parameter θ;
and (3) outputting: an optimal policy network parameter θ;
initialization: performing K-means clustering operation on the action space of the power grid to obtain c1An action cluster { A }1,A2,…Ac1Randomly initializing each strategy gradient network parameter theta;
for each round, loop operation:
step 1, initializing an initial power grid state representation S;
for each time step of the current round, the loop:
Step 2: in the current power grid state, the two layers of policy networks output the index i of the action cluster and the index j of the specific power grid action within that cluster, respectively, where i ∈ [1, c1], j ∈ [1, c2];
Step 3: from the two choices (i, j) along pt, obtain the corresponding power grid action At;
Step 4: according to the current power grid state St, obtain the power grid action At through the policy network, execute it in the power grid simulation environment, and obtain the reward Rt+1 and the new state St+1 of the grid environment;
Step 5: generate an episode sequence S0, A0, R1, S1, A1, R2, …, ST−1, AT−1, RT, ST;
for each step t = 0, 1, …, T−1:
Step 6: calculate the Q value:

Qt = Σ_{k=t}^{T−1} γ^(k−t) · R_{k+1}
Step 7: from the two selections p_{t,1} and p_{t,2} along the path pt, calculate the policy function:

π_θ(At|St) = π_θ1(p_{t,1}|St) · π_θ2(p_{t,2}|St)

where each factor is a softmax policy function that weighs the probability of an action occurring by a linear combination of a feature vector φ(s, a) describing the state and action with the parameter θ:

π_θ(s, a) = e^{φ(s,a)ᵀθ} / Σ_b e^{φ(s,b)ᵀθ}
Step 8: update the network parameter θ by back-propagation using the loss function:

L(θ) = −(1/n) · Σ_t log π_θ(At|St) · Qt
step 9, updating the network parameters of the global neural network as follows:
θ=θ+αΔθ
Step 10: when the termination state S is reached, end the current round.
After training, the model can directly output a power grid action that resolves the fault when the power grid fails, thereby achieving the goal of fast response and power grid regulation.
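As an illustration only, the algorithm above can be exercised end-to-end on a toy stand-in environment; everything below (the environment, its reward, the dimensions) is invented for the sketch, while the real method interacts with the simulated power grid environment:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class ToyGridEnv:
    """Stand-in for the simulated grid environment: the state is random noise
    plus a constant bias feature; reward is 1 exactly when the first-layer
    choice hits a hidden 'correct' cluster (cluster 0)."""
    def __init__(self, state_dim=4, horizon=8):
        self.state_dim, self.horizon = state_dim, horizon
    def _state(self):
        return np.append(rng.normal(size=self.state_dim - 1), 1.0)
    def reset(self):
        self.t = 0
        return self._state()
    def step(self, action):
        i, j = action
        r = 1.0 if i == 0 else 0.0
        self.t += 1
        return self._state(), r, self.t >= self.horizon

def run_episode(env, th1, th2):
    """Roll out one episode with the two-layer policy (cluster i, action j)."""
    traj, s, done = [], env.reset(), False
    while not done:
        p1 = softmax(th1 @ s)
        i = rng.choice(len(p1), p=p1)
        p2 = softmax(th2[i] @ s)
        j = rng.choice(len(p2), p=p2)
        s_next, r, done = env.step((i, j))
        traj.append((s, i, j, r))
        s = s_next
    return traj

def train(episodes=200, gamma=0.9, alpha=0.05, c1=3, c2=4, d=4):
    th1 = np.zeros((c1, d))       # first-layer policy: choose a cluster
    th2 = np.zeros((c1, c2, d))   # second-layer policy: action within cluster
    env = ToyGridEnv(state_dim=d)
    for _ in range(episodes):
        q = 0.0
        for s, i, j, r in reversed(run_episode(env, th1, th2)):
            q = r + gamma * q     # discounted return Q_t
            # REINFORCE update for each layer: grad log pi * Q_t
            p1 = softmax(th1 @ s); d1 = -p1; d1[i] += 1.0
            th1 += alpha * q * np.outer(d1, s)
            p2 = softmax(th2[i] @ s); d2 = -p2; d2[j] += 1.0
            th2[i] += alpha * q * np.outer(d2, s)
    return th1, th2

th1, th2 = train()
bias_state = np.append(np.zeros(3), 1.0)
print(softmax(th1 @ bias_state))  # mass should concentrate on cluster 0
```

After training on this toy reward, the first-layer policy puts most of its probability on the rewarded cluster, mirroring how the patent's first layer learns to pick the right action cluster before the second layer refines the choice.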

Claims (6)

1. A power grid regulation and control method based on a layered depth strategy gradient network is characterized by comprising the following steps:
step 1, acquiring power grid information, and constructing a state space and an action space, wherein the state space and the action space are both composed of a continuous space variable and a discrete space variable; the continuous space variables of the state space comprise time, generator power and generator terminal voltage, load power, node voltage, line tide value and voltage, and the discrete space variables comprise a network topological structure; the continuous space variable of the action space comprises generator output adjustment and load power adjustment, and the discrete space variable comprises a transmission line on-off state and a connection topological structure of double buses and each component in a transformer substation node;
step 2, clustering the action space to ensure that the action number of each cluster is equal;
step 3, designing a state characterization vector S and an action characterization vector A for the power grid;
step 4, designing a power grid regulation and control model based on a hierarchical policy gradient network, wherein the power grid regulation and control model has two layers, each layer being an independent policy gradient network; a state characterization vector S is used as the input of each layer of policy gradient network, and the power grid regulation and control model is trained with a policy gradient algorithm of successive selection: the first layer of the model first selects an action cluster, and the second layer then selects a specific action within the cluster, wherein, given a state St, the probability of outputting the specific power grid action At is the product of the probabilities of the two selections;
step 5, simulating a power grid operation environment based on a discretized power grid operation data set, interacting the power grid regulation and control model with the simulated power grid operation environment, acquiring a current state and a final action to be executed by the power grid regulation and control model from the simulated power grid operation environment, handing the final action to be executed to the simulated power grid operation environment for executing, realizing the purpose of power grid regulation and control, feeding back instant rewards, combining the state of the power grid, the action of power grid regulation and control and the rewards acquired through feedback, and collecting experience sample data;
and 6, estimating the value of the action according to the collected experience sample data and the returned reward, updating the network parameters, and then returning to execute the step 5, so that the continuous interaction of the simulation power grid operating environment is realized, and the aim of training a power grid regulation and control model is fulfilled.
2. The power grid regulation and control method based on the hierarchical depth strategy gradient network of claim 1, wherein in the step 2, a simulation environment exploration mechanism is introduced to perform dimensionality reduction processing on the action space, state information of the power grid before and after execution of each power grid action in the power grid environment, namely the magnitude of a current value in each power transmission line in the power grid, is taken as a feature vector representing the power grid action in the action space after dimensionality reduction, and then clustering operation is performed on the feature vector.
3. The power grid regulation and control method based on the hierarchical deep policy gradient network of claim 2, wherein the clustering adopts a K-means algorithm: firstly, the feature vectors of K power grid actions in the action space are randomly selected as initial cluster centers; for each remaining feature vector, the distances to the cluster centers are calculated and the vector is assigned to the nearest center; the cluster centers are then updated iteratively until clusters of equal size are obtained, such that the similarity of objects within the same cluster is high and the similarity of objects in different clusters is low.
4. The power grid regulation and control method based on the hierarchical deep policy gradient network of claim 1, wherein in the step 3, the components and transmission lines contained in the power grid are numbered so that each corresponds to a fixed position, the components comprising substation nodes, generator nodes and load nodes; the variables contained in the components and transmission lines are then assembled into the one-dimensional state characterization vector S;
the specific power increase/decrease values of the generator output adjustment and the load power adjustment are placed at the corresponding numbered positions of the one-dimensional action vector, the on/off switching action of a transmission line is represented by 1 and 0, and the connection state between each component in a substation node and the double buses is represented by 0, 1 and 2, where 0 indicates that the component is disconnected from all buses, 1 that it is connected to bus No. 1, and 2 that it is connected to bus No. 2, thereby obtaining the action characterization vector A.
5. The power grid regulation and control method based on the hierarchical deep policy gradient network according to claim 1, wherein in the step 4, the current state characterization vector St is used as the input of each layer of the policy gradient network, and the policy is initialized as θ = (θ1, θ2), where θ1 and θ2 are the parameter vectors of the target policies of the first-layer and second-layer policy gradient networks, respectively; pt denotes the path, at time step t, from the state input of the first-layer network to the target policy output of the second-layer network, the path consisting of two choices, the choice of the first-layer network being denoted by an integer between 1 and c1 and that of the second-layer network by an integer between 1 and c2, where c1 is the number of clusters obtained by clustering the actions and c2 is the number of specific actions within a cluster.
6. The power grid regulation and control method based on the hierarchical deep policy gradient network according to claim 1, wherein in the step 5, the discounted return is calculated from the obtained rewards,

Qt = Σ_{k=t}^{n} γ^(k−t) · r_k

and the policy function is calculated:

π_θ(At|St) = π_θ1(p_{t,1}|St) · π_θ2(p_{t,2}|St)

the network parameters are updated, the update loss function of the network being

L(θ) = −(1/n) · Σ_{t=1}^{n} log π_θ(At|St) · Q(St, At)

with gradient

Δθ = (1/n) · Σ_{t=1}^{n} ∇_θ log π_θ(At|St) · Q(St, At)

where Q(St, At) is the value estimate of the power grid action At selected after the policy network output under the current state characterization vector St; γ ∈ [0,1] is the discount reward coefficient; n is the length of one episode, i.e. the number of samples; θ is the policy gradient network parameter; ∇_θ log π_θ(At|St) is the gradient of the log-output of the policy network at the current input; St and At are the state characterization vector and action characterization vector at moment t; and π_θ(At|St) is the output of the policy network under the current state characterization vector St;

the network parameters of the policy gradient network are updated as follows:

θ = θ + αΔθ

where θ is the policy gradient network parameter, and α ∈ [0,1] is the update step size, i.e. the learning rate.
CN202210435606.2A 2022-04-24 2022-04-24 Layered depth strategy gradient network-based power grid regulation and control method Active CN114707613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210435606.2A CN114707613B (en) 2022-04-24 2022-04-24 Layered depth strategy gradient network-based power grid regulation and control method


Publications (2)

Publication Number Publication Date
CN114707613A true CN114707613A (en) 2022-07-05
CN114707613B CN114707613B (en) 2024-03-12

Family

ID=82174223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210435606.2A Active CN114707613B (en) 2022-04-24 2022-04-24 Layered depth strategy gradient network-based power grid regulation and control method

Country Status (1)

Country Link
CN (1) CN114707613B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN113141012A (en) * 2021-04-24 2021-07-20 西安交通大学 Power grid power flow regulation and control decision reasoning method based on deep deterministic strategy gradient network
CN114048903A (en) * 2021-11-11 2022-02-15 天津大学 Intelligent optimization method for power grid safe operation strategy based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Xiaming; Li Mingqiu; Chen Enzhi; Wang Chunyang: "Deep Q-Network Learning Based on Action Space Noise", Journal of Changchun University of Science and Technology (Natural Science Edition), no. 04, 31 August 2020 (2020-08-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545197A (en) * 2022-11-24 2022-12-30 中国电力科学研究院有限公司 Power grid regulation and control decision knowledge model construction method and device
CN115545197B (en) * 2022-11-24 2023-04-28 中国电力科学研究院有限公司 Power grid regulation and control decision knowledge model construction method and device


Similar Documents

Publication Publication Date Title
Saleh et al. A data mining based load forecasting strategy for smart electrical grids
CN113141012B (en) Power grid power flow regulation and control decision reasoning method
Sharma Designing and modeling fuzzy control Systems
CN111325315A (en) Distribution transformer power failure and power loss prediction method based on deep learning
Laouafi et al. One-hour ahead electric load forecasting using neuro-fuzzy system in a parallel approach
Zhou et al. Action set based policy optimization for safe power grid management
CN113887141A (en) Micro-grid group operation strategy evolution method based on federal learning
CN114707613B (en) Layered depth strategy gradient network-based power grid regulation and control method
CN111784019A (en) Power load processing method and device
Hu et al. Finite-time stabilization of fuzzy spatiotemporal competitive neural networks with hybrid time-varying delays
CN114384931A (en) Unmanned aerial vehicle multi-target optimal control method and device based on strategy gradient
Wang et al. Transfer-Reinforcement-Learning-Based rescheduling of differential power grids considering security constraints
Arseniev et al. The Model of a Cyber-Physical System for Hybrid Renewable Energy Station Control
Bin et al. A short-term power load forecasting method based on eemd-abgru
Yuanyuan et al. Artificial intelligence and learning techniques in intelligent fault diagnosis
Wang et al. Energy management strategy for HEV based on KFCM and neural network
Wang et al. Design and Research of Smart Grid Based on Artificial Intelligence
Kundacina et al. Supporting future electrical utilities: Using deep learning methods in ems and dms algorithms
Ding et al. Review of Machine Learning for Short Term Load Forecasting
Kasar et al. Recent Trends in Electrical Power System by using Computational Intelligence Techniques
Vaščák Automatic design and optimization of fuzzy inference systems
Ono et al. Operation Planning Method Using Convolutional Neural Network for Combined Heat and Power System
Wang et al. Summary of Fault Diagnosis Technology in Smart Grid
Dong et al. Adaptive electric load forecaster
Cai et al. Data-Driven Tie-line Scheduling Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant