CN116362377A: Large power grid region cooperative power flow regulation and control method based on multi-agent strategy gradient model


Info

Publication number: CN116362377A
Application number: CN202310159550.7A
Authority: CN (China)
Prior art keywords: agent, power grid, regional, model, power
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 杜友田, 郭子豪, 王晨希, 常源麟
Current Assignee: Xian Jiaotong University
Original Assignee: Xian Jiaotong University
Application filed by Xian Jiaotong University; priority to CN202310159550.7A; publication of CN116362377A

Classifications

    • H02J 3/00: Circuit arrangements for AC mains or AC distribution networks
    • H02J 3/04: Circuit arrangements for connecting networks of the same frequency but supplied from different sources
    • H02J 3/06: Controlling transfer of power between connected networks; controlling sharing of load between connected networks
    • H02J 3/38: Arrangements for parallelly feeding a single network by two or more generators, converters or transformers
    • H02J 2203/10: Power transmission or distribution systems management focusing at grid level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H02J 2203/20: Simulating, e.g. planning, reliability check, modelling or computer-assisted design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 50/06: Energy or water supply
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The large power grid region collaborative power flow regulation and control method based on the multi-agent strategy gradient model performs region division and designs the state characterization vector, the local observation characterization vector and the action characterization vector of the power grid. Based on a multi-agent strategy gradient model, the local observation characterization vector of each agent is taken as the input of a first layer, whose output is a specific continuous action-space vector, i.e. a communication action; all communication actions are mapped and spliced into global strategy communication information, which, together with the local observation characterization vector, is taken as the input of a second layer, whose continuous-action output is the final action executed by the regional agent in the environment. A simulated power grid operation environment is constructed based on a discretized power grid operation data set; the model interacts with the simulated environment, batches of sample data are collected, and the model is trained until convergence. The method can effectively reduce the variance and randomness of multi-agent strategy learning and improve the application effect in large-scale complex power grids.

Description

Large power grid region cooperative power flow regulation and control method based on multi-agent strategy gradient model
Technical Field
The invention belongs to the technical field of smart power grids, relates to artificial intelligence technology for distributed power flow regulation of electric power networks, and particularly relates to a large power grid region collaborative power flow regulation method based on a multi-agent strategy gradient model.
Background
With the growing share of energy in the economic structure, the power system network has become one of the high-dimensional dynamic systems with the largest coverage area and the most complex element structure. The tight interconnection of the large power grid and the regional power grids delivers electric energy to thousands of households on the one hand, but on the other hand increases the vulnerability and complexity of the power system network: the possibility that the power system suffers faults is greatly increased, the fault coverage becomes wider and harder to control, and the safe and stable operation of the modern power network faces a serious test. Guaranteeing the safe and stable long-term operation of a large power grid has always been a problem of extensive attention in academia and industry. In industry, the stability of the power grid depends highly on automation devices as the safety line; when abnormal conditions beyond the processing capacity of the automation devices occur, the detection devices report to the power grid dispatching institution and the safety of the whole power grid is ensured through the regulation knowledge of dispatching experts, so the response time and the processing capacity for abnormal conditions are limited by expert knowledge. In academia, the whole power grid is generally taken as the research object, with emergency control of large power grid regulation as the application background, and digital means such as reinforcement learning and other artificial intelligence methods are used to achieve intelligent control and optimization of complex power grid operation scheduling; however, a large-scale power grid has a complex network topology, a large action space and great uncertainty during operation, which leads to difficulties such as hard model exploration and large training variance of the value function.
In actual scenarios, a large-scale power grid is generally regulated region by region according to administrative units such as prefectures; for each region, the obtainable actual power grid scheduling information is very limited, so the capability of local perception and local decision reasoning about the regional power grid operation environment must be examined. Under the research background of large-scale interconnected power grids, it is necessary to manage the large power grid by regions and to plan and divide the power system network reasonably, which facilitates the safe operation, optimal control and efficient management of each power grid region and realizes the stability of the large-scale interconnected power grid.
The literature [Glavic M. Design of a Resistive Brake Controller for Power System Stability Enhancement Using Reinforcement Learning [J]. IEEE Transactions on Control Systems Technology, 2005, 13(5): 743-751] studied the application of reinforcement learning algorithms to the control of transient power-angle stability of a power grid. The literature [Xu Y, Zhang W, Liu W, et al. Multiagent-Based Reinforcement Learning for Optimal Reactive Power Dispatch [J]. IEEE Transactions on Systems, Man & Cybernetics, Part C, 2012, 42(6): 1742-1751] studied a reactive power dispatch optimization method based on multi-agent reinforcement learning that needs no accurate power grid system model and adopts a model-free reinforcement learning algorithm; it proved very effective in tests on power systems of different scales and can perform distributed power grid regulation. The literature [Hossain M J, Rahnamay-Naeini M. Data-driven, Multi-Region Distributed State Estimation for Smart Grids [C] // 2021 IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe). IEEE, 2021: 1-6] proposed a distributed state estimation over multiple regions of a power grid to cope with the low-latency processing of power system data required by real-time wide-area monitoring of a smart grid; it performs region identification based on the correlation between geographic distance and power system component states, and evaluates the performance of the distributed data-driven state estimation method on the IEEE 118-bus test case. The literature [Cao D, Zhao J, Hu W, et al. Data-driven multi-agent deep reinforcement learning for distribution system decentralized voltage control with high penetration of PVs [J]. IEEE Transactions on Smart Grid, 2021, 12(5): 4137-4150] proposed a multi-agent deep reinforcement learning algorithm that coordinates the active and reactive control of photovoltaics with existing static var compensators and battery storage systems, dividing the grid into different voltage control regions to achieve better distributed control, and showed the superiority of the proposed method on the IEEE 123-node and 342-node systems. A study from North China Electric Power University [Zhao Dongmei, et al. An active-reactive power coordinated dispatching model based on the multi-agent deep deterministic policy gradient algorithm [J]. 2021, 36(9): 1914-1925] adopts multi-agent technology to intelligently organize multiple active and reactive regulation resources and establishes an active-reactive coordinated dispatching model for the power grid. The literature [Tang H, Lv K, Bak-Jensen B, et al. Deep neural network-based hierarchical learning method for dispatch control of multi-regional power grid [J]. Neural Computing and Applications, 2022, 34(7): 5063-5079] introduced a hierarchical learning optimization method based on deep neural networks to establish an online solution to the centralized coordinated scheduling of interconnected multi-region power grids, which can effectively redistribute power resources in a large-scale multi-region power grid.
Research based on traditional single-agent reinforcement learning is therefore gradually becoming unable to adapt to distributed regulation scenarios in which power grid information acquisition is limited, and multi-agent reinforcement learning has become an effective way to solve the regional collaborative regulation problem of the large power grid. However, when multi-agent reinforcement learning is applied to multi-regional distributed regulation of a large power grid, strategy exploration and exploitation suffer from high variance and non-convergence, which greatly reduces the regulation effect.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a large power grid region collaborative power flow regulation and control method based on a multi-agent strategy gradient model, which uses an effective multi-agent strategy communication method to reduce the variance and randomness of multi-agent strategy learning and thereby improves the application effect in actual power grids.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a large power grid region collaborative power flow regulation and control method based on a multi-agent strategy gradient model comprises the following steps:
step 1, performing topological partition on the large power grid, dividing it into a plurality of regulation areas such that the electrical distances inside each regional power grid are close while the electrical distances between regional power grids are far, and determining the number N of regional power grids;
step 2, designing a state characterization vector S, a local observation characterization vector O and an action characterization vector A for the power network; each regional power grid is regulated and controlled by a regional intelligent agent;
step 3, designing a multi-agent strategy gradient model based on a multi-agent proximal policy optimization algorithm, wherein the model consists of two layers of agents: the local observation characterization vector $o_i$ of each regional agent is taken as the input of the first layer, whose output is a specific continuous action-space vector, i.e. the communication action $a_i^c$; all communication actions output by the first layer are mapped and spliced into the global strategy communication information $z$; the strategy communication information $z$ and the local observation characterization vector $o_i$ are taken as the input of the second layer, whose output is a continuous action $a_i$ serving as the final action, i.e. the regulation action, executed by the regional agent in the environment;
step 4, constructing a simulated power grid operation environment based on a discretized power grid operation data set; the model interacts with the simulated power grid operation environment and each regional agent collects regional sample data: each regional agent obtains the observation information of its current region from the simulated power grid operation environment and hands the final action to be executed to the environment, the simulated power grid operation environment executes the final actions of all regional agents, and the environment feeds back the global instant reward, the state at the next time, and a termination signal;
step 5, after collecting a batch of data, each regional agent updates the model parameters and then returns to step 4, continuously interacting with the simulated power grid operation environment and training the multi-agent strategy gradient model until the model performance converges;
step 6, realizing collaborative regulation of the large power grid regions based on the trained multi-agent strategy gradient model.
Compared with the prior art, the invention has the following advantages: by constructing a multi-agent model that interacts with the power grid simulation environment, the agents can autonomously learn the collaborative mapping relation from the real-time running state of the regional power grid to the regulation action, and strategy communication among the multiple agents is realized during centralized training. This capability has an important influence on the training variance and convergence speed of the model in multi-agent regulation scenarios, and theory and experiment prove that the invention is suitable for actual complex distributed power grid regulation scenarios.
In the multi-agent regulation task, multiple regional agents coexist in the same power grid environment and jointly exert effects and influence on it, so each regional agent needs to consider the communication information of other regional agents for collaborative regulation when adjusting its own regulation strategy. The invention regards the strategy information among regional agents as the efficient communication information required by multi-agent model training and constructs a two-layer model structure, the proto agent model and the routing agent model, in a framework of centralized training with distributed execution. In the multi-agent centralized training stage, the proto agent model provides strategy communication information for the routing agent model, so that the routing agent model uses the additional strategy information of other regional agents to reduce the variance in strategy exploration and evaluation; the proto agent model in turn fits the updated strategy of the routing agent model online to provide more accurate strategy information, and the two-layer model is trained interactively so that performance improves jointly. The two-layer model structure constructed by the invention can carry out efficient centralized communication at the lowest possible communication cost, and after model training ends the two layers converge to the same performance; therefore, in the distributed execution stage, i.e. when each regional agent regulates its regional power grid separately, only the proto agent model needs to be deployed, and high-performance regulation can be guaranteed without communication among regional agents.
Drawings
Fig. 1 is a general flow chart of the present invention.
Fig. 2 is a schematic diagram of a power network according to an embodiment of the invention.
FIG. 3 is a diagram of a multi-agent strategy gradient model in an embodiment of the invention.
Fig. 4 is a simulation case in which the IEEE 1888-node large power grid is divided into regional power grids in the embodiment of the present invention.
Fig. 5 is a graph comparing the performance of the algorithm of the present invention with the open-source algorithms IPPO (Independent PPO) and MAPPO (Multi-Agent PPO) in an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples.
In a large power grid, the acquisition of power grid information is limited, so a traditional reinforcement learning algorithm cannot meet the distributed regulation requirements of the large power grid. After introducing multi-agent reinforcement learning methods, the regulation effect is still limited by the high variance and non-convergence of strategy exploration and exploitation.
The invention provides a large power grid regional collaborative power flow regulation method based on a multi-agent strategy gradient model: a two-layer model consisting of multiple agents is constructed and trained with a multi-agent reinforcement learning algorithm through interaction with an artificial power network environment, so that each regional agent establishes a mapping relation between the power grid state and the regulation behavior. This provides a feasible means for intra-regional and inter-regional regulation of a large power grid, offers a new perspective and method for the study of interconnected large power grids, and the algorithm is designed specifically for the non-stationarity problem existing in multi-agent strategy learning.
Specifically, as shown in fig. 1, the large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model, namely the distributed power flow regulation and control of the multi-region power grid, comprises the following steps:
Step 1, performing topological partition on the large power grid, dividing it into several regulation areas such that the electrical distances inside each regional power grid are close while the electrical distances between regional power grids are far, determining the number N of regional power grids, and regarding each regional power grid as a regional agent, i.e. each regional power grid is regulated by one regional agent.
According to the basic principle of multi-region decoupling of a large power grid, the shortest paths between power grid nodes are taken as the basic electrical distance, the number of divided regional power grids is determined for the specific power grid scale according to community detection theory (Community Detection), and the large power grid is divided into several regulation areas on the graph structure in combination with geographic position information.
In community detection theory, the shortest paths of line connections between grid nodes are used as the calculation index to compute the edge betweenness, defined as the proportion of shortest paths in the network that pass through a given edge to the total number of shortest paths; the betweenness measures the importance of a node or edge, and the higher the value, the more important it is. The line with the largest edge betweenness in the power grid graph structure is used to divide regions: if a line frequently lies on the shortest path between pairs of nodes, it can be regarded as an important line carrying current transmission between nodes and marking the division between power grid regions; in actual power grid scenarios such a line is called a tie line. The modularity evaluates the density of connections inside the graph structure and thereby determines the number of regional grids to divide.
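This community-detection step can be illustrated with a short Python sketch; it is an illustrative reading of the paragraph above rather than the patent's implementation, and it assumes the grid topology is available as a networkx graph.

import networkx as nx
from networkx.algorithms import community

def choose_region_count(grid_graph, max_regions=12):
    # Girvan-Newman repeatedly removes the edge with the highest edge
    # betweenness (the candidate tie line) and yields coarser-to-finer
    # partitions; modularity scores the internal density of each one.
    best_q, best_partition = float("-inf"), None
    for partition in community.girvan_newman(grid_graph):
        if len(partition) > max_regions:
            break
        q = community.modularity(grid_graph, partition)
        if q > best_q:
            best_q, best_partition = q, partition
    return len(best_partition), best_partition

The partition with the highest modularity fixes the number of regions k used in the geographic refinement described next.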
Community detection theory can divide the topological structure of the power grid but does not consider the geographic position information of the power grid nodes. The invention therefore improves the K-means algorithm and divides the power grid regions according to both the geographic position information of the nodes and the shortest paths of line connections between nodes, as sketched below. First, k power grid nodes are randomly selected as initial cluster centers, with k the number of regional power grids determined by community detection theory; second, the shortest line-connection distance between each remaining node and each cluster-center node is calculated and the node is assigned to the nearest cluster; third, for each resulting cluster, the geographic center of the nodes in the cluster is calculated and the cluster center is updated; the second and third steps are repeated until the cluster memberships are stable.
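A minimal sketch of this improved K-means partition follows, again assuming a networkx graph whose nodes carry geographic coordinates as "x"/"y" attributes; all function and attribute names are assumptions made for illustration.

import random
import networkx as nx

def partition_regions(grid_graph, k, max_iter=50):
    # Pre-compute hop-count shortest-path lengths as the electrical distance.
    dist = dict(nx.all_pairs_shortest_path_length(grid_graph))
    centers = random.sample(list(grid_graph.nodes), k)
    for _ in range(max_iter):
        # Second step: assign each node to the nearest center by
        # line-connection shortest path.
        clusters = {c: [c] for c in centers}
        for node in grid_graph.nodes:
            if node in centers:
                continue
            nearest = min(centers, key=lambda c: dist[c].get(node, float("inf")))
            clusters[nearest].append(node)
        # Third step: move each center to the member closest to the
        # cluster's geographic centroid.
        new_centers = []
        for members in clusters.values():
            cx = sum(grid_graph.nodes[n]["x"] for n in members) / len(members)
            cy = sum(grid_graph.nodes[n]["y"] for n in members) / len(members)
            new_centers.append(min(
                members,
                key=lambda n: (grid_graph.nodes[n]["x"] - cx) ** 2
                              + (grid_graph.nodes[n]["y"] - cy) ** 2))
        if set(new_centers) == set(centers):   # memberships stable
            break
        centers = new_centers
    return clusters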
On the data side, the reference operation data of an IEEE open-source power grid is taken as the baseline, several operating modes such as load fluctuation, load mutation and new-energy output mutation are constructed, and a set of modes is randomly selected for each region to generate power grid operation data while keeping the dynamic balance between the source and load sides of the power grid, thus constructing power data with variable operating modes.
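The construction of variable operating-mode data might look as follows; the mode names, perturbation shapes and magnitudes here are assumptions chosen for illustration, not values fixed by the patent.

import numpy as np

def make_load_profile(base, mode, rng):
    # base: baseline load series from the IEEE reference data;
    # mode: one of the operating modes listed above;
    # rng: a numpy Generator, e.g. np.random.default_rng(0).
    t = np.arange(len(base))
    if mode == "load_fluctuation":
        return base * (1.0 + 0.10 * np.sin(2 * np.pi * t / 96)
                       + 0.02 * rng.standard_normal(len(base)))
    if mode == "load_mutation":
        profile = base.copy()
        profile[rng.integers(1, len(base)):] *= 1.2   # sudden load jump
        return profile
    if mode == "renewable_mutation":
        step = t >= rng.integers(1, len(base))        # new-energy output drop
        return base + 0.15 * base * step              # net load rises accordingly
    return base

Sampling one mode per region while dispatchable units keep the source-load balance matches the data-construction procedure described above.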
Step 2, designing a state characterization vector S, a local observation characterization vector O and an action characterization vector A for the power network.
The state characterization vector S, the local observation characterization vector O and the action characterization vector A of the power network are all continuous-space variables. The state characterization vector S comprises the generator output power, the load power and the node voltage at every node of the whole power grid, together with the power flow and current values on the lines; the local observation characterization vector O comprises the generator output power, the load power and the node voltage at the nodes of the regional power grid; the action characterization vector A is the adjustment value of the current generator output, and the number of electric elements differs between areas.
For the specifically applied power grid structure, as shown in fig. 2, the number N of divided regional power grids is determined, and the numbers of generators, loads and lines at the nodes of each regional power grid are determined separately; different regional power grids are controlled by different regional agents. The input of a regional agent is the local observation characterization vector of its regional power grid, with input and output dimensions determined by the number of electric elements in the region; the output is a high-dimensional Gaussian distribution whose dimension equals the number of generators in the region, and after a high-dimensional continuous action is sampled from this distribution, it is multiplied by the ramp rate C of each generator to give the action adjustment value of the regional agent per unit time step.
The components of the state are explained as follows:
generator output power: the active power P generated by each generator at the current moment;
load power: the total power (including active and reactive power) of each load node at the current moment;
node voltage: the per-unit voltage value of each node at the current moment;
line power flow value: the current value and active power value in each transmission line at the current moment.
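A sketch of how a regional agent's adjustment action could be produced from these quantities, assuming the Actor outputs the mean and log standard deviation of a Gaussian over the region's generators; the tensor names and the tanh squashing are assumptions made for illustration.

import torch
from torch.distributions import Normal

def sample_adjustment(actor, local_obs, ramp_rates):
    # local_obs: the region's observation characterization vector o_i
    # ramp_rates: per-generator ramp rate C for one time step
    mean, log_std = actor(local_obs)          # high-dimensional Gaussian
    action = Normal(mean, log_std.exp()).sample()
    # Squash to [-1, 1] and scale by C so the per-step adjustment stays
    # within each generator's ramping capability.
    return torch.tanh(action) * ramp_rates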
Step 3, designing a multi-agent strategy gradient model based on the multi-agent proximal policy optimization algorithm; the model consists of two layers of agents, which in the embodiment of the invention are the proto agent and the routing agent respectively. The local observation characterization vector $o_i$ of each agent is taken as the input of the first layer, the proto agent, whose output is a specific continuous action-space vector called the communication action $a_i^c$. The communication actions of all regional agents (i.e. all communication actions output by the proto agents) are spliced into the global strategy communication information $z$ through the mapping of a communication layer. The strategy communication information $z$ and the local observation characterization vector $o_i$ are taken as the input of the second layer, the routing agent, whose output is a continuous action $a_i$ serving as the final action, i.e. the regulation action, executed by the regional agent in the environment.
In the embodiment of the invention, the proto agent consists only of an Actor policy network, while the routing agent consists of two networks, Actor and Critic, as shown in the overall model structure in fig. 3; the input and output dimensions of each regional agent's Actor and Critic networks are determined by the dimensions of the state characterization vector S, the local observation characterization vector O and the action characterization vector A designed in step 2. The Actor network of the proto agent takes the local observation characterization vector as input, and the Actor and Critic networks of the routing agent take the local observation characterization vector and the communication information as input.
In the multi-agent collaborative regulation task, each regional agent performs strategy exploration and learning by maximizing the global reward. Because all agents learn and explore in the same environment, each agent is affected by the strategy learning and exploration of the others; during training the agents cannot distinguish the influence of other agents' strategies on the environment, so they face a high-variance problem and require a large amount of time and computation. The invention therefore treats the strategy behavior of agents as the communication information: in the centralized training stage, the proto agent provides communication information through its Actor network by imitation learning, and the routing agent interacts with the environment after receiving this communication information, which helps stabilize the strategy learning of the routing agent. Because the proto agent imitates and infers the strategy behavior of the routing agent online, it attains similar performance; in the actual execution stage each regional power grid therefore needs only a proto agent model, which directly outputs the regulation action to interact with the environment given the local observation characterization vector of the regional power grid, without inter-regional communication. This is called centralized training with distributed execution.
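The two-layer structure can be written down concretely; the following PyTorch sketch is a minimal reading of fig. 3, with hidden sizes and the communication-vector width chosen arbitrarily rather than taken from the patent.

import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))

class ProtoAgent(nn.Module):
    # Actor only: maps o_i to a Gaussian over the communication action a_i^c.
    def __init__(self, obs_dim, comm_dim):
        super().__init__()
        self.mean = mlp(obs_dim, comm_dim)
        self.log_std = nn.Parameter(torch.zeros(comm_dim))

    def forward(self, obs):
        return self.mean(obs), self.log_std

class RoutingAgent(nn.Module):
    # Actor-Critic: both heads see o_i concatenated with the global
    # strategy communication information z.
    def __init__(self, obs_dim, z_dim, act_dim):
        super().__init__()
        self.mean = mlp(obs_dim + z_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.critic = mlp(obs_dim + z_dim, 1)

    def forward(self, obs, z):
        x = torch.cat([obs, z], dim=-1)
        return self.mean(x), self.log_std, self.critic(x)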
The design method of the reasoning model is as follows:
and 3.1, determining structural parameters of the multi-agent strategy gradient model, wherein the structural parameters comprise the number of the multi-agent, the dimension of an input layer, the number of neurons of a hidden layer, an activation function and the dimension of an output layer.
Initializing the model parameters $\{\theta_i, \omega_i, \phi_i\}_{i=1}^{N}$, where $\theta$ and $\omega$ denote the Actor parameter vectors of the proto agent and the routing agent respectively, $\phi$ denotes the Critic parameter vector of the routing agent, and the number of regional agents is N.
Step 3.2, for each regional agent, the local observation characterization vector $o_i$ of the current area is taken as input of the first-layer proto agent model, which outputs the communication action $a_i^c$; all communication actions of the proto agents are spliced into the global strategy communication information $z$ through the mapping of a communication layer.
Step 3.3, the strategy communication information $z$ and the local observation characterization vector $o_i$ are taken as input of the second-layer routing agent model, which outputs a continuous action $a_i$ as the final action executed by the regional agent in the environment; after receiving the regulation actions of all routing agents, the environment performs one power flow calculation and feeds back the global reward value and the state characterization vector of the whole power grid at the next moment, thereby realizing the reasoning from regional observation to regulation action.
Step 3.4, in the training stage the model adopts centralized training with distributed execution (CTDE). The purpose of the proto agent is to infer the real strategy with which the routing agent interacts with the environment: it infers the routing agent's strategy during centralized training and thereby provides strategy communication information in advance to help the routing agent train. The goal of the proto agent is to minimize the KL divergence between its communication action distribution $\pi_{\theta_i}(\cdot \mid o_i)$ and the real action distribution $\pi_{\omega_i}(\cdot \mid o_i, z)$ output by the routing agent. After receiving the local observation and the strategy communication information, the routing agent outputs the regulation action $a_i$ and interacts with the environment, with the goal of maximizing the cumulative round reward $G_t = \sum_{n=0}^{T-t} \gamma^{n} r_{t+n}$, where $\gamma \in [0,1]$ is the discount coefficient, $t$ is the current time, $n$ indexes future steps, and $r_k$ is the global instant reward of the environment. The regulation of multiple agents in a large power grid is a team-style cooperation similar to a football match: all regional agents share the same global reward and jointly optimize the global objective through the cooperation strategy among agents learned in centralized training.
The update loss function of the Actor network of the proto agent is:

$$L_P(\theta_i) = D_{KL}\left(\pi_{\theta_i}(\cdot \mid o_i)\,\|\,\pi_{\omega_i}(\cdot \mid o_i, z)\right)$$

The update loss function of the Actor network of the routing agent is:

$$L_R(\omega_i) = -\mathbb{E}_t\left[\min\left(\rho_t^i A_t^i,\ \mathrm{clip}\left(\rho_t^i,\,1-\epsilon,\,1+\epsilon\right) A_t^i\right)\right],\qquad \rho_t^i = \frac{\pi_{\omega_i}(a_t^i \mid o_t^i, z_t)}{\pi_{\omega_i^{\mathrm{old}}}(a_t^i \mid o_t^i, z_t)}$$

The update loss function of the Critic network of the routing agent is:

$$L_C(\phi_i) = \mathbb{E}_t\left[\left(y_t^i - V_{\phi_i}(o_t^i, z_t)\right)^2\right],\qquad y_t^i = \sum_{k=0}^{T-1} \gamma^{k} r_{t+k} + \gamma^{T} V_{\phi_i}(o_{t+T}^i, z_{t+T})$$
where $D_{KL}$ denotes the KL divergence between distributions; $\theta$ and $\omega$ denote the Actor parameter vectors of the proto agent and the routing agent respectively, and $\phi$ denotes the Critic parameter vector of the routing agent; $R_i$ and $P_i$ denote the i-th routing agent model and proto agent model respectively; $\pi_{\theta_i}(\cdot \mid o_i)$ denotes the output of the i-th proto agent's Actor network given the current regional observation characterization vector $o_i$; $\pi_{\omega_i}(\cdot \mid o_i, z)$ denotes the output of the i-th routing agent's Actor network given $o_i$ after receiving the strategy communication information $z$; $\pi_{\omega_i^{\mathrm{old}}}$ denotes the policy network of the i-th routing agent before updating, and $\epsilon$ denotes the trust-region interval used to bound the policy optimization within a certain confidence region. $V_{\phi_i}(o_t^i, z_t)$ denotes the routing agent's Critic network given the observation $o_i$ after receiving the strategy communication information $z$, representing the evaluation of the current regional observation; $V_{\phi_i}(o_{t+T}^i, z_{t+T})$ denotes the observation evaluation after the T-th regulation step of the regional power grid; $A^i_{1:T}$ is the multi-step advantage function of the routing agent, where $1{:}T$ denotes a T-step policy evaluation of the routing agent; $y^i$ is the multi-step TD target of the i-th routing agent: the routing agent interacts with the environment for multiple steps, and the policy is evaluated after collecting multi-step sample data.
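Under the network sketch given earlier, the three losses can be assembled as below; the batch field names, the advantage estimator, the shared dimension of $a_i^c$ and $a_i$ (which the KL term requires), and the detaching of the routing policy in the KL term are assumptions made for illustration.

import torch
from torch.distributions import Normal, kl_divergence

def prppo_losses(proto, routing, batch, eps=0.2):
    obs, z, act = batch["obs"], batch["z"], batch["act"]
    adv, td_target = batch["adv"], batch["td_target"]
    old_log_prob = batch["old_log_prob"]

    mean, log_std, value = routing(obs, z)
    dist = Normal(mean, log_std.exp())

    # Routing Actor: PPO clipped surrogate with the multi-step advantage.
    ratio = (dist.log_prob(act).sum(-1) - old_log_prob).exp()
    actor_loss = -torch.min(ratio * adv,
                            ratio.clamp(1 - eps, 1 + eps) * adv).mean()

    # Routing Critic: regression onto the multi-step TD target y_i.
    critic_loss = (td_target - value.squeeze(-1)).pow(2).mean()

    # Proto Actor: KL divergence to the (frozen) routing policy.
    p_mean, p_log_std = proto(obs)
    target = Normal(mean.detach(), log_std.exp().detach())
    proto_loss = kl_divergence(Normal(p_mean, p_log_std.exp()),
                               target).sum(-1).mean()
    return proto_loss, actor_loss, critic_loss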
Step 3.5, according to the designed model loss functions, the forward-propagation model loss is calculated with the sampled batch data, and the model parameters of the proto agent and the routing agent are updated by joint optimization through gradient back-propagation.
The Actor-Critic network of the routing agent is updated with the proximal policy optimization reinforcement learning algorithm, and the Actor network of the proto agent uses imitation learning to infer the strategy behavior of the routing agent online:
$$\theta_j \leftarrow \theta_j - \alpha_{\theta}\, \nabla_{\theta_j} L_P(\theta_j)$$

$$\omega_j \leftarrow \omega_j - \alpha_{\omega}\, \nabla_{\omega_j} L_R(\omega_j)$$

$$\phi_j \leftarrow \phi_j - \alpha_{\phi}\, \nabla_{\phi_j} L_C(\phi_j)$$

where each assignment takes the parameter vector before the update to its value after the update: $\theta_j$ is the Actor parameter vector of the j-th proto agent, $\omega_j$ and $\phi_j$ are the Actor and Critic parameter vectors of the j-th routing agent, and $\alpha_{\theta}$, $\alpha_{\omega}$, $\alpha_{\phi}$ are learning rates; K is a hyperparameter meaning that one batch of training samples is used to perform K updates of the network parameters.
After the model is trained to convergence, in the actual distributed execution stage each regional power grid only needs to deploy the proto agent model; it can output a regulation action given only the local observation characterization as input and respond quickly to abnormal conditions of the power grid, thereby achieving distributed power flow regulation of the power grid.
Step 4, constructing a simulated power grid operation environment based on the discretized power grid operation data set; in the embodiment of the invention, an open-source computation library is used as the power flow calculation back end of the simulated environment. According to step 3, the model interacts with the simulated power grid operation environment and each regional agent collects regional sample data: each regional agent obtains the observation information of its current region from the environment and hands over the final action to be executed, and the environment executes the actions and feeds back the global instant reward, the state at the next time, and a termination signal. If the termination signal is true, the current round ends and the power grid state is reinitialized for interaction; otherwise, the interaction step is repeated from the next state.
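One interaction round of this step might look as follows under a gym-style interface; the env object and its reset/step signatures are assumptions standing in for the simulated power-flow environment, and the agents follow the sketches given earlier.

import torch
from torch.distributions import Normal

def run_episode(env, protos, routings, comm_layer):
    obs = env.reset()                 # list: one local observation per region
    samples, done = [], False
    while not done:
        # First layer: every proto agent emits a communication action.
        comm = []
        for p, o in zip(protos, obs):
            mean, log_std = p(o)
            comm.append(Normal(mean, log_std.exp()).sample())
        z = comm_layer(torch.cat(comm, dim=-1))       # global strategy info
        # Second layer: routing agents act on (o_i, z); the environment
        # runs one power flow calculation per step for the whole grid.
        acts = []
        for r, o in zip(routings, obs):
            mean, log_std, _ = r(o, z)
            acts.append(Normal(mean, log_std.exp()).sample())
        next_obs, reward, done = env.step(acts)       # global instant reward
        samples.append((obs, z, acts, reward, done))
        obs = next_obs
    return samples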
Step 5, after collecting a batch of data, each regional agent updates its model parameters and then returns to step 4, continuously interacting with the simulated power grid operation environment and training the multi-agent strategy gradient model until the model performance converges.
Step 6, realizing collaborative regulation of the large power grid regions based on the trained multi-agent strategy gradient model.
Because the proto agent model has taken the communication information between regional power grids into account during centralized training, in the actual distributed regulation stage only the proto agent model needs to be deployed for each regional power grid; the regional power grids need not communicate, and specific regulation actions are output given only the local observation characterization as input. The regulation of each regional agent thus comprehensively considers the conditions of the other regional power grids and responds quickly to abnormalities of the whole large power grid, achieving the goal of collaborative regulation of the large power grid regions.
The invention assumes that, when the multi-agent strategy gradient model performs distributed power flow regulation of the power grid, the regulation of the regional power grids proceeds in parallel, and after all regional agents output their regional regulation actions, one power flow calculation is performed for the whole power grid.
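In this distributed execution stage each region runs its proto agent alone; a minimal deployment sketch under the same assumed interfaces as above, with deterministic use of the policy mean as an added assumption:

import torch

@torch.no_grad()
def regulate(proto, local_obs, ramp_rates):
    # Only the proto Actor is deployed: local observation in,
    # regulation action out, no inter-regional communication.
    mean, _ = proto(local_obs)
    return torch.tanh(mean) * ramp_rates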
The invention uses the open-source PPO algorithm as the baseline and, based on the above two-layer model and the centralized strategy communication mechanism, proposes the PRPPO (Policy Routing PPO) algorithm, whose overall flow can be summarized as follows:
Input: number of iteration rounds T, state set S, observation set O, action set A, number of regional agents N, Actor parameter vectors θ and ω of the proto agent and the routing agent, and Critic parameter vector φ of the routing agent.
Output: the optimal Actor network parameters θ of the proto agent.
Initialization: the multi-agent strategy gradient model parameters $\{\theta_i, \omega_i, \phi_i\}_{i=1}^{N}$.
For each iteration round, loop:
Step 1: initialize the initial state characterization S and obtain the observation characterizations O of the regional agents.
For each time step of the current round, loop:
For each regional agent at the current time step, loop:
Step 2: the Actor network of the proto agent outputs the communication action $a_i^c$ according to the current local observation characterization vector $o_i$; all communication actions of the regional agents are mapped into the global communication information $z$ through the communication layer.
After the global communication information is obtained, loop over each regional agent again:
Step 3: the routing agent receives the global communication information $z$ and the local observation characterization vector $o_i$.
Step 4: the Actor network of the routing agent outputs the real action $a_i$ that interacts with the environment; the real actions of all agents are spliced into the regulation action $A_t = (a_1, \dots, a_N)$ of the whole power grid.
Step 5: apply $A_t$ to the whole power grid and obtain the global reward and the new state characterization.
Repeat until the final state of the current round, obtaining the sequence of interaction samples $S_0, A_0, R_1, S_1, A_1, R_2, \dots, S_{T-1}, A_{T-1}, R_T, S_T$.
Construct the loss functions from all the interaction samples of one round:
Step 6: update the Actor network parameters of the proto agent with the loss $L_P(\theta_i) = D_{KL}\left(\pi_{\theta_i}(\cdot \mid o_i)\,\|\,\pi_{\omega_i}(\cdot \mid o_i, z)\right)$.
Step 7: update the Actor network parameters of the routing agent with the clipped surrogate loss $L_R(\omega_i)$ defined above.
Step 8: update the Critic network parameters of the routing agent with the loss $L_C(\phi_i)$ defined above.
Step 9: back-propagate the loss functions and update the parameters: $\theta_j \leftarrow \theta_j - \alpha_{\theta} \nabla_{\theta_j} L_P(\theta_j)$, $\omega_j \leftarrow \omega_j - \alpha_{\omega} \nabla_{\omega_j} L_R(\omega_j)$, $\phi_j \leftarrow \phi_j - \alpha_{\phi} \nabla_{\phi_j} L_C(\phi_j)$.
Step 10: return to Step 1 to enter the next round and iterate the interactive model update.
Adopting the above large power grid region division method, as shown in fig. 4, an IEEE 1888-node large power grid is divided into 10 regional power grids as the object of the invention. Each regional power grid is regarded as one agent, i.e. 10 regional agents carry out multi-agent collaborative regulation of the whole large power grid.
The PRPPO algorithm proposed by the invention is compared with the open-source algorithms IPPO (Independent PPO) and MAPPO (Multi-Agent PPO) through experimental verification, as shown in fig. 5.
The abscissa is the number of iteration steps of the model parameters and the ordinate is the cumulative round reward, which evaluates algorithm performance; performance converges gradually as the model parameters iterate. Among the three algorithms, IPPO adopts independent learning with no communication behavior among regional agents, so its variance and randomness during training are large; MAPPO adopts a centralized training framework with global state information as the communication information and trains with higher stability; PRPPO, the algorithm proposed by the invention, uses the two-layer model with strategy information as the communication information in the centralized training stage, trains stably with higher performance, and performs best among the three algorithms.

Claims (10)

1. A large power grid region cooperative power flow regulation and control method based on a multi-agent strategy gradient model is characterized by comprising the following steps:
step 1, performing topological partition on the large power grid, dividing it into a plurality of regulation areas such that the electrical distances inside each regional power grid are close while the electrical distances between regional power grids are far, and determining the number N of regional power grids;
step 2, designing a state characterization vector S, a local observation characterization vector O and an action characterization vector A for the power network, each regional power grid being regulated by one regional agent;
step 3, designing a multi-agent strategy gradient model based on a multi-agent proximal policy optimization algorithm, wherein the model consists of two layers of agents: the local observation characterization vector $o_i$ of each regional agent is taken as the input of the first layer, whose output is a specific continuous action-space vector, i.e. the communication action $a_i^c$; all communication actions output by the first layer are mapped and spliced into the global strategy communication information $z$; the strategy communication information $z$ and the local observation characterization vector $o_i$ are taken as the input of the second layer, whose output is a continuous action $a_i$ serving as the final action, i.e. the regulation action, executed by the regional agent in the environment;
step 4, constructing a simulated power grid operation environment based on a discretized power grid operation data set; the model interacts with the simulated power grid operation environment and each regional agent collects regional sample data: each regional agent obtains the observation information of its current region from the simulated power grid operation environment and hands the final action to be executed to the environment, the simulated power grid operation environment executes the final actions of all regional agents, and the environment feeds back the global instant reward, the state at the next time, and a termination signal;
step 5, after collecting a batch of data, each regional agent updates the model parameters and then returns to step 4, continuously interacting with the simulated power grid operation environment and training the multi-agent strategy gradient model until the model performance converges;
step 6, realizing collaborative regulation of the large power grid regions based on the trained multi-agent strategy gradient model.
2. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 1, wherein in step 1, according to the basic principle of multi-region decoupling of a large power grid, the topological shortest paths between power grid nodes are taken as the basic electrical distance, the number of divided regional power grids is determined for the specific power grid scale according to community detection theory, and the large power grid is divided into a plurality of regulation areas on the graph structure in combination with geographic position information.
3. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 1, wherein in step 2 the state characterization vector S, the local observation characterization vector O and the action characterization vector A of the power network are all continuous-space variables; the state characterization vector S comprises the generator output power, the load power and the node voltage at every node of the whole power grid, together with the power flow and current values on the lines; the local observation characterization vector O comprises the generator output power, the load power and the node voltage at the nodes of the regional power grid; the action characterization vector A is the adjustment value of the current generator output, and the number of electric elements differs between areas.
4. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 3, wherein the numbers of generators, loads and lines at the nodes of each regional power grid are determined separately according to the number N of divided regional power grids, and different regional power grids are controlled by different regional agents; the input of a regional agent is the local observation characterization vector of its regional power grid, the input and output dimensions are determined by the number of electric elements in the region, the output of the regional agent is a high-dimensional Gaussian distribution whose dimension equals the number of generators in the region, and after a high-dimensional continuous action is sampled from this distribution, it is multiplied by the ramp rate C of each generator to give the action adjustment value of the regional agent per unit time step.
5. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 1 or 4, wherein in step 3 the first-layer agent is the proto agent and the second-layer agent is the routing agent, all communication actions output by the proto agents are spliced into the global strategy communication information $z$ through the mapping of a communication layer, the proto agent consists only of an Actor policy network, and the routing agent consists of two networks, Actor and Critic.
6. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 5, wherein the reasoning design method of the model is as follows:
determining structural parameters of the model, including the number of multi-agent, the dimension of an input layer, the number of neurons of a hidden layer, an activation function and the dimension of an output layer;
initializing the model parameters $\{\theta_i, \omega_i, \phi_i\}_{i=1}^{N}$, where $\theta$ and $\omega$ denote the Actor parameter vectors of the proto agent and the routing agent respectively, $\phi$ denotes the Critic parameter vector of the routing agent, and the number of regional agents is N;
after receiving the regulation actions of all routing agents, the environment performs one power flow calculation and feeds back the global reward value and the state characterization vector of the whole power grid at the next moment, thereby realizing the reasoning from the regional agent's observation to the regulation action;
in the training stage the model adopts centralized training with distributed execution (CTDE); the proto agent infers the real strategy with which the routing agent interacts with the environment, so that strategy communication information can be provided in advance during centralized training to help the routing agent train; the goal of the proto agent is to minimize the KL divergence between $\pi_{\theta_i}(\cdot \mid o_i)$ and $\pi_{\omega_i}(\cdot \mid o_i, z)$; the routing agent outputs $a_i$ and interacts with the environment, with the goal of maximizing the cumulative round reward $G_t = \sum_{n=0}^{T-t} \gamma^{n} r_{t+n}$, where $\gamma \in [0,1]$ is the discount coefficient, $t$ is the current time, $n$ indexes future steps, and $r_k$ is the instant reward returned by the environment;
the update loss function of the Actor network of the proto agent is:

$$L_P(\theta_i) = D_{KL}\left(\pi_{\theta_i}(\cdot \mid o_i)\,\|\,\pi_{\omega_i}(\cdot \mid o_i, z)\right)$$

the update loss function of the Actor network of the routing agent is:

$$L_R(\omega_i) = -\mathbb{E}_t\left[\min\left(\rho_t^i A_t^i,\ \mathrm{clip}\left(\rho_t^i,\,1-\epsilon,\,1+\epsilon\right) A_t^i\right)\right],\qquad \rho_t^i = \frac{\pi_{\omega_i}(a_t^i \mid o_t^i, z_t)}{\pi_{\omega_i^{\mathrm{old}}}(a_t^i \mid o_t^i, z_t)}$$

the update loss function of the Critic network of the routing agent is:

$$L_C(\phi_i) = \mathbb{E}_t\left[\left(y_t^i - V_{\phi_i}(o_t^i, z_t)\right)^2\right],\qquad y_t^i = \sum_{k=0}^{T-1} \gamma^{k} r_{t+k} + \gamma^{T} V_{\phi_i}(o_{t+T}^i, z_{t+T})$$
where $D_{KL}$ denotes the KL divergence between distributions; $\theta$ and $\omega$ denote the Actor parameter vectors of the proto agent and the routing agent respectively, and $\phi$ denotes the Critic parameter vector of the routing agent; $R_i$ and $P_i$ denote the $i$-th routing agent model and the $i$-th proto agent model respectively; $\pi_{\theta_i}(\cdot \mid o_i)$ denotes the output of the Actor network of the $i$-th proto agent under the current regional power grid observation characterization vector $o_i$; $\pi_{\omega_i}(\cdot \mid o_i, z_i)$ denotes the output of the Actor network of the $i$-th routing agent under the current regional observation $o_i$ after receiving the policy communication information $z_i$; $\pi_{\omega_i^{\mathrm{old}}}$ denotes the policy network of the $i$-th routing agent before the update, and $\epsilon$ denotes the confidence-domain interval, which restricts the policy optimization to a certain confidence domain; $V_{\phi_i}(o_i, z_i)$ denotes the output of the Critic network under the current regional observation $o_i$ after receiving $z_i$, i.e. the evaluation of the current regional observation; $V_{\phi_i}(o_i^{T}, z_i^{T})$ denotes the evaluation of the observation after the $T$-th regulation step of the regional power grid; $A_i^{1:T}$ is the multi-step advantage function of the routing agent, where $1{:}T$ denotes a $T$-step policy evaluation of the routing agent; $y_i$ is the multi-step TD target of the $i$-th routing agent: the routing agent interacts with the environment for multiple steps and performs policy evaluation after collecting the multi-step sample data;
$r_k$ is the global immediate reward of the environment; all regional agents share the same global reward, jointly optimize the global objective through centralized training, and thereby learn a cooperation strategy among agents;
calculating the forward-propagation model loss from the sampled batch data according to the designed loss functions, and updating the model parameters of the proto agent and the routing agent through joint optimization by gradient back-propagation.
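As a concrete illustration of the three losses above, a minimal PyTorch sketch assuming discrete action distributions and the formulas as reconstructed; all tensor names, shapes and detach placements are illustrative assumptions, not the patent's implementation:

```python
# A minimal sketch of the three update losses, under the assumptions above.
import torch
import torch.nn.functional as F
from torch.distributions import Categorical, kl_divergence

def proto_actor_loss(proto_logits, routing_logits):
    # L(theta_i) = KL( pi_theta(.|o_i) || pi_omega(.|o_i, z_i) );
    # the routing policy is a fixed imitation target, hence detach().
    p = Categorical(logits=proto_logits)
    q = Categorical(logits=routing_logits.detach())
    return kl_divergence(p, q).mean()

def routing_actor_loss(new_logp, old_logp, advantage, eps=0.2):
    # PPO-style clipped surrogate with confidence-domain interval eps.
    advantage = advantage.detach()
    ratio = torch.exp(new_logp - old_logp.detach())
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()

def routing_critic_loss(value, rewards, bootstrap_value, gamma=0.99):
    # Multi-step TD target: y = sum_k gamma^(k-t) r_k + gamma^T V(o_T, z_T).
    T = rewards.shape[0]
    discounts = gamma ** torch.arange(T, dtype=rewards.dtype)
    y = (discounts * rewards).sum() + (gamma ** T) * bootstrap_value.detach()
    return F.mse_loss(value, y)
```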
7. The large power grid regional collaborative power flow regulation and control method based on the multi-agent policy gradient model according to claim 6, wherein the Actor-Critic network of the routing agent is updated with a proximal-policy-optimization reinforcement learning algorithm, and the proto agent uses imitation learning to infer the policy behavior of the routing agent online, expressed as:

$$\theta^{j+1} = \theta^{j} - \alpha_\theta \nabla_{\theta} L\left(\theta^{j}\right)$$
$$\omega^{j+1} = \omega^{j} - \alpha_\omega \nabla_{\omega} L\left(\omega^{j}\right)$$
$$\phi^{j+1} = \phi^{j} - \alpha_\phi \nabla_{\phi} L\left(\phi^{j}\right)$$

where $\theta^{j}$ and $\theta^{j+1}$ denote the Actor parameter vectors of the proto agent before and after the $j$-th update respectively; $\omega^{j}$ and $\omega^{j+1}$ denote the Actor parameter vectors of the routing agent before and after the $j$-th update respectively; $\phi^{j}$ and $\phi^{j+1}$ denote the Critic parameter vectors of the routing agent before and after the $j$-th update respectively; $\alpha_\theta$, $\alpha_\omega$ and $\alpha_\phi$ are the corresponding learning rates; $K$ is a hyperparameter indicating that one batch of training samples performs $K$ network parameter updates, $j = 0, 1, \ldots, K-1$.
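A minimal sketch of this update schedule, assuming one sampled batch drives K successive parameter updates and that the loss callables wrap the sketches above; all names and the optimizer layout are assumptions:

```python
# Sketch of the claim-7 update schedule: K PPO-style update epochs per batch,
# with an online imitation step for each proto agent.
def update_agents(batch, proto_opts, actor_opts, critic_opts, losses, K=4):
    for j in range(K):                       # j-th update, j = 0..K-1
        for i in range(len(actor_opts)):     # i-th regional agent pair
            actor_opts[i].zero_grad()
            losses["routing_actor"](batch, i).backward()
            actor_opts[i].step()

            critic_opts[i].zero_grad()
            losses["routing_critic"](batch, i).backward()
            critic_opts[i].step()

            # imitation learning: pull the proto policy toward the
            # (updated) routing policy online
            proto_opts[i].zero_grad()
            losses["proto_kl"](batch, i).backward()
            proto_opts[i].step()
```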
8. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 5, wherein in step 4 a simulated power grid operation environment is constructed on the basis of the discretized power grid operation data set, with pandapower as the power flow calculation back end.
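A minimal sketch of such a pandapower-backed environment, using only standard pandapower calls (networks.case14, runpp, res_line); the case network, action encoding and reward shaping are illustrative assumptions:

```python
# Sketch of a pandapower-backed simulation environment, under the
# assumptions stated above.
import pandapower as pp
import pandapower.networks as pn

class GridEnv:
    def reset(self):
        self.net = pn.case14()          # stand-in for the real grid model
        pp.runpp(self.net)
        return self.net.res_line.loading_percent.to_numpy()

    def step(self, gen_p_mw):
        # apply the joint regulation action: re-dispatch generator
        # set-points (length must match the number of generators)
        self.net.gen.loc[:, "p_mw"] = gen_p_mw
        pp.runpp(self.net)              # one power flow calculation
        loading = self.net.res_line.loading_percent
        reward = -float((loading > 100.0).sum())   # penalize overloads
        done = bool((loading > 150.0).any())       # ending signal
        return loading.to_numpy(), reward, done
```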
9. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 5, wherein in step 4, if the ending signal is true, the current round is ended and the power grid state is reinitialized for interaction; otherwise, the interaction step is repeated from the next state.
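A minimal sketch of this round logic, assuming env and agents expose the reset/step/act interfaces used in the sketches above; the regional partitioning of the observation is omitted:

```python
# Sketch of the claim-9 interaction loop, under the assumptions above.
def run_rounds(env, agents, n_steps):
    obs = env.reset()
    for _ in range(n_steps):
        actions = [agent.act(obs) for agent in agents]
        obs, reward, done = env.step(actions)
        if done:                 # ending signal is true: end the round
            obs = env.reset()    # reinitialize the power grid state
```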
10. The large power grid regional collaborative power flow regulation and control method based on the multi-agent strategy gradient model according to claim 5, wherein in step 6, during distributed regulation and control of the power grid, each regional power grid deploys only its proto agent model; no communication between regional power grids is needed, and the specific regulation action is output with only the local observation characterization as input.
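A minimal sketch of such communication-free regional deployment, assuming each region stores its proto agent's Actor as a serialized PyTorch module; the file path and the greedy action selection are assumptions:

```python
# Sketch of claim-10 distributed deployment: a region maps its local
# observation to a regulation action with no inter-region communication.
import torch

def regional_control_step(actor_path, local_obs):
    actor = torch.load(actor_path)      # this region's proto agent Actor
    actor.eval()
    with torch.no_grad():
        obs = torch.as_tensor(local_obs, dtype=torch.float32)
        action = int(actor(obs).argmax())   # deterministic action
    return action
```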
CN202310159550.7A 2023-02-24 2023-02-24 Large power grid region cooperative power flow regulation and control method based on multi-agent strategy gradient model Pending CN116362377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310159550.7A CN116362377A (en) 2023-02-24 2023-02-24 Large power grid region cooperative power flow regulation and control method based on multi-agent strategy gradient model


Publications (1)

Publication Number Publication Date
CN116362377A true CN116362377A (en) 2023-06-30

Family

ID=86931165


Country Status (1)

Country Link
CN (1) CN116362377A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611194A (en) * 2023-07-17 2023-08-18 合肥工业大学 Circuit superposition scheduling strategy model, method and system based on deep reinforcement learning
CN116611194B (en) * 2023-07-17 2023-09-29 合肥工业大学 Circuit superposition scheduling strategy model, method and system based on deep reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination