CN115986748A

CN115986748A - A data-driven microgrid voltage control method

Info

Publication number: CN115986748A
Application number: CN202211708121.2A
Authority: CN
Inventors: 沈嘉伟; 李培帅; 韩静; 董彦昊; 陈敏强; 王艺涵
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-04-18

Abstract

The invention discloses a data-driven microgrid voltage control method in the technical field of microgrid applications, comprising the following steps: dividing the microgrid, based on a distributed architecture, into a plurality of sub-networks that are interconnected and coupled through power flow, and constructing for each sub-network an agent responsible for its internal control; constructing a joint control model in which the reactive power output of the photovoltaic inverters is the primary control means and the active power output of the energy storage system is the auxiliary means, and constructing a barrel-shaped voltage barrier function model by combining a V-shaped and a U-shaped voltage barrier function model; taking the total network power loss and the voltage constraint based on the voltage barrier function model as the control objectives, balancing voltage deviation against network power loss with a weighted-sum algorithm, and finally establishing a distributed voltage/var control (VVC) model of the microgrid; and formulating the distributed VVC model as a partially observable Markov decision (POMG) model, with the state-action equation modified from handling discrete actions to handling continuous actions so as to meet real-time control requirements.

Description

A data-driven voltage control method for microgrids

Technical Field

The present invention relates to the technical field of microgrid applications, and in particular to a data-driven microgrid voltage control method.

Background Art

A microgrid is a small power generation and distribution system composed of distributed power sources, energy storage devices, energy conversion devices, associated loads, and monitoring and protection devices. It is an autonomous system capable of self-control, protection and management; it can operate either connected to the external grid or in islanded mode, and it is an important part of the smart grid. A microgrid plays a dual role. For the grid, a microgrid acts as a smart load of variable size: it provides the local power system with dispatchable load, can respond within seconds to meet system needs, and can give timely support to the main grid; it allows the system to be maintained without affecting customer loads; it can defer upgrades of the distribution network; and, by following the IEEE 1547.4 standard to guide islanded operation of distributed generation, it can remove technical obstacles arising from certain special operating requirements. For users, a microgrid acts as a customizable power source that can meet diverse needs, for example enhancing local supply reliability, reducing feeder losses, supporting local voltage, improving efficiency through waste-heat utilization, correcting voltage sags, or serving as an uninterruptible power supply. Microgrids not only solve the problem of large-scale integration of distributed generation and bring its advantages into full play, but also offer users many other benefits. Microgrids will fundamentally change the traditional way of coping with load growth, and have great potential for reducing energy consumption and improving the reliability and flexibility of power systems.

To address power management problems in microgrids, such as voltage regulation and the reduction of network power loss, a number of voltage/var control (VVC) methods have been developed. Among the available resources, inverters and energy storage devices have been widely studied and used for microgrid regulation and control because of their growing availability, their flexibility, and the fast response of modern power electronics. Three control frameworks are currently used for microgrids: centralized control, local control, and distributed control. Distributed methods are attracting increasing research attention because they do not require a complex communication network and can accommodate concerns about privacy, complexity, scalability, and global coordination.

A common research approach in distributed control is to convert the non-convex problem into a convex optimization problem through reasonable approximation and simplification, and then solve it with a distributed algorithm. To cope with the uncertainty of distributed renewable generation and load demand, model-based distributed control methods usually need to determine the optimal solution in advance. When a new situation arises, however, a traditional model-based method must solve the optimization problem again, which requires a large amount of computation. In addition, model-based control methods usually require accurate and complete grid parameters, which are often difficult to obtain.

Summary of the Invention

To remedy the deficiencies described in the background art above, the object of the present invention is to provide a data-driven microgrid voltage control method.

The object of the present invention can be achieved by the following technical solution: a data-driven microgrid voltage control method, comprising the following steps:

Based on a distributed architecture, the microgrid is divided into multiple sub-networks that are interconnected and coupled through power flow, and for each sub-network an agent responsible for its internal control is constructed;

A joint control model is constructed in which the reactive power output of the photovoltaic inverters is the primary control means and the active power output of the energy storage system is the auxiliary means. The single agent corresponding to each sub-network controls all photovoltaic inverters and energy storage devices within that sub-network, and the microgrid voltage is effectively controlled by adjusting the inverter reactive power output and the energy storage active power output;

A barrel-shaped voltage barrier function model is constructed by combining the V-shaped and U-shaped voltage barrier function models;

The total network power loss and the voltage constraint based on the voltage barrier function model are taken as the control objectives, and a weighted-sum algorithm is used to trade off voltage deviation against network power loss; a spatiotemporal uncertainty model based on the photovoltaic and load forecast intervals is established; a network power flow constraint model solved by the Newton-Raphson method is established; and a distributed VVC model of the microgrid is established by combining the spatiotemporal uncertainty model, the joint control model, the control objective obtained after the trade-off, and the network power flow constraint model;

The distributed VVC model of the microgrid is formulated as a partially observable Markov decision (POMG) model, and the state-action equation is modified from handling discrete actions to handling continuous actions so as to meet real-time control requirements;

The spatiotemporal uncertainty model based on the photovoltaic and load forecast intervals is converted into a stochastic programming model, and a network solution failure penalty is added to the return of each training episode to raise the network solution success rate;

The multi-agent deep deterministic policy gradient (MADDPG) algorithm is used; the algorithm flow is designed and applied to the microgrid.

Preferably, the distributed architecture does not require a central coordinator to collect all the information generated in the network, but global coordination must still be taken into account. In distributed optimization, the sub-network represented by each agent exchanges only limited boundary physical information and the global reward with its adjacent sub-networks, and the agents collectively seek a globally optimal solution. During each operating period, the trained controller performs VVC within its own sub-network using local measurements.

Preferably, the joint control model, in which the reactive power output of the photovoltaic inverters is primary and the active power output of the energy storage system is auxiliary, is as follows:

The quantities appearing in the model are, in order: the real-time active power output of the photovoltaic unit; the real-time reactive power output of the photovoltaic inverter; the complex power of the photovoltaic inverter; the bound on the photovoltaic active power output at time t; δ, the reactive power capacity factor of the photovoltaic inverter; the minimum and maximum active power that the energy storage system can deliver and absorb; the active power output of the energy storage at time t; the real-time capacity of the energy storage at time t; and the maximum capacity of the energy storage.

The photovoltaic inverters supply reactive power first whenever it is needed; when their reactive power compensation capability is insufficient, the energy storage system acts. The reactive power output of each photovoltaic inverter is limited to a preset proportion of its apparent power capacity; a positive value indicates reactive power injected into the grid, and a negative value indicates reactive power absorbed from the grid. The energy storage system is configured similarly: a positive value indicates active power injected into the grid, a negative value indicates active power absorbed from the grid, and the remaining energy of the energy storage system is always kept positive.
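The limits described above can be illustrated with a minimal Python sketch. It is not the patented model, whose equations are not reproduced here: the function names, the `energy_floor` parameter, and the simple energy bookkeeping (`energy - p * dt_hours`) are assumptions made for illustration only.

```python
def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def limit_pv_reactive(q_request, s_rated, delta):
    """Limit inverter reactive power to a preset fraction of apparent capacity.

    Positive q injects reactive power into the grid, negative q absorbs it.
    delta is the reactive power capacity factor; s_rated is the apparent
    power rating (both hypothetical argument names).
    """
    q_max = delta * s_rated
    return clip(q_request, -q_max, q_max)


def limit_ess_active(p_request, p_min, p_max, energy, energy_max,
                     dt_hours, energy_floor):
    """Limit storage active power so the remaining energy stays positive.

    Positive p injects active power (discharge), negative p absorbs (charge).
    energy_floor is an assumed small positive reserve.
    """
    p = clip(p_request, p_min, p_max)
    max_discharge = (energy - energy_floor) / dt_hours  # cannot drain below floor
    max_charge = (energy_max - energy) / dt_hours       # cannot exceed capacity
    p = clip(p, -max_charge, max_discharge)
    return p, energy - p * dt_hours
```

For example, `limit_pv_reactive(0.9, 1.0, 0.5)` caps the request at half the inverter rating, and `limit_ess_active(2.0, -1.0, 1.0, 0.5, 2.0, 1.0, 0.1)` reduces the discharge so that the stored energy ends at the reserve level.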

Preferably, the barrel-shaped voltage barrier function model is as follows:

where $v_a$ is the real-time voltage magnitude of the node; $v_{ref}$ is the network voltage reference value, taken as 1.00 p.u.; and $l_v(v_a)$ is the real-time reward for the node voltage;

The voltage barrier function model combines the advantages of the V-shaped and U-shaped functions: on the one hand, it has a gentle gradient inside the safe range, which yields better voltage conditions; on the other hand, its steeper gradient outside the safe range guides the policy more quickly.

Preferably, the control objective after trading off voltage deviation against network power loss is:

where $l_v(v_{i,t})$ is the real-time voltage barrier function value of node i at time t; $N$ is the number of network nodes; $N_m$ is the set of network nodes, and a further symbol denotes the set of network branches; $r_{ij}$ and $x_{ij}$ are the resistance and reactance of the branch between nodes i and j; and $v_{i,t}$ and $v_{j,t}$ are the voltage magnitudes of nodes i and j at time t;

The weighted-sum algorithm is then used to convert the multi-objective function into an equivalent single-objective function with weighting factors. The objectives are normalized using the utopia point and the nadir point, and for any sub-network m the weighted sum of the normalized objectives is expressed as:

where the two normalized terms denote the normalized voltage deviation of sub-network m at time t and the normalized network active power loss of sub-network m at time t, and α and β are the normalized weighting coefficients.
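As an illustration of the weighted-sum step, the sketch below normalizes each objective between its utopia (best) and nadir (worst) value and then combines them with α and β. The exact normalization used in the patent is not reproduced in this text, so the common form (f - utopia) / (nadir - utopia) is assumed, and all function and argument names are hypothetical.

```python
def normalize(value, utopia, nadir):
    """Scale an objective to [0, 1] between its utopia and nadir values."""
    span = nadir - utopia
    if span == 0:
        raise ValueError("utopia and nadir values must differ")
    return (value - utopia) / span


def weighted_objective(v_dev, p_loss, v_bounds, loss_bounds, alpha, beta):
    """Weighted sum of the normalized voltage deviation and active power loss.

    v_bounds and loss_bounds are (utopia, nadir) pairs for each objective.
    """
    v_norm = normalize(v_dev, *v_bounds)
    loss_norm = normalize(p_loss, *loss_bounds)
    return alpha * v_norm + beta * loss_norm
```

With a voltage deviation of 0.02 in the range (0.0, 0.1), a loss of 30 in the range (10, 50), α = 0.6 and β = 0.4, the two normalized terms are 0.2 and 0.5, so the combined objective is 0.6 × 0.2 + 0.4 × 0.5 = 0.32.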

Preferably, the process of establishing the spatiotemporal uncertainty model based on the photovoltaic and load forecast intervals is as follows:

Before each operating period, spatial uncertainty scenarios are generated at random within the given forecast intervals by Monte Carlo sampling. Then, within each operating cycle, temporal uncertainty scenarios are generated by Monte Carlo sampling, and time lag is accounted for through the temporal uncertainty interval. In each temporal uncertainty scenario, the voltage deviation and network loss are calculated, and the two objectives are corrected by the scenario occurrence probability. The corrected normalized objective is expressed as:

where the left-hand term is the sum of the normalized values over all scenarios in the stochastic programming model (the average voltage deviation or network loss of sub-network m at time t); the per-scenario term is the sum of the normalized values under scenario u; and $\xi_u$ is the occurrence probability of scenario u.
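The scenario generation and probability correction can be sketched as follows. This is only an illustration: the patent does not state the sampling distribution, so uniform sampling inside each forecast interval is an assumption, and `sample_scenarios` and `expected_objective` are hypothetical names.

```python
import random


def sample_scenarios(lower, upper, n_scenarios, rng):
    """Draw scenarios inside per-bus forecast intervals (uniform, by assumption).

    lower and upper are equal-length lists of interval bounds, one pair per bus.
    """
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(n_scenarios)
    ]


def expected_objective(values, probabilities):
    """Probability-weighted sum of per-scenario normalized objective values.

    The probabilities are rescaled to sum to one before weighting.
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("scenario probabilities must have a positive sum")
    return sum(v * p / total for v, p in zip(values, probabilities))
```

A seeded generator such as `random.Random(0)` makes the sampled scenarios reproducible; each scenario would then be evaluated (for example by a power flow solution) to obtain the per-scenario values passed to `expected_objective`.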

Preferably, the process of modifying the state-action equation from handling discrete actions to handling continuous actions is as follows:

The state-action function, which indicates the overall return of the current state or state-action pair, is as follows:

where $\tau_i$ denotes the history of agent i, and $a_{-i} = \times_{j \neq i} a_j$ is the joint action of the other agents. To accommodate continuous actions, the state-action function is changed to the following form:

where $\pi_i(a'_i \mid \tau_i)$ denotes the Gaussian distribution over $a'_i$; the resulting quantity is obtained by Monte Carlo sampling as:

Preferably, the process of adding the network solution failure penalty to raise the network solution success rate is as follows:

A network solution failure penalty function F is added:

$$F = -f, \quad t_f < T_{\max}$$

where $f$ is the penalty constant, a large positive number; $t_f$ is the length of time for which the network was solved successfully in a training episode that was interrupted by a network solution failure; and $T_{\max}$ is the maximum number of time steps in each training episode;

The reward of each training episode interrupted by a network solution failure is then adjusted to $R_{mf}$:

$$R_{mf} = R_m + F$$

where $R_m$ is the normal reward obtained before the training episode was interrupted.
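The penalty rule can be written out as a short Python sketch. It assumes that an episode counts as interrupted exactly when fewer than `t_max` step rewards were collected; the function and argument names are hypothetical.

```python
def episode_reward(step_rewards, t_max, f_penalty):
    """Return R_m, or R_mf = R_m + F with F = -f for an interrupted episode.

    step_rewards holds the rewards collected while the network was solved
    successfully, so len(step_rewards) plays the role of t_f.
    """
    r_m = sum(step_rewards)
    t_f = len(step_rewards)
    if t_f < t_max:
        return r_m - f_penalty  # F = -f applies only when t_f < T_max
    return r_m
```

For instance, an episode that fails after two steps with rewards of 1.0 each, `t_max = 5` and `f_penalty = 100.0`, ends with a total of -98.0, whereas a complete five-step episode keeps its unpenalized total of 5.0.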

Preferably, a device is provided, comprising:

one or more processors;

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data-driven microgrid voltage control method described above.

Preferably, a storage medium containing computer-executable instructions is provided, wherein the computer-executable instructions, when executed by a computer processor, perform the data-driven microgrid voltage control method described above.

Beneficial effects of the present invention:

The present invention is based on a data-driven approach and a distributed architecture, and combines centralized training with decentralized execution. An effective operating control strategy is obtained by offline training on historical data, so that joint output decisions can be made online in real time according to the actual operating conditions of the network, ensuring reliable, economical and safe network operation.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obviously derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of the overall network framework of the present invention;

FIG. 2 is a schematic diagram of the algorithm flow framework of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

As shown in FIG. 1, a data-driven microgrid voltage control method comprises the following steps:

Step (1): Based on a distributed architecture, the microgrid is divided into multiple sub-networks that are interconnected and coupled through power flow, and for each sub-network an agent responsible for its internal control is constructed.

The traditional centralized VVC framework suffers from communication and computation burdens and from privacy concerns, and traditional local control methods cannot achieve global coordination. A regionally coordinated microgrid VVC framework is therefore constructed, as shown in FIG. 1. Based on a distributed architecture, the microgrid is divided into multiple sub-networks that are interconnected and coupled through power flow, and for each sub-network an agent responsible for its internal control is constructed. The sub-network represented by each agent exchanges only limited boundary physical information and the global reward with its adjacent sub-networks, and the agents collectively seek a globally optimal solution. During each operating period, the trained controller performs VVC within its own sub-network using local measurements.
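To make the partitioning idea concrete, the sketch below identifies, for a given assignment of buses to sub-networks, which sub-networks are neighbors and which buses lie on a boundary. The patent does not specify how the partition is chosen or which boundary quantities are exchanged, so the data layout and the name `boundary_info` are assumptions for illustration.

```python
def boundary_info(branches, area_of_bus):
    """Find neighboring sub-networks and boundary buses for a given partition.

    branches is a list of (bus_i, bus_j) pairs; area_of_bus maps each bus to
    the label of the sub-network (agent) that controls it.
    """
    areas = set(area_of_bus.values())
    neighbors = {area: set() for area in areas}
    boundary = {area: set() for area in areas}
    for bus_i, bus_j in branches:
        area_i, area_j = area_of_bus[bus_i], area_of_bus[bus_j]
        if area_i != area_j:  # this branch ties two sub-networks together
            neighbors[area_i].add(area_j)
            neighbors[area_j].add(area_i)
            boundary[area_i].add(bus_i)
            boundary[area_j].add(bus_j)
    return neighbors, boundary
```

In such a layout each agent would only need measurements from its own boundary buses and those of the sub-networks listed in `neighbors`, which matches the limited exchange of boundary information described above.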

Step (2): A joint control model is constructed in which the reactive power output of the photovoltaic inverters is the primary control means and the active power output of the energy storage system is the auxiliary means. The single agent corresponding to each sub-network controls all photovoltaic inverters and energy storage devices within that sub-network, and the microgrid voltage is effectively controlled by adjusting the inverter reactive power output and the energy storage active power output.

The inverter control model is given by (1)-(3). The reactive power output of each inverter is limited to a preset proportion of its apparent power capacity; a positive value indicates reactive power injected into the grid, and a negative value indicates reactive power absorbed from the grid.

The quantities appearing in (1)-(3) are, in order: the real-time active power output of the photovoltaic unit; the real-time reactive power output of the photovoltaic inverter; the complex power of the photovoltaic inverter; the bound on the photovoltaic active power output at time t; and δ, the reactive power capacity factor of the photovoltaic inverter.

The photovoltaic inverter droop control dispatch model is given by (4) and (5). The energy storage system is configured similarly to the inverter: a positive value indicates active power injected into the grid, a negative value indicates active power absorbed from the grid, and the remaining energy of the energy storage system is always kept positive.

The quantities appearing in (4) and (5) are, in order: the minimum and maximum active power that the energy storage system can deliver and absorb; the active power output of the energy storage at time t; the real-time capacity of the energy storage at time t; and the maximum capacity of the energy storage.

Step (3): A barrel-shaped voltage barrier function model is constructed by combining the V-shaped and U-shaped voltage barrier function models;

In formula (6), the V-shaped voltage barrier function has a large gradient within the range 0.95 p.u. to 1.05 p.u., which can achieve better voltage conditions but does not allow fine adjustment inside the range. In formula (7), the U-shaped voltage barrier function has a small gradient outside the range 0.95 p.u. to 1.05 p.u., so the voltage cannot be driven quickly back into the safe range.

Combining the advantages of both, the barrel-shaped voltage barrier function model is proposed:

In formula (8), $v_a$ is the real-time voltage magnitude of the node; $v_{ref}$ is the network voltage reference value, generally taken as 1.00 p.u.; and $l_v(v_a)$ is the real-time reward for the node voltage.

The model combines the advantages of the V-shaped and U-shaped functions: on the one hand, it has a gentle gradient inside the safe range, which yields better voltage conditions; on the other hand, its steeper gradient outside the safe range guides the policy more quickly.
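The qualitative shape described above (gentle gradient inside the safe band, steep gradient outside it) can be illustrated with a small Python sketch. Formula (8) is not reproduced in this text, so the piecewise form below, quadratic inside the band and linear outside, is purely an assumed stand-in, as are the gain values and the choice to return a non-negative penalty rather than a reward.

```python
def barrel_barrier(v, v_ref=1.0, band=0.05, inner_gain=10.0, outer_slope=20.0):
    """Illustrative barrel-shaped voltage penalty (assumed form, not formula (8)).

    Inside |v - v_ref| <= band the penalty grows quadratically (gentle slope);
    outside the band it grows linearly with a steeper slope. The value is
    continuous at the band edge. Negate it if a reward is wanted.
    """
    deviation = abs(v - v_ref)
    if deviation <= band:
        return inner_gain * deviation ** 2
    edge_value = inner_gain * band ** 2
    return edge_value + outer_slope * (deviation - band)
```

With these assumed gains the slope inside the band never exceeds 2 × 10 × 0.05 = 1.0 per p.u., while the slope outside is 20 per p.u., so a voltage of 1.10 p.u. is penalized far more strongly than one of 1.02 p.u.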

Step (4): The total network power loss and the voltage constraint based on the voltage barrier function model are taken as the control objectives, and a weighted-sum algorithm is used to trade off voltage deviation against network power loss; a spatiotemporal uncertainty model based on the photovoltaic and load forecast intervals is established; a network power flow constraint model solved by the Newton-Raphson method is established; and a distributed VVC model of the microgrid is established by combining the spatiotemporal uncertainty model, the joint control model, the control objective obtained after the trade-off, and the network power flow constraint model;

Two VVC objectives are set for each sub-network, namely the voltage constraint and the minimization of network active power loss, as shown in (9), where (10) is the voltage deviation and (11) is the total power loss of sub-network m at time t.

In formula (10), $l_v(v_{i,t})$ is the real-time voltage barrier function value of node i at time t; $N$ is the number of network nodes; and $N_m$ is the set of network nodes. Formula (11) uses the set of network branches; $r_{ij}$ and $x_{ij}$ are the resistance and reactance of the branch between nodes i and j, and $v_{i,t}$ and $v_{j,t}$ are the voltage magnitudes of nodes i and j at time t.

The classic weighted-sum algorithm is used to convert the multi-objective function into an equivalent single-objective function with weighting factors. The objectives are normalized using the utopia point and the nadir point, and for sub-network m the weighted sum of the normalized objectives is expressed as:

where the two normalized terms denote the normalized voltage deviation of sub-network m at time t and the normalized network active power loss of sub-network m at time t, and α and β are the normalized weighting coefficients.

A spatiotemporal uncertainty model based on the photovoltaic and load forecast intervals is established.

First, a spatial uncertainty model is established based on the forecast intervals of photovoltaic generation and load, and optimization is then carried out at each decision time step. The intervals in the optimized model vary with time and are tied to the real-time measurements at time t.

Considering the locational variation of renewable generation and load, as well as their short-term intermittency and fluctuation, both spatial and temporal uncertainty must be addressed to ensure that the operating constraints are satisfied.

The spatial uncertainty model, based on the forecast intervals of photovoltaic generation and load, restricts the possible realizations of uncertainty at $t_0$ to the forecast intervals:

where the bounds denote the upper and lower spatial-uncertainty limits of the inverter maximum power point and of the active and reactive power demand at bus i.

In the spatial uncertainty model above, communication delay causes a delay in the inverter response. To remove the effect of these delays, a temporal uncertainty model is established and optimized at each decision time step, expressed as:

where the bounds denote the upper and lower temporal-uncertainty limits of the inverter maximum power point and of the bus active and reactive power demand. The intervals of this temporal uncertainty model vary with time and are tied to the real-time measurements at time t.

The network power flow constraint model solved by the Newton-Raphson method is established as shown in formula (18).

where $p_{i,t}$ and $q_{i,t}$ are the active and reactive power injected at node i at time t; $v_{i,t}$ is the voltage magnitude of node i at time t; $g_{ij}$ and $b_{ij}$ are the conductance and susceptance of the branch between nodes i and j; $\theta_{ij,t}$ is the voltage phase angle difference between nodes i and j at time t; and $N$ is the index set of all nodes in the microgrid.
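Formula (18) itself is not reproduced in this text. The symbols listed above match the standard polar-form AC power flow equations, so the sketch below evaluates those standard equations as an assumed stand-in. Here `g` and `b` are taken to be the real and imaginary parts of the bus admittance matrix, which is also an assumption; a Newton-Raphson solver would iterate on the mismatch between these computed injections and the specified ones.

```python
import math


def power_injections(v, theta, g, b):
    """Evaluate the standard polar-form power flow equations at every node.

    p_i = v_i * sum_j v_j * (g_ij * cos(th_ij) + b_ij * sin(th_ij))
    q_i = v_i * sum_j v_j * (g_ij * sin(th_ij) - b_ij * cos(th_ij))
    with th_ij = theta_i - theta_j and g, b the parts of the admittance matrix.
    """
    n = len(v)
    p, q = [], []
    for i in range(n):
        p_i = 0.0
        q_i = 0.0
        for j in range(n):
            th = theta[i] - theta[j]
            p_i += v[i] * v[j] * (g[i][j] * math.cos(th) + b[i][j] * math.sin(th))
            q_i += v[i] * v[j] * (g[i][j] * math.sin(th) - b[i][j] * math.cos(th))
        p.append(p_i)
        q.append(q_i)
    return p, q
```

For a two-node line with series conductance 1 and susceptance -2 (per unit), voltages of 1.0 and 0.9 p.u. and equal angles, the injections sum to an active loss of 0.01 p.u., which equals the conductance times the squared voltage difference, as expected.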

The distributed VVC model of the microgrid is established by combining the spatiotemporal uncertainty model based on the photovoltaic and load forecast intervals, the joint control model in which the inverter reactive power output is primary and the energy storage active power output is auxiliary, the control objective obtained after trading off voltage deviation against network power loss, and the network power flow constraint model, as shown in formula (19).

In formula (19), $M$ is the network partition index; $T$ is the total duration; and $v_{i,t}$ is the real-time voltage.

步骤(5)：将微电网分布式VVC模型拟合为部分可观测马尔科夫决策POMG模型，通过将状态动作方程从适应离散动作改进为适应连续动作以适应实时控制要求；Step (5): Fit the distributed VVC model of the microgrid to a partially observable Markov decision POMG model by improving the state-action equation from adapting to discrete actions to adapting to continuous actions to adapt to real-time control requirements;

首先设置观察空间包含v_i,t、

和

分别表示节点实时电压、节点连接负载的有功消耗、节点连接负载的无功消耗、实时光伏注入电网的有功功率和实时储能剩余能量；连续的动作空间包含

和

分别表示逆变器无功输出功率和储能有功输出功率。First, set the observation space to contain v _i,t ,

and

They represent the real-time node voltage, the active power consumption of the node connected load, the reactive power consumption of the node connected load, the real-time active power injected into the grid by photovoltaic power generation, and the real-time residual energy of energy storage. The continuous action space contains

and

They represent the inverter reactive output power and energy storage active output power respectively.

确定每个行动者的目标是在时间范围T内最大化其预期收益即Determine that the goal of each actor is to maximize its expected benefit within the time frame T, that is,

状态动作方程从适应离散动作改进为适应连续动作以适应实时控制要求。The state action equation is improved from adapting to discrete actions to adapt to continuous actions to meet real-time control requirements.

状态-动作函数用于指示当前状态或状态-动作对的综合返回：The state-action function is used to indicate the current state or the comprehensive return of the state-action pair:

式中，τ_i表示代理i的历史记录；a_-i＝×_j≠ia_j。为了适应连续动作，我们将其更改为以下形式：Where τ _i represents the history of agent i; a _-i = × _{j ≠ i} a _j . To accommodate continuous actions, we change it to the following form:

where $\pi_i(a'_i \mid \tau_i)$ denotes a Gaussian distribution over $a'_i$. In practical applications, this quantity is approximated by Monte Carlo sampling and can be rewritten as:
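The Monte Carlo approximation described here can be sketched in a few lines. This is an illustration under stated assumptions, not the patent's formula: the action is treated as a scalar, the policy is a Gaussian with given mean and standard deviation, and `q_fn`, `mc_state_value`, and `n_samples` are hypothetical names.

```python
import random


def mc_state_value(q_fn, history, other_actions, mean, std, n_samples=64, rng=None):
    """Estimate the expectation of Q over a' ~ N(mean, std^2) by sampling.

    q_fn(history, action, other_actions) is an assumed critic interface;
    the average of its values over the sampled actions is returned.
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_samples):
        action = rng.gauss(mean, std)  # draw a' from the Gaussian policy
        total += q_fn(history, action, other_actions)
    return total / n_samples
```

Increasing `n_samples` reduces the variance of the estimate at the cost of more critic evaluations per step.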

Step (6): Convert the constructed spatiotemporal uncertainty model based on the photovoltaic and load prediction intervals into a stochastic programming model, and add a network solution failure penalty to the return of each training episode to improve the network solution success rate;

Spatial uncertainty scenarios are generated by Monte Carlo sampling from the prediction intervals of Equations (13) and (14). Taking the time lag into account, temporal uncertainty scenarios are generated by Monte Carlo sampling according to Equations (15) and (16), and the voltage deviation and network loss are corrected by the scenario occurrence probabilities. $\xi_u$ is the occurrence probability of scenario $u \in U$, which is normalized:

In Equation (24), $U$ temporal uncertainty scenarios are considered at each decision time step, and $\xi_u$ is the occurrence probability of scenario $u \in U$. The corrected objective is the sum of the normalized values over all scenarios in the stochastic program (the average voltage deviation or network loss of subnetwork $m$ at time $t$), and each term is the normalized value under scenario $u$.
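The scenario correction can be illustrated with a short sketch. It assumes that the correction is a probability-weighted sum of the per-scenario normalized values, with the probabilities first rescaled to sum to one; Equation (24) itself is not reproduced in this text, so the exact form is an assumption, and the function names are hypothetical.

```python
def normalize_probabilities(weights):
    """Rescale non-negative scenario weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("scenario weights must have a positive sum")
    return [w / total for w in weights]


def scenario_corrected_objective(values, weights):
    """Probability-weighted sum of per-scenario normalized objective values."""
    if len(values) != len(weights):
        raise ValueError("one weight is required per scenario value")
    xi = normalize_probabilities(weights)
    return sum(p * v for p, v in zip(xi, values))
```

The same function can be applied separately to the normalized voltage deviation and to the normalized network loss of a subnetwork at a given time step.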

Finally, the policy gradient function is given:

To reduce network solution failures during training that are caused by the limited carrying capacity of the power network, a network solution failure penalty function $F$ is added:

$F = -f, \quad t_f < T_{\max}$ (25)

where $f$ is the penalty constant, a large positive number; $t_f$ is the duration for which the network was solved successfully in a training episode that was interrupted by a network solution failure; and $T_{\max}$ is the maximum number of time steps in each training episode.

The reward of each training episode interrupted by a network solution failure is then adjusted to:

$R_{mf} = R_m + F$ (26)

In Equation (26), $R_m$ is the normal reward obtained before the training episode was interrupted.
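Equations (25) and (26) translate directly into a small reward adjustment. The sketch below assumes that the penalty is zero when the episode runs to its full length, which the text implies but does not state; the function names are hypothetical.

```python
def failure_penalty(t_f, t_max, f):
    """Equation (25): F = -f when the episode stopped early (t_f < t_max).

    Returning 0 for a completed episode is an assumption of this sketch.
    """
    return -f if t_f < t_max else 0.0


def adjusted_episode_reward(r_m, t_f, t_max, f):
    """Equation (26): R_mf = R_m + F."""
    return r_m + failure_penalty(t_f, t_max, f)
```

For example, an episode interrupted at step 300 of 480 with `r_m = -12.5` and `f = 100.0` receives an adjusted reward of `-112.5`, while a completed episode keeps `-12.5`.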

Step (7): Use the multi-agent deep deterministic policy gradient (MADDPG) algorithm, design a suitable algorithm flow, and apply it to the microgrid.

The specific algorithm flow is shown in Figure 2. At initialization, the stored energy of the energy storage system is set to 50% of its total capacity; if the energy storage acts, it adopts the same action value as the photovoltaic unit. For each training episode, the buffer stores PV and load data for 480 time steps (i.e., 1 day). The reward is obtained from power flow calculations performed with the pandapower software package on the buffer data. Before feedback is provided to the agents, the received state is assigned to observations according to the area of each agent. Each agent receives only its local observation and the global reward before making its next decision. This process is repeated until the end of the training episode.
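Two of the bookkeeping steps in this flow, initializing the storage at half capacity and assigning the global state to agents by area, can be sketched as follows. The data layout (dictionaries keyed by node and agent identifiers) and the function names are assumptions made for illustration; the power flow calculation itself is omitted.

```python
def initial_storage_energy(capacities, fraction=0.5):
    """Set each storage unit to a fraction (default 50%) of its total capacity."""
    return {unit: fraction * capacity for unit, capacity in capacities.items()}


def split_observations(global_state, agent_nodes):
    """Give each agent only the observations of the nodes in its own area.

    global_state: mapping node id -> observation of that node
    agent_nodes:  mapping agent id -> list of node ids in the agent's area
    """
    return {
        agent: [global_state[node] for node in nodes]
        for agent, nodes in agent_nodes.items()
    }
```

The global reward would be passed to every agent unchanged alongside its entry in the returned mapping.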

A reward is obtained after the actions of each episode are completed. A training run contains multiple training episodes, and the number of episodes is set manually as needed. The behavior update and the saving of the policy model occur every 40 training episodes, and the target is updated every 120 episodes. Before each policy model update, 10 test runs are performed. The test data are drawn from the full data sample, and the average test result is used to evaluate the effectiveness of the policy.
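The schedule described above (policy update and save every 40 episodes, target update every 120 episodes, 10 tests before each policy update) can be written as a small helper. The event names and the ordering of events within one episode are hypothetical; only the three intervals come from the text.

```python
SAVE_EVERY = 40            # policy update and model save interval (episodes)
TARGET_UPDATE_EVERY = 120  # target update interval (episodes)
TESTS_PER_UPDATE = 10      # test runs performed before each policy update


def scheduled_events(episode):
    """Return the events due after finishing the 1-based episode number."""
    events = []
    if episode % SAVE_EVERY == 0:
        events.append("run_tests")               # TESTS_PER_UPDATE test runs
        events.append("update_and_save_policy")
    if episode % TARGET_UPDATE_EVERY == 0:
        events.append("update_target")
    return events


def mean_test_score(scores):
    """Average of the test results, used to evaluate the policy."""
    if not scores:
        raise ValueError("at least one test score is required")
    return sum(scores) / len(scores)
```

Because 120 is a multiple of 40, every target update in this sketch coincides with a policy update and save.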

Based on the same inventive concept, the present invention also provides a computer device comprising one or more processors and a memory for storing one or more computer programs; the programs include program instructions, and the processor executes the program instructions stored in the memory. The processor may be a central processing unit (CPU) or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. It is the computing and control core of the terminal and is used to load and execute one or more instructions in a computer storage medium so as to implement the above method.

Based on the same inventive concept, the present invention further provides a computer storage medium on which a computer program is stored; the computer program performs the above method when run by a processor. The storage medium may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device.

In this specification, a description that refers to the terms "one embodiment", "example", "specific example", and the like means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present disclosure. Such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

The above shows and describes the basic principles, main features, and advantages of the present disclosure. Those skilled in the art should understand that the present disclosure is not limited by the above embodiments; the embodiments and the description only illustrate the principles of the present disclosure. Various changes and improvements may be made without departing from the spirit and scope of the present disclosure, and these changes and improvements fall within the scope of protection claimed.

Claims

1. A data-driven microgrid voltage control method, characterized in that the method comprises the following steps:

Based on the distributed architecture, the microgrid is divided into multiple sub-networks that are interconnected and coupled through power flow, and an intelligent agent for the internal control of each sub-network is constructed;

Construct a joint control model with the reactive output of photovoltaic inverter as the main function and the active output of energy storage system as the auxiliary function. A single intelligent agent corresponding to each subnetwork controls all photovoltaic inverters and energy storage devices in the subnetwork, and effectively controls the voltage of the microgrid by controlling the reactive output of photovoltaic inverter and the active output of energy storage system.

Combining V-type and U-type voltage barrier function models, a barrel-type voltage barrier function model is constructed;

Taking the total network power loss and the voltage constraint based on the voltage barrier function model as control objectives, the voltage deviation and network power loss are weighed using the weighted-sum algorithm; a spatiotemporal uncertainty model based on the photovoltaic and load prediction intervals is established; a network power flow constraint model based on the Newton-Raphson method is established; and the spatiotemporal uncertainty model, the joint control model with the photovoltaic inverter reactive output as primary and the energy storage active output as auxiliary, the weighted control objective, and the network power flow constraint model are integrated to establish a distributed VVC model of the microgrid.

The distributed VVC model of the microgrid is fitted into a partially observable Markov decision POMG model, and the state-action equation is improved from adapting to discrete actions to adapting to continuous actions to meet the real-time control requirements;

The constructed spatiotemporal uncertainty model based on the photovoltaic and load prediction intervals is transformed into a stochastic programming model, and a network solution failure penalty is added to the return of each training episode to improve the network solution success rate;

Using the multi-agent deep deterministic policy gradient MADDPG algorithm, the algorithm process is designed and applied to microgrids.

2. A data-driven microgrid voltage control method according to claim 1, characterized in that the distributed architecture does not require a central coordinator to collect all the information generated in the network, but does need to consider global coordination; in distributed optimization, the subnetwork represented by each constructed intelligent agent exchanges only limited boundary physical information and global reward returns with its adjacent subnetworks, the subnetworks collectively seek a global optimal solution, and during each operation the trained controller implements VVC in the corresponding subnetwork based on local measurements.

3. A data-driven microgrid voltage control method according to claim 1, characterized in that the joint control model with the photovoltaic inverter as the main reactive output and the energy storage system as the auxiliary active output is as follows:

In the model, the defined quantities are, in order: the real-time active power output of the photovoltaic unit; the real-time reactive power output of the photovoltaic inverter; the complex power of the PV inverter; the active power output of the energy storage at time t; the real-time capacity of the energy storage at time t; and the maximum capacity of the energy storage;

Photovoltaic inverters give priority to providing reactive power when needed. When the reactive power compensation capacity is insufficient, the energy storage system will take action. The reactive power output of each photovoltaic inverter is also limited to a preset proportion of its apparent power capacity. A positive value indicates the injection of reactive power into the grid, and a negative value indicates the absorption of grid reactive power. The setting of the energy storage system is similar to that of the inverter. A positive value indicates the injection of active power into the grid, and a negative value indicates the absorption of grid active power. The remaining power of the energy storage system is always positive.

4. A data-driven microgrid voltage control method according to claim 1, characterized in that the barrel-type voltage barrier function model is as follows:

In the formula, $v_a$ is the real-time voltage of the node; $v_{ref}$ is the network voltage reference value, which is 1.00 pu; $l_v(v_a)$ is the real-time reward of the node voltage;

The voltage barrier function model combines the advantages of the V-type and the U-type: on the one hand, it has a slow gradient within the safety range to obtain better voltage conditions; on the other hand, the larger gradient outside the safety range ensures faster strategy guidance.

5. A data-driven microgrid voltage control method according to claim 1, characterized in that the control target after weighing the voltage deviation and network power loss is:

Where $l_v(v_{i,t})$ is the real-time voltage barrier function value of node $i$ at time $t$; $N$ is the number of network nodes; $N_m$ is the set of network nodes;

Then, the weighted sum algorithm is used to transform the multi-objective function into an equivalent single-objective function with weighted factors. The objectives are normalized using Utopia points and Nadir points. For any subnetwork m, the weighted sum of the normalized objectives is expressed as:

In the formula,

represents the normalized voltage deviation of the m network at time t,

6. A data-driven microgrid voltage control method according to claim 1, characterized in that the process of establishing a spatiotemporal uncertainty model based on photovoltaic and load prediction intervals is as follows:

Before each operation period, a spatial uncertainty scenario is randomly generated within a given prediction interval through Monte Carlo sampling. Secondly, in each operation cycle, a temporal uncertainty scenario is generated through Monte Carlo sampling, and the time lag is considered through the temporal uncertainty interval. Under each temporal uncertainty scenario, the voltage deviation and network loss are calculated, and the two targets are corrected by the scenario occurrence probability. The corrected normalized target is expressed as:

represents the normalized average voltage deviation/network power loss under scenario u;

represents the normalized average voltage deviation/network power loss of network $m$ at time $t$, that is, the sum of the normalized values over all scenarios in the stochastic program; $\xi_u$ represents the probability of occurrence of scenario $u$.

7. A data-driven microgrid voltage control method according to claim 1, characterized in that the process of improving the state action equation from adapting to discrete actions to adapting to continuous actions is as follows:

The state-action function is as follows, which is used to indicate the current state or the comprehensive return of the state-action pair:

Where $\tau_i$ represents the history of agent $i$; $a_{-i} = \times_{j \neq i} a_j$. In order to adapt to continuous actions, the state-action function is changed to the following form:

Where $\pi_i(a'_i \mid \tau_i)$ denotes a Gaussian distribution over $a'_i$; the resulting expression is approximated by Monte Carlo sampling as:

8. A data-driven microgrid voltage control method according to claim 1, characterized in that the process of adding a network solution failure penalty to improve the network solution success rate is as follows:

Add network solution failure penalty function F:

$F = -f, \quad t_f < T_{\max}$

Where $f$ is the penalty constant, which is a large positive number; $t_f$ is the duration for which the network was solved successfully in a training episode that was interrupted by a network solution failure; $T_{\max}$ is the maximum number of time steps in each training episode;

The reward of each training episode interrupted by a network solution failure is then adjusted to $R_{mf}$:

$R_{mf} = R_m + F$

Where $R_m$ is the normal reward obtained before the training episode was interrupted.

9. A device, comprising:

one or more processors;

A memory for storing one or more programs;

When one or more of the programs are executed by one or more of the processors, one or more of the processors implement a data-driven microgrid voltage control method as described in any one of claims 1-8.

10. A storage medium comprising computer executable instructions, wherein the computer executable instructions are used to execute a data-driven microgrid voltage control method as described in any one of claims 1 to 8 when executed by a computer processor.