CN117613983A - Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning

Info

Publication number: CN117613983A
Application number: CN202410090946.5A
Authority: CN
Inventors: 那琼澜; 李信; 邢宁哲; 杨艺西; 马跃; 彭柏; 邢海瀛; 苏丹; 娄竞; 邬小波; 陈重韬; 王艺霏; 张海明; 张实君; 周子阔; 李宇鹏
Original assignee: State Grid Corp of China SGCC; State Grid Jibei Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Jibei Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority date: 2024-01-23
Filing date: 2024-01-23
Publication date: 2024-02-27
Anticipated expiration: 2044-01-23
Also published as: CN117613983B

Abstract

This specification relates to the technical field of electric power, and in particular to an energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning, applied to a power grid and user-side photovoltaic energy storage equipment. The method comprises: determining a state space from the user electric power, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the unit price of grid electricity and the unit price of electricity sold to the grid in any time period; and inputting the state space into a charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables, which comprise the optimal charge and discharge power of the energy storage battery and an optimal coefficient. The decision model is obtained by training with a sample state space and a photovoltaic power generation uncertainty model. By fusing predefined rules, the method and device speed up the convergence of reinforcement learning training to the optimal charge and discharge control strategy, improve the economy of electricity use, and reduce power fluctuation and burden on the grid.

Description

Energy storage charge and discharge control decision-making method and device based on fusion rule reinforcement learning

Technical field

This specification relates to the field of electric power technology, and in particular to an energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning.

Background

As the total electricity consumption and the consumption share of the tertiary industry and of urban and rural residential users continue to rise, peak grid load demand keeps increasing, and the peak-valley electricity price gap has widened further and remains high. To reduce grid load fluctuations, cut grid expansion costs and improve the economic efficiency of the power system, user-side energy storage technology has attracted wide attention and has considerable development potential.

In the field of user-side energy storage, user-side photovoltaic energy storage combines photovoltaic power generation, an energy storage battery and grid power supply, and efficient coordination among these power devices remains to be solved. Existing research focuses mainly on the optimal configuration of user-side energy storage facilities and on energy storage planning methods. One approach calculates the first power and first capacity that the energy storage system must be configured with for the charging period and the second power and second capacity for the discharging period, takes the larger of the first and second power as the configured power, and takes the smaller of the first and second capacity as the configured capacity, which improves the return from peak shaving and valley filling.

The prior art also includes user energy storage planning methods that build a user-side energy storage planning model and solve for the optimal planning model parameters by minimizing a comprehensive objective function of initial consumption, operation and maintenance consumption, electricity consumption and basic consumption. Such a model is simple, cannot adjust adaptively, and does not consider photovoltaic power generation. In addition, photovoltaic power generation is affected by uncertain factors such as dust, temperature and shading, so its output power is uncertain. User-side energy storage operation control that accounts for the uncertainty of photovoltaic power generation therefore needs further optimization.

Summary of the invention

To address the prior-art problem that user-side energy storage does not account for the uncertainty of photovoltaic power generation, embodiments of this specification provide a training method for a charge and discharge control decision model based on fusion rule reinforcement learning.

An embodiment of this specification provides an energy storage charge and discharge control decision method based on fusion rule reinforcement learning. The method is applied to a power grid and user-side photovoltaic energy storage equipment, where the user-side photovoltaic energy storage equipment comprises a photovoltaic power generation device, a user power consumption device and an energy storage battery. The method comprises: determining a state space from the user electric power, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the unit price of grid electricity and the unit price of electricity sold to the grid in any time period; and inputting the state space into a charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables for that time period. The optimal charge and discharge decision variables comprise the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the decision model is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.

According to one aspect of the embodiments of this specification, the charge and discharge control decision model based on fusion rule reinforcement learning is trained through the following steps: determining a sample state space from the state variables and environmental variables related to the user power consumption device, the energy storage battery and the power grid in a sample time period; constructing a basic charge and discharge control decision model from the photovoltaic power generation uncertainty model, where the photovoltaic power generation uncertainty model is a Gaussian distribution model that takes environmental variables into account; inputting the state variables in the sample state space into the basic model, which outputs charge and discharge decision variables; and constraining the charge and discharge decision variables with preset charge and discharge rules while iteratively training the basic model with a reward function, to obtain the trained charge and discharge control decision model.

According to one aspect of the embodiments of this specification, the state variable of the energy storage battery in the sample time period is determined as follows: obtain the charge power of the energy storage battery in the historical sample periods preceding the sample time period; then, from that charge power, determine the state of charge of the energy storage battery in the sample time period through the following formula:

[Formula not reproduced in this text.]

Here k denotes the current sample time period. The remaining quantities in the formula are: the state of charge of the energy storage battery in the first sample time period (the initial moment); the self-discharge rate of the energy storage battery; its open-circuit voltage; its rated capacity; its charge and discharge efficiency; the charge and discharge power of the energy storage battery in the i-th sample time period, which satisfies one condition when the battery discharges and another when it charges; and the efficiency index of the i-th time period, which satisfies a condition given with the formula. The symbols and conditions themselves are not reproduced in this text.
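Because the formula itself is missing from this text, the following Python sketch is only a generic state-of-charge bookkeeping routine built from the quantities listed above, not the expression used in the specification. It assumes that positive power charges the battery and negative power discharges it, that the efficiency index is +1 when charging and -1 when discharging, that self-discharge is applied once per period, and that the nominal energy is the open-circuit voltage times the rated capacity. The function name, parameter names and default values are all hypothetical.

```python
def soc_after(soc_initial, powers_w, *, self_discharge=0.0, v_oc=48.0,
              capacity_ah=100.0, efficiency=0.95, dt_h=1.0):
    """Roll the state of charge forward over per-period battery powers (W).

    Assumed convention (not confirmed by the source): power > 0 charges,
    power < 0 discharges. The efficiency exponent is +1 when charging and
    -1 when discharging, so losses always reduce the stored energy.
    """
    energy_capacity_wh = v_oc * capacity_ah  # nominal energy in Wh
    soc = soc_initial
    for p in powers_w:
        exponent = 1 if p >= 0 else -1
        delta = (efficiency ** exponent) * p * dt_h / energy_capacity_wh
        soc = soc * (1.0 - self_discharge) + delta
    return soc
```

With these assumptions, charging at 480 W for one hour with ideal efficiency raises a 4800 Wh battery from 0.5 to 0.6, while discharging at the same power with 80 % efficiency draws 0.125 of the capacity instead of 0.1.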

According to one aspect of the embodiments of this specification, constructing the basic charge and discharge control decision model of the energy storage device from the photovoltaic power generation uncertainty model comprises: determining the photovoltaic power in the sample time period from the photovoltaic power generation uncertainty model, which obeys a Gaussian distribution; and determining the power that the photovoltaic power generation device outputs to the grid in the sample time period from the difference between the photovoltaic power and the user electric power in that period.

According to one aspect of the embodiments of this specification, the method comprises determining the photovoltaic power generation uncertainty model through a formula (not reproduced in this text) in which the photovoltaic power in the sample time period obeys a Gaussian distribution with a stated mean and variance. The quantities involved are the photovoltaic power in the sample time period, the photoelectric conversion efficiency, the photovoltaic material area, the solar irradiance in the sample time period, and the variance. From the photovoltaic power given by the uncertainty model, the power that the photovoltaic power generation device outputs to the grid is determined through a second formula (also not reproduced), whose quantities are the power output by the photovoltaic power generation device to the grid in the sample time period, the user electric power Pload, the photovoltaic power, and a proportional weight with a stated range.
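The two formulas are missing from this text, so the sketch below only illustrates one plausible reading of the quantities listed: a Gaussian draw whose mean is assumed to be efficiency times area times irradiance, and a grid output assumed to be the proportional weight times the difference between photovoltaic power and load. Both forms, along with the function names and default values, are assumptions rather than the specification's own expressions.

```python
import random


def sample_pv_power(irradiance_w_m2, *, efficiency=0.18, area_m2=20.0,
                    sigma_w=50.0, rng=random):
    """Draw one photovoltaic power sample (W) from a Gaussian.

    Assumed mean: efficiency * area * irradiance. sigma_w is the standard
    deviation, i.e. the square root of the variance named in the text.
    """
    mean_w = efficiency * area_m2 * irradiance_w_m2
    return rng.gauss(mean_w, sigma_w)


def grid_export_power(pv_w, load_w, weight):
    """Assumed form: proportional weight times (PV power - load power)."""
    return weight * (pv_w - load_w)
```

Passing a seeded `random.Random` instance as `rng` makes the draws repeatable. A plain Gaussian can return a small negative power at low irradiance; whether and how to truncate it is left open here because the source does not say.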

According to one aspect of the embodiments of this specification, the preset charge and discharge rules are determined as follows: if the current period falls within the peak electricity consumption period, the charge and discharge power of the energy storage battery is set to be negative; if the current period falls within the off-peak (valley) electricity consumption period, the charge and discharge power of the energy storage battery is set to be positive;

if the photovoltaic power is less than the load power, the proportional weight is set to zero; if the photovoltaic power is greater than the load power, the proportional weight keeps its original value. The decision variables in the charge and discharge control decision model are constrained according to these preset charge and discharge rules.
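The rules above can be sketched as a small correction step applied to the raw decision variables. The text only says the power is "set to be negative" or "set to be positive"; keeping the magnitude and flipping the sign, as done here, is an assumption, as are the function and parameter names. The case where photovoltaic power equals load power is not covered by the text and is left unchanged in this sketch.

```python
def apply_rules(battery_power, weight, *, is_peak, is_valley,
                pv_power, load_power):
    """Correct raw decision variables with the preset rules.

    Negative battery power means discharging and positive means charging,
    matching the sign convention of the rules.
    """
    if is_peak:
        battery_power = -abs(battery_power)  # peak period: force discharge
    elif is_valley:
        battery_power = abs(battery_power)   # valley period: force charge
    if pv_power < load_power:
        weight = 0.0                         # no surplus to send to the grid
    return battery_power, weight
```

For example, a raw action of (3.0, 0.4) during a peak period with photovoltaic power below load becomes (-3.0, 0.0).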

According to one aspect of the embodiments of this specification, iteratively training the basic charge and discharge control decision model with the reward function comprises: determining an economy reward function from the unit price of electricity sold, the unit price of grid electricity and the power drawn from the grid in each period; constructing a smoothness reward function from the charge and discharge power of the energy storage battery at the current moment and at the previous moment; determining a maximum demand power reward function from the power drawn from the grid in each period; determining a constraint penalty reward function from the minimum state of charge, the maximum state of charge and the current state of charge of the energy storage battery; and iteratively training the charge and discharge control decision model with the economy, smoothness, maximum demand power and constraint penalty reward functions. When the reward reaches its maximum and remains stable over a preset number of iterations, training of the charge and discharge control decision model is determined to be complete.
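The text names the four reward terms and their inputs but not their functional forms. The sketch below combines them in one hypothetical way, purely to show how the inputs could fit together: the per-term formulas, the weights `w_smooth`, `w_peak` and `w_soc`, and the convention that positive grid power means import are all assumptions.

```python
def step_reward(grid_power_kw, buy_price, sell_price, battery_kw,
                prev_battery_kw, peak_grid_kw_so_far, soc, *,
                soc_min=0.1, soc_max=0.9, dt_h=1.0,
                w_smooth=0.01, w_peak=0.1, w_soc=10.0):
    """Sum of four illustrative reward terms for one time period."""
    # Economy: pay buy_price when importing, earn sell_price when exporting.
    if grid_power_kw >= 0:
        economy = -buy_price * grid_power_kw * dt_h
    else:
        economy = sell_price * (-grid_power_kw) * dt_h
    # Smoothness: penalise jumps in battery power between adjacent periods.
    smoothness = -w_smooth * abs(battery_kw - prev_battery_kw)
    # Maximum demand: penalise only the amount that raises the running peak.
    peak = -w_peak * max(0.0, grid_power_kw - peak_grid_kw_so_far)
    # Constraint penalty: distance of the state of charge outside its bounds.
    violation = max(0.0, soc_min - soc) + max(0.0, soc - soc_max)
    penalty = -w_soc * violation
    return economy + smoothness + peak + penalty
```

With the defaults, importing 2 kW for one hour at a price of 0.5 with no other violations yields a reward of -1.0, and the same step with the state of charge above `soc_max` yields a strictly lower reward.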

Embodiments of this specification also provide a photovoltaic energy storage system based on fusion rule reinforcement learning. The system comprises: a power grid, communicatively connected to the user-side photovoltaic energy storage equipment and used to supply power to it; and the user-side photovoltaic energy storage equipment, comprising a photovoltaic power generation device, a user power consumption device and an energy storage battery. The photovoltaic power generation device supplies photovoltaic electricity to the user power consumption device and, when surplus electricity remains, delivers the surplus to the energy storage battery or the grid. The energy storage battery stores the surplus electricity of the photovoltaic power generation device and supplies power to the user power consumption device.

Embodiments of this specification also provide a training device for a charge and discharge control decision model based on fusion rule reinforcement learning. The device comprises: a sample state space construction unit, used to determine a sample state space from the state variables and environmental variables related to the user power consumption device, the energy storage battery and the power grid in a sample time period; a basic model construction unit, used to construct a basic charge and discharge control decision model of the energy storage device from the photovoltaic power generation uncertainty model, where the photovoltaic power generation uncertainty model is a Gaussian distribution model that takes environmental variables into account; an output unit, used to input the state variables in the sample state space into the basic model and output charge and discharge decision variables; and a model training unit, used to iteratively train the basic model according to the preset charge and discharge rules until the basic model meets a preset condition, yielding the trained charge and discharge control decision model.

Embodiments of this specification provide a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the training method for the charge and discharge control decision model based on fusion rule reinforcement learning.

Embodiments of this specification also provide a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the training method for the charge and discharge control decision model based on fusion rule reinforcement learning.

This specification fuses predefined rules to speed up the convergence of reinforcement learning training to the optimal charge and discharge control strategy and to reduce power fluctuation and burden on the grid.

Description of drawings

To explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of this specification; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a flow chart of a charge and discharge control decision method based on fusion rule reinforcement learning according to an embodiment of this specification;

Figure 2 is a flow chart of a method for training a charge and discharge control decision model based on fusion rule reinforcement learning according to an embodiment of this specification;

Figure 3 is a flow chart of a method for constructing a basic charge and discharge control decision model according to an embodiment of this specification;

Figure 4 is a flow chart of a method for determining the state variables of the energy storage battery in a sample time period according to an embodiment of this specification;

Figure 5 is a flow chart of a method for determining the preset charge and discharge rules according to an embodiment of this specification;

Figure 6 is a flow chart of a method for iteratively training the basic charge and discharge control decision model with a reward function according to an embodiment of this specification;

Figure 7 is a schematic diagram of a photovoltaic energy storage system based on fusion rule reinforcement learning according to an embodiment of this specification;

Figure 8 is a schematic structural diagram of a training device for a charge and discharge control decision model based on fusion rule reinforcement learning according to an embodiment of this specification;

Figure 9 is a schematic structural diagram of a computer device according to an embodiment of this specification.

Explanation of drawing symbols:

701. Power grid;

702. User-side photovoltaic energy storage equipment;

703. Photovoltaic power generation device;

704. User power consumption device;

705. Energy storage battery;

801. State space construction unit;

802. Optimal charge and discharge decision variable determination unit;

902. Computer device;

904. Processor;

906. Memory;

908. Driving mechanism;

910. Input/output module;

912. Input device;

914. Output device;

916. Presentation device;

918. Graphical user interface;

920. Network interface;

922. Communication link;

924. Communication bus.

Detailed description

To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this specification without creative effort fall within the scope of protection of this specification.

It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be practiced in orders other than those illustrated or described. In addition, the terms "comprising" and "having" and any variations of them are intended to cover non-exclusive inclusion: for example, a process, method, device, product or equipment that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or equipment.

This specification provides the method operation steps as described in the embodiments or flow charts, but more or fewer operation steps may be included on the basis of routine or non-inventive work. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When an actual system or device product executes the method, the steps may be executed sequentially or in parallel according to the embodiments or the drawings.

It should be noted that the energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning described here can be used in the field of electric power technology; this specification does not limit their field of application.

Figure 1 is a flow chart of a charge and discharge control decision method based on fusion rule reinforcement learning according to an embodiment of this specification, which comprises the following steps:

Step 101: determine the state space from the user electric power, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the unit price of grid electricity and the unit price of electricity sold to the grid in any time period.

This step belongs to the model application stage. As when constructing the sample state space in the model training stage, the user electric power, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the unit price of grid electricity and the unit price of electricity sold to the grid are taken for any unknown time period to determine the state space.

Step 102: input the state space into the basic charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables for that time period. The optimal charge and discharge decision variables comprise the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model based on fusion rule reinforcement learning is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.

In this step, the state space is used as the input of the optimal policy network, which outputs the optimal charge and discharge action of the user-side energy storage battery for that time period. The training process of the charge and discharge control decision model based on fusion rule reinforcement learning is described with reference to Figure 2.

Figure 2 is a flow chart of a method for training a charge and discharge control decision model based on fusion rule reinforcement learning according to an embodiment of this specification, which comprises the following steps:

Step 201: determine the sample state space from the state variables and environmental variables related to the user power consumption device, the energy storage battery and the power grid in the sample time period. In this step, the state variable related to the user power consumption device is the user load power in the sample time period; the state variable related to the energy storage battery is its state of charge; and the state variables related to the power grid are the unit price of grid electricity and the unit price of electricity sold in the sample time period. The environmental variables are the outdoor air temperature and the solar irradiance in the sample time period. The original denotes each of these quantities by its own symbol; the symbols are not reproduced in this text.

The state variables and environmental variables of the same sample time period are taken together as the sample state space, which is subsequently used as the input of the basic charge and discharge control decision model.
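As a minimal sketch, the six quantities of one sample time period can be held in a small record and flattened into the input vector of the basic model. The class name, field names and ordering are illustrative assumptions; the source lists the quantities but does not prescribe a data layout or units.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class State:
    """State and environmental variables for one sample time period."""
    load_power: float     # user load power
    soc: float            # state of charge of the energy storage battery
    outdoor_temp: float   # outdoor air temperature
    irradiance: float     # solar irradiance
    buy_price: float      # unit price of grid electricity
    sell_price: float     # unit price of electricity sold to the grid

    def as_vector(self):
        """Flatten the fields, in declaration order, into a list of floats."""
        return list(astuple(self))
```

A call such as `State(1.2, 0.5, 25.0, 800.0, 0.6, 0.3).as_vector()` returns the six values in declaration order, ready to be fed to a policy network.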

Step 202: construct the basic charge and discharge control decision model from the photovoltaic power generation uncertainty model, where the photovoltaic power generation uncertainty model is a Gaussian distribution model that takes environmental variables into account. In this step, the basic charge and discharge control decision model of the user-side photovoltaic energy storage equipment is built by constructing the sample state space and the action space and combining them with the photovoltaic power generation uncertainty model. The basic model is a policy network built from a fully connected neural network, and it is obtained by randomly initializing the parameters of the policy network.

The action space of the embodiments of this specification is expressed by a formula (not reproduced in this text) whose bounds are the maximum charging power and the maximum discharging power of the energy storage battery.
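A randomly initialized fully connected policy whose outputs are squashed into a bounded action space can be sketched as follows, using only the standard library. The layer sizes, the tanh and sigmoid squashing, the [0, 1] range of the proportional weight, and all names are assumptions; the source states only that the network is fully connected, randomly initialized, and bounded by the maximum charging and discharging powers.

```python
import math
import random


def init_policy(n_in=6, n_hidden=16, n_out=2, seed=0):
    """Randomly initialise a two-layer fully connected network."""
    rng = random.Random(seed)

    def layer(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        weights = [[rng.uniform(-scale, scale) for _ in range(cols)]
                   for _ in range(rows)]
        return {"W": weights, "b": [0.0] * rows}

    return [layer(n_hidden, n_in), layer(n_out, n_hidden)]


def _affine(layer, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(layer["W"], layer["b"])]


def policy_action(params, state, *, p_charge_max=5.0, p_discharge_max=5.0):
    """Map a state vector to (battery power, proportional weight)."""
    hidden = [math.tanh(v) for v in _affine(params[0], state)]
    raw_power, raw_weight = _affine(params[1], hidden)
    squashed = math.tanh(raw_power)  # in [-1, 1]
    # Positive values scale to the charging limit, negative to the discharging limit.
    limit = p_charge_max if squashed >= 0 else p_discharge_max
    power = squashed * limit
    weight = 1.0 / (1.0 + math.exp(-raw_weight))  # in (0, 1)
    return power, weight
```

In practice the inputs would normally be normalized before being fed to the network, since raw irradiance values are far larger than prices or the state of charge; that preprocessing is omitted here.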

步骤203，将样本状态空间中的状态变量输入至所述充放电控制决策基础模型，输出充放电决策变量。决策变量为储能电池的充放电功率及比例权重。其中，比例权重决定光伏输出到电网的功率，目的在于将光伏发电多余的电量卖入电网赚取经济利润。Step 203: Input the state variables in the sample state space into the charge and discharge control decision basic model, and output the charge and discharge decision variables. The decision variables are the charge and discharge power and proportional weight of the energy storage battery. Among them, the proportional weight determines the power of photovoltaic output to the grid, with the purpose of selling excess photovoltaic power into the grid to earn economic profits.

步骤204，根据预设充放电规则约束所述充放电决策变量，利用奖励函数迭代训练所述充放电控制决策基础模型，得到训练完成的充放电控制决策模型。本步骤中，采用融合规则的强化学习方法，迭代训练所述充放电控制决策基础模型，直到获得使得奖励最大的模型得到充放电控制决策模型。Step 204: Constrain the charging and discharging decision variables according to the preset charging and discharging rules, use the reward function to iteratively train the charging and discharging control decision basic model, and obtain the trained charging and discharging control decision model. In this step, the reinforcement learning method of fusion rules is used to iteratively train the charge and discharge control decision-making basic model until the model that maximizes the reward is obtained to obtain the charge and discharge control decision-making model.

In the embodiments of this specification, the state variables of the charge and discharge control decision model are used as the input of the policy network, the network outputs the action variables of the charge and discharge control decision model, and predefined rules are then used to correct the action variables.

Further, the basic charge and discharge control decision model is trained with the reward function. When the value of the reward function reaches its maximum and remains stable over a certain training time or number of iterations, the policy network corresponding to the maximum reward value is taken as the charge and discharge control decision model, and model training is complete.

This specification fuses predefined rules into reinforcement learning, which speeds up the convergence of training to the optimal charge and discharge control strategy, improves the economy of electricity use, and reduces power fluctuations and the burden on the power grid.

Figure 3 is a flow chart of a method for constructing the basic charge and discharge control decision model according to an embodiment of this specification, which specifically includes the following steps:

Step 301: Determine the photovoltaic power of the sample time period based on the photovoltaic power generation uncertainty model, which follows a Gaussian distribution. In the embodiments of this specification, the uncertainty model determines the photovoltaic power of the k-th sample time period while taking environmental factors into account, including ambient temperature and solar irradiance. The photovoltaic power of the k-th sample time period follows a Gaussian distribution whose mean and variance are defined in terms of the photoelectric conversion efficiency, the area of the photovoltaic material, the solar irradiance of the sample time period, and a variance parameter. By considering the uncertainty that environmental factors introduce into photovoltaic generation, this specification coordinates the photovoltaic power generation device, the energy storage battery and the power grid, and reduces both the user's economic cost and the supply load of the grid.
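The Gaussian model of step 301 can be sketched as a sampling function. The formula for the mean did not survive in this text, so the sketch assumes the mean is the product of conversion efficiency, panel area and irradiance; the clipping at zero and all parameter names are likewise assumptions made for illustration.

```python
import random


def sample_pv_power(efficiency, area_m2, irradiance_w_m2, sigma_w, rng=random):
    """Draw one photovoltaic power sample (in watts) for a time period.

    Assumed mean: efficiency * area * irradiance.
    sigma_w is the standard deviation (square root of the variance).
    """
    mean_w = efficiency * area_m2 * irradiance_w_m2
    # Clip at zero because generated power cannot be negative (assumption).
    return max(0.0, rng.gauss(mean_w, sigma_w))
```

Passing a seeded `random.Random` instance as `rng` makes training episodes reproducible.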

Step 302: Determine the power that the photovoltaic power generation device outputs to the grid in the sample time period from the decision variables output by the basic charge and discharge control decision model and the difference between the photovoltaic power and the user's power consumption in that period.

In this step, the decision variables output by the basic charge and discharge control decision model include the proportional weight and the charge and discharge power of the energy storage battery. The photovoltaic power of the sample time period is the value determined in step 301.

In this step, the power that the photovoltaic power generation device outputs to the grid in the k-th sample time period is computed from the user's power consumption in the k-th sample time period, the photovoltaic power in the k-th sample time period, and the proportional weight. The user's power consumption can be collected directly. The formula expresses that photovoltaic generation is supplied to the user first, and any surplus is allocated to the energy storage battery or the grid. In the embodiments of this specification, the power output by the photovoltaic power generation device to the grid and the proportional weight are decision variables for training the basic charge and discharge control decision model.
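The allocation described in step 302 can be illustrated as follows. Since the original formula is not available in this text, the sketch assumes that the export equals the proportional weight times the non-negative surplus, and that the rest of the surplus goes to the battery; both are assumptions, as are the function and parameter names.

```python
def pv_to_grid(pv_power, load_power, weight):
    """Split the photovoltaic surplus between the grid and the battery.

    The user load is served first; `weight` (assumed to lie in [0, 1]) is the
    share of the remaining surplus exported to the grid.
    Returns (power_to_grid, power_to_battery).
    """
    surplus = max(0.0, pv_power - load_power)
    to_grid = weight * surplus
    to_battery = surplus - to_grid
    return to_grid, to_battery
```

With 3 kW of generation, 1 kW of load and a weight of 0.25, the sketch exports 0.5 kW and leaves 1.5 kW for the battery.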

In other embodiments of this specification, a power balance equation describes the balance between power input and power output, during operation, of the system formed by the power grid and the user-side photovoltaic energy storage equipment.

For the k-th sample time period, the power balance equation relates the power input from the grid, the power output to the grid (which is provided by the photovoltaic power generation device), the user's power consumption, and the photovoltaic power.

Figure 4 is a flow chart of a method for determining the state variables of the energy storage battery in a sample time period according to an embodiment of this specification, which specifically includes the following steps:

Step 401: Obtain the charge power of the energy storage battery in the historical sample periods before the sample time period. In this step, the historical sample periods are all sample periods before the current one. For example, if the current sample time period is the k-th, the historical sample periods are the (k-1)-th, the (k-2)-th, ..., the 2nd and the 1st sample time periods.

In the embodiments of this specification, the charge power of the energy storage battery in the historical sample periods can be collected directly.

Step 402: Determine the state of charge of the energy storage battery in the sample time period according to its charge power in the historical sample periods. In this step, the state of charge in the current sample time period k is computed from the following quantities: the state of charge of the energy storage battery in the 1st sample time period (the initial moment); the self-discharge rate of the energy storage battery; its open-circuit voltage; its rated capacity; its charge and discharge efficiency; the charge and discharge power of the energy storage battery in the i-th sample time period, which is negative when the battery discharges and positive when it charges, consistent with the rules of steps 501 and 502; and the efficiency index of the i-th time period, whose value depends on whether the battery is charging or discharging.
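The specification's state-of-charge formula is not recoverable from this text, so the following is only a generic coulomb-counting sketch that uses the same listed quantities. The recursion, the use of +1 and -1 as the efficiency index for charging and discharging, and all names are assumptions rather than the patent's formula.

```python
def soc_step(soc, power_w, dt_h, v_oc, capacity_ah, efficiency, self_discharge):
    """Advance the state of charge by one time period.

    power_w > 0 charges the battery and power_w < 0 discharges it.
    v_oc * capacity_ah is the nominal energy capacity in watt-hours.
    """
    energy_wh = v_oc * capacity_ah
    # Assumed efficiency index: +1 when charging, -1 when discharging.
    index = 1 if power_w >= 0 else -1
    delta = (efficiency ** index) * power_w * dt_h / energy_wh
    return (1.0 - self_discharge) * soc + delta


def soc_after(soc_initial, powers_w, **params):
    """Accumulate the state of charge over all historical periods in order."""
    soc = soc_initial
    for power in powers_w:
        soc = soc_step(soc, power, **params)
    return soc
```

Under these assumptions, charging at 1 kW for one hour into a 50 V, 100 Ah battery with 90 % efficiency raises the state of charge by 0.18.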

In the embodiments of this specification, the photovoltaic power generation uncertainty model is defined such that the photovoltaic power of the sample time period follows a Gaussian distribution; the parameters of the model are the photoelectric conversion efficiency, the area of the photovoltaic material, the solar irradiance of the sample time period, and the variance.

From the photovoltaic power given by the uncertainty model, the power that the photovoltaic power generation device outputs to the grid in the k-th sample time period is determined from the user's power consumption in the k-th sample time period, the photovoltaic power in the k-th sample time period, and the proportional weight.

Figure 5 is a flow chart of a method for determining the preset charge and discharge rules according to an embodiment of this specification, which specifically includes the following steps:

Step 501: If the current period belongs to a peak electricity consumption period, set the charge and discharge power of the energy storage battery to be negative. This preset rule prohibits charging the energy storage battery during peak periods, which ensures that the battery can fully supply the user's electrical devices during those periods and improves its utilization efficiency during peak periods.

Step 502: If the current period belongs to a valley (off-peak) electricity consumption period, set the charge and discharge power of the energy storage battery to be positive. This preset rule allows the energy storage battery to charge during valley periods. In the rule, F denotes the set of peak electricity consumption periods, G denotes the set of valley electricity consumption periods, and the constrained quantity is the charge and discharge power of the energy storage battery in the k-th time period.

Step 503: If the photovoltaic power is less than the load power, set the proportional weight to zero. This preset rule means that when the power generated by the photovoltaic power generation device is less than the load power of the user's electrical devices, the photovoltaic device cannot supply the energy storage battery or the grid, so the proportional weight is set to 0.

Step 504: If the photovoltaic power is greater than the load power, keep the proportional weight at its original value. This preset rule means that when the power generated by the photovoltaic power generation device exceeds the power consumed by the user's electrical devices, the photovoltaic device can supply the energy storage battery or the grid, so the proportional weight retains its original value.

In the embodiments of this specification, the preset charge and discharge rules above constrain the decision variables of the charge and discharge control decision model, and using the rules speeds up both reinforcement learning and model iteration.
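The four rules of steps 501 to 504 can be sketched as a correction applied to the raw action of the policy network. The specification only says the power is "set" negative or positive; flipping the sign while keeping the magnitude is an assumption of this sketch, as are the function and parameter names.

```python
def apply_rules(power, weight, period, peak_periods, valley_periods,
                pv_power, load_power):
    """Correct (battery power, proportional weight) with the preset rules.

    power > 0 means charging, power < 0 means discharging.
    peak_periods and valley_periods correspond to the sets F and G.
    """
    if period in peak_periods:
        power = -abs(power)   # step 501: no charging at peak times
    elif period in valley_periods:
        power = abs(power)    # step 502: charge during valley times
    if pv_power < load_power:
        weight = 0.0          # step 503: no surplus to export
    # step 504: otherwise the weight keeps its original value
    return power, weight
```

Periods that are in neither set leave the battery power untouched, so the learned policy still decides freely there.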

Figure 6 is a flow chart of a method for iteratively training the basic charge and discharge control decision model with a reward function according to an embodiment of this specification, which specifically includes the following steps:

Step 601: Determine the economic reward function from the electricity selling unit price and the electricity purchase unit price in each period and the power input from the grid in each period. In the embodiments of this specification, the economic reward of the k-th time period is computed from the following quantities: the power input from the grid in the i-th sample time period; the power output to the grid in the i-th sample time period; the electricity purchase unit price in the i-th sample time period; the electricity selling unit price in the i-th sample time period; and N, the total number of time periods.
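One plausible reading of the economic reward is the negative net electricity cost accumulated over the periods. The exact formula is not available in this text, so the form below, the `dt_h` period length and all names are assumptions made for illustration only.

```python
def economic_reward(grid_in, grid_out, buy_price, sell_price, dt_h=1.0):
    """Negative net cost over the given periods (higher is better).

    grid_in / grid_out: power bought from / sold to the grid per period.
    buy_price / sell_price: unit prices per period.
    """
    cost = 0.0
    for p_in, p_out, c_buy, c_sell in zip(grid_in, grid_out, buy_price, sell_price):
        cost += (c_buy * p_in - c_sell * p_out) * dt_h
    return -cost
```

Buying 2 kW for one period at 0.5 per kWh and later selling 1 kW at 0.25 per kWh gives a reward of -0.75 under these assumptions.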

Step 602: Construct the smoothness reward function from the charge and discharge power of the energy storage battery at the current moment and at the previous moment. Specifically, the smoothness reward of the k-th sample time period is a function of the charge and discharge power of the energy storage battery in the k-th time period and in the (k-1)-th time period.
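A smoothness reward of this kind is often written as a penalty on the change in battery power between consecutive periods. The absolute-difference form below is an assumption; the specification's own formula may use a different norm or scaling.

```python
def smoothness_reward(power_now, power_prev):
    """Penalise abrupt changes in battery power between consecutive periods."""
    return -abs(power_now - power_prev)
```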

Step 603: Determine the maximum demand power reward function from the power input from the grid in each period. The maximum demand power reward of the k-th time period is used to keep the power input from the grid as low as possible and to relieve the burden on the grid during peak electricity consumption periods.

Step 604: Determine the constraint penalty reward function from the minimum state of charge of the energy storage battery, its maximum state of charge, and its state of charge at the current moment. The constraint penalty reward of the k-th sample time period is defined in terms of an exponential parameter and the minimum and maximum states of charge of the energy storage battery. Its purpose is to constrain the state of charge of the energy storage battery and avoid damaging the battery.
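A constraint penalty with an exponential parameter might look like the sketch below: zero inside the allowed state-of-charge band and growing exponentially with the size of the violation. The functional form and the name and default value of `beta` are assumptions, since the original formula is not available in this text.

```python
import math


def soc_penalty(soc, soc_min, soc_max, beta=5.0):
    """Return 0 inside [soc_min, soc_max] and a negative penalty outside it."""
    if soc < soc_min:
        violation = soc_min - soc
    elif soc > soc_max:
        violation = soc - soc_max
    else:
        return 0.0
    # expm1(x) = exp(x) - 1, so the penalty starts at zero on the band edge.
    return -math.expm1(beta * violation)
```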

Step 605: Iteratively train the charge and discharge control decision model according to the economic reward function, the smoothness reward function, the maximum demand power reward function and the constraint penalty reward function. When the reward reaches its maximum and remains stable for a preset number of iterations, the training of the charge and discharge control decision model is determined to be complete.

In the embodiments of this specification, the reward function of the charge and discharge control decision model is determined from the economic reward function, the smoothness reward function, the maximum demand power reward function and the constraint penalty reward function constructed in the preceding steps. The reward of the k-th time period combines the economic reward, the smoothness reward, the maximum demand power reward and the constraint penalty reward of the k-th sample time period, each with its own weight coefficient.

The charge and discharge control decision model is trained iteratively according to this reward function. When the reward reaches its maximum and remains stable for a preset number of iterations, or for a given period of training time, the training of the charge and discharge control decision model is determined to be complete.
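The combined reward and the stopping condition can be sketched as follows. The weighted sum follows from the description of four weight coefficients; the window length, the tolerance and the function names are assumptions chosen for illustration.

```python
def total_reward(r_economic, r_smooth, r_demand, r_penalty,
                 weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four reward terms for one time period."""
    terms = (r_economic, r_smooth, r_demand, r_penalty)
    return sum(w * r for w, r in zip(weights, terms))


def has_converged(history, window=50, tol=1e-3):
    """True when the last `window` rewards all stay within `tol` of the best
    reward seen so far, i.e. the maximum has been reached and held."""
    if len(history) < window:
        return False
    best = max(history)
    return all(best - r <= tol for r in history[-window:])
```

A training loop would append the episode reward to `history` after each iteration and stop once `has_converged(history)` returns true.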

Figure 7 is a schematic diagram of a photovoltaic energy storage system based on fusion rule reinforcement learning according to an embodiment of this specification. The system includes a power grid 701 and user-side photovoltaic energy storage equipment 702. The power grid 701 is communicatively connected to the user-side photovoltaic energy storage equipment 702 and supplies power to it. The user-side photovoltaic energy storage equipment 702 includes a photovoltaic power generation device 703, a user electrical device 704 and an energy storage battery 705. The photovoltaic power generation device 703 supplies photovoltaic power to the user electrical device 704 and, when there is surplus power, delivers the surplus to the energy storage battery 705 or the power grid 701. The energy storage battery 705 stores the surplus power of the photovoltaic power generation device 703 and supplies power to the user electrical device 704.

Figure 8 is a schematic structural diagram of a charge and discharge control decision device based on fusion rule reinforcement learning according to an embodiment of this specification. The figure shows the basic structure of the device. Its functional units and modules can be implemented in software, or a general-purpose chip or a dedicated chip can be used to implement the charge and discharge control decision based on fusion rule reinforcement learning. The device specifically includes:

A state space construction unit 801, configured to determine the state space from the user's power consumption, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity purchase unit price and the grid electricity selling unit price in any time period;

An optimal charge and discharge decision variable determination unit 802, configured to input the state space into the charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables for any time period, where the optimal charge and discharge decision variables include the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model based on fusion rule reinforcement learning is obtained by training with the sample state space and the photovoltaic power generation uncertainty model.
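The two units can be pictured as a small data structure plus an inference call. The field names, the field order and the callable `model` interface below are hypothetical; the specification only lists which six quantities make up the state space.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class State:
    """The six state variables built by unit 801 for one time period."""
    load_power: float      # user's power consumption
    soc: float             # state of charge of the energy storage battery
    outdoor_temp: float    # outdoor temperature
    irradiance: float      # solar irradiance
    buy_price: float       # grid electricity purchase unit price
    sell_price: float      # grid electricity selling unit price


def decide(model, state):
    """Unit 802: run the trained model on one state.

    `model` is any callable mapping a list of six floats to
    (battery power, coefficient).
    """
    power, coefficient = model(list(astuple(state)))
    return power, coefficient
```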

This specification addresses the coordination of photovoltaic generation, the energy storage battery and the power grid in user-side photovoltaic energy storage facilities. It takes the uncertainty of photovoltaic generation into account by establishing a photovoltaic power generation uncertainty model, fuses predefined rules to speed up the convergence of reinforcement learning training to the optimal charge and discharge control strategy, and designs an economic reward, a smoothness reward, a maximum demand power reward and a constraint penalty reward. Together these improve the economy of electricity use on the user side and reduce power fluctuations and the burden on the grid.

Figure 9 shows a computer device provided by an embodiment of this specification, which is used to execute the energy storage charge and discharge control decision method based on fusion rule reinforcement learning. The computer device 902 may include one or more processors 904, such as one or more central processing units (CPUs), each of which may implement one or more hardware threads. The computer device 902 may also include any memory 906 for storing any kind of information, such as code, settings and data. By way of non-limiting example, the memory 906 may include any one or a combination of the following: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, and so on. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information, and any memory may represent a fixed or removable component of the computer device 902. In one case, when the processor 904 executes associated instructions stored in any memory or combination of memories, the computer device 902 can perform any operation of the associated instructions. The computer device 902 also includes one or more drive mechanisms 908 for interacting with any memory, such as a hard disk drive mechanism or an optical disk drive mechanism.

The computer device 902 may also include an input/output (I/O) module 910 for receiving various inputs (via an input device 912) and for providing various outputs (via an output device 914). One particular output mechanism may include a presentation device 916 and an associated graphical user interface (GUI) 918. In other embodiments, the input/output module 910, the input device 912 and the output device 914 may be omitted, and the device serves only as one computer device in a network. The computer device 902 may also include one or more network interfaces 920 for exchanging data with other devices via one or more communication links 922. One or more communication buses 924 couple the components described above together.

The communication link 922 may be implemented in any manner, for example through a local area network, a wide area network (for example, the Internet), a point-to-point connection, or any combination thereof. The communication link 922 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, and so on, governed by any protocol or combination of protocols.

Corresponding to the methods in Figures 1 to 6, the embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it executes the steps of the above methods.

The embodiments of this specification also provide computer-readable instructions which, when executed by a processor, cause the processor to perform the methods shown in Figures 1 to 6.

It should be understood that, in the various embodiments of this specification, the sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of this specification in any way.

It should also be understood that, in the embodiments of this specification, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B exist together, or that B exists alone. In addition, the character "/" in this specification generally indicates an "or" relationship between the objects before and after it.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed in this specification can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this specification.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this specification, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, or may be an electrical, mechanical or other form of connection.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of this specification.

In addition, the functional units in the embodiments of this specification may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this specification in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this specification. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.

Specific embodiments are used in this specification to explain its principles and implementations. The description of the above embodiments is only intended to help readers understand the method and core idea of this specification. At the same time, those of ordinary skill in the art may, following the idea of this specification, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this specification.

Claims

1. An energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning, characterized in that the method is applied to a power grid and user-side photovoltaic energy storage equipment, wherein the user-side photovoltaic energy storage equipment includes a photovoltaic power generation device, a user electrical device, and an energy storage battery, and the method includes:

Determining a state space from the user's electric power, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity unit price, and the grid electricity sales unit price in any time period;

Inputting the state space into a charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables for that time period, wherein the optimal charge and discharge decision variables include the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model based on fusion rule reinforcement learning is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.
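As an illustration only, the six quantities named in claim 1 can be packed into a fixed-order vector and passed to a trained policy that returns the two decision variables. The sketch below is not the claimed implementation: the class name `State`, its field names and units, and the `policy` callable are all hypothetical.

```python
from dataclasses import dataclass, astuple
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class State:
    """One time period's observations (field names and units are assumptions)."""
    load_power: float      # user's electric power
    soc: float             # state of charge of the energy storage battery
    outdoor_temp: float    # outdoor temperature
    irradiance: float      # solar irradiance
    buy_price: float       # grid electricity unit price
    sell_price: float      # grid electricity sales unit price

    def as_vector(self) -> List[float]:
        # Fixed field order so the policy always sees the same layout.
        return [float(x) for x in astuple(self)]


def decide(state: State,
           policy: Callable[[List[float]], Tuple[float, float]]) -> Tuple[float, float]:
    """Return (charge/discharge power, coefficient) from a trained policy."""
    return policy(state.as_vector())
```

A stand-in policy such as `lambda v: (0.0, 1.0)` is enough to exercise the interface before a trained model is available.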

2. The energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning according to claim 1, characterized in that the charge and discharge control decision-making model based on fusion rule reinforcement learning is obtained through the following steps:

Determine the sample state space based on the state variables and environmental variables related to user power devices, energy storage batteries, and power grids during the sample time period;

Construct a basic model for charge and discharge control decision-making based on the photovoltaic power generation uncertainty model, where the photovoltaic power generation uncertainty model is a Gaussian distribution model that considers environmental variables;

Input the state variables in the sample state space into the basic charge and discharge control decision model, and output the charge and discharge decision variables;

The charging and discharging decision-making variables are constrained according to the preset charging and discharging rules, and the charging and discharging control decision-making basic model is iteratively trained using a reward function to obtain a fully trained charging and discharging control decision-making model.

3. The energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning according to claim 2, characterized in that the state variables of the energy storage battery in the sample time period are determined in the following manner:

Obtain the charging power of the energy storage battery in the historical sample period before the sample time period;

According to the charging power of the energy storage battery during the historical sample period, the state of charge of the energy storage battery during the sample time period is determined by a formula (the formula and its symbols are not reproduced in this text) whose terms are: k, the current sample time period; the state of charge of the energy storage battery in the first sample time period (i.e., the initial moment); the self-discharge rate of the energy storage battery; the open-circuit voltage of the energy storage battery; the rated capacity of the energy storage battery; the charge and discharge efficiency of the energy storage battery; the charge and discharge power of the energy storage battery in the i-th sample time period, which satisfies one stated condition when the battery is discharging and another when it is charging; and the efficiency index of the i-th time period, which satisfies a further stated condition.

4. The energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning according to claim 2, characterized in that, after outputting the charge and discharge decision variables, the method further includes:

Based on the photovoltaic power generation uncertainty model that obeys Gaussian distribution, determine the photovoltaic power generation power in the sample time period;

According to the decision variables output by the charge and discharge control decision-making basic model and the difference between the photovoltaic power generation power in the sample time period and the user's power consumption, the power output by the photovoltaic power generation device to the grid during the sample time period is determined.

5. The energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning according to claim 4, characterized in that the method includes:

The photovoltaic power generation uncertainty model is determined by a formula (the formula and its symbols are not reproduced in this text) whose terms are: the photovoltaic power generation during the sample time period; the photoelectric conversion efficiency; the photovoltaic material area; the solar irradiance during the sample time period; and the variance. The photovoltaic power generation obeys a Gaussian distribution with the stated mean and variance. According to the photovoltaic power generation obtained from the photovoltaic power generation uncertainty model, the power output from the photovoltaic power generation device to the grid is determined by a second formula (likewise not reproduced) whose terms are: the power output by the photovoltaic power generation device to the grid in the k-th sample time period; the user's power consumption in the k-th sample time period; the photovoltaic power generation in the k-th sample time period; and the proportional weight, which is subject to a stated constraint.
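The formulas of claim 5 do not survive in this text, so the following is a sketch under stated assumptions rather than the claimed model. It assumes the Gaussian mean is the product of conversion efficiency, material area, and irradiance, and that the power sent to the grid is the proportional weight times the surplus of photovoltaic power over user consumption. The function names, the clamp at zero, and the units are hypothetical.

```python
import math
import random


def sample_pv_power(efficiency: float, area: float, irradiance: float,
                    variance: float, rng: random.Random) -> float:
    """Draw one PV power sample from a Gaussian.

    ASSUMPTION: mean = efficiency * area * irradiance; the source formula is missing.
    """
    mean = efficiency * area * irradiance
    sample = rng.gauss(mean, math.sqrt(variance))  # gauss() takes a standard deviation
    return max(0.0, sample)  # ASSUMPTION: generation cannot be negative


def grid_export_power(pv_power: float, load_power: float, weight: float) -> float:
    """ASSUMPTION: export = weight * (PV power - user consumption)."""
    return weight * (pv_power - load_power)
```

Passing a seeded `random.Random` keeps training episodes reproducible, and setting `variance` to zero recovers the deterministic mean.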

6. The energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning according to claim 5, characterized in that the preset charge and discharge rules are determined as follows:

If the current period belongs to the peak period of electricity consumption, set the charging and discharging power of the energy storage battery to be negative;

If the current period belongs to the low power consumption period, set the charging and discharging power of the energy storage battery to be positive;

If the photovoltaic power generation is less than the load power, set the proportional weight to zero;

If the photovoltaic power generation is greater than the load power, the proportional weight is set to the original value of the proportional weight;

According to the above preset charging and discharging rules, the decision variables in the charging and discharging control decision model are constrained.
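The four rules above can be read as a post-processing step applied to the raw decision variables. The sketch below is one possible reading, not the claimed implementation: the period labels `"peak"` and `"low"`, the function name, and the use of the magnitude of the raw power to enforce the sign are assumptions.

```python
from typing import Tuple


def apply_rules(power: float, weight: float, period: str,
                pv_power: float, load_power: float) -> Tuple[float, float]:
    """Constrain (power, weight) with the preset charge and discharge rules.

    Sign convention taken from the rules: negative power at peak periods,
    positive power at low-consumption periods.
    """
    if period == "peak":
        power = -abs(power)   # force a negative value (ASSUMPTION: keep magnitude)
    elif period == "low":
        power = abs(power)    # force a positive value (ASSUMPTION: keep magnitude)
    # Any other period label leaves the power unchanged.

    if pv_power < load_power:
        weight = 0.0          # PV below load: proportional weight set to zero
    # PV above load: the weight keeps its original value.
    return power, weight
```

The case where photovoltaic generation exactly equals the load is not covered by the rules; this sketch leaves the weight unchanged there.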

7. The energy storage charge and discharge control decision-making method based on fusion rule reinforcement learning according to claim 6, characterized in that the iterative training of the charge and discharge control decision-making basic model using a reward function includes:

Determine the economic reward function based on the unit price of electricity sold in each period, the unit price of electricity consumed in each period, and the input power from the power grid in each period;

Based on the charging and discharging power of the energy storage battery at the current moment and the charging and discharging power of the energy storage battery at the previous moment, a smoothness reward function is constructed;

According to the input power of the power grid in each period, determine the maximum demand power reward function;

According to the minimum state of charge of the energy storage battery, the maximum state of charge of the energy storage battery and the current state of charge of the energy storage battery, determine the constraint penalty reward function;

According to the economic reward function, the smoothness reward function, the maximum demand power reward function and the constraint penalty reward function, the charge and discharge control decision-making model is iteratively trained. When the total reward reaches its maximum value and remains stable within the preset number of iterations, it is determined that training of the charge and discharge control decision model is complete.
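Claim 7 names the inputs of each reward term but not their formulas, so the sketch below only illustrates one plausible shape for each term and for the stopping test. Every functional form, the sign convention for grid power, the weights, and the tolerance `tol` are assumptions, and the term built from consecutive charge and discharge powers is called `smoothness_reward` here for convenience.

```python
from typing import Sequence


def economic_reward(buy_price: float, sell_price: float, grid_power: float) -> float:
    # ASSUMPTION: grid_power > 0 is power drawn from the grid, < 0 is power sold.
    if grid_power >= 0:
        return -buy_price * grid_power
    return sell_price * (-grid_power)


def smoothness_reward(power_now: float, power_prev: float) -> float:
    # Penalize large changes between consecutive charge/discharge powers.
    return -abs(power_now - power_prev)


def max_demand_reward(grid_powers: Sequence[float]) -> float:
    # Penalize the largest grid input power seen over the periods.
    return -max(grid_powers)


def soc_penalty(soc: float, soc_min: float, soc_max: float) -> float:
    # Zero inside [soc_min, soc_max], increasingly negative outside it.
    if soc < soc_min:
        return -(soc_min - soc)
    if soc > soc_max:
        return -(soc - soc_max)
    return 0.0


def total_reward(terms: Sequence[float], weights: Sequence[float]) -> float:
    # ASSUMPTION: the four terms are combined as a weighted sum.
    return sum(w * t for w, t in zip(weights, terms))


def has_converged(episode_rewards: Sequence[float], window: int, tol: float) -> bool:
    """True when the last `window` rewards include the best so far and vary by <= tol."""
    if len(episode_rewards) < window:
        return False
    recent = episode_rewards[-window:]
    stable = max(recent) - min(recent) <= tol
    at_best = max(recent) >= max(episode_rewards) - tol
    return stable and at_best
```

In a training loop, `has_converged` would be checked after each iteration with `window` set to the preset number of iterations.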

8. A photovoltaic energy storage system based on fusion rule reinforcement learning, characterized in that the system includes:

The power grid is connected to the user-side photovoltaic energy storage equipment for communication and is used to supply power to the user-side photovoltaic energy storage equipment;

User-side photovoltaic energy storage equipment includes: a photovoltaic power generation device, a user power device and an energy storage battery, wherein the photovoltaic power generation device is used to supply photovoltaic power to the user power device and, when surplus photovoltaic power remains, to transfer the surplus power to the energy storage battery or the power grid;

The energy storage battery is used to store the remaining power of the photovoltaic power generation device and to provide power to the user's electrical device.

9. An energy storage charge and discharge control decision-making device based on fusion rule reinforcement learning, characterized in that the device includes:

The state space construction unit is used to determine the state space by combining the user's electricity power, the state of charge of the energy storage battery, outdoor temperature, solar irradiance, grid electricity unit price, and grid electricity sales unit price in any period of time;

The optimal charging and discharging decision variable determination unit is used to input the state space into the charging and discharging control decision model based on fusion rule reinforcement learning to obtain the optimal charging and discharging decision variables for any time period, wherein the optimal charging and discharging decision variables include the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model based on fusion rule reinforcement learning is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.

10. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, the method according to any one of claims 1 to 7 is implemented.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.