CN115130733A

CN115130733A - Hydrogen-containing building energy system operation control method combining optimization and learning

Info

Publication number: CN115130733A
Application number: CN202210631486.3A
Authority: CN
Inventors: 余亮; 张予涵; 任静怡; 岳东; 窦春霞; 张腾飞
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2022-06-06
Filing date: 2022-06-06
Publication date: 2022-09-30
Anticipated expiration: 2042-06-06
Also published as: CN115130733B

Abstract

The invention discloses an operation control method for a hydrogen-containing building energy system that combines optimization and learning, in the field of building energy system operation control. The method comprises the following steps: establishing an expected operating cost minimization problem model of the hydrogen-containing building energy system and converting it into a plurality of single-time-slot optimization sub-problem models; decomposing each single-time-slot optimization sub-problem model into an upper sub-problem model and a lower sub-problem model; solving the upper sub-problem model with a convex optimization method and calculating the heat production of the fuel cell from the solution; taking the heat production of the fuel cell as an input state of the lower sub-problem model; solving the lower sub-problem model to obtain an optimal control strategy for the thermal energy subsystem; and controlling the operation of the hydrogen-containing building energy system in real time. By combining the advantages of a model-based convex optimization method and a model-free learning method, the invention minimizes operating cost while maintaining high thermal comfort.

Description

An operation control method for hydrogen-containing building energy system based on joint optimization and learning

Technical Field

The invention belongs to the field of building energy system operation control, and in particular relates to an operation control method for a hydrogen-containing building energy system.

Background

Buildings account for a large share of the world's total energy consumption and carbon emissions. In 2019, buildings consumed about 30% of global energy and produced about 28% of global carbon emissions. At present, the global energy supply relies mainly on non-renewable sources such as fossil fuels, which leads to increasingly serious energy depletion and environmental pollution. In recent years, hydrogen energy has received extensive attention because it is clean and renewable, comes from a wide range of sources, is convenient to store and transport, and has a high utilization rate; it is widely recognized as a promising alternative to fossil fuels. In addition, the coordinated operation of hydrogen energy storage systems and other energy storage systems (such as thermal energy storage systems and electrical energy storage systems) helps improve building energy efficiency. The operation control of hydrogen-containing building energy systems is therefore worthy of in-depth study.

Existing studies have proposed several operation control methods for hydrogen-containing building energy systems, such as stochastic programming and model predictive control. The goal of these methods is to minimize the system operating cost (mainly energy cost and carbon emission cost). Although existing studies have made some progress, none of them considers building thermal dynamics. As a result, high building thermal inertia (the weakened and delayed response of indoor temperature to an initial excitation, such as a sudden stop of heating) has not been fully exploited to reduce the system operating cost.

When building thermal dynamics are considered in a hydrogen-containing building energy system, the optimal control of system operation faces four challenges: (1) there are many uncertain system parameters; (2) there are many temporally and spatially coupled operational constraints; (3) the fuel cell in the hydrogen energy storage system generates electricity and heat simultaneously, which couples the electrical energy flow and the thermal energy flow; (4) it is difficult to establish an explicit building thermal dynamics model that is both accurate and convenient for building control. Specifically, the dimension of the action space of single-agent deep reinforcement learning increases sharply as the number of thermal zones grows, while multi-agent deep reinforcement learning must coordinate heterogeneous agents and therefore struggles to learn effectively as the number of agents increases.

Summary of the Invention

The purpose of the present invention is to provide an operation control method for a hydrogen-containing building energy system that combines optimization and learning, and that uses the advantages of both a model-based convex optimization method and a model-free learning method to minimize operating cost under high thermal comfort.

To achieve the above purpose, the technical scheme adopted by the present invention is as follows:

A first aspect of the present invention provides an operation control method for a hydrogen-containing building energy system that combines optimization and learning, including:

According to the operational constraints and parameter uncertainties of the hydrogen-containing building energy system, an expected operating cost minimization problem model of the system is established; the Lyapunov optimization framework is used to transform the expected operating cost minimization problem into multiple single-slot optimization sub-problem models;

Each single-slot optimization sub-problem model is decomposed into an upper sub-problem model corresponding to the electricity-hydrogen subsystem and a lower sub-problem model corresponding to the thermal energy subsystem;

The upper sub-problem model is solved with a convex optimization method, and the heat production of the fuel cell is calculated from the solution of the upper sub-problem;

The fuel cell heat production is used as an input state of the lower sub-problem model; the lower sub-problem model is re-modeled based on the Markov game framework and solved with a multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem;

The operation of the hydrogen-containing building energy system is controlled in real time according to the convex optimization solution method of the upper sub-problem model and the optimal control strategy of the thermal energy subsystem.
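The per-slot flow of the steps above can be sketched in Python as follows. This is only an illustration of the data flow between the two layers, not the patented implementation: the names `control_step`, `solve_upper`, `thermal_policy` and `heat_per_kwh` are hypothetical, and the proportional relation between fuel cell output and heat production is an assumption, since the text does not reproduce that formula.

```python
from typing import Callable, Dict, List, Sequence


def control_step(
    upper_inputs: Dict[str, float],
    thermal_obs: Sequence[float],
    solve_upper: Callable[[Dict[str, float]], Dict[str, float]],
    thermal_policy: Callable[[Sequence[float]], Sequence[float]],
    heat_per_kwh: float,
    dt: float,
) -> Dict[str, object]:
    """Run one time slot of the two-layer scheme (illustrative only)."""
    # Upper layer: model-based solve for the electricity-hydrogen subsystem.
    # solve_upper is a hypothetical callable standing in for the convex solver.
    upper = solve_upper(upper_inputs)

    # Fuel cell heat production derived from the upper-layer result.
    # ASSUMPTION: heat is proportional to the fuel cell output energy.
    q_fc = heat_per_kwh * upper["P_fc"] * dt

    # Lower layer: the learned policy receives q_fc as part of its input state.
    state: List[float] = [q_fc, *thermal_obs]
    heat_supply = list(thermal_policy(state))

    return {"upper": upper, "Q_fc": q_fc, "heat_supply": heat_supply}
```

The point of the structure is that the lower layer never needs to reason about electricity or hydrogen decisions directly; it only observes their thermal consequence through `Q_fc`.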

Preferably, the expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:

s.t. the operational constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem;

In the formula, C_{1,t} is the cost of buying and selling electricity in time slot t, C_{2,t} is the carbon emission cost in time slot t, C_{3,t} is the loss cost of the electrical energy storage system in time slot t, C_{4,t} is the operation and maintenance cost of the hydrogen energy subsystem in time slot t, C_{5,t} is the loss cost of the thermal energy subsystem in time slot t, C_{6,t} is the natural gas purchase cost in time slot t, and T denotes the time slot length. The decision variables Θ include: the energy traded between the local energy system and the main power grid, the charging and discharging power of the electrical energy storage system, the input power of the electrolyzer, the output power of the fuel cell, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption.

Preferably, the method for transforming the expected operating cost minimization problem into multiple single-slot optimization sub-problem models using the Lyapunov optimization framework includes:

Determine the controllability of the hydrogen-containing building energy system; for a hydrogen-containing building energy system that meets the controllable conditions, construct virtual queues for the electrical energy subsystem and the hydrogen energy subsystem; define the Lyapunov function from the virtual queues and calculate the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost; by minimizing the weighted sum ΔY(t), transform the expected operating cost minimization problem model into multiple single-slot optimization sub-problem models, and calculate the optimal system parameters in the single-slot optimization sub-problem model.

Preferably, the controllable condition is expressed as:

v^{max} > τ^{max},

v^{min} > τ^{min},

v^{max} = max_t v_t, τ^{max} = max_t τ_t, v^{min} = min_t v_t, τ^{min} = min_t τ_t,

In the formula, v^{max} and v^{min} denote the highest and lowest electricity purchase prices, respectively; τ^{max} and τ^{min} denote the highest and lowest electricity selling prices, respectively; η_{bc} and η_{bd} denote the charging and discharging efficiencies of the electrical energy storage system, respectively; μ_c is a weighting parameter that expresses the importance of carbon emissions relative to energy cost; two further symbols denote the maximum and minimum rates of carbon emission, respectively; ψ_{BESS} is the depreciation coefficient of the electrical energy storage system; B^{max} and B^{min} denote the maximum and minimum energy storage levels of the electrical energy storage system, respectively; two further symbols denote the rated injection power and rated release power of the electrical energy storage system, respectively; ω_{el} and ω_{fc} denote the conversion coefficients of the electrolyzer and the fuel cell, respectively; two indicator variables denote whether the electrolyzer and the fuel cell are switched on, respectively; H^{max} and H^{min} denote the maximum and minimum energy storage levels of the hydrogen energy storage system, respectively; two further symbols denote the rated powers of the electrolyzer and the fuel cell, respectively; Δt denotes the time slot length.
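As a small illustration, the two price conditions that survive in the text (the highest and lowest purchase prices must exceed the highest and lowest selling prices, respectively) can be checked as below. The function name `price_conditions_hold` is hypothetical, and the sketch deliberately covers only these two conditions; the remaining controllable conditions involve quantities whose formulas are not reproduced here.

```python
from typing import Sequence


def price_conditions_hold(
    buy_prices: Sequence[float], sell_prices: Sequence[float]
) -> bool:
    """Check v^max > tau^max and v^min > tau^min over a price history."""
    v_max, v_min = max(buy_prices), min(buy_prices)
    tau_max, tau_min = max(sell_prices), min(sell_prices)
    # Both inequalities are strict, as written in the conditions.
    return v_max > tau_max and v_min > tau_min
```

For example, purchase prices of 0.5 and 0.9 with selling prices of 0.3 and 0.6 satisfy both conditions, whereas a lowest selling price equal to the lowest purchase price does not.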

Preferably, the method for calculating the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost includes:

The Lyapunov function L(t) is expressed as:

In the formula, X_{B,t} = B_t + W_B and X_{H,t} = H_t + W_H; ω_r is a weighting coefficient that unifies the dimensions of X_{B,t} and X_{H,t}; B_t is the energy storage level of the electrical energy storage system in time slot t; H_t is the energy storage level of the hydrogen energy storage system in time slot t; W_B is the optimal parameter of the electrical energy storage system, and W_H is the optimal parameter of the hydrogen energy storage system. The dynamics constraints that B_t and H_t must satisfy are expressed, respectively, as:

In the formula, P_{bc,t} and P_{bd,t} denote the charging power and discharging power of the electrical energy storage system, respectively; P_{el,t} and P_{fc,t} denote the electrolyzer input power and the fuel cell output power in time slot t, respectively. The single-slot Lyapunov drift is expressed as:

Λ_t = E{L(t+1) - L(t) | X(t)},

In the formula, X(t) = (X_{B,t}, X_{H,t}), and E{·} denotes the expectation operator.

The single-slot Lyapunov drift Λ_t can then be bounded as:

Λ_t ≤ ξ_B + ξ_H + E{Γ_0 | X(t)},

The weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost is expressed as:

In the formula, V is a weighting parameter.
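To make the drift-plus-cost idea concrete, here is a minimal Python sketch. The Lyapunov function and the weighted-sum formula are not reproduced in the text, so the quadratic form L(t) = (X_{B,t}^2 + ω_r X_{H,t}^2)/2 and the sample-based weighted sum "drift + V × cost" used below are assumptions based on the standard Lyapunov drift-plus-penalty construction, not the patent's exact expressions. The function names are hypothetical.

```python
def lyapunov(b: float, h: float, w_b: float, w_h: float, omega_r: float) -> float:
    """ASSUMED quadratic Lyapunov function of the two virtual queues."""
    x_b = b + w_b  # virtual queue of the electrical energy storage system
    x_h = h + w_h  # virtual queue of the hydrogen energy storage system
    return 0.5 * (x_b ** 2 + omega_r * x_h ** 2)


def drift_plus_cost(
    b: float, h: float, b_next: float, h_next: float,
    w_b: float, w_h: float, omega_r: float,
    v: float, cost: float,
) -> float:
    """One-sample weighted sum of Lyapunov drift and operating cost."""
    # Sample drift: change of the Lyapunov function over one slot
    # (the text defines the drift as a conditional expectation of this).
    drift = lyapunov(b_next, h_next, w_b, w_h, omega_r) - lyapunov(
        b, h, w_b, w_h, omega_r
    )
    # V trades off queue stability against operating cost.
    return drift + v * cost
```

A larger V puts more weight on cost and less on keeping the storage levels near their shifted targets, which is the usual trade-off in this kind of framework.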

Preferably, the single-slot optimization sub-problem model is expressed as:

The parameter W_B of the optimal electrical energy storage system is calculated as:

The parameter W_H of the optimal hydrogen energy storage system is calculated as:

s.t. the operational constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem.

Preferably, the single-slot optimization sub-problem model is decomposed, according to the certainty of the available information, into an upper sub-problem model corresponding to the electricity-hydrogen subsystem and a lower sub-problem model corresponding to the thermal energy subsystem. The method includes:

The upper sub-problem model corresponding to the electricity-hydrogen subsystem is expressed as:

s.t. the operational constraints of the electrical energy subsystem and the hydrogen energy subsystem;

The lower sub-problem model corresponding to the thermal energy subsystem is expressed as:

min V(C_{5,t} + C_{6,t})

s.t. the operational constraints of the thermal energy subsystem.

Preferably, the method for re-modeling the lower sub-problem model based on the Markov game framework includes:

The environment state of the thermal energy subsystem is expressed as:

s_t = (Q_{fc,t}, Q_{th,t}, β_{in,i,t}, β_{out,t}, t),

In the formula, Q_{fc,t} is the heat production of the fuel cell in time slot t; Q_{th,t} is the energy storage level of the thermal energy storage system in the thermal energy subsystem in time slot t; β_{in,i,t} is the indoor temperature of the i-th room in time slot t; β_{out,t} is the outdoor temperature in time slot t; t denotes the time interval between two consecutive action decisions of the hydrogen-containing building energy system; η_{tc} and η_{td} are the injection efficiency and release efficiency of the thermal energy storage system, respectively; P_{tc,t} and P_{td,t} are the injection power and release power of the thermal energy storage system in time slot t, respectively;

The action of the thermal energy subsystem is expressed as:

a_t = (P_{sp,1,t}, P_{sp,2,t}, ..., P_{sp,i,t}), 1 ≤ i ≤ N_b,

In the formula, P_{sp,i,t} is the heat supply power of the i-th room in time slot t; N_b is the number of rooms;

The reward of the thermal energy subsystem is expressed as:

In the formula, κ_{th} is the penalty coefficient.
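The state, action and reward elements described above can be sketched in Python as follows. The state layout follows the tuple given in the text. The reward formula itself is not reproduced in the text, so the form used here (negative weighted thermal cost minus κ_th times the total comfort-range violation) is purely an assumption for illustration; `build_state`, `comfort_violation` and `reward` are hypothetical names.

```python
from typing import List, Sequence


def build_state(
    q_fc: float, q_th: float, indoor_temps: Sequence[float],
    outdoor_temp: float, t: int,
) -> List[float]:
    """Flatten (Q_fc, Q_th, indoor temperatures, outdoor temperature, t)."""
    return [q_fc, q_th, *indoor_temps, outdoor_temp, float(t)]


def comfort_violation(temp: float, t_min: float, t_max: float) -> float:
    """Distance of a temperature from the comfort range (0 if inside)."""
    return max(t_min - temp, 0.0) + max(temp - t_max, 0.0)


def reward(
    v: float, c5: float, c6: float, indoor_temps: Sequence[float],
    t_min: float, t_max: float, kappa_th: float,
) -> float:
    """ASSUMED reward: penalize thermal cost and comfort violations."""
    penalty = sum(comfort_violation(x, t_min, t_max) for x in indoor_temps)
    return -(v * (c5 + c6)) - kappa_th * penalty
```

With this shape, a large κ_th makes the learned policy prioritize staying inside the comfort range over saving cost, which matches the role of a penalty coefficient.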

Preferably, the method for solving with the multi-agent attention deep deterministic policy gradient algorithm includes:

At the beginning of each time slot, obtain the environment state of the thermal energy subsystem;

According to the current environment state of the thermal energy subsystem, the deep neural network outputs the current heat supply action of the hydrogen-containing building energy system to control the thermal energy subsystem;

Obtain the reward and the environment state of the next time slot; store the reward and environment state of each time slot in the experience pool;

Calculate the loss function L(θ_i) and the policy gradient of the deep neural network; then draw training samples from the experience pool and train the deep neural network with the multi-agent attention deep deterministic policy gradient algorithm; iterate the deep neural network according to the loss function L(θ_i) and the policy gradient to obtain the optimal control strategy of the thermal energy subsystem.
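The experience pool used in the steps above can be sketched as a fixed-capacity replay buffer. This is a generic illustration rather than the patent's data structure: the class name `ReplayBuffer`, the transition layout (state, action, reward, next state) and the uniform sampling are assumptions; the text only says that rewards and environment states are stored and that training samples are drawn from the pool.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# ASSUMED transition layout: (state, action, reward, next_state).
Transition = Tuple[List[float], List[float], float, List[float]]


class ReplayBuffer:
    """Fixed-capacity experience pool with uniform random sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque with maxlen silently drops the oldest transition when full.
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Never request more samples than are stored.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)
```

Sampling uniformly from past transitions, instead of training on the most recent one, breaks the temporal correlation between consecutive time slots.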

Preferably, the architecture of the multi-agent attention deep deterministic policy gradient algorithm includes multiple agents, indexed by i. Each agent has its own deep neural network, and each deep neural network includes an actor network, a target actor network, a critic network and a target critic network; the actor network and the target actor network have the same structure, and the critic network and the target critic network have the same structure;

The number of neurons in the input layer of the actor network equals the number of components of the environment state s_t, and the number of neurons in the output layer equals the number of components of the action a_t; the critic network of each agent includes an action encoder module, an attention mechanism module and a multilayer perceptron module;

In the attention mechanism module, the input of the i-th agent's actor network is o_i and its output is a_i; the input of the critic network includes o_i, a_i and a third term, and its output is Q_i(o,a),

where o_i is the local observation of the i-th agent; a_i is the output action; e_i is the encoding of the i-th agent's local observation and action; Q_i(o,a) is the Q value output by the critic network. In the critic network of the i-th agent, the attention module maps its input to the output x_i, which represents the contribution of the other agents;

The contribution x_i of the other agents is expressed as:

In the formula, W_{value,j} is the value transformation matrix associated with the j-th agent; the formula also contains a nonlinear activation function; w_j is the weight associated with the j-th agent;

The weight w_j associated with the j-th agent is expressed as:

In the formula, W_{key,i} and W_{query,i} are the key and query transformation matrices associated with the i-th agent, respectively.
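The attention computation described above can be illustrated with the following pure-Python sketch. Because the formulas for x_i and w_j are not reproduced in the text, the sketch assumes the common attention pattern: a weight for each other agent j obtained by a softmax over the dot product of a key transform of e_j with a query transform of e_i, and a contribution x_i equal to the weighted sum of activated value transforms. The leaky ReLU activation, the sharing of one matrix of each kind across agents, and all function names are assumptions made to keep the example short.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leaky_relu(v: Sequence[float], slope: float = 0.01) -> Vector:
    """ASSUMED nonlinear activation (the text does not name one)."""
    return [x if x > 0 else slope * x for x in v]


def attention_contribution(
    i: int, encodings: Sequence[Vector],
    w_query: Matrix, w_key: Matrix, w_value: Matrix,
) -> Tuple[Vector, Vector]:
    """Return (x_i, weights over the other agents) for agent i."""
    others = [j for j in range(len(encodings)) if j != i]
    query = matvec(w_query, encodings[i])
    # Score of agent j: dot product of its key with agent i's query.
    scores = [
        sum(k * q for k, q in zip(matvec(w_key, encodings[j]), query))
        for j in others
    ]
    # Numerically stable softmax over the other agents.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted sum of activated value vectors.
    values = [leaky_relu(matvec(w_value, encodings[j])) for j in others]
    x_i = [
        sum(w * val[k] for w, val in zip(weights, values))
        for k in range(len(values[0]))
    ]
    return x_i, weights
```

Because x_i has a fixed size regardless of how many other agents exist, the critic input does not grow with the number of rooms, which is the scalability argument made for the attention mechanism.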

Preferably, the loss function L(θ_i) and the policy gradient used to train the deep neural network are expressed as:

In the formula, π denotes the agents' policy (represented by the actor network); y denotes the Q value output by the target critic network; π′ denotes the agents' target policy (represented by the target actor network); a further symbol denotes the Q value output by the critic network of the i-th agent under policy π; π_i(a_i|o_i) denotes the output of the i-th agent's actor network.
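As a rough illustration of how a critic loss of this kind is typically evaluated on a batch, consider the sketch below. The loss and gradient formulas are not reproduced in the text, so the temporal-difference target y = r + γ·Q′ and the mean squared error loss are assumptions drawn from standard actor-critic practice; the discount factor `gamma` and the function names are hypothetical and do not appear in the text.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float], next_q_values: Sequence[float], gamma: float
) -> List[float]:
    """ASSUMED target y = r + gamma * Q'(o', a') from the target networks."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic outputs and targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

In a full implementation the next-slot Q values would come from the target critic evaluated on actions chosen by the target actors, and the actor would be updated along the gradient of the critic output with respect to its own action.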

Compared with the prior art, the present invention has the following beneficial effects:

The electricity-hydrogen subsystem is operated by optimization based on the upper sub-problem model, and the optimization result is then used as an input state for the operation of the thermal energy subsystem. Multi-agent deep reinforcement learning is used to learn the optimal operation control strategy of the thermal energy subsystem, which avoids the need for heterogeneous agents, and the attention mechanism makes the learning of that strategy highly scalable.

By combining the advantages of a model-based convex optimization method and a model-free learning method, the invention minimizes operating cost under high thermal comfort without requiring prior information about the uncertain parameters or an explicit building thermal dynamics model.

Description of the Drawings

Fig. 1 is a flow chart of an operation control method for a hydrogen-containing building energy system combining optimization and learning, provided by an embodiment of the present invention;

Fig. 2 is a network framework diagram of the multi-agent attention deep deterministic policy gradient algorithm of the present invention;

Fig. 3 compares the average temperature deviation of the embodiment of the present invention with that of other schemes;

Fig. 4 compares the average operating cost of the embodiment of the present invention with that of other schemes.

Detailed Description

The present invention is further described below with reference to the accompanying drawings. The following embodiments are only used to illustrate the technical solutions of the present invention more clearly and do not limit its scope of protection.

An operation control method for a hydrogen-containing building energy system combining optimization and learning, including:

According to the operational constraints and parameter uncertainties of the hydrogen-containing building energy system, an expected operating cost minimization problem model of the system is established;

The expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:

C_{2,t} = μ_c μ_{e,t} P_{g,t} Δt

In the formula, C_{1,t} is the cost of buying and selling electricity in time slot t, C_{2,t} is the carbon emission cost in time slot t, C_{3,t} is the loss cost of the electrical energy storage system in time slot t, C_{4,t} is the operation and maintenance cost of the hydrogen energy subsystem in time slot t, C_{5,t} is the loss cost of the thermal energy subsystem in time slot t, C_{6,t} is the natural gas purchase cost in time slot t, and T denotes the time slot length; v_t and τ_t denote the electricity purchase price and selling price in time slot t, respectively; P_{g,t} is the energy traded between the hydrogen-containing building energy system and the main power grid in time slot t; μ_c is the carbon emission cost coefficient, in RMB/kg; μ_{e,t} is the carbon emission rate of the main power grid in time slot t; ψ_{BESS} is the battery depreciation coefficient, in RMB/kW; P_{bc,t} and P_{bd,t} denote the charging power and discharging power of the electrical energy storage system, respectively; three further symbols denote the operation and maintenance cost, start-up cost and shutdown cost of component x (x ∈ {el, fc}) in the hydrogen energy storage system, respectively, where "el" and "fc" denote the electrolyzer and the fuel cell; three logical indicator variables relate to the ON/OFF state, start-up state and shutdown state of component x, respectively; ψ_{TESS} is the depreciation coefficient of the thermal energy storage system, in RMB/kW; P_{tc,t} and P_{td,t} denote the injection power and release power of the thermal energy storage system in time slot t, respectively; η_{gb} is the efficiency of converting natural gas into heat; P_{gb,t} is the thermal power output by the natural gas boiler; λ_{gb} is the natural gas price, in RMB/kWh.
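The carbon emission cost C_{2,t} = μ_c μ_{e,t} P_{g,t} Δt is the only cost term whose formula survives in the text, and it can be evaluated directly, as in the short sketch below. The function names are hypothetical, and the units in the comments (kg/kWh for the grid emission rate, kW for power, hours for the slot length) are assumptions; the text only states that μ_c is in RMB/kg.

```python
from typing import Sequence


def carbon_cost(mu_c: float, mu_e_t: float, p_g_t: float, dt: float) -> float:
    """C_2,t = mu_c * mu_e,t * P_g,t * dt for a single time slot.

    mu_c:   carbon emission cost coefficient (RMB/kg)
    mu_e_t: grid carbon emission rate (ASSUMED kg/kWh)
    p_g_t:  power exchanged with the grid (ASSUMED kW)
    dt:     time slot length (ASSUMED hours)
    """
    return mu_c * mu_e_t * p_g_t * dt


def total_carbon_cost(
    mu_c: float, mu_e: Sequence[float], p_g: Sequence[float], dt: float
) -> float:
    """Sum of the per-slot carbon costs over a horizon."""
    return sum(carbon_cost(mu_c, e, p, dt) for e, p in zip(mu_e, p_g))
```

For instance, with μ_c = 0.5, μ_{e,t} = 0.5, P_{g,t} = 8 and Δt = 0.5, the slot cost is 0.5 × 0.5 × 8 × 0.5 = 1.0.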

In the above operating cost minimization problem for a hydrogen-containing building energy system with hybrid hydrogen, electrical and thermal energy storage, the decision variables Θ include: the energy traded between the local energy system and the main power grid, the charging and discharging power of the electrical energy storage system, the input power of the electrolyzer, the output power of the fuel cell, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption. The constraints to be considered are the operational constraints related to the hydrogen energy storage system, the electrical energy storage system and the thermal energy storage system, and the constraints related to the comfortable temperature range of the rooms, as follows:

(1) The hydrogen energy storage system should satisfy the following constraints: 0 ≤ H_t ≤ H^{max},

P_{el,t}·P_{fc,t} = 0, where H^{max} is the maximum storage capacity of the hydrogen tank, and two further symbols denote the rated powers of the electrolyzer and the fuel cell, respectively.

(2) The electrical energy storage system must satisfy the following constraints: B^{min} ≤ B_t ≤ B^{max},

P_{bc,t}·P_{bd,t} = 0, where B^{min} and B^{max} are the minimum and maximum energy levels of the electrical energy storage system, respectively, and two further symbols denote its maximum charging power and maximum discharging power, respectively.

(3) During charging and discharging of the thermal energy storage system, the following operational constraints must be satisfied:

P_{td,t}·P_{tc,t} = 0, where three further symbols denote the maximum capacity of the thermal energy storage system, its maximum release power and its maximum injection power, respectively.

(4) The heat load demand satisfies the following operational constraints:

β_{in,i,t+1} = F_i(P_{sp,i,t}, β_{out,t}, β_{in,i,t}, ε_{i,t}), where two further symbols denote the lower and upper limits of the comfortable temperature range in building i, respectively; β_{in,i,t} is the indoor temperature of the i-th room in time slot t; F_i is the thermal dynamics model of building i; ε_{i,t} is the random thermal disturbance in time slot t; and a further symbol denotes the maximum heat supply power in building i.
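The storage constraints in items (1) to (3) share one pattern: the stored level must stay within bounds, and injection and release must not happen in the same slot (their product is zero). A minimal feasibility check might look like the sketch below. The function name and the tolerance are hypothetical, and the power bounds 0 ≤ P ≤ P^{max} are an assumption inferred from the rated-power symbols mentioned in the text, since the corresponding inequalities are not reproduced.

```python
def storage_feasible(
    level: float, level_min: float, level_max: float,
    p_in: float, p_out: float, p_in_max: float, p_out_max: float,
    tol: float = 1e-9,
) -> bool:
    """Check level bounds, power bounds and no simultaneous in/out flow."""
    within_level = level_min <= level <= level_max
    # ASSUMED power bounds: 0 <= P <= rated (maximum) power.
    within_power = 0.0 <= p_in <= p_in_max and 0.0 <= p_out <= p_out_max
    # Complementarity constraint, e.g. P_bc,t * P_bd,t = 0.
    not_simultaneous = abs(p_in * p_out) <= tol
    return within_level and within_power and not_simultaneous
```

The same function applies to the hydrogen tank (electrolyzer input versus fuel cell output), the battery (charging versus discharging) and the thermal store (injection versus release) by passing the corresponding levels and powers.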

The method for transforming the expected operating cost minimization problem into multiple single-slot optimization sub-problem models using the Lyapunov optimization framework includes:

Determine the controllability of the hydrogen-containing building energy system; the controllable condition is expressed as:

v^{max} > τ^{max},

v^{min} > τ^{min},

选择符合可控条件的氢建筑能源系统构建电能子系统和氢能子系统的虚拟队列；根据虚拟队列定义李雅普诺夫函数，计算单时隙李雅普诺夫漂移和运行成本的加权和ΔY(t)的方法包括：Select the hydrogen building energy system that meets the controllable conditions to construct the virtual queue of the electric energy subsystem and the hydrogen energy subsystem; define the Lyapunov function according to the virtual queue, and calculate the weighted sum ΔY(t) of the single-slot Lyapunov drift and operating cost methods include:

In the formula, X_B,t = B_t + W_B and X_H,t = H_t + W_H; ω_r is a weighting coefficient that unifies the dimensions of X_B,t and X_H,t; B_t is the energy storage level of the electric energy storage system in time slot t; H_t is the energy storage level of the hydrogen energy storage system in time slot t; W_B is the parameter of the optimal electric energy storage system; and W_H is the parameter of the optimal hydrogen energy storage system. The dynamic constraints that B_t and H_t must satisfy are respectively expressed as:

In the formula, P_bc,t and P_bd,t represent the charging power and discharging power of the electric energy storage system, respectively; P_el,t and P_fc,t represent the electrolyzer input power and the fuel cell output power in time slot t, respectively.
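The storage dynamics and virtual queues described above can be illustrated with a short sketch. The update rules below are assumptions based on the usual form of storage dynamics (efficiency-weighted charging minus discharging), not the formulas of the original text. In particular, how the conversion coefficients ω_el and ω_fc enter the hydrogen balance is assumed. Only the queue definitions X_B,t = B_t + W_B and X_H,t = H_t + W_H are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class StorageParams:
    eta_bc: float    # charging efficiency of the electric energy storage system
    eta_bd: float    # discharging efficiency of the electric energy storage system
    omega_el: float  # electrolyzer conversion coefficient (assumed multiplicative)
    omega_fc: float  # fuel cell conversion coefficient (assumed multiplicative)
    dt: float        # time slot length


def step_battery(b_t, p_bc, p_bd, p):
    """Assumed update: B_{t+1} = B_t + (eta_bc * P_bc - P_bd / eta_bd) * dt."""
    return b_t + (p.eta_bc * p_bc - p_bd / p.eta_bd) * p.dt


def step_hydrogen(h_t, p_el, p_fc, p):
    """Assumed update: H_{t+1} = H_t + (omega_el * P_el - omega_fc * P_fc) * dt."""
    return h_t + (p.omega_el * p_el - p.omega_fc * p_fc) * p.dt


def virtual_queues(b_t, h_t, w_b, w_h):
    """Virtual queues as defined in the text: X_B = B + W_B, X_H = H + W_H."""
    return b_t + w_b, h_t + w_h
```

Shifting the storage levels by W_B and W_H lets a Lyapunov-based controller steer the shifted queues, which in turn keeps the physical levels within their bounds when the parameters are chosen appropriately.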

The single-slot Lyapunov drift is expressed as:

Λ_t = E{L(t+1)-L(t)|X(t)},

which can be bounded as:

Λ_t ≤ ξ_B + ξ_H + E{Γ₀|X(t)},

In the weighted sum ΔY(t), V is a weighting parameter.
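For orientation, the standard drift-plus-penalty construction would take the following shape. This is a sketch under assumptions: the quadratic form of L(t), the placement of ω_r, and the summation over the six cost terms C_1,t to C_6,t are all assumed from the usual Lyapunov optimization template and may differ from the formulas of the original text.

```latex
L(t) = \tfrac{1}{2}\left( X_{B,t}^{2} + \omega_r X_{H,t}^{2} \right),
\qquad
\Delta Y(t) = \Lambda_t + V\,\mathbb{E}\!\left\{ \sum_{k=1}^{6} C_{k,t} \,\middle|\, X(t) \right\}
```

Under this form, a larger V places more weight on the operating cost relative to queue stability.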

By minimizing the weighted sum ΔY(t), the expected operating cost minimization problem model of the hydrogen-containing building energy system is transformed into multiple single-slot optimization sub-problem models. The single-slot optimization sub-problem model is expressed as:

Calculate and determine the optimal system parameters in the single-slot optimization sub-problem model. The parameter W_B of the optimal electric energy storage system is calculated as:

According to information determinism, the single-slot optimization sub-problem model is decomposed into an upper-level sub-problem model corresponding to the electric-hydrogen subsystem and a lower-level sub-problem model corresponding to the thermal energy subsystem. The lower-level sub-problem model is expressed as:

min V(C_5,t + C_6,t)

s.t. the operating constraints of the thermal energy subsystem.

The upper-level sub-problem model is solved using a convex optimization method, and the fuel cell heat production is calculated from the solution of the upper-level sub-problem. The method includes:

Since the objective function of the upper-level sub-problem is non-convex, it is convexly relaxed; that is, the objective function is adjusted to:

The maximum difference between this objective function and the original objective function is

After the objective function is adjusted, the whole problem is a linear program, so the optimal solution can be obtained quickly. Then, from the solution, the fuel cell heat production is obtained as Q_fc,t = η_hr η_h2e P_fc,t Δt, where η_hr is the heat recovery efficiency, η_h2e is the heat-to-power ratio of the fuel cell, and P_fc,t is the fuel cell output power.
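The heat production formula Q_fc,t = η_hr η_h2e P_fc,t Δt is a direct product of four quantities, so it can be evaluated as soon as the upper-level solution provides P_fc,t. A minimal sketch follows; the function name and the numerical values in the usage note are illustrative assumptions only.

```python
def fuel_cell_heat(p_fc, dt, eta_hr, eta_h2e):
    """Fuel cell heat production Q_fc = eta_hr * eta_h2e * P_fc * dt.

    p_fc    -- fuel cell output power from the upper-level solution
    dt      -- time slot length
    eta_hr  -- heat recovery efficiency
    eta_h2e -- heat-to-power ratio of the fuel cell
    """
    return eta_hr * eta_h2e * p_fc * dt
```

For example, with an assumed output power of 10, a slot length of 1, η_hr = 0.8 and η_h2e = 1.5, the heat passed to the lower-level sub-problem is 12 in the corresponding energy unit.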

The fuel cell heat production is used as the input state of the lower-level sub-problem model. The method of re-modeling the lower-level sub-problem model based on the Markov game framework includes:

In the formula,

where κ_th is the penalty coefficient.
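One common way a penalty coefficient such as κ_th enters a reward is by weighting the deviation of the indoor temperature from the comfort range. The sketch below illustrates that shape only. The reward form, the function names, and the way the cost term is combined with the penalty are all assumptions, not the reward expression of the original text.

```python
def comfort_violation(temp, temp_min, temp_max):
    """Distance of the indoor temperature outside [temp_min, temp_max]; 0 if inside."""
    return max(0.0, temp_min - temp) + max(0.0, temp - temp_max)


def room_reward(cost, temp, temp_min, temp_max, kappa_th):
    """Assumed reward shape: negative of (cost + kappa_th * comfort violation)."""
    return -(cost + kappa_th * comfort_violation(temp, temp_min, temp_max))
```

With this shape, a larger κ_th makes the agent trade more operating cost for keeping the temperature inside the comfort range.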

The method of solving with the multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem includes:

Compute the loss function L(θ_i) and the policy gradient of the deep neural network

The loss function L(θ_i) and the policy gradient

for training the deep neural network are expressed as:
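In deterministic policy gradient methods, the critic loss is typically the mean squared error between the critic output and a bootstrapped target y computed from the target critic. The sketch below shows that generic pattern in plain Python. The discount factor gamma and the target form y = r + gamma * Q' are assumptions from the standard algorithm family, not the expressions of the original text.

```python
def td_targets(rewards, next_q_values, gamma):
    """Assumed target: y = r + gamma * Q'(o', a'), with Q' from the target critic."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]


def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and targets over a mini-batch."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(targets)
```

In practice these quantities would be computed on tensors sampled from the experience pool and differentiated with respect to the critic parameters θ_i.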

The architecture of the multi-agent attention deep deterministic policy gradient algorithm includes i agents; each agent is provided with a single deep neural network, and each deep neural network includes an actor network, a target actor network, a critic network, and a target critic network. The actor network and the target actor network have the same structure, and the critic network and the target critic network have the same structure;

The output is Q_i(o,a),

where o_i is the local observation state of the i-th agent; a_i is the output action; e_i represents the encoding of the local observation and action of the i-th agent; and Q_i(o,a) is the Q value output by the critic network. In the critic network of the i-th agent, the input of the attention module is

The output is x_i, where x_i represents the contribution of the other agents,

In the formula, W_value,j represents the value transformation matrix associated with the j-th agent;

is a nonlinear activation function;

w_j is the weight associated with the j-th agent,
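The attention step described above can be sketched in plain Python. This follows the common query/key/value pattern: the weight w_j is a softmax over the similarity between agent j's key and agent i's query, and x_i is the weighted sum of activated value vectors. It is a sketch under assumptions: the matrices are shared across agents for brevity (the text associates them with individual agents), the activation is taken to be a leaky ReLU, and all function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leaky_relu(vector, slope=0.01):
    """Elementwise leaky ReLU, used here as the assumed nonlinear activation."""
    return [x if x > 0 else slope * x for x in vector]


def attention_contribution(i, encodings, w_key, w_query, w_value):
    """Return (x_i, weights) for agent i from the other agents' encodings e_j."""
    query = matvec(w_query, encodings[i])
    others = [j for j in range(len(encodings)) if j != i]
    # Similarity score between each other agent's key and agent i's query.
    scores = [
        sum(k * q for k, q in zip(matvec(w_key, encodings[j]), query))
        for j in others
    ]
    # Numerically stable softmax over the scores gives the weights w_j.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Contribution x_i is the weighted sum of activated value vectors.
    values = [leaky_relu(matvec(w_value, encodings[j])) for j in others]
    x_i = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return x_i, weights
```

Agents whose encodings align more closely with agent i's query receive larger weights, so the critic of agent i attends mainly to the agents most relevant to its own value estimate.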

FIG. 3 shows a performance comparison between the method of the present invention and other comparison schemes. Scheme 1 jointly controls the electric energy storage system and the hydrogen energy storage system. Specifically, when there is a surplus of renewable energy, the electric energy storage system and the hydrogen energy storage system are charged; otherwise, they are discharged. In addition, an ON-OFF strategy is used to control the building heat supply power: when the indoor temperature is lower than the lower limit, the input heat power is 0; when the indoor temperature is higher than the upper limit, the input heat power is the maximum heat supply power. Scheme 2 uses the deep Q-network (DQN) algorithm to control the electric energy storage system and the hydrogen energy storage system, and likewise uses the ON-OFF strategy to control the building heat supply power. Scheme 3 uses the multi-agent deep deterministic policy gradient (MADDPG) algorithm to jointly control all energy storage devices and thermal loads. Scheme 4 is similar to the method of the present invention but does not use the attention mechanism. As can be seen from FIG. 4, the method of the present invention significantly reduces the operating cost while maintaining high thermal comfort (for example, an average temperature deviation of less than 0.03 degrees Celsius). Specifically, compared with Scheme 1, Scheme 2, Scheme 3 and Scheme 4, the average operating cost is reduced by 30.09%, 20.31%, 25.66% and 18.53%, respectively.

As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims

1. An operation control method for a hydrogen-containing building energy system based on joint optimization and learning, characterized by comprising the following steps: establishing an expected operating cost minimization problem model of the hydrogen-containing building energy system according to the operating constraints and parameter uncertainty of the hydrogen-containing building energy system; converting the expected operating cost minimization problem into a plurality of single-slot optimization sub-problem models using a Lyapunov optimization framework;

decomposing the single-slot optimization sub-problem model into an upper-level sub-problem model corresponding to the electric-hydrogen subsystem and a lower-level sub-problem model corresponding to the thermal energy subsystem;

solving the upper-level sub-problem model using a convex optimization method, and calculating the fuel cell heat production from the solution of the upper-level sub-problem;

taking the fuel cell heat production as the input state of the lower-level sub-problem model; re-modeling the lower-level sub-problem model based on a Markov game framework, and solving it using a multi-agent attention deep deterministic policy gradient algorithm to obtain an optimal control strategy for the thermal energy subsystem;

and controlling the operation of the hydrogen-containing building energy system in real time according to the convex optimization solution method of the upper-level sub-problem model and the optimal control strategy of the thermal energy subsystem.

2. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 1, wherein the expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:

s.t. the operating constraints of the electric energy subsystem, the operating constraints of the hydrogen energy subsystem, and the operating constraints of the thermal energy subsystem;

in the formula, C_1,t is the cost of buying and selling electricity in time slot t, C_2,t is the carbon emission cost in time slot t, C_3,t is the loss cost of the electric energy storage system in time slot t, C_4,t is the operation and maintenance cost of the hydrogen energy subsystem in time slot t, C_5,t is the loss cost of the thermal energy subsystem in time slot t, C_6,t is the natural gas purchase cost in time slot t, and T represents the time slot length; the decision variables Θ include: the energy trading volume between the local energy system and the main power grid, the charging and discharging power of the electric energy storage system, the input power of the electrolyzer, the output power of the fuel cell, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption.

3. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 2, wherein the method of converting the expected operating cost minimization problem into a plurality of single-slot optimization sub-problem models using a Lyapunov optimization framework comprises:

determining the controllability of the hydrogen-containing building energy system; selecting a hydrogen-containing building energy system that meets the controllability conditions, and constructing virtual queues for the electric energy subsystem and the hydrogen energy subsystem; defining a Lyapunov function from the virtual queues, and calculating the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost; and converting the expected operating cost minimization problem model of the hydrogen-containing building energy system into a plurality of single-slot optimization sub-problem models by minimizing the weighted sum ΔY(t), and calculating and determining the optimal system parameters in the single-slot optimization sub-problem models.

4. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 3, wherein the controllability conditions are expressed as:

v^max > τ^max,

v^min > τ^min,

v^max = max_t v_t, τ^max = max_t τ_t, v^min = min_t v_t, τ^min = min_t τ_t,

in the formula, v^max and v^min respectively represent the highest and lowest electricity purchase prices; τ^max and τ^min respectively represent the highest and lowest electricity selling prices; η_bc and η_bd respectively represent the charging efficiency and the discharging efficiency of the electric energy storage system; μ_c is a weighting parameter that represents the importance of carbon emissions relative to energy costs;

and

respectively represent the maximum rate and the minimum rate of carbon emission; ψ_BESS is the depreciation coefficient of the electric energy storage system; B^max and B^min respectively represent the maximum and minimum energy storage levels of the electric energy storage system;

and

respectively represent the rated charging power and the rated discharging power of the electric energy storage system; ω_el and ω_fc respectively represent the conversion coefficients of the electrolyzer and the fuel cell;

and

are indicator variables that respectively indicate whether the electrolyzer and the fuel cell are on or off; H^max and H^min respectively represent the maximum and minimum energy storage levels of the hydrogen energy storage system;

and

respectively represent the rated power of the electrolyzer and the fuel cell; Δt represents the time slot length.

5. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 4, wherein the method of calculating the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost comprises:

the Lyapunov function L(t) is expressed as:

in the formula, X_B,t = B_t + W_B and X_H,t = H_t + W_H; ω_r is a weighting coefficient that unifies the dimensions of X_B,t and X_H,t; B_t is the energy storage level of the electric energy storage system in time slot t; H_t is the energy storage level of the hydrogen energy storage system in time slot t; W_B is the parameter of the optimal electric energy storage system; W_H is the parameter of the optimal hydrogen energy storage system;

the dynamic constraints that B_t and H_t must satisfy are respectively expressed as:

in the formula, P_bc,t and P_bd,t respectively represent the charging power and the discharging power of the electric energy storage system; P_el,t and P_fc,t respectively represent the input power of the electrolyzer and the output power of the fuel cell in time slot t;

the single-slot Lyapunov drift is expressed as:

Λ_t = E{L(t+1)-L(t)|X(t)},

in the formula, X(t) = (X_B,t, X_H,t), and E{·} denotes the expectation operator;

the single-slot Lyapunov drift Λ_t can then be bounded as:

Λ_t ≤ ξ_B + ξ_H + E{Γ₀|X(t)},

the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost is calculated as:

where V is a weighting parameter.

6. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 5, wherein the single-slot optimization sub-problem model is expressed as:

the parameter W_B of the optimal electric energy storage system is calculated as:

the parameter W_H of the optimal hydrogen energy storage system is calculated as:

7. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 6, wherein the single-slot optimization sub-problem model is decomposed, according to information determinism, into an upper-level sub-problem model corresponding to the electric-hydrogen subsystem and a lower-level sub-problem model corresponding to the thermal energy subsystem, and the method comprises:

the upper-level sub-problem model corresponding to the electric-hydrogen subsystem is expressed as:

s.t. the operating constraints of the electric energy subsystem and the operating constraints of the hydrogen energy subsystem;

the lower-level sub-problem model corresponding to the thermal energy subsystem is expressed as:

min V(C_5,t + C_6,t)

s.t. the operating constraints of the thermal energy subsystem.

8. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 7, wherein the method of re-modeling the lower-level sub-problem model based on the Markov game framework comprises:

the environmental state of the thermal energy subsystem is expressed as:

s_t = (Q_fc,t, Q_th,t, β_in,i,t, β_out,i,t, t),

in the formula, Q_fc,t represents the fuel cell heat production in time slot t; Q_th,t represents the energy storage level of the thermal energy storage system in the thermal energy subsystem in time slot t; β_in,i,t is the indoor temperature of the i-th room in time slot t; β_out,t is the outdoor temperature in time slot t; t represents the time interval between two consecutive action decisions executed by the current hydrogen-containing building energy system; η_tc and η_td respectively represent the injection efficiency and the release efficiency of the thermal energy storage system in the thermal energy subsystem; P_tc,t and P_td,t respectively represent the injection power and the release power of the thermal energy storage system in the thermal energy subsystem in time slot t;

the action of the thermal energy subsystem is expressed as:

a_t = (P_sp,1,t, P_sp,2,t, …, P_sp,i,t), 1 ≤ i ≤ N_b,

in the formula, P_sp,i,t is the heat supply power of the i-th room in time slot t; N_b is the number of rooms;

the reward of the thermal energy subsystem is expressed as:

in the formula,

where κ_th is the penalty coefficient.

9. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 8, wherein the method of solving using the multi-agent attention deep deterministic policy gradient algorithm comprises:

at the beginning of each time slot, acquiring the environmental state of the thermal energy subsystem;

the deep neural network outputs the current heat supply action of the hydrogen-containing building energy system according to the current environmental state of the thermal energy subsystem, so as to control the thermal energy subsystem;

acquiring the reward of the next time slot and the environmental state of the next time slot; storing the reward and the environmental state of each time slot in an experience pool;

computing the loss function L(θ_i) and the policy gradient of the deep neural network

extracting training samples from the experience pool, and training the deep neural network using the multi-agent attention deep deterministic policy gradient algorithm to obtain the loss function L(θ_i) and the policy gradient

iterating the deep neural network to obtain the optimal control strategy of the thermal energy subsystem;

the loss function L(θ_i) and the policy gradient

for training the deep neural network are expressed as:

where π represents the policy of the agent (represented by the actor network); y represents the Q value output by the target critic network; π' represents the target policy of the agent (represented by the target actor network);

represents the Q value output by the critic network of the i-th agent under the policy π; π_i(a_i|o_i) represents the actor network output of the i-th agent.

10. The operation control method for a hydrogen-containing building energy system based on joint optimization and learning according to claim 9, wherein the architecture of the multi-agent attention deep deterministic policy gradient algorithm comprises i agents, each agent is provided with a single deep neural network, and each deep neural network comprises an actor network, a target actor network, a critic network, and a target critic network; the actor network and the target actor network have the same structure, and the critic network and the target critic network have the same structure;

the number of neurons in the input layer of the actor network equals the number of components of the environmental state s_t, and the number of neurons in the output layer equals the number of components of the action a_t; the critic network of the agent comprises an observation-action encoder module, an attention mechanism module, and a multilayer perceptron module;

in the attention mechanism module, the input to the actor network of the i-th agent is o_i and the output is a_i; the input to the critic network includes o_i, a_i and

the output is Q_i(o,a),

where o_i is the local observation state of the i-th agent; a_i is the output action; e_i represents the encoding of the local observation and action of the i-th agent; Q_i(o,a) is the Q value output by the critic network; in the critic network of the i-th agent, the input to the attention module is

the output is x_i, where x_i represents the contribution of the other agents;

the contribution x_i of the other agents is expressed as:

in the formula, W_value,j represents the value transformation matrix associated with the j-th agent;

is a nonlinear activation function;

w_j is the weight associated with the j-th agent;

the weight w_j associated with the j-th agent is expressed as:

in the formula, W_key,i and W_query,i are respectively the transformation matrices associated with the i-th agent.