CN115130733A - Hydrogen-containing building energy system operation control method combining optimization and learning - Google Patents


Info

Publication number
CN115130733A
Authority
CN
China
Prior art keywords
hydrogen
subsystem
energy storage
slot
storage system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210631486.3A
Other languages
Chinese (zh)
Other versions
CN115130733B (en)
Inventor
余亮
张予涵
任静怡
岳东
窦春霞
张腾飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210631486.3A
Publication of CN115130733A
Application granted
Publication of CN115130733B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for AC mains or AC distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/30 The power source being a fuel cell
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2310/00 The network for supplying or distributing electric power characterised by its spatial reach or by the load
    • H02J2310/10 The network having a local or delimited stationary reach
    • H02J2310/12 The local stationary network supplying a household or a building
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Power Engineering (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an operation control method for a hydrogen-containing building energy system that combines optimization and learning, in the field of building energy system operation control. The method comprises: establishing an expected operating cost minimization problem model of the hydrogen-containing building energy system and converting it into multiple single-time-slot optimization sub-problem models; decomposing each single-time-slot optimization sub-problem model into an upper-level sub-problem model and a lower-level sub-problem model; solving the upper-level sub-problem model with a convex optimization method and calculating the fuel cell heat production from its solution; taking the fuel cell heat production as the input state of the lower-level sub-problem model; solving the lower-level sub-problem model to obtain the optimal control strategy of the thermal energy subsystem; and controlling the operation of the hydrogen-containing building energy system in real time. By combining the advantages of model-based convex optimization and model-free learning, the invention minimizes operating cost while maintaining high thermal comfort.

Description

An operation control method for a hydrogen-containing building energy system combining optimization and learning

Technical field

The invention belongs to the field of building energy system operation control, and in particular relates to an operation control method for a hydrogen-containing building energy system.

Background

Buildings account for a large share of the world's energy consumption and carbon emissions. In 2019, buildings consumed about 30% of global energy and produced about 28% of global carbon emissions. The global energy supply still relies mainly on non-renewable sources such as fossil fuels, which makes energy depletion and environmental pollution increasingly serious. In recent years, hydrogen energy has attracted wide attention because it is clean, renewable, available from many sources, convenient to store and transport, and highly efficient to use, and it is widely regarded as a promising alternative to fossil fuels. In addition, coordinating a hydrogen energy storage system with other energy storage systems (such as thermal and electrical energy storage systems) helps improve building energy efficiency. The operation control of hydrogen-containing building energy systems is therefore worth studying in depth.

Existing studies have proposed several operation control methods for hydrogen-containing building energy systems, such as stochastic programming and model predictive control. These methods aim to minimize the system operating cost (mainly energy cost and carbon emission cost). Although they have made some progress, none of them considers building thermal dynamics. As a result, high building thermal inertia (the weakened and delayed response of indoor temperature to an initial excitation, such as heating being stopped abruptly) has not been fully exploited to reduce operating cost.

When building thermal dynamics are considered in a hydrogen-containing building energy system, optimal operation control faces four challenges: (1) many system parameters are uncertain; (2) many operating constraints are coupled in time and space; (3) the fuel cell in the hydrogen energy storage system generates electricity and heat simultaneously, which couples the electrical and thermal energy flows; (4) it is difficult to build an explicit building thermal dynamics model that is both accurate and tractable for building control. Specifically, the action-space dimension of single-agent deep reinforcement learning grows sharply with the number of thermal zones, while multi-agent deep reinforcement learning must coordinate heterogeneous agents and has difficulty learning effectively as the number of agents increases.

Summary of the invention

The purpose of the invention is to provide an operation control method for a hydrogen-containing building energy system that combines optimization and learning. It exploits the complementary advantages of a model-based convex optimization method and a model-free learning method to minimize operating cost while maintaining high thermal comfort.

To achieve the above purpose, the invention adopts the following technical solution:

A first aspect of the invention provides an operation control method for a hydrogen-containing building energy system that combines optimization and learning, including:

establishing an expected operating cost minimization problem model of the hydrogen-containing building energy system according to its operating constraints and parameter uncertainties, and using the Lyapunov optimization framework to transform the expected operating cost minimization problem into multiple single-time-slot optimization sub-problem models;

decomposing the single-time-slot optimization sub-problem model into an upper-level sub-problem model corresponding to the electricity-hydrogen subsystem and a lower-level sub-problem model corresponding to the thermal energy subsystem;

solving the upper-level sub-problem model with a convex optimization method, and calculating the fuel cell heat production from the solution of the upper-level sub-problem;

taking the fuel cell heat production as the input state of the lower-level sub-problem model, re-modeling the lower-level sub-problem model within the Markov game framework, and solving it with a multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem;

controlling the operation of the hydrogen-containing building energy system in real time according to the convex optimization solution of the upper-level sub-problem model and the optimal control strategy of the thermal energy subsystem.
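The real-time control step described above can be pictured as a short pipeline. The following is only a minimal sketch under stated assumptions: `solve_upper`, `fuel_cell_heat`, the per-room policies, and the dictionary keys `p_fc`, `q_th`, and `temp_out` are hypothetical names introduced for illustration and do not come from the patent.

```python
from typing import Callable, Dict, List


def control_step(
    solve_upper: Callable[[Dict[str, float]], Dict[str, float]],
    fuel_cell_heat: Callable[[float], float],
    thermal_policies: List[Callable[[List[float]], float]],
    observation: Dict[str, float],
    room_temps: List[float],
) -> Dict[str, object]:
    """One control step of the two-level scheme (illustrative only)."""
    # Upper level: electricity-hydrogen decisions from a convex solver
    # (hypothetical callable; the patent does not name a solver).
    upper = solve_upper(observation)
    # The fuel cell heat production follows from its scheduled output power.
    q_fc = fuel_cell_heat(upper["p_fc"])
    # Lower level: each room's learned policy maps its local state
    # (fuel cell heat, storage level, indoor and outdoor temperature)
    # to a heat supply power.
    heat_supply = [
        policy([q_fc, observation["q_th"], temp, observation["temp_out"]])
        for policy, temp in zip(thermal_policies, room_temps)
    ]
    return {"upper": upper, "q_fc": q_fc, "heat_supply": heat_supply}
```

Passing the solver and the policies in as callables keeps the sketch independent of any particular optimization or deep learning library.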

Preferably, the expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:

Figure BDA0003680126570000031

s.t. the operating constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem;

where $C_{1,t}$ is the cost of buying and selling electricity in time slot t, $C_{2,t}$ is the carbon emission cost in time slot t, $C_{3,t}$ is the loss cost of the electrical energy storage system in time slot t, $C_{4,t}$ is the operation and maintenance cost of the hydrogen energy subsystem in time slot t, $C_{5,t}$ is the loss cost of the thermal energy subsystem in time slot t, $C_{6,t}$ is the natural gas purchase cost in time slot t, and T is the length of the time horizon in slots. The decision variables Θ include: the energy traded between the local energy system and the main grid, the charging and discharging power of the electrical energy storage system, the electrolyzer input power, the fuel cell output power, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption.

Preferably, the method for using the Lyapunov optimization framework to transform the expected operating cost minimization problem into multiple single-time-slot optimization sub-problem models includes:

determining the controllability of the hydrogen-containing building energy system; for a system that satisfies the controllability conditions, constructing virtual queues for the electrical energy subsystem and the hydrogen energy subsystem; defining a Lyapunov function from the virtual queues and calculating the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost; and, by minimizing ΔY(t), transforming the expected operating cost minimization problem model into multiple single-slot optimization sub-problem models and determining the optimal system parameters in those sub-problem models.

Preferably, the controllability conditions are expressed as:

$v_{\max} > \tau_{\max}$,

$v_{\min} > \tau_{\min}$,

Figure BDA0003680126570000032

Figure BDA0003680126570000033

Figure BDA0003680126570000041

Figure BDA0003680126570000042

$v_{\max} = \max_t v_t$, $\tau_{\max} = \max_t \tau_t$, $v_{\min} = \min_t v_t$, $\tau_{\min} = \min_t \tau_t$,

Figure BDA0003680126570000043
Figure BDA0003680126570000044

where $v_{\max}$ and $v_{\min}$ are the highest and lowest electricity purchase prices; $\tau_{\max}$ and $\tau_{\min}$ are the highest and lowest electricity selling prices; $\eta_{bc}$ and $\eta_{bd}$ are the charging and discharging efficiencies of the electrical energy storage system; $\mu_c$ is a weighting parameter that expresses the importance of carbon emissions relative to energy cost; the symbols shown in Figure BDA0003680126570000045 and Figure BDA0003680126570000046 are the maximum and minimum carbon emission rates; $\psi_{BESS}$ is the depreciation coefficient of the electrical energy storage system; $B_{\max}$ and $B_{\min}$ are the maximum and minimum energy storage levels of the electrical energy storage system; the symbols shown in Figure BDA0003680126570000047 and Figure BDA0003680126570000048 are the rated injection power and rated release power of the electrical energy storage system; $\omega_{el}$ and $\omega_{fc}$ are the conversion coefficients of the electrolyzer and the fuel cell; the symbols shown in Figure BDA0003680126570000049 and Figure BDA00036801265700000410 are indicator variables for whether the electrolyzer and the fuel cell are switched on; $H_{\max}$ and $H_{\min}$ are the maximum and minimum energy storage levels of the hydrogen energy storage system; the symbols shown in Figure BDA00036801265700000411 and Figure BDA00036801265700000412 are the rated power of the electrolyzer and the fuel cell; and $\Delta t$ is the time slot length.

Preferably, the method for calculating the weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost includes:

The Lyapunov function L(t) is expressed as:

Figure BDA00036801265700000413

where $X_{B,t} = B_t + W_B$, $X_{H,t} = H_t + W_H$, and $\omega_r$ is a weighting coefficient that unifies the dimensions of $X_{B,t}$ and $X_{H,t}$; $B_t$ is the energy storage level of the electrical energy storage system in time slot t, $H_t$ is the energy storage level of the hydrogen energy storage system in time slot t, $W_B$ is the optimal parameter of the electrical energy storage system, and $W_H$ is the optimal parameter of the hydrogen energy storage system. The dynamics constraints that $B_t$ and $H_t$ must satisfy are:

Figure BDA00036801265700000414
Figure BDA0003680126570000051

where $P_{bc,t}$ and $P_{bd,t}$ are the charging and discharging power of the electrical energy storage system, and $P_{el,t}$ and $P_{fc,t}$ are the electrolyzer input power and the fuel cell output power in time slot t.

The single-slot Lyapunov drift is expressed as:

$\Lambda_t = E\{L(t+1) - L(t) \mid X(t)\}$,

Figure BDA0003680126570000052

Figure BDA0003680126570000053

Figure BDA0003680126570000054

where $X(t) = (X_{B,t}, X_{H,t})$ and $E\{\cdot\}$ denotes expectation.

The single-slot Lyapunov drift $\Lambda_t$ can then be bounded as:

$\Lambda_t \le \xi_{BH} + E\{\Gamma_0 \mid X(t)\}$,

Figure BDA0003680126570000055

The weighted sum ΔY(t) of the single-slot Lyapunov drift and the operating cost is expressed as:

Figure BDA0003680126570000056

where V is a weighting parameter.
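To make the drift-plus-cost idea concrete, the sketch below evaluates it on a single sample path. It uses the virtual queue definitions $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$ from the text. The quadratic form of L(t) is an assumption, since the source shows the Lyapunov function only as an image, and the function names are hypothetical.

```python
from typing import Tuple


def virtual_queues(b_t: float, h_t: float, w_b: float, w_h: float) -> Tuple[float, float]:
    """X_{B,t} = B_t + W_B and X_{H,t} = H_t + W_H, as defined in the text."""
    return b_t + w_b, h_t + w_h


def lyapunov(x_b: float, x_h: float, omega_r: float) -> float:
    # ASSUMPTION: a quadratic Lyapunov function weighted by omega_r;
    # the patent gives L(t) only as an image.
    return 0.5 * x_b ** 2 + 0.5 * omega_r * x_h ** 2


def drift_plus_cost(
    x_now: Tuple[float, float],
    x_next: Tuple[float, float],
    omega_r: float,
    v: float,
    cost: float,
) -> float:
    """Sample-path value of L(t+1) - L(t) + V * cost (no expectation taken)."""
    drift = lyapunov(x_next[0], x_next[1], omega_r) - lyapunov(x_now[0], x_now[1], omega_r)
    return drift + v * cost
```

A larger V puts more weight on the operating cost relative to keeping the virtual queues small, which is the trade-off the weighting parameter controls.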

Preferably, the single-slot optimization sub-problem model is expressed as:

Figure BDA0003680126570000057

Figure BDA0003680126570000061

Figure BDA0003680126570000062

Figure BDA0003680126570000063

Figure BDA0003680126570000064

Figure BDA0003680126570000065

The optimal parameter $W_B$ of the electrical energy storage system is calculated as:

Figure BDA0003680126570000066

The optimal parameter $W_H$ of the hydrogen energy storage system is calculated as:

Figure BDA0003680126570000067

s.t. the operating constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem.

Preferably, the single-slot optimization sub-problem model is decomposed, according to information determinism, into an upper-level sub-problem model corresponding to the electricity-hydrogen subsystem and a lower-level sub-problem model corresponding to the thermal energy subsystem, as follows:

The upper-level sub-problem model corresponding to the electricity-hydrogen subsystem is expressed as:

Figure BDA0003680126570000068

s.t. the operating constraints of the electrical energy subsystem and the hydrogen energy subsystem;

The lower-level sub-problem model corresponding to the thermal energy subsystem is expressed as:

$\min\; V\,(C_{5,t} + C_{6,t})$, s.t. the operating constraints of the thermal energy subsystem.

Preferably, the method for re-modeling the lower-level sub-problem model within the Markov game framework includes:

The environment state of the thermal energy subsystem is expressed as:

$s_t = (Q_{fc,t},\, Q_{th,t},\, \beta_{in,i,t},\, \beta_{out,t},\, t)$,

Figure BDA0003680126570000071

where $Q_{fc,t}$ is the heat production of the fuel cell in time slot t; $Q_{th,t}$ is the energy storage level of the thermal energy storage system in the thermal energy subsystem in time slot t; $\beta_{in,i,t}$ is the indoor temperature of the i-th room in time slot t; $\beta_{out,t}$ is the outdoor temperature in time slot t; t indexes the time slot, that is, the interval between two consecutive action decisions of the hydrogen-containing building energy system; $\eta_{tc}$ and $\eta_{td}$ are the injection and release efficiencies of the thermal energy storage system; and $P_{tc,t}$ and $P_{td,t}$ are the injection and release power of the thermal energy storage system in time slot t;

The action of the thermal energy subsystem is expressed as:

$a_t = (P_{sp,1,t},\, P_{sp,2,t},\, \dots,\, P_{sp,i,t}),\ 1 \le i \le N_b$,

where $P_{sp,i,t}$ is the heat supply power of the i-th room in time slot t, and $N_b$ is the number of rooms;

The reward of the thermal energy subsystem is expressed as:

Figure BDA0003680126570000072

where the term shown in Figure BDA0003680126570000073 appears in the reward and $\kappa_{th}$ is the penalty coefficient.
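The state and action definitions above can be mirrored in a small data structure. This is a minimal sketch, not the patent's implementation: the class and field names are hypothetical, and the linear storage update is an assumption, because the source shows the dynamics of $Q_{th,t}$ only as an image. The reward is omitted for the same reason.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThermalState:
    q_fc: float            # fuel cell heat production Q_{fc,t}
    q_th: float            # thermal storage level Q_{th,t}
    temp_in: List[float]   # indoor temperature of each room
    temp_out: float        # outdoor temperature
    t: int                 # time slot index

    def observation(self, i: int) -> Tuple[float, float, float, float, int]:
        """Local observation of the agent controlling room i (0-based)."""
        return (self.q_fc, self.q_th, self.temp_in[i], self.temp_out, self.t)


def next_storage_level(
    q_th: float, p_tc: float, p_td: float, eta_tc: float, eta_td: float, dt: float
) -> float:
    # ASSUMPTION: a standard linear storage update with injection
    # efficiency eta_tc and release efficiency eta_td.
    return q_th + eta_tc * p_tc * dt - p_td * dt / eta_td
```

With one agent per room, the joint action is simply the list of heat supply powers, one entry per room, matching $a_t$.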

Preferably, the method for solving with the multi-agent attention deep deterministic policy gradient algorithm includes:

at the beginning of each time slot, obtaining the environment state of the thermal energy subsystem;

using the deep neural network to output, from the current environment state of the thermal energy subsystem, the current heat supply action of the hydrogen-containing building energy system, and controlling the thermal energy subsystem accordingly;

obtaining the reward and the environment state of the next time slot, and storing the reward and environment state of each time slot in the experience pool;

sampling training examples from the experience pool and training the deep neural network with the multi-agent attention deep deterministic policy gradient algorithm: the loss function $L(\theta_i)$ and the policy gradient (Figure BDA0003680126570000081) of the deep neural network are calculated, and the network is updated iteratively according to $L(\theta_i)$ and the policy gradient (Figure BDA0003680126570000082) to obtain the optimal control strategy of the thermal energy subsystem.
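The experience pool used in these steps is commonly implemented as a fixed-size replay buffer. The sketch below is one possible form, given as an illustration only: the class name, the capacity handling, and the uniform sampling are assumptions, not details stated in the patent.

```python
import random
from collections import deque
from typing import Any, List, Optional, Tuple

Transition = Tuple[Any, Any, Any, Any]  # (observations, actions, rewards, next observations)


class ReplayBuffer:
    """Fixed-size experience pool with uniform random sampling (illustrative)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A deque with maxlen drops the oldest transition once full.
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, obs: Any, actions: Any, rewards: Any, next_obs: Any) -> None:
        self._data.append((obs, actions, rewards, next_obs))

    def sample(self, batch_size: int) -> List[Transition]:
        # Never request more transitions than are currently stored.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)
```

In a training loop, each time slot would call `store` once and, when enough transitions have accumulated, call `sample` to obtain a mini-batch for the critic and actor updates.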

Preferably, the architecture of the multi-agent attention deep deterministic policy gradient algorithm includes i agents. Each agent has a single deep neural network, and each deep neural network includes an actor network, a target actor network, a critic network, and a target critic network. The actor network and the target actor network have the same structure, and the critic network and the target critic network have the same structure;

the number of neurons in the input layer of the actor network equals the number of components of the environment state $s_t$, and the number of neurons in the output layer equals the number of actions $a_t$; the critic network of each agent includes an action encoder module, an attention mechanism module, and a multilayer perceptron module;

in the attention mechanism module, the input of the actor network of the i-th agent is $o_i$ and its output is $a_i$; the inputs of the critic network include $o_i$, $a_i$, and the quantity shown in Figure BDA0003680126570000083, and its output is $Q_i(o,a)$:

Figure BDA0003680126570000084
Figure BDA0003680126570000085

where $o_i$ is the local observation of the i-th agent; $a_i$ is the output action; $e_i$ is the encoding of the local observation and action of the i-th agent; and $Q_i(o,a)$ is the Q value output by the critic network. In the critic network of the i-th agent, the input of the attention module is the quantity shown in Figure BDA0003680126570000086 and its output is $x_i$, which represents the contribution of the other agents;

The contribution $x_i$ of the other agents is expressed as:

Figure BDA0003680126570000091

where $W_{value,j}$ is the value transformation matrix associated with the j-th agent; the function shown in Figure BDA0003680126570000092 is a nonlinear activation function;

$w_j$ is the weight associated with the j-th agent;

The weight $w_j$ associated with the j-th agent is expressed as:

Figure BDA0003680126570000093

where $W_{key,i}$ and $W_{query,i}$ are the transformation matrices associated with the i-th agent.
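The attention step can be illustrated with a small pure-Python sketch. It follows the common key/query/value form, in which the weights are a softmax over key-query scores and $x_i$ is the weighted sum of transformed encodings of the other agents. Several points are assumptions: the exact formulas appear only as images in the source, a single set of matrices is shared by all agents for brevity, and leaky ReLU stands in for the unspecified nonlinear activation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product on plain Python sequences."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leaky_relu(v: Vector, slope: float = 0.01) -> List[float]:
    # ASSUMPTION: the source only says "a nonlinear activation function".
    return [x if x > 0 else slope * x for x in v]


def attention_contribution(
    i: int, e: Sequence[Vector], w_key: Matrix, w_query: Matrix, w_value: Matrix
) -> Tuple[List[float], List[float]]:
    """Return (weights over the other agents, contribution x_i) for agent i."""
    others = [j for j in range(len(e)) if j != i]
    if not others:
        raise ValueError("attention needs at least two agents")
    query = matvec(w_query, e[i])
    # Score each other agent by the dot product of its key with agent i's query.
    scores = [sum(k * q for k, q in zip(matvec(w_key, e[j]), query)) for j in others]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    values = [leaky_relu(matvec(w_value, e[j])) for j in others]
    x_i = [sum(w * v[k] for w, v in zip(weights, values)) for k in range(len(values[0]))]
    return weights, x_i
```

Because $x_i$ has a fixed size regardless of how many other agents exist, the critic input does not grow with the number of rooms.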

Preferably, the loss function $L(\theta_i)$ and the policy gradient (Figure BDA0003680126570000094) used to train the deep neural network are expressed as:

Figure BDA0003680126570000095

Figure BDA0003680126570000096

Figure BDA0003680126570000097

where π is the agent's policy (represented by the actor network); y is the Q value output by the target critic network; π′ is the agent's target policy (represented by the target actor network); the quantity shown in Figure BDA0003680126570000098 is the Q value output by the critic network of the i-th agent under policy π; and $\pi_i(a_i \mid o_i)$ is the output of the actor network of the i-th agent.
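As a rough illustration of how a critic loss of this kind is typically evaluated, the sketch below computes one-step temporal-difference targets and a mean squared error. The formulas in the source are images, so this is the standard deep deterministic policy gradient form, not a transcription; the discount factor `gamma` and the function names are assumptions.

```python
from typing import List, Sequence


def td_targets(rewards: Sequence[float], next_q: Sequence[float], gamma: float) -> List[float]:
    """y = r + gamma * Q'(o', a'), with Q' taken from the target critic."""
    # ASSUMPTION: one-step target with discount factor gamma,
    # which does not appear in this part of the source.
    return [r + gamma * q for r, q in zip(rewards, next_q)]


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic outputs and targets over a mini-batch."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

For example, with rewards `[1.0, 0.0]`, target-critic values `[2.0, 4.0]`, and `gamma = 0.5`, both targets equal 2.0.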

Compared with the prior art, the invention has the following beneficial effects:

The electricity-hydrogen subsystem is operated by optimization based on the upper-level sub-problem model, and the optimization result is then used as the input state for operating the thermal energy subsystem. Multi-agent deep reinforcement learning is used to learn the optimal operation control strategy of the thermal energy subsystem, which avoids heterogeneous agents, and the attention mechanism makes the learning of that strategy highly scalable.

The invention combines the advantages of a model-based convex optimization method and a model-free learning method. It minimizes operating cost while maintaining high thermal comfort, without requiring prior information about the uncertain parameters or an explicit building thermal dynamics model.

Description of drawings

FIG. 1 is a flowchart of an operation control method for a hydrogen-containing building energy system combining optimization and learning, provided by an embodiment of the present invention;

FIG. 2 is a network framework diagram of the multi-agent attention deep deterministic policy gradient algorithm of the present invention;

FIG. 3 compares the average temperature deviation of the embodiment of the present invention with that of other schemes;

FIG. 4 compares the average operating cost of the embodiment of the present invention with that of other schemes.

Detailed description

The present invention is further described below with reference to the accompanying drawings. The following embodiments are intended only to illustrate the technical solutions of the present invention more clearly and do not limit its scope of protection.

An operation control method for a hydrogen-containing building energy system combining optimization and learning comprises:

Establishing an expected operating cost minimization problem model of the hydrogen-containing building energy system according to its operating constraints and parameter uncertainties;

The expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:

Figure BDA0003680126570000101

Figure BDA0003680126570000102

$C_{2,t} = \mu_c \mu_{e,t} P_{g,t} \Delta t$

$C_{3,t} = \psi_{BESS}(|P_{bc,t}| + |P_{bd,t}|)$

Figure BDA0003680126570000111

$C_{5,t} = \psi_{TESS}(|P_{tc,t}| + |P_{td,t}|)$

Figure BDA0003680126570000112

s.t. the operating constraints of the electric energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem;

where $C_{1,t}$ is the cost of buying and selling electricity in time slot $t$; $C_{2,t}$ is the carbon emission cost in time slot $t$; $C_{3,t}$ is the loss cost of the electric energy storage system in time slot $t$; $C_{4,t}$ is the operation and maintenance cost of the hydrogen energy subsystem in time slot $t$; $C_{5,t}$ is the loss cost of the thermal energy subsystem in time slot $t$; $C_{6,t}$ is the natural gas purchase cost in time slot $t$; $T$ denotes the time slot length; $v_t$ and $\tau_t$ denote the electricity purchase price and selling price in time slot $t$, respectively; $P_{g,t}$ is the amount of energy traded between the hydrogen-containing building energy system and the main grid in time slot $t$; $\mu_c$ is the carbon emission cost coefficient, in RMB/kg; $\mu_{e,t}$ is the carbon emission rate of the main grid in time slot $t$; $\psi_{BESS}$ is the battery depreciation coefficient, in RMB/kW; $P_{bc,t}$ and $P_{bd,t}$ denote the charging power and discharging power of the electric energy storage system, respectively;
Figure BDA0003680126570000113
and
Figure BDA0003680126570000114
denote the operation and maintenance cost, startup cost and shutdown cost of component $x$ ($x \in \{el, fc\}$) in the hydrogen energy storage system, respectively, where "el" and "fc" denote the electrolyzer and the fuel cell, respectively;
Figure BDA0003680126570000115
and
Figure BDA0003680126570000116
denote the logical indicator variables for the ON/OFF state, startup state and shutdown state of component $x$, respectively, where
Figure BDA0003680126570000117
Figure BDA0003680126570000118
$\psi_{TESS}$ is the depreciation coefficient of the thermal energy storage system, in RMB/kW; $P_{tc,t}$ and $P_{td,t}$ denote the injection power and release power of the thermal energy storage system in time slot $t$, respectively; $\eta_{gb}$ denotes the efficiency of converting natural gas into thermal energy; $P_{gb,t}$ denotes the thermal power output by the natural gas boiler; $\lambda_{gb}$ denotes the price of natural gas, in RMB/kWh.
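The three cost terms that appear in the text in closed form ($C_{2,t}$, $C_{3,t}$ and $C_{5,t}$) can be evaluated directly. The following is a minimal sketch in Python; the function and argument names are illustrative assumptions and are not part of the patent, and the remaining terms ($C_{1,t}$, $C_{4,t}$, $C_{6,t}$) are omitted because their expressions are given only in the figures.

```python
def carbon_cost(mu_c, mu_e_t, p_g_t, dt):
    """C_{2,t} = mu_c * mu_{e,t} * P_{g,t} * dt (carbon emission cost)."""
    return mu_c * mu_e_t * p_g_t * dt


def bess_loss_cost(psi_bess, p_bc_t, p_bd_t):
    """C_{3,t} = psi_BESS * (|P_{bc,t}| + |P_{bd,t}|) (battery depreciation)."""
    return psi_bess * (abs(p_bc_t) + abs(p_bd_t))


def tess_loss_cost(psi_tess, p_tc_t, p_td_t):
    """C_{5,t} = psi_TESS * (|P_{tc,t}| + |P_{td,t}|) (thermal storage depreciation)."""
    return psi_tess * (abs(p_tc_t) + abs(p_td_t))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the patent.
    print(carbon_cost(mu_c=0.05, mu_e_t=0.8, p_g_t=10.0, dt=1.0))
    print(bess_loss_cost(psi_bess=0.01, p_bc_t=5.0, p_bd_t=0.0))
    print(tess_loss_cost(psi_tess=0.02, p_tc_t=0.0, p_td_t=3.0))
```

Because the depreciation terms depend on the absolute values of the charging and discharging powers, any storage activity incurs a cost regardless of direction.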

In the above operating cost minimization problem for the hydrogen-containing building energy system with hybrid hydrogen-electric-thermal energy storage, the decision variables $\Theta$ include: the amount of energy traded between the local energy system and the main grid, the charging and discharging power of the electric energy storage system, the electrolyzer input power, the fuel cell output power, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption. The constraints to be considered are the operating constraints of the hydrogen energy storage system, of the electric energy storage system, and of the thermal energy storage system, together with the constraints on the comfortable room temperature range, as follows:

(1) The hydrogen energy storage system must satisfy the following constraints: $0 \le H_t \le H_{max}$,
Figure BDA0003680126570000121
Figure BDA0003680126570000122
$P_{el,t} \cdot P_{fc,t} = 0$, where $H_{max}$ is the maximum storage capacity of the hydrogen tank;
Figure BDA0003680126570000123
and
Figure BDA0003680126570000124
are the rated powers of the electrolyzer and the fuel cell, respectively.

(2) The electric energy storage system must satisfy the following constraints: $B_{min} \le B_t \le B_{max}$,
Figure BDA0003680126570000125
Figure BDA0003680126570000126
$P_{bc,t} \cdot P_{bd,t} = 0$, where $B_{min}$ and $B_{max}$ are the minimum and maximum energy levels of the electric energy storage system, respectively;
Figure BDA0003680126570000127
are the maximum charging and discharging powers of the electric energy storage system, respectively.

(3) During charging and discharging, the thermal energy storage system must satisfy the following operating constraints:
Figure BDA0003680126570000128
Figure BDA0003680126570000129
$P_{td,t} \cdot P_{tc,t} = 0$, where
Figure BDA00036801265700001210
is the maximum capacity of the thermal energy storage system;
Figure BDA00036801265700001211
and
Figure BDA00036801265700001212
are the maximum release power and maximum injection power of the thermal energy storage system, respectively.

(4) The heat load must satisfy the following operating constraints:
Figure BDA00036801265700001213
$\beta_{in,i,t+1} = F_i(P_{sp,i,t}, \beta_{out,t}, \beta_{in,i,t}, \varepsilon_{i,t})$, where
Figure BDA00036801265700001214
and
Figure BDA00036801265700001215
denote the lower and upper limits of the comfortable temperature range in building $i$, respectively; $\beta_{in,i,t}$ is the indoor temperature of the i-th room in time slot $t$; $F_i$ denotes the thermal dynamics model of building $i$; $\varepsilon_{i,t}$ denotes the random thermal disturbance in time slot $t$;
Figure BDA00036801265700001216
denotes the maximum heat supply power in building $i$.
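The storage constraints that are spelled out in the text share one pattern: the stored energy must stay within its bounds, and charging and discharging must not occur in the same time slot (for example $0 \le H_t \le H_{max}$ with $P_{el,t} \cdot P_{fc,t} = 0$, or $B_{min} \le B_t \le B_{max}$ with $P_{bc,t} \cdot P_{bd,t} = 0$). The sketch below checks only these two conditions; the power-limit constraints are given in the figures and are not reproduced here. The helper name and the tolerance argument are assumptions made for illustration.

```python
def storage_feasible(level, level_min, level_max, p_in, p_out, tol=1e-9):
    """Check the two storage constraints stated in the text.

    level, level_min, level_max: stored energy and its bounds
        (e.g. H_t with 0 and H_max, or B_t with B_min and B_max).
    p_in, p_out: charging-side and discharging-side powers
        (e.g. P_el and P_fc, or P_bc and P_bd).
    tol: numerical tolerance (an illustrative assumption).
    """
    within_bounds = (level_min - tol) <= level <= (level_max + tol)
    # Complementarity: the product of the two powers must be zero.
    not_simultaneous = abs(p_in * p_out) <= tol
    return within_bounds and not_simultaneous


if __name__ == "__main__":
    # Hydrogen tank example with illustrative values.
    print(storage_feasible(level=5.0, level_min=0.0, level_max=10.0,
                           p_in=2.0, p_out=0.0))
    print(storage_feasible(level=5.0, level_min=0.0, level_max=10.0,
                           p_in=2.0, p_out=1.0))
```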

Converting the expected operating cost minimization problem into multiple single-slot optimization sub-problem models using the Lyapunov optimization framework comprises:

Determining the controllability of the hydrogen-containing building energy system; the controllability conditions are expressed as:

$v_{max} > \tau_{max}$,

$v_{min} > \tau_{min}$,

Figure BDA00036801265700001217

Figure BDA0003680126570000131

Figure BDA0003680126570000132

Figure BDA0003680126570000133

$v_{max} = \max_t v_t$, $\tau_{max} = \max_t \tau_t$, $v_{min} = \min_t v_t$, $\tau_{min} = \min_t \tau_t$,
Figure BDA0003680126570000134
Figure BDA0003680126570000135

where $v_{max}$ and $v_{min}$ denote the highest and lowest electricity purchase prices, respectively; $\tau_{max}$ and $\tau_{min}$ denote the highest and lowest electricity selling prices, respectively; $\eta_{bc}$ and $\eta_{bd}$ denote the charging efficiency and discharging efficiency of the electric energy storage system, respectively; $\mu_c$ is a weighting parameter that expresses the importance of carbon emissions relative to energy cost;
Figure BDA0003680126570000136
and
Figure BDA0003680126570000137
denote the maximum and minimum carbon emission rates, respectively; $\psi_{BESS}$ is the depreciation coefficient of the electric energy storage system; $B_{max}$ and $B_{min}$ denote the maximum and minimum energy storage levels of the electric energy storage system, respectively;
Figure BDA0003680126570000138
and
Figure BDA0003680126570000139
denote the rated charging power and rated discharging power of the electric energy storage system, respectively; $\omega_{el}$ and $\omega_{fc}$ denote the conversion coefficients of the electrolyzer and the fuel cell, respectively;
Figure BDA00036801265700001310
and
Figure BDA00036801265700001311
are indicator variables denoting whether the electrolyzer and the fuel cell are switched on, respectively; $H_{max}$ and $H_{min}$ denote the maximum and minimum energy storage levels of the hydrogen energy storage system, respectively;
Figure BDA00036801265700001312
and
Figure BDA00036801265700001313
denote the rated powers of the electrolyzer and the fuel cell, respectively; $\Delta t$ denotes the time slot length.

For a hydrogen-containing building energy system that satisfies the controllability conditions, virtual queues are constructed for the electric energy subsystem and the hydrogen energy subsystem. A Lyapunov function is defined from the virtual queues, and the weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost is calculated as follows:

The Lyapunov function $L(t)$ is expressed as:

Figure BDA00036801265700001314

where $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$; $\omega_r$ is a weighting coefficient that unifies the dimensions of $X_{B,t}$ and $X_{H,t}$; $B_t$ is the energy storage level of the electric energy storage system in time slot $t$; $H_t$ is the energy storage level of the hydrogen energy storage system in time slot $t$; $W_B$ is the optimal parameter of the electric energy storage system; and $W_H$ is the optimal parameter of the hydrogen energy storage system. The dynamics constraints that $B_t$ and $H_t$ must satisfy are expressed as:
Figure BDA0003680126570000141
where $P_{bc,t}$ and $P_{bd,t}$ denote the charging power and discharging power of the electric energy storage system, respectively; $P_{el,t}$ and $P_{fc,t}$ denote the electrolyzer input power and the fuel cell output power in time slot $t$, respectively.
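The virtual queues are simple shifts of the storage levels, $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$. The sketch below computes them and evaluates a Lyapunov function. Note that the exact expression of $L(t)$ is given only in the figure; the quadratic form used here, $\tfrac{1}{2}(X_{B,t}^2 + \omega_r X_{H,t}^2)$, is an assumption based on the form commonly used in Lyapunov optimization and may differ from the patent's definition. All function names are illustrative.

```python
def virtual_queues(b_t, h_t, w_b, w_h):
    """Return (X_B, X_H) with X_B = B_t + W_B and X_H = H_t + W_H."""
    return b_t + w_b, h_t + w_h


def lyapunov(x_b, x_h, omega_r):
    """ASSUMED quadratic Lyapunov function: 0.5 * (X_B^2 + omega_r * X_H^2).

    The patent's own expression is in a figure and is not reproduced here.
    """
    return 0.5 * (x_b ** 2 + omega_r * x_h ** 2)


def drift_sample(queues_now, queues_next, omega_r):
    """One realization of L(t+1) - L(t); the drift is its conditional expectation."""
    return lyapunov(*queues_next, omega_r) - lyapunov(*queues_now, omega_r)


if __name__ == "__main__":
    # Illustrative values only.
    now = virtual_queues(b_t=3.0, h_t=2.0, w_b=-1.0, w_h=-2.0)
    nxt = virtual_queues(b_t=4.0, h_t=2.0, w_b=-1.0, w_h=-2.0)
    print(now, nxt, drift_sample(now, nxt, omega_r=0.5))
```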

The single-slot Lyapunov drift is expressed as:

$\Lambda_t = E\{L(t+1) - L(t) \mid X(t)\}$,

Figure BDA0003680126570000142

Figure BDA0003680126570000143

Figure BDA0003680126570000144

where $X(t) = (X_{B,t}, X_{H,t})$ and $E\{\cdot\}$ denotes the expectation operator.

The single-slot Lyapunov drift $\Lambda_t$ can then be bounded as:

$\Lambda_t \le \xi_B + \xi_H + E\{\Gamma_0 \mid X(t)\}$,

Figure BDA0003680126570000145

The weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost is expressed as:

Figure BDA0003680126570000151

where $V$ is a weighting parameter.

By minimizing the weighted sum $\Delta Y(t)$, the expected operating cost minimization problem model of the hydrogen-containing building energy system is converted into multiple single-slot optimization sub-problem models. The single-slot optimization sub-problem model is expressed as:

Figure BDA0003680126570000152

Figure BDA0003680126570000153

Figure BDA0003680126570000154

Figure BDA0003680126570000155

Figure BDA0003680126570000156

Figure BDA0003680126570000157

The optimal system parameters in the single-slot optimization sub-problem model are then determined. The optimal parameter $W_B$ of the electric energy storage system is calculated as:

Figure BDA0003680126570000158

The optimal parameter $W_H$ of the hydrogen energy storage system is calculated as:

Figure BDA0003680126570000159

s.t. the operating constraints of the electric energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem.

According to information certainty, the single-slot optimization sub-problem model is decomposed into an upper-level sub-problem model corresponding to the electric-hydrogen subsystem and a lower-level sub-problem model corresponding to the thermal energy subsystem, as follows:

The upper-level sub-problem model corresponding to the electric-hydrogen subsystem is expressed as:

Figure BDA0003680126570000161

s.t. the operating constraints of the electric energy subsystem and the hydrogen energy subsystem;

The lower-level sub-problem model corresponding to the thermal energy subsystem is expressed as:

$\min\, V(C_{5,t} + C_{6,t})$

s.t. the operating constraints of the thermal energy subsystem.

The upper-level sub-problem model is solved with a convex optimization method, and the fuel cell heat production is calculated from the solution, as follows:

Because the objective function of the upper-level sub-problem is non-convex, it is convexly relaxed by adjusting the objective function to:
Figure BDA0003680126570000162
The maximum gap between this objective function and the original objective function is
Figure BDA0003680126570000163
After the objective function is adjusted, the whole problem is a linear program, so the optimal solution can be obtained quickly. The fuel cell heat production is then obtained from the solution as $Q_{fc,t} = \eta_{hr} \eta_{h2e} P_{fc,t} \Delta t$, where $\eta_{hr}$ denotes the heat recovery efficiency, $\eta_{h2e}$ denotes the heat-to-power ratio of the fuel cell, and $P_{fc,t}$ denotes the fuel cell output power.
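The coupling between the two levels is the single quantity $Q_{fc,t} = \eta_{hr} \eta_{h2e} P_{fc,t} \Delta t$, computed from the fuel cell output power chosen by the upper-level solution. A minimal sketch follows; the function name and the numerical values are illustrative assumptions.

```python
def fuel_cell_heat(eta_hr, eta_h2e, p_fc_t, dt):
    """Q_{fc,t} = eta_hr * eta_h2e * P_{fc,t} * dt.

    eta_hr:  heat recovery efficiency
    eta_h2e: heat-to-power ratio of the fuel cell
    p_fc_t:  fuel cell output power in slot t (from the upper-level solution)
    dt:      time slot length
    """
    return eta_hr * eta_h2e * p_fc_t * dt


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the patent.
    q_fc = fuel_cell_heat(eta_hr=0.8, eta_h2e=1.0, p_fc_t=10.0, dt=0.5)
    print(q_fc)  # passed to the thermal energy subsystem as part of its state
```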

The fuel cell heat production is used as an input state of the lower-level sub-problem model, and the lower-level sub-problem model is re-modeled within the Markov game framework as follows:

The environment state of the thermal energy subsystem is expressed as:

$s_t = (Q_{fc,t}, Q_{th,t}, \beta_{in,i,t}, \beta_{out,t}, t)$,

Figure BDA0003680126570000171

where $Q_{fc,t}$ denotes the heat production of the fuel cell in time slot $t$; $Q_{th,t}$ denotes the energy storage level of the thermal energy storage system in the thermal energy subsystem in time slot $t$; $\beta_{in,i,t}$ is the indoor temperature of the i-th room in time slot $t$; $\beta_{out,t}$ is the outdoor temperature in time slot $t$; $t$ denotes the current time slot, i.e. the interval between two consecutive action decisions of the hydrogen-containing building energy system; $\eta_{tc}$ and $\eta_{td}$ denote the injection efficiency and release efficiency of the thermal energy storage system, respectively; and $P_{tc,t}$ and $P_{td,t}$ denote the injection power and release power of the thermal energy storage system in time slot $t$, respectively;

The action of the thermal energy subsystem is expressed as:

$a_t = (P_{sp,1,t}, P_{sp,2,t}, \ldots, P_{sp,i,t})$, $1 \le i \le N_b$,

where $P_{sp,i,t}$ is the heat supply power of the i-th room in time slot $t$, and $N_b$ is the number of rooms;

The reward of the thermal energy subsystem is expressed as:

Figure BDA0003680126570000172

where
Figure BDA0003680126570000173
and $\kappa_{th}$ is the penalty coefficient.

Solving with the multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem comprises:

At the beginning of each time slot, the environment state of the thermal energy subsystem is obtained;

According to the current environment state of the thermal energy subsystem, the deep neural network outputs the current heat supply action of the hydrogen-containing building energy system to control the thermal energy subsystem;

The reward and the environment state of the next time slot are obtained, and the reward and environment state of each time slot are stored in the experience pool;

The loss function $L(\theta_i)$ and the policy gradient
Figure BDA0003680126570000181
of the deep neural network are calculated: training samples are drawn from the experience pool, the deep neural network is trained with the multi-agent attention deep deterministic policy gradient algorithm, and the network is updated iteratively according to the loss function $L(\theta_i)$ and the policy gradient
Figure BDA0003680126570000182
to obtain the optimal control strategy of the thermal energy subsystem.
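The experience pool used in these steps can be sketched as a fixed-capacity buffer from which mini-batches are drawn at random. The class below is an illustrative assumption, not the patent's implementation: the class name, the capacity and the tuple layout `(state, action, reward, next_state)` are hypothetical, and storing the action alongside the state and reward is assumed here because off-policy actor-critic training typically needs it.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity experience pool; the oldest transitions are discarded first."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # One transition per time slot.
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the pool size.
        k = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


if __name__ == "__main__":
    pool = ExperiencePool(capacity=3)
    for t in range(5):
        # Placeholder transitions; a real loop would use the environment state,
        # the heat supply action and the observed reward.
        pool.store(state=t, action=0.0, reward=-1.0, next_state=t + 1)
    print(len(pool), pool.sample(2))
```

Sampling at random from the pool, rather than training on consecutive time slots, reduces the correlation between training samples.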

The loss function $L(\theta_i)$ and the policy gradient
Figure BDA0003680126570000183
used to train the deep neural network are expressed as:

Figure BDA0003680126570000184

Figure BDA0003680126570000185

Figure BDA0003680126570000186

where $\pi$ denotes the agents' policy (represented by the actor network); $y$ denotes the output Q value of the target critic network; $\pi'$ denotes the agents' target policy (represented by the target actor network);
Figure BDA0003680126570000187
denotes the Q value output by the critic network of the i-th agent under policy $\pi$; and $\pi_i(a_i|o_i)$ denotes the output of the actor network of the i-th agent.

The architecture of the multi-agent attention deep deterministic policy gradient algorithm includes $i$ agents. Each agent has its own deep neural network, which comprises an actor network, a target actor network, a critic network and a target critic network. The actor network and the target actor network have the same structure, as do the critic network and the target critic network;

The number of neurons in the input layer of the actor network equals the number of components of the environment state $s_t$, and the number of neurons in the output layer equals the number of components of the action $a_t$. The critic network of each agent comprises an action encoder module, an attention mechanism module and a multilayer perceptron module;

In the attention mechanism module, the input of the actor network of the i-th agent is $o_i$ and its output is $a_i$; the input of the critic network includes $o_i$, $a_i$ and
Figure BDA0003680126570000188
and its output is $Q_i(o,a)$,
Figure BDA0003680126570000189
Figure BDA0003680126570000191

where $o_i$ is the local observation of the i-th agent; $a_i$ is the output action; $e_i$ denotes the encoding of the local observation and action of the i-th agent; and $Q_i(o,a)$ is the Q value output by the critic network. In the critic network of the i-th agent, the input of the attention module is
Figure BDA0003680126570000192
and its output is $x_i$, which denotes the contribution of the other agents.

The contribution $x_i$ of the other agents is expressed as:

Figure BDA0003680126570000193

where $W_{value,j}$ denotes the value transformation matrix associated with the j-th agent;
Figure BDA0003680126570000194
is a nonlinear activation function; and $w_j$ is the weight associated with the j-th agent, expressed as:

Figure BDA0003680126570000195

where $W_{key,i}$ and $W_{query,i}$ are the transformation matrices associated with the i-th agent.
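To illustrate how attention weights over the other agents can be obtained from key and query transformations, the sketch below uses the softmax of key-query dot products that is common in attention-based critics. This form is an assumption: the patent's exact expression for $w_j$ is given only in the figure, and the sketch uses one shared key matrix and one shared query matrix for simplicity. All names are illustrative, and plain Python lists stand in for the encodings $e_i$ and $e_j$.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def attention_weights(e_i, e_others, w_query, w_key):
    """ASSUMED form: w_j = softmax_j((W_key e_j) . (W_query e_i)).

    e_i:      encoding of agent i's observation and action
    e_others: encodings e_j of the other agents (j != i)
    """
    query = matvec(w_query, e_i)
    scores = [
        sum(k * q for k, q in zip(matvec(w_key, e_j), query))
        for e_j in e_others
    ]
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights = attention_weights(
        e_i=[1.0, 0.0],
        e_others=[[1.0, 0.0], [0.0, 1.0]],
        w_query=identity,
        w_key=identity,
    )
    print(weights)  # the agent whose encoding aligns with e_i gets the larger weight
```

Because the weights are normalized over however many other agents are present, the same critic structure can be reused as the number of rooms grows, which is consistent with the scalability benefit the description attributes to the attention mechanism.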

The operation of the hydrogen-containing building energy system is controlled in real time according to the convex optimization solution of the upper-level sub-problem model and the optimal control strategy of the thermal energy subsystem.

FIG. 3 shows a performance comparison between the method of the present invention and the other comparison schemes. Scheme 1 jointly controls the electric energy storage system and the hydrogen energy storage system: when there is a surplus of renewable energy, both storage systems are charged; otherwise, both are discharged. In addition, an ON-OFF strategy is used to control the building heat supply power: when the indoor temperature is below the lower limit, the input heat power is 0; when the indoor temperature is above the upper limit, the input heat power is the maximum heat supply power. Scheme 2 uses the deep Q-network (DQN) algorithm to control the electric energy storage system and the hydrogen energy storage system, and likewise uses the ON-OFF strategy to control the building heat supply power. Scheme 3 uses the multi-agent deep deterministic policy gradient (MADDPG) algorithm to jointly control all energy storage devices and heat loads. Scheme 4 is similar to the method of the present invention but does not use the attention mechanism. As FIG. 4 shows, the method of the present invention significantly reduces operating cost while maintaining high thermal comfort (for example, an average temperature deviation of less than 0.03 degrees Celsius). Specifically, compared with Scheme 1, Scheme 2, Scheme 3 and Scheme 4, the average operating cost is reduced by 30.09%, 20.31%, 25.66% and 18.53%, respectively.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make further improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A method for controlling the operation of a hydrogen-containing building energy system by combining optimization and learning, characterized by comprising the following steps: establishing an expected operation cost minimization problem model of the hydrogen-containing building energy system according to the operation constraints and parameter uncertainty of the hydrogen-containing building energy system; converting the expected operation cost minimization problem into a plurality of single-time-slot optimization sub-problem models by using a Lyapunov optimization framework;
decomposing the single-time-slot optimization sub-problem model into an upper-layer sub-problem model corresponding to the electric-hydrogen subsystem and a lower-layer sub-problem model corresponding to the heat energy subsystem;
solving the upper-layer sub-problem model by a convex optimization method, and calculating the heat production of the fuel cell from the solution of the upper-layer sub-problem;
taking the heat production of the fuel cell as an input state of the lower-layer sub-problem model; remodeling the lower-layer sub-problem model based on a Markov game framework, and solving it with a multi-agent attention deep deterministic policy gradient algorithm to obtain an optimal control strategy of the heat energy subsystem;
and controlling the operation of the hydrogen-containing building energy system in real time according to the convex optimization solving method of the upper-layer sub-problem model and the optimal control strategy of the heat energy subsystem.
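The order of the steps in claim 1 can be pictured as a single real-time control step. The following is only a minimal sketch: the function `control_one_slot`, its three callable parameters, and the dictionary keys are hypothetical names introduced for illustration, and the actual convex solver, heat calculation, and learned policy are not specified here.

```python
from typing import Callable, Dict, List, Sequence


def control_one_slot(
    solve_upper: Callable[[Dict[str, float]], Dict[str, float]],
    fuel_cell_heat: Callable[[Dict[str, float]], float],
    thermal_policy: Callable[[Sequence[float]], Sequence[float]],
    elec_hydrogen_obs: Dict[str, float],
    thermal_obs: Sequence[float],
) -> Dict[str, object]:
    """One control step that follows the ordering of the claimed steps.

    All three callables are placeholders (assumptions): a convex solver for the
    upper-layer sub-problem, a map from its solution to fuel-cell heat, and a
    trained policy for the heat energy subsystem.
    """
    # Upper layer: solve the single-slot problem of the electric-hydrogen subsystem.
    upper_solution = solve_upper(elec_hydrogen_obs)
    # Derive the fuel-cell heat production from the upper-layer solution.
    q_fc = fuel_cell_heat(upper_solution)
    # Lower layer: the fuel-cell heat becomes part of the state seen by the policy.
    state: List[float] = [q_fc, *thermal_obs]
    heating = list(thermal_policy(state))
    return {"upper": upper_solution, "q_fc": q_fc, "heating": heating}


# Usage with stub components (all numbers are made up for illustration).
result = control_one_slot(
    solve_upper=lambda obs: {"P_fc": 2.0},
    fuel_cell_heat=lambda sol: 0.5 * sol["P_fc"],  # hypothetical heat ratio
    thermal_policy=lambda s: [0.1 * s[0]] * 2,     # hypothetical two-room policy
    elec_hydrogen_obs={"price": 0.3},
    thermal_obs=[21.0, 5.0],
)
```

The sketch only fixes the data flow: the lower layer never runs before the upper layer has produced the fuel-cell heat for the current slot.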
2. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 1, wherein the expected operation cost minimization problem model of the hydrogen-containing building energy system is expressed by the following formula:
Figure FDA0003680126560000011
s.t. the operation constraint of the electric energy subsystem, the operation constraint of the hydrogen energy subsystem and the operation constraint of the heat energy subsystem;
where $C_{1,t}$ is the cost of buying and selling electricity in time slot $t$, $C_{2,t}$ is the carbon emission cost in time slot $t$, $C_{3,t}$ is the loss cost of the electric energy storage system in time slot $t$, $C_{4,t}$ is the operation and maintenance cost of the hydrogen energy subsystem in time slot $t$, $C_{5,t}$ is the loss cost of the heat energy subsystem in time slot $t$, $C_{6,t}$ is the natural gas purchase cost in time slot $t$, and $T$ represents the time slot length; the decision variables $\Theta$ include: the energy trading volume between the local energy system and the main power grid, the charging and discharging power of the electric energy storage system, the input power of the electrolyzer, the output power of the fuel cell, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption.
3. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 2, wherein converting the expected operation cost minimization problem into a plurality of single-time-slot optimization sub-problem models by using a Lyapunov optimization framework comprises:
judging the controllability of the hydrogen-containing building energy system; for a hydrogen-containing building energy system that satisfies the controllable conditions, constructing virtual queues of the electric energy subsystem and the hydrogen energy subsystem; defining a Lyapunov function according to the virtual queues, and calculating the weighted sum $\Delta Y(t)$ of the single-time-slot Lyapunov drift and the operation cost; and converting the expected operation cost minimization problem model of the hydrogen-containing building energy system into a plurality of single-time-slot optimization sub-problem models by minimizing the weighted sum $\Delta Y(t)$, and calculating and determining the optimal system parameters in the single-time-slot optimization sub-problem models.
4. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 3, wherein the controllable conditions are expressed as follows:
$v_{\max} > \tau_{\max}$
$v_{\min} > \tau_{\min}$
Figure FDA0003680126560000021
Figure FDA0003680126560000022
Figure FDA0003680126560000023
Figure FDA0003680126560000024
$v_{\max} = \max_t v_t$, $\tau_{\max} = \max_t \tau_t$, $v_{\min} = \min_t v_t$, $\tau_{\min} = \min_t \tau_t$
Figure FDA0003680126560000025
Figure FDA0003680126560000031
where $v_{\max}$ and $v_{\min}$ respectively represent the highest and lowest electricity purchase prices; $\tau_{\max}$ and $\tau_{\min}$ respectively represent the highest and lowest electricity selling prices; $\eta_{bc}$ and $\eta_{bd}$ respectively represent the charging efficiency and the discharging efficiency of the electric energy storage system; $\mu_c$ is a weighting parameter that represents the importance of carbon emissions relative to energy costs;
Figure FDA0003680126560000032
and
Figure FDA0003680126560000033
respectively represent the maximum rate and the minimum rate of carbon emission; $\psi_{BESS}$ is the depreciation coefficient of the electric energy storage system; $B_{\max}$ and $B_{\min}$ respectively represent the maximum and minimum energy storage levels of the electric energy storage system;
Figure FDA0003680126560000034
and
Figure FDA0003680126560000035
respectively represent the rated injection power and the rated release power of the electric energy storage system; $\omega_{el}$ and $\omega_{fc}$ respectively represent the conversion coefficients of the electrolyzer and the fuel cell;
Figure FDA0003680126560000036
and
Figure FDA0003680126560000037
are indicator variables that respectively indicate whether the electrolyzer and the fuel cell are on or off; $H_{\max}$ and $H_{\min}$ respectively represent the maximum and minimum energy storage levels of the hydrogen energy storage system;
Figure FDA0003680126560000038
and
Figure FDA0003680126560000039
respectively represent the rated power of the electrolyzer and of the fuel cell; $\Delta t$ represents the slot length.
5. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 4, wherein calculating the weighted sum $\Delta Y(t)$ of the single-time-slot Lyapunov drift and the operation cost comprises:
the Lyapunov function $L(t)$ is expressed by the formula:
Figure FDA00036801265600000310
where $X_{B,t} = B_t + W_B$, $X_{H,t} = H_t + W_H$, and $\omega_r$ is a weighting factor that unifies the dimensions of $X_{B,t}$ and $X_{H,t}$; $B_t$ is the energy storage level of the electric energy storage system in time slot $t$, $H_t$ is the energy storage level of the hydrogen energy storage system in time slot $t$, $W_B$ is the optimal electric energy storage system parameter, and $W_H$ is the optimal hydrogen energy storage system parameter;
the dynamic constraints that $B_t$ and $H_t$ need to satisfy are respectively expressed as:
Figure FDA00036801265600000311
Figure FDA0003680126560000041
where $P_{bc,t}$ and $P_{bd,t}$ respectively represent the charging power and the discharging power of the electric energy storage system; $P_{el,t}$ and $P_{fc,t}$ respectively represent the input power of the electrolyzer and the output power of the fuel cell in time slot $t$;
the single-time-slot Lyapunov drift is expressed by the following formula:
$\Lambda_t = E\{L(t+1) - L(t) \mid X(t)\}$,
Figure FDA0003680126560000042
Figure FDA0003680126560000043
Figure FDA0003680126560000044
where $X(t) = (X_{B,t}, X_{H,t})$ and $E\{\cdot\}$ denotes the expectation operator;
the single-time-slot Lyapunov drift $\Lambda_t$ can then be bounded as:
$\Lambda_t \le \xi_{BH} + E\{\Gamma_0 \mid X(t)\}$,
Figure FDA0003680126560000045
calculating the weighted sum $\Delta Y(t)$ of the single-time-slot Lyapunov drift and the operation cost, expressed by the formula:
Figure FDA0003680126560000046
where V is a weighting parameter.
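As a rough numerical illustration of the drift-plus-cost quantity described in claim 5, the sketch below evaluates it for one realized time slot. The exact formulas of the claim are given only as figures, so three things here are assumptions: the quadratic form $L(t) = \tfrac{1}{2}(X_{B,t}^2 + \omega_r X_{H,t}^2)$, the generic storage update in `next_level`, and the use of a single sample in place of the conditional expectation. The function names are hypothetical.

```python
def next_level(level, p_in, p_out, eff_in, eff_out, dt):
    """Assumed storage dynamics: injection scaled by eff_in, release divided by eff_out."""
    return level + (eff_in * p_in - p_out / eff_out) * dt


def lyapunov(x_b, x_h, omega_r):
    """Assumed quadratic Lyapunov function of the two virtual queues."""
    return 0.5 * (x_b ** 2 + omega_r * x_h ** 2)


def drift_plus_penalty(b, h, b_next, h_next, w_b, w_h, omega_r, v, cost):
    """Sample value of L(t+1) - L(t) + V * cost for one time slot.

    The virtual queues follow the claim: X_B = B + W_B and X_H = H + W_H.
    """
    l_now = lyapunov(b + w_b, h + w_h, omega_r)
    l_next = lyapunov(b_next + w_b, h_next + w_h, omega_r)
    return (l_next - l_now) + v * cost


# Example: the battery charges at 2 kW for one hour, the hydrogen level is unchanged.
b_next = next_level(10.0, p_in=2.0, p_out=0.0, eff_in=0.9, eff_out=0.9, dt=1.0)
value = drift_plus_penalty(10.0, 5.0, b_next, 5.0,
                           w_b=-10.0, w_h=-5.0, omega_r=1.0, v=1.0, cost=3.0)
```

A larger $V$ puts more weight on the operation cost relative to queue stability, which is the usual trade-off in drift-plus-penalty formulations.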
6. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 5, wherein the single-time-slot optimization sub-problem model is expressed as follows:
Figure FDA0003680126560000051
s.t. the operation constraint of the electric energy subsystem, the operation constraint of the hydrogen energy subsystem and the operation constraint of the heat energy subsystem;
Figure FDA0003680126560000052
Figure FDA0003680126560000053
Figure FDA0003680126560000054
Figure FDA0003680126560000055
Figure FDA0003680126560000056
the optimal electric energy storage system parameter $W_B$ is calculated as follows:
Figure FDA0003680126560000057
the optimal hydrogen energy storage system parameter $W_H$ is calculated as follows:
Figure FDA0003680126560000058
7. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 6, wherein the single-time-slot optimization sub-problem model is decomposed, according to the information certainty, into an upper-layer sub-problem model corresponding to the electric-hydrogen subsystem and a lower-layer sub-problem model corresponding to the heat energy subsystem, comprising:
the upper-layer sub-problem model corresponding to the electric-hydrogen subsystem, expressed as:
Figure FDA0003680126560000061
s.t. the operation constraint of the electric energy subsystem and the operation constraint of the hydrogen energy subsystem;
the lower-layer sub-problem model corresponding to the heat energy subsystem, expressed as:
$\min\; V\,(C_{5,t} + C_{6,t})$
s.t. operating constraints of the thermal energy subsystem.
8. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 7, wherein remodeling the lower-layer sub-problem model based on the Markov game framework comprises:
the environmental state of the heat energy subsystem is expressed as follows:
$s_t = (Q_{fc,t}, Q_{th,t}, \beta_{in,i,t}, \beta_{out,i,t}, t)$,
Figure FDA0003680126560000062
where $Q_{fc,t}$ represents the heat production of the fuel cell in time slot $t$; $Q_{th,t}$ represents the energy storage level of the thermal energy storage system in the heat energy subsystem in time slot $t$; $\beta_{in,i,t}$ is the indoor temperature of the $i$-th room in time slot $t$; $\beta_{out,t}$ is the outdoor temperature in time slot $t$; $t$ represents the time interval between two consecutive action decisions executed by the current hydrogen-containing building energy system; $\eta_{tc}$ and $\eta_{td}$ respectively represent the injection efficiency and the release efficiency of the thermal energy storage system in the heat energy subsystem; $P_{tc,t}$ and $P_{td,t}$ respectively represent the injection power and the release power of the thermal energy storage system in the heat energy subsystem in time slot $t$;
the action of the heat energy subsystem is expressed as follows:
$a_t = (P_{sp,1,t}, P_{sp,2,t}, \ldots, P_{sp,i,t}),\ 1 \le i \le N_b$
where $P_{sp,i,t}$ is the heat supply power of the $i$-th room in time slot $t$, and $N_b$ is the number of rooms;
the reward of the heat energy subsystem is expressed as follows:
Figure FDA0003680126560000071
where
Figure FDA0003680126560000072
and $\kappa_{th}$ is a penalty factor.
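The state, action, and reward of claim 8 can be organized as plain data structures. The sketch below is illustrative only: the class `ThermalState`, the helper names, the storage update, and in particular the shape of `reward` (negative weighted cost minus a $\kappa_{th}$-scaled violation term) are assumptions, because the reward and storage formulas of the claim are given only as figures. A single outdoor temperature is used for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThermalState:
    q_fc: float          # heat production of the fuel cell in this slot
    q_th: float          # level of the thermal energy storage system
    indoor: List[float]  # indoor temperature of each room
    outdoor: float       # outdoor temperature
    slot: int            # time index

    def as_vector(self) -> List[float]:
        """Flatten the state in the order (Q_fc, Q_th, indoor..., outdoor, t)."""
        return [self.q_fc, self.q_th, *self.indoor, self.outdoor, float(self.slot)]


def thermal_storage_next(q_th, p_tc, p_td, eta_tc, eta_td, dt):
    """Assumed storage update using the injection/release efficiencies."""
    return q_th + (eta_tc * p_tc - p_td / eta_td) * dt


def reward(v, c5, c6, kappa_th, violation):
    """Assumed reward: negative weighted cost minus a penalty on constraint violation."""
    return -(v * (c5 + c6) + kappa_th * violation)


state = ThermalState(q_fc=1.5, q_th=4.0, indoor=[21.0, 22.5], outdoor=5.0, slot=3)
vector = state.as_vector()   # one entry per state component
action = [0.8, 1.2]          # heat supply power for each of the N_b = 2 rooms
r = reward(v=2.0, c5=1.0, c6=0.5, kappa_th=10.0, violation=0.2)
```

Keeping the action as one heat supply power per room matches the action tuple of the claim, whose length equals the number of rooms $N_b$.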
9. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 8, wherein solving with the multi-agent attention deep deterministic policy gradient algorithm comprises:
at the beginning of each time slot, acquiring the environmental state of the heat energy subsystem;
according to the current environmental state of the heat energy subsystem, outputting, by the deep neural network, the current heat supply action of the hydrogen-containing building energy system to control the heat energy subsystem;
acquiring the reward of the next time slot and the environmental state of the next time slot; storing the reward and the environmental state of each time slot in an experience pool;
computing the loss function $L(\theta_i)$ of the deep neural network and the policy gradient
Figure FDA0003680126560000073
extracting training samples from the experience pool, and training the deep neural network with the multi-agent attention deep deterministic policy gradient algorithm to obtain the loss function $L(\theta_i)$ and the policy gradient
Figure FDA0003680126560000074
iterating the deep neural network to obtain the optimal control strategy of the heat energy subsystem;
the loss function $L(\theta_i)$ used to train the deep neural network and the policy gradient
Figure FDA0003680126560000075
are expressed as follows:
Figure FDA0003680126560000076
Figure FDA0003680126560000077
Figure FDA0003680126560000078
where $\pi$ represents the policy of the agent (represented by the actor network); $y$ represents the Q value output by the target critic network; $\pi'$ represents the target policy of the agent (represented by the target actor network);
Figure FDA0003680126560000079
represents the Q value output by the critic network of the $i$-th agent under the policy $\pi$; $\pi_i(a_i \mid o_i)$ represents the actor network output of the $i$-th agent.
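The experience pool and the critic update of claim 9 can be sketched without any deep learning library. The code below is a simplified illustration under stated assumptions: the target $y = r + \gamma\, Q'$ and the mean squared error loss are the standard actor-critic forms and stand in for the claim's formulas, which are given only as figures; the discount factor `gamma` and all names (`ReplayBuffer`, `td_target`, `critic_loss`) are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience pool that stores one transition per time slot."""

    def __init__(self, capacity, seed=0):
        self._data = deque(maxlen=capacity)   # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def push(self, obs, action, reward, next_obs):
        self._data.append((obs, action, reward, next_obs))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self):
        return len(self._data)


def td_target(reward, next_q, gamma):
    """Assumed target value: reward plus discounted target-critic Q value."""
    return reward + gamma * next_q


def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and target values."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


pool = ReplayBuffer(capacity=2)
for step in range(3):
    pool.push(obs=[float(step)], action=[0.5], reward=-1.0, next_obs=[float(step + 1)])
batch = pool.sample(batch_size=2)
loss = critic_loss([1.0, 3.0], [td_target(1.0, 2.0, 0.5), 1.0])
```

In a full implementation the Q values would come from the critic and target critic networks, and the loss would be minimized by gradient descent on the critic parameters $\theta_i$.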
10. The method for controlling the operation of the hydrogen-containing building energy system by combining optimization and learning according to claim 9, wherein the framework of the multi-agent attention deep deterministic policy gradient algorithm comprises $i$ agents, each agent has its own deep neural network, and each deep neural network comprises an actor network, a target actor network, a critic network, and a target critic network; the actor network and the target actor network have the same structure, and the critic network and the target critic network have the same structure;
the number of neurons in the input layer of the actor network equals the number of components of the environmental state $s_t$, and the number of neurons in the output layer equals the number of components of the action $a_t$; the critic network of the agent comprises an action encoder module, an attention mechanism module, and a multilayer perceptron module;
in the attention mechanism module, the input to the actor network of the $i$-th agent is $o_i$ and its output is $a_i$; the input to the critic network includes $o_i$, $a_i$ and
Figure FDA0003680126560000081
and the output is $Q_i(o, a)$,
Figure FDA0003680126560000082
Figure FDA0003680126560000083
where $o_i$ is the local observation of the $i$-th agent; $a_i$ is the output action; $e_i$ represents the encoding of the local observation and action of the $i$-th agent; $Q_i(o, a)$ is the Q value output by the critic network; in the critic network of the $i$-th agent, the input to the attention module is
Figure FDA0003680126560000084
and the output is $x_i$, which represents the contributions of the other agents;
the contribution $x_i$ of the other agents is expressed as:
Figure FDA0003680126560000085
where $W_{value,j}$ represents the value transformation matrix associated with the $j$-th agent;
Figure FDA0003680126560000086
is a non-linear activation function;
$w_j$ is the weight associated with the $j$-th agent;
the weight $w_j$ associated with the $j$-th agent is expressed as:
Figure FDA0003680126560000091
where $W_{key,i}$ and $W_{query,i}$ are, respectively, the key and query transformation matrices associated with the $i$-th agent.
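The attention step in the critic of claim 10 can be illustrated with a small pure-Python sketch. Several choices here are assumptions rather than the claimed formulas, which appear only as figures: the weights are computed as a softmax over dot products of key-transformed and query-transformed encodings, ReLU is used as the non-linear activation, and one shared set of matrices is used for all agents to keep the example short. All function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relu(vector):
    return [max(0.0, a) for a in vector]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_contribution(i, encodings, w_key, w_query, w_value):
    """Return (x_i, weights): the weighted contribution of all agents j != i."""
    query = matvec(w_query, encodings[i])
    others = [j for j in range(len(encodings)) if j != i]
    # Score each other agent by comparing its key with agent i's query.
    scores = [dot(matvec(w_key, encodings[j]), query) for j in others]
    weights = softmax(scores)
    x_i = [0.0] * len(w_value)
    for w_j, j in zip(weights, others):
        value = relu(matvec(w_value, encodings[j]))
        x_i = [acc + w_j * c for acc, c in zip(x_i, value)]
    return x_i, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
encodings = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # e_j for three agents
x_0, w = attention_contribution(0, encodings, identity, identity, identity)
```

With identical encodings the two other agents receive equal weights, so `x_0` is simply their common transformed value; in the critic, `x_i` would then be fed to the multilayer perceptron together with agent $i$'s own encoding.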
CN202210631486.3A 2022-06-06 2022-06-06 Hydrogen-containing building energy system operation control method combining optimization and learning Active CN115130733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210631486.3A CN115130733B (en) 2022-06-06 2022-06-06 Hydrogen-containing building energy system operation control method combining optimization and learning

Publications (2)

Publication Number Publication Date
CN115130733A true CN115130733A (en) 2022-09-30
CN115130733B CN115130733B (en) 2024-07-09

Family

ID=83378492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210631486.3A Active CN115130733B (en) 2022-06-06 2022-06-06 Hydrogen-containing building energy system operation control method combining optimization and learning

Country Status (1)

Country Link
CN (1) CN115130733B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458443A (en) * 2019-08-07 2019-11-15 南京邮电大学 A smart home energy management method and system based on deep reinforcement learning
US20200301924A1 (en) * 2019-03-20 2020-09-24 Guangdong University Of Technology Method for constructing sql statement based on actor-critic network
CN112966444A (en) * 2021-03-12 2021-06-15 南京邮电大学 Intelligent energy optimization method and device for building multi-energy system
US20220036392A1 (en) * 2020-08-03 2022-02-03 Desong Bian Deep Reinforcement Learning Based Real-time scheduling of Energy Storage System (ESS) in Commercial Campus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907220A (en) * 2022-12-26 2023-04-04 南京邮电大学 Method and system for online optimization operation of hydrogen-containing energy system in uncertain environment
CN119065418A (en) * 2024-11-05 2024-12-03 宁波隆跃科技有限公司 Mold temperature adaptive adjustment method and system based on machine learning
CN119065418B (en) * 2024-11-05 2025-03-04 宁波隆跃科技有限公司 Mold temperature adaptive adjustment method and system based on machine learning

Also Published As

Publication number Publication date
CN115130733B (en) 2024-07-09

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant