CN115130733A - Hydrogen-containing building energy system operation control method combining optimization and learning - Google Patents
- Publication number
- CN115130733A (application number CN202210631486.3A)
- Authority
- CN
- China
- Prior art keywords
- hydrogen
- subsystem
- energy storage
- slot
- energy
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/30—The power source being a fuel cell
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2310/00—The network for supplying or distributing electric power characterised by its spatial reach or by the load
- H02J2310/10—The network having a local or delimited stationary reach
- H02J2310/12—The local stationary network supplying a household or a building
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses an operation control method for a hydrogen-containing building energy system that combines optimization and learning, in the field of building energy system operation control. The method comprises the following steps: establishing a model of the problem of minimizing the expected operating cost of the hydrogen-containing building energy system, and converting it into a series of single-time-slot optimization sub-problem models; decomposing each single-time-slot sub-problem model into an upper-layer sub-problem model and a lower-layer sub-problem model; solving the upper-layer sub-problem model with a convex optimization method and calculating the heat production of the fuel cell from its solution; taking the fuel-cell heat production as an input state of the lower-layer sub-problem model; solving the lower-layer sub-problem model to obtain the optimal control strategy of the thermal energy subsystem; and controlling the operation of the hydrogen-containing building energy system in real time. By exploiting the complementary advantages of model-based convex optimization and model-free learning, the invention minimizes operating cost while maintaining high thermal comfort.
Description
Technical Field
The invention belongs to the field of building energy system operation control, and particularly relates to a hydrogen-containing building energy system operation control method.
Background
Buildings account for a significant share of global energy consumption and carbon emissions. In 2019, buildings consumed about 30% of global energy and produced about 28% of global carbon emissions. Global energy supply still depends mainly on non-renewable sources such as fossil fuels, which makes energy depletion and environmental pollution increasingly serious. In recent years, hydrogen energy has attracted much attention because it is clean, renewable, widely available, convenient to store and transport, and efficiently utilized, and it is regarded as a promising substitute for fossil fuels. In addition, coordinating a hydrogen energy storage system with other storage systems (such as thermal and electrical energy storage systems) helps improve building energy efficiency. The operation control of hydrogen-containing building energy systems therefore merits in-depth research.
Existing research has proposed several operation control methods for hydrogen-containing building energy systems, such as stochastic programming and model predictive control. These methods aim to minimize system operating cost, which mainly comprises energy cost and carbon emission cost. Despite these advances, none of the existing methods considers building thermal dynamics, so the large thermal inertia of buildings (i.e., the indoor temperature responds to a stimulus such as a sudden stop in heating only in an attenuated and delayed way) is not fully exploited to reduce operating cost.
When building thermal dynamics are considered in a hydrogen-containing building energy system, optimal control of system operation faces four challenges: (1) many system parameters are uncertain; (2) many operational constraints are coupled in time and space; (3) the fuel cell in the hydrogen energy storage system generates electricity and heat simultaneously, which couples the electrical and thermal energy flows; (4) it is difficult to build an explicit building thermal dynamics model that is both accurate and tractable for control. Deep reinforcement learning, as a model-free alternative, has its own limitations: in single-agent deep reinforcement learning, the dimension of the action space grows dramatically as the number of thermal zones increases, while multi-agent deep reinforcement learning must coordinate heterogeneous agents, which makes effective learning difficult as the number of agents increases.
Disclosure of Invention
The invention aims to provide an operation control method for a hydrogen-containing building energy system that combines optimization and learning, exploiting the complementary advantages of model-based convex optimization and model-free learning to minimize operating cost while maintaining high thermal comfort.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
The invention provides a method for controlling the operation of a hydrogen-containing building energy system by combining optimization and learning, comprising the following steps:
establishing a model of the problem of minimizing the expected operating cost of the hydrogen-containing building energy system according to its operating constraints and parameter uncertainty, and converting this expected-cost minimization problem into a series of single-time-slot optimization sub-problem models by using the Lyapunov optimization framework;
decomposing the single-time-slot optimization sub-problem model into an upper-layer sub-problem model corresponding to the electricity-hydrogen subsystem and a lower-layer sub-problem model corresponding to the thermal energy subsystem;
solving the upper-layer sub-problem model by a convex optimization method, and calculating the heat production of the fuel cell from the solution;
taking the fuel-cell heat production as an input state of the lower-layer sub-problem model, reformulating the lower-layer sub-problem model as a Markov game, and solving it with a multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem;
and controlling the operation of the hydrogen-containing building energy system in real time according to the convex-optimization solution of the upper-layer sub-problem model and the optimal control strategy of the thermal energy subsystem.
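The per-slot sequence in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `control_slot`, `UpperResult`, `solve_upper`, `thermal_policy` and `heat_to_power` are hypothetical names, the upper-layer convex solver and the trained lower-layer policy are passed in as plain callables, and the fuel-cell heat is assumed to be proportional to its electrical output (the text does not give this relation here).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class UpperResult:
    p_fc: float  # fuel-cell electrical output chosen by the upper layer (kW)
    q_fc: float  # fuel-cell heat handed to the lower layer as Q_fc,t (kW)


def control_slot(
    solve_upper: Callable[[int], float],
    heat_to_power: float,
    thermal_policy: Callable[[List[float]], List[float]],
    thermal_rest: Sequence[float],
    t: int,
) -> Tuple[UpperResult, List[float]]:
    """Run one time slot of a hypothetical two-layer controller."""
    # Upper layer: convex problem for the electricity-hydrogen subsystem
    # (here any callable returning the fuel-cell output for slot t).
    p_fc = solve_upper(t)
    # Assumed linear heat recovery: Q_fc,t = heat_to_power * P_fc,t.
    q_fc = heat_to_power * p_fc
    # Lower-layer state: Q_fc,t first, then Q_th,t, temperatures, and t.
    state = [q_fc, *thermal_rest, float(t)]
    # Lower layer: learned policy returning one heat-supply power per room.
    heat_supply = thermal_policy(state)
    return UpperResult(p_fc=p_fc, q_fc=q_fc), heat_supply
```

Passing the solver and the policy in as callables keeps the two layers decoupled, mirroring the decomposition: the upper layer never sees room temperatures, and the lower layer sees the upper-layer result only through $Q_{fc,t}$.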
Preferably, the expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:
s.t. the operating constraints of the electrical energy subsystem, the operating constraints of the hydrogen energy subsystem, and the operating constraints of the thermal energy subsystem;
where $C_{1,t}$ is the cost of buying and selling electricity in time slot $t$; $C_{2,t}$ is the carbon emission cost in slot $t$; $C_{3,t}$ is the degradation cost of the electrical energy storage system in slot $t$; $C_{4,t}$ is the operation and maintenance cost of the hydrogen energy subsystem in slot $t$; $C_{5,t}$ is the degradation cost of the thermal energy subsystem in slot $t$; $C_{6,t}$ is the natural gas purchase cost in slot $t$; and $T$ denotes the time horizon (number of time slots). The decision variables $\Theta$ include: the energy traded between the local energy system and the main grid, the charging and discharging power of the electrical energy storage system, the input power of the electrolyzer, the output power of the fuel cell, the heat supply power of each room, the charging and discharging power of the thermal energy storage system, and the natural gas consumption.
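The objective formula is not reproduced in this text. A common way to write a time-averaged expected-cost problem of this kind, consistent with the six cost terms and the horizon $T$ defined here and with the Lyapunov treatment that follows, is sketched below; treat it as an assumed form, not the patent's exact expression.

```latex
\min_{\Theta}\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}
\mathbb{E}\Big\{C_{1,t}+C_{2,t}+C_{3,t}+C_{4,t}+C_{5,t}+C_{6,t}\Big\}
```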
Preferably, converting the expected-cost minimization problem into a series of single-time-slot optimization sub-problem models by using the Lyapunov optimization framework comprises:
determining whether the hydrogen-containing building energy system satisfies the controllability conditions; for a system that satisfies them, constructing virtual queues for the electrical energy subsystem and the hydrogen energy subsystem; defining a Lyapunov function from the virtual queues and calculating the weighted sum $\Delta Y(t)$ of the single-time-slot Lyapunov drift and the operating cost; and converting the expected-cost minimization problem model into a series of single-time-slot optimization sub-problem models by minimizing $\Delta Y(t)$, with the optimal system parameters determined within these sub-problem models.
Preferably, the controllability conditions include:
$v_{\max} > \tau_{\max}$,
$v_{\min} > \tau_{\min}$,
where $v_{\max}$ and $v_{\min}$ are the highest and lowest electricity purchase prices; $\tau_{\max}$ and $\tau_{\min}$ are the highest and lowest electricity selling prices; $\eta_{bc}$ and $\eta_{bd}$ are the charging and discharging efficiencies of the electrical energy storage system; $\mu_c$ is a weighting parameter expressing the importance of carbon emissions relative to energy cost; $\psi_{BESS}$ is the depreciation coefficient of the electrical energy storage system; $B_{\max}$ and $B_{\min}$ are the maximum and minimum energy levels of the electrical energy storage system; $\omega_{el}$ and $\omega_{fc}$ are the conversion coefficients of the electrolyzer and the fuel cell; $H_{\max}$ and $H_{\min}$ are the maximum and minimum energy levels of the hydrogen energy storage system; and $\Delta t$ is the time-slot length. The conditions also involve the maximum and minimum carbon emission rates, the rated injection and release powers of the electrical energy storage system, the on/off indicator variables of the electrolyzer and the fuel cell, and the rated powers of the electrolyzer and the fuel cell.
Preferably, calculating the weighted sum $\Delta Y(t)$ of the single-time-slot Lyapunov drift and the operating cost comprises:
the Lyapunov function $L(t)$ is expressed as:
where $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$; $\omega_r$ is a weighting factor that unifies the dimensions of $X_{B,t}$ and $X_{H,t}$; $B_t$ and $H_t$ are the energy levels of the electrical and hydrogen energy storage systems in time slot $t$; and $W_B$ and $W_H$ are the optimal parameters associated with the electrical and hydrogen energy storage systems, respectively. $B_t$ and $H_t$ must satisfy the dynamics of their respective storage systems, in which $P_{bc,t}$ and $P_{bd,t}$ denote the charging and discharging power of the electrical energy storage system, and $P_{el,t}$ and $P_{fc,t}$ denote the input power of the electrolyzer and the output power of the fuel cell in slot $t$. The single-time-slot Lyapunov drift is expressed as:
$\Lambda_t = \mathbb{E}\{L(t+1) - L(t) \mid X(t)\}$,
where $X(t) = (X_{B,t}, X_{H,t})$ and $\mathbb{E}\{\cdot\}$ denotes the expectation.
The single-time-slot Lyapunov drift $\Lambda_t$ can then be bounded as:
$\Lambda_t \le \xi_B + \xi_H + \mathbb{E}\{\Gamma_0 \mid X(t)\}$,
The weighted sum $\Delta Y(t)$ of the single-time-slot Lyapunov drift and the operating cost is then calculated as:
where $V$ is a weighting parameter.
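To make the drift-plus-penalty idea concrete, the Python sketch below evaluates, for a few candidate actions, the change in the Lyapunov function plus $V$ times the slot cost, and picks the smallest. It rests on several assumptions not stated in this text: the quadratic form $L(t)=\tfrac12(X_{B,t}^2+\omega_r X_{H,t}^2)$, linear storage dynamics $B_{t+1}=B_t+(\eta_{bc}P_{bc,t}-P_{bd,t}/\eta_{bd})\Delta t$ and $H_{t+1}=H_t+(\omega_{el}P_{el,t}-P_{fc,t}/\omega_{fc})\Delta t$, a deterministic one-step evaluation instead of the conditional expectation and its upper bound, and enumeration of candidates instead of a convex solver. All function names and numeric defaults are hypothetical and illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Candidate action: (P_bc, P_bd, P_el, P_fc) in kW.
Action = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Params:
    eta_bc: float = 0.95    # battery charging efficiency
    eta_bd: float = 0.95    # battery discharging efficiency
    omega_el: float = 0.7   # assumed: hydrogen energy stored per unit electrolyzer input
    omega_fc: float = 0.5   # assumed: electricity per unit hydrogen energy consumed
    omega_r: float = 1.0    # dimension-unifying weight for the hydrogen queue
    W_B: float = -50.0      # illustrative shift, so X_B = B - 50
    W_H: float = -100.0     # illustrative shift, so X_H = H - 100
    V: float = 10.0         # cost weight in the drift-plus-penalty
    dt: float = 1.0         # time-slot length (h)


def lyapunov(B: float, H: float, p: Params) -> float:
    """Assumed quadratic Lyapunov function of the virtual queues."""
    x_b, x_h = B + p.W_B, H + p.W_H
    return 0.5 * (x_b ** 2 + p.omega_r * x_h ** 2)


def next_levels(B: float, H: float, a: Action, p: Params) -> Tuple[float, float]:
    """Assumed linear storage dynamics for the battery and the hydrogen tank."""
    p_bc, p_bd, p_el, p_fc = a
    B_next = B + (p.eta_bc * p_bc - p_bd / p.eta_bd) * p.dt
    H_next = H + (p.omega_el * p_el - p_fc / p.omega_fc) * p.dt
    return B_next, H_next


def drift_plus_penalty(B: float, H: float, a: Action,
                       cost: Callable[[Action], float], p: Params) -> float:
    """One-step Lyapunov drift plus V times the slot cost (deterministic case)."""
    B_next, H_next = next_levels(B, H, a, p)
    return lyapunov(B_next, H_next, p) - lyapunov(B, H, p) + p.V * cost(a)


def best_action(B: float, H: float, candidates: Sequence[Action],
                cost: Callable[[Action], float], p: Params) -> Action:
    """Pick the candidate with the smallest drift-plus-penalty value."""
    return min(candidates, key=lambda a: drift_plus_penalty(B, H, a, cost, p))
```

With the battery level above its center ($X_{B,t}>0$) and zero cost, the drift term alone favors discharging; a larger $V$ shifts the balance toward the cheapest action instead.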
Preferably, the single-time-slot optimization sub-problem model is expressed as:
The parameter $W_B$ of the optimal electrical energy storage system is calculated as:
The parameter $W_H$ of the optimal hydrogen energy storage system is calculated as:
s.t. the operating constraints of the electrical energy subsystem, the operating constraints of the hydrogen energy subsystem and the operating constraints of the thermal energy subsystem.
Preferably, decomposing the single-time-slot optimization sub-problem model, according to information certainty, into an upper-layer sub-problem model corresponding to the electricity-hydrogen subsystem and a lower-layer sub-problem model corresponding to the thermal energy subsystem comprises:
the upper-layer sub-problem model corresponding to the electricity-hydrogen subsystem is expressed as:
s.t. the operating constraints of the electrical energy subsystem and the operating constraints of the hydrogen energy subsystem;
the lower-layer sub-problem model corresponding to the thermal energy subsystem is expressed as:
$\min\ V(C_{5,t} + C_{6,t})$ s.t. the operating constraints of the thermal energy subsystem.
Preferably, reformulating the lower-layer sub-problem model as a Markov game comprises:
the environment state of the thermal energy subsystem is:
$s_t = (Q_{fc,t}, Q_{th,t}, \beta_{in,i,t}, \beta_{out,t}, t)$,
where $Q_{fc,t}$ is the heat production of the fuel cell in time slot $t$; $Q_{th,t}$ is the energy level of the thermal energy storage system in the thermal energy subsystem in slot $t$; $\beta_{in,i,t}$ is the indoor temperature of room $i$ in slot $t$; $\beta_{out,t}$ is the outdoor temperature in slot $t$; and $t$ is the current time slot, i.e., the interval between two consecutive control decisions of the system. $Q_{th,t}$ evolves according to the dynamics of the thermal energy storage system, in which $\eta_{tc}$ and $\eta_{td}$ are its injection and release efficiencies, and $P_{tc,t}$ and $P_{td,t}$ are its injection and release power in slot $t$;
the action of the thermal energy subsystem is:
$a_t = (P_{sp,1,t}, P_{sp,2,t}, \ldots, P_{sp,N_b,t})$,
where $P_{sp,i,t}$ is the heat supply power of room $i$ in time slot $t$, $1 \le i \le N_b$, and $N_b$ is the number of rooms;
the reward of the thermal energy subsystem is:
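A minimal Python sketch of how the state $s_t$ and action $a_t$ of this Markov game might be assembled. The class and function names are hypothetical; the state vector flattens the per-room indoor temperatures in room order, and `clip_action` projects raw actor outputs onto $[0, P^{\max}_{sp,i}]$, assuming the per-room maximum heat-supply powers are available as a list. The reward is not sketched because its expression is not given here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ThermalState:
    q_fc: float             # Q_fc,t: fuel-cell heat from the upper layer
    q_th: float             # Q_th,t: thermal storage energy level
    beta_in: List[float]    # beta_in,i,t for rooms i = 1..N_b, in room order
    beta_out: float         # beta_out,t: outdoor temperature
    t: int                  # current time-slot index

    def as_vector(self) -> List[float]:
        """Flatten s_t into the input vector of a neural network."""
        return [self.q_fc, self.q_th, *self.beta_in, self.beta_out, float(self.t)]


def clip_action(raw: Sequence[float], p_sp_max: Sequence[float]) -> List[float]:
    """Project raw per-room heat-supply powers onto [0, p_sp_max[i]] (hypothetical helper)."""
    if len(raw) != len(p_sp_max):
        raise ValueError("expected one heat-supply value per room (N_b values)")
    return [min(max(a, 0.0), m) for a, m in zip(raw, p_sp_max)]
```

Clipping after the actor keeps every executed action inside the per-room heat-supply limits even while the policy is still exploring.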
Preferably, solving with the multi-agent attention deep deterministic policy gradient algorithm comprises:
at the beginning of each time slot, observing the environment state of the thermal energy subsystem;
outputting, by the deep neural network and based on the current environment state, the current heat supply actions of the hydrogen-containing building energy system to control the thermal energy subsystem;
observing the reward and the environment state of the next time slot, and storing the reward and environment state of each time slot in the experience pool;
drawing training samples from the experience pool, computing the loss function $L(\theta_i)$ and the policy gradient of the deep neural networks, and training the networks with the multi-agent attention deep deterministic policy gradient algorithm; the networks are updated iteratively according to $L(\theta_i)$ and the policy gradient to obtain the optimal control strategy of the thermal energy subsystem.
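The experience pool in the steps above can be sketched as a fixed-capacity replay buffer. This is a generic sketch under assumptions: `Transition` and `ExperiencePool` are hypothetical names, each entry stores the full transition $(s_t, a_t, r_t, s_{t+1})$ (a common choice, slightly more than the state and reward the text mentions), and mini-batches are sampled uniformly without replacement.

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple, Optional


class Transition(NamedTuple):
    state: Any        # s_t (e.g. the thermal-subsystem state vector)
    action: Any       # a_t (per-room heat-supply powers)
    reward: float     # r_t observed after acting
    next_state: Any   # s_{t+1}


class ExperiencePool:
    """Fixed-size experience pool; the oldest transitions are discarded first."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state: Any, action: Any, reward: float, next_state: Any) -> None:
        self._buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored transitions.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)
```

In a training loop, each slot would call `push(s_t, a_t, r_t, s_next)` once, and the network update would start only after `len(pool)` reaches the chosen batch size.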
Preferably, the framework of the multi-agent attention deep deterministic policy gradient algorithm comprises multiple agents indexed by $i$, each with its own set of deep neural networks: an actor network, a target actor network, a critic network, and a target critic network. The actor and target actor networks have the same structure, as do the critic and target critic networks;
the number of neurons in the input layer of the actor network equals the number of components of the environment state $s_t$, and the number of neurons in the output layer equals the number of components of the action $a_t$; the critic network of each agent comprises an observation-action encoder module, an attention mechanism module, and a multilayer perceptron module;
in this attention-based framework, the actor network of agent $i$ takes $o_i$ as input and outputs $a_i$; the critic network takes $o_i$, $a_i$, and the observations and actions of the other agents as input and outputs $Q_i(o,a)$,
where $o_i$ is the local observation of agent $i$; $a_i$ is its output action; $e_i$ is the encoding of the local observation and action of agent $i$; and $Q_i(o,a)$ is the Q value output by the critic network. In the critic network of agent $i$, the attention module takes the agents' encodings as input and outputs $x_i$, which represents the contribution of the other agents;
the contribution $x_i$ of the other agents is expressed as:
where $W_{value,j}$ is the value transformation matrix associated with agent $j$, to whose output a nonlinear activation function is applied;
$w_j$ is the weight associated with agent $j$;
the weight $w_j$ associated with agent $j$ is expressed as:
where $W_{key,i}$ and $W_{query,i}$ are the key and query transformation matrices associated with agent $i$.
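The expressions for $x_i$ and $w_j$ are not reproduced in this text. The definitions match the usual attention-critic construction, in which $x_i=\sum_{j\ne i} w_j\,h(W_{value,j}\,e_j)$ and $w_j\propto\exp\big((W_{key,i}e_j)^\top(W_{query,i}e_i)\big)$, normalized over $j\ne i$, where $h$ is a placeholder for the unspecified activation. The pure-Python sketch below computes that assumed form, with ReLU standing in for $h$ and no score scaling; the function names are hypothetical and this is an illustration, not the patent's exact formula.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Vector = Sequence[float]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product for row-major nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v: Vector) -> List[float]:
    return [max(0.0, x) for x in v]


def attention_contribution(
    i: int,
    encodings: Sequence[Vector],
    W_key_i: Matrix,
    W_query_i: Matrix,
    W_value: Sequence[Matrix],
) -> Tuple[List[float], List[float]]:
    """Return (weights w_j for j != i, contribution x_i) under the assumed form."""
    others = [j for j in range(len(encodings)) if j != i]
    if not others:
        raise ValueError("at least two agents are required")
    query = matvec(W_query_i, encodings[i])
    # Dot-product score between agent i's query and each other agent's key.
    scores = [sum(q * k for q, k in zip(query, matvec(W_key_i, encodings[j]))) for j in others]
    # Numerically stable softmax over the other agents.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the activated value projections of the other agents.
    x_i = [0.0] * len(W_value[others[0]])
    for w, j in zip(weights, others):
        value = relu(matvec(W_value[j], encodings[j]))
        x_i = [acc + w * v for acc, v in zip(x_i, value)]
    return weights, x_i
```

Because the weights are normalized over the other agents, the size of $x_i$ does not depend on the number of agents, which is what lets the critic scale as rooms are added.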
Preferably, the loss function $L(\theta_i)$ and the policy gradient used to train the deep neural networks are expressed as:
where $\pi$ is the agent's policy (represented by the actor network); $y$ is the target Q value computed from the output of the target critic network; $\pi'$ is the agent's target policy (represented by the target actor network); $Q_i^{\pi}$ is the Q value output by the critic network of agent $i$ under policy $\pi$; and $\pi_i(a_i \mid o_i)$ is the output of the actor network of agent $i$.
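The loss and gradient expressions themselves are not reproduced in this text. For reference, the standard DDPG-style forms these symbols usually denote are given below, with $\mathcal{D}$ (the experience pool), $\gamma$ (discount factor), $r_i$ (reward of agent $i$), and $\phi_i$ (actor parameters) introduced as assumptions. The patent's exact expressions, which are written with the stochastic notation $\pi_i(a_i\mid o_i)$, may differ, for example by using a log-likelihood policy gradient.

```latex
\begin{aligned}
L(\theta_i) &= \mathbb{E}_{(o,a,r,o')\sim\mathcal{D}}\big[(Q_i^{\pi}(o,a) - y)^2\big],\\
y &= r_i + \gamma\, Q_i^{\pi'}(o',a')\big|_{a'_j=\pi'_j(o'_j)},\\
\nabla_{\phi_i} J(\pi_i) &= \mathbb{E}_{o\sim\mathcal{D}}\big[\nabla_{\phi_i}\pi_i(o_i)\,\nabla_{a_i} Q_i^{\pi}(o,a)\big|_{a_i=\pi_i(o_i)}\big].
\end{aligned}
```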
Compared with the prior art, the invention has the following beneficial effects:
The electricity-hydrogen subsystem is operated by optimizing the upper-layer sub-problem model, and the optimization result is then used as an input state for operating the thermal energy subsystem, whose optimal control strategy is learned by multi-agent deep reinforcement learning; this avoids heterogeneous agents. The attention mechanism makes learning the optimal control strategy of the thermal energy subsystem highly scalable.
By exploiting the complementary advantages of model-based convex optimization and model-free learning, the method minimizes operating cost while maintaining high thermal comfort, without requiring prior knowledge of the uncertain parameters or an explicit building thermal dynamics model.
Drawings
Fig. 1 is a flowchart of the operation control method for a hydrogen-containing building energy system combining optimization and learning according to an embodiment of the present invention;
Fig. 2 is a network framework diagram of the multi-agent attention deep deterministic policy gradient algorithm of the present invention;
Fig. 3 compares the average temperature deviation of an embodiment of the present invention with other solutions;
Fig. 4 compares the average operating cost of an embodiment of the present invention with other solutions.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
A method for controlling the operation of a hydrogen-containing building energy system by combining optimization and learning comprises the following steps:
establishing an expected operation cost minimization problem model of the hydrogen-containing building energy system according to the operation constraint conditions and parameter uncertainty of the hydrogen-containing building energy system;
the expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:
$C_{2,t} = \mu_c\,\mu_{e,t}\,P_{g,t}\,\Delta t$
$C_{3,t} = \psi_{BESS}\,(|P_{bc,t}| + |P_{bd,t}|)$
$C_{5,t} = \psi_{TESS}\,(|P_{tc,t}| + |P_{td,t}|)$
s.t. the operation constraint of the electric energy subsystem, the operation constraint of the hydrogen energy subsystem and the operation constraint of the heat energy subsystem;
where $C_{1,t}$ is the cost of buying and selling electricity in time slot $t$; $C_{2,t}$ is the carbon emission cost in slot $t$; $C_{3,t}$ is the degradation cost of the electrical energy storage system in slot $t$; $C_{4,t}$ is the operation and maintenance cost of the hydrogen energy subsystem in slot $t$; $C_{5,t}$ is the degradation cost of the thermal energy subsystem in slot $t$; $C_{6,t}$ is the natural gas purchase cost in slot $t$; $\Delta t$ is the time-slot length; $v_t$ and $\tau_t$ are the electricity purchase and selling prices in slot $t$; $P_{g,t}$ is the energy traded between the hydrogen-containing building energy system and the main grid in slot $t$; $\mu_c$ is the carbon emission cost coefficient, in RMB/kg; $\mu_{e,t}$ is the carbon emission rate of the main grid in slot $t$; $\psi_{BESS}$ is the battery depreciation coefficient, in RMB/kW; $P_{bc,t}$ and $P_{bd,t}$ are the charging and discharging power of the electrical energy storage system; the operation and maintenance cost, start-up cost, and shut-down cost of each component $x \in \{el, fc\}$ of the hydrogen energy storage system (where "el" denotes the electrolyzer and "fc" the fuel cell) enter $C_{4,t}$ together with logical indicator variables associated with the on/off status, the switching-on, and the switching-off of component $x$; $\psi_{TESS}$ is the depreciation coefficient of the thermal energy storage system, in RMB/kW; $P_{tc,t}$ and $P_{td,t}$ are the injection and release power of the thermal energy storage system in slot $t$; $\eta_{gb}$ is the efficiency of converting natural gas into heat; $P_{gb,t}$ is the thermal power output of the natural gas boiler; and $\lambda_{gb}$ is the natural gas price, in RMB/kWh.
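The per-slot cost terms can be evaluated with a short Python sketch. $C_{2,t}$, $C_{3,t}$ and $C_{5,t}$ follow the formulas given here; the forms used for $C_{1,t}$ (buy at $v_t$ when importing, sell at $\tau_t$ when exporting, with $P_{g,t}>0$ meaning import) and $C_{6,t}$ (gas energy equals boiler heat divided by $\eta_{gb}$) are assumptions, and $C_{4,t}$ is taken as an input because its formula is not reproduced. `SlotData` and `slot_costs` are hypothetical names, and the kg/kWh unit for $\mu_{e,t}$ is assumed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SlotData:
    v_t: float        # electricity purchase price (RMB/kWh)
    tau_t: float      # electricity selling price (RMB/kWh)
    P_g: float        # grid exchange; assumed > 0 for import, < 0 for export (kW)
    mu_c: float       # carbon emission cost coefficient (RMB/kg)
    mu_e: float       # grid carbon emission rate (assumed kg/kWh)
    psi_bess: float   # battery depreciation coefficient (RMB/kW)
    P_bc: float       # battery charging power (kW)
    P_bd: float       # battery discharging power (kW)
    psi_tess: float   # thermal storage depreciation coefficient (RMB/kW)
    P_tc: float       # thermal storage injection power (kW)
    P_td: float       # thermal storage release power (kW)
    lambda_gb: float  # natural gas price (RMB/kWh)
    eta_gb: float     # gas-to-heat conversion efficiency
    P_gb: float       # boiler heat output (kW)
    c4_hydrogen: float = 0.0  # C_4,t, computed elsewhere
    dt: float = 1.0   # time-slot length (h)


def slot_costs(d: SlotData) -> Dict[str, float]:
    """Per-slot cost terms; C1 and C6 use assumed forms, C2, C3, C5 follow the text."""
    c1 = (d.v_t * max(d.P_g, 0.0) - d.tau_t * max(-d.P_g, 0.0)) * d.dt  # assumed
    c2 = d.mu_c * d.mu_e * d.P_g * d.dt
    c3 = d.psi_bess * (abs(d.P_bc) + abs(d.P_bd))
    c5 = d.psi_tess * (abs(d.P_tc) + abs(d.P_td))
    c6 = d.lambda_gb * d.P_gb * d.dt / d.eta_gb  # assumed: gas energy = heat / efficiency
    costs = {"C1": c1, "C2": c2, "C3": c3, "C4": d.c4_hydrogen, "C5": c5, "C6": c6}
    costs["total"] = sum(costs.values())
    return costs
```

For example, importing 10 kW for one hour at 1.0 RMB/kWh gives $C_{1,t}=10$ RMB under the assumed form, while exporting 4 kW at 0.5 RMB/kWh gives a revenue of $-2$ RMB.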
In the above problem of minimizing the operating cost of a hydrogen-containing building energy system with hybrid hydrogen-electricity-heat energy storage, the decision variables $\Theta$ include:

- the energy traded between the local energy system and the main power grid;
- the charging and discharging powers of the electrical energy storage system;
- the input power of the electrolyzer and the output power of the fuel cell;
- the heat supply power of each room;
- the charging and discharging powers of the thermal energy storage system;
- the natural gas consumption.

The constraints to be considered are the operating constraints of the hydrogen, electrical, and thermal energy storage systems, together with the room comfort temperature range, as follows:
(1) The hydrogen energy storage system must satisfy $0 \le H_t \le H_{max}$ and $P_{el,t} \cdot P_{fc,t} = 0$, so the electrolyzer and the fuel cell never operate in the same slot. Here $H_{max}$ is the maximum storage capacity of the hydrogen tank. The electrolyzer input power and the fuel cell output power are further limited by the nominal powers of the electrolyzer and the fuel cell, respectively.
(2) The electrical energy storage system must satisfy $B_{min} \le B_t \le B_{max}$ and $P_{bc,t} \cdot P_{bd,t} = 0$, so it never charges and discharges simultaneously. Here $B_{min}$ and $B_{max}$ are its minimum and maximum energy levels. The charging and discharging powers are further limited by their respective maximum values.
(3) During charging and discharging, the thermal energy storage system must satisfy $P_{td,t} \cdot P_{tc,t} = 0$. Its remaining operating constraints involve its maximum capacity and its maximum release and injection powers.
(4) The thermal load must satisfy $\beta_{in,i,t+1} = F_i(P_{sp,i,t}, \beta_{out,t}, \beta_{in,i,t}, \varepsilon_{i,t})$. The indoor temperature must stay between the lower and upper limits of the comfortable temperature range in building $i$, and the heat supply power $P_{sp,i,t}$ is limited by the maximum heat supply power of building $i$. Here $\beta_{in,i,t}$ is the indoor temperature of room $i$ in slot $t$, $F_i$ is the thermodynamic model of building $i$, and $\varepsilon_{i,t}$ is the random thermal disturbance in slot $t$.
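As an illustration only, the Python sketch below checks a candidate decision against the constraints whose form is stated above. The power limits are left out because this text does not give their exact form. The field and function names are hypothetical. The zero-product conditions are tested directly, so a device that runs in both directions in one slot is flagged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlotDecision:
    """Hypothetical container for one slot's powers (kW), levels and temperatures."""
    P_el: float            # electrolyzer input power
    P_fc: float            # fuel cell output power
    P_bc: float            # battery charging power
    P_bd: float            # battery discharging power
    P_tc: float            # thermal storage injection power
    P_td: float            # thermal storage release power
    H: float               # hydrogen tank level
    B: float               # battery energy level
    beta_in: List[float]   # indoor temperature of each room


def violated_constraints(d, H_max, B_min, B_max, beta_lo, beta_hi):
    """Return the names of violated constraints; an empty list means feasible."""
    violations = []
    if not 0.0 <= d.H <= H_max:
        violations.append("hydrogen level")
    if d.P_el * d.P_fc != 0:
        violations.append("electrolyzer and fuel cell both active")
    if not B_min <= d.B <= B_max:
        violations.append("battery level")
    if d.P_bc * d.P_bd != 0:
        violations.append("battery charges and discharges")
    if d.P_tc * d.P_td != 0:
        violations.append("thermal storage injects and releases")
    for i, temp in enumerate(d.beta_in):
        if not beta_lo[i] <= temp <= beta_hi[i]:
            violations.append(f"comfort range room {i}")
    return violations


ok = SlotDecision(5.0, 0.0, 0.0, 3.0, 2.0, 0.0, 10.0, 50.0, [21.0, 22.5])
print(violated_constraints(ok, 20.0, 10.0, 100.0, [20.0, 20.0], [24.0, 24.0]))
```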
The expected operating cost minimization problem is converted into a series of single-slot optimization subproblem models within the Lyapunov optimization framework, as follows:
First, the controllability of the hydrogen-containing building energy system is checked. The controllability conditions are:
$v_{max} > \tau_{max}$,
$v_{min} > \tau_{min}$,
where:

- $v_{max}$ and $v_{min}$ are the highest and lowest electricity buying prices;
- $\tau_{max}$ and $\tau_{min}$ are the highest and lowest electricity selling prices;
- $\eta_{bc}$ and $\eta_{bd}$ are the charging and discharging efficiencies of the electrical energy storage system;
- $\mu_c$ is a weighting parameter representing the importance of carbon emissions relative to energy costs;
- two further symbols denote the maximum and minimum carbon emission rates;
- $\psi_{BESS}$ is the depreciation coefficient of the electrical energy storage system;
- $B_{max}$ and $B_{min}$ are the maximum and minimum energy levels of the electrical energy storage system;
- two further symbols denote its rated injection (charging) and release (discharging) powers;
- $\omega_{el}$ and $\omega_{fc}$ are the conversion coefficients of the electrolyzer and the fuel cell;
- two indicator variables denote whether the electrolyzer and the fuel cell are on or off;
- $H_{max}$ and $H_{min}$ are the maximum and minimum energy levels of the hydrogen energy storage system;
- two further symbols denote the rated powers of the electrolyzer and the fuel cell;
- $\Delta t$ is the slot length.
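The two price conditions written out above can be checked directly from buying and selling price profiles, as in the Python sketch below. The function name is hypothetical. The other controllability conditions involve storage and hydrogen parameters whose formulas are not reproduced here, so the sketch does not cover them.

```python
def satisfies_price_conditions(buy_prices, sell_prices):
    """Check v_max > tau_max and v_min > tau_min over a price profile."""
    return (max(buy_prices) > max(sell_prices)
            and min(buy_prices) > min(sell_prices))


print(satisfies_price_conditions([0.3, 0.6, 1.0], [0.2, 0.4, 0.5]))
print(satisfies_price_conditions([0.3, 1.0], [0.35, 0.5]))
```

In the second call, the lowest buying price (0.3) is below the lowest selling price (0.35), so the check fails.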
For a hydrogen-containing building energy system that satisfies the controllability conditions, virtual queues are constructed for the electrical energy subsystem and the hydrogen energy subsystem. A Lyapunov function is then defined from these virtual queues, and the weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost is calculated, as follows:
The Lyapunov function $L(t)$ is defined as:
where $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$. The remaining symbols are:

- $\omega_r$ is a weighting factor that unifies the dimensions of $X_{B,t}$ and $X_{H,t}$;
- $B_t$ is the energy level of the electrical energy storage system in slot $t$;
- $H_t$ is the energy level of the hydrogen energy storage system in slot $t$;
- $W_B$ is the parameter of the optimal electrical energy storage system;
- $W_H$ is the parameter of the optimal hydrogen energy storage system.

$B_t$ and $H_t$ must satisfy the following dynamic constraints:
where $P_{bc,t}$ and $P_{bd,t}$ are the charging and discharging powers of the electrical energy storage system, and $P_{el,t}$ and $P_{fc,t}$ are the input power of the electrolyzer and the output power of the fuel cell in slot $t$.
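The Python sketch below shows one slot of storage dynamics feeding the virtual queues $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$. The dynamic constraints and $L(t)$ are not reproduced in this text, so the following are assumptions, not the patent's formulas:

- Battery charging is scaled by $\eta_{bc}$, and discharging drains $P_{bd,t}/\eta_{bd}$.
- The electrolyzer adds $\omega_{el}P_{el,t}$ hydrogen, and the fuel cell consumes $P_{fc,t}/\omega_{fc}$.
- The Lyapunov function is the common quadratic form $\tfrac{1}{2}(X_{B,t}^2 + \omega_r X_{H,t}^2)$.

The values of $W_B$ and $W_H$ are arbitrary examples.

```python
def next_levels(B_t, H_t, P_bc, P_bd, P_el, P_fc,
                eta_bc, eta_bd, omega_el, omega_fc, dt):
    """Assumed one-slot storage dynamics for the battery and the hydrogen tank."""
    B_next = B_t + (eta_bc * P_bc - P_bd / eta_bd) * dt
    H_next = H_t + (omega_el * P_el - P_fc / omega_fc) * dt
    return B_next, H_next


def virtual_queues(B_t, H_t, W_B, W_H):
    """X_B,t = B_t + W_B and X_H,t = H_t + W_H, as defined in the text."""
    return B_t + W_B, H_t + W_H


def lyapunov(X_B, X_H, omega_r):
    """Assumed quadratic Lyapunov function L = (X_B^2 + omega_r * X_H^2) / 2."""
    return 0.5 * (X_B ** 2 + omega_r * X_H ** 2)


# Charge the battery with 10 kW while the fuel cell delivers 2 kW for one hour.
B1, H1 = next_levels(50.0, 10.0, P_bc=10.0, P_bd=0.0, P_el=0.0, P_fc=2.0,
                     eta_bc=0.9, eta_bd=0.95, omega_el=0.02, omega_fc=20.0,
                     dt=1.0)
X_B, X_H = virtual_queues(B1, H1, W_B=-80.0, W_H=-15.0)
print(B1, H1, X_B, X_H, lyapunov(X_B, X_H, omega_r=1.0))
```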
The single-slot Lyapunov drift is defined as:
$\Lambda_t = \mathbb{E}\{L(t+1) - L(t) \mid X(t)\}$,
where $X(t) = (X_{B,t}, X_{H,t})$ and $\mathbb{E}\{\cdot\}$ denotes the expectation.
The single-slot Lyapunov drift $\Lambda_t$ can then be upper-bounded as:
$\Lambda_t \le \xi_B + \xi_H + \mathbb{E}\{\Gamma_0 \mid X(t)\}$.
The weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost is calculated as:
where V is a weighting parameter.
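In Lyapunov optimization, minimizing such a weighted sum slot by slot is known as a drift-plus-penalty rule. A larger $V$ puts more weight on operating cost, and a smaller $V$ puts more weight on keeping the virtual queues, and hence the storage levels, in check.

The Python sketch below picks, among a few candidate decisions, the one with the smallest one-sample estimate of $\Delta Y(t)$, namely $L(t+1) - L(t) + V \cdot \text{cost}$. It rests on these assumptions:

- The Lyapunov function has the quadratic form $\tfrac{1}{2}(X_B^2 + \omega_r X_H^2)$.
- The conditional expectation is replaced by one known outcome per candidate.
- The candidates and numbers are hypothetical.

```python
def lyapunov(X_B, X_H, omega_r):
    """Assumed quadratic Lyapunov function."""
    return 0.5 * (X_B ** 2 + omega_r * X_H ** 2)


def delta_Y(X_now, X_next, cost, omega_r, V):
    """One-sample estimate of the drift plus V-weighted operating cost."""
    drift = lyapunov(*X_next, omega_r) - lyapunov(*X_now, omega_r)
    return drift + V * cost


def best_action(X_now, candidates, omega_r, V):
    """candidates: (name, (X_B_next, X_H_next), slot_cost); return the best name."""
    return min(candidates,
               key=lambda c: delta_Y(X_now, c[1], c[2], omega_r, V))[0]


X_now = (-20.0, -5.0)  # both virtual queues below zero
candidates = [
    ("charge battery", (-11.0, -5.0), 8.0),  # costs money but raises X_B
    ("idle", (-20.0, -5.0), 0.0),
]
print(best_action(X_now, candidates, omega_r=1.0, V=10.0))
print(best_action(X_now, candidates, omega_r=1.0, V=100.0))
```

With $V = 10$, the drift reduction outweighs the purchase cost and charging wins. With $V = 100$, cost dominates and the idle option is chosen.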
By minimizing the weighted sum $\Delta Y(t)$, the expected operating cost minimization problem of the hydrogen-containing building energy system is converted into a series of single-slot optimization subproblem models, each expressed as:
s.t. the operating constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem.
The optimal system parameters of the single-slot optimization subproblem model are then determined. The parameter $W_B$ of the optimal electrical energy storage system is calculated as:
The parameter $W_H$ of the optimal hydrogen energy storage system is calculated as:
The single-slot optimization subproblem model is then decomposed, according to the certainty of the available information, into an upper-layer subproblem model for the electric-hydrogen subsystem and a lower-layer subproblem model for the thermal energy subsystem.
This decomposition proceeds as follows:
The upper-layer subproblem model for the electric-hydrogen subsystem is:
s.t. the operating constraints of the electrical energy subsystem and the hydrogen energy subsystem;
The lower-layer subproblem model for the thermal energy subsystem is:
$\min \; V\,(C_{5,t} + C_{6,t})$
s.t. the operating constraints of the thermal energy subsystem.
The upper-layer subproblem model is solved by convex optimization, and the heat production of the fuel cell is calculated from its solution, as follows:
Because the objective function of the upper-layer subproblem is non-convex, it is convexly relaxed by adjusting the objective function to:
The maximum difference between the adjusted and the original objective function is:
After this adjustment, the whole problem becomes a linear program, so its optimal solution can be obtained quickly. The heat production of the fuel cell is then obtained from the solution as $Q_{fc,t} = \eta_{hr}\eta_{h2e}P_{fc,t}\Delta t$. Here $\eta_{hr}$ is the heat recovery efficiency, $\eta_{h2e}$ is the heat-to-power (thermoelectric) ratio of the fuel cell, and $P_{fc,t}$ is the fuel cell output power.
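The hand-off from the upper layer to the lower layer uses a single formula, $Q_{fc,t} = \eta_{hr}\eta_{h2e}P_{fc,t}\Delta t$. The Python sketch below evaluates it and packs the result into a dictionary that serves as the lower-layer input. The dictionary layout and the numeric values are hypothetical.

```python
def fuel_cell_heat(P_fc, eta_hr, eta_h2e, dt):
    """Q_fc,t = eta_hr * eta_h2e * P_fc,t * dt (recovered fuel-cell heat)."""
    return eta_hr * eta_h2e * P_fc * dt


def lower_layer_input(upper_solution, eta_hr, eta_h2e, dt):
    """Pass the fuel-cell heat from the upper-layer solution to the lower layer."""
    return {"Q_fc": fuel_cell_heat(upper_solution["P_fc"], eta_hr, eta_h2e, dt)}


# Hypothetical upper-layer result: fuel cell delivering 10 kW for a 1 h slot.
print(lower_layer_input({"P_fc": 10.0}, eta_hr=0.8, eta_h2e=1.2, dt=1.0))
```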
The heat production of the fuel cell is taken as an input state of the lower-layer subproblem model, and the lower-layer subproblem model is re-formulated within a Markov game framework as follows:
The environment state of the thermal energy subsystem is:
$s_t = (Q_{fc,t}, Q_{th,t}, \beta_{in,i,t}, \beta_{out,t}, t)$,
where:

- $Q_{fc,t}$ is the heat production of the fuel cell in slot $t$;
- $Q_{th,t}$ is the energy level of the thermal energy storage system in the thermal energy subsystem in slot $t$;
- $\beta_{in,i,t}$ is the indoor temperature of room $i$ in slot $t$;
- $\beta_{out,t}$ is the outdoor temperature in slot $t$;
- $t$ is the time interval between two consecutive action decisions of the hydrogen-containing building energy system;
- $\eta_{tc}$ and $\eta_{td}$ are the injection and release efficiencies of the thermal energy storage system;
- $P_{tc,t}$ and $P_{td,t}$ are the injection and release powers of the thermal energy storage system in slot $t$.
The action of the thermal energy subsystem is:
$a_t = (P_{sp,1,t}, P_{sp,2,t}, \ldots, P_{sp,i,t}), \quad 1 \le i \le N_b$,
where $P_{sp,i,t}$ is the heat supply power of room $i$ in slot $t$, and $N_b$ is the number of rooms;
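Actor networks often end in a tanh layer whose outputs lie in $[-1, 1]$. Under that assumption, which the source does not state, the outputs can be mapped to the heat supply powers in $a_t$ and kept within the maximum heat supply power, as in this Python sketch. The function name and the linear mapping are hypothetical.

```python
def scale_actions(raw_outputs, P_sp_max):
    """Map actor outputs in [-1, 1] to heat supply powers in [0, P_sp_max[i]].

    Values outside [-1, 1] are clipped first, so the result never exceeds the
    maximum heat supply power.
    """
    actions = []
    for r, p_max in zip(raw_outputs, P_sp_max):
        r = min(max(r, -1.0), 1.0)               # clip to the tanh range
        actions.append((r + 1.0) / 2.0 * p_max)  # linear map to [0, p_max]
    return actions


print(scale_actions([-1.0, 0.0, 1.0, 2.0], [10.0, 10.0, 10.0, 10.0]))
```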
The reward of the thermal energy subsystem is:
The lower-layer subproblem is solved with a multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem, as follows:
At the beginning of each time slot, the environment state of the thermal energy subsystem is obtained;
based on the current environment state of the thermal energy subsystem, the deep neural network outputs the current heat supply action of the hydrogen-containing building energy system to control the thermal energy subsystem;
the reward and the environment state of the next time slot are obtained, and the reward and environment state of each time slot are stored in an experience pool;
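The experience pool in these steps behaves like a standard replay buffer. The Python sketch below stores one transition per slot and samples mini-batches uniformly at random. It also stores the action, which actor-critic training needs even though the step above mentions only rewards and states. The class and method names are hypothetical.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size replay buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


pool = ExperiencePool(capacity=3)
for t in range(5):
    pool.store(state=t, action=0.5 * t, reward=-float(t), next_state=t + 1)
print(len(pool), pool.sample(2))
```

Because the buffer has a capacity of 3, only the transitions from slots 2, 3 and 4 remain after the loop.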
The loss function $L(\theta_i)$ and the policy gradient of the deep neural network are computed. Training samples are extracted from the experience pool, and the deep neural network is trained with the multi-agent attention deep deterministic policy gradient algorithm. The network is iterated using the loss function $L(\theta_i)$ and the policy gradient to obtain the optimal control strategy of the thermal energy subsystem.
The loss function $L(\theta_i)$ and the policy gradient used to train the deep neural network are:
where:

- $\pi$ is the agent's policy (represented by the actor network);
- $y$ is the Q value output by the target critic network;
- $\pi'$ is the agent's target policy (represented by the target actor network);
- the critic network of agent $i$ outputs its Q value under policy $\pi$;
- $\pi_i(a_i \mid o_i)$ is the actor network output of agent $i$.
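The loss formula is not reproduced here. In MADDPG-style methods, the critic is usually trained by minimizing the mean squared error between its Q value and a target $y = r + \gamma Q'$, where $Q'$ is the target critic's value for the next observations and the target actors' next actions. The Python sketch below computes that target and loss for a batch. It assumes a discount factor $\gamma$ and a terminal flag, neither of which appears in the text above. It illustrates the standard form, not the patent's exact expression.

```python
def td_targets(rewards, next_q_target, gamma, dones=None):
    """y = r + gamma * Q'(o', a') per sample; no bootstrap after terminal steps."""
    if dones is None:
        dones = [False] * len(rewards)
    return [r + (0.0 if d else gamma * q)
            for r, q, d in zip(rewards, next_q_target, dones)]


def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and TD targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


y = td_targets(rewards=[1.0, 0.0], next_q_target=[2.0, 4.0], gamma=0.5)
print(y, critic_loss([2.0, 1.0], y))
```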
The multi-agent attention deep deterministic policy gradient architecture comprises $i$ agents, each with its own deep neural network. Each network consists of an actor network, a target actor network, a critic network, and a target critic network. The actor and target actor networks share the same structure, as do the critic and target critic networks.
The number of neurons in the actor network's input layer equals the number of components of the environment state $s_t$, and the number of neurons in its output layer equals the number of components of the action $a_t$. Each agent's critic network comprises an action encoder module, an attention mechanism module, and a multilayer perceptron module.
In the attention mechanism, the input of agent $i$'s actor network is $o_i$ and its output is $a_i$. The inputs of the critic network include $o_i$ and $a_i$ together with the other agents' observations and actions, and its output is $Q_i(o, a)$. Here:

- $o_i$ is the local observation of agent $i$;
- $a_i$ is its output action;
- $e_i$ is the encoding of agent $i$'s local observation and action;
- $Q_i(o, a)$ is the Q value output by the critic network.

In agent $i$'s critic network, the attention module takes the other agents' encodings as input and outputs $x_i$, which represents the contribution of the other agents.
The contribution $x_i$ of the other agents is:
where $W_{value,j}$ is the value transformation matrix associated with agent $j$, followed by a non-linear activation function;
$w_j$ is the weight associated with agent $j$.
The weight $w_j$ associated with agent $j$ is:
where $W_{key,i}$ and $W_{query,i}$ are the transformation matrices associated with agent $i$.
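The contribution $x_i$ matches the attention critic used in multi-actor-attention-critic (MAAC) style methods. Because the formulas are not reproduced here, the dependency-free Python sketch below assumes the common form:

- $w_j \propto \exp(e_j^\top W_{key,i}^\top W_{query,i}\, e_i)$, normalized over $j \ne i$;
- $x_i = \sum_{j \ne i} w_j\, h(W_{value,j}\, e_j)$, with ReLU as the non-linear activation $h$.

The matrices are small toy values, and the helper names are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(z):
    return max(z, 0.0)


def attention_contribution(i, encodings, W_query_i, W_key_i, W_values, h=relu):
    """Return (x_i, weights) for agent i from the encodings e_j of all agents."""
    query = matvec(W_query_i, encodings[i])
    others = [j for j in range(len(encodings)) if j != i]
    scores = [sum(k * q for k, q in zip(matvec(W_key_i, encodings[j]), query))
              for j in others]
    shift = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the other agents
    x_i = [0.0] * len(W_values[others[0]])
    for w, j in zip(weights, others):
        value = [h(z) for z in matvec(W_values[j], encodings[j])]
        x_i = [acc + w * v for acc, v in zip(x_i, value)]
    return x_i, weights


I2 = [[1.0, 0.0], [0.0, 1.0]]               # identity as a toy transformation
enc = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # encodings e_0, e_1, e_2
x0, w = attention_contribution(0, enc, I2, I2, [I2, I2, I2])
print(x0, w)
```

Agent 1's encoding matches agent 0's query, so agent 1 receives the larger attention weight. A trained critic would learn these matrices instead of using identities.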
Finally, the operation of the hydrogen-containing building energy system is controlled in real time according to the convex optimization solution of the upper-layer subproblem model and the optimal control strategy of the thermal energy subsystem.
Figure 3 compares the performance of the method of the invention with several comparison schemes:

- Scheme 1 controls the electrical and hydrogen energy storage systems jointly by a rule: both are charged when renewable energy is in surplus and discharged otherwise. Building heat supply power is controlled by an ON-OFF strategy: when the indoor temperature is below the lower limit, the input thermal power is the maximum heat supply power, and when it is above the upper limit, the input thermal power is 0.
- Scheme 2 uses a Deep Q-Network (DQN) algorithm to control the electrical and hydrogen energy storage systems, again with the ON-OFF strategy for building heat supply.
- Scheme 3 uses a multi-agent deep deterministic policy gradient (MADDPG) algorithm for joint control of all energy storage devices and thermal loads.
- Scheme 4 is similar to the method of the invention but omits the attention mechanism.

As can be seen from Fig. 4, the method of the invention significantly reduces the operating cost while maintaining high thermal comfort (average temperature deviation below 0.03 ℃). Its average operating cost is 30.09%, 20.31%, 25.66%, and 18.53% lower than those of Schemes 1, 2, 3, and 4, respectively.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. An operation control method for a hydrogen-containing building energy system combining optimization and learning, characterized by comprising the following steps: establishing an expected operating cost minimization problem model of the hydrogen-containing building energy system according to its operating constraints and parameter uncertainty; converting the expected operating cost minimization problem into a plurality of single-slot optimization subproblem models using a Lyapunov optimization framework;
decomposing the single-slot optimization subproblem model into an upper-layer subproblem model for the electric-hydrogen subsystem and a lower-layer subproblem model for the thermal energy subsystem;
solving the upper-layer subproblem model by a convex optimization method, and calculating the heat production of the fuel cell from the solution of the upper-layer subproblem;
taking the heat production of the fuel cell as an input state of the lower-layer subproblem model; re-formulating the lower-layer subproblem model within a Markov game framework, and solving it with a multi-agent attention deep deterministic policy gradient algorithm to obtain the optimal control strategy of the thermal energy subsystem;
and controlling the operation of the hydrogen-containing building energy system in real time according to the convex optimization solution of the upper-layer subproblem model and the optimal control strategy of the thermal energy subsystem.
2. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 1, wherein the expected operating cost minimization problem model of the hydrogen-containing building energy system is expressed as:
s.t. the operating constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem;
where $C_{1,t}$ is the cost of buying and selling electricity in slot $t$; $C_{2,t}$ is the carbon emission cost in slot $t$; $C_{3,t}$ is the loss cost of the electrical energy storage system in slot $t$; $C_{4,t}$ is the operation and maintenance cost of the hydrogen energy subsystem in slot $t$; $C_{5,t}$ is the loss cost of the thermal energy subsystem in slot $t$; $C_{6,t}$ is the natural gas purchase cost in slot $t$; $\Delta t$ is the time slot length; and the decision variables $\Theta$ include the energy traded between the local energy system and the main power grid, the charging and discharging powers of the electrical energy storage system, the input power of the electrolyzer, the output power of the fuel cell, the heat supply power of each room, the charging and discharging powers of the thermal energy storage system, and the natural gas consumption.
3. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 2, wherein converting the expected operating cost minimization problem into a plurality of single-slot optimization subproblem models using the Lyapunov optimization framework comprises:
judging the controllability of the hydrogen-containing building energy system; for a hydrogen-containing building energy system that satisfies the controllability conditions, constructing virtual queues for the electrical energy subsystem and the hydrogen energy subsystem; defining a Lyapunov function from the virtual queues, and calculating the weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost; and converting the expected operating cost minimization problem model of the hydrogen-containing building energy system into a plurality of single-slot optimization subproblem models by minimizing the weighted sum $\Delta Y(t)$, and determining the optimal system parameters in the single-slot optimization subproblem models.
4. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 3, wherein the controllability conditions are:
$v_{max} > \tau_{max}$,
$v_{min} > \tau_{min}$,
where $v_{max}$ and $v_{min}$ are the highest and lowest electricity buying prices; $\tau_{max}$ and $\tau_{min}$ are the highest and lowest electricity selling prices; $\eta_{bc}$ and $\eta_{bd}$ are the charging and discharging efficiencies of the electrical energy storage system; $\mu_c$ is a weighting parameter representing the importance of carbon emissions relative to energy costs; two further symbols denote the maximum and minimum carbon emission rates; $\psi_{BESS}$ is the depreciation coefficient of the electrical energy storage system; $B_{max}$ and $B_{min}$ are the maximum and minimum energy levels of the electrical energy storage system; two further symbols denote its rated injection (charging) and release (discharging) powers; $\omega_{el}$ and $\omega_{fc}$ are the conversion coefficients of the electrolyzer and the fuel cell; two indicator variables denote whether the electrolyzer and the fuel cell are on or off; $H_{max}$ and $H_{min}$ are the maximum and minimum energy levels of the hydrogen energy storage system; two further symbols denote the rated powers of the electrolyzer and the fuel cell; and $\Delta t$ is the slot length.
5. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 4, wherein calculating the weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost comprises:
the Lyapunov function $L(t)$ is defined as:
where $X_{B,t} = B_t + W_B$ and $X_{H,t} = H_t + W_H$; $\omega_r$ is a weighting factor that unifies the dimensions of $X_{B,t}$ and $X_{H,t}$; $B_t$ is the energy level of the electrical energy storage system in slot $t$; $H_t$ is the energy level of the hydrogen energy storage system in slot $t$; $W_B$ is the parameter of the optimal electrical energy storage system; and $W_H$ is the parameter of the optimal hydrogen energy storage system;
$B_t$ and $H_t$ must satisfy the following dynamic constraints:
where $P_{bc,t}$ and $P_{bd,t}$ are the charging and discharging powers of the electrical energy storage system, and $P_{el,t}$ and $P_{fc,t}$ are the input power of the electrolyzer and the output power of the fuel cell in slot $t$;
the single-slot Lyapunov drift is defined as:
$\Lambda_t = \mathbb{E}\{L(t+1) - L(t) \mid X(t)\}$,
where $X(t) = (X_{B,t}, X_{H,t})$ and $\mathbb{E}\{\cdot\}$ denotes the expectation;
the single-slot Lyapunov drift $\Lambda_t$ can then be upper-bounded as:
$\Lambda_t \le \xi_B + \xi_H + \mathbb{E}\{\Gamma_0 \mid X(t)\}$,
the weighted sum $\Delta Y(t)$ of the single-slot Lyapunov drift and the operating cost is calculated as:
where V is a weighting parameter.
6. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 5, wherein the single-slot optimization subproblem model is expressed as:
s.t. the operating constraints of the electrical energy subsystem, the hydrogen energy subsystem, and the thermal energy subsystem;
the parameter $W_B$ of the optimal electrical energy storage system is calculated as:
the parameter $W_H$ of the optimal hydrogen energy storage system is calculated as:
7. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 6, wherein decomposing the single-slot optimization subproblem model, according to the certainty of the available information, into an upper-layer subproblem model for the electric-hydrogen subsystem and a lower-layer subproblem model for the thermal energy subsystem comprises:
the upper-layer subproblem model for the electric-hydrogen subsystem is expressed as:
s.t. the operating constraints of the electrical energy subsystem and the hydrogen energy subsystem;
the lower-layer subproblem model for the thermal energy subsystem is expressed as:
$\min \; V\,(C_{5,t} + C_{6,t})$
s.t. the operating constraints of the thermal energy subsystem.
8. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 7, wherein re-formulating the lower-layer subproblem model within the Markov game framework comprises:
the environment state of the thermal energy subsystem is:
$s_t = (Q_{fc,t}, Q_{th,t}, \beta_{in,i,t}, \beta_{out,t}, t)$,
where $Q_{fc,t}$ is the heat production of the fuel cell in slot $t$; $Q_{th,t}$ is the energy level of the thermal energy storage system in the thermal energy subsystem in slot $t$; $\beta_{in,i,t}$ is the indoor temperature of room $i$ in slot $t$; $\beta_{out,t}$ is the outdoor temperature in slot $t$; $t$ is the time interval between two consecutive action decisions of the hydrogen-containing building energy system; $\eta_{tc}$ and $\eta_{td}$ are the injection and release efficiencies of the thermal energy storage system in the thermal energy subsystem; and $P_{tc,t}$ and $P_{td,t}$ are the injection and release powers of the thermal energy storage system in slot $t$;
the action of the thermal energy subsystem is:
$a_t = (P_{sp,1,t}, P_{sp,2,t}, \ldots, P_{sp,i,t}), \quad 1 \le i \le N_b$,
where $P_{sp,i,t}$ is the heat supply power of room $i$ in slot $t$, and $N_b$ is the number of rooms;
the reward of the thermal energy subsystem is:
9. The operation control method for a hydrogen-containing building energy system combining optimization and learning according to claim 8, wherein solving with the multi-agent attention deep deterministic policy gradient algorithm comprises:
at the beginning of each time slot, obtaining the environment state of the thermal energy subsystem;
outputting, by the deep neural network and based on the current environment state of the thermal energy subsystem, the current heat supply action of the hydrogen-containing building energy system to control the thermal energy subsystem;
obtaining the reward and the environment state of the next time slot, and storing the reward and environment state of each time slot in an experience pool;
computing the loss function $L(\theta_i)$ and the policy gradient of the deep neural network: extracting training samples from the experience pool, training the deep neural network with the multi-agent attention deep deterministic policy gradient algorithm, and iterating the deep neural network using the loss function $L(\theta_i)$ and the policy gradient to obtain the optimal control strategy of the thermal energy subsystem;
the loss function $L(\theta_i)$ and the policy gradient used to train the deep neural network are:
where $\pi$ is the agent's policy (represented by the actor network); $y$ is the Q value output by the target critic network; $\pi'$ is the agent's target policy (represented by the target actor network); the critic network of agent $i$ outputs its Q value under policy $\pi$; and $\pi_i(a_i \mid o_i)$ is the actor network output of agent $i$.
10. The method for controlling the operation of the hydrogen-containing building energy system through combined optimization and learning of claim 9 is characterized in that a multi-agent attention depth certainty strategy gradient algorithm framework comprises i agents, each agent is provided with a single deep neural network, and each deep neural network comprises an actor network, a target actor network, a critic network and a target critic network; the actor network and the target actor network have the same structure, and the critic network and the target critic network have the same structure;
neuron number and environment state s of actor network input layer t The number of the components of (a) is the same, the number of the neurons of the output layer is the same as the behavior a t The number of the groups is the same; the critic network of the intelligent agent comprises an action behavior encoder module, an attention mechanism module and a multilayer perceptron module;
the input to the i-th agent actor network in the attention mechanism module is o i Output is a i (ii) a The input to the critic network includes o i 、a i Andthe output is Q i (o,a),
Wherein o is i Is the local observed state of the ith agent; a is a i Is an action of output; e.g. of the type i Code representing local observations and behaviors of the ith agent; q i (o, a) is the Q value of the critic network output, and in the critic network of the ith agent, the input to the attention module isThe output is x i ,x i Represents contributions of other agents;
the contribution $x_i$ of the other agents is expressed as follows:
where $W_{\mathrm{value},j}$ is the value transformation matrix associated with the $j$-th agent, and … is a nonlinear activation function;
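One form consistent with these definitions, following the actor-attention-critic construction (Iqbal and Sha, 2019), is a weighted sum of transformed encodings of the other agents. Taking the encodings $e_j$ as inputs and writing the activation as $h(\cdot)$ are assumptions here, so the claim's exact expression may differ.

```latex
x_i = \sum_{j \neq i} w_j \, v_j = \sum_{j \neq i} w_j \, h\!\left(W_{\mathrm{value},j} \, e_j\right)
```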
$w_j$ is the weight associated with the $j$-th agent;
the weight $w_j$ associated with the $j$-th agent is expressed as:
where $W_{\mathrm{key},i}$ and $W_{\mathrm{query},i}$ are, respectively, the key and query transformation matrices associated with the $i$-th agent.
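A NumPy sketch of the attention step, assuming the common form $w_j \propto \exp\big((W_{\mathrm{key},i} e_j)^{\top} (W_{\mathrm{query},i} e_i)\big)$ normalized by a softmax over $j \neq i$, and $x_i = \sum_{j \neq i} w_j \, h(W_{\mathrm{value},j} e_j)$. The $1/\sqrt{d_k}$ scaling, the leaky-ReLU choice for $h$, all dimensions, and the name `attention_contribution` are assumptions, not details confirmed by the claim.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Nonlinear activation h(.) (assumed to be leaky ReLU)."""
    return np.where(z > 0, z, slope * z)

def attention_contribution(i, encodings, W_key, W_query, W_value):
    """Return (w, x_i): attention weights over agents j != i and the aggregated contribution.

    encodings: array (N, d_e) of per-agent encodings e_j.
    W_key, W_query: (d_k, d_e) matrices associated with agent i.
    W_value: list of N (d_v, d_e) matrices, one per agent j.
    """
    others = [j for j in range(len(encodings)) if j != i]
    query = W_query @ encodings[i]                           # W_query,i e_i
    keys = np.stack([W_key @ encodings[j] for j in others])  # W_key,i e_j for each j != i
    scores = keys @ query / np.sqrt(query.shape[0])          # scaled dot products (assumed scaling)
    scores -= scores.max()                                   # shift for numerical stability
    w = np.exp(scores) / np.exp(scores).sum()                # softmax over j != i
    values = np.stack([leaky_relu(W_value[j] @ encodings[j]) for j in others])
    x_i = w @ values                                         # sum_j w_j h(W_value,j e_j)
    return w, x_i

rng = np.random.default_rng(2)
N, d_e, d_k, d_v = 4, 8, 8, 8  # hypothetical sizes
e = rng.standard_normal((N, d_e))
W_k = rng.standard_normal((d_k, d_e))
W_q = rng.standard_normal((d_k, d_e))
W_v = [rng.standard_normal((d_v, d_e)) for _ in range(N)]
w, x0 = attention_contribution(0, e, W_k, W_q, W_v)
```

Because the weights come from a softmax, they are positive and sum to one, so $x_i$ is a convex combination of the other agents' transformed values.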
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210631486.3A CN115130733B (en) | 2022-06-06 | 2022-06-06 | Hydrogen-containing building energy system operation control method combining optimization and learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115130733A true CN115130733A (en) | 2022-09-30 |
CN115130733B CN115130733B (en) | 2024-07-09 |
Family ID: 83378492
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210631486.3A (Active; CN115130733B) | 2022-06-06 | 2022-06-06 | Hydrogen-containing building energy system operation control method combining optimization and learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115130733B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458443A (en) * | 2019-08-07 | 2019-11-15 | Nanjing University of Posts and Telecommunications | Smart home energy management method and system based on deep reinforcement learning |
US20200301924A1 (en) * | 2019-03-20 | 2020-09-24 | Guangdong University Of Technology | Method for constructing sql statement based on actor-critic network |
CN112966444A (en) * | 2021-03-12 | 2021-06-15 | Nanjing University of Posts and Telecommunications | Intelligent energy optimization method and device for building multi-energy system |
US20220036392A1 (en) * | 2020-08-03 | 2022-02-03 | Desong Bian | Deep Reinforcement Learning Based Real-time scheduling of Energy Storage System (ESS) in Commercial Campus |
Also Published As
Publication number | Publication date |
---|---|
CN115130733B (en) | 2024-07-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||