CN116914732A

CN116914732A - Deep reinforcement learning-based low-carbon scheduling method and system for cogeneration system

Info

Publication number: CN116914732A
Application number: CN202310854951.4A
Authority: CN
Inventors: 皮润一; 孙中海; 郑伟铭; 杨超
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-10-20

Abstract

The application discloses a deep reinforcement learning-based low-carbon scheduling method and system for a cogeneration system. The method comprises the following steps: integrating carbon capture technology and power-to-gas (P2G) technology into the cogeneration system to obtain an improved cogeneration system; constructing mathematical models of the equipment of the improved cogeneration system and adding constraint conditions; designing a Markov decision process for the improved cogeneration system based on the constraint conditions to obtain a multi-objective optimization function; and performing interactive training on the improved cogeneration system by using an improved TD3-based scheduling algorithm to obtain an optimal low-carbon economic scheduling strategy. The system comprises: an improved cogeneration system module, a model building module, a constraint module, a Markov decision design module, and an improved TD3-based scheduling algorithm module. By using the application, the balance of heat and electricity supply in the cogeneration system can be ensured, and the profit of the system can be maximized while its carbon emissions are minimized. The method and the system can be widely applied in the technical field of smart grids.

Description

Deep reinforcement learning-based low-carbon scheduling method and system for cogeneration system

Technical Field

The application relates to the technical field of smart grids, in particular to a deep reinforcement learning-based low-carbon scheduling method and system for a cogeneration system.

Background

Global warming and environmental pollution have become major challenges facing human society, and there is an urgent need to develop a low-carbon economy in order to reduce CO₂ emissions. Research on the optimal scheduling of existing energy systems has accordingly been moving toward low-carbon economic scheduling. Compared with a traditional power distribution network, combined heat and power (CHP) can combine multiple energy sources, such as electricity, gas and heat, and thereby make use of the waste heat generated by a traditional energy supply system. However, distributed power equipment that operates independently can cause unreasonable resource allocation. With the development of computing and communication technologies such as smart grid wireless communication networks and edge computing, the emergence of coupled cogeneration networks addresses this problem. Various distributed energy resources (DERs) are aggregated through advanced computing technology and communication systems, and can provide energy storage and flexible load functions. However, most existing cogeneration systems consider only electric loads: the thermal system merely supplements the electric power system, and the thermal and electric systems cannot be coupled to supply energy to each other, so the energy utilization rate is low and the energy conversion loss is high.

Disclosure of Invention

In order to solve the technical problems, the application aims to provide a deep reinforcement learning-based low-carbon scheduling method and system for a cogeneration system, which can ensure the thermoelectric supply balance of the cogeneration system and realize the maximization of profit and the minimization of carbon emission of the system.

The first technical scheme adopted by the application is as follows: a deep reinforcement learning-based low-carbon scheduling method for a cogeneration system comprises the following steps:

the carbon capture technology and the P2G technology are integrated into the cogeneration system, so that an improved cogeneration system is obtained;

modeling various devices of the improved cogeneration system based on the operating characteristics of the devices to obtain a mathematical model of the devices;

constructing constraint conditions according to the equipment mathematical model;

carrying out Markov decision process design on the improved cogeneration system based on constraint conditions to obtain a multi-objective optimization function;

and performing interactive training on the improved cogeneration system by using an improved TD3-based scheduling algorithm to obtain an optimal low-carbon economic scheduling strategy.

Further, the plant mathematical model includes a wind turbine mathematical model, a photovoltaic generator mathematical model, a gas turbine mathematical model, a waste heat boiler mathematical model, a gas boiler mathematical model, an energy storage system mathematical model, a carbon capture unit mathematical model, and a P2G unit mathematical model, wherein:

the mathematical model of the carbon capture unit has the following expression:

wherein $P^{cc,b}$ is the basic energy consumption of the carbon capture unit, and the other two energy terms are its total energy consumption and its operating energy consumption in period t; $\varepsilon^{e,cc}$ and $\eta^{cc}$ respectively represent the electric power consumed per unit amount of CO₂ passing through the capture unit and the carbon capture efficiency; the two weight terms respectively represent the total weight of CO₂ emitted and the upper limit of the CO₂ weight; and the two intensity coefficients respectively represent the specific carbon emission intensity coefficients of the gas turbine and of the gas boiler.

The expression of the P2G unit mathematical model is as follows:

wherein the three CO₂ terms respectively represent the weight of CO₂ captured during period t, the weight of the sequestered portion of that CO₂, and the weight of the portion of CO₂ provided to the P2G device; the proportion coefficient represents the share of the captured CO₂ that is sequestered; the two output terms respectively represent the weight of natural gas produced by the P2G device and the electric energy consumed by the P2G device; $\eta^{p2g}$ and $\varepsilon^{e,p2g}$ respectively represent the conversion efficiency and the electric energy consumed when a unit amount of CO₂ is converted into natural gas; and $\lambda^{p2g}$ is an adjustment parameter used to eliminate the dimensional difference between the two quantities it relates.
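As a non-limiting illustration, the split of captured CO₂ and the P2G conversion described above can be sketched as follows. The expressions themselves are not reproduced in this text, so the linear forms below are assumptions, the function and parameter names (`split_captured_co2`, `p2g_output`, `seq_ratio`) are hypothetical, and the adjustment parameter $\lambda^{p2g}$ is omitted for simplicity.

```python
def split_captured_co2(m_captured, seq_ratio):
    """Split captured CO2 into a sequestered part and a part sent to the P2G device.

    seq_ratio is the share of captured CO2 that is sequestered (assumed in [0, 1]).
    """
    if not 0.0 <= seq_ratio <= 1.0:
        raise ValueError("seq_ratio must lie in [0, 1]")
    m_sequestered = seq_ratio * m_captured
    m_to_p2g = m_captured - m_sequestered
    return m_sequestered, m_to_p2g


def p2g_output(m_to_p2g, eta_p2g, eps_e_p2g):
    """Assumed linear P2G model: gas produced and electricity consumed per unit CO2."""
    gas_weight = eta_p2g * m_to_p2g      # conversion efficiency times CO2 supplied
    electricity = eps_e_p2g * m_to_p2g   # electric energy per unit CO2 converted
    return gas_weight, electricity
```

For example, with 10 units of captured CO₂ and a sequestration share of 0.25, 2.5 units are sequestered and 7.5 units are supplied to the P2G device.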

Through this preferred step, the performance parameters of the various devices of the improved cogeneration system are quantified, which makes it more convenient to construct constraint conditions from the quantified performance parameters.

Further, the constraint conditions specifically include:

the power balancing constraint is expressed as follows:

wherein the terms of the electric power balance are, in order: the interconnection power between the virtual power plant and the utility grid in period t; the power generated by the $n^{wt}$-th wind turbine in period t; the output power of the $n^{pv}$-th photovoltaic generator in period t; the output power of the $n^{gt}$-th gas turbine in period t; the discharge power of the $n^{es}$-th electrical energy storage unit in period t; the sum of the electric load powers on the energy consumption side; the charging power of the $n^{es}$-th electrical energy storage unit in period t; the electric energy consumed by the P2G device; and the total energy consumption of the carbon capture unit in period t. The terms of the thermal power balance are: the output thermal power of the $n^{whb}$-th waste heat boiler in period t; the output thermal power of the gas boiler in period t; the sum of the thermal load powers on the energy consumption side; and the charging thermal power of the $n^{hs}$-th thermal energy storage unit in period t.

A distributed generation constraint, the expression of which is as follows:

wherein, for the $n^{dg}$-th distributed generator, the first two terms respectively represent its continuous start-up time and continuous stop time up to time t; the state variable represents its state in period t; the remaining four parameters respectively represent its minimum allowable start-up time, minimum allowable stop time, maximum ramp of power generation and maximum power generation; and $dg \in \{wt, pv\}$ is the distributed generator type identifier.

Device operation constraints, the expressions of which are as follows:

wherein $u \in \{gt, whb, gb, cc, p2g\}$ is the device type identifier; the state variable represents the state of the $n^{u}$-th device in period t; and the remaining four parameters are respectively the minimum allowed start-up time, maximum allowed down time, maximum ramp power and maximum power of the $n^{u}$-th device.

The energy storage system constraint is expressed as follows:

wherein the first group of parameters respectively represent the minimum stored energy, maximum stored energy, maximum charging power, charge/discharge power and maximum discharging power of the $n^{es}$-th electrical energy storage unit; and the second group are respectively the minimum stored heat, maximum stored heat, maximum charging thermal power, charge/discharge thermal power and maximum discharging thermal power of the $n^{hs}$-th thermal energy storage unit.

Other constraints, the expressions of which are as follows:

wherein the first two terms respectively represent the minimum value and the maximum value of the interconnection power with the utility grid; the next two terms represent the capacity of line y in period t and the maximum capacity of line y; the voltage term represents the voltage of bus h in period t, and its two bounds respectively represent the minimum and maximum limits of the bus voltage; one proportion coefficient is the ratio of natural gas consumption to total gas supply in period t; and the other proportion coefficient is the ratio of the weight of the sequestered CO₂ to the weight of the captured CO₂.

Further, the expression of the multi-objective optimization function is as follows:

$$R = C^{I} - C^{II} + C^{B}$$

wherein R represents the multi-objective optimization function; $C^{I}$, $C^{II}$ and $C^{B}$ respectively represent the operating cost of the virtual power plant, the penalty cost, and a fixed parameter ensuring that R is always positive. In the component expressions, the four price terms respectively represent the electricity selling price, electricity purchase price, carbon emission selling price and natural gas price in period t; the two carbon terms represent the carbon emission quota and the total carbon emission; $u \in \{gt, whb, gb, cc, p2g\}$ is the device type identifier; the device power term represents the power consumption of the $n^{u}$-th device of type u; $N^{u}$ represents the total number of devices of type u; $\omega^{u}$ represents the unit operation and maintenance cost of the $n^{u}$-th device; the gas supply term represents the total supply of natural gas; $\rho$ represents the regional carbon emission coefficient determined by the baseline method; the emission term represents the total weight of CO₂ emitted; $\eta^{cc}$ represents the carbon capture efficiency; the emission penalty term represents the penalty cost per unit amount of CO₂ generated by the gas turbine and the gas boiler; the intensity coefficient represents the specific carbon emission intensity coefficient of the gas turbine; the two gas turbine terms represent the output electric power and the output thermal power of the gas turbine in period t; $\psi^{wt}$ and $\psi^{pv}$ represent the penalty factors for wind power curtailment and photovoltaic curtailment; the two forecast terms represent the predicted output power of the wind power generator and the photovoltaic generator in period t, and the two corresponding output terms represent their output power in period t; the constraint penalty term represents the penalty factor for exceeding a constraint, and its bound is the maximum constraint value of action $a^{\chi}$; and $\chi$ represents the adjustment parameter used to eliminate dimensional differences in the cost of exceeding constraints.

Through this preferred step, a multi-objective optimization function is established that takes into account revenue, operating costs, carbon trading, curtailment and constraint-violating behavior, so as to maximize the profit and minimize the carbon emissions of the improved cogeneration system.

Further, the improved TD3-based scheduling algorithm specifically expands the single experience replay pool of the TD3 algorithm into two independent experience replay pools, namely a current-reward experience replay pool and a long-term experience replay pool.

By this preferred step, experience samples of different time scales can be stored.

Further, the step of performing interactive training on the improved cogeneration system by using the improved scheduling algorithm based on TD3 to obtain an optimal low-carbon economic scheduling strategy specifically comprises the following steps:

acquiring the operation data of the improved cogeneration system and storing the operation data into the independent experience replay pools;

sampling from the long-term experience replay pool with a priority probability based on the reward values of its samples to obtain batch samples;

calculating the average reward value of the batch samples, and screening based on the average reward value to obtain optimized samples;

updating the sample actions according to the state information in the optimized samples to obtain updated sample actions;

and inputting the current information of the improved cogeneration system into the scheduling algorithm network after the sample actions are updated to obtain an optimal low-carbon economic scheduling strategy.

By this preferred step, the problems of local optima, time consumption and non-convergence caused by sample selection can be mitigated.

The second technical scheme adopted by the application is as follows: a deep reinforcement learning-based low-carbon scheduling system for a cogeneration system, comprising:

an improved cogeneration system module for carbon capture and carbon conversion of a cogeneration system;

the model building module is used for modeling each device of the improved cogeneration system to obtain a mathematical model of the device;

the constraint module is used for constructing constraint conditions according to the equipment mathematical model;

the Markov decision design module is used for carrying out Markov decision process design on the improved cogeneration system based on the constraint conditions to obtain a multi-objective optimization function;

the improved scheduling algorithm module based on TD3 is used for performing interactive training on the improved cogeneration system to obtain an optimal low-carbon economic scheduling strategy.

Further, the improved cogeneration system module, which specifically comprises:

the wind power generation module is used for converting renewable clean wind energy into electric energy and providing electric energy for the system;

the photovoltaic power generation module is used for converting renewable clean light energy into electric energy and providing electric energy for the system;

a gas turbine module for providing electrical and thermal energy to the system;

the waste heat boiler module is used for absorbing the surplus heat energy generated by the gas turbine and preferentially providing the heat energy for the heat load;

the gas boiler module is used for providing heat energy for the system;

the energy storage module is used for storing redundant electric energy and heat energy;

a carbon capture module for capturing and storing the CO₂ exhausted from the gas turbine module, the waste heat boiler module and the gas boiler module;

a P2G module for converting the CO₂ captured by the carbon capture module into methane;

an AC bus module for delivering a regulated voltage into an electrical load;

an AC-DC conversion module for converting electric energy into a stable voltage;

and the thermal energy bus module is used for conveying thermal energy into a thermal load.

Through this preferred system, the surplus power of the power grid is converted into easily stored, grid-compatible gas, so that the supply of power and gas is balanced, the energy utilization rate is improved, and the energy loss of energy conversion is reduced.

The method and the system have the following beneficial effects: on the basis of a cogeneration system that couples electricity, gas and heat, the application integrates carbon capture and P2G technology to construct an improved cogeneration system, which can convert the surplus power of the power grid into easily stored, grid-compatible gas and reduce the energy loss of energy conversion; based on the improved cogeneration system, a Markov decision process is designed and a multi-objective optimization function is established that takes into account revenue, operating costs, carbon trading, curtailment and constraint-violating behavior; and an improved TD3-based scheduling algorithm is designed to address the problems of local optima, time consumption and non-convergence caused by sample selection. The algorithm is used to perform interactive training on the improved cogeneration system to obtain an optimal low-carbon economic scheduling strategy, under which the balance of heat and electricity supply of the cogeneration system is ensured, the profit of the system is maximized and its carbon emissions are minimized.

Drawings

FIG. 1 is a flow chart of the steps of the deep reinforcement learning-based low-carbon scheduling method for a cogeneration system of the application;

FIG. 2 is a block diagram of the deep reinforcement learning-based low-carbon scheduling system for a cogeneration system of the application;

FIG. 3 is a flow chart of the interactive training of the improved TD3-based scheduling algorithm in the deep reinforcement learning-based low-carbon scheduling method for a cogeneration system of the application;

FIG. 4 is a topological structure diagram of the improved cogeneration system in the deep reinforcement learning-based low-carbon scheduling system for a cogeneration system of the application.

Detailed Description

The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.

As shown in FIG. 1, the application provides a deep reinforcement learning-based low-carbon scheduling method for a cogeneration system, which comprises the following steps:

S1, integrating a carbon capture technology and a P2G technology into a cogeneration system to obtain an improved cogeneration system;

specifically, referring to FIG. 4, the equipment of the improved cogeneration system mainly includes a wind power generator, a photovoltaic generator, a gas turbine, a waste heat boiler, a gas boiler, an electrical energy storage unit, a thermal energy storage unit, a carbon capture unit and a P2G unit. In terms of energy supply, electric energy is supplied cooperatively by the utility grid, the gas turbine, the wind power generator and the photovoltaic generator, with the photovoltaic generator given priority; when the electric energy supply is insufficient, electricity can be purchased from the conventional power grid. Heat energy is provided by the gas boiler and the gas turbine: the surplus heat generated by the gas turbine is absorbed by the waste heat boiler and preferentially provided to the heat load, and when this heat is insufficient the gas boiler makes up the shortfall. In terms of energy storage, the electrical energy storage unit stores surplus electric energy and the thermal energy storage unit stores surplus heat, so that the supply and demand of electric energy and thermal energy can be adjusted when the electricity price, the natural gas price or the load fluctuates. In addition, the carbon capture unit captures and stores the CO₂ exhausted by the gas turbine and the boilers, and the P2G unit methanates the CO₂. In terms of energy conversion, the P2G unit can use the captured CO₂ to convert electric energy into natural gas.

S2, modeling various devices of the improved cogeneration system based on the operating characteristics of the devices to obtain a mathematical model of the devices;

specifically, each equipment mathematical model includes a wind power generator mathematical model, a photovoltaic power generator mathematical model, a gas turbine mathematical model, a waste heat boiler mathematical model, a gas boiler mathematical model, an energy storage system mathematical model, a carbon capture unit mathematical model, and a P2G unit mathematical model, wherein:

the expression of the mathematical model of the wind driven generator is as follows:

wherein the output term and $v_t$ respectively represent the power generated by the $n^{wt}$-th wind power generator and the wind speed in period t; the rated term is the rated power of the $n^{wt}$-th wind power generator; $n^{wt} = 1, 2, \ldots, N^{wt}$, where $N^{wt}$ represents the total number of wind power generators; and $v_{ci}$, $v_{N}$ and $v_{co}$ respectively represent the cut-in wind speed, the rated wind speed and the cut-out wind speed.
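As a non-limiting illustration, a piecewise wind power curve consistent with the symbols above can be sketched as follows. The expression itself is not reproduced in this text, so the linear ramp between the cut-in and rated wind speeds is an assumption (other interpolations, such as a cubic one, are also common), and the function name `wind_power` is hypothetical.

```python
def wind_power(v_t, p_rated, v_ci, v_n, v_co):
    """Output power of one wind generator at wind speed v_t.

    Assumed shape: zero below cut-in (v_ci) and above cut-out (v_co),
    a linear ramp from v_ci to the rated speed v_n, rated power otherwise.
    """
    if v_t < v_ci or v_t > v_co:
        return 0.0
    if v_t < v_n:
        return p_rated * (v_t - v_ci) / (v_n - v_ci)
    return float(p_rated)
```

With a rated power of 100, a cut-in speed of 3, a rated speed of 12 and a cut-out speed of 25, a wind speed of 7.5 lies halfway along the ramp and yields half of the rated power.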

The mathematical model of the photovoltaic generator has the following expression:

wherein the output term represents the power generated by the $n^{pv}$-th photovoltaic generator in period t; $n^{pv} = 1, 2, \ldots, N^{pv}$, where $N^{pv}$ represents the total number of photovoltaic generators; $\eta^{pv}$ represents the efficiency of the photovoltaic generator; the rated term is the rated power of the $n^{pv}$-th photovoltaic generator under the reference irradiance $G_0 = 1000$; $G_t$ is the irradiance incident on the inclined plane of the photovoltaic generator; $T_{stc}$ and $T_t$ are the reference temperature and the ambient temperature in period t; and $\alpha$ is the temperature coefficient.
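As a non-limiting illustration, a commonly used photovoltaic output model built from the symbols above can be sketched as follows. The expression itself is not reproduced in this text, so the multiplicative form below (irradiance ratio times a linear temperature correction) is an assumption, and the function name `pv_power` is hypothetical.

```python
def pv_power(g_t, t_t, p_rated, eta_pv, alpha, t_stc=25.0, g_0=1000.0):
    """Output power of one PV generator under irradiance g_t and temperature t_t.

    Assumed model: rated power scaled by g_t / g_0 and corrected linearly
    by the temperature coefficient alpha around the reference temperature.
    """
    return eta_pv * p_rated * (g_t / g_0) * (1.0 + alpha * (t_t - t_stc))
```

At the reference irradiance and reference temperature the sketch returns the rated power multiplied by the efficiency; a negative temperature coefficient lowers the output as the ambient temperature rises.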

A mathematical model of a gas turbine, the expression of which is as follows:

wherein the three terms respectively represent the output electric power, the output thermal power and the weight of natural gas consumed by the $n^{gt}$-th gas turbine in period t; $n^{gt} = 1, 2, \ldots, N^{gt}$, where $N^{gt}$ represents the total number of gas turbines; and $\eta^{gt}$, the heat dissipation efficiency and $\delta^{LHV}$ respectively represent the power generation efficiency of the gas turbine, the heat dissipation efficiency of the gas turbine, and the heating value of natural gas.

The expression of the mathematical model of the waste heat boiler is as follows:

wherein the output term represents the output thermal power of the $n^{whb}$-th waste heat boiler in period t; $n^{whb} = 1, 2, \ldots, N^{whb}$, where $N^{whb}$ represents the total number of waste heat boilers; $\eta^{whb}$ and $COP^{b}$ are respectively the heat transfer efficiency and the heating coefficient of the waste heat boiler; and the input term represents the output thermal power of the gas turbine in period t.

The mathematical model of the gas boiler has the following expression:

wherein the two output terms represent the output thermal power of the gas boiler and the weight of natural gas it consumes in period t; $n^{gb} = 1, 2, \ldots, N^{gb}$, where $N^{gb}$ represents the total number of gas boilers; $\eta^{gb}$ is the heat transfer efficiency of the gas boiler; $\delta^{LHV}$ represents the heating value of natural gas; the supply term represents the total supply of natural gas; the consumption term represents the weight of fuel consumed by the gas turbine in period t; and the proportion coefficient is the ratio of natural gas consumption to total gas supply in period t.

The mathematical model of the energy storage system consists of an electric energy storage part and a thermal energy storage part, and the formula is expressed as follows:

wherein the two storage terms represent the stored electric energy of the $n^{es}$-th electrical energy storage unit and the stored heat of the $n^{hs}$-th thermal energy storage unit; $n^{es} = 1, 2, \ldots, N^{es}$ and $n^{hs} = 1, 2, \ldots, N^{hs}$, where $N^{es}$ and $N^{hs}$ represent the total numbers of electrical energy storage units and thermal energy storage units; $\zeta^{es}$ and $\zeta^{hs}$ represent the natural loss rates of the electrical energy storage unit and the thermal energy storage unit; the two electrical power terms represent the charging power and discharging power of the $n^{es}$-th electrical energy storage unit in period t; the two thermal power terms represent the charging thermal power and discharging thermal power of the $n^{hs}$-th thermal energy storage unit in period t; and $\eta^{c,es}$, $\eta^{d,es}$, $\eta^{c,hs}$ and $\eta^{d,hs}$ respectively represent the charging efficiency of the electrical energy storage unit, the discharging efficiency of the electrical energy storage unit, the thermal charging efficiency of the thermal energy storage unit and the thermal discharging efficiency of the thermal energy storage unit.
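As a non-limiting illustration, a storage update that uses the loss rate and the charging and discharging efficiencies defined above can be sketched as follows. The expression itself is not reproduced in this text, so the recursion below is an assumed, commonly used form; the function name `storage_step` and the time step `dt` are hypothetical. The same form would apply to either the electrical or the thermal storage unit.

```python
def storage_step(e_prev, p_charge, p_discharge, zeta, eta_c, eta_d, dt=1.0):
    """Stored energy after one period of length dt.

    Assumed recursion: the previous level decays at the natural loss rate zeta,
    charging adds eta_c * p_charge, and discharging removes p_discharge / eta_d.
    """
    return (1.0 - zeta) * e_prev + (eta_c * p_charge - p_discharge / eta_d) * dt
```

For instance, a unit holding 100 units of energy with a 1% loss rate and a 90% charging efficiency holds about 108 units after being charged at a power of 10 for one period.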

A mathematical model of a carbon capture unit, expressed as follows:

The expression of the P2G unit mathematical model is as follows:

S3, constructing constraint conditions according to the equipment mathematical model;

specifically, constraints include power balance constraints, distributed generation constraints, plant operation constraints, energy storage system constraints, and other constraints, wherein:

the power balancing constraint is expressed as follows:

A distributed generation constraint, the expression of which is as follows:

Device operation constraints, the expressions of which are as follows:

The energy storage system constraint is expressed as follows:

Other constraints, the expressions of which are as follows:

S4, carrying out Markov decision process design on the improved cogeneration system based on constraint conditions to obtain a multi-objective optimization function;

Specifically, the low-carbon economic dispatch problem of the improved cogeneration system is to maximize intraday profit and minimize carbon emissions while satisfying constraints such as the heat and electric loads. The optimal scheduling problem can therefore be understood mathematically as a sequential decision problem: within the feasible region defined by a set of constraints, find a solution that maximizes one or more objective functions. Formulating the low-carbon economic scheduling problem as a Markov decision process improves the simplicity and computational efficiency of solving this sequential decision problem.

A Markov decision process is generally defined as a tuple (S, A, P, R), where S is the set of states, called the state space; A is the set of actions, called the action space; P: S × A × S → [0, 1] is the transition probability distribution; and R: S × A → ℝ is the reward function. The main idea of the Markov decision process is that the agent interacts with the environment over a number of time steps to improve its policy toward the optimal one.

In the embodiment of the application, the improved cogeneration system serves as a simulation environment, and an agent adjusts the power output of the system to formulate an optimal scheduling strategy. At each time step t, based on the current state $s_t \in S$ observed in the environment of the improved cogeneration system, the agent generates a scheduling action $a_t \in A$ from the action space A according to its policy. The agent's policy is a mapping from state $s_t$ to action $a_t$. After selecting an action, the agent transmits the scheduling action to adjust the power output of each unit in the improved cogeneration system. The agent then receives the next state $s_t'$ from the improved cogeneration system environment together with the current reward value $r_t$. The above process is repeated until the termination condition set by the environment is satisfied. The goal of the agent is to maximize profit and minimize carbon emissions.
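As a non-limiting illustration, the interaction between the agent and the simulated environment described above can be sketched as follows. The class `ToyCHPEnv`, its one-dimensional state and its placeholder reward are hypothetical stand-ins rather than the cogeneration model of the application; only the loop structure (observe the state, select an action, receive the next state and the reward, repeat until termination) follows the text.

```python
class ToyCHPEnv:
    """Hypothetical stand-in for the simulated improved cogeneration system."""

    def __init__(self, horizon=24):
        self.horizon = horizon  # number of time steps in one scheduling day
        self.t = 0

    def reset(self):
        self.t = 0
        return (0.0,)  # initial state s_0

    def step(self, action):
        self.t += 1
        reward = -abs(action)            # placeholder reward, not equation (17)
        next_state = (float(self.t),)    # placeholder next state s_t'
        done = self.t >= self.horizon    # termination condition set by the environment
        return next_state, reward, done


def run_episode(env, policy):
    """Roll out one episode and return the list of (s, a, r, s') transitions."""
    state = env.reset()
    transitions = []
    done = False
    while not done:
        action = policy(state)                       # a_t from the policy
        next_state, reward, done = env.step(action)  # environment feedback
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions
```

Calling `run_episode(ToyCHPEnv(24), lambda s: 0.5)` produces 24 transitions, one per time step of the scheduling day.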

The basic definition of the markov decision process design for an improved cogeneration system is as follows:

During operation, the state observation $s_t$ refers to the values observed by the agent before it selects an action. The environment state $s_t$ at time step t includes the outputs of the gas turbine, the gas boiler, the waste heat boiler, the thermal energy storage unit, the carbon capture unit and the P2G unit. Thus, for the improved cogeneration system, the current state at time step t is described as:

wherein $s_t \in S$, and S is the set of all observable states that satisfy the constraints.

The environment state of the improved cogeneration system is controlled by the action of the agent at each time step t. The purpose of low-carbon economic dispatch is to optimally determine the output power of the electrical energy storage unit, the CO₂ capture rate of the carbon capture unit, and the proportion of CO₂ used for methanation. The scheduling decision variable (action) can thus be represented by the current action at time step t, which comprises the charge/discharge power of the electrical energy storage unit and the two proportion coefficients:

wherein $a_t \in A$, and A is the set of all allowed scheduling actions that satisfy the constraints.

According to the mathematical models of the respective devices described by equations (1)-(9), the transition probability $P(s_t' \mid s_t, a_t)$ to the next state $s_t'$ is determined from the current information $a_t$ and $s_t$.

The multi-objective optimization function mainly comprises the operating cost and the penalty cost of the virtual power plant. The operating cost comprises the profit obtained from electric power trading and carbon emission trading, and the costs of fuel and of equipment operation and maintenance; the penalty cost comprises the pollutant gas emission cost, the wind turbine (WT) and photovoltaic (PV) curtailment costs, and the cost of exceeding constraints. It is expressed as follows:

$$R = C^{I} - C^{II} + C^{B} \qquad (17)$$

wherein R represents the multi-objective optimization function; $C^{I}$, $C^{II}$ and $C^{B}$ respectively represent the operating cost of the virtual power plant, the penalty cost, and a fixed parameter ensuring that R is always positive. In the component expressions, the four price terms respectively represent the electricity selling price, electricity purchase price, carbon emission selling price and natural gas price in period t; the two carbon terms represent the carbon emission quota and the total carbon emission; $u \in \{gt, whb, gb, cc, p2g\}$ is the device type identifier; the device power term represents the power consumption of the $n^{u}$-th device of type u; $N^{u}$ represents the total number of devices of type u; $\omega^{u}$ represents the unit operation and maintenance cost of the $n^{u}$-th device; the gas supply term represents the total supply of natural gas; $\rho$ represents the regional carbon emission coefficient determined by the baseline method; the emission term represents the total weight of CO₂ emitted; $\eta^{cc}$ represents the carbon capture efficiency; the emission penalty term represents the penalty cost per unit amount of CO₂ generated by the gas turbine and the gas boiler; the intensity coefficient represents the specific carbon emission intensity coefficient of the gas turbine; the two gas turbine terms represent the output electric power and the output thermal power of the gas turbine in period t; $\psi^{wt}$ and $\psi^{pv}$ represent the penalty factors for wind power curtailment and photovoltaic curtailment; the two forecast terms represent the predicted output power of the wind power generator and the photovoltaic generator in period t, and the two corresponding output terms represent their output power in period t; the constraint penalty term represents the penalty factor for exceeding a constraint, and its bound is the maximum constraint value of action $a^{\chi}$; and $\chi$ represents the adjustment parameter used to eliminate dimensional differences in the cost of exceeding constraints.
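As a non-limiting illustration, the top-level combination in equation (17) and one of the penalty terms can be sketched as follows. Only R = C^I - C^II + C^B is taken from the text; the component expressions are not reproduced here, so the curtailment penalty form (penalty factor times the shortfall of actual output below forecast) is an assumption, and both function names are hypothetical.

```python
def curtailment_penalty(psi, forecast, actual):
    """Assumed curtailment cost: penalty factor psi times the curtailed power.

    Output at or above the forecast incurs no penalty.
    """
    return psi * max(forecast - actual, 0.0)


def multi_objective_reward(c_operating, c_penalty, c_bias):
    """Equation (17): R = C^I - C^II + C^B."""
    return c_operating - c_penalty + c_bias
```

For example, curtailing 20 units of forecast wind power at a penalty factor of 0.5 gives a penalty of 10; with an operating term of 200 and a fixed parameter of 50, the resulting reward is 240.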

S5, performing interactive training on the improved cogeneration system by using an improved TD3-based scheduling algorithm to obtain an optimal low-carbon economic scheduling strategy.

TD3 is a DRL algorithm based on the Actor-Critic framework. Its goal is to find the optimal scheduling policy π of the TD3 agent so as to maximize the expected return J(φ) = E_{s∼p_π, a∼π}[R]. The original TD3 agent comprises 2 actor networks, 4 critic networks and one experience replay pool. The improved TD3-based scheduling algorithm expands the single experience replay pool into two independent experience replay pools: a current-reward experience replay pool and a long-term experience replay pool. The policy is approximated by the actor network π^φ, which, through its parameter φ, maps the environmental state of the improved cogeneration system to a continuous scheduling action a_t. The two critic networks evaluate the action-value function Q of taking action a_t in state s_t. In addition, "lagged" updates are applied to the target actor network π^φ′ and the two target critic networks, which improves the stability of the neural networks.
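The "lagged" target-network update can be read as the soft update commonly used in TD3, where each target parameter moves a small fraction toward its online counterpart. The sketch below assumes that reading; the mixing rate `tau` and the flat parameter lists are illustrative assumptions, not values given in the application.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward the online value."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# Toy parameters standing in for network weights.
online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(target, online, tau=0.1)
print(target)
```

With a small `tau`, the target networks change slowly, which is the usual reason this kind of update stabilizes training.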

Referring to fig. 3, the procedure of the improved TD3-based scheduling algorithm is described as follows:

Specifically, the independent experience replay pools comprise a current-reward experience replay pool B1 and a long-term experience replay pool B2, which store experience samples at different time scales. Taking the l-th stored sample as an example, the experience sample of the t-th time step, Ω_t^l = (s_t, a_t, r_t, s_t′)^l, is stored in B1, with t = 1, …, T, where T is the total number of time steps; after all T time steps have ended, the experience sample of the whole run, Ω^l = (r^l | s, a, r, s′)^l, is stored in B2.
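A minimal sketch of the two pools might look as follows. It assumes that r^l is the sum of the step rewards of one full run of T steps and that both pools have a fixed capacity; the class name, method names and capacities are hypothetical.

```python
from collections import deque


class DualReplay:
    """Two pools: B1 holds single steps, B2 holds whole runs."""

    def __init__(self, step_capacity=10000, episode_capacity=1000):
        self.b1 = deque(maxlen=step_capacity)     # (s, a, r, s_next)
        self.b2 = deque(maxlen=episode_capacity)  # (total_reward, steps)
        self._episode = []

    def add_step(self, s, a, r, s_next):
        """Store one time step in B1 and remember it for the current run."""
        transition = (s, a, r, s_next)
        self.b1.append(transition)
        self._episode.append(transition)

    def end_episode(self):
        """Store the finished run in B2 (assumed: r^l = sum of step rewards)."""
        total = sum(t[2] for t in self._episode)
        self.b2.append((total, list(self._episode)))
        self._episode.clear()


pools = DualReplay()
for t in range(3):
    pools.add_step(s=t, a=0.5, r=float(t), s_next=t + 1)
pools.end_episode()
print(len(pools.b1), len(pools.b2), pools.b2[0][0])  # 3 1 3.0
```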

Specifically, an average reward reference value of the samples in the long-term experience replay pool B2 is calculated. Before the network is initialized, the average reward value of the samples in B2 is set to 0, and it is recalculated each time a new sample is placed in the pool. Then, the reward value of every sample is divided by the average reward value to obtain a priority ratio array θ = [θ^1, …, θ^l]. The array is then normalized, and in B2 the probability of drawing the l-th sample during training is given by the corresponding entry of the normalized array. This increases the probability of drawing high-value samples from B2. Based on the normalized priority ratio array, a mini-batch of samples is selected from B2 to update the networks of the improved TD3-based scheduling algorithm.
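The priority computation described above can be sketched as follows. The normalization step is assumed here to divide each ratio by the sum of all ratios, and the rewards are assumed to be positive (which the offset C^B is said to guarantee); the function names are hypothetical.

```python
import random


def priority_probabilities(episode_rewards):
    """Reward / mean reward for each sample, normalized to sum to 1."""
    mean_r = sum(episode_rewards) / len(episode_rewards)
    ratios = [r / mean_r for r in episode_rewards]
    total = sum(ratios)  # assumed normalization: divide by the sum
    return [x / total for x in ratios]


def sample_indices(episode_rewards, batch_size, rng=random):
    """Draw sample indices from B2 with the priority probabilities."""
    probs = priority_probabilities(episode_rewards)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


probs = priority_probabilities([10.0, 30.0, 60.0])
batch = sample_indices([10.0, 30.0, 60.0], batch_size=5,
                       rng=random.Random(0))
print(probs, batch)
```

Under this assumed normalization, a sample with a higher reward is drawn proportionally more often, so high-value runs dominate the mini-batch without excluding the others.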

Specifically, the agent draws Z samples from B2 according to the reward-priority probability. For a drawn sample Ω_t^Z = (r^Z | s, a, r, s′)^Z, the average reward of the sample is calculated, and the reward value of every time step of the sample is compared with this average. When the reward of the m-th time step is less than the average, where m = 1, …, M < Z, the m-th time step is marked as a sample to be optimized.
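The marking rule amounts to comparing each step reward with the mean reward of the drawn sample. A minimal sketch, with a hypothetical function name and zero-based step indices:

```python
def mark_steps_to_optimize(step_rewards):
    """Return the indices of steps whose reward is below the sample mean."""
    mean_r = sum(step_rewards) / len(step_rewards)
    return [m for m, r in enumerate(step_rewards) if r < mean_r]


marked = mark_steps_to_optimize([4.0, 1.0, 5.0, 2.0])
print(marked)  # [1, 3]
```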

Specifically, the marked sample Ω_m^Z is optimized. Based on the state information contained in its tuple, the agent searches B1 for samples Ω_k whose states s^k lie within a neighborhood ±δ of that state, where k = 1, …, K. The original action is then replaced with the action a_k corresponding to the approximate state s^k ± δ, and a new reward value is calculated. If the new reward value is better than the original one, the original action is updated to the new action a_k. If not, the next approximate state s^(k+1) ± δ is processed in the same way, until k = K. The sample is updated accordingly.
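The neighborhood search can be sketched as below. Several details are assumptions: states are compared component-wise against δ, "better" is taken to mean a strictly higher reward, the search stops at the first improving action, and `reward_fn` is a hypothetical stand-in for re-evaluating the reward of a substituted action.

```python
def improve_action(state, action, reward, b1, reward_fn, delta):
    """Try actions from B1 samples whose state is within delta of `state`."""
    for s_k, a_k, _r_k, _s_next in b1:
        # Assumed neighborhood test: every state component within delta.
        near = max(abs(x - y) for x, y in zip(s_k, state)) <= delta
        if not near:
            continue
        new_reward = reward_fn(state, a_k)
        if new_reward > reward:       # assumed improvement condition
            return a_k, new_reward    # keep the first improving action
    return action, reward             # no candidate improved the reward


def toy_reward(s, a):
    """Hypothetical reward: best when the action equals the state value."""
    return -(a - s[0]) ** 2


b1 = [((5.0,), 0.0, 0.0, (5.0,)),
      ((1.05,), 2.5, 0.0, (1.0,)),
      ((0.95,), 1.0, 0.0, (1.0,))]
result = improve_action((1.0,), 3.0, toy_reward((1.0,), 3.0),
                        b1, toy_reward, delta=0.1)
print(result)  # (2.5, -2.25)
```

Stopping at the first improvement follows the wording of the step; scanning all K neighbors and keeping the best action would be a possible variation.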

Specifically, the current environmental state information of the improved cogeneration system is input into the scheduling algorithm network whose action parameters have been updated, and the model outputs scheduling actions so as to maximize the profit and minimize the carbon emissions of the improved cogeneration system.

As shown in fig. 2, the deep reinforcement learning-based low-carbon scheduling system of the cogeneration system comprises:

Further, referring to fig. 4, the improved cogeneration system module specifically includes:

a gas turbine module for providing electrical and thermal energy to the system;

a gas boiler module for providing heat energy to the system;

an AC bus module for delivering a regulated voltage into an electrical load;

The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.

While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims

1. The deep reinforcement learning-based low-carbon scheduling method for the cogeneration system is characterized by comprising the following steps of:

2. The deep reinforcement learning-based low-carbon scheduling method of a cogeneration system of claim 1, wherein the equipment mathematical model comprises a wind turbine mathematical model, a photovoltaic generator mathematical model, a gas turbine mathematical model, a waste heat boiler mathematical model, a gas boiler mathematical model, an energy storage system mathematical model, a carbon capture unit mathematical model, and a P2G unit mathematical model, wherein:

the mathematical model of the carbon capture unit has the following expression:

wherein the total energy consumption, the basic energy consumption P^(cc,b) and the operating energy consumption of the carbon capture unit in period t are represented respectively; ε^(e,cc) and η^cc respectively represent the electric power consumed per unit amount of CO₂ captured and the carbon capture efficiency; the next two quantities respectively represent the total weight of CO₂ emitted and the upper limit of the CO₂ weight; and the last two quantities respectively represent the unit carbon emission intensity coefficients of the gas turbine and the gas boiler;

the expression of the P2G unit mathematical model is as follows:

3. The deep reinforcement learning-based low-carbon scheduling method for a cogeneration system according to claim 1, wherein the constraint conditions specifically include:

the power balancing constraint is expressed as follows:

wherein the quantities respectively denote: the interconnection power between the virtual power plant and the utility grid in period t; the output power of the n^wt-th wind turbine in period t; the output power of the n-th photovoltaic generator in period t; the output power of the n-th gas turbine in period t; the discharge power of the n^es-th electrical energy storage unit in period t; the sum of the electric load power on the energy-consuming side; the charging power of the n^es-th electrical energy storage unit in period t; the electric energy consumed by the P2G device; the total energy consumption of the carbon capture unit in period t; the output thermal power of the n^whb-th waste heat boiler in period t; the output thermal power of natural gas in period t; the sum of the thermal load power on the energy-consuming side; and the charging thermal power of the n^hs-th thermal energy storage unit in period t;
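The balance equation itself is not reproduced in this text, so the following is only a sketch of one plausible electrical balance inferred from the listed quantities: grid exchange, wind, photovoltaic, gas turbine and storage discharge on the supply side; load, storage charging, P2G and carbon capture on the demand side. The function name, argument names and numbers are all hypothetical.

```python
def electrical_balance_residual(p_grid, p_wt, p_pv, p_gt, p_es_dis,
                                p_load, p_es_ch, p_p2g, p_cc):
    """Supply minus demand for one period; zero means balanced (assumed form)."""
    supply = p_grid + sum(p_wt) + sum(p_pv) + sum(p_gt) + sum(p_es_dis)
    demand = p_load + sum(p_es_ch) + p_p2g + p_cc
    return supply - demand


# Illustrative numbers only; they do not come from the application.
residual = electrical_balance_residual(
    p_grid=2.0, p_wt=[3.0, 1.0], p_pv=[2.0], p_gt=[4.0], p_es_dis=[0.0],
    p_load=9.0, p_es_ch=[1.0], p_p2g=1.0, p_cc=1.0)
print(residual)  # 0.0
```

A nonzero residual could feed the constraint-violation term of the penalty cost, although the application does not spell out that link in this section.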

a distributed generation constraint, the expression of which is as follows:

wherein the quantities respectively denote: the continuous start-up time and the continuous shutdown time of the n^dg-th generator up to time t; the state of the n^dg-th generator in period t; and the minimum allowable start-up time, the minimum allowable shutdown time, the maximum ramp in power generation and the maximum power generation of the n^dg-th generator; dg ∈ [wt, pv] denotes the distributed generation type identifier;

device operation constraints, the expressions of which are as follows:

wherein u ∈ [gt, whb, gb, cc, p2g] denotes the device type identifier, and the remaining quantities respectively denote the state of the n^u-th device in period t, and the minimum allowed start-up time, maximum allowed down time, maximum ramp power and maximum power of the n^u-th device;

the energy storage system constraint is expressed as follows:

wherein the quantities respectively denote the minimum stored electricity, maximum stored electricity, maximum charging power, charging and discharging power, and maximum discharging power of the n^es-th electrical energy storage unit; and the minimum heat storage power, maximum heat storage power, maximum charging power, charging power and maximum discharging power of the n^hs-th thermal energy storage unit;

other constraints, the expressions of which are as follows:

4. The deep reinforcement learning-based low-carbon scheduling method for a cogeneration system according to claim 1, wherein the multi-objective optimization function has the following expression:

R = C^I - C^II + C^B

wherein R represents the multi-objective optimization function; C^I, C^II and C^B respectively represent the operating cost of the virtual power plant, the penalty cost, and a fixed parameter ensuring that R is always positive. The remaining terms of the formula respectively denote: the electricity selling price, electricity purchasing price, carbon emission selling price and natural gas price in period t; the carbon emission quota and the total carbon emission; u ∈ [gt, whb, gb, cc, p2g], the device type identifier; the power consumption of the n^u-th device of type u; N^u, the total number of devices of type u; ω^u, the unit operation and maintenance cost of the n^u-th device of type u; the total supply of natural gas; ρ, the regional carbon emission coefficient determined by the baseline method; the total weight of CO₂ emitted; η^cc, the carbon capture efficiency associated with the energy consumed in capturing a unit amount of CO₂; the penalty cost per unit amount of CO₂ generated by the gas turbine and the gas boiler; the unit carbon emission intensity coefficient of the gas turbine; the output electric power of the internal combustion turbine in period t; the output thermal power of the internal combustion turbine in period t; ψ^wt and ψ^pv, the penalty factors for wind power curtailment and photovoltaic power curtailment; the predicted output power of the wind turbine and the photovoltaic generator in period t; the penalty factor for exceeding a constraint; the maximum constraint value of action a^χ; χ, the adjustment parameter for eliminating dimensional differences in the constraint-violation cost; and the actual output power of the wind turbine and the photovoltaic generator in period t.

5. The deep reinforcement learning-based low-carbon scheduling method of a cogeneration system according to claim 1, wherein the improved TD3-based scheduling algorithm, building on the TD3 algorithm, expands the single experience replay pool into two independent experience replay pools, comprising a current-reward experience replay pool and a long-term experience replay pool.

6. The deep reinforcement learning-based low-carbon scheduling method for a cogeneration system according to claim 1, wherein the step of performing interactive training on the improved cogeneration system by using an improved TD3-based scheduling algorithm to obtain an optimal low-carbon economic scheduling strategy specifically comprises the following steps:

7. A deep reinforcement learning-based low-carbon scheduling system for a cogeneration system, characterized by comprising:

the model building module is used for modeling each device of the improved cogeneration system to obtain the equipment mathematical model; the constraint module is used for constraining the equipment mathematical model to obtain the constraint conditions;

8. The deep reinforcement learning based cogeneration system low-carbon dispatch system of claim 7, wherein the improved cogeneration system module specifically comprises:

a gas turbine module for providing electrical and thermal energy to the system;

a gas boiler module for providing heat energy to the system;

a P2G module for converting the CO₂ captured by the carbon capture module into methane; an AC bus module for delivering a regulated voltage to the electrical load;