CN117455183A - Comprehensive energy system optimal scheduling method based on deep reinforcement learning - Google Patents
- Publication number: CN117455183A (application CN202311488353.6A)
- Authority: CN (China)
- Prior art keywords: energy system, power, network, comprehensive energy, reinforcement learning
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06Q10/0631 — Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06N3/092 — Reinforcement learning
- G06Q50/06 — Energy or water supply
Abstract
The invention discloses a comprehensive energy system optimal scheduling method based on deep reinforcement learning, relates to the field of intelligent energy, and aims to improve the cooperative operation of the multiple devices in a comprehensive energy system through data-driven intelligent methods. A deep reinforcement learning scheduling framework suited to comprehensive energy system optimal scheduling is constructed, including the selection of scheduling variables, state variables, constraint indexes, and reward functions that characterize the cooperative operation of the energy system. Through interaction with real-time data, the system adapts to continuously changing environmental conditions and user demands, and copes with fluctuations in renewable output, user load, and electricity price, thereby realizing optimal scheduling and improving overall system performance. The application fields of the invention cover energy management, renewable energy integration, scheduling, and related areas, improving the stable and economic operation of complex comprehensive energy systems.
Description
Technical Field
The invention relates to a comprehensive energy system optimal scheduling method based on deep reinforcement learning, belonging to the technical field of intelligent energy; its applications cover energy management, renewable energy integration, scheduling, and related fields.
Background
Constructing comprehensive energy systems with a high proportion of renewables, so as to realize efficient energy utilization and local consumption of renewable generation, has become an important path for the low-carbon transformation of energy systems. However, the energy subsystems within an integrated energy system are strongly coupled and differ greatly in dynamic characteristics, which makes economical, low-carbon operation challenging. Providing optimal set-point instructions for long-period operation at the system scheduling level therefore ensures stable, economical, and flexible energy supply and brings out the advantages of the integrated energy system.
At present, much research on the operation optimization of combined heat and power energy systems focuses on the control level, aiming at coordinated control of each energy source so as to quickly satisfy the load supply-demand balance on the electric and thermal sides. However, without exact and reasonable control instructions, the economy and stability of integrated energy system operation degrade significantly. Many scholars have therefore studied integrated energy system operation with the aim of achieving optimal system performance: reducing costs, cutting carbon emissions, improving reliability, and better adapting to energy market fluctuations.
Various scheduling methods exist to meet the requirements of different integrated energy systems. According to their characteristics, they can be classified into the following categories:
Rule-based scheduling methods: these rely on predefined rules and policies to manage the integrated energy system. The rules may follow a schedule, such as providing additional power during peak hours and reducing supply during off-peak hours. This approach is effective for simple systems or when minimal computational complexity is required. For complex systems and rapidly changing conditions, however, fixed rules become inflexible and fail to cope with changing demands and resources.
Optimization-based scheduling methods: these use mathematical optimization techniques such as linear programming, integer programming, and nonlinear programming to determine the optimal energy configuration and scheduling strategy. A mathematical model of the system, including constraints and objective functions, is built, and an optimization algorithm finds the optimal solution. Multiple objectives can be considered, such as cost minimization, carbon emission minimization, and reliability maximization. Such approaches can find globally optimal solutions under multiple objectives, but typically require substantial computational resources.
Deep reinforcement learning scheduling methods: deep reinforcement learning has recently been applied to integrated energy system scheduling. It combines deep neural networks with reinforcement learning techniques to learn an optimal decision strategy through interaction with the environment, and can cope with complex, nonlinear systems and changing conditions without an explicit model. The system learns by interacting with the environment and optimizing decisions based on reward signals, continually improving performance. The method is especially effective when real-time requirements are high and demand and resources fluctuate strongly.
Disclosure of Invention
In order to solve the above technical problems, the invention discloses a comprehensive energy system optimal scheduling method based on deep reinforcement learning, which aims to improve energy utilization efficiency, enhance sustainability, balance load demand, reduce manual intervention, and cope with energy market fluctuations. The specific technical scheme is as follows:
a comprehensive energy system optimization scheduling method based on deep reinforcement learning comprises the following steps:
step 1: establishing a comprehensive energy system model, wherein the comprehensive energy system model comprises a wind generating set, a photovoltaic generating set, a storage battery model and a park electric load demand model;
step 2: establishing an economic optimization model according to the comprehensive energy system model, and defining system variables and constraints; constructing a deep reinforcement learning training model framework according to indexes, variables and constraints, namely designing reinforcement learning state variables S, scheduling variables A and reward functions r;
step 3: building a TD3 training network structure, and setting the network parameters of its policy network and evaluation networks, the replay buffer size, the discount factor, and the soft update rate;
step 4: through interaction with the comprehensive energy system model, the agent is trained to learn how to make optimal decisions under different conditions and maximize the reward function, thereby realizing stable and economic operation of the comprehensive energy system.
Further, in step 1 the mechanism models of the comprehensive energy system are constructed as follows:
the photovoltaic generator set model is shown in the following formula (1):

$P_{PV} = Y_{PV} f_{PV} \frac{G_T}{G_{T,STC}}\left[1+\alpha_P\left(T_C-T_{C,STC}\right)\right]$ (1)

where $P_{PV}$ is the output power of the photovoltaic generator set, in kW; $Y_{PV}$ is the rated capacity of the photovoltaic generator set, in kW, representing the output power under standard test conditions; $f_{PV}$ is the photovoltaic derating factor; $G_T$ is the solar irradiation intensity at the current time step, in kW/m²; $G_{T,STC}$ is the solar irradiation intensity under standard test conditions, in kW/m², usually taken as 1; $\alpha_P$ is the power temperature coefficient of the photovoltaic panel, in %/K; $T_C$ is the photovoltaic cell temperature at the current time step, in K; $T_{C,STC}$ is the photovoltaic cell temperature under standard test conditions, in K;
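As an illustration, the PV output of formula (1) can be computed directly. The default values below (derating factor, temperature coefficient, temperatures) are illustrative assumptions, not values from the patent:

```python
def pv_output(y_pv_kw, f_pv, g_t, g_t_stc=1.0,
              alpha_p=-0.35, t_c=298.0, t_c_stc=298.0):
    """PV generator output (kW) per formula (1).

    alpha_p is the power temperature coefficient in %/K (illustrative
    value), so it is divided by 100 before use.
    """
    temp_term = 1.0 + (alpha_p / 100.0) * (t_c - t_c_stc)
    return y_pv_kw * f_pv * (g_t / g_t_stc) * temp_term
```

At standard-test-condition temperature the temperature term is 1, so output scales linearly with irradiation.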
the fan output power of the wind generating set is estimated from the predicted wind speed and the wind-speed power characteristic curve, as in the following formula (2):

$P_{WT}^t = \begin{cases} 0, & U_{hub} < v_{ci} \ \text{or}\ U_{hub} \ge v_{co} \\ a + bU_{hub} + cU_{hub}^2 + dU_{hub}^3, & v_{ci} \le U_{hub} < v_r \\ P_r, & v_r \le U_{hub} < v_{co} \end{cases}$ (2)

where $P_{WT}^t$ is the fan output power at time t, in kW; $U_{hub}$ is the predicted wind speed at fan hub height, in m/s; a, b, c, d are fitting coefficients; $P_r$ is the rated output power, in kW; $v_{ci}$, $v_r$, $v_{co}$ are, respectively, the cut-in, rated, and cut-out wind speeds of the fan, in m/s;
the battery model is represented by the following formula (3):

$E_{BESS}^t = E_{BESS}^{t-1} + \left(P_{c,BESS}^t\, n_{c,BESS} - \frac{P_{d,BESS}^t}{n_{d,BESS}}\right)\Delta t$ (3)

where $E_{BESS}^t$ and $E_{BESS}^{t-1}$ are the capacities of the battery energy storage system at times t and t−1, in MWh; $P_{c,BESS}^t$ and $P_{d,BESS}^t$ are the charging and discharging powers of the battery energy storage system at time t, in MW; $n_{c,BESS}$ and $n_{d,BESS}$ are the charging and discharging efficiencies of the battery energy storage system, in %.
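A minimal sketch of the battery energy update of formula (3); the efficiency and time-step defaults are illustrative assumptions:

```python
def battery_energy(e_prev_mwh, p_charge_mw, p_discharge_mw,
                   eta_c=0.95, eta_d=0.95, dt_h=1.0):
    """Battery energy-storage update per formula (3).

    Charging adds energy scaled by the charge efficiency eta_c;
    discharging removes energy scaled by 1/eta_d (losses make the
    battery drain faster than the delivered power).
    """
    return e_prev_mwh + (p_charge_mw * eta_c
                         - p_discharge_mw / eta_d) * dt_h
```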
Further, the economic optimization model of the comprehensive energy system is as follows:
$\min M_{total} = M_{om} + M_{buy}$ (4)

where $M_{total}$ is the total cost, $M_{om}$ is the operation and maintenance cost, and $M_{buy}$ is the electricity purchase cost.
Further, the constraints of the economic optimization model configured in the step 2 include a power balance constraint and a device operation constraint:
the power balance constraint is shown in the following formula (5):

$P_{PV}^i + P_{WT}^i + P_{d,ESS}^i - P_{c,ESS}^i + P_{buy}^i - P_{waste}^i = P_{load}^i$ (5)

where $P_{PV}^i$, $P_{WT}^i$, $P_{d,ESS}^i$, $P_{c,ESS}^i$, $P_{buy}^i$, $P_{waste}^i$ are, at the i-th moment, the photovoltaic generator set output electric power, the wind generating set output electric power, the storage battery discharging and charging power, the power purchased from the main grid, and the curtailed (wasted) power, all in kW; $P_{load}^i$ is the user electrical load at moment i;
upper and lower limit constraints on the charging and discharging power of the storage battery:

$P_{d,ESS}^{min} \le P_{d,ESS}^t \le P_{d,ESS}^{max}, \quad P_{c,ESS}^{min} \le P_{c,ESS}^t \le P_{c,ESS}^{max}$ (6)

where $P_{d,ESS}^{max}$ and $P_{c,ESS}^{max}$ are the maximum battery discharging and charging powers, and $P_{d,ESS}^{min}$ and $P_{c,ESS}^{min}$ are the minimum battery discharging and charging powers;
upper and lower limit constraint on the storage battery capacity:

$0 \le E_{BESS}^t \le E_{cap,ESS}$ (7)

where $E_{cap,ESS}$ is the rated capacity of the battery.
Further, the state variable S is designed as follows:

in the wind-solar-storage coupled carbon capture, utilization and sequestration system, the state should be chosen to reflect the current operating condition of the system, i.e. the environmental indicators directly related to the scheduling variables. The time t, the electrical load demand $P_{load}$, the wind power generation $P_{wind}$, the photovoltaic power generation $P_{pv}$, the battery state of charge $S_{bat}$, and the current electricity price $C_E$ are selected.

The state variable S is represented by the following formula (9):

$S = [t\ P_{load}\ P_{wind}\ P_{pv}\ S_{bat}\ C_E]$ (9)
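Assembling the state vector of formula (9) is straightforward; this small helper is a hypothetical illustration of that layout:

```python
def build_state(t, p_load, p_wind, p_pv, soc_bat, price):
    """State vector S = [t, P_load, P_wind, P_pv, S_bat, C_E] of
    formula (9), as a plain list of floats."""
    return [float(t), float(p_load), float(p_wind), float(p_pv),
            float(soc_bat), float(price)]
```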
further, the scheduling variable A is selected as follows:

the scheduling variables should be those that directly influence rewards and states, so the charge/discharge amount $\Delta P_c$ of the energy storage system at the current time t and the electricity purchased from the grid $P_{buy}$ are chosen. The charge/discharge amount of the energy storage system is unified into a single increment variable, positive for discharging and negative for charging:

$A = [\Delta P_c\ P_{buy}]$ (10)
further, the reward function r is specifically:
the optimization goal of the agent is to find the economically optimal solution within the feasible domain, so the reward is divided into an economic index reward and an out-of-limit penalty, as in the following formula (11):

$r = -k_{ope} M_{total} - k_{vio} r_{vio}$ (11)

where $k_{ope}$ and $k_{vio}$ are the economic and out-of-limit penalty scale factors, respectively.
Further, besides the hard constraints on battery charge/discharge power and grid purchase power, the capacity limit of the energy storage system can be implemented as a continuous soft constraint through $r_{vio}$, as shown in formula (12), where $r_{vio}$ is the penalty for violating the constraint.
Further, the decision network selects the scheduling variable according to the state variable, and its policy function is updated through the deterministic policy gradient algorithm:

$\nabla_\phi J(\phi) = E\left[\nabla_a Q^\pi(s,a)\big|_{a=\pi_\phi(s)}\, \nabla_\phi \pi_\phi(s)\right]$ (13)

where $Q^\pi(s,a) = E_{S\sim P^\pi,\, a\sim\pi}[R_t \mid s,a]$ is the predicted expected operating cost of the energy system after acting according to policy $\pi$ in state s; $\nabla_\phi$ performs the gradient computation for the Actor network.
Furthermore, the evaluation network evaluates the scheduling value of the energy system based on the current-time state variable, current-time scheduling variable, current-time running cost, and next-time state variable. The evaluation network consists of a Critic network and a Critic target network. According to the Bellman equation, in the optimal case the state-value function corresponds to the optimal policy:

$Q^\pi(s,a) = r + \gamma E_{s'}\left[Q^\pi(s',a')\right], \quad a' \sim \pi(s')$ (14)

where $Q^\pi(s',a')$ is the value of the state and action of the energy system at the next moment.
Furthermore, the TD3 training network structure introduces two evaluation networks; by comparing the two networks, the more conservative of the two evaluation values is selected. The estimated values are:

$y_1 = r + \gamma Q_{\theta_1'}(s', a')$ (15)

$y_2 = r + \gamma Q_{\theta_2'}(s', a')$ (16)

where $y_1$ and $y_2$ are the target Q values; r is the reward at the current moment; $\theta_1'$, $\theta_2'$ and $\theta_1$, $\theta_2$ are, respectively, the two evaluation target network parameters and the two evaluation network parameters. Combining formulas (15) and (16) yields the time-difference target value:

$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', a')$ (17)
the system first passes through the policy network at state variable s t Obtain the scheduling variable a on the basis of (a) t ,
Subsequently, the system state s t And a system schedule variable a t As input, through Critic1 network mapping, Q is obtained 1 (s t ,a t )、Q 2 (s t ,a t ) The network parameters are obtained through the double Q network loss function back propagation algorithm of (18),
wherein L is Q S is a loss function j ,a j And respectively making a decision of the energy system in the j-th training and information of the energy system at the current moment.
The deterministic policy estimation function is prone to overfitting, so Gaussian noise is added to the target policy network:

ε = clip(N(0, σ), −c, c) (19)

where ε is the noise, clip is the truncation function, and c is the noise truncation boundary value.
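The clipped Gaussian target-policy noise of formula (19) can be sketched as follows; the σ and c defaults are illustrative, not values from the patent:

```python
import random

def clipped_gaussian_noise(sigma=0.2, c=0.5):
    """Formula (19): epsilon = clip(N(0, sigma), -c, c).

    Draw a zero-mean Gaussian sample and truncate it to [-c, c]
    before adding it to the target policy's action.
    """
    return max(-c, min(c, random.gauss(0.0, sigma)))
```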
The beneficial effects of this application lie in:
(1) By establishing a mechanism model of the comprehensive energy system and using deep reinforcement learning, the system can more effectively allocate and utilize various energy resources, including the wind generating set, photovoltaic generating set, storage battery, and power purchased from the main grid. This improves the overall efficiency of the system, reduces energy waste, and improves the economic benefits of the energy system.
(2) The TD3 agent enables the system to make automatic, intelligent decisions. Through interaction with the comprehensive energy system environment, the agent continuously learns and improves its decisions to adapt to different operating conditions. This reduces the need for manual intervention and increases the autonomy of the system.
(3) Through intelligent scheduling and management of electric resources, the system can autonomously learn to optimize energy production and distribution, ensuring efficient operation, reducing energy cost, reducing carbon emissions, and improving the stability and reliability of the energy system.
Drawings
Fig. 1 is a schematic flow chart of an algorithm according to a specific embodiment of the present application.
Detailed Description
The technical scheme of the application will be described in detail below with reference to the accompanying drawings.
The integrated energy system of this embodiment is shown in fig. 1. The system mainly comprises a photovoltaic generator set, a wind generating set, a storage battery, a bus, and the main grid; the demand side mainly consists of the park's electrical load. The storage battery participates in power balance regulation; the electrical loads are supplied primarily by the system's own generation equipment, and the system is also connected to the external grid so that electricity can be purchased when the price is low.
In order to realize economic and efficient operation of the comprehensive energy system, the application provides a comprehensive energy system optimal scheduling method based on deep reinforcement learning. More specifically, the scheduling optimization training process of this embodiment is shown in fig. 1. Based on the policy-evaluation (actor-critic) network structure, the evaluation network performs policy evaluation while the policy network continuously optimizes the agent's actions according to the evaluation network; the networks interact and update each other, learning how to select the best operation in different states so as to maximize reward and obtain an economical strategy.
The invention discloses a comprehensive energy system optimal scheduling method based on deep reinforcement learning, which specifically comprises the following steps:
step 1: and establishing a mechanism model of the comprehensive energy system, wherein the mechanism model comprises a wind generating set, a photovoltaic generating set, a storage battery and other equipment models and a park electric load demand model.
Step 2: establishing an economic optimization model according to the comprehensive energy system model, and defining system variables and constraints; constructing a deep reinforcement learning training model framework according to the indexes, variables, and constraints, namely designing the reinforcement learning scheduling variables, state variables, and reward function.
Step 3: building a TD3 training network structure; setting the network parameters of the TD3 policy and evaluation networks, the replay buffer size, the discount factor, and the soft update rate.
Step 4: through interaction with the environment model of the comprehensive energy system, the agent is trained to learn how to make optimal decisions under different conditions and maximize the reward function, thereby realizing stable and economic operation of the comprehensive energy system.
Specifically, in step 1, the model is built as follows:
the photovoltaic generator set model is shown in formula (1):

$P_{PV} = Y_{PV} f_{PV} \frac{G_T}{G_{T,STC}}\left[1+\alpha_P\left(T_C-T_{C,STC}\right)\right]$ (1)

where $P_{PV}$ [kW] is the output power of the photovoltaic generator set; $Y_{PV}$ [kW] is its rated capacity, representing the output power under standard test conditions (irradiation intensity 1 kW/m², cell temperature 298 K, windless); $f_{PV}$ is the photovoltaic derating factor; $G_T$ [kW/m²] is the solar irradiation intensity at the current time step; $G_{T,STC}$ [kW/m²] is the solar irradiation intensity under standard test conditions, usually taken as 1; $\alpha_P$ [%/K] is the power temperature coefficient of the photovoltaic panel; $T_C$ [K] is the photovoltaic cell temperature at the current time step; $T_{C,STC}$ [K] is the photovoltaic cell temperature under standard test conditions.
The fan output power of the wind generating set can be estimated from the predicted wind speed and the wind-speed power characteristic curve, as shown in formula (2):

$P_{WT}^t = \begin{cases} 0, & U_{hub} < v_{ci} \ \text{or}\ U_{hub} \ge v_{co} \\ a + bU_{hub} + cU_{hub}^2 + dU_{hub}^3, & v_{ci} \le U_{hub} < v_r \\ P_r, & v_r \le U_{hub} < v_{co} \end{cases}$ (2)

where $P_{WT}^t$ [kW] is the fan output power at time t; $U_{hub}$ [m/s] is the predicted wind speed at fan hub height; a, b, c, d are fitting coefficients; $P_r$ [kW] is the rated output power; $v_{ci}$, $v_r$, $v_{co}$ [m/s] are the cut-in, rated, and cut-out wind speeds of the fan.
The storage battery model is shown in formula (3). Based on the charging and discharging conditions, the energy of the battery energy storage system can be estimated recursively:

$E_{BESS}^t = E_{BESS}^{t-1} + \left(P_{c,BESS}^t\, n_{c,BESS} - \frac{P_{d,BESS}^t}{n_{d,BESS}}\right)\Delta t$ (3)

where $E_{BESS}^t$ and $E_{BESS}^{t-1}$ are the capacities of the battery energy storage system at times t and t−1; $P_{c,BESS}^t$ and $P_{d,BESS}^t$ are its charging and discharging powers at time t; $n_{c,BESS}$ [%] and $n_{d,BESS}$ [%] are the charging and discharging efficiencies of the battery energy storage system.
In step 2, the economic optimization model is shown in formula (4):

$\min M_{total} = M_{om} + M_{buy}$ (4)

where $M_{total}$ is the total cost, $M_{om}$ is the operation and maintenance cost, and $M_{buy}$ is the electricity purchase cost.
The constraints for configuring the economic optimization model in step 2 mainly comprise power balance constraints and equipment operation constraints.
The power balance constraint is shown in formula (5):

$P_{PV}^i + P_{WT}^i + P_{d,ESS}^i - P_{c,ESS}^i + P_{buy}^i - P_{waste}^i = P_{load}^i$ (5)

where $P_{PV}^i$, $P_{WT}^i$, $P_{d,ESS}^i$, $P_{c,ESS}^i$, $P_{buy}^i$, $P_{waste}^i$ are, at moment i, the photovoltaic generator set output electric power, the wind generating set output electric power, the storage battery discharging and charging power, the power purchased from the main grid, and the curtailed power; $P_{load}^i$ is the user electrical load at moment i.
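A simple check of the power balance constraint (5); the floating-point tolerance is an implementation detail, not part of the patent:

```python
def power_balance_ok(p_pv, p_wind, p_dis, p_ch, p_buy, p_waste,
                     p_load, tol=1e-6):
    """Check formula (5): supply (PV + wind + battery discharge +
    grid purchase) must equal demand (load + battery charge +
    curtailed power) within a small tolerance."""
    supply = p_pv + p_wind + p_dis + p_buy
    demand = p_load + p_ch + p_waste
    return abs(supply - demand) <= tol
```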
The plant operation constraints are shown in formulas (6)-(8).

Upper and lower limits of storage battery charging and discharging power:

$P_{d,ESS}^{min} \le P_{d,ESS}^t \le P_{d,ESS}^{max}, \quad P_{c,ESS}^{min} \le P_{c,ESS}^t \le P_{c,ESS}^{max}$ (6)

Upper and lower limits of storage battery capacity:

$0 \le E_{BESS}^t \le E_{cap,ESS}$ (7)

where $E_{cap,ESS}$ is the rated capacity of the battery.
deep Reinforcement Learning (DRL) framework design involves four key elements: status, action, rewards and environment. The state represents the context in which the agent is located, the action is an operation that the agent can perform, the reward is immediate feedback, and the environment is the outside world. In the DRL, the agent receives rewards by observing states, selecting actions, and interacting with the environment to learn how to formulate strategies to maximize jackpots.
(1) State variable S design. In the wind-solar-storage coupled carbon capture, utilization and sequestration system, the state should be chosen to best reflect the current operating condition of the system, i.e. the environmental indicators directly related to the actions. The time t, the electrical load demand $P_{load}$, the wind power generation $P_{wind}$, the photovoltaic power generation $P_{pv}$, the battery state of charge $S_{bat}$, and the current electricity price $C_E$ are selected.

The state S can be expressed as formula (9):

$S = [t\ P_{load}\ P_{wind}\ P_{pv}\ S_{bat}\ C_E]$ (9)
(2) Scheduling variable A design. The scheduling variables should be those that directly influence rewards and states, so the charge/discharge amount $\Delta P_c$ of the energy storage system at the current time t and the electricity purchased from the grid $P_{buy}$ are chosen. The charge/discharge amount of the energy storage system is unified into a single increment variable, positive for discharging and negative for charging.

$A = [\Delta P_c\ P_{buy}]$ (10)
(3) Reward function r design. The optimization objective of the agent is to find the economically optimal solution within the feasible domain. The reward is therefore divided into an economic index reward and an out-of-limit penalty, as in formula (11):

$r = -k_{ope} M_{total} - k_{vio} r_{vio}$ (11)

where $k_{ope}$ and $k_{vio}$ are the economic and out-of-limit penalty scale factors, respectively.
The capacity constraint of the energy storage system is implemented as a continuous soft constraint through $r_{vio}$, as shown in formula (12).
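A sketch of the reward of formula (11) with a soft capacity penalty standing in for the $r_{vio}$ of formula (12). Since the patent text does not reproduce the exact form of (12), the linear distance-to-bounds penalty and the coefficient defaults below are assumptions:

```python
def reward(m_total, e_ess, e_min, e_max, k_ope=1.0, k_vio=10.0):
    """Reward r = -k_ope*M_total - k_vio*r_vio per formula (11).

    r_vio here is the distance by which the stored energy leaves
    [e_min, e_max] (an assumed soft-penalty form); it is zero when
    the capacity constraint is satisfied.
    """
    if e_ess < e_min:
        r_vio = e_min - e_ess
    elif e_ess > e_max:
        r_vio = e_ess - e_max
    else:
        r_vio = 0.0
    return -k_ope * m_total - k_vio * r_vio
```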
Step 3: the construction of the TD3 training network is shown in FIG. 1.
Fig. 1 depicts the collection of the integrated energy system state $s_t$ at time step t. These states are mapped through the policy function to the energy system decision variables $a_t$; to increase exploration, noise is added to the decision variable $a_t$. The energy system management agent interacts with the environment to generate the new state $s_{t+1}$ at time t+1 and the economic cost $r_{t+1}$ of the energy system at the next moment. This information is stored in an experience pool for random sampling as training data when the networks are trained.
The decision network in the figure selects the scheduling variable according to the state variable; its policy function is updated through the deterministic policy gradient algorithm:

$\nabla_\phi J(\phi) = E\left[\nabla_a Q^\pi(s,a)\big|_{a=\pi_\phi(s)}\, \nabla_\phi \pi_\phi(s)\right]$ (13)

where $Q^\pi(s,a) = E_{S\sim P^\pi,\, a\sim\pi}[R_t \mid s,a]$ is the expected return predicted after the energy system acts according to policy $\pi$ in state s; $\nabla_\phi$ performs the gradient computation for the Actor network.
The evaluation network consists of a Critic network and a Critic target network. According to the Bellman equation, in the optimal case the state-value function corresponds to the optimal policy:

$Q^\pi(s,a) = r + \gamma E_{s'}\left[Q^\pi(s',a')\right], \quad a' \sim \pi(s')$ (14)

where $Q^\pi(s',a')$ is the value of the state and action of the energy system at the next moment.
TD3 introduces two evaluation networks and, by comparing them, selects the more conservative of the two evaluation values as the evaluation value:
the estimated values are:
wherein y is 1 ,y 2 Representing a target Q value; r is the reward at the current moment; θ 1 ,θ 2 ,Two evaluation strategy target network parameters and two evaluation strategy network parameters are respectively adopted.
Thus, combining formulas (15) and (16) yields the temporal-difference target value:
y = min(y_1, y_2) = r + γ·min_{i=1,2} Q_{θ′_i}(s′, a′) (17)
the system first passes through the policy network at state variable s t Obtain the scheduling variable a on the basis of (a) t 。
Subsequently, the system state s_t and the system scheduling variable a_t are taken as inputs and mapped through the Critic1 and Critic2 networks to obtain Q_1(s_t, a_t) and Q_2(s_t, a_t). The network parameters are obtained through back propagation of the double-Q network loss function of formula (18):
L_Q = Σ_j [y_j - Q_{θ_i}(s_j, a_j)]², i = 1, 2 (18)
wherein L_Q is the loss function, and s_j, a_j are respectively the decision made by the energy system in the j-th training sample and the state information of the energy system at that moment.
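Under the usual TD3 formulation that the surrounding text describes, the conservative target of formulas (15)-(17) and the double-Q loss of formula (18) reduce to a few lines; this is a sketch under that assumption, not the patent's implementation:

```python
def td3_target(q1_next: float, q2_next: float, r: float, gamma: float = 0.99) -> float:
    """Clipped double-Q target: y = r + gamma * min(Q'_1(s', a'), Q'_2(s', a'))."""
    return r + gamma * min(q1_next, q2_next)

def double_q_loss(q1: float, q2: float, y: float) -> float:
    """Double-Q loss: sum of squared TD errors of both critics
    against the shared conservative target y."""
    return (q1 - y) ** 2 + (q2 - y) ** 2
```

Taking the minimum of the two target critics is what makes the evaluation "conservative": it suppresses the value overestimation that a single critic tends to accumulate.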
The deterministic strategy estimation function has the problem of overfitting, so Gaussian noise is added to the target strategy network:
ε=clip(N(0,σ),-c,c) (19)
where ε is the noise, clip is the truncation function, and c is the noise truncation boundary value.
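Formula (19) is the standard TD3 target-policy-smoothing noise; a minimal sketch follows, where σ and c are illustrative values rather than those used in the patent:

```python
import random

def target_policy_noise(sigma: float = 0.2, c: float = 0.5) -> float:
    """Formula (19): eps = clip(N(0, sigma), -c, c).
    sigma and c here are assumed, illustrative values."""
    eps = random.gauss(0.0, sigma)
    return max(-c, min(c, eps))   # truncate the noise to [-c, c]
```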
The target networks adopt a soft update mode: the parameters of the policy network are gradually blended into the target policy network, and the parameters of the evaluation networks are gradually blended into the target evaluation networks. Introducing a learning rate τ, the update is:
θ′_i = τθ_i + (1 - τ)θ′_i
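The soft update with learning rate τ can be sketched as follows, treating the parameters as plain lists of floats for illustration; τ = 0.005 is an assumed default, not a value from the patent:

```python
def soft_update(target_params, online_params, tau: float = 0.005):
    """Polyak soft update: theta' <- tau*theta + (1 - tau)*theta'.
    Parameters are represented as plain lists of floats here."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]
```

A small τ makes the target networks trail the online networks slowly, which stabilizes the bootstrapped targets of formulas (15)-(17).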
table 1 TD3 training parameter settings
In step 4, the intelligent agent is trained on the MATLAB platform. The maximum episode length is set to 24 steps, representing optimization over 24 hours, and the agent-saving condition is that the average reward reaches r_std, at which point the agent is saved. The final stopping condition is the maximum number of training rounds E_max. Training proceeds in two stages: first, the agent is trained to avoid violating the equipment operation constraints and is saved after convergence; second, the economic index is added on top of the trained agent and overall optimization is performed again, yielding the energy system optimal scheduling agent.
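The 24-step, two-stage procedure could be organized as the following skeleton; `env`, `agent`, `r_std` and `e_max` are stand-ins for the patent's MATLAB objects and thresholds, not disclosed interfaces:

```python
# Hypothetical sketch of the training procedure described above.
MAX_STEPS = 24          # one step per hour of the scheduling day

def train_stage(env, agent, reward_fn, r_std, e_max, window=50):
    """Run episodes until the moving-average episode reward reaches r_std
    (agent-saving condition met) or e_max episodes have elapsed."""
    history = []
    for episode in range(e_max):
        s, total = env.reset(), 0.0
        for t in range(MAX_STEPS):
            a = agent.act(s)
            s, r = env.step(a, reward_fn)
            total += r
        history.append(total)
        # Moving average over the last `window` episodes
        if sum(history[-window:]) / min(len(history), window) >= r_std:
            return episode, True     # save condition met
    return e_max, False
```

The two-stage scheme would then call `train_stage` twice: once with a constraint-only reward, and again with the economic index added.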
The foregoing is an exemplary embodiment of the present application, the scope of which is defined by the claims and their equivalents.
Claims (12)
1. A comprehensive energy system optimal scheduling method based on deep reinforcement learning, characterized by comprising the following steps:
step 1: establishing a comprehensive energy system model, wherein the comprehensive energy system model comprises a wind generating set, a photovoltaic generating set, a storage battery model and a park electric load demand model;
step 2: establishing an economic optimization model according to the comprehensive energy system model, and defining system variables and constraints; constructing a deep reinforcement learning training model framework according to indexes, variables and constraints, namely designing reinforcement learning state variables S, scheduling variables A and reward functions r;
step 3: setting up a TD3 training network structure, and setting up a strategy network of the TD3 training network structure, network parameters of an evaluation network, buffer area size, discount factors and soft update rate;
step 4: training the intelligent agent through interaction with the comprehensive energy system model, so that the intelligent agent learns how to make optimal decisions under different conditions and maximizes the reward function, thereby realizing stable and economic operation of the comprehensive energy system.
2. The optimal scheduling method of the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein in the mechanism of establishing the comprehensive energy system model in step 1, each model is established as follows:
the photovoltaic generator set model is shown in the following formula (1):
wherein P_PV represents the output power of the photovoltaic generator set, in kW; Y_PV is the rated capacity of the photovoltaic generator set, in kW, representing the output power under standard test conditions; f_PV is the photovoltaic derating factor; G_T is the solar irradiation intensity at the current time step, in kW/m²; G_T,STC is the solar irradiation intensity under standard test conditions, in kW/m², usually taken as 1; α_P is the power temperature coefficient of the photovoltaic panel, in %/K; T_C is the photovoltaic cell temperature at the current time step, in K; T_C,STC is the photovoltaic cell temperature under standard test conditions, in K;
the fan output power of the wind generating set is estimated through the predicted wind speed and wind speed power characteristic curve, and the following formula (2) is shown:
wherein P_WT,t is the output power of the fan at time t, in kW; U_hub is the predicted wind speed at the fan hub height, in m/s; a, b, c, d are fitting coefficients; v_ci, v_r, v_co are respectively the cut-in wind speed, rated wind speed and cut-out wind speed of the fan, in m/s;
the battery model is represented by the following formula (3):
wherein E_BESS,t and E_BESS,t-1 are the capacities of the battery energy storage system at times t and t-1, in MWh; P_c,BESS,t and P_d,BESS,t are the charging and discharging power of the battery energy storage system at time t, in MW; n_c,BESS and n_d,BESS are the charging and discharging efficiencies of the battery energy storage system, in percent.
3. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the economic optimization model of the comprehensive energy system is as follows:
min M_total = M_om + M_buy (4)
wherein M_total is the total cost, M_om is the operation and maintenance cost, and M_buy is the electricity purchase cost.
4. The optimization scheduling method of the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the constraints for configuring the economic optimization model in the step 2 include a power balance constraint and a device operation constraint:
the power balance constraint is shown in the following formula (5):
wherein the supply-side terms are respectively the photovoltaic generator set output electric power, the wind generating set output electric power, the storage battery discharging power and charging power, and the main-grid purchased electric power and curtailed electric power at the i-th moment, all in kW; the demand side is the user electric load at the i-th moment;
upper and lower limit constraint of charging and discharging power of the storage battery:
wherein the upper limits are the maximum discharging power and charging power of the storage battery, and the lower limits are the minimum discharging power and charging power of the storage battery;
upper and lower limit constraint of storage battery capacity:
wherein E_cap,ESS is the rated capacity of the storage battery.
5. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the state variable S is designed by:
in the wind-solar energy storage coupled carbon capture and utilization sealing system, the state should be selected to reflect the current running state of the system, the environmental index directly related to the dispatching variable,selecting time t and electric load demand P load Wind power generation P wind Photovoltaic power generation P pv State of charge S of battery bat Current time electricity price C E ,
The state variable S is represented by the following formula (9):
S = [t, P_load, P_wind, P_pv, S_bat, C_E] (9).
6. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the scheduling variable A is selected as follows:
the dispatching variable should select the variable directly influencing rewards and states, so the charging and discharging quantity delta P of the energy storage system at the current t moment is input c Electric quantity P for purchasing electricity of power grid buy The charge and discharge quantity of the energy storage system is unified into an increment variable, the value is positive and discharge, the value is negative and discharge,
A = [ΔP_c, P_buy] (10).
7. the optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the reward function r is specifically:
the optimization goal of the intelligent agent is to find the economic optimal solution in the feasible domain, so the reward setting is divided into two parts of economic index rewards and out-of-limit penalties according to the following formula (11),
r = -k_ope·M_total - k_vio·r_vio (11)
wherein k_ope and k_vio are the economic and out-of-limit penalty scale factors, respectively.
8. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein, in addition to the hard constraints on the storage battery charging/discharging quantity and the grid purchased quantity, the capacity limit of the energy storage system can be implemented through r_vio as a continuous soft constraint, as shown in the following formula (12),
wherein r_vio is the penalty for violating the constraint.
9. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the decision network selects scheduling variables according to state variables, and a policy function of the decision network is updated by a deterministic policy gradient algorithm:
wherein Q^π(s,a) = E_{s~P^π, a~π}[R_t | s, a] is the expected running cost predicted after the energy system acts according to policy π in state s; the gradient is solved with respect to the Actor network.
10. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the evaluation network evaluates the value of the energy system schedule based on the state variable at the current moment, the scheduling variable at the current moment, the operating cost at the current moment and the state variable at the next moment; the evaluation network consists of a Critic network and a Critic target network; according to the Bellman equation, the state value function under the optimal condition corresponds to the optimal strategy,
Q^π(s,a) = r + γE_{s′,a′}[Q^π(s′,a′)], a′ ~ π(s′) (14)
wherein Q^π(s′, a′) represents the value of the state and action of the energy system at the next moment.
11. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the TD3 training network structure introduces two evaluation networks, and the more conservative of the two evaluation values is selected as the evaluation value through comparison of the two networks:
the estimated values are:
wherein y is 1 ,y 2 Representing a target Q value; r is the reward at the current moment;two evaluation target network parameters and two evaluation network parameters are respectively.
12. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 11, wherein the temporal-difference target value can be obtained by combining formulas (15) and (16):
y = min(y_1, y_2) = r + γ·min_{i=1,2} Q_{θ′_i}(s′, a′) (17)
the system first passes through the policy network at state variable s t Obtain the scheduling variable a on the basis of (a) t ,
subsequently, the system state s_t and the system scheduling variable a_t are taken as inputs and mapped through the Critic1 and Critic2 networks to obtain Q_1(s_t, a_t) and Q_2(s_t, a_t); the network parameters are obtained through back propagation of the double-Q network loss function of formula (18),
wherein L_Q is the loss function, and s_j, a_j are respectively the decision made by the energy system in the j-th training sample and the state information of the energy system at that moment;
the deterministic strategy estimation function has the problem of overfitting, so Gaussian noise is added to the target strategy network:
ε=clip(N(0,σ),-c,c) (19)
where ε is the noise, clip is the truncation function, and c is the noise truncation boundary value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311488353.6A CN117455183A (en) | 2023-11-09 | 2023-11-09 | Comprehensive energy system optimal scheduling method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117455183A true CN117455183A (en) | 2024-01-26 |
Family
ID=89596463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311488353.6A Pending CN117455183A (en) | 2023-11-09 | 2023-11-09 | Comprehensive energy system optimal scheduling method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117455183A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537357A (en) * | 2018-02-09 | 2018-09-14 | 上海电气分布式能源科技有限公司 | Photovoltaic power generation quantity loss forecasting method based on derating factor |
CN113723749A (en) * | 2021-07-20 | 2021-11-30 | 中国电力科学研究院有限公司 | Multi-park comprehensive energy system coordinated scheduling method and device |
CN114091879A (en) * | 2021-11-15 | 2022-02-25 | 浙江华云电力工程设计咨询有限公司 | Multi-park energy scheduling method and system based on deep reinforcement learning |
CN114417695A (en) * | 2021-11-30 | 2022-04-29 | 国网浙江省电力有限公司台州供电公司 | Multi-park comprehensive energy system economic dispatching method |
CN114462696A (en) * | 2022-01-27 | 2022-05-10 | 合肥工业大学 | Comprehensive energy system source-load cooperative operation optimization method based on TD3 |
CN114519456A (en) * | 2022-01-14 | 2022-05-20 | 东南大学 | Green agriculture zero-carbon energy supply system and intelligent configuration layered optimization algorithm thereof |
WO2022160705A1 (en) * | 2021-01-26 | 2022-08-04 | 中国电力科学研究院有限公司 | Method and apparatus for constructing dispatching model of integrated energy system, medium, and electronic device |
CN115186885A (en) * | 2022-06-29 | 2022-10-14 | 山东大学 | Comprehensive energy system energy optimization scheduling method and system based on reinforcement learning |
WO2023082697A1 (en) * | 2021-11-15 | 2023-05-19 | 中国电力科学研究院有限公司 | Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program |
CN116663820A (en) * | 2023-05-19 | 2023-08-29 | 合肥工业大学 | Comprehensive energy system energy management method under demand response |
2023-11-09: CN application CN202311488353.6A (CN117455183A) filed — status: Pending
Non-Patent Citations (1)
Title |
---|
XI Xianpeng: "Research on Operation Optimization of Integrated Energy Systems Based on Prediction-Assisted Deep Reinforcement Learning", Wanfang dissertation, 2 October 2023 (2023-10-02), pages 25-63 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||