CN117455183A - Comprehensive energy system optimal scheduling method based on deep reinforcement learning - Google Patents

Comprehensive energy system optimal scheduling method based on deep reinforcement learning

Info

Publication number
CN117455183A
CN117455183A
Authority
CN
China
Prior art keywords
energy system
power
network
comprehensive energy
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311488353.6A
Other languages
Chinese (zh)
Inventor
章哲玮
蔺琪蒙
王立公
陈宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoneng Jiangsu New Energy Technology Development Co ltd
Original Assignee
Guoneng Jiangsu New Energy Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoneng Jiangsu New Energy Technology Development Co ltd filed Critical Guoneng Jiangsu New Energy Technology Development Co ltd
Priority to CN202311488353.6A
Publication of CN117455183A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Educational Administration (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses an optimized scheduling method for a comprehensive energy system based on deep reinforcement learning. It relates to the field of intelligent energy and aims to improve the cooperative operation of the multiple devices in a comprehensive energy system through data-driven intelligence. The technical scheme constructs a deep reinforcement learning scheduling framework suited to comprehensive energy system scheduling, including the selection of scheduling variables and state variables and the design of constraint indexes and reward functions that characterize the cooperative operation of the energy system. Through interaction with real-time data, the system adapts to continuously changing environmental conditions and user demands, and copes with fluctuations in renewable output, user load and electricity price, thereby realizing optimal scheduling and improving overall system performance. The application fields of the invention cover energy management, renewable energy integration and scheduling, improving the stable and economical operation of complex comprehensive energy systems.

Description

Comprehensive energy system optimal scheduling method based on deep reinforcement learning
Technical Field
The invention relates to a comprehensive energy system optimization scheduling method based on deep reinforcement learning, belonging to the technical field of intelligent energy; its application covers multiple fields such as energy management, renewable energy integration and scheduling.
Background
Building comprehensive energy systems with a high proportion of renewables enables efficient energy utilization and local consumption of renewable resources, and has become one of the important paths for the low-carbon transformation of energy systems. However, strong coupling and large differences in dynamic characteristics between the energy subsystems make economical, low-carbon operation of the comprehensive energy system challenging. Providing optimal set-point instructions for long-period operation at the system scheduling level therefore ensures stable, economical and flexible energy supply and brings out the advantages of the comprehensive energy system.
At present, much research on the operation optimization of combined heat and power energy systems exists at the system control level, aiming to realize coordinated control of the system's energy sources so as to quickly satisfy the load supply-demand balance on the electrical and thermal sides. However, the lack of exact and reasonable control instructions significantly reduces the economy and stability of comprehensive energy system operation. For this reason, many scholars have studied the operation of comprehensive energy systems with the aim of achieving optimal system performance, thereby reducing costs, cutting carbon emissions, improving reliability, and better adapting to fluctuations in the energy market.
When it comes to scheduling methods for comprehensive energy systems, various approaches exist to meet the requirements of different systems. These methods can be classified into the following categories according to their characteristics:
Rule-based scheduling methods: rule-based scheduling relies on predefined rules and policies to manage the comprehensive energy system. These rules may operate according to a schedule, for example providing additional power supply during peak hours and reducing supply during off-peak hours. This approach is effective for managing simple systems or where minimal computational complexity is required. However, for complex systems and rapidly changing conditions the rules become inflexible and fail to cope effectively with changing demands and resources.
Optimization-based scheduling methods: these use mathematical optimization techniques such as linear programming, integer programming and nonlinear programming to determine the optimal energy configuration and scheduling strategy. This requires building a mathematical model of the system, including constraints and objective functions, and then using an optimization algorithm to find the optimal solution. The optimization may consider multiple objectives such as cost minimization, carbon emission minimization and reliability maximization. The advantage of such approaches is that they can find globally optimal solutions accounting for multiple objectives, but they typically require substantial computational resources.
Deep reinforcement learning scheduling methods: deep reinforcement learning has recently been applied to comprehensive energy system scheduling. This approach uses deep neural networks and reinforcement learning to learn an optimal decision policy through interaction with the environment. Its advantage is that it can cope with complex, nonlinear systems and changing conditions without an explicit model: the agent learns by interacting with the environment and optimizes its decisions from the reward signal, continually improving performance. The method is particularly effective when real-time requirements are high and demand and resources fluctuate strongly.
Disclosure of Invention
To solve the above technical problems, the invention discloses a comprehensive energy system optimal scheduling method based on deep reinforcement learning, which aims to improve energy utilization efficiency, enhance sustainability, balance load and demand, reduce manual intervention, and cope with energy market fluctuations. The specific technical scheme is as follows:
a comprehensive energy system optimization scheduling method based on deep reinforcement learning comprises the following steps:
step 1: establishing a comprehensive energy system model, wherein the comprehensive energy system model comprises a wind generating set, a photovoltaic generating set, a storage battery model and a park electric load demand model;
step 2: establishing an economic optimization model according to the comprehensive energy system model, and defining system variables and constraints; constructing a deep reinforcement learning training model framework according to indexes, variables and constraints, namely designing reinforcement learning state variables S, scheduling variables A and reward functions r;
step 3: setting up a TD3 training network structure, and setting up a strategy network of the TD3 training network structure, network parameters of an evaluation network, buffer area size, discount factors and soft update rate;
step 4: through interaction of the comprehensive energy system models, the intelligent body is trained, so that the intelligent body learns how to make optimal decisions under different conditions, and the reward function is maximized, so that the stable and economic operation level of the comprehensive energy system is realized.
Further, in the mechanism model of the comprehensive energy system established in step 1, each component model is constructed as follows:
the photovoltaic generator set model is shown in the following formula (1):
wherein P is PV The output power of the photovoltaic generator set is represented, and the unit kW is represented; y is Y PV The rated capacity of the photovoltaic generator set is the unit kW, and represents the output power under the standard test condition; f (f) PV Is a photovoltaic derating factor; g T Solar irradiation intensity of current time step, unit kW/m 2 ;G T,STC The unit kW/m is the solar irradiation intensity under standard test conditions 2 Usually 1 is taken; alpha P Is the power temperature coefficient of the photovoltaic cell panel, unit%/K; t (T) C The temperature of the photovoltaic cell is the current time step length, and the unit is K; t (T) C,STC The temperature of the photovoltaic cell under standard test conditions is shown as a unit K;
the fan output power of the wind generating set is estimated through the predicted wind speed and wind speed power characteristic curve, and the following formula (2) is shown:
in the method, in the process of the invention,the output power of the fan at the moment t is the unit kW; u (U) hub The wind speed is predicted at the height of the hub of the fan, and the unit is m/s; a, b, c, d are fitting coefficients; v ci 、v r 、v co The cut-in wind speed, the rated wind speed and the cut-out wind speed of the fan are respectively in units of m/s;
the battery model is represented by the following formula (3):
in the method, in the process of the invention,and->The unit MWh is the capacity of the battery energy storage system at the time t and the time t-1; /> The unit MW, n is the charge and discharge power of the battery energy storage system at the moment t c,BESS 、n d,BESS The unit percentage is the charge and discharge efficiency of the battery energy storage system.
Further, the economic optimization model of the comprehensive energy system is as follows:
$\min M_{total} = M_{om} + M_{buy}$ (4)
where $M_{total}$ is the total cost, $M_{om}$ is the operation and maintenance cost, and $M_{buy}$ is the electricity purchase cost.
Further, the constraints of the economic optimization model configured in step 2 include a power balance constraint and equipment operation constraints:
The power balance constraint is shown in formula (5):
$P_{PV}^i + P_{WT}^i + P_{d,ESS}^i + P_{buy}^i = P_{load}^i + P_{c,ESS}^i + P_{cur}^i$ (5)
where $P_{PV}^i$, $P_{WT}^i$, $P_{d,ESS}^i$, $P_{c,ESS}^i$, $P_{buy}^i$ and $P_{cur}^i$ are, respectively, the photovoltaic generator set output power, the wind generating set output power, the battery discharge and charge powers, the power purchased from the main grid, and the curtailed power at time i, in kW; $P_{load}^i$ is the user electrical load at time i;
Upper and lower limit constraints on battery charge and discharge power:
$P_{d,ESS}^{min} \le P_{d,ESS}^t \le P_{d,ESS}^{max}$ (6)
$P_{c,ESS}^{min} \le P_{c,ESS}^t \le P_{c,ESS}^{max}$ (7)
where $P_{d,ESS}^{max}$, $P_{c,ESS}^{max}$ are the maximum battery discharge and charge powers, and $P_{d,ESS}^{min}$, $P_{c,ESS}^{min}$ are the minimum battery discharge and charge powers;
Upper and lower limit constraint on battery capacity:
$0 \le E_{BESS}^t \le E_{cap,ESS}$ (8)
where $E_{cap,ESS}$ is the rated capacity of the battery.
Further, the state variable S is designed as follows:
In the wind-solar-storage coupled carbon capture, utilization and storage system, the state should be selected to reflect the current operating condition of the system, i.e., the environmental indexes directly related to the scheduling variables: the time t, the electrical load demand $P_{load}$, wind power generation $P_{wind}$, photovoltaic power generation $P_{pv}$, the battery state of charge $S_{bat}$, and the current electricity price $C_E$.
The state variable S is represented by formula (9):
$S = [t \;\; P_{load} \;\; P_{wind} \;\; P_{pv} \;\; S_{bat} \;\; C_E]$ (9)
Further, the scheduling variable A is selected as follows:
The scheduling variables should be those that directly influence rewards and states, namely the charge/discharge quantity $\Delta P_c$ of the energy storage system at the current time t and the electricity purchased from the power grid $P_{buy}$. The charge and discharge quantities of the energy storage system are unified into one incremental variable, with positive values denoting discharging and negative values denoting charging,
$A = [\Delta P_c \;\; P_{buy}]$ (10)
further, the reward function r is specifically:
the optimization goal of the intelligent agent is to find the economic optimal solution in the feasible domain, so the reward setting is divided into two parts of economic index rewards and out-of-limit penalties according to the following formula (11),
r=-k ope M total -k vio r vio (11)
k in ope 、k vio The economic and out-of-limit penalty scale factors, respectively.
Further, besides the hard constraints on battery charge/discharge power and grid purchase quantity, the capacity limit of the energy storage system can be implemented as a continuous soft constraint through $r_{vio}$, as shown in formula (12), where $r_{vio}$ is the penalty for violating the constraint.
Further, the decision network selects the scheduling variable according to the state variable, and its policy function is updated through the deterministic policy gradient algorithm:
$\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim P^{\pi}} \left[ \nabla_a Q^{\pi}(s,a) \big|_{a=\pi_{\theta}(s)} \nabla_{\theta} \pi_{\theta}(s) \right]$ (13)
where $Q^{\pi}(s,a) = \mathbb{E}_{s \sim P^{\pi}, a \sim \pi}[R_t \mid s,a]$ is the expected return of the energy system predicted after acting according to policy π in state s, and the gradient is solved for the Actor network.
Furthermore, the evaluation network evaluates the energy system scheduling value based on the current-time state variable, current-time scheduling variable, current-time running cost and next-time state variable. The evaluation network consists of a Critic network and a Critic target network; according to the Bellman equation, the state value function corresponds to the optimal policy under optimal conditions,
$Q^{\pi}(s,a) = r + \gamma \mathbb{E}_{s'}\left[ Q^{\pi}(s',a') \right], \quad a' \sim \pi(s')$ (14)
where $Q^{\pi}(s',a')$ involves the state and action of the energy system at the next moment.
Furthermore, the TD3 training network structure introduces two evaluation networks and, by comparing the two, selects the more conservative of the two evaluation values:
The target values are
$y_1 = r + \gamma Q_{\theta_1'}(s', a')$ (15)
$y_2 = r + \gamma Q_{\theta_2'}(s', a')$ (16)
where $y_1$, $y_2$ represent target Q values; r is the reward at the current moment; $\theta_1'$, $\theta_2'$ and $\theta_1$, $\theta_2$ are the two evaluation target network parameters and the two evaluation network parameters, respectively. Integrating formulas (15) and (16) yields the temporal-difference target value:
$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', a')$ (17)
the system first passes through the policy network at state variable s t Obtain the scheduling variable a on the basis of (a) t
Subsequently, the system state s t And a system schedule variable a t As input, through Critic1 network mapping, Q is obtained 1 (s t ,a t )、Q 2 (s t ,a t ) The network parameters are obtained through the double Q network loss function back propagation algorithm of (18),
wherein L is Q S is a loss function j ,a j And respectively making a decision of the energy system in the j-th training and information of the energy system at the current moment.
The deterministic policy estimation function is prone to overfitting, so Gaussian noise is added to the target policy network:
ε=clip(N(0,σ),-c,c) (19)
where ε is noise, clip is a truncated function, and c is the noise truncated boundary value.
The beneficial effects of this application lie in:
(1) By establishing a mechanism model of the comprehensive energy system and using deep reinforcement learning, the system can more effectively allocate and utilize the various energy resources, including the wind generating set, the photovoltaic generator set, the storage battery, and power purchased from the main grid. This helps to improve the overall efficiency of the system, reduce energy waste, and improve the economic benefits of the energy system.
(2) The TD3 agent in the invention enables the system to make automatic, intelligent decisions. Through interaction with the comprehensive energy system environment, the agent continuously learns and improves its decisions to adapt to different operating conditions. This reduces the need for manual intervention and increases the autonomy of the system.
(3) Through intelligent scheduling and management of electrical resources, the system can learn and optimize energy production and distribution autonomously, ensuring efficient operation, reducing energy cost, cutting carbon emissions, and improving the stability and reliability of the energy system.
Drawings
Fig. 1 is a schematic flow chart of an algorithm according to a specific embodiment of the present application.
Detailed Description
The technical scheme of the application will be described in detail below with reference to the accompanying drawings.
The comprehensive energy system corresponding to this embodiment is shown in fig. 1. The system mainly comprises a photovoltaic generator set, a wind generating set, a storage battery, a bus and the main grid; the demand side is mainly the electrical demand of the park. The storage battery participates in power balance regulation; the electrical loads in the system are served by the system's own generating equipment and, through the external grid connection, electricity is purchased when the electricity price is low.
To realize economical and efficient operation of the comprehensive energy system, the application provides a comprehensive energy system optimal scheduling method based on deep reinforcement learning. More specifically, the scheduling optimization training process of the comprehensive energy system in this embodiment is shown in fig. 1. Based on the policy-evaluation (actor-critic) network structure, the evaluation network performs policy evaluation while the policy network continuously optimizes the agent's actions according to the evaluation network; the networks interact and update one another, learning how to select the best operation in different states so as to maximize the reward and obtain an economical strategy.
The invention discloses a comprehensive energy system optimal scheduling method based on deep reinforcement learning, which specifically comprises the following steps:
Step 1: establishing a mechanism model of the comprehensive energy system, including the wind generating set, photovoltaic generator set and storage battery equipment models and the park electrical load demand model.
Step 2: establishing an economic optimization model according to the comprehensive energy system units and defining the system variables and constraints; constructing a deep reinforcement learning training model framework according to the indexes, variables and constraints, namely designing the reinforcement learning scheduling variables, state variables and reward function.
Step 3: constructing a TD3 training network structure, and setting the network parameters of the TD3 policy and evaluation networks, the replay buffer size, the discount factor and the soft update rate.
Step 4: training the agent through interaction with the comprehensive energy system environment model, so that the agent learns how to make optimal decisions under different conditions and maximizes the reward function, thereby achieving a stable and economical operation level of the comprehensive energy system.
Specifically, in step 1, the model is built as follows:
the photovoltaic generator set model is shown in formula (1):
wherein P is PV [kW]Representing the output power of the photovoltaic generator set; y is Y PV [kW]Is the rated capacity of the photovoltaic generator set and represents the output power (the irradiation intensity of illumination is 1 kW/m) 2 298K, windless); f (f) PV Is a photovoltaic derating factor; g T [kW/m2]Solar irradiation intensity of the current time step; GT, STC [ kW/m2 ]]The solar irradiation intensity under standard test conditions is usually 1; alpha P [%/K]Is the power temperature coefficient of the photovoltaic cell panel; t (T) C [K]The temperature of the photovoltaic cell for the current time step; t (T) C,STC [K]Is the temperature of the photovoltaic cell under standard test conditions.
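For illustration, a minimal Python sketch of formula (1) follows; the function and parameter names and the default temperature coefficient are assumptions for the example, not values fixed by the invention:

```python
def pv_output_kw(y_pv, f_pv, g_t, t_c,
                 g_t_stc=1.0, t_c_stc=298.0, alpha_p=-0.0035):
    """Formula (1): rated capacity scaled by the irradiance ratio,
    a derating factor, and a linear cell-temperature correction.
    alpha_p is expressed here as a fraction per K (illustrative value)."""
    return y_pv * f_pv * (g_t / g_t_stc) * (1.0 + alpha_p * (t_c - t_c_stc))

# Example: 100 kW array, 0.8 derating, 0.75 kW/m^2 irradiance, cell at 318 K
print(pv_output_kw(100.0, 0.8, 0.75, 318.0))  # ~55.8 kW
```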
The fan output power of the wind generating set can be estimated from the predicted wind speed and the wind speed-power characteristic curve, as shown in formula (2):
$P_{WT}^t = \begin{cases} 0, & U_{hub} < v_{ci} \text{ or } U_{hub} \ge v_{co} \\ a U_{hub}^3 + b U_{hub}^2 + c U_{hub} + d, & v_{ci} \le U_{hub} < v_r \\ P_r, & v_r \le U_{hub} < v_{co} \end{cases}$ (2)
where $P_{WT}^t$ [kW] is the fan output power at time t; $U_{hub}$ [m/s] is the predicted wind speed at fan hub height; a, b, c, d are fitting coefficients; $v_{ci}$, $v_r$, $v_{co}$ [m/s] are the cut-in, rated and cut-out wind speeds of the fan; $P_r$ [kW] is the rated fan output power.
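Likewise, a sketch of the piecewise curve of formula (2), under the reconstruction above in which the cubic fit applies between cut-in and rated wind speed:

```python
def wind_output_kw(u_hub, v_ci, v_r, v_co, p_rated, a, b, c, d):
    """Formula (2): zero below cut-in or at/above cut-out, a cubic fit of
    the measured power curve between cut-in and rated wind speed, and
    rated power between rated and cut-out wind speed."""
    if u_hub < v_ci or u_hub >= v_co:
        return 0.0
    if u_hub < v_r:
        return a * u_hub**3 + b * u_hub**2 + c * u_hub + d
    return p_rated
```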
The storage battery model is shown in formula (3). Based on the charge and discharge conditions, the energy stored in the battery energy storage system is estimated as
$E_{BESS}^t = E_{BESS}^{t-1} + \left( n_{c,BESS} P_{c,BESS}^t - \dfrac{P_{d,BESS}^t}{n_{d,BESS}} \right) \Delta t$ (3)
where $E_{BESS}^t$ and $E_{BESS}^{t-1}$ are the capacities of the battery energy storage system at times t and t−1; $P_{c,BESS}^t$ and $P_{d,BESS}^t$ are the charge and discharge powers of the battery energy storage system at time t; $n_{c,BESS}$ [%] and $n_{d,BESS}$ [%] are the charge and discharge efficiencies of the battery energy storage system.
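A corresponding sketch of the bookkeeping in formula (3); the 0.95 default efficiencies are illustrative placeholders:

```python
def bess_energy_mwh(e_prev_mwh, p_charge_mw, p_discharge_mw,
                    n_c=0.95, n_d=0.95, dt_h=1.0):
    """Formula (3): energy update with charge/discharge efficiencies.
    Powers in MW, energy in MWh; at most one of p_charge_mw and
    p_discharge_mw should be nonzero in a given step."""
    return e_prev_mwh + (n_c * p_charge_mw - p_discharge_mw / n_d) * dt_h
```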
In step 2, the economic optimization model is shown in formula (4):
$\min M_{total} = M_{om} + M_{buy}$ (4)
where $M_{total}$ is the total cost, $M_{om}$ is the operation and maintenance cost, and $M_{buy}$ is the electricity purchase cost.
The constraints for configuring the economic optimization model in step 2 mainly comprise power balance constraints and equipment operation constraints.
The power balance constraint is shown in formula (5):
$P_{PV}^i + P_{WT}^i + P_{d,ESS}^i + P_{buy}^i = P_{load}^i + P_{c,ESS}^i + P_{cur}^i$ (5)
where $P_{PV}^i$, $P_{WT}^i$, $P_{d,ESS}^i$, $P_{c,ESS}^i$, $P_{buy}^i$ and $P_{cur}^i$ are, respectively, the photovoltaic generator set output power, the wind generating set output power, the battery discharge and charge powers, the power purchased from the main grid, and the curtailed power at time i; $P_{load}^i$ is the user electrical load at time i.
The equipment operation constraints are shown in formulas (6)-(8):
Upper and lower limit constraints on battery charge and discharge power:
$P_{d,ESS}^{min} \le P_{d,ESS}^t \le P_{d,ESS}^{max}$ (6)
$P_{c,ESS}^{min} \le P_{c,ESS}^t \le P_{c,ESS}^{max}$ (7)
Upper and lower limit constraint on battery capacity:
$0 \le E_{BESS}^t \le E_{cap,ESS}$ (8)
where $E_{cap,ESS}$ is the rated capacity of the battery.
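As a quick check of the balance in formula (5) (the source/sink arrangement follows the reconstruction above and is an assumption):

```python
def power_balanced(p_pv, p_wt, p_dis, p_buy, p_load, p_chg, p_cur, tol=1e-6):
    """Formula (5): sources (PV, wind, battery discharge, grid purchases)
    must equal sinks (load, battery charging, curtailment) each interval."""
    return abs((p_pv + p_wt + p_dis + p_buy) - (p_load + p_chg + p_cur)) <= tol
```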
deep Reinforcement Learning (DRL) framework design involves four key elements: status, action, rewards and environment. The state represents the context in which the agent is located, the action is an operation that the agent can perform, the reward is immediate feedback, and the environment is the outside world. In the DRL, the agent receives rewards by observing states, selecting actions, and interacting with the environment to learn how to formulate strategies to maximize jackpots.
(1) Design of the state variable S. In the wind-solar-storage coupled carbon capture, utilization and storage system, the state should be selected to best reflect the current operating condition of the system, i.e., the environmental indexes directly related to the actions: the time t, the electrical load demand $P_{load}$, wind power generation $P_{wind}$, photovoltaic power generation $P_{pv}$, the battery state of charge $S_{bat}$, and the current electricity price $C_E$.
The state S can be expressed as formula (9):
$S = [t \;\; P_{load} \;\; P_{wind} \;\; P_{pv} \;\; S_{bat} \;\; C_E]$ (9)
(2) Design of the scheduling variable A. The scheduling variables should be those that directly influence rewards and states, namely the charge/discharge quantity $\Delta P_c$ of the energy storage system at the current time t and the electricity purchased from the grid $P_{buy}$. The charge and discharge quantities of the energy storage system are unified into one incremental variable, with positive values denoting discharging and negative values denoting charging.
$A = [\Delta P_c \;\; P_{buy}]$ (10)
(3) Design of the reward function r. The optimization objective of the agent is to find the economically optimal solution within the feasible domain. The reward is therefore divided into an economic index reward and an out-of-limit penalty, as in formula (11).
$r = -k_{ope} M_{total} - k_{vio} r_{vio}$ (11)
where $k_{ope}$, $k_{vio}$ are the economic and out-of-limit penalty scale factors, respectively.
The capacity constraint of the energy storage system is implemented through $r_{vio}$ as a continuous soft constraint, as shown in formula (12).
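A sketch of the reward of formulas (11)-(12); since the exact form of the soft penalty in formula (12) is not reproduced here, a linear distance-to-bound penalty is assumed for illustration:

```python
def reward(m_total, e_bess, e_min, e_max, k_ope=1.0, k_vio=10.0):
    """Formula (11): negative weighted operating cost minus a weighted
    out-of-limit penalty. r_vio grows continuously with the amount by
    which stored energy leaves its allowed band (assumed linear form)."""
    r_vio = max(0.0, e_min - e_bess) + max(0.0, e_bess - e_max)
    return -k_ope * m_total - k_vio * r_vio
```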
Step 3: the construction of the TD3 training network is shown in FIG. 1.
In fig. 1, the comprehensive energy system state $s_t$ at time step t is collected. These states are mapped through the policy function to the energy system decision variables $a_t$ and, to increase exploration, noise is added to $a_t$. The energy system management agent interacts with the environment to produce the new state $s_{t+1}$ at time t+1 and the economic cost $r_{t+1}$ required by the energy system at the next moment. This information is stored in an experience pool, from which training samples are drawn at random when the networks are trained.
The decision network adopted in the figure selects the scheduling variable according to the state variable, and its policy function is updated through a deterministic policy gradient algorithm:
$\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim P^{\pi}} \left[ \nabla_a Q^{\pi}(s,a) \big|_{a=\pi_{\theta}(s)} \nabla_{\theta} \pi_{\theta}(s) \right]$ (13)
where $Q^{\pi}(s,a) = \mathbb{E}_{s \sim P^{\pi}, a \sim \pi}[R_t \mid s,a]$ is the expected return of the energy system predicted after acting according to policy π in state s, and the gradient is solved for the Actor network.
The evaluation network consists of a Critic network and a Critic target network. According to the Bellman equation, the state value function corresponds to the optimal policy under optimal conditions.
$Q^{\pi}(s,a) = r + \gamma \mathbb{E}_{s'}\left[ Q^{\pi}(s',a') \right], \quad a' \sim \pi(s')$ (14)
where $Q^{\pi}(s',a')$ involves the state and action of the energy system at the next moment.
TD3 introduces two evaluation networks and, by comparing them, selects the more conservative of the two evaluation values:
The target values are
$y_1 = r + \gamma Q_{\theta_1'}(s', a')$ (15)
$y_2 = r + \gamma Q_{\theta_2'}(s', a')$ (16)
where $y_1$, $y_2$ represent target Q values; r is the reward at the current moment; $\theta_1'$, $\theta_2'$ and $\theta_1$, $\theta_2$ are the two evaluation target network parameters and the two evaluation network parameters, respectively.
Integrating formulas (15) and (16) yields the temporal-difference target value:
$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', a')$ (17)
the system first passes through the policy network at state variable s t Obtain the scheduling variable a on the basis of (a) t
Subsequently, the system state s t And a system schedule variable a t As input, through Critic1 network mapping, Q is obtained 1 (s t ,a t )。Q 2 (s t ,a t ). The network parameters are obtained by the double Q network loss function back propagation algorithm of equation (18).
Wherein L is Q S is a loss function j ,a j And respectively making a decision of the energy system in the j-th training and information of the energy system at the current moment.
The deterministic strategy estimation function has the problem of overfitting, so Gaussian noise is added to the target strategy network:
ε=clip(N(0,σ),-c,c) (19)
where ε is the noise, clip is a truncation function, and c is the noise truncation boundary value.
The target networks are updated in a soft update mode: the policy network parameters are gradually blended into the target policy network, and the evaluation network parameters are gradually blended into the target evaluation networks. Introducing a learning rate τ, the update is:
$\theta_i' = \tau \theta_i + (1-\tau) \theta_i'$
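For reference, a compact PyTorch sketch of the TD3 pieces above: the smoothed, clipped double-Q target of formulas (15)-(17) and (19), and the soft update. The network objects and the (state, action) critic interface are assumed for illustration:

```python
import torch

def td3_target(critic1_t, critic2_t, actor_t, r, s_next,
               gamma=0.99, sigma=0.2, c=0.5):
    """Clipped double-Q target: smooth the target action with truncated
    Gaussian noise (formula (19)), evaluate both target critics, and take
    the conservative minimum (formulas (15)-(17))."""
    with torch.no_grad():
        a_next = actor_t(s_next)
        noise = torch.clamp(sigma * torch.randn_like(a_next), -c, c)
        q1 = critic1_t(s_next, a_next + noise)
        q2 = critic2_t(s_next, a_next + noise)
        return r + gamma * torch.min(q1, q2)

def soft_update(net, target_net, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter."""
    for p, p_t in zip(net.parameters(), target_net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```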
Table 1: TD3 training parameter settings
In step 4, the agent is trained on the MATLAB platform. The maximum episode length is set to 24 steps, representing optimization over 24 hours, and the condition for saving the agent is that the average reward reaches $r_{std}$. The final stopping condition is the maximum number of training episodes $E_{max}$. The training process is divided into two stages: in the first stage, the agent is trained to avoid violating the equipment operation constraints and is saved after convergence; in the second stage, the economic index is added on the basis of the trained agent and overall optimization is performed again, yielding the energy system optimal scheduling agent.
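For orientation only, a schematic training loop matching this embodiment (24-step episodes, reward-threshold checkpointing; the two-stage curriculum is left to the caller). The agent/env interface is a hypothetical gym-style one, and this Python sketch stands in for the MATLAB implementation described above:

```python
def train(agent, env, e_max=2000, r_std=0.0, horizon=24, window=20):
    """Run up to e_max episodes of 24 hourly scheduling steps; checkpoint
    the agent once the moving-average episode reward reaches r_std."""
    history = []
    for episode in range(e_max):
        s, ep_reward = env.reset(), 0.0
        for _ in range(horizon):
            a = agent.act(s)                 # policy + exploration noise
            s_next, r, done = env.step(a)    # dispatch one hour
            agent.remember(s, a, r, s_next)  # experience pool
            agent.learn()                    # TD3 update (see sketch above)
            s, ep_reward = s_next, ep_reward + r
            if done:
                break
        history.append(ep_reward)
        if sum(history[-window:]) / min(len(history), window) >= r_std:
            agent.save("td3_ies_agent")      # save condition met
    return history
```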
The foregoing is an exemplary embodiment of the present application, the scope of which is defined by the claims and their equivalents.

Claims (12)

1. The optimized scheduling method of the comprehensive energy system based on the deep reinforcement learning is characterized by comprising the following steps of:
step 1: establishing a comprehensive energy system model, comprising a wind generating set model, a photovoltaic generator set model, a storage battery model and a park electrical load demand model;
step 2: establishing an economic optimization model according to the comprehensive energy system model and defining the system variables and constraints; constructing a deep reinforcement learning training model framework according to the indexes, variables and constraints, namely designing the reinforcement learning state variable S, scheduling variable A and reward function r;
step 3: constructing a TD3 training network structure, and setting the network parameters of its policy network and evaluation networks, the replay buffer size, the discount factor and the soft update rate;
step 4: training the agent through interaction with the comprehensive energy system model, so that the agent learns how to make optimal decisions under different conditions and maximizes the reward function, thereby achieving a stable and economical operation level of the comprehensive energy system.
2. The optimal scheduling method of the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein in the mechanism of establishing the comprehensive energy system model in step 1, each model is established as follows:
the photovoltaic generator set model is shown in formula (1):
$P_{PV} = Y_{PV} f_{PV} \dfrac{G_T}{G_{T,STC}} \left[ 1 + \alpha_P \left( T_C - T_{C,STC} \right) \right]$ (1)
where $P_{PV}$ is the output power of the photovoltaic generator set, in kW; $Y_{PV}$ is the rated capacity of the photovoltaic generator set, in kW, representing the output power under standard test conditions; $f_{PV}$ is the photovoltaic derating factor; $G_T$ is the solar irradiation intensity of the current time step, in kW/m²; $G_{T,STC}$ is the solar irradiation intensity under standard test conditions, in kW/m², usually taken as 1; $\alpha_P$ is the power temperature coefficient of the photovoltaic panel, in %/K; $T_C$ is the photovoltaic cell temperature of the current time step, in K; $T_{C,STC}$ is the photovoltaic cell temperature under standard test conditions, in K;
the fan output power of the wind generating set is estimated from the predicted wind speed and the wind speed-power characteristic curve, as shown in formula (2):
$P_{WT}^t = \begin{cases} 0, & U_{hub} < v_{ci} \text{ or } U_{hub} \ge v_{co} \\ a U_{hub}^3 + b U_{hub}^2 + c U_{hub} + d, & v_{ci} \le U_{hub} < v_r \\ P_r, & v_r \le U_{hub} < v_{co} \end{cases}$ (2)
where $P_{WT}^t$ is the fan output power at time t, in kW; $U_{hub}$ is the predicted wind speed at fan hub height, in m/s; a, b, c, d are fitting coefficients; $v_{ci}$, $v_r$, $v_{co}$ are the cut-in, rated and cut-out wind speeds of the fan, respectively, in m/s; $P_r$ is the rated fan output power, in kW;
the storage battery model is represented by formula (3):
$E_{BESS}^t = E_{BESS}^{t-1} + \left( n_{c,BESS} P_{c,BESS}^t - \dfrac{P_{d,BESS}^t}{n_{d,BESS}} \right) \Delta t$ (3)
where $E_{BESS}^t$ and $E_{BESS}^{t-1}$ are the capacities of the battery energy storage system at times t and t−1, in MWh; $P_{c,BESS}^t$, $P_{d,BESS}^t$ are the charge and discharge powers of the battery energy storage system at time t, in MW; $n_{c,BESS}$, $n_{d,BESS}$ are the charge and discharge efficiencies of the battery energy storage system, in %.
3. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the economic optimization model of the comprehensive energy system is as follows:
$\min M_{total} = M_{om} + M_{buy}$ (4)
where $M_{total}$ is the total cost, $M_{om}$ is the operation and maintenance cost, and $M_{buy}$ is the electricity purchase cost.
4. The optimization scheduling method of the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the constraints for configuring the economic optimization model in the step 2 include a power balance constraint and a device operation constraint:
the power balance constraint is shown in formula (5):
$P_{PV}^i + P_{WT}^i + P_{d,ESS}^i + P_{buy}^i = P_{load}^i + P_{c,ESS}^i + P_{cur}^i$ (5)
where $P_{PV}^i$, $P_{WT}^i$, $P_{d,ESS}^i$, $P_{c,ESS}^i$, $P_{buy}^i$ and $P_{cur}^i$ are, respectively, the photovoltaic generator set output power, the wind generating set output power, the battery discharge and charge powers, the power purchased from the main grid, and the curtailed power at time i, in kW; $P_{load}^i$ is the user electrical load at time i;
upper and lower limit constraints on battery charge and discharge power:
$P_{d,ESS}^{min} \le P_{d,ESS}^t \le P_{d,ESS}^{max}$ (6)
$P_{c,ESS}^{min} \le P_{c,ESS}^t \le P_{c,ESS}^{max}$ (7)
where $P_{d,ESS}^{max}$, $P_{c,ESS}^{max}$ are the maximum battery discharge and charge powers, and $P_{d,ESS}^{min}$, $P_{c,ESS}^{min}$ are the minimum battery discharge and charge powers;
upper and lower limit constraint on battery capacity:
$0 \le E_{BESS}^t \le E_{cap,ESS}$ (8)
where $E_{cap,ESS}$ is the rated capacity of the battery.
5. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the state variable S is designed by:
in the wind-solar-storage coupled carbon capture, utilization and storage system, the state should be selected to reflect the current operating condition of the system, i.e., the environmental indexes directly related to the scheduling variables: the time t, the electrical load demand $P_{load}$, wind power generation $P_{wind}$, photovoltaic power generation $P_{pv}$, the battery state of charge $S_{bat}$, and the current electricity price $C_E$;
the state variable S is represented by formula (9):
$S = [t \;\; P_{load} \;\; P_{wind} \;\; P_{pv} \;\; S_{bat} \;\; C_E]$ (9).
6. the optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the scheduling variable a is selected from the following:
the scheduling variables should be those that directly influence rewards and states, namely the charge/discharge quantity $\Delta P_c$ of the energy storage system at the current time t and the electricity purchased from the power grid $P_{buy}$; the charge and discharge quantities of the energy storage system are unified into one incremental variable, with positive values denoting discharging and negative values denoting charging,
$A = [\Delta P_c \;\; P_{buy}]$ (10).
7. the optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the reward function r is specifically:
the optimization goal of the agent is to find the economically optimal solution within the feasible domain, so the reward is divided into an economic index reward and an out-of-limit penalty, as in formula (11),
$r = -k_{ope} M_{total} - k_{vio} r_{vio}$ (11)
where $k_{ope}$, $k_{vio}$ are the economic and out-of-limit penalty scale factors, respectively.
8. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein besides the hard constraints on battery charge/discharge power and grid purchase quantity, the capacity limit of the energy storage system can be implemented as a continuous soft constraint through $r_{vio}$, as shown in formula (12), where $r_{vio}$ is the penalty for violating the constraint.
9. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the decision network selects the scheduling variable according to the state variable, and its policy function is updated through a deterministic policy gradient algorithm:
$\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim P^{\pi}} \left[ \nabla_a Q^{\pi}(s,a) \big|_{a=\pi_{\theta}(s)} \nabla_{\theta} \pi_{\theta}(s) \right]$ (13)
where $Q^{\pi}(s,a) = \mathbb{E}_{s \sim P^{\pi}, a \sim \pi}[R_t \mid s,a]$ is the expected return of the energy system predicted after acting according to policy π in state s, and the gradient is solved for the Actor network.
10. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the evaluation network evaluates the energy system scheduling value based on the current-time state variable, current-time scheduling variable, current-time running cost and next-time state variable; the evaluation network consists of a Critic network and a Critic target network, and according to the Bellman equation the state value function corresponds to the optimal policy under optimal conditions,
$Q^{\pi}(s,a) = r + \gamma \mathbb{E}_{s'}\left[ Q^{\pi}(s',a') \right], \quad a' \sim \pi(s')$ (14)
where $Q^{\pi}(s',a')$ involves the state and action of the energy system at the next moment.
11. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 1, wherein the TD3 training network structure introduces two evaluation networks and, by comparing the two, selects the more conservative of the two evaluation values:
the target values are
$y_1 = r + \gamma Q_{\theta_1'}(s', a')$ (15)
$y_2 = r + \gamma Q_{\theta_2'}(s', a')$ (16)
where $y_1$, $y_2$ represent target Q values; r is the reward at the current moment; $\theta_1'$, $\theta_2'$ and $\theta_1$, $\theta_2$ are the two evaluation target network parameters and the two evaluation network parameters, respectively.
12. The optimal scheduling method for the comprehensive energy system based on deep reinforcement learning according to claim 11, wherein the temporal-difference target value is obtained by integrating formulas (15) and (16):
$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', a')$ (17)
the system first obtains the scheduling variable $a_t$ through the policy network on the basis of the state variable $s_t$;
subsequently, the system state $s_t$ and scheduling variable $a_t$ are taken as input and mapped through the two Critic networks to obtain $Q_1(s_t,a_t)$ and $Q_2(s_t,a_t)$; the network parameters are obtained through back-propagation of the double-Q network loss function of formula (18),
$L_Q(\theta_i) = \sum_j \left( y_j - Q_{\theta_i}(s_j,a_j) \right)^2$ (18)
where $L_Q$ is the loss function, and $s_j$, $a_j$ are the energy system decision and current-time energy system information of the j-th training sample, respectively;
the deterministic policy estimation function is prone to overfitting, so Gaussian noise is added to the target policy network:
$\epsilon = \mathrm{clip}(N(0,\sigma), -c, c)$ (19)
where ε is the noise, clip is a truncation function, and c is the noise truncation boundary value.
CN202311488353.6A 2023-11-09 2023-11-09 Comprehensive energy system optimal scheduling method based on deep reinforcement learning Pending CN117455183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311488353.6A CN117455183A (en) 2023-11-09 2023-11-09 Comprehensive energy system optimal scheduling method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN117455183A true CN117455183A (en) 2024-01-26

Family

ID=89596463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311488353.6A Pending CN117455183A (en) 2023-11-09 2023-11-09 Comprehensive energy system optimal scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117455183A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537357A (en) * 2018-02-09 2018-09-14 上海电气分布式能源科技有限公司 Photovoltaic power generation quantity loss forecasting method based on derating factor
WO2022160705A1 (en) * 2021-01-26 2022-08-04 中国电力科学研究院有限公司 Method and apparatus for constructing dispatching model of integrated energy system, medium, and electronic device
CN113723749A (en) * 2021-07-20 2021-11-30 中国电力科学研究院有限公司 Multi-park comprehensive energy system coordinated scheduling method and device
CN114091879A (en) * 2021-11-15 2022-02-25 浙江华云电力工程设计咨询有限公司 Multi-park energy scheduling method and system based on deep reinforcement learning
WO2023082697A1 (en) * 2021-11-15 2023-05-19 中国电力科学研究院有限公司 Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program
CN114417695A (en) * 2021-11-30 2022-04-29 国网浙江省电力有限公司台州供电公司 Multi-park comprehensive energy system economic dispatching method
CN114519456A (en) * 2022-01-14 2022-05-20 东南大学 Green agriculture zero-carbon energy supply system and intelligent configuration layered optimization algorithm thereof
CN114462696A (en) * 2022-01-27 2022-05-10 合肥工业大学 Comprehensive energy system source-load cooperative operation optimization method based on TD3
CN115186885A (en) * 2022-06-29 2022-10-14 山东大学 Comprehensive energy system energy optimization scheduling method and system based on reinforcement learning
CN116663820A (en) * 2023-05-19 2023-08-29 合肥工业大学 Comprehensive energy system energy management method under demand response

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XI XIANPENG: "Research on operation optimization of integrated energy systems based on prediction-assisted deep reinforcement learning", Wanfang dissertation, 2 October 2023 (2023-10-02), pages 25-63 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination