CN115759604A - Optimized scheduling method for comprehensive energy system - Google Patents
Optimized scheduling method for comprehensive energy system
- Publication number
- CN115759604A (application CN202211397926.XA)
- Authority
- CN
- China
- Prior art keywords
- algorithm
- scheduling
- energy system
- learning
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an optimized scheduling method for a comprehensive energy system, belonging to the technical field of algorithm-based optimized scheduling, and comprising the following steps: constructing an integrated energy system, and obtaining a scheduling model based on the integrated energy system and a reinforcement learning algorithm, wherein the scheduling model comprises an intelligent agent and an environment; and correcting the Q-value function in the reinforcement learning algorithm based on advantage learning to obtain a comprehensive algorithm, and training the intelligent agent based on the comprehensive algorithm to obtain an optimized scheduling strategy. The invention combines the advantage-learning value-function framework with the SAC algorithm and improves it, realizing optimized dispatching of the comprehensive energy system with low carbon and economy as the targets.
Description
Technical Field
The invention belongs to the technical field of algorithm optimization scheduling, and particularly relates to an optimization scheduling method of a comprehensive energy system.
Background
As a new energy management mode, the comprehensive energy system aims to realize the efficient application of various energy sources by utilizing advanced communication and control technologies, and is beneficial to improving the energy utilization efficiency and the consumption proportion of renewable energy sources.
In the prior art, deep reinforcement learning (DRL) is widely adopted as an effective means of handling sequential decision problems in the optimized scheduling of the integrated energy system. However, policy-gradient-based DRL faces two difficulties in this setting. First, overestimation: the greedy nature of the algorithm can overestimate the Q values of some non-optimal actions, which disturbs the generated scheduling strategy, leads to wrong decisions in new environments, and reduces generalization capability. Second, slow convergence during training: the intelligent agent needs more data samples in a new scenario to improve its scheduling strategy, but samples must be collected anew each time the strategy is improved, so sample utilization is inefficient, the learning efficiency of the intelligent agent is reduced, and the convergence of DRL slows as new training samples are added.
Disclosure of Invention
The invention aims to provide an optimized scheduling method for an integrated energy system, so as to solve the problems of overestimation and low convergence speed during training in the prior art.
In order to achieve the purpose, the invention provides an optimized scheduling method of a comprehensive energy system, which comprises the following steps:
constructing an integrated energy system, and obtaining a scheduling model based on the integrated energy system and a reinforcement learning algorithm, wherein the scheduling model comprises an intelligent agent and an environment;
and correcting a Q value function in the reinforcement learning algorithm based on advantage learning to obtain a comprehensive algorithm, and training the intelligent agent based on the comprehensive algorithm to obtain an optimized scheduling strategy.
Preferably, the integrated energy system operates a plurality of equipment models through grid connection, wherein the equipment models include: the system comprises a hydrogen energy storage model, an electric energy storage model, a combined heat and power generation model, an electric boiler model, a gas boiler model and a heat exchange device model.
Preferably, the process of obtaining the scheduling model comprises:
acquiring constraint balances in the integrated energy system, and constructing a scheduling model through a reinforcement learning algorithm based on the constraint balances, wherein the constraint balances comprise: power grid balance, heat supply network balance and gas network balance.
Preferably, the reinforcement learning algorithm includes: algorithm iteration and algorithm parameter updating.
Preferably, the process of obtaining the synthesis algorithm comprises:
obtaining the Q-value network loss function during algorithm iteration, calculating the descent rate of the loss function, judging whether to start advantage learning based on the descent rate, correcting the Q value function in the reinforcement learning algorithm based on the judgment result, and finally obtaining a comprehensive algorithm.
Preferably, the Q value function is a function between a state parameter and an action parameter in the integrated energy system at the time t.
Preferably, the method further comprises a process of combining the synthesis algorithm with the transfer learning:
obtaining scheduling knowledge based on a comprehensive algorithm, and migrating the scheduling knowledge to a target task; and fine-tuning the scheduling strategy based on the migration result to obtain an optimized scheduling strategy.
Preferably, the process of migrating the scheduling knowledge into the target task includes:
and based on the scheduling knowledge, performing parameter migration on the deep neural network, judging the environment of the target task through a k-means clustering algorithm, and migrating the scheduling knowledge to the target task based on a judgment result.
The invention has the technical effects that:
the method comprises the steps of constructing an integrated energy system, and obtaining a scheduling model based on the integrated energy system and a reinforcement learning algorithm, wherein the scheduling model comprises an intelligent agent and an environment; and correcting a Q value function in the reinforcement learning algorithm based on advantage learning to obtain a comprehensive algorithm, and training the intelligent agent based on the comprehensive algorithm to obtain an optimized scheduling strategy.
The optimized scheduling of the comprehensive energy system is realized by combining the advantage-learning value-function framework with the SAC algorithm and improving it, with low carbon and economy as the targets. The maximum entropy mechanism of SAC makes the optimized scheduling of the comprehensive energy system more robust. After the idea of advantage learning is combined, the overestimation of non-optimal action values by the Q network is reduced, mis-selection of non-optimal actions by the intelligent agent is reduced, and the generalization capability is improved. Meanwhile, a neural network stability judgment is added to the algorithm to decide whether to start advantage learning, preventing advantage learning from interfering with the early parameter iteration of the neural network.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a diagram of an integrated energy system according to an embodiment of the present invention;
FIG. 2 is a diagram of a Markov decision process in an embodiment of the present invention;
FIG. 3 is a flow chart of the ALSAC algorithm in an embodiment of the present invention;
FIG. 4 is an optimized scheduling diagram of the integrated energy system based on the ALSAC algorithm and transfer learning in the embodiment of the present invention;
FIG. 5 is a K-Means cluster map of historical data in an embodiment of the present invention;
FIG. 6 is a graph of wind, solar, power generation and electrical heating load for a test scenario in an embodiment of the present invention;
FIG. 7 is a comparison of the convergence process of the reward function values R in an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a result of optimized scheduling of a heat supply network in an embodiment of the present invention;
FIG. 9 is a diagram illustrating an optimized scheduling result according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating an algorithm training process of scenario 1 in an embodiment of the present invention;
FIG. 11 is a graphical representation of scene 4 wind-solar power generation power and electrical heating load curves in an embodiment of the present invention;
FIG. 12 is a schematic diagram of SAC optimization results with learning migration in an embodiment of the present invention;
FIG. 13 is a schematic diagram of ALSAC optimization results with learning migration in an embodiment of the present invention;
FIG. 14 is a schematic diagram of an ALSAC optimization result without combination with transfer learning in the embodiment of the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Example one
The embodiment provides an optimal scheduling method for an integrated energy system, which includes:
1. Composition of the gas-electricity-heat integrated energy system and its equipment models
The comprehensive energy system scheduling model constructed in the embodiment adopts grid-connected operation, and the given structure is shown in fig. 1.
Wherein the major components include:
1.1 Hydrogen energy storage model
The hydrogen production model adopts proton exchange membrane (solid polymer electrolyte) water-electrolysis equipment to produce hydrogen. The hydrogen production and the hydrogen storage amount of the hydrogen storage tank are:
V_HES(t) = P_HES(t)·η_HES    (1)
V_HSOC(t) = V_HSOC(t-1) + V_HES(t)·η_t - V_HOUT(t)·η_HOUT    (2)
where V_HES(t) is the volume of hydrogen produced by electrolysis in period t; P_HES(t) is the electric power consumed in period t; η_HES, η_t and η_HOUT are the electrolysis efficiency, the hydrogen storage efficiency and the output efficiency of the hydrogen storage tank; V_HSOC(t) is the hydrogen storage amount of the hydrogen storage tank in period t; V_HOUT(t) is the volume of hydrogen output by the hydrogen storage tank in period t. The hydrogen output of the electrolytic cell satisfies the constraint:
V_HES,min < V_HES(t) < V_HES,max    (3)
where V_HES,max and V_HES,min are respectively the upper and lower limits of the hydrogen production of the electrolytic cell in period t.
The energy storage state of the hydrogen storage tank is represented by the ratio of its current storage amount to its maximum storage capacity:
SOC_h(t) = V_HSOC(t) / V_h,max    (4)
where SOC_h(t) is the energy storage state of the hydrogen storage tank and V_h,max is the maximum hydrogen storage amount. The hydrogen energy storage tank satisfies the constraints:
SOC_h,min < SOC_h(t) < SOC_h,max    (5)
V_HOUT,min < V_HOUT(t) < V_HOUT,max    (6)
where SOC_h,max and SOC_h,min are the upper and lower limits of the hydrogen storage state, and V_HOUT,max and V_HOUT,min are the upper and lower limits of the hydrogen output of the storage in period t.
The volume of hydrogen V_HOUT(t) output by the hydrogen storage tank in period t is used for the daily industrial hydrogen demand and for hydrogen blending into the natural gas pipeline:
V_HOUT(t) = V_HDE(t) + V_H,in(t)    (7)
where V_HDE(t) is the industrial hydrogen demand volume in period t and V_H,in(t) is the volume of hydrogen blended into the natural gas pipeline in period t.
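To make the bookkeeping of equations (1)-(7) concrete, the following is a minimal Python sketch of the hydrogen energy storage model; the parameter values and function names are illustrative assumptions, not values from the embodiment.

```python
# Illustrative sketch of the hydrogen energy storage model, equations (1)-(7).
# All efficiencies and limits below are placeholder assumptions.

ETA_HES, ETA_T, ETA_HOUT = 0.75, 0.95, 0.95   # electrolysis / storage / output efficiency
V_HES_MIN, V_HES_MAX = 0.0, 50.0              # electrolyser production limits per period (m^3)
V_H_MAX = 400.0                               # maximum hydrogen storage of the tank (m^3)
SOC_H_MIN, SOC_H_MAX = 0.1, 0.9               # storage-state limits
V_HOUT_MIN, V_HOUT_MAX = 0.0, 60.0            # tank output limits per period (m^3)

def hydrogen_storage_step(v_hsoc_prev, p_hes, v_hout, v_hde):
    """One period of the hydrogen storage model.

    v_hsoc_prev: stored hydrogen at t-1 (m^3)
    p_hes: electric power consumed by electrolysis in period t
    v_hout: hydrogen volume drawn from the tank in period t
    v_hde: industrial hydrogen demand in period t
    """
    # Equation (1): hydrogen produced by electrolysis, clipped to the limits of equation (3).
    v_hes = min(max(p_hes * ETA_HES, V_HES_MIN), V_HES_MAX)
    # Equation (6): output limit of the tank.
    v_hout = min(max(v_hout, V_HOUT_MIN), V_HOUT_MAX)
    # Equation (2): storage balance of the tank.
    v_hsoc = v_hsoc_prev + v_hes * ETA_T - v_hout * ETA_HOUT
    # Equations (4)-(5): storage state and its limits.
    soc_h = v_hsoc / V_H_MAX
    soc_violated = not (SOC_H_MIN < soc_h < SOC_H_MAX)
    # Equation (7): the drawn hydrogen serves industrial demand and pipeline blending.
    v_h_in = v_hout - v_hde
    return v_hsoc, soc_h, v_h_in, soc_violated
```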
1.2 Electric energy storage model
The electrical energy storage model of this embodiment consists of a storage battery, whose state of charge is given by equation (8), where SOC_e(t) is the state of charge of the battery at time t; P_soc,in(t) and P_soc,out(t) are the charging and discharging power of the battery in period t; W_soc is the maximum capacity of the battery; η_e is the charge-discharge efficiency; and Δt is the time interval. To prolong the service life of the battery, the constraints are:
SOC_e,min < SOC_e(t) < SOC_e,max    (9)
0 < P_soc,in(t) < P_soc,inmax    (10)
0 < P_soc,out(t) < P_soc,outmax    (11)
where SOC_e,max and SOC_e,min are the upper and lower limits of the energy storage state of charge, and P_soc,inmax and P_soc,outmax are the maximum charging and discharging power of the energy storage.
1.3 Cogeneration model
The cogeneration unit includes a gas turbine and a waste heat recovery boiler. The gas turbine generates electric energy by consuming natural gas and at the same time produces flue gas carrying heat energy that can output thermal power. The generated power of the gas turbine is:
P_GT(t) = V_GT(t)·q_NG·η_GT    (12)
where P_GT(t) is the power generated by the gas turbine in period t; V_GT(t) is the natural gas consumed by the cogeneration unit in period t; q_NG is the lower heating value of natural gas; η_GT is the power generation efficiency of the gas turbine. The generated power of the gas turbine satisfies the constraint:
P_GT,min < P_GT(t) < P_GT,max    (13)
where P_GT,max and P_GT,min are the upper and lower limits of the generated power of the gas turbine in period t. The thermal power generated by the gas turbine is:
Q_GT(t) = V_GT(t)·q_NG·(1 - η_GT)    (14)
where Q_GT(t) is the thermal power output by the gas turbine in period t.
The thermal power of the gas turbine satisfies the constraint:
Q_GT,min < Q_GT(t) < Q_GT,max    (15)
where Q_GT,max and Q_GT,min are respectively the upper and lower limits of the thermal power output of the gas turbine in period t.
The waste heat recovery boiler collects heat in the flue gas discharged by the gas turbine and supplies the heat to the heat supply network. The output thermal power is as follows:
Q_HRSG(t) = Q_GT(t)·η_HRSG    (16)
where Q_HRSG(t) is the thermal power output by the waste heat recovery boiler in period t; Q_GT(t) is the thermal power output by the gas turbine in period t; η_HRSG is the heat-exchange efficiency of the waste heat boiler. The heat output power of the waste heat recovery boiler is bounded as:
Q_HRSG,min < Q_HRSG(t) < Q_HRSG,max    (17)
where Q_HRSG,max and Q_HRSG,min are respectively the upper and lower limits of the output power of the waste heat recovery boiler in period t.
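The coupling between the electric and thermal outputs of the cogeneration unit in equations (12)-(17) can be sketched as follows; the efficiency and heating-value parameters are illustrative assumptions.

```python
# Illustrative sketch of the cogeneration (gas turbine + waste heat recovery boiler) model,
# equations (12)-(17). Parameter values are placeholders, not the embodiment's data.

Q_NG = 9.7          # lower heating value of natural gas (kWh/m^3), assumed
ETA_GT = 0.35       # gas turbine power generation efficiency, assumed
ETA_HRSG = 0.8      # waste heat recovery boiler heat-exchange efficiency, assumed

def chp_output(v_gt):
    """Electric and thermal power of the CHP unit for a gas consumption v_gt (m^3 in period t)."""
    p_gt = v_gt * Q_NG * ETA_GT              # equation (12): generated electric power
    q_gt = v_gt * Q_NG * (1.0 - ETA_GT)      # equation (14): thermal power in the flue gas
    q_hrsg = q_gt * ETA_HRSG                 # equation (16): heat delivered to the heat network
    return p_gt, q_gt, q_hrsg

# The limits of equations (13), (15) and (17) would be enforced by clipping v_gt so that
# P_GT,min < p_gt < P_GT,max, and analogously for q_gt and q_hrsg.
```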
1.4 Electric boiler model
The electric boiler converts electric energy, which can come from clean energy, into heat energy without burning natural gas, thereby greatly reducing carbon emissions and improving clean energy consumption. Its heat output is:
Q_EB(t) = P_EB(t)·η_EB    (18)
where P_EB(t) and Q_EB(t) are respectively the electric power consumed and the heating power output by the electric boiler in period t; η_EB is the electricity-to-heat conversion efficiency of the electric boiler. The thermal power of the electric boiler satisfies the constraint:
Q_EB,min < Q_EB(t) < Q_EB,max    (19)
where Q_EB,max and Q_EB,min are respectively the upper and lower limits of the output power of the electric boiler in period t.
1.5 Gas boiler model
The gas boiler is the equipment in the integrated energy system that generates heat energy from natural gas, and its thermal power output is:
Q_SB(t) = V_SB(t)·q_NG·η_SB    (20)
where Q_SB(t) is the thermal power output by the gas boiler in period t; V_SB(t) is the natural gas consumption of the gas boiler in period t; η_SB is the efficiency of the gas boiler. Q_SB(t) satisfies the constraint:
Q_SB,min < Q_SB(t) < Q_SB,max    (21)
where Q_SB,max and Q_SB,min are respectively the upper and lower limits of the output power of the gas boiler in period t.
1.6 Heat exchanger model
The heat exchange device can convert heat energy transmitted by the waste heat recovery boiler, the electric boiler and the gas boiler to supply heat load requirements, and the formula of the output heat power is as follows:
Q_HE(t) = Q_HE,in(t)·η_HE    (22)
where Q_HE(t) is the thermal power output by the heat exchange device in period t; Q_HE,in(t) is the heat power input from the heat supply network in period t; η_HE is the heat energy conversion efficiency. The output thermal power of the heat exchange device satisfies the constraint:
Q_HE,min < Q_HE(t) < Q_HE,max    (23)
where Q_HE,max and Q_HE,min are respectively the upper and lower limits of the output power of the heat exchange device in period t.
1.7 constraints
According to the energy structure composition of the comprehensive energy system, the constraint balance is as follows:
Power grid balance equation:
L_E(t) + P_HES(t) + P_soc,in(t) + P_EB(t) = P_GT(t) + P_soc,out(t) + P_G(t) + P_solar(t) + P_wind(t)    (24)
where P_G(t), P_solar(t) and P_wind(t) are respectively the electric power flowing from the grid into the integrated energy system (P_G(t) is negative when power generated by the integrated energy system flows into the grid), the photovoltaic generation power, and the wind turbine generation power; L_E(t) is the electric load power.
Heat supply network equilibrium equation:
Q_HRSG(t) + Q_EB(t) + Q_SB(t) = L_Q(t)/η_HE    (25)
where L_Q(t) is the heat load power.
Air network balance equation:
V_HOUT(t) + V_GT(t) + V_SB(t) + V_RES(t) = V_H(t)    (26)
where V_RES(t) is the gas consumption of residents in period t and V_H(t) is the natural gas output in period t. To meet the energy requirement of full-load operation of the gas units, the output of the natural gas pipeline in period t, V_H(t), is limited as:
0 < V_H(t) < V_H,max    (27)
where V_H,max is the upper limit of the gas output of the natural gas pipeline in period t.
According to the experience of existing international projects, the volume fraction of hydrogen blended into natural gas can reach up to 20%. Taking the thermal efficiency of the fuel gas into account, 12T-0 is used as the blending reference base gas and a 5% hydrogen blending proportion is selected; at this proportion the Wobbe index and calorific value of the mixed gas are better than at other proportions, and the gas quality meets the technical index of the national standard GB17820-2012 that the higher heating value of Class I natural gas is not less than 36.0 MJ/m³. In this embodiment, the constraint on the total amount of hydrogen delivered from the hydrogen storage tank to the natural gas pipeline in period t is:
0 < V_H,in(t) < 5.26%·V_H,max    (28)
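The three balance equations (24)-(26) and the hydrogen blending limit (28) can be checked with a short helper such as the sketch below; the variable names and tolerance are illustrative assumptions.

```python
# Illustrative feasibility check of the grid, heat network and gas network balances,
# equations (24)-(28). `s` is assumed to be a dict of the period-t quantities named
# after the symbols in the text; the tolerance is an assumption.

def balances_satisfied(s, v_h_max, tol=1e-6):
    # Equation (24): electric power balance.
    electric_ok = abs(
        (s["L_E"] + s["P_HES"] + s["P_soc_in"] + s["P_EB"])
        - (s["P_GT"] + s["P_soc_out"] + s["P_G"] + s["P_solar"] + s["P_wind"])
    ) < tol
    # Equation (25): heat network balance.
    heat_ok = abs((s["Q_HRSG"] + s["Q_EB"] + s["Q_SB"]) - s["L_Q"] / s["eta_HE"]) < tol
    # Equation (26): gas network balance.
    gas_ok = abs((s["V_HOUT"] + s["V_GT"] + s["V_SB"] + s["V_RES"]) - s["V_H"]) < tol
    # Equations (27)-(28): pipeline output limit and 5.26% hydrogen blending limit.
    pipeline_ok = 0.0 < s["V_H"] < v_h_max
    blending_ok = 0.0 < s["V_H_in"] < 0.0526 * v_h_max
    return electric_ok and heat_ok and gas_ok and pipeline_ok and blending_ok
```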
2. SAC algorithm principle
2.1 Reinforcement learning
Reinforcement learning is based on the Markov decision process (MDP): based on the current environment information, the intelligent agent takes the next action on the environment and obtains a reward, and through continuous trial and error the agent learns to obtain the maximum reward.
As shown in FIG. 2, the agent is a controller based on some control algorithm. The model of the Markov decision process is generally represented as a tuple (S, A, P, R), where S is the state set, A is the action set, P is the state transition probability, and R is the reward and punishment function.
2.2 SAC algorithm
When the problem to be solved has an unknown model and diverse environment information, the dimensionality of the state space becomes too high and plain reinforcement learning cannot be applied. To enable reinforcement learning to handle such high-dimensional problems, deep learning (DL) is introduced, and the two are combined into deep reinforcement learning (DRL).
The SAC algorithm is a reinforcement learning algorithm proposed by Haarnoja et al. Compared with other policy-gradient-based DRL algorithms such as PPO, the multithreaded actor-critic algorithm (A3C) and DDPG, it introduces a mechanism that encourages maximum action entropy, which improves the robustness of the algorithm and allows better scheduling strategies to be explored in complex power environments.
2.2.1 SAC maximum entropy
Entropy, defined as the expectation of the amount of information, is a measure of the uncertainty of a random variable and increases as the uncertainty of events increases; its definition is given by equation (29), where H(X) is the entropy and P(x_i) is the probability of event x_i. A good DRL agent should explore the environment as fully as possible to obtain an optimal strategy, rather than acting greedily to maximize a single reward and becoming trapped in a local optimum. When one action is selected repeatedly, the entropy decreases; with the maximum entropy mechanism, SAC can also select other actions, which enlarges the exploration range, allows more scheduling strategies and their accompanying probabilities to be explored in a given environment state, and increases the robustness of the system.
In SAC, the reward value and the policy entropy are both added to the objective, and the policy π is required to improve the final reward while maximizing the entropy. Accordingly, the objective function J(π) is constructed as in equation (30), where E[·] is the expectation, π is the policy, s_t and a_t are the state and action of the integrated energy system at time t, R(s_t, a_t) is the reward at time t, (s_q, a_q) ~ P_π is a state-action trajectory under policy π, and α is the entropy temperature term, which determines how strongly the entropy influences the reward. H(π(·|s_t)) is the policy entropy in state s_t; with reference to equation (29), its expression is given by equation (31), where π(a_t|s_t) is the probability of action a_t in state s_t.
2.2.2 SAC iterative approach
The policy evaluation of SAC uses the soft value function Q(s_t, a_t) shown in equation (32); the policy evaluation update is expressed through the Bellman operator as in equation (33), where T^π is the Bellman operator under policy π, γ is the reward discount factor, and V(s_{t+1}) is the soft state value of state s_{t+1}, whose calculation is given by equation (34). Combining with the Bellman operator gives
Q_{k+1} = T^π Q_k    (35)
where Q_k is the value function at the k-th iteration. Soft policy evaluation can be iterated through equation (35), and Q finally converges to the soft Q function under the fixed policy π.
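A minimal sketch of the soft policy evaluation of equations (33)-(35), written against the standard SAC formulation (PyTorch is used; the policy and twin target Q-network interfaces are illustrative assumptions):

```python
# Illustrative soft Bellman backup used in SAC policy evaluation, equations (33)-(35).
# The policy.sample() interface and the two target Q networks are assumed to exist elsewhere.
import torch

def soft_q_target(reward, next_state, done, policy, q1_target, q2_target, alpha, gamma=0.99):
    """Target y = r + gamma * V(s_{t+1}), with the soft state value
    V(s) = E_a[ min(Q1, Q2) - alpha * log pi(a|s) ] estimated from one sampled action."""
    with torch.no_grad():
        next_action, next_log_prob = policy.sample(next_state)   # a ~ pi(.|s_{t+1})
        q_next = torch.min(q1_target(next_state, next_action),
                           q2_target(next_state, next_action))
        v_next = q_next - alpha * next_log_prob                  # soft state value, cf. eq. (34)
        return reward + gamma * (1.0 - done) * v_next            # Bellman backup, cf. eq. (33)
```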
2.2.3 SAC policy update
The policy is output as a Gaussian distribution, and the difference between the new and old policy distributions is minimized by minimizing the KL divergence, as in equation (36), where D_KL is the KL (Kullback-Leibler) divergence, Π is the policy set, Q^{π_old} is the value function of the old policy π_old, and Z^{π_old} is the distribution (normalization) function that normalizes the Q values under the old policy.
2.2.4 SAC parameter update
The SAC algorithm is an actor-critic type algorithm, in which the actor models the policy and the critic models the Q-value function. Two neural networks are used to fit the Q-value function and the policy function respectively; the parameter update of the Q-value network is given by equation (37), and the parameter update of the policy network by equation (38), where θ and φ are the parameters of the Q-value network and the policy network respectively, and Q_θ is the fitted value function.
The policy network also outputs the action entropy; the update of the temperature parameter α, which is crucial to the entropy, is given by equation (39).
the SAC of the present embodiment selects a linear correction function (ReLU) for neuron activation function
f(x)=max(0,x) (40)
The output layer is selected to have a tanh function in the range of-1, 1]. To facilitate scheduling, action a t The values are assigned to [0,1 ]]。
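The policy network described above (Gaussian output, ReLU hidden activations, tanh-squashed output rescaled to [0, 1]) can be sketched roughly as follows; the layer sizes and log-std clamping range are assumptions.

```python
# Illustrative SAC policy network: ReLU hidden layers, Gaussian head, tanh squashing
# and rescaling of actions to [0, 1]. Layer widths and clamp range are assumptions.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def sample(self, state):
        h = self.body(state)
        mean, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mean, log_std.exp())
        u = dist.rsample()                               # reparameterized Gaussian sample
        a_tanh = torch.tanh(u)                           # squash to [-1, 1]
        # log-probability correction for the tanh squashing
        log_prob = dist.log_prob(u) - torch.log(1 - a_tanh.pow(2) + 1e-6)
        action = 0.5 * (a_tanh + 1.0)                    # rescale from [-1, 1] to [0, 1]
        return action, log_prob.sum(-1, keepdim=True)
```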
3. SAC-based multi-energy system optimized scheduling scheme
3.1 State space
In the multi-energy system environment of this embodiment, the information given to the intelligent agent by the environment generally includes: wind power, photovoltaic power, the time-of-use electricity price of the main grid, the time-of-use electricity price of the microgrid, the electric load, the heat load, the electric energy storage state, the hydrogen energy storage state, and the time.
The state space is then:
S(t) = [L_E(t), L_Q(t), P_solar(t), P_wind(t), Γ_PG(t), Γ_DG(t), SOC_h(t), SOC_e(t), t]    (41)
where Γ_PG(t) is the time-of-use electricity price of the power grid in period t and Γ_DG(t) is the time-of-use electricity price of the integrated energy system in period t.
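As a small illustration of equation (41), the state vector handed to the intelligent agent can be assembled as below; the environment field names are illustrative assumptions.

```python
# Illustrative construction of the state vector S(t) of equation (41).
import numpy as np

def build_state(env, t):
    """env is assumed to expose the period-t measurements named after the symbols in the text."""
    return np.array([
        env.L_E[t],         # electric load
        env.L_Q[t],         # heat load
        env.P_solar[t],     # photovoltaic generation power
        env.P_wind[t],      # wind generation power
        env.price_grid[t],  # Γ_PG(t): main-grid time-of-use price
        env.price_ies[t],   # Γ_DG(t): integrated-energy-system time-of-use price
        env.soc_h,          # hydrogen storage state
        env.soc_e,          # electric storage state of charge
        t,                  # time index
    ], dtype=np.float32)
```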
3.2 Action space
After the intelligent agent obtains state information from the environment, it selects an action in the action space according to its own policy π. The power equipment models in the integrated energy system are relatively complex and there are many types of energy storage and energy conversion equipment. To simplify the action space, the actions of the two energy storage devices are condensed into two actions ACT_1 and ACT_2. As can be seen from equations (12) and (14), the electric power and heat of the cogeneration unit are coupled, and the output power of the gas boiler can be obtained from the heat supply network balance equation (25); therefore, the actions of the energy conversion equipment are the output powers of the electric boiler and the cogeneration unit. The action space is:
A(t) = [P_GT(t), P_EB(t), ACT_1, ACT_2]    (42)
where ACT_1 and ACT_2 are the energy storage actions: when renewable energy is in surplus, charging of the energy storage and hydrogen production by water electrolysis are satisfied first; when renewable energy is insufficient, the electricity price is compared to decide whether to start energy storage discharging.
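A rough sketch of how the action vector A(t) of equation (42) might be decoded into device set-points, following the charging/discharging priority described above; the limits, price threshold and scaling are assumptions.

```python
# Illustrative decoding of A(t) = [P_GT, P_EB, ACT_1, ACT_2], equation (42).
# Device limits and the price threshold are placeholder assumptions.

P_GT_MAX, P_EB_MAX = 800.0, 300.0       # kW, assumed device limits
P_CH_MAX, P_HES_MAX = 200.0, 150.0      # kW, assumed storage / electrolysis limits

def decode_action(action, renewable_surplus, grid_price, cheap_price=0.35):
    """action entries are in [0, 1] as produced by the tanh-rescaled policy."""
    a_gt, a_eb, act1, act2 = action
    p_gt = a_gt * P_GT_MAX               # cogeneration electric set-point
    p_eb = a_eb * P_EB_MAX               # electric boiler set-point
    if renewable_surplus > 0:
        # surplus renewables: charge the battery first, electrolyse the rest into hydrogen
        p_charge = min(act1 * P_CH_MAX, renewable_surplus)
        p_hes = min(act2 * P_HES_MAX, renewable_surplus - p_charge)
        p_discharge = 0.0
    else:
        # deficit: discharge storage only when grid electricity is not cheap
        p_charge, p_hes = 0.0, 0.0
        p_discharge = act1 * P_CH_MAX if grid_price > cheap_price else 0.0
    return p_gt, p_eb, p_charge, p_discharge, p_hes
```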
3.3 Reward function
The reward function is a quantification of the objective task that guides the intelligent agent to optimize toward the objective. The reward function of the integrated energy system of this embodiment is mainly derived from the operating cost, the energy sales revenue, the carbon emission, and the strategy reward and punishment constants. The operating cost comes from the electricity purchase cost, the gas purchase cost and the maintenance cost of the integrated energy system; the energy sales revenue comes from selling the electric energy, heat energy and hydrogen energy of the integrated energy system. Considering that the integrated energy system is small in scale, the network loss cost of the heat-electricity-gas networks and the start-stop cost of the equipment can be ignored. The operating cost C_1(t) in period t is:
C_1(t) = C_e(t) + C_f(t) + C_ME(t)    (43)
where C_e(t) is the electricity purchase cost from the power grid in period t, C_f(t) is the gas purchase cost in period t, and C_ME(t) is the maintenance cost in period t. The electricity purchase cost C_e(t) in period t is defined by equation (44), where P_G(t) is the purchased power in period t and Δt is the time interval. The cost of purchasing natural gas is:
C_f(t) = c_f·(V_GT(t) + V_SB(t))    (45)
where c_f is the natural gas price, and V_GT(t) and V_SB(t) are the amounts of gas consumed by the cogeneration unit and the gas boiler in period t. The maintenance cost is given by equation (46), where C_ME(t) is the maintenance cost in period t, c_mi is the maintenance cost coefficient of the i-th unit, and P_i(t) is the output power of unit i in period t. The energy sales revenue, given by equation (47), comprises the income from selling the electric energy, heat energy, and surplus energy of the electric and hydrogen storage of the integrated energy system, where C_2(t) is the energy sales revenue of the integrated energy system in period t, L_E(t) and L_Q(t) are the electric and heat load power consumed in the integrated energy system in period t, and Γ_Q(t) and Γ_h(t) are the heat and hydrogen prices in period t.
Carbon emissions come from the combustion of natural gas and from the coal-based electricity purchased from the main grid. According to the national "dual-carbon" construction target, the proportion of power generated from new energy such as wind and solar is estimated to reach 65% of total generation by 2060. In this embodiment, 1 kWh of purchased electricity emits 0.45 kg of CO2, and 1 m³ of natural gas produces 1.9 kg of CO2. The carbon emission is given by equation (48), where V_GT(t) and V_SB(t) are the amounts of natural gas used by the cogeneration unit and the gas boiler in period t. The strategy reward and punishment constants reduce the number of out-of-limit actions during exploration, increase the number of correct actions, and accelerate algorithm convergence. In period t, penalty constants D_1(t) and D_2(t) are given when the natural gas supply exceeds the gas network pipeline limits and when the heat or electricity buses are unbalanced, and a reward constant D_3(t) is given for actions that reduce carbon emissions and increase profit. The reward and punishment constant C_4(t) in period t is:
C_4(t) = D_1(t) + D_2(t) + D_3(t)    (49)
The optimized scheduling of this embodiment targets economy and carbon emission, and the reward function for period t is obtained from:
R(t) = α·(C_2(t) - C_1(t)) - (1 - α)·C_3(t) + C_4(t)    (50)
Because reinforcement learning randomly explores other actions during training, which causes large fluctuations of R, the reward value R(t) is scaled down proportionally, and a moving average is applied to smooth the R-value curve, which makes it easier to observe the convergence of the algorithm.
3.4 Objective function
Combining the reward function, the objective function C of the integrated energy system is obtained as in equation (51).
4. Method of combining the SAC algorithm with advantage learning
In the learning process of DRL, the Q value fitted by the Q-value neural network is not the true value but an estimate of it. Because DRL selects only the action with the maximum Q value in the current state, the Q value of a non-optimal action may be overestimated, causing DRL to select an action that is not optimal in that state and affecting the final result of the algorithm.
In 1999, Baird proposed the idea of advantage learning, which lowers the Q values of non-optimal actions in Q-learning, thereby widening the gap from the Q value of the optimal action, reducing the overestimation of non-optimal action values, and reducing the probability that the intelligent agent mis-selects actions. The state value function of advantage learning is defined as in equation (52), where A*(s, a) is the advantage function under state s and action a, defined as follows:
A*(s, a) = V*(s) - α·(V*(s) - Q*(s, a))    (53)
where α·(V*(s) - Q*(s, a)) is a correction term: it is zero when Q*(s, a) is the Q value of the optimal action and negative when it is the Q value of a non-optimal action, which widens the gap between the Q values of the optimal and non-optimal actions.
To add advantage learning into DRL that uses deep neural networks, the correction function is changed and the property that the SAC algorithm can quickly obtain a good policy is exploited: the current state is input into the policy network, the output action is taken as the optimal action and fed into the Q-value network, and the output Q value is taken as the current optimal state value Q(s_t, a_{t+1}; θ⁻). The correction term is:
F(s_t, a_t) = α·(Q(s_t, a_{t+1}; θ⁻) - Q(s_t, a_t; θ⁻))    (54)
However, this method neglects the fact that the Q-value network estimates action values inaccurately at the initial stage of training; if the Q value of a non-optimal action is then larger than that of the optimal action, widening the Q-value gap would interfere with the iterative convergence of the algorithm. To solve this problem, the present embodiment uses the descent rate of the Q-value network loss function to determine whether the Q-value network is ready to start advantage learning; the descent rate is determined by equation (55).
In equation (55), Loss(t) is the average of the two Q-value network loss function values at time t, and k is the descent rate of the loss value. When the descent rate reaches the specified threshold, the neural network has passed the early period of unstable parameter updates and advantage learning is started: when F(s_t, a_t) > 0, the value of F(s_t, a_t) is kept unchanged; when F(s_t, a_t) < 0, the value of F(s_t, a_t) is set to 0.
The target Q value of the value-function estimation part of ALSAC is given by equation (56).
The flow of combining advantage learning with the SAC algorithm is shown in FIG. 3.
5. Method of combining the ALSAC algorithm with transfer learning
Compared with other DRL algorithms, ALSAC is more robust and can ensure, to the greatest possible extent, that the integrated energy system operates stably in a new environment; however, because the historical data used to train the intelligent agent is incomplete, it is difficult to produce an optimal scheduling strategy for a new scene. To obtain an optimal scheduling strategy, accelerate training, and make full use of existing scheduling knowledge, parameter migration from transfer learning is introduced, and an integrated energy optimized scheduling method based on ALSAC and transfer learning is proposed; the process is shown in FIG. 4.
In the practical application of this embodiment, the existing historical data is used to train the intelligent agent with low carbon and economy as the objectives, and the weights of the ALSAC neural networks are stored after training; these weights constitute the accumulated scheduling knowledge. The similarity between the environment of the target task and the historical environments of the source task is compared through K-Means; if the similarity is too low, the environment of the target task is regarded as a new environment. When a new environment is encountered, the existing scheduling knowledge is migrated to the target task, and the parameters of the deep neural network are fine-tuned by the ALSAC algorithm to obtain the optimal scheduling strategy.
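The parameter-migration workflow described above can be outlined roughly as follows; the similarity measure, threshold and the agent's load/train interface are illustrative assumptions (scikit-learn's KMeans is used for the clustering step).

```python
# Illustrative sketch of the ALSAC + transfer learning workflow: K-Means decides whether the
# target-task environment is "new"; if so, stored network weights are migrated and fine-tuned.
import numpy as np
from sklearn.cluster import KMeans

def fit_source_clusters(source_profiles, n_clusters=4, seed=0):
    """source_profiles: array (n_days, n_features) of historical wind/solar/load curves."""
    return KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(source_profiles)

def is_new_environment(kmeans, target_profile, distance_threshold=1.5):
    """Treat the target day as a new environment if it is far from every source cluster center."""
    dists = np.linalg.norm(kmeans.cluster_centers_ - target_profile, axis=1)
    return dists.min() > distance_threshold

def transfer_and_finetune(agent, stored_weights, target_env, finetune_steps=2000):
    """Migrate the stored ALSAC weights (the accumulated scheduling knowledge) and fine-tune."""
    agent.load_weights(stored_weights)              # parameter migration (assumed interface)
    agent.train(target_env, steps=finetune_steps)   # fine-tune on the target task with ALSAC
    return agent
```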
6. Example analysis
6.1 Integrated energy System simulation parameter settings
The prices of various types of energy, the time-of-use electricity prices, and the equipment parameters of the present embodiment are shown in tables 1, 2, and 3.
TABLE 1
TABLE 2
TABLE 3
6.2 Training and test scenario selection
The simulation data of this embodiment adopts 11 months of historical data, from 2020 to 2021, of the European company Elia Group. The data set totals 358 records; each record has 96 time periods of 15 min, and the data of each period comprise the wind power generation, photovoltaic generation, electric load and heat load power of that period.
The historical data are divided into four categories by K-Means, shown as diamonds, upright triangles, inverted triangles and five-pointed stars in FIG. 5. To prevent data leakage, the diamond class is selected, and the 12 data points closest to the diamond cluster center (solid circles) are used as training samples to optimize the scheduling strategy. To verify that the ALSAC-based optimized scheduling strategy of the integrated energy system can improve the robustness of the system, points 143, 192 and 172, which have relatively low similarity to the training set, are randomly selected as scenes 1, 2 and 3 (all shown as solid markers). The data of the three scenes are shown in FIG. 6 at hours 0-24, 24-48 and 48-72. The three scenes have distinct output characteristics, namely weather with much wind and much sun, little wind and little sun, and little wind and much sun. The performance of the proposed strategy is tested with these 3 consecutive extreme and variable scenes as test scenarios.
6.3 Algorithm reward function value comparison
The convergence processes of the reward function value R of the different algorithms are compared below.
The integrated energy system simulation experiment of this embodiment is built with Python 3.7.3; the computer is configured with an i5-1135G7 processor and an Iris Xe Max graphics card, and the algorithm hyperparameters are shown in Table 4.
The ALSAC, SAC, PPO, A3C and DDPG algorithms are trained, and the convergence process of the reward value R is shown in FIG. 7.
As can be seen from FIG. 7, among the five algorithms, ALSAC reaches the highest reward function value R and has the best optimization effect. In a complex scenario, different action-exploration modes lead to different optimal strategies being found. DDPG explores actions with OU noise, a method with many hyperparameters, and the noise variance of the actions must be added manually. PPO adds a variable penalty term within the trust region of action selection to change the action selection, turning the manually added noise variance of DDPG into a trainable parameter. A3C explores the environment with multiple threads and updates the policy asynchronously. The action-update parameters of PPO and SAC can be changed through training, fewer hyperparameters are used, and the algorithms are more robust; however, as an off-policy method, SAC uses samples more efficiently than the on-policy PPO and can learn a better scheduling strategy from small sample data. After advantage learning is added to SAC, the overestimation of non-optimal action values by the Q-value network is reduced, mis-selection of non-optimal actions by the intelligent agent is reduced, and both the convergence speed and the final optimization effect of the algorithm are improved.
TABLE 4
6.4 Optimized scheduling results
The scheduling devices of the integrated energy system simulation model of this embodiment are the electric boiler, the cogeneration unit, the electric energy storage and the hydrogen energy storage, and the ALSAC, SAC, PPO, DDPG and A3C algorithms are used to perform robustness tests of optimized scheduling in scenes 1, 2 and 3. Since the dispatch of all scheduling equipment in the heat supply network can be inspected, the tests in this section refer to the scheduling of the heat supply network equipment.
As can be seen from the ALSAC scheduling optimization in FIG. 8:
1) During the flat and peak electricity price periods, when the total wind and solar generation is large, the electric energy storage starts charging once the electric load is met, water electrolysis starts producing hydrogen and the hydrogen storage level rises, the electric boiler starts operating, and the cogeneration unit acts because both the heat load and the electric load are at their peaks. When the total wind and solar generation is small and the electricity and heat loads cannot be met, the electric energy storage starts to output power; after three hours of discharging, the hydrogen energy storage starts to output, converting hydrogen into electric and heat energy and reducing carbon emissions, so the two energy storages play a peak-shaving and valley-filling role. The electric boiler does not operate because the total wind and solar generation is too small and the electricity price is at its peak, while cogeneration produces heat and electricity because of the high electricity price and the high load.
2) During the low electricity price period, because the electricity price is low, the profits from cogeneration and from discharging the electric energy storage are too small, so they do not operate. When the wind and solar output is insufficient, the electric boiler purchases electricity from the grid and starts supplying heat, taking the electricity price and carbon emissions into account; if the heat supply is still insufficient, the gas boiler starts to generate heat.
6.4.1 Comparison of optimization results
In terms of optimization effect: in the PPO result, cogeneration runs at full load during the low electricity price period on the second day, which reduces profit; in the A3C result, the electric boiler runs at full load on the third day, when the wind and solar output is lowest, which increases the electricity purchased from the main grid, reduces profit and increases carbon emissions, while starting cogeneration during the low price period also reduces profit; in the DDPG result, the electric boiler likewise runs at full load on the third day and cogeneration operates during the low electricity price period. The optimization results for the three days are shown in Table 5, where the weight α of the objective function C is 0.7. As can be seen from Table 5, the SAC-based optimized scheduling strategies cope with the new environments better than the other three DRL optimized scheduling strategies.
In terms of power balance, the thermal power of the heating equipment exceeds the heat load during hours 18-20 and 42-48 for PPO, hours 18-20 and 68-70 for A3C, and hours 18-20 and 68-72 for DDPG, i.e., the power is unbalanced. Compared with these obvious heat network power imbalances, the SAC-based optimized scheduling strategies perform well in power balance. In summary, compared with the other three optimized scheduling strategies, the SAC-based optimized scheduling strategies are more robust when facing a new scene.
TABLE 5
Compared with the optimized scheduling result of SAC, the ALSAC-based result increases the use of the electric boiler during the low electricity price period and reduces the use of cogeneration during the flat price period, which reduces the carbon emissions of the integrated energy system on the premise of maximizing profit and increases the objective function value.
6.4.2 Comparison of ALSAC with conventional optimization methods
The optimized scheduling results are shown in FIG. 9.
TABLE 6
Test scenarios 1, 2 and 3 are also scheduled with the heuristic PSO algorithm and with a traditional mixed-integer quadratic program (MIQP) built with Yalmip and solved with Gurobi (MIPGap parameter set to 0.05). As shown in FIG. 9 and Table 6, the PSO-based scheduling increases profit by 10.03% compared with the ALSAC algorithm, but is 28.44% worse in terms of carbon emission and 430.915% lower in total objective function value; its optimization effect is weaker than that of ALSAC and the algorithm falls into a local optimum. Compared with ALSAC, the MIQP scheduling result has 1.5% lower profit and 3.17% lower carbon emission, and its objective function value is 6.25% higher; its optimization effect is slightly better than the online optimized scheduling of ALSAC in scenes 1, 2 and 3, but its solution time is 245 times longer, so its solution efficiency is low. As the scale of the integrated energy system grows, the requirement on solution speed becomes higher, and the solution speed of DRL can meet the needs of online optimized scheduling of the integrated energy system.
After day-ahead training, the neural network parameters of ALSAC are fixed; in actual intra-day decision-making, the action A of the scheduling equipment can be output directly from the collected state data S, which reduces complex computation. The above experiments show that the solving speed of the optimal scheduling problem of the integrated energy system is greatly improved, while the optimization result of ALSAC differs little from the day-ahead mixed-integer programming result.
6.5 Analysis of the optimized scheduling results of the integrated energy system based on ALSAC and transfer learning
Introducing the parameter migration of transfer learning can improve the learning efficiency of the intelligent agent and its generalization ability for new scenes. The scheduling knowledge accumulated from the 13 training samples of section 5.2 is transferred to the target task of a new scene (scene 1), and the respective deep neural network parameters are fine-tuned with the ALSAC algorithm and the SAC algorithm to obtain the optimal scheduling strategies. A new scene (point 183 in FIG. 5) is then selected at random as scene 4 for a generalization ability test; its data are shown in FIG. 11. Optimal scheduling for scene 4 is performed with transfer learning alone (migration-SAC), with transfer learning combined with advantage learning (migration-ALSAC), and without transfer learning (ALSAC); the results are shown in FIG. 12, FIG. 13, FIG. 14 and Table 7. After ALSAC is combined with transfer learning, the electric boiler is started during the low-price and clean-energy-rich periods and cogeneration concentrates most of its running time in the high-price period, reducing carbon emissions while maximizing profit as far as possible. Compared with the migration-SAC algorithm and the ALSAC algorithm, the profit is lower by 8.39% and 6.36%, the carbon emission is lower by 6.79% and 14.33%, and the final objective function value is higher by 18.87% and 38.86%, so the generalization ability of the algorithm is improved. To compare learning efficiency, a further baseline is added: the ALSAC algorithm without transfer learning is trained using the data of scene 1 as the training sample, and the training processes and training times of the three are shown in FIG. 10 and Table 7. As can be seen from FIG. 10 and Table 7, during training the scheduling strategy combining advantage learning and transfer learning converges significantly faster, has a better initial optimization effect, and has higher learning efficiency. Because the optimization interval of this embodiment is 15 min, which is far longer than the convergence time of ALSAC combined with transfer learning, the real-time performance of strategy updating can meet the online optimized scheduling of the system.
TABLE 7
7. Conclusion
Whether energy can be scheduled safely and efficiently in the integrated energy system is a precondition for its operation. The multi-energy coupling of the integrated energy system and the uncertainty of renewable energy, among other factors, make its energy scheduling face many challenges. Aiming at the generalization ability of the optimized scheduling strategy of the integrated energy system and the learning efficiency of the intelligent agent, this embodiment proposes optimized scheduling of the integrated energy system based on the advantage soft actor-critic (ALSAC) algorithm and transfer learning, realizing optimized scheduling with economy and low carbon as the targets. After advantage learning is added to SAC, variable and extreme scenes are selected through K-Means clustering, and comparison with several policy-gradient-based DRL algorithms, the traditional particle swarm algorithm and MIQP verifies that ALSAC has strong generalization ability and fast convergence in the optimized scheduling of the integrated energy system. After transfer learning is introduced, the learning efficiency of the intelligent agent and its generalization ability for new scenes are further improved, realizing flexible and efficient scheduling of the integrated energy system.
The beneficial effects of the embodiment are as follows:
according to the method, an ALSAC algorithm and transfer learning-based comprehensive energy system optimization scheduling strategy is provided, low carbon and economy are taken as targets to achieve optimization scheduling of the comprehensive energy system, the SAC maximum entropy mechanism enables optimization scheduling of the comprehensive energy system to be more robust, after the idea of advantage learning is combined, over-estimation of a Q network on non-optimal action values is reduced, misselection of an intelligent body on the non-optimal actions is reduced, generalization capability is improved, meanwhile, neural network stability judgment is added into the algorithm to determine whether to start advantage learning, and neural network parameter iteration in the early stage of interference of the advantage learning is prevented. And (3) introducing parameter migration of transfer learning, judging whether the scene is a new scene or not by utilizing the correlation of K-Means, if so, transferring historical scheduling knowledge to a target task of the new scene, and finely adjusting the parameters of the deep neural network through an ALSAC algorithm to further obtain an optimal scheduling strategy.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. An optimal scheduling method for an integrated energy system is characterized by comprising the following steps:
constructing an integrated energy system, wherein the integrated energy system operates a plurality of equipment models through grid connection, and obtains a scheduling model based on the integrated energy system and a reinforcement learning algorithm, wherein the scheduling model comprises an intelligent agent and an environment;
and correcting a Q value function in the reinforcement learning algorithm based on advantage learning to obtain a comprehensive algorithm, and training the intelligent agent based on the comprehensive algorithm to obtain an optimized scheduling strategy.
2. The method according to claim 1, wherein the equipment models comprise:
3. The method according to claim 1, wherein the process of obtaining the scheduling model comprises:
acquiring constraint balances in the integrated energy system, and constructing a scheduling model through a reinforcement learning algorithm based on the constraint balances, wherein the constraint balances comprise: power grid balance, heat supply network balance and gas network balance.
4. The method according to claim 1, wherein the reinforcement learning algorithm comprises: algorithm iteration and algorithm parameter updating.
5. The method according to claim 4, wherein the process of obtaining the integrated algorithm comprises:
obtaining the Q-value network loss function during algorithm iteration, calculating the descent rate of the loss function, judging whether to start advantage learning based on the descent rate, correcting the Q value function in the reinforcement learning algorithm based on the judgment result, and finally obtaining a comprehensive algorithm.
6. The method according to claim 1, wherein the Q-value function is a function between a state parameter and an action parameter of the integrated energy system at time t.
7. The method of claim 1, further comprising a process of combining the integrated algorithm with transfer learning:
obtaining scheduling knowledge based on a comprehensive algorithm, and migrating the scheduling knowledge to a target task; and fine-tuning the scheduling strategy based on the migration result to obtain an optimized scheduling strategy.
8. The method according to claim 7, wherein migrating the scheduling knowledge to the target task comprises:
and based on the scheduling knowledge, carrying out parameter migration on the deep neural network, judging the environment of the target task through a k-means clustering algorithm, and migrating the scheduling knowledge to the target task based on a judgment result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211397926.XA CN115759604B (en) | 2022-11-09 | 2022-11-09 | Comprehensive energy system optimal scheduling method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211397926.XA CN115759604B (en) | 2022-11-09 | 2022-11-09 | Comprehensive energy system optimal scheduling method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115759604A true CN115759604A (en) | 2023-03-07 |
CN115759604B CN115759604B (en) | 2023-09-19 |
Family
ID=85369845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211397926.XA Active CN115759604B (en) | 2022-11-09 | 2022-11-09 | Comprehensive energy system optimal scheduling method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115759604B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210241090A1 (en) * | 2020-01-31 | 2021-08-05 | At&T Intellectual Property I, L.P. | Radio access network control with deep reinforcement learning |
CN113780688A (en) * | 2021-11-10 | 2021-12-10 | 中国电力科学研究院有限公司 | Optimized operation method, system, equipment and medium of electric heating combined system |
CN114091879A (en) * | 2021-11-15 | 2022-02-25 | 浙江华云电力工程设计咨询有限公司 | Multi-park energy scheduling method and system based on deep reinforcement learning |
CN114217524A (en) * | 2021-11-18 | 2022-03-22 | 国网天津市电力公司电力科学研究院 | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning |
CN114357782A (en) * | 2022-01-06 | 2022-04-15 | 南京邮电大学 | Comprehensive energy system optimization scheduling method considering carbon source sink effect |
CN114971250A (en) * | 2022-05-17 | 2022-08-30 | 重庆大学 | Comprehensive energy economic dispatching system based on deep Q learning |
CN115207977A (en) * | 2022-08-19 | 2022-10-18 | 国网信息通信产业集团有限公司 | Active power distribution network deep reinforcement learning real-time scheduling method and system |
Non-Patent Citations (2)
Title |
---|
LUO Wenjian (罗文健) et al.: "Optimized Scheduling of Regional Integrated Energy System Based on Advantage Soft Actor-Critic Algorithm and Transfer Learning", 《电网技术》 (Power System Technology), vol. 47, no. 4 *
HUANG Zhiyong (黄志勇): "Research on Improved Value Function Estimation in Deep Reinforcement Learning", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, no. 02 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117787609A (en) * | 2023-12-22 | 2024-03-29 | 南京东博智慧能源研究院有限公司 | Comprehensive energy system low-carbon economic scheduling strategy based on CT-TD3 algorithm |
CN117993693A (en) * | 2024-04-03 | 2024-05-07 | 国网江西省电力有限公司电力科学研究院 | Zero-carbon park scheduling method and system for behavior clone reinforcement learning |
CN118485286A (en) * | 2024-07-16 | 2024-08-13 | 杭州电子科技大学 | Comprehensive energy system scheduling method based on enhanced exploration rollback clipping reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN115759604B (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115759604B (en) | Comprehensive energy system optimal scheduling method | |
CN111555355B (en) | Scheduling strategy and optimization method for hydro-photovoltaic-storage combined power generation | |
CN111463836A (en) | Optimized scheduling method for comprehensive energy system | |
CN111737884B (en) | Multi-target random planning method for micro-energy network containing multiple clean energy sources | |
CN111210079B (en) | Operation optimization method and system for distributed energy virtual power plant | |
CN114301081B (en) | Micro-grid optimization method considering storage battery energy storage life loss and demand response | |
CN112131712B (en) | Multi-objective optimization method and system for multi-energy system on client side | |
CN114219195A (en) | Regional comprehensive energy capacity optimization control method | |
CN114529075A (en) | Comprehensive energy system distribution robustness optimization scheduling method considering wind and light prediction error | |
CN111668878A (en) | Optimal configuration method and system for renewable micro-energy network | |
CN112580897A (en) | Multi-energy power system optimal scheduling method based on parrot algorithm | |
CN117726143B (en) | Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning | |
CN117077960A (en) | Day-ahead scheduling optimization method for regional comprehensive energy system | |
CN111091239A (en) | Energy service provider electricity price strategy making method and device based on differential evolution algorithm | |
CN112883630A (en) | Day-ahead optimized economic dispatching method for multi-microgrid system for wind power consumption | |
CN112510690B (en) | Optimal scheduling method and system considering wind-fire-storage combination and demand response reward and punishment | |
CN117526451A (en) | Regional comprehensive energy system configuration optimization method considering flexible load | |
CN117353399A (en) | Uncertainty-considered AC/DC hybrid micro-grid flexibility assessment method | |
CN112116122A (en) | Building cogeneration system operation optimization method for improving flexibility of power grid | |
CN111181158A (en) | Wind power plant economic dispatching method based on artificial neural network | |
CN109754128A (en) | A wind/photovoltaic/storage/diesel microgrid optimal configuration method considering typical scenarios with differing meteorological fluctuation characteristics | |
CN114723230B (en) | Micro-grid double-layer scheduling method and system for new energy power generation and energy storage | |
CN117993693B (en) | Zero-carbon park scheduling method and system for behavior clone reinforcement learning | |
Zhang et al. | Optimal Location and Capacity Planning of Distribution Network Considering Demand Response and Battery Storage Capacity Degradation | |
CN116128384A (en) | Method and device for establishing data driving model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||