CN117291390A - Scheduling decision model establishment method based on SumTree-TD3 algorithm - Google Patents

Scheduling decision model establishment method based on SumTree-TD3 algorithm

Info

Publication number
CN117291390A
CN117291390A CN202311320628.5A
Authority
CN
China
Prior art keywords
power
data
gas
representing
energy storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311320628.5A
Other languages
Chinese (zh)
Inventor
邱革非
罗世杰
何虹辉
刘铠铭
何超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202311320628.5A
Publication of CN117291390A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Abstract

The invention relates to a method for establishing a scheduling decision model based on the SumTree-TD3 algorithm, belonging to the technical field of integrated energy system scheduling. The method first establishes a low-carbon economic dispatch characteristic model of the integrated energy system. A Markov decision model is then built from the low-carbon economic dispatch characteristics of the integrated energy system. Finally, a scheduling decision model of an improved twin delayed deep deterministic policy gradient (TD3) algorithm is established by incorporating the SumTree data structure, and a neural network is constructed and trained to improve the stability of the network output. The method can effectively solve the low-carbon economic dispatch problem of the integrated energy system with a deep reinforcement learning algorithm. To improve network training efficiency, the historical experience data generated during training are stored and sampled through a sum tree; the resulting performance is better than that of conventional reinforcement learning algorithms, and the method provides a useful reference for the high data dimensionality, high modeling difficulty, and similar challenges in subsequent low-carbon economic dispatch of integrated energy systems.

Description

Scheduling decision model establishment method based on SumTree-TD3 algorithm
Technical Field
The invention relates to a method for establishing a scheduling decision model based on the SumTree-TD3 algorithm, and belongs to the technical field of integrated energy system scheduling.
Background
Owing to its multi-energy coupling characteristics, the integrated energy system (IES) is particularly important in the tasks of accelerating the planning and construction of new-type energy systems and promoting the green, low-carbon energy transition, and the economy and low-carbon performance of IES operation have received wide attention.
Deep reinforcement learning (DRL) methods offer strong adaptability and generalization capability; by modeling problems with sequential-decision characteristics as a Markov decision process (MDP), the optimal solution can be found more efficiently. DRL methods have already been applied to power system scheduling: the proximal policy optimization (PPO) algorithm has been applied in source-load uncertainty scenarios, and the advantage-based soft actor-critic (ALSAC) algorithm can handle environments with greater randomness. These methods all adopt stochastic policies, so in practical application they generally suffer from slow convergence, wasted computing resources, and unstable results. Compared with stochastic-policy methods, the deep deterministic policy gradient (DDPG) improves computational efficiency and convergence speed, but it suffers from overestimation, low execution efficiency, weak action-exploration capability, and a tendency to fall into local optima. The twin delayed deep deterministic policy gradient (TD3) algorithm has been used to address the operational safety of power systems, and its effectiveness and applicability have been demonstrated in actual power system operating scenarios. However, training results show that the algorithm still converges slowly and requires many iteration rounds because it samples data randomly.
In view of the above, a method for establishing an integrated energy system scheduling decision model based on an improved twin delayed deep deterministic policy gradient algorithm is provided. Building on existing studies, prioritized experience replay is realized by storing and sampling historical experience data with a sum tree (SumTree), thereby improving the training efficiency and performance of the TD3 algorithm. The specific process is as follows: Markov modeling is performed on the low-carbon economic dispatch strategy optimization of the IES, and a decision-interaction environment is established to train the decision-making capability of the agent. During training, a priority index is set for each piece of experience data based on its update value, and SumTree storage and sampling are used to exploit the experience data efficiently, improving training efficiency.
Disclosure of Invention
The invention aims to provide a method for establishing a scheduling decision model based on the SumTree-TD3 algorithm, which addresses the high data dimensionality, high modeling difficulty, and similar problems faced in low-carbon economic dispatch of an integrated energy system.
The technical scheme of the invention is as follows: a method for establishing a scheduling decision model based on the SumTree-TD3 algorithm first establishes a low-carbon economic dispatch characteristic model of the integrated energy system. A Markov decision model is then built from the low-carbon economic dispatch characteristics of the integrated energy system. Finally, a scheduling decision model of an improved twin delayed deep deterministic policy gradient algorithm is established by incorporating the SumTree data structure, and a neural network is constructed and trained to improve the stability of the network output.
The method comprises the following specific steps:
Step1: construct a low-carbon economic dispatch model of the integrated energy system (IES), modeling the objective function and the constraint conditions.
Step2: perform Markov modeling on the IES low-carbon economic dispatch model to obtain a Markov decision model with the low-carbon economic dispatch characteristics of the integrated energy system.
Step3: establish a scheduling decision model of an improved twin delayed deep deterministic policy gradient (TD3) algorithm from the Markov decision model with the IES low-carbon economic dispatch characteristics and the SumTree data structure.
The low-carbon economic dispatch model of the integrated energy system comprises the following components:
Solar and wind generating units.
The actual output of the solar unit is affected by the local ambient temperature and the illumination intensity, and the actual output of the wind unit is affected by the wind speed. The study therefore uses the units' corresponding output data; the output power of the photovoltaic source and of the wind turbine at time t are denoted P_PV(t) and P_WT(t), respectively.
Gas turbine and waste-heat boiler.
The relation between the electric power and heat power of the gas turbine and waste-heat boiler and the amount of natural gas consumed is:
P_GT(t) = η_GT H_gas G_GT(t) (1)
Q_GT(t) = (1 − η_GT)(1 − ω_GT) H_gas G_GT(t) (2)
Q_WHB(t) = η_WHB Q_GT(t) (3)
wherein: G_GT(t), P_GT(t), Q_GT(t), and Q_WHB(t) respectively represent the amount of natural gas burned by the gas turbine, its generated electric power, its recoverable heat power, and the heat power output of the waste-heat boiler at time t; H_gas is the heating value of natural gas, taken as 8.302 kW·h/m³; η_GT is the electric conversion efficiency of the gas turbine, taken as 0.42; η_WHB is the heat conversion efficiency of the waste-heat boiler, taken as 0.85; and ω_GT is the heat-loss coefficient, taken as 0.2.
Gas boiler.
When the heat recovered by the waste-heat boiler is insufficient to supply the heat load, the gas boiler supplements the heat-load shortfall; the relation between the input natural gas amount and the output heating power is:
Q_GB(t) = η_GB H_gas G_GB(t) (4)
wherein: Q_GB(t) and G_GB(t) respectively represent the heating power of the gas boiler and the natural gas amount it consumes at time t; η_GB is the heat conversion efficiency of the gas boiler, taken as 0.84.
Main power grid.
The main grid with which the integrated energy system trades energy implements a time-of-use electricity price strategy; energy trading under this strategy mainly mitigates the uncontrollable, intermittent nature of distributed generation output and of load demand, so as to improve the economy and stability of system operation.
Battery energy storage system.
The battery energy storage system stores electric energy when the distributed generation output is in surplus and the storage has not reached its maximum allowable capacity, and its scale is configured accordingly. The remaining stored energy of the system at time t is:
B(t) = B(t−1) + η_cha P_B,cha(t) − η_dis P_B,dis(t) (5)
wherein: B(t) and B(t−1) respectively represent the remaining stored energy at times t and t−1; η_cha and η_dis represent the charging and discharging efficiencies of the storage system, taken as 0.92 and 0.95 respectively; P_B,cha(t) represents the charging power at time t, and P_B,dis(t) the discharging power at time t. The state of charge of the storage system at time t is:
SOC_B(t) = B(t)/B_max (6)
wherein: SOC_B(t) represents the state of charge of the storage system at time t, and B_max represents the maximum capacity of the storage system.
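A minimal sketch of the storage update (5) and state of charge (6) in Python follows. The efficiencies are the values stated in the text; the ratio form B(t)/B_max for the state of charge is inferred from the surrounding definitions, and the function name is illustrative.

```python
# Sketch of the storage-energy update (5) and state of charge (6).
# Efficiencies from the text; SOC form B(t)/B_max inferred from the definitions.
ETA_CHA, ETA_DIS = 0.92, 0.95

def storage_step(b_prev: float, p_cha: float, p_dis: float, b_max: float):
    """Return (B(t), SOC_B(t)) given B(t-1) and the charge/discharge powers."""
    b = b_prev + ETA_CHA * p_cha - ETA_DIS * p_dis  # (5)
    soc = b / b_max                                  # (6), as inferred
    return b, soc
```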
Objective function.
The total running cost of the system consists of the gas purchase cost, the environmental pollution treatment cost, the system operation and maintenance cost, and the cost of energy trading with the main grid. The objective function is:
f = min(c_gas + c_env + c_run + c_mg) (7)
wherein: c_gas represents the gas purchase cost, c_env the environmental pollution treatment cost, c_run the operation and maintenance cost, and c_mg the cost of energy trading with the main grid.
The gas purchase cost of the two gas-fired devices, the gas turbine and the gas boiler, is:
c_gas = Σ_t ξ_gas [G_GT(t) + G_GB(t)] (8)
wherein: ξ_gas is the gas price, taken as constant in this study and not varying with time.
In addition, owing to the operating characteristics of the gas turbine and gas boiler, as well as of certain generating equipment in the main grid, environmental pollution treatment costs arise:
c_env = Σ_t {ξ_eg [G_GT(t) + G_GB(t)] + ξ_mg P_mg,b(t)} (9)
wherein: ξ_eg represents the environmental pollution treatment cost coefficient of the gas turbine and gas boiler, ξ_mg represents the converted pollution-treatment cost coefficient of the main grid, and P_mg,b(t) represents the electric energy purchased from the main grid at time t.
The operation and maintenance cost mainly covers the cost of operating and maintaining the distributed generation and the storage system, and is related to the actual output of the equipment:
c_run = Σ_t [K_WT P_WT(t) + K_PV P_PV(t) + K_B (P_B,cha(t) + P_B,dis(t))] (10)
wherein: K_WT, K_PV, and K_B respectively represent the operation and maintenance cost coefficients of the wind turbine, the photovoltaic system, and the storage system. For the gas turbine and gas boiler only the gas purchase cost during operation is considered, and their maintenance cost is neglected.
The cost of trading energy with the main grid is:
c_mg = Σ_t [ξ_tou,b(t) P_mg,b(t) − ξ_tou,s(t) P_mg,s(t)] (11)
wherein: ξ_tou,b(t) and ξ_tou,s(t) respectively represent the time-of-use prices for purchasing electric energy from and selling it to the main grid, and P_mg,s(t) represents the electric energy sold to the main grid at time t.
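The cost accounting of (7) and the time-of-use trade cost described here can be sketched as follows. The fragment is illustrative only: the function and argument names are not from the patent, and each c_* argument is assumed to be a total already summed over the scheduling horizon.

```python
# Sketch of the energy-trade cost (purchases minus sales revenue, summed over
# the horizon) and the total-cost objective (7). Names are illustrative.
def trade_cost(xi_buy, xi_sell, p_buy, p_sell) -> float:
    """Time-of-use trade cost with the main grid over the scheduling horizon."""
    return sum(xb * pb - xs * ps
               for xb, xs, pb, ps in zip(xi_buy, xi_sell, p_buy, p_sell))

def total_cost(c_gas: float, c_env: float, c_run: float, c_mg: float) -> float:
    """Objective (7): total running cost to be minimized."""
    return c_gas + c_env + c_run + c_mg
```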
Constraint conditions.
The generation output constraints are:
P_PV,min ≤ P_PV(t) ≤ P_PV,max
P_WT,min ≤ P_WT(t) ≤ P_WT,max
P_GT,min ≤ P_GT(t) ≤ P_GT,max (12)
wherein: P_PV,min, P_WT,min, and P_GT,min represent the lower output limits of the photovoltaic source, wind turbine, and gas turbine, and P_PV,max, P_WT,max, and P_GT,max the corresponding upper limits.
According to the operating characteristics of the gas turbine, its power ramping constraint must be satisfied:
ΔP_GT,min ≤ P_GT(t+1) − P_GT(t) ≤ ΔP_GT,max (13)
wherein: ΔP_GT,max and ΔP_GT,min respectively represent the upper and lower limits of the gas turbine's ramping power.
Electric power balance constraint:
P_PV(t) + P_WT(t) + P_GT(t) + P_B,dis(t) + P_mg,b(t) = Σ_{i=1}^{N_e} L_e,i(t) + P_B,cha(t) + P_mg,s(t) (14)
wherein: L_e,i(t) represents the i-th electric load power at time t, and N_e the total number of electric loads.
Thermal power balance constraint:
Q_WHB(t) + Q_GB(t) = Σ_{j=1}^{N_h} L_h,j(t) (15)
wherein: L_h,j(t) represents the j-th heat load power at time t, and N_h the total number of heat loads.
Electric energy storage system constraints:
P_B,cha,min ≤ P_B,cha(t) ≤ P_B,cha,max
P_B,dis,min ≤ P_B,dis(t) ≤ P_B,dis,max
B_min ≤ B(t) ≤ B_max
SOC_B,min ≤ SOC_B(t) ≤ SOC_B,max (16)
wherein: P_B,cha,max, P_B,cha,min and P_B,dis,max, P_B,dis,min respectively represent the maximum and minimum charging and discharging powers of the storage system; B_min and B_max the minimum and maximum allowable capacities; and SOC_B,min and SOC_B,max the minimum and maximum states of charge, taken as 0.3 and 0.9 respectively.
To ensure operational stability on the main-grid side, the real-time power-interaction constraint with the main grid must be satisfied:
P_mg,min ≤ P_mg(t) ≤ P_mg,max (17)
wherein: P_mg,min and P_mg,max respectively represent the lower and upper limits of the interaction power between the integrated energy system and the main grid.
The construction process of the Markov decision model with the IES low-carbon economic dispatch characteristics is as follows:
In the present invention, one scheduling step is 1 h and one scheduling cycle is 24 h. In the preset scenario of the Markov decision model, the state space consists of the distributed generation output, the state of charge of the battery storage system, the electricity price information, and the two types of load demand; the state s(t) is expressed as:
s(t) = [P_DG(t), SOC_B(t), ξ_tou(t), L_e(t), L_h(t)] (18)
wherein: P_DG(t) represents the total output power of the photovoltaic source and the wind turbine at each time t, specifically:
P_DG(t) = P_PV(t) + P_WT(t) (19)
An agent is constructed that, at each time t, can schedule the output of the gas turbine and gas boiler, the charging/discharging of the battery storage system, and the quantity of electricity purchased from or sold to the main grid, so the action a(t) can be expressed as:
a(t) = [P_GT(t), Q_GB(t), B_a(t), P_mg(t)] (20)
wherein: B_a(t) represents the charge/discharge action of the battery storage system. The gas-turbine heat power Q_WHB(t) recovered by the waste-heat recovery device is converted from P_GT(t) through equations (2)-(3), and therefore does not appear as a component of the action space.
The IES low-carbon economic dispatch problem takes the minimum total running cost as its optimization target, while the agent takes the maximum reward value as the basis for optimizing its actions; the reward function is therefore set as the negative of the corresponding objective function. At the same time, to reduce the power imbalances produced by the policy, the electric and thermal power imbalance caused by equipment output is added to the reward function as a penalty:
r(t) = −β_c Σ_{i=1}^{4} α_i C_i(t) − β_g g(t) (21)
wherein: C_i(t) (i = 1, 2, 3, 4) respectively correspond to the gas purchase cost, environmental pollution treatment cost, operation and maintenance cost, and main-grid energy-trading cost of each scheduling step t; α_i represents the reward weight of the corresponding cost; g(t) represents the penalty function; and β_c and β_g represent the reward-function and penalty-function coefficients.
The power-imbalance penalty function is expressed as:
g(t) = λ_P ε_P(t) + λ_Q ε_Q(t) (22)
wherein: λ_P and λ_Q respectively represent the penalty factors for the electric and thermal power constraints, and ε_P(t) and ε_Q(t) respectively represent the degree of imbalance of the two types of constraints.
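The reward shaping described here, negative weighted cost minus an imbalance penalty, can be sketched as below. The weight and penalty-factor values are illustrative assumptions; only the structure follows the text.

```python
# Sketch of the reward with imbalance penalty, following the verbal description:
# reward = -(weighted costs) - penalty growing with electric/thermal imbalance.
# All default coefficient values are illustrative, not from the patent.
def reward(costs, alphas, eps_p: float, eps_q: float,
           beta_c: float = 1.0, beta_g: float = 1.0,
           lam_p: float = 10.0, lam_q: float = 10.0) -> float:
    g = lam_p * eps_p + lam_q * eps_q  # penalty on imbalance degrees
    return -beta_c * sum(a * c for a, c in zip(alphas, costs)) - beta_g * g
```

With zero imbalance the reward reduces to the negative weighted cost, so maximizing reward minimizes cost, which is exactly the coupling between the agent objective and the dispatch objective described above.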
The construction process of the scheduling decision model of the improved twin delayed deep deterministic policy gradient algorithm is as follows:
SumTree is a tree-shaped data structure; applying it within a deep reinforcement learning method can reduce the correlation between data. In the present invention, SumTree is applied to the experience replay buffer to rapidly perform prioritized experience replay, improving the utilization of useful data and the training speed of the agent, and thereby improving the TD3 algorithm.
SumTree-based data storage and sampling.
The Critic network of the scheduling decision model uses the action-value function to compute the TD-error:
δ = r_t + γ_Q Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) (25)
wherein: γ_Q is the discount factor; Q(s_t, a_t) represents the action-value function; s_{t+1} and s_t respectively represent the states at times t+1 and t; and a_{t+1} and a_t the actions taken at times t+1 and t.
The TD-error of each piece of experience data is taken as its priority index, giving the priority probability with which the data is sampled:
ρ_l = |δ_l|^ν / Σ_k |δ_k|^ν (26)
wherein: ρ_l and δ_l respectively represent the priority sampling probability of the l-th piece of experience data and its corresponding TD-error; ν is a trade-off factor, with ν = 0 giving uniform sampling and ν = 1 greedy-policy sampling. To reduce the gap in sampling probability between data with larger δ and data with smaller δ, ν = 0.6 is taken.
Meanwhile, to avoid experience data with small TD-error being sampled too rarely, newly added experience data are initialized as:
δ_l,0 = δ_max (27)
wherein: δ_l,0 represents the TD-error assigned to the l-th piece of experience data when it is added to the experience replay buffer β, and δ_max represents the maximum TD-error currently in β, so that experience data with small δ can still be sampled at least once.
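The storage and sampling mechanism described above can be sketched as a minimal SumTree in Python. This is an illustrative implementation of the generic sum-tree idea, not the patent's code: leaves hold the priorities, each internal node holds the sum of its children, and sampling walks down the tree by cumulative priority so that data with larger priority are drawn more often.

```python
# Minimal SumTree sketch for prioritized experience replay (illustrative).
class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes + leaves
        self.data = [None] * capacity
        self.write = 0  # next leaf to (over)write

    def _propagate(self, idx: int, change: float) -> None:
        while idx != 0:  # push the priority change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def update(self, leaf: int, priority: float) -> None:
        idx = leaf + self.capacity - 1
        self._propagate(idx, priority - self.tree[idx])
        self.tree[idx] = priority

    def add(self, priority: float, item) -> None:
        self.data[self.write] = item
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity

    def total(self) -> float:
        return self.tree[0]  # root = sum of all priorities

    def sample(self, s: float):
        """Descend from the root: go left if s fits there, else subtract and go right."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        leaf = idx - (self.capacity - 1)
        return leaf, self.tree[idx], self.data[leaf]
```

In use, a new transition would be added with the current maximum priority per (27), and a minibatch of N would be drawn by splitting [0, total()) into N segments and calling sample on a uniform draw from each segment; after the update step, update() writes back the recomputed priorities.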
The agent training process is as follows.
(1) Initialize the three real network parameters: θ_1 and θ_2 of the two Critic networks and φ of the Actor network, and initialize the three target networks with the same parameter values: θ_1′ ← θ_1, θ_2′ ← θ_2, φ′ ← φ.
(2) Set the capacity of the experience replay buffer β and the number N of data samples drawn per training step.
(3) Acquire experience data tuples and add them to β, specifically:
a: randomly take an initial state s_t from the historical data;
b: the policy π_φ, combined with noise x, selects the action a_t in state s_t:
a_t = π_φ(s_t) + x, x ~ N(0, σ)
c: action a_t interacts with the environment to obtain the reward value r_t and the next state s_{t+1}, forming the data tuple (s_t, a_t, r_t, s_{t+1});
d: the δ of each piece of data serves as its priority index; the data are stored in SumTree leaf nodes in order of addition, and the node values of the related nodes are updated at the same time;
e: check the number of experience data in β; if the set capacity limit has not been reached, take the current s_{t+1} as the s_t of step b and repeat steps b-e; otherwise stop adding and assign the maximum δ in β to each piece of data.
(4) Sample N pieces of data from β using the SumTree sampling method; for each piece, π_φ′ adds a target-policy-smoothing regularization noise x′ to derive the target action a_{t+1} corresponding to s_{t+1}:
a_{t+1} = π_φ′(s_{t+1}) + x′, x′ ~ clip[N(0, σ′), −ψ, ψ]
(5) Input the obtained s_{t+1} and a_{t+1} and the observed reward r_t into the two Critic target networks to compute the target value y_t:
y_t = r_t + γ_Q min_{i=1,2} Q_θi′(s_{t+1}, a_{t+1})
(6) Minimize the error between the target value and the observed value by gradient descent, updating the parameters θ of the two Critic real networks:
L(θ_i) = (1/N) Σ [y_t − Q_θi(s_t, a_t)]², i = 1, 2
(7) Soft-update the target network parameters at learning rate τ_1 by computing a weighted average of the real and target network parameters:
θ_i′ ← τ_1 θ_i + (1 − τ_1) θ_i′, i = 1, 2
(8) Recompute the δ of the data and update the node values of the leaf nodes where they reside and of the related nodes.
(9) After every d updates of the Critic networks, update the parameters φ of the Actor real network by the deterministic policy gradient:
∇_φ J(φ) = (1/N) Σ ∇_a Q_θ1(s_t, a)|_{a=π_φ(s_t)} ∇_φ π_φ(s_t)
(10) Soft-update the Actor target network parameters at learning rate τ_2:
φ′ ← τ_2 φ + (1 − τ_2) φ′
Steps (4) to (10) above are repeated in a loop, and the reward values are recorded.
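The numerical core of the loop above, clipped target-policy-smoothing noise, the clipped-double-Q target from the two Critic target networks, and the soft update, can be sketched with the networks stubbed out as plain values. The constants σ′ = 0.02, ψ = 0.05, discount factor 0.95, and τ = 0.005 follow the embodiment described later; everything else is an illustrative assumption.

```python
import random

# Sketch of steps (4)-(7) and (10) in miniature; networks are stubbed as numbers.
def clipped_noise(sigma: float = 0.02, psi: float = 0.05) -> float:
    """Target-policy-smoothing noise x' ~ clip[N(0, sigma), -psi, psi]."""
    x = random.gauss(0.0, sigma)
    return max(-psi, min(psi, x))

def td3_target(r_t: float, q1_next: float, q2_next: float,
               gamma: float = 0.95) -> float:
    """Target y_t from the two Critic target networks: r_t + gamma * min(Q1', Q2')."""
    return r_t + gamma * min(q1_next, q2_next)

def soft_update(theta, theta_targ, tau: float = 0.005):
    """Weighted average of real and target parameters: tau*theta + (1-tau)*theta'."""
    return [tau * w + (1 - tau) * wt for w, wt in zip(theta, theta_targ)]
```

Taking the minimum of the two target Critics is what counteracts the value overestimation noted for DDPG in the background section, while the small τ keeps the targets slowly moving and the training stable.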
The objective-function model formed by equations (7)-(11) and the constraint model formed by equations (12)-(17) together constitute the IES low-carbon economic dispatch model. Equations (18)-(24) establish the Markov model of the low-carbon economic dispatch model; the twin delayed deep deterministic policy gradient algorithm is improved by incorporating SumTree; equations (25)-(27) complete the establishment of the improved algorithm model; and the agent is trained according to steps (1)-(10).
As is known to those skilled in the art, "SumTree-TD3 algorithm" denotes the improved twin delayed deep deterministic policy gradient algorithm.
The beneficial effects of the invention are as follows:
1. Compared with heuristic algorithms, the invention can adaptively learn and mine the physical model from data, and the policy is continuously optimized toward the optimum as the number of training rounds grows, overcoming the difficulty that rules and models must be written by hand for certain high-dimensional, complex problems.
2. Compared with the deterministic policy gradient algorithm, which offers higher computational efficiency and convergence speed, the agent of this method has stronger action-exploration capability and is less likely to fall into local optima.
3. Compared with the algorithm before improvement, the improved method achieves efficient utilization of the experience data with higher update value, and effectively avoids the slowdown of training caused by similar experience data.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a block diagram of an integrated energy system in an embodiment of the invention;
FIG. 3 is a structural diagram of SumTree in an embodiment of the invention;
FIG. 4 is a diagram of a deep reinforcement learning model of low-carbon economic dispatch of a comprehensive energy system in an embodiment of the invention;
FIG. 5 is a graph of load and wind-solar power output prediction in an embodiment of the invention;
FIG. 6 is a chart showing convergence of prize values for a deep reinforcement learning method according to an embodiment of the present invention;
FIG. 7 is a power balance diagram of the scheduling policy results of various methods in an embodiment of the present invention;
FIG. 8 compares the results of the twin delayed deep deterministic policy gradient algorithm before and after improvement after 1200 training rounds in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and detailed description.
Example 1: as shown in FIGS. 1 and 3, a method for establishing a scheduling decision model based on the SumTree-TD3 algorithm first establishes a low-carbon economic dispatch characteristic model of the integrated energy system. A Markov decision model is then built from the low-carbon economic dispatch characteristics of the integrated energy system. Finally, a scheduling decision model of the improved twin delayed deep deterministic policy gradient algorithm is established, and a neural network is constructed and trained to improve the stability of the network output.
The IES shown in FIG. 2 is taken as the example system. The equipment parameters and related cost coefficients are listed in Table 1, the division of peak, flat, and valley periods for interaction between the IES and the main grid is given in Table 2, the time-of-use electricity price information is given in Table 3, and the forecasts of distributed generation output and of electric and heat load demand, based on historical data from a location in southern China, are shown in FIG. 5. A comparative analysis was performed using the following four methods:
Method 1: a multi-objective optimal scheduling strategy using the NSGA-II algorithm;
Method 2: a scheduling strategy using the DDPG algorithm;
Method 3: a scheduling strategy using the twin delayed deep deterministic policy gradient algorithm;
Method 4: a scheduling strategy using the improved twin delayed deep deterministic policy gradient algorithm.
Table 1 Equipment configuration information and related cost coefficients
Table 2 Time-of-use period division of the main grid
Table 3 Time-of-use electricity prices of the main grid
For the NSGA-II algorithm of Method 1, the lowest system running cost and the lowest environmental treatment cost are the optimization targets, and the decision variables are the outputs of each controllable device in the system and the energy purchased from and sold to the main grid. Its parameters are set as follows: population size 200; maximum number of iterations 200; crossover rate 0.5; mutation rate 0.1. The algorithm can only solve for a single time step at a time, so the comparative analysis uses the results integrated over every time step of the whole scheduling cycle. In constructing the neural networks of Methods 2-4, because IES operation involves complex time-series data sets, the learning rate, experience pool capacity, number of hidden layers, and number of neurons of each neural network must be preset. The deep reinforcement learning methods use unified neural network parameters: Actor network learning rate 0.0003; Critic network learning rates 0.003; soft-update learning rates τ_1 and τ_2 both 0.005; 3 hidden layers with activation functions ReLU, ReLU, and Tanh respectively and 64 neurons per layer; discount factor 0.95; and experience pool capacity 3000. In Methods 3 and 4, the other parameters of the twin delayed deep deterministic policy gradient algorithm before and after improvement are set as: noise x standard deviation σ = 0.01, x′ standard deviation σ′ = 0.02, and clipping boundary ψ = 0.05. The scheduling results of the four methods are shown in FIG. 7.
As can be seen from fig. 6, the average reward value of the improved twin delayed deep deterministic policy gradient (TD3) algorithm proposed by the present invention fluctuates significantly in the early training stage. This is because the data priority indices are given a uniform initial value in the early sampling stage to avoid some data never being sampled, so some data with low actual update value are overestimated, which affects the agent's judgment of action optimization. The average reward level gradually flattens as the number of training rounds increases, tending to converge after about 1200 rounds. Under the same condition of 2000 training rounds, its highest average reward level is slightly higher than that of the unimproved TD3 algorithm and significantly higher than that of the DDPG algorithm, so it finds the optimal solution better than the other two methods.
As can be seen from fig. 7, the output of the four methods shows no significant power imbalance. The data in Table 4 show that the output results of the different methods differ in cost to a certain extent: the total cost of the improved twin delayed deep deterministic policy gradient (TD3) algorithm is 5.48% lower than that of the NSGA-II algorithm, and 2.28% and 7.28% lower than those of the unimproved TD3 algorithm and the DDPG algorithm, respectively. The output result of method 4, i.e. the method proposed by the present invention, is thus the best at improving the running economy and low-carbon performance of the system.
Table 4 Running cost of each method's system (unit: yuan)
To further verify the improvement in optimization speed of the improved twin delayed deep deterministic policy gradient (TD3) algorithm over the unimproved version, both methods are trained for 1200 rounds and then fed the same load and distributed power output prediction data; the comparison results are shown in fig. 8. Taking the output of the unimproved TD3 algorithm as the reference, the comparative analysis shows that every cost item of system operation under the improved TD3 algorithm is lower than before the improvement, so under the same condition of 1200 training rounds, the improved TD3 algorithm finds a better strategy than the unimproved one.
According to the above comparative analysis, the improved twin delayed deep deterministic policy gradient (TD3) algorithm proposed by the present invention further improves training efficiency while retaining the advantages of the TD3 algorithm, and in the applied IES low-carbon economic scheduling scenario it balances the low-carbon performance and the economy of system operation better than the other three methods.
The improved twin delayed deep deterministic policy gradient (TD3) algorithm proposed by the present invention realizes a prioritized experience replay mechanism for a deterministic policy method by adopting a SumTree for the storage and sampling of historical experience data. As a weighted sampling method, it has good applicability, optimality, and adaptability in the complex energy scheduling environments and multi-market demand application scenarios of the comprehensive energy system low-carbon economic scheduling problem.
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (4)

1. A scheduling decision model establishing method based on a SumTree-TD3 algorithm, characterized by comprising the following steps:
Step 1: constructing a low-carbon economic scheduling model of the comprehensive energy system, and establishing its objective function and constraint conditions;
Step 2: performing Markov modeling on the comprehensive energy system low-carbon economic scheduling model to obtain a Markov decision model with the comprehensive energy system low-carbon economic scheduling characteristics;
Step 3: establishing a scheduling decision model of an improved twin delayed deep deterministic policy gradient (TD3) algorithm according to the Markov decision model with the comprehensive energy system low-carbon economic scheduling characteristics and the SumTree data structure.
2. The scheduling decision model building method based on the SumTree-TD3 algorithm according to claim 1, wherein the comprehensive energy system low-carbon economic scheduling model comprises:
a solar unit and a wind power generator unit;
the solar energy unit and the wind power generator set adopt the corresponding unit output data; the output power of the photovoltaic power supply in the solar energy unit and the output power of the wind power generator set at time t are denoted by P_PV(t) and P_WT(t) respectively;
a gas turbine and a waste heat boiler;
the relations between the electric power and heat power of the gas turbine and waste heat boiler and the amount of natural gas consumed are as follows:
P_GT(t) = η_GT H_gas G_GT(t) (1)
Q_GT(t) = (1 - η_GT)(1 - ω_GT) H_gas G_GT(t) (2)
Q_WHB(t) = η_WHB Q_GT(t) (3)
wherein: G_GT(t), P_GT(t), Q_GT(t), and Q_WHB(t) respectively represent the amount of natural gas burned by the gas turbine, its power generation, its residual heat power, and the heat power output by the waste heat boiler at time t; H_gas is the calorific value of natural gas; η_GT is the electric conversion efficiency of the gas turbine; η_WHB is the heat conversion efficiency of the waste heat boiler; and ω_GT is the heat loss coefficient;
a gas-fired boiler;
when the heat energy recovered by the waste heat boiler is insufficient for supplying the heat load, the gas boiler is used as the heat load shortage supplementing equipment, and the relationship between the input natural gas quantity and the output heating power is as follows:
Q_GB(t) = η_GB H_gas G_GB(t) (4)
wherein: Q_GB(t) and G_GB(t) represent the heating power and the natural gas consumption of the gas boiler at time t, and η_GB is the heat conversion efficiency of the gas boiler;
a main grid;
the main power grid for energy transaction of the comprehensive energy system implements a time-of-use electricity price strategy, and the energy transaction is carried out according to the strategy;
a battery energy storage system;
the battery energy storage system stores electric energy, and its scale is configured accordingly, when the distributed power output is in surplus and the energy storage system has not reached its maximum allowable capacity; the energy storage margin of the system at time t is:
B(t) = B(t-1) + η_cha P_B,cha(t) - η_dis P_B,dis(t) (5)
wherein: B(t) and B(t-1) respectively represent the energy storage margin at times t and t-1; η_cha and η_dis respectively represent the charging and discharging efficiency of the energy storage system; P_B,cha(t) represents the charging power at time t; and P_B,dis(t) represents the discharging power at time t. The state of charge of the energy storage system at time t is:
SOC_B(t) = B(t)/B_max (6)
wherein: SOC_B(t) represents the state of charge of the energy storage system at time t, and B_max represents the maximum capacity of the energy storage system;
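Equations (5) and (6) can be sketched as a one-step storage update; the efficiencies and capacity below are assumed example values:

```python
# Sketch of the storage margin update (5) and state of charge (6),
# following the patent's formulation.  Parameter values are assumptions.
ETA_CHA, ETA_DIS = 0.95, 0.95   # charge / discharge efficiency (assumed)
B_MAX = 500.0                    # maximum capacity, kWh (assumed)

def storage_step(b_prev, p_cha, p_dis):
    """Advance the storage margin one step and return (B(t), SOC_B(t))."""
    b = b_prev + ETA_CHA * p_cha - ETA_DIS * p_dis   # eq. (5)
    soc = b / B_MAX                                  # eq. (6)
    return b, soc
```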
an objective function;
the total running cost of the system consists of gas purchasing cost, environmental pollution treatment cost, system operation and maintenance cost and energy transaction cost with a main power grid, and the objective function is expressed as follows:
f = min(c_gas + c_env + c_run + c_mg) (7)
wherein: c_gas represents the gas purchase cost, c_env represents the environmental pollution treatment cost, c_run represents the operation and maintenance cost, and c_mg represents the cost of energy trading with the main grid;
constraint conditions;
the constraint conditions on power supply output are:
P_PV,min ≤ P_PV(t) ≤ P_PV,max
P_WT,min ≤ P_WT(t) ≤ P_WT,max (8)
P_GT,min ≤ P_GT(t) ≤ P_GT,max
wherein: P_PV,min, P_WT,min, and P_GT,min respectively represent the lower output limits of the photovoltaic, wind turbine, and gas turbine, and P_PV,max, P_WT,max, and P_GT,max respectively represent their upper output limits;
according to the operating characteristics of the gas turbine, its power ramping constraint must be satisfied:
ΔP_GT,min ≤ P_GT(t+1) - P_GT(t) ≤ ΔP_GT,max (9)
wherein: ΔP_GT,max and ΔP_GT,min respectively represent the upper and lower limits of the ramping power of the gas turbine;
electric power balance constraint:
P_PV(t) + P_WT(t) + P_GT(t) + P_B,dis(t) + P_mg(t) = Σ_{i=1}^{N_e} L_e,i(t) + P_B,cha(t) (10)
wherein: L_e,i(t) represents the power of the i-th electric load at time t, and N_e represents the total number of electric loads;
thermal power balance constraint:
Q_WHB(t) + Q_GB(t) = Σ_{j=1}^{N_h} L_h,j(t) (11)
wherein: L_h,j(t) represents the power of the j-th thermal load at time t, and N_h represents the total number of thermal loads;
constraint conditions of the electric energy storage system:
P_B,cha,min ≤ P_B,cha(t) ≤ P_B,cha,max
P_B,dis,min ≤ P_B,dis(t) ≤ P_B,dis,max (12)
B_min ≤ B(t) ≤ B_max
SOC_B,min ≤ SOC_B(t) ≤ SOC_B,max
wherein: P_B,cha,min, P_B,cha,max and P_B,dis,min, P_B,dis,max respectively represent the minimum and maximum charging power and the minimum and maximum discharging power of the energy storage system; B_min and B_max respectively represent its minimum and maximum allowable capacities; and SOC_B,min and SOC_B,max respectively represent its minimum and maximum states of charge;
to ensure the operational stability of the main grid side, the real-time power interaction constraint with the main grid must be satisfied:
P_mg,min ≤ P_mg(t) ≤ P_mg,max (13)
wherein: P_mg,min and P_mg,max respectively represent the lower and upper limits of the interaction power between the comprehensive energy system and the main grid.
3. The scheduling decision model establishing method based on the SumTree-TD3 algorithm according to claim 1, wherein the Markov decision model with the comprehensive energy system low-carbon economic scheduling characteristics is established as follows:
in the preset scenario of the Markov decision model, the state space set consists of the distributed power output, the state of charge of the battery energy storage system, the electricity price information, and the two types of load demands; the state space s(t) is expressed as follows:
wherein: P_DG(t) represents the total output power of the photovoltaic power supply and the wind turbine generator set at each time t;
an agent is constructed that can, at each time t, schedule the output of the gas turbine and the gas boiler, the charging and discharging of the battery energy storage system, and the electricity purchased from and sold to the main grid; the action space a(t) can therefore be expressed as:
a(t) = [P_GT(t), Q_GB(t), B_a(t), P_mg(t)] (15)
wherein: B_a(t) represents the charging and discharging action quantity of the battery energy storage system;
the reward value function is set as the negative of the corresponding objective function, and the electric and thermal power imbalance caused by equipment output is added to the reward value function as a penalty function, specifically:
r(t) = -β_c Σ_{i=1}^{4} α_i C_i(t) - β_g g(t) (16)
wherein: C_i(t) (i = 1, 2, 3, 4) respectively correspond to the gas purchase cost, the environmental pollution treatment cost, the operation and maintenance cost, and the cost of energy trading with the main grid in each scheduling period t; α_i represents the reward value weight of the corresponding cost; g(t) represents the penalty function; and β_c and β_g represent the reward value function and penalty function coefficients;
the power imbalance penalty function is expressed as:
g(t) = λ_P ε_P(t) + λ_Q ε_Q(t)
wherein: λ_P and λ_Q respectively represent the penalty factors of the electric power and thermal power constraint conditions, and ε_P(t) and ε_Q(t) respectively represent the degree of imbalance of the two types of constraints.
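A minimal sketch of the reward shaping described above (objective costs taken negative, plus a weighted imbalance penalty); the weights α_i, β_c, β_g, λ_P, and λ_Q below are invented placeholders, not the patent's tuned values:

```python
# Sketch of the reward value function and power imbalance penalty.
# All weights are illustrative assumptions.
ALPHA = [0.25, 0.25, 0.25, 0.25]   # per-cost reward weights alpha_i (assumed)
BETA_C, BETA_G = 1.0, 10.0         # reward / penalty coefficients (assumed)
LAM_P, LAM_Q = 1.0, 1.0            # electric / thermal penalty factors (assumed)

def reward(costs, eps_p, eps_q):
    """costs = [gas, environment, O&M, grid trade]; eps_* are imbalance magnitudes."""
    g = LAM_P * eps_p + LAM_Q * eps_q                         # imbalance penalty g(t)
    return -BETA_C * sum(a * c for a, c in zip(ALPHA, costs)) - BETA_G * g
```

A large β_g relative to β_c steers the agent toward satisfying the power balance constraints before optimizing cost, which is the usual design choice for soft-constraint rewards.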
4. The scheduling decision model establishing method based on the SumTree-TD3 algorithm according to claim 1, wherein the scheduling decision model of the improved twin delayed deep deterministic policy gradient (TD3) algorithm is established as follows:
data storage sampling based on SumTree;
the Critic network of the scheduling decision model uses the action-value function to calculate the TD-error:
δ = r_t + γ_Q Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) (20)
wherein: γ_Q is the discount factor; Q(s_t, a_t) represents the action-value function; s_{t+1} and s_t respectively represent the states at times t+1 and t; and a_{t+1} and a_t respectively represent the actions taken at times t+1 and t;
the TD-error of each piece of experience data is taken as the priority index of the data, giving the probability that the data is sampled:
ρ_l = δ_l^υ / Σ_k δ_k^υ (21)
wherein: ρ_l and δ_l respectively represent the sampling priority probability of the l-th piece of experience data and its corresponding TD-error; υ = 0 corresponds to uniform sampling and υ = 1 to greedy-policy sampling;
newly added experience data are initialized as:
δ_{l,0} = δ_max (22)
wherein: δ_{l,0} represents the TD-error of the l-th piece of experience data added to the experience replay buffer Β, and δ_max represents the maximum TD-error in the experience replay buffer Β;
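The SumTree storage and proportional sampling described above can be sketched as a plain Python class; this is a generic priority sum tree of the kind used for prioritized experience replay, not the patent's exact implementation:

```python
import random

class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold subtree sums (a sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # heap layout; leaves at [capacity-1:]
        self.data = [None] * capacity           # experience tuples aligned with leaves
        self.write = 0                          # next leaf to (over)write, ring-buffer style

    def add(self, priority, item):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                 # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self):
        """Draw one leaf with probability proportional to its priority."""
        s = random.uniform(0.0, self.tree[0])   # tree[0] is the total priority
        idx = 0
        while idx < self.capacity - 1:          # descend from root to a leaf
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]
```

Both `add` and `update` cost O(log capacity), which is why the SumTree makes priority-weighted sampling practical for large replay buffers.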
the agent training process is as follows:
(1) Initialize the three online network parameters θ_1, θ_2, and φ, and initialize three target networks with the same parameter values: θ_1' ← θ_1, θ_2' ← θ_2, φ' ← φ;
(2) Set the capacity of the experience replay buffer Β and the number N of data sampled during training;
(3) Acquire experience data tuples and add them to Β, specifically:
a: randomly take an initial state s_t from the historical data;
b: π_φ selects an action a_t in state s_t in combination with noise x:
a_t = π_φ(s_t) + x, x ~ N(0, σ)
c: action a_t interacts with the environment to obtain the reward value r_t and the next state s_{t+1}, forming the data tuple (s_t, a_t, r_t, s_{t+1});
d: the δ of each piece of data is used as its priority index and stored in the SumTree leaf nodes in the order in which data are added, while the node values of the related nodes are updated;
e: check the number of experience data in Β; if it has not reached the set capacity limit, take the current s_{t+1} as the s_t of step b and repeat steps b to e; otherwise, the addition is finished and each piece of data is given the maximum δ in Β;
(4) Sample N pieces of data from Β in the SumTree sampling mode; for each piece of data, π_φ' adds a noise x' based on target policy smoothing regularization to obtain the target action a_{t+1} corresponding to s_{t+1}:
a_{t+1} = π_φ'(s_{t+1}) + x', x' ~ clip[N(0, σ'), -ψ, ψ]
(5) Input the obtained s_{t+1}, a_{t+1} and the observed reward r_{t+1} into the two Critic target networks to calculate the target value y_t:
y_t = r_{t+1} + γ_Q min_{i=1,2} Q_{θ_i'}(s_{t+1}, a_{t+1})
(6) Minimize the error between the target value and the observed value based on the gradient descent algorithm, thereby updating the two Critic online network parameters:
θ_i ← argmin_{θ_i} N^{-1} Σ (y_t - Q_{θ_i}(s_t, a_t))^2, i = 1, 2
(7) Soft-update the target network parameters at learning rate τ_1 by computing a weighted average of the online network and target network parameters:
θ_i' ← τ_1 θ_i + (1 - τ_1) θ_i', i = 1, 2
(8) Recalculate the δ of the sampled data and update the node values of the leaf nodes where they are stored and of the related nodes;
(9) After the Critic networks have been updated d times, update the parameter φ of the Actor online network by the gradient descent algorithm:
∇_φ J(φ) = N^{-1} Σ ∇_a Q_{θ_1}(s_t, a)|_{a=π_φ(s_t)} ∇_φ π_φ(s_t)
(10) Soft-update the Actor target network parameters at learning rate τ_2:
φ' ← τ_2 φ + (1 - τ_2) φ'
and (4) to (10) above are circulated, and the prize value is recorded.
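The soft (Polyak) updates of steps (7) and (10) and the clipped double-Q target of step (5) can be sketched in scalar form; τ and γ_Q mirror the parameter settings reported earlier (0.005 and 0.95), and the per-parameter list representation is an illustrative simplification of real network weights:

```python
# Minimal sketch of the soft target update (steps (7)/(10)) and the
# clipped double-Q target value (step (5)).  tau and gamma follow the
# values quoted in the description; everything else is a simplification.
TAU, GAMMA = 0.005, 0.95

def soft_update(target, online, tau=TAU):
    """theta' <- tau*theta + (1-tau)*theta', element-wise over parameter lists."""
    return [tau * o + (1 - tau) * t for o, t in zip(online, target)]

def td3_target(reward, q1_next, q2_next, gamma=GAMMA):
    """y_t = r + gamma * min(Q1', Q2'): the clipped double-Q estimate curbs overestimation."""
    return reward + gamma * min(q1_next, q2_next)
```

Taking the minimum of the two Critic targets is what distinguishes TD3 from DDPG here: it biases the target value downward, counteracting the overestimation that a single Critic tends to accumulate.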
CN202311320628.5A 2023-10-12 2023-10-12 Scheduling decision model establishment method based on SumTree-TD3 algorithm Pending CN117291390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311320628.5A CN117291390A (en) Scheduling decision model establishment method based on SumTree-TD3 algorithm


Publications (1)

Publication Number Publication Date
CN117291390A true CN117291390A (en) 2023-12-26

Family

ID=89240692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311320628.5A Pending CN117291390A (en) Scheduling decision model establishment method based on SumTree-TD3 algorithm


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117993693A (en) * 2024-04-03 2024-05-07 国网江西省电力有限公司电力科学研究院 Zero-carbon park scheduling method and system for behavior clone reinforcement learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination