CN116562423A - Deep reinforcement learning-based electric-thermal coupling new energy system energy management method - Google Patents


Info

Publication number: CN116562423A
Application number: CN202310315494.1A
Authority: CN (China)
Prior art keywords: formula, expressed, representing, constraint, unit
Legal status: Pending
Inventors: 毋格一, 杨远超, 安雯静
Applicant and original assignee: Xian Polytechnic University
Other languages: Chinese (zh)
Priority to CN202310315494.1A

Classifications

    • H02J3/28 — Arrangements for balancing of the load in a network by storage of energy
    • H02J3/38 — Arrangements for parallelly feeding a single network by two or more generators, converters or transformers
    • F28D20/00 — Heat storage plants or apparatus in general
    • F28F27/00 — Control arrangements or safety devices specially adapted for heat-exchange or heat-transfer apparatus
    • G06N3/045 — Neural network architectures: combinations of networks
    • G06N3/047 — Neural network architectures: probabilistic or stochastic networks
    • G06N3/048 — Neural network architectures: activation functions
    • G06N3/084 — Learning methods: backpropagation, e.g. using gradient descent
    • G06N3/092 — Learning methods: reinforcement learning
    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming
    • G06Q10/06313 — Resource planning in a project environment
    • G06Q50/06 — Energy or water supply
    • H02J2203/20 — Simulating, e.g. planning, reliability check, modelling or computer-assisted design
    • H02J2300/28 — Decentralized generation of renewable origin, the renewable source being wind energy
    • Y04S10/50 — Systems or methods supporting power network operation or management, involving interaction with the load-side end user


Abstract

The invention discloses an energy management method for an electric-thermal coupled new-energy system based on deep reinforcement learning. The method establishes an objective function and the corresponding constraint conditions of an optimized-operation model for the electric-thermal coupled new-energy system; expresses the established model as a Markov decision process, defines it as the environment under a deep reinforcement learning framework, and designs a corresponding reward-function mechanism; and uses an improved multi-threaded PPO algorithm to maximize the expected cumulative reward, yielding an optimal energy management strategy. The method incorporates a pumped-storage unit and a cogeneration unit into the electric-thermal coupled new-energy system and decouples the electric-heat relationship of the cogeneration unit; by applying the multi-threaded PPO deep reinforcement learning algorithm to the energy management problem of the electric-thermal coupled new-energy system, it can improve the utilization rate of renewable energy sources.

Description

Deep reinforcement learning-based electric-thermal coupling new energy system energy management method
Technical Field
The invention belongs to the technical field of energy system optimization control methods, and particularly relates to an electric-thermal coupling new energy system energy management method based on deep reinforcement learning.
Background
In recent years, energy demand has risen sharply and environmental problems have grown increasingly severe, so accelerating the development of renewable energy sources, represented by wind and solar power, and promoting the energy transition is a concern of many countries. The large-scale grid connection of renewable energy sources with strong uncertainty brings great challenges to the optimized operation of the traditional power system. The electric-thermal coupled system (electric-heat combined system) realizes complementary energy utilization and, by virtue of its flexible energy-supply characteristic, provides an effective way to further improve the utilization rate of renewable energy sources.
At present, the industry commonly adopts traditional mathematical programming algorithms and heuristic algorithms. When solving the day-ahead optimal scheduling problem, such methods depend on accurate forecasts of future load and renewable output; they are also limited by their computation speed and are therefore suitable only for systems with low requirements on solution time.
Another class of artificial-intelligence-based, learning-driven algorithms, reinforcement learning (Reinforcement Learning, RL) and deep reinforcement learning (Deep Reinforcement Learning, DRL), trains agents to find optimal strategies in complex situations by constantly interacting with the environment through trial and error. Moreover, a trained agent does not need to rely on prediction information. At present, applying RL to the economic scheduling problem of the electric-thermal coupled new-energy system is a hot research direction in academia.
Disclosure of Invention
The invention aims to provide an energy management method of an electric-thermal coupling new energy system based on deep reinforcement learning, which can improve the utilization rate of renewable energy sources.
The technical scheme adopted by the invention is as follows: the electric-thermal coupling new energy system energy management method based on deep reinforcement learning comprises the following steps:
step 1, establishing an optimized operation model objective function and response constraint conditions aiming at an electric-thermal coupling new energy system;
step 2, expressing the model established in the step 1 as a Markov decision process, defining the Markov decision process as an environment under a deep reinforcement learning framework, and designing a corresponding rewarding function mechanism;
and step 3, using an improved multi-threaded PPO algorithm to maximize the expected cumulative reward, thereby obtaining an optimal energy management strategy.
The present invention is also characterized in that,
the optimal operation model established in the step 1 aims at minimizing operation cost of the dispatching cycle of the electric-thermal coupling new energy system, and the objective function is expressed as:
F=min(f 1 +f 2 ) (1)
in the formula (1), F is a system optimization operation target, F 1 Representing the running cost of the system within a scheduling day, expressed as:
f 1 =C g +C chp (2)
in the formula (2), C g C is the running cost of the thermal power generating unit chp The operation cost of the cogeneration unit is;
in the formula (3), the amino acid sequence of the compound,representing the running cost of the thermal power unit at the time t, P t g Represents the output of the thermal power unit at the moment t, alpha ggg Is the energy consumption coefficient of the thermal power generating unit;
in the formula (4), the amino acid sequence of the compound,representing the running cost of the cogeneration unit at the time t, and P t chp Representing the active power of the cogeneration unit at the t moment,/->The heat output of the cogeneration unit at the moment t is represented by a 0 ,a 1 ,a 2 ,a 3 ,a 4 ,a 5 ,a 6 The energy consumption coefficient of the cogeneration unit;
in the formula (1), f 2 Representing penalty function terms, expressed as:
f 2 =λd p +γd h (5)
in the formula (5), λd p Punishment for the grid, d p Is equal in value to the wind curtailed power or the cut-off load; γd h Punishment for heat supply network, d h Equal in value to the square of the rejected heat load.
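As a concrete illustration, the scheduling-day objective of formulas (1)-(5) can be sketched in Python. This is a minimal sketch, assuming the standard quadratic thermal-unit cost form; all function names and coefficient values are illustrative, not taken from the patent:

```python
def thermal_cost(p_g, alpha, beta, gamma):
    # Quadratic fuel cost of the thermal unit for one period (cf. formula (3))
    return alpha * p_g ** 2 + beta * p_g + gamma

def penalty(lam, d_p, gam, d_h):
    # f2 = lambda*d_p + gamma*d_h (formula (5)); d_p is the curtailed wind power
    # or shed electric load, d_h the squared rejected heat load
    return lam * d_p + gam * d_h

def day_objective(p_g_series, chp_costs, lam, d_p, gam, d_h, g_coef):
    # F = f1 + f2 (formula (1)): running cost over the scheduling day plus penalties
    f1 = sum(thermal_cost(p, *g_coef) for p in p_g_series) + sum(chp_costs)
    return f1 + penalty(lam, d_p, gam, d_h)
```

The cogeneration costs enter here as a precomputed per-period list, since formula (4) depends on both the electric and heat outputs of the unit.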
The response constraints in step 1 include:
Thermal power unit output constraint, expressed as:
P_g^min ≤ P_t^g ≤ P_g^max (6)
In formula (6), P_g^max and P_g^min are the maximum and minimum output of the thermal power unit, respectively;
Thermal power unit ramping constraint R_g, expressed as:
−R_g^down ≤ P_t^g − P_{t−1}^g ≤ R_g^up (7)
In formula (7), R_g^up and R_g^down are the ramp-up and ramp-down rates of the thermal power unit, respectively;
Cogeneration unit active power output constraint, expressed as:
P_chp^min ≤ P_t^chp ≤ P_chp^max (8)
In formula (8), P_chp^max and P_chp^min represent the maximum and minimum active output of the cogeneration unit;
Cogeneration unit heat output constraint, expressed as:
0 ≤ H_t^chp ≤ H_chp^max (9)
In formula (9), H_chp^max represents the maximum heat output of the cogeneration unit;
Pumped-storage station pumping power constraint, expressed as:
P_h.p^min ≤ P_t^h.p ≤ P_h.p^max (10)
In formula (10), P_h.p^max and P_h.p^min represent the maximum and minimum pumping power of the pumped-storage station;
Pumped-storage station generating power constraint, expressed as:
P_h.g^min ≤ P_t^h.g ≤ P_h.g^max (11)
In formula (11), P_h.g^max and P_h.g^min represent the maximum and minimum generating power of the pumped-storage station;
Pumped-storage station upstream reservoir capacity constraint, expressed as:
V_u^min ≤ V_t^u ≤ V_u^max (12)
In formula (12), V_u^max and V_u^min represent the maximum and minimum reservoir capacity of the upstream reservoir available for power generation;
Pumped-storage station downstream reservoir capacity constraint, expressed as:
V_d^min ≤ V_t^d ≤ V_d^max (13)
In formula (13), V_d^max and V_d^min represent the maximum and minimum reservoir capacity of the downstream reservoir available for pumping;
Pumped-storage station upper/lower reservoir state-transition constraints, expressed as:
V_{t+1}^u = V_t^u − X_h·P_t^h.g·Δt/η_g + Y_h·η_p·P_t^h.p·Δt (14)
V_{t+1}^d = V_t^d + X_h·P_t^h.g·Δt/η_g − Y_h·η_p·P_t^h.p·Δt (15)
In formula (14), V_t^u and V_t^d represent the water volume of the upper and lower reservoirs at time t, and X_h ∈ {0,1} represents the generating state; in formula (15), Y_h ∈ {0,1} represents the pumping state, with X_h + Y_h = 1; Δt is the scheduling period, and η_g and η_p are the generating and pumping efficiency factors of the pumped-storage station;
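The reservoir dynamics of formulas (14)-(15) conserve water between the two basins. A minimal sketch, assuming generation draws Δt·P/η_g from the upper reservoir and pumping returns η_p·Δt·P (the exact placement of the efficiency factors is an assumption for illustration):

```python
def reservoir_step(v_u, v_d, p_gen, p_pump, x_h, y_h, dt, eta_g, eta_p):
    # x_h, y_h in {0, 1}: generating XOR pumping (x_h + y_h == 1)
    assert x_h + y_h == 1
    # Water moved from the upper to the lower reservoir this period
    flow = x_h * p_gen * dt / eta_g - y_h * eta_p * p_pump * dt
    return v_u - flow, v_d + flow
```

Because the same `flow` term is subtracted from one reservoir and added to the other, total water volume is invariant across periods, matching the paired form of (14) and (15).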
Wind turbine output constraint, expressed as:
P_w^min ≤ P_t^w ≤ P_w^max (16)
In formula (16), P_w^max and P_w^min represent the maximum and minimum output of the wind turbine;
Heat-storage device charging and heat-supply power constraints, expressed as:
0 ≤ H_t^s.c ≤ H_s.c^max, 0 ≤ H_t^s.d ≤ H_s.d^max (17)
In formula (17), H_s.c^max represents the maximum charging power of the heat-storage tank and H_s.d^max the maximum heat-supply power of the heat-storage tank; formula (18) constrains the binary state variables so that charging and heat supply cannot be active at the same time;
Heat-storage tank state-transition constraint, expressed as:
S_{t+1}^hs = S_t^hs − H_t^s·Δt (19)
In formula (19), H_t^s denotes the heat-supply condition of the heat-storage tank at time t: a value greater than zero means heat is supplied to the heat load, and a value less than zero means surplus heat of the cogeneration unit is absorbed;
Heat-storage tank output constraint, expressed as:
H_s^min ≤ H_t^s ≤ H_s^max (20)
In formula (20), H_s^max and H_s^min represent the maximum and minimum heat supply/storage of the heat-storage tank.
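Most of the constraints above are box constraints plus a ramping condition, so a simple feasibility layer can project an agent's raw actions onto their bounds. A sketch with illustrative variable names, covering the pattern of constraints (6), (8)-(13), (16) and (20):

```python
def clip_box(x, lo, hi):
    # Enforce a box constraint lo <= x <= hi by projection
    return max(lo, min(hi, x))

def ramp_ok(p_now, p_prev, r_up, r_down):
    # Ramping constraint (formula (7)): -r_down <= p_now - p_prev <= r_up
    return -r_down <= p_now - p_prev <= r_up

def project_action(raw, limits):
    # `limits` maps each decision variable name to its (min, max) pair;
    # the names are illustrative, not the patent's
    return {k: clip_box(v, *limits[k]) for k, v in raw.items()}
```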
Step 2 specifically comprises the following: the state and action spaces of the Markov decision process for the model obtained in step 1 are defined and expressed by the five-tuple (S, A, P, R, γ), where S is the state-space set, A the action-space set, P : S_t × A_t → S_{t+1} the state-transition probability, R the reward function, and γ ∈ [0,1] the discount factor. Then:
The Markov decision process state space is expressed as:
s_t = (P_{t−1}^g, P_{t−1}^chp, H_{t−1}^chp, V_{t−1}^h, S_{t−1}^hs, P_t^w) (21)
In formula (21), P_{t−1}^g represents the output of the thermal power unit at the end of the last scheduling period, P_{t−1}^chp and H_{t−1}^chp the active power output and heat output of the cogeneration unit at the end of the last scheduling period, V_{t−1}^h and S_{t−1}^hs the reserve states of the pumped-storage station and the heat-storage device at the end of the last scheduling period, and P_t^w the predicted active power generated by wind power;
The Markov decision process action space is expressed as:
a_t = (P_t^g, P_t^chp, H_t^chp, P_t^h) (22)
In formula (22), P_t^g represents the output of the thermal power unit at time t, P_t^chp and H_t^chp the active power output and heat output of the decoupled cogeneration unit respectively, and P_t^h the action of the pumped-storage station at time t;
The Markov decision process reward function is expressed as the negative value of the objective function:
r_t(s_t, a_t) = −F (23).
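Under the deep reinforcement learning framework, the model of step 1 becomes the environment. A minimal Gym-style skeleton, in which the cost and transition callbacks and the state encoding are placeholders rather than the patent's implementation, would reward each step with −F per formula (23):

```python
class ElectroThermalEnv:
    """Skeleton environment for the MDP of formulas (21)-(23)."""

    def __init__(self, cost_fn, transition_fn, initial_state):
        self.cost_fn = cost_fn              # computes F for (state, action)
        self.transition_fn = transition_fn  # computes the next system state
        self.state = initial_state

    def step(self, action):
        cost = self.cost_fn(self.state, action)
        self.state = self.transition_fn(self.state, action)
        return self.state, -cost            # r_t(s_t, a_t) = -F
```

With scalar dummy callbacks, `env.step(a)` returns the accumulated state and the negated per-period cost, which is all the PPO agent needs to maximize the expected cumulative reward.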
the step 3 specifically comprises the following steps:
step 3.1, according to a formula (21) and real system operation data, randomly initializing different system states including states of a pumped storage unit, a heat storage device, a thermal power unit and a cogeneration unit in a constraint range, collecting system state information by adopting multiple threads, setting the number of collected data threads to be 4, and storing the state information into a sample pool;
step 3.2, initializing the weight of the neural network; setting the actor and critic neural network learning rate C of the main PPO algorithm actor =0.0003,C critci =0.001, rewarding discount factorDominance function shear coefficient epsilon=0.2; both the actor and critic network structures include input layers: the number of neurons is the number of system states, two hidden layers: the number of neurons is 64, the output layer: the number of the neurons is the number of actions, and the hyperbolic tangent function is adopted by each layer of neural network activation function;
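The actor/critic architecture of step 3.2 (input size equal to the number of system states, two 64-neuron hidden layers, tanh activations throughout) can be sketched in plain NumPy; the weight initialisation here is illustrative:

```python
import numpy as np

def make_mlp(sizes, seed=0):
    # sizes e.g. [n_states, 64, 64, n_actions], as described in step 3.2
    rng = np.random.default_rng(seed)
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Hyperbolic tangent activation applied to every layer
    for w, b in params:
        x = np.tanh(x @ w + b)
    return x
```

The actor would map this output to an action distribution π(a|s), and the critic would use an identical structure with a single output neuron for the state value.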
Step 3.3: input the system-state information collected in the sample pool into the actor network, which outputs the corresponding control strategy π(a|s), i.e. the action distribution; according to formula (22), the elements of the action vector a_t correspond to the next-period actions of each unit in the system;
Step 3.4: apply the actions to the current states of each unit to obtain the next-period state vector s_{t+1}, compute the instant reward r_t according to formula (23), and collect the resulting tuple (s_t, a_t, r_t, s_{t+1}) into the sample pool;
Step 3.5: the critic value network computes the total state value V(t) and the state-action value Q(t) from the tuples (s_t, a_t, r_t, s_{t+1}) in the sample pool; steps 3.2-3.4 are cycled until the scheduling day ends;
Step 3.6: update the neural-network parameters: compute the advantage function A^π(s, a) = Q^π(s, a) − V^π(s), and update the actor and critic network parameters by gradient descent and backpropagation;
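Step 3.6's update plugs the advantage A(s, a) = Q(s, a) − V(s) into PPO's clipped surrogate objective. A NumPy sketch, using ε = 0.2 from step 3.2 (the rest of the names are illustrative):

```python
import numpy as np

def advantage(q, v):
    # A(s, a) = Q(s, a) - V(s)
    return q - v

def ppo_clip_loss(ratio, adv, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); the actor minimizes the negated
    # clipped surrogate objective by gradient descent
    surrogate = np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)
    return -float(np.mean(surrogate))
```

Clipping the probability ratio to [1 − ε, 1 + ε] prevents destructively large policy updates from any single batch of scheduling experience.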
Step 3.7: cycle steps 3.2-3.6 until all training days are covered, and save the trained model.
The beneficial effects of the invention are as follows: the deep reinforcement learning-based energy management method for the electric-thermal coupled new-energy system considers a pumped-storage unit and a cogeneration unit within the system and decouples the electric-heat relationship of the cogeneration unit; by applying the multi-threaded PPO deep reinforcement learning algorithm to the energy management problem of the electric-thermal coupled new-energy system, it can improve the utilization rate of renewable energy sources.
Drawings
FIG. 1 is a model diagram of an electro-thermal coupling new energy system applied to the deep reinforcement learning-based electro-thermal coupling new energy system energy management method of the present invention;
FIG. 2 is a flow chart of an improved multithreading PPO algorithm employed by the deep reinforcement learning-based electric-thermal coupling new energy system energy management method of the present invention;
FIG. 3 a) is a schematic diagram of an application of the deep reinforcement learning-based method for energy management of an electric-thermal coupling new energy system according to the present invention;
FIG. 3 b) is a schematic diagram of an application of the deep reinforcement learning-based energy management method for an electric-thermal coupling new energy system according to the present invention.
Detailed Description
The invention will be described in detail with reference to the accompanying drawings and detailed description.
The invention provides an energy management method for an electric-thermal coupled new-energy system based on deep reinforcement learning, aimed at the electric-thermal coupled new-energy system shown in FIG. 1, which comprises two networks: the electric power network contains a thermal power unit and a pumped-storage unit, and the heat network contains a heat-storage device. The two networks are interconnected through the cogeneration unit.
Step 1, aiming at an electric-thermal coupling new energy system optimizing operation model, establishing an objective function and response constraint conditions of the electric-thermal coupling new energy system, wherein the objective function and response constraint conditions comprise electric and thermal load balance constraint, upper and lower limit constraint of output of a thermal power unit and a cogeneration unit, climbing constraint and reserve capacity constraint of energy storage;
The objective function and the response constraints are established as given in formulas (1)-(20) above.
And 2, expressing the model established in the step 1 as a Markov decision process, defining the Markov decision process as an environment under a deep reinforcement learning framework, and designing a corresponding reward function mechanism. The method comprises the following steps:
the state and action space of the Markov decision process of the model obtained in the step 1 are defined and expressed by a five-tuple (S, A, P, R, gamma), S is a state space set, A is an action space set, and P is S t-τ ×A t →S t Is the state transition probability, R is the reward function, gamma E [0,1 ]]Is the learning rate; then:
the markov decision process state space is expressed as:
in the formula (21), the amino acid sequence of the amino acid,indicating the output of the thermal power generating unit at the end of the last scheduling period,/->And->Indicating the active power output and the heat output of the cogeneration unit at the end of the last scheduling period, and +.>And->Respectively representing the reserve states of the pumped storage power station and the heat storage device at the end of the last scheduling period, and +.>Representing predicted active power generated by wind power;
the Markov decision process action space is expressed as:
in formula (22), P_t^g denotes the thermal unit output at time t. To meet the heat-load demand, the cogeneration unit is forced to generate a large amount of electric power whenever it raises its heat output, so the unit's output must be thermally decoupled. The specific operation is as follows: according to the polygonal operating model of the cogeneration unit, a heat storage device is attached to the unit. The feasible thermal-electric operating region of the unit is thereby enlarged in the heat-output direction, so that the adjustable range of the electric output widens when the heat load is stable but the electric load fluctuates strongly; this greatly increases the peak-regulation capacity of the system and improves its ability to absorb wind power. P_t^chp and H_t^chp denote the active power output and heat output of the decoupled cogeneration unit, respectively, and P_t^h denotes the action of the pumped-storage power station at time t;
the Markov decision process reward function is expressed as the negative of the objective function:
r_t(s_t, a_t) = -F (23).
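The Markov decision process of formulas (21)-(23) can be summarized as a gym-style environment skeleton. The state layout, the horizon of 96 fifteen-minute periods, and the placeholder cost function below are illustrative assumptions; the real objective is the cost function of formula (1):

```python
import numpy as np

class ElectricThermalEnv:
    """Minimal sketch of the MDP in formulas (21)-(23).

    The state gathers last-period unit outputs, storage reserves, and the
    wind-power forecast; the reward is the negative of the objective F.
    All internals here are illustrative assumptions, not the patented model.
    """

    def __init__(self, wind_forecast, horizon=96):
        self.wind_forecast = wind_forecast  # predicted wind power per period
        self.horizon = horizon              # e.g. 96 periods of 15 minutes
        self.t = 0

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        # [thermal output, CHP power, CHP heat, pumped-storage reserve,
        #  tank reserve, wind forecast] -- the layout of formula (21)
        return np.array([0.0, 0.0, 0.0, 0.5, 0.5, self.wind_forecast[self.t]])

    def _objective_F(self, action):
        # Placeholder for formula (1): running cost plus penalty terms
        return float(np.sum(np.square(action)))

    def step(self, action):
        # action = [thermal output, CHP power, CHP heat, pumped-storage action]
        reward = -self._objective_F(action)   # formula (23)
        self.t += 1
        done = self.t >= self.horizon
        s_next = None if done else self._state()
        return s_next, reward, done
```

The agent interacts with this environment exactly as in step 3: it observes the state of formula (21), issues the action of formula (22), and receives the reward of formula (23).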
and 3, using an improved multithreading PPO algorithm to maximize the return rewards of the selected actions in the training round, so as to maximize the accumulated rewards as expected, and ensuring that the learned energy management strategy is optimal. The algorithm flow as shown in fig. 2 includes the following steps:
step 3.1, according to formula (21) and real system operation data, randomly initialize different system states within the constraint ranges, including the states of the pumped-storage unit, heat storage device, thermal unit, and cogeneration unit; collect system state information with multiple threads, setting the number of data-collection threads to 4, and store the state information in a sample pool;
step 3.2, initialize the neural network weights; set the actor and critic learning rates of the main PPO algorithm to C_actor = 0.0003 and C_critic = 0.001, the reward discount factor to γ = 0.95, and the advantage-function clipping coefficient to ε = 0.2; both the actor and critic networks consist of an input layer whose neuron count equals the number of system states, two hidden layers of 64 neurons each, and an output layer whose neuron count equals the number of actions, with every layer using the hyperbolic tangent activation function;
step 3.3, input the system state information collected in the sample pool into the actor network and output the corresponding control strategy π(a|s), i.e., the action distribution; from formula (22), obtain the action vector a_t, whose elements correspond to the next-moment action of each unit in the system;
step 3.4, apply the actions to the current state of each unit to obtain the next-moment state vector s_{t+1}, compute the instant reward r_t according to formula (23), and store the resulting tuple (s_t, a_t, r_t, s_{t+1}) in the sample pool;
step 3.5, the critic value network computes the state value V(t) and the state-action value Q(t) from the tuples (s_t, a_t, r_t, s_{t+1}) in the sample pool; repeat steps 3.2-3.4 until the scheduling day ends;
step 3.6, compute the advantage function A_π(s, a) = Q_π(s, a) - V_π(s) and update the actor and critic network parameters by gradient descent and back-propagation;
step 3.7, repeat steps 3.2-3.6 until all training days are covered, and save the trained model.
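Steps 3.1-3.7 can be sketched end to end in plain Python. The network shape (two tanh hidden layers of 64 units), the advantage A = Q - V, and the ε = 0.2 clipped objective come from the text above, while the weight initialization, the 4-thread worker stubs, and all numeric sizes are illustrative assumptions rather than the patent's actual implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def init_mlp(n_in, n_out, hidden=64, seed=0):
    """Step 3.2: input layer, two tanh hidden layers of 64 neurons, output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, hidden, hidden, n_out]
    return [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Hyperbolic-tangent activations on the hidden layers, as in step 3.2."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def advantage(q, v):
    """Step 3.6: A_pi(s, a) = Q_pi(s, a) - V_pi(s)."""
    return np.asarray(q) - np.asarray(v)

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Clipped PPO surrogate with clipping coefficient eps = 0.2 (step 3.2);
    ratio = pi_new(a|s) / pi_old(a|s)."""
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()

def collect(worker_id, n=5):
    """Step 3.1 stub: each of 4 threads would roll out the environment and
    return (s_t, a_t, r_t, s_{t+1}) tuples for the shared sample pool."""
    return [(worker_id, k) for k in range(n)]

actor = init_mlp(6, 4)    # 6 assumed system states, 4 assumed unit actions
critic = init_mlp(6, 1)
with ThreadPoolExecutor(max_workers=4) as pool:
    sample_pool = [s for batch in pool.map(collect, range(4)) for s in batch]
```

In the full algorithm the collected tuples feed the critic's V(t) and Q(t) estimates, and the clipped objective is maximized by gradient ascent on the actor parameters.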
By the above method, the deep-reinforcement-learning-based energy management of the electric-thermal coupled new energy system proceeds as follows. First, for the optimized operation model of the electric-thermal coupled new energy system, the model objective function and corresponding constraint conditions are established, including electric and heat load-balance constraints, unit output upper/lower-limit constraints, ramping constraints, and energy-storage reserve-capacity constraints. Next, under the deep reinforcement learning framework, the Markov decision process of the model, i.e., the environment, is defined and a corresponding reward-function mechanism is designed, which avoids explicitly modeling the complex multi-energy system. Finally, the proximal policy optimization (PPO) algorithm is used, with a multithreaded mode added to its data-collection stage, so that training efficiency improves greatly without loss of convergence performance; this sharply reduces the training cost of performing energy management of the electric-thermal coupled new energy system with deep reinforcement learning. By constantly interacting with the environment and learning the scheduling strategy, the algorithm can realize system energy management under a variety of complex conditions and improve the utilization rate of renewable energy.
Fig. 3 a) and 3 b) show the energy allocation plan at 15-min granularity under day-ahead economic dispatch, where Fig. 3 a) gives the unit electric-output dispatch result and Fig. 3 b) the unit heat-output dispatch result. To make the results general, the application case uses real power-system load, new-energy output, and heat-system heat-load data, and one set of typical summer daily load data is input for testing. For accuracy, the first and last 15-min energy distribution schemes of the test day are discarded.
As shown in fig. 3 a), thermal units 2 and 3 carry most of the electric load for most of the day thanks to their superior economy, while thermal unit 1 runs at its most economical generating point for most of the day. Because of the wind-curtailment penalty term, the joint dispatch of the thermal units and the pumped-storage unit achieves maximum wind-power absorption. To guarantee heat-supply reliability, and limited by economy and the feasible operating region of the cogeneration unit, the unit's heat output fluctuates strongly under different heat loads, which in turn causes fluctuations in its electric output; the thermal units relieve these fluctuations by adjusting their own output. Meanwhile, considering that the water supply is plentiful in summer and the pumped-storage station can appropriately increase its water release, the pumping cost of the pumped-storage unit is adjusted downward in the typical summer test case. As shown in fig. 3 b), the cogeneration unit output is determined with the aim of minimizing cost; at the same time, the heat storage tank (TST) in the system shares part of the heat load, allowing the cogeneration unit to operate in a more economical range. The energy distribution strategy obtained by the invention has an operating cost of 682,581 yuan, while testing the same embodiment with a conventional mixed-integer optimization algorithm yields 683,754 yuan; the method of the invention is therefore comparable to the traditional algorithm in economic performance.

Claims (5)

1. The electric-thermal coupling new energy system energy management method based on deep reinforcement learning is characterized by comprising the following steps of:
step 1, establishing the objective function and corresponding constraint conditions of the optimized operation model for the electric-thermal coupled new energy system;
step 2, expressing the model established in the step 1 as a Markov decision process, defining the Markov decision process as an environment under a deep reinforcement learning framework, and designing a corresponding rewarding function mechanism;
and step 3, using the improved multithreaded PPO algorithm to obtain the optimal energy management strategy, taking the maximum cumulative reward as the expectation.
2. The deep reinforcement learning-based energy management method of the electric-thermal coupling new energy system according to claim 1, wherein the optimized operation model established in step 1 takes minimizing the operating cost of the electric-thermal coupled new energy system over the scheduling cycle as its objective, the objective function being expressed as:
F = min(f_1 + f_2) (1)
in formula (1), F is the system's optimized-operation objective and f_1 denotes the running cost of the system within a scheduling day, expressed as:
f_1 = C_g + C_chp (2)
in formula (2), C_g is the running cost of the thermal unit and C_chp the running cost of the cogeneration unit;
in formula (3), C_t^g denotes the running cost of the thermal unit at time t, P_t^g the unit's output at time t, and α_g, β_g, γ_g the unit's energy-consumption coefficients;
in formula (4), C_t^chp denotes the running cost of the cogeneration unit at time t, P_t^chp its active power at time t, and H_t^chp its heat output at time t; a_0, a_1, a_2, a_3, a_4, a_5, a_6 are the energy-consumption coefficients of the cogeneration unit;
in formula (1), f_2 denotes the penalty-function term, expressed as:
f_2 = λ d_p + γ d_h (5)
in formula (5), λ d_p is the power-grid penalty term, where d_p equals the curtailed wind power or the shed electric load; γ d_h is the heat-network penalty term, where d_h equals the square of the shed heat load.
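Formulas (1)-(5) combine the unit running costs with grid and heat-network penalty terms. The sketch below only wires those pieces together; the cost arguments are taken as already-computed values of formulas (3) and (4), and the coefficient values used in the example are assumptions:

```python
def objective_F(cost_thermal, cost_chp, curtailed_wind, shed_heat,
                lam=1.0, gamma=1.0):
    """F = f1 + f2, in the spirit of formulas (1)-(5).

    f1 = C_g + C_chp is the daily running cost (formula 2);
    f2 = lam * d_p + gamma * d_h is the penalty term (formula 5),
    with d_p the curtailed wind / shed electric load and
    d_h the square of the shed heat load.
    """
    f1 = cost_thermal + cost_chp                        # formula (2)
    f2 = lam * curtailed_wind + gamma * shed_heat ** 2  # formula (5)
    return f1 + f2
```

Because the reward of formula (23) is -F, any curtailed wind or shed heat directly lowers the agent's reward, which is what steers the learned strategy toward maximum renewable absorption.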
3. The deep reinforcement learning-based energy management method of an electric-thermal coupling new energy system of claim 1, wherein the response constraint condition in step 1 includes:
the thermal unit output constraint P_t^g is expressed as:
in formula (6), P_max^g and P_min^g are the maximum and minimum thermal unit outputs;
the thermal unit ramping constraint R_g is expressed as:
in formula (7), the two limits are the ramp-up rate and ramp-down rate of the thermal unit;
the cogeneration unit active power output constraint P_t^chp is expressed as:
in formula (8), P_max^chp and P_min^chp denote the maximum and minimum active output of the cogeneration unit;
the cogeneration unit heat output constraint H_t^chp is expressed as:
in formula (9), H_max^chp denotes the maximum heat output of the cogeneration unit;
the pumped-storage power station pumping power constraint P_t^h.p is expressed as:
in formula (10), the bounds denote the maximum and minimum pumping power of the pumped-storage power station;
the pumped-storage power station generating power constraint P_t^h.g is expressed as:
in formula (11), the bounds denote the maximum and minimum generating power of the pumped-storage power station;
the pumped-storage power station upstream reservoir capacity constraint V^u is expressed as:
in formula (12), the bounds denote the maximum and minimum reservoir capacity available for generation in the upstream reservoir;
the pumped-storage power station downstream reservoir capacity constraint V^d is expressed as:
in formula (13), the bounds denote the maximum and minimum reservoir capacity available for pumping in the downstream reservoir;
the upper/lower reservoir state-transition constraints of the pumped-storage power station are expressed as:
in formula (14), V_t^u and V_t^d denote the water volumes of the upper and lower reservoirs at time t, and X_h ∈ {0, 1} denotes the generating state; in formula (15), Y_h ∈ {0, 1} denotes the pumping state, with X_h + Y_h = 1; Δt is the scheduling period, and η_g and η_p are the generating and pumping efficiency factors of the pumped-storage power station;
the wind turbine output constraint P_t^w is expressed as:
in formula (16), the bounds denote the maximum and minimum wind-turbine output;
the heat storage device heat-supply state constraint and state-of-charge constraint are expressed as:
in formula (17), the two bounds denote the maximum heat-charging power and the maximum heat-supply power of the heat storage tank; formula (18) defines the binary state variable that indicates whether the tank is charging or supplying heat during a period;
the heat storage tank state transition constraint is expressed as:
in formula (19), the heat-flow term denotes the heat-supply state of the heat storage tank at time t: a value greater than zero means heat is supplied to the heat load, and a value less than zero means surplus heat from the cogeneration unit is absorbed;
the heat storage tank output constraint is expressed as:
in formula (20), the two bounds denote the maximum and minimum heat-supply/heat-storage values of the heat storage tank.
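The reservoir state transition of formulas (14)-(15) can be sketched as a mutually exclusive generate/pump update. The formula images are not reproduced in this text, so the sign conventions and the placement of the efficiency factors η_g and η_p below are assumptions consistent with the claim's description:

```python
def reservoir_step(v_up, v_down, p_gen, p_pump, x_h, dt=0.25,
                   eta_g=0.9, eta_p=0.85):
    """One scheduling period of the upper/lower reservoir dynamics.

    x_h = 1 -> generating (water flows down), x_h = 0 -> pumping (Y_h = 1),
    enforcing X_h + Y_h = 1 as in formula (15). Efficiency values are
    illustrative assumptions.
    """
    y_h = 1 - x_h
    flow_gen = x_h * p_gen * dt / eta_g    # water drawn to generate p_gen
    flow_pump = y_h * eta_p * p_pump * dt  # water lifted while pumping
    v_up_next = v_up - flow_gen + flow_pump
    v_down_next = v_down + flow_gen - flow_pump
    return v_up_next, v_down_next
```

Note that total water is conserved between the two reservoirs; the capacity constraints of formulas (12)-(13) would then bound v_up_next and v_down_next.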
4. The deep reinforcement learning-based energy management method of an electric-thermal coupling new energy system according to claim 1, wherein step 2 specifically comprises: defining the state and action spaces of the Markov decision process for the model obtained in step 1, expressed by the five-tuple (S, A, P, R, γ), where S is the state space, A is the action space, P: S_{t-τ} × A_t → S_t is the state transition probability, R is the reward function, and γ ∈ [0, 1] is the discount factor; then:
the Markov decision process state space is expressed as:
in formula (21), the state comprises the thermal unit output at the end of the last scheduling period, the active power output and heat output of the cogeneration unit at the end of the last scheduling period, the reserve states of the pumped-storage power station and the heat storage device at the end of the last scheduling period, and the predicted active wind-power output;
the Markov decision process action space is expressed as:
in formula (22), P_t^g denotes the thermal unit output at time t; P_t^chp and H_t^chp denote the active power output and heat output of the decoupled cogeneration unit, respectively; P_t^h denotes the action of the pumped-storage power station at time t;
the Markov decision process reward function is expressed as the negative of the objective function:
r_t(s_t, a_t) = -F (23).
5. the deep reinforcement learning-based energy management method of an electric-thermal coupling new energy system according to claim 4, wherein the step 3 specifically comprises the steps of:
step 3.1, according to formula (21) and real system operation data, randomly initialize different system states within the constraint ranges, including the states of the pumped-storage unit, heat storage device, thermal unit, and cogeneration unit; collect system state information with multiple threads, setting the number of data-collection threads to 4, and store the state information in a sample pool;
step 3.2, initialize the neural network weights; set the actor and critic learning rates of the main PPO algorithm to C_actor = 0.0003 and C_critic = 0.001, the reward discount factor to γ = 0.95, and the advantage-function clipping coefficient to ε = 0.2; both the actor and critic networks consist of an input layer whose neuron count equals the number of system states, two hidden layers of 64 neurons each, and an output layer whose neuron count equals the number of actions, with every layer using the hyperbolic tangent activation function;
step 3.3, input the system state information collected in the sample pool into the actor neural network and output the corresponding control strategy π(a|s), i.e., the action distribution; from formula (22), obtain the action vector a_t, whose elements correspond to the next-moment action of each unit in the system;
step 3.4, apply the actions to the current state of each unit to obtain the next-moment state vector s_{t+1}, compute the instant reward r_t according to formula (23), and store the resulting tuple (s_t, a_t, r_t, s_{t+1}) in the sample pool;
step 3.5, the critic value network computes the state value V(t) and the state-action value Q(t) from the tuples (s_t, a_t, r_t, s_{t+1}) in the sample pool; repeat steps 3.2-3.4 until the scheduling day ends;
step 3.6, compute the advantage function A_π(s, a) = Q_π(s, a) - V_π(s) and update the actor and critic network parameters by gradient descent and back-propagation;
step 3.7, repeat steps 3.2-3.6 until all training days are covered, and save the trained model.
CN202310315494.1A 2023-03-28 2023-03-28 Deep reinforcement learning-based electric-thermal coupling new energy system energy management method Pending CN116562423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310315494.1A CN116562423A (en) 2023-03-28 2023-03-28 Deep reinforcement learning-based electric-thermal coupling new energy system energy management method


Publications (1)

Publication Number Publication Date
CN116562423A true CN116562423A (en) 2023-08-08

Family

ID=87497267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310315494.1A Pending CN116562423A (en) 2023-03-28 2023-03-28 Deep reinforcement learning-based electric-thermal coupling new energy system energy management method

Country Status (1)

Country Link
CN (1) CN116562423A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808174A (en) * 2024-03-01 2024-04-02 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117808174B (en) * 2024-03-01 2024-05-28 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination