CN113780688B - Optimized operation method, system, equipment and medium of electric heating combined system - Google Patents
- Publication number
- CN113780688B (publication) · CN202111328629.5A / CN202111328629A (application)
- Authority
- CN
- China
- Prior art keywords
- agent
- power
- network
- action
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06N3/045 — Neural networks; combinations of networks
- G06N3/048 — Neural networks; activation functions
- G06N3/08 — Neural networks; learning methods
- G06Q50/06 — Energy or water supply
- Y02E40/70 — Smart grids as climate change mitigation technology in the energy generation sector
- Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses an optimized operation method, system, device and medium for an electric-heat combined system. The method comprises the following steps: acquiring the state parameters of the electric-heat combined system to be optimally operated, the state parameters including the electrical load, the maximum wind power output, the thermal load and the ambient temperature; inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting the action quantities through the model, the action quantities including the generating power of the conventional units, the generating power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; and realizing optimized operation of the electric-heat combined system based on the action quantities. The method and system of the invention realize coordinated multi-energy optimization scheduling of the electric-heat combined system.
Description
Technical Field
The invention belongs to the technical field of integrated energy system optimization, relates to electric-heat combined systems, and in particular relates to an optimized operation method, system, device and medium for an electric-heat combined system.
Background
Against the background of the energy internet, the goals of current energy-system development are to improve the efficiency of energy utilization, promote the consumption of renewable energy, realize sustainable energy development and reduce environmental pollution. The electric-heat combined system is an important physical carrier of the energy internet, is key to applying concepts such as multi-energy complementarity and cascaded energy utilization, and is an important direction for adjusting the current energy structure. Research on integrated energy systems that couple the power system and the heating system is of great significance for breaking the existing mode in which energy supply systems are planned and operated independently, and for realizing multi-energy complementary integrated optimization of the energy system.
At present, a great deal of research has been carried out on the optimization of the electric-heat combined system. It generally establishes an optimization model that accounts for the heat loss of the return-water pipe network by analysing the actual structure of the heat supply network in combination with a hydraulic-thermal model of the thermal system, and then solves that model. However, as system scale keeps growing, and once heat-network losses are considered, the multi-energy complementary optimization of the electric-heat combined system becomes high-dimensional, nonlinear and non-convex. Traditional nonlinear solution methods struggle with such problems, linearization degrades solution accuracy, and existing algorithms such as PSO (Particle Swarm Optimization) and DDPG (Deep Deterministic Policy Gradient) have difficulty overcoming the information barriers between different stakeholders.
Disclosure of Invention
The invention aims to provide an optimized operation method, system, device and medium for an electric-heat combined system, so as to solve one or more of the above technical problems. The method and system of the invention realize coordinated multi-energy optimization scheduling of the electric-heat combined system.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides an optimized operation method of an electric heating combined system in a first aspect, which comprises the following steps:
acquiring state parameters of an electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model, and outputting the action quantities through the multi-agent deep reinforcement learning model; wherein the action quantities include: the generating power of the conventional units, the generating power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise the agents, the environment, each agent's action space, each agent's state space and each agent's reward function;
and realizing the optimized operation of the electric-heat combined system based on the action quantity.
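The three steps above reduce to one dispatch call per scheduling period: each trained actor maps its observed state to set-points for the units it controls. A minimal sketch, assuming two trained actor callables; all function and key names are illustrative, not from the patent:

```python
import numpy as np

def dispatch(electric_actor, thermal_actor, state):
    """One optimized-operation step: map the observed state parameters
    to the four action quantities listed above."""
    s_elec = np.array([state["electric_load"],
                       state["chp_electric_power"],
                       state["wind_max_output"],
                       state["conventional_output"]])
    s_heat = np.array([state["heat_load"],
                       state["chp_heat_power"],
                       state["ambient_temperature"]])
    p_conv, p_chp, p_wind = electric_actor(s_elec)   # power system agent
    (h_chp,) = thermal_actor(s_heat)                 # thermal system agent
    return {"P_conventional": p_conv, "P_chp": p_chp,
            "P_wind": p_wind, "H_chp": h_chp}
```

The returned set-points would then be applied to the units for the current scheduling interval.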
In a further refinement of the method, in the multi-agent deep reinforcement learning model:
the agents comprise a power system agent and a thermal system agent;
the environment comprises mathematical models of the power system and thermal system energy flows;
the action spaces comprise a power system agent action space and a thermal system agent action space; the power system agent action space comprises the generating power of the conventional units, the generating power of the cogeneration device and the wind power generation power; the thermal system agent action space comprises the heat generation power of the cogeneration device;
the state spaces comprise a power system agent state space and a thermal system agent state space; the power system agent state space comprises the electrical load, the current generating power of the cogeneration device, the maximum wind power output and the current output of the conventional units; the thermal system agent state space comprises the thermal load, the current heat generation power of the cogeneration device and the ambient temperature;
the reward functions comprise a power system agent reward function and a thermal system agent reward function; the power system agent reward function comprises the operating cost of the conventional units, a wind-curtailment penalty and variable out-of-limit penalties; the thermal system agent reward function comprises the operating cost of the cogeneration device and variable out-of-limit penalties.
In a further refinement of the method, the power system agent and the thermal system agent each comprise their own actor network and critic network;
the actor network takes as input the set of states the agent perceives from the environment and outputs the agent's action in the given state; the critic network generates a state-value function from the agent's state and the action the agent takes in that state, and evaluates the quality of the action currently taken by the actor network;
both the actor network and the critic network adopt a double-network structure, comprising an estimation network and a target network of identical structure; during training, the actor estimation-network parameters and critic estimation-network parameters of each agent are updated, and the trained estimation-network parameters are used to soft-update the target networks.
The method of the invention is further refined in that, during training, the actor estimation-network parameters and critic estimation-network parameters of each agent are updated, and the trained estimation-network parameters are used to soft-update the target networks; this specifically comprises the following steps:

At each scheduling period in the scheduling cycle, the power system agent selects the action $a_1 = \mu_1(s_1) + \mathcal{N}_1$ and the thermal system agent selects the action $a_2 = \mu_2(s_2) + \mathcal{N}_2$; in the formulas, $s_1$ and $s_2$ respectively denote the current states observed by the power system agent and the thermal system agent, $\mu_1$ and $\mu_2$ respectively denote the current policies of the power system agent's and thermal system agent's actor networks, and $\mathcal{N}_1$ and $\mathcal{N}_2$ are respectively the random noise added to the power system agent's and thermal system agent's policy actions;

the transition $(s_1, a_1, r_1, s_1')$ is stored in the power system agent's experience replay buffer and $(s_2, a_2, r_2, s_2')$ is stored in the thermal system agent's experience replay buffer; wherein $r_1$ and $s_1'$ are respectively the immediate reward and the updated state observed after action $a_1$ acts on the real system, and $r_2$ and $s_2'$ are respectively the immediate reward and the updated state of the thermal system agent;

transitions are randomly sampled from the power system agent's experience replay buffer, the target value $y_1 = r_1 + \gamma\, Q_1'\bigl(s_1', \mu_1'(s_1')\bigr)$ is calculated, and the critic estimation-network parameters $\theta_1^{Q}$ of the power system agent are updated according to the first loss function, expressed as
$L(\theta_1^{Q}) = \frac{1}{K}\sum \bigl(y_1 - Q_1(s_1, a_1)\bigr)^2$,
in the formula, $Q_1$ is the state-value function of the power system agent's critic estimation network, $Q_1'$ is the state-value function of the power system agent's critic target network, and $K$ is the number of all sub-policies in the policy;

the actor estimation-network parameters $\theta_1^{\mu}$ of the power system agent are updated according to the second loss function, expressed as
$L(\theta_1^{\mu}) = -\frac{1}{K}\sum Q_1\bigl(s_1, \mu_1(s_1)\bigr)$;

the expressions for soft-updating the power system agent's target actor network parameters and target critic network parameters are
$\theta_1^{\mu'} \leftarrow \tau\,\theta_1^{\mu} + (1-\tau)\,\theta_1^{\mu'}$, $\theta_1^{Q'} \leftarrow \tau\,\theta_1^{Q} + (1-\tau)\,\theta_1^{Q'}$,
in the formulas, $\theta_1^{\mu'}$ and $\theta_1^{Q'}$ are respectively the power system agent's target actor and target critic network parameters;

transitions are randomly sampled from the thermal system agent's experience replay buffer, the target value $y_2 = r_2 + \gamma\, Q_2'\bigl(s_2', \mu_2'(s_2')\bigr)$ is calculated, and the critic estimation-network parameters $\theta_2^{Q}$ of the thermal system agent are updated according to the third loss function, expressed as
$L(\theta_2^{Q}) = \frac{1}{K}\sum \bigl(y_2 - Q_2(s_2, a_2)\bigr)^2$,
in the formula, $Q_2$ is the state-value function of the thermal system agent's critic estimation network, $Q_2'$ is the state-value function of the thermal system agent's critic target network, and $K$ is the number of all sub-policies in the policy; the actor estimation-network parameters $\theta_2^{\mu}$ of the thermal system agent are updated according to the fourth loss function, expressed as
$L(\theta_2^{\mu}) = -\frac{1}{K}\sum Q_2\bigl(s_2, \mu_2(s_2)\bigr)$;
the expressions for soft-updating the thermal system agent's target actor network parameters and target critic network parameters are
$\theta_2^{\mu'} \leftarrow \tau\,\theta_2^{\mu} + (1-\tau)\,\theta_2^{\mu'}$, $\theta_2^{Q'} \leftarrow \tau\,\theta_2^{Q} + (1-\tau)\,\theta_2^{Q'}$,

in the formulas, $\theta_2^{\mu'}$ and $\theta_2^{Q'}$ are respectively the thermal system agent's target actor and target critic network parameters.
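The per-agent update described above reduces to a few array operations once the networks are abstracted away. A minimal NumPy sketch of the four pieces (TD target, critic loss, actor loss, soft update); the function names are illustrative, not from the patent:

```python
import numpy as np

def td_targets(r, q_next, gamma=0.99):
    """Targets y = r + gamma * Q'(s', mu'(s')) for a sampled batch."""
    return r + gamma * q_next

def critic_loss(q, y):
    """Mean-squared Bellman error minimised by the critic estimation network."""
    return float(np.mean((y - q) ** 2))

def actor_loss(q_of_policy_action):
    """Negative mean critic value of the actor's own actions: minimising
    it pushes the policy toward actions the critic rates more highly."""
    return float(-np.mean(q_of_policy_action))

def soft_update(theta, theta_target, tau=0.01):
    """theta' <- tau * theta + (1 - tau) * theta'."""
    return tau * theta + (1 - tau) * theta_target
```

Each training step applies these in order for both agents: compute targets from the target networks, descend the critic loss, descend the actor loss, then soft-update both target networks.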
A further refinement of the method is that the mathematical model of the power system and thermal system energy flows comprises:

the objective function
$\min F = C_G + C_{CHP} + C_W$,
in the formula, $C_G$ is the operating cost of the conventional units, $C_{CHP}$ is the operating cost of the cogeneration device, and $C_W$ is the wind-curtailment penalty;

$C_G = \sum_{t=1}^{T} \sum_{i=1}^{N_G} \bigl(a_i P_{G,i,t}^2 + b_i P_{G,i,t} + c_i\bigr)\,\Delta t$,
in the formula, $a_i$, $b_i$, $c_i$ are the energy-consumption coefficients of conventional unit $i$, $P_{G,i,t}$ is the output of the conventional unit, $N_G$ is the number of conventional units, $T$ is the scheduling cycle, and $\Delta t$ is the scheduling interval;

$C_{CHP} = \sum_{t=1}^{T} \sum_{j=1}^{N_{CHP}} C_j\bigl(P_{CHP,j,t}, H_{CHP,j,t}\bigr)\,\Delta t$,
in the formula, $C_j(\cdot)$ is the energy-consumption cost function of cogeneration unit $j$, $N_{CHP}$ is the number of cogeneration units, and $P_{CHP,j,t}$ and $H_{CHP,j,t}$ are respectively the electric and heat output of the cogeneration unit;

$C_W = \lambda_W \sum_{t=1}^{T} \bigl(P_{W,t}^{\mathrm{pre}} - P_{W,t}\bigr)\,\Delta t$,
in the formula, $\lambda_W$ is the wind-curtailment penalty coefficient and $P_{W,t}^{\mathrm{pre}} - P_{W,t}$ is the difference between the predicted and actual wind power;

the network security constraints, expressed as
$V_i^{\min} \le V_{i,t} \le V_i^{\max}$, $T_s^{\min} \le T_{s,i,t} \le T_s^{\max}$, $m_{ij}^{\min} \le m_{ij,t} \le m_{ij}^{\max}$,
in the formulas, $V_{i,t}$ represents the voltage magnitude of power network node $i$, and $V_i^{\min}$, $V_i^{\max}$ are respectively the lower and upper limits of the node voltage magnitude; $T_{s,i,t}$ is the temperature of the hot water flowing into heat network node $i$, and $T_s^{\min}$, $T_s^{\max}$ are the lower and upper limits of the supply-water temperature; $m_{ij,t}$ is the mass flow rate of the hot-water pipe between heat network nodes $i$ and $j$, and $m_{ij}^{\min}$, $m_{ij}^{\max}$ are respectively its lower and upper limits;

the cogeneration unit constraints, expressed as
$P_{CHP}^{\min} \le P_{CHP,k,t} \le P_{CHP}^{\max}$, $\alpha_m P_{CHP,k,t} + \beta_m H_{CHP,k,t} \le \gamma_m$,
in the formulas, $P_{CHP,k,t}$ and $H_{CHP,k,t}$ are respectively the electric power and heat power generated by the $k$-th extraction-condensing unit in period $t$; $P_{CHP}^{\min}$ and $P_{CHP}^{\max}$ are respectively the lower and upper limits of the electric output; and $\alpha_m$, $\beta_m$, $\gamma_m$ are the coefficients representing the polygonal feasible operating region;

the cogeneration device ramping constraint, expressed as
$R_{CHP}^{\mathrm{dn}} \le P_{CHP,t} - P_{CHP,t-1} \le R_{CHP}^{\mathrm{up}}$,
in the formula, $P_{CHP,t-1}$ and $P_{CHP,t}$ are respectively the cogeneration power in the preceding and current periods, and $R_{CHP}^{\mathrm{dn}}$, $R_{CHP}^{\mathrm{up}}$ are respectively the lower and upper ramping limits of the cogeneration device;

the renewable energy constraint, expressed as
$0 \le P_{W,k,t} \le P_{W,k,t}^{\max}$,
in the formula, $P_{W,k,t}$ indicates the power generated by wind turbine $k$ in period $t$, and $P_{W,k,t}^{\max}$ is the maximum available output of the wind turbine;

the conventional unit ramping constraints, expressed as
$P_G^{\min} \le P_{G,t} \le P_G^{\max}$, $R_G^{\mathrm{dn}} \le P_{G,t} - P_{G,t-1} \le R_G^{\mathrm{up}}$,
in the formulas, $P_{G,t}$ is the generating power of the conventional unit, $P_G^{\min}$, $P_G^{\max}$ are respectively the lower and upper output limits of the unit, and $R_G^{\mathrm{dn}}$, $R_G^{\mathrm{up}}$ are respectively the lower and upper ramping limits of the unit.
In a further improvement of the method of the present invention, the expression of the power system agent reward function is,
in the formula,punishment is carried out on the running cost and the abandoned wind of the power system;a system node voltage out-of-limit penalty item is obtained;for the output out-of-limit penalty term of the cogeneration unit,for the climbing out-of-limit punishment item of the cogeneration device,is an out-of-limit punishment item of the output of the conventional unit,a climbing out-of-limit punishment item for the conventional unit;
the expression of the thermodynamic system agent reward function is,
in the formula,for the output out-of-limit punishment item of the cogeneration unit,for the climbing out-of-limit punishment item of the cogeneration unit,punishment is carried out for the temperature of the system node,and punishing the out-of-limit of the mass flow rate of the system pipeline.
The invention provides an optimized operation system of an electric heating combined system in a second aspect, which comprises:
the parameter acquisition module is used for acquiring state parameters of the electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
the action quantity acquisition module is used for inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting action quantities through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
and the optimized operation module is used for realizing the optimized operation of the electric heating combined system based on the action quantity.
In a further refinement of the system, in the multi-agent deep reinforcement learning model of the action-quantity acquisition module:
the agents comprise a power system agent and a thermal system agent;
the environment comprises mathematical models of the power system and thermal system energy flows;
the action spaces comprise a power system agent action space and a thermal system agent action space; the power system agent action space comprises the generating power of the conventional units, the generating power of the cogeneration device and the wind power generation power; the thermal system agent action space comprises the heat generation power of the cogeneration device;
the state spaces comprise a power system agent state space and a thermal system agent state space; the power system agent state space comprises the electrical load, the current generating power of the cogeneration device, the maximum wind power output and the current output of the conventional units; the thermal system agent state space comprises the thermal load, the current heat generation power of the cogeneration device and the ambient temperature;
the reward functions comprise a power system agent reward function and a thermal system agent reward function; the power system agent reward function comprises the operating cost of the conventional units, a wind-curtailment penalty and variable out-of-limit penalties; the thermal system agent reward function comprises the operating cost of the cogeneration device and variable out-of-limit penalties.
In the action-quantity acquisition module, the power system agent and the thermal system agent each comprise their own actor network and critic network;
the actor network takes as input the set of states the agent perceives from the environment and outputs the agent's action in the given state; the critic network generates a state-value function from the agent's state and the action the agent takes in that state, and evaluates the quality of the action currently taken by the actor network;
both the actor network and the critic network adopt a double-network structure, comprising an estimation network and a target network of identical structure; during training, the actor estimation-network parameters and critic estimation-network parameters of each agent are updated, and the trained estimation-network parameters are used to soft-update the target networks.
The system of the present invention is further improved in that, in the action quantity obtaining module, in the training process, the estimation network parameters of the actuators and the estimation network parameters of the discriminators of each agent are updated, and the step of performing soft update on the target network by using the trained estimation network parameters specifically includes:
selecting an action for a power system agent at each scheduling period in a scheduling cycleSelecting actions for thermodynamic system agents(ii) a In the formula, s1、s2Respectively represents the current states observed by the power system intelligent agent and the thermal system intelligent agent,respectively representing the current strategies in the power system agent and the thermodynamic system agent actor networks,respectively are random noises of strategy actions of an intelligent agent of the power system and an intelligent agent of the thermodynamic system;
will be provided withThe experience of the intelligent agent of the power system is stored in a playback unitStoring the data into a thermodynamic system intelligent agent experience playback unit; wherein,andare respectively an actionActs on the real system to observe the status of the power system agent's immediate rewards and updates,andare respectively an actionInstant rewards and updated status for the thermodynamic system agents;
random sampling from power system agent experience playback unitCalculatingUpdating the arbiter estimated network parameters of the power system agent according to the first loss functionThe first loss function is expressed as,in the formula (I), wherein,a state value function of the evaluation network is evaluated for the power system agent arbiter,as a function of the state values of the power system agent arbiter target network,the number of all sub-strategies in the strategy;
updating the actor estimation network parameters of the power system agent according to a second loss function, expressed as the sampled policy gradient ∇J(θ1^μ) = E[∇_{a1} Q1(s, a1, a2)·∇_{θ1^μ} μ1(s1)]; the target actor network parameters and target critic network parameters of the power system agent are soft-updated as θ1^μ' ← τ·θ1^μ + (1 − τ)·θ1^μ' and θ1^Q' ← τ·θ1^Q + (1 − τ)·θ1^Q'; in the formulas, θ1^μ' and θ1^Q' are respectively the network parameters of the power system agent target actor and target critic, and τ is the soft-update coefficient;
randomly sampling a batch of experiences from the thermodynamic system agent experience replay unit, calculating the target value y2 = r2 + γ·Q2'(s', a1', a2'), and updating the critic estimation network parameters of the thermodynamic system agent according to the third loss function, expressed as L(θ2^Q) = (1/K) Σ_{k=1}^{K} E[(y2 − Q2(s, a1, a2))^2]; in the formula, Q2 is the state value function of the thermodynamic system agent critic estimation network, Q2' is the state value function of the thermodynamic system agent critic target network, and K is the number of all sub-policies in the policy; updating the actor estimation network parameters of the thermodynamic system agent according to the fourth loss function, expressed as the sampled policy gradient ∇J(θ2^μ) = E[∇_{a2} Q2(s, a1, a2)·∇_{θ2^μ} μ2(s2)]; the target actor network parameters and target critic network parameters of the thermodynamic system agent are soft-updated as θ2^μ' ← τ·θ2^μ + (1 − τ)·θ2^μ' and θ2^Q' ← τ·θ2^Q + (1 − τ)·θ2^Q'; in the formulas, θ2^μ' and θ2^Q' are respectively the network parameters of the thermodynamic system agent target actor and target critic.
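The soft-update rule named above (blending trained estimation parameters into the target network with a small coefficient τ) can be sketched in isolation; the flat parameter lists and the value of τ below are illustrative assumptions, not values from the patent:

```python
def soft_update(online_params, target_params, tau=0.01):
    """Blend estimation-network parameters into the target network:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * tp
            for p, tp in zip(online_params, target_params)]

# Illustrative: one soft-update step on three scalar parameters.
online = [1.0, 2.0, -0.5]
target = [0.0, 0.0, 0.0]
target = soft_update(online, target, tau=0.1)
```

A small τ makes the target network trail the estimation network slowly, which is what stabilizes the bootstrapped critic targets.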
In the system of the present invention, the mathematical model of the power flow of the power system and the thermodynamic system in the action quantity obtaining module comprises:
a system optimization objective, expressed as min F = F1 + F2 + F3; in the formula, F1 is the operating cost of the conventional units, F2 is the operating cost of the cogeneration units, and F3 is the wind curtailment penalty;
the operating cost of the conventional units, expressed as F1 = Σ_{t=1}^{T} Σ_{i=1}^{N} (a_i·P_{i,t}^2 + b_i·P_{i,t} + c_i)·Δt; in the formula, a_i, b_i and c_i are the energy consumption coefficients of the conventional unit, P_{i,t} is the output of the conventional unit, N is the number of conventional units, T is the scheduling period, and Δt is the scheduling time interval;
the operating cost of the cogeneration units, expressed as F2 = Σ_{t=1}^{T} Σ_{j=1}^{M} C_j(P_{j,t}^{chp}, H_{j,t}^{chp})·Δt; in the formula, C_j is the energy consumption coefficient function of the cogeneration unit, M is the number of cogeneration units, and P_{j,t}^{chp} and H_{j,t}^{chp} are respectively the electric and heat output of the cogeneration unit;
the wind curtailment penalty, expressed as F3 = λ·Σ_{t=1}^{T} (P_t^{w,pre} − P_t^{w}); in the formula, λ is the wind curtailment penalty coefficient, and P_t^{w,pre} − P_t^{w} is the difference between the predicted wind power and the actual wind power;
the network security constraints, expressed as V_i^min ≤ V_i ≤ V_i^max, T_{s,i}^min ≤ T_{s,i} ≤ T_{s,i}^max, m_{ij}^min ≤ m_{ij} ≤ m_{ij}^max; in the formulas, V_i represents the voltage amplitude of power network node i, and V_i^min and V_i^max are respectively the lower and upper limits of the voltage amplitude at node i; T_{s,i} is the temperature of the hot water flowing into heat supply network node i, and T_{s,i}^min and T_{s,i}^max are the lower and upper limits of the water supply temperature; m_{ij} is the mass flow rate of the hot water pipeline between heat supply network node i and node j, and m_{ij}^min and m_{ij}^max are respectively its lower and upper limits;
the cogeneration unit constraints, expressed as P^min ≤ P_{k,t}^{chp} ≤ P^max together with linear inequalities of the form c_m·P_{k,t}^{chp} + d_m·H_{k,t}^{chp} ≤ e_m bounding the polygonal operating region; in the formulas, P_{k,t}^{chp} and H_{k,t}^{chp} are respectively the electric power and heat power generated by the k-th extraction-condensing unit in period t; P^min and P^max are respectively the lower and upper limits of the electric output; and c_m, d_m and e_m are the polygonal-region representing coefficients;
the cogeneration unit climbing constraint, expressed as −R^down·Δt ≤ P_t^{chp} − P_{t−1}^{chp} ≤ R^up·Δt; in the formula, P_{t−1}^{chp} and P_t^{chp} are respectively the cogeneration electric power of the two adjacent periods, and R^up and R^down are respectively the upper and lower limits of the climbing rate of the cogeneration unit;
the renewable energy constraint, expressed as 0 ≤ P_{w,t} ≤ P_{w,t}^max; in the formula, P_{w,t} denotes the power generated by wind turbine w in period t, and P_{w,t}^max is the maximum output value of the wind turbine;
the conventional unit climbing constraints, expressed as P^min ≤ P_{i,t} ≤ P^max and −r^down·Δt ≤ P_{i,t} − P_{i,t−1} ≤ r^up·Δt; in the formulas, P_{i,t} is the power generated by the conventional unit, P^min and P^max are respectively the lower and upper limits of the unit output, and r^up and r^down are respectively the upper and lower limits of the climbing rate of the unit.
In a further improvement of the system of the present invention, the power system agent reward function is expressed as r1 = −(F_e + σ_V + σ_CHP + σ_CHP,ramp + σ_G + σ_G,ramp);
in the formula, F_e is the operating cost and wind curtailment penalty of the power system; σ_V is the system node voltage out-of-limit penalty term; σ_CHP is the cogeneration unit output out-of-limit penalty term, σ_CHP,ramp is the cogeneration unit climbing out-of-limit penalty term, σ_G is the conventional unit output out-of-limit penalty term, and σ_G,ramp is the conventional unit climbing out-of-limit penalty term;
the expression of the thermodynamic system agent reward function is r2 = −(σ_CHP + σ_CHP,ramp + σ_T + σ_m);
in the formula, σ_CHP is the cogeneration unit output out-of-limit penalty term, σ_CHP,ramp is the cogeneration unit climbing out-of-limit penalty term, σ_T is the system node temperature out-of-limit penalty term, and σ_m is the system pipeline mass flow rate out-of-limit penalty term.
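A reward built from a cost term plus out-of-limit penalty terms, as described above, can be sketched as follows; the quadratic penalty shape, weights and limit bands are illustrative assumptions, not values given in the patent:

```python
def limit_penalty(value, lo, hi, weight=100.0):
    """Quadratic penalty that is zero while the variable stays in [lo, hi]."""
    excess = max(0.0, value - hi) + max(0.0, lo - value)
    return weight * excess ** 2

def power_agent_reward(op_cost, wind_penalty, voltages, v_lo=0.95, v_hi=1.05):
    """r1 = -(operating cost + curtailment penalty + voltage out-of-limit penalties)."""
    v_pen = sum(limit_penalty(v, v_lo, v_hi) for v in voltages)
    return -(op_cost + wind_penalty + v_pen)

def heat_agent_reward(temps, flows, t_band=(70.0, 100.0), m_band=(0.0, 10.0)):
    """r2 = -(temperature and mass-flow out-of-limit penalties)."""
    pen = sum(limit_penalty(t, *t_band) for t in temps)
    pen += sum(limit_penalty(m, *m_band) for m in flows)
    return -pen
```

Because every term is negated, the agents maximize reward exactly by minimizing cost and keeping all variables inside their bands.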
A third aspect of the present invention provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for optimizing operation of an electric-thermal combination system according to any one of the above embodiments when executing the computer program.
A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program is configured to, when executed by a processor, implement the steps of the method for optimizing operation of an electric-heat combined system according to any one of the above aspects of the present invention.
Compared with the prior art, the invention has the following beneficial effects:
the method provided by the invention determines the state parameters and, based on a multi-agent deep reinforcement learning model, solves the electric-heat joint optimization problem with a reinforcement learning method; on the premise of guaranteeing the calculation effect, reinforcement learning improves the generation speed of the control strategy, and overcomes the defect that the operation time of traditional methods grows excessively with system scale and can hardly meet the requirement of online calculation.
In the method, based on the multi-agent deep deterministic policy gradient algorithm framework, an electric heating combined system optimal scheduling model based on multi-agent actor-critics is constructed; convergence is stable and the spatial exploration capability is strong, which can overcome the defect that existing traditional methods easily fall into a local optimal solution during solving.
According to the method, the electric heating combined system is divided into a power system agent and a thermodynamic system agent, and the agents cooperate to achieve the overall optimization target of the system; the reinforcement learning action and state spaces are divided in combination with the electric heating combined system scheduling model, and a reward and punishment mechanism is established for each agent, so that each agent can complete its own policy calculation using only local state information, solving the problem that data of different stakeholders are difficult to share.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram of the training process of the DDPG model in comparative example 2 of the present invention;
FIG. 2 is a schematic flow chart of a method for optimizing operation of an integrated electric heating system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electric heating combination system in an embodiment of the present invention;
FIG. 4 is a diagram illustrating a reinforcement learning model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the interior of an agent in an embodiment of the invention;
FIG. 6 is a schematic diagram of a multi-agent framework of an electrothermal combined system according to an embodiment of the present invention;
FIG. 7 is a flow chart of a multi-agent deep reinforcement learning network training framework according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
comparative example 1
The particle swarm optimization algorithm takes a particle swarm as its basic unit; each particle represents a possible problem solution, and intelligent problem solving emerges from the information interaction within the swarm produced by the simple behaviors of individual particles. To apply the particle swarm optimization algorithm, an electric-thermal integrated energy system optimal scheduling model is first established (which may exemplarily comprise power grid and heat grid power flow constraints, safe operation constraints, cogeneration unit constraints, the system optimization target, and the like), and the model is then solved using the particle swarm optimization algorithm.
When the method is specifically executed, the maximum iteration number, the number of independent variables and the maximum particle velocity are first set, and the velocity and position of the particle swarm are initialized; a fitness function is then defined according to the optimization target of the electric-thermal integrated energy system optimal scheduling model. The extreme value of each individual is the best solution found by that particle, and the minimum over all particle best solutions is the global optimal solution; the latter is compared with the historical global optimal solution, and the velocity and position are updated according to formulas (1) and (2): v_i = ω·v_i + c1·r1·(pbest_i − x_i) + c2·r2·(gbest − x_i) (1); x_i = x_i + v_i (2);
in the formulas, v_i and x_i are the velocity and position of individual i, ω is the inertia factor, c1 and c2 are the learning factors, pbest_i denotes the individual extreme value (best position) found by individual i, and gbest denotes the global optimal solution.
And stopping iteration when the maximum iteration number is reached or the iteration difference value meets the precision requirement.
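The iteration just described can be sketched end to end; the swarm size, inertia factor, learning factors, bounds and test function below are illustrative assumptions:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=150, w=0.7, c1=1.5, c2=1.5,
                 v_max=0.5, lo=-5.0, hi=5.0, seed=0):
    """Basic particle swarm minimization: formulas (1)-(2) with velocity clamping."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]                       # individual extreme values
    pbest_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # global optimal solution
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))  # max particle speed
                x[i][d] += v[i][d]
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val

# Illustrative run on the 2-D sphere function (minimum 0 at the origin).
best, val = pso_minimize(lambda p: sum(c * c for c in p), dim=2)
```

In the scheduling application, `f` would be the fitness function derived from the optimal scheduling model rather than this toy objective.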
Based on the above analysis, the method of comparative example 1 of the present invention has the following defects:
(1) the particle swarm algorithm easily falls into a local optimal solution; when the exploration capability of the algorithm is insufficient, problems such as low convergence precision, or even failure to converge, can arise, which affects the validity of the optimal scheduling calculation results of the electricity-heat integrated energy system.
(2) as the problem scale grows, the particle swarm algorithm suffers from dimension explosion; the dimension explosion greatly increases the computation amount and in turn greatly reduces the calculation speed, so the algorithm may not be suitable for applications with high requirements on calculation speed.
Comparative example 2
DDPG is a reinforcement learning algorithm for continuous action spaces, developed from the traditional policy gradient (PG) algorithm, and is suitable for solving the optimal scheduling problem of the electricity-heat integrated energy system. The general steps of optimal scheduling with the DDPG algorithm comprise: establishing the agent actor network and critic network, interacting with the environment to generate training samples and construct the replay unit, and randomly selecting replay unit samples to train the actor and critic networks; after multiple rounds of training, the actor network outputs the control strategy of the electricity-heat integrated energy system according to the input information.
Referring to fig. 1, the model training process in comparative example 2 of the present invention specifically includes the following steps:
(1) establishing an actor network and a judger network, and initializing each network parameter;
(2) giving the agent an initial state; in each iteration, a strategy is generated through the forward pass of the actor network, the critic network evaluates the action, the action is sent into the environment for state transition, and the reward function is calculated; the generated group of samples is stored into the replay unit, and a batch of samples is randomly selected to update the parameters of the Actor network and the Critic network;
wherein the Critic network is updated by formula (3): L(θ^Q) = E[(r + γ·Q'(s', μ'(s')) − Q(s, a))^2] (3);
the Actor network is updated by formula (4): ∇J(θ^μ) = E[∇_a Q(s, a)·∇_θ μ(s)] (4);
in the formulas, r is the reward value, γ is the discount factor, s is the agent state, θ denotes the network parameters, and a is the agent action.
(3) judging whether the upper limit of iterations is reached; if so, stopping training and outputting the parameters of the actor network and the critic network.
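The two updates in formulas (3) and (4) can be illustrated on toy linear function approximators (Q(s, a) = wq·[s, a] and μ(s) = wa·s); these toy networks and the sample batch are illustrative assumptions, not the patent's networks:

```python
def critic_td_target(r, s_next, gamma, wq_target, wa_target):
    """y = r + gamma * Q'(s', mu'(s')) using the target networks."""
    a_next = wa_target * s_next                       # target actor's action
    return r + gamma * (wq_target[0] * s_next + wq_target[1] * a_next)

def critic_loss(batch, gamma, wq, wq_target, wa_target):
    """Mean squared TD error over sampled (s, a, r, s') tuples — formula (3)."""
    total = 0.0
    for s, a, r, s_next in batch:
        y = critic_td_target(r, s_next, gamma, wq_target, wa_target)
        q = wq[0] * s + wq[1] * a                     # online critic estimate
        total += (y - q) ** 2
    return total / len(batch)

# Illustrative batch of two transitions with all-zero initial weights.
batch = [(1.0, 0.5, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)]
loss = critic_loss(batch, gamma=0.9, wq=[0.0, 0.0],
                   wq_target=[0.0, 0.0], wa_target=0.0)
```

In a real implementation the networks are deep and the loss is minimized by gradient descent; the structure of the target computation is the same.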
Based on the above analysis, the method of comparative example 2 of the present invention has the following defects:
(1) in practical applications, different systems may be managed by different competent departments, so information barriers exist and optimization can hardly proceed on the premise of full data sharing; the DDPG technique cannot give the optimal control action when it knows only its own local information;
(2) as the problem scale grows in a large system, the action space dimensionality of single-agent DDPG becomes large, and the action space may be insufficiently explored, causing convergence to a local optimal solution.
To sum up, under the background of the energy internet, the electric heating combined system has become the key to realizing concepts such as multi-energy complementation and cascaded energy utilization. At present, electric heating combined system optimization mainly establishes an optimization model considering the heat loss of the heat supply network return water pipe network; however, as the system scale keeps increasing, the model presents high-dimensional, nonlinear and non-convex characteristics that traditional methods can hardly solve, while the PSO and DDPG algorithms require the state information of the entire system and can hardly overcome the information barrier problem.
Example 1
In the technical scheme provided by the embodiment of the invention, an electric heating combined system optimal scheduling model based on multi-agent deep deterministic policy gradient is constructed, realizing multi-energy coordinated optimal scheduling of the electric heating combined system. Compared with traditional models, the method effectively solves the sequential decision problem in the continuous control process, avoids the defects caused by adopting a discrete action space, enables each agent to complete its own policy calculation knowing only its local state information, and solves the data sharing problem among different agents. The electric-heat combined system is described, for example, in [1] Wangbeiliang, Wangdan, Jia Hongjie, et al., a typical regional comprehensive energy system steady state analysis research in the context of energy Internet reviews [J]. Chinese Motor engineering report, 2016, 36 (12): 3292-. At present, the electric heating combined system optimization model considering the heat loss of the heat supply network return water pipe network presents high-dimensional, nonlinear and non-convex characteristics as the system scale keeps increasing; traditional nonlinear solution methods can hardly solve it, and linearization affects the solution precision.
In the technical scheme provided by the embodiment of the invention, the optimized operation method of the electric heating combined system is constructed based on multi-agent deep deterministic policy gradient (MADDPG); the strategy generation speed is improved, the precision loss caused by discretizing the action and state spaces is avoided, and each agent completes its calculation relying only on local information during policy execution, solving the data sharing problem among different stakeholders, thereby realizing multi-energy coordinated optimal scheduling of the electric heating combined system.
Referring to fig. 2, an optimized operation method of an electric-heat combined system according to an embodiment of the present invention includes the following steps:

Step 1, acquiring the state parameters of the electric heating combined system to be optimally operated;
Step 2, inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model, and outputting the action quantity through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device;
and 3, realizing the optimized operation of the electric heating combined system based on the action quantity.
The method of the embodiment of the invention determines the state parameters and, based on the multi-agent deep reinforcement learning model, solves the electric-heat joint optimization problem with a reinforcement learning method; on the premise of guaranteeing the calculation effect, reinforcement learning improves the generation speed of the control strategy, and overcomes the defect that the operation time of traditional methods grows excessively with system scale and can hardly meet the requirement of online calculation.
Example 2
Based on the above embodiment 1, referring to fig. 3, in an optional aspect of the embodiment of the present invention, the electric-heat combined system includes: conventional generator sets, wind turbine generators, cogeneration units, and the like; wherein G1 and G2 represent conventional generator sets, which are responsible for supplying the electric loads in the system; W1 represents a wind turbine generator, whose maximum output is affected by random factors such as wind speed and must be obtained from the day-ahead prediction results; CHP1 and CHP2 represent cogeneration units, which supply both the electric load and the heat load in the system; load1, load2 and load3 represent the electric loads in the system; Hload1, Hload2 and Hload3 represent the heat loads in the system.
Illustratively, since cogeneration systems are already prior art (reference may be made to the documents cited above), only a brief description is given here to support the reader's understanding.
Example 3
Referring to fig. 4 and 5 based on the above embodiment 1, in an alternative embodiment of the present invention, the multi-agent deep reinforcement learning model is shown in fig. 4 and includes: agent, environment, action, status, and reward function.
The internal structure of the agent is shown in fig. 5: each agent is composed of a policy (Actor) network and a value function (Critic) network; the agent perceives the state (s) from the environment, inputs the state set into the policy network, obtains the agent's policy through neural network calculation, and outputs the agent's action (a) in the given state. Specifically, in the model of the invention, the power system and the thermodynamic system are divided into two agents.
Environment: including basic mathematical models of power and thermal system power flows.
Illustratively, regarding the power system model: in the embodiment of the invention, AC power flow is used as the analysis method for the power system, and the power balance equations of the power system are expressed as P_i = V_i·Σ_j V_j·(G_ij·cos θ_ij + B_ij·sin θ_ij) and Q_i = V_i·Σ_j V_j·(G_ij·sin θ_ij − B_ij·cos θ_ij);
in the formulas, P_i and Q_i are respectively the active and reactive power injected at node i, V_i is the voltage amplitude of node i, G_ij and B_ij are respectively the conductance and susceptance of branch ij, and θ_ij is the phase angle difference of branch ij.
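The two power balance equations above can be evaluated directly; the two-bus admittance values below are an illustrative assumption used only to exercise the formulas:

```python
import math

def bus_injections(i, V, theta, G, B):
    """Active and reactive injections P_i, Q_i from the AC power-balance equations."""
    n = len(V)
    P = sum(V[i] * V[j] * (G[i][j] * math.cos(theta[i] - theta[j])
                           + B[i][j] * math.sin(theta[i] - theta[j]))
            for j in range(n))
    Q = sum(V[i] * V[j] * (G[i][j] * math.sin(theta[i] - theta[j])
                           - B[i][j] * math.cos(theta[i] - theta[j]))
            for j in range(n))
    return P, Q

# Illustrative two-bus line with series admittance y = 1 - 5j per unit.
G = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-5.0, 5.0], [5.0, -5.0]]
# At a flat start (all voltages 1.0 p.u., all angles 0) no power flows.
P0, Q0 = bus_injections(0, [1.0, 1.0], [0.0, 0.0], G, B)
```

A power-flow solver iterates on V and θ until these computed injections match the scheduled generation and load at every bus.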
Illustratively, regarding the thermodynamic system model: in the embodiment of the invention, the thermodynamic system generates heat energy at a heat source, conveys it to the heat loads through the water supply pipelines, and after the heat loads cool the water it flows back through the return water pipelines, forming a closed loop; the thermodynamic system is divided into a hydraulic model and a thermodynamic model:
1) regarding the hydraulic model: the hydraulic model of the thermodynamic system represents the medium flow and consists of the flow continuity equation, the loop pressure equation and the head loss equation: A·m = m_q, B·h_f = 0, h_f = K·m·|m|;
in the formulas, A is the node-branch incidence matrix, B is the loop-branch incidence matrix, m is the pipeline mass flow rate, m_q is the node injection flow, h_f is the head pressure loss, and K is the damping coefficient of the pipe.
2) regarding the thermodynamic model: the thermodynamic model represents the energy transmission process and consists of the node power equation, the pipeline temperature drop equation and the node medium mixing equation: Φ = c_p·m_q·(T_s − T_o); T_end = (T_start − T_a)·exp(−λL/(c_p·m)) + T_a; (Σ m_out)·T_out = Σ (m_in·T_in);
in the formulas, Φ is the injection thermal power of the node, c_p is the specific heat capacity of water, T_s and T_o are respectively the supply water temperature and the outlet water temperature of the node, the subscripts denote the heat supply network pipeline branch whose head-end node is the given node, T_end is the temperature at the end of the branch pipe, and T_a denotes the ambient temperature.
State space of each agent: for the intelligent state space of the power system, the intelligent state space comprises an electric load, the power generation power of the cogeneration device with the last time section, the maximum wind power output and the conventional unit output with the last time section; for the intelligent state space of the thermodynamic system, the intelligent state space comprises a heat load, the heat generation power of the heat and power cogeneration device with the last time section and the ambient temperature;
action space of each agent: the power system intelligent body motion space comprises conventional unit generating power, cogeneration generating power and wind power generating power; the heat and power cogeneration power is included for the thermodynamic system agent action space.
Reward and punishment mechanism of each agent: for the intelligent agent of the power system, the reward function comprises the operation cost of a conventional unit, a wind abandoning punishment and a variable out-of-limit punishment; for thermodynamic system agents, the reward function includes the cogeneration unit operating cost and the variable violation penalty.
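The per-agent state and action spaces just described can be sketched as plain containers; the field names are illustrative labels for the quantities listed above, not identifiers from the patent:

```python
from dataclasses import dataclass

@dataclass
class PowerAgentState:          # local observations of the power system agent
    electric_load: float
    chp_power_prev: float       # CHP electric output, previous time section
    wind_max: float             # maximum wind power output
    gen_power_prev: float       # conventional unit output, previous time section

@dataclass
class HeatAgentState:           # local observations of the thermodynamic agent
    heat_load: float
    chp_heat_prev: float        # CHP heat output, previous time section
    ambient_temp: float

@dataclass
class PowerAgentAction:
    gen_power: float
    chp_power: float
    wind_power: float

@dataclass
class HeatAgentAction:
    chp_heat: float
```

Keeping the two observation types disjoint makes explicit that each actor network consumes only its own system's local information.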
According to the method, the electric heating combined system is divided into a power system agent and a thermodynamic system agent, and the agents cooperate to achieve the overall optimization target of the system; the reinforcement learning action and state spaces are divided in combination with the electric heating combined system scheduling model, and a reward and punishment mechanism is established for each agent, so that each agent can complete its own policy calculation using only local state information, solving the problem that data of different stakeholders are difficult to share.
Preferably, in an embodiment of the present invention, the obtaining step of the pre-trained multi-agent deep reinforcement learning model includes:
acquiring sample operation parameters of the electric heating combined system to be optimally operated, and initializing the system state of the electric heating combined system; the operating parameters include: electric load power, generator capacity, wind power forecast power, wind curtailment coefficient, node voltage constraints, unit climbing constraints, heat load power, ambient temperature, node temperature constraints, and pipe flow constraints.
At each scheduling period in the scheduling cycle, each agent i selects an action a_i = μ_i(s_i) + N_i; the action acts on the real system, and the agent observes the immediate reward r_i and the new state s_i'; the tuple (s_i, a_i, r_i, s_i') is stored in the experience replay unit and the state is updated; a batch of samples is then randomly drawn from the replay unit, the target value y_i = r_i + γ·Q_i'(s', a_1', a_2') is calculated, and the critic network is updated according to the loss function shown in formula (8): L(θ_i^Q) = E[(y_i − Q_i(s, a_1, a_2))^2] (8);
the actor network is updated according to the loss function shown in formula (9): ∇J(θ_i^μ) = E[∇_{a_i} Q_i(s, a_1, a_2)·∇_{θ_i^μ} μ_i(s_i)] (9);
the target network parameters of each agent are soft-updated as θ_i' ← τ·θ_i + (1 − τ)·θ_i'.
and repeating the training process until convergence to obtain the trained reinforcement learning model.
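The per-agent experience replay unit in the training procedure above can be sketched as follows; the capacity, seed and tuple layout are illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Per-agent experience replay unit storing (s, a, r, s') tuples."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)   # oldest experiences drop out first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Uniform random minibatch without replacement."""
        return self.rng.sample(list(self.buf), min(batch_size, len(self.buf)))

# One episode's worth of toy transitions, then a training minibatch:
# for t in scheduling_periods:                (sketch of the surrounding loop)
#     a_i = actor_i(s_i) + noise_i            # per agent
#     r_i, s_i_next = env.step(a_1, a_2)
#     buffer_i.push(s_i, a_i, r_i, s_i_next)
#     update critic_i and actor_i on buffer_i.sample(B); soft-update targets
buf = ReplayBuffer()
for t in range(5):
    buf.push(t, 0.1 * t, -float(t), t + 1)
batch = buf.sample(3)
```

Sampling uniformly from a large buffer breaks the temporal correlation between consecutive scheduling periods, which is what makes the bootstrapped updates in formulas (8) and (9) stable.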
In the method provided by the embodiment of the invention, based on the multi-agent deep deterministic policy gradient algorithm framework, an electric heating combined system optimal scheduling model based on multi-agent actor-critics is constructed; convergence is stable and the spatial exploration capability is strong, which can overcome the defect that existing traditional methods easily fall into a local optimal solution during solving.
Example 4
Referring to fig. 2 to 7, an optimized operation method of an electric heating combined system according to an embodiment of the present invention includes the following steps:
TABLE 1 Input parameter table
Step 2, establishing an optimal scheduling model of the electric heating combined system
Step 201, respectively establishing energy flow models of the power system and the thermodynamic system.
For the power system model, the invention takes AC power flow as the analysis method for the power system, and the power balance equations of the power system are expressed as P_i = V_i·Σ_j V_j·(G_ij·cos θ_ij + B_ij·sin θ_ij) and Q_i = V_i·Σ_j V_j·(G_ij·sin θ_ij − B_ij·cos θ_ij);
in the formulas, P_i and Q_i are respectively the active and reactive power injected at node i, V_i is the voltage amplitude of node i, G_ij and B_ij are respectively the conductance and susceptance of branch ij, and θ_ij is the phase angle difference of branch ij.
For a thermodynamic system model, the thermodynamic system in the embodiment of the invention generates heat energy at a heat source, the heat energy is conveyed to a heat load through a water conveying pipeline, and the heat energy is cooled by the heat load and then flows back through a water return pipeline to form a closed loop. The thermodynamic system is divided into a hydraulic model and a thermodynamic model:
1) The hydraulic model. The hydraulic model of the thermodynamic system represents the medium flow and consists of the flow continuity equation, the loop pressure equation and the head loss equation: A·m = m_q, B·h_f = 0, h_f = K·m·|m|;
in the formulas, A is the node-branch incidence matrix, B is the loop-branch incidence matrix, m is the pipeline mass flow rate, m_q is the node injection flow, h_f is the head pressure loss, and K is the damping coefficient of the pipe.
2) The thermodynamic model. The thermodynamic model represents the energy transmission process and consists of the node power equation, the pipeline temperature drop equation and the node medium mixing equation: Φ = c_p·m_q·(T_s − T_o); T_end = (T_start − T_a)·exp(−λL/(c_p·m)) + T_a; (Σ m_out)·T_out = Σ (m_in·T_in);
in the formulas, Φ is the injection thermal power of the node, c_p is the specific heat capacity of water, T_s and T_o are respectively the supply water temperature and the outlet water temperature of the node, the subscripts denote the heat supply network pipeline branch whose head-end node is the given node, T_end is the temperature at the end of the branch pipe, and T_a denotes the ambient temperature.
Step 202, establishing the system optimization objective. To minimize the comprehensive objective of power system and heat supply network operating cost while maximizing new energy accommodation, the expression is min F = F1 + F2 + F3,
in the formula, F1 is the operating cost of the conventional units, F2 is the operating cost of the cogeneration units, and F3 is the wind curtailment penalty.
In the embodiment of the invention, the calculation expression of the operating cost of the conventional units is F1 = Σ_{t=1}^{T} Σ_{i=1}^{N} (a_i·P_{i,t}^2 + b_i·P_{i,t} + c_i)·Δt,
in the formula, a_i, b_i and c_i are the energy consumption coefficients of the conventional unit, P_{i,t} is the output of the conventional unit, N is the number of conventional units, T is the scheduling period, and Δt is the scheduling time interval.
In the embodiment of the invention, the calculation expression of the operating cost of the cogeneration units is F2 = Σ_{t=1}^{T} Σ_{j=1}^{M} C_j(P_{j,t}^{chp}, H_{j,t}^{chp})·Δt,
in the formula, C_j is the energy consumption coefficient function of the cogeneration unit, M is the number of cogeneration units, and P_{j,t}^{chp} and H_{j,t}^{chp} are respectively the electric and heat output of the cogeneration unit.
In the embodiment of the invention, the calculation expression of the wind curtailment penalty is F3 = λ·Σ_{t=1}^{T} (P_t^{w,pre} − P_t^{w}),
in the formula, λ is the wind curtailment penalty coefficient, and P_t^{w,pre} − P_t^{w} is the difference between the predicted wind power and the actual wind power.
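The cost components F1 and F3 above translate directly into code; the list layouts and coefficient values below are illustrative assumptions:

```python
def conventional_cost(outputs, a, b, c, dt=1.0):
    """F1: quadratic energy-cost curve summed over periods t and units i.
    outputs[t][i] is the output of unit i in scheduling period t."""
    return sum((a[i] * p ** 2 + b[i] * p + c[i]) * dt
               for period in outputs
               for i, p in enumerate(period))

def curtailment_penalty(p_forecast, p_actual, lam):
    """F3: penalty on the gap between forecast and delivered wind power per period."""
    return lam * sum(max(0.0, f - g) for f, g in zip(p_forecast, p_actual))
```

F2 has the same double-sum structure with the cogeneration cost function substituted for the quadratic curve.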
Step 203, establishing a constraint condition based on safe operation:
1) network security constraints
In order to realize safe and reliable operation of the combined electric-heat system, the power network must satisfy the voltage constraints, the node temperatures of the thermal network must remain within the specified ranges, and the mass flow rates of the heat-pipe pipelines must remain within their limits.
V_i3,min ≤ V_i3 ≤ V_i3,max, T_sj,min ≤ T_sj ≤ T_sj,max, m_jk,min ≤ m_jk ≤ m_jk,max; in the formula, V_i3 represents the voltage magnitude of power-network node i3, and V_i3,max, V_i3,min are respectively its upper and lower limits; T_sj is the temperature of the hot water flowing into heat-network node j, and T_sj,max, T_sj,min are the upper and lower limits of the water-supply temperature; m_jk is the mass flow rate of the hot-water pipeline between heat-network nodes j and k, and m_jk,max, m_jk,min are respectively its upper and lower limits.
2) Cogeneration unit constraints
The cogeneration units in the embodiment of the invention are the extraction-condensing units in common domestic use. Their operating points lie inside a polygonal region, and the electric and heat outputs can be represented by constraints of the following form:

in the formula, the quantities are respectively the electric power and heat power generated by the i-th extraction-condensing unit in period t, the upper and lower limits of the electric output, and the polygon-region coefficients, which are constant for a given cogeneration unit.
The cogeneration units should also satisfy the ramping constraint:

in the formula, the quantities are respectively the cogeneration powers of two consecutive periods and the upper and lower limits of the ramp rate of the cogeneration device.
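The polygonal operating region and the ramping constraint can be checked as follows. The half-plane encoding and the numeric region are assumptions for illustration; the patent does not disclose the actual polygon data.

```python
# Hedged sketch: the extraction-condensing unit's feasible (P, H) operating
# region is a convex polygon, written as an intersection of half-planes.

def in_polygon_halfplanes(p, h, halfplanes):
    """Each half-plane (alpha1, alpha2, alpha3) encodes alpha1*p + alpha2*h <= alpha3."""
    return all(a1 * p + a2 * h <= a3 + 1e-9 for a1, a2, a3 in halfplanes)

def ramp_ok(p_now, p_prev, ramp_up, ramp_down):
    """Ramping constraint between two consecutive scheduling periods."""
    return -ramp_down <= p_now - p_prev <= ramp_up

# Assumed polygon: 20 <= P <= 120 and P + 0.5*H <= 150
region = [(-1, 0, -20), (1, 0, 120), (1, 0.5, 150)]
```

A dispatch candidate is feasible only if both checks pass for every cogeneration unit and period.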
3) Renewable energy constraints

In the formula, the quantities are respectively the power generated by wind turbine i in period t and the maximum available output of the wind turbine; the scheduled wind power may not exceed this maximum.
4) Conventional unit output constraints

The conventional units must satisfy the output limits and the ramping constraint simultaneously:

in the formula, the quantities are respectively the generating power of the conventional unit, the upper and lower limits of the unit output, and the upper and lower limits of the unit ramp rate.
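A dispatch action can be kept inside both the output limits and the ramping limits by a simple projection onto their intersection; the limit values in the usage line are hypothetical.

```python
# Sketch of projecting a commanded power value onto the conventional unit's
# output and ramping limits before it is applied to the system.

def project_output(p_cmd, p_prev, p_min, p_max, ramp_up, ramp_down):
    """Clip commanded power to the intersection of output and ramp bounds."""
    lo = max(p_min, p_prev - ramp_down)  # cannot ramp down below this
    hi = min(p_max, p_prev + ramp_up)    # cannot ramp up above this
    return min(max(p_cmd, lo), hi)

# Assumed limits: output 20..150 MW, ramp 30 MW per period, previous output 100 MW.
p_next = project_output(p_cmd=200, p_prev=100, p_min=20, p_max=150, ramp_up=30, ramp_down=30)
```

Such a projection is one common way to make a continuous reinforcement-learning action respect hard unit constraints.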
Step 3, construct the optimal dispatch model based on the multi-agent deep deterministic policy gradient. According to the five basic elements of the reinforcement-learning model (environment, state, action, reward and agent), an optimal dispatch model based on the multi-agent deep deterministic policy gradient is established in combination with the dispatch model of the combined electric-heat system.
Step 301, construct the action space and the state space

A power-system agent and a thermodynamic-system agent are respectively constructed from the obtained power-system parameters and thermodynamic-system parameters, and the action space and the state space are divided between the power-system agent and the heating-system agent.

Preferably, the action-space variables correspond to the control variables of the system under study. The generating power of the conventional units, the electric power of the cogeneration units and the wind-power generation are taken as the action variables of the power-system agent; the action variable of the thermodynamic system is the heat output of the cogeneration units, namely:
the state space variables correspond to the state variables of the system under study, reflecting the overall and true physical state of the entire system.
Preferably, the state space of the power-system agent is selected as the electric load, the generating power of the cogeneration units, the maximum wind-power output and the output of the conventional units.

The state space of the thermodynamic-system agent comprises the heat load, the heat power produced by the cogeneration units and the ambient temperature.
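The division of action and state variables between the two agents, and the mapping between physical action ranges and the normalized interval used by the actor output, can be sketched as follows. The variable names and bounds are illustrative, mirroring the division described above.

```python
# Illustrative division of the action and state spaces between the two agents.
power_agent = {
    "actions": ["P_conventional", "P_chp_electric", "P_wind"],
    "states":  ["electric_load", "P_chp_electric", "P_wind_max", "P_conventional"],
}
heat_agent = {
    "actions": ["H_chp"],
    "states":  ["heat_load", "H_chp", "ambient_temperature"],
}

def scale_to_unit(x, lo, hi):
    """Map a physical action value into [-1, 1], matching a tanh actor output."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def unscale(u, lo, hi):
    """Map an actor output in [-1, 1] back to physical units."""
    return lo + (u + 1.0) * (hi - lo) / 2.0
```

The scaling pair lets each agent's policy work in a normalized action range while the environment receives physical quantities.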
Step 302, build the reinforcement-learning environment based on the energy-flow model of the combined electric-heat system, formulas (11)-(13). At each time period, each agent's policy interacts with the environment, completing the state-transition process and obtaining the system's reward feedback.
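The interaction of step 302 can be sketched as a joint environment whose step function returns per-agent next states and rewards. The toy transition and reward below are pure placeholders standing in for the energy-flow model of formulas (11)-(13).

```python
# Minimal skeleton of the agent-environment interaction: at each scheduling
# step every agent's action acts on the shared environment, which returns
# the next states and per-agent rewards.

class JointEnv:
    def __init__(self):
        self.t = 0

    def step(self, actions):
        self.t += 1
        next_states = {name: [float(self.t)] for name in actions}    # placeholder transition
        rewards = {name: -abs(a[0]) for name, a in actions.items()}  # placeholder reward
        done = self.t >= 24  # one scheduling cycle of 24 periods
        return next_states, rewards, done

env = JointEnv()
states, rewards, done = env.step({"power": [0.5], "heat": [-0.2]})
```

In the real model, the transition would solve the electric and thermal energy-flow equations and the rewards would be the agents' cost and penalty terms.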
Step 303, respectively establishing a reward and punishment mechanism of the power system agent and the thermodynamic system agent, and judging the quality of the action amount based on the reward and punishment mechanism, specifically comprising the following steps:
(1) Establish the reinforcement-learning reward function.
For the power-system agent, the reward function comprises the conventional-unit operating cost, the wind-curtailment penalty and the variable out-of-limit penalties.

In the formula, f1, f3 are the power-system operating cost and the wind-curtailment penalty; φV is the system node-voltage out-of-limit penalty term; the remaining terms are respectively the cogeneration-unit output out-of-limit penalty, the cogeneration-unit ramping out-of-limit penalty, the conventional-unit output out-of-limit penalty and the conventional-unit ramping out-of-limit penalty.
(2) For the thermodynamic-system agent, the reward function comprises the cogeneration-unit operating cost and the variable out-of-limit penalties:

in the formula, the terms are respectively the cogeneration-unit output out-of-limit penalty, the cogeneration-unit ramping out-of-limit penalty, the system node-temperature out-of-limit penalty and the system pipeline mass-flow-rate out-of-limit penalty.
Finally, the sum of the agents' reward functions is used as the basis for evaluating the quality of each agent's actions, and the agents cooperate with each other to achieve the overall optimization objective of the combined electric-heat system.
Penalty terms of the following linear form are adopted for all of the above constraints:

in the formula, the coefficient is the penalty coefficient, and a corresponding coefficient is set for each type of out-of-limit penalty.
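A linear out-of-limit penalty of the kind described can be written as follows; the exact functional form and the coefficient value are assumptions consistent with the description, not the patent's disclosed formula.

```python
# One common linear out-of-limit penalty: zero inside the limits and growing
# linearly with the size of the violation outside them.

def linear_penalty(x, x_min, x_max, sigma):
    """Penalty sigma * (distance of x outside [x_min, x_max])."""
    violation = max(0.0, x - x_max) + max(0.0, x_min - x)
    return sigma * violation
```

Such a penalty gives the agent a gradient proportional to the violation, which is consistent with the observation below that a linear form fits better during training than a stepwise one.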
Step 304, construct the actor and critic networks.

Design the network structures of the reinforcement-learning actor and critic; different network structures are adopted for the policy network and the value-function network. The evaluation network and the target network share the same network form, consisting of an input layer, hidden layers and an output layer. The actor network has 4 hidden layers with 512, 256, 64 and 32 neurons in sequence; the critic network has 3 hidden layers with 128, 128 and 32 neurons in sequence. To prevent the learning efficiency of the neural network from degrading due to vanishing gradients, a leaky rectified linear unit (leaky ReLU) is adopted as the activation function of the hidden layers; the activation function of the actor output layer is set to tanh, limiting the action output to [-1, 1], and Adam is selected as the optimization algorithm.
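The actor and critic structures described above can be sketched in PyTorch (the framework is an assumption; the patent does not name one):

```python
# Sketch of the actor and critic networks with the layer sizes given above:
# actor hidden layers 512-256-64-32 with leaky-ReLU activations and a tanh
# output bounding actions to [-1, 1]; critic hidden layers 128-128-32.

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        sizes = [state_dim, 512, 256, 64, 32]
        layers = []
        for i, o in zip(sizes, sizes[1:]):
            layers += [nn.Linear(i, o), nn.LeakyReLU()]
        layers += [nn.Linear(32, action_dim), nn.Tanh()]  # actions in [-1, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        sizes = [state_dim + action_dim, 128, 128, 32]
        layers = []
        for i, o in zip(sizes, sizes[1:]):
            layers += [nn.Linear(i, o), nn.LeakyReLU()]
        layers += [nn.Linear(32, 1)]  # scalar state-action value
        self.net = nn.Sequential(*layers)

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

# Dimensions follow the state/action division above (power-system agent shown).
actor = Actor(state_dim=4, action_dim=3)
critic = Critic(state_dim=4, action_dim=3)
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)  # Adam, as selected above
```

The critic takes the state and action jointly, matching its role of scoring the action the actor takes in a given state.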
Step 4, multi-agent deep reinforcement-learning network training: the following steps are executed repeatedly, up to the set maximum number of training iterations, to update the reinforcement-learning networks.
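The soft target-network update used in this training loop, theta_target <- tau*theta + (1 - tau)*theta_target, can be sketched with parameters modeled as plain lists of floats:

```python
# Sketch of the soft target-network update: each target parameter slowly
# tracks its estimation counterpart, which stabilizes training.

def soft_update(est_params, tgt_params, tau=0.01):
    """Return updated target parameters tau*est + (1 - tau)*tgt, elementwise."""
    return [tau * e + (1.0 - tau) * t for e, t in zip(est_params, tgt_params)]

target = soft_update(est_params=[1.0, 2.0], tgt_params=[0.0, 0.0], tau=0.1)
```

In the embodiment this update is applied to the target actor and target critic parameters of both agents after each learning step.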
In an optional technical scheme of the embodiment of the invention, in step 3 a stepwise form may be used in place of the linear form for the out-of-limit penalty terms, but in practice the stepwise penalty fits poorly, while the linear penalty achieves a better fit during training; in step 3 the reward function may also be used without an information-entropy regularization term, but the convergence process of the algorithm is then likely to be unstable; in step 4 the training may use the stochastic gradient descent method (SGD) in place of Adam (adaptive moment estimation), but practice shows that the Adam algorithm performs better.
In summary, for the optimization problem of the combined electric-heat system, conventional methods struggle with the solution difficulty caused by growing system scale and with the information barriers between different stakeholders, so an optimization method with stronger solving capability and broader applicability is required. The embodiment therefore solves the optimal-operation problem of the combined electric-heat system with a multi-agent deep deterministic policy gradient method. Reinforcement learning built on a multi-agent framework effectively solves the sequential decision problem of the continuous control process, avoids the drawbacks of a discrete action space, reduces the difficulty of high-dimensional training and adapts better to a dynamic environment; because each agent relies only on local information when executing its policy, the difficulty of sharing data between different stakeholders is overcome, thereby achieving coordinated multi-energy optimal dispatch of the combined electric-heat system.
In the method provided by the embodiment of the invention, an optimal-operation method of the combined electric-heat system is constructed based on the multi-agent deep deterministic policy gradient, mainly to solve the following technical problems of traditional models:

(1) the high-dimensional, nonlinear, non-convex problem faced by traditional models as the system scale grows: the multi-agent deep reinforcement-learning method greatly reduces the running time so as to meet the requirements of on-line calculation;

(2) the large dispatch-result errors caused by the linearizations that traditional methods introduce to simplify calculation;

(3) the difficulty of sharing data between different stakeholders: under the multi-agent reinforcement-learning framework, each agent completes its calculation relying only on local information while executing its policy.
Compared with the prior art, the technical scheme of the embodiment of the invention has at least the following beneficial effects:

(1) the invention solves the combined electric-heat optimization problem with reinforcement learning, which improves the generation speed of the control strategy while preserving calculation quality, overcoming the drawback of traditional methods whose calculation time grows with the system scale and cannot meet on-line requirements;

(2) based on the multi-agent deep deterministic policy gradient algorithm framework, an optimal dispatch model of the combined electric-heat system built on multi-agent actor-critic networks converges stably and explores the solution space strongly, overcoming the tendency of traditional methods to fall into local optima;

(3) the combined electric-heat system is divided into a power-system agent and a thermodynamic-system agent that cooperate to achieve the system-wide optimization objective; by dividing the reinforcement-learning action and state spaces according to the dispatch model of the combined electric-heat system and establishing a reward-and-penalty mechanism for each agent, each agent can complete its own strategy calculation from its local state information alone, solving the difficulty of sharing data between different stakeholders.
The following are embodiments of the apparatus of the present invention, which may be used to perform embodiments of the method of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
In another embodiment of the present invention, an optimized operation system of an electric heating combination system is provided, which includes:
the parameter acquisition module is used for acquiring state parameters of the electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
the action quantity acquisition module is used for inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting action quantities through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
and the optimized operation module is used for realizing the optimized operation of the electric heating combined system based on the action quantity.
In the system of the embodiment of the invention, a reinforcement-learning method is adopted to solve the combined electric-heat optimization problem; it effectively solves the sequential decision problem of the continuous control process, avoids the drawbacks of a discrete action space, reduces the difficulty of high-dimensional training, adapts better to a dynamic environment, and offers high model precision and fast solving. A multi-agent deep reinforcement-learning framework is adopted, introducing an objective function that minimizes the system operating cost and an agent reward mechanism for the combined electric-heat system constructed from the safety constraints, with stable convergence, strong space exploration and good model adaptability. An optimal dispatch model based on a multi-agent actor-critic framework is established in combination with the dispatch model of the combined electric-heat system; during execution each agent can complete its own strategy calculation from its local state information alone, which solves the difficulty of sharing information between different stakeholders and gives the model wide applicability.
In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing core and control core of the terminal, specifically adapted to load and execute one or more instructions in a computer storage medium to implement the corresponding method flow or function. The processor provided by the embodiment of the invention can be used to perform the optimized operation method of the combined electric-heat system.
In yet another embodiment of the present invention, a storage medium is provided, specifically a computer-readable storage medium (memory), which is a memory device in a computer device used to store programs and data. It is understood that the computer-readable storage medium here can include both the built-in storage medium of the computer device and any extended storage medium the computer device supports. The computer-readable storage medium provides a storage space storing the operating system of the terminal. One or more instructions, which may be one or more computer programs (including program code), are also stored in the storage space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The one or more instructions stored in the computer-readable storage medium may be loaded and executed by a processor to perform the corresponding steps of the optimized operation method of the combined electric-heat system in the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
Claims (12)
1. An optimized operation method of an electric-heat combined system is characterized by comprising the following steps: acquiring state parameters of an electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model, and outputting the action quantity through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
realizing the optimized operation of the electric heating combined system based on the action quantity;
wherein, in the multi-agent deep reinforcement learning model,
the intelligent agents comprise an electric power system intelligent agent and a thermal system intelligent agent;
the environment includes mathematical models of power system and thermodynamic system energy flows;
the action space of each agent comprises an electric power system agent action space and a thermal system agent action space; the intelligent action space of the power system comprises conventional unit generating power, cogeneration device generating power and wind power generation power; the thermodynamic system intelligent body action space comprises heat generation power of a cogeneration device;
the state space of each intelligent agent comprises an electric power system intelligent agent state space and a thermodynamic system intelligent agent state space; the state space of the intelligent body of the power system comprises an electric load, the power generation power of the current cogeneration device, the maximum wind power output and the output of the current conventional unit; the intelligent state space of the thermodynamic system comprises a heat load, the heat generation power of the current cogeneration device and the ambient temperature;
the reward function of each intelligent agent comprises an electric power system intelligent agent reward function and a thermal system intelligent agent reward function; the power system intelligent agent reward function comprises a conventional unit operation cost, a wind curtailment penalty and a variable out-of-limit penalty; the thermodynamic system intelligent agent reward function comprises the operation cost of the cogeneration device and a variable out-of-limit penalty.
2. The method of claim 1, wherein the power system agent and the thermal system agent each comprise a respective actor network and arbiter network;

the actor network is used for inputting a state set sensed by the agent from the environment and outputting the action of the agent in a given state; the arbiter network is used for generating a state value function according to the state of the agent and the action of the agent in the state, and evaluating the quality of the current action taken by the actor network; the actor network and the arbiter network both adopt a dual-network structure, comprising an estimation network and a target network of identical structure; in the training process, the actor estimation-network parameters and the arbiter estimation-network parameters of each agent are updated, and the trained estimation-network parameters are used to softly update the target networks.
3. The optimized operation method of an electric-heating combined system according to claim 2, wherein the step of updating the actor and arbiter estimation-network parameters of each agent in the training process, and softly updating the target networks with the trained estimation-network parameters, specifically comprises:

in each scheduling period of the scheduling cycle, selecting an action a1=μθ1(s1)+ξt1 for the power system agent and an action a2=μθ2(s2)+ξt2 for the thermodynamic system agent; in the formula, s1, s2 respectively represent the current states observed by the power system agent and the thermal system agent, μθ1, μθ2 respectively represent the current policies of the power system agent and thermodynamic system agent actor networks, and ξt1, ξt2 are respectively the random noises added to the policy actions of the power system agent and the thermodynamic system agent;

storing (s1,a1,r1,s′1) in the power system agent experience replay unit and (s2,a2,r2,s′2) in the thermodynamic system agent experience replay unit; wherein r1 and s′1 are respectively the instant reward and updated state observed by the power system agent after the joint action a=(a1,a2) acts on the real-time system, and r2 and s′2 are respectively the instant reward and updated state for the thermodynamic system agent;
from power system agent experiencesPlayback unit random samplingComputingUpdating a discriminator estimated network parameter theta of an agent of an electric power system according to a first loss function1 μThe first loss function is expressed as,in the formula,a state value function of the evaluation network is evaluated for the power system agent arbiter,function of state values, K, for the power system agent arbiter target network1The number of all sub-strategies in the strategy;
updating the power system agent's actor estimated network parameter θ according to a second loss function1 QAnd the second loss function is expressed as,the target actuator network parameter of the intelligent agent of the soft updating power system and the target discriminator network parameter expression are theta1′μ←τθ1 μ+(1-τ)θ1′μ,θ1′Q←τθ1 Q+(1-τ)θ1′QIn the formula, theta1′μ、θ1′QNetwork parameters of an intelligent agent target actuator and a target discriminator of the power system are respectively;
random sampling from thermodynamic system intelligent agent experience playback unitComputingUpdating the network parameter θ of the thermal system agent's arbiter estimate according to the third loss function2 μThe expression of the third loss function is,in the formula,a state value function of the evaluation network is evaluated for the thermal system agent arbiter,function of the state value of the target network of the arbiter of the thermodynamic system2The number of all sub-strategies in the strategy; updating the actuator estimated network parameter theta of the thermal system agent according to the fourth loss function2 QThe expression of the fourth loss function is,the expression of the target actuator network parameter and the target discriminator network parameter of the intelligent agent of the soft updating thermodynamic system is theta2′μ←τθ2 μ+(1-τ)θ2′μ,θ2′Q←τθ2 Q+(1-τ)θ2′QIn the formula, theta2′μ、θ2′QRespectively are network parameters of an intelligent agent target actuator and a target discriminator of the thermodynamic system.
4. The method of claim 3, wherein the mathematical models of power system and thermodynamic system power flows comprise:
system optimization objectiveThe expression is that min F ═ F1+f2+f3,
In the formula (f)1For the running cost of a conventional unit, f2For the running cost of the cogeneration unit, f3Punishment is carried out for wind abandonment;
in the formula, b0、b1、b2Is an energy consumption coefficient of a conventional unit,is the output of a conventional unit, NGThe number of the conventional units is T, a scheduling period is T, and delta T is a scheduling time interval;
in the formula, a0, a1, a2, a3, a4, a5 are the energy consumption coefficients of the cogeneration unit, Nchp is the number of cogeneration units, and the remaining quantities are respectively the electricity and heat output of the cogeneration unit;
in the formula, k is the wind curtailment penalty coefficient, and the remaining quantity is the difference between the predicted wind power and the actual power;
the network security constraints, expressed as,
in the formula, Vi3Representing nodes i of an electric power network3Amplitude of voltage, Vi3,max、Vi3,minAre respectively node i3Upper and lower limits of voltage amplitude; t issjTo the temperature of the hot water flowing into the heat network node j,the upper limit and the lower limit of the water supply temperature are set; m isjkIs the mass flow rate, m, of the hot water pipeline between the node j and the node k of the heat supply networkjk,max、mjk,minRespectively as its upper and lower limits;
the cogeneration unit is constrained, as expressed,
in the formula, the quantities are respectively the electric output and heat output of the i-th extraction-condensing unit in period t, the upper and lower limits of the electric output, and the polygon-region coefficients α1, α2, α3;
the climbing of the cogeneration device is restricted by the expression,
in the formula,the cogeneration power of the front and the back two periods respectively, respectively is the upper limit and the lower limit of the climbing speed of the cogeneration device;
the renewable energy source is restricted, and the expression is,
in the formula, the quantities are respectively the power generated by wind turbine i in period t and the maximum output value of the wind turbine;
the output constraint of the conventional unit is represented by the following expression,
the climbing of the conventional unit is restrained, the expression is,
5. The method of claim 1, wherein the power system agent reward function is expressed as,

in the formula, f1, f3 are respectively the power system running cost and the wind curtailment penalty; φV is the system node-voltage out-of-limit penalty term; the remaining terms are respectively the output out-of-limit penalty term of the cogeneration unit, the climbing out-of-limit penalty term of the cogeneration device, the output out-of-limit penalty term of the conventional unit, and the climbing out-of-limit penalty term of the conventional unit;
the expression of the thermodynamic system agent reward function is,
6. An optimized operation system of an electric-heat combined system is characterized by comprising:
the parameter acquisition module is used for acquiring state parameters of the electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
the action quantity acquisition module is used for inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting action quantities through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
the optimized operation module is used for realizing the optimized operation of the electric heating combined system based on the action quantity;
wherein, in the multi-agent deep reinforcement learning model of the action quantity acquisition module,
the intelligent agents comprise an electric power system intelligent agent and a thermal system intelligent agent;
the environment includes mathematical models of power system and thermodynamic system energy flows;
the action space of each agent comprises an electric power system agent action space and a thermal system agent action space; the intelligent action space of the power system comprises conventional unit generating power, cogeneration device generating power and wind power generation power; the thermodynamic system intelligent body action space comprises heat generation power of a cogeneration device;
the state space of each intelligent agent comprises an electric power system intelligent agent state space and a thermodynamic system intelligent agent state space; the state space of the intelligent body of the power system comprises an electric load, the power generation power of the current cogeneration device, the maximum wind power output and the output of the current conventional unit; the intelligent state space of the thermodynamic system comprises a heat load, the heat generation power of the current cogeneration device and the ambient temperature;
the reward function of each intelligent agent comprises an electric power system intelligent agent reward function and a thermal system intelligent agent reward function; the power system intelligent agent reward function comprises a conventional unit operation cost, a wind curtailment penalty and a variable out-of-limit penalty; the thermodynamic system intelligent agent reward function comprises the operation cost of the cogeneration device and a variable out-of-limit penalty.
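For illustration, the state and action spaces recited in claim 6 can be sketched as plain data structures. This is an illustrative sketch only; all field names and numeric values are hypothetical and not part of the claims.

```python
from dataclasses import dataclass

# Hypothetical containers mirroring the state and action spaces of the
# two agents described in claim 6; field names are illustrative.

@dataclass
class PowerAgentState:
    electric_load: float          # current electrical load (MW)
    chp_electric_power: float     # current CHP electric output (MW)
    wind_max_output: float        # wind power maximum available output (MW)
    conventional_output: float    # current conventional-unit output (MW)

@dataclass
class PowerAgentAction:
    conventional_power: float     # conventional-unit generation (MW)
    chp_electric_power: float     # CHP electric generation (MW)
    wind_power: float             # wind generation actually dispatched (MW)

@dataclass
class HeatAgentState:
    heat_load: float              # current thermal load (MW)
    chp_heat_power: float         # current CHP heat output (MW)
    ambient_temperature: float    # ambient temperature (deg C)

@dataclass
class HeatAgentAction:
    chp_heat_power: float         # CHP heat generation (MW)

s1 = PowerAgentState(320.0, 80.0, 150.0, 200.0)
a1 = PowerAgentAction(210.0, 90.0, 140.0)
print(s1.electric_load, a1.wind_power)
```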
7. The optimized operation system of an electric-heat combined system according to claim 6, wherein, in the action quantity acquisition module, the power system agent and the thermal system agent each comprise a respective actor network and a respective critic network;
the actor network takes as input the state set the agent perceives from the environment and outputs the agent's action in the given state; the critic network generates a state-action value function from the agent's state and the agent's action in that state, and evaluates the quality of the action currently taken by the actor network; both the actor network and the critic network adopt a double-network structure, each comprising an estimation network and a target network of identical structure; in the training process, the actor estimation network parameters and the critic estimation network parameters of all agents are updated, and the trained estimation network parameters are used to softly update the target networks.
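The double-network (estimation/target) structure and soft update described in claim 7 can be sketched with simple linear maps standing in for the neural networks. This is an illustrative sketch only; the dimensions, names, and linear stand-ins are assumptions, not the patented implementation.

```python
import numpy as np

# Minimal numpy sketch of claim 7: each agent holds an actor and a
# critic, each with an "estimation" network and a structurally identical
# "target" network. Linear maps stand in for the neural networks.

rng = np.random.default_rng(0)

def make_net(n_in, n_out):
    return {"W": rng.normal(size=(n_out, n_in)) * 0.1}

def forward(net, x):
    return net["W"] @ x

def soft_update(target, estimate, tau=0.01):
    # target <- tau * estimate + (1 - tau) * target
    target["W"] = tau * estimate["W"] + (1.0 - tau) * target["W"]

state_dim, action_dim = 4, 3
actor_est = make_net(state_dim, action_dim)         # actor: state -> action
actor_tgt = {"W": actor_est["W"].copy()}            # same structure as the estimator
critic_est = make_net(state_dim + action_dim, 1)    # critic: (state, action) -> value
critic_tgt = {"W": critic_est["W"].copy()}

s = rng.normal(size=state_dim)
a = forward(actor_est, s)                           # action in the given state
q = forward(critic_est, np.concatenate([s, a]))     # value of that action

# after a gradient step changes the estimation net, the target tracks slowly
actor_est["W"] += 0.05
soft_update(actor_tgt, actor_est, tau=0.1)
print(q.shape)  # (1,)
```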
8. The optimized operation system of an electric-heat combined system according to claim 7, wherein, in the action quantity acquisition module, the step of updating the actor estimation network parameters and the critic estimation network parameters of each agent in the training process, and softly updating the target networks with the trained estimation network parameters, specifically comprises:
at each scheduling period in the scheduling cycle, selecting an action a1 = μθ1(s1) + ξt1 for the power system agent and an action a2 = μθ2(s2) + ξt2 for the thermodynamic system agent; in the formula, s1 and s2 respectively represent the current states observed by the power system agent and the thermal system agent, μθ1 and μθ2 respectively represent the current policies of the power system agent and thermodynamic system agent actor networks, and ξt1 and ξt2 are respectively the random noises added to the policy actions of the power system agent and the thermodynamic system agent;
storing (s1, a1, r1, s′1) in the power system agent experience replay unit and (s2, a2, r2, s′2) in the thermodynamic system agent experience replay unit; wherein r1 and s′1 are respectively the instant reward and the updated state observed by the power system agent after the joint action a = (a1, a2) acts on the real-time system, and r2 and s′2 are respectively the instant reward and the updated state for the thermodynamic system agent;
randomly sampling a minibatch of experience tuples from the power system agent experience replay unit, computing the target value y1 = r1 + γQ1′(s′1, a1′, a2′), where γ is the discount factor and a1′, a2′ are given by the target actor networks, and updating the critic estimation network parameter θ1Q of the power system agent according to a first loss function, expressed as L(θ1Q) = (1/K1)Σ(y1 - Q1(s1, a1, a2|θ1Q))²; in the formula, Q1 is the state-action value function of the power system agent critic estimation network, Q1′ is the state-action value function of the power system agent critic target network, and K1 is the number of all sub-strategies in the strategy;
updating the actor estimation network parameter θ1μ of the power system agent according to a second loss function, the sampled policy gradient, expressed as ∇θ1μJ ≈ (1/K1)Σ∇θ1μμθ1(s1)∇a1Q1(s1, a1, a2|θ1Q)|a1=μθ1(s1);
softly updating the target actor network parameters and target critic network parameters of the power system agent as
θ1′μ ← τθ1μ + (1 - τ)θ1′μ, θ1′Q ← τθ1Q + (1 - τ)θ1′Q; in the formula, θ1′μ and θ1′Q are respectively the power system agent target actor and target critic network parameters, and τ is the soft update coefficient;
randomly sampling a minibatch of experience tuples from the thermodynamic system agent experience replay unit, computing the target value y2 = r2 + γQ2′(s′2, a1′, a2′), and updating the critic estimation network parameter θ2Q of the thermal system agent according to a third loss function, expressed as L(θ2Q) = (1/K2)Σ(y2 - Q2(s2, a1, a2|θ2Q))²; in the formula, Q2 is the state-action value function of the thermal system agent critic estimation network, Q2′ is the state-action value function of the thermal system agent critic target network, and K2 is the number of all sub-strategies in the strategy;
updating the actor estimation network parameter θ2μ of the thermal system agent according to a fourth loss function, the sampled policy gradient, expressed as ∇θ2μJ ≈ (1/K2)Σ∇θ2μμθ2(s2)∇a2Q2(s2, a1, a2|θ2Q)|a2=μθ2(s2);
softly updating the target actor network parameters and target critic network parameters of the thermodynamic system agent as θ2′μ ← τθ2μ + (1 - τ)θ2′μ, θ2′Q ← τθ2Q + (1 - τ)θ2′Q; in the formula, θ2′μ and θ2′Q are respectively the thermodynamic system agent target actor and target critic network parameters.
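One update iteration of the training procedure recited in claim 8 can be sketched as follows, again with linear stand-ins for the networks. This is a minimal illustration of the target-value and critic-loss computation; all sizes, coefficients, and names are hypothetical.

```python
import numpy as np

# Illustrative sketch of one MADDPG-style update step: sample a
# minibatch from one agent's replay buffer, form the target value with
# the target networks, and measure the critic loss
#   y = r + gamma * Q'(s', a') ;  L = mean((y - Q(s, a))^2).

rng = np.random.default_rng(1)
gamma, tau, K = 0.99, 0.01, 32            # discount, soft-update rate, batch size

# replay buffer of (s, a, r, s') tuples for one agent
buffer = [(rng.normal(size=4), rng.normal(size=2), rng.normal(), rng.normal(size=4))
          for _ in range(500)]

W_q = rng.normal(size=6) * 0.1            # critic estimation "network"
W_q_tgt = W_q.copy()                      # critic target network
W_mu_tgt = rng.normal(size=(2, 4)) * 0.1  # actor target network

idx = rng.choice(len(buffer), size=K, replace=False)   # random sampling
batch = [buffer[i] for i in idx]

loss = 0.0
for s, a, r, s_next in batch:
    a_next = W_mu_tgt @ s_next                                    # a' from actor target
    y = r + gamma * (W_q_tgt @ np.concatenate([s_next, a_next]))  # target value
    q = W_q @ np.concatenate([s, a])                              # critic estimate
    loss += (y - q) ** 2
loss /= K                                  # the "first loss function" of claim 8

W_q_tgt = tau * W_q + (1 - tau) * W_q_tgt  # soft update of the target network
print(loss >= 0.0)  # True
```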
9. The system according to claim 8, wherein the mathematical models of power system and thermodynamic system energy flows in the action quantity acquisition module comprise:
the system optimization objective, expressed as min F = f1 + f2 + f3;
in the formula, f1 is the operation cost of the conventional units, f2 is the operation cost of the cogeneration unit, and f3 is the wind curtailment penalty;
f1 = Σt=1..T Σi=1..NG (b0 + b1PG,i,t + b2PG,i,t²)Δt; in the formula, b0, b1, b2 are the energy consumption coefficients of the conventional units, PG,i,t is the output of conventional unit i in period t, NG is the number of conventional units, T is the scheduling period, and Δt is the scheduling time interval;
f2 = Σt=1..T Σi=1..Nchp (a0 + a1Pchp,i,t + a2Pchp,i,t² + a3Hchp,i,t + a4Hchp,i,t² + a5Pchp,i,tHchp,i,t)Δt; in the formula, a0, a1, a2, a3, a4, a5 are the energy consumption coefficients of the cogeneration unit, Nchp is the number of cogeneration units, and Pchp,i,t and Hchp,i,t are respectively the electric output and heat output of cogeneration unit i in period t;
f3 = k Σt=1..T Σi (Pw,i,tpre - Pw,i,t)Δt; in the formula, k is the wind curtailment penalty coefficient and Pw,i,tpre - Pw,i,t is the difference between the predicted wind power and the actual wind power;
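The objective min F = f1 + f2 + f3 with the quadratic cost forms suggested by the listed coefficients can be sketched as follows. All coefficient values and unit counts here are illustrative assumptions, not taken from the patent.

```python
# Sketch of the scheduling objective of claim 9: quadratic conventional-unit
# cost, coupled CHP cost, and a linear wind-curtailment penalty.

dt = 1.0  # scheduling interval (h), illustrative

def f1_conventional(P_G, b0=10.0, b1=2.0, b2=0.01):
    # quadratic fuel cost of conventional units over the scheduled periods
    return sum(b0 + b1 * p + b2 * p * p for p in P_G) * dt

def f2_chp(P, H, a=(5.0, 1.5, 0.01, 1.0, 0.008, 0.005)):
    # CHP cost coupling electric output P and heat output H
    a0, a1, a2, a3, a4, a5 = a
    return sum(a0 + a1 * p + a2 * p * p + a3 * h + a4 * h * h + a5 * p * h
               for p, h in zip(P, H)) * dt

def f3_curtailment(P_wind_forecast, P_wind_actual, k=50.0):
    # penalty on the gap between available and dispatched wind power
    return k * sum(pf - pa for pf, pa in zip(P_wind_forecast, P_wind_actual)) * dt

F = (f1_conventional([200.0, 180.0])
     + f2_chp([90.0, 95.0], [60.0, 70.0])
     + f3_curtailment([150.0, 140.0], [140.0, 140.0]))
print(round(F, 1))  # 2721.0
```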
the network security constraints, expressed as Vi3,min ≤ Vi3 ≤ Vi3,max, Tsj,min ≤ Tsj ≤ Tsj,max, mjk,min ≤ mjk ≤ mjk,max;
in the formula, Vi3 represents the voltage amplitude of power network node i3, and Vi3,max and Vi3,min are respectively the upper and lower limits of the voltage amplitude of node i3; Tsj is the temperature of the hot water flowing into heat network node j, and Tsj,max and Tsj,min are the upper and lower limits of the water supply temperature; mjk is the mass flow rate of the hot water pipeline between heat network nodes j and k, and mjk,max and mjk,min are respectively its upper and lower limits;
the cogeneration unit constraint, expressed by the electric output limits together with linear inequalities of the form α1Pchp,i,t + α2Hchp,i,t ≤ α3 that bound the polygonal operating region;
in the formula, Pchp,i,t and Hchp,i,t are respectively the electric output and heat output of the i-th extraction-condensing unit in period t; Pchp,i,max and Pchp,i,min are respectively the upper and lower limits of the electric output; α1, α2, α3 are the coefficients representing the polygonal operating region;
the climbing constraint of the cogeneration device, expressed as -rchp,dnΔt ≤ Pchp,i,t - Pchp,i,t-1 ≤ rchp,upΔt;
in the formula, Pchp,i,t and Pchp,i,t-1 are respectively the cogeneration power of the two adjacent periods, and rchp,up and rchp,dn are respectively the upper and lower limits of the climbing rate of the cogeneration device;
the renewable energy constraint, expressed as 0 ≤ Pw,i,t ≤ Pw,i,tmax;
in the formula, Pw,i,t represents the generated power of wind turbine i in period t, and Pw,i,tmax is the maximum output value of the wind turbine;
the output constraint of the conventional units, expressed as PG,i,min ≤ PG,i,t ≤ PG,i,max;
the climbing constraint of the conventional units, expressed as -rG,dnΔt ≤ PG,i,t - PG,i,t-1 ≤ rG,upΔt, where rG,up and rG,dn are respectively the upper and lower limits of the climbing rate of the conventional units.
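The box-type and ramping constraints above can be sketched as simple feasibility checks. All limits below are illustrative placeholders, not values from the patent.

```python
# Sketch of the operating constraints of claim 9 as feasibility checks.

def within(value, lo, hi):
    return lo <= value <= hi

def check_security(V, T_s, m, V_lim=(0.95, 1.05), T_lim=(70.0, 100.0),
                   m_lim=(1.0, 12.0)):
    # node voltage magnitude, supply-water temperature, pipe mass flow
    return within(V, *V_lim) and within(T_s, *T_lim) and within(m, *m_lim)

def check_ramp(P_now, P_prev, ramp_up, ramp_down, dt=1.0):
    # climbing (ramping) constraint for CHP or conventional units:
    #   -ramp_down * dt <= P_t - P_{t-1} <= ramp_up * dt
    delta = P_now - P_prev
    return -ramp_down * dt <= delta <= ramp_up * dt

def check_wind(P_wind, P_wind_max):
    # renewable constraint: 0 <= dispatched wind <= available maximum
    return 0.0 <= P_wind <= P_wind_max

print(check_security(1.0, 85.0, 5.0),
      check_ramp(210.0, 200.0, ramp_up=30.0, ramp_down=30.0),
      check_wind(140.0, 150.0))  # True True True
```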
10. The optimized operation system of an electric-heat combined system according to claim 6, wherein the power system agent reward function is expressed as r1 = -(f1 + f3 + φV + φchp,P + φchp,R + φG,P + φG,R);
in the formula, f1 and f3 are respectively the operation cost of the power system and the wind curtailment penalty; φV is the system node voltage out-of-limit penalty term; φchp,P is the output out-of-limit penalty term of the cogeneration unit; φchp,R is the climbing out-of-limit penalty term of the cogeneration device; φG,P is the output out-of-limit penalty term of the conventional unit; φG,R is the climbing out-of-limit penalty term of the conventional unit;
the thermodynamic system agent reward function is expressed as r2 = -(f2 + φH); in the formula, f2 is the operation cost of the cogeneration device and φH is its variable out-of-limit penalty term.
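The reward structure of claim 10, in which each agent's reward is the negative of its costs plus out-of-limit penalty terms so that maximising reward minimises cost, can be sketched as follows. The penalty weights and limit values are illustrative assumptions.

```python
# Sketch of the two agent reward functions of claim 10.

def penalty(value, lo, hi, weight=100.0):
    # generic out-of-limit penalty: zero inside [lo, hi], linear outside
    if value < lo:
        return weight * (lo - value)
    if value > hi:
        return weight * (value - hi)
    return 0.0

def power_agent_reward(f1, f3, V, P_chp, dP_chp, P_g, dP_g):
    # r1 = -(f1 + f3 + voltage, CHP output/ramp, unit output/ramp penalties)
    phi = (penalty(V, 0.95, 1.05) + penalty(P_chp, 30.0, 120.0)
           + penalty(dP_chp, -30.0, 30.0) + penalty(P_g, 50.0, 250.0)
           + penalty(dP_g, -40.0, 40.0))
    return -(f1 + f3 + phi)

def heat_agent_reward(f2, H_chp, dH_chp):
    # r2 = -(CHP operation cost + heat output / ramp out-of-limit penalties)
    phi = penalty(H_chp, 20.0, 100.0) + penalty(dH_chp, -25.0, 25.0)
    return -(f2 + phi)

r1 = power_agent_reward(f1=1504.0, f3=500.0, V=1.0, P_chp=90.0,
                        dP_chp=5.0, P_g=210.0, dP_g=10.0)
r2 = heat_agent_reward(f2=717.0, H_chp=60.0, dH_chp=10.0)
print(r1, r2)  # -2004.0 -717.0
```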
11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the optimized operation method of an electric-heat combined system according to any one of claims 1 to 5.
12. A computer-readable storage medium in which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the optimized operation method of an electric-heat combined system according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111328629.5A CN113780688B (en) | 2021-11-10 | 2021-11-10 | Optimized operation method, system, equipment and medium of electric heating combined system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113780688A CN113780688A (en) | 2021-12-10 |
CN113780688B true CN113780688B (en) | 2022-02-18 |
Family
ID=78873781
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114336759A (en) * | 2022-01-10 | 2022-04-12 | 国网上海市电力公司 | Micro-grid autonomous operation voltage control method based on deep reinforcement learning |
CN114398834B (en) * | 2022-01-18 | 2024-09-06 | 中国科学院半导体研究所 | Training method of particle swarm optimization algorithm model, particle swarm optimization method and device |
CN114693101B (en) * | 2022-03-24 | 2024-05-31 | 浙江英集动力科技有限公司 | Multi-region thermoelectric coordination control method for multi-agent reinforcement learning and double-layer strategy distribution |
CN115759604B (en) * | 2022-11-09 | 2023-09-19 | 贵州大学 | Comprehensive energy system optimal scheduling method |
CN117200225B (en) * | 2023-11-07 | 2024-01-30 | 中国电力科学研究院有限公司 | Power distribution network optimal scheduling method considering covering electric automobile clusters and related device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112186799A (en) * | 2020-09-22 | 2021-01-05 | 中国电力科学研究院有限公司 | Distributed energy system autonomous control method and system based on deep reinforcement learning |
CN112862281A (en) * | 2021-01-26 | 2021-05-28 | 中国电力科学研究院有限公司 | Method, device, medium and electronic equipment for constructing scheduling model of comprehensive energy system |
CN113341958A (en) * | 2021-05-21 | 2021-09-03 | 西北工业大学 | Multi-agent reinforcement learning movement planning method with mixed experience |
CN113469839A (en) * | 2021-06-30 | 2021-10-01 | 国网上海市电力公司 | Smart park optimization strategy based on deep reinforcement learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106849188B (en) * | 2017-01-23 | 2020-03-06 | 中国电力科学研究院 | Combined heat and power optimization method and system for promoting wind power consumption |
CN113589842B (en) * | 2021-07-26 | 2024-04-19 | 中国电子科技集团公司第五十四研究所 | Unmanned cluster task cooperation method based on multi-agent reinforcement learning |
Non-Patent Citations (2)
Title |
---|
Towards next generation virtual power plant: Technology review and frameworks; Erphan A. Bhuiyan et al.; Renewable and Sustainable Energy Reviews; 2021-07-12; vol. 150; pp. 1-18 *
Research on multi-agent cooperative algorithms based on deep reinforcement learning; Li Tianxu; China Master's Theses Full-text Database, Information Science and Technology (monthly); 2021-01-15 (No. 01); pp. I140-146 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||