CN113780688B - Optimized operation method, system, device and medium for a combined electric-heat system

Info

Publication number
CN113780688B
CN113780688B (application CN202111328629.5A)
Authority
CN
China
Prior art keywords: agent, power, network, action, formula
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN202111328629.5A
Other languages: Chinese (zh)
Other versions: CN113780688A
Inventors: 蒲天骄, 董雷, 李烨, 王新迎, 王继业
Current Assignee: China Electric Power Research Institute Co Ltd (CEPRI)
Original Assignee: China Electric Power Research Institute Co Ltd (CEPRI)
Application filed by China Electric Power Research Institute Co Ltd (CEPRI)
Priority to CN202111328629.5A
Publication of CN113780688A
Application granted
Publication of CN113780688B
Legal status: Active


Classifications

    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N 3/045 Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06Q 50/06 Energy or water supply
    • Y02E 40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y04S 10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses an optimized operation method, system, device and medium for a combined electric-heat system. The method comprises: acquiring the state parameters of the combined electric-heat system to be operated optimally, the state parameters comprising the electrical load, the maximum wind power output, the thermal load and the ambient temperature; inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model, which outputs the action quantities, comprising the generation power of the conventional units, the generation power of the cogeneration units, the wind power generation power and the heat output of the cogeneration units; and operating the combined electric-heat system according to these action quantities. The method and system can realize coordinated multi-energy optimal dispatch of the combined electric-heat system.

Description

Optimized operation method, system, device and medium for a combined electric-heat system
Technical Field
The invention belongs to the technical field of integrated energy system optimization, relates to combined electric-heat systems, and in particular relates to an optimized operation method, system, device and medium for a combined electric-heat system.
Background
Against the background of the Energy Internet, the goals for today's energy systems are to improve energy utilization efficiency, promote the consumption of renewable energy, achieve sustainable energy development and reduce environmental pollution. The combined electric-heat system is an important physical carrier of the Energy Internet, is key to putting concepts such as multi-energy complementarity and cascaded energy utilization into practice, and is an important direction for restructuring the current energy mix. Research on integrated energy systems that couple the power system and the heating system is significant for breaking the existing pattern in which each energy supply system is planned and operated independently, and for realizing multi-energy complementary integrated optimization of the energy system.
A great deal of research has been devoted to the optimization of combined electric-heat systems. It generally proceeds by analyzing the actual structural characteristics of the heat supply network, combining them with a hydraulic-thermal model of the thermal system to establish a combined electric-heat optimization model that accounts for the heat losses of the return water network, and then solving that model. However, as the system scale keeps growing, and once the heat-loss characteristics of the heat network are taken into account, the multi-energy complementary optimization of the combined electric-heat system becomes high-dimensional, nonlinear and non-convex. Traditional nonlinear solution methods struggle with such problems, linearization degrades solution accuracy, and existing algorithms such as PSO (Particle Swarm Optimization) and DDPG (Deep Deterministic Policy Gradient) have difficulty overcoming the information barriers between different stakeholders.
Disclosure of Invention
The invention aims to provide an optimized operation method, system, device and medium for a combined electric-heat system, so as to solve one or more of the above technical problems. The method and system can realize coordinated multi-energy optimal dispatch of the combined electric-heat system.
To achieve this purpose, the invention adopts the following technical scheme.
A first aspect of the invention provides an optimized operation method for a combined electric-heat system, comprising the following steps:
acquiring the state parameters of the combined electric-heat system to be operated optimally, the state parameters comprising the electrical load, the maximum wind power output, the thermal load and the ambient temperature;
inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting the action quantities through the model, the action quantities comprising the generation power of the conventional units, the generation power of the cogeneration units, the wind power generation power and the heat output of the cogeneration units, and the basic elements of the multi-agent deep reinforcement learning model comprising the agents, the environment, each agent's action space, each agent's state space and each agent's reward function; and
realizing the optimized operation of the combined electric-heat system based on the action quantities. A minimal sketch of these three steps is given below.
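As a reading aid only, here is a minimal Python sketch of the three steps above. The `trained_policy` function is a stand-in placeholder for the pre-trained multi-agent model; all names and numbers are illustrative assumptions, not part of the patent.

```python
# Sketch of: (1) acquire state parameters, (2) query the trained model,
# (3) dispatch the returned action quantities. `trained_policy` is a
# placeholder for the trained actor networks, not the patent's model.
def trained_policy(state):
    p_conv = 0.6 * state["electrical_load"]                    # conventional units
    p_wind = min(state["wind_max_output"], 0.3 * state["electrical_load"])
    p_chp = state["electrical_load"] - p_conv - p_wind         # CHP electric power
    h_chp = state["thermal_load"]                              # CHP heat output
    return {"p_conventional": p_conv, "p_wind": p_wind,
            "p_chp": p_chp, "h_chp": h_chp}

state = {"electrical_load": 85.0,        # MW
         "wind_max_output": 40.0,        # MW, from the day-ahead forecast
         "thermal_load": 60.0,           # MW(th)
         "ambient_temperature": -5.0}    # degrees Celsius
actions = trained_policy(state)          # action quantities for this period
```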
In a further improvement of the method of the invention, in the multi-agent deep reinforcement learning model:
the agents comprise a power system agent and a thermal system agent;
the environment comprises mathematical models of the power system and thermal system energy flows;
the action spaces comprise the power system agent's action space and the thermal system agent's action space; the power system agent's action space comprises the generation power of the conventional units, the generation power of the cogeneration units and the wind power generation power, while the thermal system agent's action space comprises the heat output of the cogeneration units;
the state spaces comprise the power system agent's state space and the thermal system agent's state space; the power system agent's state space comprises the electrical load, the current generation power of the cogeneration units, the maximum wind power output and the current output of the conventional units, while the thermal system agent's state space comprises the thermal load, the current heat output of the cogeneration units and the ambient temperature;
the reward functions comprise the power system agent's reward function and the thermal system agent's reward function; the power system agent's reward function comprises the operating cost of the conventional units, a wind curtailment penalty and variable out-of-limit penalties, while the thermal system agent's reward function comprises the operating cost of the cogeneration units and variable out-of-limit penalties.
In a further development of the inventive method, the power system agent and the thermal system agent each comprise an actor network and a critic network;
the actor network takes as input the state the agent perceives from the environment and outputs the agent's action in the given state; the critic network produces a state-action value function from the agent's state and the action taken in that state, and thereby evaluates the quality of the action currently taken by the actor network;
the actor network and the critic network both adopt a dual-network structure, each comprising an estimation network and a target network of identical architecture; during training, the actor estimation network parameters and the critic estimation network parameters of every agent are updated, and the trained estimation network parameters are used to softly update the target networks.
The method of the invention is further improved in that, in the training process, updating the actor estimation network parameters and the critic estimation network parameters of each agent, and softly updating the target networks with the trained estimation network parameters, specifically comprises the following steps:

At each scheduling period of the scheduling cycle, selecting an action for the power system agent and an action for the thermal system agent:

$$a_1 = \mu_1(s_1 \mid \theta_1^{\mu}) + \mathcal{N}_1, \qquad a_2 = \mu_2(s_2 \mid \theta_2^{\mu}) + \mathcal{N}_2$$

where $s_1$ and $s_2$ are the current states observed by the power system agent and the thermal system agent respectively, $\mu_1$ and $\mu_2$ are the current policies of the power system agent's and the thermal system agent's actor networks (with estimation network parameters $\theta_1^{\mu}$ and $\theta_2^{\mu}$), and $\mathcal{N}_1$ and $\mathcal{N}_2$ are the random noises added to the policy actions of the power system agent and the thermal system agent.

Storing $(s_1, a_1, r_1, s_1')$ in the power system agent's experience replay unit and $(s_2, a_2, r_2, s_2')$ in the thermal system agent's experience replay unit, where $r_1$ and $s_1'$ are the immediate reward and the updated state observed by the power system agent after action $a_1$ acts on the real system, and $r_2$ and $s_2'$ are the immediate reward and the updated state of the thermal system agent after action $a_2$.

Randomly sampling $(s_{1,j}, a_{1,j}, r_{1,j}, s_{1,j}')$ from the power system agent's experience replay unit, computing

$$y_j = r_{1,j} + \gamma\, Q_1'\big(s_{1,j}',\, \mu_1'(s_{1,j}' \mid \theta_1^{\mu'}) \mid \theta_1^{Q'}\big),$$

and updating the critic estimation network parameters $\theta_1^{Q}$ of the power system agent according to the first loss function, expressed as

$$L(\theta_1^{Q}) = \frac{1}{K}\sum_{j=1}^{K}\Big(y_j - Q_1(s_{1,j}, a_{1,j} \mid \theta_1^{Q})\Big)^2,$$

where $Q_1$ is the state value function of the power system agent's critic estimation network, $Q_1'$ is the state value function of the power system agent's critic target network, and $K$ is the number of all sub-strategies in the policy.

Updating the actor estimation network parameters $\theta_1^{\mu}$ of the power system agent according to the second loss function, expressed as

$$L(\theta_1^{\mu}) = -\frac{1}{K}\sum_{j=1}^{K} Q_1\big(s_{1,j},\, \mu_1(s_{1,j} \mid \theta_1^{\mu}) \mid \theta_1^{Q}\big).$$

Softly updating the target actor network parameters and the target critic network parameters of the power system agent:

$$\theta_1^{\mu'} \leftarrow \tau\,\theta_1^{\mu} + (1-\tau)\,\theta_1^{\mu'}, \qquad \theta_1^{Q'} \leftarrow \tau\,\theta_1^{Q} + (1-\tau)\,\theta_1^{Q'},$$

where $\theta_1^{\mu'}$ and $\theta_1^{Q'}$ are the network parameters of the power system agent's target actor and target critic, and $\tau$ is the soft-update rate.

Randomly sampling $(s_{2,j}, a_{2,j}, r_{2,j}, s_{2,j}')$ from the thermal system agent's experience replay unit, computing

$$y_j = r_{2,j} + \gamma\, Q_2'\big(s_{2,j}',\, \mu_2'(s_{2,j}' \mid \theta_2^{\mu'}) \mid \theta_2^{Q'}\big),$$

and updating the critic estimation network parameters $\theta_2^{Q}$ of the thermal system agent according to the third loss function, expressed as

$$L(\theta_2^{Q}) = \frac{1}{K}\sum_{j=1}^{K}\Big(y_j - Q_2(s_{2,j}, a_{2,j} \mid \theta_2^{Q})\Big)^2,$$

where $Q_2$ is the state value function of the thermal system agent's critic estimation network, $Q_2'$ is the state value function of the thermal system agent's critic target network, and $K$ is the number of all sub-strategies in the policy. Updating the actor estimation network parameters $\theta_2^{\mu}$ of the thermal system agent according to the fourth loss function, expressed as

$$L(\theta_2^{\mu}) = -\frac{1}{K}\sum_{j=1}^{K} Q_2\big(s_{2,j},\, \mu_2(s_{2,j} \mid \theta_2^{\mu}) \mid \theta_2^{Q}\big).$$

Softly updating the target actor network parameters and the target critic network parameters of the thermal system agent:

$$\theta_2^{\mu'} \leftarrow \tau\,\theta_2^{\mu} + (1-\tau)\,\theta_2^{\mu'}, \qquad \theta_2^{Q'} \leftarrow \tau\,\theta_2^{Q} + (1-\tau)\,\theta_2^{Q'},$$

where $\theta_2^{\mu'}$ and $\theta_2^{Q'}$ are the network parameters of the thermal system agent's target actor and target critic.
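As an illustration only, the per-agent update step just described can be sketched in PyTorch as follows, reusing the Actor/Critic pair from the earlier sketch; the discount factor, soft-update rate and optimizer handling are assumptions, and the same function would be called once per agent with that agent's own replay batch.

```python
# Hypothetical single-agent update step: critic TD loss, actor loss,
# and soft update of the target networks, as in the steps above.
import torch
import torch.nn.functional as F

def update_agent(actor, critic, target_actor, target_critic,
                 actor_opt, critic_opt, batch, gamma=0.99, tau=0.01):
    s, a, r, s_next = batch  # tensors sampled from this agent's replay unit

    # Target value y = r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Critic update: mean squared TD error (first/third loss function)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: maximize Q under the current policy (second/fourth loss)
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks with rate tau
    with torch.no_grad():
        for p, tp in zip(actor.parameters(), target_actor.parameters()):
            tp.mul_(1 - tau).add_(tau * p)
        for p, tp in zip(critic.parameters(), target_critic.parameters()):
            tp.mul_(1 - tau).add_(tau * p)
```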
A further improvement of the method of the invention is that the mathematical models of the power system and thermal system energy flows comprise:

the system optimization objective, expressed as

$$\min F = C_G + C_{CHP} + C_W,$$

where $C_G$ is the operating cost of the conventional units, $C_{CHP}$ is the operating cost of the cogeneration units, and $C_W$ is the wind curtailment penalty;

$$C_G = \sum_{t=1}^{T}\sum_{i=1}^{N_G}\big(a_i P_{G,i,t}^{2} + b_i P_{G,i,t} + c_i\big)\,\Delta t,$$

where $a_i$, $b_i$ and $c_i$ are the energy consumption coefficients of conventional unit $i$, $P_{G,i,t}$ is the output of the conventional unit, $N_G$ is the number of conventional units, $T$ is the scheduling horizon, and $\Delta t$ is the scheduling interval;

$$C_{CHP} = \sum_{t=1}^{T}\sum_{k=1}^{N_{CHP}} c_k\big(P_{CHP,k,t} + \eta_k H_{CHP,k,t}\big)\,\Delta t,$$

where $c_k$ is the energy consumption coefficient of cogeneration unit $k$, $N_{CHP}$ is the number of cogeneration units, and $P_{CHP,k,t}$ and $H_{CHP,k,t}$ are the electric output and heat output of the cogeneration unit respectively;

$$C_W = \sum_{t=1}^{T}\lambda_W\big(P_{W,t}^{\mathrm{pre}} - P_{W,t}\big)\,\Delta t,$$

where $\lambda_W$ is the wind curtailment penalty coefficient and $P_{W,t}^{\mathrm{pre}} - P_{W,t}$ is the difference between the predicted and the actual wind power;

the network security constraints, expressed as

$$V_i^{\min} \le V_{i,t} \le V_i^{\max},$$

$$T_{s,n}^{\min} \le T_{s,n,t} \le T_{s,n}^{\max},$$

$$m_{nj}^{\min} \le m_{nj,t} \le m_{nj}^{\max},$$

where $V_{i,t}$ is the voltage magnitude at power network node $i$, with $V_i^{\max}$ and $V_i^{\min}$ its upper and lower limits; $T_{s,n,t}$ is the temperature of the hot water flowing into heat network node $n$, with $T_{s,n}^{\max}$ and $T_{s,n}^{\min}$ the upper and lower limits of the supply water temperature; and $m_{nj,t}$ is the mass flow rate of the hot water pipe between heat network node $n$ and node $j$, with $m_{nj}^{\max}$ and $m_{nj}^{\min}$ its upper and lower limits;

the cogeneration unit constraint, expressed as

$$P_{CHP,k}^{\min} \le P_{CHP,k,t} \le P_{CHP,k}^{\max}, \qquad \alpha_m P_{CHP,k,t} + \beta_m H_{CHP,k,t} + \gamma_m \le 0,$$

where $P_{CHP,k,t}$ and $H_{CHP,k,t}$ are the electric power and heat power of the $k$-th extraction-condensing unit in period $t$, $P_{CHP,k}^{\max}$ and $P_{CHP,k}^{\min}$ are the upper and lower limits of the electric output, and $\alpha_m$, $\beta_m$ and $\gamma_m$ are the coefficients representing the polygonal feasible operating region;

the cogeneration ramp constraint, expressed as

$$R_{CHP}^{\mathrm{down}}\,\Delta t \le P_{CHP,k,t} - P_{CHP,k,t-1} \le R_{CHP}^{\mathrm{up}}\,\Delta t,$$

where $P_{CHP,k,t-1}$ and $P_{CHP,k,t}$ are the cogeneration power in two consecutive periods, and $R_{CHP}^{\mathrm{up}}$ and $R_{CHP}^{\mathrm{down}}$ are the upper and lower ramp-rate limits of the cogeneration unit;

the renewable energy constraint, expressed as

$$0 \le P_{W,w,t} \le P_{W,w,t}^{\max},$$

where $P_{W,w,t}$ is the power generated by wind turbine $w$ in period $t$ and $P_{W,w,t}^{\max}$ is the maximum output value of the wind turbine;

the conventional unit output constraint, expressed as

$$P_{G,i}^{\min} \le P_{G,i,t} \le P_{G,i}^{\max};$$

and the conventional unit ramp constraint, expressed as

$$R_{G}^{\mathrm{down}}\,\Delta t \le P_{G,i,t} - P_{G,i,t-1} \le R_{G}^{\mathrm{up}}\,\Delta t,$$

where $P_{G,i,t}$ is the generation power of the conventional unit, $P_{G,i}^{\max}$ and $P_{G,i}^{\min}$ are the upper and lower limits of the unit output, and $R_{G}^{\mathrm{up}}$ and $R_{G}^{\mathrm{down}}$ are the upper and lower limits of the unit ramp rate.
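For illustration, here is a minimal sketch, under assumed names, of how a box constraint such as the ones above can be folded into a reward as an out-of-limit penalty term:

```python
# Hypothetical quadratic out-of-limit penalty for a box-constrained variable.
def out_of_limit_penalty(x, x_min, x_max, weight=1.0):
    if x < x_min:
        return weight * (x_min - x) ** 2
    if x > x_max:
        return weight * (x - x_max) ** 2
    return 0.0

# Example: ramp-rate check of a conventional unit over one interval.
p_prev, p_now = 120.0, 160.0            # MW in two consecutive periods
r_down, r_up, dt = -30.0, 30.0, 1.0     # ramp limits (MW/h) and interval (h)
penalty = out_of_limit_penalty(p_now - p_prev, r_down * dt, r_up * dt)
# (160 - 120) = 40 MW exceeds the 30 MW upper ramp limit, so penalty = 100.0
```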
In a further improvement of the method of the present invention, the power system agent reward function is expressed as

$$r_1 = -\big(C_E + \sigma_V + \sigma_{CHP}^{P} + \sigma_{CHP}^{r} + \sigma_{G}^{P} + \sigma_{G}^{r}\big),$$

where $C_E$ comprises the power system operating cost and the wind curtailment penalty, $\sigma_V$ is the system node voltage out-of-limit penalty term, $\sigma_{CHP}^{P}$ is the cogeneration unit output out-of-limit penalty term, $\sigma_{CHP}^{r}$ is the cogeneration unit ramp out-of-limit penalty term, $\sigma_{G}^{P}$ is the conventional unit output out-of-limit penalty term, and $\sigma_{G}^{r}$ is the conventional unit ramp out-of-limit penalty term;

the thermal system agent reward function is expressed as

$$r_2 = -\big(C_H + \sigma_{CHP}^{H} + \sigma_{CHP}^{r} + \sigma_T + \sigma_m\big),$$

where $C_H$ is the operating cost of the cogeneration units, $\sigma_{CHP}^{H}$ is the cogeneration unit heat output out-of-limit penalty term, $\sigma_{CHP}^{r}$ is the cogeneration unit ramp out-of-limit penalty term, $\sigma_T$ is the system node temperature out-of-limit penalty, and $\sigma_m$ is the system pipeline mass flow rate out-of-limit penalty.
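Combining the cost terms with penalty terms of the kind sketched earlier, a hypothetical reward computation for the power system agent could look as follows; all names and numbers are assumptions:

```python
# Hypothetical assembly of the power system agent's reward: the negative sum
# of operating cost, wind curtailment penalty and out-of-limit penalty terms.
def power_agent_reward(operating_cost, curtailment_penalty, limit_penalties):
    return -(operating_cost + curtailment_penalty + sum(limit_penalties))

r1 = power_agent_reward(
    operating_cost=1540.0,        # conventional unit cost for the period
    curtailment_penalty=80.0,     # wind curtailment penalty
    limit_penalties=[0.0,         # node voltage out-of-limit
                     100.0,       # CHP output / ramp out-of-limit
                     0.0])        # conventional unit output / ramp out-of-limit
```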
A second aspect of the invention provides an optimized operation system for a combined electric-heat system, comprising:
a parameter acquisition module for acquiring the state parameters of the combined electric-heat system to be operated optimally, the state parameters comprising the electrical load, the maximum wind power output, the thermal load and the ambient temperature;
an action quantity acquisition module for inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting the action quantities through the model, the action quantities comprising the generation power of the conventional units, the generation power of the cogeneration units, the wind power generation power and the heat output of the cogeneration units, and the basic elements of the multi-agent deep reinforcement learning model comprising the agents, the environment, each agent's action space, each agent's state space and each agent's reward function; and
an optimized operation module for realizing the optimized operation of the combined electric-heat system based on the action quantities.
In a further improvement of the system, the multi-agent deep reinforcement learning model of the action quantity acquisition module defines its agents, environment, action spaces, state spaces and reward functions exactly as described above for the method of the first aspect: a power system agent and a thermal system agent acting on mathematical models of the power system and thermal system energy flows, with the same action spaces, state spaces and reward functions.
In a further improvement of the system, in the action quantity acquisition module, the power system agent and the thermal system agent each comprise an actor network and a critic network with the same roles and the same dual estimation/target network structure as described above for the method of the first aspect.
In a further improvement of the system, in the action quantity acquisition module, the actor estimation network parameters and the critic estimation network parameters of each agent are updated during training, and the target networks are softly updated with the trained estimation network parameters, following exactly the same steps, loss functions and soft-update expressions as given above for the method of the first aspect.
In a further improvement of the system, the mathematical models of the power system and thermal system energy flows in the action quantity acquisition module comprise the same system optimization objective, network security constraints, cogeneration unit and ramp constraints, renewable energy constraint, and conventional unit output and ramp constraints as given above for the method of the first aspect.
In a further improvement of the system, the power system agent's reward function and the thermal system agent's reward function take the same forms as given above for the method of the first aspect.
A third aspect of the invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the optimized operation method for a combined electric-heat system according to any of the above aspects.
A fourth aspect of the invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the optimized operation method for a combined electric-heat system according to any of the above aspects.
Compared with the prior art, the invention has the following beneficial effects:
The method determines the state parameters and solves the combined electric-heat optimization problem with reinforcement learning on top of a multi-agent deep reinforcement learning model. While preserving solution quality, reinforcement learning speeds up the generation of the control strategy, overcoming the drawback of traditional methods whose running time grows too long as the system scale increases and which therefore struggle to meet online computation requirements.
Built on the multi-agent deep deterministic policy gradient framework, the method constructs an optimal dispatch model of the combined electric-heat system based on multi-agent actor-critic learning; it converges stably and explores the solution space thoroughly, overcoming the tendency of existing traditional methods to fall into local optima during solving.
The method divides the combined electric-heat system into a power system agent and a thermal system agent that cooperate toward the system-wide optimization objective; it partitions the reinforcement learning actions and state spaces according to the dispatch model of the combined system and establishes a reward and penalty mechanism for each agent, so that each agent can complete its own strategy computation from local state information alone, resolving the difficulty of sharing data between different stakeholders.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram of the training process of the DDPG model in comparative example 2 of the invention;
FIG. 2 is a schematic flow chart of the optimized operation method for a combined electric-heat system in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of the combined electric-heat system in an embodiment of the invention;
FIG. 4 is a diagram of the reinforcement learning model in an embodiment of the invention;
FIG. 5 is a schematic diagram of the interior of an agent in an embodiment of the invention;
FIG. 6 is a schematic diagram of the multi-agent framework of the combined electric-heat system in an embodiment of the invention;
FIG. 7 is a flow chart of the multi-agent deep reinforcement learning network training framework in an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
comparative example 1
The particle swarm optimization algorithm takes a swarm of particles as its basic unit; each particle represents a candidate solution, and intelligent problem solving emerges from the information exchange within the swarm driven by the simple behavior of individual particles. To apply particle swarm optimization, an optimal dispatch model of the electric-thermal integrated energy system is first established (illustratively comprising power grid and heat network power flow constraints, safe operation constraints, cogeneration unit constraints, the system optimization objective, and the like), which is then solved with the particle swarm algorithm.
In concrete terms, the maximum number of iterations, the number of decision variables and the maximum particle speed are set first, and the particle velocities and positions are initialized; a fitness function is then defined from the optimization objective of the dispatch model of the electric-thermal integrated energy system. Each individual's extremum is the best solution that particle has found, and the minimum over all particles' best solutions is the global optimum; the global optimum is compared against the historical global optimum, and the velocities and positions are updated according to formulas (1) and (2):
$$v_{i,d}^{k+1} = \omega\, v_{i,d}^{k} + c_1 r_1\big(p_{i,d} - x_{i,d}^{k}\big) + c_2 r_2\big(g_d - x_{i,d}^{k}\big) \tag{1}$$

$$x_{i,d}^{k+1} = x_{i,d}^{k} + v_{i,d}^{k+1} \tag{2}$$

where $v_{i,d}^{k}$ and $x_{i,d}^{k}$ are the velocity and position of individual $i$ of the decision variables in dimension $d$ at iteration $k$, $\omega$ is the inertia factor, $c_1$ and $c_2$ are the learning factors, $r_1$ and $r_2$ are random numbers, $p_{i,d}$ is the $d$-th dimension of the individual extremum of the $i$-th variable, and $g_d$ is the $d$-th dimension of the global optimal solution.
And stopping iteration when the maximum iteration number is reached or the iteration difference value meets the precision requirement.
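A minimal NumPy sketch of the update step (1)-(2), with illustrative swarm size, dimensionality and coefficient values:

```python
# Hypothetical PSO step implementing the velocity/position updates (1)-(2).
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0, v_max=1.0):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)   # eq. (1)
    v = np.clip(v, -v_max, v_max)       # respect the maximum particle speed
    return x + v, v                     # eq. (2)

# 20 particles, 5 decision variables (e.g. unit outputs in one period)
x = rng.uniform(-1, 1, (20, 5))
v = np.zeros((20, 5))
x, v = pso_step(x, v, p_best=x.copy(), g_best=x[0])
```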
Based on the above analysis, the method of comparative example 1 has the following defects:
(1) The particle swarm algorithm easily falls into local optima; when its exploration capability is insufficient, it converges with low precision or even fails to converge, which undermines the validity of the optimal dispatch results for the electric-thermal integrated energy system.
(2) As the problem scale grows, particle swarm optimization suffers from the curse of dimensionality; the exploding dimensionality greatly increases the computational burden and hence sharply slows the computation, making the method ill-suited to applications with demanding speed requirements.
Comparative example 2
DDPG is a reinforcement learning algorithm for continuous action spaces, developed from the classic policy gradient (PG) algorithm, and is applicable to the optimal dispatch of electric-thermal integrated energy systems. The general steps of optimal dispatch with DDPG are: build the agent's actor network and critic network, interact with the environment to generate training samples and fill the replay unit, and randomly draw samples from the replay unit to train the actor and critic networks; after many rounds of training, the actor network outputs the control strategy of the electric-thermal integrated energy system from the input information.
Referring to fig. 1, the model training process in comparative example 2 of the invention proceeds as follows:
(1) Establish the actor network and the critic network, and initialize their parameters.
(2) Give the agent an initial state; in each iteration, generate a strategy through a forward pass of the actor network, evaluate the action with the critic network, send the action into the environment for the state transition, and compute the reward; store the generated sample group in the replay unit, and randomly select a batch of samples to update the parameters of the Actor network and the Critic network;
wherein the Critic network is updated using equation (3):

$$L(\theta^{Q}) = \frac{1}{N}\sum_{i}\Big(r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1})\big) - Q(s_i, a_i \mid \theta^{Q})\Big)^2 \tag{3}$$

and the Actor network is updated using equation (4):

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i}\nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\,a=\mu(s_i)}\;\nabla_{\theta^{\mu}}\mu(s \mid \theta^{\mu})\Big|_{s_i} \tag{4}$$

where $r$ is the reward value, $\gamma$ is the discount factor, $s$ is the agent state, $\theta$ denotes the network parameters, and $a$ is the agent action.
(3) Judge whether the iteration limit has been reached; if so, stop training and output the parameters of the actor network and the critic network.
Based on the above analysis, the method of comparative example 2 has the following defects:
(1) In practice, different subsystems may be run by different departments, so information barriers exist and optimization on the premise of full data sharing is difficult; DDPG cannot produce the optimal control action when each party knows only its own local information.
(2) As the problem scale grows in a large system, the action space of a single-agent DDPG becomes high-dimensional, and insufficient exploration of that action space may cause convergence to a local optimum.
To sum up, against the background of the Energy Internet, the combined electric-heat system has become key to applying concepts such as multi-energy complementarity and cascaded energy utilization. Current work on its optimization mainly builds combined electric-heat optimization models that account for the heat losses of the heat network's return water pipes, but as the system scale keeps growing these models become high-dimensional, nonlinear and non-convex, which traditional methods struggle to solve; moreover, PSO and DDPG require state information for the entire system and thus have difficulty overcoming the information barrier problem.
Example 1
In the technical scheme provided by this embodiment, an optimal dispatch model of the combined electric-heat system based on multi-agent deep deterministic policy gradients is constructed, realizing coordinated multi-energy optimal dispatch of the combined system. Compared with a traditional model, it effectively handles sequential decision making in continuous control, avoids the drawbacks of a discretized action space, lets each agent complete its own strategy computation knowing only its local state information, and solves the data sharing problem between different agents. Combined electric-heat systems are described, for example, in [1] Wang Beiliang, Wang Dan, Jia Hongjie, et al. Review of steady-state analysis of typical regional integrated energy systems under the background of the Energy Internet [J]. Proceedings of the CSEE, 2016, 36(12): 3292-. At present, optimization models of the combined electric-heat system that consider the heat losses of the heat network return water pipes exhibit, as the system scale keeps growing, high-dimensional, nonlinear, non-convex characteristics; traditional nonlinear solution methods struggle with them, and linearization degrades the solution precision. The technical scheme of this embodiment builds the optimized operation method of the combined electric-heat system on multi-agent deep deterministic policy gradients (MADDPG), which raises the speed of strategy generation, avoids the precision loss caused by discretizing the action and state spaces, and lets each agent complete its computation from local information alone during strategy execution, solving the data sharing problem between different stakeholders and thereby realizing coordinated multi-energy optimal dispatch of the combined electric-heat system.
Referring to fig. 2, the optimized operation method for a combined electric-heat system according to an embodiment of the invention comprises the following steps:
Step 1, acquiring the state parameters of the combined electric-heat system to be operated optimally, the state parameters comprising the electrical load, the maximum wind power output, the thermal load and the ambient temperature.
Step 2, inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting the action quantities through the model, the action quantities comprising the generation power of the conventional units, the generation power of the cogeneration units, the wind power generation power and the heat output of the cogeneration units.
Step 3, realizing the optimized operation of the combined electric-heat system based on the action quantities.
The method of this embodiment determines the state parameters and solves the combined electric-heat optimization problem with reinforcement learning on top of a multi-agent deep reinforcement learning model; while preserving solution quality, reinforcement learning speeds up the generation of the control strategy, overcoming the drawback of traditional methods whose running time grows too long as the system scale increases and which therefore struggle to meet online computation requirements.
Example 2
Based on the above embodiment 1, referring to fig. 3, in an optional aspect of the embodiment of the present invention, the electric-heat combined system includes: conventional generator sets, wind turbine generators, cogeneration units, and the like; wherein, G1, G2 represent conventional generator sets, which are responsible for supplying electrical loads in the system; w1 represents a wind turbine generator, the influence of the maximum output wind speed and the like of the wind turbine generator is random, and the maximum output of the wind turbine generator needs to be obtained according to the prediction result in the day ahead; CHP1 and CHP2 indicate cogeneration units that supply an electric load in the system and supply an electric load in the system; load1, load2, load3 represent the electrical load within the system; hload1, Hload2, Hload3 represent the thermal load in the system.
Illustratively, since cogeneration systems are established prior art (see the references cited above), only a brief description is given here to aid the reader's understanding.
Example 3
Based on embodiment 1 above and referring to fig. 4 and 5, in an alternative embodiment of the invention, the multi-agent deep reinforcement learning model, shown in fig. 4, comprises: agents, environment, actions, states, and reward functions.

The internal structure of an agent is shown in fig. 5. Each agent consists of a policy (Actor) network and a value-function (Critic) network. The agent feeds the state set s perceived from the environment into the policy network, the agent's policy is computed by the neural network, and all actions a of the agent in the given state are output. Specifically, in this model the invention divides the system into two agents: a power system agent and a thermodynamic system agent.
Environment: including basic mathematical models of power and thermal system power flows.
Illustratively, regarding the power system model: in this embodiment of the invention, AC power flow is used as the analysis method for the power system, and the power balance equations of the power system are expressed as

$$P_i = V_i \sum_{j \in i} V_j \left( G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij} \right), \qquad Q_i = V_i \sum_{j \in i} V_j \left( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} \right) \tag{5}$$

where $P_i$ and $Q_i$ are the active and reactive power injected at node $i$, $V_i$ is the voltage magnitude of node $i$, $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch $ij$, and $\theta_{ij}$ is the phase angle difference across branch $ij$.
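As a concrete illustration of equation (5), the sketch below (illustrative NumPy code, not part of the patent) evaluates the active and reactive power injections at every node from the voltage magnitudes, angles, and the bus admittance matrix:

```python
import numpy as np

def power_injections(V, theta, G, B):
    """Evaluate the nodal injections of the AC power flow equations (5).

    V     : (n,) voltage magnitudes
    theta : (n,) voltage phase angles [rad]
    G, B  : (n, n) conductance and susceptance matrices
    Returns (P, Q): active and reactive power injected at each node.
    """
    dtheta = theta[:, None] - theta[None, :]           # theta_ij for every node pair
    P = V * ((G * np.cos(dtheta) + B * np.sin(dtheta)) @ V)
    Q = V * ((G * np.sin(dtheta) - B * np.cos(dtheta)) @ V)
    return P, Q
```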
Illustratively, regarding the thermodynamic system model: in this embodiment of the invention, the thermodynamic system generates heat at a heat source, delivers it to the heat loads through the water supply pipelines, and the water, cooled by the heat loads, flows back through the return pipelines, forming a closed loop. The thermodynamic system is divided into a hydraulic model and a thermal model:
1) regarding the hydraulic model: the hydraulic model of the thermodynamic system represents the medium flow and consists of a flow continuity equation, a loop pressure equation and a head loss equation.
$$A m = m_q, \qquad B h_f = 0, \qquad h_f = K m \lvert m \rvert \tag{6}$$

where $A$ is the node-branch incidence matrix, $B$ is the loop-branch incidence matrix, $m$ is the pipeline mass flow rate, $m_q$ is the node injection flow, $h_f$ is the head loss, and $K$ is the damping (resistance) coefficient of the pipe.
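To make equation (6) concrete, the sketch below (illustrative code; the matrix and vector names follow the definitions above) evaluates the residuals of the three hydraulic equations for a candidate flow vector; a network solver would drive these residuals to zero:

```python
import numpy as np

def hydraulic_residuals(A, B, m, m_q, K):
    """Residuals of the hydraulic model (6).

    A   : (n_nodes, n_pipes) node-branch incidence matrix
    B   : (n_loops, n_pipes) loop-branch incidence matrix
    m   : (n_pipes,) pipeline mass flow rates
    m_q : (n_nodes,) node injection flows
    K   : (n_pipes,) pipe damping (resistance) coefficients
    """
    h_f = K * m * np.abs(m)          # head loss equation
    r_continuity = A @ m - m_q       # flow continuity at each node
    r_loop = B @ h_f                 # loop pressure equation: losses around a loop sum to zero
    return r_continuity, r_loop
```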
2) Regarding the thermodynamic model: the thermodynamic model represents an energy transmission process and is composed of a node power equation, a pipeline temperature drop equation and a node medium mixing equation.
$$\Phi = C_p m_q \left( T_s - T_o \right), \qquad T_{\mathrm{end}} = \left( T_{\mathrm{start}} - T_a \right) e^{-\frac{\lambda L}{C_p m}} + T_a, \qquad \Big( \sum m_{\mathrm{out}} \Big) T_{\mathrm{out}} = \sum \left( m_{\mathrm{in}} T_{\mathrm{in}} \right) \tag{7}$$

where $\Phi$ is the thermal power injected at node $i$, $C_p$ is the specific heat capacity of water, $T_s$ and $T_o$ are the supply-pipe water temperature and the outlet water temperature at node $i$, the subscripts start and end refer to the heat supply network pipe branch whose head-end node is $i$, $T_{\mathrm{end}}$ is the branch end temperature, and $T_a$ is the ambient temperature.
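For equation (7), the node power and pipeline temperature drop terms can be illustrated as follows (a sketch under the usual exponential temperature-decay assumption; the heat transfer coefficient and pipe length symbols are ours):

```python
import numpy as np

CP_WATER = 4.182e3  # specific heat capacity of water [J/(kg*K)], standard value

def pipe_outlet_temperature(T_start, T_a, m, L, lam):
    """Pipeline temperature drop equation of (7).

    T_start : pipe head-end water temperature [deg C]
    T_a     : ambient temperature [deg C]
    m       : pipe mass flow rate [kg/s]
    L       : pipe length [m]
    lam     : heat transfer coefficient per unit length [W/(m*K)]
    """
    return (T_start - T_a) * np.exp(-lam * L / (CP_WATER * m)) + T_a

def node_injection_power(m_q, T_s, T_o):
    """Node thermal power equation of (7): Phi = Cp * m_q * (T_s - T_o)."""
    return CP_WATER * m_q * (T_s - T_o)
```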
State space of each agent: the power system agent's state space comprises the electric load, the generation power of the cogeneration device at the previous time section, the maximum wind power output, and the conventional unit output at the previous time section; the thermodynamic system agent's state space comprises the heat load, the heat generation power of the cogeneration device at the previous time section, and the ambient temperature.

Action space of each agent: the power system agent's action space comprises the conventional unit generation power, the cogeneration generation power, and the wind power generation power; the thermodynamic system agent's action space comprises the cogeneration heat generation power.

Reward and penalty mechanism of each agent: for the power system agent, the reward function comprises the conventional unit operation cost, the wind curtailment penalty, and the variable out-of-limit penalty; for the thermodynamic system agent, the reward function comprises the cogeneration unit operation cost and the variable out-of-limit penalty. (These spaces are sketched in code below.)
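Illustratively, the agents' local observations and actions can be recorded as plain data structures; the following sketch is a minimal transcription of the spaces above (the field names are our own, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class PowerAgentState:          # local observation of the power system agent
    electric_load: float
    chp_power_prev: float       # CHP generation power at the previous time section
    wind_max_output: float
    unit_output_prev: float     # conventional unit output at the previous time section

@dataclass
class PowerAgentAction:
    p_gen: float                # conventional unit generation power
    p_chp: float                # CHP generation power
    p_wind: float               # wind power generation power

@dataclass
class ThermalAgentState:        # local observation of the thermodynamic system agent
    heat_load: float
    chp_heat_prev: float        # CHP heat power at the previous time section
    ambient_temperature: float

@dataclass
class ThermalAgentAction:
    phi_chp: float              # CHP heat generation power
```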
In this method, the electric-heat combined system is divided into a power system agent and a thermodynamic system agent that cooperate to achieve the system-wide optimization objective. The reinforcement learning actions and state spaces are partitioned according to the dispatch model of the combined system, and a reward and penalty mechanism is established for each agent, so that each agent can complete its own policy calculation from its local state information alone, resolving the difficulty of sharing data between different stakeholders.
Preferably, in an embodiment of the present invention, the obtaining step of the pre-trained multi-agent deep reinforcement learning model includes:
acquiring sample operation parameters of the electric-heat combined system to be optimized, and initializing the system state of the combined system; the operation parameters include: the electric load power $P_{load}$, the generator capacity, the forecast wind power $P_w^{pre}$, the wind curtailment coefficient $k$, the node voltage constraints $V_{\min}$, $V_{\max}$, the unit ramping constraints, the heat load power $\phi_{load}$, the ambient temperature $T_a$, the node temperature constraints $T_{\min}$, $T_{\max}$, and the pipe flow constraints $m_{\min}$, $m_{\max}$;
At each scheduling period in the scheduling cycle, each agent selects an action $a_i = \mu_{\theta_i}(s_i) + \xi_t$; the action $a$ acts on the real system, which returns the instant reward $r$ and the new state $s'$; the tuple $(s, a, r, s')$ is stored in the experience replay unit and the state is updated; a random minibatch of $K$ transitions $(s^j, a^j, r^j, s'^j)$ is sampled from the replay unit, and the target value

$$y^j = r^j + \gamma\, Q'\!\left( s'^j, \mu'(s'^j) \right)$$

is calculated. The critic network is updated according to the loss function shown in equation (8):

$$L(\theta^Q) = \frac{1}{K} \sum_{j=1}^{K} \left( y^j - Q(s^j, a^j \mid \theta^Q) \right)^2 \tag{8}$$

The actor network is updated according to the loss function shown in equation (9):

$$\nabla_{\theta^\mu} J \approx \frac{1}{K} \sum_{j=1}^{K} \nabla_a Q(s^j, a \mid \theta^Q) \Big|_{a = \mu(s^j)} \nabla_{\theta^\mu} \mu(s^j \mid \theta^\mu) \tag{9}$$

The target network parameters of each agent are softly updated as

$$\theta' \leftarrow \tau \theta + (1 - \tau)\, \theta' \tag{10}$$
and repeating the training process until convergence to obtain the trained reinforcement learning model.
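Illustratively, the per-agent update of equations (8) through (10) is the standard deep deterministic policy gradient step; the PyTorch sketch below shows one training iteration for a single agent (the `agent` container, its networks and optimizers, and the replay `buffer` with a `sample` method are assumed helpers, and the hyperparameter values are placeholders):

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.01  # discount factor and soft-update rate (placeholder values)

def train_step(agent, buffer, batch_size=64):
    """One update of an agent's critic and actor, as in equations (8)-(10)."""
    s, a, r, s_next = buffer.sample(batch_size)            # random minibatch from replay

    # Critic update, equation (8): minimize the mean squared TD error
    with torch.no_grad():
        a_next = agent.actor_target(s_next)
        y = r + GAMMA * agent.critic_target(s_next, a_next)
    critic_loss = F.mse_loss(agent.critic(s, a), y)
    agent.critic_opt.zero_grad(); critic_loss.backward(); agent.critic_opt.step()

    # Actor update, equation (9): ascend the critic's value of the actor's action
    actor_loss = -agent.critic(s, agent.actor(s)).mean()
    agent.actor_opt.zero_grad(); actor_loss.backward(); agent.actor_opt.step()

    # Soft target update, equation (10)
    for net, target in ((agent.actor, agent.actor_target), (agent.critic, agent.critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```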
In the method provided by this embodiment of the invention, an optimal dispatch model of the electric-heat combined system based on a multi-agent actor-critic architecture is constructed within the multi-agent deep deterministic policy gradient framework; its convergence is stable and its exploration of the solution space is strong, overcoming the tendency of existing traditional methods to fall into local optima during solving.
Example 4
Referring to fig. 2 to 7, an optimized operation method of an electric heating combined system according to an embodiment of the present invention includes the following steps:
Step 1, importing the operation parameters of the electric-heat combined system. Illustratively, step 1 of this embodiment imports the network operation parameters of the combined system; the specific parameters are listed in table 1:
Table 1. Imported network operation parameters
Step 2, establishing an optimal scheduling model of the electric heating combined system
Step 201, respectively establishing energy flow models of the power system and the thermodynamic system.
For the power system model, the invention uses AC power flow as the analysis method for the power system, and the power balance equations are expressed as

$$P_i = V_i \sum_{j \in i} V_j \left( G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij} \right), \qquad Q_i = V_i \sum_{j \in i} V_j \left( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} \right) \tag{11}$$

where $P_i$ and $Q_i$ are the active and reactive power injected at node $i$, $V_i$ is the voltage magnitude of node $i$, $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch $ij$, and $\theta_{ij}$ is the phase angle difference across branch $ij$.
For a thermodynamic system model, the thermodynamic system in the embodiment of the invention generates heat energy at a heat source, the heat energy is conveyed to a heat load through a water conveying pipeline, and the heat energy is cooled by the heat load and then flows back through a water return pipeline to form a closed loop. The thermodynamic system is divided into a hydraulic model and a thermodynamic model:
1) Hydraulic model. The hydraulic model of the thermodynamic system represents the medium flow and consists of a flow continuity equation, a loop pressure equation and a head loss equation.
$$A m = m_q, \qquad B h_f = 0, \qquad h_f = K m \lvert m \rvert \tag{12}$$

where $A$ is the node-branch incidence matrix, $B$ is the loop-branch incidence matrix, $m$ is the pipeline mass flow rate, $m_q$ is the node injection flow, $h_f$ is the head loss, and $K$ is the damping (resistance) coefficient of the pipe.
2) A thermal model. The thermodynamic model represents an energy transmission process and is composed of a node power equation, a pipeline temperature drop equation and a node medium mixing equation.
$$\Phi = C_p m_q \left( T_s - T_o \right), \qquad T_{\mathrm{end}} = \left( T_{\mathrm{start}} - T_a \right) e^{-\frac{\lambda L}{C_p m}} + T_a, \qquad \Big( \sum m_{\mathrm{out}} \Big) T_{\mathrm{out}} = \sum \left( m_{\mathrm{in}} T_{\mathrm{in}} \right) \tag{13}$$

where $\Phi$ is the thermal power injected at node $i$, $C_p$ is the specific heat capacity of water, $T_s$ and $T_o$ are the supply-pipe water temperature and the outlet water temperature at node $i$, the subscripts start and end refer to the heat supply network pipe branch whose head-end node is $i$, $T_{\mathrm{end}}$ is the branch end temperature, and $T_a$ is the ambient temperature.
Step 202, establishing the system optimization objective. To minimize the comprehensive objective combining the operation cost of the power system and the heat supply network with the accommodation of new energy, the expression is

$$\min F = f_1 + f_2 + f_3$$

where $f_1$ is the operation cost of the conventional units, $f_2$ is the operation cost of the cogeneration units, and $f_3$ is the wind curtailment penalty.
In the embodiment of the invention, the calculation expression of the operation cost of the conventional unit is as follows,
$$f_1 = \sum_{t=1}^{T} \sum_{i=1}^{N_G} \left( b_0 + b_1 P_{G,i}^t + b_2 \left( P_{G,i}^t \right)^2 \right) \Delta t$$

where $b_0$, $b_1$, $b_2$ are the energy consumption coefficients of the conventional units, $P_{G,i}^t$ is the conventional unit output, $N_G$ is the number of conventional units, $T$ is the scheduling cycle, and $\Delta t$ is the scheduling time interval.
In the embodiment of the invention, the calculation expression of the running cost of the cogeneration unit is as follows,
Figure 476407DEST_PATH_IMAGE309
in the formula (I), wherein,
Figure 282689DEST_PATH_IMAGE157
for the energy consumption coefficient of the cogeneration unit,
Figure 926160DEST_PATH_IMAGE158
for the amount of cogeneration,
Figure 667851DEST_PATH_IMAGE310
Figure 396773DEST_PATH_IMAGE311
respectively the electricity and the heat output of the cogeneration unit.
In the embodiment of the invention, the calculation expression of the wind curtailment penalty is as follows,
$$f_3 = k \sum_{t=1}^{T} \sum_{i=1}^{N_w} \left( P_{w,i}^{pre,t} - P_{w,i}^t \right) \Delta t$$

where $k$ is the wind curtailment penalty coefficient and $P_{w,i}^{pre,t} - P_{w,i}^t$ is the difference between the forecast wind power and the actual power.
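The three cost terms combine into the scheduling objective min F = f1 + f2 + f3; a direct transcription follows (illustrative NumPy code; the coefficient names follow the text, and the cross-term form of f2 follows the reconstruction above):

```python
import numpy as np

def dispatch_cost(P_g, P_chp, H_chp, P_w_pred, P_w, b, a, k, dt=1.0):
    """Objective F = f1 + f2 + f3 over a scheduling cycle.

    P_g      : (T, N_G) conventional unit outputs
    P_chp    : (T, N_chp) CHP electric outputs;  H_chp : CHP heat outputs
    P_w_pred : (T, N_w) forecast wind power;     P_w   : actual wind power
    b        : (b0, b1, b2) conventional unit energy consumption coefficients
    a        : (a0, ..., a5) CHP energy consumption coefficients
    k        : wind curtailment penalty coefficient
    """
    f1 = np.sum((b[0] + b[1] * P_g + b[2] * P_g**2) * dt)
    f2 = np.sum((a[0] + a[1] * P_chp + a[2] * P_chp**2
                 + a[3] * H_chp + a[4] * H_chp**2
                 + a[5] * P_chp * H_chp) * dt)
    f3 = k * np.sum((P_w_pred - P_w) * dt)
    return f1 + f2 + f3
```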
Step 203, establishing a constraint condition based on safe operation:
1) network security constraints
To ensure safe and reliable operation of the electric-heat combined system, the power network must satisfy the voltage constraints, and the thermal network must keep node temperatures within the specified range and pipeline mass flow rates within their limits.
$$V_{i_3,\min} \le V_{i_3} \le V_{i_3,\max}$$
$$T_{sj,\min} \le T_{sj} \le T_{sj,\max}$$
$$m_{jk,\min} \le m_{jk} \le m_{jk,\max}$$

where $V_{i_3}$ is the voltage magnitude of power network node $i_3$, and $V_{i_3,\max}$, $V_{i_3,\min}$ are the upper and lower limits of the voltage magnitude at node $i_3$; $T_{sj}$ is the temperature of the hot water flowing into heat network node $j$, and $T_{sj,\max}$, $T_{sj,\min}$ are the upper and lower limits of the supply water temperature; $m_{jk}$ is the mass flow rate of the hot water pipeline between heat network nodes $j$ and $k$, and $m_{jk,\max}$, $m_{jk,\min}$ are its upper and lower limits.
2) Cogeneration unit constraints
The cogeneration units of this embodiment of the invention are extraction-condensing units, common domestically; their operating point lies within a polygonal region, so the electricity and heat generation powers can be represented by constraints of the following form:
$$\max\left\{ P_{chp,i,\min} - \alpha_1 \phi_{chp,i}^t,\ \alpha_2 \phi_{chp,i}^t \right\} \le P_{chp,i}^t \le P_{chp,i,\max} - \alpha_3 \phi_{chp,i}^t$$

where $P_{chp,i}^t$ and $\phi_{chp,i}^t$ are the electric power and heat power generated by extraction-condensing unit $i$ in period $t$; $P_{chp,i,\max}$, $P_{chp,i,\min}$ are the upper and lower limits of the electric output; and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the coefficients describing the polygonal region, constant for a given cogeneration unit.
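Illustratively, and assuming the three-coefficient form of the polygonal region reconstructed above (the actual boundary of a given unit follows its datasheet), a feasibility check of an operating point is a pair of linear bounds:

```python
def chp_operating_point_feasible(p, h, p_min, p_max, alpha1, alpha2, alpha3):
    """Check a CHP operating point (p: electric, h: heat) against the polygonal
    feasible region, assuming the three-coefficient form given in the text."""
    lower = max(p_min - alpha1 * h, alpha2 * h)   # electricity floor varies with heat output
    upper = p_max - alpha3 * h                    # electricity ceiling decreases with heat output
    return lower <= p <= upper
```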
The cogeneration unit should satisfy the climbing constraint:
$$\Delta \phi_{chp,\min} \le \phi_{chp}^t - \phi_{chp}^{t-1} \le \Delta \phi_{chp,\max}$$

where $\phi_{chp}^{t-1}$ and $\phi_{chp}^t$ are the cogeneration powers of the two consecutive periods, and $\Delta \phi_{chp,\max}$, $\Delta \phi_{chp,\min}$ are the upper and lower limits of the ramping rate of the cogeneration device.
3) Renewable energy constraints
$$0 \le P_{w,i}^t \le P_{w,i,\max}^t$$

where $P_{w,i}^t$ denotes the power generated by wind turbine $i$ in period $t$, and $P_{w,i,\max}^t$ is the maximum output value of the wind turbine.
4) Conventional unit output constraints
$$P_{G,\min} \le P_{G,i}^t \le P_{G,\max}$$

while simultaneously satisfying the ramping constraint:

$$\Delta P_{G,\min} \le P_{G,i}^t - P_{G,i}^{t-1} \le \Delta P_{G,\max}$$

where $P_{G,i}^t$ is the conventional unit generation power, $P_{G,\max}$, $P_{G,\min}$ are the upper and lower limits of the unit output, and $\Delta P_{G,\max}$, $\Delta P_{G,\min}$ are the upper and lower limits of the unit ramping rate.
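Taken together, the output and ramping constraints on a conventional unit amount to a clipping rule; a minimal sketch (our own helper, not part of the patent) projects a requested set-point onto the feasible interval:

```python
def clip_unit_output(p_target, p_prev, p_min, p_max, ramp_down, ramp_up, dt=1.0):
    """Project a requested conventional-unit output onto the set defined by
    the output limits and the ramping constraint."""
    lo = max(p_min, p_prev - ramp_down * dt)   # cannot fall faster than ramp-down allows
    hi = min(p_max, p_prev + ramp_up * dt)     # cannot rise faster than ramp-up allows
    return min(max(p_target, lo), hi)
```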
Step 3, constructing the optimal dispatch model based on the multi-agent deep deterministic policy gradient. The optimal dispatch model is established around the five basic elements of the reinforcement learning model, namely environment, state, action, reward and agent, combined with the dispatch model of the electric-heat combined system.
Step 301, constructing the action space and the state space.

A power system agent and a thermodynamic system agent are constructed from the obtained power system parameters and thermodynamic system parameters respectively, and the action space $A$ and the state space $S$ are divided between the power system agent and the thermodynamic system agent.

Preferably, the action space variables correspond to the control variables of the system under study. The conventional unit generation power $P_G$, the cogeneration generation power $P_{chp}$, and the wind generation power $P_w$ serve as the action variables of the power system agent; the action variable of the thermodynamic system agent is the cogeneration heat power $\phi_{chp}$, namely:

$$a_1 = \left[ P_G, P_{chp}, P_w \right], \qquad a_2 = \left[ \phi_{chp} \right]$$

The state space variables correspond to the state variables of the system under study and reflect the overall, true physical state of the entire system.

Preferably, the state space of the power system agent consists of the electric load $P_{load}$, the generation power of the cogeneration device $P_{chp}^{t-1}$, the maximum wind power output $P_w^{\max}$, and the conventional unit output $P_G^{t-1}$; the state space of the thermodynamic system agent consists of the heat load $\phi_{load}$, the heat power of the cogeneration device $\phi_{chp}^{t-1}$, and the ambient temperature $T_a$:

$$s_1 = \left[ P_{load}, P_{chp}^{t-1}, P_w^{\max}, P_G^{t-1} \right], \qquad s_2 = \left[ \phi_{load}, \phi_{chp}^{t-1}, T_a \right]$$
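Because the actor networks emit actions in [-1, 1] (see the tanh output layer in step 304), each action variable must be rescaled to its physical range before being applied; a sketch (our own helper, with illustrative limits):

```python
import numpy as np

def denormalize_action(a_norm, a_min, a_max):
    """Map an actor output in [-1, 1] to the physical action range [a_min, a_max]."""
    a_norm = np.clip(a_norm, -1.0, 1.0)
    return a_min + 0.5 * (a_norm + 1.0) * (a_max - a_min)

# Example: a power-agent output of [0.2, -1.0, 0.5] mapped onto
# the [P_G, P_chp, P_w] ranges (placeholder limits):
a_phys = denormalize_action(np.array([0.2, -1.0, 0.5]),
                            a_min=np.array([0.0, 0.0, 0.0]),
                            a_max=np.array([100.0, 50.0, 80.0]))
```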
Step 302, building the reinforcement learning environment on the energy flow models of the electric-heat combined system, equations (11)-(13). At each time section, the agents interact with the environment through their policies to complete the state transition process and receive the system's reward feedback.
Step 303, respectively establishing the reward and penalty mechanisms of the power system agent and the thermodynamic system agent, and judging the quality of the action quantities on this basis, specifically as follows:
(1) Establishing the reinforcement learning reward functions.

For the power system agent, the reward function comprises the conventional unit operation cost, the wind curtailment penalty, and the variable out-of-limit penalties:

$$r_1 = -\left( f_1 + f_3 \right) - \left( \phi_V + \phi_{chp}^P + \phi_{chp}^{\Delta P} + \phi_G + \phi_G^{\Delta P} \right)$$

where $f_1 + f_3$ is the power system operation cost plus the wind curtailment penalty; $\phi_V$ is the system node voltage out-of-limit penalty term; $\phi_{chp}^P$ is the cogeneration unit output out-of-limit penalty term; $\phi_{chp}^{\Delta P}$ is the cogeneration device ramping out-of-limit penalty term; $\phi_G$ is the conventional unit output out-of-limit penalty term; and $\phi_G^{\Delta P}$ is the conventional unit ramping out-of-limit penalty term.
(2) For the thermodynamic system agent, the reward function comprises the cogeneration unit operation cost and the variable out-of-limit penalties:

$$r_2 = -f_2 - \left( \phi_{chp}^{\phi} + \phi_{chp}^{\Delta \phi} + \phi_T + \phi_m \right)$$

where $\phi_{chp}^{\phi}$ is the cogeneration unit output out-of-limit penalty term, $\phi_{chp}^{\Delta \phi}$ is the cogeneration unit ramping out-of-limit penalty term, $\phi_T$ is the system node temperature out-of-limit penalty, and $\phi_m$ is the system pipeline mass flow rate out-of-limit penalty.
Finally, the sum of the agents' reward functions serves as the basis for evaluating the quality of each agent's actions, and the agents cooperate to achieve the overall optimization objective of the electric-heat combined system.
All of the above constraints of the form $x_{\min} \le x \le x_{\max}$ adopt penalty terms of the following linear form:

$$\phi_x = \lambda \left[ \max\left( x - x_{\max}, 0 \right) + \max\left( x_{\min} - x, 0 \right) \right]$$

where $\lambda$ is the penalty coefficient, with a corresponding coefficient set for each type of out-of-limit penalty.
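The linear out-of-limit penalty reconstructed above is straightforward to implement; a minimal sketch:

```python
def limit_penalty(x, x_min, x_max, lam):
    """Linear out-of-limit penalty: zero inside [x_min, x_max], growing linearly outside.

    lam is the penalty coefficient, chosen per violated quantity
    (voltage, temperature, mass flow rate, output, ramp, ...).
    """
    return lam * (max(x - x_max, 0.0) + max(x_min - x, 0.0))
```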
Step 304, constructing the actor and critic networks.

The reinforcement learning actor and critic network structures are designed, with different structures for the policy network and the value-function network. In each case the estimated network and the target network share the same form, consisting of an input layer, hidden layers, and an output layer. The actor network has 4 hidden layers with 512, 256, 64 and 32 neurons in turn; the critic network has 3 hidden layers with 128, 128 and 32 neurons in turn. To prevent vanishing gradients from degrading the learning efficiency of the neural networks, a leaky rectified linear unit (leaky ReLU) is used as the activation function of the hidden layers; the activation function of the actor network's output layer is set to tanh, limiting the action outputs to [-1, 1]; and Adam is selected as the optimization algorithm.
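The described layer sizes translate directly into a pair of feed-forward networks; a PyTorch sketch (the state and action dimensions are placeholders):

```python
import torch
import torch.nn as nn

def make_actor(state_dim, action_dim):
    """Actor: 4 hidden layers (512/256/64/32), leaky ReLU, tanh output in [-1, 1]."""
    return nn.Sequential(
        nn.Linear(state_dim, 512), nn.LeakyReLU(),
        nn.Linear(512, 256), nn.LeakyReLU(),
        nn.Linear(256, 64), nn.LeakyReLU(),
        nn.Linear(64, 32), nn.LeakyReLU(),
        nn.Linear(32, action_dim), nn.Tanh(),
    )

class Critic(nn.Module):
    """Critic: 3 hidden layers (128/128/32), leaky ReLU; scores a (state, action) pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, 32), nn.LeakyReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

# Adam is used as the optimizer, as stated in the text, e.g.:
# opt = torch.optim.Adam(make_actor(4, 3).parameters(), lr=1e-3)
```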
Step 4, training the multi-agent deep reinforcement learning networks: the following steps are executed repeatedly, up to the set maximum number of training iterations, to update the reinforcement learning networks.
In optional technical solutions of this embodiment of the invention: in step 3, the out-of-limit penalty terms may use a stepped form instead of the linear form, but in practice the stepped penalty fits poorly, while the linear penalty achieves a better fit during training; also in step 3, an information entropy regularization term may be added to the reward function, but the convergence process of the algorithm is then likely to be unstable; in step 4, stochastic gradient descent (SGD) may replace Adam (adaptive moment estimation) as the training method, but practice shows that the Adam algorithm performs better.
In summary, for the optimization problem of the electric-heat combined system, traditional methods struggle both with the solution difficulty brought by growing system scale and with the information barriers between different stakeholders, so an optimized operation method with stronger solving capability and broader applicability is needed; the invention therefore solves the optimized operation problem of the electric-heat combined system with the multi-agent deep deterministic policy gradient method. Solving the problem with reinforcement learning in a multi-agent deep framework effectively handles the sequential decision problem of continuous control, avoids the drawbacks introduced by a discretized action space, reduces the difficulty of high-dimensional training, and suits dynamic environments better; since each agent relies only on local information to complete its calculation while executing its policy, the difficulty of sharing data between different stakeholders is resolved, realizing multi-energy coordinated optimal dispatch of the electric-heat combined system.
In the method provided by this embodiment of the invention, the optimized operation method of the electric-heat combined system is constructed on the multi-agent deep deterministic policy gradient and mainly addresses the following technical problems of traditional models:

(1) the high-dimensional, nonlinear, non-convex problem that traditional models face as system scale grows: the multi-agent deep reinforcement learning method greatly reduces computation time so that online calculation requirements can be met;

(2) the large dispatch errors caused by the linearizations introduced in traditional methods to simplify calculation;

(3) with the multi-agent reinforcement learning framework, each agent completes its calculation from local information only while executing its policy, resolving the difficulty of sharing data between different stakeholders.
Compared with the prior art, the technical scheme of the embodiment of the invention has the beneficial effects that at least:
(1) the invention solves the combined electric-heat optimization problem with reinforcement learning, which speeds up generation of the control strategy while preserving solution quality, overcoming the drawback of traditional methods whose computation time grows with system scale until online calculation becomes infeasible;

(2) based on the multi-agent deep deterministic policy gradient algorithm framework, an optimal dispatch model of the electric-heat combined system built on a multi-agent actor-critic architecture is constructed; convergence is stable and exploration of the solution space is strong, overcoming the tendency of traditional methods to fall into local optima during solving;

(3) the electric-heat combined system is divided into a power system agent and a thermodynamic system agent that cooperate to achieve the system-wide optimization objective; the reinforcement learning actions and state spaces are divided according to the dispatch model of the combined system and a reward and penalty mechanism is established for each agent, so that each agent completes its policy calculation from local state information alone, resolving the difficulty of sharing data between different stakeholders.
The following are apparatus embodiments of the invention, which may be used to carry out the method embodiments of the invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the invention.
In another embodiment of the present invention, an optimized operation system of an electric heating combination system is provided, which includes:
the parameter acquisition module is used for acquiring state parameters of the electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
the action quantity acquisition module is used for inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting action quantities through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
and the optimized operation module is used for realizing the optimized operation of the electric heating combined system based on the action quantity.
In the system of this embodiment of the invention, the combined electric-heat optimization problem is solved with reinforcement learning, which effectively handles the sequential decision problem of continuous control, avoids the drawbacks introduced by a discretized action space, reduces the difficulty of high-dimensional training, and suits dynamic environments better, with high model accuracy and fast solving. The multi-agent deep reinforcement learning framework, together with the objective of minimizing system operation cost and the agent reward mechanism built from the safety constraints of the combined system, gives stable convergence, strong exploration of the solution space, and good model adaptability. The optimal dispatch model based on the multi-agent actor-critic framework, established in combination with the dispatch model of the electric-heat combined system, lets each agent complete its own policy calculation from its local state information alone during execution, resolving the difficulty of sharing information between different stakeholders and giving the model wide applicability.
In yet another embodiment of the invention, a computer device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is specifically adapted to load and execute one or more instructions in a computer storage medium to implement the corresponding method flow or function. The processor provided by this embodiment of the invention may be used to perform the operations of the optimized operation method of the electric-heat combined system.
In yet another embodiment of the present invention, a storage medium, specifically a computer-readable storage medium (Memory), is provided, which is a Memory device in a computer device for storing programs and data. It is understood that the computer readable storage medium herein can include both built-in storage media in the computer device and, of course, extended storage media supported by the computer device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory. One or more instructions stored in the computer-readable storage medium may be loaded and executed by a processor to perform the corresponding steps of the method for optimized operation of an integrated electric heating system in the above-described embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (12)

1. An optimized operation method of an electric-heat combined system is characterized by comprising the following steps: acquiring state parameters of an electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model, and outputting the action quantity through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
realizing the optimized operation of the electric heating combined system based on the action quantity;
wherein, in the multi-agent deep reinforcement learning model,
the intelligent agents comprise an electric power system intelligent agent and a thermal system intelligent agent;
the environment includes mathematical models of power system and thermodynamic system energy flows;
the action space of each agent comprises a power system agent action space and a thermodynamic system agent action space; the power system agent action space comprises the conventional unit generation power, the cogeneration device generation power, and the wind power generation power; the thermodynamic system agent action space comprises the heat generation power of the cogeneration device;

the state space of each agent comprises a power system agent state space and a thermodynamic system agent state space; the power system agent state space comprises the electric load, the current generation power of the cogeneration device, the maximum wind power output, and the current conventional unit output; the thermodynamic system agent state space comprises the heat load, the current heat generation power of the cogeneration device, and the ambient temperature;

the reward function of each agent comprises a power system agent reward function and a thermodynamic system agent reward function; the power system agent reward function comprises the conventional unit operation cost, the wind curtailment penalty, and the variable out-of-limit penalty; the thermodynamic system agent reward function comprises the cogeneration device operation cost and the variable out-of-limit penalty.
2. The method of claim 1, wherein the power system agent and the thermodynamic system agent each comprise a respective actor network and critic network;

the actor network is used for receiving the state set the agent perceives from the environment and outputting the agent's action in the given state; the critic network is used for generating a state value function from the agent's state and the action taken in that state, evaluating the quality of the current action taken by the actor network; the actor network and the critic network both adopt a dual-network structure, comprising an estimated network and a target network of identical structure; in the training process, the estimated network parameters of each agent's actor and critic are updated, and the trained estimated network parameters are used to softly update the target networks.
3. The optimized operation method of an electric-heat combined system according to claim 2, wherein in the training process, the step of updating the estimated network parameters of each agent's actor and critic and softly updating the target networks with the trained estimated network parameters specifically comprises:

selecting, at each scheduling period in the scheduling cycle, an action $a_1 = \mu_{\theta_1}(s_1) + \xi_{t1}$ for the power system agent and an action $a_2 = \mu_{\theta_2}(s_2) + \xi_{t2}$ for the thermodynamic system agent; where $s_1$, $s_2$ denote the current states observed by the power system agent and the thermodynamic system agent respectively, $\mu_{\theta_1}$, $\mu_{\theta_2}$ denote the current policies of the power system agent and thermodynamic system agent actor networks respectively, and $\xi_{t1}$, $\xi_{t2}$ are the random noises of the policy actions of the power system agent and the thermodynamic system agent respectively;

storing $(s_1, a_1, r_1, s'_1)$ in the power system agent experience replay unit and $(s_2, a_2, r_2, s'_2)$ in the thermodynamic system agent experience replay unit; where $r_1$ and $s'_1$ are the instant reward and updated state the power system agent observes when the action $a = (a_1, a_2)$ acts on the real-time system, and $r_2$ and $s'_2$ are the instant reward and updated state of the thermodynamic system agent;

randomly sampling $K_1$ transitions $(s_1^j, a_1^j, r_1^j, s_1'^j)$ from the power system agent experience replay unit and computing $y_1^j = r_1^j + \gamma\, Q'_1\!\left(s_1'^j, \mu'_{\theta_1}(s_1'^j)\right)$; updating the critic estimated network parameter $\theta_1^Q$ of the power system agent according to a first loss function expressed as

$$L(\theta_1^Q) = \frac{1}{K_1} \sum_{j=1}^{K_1} \left( y_1^j - Q_1(s_1^j, a_1^j \mid \theta_1^Q) \right)^2$$

where $Q_1(\cdot \mid \theta_1^Q)$ is the state value function of the power system agent critic estimated network, $Q'_1$ is the state value function of the power system agent critic target network, and $K_1$ is the number of sampled transitions;

updating the actor estimated network parameter $\theta_1^{\mu}$ of the power system agent according to a second loss function expressed as

$$\nabla_{\theta_1^{\mu}} J \approx \frac{1}{K_1} \sum_{j=1}^{K_1} \nabla_a Q_1(s_1^j, a \mid \theta_1^Q) \Big|_{a = \mu_{\theta_1}(s_1^j)} \nabla_{\theta_1^{\mu}} \mu_{\theta_1}(s_1^j \mid \theta_1^{\mu})$$

softly updating the target actor network parameter and target critic network parameter of the power system agent as $\theta_1^{\mu'} \leftarrow \tau \theta_1^{\mu} + (1 - \tau)\, \theta_1^{\mu'}$ and $\theta_1^{Q'} \leftarrow \tau \theta_1^{Q} + (1 - \tau)\, \theta_1^{Q'}$, where $\theta_1^{\mu'}$, $\theta_1^{Q'}$ are the target actor and target critic network parameters of the power system agent;

randomly sampling $K_2$ transitions $(s_2^j, a_2^j, r_2^j, s_2'^j)$ from the thermodynamic system agent experience replay unit and computing $y_2^j = r_2^j + \gamma\, Q'_2\!\left(s_2'^j, \mu'_{\theta_2}(s_2'^j)\right)$; updating the critic estimated network parameter $\theta_2^Q$ of the thermodynamic system agent according to a third loss function expressed as

$$L(\theta_2^Q) = \frac{1}{K_2} \sum_{j=1}^{K_2} \left( y_2^j - Q_2(s_2^j, a_2^j \mid \theta_2^Q) \right)^2$$

where $Q_2(\cdot \mid \theta_2^Q)$ is the state value function of the thermodynamic system agent critic estimated network, $Q'_2$ is the state value function of the thermodynamic system agent critic target network, and $K_2$ is the number of sampled transitions; updating the actor estimated network parameter $\theta_2^{\mu}$ of the thermodynamic system agent according to a fourth loss function expressed as

$$\nabla_{\theta_2^{\mu}} J \approx \frac{1}{K_2} \sum_{j=1}^{K_2} \nabla_a Q_2(s_2^j, a \mid \theta_2^Q) \Big|_{a = \mu_{\theta_2}(s_2^j)} \nabla_{\theta_2^{\mu}} \mu_{\theta_2}(s_2^j \mid \theta_2^{\mu})$$

softly updating the target actor network parameter and target critic network parameter of the thermodynamic system agent as $\theta_2^{\mu'} \leftarrow \tau \theta_2^{\mu} + (1 - \tau)\, \theta_2^{\mu'}$ and $\theta_2^{Q'} \leftarrow \tau \theta_2^{Q} + (1 - \tau)\, \theta_2^{Q'}$, where $\theta_2^{\mu'}$, $\theta_2^{Q'}$ are the target actor and target critic network parameters of the thermodynamic system agent.
4. The method of claim 3, wherein the mathematical models of the power system and thermodynamic system energy flows comprise:

the system optimization objective, expressed as $\min F = f_1 + f_2 + f_3$, where $f_1$ is the operation cost of the conventional units, $f_2$ is the operation cost of the cogeneration units, and $f_3$ is the wind curtailment penalty;

$$f_1 = \sum_{t=1}^{T} \sum_{i=1}^{N_G} \left( b_0 + b_1 P_{G,i}^t + b_2 \left( P_{G,i}^t \right)^2 \right) \Delta t$$

where $b_0$, $b_1$, $b_2$ are the energy consumption coefficients of the conventional units, $P_{G,i}^t$ is the conventional unit output, $N_G$ is the number of conventional units, $T$ is the scheduling cycle, and $\Delta t$ is the scheduling time interval;

$$f_2 = \sum_{t=1}^{T} \sum_{i=1}^{N_{chp}} \left( a_0 + a_1 P_{chp,i}^t + a_2 \left( P_{chp,i}^t \right)^2 + a_3 \phi_{chp,i}^t + a_4 \left( \phi_{chp,i}^t \right)^2 + a_5 P_{chp,i}^t \phi_{chp,i}^t \right) \Delta t$$

where $a_0$ to $a_5$ are the energy consumption coefficients of the cogeneration units, $N_{chp}$ is the number of cogeneration units, and $P_{chp,i}^t$, $\phi_{chp,i}^t$ are the electric and heat output of the cogeneration units;

$$f_3 = k \sum_{t=1}^{T} \sum_{i=1}^{N_w} \left( P_{w,i}^{pre,t} - P_{w,i}^t \right) \Delta t$$

where $k$ is the wind curtailment penalty coefficient and $P_{w,i}^{pre,t} - P_{w,i}^t$ is the difference between the forecast wind power and the actual power;

the network security constraints, expressed as

$$V_{i_3,\min} \le V_{i_3} \le V_{i_3,\max}, \qquad T_{sj,\min} \le T_{sj} \le T_{sj,\max}, \qquad m_{jk,\min} \le m_{jk} \le m_{jk,\max}$$

where $V_{i_3}$ is the voltage magnitude of power network node $i_3$ and $V_{i_3,\max}$, $V_{i_3,\min}$ are the upper and lower limits of its voltage magnitude; $T_{sj}$ is the temperature of the hot water flowing into heat network node $j$ and $T_{sj,\max}$, $T_{sj,\min}$ are the upper and lower limits of the supply water temperature; $m_{jk}$ is the mass flow rate of the hot water pipeline between heat network nodes $j$ and $k$ and $m_{jk,\max}$, $m_{jk,\min}$ are its upper and lower limits;

the cogeneration unit constraints, expressed as

$$\max\left\{ P_{chp,i,\min} - \alpha_1 \phi_{chp,i}^t,\ \alpha_2 \phi_{chp,i}^t \right\} \le P_{chp,i}^t \le P_{chp,i,\max} - \alpha_3 \phi_{chp,i}^t$$

where $P_{chp,i}^t$, $\phi_{chp,i}^t$ are the electric output and heat output of extraction-condensing unit $i$ in period $t$, $P_{chp,i,\max}$, $P_{chp,i,\min}$ are the upper and lower limits of the electric output, and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the coefficients describing the polygonal region;

the ramping constraint of the cogeneration device, expressed as

$$\Delta \phi_{chp,\min} \le \phi_{chp}^t - \phi_{chp}^{t-1} \le \Delta \phi_{chp,\max}$$

where $\phi_{chp}^{t-1}$, $\phi_{chp}^t$ are the cogeneration powers of the two consecutive periods and $\Delta \phi_{chp,\max}$, $\Delta \phi_{chp,\min}$ are the upper and lower limits of the ramping rate of the cogeneration device;

the renewable energy constraint, expressed as

$$0 \le P_{w,i}^t \le P_{w,i,\max}^t$$

where $P_{w,i}^t$ denotes the power generated by wind turbine $i$ in period $t$ and $P_{w,i,\max}^t$ is the maximum output value of the wind turbine;

the conventional unit output constraint, expressed as

$$P_{G,\min} \le P_{G,i}^t \le P_{G,\max}$$

and the conventional unit ramping constraint, expressed as

$$\Delta P_{G,\min} \le P_{G,i}^t - P_{G,i}^{t-1} \le \Delta P_{G,\max}$$

where $P_{G,i}^t$ is the conventional unit generation power, $P_{G,\max}$, $P_{G,\min}$ are the upper and lower limits of the unit output, and $\Delta P_{G,\max}$, $\Delta P_{G,\min}$ are the upper and lower limits of the unit ramping rate.
5. The method of claim 1, wherein the power system agent reward function is expressed as

$$r_1 = -\left( f_1 + f_3 \right) - \left( \phi_V + \phi_{chp}^P + \phi_{chp}^{\Delta P} + \phi_G + \phi_G^{\Delta P} \right)$$

where $f_1$, $f_3$ are the power system operation cost and the wind curtailment penalty; $\phi_V$ is the system node voltage out-of-limit penalty term; $\phi_{chp}^P$ is the cogeneration unit output out-of-limit penalty term; $\phi_{chp}^{\Delta P}$ is the cogeneration device ramping out-of-limit penalty term; $\phi_G$ is the conventional unit output out-of-limit penalty term; and $\phi_G^{\Delta P}$ is the conventional unit ramping out-of-limit penalty term;

and the thermodynamic system agent reward function is expressed as

$$r_2 = -f_2 - \left( \phi_{chp}^{\phi} + \phi_{chp}^{\Delta \phi} + \phi_T + \phi_m \right)$$

where $\phi_{chp}^{\phi}$ is the cogeneration unit output out-of-limit penalty term, $\phi_{chp}^{\Delta \phi}$ is the cogeneration unit ramping out-of-limit penalty term, $\phi_T$ is the system node temperature out-of-limit penalty, and $\phi_m$ is the system pipeline mass flow rate out-of-limit penalty.
6. An optimized operation system of an electric-heat combined system is characterized by comprising:
the parameter acquisition module is used for acquiring state parameters of the electric heating combined system to be optimally operated; wherein the state parameters include: electrical load, wind power maximum output, thermal load and ambient temperature;
the action quantity acquisition module is used for inputting the state parameters into a pre-trained multi-agent deep reinforcement learning model and outputting action quantities through the multi-agent deep reinforcement learning model; wherein the action amount includes: the power generation power of the conventional unit, the power generation power of the cogeneration device, the wind power generation power and the heat generation power of the cogeneration device; the basic elements of the multi-agent deep reinforcement learning model comprise agents, environments, action spaces of the agents, state spaces of the agents and reward functions of the agents;
the optimized operation module is used for realizing the optimized operation of the electric heating combined system based on the action quantity;
wherein, in the multi-agent deep reinforcement learning model of the action quantity acquisition module,
the intelligent agents comprise an electric power system intelligent agent and a thermal system intelligent agent;
the environment includes mathematical models of power system and thermodynamic system energy flows;
the action space of each agent comprises a power system agent action space and a thermodynamic system agent action space; the power system agent action space comprises the conventional unit generation power, the cogeneration device generation power, and the wind power generation power; the thermodynamic system agent action space comprises the heat generation power of the cogeneration device;

the state space of each agent comprises a power system agent state space and a thermodynamic system agent state space; the power system agent state space comprises the electric load, the current generation power of the cogeneration device, the maximum wind power output, and the current conventional unit output; the thermodynamic system agent state space comprises the heat load, the current heat generation power of the cogeneration device, and the ambient temperature;

the reward function of each agent comprises a power system agent reward function and a thermodynamic system agent reward function; the power system agent reward function comprises the conventional unit operation cost, the wind curtailment penalty, and the variable out-of-limit penalty; the thermodynamic system agent reward function comprises the cogeneration device operation cost and the variable out-of-limit penalty.
7. The optimized operation system of an electric-heat combined system according to claim 6, wherein in the action quantity acquisition module, the power system agent and the thermodynamic system agent each comprise a respective actor network and critic network;

the actor network is used for receiving the state set the agent perceives from the environment and outputting the agent's action in the given state; the critic network is used for generating a state value function from the agent's state and the action taken in that state, evaluating the quality of the current action taken by the actor network; the actor network and the critic network both adopt a dual-network structure, comprising an estimated network and a target network of identical structure; in the training process, the estimated network parameters of each agent's actor and critic are updated, and the trained estimated network parameters are used to softly update the target networks.
8. The optimized operation system of an electric-heat combined system according to claim 7, wherein in the action quantity acquisition module, in the training process, the step of updating the estimated network parameters of each agent's actor and critic and softly updating the target networks with the trained estimated network parameters specifically comprises:

selecting, at each scheduling period in the scheduling cycle, an action $a_1 = \mu_{\theta_1}(s_1) + \xi_{t1}$ for the power system agent and an action $a_2 = \mu_{\theta_2}(s_2) + \xi_{t2}$ for the thermodynamic system agent; where $s_1$, $s_2$ denote the current states observed by the power system agent and the thermodynamic system agent respectively, $\mu_{\theta_1}$, $\mu_{\theta_2}$ denote the current policies of the power system agent and thermodynamic system agent actor networks respectively, and $\xi_{t1}$, $\xi_{t2}$ are the random noises of the policy actions of the power system agent and the thermodynamic system agent respectively;

storing $(s_1, a_1, r_1, s'_1)$ in the power system agent experience replay unit and $(s_2, a_2, r_2, s'_2)$ in the thermodynamic system agent experience replay unit; where $r_1$ and $s'_1$ are the instant reward and updated state the power system agent observes when the action $a = (a_1, a_2)$ acts on the real-time system, and $r_2$ and $s'_2$ are the instant reward and updated state of the thermodynamic system agent;

randomly sampling $K_1$ transitions $(s_1^j, a_1^j, r_1^j, s_1'^j)$ from the power system agent experience replay unit and computing $y_1^j = r_1^j + \gamma\, Q'_1\!\left(s_1'^j, \mu'_{\theta_1}(s_1'^j)\right)$; updating the critic estimated network parameter $\theta_1^Q$ of the power system agent according to a first loss function expressed as

$$L(\theta_1^Q) = \frac{1}{K_1} \sum_{j=1}^{K_1} \left( y_1^j - Q_1(s_1^j, a_1^j \mid \theta_1^Q) \right)^2$$

where $Q_1(\cdot \mid \theta_1^Q)$ is the state value function of the power system agent critic estimated network, $Q'_1$ is the state value function of the power system agent critic target network, and $K_1$ is the number of sampled transitions;

updating the actor estimated network parameter $\theta_1^{\mu}$ of the power system agent according to a second loss function expressed as

$$\nabla_{\theta_1^{\mu}} J \approx \frac{1}{K_1} \sum_{j=1}^{K_1} \nabla_a Q_1(s_1^j, a \mid \theta_1^Q) \Big|_{a = \mu_{\theta_1}(s_1^j)} \nabla_{\theta_1^{\mu}} \mu_{\theta_1}(s_1^j \mid \theta_1^{\mu})$$

softly updating the target actor network parameter and target critic network parameter of the power system agent as $\theta_1^{\mu'} \leftarrow \tau \theta_1^{\mu} + (1 - \tau)\, \theta_1^{\mu'}$ and $\theta_1^{Q'} \leftarrow \tau \theta_1^{Q} + (1 - \tau)\, \theta_1^{Q'}$, where $\theta_1^{\mu'}$, $\theta_1^{Q'}$ are the target actor and target critic network parameters of the power system agent;

randomly sampling $K_2$ transitions $(s_2^j, a_2^j, r_2^j, s_2'^j)$ from the thermodynamic system agent experience replay unit and computing $y_2^j = r_2^j + \gamma\, Q'_2\!\left(s_2'^j, \mu'_{\theta_2}(s_2'^j)\right)$; updating the critic estimated network parameter $\theta_2^Q$ of the thermodynamic system agent according to a third loss function expressed as

$$L(\theta_2^Q) = \frac{1}{K_2} \sum_{j=1}^{K_2} \left( y_2^j - Q_2(s_2^j, a_2^j \mid \theta_2^Q) \right)^2$$

where $Q_2(\cdot \mid \theta_2^Q)$ is the state value function of the thermodynamic system agent critic estimated network, $Q'_2$ is the state value function of the thermodynamic system agent critic target network, and $K_2$ is the number of sampled transitions; updating the actor estimated network parameter $\theta_2^{\mu}$ of the thermodynamic system agent according to a fourth loss function expressed as

$$\nabla_{\theta_2^{\mu}} J \approx \frac{1}{K_2} \sum_{j=1}^{K_2} \nabla_a Q_2(s_2^j, a \mid \theta_2^Q) \Big|_{a = \mu_{\theta_2}(s_2^j)} \nabla_{\theta_2^{\mu}} \mu_{\theta_2}(s_2^j \mid \theta_2^{\mu})$$

softly updating the target actor network parameter and target critic network parameter of the thermodynamic system agent as $\theta_2^{\mu'} \leftarrow \tau \theta_2^{\mu} + (1 - \tau)\, \theta_2^{\mu'}$ and $\theta_2^{Q'} \leftarrow \tau \theta_2^{Q} + (1 - \tau)\, \theta_2^{Q'}$, where $\theta_2^{\mu'}$, $\theta_2^{Q'}$ are the target actor and target critic network parameters of the thermodynamic system agent.
9. The system according to claim 8, wherein the mathematical models of the power system and thermal system power flows in the action quantity obtaining module comprise:

the system optimization objective, expressed as $\min F=f_1+f_2+f_3$, where $f_1$ is the operating cost of the conventional units, $f_2$ is the operating cost of the cogeneration units, and $f_3$ is the wind curtailment penalty;

$$f_1=\sum_{t=1}^{T}\sum_{i=1}^{N_G}\Big(b_0+b_1P_{G,i}^{t}+b_2\big(P_{G,i}^{t}\big)^2\Big)\Delta t,$$

where $b_0$, $b_1$, $b_2$ are the energy consumption coefficients of the conventional units, $P_{G,i}^{t}$ is the output of conventional unit $i$ in period $t$, $N_G$ is the number of conventional units, $T$ is the scheduling cycle, and $\Delta t$ is the scheduling interval;

$$f_2=\sum_{t=1}^{T}\sum_{i=1}^{N_{chp}}\Big(a_0+a_1P_{chp,i}^{t}+a_2\big(P_{chp,i}^{t}\big)^2+a_3H_{chp,i}^{t}+a_4\big(H_{chp,i}^{t}\big)^2+a_5P_{chp,i}^{t}H_{chp,i}^{t}\Big)\Delta t,$$

where $a_0$ to $a_5$ are the energy consumption coefficients of the cogeneration units, $N_{chp}$ is the number of cogeneration units, and $P_{chp,i}^{t}$ and $H_{chp,i}^{t}$ are the electric and heat outputs of cogeneration unit $i$ in period $t$;

$$f_3=k\sum_{t=1}^{T}\sum_{i=1}^{N_w}\big(P_{w,i}^{t,\mathrm{pre}}-P_{w,i}^{t}\big)\Delta t,$$

where $k$ is the wind curtailment penalty coefficient and $P_{w,i}^{t,\mathrm{pre}}-P_{w,i}^{t}$ is the difference between the predicted wind power and the actual wind power;

the network security constraints, expressed as

$$V_{i_3,\min}\le V_{i_3}\le V_{i_3,\max},$$

$$T_{s,\min}\le T_{sj}\le T_{s,\max},$$

$$m_{jk,\min}\le m_{jk}\le m_{jk,\max},$$

where $V_{i_3}$ is the voltage amplitude of power network node $i_3$, and $V_{i_3,\max}$, $V_{i_3,\min}$ are the upper and lower limits of the node voltage amplitude; $T_{sj}$ is the temperature of the hot water flowing into heat network node $j$, and $T_{s,\max}$, $T_{s,\min}$ are the upper and lower limits of the supply water temperature; $m_{jk}$ is the mass flow rate of the hot water pipeline between heat network nodes $j$ and $k$, and $m_{jk,\max}$, $m_{jk,\min}$ are its upper and lower limits;
the cogeneration unit constraints, which confine the electric and heat outputs to the polygonal feasible operating region of the extraction-condensing unit, expressed as

$$P_{chp,i}^{t}\le P_{chp,i}^{\max}-\alpha_1H_{chp,i}^{t},$$

$$P_{chp,i}^{t}\ge\max\big\{P_{chp,i}^{\min}-\alpha_2H_{chp,i}^{t},\ \alpha_3H_{chp,i}^{t}\big\},$$

where $P_{chp,i}^{t}$ and $H_{chp,i}^{t}$ are the electric and heat outputs of the $i$-th extraction-condensing unit in period $t$, $P_{chp,i}^{\max}$ and $P_{chp,i}^{\min}$ are the upper and lower limits of the electric output, and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the coefficients describing the polygonal region;

the cogeneration unit ramping constraints, expressed as

$$P_{chp,i}^{t}-P_{chp,i}^{t-1}\le R_{chp,i}^{\mathrm{up}}\Delta t,$$

$$P_{chp,i}^{t-1}-P_{chp,i}^{t}\le R_{chp,i}^{\mathrm{down}}\Delta t,$$

where $P_{chp,i}^{t}$ and $P_{chp,i}^{t-1}$ are the cogeneration electric powers of two consecutive periods and $R_{chp,i}^{\mathrm{up}}$, $R_{chp,i}^{\mathrm{down}}$ are the upper and lower limits of the cogeneration unit ramp rate;
the renewable energy constraint, expressed as

$$0\le P_{w,i}^{t}\le P_{w,i}^{t,\max},$$

where $P_{w,i}^{t}$ is the generated power of wind turbine $i$ in period $t$ and $P_{w,i}^{t,\max}$ is the maximum output of the wind turbine;

the conventional unit output constraint, expressed as

$$P_{G,i}^{\min}\le P_{G,i}^{t}\le P_{G,i}^{\max};$$

and the conventional unit ramping constraints, expressed as

$$P_{G,i}^{t}-P_{G,i}^{t-1}\le R_{G,i}^{\mathrm{up}}\Delta t,$$

$$P_{G,i}^{t-1}-P_{G,i}^{t}\le R_{G,i}^{\mathrm{down}}\Delta t,$$

where $P_{G,i}^{t}$ is the generated power of conventional unit $i$, $P_{G,i}^{\max}$ and $P_{G,i}^{\min}$ are the upper and lower limits of the unit output, and $R_{G,i}^{\mathrm{up}}$, $R_{G,i}^{\mathrm{down}}$ are the upper and lower limits of the unit ramp rate.
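The objective and constraint checks in claim 9 reduce to straightforward array arithmetic. The sketch below, in Python with NumPy, shows one hedged reading of how $f_1$, $f_2$, $f_3$ and a ramping check could be evaluated; all coefficient values, array shapes, and function names are assumptions made for illustration.

```python
# Sketch of the scheduling objective min F = f1 + f2 + f3 and a ramping
# check from claim 9. Coefficient values and array shapes are
# illustrative assumptions, not values taken from the patent.
import numpy as np

b0, b1, b2 = 10.0, 20.0, 0.05                # conventional-unit coefficients
a0, a1, a2, a3, a4, a5 = 15.0, 18.0, 0.04, 9.0, 0.03, 0.01  # CHP coefficients
k, dt = 50.0, 1.0                            # curtailment penalty, interval (h)

def objective(p_g, p_chp, h_chp, p_w_pred, p_w):
    """All inputs are (T, n_units) arrays of per-period outputs in MW."""
    f1 = np.sum((b0 + b1 * p_g + b2 * p_g**2) * dt)
    f2 = np.sum((a0 + a1 * p_chp + a2 * p_chp**2
                 + a3 * h_chp + a4 * h_chp**2
                 + a5 * p_chp * h_chp) * dt)
    f3 = k * np.sum((p_w_pred - p_w) * dt)   # wind curtailment penalty
    return f1 + f2 + f3

def ramp_ok(p, r_up, r_down):
    """True when p_t - p_{t-1} stays within the ramp-rate limits."""
    d = np.diff(p, axis=0)
    return bool(np.all(d <= r_up * dt) and np.all(-d <= r_down * dt))
```

The box constraints (voltage, temperature, mass flow, output limits) are simple elementwise bound checks of the same kind and are applied per node, pipe, or unit.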
10. The optimized operation system of an electric-heating combined system according to claim 6, wherein the power system agent reward function is expressed as

$$r_1=-\big(f_1+f_3+\varphi_V+\varphi_{chp}^{P}+\varphi_{chp}^{\Delta P}+\varphi_{G}^{P}+\varphi_{G}^{\Delta P}\big),$$

where $f_1$ and $f_3$ are the power system operating cost and the wind curtailment penalty; $\varphi_V$ is the system node voltage out-of-limit penalty term; $\varphi_{chp}^{P}$ is the cogeneration unit output out-of-limit penalty term; $\varphi_{chp}^{\Delta P}$ is the cogeneration unit ramping out-of-limit penalty term; $\varphi_{G}^{P}$ is the conventional unit output out-of-limit penalty term; and $\varphi_{G}^{\Delta P}$ is the conventional unit ramping out-of-limit penalty term;

and the thermal system agent reward function is expressed as

$$r_2=-\big(f_2+\varphi_{chp}^{H}+\varphi_{chp}^{\Delta H}+\varphi_T+\varphi_m\big),$$

where $f_2$ is the cogeneration unit operating cost, $\varphi_{chp}^{H}$ is the cogeneration unit output out-of-limit penalty term, $\varphi_{chp}^{\Delta H}$ is the cogeneration unit ramping out-of-limit penalty term, $\varphi_T$ is the system node temperature out-of-limit penalty, and $\varphi_m$ is the system pipeline mass flow rate out-of-limit penalty.
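A compact way to read claim 10 is that each agent's reward is the negative of its cost share plus its out-of-limit penalty terms. The sketch below shows one possible quadratic-penalty formulation; the weight W, the limit-dictionary keys, and the omission of the ramping penalty terms are simplifications assumed for illustration.

```python
# Sketch of the reward composition in claim 10: each agent's reward is
# the negative of its cost share plus its out-of-limit penalty terms.
# The quadratic penalty form, the weight W, and the limit names are
# assumptions; ramping penalties are omitted for brevity and would be
# computed analogously from successive-period differences.
import numpy as np

W = 100.0  # assumed common penalty weight

def penalty(x, lo, hi):
    """Quadratic out-of-limit penalty, zero while x stays in [lo, hi]."""
    return W * float(np.sum(np.maximum(x - hi, 0.0)**2
                            + np.maximum(lo - x, 0.0)**2))

def power_agent_reward(f1, f3, v, p_chp, p_g, lim):
    phi_v = penalty(v, lim["v_min"], lim["v_max"])          # node voltages
    phi_p_chp = penalty(p_chp, lim["p_chp_min"], lim["p_chp_max"])
    phi_p_g = penalty(p_g, lim["p_g_min"], lim["p_g_max"])
    return -(f1 + f3 + phi_v + phi_p_chp + phi_p_g)

def thermal_agent_reward(f2, h_chp, t_node, m_pipe, lim):
    phi_h = penalty(h_chp, lim["h_min"], lim["h_max"])      # CHP heat output
    phi_t = penalty(t_node, lim["t_min"], lim["t_max"])     # node temperatures
    phi_m = penalty(m_pipe, lim["m_min"], lim["m_max"])     # pipe mass flows
    return -(f2 + phi_h + phi_t + phi_m)
```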
11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the optimized operation method of an electric-heating combined system according to any one of claims 1 to 5.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the optimized operation method of an electric-heating combined system according to any one of claims 1 to 5.
CN202111328629.5A 2021-11-10 2021-11-10 Optimized operation method, system, equipment and medium of electric heating combined system Active CN113780688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111328629.5A CN113780688B (en) 2021-11-10 2021-11-10 Optimized operation method, system, equipment and medium of electric heating combined system

Publications (2)

Publication Number Publication Date
CN113780688A CN113780688A (en) 2021-12-10
CN113780688B (en) 2022-02-18

Family

ID=78873781

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant