CN116362504A - Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium - Google Patents
- Publication number: CN116362504A
- Application number: CN202310328399.5A
- Authority: CN
- Country: China
- Prior art keywords: energy system; combined energy; electric heating; reinforcement learning; learning model
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Combinations of networks
- G06N3/092—Reinforcement learning
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q50/06—Energy or water supply
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention relates to an optimal scheduling method for an electric heating combined energy system, a terminal device and a storage medium. The method comprises the following steps: abstractly modeling the electric heating combined energy system as a state diagram; collecting historical state diagrams of the electric heating combined energy system to form a training set; constructing a reinforcement learning model for optimal scheduling of the electrothermal combined energy system, replacing the multi-layer perceptron network in the reinforcement learning model with a graph neural network, setting the action space, state space and return function of the reinforcement learning model, adopting the maximum entropy of the value distribution as the reinforcement learning objective, and training the reinforcement learning model on the training set; and obtaining the output result of the electric heating combined energy system from the trained reinforcement learning model. Compared with methods based on the MLP architecture, the use of system topology information gives the method a larger exploration space and a faster convergence speed, making it more advantageous in the optimal scheduling of the electrothermal combined energy system.
Description
Technical Field
The invention relates to the field of energy optimal scheduling, in particular to an optimal scheduling method, terminal equipment and a storage medium for an electric heating combined energy system.
Background
As the contradiction between society's growing energy demand and the goals of energy conservation and emission reduction becomes increasingly prominent, how to make full use of new energy to reduce the use of traditional energy, lower running costs and cut emissions has become a problem demanding urgent solution.
The development of the energy Internet provides a foundation for the complementation and conversion of multiple energy flows and the full utilization of energy, and the scheduling and coupling of multiple energy flows are key to the efficient operation of an integrated energy system. Because of the nonlinear constraint conditions of the energy system, multi-energy-flow optimal scheduling is a non-convex optimization problem whose global optimum is difficult to obtain. Traditional work on this problem has mostly focused on approximate and nonlinear solution methods, and intelligent algorithms such as particle swarm optimization have also appeared. However, their high complexity, together with the need to re-solve whenever the system state changes, makes it difficult for these approaches to respond quickly to large-scale problems. With the spread of renewable energy sources such as photovoltaic and wind power, the fluctuation and uncertainty of their output bring new challenges to optimal scheduling.
Disclosure of Invention
In order to solve the problems, the invention provides an optimal scheduling method for an electric heating combined energy system, terminal equipment and a storage medium.
The specific scheme is as follows:
an optimal scheduling method of an electric heating combined energy system comprises the following steps:
s1: abstractly modeling the electric heating combined energy system as a state diagram, wherein each piece of electric power system equipment represents its node features by the electric load of the equipment and its edge features by the susceptance and conductance between the two pieces of equipment corresponding to the two nodes; each piece of thermodynamic system equipment represents its node features by the thermal load of the equipment and its edge features by the length of the pipeline branch and the pipeline mass flow rate between the two pieces of equipment corresponding to the two nodes;
s2: collecting a historical state diagram of an electric heating combined energy system to form a training set;
s3: constructing a reinforcement learning model for optimal scheduling of the electrothermal combined energy system, replacing the multi-layer perceptron network in the reinforcement learning model with a graph neural network, setting the action space, state space and return function of the reinforcement learning model, adopting the maximum entropy of the value distribution as the reinforcement learning objective, and training the reinforcement learning model on the training set;
s4: and obtaining the output result of the electric heating combined energy system through the trained reinforcement learning model.
Further, the active power output of the thermal power station, the electric and thermal power outputs of the CHP unit, the thermal power output of the heating station and the wind power absorption coefficient of the wind power station are used as the action variables in the action space of the reinforcement learning model, and the value range of each action variable is set.
Further, the system state in the state space of the reinforcement learning model is represented by node features and edge features of the graph.
Further, the return function of the reinforcement learning model is calculated as:
r_t = -F_t - Σ_i λ_i |L_i|
wherein r_t denotes the return at time t, F_t denotes the running cost at time t, i denotes the sequence number of a constraint, λ_i denotes the penalty factor corresponding to the i-th constraint, and |L_i| denotes the absolute value of the violation of the i-th constraint: when the constraint is satisfied, |L_i| is 0; when the constraint is violated, |L_i| is the minimum of the absolute differences from the boundary conditions.
Further, the electric heating combined energy system consists of a thermal power station, a heat supply station, a CHP unit and a wind power station, and the operation cost is the sum of the operation cost of the thermal power station, the operation cost of the heat supply station, the operation cost of the CHP unit and the wind discarding cost of the wind power station.
Furthermore, the graph neural network of the reinforcement learning model aggregates node information with an attention mechanism to obtain node representations.
Further, in the reinforcement learning model, the expected value of the soft Q-function return is no longer calculated directly; instead, the value distribution function of the soft Q-function return is modeled and learned based on the Bellman operator.
The invention also relates to an optimal scheduling terminal device for an electric heating combined energy system, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the steps of the method of the embodiments of the invention are implemented when the processor executes the computer program.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above for embodiments of the present invention.
According to the above technical scheme, the invention provides a reinforcement learning optimal scheduling method based on a GNN architecture. Compared with methods based on the MLP architecture, the use of system topology information brings a larger exploration space and a faster convergence speed, making the method more advantageous in the optimal scheduling of the electrothermal combined energy system.
Drawings
Fig. 1 is a flowchart of a first embodiment of the present invention.
Fig. 2 is a schematic diagram of the reinforcement learning model algorithm framework in this embodiment.
Fig. 3 is a schematic diagram of an Actor network structure in this embodiment.
Fig. 4 is a schematic diagram of the electrothermal combined energy system in this embodiment.
FIG. 5 is a graph showing the comparison of GNN and MLP in this example.
Fig. 6 is a schematic diagram showing the output result of the power system in this embodiment.
FIG. 7 is a graph showing the result of the output of the thermal power system in this embodiment.
Detailed Description
For further illustration of the various embodiments, the invention provides accompanying drawings. These drawings form a part of the disclosure of the invention; they illustrate the embodiments and, together with the description, serve to explain their principles. With reference to these materials, one of ordinary skill in the art will understand other possible embodiments and advantages of the invention.
The invention will now be further described with reference to the drawings and detailed description.
Embodiment one:
the embodiment of the invention provides an optimal scheduling method of an electric heating combined energy system, as shown in fig. 1, comprising the following steps:
s1: the electrothermal joint energy system is abstractly modeled as a state diagram.
1. Electric heating combined energy system model
The electric heating combined energy system constructed in the embodiment comprises an electric power system model, a thermodynamic system model and an electric heating system coupling link.
1. Electric power system model
The AC power flow equations of the electric power system are:
P_i = U_i Σ_{j∈N_P} U_j (G_ij cos θ_ij + B_ij sin θ_ij)
Q_i = U_i Σ_{j∈N_P} U_j (G_ij sin θ_ij - B_ij cos θ_ij)
wherein: N_P denotes the set of power system nodes; P_i and Q_i denote the active and reactive power injected at node i, respectively; U_i denotes the voltage amplitude at node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; and θ_ij = θ_i - θ_j denotes the phase angle difference between node i and node j.
2. Thermodynamic system model
Since the conduction of thermal energy requires a medium, this embodiment selects water, the most commonly used medium, and divides the thermodynamic system into a hydraulic model and a thermodynamic model.
1) Hydraulic model
The hydraulic model consists of the flow continuity equation and the loop pressure equation:
A m = m_q
B h_f = 0
h_f = K m |m|
wherein: A denotes the node-branch incidence matrix; m denotes the pipeline mass flow rate vector; m_q denotes the node injection flow vector; B denotes the loop-branch incidence matrix; h_f denotes the head loss vector; and K is the diagonal matrix of pipeline damping coefficients relating head loss to pipeline mass flow rate.
2) Thermodynamic model
The thermodynamic model comprises the node thermal power equation, the pipeline temperature drop equation and the medium mixing equation:
H_i = C_p m_{q,i} (T_i^in - T_i^out)
T_{j,i} = (T_{i,j} - T_e) e^{-λ L_ij / (C_p m_ij)} + T_e
(Σ_{k∈n_i} m_ik) T_i = Σ_{k∈n_i} m_ik T_{k,i}
wherein: N_H denotes the set of thermodynamic system nodes; H_i denotes the thermal power of node i; C_p is the specific heat capacity of water; m_{q,i} denotes the injection flow of node i; T_i^in and T_i^out are the water inlet and outlet temperatures of node i, respectively; T_{i,j} and T_{j,i} denote the water temperatures at the i end and the j end of pipeline branch ij, respectively; T_e is the external environment temperature; λ is the heat conductivity coefficient; L_ij denotes the length of pipeline branch ij; T_i denotes the water temperature at mixing node i; m_ik denotes the mass flow rate between node k and node i; and n_i denotes the set of all nodes with flow into node i, of size |n_i|.
3. Coupling link of electric heating system
For the coupling link of the electric heating combined energy system, this embodiment considers the CHP (combined heat and power) unit, which generates electricity and supplies heat simultaneously to meet the load demands of the electric power system and the thermodynamic system.
Common models of the cogeneration unit include the polygonal model and the linear model with a fixed heat-to-power ratio. This embodiment selects the extraction-condensing unit, which allows more flexible regulation, with its corresponding polygonal model: the electric output P_CHP and the thermal output H_CHP are confined to a convex polygonal feasible operating region determined by the upper and lower limits of the electric output power of the CHP unit, the upper and lower limits of the thermal output power of the CHP unit, and the polygonal region coefficients α_1, α_2, α_3.
2. Objective function
For the optimal scheduling task of the electric heating combined system, this embodiment takes as its objective minimizing the running cost while absorbing as much new energy output as possible.
1) Thermal power station operating cost
F_{1,t} = Σ_{i=1}^{|N_P|} (α_2 P_{i,t}^2 + α_1 P_{i,t} + α_0)
wherein: F_{1,t} denotes the operating cost of all thermal power stations at time t; |N_P| is the number of thermal power stations; P_{i,t} denotes the active output of thermal power station i at time t; and α_0, α_1, α_2 are the consumption characteristic curve parameters of the thermal power unit.
2) Heat supply station operating cost
F_{2,t} = Σ_{i=1}^{|N_H|} (β_2 H_{i,t}^2 + β_1 H_{i,t} + β_0)
wherein: F_{2,t} denotes the operating cost of all heating stations at time t; |N_H| is the number of heating stations; H_{i,t} denotes the thermal power output of heating station i at time t; and β_0, β_1, β_2 are the consumption characteristic curve parameters of the heating station.
3) CHP unit operating cost
F_{3,t} = Σ_{i=1}^{|N_CHP|} (μ_0 + μ_1 P_{i,t}^CHP + μ_2 (P_{i,t}^CHP)^2 + μ_3 H_{i,t}^CHP + μ_4 (H_{i,t}^CHP)^2 + μ_5 P_{i,t}^CHP H_{i,t}^CHP)
wherein: F_{3,t} denotes the operating cost of all CHP units at time t; |N_CHP| is the number of CHP units; P_{i,t}^CHP and H_{i,t}^CHP denote the electric output and thermal output of CHP unit i at time t, respectively; and μ_0 ~ μ_5 are the consumption characteristic curve parameters of the CHP unit.
4) Wind curtailment cost
F_{4,t} = C_w Σ_{i=1}^{|N_W|} (1 - α_i) P_{i,t}^W
wherein: F_{4,t} is the wind curtailment cost of all wind power stations at time t; |N_W| is the number of wind power stations; α_i is the wind power output absorption coefficient; P_{i,t}^W is the available output of wind power station i at time t, so that α_i P_{i,t}^W is the grid-connected wind power; and C_w is the wind curtailment cost coefficient.
5) Objective function
min F_t = F_{1,t} + F_{2,t} + F_{3,t} + F_{4,t}   (9)
wherein F_t denotes the total running cost of the electric heating combined energy system at time t.
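The cost terms above can be sketched in code; all coefficient values below are illustrative placeholders, not the patent's parameters:

```python
def station_cost(x, c0, c1, c2):
    # Quadratic consumption characteristic, shared by thermal power
    # stations (x = electric output) and heating stations (x = heat output)
    return c2 * x**2 + c1 * x + c0

def chp_cost(P, H, mu):
    # mu = (mu0..mu5): quadratic cost coupling electric and thermal output
    mu0, mu1, mu2, mu3, mu4, mu5 = mu
    return mu0 + mu1 * P + mu2 * P**2 + mu3 * H + mu4 * H**2 + mu5 * P * H

def curtailment_cost(P_wind, alpha, C_w):
    # The fraction (1 - alpha) of the available wind output is curtailed
    return C_w * (1.0 - alpha) * P_wind

# Total running cost F_t for one thermal station, one heating station,
# one CHP unit and one wind farm, as in formula (9):
F_t = (station_cost(10.0, 1.0, 2.0, 0.5)
       + station_cost(8.0, 0.5, 1.0, 0.2)
       + chp_cost(10.0, 5.0, (1.0, 1.0, 0.1, 1.0, 0.1, 0.05))
       + curtailment_cost(100.0, 0.9, 2.0))
```
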
3. Constraint conditions
1) Power balance constraints
P_{i,t}^G + P_{i,t}^CHP + P_{i,t}^W = P_{i,t}^L
H_{i,t}^G + H_{i,t}^CHP = H_{i,t}^L   (10)
wherein: P_{i,t}^G, P_{i,t}^CHP and P_{i,t}^W denote the electric output of the conventional unit, the CHP unit and the new energy power station of node i at time t, respectively; H_{i,t}^G and H_{i,t}^CHP denote the thermal output of the conventional unit and the CHP unit of node i at time t, respectively; and P_{i,t}^L and H_{i,t}^L are the electric load and thermal load of node i, respectively.
2) Safety constraints
The stable operation of the combined heat and power system must satisfy the necessary safety constraints: the electric power network must satisfy voltage, phase angle difference and line transmission constraints, and the thermal network must satisfy node temperature and pipeline flow constraints:
U_{i,min} ≤ U_i ≤ U_{i,max}
|θ_ij| ≤ θ_ij,max
|P_l| ≤ P_l,max
T_{i,min} ≤ T_i ≤ T_{i,max}
m_{ij,min} ≤ m_ij ≤ m_{ij,max}   (11)
wherein: U_{i,min} and U_{i,max} are the lower and upper limits of the voltage amplitude at node i; θ_ij,max is the upper limit of the phase angle difference; P_l,max is the upper limit of the transmission power of power line l; T_{i,min} and T_{i,max} are the lower and upper limits of the water supply temperature of node i; and m_{ij,min} and m_{ij,max} are the lower and upper limits of the water supply flow rate of pipeline ij.
4. State diagram
Since the electric and thermal networks have a natural graph structure, the system can be abstractly modeled, without considering internal device information, as a graph G(V, E) of nodes and edges, where V represents the nodes in the system and E represents the edges in the system.
S2: and acquiring a historical state diagram of the electric heating combined energy system to form a training set.
S3: constructing a reinforcement learning model of optimal scheduling of an electrothermal combined energy system, setting a fully connected neural network in the reinforcement learning model to be changed into a graph neural network, setting an action space, a state space and a return function of the reinforcement learning model, adopting a maximum entropy of value distribution as an algorithm target of reinforcement learning, and training the reinforcement learning model through a training set.
Reinforcement learning obtains the optimal solution of a problem in the current environment through exploration by an agent. The agent observes the current state s and outputs an action a; the action acts on the environment, which returns a corresponding reward r. The agent learns its network parameters from the fed-back return values and continuously adjusts its output policy to obtain the maximum cumulative return:
G = Σ_{t=0}^{∞} γ^t r_t
wherein: G is the cumulative return value and γ ∈ [0,1] is the discount rate, which adjusts the agent's weighting of short-term versus long-term returns.
The Actor-Critic algorithm is a reinforcement learning method combining policy gradients with temporal-difference learning; its basic architecture is shown in fig. 2. The Actor is the policy network π_θ(a|s), which learns a policy that obtains as high a return as possible; the Critic is the value network V_φ(s), which estimates the current policy and outputs an evaluation value. The Actor-Critic algorithm can therefore update its parameters at every step, without waiting for the end of an episode. In this algorithm framework, the parameters θ of the policy network π_θ(a|s) and φ of the value network V_φ(s) are learned during training. In each update step, the Actor outputs an action a_t according to the current environment state s_t and receives the immediate return r(s_t, a_t, s_{t+1}). The Critic adjusts its scoring criterion by comparing the true return value given by the environment with the score r + γV_φ(s_{t+1}) under its previous criterion, so that its score comes closer to the actual return of the environment. The Actor then adjusts its own policy π_θ according to the Critic's score.
The value-distribution maximum entropy Actor-Critic objective adopted in this embodiment is:
J(π) = Σ_t E[ r(s_t, a_t) + α H(π(·|s_t)) ]
wherein H(π(·|s)) represents the entropy of the policy π(a|s) in state s, and α weights the entropy term. Compared with the plain Actor-Critic algorithm, the purpose of the added entropy term is to randomize the policy: the probability mass of the output actions is dispersed as much as possible instead of being concentrated on one action. This guarantees the randomness of policy learning, keeps the exploration range as large as possible, and avoids falling into a local optimum.
The goal of policy improvement is to find a new policy π_new better than the current policy, so that the expected return becomes larger; the policy network updates by maximizing the soft Q value:
π_new = arg max_π E_{a~π} [ Q(s, a) - α log π(a|s) ]
To avoid the overestimation of the Q value during learning, which would degrade policy performance, the algorithm no longer directly calculates the expected value Q^π(s, a) of the soft return Z^π(s, a), but instead models the distribution of the soft return Z^π(s, a). This distribution is called the value distribution function, and the soft return Z^π(s, a) is learned based on the Bellman operator:
Z^π(s, a) =_D R(s, a) + γ Z^π(s_{t+1}, a_{t+1})
wherein R ~ R(·|s, a), s_{t+1} ~ p, a_{t+1} ~ π, and the symbol =_D indicates that the random variables on the left and right ends have the same probability distribution. Letting the parameterized return Z_θ(s, a) obey the modeled distribution, the parameters are updated by minimizing the distribution distance between Z_θ(s, a) and its Bellman target R(s, a) + γ Z_θ(s_{t+1}, a_{t+1}), where the distance function d measuring the two distributions is commonly the KL divergence.
Compared with an MLP (multi-layer perceptron), which does not use topology information, a graph neural network model can pass information between nodes based on their connection relations. To make better use of the information in the graph, this embodiment adopts an attention mechanism to aggregate node information into node representations:
h_i^(k) = Σ_{j∈N(i)} α_{i,j} W h_j^(k-1)
wherein h_i^(k) denotes the vector representation of node i in the k-th neural network layer, W denotes the neural network parameter matrix that linearly transforms the node features, N(i) denotes the neighborhood nodes of node i, and α_{i,j} is the attention coefficient:
α_{i,j} = softmax_j( GELU( a^T [ W h_i ‖ W h_j ‖ W_e e_{i,j} ] ) )
wherein the vector a is the parameter vector of the attention network, W_e is the parameter matrix that linearly transforms the edge information, e_{i,j} is the feature vector of the edge, GELU is the activation function, and ‖ is the vector concatenation operator.
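A minimal numpy sketch of this attention-weighted aggregation. The feature sizes and the tanh approximation of GELU are assumptions for illustration; the patent's actual layer dimensions are not specified here:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def attention_aggregate(h_i, neighbors, edges, W, W_e, a):
    # Score each neighbor j from the concatenation [W h_i || W h_j || W_e e_ij],
    # softmax the scores over the neighborhood, then return the
    # attention-weighted sum of the transformed neighbor features.
    scores = np.array([
        gelu(a @ np.concatenate([W @ h_i, W @ h_j, W_e @ e_ij]))
        for h_j, e_ij in zip(neighbors, edges)
    ])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # softmax over neighbors
    return sum(al * (W @ h_j) for al, h_j in zip(alpha, neighbors))

# Two identical neighbors: attention weights are equal, so the output
# equals the transformed neighbor feature itself.
out = attention_aggregate(
    np.array([1.0, 0.0]),
    [np.array([0.0, 1.0]), np.array([0.0, 1.0])],
    [np.array([1.0]), np.array([1.0])],
    np.eye(2), np.array([[1.0], [1.0]]), np.ones(6))
```

Because the attention weights form a convex combination, the aggregated representation always lies in the convex hull of the transformed neighbor features.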
Solving the optimal scheduling strategy of the electrothermal combined energy system requires defining the problem's action space, state space and return function.
1) Action space
The active power output of the thermal power station, the electric and thermal power outputs of the CHP unit, the thermal power output of the heat supply station and the absorption coefficient of the wind power station are taken as the action variables. The corresponding action ranges are:
P_{i,min}^G ≤ P_i^G ≤ P_{i,max}^G
P_{i,min}^CHP ≤ P_i^CHP ≤ P_{i,max}^CHP
H_{i,min}^CHP ≤ H_i^CHP ≤ H_{i,max}^CHP
H_{i,min}^G ≤ H_i^G ≤ H_{i,max}^G
wherein: P_{i,min}^G and P_{i,max}^G are the lower and upper limits of the active power output of thermal power station i; P_{i,min}^CHP and P_{i,max}^CHP are the lower and upper limits of the electric power output of CHP unit i; H_{i,min}^CHP and H_{i,max}^CHP are the lower and upper limits of the thermal power output of CHP unit i; and H_{i,min}^G and H_{i,max}^G are the lower and upper limits of the thermal power output of heat supply station i.
2) State space
Since the system is modeled as graph G (V, E), the system state is reflected by node features and edge features. The equipment of the electrothermal combined energy system is divided into electric power system equipment and thermodynamic system equipment.
The node characteristics and the edge characteristics of the power system are respectively as follows:
wherein: p (P) i L Is the electrical load of node i.
The node characteristics and the edge characteristics of the thermodynamic system are respectively as follows:
wherein:is the thermal load of node i. L (L) ij Representing the length, m, of a pipe branch ij ij Representing the pipe mass flow rate between node i and node j.
3) Rewarding function
The return value combines the system running cost and penalties for violated constraints; since the scheduling objective is to minimize running cost while reinforcement learning maximizes return, the cost must enter with a negative sign:
where: r_t represents the return value at time t, F_t is the running cost at time t given by formula (9), λ_i is the penalty factor corresponding to the i-th constraint, i indexes the constraints listed in formulas (10) and (11), and |L_i| measures the violation of the i-th constraint. When the constraint is satisfied, |L_i| is 0. When the constraint is violated, |L_i| is the minimum of the absolute differences from the boundary conditions: for an equality constraint as in formula (10), |L_i| is the absolute value of the difference between the two sides of the equation; for an inequality constraint as in formula (11) with both upper and lower bounds (a < X < b), |L_i| = |X − a| when |X − a| < |X − b| and |L_i| = |X − b| otherwise; if only an upper or only a lower bound exists, |L_i| is the absolute difference from that bound.
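The reward rule above can be sketched as follows; `violation` and `reward` are hypothetical helper names, and the penalty factors λ_i are passed in as a list:

```python
def violation(value, lower=None, upper=None, target=None):
    """|L_i|: 0 when the constraint holds, otherwise the distance to the
    nearest violated boundary (equality constraints: |lhs - rhs|)."""
    if target is not None:                  # equality constraint, formula (10)
        return abs(value - target)
    if lower is not None and value < lower: # below lower bound
        return lower - value
    if upper is not None and value > upper: # above upper bound
        return value - upper
    return 0.0                              # constraint satisfied

def reward(running_cost, violations, penalties):
    """r_t = -(F_t + sum_i lambda_i * |L_i|): negated because RL maximizes."""
    return -(running_cost + sum(lam * v for lam, v in zip(penalties, violations)))
```

For example, a dispatch value of 12 against the bounds (0, 10) contributes a violation of 2, scaled by its penalty factor and subtracted from the return.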
The network architecture of the Actor in this embodiment is shown in fig. 3. The input to the model is the state graph G(V_t, E_t). After k graph neural network layers, each using the GELU activation function, the representation h_{t,k} is obtained. The network outputs the mean μ of each action and the logarithm of the variance, ln σ; exponentiating ln σ yields the normal distribution N(μ, σ²). After sampling and adding noise, the Tanh layer maps the values into (−1, 1), which are finally scaled to the action ranges listed in formula (21) to obtain the actual dispatch output values.
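The Actor's sampling pipeline (Gaussian sample → Tanh squash → affine map into the action range) can be sketched as below; the bounds used in the example are hypothetical, standing in for the limits of formula (21):

```python
import numpy as np

def sample_action(mu, log_sigma, low, high, rng):
    """Sample from N(mu, sigma^2), squash through Tanh into (-1, 1),
    then affinely map into the dispatch range [low, high]."""
    sigma = np.exp(log_sigma)                            # ln(sigma) -> sigma
    z = mu + sigma * rng.standard_normal(np.shape(mu))   # reparameterized sample
    squashed = np.tanh(z)                                # values in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)   # map to action range

rng = np.random.default_rng(0)
# hypothetical bounds, e.g. active power limits of two dispatchable units
low, high = np.array([0.0, 10.0]), np.array([50.0, 80.0])
action = sample_action(np.zeros(2), np.zeros(2), low, high, rng)
```

Because Tanh is strictly bounded, every sampled action is guaranteed to respect the configured dispatch limits.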
Learning of the value network (Critic) requires incorporating the output actions; the node features of the power system and of the thermodynamic system are then augmented as follows:
The graph G(V_t′, E_t′) is then fed into the value network to obtain the soft Q value.
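A minimal sketch of this state-action augmentation, assuming each dispatchable device's action is appended to its own node's feature row (the concrete column layout is an assumption, not specified in the text):

```python
import numpy as np

def augment_with_actions(node_feat, actions):
    """Append each dispatchable device's action to its node feature row;
    non-dispatchable nodes receive a zero in the action slot."""
    col = np.zeros((node_feat.shape[0], 1))
    for node_idx, a in actions.items():
        col[node_idx, 0] = a
    return np.hstack([node_feat, col])    # V_t' = [V_t | actions]

# toy example: three nodes, the first hosting a dispatchable unit
vt = np.ones((3, 2))
out = augment_with_actions(vt, {0: 0.7})
```

The augmented node features, together with the unchanged edge features, form the graph G(V_t′, E_t′) fed to the Critic.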
S4: the dispatch output result of the electric heating combined energy system is obtained through the trained reinforcement learning model.
Experimental verification analysis
This embodiment carries out the computational analysis on the electric heating combined energy system shown in fig. 4, which consists of an IEEE-33 bus power grid and a 32-node Barry Island heat supply network, in order to analyze the optimization effect of reinforcement learning. G1 and G2 are two thermal power stations; W is a wind power station; GB1 and GB2 are two heat supply stations; CHP is a cogeneration unit.
The comparison test in this embodiment is carried out with the same number of network layers and neurons. The Actor network has 4 layers, with 128, 64, 32 and 32 neurons respectively. The Critic network has 5 layers, with 128 neurons in its first layer; each layer uses the GELU activation function. The experience pool size is 500000. The learning rate is adjusted automatically by an Adam optimizer within the range 5×10⁻⁴ to 5×10⁻⁶.
As can be seen from fig. 5, the return curve of the GNN-based architecture converges after 3000 training rounds; compared with the MLP-based architecture, it begins to rise earlier, converges faster and reaches a larger return value, i.e. a lower running cost. This indicates that the GNN-based algorithm model exploits the edge information, which yields a larger exploration space and a faster training speed.
After training is completed, the policy network produces the system output actions from the system load; the 24-hour scheduling results are shown in figs. 6 and 7.
As shown by the per-period power system outputs in fig. 6, the total output closely tracks the load curve. Thermal power units 1 and 2, having the larger installed capacities, carry most of the generation task and ramp up slowly during the daytime consumption peak, meeting the growth in actual demand. The load gap during the peak period is covered by the CHP unit, which runs at its lowest output power in the remaining periods. At night, wind generation increases and the corresponding grid-connected power rises accordingly; the wind power consumption coefficient stays at about 97% throughout. The thermodynamic outputs in fig. 7 likewise track the load curve, ramping slowly to meet the night-time load peak. Because of the output upper limits, the differences between heat source outputs are small, which effectively reduces thermal losses during transmission.
The embodiment of the invention provides a value-distribution maximum-entropy Actor-Critic reinforcement learning optimal scheduling method based on a GNN architecture, which makes full use of the topological structure information of the system and achieves more effective exploratory learning. Compared with an MLP-based method, exploiting the system topology information yields a larger exploration space and a faster convergence speed, making the method more advantageous for the optimal scheduling of the electrothermal combined energy system.
Embodiment two:
The invention also provides an electric heating combined energy system optimal scheduling terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method of embodiment one of the invention are implemented when the processor executes the computer program.
Further, as an executable scheme, the electric heating combined energy system optimal scheduling terminal device may be a computing device such as a desktop computer, a notebook computer, a palm computer or a cloud server. The electric heating combined energy system optimal scheduling terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the above composition is merely an example and does not constitute a limitation of the terminal device, which may include more or fewer components than those listed, combine some components, or use different components; for example, the terminal device may further include an input/output device, a network access device and a bus, which the embodiment of the present invention does not limit.
Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The general processor can be a microprocessor or any conventional processor, and the processor is a control center of the electric heating combined energy system optimal scheduling terminal device, and various interfaces and lines are used for connecting various parts of the whole electric heating combined energy system optimal scheduling terminal device.
The memory may be used to store the computer program and/or the modules, and the processor implements the various functions of the electric heating combined energy system optimal scheduling terminal device by running or executing the computer program and/or the modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and at least one application program required for a function, while the data storage area may store data created according to the use of the terminal device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the above-described method of an embodiment of the present invention.
If the modules/units integrated in the electric heating combined energy system optimal scheduling terminal device are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing the relevant hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and so forth.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. An optimal scheduling method for an electric heating combined energy system is characterized by comprising the following steps:
s1: abstract modeling an electric heating combined energy system into a state diagram, wherein the electric power system equipment represents node characteristics through the electric load of the equipment, and represents edge characteristics through susceptance and conductance between two pieces of equipment corresponding to two nodes; the thermodynamic system equipment represents node characteristics through the thermal load of the equipment, and represents edge characteristics through the length of a pipeline branch and the pipeline mass flow rate between two corresponding equipment of two nodes;
s2: collecting a historical state diagram of an electric heating combined energy system to form a training set;
s3: constructing a reinforcement learning model of optimal scheduling of an electrothermal combined energy system, setting a multi-layer perceptron network in the reinforcement learning model to be changed into a graph neural network, setting an action space, a state space and a return function of the reinforcement learning model, adopting a maximum entropy of value distribution as an algorithm target of reinforcement learning, and training the reinforcement learning model through a training set;
s4: and obtaining the output result of the electric heating combined energy system through the trained reinforcement learning model.
2. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: and taking the active power output of the thermal power station, the electric power output of the CHP unit, the thermal power output of the heat supply station and the absorption coefficient of the wind power station as action variables in the action space of the reinforcement learning model, and setting the value range of each action variable.
3. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: the state of the system in the state space of the reinforcement learning model is represented by node features and edge features of the graph.
4. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: the calculation formula of the return function of the reinforcement learning model is as follows:
where r_t denotes the return value at time t, F_t the running cost at time t, i the sequence number of the constraint, λ_i the penalty factor corresponding to the i-th constraint, and |L_i| the violation magnitude of the i-th constraint: when the constraint is satisfied, |L_i| is 0; when the constraint is violated, |L_i| is the minimum of the absolute differences from the boundary conditions.
5. The optimal scheduling method for the electric heating combined energy system according to claim 4, wherein the optimal scheduling method is characterized by comprising the following steps of: the electric heating combined energy system consists of a thermal power station, a heat supply station, a CHP unit and a wind power station, and the operation cost is the sum of the operation cost of the thermal power station, the operation cost of the heat supply station, the operation cost of the CHP unit and the wind discarding cost of the wind power station.
6. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: node information is aggregated with an attention mechanism in the graph neural network of the reinforcement learning model to obtain the node representations.
7. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: in the reinforcement learning model, the expected value of the soft Q function return is not directly calculated any more, but a value distribution function of the soft Q function return is modeled, and the soft Q function return is learned based on the Bellman operator.
8. An electric heating combined energy system optimal scheduling terminal device is characterized in that: comprising a processor, a memory and a computer program stored in the memory and running on the processor, which processor, when executing the computer program, carries out the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium storing a computer program, characterized in that: the computer program implementing the steps of the method according to any one of claims 1 to 7 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310328399.5A CN116362504A (en) | 2023-03-30 | 2023-03-30 | Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116362504A true CN116362504A (en) | 2023-06-30 |
Family
ID=86914770
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117649102A (en) * | 2024-01-30 | 2024-03-05 | 大连理工大学 | Optimal scheduling method of multi-energy flow system in steel industry based on maximum entropy reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |