CN116362504A - Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium - Google Patents


Info

Publication number
CN116362504A
CN116362504A (application CN202310328399.5A)
Authority
CN
China
Prior art keywords
energy system
combined energy
electric heating
reinforcement learning
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310328399.5A
Other languages
Chinese (zh)
Inventor
王新
张志宏
任晓龙
司恒斌
陈曦
田双
王嘉
梁飞
张宝月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shaanxi Electric Power Co Ltd Information And Communication Co
Original Assignee
State Grid Shaanxi Electric Power Co Ltd Information And Communication Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shaanxi Electric Power Co Ltd Information And Communication Co
Priority to CN202310328399.5A
Publication of CN116362504A
Legal status: Pending

Classifications

    • G06Q 10/0631 — Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06F 17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/092 — Reinforcement learning
    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 50/06 — Energy or water supply
    • Y04S 10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention relates to an optimal scheduling method for an electric-thermal combined energy system, together with terminal equipment and a storage medium. The method comprises the following steps: abstractly model the electric-thermal combined energy system as a state graph; collect historical state graphs of the system to form a training set; construct a reinforcement learning model for optimal scheduling of the system, replace the multi-layer perceptron network in the reinforcement learning model with a graph neural network, set the action space, state space and reward function of the model, adopt the maximum entropy of the value distribution as the reinforcement learning objective, and train the model on the training set; obtain the dispatch output of the system from the trained reinforcement learning model. Compared with methods based on the MLP architecture, the use of system topology information gives the proposed method a larger exploration space and faster convergence, making it more advantageous for the optimal scheduling of electric-thermal combined energy systems.

Description

Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium
Technical Field
The invention relates to the field of energy optimal scheduling, and in particular to an optimal scheduling method, terminal equipment and a storage medium for an electric-thermal combined energy system.
Background
As the contradiction between society's growing energy demand and the need for energy conservation and emission reduction becomes increasingly acute, how to make full use of new energy sources, reduce the use of traditional energy, and thereby lower operating costs and emissions has become an urgent problem.
The development of the energy Internet provides the basis for complementary conversion among multiple energy flows and for full utilization of energy; the scheduling and coupling of these flows are key to the efficient operation of an integrated energy system. Because of the nonlinear constraints of the energy system, multi-energy-flow optimal scheduling is a non-convex optimization problem for which a globally optimal solution is difficult to obtain. Traditional work on this problem has focused on approximate and nonlinear solution methods, and intelligent algorithms such as particle swarm optimization have also appeared. However, their high complexity, and the need to re-solve whenever the system state changes, make it difficult for these approaches to respond quickly to large-scale problems. With the spread of renewable energy sources such as photovoltaics and wind power, the fluctuation and uncertainty of renewable output pose further challenges for optimal scheduling.
Disclosure of Invention
To solve the above problems, the invention provides an optimal scheduling method for an electric-thermal combined energy system, together with terminal equipment and a storage medium.
The specific scheme is as follows:
an optimal scheduling method for an electric-thermal combined energy system, comprising the following steps:
s1: abstractly modeling the electric-thermal combined energy system as a state graph, wherein power-system equipment represents node features by the electrical load of the equipment and edge features by the susceptance and conductance between the two pieces of equipment corresponding to two nodes; thermodynamic-system equipment represents node features by the heat load of the equipment and edge features by the length of the pipe branch and the pipe mass flow rate between the two pieces of equipment corresponding to two nodes;
s2: collecting historical state graphs of the electric-thermal combined energy system to form a training set;
s3: constructing a reinforcement learning model for optimal scheduling of the electric-thermal combined energy system, replacing the multi-layer perceptron network in the reinforcement learning model with a graph neural network, setting the action space, state space and reward function of the model, adopting the maximum entropy of the value distribution as the reinforcement learning objective, and training the model on the training set;
s4: obtaining the dispatch output of the electric-thermal combined energy system from the trained reinforcement learning model.
Further, the active power output of the thermal power station, the electric power output of the CHP unit, the thermal power output of the heating station and the absorption coefficient of the wind power station are used as action variables in the action space of the reinforcement learning model, and the value range of each action variable is set.
Further, the system state in the state space of the reinforcement learning model is represented by node features and edge features of the graph.
Further, the reward function of the reinforcement learning model is calculated as:

$$r_t = -\Big(F_t + \sum_i \lambda_i\,|L_i|\Big)$$

wherein $r_t$ denotes the reward at time t, $F_t$ the running cost at time t, i the index of the constraint, $\lambda_i$ the penalty factor of the i-th constraint, and $|L_i|$ the violation magnitude of the i-th constraint: $|L_i| = 0$ when the constraint is satisfied; when the constraint is violated, $|L_i|$ is the minimum of the absolute differences from the boundary conditions.
Further, the electric-thermal combined energy system consists of thermal power stations, heating stations, CHP units and wind power stations, and the operating cost is the sum of the operating costs of the thermal power stations, heating stations and CHP units plus the wind-curtailment cost of the wind power stations.
Furthermore, in the graph neural network of the reinforcement learning model, node information is aggregated using an attention mechanism to obtain node representations.
Further, the reinforcement learning model no longer directly computes the expected value of the soft-Q return; instead it models the value-distribution function of the soft return and learns the soft return based on the Bellman operator.
The invention also relates to optimal scheduling terminal equipment for an electric-thermal combined energy system, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to the above embodiments of the invention.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above method according to the embodiments of the invention.
According to the above technical scheme, the invention provides a reinforcement-learning optimal scheduling method based on a GNN architecture. Compared with methods based on the MLP architecture, its use of system topology information yields a larger exploration space and faster convergence, giving it an advantage in the optimal scheduling of electric-thermal combined energy systems.
Drawings
Fig. 1 is a flowchart of a first embodiment of the present invention.
Fig. 2 is a schematic diagram of the reinforcement learning model algorithm framework in this embodiment.
Fig. 3 is a schematic diagram of an Actor network structure in this embodiment.
Fig. 4 is a schematic diagram of the electrothermal combined energy system in this embodiment.
FIG. 5 is a graph showing the comparison of GNN and MLP in this example.
Fig. 6 is a schematic diagram showing the output result of the power system in this embodiment.
FIG. 7 is a graph showing the output result of the thermodynamic system in this embodiment.
Detailed Description
To further illustrate the various embodiments, the invention is accompanied by drawings. The accompanying drawings, which form part of this disclosure, illustrate the embodiments and, together with the description, explain their principles. With reference to them, one of ordinary skill in the art will understand other possible embodiments and advantages of the present invention.
The invention will now be further described with reference to the drawings and detailed description.
Embodiment one:
The embodiment of the invention provides an optimal scheduling method for an electric-thermal combined energy system which, as shown in fig. 1, comprises the following steps:
S1: abstractly model the electric-thermal combined energy system as a state graph.
1. Electric heating combined energy system model
The electric heating combined energy system constructed in the embodiment comprises an electric power system model, a thermodynamic system model and an electric heating system coupling link.
1. Electric power system model
The AC power-flow equations of the power system are:

$$P_i = U_i \sum_{j \in N_P} U_j\big(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\big), \qquad Q_i = U_i \sum_{j \in N_P} U_j\big(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\big) \tag{1}$$

wherein $N_P$ denotes the set of power-system nodes; $P_i, Q_i$ the active and reactive power injected at node i; $U_i$ the voltage magnitude at node i; $G_{ij}$ the conductance and $B_{ij}$ the susceptance between nodes i and j; and $\theta_{ij} = \theta_i - \theta_j$ the phase-angle difference between nodes i and j.
2. Thermodynamic system model
Since the conduction of thermal energy requires a medium, water, the most commonly used medium, is selected in this embodiment, and the thermodynamic system is divided into a hydraulic model and a thermal model.
1) Hydraulic model
The hydraulic model consists of a flow-continuity equation and a loop pressure equation:

$$A\,m = m_q, \qquad B\,h_f = 0 \tag{2}$$

wherein A denotes the node–branch incidence matrix, m the pipe mass-flow-rate vector, $m_q$ the nodal injection-flow vector, B the loop–branch incidence matrix, and $h_f$ the head-loss vector, which relates the pipe resistance coefficient to the pipe mass flow rate.
2) Thermodynamic model
The thermal model comprises a node power equation, a pipe temperature-drop equation and a medium mixing equation:

$$H_i = C_p\,m_{q,i}\,\big(T_i^{in} - T_i^{out}\big), \qquad T_{j,i} = \big(T_{i,j} - T_e\big)\,e^{-\lambda L_{ij}/(C_p m_{ij})} + T_e, \qquad \Big(\sum_{k \in N_i} m_{ik}\Big) T_i = \sum_{k \in N_i} m_{ik}\,T_{k,i} \tag{3}$$

wherein $N_H$ denotes the set of thermodynamic-system nodes; $H_i$ the thermal power of node i; $C_p$ the specific heat capacity of water; $m_{q,i}$ the injection flow of node i; $T_i^{in}, T_i^{out}$ the water inlet and outlet temperatures of node i; $T_{j,i}$ the water temperature at the j-end and $T_{i,j}$ the water temperature at the i-end of pipe branch ij; $T_e$ the ambient temperature; $\lambda$ the heat-conductivity coefficient; $L_{ij}$ the length of pipe branch ij; $T_i$ the water temperature of mixing node i; $m_{ik}$ the mass flow rate between nodes k and i; and $|N_i|$ the total number of nodes whose flow enters node i.
3. Coupling link of electric heating system
For the coupling link of the electric-thermal combined energy system, this embodiment considers a combined heat and power (CHP) unit that generates electricity and supplies heat simultaneously, meeting the load demands of both the power system and the thermodynamic system.
Common models of cogeneration units include the polygon model and the linear model with a fixed heat-to-power ratio. This embodiment selects the more flexibly adjustable extraction-condensing unit, whose feasible operating region is described by the polygon model:

$$\underline{P}_{CHP} \le P_{CHP} \le \overline{P}_{CHP}, \qquad \underline{H}_{CHP} \le H_{CHP} \le \overline{H}_{CHP} \tag{4}$$

together with the linear boundary constraints of the polygonal region determined by the coefficients $\alpha_1, \alpha_2, \alpha_3$. Wherein $P_{CHP}, H_{CHP}$ denote the electric and thermal output of the CHP unit; $\overline{P}_{CHP}, \underline{P}_{CHP}$ the upper and lower limits of its electric output; $\overline{H}_{CHP}, \underline{H}_{CHP}$ the upper and lower limits of its thermal output; and $\alpha_1, \alpha_2, \alpha_3$ the polygon-region coefficients.
2. Objective function
For the optimal scheduling task of the electric-thermal combined system, this embodiment aims to minimize the running cost while absorbing as much new-energy output as possible.
1) Thermal power station operating cost

$$F_{1,t} = \sum_{i=1}^{|N_P|} \big(\alpha_2 P_{i,t}^2 + \alpha_1 P_{i,t} + \alpha_0\big) \tag{5}$$

wherein $F_{1,t}$ denotes the operating cost of all thermal power stations at time t, $|N_P|$ the number of thermal power stations, $P_{i,t}$ the active output of thermal power station i at time t, and $\alpha_0, \alpha_1, \alpha_2$ the consumption-characteristic parameters of the thermal units.
2) Heating station operating cost

$$F_{2,t} = \sum_{i=1}^{|N_H|} \big(\beta_2 H_{i,t}^2 + \beta_1 H_{i,t} + \beta_0\big) \tag{6}$$

wherein $F_{2,t}$ denotes the operating cost of all heating stations at time t, $|N_H|$ the number of heating stations, $H_{i,t}$ the thermal output of heating station i at time t, and $\beta_0, \beta_1, \beta_2$ the consumption-characteristic parameters of the heating stations.
3) CHP unit operating cost

$$F_{3,t} = \sum_{i=1}^{|N_{CHP}|} \Big(\mu_0 + \mu_1 P^{CHP}_{i,t} + \mu_2 \big(P^{CHP}_{i,t}\big)^2 + \mu_3 H^{CHP}_{i,t} + \mu_4 \big(H^{CHP}_{i,t}\big)^2 + \mu_5 P^{CHP}_{i,t} H^{CHP}_{i,t}\Big) \tag{7}$$

wherein $F_{3,t}$ denotes the operating cost of all CHP units at time t, $|N_{CHP}|$ the number of CHP units, $P^{CHP}_{i,t}, H^{CHP}_{i,t}$ the electric and thermal output of CHP unit i at time t, and $\mu_0$–$\mu_5$ the consumption-characteristic parameters of the CHP units.
4) Wind-curtailment cost

$$F_{4,t} = \sum_{i=1}^{|N_W|} C_w\,(1-\alpha_i)\,\bar{P}^{W}_{i,t} \tag{8}$$

wherein $F_{4,t}$ denotes the wind-curtailment cost of all wind power stations at time t, $|N_W|$ the number of wind farms, $\alpha_i$ the wind-power absorption coefficient, $\bar{P}^{W}_{i,t}$ the available output of wind power station i at time t, the corresponding $P^{W}_{i,t} = \alpha_i \bar{P}^{W}_{i,t}$ the grid-connected wind power, and $C_w$ the curtailment cost coefficient.
5) Objective function

$$\min F_t = F_{1,t} + F_{2,t} + F_{3,t} + F_{4,t} \tag{9}$$

wherein $F_t$ denotes the total running cost of the electric-thermal combined energy system at time t.
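The cost terms of formula (9) can be sketched as simple helpers (illustrative names and values only; the quadratic characteristics and curtailment penalty follow the formulas above, but the coefficients are hypothetical):

```python
def station_cost(output, c0, c1, c2):
    """Quadratic consumption characteristic used for thermal and heating
    stations: c2 * x^2 + c1 * x + c0."""
    return c2 * output * output + c1 * output + c0

def curtailment_cost(alpha, p_avail, c_w):
    """Wind curtailment: the unabsorbed share (1 - alpha) of the available
    output p_avail is penalised at cost coefficient c_w."""
    return c_w * (1.0 - alpha) * p_avail

def total_cost(costs):
    """Objective F_t: the sum of all stations' operating/curtailment costs."""
    return sum(costs)
```

Note that full absorption (alpha = 1) yields zero curtailment cost, which is how the objective rewards absorbing as much new-energy output as possible.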
3. Constraint conditions
1) Power balance constraints

$$P^{G}_{i,t} + P^{CHP}_{i,t} + P^{W}_{i,t} = P^{L}_{i,t}, \qquad H^{G}_{i,t} + H^{CHP}_{i,t} = H^{L}_{i,t} \tag{10}$$

wherein $P^{G}_{i,t}, P^{CHP}_{i,t}, P^{W}_{i,t}$ denote the electric output of the conventional unit, the CHP unit and the new-energy power station at node i at time t; $H^{G}_{i,t}, H^{CHP}_{i,t}$ the thermal output of the conventional unit and the CHP unit at node i at time t; and $P^{L}_{i,t}, H^{L}_{i,t}$ the electric and thermal load, respectively.
2) Security constraints
Stable operation of the combined heat and power system must satisfy the necessary security constraints: the electric network must satisfy voltage, phase-angle-difference and line-transmission constraints, and the heat network must satisfy node-temperature and pipe-flow constraints:

$$U_{i,min} \le U_i \le U_{i,max}, \quad |\theta_{ij}| \le \bar{\theta}_{ij}, \quad |P_l| \le \bar{P}_l, \quad T_{i,min} \le T_i \le T_{i,max}, \quad m_{ij,min} \le m_{ij} \le m_{ij,max} \tag{11}$$

wherein $U_{i,max}, U_{i,min}$ denote the upper and lower limits of the voltage magnitude at node i, $\bar{\theta}_{ij}$ the upper limit of the phase-angle difference, $\bar{P}_l$ the transmission-power limit of the power line, $T_{i,max}, T_{i,min}$ the upper and lower limits of the supply temperature at node i, and $m_{ij,max}, m_{ij,min}$ the upper and lower limits of the supply flow rate of pipe ij.
4. State graph
Since the electric and thermal networks have a natural graph structure, they can be abstractly modeled, without considering internal device information, as a graph G(V, E) of nodes and edges, where V denotes the nodes of the system and E its edges.
S2: collect historical state graphs of the electric-thermal combined energy system to form a training set.
S3: construct a reinforcement learning model for optimal scheduling of the electric-thermal combined energy system, replace the fully connected neural network in the model with a graph neural network, set the action space, state space and reward function of the model, adopt the maximum entropy of the value distribution as the reinforcement learning objective, and train the model on the training set.
In reinforcement learning, an agent obtains the optimal solution of the problem in the current environment through exploration. The agent observes the current state s and outputs an action a; the action acts on the environment, which returns a corresponding reward r. The agent learns its network parameters from this feedback and continuously adjusts its output policy to maximize the cumulative return:

$$G = \sum_{t=0}^{\infty} \gamma^t r_t$$

wherein G is the cumulative return and $\gamma \in [0,1]$ is the discount rate, which adjusts the agent's weighting of short-term versus long-term rewards.
The Actor-Critic algorithm is a reinforcement learning method combining policy gradients with temporal-difference learning; its basic architecture is shown in fig. 2. The Actor is the policy network $\pi_\theta(a|s)$, which learns a policy that obtains as high a return as possible; the Critic is the value network $V_\phi(s)$, which evaluates the current policy and outputs an evaluation value. The Actor-Critic algorithm can therefore update its parameters at every step, without waiting for the end of an episode. In this framework the parameters $\theta$ of the policy network $\pi_\theta(a|s)$ and $\phi$ of the value network $V_\phi(s)$ are the functions to be learned during training. At each update step, the Actor outputs an action $a_t$ according to the current environment state $s_t$ and receives the immediate reward $r(s_t, a_t, s_{t+1})$. The Critic adjusts its scoring criterion using the true return given by the environment and the score $r + \gamma V_\phi(s_{t+1})$ under its previous criterion, so that its score moves closer to the actual return of the environment; the Actor then adjusts its own policy $\pi_\theta$ according to the Critic's score.
The value-distribution maximum-entropy Actor-Critic objective adopted in this embodiment is:

$$\pi^* = \arg\max_\pi \sum_t \mathbb{E}_{(s_t,a_t)}\big[r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot|s_t)\big)\big]$$

wherein $\mathcal{H}(\pi(\cdot|s)) = -\mathbb{E}_{a\sim\pi}[\log\pi(a|s)]$ denotes the entropy of the policy $\pi(a|s)$ in state s, weighted by a temperature coefficient $\alpha$. Compared with the plain Actor-Critic algorithm, the purpose of the entropy term is to randomize the policy: the output probability is spread over actions as far as possible rather than concentrated on a single action. This guarantees the randomness of policy learning, enlarges the exploration range as much as possible, and avoids falling into a locally optimal solution.
To evaluate the policy $\pi$, a soft Q function is defined and learned with the Bellman operator:

$$\mathcal{T}^{\pi} Q(s_t,a_t) = r(s_t,a_t) + \gamma\,\mathbb{E}_{s_{t+1}}\big[V(s_{t+1})\big], \qquad V(s) = \mathbb{E}_{a\sim\pi}\big[Q(s,a) - \alpha\log\pi(a|s)\big]$$

The goal of policy improvement is to find a new policy $\pi_{new}$, better than the current one, with a larger expected return; the policy network is updated by maximizing the soft Q value:

$$\pi_{new} = \arg\max_{\pi}\ \mathbb{E}_{a\sim\pi}\big[Q^{\pi_{old}}(s,a) - \alpha\log\pi(a|s)\big]$$
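For a discrete action set, the soft state value (expected Q plus the entropy bonus) can be sketched as follows (an illustrative helper under the maximum-entropy objective; the function name is an assumption):

```python
import math

def soft_state_value(q_values, probs, alpha):
    """Soft state value under a discrete policy:
    V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)),
    i.e. the expected Q value plus alpha times the policy entropy."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, probs) if p > 0.0)
```

With equal Q values, the soft value reduces to alpha times the policy entropy, which is maximal for the uniform policy: the entropy term favors spreading probability over actions.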
to avoid overestimation of Q value in learning and thus reduce policy performance, the algorithm no longer directly calculates soft return Z π Desired value Q of (s, a) π (s, a) but rather models the soft return Z π Distribution of (s, a):
Figure BDA0004155851820000101
called value distribution function, and learn soft return Z based on Bellman operator π (s,a):
Figure BDA0004155851820000102
Wherein R to R (|s, a), s t+1 ~p,a t+1 Pi. Sign symbol
Figure BDA00041558518200001010
The random variables representing the left and right ends have the same probability distribution. Let->
Figure BDA0004155851820000103
Obeying the distribution->
Figure BDA0004155851820000104
Updating parameters by minimizing the distribution distance:
Figure BDA0004155851820000105
where d is a distance function that measures two distributions, commonly used KL divergence.
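For discrete (categorical) value distributions, the KL divergence named above takes a one-line form (an illustrative sketch; how the patent discretizes the return distribution is not specified):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) between two discrete distributions over the same support;
    a common choice for the distance d when fitting the value-distribution
    network to its Bellman target."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

The divergence is zero exactly when the two distributions coincide, and positive otherwise, so driving it to zero aligns the learned distribution with the Bellman target.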
Compared with an MLP (multi-layer perceptron), which does not use topology information, a graph neural network can pass information between nodes based on their connection relations. To better exploit the information in the graph, this embodiment aggregates node information with an attention mechanism to obtain the node representations:

$$h_i^{(k)} = \mathrm{GELU}\Big(\sum_{j\in\mathcal{N}_i} \alpha_{i,j}\,W\,h_j^{(k-1)}\Big)$$

wherein $h_i^{(k)}$ denotes the vector representation of node i in the k-th neural-network layer, W the neural-network parameter matrix that linearly transforms the node features, $\mathcal{N}_i$ the neighborhood nodes of node i, and $\alpha_{i,j}$ the attention coefficient:

$$\alpha_{i,j} = \frac{\exp\Big(\mathrm{GELU}\big(a^{\top}\,[\,W h_i \,\|\, W h_j \,\|\, W_e\,e_{i,j}\,]\big)\Big)}{\sum_{k\in\mathcal{N}_i}\exp\Big(\mathrm{GELU}\big(a^{\top}\,[\,W h_i \,\|\, W h_k \,\|\, W_e\,e_{i,k}\,]\big)\Big)}$$

wherein the vector a is the parameter vector of the attention network, $W_e$ the parameter matrix that linearly transforms the edge information, $e_{i,j}$ the feature vector of the edge, GELU the activation function, and $\|$ the vector-concatenation operator.
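The attention aggregation can be sketched on scalar node features (illustrative only: the linear maps W, W_e and the GELU of the full formula are folded into a user-supplied `score` function, and the names are assumptions):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_aggregate(h, neighbors, score):
    """One attention-aggregation round: for each node i,
    h_i' = sum_{j in N(i)} alpha_ij * h_j,
    where alpha_ij is the softmax of the raw scores of i's incident edges."""
    out = {}
    for i, nbrs in neighbors.items():
        alphas = softmax([score(i, j) for j in nbrs])
        out[i] = sum(a * h[j] for a, j in zip(alphas, nbrs))
    return out
```

With uniform scores the aggregation reduces to a plain neighborhood mean; a strongly scored edge pulls the representation toward that neighbor, which is how edge features (line admittances, pipe parameters) can steer message passing.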
Solving the optimal scheduling strategy of the electric-thermal combined energy system requires defining the actions, states and reward function of the problem.
1) Action space
The active power output of the thermal power stations, the electric and thermal power output of the CHP units, the thermal power output of the heating stations and the absorption coefficient of the wind power stations are taken as the action variables:

$$a_t = \big[\,P^{G}_{i,t},\ P^{CHP}_{i,t},\ H^{CHP}_{i,t},\ H^{GB}_{i,t},\ \alpha_i\,\big]$$

The corresponding action ranges are:

$$\underline{P}^{G}_{i} \le P^{G}_{i,t} \le \overline{P}^{G}_{i},\quad \underline{P}^{CHP}_{i} \le P^{CHP}_{i,t} \le \overline{P}^{CHP}_{i},\quad \underline{H}^{CHP}_{i} \le H^{CHP}_{i,t} \le \overline{H}^{CHP}_{i},\quad \underline{H}^{GB}_{i} \le H^{GB}_{i,t} \le \overline{H}^{GB}_{i} \tag{21}$$

wherein $\overline{P}^{G}_{i}, \underline{P}^{G}_{i}$ denote the upper and lower limits of the active output of thermal power station i; $\overline{P}^{CHP}_{i}, \underline{P}^{CHP}_{i}$ the upper and lower limits of the electric output of CHP unit i; $\overline{H}^{CHP}_{i}, \underline{H}^{CHP}_{i}$ the upper and lower limits of the thermal output of CHP unit i; and $\overline{H}^{GB}_{i}, \underline{H}^{GB}_{i}$ the upper and lower limits of the thermal output of heating station i.
2) State space
Since the system is modeled as the graph G(V, E), the system state is reflected by node features and edge features. The equipment of the electric-thermal combined energy system is divided into power-system equipment and thermodynamic-system equipment.
The node features and edge features of the power system are, respectively:

$$v_i = \big[P_i^{L}\big], \qquad e_{i,j} = \big[G_{ij},\ B_{ij}\big]$$

wherein $P_i^{L}$ is the electrical load of node i.
The node features and edge features of the thermodynamic system are, respectively:

$$v_i = \big[H_i^{L}\big], \qquad e_{i,j} = \big[L_{ij},\ m_{ij}\big]$$

wherein $H_i^{L}$ is the heat load of node i, $L_{ij}$ the length of pipe branch ij, and $m_{ij}$ the pipe mass flow rate between node i and node j.
3) Reward function
The reward combines the system running cost with penalties for violated constraints; since the running cost is to be minimized while reinforcement learning maximizes the return, the reward takes a negative value:

$$r_t = -\Big(F_t + \sum_i \lambda_i\,|L_i|\Big)$$

wherein $r_t$ denotes the reward at time t; $F_t$ the running cost at time t given by formula (9); $\lambda_i$ the penalty factor of the i-th constraint, with i indexing the constraints listed in formulas (10) and (11); and $|L_i|$ the violation magnitude of the i-th constraint: $|L_i| = 0$ when the constraint is satisfied, and otherwise the minimum of the absolute differences from the boundary conditions. Concretely, when an equality constraint from formula (10) is violated, $|L_i|$ is the absolute value of the difference between the two sides of the equation. For an inequality constraint from formula (11) with both upper and lower bounds ($a < X < b$), $|L_i| = |X-a|$ if $|X-a| < |X-b|$, and $|L_i| = |X-b|$ otherwise; if only an upper or only a lower bound is present, $|L_i|$ is the absolute difference from that bound.
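The reward computation above can be sketched directly (illustrative helpers; the penalty factors in the usage note are hypothetical):

```python
def box_violation(x, lo=None, hi=None):
    """|L_i| for an inequality constraint: zero inside the bounds, otherwise
    the distance to the nearest violated bound."""
    if lo is not None and x < lo:
        return lo - x
    if hi is not None and x > hi:
        return x - hi
    return 0.0

def reward(cost, violations, penalties):
    """r_t = -(F_t + sum_i lambda_i * |L_i|): negative running cost minus the
    penalty-weighted constraint-violation magnitudes."""
    return -(cost + sum(lam * v for lam, v in zip(penalties, violations)))
```

For example, a running cost of 100 with one constraint violated by 2 units under penalty factor 10 yields a reward of -120.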
The network architecture of the Actor in this embodiment is shown in fig. 3. The model input is the state graph G(V_t, E_t). After passing through k graph neural network layers, each with the GELU activation function, the representation h_{t,k} is obtained; the network then outputs, for each action, the mean μ and the logarithm of the standard deviation ln σ, and exponentiating ln σ yields the normal distribution N(μ, σ²). After sampling with added noise, the value is mapped into (-1, 1) by a Tanh layer and finally into the action range listed in formula (21) to obtain the actual dispatch output value.
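The sampling-and-squashing step at the head of the Actor can be sketched as follows. This is a NumPy stand-in for the network head; the function and argument names are assumptions, not the patent's code:

```python
import numpy as np

def sample_action(mu, log_sigma, act_low, act_high, rng=None):
    """Draw u ~ N(mu, sigma^2) by reparameterization, squash into (-1, 1)
    with tanh, then affine-map into [act_low, act_high] (the action range
    of formula (21)). Illustrative sketch only."""
    rng = rng or np.random.default_rng(0)
    sigma = np.exp(log_sigma)                           # recover sigma from ln(sigma)
    u = mu + sigma * rng.standard_normal(np.shape(mu))  # sample with noise
    a = np.tanh(u)                                      # squash into (-1, 1)
    return act_low + 0.5 * (a + 1.0) * (act_high - act_low)
```

With a near-zero σ the sampled action collapses to the deterministic tanh-squashed mean, which is the behavior usually used at evaluation time.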
Learning of the value network Critic requires the output actions to be combined with the state: the node characteristics of the power system V_t^e and the node characteristics of the thermodynamic system V_t^h are each concatenated with the corresponding action variables to form the augmented node features V_t'. The graph G(V_t', E_t') is then input into the value network to obtain the soft Q value.
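The state-action combination for the Critic amounts to a per-node concatenation, which might look like the sketch below. It assumes, for illustration, that nodes without a dispatchable device receive a zero action entry; the names are invented:

```python
import numpy as np

def augment_with_actions(node_feats, node_actions):
    """Concatenate each node's feature vector with the action of the
    device located at that node, yielding the augmented node features
    V_t' of the graph G(V_t', E_t') fed to the value network."""
    augmented = {}
    for i, f in node_feats.items():
        a = np.atleast_1d(node_actions.get(i, 0.0))  # zero action if no device
        augmented[i] = np.concatenate([np.atleast_1d(f), a])
    return augmented
```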
S4: obtaining the output result of the electric heating combined energy system through the trained reinforcement learning model.
Experimental verification and analysis
This embodiment performs the computational analysis on the electric heating combined energy system shown in fig. 4, which consists of an IEEE 33-bus power grid and a 32-node Barry Island heat supply network, in order to analyze the reinforcement learning optimization effect. G1 and G2 are two thermal power stations; W is a wind power station; GB1 and GB2 are two heat supply stations; CHP is a cogeneration unit.
In this embodiment, a comparison test is performed with the same number of network layers and neurons in both architectures. The Actor network has 4 layers, with 128, 64, 32, and 32 neurons per layer. The Critic network has 5 layers, and the number of neurons in each layer is: 128. The activation function of each layer is GELU, and the experience pool size is 500000. The learning rate is adjusted automatically by the Adam optimizer within the range 5×10^-4 to 5×10^-6.
As can be seen from fig. 5, the return curve based on the GNN architecture converges after 3000 training rounds; compared with the return curve based on the MLP architecture, the GNN curve starts to rise earlier, converges faster, and reaches a larger return value, i.e. a lower running cost. This indicates that the GNN-based algorithm model exploits edge information, which brings a larger exploration space and a faster training speed.
After training is completed, the policy network obtains the system output actions from the system load; the 24-hour scheduling results are shown in fig. 6 and 7.
As the per-period output results of the power system in fig. 6 show, the total output basically follows the load curve. Thermal power units 1 and 2, having the larger installed capacities, bear most of the generation task and ramp up slowly during the daytime consumption peak, so the growth of actual demand can be met. The load gap in the peak period is covered by the CHP unit, which runs at its minimum output power in the remaining periods. At night, wind generation is larger and the corresponding grid-connected power rises accordingly; the wind power consumption coefficient remains at about 97% throughout. The thermal output results in fig. 7 show that the total output likewise follows the load curve and can ramp slowly to meet demand during the night load peak. Because of the output upper limits, the differences among the heat source outputs are small, which effectively reduces thermal losses during transmission.
The embodiment of the invention provides a value-distribution maximum-entropy Actor-Critic reinforcement learning optimal scheduling method based on a GNN architecture, which can fully utilize the topological structure information of the system and realize more effective exploration and learning. Compared with the MLP-based method, exploiting the system topology information yields a larger exploration space and a faster convergence speed, making the method more advantageous for the optimal scheduling of the electric heating combined energy system.
Embodiment two:
The invention also provides an electric heating combined energy system optimal scheduling terminal device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the steps in the method of the first embodiment of the invention are implemented when the processor executes the computer program.
Further, as an executable scheme, the electric heating combined energy system optimal scheduling terminal device may be a computing device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The electric heating combined energy system optimal scheduling terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the above composition is merely an example of the terminal device and does not constitute a limitation of it; the device may include more or fewer components than those listed, combine certain components, or use different components. For example, it may further include an input/output device, a network access device, and a bus, which the embodiment of the present invention does not limit.
Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the electric heating combined energy system optimal scheduling terminal device and connects the parts of the whole device through various interfaces and lines.
The memory may be used to store the computer program and/or the modules, and the processor implements the various functions of the electric heating combined energy system optimal scheduling terminal device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal device, etc. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method of the embodiment of the present invention.
The modules/units integrated in the electric heating combined energy system optimal scheduling terminal device may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on this understanding, the present invention may implement all or part of the flow of the method of the above embodiment through a computer program instructing the related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a software distribution medium, and so forth.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. An optimal scheduling method for an electric heating combined energy system is characterized by comprising the following steps:
s1: abstract modeling an electric heating combined energy system into a state diagram, wherein the electric power system equipment represents node characteristics through the electric load of the equipment, and represents edge characteristics through susceptance and conductance between two pieces of equipment corresponding to two nodes; the thermodynamic system equipment represents node characteristics through the thermal load of the equipment, and represents edge characteristics through the length of a pipeline branch and the pipeline mass flow rate between two corresponding equipment of two nodes;
s2: collecting a historical state diagram of an electric heating combined energy system to form a training set;
s3: constructing a reinforcement learning model of optimal scheduling of an electrothermal combined energy system, setting a multi-layer perceptron network in the reinforcement learning model to be changed into a graph neural network, setting an action space, a state space and a return function of the reinforcement learning model, adopting a maximum entropy of value distribution as an algorithm target of reinforcement learning, and training the reinforcement learning model through a training set;
s4: and obtaining the output result of the electric heating combined energy system through the trained reinforcement learning model.
2. The optimal scheduling method for the electric heating combined energy system according to claim 1, characterized in that: the active power output of the thermal power station, the electric power output of the CHP unit, the thermal power output of the heat supply station, and the consumption coefficient of the wind power station are taken as action variables in the action space of the reinforcement learning model, and a value range is set for each action variable.
3. The optimal scheduling method for the electric heating combined energy system according to claim 1, characterized in that: the state of the system in the state space of the reinforcement learning model is represented by the node features and edge features of the graph.
4. The optimal scheduling method for the electric heating combined energy system according to claim 1, characterized in that: the return function of the reinforcement learning model is calculated as:

r_t = -(F_t + Σ_i λ_i |L_i|)

wherein r_t represents the return value at time t, F_t represents the running cost at time t, i represents the sequence number of the constraint, λ_i represents the penalty factor corresponding to the i-th constraint, and |L_i| represents the violation of the i-th constraint; when the constraint is satisfied, |L_i| is 0; when the constraint is violated, |L_i| is the minimum of the absolute values of the differences from the boundary conditions.
5. The optimal scheduling method for the electric heating combined energy system according to claim 4, characterized in that: the electric heating combined energy system consists of a thermal power station, a heat supply station, a CHP unit and a wind power station, and the running cost is the sum of the running cost of the thermal power station, the running cost of the heat supply station, the running cost of the CHP unit, and the wind curtailment cost of the wind power station.
6. The optimal scheduling method for the electric heating combined energy system according to claim 1, characterized in that: an attention mechanism is adopted in the graph neural network of the reinforcement learning model to aggregate node information and obtain the node representation.
7. The optimal scheduling method for the electric heating combined energy system according to claim 1, characterized in that: in the reinforcement learning model, the expected value of the soft Q return is no longer calculated directly; instead, the value distribution function of the soft Q return is modeled and learned based on the Bellman operator.
8. An electric heating combined energy system optimal scheduling terminal device, characterized in that: it comprises a processor, a memory, and a computer program stored in the memory and running on the processor, and the processor, when executing the computer program, carries out the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202310328399.5A 2023-03-30 2023-03-30 Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium Pending CN116362504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310328399.5A CN116362504A (en) 2023-03-30 2023-03-30 Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310328399.5A CN116362504A (en) 2023-03-30 2023-03-30 Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116362504A true CN116362504A (en) 2023-06-30

Family

ID=86914770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310328399.5A Pending CN116362504A (en) 2023-03-30 2023-03-30 Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116362504A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649102A (en) * 2024-01-30 2024-03-05 大连理工大学 Optimal scheduling method of multi-energy flow system in steel industry based on maximum entropy reinforcement learning
CN117649102B (en) * 2024-01-30 2024-05-17 大连理工大学 Optimal scheduling method of multi-energy flow system in steel industry based on maximum entropy reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination