CN116362504A - Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium - Google Patents
- Publication number: CN116362504A
- Application number: CN202310328399.5A
- Authority: CN
- Country: China
- Prior art keywords: energy system; combined energy; electric heating; reinforcement learning; learning model
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Combinations of networks
- G06N3/092—Reinforcement learning
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q50/06—Energy or water supply
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention relates to an optimal scheduling method for an electric heating combined energy system, a terminal device and a storage medium. The method comprises the following steps: abstractly modeling the electric heating combined energy system as a state diagram; collecting historical state diagrams of the electric heating combined energy system to form a training set; constructing a reinforcement learning model for optimal scheduling of the electrothermal combined energy system, replacing the multi-layer perceptron network in the reinforcement learning model with a graph neural network, setting the action space, state space and return function of the reinforcement learning model, adopting the maximum entropy of the value distribution as the reinforcement learning objective, and training the reinforcement learning model on the training set; and obtaining the output result of the electric heating combined energy system from the trained reinforcement learning model. Compared with methods based on the MLP architecture, the use of system topology information gives the method a larger exploration space and a faster convergence speed, making it more advantageous in the optimal scheduling of the electrothermal combined energy system.
Description
Technical Field
The invention relates to the field of energy optimal scheduling, in particular to an optimal scheduling method, terminal equipment and a storage medium for an electric heating combined energy system.
Background
As the contradiction between society's growing energy demand and the goals of energy conservation and emission reduction becomes increasingly prominent, how to make full use of new energy to reduce the use of traditional energy, lower running costs and cut emissions has become a problem demanding urgent solution.
The development of the energy Internet provides a foundation for the complementation and conversion of multiple energy flows and the full utilization of energy, and the scheduling and coupling of multiple energy flows are key to the efficient operation of an integrated energy system. Because of the nonlinear constraint conditions of the energy system, multi-energy-flow optimal scheduling is a non-convex optimization problem whose global optimum is difficult to obtain. Traditional work on this problem has mostly focused on approximate and nonlinear solution methods, and intelligent algorithms such as particle swarm optimization have also appeared. However, their high complexity, together with the need to re-solve whenever the system state changes, makes it difficult for these approaches to respond quickly to large-scale problems. With the spread of renewable energy sources such as photovoltaic and wind power, the fluctuation and uncertainty of their output bring new challenges to optimal scheduling.
Disclosure of Invention
In order to solve the problems, the invention provides an optimal scheduling method for an electric heating combined energy system, terminal equipment and a storage medium.
The specific scheme is as follows:
an optimal scheduling method of an electric heating combined energy system comprises the following steps:
s1: abstractly modeling the electric heating combined energy system as a state diagram, wherein each piece of electric power system equipment represents its node features by the electric load of the equipment and its edge features by the susceptance and conductance between the two pieces of equipment corresponding to the two nodes; each piece of thermodynamic system equipment represents its node features by the thermal load of the equipment and its edge features by the length of the pipeline branch and the pipeline mass flow rate between the two pieces of equipment corresponding to the two nodes;
s2: collecting a historical state diagram of an electric heating combined energy system to form a training set;
s3: constructing a reinforcement learning model for optimal scheduling of the electrothermal combined energy system, replacing the multi-layer perceptron network in the reinforcement learning model with a graph neural network, setting the action space, state space and return function of the reinforcement learning model, adopting the maximum entropy of the value distribution as the reinforcement learning objective, and training the reinforcement learning model on the training set;
s4: and obtaining the output result of the electric heating combined energy system through the trained reinforcement learning model.
Further, the active power output of the thermal power station, the electric and thermal power outputs of the CHP unit, the thermal power output of the heating station and the wind power absorption coefficient of the wind power station are used as the action variables in the action space of the reinforcement learning model, and the value range of each action variable is set.
Further, the system state in the state space of the reinforcement learning model is represented by node features and edge features of the graph.
Further, the return function of the reinforcement learning model is calculated as:
r_t = -F_t - Σ_i λ_i |L_i|
wherein r_t denotes the return at time t, F_t denotes the running cost at time t, i denotes the sequence number of a constraint, λ_i denotes the penalty factor corresponding to the i-th constraint, and |L_i| denotes the absolute value of the violation of the i-th constraint: when the constraint is satisfied, |L_i| is 0; when the constraint is violated, |L_i| is the minimum of the absolute differences from the boundary conditions.
Further, the electric heating combined energy system consists of a thermal power station, a heat supply station, a CHP unit and a wind power station, and the operation cost is the sum of the operation cost of the thermal power station, the operation cost of the heat supply station, the operation cost of the CHP unit and the wind discarding cost of the wind power station.
Furthermore, the graph neural network of the reinforcement learning model aggregates node information with an attention mechanism to obtain node representations.
Further, in the reinforcement learning model, the expected value of the soft Q-function return is no longer calculated directly; instead, the value distribution function of the soft Q-function return is modeled and learned based on the Bellman operator.
The invention also relates to an optimal scheduling terminal device for an electric heating combined energy system, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the steps of the method of the embodiments of the invention are implemented when the processor executes the computer program.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above for embodiments of the present invention.
According to the above technical scheme, the invention provides a reinforcement learning optimal scheduling method based on a GNN architecture. Compared with methods based on the MLP architecture, the use of system topology information brings a larger exploration space and a faster convergence speed, making the method more advantageous in the optimal scheduling of the electrothermal combined energy system.
Drawings
Fig. 1 is a flowchart of a first embodiment of the present invention.
Fig. 2 is a schematic diagram of the reinforcement learning model algorithm framework in this embodiment.
Fig. 3 is a schematic diagram of an Actor network structure in this embodiment.
Fig. 4 is a schematic diagram of the electrothermal combined energy system in this embodiment.
FIG. 5 is a graph showing the comparison of GNN and MLP in this example.
Fig. 6 is a schematic diagram showing the output result of the power system in this embodiment.
FIG. 7 is a graph showing the result of the output of the thermal power system in this embodiment.
Detailed Description
For further illustration of the various embodiments, the invention provides accompanying drawings. These drawings form a part of the disclosure of the invention; they illustrate the embodiments and, together with the description, serve to explain their principles. With reference to these materials, one of ordinary skill in the art will understand other possible embodiments and advantages of the invention.
The invention will now be further described with reference to the drawings and detailed description.
Embodiment one:
the embodiment of the invention provides an optimal scheduling method of an electric heating combined energy system, as shown in fig. 1, comprising the following steps:
s1: the electrothermal joint energy system is abstractly modeled as a state diagram.
1. Electric heating combined energy system model
The electric heating combined energy system constructed in the embodiment comprises an electric power system model, a thermodynamic system model and an electric heating system coupling link.
1. Electric power system model
The AC power flow equations of the electric power system are:
P_i = U_i Σ_{j∈N_P} U_j (G_ij cos θ_ij + B_ij sin θ_ij)
Q_i = U_i Σ_{j∈N_P} U_j (G_ij sin θ_ij - B_ij cos θ_ij)
wherein: N_P denotes the set of power system nodes; P_i and Q_i denote the active and reactive power injected at node i, respectively; U_i denotes the voltage amplitude at node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; and θ_ij = θ_i - θ_j denotes the phase angle difference between node i and node j.
2. Thermodynamic system model
Since the conduction of thermal energy requires a medium, this embodiment selects water, the most commonly used medium, and divides the thermodynamic system into a hydraulic model and a thermodynamic model.
1) Hydraulic model
The hydraulic model consists of the flow continuity equation and the loop pressure equation:
A m = m_q
B h_f = 0
h_f = K m |m|
wherein: A denotes the node-branch incidence matrix; m denotes the pipeline mass flow rate vector; m_q denotes the node injection flow vector; B denotes the loop-branch incidence matrix; h_f denotes the head loss vector; and K is the diagonal matrix of pipeline damping coefficients relating head loss to pipeline mass flow rate.
2) Thermodynamic model
The thermodynamic model comprises the node thermal power equation, the pipeline temperature drop equation and the medium mixing equation:
H_i = C_p m_{q,i} (T_i^in - T_i^out)
T_{j,i} = (T_{i,j} - T_e) e^{-λ L_ij / (C_p m_ij)} + T_e
(Σ_{k∈n_i} m_ik) T_i = Σ_{k∈n_i} m_ik T_{k,i}
wherein: N_H denotes the set of thermodynamic system nodes; H_i denotes the thermal power of node i; C_p is the specific heat capacity of water; m_{q,i} denotes the injection flow of node i; T_i^in and T_i^out are the water inlet and outlet temperatures of node i, respectively; T_{i,j} and T_{j,i} denote the water temperatures at the i end and the j end of pipeline branch ij, respectively; T_e is the external environment temperature; λ is the heat conductivity coefficient; L_ij denotes the length of pipeline branch ij; T_i denotes the water temperature at mixing node i; m_ik denotes the mass flow rate between node k and node i; and n_i denotes the set of all nodes with flow into node i, of size |n_i|.
3. Coupling link of electric heating system
For the coupling link of the electric heating combined energy system, this embodiment considers the CHP (combined heat and power) unit, which generates electricity and supplies heat simultaneously to meet the load demands of the electric power system and the thermodynamic system.
Common models of the cogeneration unit include the polygonal model and the linear model with a fixed heat-to-power ratio. This embodiment selects the extraction-condensing unit, which allows more flexible regulation, with its corresponding polygonal model: the electric output P_CHP and the thermal output H_CHP are confined to a convex polygonal feasible operating region determined by the upper and lower limits of the electric output power of the CHP unit, the upper and lower limits of the thermal output power of the CHP unit, and the polygonal region coefficients α_1, α_2, α_3.
2. Objective function
For the optimal scheduling task of the electric heating combined system, this embodiment takes as its objective minimizing the running cost while absorbing as much new energy output as possible.
1) Thermal power station operating cost
F_{1,t} = Σ_{i=1}^{|N_P|} (α_2 P_{i,t}^2 + α_1 P_{i,t} + α_0)
wherein: F_{1,t} denotes the operating cost of all thermal power stations at time t; |N_P| is the number of thermal power stations; P_{i,t} denotes the active output of thermal power station i at time t; and α_0, α_1, α_2 are the consumption characteristic curve parameters of the thermal power unit.
2) Heat supply station operating cost
F_{2,t} = Σ_{i=1}^{|N_H|} (β_2 H_{i,t}^2 + β_1 H_{i,t} + β_0)
wherein: F_{2,t} denotes the operating cost of all heating stations at time t; |N_H| is the number of heating stations; H_{i,t} denotes the thermal power output of heating station i at time t; and β_0, β_1, β_2 are the consumption characteristic curve parameters of the heating station.
3) CHP unit operating cost
F_{3,t} = Σ_{i=1}^{|N_CHP|} (μ_0 + μ_1 P_{i,t}^CHP + μ_2 (P_{i,t}^CHP)^2 + μ_3 H_{i,t}^CHP + μ_4 (H_{i,t}^CHP)^2 + μ_5 P_{i,t}^CHP H_{i,t}^CHP)
wherein: F_{3,t} denotes the operating cost of all CHP units at time t; |N_CHP| is the number of CHP units; P_{i,t}^CHP and H_{i,t}^CHP denote the electric output and thermal output of CHP unit i at time t, respectively; and μ_0 ~ μ_5 are the consumption characteristic curve parameters of the CHP unit.
4) Wind curtailment cost
F_{4,t} = C_w Σ_{i=1}^{|N_W|} (1 - α_i) P_{i,t}^W
wherein: F_{4,t} is the wind curtailment cost of all wind power stations at time t; |N_W| is the number of wind power stations; α_i is the wind power output absorption coefficient; P_{i,t}^W is the available output of wind power station i at time t, so that α_i P_{i,t}^W is the grid-connected wind power; and C_w is the wind curtailment cost coefficient.
5) Objective function
min F_t = F_{1,t} + F_{2,t} + F_{3,t} + F_{4,t}   (9)
wherein F_t denotes the total running cost of the electric heating combined energy system at time t.
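The cost terms above can be sketched in code; all coefficient values below are illustrative placeholders, not the patent's parameters:

```python
def station_cost(x, c0, c1, c2):
    # Quadratic consumption characteristic, shared by thermal power
    # stations (x = electric output) and heating stations (x = heat output)
    return c2 * x**2 + c1 * x + c0

def chp_cost(P, H, mu):
    # mu = (mu0..mu5): quadratic cost coupling electric and thermal output
    mu0, mu1, mu2, mu3, mu4, mu5 = mu
    return mu0 + mu1 * P + mu2 * P**2 + mu3 * H + mu4 * H**2 + mu5 * P * H

def curtailment_cost(P_wind, alpha, C_w):
    # The fraction (1 - alpha) of the available wind output is curtailed
    return C_w * (1.0 - alpha) * P_wind

# Total running cost F_t for one thermal station, one heating station,
# one CHP unit and one wind farm, as in formula (9):
F_t = (station_cost(10.0, 1.0, 2.0, 0.5)
       + station_cost(8.0, 0.5, 1.0, 0.2)
       + chp_cost(10.0, 5.0, (1.0, 1.0, 0.1, 1.0, 0.1, 0.05))
       + curtailment_cost(100.0, 0.9, 2.0))
```
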
3. Constraint conditions
1) Power balance constraints
P_{i,t}^G + P_{i,t}^CHP + P_{i,t}^W = P_{i,t}^L
H_{i,t}^G + H_{i,t}^CHP = H_{i,t}^L   (10)
wherein: P_{i,t}^G, P_{i,t}^CHP and P_{i,t}^W denote the electric output of the conventional unit, the CHP unit and the new energy power station of node i at time t, respectively; H_{i,t}^G and H_{i,t}^CHP denote the thermal output of the conventional unit and the CHP unit of node i at time t, respectively; and P_{i,t}^L and H_{i,t}^L are the electric load and thermal load of node i, respectively.
2) Safety constraints
The stable operation of the combined heat and power system must satisfy the necessary safety constraints: the electric power network must satisfy voltage, phase angle difference and line transmission constraints, and the thermal network must satisfy node temperature and pipeline flow constraints:
U_{i,min} ≤ U_i ≤ U_{i,max}
|θ_ij| ≤ θ_ij,max
|P_l| ≤ P_l,max
T_{i,min} ≤ T_i ≤ T_{i,max}
m_{ij,min} ≤ m_ij ≤ m_{ij,max}   (11)
wherein: U_{i,min} and U_{i,max} are the lower and upper limits of the voltage amplitude at node i; θ_ij,max is the upper limit of the phase angle difference; P_l,max is the upper limit of the transmission power of power line l; T_{i,min} and T_{i,max} are the lower and upper limits of the water supply temperature of node i; and m_{ij,min} and m_{ij,max} are the lower and upper limits of the water supply flow rate of pipeline ij.
4. State diagram
Since the electric and thermal networks have a natural graph structure, the system can be abstractly modeled, without considering internal device information, as a graph G(V, E) of nodes and edges, where V represents the nodes in the system and E represents the edges in the system.
S2: and acquiring a historical state diagram of the electric heating combined energy system to form a training set.
S3: constructing a reinforcement learning model of optimal scheduling of an electrothermal combined energy system, setting a fully connected neural network in the reinforcement learning model to be changed into a graph neural network, setting an action space, a state space and a return function of the reinforcement learning model, adopting a maximum entropy of value distribution as an algorithm target of reinforcement learning, and training the reinforcement learning model through a training set.
Reinforcement learning obtains the optimal solution of a problem in the current environment through exploration by an agent. The agent observes the current state s and outputs an action a; the action acts on the environment, which returns a corresponding reward r. The agent learns its network parameters from the fed-back return values and continuously adjusts its output policy to obtain the maximum cumulative return:
G = Σ_{t=0}^{∞} γ^t r_t
wherein: G is the cumulative return value and γ ∈ [0,1] is the discount rate, which adjusts the agent's weighting of short-term versus long-term returns.
The Actor-Critic algorithm is a reinforcement learning method combining policy gradients with temporal-difference learning; its basic architecture is shown in fig. 2. The Actor is the policy network π_θ(a|s), which learns a policy that obtains as high a return as possible; the Critic is the value network V_φ(s), which estimates the current policy and outputs an evaluation value. The Actor-Critic algorithm can therefore update its parameters at every step, without waiting for the end of an episode. In this algorithm framework, the parameters θ of the policy network π_θ(a|s) and φ of the value network V_φ(s) are learned during training. In each update step, the Actor outputs an action a_t according to the current environment state s_t and receives the immediate return r(s_t, a_t, s_{t+1}). The Critic adjusts its scoring criterion by comparing the true return value given by the environment with the score r + γV_φ(s_{t+1}) under its previous criterion, so that its score comes closer to the actual return of the environment. The Actor then adjusts its own policy π_θ according to the Critic's score.
The value-distribution maximum entropy Actor-Critic objective adopted in this embodiment is:
J(π) = Σ_t E[ r(s_t, a_t) + α H(π(·|s_t)) ]
wherein H(π(·|s)) represents the entropy of the policy π(a|s) in state s, and α weights the entropy term. Compared with the plain Actor-Critic algorithm, the purpose of the added entropy term is to randomize the policy: the probability mass of the output actions is dispersed as much as possible instead of being concentrated on one action. This guarantees the randomness of policy learning, keeps the exploration range as large as possible, and avoids falling into a local optimum.
The goal of policy improvement is to find a new policy π_new better than the current policy, so that the expected return becomes larger; the policy network updates by maximizing the soft Q value:
π_new = arg max_π E_{a~π} [ Q(s, a) - α log π(a|s) ]
To avoid the overestimation of the Q value during learning, which would degrade policy performance, the algorithm no longer directly calculates the expected value Q^π(s, a) of the soft return Z^π(s, a), but instead models the distribution of the soft return Z^π(s, a). This distribution is called the value distribution function, and the soft return Z^π(s, a) is learned based on the Bellman operator:
Z^π(s, a) =_D R(s, a) + γ Z^π(s_{t+1}, a_{t+1})
wherein R ~ R(·|s, a), s_{t+1} ~ p, a_{t+1} ~ π, and the symbol =_D indicates that the random variables on the left and right ends have the same probability distribution. Letting the parameterized return Z_θ(s, a) obey the modeled distribution, the parameters are updated by minimizing the distribution distance between Z_θ(s, a) and its Bellman target R(s, a) + γ Z_θ(s_{t+1}, a_{t+1}), where the distance function d measuring the two distributions is commonly the KL divergence.
Compared with an MLP (multi-layer perceptron), which does not use topology information, a graph neural network model can pass information between nodes based on their connection relations. To make better use of the information in the graph, this embodiment adopts an attention mechanism to aggregate node information into node representations:
h_i^(k) = Σ_{j∈N(i)} α_{i,j} W h_j^(k-1)
wherein h_i^(k) denotes the vector representation of node i in the k-th neural network layer, W denotes the neural network parameter matrix that linearly transforms the node features, N(i) denotes the neighborhood nodes of node i, and α_{i,j} is the attention coefficient:
α_{i,j} = softmax_j( GELU( a^T [ W h_i ‖ W h_j ‖ W_e e_{i,j} ] ) )
wherein the vector a is the parameter vector of the attention network, W_e is the parameter matrix that linearly transforms the edge information, e_{i,j} is the feature vector of the edge, GELU is the activation function, and ‖ is the vector concatenation operator.
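A minimal numpy sketch of this attention-weighted aggregation. The feature sizes and the tanh approximation of GELU are assumptions for illustration; the patent's actual layer dimensions are not specified here:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def attention_aggregate(h_i, neighbors, edges, W, W_e, a):
    # Score each neighbor j from the concatenation [W h_i || W h_j || W_e e_ij],
    # softmax the scores over the neighborhood, then return the
    # attention-weighted sum of the transformed neighbor features.
    scores = np.array([
        gelu(a @ np.concatenate([W @ h_i, W @ h_j, W_e @ e_ij]))
        for h_j, e_ij in zip(neighbors, edges)
    ])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # softmax over neighbors
    return sum(al * (W @ h_j) for al, h_j in zip(alpha, neighbors))

# Two identical neighbors: attention weights are equal, so the output
# equals the transformed neighbor feature itself.
out = attention_aggregate(
    np.array([1.0, 0.0]),
    [np.array([0.0, 1.0]), np.array([0.0, 1.0])],
    [np.array([1.0]), np.array([1.0])],
    np.eye(2), np.array([[1.0], [1.0]]), np.ones(6))
```

Because the attention weights form a convex combination, the aggregated representation always lies in the convex hull of the transformed neighbor features.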
Solving the optimal scheduling strategy of the electrothermal combined energy system requires defining the problem's action space, state space and return function.
1) Action space
The active power output of the thermal power station, the electric and thermal power outputs of the CHP unit, the thermal power output of the heat supply station and the absorption coefficient of the wind power station are taken as the action variables. The corresponding action ranges are:
P_{i,min}^G ≤ P_i^G ≤ P_{i,max}^G
P_{i,min}^CHP ≤ P_i^CHP ≤ P_{i,max}^CHP
H_{i,min}^CHP ≤ H_i^CHP ≤ H_{i,max}^CHP
H_{i,min}^G ≤ H_i^G ≤ H_{i,max}^G
wherein: P_{i,min}^G and P_{i,max}^G are the lower and upper limits of the active power output of thermal power station i; P_{i,min}^CHP and P_{i,max}^CHP are the lower and upper limits of the electric power output of CHP unit i; H_{i,min}^CHP and H_{i,max}^CHP are the lower and upper limits of the thermal power output of CHP unit i; and H_{i,min}^G and H_{i,max}^G are the lower and upper limits of the thermal power output of heat supply station i.
2) State space
Since the system is modeled as graph G (V, E), the system state is reflected by node features and edge features. The equipment of the electrothermal combined energy system is divided into electric power system equipment and thermodynamic system equipment.
The node characteristics and the edge characteristics of the power system are respectively as follows:
wherein: p (P) i L Is the electrical load of node i.
The node characteristics and the edge characteristics of the thermodynamic system are respectively as follows:
wherein:is the thermal load of node i. L (L) ij Representing the length, m, of a pipe branch ij ij Representing the pipe mass flow rate between node i and node j.
3) Rewarding function
The return value combines the system running cost and penalties for violated constraints; since the scheduling objective is to minimize running cost while reinforcement learning maximizes return, the cost must enter with a negative sign:
where: r_t represents the return value at time t, F_t is the running cost at time t given by formula (9), λ_i is the penalty factor corresponding to the i-th constraint, i indexes the constraints listed in formulas (10) and (11), and |L_i| measures the violation of the i-th constraint. When the constraint is satisfied, |L_i| is 0. When the constraint is violated, |L_i| is the minimum of the absolute differences from the boundary conditions: for an equality constraint as in formula (10), |L_i| is the absolute value of the difference between the two sides of the equation; for an inequality constraint as in formula (11) with both upper and lower bounds (a < X < b), |L_i| = |X − a| when |X − a| < |X − b| and |L_i| = |X − b| otherwise; if only an upper or only a lower bound exists, |L_i| is the absolute difference from that bound.
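The reward rule above can be sketched as follows; `violation` and `reward` are hypothetical helper names, and the penalty factors λ_i are passed in as a list:

```python
def violation(value, lower=None, upper=None, target=None):
    """|L_i|: 0 when the constraint holds, otherwise the distance to the
    nearest violated boundary (equality constraints: |lhs - rhs|)."""
    if target is not None:                  # equality constraint, formula (10)
        return abs(value - target)
    if lower is not None and value < lower: # below lower bound
        return lower - value
    if upper is not None and value > upper: # above upper bound
        return value - upper
    return 0.0                              # constraint satisfied

def reward(running_cost, violations, penalties):
    """r_t = -(F_t + sum_i lambda_i * |L_i|): negated because RL maximizes."""
    return -(running_cost + sum(lam * v for lam, v in zip(penalties, violations)))
```

For example, a dispatch value of 12 against the bounds (0, 10) contributes a violation of 2, scaled by its penalty factor and subtracted from the return.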
The network architecture of the Actor in this embodiment is shown in fig. 3. The input to the model is the state graph G(V_t, E_t). After k graph neural network layers, each using the GELU activation function, the representation h_{t,k} is obtained. The network outputs the mean μ of each action and the logarithm of the variance, ln σ; exponentiating ln σ yields the normal distribution N(μ, σ²). After sampling and adding noise, the Tanh layer maps the values into (−1, 1), which are finally scaled to the action ranges listed in formula (21) to obtain the actual dispatch output values.
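The Actor's sampling pipeline (Gaussian sample → Tanh squash → affine map into the action range) can be sketched as below; the bounds used in the example are hypothetical, standing in for the limits of formula (21):

```python
import numpy as np

def sample_action(mu, log_sigma, low, high, rng):
    """Sample from N(mu, sigma^2), squash through Tanh into (-1, 1),
    then affinely map into the dispatch range [low, high]."""
    sigma = np.exp(log_sigma)                            # ln(sigma) -> sigma
    z = mu + sigma * rng.standard_normal(np.shape(mu))   # reparameterized sample
    squashed = np.tanh(z)                                # values in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)   # map to action range

rng = np.random.default_rng(0)
# hypothetical bounds, e.g. active power limits of two dispatchable units
low, high = np.array([0.0, 10.0]), np.array([50.0, 80.0])
action = sample_action(np.zeros(2), np.zeros(2), low, high, rng)
```

Because Tanh is strictly bounded, every sampled action is guaranteed to respect the configured dispatch limits.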
Learning of the value network (Critic) requires incorporating the output actions; the node features of the power system and of the thermodynamic system are then augmented as follows:
The graph G(V_t′, E_t′) is then fed into the value network to obtain the soft Q value.
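A minimal sketch of this state-action augmentation, assuming each dispatchable device's action is appended to its own node's feature row (the concrete column layout is an assumption, not specified in the text):

```python
import numpy as np

def augment_with_actions(node_feat, actions):
    """Append each dispatchable device's action to its node feature row;
    non-dispatchable nodes receive a zero in the action slot."""
    col = np.zeros((node_feat.shape[0], 1))
    for node_idx, a in actions.items():
        col[node_idx, 0] = a
    return np.hstack([node_feat, col])    # V_t' = [V_t | actions]

# toy example: three nodes, the first hosting a dispatchable unit
vt = np.ones((3, 2))
out = augment_with_actions(vt, {0: 0.7})
```

The augmented node features, together with the unchanged edge features, form the graph G(V_t′, E_t′) fed to the Critic.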
S4: the dispatch output result of the electric heating combined energy system is obtained through the trained reinforcement learning model.
Experimental verification analysis
This embodiment carries out the computational analysis on the electric heating combined energy system shown in fig. 4, which consists of an IEEE-33 bus power grid and a 32-node Barry Island heat supply network, in order to analyze the optimization effect of reinforcement learning. G1 and G2 are two thermal power stations; W is a wind power station; GB1 and GB2 are two heat supply stations; CHP is a cogeneration unit.
The comparison test in this embodiment is carried out with the same number of network layers and neurons. The Actor network has 4 layers, with 128, 64, 32 and 32 neurons respectively. The Critic network has 5 layers, with 128 neurons in its first layer; each layer uses the GELU activation function. The experience pool size is 500000. The learning rate is adjusted automatically by an Adam optimizer within the range 5×10⁻⁴ to 5×10⁻⁶.
As can be seen from fig. 5, the return curve of the GNN-based architecture converges after 3000 training rounds; compared with the MLP-based architecture, it begins to rise earlier, converges faster and reaches a larger return value, i.e. a lower running cost. This indicates that the GNN-based algorithm model exploits the edge information, which yields a larger exploration space and a faster training speed.
After training is completed, the policy network produces the system output actions from the system load; the 24-hour scheduling results are shown in figs. 6 and 7.
As shown by the per-period power system outputs in fig. 6, the total output closely tracks the load curve. Thermal power units 1 and 2, having the larger installed capacities, carry most of the generation task and ramp up slowly during the daytime consumption peak, meeting the growth in actual demand. The load gap during the peak period is covered by the CHP unit, which runs at its lowest output power in the remaining periods. At night, wind generation increases and the corresponding grid-connected power rises accordingly; the wind power consumption coefficient stays at about 97% throughout. The thermodynamic outputs in fig. 7 likewise track the load curve, ramping slowly to meet the night-time load peak. Because of the output upper limits, the differences between heat source outputs are small, which effectively reduces thermal losses during transmission.
The embodiment of the invention provides a value-distribution maximum-entropy Actor-Critic reinforcement learning optimal scheduling method based on a GNN architecture, which makes full use of the topological structure information of the system and achieves more effective exploratory learning. Compared with an MLP-based method, exploiting the system topology information yields a larger exploration space and a faster convergence speed, making the method more advantageous for the optimal scheduling of the electrothermal combined energy system.
Embodiment two:
The invention also provides an electric heating combined energy system optimal scheduling terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method of embodiment one of the invention are implemented when the processor executes the computer program.
Further, as an executable scheme, the electric heating combined energy system optimal scheduling terminal device may be a computing device such as a desktop computer, a notebook computer, a palm computer or a cloud server. The electric heating combined energy system optimal scheduling terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the above composition is merely an example and does not constitute a limitation of the terminal device, which may include more or fewer components than those listed, combine some components, or use different components; for example, the terminal device may further include an input/output device, a network access device and a bus, which the embodiment of the present invention does not limit.
Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The general processor can be a microprocessor or any conventional processor, and the processor is a control center of the electric heating combined energy system optimal scheduling terminal device, and various interfaces and lines are used for connecting various parts of the whole electric heating combined energy system optimal scheduling terminal device.
The memory may be used to store the computer program and/or the modules, and the processor implements the various functions of the electric heating combined energy system optimal scheduling terminal device by running or executing the computer program and/or the modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and at least one application program required for a function, while the data storage area may store data created according to the use of the terminal device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the above-described method of an embodiment of the present invention.
If the modules/units integrated in the electric heating combined energy system optimal scheduling terminal device are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing the relevant hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and so forth.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. An optimal scheduling method for an electric heating combined energy system is characterized by comprising the following steps:
s1: abstract modeling an electric heating combined energy system into a state diagram, wherein the electric power system equipment represents node characteristics through the electric load of the equipment, and represents edge characteristics through susceptance and conductance between two pieces of equipment corresponding to two nodes; the thermodynamic system equipment represents node characteristics through the thermal load of the equipment, and represents edge characteristics through the length of a pipeline branch and the pipeline mass flow rate between two corresponding equipment of two nodes;
s2: collecting a historical state diagram of an electric heating combined energy system to form a training set;
s3: constructing a reinforcement learning model of optimal scheduling of an electrothermal combined energy system, setting a multi-layer perceptron network in the reinforcement learning model to be changed into a graph neural network, setting an action space, a state space and a return function of the reinforcement learning model, adopting a maximum entropy of value distribution as an algorithm target of reinforcement learning, and training the reinforcement learning model through a training set;
s4: and obtaining the output result of the electric heating combined energy system through the trained reinforcement learning model.
2. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: and taking the active power output of the thermal power station, the electric power output of the CHP unit, the thermal power output of the heat supply station and the absorption coefficient of the wind power station as action variables in the action space of the reinforcement learning model, and setting the value range of each action variable.
3. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: the state of the system in the state space of the reinforcement learning model is represented by node features and edge features of the graph.
4. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: the calculation formula of the return function of the reinforcement learning model is as follows:
where r_t denotes the return value at time t, F_t the running cost at time t, i the sequence number of the constraint, λ_i the penalty factor corresponding to the i-th constraint, and |L_i| the violation magnitude of the i-th constraint: when the constraint is satisfied, |L_i| is 0; when the constraint is violated, |L_i| is the minimum of the absolute differences from the boundary conditions.
5. The optimal scheduling method for the electric heating combined energy system according to claim 4, wherein the optimal scheduling method is characterized by comprising the following steps of: the electric heating combined energy system consists of a thermal power station, a heat supply station, a CHP unit and a wind power station, and the operation cost is the sum of the operation cost of the thermal power station, the operation cost of the heat supply station, the operation cost of the CHP unit and the wind discarding cost of the wind power station.
6. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: node information is aggregated with an attention mechanism in the graph neural network of the reinforcement learning model to obtain the node representations.
7. The optimal scheduling method for the electric heating combined energy system according to claim 1, wherein the method comprises the following steps of: in the reinforcement learning model, the expected value of the soft Q function return is not directly calculated any more, but a value distribution function of the soft Q function return is modeled, and the soft Q function return is learned based on the Bellman operator.
8. An electric heating combined energy system optimal scheduling terminal device is characterized in that: comprising a processor, a memory and a computer program stored in the memory and running on the processor, which processor, when executing the computer program, carries out the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium storing a computer program, characterized in that: the computer program implementing the steps of the method according to any one of claims 1 to 7 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310328399.5A CN116362504A (en) | 2023-03-30 | 2023-03-30 | Optimal scheduling method for electric heating combined energy system, terminal equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116362504A true CN116362504A (en) | 2023-06-30 |
Family
ID=86914770
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117649102A (en) * | 2024-01-30 | 2024-03-05 | 大连理工大学 | Optimal scheduling method of multi-energy flow system in steel industry based on maximum entropy reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |