WO2023082697A1

WO2023082697A1 - Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program

Info

Publication number: WO2023082697A1
Application number: PCT/CN2022/107149
Authority: WO
Inventors: 蒲天骄; 董雷; 李烨; 王新迎
Original assignee: 中国电力科学研究院有限公司
Priority date: 2021-11-15
Filing date: 2022-07-21
Publication date: 2023-05-19
Also published as: CN113902040A; CN113902040B

Abstract

Disclosed in the present application are a coordination and optimization method and system for a comprehensive electric-thermal energy system, and a device, a medium and a program, wherein the method is executed by an electronic device. The method comprises: acquiring real-time parameters of a comprehensive electric-thermal energy system; on the basis of the real-time parameters of the comprehensive electric-thermal energy system, respectively calculating the real-time power-generation power of an electric power system, a thermodynamic system and a coupling apparatus of the comprehensive electric-thermal energy system; and inputting the real-time power-generation power into a pre-trained SAC framework-based optimization scheduling model, and outputting a scheduling action, to form a coordination strategy for the comprehensive electric-thermal energy system.

Description

Coordinated optimization method, system, equipment, medium and program for electric-thermal integrated energy system

Cross References to Related Applications

This patent application claims priority to Chinese patent application No. 202111349881.4, filed on November 15, 2021 by China Electric Power Research Institute Co., Ltd. and entitled "Coordinated optimization method, system, equipment and storage medium for electric-thermal comprehensive energy system", the entirety of which is incorporated by reference into this disclosure.

Technical Field

The present disclosure relates to the field of optimal dispatching of integrated energy systems, in particular to a coordination and optimization method, system, equipment, medium and program for electric-thermal integrated energy systems.

Background

In the context of the Energy Internet, studying the coordination and optimization of electric-thermal integrated energy systems has become an effective way to further improve energy utilization, alleviate the energy crisis, and break through traditional energy system structures and industry barriers.

The electric-thermal integrated energy system can promote the consumption of renewable energy and improve energy utilization by exploiting the complementary characteristics of heat and electricity. In practical applications, the optimization problem of the electric-thermal integrated energy system is mainly solved with traditional nonlinear methods such as particle swarm optimization, intelligent algorithms such as Q-learning, and the Deep Q-Network (DQN) algorithm. However, both the particle swarm algorithm and the Q-learning algorithm suffer from insufficient accuracy, slow calculation speed, and a limited scope of application. The DQN algorithm has insufficient exploration ability and easily falls into local optimal solutions.

Generally speaking, the optimal dispatching problem of the increasingly tightly coupled integrated energy system is highly nonlinear, and an economical, accurate and reliable solution method is currently lacking. Therefore, there is an urgent need for an intelligent algorithm with reliable convergence, a strong ability to explore optimal strategies, and high precision to optimize the electric-thermal integrated energy system.

Summary of the Invention

Embodiments of the present disclosure provide a coordinated optimization method, system, equipment, medium and program for an electric-thermal comprehensive energy system.

An embodiment of the present disclosure provides a coordinated optimization method for an electric-thermal comprehensive energy system, the method is executed by an electronic device; the method includes:

Obtain real-time electric-thermal comprehensive energy system parameters;

Based on the real-time electric-thermal comprehensive energy system parameters, calculate the real-time power generation power of the electric power system, thermal system and coupling device of the electric-thermal comprehensive energy system;

Input the real-time generated power into the pre-trained optimal scheduling model based on the Soft Actor-Critic (SAC) framework, output the scheduling action, and form the coordination strategy of the electric-thermal comprehensive energy system;

The method for training the pre-trained optimal scheduling model based on the SAC framework comprises:

Obtain historical electric-thermal comprehensive energy system parameters;

Based on the historical electric-thermal integrated energy system parameters, calculate the historical power generation power of the electric power system, thermal system and coupling device of the electric-thermal integrated energy system, and establish a dispatching model for the electric-thermal integrated energy system based on the historical power generation power of the electric power system, thermal system and coupling device;

Taking the reinforcement learning environment, state, action and reward as basic elements, and combining them with the electric-thermal comprehensive energy system scheduling model, establish an optimal scheduling model based on the SAC framework;

Train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework;

Establishing the optimal scheduling model based on the SAC framework, by taking the reinforcement learning environment, state, action and reward as basic elements and combining them with the electric-thermal comprehensive energy system scheduling model, includes:

Set the action variable to

where $P_i^{G}$, a second symbol (not reproduced in this text) and $P_i^{chp}$ are, in turn, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device;

Determine the state space variable as

where $P_i^{G}$, $P^{load}$, $P^{w}$, $P_i^{chp}$, $H^{load}$, a sixth symbol (not reproduced in this text) and $T_e$ are, in turn, the generating power of the conventional unit, the electric load, the wind power generating power, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device and the ambient temperature;

Build a reinforcement learning environment: the policy network produces the current action, which acts on the environment to yield an immediate reward and the state of the next period, and the reward is provided for strategy evaluation;

Set the reinforcement learning goal to maximizing the long-term reward; take the negative of the optimization objective as the immediate reward, and add penalty terms, set according to the constraints, to the immediate reward to obtain the final reward function; wherein, the penalty mechanism is:

where $\beta_v$ is the penalty coefficient, a constant set for the corresponding limit-violation penalty;

The reward function is

where $f_1$, $f_2$ and $f_3$ are the operating cost of the conventional units, the operating cost of the combined heat and power units, and the wind curtailment penalty, respectively; two terms (symbols not reproduced in this text) are the output over-limit and ramping over-limit penalties of the conventional units; $\varphi_V$ is the system node voltage over-limit penalty; two further terms (symbols not reproduced in this text) are the output over-limit and ramping over-limit penalties of the combined heat and power units; $\varphi_T$ is the over-limit penalty for the system node temperature; and $\varphi_m$ is the over-limit penalty for the mass flow rate of the system pipelines.
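The reward construction described above can be sketched as follows. The reward formula itself is not reproduced in this text, so this is only an illustration under stated assumptions: each penalty is taken to be a coefficient $\beta_v$ multiplied by the amount by which a limit is exceeded, and the names `limit_violation` and `reward` are hypothetical.

```python
def limit_violation(value, lower, upper):
    """Amount by which value lies outside [lower, upper]; 0.0 if within limits."""
    if value < lower:
        return lower - value
    if value > upper:
        return value - upper
    return 0.0


def reward(f1, f2, f3, violations, beta_v):
    """Immediate reward: negative total cost minus weighted limit violations.

    f1, f2, f3 -- conventional-unit cost, CHP cost, wind curtailment penalty
    violations -- non-negative violation amounts, one per constraint
                  (output, ramping, voltage, temperature, mass flow rate)
    beta_v     -- penalty coefficients, one per entry of violations
                  (linear penalty form is an assumption, not from the source)
    """
    cost = f1 + f2 + f3
    penalty = sum(b * v for b, v in zip(beta_v, violations))
    return -(cost + penalty)


# Example: one voltage limit respected, one ramping limit exceeded by 2.0.
r = reward(10.0, 5.0, 1.0,
           violations=[limit_violation(1.0, 0.95, 1.05), 2.0],
           beta_v=[100.0, 50.0])
```

Because the agent maximizes the reward, a lower operating cost and fewer limit violations both raise it, which matches the sign convention stated in the text.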

An embodiment of the present disclosure also provides a coordinated optimization system for an electric-thermal comprehensive energy system, including:

The first parameter acquisition module is configured to acquire real-time electric-thermal comprehensive energy system parameters;

The power calculation module is configured to calculate the real-time power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system based on the real-time electric-thermal comprehensive energy system parameters;

The scheduling output module is configured to input the real-time generated power into the pre-trained optimal scheduling model based on the SAC framework, output scheduling actions, and form an electric-thermal comprehensive energy system coordination strategy;

The pre-trained optimal scheduling model based on the SAC framework in the scheduling output module includes:

The second parameter acquisition module is configured to acquire historical electric-thermal comprehensive energy system parameters;

The model building module is configured to calculate the historical power generation power of the electric power system, thermal system and coupling device of the electric-thermal comprehensive energy system based on the historical electric-thermal comprehensive energy system parameters, and to establish the dispatching model of the electric-thermal integrated energy system on the basis of the historical power generation power of the electric power system, thermal system and coupling device;

The model optimization module is configured to take the reinforcement learning environment, state, action and reward as the basic elements, combined with the electric-thermal comprehensive energy system scheduling model to establish an optimal scheduling model based on the SAC framework;

The model training module is configured to train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework.

The model optimization module is configured to set the action variable, determine the state space variable, and define the penalty mechanism and the reward function, using the same variables and definitions as given for the method above; in the penalty mechanism, $\beta_v$ is the penalty coefficient, a constant set for the corresponding limit-violation penalty.

An embodiment of the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor. When the processor executes the computer program, the foregoing coordinated optimization method for the electric-thermal comprehensive energy system is implemented.

An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the coordinated optimization method for the electric-thermal comprehensive energy system described above.

An embodiment of the present disclosure also provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, the processor of the electronic device executes the foregoing coordinated optimization method for the electric-thermal comprehensive energy system.

This disclosure uses an optimal scheduling model based on the SAC framework to output scheduling actions, and thereby generate a strategy, when a system state such as the load is given. Once the system state is determined, the trained policy network gives the scheduling action directly, without the whole-system iterative solution required by traditional nonlinear methods, so the calculation speed is significantly improved and the calculation efficiency is higher.

In this disclosure, after the electric-thermal comprehensive energy system model is established, an optimal scheduling model based on the SAC framework is established. SAC is a stochastic off-policy algorithm, and its self-optimizing characteristics are used to learn the comprehensive energy optimal scheduling problem autonomously. The quality of a strategy is evaluated by the reward values observed while interacting with the environment, and the algorithm learns to explore all paths to the optimal strategy, so that the optimal cost over the scheduling cycle is reached without supervision. The trained network model avoids the curse of dimensionality caused by discretizing states and actions. Moreover, it converges reliably and has a wide range of applications. For example, it can be applied to scenarios that consider random variations in loads and in the output of renewable energy such as wind power, which makes it more general.

Brief Description of the Drawings

In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The accompanying drawings are incorporated into the specification and constitute a part of it. They show embodiments consistent with the present disclosure and are used together with the specification to illustrate the technical solutions of the embodiments of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope. Those skilled in the art can also derive other related drawings from these drawings.

FIG. 1 shows a schematic flow diagram of a coordinated optimization method for an electric-thermal comprehensive energy system provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic flow diagram of a method for training the pre-trained optimal scheduling model based on the SAC framework provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic structural diagram of an electric-thermal integrated energy system;

FIG. 4 shows a schematic structural diagram of the thermal system;

FIG. 5 shows a detailed algorithm flow chart for training the optimal scheduling model based on the SAC framework;

FIG. 6 shows a schematic structural diagram of a coordinated optimization system for an electric-thermal comprehensive energy system provided by an embodiment of the present disclosure;

FIG. 7 shows a schematic structural diagram of the pre-trained optimal scheduling model based on the SAC framework adopted by an embodiment of the present disclosure;

FIG. 8 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. The components of the disclosed embodiments generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present disclosure.

It should be noted that like numerals and letters denote similar items in the following figures, therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.

In the field of electric-thermal comprehensive energy system optimization, the related art provides a technical solution based on the particle swarm optimization algorithm. Particle swarm optimization is an evolutionary computation technique that, through information interaction with the environment, starts from random initial values of the system and seeks the optimal value through iteration. To implement this scheme in the Integrated Electricity-Heat Energy System (IEHS) optimal dispatching model, it is first necessary to establish the objective function and determine the constraints, such as power grid and heating network power flow constraints, equipment output constraints, and safe operation constraints, and then to use the particle swarm optimization algorithm to solve the IEHS optimization problem.

The solution steps of the particle swarm algorithm are:

1) Setting parameters: number of iterations, number of independent variables, maximum velocity of particles, initial velocity and position of particle swarm;

2) Define the fitness function: determine the optimization goal according to the IEHS optimal scheduling model. In each iteration, the best solution found by a particle is that particle's extreme value, and the global optimal solution is the minimum over all particles; it is compared with the previous global optimal solution and updated according to formula (1):

where i is an integer greater than or equal to 1 that indexes the particles, up to the total number of particles in the swarm; $V_{id}$ is the velocity of the particle; $P_{gd}$ is the extreme value of the particle; $X_{id}$ is the current position of the particle; random(0, 1) is a random number between 0 and 1; $C_1$ and $C_2$ are learning factors; and $\omega$ is a non-negative inertia factor.

3) Conditions for stopping the iteration: the maximum number of iterations is reached or the iteration difference meets the accuracy requirement.
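The three steps above can be sketched in a few lines. Formula (1) is not reproduced in this text, so the update below uses the textbook particle swarm rule (inertia term plus attraction toward the particle's own best position and the global best position), which is assumed to correspond to it. The function name `pso_minimize`, the parameter defaults and the sphere test function are illustrative choices, not taken from the source.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iter=100,
                 omega=0.7, c1=1.5, c2=1.5, x_max=5.0, v_max=1.0, seed=0):
    """Minimize fitness(x) with a basic particle swarm; returns (best_x, best_value)."""
    rng = random.Random(seed)
    # Step 1: set parameters, initial positions and initial velocities.
    x = [[rng.uniform(-x_max, x_max) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]               # best position seen by each particle
    p_val = [fitness(xi) for xi in x]
    g = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g][:], p_val[g]     # best position over the whole swarm

    for _ in range(n_iter):                    # Step 3: stop at the iteration limit
        for i in range(n_particles):
            for d in range(dim):
                # Step 2: velocity update, velocity clamp, position update.
                v[i][d] = (omega * v[i][d]
                           + c1 * rng.random() * (p_best[i][d] - x[i][d])
                           + c2 * rng.random() * (g_best[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                x[i][d] += v[i][d]
            val = fitness(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g_best, g_val = x[i][:], val
    return g_best, g_val


def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(t * t for t in x)


best_x, best_value = pso_minimize(sphere, dim=2)
```

The sketch stops only on the iteration limit; the accuracy-based stopping condition mentioned in step 3 could be added by comparing successive values of the global best.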

The disadvantages of the above algorithm are as follows: (1) the particle swarm optimization algorithm converges poorly and may even fail to converge, and it easily falls into a local optimal solution, which defeats the purpose of multi-energy collaborative optimization; (2) the increasingly tightly coupled electric-thermal integrated energy system is highly nonlinear, so the calculation speed of the particle swarm algorithm drops significantly when solving it, and its calculation efficiency cannot meet the requirements of the economic dispatch problem of the electric-thermal integrated energy system.

In the field of electric-thermal comprehensive energy system optimization, the related art also provides a Q-learning solution, which is realized through the Q-learning algorithm. The Q-learning algorithm is based on the Markov decision process and is a model-free reinforcement learning algorithm. The general steps for optimizing the electric-thermal comprehensive energy system with the Q-learning algorithm are: design the action and state spaces, discretize the continuous action space and state space, and establish the Q-learning reward and punishment mechanism according to the system optimization goals and operating constraints. Through continuous trial-and-error exploration, the agent interacts with the environment and updates the Q-value table, and finally achieves the goal of autonomously selecting the optimal action.

In each training pass over the Q-value table, for a given time t, an action $a_t$ is selected from the Q-value table according to the state $s_t$ at that moment. The action is applied to the environment to obtain an immediate reward, and the state transitions to the next state $s_t'$. According to the Bellman optimality principle, the optimal index corresponding to the optimal strategy is the sum of the immediate reward $r_t$ obtained by the action $a_t$ of the electric-thermal integrated energy system agent at this moment and the maximum Q value $\max_{a'} Q(s_t', a_t')$ obtained from the subsequent state transition. Therefore, the Q-value table can be updated according to the Bellman optimality principle through formula (2):

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r_t + \gamma \max_{a'} Q(s_t', a_t') - Q(s,a)\right] \tag{2}$$

After repeated training, the agent can choose the optimal control action for the electric-thermal integrated energy system according to the input state information and the Q-value table. In formula (2), α is the learning rate and γ is the discount factor; both take values in [0, 1].
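A minimal sketch of one tabular update according to formula (2) is given below. The state labels, the three discretized actions and the numeric values are hypothetical and serve only to make the arithmetic visible.

```python
from collections import defaultdict


def q_update(q_table, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one update of formula (2) to q_table in place; return the new Q(s, a)."""
    best_next = max(q_table[(s_next, a_next)] for a_next in actions)
    td_target = r + gamma * best_next          # r_t + gamma * max_a' Q(s', a')
    q_table[(s, a)] += alpha * (td_target - q_table[(s, a)])
    return q_table[(s, a)]


q = defaultdict(float)          # unseen (state, action) pairs start at 0
actions = [0, 1, 2]             # hypothetical discretized dispatch levels
q[("s1", 2)] = 10.0             # pretend the next state already has a known value
new_value = q_update(q, "s0", 1, r=-4.0, s_next="s1", actions=actions)
```

With these numbers the target is -4.0 + 0.9 × 10.0 = 5.0, so the entry for ("s0", 1) moves one tenth of the way from 0 to 5.0, giving 0.5.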

The disadvantages of the above scheme are:

(1) The action and state spaces of the electric-thermal integrated energy system are mostly continuous intervals. In order to use the Q-learning algorithm, the continuous space needs to be discretized, and calculation based on the discretized action space and state space leads to a significant loss of calculation accuracy. (2) The Q-learning algorithm is not suitable for solving large-scale electric-thermal integrated energy system optimization problems. As the scale of the problem grows, the dimension of the action space and the complexity of the network increase, and the larger action space leads to a substantial increase in the dimension of the Q-value table. An excessively large table makes training harder and weakens the fitting ability of the Q values, which makes it difficult to model a highly complex electric-thermal integrated energy system network.

In the field of electric-thermal comprehensive energy system optimization, the related art also provides a DQN learning solution. DQN is the product of combining deep learning with the decision-making ability of reinforcement learning. By building a deep learning network that learns control policies directly from high-dimensional raw data, DQN extends the practicability of reinforcement learning. When applying DQN to optimize the electric-thermal comprehensive energy system, the action and state spaces are designed first, a deep learning network is constructed to fit the Q value, and an experience replay unit is constructed to store historical samples. The experience replay unit is randomly sampled for each training pass, and the Q network is trained on the sampled data.

The DQN algorithm first obtains an observation from the environment, and the agent obtains all values Q(s, a) for that observation from the value function neural network. The agent then uses the strategy algorithm to make a decision, obtains the action, and receives the feedback reward value r. The reward r is then used to update the parameters of the value function network before the next iteration begins. This is repeated until the network training is completed.

In the above training process, DQN needs to define the corresponding loss function, and use the gradient descent algorithm to update the parameters. By continuously updating the weight parameters of the neural network, the output value of the Q network can gradually approach the optimal Q value. The definition of the loss function is based on the residual model, that is, the square of the difference between the real value and the network output, as shown in formula (3):

In order to reduce correlation and improve the stability of the algorithm, DQN introduces a target Q network in addition to the original Q network. The target network has the same structure and the same initial weights as the Q network, but the parameters of the Q network are updated at every training step, while the parameters of the target Q network are updated only at intervals.

Compared with the Q-learning method, the above scheme is more suitable for continuous control scenarios, but exploring a continuous action space is more complex and more difficult. It is hard for DQN to ensure effective exploration of the state space with a specific strategy algorithm, so it may fall into a local optimal solution.
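The loss and the two-network arrangement can be illustrated with a deliberately small example. Formula (3) is not reproduced in this text; the sketch assumes the usual DQN form, in which the "real value" is the target $r + \gamma \max_{a'} Q_{target}(s', a')$ and the loss is the square of its difference from the Q network output. A linear Q-function stands in for the deep network so that the gradient can be written by hand, and the sizes, learning rate and synchronization interval are arbitrary assumptions.

```python
import numpy as np


def dqn_loss_and_grad(w, w_target, s, a, r, s_next, gamma=0.99):
    """Squared TD error of a linear Q-function Q(s, .) = w @ s and its gradient in w."""
    y = r + gamma * np.max(w_target @ s_next)   # target computed by the frozen network
    td_error = y - (w @ s)[a]
    loss = td_error ** 2
    grad = np.zeros_like(w)
    grad[a] = -2.0 * td_error * s               # d(loss)/d(w[a]); y is held constant
    return loss, grad


rng = np.random.default_rng(0)
w = rng.normal(size=(3, 4))        # 3 discrete actions, 4 state features (assumed)
w_target = w.copy()                # same structure, same initial weights
sync_every, lr = 10, 0.01

s, s_next = rng.normal(size=4), rng.normal(size=4)
losses = []
for step in range(1, 31):
    loss, grad = dqn_loss_and_grad(w, w_target, s, a=1, r=1.0, s_next=s_next)
    losses.append(loss)
    w -= lr * grad                 # Q network: gradient step at every iteration
    if step % sync_every == 0:
        w_target = w.copy()        # target network: refreshed only periodically
```

Between two synchronizations the target stays fixed, so the regression problem seen by the Q network does not move while it is being fitted; this is the stabilizing effect the paragraph above describes. A real implementation would draw a random mini-batch from the experience replay unit instead of reusing a single transition.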

Based on the above problems, the present disclosure adopts the SAC algorithm to solve the economic scheduling problem of the electric-thermal integrated energy system, and proposes a coordinated optimization method for the electric-thermal integrated energy system. FIG. 1 shows a schematic flow chart of the coordinated optimization method for the electric-thermal integrated energy system provided by an embodiment of the present disclosure. As shown in FIG. 1, the method is performed by an electronic device and includes the following steps:

Step 101. Obtain real-time electric-thermal comprehensive energy system parameters.

Step 102. Based on the parameters of the real-time electric-thermal comprehensive energy system, calculate the real-time power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system respectively.

Step 103: Input the real-time generated power into the pre-trained optimal scheduling model based on the SAC framework, output scheduling actions, and form a coordination strategy for the electric-thermal comprehensive energy system.

FIG. 2 shows a schematic flow diagram of a method for training the pre-trained optimal scheduling model based on the SAC framework provided by an embodiment of the present disclosure. As shown in FIG. 2, the process includes:

Step 201. Obtain historical electric-thermal comprehensive energy system parameters.

Step 202. Based on the historical electric-thermal integrated energy system parameters, respectively calculate the historical power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal integrated energy system, and establish the electric-thermal comprehensive energy system dispatching model on the basis of the historical power generation power of the electric power system, the thermal system and the coupling device.

Step 203, taking reinforcement learning environment, state, action and reward as the basic elements, and combining with the electricity-thermal integrated energy system scheduling model to establish an optimal scheduling model based on the SAC framework.

Step 204: Train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework.

The economic scheduling method for the electric-thermal comprehensive energy system based on the SAC algorithm in this disclosure adopts a continuous control strategy and adds maximum entropy to the optimization target. It can interact with the electric-thermal comprehensive energy system to learn and generate an optimal control strategy, and it solves the problems of high-dimensional solution, difficult convergence, and difficult precise optimization in the collaborative optimization of the electric-thermal integrated energy system. Therefore, this disclosure provides strong technical support for the collaborative optimization of integrated energy systems, provides a decision-making basis for integrated energy dispatchers, and is of great significance for realizing multi-energy complementarity, renewable energy consumption, and improved system operating economy.

This disclosure adopts a deep reinforcement learning method to solve the economic scheduling problem of the electric-thermal integrated energy system, and mainly addresses the following problems of traditional methods: 1) The deep reinforcement learning method uses a neural network to fit the optimal strategy of the electric-thermal integrated energy system under different states. After the network is trained, the scheduling strategy can be obtained in real time, whereas traditional nonlinear algorithms require a global optimization each time, so the deep reinforcement learning method improves computational efficiency. 2) Deep reinforcement learning has stronger exploration ability and better convergence stability in the optimal scheduling problem of the electric-thermal integrated energy system, and achieves a lower scheduling cost than intelligent algorithms such as the particle swarm algorithm. 3) The SAC-based deep reinforcement learning economic scheduling method proposed in this disclosure adopts a continuous control strategy, which overcomes the high-dimensional solution difficulty that value-function-based reinforcement learning methods incur by discretizing variables. Maximum entropy is added to the optimization objective so that the various possible optima are explored.
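For reference, the maximum-entropy objective mentioned above has the following form in the standard SAC formulation. The notation ($\rho_\pi$ for the state-action distribution induced by the policy, $\alpha$ for the temperature coefficient, $\mathcal{H}$ for entropy) is taken from the general SAC literature and is an assumption here, since this text does not reproduce the objective used in the disclosure.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a_t \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a_t \mid s_t) \right]
```

The entropy term rewards the policy for remaining stochastic, which is what encourages broader exploration than a purely reward-maximizing objective; a larger $\alpha$ weights exploration more heavily.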

The method for coordinating and optimizing the electric-thermal comprehensive energy system of the invention will be described in detail below in conjunction with specific implementation and accompanying drawings.

The disclosed method includes the following steps:

Step 1. Import the parameters of the electric-thermal comprehensive energy system.

FIG. 3 shows a schematic structural diagram of an electric-thermal comprehensive energy system. The system shown in FIG. 3 includes an electric power system 301 and a thermal system 302 . In the embodiment of the present disclosure, it is necessary to first collect the network parameters of the electrothermal integrated energy system as shown in FIG. 3 , the output of the electrothermal load, and the output of wind power. In the embodiment of the present disclosure, the real-time electric-thermal comprehensive energy system parameters and historical electric-thermal comprehensive energy system parameters include electric-thermal comprehensive energy system network parameters, electric-heat load output and wind power output. The collected data are shown in Table 1.

Table 1 Electric-thermal comprehensive energy parameter table

Based on the real-time electric-thermal integrated energy system parameters, the real-time power generation power of the electric power system, thermal system and coupling device of the electric-thermal integrated energy system can be calculated separately, which is realized by step 2:

Step 2. Establish an electric-thermal comprehensive energy system model. In the present disclosure, the electric-thermal comprehensive energy system is modeled in three parts: the electric power system, the thermal system and the coupling device.

(1) Power system. AC power flow is used as the analysis method for the power system, and the power balance equation of the power system can be expressed as formula (4):

In formula (4), $P_i$ and $Q_i$ are the injected active and reactive power of node i, respectively; $V_i$ is the voltage amplitude of node i; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch ij, respectively; $\theta_{ij}$ is the phase angle difference of branch ij; and the remaining symbol (not reproduced in this text) denotes the set of power system nodes.
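Formula (4) is not reproduced in this text. The symbols defined for it match the standard polar-form AC power flow equations, $P_i = V_i \sum_j V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij})$ and $Q_i = V_i \sum_j V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij})$, so the sketch below evaluates those, on the assumption that this is the intended form. Here `g` and `b` are taken to be the real and imaginary parts of the node admittance matrix, and the two-node example data are hypothetical.

```python
import numpy as np


def power_injections(v, theta, g, b):
    """Active and reactive power injected at each node (polar form).

    v, theta -- voltage magnitudes and phase angles, one entry per node
    g, b     -- real and imaginary parts of the node admittance matrix
    """
    dtheta = theta[:, None] - theta[None, :]        # theta_ij = theta_i - theta_j
    p = v * ((g * np.cos(dtheta) + b * np.sin(dtheta)) @ v)
    q = v * ((g * np.sin(dtheta) - b * np.cos(dtheta)) @ v)
    return p, q


# Two nodes joined by one line with series admittance 1 - 5j (per unit, assumed).
g = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.array([[-5.0, 5.0], [5.0, -5.0]])
v = np.array([1.02, 0.98])
theta = np.array([0.0, -0.1])
p, q = power_injections(v, theta, g, b)
```

In a power flow solver these expressions are compared with the scheduled injections at each node, and the mismatch is driven to zero iteratively; the trained policy described in this disclosure is intended to avoid repeating that iteration inside an optimizer at dispatch time.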

(2) Thermal system. The district heating system uses centralized heating, and FIG. 4 shows the structural diagram of the heating system. As shown in FIG. 4, the heat source of the thermal system generates heat energy, which is transported to the heat load through the water supply pipeline to form the first passage 401; after being cooled at the heat load, the water flows back through the return pipe to form the second passage 402, and the first passage 401 and the second passage 402 form a closed loop. The thermal system is divided into two parts, the hydraulic model and the thermal model:

1) Hydraulic model. The hydraulic model of the thermal system represents the flow of the medium and consists of the flow continuity equation, the loop pressure equation and the pressure head loss equation, as shown in formula (5).

In the formula, A_h is the node-branch incidence matrix; B is the loop-branch incidence matrix; the two flow vectors are the pipeline mass flow rate and the flow rate injected into the nodes, respectively; h_f is the pressure head loss; and K is the damping coefficient of the pipeline.
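The three hydraulic relations can be checked numerically for a candidate flow pattern. The sketch below assumes the commonly used forms A_h m = m_q (flow continuity), B h_f = 0 (loop pressure) and h_f = K m |m| (head loss); these forms, the sign convention of the incidence matrices, and the names `matvec` and `hydraulic_residuals` are assumptions for illustration rather than the content of formula (5).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def hydraulic_residuals(A_h, B, K, m, m_q):
    """Evaluate the assumed hydraulic relations for pipe mass flows m.

    Returns (continuity residual, loop pressure residual, head loss).
    A hydraulically consistent operating point has zero residuals.
    """
    # Head loss per pipe; m * |m| keeps the sign of the flow direction.
    h_f = [k * flow * abs(flow) for k, flow in zip(K, m)]
    continuity = [lhs - rhs for lhs, rhs in zip(matvec(A_h, m), m_q)]
    loop = matvec(B, h_f)
    return continuity, loop, h_f


# Toy network: source node 1 feeds load nodes 2 and 3 through pipes
# p1 (1->2), p2 (2->3) and p3 (1->3); rows of A_h are nodes 2 and 3.
A_h = [[1, -1, 0], [0, 1, 1]]
B = [[1, 1, -1]]          # single loop: p1 + p2 - p3
K = [1, 1, 2]
continuity, loop, h_f = hydraulic_residuals(A_h, B, K, m=[1, 1, 1], m_q=[0, 2])
print(continuity, loop, h_f)  # prints [0, 0] [0] [1, 1, 2]
```

In a solver these residuals would be driven to zero by iterating on m; here they are only evaluated.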

2) Thermal model. The thermal model represents the energy transfer process and consists of the node power equation, the pipe temperature drop equation and the node medium mixing equation, as shown in formula (6):

In the formula, H_i is the injected heat power of node i; C_p is the specific heat capacity of water; T_s,i and T_o,i are the supply water temperature and the outlet water temperature at node i, respectively; the subscript ij denotes the heating network pipeline branch with i and j as its head and end nodes; T_i(ij) and T_j(ij) are the temperatures at the i end and the j end of the branch, respectively; and T_e is the external ambient temperature.
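The three relations of the thermal model can be sketched as small functions. The forms below are the ones commonly used for district heating networks and are assumptions here, not a copy of formula (6); in particular the heat loss coefficient `loss_coeff` and the pipe length `length` are hypothetical parameters that do not appear in the symbol list above.

```python
import math


def node_heat_power(c_p, m_q, t_supply, t_outlet):
    """Node power equation (assumed form): H_i = C_p * m_q * (T_s,i - T_o,i)."""
    return c_p * m_q * (t_supply - t_outlet)


def pipe_end_temperature(t_start, t_env, c_p, m, loss_coeff, length):
    """Pipe temperature drop (assumed exponential form).

    T_end = (T_start - T_e) * exp(-loss_coeff * length / (C_p * m)) + T_e
    """
    return (t_start - t_env) * math.exp(-loss_coeff * length / (c_p * m)) + t_env


def mixed_temperature(flows, temperatures):
    """Node medium mixing (assumed form): flow-weighted mean of inflow temperatures."""
    return sum(f * t for f, t in zip(flows, temperatures)) / sum(flows)
```

With `loss_coeff = 0` the pipe is perfectly insulated and the end temperature equals the start temperature, which is a convenient limiting case for checking an implementation.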

(3) Coupling device

Exemplarily, the combined heat and power (CHP) unit is an extraction-condensing unit whose operating point lies within a polygonal region. The power generation and heat generation of the coupling device can be expressed by formula (7):

In the formula, the two operating variables are the electric output and the thermal output of the i-th extraction-condensing unit in period t; the electric output is bounded by its upper and lower limits; and α₁, α₂ and α₃ are the coefficients that describe the polygonal region, which are constants for a given cogeneration device.

Exemplarily, establishing an electric-thermal integrated energy system dispatching model based on the historical power generation power of the electric power system, the thermal system, and the coupling device can be realized through the following steps:

Step 2-1, establishing the objective function.

The goal is to minimize the total operating cost of the electric-thermal integrated energy system. At the same time, to maximize the consumption of renewable energy, the part of the renewable energy that is not consumed is included as a penalty term. The resulting objective function can be shown as formula (8):

min F = f₁ + f₂ + f₃ (8)

In the formula, f₁ is the operating cost of the conventional units, f₂ is the operating cost of the combined heat and power units, and f₃ is the wind curtailment penalty.

1) The operating cost f₁ of the conventional units can be obtained by formula (9):

In the formula, the decision variable is the generating power of each conventional unit; b₀, b₁ and b₂ are the energy consumption coefficients of the conventional units; N^G is the number of conventional units; T is the scheduling period; and Δt is the scheduling time interval.

2) The operating cost f₂ of the combined heat and power units can be obtained by formula (10):

In the formula, the variables are the power generation power and the heat production power of the cogeneration device connected to node i in period t; a₀, a₁, a₂, a₃, a₄ and a₅ are the energy consumption coefficients of the cogeneration device; and N^chp is the number of combined heat and power units.

3) The wind curtailment penalty f₃ can be calculated by formula (11):

In the formula, the variable is the output of the wind turbines connected to node i in period t, and k is the wind curtailment penalty coefficient, which is a constant.
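To make the structure of the objective in formula (8) concrete, the sketch below evaluates the three cost terms for a given schedule. The quadratic form of f₁, the bivariate quadratic form of f₂, the pairing of each coefficient with a particular term, and the definition of curtailed wind as available output minus dispatched output are all assumptions for illustration; formulas (9) to (11) of the disclosure may differ in detail. Schedules are indexed as `x[t][i]` (period, then unit).

```python
def conventional_cost(p_g, b, dt):
    """f1 (assumed form): sum over t, i of (b0 + b1*P + b2*P^2) * dt."""
    b0, b1, b2 = b
    return sum((b0 + b1 * p + b2 * p * p) * dt for period in p_g for p in period)


def chp_cost(p_chp, h_chp, a, dt):
    """f2 (assumed form): quadratic in electric output P and heat output H."""
    a0, a1, a2, a3, a4, a5 = a
    total = 0.0
    for p_row, h_row in zip(p_chp, h_chp):
        for p, h in zip(p_row, h_row):
            total += (a0 + a1 * p + a2 * h + a3 * p * p + a4 * h * h + a5 * p * h) * dt
    return total


def curtailment_penalty(p_w_max, p_w, k, dt):
    """f3 (assumed form): k times the wind energy that was available but unused."""
    return k * sum(
        (avail - used) * dt
        for avail_row, used_row in zip(p_w_max, p_w)
        for avail, used in zip(avail_row, used_row)
    )


def total_objective(p_g, p_chp, h_chp, p_w_max, p_w, b, a, k, dt):
    """F = f1 + f2 + f3, as in formula (8)."""
    return (
        conventional_cost(p_g, b, dt)
        + chp_cost(p_chp, h_chp, a, dt)
        + curtailment_penalty(p_w_max, p_w, k, dt)
    )
```

Keeping the three terms as separate functions makes it straightforward to reuse them when the reward function is assembled later.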

Step 2-2. Establish constraints on the scheduling model of the electric-thermal integrated energy system.

Among them, the constraints include node power balance equation constraints, network security constraints, combined heat and power device constraints, renewable energy constraints, and conventional unit output constraints.

1) Node power balance constraints. Formula (12) and formula (13) express the network node active power balance equations.

In these formulas, the node sets are the sets of power system nodes and thermal system nodes, T is the scheduling period, and the load terms are the electric load power and the thermal load power of node i in period t.
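A node balance of this kind can be written as a net injection per node. The sketch below assumes that the electric injection is conventional generation plus CHP generation plus wind output minus electric load, and that the heat injection is CHP heat output minus heat load; this decomposition and the function name `node_injections` are assumptions for illustration, not the content of formulas (12) and (13).

```python
def node_injections(p_gen, p_chp, p_wind, p_load, h_chp, h_load):
    """Return (P_i, H_i): net electric and heat power injected at one node
    in one period, under the assumed balance described above."""
    p_inj = p_gen + p_chp + p_wind - p_load
    h_inj = h_chp - h_load
    return p_inj, h_inj


# A node with 50 of conventional generation, 20 of CHP electricity, 10 of wind
# and 70 of electric load; the CHP also supplies 30 of heat against 25 of load.
p_i, h_i = node_injections(50.0, 20.0, 10.0, 70.0, 30.0, 25.0)
```

These injections are the quantities that would enter the power flow and thermal network equations as P_i and H_i.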

2) Network security constraints

In order to realize the safe and reliable operation of the electric-thermal integrated energy system, the system should satisfy the network security constraints of formula (14) to formula (16).

In the formulas, V_i,max and V_i,min are the upper and lower limits of the voltage amplitude of node i, respectively; T_sj is the temperature of the hot water flowing into node j of the heating network, which is bounded by the upper and lower limits of the water supply temperature; m_jk is the mass flow rate of hot water pipe k; and m_k,max and m_k,min are the upper and lower limits of the mass flow rate, respectively.

3) Constraints of the cogeneration device

The cogeneration unit should meet the ramping constraints, which can be shown in formula (17):

In the formula, the two power terms are the power generation of the combined heat and power unit in two consecutive periods, and the bounds are the upper and lower limits of the ramp rate of the cogeneration unit.

4) Renewable energy constraints can be shown in formula (18):

In the formula, the bounded variable is the power generation power of wind turbine i in period t, and the bound is its maximum output value.

5) The conventional unit output constraints can be shown in formula (19):

At the same time, the ramping constraints shown in formula (20) are satisfied:

In the formulas, the bounds in formula (19) are the upper and lower limits of the unit output, and the bounds in formula (20) are the upper and lower limits of the ramp rate of the unit.
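Output and ramping constraints of this kind are simple to verify for a candidate schedule. The sketch below assumes box limits on the output and limits on the change between two consecutive periods, with the lower ramp limit given as a negative number; whether the ramp limits of the disclosure are scaled by the time interval is not stated, so they are treated here as per-period limits. The function name is hypothetical.

```python
def satisfies_output_and_ramp(p, p_min, p_max, ramp_min, ramp_max):
    """Check one unit's schedule p[t] against the assumed constraints:
      p_min <= p[t] <= p_max                  (output limits)
      ramp_min <= p[t] - p[t-1] <= ramp_max   (ramping limits)
    """
    within_output = all(p_min <= x <= p_max for x in p)
    within_ramp = all(
        ramp_min <= cur - prev <= ramp_max for prev, cur in zip(p, p[1:])
    )
    return within_output and within_ramp


ok = satisfies_output_and_ramp([10, 12, 15], 5, 20, -3, 3)    # True
too_fast = satisfies_output_and_ramp([10, 14], 5, 20, -3, 3)  # False: step of 4
```

The same check applies to the cogeneration unit of formula (17) by passing its electric output schedule and ramp limits.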

Step 3. Establish an optimal dispatching model of the electric-thermal coupling integrated energy system based on SAC.

Taking the four basic elements of reinforcement learning (environment, state, action and reward) and combining them with the electric-thermal comprehensive energy system scheduling model, establishing the optimal scheduling model based on the SAC framework can include:

1) Action space. The action space variables correspond to the control variables of the system under study. The generating power of the conventional unit, the generating power of the cogeneration device and the thermal power of the cogeneration device are set as the action variables, as shown in formula (21). In formula (21), P_i^G and P_i^chp are the generating power of the conventional unit and the generating power of the combined heat and power device, respectively, and the remaining component is the thermal power of the combined heat and power device.

2) State space. The state space variables correspond to the state variables of the system under study. The state space variables are shown in formula (22), where P_i^G, P^load, P^w, P_i^chp, H^load and T_e are the generating power of the conventional units, the electric load, the wind power, the generating power of the cogeneration device, the heat load and the ambient temperature, respectively, and the remaining component is the thermal power of the cogeneration device:

3) Environment construction. A reinforcement learning environment is built. The current action obtained from the policy network is applied to the environment, which returns the immediate reward and the state of the next period; the environment thus performs the state transition and provides the reward for policy evaluation.

4) Reward function. The goal of reinforcement learning is to maximize the long-term reward, and the negative of the optimization objective is taken as the immediate reward. In addition, a penalty mechanism is set according to the constraints and added to the immediate reward to obtain the final reward function. The unified expression of the penalty terms can be shown in formula (23) to formula (24):

In the formulas, β_v is the penalty coefficient; a corresponding constant coefficient is set for each type of limit violation penalty.

The reward function includes the operating cost of the conventional units, the wind curtailment penalty, the operating cost of the cogeneration units, and the penalties for variables exceeding their limits, which can be shown in formula (25).

In the formula, f₁, f₂ and f₃ are the operating cost of the conventional units, the operating cost of the combined heat and power units, and the wind curtailment penalty, respectively. The penalty terms comprise the output over-limit penalty and the ramping over-limit penalty of the conventional units, the system node voltage over-limit penalty φ_V, the output over-limit penalty and the ramping over-limit penalty of the cogeneration units, the system node temperature over-limit penalty φ_T, and the system pipeline mass flow rate over-limit penalty φ_m.
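The way costs and limit-violation penalties combine into one reward can be sketched as follows. The sketch assumes a continuous penalty that grows linearly with the amount by which a variable leaves its allowed range, and a reward equal to the negative of the total cost minus the sum of all penalties; both the linear form and the sign convention are assumptions for illustration and may differ from formulas (23) to (25). The function names are hypothetical.

```python
def limit_violation(x, lower, upper):
    """Amount by which x lies outside [lower, upper]; zero when inside."""
    return max(0.0, lower - x) + max(0.0, x - upper)


def penalty(x, lower, upper, beta):
    """Assumed penalty form: coefficient beta times the size of the violation."""
    return beta * limit_violation(x, lower, upper)


def reward(f1, f2, f3, penalties):
    """Assumed reward: negative of the objective, minus all penalty terms."""
    return -(f1 + f2 + f3) - sum(penalties)


# A unit output of 12 against limits [0, 10] with beta = 10, plus a flow
# that stays within its limits and therefore contributes no penalty.
phi_output = penalty(12.0, 0.0, 10.0, 10.0)
phi_flow = penalty(5.0, 0.0, 10.0, 10.0)
r = reward(1.0, 2.0, 3.0, [phi_output, phi_flow])
```

A continuous penalty of this kind gives the agent a gradient toward the feasible region, whereas a step-shaped penalty only signals that a limit was crossed.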

Step 4, SAC training process.

Exemplarily, the optimal scheduling model based on the SAC framework is trained to obtain a pre-trained optimal scheduling model based on the SAC framework, which can be implemented in the following manner:

First, the policy network φ and the evaluator Q networks of the optimal scheduling agent are initialized. FIG. 5 shows the flow chart of the algorithm for training the optimal scheduling model based on the SAC framework. As shown in FIG. 5, the process may include the following steps:

Step 501, θ₁ and θ₂ are initialized.

Exemplarily, the target networks of the evaluator can be assigned values at this point, namely by copying θ₁ and θ₂ into the corresponding target network parameters.

Step 502, setting the capacity of the experience database D.

Step 503, initialize t and g.

Exemplarily, both t and g may be initialized to be 0.

Step 504, judging whether t is in T.

Exemplarily, if t is in T, execute step 505 to step 507; if t is not in T, execute step 508.

Step 505: Sampling the control action and applying the action to the environment to obtain the running state at the next moment.

Step 506, put the state transition and rewards into the experience database D.

Step 507, t is incremented.

Step 508, judging whether g is in G.

Exemplarily, if g is in G, execute steps 509 to 510, and if g is not in G, execute step 511.

Step 509, update the evaluator Q network, the actor policy network, the temperature coefficient, and the target network.

Step 510, g is incremented.

Step 511, judging whether the variation range of the average reward over m₀ consecutive training rounds is less than δ_e%.

If not, return to step 503; if yes, execute step 512.

Step 512, end.
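The control flow of steps 503 to 512 can be sketched as a pair of nested loops with a stopping test. The sketch assumes that training ends once the reward has stabilised, and that "variation range" means the spread of the last m₀ round-average rewards relative to their mean; both readings are assumptions. The callables `collect_step` and `update_step` and the safeguard `max_rounds` are hypothetical placeholders for the environment interaction and the network updates.

```python
def has_converged(history, m0, delta_pct):
    """Assumed criterion: relative spread of the last m0 averages below delta_pct %."""
    if len(history) < m0:
        return False
    recent = history[-m0:]
    mean = sum(recent) / m0
    spread = max(recent) - min(recent)
    if mean == 0:
        return spread == 0
    return spread / abs(mean) * 100.0 < delta_pct


def train(collect_step, update_step, T, G, m0, delta_pct, max_rounds=1000):
    """Run training rounds until the convergence test passes.

    collect_step(t) performs one environment interaction and returns its reward;
    update_step(g) performs one gradient update. Returns the per-round average
    rewards.
    """
    history = []
    for _ in range(max_rounds):
        # Steps 503-507: reset t, then interact with the environment for T periods.
        rewards = [collect_step(t) for t in range(T)]
        # Steps 508-510: run G gradient updates.
        for g in range(G):
            update_step(g)
        history.append(sum(rewards) / T)
        # Step 511: stop once the average reward has stabilised (step 512).
        if has_converged(history, m0, delta_pct):
            break
    return history
```

For example, with a constant reward of -1.0 per step and m0 = 4, the loop stops after exactly four rounds, because four identical averages have zero spread.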

Exemplarily, until the variation range of the average reward over m₀ consecutive training rounds falls below δ_e%, the following is repeated. For each time period, a control action a_t ~ π_φ(a_t|S_t) is sampled from the actor policy network and applied to the electric-thermal integrated energy system, the wind power uncertainty is sampled, and the operating state S_{t+1} of the system at the next moment is obtained; the state transition and the reward are then put into the experience database D, that is, D ← D ∪ {(S_t, a_t, r(S_t, a_t), S_{t+1})}. For each gradient update, the Adam algorithm is used to update the evaluator Q network, the actor policy network, the temperature coefficient and the target network.

After the evaluator Q network, the actor policy network φ, the temperature coefficient and the target network have been updated, the trained policy network is obtained and used as the optimal scheduling model based on the SAC framework.
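Two pieces of this procedure are easy to illustrate without a neural network library: the experience database D and the target network update. The sketch below assumes a fixed-capacity first-in-first-out buffer with uniform sampling, and the soft (Polyak) target update commonly used with SAC, in which a hypothetical coefficient `tau` blends the online parameters into the target parameters. The update formulas of the disclosure are not reproduced here, and the class and function names are illustrative.

```python
import random
from collections import deque


class ExperienceDatabase:
    """Fixed-capacity store of transitions (S_t, a_t, r, S_{t+1})."""

    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions uniformly at random."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau):
    """Assumed target update: target <- tau * online + (1 - tau) * target."""
    return [
        tau * w + (1.0 - tau) * w_bar
        for w, w_bar in zip(online_params, target_params)
    ]


D = ExperienceDatabase(capacity=2)
for step in range(3):
    D.add(state=step, action=0.0, reward=-1.0, next_state=step + 1)
new_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5)
```

After the three insertions the buffer holds only the two most recent transitions, which is the behaviour expected of a capacity-limited experience database.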

Exemplarily, the trained policy network can directly give scheduling actions and generate policies when the system state such as load is given.

Alternative implementations are possible. For example, the cost of the generators and the combined heat and power units can be calculated with a linear model instead, but this reduces the accuracy of the calculation results. The CHP unit can be modeled with a constant heat-to-power ratio, but its control flexibility and calculation accuracy are not as good as those of the polygonal region model. The penalty function in the reward and punishment mechanism can be established as a step function, but neural networks have difficulty fitting step functions or noise, which reduces the solution accuracy. The training can use stochastic gradient descent (SGD) instead of Adam, but practice shows that the Adam algorithm performs better.

Fig. 6 shows a schematic structural diagram of an electric-thermal comprehensive energy system coordination and optimization system 6 provided by an embodiment of the present disclosure. As shown in Fig. 6, the system includes:

The first parameter acquisition module 601 is configured to acquire real-time electric-thermal comprehensive energy system parameters;

The power calculation module 602 is configured to calculate the real-time power generation power of the power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system respectively based on the real-time electric-thermal comprehensive energy system parameters;

The scheduling output module 603 is configured to input real-time generated power into the pre-trained optimal scheduling model based on the SAC framework, output scheduling actions, and form a coordination strategy for the electric-thermal comprehensive energy system.

Wherein, FIG. 7 shows a schematic structural diagram of a pre-trained SAC framework-based optimal scheduling model 7 adopted by an embodiment of the present disclosure. As shown in FIG. 7 , the pre-trained SAC framework-based optimal scheduling model 7 in the scheduling output module includes:

The second parameter acquisition module 701 is configured to acquire historical electric-thermal comprehensive energy system parameters;

The model building module 702 is configured to separately calculate the historical power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system based on the historical electric-thermal comprehensive energy system parameters, and to establish the electric-thermal integrated energy system dispatching model based on the historical power generation power of the electric power system, the thermal system and the coupling device;

The model optimization module 703 is configured to use the reinforcement learning environment, state, action and reward as basic elements, and combine the electric-thermal integrated energy system scheduling model to establish an optimal scheduling model based on the SAC framework;

The model training module 704 is configured to train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework.

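The model optimization module 703 is configured to set the action variable as shown in formula (21), whose components are, in turn, the generating power P_i^G of the conventional unit, the thermal power of the combined heat and power device, and the generating power P_i^chp of the combined heat and power device;

to determine the state space variable as shown in formula (22), whose components are, in turn, the generating power P_i^G of the conventional unit, the electric load P^load, the wind power generating power P^w, the generating power P_i^chp of the cogeneration device, the heat load H^load, the thermal power of the cogeneration device, and the ambient temperature T_e;

and to set the reinforcement learning goal to maximize the long-term reward, take the negative of the optimization objective as the immediate reward, and add a penalty mechanism set according to the constraints to the immediate reward to obtain the final reward function, where the penalty mechanism is shown in formula (23) to formula (24).

The reward function is shown in formula (25).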

In some embodiments, the real-time electric-thermal integrated energy system parameters and historical electric-thermal integrated energy system parameters include electric-thermal integrated energy system network parameters, electric-heat load output, and wind power output.

In some embodiments,

The power calculation module 602 is configured to determine the AC power flow as an analysis method of the power system; wherein, the power balance equation of the power system is formula (4);

The power calculation module 602 is configured to determine that the thermal system is composed of the hydraulic model and the thermal model, and that the hydraulic model of the thermal system is composed of the flow continuity equation, the loop pressure equation and the pressure head loss equation; the hydraulic model is shown in formula (5).

The power calculation module 602 is configured to determine that the thermal model consists of node power equations, pipe temperature drop equations and node medium mixing equations; the thermal model is shown in formula (6).

The power calculation module 602 is configured to determine the electricity and heat generation power of the coupling device as formula (7).

Exemplarily, the model building module 702 is configured to establish the objective function with the aim of minimizing the total operating cost of the electric-thermal integrated energy system while, to maximize the consumption of renewable energy, taking the unconsumed part of the renewable energy as a penalty term; and to establish the constraints of the electric-thermal integrated energy system scheduling model, where the constraints include node power balance equation constraints, network security constraints, cogeneration device constraints, renewable energy constraints and conventional unit output constraints.

Exemplarily, the objective function is shown in formula (8); the operating cost of the conventional units is shown in formula (9); the operating cost of the combined heat and power units is shown in formula (10); and the wind curtailment penalty is shown in formula (11).

Exemplarily, the node power balance equation constraint condition is based on the network node active power balance equation, as shown in formula (12) to formula (13).

Exemplarily, the network security constraints are shown in equations (14) to (16).

Exemplarily, the constraint of the cogeneration device is shown in formula (17).

Exemplarily, the renewable energy constraints are shown in formula (18).

Exemplarily, the conventional unit output constraint is shown in formula (19).

Exemplarily, the conventional units also satisfy the ramping constraint shown in formula (20).

Exemplarily, the model optimization module 703 is configured to: take the generating power of the conventional unit, the generating power of the cogeneration device and the thermal power of the cogeneration device as the action variables shown in formula (21); take the generating power of the conventional unit, the electric load, the wind power, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device and the ambient temperature as the state space variables shown in formula (22); build a reinforcement learning environment in which the current action obtained from the policy network is applied to the environment, the immediate reward and the state of the next period are returned, and the reward is provided for policy evaluation; and set the goal of reinforcement learning to maximize the long-term reward, take the negative of the optimization objective as the immediate reward, and add a penalty mechanism set according to the constraints to the immediate reward to obtain the final reward function. The unified expression of the penalty terms is shown in formula (23) to formula (24); the reward function is shown in formula (25) and includes the operating cost of the conventional units, the wind curtailment penalty, the operating cost of the cogeneration units, and the penalties for variables exceeding their limits.

Exemplarily, the model training module 704 is configured to assign values to the target network of the evaluator and set the capacity of the experience database D;

until the variation range of the average reward over m₀ consecutive training rounds falls below δ_e%, for each time period, to sample the control action a_t ~ π_φ(a_t|S_t) from the actor policy network, apply the control action to the electric-thermal integrated energy system, sample the wind power uncertainty to obtain the operating state S_{t+1} of the system at the next moment, and put the state transition and the reward into the experience database D; and to update the evaluator Q network, the actor policy network φ, the temperature coefficient and the target network, so as to obtain the trained policy network, which is used as the optimal scheduling model based on the SAC framework.

An embodiment of the present disclosure also provides an electronic device. FIG. 8 shows a schematic structural diagram of an electronic device 8 provided by an embodiment of the present disclosure. As shown in FIG. 8, the electronic device includes a memory 801, a processor 802, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the coordinated optimization method for the electric-thermal comprehensive energy system provided in any one of the previous embodiments is implemented.

An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the coordinated optimization method for the electric-thermal comprehensive energy system described in any one of the previous embodiments is implemented.

An embodiment of the present disclosure also provides a computer program including computer-readable code. When the computer-readable code runs in an electronic device, the processor of the electronic device executes the coordinated optimization method for the electric-thermal comprehensive energy system provided in any preceding embodiment.

Wherein, the memory 801 may include random access memory (Random Access Memory, RAM), read-only memory (Read Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable read-only memory (Erasable Programmable Read-Only Memory, EPROM), Electric Erasable Programmable Read-Only Memory (EEPROM), etc.

The processor 802 may be an integrated circuit chip with signal processing capabilities. The processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (Network Processor, NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or execute the various methods, steps and logic block diagrams disclosed in the embodiments of the present disclosure. A general-purpose processor may be a microprocessor, or any conventional processor, or the like.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.

The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over at least one network, such as the Internet, a local area network, a wide area network, or a wireless network. A network adapter card or a network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.

Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In some embodiments, electronic circuits, such as programmable logic circuits, FPGAs, or programmable logic arrays (Programmable Logic Arrays, PLAs), can be customized by using state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

If the functions are realized in the form of software function units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk or an optical disk.

Finally, it should be noted that the above-mentioned embodiments are only specific implementations of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present disclosure, any person familiar with the technical field can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and should be included within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be defined by the protection scope of the claims.

Industrial Applicability

An embodiment of the present disclosure provides a coordinated optimization method, system, device, medium and program for an electric-thermal comprehensive energy system, wherein the method is executed by an electronic device and includes: acquiring real-time electric-thermal comprehensive energy system parameters; based on the real-time electric-thermal integrated energy system parameters, calculating the real-time power generation power of the power system, the thermal system and the coupling device of the electric-thermal integrated energy system respectively; and inputting the real-time power generation power into the pre-trained optimal dispatching model based on the SAC framework and outputting the scheduling action to form a coordination strategy for the electric-thermal integrated energy system. The present disclosure can directly give scheduling actions through the trained policy network without performing the traditional nonlinear overall iterative solution, so that the calculation speed is significantly improved.

Claims

A coordinated optimization method for an electric-thermal comprehensive energy system, wherein the method is executed by an electronic device and includes:

Obtaining real-time electric-thermal comprehensive energy system parameters;

Based on the real-time electric-thermal comprehensive energy system parameters, calculating the real-time power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system respectively;

Inputting the real-time power generation power into a pre-trained optimal scheduling model based on the soft actor-critic (SAC) framework, outputting the scheduling action, and forming the coordination strategy for the electric-thermal comprehensive energy system;

wherein the training method of the pre-trained optimal scheduling model based on the SAC framework comprises:

Obtaining historical electric-thermal comprehensive energy system parameters;

Based on the historical electric-thermal integrated energy system parameters, calculating the historical power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal integrated energy system respectively, and establishing the electric-thermal integrated energy system scheduling model based on the historical power generation power of the electric power system, the thermal system and the coupling device;

Taking the reinforcement learning environment, state, action and reward as basic elements, and combining the electric-thermal comprehensive energy system scheduling model to establish an optimal scheduling model based on the SAC framework;

Training the optimal scheduling model based on the SAC framework to obtain the pre-trained optimal scheduling model based on the SAC framework;

wherein taking the reinforcement learning environment, state, action and reward as basic elements and combining the electric-thermal comprehensive energy system scheduling model to establish the optimal scheduling model based on the SAC framework includes:

Setting the action variable as:

where the components are, in turn, the generating power P_i^G of the conventional unit, the thermal power of the combined heat and power device, and the generating power P_i^chp of the combined heat and power device;

Determining the state space variable as:

where the components are, in turn, the generating power P_i^G of the conventional unit, the electric load P^load, the wind power generating power P^w, the generating power P_i^chp of the cogeneration device, the heat load H^load, the thermal power of the cogeneration device, and the ambient temperature T_e;

Building a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment, the immediate reward and the state of the next period are returned, and the reward is provided for policy evaluation;

Setting the reinforcement learning goal to maximize the long-term reward, taking the negative of the optimization objective as the immediate reward, and adding a penalty mechanism set according to the constraints to the immediate reward to obtain the final reward function; wherein the penalty mechanism is:

where β_v is the penalty coefficient, and a corresponding constant coefficient is set for each limit violation penalty;

The reward function is:

where f₁, f₂ and f₃ are the operating cost of the conventional units, the operating cost of the combined heat and power units, and the wind curtailment penalty, respectively; the penalty terms comprise the output over-limit penalty and the ramping over-limit penalty of the conventional units, the system node voltage over-limit penalty φ_V, the output over-limit penalty and the ramping over-limit penalty of the combined heat and power units, the system node temperature over-limit penalty φ_T, and the system pipeline mass flow rate over-limit penalty φ_m.
The coordinated optimization method for the electric-thermal integrated energy system according to claim 1, wherein the real-time electric-thermal integrated energy system parameters and the historical electric-thermal integrated energy system parameters include the network parameters of the electric-thermal integrated energy system, the electric and heat loads, and the wind power output.
The coordinated optimization method for the electric-thermal integrated energy system according to claim 1, wherein calculating, based on the real-time electric-thermal integrated energy system parameters, the real-time generation power of the power system, the thermal system and the coupling device of the electric-thermal integrated energy system, respectively, includes:

Determine AC power flow as the analysis method of the power system, wherein the power balance equation of the power system is:

where $P_i$ and $Q_i$ are the active and reactive power injected at node i, respectively; $V_i$ is the voltage amplitude of node i; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch ij, respectively; $\theta_{ij}$ is the phase angle difference between nodes i and j; and the summation runs over the set of power system nodes;
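For reference, the standard polar-form AC power flow balance that is consistent with the symbols defined above can be written as follows. The formula itself is not reproduced in this text, so this is the textbook form rather than a quotation; the set symbol $\mathcal{N}$ for the power system nodes is an assumed notation, and $G_{ij}$, $B_{ij}$ are read here as the corresponding admittance-matrix entries.

```latex
\begin{aligned}
P_i &= V_i \sum_{j \in \mathcal{N}} V_j \left( G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij} \right),\\
Q_i &= V_i \sum_{j \in \mathcal{N}} V_j \left( G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij} \right).
\end{aligned}
```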

Determine that the thermal system model consists of a hydraulic model and a thermodynamic model, the hydraulic model consisting of the flow continuity equation, the loop pressure equation and the head loss equation; the hydraulic model of the thermal system is:

where $A_h$ is the node-branch incidence matrix; $B$ is the loop-branch incidence matrix; the flow variables are the pipeline mass flow rate and the node injection flow; $h_f$ is the pressure head loss; and $K$ is the resistance coefficient of the pipeline;
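A commonly used form of the three hydraulic equations named above is sketched below. The formula is not reproduced in this text, so the symbols $\dot{m}$ (vector of pipeline mass flow rates) and $\dot{m}_q$ (vector of node injection flows) are assumed notation, and the quadratic head loss law is the usual textbook choice rather than a confirmed quotation.

```latex
\begin{aligned}
A_h \dot{m} &= \dot{m}_q && \text{(flow continuity)}\\
B h_f &= 0 && \text{(loop pressure)}\\
h_f &= K \dot{m} \lvert \dot{m} \rvert && \text{(head loss, element-wise)}
\end{aligned}
```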

Determine that the thermodynamic model consists of the node power equation, the pipeline temperature drop equation and the node medium mixing equation; the thermodynamic model is:

where $H_i$ is the thermal power injected at node i; $C_p$ is the specific heat capacity of water; $T_{s,i}$ and $T_{o,i}$ are the supply water temperature and the outlet water temperature at node i; the subscript ij denotes the heating network pipe branch with i and j as its head and end nodes; $T_{i(ij)}$ and $T_{j(ij)}$ are the temperatures at the i end and the j end of that branch; and $T_e$ is the external ambient temperature;
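By way of illustration only, the three thermodynamic relations named above can be sketched in Python as follows. The formulas are not reproduced in this text, so the exponential temperature drop law, the heat transfer coefficient `lam`, the pipe length `length` and all function names are assumptions based on the usual district heating model, not a statement of the claimed equations.

```python
import math


def node_thermal_power(c_p, m_q, t_supply, t_outlet):
    """Node power equation: H_i = C_p * m_q * (T_s,i - T_o,i)."""
    return c_p * m_q * (t_supply - t_outlet)


def pipe_end_temperature(t_start, t_env, lam, length, c_p, m):
    """Pipeline temperature drop: the water temperature decays towards ambient T_e.

    lam (W/(m*K)) and length (m) are assumed pipe parameters.
    """
    return (t_start - t_env) * math.exp(-lam * length / (c_p * m)) + t_env


def mixed_temperature(flows, temps):
    """Node mixing: flow-weighted average of the incoming water temperatures."""
    return sum(m * t for m, t in zip(flows, temps)) / sum(flows)


# Hypothetical values: water at about 4182 J/(kg*K), 2 kg/s, 90 degC supply, 50 degC outlet.
h_i = node_thermal_power(4182.0, 2.0, 90.0, 50.0)
t_end = pipe_end_temperature(90.0, 10.0, 0.5, 500.0, 4182.0, 2.0)
print(h_i, t_end)
```

The mixing function assumes that the total outgoing flow equals the total incoming flow at the node, which is what the flow continuity equation of the hydraulic model enforces.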

Determine the electric and thermal generation power of the coupling device as:

where the variables are, respectively, the electric output and the thermal output of the i-th condensing unit in period t, together with the upper and lower limits of its electric output; $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the coefficients describing the polygonal operating region, and are constants for a given cogeneration device.
The coordinated optimization method for the electric-thermal integrated energy system according to claim 1, wherein establishing the scheduling model of the electric-thermal integrated energy system on the basis of the historical generation power of the power system, the thermal system and the coupling device includes:

Establish an objective function that targets the minimum total operating cost of the electric-thermal integrated energy system while maximizing the consumption of renewable energy, with the unconsumed part of the renewable energy included as a penalty term;

Establish the constraints of the electric-thermal integrated energy system scheduling model, the constraints including: node power balance equation constraints, network security constraints, cogeneration device constraints, renewable energy constraints, and conventional unit output constraints.
The coordinated optimization method for the electric-thermal integrated energy system according to claim 4, wherein the objective function is: min F = f 1 + f 2 + f 3, where f 1 is the operating cost of the conventional unit, f 2 is the operating cost of the combined heat and power unit, and f 3 is the wind curtailment penalty; the operating cost f 1 of the conventional unit is:

where the power variable is the generating power of the conventional unit; b 0 , b 1 and b 2 are the energy consumption coefficients of the conventional unit; N G is the number of conventional units; T is the scheduling period; and △t is the scheduling time interval;

The operating cost f 2 of the combined heat and power unit is:

where the variables are the generation power and the heat production power of the cogeneration device connected to node i in period t; a 0 , a 1 , a 2 , a 3 , a 4 and a 5 are the energy consumption coefficients of the cogeneration device; and N chp is the number of cogeneration devices;

The wind curtailment penalty f 3 is:

where the power variable denotes the output of the wind turbine connected to node i in period t, and k is the wind curtailment penalty coefficient, which is a constant.
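As a non-limiting sketch, the three cost terms of the objective F = f1 + f2 + f3 could be evaluated as below. The cost formulas are not reproduced in this text, so the quadratic form of f1, the pairing of a0 to a5 with the terms of f2, the curtailment penalty as k times the unused wind power, the sharing of one coefficient set across units, and all names and numbers are assumptions for illustration.

```python
def conventional_cost(p_g, b, dt):
    """f1: sum over periods t and units i of (b0 + b1*P + b2*P^2) * dt.

    p_g[t][i] is the generating power of conventional unit i in period t.
    """
    b0, b1, b2 = b
    return sum((b0 + b1 * p + b2 * p * p) * dt for row in p_g for p in row)


def chp_cost(p_chp, h_chp, a, dt):
    """f2: assumed quadratic cost in electric power P and thermal power H."""
    a0, a1, a2, a3, a4, a5 = a
    total = 0.0
    for p_row, h_row in zip(p_chp, h_chp):
        for p, h in zip(p_row, h_row):
            total += (a0 + a1 * p + a2 * h + a3 * p * p + a4 * h * h + a5 * p * h) * dt
    return total


def curtailment_penalty(p_w_max, p_w, k, dt):
    """f3: k times the wind power that was available but not consumed."""
    total = 0.0
    for max_row, row in zip(p_w_max, p_w):
        for available, used in zip(max_row, row):
            total += k * (available - used) * dt
    return total


# One period, hypothetical values.
f1 = conventional_cost([[10.0, 20.0]], (1.0, 2.0, 0.5), dt=1.0)
f2 = chp_cost([[10.0]], [[5.0]], (1.0, 1.0, 1.0, 0.1, 0.1, 0.1), dt=1.0)
f3 = curtailment_penalty([[8.0]], [[6.0]], k=50.0, dt=1.0)
total_cost = f1 + f2 + f3
print(total_cost)
```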
The coordinated optimization method for the electric-thermal integrated energy system according to claim 4, wherein,

The node power balance equation constraint, based on the active power balance equation of the network nodes, is:

where the node set is the set of power system and thermal system nodes; T is the scheduling period; and the load variables are the electric load power and the thermal load power of node i in period t, respectively;

The network security constraints include:

where V i,max and V i,min are the upper and lower limits of the voltage amplitude of node i, respectively; T sj is the temperature of the hot water flowing into node j of the heating network, bounded by the upper and lower limits of the supply water temperature; m jk is the mass flow rate of hot water pipe k, and m k,max and m k,min are the upper and lower limits of the mass flow rate;

The constraints of the cogeneration device are:

where the power variables are the generation power of the cogeneration device in two consecutive periods, and the limits are the upper and lower limits of the ramp rate of the cogeneration unit;

The renewable energy constraint is:

where the power variable denotes the generation power of wind turbine i in period t, which is bounded above by its maximum value;

The output constraint of the conventional unit is:

The ramp constraint must also be satisfied:

where the limits are, respectively, the upper and lower limits of the unit output and the upper and lower limits of the unit ramp rate.
The coordinated optimization method for the electric-thermal integrated energy system according to claim 1, wherein training the optimal scheduling model based on the SAC framework to obtain the pre-trained optimal scheduling model based on the SAC framework includes:

Initialize the target network of the evaluator (critic), and set the capacity of the experience replay buffer D;

When the variation of the average reward value over m 0 consecutive rounds of training is less than δ e %, for each period, sample the control action $a_t \sim \pi_\phi(a_t \mid S_t)$ from the actor policy network, apply the control action to the electric-thermal integrated energy system, sample the wind power uncertainty to obtain the operating state $S_{t+1}$ of the system at the next moment, and store the state transition and the reward in the experience replay buffer D;

Update the evaluator Q network, the actor policy network φ, the temperature coefficient, and the target network to obtain a trained policy network, which is used as the optimal scheduling model of the SAC framework.
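The bookkeeping around the training steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class and function names are hypothetical, the way the "variation" of the average reward is measured (range of the last m0 values relative to their largest magnitude) is an assumption, reading the m0 and δe condition as a stopping test is an interpretation, and the soft target update with coefficient `tau` is the usual SAC practice rather than something stated in this text.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience store; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def has_converged(avg_rewards, m0, delta_pct):
    """True when the last m0 average rewards vary by less than delta_pct percent."""
    if len(avg_rewards) < m0:
        return False
    window = avg_rewards[-m0:]
    ref = max(abs(r) for r in window)
    if ref == 0.0:
        return True
    return (max(window) - min(window)) / ref * 100.0 < delta_pct


def soft_update(target, source, tau):
    """Move target-network parameters a fraction tau towards the source parameters."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]
```

In a full training loop, each period would add one transition (S_t, a_t, reward, S_t+1) to the buffer, and the network updates would draw mini-batches from it.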
The coordinated optimization method for the electric-thermal integrated energy system according to claim 1, wherein,

The method used to update the evaluator Q network, the actor policy network, the temperature coefficient, and the target network is the stochastic gradient descent (SGD) algorithm or the Adam algorithm.
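For illustration, the two update rules named above differ as sketched below for a flat list of parameters. The code is a plain-Python sketch of the standard SGD and Adam rules; the class name, default hyperparameters and the example values are assumptions and are not taken from this text.

```python
import math


def sgd_step(params, grads, lr=0.01):
    """Plain stochastic gradient descent: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


class Adam:
    """Adam optimizer for a flat list of parameters."""

    def __init__(self, size, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # first-moment (mean) estimate
        self.v = [0.0] * size  # second-moment (uncentered variance) estimate
        self.t = 0             # number of steps taken so far

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# One step of each rule on the same hypothetical parameter and gradient.
after_sgd = sgd_step([1.0], [2.0], lr=0.1)
after_adam = Adam(size=1, lr=0.1).step([1.0], [2.0])
print(after_sgd, after_adam)
```

SGD scales the step with the raw gradient, whereas Adam normalizes it by a running estimate of the gradient magnitude, so its first step has a size close to the learning rate regardless of the gradient scale.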
A coordinated optimization system for an electric-thermal integrated energy system, including:

The first parameter acquisition module is configured to acquire real-time electric-thermal integrated energy system parameters;

The power calculation module is configured to calculate, based on the real-time electric-thermal integrated energy system parameters, the real-time generation power of the power system, the thermal system and the coupling device of the electric-thermal integrated energy system, respectively;

The scheduling output module is configured to input the real-time generation power into the pre-trained optimal scheduling model based on the soft actor-critic (SAC) framework, output scheduling actions, and form a coordination strategy for the electric-thermal integrated energy system;

The pre-trained optimal scheduling model based on the SAC framework in the scheduling output module includes:

The second parameter acquisition module is configured to acquire historical electric-thermal integrated energy system parameters;

The model building module is configured to calculate, based on the historical electric-thermal integrated energy system parameters, the historical generation power of the power system, the thermal system and the coupling device of the electric-thermal integrated energy system, and to establish the scheduling model of the electric-thermal integrated energy system on the basis of that historical generation power;

The model optimization module is configured to establish the optimal scheduling model based on the SAC framework from the electric-thermal integrated energy system scheduling model, taking the reinforcement learning environment, state, action and reward as the basic elements;

The model training module is configured to train the optimal scheduling model based on the SAC framework to obtain the pre-trained optimal scheduling model based on the SAC framework;

The model optimization module is configured to set the action variable, whose components $P_i^G$, $H_i^{chp}$ and $P_i^{chp}$ are, in turn, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device;

Determine the state space variable, whose components $P_i^G$, $P^{load}$, $P^w$, $P_i^{chp}$, $H^{load}$, $H_i^{chp}$ and $T_e$ are, respectively, the generating power of the conventional unit, the electric load, the wind power generation, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device, and the ambient temperature;

Build a reinforcement learning environment: apply the current action produced by the policy network to the environment, obtain the immediate reward and the state of the next period, and supply the reward for policy evaluation;

Set the reinforcement learning goal to maximizing the long-term reward: take the negative of the optimization objective as the immediate reward, and add to it a penalty mechanism set according to the constraints to obtain the final reward function; wherein the penalty mechanism is:

In the formula, $\beta_v$ is the penalty coefficient, a constant set for the corresponding limit-violation penalty;

In the reward function, $f_1$, $f_2$ and $f_3$ are the operating cost of the conventional unit, the operating cost of the combined heat and power unit, and the wind curtailment penalty, respectively; two penalty terms cover the output over-limit and the ramp-rate over-limit of the conventional unit; $\varphi_V$ is the over-limit penalty for the system node voltage; two further penalty terms cover the output over-limit and the ramp-rate over-limit of the combined heat and power unit; $\varphi_T$ is the over-limit penalty for the system node temperature; and $\varphi_m$ is the over-limit penalty for the mass flow rate of the system pipelines.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 9, wherein the real-time electric-thermal integrated energy system parameters and the historical electric-thermal integrated energy system parameters include the network parameters of the electric-thermal integrated energy system, the electric and heat loads, and the wind power output.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 9, wherein,

The power calculation module is configured to determine AC power flow as the analysis method of the power system, wherein the power balance equation of the power system is:

where $P_i$ and $Q_i$ are the active and reactive power injected at node i, respectively; $V_i$ is the voltage amplitude of node i; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch ij, respectively; $\theta_{ij}$ is the phase angle difference between nodes i and j; and the summation runs over the set of power system nodes;

The power calculation module is configured to determine that the thermal system model consists of a hydraulic model and a thermodynamic model, the hydraulic model consisting of the flow continuity equation, the loop pressure equation and the head loss equation; the hydraulic model of the thermal system is:

where $A_h$ is the node-branch incidence matrix; $B$ is the loop-branch incidence matrix; the flow variables are the pipeline mass flow rate and the node injection flow; $h_f$ is the pressure head loss; and $K$ is the resistance coefficient of the pipeline;

The power calculation module is configured to determine that the thermodynamic model consists of the node power equation, the pipeline temperature drop equation and the node medium mixing equation; the thermodynamic model is:

where $H_i$ is the thermal power injected at node i; $C_p$ is the specific heat capacity of water; $T_{s,i}$ and $T_{o,i}$ are the supply water temperature and the outlet water temperature at node i; the subscript ij denotes the heating network pipe branch with i and j as its head and end nodes; $T_{i(ij)}$ and $T_{j(ij)}$ are the temperatures at the i end and the j end of that branch; and $T_e$ is the external ambient temperature;

The power calculation module is configured to determine the electric and thermal generation power of the coupling device as:

where the variables are, respectively, the electric output and the thermal output of the i-th condensing unit in period t, together with the upper and lower limits of its electric output; $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the coefficients describing the polygonal operating region, and are constants for a given cogeneration device.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 9, wherein the model building module is configured to establish an objective function that targets the minimum total operating cost of the electric-thermal integrated energy system while maximizing the consumption of renewable energy, with the unconsumed part of the renewable energy included as a penalty term;

and to establish the constraints of the electric-thermal integrated energy system scheduling model, the constraints including: node power balance equation constraints, network security constraints, cogeneration device constraints, renewable energy constraints, and conventional unit output constraints.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 12, wherein,

The objective function is: min F = f 1 + f 2 + f 3, where f 1 is the operating cost of the conventional unit, f 2 is the operating cost of the combined heat and power unit, and f 3 is the wind curtailment penalty;

The operating cost f 1 of the conventional unit is:

where the power variable is the generating power of the conventional unit; b 0 , b 1 and b 2 are the energy consumption coefficients of the conventional unit; N G is the number of conventional units; T is the scheduling period; and △t is the scheduling time interval;

The operating cost f 2 of the combined heat and power unit is:

where the variables are the generation power and the heat production power of the cogeneration device connected to node i in period t; a 0 , a 1 , a 2 , a 3 , a 4 and a 5 are the energy consumption coefficients of the cogeneration device; and N chp is the number of cogeneration devices;

The wind curtailment penalty f 3 is:

In the formula, the power variable denotes the output of the wind turbine connected to node i in period t, and k is the wind curtailment penalty coefficient, which is a constant.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 9, wherein the node power balance equation constraint, based on the active power balance equation of the network nodes, is:

where the node set is the set of power system and thermal system nodes; T is the scheduling period; and the load variables are the electric load power and the thermal load power of node i in period t, respectively;

The network security constraints include:

where V i,max and V i,min are the upper and lower limits of the voltage amplitude of node i, respectively; T sj is the temperature of the hot water flowing into node j of the heating network, bounded by the upper and lower limits of the supply water temperature; m jk is the mass flow rate of hot water pipe k, and m k,max and m k,min are the upper and lower limits of the mass flow rate;

The constraints of the cogeneration device are:

In the formula, the power variables are the generation power of the cogeneration device in two consecutive periods, and the limits are the upper and lower limits of the ramp rate of the cogeneration unit;

The renewable energy constraint is:

In the formula, the power variable denotes the generation power of wind turbine i in period t, which is bounded above by its maximum value;

The output constraint of the conventional unit is:

The ramp constraint must also be satisfied:

In the formula, the limits are, respectively, the upper and lower limits of the unit output and the upper and lower limits of the unit ramp rate.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 9, wherein the model training module is specifically configured to: initialize the target network of the evaluator (critic), and set the capacity of the experience replay buffer D;

When the variation of the average reward value over m 0 consecutive rounds of training is less than δ e %, for each period, sample the control action $a_t \sim \pi_\phi(a_t \mid S_t)$ from the actor policy network, apply the control action to the electric-thermal integrated energy system, sample the wind power uncertainty to obtain the operating state $S_{t+1}$ of the system at the next moment, and store the state transition and the reward in the experience replay buffer D;

And update the evaluator Q network, the actor policy network φ, the temperature coefficient, and the target network to obtain a trained policy network, which is used as the optimal scheduling model of the SAC framework.
The coordinated optimization system for the electric-thermal integrated energy system according to claim 9, wherein the method used to update the evaluator Q network, the actor policy network, the temperature coefficient, and the target network is the stochastic gradient descent (SGD) algorithm or the Adam algorithm.
An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the coordinated optimization method for the electric-thermal integrated energy system according to any one of claims 1-8.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the coordinated optimization method for the electric-thermal integrated energy system according to any one of claims 1-8.
A computer program, comprising computer-readable code, wherein, when the computer-readable code runs in an electronic device, a processor of the electronic device executes the coordinated optimization method for the electric-thermal integrated energy system according to any one of claims 1-8.