WO2023082697A1 - Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program - Google Patents

Info

Publication number
WO2023082697A1
WO2023082697A1 PCT/CN2022/107149 CN2022107149W WO2023082697A1 WO 2023082697 A1 WO2023082697 A1 WO 2023082697A1 CN 2022107149 W CN2022107149 W CN 2022107149W WO 2023082697 A1 WO2023082697 A1 WO 2023082697A1
Authority
WO
WIPO (PCT)
Prior art keywords
power
electric
thermal
energy system
node
Prior art date
Application number
PCT/CN2022/107149
Other languages
French (fr)
Chinese (zh)
Inventor
蒲天骄
董雷
李烨
王新迎
Original Assignee
中国电力科学研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电力科学研究院有限公司 filed Critical 中国电力科学研究院有限公司
Publication of WO2023082697A1 publication Critical patent/WO2023082697A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Definitions

  • the present disclosure relates to the field of optimal dispatching of integrated energy systems, in particular to a coordination and optimization method, system, equipment, medium and program for electric-thermal integrated energy systems.
  • the electric-thermal integrated energy system can promote the consumption of renewable energy and improve energy utilization by utilizing the complementary characteristics of heat and electricity.
  • Existing solutions to the optimization problem of the electric-thermal integrated energy system mainly include traditional nonlinear methods such as particle swarm optimization, and intelligent algorithms such as Q-learning and the Deep Q Network (DQN) algorithm.
  • both the particle swarm algorithm and the Q-learning algorithm have problems such as insufficient accuracy, slow calculation speed, and limited scope of application.
  • The DQN algorithm has insufficient exploration ability and easily falls into locally optimal solutions.
  • Embodiments of the present disclosure provide a coordinated optimization method, system, equipment, medium and program for an electric-thermal comprehensive energy system.
  • An embodiment of the present disclosure provides a coordinated optimization method for an electric-thermal comprehensive energy system, the method is executed by an electronic device; the method includes:
  • The training method for the pre-trained optimal scheduling model based on the SAC framework comprises:
  • Taking the reinforcement learning environment, state, action and reward as the basic elements and combining them with the electric-thermal comprehensive energy system scheduling model to establish an optimal scheduling model based on the SAC framework includes:
  • $P_i^G$, $H_i^{chp}$ and $P_i^{chp}$ are, in turn, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device;
  • $P_i^G$, $P_{load}$, $P_w$, $P_i^{chp}$, $H_{load}$, $H_i^{chp}$ and $T_e$ are, in turn, the generating power of the conventional unit, the electric load, the wind power generating power, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device and the ambient temperature;
  • ⁇ v is the penalty coefficient; a corresponding constant coefficient is set for each type of limit-violation penalty;
  • In the reward function, $f_1$, $f_2$ and $f_3$ are the operating cost of the conventional unit, the operating cost of the combined heat and power unit, and the wind curtailment penalty, respectively; two further penalty terms cover the output over-limit and the ramping over-limit of the conventional unit, respectively; ⁇ V is the system node voltage over-limit penalty; two further penalty terms cover the output over-limit and the ramping over-limit of the combined heat and power unit, respectively; ⁇ T is the over-limit penalty for the system node temperature; ⁇ m is the over-limit penalty for the mass flow rate of the system pipelines.
  • An embodiment of the present disclosure also provides a coordinated optimization system for an electric-thermal comprehensive energy system, including:
  • the first parameter acquisition module is configured to acquire real-time electric-thermal comprehensive energy system parameters
  • the power calculation module is configured to calculate the real-time power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system based on the real-time electric-thermal comprehensive energy system parameters;
  • the scheduling output module is configured to input the real-time generated power into the pre-trained optimal scheduling model based on the SAC framework, output scheduling actions, and form an electric-thermal comprehensive energy system coordination strategy;
  • the optimized scheduling model based on the SAC framework of the pre-training in the scheduling output module includes:
  • the second parameter acquisition module is configured to acquire historical electric-thermal comprehensive energy system parameters
  • the model building module is configured to calculate the historical generated power of the electric power system, thermal system and coupling device of the electric-thermal comprehensive energy system based on the historical electric-thermal comprehensive energy system parameters, and to establish the dispatching model of the electric-thermal integrated energy system on the basis of the historical generated power of the electric power system, thermal system and coupling device;
  • the model optimization module is configured to take the reinforcement learning environment, state, action and reward as the basic elements, combined with the electric-thermal comprehensive energy system scheduling model to establish an optimal scheduling model based on the SAC framework;
  • the model training module is configured to train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework.
  • the model optimization module is configured to set the action variables, where $P_i^G$, $H_i^{chp}$ and $P_i^{chp}$ are, in turn, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device;
  • the state variables $P_i^G$, $P_{load}$, $P_w$, $P_i^{chp}$, $H_{load}$, $H_i^{chp}$ and $T_e$ are, in turn, the generating power of the conventional unit, the electric load, the wind power generating power, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device and the ambient temperature;
  • ⁇ v is the penalty coefficient; a corresponding constant coefficient is set for each type of limit-violation penalty;
  • in the reward function, $f_1$, $f_2$ and $f_3$ are the operating cost of the conventional unit, the operating cost of the combined heat and power unit, and the wind curtailment penalty, respectively; two further penalty terms cover the output over-limit and the ramping over-limit of the conventional unit, respectively; ⁇ V is the system node voltage over-limit penalty; two further penalty terms cover the output over-limit and the ramping over-limit of the combined heat and power unit, respectively; ⁇ T is the over-limit penalty for the system node temperature; ⁇ m is the over-limit penalty for the mass flow rate of the system pipelines.
  • An embodiment of the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor. When the processor executes the computer program, the foregoing coordinated optimization method for the electric-thermal comprehensive energy system is implemented.
  • An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, it implements the coordinated optimization method for the electric-thermal comprehensive energy system described above.
  • An embodiment of the present disclosure also provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, the processor of the electronic device executes the coordinated optimization method for the electric-thermal comprehensive energy system.
  • This disclosure uses an optimal scheduling model based on the SAC framework to give scheduling actions, and thereby generate a strategy, when a system state such as the load is given. Once the system state is determined, the trained policy network gives the scheduling action directly, without the overall iterative solution required by traditional nonlinear methods, so calculation speed and calculation efficiency are significantly improved.
  • an optimal scheduling model based on the SAC framework is established.
  • The SAC algorithm is a stochastic off-policy algorithm. Its self-optimizing characteristics are used for autonomous learning of the comprehensive energy optimal scheduling problem: the reward values obtained through interaction with and observation of the environment are used to evaluate the quality of the strategy, and the algorithm learns to explore all optimal strategy paths, so as to achieve the optimal cost over the scheduling cycle without supervision.
  • The trained network model avoids the curse of dimensionality caused by discretizing states and actions. Moreover, it converges reliably and has a wide range of applications; for example, it can be applied to scenarios that consider random variations in load and in the output of renewable energy such as wind power, which makes it more general.
  • Fig. 1 shows a schematic flow diagram of a coordinated optimization method for an electric-thermal comprehensive energy system provided by an embodiment of the present disclosure;
  • Fig. 2 shows a schematic flow diagram of the training method for the pre-trained optimal scheduling model based on the SAC framework provided by an embodiment of the present disclosure;
  • Fig. 3 shows a schematic structural diagram of an electric-thermal integrated energy system;
  • Fig. 4 shows a schematic structural diagram of the thermal system;
  • Fig. 5 shows a flow chart of the algorithm for training the optimal scheduling model based on the SAC framework;
  • Fig. 6 shows a schematic structural diagram of an electric-thermal comprehensive energy system coordination and optimization system provided by an embodiment of the present disclosure;
  • Fig. 7 shows a schematic structural diagram of the pre-trained optimal scheduling model based on the SAC framework adopted by an embodiment of the present disclosure;
  • Fig. 8 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • The particle swarm optimization algorithm is an evolutionary computation technique.
  • Through information interaction with the environment, it starts from random initial values of the system and seeks the optimal value through iteration.
  • IEHS: Integrated Electricity-Heat Energy System.
  • $i$ is an integer greater than or equal to 1 and no greater than the total number of particles in the group; $V_{id}$ is the velocity of the particle; $P_{gd}$ is the extreme value of the particle; $X_{id}$ is the current position of the particle; random(0, 1) is a random number between 0 and 1; $C_1$ and $C_2$ are learning factors; $\omega$ is the inertia factor, and its value is non-negative.
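  • As a hedged illustration of the update these symbols describe, the sketch below implements the textbook particle swarm velocity and position update. The source's own formula is not reproduced in this text, so the exact form is an assumption: in particular the personal-best term `p_best`, the function names `pso_step` and `minimize`, and all default parameter values are hypothetical and chosen only for illustration.

```python
import random

def pso_step(x, v, p_best, g_best, omega=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity/position update for a single particle (lists of floats).

    omega is the inertia factor, c1 and c2 are the learning factors, and
    rng.random() plays the role of random(0, 1).
    """
    v_new = [
        omega * vd + c1 * rng.random() * (pd - xd) + c2 * rng.random() * (gd - xd)
        for xd, vd, pd, gd in zip(x, v, p_best, g_best)
    ]
    x_new = [xd + vd for xd, vd in zip(x, v_new)]
    return x_new, v_new

def minimize(f, dim=2, n_particles=20, iters=100, seed=0):
    """Minimize f over [-5, 5]^dim with a basic swarm (illustrative bounds)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]          # best position seen by each particle
    p_val = [f(x) for x in xs]
    for _ in range(iters):
        g_best = p_best[p_val.index(min(p_val))]   # best position of the swarm
        for k in range(n_particles):
            xs[k], vs[k] = pso_step(xs[k], vs[k], p_best[k], g_best, rng=rng)
            val = f(xs[k])
            if val < p_val[k]:
                p_best[k], p_val[k] = xs[k][:], val
    k = p_val.index(min(p_val))
    return p_best[k], p_val[k]
```

  • Because every particle is pulled toward the best points found so far, the swarm can stall around a local optimum, which is the weakness the text goes on to discuss.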
  • The disadvantages of the above algorithm are as follows: (1) the particle swarm optimization algorithm converges poorly, may even fail to converge, and easily falls into a locally optimal solution, which defeats the purpose of multi-energy collaborative optimization; (2) the increasingly tightly coupled electric-thermal integrated energy system is highly nonlinear, so the calculation speed of the particle swarm algorithm drops significantly when solving it, and its calculation efficiency cannot meet the requirements of economic scheduling of the electric-thermal integrated energy system.
  • The Q-learning algorithm is based on the Markov decision process and is a model-free reinforcement learning algorithm.
  • The general steps for optimizing the electric-thermal comprehensive energy system with the Q-learning algorithm are: design the action and state spaces, discretize the continuous action space and state space, and establish the Q-learning reward and punishment mechanism according to the system optimization goals and operating constraints; then, through continuous trial-and-error exploration, interact with the environment and update the Q-value table, so that the agent finally learns to select the optimal action autonomously.
  • The optimal index corresponding to the optimal strategy is the sum of the immediate reward $r_t$ obtained by the agent's action $a_t$ on the electric-thermal integrated energy system at the current moment and the maximum Q value obtained after the subsequent state transition, $\max_{a'} Q(s_t', a_t')$. Therefore, the Q-value table can be updated according to the Bellman optimality criterion through formula (2):
  • The agent can then select the optimal control action for the electric-thermal integrated energy system according to the input state information and the Q-value table.
  • The learning rate $\alpha$ and the discount factor $\gamma$ both take values in [0, 1].
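  • A minimal sketch of the tabular update described above, assuming the standard Q-learning rule (the source's formula (2) is not reproduced in this text). The function name `q_update`, the dictionary-based table and the default values of the two coefficients are hypothetical.

```python
from collections import defaultdict

def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update.

    q       : dict mapping (state, action) -> Q value
    alpha   : learning rate in [0, 1]
    gamma   : discount factor in [0, 1]
    actions : the (discretized) set of actions available in s_next
    """
    # Best value reachable from the next state under the current table.
    best_next = max(q[(s_next, a2)] for a2 in actions)
    # Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a').
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]

# Usage: an empty table whose missing entries default to 0.0.
q_table = defaultdict(float)
q_update(q_table, s=0, a="raise", r=1.0, s_next=1, actions=["raise", "lower"])
```

  • The table needs one entry per discretized state-action pair, which is why its size grows quickly with the dimension of the problem.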
  • the action state space of the electric-thermal integrated energy system is mostly a continuous interval.
  • The continuous space therefore needs to be discretized, and calculation based on the discretized action space and state space leads to a significant loss of calculation accuracy.
  • The Q-learning algorithm is not suitable for solving large-scale electric-thermal integrated energy system optimization problems.
  • An increase in the scale of the problem increases the dimension of the action space and the complexity of the network, and a larger action space substantially increases the dimension of the Q-value table. If the table is too large, training becomes harder and the Q values are fitted poorly, which makes it difficult to model a highly complex electric-thermal integrated energy system network.
  • DQN combines deep learning with the decision-making ability of reinforcement learning.
  • By building a deep learning network, DQN learns control policies directly from high-dimensional raw data, which extends the practicability of reinforcement learning.
  • The action and state spaces should be designed first; a deep learning network is then constructed to fit the Q value, and an experience replay buffer is constructed to store historical samples. The replay buffer is randomly sampled for each training step, and the Q network is trained on the sampled data.
  • The DQN algorithm first obtains an observation from the environment, and the agent obtains all Q(s, a) for that observation from the value-function neural network; the agent then uses the policy algorithm to make a decision, obtains an action, and receives the feedback reward r; the reward r is then used to update the parameters of the value-function network before entering the next iteration. This repeats until network training is completed.
  • DQN needs to define the corresponding loss function, and use the gradient descent algorithm to update the parameters.
  • the output value of the Q network can gradually approach the optimal Q value.
  • the definition of the loss function is based on the residual model, that is, the square of the difference between the real value and the network output, as shown in formula (3):
  • DQN introduces a target Q network in addition to the original Q network.
  • The target network has the same structure and the same initial weights as the Q network, but the parameters of the Q network are updated at every iteration, while the parameters of the target Q network are updated only at intervals.
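  • The squared-residual loss and the periodically updated target network can be sketched as follows. This is an illustrative outline under the usual DQN conventions, not the source's formula (3); the function names, the `dones` flag and the `period` parameter are assumptions.

```python
def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """Targets y = r + gamma * max_a' Q_target(s', a').

    next_q_target holds, per sample, the target network's Q values for all
    actions in the next state; terminal samples (done=True) do not bootstrap.
    """
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_target, dones)
    ]

def dqn_loss(q_taken, targets):
    """Mean squared difference between the targets and Q(s, a) of the taken actions."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)

def sync_target(online_params, target_params, step, period=100):
    """Copy the online weights into the target network every `period` steps."""
    if step % period == 0:
        target_params[:] = online_params
    return target_params
```

  • Holding the target network fixed between synchronizations keeps the regression target from shifting at every gradient step.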
  • The above scheme is more suitable for continuous control scenarios, but exploring a continuous action space is difficult: DQN cannot easily rely on a specific policy algorithm to guarantee effective exploration of the state space, so it may converge to a locally optimal solution.
  • the present disclosure adopts the SAC algorithm to solve the economic scheduling problem of the electric-thermal integrated energy system, and proposes a coordinated optimization method for the electric-thermal integrated energy system.
  • Figure 1 shows a schematic flow diagram of the coordinated optimization method for the electric-thermal comprehensive energy system provided by an embodiment of the present disclosure. As shown in Figure 1, the process includes:
  • Step 101 Obtain real-time electric-thermal comprehensive energy system parameters.
  • Step 102 Based on the real-time electric-thermal comprehensive energy system parameters, calculate the real-time generated power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system respectively.
  • Step 103 Input the real-time generated power into the pre-trained optimal scheduling model based on the SAC framework, output scheduling actions, and form a coordination strategy for the electric-thermal comprehensive energy system.
  • FIG. 2 shows a schematic flow diagram of the training method for the pre-trained optimal scheduling model based on the SAC framework provided by an embodiment of the present disclosure. As shown in FIG. 2, the process includes:
  • Step 201 Obtain historical electric-thermal comprehensive energy system parameters.
  • Step 202 Based on the historical electric-thermal comprehensive energy system parameters, calculate the historical generated power of the electric power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system respectively, and establish the electric-thermal comprehensive energy system dispatching model on the basis of that historical generated power.
  • Step 203 Take the reinforcement learning environment, state, action and reward as the basic elements and combine them with the electric-thermal integrated energy system scheduling model to establish an optimal scheduling model based on the SAC framework.
  • Step 204 Train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework.
  • The economic scheduling method of the electric-thermal comprehensive energy system based on the SAC algorithm in this disclosure adopts a continuous control strategy and adds maximum entropy to the optimization target; it can interact with the electric-thermal comprehensive energy system, learn and generate an optimal control strategy, and thereby solve the scheduling problem of the electric-thermal system.
  • This disclosure provides strong technical support for the collaborative optimization of integrated energy systems and a decision-making basis for integrated energy dispatchers, and is of great significance for realizing multi-energy complementarity, renewable energy consumption, and improved operating economy of the system.
  • This disclosure adopts a deep reinforcement learning method to solve the economic scheduling problem of the electric-thermal integrated energy system, mainly to address the following problems of traditional methods:
  • 1) The deep reinforcement learning method uses a neural network to fit the optimal strategy of the electric-thermal integrated energy system under different states. After the network is trained, the scheduling strategy can be obtained in real time, whereas traditional nonlinear algorithms require global optimization, so the deep reinforcement learning method improves computational efficiency.
  • 2) Deep reinforcement learning has stronger exploration ability and better convergence stability in the optimal scheduling problem of the electric-thermal integrated energy system. Compared with intelligent algorithms such as the particle swarm algorithm, it achieves a lower scheduling cost.
  • The SAC-based deep reinforcement learning economic scheduling method for electric-thermal comprehensive energy systems proposed in this disclosure adopts a continuous control strategy, which overcomes the high-dimensional solution difficulty caused by variable discretization in value-function-based reinforcement learning methods. Maximum entropy is added to the optimization objective so that various optimal possibilities are explored.
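  • For reference, the maximum-entropy objective that SAC optimizes is commonly written as below. This is the standard textbook form rather than a formula taken from this disclosure; the symbols $\pi$ (policy), $r$ (reward), $\rho_\pi$ (state-action distribution induced by the policy), $\alpha$ (temperature coefficient) and $\mathcal{H}$ (entropy) follow the usual SAC notation and are assumptions here.

```latex
\pi^{*} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

  • The entropy term rewards the policy for keeping its action distribution spread out, which is how the added maximum entropy encourages exploration; the temperature coefficient sets the trade-off between reward and entropy.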
  • the disclosed method includes the following steps:
  • Step 1 Import the parameters of the electric-thermal comprehensive energy system.
  • FIG. 3 shows a schematic structural diagram of an electric-thermal comprehensive energy system.
  • the system shown in FIG. 3 includes an electric power system 301 and a thermal system 302 .
  • the real-time electric-thermal comprehensive energy system parameters and historical electric-thermal comprehensive energy system parameters include electric-thermal comprehensive energy system network parameters, electric-heat load output and wind power output. The collected data are shown in Table 1.
  • the real-time generated power of the power system, thermal system and coupling device of the electric-thermal integrated energy system can be calculated separately, which can be realized by step 2:
  • Step 2 Establish an electric-thermal comprehensive energy system model.
  • The present disclosure models the electric-thermal comprehensive energy system in three parts: the electric power system, the thermal system and the coupling device.
  • Power system: AC power flow is used as the analysis method of the power system, and the power balance equation of the power system can be formula (4):
  • $P_i$ and $Q_i$ are the injected active and reactive power of node i, respectively;
  • $V_i$ is the voltage amplitude of node i;
  • $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch ij, respectively;
  • $\theta_{ij}$ is the phase angle difference of branch ij;
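  • As a sketch of how these quantities combine, the function below evaluates the standard polar-form AC power flow injections. The source's formula (4) is not reproduced in this text, so this form is an assumption; the sketch also assumes `G` and `B` are the real and imaginary parts of the bus admittance matrix, and the function name `power_injections` is hypothetical.

```python
import math

def power_injections(V, theta, G, B):
    """Active and reactive power injected at each node.

    P_i = V_i * sum_j V_j * (G_ij * cos(th_ij) + B_ij * sin(th_ij))
    Q_i = V_i * sum_j V_j * (G_ij * sin(th_ij) - B_ij * cos(th_ij))
    with th_ij = theta_i - theta_j (radians).
    """
    n = len(V)
    P, Q = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            th = theta[i] - theta[j]
            P[i] += V[i] * V[j] * (G[i][j] * math.cos(th) + B[i][j] * math.sin(th))
            Q[i] += V[i] * V[j] * (G[i][j] * math.sin(th) - B[i][j] * math.cos(th))
    return P, Q

# Two nodes joined by a lossless line with series susceptance -5 p.u.
G = [[0.0, 0.0], [0.0, 0.0]]
B = [[-5.0, 5.0], [5.0, -5.0]]
P, Q = power_injections([1.0, 1.0], [0.1, 0.0], G, B)
```

  • In the lossless example the power injected at the first node equals the power withdrawn at the second, which is a quick sanity check on the signs.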
  • The district heating system uses centralized heating.
  • Fig. 4 shows the structural diagram of the heating system.
  • The heat source of the thermal system generates heat energy, which is transported to the heat load through the water supply pipeline, forming the first passage 401; after cooling down at the heat load, the water flows back through the return pipeline, forming the second passage 402. The first passage 401 and the second passage 402 form a closed loop.
  • The thermal system model is divided into two parts, the hydraulic model and the thermal model:
  • Hydraulic model: the hydraulic model of the thermal system represents the medium flow and consists of the flow continuity equation, the loop pressure equation and the pressure head loss equation, as shown in formula (5).
  • $A_h$ is the node-branch incidence matrix;
  • $B$ is the loop-branch incidence matrix;
  • $h_f$ is the pressure head loss;
  • $K$ is the resistance coefficient of the pipeline.
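  • The three hydraulic equations can be checked numerically with a sketch like the one below. It assumes the commonly used forms (continuity: node-branch matrix times pipe flows equals node flows; loop pressure: loop-branch matrix times head losses equals zero; head loss: K times m times |m|), since formula (5) itself is not reproduced in this text. The names `m` (pipe mass flow rates), `m_q` (node flows) and both function names are hypothetical.

```python
def head_loss(K, m):
    """Head loss per pipe, h_f = K * m * |m| (sign follows the flow direction)."""
    return [k * mk * abs(mk) for k, mk in zip(K, m)]

def matvec(M, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def hydraulic_residuals(A_h, B, K, m, m_q):
    """Residuals of the continuity and loop-pressure equations (zero when satisfied)."""
    continuity = [lhs - rhs for lhs, rhs in zip(matvec(A_h, m), m_q)]
    loop = matvec(B, head_loss(K, m))
    return continuity, loop

# Two parallel pipes feeding one load node that draws 3 kg/s.
A_h = [[1.0, 1.0]]    # both pipes flow into the load node
B = [[1.0, -1.0]]     # one loop: along pipe 1, back along pipe 2
K = [1.0, 4.0]
continuity, loop = hydraulic_residuals(A_h, B, K, m=[2.0, 1.0], m_q=[3.0])
```

  • In this example the flow splits 2:1 because the second pipe has four times the resistance, and both residual vectors come out as zero.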
  • Thermal model: the thermal model represents the energy transfer process and consists of the node power equation, the pipe temperature drop equation and the node medium mixing equation, as shown in formula (6):
  • $H_i$ is the injected heat power of node i;
  • $C_p$ is the specific heat capacity of water;
  • $T_{s,i}$ and $T_{o,i}$ are the supply water temperature and the outlet water temperature of the heat transfer pipe at node i, respectively;
  • the subscript ij denotes the heating-network pipeline branch with i and j as its head and end nodes;
  • $T_{i(ij)}$ and $T_{j(ij)}$ are the temperatures at the i end and the j end of the branch;
  • $T_e$ represents the external ambient temperature.
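  • A minimal sketch of the three thermal relations, assuming their commonly used forms because formula (6) is not reproduced in this text. The pipe length `length`, the heat transfer coefficient per unit length `lam`, the node mass flow `m_q`, the value used for `C_P` and all function names are assumptions introduced for illustration.

```python
import math

C_P = 4182.0  # J/(kg*K), approximate specific heat capacity of water

def node_heat_power(m_q, t_supply, t_outlet, c_p=C_P):
    """Node power equation: H_i = C_p * m_q * (T_s,i - T_o,i)."""
    return c_p * m_q * (t_supply - t_outlet)

def pipe_end_temperature(t_start, t_env, m, length, lam, c_p=C_P):
    """Pipe temperature drop: the excess over ambient decays exponentially along the pipe."""
    return (t_start - t_env) * math.exp(-lam * length / (c_p * m)) + t_env

def mixed_temperature(flows, temps):
    """Node mixing: flow-weighted average temperature of the incoming streams."""
    return sum(m * t for m, t in zip(flows, temps)) / sum(flows)

# A 1 km pipe carrying 2 kg/s of 90 degC water through 10 degC surroundings.
t_end = pipe_end_temperature(90.0, 10.0, m=2.0, length=1000.0, lam=0.5)
```

  • The outlet temperature always lies between the ambient temperature and the inlet temperature, and it approaches the inlet temperature as the mass flow rate increases.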
  • The combined heat and power unit is an extraction condensing unit, and its operating point lies within a polygonal region.
  • The power generation and heat generation of the coupling device can be expressed by formula (7):
  • the variables in formula (7) are the electric output and thermal output of the i-th extraction condensing unit in period t, together with the upper and lower limits of the electric output;
  • ⁇ 1 , ⁇ 2 and ⁇ 3 are the coefficients that describe the polygonal region, and they are constants for a given cogeneration device.
  • Establishing the electric-thermal integrated energy system dispatching model based on the historical generated power of the electric power system, the thermal system and the coupling device can be realized through the following steps:
  • Step 2-1 Establish the objective function.
  • $f_1$ is the operating cost of the conventional unit;
  • $f_2$ is the operating cost of the combined heat and power unit;
  • $f_3$ is the wind curtailment penalty.
  • $b_0$, $b_1$ and $b_2$ are the energy consumption coefficients of the conventional units;
  • $N_G$ is the number of conventional units.
  • $T$ is the scheduling period;
  • $\Delta t$ is the scheduling time interval.
  • $a_0$, $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ are the energy consumption coefficients of the cogeneration device, and $N_{chp}$ is the number of combined heat and power devices.
  • the wind curtailment penalty f 3 can be calculated by formula (11):
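  • The three cost terms can be sketched as follows. The source's cost formulas are not reproduced in this text, so the quadratic forms, the ordering of the coefficients, the curtailment `price` and the function names are all assumptions chosen to match the coefficient counts given above (three for a conventional unit, six for a cogeneration device).

```python
def conventional_cost(p, b, dt=1.0):
    """f1 for one unit: sum over periods of (b0 + b1*P + b2*P^2) * dt."""
    b0, b1, b2 = b
    return sum((b0 + b1 * x + b2 * x * x) * dt for x in p)

def chp_cost(p, h, a, dt=1.0):
    """f2 for one device: quadratic in electric output P and heat output H."""
    a0, a1, a2, a3, a4, a5 = a
    return sum(
        (a0 + a1 * pe + a2 * ht + a3 * pe * pe + a4 * ht * ht + a5 * pe * ht) * dt
        for pe, ht in zip(p, h)
    )

def curtailment_penalty(p_available, p_used, price, dt=1.0):
    """f3: penalty proportional to the wind energy that is available but unused."""
    return sum(
        price * max(pa - pu, 0.0) * dt for pa, pu in zip(p_available, p_used)
    )

# Total objective over a two-period horizon with one unit of each type.
total = (
    conventional_cost([10.0, 20.0], (1.0, 2.0, 0.5))
    + chp_cost([1.0, 1.0], [2.0, 2.0], (1.0,) * 6)
    + curtailment_penalty([5.0, 3.0], [4.0, 3.0], price=10.0)
)
```

  • With several units, each function would be summed over the units as well as over the periods.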
  • Step 2-2 Establish constraints on the scheduling model of the electric-thermal integrated energy system.
  • the constraints include node power balance equation constraints, network security constraints, combined heat and power device constraints, renewable energy constraints, and conventional unit output constraints.
  • Equation (12) and Equation (13) can express the network node active power balance equation.
  • $T$ is the scheduling period; the load terms are the electric load power and the thermal load power of node i in period t, respectively.
  • $V_{i,\max}$ and $V_{i,\min}$ are the upper limit and lower limit of the voltage amplitude of node i, respectively;
  • $T_{s,j}$ is the temperature of the hot water flowing into node j of the heating network, and it is bounded by the upper and lower limits of the water supply temperature;
  • $m_{jk}$ is the mass flow rate of hot water pipe k; $m_{k,\max}$ and $m_{k,\min}$ are the upper and lower limits of the mass flow rate, respectively.
  • The cogeneration unit should satisfy the ramping constraints, which can be shown in formula (17):
  • Step 3 Establish an optimal dispatching model of the electric-thermal coupling integrated energy system based on SAC.
  • Action space: the action space corresponds to the control variables of the system under study; the generating power of the conventional unit, the cogeneration generating power and the cogeneration thermal power are set as the action variables, as shown in formula (21).
  • $P_i^G$, $H_i^{chp}$ and $P_i^{chp}$ are, in turn, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device.
  • The state space variables correspond to the state variables of the system under study and are shown in formula (22), where $P_i^G$, $P_{load}$, $P_w$, $P_i^{chp}$, $H_{load}$, $H_i^{chp}$ and $T_e$ are, in turn, the generating power of conventional units, the electric load, the wind power, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device and the ambient temperature:
  • ⁇ v is the penalty coefficient; a corresponding constant coefficient is set for each type of limit-violation penalty.
  • The reward function includes the operating cost of conventional units, the wind curtailment penalty, the operating cost of the cogeneration unit, and the penalties for variables exceeding their limits, as shown in formula (25).
  • $f_1$, $f_2$ and $f_3$ are the operating cost of the conventional unit, the operating cost of the combined heat and power unit, and the wind curtailment penalty, respectively. Two further penalty terms cover the output over-limit and the ramping over-limit of the conventional unit, respectively, and ⁇ V is the system node voltage over-limit penalty. Two further penalty terms cover the output over-limit and the ramping over-limit of the cogeneration unit, respectively; ⁇ T is the over-limit penalty for the system node temperature, and ⁇ m is the over-limit penalty for the system pipeline mass flow rate.
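  • One way to assemble such a reward is sketched below, assuming that the reward is the negative of the total cost plus penalties and that each penalty grows linearly with the size of the violation. Formula (25) is not reproduced in this text, so the sign convention, the linear penalty form and the function names are assumptions.

```python
def over_limit(x, lower, upper):
    """Amount by which x lies outside [lower, upper]; zero when inside."""
    return max(lower - x, 0.0) + max(x - upper, 0.0)

def penalty(values, limits, coeff):
    """Penalty coefficient times the summed violation over all monitored variables."""
    return coeff * sum(over_limit(x, lo, hi) for x, (lo, hi) in zip(values, limits))

def reward(costs, penalties):
    """Negative of operating costs plus penalties, so lower cost gives higher reward."""
    return -(sum(costs) + sum(penalties))

# Node voltages in p.u. checked against a hypothetical 0.95 to 1.05 band.
voltage_penalty = penalty([1.00, 1.10], [(0.95, 1.05), (0.95, 1.05)], coeff=100.0)
r = reward(costs=[312.0, 22.0, 10.0], penalties=[voltage_penalty])
```

  • A penalty that grows smoothly with the violation gives the networks a usable learning signal near the limits, whereas a step-shaped penalty is harder for a neural network to fit.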
  • Step 4 SAC training process.
  • Training the optimal scheduling model based on the SAC framework to obtain the pre-trained optimal scheduling model based on the SAC framework can be implemented in the following manner:
  • Figure 5 shows the flow chart of the algorithm for training the optimal scheduling model based on the SAC framework. As shown in Figure 5, the process may include the following steps:
  • Step 501 Initialize ⁇ 1 and ⁇ 2.
  • Step 502 Set the capacity of the experience replay buffer D.
  • Step 503 Initialize t and g.
  • Both t and g may be initialized to 0.
  • Step 504 Judge whether t is within T.
  • If t is within T, execute steps 505 to 507; if t is not within T, execute step 508.
  • Step 505 Sample a control action and apply it to the environment to obtain the operating state at the next moment.
  • Step 506 Store the state transition and the reward in the experience replay buffer D.
  • Step 507 Increment t.
  • Step 508 Judge whether g is within G.
  • Step 509 Update the critic Q network, the actor policy network, the temperature coefficient, and the target network.
  • Step 510 Increment g.
  • Step 511 Judge whether the change in the average reward over $m_0$ consecutive training rounds is less than ⁇ e %.
  • Step 512 End.
  • Training continues until the change in the average reward over $m_0$ consecutive training rounds is less than ⁇ e %. In each time period, a control action $a_t \sim \pi(a_t \mid s_t)$ is sampled.
  • The Adam optimizer is used to update the critic Q network, the actor policy network $\pi$, the temperature coefficient, and the target network; the trained policy network is then taken as the optimal scheduling model of the SAC framework.
  • the trained policy network can directly give scheduling actions and generate policies when the system state such as load is given.
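The flow of steps 501 to 512 can be sketched as a two-phase loop: collect experience for each period t in T, then run G update rounds, then check convergence. The sketch below is only an illustration under stated assumptions. `ToyEnv`, `train`, `converged`, the batch size, and the way the "change range" of step 511 is measured (spread of the last m0 average rewards relative to their mean) are all hypothetical, and the actual SAC update of step 509 is left to a caller-supplied `update` function.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical one-variable stand-in for the electric-thermal system."""

    def reset(self):
        self.load = 1.0
        return (self.load,)

    def step(self, action):
        # Reward: negative squared mismatch between dispatch and current load.
        reward = -((action - self.load) ** 2)
        # Sample the uncertainty (here a random load) to get the next state.
        self.load = 1.0 + 0.1 * random.uniform(-1.0, 1.0)
        return (self.load,), reward


def converged(history, m0, eps_pct):
    """Assumed reading of step 511: the spread of the last m0 average
    rewards, relative to their mean, is at most eps_pct percent."""
    if len(history) < m0:
        return False
    recent = history[-m0:]
    mean = sum(recent) / m0
    spread = max(recent) - min(recent)
    return spread <= eps_pct / 100.0 * max(abs(mean), 1e-12)


def train(env, policy, update, T=24, G=4, capacity=1000,
          m0=3, eps_pct=1.0, batch_size=32, max_rounds=50):
    memory = deque(maxlen=capacity)          # step 502: experience memory D
    history = []
    for _ in range(max_rounds):
        state, total = env.reset(), 0.0      # step 503: t and g start at 0
        for _t in range(T):                  # steps 504-507: collect experience
            action = policy(state)
            next_state, reward = env.step(action)
            memory.append((state, action, reward, next_state))
            state, total = next_state, total + reward
        for _g in range(G):                  # steps 508-510: update rounds
            batch = random.sample(list(memory), min(batch_size, len(memory)))
            update(batch)                    # step 509: critic, actor, temperature, target
        history.append(total / T)
        if converged(history, m0, eps_pct):  # step 511
            break
    return history, memory                   # step 512
```

In a real implementation `policy` would sample from the actor network and `update` would perform the Adam steps on the Q networks, the policy network, the temperature coefficient and the target network; here both are plain callables so that the control flow can be read in isolation.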
  • the calculation of the cost of generators and combined heat and power units can be replaced by a linear model, but it will affect the accuracy of the calculation results;
  • the penalty item of the CHP unit can be modeled in the form of a constant power-to-heat ratio, but its control flexibility and calculation accuracy are not as good as those of the polygonal area model;
  • the penalty function in the reward and punishment mechanism can be established in the form of a step function, but a step function is difficult for a neural network to fit and can act as noise for the network, which reduces the solution accuracy.
  • the training method can use Stochastic Gradient Descent (SGD) instead of Adam, but practice shows that the Adam algorithm is better.
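To illustrate why the polygonal area model listed among the alternatives above offers more control flexibility than a constant power-to-heat ratio, the following sketch tests whether a CHP operating point lies inside a convex polygon in the heat-power plane. It is a minimal sketch under assumptions: the vertex values in `CHP_REGION`, the ratio 0.8, and the function names are hypothetical and are not taken from the disclosure.

```python
def in_feasible_region(heat, power, vertices):
    """Return True if (heat, power) lies inside or on the boundary of a
    convex polygon whose vertices are listed counter-clockwise."""
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        # A negative cross product means the point is to the right of
        # this edge, i.e. outside a counter-clockwise polygon.
        cross = (x2 - x1) * (power - y1) - (y2 - y1) * (heat - x1)
        if cross < 0:
            return False
    return True


def constant_ratio_power(heat, ratio):
    """Constant power-to-heat ratio: one electric output per heat output."""
    return ratio * heat


# Hypothetical feasible region, as (heat, power) vertices, counter-clockwise.
CHP_REGION = [(0.0, 50.0), (80.0, 30.0), (100.0, 70.0), (0.0, 100.0)]
```

With these assumed numbers, a heat output of 50 admits a range of electric outputs inside the polygon (for example 40 and 60 are both feasible), whereas the constant-ratio form fixes the electric output at a single value (40 for a ratio of 0.8).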
  • Fig. 6 shows a schematic structural diagram of an electric-thermal comprehensive energy system coordination and optimization system 6 provided by an embodiment of the present disclosure. As shown in Fig. 6, the system includes:
  • the first parameter acquisition module 601 is configured to acquire real-time electric-thermal comprehensive energy system parameters
  • the power calculation module 602 is configured to calculate the real-time power generation power of the power system, the thermal system and the coupling device of the electric-thermal comprehensive energy system respectively based on the real-time electric-thermal comprehensive energy system parameters;
  • the scheduling output module 603 is configured to input real-time generated power into the pre-trained optimal scheduling model based on the SAC framework, output scheduling actions, and form a coordination strategy for the electric-thermal comprehensive energy system.
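The three modules 601 to 603 form a simple pipeline: acquire parameters, calculate power, and query the trained policy. The sketch below shows only that data flow. Every name in it is hypothetical, `calculate_power` is a placeholder that returns net demand rather than the power flow and thermal network calculations of the disclosure, and `toy_policy` is a hand-written rule standing in for the trained SAC policy network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Policy = Callable[[State], Sequence[float]]


@dataclass(frozen=True)
class SystemParameters:
    electric_load: float
    heat_load: float
    wind_power: float
    ambient_temperature: float


def acquire_parameters(measurements: dict) -> SystemParameters:
    """Module 601 (sketch): wrap real-time measurements."""
    return SystemParameters(**measurements)


def calculate_power(params: SystemParameters) -> State:
    """Module 602 (placeholder): a real system would solve the electric,
    thermal and coupling-device models here."""
    net_electric = max(params.electric_load - params.wind_power, 0.0)
    return (net_electric, params.heat_load, params.ambient_temperature)


def schedule(state: State, policy: Policy) -> List[float]:
    """Module 603 (sketch): one forward pass of the policy, no iteration."""
    return list(policy(state))


def toy_policy(state: State) -> Tuple[float, float, float]:
    """Hypothetical rule: the CHP unit covers the heat load at an assumed
    power-to-heat ratio of 0.5; the conventional unit covers the rest."""
    net_electric, heat_load, _ = state
    chp_power = 0.5 * heat_load
    conventional = max(net_electric - chp_power, 0.0)
    # Action order: conventional power, CHP thermal power, CHP electric power.
    return (conventional, heat_load, chp_power)
```

A call such as `schedule(calculate_power(acquire_parameters({...})), toy_policy)` then returns one scheduling action per system state, which mirrors the point that the trained policy gives actions directly instead of solving an iterative optimization.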
  • FIG. 7 shows a schematic structural diagram of a pre-trained SAC framework-based optimal scheduling model 7 adopted by an embodiment of the present disclosure.
  • the pre-trained SAC framework-based optimal scheduling model 7 in the scheduling output module includes:
  • the second parameter acquisition module 701 is configured to acquire historical electric-thermal comprehensive energy system parameters
  • the model building module 702 is configured to separately calculate the historical power generation power of the electric power system, thermal system and coupling device of the electric-thermal comprehensive energy system based on the parameters of the historical electric-thermal comprehensive energy system, and to establish the electric-thermal integrated energy system dispatching model on the basis of the historical power generation power of the power system, thermal system and coupling device;
  • the model optimization module 703 is configured to use the reinforcement learning environment, state, action and reward as basic elements, and combine the electric-thermal integrated energy system scheduling model to establish an optimal scheduling model based on the SAC framework;
  • the model training module 704 is configured to train the optimal scheduling model based on the SAC framework to obtain a pre-trained optimal scheduling model based on the SAC framework.
  • the model optimization module 703 is configured to set the action variable, whose components are, in order, the generating power of the conventional unit P i G , the thermal power of the combined heat and power device, and the generating power of the combined heat and power device P i chp ;
  • the state space variable consists, in order, of the generating power of the conventional unit, the electric load, the wind power generating power, the generating power of the cogeneration device, the heat load, the thermal power of the cogeneration device and the ambient temperature T e ;
  • the reward function is shown in formula (25).
  • the real-time electric-thermal integrated energy system parameters and historical electric-thermal integrated energy system parameters include electric-thermal integrated energy system network parameters, electric-heat load output, and wind power output.
  • the power calculation module 602 is configured to determine the AC power flow as an analysis method of the power system; wherein, the power balance equation of the power system is formula (4);
  • the power calculation module 602 is configured to determine that the hydraulic model of the thermal system is composed of the flow continuity equation, the loop pressure equation and the pressure head loss equation; the thermal system is composed of the hydraulic model and the thermal model; Show.
  • the power calculation module 602 is configured to determine that the thermal model consists of node power equations, pipe temperature drop equations and node medium mixing equations; the thermal model is shown in formula (6).
  • the power calculation module 602 is configured to determine the electricity and heat generation power of the coupling device as formula (7).
  • the model building module 702 is configured to: establish an objective function that aims to minimize the total operating cost of the electric-thermal integrated energy system while maximizing the consumption of renewable energy, taking the unconsumed part of the renewable energy as a penalty term; and establish the constraints of the electric-thermal integrated energy system scheduling model, which include node power balance equation constraints, network security constraints, cogeneration device constraints, renewable energy constraints and conventional unit output constraints.
  • the objective function is shown in formula (8); the operating cost of conventional units is shown in formula (9); the operating cost of combined heat and power units is shown in formula (10); the wind curtailment penalty is shown in formula (11) .
  • the node power balance equation constraint condition is based on the network node active power balance equation, as shown in formula (12) to formula (13).
  • the model optimization module 703 is configured to take the power generated by the conventional unit, the cogeneration power and the heat power generated by the cogeneration as the action variables shown in formula (21);
  • the power generation of the cogeneration unit, the output of the conventional unit, the heat load, the thermal power of the cogeneration unit, and the ambient temperature are used as the state space variables shown in formula (22); a reinforcement learning environment is built in which the policy network produces the current action, the action is applied to the environment, and the environment returns the immediate reward and the state of the next period, providing rewards for policy evaluation;
  • the goal of reinforcement learning is to maximize the long-term reward, so the negative of the optimization objective is used as the immediate reward, and a penalty mechanism based on the constraints is added to the immediate reward to obtain the final reward function. The penalty items share the unified form of formula (23) to formula (24); the reward function, shown in formula (25), includes the operating cost of conventional units, the wind curtailment penalty, the operating cost of cogeneration units, and the variable over-limit penalties.
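The reward construction described above can be sketched as the negative of the cost terms minus the over-limit penalties. Because formulas (23) to (25) are not reproduced in this text, the piecewise-linear penalty shape below is an assumption chosen for illustration, and the names `over_limit_penalty`, `reward`, and `beta` are hypothetical.

```python
def over_limit_penalty(value, lower, upper, beta):
    """Assumed penalty shape: zero inside [lower, upper], growing linearly
    with the size of the violation outside it (beta is the coefficient)."""
    if value > upper:
        return beta * (value - upper)
    if value < lower:
        return beta * (lower - value)
    return 0.0


def reward(f1, f2, f3, penalties):
    """Immediate reward: negative of the objective (conventional unit cost
    f1, CHP cost f2, wind curtailment penalty f3) minus all over-limit
    penalties, so that maximizing reward minimizes cost and violations."""
    return -(f1 + f2 + f3) - sum(penalties)
```

For example, a node voltage of 1.08 p.u. against assumed limits of 0.95 to 1.05 with `beta = 100` gives a penalty of about 3, which lowers the reward by the same amount. A continuous penalty of this kind is easier for a neural network to fit than a step function, in line with the remark on step-function penalties elsewhere in the text.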
  • model training module 704 is configured to assign a value to the target network of the evaluator, and set the capacity D of the intelligent memory bank;
  • S t ) is sampled from the actor policy network, and the control action is applied to the electric-thermal integrated energy system; the wind power uncertainty is sampled to obtain the next operating state S t+1 of the system, and the state transition and reward are put into the experience database D; the evaluator Q network, the actor policy network, the temperature coefficient and the target network are then updated, and training yields the policy network that serves as the optimal scheduling model of the SAC framework.
  • FIG. 8 shows a schematic structural diagram of an electronic device 8 provided by an embodiment of the present disclosure. As shown in FIG. 8, it includes a memory 801, a processor 802, and a computer program that is stored in the memory and can run on the processor; when the processor executes the computer program, it implements the method for coordination and optimization of an electric-thermal comprehensive energy system provided in any one of the previous embodiments.
  • An embodiment of the present disclosure also provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, it implements the coordinated optimization method for the electric-thermal comprehensive energy system described in any one of the previous embodiments.
  • An embodiment of the present disclosure also provides a computer program, where the computer program includes computer readable codes.
  • When the computer readable codes run in an electronic device, the processor of the electronic device executes the coordinated optimization method for the electric-thermal integrated energy system provided in any preceding embodiment.
  • the memory 801 may include random access memory (Random Access Memory, RAM), read-only memory (Read Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc.
  • the processor 802 may be an integrated circuit chip with signal processing capabilities.
  • the above-mentioned processor can be a general-purpose processor, including a CPU, a network processor (Network Processor, NP), etc.; it can also be a DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over at least one of a network, such as the Internet, a local area network, a wide area network, and a wireless network.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Industry Standard Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages.
  • Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • electronic circuits, such as programmable logic circuits, FPGAs, or programmable logic arrays (Programmable Logic Arrays, PLAs), can be customized by using state information of the computer-readable program instructions; these circuits can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media capable of storing program codes, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk or an optical disk.
  • An embodiment of the present disclosure provides a coordinated optimization method, system, device, medium and program for an electric-thermal comprehensive energy system, wherein the method is executed by an electronic device and includes: acquiring real-time electric-thermal comprehensive energy system parameters; calculating, based on these parameters, the real-time power generation power of the power system, thermal system and coupling device of the electric-thermal integrated energy system respectively; and inputting the real-time power generation power into the pre-trained optimal scheduling model based on the SAC framework, which outputs the scheduling action to form a coordination strategy for the electric-thermal integrated energy system.
  • the present disclosure can directly give scheduling actions through the trained policy network without performing traditional nonlinear overall iterative solution, so that the calculation speed is significantly improved.

Abstract

Disclosed in the present application are a coordination and optimization method and system for a comprehensive electric-thermal energy system, and a device, a medium and a program, wherein the method is executed by an electronic device. The method comprises: acquiring real-time parameters of a comprehensive electric-thermal energy system; on the basis of the real-time parameters of the comprehensive electric-thermal energy system, respectively calculating the real-time power-generation power of an electric power system, a thermodynamic system and a coupling apparatus of the comprehensive electric-thermal energy system; and inputting the real-time power-generation power into a pre-trained SAC framework-based optimization scheduling model, and outputting a scheduling action, to form a coordination strategy for the comprehensive electric-thermal energy system.

Description

Coordinated optimization method, system, device, medium and program for an electric-thermal integrated energy system
Cross-Reference to Related Applications
This patent application claims priority to Chinese patent application No. 202111349881.4, filed on November 15, 2021 by China Electric Power Research Institute Co., Ltd. and entitled "Coordinated optimization method, system, device and storage medium for electric-thermal integrated energy system", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the field of optimal dispatching of integrated energy systems, and in particular to a coordinated optimization method, system, device, medium and program for an electric-thermal integrated energy system.
Background
Against the background of the Energy Internet, studying the coordination and optimization of electric-thermal integrated energy systems has become an effective way to further improve energy utilization, alleviate the energy crisis, and break through the traditional energy system structure and industry barriers.
By exploiting the complementary characteristics of heat and electricity, an electric-thermal integrated energy system can promote the consumption of renewable energy and improve energy utilization. In practice, approaches to solving the optimization problem of such a system mainly include traditional nonlinear methods such as particle swarm optimization, intelligent algorithms such as Q-learning, and the Deep Q Network (DQN) algorithm. However, both particle swarm optimization and Q-learning suffer from insufficient accuracy, slow computation and a limited scope of application, while the DQN algorithm has insufficient exploration ability and easily falls into local optima.
In general, the optimal dispatching problem of increasingly tightly coupled integrated energy systems is highly nonlinear, and an economical, accurate and reliable solution method is currently lacking. An intelligent algorithm that converges reliably, explores optimal strategies well and meets high accuracy requirements is therefore urgently needed to optimize electric-thermal integrated energy systems.
Summary
Embodiments of the present disclosure provide a coordinated optimization method, system, device, medium and program for an electric-thermal integrated energy system.
An embodiment of the present disclosure provides a coordinated optimization method for an electric-thermal integrated energy system. The method is executed by an electronic device and includes:
acquiring real-time electric-thermal integrated energy system parameters;
calculating, based on the real-time electric-thermal integrated energy system parameters, the real-time power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal integrated energy system respectively;
inputting the real-time power generation power into a pre-trained optimal scheduling model based on the Soft Actor-Critic (SAC) framework, and outputting scheduling actions to form a coordination strategy for the electric-thermal integrated energy system;
The training and optimization method of the pre-trained optimal scheduling model based on the SAC framework includes:
acquiring historical electric-thermal integrated energy system parameters;
calculating, based on the historical electric-thermal integrated energy system parameters, the historical power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal integrated energy system respectively, and establishing an electric-thermal integrated energy system scheduling model on the basis of the historical power generation power of the electric power system, the thermal system and the coupling device;
establishing an optimal scheduling model based on the SAC framework by taking the reinforcement learning environment, state, action and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model;
training the optimal scheduling model based on the SAC framework to obtain the pre-trained optimal scheduling model based on the SAC framework;
Establishing the optimal scheduling model based on the SAC framework by taking the reinforcement learning environment, state, action and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model includes:
setting the action variable as
Figure PCTCN2022107149-appb-000001
where P i G ,
Figure PCTCN2022107149-appb-000002
and P i chp are, in order, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device;
determining the state space variable as
Figure PCTCN2022107149-appb-000003
where P i G , P load , P w , P i chp , H load ,
Figure PCTCN2022107149-appb-000004
and T e are, in order, the generating power of the conventional unit, the electric load, the wind power generating power, the generating power of the combined heat and power device, the heat load, the thermal power of the combined heat and power device, and the ambient temperature;
building a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment to obtain the immediate reward and the state of the next period, providing rewards for policy evaluation;
setting the reinforcement learning goal to maximize the long-term reward, using the negative of the optimization objective as the immediate reward, and adding a penalty mechanism set according to the constraints to the immediate reward to obtain the final reward function, where the penalty mechanism is:
Figure PCTCN2022107149-appb-000005
Figure PCTCN2022107149-appb-000006
where β v is the penalty coefficient, and the constant coefficients are set according to the corresponding over-limit penalty;
The reward function is
Figure PCTCN2022107149-appb-000007
where f 1 , f 2 , and f 3 are the operating cost of the conventional unit, the operating cost of the combined heat and power device, and the wind curtailment penalty, respectively;
Figure PCTCN2022107149-appb-000008
and
Figure PCTCN2022107149-appb-000009
are the penalty items for conventional unit output over-limit and ramping over-limit, respectively; φ V is the system node voltage over-limit penalty;
Figure PCTCN2022107149-appb-000010
and
Figure PCTCN2022107149-appb-000011
are the penalty items for combined heat and power unit output over-limit and ramping over-limit, respectively; φ T is the system node temperature over-limit penalty; φ m is the system pipeline mass flow rate over-limit penalty.
An embodiment of the present disclosure also provides a coordinated optimization system for an electric-thermal integrated energy system, including:
a first parameter acquisition module, configured to acquire real-time electric-thermal integrated energy system parameters;
a power calculation module, configured to calculate, based on the real-time electric-thermal integrated energy system parameters, the real-time power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal integrated energy system respectively;
a scheduling output module, configured to input the real-time power generation power into the pre-trained optimal scheduling model based on the SAC framework and output scheduling actions to form a coordination strategy for the electric-thermal integrated energy system;
The pre-trained optimal scheduling model based on the SAC framework in the scheduling output module includes:
a second parameter acquisition module, configured to acquire historical electric-thermal integrated energy system parameters;
a model building module, configured to calculate, based on the historical electric-thermal integrated energy system parameters, the historical power generation power of the electric power system, the thermal system and the coupling device of the electric-thermal integrated energy system respectively, and to establish an electric-thermal integrated energy system scheduling model on the basis of the historical power generation power of the electric power system, the thermal system and the coupling device;
a model optimization module, configured to establish an optimal scheduling model based on the SAC framework by taking the reinforcement learning environment, state, action and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model;
a model training module, configured to train the optimal scheduling model based on the SAC framework to obtain the pre-trained optimal scheduling model based on the SAC framework.
The model optimization module is configured to set the action variable as
Figure PCTCN2022107149-appb-000012
where P i G ,
Figure PCTCN2022107149-appb-000013
and P i chp are, in order, the generating power of the conventional unit, the thermal power of the combined heat and power device, and the generating power of the combined heat and power device;
determine the state space variable as
Figure PCTCN2022107149-appb-000014
where P i G , P load , P w , P i chp , H load ,
Figure PCTCN2022107149-appb-000015
and T e are, in order, the generating power of the conventional unit, the electric load, the wind power generating power, the generating power of the combined heat and power device, the heat load, the thermal power of the combined heat and power device, and the ambient temperature;
build a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment to obtain the immediate reward and the state of the next period, providing rewards for policy evaluation;
set the reinforcement learning goal to maximize the long-term reward, use the negative of the optimization objective as the immediate reward, and add a penalty mechanism set according to the constraints to the immediate reward to obtain the final reward function, where the penalty mechanism is:
Figure PCTCN2022107149-appb-000016
Figure PCTCN2022107149-appb-000017
where β v is the penalty coefficient, and the constant coefficients are set according to the corresponding over-limit penalty;
The reward function is
Figure PCTCN2022107149-appb-000018
where f 1 , f 2 , and f 3 are the operating cost of the conventional unit, the operating cost of the combined heat and power device, and the wind curtailment penalty, respectively;
Figure PCTCN2022107149-appb-000019
and
Figure PCTCN2022107149-appb-000020
are the penalty items for conventional unit output over-limit and ramping over-limit, respectively; φ V is the system node voltage over-limit penalty;
Figure PCTCN2022107149-appb-000021
and
Figure PCTCN2022107149-appb-000022
are the penalty items for combined heat and power unit output over-limit and ramping over-limit, respectively; φ T is the system node temperature over-limit penalty; φ m is the system pipeline mass flow rate over-limit penalty.
An embodiment of the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, it implements any of the foregoing coordinated optimization methods for an electric-thermal integrated energy system.
An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, it implements any of the foregoing coordinated optimization methods for an electric-thermal integrated energy system.
An embodiment of the present disclosure also provides a computer program including computer readable codes; when the computer readable codes run in an electronic device, the processor of the electronic device executes any of the foregoing coordinated optimization methods for an electric-thermal integrated energy system.
The present disclosure uses an optimal scheduling model based on the optimized SAC framework to give scheduling actions for a given system state such as load, and thereby generate a strategy. Once the system state such as load is given, the algorithm can obtain the scheduling action directly from the trained policy network, without the traditional nonlinear overall iterative solution, so the calculation speed is significantly improved and the computation is more efficient.
After establishing the electric-thermal integrated energy system model, the present disclosure establishes an optimal scheduling model based on the SAC framework. The SAC algorithm is a stochastic off-policy algorithm whose self-optimizing nature lets it learn the integrated energy optimal scheduling problem autonomously: it evaluates the quality of a policy from the reward values observed while interacting with the environment, and learns to explore all optimal policy paths, so that the cost over the scheduling cycle is optimized without supervision. The trained network model avoids the curse of dimensionality caused by discretizing states and actions. It also converges reliably and has a wide scope of application; for example, it can be applied to scenarios that consider random variations in the output of renewable energy such as wind power and in loads, which makes it more generally applicable.
Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. These drawings are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the embodiments. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 shows a schematic flowchart of the coordinated optimization method for an electric-thermal integrated energy system provided by an embodiment of the present disclosure;
Fig. 2 shows a schematic flowchart of the training method for the pre-trained SAC-framework-based optimal scheduling model provided by an embodiment of the present disclosure;
Fig. 3 shows a schematic structural diagram of the electric-thermal integrated energy system;
Fig. 4 shows a schematic structural diagram of the thermal system;
Fig. 5 shows a flowchart of the specific algorithm used to train the SAC-framework-based optimal scheduling model;
Fig. 6 shows a schematic structural diagram of the coordinated optimization system for an electric-thermal integrated energy system provided by an embodiment of the present disclosure;
Fig. 7 shows a schematic structural diagram of the pre-trained SAC-framework-based optimal scheduling model provided by an embodiment of the present disclosure;
Fig. 8 shows a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of configurations. Accordingly, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present disclosure but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the field of electric-thermal integrated energy system optimization, the related art provides a technical solution based on the particle swarm optimization algorithm. Particle swarm optimization is an evolutionary computation technique: starting from random initial values, it seeks the optimum iteratively through the simple behavior of individual particles and the exchange of information between the swarm and its environment. To implement this solution in the optimal dispatch model of an integrated electricity-heat energy system (IEHS), an objective function must first be established and the constraints determined, such as the power flow constraints of the power grid and the heating network, equipment output constraints, and safe operation constraints; the particle swarm algorithm is then used to solve the IEHS optimization problem.
The particle swarm algorithm proceeds as follows:
1) Set the parameters: the number of iterations, the number of independent variables, the maximum particle velocity, and the initial velocities and positions of the swarm;
2) Define the fitness function: determine the optimization objective according to the IEHS optimal dispatch model. In each iteration, the best solution found by a particle is that particle's extreme value, and the global best solution is the minimum over all particles; it is compared with the previous global best, and the particles are updated according to formula (1):
Figure PCTCN2022107149-appb-000023
where $i$ is an integer greater than or equal to 1 and no greater than the total number of particles in the swarm; $V_{id}$ is the velocity of the particle; $P_{gd}$ is the extreme value of the particle; $X_{id}$ is the current position of the particle; random(0,1) is a random number between 0 and 1; $C_1$ and $C_2$ are learning factors; and $\omega$ is the inertia factor, whose value is non-negative.
3) Stopping condition: the maximum number of iterations is reached, or the difference between successive iterations meets the accuracy requirement.
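The three steps above can be sketched as a small routine. This is a minimal illustration that assumes the textbook velocity and position update (inertia term plus attraction to the personal and global bests); the exact form of formula (1) is only available as an image, and the names `pso_minimize`, `fitness`, `bounds`, and `tol` are hypothetical.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 omega=0.7, c1=1.5, c2=1.5, v_max=1.0, tol=1e-8, seed=0):
    """Minimise fitness(x) inside box bounds with a basic particle swarm.

    Assumption: textbook update rule, not necessarily identical to formula (1).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: initial positions and velocities of the swarm.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    # Step 2: personal bests and the global best (minimum over all particles).
    p_best = [p[:] for p in pos]
    p_val = [fitness(p) for p in pos]
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]
    for _ in range(n_iter):
        prev = g_val
        for i in range(n_particles):
            for d in range(dim):
                v = (omega * vel[i][d]
                     + c1 * rng.random() * (p_best[i][d] - pos[i][d])
                     + c2 * rng.random() * (g_best[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # respect maximum velocity
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = pos[i][:], val
                if val < g_val:
                    g_best, g_val = pos[i][:], val
        # Step 3: stop early once an improvement is smaller than the tolerance.
        if 0 < prev - g_val < tol:
            break
    return g_best, g_val


# Toy usage: minimise a two-variable quadratic standing in for a dispatch cost.
best_x, best_cost = pso_minimize(lambda x: x[0] ** 2 + x[1] ** 2,
                                 [(-5.0, 5.0), (-5.0, 5.0)])
```

Because every candidate solution must be re-evaluated in every iteration, the cost of each fitness call (here a trivial quadratic, in practice a full electric-thermal flow calculation) dominates the run time.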
The above algorithm has the following disadvantages: (1) particle swarm optimization converges poorly and may even fail to converge, and it is prone to becoming trapped in local optima; it therefore cannot obtain the optimal solution of the economic dispatch of the IEHS, and its limited accuracy defeats the purpose of multi-energy collaborative optimization; (2) as the coupling tightens, the IEHS becomes highly nonlinear, and the computation speed of particle swarm optimization drops significantly, so that its efficiency cannot meet the requirements of economic dispatch.
In the field of electric-thermal integrated energy system optimization, the related art also provides a solution based on the Q-learning algorithm. Q-learning is built on the Markov decision process and is a model-free reinforcement learning algorithm. The general steps for optimizing an electric-thermal integrated energy system with Q-learning are: design the action and state spaces; discretize the continuous action and state spaces; establish the Q-learning reward and penalty mechanism according to the system optimization objective and operating constraints; and let the agent explore by trial and error, interacting with the environment and updating the Q-value table until it can select the optimal action autonomously.
Within each training pass over the Q-value table, for a given time $t$, an action $a_t$ is selected from the Q-value table according to the state $s_t$ at that time. Applying the action to the environment yields an immediate reward and completes the transition to the next state $s_{t'}$. According to the Bellman optimality principle, the optimal index corresponding to the optimal policy is the sum of the immediate reward $r_t$ obtained by the action $a_t$ of the electric-thermal integrated energy system agent at that time and the maximum Q value $\max_{a'} Q(s_{t'}, a_{t'})$ obtained from the subsequent state transition. The Q-value table can therefore be updated by formula (2):
$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r_t + \gamma \max_{a'} Q(s_{t'}, a_{t'}) - Q(s,a)\right] \qquad (2)$$
After repeated training, the agent can choose the optimal control action for the electric-thermal integrated energy system from the input state information and the Q-value table. In formula (2), the learning rate $\alpha$ and the discount factor $\gamma$ both take values in [0, 1].
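The update in formula (2) can be illustrated with a dictionary-backed Q-value table. This is a minimal sketch; the state and action labels (`"low_load"`, `"raise_chp"`, and so on) and the function name `q_update` are hypothetical and only stand in for discretized system states and dispatch actions.

```python
from collections import defaultdict


def q_update(q_table, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update, following formula (2)."""
    # Maximum Q value reachable from the next state over all discrete actions.
    best_next = max(q_table[(s_next, a2)] for a2 in actions)
    # Move Q(s, a) towards the target r + gamma * max_a' Q(s', a').
    q_table[(s, a)] += alpha * (r + gamma * best_next - q_table[(s, a)])
    return q_table[(s, a)]


# Hypothetical discretized states and actions; unseen entries default to 0.
q = defaultdict(float)
actions = ["raise_chp", "lower_chp"]
new_q = q_update(q, "low_load", "raise_chp", r=-5.0,
                 s_next="high_load", actions=actions)
```

The table holds one entry per (state, action) pair, which is why its size grows quickly once the continuous quantities of the system are discretized finely.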
The above solution has the following disadvantages:
(1) The action and state spaces of the electric-thermal integrated energy system are mostly continuous intervals. Q-learning requires these continuous spaces to be discretized, and computing on the discretized action and state spaces greatly reduces accuracy. (2) Q-learning is not suitable for large-scale electric-thermal integrated energy system optimization problems. A larger problem increases the dimension of the action space and the complexity of the network, and a larger action space greatly increases the dimension of the Q index, which makes training harder and the Q index poorly fitted, so that a highly complex electric-thermal integrated energy system network is difficult to model.
In the field of electric-thermal integrated energy system optimization, the related art also provides a solution based on DQN. DQN is the product of combining deep learning with reinforcement learning and its decision-making ability. When building its deep network, DQN learns the control policy directly from high-dimensional raw data, which extends the practicality of reinforcement learning. To apply DQN to electric-thermal integrated energy system optimization, the action and state spaces are designed first, a deep network is built to fit the Q values, and an experience replay buffer is built to store historical samples. In each training step the replay buffer is sampled at random, and the Q network is trained on the sampled batch.
DQN first obtains an observation from the environment, and the agent obtains all $Q(s,a)$ values for that observation from the value-function neural network. The agent then uses the policy algorithm to make a decision and obtain an action, and receives the reward $r$ fed back by the environment. The reward $r$ is then used to update the parameters of the value-function network, and the next iteration begins. This repeats until network training is complete.
During this training process, DQN must define a loss function and update the parameters with gradient descent. By continually updating the weights of the neural network, the output of the Q network gradually approaches the optimal Q value. The loss function is defined on a residual model, that is, the square of the difference between the true value and the network output, as shown in formula (3):
Figure PCTCN2022107149-appb-000024
To reduce correlation and improve stability, DQN adds a target Q network alongside the original Q network. The target network has the same structure and the same initial weights as the Q network, but whereas the Q network updates its parameters in every iteration, the parameters of the target Q network are updated only at fixed intervals.
Compared with Q-learning, the above solution is better suited to continuous control scenarios. However, exploring a continuous action space is considerably harder, and with a fixed policy algorithm DQN can hardly guarantee effective exploration of the state space, so it may converge to a local optimum.
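The squared-residual loss and the periodically updated target network described above can be sketched without any deep learning library. This is an illustration under stated assumptions: the "true value" in the loss is taken to be the usual temporal-difference target computed with the target network, terminal transitions are assumed not to bootstrap, and the names `td_targets`, `dqn_loss`, and `maybe_sync` are hypothetical. The Q networks are represented by plain callables that map a state to a list of Q values.

```python
def td_targets(batch, target_q, gamma=0.99):
    """Targets y = r + gamma * max_a' Q_target(s', a'); y = r at terminal steps."""
    targets = []
    for _s, _a, r, s_next, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(s_next))
        targets.append(r + bootstrap)
    return targets


def dqn_loss(batch, online_q, target_q, gamma=0.99):
    """Mean squared difference between the targets and the online Q outputs."""
    targets = td_targets(batch, target_q, gamma)
    squared = [(y - online_q(s)[a]) ** 2
               for y, (s, a, _r, _sn, _d) in zip(targets, batch)]
    return sum(squared) / len(squared)


def maybe_sync(step, online_params, target_params, sync_every=100):
    """Copy the online parameters into the target network every sync_every steps."""
    if step % sync_every == 0:
        return list(online_params)
    return target_params


# Toy stand-ins for the two networks: fixed Q values for two discrete actions.
online = lambda s: [1.0, 2.0]
target = lambda s: [0.5, 1.5]
# One sampled transition (state, action index, reward, next state, done flag).
loss = dqn_loss([(0, 1, 1.0, 1, False)], online, target, gamma=0.9)
```

Keeping the target network fixed between synchronizations means the regression target does not shift with every gradient step, which is the stabilizing effect the text attributes to it.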
In view of the above problems, the present disclosure uses the SAC algorithm to solve the economic dispatch problem of the electric-thermal integrated energy system and proposes a coordinated optimization method for that system. Fig. 1 shows a schematic flowchart of the method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method is executed by an electronic device and includes the following steps:
Step 101: Obtain the real-time parameters of the electric-thermal integrated energy system.
Step 102: Based on the real-time parameters, separately calculate the real-time generated power of the electric power system, the thermal system, and the coupling device of the electric-thermal integrated energy system.
Step 103: Input the real-time generated power into the pre-trained SAC-framework-based optimal scheduling model, output the scheduling actions, and form the coordination strategy of the electric-thermal integrated energy system.
Fig. 2 shows a schematic flowchart of the training method for the pre-trained SAC-framework-based optimal scheduling model provided by an embodiment of the present disclosure. As shown in Fig. 2, the process includes:
Step 201: Obtain the historical parameters of the electric-thermal integrated energy system.
Step 202: Based on the historical parameters, separately calculate the historical generated power of the electric power system, the thermal system, and the coupling device of the electric-thermal integrated energy system, and establish the dispatch model of the electric-thermal integrated energy system on the basis of these historical values.
Step 203: Taking the reinforcement learning environment, state, action, and reward as the basic elements, and combining them with the dispatch model of the electric-thermal integrated energy system, establish the SAC-framework-based optimal scheduling model.
Step 204: Train the SAC-framework-based optimal scheduling model to obtain the pre-trained SAC-framework-based optimal scheduling model.
The SAC-based economic dispatch method of the present disclosure adopts a continuous control policy and adds maximum entropy to the optimization objective. It can interact with the electric-thermal integrated energy system, learn, and generate the optimal control policy, which resolves the high dimensionality, poor convergence, and imprecise optimization encountered when solving the collaborative optimization of the system. The present disclosure therefore provides strong technical support for the collaborative optimization of integrated energy systems and a basis for decisions by integrated energy dispatchers, and it is of great significance for achieving multi-energy complementarity, accommodating renewable energy, and improving the operating economy of the system.
The present disclosure uses deep reinforcement learning to solve the economic dispatch problem of the electric-thermal integrated energy system, mainly to address the following shortcomings of traditional methods: 1) Deep reinforcement learning uses neural networks to fit the optimal policy under different states of the system, and once the network is trained, dispatch strategies are obtained in real time, whereas traditional nonlinear algorithms must perform a global search each time; deep reinforcement learning therefore improves computational efficiency. 2) In the optimal dispatch problem of the electric-thermal integrated energy system, deep reinforcement learning explores more effectively and converges more stably, and it achieves a lower dispatch cost than intelligent algorithms such as particle swarm optimization. 3) The SAC-based deep reinforcement learning economic dispatch method proposed here adopts a continuous control policy, which overcomes the high-dimensional solution difficulty caused by discretizing variables in value-function-based reinforcement learning, and it adds maximum entropy to the optimization objective so as to explore the various optimal possibilities.
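For orientation, the maximum-entropy objective that SAC is generally understood to optimize can be written as follows. This is the standard form from the SAC literature, not a formula quoted from this specification; the symbols $\rho_\pi$ (state-action distribution under policy $\pi$) and $\mathcal{H}$ (entropy) are introduced here for illustration, and the temperature $\alpha$ is unrelated to the learning rate $\alpha$ in formula (2).

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \big[ \log \pi(a \mid s_t) \big]
```

The entropy term rewards policies that remain stochastic, which is how the method keeps exploring several near-optimal dispatch options rather than committing early to one.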
The coordinated optimization method for the electric-thermal integrated energy system is described in detail below with reference to specific implementations and the drawings.
The method of the present disclosure includes the following steps:
Step 1: Import the parameters of the electric-thermal integrated energy system.
Fig. 3 shows a schematic structural diagram of the electric-thermal integrated energy system; the system shown in Fig. 3 comprises an electric power system 301 and a thermal system 302. In the embodiments of the present disclosure, the network parameters of the electric-thermal integrated energy system shown in Fig. 3, the electric and heat load output, and the wind power output must be collected first. In the embodiments of the present disclosure, both the real-time parameters and the historical parameters of the electric-thermal integrated energy system comprise these network parameters, the electric and heat load output, and the wind power output. The collected data are shown in Table 1.
Table 1. Parameters of the electric-thermal integrated energy system
Figure PCTCN2022107149-appb-000025
Calculating the real-time generated power of the electric power system, the thermal system, and the coupling device of the electric-thermal integrated energy system from the real-time parameters can be implemented through step 2:
Step 2: Establish the electric-thermal integrated energy system model. In the present disclosure, the electric-thermal integrated energy system is modeled in three parts: the electric power system, the thermal system, and the coupling device.
(1) Electric power system. AC power flow is adopted as the analysis method for the electric power system; the power balance equation of the electric power system can be written as formula (4):
Figure PCTCN2022107149-appb-000026
In formula (4), $P_i$ and $Q_i$ are the active and reactive power injected at node $i$, respectively; $V_i$ is the voltage magnitude at node $i$; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch $ij$, respectively; $\theta_{ij}$ is the phase angle difference across branch $ij$; and
Figure PCTCN2022107149-appb-000027
is the set of power system nodes.
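The quantities defined for formula (4) can be combined as in the following sketch, which assumes the standard polar-form AC power flow equations. It is an assumption that formula (4) has this form, and here `g` and `b` are taken to be the real and imaginary parts of the bus admittance matrix (the usual convention), not individual branch parameters; the function name `power_injections` is hypothetical.

```python
import math


def power_injections(v, theta, g, b):
    """Active and reactive power injected at each node (polar AC power flow).

    P_i = V_i * sum_j V_j * (G_ij * cos(theta_ij) + B_ij * sin(theta_ij))
    Q_i = V_i * sum_j V_j * (G_ij * sin(theta_ij) - B_ij * cos(theta_ij))
    """
    n = len(v)
    p = [0.0] * n
    q = [0.0] * n
    for i in range(n):
        for j in range(n):
            t = theta[i] - theta[j]  # phase angle difference theta_ij
            p[i] += v[i] * v[j] * (g[i][j] * math.cos(t) + b[i][j] * math.sin(t))
            q[i] += v[i] * v[j] * (g[i][j] * math.sin(t) - b[i][j] * math.cos(t))
    return p, q


# Two-node toy network with one line of series admittance 1 - j5 (per unit).
G = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-5.0, 5.0], [5.0, -5.0]]
p_flat, q_flat = power_injections([1.0, 1.0], [0.0, 0.0], G, B)
p_ang, q_ang = power_injections([1.0, 1.0], [0.1, 0.0], G, B)
```

With equal voltages and angles no power flows; once an angle difference appears, the sum of the active injections equals the line loss, which gives a quick sanity check on an implementation.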
(2) Thermal system. The district heating system supplies heat by central heating; Fig. 4 shows a schematic structural diagram of the thermal system. As shown in Fig. 4, the heat source of the thermal system generates thermal energy, which is delivered to the heat loads through the supply pipeline, forming a first path 401; after being cooled at the heat loads, the water flows back through the return pipeline, forming a second path 402. The first path 401 and the second path 402 form a closed loop. The thermal system model is divided into a hydraulic model and a thermal model:
1) Hydraulic model. The hydraulic model of the thermal system describes the flow of the medium and consists of the flow continuity equation, the loop pressure equation, and the head loss equation, as shown in formula (5).
Figure PCTCN2022107149-appb-000028
In the formula, $A_h$ is the node-branch incidence matrix; $B$ is the loop-branch incidence matrix;
Figure PCTCN2022107149-appb-000029
is the mass flow rate in the pipelines;
Figure PCTCN2022107149-appb-000030
is the flow injected at the nodes; $h_f$ is the head loss; and $K$ is the resistance coefficient of the pipeline.
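For reference, the three equations named above are commonly written as follows in the district heating literature. This is an assumed form, since formula (5) is only available as an image; the symbols $\dot{m}$ (pipeline mass flow rates) and $\dot{m}_q$ (nodal injections) are notation chosen here for the two quantities shown as figures.

```latex
\begin{aligned}
A_h \, \dot{m} &= \dot{m}_q        && \text{(flow continuity at each node)} \\
B \, h_f &= 0                      && \text{(head losses sum to zero around each loop)} \\
h_f &= K \, \dot{m} \, |\dot{m}|   && \text{(head loss in each pipeline)}
\end{aligned}
```

The quadratic dependence of head loss on flow is one source of the nonlinearity that makes the coupled dispatch problem hard for iterative solvers.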
2) Thermal model. The thermal model describes the energy transfer process and consists of the node power equation, the pipeline temperature drop equation, and the node mixing equation, as shown in formula (6):
Figure PCTCN2022107149-appb-000031
In the formula, $H_i$ is the heat power injected at node $i$; $C_p$ is the specific heat capacity of water; $T_{s,i}$ and $T_{o,i}$ are the supply water temperature and the outlet water temperature at node $i$; the subscript $ij$ in $T_{j(ij)}$ denotes the heating network pipeline branch whose start and end nodes are $i$ and $j$; $T_{i(ij)}$ and $T_{j(ij)}$ are the temperatures at the $i$ end and the $j$ end of that branch; and $T_e$ is the ambient temperature.
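The three parts of the thermal model can be sketched as small functions. The forms below are the ones commonly used for district heating networks and are assumptions, since formula (6) is only available as an image. In particular, the pipeline heat transfer coefficient `lam` (per unit length) and the pipeline length `length` do not appear among the symbols defined above and are hypothetical parameters, as are all function names.

```python
import math

CP_WATER = 4182.0  # specific heat capacity of water in J/(kg*K), approximate


def node_heat_power(m_q, t_supply, t_outlet, cp=CP_WATER):
    """Node power: H_i = C_p * m_q * (T_s,i - T_o,i), in watts."""
    return cp * m_q * (t_supply - t_outlet)


def pipe_end_temperature(t_start, t_env, m, length, lam, cp=CP_WATER):
    """Temperature drop along a pipeline:
    T_j = (T_i - T_e) * exp(-lam * L / (C_p * m)) + T_e
    """
    return (t_start - t_env) * math.exp(-lam * length / (cp * m)) + t_env


def mixed_temperature(flows_in, temps_in):
    """Node mixing: flow-weighted average of the incoming water temperatures."""
    return sum(m * t for m, t in zip(flows_in, temps_in)) / sum(flows_in)


# Toy usage with illustrative numbers (flow in kg/s, temperature in deg C).
h = node_heat_power(2.0, 90.0, 50.0)
t_end = pipe_end_temperature(90.0, 10.0, m=2.0, length=500.0, lam=0.2)
t_mix = mixed_temperature([1.0, 3.0], [80.0, 60.0])
```

The pipeline outlet temperature always lies between the ambient temperature and the inlet temperature, and approaches the inlet temperature as the flow rate increases.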
(3) Coupling device
As an example, the combined heat and power unit is an extraction-condensing unit whose operating point lies within a polygonal region; the electric and heat power produced by the coupling device can be expressed by formula (7):
Figure PCTCN2022107149-appb-000032
In the formula,
Figure PCTCN2022107149-appb-000033
are the electric output and the heat output of the $i$-th extraction-condensing unit in period $t$, respectively;
Figure PCTCN2022107149-appb-000034
are the upper and lower limits of the electric output, respectively; $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the coefficients that describe the polygonal region and are constants for a given combined heat and power unit.
As an example, establishing the dispatch model of the electric-thermal integrated energy system on the basis of the historical generated power of the electric power system, the thermal system, and the coupling device can be implemented through the following steps:
Step 2-1: Establish the objective function.
The objective is to minimize the total operating cost of the electric-thermal integrated energy system. To maximize the accommodation of renewable energy at the same time, the portion of renewable energy that is not accommodated is added as a penalty term. The objective function can be written as formula (8):
$$\min F = f_1 + f_2 + f_3 \qquad (8)$$
In the formula, $f_1$ is the operating cost of the conventional units, $f_2$ is the operating cost of the combined heat and power units, and $f_3$ is the wind curtailment penalty.
1) The operating cost $f_1$ of the conventional units can be obtained from formula (9):
Figure PCTCN2022107149-appb-000035
In the formula,
Figure PCTCN2022107149-appb-000036
is the generated power of the conventional unit; $b_0$, $b_1$ and $b_2$ are the energy consumption coefficients of the conventional units; $N_G$ is the number of conventional units; $T$ is the scheduling period; and $\Delta t$ is the scheduling time interval.
2) The operating cost $f_2$ of the combined heat and power units can be obtained from formula (10):
Figure PCTCN2022107149-appb-000037
In the formula,
Figure PCTCN2022107149-appb-000038
are the generated power and the heat power of the combined heat and power unit connected to node $i$ in period $t$, respectively; $a_0$, $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ are the energy consumption coefficients of the combined heat and power units; and $N_{chp}$ is the number of combined heat and power units.
3) The wind curtailment penalty $f_3$ can be calculated from formula (11):
Figure PCTCN2022107149-appb-000039
In the formula,
Figure PCTCN2022107149-appb-000040
is the output of the wind turbine connected to node $i$ in period $t$, and $k$ is the wind curtailment penalty coefficient, a constant.
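The three cost terms of formula (8) can be combined as in the sketch below. Formulas (9) to (11) are only available as images, so the forms used here are assumptions: a quadratic fuel cost for conventional units, a quadratic cost in both electric and heat output for combined heat and power units (the assignment of $a_0$ to $a_5$ to individual terms is a guess), and a penalty proportional to the unused available wind power. All function and argument names are hypothetical; each power series is indexed as `series[t][i]`.

```python
def conventional_cost(p_g, b, dt):
    """f1 (assumed form): sum over periods and units of (b0 + b1*P + b2*P^2) * dt."""
    b0, b1, b2 = b
    return sum((b0 + b1 * p + b2 * p * p) * dt for row in p_g for p in row)


def chp_cost(p_chp, h_chp, a, dt):
    """f2 (assumed form): quadratic in electric output P and heat output H."""
    a0, a1, a2, a3, a4, a5 = a
    total = 0.0
    for p_row, h_row in zip(p_chp, h_chp):
        for p, h in zip(p_row, h_row):
            total += (a0 + a1 * p + a2 * h + a3 * p * p + a4 * h * h + a5 * p * h) * dt
    return total


def curtailment_penalty(p_w_avail, p_w, k, dt):
    """f3 (assumed form): k times the wind energy that was available but unused."""
    return k * sum((avail - used) * dt
                   for row_a, row_u in zip(p_w_avail, p_w)
                   for avail, used in zip(row_a, row_u))


def total_cost(p_g, p_chp, h_chp, p_w_avail, p_w, b, a, k, dt):
    """Objective F = f1 + f2 + f3 from formula (8)."""
    return (conventional_cost(p_g, b, dt)
            + chp_cost(p_chp, h_chp, a, dt)
            + curtailment_penalty(p_w_avail, p_w, k, dt))


# Toy usage: one period, one unit of each kind, illustrative coefficients.
F = total_cost(p_g=[[10.0]], p_chp=[[2.0]], h_chp=[[3.0]],
               p_w_avail=[[5.0]], p_w=[[3.0]],
               b=(1.0, 2.0, 0.5), a=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
               k=10.0, dt=1.0)
```

In a reinforcement learning formulation, the negative of such a cost (plus any over-limit penalties) is a natural candidate for the per-step reward.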
Step 2-2: Establish the constraints of the dispatch model of the electric-thermal integrated energy system.
The constraints include node power balance equality constraints, network security constraints, combined heat and power unit constraints, renewable energy constraints, and conventional unit output constraints.
1) Node power balance equality constraints. Formulas (12) and (13) express the active power balance equations of the network nodes.
Figure PCTCN2022107149-appb-000041
Figure PCTCN2022107149-appb-000042
where
Figure PCTCN2022107149-appb-000043
are the sets of power system nodes and thermal system nodes, and $T$ is the scheduling horizon;
Figure PCTCN2022107149-appb-000044
are the electric load power and the heat load power of node $i$ in period $t$, respectively.
2) Network security constraints
For the electric-thermal integrated energy system to operate safely and reliably, it should satisfy the network security constraints of formulas (14) to (16).
Figure PCTCN2022107149-appb-000045
Figure PCTCN2022107149-appb-000046
Figure PCTCN2022107149-appb-000047
式中:V i,max、V i,min分别为节点i电压幅值的上限和下限;T sj为流入热网节点j的热水温度,
Figure PCTCN2022107149-appb-000048
Figure PCTCN2022107149-appb-000049
为供水温度上、下限;m jk为热水管道k的质量流量速率,m k,max,m k,min分别为质量流量速率的上、下限。
In the formula: V i,max and V i,min are the upper limit and lower limit of the voltage amplitude of node i respectively; T sj is the temperature of hot water flowing into node j of the heating network,
Figure PCTCN2022107149-appb-000048
Figure PCTCN2022107149-appb-000049
are the upper and lower limits of the water supply temperature; m jk is the mass flow rate of the hot water pipe k, m k,max and m k,min are the upper and lower limits of the mass flow rate respectively.
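The three network security constraints are all box limits on a measured quantity (node voltage magnitude, supply water temperature, pipe mass flow rate). As a rough sketch, assuming each constraint has the form lower limit ≤ value ≤ upper limit, a feasibility check could look as follows. The function names and the tuple-based limit layout are illustrative assumptions.

```python
def limit_violation(value, lower, upper):
    """Return how far value lies outside [lower, upper]; 0.0 when inside."""
    if value < lower:
        return lower - value
    if value > upper:
        return value - upper
    return 0.0


def network_is_secure(voltages, v_limits, supply_temps, t_limits, flows, m_limits):
    """Check voltage, supply temperature and mass flow limits in one pass.

    Each *_limits argument is a list of (lower, upper) pairs aligned with
    the corresponding list of values.
    """
    checks = [(voltages, v_limits), (supply_temps, t_limits), (flows, m_limits)]
    return all(
        limit_violation(x, lo, hi) == 0.0
        for values, limits in checks
        for x, (lo, hi) in zip(values, limits)
    )


secure = network_is_secure(
    voltages=[1.00], v_limits=[(0.95, 1.05)],
    supply_temps=[90.0], t_limits=[(70.0, 100.0)],
    flows=[2.0], m_limits=[(0.5, 3.0)],
)
```

Returning the size of the violation, rather than only a flag, makes the same helper reusable when the violation is later converted into a penalty in the reward.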
3) Cogeneration device constraints
The cogeneration unit should satisfy the ramp constraint shown in formula (17):
Figure PCTCN2022107149-appb-000050
where
Figure PCTCN2022107149-appb-000051
are the cogeneration electric power outputs in two consecutive periods, and
Figure PCTCN2022107149-appb-000052
are the upper and lower limits of the ramp rate of the cogeneration device, respectively.
4) The renewable energy constraint can be expressed as formula (18):
Figure PCTCN2022107149-appb-000053
where
Figure PCTCN2022107149-appb-000054
denotes the power output of wind turbine i in period t, and
Figure PCTCN2022107149-appb-000055
is the maximum output value of
Figure PCTCN2022107149-appb-000056
5) The conventional unit output constraint can be expressed as formula (19):
Figure PCTCN2022107149-appb-000057
The ramp constraint shown in formula (20) must be satisfied at the same time:
Figure PCTCN2022107149-appb-000058
where
Figure PCTCN2022107149-appb-000059
are the upper and lower limits of the unit output, respectively, and
Figure PCTCN2022107149-appb-000060
are the upper and lower limits of the unit ramp rate, respectively.
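The output and ramp constraints of formulas (17), (19) and (20) can be illustrated with a small schedule check. This is a sketch that assumes the ramp constraint bounds the change in output between two consecutive periods by a lower and an upper ramp limit (the lower limit typically being negative); the function name and arguments are hypothetical.

```python
def schedule_feasible(outputs, p_min, p_max, ramp_min, ramp_max):
    """Check output limits and ramp limits for one unit over a horizon.

    outputs  -- unit power output in each consecutive period
    p_min    -- lower output limit; p_max -- upper output limit
    ramp_min -- lower limit on (later - earlier), usually negative (assumption)
    ramp_max -- upper limit on (later - earlier)
    """
    within_limits = all(p_min <= p <= p_max for p in outputs)
    within_ramp = all(
        ramp_min <= later - earlier <= ramp_max
        for earlier, later in zip(outputs, outputs[1:])
    )
    return within_limits and within_ramp


feasible = schedule_feasible([50.0, 60.0, 65.0], 40.0, 100.0, -15.0, 15.0)
```

The same check applies to a conventional unit and to the electric output of a cogeneration unit, since both constraints share this structure in the text.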
Step 3. Establish the SAC-based optimal scheduling model of the electric-thermal coupled integrated energy system.
Based on the four basic elements of reinforcement learning (environment, state, action and reward) and on the electric-thermal integrated energy system scheduling model, establishing the optimal scheduling model based on the SAC framework can include:
1) Action space. The action space variables correspond to the control variables of the system under study. The power output of the conventional units, the cogeneration electric power and the cogeneration heat power are set as the action variables, as shown in formula (21), in which P_i^G,
Figure PCTCN2022107149-appb-000061
and P_i^chp are, in order, the power output of the conventional unit, the heat power of the cogeneration device, and the electric power of the cogeneration device.
Figure PCTCN2022107149-appb-000062
2) State space. The state space variables correspond to the state variables of the system under study and are defined as shown in formula (22), where P_i^G, P_load, P_w, P_i^chp, H_load,
Figure PCTCN2022107149-appb-000063
and T_e are, in order, the power output of the conventional unit, the electric load, the wind power output, the electric power of the cogeneration device, the heat load, the heat power of the cogeneration device, and the ambient temperature:
Figure PCTCN2022107149-appb-000064
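To make the action and state definitions concrete, the sketch below packs the listed quantities into flat vectors of the kind a policy network would consume. The class and field names are hypothetical, and the ordering simply follows the order in which the text lists the variables; formulas (21) and (22) themselves are only available as images.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DispatchAction:
    p_gen: List[float]  # conventional unit power outputs
    h_chp: List[float]  # cogeneration heat outputs
    p_chp: List[float]  # cogeneration electric outputs

    def to_vector(self) -> List[float]:
        # Order follows the text: conventional power, CHP heat, CHP power.
        return [*self.p_gen, *self.h_chp, *self.p_chp]


@dataclass
class SystemState:
    p_gen: List[float]   # conventional unit power outputs
    p_load: List[float]  # electric loads
    p_wind: List[float]  # wind power outputs
    p_chp: List[float]   # cogeneration electric outputs
    h_load: List[float]  # heat loads
    h_chp: List[float]   # cogeneration heat outputs
    t_env: float         # ambient temperature

    def to_vector(self) -> List[float]:
        return [*self.p_gen, *self.p_load, *self.p_wind, *self.p_chp,
                *self.h_load, *self.h_chp, self.t_env]


action = DispatchAction(p_gen=[50.0], h_chp=[30.0], p_chp=[20.0])
state = SystemState([50.0], [80.0], [10.0], [20.0], [30.0], [30.0], -5.0)
```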
3) Environment construction. A reinforcement learning environment is built. The current action obtained from the policy network is applied to the environment, which returns the immediate reward and the state of the next period. The environment thus performs the state transition and provides the reward for policy evaluation.
4) Reward function. The reinforcement learning objective is to maximize the long-term reward. The negative of the optimization objective is used as the immediate reward, and a penalty mechanism set according to the constraints is added to the immediate reward to obtain the final reward function. The unified form of the penalty terms is shown in formulas (23) and (24):
Figure PCTCN2022107149-appb-000065
Figure PCTCN2022107149-appb-000066
where β_v is the penalty coefficient; a corresponding constant coefficient is set for each type of limit violation.
The reward function includes the operating cost of the conventional units, the wind curtailment penalty, the operating cost of the cogeneration devices, and the variable limit-violation penalties, as shown in formula (25).
Figure PCTCN2022107149-appb-000067
where f_1, f_2 and f_3 are the operating cost of the conventional units, the operating cost of the cogeneration devices, and the wind curtailment penalty, respectively;
Figure PCTCN2022107149-appb-000068
and
Figure PCTCN2022107149-appb-000069
are the penalty terms for conventional unit output limit violation and ramp limit violation, respectively; φ_V is the penalty for system node voltage limit violation;
Figure PCTCN2022107149-appb-000070
and
Figure PCTCN2022107149-appb-000071
are the penalty terms for cogeneration unit output limit violation and ramp limit violation, respectively; φ_T is the penalty for system node temperature limit violation; and φ_m is the penalty for pipeline mass flow rate limit violation.
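The structure of the reward described above (negative of the cost objective, minus limit-violation penalties) can be sketched as follows. Formulas (23) to (25) are only available as images, so the penalty form used here, a constant coefficient β multiplied by the amount by which a variable exceeds its limits, is an assumption, as are the function names.

```python
def violation_penalty(value, lower, upper, beta):
    """Assumed penalty form: beta times the distance outside [lower, upper]."""
    excess = max(lower - value, 0.0) + max(value - upper, 0.0)
    return beta * excess


def immediate_reward(f1, f2, f3, penalties):
    """Negative of the objective (f1 + f2 + f3) minus all violation penalties.

    f1 -- conventional unit operating cost
    f2 -- cogeneration operating cost
    f3 -- wind curtailment penalty
    penalties -- iterable of non-negative limit-violation penalties
    """
    return -(f1 + f2 + f3) - sum(penalties)


# One voltage violation of 0.03 p.u. with an illustrative coefficient of 1000.
voltage_penalty = violation_penalty(1.08, 0.95, 1.05, beta=1000.0)
r = immediate_reward(100.0, 50.0, 5.0, [voltage_penalty])
```

A penalty that grows continuously with the violation gives the agent a gradient toward feasibility, which is consistent with the remark elsewhere in the description that step-function penalties are harder for a neural network to fit.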
Step 4. SAC training process.
Exemplarily, the SAC-framework-based optimal scheduling model is trained to obtain the pre-trained SAC-framework-based optimal scheduling model, which can be implemented as follows:
First, the actor policy network φ and the critic Q networks of the optimal scheduling agent are initialized. Figure 5 shows the flowchart of the algorithm for training the optimal scheduling model based on the SAC framework. As shown in Figure 5, the process may include the following steps:
Step 501. Initialize θ_1 and θ_2.
Exemplarily, the critic target networks can be assigned values, namely
Figure PCTCN2022107149-appb-000072
and
Figure PCTCN2022107149-appb-000073
Step 502. Set the capacity of the experience buffer (memory bank) D.
Step 503. Initialize t and g.
Exemplarily, t and g can both be initialized to 0.
Step 504. Determine whether t is within T.
Exemplarily, if t is within T, perform steps 505 to 507; if t is not within T, perform step 508.
Step 505. Sample a control action and apply it to the environment to obtain the operating state at the next time step.
Step 506. Store the state transition and the reward in the experience buffer D.
Step 507. Increment t.
Step 508. Determine whether g is within G.
Exemplarily, if g is within G, perform steps 509 and 510; if g is not within G, perform step 511.
Step 509. Update the critic Q networks, the actor policy network, the temperature coefficient, and the target networks.
Step 510. Increment g.
Step 511. Determine whether the change in the average reward over m_0 consecutive training rounds is less than δ_e%.
If yes, perform step 503; otherwise, perform step 512.
Step 512. End.
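The test in step 511 can be sketched as a small helper. The text does not define exactly how the change in the average reward is measured, so this sketch assumes it is the relative difference, in percent, between the mean reward of the last m_0 rounds and the mean of the m_0 rounds before them. The function name and that definition are assumptions; which branch is taken on the result is left to the flowchart.

```python
def reward_change_below_threshold(episode_rewards, m0, delta_pct):
    """Return True if the mean reward changed by less than delta_pct percent.

    Compares the mean of the last m0 rounds with the mean of the m0 rounds
    before them (an assumed reading of the criterion in step 511).
    """
    if len(episode_rewards) < 2 * m0:
        return False  # not enough history to compare two windows
    recent = sum(episode_rewards[-m0:]) / m0
    previous = sum(episode_rewards[-2 * m0:-m0]) / m0
    if previous == 0.0:
        return recent == 0.0
    change_pct = abs(recent - previous) / abs(previous) * 100.0
    return change_pct < delta_pct


stable = reward_change_below_threshold([-100.0, -100.0, -99.9, -100.1], m0=2, delta_pct=1.0)
```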
Exemplarily, when the change in the average reward over m_0 consecutive training rounds is less than δ_e%, for each period a control action a_t ~ π_φ(a_t|S_t) is sampled from the actor policy network and applied to the electric-thermal integrated energy system; the wind power uncertainty is sampled to obtain the operating state S_{t+1} of the system at the next time step; and the state transition and the reward are stored in the experience buffer D, that is, D ← D ∪ {(S_t, a_t, r(S_t, a_t), S_{t+1})}. For each gradient update, the Adam optimizer is used to update the critic Q networks
Figure PCTCN2022107149-appb-000074
the actor policy network
Figure PCTCN2022107149-appb-000075
the temperature coefficient
Figure PCTCN2022107149-appb-000076
and the target networks
Figure PCTCN2022107149-appb-000077
Updating the critic Q networks, the actor policy network φ, the temperature coefficient and the target networks yields the trained policy network, which serves as the SAC-framework optimal scheduling model.
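Two of the bookkeeping operations in this training procedure, storing transitions in the experience buffer D and updating the target networks, can be sketched without any deep learning library. The update equations are only available as images, so the soft (Polyak) target update shown here is the form commonly used with SAC and is an assumption, as are the class name, the coefficient tau, and the use of plain lists for network parameters.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience buffer D; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        # D <- D U {(S_t, a_t, r(S_t, a_t), S_{t+1})}
        self._data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch, capped at the current buffer size.
        return random.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self):
        return len(self._data)


def soft_update(target_params, online_params, tau):
    """Assumed Polyak update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.add([float(step)], [0.0], -1.0, [float(step + 1)])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

In a full implementation the parameter lists would be tensors owned by the critic networks, and the Adam updates named in the text would be performed by the chosen framework's optimizer.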
Exemplarily, given system states such as the load, the trained policy network can directly output scheduling actions and generate the strategy.
Exemplarily, the costs of the generators and the cogeneration units can instead be calculated with a linear model, although this reduces the accuracy of the results. The CHP unit penalty term can be modeled with a fixed electricity-to-heat ratio, but its control flexibility and calculation accuracy are inferior to those of the polygonal region model. The penalty function in the reward-and-penalty mechanism can take the form of a step function, but a step function is difficult for a neural network to fit, which reduces solution accuracy. Stochastic Gradient Descent (SGD) can be used for training instead of Adam, but practice shows that the Adam algorithm performs better.
Figure 6 shows a schematic structural diagram of the electric-thermal integrated energy system coordination and optimization system 6 provided by an embodiment of the present disclosure. As shown in Figure 6, the system includes:
a first parameter acquisition module 601, configured to acquire real-time electric-thermal integrated energy system parameters;
a power calculation module 602, configured to calculate, based on the real-time electric-thermal integrated energy system parameters, the real-time power generation of the power system, the thermal system and the coupling device of the electric-thermal integrated energy system, respectively;
a scheduling output module 603, configured to input the real-time power generation into the pre-trained SAC-framework-based optimal scheduling model and output scheduling actions to form the coordination strategy of the electric-thermal integrated energy system.
Figure 7 shows a schematic structural diagram of the pre-trained SAC-framework-based optimal scheduling model 7 provided by an embodiment of the present disclosure. As shown in Figure 7, the pre-trained SAC-framework-based optimal scheduling model 7 in the scheduling output module includes:
a second parameter acquisition module 701, configured to acquire historical electric-thermal integrated energy system parameters;
a model building module 702, configured to calculate, based on the historical electric-thermal integrated energy system parameters, the historical power generation of the power system, the thermal system and the coupling device of the electric-thermal integrated energy system, respectively, and to establish the electric-thermal integrated energy system scheduling model on the basis of that historical power generation;
a model optimization module 703, configured to establish the SAC-framework-based optimal scheduling model by taking the reinforcement learning environment, state, action and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model;
a model training module 704, configured to train the SAC-framework-based optimal scheduling model to obtain the pre-trained SAC-framework-based optimal scheduling model.
The model optimization module 703 is configured to set the action variables as
Figure PCTCN2022107149-appb-000078
where P_i^G,
Figure PCTCN2022107149-appb-000079
and P_i^chp are, in order, the power output of the conventional unit, the heat power of the cogeneration device, and the electric power of the cogeneration device;
determine the state space variables as
Figure PCTCN2022107149-appb-000080
where P_i^G, P_load, P_w, P_i^chp, H_load,
Figure PCTCN2022107149-appb-000081
and T_e are, in order, the power output of the conventional unit, the electric load, the wind power output, the electric power of the cogeneration device, the heat load, the heat power of the cogeneration device, and the ambient temperature;
build a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment to obtain the immediate reward and the state of the next period, and the reward is provided for policy evaluation;
set the reinforcement learning objective to maximizing the long-term reward, use the negative of the optimization objective as the immediate reward, and add a penalty mechanism set according to the constraints to the immediate reward to obtain the final reward function, where the penalty mechanism is shown in formulas (23) and (24).
The reward function is shown in formula (25).
In some embodiments, the real-time electric-thermal integrated energy system parameters and the historical electric-thermal integrated energy system parameters include the network parameters of the electric-thermal integrated energy system, the electric and heat loads, and the wind power output.
In some embodiments,
the power calculation module 602 is configured to use AC power flow as the analysis method for the power system, where the power balance equation of the power system is formula (4);
the power calculation module 602 is configured to determine that the hydraulic model of the thermal system consists of the flow continuity equation, the loop pressure equation and the head loss equation; the thermal system is described by a hydraulic model and a thermal model; the hydraulic model of the thermal system is shown in formula (5);
the power calculation module 602 is configured to determine that the thermal model consists of the nodal power equation, the pipe temperature drop equation and the nodal medium mixing equation; the thermal model is shown in formula (6);
the power calculation module 602 is configured to determine the electricity and heat output of the coupling device as formula (7).
Exemplarily, the model building module 702 is configured to establish the objective function with the goal of minimizing the total operating cost of the electric-thermal integrated energy system while, to maximize the accommodation of renewable energy, treating the unaccommodated portion of renewable energy as a penalty term; and to establish the constraints of the electric-thermal integrated energy system scheduling model, the constraints including: nodal power balance equality constraints, network security constraints, cogeneration device constraints, renewable energy constraints, and conventional unit output constraints.
Exemplarily, the objective function is shown in formula (8); the operating cost of the conventional units is shown in formula (9); the operating cost of the cogeneration units is shown in formula (10); and the wind curtailment penalty is shown in formula (11).
Exemplarily, the nodal power balance equality constraints are based on the active power balance equations of the network nodes, as shown in formulas (12) and (13).
Exemplarily, the network security constraints are shown in formulas (14) to (16).
Exemplarily, the cogeneration device constraint is shown in formula (17).
Exemplarily, the renewable energy constraint is shown in formula (18).
Exemplarily, the conventional unit output constraint is shown in formula (19).
Exemplarily, the ramp constraint shown in formula (20) is satisfied at the same time.
Exemplarily, the model optimization module 703 is configured to take the power output of the conventional units, the cogeneration electric power and the cogeneration heat power as the action variables shown in formula (21); to select the electric load, the wind power output, the electric power of the cogeneration device, the output of the conventional units, the heat load, the heat power of the cogeneration device and the ambient temperature as the state space variables shown in formula (22); and to build a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment to obtain the immediate reward and the state of the next period, and the reward is provided for policy evaluation. The reinforcement learning objective is to maximize the long-term reward: the negative of the optimization objective is used as the immediate reward, and a penalty mechanism set according to the constraints is added to the immediate reward to obtain the final reward function. The unified form of the penalty terms is shown in formulas (23) and (24). The reward function is shown in formula (25) and includes the operating cost of the conventional units, the wind curtailment penalty, the operating cost of the cogeneration devices, and the variable limit-violation penalties.
Exemplarily, the model training module 704 is configured to assign values to the critic target networks and to set the capacity of the experience buffer (memory bank) D;
when the change in the average reward over m_0 consecutive training rounds is less than δ_e%, for each period, to sample a control action a_t ~ π_φ(a_t|S_t) from the actor policy network, apply the control action to the electric-thermal integrated energy system, sample the wind power uncertainty to obtain the operating state S_{t+1} of the system at the next time step, and store the state transition and the reward in the experience buffer D; and to update the critic Q networks, the actor policy network φ, the temperature coefficient and the target networks to obtain the trained policy network, which serves as the optimized SAC-framework optimal scheduling model.
An embodiment of the present disclosure further provides an electronic device. Figure 8 shows a schematic structural diagram of the electronic device 8 provided by an embodiment of the present disclosure. As shown in Figure 8, the electronic device includes a memory 801, a processor 802, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the electric-thermal integrated energy system coordination and optimization method provided in any one of the preceding embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it implements the electric-thermal integrated energy system coordination and optimization method described in any one of the preceding embodiments.
An embodiment of the present disclosure further provides a computer program. The computer program includes computer-readable code, and when the computer-readable code runs in an electronic device, the processor of the electronic device executes the electric-thermal integrated energy system coordination and optimization method provided in any one of the preceding embodiments.
The memory 801 may include random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 802 may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a CPU, a network processor (NP) and the like; it may also be a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor or any conventional processor.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. As used herein, a computer-readable storage medium is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as at least one of the Internet, a local area network, a wide area network and a wireless network. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In some embodiments, an electronic circuit, such as a programmable logic circuit, an FPGA or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
Embodiments of the present disclosure provide a coordinated optimization method, system, device, medium, and program for an electric-thermal integrated energy system. The method is executed by an electronic device and includes: acquiring real-time parameters of the electric-thermal integrated energy system; based on those parameters, separately calculating the real-time power generation of the power system, the thermal system, and the coupling device of the electric-thermal integrated energy system; and inputting the real-time power generation into a pre-trained optimal scheduling model based on the SAC framework, which outputs scheduling actions that form the coordination strategy for the electric-thermal integrated energy system. Because the present disclosure obtains scheduling actions directly from the trained policy network, the conventional nonlinear, whole-system iterative solution is no longer needed, and computation speed is significantly improved.
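The inference path summarized above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: `build_state`, `dispatch`, and `hold_setpoints_policy` are hypothetical names, the state ordering follows the state variables listed in claim 1, a single conventional unit and a single cogeneration device are assumed, and the toy policy merely stands in for a trained SAC policy network.

```python
from typing import Callable, List, Sequence

State = List[float]
Action = List[float]


def build_state(p_g: Sequence[float], p_load: float, p_w: float,
                p_chp: Sequence[float], h_load: float,
                h_chp: Sequence[float], t_e: float) -> State:
    """Flatten the quantities named in claim 1 into one state vector.

    Order: conventional output, electric load, wind output, cogeneration
    electrical output, thermal load, cogeneration thermal output, ambient
    temperature.
    """
    return [*p_g, p_load, p_w, *p_chp, h_load, *h_chp, t_e]


def dispatch(policy: Callable[[State], Action], state: State) -> Action:
    """One forward evaluation of the policy; no iterative solver is called."""
    return policy(state)


def hold_setpoints_policy(state: State) -> Action:
    """Hypothetical stand-in for a trained policy network (not from the source).

    Assumes one conventional unit and one cogeneration device, so the state
    has 7 entries. It returns the current set-points unchanged, in the action
    order of claim 1: [P_G, H_chp, P_chp].
    """
    p_g, _p_load, _p_w, p_chp, _h_load, h_chp, _t_e = state
    return [p_g, h_chp, p_chp]


if __name__ == "__main__":
    s = build_state([50.0], 120.0, 30.0, [40.0], 80.0, [60.0], -5.0)
    print(dispatch(hold_setpoints_policy, s))
```

In a real deployment the callable passed to `dispatch` would be the trained actor network; the point of the sketch is only that scheduling reduces to a single function evaluation per period.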

Claims (19)

  1. A coordinated optimization method for an electric-thermal integrated energy system, the method being executed by an electronic device and comprising:
    acquiring real-time parameters of the electric-thermal integrated energy system;
    based on the real-time electric-thermal integrated energy system parameters, separately calculating the real-time power generation of the power system, the thermal system, and the coupling device of the electric-thermal integrated energy system;
    inputting the real-time power generation into a pre-trained optimal scheduling model based on the soft actor-critic (SAC) framework, and outputting scheduling actions to form a coordination strategy for the electric-thermal integrated energy system;
    wherein the training method of the pre-trained SAC-based optimal scheduling model comprises:
    acquiring historical parameters of the electric-thermal integrated energy system;
    based on the historical electric-thermal integrated energy system parameters, separately calculating the historical power generation of the power system, the thermal system, and the coupling device of the electric-thermal integrated energy system, and establishing an electric-thermal integrated energy system scheduling model on the basis of the historical power generation of the power system, the thermal system, and the coupling device;
    establishing the SAC-based optimal scheduling model by taking the reinforcement learning environment, state, action, and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model;
    training the SAC-based optimal scheduling model to obtain the pre-trained SAC-based optimal scheduling model;
    wherein establishing the SAC-based optimal scheduling model by taking the reinforcement learning environment, state, action, and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model comprises:
    setting the action variable to
    Figure PCTCN2022107149-appb-100001
    where $P_i^{G}$,
    Figure PCTCN2022107149-appb-100002
    and $P_i^{chp}$ are, in order, the power output of the conventional unit, the thermal output of the cogeneration device, and the electrical output of the cogeneration device;
    determining the state-space variable as
    Figure PCTCN2022107149-appb-100003
    where $P_i^{G}$, $P_{load}$, $P_w$, $P_i^{chp}$, $H_{load}$,
    Figure PCTCN2022107149-appb-100004
    and $T_e$ are, in order, the power output of the conventional unit, the electric load, the wind power output, the electrical output of the cogeneration device, the thermal load, the thermal output of the cogeneration device, and the ambient temperature;
    building a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment to obtain the immediate reward and the state of the next period, and which supplies the reward for policy evaluation;
    setting the reinforcement learning objective to maximize the long-term reward, taking the negative of the optimization objective as the immediate reward, and adding a penalty mechanism, set according to the constraints, to the immediate reward to obtain the final reward function; wherein the penalty mechanism is:
    Figure PCTCN2022107149-appb-100005
    Figure PCTCN2022107149-appb-100006
    where $\beta_v$ is the penalty coefficient, and the constant coefficients are set to match the corresponding limit-violation penalties;
    the reward function is
    Figure PCTCN2022107149-appb-100007
    where $f_1$, $f_2$, and $f_3$ are the operating cost of the conventional units, the operating cost of the cogeneration devices, and the wind curtailment penalty, respectively;
    Figure PCTCN2022107149-appb-100008
    and
    Figure PCTCN2022107149-appb-100009
    are the output limit-violation and ramp limit-violation penalty terms of the conventional units, respectively; $\varphi_V$ is the system node voltage limit-violation penalty;
    Figure PCTCN2022107149-appb-100010
    and
    Figure PCTCN2022107149-appb-100011
    are the output and ramp limit-violation penalty terms of the cogeneration units, respectively; $\varphi_T$ is the system node temperature limit-violation penalty; and $\varphi_m$ is the limit-violation penalty for the mass flow rate of the system pipes.
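The reward construction recited in claim 1 (negative of the objective, plus constraint penalties) can be illustrated with a short sketch. The exact penalty expressions are contained in equation images that are not recoverable here, so the linear over-limit penalty below is an assumption, as are the function names `limit_violation`, `penalty`, and `reward` and the choice to subtract penalties from the reward.

```python
from typing import Iterable


def limit_violation(value: float, lower: float, upper: float) -> float:
    """Distance by which `value` lies outside [lower, upper]; 0.0 if inside."""
    if value > upper:
        return value - upper
    if value < lower:
        return lower - value
    return 0.0


def penalty(value: float, lower: float, upper: float, beta: float) -> float:
    """Assumed penalty form: coefficient times the size of the violation."""
    return beta * limit_violation(value, lower, upper)


def reward(f1: float, f2: float, f3: float, penalties: Iterable[float]) -> float:
    """Immediate reward: negative of the objective f1 + f2 + f3, minus penalties.

    f1, f2, f3 are the conventional-unit cost, the cogeneration cost, and the
    wind curtailment penalty for the current period.
    """
    return -(f1 + f2 + f3) - sum(penalties)


if __name__ == "__main__":
    # A node voltage of 1.08 p.u. against limits [0.95, 1.05], coefficient 100.
    phi_v = penalty(1.08, 0.95, 1.05, beta=100.0)
    print(reward(10.0, 5.0, 1.0, [phi_v]))
```

Because the agent maximizes cumulative reward, subtracting penalties steers it away from schedules that violate output, ramp, voltage, temperature, or mass-flow limits while it minimizes cost.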
  2. The coordinated optimization method for an electric-thermal integrated energy system according to claim 1, wherein the real-time electric-thermal integrated energy system parameters and the historical electric-thermal integrated energy system parameters comprise the network parameters of the electric-thermal integrated energy system, the electric and thermal load power, and the wind power output.
  3. The coordinated optimization method for an electric-thermal integrated energy system according to claim 1, wherein separately calculating, based on the real-time electric-thermal integrated energy system parameters, the real-time power generation of the power system, the thermal system, and the coupling device of the electric-thermal integrated energy system comprises:
    adopting AC power flow as the analysis method for the power system; wherein the power balance equation of the power system is:
    Figure PCTCN2022107149-appb-100012
    where $P_i$ and $Q_i$ are the active and reactive power injected at node i, respectively; $V_i$ is the voltage magnitude at node i; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch ij, respectively; $\theta_{ij}$ is the phase angle difference across branch ij; and
    Figure PCTCN2022107149-appb-100013
    is the set of power system nodes;
    determining that the hydraulic model of the thermal system consists of the flow continuity equation, the loop pressure equation, and the head loss equation, the thermal system consisting of a hydraulic model and a thermal model; wherein the hydraulic model of the thermal system is:
    Figure PCTCN2022107149-appb-100014
    where $A_h$ is the node-branch incidence matrix; $B$ is the loop-branch incidence matrix;
    Figure PCTCN2022107149-appb-100015
    is the pipe mass flow rate;
    Figure PCTCN2022107149-appb-100016
    is the nodal injected flow; $h_f$ is the head loss; and $K$ is the resistance coefficient of the pipe;
    determining that the thermal model consists of the node power equation, the pipe temperature drop equation, and the node medium mixing equation; wherein the thermal model is:
    Figure PCTCN2022107149-appb-100017
    where $H_i$ is the thermal power injected at node i; $C_p$ is the specific heat capacity of water; $T_{s,i}$ and $T_{o,i}$ are the supply-pipe water temperature and the outlet water temperature at node i; the subscript ij in $T_{j(ij)}$ denotes the heat-network pipe branch whose head and end nodes are i and j; $T_{i(ij)}$ and $T_{j(ij)}$ are the temperatures at the i end and the j end of the branch; and $T_e$ is the ambient temperature;
    determining the electrical and thermal output of the coupling device as:
    Figure PCTCN2022107149-appb-100018
    where
    Figure PCTCN2022107149-appb-100019
    are the electrical output and the thermal output, respectively, of the i-th extraction-condensing unit in period t;
    Figure PCTCN2022107149-appb-100020
    are the upper and lower limits of the electrical output, respectively; and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the coefficients describing the polygonal operating region, which are constants for a given cogeneration device.
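For orientation, the quantities defined in claim 3 can be evaluated with the textbook polar form of the AC power flow equations and the usual node heat power relation. The claim's own equations are in images that are not recoverable here, so this is a sketch under assumptions: `g` and `b` are treated as entries of the nodal admittance matrix, `m_q` (nodal mass flow) is a symbol introduced for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def power_injections(v: Sequence[float], theta: Sequence[float],
                     g: Sequence[Sequence[float]],
                     b: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Active and reactive injections per node (standard polar form).

    P_i = V_i * sum_j V_j * (G_ij cos(theta_ij) + B_ij sin(theta_ij))
    Q_i = V_i * sum_j V_j * (G_ij sin(theta_ij) - B_ij cos(theta_ij))
    with theta_ij = theta_i - theta_j.
    """
    n = len(v)
    p = [0.0] * n
    q = [0.0] * n
    for i in range(n):
        for j in range(n):
            th = theta[i] - theta[j]
            p[i] += v[i] * v[j] * (g[i][j] * math.cos(th) + b[i][j] * math.sin(th))
            q[i] += v[i] * v[j] * (g[i][j] * math.sin(th) - b[i][j] * math.cos(th))
    return p, q


def node_heat_power(c_p: float, m_q: float, t_supply: float, t_outlet: float) -> float:
    """Assumed node power relation: H_i = C_p * m_q * (T_s,i - T_o,i)."""
    return c_p * m_q * (t_supply - t_outlet)


if __name__ == "__main__":
    # Two nodes joined by a lossless line with reactance 0.1 p.u.
    g = [[0.0, 0.0], [0.0, 0.0]]
    b = [[-10.0, 10.0], [10.0, -10.0]]
    p, q = power_injections([1.0, 1.0], [0.1, 0.0], g, b)
    print(p, q)
    print(node_heat_power(4.2, 2.0, 90.0, 50.0))
```

For the lossless two-node example the active injections sum to zero, which is a quick sanity check on the sign conventions.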
  4. The coordinated optimization method for an electric-thermal integrated energy system according to claim 1, wherein establishing the electric-thermal integrated energy system scheduling model on the basis of the historical power generation of the power system, the thermal system, and the coupling device comprises:
    establishing an objective function that aims to minimize the total operating cost of the electric-thermal integrated energy system and, in order to maximize the accommodation of renewable energy, takes the unaccommodated portion of renewable energy as a penalty term;
    establishing the constraints of the electric-thermal integrated energy system scheduling model, the constraints comprising: node power balance equality constraints, network security constraints, cogeneration device constraints, renewable energy constraints, and conventional unit output constraints.
  5. The coordinated optimization method for an electric-thermal integrated energy system according to claim 4, wherein the objective function is $\min F = f_1 + f_2 + f_3$, where $f_1$ is the operating cost of the conventional units, $f_2$ is the operating cost of the cogeneration devices, and $f_3$ is the wind curtailment penalty; the operating cost $f_1$ of the conventional units is:
    Figure PCTCN2022107149-appb-100021
    is the power output of the conventional unit; $b_0$, $b_1$, and $b_2$ are the energy consumption coefficients of the conventional units; $N_G$ is the number of conventional units; $T$ is the scheduling horizon; and $\Delta t$ is the scheduling time interval;
    the operating cost $f_2$ of the cogeneration units is:
    Figure PCTCN2022107149-appb-100022
    where
    Figure PCTCN2022107149-appb-100023
    are the electrical output and the thermal output, respectively, of the cogeneration device connected to node i in period t; $a_0$, $a_1$, $a_2$, $a_3$, $a_4$, and $a_5$ are the energy consumption coefficients of the cogeneration devices; and $N_{chp}$ is the number of cogeneration devices;
    the wind curtailment penalty $f_3$ is:
    Figure PCTCN2022107149-appb-100024
    where
    Figure PCTCN2022107149-appb-100025
    denotes the output of the wind turbine connected to node i in period t, and $k$ is the wind curtailment penalty coefficient, which is a constant.
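The three cost terms of claim 5 can be illustrated numerically. The exact expressions are in equation images that are not recoverable here, so the forms below are assumptions based on the listed coefficients: a quadratic fuel cost in $b_0, b_1, b_2$, a bivariate quadratic cogeneration cost in $a_0 \dots a_5$ (the assignment of each coefficient to a term is a guess), and a curtailment penalty proportional to available minus used wind power. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Schedule = Sequence[Sequence[float]]  # schedule[t][i]: period t, unit i


def conventional_cost(p_g: Schedule, b: Tuple[float, float, float], dt: float) -> float:
    """Assumed f1: sum over periods and units of (b0 + b1*P + b2*P^2) * dt."""
    b0, b1, b2 = b
    return sum((b0 + b1 * p + b2 * p * p) * dt for row in p_g for p in row)


def chp_cost(p_chp: Schedule, h_chp: Schedule,
             a: Tuple[float, float, float, float, float, float], dt: float) -> float:
    """Assumed f2: a0 + a1*P + a2*H + a3*P^2 + a4*H^2 + a5*P*H per unit and period."""
    a0, a1, a2, a3, a4, a5 = a
    total = 0.0
    for p_row, h_row in zip(p_chp, h_chp):
        for p, h in zip(p_row, h_row):
            total += (a0 + a1 * p + a2 * h + a3 * p * p + a4 * h * h + a5 * p * h) * dt
    return total


def curtailment_penalty(p_w_available: Schedule, p_w_used: Schedule,
                        k: float, dt: float) -> float:
    """Assumed f3: k times the wind energy that was available but not used."""
    total = 0.0
    for avail_row, used_row in zip(p_w_available, p_w_used):
        for avail, used in zip(avail_row, used_row):
            total += (avail - used) * dt
    return k * total


if __name__ == "__main__":
    f1 = conventional_cost([[10.0]], (1.0, 2.0, 0.5), dt=1.0)
    f2 = chp_cost([[2.0]], [[3.0]], (1.0, 1.0, 1.0, 1.0, 1.0, 1.0), dt=1.0)
    f3 = curtailment_penalty([[5.0]], [[3.0]], k=2.0, dt=1.0)
    print(f1 + f2 + f3)  # objective F for this one-period toy case
```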
  6. The coordinated optimization method for an electric-thermal integrated energy system according to claim 4, wherein
    the node power balance equality constraints, based on the active power balance equations of the network nodes, are:
    Figure PCTCN2022107149-appb-100026
    Figure PCTCN2022107149-appb-100027
    where
    Figure PCTCN2022107149-appb-100028
    are the sets of power system nodes and thermal system nodes; $T$ is the scheduling horizon;
    Figure PCTCN2022107149-appb-100029
    are the electric load power and the thermal load power, respectively, of node i in period t;
    the network security constraints comprise:
    Figure PCTCN2022107149-appb-100030
    Figure PCTCN2022107149-appb-100031
    Figure PCTCN2022107149-appb-100032
    where $V_{i,\max}$ and $V_{i,\min}$ are the upper and lower limits of the voltage magnitude at node i, respectively; $T_{s,j}$ is the temperature of the hot water flowing into heat-network node j;
    Figure PCTCN2022107149-appb-100033
    are the upper and lower limits of the supply water temperature, respectively; $m_{jk}$ is the mass flow rate of hot water pipe k; and $m_{k,\max}$ and $m_{k,\min}$ are the upper and lower limits of the mass flow rate, respectively;
    the cogeneration device constraint is:
    Figure PCTCN2022107149-appb-100034
    where
    Figure PCTCN2022107149-appb-100035
    are the electrical outputs of the cogeneration device in two consecutive periods, and
    Figure PCTCN2022107149-appb-100036
    are the upper and lower limits of the ramp rate of the cogeneration device, respectively;
    the renewable energy constraint is:
    Figure PCTCN2022107149-appb-100037
    where
    Figure PCTCN2022107149-appb-100038
    denotes the power output of wind turbine i in period t, and
    Figure PCTCN2022107149-appb-100039
    is the maximum value of
    Figure PCTCN2022107149-appb-100040
    the conventional unit output constraint is:
    Figure PCTCN2022107149-appb-100041
    and the ramp constraint is satisfied at the same time:
    Figure PCTCN2022107149-appb-100042
    where
    Figure PCTCN2022107149-appb-100043
    are the upper and lower limits of unit output, respectively, and
    Figure PCTCN2022107149-appb-100044
    are the upper and lower limits of the unit ramp rate, respectively.
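The output and ramp constraints of claim 6 amount to simple bound checks on a schedule. The sketch below assumes the usual forms, lower ≤ P_t ≤ upper and ramp_down ≤ P_t − P_{t−1} ≤ ramp_up, since the claim's own inequalities are in images; the function names are hypothetical.

```python
from typing import Sequence


def within_limits(value: float, lower: float, upper: float) -> bool:
    """Box constraint: lower <= value <= upper."""
    return lower <= value <= upper


def ramp_ok(p_prev: float, p_now: float, ramp_down: float, ramp_up: float) -> bool:
    """Ramp constraint between consecutive periods.

    ramp_down is the lower limit of the change (typically negative) and
    ramp_up the upper limit.
    """
    return ramp_down <= p_now - p_prev <= ramp_up


def schedule_feasible(schedule: Sequence[float], p_min: float, p_max: float,
                      ramp_down: float, ramp_up: float) -> bool:
    """Check every period against the output limits and every step against the ramp limits."""
    if not all(within_limits(p, p_min, p_max) for p in schedule):
        return False
    return all(ramp_ok(prev, now, ramp_down, ramp_up)
               for prev, now in zip(schedule, schedule[1:]))


if __name__ == "__main__":
    print(schedule_feasible([50.0, 60.0, 65.0], 20.0, 100.0, -15.0, 15.0))
    print(schedule_feasible([50.0, 80.0], 20.0, 100.0, -15.0, 15.0))
```

The same checks apply to node voltage, supply temperature, and pipe mass flow limits; in the reinforcement learning formulation a failed check contributes a penalty term rather than rejecting the action.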
  7. The coordinated optimization method for an electric-thermal integrated energy system according to claim 1, wherein training the SAC-based optimal scheduling model to obtain the pre-trained SAC-based optimal scheduling model comprises:
    assigning values to the critic target network, and setting the capacity D of the agent's memory buffer;
    when the change in the average reward over $m_0$ consecutive training rounds is less than $\delta_e\%$, for each period, sampling a control action $a_t \sim \varphi(a_t \mid S_t)$ from the actor policy network, applying the control action to the electric-thermal integrated energy system, sampling the wind power uncertainty to obtain the operating state $S_{t+1}$ of the system at the next time step, and storing the state transition and the reward in the experience buffer D;
    updating the critic Q network, the actor policy network $\varphi$, the temperature coefficient, and the target network to obtain the trained policy network, which serves as the SAC-based optimal scheduling model.
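Two bookkeeping pieces of the training procedure in claim 7 can be sketched without a deep learning library: the fixed-capacity experience buffer and the check on how much the average reward has changed. The claim does not define how the change over $m_0$ rounds is measured, so comparing the mean of the latest $m_0$ episodes with the mean of the preceding $m_0$ episodes is an assumption, and `ReplayBuffer` and `reward_change_percent` are hypothetical names.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Transition = Tuple[list, list, float, list]  # (state, action, reward, next_state)


class ReplayBuffer:
    """Fixed-capacity experience buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def add(self, state: list, action: list, reward: float, next_state: list) -> None:
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniformly sample up to batch_size stored transitions."""
        return random.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)


def reward_change_percent(episode_rewards: Sequence[float], m0: int) -> Optional[float]:
    """Relative change (in percent) between the last two windows of m0 episodes.

    Returns None when fewer than 2*m0 episodes exist or the earlier mean is zero.
    """
    if m0 <= 0 or len(episode_rewards) < 2 * m0:
        return None
    recent = sum(episode_rewards[-m0:]) / m0
    previous = sum(episode_rewards[-2 * m0:-m0]) / m0
    if previous == 0:
        return None
    return abs(recent - previous) / abs(previous) * 100.0


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=2)
    for step in range(3):
        buf.add([float(step)], [0.0], -1.0, [float(step + 1)])
    print(len(buf))
    print(reward_change_percent([-100.0, -100.0, -99.0, -99.0], m0=2))
```

A training loop would compare the returned percentage with $\delta_e$ after each round and would draw mini-batches from the buffer for the critic, actor, and temperature updates.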
  8. The coordinated optimization method for an electric-thermal integrated energy system according to claim 1, wherein
    the critic Q network, the actor policy network, the temperature coefficient, and the target network are updated using the stochastic gradient descent (SGD) algorithm or the Adam algorithm.
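The two update rules named in claim 8 differ only in how a gradient is turned into a parameter step. The sketch below shows one step of plain SGD and one step of Adam on a flat list of parameters; it uses the standard published update formulas with commonly used default hyperparameters, which are not taken from the source, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sgd_step(params: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """Plain SGD: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]


def adam_step(params: Sequence[float], grads: Sequence[float],
              m: Sequence[float], v: Sequence[float], t: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One Adam step; t is the 1-based step counter.

    Returns the updated parameters and the updated first and second moment
    estimates, which the caller passes back in on the next step.
    """
    new_params: List[float] = []
    new_m: List[float] = []
    new_v: List[float] = []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g          # first moment
        v_i = beta2 * v_i + (1.0 - beta2) * g * g      # second moment
        m_hat = m_i / (1.0 - beta1 ** t)               # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    print(sgd_step([1.0], [0.5], lr=0.1))
    print(adam_step([1.0], [0.5], [0.0], [0.0], t=1, lr=0.1)[0])
```

On the first step Adam moves each parameter by roughly `lr` in the direction opposite the gradient regardless of the gradient's magnitude, whereas SGD scales the step with the gradient; in practice each of the networks and the temperature coefficient would have its own optimizer state.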
  9. A coordinated optimization system for an electric-thermal integrated energy system, comprising:
    a first parameter acquisition module, configured to acquire real-time parameters of the electric-thermal integrated energy system;
    a power calculation module, configured to separately calculate, based on the real-time electric-thermal integrated energy system parameters, the real-time power generation of the power system, the thermal system, and the coupling device of the electric-thermal integrated energy system;
    a scheduling output module, configured to input the real-time power generation into a pre-trained optimal scheduling model based on the soft actor-critic (SAC) framework and to output scheduling actions to form a coordination strategy for the electric-thermal integrated energy system;
    wherein the pre-trained SAC-based optimal scheduling model in the scheduling output module comprises:
    a second parameter acquisition module, configured to acquire historical parameters of the electric-thermal integrated energy system;
    a model building module, configured to separately calculate, based on the historical electric-thermal integrated energy system parameters, the historical power generation of the power system, the thermal system, and the coupling device of the electric-thermal integrated energy system, and to establish an electric-thermal integrated energy system scheduling model on the basis of the historical power generation of the power system, the thermal system, and the coupling device;
    a model optimization module, configured to establish the SAC-based optimal scheduling model by taking the reinforcement learning environment, state, action, and reward as basic elements and combining them with the electric-thermal integrated energy system scheduling model;
    a model training module, configured to train the SAC-based optimal scheduling model to obtain the pre-trained SAC-based optimal scheduling model;
    the model optimization module being configured to set the action variable to
    Figure PCTCN2022107149-appb-100045
    where $P_i^{G}$,
    Figure PCTCN2022107149-appb-100046
    and $P_i^{chp}$ are, in order, the power output of the conventional unit, the thermal output of the cogeneration device, and the electrical output of the cogeneration device;
    to determine the state-space variable as
    Figure PCTCN2022107149-appb-100047
    where $P_i^{G}$, $P_{load}$, $P_w$, $P_i^{chp}$, $H_{load}$,
    Figure PCTCN2022107149-appb-100048
    and $T_e$ are, in order, the power output of the conventional unit, the electric load, the wind power output, the electrical output of the cogeneration device, the thermal load, the thermal output of the cogeneration device, and the ambient temperature;
    to build a reinforcement learning environment, in which the current action obtained from the policy network is applied to the environment to obtain the immediate reward and the state of the next period, and which supplies the reward for policy evaluation;
    and to set the reinforcement learning objective to maximize the long-term reward, taking the negative of the optimization objective as the immediate reward, and adding a penalty mechanism, set according to the constraints, to the immediate reward to obtain the final reward function; wherein the penalty mechanism is:
    Figure PCTCN2022107149-appb-100049
    Figure PCTCN2022107149-appb-100050
    where $\beta_v$ is the penalty coefficient, and the constant coefficients are set to match the corresponding limit-violation penalties;
    the reward function is
    Figure PCTCN2022107149-appb-100051
    where $f_1$, $f_2$, and $f_3$ are the operating cost of the conventional units, the operating cost of the cogeneration devices, and the wind curtailment penalty, respectively;
    Figure PCTCN2022107149-appb-100052
    and
    Figure PCTCN2022107149-appb-100053
    are the output limit-violation and ramp limit-violation penalty terms of the conventional units, respectively; $\varphi_V$ is the system node voltage limit-violation penalty;
    Figure PCTCN2022107149-appb-100054
    and
    Figure PCTCN2022107149-appb-100055
    are the output and ramp limit-violation penalty terms of the cogeneration units, respectively; $\varphi_T$ is the system node temperature limit-violation penalty; and $\varphi_m$ is the limit-violation penalty for the mass flow rate of the system pipes.
  10. The coordinated optimization system for an electric-thermal integrated energy system according to claim 9, wherein the real-time electric-thermal integrated energy system parameters and the historical electric-thermal integrated energy system parameters comprise the network parameters of the electric-thermal integrated energy system, the electric and thermal load power, and the wind power output.
  11. The coordinated optimization system for an electric-thermal integrated energy system according to claim 9, wherein
    the power calculation module is configured to adopt AC power flow as the analysis method for the power system; wherein the power balance equation of the power system is:
    Figure PCTCN2022107149-appb-100056
    where $P_i$ and $Q_i$ are the active and reactive power injected at node i, respectively; $V_i$ is the voltage magnitude at node i; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of branch ij, respectively; $\theta_{ij}$ is the phase angle difference across branch ij; and
    Figure PCTCN2022107149-appb-100057
    is the set of power system nodes;
    the power calculation module is configured to determine that the hydraulic model of the thermal system consists of the flow continuity equation, the loop pressure equation, and the head loss equation, the thermal system consisting of a hydraulic model and a thermal model; wherein the hydraulic model of the thermal system is:
    Figure PCTCN2022107149-appb-100058
    where $A_h$ is the node-branch incidence matrix; $B$ is the loop-branch incidence matrix;
    Figure PCTCN2022107149-appb-100059
    is the pipe mass flow rate;
    Figure PCTCN2022107149-appb-100060
    is the nodal injected flow; $h_f$ is the head loss; and $K$ is the resistance coefficient of the pipe;
    the power calculation module is configured to determine that the thermal model consists of the node power equation, the pipe temperature drop equation, and the node medium mixing equation; wherein the thermal model is:
    Figure PCTCN2022107149-appb-100061
    where $H_i$ is the thermal power injected at node i; $C_p$ is the specific heat capacity of water; $T_{s,i}$ and $T_{o,i}$ are the supply-pipe water temperature and the outlet water temperature at node i; the subscript ij in $T_{j(ij)}$ denotes the heat-network pipe branch whose head and end nodes are i and j; $T_{i(ij)}$ and $T_{j(ij)}$ are the temperatures at the i end and the j end of the branch; and $T_e$ is the ambient temperature;
    the power calculation module is configured to determine the electrical and thermal output of the coupling device as:
    Figure PCTCN2022107149-appb-100062
    where
    Figure PCTCN2022107149-appb-100063
    are the electrical output and the thermal output, respectively, of the i-th extraction-condensing unit in period t;
    Figure PCTCN2022107149-appb-100064
    are the upper and lower limits of the electrical output, respectively; and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the coefficients describing the polygonal operating region, which are constants for a given cogeneration device.
  12. The coordinated optimization system for an electric-thermal integrated energy system according to claim 9, wherein the model building module is configured to establish an objective function that aims to minimize the total operating cost of the electric-thermal integrated energy system and, in order to maximize the accommodation of renewable energy, takes the unaccommodated portion of renewable energy as a penalty term;
    and to establish the constraints of the electric-thermal integrated energy system scheduling model, the constraints comprising: node power balance equality constraints, network security constraints, cogeneration device constraints, renewable energy constraints, and conventional unit output constraints.
  13. The coordinated optimization system for an electric-thermal integrated energy system according to claim 12, wherein
    the objective function is $\min F = f_1 + f_2 + f_3$, where $f_1$ is the operating cost of the conventional units, $f_2$ is the operating cost of the cogeneration devices, and $f_3$ is the wind curtailment penalty;
    the operating cost $f_1$ of the conventional units is:
    Figure PCTCN2022107149-appb-100065
    is the power output of the conventional unit; $b_0$, $b_1$, and $b_2$ are the energy consumption coefficients of the conventional units; $N_G$ is the number of conventional units; $T$ is the scheduling horizon; and $\Delta t$ is the scheduling time interval;
    the operating cost $f_2$ of the cogeneration units is:
    Figure PCTCN2022107149-appb-100066
    where
    Figure PCTCN2022107149-appb-100067
    are the electrical output and the thermal output, respectively, of the cogeneration device connected to node i in period t; $a_0$, $a_1$, $a_2$, $a_3$, $a_4$, and $a_5$ are the energy consumption coefficients of the cogeneration devices; and $N_{chp}$ is the number of cogeneration devices;
    弃风惩罚f 3为: Wind abandonment penalty f 3 is:
    Figure PCTCN2022107149-appb-100068
    Figure PCTCN2022107149-appb-100068
    式中,
    Figure PCTCN2022107149-appb-100069
    表示在时段t,节点i所连风力发电机出力,k为弃风惩罚系数;k为常数。
    In the formula,
    Figure PCTCN2022107149-appb-100069
    Indicates the output of the wind turbine connected to node i in the time period t, k is the wind curtailment penalty coefficient; k is a constant.
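The three cost terms of claim 13 can be sketched in Python as follows. The formula images are not reproduced in this text, so the functional forms below are assumptions: a quadratic fuel cost for conventional units (coefficients b_0 to b_2), a quadratic cost in electric and heat output for CHP units (coefficients a_0 to a_5, with the pairing of coefficients to terms chosen for illustration), and a penalty proportional to the unused wind power. All function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # indexed as [unit][period]


def conventional_cost(p_g: Matrix, b: Tuple[float, float, float], dt: float) -> float:
    """f_1: assumed quadratic fuel cost summed over all units and periods."""
    b0, b1, b2 = b
    return sum((b0 + b1 * p + b2 * p * p) * dt for unit in p_g for p in unit)


def chp_cost(p_chp: Matrix, h_chp: Matrix, a: Sequence[float], dt: float) -> float:
    """f_2: assumed quadratic cost in electric power p and heat power h."""
    a0, a1, a2, a3, a4, a5 = a
    total = 0.0
    for p_unit, h_unit in zip(p_chp, h_chp):
        for p, h in zip(p_unit, h_unit):
            total += (a0 + a1 * p + a2 * p * p + a3 * h + a4 * h * h + a5 * p * h) * dt
    return total


def curtailment_penalty(p_avail: Matrix, p_used: Matrix, k: float, dt: float) -> float:
    """f_3: assumed penalty k per unit of available wind energy left unused."""
    total = 0.0
    for avail_unit, used_unit in zip(p_avail, p_used):
        for avail, used in zip(avail_unit, used_unit):
            total += k * (avail - used) * dt
    return total


def total_cost(p_g, b, p_chp, h_chp, a, p_avail, p_used, k, dt):
    """F = f_1 + f_2 + f_3, the quantity the scheduling model minimizes."""
    return (conventional_cost(p_g, b, dt)
            + chp_cost(p_chp, h_chp, a, dt)
            + curtailment_penalty(p_avail, p_used, k, dt))
```

Because the penalty grows with every unit of wind power that is available but not dispatched, minimizing F pushes the schedule toward full use of wind output whenever that is cheaper than the penalty.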
  14. The electric-thermal integrated energy system coordination and optimization system according to claim 9, wherein the nodal power balance equality constraints, based on the active power balance equations of the network nodes, are:
    Figure PCTCN2022107149-appb-100070
    Figure PCTCN2022107149-appb-100071
    where
    Figure PCTCN2022107149-appb-100072
    are the node sets of the power system and the thermal system; T is the scheduling period; and
    Figure PCTCN2022107149-appb-100073
    are, respectively, the electric load power and the thermal load power of node i in period t;
    the network security constraints include:
    Figure PCTCN2022107149-appb-100074
    Figure PCTCN2022107149-appb-100075
    Figure PCTCN2022107149-appb-100076
    where V_{i,max} and V_{i,min} are, respectively, the upper and lower limits of the voltage amplitude of node i; T_{sj} is the temperature of the hot water flowing into heating network node j, and
    Figure PCTCN2022107149-appb-100077
    are, respectively, the upper and lower limits of the supply water temperature; m_{jk} is the mass flow rate of hot water pipe k, and m_{k,max} and m_{k,min} are, respectively, the upper and lower limits of the mass flow rate;
    the CHP unit constraints are:
    Figure PCTCN2022107149-appb-100078
    where
    Figure PCTCN2022107149-appb-100079
    are the CHP generating powers in two consecutive periods, and
    Figure PCTCN2022107149-appb-100080
    are, respectively, the upper and lower limits of the ramp rate of the CHP unit;
    the renewable energy constraint is:
    Figure PCTCN2022107149-appb-100081
    where
    Figure PCTCN2022107149-appb-100082
    denotes the generating power of wind turbine i in period t, and
    Figure PCTCN2022107149-appb-100083
    is the maximum value of
    Figure PCTCN2022107149-appb-100084
    ;
    the conventional unit output constraint is:
    Figure PCTCN2022107149-appb-100085
    and the ramp constraint is also satisfied:
    Figure PCTCN2022107149-appb-100086
    where
    Figure PCTCN2022107149-appb-100087
    are, respectively, the upper and lower limits of the unit output, and
    Figure PCTCN2022107149-appb-100088
    are, respectively, the upper and lower limits of the unit ramp rate.
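The inequality constraints of claim 14 are box limits and ramp limits. A minimal Python sketch of how a candidate schedule might be checked against them is given below. The exact inequalities are in formula images that this text does not reproduce, so the forms used here (a value lying between a lower and an upper limit, and the change between two consecutive periods lying between a lower and an upper ramp limit) are assumptions, and all function names are hypothetical.

```python
from typing import Sequence


def within_limits(values: Sequence[float], lower: float, upper: float) -> bool:
    """Box constraint, e.g. node voltage, supply temperature, mass flow, unit output."""
    return all(lower <= v <= upper for v in values)


def ramp_ok(power: Sequence[float], ramp_lower: float, ramp_upper: float) -> bool:
    """Assumed ramp constraint: ramp_lower <= P[t] - P[t-1] <= ramp_upper."""
    return all(ramp_lower <= nxt - prev <= ramp_upper
               for prev, nxt in zip(power, power[1:]))


def wind_ok(p_wind: Sequence[float], p_wind_max: Sequence[float]) -> bool:
    """Assumed renewable constraint: 0 <= dispatched wind <= available maximum."""
    return all(0.0 <= p <= p_max for p, p_max in zip(p_wind, p_wind_max))


def unit_schedule_feasible(power: Sequence[float], p_min: float, p_max: float,
                           ramp_lower: float, ramp_upper: float) -> bool:
    """Conventional unit: output limits and ramp limits must both hold."""
    return within_limits(power, p_min, p_max) and ramp_ok(power, ramp_lower, ramp_upper)
```

For example, `unit_schedule_feasible([10.0, 12.0, 11.0], 5.0, 20.0, -2.0, 3.0)` holds, while a jump from 10.0 to 15.0 in one period would violate an upper ramp limit of 3.0.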
  15. The electric-thermal integrated energy system coordination and optimization system according to claim 9, wherein the model training module is specifically configured to: assign values to the critic target network and set the capacity of the experience memory D;
    when the change in the average reward over m_0 consecutive training rounds is less than δ_e%, for each period, sample a control action a_t ~ φ(a_t | S_t) from the actor policy network, apply the control action to the electric-thermal integrated energy system, and sample the wind power uncertainty to obtain the operating state S_{t+1} of the system at the next moment, then store the state transition and the reward in the experience memory D; and
    update the critic Q network, the actor policy network φ, the temperature coefficient, and the target network to obtain a trained policy network, which serves as the optimized SAC-framework optimal scheduling model.
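Two pieces of the training procedure in claim 15 lend themselves to a short Python sketch: the fixed-capacity experience memory D, and the test on how much the average reward has changed over m_0 consecutive training rounds. The claim does not state how that change is measured, so the comparison below (mean of the latest m_0 rounds against the mean of the m_0 rounds before them, as a percentage) is an assumption, and the class and function names are hypothetical. The SAC network updates themselves are not shown.

```python
import random
from collections import deque
from typing import Any, List, Sequence, Tuple

Transition = Tuple[Any, Any, float, Any]  # (S_t, a_t, reward, S_{t+1})


class ExperienceMemory:
    """Fixed-capacity memory D; the oldest transitions are dropped when full."""

    def __init__(self, capacity: int) -> None:
        self._data: deque = deque(maxlen=capacity)

    def push(self, state: Any, action: Any, reward: float, next_state: Any) -> None:
        self._data.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)


def reward_change_below(episode_rewards: Sequence[float], m0: int, delta_pct: float) -> bool:
    """Assumed test: relative change between the last two m0-round means < delta_pct %."""
    if len(episode_rewards) < 2 * m0:
        return False  # not enough history to compare two windows
    recent = sum(episode_rewards[-m0:]) / m0
    previous = sum(episode_rewards[-2 * m0:-m0]) / m0
    if previous == 0.0:
        return recent == 0.0
    return abs(recent - previous) / abs(previous) * 100.0 < delta_pct
```

In a training loop, each sampled action and the resulting state and reward would be pushed into the memory, mini-batches would be drawn with `sample` for the network updates, and `reward_change_below` would be evaluated once per training round.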
  16. The electric-thermal integrated energy system coordination and optimization system according to claim 9, wherein the critic Q network, the actor policy network, the temperature coefficient, and the target network are updated using the stochastic gradient descent (SGD) algorithm or the Adam algorithm.
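For readers unfamiliar with the two optimizers named in claim 16, the following Python sketch shows one update step of each on a single scalar parameter. It uses the standard textbook update rules; the learning rate and the Adam hyperparameters are common default values chosen for illustration, not values taken from this application, and the function names are hypothetical.

```python
import math
from typing import Tuple


def sgd_step(theta: float, grad: float, lr: float = 0.01) -> float:
    """Plain SGD: move against the gradient by a fixed step size."""
    return theta - lr * grad


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam step; t is the 1-based step count. Returns (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

SGD scales the step directly with the gradient, whereas Adam normalizes it by a running estimate of the gradient magnitude, so its first step has a size close to the learning rate regardless of the gradient scale.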
  17. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the electric-thermal integrated energy system coordination and optimization method according to any one of claims 1-8.
  18. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the electric-thermal integrated energy system coordination and optimization method according to any one of claims 1-8.
  19. A computer program comprising computer-readable code, wherein, when the computer-readable code runs in an electronic device, a processor of the electronic device executes the electric-thermal integrated energy system coordination and optimization method according to any one of claims 1-8.
PCT/CN2022/107149 2021-11-15 2022-07-21 Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program WO2023082697A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111349881.4 2021-11-15
CN202111349881.4A CN113902040B (en) 2021-11-15 2021-11-15 Method, system, equipment and storage medium for coordinating and optimizing electricity-heat comprehensive energy system

Publications (1)

Publication Number Publication Date
WO2023082697A1 true WO2023082697A1 (en) 2023-05-19

Family

ID=79194394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107149 WO2023082697A1 (en) 2021-11-15 2022-07-21 Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program

Country Status (2)

Country Link
CN (1) CN113902040B (en)
WO (1) WO2023082697A1 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706094A (en) * 2021-07-29 2021-11-26 国电南瑞科技股份有限公司 Comprehensive energy real-time collaborative simulation system and method based on message bus
CN116629029A (en) * 2023-07-19 2023-08-22 天津大学 Data-driven-based flow industry user flexibility assessment method and related equipment
CN116629587A (en) * 2023-07-24 2023-08-22 长江三峡集团实业发展(北京)有限公司 Multi-target capacity expansion planning method and device for comprehensive energy system and electronic equipment
CN116911577A (en) * 2023-09-13 2023-10-20 国网信息通信产业集团有限公司 Comprehensive energy scheduling method, device, electronic equipment and computer readable medium
CN117151701A (en) * 2023-10-31 2023-12-01 山东欣历能源有限公司 Industrial waste heat recycling system for cogeneration
CN117252043A (en) * 2023-11-17 2023-12-19 山东大学 Multi-target optimal scheduling method and device for regional multi-energy complementary energy system
CN117272842A (en) * 2023-11-21 2023-12-22 中国电建集团西北勘测设计研究院有限公司 Cooperative control system and method for multi-industrial park comprehensive energy system
CN117273810A (en) * 2023-11-03 2023-12-22 连云港智源电力设计有限公司 Comprehensive energy sharing scheduling method and system with excitation compatibility
CN117291315A (en) * 2023-11-24 2023-12-26 湖南大学 Carbon recycling electric-gas-thermal multi-energy combined supply network cooperative operation method
CN117291445A (en) * 2023-11-27 2023-12-26 国网安徽省电力有限公司电力科学研究院 Multi-target prediction method based on state transition under comprehensive energy system
CN117371219A (en) * 2023-10-20 2024-01-09 华北电力大学 Modeling method of expansion energy hub applied to comprehensive energy system
CN117374975A (en) * 2023-12-06 2024-01-09 国网湖北省电力有限公司电力科学研究院 Real-time cooperative voltage regulation method for power distribution network based on approximate dynamic programming
CN117411036A (en) * 2023-08-31 2024-01-16 国家电网有限公司华东分部 Electric hydrogen conversion comprehensive energy operation method and device considering comprehensive demand response
CN117436672A (en) * 2023-12-20 2024-01-23 国网湖北省电力有限公司经济技术研究院 Comprehensive energy operation method and system considering equivalent cycle life and temperature control load
CN117455183A (en) * 2023-11-09 2024-01-26 国能江苏新能源科技开发有限公司 Comprehensive energy system optimal scheduling method based on deep reinforcement learning
CN117494910A (en) * 2024-01-02 2024-02-02 国网山东省电力公司电力科学研究院 Multi-energy coordination optimization control system and method based on carbon emission reduction

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902040B (en) * 2021-11-15 2022-03-08 中国电力科学研究院有限公司 Method, system, equipment and storage medium for coordinating and optimizing electricity-heat comprehensive energy system
CN114372645A (en) * 2022-03-22 2022-04-19 山东大学 Energy supply system optimization method and system based on multi-agent reinforcement learning
CN115117888A (en) * 2022-06-28 2022-09-27 国网江苏省电力有限公司电力科学研究院 Garden comprehensive energy pressure regulating method and device, storage and computing equipment
CN116307136A (en) * 2023-02-24 2023-06-23 国网安徽省电力有限公司营销服务中心 Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190271A (en) * 2018-09-13 2019-01-11 东北大学 A kind of electric heating integrated energy system economic optimization dispatching method considering transmission loss
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN112862281A (en) * 2021-01-26 2021-05-28 中国电力科学研究院有限公司 Method, device, medium and electronic equipment for constructing scheduling model of comprehensive energy system
CN113902040A (en) * 2021-11-15 2022-01-07 中国电力科学研究院有限公司 Method, system, equipment and storage medium for coordinating and optimizing electricity-heat comprehensive energy system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241655A (en) * 2018-09-27 2019-01-18 河海大学 A kind of electric-thermal interconnection integrated energy system chance constraint coordination optimizing method
CN109345045B (en) * 2018-11-29 2021-11-30 东北大学 Electric heating comprehensive energy system economic dispatching method based on double-multiplier iterative algorithm
CN112734591A (en) * 2020-11-26 2021-04-30 清华大学 Electric heating comprehensive coordination scheduling method and device, equipment and medium
CN112668791A (en) * 2020-12-30 2021-04-16 华北电力大学(保定) Optimization method of combined heat and power system

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706094B (en) * 2021-07-29 2024-02-20 国电南瑞科技股份有限公司 Comprehensive energy real-time collaborative simulation system and method based on message bus
CN113706094A (en) * 2021-07-29 2021-11-26 国电南瑞科技股份有限公司 Comprehensive energy real-time collaborative simulation system and method based on message bus
CN116629029A (en) * 2023-07-19 2023-08-22 天津大学 Data-driven-based flow industry user flexibility assessment method and related equipment
CN116629029B (en) * 2023-07-19 2023-09-29 天津大学 Data-driven-based flow industry user flexibility assessment method and related equipment
CN116629587A (en) * 2023-07-24 2023-08-22 长江三峡集团实业发展(北京)有限公司 Multi-target capacity expansion planning method and device for comprehensive energy system and electronic equipment
CN117411036A (en) * 2023-08-31 2024-01-16 国家电网有限公司华东分部 Electric hydrogen conversion comprehensive energy operation method and device considering comprehensive demand response
CN116911577B (en) * 2023-09-13 2024-02-09 国网信息通信产业集团有限公司 Comprehensive energy scheduling method, device, electronic equipment and computer readable medium
CN116911577A (en) * 2023-09-13 2023-10-20 国网信息通信产业集团有限公司 Comprehensive energy scheduling method, device, electronic equipment and computer readable medium
CN117371219B (en) * 2023-10-20 2024-03-12 华北电力大学 Modeling method of expansion energy hub applied to comprehensive energy system
CN117371219A (en) * 2023-10-20 2024-01-09 华北电力大学 Modeling method of expansion energy hub applied to comprehensive energy system
CN117151701A (en) * 2023-10-31 2023-12-01 山东欣历能源有限公司 Industrial waste heat recycling system for cogeneration
CN117151701B (en) * 2023-10-31 2024-02-09 山东欣历能源有限公司 Industrial waste heat recycling system for cogeneration
CN117273810B (en) * 2023-11-03 2024-04-05 连云港智源电力设计有限公司 Comprehensive energy sharing scheduling method and system with excitation compatibility
CN117273810A (en) * 2023-11-03 2023-12-22 连云港智源电力设计有限公司 Comprehensive energy sharing scheduling method and system with excitation compatibility
CN117455183A (en) * 2023-11-09 2024-01-26 国能江苏新能源科技开发有限公司 Comprehensive energy system optimal scheduling method based on deep reinforcement learning
CN117252043B (en) * 2023-11-17 2024-04-09 山东大学 Multi-target optimal scheduling method and device for regional multi-energy complementary energy system
CN117252043A (en) * 2023-11-17 2023-12-19 山东大学 Multi-target optimal scheduling method and device for regional multi-energy complementary energy system
CN117272842A (en) * 2023-11-21 2023-12-22 中国电建集团西北勘测设计研究院有限公司 Cooperative control system and method for multi-industrial park comprehensive energy system
CN117272842B (en) * 2023-11-21 2024-02-27 中国电建集团西北勘测设计研究院有限公司 Cooperative control system and method for multi-industrial park comprehensive energy system
CN117291315B (en) * 2023-11-24 2024-02-20 湖南大学 Carbon recycling electric-gas-thermal multi-energy combined supply network cooperative operation method
CN117291315A (en) * 2023-11-24 2023-12-26 湖南大学 Carbon recycling electric-gas-thermal multi-energy combined supply network cooperative operation method
CN117291445B (en) * 2023-11-27 2024-02-13 国网安徽省电力有限公司电力科学研究院 Multi-target prediction method based on state transition under comprehensive energy system
CN117291445A (en) * 2023-11-27 2023-12-26 国网安徽省电力有限公司电力科学研究院 Multi-target prediction method based on state transition under comprehensive energy system
CN117374975B (en) * 2023-12-06 2024-02-27 国网湖北省电力有限公司电力科学研究院 Real-time cooperative voltage regulation method for power distribution network based on approximate dynamic programming
CN117374975A (en) * 2023-12-06 2024-01-09 国网湖北省电力有限公司电力科学研究院 Real-time cooperative voltage regulation method for power distribution network based on approximate dynamic programming
CN117436672A (en) * 2023-12-20 2024-01-23 国网湖北省电力有限公司经济技术研究院 Comprehensive energy operation method and system considering equivalent cycle life and temperature control load
CN117436672B (en) * 2023-12-20 2024-03-12 国网湖北省电力有限公司经济技术研究院 Comprehensive energy operation method and system considering equivalent cycle life and temperature control load
CN117494910A (en) * 2024-01-02 2024-02-02 国网山东省电力公司电力科学研究院 Multi-energy coordination optimization control system and method based on carbon emission reduction
CN117494910B (en) * 2024-01-02 2024-03-22 国网山东省电力公司电力科学研究院 Multi-energy coordination optimization control system and method based on carbon emission reduction

Also Published As

Publication number Publication date
CN113902040A (en) 2022-01-07
CN113902040B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
WO2023082697A1 (en) Coordination and optimization method and system for comprehensive electric-thermal energy system, and device, medium and program
Zhang et al. Dynamic energy conversion and management strategy for an integrated electricity and natural gas system with renewable energy: Deep reinforcement learning approach
Zhang et al. Soft actor-critic–based multi-objective optimized energy conversion and management strategy for integrated energy systems with renewable energy
Subathra et al. A hybrid with cross-entropy method and sequential quadratic programming to solve economic load dispatch problem
Yang et al. Hybrid policy-based reinforcement learning of adaptive energy management for the Energy transmission-constrained island group
Li et al. Probabilistic charging power forecast of EVCS: Reinforcement learning assisted deep learning approach
Goudarzi et al. Intelligent analysis of wind turbine power curve models
CN113780688B (en) Optimized operation method, system, equipment and medium of electric heating combined system
Jia et al. An improved artificial bee colony-BP neural network algorithm in the short-term wind speed prediction
Yin et al. Relaxed deep generative adversarial networks for real-time economic smart generation dispatch and control of integrated energy systems
Jin et al. A deep neural network coordination model for electric heating and cooling loads based on IoT data
Yuan et al. A multi-timescale smart grid energy management system based on adaptive dynamic programming and Multi-NN Fusion prediction method
Zhu et al. Structural safety monitoring of high arch dam using improved ABC-BP model
Jiang et al. Hybrid DE-TLBO algorithm for solving short term hydro-thermal optimal scheduling with incommensurable Objectives
Bolland et al. Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent
Nie et al. A general real-time OPF algorithm using DDPG with multiple simulation platforms
Adhinarayanan et al. Particle swarm optimisation for economic dispatch with cubic fuel cost function
Niu et al. A novel social-environmental-economic dispatch model for thermal/wind power generation and application
Guolian et al. Multiple-model predictive control based on fuzzy adaptive weights and its application to main-steam temperature in power plant
CN113591391A (en) Power load control device, control method, terminal, medium and application
Zhu et al. Integration of fuzzy controller with adaptive dynamic programming
Yu et al. Short-term gas load forecasting based on wavelet BP neural network optimized by genetic algorithm
Delgoshaei et al. Towards a Semantically-enabled Control Strategy for Building Simulations: Integration of Semantic Technologies and Model Predictive Control
Zhan et al. Accelerating Deep Reinforcement Learning with Fuzzy Logic Rules
Hou Construction of demand response model of integrated energy system based on machine learning algorithm