CN116306372A - Electric-gas area comprehensive energy system safety correction control method based on DDPG algorithm - Google Patents

Publication number
CN116306372A
Authority
CN
China
Prior art keywords
energy
power
correction control
hub
distribution network
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310304127.1A
Other languages
Chinese (zh)
Inventor
彭寒梅
胡磊
李金果
谭貌
苏永新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Application filed by Xiangtan University
Priority to CN202310304127.1A
Publication of CN116306372A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466 Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/28 Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/48 Controlling the sharing of the in-phase component
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/50 Controlling the sharing of the out-of-phase component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/08 Thermal analysis or thermal optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H02J2300/22 The renewable source being solar energy
    • H02J2300/24 The renewable source being solar energy of photovoltaic origin
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H02J2300/28 The renewable source being wind energy

Abstract

The invention discloses a DDPG-based safety correction control method for an electric-gas area integrated energy system, comprising the following steps: 1) constructing a safety correction control model based on the DDPG algorithm, and defining the environment with which the agent interacts, the state space, the action space and the reward function; 2) generating energy flow out-of-limit sample data, storing a pre-generated expert experience data set in the experience replay pool, and training the agent offline; 3) using the trained agent to make real-time online decisions, yielding a safety correction control strategy for the emergency state of the system. The method generalizes well and can handle safety correction control for integrated energy systems in which state variables and control variables are strongly coupled. While eliminating energy flow limit violations so that the static safety constraints are met, it minimizes the operating cost of the energy hub and the amount of curtailed wind and solar power.

Description

Electric-gas area comprehensive energy system safety correction control method based on DDPG algorithm
Technical Field
The invention relates to the technical field of integrated energy system engineering, and in particular to a DDPG-based safety correction control method for an electric-gas area integrated energy system.
Background
An electric-gas area integrated energy system consists of a power distribution network, a gas distribution network, renewable energy generation, electricity-gas coupling links and other components distributed within one area. By analogy with the classification of power system operating states, the operating state of such a system can be divided into a normal state, an emergency state and a restorative state; in the emergency state the equality constraints are satisfied but the inequality constraints are not. As the power distribution network and the gas distribution network become ever more tightly coupled and renewable generation is connected, the operating state of the system becomes more complex and uncertain, which puts safe operation at risk. Safety analysis and control of the electric-gas area integrated energy system is therefore necessary.
Safety correction control means that when a disturbance or fault drives the system into an emergency state during operation, the control variables are adjusted to eliminate the limit violations of the state variables, so that the system returns to the normal state. In an electric-gas area integrated energy system containing energy hubs, electric energy and gas energy are coupled through the energy hubs, and active and reactive power in the power distribution network are strongly coupled. Safety correction control of such a system therefore has the following characteristics: 1) there are many state variables and control variables, and the control variables of one energy carrier affect the state variables of the other, which makes safety correction control difficult; 2) the energy hub can be operated and controlled flexibly and can take part in active regulation, which provides a new means of safety correction control; 3) the coupling introduces more uncertain factors, several forms of safety problem coexist, and safety correction control must act quickly. Accurate, real-time safety correction control of an electric-gas area integrated energy system with energy hubs is thus challenging, and traditional optimization methods and sensitivity methods struggle to deliver accuracy and speed at the same time.
At present, most research on safety analysis and correction control targets the electric power system; research on safety control of regional integrated energy systems and of electric-gas area integrated energy systems is at an early stage. In safety correction control of an electric-gas area integrated energy system with energy hubs, the control variables are first adjusted, and the next action is then determined from the energy flow state of the system after that adjustment. This is a typical Markov decision process and is well suited to deep reinforcement learning. The deep deterministic policy gradient (DDPG) algorithm is a deep reinforcement learning algorithm that combines Q-learning with the policy gradient. It comprises a policy deep neural network, a value deep neural network, a target policy deep neural network and a target value deep neural network, and it uses an experience replay mechanism and target networks, which makes learning more stable and convergence faster.
Disclosure of Invention
The invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a DDPG-based safety correction control method for an electric-gas area integrated energy system, comprising the following steps:
1) Constructing a DDPG-based safety correction control model of the electric-gas area integrated energy system, and defining the environment with which the agent interacts, the state space, the action space and the reward function;
2) Generating energy flow out-of-limit sample data, storing a pre-generated expert experience data set in the experience replay pool, and training the agent offline;
3) Using the trained agent to make real-time online decisions, yielding a safety correction control strategy for the emergency state of the system.
Constructing the DDPG-based safety correction control model of the electric-gas area integrated energy system specifically comprises the following:
The environment with which the agent interacts is set to a simulator of the electric-gas area integrated energy system containing energy hubs. The simulator performs multi-energy flow calculation and integrated sensitivity calculation. The multi-energy flow calculation uses a unified solution based on the Newton-Raphson method, and the unified multi-energy flow model is:
Figure BDA0004146112910000021
In formula (1): $x$ and $u$ are the state variables and control variables, with $x=[x_e,x_g]=[V_i,\delta_i,\pi_j]$ and $u=[Q_c,P_{eh},\pi_o]$; $P$, $Q$ and $G$ denote the active power and reactive power equations of the power distribution network and the energy flow equation of the gas distribution network, respectively; $x_e$ and $x_g$ are the state variables of the power distribution network and the gas distribution network; $V_i$, $\delta_i$ and $\pi_j$ are the voltage magnitude and phase angle of a power node and the pressure of a natural gas node; $Q_c$, $P_{eh}$ and $\pi_o$ are the adjustable quantities, namely the local reactive power compensation at a power node, the energy exchanged between the energy hub and the power distribution network, and the compressor outlet pressure; $P_L$ and $Q_L$ are the active and reactive loads; $P_i$ and $Q_i$ are the active and reactive power injected at a power node; $A$ is the node-pipeline incidence matrix; $f$ is the pipeline flow vector; $P_{REG}$ is the accommodated wind and solar power output; $P_{eh}$ and $F_g$ are the energy exchanged by the energy hub with the power distribution network and with the gas distribution network, respectively, where $P_{eh}>0$ means that the power distribution network supplies electric energy to the energy hub and $P_{eh}<0$ means that the energy hub supplies electric energy to the power distribution network; $L_e$ and $L_h$ are the electric and thermal loads of the energy hub; $\eta_{CHP,e}$ and $\eta_{CHP,h}$ are the electric and thermal conversion efficiencies of the cogeneration unit; $\eta_T$ and $\eta_{GB}$ are the efficiencies of the power transformer and the gas turbine; and $v$ is the natural gas distribution coefficient.
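The Newton-Raphson solution and the later use of the Jacobian inverse for sensitivities can be illustrated on a toy system. The sketch below is not the patent's formula (1), which is not reproduced in this text: `residual`, `jac_x` and `jac_u` are hypothetical stand-ins for a two-equation model $F(x,u)=0$. It iterates $x \leftarrow x - J_x^{-1}F$ and then evaluates $\partial x/\partial u = -J_x^{-1}\,\partial F/\partial u$, which follows from the implicit function theorem.

```python
import numpy as np

def residual(x, u):
    # Hypothetical two-equation stand-in for formula (1); NOT the patent's model.
    return np.array([
        x[0] ** 2 + x[1] - u[0],
        x[0] + x[1] ** 2 - u[1],
    ])

def jac_x(x, u):
    # Jacobian of the residual with respect to the state variables x.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, 2.0 * x[1]]])

def jac_u(x, u):
    # Jacobian of the residual with respect to the control variables u.
    return np.array([[-1.0, 0.0],
                     [0.0, -1.0]])

def newton_raphson(x0, u, tol=1e-10, max_iter=50):
    """Solve residual(x, u) = 0 for x; returns (x, converged)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        f = residual(x, u)
        if np.max(np.abs(f)) < tol:
            return x, True
        # Newton step: solve J_x * dx = f, then x <- x - dx.
        x = x - np.linalg.solve(jac_x(x, u), f)
    return x, False

def state_sensitivity(x, u):
    """dx/du at a converged solution: -J_x^{-1} * J_u."""
    return -np.linalg.solve(jac_x(x, u), jac_u(x, u))

u = np.array([2.0, 2.0])
x, converged = newton_raphson([1.2, 0.9], u)
S = state_sensitivity(x, u)
```

The `converged` flag matters for the later reward design, where a non-converged energy flow ends the training episode.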
The integrated sensitivity is the weighted sum of the sensitivities of the out-of-limit state variables and the within-limit state variables to the control variables:
Figure BDA0004146112910000031
In formula (2): $S$ is the integrated sensitivity vector; $S_{x-P_{eh}}$, $S_{x-Q_c}$ and $S_{x-\pi_o}$ denote the sensitivities of the state variables to the control variables $P_{eh}$, $Q_c$ and $\pi_o$, respectively; the terms shown in [image BDA0004146112910000032] and [image BDA0004146112910000033] are derived from the inverse of the Jacobian matrix of the unified multi-energy flow model; the terms shown in [image BDA0004146112910000034] and [image BDA0004146112910000035] are obtained from the unified multi-energy flow model; $k_{ei}=\{k_{e1},k_{e2}\}$, where $k_{e1}$ is the weight of out-of-limit power nodes and $k_{e2}$ is the weight of normal power nodes; likewise $k_{gj}=\{k_{g1},k_{g2}\}$ for the natural gas nodes.
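Since formula (2) survives only as an image reference, the following is a minimal sketch of the weighting idea rather than the patent's exact expression. It assumes that the sensitivity magnitudes of each control variable are summed over the state variables, with weight $k_{e1}$ or $k_{g1}$ for out-of-limit nodes and $k_{e2}$ or $k_{g2}$ for normal nodes. The function name, the use of absolute values and the numeric weights are all assumptions.

```python
import numpy as np

def integrated_sensitivity(S_e, S_g, viol_e, viol_g,
                           k_e=(0.7, 0.3), k_g=(0.7, 0.3)):
    """Weighted sum of state-to-control sensitivities, one value per control.

    S_e, S_g : (n_states, n_controls) sensitivities of electric / gas states.
    viol_e, viol_g : boolean masks marking out-of-limit states.
    k_e, k_g : (out-of-limit weight, normal weight); values are hypothetical.
    """
    w_e = np.where(viol_e, k_e[0], k_e[1])
    w_g = np.where(viol_g, k_g[0], k_g[1])
    # Absolute values are an assumption: only the strength of influence is ranked.
    return w_e @ np.abs(S_e) + w_g @ np.abs(S_g)

# Two electric states (first one out of limit), one gas state, two controls.
S_e = np.array([[0.5, -0.1],
                [0.2, 0.0]])
S_g = np.array([[0.0, 0.4]])
S = integrated_sensitivity(S_e, S_g,
                           viol_e=np.array([True, False]),
                           viol_g=np.array([False]))
```

Giving out-of-limit nodes the larger weight ranks first the controls that act most strongly on the violated quantities.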
Balancing the richness of the observation against the need for each variable, the state space of the agent is set as:
$s_t=[V_{i,t},\ \pi_{i,t},\ P_{l,t},\ F_{ij,t},\ Q_{ci,t},\ P_{ehj,t},\ \pi_{ok,t},\ P'_{REGj,t}]$ (3)
In formula (3): $s_t$ is the state at time $t$; $V_{i,t}$, $\pi_{i,t}$, $P_{l,t}$, $F_{ij,t}$, $Q_{ci,t}$, $P_{ehj,t}$, $\pi_{ok,t}$ and $P'_{REGj,t}$ are, at time $t$, the power node voltage magnitude, the natural gas node pressure, the line power, the pipeline flow, the local reactive power compensation at the power node, the energy exchanged between the energy hub and the power distribution network, the compressor outlet pressure, and the wind and solar power output, respectively.
The actions are the control variables of the system, and the action space of the agent is set as:
$a_t=[\Delta Q_{ci,t},\ \Delta P_{ehj,t},\ \Delta\pi_{ok,t}]$ (4)
In formula (4): $a_t$ is the action at time $t$; $\Delta Q_{ci,t}$, $\Delta P_{ehj,t}$ and $\Delta\pi_{ok,t}$ are the adjustments at time $t$ to the local reactive power compensation at the $i$-th power node, to the energy exchanged between the $j$-th energy hub and the power distribution network, and to the outlet pressure of the $k$-th gas compressor, respectively.
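As a rough illustration of formulas (3) and (4), the state can be assembled by concatenating the measured quantities in the listed order, and an action can be split back into its three blocks of adjustments. The helper names `build_state` and `apply_action`, the array sizes and the numeric values are hypothetical.

```python
import numpy as np

def build_state(V, pi, P_line, F_pipe, Q_c, P_eh, pi_o, P_reg_avail):
    # Concatenate the quantities in the order of formula (3).
    return np.concatenate(
        [V, pi, P_line, F_pipe, Q_c, P_eh, pi_o, P_reg_avail]
    ).astype(np.float32)

def apply_action(Q_c, P_eh, pi_o, action):
    # Split a_t = [dQ_c, dP_eh, dpi_o] by block length and add each block
    # to the corresponding control variables.
    n_q, n_p = len(Q_c), len(P_eh)
    dQ = action[:n_q]
    dP = action[n_q:n_q + n_p]
    dpi = action[n_q + n_p:]
    return Q_c + dQ, P_eh + dP, pi_o + dpi

# Hypothetical system: 2 power nodes, 1 gas node, 1 line, 1 pipe,
# 2 compensators, 1 energy hub, 1 compressor, 1 renewable unit.
V = np.array([1.02, 0.98])
pi = np.array([30.0])
P_line = np.array([0.5])
F_pipe = np.array([12.0])
Q_c = np.array([0.1, 0.0])
P_eh = np.array([0.3])
pi_o = np.array([35.0])
P_reg_avail = np.array([0.8])

s_t = build_state(V, pi, P_line, F_pipe, Q_c, P_eh, pi_o, P_reg_avail)
a_t = np.array([0.05, -0.02, -0.1, 1.5])
Q_c_new, P_eh_new, pi_o_new = apply_action(Q_c, P_eh, pi_o, a_t)
```

In a full implementation the adjusted controls would be passed to the simulator, which recomputes the energy flow and returns the next state.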
The goal of safety correction control is to minimize the total adjustment of the control variables while satisfying the static safety constraints, and at the same time to minimize the operating cost of the energy hub and its curtailed wind and solar power. The reward function therefore consists of a target reward and a constraint reward. The target reward comprises a reward for the total adjustment and a reward for the operating cost and curtailed wind and solar power of the energy hub. To incorporate the safety correction control experience that control variables with a high integrated sensitivity value should be adjusted first, the integrated sensitivity value is used to set the weight coefficient of the total-adjustment reward. Because the integrated sensitivity value and the adjustment amount are inversely related, an exponential function is used to express this inverse relationship. The total-adjustment reward $r_1$ is set as:
Figure BDA0004146112910000036
The reward $r_2$ for the operating cost of the energy hub and its curtailed wind and solar power is set as:
Figure BDA0004146112910000037
In formula (6): $P_{ehj,t}$ and $F_{gj,t}$ are the energy exchanged at time $t$ between the $j$-th energy hub and the power distribution network and the gas distribution network, respectively; $p_{eh}$ is the electricity purchase price when $P_{ehj,t}>0$ and the electricity selling price when $P_{ehj,t}<0$; $p_g$ is the gas purchase price; $\alpha$ is the operation and equipment maintenance cost coefficient of the energy hub; $(P'_{REGj}-P_{REGj})$ is the curtailed wind and solar power; $c_{REG}$ is the penalty coefficient for curtailed wind and solar power; and $\Delta t$ is the duration of each period.
The reinforcement learning process is terminated when the system energy flow does not converge, or when it converges but the output of the electric power balancing unit and the flow of the gas balancing unit exceed a set value. The reward $r_3$ for the balancing unit constraint is set as:
Figure BDA0004146112910000041
In formula (7): $\Delta P_{s,t}$ and $\Delta F_{s,t}$ are the differences between the output of the electric power balancing unit and the flow of the gas balancing unit during reinforcement learning and their upper-limit ranges; $P_{s,t}$, $F_{s,t}$, $P_s^{max}$ and $F_s^{max}$ are the output of the electric power balancing unit, the flow of the gas balancing unit, and their respective upper limits; $\alpha_1$ is the margin coefficient and $\alpha_2$ is the upper-limit coefficient, taken as 0.9 and 1.1, respectively.
So that the agent can perceive how severe an energy flow limit violation is, a violation severity measure based on utility theory is adopted and combined with discrete rewards, and the static safety constraint reward $r_4$ is set as:
Figure BDA0004146112910000042
In formula (8): $dV_{i,t}$, $d\pi_j$, $dP_{l,t}$ and $df_{il,t}$ are the limit violations of the power node voltage magnitude, the natural gas node pressure, the line power and the pipeline flow, respectively.
The rewards $r_1$, $r_2$, $r_3$ and $r_4$ are normalized so that each takes values in $[-1, 0]$. The total reward obtained by the agent is the weighted sum of $r_1$ to $r_4$:
$r = c_1 r_1 + c_2 r_2 + c_3 r_3 + c_4 r_4$ (9)
In formula (9): $c_1$, $c_2$, $c_3$ and $c_4$ are the coefficients of the individual rewards.
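Formula (9) is a plain weighted sum, sketched below. The coefficient values are hypothetical, and the clipping to $[-1, 0]$ is an added safeguard that assumes the component rewards have already been normalized as described.

```python
import numpy as np

def total_reward(r, c):
    """Weighted sum of the four component rewards, as in formula (9)."""
    # Each normalized component is expected to lie in [-1, 0]; clip defensively.
    r = np.clip(np.asarray(r, dtype=float), -1.0, 0.0)
    c = np.asarray(c, dtype=float)
    return float(c @ r)

# Hypothetical component rewards r1..r4 and coefficients c1..c4.
r_total = total_reward([-0.2, -0.5, 0.0, -1.0], [0.2, 0.2, 0.3, 0.3])
```

Because every component is non-positive, the best attainable total reward is 0, reached only when no adjustment cost, operating cost or constraint penalty is incurred.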
Generating the energy flow out-of-limit sample data, storing the pre-generated expert experience data set in the experience replay pool and training the agent offline specifically comprises the following:
Energy flow out-of-limit sample data are generated and used, respectively, to build the expert experience data set, to train the agent and to test the agent. The expert experience data set is generated with the integrated-sensitivity-based safety correction control method and stored in the initial experience replay pool, so that expert experience is incorporated into reinforcement learning training. The expert experience data are collected as follows: multi-energy flow out-of-limit sample states of the electric-gas area integrated energy system containing energy hubs are constructed; the integrated-sensitivity-based safety correction control method is applied to obtain the adjustments of the adjustable variables in the designed action space; and the data for which correction succeeds in a single step are selected as expert experience data. Training the DDPG agent requires multiple episodes to reach convergence, and in each episode the agent issues multi-step correction control actions, as follows:
1) Initialize the neural network parameters, put the collected expert experience data set into the initial experience replay pool, set the number of training episodes, the terminating episode number and the maximum number of iteration steps $T$ per episode, and build the training environment;
2) The agent obtains the initial state $s_t$ from the environment, and the step count is set to $t=0$;
3) The agent outputs action $a_t$ based on the current state $s_t$, and the integrated sensitivity in the current state is calculated; after the agent acts on the environment, the environment transitions to the next state $s_{t+1}$ and feeds back a reward value $r_t$ to the agent; the experience tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay pool, and $t=t+1$;
4) A mini-batch of samples is drawn at random from the experience replay pool and the neural network parameters are updated. After each action is executed, whether the system satisfies the strong constraints is checked: if it does not, training for the episode is terminated; if it does, the procedure returns to step 3) and interaction continues until the maximum number of iteration steps $T$ is reached. The episode then ends, the next episode begins, and the procedure returns to step 2). After multiple episodes of training, the neural network parameters converge.
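Steps 1) to 4) can be sketched as a training loop whose experience pool is seeded with expert data before any interaction. This is an outline under stated assumptions, not the patent's implementation: `ToyEnv` is a hypothetical stand-in for the multi-energy flow simulator, `StubAgent` replaces the DDPG actor and critic networks with a placeholder whose `update` only counts calls, and all sizes and thresholds are illustrative.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Experience pool that can be pre-filled with expert tuples (step 1)."""
    def __init__(self, capacity, expert_data=()):
        self.buf = deque(expert_data, maxlen=capacity)

    def add(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random mini-batch, as in step 4.
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)

class ToyEnv:
    """Hypothetical simulator stand-in: the state is a vector of violations."""
    def reset(self):
        self.s = np.array([0.3, -0.2])
        return self.s.copy()

    def step(self, a):
        self.s = self.s + a
        reward = -float(np.abs(self.s).sum())
        # Stands in for the strong constraints (non-convergence, balancing limits).
        hard_violation = bool(np.abs(self.s).max() > 1.0)
        return self.s.copy(), reward, hard_violation

class StubAgent:
    """Placeholder for the DDPG networks; update() only counts calls."""
    def __init__(self):
        self.updates = 0

    def act(self, s):
        # Proportional correction plus small exploration noise.
        return -0.5 * s + np.random.normal(scale=0.01, size=s.shape)

    def update(self, batch):
        self.updates += 1

def train(env, agent, buffer, episodes=5, max_steps=10, batch_size=4):
    for _ in range(episodes):
        s = env.reset()                              # step 2
        for _ in range(max_steps):
            a = agent.act(s)                         # step 3
            s_next, r, hard_violation = env.step(a)
            buffer.add(s, a, r, s_next)
            agent.update(buffer.sample(batch_size))  # step 4
            if hard_violation:
                break                                # end the episode early
            s = s_next
    return agent

expert = [(np.zeros(2), np.zeros(2), 0.0, np.zeros(2)) for _ in range(3)]
buffer = ReplayBuffer(capacity=1000, expert_data=expert)
agent = train(ToyEnv(), StubAgent(), buffer)
```

Seeding the pool means the earliest mini-batches already contain successful one-step corrections, which is the stated purpose of storing the expert experience data set before training.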
Using the trained agent to make real-time online decisions and obtain the safety correction control strategy for the emergency state of the system specifically comprises the following:
Real-time energy flow out-of-limit data for the emergency state of the system are input to the trained agent, which makes a real-time online decision and yields the optimal safety correction control decision. The system applies the decision given by the agent: by adjusting the corresponding control variables, it satisfies the static safety constraints and returns to the normal state.
The embodiments of the invention have at least the following beneficial technical effects:
1) After training, the DDPG-based safety correction control model of the electric-gas area integrated energy system yields an effective control strategy and generalizes well. It can generate safety correction control strategies online and quickly to eliminate energy flow limit violations, can handle safety correction control for integrated energy systems with strongly coupled state variables and control variables, and is practical for engineering use.
2) The target reward accounts for the economic benefit of the energy hub as an independent stakeholder and for its renewable energy consumption, so that safety correction control minimizes the operating cost of the energy hub and maximizes renewable energy consumption while ensuring that the system satisfies the static safety constraints, which improves the practicality and generality of the method.
Drawings
Fig. 1 is a flowchart of the DDPG-based safety correction control method for an electric-gas area integrated energy system according to an embodiment of the present invention;
Fig. 2 is a topology diagram of the electric-gas area integrated energy system according to an embodiment of the present invention;
Fig. 3 is the reward curve during DDPG agent training according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific examples, which are in no way limiting.
The flowchart of the DDPG-based safety correction control method for an electric-gas area integrated energy system is shown in Fig. 1. The method comprises the following steps:
1) Constructing a DDPG-based safety correction control model of the electric-gas area integrated energy system, and defining the environment with which the agent interacts, the state space, the action space and the reward function;
2) Generating energy flow out-of-limit sample data, storing a pre-generated expert experience data set in the experience replay pool, and training the agent offline;
3) Using the trained agent to make real-time online decisions, yielding a safety correction control strategy for the emergency state of the system.
Constructing the DDPG-based safety correction control model of the electric-gas area integrated energy system specifically comprises the following:
The environment with which the agent interacts is set to a simulator of the electric-gas area integrated energy system containing energy hubs. The simulator performs multi-energy flow calculation and integrated sensitivity calculation. The multi-energy flow calculation uses a unified solution based on the Newton-Raphson method, and the unified multi-energy flow model is:
Figure BDA0004146112910000061
In formula (1): $x$ and $u$ are the state variables and control variables, with $x=[x_e,x_g]=[V_i,\delta_i,\pi_j]$ and $u=[Q_c,P_{eh},\pi_o]$; $P$, $Q$ and $G$ denote the active power and reactive power equations of the power distribution network and the energy flow equation of the gas distribution network, respectively; $x_e$ and $x_g$ are the state variables of the power distribution network and the gas distribution network; $V_i$, $\delta_i$ and $\pi_j$ are the voltage magnitude and phase angle of a power node and the pressure of a natural gas node; $Q_c$, $P_{eh}$ and $\pi_o$ are the adjustable quantities, namely the local reactive power compensation at a power node, the energy exchanged between the energy hub and the power distribution network, and the compressor outlet pressure; $P_L$ and $Q_L$ are the active and reactive loads; $P_i$ and $Q_i$ are the active and reactive power injected at a power node; $A$ is the node-pipeline incidence matrix; $f$ is the pipeline flow vector; $P_{REG}$ is the accommodated wind and solar power output; $P_{eh}$ and $F_g$ are the energy exchanged by the energy hub with the power distribution network and with the gas distribution network, respectively, where $P_{eh}>0$ means that the power distribution network supplies electric energy to the energy hub and $P_{eh}<0$ means that the energy hub supplies electric energy to the power distribution network; $L_e$ and $L_h$ are the electric and thermal loads of the energy hub; $\eta_{CHP,e}$ and $\eta_{CHP,h}$ are the electric and thermal conversion efficiencies of the cogeneration unit; $\eta_T$ and $\eta_{GB}$ are the efficiencies of the power transformer and the gas turbine; and $v$ is the natural gas distribution coefficient.
The comprehensive sensitivity is the weighted sum of the sensitivities of the out-of-limit state variables and the within-limit state variables to the control variables, as follows:
Figure BDA0004146112910000062
In formula (2): $S$ is the comprehensive sensitivity vector; $S_{x-P_{eh}}$, $S_{x-Q_c}$ and $S_{x-\pi_o}$ are the sensitivities of the state variables to the control variables $P_{eh}$, $Q_c$ and $\pi_o$, respectively; the terms
Figure BDA0004146112910000063
and
Figure BDA0004146112910000064
are derived from the inverse of the Jacobian matrix of the unified multi-energy flow model; the terms
Figure BDA0004146112910000065
and
Figure BDA0004146112910000066
are obtained from the unified multi-energy flow model; $k_{ei} = \{k_{e1}, k_{e2}\}$, where $k_{e1}$ is the weight of out-of-limit power nodes and $k_{e2}$ is the weight of normal power nodes; likewise, $k_{gj} = \{k_{g1}, k_{g2}\}$.
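The weighting idea can be sketched as follows. Because formula (2) is reproduced only as an image, the exact expression is not recoverable here; the sketch simply assumes that, for one control variable, each state-variable sensitivity is multiplied by the out-of-limit weight or the normal weight and the products are summed. The function name, the sensitivity values and the weights 2.0 and 0.5 are hypothetical.

```python
def comprehensive_sensitivity(sensitivities, out_of_limit, k_out, k_normal):
    """Weighted sum of state-variable sensitivities to ONE control variable.

    sensitivities: sensitivity of each state variable to the control variable
    out_of_limit:  True where the corresponding state variable is out of limit
    k_out/k_normal: weights for out-of-limit and normal nodes (assumed values)
    """
    if len(sensitivities) != len(out_of_limit):
        raise ValueError("one out-of-limit flag is needed per state variable")
    return sum(
        (k_out if violated else k_normal) * s
        for s, violated in zip(sensitivities, out_of_limit)
    )


# Hypothetical voltage sensitivities of three power nodes to one Q_c device.
dV_dQc = [0.5, 0.25, 0.125]
flags = [True, False, False]  # only the first node is out of limit
s_qc = comprehensive_sensitivity(dV_dQc, flags, k_out=2.0, k_normal=0.5)
```

Giving out-of-limit nodes the larger weight makes a control variable rank higher when it acts strongly on the nodes that actually need correction.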
Considering the richness and necessity of variables, the state space of the intelligent agent is set as follows:
$s_t = [V_{i,t}, \pi_{i,t}, P_{l,t}, F_{ij,t}, Q_{ci,t}, P_{ehj,t}, \pi_{ok,t}, P'_{REGj,t}]$ (3)
In formula (3): $s_t$ is the state at time $t$; $V_{i,t}$, $\pi_{i,t}$, $P_{l,t}$, $F_{ij,t}$, $Q_{ci,t}$, $P_{ehj,t}$, $\pi_{ok,t}$ and $P'_{REGj,t}$ are, at time $t$, the power node voltage amplitude, the natural gas node pressure, the line power, the pipeline flow, the power node on-site reactive power compensation amount, the interaction energy between the energy hub and the power distribution network, the gas compressor outlet pressure, and the wind and solar output, respectively;
the action is a control variable of the system, and the action space of the intelligent agent is set as follows:
$a_t = [\Delta Q_{ci,t}, \Delta P_{ehj,t}, \Delta \pi_{ok,t}]$ (4)
In formula (4): $a_t$ is the action at time $t$; $\Delta Q_{ci,t}$, $\Delta P_{ehj,t}$ and $\Delta \pi_{ok,t}$ are, at time $t$, the adjustment amounts of the on-site reactive power compensation at the $i$-th power node, the interaction energy between the $j$-th energy hub and the power distribution network, and the outlet pressure of the $k$-th gas compressor, respectively.
The target of safety correction control is set as follows: with satisfaction of the static safety constraints as the goal, the total adjustment amount of the control variables is minimized while the operating cost of the energy hubs and their curtailed wind and solar power are also minimized. The reward function is set accordingly and consists of target rewards and constraint rewards. The target rewards comprise the reward for the total adjustment amount and the reward for the energy hub operating cost and curtailed wind and solar power. The safety correction control knowledge that control variables with high comprehensive sensitivity values should be adjusted first is incorporated by using the comprehensive sensitivity value to set the weight coefficient of the total adjustment amount reward; because the comprehensive sensitivity value and the adjustment amount are inversely related, an exponential function is used to express this inverse relationship. The reward $r_1$ for the total adjustment amount is set as:
Figure BDA0004146112910000071
The reward $r_2$ for the energy hub operating cost and the curtailed wind and solar power is set as:
Figure BDA0004146112910000072
In formula (6): $P_{ehj,t}$ and $F_{gj,t}$ are the interaction energies between the $j$-th energy hub and the power distribution network and the gas distribution network at time $t$, respectively; $p_{eh}$ is the electricity purchase price when $P_{ehj,t} > 0$ and the electricity selling price when $P_{ehj,t} < 0$; $p_g$ is the gas purchase price; $\alpha$ is the operation and equipment maintenance cost coefficient of the energy hub; $(P'_{REGj} - P_{REGj})$ is the curtailed wind and solar power; $c_{REG}$ is the wind and solar curtailment penalty coefficient; and $\Delta t$ is the duration of each period.
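The sign-dependent electricity price can be illustrated with a short sketch. It covers only the electricity exchange term and is not a reconstruction of formula (6), which is available only as an image; the gas purchase, maintenance and curtailment terms are deliberately left out. The function names are hypothetical, and the two prices are the values quoted in the calculation example.

```python
BUY_PRICE = 0.8   # yuan/kWh, electricity purchase price in the calculation example
SELL_PRICE = 0.4  # yuan/kWh, electricity selling price in the calculation example


def electricity_price(p_eh, buy=BUY_PRICE, sell=SELL_PRICE):
    """Return p_eh's price: purchase price when importing, selling price otherwise."""
    return buy if p_eh > 0 else sell


def electricity_cost(p_eh_kw, dt_h, buy=BUY_PRICE, sell=SELL_PRICE):
    """Cost of the hub's electricity exchange over one period.

    Positive p_eh_kw (grid supplies the hub) gives a cost; negative p_eh_kw
    (hub supplies the grid) gives a negative cost, i.e. revenue.
    """
    return electricity_price(p_eh_kw, buy, sell) * p_eh_kw * dt_h


cost_when_buying = electricity_cost(10.0, 1.0)    # hub imports 10 kW for 1 h
cost_when_selling = electricity_cost(-10.0, 1.0)  # hub exports 10 kW for 1 h
```

Because the selling price is lower than the purchase price, exporting a given amount of energy earns less than importing the same amount costs.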
The reinforcement learning process is terminated when the energy flow of the system does not converge, or when it converges but the output of the power balancing unit and the flow of the gas balancing unit exceed a certain value. The reward $r_3$ for the balancing unit constraint is set as:
Figure BDA0004146112910000073
In formula (7): $\Delta P_{s,t}$ and $\Delta F_{s,t}$ are the differences between the power balancing unit output and the gas balancing unit flow, respectively, and their upper limit ranges during reinforcement learning; $P_{s,t}$, $F_{s,t}$, $P_s^{max}$ and $F_s^{max}$ are the power balancing unit output, the gas balancing unit flow and their upper limits, respectively; $\alpha_1$ is the margin coefficient and $\alpha_2$ is the upper limit coefficient, taken as 0.9 and 1.1, respectively.
So that the agent can better perceive the degree to which the energy flow is out of limit, an out-of-limit severity measure based on utility theory is adopted and combined with discrete rewards, and the static safety constraint reward $r_4$ is set as:
Figure BDA0004146112910000074
In formula (8): $dV_{i,t}$, $d\pi_j$, $dP_{l,t}$ and $df_{il,t}$ are the out-of-limit amounts of the power node voltage amplitude, the natural gas node pressure, the line power and the pipeline flow, respectively.
The rewards $r_1$, $r_2$, $r_3$ and $r_4$ are normalized so that each takes values in $[-1, 0]$, and the total reward obtained by the agent is the weighted sum of $r_1$ to $r_4$:
$r = c_1 r_1 + c_2 r_2 + c_3 r_3 + c_4 r_4$ (9)
In formula (9): $c_1$, $c_2$, $c_3$ and $c_4$ are the coefficients of the respective rewards.
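Formula (9) can be written directly as code. In the minimal sketch below, the function name and the coefficient values are assumptions chosen for illustration; the text does not state the values of $c_1$ to $c_4$.

```python
def total_reward(rewards, coeffs):
    """Weighted sum r = c1*r1 + c2*r2 + c3*r3 + c4*r4 of normalized rewards."""
    if len(rewards) != len(coeffs):
        raise ValueError("one coefficient is needed per reward term")
    for r in rewards:
        if not -1.0 <= r <= 0.0:
            raise ValueError("each reward must be normalized to [-1, 0]")
    return sum(c * r for c, r in zip(coeffs, rewards))


# Hypothetical normalized rewards (r1..r4) and coefficients (c1..c4).
r = total_reward(rewards=(-0.5, -0.25, 0.0, -1.0), coeffs=(1.0, 2.0, 1.0, 0.5))
```

Since every term lies in $[-1, 0]$, the total reward is never positive, and a value of zero corresponds to no adjustment cost and no constraint violation.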
Generating the energy flow out-of-limit sample data, storing the pre-generated expert experience data set in the experience replay pool, and training the agent offline are carried out as follows:
Energy flow out-of-limit sample data are generated and used, respectively, for generating the expert experience data set, for training the agent and for testing the agent. The expert experience data set is generated with a safety correction control method based on comprehensive sensitivity and stored in the initial experience replay pool so that it is incorporated into reinforcement learning training. The expert experience data set is collected as follows: multi-energy flow out-of-limit sample states of the electric-gas area comprehensive energy system containing energy hubs are constructed, the safety correction control method based on comprehensive sensitivity is applied to obtain the adjustment amounts of the adjustable variables in the designed action space, and the data for which correction succeeds in a single step are selected as expert experience data. During training of the DDPG agent, multiple rounds are needed to reach convergence, and in each round the agent gives multi-step correction control actions, specifically as follows:
1) Initialize the neural network parameters, put the collected expert experience data set into the initial experience replay pool, set the number of training rounds, the termination round number and the maximum number of iteration steps T in each round, and construct the training environment;
2) The agent obtains the initial state $s_t$ from the environment, and the step counter is set to t = 0;
3) The agent outputs an action $a_t$ based on the current state $s_t$, and the comprehensive sensitivity in the current state is calculated; after the agent applies the action to the environment, the environment transitions to the state $s_{t+1}$ at the next moment and feeds back a reward value $r_t$ to the agent; the experience tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay pool, and t = t + 1;
4) A mini-batch of samples is drawn at random from the experience replay pool and the neural network parameters are updated. After each action is executed, it is checked whether the system satisfies the strong constraints: if it does not, the training of the current round is terminated; if it does, the procedure returns to step 3) and interaction continues until the maximum number of iteration steps T is reached, at which point the round ends. The next round then starts from step 2). After multiple rounds of training, the neural network parameters converge.
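Steps 1) to 4) can be sketched as a training loop around an experience replay pool that is seeded with expert data. This is a structural sketch only: `ToyEnv` and `ToyAgent` are hypothetical stand-ins for the simulator and the DDPG networks, the scalar state and the threshold 5.0 are invented for the example, and no actual network update is performed.

```python
import random
from collections import deque


class ReplayPool:
    """Experience replay pool, pre-filled with expert tuples (step 1)."""

    def __init__(self, capacity, expert_data=()):
        self.buffer = deque(maxlen=capacity)
        self.buffer.extend(expert_data)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


class ToyEnv:
    """Hypothetical stand-in for the simulator: the state is a scalar deviation."""

    def reset(self):
        self.x = 1.0
        return self.x

    def step(self, action):
        self.x += action
        reward = -min(abs(self.x), 1.0)  # kept within [-1, 0]
        violated = abs(self.x) > 5.0     # stand-in for the strong constraints
        return self.x, reward, violated


class ToyAgent:
    """Hypothetical stand-in for the DDPG actor-critic networks."""

    def __init__(self):
        self.updates = 0

    def act(self, state):
        return -0.5 * state              # placeholder policy

    def update(self, batch):
        self.updates += 1                # a real agent would update its networks


def train(env, agent, pool, rounds, max_steps, batch_size):
    for _ in range(rounds):
        state = env.reset()                                   # step 2
        for _ in range(max_steps):
            action = agent.act(state)                         # step 3
            next_state, reward, violated = env.step(action)
            pool.store((state, action, reward, next_state))
            agent.update(pool.sample(batch_size))             # step 4
            if violated:   # strong constraints not met: end this round early
                break
            state = next_state


expert = [(1.0, -1.0, 0.0, 0.0), (0.5, -0.5, 0.0, 0.0)]  # made-up expert tuples
pool, agent = ReplayPool(1000, expert), ToyAgent()
train(ToyEnv(), agent, pool, rounds=3, max_steps=10, batch_size=4)
```

Seeding the pool before the first round means that the earliest mini-batches already contain successful one-step corrections, which is the mechanism the text credits for faster early learning.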
The trained agent makes real-time online decisions to obtain the safety correction control strategy in the emergency state of the system, specifically as follows:
Real-time data of the system in an emergency state with energy flow out of limit are input into the trained agent, which makes a real-time online decision to obtain the optimal safety correction control decision; the system adopts the decision given by the agent and, through adjustment of the corresponding control variables, satisfies the static safety constraints and returns to the normal state.
The invention provides a calculation example of an electric-gas area comprehensive energy system consisting of an IEEE 33-node power distribution network, a 16-node natural gas system and 4 energy hubs. The topology is shown in figure 2 and the energy hub parameters are given in table 1. Static reactive power compensation devices are installed at power nodes 5, 17 and 29, and REG denotes a wind turbine generator. The electricity purchase price is set to 0.8 yuan/kWh, the electricity selling price to 0.4 yuan/kWh, the gas purchase price to 3.95 yuan/m³, and the wind curtailment penalty coefficient to 0.2 yuan/kW. Because power node voltage amplitude and natural gas node pressure out-of-limit occur relatively often in the emergency state of the electric-gas area comprehensive energy system, the example considers safety correction control of the power node voltage amplitude and the natural gas node pressure, whose static safety constraints are [0.95, 1.05] pu and [2, 4] pu, respectively. The adjustment ranges of the power node on-site reactive power compensation, the interaction energy between the energy hub and the power distribution network, and the gas compressor outlet pressure are set to [0, 0.5] pu, [-0.5, 0.2] pu and [2.5, 3.5] pu, respectively. The state space of the DDPG safety correction control model comprises 33 power node voltage amplitudes, 16 natural gas node pressures and the wind power outputs in the 4 energy hubs, giving a state space dimension of 53; the action space comprises 3 power node on-site reactive power compensation actions, 4 adjustment actions for the interaction energy between the energy hubs and the power distribution network, and 2 gas compressor outlet pressure adjustment actions, giving an action space dimension of 9. The DDPG algorithm uses 5-layer fully connected neural networks; the discount factor is 0.995, the learning rates of the policy network and the value network are 0.0001 and 0.001, respectively, the mini-batch size is 128, and the soft update coefficient is 0.01.
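The dimensions and the soft update coefficient quoted above can be checked with a few lines. The soft update rule shown is the standard DDPG target-network update, target = tau * online + (1 - tau) * target; representing the parameters as flat Python lists and the name `soft_update` are simplifications made for this sketch.

```python
STATE_DIM = 33 + 16 + 4   # voltage amplitudes + gas node pressures + wind outputs
ACTION_DIM = 3 + 4 + 2    # reactive compensation + hub exchange + compressor pressure

GAMMA = 0.995             # discount factor
ACTOR_LR = 0.0001         # policy network learning rate
CRITIC_LR = 0.001         # value network learning rate
BATCH_SIZE = 128          # mini-batch size
TAU = 0.01                # soft update coefficient


def soft_update(target_params, online_params, tau=TAU):
    """Move each target-network parameter a fraction tau toward the online one."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]


# One soft update step on two hypothetical parameters.
new_target = soft_update([0.0, 1.0], [1.0, 1.0])
```

With tau = 0.01 the target networks track the online networks slowly, which is the usual way DDPG stabilizes the value targets during training.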
TABLE 1 Energy hub parameters
Figure BDA0004146112910000091
1) First, energy flow out-of-limit sample data are generated for the example electric-gas area comprehensive energy system containing energy hubs. To account for fluctuations in the loads and in the wind turbine output, Monte Carlo simulation is used to vary the power loads and natural gas loads within the range of 75% to 125%, and the wind turbine output is generated from wind speeds measured in a certain area in July 2012. Multi-energy flow calculation is then performed with the unified solution method based on the Newton-Raphson algorithm to obtain state information such as the power node voltage amplitudes and the natural gas node pressures. In total, 7000 groups of out-of-limit sample data are generated, in which power node voltage amplitude out-of-limit only, natural gas node pressure out-of-limit only, and both together account for 45.67%, 44.83% and 9.5%, respectively.
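The sampling and labelling in step 1) can be sketched as follows. The text does not state which distribution the Monte Carlo simulation draws from, so the uniform, per-node independent scaling used here is an assumption, as are the function names; the limits are the static safety constraints quoted for the example. The multi-energy flow calculation that would turn each load scenario into voltages and pressures is not included.

```python
import random

V_LIMITS = (0.95, 1.05)  # pu, power node voltage amplitude limits
PI_LIMITS = (2.0, 4.0)   # pu, natural gas node pressure limits


def sample_load_scenarios(base_loads, n_samples, low=0.75, high=1.25, seed=0):
    """Scale every base load by a factor drawn in [low, high] (uniform: assumed)."""
    rng = random.Random(seed)
    return [
        [load * rng.uniform(low, high) for load in base_loads]
        for _ in range(n_samples)
    ]


def violation_type(voltages, pressures):
    """Label a solved energy flow as 'voltage', 'pressure', 'both' or 'none'."""
    v_out = any(not V_LIMITS[0] <= v <= V_LIMITS[1] for v in voltages)
    p_out = any(not PI_LIMITS[0] <= p <= PI_LIMITS[1] for p in pressures)
    if v_out and p_out:
        return "both"
    if v_out:
        return "voltage"
    if p_out:
        return "pressure"
    return "none"


scenarios = sample_load_scenarios([1.0, 2.0], n_samples=100)
```

Only scenarios labelled `voltage`, `pressure` or `both` would be kept as out-of-limit samples; those labelled `none` correspond to a normal state and need no correction.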
2) The maximum number of iteration steps in each round of agent training is set to 10. Of the out-of-limit sample data, 500 groups are used to generate the expert experience data set placed in the initial experience replay pool, 5000 groups serve as the training set for agent training, and 1500 groups serve as the test set for testing the safety correction control effect after agent training. The reward curve obtained during DDPG agent training is shown in figure 3. As can be seen from fig. 3, because expert experience is added to the experience replay pool, the learning efficiency of the agent is improved and the cumulative reward rises quickly in the early stage; training converges after about 3500 rounds, and the agent finally learns a stable safety correction control strategy, showing good convergence and high efficiency.
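The 500 / 5000 / 1500 partition of the 7000 samples can be expressed as a small helper. Whether the samples are shuffled before splitting is not stated in the text, so the seeded shuffle below is an assumption, and `split_samples` is a hypothetical name.

```python
import random


def split_samples(samples, n_expert=500, n_train=5000, n_test=1500, seed=0):
    """Partition samples into expert-generation, training and test subsets."""
    if len(samples) != n_expert + n_train + n_test:
        raise ValueError("subset sizes must add up to the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    expert = shuffled[:n_expert]
    train = shuffled[n_expert:n_expert + n_train]
    test = shuffled[n_expert + n_train:]
    return expert, train, test


# Sample indices stand in for the 7000 out-of-limit samples.
expert_set, train_set, test_set = split_samples(range(7000))
```

Keeping the three subsets disjoint ensures that the test set measures how the trained agent handles out-of-limit states it has not seen during training.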
3) Three groups of out-of-limit sample data are selected from the test set: Case1 is voltage amplitude out-of-limit at power nodes 28-33; Case2 is pressure out-of-limit at natural gas nodes 12-14; Case3 is voltage amplitude out-of-limit at power nodes 11-16 and 29-33 together with pressure out-of-limit at natural gas node 14. The decisions $[\Delta Q_{ci}, \Delta P_{ehj}, \Delta \pi_{ok}]$ of the agent in the 3 different out-of-limit scenarios are shown in table 2, and the safety correction control results are shown in table 3.
(1) As shown in table 3, when the safety correction control decisions given by the agent are adopted, the system satisfies the static safety constraints and returns to the normal state through adjustment of the control variables. This indicates that the control strategy generated by the agent can effectively solve the safety correction control problem of the electric-gas area comprehensive energy system containing energy hubs, and that the DDPG model generalizes well. (2) In table 2, the decisions for the adjustment amounts of the power node on-site reactive power compensation and the gas compressor outlet pressure under Case1 to Case3 preferentially adjust the control variables that have high comprehensive sensitivity values and smaller corresponding adjustment amounts, whereas the decisions for the interaction energy adjustment between the energy hubs and the power distribution network do not fully follow this knowledge, because the safety correction control target also takes the economic benefit of the energy hubs and their renewable energy consumption into account. (3) Table 3 shows that under Case1 to Case3 the control strategy generated by the agent achieves a high renewable energy consumption rate, and that the larger the interaction energy adjustment between the energy hub and the power distribution network, the lower the operating cost of the energy hub, which agrees with theoretical analysis. The above analysis shows that the safety correction control decisions obtained by the method eliminate the energy flow out-of-limit while safeguarding the benefit of the energy hub operator and the renewable energy consumption rate, which verifies the effectiveness of the method of the embodiment of the invention.
4) The decision speed of the agent is tested on the test set of 1500 groups of out-of-limit sample data. The average decision time is 0.0139 s, which meets the time requirement of safety correction control. The DDPG model provided by the invention can therefore be applied directly to online decision-making once offline training is complete, which verifies the speed of the method of the embodiment of the invention.
TABLE 2 agent decision-making in different out-of-limit scenarios
Figure BDA0004146112910000101
TABLE 3 agent safety correction control results
Figure BDA0004146112910000102

Claims (3)

1. The electric-gas area comprehensive energy system safety correction control method based on the DDPG algorithm is characterized by comprising the following steps of:
1) Constructing an electric-gas area comprehensive energy system safety correction control model based on the DDPG algorithm, and setting the environment with which the agent interacts, the state space, the action space and the reward function;
2) Generating energy flow out-of-limit sample data, storing a pre-generated expert experience data set in an experience replay pool, and training the agent offline;
3) Making real-time online decisions with the trained agent to obtain a safety correction control strategy in the emergency state of the system.
2. The method according to claim 1, wherein constructing the electric-gas area comprehensive energy system safety correction control model based on the DDPG algorithm specifically comprises:
constructing the electric-gas area comprehensive energy system safety correction control model based on the DDPG algorithm, and setting the environment with which the agent interacts as a simulator of the electric-gas area comprehensive energy system containing energy hubs, wherein the simulator performs multi-energy flow calculation and comprehensive sensitivity calculation, the multi-energy flow calculation uses a unified solution method based on the Newton-Raphson algorithm, and the unified multi-energy flow model is as follows:
Figure FDA0004146112900000011
in formula (1): $x$ and $u$ are the state variables and control variables, with $x = [x_e, x_g] = [V_i, \delta_i, \pi_j]$ and $u = [Q_c, P_{eh}, \pi_o]$; $P$, $Q$ and $G$ denote the active power equations and reactive power equations of the power distribution network and the energy flow equations of the gas distribution network, respectively; $x_e$ and $x_g$ are the state variables of the power distribution network and the gas distribution network; $V_i$, $\delta_i$ and $\pi_j$ are the power node voltage amplitude, the power node voltage phase angle and the natural gas node pressure, respectively; $Q_c$, $P_{eh}$ and $\pi_o$ are the adjustment amounts of the power node on-site reactive power compensation, the interaction energy between the energy hub and the power distribution network, and the gas compressor outlet pressure, respectively; $P_L$ and $Q_L$ are the active and reactive loads; $P_i$ and $Q_i$ are the active and reactive power injected at the power nodes; $A$ is the node-pipeline incidence matrix; $f$ is the pipeline flow vector; $P_{REG}$ is the wind and solar output that is actually consumed; $P_{eh}$ and $F_g$ are the interaction energies between the energy hub and the power distribution network and the gas distribution network, respectively, where $P_{eh} > 0$ means that the power distribution network supplies electric energy to the energy hub and $P_{eh} < 0$ means that the energy hub supplies electric energy to the power distribution network; $L_e$ and $L_h$ are the electric and thermal loads of the energy hub; $\eta_{CHP,e}$ and $\eta_{CHP,h}$ are the efficiencies with which cogeneration produces electric energy and heat energy, respectively; $\eta_T$ and $\eta_{GB}$ are the efficiencies of the power transformer and the gas turbine, respectively; and $\nu$ is the natural gas distribution coefficient;
the comprehensive sensitivity is the weighted sum of the sensitivities of the out-of-limit state variables and the within-limit state variables to the control variables, as follows:
Figure FDA0004146112900000021
in formula (2): $S$ is the comprehensive sensitivity vector; the terms
Figure FDA0004146112900000022
are the sensitivities of the state variables to the control variables $P_{eh}$, $Q_c$ and $\pi_o$, respectively; the terms
Figure FDA0004146112900000023
and
Figure FDA0004146112900000024
are derived from the inverse of the Jacobian matrix of the unified multi-energy flow model; the terms
Figure FDA0004146112900000025
and
Figure FDA0004146112900000026
are obtained from the unified multi-energy flow model; $k_{ei} = \{k_{e1}, k_{e2}\}$, where $k_{e1}$ is the weight of out-of-limit power nodes and $k_{e2}$ is the weight of normal power nodes; likewise, $k_{gj} = \{k_{g1}, k_{g2}\}$;
Considering the richness and necessity of variables, the state space of the intelligent agent is set as follows:
$s_t = [V_{i,t}, \pi_{i,t}, P_{l,t}, F_{ij,t}, Q_{ci,t}, P_{ehj,t}, \pi_{ok,t}, P'_{REGj,t}]$ (3)
in formula (3): $s_t$ is the state at time $t$; $V_{i,t}$, $\pi_{i,t}$, $P_{l,t}$, $F_{ij,t}$, $Q_{ci,t}$, $P_{ehj,t}$, $\pi_{ok,t}$ and $P'_{REGj,t}$ are, at time $t$, the power node voltage amplitude, the natural gas node pressure, the line power, the pipeline flow, the power node on-site reactive power compensation amount, the interaction energy between the energy hub and the power distribution network, the gas compressor outlet pressure, and the wind and solar output, respectively;
the action is a control variable of the system, and the action space of the intelligent agent is set as follows:
$a_t = [\Delta Q_{ci,t}, \Delta P_{ehj,t}, \Delta \pi_{ok,t}]$ (4)
in formula (4): $a_t$ is the action at time $t$; $\Delta Q_{ci,t}$, $\Delta P_{ehj,t}$ and $\Delta \pi_{ok,t}$ are, at time $t$, the adjustment amounts of the on-site reactive power compensation at the $i$-th power node, the interaction energy between the $j$-th energy hub and the power distribution network, and the outlet pressure of the $k$-th gas compressor, respectively;
the target of safety correction control is set as follows: with satisfaction of the static safety constraints as the goal, the total adjustment amount of the control variables is minimized while the operating cost of the energy hubs and their curtailed wind and solar power are also minimized; the reward function is set accordingly and consists of target rewards and constraint rewards; the target rewards comprise the reward for the total adjustment amount and the reward for the energy hub operating cost and curtailed wind and solar power; the safety correction control knowledge that control variables with high comprehensive sensitivity values should be adjusted first is incorporated by using the comprehensive sensitivity value to set the weight coefficient of the total adjustment amount reward; because the comprehensive sensitivity value and the adjustment amount are inversely related, an exponential function is used to express this inverse relationship; the reward $r_1$ for the total adjustment amount is set as:
Figure FDA0004146112900000027
the reward $r_2$ for the energy hub operating cost and the curtailed wind and solar power is set as:
Figure FDA0004146112900000028
in formula (6): $P_{ehj,t}$ and $F_{gj,t}$ are the interaction energies between the $j$-th energy hub and the power distribution network and the gas distribution network at time $t$, respectively; $p_{eh}$ is the electricity purchase price when $P_{ehj,t} > 0$ and the electricity selling price when $P_{ehj,t} < 0$; $p_g$ is the gas purchase price; $\alpha$ is the operation and equipment maintenance cost coefficient of the energy hub; $(P'_{REGj} - P_{REGj})$ is the curtailed wind and solar power; $c_{REG}$ is the wind and solar curtailment penalty coefficient; and $\Delta t$ is the duration of each period;
the reinforcement learning process is terminated when the energy flow of the system does not converge, or when it converges but the output of the power balancing unit and the flow of the gas balancing unit exceed a certain value; the reward $r_3$ for the balancing unit constraint is set as:
Figure FDA0004146112900000031
in formula (7): $\Delta P_{s,t}$ and $\Delta F_{s,t}$ are the differences between the power balancing unit output and the gas balancing unit flow, respectively, and their upper limit ranges during reinforcement learning; $P_{s,t}$, $F_{s,t}$, $P_s^{max}$ and $F_s^{max}$ are the power balancing unit output, the gas balancing unit flow and their upper limits, respectively; $\alpha_1$ is the margin coefficient and $\alpha_2$ is the upper limit coefficient, taken as 0.9 and 1.1, respectively;
so that the agent can better perceive the degree to which the energy flow is out of limit, an out-of-limit severity measure based on utility theory is adopted and combined with discrete rewards, and the static safety constraint reward $r_4$ is set as:
Figure FDA0004146112900000032
in formula (8): $dV_{i,t}$, $d\pi_j$, $dP_{l,t}$ and $df_{il,t}$ are the out-of-limit amounts of the power node voltage amplitude, the natural gas node pressure, the line power and the pipeline flow, respectively;
the rewards $r_1$, $r_2$, $r_3$ and $r_4$ are normalized so that each takes values in $[-1, 0]$, and the total reward obtained by the agent is the weighted sum of $r_1$ to $r_4$:
$r = c_1 r_1 + c_2 r_2 + c_3 r_3 + c_4 r_4$ (9)
in formula (9): $c_1$, $c_2$, $c_3$ and $c_4$ are the coefficients of the respective rewards.
3. The method according to claim 1, wherein generating the energy flow out-of-limit sample data, storing the pre-generated expert experience data set in the experience replay pool, and training the agent offline specifically comprises:
generating energy flow out-of-limit sample data that are used, respectively, for generating the expert experience data set, for training the agent and for testing the agent; generating the expert experience data set with a safety correction control method based on comprehensive sensitivity and storing it in the initial experience replay pool so that it is incorporated into reinforcement learning training, wherein the expert experience data set is collected as follows: multi-energy flow out-of-limit sample states of the electric-gas area comprehensive energy system containing energy hubs are constructed, the safety correction control method based on comprehensive sensitivity is applied to obtain the adjustment amounts of the adjustable variables in the designed action space, and the data for which correction succeeds in a single step are selected as expert experience data; during training of the DDPG agent, multiple rounds are needed to reach convergence, and in each round the agent gives multi-step correction control actions, specifically as follows:
1) Initializing the neural network parameters, putting the collected expert experience data set into the initial experience replay pool, setting the number of training rounds, the termination round number and the maximum number of iteration steps T in each round, and constructing the training environment;
2) The agent obtains the initial state $s_t$ from the environment, and the step counter is set to t = 0;
3) The agent outputs an action $a_t$ based on the current state $s_t$, and the comprehensive sensitivity in the current state is calculated; after the agent applies the action to the environment, the environment transitions to the state $s_{t+1}$ at the next moment and feeds back a reward value $r_t$ to the agent; the experience tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay pool, and t = t + 1;
4) A mini-batch of samples is drawn at random from the experience replay pool and the neural network parameters are updated; after each action is executed, it is checked whether the system satisfies the strong constraints; if it does not, the training of the current round is terminated; if it does, the procedure returns to step 3) and interaction continues until the maximum number of iteration steps T is reached, at which point the round ends; the next round then starts from step 2); after multiple rounds of training, the neural network parameters converge.
CN202310304127.1A 2023-03-27 2023-03-27 Electric-gas area comprehensive energy system safety correction control method based on DDPG algorithm Pending CN116306372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310304127.1A CN116306372A (en) 2023-03-27 2023-03-27 Electric-gas area comprehensive energy system safety correction control method based on DDPG algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310304127.1A CN116306372A (en) 2023-03-27 2023-03-27 Electric-gas area comprehensive energy system safety correction control method based on DDPG algorithm

Publications (1)

Publication Number Publication Date
CN116306372A true CN116306372A (en) 2023-06-23

Family

ID=86827055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310304127.1A Pending CN116306372A (en) 2023-03-27 2023-03-27 Electric-gas area comprehensive energy system safety correction control method based on DDPG algorithm

Country Status (1)

Country Link
CN (1) CN116306372A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117997152A (en) * 2024-04-03 2024-05-07 深圳市德兰明海新能源股份有限公司 Bottom layer control method of modularized multi-level converter based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination