CN114880929A - Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system - Google Patents

Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system Download PDF

Info

Publication number
CN114880929A
CN114880929A CN202210510697.1A CN202210510697A CN114880929A CN 114880929 A CN114880929 A CN 114880929A CN 202210510697 A CN202210510697 A CN 202210510697A CN 114880929 A CN114880929 A CN 114880929A
Authority
CN
China
Prior art keywords
energy
network
reinforcement learning
deep reinforcement
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210510697.1A
Other languages
Chinese (zh)
Inventor
陈盛
王新迎
田捷
闫冬
武国良
祖光鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Heilongjiang Electric Power Co Ltd Electric Power Research Institute
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
State Grid Heilongjiang Electric Power Co Ltd Electric Power Research Institute
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Heilongjiang Electric Power Co Ltd Electric Power Research Institute, State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI filed Critical State Grid Heilongjiang Electric Power Co Ltd Electric Power Research Institute
Priority to CN202210510697.1A priority Critical patent/CN114880929A/en
Publication of CN114880929A publication Critical patent/CN114880929A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/04Power grid distribution networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention belongs to the technical field of energy Internet simulation and discloses a deep reinforcement learning-based multi-energy flow optimization intelligent simulation method, which comprises the following steps: loading an energy Internet model; setting simulation parameters according to the energy Internet model; inputting the simulation parameters into a pre-trained deep reinforcement learning module to obtain the action of each device in the energy Internet model; and outputting the action and carrying out graphical display. In the training process of the deep reinforcement learning module, the deep reinforcement learning module and the graphical modeling module are jointly called: the deep reinforcement learning module calculates the action a_t at time t while observing the environment state of the energy Internet model, the action is passed to the graphical modeling module for load flow calculation, and the environment state s_{t+1} at time t+1 is generated, so that the combined operation of deep reinforcement learning and load flow calculation is realized. The invention adopts a deep deterministic policy gradient algorithm for energy Internet optimization operation research and can generate the optimization strategy online in real time.

Description

Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system
Technical Field
The invention belongs to the technical field of energy Internet simulation, and particularly relates to a multi-energy flow optimization intelligent simulation method and system based on deep reinforcement learning.
Background
At present, energy Internet modeling and simulation is mainly based on physical mechanism models: key equipment and network models such as generators, cogeneration units, combined cooling, heating and power units, P2G, energy routers, power grids and heat supply networks are constructed using mathematical formulas and physical mechanisms. Data-driven modeling methods can also be used, in which models of power generation, load, energy coupling, energy storage and the like are built from massive historical data using technologies such as deep learning and cluster analysis. The aim of modeling and simulation is to reproduce real field equipment and environments digitally as faithfully as possible, and on that basis to carry out planning and design, monitoring and analysis, and operation optimization.
Research on optimized operation currently relies mainly on mathematical model prediction methods and heuristic algorithms. However, mathematical model prediction methods such as mixed-integer linear programming depend on prediction accuracy and involve complex solving processes, while heuristic algorithms such as genetic algorithms and particle swarm optimization have high computational cost and limited real-time performance.
Tsinghua University's CloudPSS: the Cloud computing-based Power System Simulator (CloudPSS) is a modeling and simulation platform oriented to the energy Internet. It adopts a fully self-developed electromagnetic transient simulation kernel, exploits heterogeneous parallel computing resources in the cloud, and provides users with modeling and simulation analysis functions for various energy networks such as AC/DC hybrid power grids, renewable energy generation, microgrids, distribution networks and heat supply networks. CloudPSS is built as an open cloud service platform. As shown in FIG. 1, its cloud service framework comprises a highly decoupled presentation layer, application layer and computation layer, separating examples and results, models and algorithms, and computing resources in the modeling and simulation process. Data security isolation between the presentation layer and the application layer guarantees the privacy and security of user data, and virtualization between the application layer and the computation layer further guarantees the safety of models and algorithms and the independence between applications, forming a highly secure, flexible and extensible cloud service platform.
However, the prior art still has the following technical problems: 1. graphical drag-and-drop modeling is realized, but joint debugging with python programs is not, and a deep reinforcement learning model cannot be directly called in CloudPSS; 2. the load flow calculation function of CloudPSS cannot be invoked from a python program.
Disclosure of Invention
The invention aims to provide a deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system, so as to solve the technical problem that existing simulation software and deep reinforcement learning algorithms are difficult to run jointly; the method enables a deep reinforcement learning model to be called directly from the graphical modeling simulation interface, so that intelligent simulation research on operation optimization based on deep reinforcement learning can be carried out on top of the simulation tool, greatly improving research efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the invention provides a deep reinforcement learning-based intelligent simulation method for optimizing a multi-energy flow, which comprises the following steps:
setting simulation parameters according to the energy Internet model;
inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain control strategies of each device in the energy Internet model and state information of the energy Internet model;
and outputting the control strategy and the energy Internet model state information.
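As a non-limiting illustration of these three steps, the following Python sketch applies a pre-trained actor network to one set of simulation parameters to obtain the device control strategy; the file name and interfaces are assumptions used only for illustration.

```python
# Illustrative sketch only: run a pre-trained actor network on one set of
# simulation parameters; "actor.pt" and the tensor layout are assumptions.
import torch

def run_intelligent_simulation(sim_state, actor_path="actor.pt"):
    """sim_state: list of n perceived environment parameters
    (electric, heat and gas power-flow quantities)."""
    actor = torch.load(actor_path)                 # pre-trained actor network
    actor.eval()
    with torch.no_grad():
        s = torch.tensor(sim_state, dtype=torch.float32).unsqueeze(0)
        a = actor(s).squeeze(0)                    # m-dimensional device action vector
    return a.tolist()                              # control strategy for each device
```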
A further improvement of the invention is that, in the step of loading the energy Internet model, the energy Internet model is a pre-established electric/gas/thermal energy system model.
A further improvement of the invention is that, in the step of setting simulation parameters according to the energy Internet model, the simulation parameters include: sensing environmental status, actions, and rewards;
the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electrical network include active power, reactive power, voltage and power factor; the thermal network parameters include temperature and flow; gas network parameters include pressure and flow;
the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
the rewards include power generation cost, operating cost, and power out-of-limit penalty.
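For illustration only, the simulation parameters described above could be organized along the following lines; the Python field names are assumptions and do not reflect the patent's actual parameter format.

```python
# Illustrative sketch: perceived environment state, action space and reward terms
# organized as a Python dictionary; all key names are assumptions.
simulation_parameters = {
    "state": {                      # multi-energy power-flow quantities
        "electric": ["active_power", "reactive_power", "voltage", "power_factor"],
        "heat":     ["temperature", "flow"],
        "gas":      ["pressure", "flow"],
    },
    "action": [                     # adjustable device set-points
        "load_adjustment_and_curtailment",
        "generator_output_adjustment",
        "electricity_storage", "heat_storage", "gas_storage",
    ],
    "reward": [                     # cost and penalty terms summed into R
        "generation_cost", "operation_cost", "power_limit_penalty",
    ],
}
```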
A further improvement of the invention is that, in the step of inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain the control strategy of each device in the energy Internet model and the state information of the energy Internet model, the training of the pre-trained deep reinforcement learning model comprises the following steps:
setting simulation parameters for training based on the energy Internet model; the simulation parameters for training include: sensing environmental status, actions, and rewards; the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electric network comprise active power, reactive power, voltage and power factor; the heat network parameters comprise temperature and flow; the gas network parameters include pressure and flow; the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment; the reward comprises power generation cost, operation cost and power out-of-limit penalty;
and training the intelligent agent based on the simulation parameters for training to obtain a pre-trained deep reinforcement learning model.
A further improvement of the invention is that, in the step of training the agent based on the simulation parameters for training to obtain the pre-trained deep reinforcement learning model, the step of training the agent includes:
Step 1: setting up the training python script, and determining the main() function and the number of iterations;
Step 2: constructing an actor network and a critic network; the state space s is defined as n parameters according to the sensing environment state, so the number of neurons in the input layer is n; the action space contains m actions, so the number of neurons in the output layer is m, the actions in the action space being load reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
Step 3: starting a training loop, calculating the power flow parameters of the electric/gas/thermal energy system according to the initial parameters of each device in the energy Internet model, and generating the environment state s_t at time t;
Step 4: the actor network generates the action a_t at time t based on the environment state s_t;
Step 5: according to the action a_t at time t, observing the environment state of the energy Internet model, passing the action to the graphical modeling module for load flow calculation, and generating the environment state s_{t+1} at time t+1;
Step 6: according to the action a_t at time t and the environment state s_{t+1} at time t+1, calculating the reward value R of the action a_t at time t;
Step 7: the quadruple {s_t, a_t, s_{t+1}, R}, formed by the environment state s_t and action a_t at time t, the environment state s_{t+1} at time t+1 and the reward value R of the action a_t at time t, is delivered as one sample to the experience playback unit;
the experience playback unit is set up with an update mechanism: new samples are continuously generated in steps 2 to 6 and delivered to the experience playback unit, and old samples exceeding the preset storage capacity of the experience playback unit are automatically deleted; steps 2 to 6 are performed in a loop while the experience playback unit is not full, and step 7 is performed once the stored samples fill the experience playback unit;
Step 8: sampling the samples stored in the experience playback unit with a fixed batch size, calculating the gradient and then updating the parameters of the actor network and the critic network (a code sketch of the experience playback unit and fixed-batch sampling is given after step 9);
the actor network and the critic network perform network learning and parameter updating according to the following loss functions:
y = r + γ·Q′(s_{t+1}, μ′(s_{t+1}) | θ′)

L(θ) = E[(y − Q(s_t, a_t | θ))²]

wherein y is the target actor network Q value; Q′ is the Q value of the target critic network; r is the reward function; s is the state; a is the action vector passed from the target actor network to the target critic network; γ is the discount factor; L(θ) is the squared loss between the target actor network Q value and the target critic network Q value; θ is the parameter set of the target actor network; E denotes the average value;

∇_θ J = E_{s∼D}[∇_a Q^μ(s, a)|_{a=μ(s)} · ∇_θ μ(s)]

wherein J is the objective function of the target actor network; θ is the parameter set of the target actor network; s is the state; D is the state space corpus; μ denotes the deterministic action output by the target actor network; Q^μ(s, a) is the Q value under the deterministic action μ; a is the action passed from the target actor network to the target critic network; ∇ denotes the gradient;
the reward function R is based on economy and power balance constraints, including power generation cost, operation cost and power out-of-limit penalty:
R = R_1 + R_2 + R_3 + R_4
where R_1 represents the power grid operating cost; R_2 represents the heat supply network operating cost; R_3 represents the gas network operating cost; R_4 represents the energy balance constraint out-of-limit penalty;
Step 9: outputting the training result of the current round and judging whether the loop has reached the preset end condition; if so, saving the actor network and the critic network, otherwise repeating steps 3-8.
A further improvement of the invention is that the method further comprises: displaying the output control strategy and the energy Internet model state information using one or more of a line graph, a curve graph and a table.
In a second aspect, the present invention provides a deep reinforcement learning-based intelligent simulation system for optimizing multi-energy flow, including:
the intelligent simulation setting module is used for setting simulation parameters according to the energy Internet model;
the deep reinforcement learning module is used for inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain control strategies of each device in the energy Internet model and state information of the energy Internet model;
and the result display module is used for outputting the control strategy and the energy Internet model state information.
A further improvement of the invention is that, in the graphical modeling module, the energy Internet model is a pre-established electric/gas/thermal energy system model.
A further improvement of the invention is that the simulation parameters set by the intelligent simulation setting module comprise: sensing environmental status, actions, and rewards;
the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electrical network include active power, reactive power, voltage and power factor; the thermal network parameters include temperature and flow; gas network parameters include pressure and flow;
the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
the rewards include power generation cost, operating cost, and power out-of-limit penalty.
A further improvement of the invention is that the training of the pre-trained deep reinforcement learning model comprises the following steps:
setting simulation parameters for training based on the energy Internet model; the simulation parameters for training include: sensing environmental status, actions, and rewards; the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electric network comprise active power, reactive power, voltage and power factor; the heat network parameters comprise temperature and flow; the gas network parameters include pressure and flow; the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment; the reward comprises power generation cost, operation cost and power out-of-limit penalty;
and training the intelligent agent based on the simulation parameters for training to obtain a pre-trained deep reinforcement learning model.
A further improvement of the invention is that the step of training the agent comprises:
Step 1: setting up the training python script, and determining the main() function and the number of iterations;
Step 2: constructing an actor network and a critic network; the state space s is defined as n parameters according to the sensing environment state, so the number of neurons in the input layer is n; the action space contains m actions, so the number of neurons in the output layer is m, the actions in the action space being load reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
Step 3: starting a training loop, calculating the power flow parameters of the electric/gas/thermal energy system according to the initial parameters of each device in the energy Internet model, and generating the environment state s_t at time t;
Step 4: the actor network generates the action a_t at time t based on the environment state s_t;
Step 5: according to the action a_t at time t, observing the environment state of the energy Internet model, passing the action to the graphical modeling module for load flow calculation, and generating the environment state s_{t+1} at time t+1;
Step 6: according to the action a_t at time t and the environment state s_{t+1} at time t+1, calculating the reward value R of the action a_t at time t;
Step 7: the quadruple {s_t, a_t, s_{t+1}, R}, formed by the environment state s_t and action a_t at time t, the environment state s_{t+1} at time t+1 and the reward value R of the action a_t at time t, is delivered as one sample to the experience playback unit;
the experience playback unit is set up with an update mechanism: new samples are continuously generated in steps 2 to 6 and delivered to the experience playback unit, and old samples exceeding the preset storage capacity of the experience playback unit are automatically deleted; steps 2 to 6 are performed in a loop while the experience playback unit is not full, and step 7 is performed once the stored samples fill the experience playback unit;
Step 8: sampling the samples stored in the experience playback unit with a fixed batch size, calculating the gradient and then updating the parameters of the actor network and the critic network;
the actor network and the critic network perform network learning and parameter updating according to the following loss functions:
y = r + γ·Q′(s_{t+1}, μ′(s_{t+1}) | θ′)

L(θ) = E[(y − Q(s_t, a_t | θ))²]

wherein y is the target actor network Q value; Q′ is the Q value of the target critic network; r is the reward function; s is the state; a is the action vector passed from the target actor network to the target critic network; γ is the discount factor; L(θ) is the squared loss between the target actor network Q value and the target critic network Q value; θ is the parameter set of the target actor network; E denotes the average value;

∇_θ J = E_{s∼D}[∇_a Q^μ(s, a)|_{a=μ(s)} · ∇_θ μ(s)]

wherein J is the objective function of the target actor network; θ is the parameter set of the target actor network; s is the state; D is the state space corpus; μ denotes the deterministic action output by the target actor network; Q^μ(s, a) is the Q value under the deterministic action μ; a is the action passed from the target actor network to the target critic network; ∇ denotes the gradient;
the reward function R is based on economy and power balance constraints, including power generation cost, operation cost and power out-of-limit penalty:
R = R_1 + R_2 + R_3 + R_4
where R_1 represents the power grid operating cost; R_2 represents the heat supply network operating cost; R_3 represents the gas network operating cost; R_4 represents the energy balance constraint out-of-limit penalty;
Step 9: outputting the training result of the current round and judging whether the loop has reached the preset end condition; if so, saving the actor network and the critic network, otherwise repeating steps 3-8.
A further improvement of the invention is that the result display module is also used for displaying the output control strategy and the energy Internet model state information using one or more of a line graph, a curve graph and a table.
In a third aspect, the present invention provides an electronic device, which includes a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the deep reinforcement learning-based intelligent simulation method for optimizing a multi-energy flow.
In a fourth aspect, the present invention provides a computer-readable storage medium storing at least one instruction, which when executed by a processor, implements the deep reinforcement learning-based intelligent simulation method for multi-energy flow optimization.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system, which comprises the steps of setting simulation parameters according to an energy Internet model; inputting the simulation parameters into a pre-trained deep reinforcement learning module to obtain the action of each device in the energy Internet model; and outputting the action and carrying out graphical display. The existing simulation method basically only realizes the construction of an energy Internet model, only provides functions such as load flow calculation and the like, and does not provide interfaces of intelligent algorithms such as deep reinforcement learning and the like; the method is characterized in that a pre-trained deep reinforcement learning model is obtained by combined training of graphical modeling and deep reinforcement learning; on the basis of energy Internet modeling simulation, a deep reinforcement learning model is fused and applied; the combined operation of deep reinforcement learning and load flow calculation can be realized.
In the training process of the deep reinforcement learning model, deep reinforcement learning and graphical modeling are jointly called: the deep reinforcement learning model calculates the action a_t at time t while observing the environment state of the energy Internet model, the action is passed to the graphical modeling module for load flow calculation, and the environment state s_{t+1} at time t+1 is generated, so that the combined operation of deep reinforcement learning and load flow calculation is realized.
The invention adopts a deep deterministic policy gradient algorithm for energy Internet optimization operation research and can generate the optimization strategy online in real time. The method enables a deep reinforcement learning model to be called directly from the graphical modeling simulation interface, so that intelligent simulation research on operation optimization based on deep reinforcement learning can be carried out on top of the simulation tool, improving research efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention without limiting it. In the drawings:
FIG. 1 is a schematic diagram of the CloudPSS open cloud service application integration framework;
FIG. 2 is a schematic diagram of an intelligent simulation method for optimizing a multi-energy flow based on deep reinforcement learning according to the present invention;
FIG. 3 is a diagram of a deep reinforcement learning module;
FIG. 4 is a diagram of an intelligent simulation interaction;
FIG. 5 is a structural block diagram of a deep reinforcement learning-based intelligent simulation system for multi-energy flow optimization according to the present invention;
FIG. 6 is a schematic flow chart of a multi-energy flow optimization intelligent simulation method based on deep reinforcement learning according to the present invention;
FIG. 7 is a structural block diagram of an electronic device of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the embodiments and the attached drawings. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
Example 1
Referring to fig. 2-5, the present invention provides a deep reinforcement learning-based multi-energy flow optimization intelligent simulation system, which mainly includes four modules: a graphical modeling module, a deep reinforcement learning module, an intelligent simulation setting module and a result display module. Specifically:
(1) Graphical modeling module: used to realize graphical modeling of the electric/gas/thermal energy system model, drag-and-drop modeling of data parameters and network topology, and writing of parameter files in dictionary (Dict) format for the python program (an illustrative parameter-file sketch is given after the list below). The test case of the graphical modeling module in this embodiment comprises a 33-node distribution network model, together with a 34-node heat network model, a 35-node gas network model and energy conversion equipment constructed on the basis of the Bali island public data;
1) 33-node distribution network model: comprises 2 generator nodes and 32 load nodes;
2) 34-node heat supply network model: comprises 1 electric boiler node, 3 circulating pump nodes, 1 heat storage device node, 1 cogeneration unit node and 30 thermal load nodes;
3) 35-node gas network model: comprises 1 gas source station node, 1 electrically driven compressor node, 1 gas boiler node, 1 gas storage tank node and 34 gas load nodes;
4) Energy conversion equipment: 5 energy conversion devices, namely an electrically driven compressor, a gas boiler, an electric boiler, a cogeneration unit and a circulating pump.
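For illustration, a dictionary (Dict) format parameter file such as the graphical modeling module might write for this example case could look as follows; the key names are assumptions and the structure is only a sketch, not the actual CloudPSS file format.

```python
# Illustrative sketch of a Dict-format model description; node counts follow the
# embodiment above, all key names are assumptions.
energy_internet_model = {
    "electric_network": {"nodes": 33, "generators": 2, "loads": 32},
    "heat_network": {"nodes": 34, "electric_boilers": 1, "circulating_pumps": 3,
                     "heat_storage": 1, "chp_units": 1, "heat_loads": 30},
    "gas_network": {"nodes": 35, "gas_source_stations": 1, "electric_compressors": 1,
                    "gas_boilers": 1, "gas_storage_tanks": 1, "gas_loads": 34},
    "energy_converters": ["electric_compressor", "gas_boiler", "electric_boiler",
                          "chp_unit", "circulating_pump"],
}
```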
(2) Deep reinforcement learning module
The deep reinforcement learning algorithm adopts the deep deterministic policy gradient (DDPG) algorithm, which is one of the reinforcement learning algorithms. In reinforcement learning, an Agent continuously interacts with the surrounding Environment: it perceives the Environment State, takes a corresponding Action according to a certain policy, and the Environment, after receiving the Action, feeds back an Action Reward to the Agent and enters the next State; this process is repeated in a loop until an optimal policy is learned that maximizes the accumulated Reward.
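The interaction loop described above can be summarized by the following minimal Python sketch, in which env stands for the multi-energy power-flow environment and agent for the DDPG agent; both are assumed interfaces used only to illustrate the perceive-act-reward cycle.

```python
# Illustrative sketch of the Agent-Environment interaction loop; `env` and `agent`
# are assumed objects, not part of the claimed system.
def interaction_loop(env, agent, episodes=100, steps_per_episode=24):
    for _ in range(episodes):
        state = env.reset()                        # initial environment state
        for _ in range(steps_per_episode):
            action = agent.act(state)              # action chosen by the current policy
            next_state, reward = env.step(action)  # environment feeds back reward and next state
            agent.observe(state, action, next_state, reward)
            state = next_state
```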
The parameter design of the deep reinforcement learning module is as follows:
1) Perceived environment State (State): multi-energy power flow data covering the three energy forms of electricity, heat and gas; parameters of the electric network comprise active power, reactive power, voltage and power factor; the heat network parameters comprise temperature and flow; the gas network parameters include pressure and flow.
2) Action (Action): comprises load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
3) reward (Reward): including power generation cost, operating cost and power out-of-limit penalty;
4) agent training: the overall training framework is shown in fig. 6, and specifically comprises the following steps:
Step 1: setting up the training python script, and determining the main() function and the number of iterations;
Step 2: constructing an actor network and a critic network; the state space s is defined as n parameters according to the sensing environment state, so the number of neurons in the input layer is n; the actions in the action space are load reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment, and if the number of actions is m, the number of neurons in the output layer is m; the number of hidden layers and the neuron parameters are selected according to the problem scale (a code sketch of the network construction and parameter update is given after step 9).
Step 3: starting the training loop, calculating the power flow parameters of the electric/gas/thermal energy system according to the parameters of each device in the energy Internet model (the first round uses the initial parameters), and generating the environment state s_t at time t;
Step 4: the actor network generates the action a_t at time t based on the environment state s_t;
Step 5: according to the action a_t at time t, observing the environment state of the energy system; the load flow calculation generates the environment state s_{t+1} at time t+1;
Step 6: according to the action a_t at time t and the environment state s_{t+1} at time t+1, calculating the reward value (Reward) of the action a_t at time t;
Step 7: the quadruple {s_t, a_t, s_{t+1}, R}, formed by the environment state and action at time t, the environment state at time t+1 and the reward value of the action a_t at time t, is delivered as one sample to the experience playback unit; the experience playback unit is set up with an update mechanism: new samples are continuously generated in steps 2-7 and delivered to the experience playback unit, and old samples exceeding its capacity are automatically deleted; steps 2 to 7 are performed in a loop while the experience playback unit is not full, and step 8 is performed once it is full.
Step 8: sampling with a fixed batch size, calculating the gradient, and then updating the deep neural network parameters of the actor network and the critic network constructed in step 2;
the critic network performs network learning according to the following loss function:
y = r + γ·Q′(s_{t+1}, μ′(s_{t+1}) | θ′)

L(θ) = E[(y − Q(s_t, a_t | θ))²]

wherein y is the target actor network Q value; Q′ is the Q value of the target critic network; r is the reward function; s is the state; a is the action vector passed from the target actor network to the target critic network; γ is the discount factor; L(θ) is the squared loss between the target actor network Q value and the target critic network Q value; θ is the parameter set of the target actor network; E denotes the average value.

∇_θ J = E_{s∼D}[∇_a Q^μ(s, a)|_{a=μ(s)} · ∇_θ μ(s)]

wherein J is the objective function of the target actor network; θ is the parameter set of the target actor network; s is the state; D is the state space corpus; μ denotes the deterministic action output by the target actor network; Q^μ(s, a) is the Q value under the deterministic action μ; a is the action passed from the target actor network to the target critic network; ∇ denotes the gradient.
The reward function R is designed mainly on the basis of economy and power balance constraints, including power generation cost, operation cost and power out-of-limit penalty:
R = R_1 + R_2 + R_3 + R_4
where R_1 represents the power grid operating cost; R_2 represents the heat supply network operating cost; R_3 represents the gas network operating cost; R_4 represents the energy balance constraint out-of-limit penalty; R_1, R_2 and R_3 are calculated according to the relevant published literature formulas, and R_4 is set to -100 in this embodiment according to its specific design.
Step 9: outputting the training result of the current round and judging whether the loop has finished; if so, saving the neural network model, otherwise repeating steps 3 to 8.
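The code sketch referred to in step 2 is given below: a condensed illustration, assuming PyTorch, of the n-input/m-output actor and critic networks and of one parameter update corresponding to step 8 and the loss functions above. Hidden layer sizes, optimizers and the soft-update coefficient tau are illustrative assumptions rather than values taken from this embodiment.

```python
# Illustrative DDPG sketch: network construction (step 2) and one update (step 8).
import torch
import torch.nn as nn

def make_actor(n, m, hidden=64):
    # n state inputs, m device actions scaled to [-1, 1]
    return nn.Sequential(nn.Linear(n, hidden), nn.ReLU(),
                         nn.Linear(hidden, m), nn.Tanh())

def make_critic(n, m, hidden=64):
    # Q(s, a): state and action concatenated, scalar output
    return nn.Sequential(nn.Linear(n + m, hidden), nn.ReLU(),
                         nn.Linear(hidden, 1))

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, s_next, r = batch                        # tensors sampled from the experience playback unit
    with torch.no_grad():                          # target Q value: y = r + gamma * Q'(s', mu'(s'))
        y = r + gamma * target_critic(torch.cat([s_next, target_actor(s_next)], dim=1))
    # critic update: squared loss L(theta) between y and Q(s, a)
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # actor update: gradient ascent on the objective J, i.e. maximize Q(s, mu(s))
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # soft update of the target networks
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau)
            tp.data.add_(tau * p.data)
```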
(3) Intelligent simulation setting module: a python program script is selected through the program entry to realize the mutual calling of deep reinforcement learning and load flow calculation; the flow is shown in FIG. 7 and specifically comprises the following steps (a minimal code sketch follows the steps):
Step 1: setting up the python script, and determining the main() function and the number of iterations;
Step 2: performing load flow calculation with the initial parameters from the graphical modeling module to obtain the environment state s_t of the energy system;
Step 3: the deep reinforcement learning module generates the action a_t, which is passed to the graphical modeling module for load flow calculation to obtain s_{t+1};
Step 4: calculating the reward value R;
Step 5: updating the network parameters;
Step 6: judging whether the loop has finished; if not, executing steps 2-5 again, otherwise ending.
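A minimal Python sketch of the main() loop described in steps 1-6 is shown below; power_flow, agent and compute_reward stand for the load flow calculation of the graphical modeling module, the deep reinforcement learning module and the reward evaluation, and are assumed interfaces rather than the actual CloudPSS or patent APIs.

```python
# Illustrative sketch of the main() entry point coupling deep reinforcement
# learning with load flow calculation; all helper names are assumptions.
def main(model_params, agent, iterations=1000):
    s_t = power_flow(model_params)                      # step 2: initial load flow calculation
    for _ in range(iterations):
        a_t = agent.act(s_t)                            # step 3: DRL module generates action a_t
        s_next = power_flow(model_params, action=a_t)   #         action returned for load flow, giving s_{t+1}
        r = compute_reward(a_t, s_next)                 # step 4: reward value R
        agent.update(s_t, a_t, s_next, r)               # step 5: update the network parameters
        s_t = s_next                                    # step 6: continue until the loop ends
```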
The intelligent simulation setting module has two modes, and simulation parameters for training are generated and set in the training stage of the deep reinforcement learning module and are used for the deep reinforcement learning module to learn; in practical application, real simulation parameters are set for the deep reinforcement learning module to process so as to obtain a control strategy (action) and energy Internet model state information which are finally output.
(4) Result display module: graphically displays the results generated by the graphical modeling module in the form of line graphs, curve graphs, tables and the like.
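As an illustration of the kind of graphical display produced by the result display module, the following sketch (assuming matplotlib and pandas are available) plots a cost trajectory as a line graph and prints a table of device actions; all names are placeholders.

```python
# Illustrative sketch of a result display: line graph plus tabular view.
import matplotlib.pyplot as plt
import pandas as pd

def display_results(time_steps, cost_curve, actions_by_device):
    plt.plot(time_steps, cost_curve)                   # line/curve graph of operating cost
    plt.xlabel("time step")
    plt.ylabel("operating cost")
    plt.show()
    table = pd.DataFrame(actions_by_device, index=time_steps)   # table of device actions
    print(table)
```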
Example 2
Referring to fig. 6, the present invention provides a deep reinforcement learning-based intelligent simulation method for optimizing multi-energy flow, including:
S1, loading an energy Internet model;
S2, setting simulation parameters according to the energy Internet model;
S3, inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain the control strategy of each device in the energy Internet model and the state information of the energy Internet model;
S4, outputting the control strategy and the energy Internet model state information, and performing graphical display.
In a specific implementation, in the step of loading the energy internet model, the energy internet model is a pre-established electric/gas/thermal energy system model.
In a specific implementation, in the step of setting simulation parameters according to the energy internet model, the simulation parameters include: sensing environmental status, actions, and rewards;
the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electrical network include active power, reactive power, voltage and power factor; the thermal network parameters include temperature and flow; gas network parameters include pressure and flow;
the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
the rewards include power generation cost, operating cost, and power out-of-limit penalty.
In a specific implementation, in the step of inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain the actions of each device in the energy internet model, the training step of the pre-trained deep reinforcement learning model includes:
setting simulation parameters for training based on the energy Internet model; the simulation parameters for training include: sensing environmental status, actions, and rewards; the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electric network comprise active power, reactive power, voltage and power factor; the heat network parameters comprise temperature and flow; the gas network parameters include pressure and flow; the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment; the reward comprises power generation cost, operation cost and power out-of-limit penalty;
and training the intelligent agent based on the simulation parameters for training to obtain a pre-trained deep reinforcement learning model.
In specific implementation, in the step of inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain the actions of each device in the energy internet model, the step of training the intelligent agent includes:
Step 1: setting up the training python script, and determining the main() function and the number of iterations;
Step 2: constructing an actor network and a critic network; the state space s is defined as n parameters according to the sensing environment state, so the number of neurons in the input layer is n; the action space contains m actions, so the number of neurons in the output layer is m, the actions in the action space being load reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
Step 3: starting a training loop, calculating the power flow parameters of the electric/gas/thermal energy system according to the initial parameters of each device in the energy Internet model, and generating the environment state s_t at time t;
Step 4: the actor network generates the action a_t at time t based on the environment state s_t;
Step 5: according to the action a_t at time t, observing the environment state of the energy Internet model, passing the action to the graphical modeling module for load flow calculation, and generating the environment state s_{t+1} at time t+1;
Step 6: according to the action a_t at time t and the environment state s_{t+1} at time t+1, calculating the reward value R of the action a_t at time t;
Step 7: the quadruple {s_t, a_t, s_{t+1}, R}, formed by the environment state s_t and action a_t at time t, the environment state s_{t+1} at time t+1 and the reward value R of the action a_t at time t, is delivered as one sample to the experience playback unit;
the experience playback unit is set up with an update mechanism: new samples are continuously generated in steps 3 to 7 and delivered to the experience playback unit, and old samples exceeding the preset storage capacity of the experience playback unit are automatically deleted; steps 3 to 7 are performed in a loop while the experience playback unit is not full, and step 8 is performed once the stored samples fill the experience playback unit;
Step 8: sampling the samples stored in the experience playback unit with a fixed batch size, calculating the gradient and then updating the parameters of the actor network and the critic network;
the actor network and the critic network perform network learning and parameter updating according to the following loss functions:
y = r + γ·Q′(s_{t+1}, μ′(s_{t+1}) | θ′)

L(θ) = E[(y − Q(s_t, a_t | θ))²]

wherein y is the target actor network Q value; Q′ is the Q value of the target critic network; r is the reward function; s is the state; a is the action vector passed from the target actor network to the target critic network; γ is the discount factor; L(θ) is the squared loss between the target actor network Q value and the target critic network Q value; θ is the parameter set of the target actor network; E denotes the average value;

∇_θ J = E_{s∼D}[∇_a Q^μ(s, a)|_{a=μ(s)} · ∇_θ μ(s)]

wherein J is the objective function of the target actor network; θ is the parameter set of the target actor network; s is the state; D is the state space corpus; μ denotes the deterministic action output by the target actor network; Q^μ(s, a) is the Q value under the deterministic action μ; a is the action passed from the target actor network to the target critic network; ∇ denotes the gradient;
the reward function R is based on economy and power balance constraints, including power generation cost, operation cost and power out-of-limit penalty:
R = R_1 + R_2 + R_3 + R_4
where R_1 represents the power grid operating cost; R_2 represents the heat supply network operating cost; R_3 represents the gas network operating cost; R_4 represents the energy balance constraint out-of-limit penalty;
Step 9: outputting the training result of the current round and judging whether the loop has reached the preset end condition; if so, saving the actor network and the critic network, otherwise repeating steps 3-8.
In a specific implementation, in the step of outputting the action and performing graphical display, the graphical display specifically includes displaying the output action using one or more of a line graph, a curve graph and a table.
Example 3
Referring to fig. 7, the present invention further provides an electronic device 100 for a deep reinforcement learning-based intelligent simulation method for multi-energy flow optimization; the electronic device 100 comprises a memory 101, at least one processor 102, a computer program 103 stored in the memory 101 and executable on the at least one processor 102, and at least one communication bus 104.
The memory 101 may be used to store the computer program 103, and the processor 102 implements the steps of the deep reinforcement learning-based intelligent simulation method for optimizing multi-energy flow based on deep reinforcement learning according to embodiment 2 by running or executing the computer program stored in the memory 101 and calling the data stored in the memory 101. The memory 101 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data) created according to the use of the electronic apparatus 100, and the like. In addition, the memory 101 may include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid state storage device.
The at least one Processor 102 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The processor 102 may be a microprocessor or the processor 102 may be any conventional processor or the like, and the processor 102 is a control center of the electronic device 100 and connects various parts of the whole electronic device 100 by various interfaces and lines.
The memory 101 in the electronic device 100 stores a plurality of instructions to implement a deep reinforcement learning-based intelligent simulation method for multi-energy flow optimization, and the processor 102 can execute the plurality of instructions to implement:
loading an energy Internet model;
setting simulation parameters according to the energy Internet model;
inputting the simulation parameters into a pre-trained deep reinforcement learning module to obtain control strategies of each device in the energy Internet model and state information of the energy Internet model;
and outputting the control strategy and the energy Internet model state information, and carrying out graphical display.
Specifically, the processor 102 may refer to the description of the relevant steps in embodiment 2 for a specific implementation method of the instruction, which is not described herein again.
Example 4
The modules/units integrated by the electronic device 100 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, and Read-Only Memory (ROM).
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (12)

1. The intelligent simulation method for optimizing the multi-energy flow based on the deep reinforcement learning is characterized by comprising the following steps of:
setting simulation parameters according to the energy Internet model;
inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain control strategies of each device in the energy Internet model and state information of the energy Internet model;
and outputting the control strategy and the energy Internet model state information.
2. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to claim 1, wherein the energy internet model is a pre-established electric/gas/thermal energy system model.
3. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to claim 1, wherein in the step of setting simulation parameters according to the energy internet model, the simulation parameters include: sensing environmental status, actions, and rewards;
the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electrical network include active power, reactive power, voltage and power factor; the thermal network parameters include temperature and flow; gas network parameters include pressure and flow;
the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
the rewards include power generation cost, operating cost, and power out-of-limit penalty.
4. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to claim 1, wherein in the step of inputting the simulation parameters into a pre-trained deep reinforcement learning model to obtain the control strategy of each device in the energy internet model and the state information of the energy internet model, the training step of the pre-trained deep reinforcement learning model comprises:
setting simulation parameters for training based on the energy Internet model; the simulation parameters for training include: sensing environmental status, actions, and rewards; the perception environment state comprises multi-energy power flow data in three energy forms of electricity, heat and gas; parameters of the electric network comprise active power, reactive power, voltage and power factor; the heat network parameters comprise temperature and flow; the gas network parameters include pressure and flow; the actions comprise load adjustment and reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment; the reward comprises power generation cost, operation cost and power out-of-limit penalty;
and training the intelligent agent based on the simulation parameters for training to obtain a pre-trained deep reinforcement learning model.
5. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to claim 4, wherein in the step of training the intelligent agent based on the simulation parameters for training to obtain the pre-trained deep reinforcement learning model, the step of training the intelligent agent comprises:
step 1: setting a training python script, and determining a main () function and iteration times;
step 2: constructing an actor network and a critic network; the state space s is defined as n parameters according to the sensing environment state, so the number of neurons in the input layer is n; the action space contains m actions, so the number of neurons in the output layer is m, the actions in the action space being load reduction, generator output adjustment and energy storage, heat storage and gas storage adjustment;
step 3: starting a training loop, calculating the power flow parameters of the electric/gas/thermal energy system according to the initial parameters of each device in the energy Internet model, and generating the environment state s_t at time t;
step 4: the actor network generates the action a_t at time t based on the environment state s_t;
step 5: according to the action a_t at time t, observing the environment state of the energy Internet model, passing the action to the graphical modeling module for load flow calculation, and generating the environment state s_{t+1} at time t+1;
step 6: according to the action a_t at time t and the environment state s_{t+1} at time t+1, calculating the reward value R of the action a_t at time t;
step 7: the quadruple {s_t, a_t, s_{t+1}, R}, formed by the environment state s_t and action a_t at time t, the environment state s_{t+1} at time t+1 and the reward value R of the action a_t at time t, is delivered as one sample to the experience playback unit;
the experience playback unit is set up with an update mechanism: new samples are continuously generated in steps 2 to 6 and delivered to the experience playback unit, and old samples exceeding the preset storage capacity of the experience playback unit are automatically deleted; steps 2 to 6 are performed in a loop while the experience playback unit is not full, and step 7 is performed once the stored samples fill the experience playback unit;
step 8: sampling the samples stored in the experience playback unit with a fixed batch size, calculating the gradient and then updating the parameters of the actor network and the critic network;
the actor network and the critic network perform network learning and parameter updating according to the following loss function:
L(θ) = E[(y − Q(s, a | θ))²], with y = r + γ Q′(s, a | θ′)
wherein y is the target actor network Q value; Q′(s, a | θ′) is the Q value of the target critic network; r is the reward function; s is the state; a is the action vector passed from the target actor network to the target critic network; γ is the discount factor; L(θ) is the squared loss between the target actor network Q value and the target critic network Q value; θ is the parameter set of the target actor network; E denotes the expectation;
∇_θ J = E_{s∼D}[ ∇_a Q(s, a)|_{a=μ(s)} ∇_θ μ(s | θ) ]
wherein J is the objective function of the target actor network; θ is the parameter set of the target actor network; s is the state; D is the state space corpus; μ denotes the deterministic action output by the target actor network; Q(s, a)|_{a=μ(s)} is the Q value of taking the deterministic action μ; a is the action passed from the target actor network to the target critic network; ∇ denotes the gradient;
the reward function R is based on economy and power balance constraints, and includes the power generation cost, the operating cost and the power out-of-limit penalty:
R = R_1 + R_2 + R_3 + R_4
wherein R_1 represents the power grid operating cost; R_2 represents the heat network operating cost; R_3 represents the gas network operating cost; R_4 represents the energy balance constraint out-of-limit penalty;
Step 9: outputting the training result of the current round and judging whether the loop has reached the preset ending condition; when the ending condition is reached, saving the actor network and the critic network; otherwise, repeating steps 3 to 8.
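For illustration only, the following is a minimal DDPG-style sketch of steps 1 to 9 in Python/PyTorch. The network sizes, the power-flow environment interface (env.reset()/env.step()), the reward fields R1 to R4 returned by the environment, and the omission of exploration noise and soft target updates are all assumptions made for brevity; the sketch is not the patented implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical dimensions: n state parameters and m actions (claim 5, step 2).
N_STATE, M_ACTION = 12, 5


class Actor(nn.Module):
    """Maps the environment state s_t to a deterministic action a_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_STATE, 64), nn.ReLU(),
            nn.Linear(64, M_ACTION), nn.Tanh())

    def forward(self, s):
        return self.net(s)


class Critic(nn.Module):
    """Scores a state-action pair with a Q value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_STATE + M_ACTION, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def reward(info):
    # R = R1 + R2 + R3 + R4: grid operating cost, heat network operating cost,
    # gas network operating cost and energy-balance out-of-limit penalty,
    # assumed here to be returned by the power-flow environment.
    return info["R1"] + info["R2"] + info["R3"] + info["R4"]


def train(env, episodes=100, gamma=0.99, batch_size=64, buffer_size=10000):
    actor, critic = Actor(), Critic()
    target_actor, target_critic = Actor(), Critic()
    target_actor.load_state_dict(actor.state_dict())
    target_critic.load_state_dict(critic.state_dict())
    opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
    replay = deque(maxlen=buffer_size)  # experience replay unit (step 7)

    for _ in range(episodes):
        s = torch.as_tensor(env.reset(), dtype=torch.float32)   # step 3: initial power flow state
        done = False
        while not done:
            a = actor(s).detach()                                # step 4: actor generates a_t
            s_next, info, done = env.step(a.numpy())             # step 5: power flow calculation
            r = reward(info)                                     # step 6: reward of a_t
            s_next = torch.as_tensor(s_next, dtype=torch.float32)
            replay.append((s, a, torch.tensor([float(r)]), s_next))  # step 7: store quadruple
            s = s_next
            if len(replay) < batch_size:
                continue
            # Step 8: sample a fixed batch and update both networks.
            bs, ba, br, bs2 = map(torch.stack, zip(*random.sample(replay, batch_size)))
            with torch.no_grad():
                y = br + gamma * target_critic(bs2, target_actor(bs2))  # target value
            critic_loss = ((y - critic(bs, ba)) ** 2).mean()            # squared loss L(theta)
            opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
            actor_loss = -critic(bs, actor(bs)).mean()                  # ascend Q(s, mu(s))
            opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
            # A full DDPG implementation would also soft-update the target networks here.
    return actor, critic                                                # step 9: save the networks
```

In this sketch the critic update corresponds to the squared loss L(θ) given above, and the actor update follows the deterministic policy gradient ∇_θ J over states sampled from the replay unit.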
6. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to claim 1, further comprising: displaying the output control strategy and the energy internet model state information using one or more of a line chart, a curve chart and a table.
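For illustration only, a minimal sketch of the display step in claim 6, assuming matplotlib and hypothetical result arrays; the claim itself only requires one or more of a line chart, curve chart or table.

```python
import matplotlib.pyplot as plt


def show_results(time_steps, control_strategy, state_info):
    """Plot the control strategy as a line chart and print the model state as a table."""
    plt.plot(time_steps, control_strategy, label="control strategy")
    plt.xlabel("time step")
    plt.ylabel("setpoint")
    plt.legend()
    plt.show()
    # Simple textual table of the energy internet model state.
    for name, value in state_info.items():
        print(f"{name}\t{value}")
```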
7. A deep reinforcement learning-based multi-energy flow optimization intelligent simulation system, characterized by comprising:
an intelligent simulation setting module, used for setting simulation parameters according to the energy internet model;
a deep reinforcement learning module, used for inputting the simulation parameters into a deep reinforcement learning model for training, or for obtaining the control strategy of each device in the energy internet model and the state information of the energy internet model by using a pre-trained deep reinforcement learning model;
and a result display module, used for outputting the control strategy and the energy internet model state information.
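For illustration only, one possible way to arrange the three claimed modules as Python classes; the class and method names are assumptions, and the deep reinforcement learning module is shown only as a thin wrapper around a hypothetical pre-trained model.

```python
class IntelligentSimulationSettingModule:
    """Sets the simulation parameters from the energy internet model."""
    def set_parameters(self, energy_internet_model):
        # Placeholder: extract state/action/reward definitions from the model.
        return {"state": ..., "actions": ..., "rewards": ...}


class DeepReinforcementLearningModule:
    """Trains, or runs, a deep reinforcement learning model on the parameters."""
    def __init__(self, pretrained_model=None):
        self.model = pretrained_model

    def run(self, simulation_parameters):
        # Returns the control strategy of each device and the model state info,
        # assuming a pre-trained model exposing an act() method.
        return self.model.act(simulation_parameters)


class ResultDisplayModule:
    """Outputs the control strategy and the energy internet model state."""
    def display(self, control_strategy, state_info):
        print(control_strategy, state_info)
```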
8. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation system according to claim 7, wherein the energy internet model is a pre-established electric/gas/thermal energy system model.
9. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation system according to claim 7, wherein the simulation parameters set by the intelligent simulation setting module include: the perceived environment state, actions and rewards;
the perceived environment state comprises multi-energy power flow data in the three energy forms of electricity, heat and gas; the electric network parameters include active power, reactive power, voltage and power factor; the heat network parameters include temperature and flow; the gas network parameters include pressure and flow;
the actions comprise load adjustment and curtailment, generator output adjustment, and adjustment of electricity storage, heat storage and gas storage;
the rewards include the power generation cost, the operating cost and the power out-of-limit penalty.
10. The deep reinforcement learning-based multi-energy flow optimization intelligent simulation system according to claim 7, wherein the training of the pre-trained deep reinforcement learning model comprises:
setting simulation parameters for training based on the energy internet model; the simulation parameters for training include: the perceived environment state, actions and rewards; the perceived environment state comprises multi-energy power flow data in the three energy forms of electricity, heat and gas; the electric network parameters include active power, reactive power, voltage and power factor; the heat network parameters include temperature and flow; the gas network parameters include pressure and flow; the actions comprise load adjustment and curtailment, generator output adjustment, and adjustment of electricity storage, heat storage and gas storage; the rewards include the power generation cost, the operating cost and the power out-of-limit penalty;
and training the agent based on the simulation parameters for training to obtain the pre-trained deep reinforcement learning model.
11. An electronic device comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to any one of claims 1 to 6.
12. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the deep reinforcement learning-based multi-energy flow optimization intelligent simulation method according to any one of claims 1 to 6.
CN202210510697.1A 2022-05-11 2022-05-11 Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system Pending CN114880929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210510697.1A CN114880929A (en) 2022-05-11 2022-05-11 Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210510697.1A CN114880929A (en) 2022-05-11 2022-05-11 Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system

Publications (1)

Publication Number Publication Date
CN114880929A true CN114880929A (en) 2022-08-09

Family

ID=82676465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210510697.1A Pending CN114880929A (en) 2022-05-11 2022-05-11 Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system

Country Status (1)

Country Link
CN (1) CN114880929A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116307136A (en) * 2023-02-24 2023-06-23 国网安徽省电力有限公司营销服务中心 Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination