CN116307136A - Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium - Google Patents

Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium

Info

Publication number
CN116307136A
Authority
CN
China
Prior art keywords
value
function
optimization management
parameter optimization
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310182092.9A
Other languages
Chinese (zh)
Inventor
陈曦鸣
吕斌
甘业平
郑抗震
白云龙
郑元杰
韩号
王品
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marketing Service Center of State Grid Anhui Electric Power Co Ltd
Original Assignee
Marketing Service Center of State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Marketing Service Center of State Grid Anhui Electric Power Co Ltd filed Critical Marketing Service Center of State Grid Anhui Electric Power Co Ltd
Priority to CN202310182092.9A
Publication of CN116307136A

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0206Price or cost determination based on market factors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Power Engineering (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an energy system parameter optimization method, system, device and storage medium based on deep reinforcement learning, belonging to the technical field of energy collaborative optimization. The method comprises: acquiring energy data of each energy system; and inputting the energy data into a trained parameter optimization model to obtain an optimal parameter set for each energy system, thereby completing parameter optimization. The training process of the parameter optimization model is divided into individual training and collaborative training; during collaborative training the respective system loss functions are added to obtain a comprehensive loss function, the gradient information of the three optimization management models for the comprehensive loss function is calculated, and the evaluation network parameters of the three optimization management models are updated according to the gradient information. Intermediate parameter interaction, and hence information interaction, among the three optimization management models is thereby realized, which effectively improves the reliability of energy system parameter optimization, provides stronger robustness in optimizing emission reduction, effectively addresses problems such as energy load and system emission reduction, and enlarges the optimization space.

Description

Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium
Technical Field
The invention relates to an energy system parameter optimization method, system and device based on deep reinforcement learning and a storage medium, and belongs to the technical field of energy collaborative optimization.
Background
Integrated energy systems are regarded as an effective way to achieve low-carbon emissions and efficient energy operation, and are expected to be the main carrier of energy for future human society, so they have received widespread attention. An integrated energy system takes the electric power system as its core and uniformly plans and schedules the electricity, gas, cooling, heating and other energy systems, which improves energy utilization, promotes the development and use of renewable resources, and strongly supports national economic and social development.
At present, the parameters of the parts of an energy system are mainly optimized with the following methods: the energy hub model, the alternating multiplier algorithm, and customized ADMM (Alternating Direction Method of Multipliers). However, the practical application cost and the emission-reduction optimization space of these three methods are limited, and their reliability is low.
Disclosure of Invention
The invention aims to provide an energy system parameter optimization method, system, device and storage medium based on deep reinforcement learning, which solve the problems of small optimization space, low reliability and the like in the prior art.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
in a first aspect, the present invention provides a method for optimizing parameters of an energy system based on deep reinforcement learning, including:
acquiring energy data of each energy system;
inputting the energy data into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization;
the parameter optimization model comprises three interconnected optimization management models based on a DQN algorithm: the system comprises an electric power parameter optimization management model, a thermal parameter optimization management model and a fuel gas parameter optimization management model, wherein the three optimization management models are respectively provided with an experience playback pool for storing experience samples;
the parameter optimization model is subjected to cyclic training by the following method until the preset training times are reached, and training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
With reference to the first aspect, further, the selecting an optimal action from respective action spaces includes:
performing trial-and-error selection with an ε-greedy strategy: with probability ε an action a_t is selected using a random strategy, and with probability 1-ε the optimal action a_t* is selected, where ε is the trial-and-error probability and a_t and a_t* satisfy the following relationship:
a_t* = arg max_{a_t} Q(s_t, a_t)
wherein s_t is the state at time t in the state space of the energy system, Q(s_t, a_t) is the value of the value function at time t, and arg max denotes taking the argument that maximizes the function.
With reference to the first aspect, further, the calculating the value of the value function according to the value of the reward function is performed by the following assignment formula:
Q(s_t, a_t) ← Q(s_t, a_t) + θ[R_t + ζ max Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]
wherein Q(s_t, a_t) is the value of the value function at time t, θ and ζ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, R_t is the value of the reward function at time t, max denotes taking the maximum value, and Q(s_{t+1}, a_{t+1}) is the value of the value function at time t+1.
With reference to the first aspect, further, the system loss functions of the three optimization management models are calculated from the plurality of experience samples by the following formulas:
F_ES = (1/ω) Σ_{i=1}^{ω} [R_i^ES + ζ max Q*(s_{i+1}^ES, a_{i+1}^ES) - Q(s_i^ES, a_i^ES)]²
F_HS = (1/ω) Σ_{i=1}^{ω} [R_i^HS + ζ max Q*(s_{i+1}^HS, a_{i+1}^HS) - Q(s_i^HS, a_i^HS)]²
F_GS = (1/ω) Σ_{i=1}^{ω} [R_i^GS + ζ max Q*(s_{i+1}^GS, a_{i+1}^GS) - Q(s_i^GS, a_i^GS)]²
wherein F_ES is the system loss function value of the power parameter optimization management model, F_HS is the system loss function value of the thermal parameter optimization management model, F_GS is the system loss function value of the gas parameter optimization management model, ω is the number of experience samples, θ and ζ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, R_i is the value of the reward function at time i, max denotes taking the maximum value, Q*(s_{i+1}, a_{i+1}) is the value of the value function of the target neural network at time i+1, Q(s_i, a_i) is the value of the value function at time i; s_i^ES, s_i^HS and s_i^GS are respectively the state spaces of the power system, the thermodynamic system and the gas system at time i; a_i^ES, a_i^HS and a_i^GS are respectively the action spaces of the power system, the thermodynamic system and the gas system at time i; R_i^ES, R_i^HS and R_i^GS are respectively the reward function sets of the power system, the thermodynamic system and the gas system at time i; s_i and s_{i+1} respectively denote the states at times i and i+1, and a_i and a_{i+1} respectively denote the actions at times i and i+1.
With reference to the first aspect, further, the calculating gradient information of the three optimization management models for the comprehensive loss function includes:
the gradient information of the power parameter optimization management model for the comprehensive loss function is ∂F/∂w_t^ES;
the gradient information of the thermal parameter optimization management model for the comprehensive loss function is ∂F/∂w_t^HS;
the gradient information of the gas parameter optimization management model for the comprehensive loss function is ∂F/∂w_t^GS;
wherein F is the comprehensive loss function, w_t^ES is the evaluation network parameter of the power parameter optimization management model at time t, w_t^HS is the evaluation network parameter of the thermal parameter optimization management model at time t, and w_t^GS is the evaluation network parameter of the gas parameter optimization management model at time t.
With reference to the first aspect, further, the evaluation network parameters of the three optimization management models are updated according to the gradient information by the following formulas:
w_t^ES = w_{t-1}^ES - θ · ∂F/∂w_{t-1}^ES
w_t^HS = w_{t-1}^HS - θ · ∂F/∂w_{t-1}^HS
w_t^GS = w_{t-1}^GS - θ · ∂F/∂w_{t-1}^GS
wherein θ is the learning rate of reinforcement learning, F is the comprehensive loss function, w_t^ES, w_t^HS and w_t^GS are respectively the evaluation network parameters of the power, thermal and gas parameter optimization management models at time t, and w_{t-1}^ES, w_{t-1}^HS and w_{t-1}^GS are respectively the evaluation network parameters of the power, thermal and gas parameter optimization management models at time t-1.
In a second aspect, the present invention also provides an energy system parameter optimization system based on deep reinforcement learning, including:
and a data acquisition module: the method comprises the steps of acquiring energy data of each energy system;
parameter optimization module: the method comprises the steps of inputting energy data into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization;
the parameter optimization module comprises a model training unit and is used for carrying out cyclic training on the parameter optimization model until the preset training times are reached by the following method, so that training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
In a third aspect, the invention also provides an energy system parameter optimization device based on deep reinforcement learning, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform the steps of the method according to any one of the first aspects.
In a fourth aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.
Compared with the prior art, the invention has the following beneficial effects:
according to the energy system parameter optimization method, system, device and storage medium based on deep reinforcement learning, the independent training and collaborative training steps are set in the training process of the parameter optimization model, the respective system loss functions are added in the collaborative training process to obtain the comprehensive loss function, gradient information of the three optimization management models on the comprehensive loss function is calculated, and evaluation network parameters of the three optimization management models are updated according to the gradient information, so that intermediate parameter interaction among the three optimization management models is realized, information interaction is realized, the reliability of energy system parameter optimization is effectively improved, the method has stronger robustness in the aspect of optimizing emission reduction, the problems of energy load, system emission reduction and the like can be effectively solved, and the optimization space is improved.
Drawings
FIG. 1 is a flow chart of an energy system parameter optimization method based on deep reinforcement learning provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a training process of a parameter optimization model provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an optimization management model according to an embodiment of the present invention;
FIG. 4 is a graph of power load comparison before and after power load optimization according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and the following examples are only for more clearly illustrating the technical aspects of the present invention, and are not to be construed as limiting the scope of the present invention.
Example 1
As shown in fig. 1, the embodiment of the invention provides an energy system parameter optimization method based on deep reinforcement learning, which comprises the following steps:
s1, energy data of each energy system are acquired.
In this embodiment, parameters of the electric power system, the thermodynamic system and the gas system are optimized, so that energy data of the three energy systems including electric power data, thermodynamic data and gas data are collected first.
S2, inputting the energy data into the trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization.
In this embodiment, the parameter optimization model is a neural network model constructed based on the DQN algorithm, and the structure thereof is shown in fig. 3.
The DQN algorithm approximates Q-learning with a neural network function and eventually finds an optimal strategy; before learning begins, Q is initialized to an arbitrary fixed value (chosen freely). Its core is the value function iteration process, namely:
Q(s_t, a_t) ← Q(s_t, a_t) + θ[R_t + ζ max Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]
wherein Q(s_t, a_t) is the value of the value function at time t, θ and ζ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, R_t is the value of the reward function at time t, max denotes taking the maximum value, and Q(s_{t+1}, a_{t+1}) is the value of the value function at time t+1. The learning flow is as follows:
the state s_t is input, and the Q values corresponding to all actions a_t are obtained;
the action a_t* corresponding to the maximum Q value, i.e. the current optimal action, is selected and executed;
after the action is executed the environment changes, and the environment reward R_t is obtained;
the Q value is updated with the reward R_t;
the neural network outputs Q values from the input state values and updates the network parameters using the new Q value as a label.
As shown in fig. 3, the 3 energy systems each take the time-of-use load and price data of a residential community as input and pass through 3 fully connected layers with 256, 512 and 1024 neurons respectively, with the Tanh activation function used between the fully connected layers. Since the action spaces of the electric, thermal and gas energy systems contain 5, 7 and 8 decision actions respectively in any state t, and the continuous action-control precision space of each system is divided into 10 discrete action spaces, the DQN network models of the three different energy sources output the Q values of 10^5, 10^7 and 10^8 potential actions respectively when making an action decision in each state.
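A minimal PyTorch sketch of the evaluation network structure described above is given below; the input dimension of 5 (period, purchase price, selling price, stored energy, demand) and the construction of one network per energy system are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EvalNet(nn.Module):
    """Evaluation network of one optimization management model: three fully connected
    layers with 256, 512 and 1024 neurons, Tanh activations between the layers, and a
    linear output head producing one Q value per discretised action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.Tanh(),
            nn.Linear(256, 512), nn.Tanh(),
            nn.Linear(512, 1024), nn.Tanh(),
            nn.Linear(1024, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)   # shape (batch, n_actions): Q value of every potential action

# One network per energy system; the action count follows the 10^5 figure above for the
# power system (large in practice; a smaller value can be used when experimenting).
q_power = EvalNet(state_dim=5, n_actions=10 ** 5)
```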
In any time period, each energy system obtains the corresponding benefit only by deciding its energy purchase amount and storage amount and using the purchased and stored energy to meet the energy demand of the residential community in that period. Based on the DQN reinforcement learning algorithm, an optimization management model that can adaptively adjust the energy purchase and storage amounts of the energy system is built. For any energy system, the DQN optimization management model takes the energy demand of the residential community and the stored energy of the system as the learning environment, and treats the energy system as an agent that executes the two actions of energy purchase and energy storage in this environment. The DQN optimization management model quantitatively analyses the economic effect of the actions executed by the energy system and the corresponding penalties, based on whether the energy demand of the residential community is met, whether energy is wasted, and so on, and obtains better action-selection knowledge by continuously repeating this trial-and-error process.
For the 3 energy systems, the energy purchase and storage behaviour is a sequential Markov decision process: after executing specific purchase and storage actions under given community energy-consumption and system energy-storage states, each energy system transfers to a new storage and demand state. The following describes in detail how the states, action spaces and reward functions of the optimization problem are defined, and how the training and optimization process is designed and the value function is built.
State space:
The power system, the thermodynamic system and the gas system divide one day into 48 periods t at intervals of 30 min. In any period t, the state space of each of the 3 energy systems contains, in addition to the period t, 4 variables: the unit energy purchase price, the unit energy selling price, the stored energy of the system in period t, and the energy demand of the residential community in period t; these variables are defined separately for the electric, thermal and gas energy systems.
Action space:
The action spaces of the power system, the thermodynamic system and the gas system are respectively:
a_t^ES = {P_t^ES, E_t^ES, M_t^ES}
a_t^HS = {P_t^HS, E_t^HS, M_t^HS}
a_t^GS = {P_t^GS, E_t^GS, M_t^GS}
wherein each energy system action space comprises three actions: the decided purchase amount P_t, the decided storage amount E_t, and the mechanical action M_t of the energy system; that is, each energy system must decide how much energy it purchases and stores in period t. The energy stored by an energy system in period t comes from the energy it purchases in the same period. E_t^ES, E_t^HS and E_t^GS are respectively the electric energy, heat energy and gas amounts that the 3 energy systems decide to store in period t; P_t^ES, P_t^HS and P_t^GS are respectively the electric energy, heat energy and gas amounts that the 3 energy systems decide to purchase in period t; M_t^ES, M_t^HS and M_t^GS are the decision actions of the mechanical systems of the 3 energy production entities (motor, heat generator and gas generator) in period t, expressed as encoded action parameters. The number of mechanical action parameters differs for machinery of different production capacities: the numbers of mechanical actions of the electric, heat and gas sources are 3, 5 and 6, and the number of actions can be set flexibly according to the characteristics of the neural network structure and the actual requirements; the action spaces of the three energy systems therefore contain 5, 7 and 8 actions respectively. In addition, the actions performed by the electric, thermal and gas energy systems are subject to the action space constraint that, in any period t, the energy purchase amount of each of the 3 energy systems must be no smaller than its energy storage amount, i.e. P_t^ES ≥ E_t^ES, P_t^HS ≥ E_t^HS and P_t^GS ≥ E_t^GS.
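As a hedged illustration of how the state, the action and the purchase-storage constraint above could be represented in code, the following Python sketch uses descriptive field names that are assumptions of this example rather than symbols from the patent.

```python
from dataclasses import dataclass

@dataclass
class SystemState:
    """State of one energy system in period t."""
    t: int              # period index, 1..48 (30 min each)
    buy_price: float    # unit energy purchase price
    sell_price: float   # unit energy selling price
    stored: float       # energy currently stored by the system
    demand: float       # energy demand of the residential community

@dataclass
class SystemAction:
    """Action of one energy system in period t."""
    purchase: float     # P_t, energy purchased in period t
    store: float        # E_t, energy put into storage in period t
    mech: int           # M_t, encoded mechanical action of the production equipment

def satisfies_constraint(action: SystemAction) -> bool:
    # Stored energy comes from energy purchased in the same period,
    # so the purchase amount must be at least the storage amount.
    return action.purchase >= action.store
```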
Reward function:
For the 3 energy systems (the power system, the thermodynamic system and the gas system), the difference between the energy sales revenue and the energy purchase cost is the reward function obtained by executing action a_t in state s_t during period t.
Taking the power system as an example, the meaning of the reward function of the energy system DQN optimization management model is explained below. After the power system executes an action, 5 potential penalty situations may arise, involving the demand side, the supply side and the electricity storage end; if none of these 5 penalty situations occurs, the outcome is called the ideal result of the power system. The 6 potential outcomes and the corresponding reward functions are defined by the piecewise reward function formula, whose lines 1 to 6 correspond to the cases below; for the thermodynamic system and the gas system, denoted HS and GS, the physical meaning of the reward function is the same as for the power system.
At any time, if the sum of the power supply amount and the stored electricity of the power system cannot meet the electricity load of the residential community, the power system must bear a demand-side economic penalty for the unmet part of the load (line 1). If the stored electricity of the power system exceeds the maximum storage capacity of the system (a demand-side penalty implies that the power system has no spare storage at that moment), the power system must additionally bear an energy-storage-end economic penalty (line 2). If the power supply amount of the power system is larger than the electricity load of the residential community, the power system must bear a supply-side economic penalty for the excess supply (line 3). Because the power system does not need to draw on its stored electricity while bearing a supply-side penalty, if the sum of the current stored electricity and the electricity stored in this period exceeds the maximum storage capacity of the system, the power system must also bear an energy-storage-end economic penalty for the excess stored electricity (line 4). If the sum of the power supply amount and the current stored electricity can meet the community load, but the sum of the remaining stored electricity and the electricity stored in this period exceeds the maximum storage capacity of the system, the power system only needs to bear an energy-storage-end economic penalty for the excess stored energy (line 5). In summary, the ideal result of the action executed by the power system is that the supply amount is smaller than the community load so that no supply-side penalty arises, the sum of the supply amount and the current stored electricity is larger than the community load so that no demand-side penalty arises, and the sum of the remaining stored electricity and the electricity stored in this period does not exceed the maximum storage capacity so that no storage-end penalty arises (line 6).
In this embodiment, training of the parameter optimization model is implemented through the framework shown in fig. 2, and the parameter optimization model is circularly trained through the following method until the preset training times are reached, so that training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
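A skeleton of this two-phase training loop is sketched below; the helper methods (act, reward, q_value, td_loss, apply_gradients) are assumed interfaces used only for illustration.

```python
import random

def train(models, replay_pools, episodes=1000, batch_size=64, learning_rate=0.01):
    """Alternate individual training (each model fills its own experience replay pool)
    with collaborative training (a comprehensive loss couples the three models)."""
    for _ in range(episodes):
        # Individual training: each optimization management model interacts on its own.
        for name, model in models.items():
            action = model.act(model.state)                  # optimal (or exploratory) action
            reward = model.reward(model.state, action)       # reward after executing the action
            q_val = model.q_value(model.state, action, reward)
            replay_pools[name].append((model.state, action, reward, q_val))

        # Collaborative training: sum the three system losses into a comprehensive loss.
        batches = {name: random.sample(pool, min(batch_size, len(pool)))
                   for name, pool in replay_pools.items()}
        total_loss = sum(models[name].td_loss(batches[name]) for name in models)
        for model in models.values():
            model.apply_gradients(total_loss, learning_rate)  # update evaluation network parameters
```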
The optimal action is selected by the following method:
trial-and-error selection is performed with an ε-greedy strategy: with probability ε an action a_t is selected using a random strategy, and with probability 1-ε the optimal action a_t* is selected, where ε is the trial-and-error probability and a_t and a_t* satisfy the following relationship:
a_t* = arg max_{a_t} Q(s_t, a_t)
wherein s_t is the state at time t in the state space of the energy system, Q(s_t, a_t) is the value of the value function at time t, and arg max denotes taking the argument that maximizes the function.
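A minimal sketch of this ε-greedy selection, assuming the evaluation network's Q values for the current state are available as an array:

```python
import random
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float = 0.1) -> int:
    """With probability epsilon choose a random action (trial and error); otherwise
    choose the optimal action a_t* = argmax_a Q(s_t, a)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # exploratory random action a_t
    return int(np.argmax(q_values))              # exploit the current value estimates
```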
The value of the value function is calculated from the value of the reward function by the following assignment formula:
Q(s_t, a_t) ← Q(s_t, a_t) + θ[R_t + ζ max Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]
wherein Q(s_t, a_t) is the value of the value function at time t, θ and ζ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, R_t is the value of the reward function at time t, max denotes taking the maximum value, and Q(s_{t+1}, a_{t+1}) is the value of the value function at time t+1.
The system loss functions of the three optimization management models are calculated from the plurality of experience samples by the following formulas:
F_ES = (1/ω) Σ_{i=1}^{ω} [R_i^ES + ζ max Q*(s_{i+1}^ES, a_{i+1}^ES) - Q(s_i^ES, a_i^ES)]²
F_HS = (1/ω) Σ_{i=1}^{ω} [R_i^HS + ζ max Q*(s_{i+1}^HS, a_{i+1}^HS) - Q(s_i^HS, a_i^HS)]²
F_GS = (1/ω) Σ_{i=1}^{ω} [R_i^GS + ζ max Q*(s_{i+1}^GS, a_{i+1}^GS) - Q(s_i^GS, a_i^GS)]²
wherein F_ES is the system loss function value of the power parameter optimization management model, F_HS is the system loss function value of the thermal parameter optimization management model, F_GS is the system loss function value of the gas parameter optimization management model, ω is the number of experience samples, θ and ζ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, R_i is the value of the reward function at time i, max denotes taking the maximum value, Q*(s_{i+1}, a_{i+1}) is the value of the value function of the target neural network at time i+1, Q(s_i, a_i) is the value of the value function at time i; s_i^ES, s_i^HS and s_i^GS are respectively the state spaces of the power system, the thermodynamic system and the gas system at time i; a_i^ES, a_i^HS and a_i^GS are respectively the action spaces of the power system, the thermodynamic system and the gas system at time i; R_i^ES, R_i^HS and R_i^GS are respectively the reward function sets of the power system, the thermodynamic system and the gas system at time i; s_i and s_{i+1} respectively denote the states at times i and i+1, and a_i and a_{i+1} respectively denote the actions at times i and i+1.
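Assuming the system loss takes the standard DQN mean-squared TD-error form implied by the variables above, one system's loss over ω replay samples can be sketched as follows; q_net and target_net stand for the evaluation network and the target network and are assumed to return a vector of Q values over all actions.

```python
import numpy as np

def system_loss(samples, q_net, target_net, zeta=0.9):
    """F = (1/omega) * sum_i [R_i + zeta * max_a Q*(s_{i+1}, a) - Q(s_i, a_i)]^2
    over the omega experience samples drawn from one system's replay pool."""
    errors = []
    for s_i, a_i, r_i, s_next in samples:
        td_target = r_i + zeta * np.max(target_net(s_next))   # target network value Q*
        errors.append((td_target - q_net(s_i)[a_i]) ** 2)
    return float(np.mean(errors))
```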
Calculating the gradient information of the three optimization management models for the comprehensive loss function includes:
the gradient information of the power parameter optimization management model for the comprehensive loss function is ∂F/∂w_t^ES;
the gradient information of the thermal parameter optimization management model for the comprehensive loss function is ∂F/∂w_t^HS;
the gradient information of the gas parameter optimization management model for the comprehensive loss function is ∂F/∂w_t^GS;
wherein F is the comprehensive loss function, w_t^ES is the evaluation network parameter of the power parameter optimization management model at time t, w_t^HS is the evaluation network parameter of the thermal parameter optimization management model at time t, and w_t^GS is the evaluation network parameter of the gas parameter optimization management model at time t.
The comprehensive loss function is calculated by the following formula:
F = F_ES + F_HS + F_GS
wherein F is the comprehensive loss function, F_ES is the system loss function value of the power parameter optimization management model, F_HS is the system loss function value of the thermal parameter optimization management model, and F_GS is the system loss function value of the gas parameter optimization management model.
The evaluation network parameters of the three optimization management models are updated according to the gradient information by the following formulas:
w_t^ES = w_{t-1}^ES - θ · ∂F/∂w_{t-1}^ES
w_t^HS = w_{t-1}^HS - θ · ∂F/∂w_{t-1}^HS
w_t^GS = w_{t-1}^GS - θ · ∂F/∂w_{t-1}^GS
wherein θ is the learning rate of reinforcement learning, F is the comprehensive loss function, w_t^ES, w_t^HS and w_t^GS are respectively the evaluation network parameters of the power, thermal and gas parameter optimization management models at time t, and w_{t-1}^ES, w_{t-1}^HS and w_{t-1}^GS are respectively the evaluation network parameters of the power, thermal and gas parameter optimization management models at time t-1.
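The coupling of the three models through the comprehensive loss can be sketched with automatic differentiation as below; this is a hedged illustration (standard gradient descent on the summed loss), and the dictionary of torch networks is an assumption of the example.

```python
import torch

def collaborative_update(f_es, f_hs, f_gs, nets, theta=0.01):
    """Form F = F_ES + F_HS + F_GS, compute the gradient of F with respect to each
    evaluation network's parameters, and take one gradient-descent step per model."""
    total_loss = f_es + f_hs + f_gs
    params = [p for net in nets.values() for p in net.parameters()]
    grads = torch.autograd.grad(total_loss, params)     # dF/dw for every parameter
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= theta * g                              # w_t = w_{t-1} - theta * dF/dw_{t-1}
    return total_loss.item()
```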
After training is completed, the energy data are input into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and parameter optimization is completed.
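After training, obtaining the optimal parameter set can be sketched as reading off, for each energy system, the action with the highest Q value; the attribute name q_net used here is an illustrative assumption.

```python
def optimal_parameter_set(models, energy_data):
    """Feed each energy system's data to its trained optimization management model and
    take the argmax over the output Q values as that system's optimal parameter set."""
    best = {}
    for name, model in models.items():
        q_values = model.q_net(energy_data[name])   # Q value of every candidate parameter set
        best[name] = int(q_values.argmax())         # index of the optimal parameter set
    return best
```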
Taking the power generation energy system as an example, according to the requirement of electric quantity balance, the reliability of power supply is measured by the electric load. The difference between the original electric load obtained by testing and the optimized electric load is shown in fig. 4, with the upper half before optimization and the lower half after optimization; the power load changes obviously before and after optimization, the electric load shows a clearly reduced trend after optimization compared with the original electric load, and the balance of the electric load is stronger.
Example 2
The embodiment of the invention provides an energy system parameter optimization system based on deep reinforcement learning, which comprises the following components:
and a data acquisition module: the method comprises the steps of acquiring energy data of each energy system;
parameter optimization module: the method comprises the steps of inputting energy data into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization;
the parameter optimization module comprises a model training unit and is used for carrying out cyclic training on the parameter optimization model until the preset training times are reached by the following method, so that training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
Example 3
The embodiment of the invention provides an energy system parameter optimization device based on deep reinforcement learning, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform steps in accordance with the method of:
acquiring energy data of each energy system;
inputting the energy data into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization;
the parameter optimization model comprises three interconnected optimization management models based on a DQN algorithm: the system comprises an electric power parameter optimization management model, a thermal parameter optimization management model and a fuel gas parameter optimization management model, wherein the three optimization management models are respectively provided with an experience playback pool for storing experience samples;
the parameter optimization model is subjected to cyclic training by the following method until the preset training times are reached, and training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
Example 4
Embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring energy data of each energy system;
inputting the energy data into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization;
the parameter optimization model comprises three interconnected optimization management models based on a DQN algorithm: the system comprises an electric power parameter optimization management model, a thermal parameter optimization management model and a fuel gas parameter optimization management model, wherein the three optimization management models are respectively provided with an experience playback pool for storing experience samples;
the parameter optimization model is subjected to cyclic training by the following method until the preset training times are reached, and training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (9)

1. The energy system parameter optimization method based on deep reinforcement learning is characterized by comprising the following steps of:
acquiring energy data of each energy system;
inputting the energy data into a trained parameter optimization model to obtain an optimal parameter set of each energy system, and completing parameter optimization;
the parameter optimization model comprises three interconnected optimization management models based on a DQN algorithm: the system comprises an electric power parameter optimization management model, a thermal parameter optimization management model and a fuel gas parameter optimization management model, wherein the three optimization management models are respectively provided with an experience playback pool for storing experience samples;
the parameter optimization model is subjected to cyclic training by the following method until the preset training times are reached, and training is completed:
first, individual training is performed: in each independent training round of the three optimization management models, respectively acquiring respective action spaces, respectively selecting optimal actions from the respective action spaces, calculating the value of a reward function after executing the optimal actions through respective evaluation networks, calculating the value of a value function according to the value of the reward function, and putting the value of the value function and the value of the reward function of each optimization management model obtained in the independent training round into an experience playback pool of each optimization management model as an experience sample;
then, co-training is performed: and respectively extracting a plurality of experience samples from the experience playback pools, calculating respective system loss functions of the three optimization management models according to the experience samples, adding the system loss functions to obtain a comprehensive loss function, calculating gradient information of the three optimization management models on the comprehensive loss function, and updating evaluation network parameters of the three optimization management models according to the gradient information.
2. The method for optimizing parameters of an energy system based on deep reinforcement learning of claim 1, wherein said selecting optimal actions from respective action spaces comprises:
trial-and-error selection is performed with an ε-greedy strategy: with probability ε an action a_t is selected using a random strategy, and with probability 1-ε the optimal action a_t* is selected, where ε is the trial-and-error probability and a_t and a_t* satisfy the following relationship:
a_t* = arg max_{a_t} Q(s_t, a_t)
wherein s_t is the state at time t in the state space of the energy system, Q(s_t, a_t) is the value of the value function at time t, and arg max denotes taking the argument that maximizes the function.
3. The method for optimizing energy system parameters based on deep reinforcement learning according to claim 1, wherein the calculating the value of the value function according to the value of the reward function is performed by the following assignment formula:
Q(s_t, a_t) ← Q(s_t, a_t) + θ[R_t + ζ max Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]
wherein Q(s_t, a_t) is the value of the value function at time t, θ and ζ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, R_t is the value of the reward function at time t, max denotes taking the maximum value, and Q(s_{t+1}, a_{t+1}) is the value of the value function at time t+1.
4. The method for optimizing energy system parameters based on deep reinforcement learning according to claim 1, wherein the respective system loss functions of the three optimization management models are calculated from the plurality of experience samples by the following formulas:

$$F_{ES} = \frac{1}{\omega}\sum_{i=1}^{\omega}\left[R_i + \gamma \max_{a_{i+1}} Q^{*}(s_{i+1}, a_{i+1}) - Q(s_i, a_i)\right]^2,\quad s_i \in S_i^{ES},\ a_i \in A_i^{ES},\ R_i \in R_i^{ES}$$

$$F_{HS} = \frac{1}{\omega}\sum_{i=1}^{\omega}\left[R_i + \gamma \max_{a_{i+1}} Q^{*}(s_{i+1}, a_{i+1}) - Q(s_i, a_i)\right]^2,\quad s_i \in S_i^{HS},\ a_i \in A_i^{HS},\ R_i \in R_i^{HS}$$

$$F_{GS} = \frac{1}{\omega}\sum_{i=1}^{\omega}\left[R_i + \gamma \max_{a_{i+1}} Q^{*}(s_{i+1}, a_{i+1}) - Q(s_i, a_i)\right]^2,\quad s_i \in S_i^{GS},\ a_i \in A_i^{GS},\ R_i \in R_i^{GS}$$

where $F_{ES}$ is the system loss function value of the power parameter optimization management model, $F_{HS}$ is the system loss function value of the thermodynamic parameter optimization management model, $F_{GS}$ is the system loss function value of the gas parameter optimization management model, ω is the number of experience samples, θ and γ are respectively the learning rate and the reward attenuation coefficient of reinforcement learning, $R_i$ is the value of the reward function at time i, max takes the maximum value, $Q^{*}(s_{i+1}, a_{i+1})$ is the value of the value function of the target neural network at time i+1, $Q(s_i, a_i)$ is the value of the value function at time i, $S_i^{ES}$, $S_i^{HS}$ and $S_i^{GS}$ are respectively the state spaces of the power system, the thermodynamic system and the fuel gas system at time i, $A_i^{ES}$, $A_i^{HS}$ and $A_i^{GS}$ are respectively the action spaces of the power system, the thermodynamic system and the fuel gas system at time i, $R_i^{ES}$, $R_i^{HS}$ and $R_i^{GS}$ are respectively the reward function sets of the power system, the thermodynamic system and the fuel gas system at time i, $s_i$ and $s_{i+1}$ respectively denote the states at times i and i+1, and $a_i$ and $a_{i+1}$ respectively denote the actions at times i and i+1.
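The per-subsystem loss could be computed as in the following sketch, which follows the mean-squared temporal-difference form reconstructed above; q_eval and q_target stand for the evaluation and target networks and are assumed, for illustration only, to return an array of Q-values for a given state.

```python
import numpy as np

def system_loss(samples, q_eval, q_target, gamma=0.95):
    """F = (1/omega) * sum_i [ R_i + gamma * max_a Q*(s_{i+1}, a) - Q(s_i, a_i) ]^2
    over omega experience samples drawn from one subsystem's playback pool."""
    omega = len(samples)
    total = 0.0
    for s_i, a_i, r_i, s_next in samples:
        td_target = r_i + gamma * np.max(q_target(s_next))   # target-network value at time i+1
        total += (td_target - q_eval(s_i)[a_i]) ** 2          # evaluation-network value at time i
    return total / omega
```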
5. The method for optimizing parameters of an energy system based on deep reinforcement learning according to claim 4, wherein the calculating gradient information of the three optimization management models with respect to the comprehensive loss function comprises:

the gradient information of the power parameter optimization management model with respect to the comprehensive loss function is $\partial F / \partial \phi_t^{ES}$;

the gradient information of the thermodynamic parameter optimization management model with respect to the comprehensive loss function is $\partial F / \partial \phi_t^{HS}$;

the gradient information of the gas parameter optimization management model with respect to the comprehensive loss function is $\partial F / \partial \phi_t^{GS}$;

where F is the comprehensive loss function, $\phi_t^{ES}$ is the evaluation network parameter of the power parameter optimization management model at time t, $\phi_t^{HS}$ is the evaluation network parameter of the thermodynamic parameter optimization management model at time t, and $\phi_t^{GS}$ is the evaluation network parameter of the gas parameter optimization management model at time t.
6. The method for optimizing energy system parameters based on deep reinforcement learning according to claim 5, wherein the evaluation network parameters of the three optimization management models are updated according to the gradient information by the following formulas:

$$\phi_t^{ES} = \phi_{t-1}^{ES} - \theta\,\frac{\partial F}{\partial \phi_{t-1}^{ES}}$$

$$\phi_t^{HS} = \phi_{t-1}^{HS} - \theta\,\frac{\partial F}{\partial \phi_{t-1}^{HS}}$$

$$\phi_t^{GS} = \phi_{t-1}^{GS} - \theta\,\frac{\partial F}{\partial \phi_{t-1}^{GS}}$$

where θ is the learning rate of reinforcement learning, F is the comprehensive loss function, $\phi_t^{ES}$, $\phi_t^{HS}$ and $\phi_t^{GS}$ are respectively the evaluation network parameters of the power, thermodynamic and gas parameter optimization management models at time t, and $\phi_{t-1}^{ES}$, $\phi_{t-1}^{HS}$ and $\phi_{t-1}^{GS}$ are the corresponding evaluation network parameters at time t-1.
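A PyTorch sketch of the update in claim 6, performing a plain gradient step with learning rate theta on each evaluation network using the gradient of the comprehensive loss F; the manual-SGD form and the function name are illustrative assumptions. F here would typically be the comprehensive loss produced by a step like co_training_step above.

```python
import torch

def update_evaluation_networks(eval_nets, F, theta=0.01):
    """phi_t = phi_{t-1} - theta * dF/dphi_{t-1}, applied in turn to the
    power, thermodynamic and gas evaluation networks."""
    # Compute each network's gradient of F first (every network contributes to F).
    grads = [torch.autograd.grad(F, list(net.parameters()), retain_graph=True)
             for net in eval_nets]
    with torch.no_grad():
        for net, net_grads in zip(eval_nets, grads):
            for p, g in zip(net.parameters(), net_grads):
                p -= theta * g        # plain gradient step with learning rate theta
```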
7. An energy system parameter optimization system based on deep reinforcement learning, which is characterized by comprising:
a data acquisition module, configured to acquire energy data of each energy system;
a parameter optimization module, configured to input the energy data into a trained parameter optimization model to obtain the optimal parameter set of each energy system, thereby completing parameter optimization;
wherein the parameter optimization module comprises a model training unit, configured to cyclically train the parameter optimization model by the following method until a preset number of training iterations is reached, thereby completing training:
first, individual training is performed: in each individual training round of the three optimization management models, the respective action spaces are acquired, optimal actions are respectively selected from the respective action spaces, the value of the reward function after executing the optimal action is calculated through the respective evaluation networks, the value of the value function is calculated according to the value of the reward function, and the value of the value function and the value of the reward function obtained by each optimization management model in the individual training round are put, as an experience sample, into the experience playback pool of that optimization management model;
then, co-training is performed: a plurality of experience samples are respectively extracted from the experience playback pools, the respective system loss functions of the three optimization management models are calculated from the experience samples, the system loss functions are added to obtain a comprehensive loss function, gradient information of the three optimization management models with respect to the comprehensive loss function is calculated, and the evaluation network parameters of the three optimization management models are updated according to the gradient information.
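To make the module division of claim 7 concrete, a hypothetical Python sketch of the two modules follows; the class names, the src.read() data-source interface and the best_parameters() model interface are assumptions made for illustration, not part of the claimed system.

```python
class DataAcquisitionModule:
    """Acquires energy data from each energy subsystem (power, thermodynamic, gas)."""
    def __init__(self, sources):
        self.sources = sources                      # e.g. {"power": ..., "thermal": ..., "gas": ...}

    def acquire(self):
        return {name: src.read() for name, src in self.sources.items()}


class ParameterOptimizationModule:
    """Feeds the acquired energy data into a trained parameter optimization model
    and returns the optimal parameter set for each subsystem."""
    def __init__(self, trained_model):
        self.model = trained_model

    def optimize(self, energy_data):
        return {name: self.model.best_parameters(name, data)
                for name, data in energy_data.items()}
```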
8. An energy system parameter optimization device based on deep reinforcement learning, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor being operative according to the instructions to perform the steps of the method according to any one of claims 1 to 6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202310182092.9A 2023-02-24 2023-02-24 Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium Pending CN116307136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310182092.9A CN116307136A (en) 2023-02-24 2023-02-24 Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310182092.9A CN116307136A (en) 2023-02-24 2023-02-24 Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN116307136A true CN116307136A (en) 2023-06-23

Family

ID=86795453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310182092.9A Pending CN116307136A (en) 2023-02-24 2023-02-24 Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN116307136A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378456A (en) * 2021-05-21 2021-09-10 青海大学 Multi-park comprehensive energy scheduling method and system
CN113902040A (en) * 2021-11-15 2022-01-07 中国电力科学研究院有限公司 Method, system, equipment and storage medium for coordinating and optimizing electricity-heat comprehensive energy system
CN114819337A (en) * 2022-04-25 2022-07-29 华北电力大学 Multi-task learning-based comprehensive energy system multi-load prediction method
WO2022160705A1 (en) * 2021-01-26 2022-08-04 中国电力科学研究院有限公司 Method and apparatus for constructing dispatching model of integrated energy system, medium, and electronic device
CN114880929A (en) * 2022-05-11 2022-08-09 中国电力科学研究院有限公司 Deep reinforcement learning-based multi-energy flow optimization intelligent simulation method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈明昊 et al.: "Collaborative training and optimization management method for residential community integrated energy systems based on vertical federated reinforcement learning", 中国电机工程学报 (Proceedings of the CSEE), vol. 42, no. 15, pages 1-2 *

Similar Documents

Publication Publication Date Title
CN112186799B (en) Distributed energy system autonomous control method and system based on deep reinforcement learning
Wang et al. Deep reinforcement learning method for demand response management of interruptible load
Li et al. A reinforcement learning based RMOEA/D for bi-objective fuzzy flexible job shop scheduling
Qiu et al. Safe reinforcement learning for real-time automatic control in a smart energy-hub
CN109347149A (en) Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
Hota et al. Short-term hydrothermal scheduling through evolutionary programming technique
Patyn et al. Comparing neural architectures for demand response through model-free reinforcement learning for heat pump control
Lu et al. Reward shaping-based actor–critic deep reinforcement learning for residential energy management
CN111181201B (en) Multi-energy park scheduling method and system based on double-layer reinforcement learning
CN116207739B (en) Optimal scheduling method and device for power distribution network, computer equipment and storage medium
CN114243797A (en) Distributed power supply optimal scheduling method, system, equipment and storage medium
CN116345578B (en) Micro-grid operation optimization scheduling method based on depth deterministic strategy gradient
Zhang et al. Data-driven cooperative trading framework for a risk-constrained wind integrated power system considering market uncertainties
Tittaferrante et al. Multiadvisor reinforcement learning for multiagent multiobjective smart home energy control
Chuang et al. Deep reinforcement learning based pricing strategy of aggregators considering renewable energy
CN117374937A (en) Multi-micro-grid collaborative optimization operation method, device, equipment and medium
CN116307136A (en) Deep reinforcement learning-based energy system parameter optimization method, system, device and storage medium
CN116227883A (en) Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning
Xiong et al. Interpretable Deep Reinforcement Learning for Optimizing Heterogeneous Energy Storage Systems
Govardhan et al. Comparative analysis of economic viability with distributed energy resources on unit commitment
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
Zhang et al. Learning-based scheduling of integrated charging-storage-discharging station for minimizing electric vehicle users' cost
Mishra et al. An innovative multi-head attention model with BiMGRU for real-time electric vehicle charging management through deep reinforcement learning
Latha A Machine Learning Approach for Generation Scheduling in Electricity Markets
CN116681269B (en) Intelligent collaborative operation optimization method for power grid interactive type efficient residential building

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination