CN111478326B - Comprehensive energy optimization method and device based on model-free reinforcement learning - Google Patents

Publication number
CN111478326B
Authority
CN
China
Prior art keywords
energy
preset
model
energy supply
supply guide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010397747.0A
Other languages
Chinese (zh)
Other versions
CN111478326A (en
Inventor
雷金勇
郭祚刚
袁智勇
徐敏
黎小林
王�琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Co Ltd
Research Institute of Southern Power Grid Co Ltd
Original Assignee
China Southern Power Grid Co Ltd
Research Institute of Southern Power Grid Co Ltd
Application filed by China Southern Power Grid Co Ltd and Research Institute of Southern Power Grid Co Ltd
Priority to CN202010397747.0A
Publication of CN111478326A
Application granted
Publication of CN111478326B

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008 Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S50/00 Market activities related to the operation of systems integrating technologies related to power network operation or related to communication or information technologies
    • Y04S50/16 Energy services, e.g. dispersed generation or demand or load or energy savings aggregation

Abstract

The application discloses a comprehensive energy optimization method and device based on model-free reinforcement learning. The method comprises the following steps: acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model; inputting the energy supply guidance signal sample into a preset neural network and training the network according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term; performing a reward simulation calculation on the energy exchange amount through a Monte Carlo algorithm to obtain an optimal energy supply guidance signal; and substituting the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions. The method and device address the technical problem that model-based energy optimization techniques for comprehensive energy systems have low applicability and low efficiency.

Description

Comprehensive energy optimization method and device based on model-free reinforcement learning
Technical Field
The application relates to the technical field of energy systems, in particular to a comprehensive energy optimization method and device based on model-free reinforcement learning.
Background
In order to actively promote the adjustment of the energy structure, cope with the shortage of fossil energy, and strengthen environmental protection, China has in recent years begun to implement an energy development strategy of replacing coal with electricity and replacing coal with gas. As a result, the coupling among energy sources has become increasingly tight, the existing mode in which each energy source is planned separately and operated independently has been broken, and park comprehensive energy systems have gradually formed, in which multiple systems such as power distribution and gas distribution operate in coordination and multiple energy sources complement and support one another.
In recent years, new demand-side energy resources have played an increasingly important role in securing the economy and safety of park energy systems. Safe and stable operation of the park comprehensive energy system is an important guarantee of reliable energy supply. Because load terminals in the system consume energy in various forms, and cooling and heating demands differ in character, change frequently, and have large peak-valley differences, the system voltage and gas pressure fluctuate widely and are very unevenly distributed over long time scales. This interferes with the normal operation of equipment, degrades the quality and stability of energy supply, and increases line power flow fluctuation and the risk of micro gas turbines disconnecting from the grid, which challenges the safe operation of the park comprehensive energy system. Existing energy optimization methods for park comprehensive energy systems are mainly model-based: they establish mathematical equations to describe energy scheduling. However, such methods cannot guarantee convergence of the algorithm, and their iterative computation consumes considerable time and resources.
Disclosure of Invention
The application provides a comprehensive energy optimization method and device based on model-free reinforcement learning, which are used for solving the technical problems of low applicability and low efficiency of a comprehensive energy system energy optimization technology based on a model.
In view of the above, a first aspect of the present application provides a comprehensive energy optimization method based on model-free reinforcement learning, including:
acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model;
inputting the energy supply guidance signal sample into a preset neural network, and training the network according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term;
performing a reward simulation calculation on the energy exchange amount through a Monte Carlo algorithm to obtain an optimal energy supply guidance signal;
and substituting the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
Preferably, the preset integrated energy service provider model is as follows:
Figure BDA0002488286640000021
where α is a weighting factor, λ(t) is the energy supply guidance signal,
Figure BDA0002488286640000022
and
Figure BDA0002488286640000023
are respectively the energy exchange amount between the park comprehensive energy system and the distribution network in time period t, and the maximum and average energy exchange amounts within N_T; ε_m is a conversion factor; profit_base is the profit of the distribution network comprehensive energy service provider; N_T and N_m are respectively the total time and the number of park comprehensive energy subsystems;
Figure BDA0002488286640000024
and
Figure BDA0002488286640000025
respectively satisfy the following constraint relationships:
Figure BDA0002488286640000026
Figure BDA0002488286640000027
Preferably, before inputting the energy supply guidance signal sample into the preset neural network and training the network according to the preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, the method further comprises:
converting a selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guidance signal;
and normalizing the energy supply guidance signal to obtain the energy supply guidance signal sample.
Preferably, inputting the energy supply guidance signal sample into the preset neural network and training the network according to the preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network comprises:
selecting a mean square error function as the training loss function of the preset neural network;
adding the norm penalty term, calculated by regularization, to the training loss function to obtain the preset loss function;
and inputting the energy supply guidance signal sample into the preset neural network for training to obtain the energy exchange amount between the park comprehensive energy system and the distribution network.
Preferably, performing the reward simulation calculation on the energy exchange amount through the Monte Carlo algorithm to obtain the optimal energy supply guidance signal comprises:
performing the reward simulation calculation through the Monte Carlo algorithm according to the energy exchange amount, a preset reward weight and a preset number of simulations to obtain the optimal energy supply guidance signal.
The second aspect of the present application provides a comprehensive energy optimization device based on model-free reinforcement learning, including:
an acquisition module, used for acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model;
a training module, used for inputting the energy supply guidance signal sample into a preset neural network and training the network according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term;
a calculation module, used for performing a reward simulation calculation on the energy exchange amount through a Monte Carlo algorithm to obtain an optimal energy supply guidance signal;
and an optimization solving module, used for substituting the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
Preferably, the preset integrated energy service provider model is as follows:
Figure BDA0002488286640000031
where α is a weighting factor, λ(t) is the energy supply guidance signal,
Figure BDA0002488286640000038
and
Figure BDA0002488286640000033
are respectively the energy exchange amount between the park comprehensive energy system and the distribution network in time period t, and the maximum and average energy exchange amounts within N_T; ε_m is a conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are respectively the total time and the number of park comprehensive energy subsystems;
Figure BDA0002488286640000034
and
Figure BDA0002488286640000035
respectively satisfy the following constraint relationships:
Figure BDA0002488286640000036
Figure BDA0002488286640000037
Preferably, the device further comprises:
a preprocessing module, used for converting a selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guidance signal,
and normalizing the energy supply guidance signal to obtain the energy supply guidance signal sample.
Preferably, the training module is specifically configured to:
selecting a mean square error function as a training loss function of the preset neural network;
adding the norm penalty term obtained according to regularization calculation into the training loss function to obtain the preset loss function;
and inputting the energy supply guide signal sample into a preset neural network for training to obtain the energy exchange quantity of the park comprehensive energy system and the distribution network.
Preferably, the calculation module is specifically configured to:
performing the reward simulation calculation through the Monte Carlo algorithm according to the energy exchange amount, a preset reward weight and a preset number of simulations to obtain the optimal energy supply guidance signal.
According to the technical scheme, the embodiment of the application has the following advantages:
The application provides a comprehensive energy optimization method based on model-free reinforcement learning, which comprises the following steps: acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model; inputting the energy supply guidance signal sample into a preset neural network and training the network according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term; performing a reward simulation calculation on the energy exchange amount through a Monte Carlo algorithm to obtain an optimal energy supply guidance signal; and substituting the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
The comprehensive energy optimization method based on model-free reinforcement learning optimizes the energy of the park comprehensive energy system by combining two algorithms, a neural network and Monte Carlo reinforcement learning. The data-driven nature of the neural network is used to train on the energy supply guidance signals, so that the energy exchange amount between the park comprehensive energy system and the distribution network is represented with high accuracy and high computational efficiency. The Monte Carlo reinforcement learning method can extract the information hidden in the data and has good applicability: even with a preset energy optimization model that carries constraint conditions, the moderate increase in computation does not make the algorithm inapplicable. Therefore, the comprehensive energy optimization method based on model-free reinforcement learning can solve the technical problem that model-based energy optimization techniques for comprehensive energy systems have low applicability and low efficiency.
Drawings
FIG. 1 is a schematic flowchart of a comprehensive energy optimization method based on model-free reinforcement learning according to an embodiment of the present disclosure;
FIG. 2 is another schematic flowchart of a comprehensive energy optimization method based on model-free reinforcement learning according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a comprehensive energy optimization device based on model-free reinforcement learning according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, referring to FIG. 1, a first embodiment of the comprehensive energy optimization method based on model-free reinforcement learning provided by the present application includes:
Step 101: acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model.
It should be noted that the energy supply guidance signal sample is in fact a variable related to the retail price of electricity, so it can be obtained directly from the preset comprehensive energy service provider model. In practice, the preset comprehensive energy service provider model reflects the revenue of the energy service provider: the larger the revenue, the better for the development of the energy service provider.
Step 102: inputting the energy supply guidance signal sample into a preset neural network, and training the network according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term.
It should be noted that the preset neural network is a network that has been constructed in advance and can be used directly. The energy supply guidance signal samples serve as training data, and the energy exchange amount between the park comprehensive energy system and the distribution network is the output of the neural network. The training is essentially a regression analysis, and the energy supply guidance signal sample data must be preprocessed before training, which reduces the bias of the training data and improves the accuracy of the regression and the validity of the results. The norm penalty term is added to the preset loss function because the park comprehensive energy system contains uncertain variables such as distributed generation power and load fluctuation. These uncertain factors can make the power fluctuation deviate upward or downward, so such variables may cause overfitting during training. To address this, a regularization algorithm is introduced and a norm penalty term is calculated, so that the loss function better fits actual requirements.
Step 103: performing a reward simulation calculation on the energy exchange amount through a Monte Carlo algorithm to obtain an optimal energy supply guidance signal.
It should be noted that, when the problem to be solved is the probability of a random event or the expected value of a random variable, the Monte Carlo algorithm uses repeated "experiments" to estimate the probability of the event from the frequency with which it occurs, or to obtain certain numerical characteristics of the random variable, and takes the result as the solution of the problem. In this embodiment, the calculation operates on the energy exchange amount and finds the energy supply guidance signal corresponding to the optimal energy exchange amount.
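The sample-average idea described above can be sketched as follows. This is a minimal illustration, not the procedure of the patent: the reward function `simulated_reward`, its peak at 0.6, the noise level, and the candidate grid are all hypothetical stand-ins for the real system response.

```python
import random


def simulated_reward(signal, rng):
    """Hypothetical noisy reward for a candidate signal (stand-in for the real system)."""
    # Assumed shape: the reward peaks at signal = 0.6; Gaussian noise models uncertainty.
    return -(signal - 0.6) ** 2 + rng.gauss(0.0, 0.05)


def monte_carlo_best_signal(candidates, n_simulations, seed=0):
    """Estimate each candidate's expected reward by its sample average and pick the best."""
    rng = random.Random(seed)
    averages = {}
    for signal in candidates:
        total = sum(simulated_reward(signal, rng) for _ in range(n_simulations))
        averages[signal] = total / n_simulations
    best = max(averages, key=averages.get)
    return best, averages


# Hypothetical candidate guidance signals (normalized values).
candidates = [0.2, 0.4, 0.6, 0.8, 1.0]
best, averages = monte_carlo_best_signal(candidates, n_simulations=5000)
print(best, averages[best])
```

With enough simulations per candidate, the sample averages separate cleanly and the candidate with the highest expected reward is selected.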
Step 104: substituting the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
It should be noted that the preset energy optimization model aims to minimize the running cost for a given energy supply guidance signal related to the retail price. Because the optimal energy supply guidance signal is closely tied to the retail price, it best reflects changes in the retail price, which in turn optimizes the solution of the energy optimization model. The preset energy scheduling function and the preset constraint conditions together form the preset energy optimization model.
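As a toy illustration of minimizing running cost under a given guidance signal, the sketch below splits each period's load between grid purchase and one local unit. The source does not give the scheduling function or constraints at this point, so everything here is assumed: the single local unit (for example a micro gas turbine), its constant unit cost, its capacity limit, and the absence of storage or ramping limits, which is what makes a per-period rule sufficient.

```python
def dispatch_period(load, price, unit_cost, unit_capacity):
    """Split one period's load between grid purchase and a local unit at minimum cost."""
    # Assumed constraints: grid + local == load, with 0 <= local <= unit_capacity.
    local = min(load, unit_capacity) if unit_cost < price else 0.0
    grid = load - local
    cost = price * grid + unit_cost * local
    return grid, local, cost


def schedule(loads, prices, unit_cost, unit_capacity):
    """Apply the per-period rule over the horizon and return the plan and total cost."""
    plan = [dispatch_period(l, p, unit_cost, unit_capacity) for l, p in zip(loads, prices)]
    total_cost = sum(cost for _, _, cost in plan)
    return plan, total_cost


# Hypothetical two-period example (illustrative numbers only).
loads = [3.0, 5.0]
prices = [0.5, 1.2]   # energy supply guidance signal per period
plan, total_cost = schedule(loads, prices, unit_cost=0.8, unit_capacity=4.0)
print(plan, total_cost)
```

In the first period the grid is cheaper than the local unit, so all load is bought from the grid; in the second the local unit runs at capacity and the grid covers the remainder.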
It should be noted that, although this embodiment involves economic matters such as retail price and operator revenue, these are necessary technical features of the embodiment. The embodiment mainly addresses technical problems in model computation: the optimization method is designed to improve the adaptability and efficiency of the energy optimization algorithm or optimization model of the comprehensive energy system.
The comprehensive energy optimization method based on model-free reinforcement learning provided by this embodiment optimizes the energy of the park comprehensive energy system by combining two algorithms, a neural network and Monte Carlo reinforcement learning. The data-driven nature of the neural network is used to train on the energy supply guidance signals, so that the energy exchange amount between the park comprehensive energy system and the distribution network is represented with high accuracy and high computational efficiency. The Monte Carlo reinforcement learning method can extract the information hidden in the data and has good applicability: even with a preset energy optimization model that carries constraint conditions, the moderate increase in computation does not make the algorithm inapplicable. Therefore, the method provided by this embodiment can solve the technical problem that model-based energy optimization techniques for comprehensive energy systems have low applicability and low efficiency.
For ease of understanding, referring to FIG. 2, a second embodiment of the comprehensive energy optimization method based on model-free reinforcement learning provided by the present application includes:
Step 201: acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model.
It should be noted that the preset integrated energy service provider model is as follows:
Figure BDA0002488286640000071
In the formula, the first part is the revenue that the comprehensive energy service provider obtains by selling electricity to the park comprehensive energy system, and the second part is the peak-to-average ratio over the whole scheduling period, that is, the ratio of the maximum to the average power exchange of the park comprehensive energy system; α is a weighting factor, λ(t) is the energy supply guidance signal,
Figure BDA0002488286640000072
Figure BDA0002488286640000073
and
Figure BDA0002488286640000074
are respectively the energy exchange amount between the park comprehensive energy system and the distribution network in time period t, and the maximum and average energy exchange amounts within N_T; ε_m is a conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are respectively the total time and the number of park comprehensive energy subsystems;
Figure BDA0002488286640000075
and
Figure BDA0002488286640000076
respectively satisfy the following constraint relationships:
Figure BDA0002488286640000077
Figure BDA0002488286640000078
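The two ingredients named above, sales revenue and peak-to-average ratio, can be computed as in the sketch below. The exact formula of the provider model is given only as a figure in the source, so the way `provider_objective` combines the two terms (revenue scaled by a base value, minus the weighted peak-to-average ratio) is an assumption made for illustration, as are all numbers and function names.

```python
def peak_to_average_ratio(exchange):
    """Ratio of the maximum to the average energy exchange over the scheduling period."""
    average = sum(exchange) / len(exchange)
    if average == 0:
        raise ValueError("average exchange must be non-zero")
    return max(exchange) / average


def sales_revenue(signal, exchange):
    """Revenue from selling energy: sum over periods of signal(t) * exchange(t)."""
    return sum(lam * p for lam, p in zip(signal, exchange))


def provider_objective(signal, exchange, alpha, profit_base):
    """Hypothetical weighted objective; the combination of terms is assumed, not sourced."""
    revenue_term = sales_revenue(signal, exchange) / profit_base
    return alpha * revenue_term - (1.0 - alpha) * peak_to_average_ratio(exchange)


# Hypothetical per-period guidance signal and energy exchange amounts.
signal = [0.5, 0.8, 1.2, 1.0]
exchange = [2.0, 4.0, 6.0, 4.0]
print(sales_revenue(signal, exchange))
print(peak_to_average_ratio(exchange))
print(provider_objective(signal, exchange, alpha=0.5, profit_base=10.0))
```

A larger α weights revenue more heavily, while a smaller α favors a flatter exchange profile with a lower peak-to-average ratio.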
Step 202: converting the selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guidance signal.
Step 203: normalizing the energy supply guidance signal to obtain the energy supply guidance signal sample.
These two steps preprocess the selling price samples to obtain the energy supply guidance signal samples. The preset reference value used to convert the selling price sample into a per-unit value can be set to 100; the energy exchange amount output by the network must likewise be converted into a per-unit value, for which the reference value can be set to 1000. The normalization is performed directly with a normalization formula, specifically as follows:
Figure BDA0002488286640000079
where s is the training sample index value, namely the energy supply guidance signal sample,
Figure BDA00024882866400000710
and
Figure BDA00024882866400000711
are respectively the maximum and minimum values of the energy supply guidance signal over the time periods t.
After this preprocessing, the energy supply guidance signal samples are data normalized to [0, 1], which improves the convergence of the algorithm and speeds it up.
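The preprocessing in steps 202 and 203 can be sketched as below. The reference value 100 comes from the text; the price samples are made up for illustration, and min-max scaling is assumed as the normalization formula because the text describes it in terms of the maximum and minimum signal values and a [0, 1] result.

```python
def to_per_unit(values, base):
    """Convert raw values to per-unit values by dividing by a reference (base) value."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return [v / base for v in values]


def min_max_normalize(values):
    """Scale values linearly into [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical selling-price samples (illustrative numbers, not from the source).
prices = [50.0, 80.0, 120.0, 100.0]
signal = to_per_unit(prices, base=100.0)   # reference value 100, as in the text
samples = min_max_normalize(signal)
print(signal)
print(samples)
```

The smallest signal maps to 0 and the largest to 1, so every sample fed to the network lies in [0, 1].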
Step 204: selecting a mean square error function as the training loss function of the preset neural network.
Step 205: adding a norm penalty term, calculated by regularization, to the training loss function to obtain the preset loss function.
It should be noted that the park comprehensive energy system contains uncertain variables, for example distributed generation power and load fluctuation. These uncertain factors may make the power fluctuation larger or smaller, so overfitting may occur during training. To address this, a norm penalty term is added to the preset loss function. The norm penalty term is calculated according to a regularization algorithm, and this embodiment adopts a two-norm penalty term. The mean square error function with the two-norm penalty term added can be expressed as:
Figure BDA0002488286640000081
wherein b is a bias coefficient,
Figure BDA0002488286640000082
is a two-norm penalty term, δ is a canonical parameter, and NSIn order to train the number of samples,
Figure BDA0002488286640000083
for every s trainingThe actual energy exchange capacity of the park comprehensive energy system at each t time intervals of the training samples,
Figure BDA0002488286640000084
is the estimated energy exchange amount of the park comprehensive energy system. In addition, the estimated and actual values of the energy exchange amount satisfy the following condition:
Figure BDA0002488286640000085
ε_m is the conversion coefficient, whose function is to convert the power exchange between the energy-using entities in the system into the power exchange at the point of common coupling of the system. Once the loss function has been calculated, its first-order partial derivatives with respect to the weights and biases are obtained and used to update these variables:
Figure BDA0002488286640000086
where i is the iteration index, l is the hidden layer index, N_L is the total number of hidden layers,
Figure BDA0002488286640000087
is the output value of layer l, and η is the learning rate. The first-order partial derivative with respect to the bias is calculated in the same way as for the weights and is not described again here.
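The loss and update rule above can be sketched as follows. This is an illustration under stated assumptions, not the formula of this application: a single linear layer stands in for the preset neural network, the two-norm penalty is taken as δ times the sum of squared weights, and all function names and data are hypothetical.

```python
def predict(weights, bias, row):
    """Linear stand-in for the network output (assumption, not the real model)."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def regularized_mse(weights, bias, xs, ys, delta):
    """Mean square error plus an assumed two-norm penalty delta * ||w||^2."""
    n = len(xs)
    mse = sum((predict(weights, bias, r) - y) ** 2 for r, y in zip(xs, ys)) / n
    return mse + delta * sum(w * w for w in weights)


def gradient_step(weights, bias, xs, ys, delta, eta):
    """One gradient descent update of weights and bias with learning rate eta."""
    n = len(xs)
    errors = [predict(weights, bias, r) - y for r, y in zip(xs, ys)]
    # First-order partial derivatives of the regularized loss.
    grad_w = [
        2.0 / n * sum(e * r[j] for e, r in zip(errors, xs)) + 2.0 * delta * w
        for j, w in enumerate(weights)
    ]
    grad_b = 2.0 / n * sum(errors)
    new_weights = [w - eta * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - eta * grad_b


# Hypothetical normalized guide signals (inputs) and exchange amounts (targets).
xs = [[0.0], [0.5], [1.0]]
ys = [0.1, 0.6, 1.1]
weights, bias = [0.0], 0.0
initial_loss = regularized_mse(weights, bias, xs, ys, delta=0.001)
for _ in range(200):
    weights, bias = gradient_step(weights, bias, xs, ys, delta=0.001, eta=0.1)
final_loss = regularized_mse(weights, bias, xs, ys, delta=0.001)
```

The penalty only acts on the weights, so the bias gradient is the plain mean-square-error gradient, matching the remark that the bias derivative is computed in the same way as for the weights.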
Step 206, inputting the energy supply guide signal samples into the preset neural network for training, and obtaining the energy exchange amount between the park comprehensive energy system and the distribution network.
It should be noted that the preset neural network is a network that has already been constructed and trained and can be used directly. The energy supply guide signal samples serve as the training data, the energy exchange amount between the park comprehensive energy system and the distribution network is the output of the neural network, and training is based mainly on regression analysis.
Step 207, performing reward simulation calculation through a Monte Carlo algorithm according to the energy exchange amount, a preset reward weight, and a preset number of simulations, to obtain the optimal energy supply guide signal.
It should be noted that the state transition probability in the Markov decision process, that is, the total hourly power exchange of the park energy system including distributed generation, is difficult to obtain. In this embodiment, the comprehensive energy optimization based on model-free reinforcement learning therefore adopts a Monte Carlo reinforcement learning algorithm, which uses the sample-average reward of an action as its reward value. By the law of large numbers, provided there are enough reward samples and enough simulation runs, the sample-average reward approximately equals the true value. In this embodiment, the agent is the park comprehensive energy system; the state is the total hourly power exchanged between the park comprehensive energy system and the power grid:
Figure BDA0002488286640000091
the action is the hourly energy supply guide signal λ(t), t = 1, …, T; and the reward is the hourly revenue from delivering power to the grid:
Figure BDA0002488286640000092
The specific calculation method is as follows:
Select an energy supply guide signal λ^(s)(t) from the energy supply guide signal samples and initialize the counter n(s) ← 0. Loop s' from 1 to N_S: if λ^(s')(t) = λ^(s)(t), then n(s) ← n(s) + 1. Estimate the reward of λ^(s)(t) as a weighted mean:
r(λ^(s)(t)) = (1/n(s)) · (α Σ profit(λ^(s)(t)) − (1 − α) Σ PAR(λ^(s)(t)));
Finally, select λ(t) = argmax r(λ^(s)(t)), s ∈ N_S. In the above calculation, the revenue of the distribution network comprehensive energy service provider is
Figure BDA0002488286640000093
The discount factor γ is between 0 and 1; in this embodiment γ may be set to 0.9, which ensures that the energy supply signal decisions of the distribution network comprehensive energy service provider are highly robust. α is a weight coefficient that balances Σ profit(λ^(s)(t)) against Σ PAR(λ^(s)(t)) in order to find the optimal energy supply guide signal λ(t). PAR is the peak-to-average ratio over the entire scheduling period.
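The reward averaging and signal selection above can be sketched in Python as follows. The sketch treats each simulation run as a tuple of (signal, profit, PAR) for one time period, which is an assumption about how the samples are organized; the discounted revenue formula with γ is not reproduced here, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def peak_to_average_ratio(exchange):
    """Peak-to-average ratio of the energy exchange over a scheduling period."""
    average = sum(exchange) / len(exchange)
    return max(exchange) / average


def select_signal(episodes, alpha):
    """Pick the signal with the highest sample-average weighted reward.

    episodes: iterable of (signal, profit, par) tuples from simulation runs.
    Implements r = (1/n) * (alpha * sum(profit) - (1 - alpha) * sum(PAR)).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for signal, profit, par in episodes:
        totals[signal] += alpha * profit - (1.0 - alpha) * par
        counts[signal] += 1          # n(s): how often this signal was sampled
    rewards = {s: totals[s] / counts[s] for s in totals}
    best = max(rewards, key=rewards.get)
    return best, rewards


# Hypothetical simulation results for two candidate guide signals.
episodes = [
    (0.2, 10.0, 1.5),
    (0.2, 12.0, 1.7),
    (0.8, 20.0, 3.0),
    (0.8, 6.0, 2.0),
]
best_signal, rewards = select_signal(episodes, alpha=0.5)
```

With α closer to 1 the selection favors revenue, and with α closer to 0 it favors a flatter exchange profile (lower PAR).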
Step 208, substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
It should be noted that the preset energy optimization model consists mainly of a preset energy scheduling function and preset constraint conditions, where the preset energy scheduling function is:
Figure BDA0002488286640000094
Figure BDA0002488286640000101
Figure BDA0002488286640000102
wherein C_CHP represents the fuel cost of the micro gas turbine; β_m represents the network loss factor; λ(t) represents the energy supply guide signal;
Figure BDA0002488286640000103
is the energy exchange amount between the park comprehensive energy system and the distribution network;
Figure BDA0002488286640000104
is the energy supply guide signal for the demand response unit;
Figure BDA0002488286640000105
is the demand response power;
Figure BDA0002488286640000106
is a 0-1 variable indicating whether the i-th demand response interval is active; μ_es is the charge-discharge coefficient of the energy storage system; SOC_es(t) is the state of charge of the energy storage system at time t; C_CH4 is the natural gas price; η_CHP is the power generation efficiency of the micro gas turbine; L_HVNG is the lower heating value of natural gas; H_CHP is the heat output of the micro gas turbine; η_L is the heat loss coefficient; η_h is the gas recovery rate; COP_h is the heating coefficient; and P_CHP is the total natural gas consumption of the micro gas turbine.
The preset constraint conditions are as follows:
Figure BDA0002488286640000107
Figure BDA0002488286640000108
Figure BDA0002488286640000109
Figure BDA00024882866400001010
Figure BDA00024882866400001011
Figure BDA00024882866400001012
Figure BDA00024882866400001013
is the upper limit of the schedulable load;
Figure BDA00024882866400001014
and
Figure BDA00024882866400001015
represent the charging amount and the discharging amount of the energy storage, respectively; SOC_es(t) is the state of charge of the energy storage;
Figure BDA00024882866400001016
and
Figure BDA00024882866400001017
are the upper and lower limits of the energy storage capacity, respectively; η_es is the charging efficiency of the energy storage; delta is the length of the time interval;
Figure BDA00024882866400001018
is the schedulable generation output of the generator set.
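As an illustration of how the energy storage quantities defined above interact, the following Python sketch propagates the state of charge and checks it against capacity limits. The update rule is a common textbook form chosen as an assumption; the actual constraint formulas of this application are contained in the figures and are not reproduced here. All names and values are hypothetical.

```python
def simulate_soc(soc0, charge, discharge, eta_es, interval):
    """Propagate the state of charge over consecutive time intervals.

    Assumed update (not taken from the application):
    SOC(t+1) = SOC(t) + (eta_es * P_ch(t) - P_dis(t) / eta_es) * interval
    """
    soc = [soc0]
    for p_ch, p_dis in zip(charge, discharge):
        soc.append(soc[-1] + (eta_es * p_ch - p_dis / eta_es) * interval)
    return soc


def within_limits(soc, soc_min, soc_max):
    """Check that every state of charge stays inside the capacity limits."""
    return all(soc_min <= s <= soc_max for s in soc)


# Hypothetical schedule: charge in the first interval, discharge in the second.
trajectory = simulate_soc(
    soc0=0.5, charge=[0.2, 0.0], discharge=[0.0, 0.1], eta_es=1.0, interval=1.0
)
feasible = within_limits(trajectory, soc_min=0.2, soc_max=0.9)
```

A scheduling solver would reject any candidate schedule for which the limit check fails.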
For ease of understanding, referring to Fig. 3, the present application further provides an embodiment of a comprehensive energy optimization device based on model-free reinforcement learning, including:
the acquisition module 301 is configured to acquire an energy supply guidance signal sample according to a preset comprehensive energy service provider model;
the training module 302 is configured to input the energy supply guide signal samples into a preset neural network and perform network training according to a preset loss function, to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, wherein the preset loss function includes a norm penalty term;
the calculation module 303 is configured to perform reward simulation calculation through a Monte Carlo algorithm according to the energy exchange amount to obtain an optimal energy supply guide signal;
and the optimization solving module 304 is configured to substitute the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, where the preset energy optimization model includes a preset energy scheduling function and preset constraint conditions.
Further, the preset comprehensive energy service provider model is as follows:
Figure BDA0002488286640000111
wherein alpha is a weighting factor, lambda (t) is an energy supply guide signal,
Figure BDA0002488286640000112
and
Figure BDA0002488286640000113
are, respectively, the energy exchange amount between the park comprehensive energy system and the distribution network in time period t, and the maximum and average energy exchange amounts within the time N_T; ε_m is the conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are, respectively, the total time and the number of park comprehensive energy subsystems;
Figure BDA0002488286640000114
and
Figure BDA0002488286640000115
the following constraint relationships are respectively satisfied:
Figure BDA0002488286640000116
Figure BDA0002488286640000117
Further, the device also includes:
the preprocessing module 305 is used for converting the selling price sample into a per unit value according to a preset reference value to obtain an energy supply guiding signal;
and normalizing the energy supply guide signal to obtain an energy supply guide signal sample.
Further, the training module 302 is specifically configured to:
selecting a mean square error function as a training loss function of a preset neural network;
adding a norm penalty term obtained according to regularization calculation into the training loss function to obtain a preset loss function;
and inputting the energy supply guide signal sample into a preset neural network for training to obtain the energy exchange quantity of the park comprehensive energy system and the distribution network.
Further, the calculating module 303 is specifically configured to:
performing reward simulation calculation through a Monte Carlo algorithm according to the energy exchange amount, the preset reward weight, and the preset number of simulations, to obtain an optimal energy supply guide signal.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (8)

1. A comprehensive energy optimization method based on model-free reinforcement learning, characterized by comprising the following steps:
acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model, wherein the preset comprehensive energy service provider model is as follows:
Figure FDA0003115610360000011
wherein alpha is a weighting factor, lambda (t) is an energy supply guide signal,
Figure FDA0003115610360000012
and
Figure FDA0003115610360000013
are, respectively, the energy exchange amount between the park comprehensive energy system and the distribution network in time period t, and the maximum and average energy exchange amounts within the time N_T; ε_m is the conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are, respectively, the total time and the number of park comprehensive energy subsystems;
Figure FDA0003115610360000014
and
Figure FDA0003115610360000015
the following constraint relationships are respectively satisfied:
Figure FDA0003115610360000016
Figure FDA0003115610360000017
inputting the energy supply guide signal sample into a preset neural network, and performing network training according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and a distribution network, wherein the preset loss function comprises a norm penalty term;
performing rewarding simulation calculation according to the energy exchange amount through a Monte Carlo algorithm to obtain an optimal energy supply guide signal;
and substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
2. The comprehensive energy optimization method based on model-free reinforcement learning according to claim 1, wherein, before the energy supply guide signal samples are input into the preset neural network and network training is performed according to the preset loss function to obtain the energy exchange amount between the park comprehensive energy system and the distribution network, the method further comprises the following steps:
converting the selling price sample into a per unit value according to a preset reference value to obtain an energy supply guiding signal;
and normalizing the energy supply guide signal to obtain an energy supply guide signal sample.
3. The method for optimizing energy of comprehensive energy based on model-free reinforcement learning according to claim 1, wherein the step of inputting the energy supply guidance signal samples into a preset neural network and performing network training according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and a distribution network comprises the steps of:
selecting a mean square error function as a training loss function of the preset neural network;
adding the norm penalty term obtained according to regularization calculation into the training loss function to obtain the preset loss function;
and inputting the energy supply guide signal sample into a preset neural network for training to obtain the energy exchange quantity of the park comprehensive energy system and the distribution network.
4. The method for comprehensive energy optimization based on model-free reinforcement learning according to claim 1, wherein the obtaining of the optimal energy supply guidance signal through the rewarding simulation calculation by the Monte Carlo algorithm according to the energy exchange amount comprises:
performing reward simulation calculation through a Monte Carlo algorithm according to the energy exchange amount, the preset reward weight, and the preset number of simulations, to obtain an optimal energy supply guide signal.
5. A comprehensive energy optimization device based on model-free reinforcement learning, characterized by comprising:
the acquisition module is used for acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model, wherein the preset comprehensive energy service provider model is as follows:
Figure FDA0003115610360000021
wherein alpha is a weighting factor, lambda (t) is an energy supply guide signal,
Figure FDA0003115610360000022
and
Figure FDA0003115610360000023
are, respectively, the energy exchange amount between the park comprehensive energy system and the distribution network in time period t, and the maximum and average energy exchange amounts within the time N_T; ε_m is the conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are, respectively, the total time and the number of park comprehensive energy subsystems;
Figure FDA0003115610360000024
and
Figure FDA0003115610360000025
the following constraint relationships are respectively satisfied:
Figure FDA0003115610360000026
Figure FDA0003115610360000027
the training module is used for inputting the energy supply guide signal samples into a preset neural network and performing network training according to a preset loss function to obtain the energy exchange amount between the park comprehensive energy system and a distribution network, wherein the preset loss function comprises a norm penalty term;
the calculation module is used for carrying out reward simulation calculation according to the energy exchange quantity through a Monte Carlo algorithm to obtain an optimal energy supply guide signal;
and the optimization solving module is used for substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, and the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
6. The integrated energy optimization device based on model-free reinforcement learning according to claim 5, further comprising:
the preprocessing module is used for converting the selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guiding signal;
and normalizing the energy supply guide signal to obtain an energy supply guide signal sample.
7. The model-free reinforcement learning-based integrated energy optimization device according to claim 5, wherein the training module is specifically configured to:
selecting a mean square error function as a training loss function of the preset neural network;
adding the norm penalty term obtained according to regularization calculation into the training loss function to obtain the preset loss function;
and inputting the energy supply guide signal sample into a preset neural network for training to obtain the energy exchange quantity of the park comprehensive energy system and the distribution network.
8. The model-free reinforcement learning-based integrated energy optimization device according to claim 5, wherein the computing module is specifically configured to:
performing reward simulation calculation through a Monte Carlo algorithm according to the energy exchange amount, the preset reward weight, and the preset number of simulations, to obtain an optimal energy supply guide signal.
CN202010397747.0A 2020-05-12 2020-05-12 Comprehensive energy optimization method and device based on model-free reinforcement learning Active CN111478326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010397747.0A CN111478326B (en) 2020-05-12 2020-05-12 Comprehensive energy optimization method and device based on model-free reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010397747.0A CN111478326B (en) 2020-05-12 2020-05-12 Comprehensive energy optimization method and device based on model-free reinforcement learning

Publications (2)

Publication Number Publication Date
CN111478326A CN111478326A (en) 2020-07-31
CN111478326B true CN111478326B (en) 2021-09-03

Family

ID=71762522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010397747.0A Active CN111478326B (en) 2020-05-12 2020-05-12 Comprehensive energy optimization method and device based on model-free reinforcement learning

Country Status (1)

Country Link
CN (1) CN111478326B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112421642B (en) * 2020-10-28 2022-07-12 国家电网有限公司 IES (Integrated energy System) reliability assessment method and system
CN114400675B (en) * 2022-01-21 2023-04-07 合肥工业大学 Active power distribution network voltage control method based on weight mean value deep double-Q network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472413A (en) * 2018-11-14 2019-03-15 南方电网科学研究院有限责任公司 Consider the garden integrated energy system Optimization Scheduling of hot pipe network transmission characteristic
CN109685332A (en) * 2018-12-06 2019-04-26 广东电网有限责任公司 A kind of comprehensive energy multiagent balance of interest Optimization Scheduling and equipment
CN110852839A (en) * 2019-10-29 2020-02-28 车主邦(北京)科技有限公司 Method, device and storage medium for interfacing energy service business

Also Published As

Publication number Publication date
CN111478326A (en) 2020-07-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant