CN111478326B - Comprehensive energy optimization method and device based on model-free reinforcement learning - Google Patents
- Publication number
- CN111478326B (application number CN202010397747.0A)
- Authority
- CN
- China
- Prior art keywords
- energy
- preset
- model
- energy supply
- supply guide
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/008—Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S50/00—Market activities related to the operation of systems integrating technologies related to power network operation or related to communication or information technologies
- Y04S50/16—Energy services, e.g. dispersed generation or demand or load or energy savings aggregation
Abstract
The application discloses a comprehensive energy optimization method and device based on model-free reinforcement learning. The method comprises the following steps: acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model; inputting the energy supply guide signal sample into a preset neural network and training the network according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term; performing a reward simulation calculation on the energy exchange quantity through a Monte Carlo algorithm to obtain an optimal energy supply guide signal; and substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions. The method and the device address the low applicability and low efficiency of model-based energy optimization techniques for comprehensive energy systems.
Description
Technical Field
The application relates to the technical field of energy systems, in particular to a comprehensive energy optimization method and device based on model-free reinforcement learning.
Background
To actively promote the adjustment of the energy structure, cope with the shortage of fossil energy, and strengthen environmental protection, China has in recent years begun to implement an energy development strategy of replacing coal with electricity and coal with gas. As a result, the coupling among energy sources has become ever tighter, the existing mode of separate planning and independent operation of each energy source has been broken, and park comprehensive energy systems have gradually formed, in which multiple systems such as power distribution and gas distribution operate in coordination and multiple energy sources complement and support one another.
In recent years, new demand-side energy resources have played an increasingly important role in securing the economy and safety of park energy systems. The safe and stable operation of the park comprehensive energy system is an important guarantee of reliable energy supply. The load terminals in the system consume energy in many forms, and the cooling and heating load demands differ in their characteristics, change frequently, and show large peak-valley differences. Consequently, the system voltage and gas pressure fluctuate widely and are very unevenly distributed over long time scales. This interferes with the normal operation of equipment, reduces the quality and stability of the energy supply, and increases both the power flow fluctuation on system lines and the risk of micro gas turbines tripping offline, which challenges the safe operation of the park comprehensive energy system. Existing energy optimization methods for park comprehensive energy systems are mainly model-based: they establish mathematical equations to describe energy scheduling. Such methods cannot guarantee convergence of the algorithm, and their iterative computation consumes considerable time and resources.
Disclosure of Invention
The application provides a comprehensive energy optimization method and device based on model-free reinforcement learning, which are used for solving the technical problems of low applicability and low efficiency of a comprehensive energy system energy optimization technology based on a model.
In view of the above, a first aspect of the present application provides a comprehensive energy optimization method based on model-free reinforcement learning, including:
acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model;
inputting the energy supply guide signal sample into a preset neural network, and carrying out network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term;
performing a reward simulation calculation according to the energy exchange quantity through a Monte Carlo algorithm to obtain an optimal energy supply guide signal;
and substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
Preferably, the preset integrated energy service provider model is as follows:
wherein α is a weighting factor; λ(t) is the energy supply guide signal; […], […] and […] are, respectively, the energy exchange quantity between the park comprehensive energy system and the distribution network in time period t, and the maximum and the average energy exchange quantity within N_T; ε_m is a conversion factor; profit_base is the profit of the distribution network comprehensive energy service provider; N_T and N_m are, respectively, the total number of time periods and the number of park comprehensive energy subsystems; and […] and […] satisfy the following constraint relationships, respectively:
Preferably, before inputting the energy supply guide signal sample into the preset neural network and carrying out network training according to the preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, the method further comprises:
converting the selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guide signal;
and normalizing the energy supply guide signal to obtain an energy supply guide signal sample.
Preferably, inputting the energy supply guide signal sample into the preset neural network and carrying out network training according to the preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network comprises:
selecting a mean square error function as a training loss function of the preset neural network;
adding the norm penalty term obtained according to regularization calculation into the training loss function to obtain the preset loss function;
and inputting the energy supply guide signal sample into a preset neural network for training to obtain the energy exchange quantity of the park comprehensive energy system and the distribution network.
Preferably, performing the reward simulation calculation according to the energy exchange quantity through the Monte Carlo algorithm to obtain the optimal energy supply guide signal comprises:
performing the reward simulation calculation according to the energy exchange quantity, a preset reward weight and a preset number of simulations through the Monte Carlo algorithm to obtain the optimal energy supply guide signal.
The second aspect of the present application provides a comprehensive energy optimization device based on model-free reinforcement learning, including:
the acquisition module is used for acquiring an energy supply guidance signal sample according to a preset comprehensive energy service provider model;
the training module is used for inputting the energy supply guide signal sample into a preset neural network and carrying out network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term;
the calculation module is used for performing a reward simulation calculation according to the energy exchange quantity through a Monte Carlo algorithm to obtain an optimal energy supply guide signal;
and the optimization solving module is used for substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, and the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
Preferably, the preset integrated energy service provider model is as follows:
wherein α is a weighting factor; λ(t) is the energy supply guide signal; […], […] and […] are, respectively, the energy exchange quantity between the park comprehensive energy system and the distribution network in time period t, and the maximum and the average energy exchange quantity within N_T; ε_m is a conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are, respectively, the total number of time periods and the number of park comprehensive energy subsystems; and […] and […] satisfy the following constraint relationships, respectively:
Preferably, the device further comprises:
the preprocessing module, which is used for converting the selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guide signal;
and for normalizing the energy supply guide signal to obtain an energy supply guide signal sample.
Preferably, the training module is specifically configured to:
selecting a mean square error function as a training loss function of the preset neural network;
adding the norm penalty term obtained according to regularization calculation into the training loss function to obtain the preset loss function;
and inputting the energy supply guide signal sample into a preset neural network for training to obtain the energy exchange quantity of the park comprehensive energy system and the distribution network.
Preferably, the calculation module is specifically configured to:
and performing incentive simulation calculation according to the energy exchange amount, the preset incentive weight and the preset simulation times through a Monte Carlo algorithm to obtain an optimal energy supply guide signal.
According to the technical scheme, the embodiment of the application has the following advantages:
The application provides a comprehensive energy optimization method based on model-free reinforcement learning, which comprises the following steps: acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model; inputting the energy supply guide signal sample into a preset neural network and carrying out network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term; performing a reward simulation calculation according to the energy exchange quantity through a Monte Carlo algorithm to obtain an optimal energy supply guide signal; and substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
The comprehensive energy optimization method based on model-free reinforcement learning provided by the application optimizes the energy of the park comprehensive energy system by combining two algorithms, a neural network and Monte Carlo reinforcement learning. The data-driven nature of the neural network is used to train on the energy supply guide signals, so that the energy exchange quantity between the park comprehensive energy system and the distribution network is expressed with high accuracy and computed efficiently. The Monte Carlo reinforcement learning method can extract information hidden in the data and has good applicability: even when a preset energy optimization model with constraint conditions is used, a moderate increase in the amount of computation does not make the algorithm inapplicable. Therefore, the comprehensive energy optimization method based on model-free reinforcement learning can solve the technical problems of low applicability and low efficiency of model-based energy optimization techniques for comprehensive energy systems.
Drawings
Fig. 1 is a schematic flowchart of a comprehensive energy optimization method based on model-free reinforcement learning according to an embodiment of the present disclosure;
Fig. 2 is another schematic flowchart of a comprehensive energy optimization method based on model-free reinforcement learning according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a comprehensive energy optimization device based on model-free reinforcement learning according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, referring to Fig. 1, a first embodiment of the comprehensive energy optimization method based on model-free reinforcement learning provided by the present application includes:
Step 101, acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model.
It should be noted that the energy supply guide signal sample is in fact a variable related to the retail electricity price, so it can be obtained directly from the preset comprehensive energy service provider model. In practice, the preset comprehensive energy service provider model reflects the income of the energy service provider: the larger the income, the more it benefits the provider's development.
Step 102, inputting the energy supply guide signal sample into a preset neural network, and carrying out network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function comprises a norm penalty term.
It should be noted that the preset neural network is a network that has already been constructed and can be used directly. The energy supply guide signal samples serve as training data, and the energy exchange quantity between the park comprehensive energy system and the distribution network is the output of the neural network; the training is essentially a regression analysis. The sample data must be preprocessed before training, which reduces bias in the training data and improves the accuracy of the regression and the validity of the results. The norm penalty term is added to the preset loss function because the park comprehensive energy system contains uncertain variables such as distributed generation power and load fluctuation. These uncertainties can make the power fluctuation deviate upward or downward, so the variables may cause overfitting during training. To address this, regularization is introduced and a norm penalty term is calculated, so that the loss function better matches practical requirements.
Step 103, performing a reward simulation calculation according to the energy exchange quantity through a Monte Carlo algorithm to obtain an optimal energy supply guide signal.
It should be noted that, when the problem to be solved is the probability of a random event or the expected value of a random variable, the Monte Carlo algorithm runs repeated "experiments" and uses the observed frequency of the event to estimate its probability, or obtains numerical characteristics of the random variable, and takes the result as the solution of the problem. In this embodiment, the quantity being processed is the energy exchange quantity, and the goal is to find the energy supply guide signal that corresponds to the optimal energy exchange quantity.
Step 104, substituting the optimal energy supply guide signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model comprises a preset energy scheduling function and preset constraint conditions.
It should be noted that the preset energy optimization model aims to minimize the operating cost given an energy supply guide signal related to the retail price. Because the optimal energy supply guide signal is closely tied to the retail price, it best reflects changes in that price and thereby optimizes the solution of the energy optimization model. The preset energy scheduling function and the preset constraint conditions together form the preset energy optimization model.
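The Monte Carlo idea described above, estimating an expected value by averaging repeated random experiments, can be illustrated with a minimal Python sketch. The function names and the random variable (a nominal exchange quantity of 5.0 with uniform noise) are hypothetical and chosen only for illustration; they are not taken from the application.

```python
import random

def monte_carlo_mean(sample_fn, n_trials, seed=0):
    """Estimate E[X] as the average of n_trials independent draws of sample_fn."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_trials):
        total += sample_fn(rng)
    return total / n_trials

def noisy_exchange(rng):
    # Hypothetical random variable: a nominal value of 5.0 plus uniform noise in [-1, 1].
    return 5.0 + rng.uniform(-1.0, 1.0)

# With enough trials the sample mean approaches the true expectation (5.0).
estimate = monte_carlo_mean(noisy_exchange, 20000)
```

The more trials are run, the closer the sample average tends to lie to the true expected value, which is the property the embodiment relies on.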
It should be noted that, although this embodiment of the comprehensive energy optimization method based on model-free reinforcement learning involves economic matters such as the retail price and the operator's income, these are all necessary technical features of the embodiment. The embodiment mainly addresses technical problems in model calculation, and designs an optimization method that improves the adaptability and efficiency of the energy optimization algorithm or optimization model of the comprehensive energy system.
The comprehensive energy optimization method based on model-free reinforcement learning provided by this embodiment optimizes the energy of the park comprehensive energy system by combining two algorithms, a neural network and Monte Carlo reinforcement learning. The data-driven nature of the neural network is used to train on the energy supply guide signals, so that the energy exchange quantity between the park comprehensive energy system and the distribution network is expressed with high accuracy and computed efficiently. The Monte Carlo reinforcement learning method can extract information hidden in the data and has good applicability: even when a preset energy optimization model with constraint conditions is used, a moderate increase in the amount of computation does not make the algorithm inapplicable. Therefore, the comprehensive energy optimization method based on model-free reinforcement learning provided by this embodiment can solve the technical problems of low applicability and low efficiency of model-based energy optimization techniques for comprehensive energy systems.
For ease of understanding, referring to Fig. 2, a second embodiment of the comprehensive energy optimization method based on model-free reinforcement learning provided by the present application includes:
Step 201, acquiring an energy supply guide signal sample according to a preset comprehensive energy service provider model.
It should be noted that the preset integrated energy service provider model is as follows:
In the formula, the first part is the income the comprehensive energy service provider obtains from selling electricity to the park comprehensive energy system, and the second part is the peak-to-average ratio over the whole scheduling period, i.e. the ratio of the maximum to the average power exchange quantity of the park comprehensive energy system. α is a weight factor; λ(t) is the energy supply guide signal; […], […] and […] are, respectively, the energy exchange quantity between the park comprehensive energy system and the distribution network in time period t, and the maximum and the average energy exchange quantity within N_T; ε_m is a conversion factor; profit_base is the revenue of the distribution network comprehensive energy service provider; N_T and N_m are, respectively, the total number of time periods and the number of park comprehensive energy subsystems; and […] and […] satisfy the following constraint relationships, respectively:
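Step 202, converting the selling price sample into a per-unit value according to a preset reference value to obtain an energy supply guide signal.
Step 203, normalizing the energy supply guide signal to obtain an energy supply guide signal sample.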
These two steps preprocess the selling price samples to obtain the energy supply guide signal samples. The preset reference value, which is the base value for converting the selling price samples into per-unit values, can be set to 100. The energy exchange quantity output by the network must likewise be converted into a per-unit value, for which the base value can be set to 1000. The normalization is carried out directly with a normalization formula, specifically:
wherein s is the training sample index, i.e. the index of an energy supply guide signal sample, and […] and […] are, respectively, the maximum and minimum values of the energy supply guide signal over the time periods t.
After this preprocessing, the energy supply guide signal samples are normalized to the interval [0, 1], which improves both the convergence of the algorithm and its convergence speed.
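The preprocessing described above can be sketched in Python as follows. Since the normalization formula itself is not reproduced in this text, the sketch assumes the standard min-max form, which is consistent with the stated use of the maximum and minimum values and with the [0, 1] output range. The function names and the price values are hypothetical; the base value of 100 is the one suggested in the description.

```python
def to_per_unit(prices, base=100.0):
    """Convert selling prices to per-unit values using a preset base value."""
    return [p / base for p in prices]

def min_max_normalize(values):
    """Scale values to [0, 1] (assumed min-max form of the normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: a constant signal carries no spread to normalize.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical selling price samples for four time periods.
prices = [40.0, 55.0, 70.0, 100.0]
signal = to_per_unit(prices)          # energy supply guide signal (per unit)
samples = min_max_normalize(signal)   # normalized guide signal samples
```

The smallest price maps to 0 and the largest to 1, so every sample fed to the network lies in the same range regardless of the original price scale.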
Step 204, selecting a mean square error function as the training loss function of the preset neural network.
Step 205, adding a norm penalty term obtained by regularization calculation to the training loss function to obtain the preset loss function.
It should be noted that the park comprehensive energy system contains uncertain variables, for example distributed generation power and load fluctuation. These uncertainties can make the power fluctuation larger or smaller, so the variables may cause overfitting during training. To address this, a norm penalty term, calculated according to a regularization algorithm, is added to the preset loss function; this embodiment adopts a two-norm penalty term. The mean square error function with the added two-norm penalty term can be expressed as:
wherein b is the bias coefficient; […] is the two-norm penalty term; δ is the regularization parameter; N_S is the total number of training samples; […] is the actual energy exchange quantity of the park comprehensive energy system in time period t for training sample s; and […] is the estimated value of that energy exchange quantity. In addition, the estimated and actual values of the energy exchange quantity satisfy the following conditions:
ε_m is the conversion coefficient; its function is to convert the power exchanged between the individual energy-using entities in the system into the power exchanged at the system's point of common coupling. Once the loss function has been computed, its first-order partial derivatives with respect to the weights and biases are taken and used to update those variables:
where i is the iteration index, l is the hidden layer index, N_L is the total number of hidden layers, […] is the output value of layer l, and η is the learning rate. The first-order partial derivative with respect to the bias is computed in the same way as for the weights and is not described again here.
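As a rough illustration of a mean square error loss with a two-norm penalty and the gradient-based update, the following Python sketch trains a one-weight linear model. This is a deliberate simplification and an assumption: the application trains a multi-layer neural network, and its exact loss and update formulas are not reproduced in this text. The names `loss_and_grads` and `train`, and the data, are hypothetical; `delta` and `eta` play the roles of the regularization parameter δ and the learning rate η.

```python
def loss_and_grads(w, b, xs, ys, delta):
    """MSE plus an L2 penalty delta * w**2 for the one-weight model y = w*x + b."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n + delta * w * w
    # First-order partial derivatives with respect to the weight and the bias.
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n + 2.0 * delta * w
    grad_b = 2.0 * sum(errors) / n  # the bias is not penalized in this sketch
    return loss, grad_w, grad_b

def train(xs, ys, delta=0.01, eta=0.1, iterations=500):
    """Plain gradient descent: variable <- variable - eta * partial derivative."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(iterations):
        loss, grad_w, grad_b = loss_and_grads(w, b, xs, ys, delta)
        history.append(loss)
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b, history

# Hypothetical data: normalized guide signal -> per-unit exchange quantity,
# generated from y = 0.9 - 0.4 * x (higher price, lower exchange).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.9, 0.8, 0.7, 0.6, 0.5]
w, b, history = train(xs, ys)
```

The penalty shrinks the fitted weight slightly toward zero compared with the unpenalized slope of -0.4, which is the mechanism by which the norm penalty term limits overfitting.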
And step 206, inputting the energy supply guidance signal sample into a preset neural network for training, and acquiring the energy exchange quantity of the park comprehensive energy system and the distribution network.
It should be noted that the preset neural network is a network that has already been constructed and can be used directly. The energy supply guide signal samples serve as training data, the energy exchange quantity between the park comprehensive energy system and the distribution network is the output of the neural network, and the training is essentially a regression analysis.
Step 207, performing a reward simulation calculation according to the energy exchange quantity, the preset reward weight and the preset number of simulations through a Monte Carlo algorithm to obtain an optimal energy supply guide signal.
It should be noted that, because it is difficult to obtain the state transition probability in the markov decision process, that is, the total power exchange amount per hour of the campus energy system including the distributed energy generation, in this embodiment, the synthetic energy optimization based on the model-free reinforcement learning adopts the monte carlo reinforcement learning algorithm, uses the sample average reward of the action as the reward value, and according to the large number theorem, as long as there are enough reward sample values and enough simulation times, the average reward of the sample is approximately equal to the actual value. In this embodiment, the agent is a campus integrated energy system; the state is the total exchange power quantity of the comprehensive energy system and the power grid in each hourly park:
the action is to supply energy guidance signal lambda (t) every hour, t 1T(ii) a The reward being the hourly gain of power delivery to the gridThe specific calculation method is as follows:
Select an energy supply guidance signal $\lambda^{(s)}(t)$ from the energy supply guidance signal samples and initialize the counter $n(s) \leftarrow 0$. Loop $s'$ from 1 to $N_S$: if $\lambda^{(s')}(t) = \lambda^{(s)}(t)$, then $n(s) \leftarrow n(s) + 1$. Estimate the reward of $\lambda^{(s)}(t)$ as the weighted sample mean:
$$r(\lambda^{(s)}(t)) = \frac{1}{n(s)} \left( \alpha \sum \mathrm{profit}(\lambda^{(s)}(t)) - (1 - \alpha) \sum \mathrm{PAR}(\lambda^{(s)}(t)) \right)$$
Finally, select $\lambda(t) = \arg\max_{s \in N_S} r(\lambda^{(s)}(t))$. In the above calculation, profit denotes the gain of the distribution network comprehensive energy service provider. The discount factor $\gamma$ lies between 0 and 1, and in this embodiment $\gamma$ may be set to 0.9, which makes the energy supply signal decisions of the distribution network comprehensive energy service provider more robust. $\alpha$ is a weight coefficient that balances $\sum \mathrm{profit}(\lambda^{(s)}(t))$ against $\sum \mathrm{PAR}(\lambda^{(s)}(t))$ in order to find the optimal energy supply guidance signal $\lambda(t)$. PAR is the peak-to-average ratio over the entire scheduling period.
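The counting and averaging procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the `(signal, profit, par)` sample layout, and the example numbers are all hypothetical, and the profit and PAR of each simulated sample are assumed to have been computed already.

```python
from collections import defaultdict


def peak_to_average_ratio(exchange):
    """PAR of an exchange profile: peak value divided by mean value."""
    mean = sum(exchange) / len(exchange)
    return max(exchange) / mean


def select_guidance_signal(samples, alpha):
    """Pick the signal with the highest sample-average weighted reward.

    samples: iterable of (signal, profit, par) tuples, one per simulation
             (hypothetical layout chosen for this sketch).
    alpha:   weight balancing profit against PAR, between 0 and 1.
    """
    # For each distinct signal keep [n(s), sum of profit, sum of PAR].
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for signal, profit, par in samples:
        entry = totals[signal]
        entry[0] += 1
        entry[1] += profit
        entry[2] += par

    # r = 1/n(s) * (alpha * sum(profit) - (1 - alpha) * sum(PAR))
    rewards = {
        signal: (alpha * p_sum - (1 - alpha) * par_sum) / n
        for signal, (n, p_sum, par_sum) in totals.items()
    }
    best = max(rewards, key=rewards.get)
    return best, rewards


# Hypothetical samples for one hour t: two candidate signals, two runs each.
samples = [(0.8, 10.0, 1.5), (0.8, 12.0, 1.7), (1.0, 14.0, 2.5), (1.0, 13.0, 2.3)]
best, rewards = select_guidance_signal(samples, alpha=0.5)
print(best, rewards)
```

Grouping by signal value in one pass gives the same counts $n(s)$ as the nested loop over $s'$ in the text, without rescanning the sample set for every candidate.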
Step 208: substitute the optimal energy supply guidance signal into the preset energy optimization model to obtain the optimal scheduling scheme, wherein the preset energy optimization model includes a preset energy scheduling function and preset constraint conditions.
It should be noted that the preset energy optimization model mainly consists of the preset energy scheduling function and the preset constraint conditions, where the preset energy scheduling function is:
where $C_{CHP}$ is the fuel cost of the micro gas turbine; $\beta_m$ is the network loss factor; $\lambda(t)$ is the energy supply guidance signal; $\mu_{es}$ is the charge-discharge coefficient of the energy storage system; $SOC_{es}(t)$ is the state of charge of the energy storage system at time $t$; $C_{CH4}$ is the natural gas price; $\eta_{CHP}$ is the power generation efficiency of the micro gas turbine; $L_{HVNG}$ is the lower heating value of natural gas; $H_{CHP}$ is the heat energy released by the micro gas turbine; $\eta_L$ is the heat loss coefficient; $\eta_h$ is the gas recovery rate; $COP_h$ is the heating coefficient; and $P_{CHP}$ is the total natural gas consumption of the micro gas turbine. The function also involves the energy exchange quantity between the park comprehensive energy system and the distribution network, the energy supply guidance signal issued to the demand response units, the demand response power, and a 0-1 variable indicating whether the $i$-th demand response interval is active.
The preset constraint conditions are as follows:
where the load bound is the upper limit of the schedulable load; the two storage quantities are the charging amount and the discharging amount of the energy storage, respectively; $SOC_{es}(t)$ is the state of charge of the energy storage; the two capacity bounds are the upper and lower limits of the energy storage capacity, respectively; $\eta_{es}$ is the charging efficiency of the energy storage; $\Delta$ is the length of the time interval; and the final quantity is the output of the dispatchable generator set.
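The exact constraint formulas are not reproduced in this text, so the following Python sketch only illustrates how an energy storage state-of-charge update and its capacity bounds are commonly written. Everything in it is an assumption: the update rule (charging scaled by $\eta_{es}$, discharging taken at face value), the `capacity` parameter, and the function names are hypothetical and may differ from the constraints actually used in the patent.

```python
def step_soc(soc, p_charge, p_discharge, eta_es, delta, capacity):
    """Advance the state of charge by one time interval (assumed form).

    soc:         state of charge at time t, as a fraction of capacity
    p_charge:    charging power during the interval
    p_discharge: discharging power during the interval
    eta_es:      charging efficiency of the energy storage
    delta:       length of the time interval
    capacity:    rated energy capacity (hypothetical parameter)
    """
    energy_in = eta_es * p_charge * delta
    energy_out = p_discharge * delta
    return soc + (energy_in - energy_out) / capacity


def within_limits(soc, soc_min, soc_max):
    """Check the assumed bound: lower limit <= SOC <= upper limit."""
    return soc_min <= soc <= soc_max


# Hypothetical numbers: charge at 10 kW for one hour into a 100 kWh store.
next_soc = step_soc(0.5, 10.0, 0.0, eta_es=0.9, delta=1.0, capacity=100.0)
print(next_soc, within_limits(next_soc, 0.2, 0.9))
```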
For ease of understanding, referring to Fig. 3, the present application further provides an embodiment of a comprehensive energy optimization device based on model-free reinforcement learning, comprising:
an acquisition module 301, configured to acquire energy supply guidance signal samples according to a preset comprehensive energy service provider model;
a training module 302, configured to input the energy supply guidance signal samples into a preset neural network and perform network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function includes a norm penalty term;
a calculation module 303, configured to perform reward simulation calculation by a Monte Carlo algorithm according to the energy exchange quantity to obtain an optimal energy supply guidance signal; and
an optimization solving module 304, configured to substitute the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model includes a preset energy scheduling function and preset constraint conditions.
Further, the preset comprehensive energy service provider model is as follows:
where $\alpha$ is a weighting factor; $\lambda(t)$ is the energy supply guidance signal; the three exchange quantities are, respectively, the energy exchanged between the park comprehensive energy system and the distribution network in time period $t$ and the maximum and average energy exchange over the $N_T$ time periods; $\varepsilon_m$ is the conversion factor; $\mathrm{profit}_{base}$ is the revenue of the distribution network comprehensive energy service provider; $N_T$ and $N_m$ are the total number of time periods and the number of park comprehensive energy subsystems, respectively; and the remaining two quantities respectively satisfy the following constraint relationships:
Further, the device also includes:
a preprocessing module 305, configured to convert selling price samples into per-unit values according to a preset reference value to obtain energy supply guidance signals,
and to normalize the energy supply guidance signals to obtain the energy supply guidance signal samples.
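The two preprocessing steps described above (per-unit conversion, then normalization) might look like this in Python. This is a sketch under assumptions: the text does not say which normalization is used, so min-max scaling to [0, 1] is chosen here purely for illustration, and the function names and price values are hypothetical.

```python
def to_per_unit(prices, base):
    """Convert selling prices to per-unit values using a preset reference value."""
    if base == 0:
        raise ValueError("reference value must be non-zero")
    return [p / base for p in prices]


def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumption of this sketch."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical selling price samples and reference value.
signals = to_per_unit([0.5, 1.0, 1.5], base=0.5)
samples = min_max_normalize(signals)
print(signals, samples)
```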
Further, the training module 302 is specifically configured to:
select a mean square error function as the training loss function of the preset neural network;
add a norm penalty term, obtained by regularization, to the training loss function to obtain the preset loss function;
and input the energy supply guidance signal samples into the preset neural network for training to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network.
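A loss of the kind described above, mean square error plus a norm penalty, can be sketched as follows. The text does not state which norm is used, so a squared L2 penalty is assumed here; the function name, the flat `weights` list, and the `penalty` coefficient are hypothetical.

```python
def regularized_mse(predictions, targets, weights, penalty):
    """Mean square error plus a squared-L2 norm penalty on the weights.

    The squared L2 norm is an assumption of this sketch; the source only
    says that a norm penalty term obtained by regularization is added.
    """
    n = len(targets)
    mse = sum((p - y) ** 2 for p, y in zip(predictions, targets)) / n
    l2 = sum(w * w for w in weights)
    return mse + penalty * l2


# Hypothetical values: two predictions, two targets, two network weights.
loss = regularized_mse([1.0, 2.0], [1.0, 4.0], weights=[1.0, -2.0], penalty=0.1)
print(loss)
```

The penalty term grows with the size of the weights, which discourages the network from fitting the training samples with very large coefficients.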
Further, the calculation module 303 is specifically configured to:
perform reward simulation calculation by a Monte Carlo algorithm according to the energy exchange quantity, the preset reward weight, and the preset number of simulations to obtain the optimal energy supply guidance signal.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. A plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions in the embodiments of the present application.
Claims (8)
1. A comprehensive energy optimization method based on model-free reinforcement learning, characterized by comprising the following steps:
acquiring energy supply guidance signal samples according to a preset comprehensive energy service provider model, wherein the preset comprehensive energy service provider model is:
where $\alpha$ is a weighting factor; $\lambda(t)$ is the energy supply guidance signal; the three exchange quantities are, respectively, the energy exchanged between the park comprehensive energy system and the distribution network in time period $t$ and the maximum and average energy exchange over the $N_T$ time periods; $\varepsilon_m$ is the conversion factor; $\mathrm{profit}_{base}$ is the revenue of the distribution network comprehensive energy service provider; $N_T$ and $N_m$ are the total number of time periods and the number of park comprehensive energy subsystems, respectively; and the remaining two quantities respectively satisfy the following constraint relationships:
inputting the energy supply guidance signal samples into a preset neural network and performing network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function includes a norm penalty term;
performing reward simulation calculation by a Monte Carlo algorithm according to the energy exchange quantity to obtain an optimal energy supply guidance signal;
and substituting the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model includes a preset energy scheduling function and preset constraint conditions.
2. The comprehensive energy optimization method based on model-free reinforcement learning according to claim 1, wherein before the inputting of the energy supply guidance signal samples into the preset neural network and the performing of network training according to the preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, the method further comprises:
converting selling price samples into per-unit values according to a preset reference value to obtain energy supply guidance signals;
and normalizing the energy supply guidance signals to obtain the energy supply guidance signal samples.
3. The comprehensive energy optimization method based on model-free reinforcement learning according to claim 1, wherein the inputting of the energy supply guidance signal samples into the preset neural network and the performing of network training according to the preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network comprises:
selecting a mean square error function as the training loss function of the preset neural network;
adding the norm penalty term, obtained by regularization, to the training loss function to obtain the preset loss function;
and inputting the energy supply guidance signal samples into the preset neural network for training to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network.
4. The comprehensive energy optimization method based on model-free reinforcement learning according to claim 1, wherein the performing of reward simulation calculation by the Monte Carlo algorithm according to the energy exchange quantity to obtain the optimal energy supply guidance signal comprises:
performing reward simulation calculation by the Monte Carlo algorithm according to the energy exchange quantity, a preset reward weight, and a preset number of simulations to obtain the optimal energy supply guidance signal.
5. A comprehensive energy optimization device based on model-free reinforcement learning, characterized by comprising:
an acquisition module, configured to acquire energy supply guidance signal samples according to a preset comprehensive energy service provider model, wherein the preset comprehensive energy service provider model is:
where $\alpha$ is a weighting factor; $\lambda(t)$ is the energy supply guidance signal; the three exchange quantities are, respectively, the energy exchanged between the park comprehensive energy system and the distribution network in time period $t$ and the maximum and average energy exchange over the $N_T$ time periods; $\varepsilon_m$ is the conversion factor; $\mathrm{profit}_{base}$ is the revenue of the distribution network comprehensive energy service provider; $N_T$ and $N_m$ are the total number of time periods and the number of park comprehensive energy subsystems, respectively; and the remaining two quantities respectively satisfy the following constraint relationships:
a training module, configured to input the energy supply guidance signal samples into a preset neural network and perform network training according to a preset loss function to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network, wherein the preset loss function includes a norm penalty term;
a calculation module, configured to perform reward simulation calculation by a Monte Carlo algorithm according to the energy exchange quantity to obtain an optimal energy supply guidance signal;
and an optimization solving module, configured to substitute the optimal energy supply guidance signal into a preset energy optimization model to obtain an optimal scheduling scheme, wherein the preset energy optimization model includes a preset energy scheduling function and preset constraint conditions.
6. The comprehensive energy optimization device based on model-free reinforcement learning according to claim 5, further comprising:
a preprocessing module, configured to convert selling price samples into per-unit values according to a preset reference value to obtain energy supply guidance signals,
and to normalize the energy supply guidance signals to obtain the energy supply guidance signal samples.
7. The comprehensive energy optimization device based on model-free reinforcement learning according to claim 5, wherein the training module is specifically configured to:
select a mean square error function as the training loss function of the preset neural network;
add the norm penalty term, obtained by regularization, to the training loss function to obtain the preset loss function;
and input the energy supply guidance signal samples into the preset neural network for training to obtain the energy exchange quantity between the park comprehensive energy system and the distribution network.
8. The comprehensive energy optimization device based on model-free reinforcement learning according to claim 5, wherein the calculation module is specifically configured to:
perform reward simulation calculation by the Monte Carlo algorithm according to the energy exchange quantity, a preset reward weight, and a preset number of simulations to obtain the optimal energy supply guidance signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010397747.0A CN111478326B (en) | 2020-05-12 | 2020-05-12 | Comprehensive energy optimization method and device based on model-free reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111478326A CN111478326A (en) | 2020-07-31 |
CN111478326B true CN111478326B (en) | 2021-09-03 |
Family
ID=71762522
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010397747.0A Active CN111478326B (en) | 2020-05-12 | 2020-05-12 | Comprehensive energy optimization method and device based on model-free reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111478326B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112421642B (en) * | 2020-10-28 | 2022-07-12 | 国家电网有限公司 | IES (Integrated energy System) reliability assessment method and system |
CN114400675B (en) * | 2022-01-21 | 2023-04-07 | 合肥工业大学 | Active power distribution network voltage control method based on weight mean value deep double-Q network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472413A (en) * | 2018-11-14 | 2019-03-15 | 南方电网科学研究院有限责任公司 | Consider the garden integrated energy system Optimization Scheduling of hot pipe network transmission characteristic |
CN109685332A (en) * | 2018-12-06 | 2019-04-26 | 广东电网有限责任公司 | A kind of comprehensive energy multiagent balance of interest Optimization Scheduling and equipment |
CN110852839A (en) * | 2019-10-29 | 2020-02-28 | 车主邦(北京)科技有限公司 | Method, device and storage medium for interfacing energy service business |
Non-Patent Citations (5)
Title |
---|
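Optimal Scheduling of Hydro–PV–Wind Hybrid System Considering CHP and BESS Coordination; Shengmin Tan et al.; Applied Sciences; 20190302; pp. 1-18 *
Research on Modeling, Analysis and Operation Optimization of Integrated Energy Systems; 李明; 《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》; 20200215; pp. 24-49 *
Planning Method for Integrated Energy Systems Considering Electricity-Heat-Gas Coupling; 雷金勇 et al.; 《电力系统及其自动化学报》; 20190131; pp. 19-24 *
Bi-level Optimal Configuration Planning Model for Regional Integrated Energy Systems Considering Uncertainty; 仇知 et al.; 《电力自动化设备》; 20190831; pp. 176-185 *
Review of Research on Integrated Energy System Planning for the Energy Internet; 袁智勇 et al.; 《南方电网技术》; 20190731; pp. 1-9 *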
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||