CN115001002B - Optimal scheduling method and system for solving problem of energy storage participation peak clipping and valley filling - Google Patents
Optimal scheduling method and system for solving problem of energy storage participation peak clipping and valley filling
- Publication number
- CN115001002B (application CN202210916196.3A)
- Authority
- CN
- China
- Prior art keywords
- energy storage
- value
- network
- strategy
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000004146 energy storage Methods 0.000 title claims abstract description 218
- 238000000034 method Methods 0.000 title claims abstract description 43
- 238000011217 control strategy Methods 0.000 claims abstract description 66
- 238000012549 training Methods 0.000 claims abstract description 45
- 238000005457 optimization Methods 0.000 claims abstract description 32
- 230000008569 process Effects 0.000 claims abstract description 15
- 238000013528 artificial neural network Methods 0.000 claims description 80
- 239000013598 vector Substances 0.000 claims description 9
- 230000009471 action Effects 0.000 claims description 7
- 238000002372 labelling Methods 0.000 claims description 6
- 230000004044 response Effects 0.000 claims description 2
- 230000001537 neural effect Effects 0.000 claims 1
- 230000006870 function Effects 0.000 abstract description 39
- 238000009826 distribution Methods 0.000 abstract description 12
- 230000008859 change Effects 0.000 abstract description 6
- 238000007599 discharging Methods 0.000 description 11
- 230000008901 benefit Effects 0.000 description 6
- 238000004364 calculation method Methods 0.000 description 6
- 150000001875 compounds Chemical class 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000005611 electricity Effects 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 230000002787 reinforcement Effects 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000009194 climbing Effects 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000009987 spinning Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/28—Arrangements for balancing of the load in a network by storage of energy
- H02J3/32—Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06315—Needs-based resource requirements planning or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- General Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Computational Linguistics (AREA)
- Power Engineering (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Educational Administration (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- Primary Health Care (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
The invention provides an optimal scheduling method and system for solving the problem of energy storage participating in peak clipping and valley filling. The method comprises: setting a parameterized deep Q-value network; training the parameterized deep Q-value network with historical load data and the energy storage power output at the corresponding moments; and, during training, limiting the control strategy updates with a trust-region optimization model, so that an optimal strategy is obtained quickly and accurately and optimal scheduling control of the energy storage is realized under the current conditions. The invention uses trust-region reinforcement learning to limit the size of each strategy update in continuous control, so that the distribution form does not change greatly at each update and the return converges monotonically and incrementally; the optimization result can be corrected on line, and charge and discharge constraints are taken into account to achieve optimal peak clipping and valley filling control.
Description
Technical Field
The invention belongs to the technical field of power grid dispatching, and particularly relates to an optimal dispatching method and system, based on trust-region reinforcement learning, for solving energy storage participation in peak clipping and valley filling.
Background
A large-scale battery energy storage system can perform peak clipping and valley filling of the load by discharging at the load peak and charging at the load valley. By using energy storage for peak clipping and valley filling, a power grid company can postpone upgrades of equipment capacity, improve equipment utilization and save equipment renewal costs; a power consumer can use energy storage for peak clipping and valley filling and exploit the peak-valley electricity price difference to obtain economic benefits. Achieving the best peak clipping and valley filling effect with a limited battery capacity while satisfying a set of constraint conditions requires an optimization algorithm.
The classical optimization algorithms for solving the charging and discharging strategy of an energy storage system include gradient algorithms and dynamic programming. Gradient algorithms cannot handle discontinuous constraint conditions and depend strongly on the initial values. Dynamic programming can take discontinuous and nonlinear constraints into account in the model and is convenient to solve on a computer. However, with large-scale grid-connected energy storage and highly random loads, both methods suffer from problems of precision and computational efficiency; moreover, both rely on an accurate physical model, whose accuracy is difficult to guarantee in practical problems.
Disclosure of Invention
In view of the above, the invention aims to solve the problems that, with large-scale grid-connected energy storage and highly random loads, the classical optimization algorithms for solving the charging and discharging strategy of an energy storage system suffer from limited precision and computational efficiency, and that modeling accuracy is difficult to guarantee.
In order to solve the technical problems, the invention provides the following technical scheme:
in a first aspect, the present invention provides an optimal scheduling method for solving the problem that energy storage participates in peak clipping and valley filling, including the following steps:
setting a parameterized deep Q-value network, wherein the parameterized deep Q-value network is used for parameterizing an input control strategy with its own network parameters and outputting a plurality of parameterized control strategies;

acquiring historical active load values, load forecast values and the energy storage power output at the corresponding moments; taking the energy storage power output, the load active value and the forecast value at the initial moment as the initial state, controlling the energy storage with an arbitrary initial energy storage control strategy, iteratively training the parameterized deep Q-value network and updating its network parameters with the objective of minimizing the variance of the load curve, and limiting the network-parameter updates with a trust-region optimization model; training ends when the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied, where D̄_KL denotes the trust-region constraint on the manifold, π_θ denotes the control strategy parameterized by the network parameters θ, δ denotes the constraint limit value, and k and k+1 denote successive updates of the network parameters θ;

and acquiring the current active load value and energy storage power output, inputting them into the trained parameterized deep Q-value network, selecting the strategy corresponding to the maximum value in the output, and issuing it to the energy storage sub-controller for energy storage scheduling control.
Further, the parameterized deep Q-value network specifically includes: an energy storage strategy neural network and an energy storage state value neural network;

the energy storage strategy neural network is set up according to the approximate state-action energy storage Q-Value network Q_π(s_t, a_t), with θ as its corresponding network parameters;

the energy storage state value neural network is set up according to the approximate state energy storage Q-Value network V_π(s_t), with ω as its corresponding network parameters;

where s denotes the state, a denotes the action, t denotes the time, π denotes the energy storage control strategy, Q_π(s_t, a_t) denotes the value of taking action a_t in state s_t, V_π(s_t) denotes the expected value of state s_t over all possible actions, r denotes the return, and γ denotes the discount factor.
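The two networks can be illustrated with the following minimal sketch. The layer sizes, the 10-way discretization of the charge/discharge power and all class and variable names are illustrative assumptions, not taken from the patent text:

```python
import torch
import torch.nn as nn

N_ACTIONS = 10   # assumed number of discretized charge/discharge power levels
STATE_DIM = 3    # assumed state: [load forecast, current load, current BES power]

class PolicyNet(nn.Module):
    """Energy storage strategy network: probabilities over the discrete power levels."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )
    def forward(self, state):
        return torch.softmax(self.body(state), dim=-1)

class ValueNet(nn.Module):
    """Energy storage state value network: scalar value of a state."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, state):
        return self.body(state).squeeze(-1)
```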
Further, the trust-region optimization model specifically comprises: maximizing, over the network parameters θ, the expected discounted return of the updated control strategy relative to the control strategy before the update, subject to the trust-region constraint between the updated control strategy and the control strategy before the update;

where π_old denotes the control strategy before the update and π_θ denotes the control strategy updated according to the network parameters θ.
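A standard trust-region (TRPO) form consistent with this description is given below for reference; A denotes the advantage function and the exact expression in the original formula image may differ:

```latex
\max_{\theta}\; L_{\pi_{\mathrm{old}}}(\pi_\theta)
  = \mathbb{E}_{s,a\sim\pi_{\mathrm{old}}}\!\left[
      \frac{\pi_\theta(a\mid s)}{\pi_{\mathrm{old}}(a\mid s)}\,A_{\pi_{\mathrm{old}}}(s,a)\right]
\quad\text{s.t.}\quad
  \bar{D}_{\mathrm{KL}}\!\left(\pi_{\mathrm{old}}\,\|\,\pi_\theta\right)\le\delta
```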
Further, iteratively training the parameterized deep Q-value network, updating the network parameters, and controlling the network-parameter updates with the trust-region optimization model until the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied and training ends specifically comprises:

starting from the initial state, controlling the energy storage N times with the control strategy π_θ to obtain the round-k strategy state-action trajectories, where π_θ is the output of the energy storage strategy neural network, θ are the parameters of the energy storage strategy network, τ_i is the i-th trajectory, and (s_t, a_t) is the trajectory state and action vector at time t;

for each step (s_t, a_t) in τ_i, recording the corresponding return, calculating the action-state value function Q_π(s_t, a_t) of the corresponding step with the energy storage strategy neural network based on the return, and calculating the state value function V_ω(s_t) of the corresponding step with the energy storage state value neural network, where ω are the parameters of the energy storage state value neural network;

for each step (s_t, a_t) in τ_i, calculating the advantage function A(s_t, a_t) based on the action-state value function and the state value function;

estimating the policy gradient g based on the advantage function, where N denotes the total number of load and energy-storage control rounds and the gradient of the energy storage strategy neural network is taken with respect to θ;

computing, based on the policy gradient, the second-order partial derivative of the energy storage strategy neural network with respect to θ and solving for the auxiliary variable x, which has no actual physical meaning;

letting the iteration index j = 0, 1, 2, ..., and successively updating the network parameters of the energy storage strategy neural network by a backtracking step, where the maximum number of step-length backtracking steps of the energy storage strategy neural network is limited;

for the energy storage state value neural network, taking the recorded returns as labels and updating its parameters with a stochastic gradient descent algorithm, where the gradient of the energy storage state value neural network loss function L(ω) is taken with respect to the network parameters ω;
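Under the notation introduced here, the advantage function and the policy-gradient estimator referenced in these steps take the following standard form (the original formula images are not reproduced and may differ in detail):

```latex
A(s_t,a_t) = Q_\pi(s_t,a_t) - V_\omega(s_t), \qquad
\hat g = \frac{1}{N}\sum_{i=1}^{N}\sum_{t}
  \nabla_\theta \log \pi_\theta\!\left(a_t^{(i)}\mid s_t^{(i)}\right) A\!\left(s_t^{(i)},a_t^{(i)}\right)
```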
Further, in the expression for minimizing the variance of the load curve, n is the number of load data points in a day, determined by the forecast load data, with the current moment corresponding to one of these load data points; P_L(t_k) is the load at moment t_k and is a known quantity: the actual load up to and including the current moment and the forecast load thereafter; P_B(t_k) is the power of the BES between moments t_(k-1) and t_k, with charging positive and discharging negative; it is known up to the current moment and is the control variable thereafter.
In a second aspect, the present invention provides an optimized scheduling system for solving the problem that energy storage participates in peak clipping and valley filling, including:
the setting unit is used for setting a parameterized deep Q-value network, wherein the parameterized deep Q-value network is used for parameterizing an input control strategy with its own network parameters and outputting a plurality of parameterized control strategies;

the training unit is used for acquiring historical active load values, load forecast values and the energy storage power output at the corresponding moments; taking the energy storage power output, the load active value and the forecast value at the initial moment as the initial state, controlling the energy storage with an arbitrary initial energy storage control strategy, iteratively training the parameterized deep Q-value network and updating its network parameters with the objective of minimizing the variance of the load curve, and limiting the network-parameter updates with a trust-region optimization model; training ends when the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied, where D̄_KL denotes the trust-region constraint on the manifold, π_θ denotes the control strategy parameterized by the network parameters θ, δ denotes the constraint limit value, and k and k+1 denote successive updates of the network parameters θ;

and the control unit is used for acquiring the current active load value and energy storage power output, inputting them into the trained parameterized deep Q-value network, selecting the strategy corresponding to the maximum value in the output, and issuing it to the energy storage sub-station controller for energy storage scheduling control.
Further, the parameterized deep Q-value network specifically includes: an energy storage strategy neural network and an energy storage state value neural network;

the energy storage strategy neural network is set up according to the approximate state-action energy storage Q-Value network Q_π(s_t, a_t), with θ as its corresponding network parameters;

the energy storage state value neural network is set up according to the approximate state energy storage Q-Value network V_π(s_t), with ω as its corresponding network parameters;

where s denotes the state, a denotes the action, t denotes the time, π denotes the energy storage control strategy, Q_π(s_t, a_t) denotes the value of taking action a_t in state s_t, V_π(s_t) denotes the expected value of state s_t over all possible actions, r denotes the return, and γ denotes the discount factor.
Further, the trust-region optimization model specifically comprises: maximizing, over the network parameters θ, the expected discounted return of the updated control strategy relative to the control strategy before the update, subject to the trust-region constraint between the updated control strategy and the control strategy before the update;

where π_old denotes the control strategy before the update and π_θ denotes the control strategy updated according to the network parameters θ.
Further, the process of the training unit iteratively training the parameterized deep Q-value network and updating the network parameters specifically includes:

starting from the initial state, controlling the energy storage N times with the control strategy π_θ to obtain the round-k strategy state-action trajectories, where π_θ is the output of the energy storage strategy neural network, θ are the parameters of the energy storage strategy network, τ_i is the i-th trajectory, and (s_t, a_t) is the trajectory state and action vector at time t;

for each step (s_t, a_t) in τ_i, recording the corresponding return, calculating the action-state value function Q_π(s_t, a_t) of the corresponding step with the energy storage strategy neural network based on the return, and calculating the state value function V_ω(s_t) of the corresponding step with the energy storage state value neural network, where ω are the parameters of the energy storage state value neural network;

for each step (s_t, a_t) in τ_i, calculating the advantage function A(s_t, a_t) based on the action-state value function and the state value function;

estimating the policy gradient g based on the advantage function, where N denotes the total number of load and energy-storage control rounds and the gradient of the energy storage strategy neural network is taken with respect to θ;

computing, based on the policy gradient, the second-order partial derivative of the energy storage strategy neural network with respect to θ and solving for the auxiliary variable x, which has no actual physical meaning;

letting the iteration index j = 0, 1, 2, ..., and successively updating the network parameters of the energy storage strategy neural network by a backtracking step, where the maximum number of step-length backtracking steps of the energy storage strategy neural network is limited;

for the energy storage state value neural network, taking the recorded returns as labels and updating its parameters with a stochastic gradient descent algorithm, where the gradient of the energy storage state value neural network loss function L(ω) is taken with respect to the network parameters ω;
Further, in the expression for minimizing the variance of the load curve, n is the number of load data points in a day, determined by the forecast load data, with the current moment corresponding to one of these load data points; P_L(t_k) is the load at moment t_k and is a known quantity: the actual load up to and including the current moment and the forecast load thereafter; P_B(t_k) is the power of the BES between moments t_(k-1) and t_k, with charging positive and discharging negative; it is known up to the current moment and is the control variable thereafter.
In conclusion, the invention provides an optimal scheduling method and system for solving the problem of energy storage participating in peak clipping and valley filling, comprising: setting a parameterized deep Q-value network, training it with historical load data and the energy storage power output at the corresponding moments, and limiting the control strategy updates with a trust-region optimization model during training, so that an optimal strategy is obtained quickly and accurately and optimal scheduling control of the energy storage is realized under the current conditions. The invention uses trust-region reinforcement learning to limit the size of each strategy update in continuous control, so that the distribution form does not change greatly at each update and the return converges monotonically and incrementally; the optimization result can be corrected on line, and charge and discharge constraints are taken into account to achieve optimal peak clipping and valley filling control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of an optimal scheduling method for solving energy storage participation peak clipping and valley filling according to an embodiment of the present invention;
FIG. 2 shows the parameter updating process of trust-region reinforcement learning provided by the embodiment of the present invention;
fig. 3 is a schematic diagram of an energy storage strategy neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an energy storage state value neural network provided by an embodiment of the present invention;
fig. 5 is a flowchart of network training according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A large-scale battery energy storage system can perform peak clipping and valley filling of the load by discharging at the load peak and charging at the load valley. By using energy storage for peak clipping and valley filling, a power grid company can postpone upgrades of equipment capacity, improve equipment utilization and save equipment renewal costs; a power consumer can use energy storage for peak clipping and valley filling and exploit the peak-valley electricity price difference to obtain economic benefits. Achieving the best peak clipping and valley filling effect with a limited battery capacity while satisfying a set of constraint conditions requires an optimization algorithm.

The classical optimization algorithms for solving the charging and discharging strategy of an energy storage system include gradient algorithms and dynamic programming. Gradient algorithms cannot handle discontinuous constraint conditions and depend strongly on the initial values. Dynamic programming can take discontinuous and nonlinear constraints into account in the model and is convenient to solve on a computer. However, with large-scale grid-connected energy storage and highly random loads, both methods suffer from problems of precision and computational efficiency; moreover, both rely on an accurate physical model, whose accuracy is difficult to guarantee in practical problems.
Traditional reinforcement learning methods based on the policy gradient have allowed deep neural networks to make notable progress on control tasks. However, achieving good results with policy gradient methods is difficult because they are very sensitive to the update step size: if it is too small, training is very slow; if it is too large, the feedback signal is buried in noise and the model's performance may even collapse. The sampling efficiency of such methods is also often low, so learning even a simple task may require millions to billions of total iterations.
On this basis, the invention provides an optimal scheduling method and system, based on trust-region reinforcement learning, for solving the problem that energy storage participates in peak clipping and valley filling.
The following describes the optimal scheduling method for energy storage participation in peak clipping and valley filling based on trust-region reinforcement learning.
Referring to fig. 1, the present embodiment provides an optimal scheduling method for solving energy storage participation in peak clipping and valley filling based on trust-region reinforcement learning.
First, the design idea of solving the optimal scheduling of energy storage participation in peak clipping and valley filling based on trust-region reinforcement learning is explained in detail as follows:
trust Region Policy Optimization (TRPO) limits the size of strategy update in continuous control, does not change the distribution form greatly every time of update, enables the benefit to meet the requirement of incremental convergence, and can correct the Optimization result on line.
Because the charging and discharging power of the energy storage can be changed rapidly and flexibly, the ramp-rate constraint does not need to be considered. Ignoring the internal losses of the battery, the battery can be treated as a constant-voltage source model. If the owner of the energy storage system is a power consumer, then under a market electricity price system the user's goal is to maximize the economic benefit the energy storage system brings; if the owner is the grid, the goal is to make the load curve of the grid as flat as possible, in order to reduce the number of start-ups and shut-downs of conventional generator sets and the spinning reserve capacity. Mathematically, the variance reflects the degree to which a random variable deviates from its mean, and the variance of the load reflects how flat the load curve is. This embodiment therefore chooses minimizing the variance of the load curve as the objective function:
In the formula, n is the number of load data points in a day, determined by the forecast load data, with the current moment corresponding to one of these load data points; P_L(t_k) is the load at moment t_k and is a known quantity: the actual load up to and including the current moment and the forecast load thereafter; P_B(t_k) is the power of the BES between moments t_(k-1) and t_k, with charging positive and discharging negative; it is known up to the current moment and is the control variable thereafter.
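As a concrete illustration, a plausible form of this objective, assuming the net load is the sum of the load and the BES charging power, can be sketched as follows; the exact expression in the original formula image may differ:

```python
import numpy as np

def load_curve_variance(p_load, p_bes):
    """Variance of the net load curve over one day (smaller means flatter).

    p_load : array of n load values P_L(t_k), actual up to the current moment
             and forecast afterwards.
    p_bes  : array of n BES powers P_B(t_k), charging positive, discharging
             negative; past values are known, future values are the controls.
    """
    net = np.asarray(p_load, dtype=float) + np.asarray(p_bes, dtype=float)
    return float(np.mean((net - net.mean()) ** 2))
```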
The following describes the parameters of the present embodiment in turn. The real-time optimization of the present embodiment includes the following constraints.
1. Battery capacity constraint
The battery charge at each moment must not exceed the upper and lower limits of the battery capacity:

In the formula, E_min and E_max are respectively the lower and upper limits of the remaining battery capacity; E(t_k) is the battery charge at moment t_k, which is a known quantity up to the current moment and a state variable thereafter.
During online calculation, the battery charge at the current moment is the initial value and the charge at the final moment of the day is the final value. Neglecting battery losses, the charge the battery loses over a period of time equals the energy it outputs over that period:

In the formula, Δt is the interval between adjacent load data points, and the initial and final values of the remaining battery capacity are the battery charges at the current moment and at the final moment, respectively.
2. Power constraint
Owing to the limitations of the power conversion system (PCS) and of the battery body, the output power of the battery at each moment cannot exceed the upper and lower power limits:
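The two constraints above, together with the energy balance, can be expressed as a small feasibility check; the bound names and the sign convention (charging increases the stored energy) are assumptions made for illustration:

```python
def feasible(e_prev, p_bes, dt, e_min, e_max, p_min, p_max, tol=1e-6):
    """Check capacity, power and energy-balance constraints over one interval.

    e_prev : battery charge at the previous moment (kWh)
    p_bes  : BES power over the interval, charging positive (kW)
    dt     : interval between adjacent load data points (h)
    """
    e_next = e_prev + p_bes * dt          # energy balance, battery losses neglected
    ok_capacity = e_min - tol <= e_next <= e_max + tol
    ok_power = p_min - tol <= p_bes <= p_max + tol
    return ok_capacity and ok_power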
In this embodiment, the optimization problem is converted into a Markov sequential decision model, which mainly comprises a state space, an action space and a return function.
For convenience of description, the following quantities are defined (see the sketch after this list):

3. S: the state space; in this embodiment the state comprises the current energy storage output power and the load forecast value.

A: the action space, which in this embodiment refers to the charge and discharge power of the energy storage at the future moment;

P: the transition probability distribution; the transition here is deterministic, so it is set to 1.

The return function contains a load-fluctuation variance-minimization term, a term ensuring the battery charge stays within its upper and lower limits, a term ensuring the charge and discharge power stay within their upper and lower limits, and a term ensuring the charge/discharge energy-power balance.

7. π(a|s): in this patent, the probability assigned to each candidate energy storage charge and discharge power.

9. The state-action energy storage Q-Value network Q_π(s_t, a_t).

10. The state energy storage Q-Value network V_π(s_t).

11. The advantage (merit) function A(s_t, a_t): its physical meaning is the difference, in state s_t, between the value of the selected action a_t and the expected value over all possible actions, i.e. A(s_t, a_t) = Q_π(s_t, a_t) - V_π(s_t).
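A minimal sketch of these model components follows. The state layout, penalty weight and the exact composition of the return are illustrative assumptions, not the patented formulas themselves:

```python
import numpy as np

def build_state(load_forecast, load_now, p_bes_now):
    """State s_t, assumed layout: [load forecast, current load, current BES power]."""
    return np.array([load_forecast, load_now, p_bes_now], dtype=float)

def reward(net_load_window, e_next, p_bes, e_min, e_max, p_min, p_max, w=100.0):
    """Return r_t: load-flattening term plus penalties for violated limits."""
    r = -float(np.var(net_load_window))                 # variance-minimisation objective
    r -= w * max(0.0, e_min - e_next, e_next - e_max)   # battery charge limits
    r -= w * max(0.0, p_min - p_bes, p_bes - p_max)     # charge/discharge power limits
    return r
```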
The design concept of the present embodiment is explained on the basis of the above. The starting point of the scheme is that each strategy update should make the expected discounted return increase monotonically, and its expression is therefore written in the following form:

where the added term is the function to be solved; it must satisfy a condition whose purpose is to ensure that the expected discounted return increases monotonically.

Here the two strategies are arbitrary control strategies. It can be seen that the discounted return function used to evaluate a strategy is converted into a form evaluated by the advantage function, and when this term is positive the policy update is an improving update. This expression alone, however, does not give much information, so the states in it are written out explicitly as follows:
Rearranging the terms:

The discounted state visitation probability is then defined:

Its physical meaning is the discount-factor-weighted probability of visiting each state under the new policy, without normalization. In this case, the expected discounted return of the new policy becomes:
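The original formula images are not reproduced in this text; the standard relations they describe are, for reference:

```latex
\eta(\tilde\pi) = \eta(\pi) + \mathbb{E}_{\tau\sim\tilde\pi}\!\Big[\sum_{t=0}^{\infty}\gamma^{t}A_\pi(s_t,a_t)\Big]
                = \eta(\pi) + \sum_{s}\rho_{\tilde\pi}(s)\sum_{a}\tilde\pi(a\mid s)\,A_\pi(s,a),
\qquad
\rho_{\tilde\pi}(s)=\sum_{t=0}^{\infty}\gamma^{t}\,P(s_t=s\mid\tilde\pi)
```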
From this equation it can be seen how to judge, for a new policy, whether it is a better strategy: for every possible state, examine the expected advantage value; if it is positive, i.e. if:

then the new policy is a better strategy, and in the examined state the policy is updated according to:

until, for all reachable states and all actions that may be taken in those states, the advantage is no longer positive, which indicates convergence to the optimal strategy.

Furthermore, in order to accelerate the calculation, and because the park load, the photovoltaic output and the optimal energy storage control capacity do not change greatly between successive control periods, so that the variation in each training round is not particularly large, the change of the discounted state visitation probability caused by the strategy update is neglected and the visitation probability of the old policy is substituted for that of the new policy. In this case:
For reinforcement learning, a parameter vector θ may be used to parameterize the control strategy as π_θ. It can be proved that:

where π_θold is the current parameterized control strategy and π_θ is the updated parameterized control strategy.

For consistency with the notation in the algorithm below, and for ease of description, the subscripts are simply rewritten as follows:

Here θ_k denotes the current policy and θ_(k+1) denotes the updated policy; both are parameterized policy functions, and the policy can be updated using this inequality relation, which is an inequality in these variables.
Let the right-hand side of the above inequality be the surrogate objective.

According to the Majorize-Minimize optimization principle, this embodiment maximizes the surrogate at each step and updates the control strategy π_θ accordingly, so that the expected discounted return increases incrementally.
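The lower bound underlying this Majorize-Minimize step is the standard TRPO bound, reproduced here as an assumption (ε denotes the maximum absolute advantage); the original formula image may differ in detail:

```latex
\eta(\theta) \;\ge\; L_{\theta_k}(\theta) - C\,D_{\mathrm{KL}}^{\max}(\theta_k,\theta),
\qquad C=\frac{4\epsilon\gamma}{(1-\gamma)^{2}}
```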
The idea of the trust region is to impose the trust-region constraint D̄_KL ≤ δ on the manifold; this constraint applies to all states, each state being examined, which is analogous to the Euclidean-space trust-region constraint in optimization theory.
The objective function in the above optimization problem is now calculated from sampled values.

For the expectation term, importance-sampling estimation may be employed. Let q denote the sampling distribution; then, for a given state, this term can be estimated by the following importance sampling:

The final computational form of the above trust-region problem is:
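A standard sample-based form consistent with this description is the following; it is given for reference and the exact expression in the original formula image may differ:

```latex
\max_{\theta}\;\; \mathbb{E}_{s\sim\rho_{\theta_k},\,a\sim q}\!\left[
   \frac{\pi_\theta(a\mid s)}{q(a\mid s)}\,A_{\theta_k}(s,a)\right]
\quad\text{s.t.}\quad
   \mathbb{E}_{s\sim\rho_{\theta_k}}\!\big[D_{\mathrm{KL}}\big(\pi_{\theta_k}(\cdot\mid s)\,\|\,\pi_\theta(\cdot\mid s)\big)\big]\le\delta
```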
In summary, the implementation steps of the optimal scheduling method for solving energy storage participation in peak clipping and valley filling based on trust-region reinforcement learning in this embodiment are as follows:
s100: and setting a parameterized depth Q value network, wherein the parameterized depth Q value network is used for parameterizing the input control strategy by utilizing the network parameters of the parameterized depth Q value network and outputting a plurality of parameterized control strategies.
The setting flow of this embodiment is as follows:
Step 1: discretizing the energy storage control interval into 10 equal sub-intervals, each with the corresponding step length;
Step 2: setting the energy storage strategy neural network corresponding to the approximate state-action energy storage Q-Value network Q_π(s_t, a_t):

Step 3: setting the energy storage state value neural network corresponding to the approximate state energy storage Q-Value network V_π(s_t):
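A sketch of this setup under assumed power limits is shown below; the numeric values are illustrative only, and steps 2 and 3 would instantiate networks such as the PolicyNet and ValueNet classes sketched earlier:

```python
import numpy as np

# Step 1 (assumption): split the admissible BES power range into 10 equal levels.
P_MIN, P_MAX = -500.0, 500.0                 # illustrative charge/discharge limits (kW)
N_ACTIONS = 10
action_powers = np.linspace(P_MIN, P_MAX, N_ACTIONS)
step_length = action_powers[1] - action_powers[0]   # step length of each interval

# Steps 2 and 3 then instantiate the two networks sketched earlier, e.g.:
# policy_net, value_net = PolicyNet(), ValueNet()
```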
S200: acquiring historical active load values, load forecast values and the energy storage power output at the corresponding moments; taking the energy storage power output, the load active value and the forecast value at the initial moment as the initial state, controlling the energy storage with an arbitrary initial energy storage control strategy, iteratively training the parameterized deep Q-value network and updating the network parameters with the objective of minimizing the variance of the load curve, and limiting the network-parameter updates with a trust-region optimization model; training ends when the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied, where D̄_KL denotes the trust-region constraint on the manifold, π_θ denotes the control strategy parameterized by the network parameters θ, and δ denotes the constraint limit value (this condition is not shown in fig. 1).

In this embodiment, the specific process of iteratively training the parameterized deep Q-value network is as follows:
Step 1: assuming the initial distribution for the park to be a standard normal distribution, obtaining the historical active value and the forecast value of the park load and the energy storage power output at the corresponding moments;
1) Taking this as the initial state, controlling the energy storage N times with the control strategy π_θ to obtain the trajectories, where τ_i denotes the i-th trajectory, (s_t, a_t) is the state and action vector of the trajectory at time t, π_θ is the output of the energy storage strategy network, θ are the parameters of the energy storage strategy network, and the collected trajectories form the round-k strategy state-action trajectory;

2) For each step (s_t, a_t) in τ_i, recording its corresponding reward r_t, the reward here being the realized energy-storage/load regulation gain;

3) For each step in τ_i, calculating the action-state value function Q_π(s_t, a_t) of the corresponding step with the action-state neural network;

4) For each step in τ_i, calculating the value V_ω(s_t) of the corresponding step with the energy storage Q-Value network, where ω are the parameters of the energy storage Q-Value network;

6) Estimating the policy gradient g:

Here g is the policy gradient, N denotes the total number of load and energy-storage control rounds, and the gradient of the energy storage strategy network is taken with respect to θ;

8) Solving the following system of equations for the auxiliary search direction:

9) Letting the iteration index j = 0, 1, 2, ... and successively updating the energy storage strategy network parameters:

If the update reduces the energy storage strategy network loss while satisfying the trust-region constraint, the process of updating the energy storage strategy network parameters ends; otherwise, step 9) continues to be executed;

10) For the energy storage Q-Value network, taking the recorded returns as labels and updating its parameters with a stochastic gradient descent algorithm:

Here the gradient of the energy storage Q-Value network loss function L(ω) is taken with respect to the network parameters ω;

11) Repeating 1) to 10) until the energy storage Q-Value network parameters ω and the energy storage strategy network parameters θ converge, at which point training ends.
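Steps 1) to 11) can be summarised in the following sketch. The trajectory-collection helper, the use of the plain gradient in place of the conjugate-gradient solve of Hx = g, and all hyper-parameter values are illustrative assumptions, not the patented implementation itself:

```python
import torch

def discounted_returns(rewards, gamma=0.99):
    """Discounted returns used as labels for the value network (step 10)."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return torch.tensor(list(reversed(out)), dtype=torch.float32)

def train_round(policy_net, value_net, collect_trajectories,
                delta=0.01, alpha=0.8, max_backtracks=10, value_lr=1e-3):
    # steps 1)-2): roll out the current strategy; collect_trajectories is an assumed
    # helper returning (states [T, state_dim], actions [T] long, rewards list)
    states, actions, rewards = collect_trajectories(policy_net)
    returns = discounted_returns(rewards)

    # steps 3)-4): advantage A = Q - V, with the sampled return standing in for Q
    advantages = (returns - value_net(states)).detach()

    # step 6): policy gradient of the surrogate objective
    probs = policy_net(states)
    log_probs = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1))
    surrogate = (log_probs * advantages).mean()
    grads = torch.autograd.grad(surrogate, list(policy_net.parameters()))

    # steps 8)-9): search direction and backtracking line search under the KL constraint;
    # the plain gradient is used here as a simplified stand-in for solving Hx = g
    direction = grads
    old_params = [p.detach().clone() for p in policy_net.parameters()]
    old_probs = probs.detach()
    for j in range(max_backtracks):
        step = alpha ** j
        with torch.no_grad():
            for p, p0, d in zip(policy_net.parameters(), old_params, direction):
                p.copy_(p0 + step * d)
        new_probs = policy_net(states)
        kl = (old_probs * (old_probs.log() - new_probs.log())).sum(dim=1).mean()
        if kl.item() <= delta:          # accept the first step inside the trust region
            break

    # step 10): fit the value network to the discounted returns by SGD
    optimiser = torch.optim.SGD(value_net.parameters(), lr=value_lr)
    value_loss = ((value_net(states) - returns) ** 2).mean()
    optimiser.zero_grad()
    value_loss.backward()
    optimiser.step()
```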
As shown in fig. 2, fig. 2 illustrates the parameter updating process of trust-region reinforcement learning: the direction indicated by the arrow is the direction that guarantees a reduction of the energy storage strategy network loss or of the stochastic gradient, and the corresponding circle is the value range of the parameters under that update. With each update, the range of the parameter update becomes smaller and smaller as the number of updates grows, so that the updating of the network parameters is completed within a limited number of steps.
Fig. 3 and 4 are schematic diagrams of an energy storage strategy neural network and an energy storage state value neural network, respectively. The input of the energy storage strategy neural network comprises a load predicted value, a current load and a current energy storage charging and discharging power, and the probability corresponding to the future energy storage charging and discharging power state is output after the hidden layer operation; the input of the energy storage state value neural network comprises a load predicted value, a current load and a current energy storage charging and discharging power, and after the hidden layer operation, a Q value corresponding to a future energy storage charging and discharging power state is output.
Fig. 5 shows a simplified flow chart of parameterized deep Q-value network training. The training process is based on updating the advantage function: the energy storage strategy neural network updates its parameters with the trust-region method, and the energy storage Q-Value network updates its network parameters with stochastic gradient descent.
S300: acquiring the current active load value and the energy storage power output, inputting them into the trained parameterized deep Q-value network, selecting the strategy corresponding to the maximum value in the output, and issuing it to the energy storage sub-controller for energy storage scheduling control.

Based on the trained parameterized deep Q-value network, the real-time control steps for implementing optimal scheduling in this embodiment are as follows:
step 2: selecting the strategy corresponding to the ten values with the maximum output result in the energy storage strategy network;
The embodiment provides an optimal scheduling method for solving the problem of energy storage participating in peak clipping and valley filling, comprising: setting a parameterized deep Q-value network, training it with historical load data and the energy storage power output at the corresponding moments, and limiting the control strategy updates with a trust-region optimization model during training, so that an optimal strategy is obtained quickly and accurately and optimal scheduling control of the energy storage is realized under the current conditions. The invention uses trust-region reinforcement learning to limit the size of each strategy update in continuous control, so that the distribution form does not change greatly at each update and the return converges monotonically and incrementally; the optimization result can be corrected on line, and charge and discharge constraints are taken into account to achieve optimal peak clipping and valley filling control.
The above is a detailed description of an embodiment of the optimal scheduling method for solving energy storage participation in peak clipping and valley filling; the following is a detailed description of an embodiment of the optimal scheduling system for solving energy storage participation in peak clipping and valley filling.
The embodiment provides an optimal scheduling system for solving the problem that energy storage participates in peak clipping and valley filling, which comprises:
the setting unit is used for setting a parameterized deep Q-value network, wherein the parameterized deep Q-value network is used for parameterizing an input control strategy with its own network parameters and outputting a plurality of parameterized control strategies;

the training unit is used for acquiring historical active load values, load forecast values and the energy storage power output at the corresponding moments; taking the energy storage power output, the load active value and the forecast value at the initial moment as the initial state, controlling the energy storage with an arbitrary initial energy storage control strategy, iteratively training the parameterized deep Q-value network and updating its network parameters with the objective of minimizing the variance of the load curve, and limiting the network-parameter updates with a trust-region optimization model; training ends when the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied, where D̄_KL denotes the trust-region constraint on the manifold, π_θ denotes the control strategy parameterized by the network parameters θ, and δ denotes the constraint limit value;

and the control unit is used for acquiring the current active load value and energy storage power output, inputting them into the trained parameterized deep Q-value network, selecting the strategy corresponding to the maximum value in the output, and issuing it to the energy storage sub-station controller for energy storage scheduling control.
The parameterized deep Q-value network specifically includes: an energy storage strategy neural network and an energy storage state value neural network;

the energy storage strategy neural network is set up according to the approximate state-action energy storage Q-Value network Q_π(s_t, a_t), with θ as its corresponding network parameters;

the energy storage state value neural network is set up according to the approximate state energy storage Q-Value network V_π(s_t), with ω as its corresponding network parameters;

where s denotes the state, a denotes the action, t denotes the time, π denotes the energy storage control strategy, Q_π(s_t, a_t) denotes the value of taking action a_t in state s_t, V_π(s_t) denotes the expected value of state s_t over all possible actions, r denotes the return, and γ denotes the discount factor.
In addition, the trust-region optimization model specifically comprises: maximizing, over the network parameters θ, the expected discounted return of the updated control strategy relative to the control strategy before the update, subject to the trust-region constraint between the updated control strategy and the control strategy before the update;

where π_old denotes the control strategy before the update and π_θ denotes the control strategy updated according to the network parameters θ.
Further, the process of the training unit iteratively training the parameterized deep Q-value network and updating the network parameters specifically includes:

starting from the initial state, controlling the energy storage N times with the control strategy π_θ to obtain the round-k strategy state-action trajectories, where π_θ is the output of the energy storage strategy neural network, θ are the parameters of the energy storage strategy network, τ_i is the i-th trajectory, and (s_t, a_t) is the trajectory state and action vector at time t;

for each step (s_t, a_t) in τ_i, recording the corresponding return, calculating the action-state value function Q_π(s_t, a_t) of the corresponding step with the energy storage strategy neural network based on the return, and calculating the state value function V_ω(s_t) of the corresponding step with the energy storage state value neural network, where ω are the parameters of the energy storage state value neural network;

for each step (s_t, a_t) in τ_i, calculating the advantage function A(s_t, a_t) based on the action-state value function and the state value function;

estimating the policy gradient g based on the advantage function, where N denotes the total number of load and energy-storage control rounds and the gradient of the energy storage strategy neural network is taken with respect to θ;

computing, based on the policy gradient, the second-order partial derivative of the energy storage strategy neural network with respect to θ and solving for the auxiliary variable x, which has no actual physical meaning;

letting the iteration index j = 0, 1, 2, ..., and successively updating the network parameters of the energy storage strategy neural network by a backtracking step, where the maximum number of step-length backtracking steps of the energy storage strategy neural network is limited;

for the energy storage state value neural network, taking the recorded returns as labels and updating its parameters with a stochastic gradient descent algorithm, where the gradient of the energy storage state value neural network loss function L(ω) is taken with respect to the network parameters ω;
Further, in the expression for minimizing the variance of the load curve in this embodiment, n is the number of load data points in a day, determined by the forecast load data, with the current moment corresponding to one of these load data points; P_L(t_k) is the load at moment t_k and is a known quantity: the actual load up to and including the current moment and the forecast load thereafter; P_B(t_k) is the power of the BES between moments t_(k-1) and t_k, with charging positive and discharging negative; it is known up to the current moment and is the control variable thereafter.
It should be noted that, the optimal scheduling system for solving energy storage participation peak clipping and valley filling provided in this embodiment is used to implement the optimal scheduling method provided in the foregoing embodiment, and specific settings of each unit are based on complete implementation of the method, which is not described herein again.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (4)
1. An optimal scheduling method for solving the problem that energy storage participates in peak clipping and valley filling is characterized by comprising the following steps:
setting a parameterized deep Q-value network, wherein the parameterized deep Q-value network is used for parameterizing an input control strategy with its own network parameters and outputting a plurality of parameterized control strategies, the parameterized deep Q-value network specifically comprising: an energy storage strategy neural network and an energy storage state value neural network;

the energy storage strategy neural network being set up according to the approximate state-action energy storage Q-Value network Q_π(s_t, a_t), with θ as its corresponding network parameters;

the energy storage state value neural network being set up according to the approximate state energy storage Q-Value network V_π(s_t), with ω as its corresponding network parameters;

wherein s denotes the state, a denotes the action, t denotes the time, π denotes the energy storage control strategy, Q_π(s_t, a_t) denotes the value of taking action a_t in state s_t, V_π(s_t) denotes the expected value of state s_t over all possible actions, r denotes the return, and γ denotes the discount factor;
acquiring historical active load values, load forecast values and the energy storage power output at the corresponding moments; taking the energy storage power output, the load active value and the forecast value at the initial moment as the initial state, controlling the energy storage with an arbitrary initial energy storage control strategy, iteratively training the parameterized deep Q-value network and updating the network parameters with the objective of minimizing the variance of the load curve, and limiting the network-parameter updates with a trust-region optimization model; training ends when the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied, wherein D̄_KL denotes the trust-region constraint on the manifold, π_θ denotes the control strategy parameterized by the network parameters θ, δ denotes the constraint limit value, and k and k+1 denote successive updates of the network parameters θ; the trust-region optimization model specifically comprises:

maximizing, over the network parameters θ, the expected discounted return of the updated control strategy relative to the control strategy before the update, subject to the trust-region constraint between the updated control strategy and the control strategy before the update, wherein π_old denotes the control strategy before the update and π_θ denotes the control strategy updated according to the network parameters θ; the iterative training of the parameterized deep Q-value network, the updating of the network parameters and the control of the network-parameter updates with the trust-region optimization model until the condition D̄_KL(π_θ(k), π_θ(k+1)) ≤ δ is satisfied and training ends specifically comprise:
starting from the initial state, controlling the energy storage N times with the control strategy π_θ to obtain the strategy state-action trajectories, wherein π_θ is the output of the energy storage strategy neural network, θ are the parameters of the energy storage strategy network, τ_i is the i-th trajectory of the round-k strategy state-action trajectory, and (s_t, a_t) is the trajectory state and action vector at time t;

for each step (s_t, a_t) in τ_i, recording the corresponding return, calculating the action-state value function Q_π(s_t, a_t) of the corresponding step with the energy storage strategy neural network based on the return, and calculating the state value function V_ω(s_t) of the corresponding step with the energy storage state value neural network, wherein ω are the parameters of the energy storage state value neural network;

for each step (s_t, a_t) in τ_i, calculating the advantage function A(s_t, a_t) based on the action-state value function and the state value function;

estimating the policy gradient g based on the advantage function, wherein N denotes the total number of load and energy-storage control rounds and the gradient of the energy storage strategy neural network is taken with respect to θ;

computing, based on the policy gradient, the second-order partial derivative of the energy storage strategy neural network with respect to θ and solving for the auxiliary variable x, which has no actual physical meaning;

letting the iteration index j = 0, 1, 2, ..., and successively updating the network parameters of the energy storage strategy neural network by a backtracking step, wherein the maximum number of step-length backtracking steps of the energy storage strategy neural network is limited;

for the energy storage state value neural network, taking the recorded returns as labels and updating its parameters with a stochastic gradient descent algorithm, wherein the gradient of the energy storage state value neural network loss function L(ω) is taken with respect to the network parameters ω;

repeating the above steps until the energy storage strategy network parameters θ and the energy storage state value network parameters ω converge, at which point training ends;
and acquiring the current load active value and the current energy storage power output, inputting them into the trained parameterized deep Q-value network, selecting the strategy corresponding to the maximum value in the output result, and issuing it to the energy storage sub-controller for energy storage scheduling control.
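The training procedure recited in claim 1 (trajectory rollouts, advantage estimation, a policy-gradient step limited by a trust-domain check with step-length backtracking, and a value-function update) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy sinusoidal load profile, the three-action charge/hold/discharge set, the tabular softmax policy and per-step value table standing in for the patent's neural networks, and a simplified normalized-gradient step used in place of the conjugate-gradient solution of $Hx=g$.

```python
# Minimal sketch of the trust-domain training loop of claim 1 (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
T = 24                                                   # control steps per episode (one day)
load = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, T))    # hypothetical load curve
actions = np.array([-10.0, 0.0, 10.0])                   # BES power: discharge, idle, charge
n_a = len(actions)
theta = np.zeros((T, n_a))                               # softmax policy logits per time step
phi = np.zeros(T)                                        # state-value estimate per time step
delta, alpha, max_backtrack, gamma = 0.01, 0.5, 10, 0.99

def policy(t):
    z = np.exp(theta[t] - theta[t].max())
    return z / z.sum()

def rollout():
    """Run one episode; reward is the negative squared deviation from the mean load."""
    acts, rews = [], []
    for t in range(T):
        a = rng.choice(n_a, p=policy(t))
        net = load[t] + actions[a]
        acts.append(a)
        rews.append(-(net - load.mean()) ** 2)
    return np.array(acts), np.array(rews)

for it in range(200):
    acts, rews = rollout()
    # Discounted returns serve as Q estimates and as labels for the value function
    R = np.zeros(T)
    run = 0.0
    for t in reversed(range(T)):
        run = rews[t] + gamma * run
        R[t] = run
    adv = R - phi                                        # advantage A = Q - V
    # Policy gradient of sum_t log pi_theta(a_t | s_t) * A_t
    g = np.zeros_like(theta)
    for t in range(T):
        p = policy(t)
        grad_logp = -p
        grad_logp[acts[t]] += 1.0                        # d/dlogits of log softmax
        g[t] = grad_logp * adv[t]
    # Simplified natural-gradient style step with backtracking until the KL trust region holds
    old_theta = theta.copy()
    old_probs = np.array([policy(t) for t in range(T)])
    step = g / (np.linalg.norm(g) + 1e-8) * np.sqrt(2 * delta)
    for j in range(max_backtrack):
        theta = old_theta + (alpha ** j) * step
        new_probs = np.array([policy(t) for t in range(T)])
        kl = np.mean(np.sum(old_probs * np.log(old_probs / new_probs), axis=1))
        if kl <= delta:
            break                                        # trust-domain constraint satisfied
    else:
        theta = old_theta                                # no acceptable step found, keep old policy
    # Value-function update by gradient descent on (V - R)^2
    phi += 0.1 * (R - phi)

print("final mean |advantage|:", np.abs(adv).mean())
```

The backtracking loop mirrors the claimed step-length backtracking: the step is shrunk by the factor $\alpha$ until the KL trust-domain constraint holds or the maximum number of backtracking iterations is reached.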
2. The optimal scheduling method for solving the problem of energy storage participation peak clipping and valley filling according to claim 1, wherein the expression of the variance of the minimized load curve is specifically as follows:
$$\min F=\frac{1}{N_{d}}\sum_{t=1}^{N_{d}}\Big(P_{L}(t)+P_{B}(t)-\bar{P}\Big)^{2},\qquad \bar{P}=\frac{1}{N_{d}}\sum_{t=1}^{N_{d}}\big(P_{L}(t)+P_{B}(t)\big)$$
in the formula, $N_{d}$ is the number of load data points in a day, determined from the load measurement data, and the current time is set to correspond to the $k$-th ($1\le k\le N_{d}$) load data point; $P_{L}(t)$ is the load at time $t$ and is a known quantity, being the actual load for $t\le k$ and the predicted load for $t>k$; $P_{B}(t)$ is the BES charging/discharging power between time $t-1$ and time $t$, with charging positive and discharging negative, a known quantity for $t\le k$ and the control variable for $t>k$.
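As a worked illustration of the claim-2 objective, the sketch below computes the variance of the combined load/storage curve. The point count, load profile, and function names are hypothetical, not taken from the patent; the caller would concatenate actual load for $t\le k$ and predicted load for $t>k$.

```python
# Hedged sketch: variance of the net curve P_L(t) + P_B(t) over one day.
import numpy as np

def load_curve_variance(p_load: np.ndarray, p_bes: np.ndarray) -> float:
    """Variance of the combined curve; BES charging is positive, discharging negative."""
    net = p_load + p_bes
    return float(np.mean((net - net.mean()) ** 2))

# Example: 96 points (15-minute resolution), idle storage as a baseline plan
n_points = 96
p_load = 60 + 25 * np.sin(np.linspace(0, 2 * np.pi, n_points))
p_bes = np.zeros(n_points)
print(load_curve_variance(p_load, p_bes))
```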
3. An optimal scheduling system for solving the problem that energy storage participates in peak clipping and valley filling, characterized by comprising:
a setting unit, configured to set a parameterized deep Q-value network, wherein the parameterized deep Q-value network is used for parameterizing an input control strategy with its own network parameters and outputting a plurality of parameterized control strategies, and specifically comprises: an energy storage strategy neural network and an energy storage state value neural network;
the energy storage strategy neural network is a Q-Value network based on approximate state-action energy storageSet up as corresponding network parameters;
the energy storage state value neural network is a network set up to approximate the energy storage state value $V_{\pi}(s)$, with $\varphi$ as its corresponding network parameters;
wherein $s$ denotes the state, $a$ denotes the action, $t$ denotes the time, $\pi$ denotes the energy storage control strategy, $Q_{\pi}(s,a)=\mathbb{E}_{\pi}\big[\sum_{t}\gamma^{t}r_{t}\mid s_{0}=s,a_{0}=a\big]$ denotes the value of taking action $a$ in state $s$, $V_{\pi}(s)=\mathbb{E}_{a\sim\pi}\big[Q_{\pi}(s,a)\big]$ denotes the expected value of state $s$ over all possible actions $a$, $r$ denotes the return, and $\gamma$ denotes the discount factor;
a training unit, configured to acquire historical active values and predicted values of the load and the energy storage power output at the corresponding moments, take the energy storage power output, the load active value and the load predicted value at the initial moment as the initial state input, control the energy storage with an arbitrary initial energy storage control strategy, iteratively train the parameterized deep Q-value network with the objective of minimizing the variance of the load curve and update the network parameters, use a trust-domain optimization model to control the number of network-parameter updates, and end the training when the condition $\bar{D}_{KL}(\pi_{\theta_{\text{old}}},\pi_{\theta})\le\delta$ is met, wherein $\bar{D}_{KL}$ denotes the trust-domain constraint on the policy manifold, $\pi_{\theta}$ denotes the control strategy parameterized by the network parameters $\theta$, $\delta$ denotes the constraint limit value, and $\theta_{\text{old}}$ and $\theta$ denote the network parameters before and after the update; the trust-domain optimization model is specifically:
$$\max_{\theta}\; L_{\pi_{\theta_{\text{old}}}}(\pi_{\theta})=\mathbb{E}_{s,a\sim\pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_{\theta}(a\mid s)}{\pi_{\theta_{\text{old}}}(a\mid s)}\,A_{\pi_{\theta_{\text{old}}}}(s,a)\right]\quad\text{s.t.}\quad \bar{D}_{KL}(\pi_{\theta_{\text{old}}},\pi_{\theta})\le\delta$$
in the formula, $\pi_{\theta_{\text{old}}}$ denotes the control strategy before the update, $\pi_{\theta}$ denotes the control strategy updated according to the network parameters $\theta$, $L_{\pi_{\theta_{\text{old}}}}(\pi_{\theta})$ denotes the expected discounted return of the updated control strategy relative to the control strategy before the update, and $\bar{D}_{KL}(\pi_{\theta_{\text{old}}},\pi_{\theta})$ denotes the trust-domain constraint between the updated control strategy and the control strategy before the update; the process in which the training unit iteratively trains the parameterized deep Q-value network and updates the network parameters specifically comprises:
taking the initial state as the starting state, controlling the energy storage $M$ times with the control strategy $\pi_{\theta}$ to obtain the strategy state-action trajectories $\tau=\{\tau_{1},\tau_{2},\dots,\tau_{M}\}$, wherein $\pi_{\theta}(a\mid s)$ is the output result of the energy storage strategy neural network, $\theta$ is the parameter of the energy storage strategy network, $\tau_{i}$ is the $i$-th round strategy state-action trajectory, $i$ is the trajectory index, and $s^{i}_{t}$ and $a^{i}_{t}$ are the trajectory state and the action vector of the $i$-th trajectory at time $t$;
for each step $t$ in $\tau_{i}$, recording the corresponding return, calculating the action-state value function $Q_{\pi}(s_{t},a_{t})$ of the corresponding step based on the return by using the energy storage strategy neural network, and calculating the state value function $V_{\varphi}(s_{t})$ of the corresponding step by using the energy storage state value neural network, wherein $\varphi$ is the parameter of the energy storage state value neural network;
for each step $t$ in $\tau_{i}$, calculating the advantage function from the action-state value function and the state value function: $A(s_{t},a_{t})=Q_{\pi}(s_{t},a_{t})-V_{\varphi}(s_{t})$;
estimating the policy gradient based on the advantage function: $g=\frac{1}{N}\sum_{i}\sum_{t}\nabla_{\theta}\log\pi_{\theta}(a^{i}_{t}\mid s^{i}_{t})\,A(s^{i}_{t},a^{i}_{t})$, wherein $N$ denotes the total number of control rounds of the load and the energy storage, and $\nabla_{\theta}$ denotes the gradient of the energy storage strategy neural network with respect to $\theta$;
computing, based on the policy gradient, the second-order partial derivative $H$ of the energy storage strategy neural network's trust-domain constraint with respect to $\theta$, and solving $Hx=g$ for $x$, wherein $x$ is an auxiliary variable with no actual physical significance;
letting the iteration subscript $j=0,1,\dots,L$, sequentially updating the network parameters of the energy storage strategy neural network to $\theta_{\text{new}}=\theta_{\text{old}}+\alpha^{j}\sqrt{\tfrac{2\delta}{x^{\top}Hx}}\,x$, wherein $L$ denotes the maximum number of step-length backtracking iterations of the energy storage strategy neural network and $\alpha\in(0,1)$ is the backtracking coefficient;
for the energy storage state value neural network, taking the discounted return $\hat{R}_{t}$ as the label and adopting the stochastic gradient descent algorithm to update the parameters to $\varphi\leftarrow\varphi-\beta\,\nabla_{\varphi}\mathcal{L}(\varphi)$, wherein $\nabla_{\varphi}\mathcal{L}(\varphi)$ is the gradient of the energy storage state value neural network loss function $\mathcal{L}(\varphi)=\sum_{t}\big(V_{\varphi}(s_{t})-\hat{R}_{t}\big)^{2}$ with respect to the network parameters $\varphi$, and $\beta$ is the learning rate;
repeating the above steps until the conditions $\bar{D}_{KL}(\pi_{\theta_{\text{old}}},\pi_{\theta})\le\delta$ and $L_{\pi_{\theta_{\text{old}}}}(\pi_{\theta})\ge 0$ are both met, at which point the training ends;
and a control unit, configured to acquire the current load active value and the current energy storage power output, input them into the trained parameterized deep Q-value network, select the strategy corresponding to the maximum value in the output result, and issue it to the energy storage sub-controller for energy storage scheduling control.
4. The optimal scheduling system for solving the problem of energy storage participation peak clipping and valley filling according to claim 3, wherein the expression of the variance of the minimized load curve is specifically as follows:
$$\min F=\frac{1}{N_{d}}\sum_{t=1}^{N_{d}}\Big(P_{L}(t)+P_{B}(t)-\bar{P}\Big)^{2},\qquad \bar{P}=\frac{1}{N_{d}}\sum_{t=1}^{N_{d}}\big(P_{L}(t)+P_{B}(t)\big)$$
in the formula, $N_{d}$ is the number of load data points in a day, determined by the predicted load data, and the current time is set to correspond to the $k$-th ($1\le k\le N_{d}$) load data point; $P_{L}(t)$ is the load at time $t$ and is a known quantity, being the actual load for $t\le k$ and the predicted load for $t>k$; $P_{B}(t)$ is the BES charging/discharging power between time $t-1$ and time $t$, with charging positive and discharging negative, a known quantity for $t\le k$ and the control variable for $t>k$.
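For the control unit recited in claims 1 and 3, a minimal sketch of the dispatch step is given below. `TrainedQNetwork` and `StorageSubController` are hypothetical stand-ins, since the patent does not specify the interfaces of the trained network or the energy storage sub-controller.

```python
# Hedged sketch of the control-unit step: evaluate the trained network on the
# current state and dispatch the maximum-value strategy to the sub-controller.
from typing import Sequence

class TrainedQNetwork:
    def evaluate(self, state: Sequence[float]) -> Sequence[float]:
        # Placeholder: would return one value per candidate control strategy.
        return [0.2, 0.7, 0.1]

class StorageSubController:
    def dispatch(self, strategy_index: int) -> None:
        print(f"dispatching strategy {strategy_index} to energy storage")

def schedule_step(net: TrainedQNetwork, ctrl: StorageSubController,
                  load_active: float, storage_output: float) -> int:
    values = net.evaluate([load_active, storage_output])
    best = max(range(len(values)), key=lambda i: values[i])  # max-value strategy
    ctrl.dispatch(best)
    return best

schedule_step(TrainedQNetwork(), StorageSubController(), load_active=62.5, storage_output=-5.0)
```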
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210916196.3A CN115001002B (en) | 2022-08-01 | 2022-08-01 | Optimal scheduling method and system for solving problem of energy storage participation peak clipping and valley filling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115001002A CN115001002A (en) | 2022-09-02 |
CN115001002B true CN115001002B (en) | 2022-12-30 |
Family ID: 83021019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210916196.3A Active CN115001002B (en) | 2022-08-01 | 2022-08-01 | Optimal scheduling method and system for solving problem of energy storage participation peak clipping and valley filling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115001002B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116826816B (en) * | 2023-08-30 | 2023-11-10 | 湖南大学 | Energy storage active-reactive coordination multiplexing method considering electric energy quality grading management |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220164657A1 (en) * | 2020-11-25 | 2022-05-26 | Chevron U.S.A. Inc. | Deep reinforcement learning for field development planning optimization |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109347149A (en) * | 2018-09-20 | 2019-02-15 | 国网河南省电力公司电力科学研究院 | Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned plane track optimizing method, device and unmanned plane based on deeply study |
CN110365057A (en) * | 2019-08-14 | 2019-10-22 | 南方电网科学研究院有限责任公司 | Distributed energy participation power distribution network peak regulation scheduling optimization method based on reinforcement learning |
CN113242469A (en) * | 2021-04-21 | 2021-08-10 | 南京大学 | Self-adaptive video transmission configuration method and system |
CN113572157A (en) * | 2021-07-27 | 2021-10-29 | 东南大学 | User real-time autonomous energy management optimization method based on near-end policy optimization |
CN114630299A (en) * | 2022-03-08 | 2022-06-14 | 南京理工大学 | Information age-perceptible resource allocation method based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Lyu Xiaoqian; Research on dynamic economic dispatch of power systems addressing the uncertainty of renewable-energy forecast deviations; China Master's Theses Full-text Database, Engineering Science and Technology II; 2022-02-28; pp. 29-30 *
Also Published As
Publication number | Publication date |
---|---|
CN115001002A (en) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110059844B (en) | Energy storage device control method based on ensemble empirical mode decomposition and LSTM | |
Jasmin et al. | Reinforcement learning approaches to economic dispatch problem | |
CN112614009A (en) | Power grid energy management method and system based on deep expected Q-learning | |
Zhou et al. | Reinforcement learning-based scheduling strategy for energy storage in microgrid | |
CN112491094B (en) | Hybrid-driven micro-grid energy management method, system and device | |
CN105631528B (en) | Multi-target dynamic optimal power flow solving method based on NSGA-II and approximate dynamic programming | |
CN117277357B (en) | Novel thermal power energy storage frequency modulation method and system adopting flow battery and electronic equipment | |
CN111367349A (en) | Photovoltaic MPPT control method and system based on prediction model | |
CN112213945B (en) | Improved robust prediction control method and system for electric vehicle participating in micro-grid group frequency modulation | |
CN114784823A (en) | Micro-grid frequency control method and system based on depth certainty strategy gradient | |
CN115001002B (en) | Optimal scheduling method and system for solving problem of energy storage participation peak clipping and valley filling | |
CN116629461B (en) | Distributed optimization method, system, equipment and storage medium for active power distribution network | |
CN116436003B (en) | Active power distribution network risk constraint standby optimization method, system, medium and equipment | |
CN116862050A (en) | Time sequence network-based daily prediction method, system, storage medium and equipment for carbon emission factors | |
CN118381095B (en) | Intelligent control method and device for energy storage charging and discharging of new energy micro-grid | |
CN115986839A (en) | Intelligent scheduling method and system for wind-water-fire comprehensive energy system | |
Harrold et al. | Battery control in a smart energy network using double dueling deep q-networks | |
CN111313449A (en) | Cluster electric vehicle power optimization management method based on machine learning | |
CN111516702B (en) | Online real-time layered energy management method and system for hybrid electric vehicle | |
Wang et al. | Prioritized sum-tree experience replay TD3 DRL-based online energy management of a residential microgrid | |
CN117833316A (en) | Method for dynamically optimizing operation of energy storage at user side | |
CN111799820B (en) | Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system | |
CN115459320B (en) | Intelligent decision-making method and device for aggregation control of multipoint distributed energy storage system | |
CN114048576B (en) | Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid | |
CN116979579A (en) | Electric automobile energy-computing resource scheduling method based on safety constraint of micro-grid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||