CN112952831A - Intra-day optimization operation strategy for providing stacking service by load side energy storage - Google Patents


Info

Publication number
CN112952831A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110416788.4A
Other languages
Chinese (zh)
Inventor
李卫东
温可瑞
刘娆
巴宇
王海霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008 Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • H02J3/24 Arrangements for preventing or reducing oscillations of power in networks
    • H02J3/241 The oscillation concerning frequency
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32 Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An intra-day optimized operation strategy for load-side energy storage providing stacked services. The invention proposes a hybrid 'look-ahead and value function approximation' intra-day operation strategy that solves a two-stage robust approximate dynamic programming model by online rolling optimization, so that the power reference point can be optimized in real time within a limited time window. In the day-ahead stage, the approximate dynamic programming approach is used to introduce an approximation function of the post-decision state, which represents the long-term expected net benefit in the states of different periods, and the approximation function is trained offline with a temporal-difference learning algorithm. During intra-day operation, the power reference point of each period is obtained dynamically by online rolling optimization of the two-stage robust approximate dynamic programming model, combining rolling-updated forecasts of electricity price and load, the frequency uncertainty set and the long-term approximation function. The proposed strategy effectively guarantees the frequency regulation capability of the load-side energy storage, quickly evaluates the long-term impact of real-time decisions through offline training and online application of the approximation function, and balances overall economic benefit against online computational overhead.

Description

Intra-day optimization operation strategy for providing stacking service by load side energy storage
Technical Field
The invention belongs to the field of operation control of a battery energy storage system, and relates to an intra-day optimization operation strategy for providing stacking service by load side energy storage.
Background
With the establishment of the electricity market system and the rapid improvement of battery performance, the number and scale of load-side energy storage installations have grown markedly. By responding to real-time electricity prices, charging when prices are low and discharging when they are high, load-side energy storage can effectively reduce the electricity purchase cost of power consumers. However, because battery cells remain relatively expensive, the economic benefit of this single application mode often fails to offset the initial investment cost. Flexibly participating in the ancillary services market in addition to responding to electricity prices helps to exploit the potential technical and economic value of the storage. Using load-side energy storage to provide the two stacked services of electricity price response and primary frequency regulation (PFR) can therefore improve its technical and economic value.
After bidding in the PFR market and clearing capacity day-ahead, the power reference point of each period must be optimized during intra-day operation, and the state of charge (SoC) must be adjusted dynamically so as to respond to electricity prices while keeping a reliable margin for frequency regulation. Uncertainty in this multidimensional information makes intra-day operating decisions challenging: frequency data is on the scale of seconds and is difficult to predict effectively, whereas electricity price and load are on the scale of minutes and can be forecast with rolling updates. How to use information known to different degrees to optimize in real time within a limited time window, while ensuring both the economic benefit of intra-day operation and the reliability of frequency regulation, is a problem that urgently needs to be solved.
Existing real-time optimization strategies are look-ahead strategies, which make online optimization decisions using forecast information. Depending on how much information is used and on the optimization horizon, look-ahead strategies can be divided into model predictive control (MPC) and stochastic model predictive control (SMPC). Based on prior frequency information and rolling-updated short-term forecasts, MPC obtains a real-time decision by solving a robust optimization model over the rolling prediction horizon. Although MPC meets the time limit and guarantees frequency regulation reliability, its local optimization horizon reduces the economic benefit of operation. SMPC solves a two-stage robust-stochastic optimization model by online rolling optimization; by including a stochastic stage it extends the optimization horizon and improves the operating benefit, but the problem becomes so large that the computation time can hardly meet the optimization time limit.
Disclosure of Invention
To overcome these shortcomings, the invention provides a hybrid 'look-ahead and value function approximation' (LVFA) strategy that solves a two-stage robust approximate dynamic programming model by online rolling optimization.
The technical scheme adopted by the invention is as follows:
An intra-day optimized operation strategy for load-side energy storage providing stacked services, comprising the following steps:
Step 1: Under the dual market mechanism of real-time electricity price and PFR, load-side energy storage is used to provide the stacked services of electricity price response and PFR. The basic flow is shown in fig. 1.
Day-ahead bidding (1 day): the capacity bid into the PFR market is optimized based on historical statistics and short-term forecast information. Once the PFR market has cleared, the winning bid capacity and the compensation price are known, and PFR service must be provided strictly according to the winning bid capacity during the day.
Intra-day operation (5 min): the power reference point of each period is optimized dynamically based on prior frequency information and rolling-updated short-term forecasts. The power reference point coordinates two functions: first, it reduces the electricity cost by responding to the real-time price, charging at low prices and discharging at high prices; second, it dynamically adjusts the SoC margin so that the storage can deliver the required PFR performance. The power reference point is held constant within a period.
Real-time control (1 s): the real-time control power of the load-side storage is calculated from the local frequency information and the power reference point, and power commands are allocated optimally among the storage units using real-time monitoring information from the battery management system (BMS).
In this process, intra-day optimized operation is the key link between the upper and lower layers, and its decisions significantly affect the technical and economic benefit of the load-side storage. The invention addresses the intra-day optimized operation problem, with the PFR capacity and compensation price of the load-side storage known after the market has cleared.
Step 2: The dynamic process of intra-day optimized operation is modeled, and the relevant information, decision variables, benefit function and so on are determined. The finite horizon $T$ of intra-day operation is discretized with the duration of a real-time electricity price, $\Delta t = 5$ min, as the granularity, and the time set is defined as $\mathcal{T} = \{0, \Delta t, 2\Delta t, \ldots, T\}$.
As shown in fig. 1, the energy storage and the aggregated load operate in coordination during the day, and the real-time power they exchange with the grid satisfies the active power balance. Taking power injected from the grid as the positive direction:
$$P_t^g = P_t^b + P_t^l \tag{1}$$
where $P_t^g$, $P_t^b$ and $P_t^l$ are the grid, energy storage and load power at time $t$, respectively. The real-time storage power $P_t^b$ is the superposition of the period's power reference point $P_t^e$ and the real-time PFR power:
$$P_t^b = P_t^e + \alpha \cdot \Delta f_t \tag{2}$$
where $\alpha \cdot \Delta f_t$ is the real-time PFR power, provided by emulating the droop control of a synchronous generator so as to respond automatically to the frequency deviation; $\alpha$ is the droop coefficient; $\Delta f_t$ is the average frequency deviation within $\Delta t$; and $P_t^e$ is the power reference point of period $t$. The frequency sampling period is on the order of seconds or milliseconds. If the frequency information were modeled at this granularity, the intra-day model would have two time scales, which would greatly increase the complexity of the problem. Since the decision period is $\Delta t$, the invention uses the average frequency deviation $\Delta f_t$ within $\Delta t$ to approximate the corresponding sequence of frequency deviations; the effect of this approximation on the operating process is less than $2 \times 10^{-4}$.
The load-side storage delivers its PFR response according to the power-frequency characteristic: once the frequency leaves the dead band, the storage responds linearly to the frequency deviation; when the frequency deviation exceeds the linear response interval, it outputs its winning bid capacity. Accordingly, the droop coefficient $\alpha$ in equation (2) can be expressed as:
Figure BDA0003026243960000021
where $1_{\{z\}}$ is the indicator function, equal to 1 when condition $z$ is true and 0 otherwise; $\Delta f_{max}$ is the maximum frequency deviation of the linear response interval; and $P^f$ is the winning bid capacity of the storage in the PFR market.
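The dead-band, linear and saturated regions described above can be illustrated with a minimal Python sketch. Equation (3) itself is not reproduced in this text, so the form below is an assumption: the response is taken as proportional to the deviation once outside the dead band and capped at the winning bid capacity. The function name `pfr_power`, the dead-band parameter `df_dead` and the numeric values are hypothetical.

```python
def pfr_power(delta_f, p_bid, df_max, df_dead=0.0):
    """Illustrative PFR power for a frequency deviation delta_f (Hz).

    Assumed shape (not taken from equation (3)): zero inside the dead band,
    proportional to delta_f in the linear interval, and equal to the winning
    bid capacity p_bid once |delta_f| reaches df_max.
    """
    magnitude = abs(delta_f)
    if magnitude <= df_dead:
        return 0.0
    # Sign follows delta_f, matching the form alpha * delta_f in equation (2).
    sign = 1.0 if delta_f > 0 else -1.0
    if magnitude >= df_max:
        return sign * p_bid
    return sign * p_bid * magnitude / df_max


if __name__ == "__main__":
    for df in (0.01, 0.1, -0.5):
        print(df, pfr_power(df, p_bid=2.0, df_max=0.2, df_dead=0.02))
```

Whether the linear segment starts from zero at the dead-band edge or continues the line through the origin depends on the market rule; the sketch uses the latter for simplicity.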
According to the PFR market rules, the PFR deviation rate must stay below the maximum allowed value. To avoid the risk of penalties for regulation deviation, the invention requires PFR to be provided with 100% reliability. The frequency regulation capability of the load-side storage must therefore be guaranteed in terms of both power and energy:
Figure BDA0003026243960000031
where $P^n$ is the rated power of the storage; $s^{max}$ and $s^{min}$ are the upper and lower SoC limits, respectively.
As the storage charges and discharges during intra-day operation, the SoC transition from time $t$ to $t + \Delta t$ can be described as:
Figure BDA0003026243960000032
where $s_t$ is the SoC of the storage at time $t$; $\eta^c$ and $\eta^d$ are the charging and discharging efficiencies, respectively; and $e^{max}$ is the maximum stored energy.
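A minimal Python sketch of an SoC transition of this kind is given below. Equation (5) is not reproduced in this text, so the update is an assumed standard form: charging energy is scaled by $\eta^c$, discharging energy is divided by $\eta^d$, and the result is normalized by $e^{max}$. Positive power is taken as charging, following the sign convention of equation (1). The function name and default efficiencies are hypothetical.

```python
def soc_next(s_t, p_b, dt_hours, e_max, eta_c=0.95, eta_d=0.95):
    """Assumed SoC update from t to t + dt (not copied from equation (5)).

    p_b > 0 means the storage absorbs power (charging), p_b < 0 means it
    delivers power (discharging). Energy is in the same unit as e_max.
    """
    if p_b >= 0:
        delta_energy = eta_c * p_b * dt_hours   # losses reduce stored energy
    else:
        delta_energy = p_b * dt_hours / eta_d   # losses increase energy drawn
    return s_t + delta_energy / e_max


if __name__ == "__main__":
    # 5-minute period, 1 MW charging into a 10 MWh storage (illustrative numbers).
    print(soc_next(0.5, 1.0, 5.0 / 60.0, 10.0))
```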
Providing stacked services under the dual market mechanism both reduces the electricity purchase cost through price response and earns compensation revenue for the PFR service. At the same time, frequent charging and discharging ages the battery, and the aging cost is calculated from the charged and discharged energy.
In summary, the net economic benefit $C_t$ generated by the load-side storage in a single operating period can be expressed as:
Figure BDA0003026243960000033
in the formula:
Figure BDA0003026243960000034
is the real-time electricity price of period $t$; $c^f$ is the PFR compensation price; and $c^{ag}$ is the unit aging cost.
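The three components named above (electricity purchase cost, PFR compensation and aging cost) can be combined as in the following Python sketch. Equation (6) is not reproduced in this text, so the exact form is an assumption: purchase cost is charged on grid power, compensation is paid on the winning bid capacity, and aging cost is charged on the absolute storage throughput. All names and numbers are hypothetical.

```python
def net_benefit(price, p_grid, p_storage, p_bid, c_f, c_ag, dt_hours):
    """Assumed single-period net benefit (illustrative, not equation (6)).

    price     : real-time electricity price of the period
    p_grid    : power drawn from the grid (positive = purchase)
    p_storage : real-time storage power (positive = charging)
    p_bid     : winning bid capacity in the PFR market
    c_f, c_ag : PFR compensation price and unit aging cost
    """
    purchase_cost = price * p_grid * dt_hours
    pfr_revenue = c_f * p_bid * dt_hours
    aging_cost = c_ag * abs(p_storage) * dt_hours
    return pfr_revenue - purchase_cost - aging_cost


if __name__ == "__main__":
    print(net_benefit(price=0.5, p_grid=4.0, p_storage=1.0,
                      p_bid=2.0, c_f=0.1, c_ag=0.05, dt_hours=1.0))
```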
Step 3: Intra-day optimized operation is a sequential decision process under uncertainty, so the problem is further formulated as a Markov decision process (MDP) model. As a general model of stochastic sequential decision problems, an MDP comprises five basic elements: state variables, decision variables, random information, a transfer function and an objective function.
State variables: these reflect the current state, including the physical state and the information state. They are defined as:
Figure BDA0003026243960000035
Decision variables: the power reference point is adjusted according to the current state, so the decision variable $x_t$ is:
$$x_t = (P_t^e), \quad x_t \in \chi_t \tag{8}$$
where $\chi_t$ is the feasible space of the decision variable defined by equations (2) to (5), which guarantees the frequency regulation capability of the load-side storage during operation.
Random information: to model the random factors in the operating process, the random information $W_t$ is defined as:
Figure BDA0003026243960000036
in the formula:
Figure BDA0003026243960000041
and
Figure BDA0003026243960000042
are the deviations between the predicted (estimated) values and the actual values of the random information.
On this basis, the intra-day operating process can be described by the following sequence of states, decisions and random information:
Figure BDA0003026243960000043
Transfer function: this describes the transition from the current state to the next state under the decision and the random information:
$$S_{t+\Delta t} = S^M(S_t, x_t, W_{t+\Delta t}) \tag{11}$$
where $S^M(\cdot)$ covers the dynamic transition of every state variable. The SoC transition between adjacent periods is given by equation (5).
The dynamic transitions of the real-time electricity price, load power and frequency deviation are all state-independent information processes, expressed respectively as:
Figure BDA0003026243960000044
Figure BDA0003026243960000045
Figure BDA0003026243960000046
in the formula:
Figure BDA0003026243960000047
and
Figure BDA0003026243960000048
are the predicted (estimated) values of the electricity price, load and frequency deviation;
Figure BDA0003026243960000049
$\Delta f_{t+\Delta t}$ are the actual values of the electricity price, load and frequency deviation, respectively;
Figure BDA00030262439600000410
are the deviations between the predicted (estimated) values and the actual values of the electricity price, load and frequency deviation, respectively.
Objective function: for the intra-day operation problem, the goal is to maximize the cumulative expected net benefit over all periods while satisfying the relevant technical performance conditions. The objective function $F^*$ is expressed as:
Figure BDA00030262439600000411
in the formula: e {. | · } represents a conditional expectation; ct(St,xt) Is the net benefit function of time period t, the physical meaning of which is equivalent to equation (6); s0Represents an initial state; ctRepresenting a time period t gain; stRepresents the time period t state; x is the number oftRepresenting the time period t decision.
Step 4: The intra-day MDP model built in step 3 defines the relevant variables and the dynamic transition process as a whole. The real-time optimization problem is solved as follows: let the current time be $t_c$; according to the horizon covered by the rolling-updated forecast, the subsequent horizon is divided into two stages, a short-term horizon and a long-term horizon.
Definition 1. Short-term horizon: the rolling-updated prediction horizon from the current time $t_c$ to the future time $t_f = t_c + H\Delta t$, where $H$ is the number of rolling-updated periods. Within the horizon $H\Delta t$, the electricity price and load can be forecast accurately.
Definition 2. Long-term horizon: the horizon from the future time $t_f$ to the end time $T$. For this horizon, the day-ahead forecasts of electricity price and load and their error distributions are known.
The invention proposes to select an approximation function of a suitable type
Figure BDA00030262439600000412
to represent the expected net benefit over the long-term horizon. Based on the MDP model and the information available day-ahead, the approximation function of each period
Figure BDA00030262439600000413
is computed offline so that it closely approximates the actual expected net benefit. This offline computation and online application fundamentally reduces the complexity of the online rolling optimization model, so that both global optimization benefit and online computational efficiency are taken into account. The basic idea of dividing the horizon and introducing an approximation function is shown in fig. 2.
Therefore, a two-stage robust approximate dynamic programming model that combines prior frequency information with the offline approximation function is built for intra-day operation. Over the short-term horizon, frequency statistics are used to obtain a prior uncertainty set, the ultra-short-term forecasts of electricity price and load are updated on a rolling basis, and a robust model is built over the rolling prediction horizon to guarantee PFR reliability. Over the long-term horizon, the offline-computed approximation function is called to evaluate the subsequent expected net benefit quickly. During real-time optimized operation, the optimization model at the current time $t_c$ can be expressed as:
Figure BDA0003026243960000051
s.t. (1)-(8)    (16)
Figure BDA0003026243960000052
where $\Gamma$ is the uncertainty set of the frequency deviation. Following the robust optimization approach, $\Gamma$ characterizes the fluctuation range of the frequency deviation, and the optimal solution under the worst case over this set is sought, which guarantees the frequency regulation capability. The invention describes the fluctuation range as an interval, forming a box uncertainty set:
$$\Gamma = \{\Delta f_t \mid \Delta f_{low} \le \Delta f_t \le \Delta f_{up},\ t \in \mathcal{T}\} \tag{17}$$
where $\Delta f_{up}$ and $\Delta f_{low}$ are the upper and lower bounds of the frequency deviation, respectively, and $\mathcal{T}$ is the time set.
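For a box set such as (17), the worst case of a quantity that is linear in $\Delta f_t$ lies at one of the two interval bounds. The Python sketch below illustrates this for the storage power of equation (2), under the simplifying assumption that the droop coefficient is constant over the interval (in the invention $\alpha$ depends on the deviation). The function names and the rated-power check are hypothetical and only illustrate the principle.

```python
def worst_case_storage_power(p_ref, alpha, df_low, df_up):
    """Return (min, max) of p_ref + alpha * df over df in [df_low, df_up].

    Because the expression is linear in df, the extremes are attained at
    the bounds of the box uncertainty set.
    """
    ends = (p_ref + alpha * df_low, p_ref + alpha * df_up)
    return min(ends), max(ends)


def is_robustly_feasible(p_ref, alpha, df_low, df_up, p_rated):
    """Check that the power limit holds for every deviation in the box."""
    lowest, highest = worst_case_storage_power(p_ref, alpha, df_low, df_up)
    return -p_rated <= lowest and highest <= p_rated


if __name__ == "__main__":
    print(worst_case_storage_power(1.0, 10.0, -0.1, 0.1))
    print(is_robustly_feasible(1.0, 10.0, -0.1, 0.1, p_rated=2.0))
```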
According to the optimization model in equation (16), the real-time operation strategy can be classified as a hybrid of look-ahead and value function approximation. The first part inside the parentheses of equation (16) takes the rolling-updated forecast information into account and therefore has a 'look-ahead' structure; the second part uses an approximation function to represent the expected net benefit over the corresponding horizon and therefore has a 'value function approximation' structure. The intra-day optimized operation process that results from mixing these two structures is shown in fig. 3. Combining the rolling-updated forecasts of electricity price and load, the frequency uncertainty set and the offline-computed long-term approximation function, the power reference point of each period is obtained dynamically by solving the optimization model (16) online on a rolling basis.
Step 5: The performance of the proposed real-time operation strategy depends on selecting the type of approximation function and then computing the approximation function of each period offline so that it closely approximates the actual expected net benefit. The invention introduces the post-decision state and its piecewise linear function (PLF) approximation, and converts the Bellman equation into an approximate dynamic programming form.
Based on the Bellman principle, the intra-day MDP can be decoupled into single-period subproblems, and the expected benefit in the state of each period can then be solved by backward recursion. Define the optimal value function
Figure BDA0003026243960000053
as the maximum cumulative expected net benefit from time $t$ to the final time $T$. Equation (15) can then be written recursively as:
Figure BDA0003026243960000054
However, solving equation (18) requires computing the conditional expectation for all feasible states
Figure BDA0003026243960000061
Because discretizing the state space and the decision space at scale leads to the 'curse of dimensionality', computing the expectation directly is computationally prohibitive.
To overcome this problem, the post-decision state variable is introduced:
Figure BDA0003026243960000062
It is the state after the storage has executed the decision but before new random information has arrived. Methods for determining the post-decision state include decoupling of decision and information, state-action pairing, and point estimation. The invention uses point estimation, that is, the post-decision state variable is determined from the predicted (estimated) values of the random information:
Figure BDA0003026243960000063
in the formula:
Figure BDA0003026243960000064
is the predicted (estimated) value of all random information.
Using the post-decision value function to replace the conditional expectation, equation (19) can be transformed into the deterministic form:
Figure BDA0003026243960000065
where the post-decision value function is
Figure BDA0003026243960000066
Because the state space of
Figure BDA0003026243960000067
is continuous and large, computing the value of every state
Figure BDA0003026243960000068
remains difficult. Value function approximation is therefore used to obtain the values for all
Figure BDA0003026243960000069
states.
The invention uses a piecewise linear concave function
Figure BDA00030262439600000610
to approximate
Figure BDA00030262439600000611
For a particular
Figure BDA00030262439600000612
Figure BDA00030262439600000613
depends only on the single variable SoC, and is expressed as:
Figure BDA00030262439600000614
where $I_t$ is the number of segments of the post-decision value function in period $t$;
Figure BDA00030262439600000615
is the slope of the $i$-th segment. To preserve the concavity of the piecewise linear function, the segment slopes must remain monotonically decreasing, i.e.
Figure BDA00030262439600000616
$r_{i,t}$ is the resource allocated to the $i$-th segment, which satisfies:
Figure BDA00030262439600000617
where
Figure BDA00030262439600000618
is the maximum resource amount of the $i$-th segment in period $t$.
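A concave PLF of this kind can be evaluated by filling the segments in order, since decreasing slopes mean the earliest segments are always the most valuable. The Python sketch below assumes the function value is the sum of slope times allocated resource over the segments, which is the usual form for such functions; equation (21) itself is not reproduced in this text, and the names `plf_value` and `is_concave` are hypothetical.

```python
def is_concave(slopes):
    """True if the segment slopes never increase from one segment to the next."""
    return all(a >= b for a, b in zip(slopes, slopes[1:]))


def plf_value(slopes, widths, level):
    """Evaluate a piecewise linear function at a resource level.

    slopes : slope of each segment
    widths : maximum resource amount of each segment
    level  : total resource (for example stored energy) to allocate
    Segments are filled in order; each r_i stays within [0, width_i].
    """
    remaining = level
    total = 0.0
    for slope, width in zip(slopes, widths):
        r_i = min(max(remaining, 0.0), width)
        total += slope * r_i
        remaining -= r_i
    return total


if __name__ == "__main__":
    slopes, widths = [3.0, 2.0, 1.0], [1.0, 1.0, 1.0]
    print(is_concave(slopes), plf_value(slopes, widths, 1.5))
```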
Although a PLF is used to approximate, for a particular
Figure BDA00030262439600000619
the function
Figure BDA00030262439600000620
the variable
Figure BDA00030262439600000621
contains multidimensional continuous components, so all the PLFs corresponding to it would have to be enumerated. The invention applies a hierarchical aggregation rule to generate
Figure BDA00030262439600000622
and correspondingly uses a small number of PLFs, combining the different PLFs by weighting to reduce the computational complexity.
Substituting equation (21) into equation (20) yields the approximate dynamic programming form of the Bellman equation. Solving it gives the approximately optimal decision of period $t$:
Figure BDA0003026243960000071
where $\arg\max(\cdot)$ denotes the decision variable that attains the optimal objective value, and
Figure BDA0003026243960000072
is the feasible region of $r_{i,t}$ defined by equation (22).
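The structure of the approximate decision, the immediate net benefit plus the approximate value of the post-decision state, can be sketched in Python as follows. The invention solves an optimization model; the sketch swaps in a plain enumeration over a few candidate decisions purely for illustration. The callables `contribution`, `post_state` and `value_fn` and all numbers are hypothetical stand-ins.

```python
def approximate_decision(state, candidates, contribution, post_state, value_fn):
    """Pick the candidate maximizing contribution + approximate future value.

    This is a grid search used for illustration only; it is not the
    optimization model of the invention.
    """
    best_x, best_value = None, float("-inf")
    for x in candidates:
        value = contribution(state, x) + value_fn(post_state(state, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


if __name__ == "__main__":
    # Toy setting: charging x costs 0.5 per unit; stored energy up to 1.0
    # is worth 1.0 per unit in the future (a single-segment concave PLF).
    x, v = approximate_decision(
        0.5,
        [-0.5, 0.0, 0.5, 1.0],
        contribution=lambda s, x: -0.5 * x,
        post_state=lambda s, x: s + x,
        value_fn=lambda r: 1.0 * min(max(r, 0.0), 1.0),
    )
    print(x, v)
```

In the toy setting the search charges just enough to fill the valuable segment, which is the trade-off between immediate cost and long-term value that the approximation function is meant to capture.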
Step 6: the invention applies a time difference learning (TD (1)) algorithm with a discount factor lambda being 1 to carry out off-line training on PLFs in all periods, the TD (1) algorithm is a bidirectional algorithm combining forward simulation and reverse updating, the differential learning process with the discount factor lambda being 1 is adopted to update the section slope of the PLFs in an iterative manner, the specific flow is shown in figure 4, and the method comprises the following steps:
step 6.1: setting the power P in PFRfAnd an initial SoC; the initial slope of the PLF for each period being zero, or other estimate, to speed up receptionConverging; and setting the maximum iteration number N of the off-line training, and enabling N to be 1.
Step 6.2: and generating a next-day electricity price and load scene set by applying Monte-Carlo simulation according to the day-ahead prediction information and the error distribution thereof. Considering that the frequency is difficult to predict effectively, a frequency scene set is selected from the historical frequency sequence. Extracting a sample path of the nth iteration training, namely omega, in the scene setn∈Ω。
Step 6.3: and forward simulating a sequential decision process, and carrying out forward recursive solution (23) along a sample path to obtain the optimal decision of each time interval. And calculating the marginal contribution and the marginal flow of each time interval according to the data, and providing a random sampling observation value for reversely updating the slope of the PLF of each time interval. If the (n-1) th iteration update is completed, the time intervals are
Figure BDA0003026243960000073
As is known, it can be used for the nth iteration. Decision of time period t in nth iteration
Figure BDA0003026243960000074
Solving the following equation yields:
Figure BDA0003026243960000075
in the formula: the superscript n denotes the variable used in the nth iteration.
This step starts with $t = 0$.
1) Check whether $t < T$ holds. If so, carry out the following to compute the marginal contribution and the marginal flow; otherwise, go to the backward update in step 6.4.
2) Compute the left and right numerical derivatives of the marginal contribution,
Figure BDA0003026243960000076
and
Figure BDA0003026243960000077
Figure BDA0003026243960000078
where
Figure BDA0003026243960000079
$\delta s$ is the unit increment of the variable $s$.
3) Compute the marginal flows of the storage,
Figure BDA00030262439600000710
and
Figure BDA00030262439600000711
Figure BDA0003026243960000081
where
Figure BDA0003026243960000082
are the benefit with a unit increment, the period benefit and the benefit with a unit decrement in period $t$, respectively.
4) Store the marginal contribution and the marginal flow, compute the pre-decision state of the next period using equation (11), and return to 1).
Step 6.4: With the marginal contribution and marginal flow of each period obtained from the forward simulation, compute the sampled slope observations in the backward pass and use them to update the PLFs of all periods.
This step starts with $t = T$.
1) Check whether $t > 0$ holds. If so, carry out the following; otherwise, go to step 6.5.
2) Compute the sampled slope of each period by backward recursion;
Figure BDA0003026243960000083
is obtained from the formula below, and the same applies to
Figure BDA0003026243960000084
Figure BDA0003026243960000085
3) Smooth the sampled slope into the current estimate of the marginal value using the stochastic gradient method:
Figure BDA0003026243960000086
where $\alpha$ is the update step size. The invention uses the harmonic deterministic step size rule $\alpha_n = a/(a + n)$, where $a$ is a tunable parameter. The stochastic gradient update of
Figure BDA0003026243960000087
is obtained in the same way.
4) After the estimates
Figure BDA0003026243960000088
and
Figure BDA0003026243960000089
are obtained, the PLFs are updated using the concave function projection correction (CAVE) algorithm. The CAVE algorithm corrects the slopes through a projection operation so that the PLF slopes keep the concave property.
5) Store the updated result, set t = t - Δt, and return to 1).
Step 6.5: Determine whether the iteration terminates. After the reverse update is complete, check the iteration count. If n < N, set n = n + 1 and return to step 6.2; otherwise, export the PLF of each period for use by the intra-day two-stage robust approximate dynamic programming model.
The invention has the following beneficial effects. In the day-ahead stage, an approximation function of the post-decision state is introduced, following the idea of approximate dynamic programming, to represent the long-term expected net benefit in the state of each period, and the approximation function is trained offline with the TD(1) algorithm. During intra-day operation, the power reference point of each period is obtained dynamically by combining the rolling-updated forecasts of electricity price and load, the frequency uncertainty set and the long-term-horizon approximation function, and by solving the two-stage robust approximate dynamic programming model. This effectively guarantees the frequency regulation capability of user-side energy storage. Through offline training and online application of the approximation function, the long-term impact of real-time decisions can be evaluated quickly, so that both global economic benefit and online computation overhead are taken into account.
Drawings
To illustrate the technical solution of the invention more clearly, the drawings used in the embodiments are briefly described below.
FIG. 1 is a basic flow diagram of the load side energy storage stacking service provided by the present invention;
FIG. 2 is a diagram illustrating an approximation idea of a value function according to the present invention;
FIG. 3 is a schematic view of the in-day optimized operation process provided by the present invention;
FIG. 4 is a flowchart of the TD (1) algorithm provided by the present invention;
FIG. 5 is a box plot and mean values of the indicators under different strategies provided by the present invention; fig. 5(a) is a comparison graph of daily operation economic benefit, fig. 5(b) is a comparison graph of single-period average optimization time consumption, fig. 5(c) is a comparison graph of economic benefit deviation ratio, and fig. 5(d) is a comparison graph of SoC utilization ratio.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments, which are not intended to limit the invention.
During intra-day operation, when load-side energy storage provides the stacked services of electricity price response and primary frequency regulation (PFR), the power reference point must be optimized in real time within a limited time window to ensure both the reliability of PFR and the economic benefit of daily operation. To this end, a hybrid "look-ahead and value function approximation" (LVFA) strategy is proposed that solves a two-stage robust approximate dynamic programming model by online rolling optimization. The strategy comprises the following steps:
Step 1: Under the dual market mechanism of real-time electricity price and PFR, load-side energy storage is used to provide the stacked services of electricity price response and PFR. The basic flow is shown in FIG. 1.
Day-ahead bidding (1 day): The bidding capacity for the PFR market is optimized based on historical statistical data and short-term forecast information. Once the PFR market is cleared, the winning bid capacity and the compensation price are fixed, and the PFR service must be provided strictly according to the winning bid capacity during the day.
Intra-day operation (5 min): The power reference point of each period is optimized dynamically based on prior frequency information and rolling-updated short-term forecast data. The power reference point serves two functions: first, it reduces the electricity cost by responding to the real-time price, charging when the price is low and discharging when it is high; second, it dynamically adjusts the SoC headroom so that the energy storage maintains its PFR regulation performance. Within a single period, the power reference point is held constant.
Real-time control (1 s): The real-time control power of the load-side energy storage is calculated from the local frequency information and the power reference point, and the power commands are allocated optimally among the energy storage units using the real-time monitoring information of the BMS.
In this process, intra-day optimized operation is the key link between the day-ahead and real-time stages, and its decisions significantly affect the technical and economic benefits of the load-side energy storage. The invention addresses the intra-day optimized operation problem, given that the PFR capacity and compensation price of the load-side energy storage are known after market clearing.
Step 2: The dynamic process of intra-day optimized operation is modeled, and the relevant information variables, decision variables, revenue function and so on are determined. The finite horizon T of daily operation is discretized with the duration of a real-time price interval, Δt = 5 min, as the granularity, and the time set is defined as $\mathcal{T} = \{0, \Delta t, 2\Delta t, \ldots, T\}$.
As can be seen from FIG. 1, the energy storage and the aggregated load operate in coordination during the day, and their real-time exchange power with the grid satisfies the active power balance. Taking the direction of power injected from the grid as positive:
$P_t^g = P_t^b + P_t^l$ (1)
in the formula: $P_t^g$, $P_t^b$ and $P_t^l$ are the grid power, energy storage power and load power at time t, respectively. The real-time storage power $P_t^b$ is obtained by superimposing the period power reference point $P_t^e$ and the real-time PFR power:
$P_t^b = P_t^e + \alpha \cdot \Delta f_t$ (2)
where $\alpha \cdot \Delta f_t$ is the real-time PFR power, provided by emulating the droop control of a synchronous generator set and responding automatically to the frequency deviation; α is the droop coefficient; $\Delta f_t$ is the average frequency deviation within Δt; and $P_t^e$ is the power reference point of period t. The frequency sampling period is at the second or millisecond level. Modeling the frequency information at that granularity would give the intra-day model two time scales and greatly increase the problem complexity. Since the unit decision period is Δt, the invention uses the average frequency deviation $\Delta f_t$ within Δt to approximate the corresponding frequency deviation sequence. The influence of this approximation on the operation process is less than $2 \times 10^{-4}$.
Load-side energy storage provides its PFR response according to a power-frequency characteristic. Once the frequency leaves the dead zone, the storage responds linearly to the frequency deviation; when the frequency deviation exceeds the linear response interval, the output equals the winning bid capacity. Accordingly, the droop coefficient α in formula (2) can be expressed as:
Figure BDA0003026243960000101
in the formula: $\mathbf{1}_{\{z\}}$ is the indicator function, equal to 1 when condition z is true and 0 otherwise; $\Delta f^{max}$ is the maximum frequency deviation of the linear response interval; $P^f$ is the winning bid capacity of the energy storage in the PFR market.
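The droop behaviour described above can be illustrated with a minimal sketch. Formula (3) is not legible in this text, so the piecewise form below is an assumption built only from the prose: no response inside a dead zone, a linear response up to the maximum deviation, and saturation at the winning bid capacity beyond it. The names `pfr_power`, `storage_power` and the dead-zone width `df_db` are hypothetical.

```python
def pfr_power(delta_f, p_f, df_max, df_db=0.0):
    """Real-time PFR power alpha * delta_f (MW) for a frequency deviation (Hz).

    Assumed piecewise droop: zero inside the dead zone, linear up to df_max,
    and saturated at the winning bid capacity p_f beyond df_max.
    """
    magnitude = abs(delta_f)
    if magnitude <= df_db:
        return 0.0                      # inside the dead zone: no response
    if magnitude <= df_max:
        alpha = p_f / df_max            # linear response interval
    else:
        alpha = p_f / magnitude         # output capped at +/- p_f
    return alpha * delta_f


def storage_power(p_ref, delta_f, p_f, df_max, df_db=0.0):
    """Formula (2): power reference point plus real-time PFR power."""
    return p_ref + pfr_power(delta_f, p_f, df_max, df_db)
```

With the grid-injection direction taken as positive, a negative frequency deviation lowers the storage power (less charging or more discharging), which is the expected direction for under-frequency support.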
According to the rules of the PFR mechanism, the PFR deviation rate must stay below the maximum allowable value. To avoid the penalty risk caused by regulation deviation, the invention requires PFR to be provided with 100% reliability. The frequency regulation capability of the load-side energy storage must therefore be guaranteed in terms of both power and energy:
Figure BDA0003026243960000102
in the formula: $P^n$ is the rated power of the energy storage; $s^{max}$ and $s^{min}$ are the upper and lower limits allowed for the SoC.
With charging and discharging during operation within a day, the SoC dynamic transition from time t to t + Δ t can be described as:
Figure BDA0003026243960000103
in the formula: $s_t$ is the SoC of the energy storage at time t; $\eta^c$ and $\eta^d$ are the charging and discharging efficiencies; $e^{max}$ is the maximum stored energy.
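As an illustration of the SoC transition, the following sketch assumes a common form of such a model, since formula (5) itself is not legible here: charging power is multiplied by the charging efficiency, discharging power is divided by the discharging efficiency, and the SoC is expressed as a fraction of the maximum stored energy. The function name `soc_next` and the fractional SoC convention are assumptions.

```python
def soc_next(s_t, p_b, dt_h, eta_c, eta_d, e_max):
    """SoC after one period of length dt_h (hours).

    Assumed convention: p_b > 0 charges and p_b < 0 discharges (MW),
    e_max is in MWh, and the SoC is a fraction of e_max.
    """
    charge = max(p_b, 0.0)        # power drawn into the battery
    discharge = max(-p_b, 0.0)    # power delivered by the battery
    return s_t + (eta_c * charge - discharge / eta_d) * dt_h / e_max


# One 5-minute period at 0.5 MW with 95% efficiencies on a 1 MWh unit
s_after_charge = soc_next(0.5, 0.5, 5 / 60, 0.95, 0.95, 1.0)
s_after_discharge = soc_next(0.5, -0.5, 5 / 60, 0.95, 0.95, 1.0)
```

Because of the efficiency losses, discharging moves the SoC further than charging at the same power, which is why the SoC limits must be checked in both directions.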
By providing stacked services under the dual market mechanism, the energy storage can both reduce the electricity purchase cost by responding to the electricity price and earn compensation revenue by providing the PFR service. At the same time, frequent charging and discharging ages and degrades the battery; the aging cost is calculated from the charged and discharged energy.
In summary, the net economic benefit $C_t$ generated by the load-side energy storage in a single operating period can be expressed as:
Figure BDA0003026243960000111
in the formula:
Figure BDA0003026243960000112
is the real-time electricity price of period t, $c^f$ is the PFR compensation price, and $c^{ag}$ is the unit aging cost.
Step 3: Because intra-day optimized operation is a sequential decision process under uncertainty, the problem is further formulated as a Markov decision process (MDP) model. As a general model for stochastic sequential decision problems, an MDP consists of five basic elements: state variables, decision variables, random information, a transition function and an objective function.
State variables: these reflect the current state, including the physical state and the information state, and are defined as:
Figure BDA0003026243960000113
decision variables: adjusting the power reference point according to the current state, so that the decision variable xtComprises the following steps:
xt=(Pt e),xt∈χt(8)
in the formula: chi shapetAnd (3) a feasible space is provided for decision variables formed by the formulas (2) to (5) so as to ensure the frequency regulation capability of the energy storage at the load side in the operation process.
Random information: for modeling random factors in the operation process, random information WtIs defined as:
Figure BDA0003026243960000114
in the formula:
Figure BDA0003026243960000115
and
Figure BDA0003026243960000116
are the deviations between the predicted (estimated) values and the actual values of the random information.
On this basis, the intra-day operation process can be described by the following sequence of states, decisions and random information:
Figure BDA0003026243960000117
Transition function: this describes how the current state moves to the next state given the decision and the random information:
$S_{t+\Delta t} = S^M(S_t, x_t, W_{t+\Delta t})$ (11)
in the formula: $S^M(\cdot)$ comprises the dynamic transition of each state variable. The SoC transition between adjacent periods is given by formula (5). The real-time electricity price, load power and frequency deviation are all state-independent information processes, whose transitions are expressed respectively as:
Figure BDA0003026243960000118
Figure BDA0003026243960000119
Figure BDA00030262439600001110
in the formula:
Figure BDA00030262439600001111
and
Figure BDA00030262439600001112
are the predicted (estimated) values of the electricity price, load and frequency deviation;
Figure BDA00030262439600001113
$\Delta f_{t+\Delta t}$ are the actual values of the electricity price, load and frequency deviation, respectively;
Figure BDA0003026243960000121
are the deviations between the predicted (estimated) values and the actual values of the electricity price, load and frequency deviation, respectively.
Objective function: for the intra-day operation problem, the goal is to maximize the cumulative expected net benefit over all periods while satisfying the relevant technical performance conditions. The objective function $F^*$ is expressed as:
Figure BDA0003026243960000122
in the formula: $\mathbb{E}\{\cdot \mid \cdot\}$ denotes the conditional expectation; $C_t(S_t, x_t)$ is the net benefit function of period t, whose physical meaning is the same as formula (6); $S_0$ is the initial state; $C_t$ is the net benefit of period t; $S_t$ is the state of period t; $x_t$ is the decision of period t.
Step 4: The intra-day MDP model constructed in step 3 defines the relevant variables and the dynamic transition process as a whole. For the concrete real-time optimization problem, let the current time be $t_c$. According to the rolling-updated prediction horizon, the subsequent horizon is divided into two stages: a short-term horizon and a long-term horizon.
Definition 1 (short-term horizon): the rolling-update prediction horizon between the current time $t_c$ and the future time $t_f = t_c + H\Delta t$, where H is the number of rolling-update periods. Within the horizon HΔt, the electricity price and load can be predicted accurately.
Definition 2 (long-term horizon): the horizon between the future time $t_f$ and the final time T. For this horizon, the day-ahead forecasts of electricity price and load and their error distributions are known.
The invention proposes to select an approximation function of a suitable type,
Figure BDA0003026243960000123
to represent the expected net benefit of the long-term horizon and, based on the MDP model and the information available day-ahead, to compute the approximation function of each period,
Figure BDA0003026243960000124
offline so that it effectively approximates the actual value of the expected net benefit. Through this offline calculation and online application, the model complexity of the online rolling optimization is fundamentally reduced, so that global optimization benefit and online computation efficiency are both taken into account. The basic idea of dividing the horizon and introducing an approximation function is shown in FIG. 2.
Accordingly, an intra-day two-stage robust approximate dynamic programming model is built that combines prior frequency information with the offline approximation function. For the short-term horizon, frequency information is analyzed statistically to obtain a prior uncertainty set, which is combined with the rolling-updated ultra-short-term forecasts of electricity price and load to construct a robust model over the rolling prediction horizon, ensuring the reliability of PFR. For the long-term horizon, the offline-computed approximation function is called to evaluate the subsequent expected net benefit quickly. In real-time optimized operation, the optimization model at the current time $t_c$ can be expressed as:
Figure BDA0003026243960000125
s.t. (1)-(8) (16)
Figure BDA0003026243960000126
in the formula: Γ is the uncertainty set of the frequency deviation. Following the idea of robust optimization, Γ represents the fluctuation range of the frequency deviation, and the optimal solution under the worst case over this set is sought, which guarantees the frequency regulation capability. The invention describes the fluctuation range with intervals, forming a box uncertainty set:
$\Gamma = \{\Delta f_t \mid \Delta f^{low} \le \Delta f_t \le \Delta f^{up},\ t \in \mathcal{T}\}$ (17)
in the formula: $\Delta f^{up}$ and $\Delta f^{low}$ are the upper and lower bounds of the frequency deviation.
According to the optimization model in formula (16), the real-time operation strategy can be classified as a hybrid of look-ahead and value function approximation. The first half of the bracketed expression in formula (16) uses the rolling-updated forecast information and therefore has a "look-ahead" structure; the second half uses an approximation function to represent the expected net benefit of the corresponding horizon and therefore has a "value function approximation" structure. The intra-day optimized operation process resulting from this hybrid structure is shown in FIG. 3: combining the rolling-updated forecasts of electricity price and load, the uncertainty set of frequency information and the offline-computed long-term-horizon approximation function, the optimization model of formula (16) is solved by online rolling optimization to obtain the power reference point of each period dynamically.
Step 5: For the proposed real-time operation strategy, the key to performance is selecting the type of approximation function and then computing the approximation function of each period offline so that it effectively approximates the actual value of the expected net benefit. The invention introduces the post-decision state and its piecewise linear function (PLF) approximation, and converts the Bellman equation into an approximate dynamic programming form.
Based on the Bellman principle, the intra-day MDP can be decoupled into single-period subproblems, and the expected benefit value in each period state can then be solved by backward recursion. Define the optimal value function
Figure BDA0003026243960000131
as the maximum cumulative expected net benefit from time t to the final time T. Formula (15) can then be written recursively as:
Figure BDA0003026243960000132
However, solving formula (18) requires computing the conditional expectation
Figure BDA0003026243960000133
for all feasible states. Because of the "curse of dimensionality" caused by the excessive size of the discretized state and decision spaces, computing this expectation directly is computationally prohibitive.
To overcome this problem, the post-decision state variable
Figure BDA0003026243960000134
is introduced. It refers to the state after the energy storage has executed its decision but before new random information arrives. Methods for determining the post-decision state include the decision-information decoupling method, the state-action pairing method and the point estimation method. The invention adopts the point estimation method, i.e. the post-decision state variable is determined using the predicted (estimated) values of the random information:
Figure BDA0003026243960000135
in the formula:
Figure BDA0003026243960000136
is the predicted (estimated) value of all random information.
Using the post-decision value function in place of the conditional expectation, formula (18) can be transformed into the deterministic form:
Figure BDA0003026243960000137
in the formula: the post-decision value function is
Figure BDA0003026243960000141
Because the state space of
Figure BDA0003026243960000142
is continuous and large, computing the values
Figure BDA0003026243960000143
for all states remains difficult. The state values of all
Figure BDA0003026243960000144
are therefore obtained by value function approximation.
The invention uses a piecewise linear concave function
Figure BDA0003026243960000145
to approximate
Figure BDA0003026243960000146
For a given
Figure BDA0003026243960000147
and
Figure BDA0003026243960000148
Figure BDA0003026243960000149
depends only on the single variable SoC and is expressed as:
Figure BDA00030262439600001410
in the formula: $I_t$ is the number of segments of the post-decision value function of period t;
Figure BDA00030262439600001411
is the slope of the i-th segment; to preserve the concavity of the piecewise linear function, the segment slopes must be monotonically decreasing, i.e.
Figure BDA00030262439600001412
$r_{i,t}$ is the resource amount allocated to the i-th segment, which satisfies:
Figure BDA00030262439600001413
in the formula:
Figure BDA00030262439600001414
is the maximum resource amount of the i-th segment in period t.
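The piecewise linear concave function can be evaluated with a short sketch. It assumes the standard reading of formulas (21) and (22), which are not legible here: the value is the sum of slope times allocated resource over the segments, the allocations add up to the post-decision SoC, and each allocation is bounded by its segment width. With non-increasing slopes, filling the segments in order gives the maximizing allocation. The function name and the numbers are illustrative only.

```python
def plf_value(resource, slopes, widths):
    """Value of a concave PLF at `resource`, filling segments in order.

    slopes[i] is the slope of segment i and widths[i] its maximum resource
    amount; slopes must be non-increasing so the function is concave.
    """
    if any(a < b for a, b in zip(slopes, slopes[1:])):
        raise ValueError("slopes must be non-increasing for a concave PLF")
    remaining = resource
    value = 0.0
    for slope, width in zip(slopes, widths):
        allocated = min(remaining, width)   # r_i, capped at the segment width
        value += slope * allocated
        remaining -= allocated
        if remaining <= 0.0:
            break
    return value


# Three segments over an SoC range of 0 to 1 (illustrative numbers)
v = plf_value(0.4, slopes=[30.0, 20.0, 10.0], widths=[0.25, 0.25, 0.5])
```

The decreasing slopes express diminishing marginal value of stored energy: the first units of SoC are worth more than the last ones.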
Although a PLF is used to approximate, for a particular
Figure BDA00030262439600001415
the corresponding
Figure BDA00030262439600001416
the fact that
Figure BDA00030262439600001417
contains multidimensional continuous variables means that all the corresponding PLFs would have to be enumerated. The invention applies hierarchical aggregation rules to generate
Figure BDA00030262439600001418
and a correspondingly small number of PLFs, and obtains the different PLFs as weighted combinations of them to reduce the computational complexity.
Substituting formula (21) into formula (20) yields the approximate dynamic programming form of the Bellman equation; solving it gives the approximately optimal decision of period t:
Figure BDA00030262439600001419
in the formula: arg max(·) returns the decision variable that attains the optimal objective value;
Figure BDA00030262439600001420
is the feasible region of $r_{i,t}$ defined by formula (22).
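The single-period decision can be sketched as a search for the reference point that maximizes the immediate net benefit plus the PLF value of the post-decision SoC. This is a simplified stand-in, not the formulation of formula (23): it enumerates a small grid of candidates instead of calling an optimization solver, takes the predicted frequency deviation as zero when forming the post-decision state, and uses an assumed benefit made of the energy purchase cost and an aging cost (the PFR compensation is constant in the decision and is left out). All names and numbers are hypothetical.

```python
def plf_value(resource, slopes, widths):
    """Concave PLF value, filling segments in order of decreasing slope."""
    value, remaining = 0.0, resource
    for slope, width in zip(slopes, widths):
        allocated = min(remaining, width)
        value += slope * allocated
        remaining -= allocated
    return value


def best_reference_point(s_t, price, load, candidates, slopes, widths, *,
                         dt_h, eta_c, eta_d, e_max, c_ag):
    """Return (p_ref, objective) maximizing immediate benefit + PLF value."""
    best = None
    for p_ref in candidates:
        # Post-decision SoC under the point estimate delta_f = 0
        s_post = s_t + (eta_c * max(p_ref, 0.0)
                        - max(-p_ref, 0.0) / eta_d) * dt_h / e_max
        if not 0.0 <= s_post <= sum(widths):
            continue                                   # outside the PLF domain
        purchase = price * (load + p_ref) * dt_h       # grid energy cost
        aging = c_ag * abs(p_ref) * dt_h               # assumed aging cost
        objective = -(purchase + aging) + plf_value(s_post, slopes, widths)
        if best is None or objective > best[1]:
            best = (p_ref, objective)
    return best


common = dict(slopes=[30.0, 20.0, 10.0], widths=[0.25, 0.25, 0.5],
              dt_h=5 / 60, eta_c=0.95, eta_d=0.95, e_max=1.0, c_ag=0.5)
cheap = best_reference_point(0.5, 5.0, 1.0, [-0.5, 0.0, 0.5], **common)
costly = best_reference_point(0.5, 50.0, 1.0, [-0.5, 0.0, 0.5], **common)
```

In this toy setting the storage charges when the price is below the marginal value of stored energy and discharges when the price is above it, which is the trade-off the PLF slopes are meant to encode.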
Step 6: The invention applies the temporal-difference learning algorithm TD(1), i.e. with discount factor λ = 1, to train the PLFs of all periods offline. TD(1) is a bidirectional algorithm that combines forward simulation with reverse updating and iteratively updates the segment slopes of the PLFs through the difference learning process with λ = 1. The specific flow is shown in FIG. 4 and comprises the following steps:
Step 6.1: Set the winning bid capacity $P^f$ in the PFR market and the initial SoC; initialize the slopes of the PLF of each period to zero, or to other estimates that speed up convergence; set the maximum number N of offline training iterations, and let n = 1.
Step 6.2: Based on the day-ahead forecasts and their error distributions, Monte Carlo simulation is used to generate the scenario set of next-day electricity price and load. Because frequency is difficult to predict effectively, the frequency scenario set is selected from historical frequency sequences. The sample path of the n-th training iteration, $\omega^n \in \Omega$, is drawn from the scenario set.
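Scenario generation of this kind can be sketched as follows. The text only states that the error distributions are known, so the zero-mean Gaussian forecast errors used here are an assumption, as are the function names and the choice to draw each frequency scenario as a random contiguous window of the historical record.

```python
import random


def sample_path(forecast, sigma, rng):
    """One scenario: forecast plus an assumed zero-mean Gaussian error."""
    return [value + rng.gauss(0.0, sigma) for value in forecast]


def scenario_set(price_fc, load_fc, freq_history, n, *,
                 price_sigma, load_sigma, seed=0):
    """Generate n scenarios of price, load and frequency deviation."""
    rng = random.Random(seed)              # seeded for reproducible training
    horizon = len(price_fc)
    scenarios = []
    for _ in range(n):
        # Frequency is taken from history rather than predicted
        start = rng.randrange(len(freq_history) - horizon + 1)
        scenarios.append({
            "price": sample_path(price_fc, price_sigma, rng),
            "load": sample_path(load_fc, load_sigma, rng),
            "freq": freq_history[start:start + horizon],
        })
    return scenarios


history = [0.01 * i for i in range(20)]    # placeholder frequency deviations
omega = scenario_set([30.0] * 6, [1.0] * 6, history, 4,
                     price_sigma=2.0, load_sigma=0.1, seed=1)
```

Each training iteration would then draw one element of `omega` as its sample path.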
Step 6.3: Simulate the sequential decision process forward, solving formula (23) recursively along the sample path to obtain the optimal decision of each period. From these, compute the marginal contribution and marginal flow of each period, which provide the random sampled observations for the reverse update of the PLF slopes of each period. Once the (n-1)-th iteration update is complete, the
Figure BDA0003026243960000151
of each period are known and can be used in the n-th iteration. The decision of period t in the n-th iteration,
Figure BDA0003026243960000152
is obtained by solving:
Figure BDA0003026243960000153
in the formula: the superscript n denotes the variable used in the nth iteration.
Before this step is executed, set t = 0.
1) Determine whether t < T holds. If so, carry out the following to compute the marginal contribution and the marginal flow; otherwise, proceed to the reverse update procedure of step 6.4.
2) Compute the left and right numerical derivatives of the marginal contribution, denoted respectively
Figure BDA0003026243960000154
and
Figure BDA0003026243960000155
Figure BDA0003026243960000156
in the formula:
Figure BDA0003026243960000157
δ s is the unit increment of the variable s.
3) Compute the energy storage marginal flows
Figure BDA0003026243960000158
and
Figure BDA0003026243960000159
Figure BDA00030262439600001510
in the formula:
Figure BDA00030262439600001511
are, respectively, the unit-increment gain, the period gain and the unit-decrement gain of period t.
4) Store the marginal contribution and the marginal flow, compute the pre-decision state of the next period using formula (11), and return to 1).
Step 6.4: The marginal contribution and marginal flow of each period are obtained through the forward simulation. During the reverse update, sampled slope observations are computed from them and then used to update the PLFs of all periods.
Before this step is executed, set t = T.
1) Determine whether t > 0 holds. If so, carry out the following; otherwise, proceed to step 6.5.
2) Compute the sampled slope of each period by backward recursion.
Figure BDA00030262439600001512
is given by the following formula; an analogous formula gives
Figure BDA00030262439600001513
Figure BDA0003026243960000161
3) Next, smooth the sampled slope into the current estimate of the marginal value using a stochastic gradient method:
Figure BDA0003026243960000162
in the formula: α is the update step size. The invention uses the harmonic deterministic step-size rule, i.e. $\alpha_n = a/(a+n)$, where a is an adjustable parameter. In the same way, the stochastic gradient update formula for
Figure BDA0003026243960000163
can also be obtained.
4) After the estimates
Figure BDA0003026243960000164
and
Figure BDA0003026243960000165
are obtained, the PLFs are updated using the concave function projection correction (CAVE) algorithm. The CAVE algorithm corrects the slopes through a projection operation so that the PLF slopes keep the concave property.
5) Store the updated result, set t = t - Δt, and return to 1).
Step 6.5: Determine whether the iteration terminates. After the reverse update is complete, check the iteration count. If n < N, set n = n + 1 and return to step 6.2; otherwise, export the PLF of each period for use by the intra-day two-stage robust approximate dynamic programming model.
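The slope update inside the reverse pass can be illustrated with a small sketch. The harmonic step size follows the rule stated in the text; the smoothing formula is the usual convex combination of the old estimate and the sampled observation, which is an assumption because the formula itself is not legible here. The concavity repair shown is a simple leveling-style correction used as a stand-in, not the CAVE algorithm named in the text: after one slope is updated, any slope to its left that is smaller, or to its right that is larger, is set to the new value. All function names are hypothetical.

```python
def harmonic_stepsize(n, a):
    """Harmonic step-size rule alpha_n = a / (a + n)."""
    return a / (a + n)


def smooth(old_estimate, observation, alpha):
    """Assumed stochastic-gradient smoothing of a slope estimate."""
    return (1.0 - alpha) * old_estimate + alpha * observation


def restore_concavity(slopes, k):
    """Leveling-style repair after slopes[k] changed: keep slopes non-increasing."""
    pivot = slopes[k]
    repaired = list(slopes)
    for i in range(k):                       # segments to the left must be >= pivot
        if repaired[i] < pivot:
            repaired[i] = pivot
    for i in range(k + 1, len(repaired)):    # segments to the right must be <= pivot
        if repaired[i] > pivot:
            repaired[i] = pivot
    return repaired


def update_slope(slopes, k, observation, n, a):
    """Smooth segment k towards the sampled slope, then repair concavity."""
    updated = list(slopes)
    updated[k] = smooth(slopes[k], observation, harmonic_stepsize(n, a))
    return restore_concavity(updated, k)


new_slopes = update_slope([30.0, 20.0, 10.0], 1, 60.0, n=1, a=1.0)
```

A larger parameter a keeps the step size close to 1 for more iterations, so early observations are weighted more heavily before the estimates settle.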
Step 7: To verify the performance advantage of the proposed LVFA operation strategy, a 0.5 MW/1 MWh load-side energy storage is used as a case study. The charging and discharging efficiencies are both 0.95 and the unit aging cost is 0.5 $/MWh. After day-ahead clearing of the PFR market, the winning bid capacity is 0.2 MW and the frequency regulation compensation price is 16.53 $/MW/h; the initial SoC of intra-day operation is 0.1 MWh. The program is written on the MATLAB 2016a platform and calls the CPLEX solver for optimization; the computer has a 4-core Intel(R) Core(TM) i5-6400 CPU @ 3.60 GHz and 16 GB of RAM.
The real-time optimization decision processes under the different strategies are simulated on the training scenario set, and a quantitative analysis is carried out along four dimensions: daily operation economic benefit F, single-period average optimization time $T_{opt}$, economic benefit deviation rate $\zeta_{rev}$ and SoC utilization $U_{soc}$. The economic benefit deviation rate is the deviation of the daily operation economic benefit from the theoretical optimum, $\zeta_{rev} = (F^* - F)/F^* \times 100\%$. The box plots and mean values of the indicators under the different strategies are shown in FIG. 5.
As the economic indicators in FIG. 5 show, both the SMPC and LVFA strategies achieve good daily operation economic benefit, with mean deviation rates of 5.25% and 4.45%, respectively; by contrast, the mean deviation of the MPC strategy from the theoretical optimum is as high as 24.48%. The underlying reason is as follows. SMPC solves a two-stage robust-stochastic optimization model by online rolling optimization, while LVFA uses value function approximation to represent the expected net benefit of the long-term horizon; both strategies account for the stochastic stage and extend the optimization horizon, so decisions are optimized from a global perspective and the operating benefit improves. The MPC strategy obtains its real-time decision by solving only the optimization model within the rolling prediction horizon, so it struggles to capture the global intra-day economic benefit. Correspondingly, the mean SoC utilization under the SMPC and LVFA strategies is above 99.5%, while that under the MPC strategy is only 75.63%.
In terms of optimization time, the online rolling optimization of the proposed LVFA strategy takes only 4.65 s, the same order of magnitude as the MPC strategy. By contrast, the computation time of the SMPC strategy increases greatly: the single-period average optimization time reaches 287 s and in some scenarios even exceeds the 300 s time limit. The reason is that LVFA uses the offline-trained PLFs to evaluate the influence of the current decision on the long-term stage, which markedly reduces the scale of the online optimization, whereas the SMPC strategy uses a scenario sample set to compute the expected net benefit of the long-term horizon, which greatly increases the online computation scale. In conclusion, through offline training and online application, LVFA fundamentally reduces the complexity of the online rolling optimization model and effectively balances global optimization benefit against online execution overhead.
It should be understood that those skilled in the art can implement the method by combining the prior art with the above scheme; the details are not repeated here.
The preferred method of the present invention is described above. It should be understood that the invention is not limited to the particular embodiments described above, and that equipment and structures not described in detail are to be understood as implemented in the manner common in the art. Without departing from the spirit and scope of the invention, those skilled in the art can make many possible variations and modifications to the disclosed methods and techniques, or modify them into equivalent embodiments. Therefore, any simple modification, equivalent change or adaptation made to the above method according to the technical essence of the present invention, without departing from the content of the technical method of the present invention, still falls within the protection scope of the technical method of the present invention.

Claims (3)

1. An intra-day optimization operation strategy for providing stacking service by load side energy storage, characterized by comprising the following steps:
step 1: under the dual market mechanism of real-time electricity price and PFR, the load side energy storage is used to provide the stacked services of electricity price response and PFR;
day-ahead bidding (1 day): optimizing the bid capacity for participating in the PFR market based on historical statistical data and short-term prediction information; after the PFR market is cleared, the winning bid capacity and the compensation price are determined, and the PFR service must be provided strictly according to the winning bid capacity during the day;
intra-day operation (5 min): dynamically optimizing the power reference point of each time period based on the frequency prior information and the rolling-updated short-term prediction data; within the same time period, the power reference point is kept constant;
real-time control (1 s): calculating the real-time control power of the load side energy storage based on the local frequency information and the power reference point, and optimally distributing the power commands among the energy storage units in combination with the real-time monitoring information of the BMS;
the intra-day optimization operation problem is addressed for the case in which the PFR capacity and compensation price of the load side energy storage are known after the market is cleared;
step 2: modeling the dynamic process of the intra-day optimization operation, and determining the relevant information quantities, decision quantities, revenue function and the like; taking the duration of a real-time electricity price period, Δt = 5 min, as the granularity, discretizing the finite time horizon T of the intra-day operation process, and defining the time set T = {0, Δt, 2Δt, …, T};
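The discretization in step 2 can be sketched in Python as follows. This is a minimal illustration, not part of the claimed method: the function name `build_time_set` and the 24-hour horizon in the usage line are assumptions, while the 5-minute step is the value stated in the claim.

```python
from datetime import timedelta


def build_time_set(horizon: timedelta,
                   step: timedelta = timedelta(minutes=5)) -> list[timedelta]:
    """Return the time set {0, dt, 2dt, ..., T} as offsets from the start of operation."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    n_steps, remainder = divmod(horizon, step)
    if remainder:
        raise ValueError("horizon must be a whole multiple of step")
    # n_steps intervals give n_steps + 1 grid points, including both 0 and T.
    return [k * step for k in range(n_steps + 1)]


# Hypothetical usage: a 24 h operating horizon with the 5 min granularity of the claim.
time_set = build_time_set(timedelta(hours=24))
```

With a 24-hour horizon the grid has 288 decision periods and 289 time points, because both end points 0 and T belong to the set.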
the energy storage and the aggregated load are coordinated during the day, and their real-time exchange power with the power grid satisfies the active power balance; taking the direction of power injected from the grid as positive:
$P_t^g = P_t^b + P_t^l$ (1)
in the formula: $P_t^g$, $P_t^b$ and $P_t^l$ are respectively the grid power, the energy storage power and the load power at time t; the real-time power $P_t^b$ is obtained by superposing the time-period power reference point $P_t^e$ and the real-time PFR power, and is expressed as:
$P_t^b = P_t^e + \alpha \cdot \Delta f_t$ (2)
wherein $\alpha \cdot \Delta f_t$ represents the real-time PFR power provided by emulating the droop control of a synchronous generator set and automatically responding to the frequency deviation; $\alpha$ represents the droop coefficient; $\Delta f_t$ represents the average value of the frequency deviation within a unit decision period Δt; $P_t^e$ represents the power reference point of time period t; the average frequency deviation $\Delta f_t$ within Δt is used to approximately characterize the corresponding frequency deviation sequence;
according to the PFR mechanism rules, the PFR deviation rate must be lower than the maximum allowable value; the frequency regulation capability of the load side energy storage is ensured in terms of both power and energy:
Figure FDA0003026243950000011
in the formula: pnRated power for energy storage; smax、sminRespectively the upper and lower limits allowed by SoC;
with charging and discharging during operation within a day, the SoC dynamic transition from time t to t + Δ t can be described as:
Figure FDA0003026243950000021
in the formula: stStoring the energy SoC for the time t; etac、ηdRespectively charge and discharge efficiency; e.g. of the typemaxThe maximum electric quantity is stored;
applying the energy storage to provide stacking services under the dual market mechanism causes frequent charging and discharging, which leads to battery aging; the aging depreciation cost is calculated from the charged and discharged energy;
in summary, the net economic benefit $C_t$ generated by the load side energy storage during a single operating period can be expressed as:
Figure FDA0003026243950000022
in the formula:
Figure FDA0003026243950000023
is the real-time electricity price of time period t; $c_f$ is the PFR compensation price; $c_{ag}$ is the unit aging cost;
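Formula (6) is an image in the source and its exact terms are unknown. The Python sketch below only illustrates the three components named in the text (electricity cost at the real-time price, PFR compensation and aging cost); the way they are combined, the function name and all parameter names are assumptions rather than the patented formula.

```python
def period_net_benefit(price: float, p_grid_kw: float,
                       c_f: float, p_f_kw: float,
                       c_ag: float, p_b_kw: float, dt_h: float) -> float:
    """Assumed single-period net benefit: PFR revenue minus energy and aging costs."""
    energy_cost = price * p_grid_kw * dt_h    # energy bought from the grid at the real-time price
    pfr_revenue = c_f * p_f_kw * dt_h         # compensation for the PFR capacity held available
    aging_cost = c_ag * abs(p_b_kw) * dt_h    # depreciation proportional to energy throughput
    return pfr_revenue - energy_cost - aging_cost


# Hypothetical usage for one 5 min period.
benefit = period_net_benefit(price=0.5, p_grid_kw=120.0, c_f=0.1, p_f_kw=50.0,
                             c_ag=0.02, p_b_kw=20.0, dt_h=5 / 60)
```

The aging term uses the absolute storage power so that charging and discharging both incur depreciation, matching the statement that the cost is calculated from the charged and discharged energy.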
step 3: since intra-day operation is a sequential decision process under an uncertain environment, the intra-day optimization problem is further formulated as a Markov decision process (MDP) model; as a general model for stochastic sequential decision problems, the MDP mainly comprises: state variables, decision variables, random information, a transfer function and an objective function;
step 4: the intra-day MDP model constructed in step 3 defines the relevant variables and the dynamic transfer process as a whole; the real-time optimization problem is solved specifically as follows: assume that the current time is $t_c$; according to the rolling-updated prediction horizon, the subsequent time domain is divided into two stages: a short-term time domain and a long-term time domain;
definition 1. short-term time domain: the rolling-update prediction horizon between the current time $t_c$ and a future time $t_f = t_c + H\Delta t$, where H represents the number of rolling-update periods; within the HΔt horizon, the electricity price and the load are assumed to be accurately predictable;
definition 2. long-term time domain: the time range between the future time $t_f$ and the operation termination time T; for this range, the day-ahead prediction information of the electricity price and the load and its error distribution are known;
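Definitions 1 and 2 can be sketched in Python as a split of the remaining decision periods at $t_f = t_c + H\Delta t$. Times are plain integers in minutes; the function name and the clipping of $t_f$ at the termination time T near the end of the day are assumptions not stated in the claim.

```python
def split_horizon(t_c: int, h_periods: int, dt: int, t_end: int):
    """Split the periods from t_c to t_end into short-term and long-term parts.

    All times are in minutes. Returns (t_f, short_term_starts, long_term_starts).
    """
    if dt <= 0 or h_periods < 0:
        raise ValueError("dt must be positive and h_periods non-negative")
    if not 0 <= t_c <= t_end:
        raise ValueError("t_c must lie within [0, t_end]")
    # Assumption: the look-ahead window is truncated at the end of operation.
    t_f = min(t_c + h_periods * dt, t_end)
    short_term = list(range(t_c, t_f, dt))    # periods covered by rolling forecasts
    long_term = list(range(t_f, t_end, dt))   # periods covered by the offline value function
    return t_f, short_term, long_term


# Hypothetical usage: at 01:00 with H = 12 periods of 5 min and T = 24 h.
t_f, short_term, long_term = split_horizon(t_c=60, h_periods=12, dt=5, t_end=1440)
```

Late in the day the long-term list becomes empty, so the optimization reduces to a pure look-ahead problem over the remaining periods.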
selecting an approximate function of a suitable type
Figure FDA0003026243950000024
to represent the expected net benefit of the long-term time domain and, based on the intra-day MDP model and the information available day-ahead, calculating the approximate function of each time period
Figure FDA0003026243950000025
offline so that it effectively approaches the true value of the expected net benefit; through this offline-calculation, online-application mode, the model complexity of the online rolling optimization can be fundamentally reduced, and the global optimization benefit and the online operation efficiency are effectively balanced;
therefore, an intra-day two-stage robust approximate dynamic programming model combining the frequency prior information with the offline approximate function is built; in the short-term time domain, the frequency information is analyzed statistically to obtain a prior uncertainty set, the ultra-short-term prediction information of the electricity price and the load is updated on a rolling basis, and a robust model over the rolling-update prediction horizon is constructed to guarantee the reliability of the PFR; in the long-term time domain, the approximate value function calculated offline is called to quickly evaluate the subsequent expected net benefit; during real-time optimization operation, the optimization model corresponding to the current time $t_c$ can be expressed as:
Figure FDA0003026243950000031
in the formula: Γ is the uncertainty set of the frequency deviation; following the robust optimization approach, Γ represents the fluctuation range of the frequency deviation, and the optimal solution under the worst case is sought over this set, so as to guarantee the frequency regulation capability; the fluctuation range is described as an interval, i.e. a box uncertainty set, expressed as:
$\Gamma = \{\Delta f_t \mid \Delta f_{low} \le \Delta f_t \le \Delta f_{up},\ t \in T\}$ (17)
in the formula: $\Delta f_{up}$ and $\Delta f_{low}$ are respectively the upper and lower bounds of the frequency deviation;
according to the optimization model of formula (16), the real-time operation strategy can be classified as a hybrid policy combining look-ahead and value function approximation; combining the rolling-updated predictions of the electricity price and the load, the uncertainty set of the frequency information and the long-term approximate function calculated offline, the power reference point of each period is obtained dynamically by solving the optimization model of formula (16) online on a rolling basis;
step 5: for the proposed real-time operation strategy, the key to performance is which type of approximate function is selected and how the approximate function of each time period is calculated offline so as to effectively approximate the true value of the expected net benefit; a post-decision state and its approximate piecewise linear function (PLF) are introduced, and the Bellman equation is converted into an approximate dynamic programming form;
based on the Bellman principle, the intra-day MDP can be decoupled into a series of single-period sub-problems, and the expected benefit value of each period state can then be solved by backward recursion; the optimal value function $V_t^*(S_t)$ is defined as the cumulative expected maximum net benefit from time t to the final time T; formula (15) can be expressed recursively as:
Figure FDA0003026243950000032
in solving formula (18), the post-decision state variable
Figure FDA0003026243950000033
is introduced, which is the state after the energy storage has executed the decision but before new random information has arrived; the post-decision state variable is determined by point estimation, i.e. by using the predicted value (estimated value) of the random information, and is expressed as:
Figure FDA0003026243950000034
in the formula:
Figure FDA0003026243950000035
is the predicted value of all random information;
using the post-decision value function to approximately replace the conditional expectation, equation (19) can be transformed into the deterministic form:
Figure FDA0003026243950000036
in the formula: the post-decision value function
Figure FDA0003026243950000037
provides, by value function approximation, the values of all
Figure FDA0003026243950000038
states;
using a piecewise linear concave function
Figure FDA0003026243950000041
to approximately characterize
Figure FDA0003026243950000042
for a particular
Figure FDA0003026243950000043
and
Figure FDA0003026243950000044
the function depends only on the single SoC variable and is expressed as:
Figure FDA0003026243950000045
in the formula: $I_t$ is the number of segments of the post-decision state value function of time period t;
Figure FDA0003026243950000046
is the slope of the i-th segment; to preserve the concavity of the piecewise linear function, the segment slopes must be monotonically decreasing, i.e.
Figure FDA0003026243950000047
$r_{i,t}$ is the resource allocation of the i-th segment, which satisfies:
Figure FDA0003026243950000048
in the formula:
Figure FDA0003026243950000049
is the maximum resource amount of the i-th segment in time period t;
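A concave PLF described by segment slopes and segment capacities can be evaluated by filling segments in order, because monotonically decreasing slopes make the greedy allocation optimal. The Python sketch below illustrates this under the assumption that the function value is the sum of slope times allocation over the segments; the function and argument names are hypothetical and the exact formulas (21) and (22) are images in the source.

```python
def evaluate_plf(slopes: list[float], segment_caps: list[float],
                 resource: float) -> tuple[float, list[float]]:
    """Evaluate a concave piecewise linear function at `resource`.

    Returns (value, allocation), where allocation[i] is the amount placed in segment i.
    """
    if len(slopes) != len(segment_caps):
        raise ValueError("slopes and segment_caps must have the same length")
    if any(a < b for a, b in zip(slopes, slopes[1:])):
        raise ValueError("slopes must be non-increasing to keep the function concave")
    if resource < 0 or resource > sum(segment_caps):
        raise ValueError("resource lies outside the domain of the function")
    value = 0.0
    remaining = resource
    allocation = []
    for slope, cap in zip(slopes, segment_caps):
        r = min(cap, remaining)   # fill the steepest remaining segment first
        allocation.append(r)
        value += slope * r
        remaining -= r
    return value, allocation


# Hypothetical usage: three segments of width 2 with slopes 3, 2 and 1.
value, allocation = evaluate_plf([3, 2, 1], [2, 2, 2], 5)
```

Inside an optimization model the same structure appears as linear constraints on the segment allocations, so the concave PLF keeps the online problem linear.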
although a PLF is used to approximately characterize, for a particular
Figure FDA00030262439500000410
the function
Figure FDA00030262439500000411
because
Figure FDA00030262439500000412
includes multidimensional continuous variables, all the PLFs corresponding to it would have to be enumerated; hierarchical aggregation rules are therefore used to generate, for
Figure FDA00030262439500000413
a small number of corresponding PLFs, and the different PLFs are further combined by weighting to reduce the computational complexity;
substituting formula (21) into formula (20) yields the approximate dynamic programming form of the Bellman equation; solving it gives the approximately optimal decision of time period t, expressed as:
Figure FDA00030262439500000414
wherein argmax(·) returns the decision variable that attains the optimal objective value, and
Figure FDA00030262439500000415
is the feasible region of $r_{i,t}$ defined by formula (22);
step 6: the PLFs of all periods are trained offline by applying a temporal difference learning algorithm with discount factor λ = 1 (TD(1)); the TD(1) algorithm is a two-pass algorithm combining forward simulation and backward updating, which iteratively updates the segment slopes of the PLFs through a temporal difference learning process with discount factor λ = 1; the offline training comprises the following steps:
step 6.1: setting the winning PFR bid power $P_f$ and the initial SoC; setting the initial slopes of the PLF of each time period to zero, or to other estimated values to speed up convergence; setting the maximum number of offline training iterations N, and setting n = 1;
step 6.2: according to the day-ahead prediction information and its error distribution, generating a set of next-day electricity price and load scenarios by Monte Carlo simulation; selecting a set of frequency scenarios from the historical frequency sequence; drawing from the scenario set the sample path of the n-th training iteration, i.e. $\omega^n \in \Omega$;
step 6.3: performing the forward simulation of the sequential decision process: formula (23) is solved by forward recursion along the sample path to obtain the optimal decision of each time period; the marginal contribution and marginal flow of each time period are calculated from it, providing randomly sampled observations for the backward update of the PLF slopes of each time period; once the (n-1)-th iteration update has been completed, the
Figure FDA0003026243950000051
of each time period is known and can be used in the n-th iteration; the decision of time period t in the n-th iteration,
Figure FDA0003026243950000052
is obtained by solving the following equation:
Figure FDA0003026243950000053
in the formula: the superscript n denotes a variable used in the n-th iteration;
to execute this step, first set t = 0;
1) judging whether t < T is satisfied: if yes, executing the following steps to calculate the marginal contribution and the marginal flow respectively; otherwise, executing the backward update process of step 6.4;
2) determining the left and right numerical derivatives of the marginal contribution, respectively
Figure FDA0003026243950000054
and
Figure FDA0003026243950000055
Figure FDA0003026243950000056
in the formula:
Figure FDA0003026243950000057
δ s is the unit increment of the variable s;
3) calculating the marginal flows of the energy storage, respectively
Figure FDA0003026243950000058
and
Figure FDA0003026243950000059
Figure FDA00030262439500000510
in the formula:
Figure FDA00030262439500000511
are respectively the unit-increment gain, the period gain and the unit-decrement gain of time period t;
4) storing the marginal contribution and the marginal flow, calculating the pre-decision state of the next time period using formula (11), setting t = t + Δt, and returning to 1);
step 6.4: the marginal contribution and marginal flow of each time period having been obtained by the forward simulation, the sampled slope observations are calculated from them during the backward update, and the PLFs of all time periods are then updated using these sampled observations;
to execute this step, first set t = T;
1) judging whether t > 0: if yes, executing the following; otherwise, executing step 6.5;
2) calculating the sampled slope of each time period by backward recursion;
Figure FDA00030262439500000512
can be obtained by the following formula, and in the same way one can also obtain
Figure FDA00030262439500000513
Figure FDA00030262439500000514
3) further, smoothing the sampled observation into the estimate of the current marginal value by a stochastic gradient method:
Figure FDA0003026243950000061
in the formula: α is the update step size;
4) after the estimated values
Figure FDA0003026243950000062
and
Figure FDA0003026243950000063
are obtained, updating the PLFs by applying the concave function projection correction (CAVE) algorithm;
5) storing the update result, setting t = t - Δt, and returning to 1);
step 6.5: judging whether the iteration terminates: after the backward update is completed, the iteration count is checked; if n < N, setting n = n + 1 and returning to step 6.2; otherwise, outputting the PLF of each time period for use by the intra-day two-stage robust approximate dynamic programming model.
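The slope update in step 6.4 can be illustrated with a simplified Python sketch. It smooths one sampled slope into the current estimate with step size α and then restores monotonically non-increasing slopes by a simple leveling rule. The leveling rule is a stand-in chosen for brevity and is not the CAVE algorithm named in the claim; the function name and arguments are hypothetical.

```python
def smooth_and_level(slopes: list[float], k: int,
                     sampled_slope: float, step: float) -> list[float]:
    """Update segment k with a sampled slope, then restore concavity by leveling."""
    if not 0.0 < step <= 1.0:
        raise ValueError("step must lie in (0, 1]")
    updated = list(slopes)
    # Stochastic-gradient style smoothing of the sampled observation.
    updated[k] = (1.0 - step) * slopes[k] + step * sampled_slope
    # Leveling: earlier segments may not be flatter than segment k ...
    for i in range(k):
        if updated[i] < updated[k]:
            updated[i] = updated[k]
    # ... and later segments may not be steeper than segment k.
    for i in range(k + 1, len(updated)):
        if updated[i] > updated[k]:
            updated[i] = updated[k]
    return updated


# Hypothetical usage: a large sampled slope on the third segment.
new_slopes = smooth_and_level([4.0, 3.0, 2.0, 1.0], k=2, sampled_slope=6.0, step=0.5)
```

Whatever correction is used, the invariant is the same: after each update the slope vector must again be non-increasing so that the PLF stays concave and the online problem stays linear.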
2. The intra-day optimization operation strategy for providing stacking service by load side energy storage according to claim 1, characterized in that, in step 2, the load side energy storage provides PFR output according to a power-frequency characteristic; after the frequency crosses the dead band, it responds linearly to the frequency deviation; when the frequency deviation exceeds the linear response interval, it delivers power at the winning bid capacity; accordingly, the droop coefficient α in formula (2) can be expressed as:
Figure FDA0003026243950000064
in the formula: $1_{\{z\}}$ is the indicator function, whose value is 1 when condition z is true and 0 otherwise; $\Delta f_{max}$ is the maximum frequency deviation of the linear response interval; $P_f$ is the winning bid power capacity of the energy storage in the PFR market.
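The power-frequency characteristic described in this claim can be sketched in Python as a three-zone response: no output inside the dead band, a linear response up to $\Delta f_{max}$, and saturation at the winning bid capacity beyond it. The droop formula itself is an image in the source, so the proportional form of the linear zone, the sign convention (positive output for positive frequency deviation, i.e. charging under the grid-injection convention of formula (1)) and the `dead_band` parameter name are assumptions.

```python
import math


def pfr_power(delta_f: float, dead_band: float,
              delta_f_max: float, p_f: float) -> float:
    """Assumed PFR output (alpha * delta_f) for an average frequency deviation delta_f."""
    if not 0.0 <= dead_band < delta_f_max:
        raise ValueError("require 0 <= dead_band < delta_f_max")
    magnitude = abs(delta_f)
    if magnitude < dead_band:
        return 0.0                           # inside the dead band: no response
    if magnitude <= delta_f_max:
        return p_f * delta_f / delta_f_max   # linear response zone
    return math.copysign(p_f, delta_f)       # saturated at the winning bid capacity


# Hypothetical usage: 0.02 Hz dead band, 0.2 Hz linear range, 100 kW bid capacity.
p_small = pfr_power(0.01, 0.02, 0.2, 100.0)
p_linear = pfr_power(0.1, 0.02, 0.2, 100.0)
p_saturated = pfr_power(-0.5, 0.02, 0.2, 100.0)
```

Expressed as a droop coefficient, the linear zone corresponds to a constant $\alpha = P_f / \Delta f_{max}$ and the saturated zone to $\alpha = P_f / |\Delta f_t|$, so that $\alpha \cdot \Delta f_t$ never exceeds the bid capacity in magnitude.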
3. The intra-day optimization operation strategy for providing stacking service by load side energy storage according to claim 1, characterized in that, in step 3, the five basic elements are specifically described as follows:
state variables: reflect the current state, including the physical state and the information state; they are specifically defined as:
Figure FDA0003026243950000065
decision variables: the power reference point is adjusted according to the current state, so the decision variable $x_t$ is:
$x_t = (P_t^e),\ x_t \in \chi_t$ (8)
in the formula: $\chi_t$ is the feasible space of the decision variable formed by formulas (2) to (5), which ensures the frequency regulation capability of the load side energy storage during operation;
random information: used to model the random factors in the operation process; the random information $W_t$ is defined as:
Figure FDA0003026243950000066
in the formula:
Figure FDA0003026243950000067
and
Figure FDA0003026243950000068
are respectively the deviations between the predicted values (estimated values) and the true values of the random information;
on this basis, the intra-day operation process can be described by the following sequence of states, decisions and random information:
$l := (S_0, x_0, W_{\Delta t}, \ldots, S_t, x_t, W_{t+\Delta t}, \ldots, S_T)$ (10)
transfer function: describes the transition from the current state to the next state according to the decision and the random information:
$S_{t+\Delta t} = S^M(S_t, x_t, W_{t+\Delta t})$ (11)
in the formula: $S^M(\cdot)$ is the dynamic transfer process of the state variables; the SoC transition between adjacent time periods is given by formula (5); the dynamic transitions of the real-time electricity price, the load power and the frequency deviation are all state-independent information processes, respectively expressed as:
Figure FDA0003026243950000071
Figure FDA0003026243950000072
Figure FDA0003026243950000073
in the formula:
Figure FDA0003026243950000074
and
Figure FDA0003026243950000075
are respectively the predicted values (estimated values) of the electricity price, load and frequency deviation;
Figure FDA0003026243950000076
and $\Delta f_{t+\Delta t}$ are respectively the true values of the electricity price, load and frequency deviation;
Figure FDA0003026243950000077
are respectively the deviations between the predicted values (estimated values) and the true values of the electricity price, load and frequency deviation;
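The state-independent information processes can be sketched in Python by treating each true value as its predicted value plus a deviation. Formulas (12) to (14) are images in the source, so this additive form, the class name `Exogenous` and its field names are assumptions used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exogenous:
    """Electricity price, load power and frequency deviation for one period."""
    price: float
    load_kw: float
    delta_f_hz: float


def realize(forecast: Exogenous, deviation: Exogenous) -> Exogenous:
    """Assumed information transition: true value = predicted value + deviation."""
    return Exogenous(
        price=forecast.price + deviation.price,
        load_kw=forecast.load_kw + deviation.load_kw,
        delta_f_hz=forecast.delta_f_hz + deviation.delta_f_hz,
    )


# Hypothetical usage: a forecast corrected by the sampled random information.
actual = realize(Exogenous(0.5, 100.0, 0.0), Exogenous(0.25, -4.0, 0.125))
```

None of these three quantities depends on the storage decision or the SoC, which is why the claim calls them state-independent: only the SoC component of the state is moved by the decision $x_t$.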
objective function: for the intra-day operation problem, the aim is to maximize the cumulative expected net benefit over all time periods while satisfying the relevant technical performance requirements; the objective function $F^*$ is expressed as:
Figure FDA0003026243950000078
in the formula: $E\{\cdot \mid \cdot\}$ denotes the conditional expectation; $C_t(S_t, x_t)$ is the net benefit function of time period t, whose physical meaning is equivalent to formula (6); $S_0$ is the initial state; $C_t$ is the net benefit of time period t; $S_t$ is the state of time period t; $x_t$ is the decision of time period t.
CN202110416788.4A 2021-04-19 2021-04-19 Intra-day optimization operation strategy for providing stacking service by load side energy storage Pending CN112952831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110416788.4A CN112952831A (en) 2021-04-19 2021-04-19 Intra-day optimization operation strategy for providing stacking service by load side energy storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110416788.4A CN112952831A (en) 2021-04-19 2021-04-19 Intra-day optimization operation strategy for providing stacking service by load side energy storage

Publications (1)

Publication Number Publication Date
CN112952831A true CN112952831A (en) 2021-06-11

Family

ID=76232949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110416788.4A Pending CN112952831A (en) 2021-04-19 2021-04-19 Intra-day optimization operation strategy for providing stacking service by load side energy storage

Country Status (1)

Country Link
CN (1) CN112952831A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775081A (en) * 2022-12-16 2023-03-10 华南理工大学 Random economic dispatching method, device and medium for power system
CN117277393A (en) * 2023-11-22 2023-12-22 宁德时代新能源科技股份有限公司 Energy storage configuration method, energy storage configuration device, energy storage system and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775081A (en) * 2022-12-16 2023-03-10 华南理工大学 Random economic dispatching method, device and medium for power system
CN115775081B (en) * 2022-12-16 2023-10-03 华南理工大学 Random economic scheduling method, device and medium for electric power system
CN117277393A (en) * 2023-11-22 2023-12-22 宁德时代新能源科技股份有限公司 Energy storage configuration method, energy storage configuration device, energy storage system and storage medium
CN117277393B (en) * 2023-11-22 2024-04-12 宁德时代新能源科技股份有限公司 Energy storage configuration method, energy storage configuration device, energy storage system and storage medium

Similar Documents

Publication Publication Date Title
JP7320503B2 (en) Systems and methods for optimal control of energy storage systems
CN112186743B (en) Dynamic power system economic dispatching method based on deep reinforcement learning
Kou et al. Distributed EMPC of multiple microgrids for coordinated stochastic energy management
Ju et al. Multi-objective stochastic scheduling optimization model for connecting a virtual power plant to wind-photovoltaic-electric vehicles considering uncertainties and demand response
Machlev et al. A review of optimal control methods for energy storage systems-energy trading, energy balancing and electric vehicles
Huang et al. A control strategy based on deep reinforcement learning under the combined wind-solar storage system
Yin et al. Hybrid metaheuristic multi-layer reinforcement learning approach for two-level energy management strategy framework of multi-microgrid systems
US10804702B2 (en) Self-organizing demand-response system
CN112952831A (en) Intra-day optimization operation strategy for providing stacking service by load side energy storage
Qi et al. Energyboost: Learning-based control of home batteries
Han et al. A coordinated dispatch method for energy storage power system considering wind power ramp event
Wen et al. Optimal intra-day operations of behind-the-meter battery storage for primary frequency regulation provision: A hybrid lookahead method
Scarabaggio et al. Stochastic model predictive control of community energy storage under high renewable penetration
CN115310775A (en) Multi-agent reinforcement learning rolling scheduling method, device, equipment and storage medium
Rezazadeh et al. A federated DRL approach for smart micro-grid energy control with distributed energy resources
CN114169916A (en) Market member quotation strategy making method suitable for novel power system
Saini et al. Data driven net load uncertainty quantification for cloud energy storage management in residential microgrid
Cao et al. Model-free voltage regulation of unbalanced distribution network based on surrogate model and deep reinforcement learning
Wen et al. Optimal operation framework of customer-premise battery storage for energy charge reduction and primary frequency regulation
Tsao et al. Integrated voltage control and maintenance insurance planning for distribution networks considering uncertainties
CN114547821A (en) Schedulable flexible resource identification method based on grey correlation theory and storage medium
Panda et al. Prioritized experience replay based deep distributional reinforcement learning for battery operation in microgrids
Madahi et al. Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism
Schoot Uiterkamp Robust planning of electric vehicle charging
Zhang et al. A MATD3-based Voltage Control Strategy for Distribution Networks Considering Active and Reactive Power Adjustment Costs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination