CN115860789B - CES day-ahead scheduling method based on FRL - Google Patents
- Publication number
- CN115860789B (application CN202310191179.2A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a CES day-ahead scheduling method based on FRL, which involves a plurality of local community energy storage systems (LCES) and a single global server (GS). The training process of the FRL comprises the following steps: each LCES trains and updates its local model and perturbs the updated gradient with noise; the GS sums the noise gradients of the plurality of LCES, updates its global model, and broadcasts the latest GS model to the LCES; the local models and the global model are updated iteratively until the stopping requirement is met and training is complete. CES scheduling is carried out under a federated reinforcement learning framework, the whole algorithm runs in a hierarchical distributed architecture, and each local community scheduling agent aims at minimizing the daily energy cost of its community. The method requires no sharing of energy consumption data among communities, only the sharing of perturbed model gradients, thereby protecting the privacy of community households.
Description
Technical Field
The invention relates to the technical field of energy storage scheduling, in particular to a CES day-ahead scheduling method based on FRL.
Background
Households in a community sharing high-capacity energy storage equipment can realize space-time transfer of household demand and energy arbitrage under a time-of-use electricity price plan. Energy Storage (ES) is an important component of novel power systems and can mitigate the randomness and fluctuation of renewable energy sources. Under a time-of-use (ToU) electricity price plan, ES can also realize energy arbitrage by storing energy during off-peak periods and releasing it during peak periods. With these developments, community shared energy storage systems (CES) have appeared. However, traditional scheduling methods cannot satisfy dynamically changing household demand, and energy storage scheduling requires detailed household energy consumption data, which raises privacy concerns.
Disclosure of Invention
The invention aims to provide a CES (community shared energy storage system) day-ahead scheduling method based on FRL (federated reinforcement learning) to overcome the drawbacks described in the background art.
In order to achieve the above object, the present invention provides the following technical solutions: the CES day-ahead scheduling method based on FRL comprises a plurality of community energy storage systems LCES and a single global server GS;
the training process of the FRL comprises the following steps:
LCES trains and updates the local model, and uses noise disturbance to update gradient;
the GS sums the noise gradients of a plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES;
and (5) iteratively updating the local model and the global model, meeting the stopping requirement, and completing training.
Preferably, the FRL operates in a hierarchical distributed architecture: the GS updates the global model by aggregating local model gradients, and each LCES trains a DRL agent using local data and reports its model gradient to the GS; only model gradients or model parameters are exchanged between the GS and the LCES to enable computation of the CES agent.
Preferably, the CES builds a target optimization model for minimizing total energy consumption of the community, including:
objective function: the community total energy consumption minimization is defined (in reconstructed notation, the original symbols being lost) as:

$\min F = \sum_{t=1}^{T} \big( p_t c_t + p_t (D_t - d_t) + \lambda c_t \big)$

wherein the cost at time $t$ comprises the CES charging cost $p_t c_t$, the cost $p_t (D_t - d_t)$ of the part of the demand that cannot be satisfied by the CES at that moment, and the CES service fee $\lambda c_t$, where $\lambda$ indicates the service charge per unit of CES charge; $p_t$ is the ToU electricity price at time $t$, $c_t$ is the CES charge amount at time $t$, $d_t$ is the discharge delivered by the CES to the homes in the community at time $t$, and $D_t$ is the total household demand in the community at time $t$;
constraint conditions:

constraint I: the state of charge is updated considering the CES charging efficiency ratio $\eta_c$ and discharging efficiency ratio $\eta_d$: $E_{t+1} = E_t + \eta_c c_t - d_t/\eta_d$, with $0 \le E_t \le E_{\max}$, where $E_t$ is the CES remaining capacity at time $t$ and $E_{\max}$ represents the CES total capacity;

constraint II: the CES state is constrained by setting the SOE at the initial time to 0, i.e. $E_0 = 0$;

constraints III and IV: the CES charge rate $c_t$ and discharge rate $d_t$ are constrained within a reasonable range, $0 \le c_t \le c_{\max}$ and $0 \le d_t \le d_{\max}$, preventing the CES from being excessively charged and discharged;

constraint V: the balance of the total community demand is ensured, $d_t + g_t = D_t$, where $g_t$ is the energy purchased from the grid at time $t$.
Preferably, constraints III and IV are defined by the reasonable ranges of the constrained parameters given by the formula $0 \le c_t \le c_{\max}$, $0 \le d_t \le d_{\max}$;

in which the state $soe_t = E_t/E_{\max}$ is the ratio of the CES remaining capacity to the total capacity at time $t$, and $s_t$ represents the state of the environment in which the CES agent is located at time $t$; the static factors of the energy storage are input into the model network as part of the state, and the action space $a_t = (\alpha_t, \beta_t)$ comprises the charge and discharge coefficients of the CES at different times, defined (in reconstructed notation) as:

$c_t = \alpha_t c_{\max}, \quad d_t = \beta_t d_{\max}$

wherein $\alpha_t$ indicates the coefficient with which the CES charges from the grid at time $t$, its value ranging within $[0, 1]$; $\beta_t$ indicates the discharge coefficient given to the community by the CES at time $t$; and $a_t$ represents the action performed by the CES agent at time $t$ in the environment state $s_t$;
the reward function $R$ represents the feedback obtained by the CES agent in exploring the environment and is used to guide the agent toward the predetermined objective; the reward function comprises a reward for the agent performing a correct action and a penalty for performing a wrong action that causes the environment to violate the basic constraints of the CES device, defined (in reconstructed form) as:

$R = \kappa_1 \delta - \kappa_2 V$

where $V$ counts the constraint violations and the savings $\delta$ is the amount of energy cost saved by the whole system when the agent has performed CES scheduling for 24 hours, defined as:

$\delta = \sum_{t=1}^{T} p_t D_t - F$

the larger $\delta$ is, the larger the scheduling savings and the greater the reward the system gives the agent; when $\delta$ is negative, the system gives the agent a penalty; $\kappa_1, \kappa_2$ are all coefficients that adjust the strength of the rewards and punishments;
Preferably, after each LCES is trained locally a fixed number of times, the final noise gradient is uploaded to the GS, and the constructed noise gradient $\tilde{g}$ satisfies the $\epsilon$-LDP privacy requirement;

for the original gradient $g$ obtained by training the LCES model, the sensitivity of $g$ must first be restricted; it is calculated by clipping each coordinate of $g$ to the range $[-C, C]$, so that for any two clipped gradients $g_1, g_2$ of dimension $d$ the sensitivity $\Delta s$ satisfies $\Delta s = \max \lVert g_1 - g_2 \rVert_1 \le 2Cd$;

based on the clipped gradient $\bar{g}$ and the sensitivity $\Delta s$, each LCES locally generates Laplace noise $\eta$ satisfying $\eta \sim \mathrm{Lap}(\Delta s/\epsilon)$ and reports $\tilde{g} = \bar{g} + \eta$.
Preferably, the interactive gradients and models of the LCES and the GS iterate with each other; scheduling takes place in a continuous state and action space, and the PPO algorithm is applied to the learning process of the LCES agent; the PPO algorithm runs a plurality of episodes with a fixed policy and retains the running trajectories, and the reward obtained by the LCES agent is the product of the saved amount and a correlation coefficient when the whole episode ends.
Preferably, the policy model of the LCES agent inputs the state at each moment, outputs the mean and variance of the continuous motion, samples the motion from the distribution determined by the mean and variance, constructs a noise gradient satisfying LDP definition, reports to the global GS, the global GS caches the received disturbance gradient, updates the GS model using these gradients when a certain number is reached, and broadcasts the updated model to all LCES.
Preferably, in the framework of the FRL, each LCES agent reports a noise gradient satisfying $\epsilon$-LDP; the GS uses the noise gradients of the LCES to update the global model $w^G$, independent of any private information of the LCES; after the next round the GS broadcasts the updated $w^G$ to all LCES, which train in their local environments.

Preferably, let $f$ be the original function, without noise, not conforming to the LDP definition, and $M$ be a function conforming to $\epsilon$-LDP, i.e. $M(g) = f(g) + \eta$, where $g_1, g_2$ are two different gradients; the sensitivity is defined as:

$\Delta s = \max \lVert f(g_1) - f(g_2) \rVert_1$

if the noise $\eta$ obeys $\mathrm{Lap}(\Delta s/\epsilon)$, then a function $M$ satisfying the strict differential privacy definition is obtained.
In the technical scheme, the invention has the technical effects and advantages that:
1. CES scheduling is carried out under the federated reinforcement learning framework; the whole algorithm runs in a hierarchical distributed architecture, and each local community scheduling agent aims at minimizing the daily energy cost of its community. The method requires no sharing of energy consumption data among communities, only the sharing of perturbed model gradients, thereby protecting the privacy of community households.

2. Compared with a static CES scheduling method, experiments prove the effectiveness of the proposed scheduling method: the federated learning method converges faster to the optimal solution, and agents can be trained in different environments. Meanwhile, under different privacy requirements the proposed method obtains different experimental results, demonstrating the trade-off between the cost savings and the degree of privacy protection.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention; a person of ordinary skill in the art may obtain other drawings from these drawings.
FIG. 1 is a diagram of a community energy storage scheduling architecture of the present invention.
Fig. 2 is a diagram of a CES scheduling architecture based on FRL according to the present invention.
Fig. 3 is a block diagram of an FRL-based CES system of the present invention.
FIG. 4 is a schematic diagram of community energy requirements and ToU electricity prices according to the present invention.
Fig. 5 is a diagram of CES scheduling results for different communities of the present invention.
FIG. 6 is a plot of the impact of CES capacity size on community cost savings of the present invention.
FIG. 7 is a schematic diagram comparing reinforcement learning, federated reinforcement learning, the method combining differential privacy, and static allocation policies in different communities.

FIG. 8 is a graph of reinforcement learning and federated reinforcement learning training according to the present invention.
Fig. 9 is a schematic diagram showing comparison of model convergence speeds under different privacy protection forces according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, fig. 2 and fig. 3, the CES day-ahead scheduling method based on FRL according to the present embodiment involves N community energy storage systems (LCES) and a single global server (GS), and the training process of federated reinforcement learning (FRL) comprises two steps:
LCES trains and updates the local model, and uses noise disturbance to update gradient;
the GS sums the noise gradients of the N LCES to update the global model of the GS, and then broadcasts the latest GS model to the LCES; the local model and the global model are iteratively updated until a certain stopping requirement is met.
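The two-step iteration above can be sketched in Python. The local update below (a quadratic-loss gradient), the model shape, and the privacy parameters are illustrative assumptions standing in for the patent's actual PPO training, not the method itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_train(model, data):
    # Hypothetical stand-in for an LCES local update: the gradient of a
    # quadratic loss pulling the model toward the local data mean.
    return model - data.mean(axis=0)

def perturb(grad, clip=1.0, eps=50.0):
    # Clip each coordinate to [-C, C], then add Laplace noise so that only
    # a noisy gradient ever leaves the LCES (local differential privacy).
    g = np.clip(grad, -clip, clip)
    return g + rng.laplace(scale=2.0 * clip * g.size / eps, size=g.shape)

def frl_round(global_model, local_datasets, lr=0.1, clip=1.0, eps=50.0):
    # One communication round: every LCES reports a noisy gradient, the GS
    # averages them, updates the global model, and broadcasts it back.
    noisy = [perturb(local_train(global_model, d), clip, eps) for d in local_datasets]
    return global_model - lr * np.mean(noisy, axis=0)

model = np.zeros(4)
datasets = [rng.normal(loc=m, size=(64, 4)) for m in (1.0, 2.0, 3.0)]
for _ in range(300):
    model = frl_round(model, datasets)
```

Iterating the round drives the global model toward a consensus of the three local environments while no raw local data ever reaches the GS.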
The reinforcement learning agent performs CES day-ahead scheduling using the proximal policy optimization (PPO) algorithm.
The agent has the task of reducing the total energy expenditure of the community as much as possible under the condition of meeting the energy requirements of families in the community.
CES scheduling is based on the federated reinforcement learning framework. The entire algorithm runs in a hierarchical distributed architecture, with each local community scheduling agent targeting minimization of the daily energy cost of its community. The method does not require sharing energy consumption data among communities, only perturbed model gradients, protecting the privacy of community households.
For CES agents, the given states include ToU electricity prices, community total energy demand on the day, CES total capacity, CES current capacity duty cycle, and current time of day.
CES agent calculates optimal charge and discharge schedules.
Because of CES capacity limitations, if the community total energy demand cannot be met at a certain time, then the user needs to purchase the balance of energy from the grid at that time.
The FRL mathematical model and algorithms comprise the state and action space formulas, the reward function, LDP, the FRL-based CES scheduling algorithm, and the PPO reinforcement learning algorithm.
The scheduling algorithm runs in a hierarchical distributed architecture. The GS updates the global model by aggregating local model gradients; each LCES trains its DRL agent using local data and reports its model gradient to the GS. The computation of the optimal CES agent is realized while exchanging only model gradients or model parameters between the GS and the LCES;

combining LDP into the FRL framework implements a privacy-preserving CES scheduling algorithm: before uploading the locally trained model gradient, the LCES perturbs it with Laplace noise. This realizes privacy-preserving gradient aggregation and protects the privacy of the local environment;
compared with independent DRL, FRL converges faster; meanwhile, by adjusting the LDP parameters, a trade-off can be made between privacy protection and model precision.
Example 2
In this embodiment, the optimization target and constraint condition of the CES scheduling system are defined in a mathematical form, and the CES scheduling model based on deep reinforcement learning DRL and the CES scheduling model combined with local differential privacy LDP are described.
CES day-ahead scheduling requires users to reserve a day ahead and then arrange for the corresponding energy storage service in order to minimize the total energy expenditure of the overall system.
CES construction is costly, requires long-term maintenance, and its energy storage resources cannot be fully utilized by a single household.
Therefore, the energy storage equipment is shared by a plurality of families in the community, the utilization rate of the energy storage equipment is improved, the initial construction cost and the long-term maintenance cost can be shared together, and the total energy consumption of the community is reduced as a whole.
To this end, we construct a target optimization model for community total energy consumption minimization, comprising:
1) An objective function.
The community total energy consumption minimization is defined (in reconstructed notation, the original symbols being lost) as follows:

$\min F = \sum_{t=1}^{T} \big( p_t c_t + p_t (D_t - d_t) + \lambda c_t \big) \quad (1)$

The goal of equation (1) is to minimize the total energy consumption of the community, comprising the CES charging cost $p_t c_t$ at time $t$, the cost $p_t (D_t - d_t)$ of the part of the demand that cannot be satisfied by the CES at time $t$, and the CES service fee $\lambda c_t$, where $\lambda$ indicates the service charge per unit of CES charge.

wherein $p_t$ is the ToU electricity price at time $t$, $c_t$ is the CES charge amount at time $t$, $d_t$ is the discharge delivered by the CES to the homes in the community at time $t$, and $D_t$ is the total household demand in the community at time $t$.
2) Constraint conditions.
Constraint I: the state of charge is updated considering the CES charging efficiency ratio $\eta_c$ and discharging efficiency ratio $\eta_d$:

$E_{t+1} = E_t + \eta_c c_t - d_t/\eta_d, \quad 0 \le E_t \le E_{\max}$

where $E_t$ is the CES remaining capacity at time $t$ and $E_{\max}$ represents the CES total capacity.

Constraint II: a feasible CES state is ensured by assuming the SOE at the initial time is 0, i.e. $E_0 = 0$.

Constraints III and IV: the CES charge rate $c_t$ and discharge rate $d_t$ are kept within a reasonable range, $0 \le c_t \le c_{\max}$ and $0 \le d_t \le d_{\max}$, preventing the CES from being overcharged and over-discharged.

Constraint V: the balance of the total community demand is ensured, namely the household electricity demand in the community can be completely met: $d_t + g_t = D_t$, where $g_t$ is the energy purchased from the grid at time $t$.

Equation (3) constrains the reasonable range of the parameters in the system; $T$ is the maximum timestamp, and since the present application considers day-ahead scheduling at hourly intervals, $T = 24$.
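As a rough illustration of the objective under the reconstructed notation, the daily cost of a candidate schedule can be evaluated against a no-CES baseline. The function names, the `service_fee` default, and the hour-indexed lists are assumptions for this sketch:

```python
def daily_cost(price, charge, discharge, demand, service_fee=0.01):
    # Daily community cost per the reconstructed objective: charging cost
    # p_t*c_t, unmet-demand cost p_t*(D_t - d_t), and service fee lam*c_t.
    cost = 0.0
    for p, c, d, D in zip(price, charge, discharge, demand):
        cost += p * c + p * max(D - d, 0.0) + service_fee * c
    return cost

def baseline_cost(price, demand):
    # Without a CES, every unit of demand is bought at the ToU price.
    return sum(p * D for p, D in zip(price, demand))
```

The day's savings is then `baseline_cost(...) - daily_cost(...)`, the quantity the agent's reward is built from.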
A. CES scheduling model based on DRL:
In the definition of the state space above, the state $soe_t = E_t/E_{\max}$ is the ratio of the CES remaining capacity to the total capacity at time $t$; $s_t$ represents the state of the environment in which the CES agent is located at time $t$.
In the prior art, only time-related dynamic variables are considered for the state space of an energy storage agent, but we found through experiments that also feeding the static factors related to the energy storage into the model network as part of the state accelerates the convergence of the agent.

The reason is also direct: inputting more relevant information into the model network lets the agent understand the environment more comprehensively and in more detail, so that it can make good decisions more quickly.
2) Action space: the action space $a_t$ comprises the charge and discharge coefficients of the CES at different times, defined (in reconstructed notation) as follows:

$a_t = (\alpha_t, \beta_t), \quad c_t = \alpha_t c_{\max}, \quad d_t = \beta_t d_{\max}$

$\alpha_t$ indicates the coefficient with which the CES charges from the grid at time $t$, its value ranging within $[0, 1]$; its relation to the charge $c_t$ drawn from the grid at time $t$ is $c_t = \alpha_t c_{\max}$. $\beta_t$ indicates the discharge coefficient given to the community by the CES at time $t$; its relation to $d_t$ is $d_t = \beta_t d_{\max}$. $a_t$ represents the action performed by the CES agent at time $t$ in the environment state $s_t$.
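A minimal sketch of how such charge/discharge coefficients might be mapped to physical quantities and an SOE update, assuming the reconstructed notation above; the bounding rules and efficiency defaults are illustrative, not the patent's specification:

```python
def step_ces(soe, alpha, beta, e_max, c_max, d_max, eta_c=0.95, eta_d=0.95):
    # Map the action a_t = (alpha, beta) in [0,1]^2 to physical quantities:
    # charge c_t = alpha*c_max drawn from the grid, discharge d_t = beta*d_max
    # delivered to the homes, both bounded by the CES state of energy.
    e = soe * e_max
    charge = min(alpha * c_max, (e_max - e) / eta_c)   # cannot charge past E_max
    discharge = min(beta * d_max, e * eta_d)           # cannot deliver more than stored
    e = e + eta_c * charge - discharge / eta_d         # constraint I update
    return e / e_max, charge, discharge
```

Clamping the raw action to feasible quantities keeps the simulated environment inside constraints I, III and IV while the agent explores.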
3) Bonus function: the reward function R represents feedback obtained by the CES agent in the exploration of the environment S for guiding the agent to achieve a predetermined objective.
The setting of the reward function should include the reward for the agent performing the correct action, and the penalty for performing the false action resulting in the environment not meeting the CES device base constraint, and therefore the reward function is defined as follows:
where constraint VII-constraint IX represents that when the action performed by CES exceeds the constraint in P (1), the system gives a penalty, and if within the constraint, a reward.
ConstraintIs->The amount of energy cost saved by the whole system when the agent has performed CES scheduling for 24 hours is defined as follows:
thus whenThe larger the current day, the larger the amount of the scheduling savings is, and the more rewards are given to the agent by the system. If->When negative, the system gives severe penalties to agents.
All are coefficients, the dynamics of rewards and punishments are used for adjusting, and optimal rewards and punishment coefficients are adjusted through experimental results.
For the 24-hour day-ahead scheduling scenario, the action of the agent at each time may exceed the constraints in P(1), and the total savings at the last moment of the day-ahead schedule drives the optimization of the agent's executed actions.
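A toy version of such an end-of-episode reward; the coefficient names `k1`..`k3`, their values, and the exact penalty shape are hypothetical, chosen only to show the reward/penalty structure described above:

```python
def episode_reward(savings, violations, k1=1.0, k2=10.0, k3=5.0):
    # Reward proportional to the day's cost savings, a penalty per constraint
    # violation, and an extra penalty when the savings are negative
    # (coefficients are illustrative tuning knobs, as in the text).
    r = k1 * savings - k2 * violations
    if savings < 0:
        r -= k3 * abs(savings)
    return r
```

Raising `k2` punishes infeasible schedules harder; raising `k1` pushes the agent toward aggressive arbitrage.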
4) PPO algorithm: after the CES agent performs actions with a specified policy, it is optimized at the end of each episode by increasing the probability of good actions and decreasing the probability of bad actions.

The PPO algorithm uses importance sampling, which solves the problem that samples in the policy gradient algorithm can be used only once, and it uses an advantage function in place of the reward function, so that the model focuses more on the average reward brought by an action.
We denote the trajectory as $\tau$ and the parameterized policy as $\pi_\theta$, where $\theta$ is the parameter of the distribution approximation. The purpose of the PPO algorithm is to maximize the reward expectation $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$ under the policy $\pi_\theta$; the clipped surrogate objective (reconstructed in standard PPO form) is therefore as follows:

$L(\theta) = \mathbb{E}_t \Big[ \min\big( r_t(\theta) \hat{A}_t, \ \mathrm{clip}(r_t(\theta), 1 - \epsilon_{\mathrm{clip}}, 1 + \epsilon_{\mathrm{clip}}) \hat{A}_t \big) \Big], \quad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$

wherein $\pi_\theta(a_t \mid s_t)$ and $\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ respectively represent the probability of performing the action under the new and old policies, and $\epsilon_{\mathrm{clip}}$ defines the clipping range (distinct from the privacy budget $\epsilon$).
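The clipped surrogate objective can be written compactly in NumPy. This is a generic PPO loss sketch (negated for use with a minimizer), not the patent's exact implementation:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps_clip=0.2):
    # Clipped surrogate objective of PPO, negated so it can be minimized:
    # L = -E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)],
    # with r_t = pi_new(a|s) / pi_old(a|s) computed from log-probabilities.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

The clip stops a single update from moving the policy too far from the one that collected the trajectories.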
B. CES scheduling model in combination with LDP:
LCES generates Laplace noise to perturb the local gradient before reporting the local gradient, preventing malicious parties from analyzing local privacy information from the gradient.
Thus, local differential privacy provides a strict privacy guarantee before the LCES reports its training results. We assume the LCES perturbs the training result with a random function $M$ whose input domain and output range are the gradient space.

Definition 1: for any two possible inputs $g_1, g_2$ and any subset of outputs $O$, the random function $M$ satisfies $\epsilon$-LDP if and only if the following inequality holds:

$\Pr[M(g_1) \in O] \le e^{\epsilon} \Pr[M(g_2) \in O]$

Sensitivity defines the maximum variation of the random function, i.e. the maximum change in output that occurs as the input data fluctuates:

$\Delta s = \max_{g_1, g_2} \lVert f(g_1) - f(g_2) \rVert_1$

Laplace mechanism: the Laplace mechanism is a random mechanism that samples noise from the Laplace distribution according to the sensitivity of the objective function, defined as:

$M(g) = f(g) + \mathrm{Lap}(\Delta s/\epsilon)$

For the random function $M$ defined above and an arbitrary deterministic or random function $h$: if $M$ satisfies $\epsilon$-LDP, then for arbitrary inputs the composition $h(M(\cdot))$ also satisfies $\epsilon$-LDP (the post-processing property).
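A minimal sketch of the Laplace mechanism as defined above; the function signature and the sample values are assumptions for illustration:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, eps, rng):
    # eps-LDP Laplace mechanism: perturb each coordinate with noise drawn
    # from Lap(sensitivity/eps); smaller eps means stronger privacy and
    # therefore larger noise.
    value = np.asarray(value, dtype=float)
    return value + rng.laplace(scale=sensitivity / eps, size=value.shape)

rng = np.random.default_rng(0)
noisy = laplace_mechanism([0.2, -0.7, 1.0], sensitivity=2.0, eps=1.0, rng=rng)
```

Any post-processing of `noisy` (averaging at the GS, model updates) keeps the same privacy guarantee.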
after a number of rounds, the agent updates the model with a loss function based on the historical trajectory information and the rewards obtained;
After multiple rounds of updating, the agent finds the final updated gradient and LCES computes a perturbed random gradient before reporting to GS;
Definition 3 ($\epsilon$-LDP noise gradient): for any local community scheduling system LCES, any two local gradients $g_1, g_2$, and any subset of random gradients $G$, the following inequality must hold:

$\Pr[\tilde{g} \in G \mid g_1] \le e^{\epsilon} \Pr[\tilde{g} \in G \mid g_2]$

wherein $\tilde{g}$ is the perturbed noise gradient and $g$ is the true gradient obtained by the LCES local training.
For the noise gradients reported by the LCES, the GS averages the gradients, which are then used to update the global model, and the latest GS model is shared with all LCES.
We assume that each LCES uploads the final noise gradient to the GS after a fixed number of local training.
For the original gradient $g$ obtained by training the LCES model, the sensitivity of $g$ must first be restricted; it is calculated by clipping each coordinate of $g$ to the range $[-C, C]$, where $C$ is the clipping parameter;

that is to say, for any two clipped gradients $g_1, g_2$ of dimension $d$, the sensitivity $\Delta s$ satisfies:

$\Delta s = \max \lVert g_1 - g_2 \rVert_1 \le 2Cd$

based on the clipped gradient $\bar{g}$ and the sensitivity $\Delta s$, each LCES can locally generate Laplace noise $\eta \sim \mathrm{Lap}(\Delta s/\epsilon)$ and report the noise gradient $\tilde{g} = \bar{g} + \eta$.
Example 3
The present embodiment proposes a CES scheduling algorithm based on FRL, see algorithm one:
First, the relevant input parameters are initialized, including the energy demand at all times of the community, the ToU prices and the CES-related parameters, the GS reinforcement learning model $w^G$ and its dimension $d$ (broadcast to all LCES), the clipping parameter $C$, and the local privacy requirement $\epsilon$. The loop then starts, with the maximum number of loops being the maximum number of communication rounds; all LCES start computing, iterating from episode = 0 to the maximum number of LCES updates.

According to the policy $\pi_\theta$, 96 timestamps are run and the policy trajectory $\tau$ is recorded; the advantage function $\hat{A}_t$ of each state is calculated, and the loss function is computed.

The LCES reinforcement learning model is then updated using the AdamW optimizer; the model gradient $g$ and the perturbed gradient $\tilde{g}$ are calculated, and the perturbed noise gradient $\tilde{g}$ is reported to the GS. The GS buffers all received noise gradients; if the GS buffer is full, it calculates the mean $\bar{g}$ of the buffered gradients and updates the global model $w^G$. Finally, the buffer is emptied and the result is output, completing the algorithm.
The algorithm runs in a distributed fashion, with the interactive gradients and models of the LCES and GS iterating with each other. The LCES agents schedule in a continuous state and action space, and we apply the PPO algorithm to the LCES agent's learning process.

The PPO algorithm runs multiple episodes with a fixed policy, keeping the running trace, which we set to 96 timestamps in this application. Then, according to the recorded trajectories, the probability of actions with large average rewards is increased, and the probability of actions with small average rewards is decreased.

The reward earned by the LCES agent in the present system is the product of the saved amount and a correlation coefficient when the entire episode finishes.
The policy model of the LCES agent inputs the state of each moment, outputs the mean and variance of the continuous action, and samples the action from the distribution determined by the mean and variance.
This allows the LCES agent to explore all possibilities of the action space, avoiding getting trapped in an extremal region.
After the local training is completed, LCES follows algorithm two:
To calculate the $\epsilon$-LDP noise gradient $\tilde{g}$, the corresponding parameters are first input, including the original gradient $g$, the dimension $d$, the privacy requirement $\epsilon$, and the clipping range $C$. According to equation (13), the clipped gradient $\bar{g}$ is calculated from the original gradient $g$. Then the calculation of equation (15) is performed cyclically, coordinate by coordinate, adding Laplace noise $\eta \sim \mathrm{Lap}(\Delta s/\epsilon)$ to each coordinate until the number of loops reaches $d$. Finally the result $\tilde{g} = \bar{g} + \eta$ is returned and the algorithm is complete.
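Under the reconstruction above (per-coordinate clipping to $[-C, C]$ with L1 sensitivity at most $2Cd$), algorithm two might look as follows. This is a sketch under those assumptions, not the patent's exact procedure:

```python
import numpy as np

def perturb_gradient(grad, clip_c, eps, rng):
    # Clip each of the d coordinates of the local gradient to [-C, C], so the
    # L1 sensitivity between any two clipped gradients is at most 2*C*d, then
    # add Lap(2*C*d/eps) noise to every coordinate so that the reported
    # vector satisfies eps-LDP.
    g = np.clip(np.asarray(grad, dtype=float), -clip_c, clip_c)
    scale = 2.0 * clip_c * g.size / eps
    return g + rng.laplace(scale=scale, size=g.shape)
```

The vectorized `rng.laplace(..., size=g.shape)` draw is equivalent to the coordinate-by-coordinate loop described above.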
Constructing noise gradients meeting LDP definition, reporting the noise gradients to a global GS, caching the received disturbance gradients by the global GS, updating a GS model by using the gradients after a certain number of the received disturbance gradients are reached, and broadcasting the updated model to all LCES.
In the FRL framework-based algorithm one described in the application, all LCES meet the following requirements。
Within the framework of FRL, each LCES agent reports one satisfactionIs updated by the noise gradient of LCES alone>This step is independent of any privacy information of the LCES;
and update the model to violateAfter the next round, GS will be updated +.>Broadcast to all LCES, LCES trains in the local environment, and the local learning process is independent of all other agents, so that other agents are not violated eitherAnd (5) defining.
Assume that f is the original function, without noise, not conforming to the LDP definition, and that F is a function conforming to ε-LDP, i.e., F(g) = f(g) + n, where g and g′ are two different gradients. With the sensitivity Δ defined in equation (10) and privacy budget ε, it is possible to obtain:

$$\Pr[F(g)=s]=\Pr[n=s-f(g)]$$

That is, the probability that the random function outputs a specified value equals the probability density of the associated noise. Let n obey Lap(Δ/ε); then a function F satisfying the strict differential-privacy definition is obtained, giving:

$$\Pr[F(g)=s]=\frac{\varepsilon}{2\Delta}\exp\left(-\frac{\varepsilon\,|s-f(g)|}{\Delta}\right)$$

The formula above gives the probability that the gradient g produces the specified result s through the noise function F; in the same way, the probability for the gradient g′ is:

$$\Pr[F(g')=s]=\frac{\varepsilon}{2\Delta}\exp\left(-\frac{\varepsilon\,|s-f(g')|}{\Delta}\right)$$

Comparing the two, we obtain:

$$\frac{\Pr[F(g)=s]}{\Pr[F(g')=s]}=\exp\left(\frac{\varepsilon\left(|s-f(g')|-|s-f(g)|\right)}{\Delta}\right)\le\exp\left(\frac{\varepsilon\,|f(g)-f(g')|}{\Delta}\right)\le e^{\varepsilon}$$
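The bound above can be checked numerically: for any output s, and any two gradients whose images under f differ by at most the sensitivity Δ, the ratio of the two Laplace densities never exceeds e^ε. This is a self-contained sketch, not code from the application; the grid of test points is an arbitrary choice.

```python
import math

def laplace_pdf(x, b):
    """Density of the Laplace distribution Lap(b) at x."""
    return math.exp(-abs(x) / b) / (2 * b)

def privacy_ratio(s, fg, fg2, delta, epsilon):
    """Ratio Pr[F(g)=s] / Pr[F(g')=s] under Lap(delta/epsilon) noise."""
    b = delta / epsilon
    return laplace_pdf(s - fg, b) / laplace_pdf(s - fg2, b)

# Exhaustive check on a grid: with |f(g) - f(g')| <= delta, the ratio
# stays within the e^epsilon bound required by the LDP definition.
delta, eps = 1.0, 0.5
worst = max(
    privacy_ratio(s / 10, 0.0, d / 10, delta, eps)
    for s in range(-50, 51)
    for d in range(0, 11)          # |f(g) - f(g')| ranges over [0, delta]
)
```

The worst case occurs when the two function values differ by exactly Δ, where the ratio reaches e^ε; everywhere else it is strictly smaller.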
Example 4
In this embodiment, the proposed method is verified using real data. Three communities with different CES specifications are considered, as shown in Table 1 (CES specifications of the three communities; the table contents are not reproduced here).

The energy demand and ToU electricity prices of each community are shown in Fig. 4. We assume that every 50 iterations of local LCES training are followed by one round of communication with the GS. The experiments were run on Ubuntu using Python 3.9 and PyTorch 1.12.1.
The scheduling effect of the proposed method is evaluated first. Fig. 5 shows the CES scheduling service scenario for the three communities.
As can be seen from Fig. 5, during the off-peak electricity-price period the communities draw their main energy demand from the grid, while during the peak period each community discharges its CES device to realize energy arbitrage. Because the CES initially stores no energy, it charges to build up a reserve before the peak period, starting from hour 0 of the day. When the peak electricity-price period arrives, the main energy consumption of the community is supplied by the CES; if the CES cannot fully meet the households' demand at certain moments, the community households draw the remaining demand from the grid.
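The charge-before-peak, discharge-at-peak behaviour described above amounts to simple ToU arbitrage. A toy greedy schedule illustrating it looks like this; the price threshold, capacities, and rates are made-up illustrative numbers, and this greedy rule is a simplification of, not a substitute for, the learned policy:

```python
def greedy_tou_schedule(prices, demand, capacity, rate):
    """Charge the CES from the grid in cheap hours, discharge to cover
    community demand in expensive hours; the grid supplies any balance."""
    threshold = sum(prices) / len(prices)      # cheap vs. expensive split (assumption)
    soe, plan = 0.0, []
    for p, load in zip(prices, demand):
        if p < threshold:                      # off-peak: build up the reserve
            charge = min(rate, capacity - soe)
            soe += charge
            plan.append(("charge", charge, load))        # demand met by grid
        else:                                  # peak: CES serves the load first
            discharge = min(rate, soe, load)
            soe -= discharge
            plan.append(("discharge", discharge, load - discharge))
    return plan
```

The third tuple element is the residual demand the grid must supply, mirroring the behaviour visible in Fig. 5 where the CES covers most, but not always all, of the peak-hour load.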
As shown in Fig. 6, when the CES capacity is small, the cost savings of the community increase significantly as the CES capacity increases; however, once the CES capacity exceeds a certain upper limit, the additional savings become insignificant or cease altogether. For community two, this maximum CES capacity threshold lies between 70 and 80 kWh. Our method can therefore also be combined with user history data to predict the optimal CES capacity of a community.
In Fig. 7, we compare the cost savings of four different scheduling methods: plain reinforcement learning, federated reinforcement learning, the differential-privacy-augmented method proposed in this application, and a static allocation strategy. In the static allocation strategy, the community's shared energy storage capacity is divided among the community users, and each user operates its own share independently. When privacy is not considered, reinforcement learning and federated reinforcement learning both outperform the static allocation strategy. Dynamic battery allocation strategies are always preferable to static ones, because static allocation cannot reuse CES capacity and therefore cannot reach the optimal CES scheduling solution.
As can be seen from Fig. 8, federated reinforcement learning not only improves model performance but also increases the convergence rate, because agents in federated learning can draw on knowledge from more environments. When privacy is considered, the CES agents sacrifice some performance in exchange for privacy protection, indicating a trade-off between privacy and utility. At the same time, even with privacy protection, the proposed method still performs better than the static allocation strategy.
Fig. 9 compares the model convergence rates under different privacy-protection levels; as can be seen, the curve for the larger privacy budget ε outperforms the curve for the smaller ε. This is because a larger ε means smaller added noise and hence weaker privacy protection of the gradient, but yields better model performance and faster convergence. As training proceeds, the final converged values of the two differ little, which means the model can learn the correct knowledge even under the stricter privacy requirement; indeed, adding noise to the model is also a way of preventing overfitting and can improve the model's inference ability.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" in this application is merely an association relationship describing the associated object, and indicates that three relationships may exist, for example, a and/or B may indicate: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" in this application generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A CES day-ahead scheduling method based on FRL, characterized by comprising a plurality of local community energy storage systems (LCES) and a single global server (GS);
the training process of the FRL comprises the following steps:
each LCES trains and updates its local model, and perturbs the update gradient with noise;
the GS aggregates the noise gradients of the plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES;
the local model and the global model are iteratively updated until the stopping requirement is met, and training is completed;
the FRL operates in a hierarchical distributed architecture: the GS updates the global model by aggregating local model gradients, while each LCES trains a DRL agent using local data and reports its model gradient to the GS; only model gradients or model parameters are exchanged between the GS and the LCES to realize the computation of the CES agents;
constructing the CES target optimization model for minimizing the total community energy cost comprises the following steps:

objective function: the minimization of the total community energy cost is defined as

$$\min \sum_{t=1}^{T}\left[\,p_t\,c_t + p_t\,(l_t - d_t) + \beta\,c_t\,\right]$$

which comprises the cost of the CES charge at time t, the cost of the part of the demand that cannot be satisfied by the CES at time t, and the CES service cost, where β denotes the service charge per unit of CES charge amount; T is the maximum timestamp, and with day-ahead scheduling at hourly intervals, T = 24;

where p_t is the ToU electricity price at time t, c_t is the CES charge amount at time t, d_t is the discharge amount delivered by the CES to the homes in the community at time t, and l_t is the total household demand in the community at time t;
constraint conditions:

constraint I: considering the CES charging efficiency ratio η_c and discharging efficiency ratio η_d, the state of energy is updated as

$$E_{t+1}=E_t+\eta_c\,c_t-\frac{d_t}{\eta_d},\qquad 0\le E_t\le E_{\max}$$

where E_t is the CES remaining capacity at time t and E_max represents the total CES capacity;

constraint II: constraining the CES state, with the SOE at the initial time set to 0, i.e., E_0 = 0;

constraints III and IV: constraining the CES charge rate c_t and discharge rate d_t within a reasonable range, 0 ≤ c_t ≤ c_max and 0 ≤ d_t ≤ d_max, preventing the CES from being over-charged or over-discharged;

constraint V: guaranteeing the balance of the total community demand, i.e., at every time t the CES discharge plus the energy drawn from the grid equals the total household demand.
3. The FRL-based CES day-ahead scheduling method of claim 2, further characterized in that: for any time t, the state space of the CES agent is defined such that the state soe_t is the ratio of the CES remaining capacity to the total capacity at time t, soe_t = E_t / E_max, and e_t represents the state of the environment where the CES agent is located at time t; the static factors of the energy storage are input into the model network as states; the action space a_t comprises the charge and discharge coefficients of the CES at different times, where a^c_t denotes the coefficient with which the CES charges from the grid at time t, its value ranging between 0 and 1 and related to the grid charge amount c_t at time t; a^d_t denotes the discharge coefficient given to the community by the CES at time t, related to the discharge amount d_t; and a_t represents the action performed by the CES agent in the environment e_t at time t;

the reward function R represents the feedback obtained by the CES agent in its exploration of the environment and is used to guide the agent to the predetermined objective; the reward function comprises a reward for the agent performing a correct action, and a penalty for performing a wrong action that causes the environment to violate the basic constraints of the CES device;

the reward term is the amount of energy cost saved by the whole system once the agent has performed CES scheduling for 24 hours;
4. The FRL-based CES day-ahead scheduling method of claim 3, further characterized in that: after each LCES has trained locally a fixed number of times, it uploads its final noise gradient to the GS, constructing a noise gradient that satisfies ε-LDP, where ε is the privacy requirement;

the original gradient g obtained by LCES model training needs to be clipped to restrict its sensitivity, calculated as

$$\tilde{g}=\frac{g}{\max\left(1,\,\frac{\lVert g\rVert}{C}\right)}$$

where g is the gradient of the LCES local training, C is the clipping range, and Δ is the sensitivity; that is, any two clipped gradients g̃, g̃′ satisfy

$$\lVert \tilde{g}-\tilde{g}'\rVert_{1}\le\Delta$$

based on the clipped gradient g̃ and the sensitivity Δ, each LCES locally generates Laplace noise n ∼ Lap(Δ/ε), which satisfies

$$\Pr[n=x]=\frac{\varepsilon}{2\Delta}\exp\left(-\frac{\varepsilon\,|x|}{\Delta}\right)$$
5. The FRL-based CES day-ahead scheduling method of claim 4, further characterized in that: the LCES and the GS iteratively exchange gradients and models; the LCES agent is scheduled in a continuous state and action space, and the PPO algorithm is applied to the learning process of the LCES agent; the PPO algorithm runs several episodes with a fixed policy and retains the resulting trajectories, and the reward obtained by the LCES agent when a whole episode finishes is the product of the saved amount and a related coefficient.
6. The FRL-based CES day-ahead scheduling method of claim 5, further characterized in that: the policy model of the LCES agent takes the state at each time as input, outputs the mean and variance of the continuous action, and samples the action from the distribution determined by that mean and variance; the LCES constructs noise gradients satisfying the LDP definition and reports them to the global GS; the global GS caches the received perturbed gradients, updates the GS model with them once a certain number has been received, and broadcasts the updated model to all LCES.
7. The FRL-based CES day-ahead scheduling method of claim 6, further characterized in that: within the FRL framework, each LCES agent reports a noise gradient satisfying ε-LDP; the GS uses the noise gradient of the LCES alone to update the model θ, independently of any private information of the LCES; after the next round, the GS broadcasts the updated θ to all LCES, which train in their local environments.
8. The FRL-based CES day-ahead scheduling method of any of claims 1-7, characterized in that: let f be the original function, without noise, not conforming to the LDP definition, and F a function conforming to ε-LDP, i.e., F(g) = f(g) + n, where g and g′ are two different gradients; the sensitivity is defined as

$$\Delta=\max_{g,g'}\lVert f(g)-f(g')\rVert_{1}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310191179.2A CN115860789B (en) | 2023-03-02 | 2023-03-02 | CES day-ahead scheduling method based on FRL |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310191179.2A CN115860789B (en) | 2023-03-02 | 2023-03-02 | CES day-ahead scheduling method based on FRL |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115860789A CN115860789A (en) | 2023-03-28 |
CN115860789B true CN115860789B (en) | 2023-05-30 |
Family
ID=85659704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310191179.2A Active CN115860789B (en) | 2023-03-02 | 2023-03-02 | CES day-ahead scheduling method based on FRL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115860789B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115310121A (en) * | 2022-07-12 | 2022-11-08 | 华中农业大学 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210089910A1 (en) * | 2019-09-25 | 2021-03-25 | Deepmind Technologies Limited | Reinforcement learning using meta-learned intrinsic rewards |
CN111611610B (en) * | 2020-04-12 | 2023-05-30 | 西安电子科技大学 | Federal learning information processing method, system, storage medium, program, and terminal |
CN112214788B (en) * | 2020-08-28 | 2023-07-25 | 国网江西省电力有限公司信息通信分公司 | Ubiquitous power Internet of things dynamic data publishing method based on differential privacy |
CN112818394A (en) * | 2021-01-29 | 2021-05-18 | 西安交通大学 | Self-adaptive asynchronous federal learning method with local privacy protection |
CN113221183B (en) * | 2021-06-11 | 2022-09-16 | 支付宝(杭州)信息技术有限公司 | Method, device and system for realizing privacy protection of multi-party collaborative update model |
CN113591145B (en) * | 2021-07-28 | 2024-02-23 | 西安电子科技大学 | Federal learning global model training method based on differential privacy and quantization |
CN113570155A (en) * | 2021-08-13 | 2021-10-29 | 常州工程职业技术学院 | Multi-community energy cooperation game management model based on energy storage device and cheating behavior |
CN114330743A (en) * | 2021-12-24 | 2022-04-12 | 浙江大学 | Cross-equipment federal learning method for minimum-maximum problem |
CN115511054A (en) * | 2022-09-27 | 2022-12-23 | 中国科学技术大学 | Cost perception privacy protection federal learning method facing unbalanced data |
-
2023
- 2023-03-02 CN CN202310191179.2A patent/CN115860789B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115310121A (en) * | 2022-07-12 | 2022-11-08 | 华中农业大学 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Non-Patent Citations (1)
Title |
---|
An efficient numerical analysis method for transonic limit-cycle flutter; Zhang Weiwei; Wang Bobin; Ye Zhengyin; Chinese Journal of Theoretical and Applied Mechanics (力学学报) (06); 31-41 *
Also Published As
Publication number | Publication date |
---|---|
CN115860789A (en) | 2023-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chiş et al. | Reinforcement learning-based plug-in electric vehicle charging with forecasted price | |
Ciupageanu et al. | Real-time stochastic power management strategies in hybrid renewable energy systems: A review of key applications and perspectives | |
CN113610303B (en) | Load prediction method and system | |
Kusiak et al. | A data-driven approach for steam load prediction in buildings | |
Chen | Two-level hierarchical approach to unit commitment using expert system and elite PSO | |
Leterme et al. | A flexible stochastic optimization method for wind power balancing with PHEVs | |
Liu et al. | Optimal reserve management of electric vehicle aggregator: Discrete bilevel optimization model and exact algorithm | |
Ghanbarzadeh et al. | Reliability constrained unit commitment with electric vehicle to grid using hybrid particle swarm optimization and ant colony optimization | |
James et al. | Optimal V2G scheduling of electric vehicles and unit commitment using chemical reaction optimization | |
Zhou et al. | LSTM-based energy management for electric vehicle charging in commercial-building prosumers | |
CN111200293A (en) | Battery loss and distributed power grid battery energy storage day-ahead random scheduling method | |
Cao et al. | Energy management optimisation using a combined Long Short-Term Memory recurrent neural network–Particle Swarm Optimisation model | |
Hosking et al. | Short‐term forecasting of the daily load curve for residential electricity usage in the Smart Grid | |
Kong et al. | Refined peak shaving potential assessment and differentiated decision-making method for user load in virtual power plants | |
Porras et al. | An efficient robust approach to the day-ahead operation of an aggregator of electric vehicles | |
Ampatzis et al. | Robust optimisation for deciding on real‐time flexibility of storage‐integrated photovoltaic units controlled by intelligent software agents | |
Chu et al. | A multiagent federated reinforcement learning approach for plug-in electric vehicle fleet charging coordination in a residential community | |
CN116227806A (en) | Model-free reinforcement learning method based on energy demand response management | |
Haque et al. | Stochastic methods for prediction of charging and discharging power of electric vehicles in vehicle‐to‐grid environment | |
Zhang et al. | Data augmentation strategy for small sample short‐term load forecasting of distribution transformer | |
Nammouchi et al. | Robust opportunistic optimal energy management of a mixed microgrid under asymmetrical uncertainties | |
Liu et al. | Reinforcement learning-based energy trading and management of regional interconnected microgrids | |
Xuemei et al. | A novel air-conditioning load prediction based on ARIMA and BPNN model | |
Gulotta et al. | Short-term uncertainty in the dispatch of energy resources for VPP: A novel rolling horizon model based on stochastic programming | |
CN115860789B (en) | CES day-ahead scheduling method based on FRL |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |