CN115860789B - CES day-ahead scheduling method based on FRL - Google Patents

CES day-ahead scheduling method based on FRL Download PDF

Info

Publication number
CN115860789B
CN115860789B CN202310191179.2A CN202310191179A CN115860789B CN 115860789 B CN115860789 B CN 115860789B CN 202310191179 A CN202310191179 A CN 202310191179A CN 115860789 B CN115860789 B CN 115860789B
Authority
CN
China
Prior art keywords
ces
lces
model
agent
frl
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310191179.2A
Other languages
Chinese (zh)
Other versions
CN115860789A (en
Inventor
邱日轩
肖子洋
李帆
郑锦坤
余腾龙
陈明亮
井思桐
吴灵芝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202310191179.2A priority Critical patent/CN115860789B/en
Publication of CN115860789A publication Critical patent/CN115860789A/en
Application granted granted Critical
Publication of CN115860789B publication Critical patent/CN115860789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a CES day-ahead scheduling method based on FRL, which comprises a plurality of community energy storage systems LCES and a single global server GS; the training process of the FRL comprises the following steps: LCES trains and updates the local model, and uses noise disturbance to update gradient; the GS sums the noise gradients of a plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES; and (5) iteratively updating the local model and the global model, meeting the stopping requirement, and completing training. The CES scheduling is carried out based on the federal reinforcement learning framework, the whole algorithm operates in a layered distributed architecture, the local community scheduling agent aims at minimizing daily energy cost of communities, and the method does not need to share energy consumption data among communities, only needs to share disturbance model gradient, and protects privacy of community families.

Description

CES day-ahead scheduling method based on FRL
Technical Field
The invention relates to the technical field of energy storage scheduling, in particular to a CES day-ahead scheduling method based on FRL.
Background
The household sharing high-capacity energy storage equipment in communities can realize space-time transfer and energy arbitrage of household demands under a time-of-use electricity price plan, energy Storage (ES) is an important component of a novel power system, randomness and fluctuation of renewable energy sources can be relieved, under a time-of-use electricity price (ToU) plan, ES can also realize energy arbitrage by storing energy in off-peak periods and releasing energy in peak periods, and with the development of the era, a community sharing energy storage system (CES) appears, however, the traditional scheduling method cannot meet the dynamically-changed household demands, and the energy storage scheduling needs household detailed energy consumption data, so that the privacy problem is related.
Disclosure of Invention
The invention aims to provide a CES (community shared energy storage system) day-ahead scheduling method based on FRL (federal reinforcement learning) to solve the defects in the background technology.
In order to achieve the above object, the present invention provides the following technical solutions: the CES day-ahead scheduling method based on FRL comprises a plurality of community energy storage systems LCES and a single global server GS;
the training process of the FRL comprises the following steps:
LCES trains and updates the local model, and uses noise disturbance to update gradient;
the GS sums the noise gradients of a plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES;
and (5) iteratively updating the local model and the global model, meeting the stopping requirement, and completing training.
Preferably, the FRL operates in a hierarchical distributed architecture, the GS updates the global model by aggregating local model gradients, the LCES trains the DRL agent using local data, and model gradients are reported to the GS, only model gradients or model parameters are exchanged between the GS and LCES to enable computation of CES agent.
Preferably, the CES builds a target optimization model for minimizing total energy consumption of the community, including:
objective function: the community total energy consumption minimization is defined as:
Figure SMS_1
wherein comprises
Figure SMS_2
Time CES charge +.>
Figure SMS_3
Cost of->
Figure SMS_4
Part of the demand that cannot be satisfied by CES at the moment
Figure SMS_5
Cost of CES service +.>
Figure SMS_6
,/>
Figure SMS_7
A service charge indicating a CES unit charge amount;
wherein
Figure SMS_9
Is->
Figure SMS_10
Time ToU price of electricity, ->
Figure SMS_11
Is->
Figure SMS_12
Time CES charge,/->
Figure SMS_13
Is->
Figure SMS_14
The amount of discharge delivered by CES to the home in the community at time,/->
Figure SMS_15
Is->
Figure SMS_8
The total household demand in the community at any moment;
constraint conditions:
Figure SMS_16
constraint
Figure SMS_17
: consider CES charging efficiency ratio->
Figure SMS_18
And discharge efficiency ratio->
Figure SMS_19
In case of updating state of charge, +.>
Figure SMS_20
Is->
Figure SMS_21
Time CES remaining capacity, ++>
Figure SMS_22
Representing CES total capacity;
constraint II: constraining the CES state, and setting SOE of initial time to 0;
constraint III and constraint IV: constraining CES charge rate
Figure SMS_23
And discharge rate->
Figure SMS_24
Within a reasonable range, CES is prevented from being excessively charged and discharged;
constraint V: the balance of the total demands of communities is ensured.
Preferably, constraint III and constraint IV are defined by reasonable ranges of constraint parameters of the formula:
Figure SMS_25
Figure SMS_26
is the maximum timestamp, schedule before day at the interval of hours, +.>
Figure SMS_27
Preferably, for any of
Figure SMS_28
The state space of the CES agent is defined at time instant as:
Figure SMS_29
in the state of
Figure SMS_30
Is->
Figure SMS_31
The ratio of the remaining capacity of CES to the total capacity at time,/->
Figure SMS_32
Representation->
Figure SMS_33
The state of the environment where the CES agent is located at the moment, the static factors of energy storage are used as the states to be input into the model network, and the action space is +.>
Figure SMS_34
Including CES atThe charge and discharge coefficients at different times are defined as:
Figure SMS_35
in the formula ,
Figure SMS_37
indicating CES is +.>
Figure SMS_39
The moment of time is from the grid charging capacity, the value of which ranges from +.>
Figure SMS_41
Between, and->
Figure SMS_43
Moment of time from grid charge->
Figure SMS_45
The relation of->
Figure SMS_47
,/>
Figure SMS_48
Indicating CES is +.>
Figure SMS_36
The discharge coefficient given to the community at the moment is equal to +.>
Figure SMS_38
The relation of (2) is that
Figure SMS_40
,/>
Figure SMS_42
Representation->
Figure SMS_44
Time CES agent in the environment->
Figure SMS_46
Under execution ofAction;
the reward function R represents feedback obtained by the CES agent in the exploration of the environment, for guiding the agent to achieve a predetermined objective, the reward function comprising a reward for the agent to perform a correct action, and a penalty for performing a false action resulting in the environment not meeting the CES device basic constraints, defined as:
Figure SMS_49
constraint
Figure SMS_50
Is->
Figure SMS_51
The amount of energy cost saved by the whole system when the agent has performed CES scheduling for 24 hours is defined as follows:
Figure SMS_52
when (when)
Figure SMS_53
The larger the scheduling savings, the larger the system awards the agent->
Figure SMS_54
When negative, the system gives a proxy penalty, < ->
Figure SMS_55
All are coefficients, and the strength of rewards and punishments is adjusted.
Preferably, after each LCES is trained locally for a fixed number of times, the final noise gradient is uploaded to the GS, and the structure is satisfied
Figure SMS_56
Noise gradient of->
Figure SMS_57
Is a privacy requirement;
original obtained by training LCES modelGradient of
Figure SMS_58
Need to restrict->
Figure SMS_59
The sensitivity of (2) is calculated as:
Figure SMS_60
wherein
Figure SMS_61
Is the gradient of LCES local training, < >>
Figure SMS_62
Is sensitivity, that is to say any two gradients +.>
Figure SMS_63
The method meets the following conditions:
Figure SMS_64
based on gradient after shearing
Figure SMS_65
Sensitivity->
Figure SMS_66
Each LCES locally generates Laplace noise +.>
Figure SMS_67
,/>
Figure SMS_68
The method meets the following conditions:
Figure SMS_69
wherein ,
Figure SMS_70
is noise->
Figure SMS_71
Is>
Figure SMS_72
And a dimension.
Preferably, the interactive gradient and model of the mutual iteration of the LCES and the GS are scheduled in a continuous state and action space, the PPO algorithm is applied to the learning process of the LCES agent, the PPO algorithm operates a plurality of epodes with a fixed strategy, the running track is reserved, and the rewards obtained by the LCES agent are products of the saved amount and the correlation coefficient when the whole epode is finished.
Preferably, the policy model of the LCES agent inputs the state at each moment, outputs the mean and variance of the continuous motion, samples the motion from the distribution determined by the mean and variance, constructs a noise gradient satisfying LDP definition, reports to the global GS, the global GS caches the received disturbance gradient, updates the GS model using these gradients when a certain number is reached, and broadcasts the updated model to all LCES.
Preferably, in the framework of the FRL, each LCES agent reports a satisfaction
Figure SMS_73
GS uses the noise gradient of LCES to update +.>
Figure SMS_74
Independent of any privacy information of LCES, GS will be updated after the next round>
Figure SMS_75
Broadcast to all LCES, which train in the local environment.
Preferably, it is provided with
Figure SMS_76
Is an original function, without noise, not conforming to the LDP definition, < >>
Figure SMS_77
Is in accordance with
Figure SMS_78
Defined function +_>
Figure SMS_79
,/>
Figure SMS_80
Is two different gradients, sensitivity is defined as:
Figure SMS_81
in the noise
Figure SMS_82
Obeys->
Figure SMS_83
Then a function satisfying the strict differential privacy definition is obtained>
Figure SMS_84
In the technical scheme, the invention has the technical effects and advantages that:
1. the CES scheduling is carried out based on the federal reinforcement learning framework, the whole algorithm operates in a layered distributed architecture, the local community scheduling agent aims at minimizing daily energy cost of communities, and the method does not need to share energy consumption data among communities, only needs to share disturbance model gradient, and protects privacy of community families.
2. Compared with a static CES scheduling method, the method provided by the invention has the advantages that the effectiveness of the scheduling method is proved by experiments, the federal learning method can be converged more quickly to reach an optimal solution, the agent can be trained in different environments, and meanwhile, aiming at different privacy requirements, the proposed method obtains different experimental effects, and the trade-off between the cost saving amount and the privacy protection degree is demonstrated.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a diagram of a community energy storage scheduling architecture of the present invention.
Fig. 2 is a diagram of a CES scheduling architecture based on FRL according to the present invention.
Fig. 3 is a block diagram of an FRL-based CES system of the present invention.
FIG. 4 is a schematic diagram of community energy requirements and ToU electricity prices according to the present invention.
Fig. 5 is a diagram of CES scheduling results for different communities of the present invention.
FIG. 6 is a plot of the impact of CES capacity size on community cost savings of the present invention.
FIG. 7 is a schematic diagram showing a comparison of reinforcement learning, federal reinforcement learning, methods of combining differential privacy and static allocation policies in different communities.
FIG. 8 is a graph of reinforcement learning and federal reinforcement learning training according to the present invention.
Fig. 9 is a schematic diagram showing comparison of model convergence speeds under different privacy protection forces according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, fig. 2 and fig. 3, the CES day-ahead scheduling method based on FRL according to the present embodiment is composed of N community energy storage systems LCES and a single global server GS, and the training process of federal reinforcement learning FRL includes two steps:
LCES trains and updates the local model, and uses noise disturbance to update gradient;
the GS sums the noise gradients of the N LCES to update the global model of the GS, and then broadcasts the latest GS model to the LCES; the local model and the global model are iteratively updated until a certain stopping requirement is met.
The reinforcement learning agent performs CES day-ahead scheduling using a near-end policy optimization (PPO) algorithm.
The agent has the task of reducing the total energy expenditure of the community as much as possible under the condition of meeting the energy requirements of families in the community.
CES scheduling is based on the federal reinforcement learning framework. The entire algorithm runs in a hierarchical distributed architecture, with the local community scheduling agent targeting minimizing the daily energy costs of the community. The method does not need to share energy consumption data among communities, only needs to share disturbance model gradient, and protects privacy of communities and families.
For CES agents, the given states include ToU electricity prices, community total energy demand on the day, CES total capacity, CES current capacity duty cycle, and current time of day.
CES agent calculates optimal charge and discharge schedules.
Because of CES capacity limitations, if the community total energy demand cannot be met at a certain time, then the user needs to purchase the balance of energy from the grid at that time.
The FRL mathematical model and algorithm-state and action space formulas, rewarding functions, LDP, CES scheduling algorithm based on FRL and reinforcement learning PPO algorithm.
The scheduling algorithm runs in a layered distributed architecture, the GS updates a global model by aggregating local model gradients, for LCES, the DRL agent is trained by using local data, and model gradients are reported to the GS, and the calculation of the optimal CES agent can be realized only by exchanging model gradients or model parameters between the GS and the LCES;
the combination of LDP into the FRL framework implements a privacy-preserving CES scheduling algorithm, and LCES will perturb the local model gradient using laplace noise before uploading the locally trained model gradient. Gradient aggregation of privacy protection is realized, and local environment privacy is protected;
compared with the independent DRL, the FRL has higher convergence rate, and meanwhile, by adjusting the LDP parameters, the optimal solution can be weighed between privacy protection and model precision.
Example 2
In this embodiment, the optimization target and constraint condition of the CES scheduling system are defined in a mathematical form, and the CES scheduling model based on deep reinforcement learning DRL and the CES scheduling model combined with local differential privacy LDP are described.
CES day-ahead scheduling requires users to reserve a day ahead and then arrange for the corresponding energy storage service in order to minimize the total energy expenditure of the overall system.
Due to the high cost of CES construction, long maintenance is required and energy storage resources cannot be fully utilized by a single household.
Therefore, the energy storage equipment is shared by a plurality of families in the community, the utilization rate of the energy storage equipment is improved, the initial construction cost and the long-term maintenance cost can be shared together, and the total energy consumption of the community is reduced as a whole.
To this end, we construct a target optimization model for community total energy consumption minimization, comprising:
1) An objective function.
The community total energy consumption minimization is defined as follows
Figure SMS_85
(1)
The goal of equation (1) is to minimize the total energy consumption of the community, including
Figure SMS_86
Time CES charge +.>
Figure SMS_87
Cost of (2)
Figure SMS_88
Part of the demands which cannot be met by the time CES ∈>
Figure SMS_89
Cost of (2) and CES service fee
Figure SMS_90
,/>
Figure SMS_91
Indicating the service charge required for CES unit charge.
wherein
Figure SMS_92
Is->
Figure SMS_94
Time ToU price of electricity, ->
Figure SMS_95
Is->
Figure SMS_96
Time CES charge,/->
Figure SMS_97
Is->
Figure SMS_98
The amount of discharge delivered by CES to the home in the community at time,/->
Figure SMS_99
Is->
Figure SMS_93
And (5) the total household demand in the community at any time.
2) Constraint conditions.
Figure SMS_100
(2),
Constraint
Figure SMS_101
: in consideration of CES charging efficiency ratio->
Figure SMS_102
And discharge efficiency ratio->
Figure SMS_103
Is a case of updating the state of charge,
Figure SMS_104
is CES remaining capacity at time t, +.>
Figure SMS_105
Representing CES total capacity.
Constraint II: ensuring a viable CES state, assuming an SOE of 0 at the initial time.
Constraint III and constraint IV: ensuring CES charge rate
Figure SMS_106
And discharge rate->
Figure SMS_107
And within a reasonable range, CES is prevented from being overcharged and discharged.
Constraint V: the balance of the total demands of communities is ensured, namely, the household electricity demand in communities can be completely met.
Figure SMS_108
(3),
Equation (3) constrains a reasonable range of parameters in the system,
Figure SMS_109
is the maximum timestamp, the present application considers the day-ahead schedule at the interval of hours, therefore +.>
Figure SMS_110
A. CES scheduling model based on DRL:
1) State space: for any arbitrary
Figure SMS_111
At time, the CES agent's state space is defined as follows:
Figure SMS_112
(4),
in the definition of the state space above, the state
Figure SMS_113
Is->
Figure SMS_114
The ratio of the remaining capacity of CES to the total capacity at the moment. />
Figure SMS_115
Representation->
Figure SMS_116
The state of the environment in which the CES agent is located at the moment.
In the prior art, only time-related dynamic variables are considered for the state space of the energy storage agent, but we find through experiments that the static factors related to energy storage are also used as the state input model network, so that the agent convergence speed can be accelerated.
The method is also direct, more relevant information is input into the model network, and the agent can be enabled to know the environment more comprehensively and carefully, so that an excellent decision can be made more quickly.
2) Action space: action space
Figure SMS_117
The charge and discharge coefficients including CES at different times are defined as follows:
Figure SMS_118
(5),
Figure SMS_120
indicating CES is +.>
Figure SMS_121
The moment of time is from the grid charging capacity, the value of which ranges from +.>
Figure SMS_123
Between, and->
Figure SMS_125
Moment of time from grid charge->
Figure SMS_126
The relation of->
Figure SMS_128
,/>
Figure SMS_129
Indicating CES is +.>
Figure SMS_119
The discharge coefficient given to the community at the moment is equal to +.>
Figure SMS_122
The relation of (2) is that
Figure SMS_124
,/>
Figure SMS_127
Representation->
Figure SMS_130
Time CES agent in the environment->
Figure SMS_131
Actions performed below.
3) Bonus function: the reward function R represents feedback obtained by the CES agent in the exploration of the environment S for guiding the agent to achieve a predetermined objective.
The setting of the reward function should include the reward for the agent performing the correct action, and the penalty for performing the false action resulting in the environment not meeting the CES device base constraint, and therefore the reward function is defined as follows:
Figure SMS_132
(6),
where constraint VII-constraint IX represents that when the action performed by CES exceeds the constraint in P (1), the system gives a penalty, and if within the constraint, a reward.
Constraint
Figure SMS_133
Is->
Figure SMS_134
The amount of energy cost saved by the whole system when the agent has performed CES scheduling for 24 hours is defined as follows:
Figure SMS_135
(7),
thus when
Figure SMS_136
The larger the current day, the larger the amount of the scheduling savings is, and the more rewards are given to the agent by the system. If->
Figure SMS_137
When negative, the system gives severe penalties to agents.
Figure SMS_138
All are coefficients, the dynamics of rewards and punishments are used for adjusting, and optimal rewards and punishment coefficients are adjusted through experimental results.
For a 24 hour day before schedule scenario, consider that the operation of the agent at each time may exceed the constraint of P (1), and that the total savings at the last time of day before schedule optimizes the execution actions of the agent.
4) PPO algorithm: the CES agent, after performing actions with a specified policy, optimizes the CES agent by increasing the probability of good actions and decreasing the probability of bad actions after the end of the session.
The PPO algorithm uses an importance sampling technology, the problem that samples in the strategy gradient algorithm can be used only once is solved, and the PPO algorithm uses a dominance function to replace a rewarding function, so that the model is more focused on average rewards brought by actions.
We mark the trajectory as
Figure SMS_139
Parameterized policy->
Figure SMS_140
, wherein />
Figure SMS_141
Is a parameter of the distribution approximation. The purpose of the PPO algorithm is to maximize the policy +.>
Figure SMS_142
Lower rewarding desire->
Figure SMS_143
The likelihood function is therefore as follows:
Figure SMS_144
(8),
wherein ,
Figure SMS_145
respectively represent policy->
Figure SMS_146
Under the probability of performing an action, +.>
Figure SMS_147
Is a privacy requirement, defines the clipping range and is related to sensitivity.
Figure SMS_148
Indicating that CES agent is in state->
Figure SMS_149
Execution of action down->
Figure SMS_150
Resulting in an average advantage.
B. CES scheduling model in combination with LDP:
LCES generates Laplace noise to perturb the local gradient before reporting the local gradient, preventing malicious parties from analyzing local privacy information from the gradient.
Thus, the local differential privacy provides a strict privacy guarantee for training results before the LCES reports training results, we assume that LCES uses a random function
Figure SMS_151
Disturbance training result, the value range of the random function is +.>
Figure SMS_152
Define the domain as->
Figure SMS_153
。/>
Definition 1: for any possible input
Figure SMS_154
And a subset of arbitrary outputs +.>
Figure SMS_155
The random function is +_if and only if the following inequality holds>
Figure SMS_156
Satisfy->
Figure SMS_157
Figure SMS_158
(9),
Definition 1 requires that in the random function, the output from the two approximate inputs is indistinguishable, i.e., for the approximate training results in LCES, the result is passed through the random function
Figure SMS_159
The resulting output is indistinguishable.
Definition 2: for arbitrary input
Figure SMS_160
Random function->
Figure SMS_161
The sensitivity of (c) is defined as follows:
Figure SMS_162
(10),
sensitivity defines the maximum variation of the random function as the input data set fluctuates
Figure SMS_163
The maximum change in output that occurs.
Laplace mechanism: the Laplace mechanism is a random mechanism that randomly samples from the Laplace distribution according to the sensitivity of the objective function, defined as:
Figure SMS_164
(11),
for the random function
Figure SMS_165
Arbitrary deterministic or random function as defined above>
Figure SMS_166
If->
Figure SMS_167
Satisfy->
Figure SMS_168
Then
Figure SMS_169
Also for arbitrary inputs->
Figure SMS_170
Satisfy->
Figure SMS_171
We set the GS to have a parameterized global model
Figure SMS_172
,/>
Figure SMS_173
Is->
Figure SMS_174
Dimension of (2);
CES proxy input during local training
Figure SMS_175
And acquiring the next action;
after a number of rounds, the agent updates the model with a loss function based on the historical trajectory information and the rewards obtained
Figure SMS_176
After multiple rounds of updating, the agent finds the final updated gradient and LCES computes a perturbed random gradient before reporting to GS;
desirably by a random function
Figure SMS_177
Is satisfied->
Figure SMS_178
Is a noise gradient of (c).
Definition 3 (meet
Figure SMS_179
Noise gradient of (d): for any local community scheduling system->
Figure SMS_180
Arbitrary two local gradients +.>
Figure SMS_181
And arbitrary random gradient subset->
Figure SMS_182
The following inequality must hold:
Figure SMS_183
(12),
wherein
Figure SMS_184
Is the noise gradient after disturbance, +.>
Figure SMS_185
Is the true gradient obtained by LCES local training.
For LCES reported noise gradients, the GS convergence averages the gradient and is then used to update the global model and share the latest GS model with all LCES.
We assume that each LCES uploads the final noise gradient to the GS after a fixed number of local training.
By the definition above we can construct a structure satisfying
Figure SMS_186
Is a noise gradient of (c).
Raw gradient training for LCES model
Figure SMS_187
First of all, a restriction is required>
Figure SMS_188
The sensitivity of (2) is calculated as: />
Figure SMS_189
(13),
wherein
Figure SMS_190
Is the gradient of LCES local training, < >>
Figure SMS_191
Is sensitivity, that is to say any two gradients +.>
Figure SMS_192
The method meets the following conditions:
Figure SMS_193
(14),
based on gradient after shearing
Figure SMS_194
Sensitivity->
Figure SMS_195
Each LCES can generate Laplace noise locally +.>
Figure SMS_196
,/>
Figure SMS_197
The method meets the following conditions:
Figure SMS_198
(15),
wherein ,
Figure SMS_199
is noise->
Figure SMS_200
Is>
Figure SMS_201
And a dimension.
Example 3
The present embodiment proposes a CES scheduling algorithm based on FRL, see algorithm one:
firstly, initializing related input parameters including energy requirements, toU price and CES related parameters at all moments of a community, and a GS reinforcement learning model
Figure SMS_202
Dimension->
Figure SMS_203
And broadcast toAll LCES, clipping parameters->
Figure SMS_204
Local privacy requirement->
Figure SMS_205
Then starting the loop, the maximum loop number being the maximum communication number, starting the calculation for all LCES, iterating from epoode=0 to the maximum update number of LCES,
according to the strategy
Figure SMS_206
Run 96 time stamps and record policy track +.>
Figure SMS_207
Calculating the dominance function of each state +.>
Figure SMS_208
Calculating a loss function:
Figure SMS_209
then updating the LCES reinforcement learning model by using an Adamw optimizer to calculate model gradient
Figure SMS_210
And a disturbance gradient->
Figure SMS_211
And the noise gradient after disturbance is +.>
Figure SMS_212
Reported to the GS, which can buffer all received noise gradients, and if the GS buffer is full, calculate the mean +.>
Figure SMS_213
And update the global model +.>
Figure SMS_214
Finally, the buffer memory is emptied,and outputting a result to finish the algorithm.
The algorithm runs in a distributed fashion, with interactive gradients and models of LCES and GS iterating over each other. The LCES agents are scheduled in a continuous state and action space, we apply the PPO algorithm to the LCES agent's learning process.
The PPO algorithm runs multiple epodes with a fixed policy, keeping the running trace, which we set to 96 timestamps in this application. Then, according to the existing track, the probability of the action with large average rewards is increased, and the probability of the action with small average rewards is reduced.
The reward earned by the LCES agent in the present system is the product of the saved amount and the correlation coefficient when the entire epoode is finished.
The policy model of the LCES agent inputs the state of each moment, outputs the mean and variance of the continuous action, and samples the action from the distribution determined by the mean and variance.
This allows the LCES agent to try to all possible possibilities of the action space, avoiding trapping in the extremum region.
After the local training is completed, LCES follows algorithm two:
to calculate
Figure SMS_215
Noise gradient->
Figure SMS_218
Firstly, the corresponding parameters are input, including the original gradient +.>
Figure SMS_220
Dimension->
Figure SMS_221
Privacy requirement->
Figure SMS_222
Cutting range->
Figure SMS_223
According to equation (13), it is possible to rely on the original gradient +.>
Figure SMS_224
Calculating shear gradient->
Figure SMS_216
Then, the cyclic calculation is performed a plurality of times according to the same formula (15) until the cyclic number reaches d, and then noise +.>
Figure SMS_217
Finally return the result->
Figure SMS_219
The algorithm is completed.
Constructing noise gradients meeting LDP definition, reporting the noise gradients to a global GS, caching the received disturbance gradients by the global GS, updating a GS model by using the gradients after a certain number of the received disturbance gradients are reached, and broadcasting the updated model to all LCES.
In the FRL framework-based algorithm one described in the application, all LCES meet the following requirements
Figure SMS_225
Within the framework of FRL, each LCES agent reports one satisfaction
Figure SMS_226
Is updated by the noise gradient of LCES alone>
Figure SMS_227
This step is independent of any privacy information of the LCES;
and update the model to violate
Figure SMS_228
After the next round, GS will be updated +.>
Figure SMS_229
Broadcast to all LCES, LCES trains in the local environment, and the local learning process is independent of all other agents, so that other agents are not violated either
Figure SMS_230
And (5) defining.
Gradient satisfaction of Laplace noise disturbance
Figure SMS_231
And (5) defining.
Assume that
Figure SMS_232
Is an original function, without noise, not conforming to the LDP definition, < >>
Figure SMS_233
Is in accordance with->
Figure SMS_234
Defined functions, i.e.)>
Figure SMS_235
,/>
Figure SMS_236
Is two different gradients. Sensitivity is defined as see equation (10), privacy budget is +.>
Figure SMS_237
It is possible to obtain:
Figure SMS_238
(16),
i.e. the probability of the random function outputting a specified value, equal to the probability distribution of the associated noise, we let us say that
Figure SMS_239
Obeys->
Figure SMS_240
Then a function satisfying the strict differential privacy definition can be obtained +.>
Figure SMS_241
The following formula is obtained:
Figure SMS_242
(17),
at this time if function
Figure SMS_243
A scalar is output of (1), then:
Figure SMS_244
Figure SMS_245
(18),
the above formula shows a gradient
Figure SMS_246
Obtaining a specified result via a noise function>
Figure SMS_247
Is the same to obtain the gradient +.>
Figure SMS_248
And->
Figure SMS_249
Is defined by the probability formula:
Figure SMS_250
,/>
in comparison, the two can be obtained:
Figure SMS_251
thus, the gradient of the disturbance satisfies
Figure SMS_252
And (5) defining.
Example 4
In this embodiment, the relevant work is verified using the authentic data. Consider three communities of different CES specifications, as shown in table 1:
Figure SMS_253
in the table 1 of the description,
energy demand and ToU electricity prices for each community as shown in fig. 4, we assume that 50 iterations of LCES training followed by one round of communication with GS, the experiment was run on the Ubuntu system using python3.9 and pytorch1.12.1.
The scheduling effect of the proposed method is evaluated first. CES scheduling service scenario showing 3 communities is shown in fig. 5.
Each community can discharge by using CES equipment in the peak period of electricity price to realize energy arbitrage, and as can be seen from fig. 5, the main energy requirements of the community come from the power grid in the low peak period of electricity price.
Because the initial CES has no stored energy, CES is charging the reserve energy before the peak period, starting at 0 a day.
When the time comes to the electricity price peak period, the main energy consumption of the community is provided by CES, and if the CES cannot completely meet the demands of the community family at certain moments, the community family can supplement the balance demands from the power grid.
As shown in fig. 6, it can be observed that when CES capacity is small, there is a significant increase in the cost savings amount of the community as CES capacity increases, but the cost savings amount after CES capacity exceeds some upper limit is not significant, and even does not increase any more, and for community two, CES maximum capacity threshold is between 70 and 80kWh, so our method can also combine user history data to predict community optimal CES capacity.
In fig. 7, we compare the cost savings of four different scheduling methods, namely reinforcement learning, federal reinforcement learning, methods of combining differential privacy, and static allocation strategies as proposed in this application.
In the static allocation strategy, community shared energy storage capacity is divided into different community users, and the users independently operate own energy storage capacity.
When privacy concerns are not considered, reinforcement learning and federal reinforcement perform better than static allocation strategies.
And dynamic battery allocation strategies are always preferred over static strategies because static allocation cannot reuse CES capacity and cannot achieve optimal CES scheduling solutions.
As can be seen from fig. 8, the federal reinforcement not only improves the model performance but also increases the model convergence rate.
This is because agents in federal enforcement can learn knowledge from more environments.
When privacy is considered, CES agents sacrifice some performance in exchange for privacy protection, indicating a tradeoff between privacy and utility.
At the same time we can see that even with privacy protection in mind, the proposed method still performs better than the static allocation strategy.
Fig. 9 shows a comparison of model convergence rates at different privacy protections, as can be seen,
Figure SMS_254
the represented solid line is better than +.>
Figure SMS_255
Is shown in the figure).
This is because
Figure SMS_256
The larger the noise is, the smaller the privacy protection on the gradient is, but the more excellent model performance and faster convergence speed can be obtained.
The difference between the final convergence of the two is not great as the model is trained, which also means that the model can learn the correct knowledge even under the stricter privacy requirements.
This is because adding noise to the model is also a method to prevent model overfitting and can improve the reasoning ability of the model.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are all or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" in this application is merely an association relationship describing the associated object, and indicates that three relationships may exist, for example, a and/or B may indicate: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" in this application generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. The CES day-ahead scheduling method based on FRL is characterized in that: including a plurality of community energy storage systems LCES and a single global server GS;
the training process of the FRL comprises the following steps:
LCES trains and updates the local model, and uses noise disturbance to update gradient;
the GS sums the noise gradients of a plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES;
the local model and the global model are iteratively updated to meet the stopping requirement, and training is completed;
the FRL operates in a hierarchical distributed architecture, the GS updates a global model by aggregating local model gradients, the LCES trains a DRL agent by using local data, and reports model gradients to the GS, and only model gradients or model parameters are exchanged between the GS and the LCES to realize the calculation of the CES agent;
the CES building a target optimization model for minimizing community total energy consumption comprises the following steps:
objective function: the community total energy consumption minimization is defined as:
Figure QLYQS_1
wherein comprises
Figure QLYQS_2
Time CES charge +.>
Figure QLYQS_3
Cost of->
Figure QLYQS_4
Part of the demand that cannot be satisfied by CES at the moment
Figure QLYQS_5
Cost of CES service +.>
Figure QLYQS_6
,/>
Figure QLYQS_7
A service charge indicating a CES unit charge amount; t is the maximum timestamp, scheduled before the day at the interval of hours, then t=24;
wherein
Figure QLYQS_9
Is->
Figure QLYQS_10
Time ToU price of electricity, ->
Figure QLYQS_11
Is->
Figure QLYQS_12
Time CES charge,/->
Figure QLYQS_13
Is->
Figure QLYQS_14
The amount of discharge delivered by CES to the home in the community at time,/->
Figure QLYQS_15
Is->
Figure QLYQS_8
The total household demand in the community at any moment;
constraint conditions:
Figure QLYQS_16
Figure QLYQS_17
Figure QLYQS_18
Figure QLYQS_19
Figure QLYQS_20
constraint
Figure QLYQS_21
: consider CES charging efficiency ratio->
Figure QLYQS_22
And discharge efficiency ratio->
Figure QLYQS_23
In case of updating state of charge, +.>
Figure QLYQS_24
Is->
Figure QLYQS_25
Time CES remaining capacity, ++>
Figure QLYQS_26
Representing CES total capacity;
constraint II: constraining the CES state, and setting SOE of initial time to 0;
constraint III and constraint IV: constraining CES charge rate
Figure QLYQS_27
And discharge rate->
Figure QLYQS_28
Within a reasonable range, CES is prevented from being excessively charged and discharged;
constraint V: the balance of the total demands of communities is ensured.
2. The FRL-based CES day-ahead scheduling method of claim 1, further comprising: constraint III and constraint IV are defined by the following reasonable range of constraint parameters:
Figure QLYQS_29
Figure QLYQS_30
is the maximum timestamp, schedule before day at the interval of hours, +.>
Figure QLYQS_31
。/>
3. The FRL-based CES day-ahead scheduling method of claim 2, further comprising: for any arbitrary
Figure QLYQS_32
The state space of the CES agent is defined at time instant as:
Figure QLYQS_33
in the state of
Figure QLYQS_34
Is->
Figure QLYQS_35
The ratio of the remaining capacity of CES to the total capacity at time,/->
Figure QLYQS_36
,/>
Figure QLYQS_37
Representation->
Figure QLYQS_38
The state of the environment where the CES agent is located at the moment, the static factors of energy storage are used as the states to be input into the model network, and the action space is +.>
Figure QLYQS_39
The charge and discharge coefficients including CES at different times are defined as:
Figure QLYQS_40
in the formula ,
Figure QLYQS_42
indicating CES is +.>
Figure QLYQS_44
The moment of time is from the grid charging capacity, the value of which ranges from +.>
Figure QLYQS_46
Between, and->
Figure QLYQS_48
Moment of time from grid charge->
Figure QLYQS_50
The relation of->
Figure QLYQS_52
,/>
Figure QLYQS_53
Indicating CES is +.>
Figure QLYQS_41
The discharge coefficient given to the community at the moment is equal to +.>
Figure QLYQS_43
The relation of->
Figure QLYQS_45
,/>
Figure QLYQS_47
Representation->
Figure QLYQS_49
Time CES agent in the environment->
Figure QLYQS_51
Action performed down;
the reward function R represents feedback obtained by the CES agent in the exploration of the environment, for guiding the agent to achieve a predetermined objective, the reward function comprising a reward for the agent to perform a correct action, and a penalty for performing a false action resulting in the environment not meeting the CES device basic constraints, defined as:
Figure QLYQS_54
constraint
Figure QLYQS_55
Is->
Figure QLYQS_56
The amount of energy cost saved by the whole system when the agent has performed CES scheduling for 24 hours is defined as follows:
Figure QLYQS_57
when (when)
Figure QLYQS_58
The larger the scheduling savings, the larger the system awards the agent->
Figure QLYQS_59
When negative, the system gives a proxy penalty,
Figure QLYQS_60
all are coefficients, and the strength of rewards and punishments is adjusted.
4. The FRL-based CES day-ahead scheduling method of claim 3, further comprising: after each LCES is trained locally for a fixed number of times, the final noise gradient is uploaded to the GS, and the structure meets the requirement
Figure QLYQS_61
Noise gradient of->
Figure QLYQS_62
Is a privacy requirement;
original gradient obtained by LCES model training
Figure QLYQS_63
Need to restrict->
Figure QLYQS_64
The sensitivity of (2) is calculated as:
Figure QLYQS_65
wherein
Figure QLYQS_66
Is the gradient of LCES local training, < >>
Figure QLYQS_67
Is sensitivity, that is to say any two gradients +.>
Figure QLYQS_68
The method meets the following conditions:
Figure QLYQS_69
based on gradient after shearing
Figure QLYQS_70
Sensitivity->
Figure QLYQS_71
Each LCES locally generates Laplace noise +.>
Figure QLYQS_72
,/>
Figure QLYQS_73
The method meets the following conditions: />
Figure QLYQS_74
wherein ,
Figure QLYQS_75
is noise->
Figure QLYQS_76
Is>
Figure QLYQS_77
And a dimension.
5. The FRL-based CES day-ahead scheduling method of claim 4, further comprising: the LCES and GS mutually iterate interactive gradient and model, the LCES agent is scheduled in continuous state and action space, the PPO algorithm is applied to the learning process of the LCES agent, the PPO algorithm runs a plurality of epodes with fixed strategy, the running track is reserved, and the rewards obtained by the LCES agent are the products of the saved amount and the related coefficients when the whole epode is finished.
6. The FRL-based CES day-ahead scheduling method of claim 5, further comprising: the strategy model of the LCES agent inputs the state of each moment, outputs the mean value and the variance of continuous motion, samples the motion from the distribution determined by the mean value and the variance, constructs noise gradients meeting LDP definition by LCES, reports the noise gradients to the global GS, the global GS caches the received disturbance gradients, updates the GS model by using the gradients after a certain number of the received disturbance gradients is reached, and broadcasts the updated model to all LCES.
7. The FRL-based CES day-ahead scheduling method of claim 6, further comprising: in the frame of the FRLEach LCES agent reports one of the satisfaction
Figure QLYQS_78
GS uses the noise gradient of LCES to update +.>
Figure QLYQS_79
Independent of any privacy information of LCES, GS will be updated after the next round>
Figure QLYQS_80
Broadcast to all LCES, which train in the local environment.
8. The FRL-based CES day-ahead scheduling method of any of claims 1-7, characterized by: is provided with
Figure QLYQS_81
Is an original function, without noise, not conforming to the LDP definition, < >>
Figure QLYQS_82
Is in accordance with->
Figure QLYQS_83
The function of the definition is such that,
Figure QLYQS_84
,/>
Figure QLYQS_85
is two different gradients, sensitivity is defined as:
Figure QLYQS_86
Figure QLYQS_87
in the noise
Figure QLYQS_88
Obeys->
Figure QLYQS_89
Then a function satisfying the strict differential privacy definition is obtained>
Figure QLYQS_90
。/>
CN202310191179.2A 2023-03-02 2023-03-02 CES day-ahead scheduling method based on FRL Active CN115860789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310191179.2A CN115860789B (en) 2023-03-02 2023-03-02 CES day-ahead scheduling method based on FRL

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310191179.2A CN115860789B (en) 2023-03-02 2023-03-02 CES day-ahead scheduling method based on FRL

Publications (2)

Publication Number Publication Date
CN115860789A CN115860789A (en) 2023-03-28
CN115860789B true CN115860789B (en) 2023-05-30

Family

ID=85659704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310191179.2A Active CN115860789B (en) 2023-03-02 2023-03-02 CES day-ahead scheduling method based on FRL

Country Status (1)

Country Link
CN (1) CN115860789B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115310121A (en) * 2022-07-12 2022-11-08 华中农业大学 Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089910A1 (en) * 2019-09-25 2021-03-25 Deepmind Technologies Limited Reinforcement learning using meta-learned intrinsic rewards
CN111611610B (en) * 2020-04-12 2023-05-30 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal
CN112214788B (en) * 2020-08-28 2023-07-25 国网江西省电力有限公司信息通信分公司 Ubiquitous power Internet of things dynamic data publishing method based on differential privacy
CN112818394A (en) * 2021-01-29 2021-05-18 西安交通大学 Self-adaptive asynchronous federal learning method with local privacy protection
CN113221183B (en) * 2021-06-11 2022-09-16 支付宝(杭州)信息技术有限公司 Method, device and system for realizing privacy protection of multi-party collaborative update model
CN113591145B (en) * 2021-07-28 2024-02-23 西安电子科技大学 Federal learning global model training method based on differential privacy and quantization
CN113570155A (en) * 2021-08-13 2021-10-29 常州工程职业技术学院 Multi-community energy cooperation game management model based on energy storage device and cheating behavior
CN114330743A (en) * 2021-12-24 2022-04-12 浙江大学 Cross-equipment federal learning method for minimum-maximum problem
CN115511054A (en) * 2022-09-27 2022-12-23 中国科学技术大学 Cost perception privacy protection federal learning method facing unbalanced data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115310121A (en) * 2022-07-12 2022-11-08 华中农业大学 Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
跨音速极限环型颤振的高效数值分析方法;张伟伟;王博斌;叶正寅;;力学学报(06);31-41 *

Also Published As

Publication number Publication date
CN115860789A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
Chiş et al. Reinforcement learning-based plug-in electric vehicle charging with forecasted price
Ciupageanu et al. Real-time stochastic power management strategies in hybrid renewable energy systems: A review of key applications and perspectives
CN113610303B (en) Load prediction method and system
Kusiak et al. A data-driven approach for steam load prediction in buildings
Chen Two-level hierarchical approach to unit commitment using expert system and elite PSO
Leterme et al. A flexible stochastic optimization method for wind power balancing with PHEVs
Liu et al. Optimal reserve management of electric vehicle aggregator: Discrete bilevel optimization model and exact algorithm
Ghanbarzadeh et al. Reliability constrained unit commitment with electric vehicle to grid using hybrid particle swarm optimization and ant colony optimization
James et al. Optimal V2G scheduling of electric vehicles and unit commitment using chemical reaction optimization
Zhou et al. LSTM-based energy management for electric vehicle charging in commercial-building prosumers
CN111200293A (en) Battery loss and distributed power grid battery energy storage day-ahead random scheduling method
Cao et al. Energy management optimisation using a combined Long Short-Term Memory recurrent neural network–Particle Swarm Optimisation model
Hosking et al. Short‐term forecasting of the daily load curve for residential electricity usage in the Smart Grid
Kong et al. Refined peak shaving potential assessment and differentiated decision-making method for user load in virtual power plants
Porras et al. An efficient robust approach to the day-ahead operation of an aggregator of electric vehicles
Ampatzis et al. Robust optimisation for deciding on real‐time flexibility of storage‐integrated photovoltaic units controlled by intelligent software agents
Chu et al. A multiagent federated reinforcement learning approach for plug-in electric vehicle fleet charging coordination in a residential community
CN116227806A (en) Model-free reinforcement learning method based on energy demand response management
Haque et al. Stochastic methods for prediction of charging and discharging power of electric vehicles in vehicle‐to‐grid environment
Zhang et al. Data augmentation strategy for small sample short‐term load forecasting of distribution transformer
Nammouchi et al. Robust opportunistic optimal energy management of a mixed microgrid under asymmetrical uncertainties
Liu et al. Reinforcement learning-based energy trading and management of regional interconnected microgrids
Xuemei et al. A novel air-conditioning load prediction based on ARIMA and BPNN model
Gulotta et al. Short-term uncertainty in the dispatch of energy resources for VPP: A novel rolling horizon model based on stochastic programming
CN115860789B (en) CES day-ahead scheduling method based on FRL

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant