CN115860789B

CN115860789B - CES day-ahead scheduling method based on FRL

Info

Publication number: CN115860789B
Application number: CN202310191179.2A
Authority: CN
Inventors: 邱日轩; 肖子洋; 李帆; 郑锦坤; 余腾龙; 陈明亮; 井思桐; 吴灵芝
Original assignee: State Grid Corp of China SGCC; Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority date: 2023-03-02
Filing date: 2023-03-02
Publication date: 2023-05-30
Anticipated expiration: 2043-03-02
Also published as: CN115860789A

Abstract

The invention discloses a CES day-ahead scheduling method based on FRL, which comprises a plurality of community energy storage systems LCES and a single global server GS; the training process of the FRL comprises the following steps: LCES trains and updates the local model, and uses noise disturbance to update gradient; the GS sums the noise gradients of a plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES; and (5) iteratively updating the local model and the global model, meeting the stopping requirement, and completing training. The CES scheduling is carried out based on the federal reinforcement learning framework, the whole algorithm operates in a layered distributed architecture, the local community scheduling agent aims at minimizing daily energy cost of communities, and the method does not need to share energy consumption data among communities, only needs to share disturbance model gradient, and protects privacy of community families.

Description

CES day-ahead scheduling method based on FRL

Technical Field

The invention relates to the technical field of energy storage scheduling, in particular to a CES day-ahead scheduling method based on FRL.

Background

The household sharing high-capacity energy storage equipment in communities can realize space-time transfer and energy arbitrage of household demands under a time-of-use electricity price plan, energy Storage (ES) is an important component of a novel power system, randomness and fluctuation of renewable energy sources can be relieved, under a time-of-use electricity price (ToU) plan, ES can also realize energy arbitrage by storing energy in off-peak periods and releasing energy in peak periods, and with the development of the era, a community sharing energy storage system (CES) appears, however, the traditional scheduling method cannot meet the dynamically-changed household demands, and the energy storage scheduling needs household detailed energy consumption data, so that the privacy problem is related.

Disclosure of Invention

The invention aims to provide a CES (community shared energy storage system) day-ahead scheduling method based on FRL (federal reinforcement learning) to solve the defects in the background technology.

In order to achieve the above object, the present invention provides the following technical solutions: the CES day-ahead scheduling method based on FRL comprises a plurality of community energy storage systems LCES and a single global server GS;

the training process of the FRL comprises the following steps:

LCES trains and updates the local model, and uses noise disturbance to update gradient;

the GS sums the noise gradients of a plurality of LCES, updates the global model of the GS, and broadcasts the latest GS model to the LCES;

and (5) iteratively updating the local model and the global model, meeting the stopping requirement, and completing training.

Preferably, the FRL operates in a hierarchical distributed architecture, the GS updates the global model by aggregating local model gradients, the LCES trains the DRL agent using local data, and model gradients are reported to the GS, only model gradients or model parameters are exchanged between the GS and LCES to enable computation of CES agent.

Preferably, the CES builds a target optimization model for minimizing total energy consumption of the community, including:

objective function: the community total energy consumption minimization is defined as:

，

wherein comprises

Time CES charge +.>

Cost of->

Part of the demand that cannot be satisfied by CES at the moment

Cost of CES service +.>

，/>

A service charge indicating a CES unit charge amount;

wherein

Is->

Time ToU price of electricity, ->

Is->

Time CES charge,/->

Is->

The amount of discharge delivered by CES to the home in the community at time,/->

Is->

The total household demand in the community at any moment;

constraint conditions:

，

constraint

: consider CES charging efficiency ratio->

And discharge efficiency ratio->

In case of updating state of charge, +.>

Is->

Time CES remaining capacity, ++>

Representing CES total capacity;

constraint II: constraining the CES state, and setting SOE of initial time to 0;

constraint III and constraint IV: constraining CES charge rate

And discharge rate->

Within a reasonable range, CES is prevented from being excessively charged and discharged;

constraint V: the balance of the total demands of communities is ensured.

Preferably, constraint III and constraint IV are defined by reasonable ranges of constraint parameters of the formula:

，

is the maximum timestamp, schedule before day at the interval of hours, +.>

。

Preferably, for any of

The state space of the CES agent is defined at time instant as:

，

in the state of

Is->

The ratio of the remaining capacity of CES to the total capacity at time,/->

Representation->

The state of the environment where the CES agent is located at the moment, the static factors of energy storage are used as the states to be input into the model network, and the action space is +.>

Including CES atThe charge and discharge coefficients at different times are defined as:

，

in the formula ,

indicating CES is +.>

The moment of time is from the grid charging capacity, the value of which ranges from +.>

Between, and->

Moment of time from grid charge->

The relation of->

，/>

Indicating CES is +.>

The discharge coefficient given to the community at the moment is equal to +.>

The relation of (2) is that

，/>

Representation->

Time CES agent in the environment->

Under execution ofAction;

the reward function R represents feedback obtained by the CES agent in the exploration of the environment, for guiding the agent to achieve a predetermined objective, the reward function comprising a reward for the agent to perform a correct action, and a penalty for performing a false action resulting in the environment not meeting the CES device basic constraints, defined as:

，

constraint

Is->

The amount of energy cost saved by the whole system when the agent has performed CES scheduling for 24 hours is defined as follows:

，

when (when)

The larger the scheduling savings, the larger the system awards the agent->

When negative, the system gives a proxy penalty, < ->

All are coefficients, and the strength of rewards and punishments is adjusted.

Preferably, after each LCES is trained locally for a fixed number of times, the final noise gradient is uploaded to the GS, and the structure is satisfied

Noise gradient of->

Is a privacy requirement;

original obtained by training LCES modelGradient of

Need to restrict->

The sensitivity of (2) is calculated as:

，

wherein

Is the gradient of LCES local training, < >>

Is sensitivity, that is to say any two gradients +.>

The method meets the following conditions:

，

based on gradient after shearing

Sensitivity->

Each LCES locally generates Laplace noise +.>

，/>

The method meets the following conditions:

，

wherein ,

is noise->

Is>

And a dimension.

Preferably, the interactive gradient and model of the mutual iteration of the LCES and the GS are scheduled in a continuous state and action space, the PPO algorithm is applied to the learning process of the LCES agent, the PPO algorithm operates a plurality of epodes with a fixed strategy, the running track is reserved, and the rewards obtained by the LCES agent are products of the saved amount and the correlation coefficient when the whole epode is finished.

Preferably, the policy model of the LCES agent inputs the state at each moment, outputs the mean and variance of the continuous motion, samples the motion from the distribution determined by the mean and variance, constructs a noise gradient satisfying LDP definition, reports to the global GS, the global GS caches the received disturbance gradient, updates the GS model using these gradients when a certain number is reached, and broadcasts the updated model to all LCES.

Preferably, in the framework of the FRL, each LCES agent reports a satisfaction

GS uses the noise gradient of LCES to update +.>

Independent of any privacy information of LCES, GS will be updated after the next round>

Broadcast to all LCES, which train in the local environment.

Preferably, it is provided with

Is an original function, without noise, not conforming to the LDP definition, < >>

Is in accordance with

Defined function +_>

，/>

Is two different gradients, sensitivity is defined as:

，

in the noise

Obeys->

Then a function satisfying the strict differential privacy definition is obtained>

。

In the technical scheme, the invention has the technical effects and advantages that:

1. the CES scheduling is carried out based on the federal reinforcement learning framework, the whole algorithm operates in a layered distributed architecture, the local community scheduling agent aims at minimizing daily energy cost of communities, and the method does not need to share energy consumption data among communities, only needs to share disturbance model gradient, and protects privacy of community families.

2. Compared with a static CES scheduling method, the method provided by the invention has the advantages that the effectiveness of the scheduling method is proved by experiments, the federal learning method can be converged more quickly to reach an optimal solution, the agent can be trained in different environments, and meanwhile, aiming at different privacy requirements, the proposed method obtains different experimental effects, and the trade-off between the cost saving amount and the privacy protection degree is demonstrated.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

FIG. 1 is a diagram of a community energy storage scheduling architecture of the present invention.

Fig. 2 is a diagram of a CES scheduling architecture based on FRL according to the present invention.

Fig. 3 is a block diagram of an FRL-based CES system of the present invention.

FIG. 4 is a schematic diagram of community energy requirements and ToU electricity prices according to the present invention.

Fig. 5 is a diagram of CES scheduling results for different communities of the present invention.

FIG. 6 is a plot of the impact of CES capacity size on community cost savings of the present invention.

FIG. 7 is a schematic diagram showing a comparison of reinforcement learning, federal reinforcement learning, methods of combining differential privacy and static allocation policies in different communities.

FIG. 8 is a graph of reinforcement learning and federal reinforcement learning training according to the present invention.

Fig. 9 is a schematic diagram showing comparison of model convergence speeds under different privacy protection forces according to the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

Referring to fig. 1, fig. 2 and fig. 3, the CES day-ahead scheduling method based on FRL according to the present embodiment is composed of N community energy storage systems LCES and a single global server GS, and the training process of federal reinforcement learning FRL includes two steps:

the GS sums the noise gradients of the N LCES to update the global model of the GS, and then broadcasts the latest GS model to the LCES; the local model and the global model are iteratively updated until a certain stopping requirement is met.

The reinforcement learning agent performs CES day-ahead scheduling using a near-end policy optimization (PPO) algorithm.

The agent has the task of reducing the total energy expenditure of the community as much as possible under the condition of meeting the energy requirements of families in the community.

CES scheduling is based on the federal reinforcement learning framework. The entire algorithm runs in a hierarchical distributed architecture, with the local community scheduling agent targeting minimizing the daily energy costs of the community. The method does not need to share energy consumption data among communities, only needs to share disturbance model gradient, and protects privacy of communities and families.

For CES agents, the given states include ToU electricity prices, community total energy demand on the day, CES total capacity, CES current capacity duty cycle, and current time of day.

CES agent calculates optimal charge and discharge schedules.

Because of CES capacity limitations, if the community total energy demand cannot be met at a certain time, then the user needs to purchase the balance of energy from the grid at that time.

The FRL mathematical model and algorithm-state and action space formulas, rewarding functions, LDP, CES scheduling algorithm based on FRL and reinforcement learning PPO algorithm.

The scheduling algorithm runs in a layered distributed architecture, the GS updates a global model by aggregating local model gradients, for LCES, the DRL agent is trained by using local data, and model gradients are reported to the GS, and the calculation of the optimal CES agent can be realized only by exchanging model gradients or model parameters between the GS and the LCES;

the combination of LDP into the FRL framework implements a privacy-preserving CES scheduling algorithm, and LCES will perturb the local model gradient using laplace noise before uploading the locally trained model gradient. Gradient aggregation of privacy protection is realized, and local environment privacy is protected;

compared with the independent DRL, the FRL has higher convergence rate, and meanwhile, by adjusting the LDP parameters, the optimal solution can be weighed between privacy protection and model precision.

Example 2

In this embodiment, the optimization target and constraint condition of the CES scheduling system are defined in a mathematical form, and the CES scheduling model based on deep reinforcement learning DRL and the CES scheduling model combined with local differential privacy LDP are described.

CES day-ahead scheduling requires users to reserve a day ahead and then arrange for the corresponding energy storage service in order to minimize the total energy expenditure of the overall system.

Due to the high cost of CES construction, long maintenance is required and energy storage resources cannot be fully utilized by a single household.

Therefore, the energy storage equipment is shared by a plurality of families in the community, the utilization rate of the energy storage equipment is improved, the initial construction cost and the long-term maintenance cost can be shared together, and the total energy consumption of the community is reduced as a whole.

To this end, we construct a target optimization model for community total energy consumption minimization, comprising:

1) An objective function.

The community total energy consumption minimization is defined as follows

（1）

The goal of equation (1) is to minimize the total energy consumption of the community, including

Time CES charge +.>

Cost of (2)

Part of the demands which cannot be met by the time CES ∈>

Cost of (2) and CES service fee

，/>

Indicating the service charge required for CES unit charge.

wherein

Is->

Time ToU price of electricity, ->

Is->

Time CES charge,/->

Is->

Is->

And (5) the total household demand in the community at any time.

2) Constraint conditions.

（2），

Constraint

: in consideration of CES charging efficiency ratio->

And discharge efficiency ratio->

Is a case of updating the state of charge,

is CES remaining capacity at time t, +.>

Representing CES total capacity.

Constraint II: ensuring a viable CES state, assuming an SOE of 0 at the initial time.

Constraint III and constraint IV: ensuring CES charge rate

And discharge rate->

And within a reasonable range, CES is prevented from being overcharged and discharged.

Constraint V: the balance of the total demands of communities is ensured, namely, the household electricity demand in communities can be completely met.

（3），

Equation (3) constrains a reasonable range of parameters in the system,

is the maximum timestamp, the present application considers the day-ahead schedule at the interval of hours, therefore +.>

。

A. CES scheduling model based on DRL:

1) State space: for any arbitrary

At time, the CES agent's state space is defined as follows:

（4），

in the definition of the state space above, the state

Is->

The ratio of the remaining capacity of CES to the total capacity at the moment. />

Representation->

The state of the environment in which the CES agent is located at the moment.

In the prior art, only time-related dynamic variables are considered for the state space of the energy storage agent, but we find through experiments that the static factors related to energy storage are also used as the state input model network, so that the agent convergence speed can be accelerated.

The method is also direct, more relevant information is input into the model network, and the agent can be enabled to know the environment more comprehensively and carefully, so that an excellent decision can be made more quickly.

2) Action space: action space

The charge and discharge coefficients including CES at different times are defined as follows:

（5），

indicating CES is +.>

Between, and->

Moment of time from grid charge->

The relation of->

，/>

Indicating CES is +.>

The discharge coefficient given to the community at the moment is equal to +.>

The relation of (2) is that

，/>

Representation->

Time CES agent in the environment->

Actions performed below.

3) Bonus function: the reward function R represents feedback obtained by the CES agent in the exploration of the environment S for guiding the agent to achieve a predetermined objective.

The setting of the reward function should include the reward for the agent performing the correct action, and the penalty for performing the false action resulting in the environment not meeting the CES device base constraint, and therefore the reward function is defined as follows:

（6），

where constraint VII-constraint IX represents that when the action performed by CES exceeds the constraint in P (1), the system gives a penalty, and if within the constraint, a reward.

Constraint

Is->

（7），

thus when

The larger the current day, the larger the amount of the scheduling savings is, and the more rewards are given to the agent by the system. If->

When negative, the system gives severe penalties to agents.

All are coefficients, the dynamics of rewards and punishments are used for adjusting, and optimal rewards and punishment coefficients are adjusted through experimental results.

For a 24 hour day before schedule scenario, consider that the operation of the agent at each time may exceed the constraint of P (1), and that the total savings at the last time of day before schedule optimizes the execution actions of the agent.

4) PPO algorithm: the CES agent, after performing actions with a specified policy, optimizes the CES agent by increasing the probability of good actions and decreasing the probability of bad actions after the end of the session.

The PPO algorithm uses an importance sampling technology, the problem that samples in the strategy gradient algorithm can be used only once is solved, and the PPO algorithm uses a dominance function to replace a rewarding function, so that the model is more focused on average rewards brought by actions.

We mark the trajectory as

Parameterized policy->

, wherein />

Is a parameter of the distribution approximation. The purpose of the PPO algorithm is to maximize the policy +.>

Lower rewarding desire->

The likelihood function is therefore as follows:

（8），

wherein ,

respectively represent policy->

Under the probability of performing an action, +.>

Is a privacy requirement, defines the clipping range and is related to sensitivity.

Indicating that CES agent is in state->

Execution of action down->

Resulting in an average advantage.

B. CES scheduling model in combination with LDP:

LCES generates Laplace noise to perturb the local gradient before reporting the local gradient, preventing malicious parties from analyzing local privacy information from the gradient.

Thus, the local differential privacy provides a strict privacy guarantee for training results before the LCES reports training results, we assume that LCES uses a random function

Disturbance training result, the value range of the random function is +.>

Define the domain as->

。/>

Definition 1: for any possible input

And a subset of arbitrary outputs +.>

The random function is +_if and only if the following inequality holds>

Satisfy->

：

（9），

Definition 1 requires that in the random function, the output from the two approximate inputs is indistinguishable, i.e., for the approximate training results in LCES, the result is passed through the random function

The resulting output is indistinguishable.

Definition 2: for arbitrary input

Random function->

The sensitivity of (c) is defined as follows:

（10），

sensitivity defines the maximum variation of the random function as the input data set fluctuates

The maximum change in output that occurs.

Laplace mechanism: the Laplace mechanism is a random mechanism that randomly samples from the Laplace distribution according to the sensitivity of the objective function, defined as:

（11），

for the random function

Arbitrary deterministic or random function as defined above>

If->

Satisfy->

Then

Also for arbitrary inputs->

Satisfy->

。

We set the GS to have a parameterized global model

，/>

Is->

Dimension of (2);

CES proxy input during local training

And acquiring the next action;

after a number of rounds, the agent updates the model with a loss function based on the historical trajectory information and the rewards obtained

；

After multiple rounds of updating, the agent finds the final updated gradient and LCES computes a perturbed random gradient before reporting to GS;

desirably by a random function

Is satisfied->

Is a noise gradient of (c).

Definition 3 (meet

Noise gradient of (d): for any local community scheduling system->

Arbitrary two local gradients +.>

And arbitrary random gradient subset->

The following inequality must hold:

（12），

wherein

Is the noise gradient after disturbance, +.>

Is the true gradient obtained by LCES local training.

For LCES reported noise gradients, the GS convergence averages the gradient and is then used to update the global model and share the latest GS model with all LCES.

We assume that each LCES uploads the final noise gradient to the GS after a fixed number of local training.

By the definition above we can construct a structure satisfying

Is a noise gradient of (c).

Raw gradient training for LCES model

First of all, a restriction is required>

The sensitivity of (2) is calculated as: />

（13），

wherein

Is the gradient of LCES local training, < >>

Is sensitivity, that is to say any two gradients +.>

The method meets the following conditions:

（14），

based on gradient after shearing

Sensitivity->

Each LCES can generate Laplace noise locally +.>

，/>

The method meets the following conditions:

（15），

wherein ,

is noise->

Is>

And a dimension.

Example 3

The present embodiment proposes a CES scheduling algorithm based on FRL, see algorithm one:

firstly, initializing related input parameters including energy requirements, toU price and CES related parameters at all moments of a community, and a GS reinforcement learning model

Dimension->

And broadcast toAll LCES, clipping parameters->

Local privacy requirement->

Then starting the loop, the maximum loop number being the maximum communication number, starting the calculation for all LCES, iterating from epoode=0 to the maximum update number of LCES,

according to the strategy

Run 96 time stamps and record policy track +.>

Calculating the dominance function of each state +.>

Calculating a loss function:

，

then updating the LCES reinforcement learning model by using an Adamw optimizer to calculate model gradient

And a disturbance gradient->

And the noise gradient after disturbance is +.>

Reported to the GS, which can buffer all received noise gradients, and if the GS buffer is full, calculate the mean +.>

And update the global model +.>

Finally, the buffer memory is emptied,and outputting a result to finish the algorithm.

The algorithm runs in a distributed fashion, with interactive gradients and models of LCES and GS iterating over each other. The LCES agents are scheduled in a continuous state and action space, we apply the PPO algorithm to the LCES agent's learning process.

The PPO algorithm runs multiple epodes with a fixed policy, keeping the running trace, which we set to 96 timestamps in this application. Then, according to the existing track, the probability of the action with large average rewards is increased, and the probability of the action with small average rewards is reduced.

The reward earned by the LCES agent in the present system is the product of the saved amount and the correlation coefficient when the entire epoode is finished.

The policy model of the LCES agent inputs the state of each moment, outputs the mean and variance of the continuous action, and samples the action from the distribution determined by the mean and variance.

This allows the LCES agent to try to all possible possibilities of the action space, avoiding trapping in the extremum region.

After the local training is completed, LCES follows algorithm two:

to calculate

Noise gradient->

Firstly, the corresponding parameters are input, including the original gradient +.>

Dimension->

Privacy requirement->

Cutting range->

According to equation (13), it is possible to rely on the original gradient +.>

Calculating shear gradient->

Then, the cyclic calculation is performed a plurality of times according to the same formula (15) until the cyclic number reaches d, and then noise +.>

Finally return the result->

The algorithm is completed.

Constructing noise gradients meeting LDP definition, reporting the noise gradients to a global GS, caching the received disturbance gradients by the global GS, updating a GS model by using the gradients after a certain number of the received disturbance gradients are reached, and broadcasting the updated model to all LCES.

In the FRL framework-based algorithm one described in the application, all LCES meet the following requirements

。

Within the framework of FRL, each LCES agent reports one satisfaction

Is updated by the noise gradient of LCES alone>

This step is independent of any privacy information of the LCES;

and update the model to violate

After the next round, GS will be updated +.>

Broadcast to all LCES, LCES trains in the local environment, and the local learning process is independent of all other agents, so that other agents are not violated either

And (5) defining.

Gradient satisfaction of Laplace noise disturbance

And (5) defining.

Assume that

Is in accordance with->

Defined functions, i.e.)>

，/>

Is two different gradients. Sensitivity is defined as see equation (10), privacy budget is +.>

It is possible to obtain:

（16），

i.e. the probability of the random function outputting a specified value, equal to the probability distribution of the associated noise, we let us say that

Obeys->

Then a function satisfying the strict differential privacy definition can be obtained +.>

The following formula is obtained:

（17），

at this time if function

A scalar is output of (1), then:

，

（18），

the above formula shows a gradient

Obtaining a specified result via a noise function>

Is the same to obtain the gradient +.>

And->

Is defined by the probability formula:

，/>

in comparison, the two can be obtained:

，

thus, the gradient of the disturbance satisfies

And (5) defining.

Example 4

In this embodiment, the relevant work is verified using the authentic data. Consider three communities of different CES specifications, as shown in table 1:

，

in the table 1 of the description,

energy demand and ToU electricity prices for each community as shown in fig. 4, we assume that 50 iterations of LCES training followed by one round of communication with GS, the experiment was run on the Ubuntu system using python3.9 and pytorch1.12.1.

The scheduling effect of the proposed method is evaluated first. CES scheduling service scenario showing 3 communities is shown in fig. 5.

Each community can discharge by using CES equipment in the peak period of electricity price to realize energy arbitrage, and as can be seen from fig. 5, the main energy requirements of the community come from the power grid in the low peak period of electricity price.

Because the initial CES has no stored energy, CES is charging the reserve energy before the peak period, starting at 0 a day.

When the time comes to the electricity price peak period, the main energy consumption of the community is provided by CES, and if the CES cannot completely meet the demands of the community family at certain moments, the community family can supplement the balance demands from the power grid.

As shown in fig. 6, it can be observed that when CES capacity is small, there is a significant increase in the cost savings amount of the community as CES capacity increases, but the cost savings amount after CES capacity exceeds some upper limit is not significant, and even does not increase any more, and for community two, CES maximum capacity threshold is between 70 and 80kWh, so our method can also combine user history data to predict community optimal CES capacity.

In fig. 7, we compare the cost savings of four different scheduling methods, namely reinforcement learning, federal reinforcement learning, methods of combining differential privacy, and static allocation strategies as proposed in this application.

In the static allocation strategy, community shared energy storage capacity is divided into different community users, and the users independently operate own energy storage capacity.

When privacy concerns are not considered, reinforcement learning and federal reinforcement perform better than static allocation strategies.

And dynamic battery allocation strategies are always preferred over static strategies because static allocation cannot reuse CES capacity and cannot achieve optimal CES scheduling solutions.

As can be seen from fig. 8, the federal reinforcement not only improves the model performance but also increases the model convergence rate.

This is because agents in federal enforcement can learn knowledge from more environments.

When privacy is considered, CES agents sacrifice some performance in exchange for privacy protection, indicating a tradeoff between privacy and utility.

At the same time we can see that even with privacy protection in mind, the proposed method still performs better than the static allocation strategy.

Fig. 9 shows a comparison of model convergence rates at different privacy protections, as can be seen,

the represented solid line is better than +.>

Is shown in the figure).

This is because

The larger the noise is, the smaller the privacy protection on the gradient is, but the more excellent model performance and faster convergence speed can be obtained.

The difference between the final convergence of the two is not great as the model is trained, which also means that the model can learn the correct knowledge even under the stricter privacy requirements.

This is because adding noise to the model is also a method to prevent model overfitting and can improve the reasoning ability of the model.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are all or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

It should be understood that the term "and/or" in this application is merely an association relationship describing the associated object, and indicates that three relationships may exist, for example, a and/or B may indicate: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" in this application generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.

In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.

It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. The CES day-ahead scheduling method based on FRL is characterized in that: including a plurality of community energy storage systems LCES and a single global server GS;

the training process of the FRL comprises the following steps:

the local model and the global model are iteratively updated to meet the stopping requirement, and training is completed;

the FRL operates in a hierarchical distributed architecture, the GS updates a global model by aggregating local model gradients, the LCES trains a DRL agent by using local data, and reports model gradients to the GS, and only model gradients or model parameters are exchanged between the GS and the LCES to realize the calculation of the CES agent;

the CES building a target optimization model for minimizing community total energy consumption comprises the following steps:

，

wherein comprises

Time CES charge +.>

Cost of->

Part of the demand that cannot be satisfied by CES at the moment

Cost of CES service +.>

，/>

A service charge indicating a CES unit charge amount; t is the maximum timestamp, scheduled before the day at the interval of hours, then t=24;

wherein

Is->

Time ToU price of electricity, ->

Is->

Time CES charge,/->

Is->

Is->

The total household demand in the community at any moment;

constraint conditions:

，

，

，

，

，

constraint

: consider CES charging efficiency ratio->

And discharge efficiency ratio->

In case of updating state of charge, +.>

Is->

Time CES remaining capacity, ++>

Representing CES total capacity;

constraint III and constraint IV: constraining CES charge rate

And discharge rate->

constraint V: the balance of the total demands of communities is ensured.

2. The FRL-based CES day-ahead scheduling method of claim 1, further comprising: constraint III and constraint IV are defined by the following reasonable range of constraint parameters:

，

is the maximum timestamp, schedule before day at the interval of hours, +.>

。/>

3. The FRL-based CES day-ahead scheduling method of claim 2, further comprising: for any arbitrary

The state space of the CES agent is defined at time instant as:

，

in the state of

Is->

The ratio of the remaining capacity of CES to the total capacity at time,/->

，/>

Representation->

The charge and discharge coefficients including CES at different times are defined as:

，

in the formula ,

indicating CES is +.>

Between, and->

Moment of time from grid charge->

The relation of->

，/>

Indicating CES is +.>

The discharge coefficient given to the community at the moment is equal to +.>

The relation of->

，/>

Representation->

Time CES agent in the environment->

Action performed down;

，

constraint

Is->

when (when)

The larger the scheduling savings, the larger the system awards the agent->

When negative, the system gives a proxy penalty,

all are coefficients, and the strength of rewards and punishments is adjusted.

4. The FRL-based CES day-ahead scheduling method of claim 3, further comprising: after each LCES is trained locally for a fixed number of times, the final noise gradient is uploaded to the GS, and the structure meets the requirement

Noise gradient of->

Is a privacy requirement;

original gradient obtained by LCES model training

Need to restrict->

The sensitivity of (2) is calculated as:

，

wherein

Is the gradient of LCES local training, < >>

Is sensitivity, that is to say any two gradients +.>

The method meets the following conditions:

，

based on gradient after shearing

Sensitivity->

Each LCES locally generates Laplace noise +.>

，/>

The method meets the following conditions: />

，

wherein ,

is noise->

Is>

And a dimension.

5. The FRL-based CES day-ahead scheduling method of claim 4, further comprising: the LCES and GS mutually iterate interactive gradient and model, the LCES agent is scheduled in continuous state and action space, the PPO algorithm is applied to the learning process of the LCES agent, the PPO algorithm runs a plurality of epodes with fixed strategy, the running track is reserved, and the rewards obtained by the LCES agent are the products of the saved amount and the related coefficients when the whole epode is finished.

6. The FRL-based CES day-ahead scheduling method of claim 5, further comprising: the strategy model of the LCES agent inputs the state of each moment, outputs the mean value and the variance of continuous motion, samples the motion from the distribution determined by the mean value and the variance, constructs noise gradients meeting LDP definition by LCES, reports the noise gradients to the global GS, the global GS caches the received disturbance gradients, updates the GS model by using the gradients after a certain number of the received disturbance gradients is reached, and broadcasts the updated model to all LCES.

7. The FRL-based CES day-ahead scheduling method of claim 6, further comprising: in the frame of the FRLEach LCES agent reports one of the satisfaction