CN116862551A - New energy consumption price decision method considering user privacy protection - Google Patents

New energy consumption price decision method considering user privacy protection

Info

Publication number
CN116862551A
Authority
CN
China
Prior art keywords
response
user
model
training
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310714951.4A
Other languages
Chinese (zh)
Inventor
沈煜
胡伟
孔祥玉
杨帆
杨志淳
雷杨
胡成奕
陈鹤冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Original Assignee
Tianjin University
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University, Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd filed Critical Tianjin University
Priority to CN202310714951.4A
Publication of CN116862551A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention provides a new energy consumption electricity price decision method considering user privacy protection, which comprises the following steps: evaluating the demand response potential of flexible users with respect to the new energy consumption demand; constructing an optimization model oriented to new energy consumption; making the response electricity price decision with a local reinforcement learning pricing algorithm that combines the user response potential evaluation results and local user data under the optimization objective; and, based on an improved encrypted horizontal federated learning idea, jointly training the global response electricity price decision model, so that user privacy and electricity consumption data security are protected and the global electricity price decision is improved while user data never leave the local site. The invention can effectively avoid leakage of users' actual electricity consumption data and protect the security of user data and the privacy of user information; through appropriate electricity price decisions, flexible power users are incentivized to actively adjust their load and coordinate with the new energy output, thereby improving new energy consumption and reducing power fluctuation.

Description

New energy consumption price decision method considering user privacy protection
Technical Field
The invention relates to the field of electricity price decision, in particular to a new energy consumption electricity price decision method considering user privacy protection.
Background
Demand response pricing strategies oriented to new energy consumption have been studied extensively at home and abroad, and the mainstream methods can be divided into pricing based on optimization theory, pricing based on game theory, and dynamic pricing based on reinforcement learning. In terms of optimization theory, cross-iteration of the Lagrangian multiplier with the alternating direction method of multipliers has been used to improve the convergence speed and accuracy of real-time pricing algorithms. The optimization model can first be converted through the KKT (Karush-Kuhn-Tucker) conditions into a system of equations with Lagrangian multipliers, which is then solved by a smoothing Newton method so that the real-time optimal electricity price can be obtained quickly. For user utility functions with different characteristics, methods that successively approximate them with piecewise linear functions have been studied to achieve a smooth solution of the real-time electricity price [2]. Although the optimization-theory solution methods represented by the above literature can obtain a feasible real-time electricity price through traditional optimization, when facing a large number of users of different types in a virtual power plant the solution process becomes complex and computationally heavy, the users' optimization decision behavior cannot be simulated accurately, and the requirement for accurate and fast demand response pricing is difficult to meet [3].
Game-theoretic pricing methods mainly solve for the incentive electricity price based on the interaction between the energy supplier and the load users. Common game pricing methods can be categorized into cooperative games and non-cooperative games. User-side load participation in the virtual power plant involves gaming between both the supply and demand sides, while the whole gaming process is completed as an aggregate. Because the users operate independently with different strategies, and the virtual power plant participates in the process as a service aggregator, the process can be regarded as a global cooperative game in which multiple user agents participate, whose aim is the overall optimum of the alliance [4]. The global cooperative game is efficient at finding the optimal solution; the difficulty lies mainly in constructing the subsequent benefit distribution mechanism. Measuring the actual response contribution of individual users, and thus distributing benefits fairly, becomes a challenge for the virtual power plant. In addition, in practical applications real power consumers generally find it difficult to maintain a perfectly ideal game state, and their response behavior exhibits random deviation and bounded rationality. This is also a key issue limiting game pricing methods [5].
In fact, the dynamic electricity price decision process can be generalized as a Markov decision problem: the user gives appropriate load adjustment feedback according to the response environment and influencing factors such as the electricity price, so as to obtain better benefits. Reinforcement learning is a learning algorithm based on the Markov decision idea; it is well suited to the interaction problem between users and the virtual power plant environment and has been widely adopted in recent years. In reinforcement learning, early studies mostly focused on dynamic programming (DP) algorithms and Monte Carlo methods. The dynamic programming algorithm cannot run independently of the environment model and requires accurate state transition probabilities, whereas in actual electricity price decisions the virtual power plant has difficulty obtaining the probabilities of different user behaviors. The Monte Carlo method does not depend entirely on an environment model and can learn and make optimization decisions autonomously from past experience; however, it is better suited to episodic tasks, the randomness of its action selection is strong, and its decision efficiency is relatively low. Temporal difference learning (TD learning) in reinforcement learning combines the advantages of both: it does not depend on an environment model, is not limited to episodic tasks, and can execute decisions continuously. However, the reinforcement learning model needs a large amount of actual user data during training and often faces problems such as "data islands" between different units and user privacy leakage, which need to be solved.
In summary, the current price decision methods for demand response inside a VPP mainly face the following problems: (1) traditional optimization-theory pricing methods are computationally complex, and when the number of users in the virtual power plant increases, algorithm efficiency and accuracy are limited; (2) pricing methods based on game theory have difficulty simulating the randomness of real users' response behavior, and the fairness guarantee mechanism needs further research; (3) deep reinforcement learning algorithms can effectively simulate the real scene of response interaction between price incentives and users, but they require massive real user data to run, which threatens the privacy and security of user information, and their decision efficiency is limited by the discretization of the computation.
Disclosure of Invention
The new energy consumption electricity price decision method considering user privacy protection first evaluates the demand response potential of flexible users with respect to the new energy consumption demand; then an optimization model oriented to new energy consumption is constructed to reduce the amount of curtailed new energy, increase new energy generation, stabilize grid power fluctuation and increase the user-side demand response benefit; further, a local reinforcement learning pricing algorithm combines the user response potential evaluation results and makes the response electricity price decision with local user data under the optimization objective; finally, based on an improved encrypted horizontal federated learning idea, the global response electricity price decision model is trained jointly, so that user privacy and electricity consumption data security are protected and the global electricity price decision is improved while user data never leave the local site.
A new energy consumption electricity price decision method considering user privacy protection comprises the following steps:
step 1: establishing a user demand response potential evaluation model, and evaluating the user demand response potential according to the user demand response potential evaluation model to obtain a potential evaluation result, wherein the potential evaluation result comprises a demand response load adjustment amount and a contribution factor for evaluating the participation of the user in demand response;
step 2: constructing a multi-objective optimization model for new energy consumption;
step 3: based on the potential evaluation result and the multi-objective optimization model, training a local pricing model based on a reinforcement learning algorithm;
step 4: based on the trained local pricing model, joint training of a local pricing algorithm is carried out, and global day-ahead electricity price decision taking user privacy protection into consideration is realized.
Further, the step 1 includes the following steps:
step 1.1: considering the data requirements for predicting the user demand response load adjustment, organizing the data set required for training the potential evaluation model;
step 1.2: based on the data set organized in step 1.1, establishing an LSTM-based user demand response potential evaluation model and predicting the demand response load adjustment d_{i,t} of a user under a given electricity price;
step 1.3: according to the demand response load adjustment d_{i,t} of step 1.2 and the actual demand response data of subsequent users, solving the contribution factor D_M used to evaluate user participation in demand response.
Further, step 1.1 specifically includes:
when user response behavior is simulated, a day is divided into T time periods and T corresponding LSTM networks are established, denoted L_j (j = 1, 2, …, T); the j-th period on the n-th day is recorded as t = nT + j, and the network for period t is trained with the data of the corresponding period in the historical dates; the input data contain the price λ_{dr,t} of the current period and the response prices and response load adjustments of the preceding m periods:
I_t = {(λ_{dr,t-m}, d_{t-m}), (λ_{dr,t-m+1}, d_{t-m+1}), …, (λ_{dr,t-1}, d_{t-1}), λ_{dr,t}} (1)
in training, for each LSTM model the input data set of the network is {I_1, I_2, …, I_{t-1}} and the output is the user response data set {d_1, d_2, …, d_{t-1}}; after training is completed, inputting I_t of the target period t yields the load adjustment d_t of the user's simulated response in that period.
Further, step 1.2 specifically includes:
the training of the LSTM model is completed by an input gate, a forget gate and an output gate: the input gate combines the current input content I_t with the output d_{t-1} of the previous period to update the information and selectively stores the updated part into the cell state c_t; the forget gate determines the degree to which the cell state is forgotten or updated and decides which information is forgotten; the output gate calculates the output content at this moment, i.e. the response load adjustment of the user, based on the updated cell state. The process is expressed as:
f_t = σ(W_f[h_{t-1}, I_t] + b_f) (2)
z_t = tanh(W_z[h_{t-1}, I_t] + b_z) (3)
i_t = σ(W_i[h_{t-1}, I_t] + b_i) (4)
c_t = f_t·c_{t-1} + i_t·z_t (5)
o_t = σ(W_o[h_{t-1}, I_t] + b_o) (6)
d_t = o_t·tanh(c_t) (7)
wherein W_i, W_z, W_f and W_o respectively represent the weight matrices of the input gate, the memory-cell input, the forget gate and the output gate, connecting the input with the LSTM memory cell, and b_i, b_z, b_f and b_o respectively represent the corresponding bias vectors;
in the actual training process, a fully connected layer is added after the LSTM neuron layer to process the final output result; the historical data set is divided into a test set and a verification set, the root mean square error RMSE is selected as the loss function fed back to the LSTM model, and the mean absolute percentage error MAPE is used as the error evaluation of the LSTM prediction; the two are calculated respectively as:
RMSE = sqrt((1/N)·Σ_{k=1}^{N}(d_k − d̂_k)^2), MAPE = (1/N)·Σ_{k=1}^{N}|d_k − d̂_k|/d_k × 100%
wherein d and d̂ represent the actual and predicted values of the user response respectively, and N is the number of samples.
Further, step 1.3 specifically includes:
the main factors of the contribution factor are the response amount, the response participation rate and the conservation situation: the response amount represents the load actually reduced and adjusted by the user during the response peak period; the response participation rate refers to the proportion of the number of times the user actually participates in responses among all responses the user was required to make within one quarter; the conservation situation measures the deviation between the user's actual load reduction and the promised load reduction:
wherein γ_i represents the response participation rate of the i-th user within a quarter and f_i represents the number of times the i-th user participates in demand response; Δd_i is the conservation measurement index of user i, d_{i,t} is the predicted demand response of user i in period t, d_{Ac,i,t} is the actual response load, and d_{avg,t}, the average response value of all users in the virtual power plant at time t, is taken as the default-penalty strength parameter;
the process of calculating the contribution factor is as follows:
D_{Mi} = sqrt((X_i − μ)^T Σ^{-1} (X_i − μ)), X_i = (d_i, γ_i, Δd_i) (13)
wherein D_{Mi} is the contribution factor of user i calculated from the Mahalanobis distance within the quarter; X_i is the calculation vector of user i, composed of the response amount, the response participation rate and the conservation situation; μ is the sample mean vector of all users; and Σ^{-1} is the inverse of the covariance matrix of the multidimensional random variable.
Further, step 2 includes the steps of:
step 2.1: calculating the power fluctuation of the total energy output in the park, where δ_t represents the fluctuation rate of the total energy output power in the park during period t:
wherein P_{G,t} is the actual output of all energy sources in the local park, including new energy, micro-coal turbines and energy storage equipment, and Δt is the time interval between two periods;
step 2.2: jointly determining the user target response d_{target,t} from the new energy output and the power fluctuation rate δ_t obtained in step 2.1;
step 2.3: combining the user contribution factor obtained in step 1.3 with the user response load adjustment d_{i,t}, and with the demand response electricity price λ_{dr,t} as the decision variable, establishing a multi-objective optimization model for smooth new energy consumption based on the user target response d_{target,t}.
Further, step 2.2 specifically includes:
first, the total load after the users respond cannot exceed the maximum output of all power supplies; then the load amount d_{target,t} that the users are specifically required to respond with is calculated according to the fluctuation-rate limits, as follows:
(1) if δ_t < δ_{drop}, where δ_{drop} is the lower limit of the power reduction rate and is a negative value, then
d_{target,t} = P_{G,t-1}δ_{drop}Δt (15)
(2) if δ_t > δ_{rise}, where δ_{rise} is the upper limit of the power increase rate and is a positive value, then
d_{target,t} = P_{G,t-1}δ_{rise}Δt (16)
(3) if δ_{drop} ≤ δ_t ≤ δ_{rise}, then
d_{target,t} = 0 (17).
Further, in step 2.3, the optimization objective B of the multi-objective optimization model includes the new energy consumption benefit B_R, the power fluctuation term B_F and the user demand response benefit B_U; the new energy consumption benefit B_R is optimized mainly by reducing the curtailed electricity and increasing new energy generation; the power fluctuation term B_F considers the feeder power fluctuation cost caused by the output of all energy sources in the park after the micro-coal turbine is added; the user demand response benefit B_U mainly consists of the energy cost saved by the users and the response deviation penalty, where the response deviation is the difference between the target response value and the actual response value;
The objective function and constraint conditions of the multi-objective optimization model are as follows:
max B(λ) = B_R + B_F + B_U (18)
wherein η represents the preference weight between the two parts, in the range [0,1]; when η = 0 the optimization objective only considers smooth consumption; T_{new} and T_{dr} respectively denote the numbers of periods over which the power fluctuation and the demand response are calculated; for period t, λ_{o,t} is the electricity purchase price before demand response; λ_{flu} is the power fluctuation loss cost coefficient; λ_{dr,t} is the demand response electricity price and λ_{p,t} is the response deviation penalty coefficient; λ_{dr,min} and λ_{dr,max} limit the real-time electricity price; P_{R,t} is the actual new energy output in period t, P_{Ro,t} is the predicted power before new energy adjustment in period t, and P_{Rmax,t} and P_{Rmin,t} represent the limits of new energy output in period t; d_{i,t} represents the predicted demand response of the user, which must remain within the user response capability range d_{max}; D_{Mi} is the contribution factor of user i calculated with the Mahalanobis distance within the quarter, and the parameter y is used to control the degree of influence of the user contribution factor.
Further, step 3 includes the following steps:
step 3.1: based on the user target response obtained in step 2.2, establishing the reinforcement learning pricing environment of the local new energy park; in the multi-objective optimization model the external bidding price λ_{vpp,t} of the virtual power plant is set to the value already determined by the distribution network operator, and the variables in the optimization objective mainly consider the response electricity price λ_{dr,t} inside the new energy local park and the resulting user response load adjustment d_{i,t}; reinforcement learning is performed with the A3C distributed algorithm, where x (x = 1, 2, …, X) denotes the interaction iteration index, and the pricing process of A3C mainly involves four elements: the state S = {s_x}, the action A = {a_x}, the return R = {r_x} and the state transition probability P(s_{x+1}|s_x, a_x); the state s_x represents information such as the demand response target, the bidding result and the internal response price at the x-th iteration, the action a_x represents the response electricity price set within the virtual power plant, the return r_x represents the benefit obtained by the virtual power plant through demand response, and the state transition probability P(s_{x+1}|s_x, a_x) indicates the probability that the state transfers from s_x to s_{x+1} after the VPP takes action a_x;
step 3.2: based on the local reinforcement learning pricing environment established in step 3.1, the Actor network makes the demand response electricity price decision: the state s_x is input, the Actor network outputs the probability distribution P of each candidate electricity price in the range being selected as the response price, and accordingly gives the demand response price λ_{dr,t} inside the virtual power plant;
Step 3.3: based on the response potential evaluation result of the step 1.2 and the multi-objective optimization model of the step 2.3, the Critic network evaluates the electricity price decision action of the step 3.2, and guides the decision action according to the optimization target so as to improve the income and speed of the decision of the Actor network;
the Critic network outputs the expected value V(s_x) of the benefit obtainable by making decisions with the current Actor network, and the benefit of the pricing decision is measured with the optimization objective function:
V(s_x) = E[R_x | s = s_x] (22)
wherein E[·] represents the mathematical expectation and γ is the discount factor with value range [0,1];
the time difference error TD-error is used to evaluate the difference between the V(s_x) output by the Critic network and the true value; specifically, after the Actor network gives the demand response price λ_{dr,t} inside the VPP, the user response potential evaluation model is used to predict the user load adjustment d_t, the benefit r_x is calculated through the objective function and the next state s_{x+1} is obtained; then the Critic network uses s_x and s_{x+1} to calculate the outputs V(s_x) and V(s_{x+1}), and the TD-error value δ of this decision is obtained:
δ = r_x + γV(s_{x+1}) − V(s_x) (24)
step 3.4: judging the property of the pricing action and adjusting the future pricing direction according to the evaluation result in the step 3.3;
for each day-ahead decision electricity price a_x and the response benefit r_x it obtains: if r_x is higher than the Critic network's expectation of the benefit, i.e. V(s_x) − γV(s_{x+1}), the corresponding δ is positive and the current electricity price decision action is an effective positive experience for training the Actor network; otherwise the action is regarded as a negative experience of the Actor network training and is strategically avoided in the subsequent pricing process;
Step 3.5: according to the TD-error value delta calculated in the step 3.3, carrying out gradient parameter updating of the local reinforcement learning pricing model;
the loss function of the Critic network is denoted loss_critic and its gradient parameters are denoted ε; the calculation and iteration formulas are expressed as:
loss_critic = δ^2 (25)
wherein loss_critic is the loss function of the Critic network, ε is the gradient parameter of the Critic network, β is the learning rate, and the Critic network training goal is to minimize the TD-error;
the loss function of the Actor network is denoted loss_actor and its gradient parameters are denoted θ; the calculation and iteration formulas are expressed as:
loss_actor = δ·log P(a_x|s_x, θ) + c·H(P(s_x, θ)) (27)
wherein loss_actor is the loss function of the Actor network, θ is the gradient parameter of the Actor network, α is the learning rate, H is the entropy of the probability distribution P and c is its coefficient;
step 3.6: and (3) repeating the steps 3.2-3.5 until the training of the reinforcement learning pricing model of the local new energy park is completed.
Further, step 4 includes the steps of:
step 4.1: the virtual power plant management center obtains load reduction information from the upper-layer operator, including the reduction period, the load amount required to be reduced and the reference response electricity price;
step 4.2: the central server issues parameters: in the parameter issuing stage, the central server initializes the model parameters and transmits the original parameters to each local server; when the global model is updated, the central server broadcasts the updated global pricing-model parameters for the local servers to download, and before issuing them the central server performs PHE semi-homomorphic encryption on the parameters;
the key feature of the PHE semi-homomorphic encryption algorithm is that it supports additive homomorphism and scalar multiplication homomorphism; assuming the results of homomorphically encrypting u and v are [[u]] and [[v]], the following rules hold for PHE:
addition homomorphism: Dec_sk([[u]] ⊕ [[v]]) = Dec_sk([[u + v]]) (29)
scalar multiplication homomorphism: Dec_sk([[u]] ⊙ m) = Dec_sk([[u·m]]) (30)
where Dec represents the decryption function, sk represents the key used for decryption, ⊕ indicates that the corresponding operation is performed as a multiplication of the ciphertexts, and ⊙ indicates the m-th power of the ciphertext content;
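As an illustration of rules (29) and (30), the minimal sketch below uses the third-party python-paillier (phe) package, which is an assumption of this example rather than a component named by the method; it shows that sums and scalar multiples computed on ciphertexts decrypt to the corresponding plaintext results, which is what lets the central server aggregate encrypted gradient parameters.

```python
# Minimal sketch of PHE (Paillier) additive and scalar-multiplication homomorphism.
# Assumes the third-party "phe" (python-paillier) package: pip install phe
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

u, v, m = 0.37, -0.12, 3          # e.g. two gradient components and a scalar weight
enc_u = public_key.encrypt(u)     # [[u]]
enc_v = public_key.encrypt(v)     # [[v]]

enc_sum = enc_u + enc_v           # ciphertext-domain addition -> decrypts to u + v
enc_scaled = enc_u * m            # ciphertext times plaintext scalar -> decrypts to u * m

assert abs(private_key.decrypt(enc_sum) - (u + v)) < 1e-9      # rule (29)
assert abs(private_key.decrypt(enc_scaled) - (u * m)) < 1e-9   # rule (30)
```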
step 4.3: using the model parameters of step 4.2, each local server trains its local pricing model according to step 3.6 and uploads the encrypted iterative gradient parameters to the global central server; after the global parameters obtained by decrypting the contributions of all local models are applied, the local training of the virtual power plant day-ahead demand response pricing optimization model is carried out directly according to step 3; the local servers do not share data sets, and each local model is trained directly with its own data, which avoids leakage of the users' private data;
in the training process of the local model, as the parameters are updated and calculated, the trained model parameters are given by:
wherein ω_{(n)s} is the parameter of the n-th local model at the (s+1)-th round of iteration of the global model, η is the learning rate of the algorithm iteration, and ∇ denotes the gradient descent calculation;
step 4.4: the central server aggregates the local model parameters uploaded in step 4.3 and calculates and updates the global model gradient parameters: when the local training reaches its iteration limit, each local server applies PHE semi-homomorphic encryption to the gradient and other parameter information obtained after local model training and uploads it to the central server, so that the key training of the global model is advanced without directly transmitting local data; the central server aggregates the local model parameters according to an improved weighted-average algorithm, generates the global parameters and, after encryption, transmits them to each local server;
assuming that N units are managed in the virtual power plant and each unit has k_n users participating in demand response, where n = 1, 2, …, N and the total number of users of all units is K, then according to the gradient averaging algorithm, when the central server performs the (s+1)-th iteration, the aggregation in the FedAvg framework weights the n-th local server as follows:
ω_{s+1} = Σ_{n=1}^{N} (k_n/K)·ω_{(n)s} (32)
wherein k_n is the number of users inside the n-th unit that participate in and respond to the load adjustment, K is the total number of users and N is the total number of units; s is the iteration index of the model; ω_s is the global model weight update result after the s-th iteration of the central server; and ω_{(n)s} is the new local model weight obtained by the n-th user unit performing gradient descent with its local data;
the model gradient aggregation adopts an improved weighted-average federated algorithm, which adds to the original model a weight that accurately measures data and computation quality; the training sample data are divided into test samples and original training samples: the original training samples perform the pricing training of the original global model according to the traditional weighted-average algorithm, while the test samples are used to measure the accuracy of the previous round of model training, i.e. to measure the data quality and the model training effect; the training accuracy q_n is calculated as:
q_n = C_n / X_n (33)
wherein q_n represents the training accuracy of the n-th user unit, C_n represents the number of correct samples when the training result of the model on the n-th local server is compared with the test samples, and X_n represents the total number of test samples;
after the model training accuracy of each local server is obtained, the weight factor that only considers the number of users in a unit is replaced by one that also considers the training accuracy, and the improved weighted-average federated algorithm is redefined; the calculation process is as follows:
wherein the global model parameter updated after the s-th round of iteration is also the parameter broadcast to each local model at the beginning of the (s+1)-th round of iteration; ω_{(n)s} is the parameter of the n-th local model in the s-th round of iteration; α_{(n)s+1} is the training quality factor of the n-th unit in the (s+1)-th round of iteration of the global model; q_{(n)s+1} is the training accuracy obtained after testing that unit; l is an adjustable hyper-parameter used to adjust the influence of the training quality factor; since all α_{(n)s+1} sum to 1 throughout the iteration process, the convergence of formula (34) in the improved algorithm is guaranteed as long as formula (32) in the algorithm converges;
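The exact expression of the quality factor α_{(n)s+1} is not reproduced above, so the sketch below only illustrates one plausible reading of the improved weighted-average aggregation under stated assumptions: the per-unit weight combines the user share k_n/K with the test accuracy q_n scaled by the hyper-parameter l, and the raw weights are normalized so they sum to 1, consistent with the convergence argument given for formula (34).

```python
import numpy as np

def improved_weighted_average(local_params, user_counts, accuracies, l=1.0):
    """Aggregate decrypted local model parameters into new global parameters.

    local_params : list of 1-D arrays, the flattened parameters of each local model
    user_counts  : k_n, number of responding users managed by each local server
    accuracies   : q_n, test accuracy of each local model on its test samples
    l            : hyper-parameter scaling the influence of the quality factor
    The exact weighting formula is an assumption; only its qualitative behaviour
    (accuracy-aware weights that sum to 1) is taken from the description."""
    k = np.asarray(user_counts, dtype=float)
    q = np.asarray(accuracies, dtype=float)
    raw = k / k.sum() + l * q            # user share augmented by training accuracy
    alpha = raw / raw.sum()              # quality factors alpha_(n), normalized to sum to 1
    return alpha @ np.stack(local_params)

# toy usage with three local servers
locals_ = [np.array([0.20, 1.00]), np.array([0.40, 0.80]), np.array([0.30, 0.90])]
new_global = improved_weighted_average(locals_, user_counts=[10, 30, 20],
                                        accuracies=[0.90, 0.70, 0.80], l=0.5)
```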
step 4.5: judging whether the global pricing model training is complete; if so, ending; otherwise encrypting the latest global model parameters and sending them to each local server, and repeating steps 4.2-4.4 until the global pricing model training is completed, at which point the day-ahead electricity price scheme considering new energy consumption optimization is obtained;
step 4.6: applying the pricing model trained in the step 4.5 to carry out global pricing decision of day-ahead response of the virtual power plant; reporting the decision electricity price and the expected response condition to an upper power distribution network power transaction center; the electric power transaction center organizes centralized bidding of each virtual power plant and discloses result information; the virtual power plant adjusts the day-ahead response pricing strategy according to the bidding result and reports the day-ahead response pricing strategy again; repeatedly deciding, bidding and optimizing until a day-ahead response electricity price decision scheme for finally considering new energy consumption in the virtual power plant is determined;
Step 4.7: after determining a final response electricity price scheme, broadcasting the electricity price scheme obtained in the step 4.6 to each unit inside by the virtual power plant management center, exciting a user to respond in the future, and adjusting load energy; according to the actual response performance of the user on the second day, the local server updates a database and calculates a model error; the central server aggregates the error conditions of each unit, updates the global model parameters and participates in actual response settlement of the upper-layer power transaction center.
The technical scheme provided by the invention has the following beneficial effects:
(1) At the energy company level, the new energy consumption electricity price decision method considering user privacy protection addresses problems in the new energy consumption process such as strong fluctuation and large amounts of curtailed wind and solar power; through appropriate electricity price decisions, flexible power users are incentivized to actively adjust their load and coordinate with the new energy output, so that new energy consumption is increased and power fluctuation is reduced, effectively improving the current situation of new energy consumption and promoting the consumption of clean energy.
(2) At the residential user level, the new energy consumption electricity price decision method considering user privacy protection can perform joint training of the global pricing model without the users' electricity data leaving the local site, while obtaining an equivalent price optimization effect. On one hand, the method effectively avoids leakage of the users' actual electricity consumption data and protects the security of user data and the privacy of user information; on the other hand, the method takes the users' historical contributions into account, guarantees fairness of user response, can fully exploit the users' demand response potential, and lets users obtain greater response benefits.
(3) At the societal level, the new energy consumption electricity price decision method considering user privacy protection can provide the government with a new strategy for protecting user data privacy, promote the construction and development of the power system after high-proportion new energy access, protect the data privacy and security of the public, offer innovative research ideas for university researchers, and serve as a useful reference for big data development in other enterprises.
Drawings
FIG. 1 is a schematic diagram of LSTM network time and internal architecture according to an embodiment of the present invention;
FIG. 2 is an encrypted horizontal federated learning training framework based on improved weighted averaging;
FIG. 3 is a flow chart of a virtual power plant demand response electricity price decision algorithm based on deep reinforcement learning;
FIG. 4 is a demand response day-ahead decision flow considering user privacy protection;
FIG. 5 is a graph of user load energy usage in different scenarios;
FIG. 6 shows electricity prices in different scenarios;
FIG. 7 is a user response load adjustment;
FIG. 8 is a comparison of user response accuracy.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The new energy consumption electricity price decision method considering user privacy protection comprises user demand response potential evaluation, new energy consumption oriented optimization model construction, local pricing algorithm based on reinforcement learning and global day-ahead electricity price decision considering user privacy protection. The details are described below:
step 1: and establishing a user demand response potential evaluation model, and evaluating the user demand response potential according to the user demand response potential evaluation model to obtain a potential evaluation result, wherein the potential evaluation result comprises a demand response load adjustment amount and a contribution factor for evaluating the participation of the user in the demand response. The method specifically comprises the following steps:
step 1.1: and (3) finishing a data set required by the potential evaluation model training by considering the data demand of the user demand response predicted load adjustment.
Referring to fig. 1, the figure shows the corresponding LSTM network time modules and internal data structure. When the response behavior of a user is simulated, the invention, considering the training effect and the actual requirements of day-ahead electricity price setting, divides a day into T time periods and correspondingly establishes T LSTM networks, denoted L_j (j = 1, 2, …, T). The j-th period on day n is recorded as t = nT + j. The network of period t is trained with the data of the corresponding period in the historical dates. Since an individual user has limited response capability within a period of time, the subsequent response is affected by the history of previous responses; therefore the input data contain not only the price λ_{dr,t} of the current period but also, as shown in formula (1), the response prices and response load adjustments of the preceding m periods. The LSTM structure used in the invention is shown in the drawing. In training, for each LSTM model the input data set of the network is {I_1, I_2, …, I_{t-1}} and the output is the user response data set {d_1, d_2, …, d_{t-1}}. After training is completed, inputting I_t of the target period t yields the load adjustment d_t of the user's simulated response in that period.
I_t = {(λ_{dr,t-m}, d_{t-m}), (λ_{dr,t-m+1}, d_{t-m+1}), …, (λ_{dr,t-1}, d_{t-1}), λ_{dr,t}} (1), where m is the number of historical periods affecting the current period.
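For concreteness, the sketch below shows one way the input windows of formula (1) could be assembled from aligned histories of response prices and load adjustments; the array names and toy values are assumptions for illustration only, not data of the invention.

```python
import numpy as np

def build_input_windows(prices, loads, m):
    """Build the inputs I_t of formula (1): the (price, load-adjustment) pairs of the
    previous m periods plus the price of the current period, whose load adjustment
    d_t is the prediction target.  prices and loads are 1-D arrays aligned by period."""
    inputs, targets = [], []
    for t in range(m, len(prices)):
        window = [(prices[t - k], loads[t - k]) for k in range(m, 0, -1)]
        inputs.append((window, prices[t]))   # ((lambda_dr, d) pairs, lambda_dr,t)
        targets.append(loads[t])             # d_t, the simulated user response
    return inputs, targets

# toy example with m = 3 historical periods
prices = np.array([0.50, 0.55, 0.62, 0.48, 0.53, 0.60])
loads  = np.array([1.20, 1.40, 1.80, 1.00, 1.30, 1.70])
I, d = build_input_windows(prices, loads, m=3)
```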
Step 1.2: based on the data set arranged in the step 1.1, establishing an LSTM-based user demand response potential evaluation model for predicting the demand response load adjustment quantity d of a user under a given electricity price i,t
The training of the LSTM model is mainly completed by three gate structures: the input gate, the forget gate and the output gate. The input gate combines the current input content I_t with the output d_{t-1} of the previous period to update the information and selectively stores the updated part into the cell state c_t. The forget gate determines the degree to which the cell state is forgotten or updated and decides which information is forgotten. The output gate calculates the output content at that moment, i.e. the response load adjustment of the user, based on the updated cell state. The above process is formulated as:
f_t = σ(W_f[h_{t-1}, I_t] + b_f) (2)
z_t = tanh(W_z[h_{t-1}, I_t] + b_z) (3)
i_t = σ(W_i[h_{t-1}, I_t] + b_i) (4)
c_t = f_t·c_{t-1} + i_t·z_t (5)
o_t = σ(W_o[h_{t-1}, I_t] + b_o) (6)
d_t = o_t·tanh(c_t) (7)
Wherein W_i, W_z, W_f and W_o respectively represent the weight matrices of the input gate, the memory-cell input, the forget gate and the output gate, connecting the input with the LSTM memory cell, and b_i, b_z, b_f and b_o respectively represent the corresponding bias vectors.
In the actual training process, a fully connected layer is added after the LSTM neuron layer to process the final output result. Meanwhile, in order to verify the accuracy of the virtual response result, the historical data set is divided into a test set and a verification set. The invention selects the root mean square error (RMSE) as the loss function fed back to the LSTM model and the mean absolute percentage error (MAPE) as the error evaluation of the LSTM prediction; the two are calculated respectively as follows.
RMSE = sqrt((1/N)·Σ_{k=1}^{N}(d_k − d̂_k)^2), MAPE = (1/N)·Σ_{k=1}^{N}|d_k − d̂_k|/d_k × 100%
Wherein d and d̂ represent the actual and predicted values of the user response respectively, and N is the number of samples.
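For illustration, a minimal PyTorch sketch of one period-specific potential evaluation network follows; the hidden size, optimizer, RMSE training loop and the random placeholder data are assumptions of the sketch rather than parameters fixed by the invention, with the fully connected layer after the LSTM corresponding to the output-processing layer described above.

```python
import torch
import torch.nn as nn

class ResponsePotentialLSTM(nn.Module):
    """One period-specific network L_j: an LSTM layer followed by a fully connected
    layer that maps the last hidden state to the predicted load adjustment d_t."""
    def __init__(self, input_size=2, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):            # x: (batch, m + 1, 2) = (price, load) per period;
        out, _ = self.lstm(x)        # the current period carries a placeholder load of 0
        return self.fc(out[:, -1, :])

model = ResponsePotentialLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(16, 5, 2)             # 16 samples, m = 4 past periods + current price
d_true = torch.randn(16, 1)           # placeholder response targets
for _ in range(10):                   # a few illustrative training steps
    optimizer.zero_grad()
    loss = torch.sqrt(nn.functional.mse_loss(model(x), d_true))   # RMSE feedback
    loss.backward()
    optimizer.step()
```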
Step 1.3: according to the user response potential evaluation result d_{i,t} of step 1.2 and the actual demand response data of subsequent users, solve the contribution factor D_M used to evaluate user participation in demand response.
In order to fully evaluate the contribution of users in demand response and stimulate their enthusiasm for participation, the invention proposes a contribution factor D_M to characterize the user's performance in the response process and the contribution made to the virtual power plant. The main considerations of the contribution factor are the response amount, the response participation rate and the conservation situation. The response amount indicates the load actually reduced and adjusted by the user during the response peak period; the response participation rate refers to the proportion of the number of times the user actually participates in responses among all responses the user was required to make within one quarter; the conservation situation measures the degree of deviation between the user's actual load reduction and the promised load reduction. The three indexes are calculated by the corresponding formulas.
Wherein γ_i represents the response participation rate of the i-th user within a quarter and f_i represents the number of times the i-th user participates in demand response; Δd_i is the conservation measurement index of user i, d_{i,t} is the predicted demand response of user i in period t, d_{Ac,i,t} is the actual response load, and d_{avg,t}, the average response value of all users in the virtual power plant at time t, is taken as the default-penalty strength parameter.
Because the three indexes have different but related dimensions, the invention calculates the contribution factor with the Mahalanobis distance formula in order to reduce the mutual interference of data across dimensions. The Mahalanobis distance is the Euclidean distance rotated and scaled along the eigenvector directions, which effectively solves the problem of non-uniform data dimensions in the calculation and measures the contribution of each user in the demand response load adjustment process more objectively. The calculation of the contribution factor is shown in the following formula.
D_{Mi} = sqrt((X_i − μ)^T Σ^{-1} (X_i − μ)), X_i = (d_i, γ_i, Δd_i) (13)
Wherein D_{Mi} is the contribution factor of user i calculated from the Mahalanobis distance within the quarter; X_i is the calculation vector of user i, composed of the response amount, the response participation rate and the conservation situation; μ is the sample mean vector of all users; and Σ^{-1} is the inverse of the covariance matrix of the multidimensional random variable.
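The following numpy sketch computes this Mahalanobis-distance contribution factor; the four-user index values are purely illustrative assumptions.

```python
import numpy as np

def contribution_factors(X):
    """X: (num_users, 3) matrix whose rows are (response amount d_i,
    participation rate gamma_i, conservation index delta_d_i).
    Returns the Mahalanobis distance of each user's index vector from the
    sample mean, used as the contribution factor D_Mi."""
    mu = X.mean(axis=0)                                  # sample mean vector of all users
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))    # inverse covariance (pseudo-inverse)
    diff = X - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

# toy usage with four users
X = np.array([[12.0, 0.90, 0.05],
              [ 8.0, 0.70, 0.10],
              [15.0, 0.95, 0.02],
              [ 5.0, 0.50, 0.20]])
D_M = contribution_factors(X)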
Step 2: the method for constructing the multi-objective optimization model for new energy consumption comprises the following steps:
step 2.1: calculating the power fluctuation of the total energy output in the park; δ_t represents the fluctuation rate of the total energy output power in the park during period t.
Wherein P_{G,t} is the actual output of all energy sources in the local park, including new energy, micro-coal turbines, energy storage equipment, etc.; Δt is the time interval between two periods.
Step 2.2: from the new energy output and the power fluctuation rate delta obtained in step 2.1 t Co-determining a target response d for a user target,t . Firstly, the total load after the user responds cannot exceed the maximum output of all power supplies; subsequently, the load amount d which specifically requires the user to respond is calculated according to the limitation of the fluctuation rate target,t The process is as follows:
(1) if δ_t < δ_{drop}, where δ_{drop} is the lower limit of the power reduction rate and is a negative value, then
d_{target,t} = P_{G,t-1}δ_{drop}Δt (15)
(2) if δ_t > δ_{rise}, where δ_{rise} is the upper limit of the power increase rate and is a positive value, then
d_{target,t} = P_{G,t-1}δ_{rise}Δt (16)
(3) if δ_{drop} ≤ δ_t ≤ δ_{rise}, then
d_{target,t} = 0 (17)
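A small Python sketch of rules (15)-(17) follows; reading the combination of P_{G,t-1}, the rate limit and Δt as a product is an assumption of this sketch, since the operator in the published formulas is not fully legible, and the numeric example is illustrative only.

```python
def target_response(p_g_prev, delta_t, delta_drop, delta_rise, dt):
    """Target user response d_target,t under the fluctuation-rate limits of
    rules (15)-(17).  delta_drop < 0 is the lower limit of the power reduction
    rate, delta_rise > 0 the upper limit of the power increase rate.
    P_G,t-1 * delta * dt is an assumed reading of the original formulas."""
    if delta_t < delta_drop:
        return p_g_prev * delta_drop * dt      # output falling too fast: rule (15)
    if delta_t > delta_rise:
        return p_g_prev * delta_rise * dt      # output rising too fast: rule (16)
    return 0.0                                 # fluctuation within limits: rule (17)

# illustrative call: previous output 500 kW, fluctuation rate -0.08 per hour,
# limits -0.05 and +0.05 per hour, 1-hour interval
d_target = target_response(500.0, -0.08, -0.05, 0.05, 1.0)   # -> -25.0 (load reduction)
```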
Step 2.3: combining the user contribution factor and the potential evaluation result obtained in the step 1.3 (namely the response load adjustment quantity d of the user i,t ) Response to electricity price lambda with demand dr,t Is variable based on the target response d of the user target,t And establishing a multi-objective optimization model for smooth consumption of new energy.
The multi-objective optimization objective B comprises the new energy consumption benefit B_R, the power fluctuation term B_F and the user demand response benefit B_U. The new energy consumption benefit B_R is optimized mainly by reducing the curtailed electricity and increasing new energy generation. The power fluctuation term B_F considers the feeder power fluctuation cost caused by the output of all energy sources in the park after the micro-coal turbine is added. The user demand response benefit B_U mainly consists of the energy cost saved by the users and the response deviation penalty. The response deviation is the difference between the target response value and the actual response value and depends on the accuracy of the algorithm's evaluation of the user response; when the evaluation accuracy is low, the real-time price is set too high or too low, which affects the total response benefit.
The objective function and constraints are as follows.
max B(λ) = B_R + B_F + B_U (18)
Wherein η represents the preference weight between the two parts, in the range [0,1]; the optimization objective only considers smooth consumption when η = 0. In practice, since the frequency of user demand response is lower than the frequency of new energy power fluctuation, T_{new} and T_{dr} are used respectively to denote the numbers of periods over which the power fluctuation and the demand response are calculated. For period t, λ_{o,t} is the electricity purchase price before demand response; λ_{flu} is the power fluctuation loss cost coefficient; λ_{dr,t} is the demand response electricity price and λ_{p,t} is the response deviation penalty coefficient; λ_{dr,min} and λ_{dr,max} limit the real-time electricity price. P_{R,t} is the actual new energy output in period t, P_{Ro,t} is the predicted power before new energy adjustment in period t, and P_{Rmax,t} and P_{Rmin,t} represent the limits of new energy output in period t. d_{i,t} represents the predicted demand response of the user, which must remain within the user response capability range d_{max}. D_{Mi} is the contribution factor of user i calculated with the Mahalanobis distance within the quarter, and the parameter λ is used to control the degree of influence of the user contribution factor.
Step 3: based on the potential evaluation result and the optimization model, performing local pricing model training based on a reinforcement learning algorithm, including the following steps:
step 3.1: and (3) establishing a reinforcement learning pricing environment of the local new energy park based on the user target response quantity obtained in the step (2.2).
The external bidding price λ_{vpp,t} of the virtual power plant in the optimization model is set to the value already determined by the distribution network operator. The variables in the optimization objective mainly consider the response electricity price λ_{dr,t} inside the new energy local park and the resulting user response load adjustment d_{i,t}. The method of predicting the load adjustment at a given price has been introduced with the user demand response potential evaluation model in step 1.2; here it is described how reinforcement learning is used to explore the most profitable day-ahead pricing scheme.
Reinforcement learning is essentially an interactive learning algorithm based on the Markov decision process, used to explore the relationship between states and actions so as to obtain the maximum return. A3C (Asynchronous Advantage Actor-Critic) is a distributed algorithm in reinforcement learning whose core idea is similar to that of federated learning: the local edge devices perform the calculations involving specific user data, while the management center is responsible for updating the model with the core parameters and making the final response price decision. In terms of protecting the privacy of user data, A3C avoids the direct transmission of large amounts of data, which coincides with the federated learning pricing algorithm proposed by the invention. The number of interaction iterations is denoted by x (x = 1, 2, …, X), and the pricing process of A3C mainly involves four elements: the state S = {s_x}, the action A = {a_x}, the return R = {r_x} and the state transition probability P(s_{x+1}|s_x, a_x). In the invention, the state s_x represents information such as the demand response target, the bidding result and the internal response price at the x-th iteration, the action a_x represents the response electricity price set within the virtual power plant, the return r_x represents the benefit obtained by the virtual power plant through demand response, and the state transition probability P(s_{x+1}|s_x, a_x) indicates the probability that the state transfers from s_x to s_{x+1} after the VPP takes action a_x.
Step 3.2: and (3) based on the local reinforcement learning pricing environment established in the step (3.1), the Actor network makes a demand response electricity price decision. Input state s x Probability distribution P of each electricity price selected as response electricity price in output range of Actor network and accordingly gives demand response price lambda in virtual power plant dr,t
Step 3.3: based on the response potential evaluation result of the step 1.2 and the optimization model of the step 2.3, the Critic network evaluates the electricity price decision action of the step 3.2, and guides the decision action according to an optimization target so as to improve the income and speed of the decision of the Actor network.
Critic network output desired V(s) for which decision making using current Actor network can yield benefits x ) The invention uses the optimized objective function to measure the income situation of the pricing decision.
V(s x )=E[R x |s=s x ] (22)
Wherein E [ ] represents mathematical expectation, gamma is a discount factor, and the value range is [0,1].
Since V(s) is directly calculated x ) There are difficulties in that, in practical algorithmic calculations, time difference errors (TD-error) are commonly used to evaluate the V(s) output by the Critic network x ) And the difference value from the true value is that the training target of the Critic network is that the difference value is reduced to approach 0. The calculation formula of TD-error is shown below.
After the demand response price λ_dr,t inside the VPP is obtained from the Actor network, the LSTM-based user response potential evaluation model of step 1 is used to predict the users' load adjustment d_t, the benefit r_x is calculated with the objective function of step 2, and the next state s_{x+1} is obtained. The Critic network then uses s_x and s_{x+1} to calculate the outputs V(s_x) and V(s_{x+1}), from which the TD-error value δ of this decision is obtained.
δ = r_x + γV(s_{x+1}) − V(s_x)  (24)
where δ is the TD-error.
Step 3.4: according to the evaluation result of step 3.3, judge the nature of the pricing action and adjust the future pricing direction.
The expected benefit V(s_x) influences the electricity price decision behavior of the subsequent Actor network and directly affects its training process. For each day-ahead decided electricity price a_x and the response benefit r_x it obtains: if r_x is higher than the Critic network's expectation of the benefit (V(s_x) − γV(s_{x+1})), the corresponding δ is positive and the current electricity price decision action is an effective positive experience for training the Actor network; otherwise, the action is regarded as a negative experience for the Actor network training and is strategically avoided in the subsequent pricing process.
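As a minimal numeric sketch of equation (24) and the sign test described above (all numbers below are illustrative, not taken from the patent):

```python
def td_error(r_x: float, v_sx: float, v_sx1: float, gamma: float = 0.9) -> float:
    """Equation (24): delta = r_x + gamma * V(s_{x+1}) - V(s_x)."""
    return r_x + gamma * v_sx1 - v_sx

# If the realized benefit exceeds the Critic's expectation, delta > 0 and the
# pricing action counts as a positive experience; otherwise it is avoided later.
delta = td_error(r_x=120.0, v_sx=100.0, v_sx1=95.0)
print("delta =", delta, "->", "positive" if delta > 0 else "negative", "experience")
```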
Step 3.5: update the gradient parameters of the local reinforcement learning pricing model according to the TD-error value δ calculated in step 3.3.
The local model performs gradient iterative calculation directly on the local user response data and uploads the gradient update results to the global model, where parameter aggregation is completed; in this way the virtual power plant management center updates the global model without directly touching the user data of each site. The detailed global model parameter aggregation process is described in step 4; here only the gradient iteration process of the local pricing model is described.
The loss function of the Critic network is denoted loss_critic and its gradient parameters are denoted ε; their calculation and iteration formulas can be expressed as:
loss_critic = δ²  (25)
where loss_critic is the loss function of the Critic network; ε is the gradient parameter of the Critic network; β is the learning rate. The training goal of the Critic network is to minimize the TD-error.
The loss function of the Actor network is denoted loss_actor and its gradient parameters are denoted θ; their calculation and iteration formulas can be expressed as:
loss_actor = δ·log P(a_x | s_x, θ) + c·H(P(s_x, θ))  (27)
where loss_actor is the loss function of the Actor network; θ is the gradient parameter of the Actor network; α is the learning rate; H is the entropy of the probability distribution P and c is its coefficient.
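A minimal sketch of the two loss functions of equations (25) and (27), written with the same sign conventions as the patent; the numpy usage and the value of the entropy coefficient are assumptions for illustration only:

```python
import numpy as np

def critic_loss(delta: float) -> float:
    """Equation (25): loss_critic = delta^2 (the Critic minimizes the TD-error)."""
    return delta ** 2

def actor_loss(delta: float, price_probs: np.ndarray, action_idx: int, c: float = 0.01) -> float:
    """Equation (27): loss_actor = delta * log P(a_x | s_x, theta) + c * H(P).
    The entropy term H keeps exploration over the candidate response prices."""
    entropy = -np.sum(price_probs * np.log(price_probs + 1e-12))
    return delta * np.log(price_probs[action_idx] + 1e-12) + c * entropy

probs = np.array([0.1, 0.6, 0.3])  # illustrative distribution over candidate prices
print(critic_loss(0.5), actor_loss(0.5, probs, action_idx=1))
```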
Step 3.6: repeat steps 3.2-3.5 until the training of the reinforcement learning pricing model of the local new energy park is completed.
Step 4: based on the trained local pricing models, perform joint training of the local pricing algorithms to realize a global day-ahead electricity price decision that takes user privacy protection into consideration, comprising the following steps:
Step 4.1: the virtual power plant management center obtains load shedding information, including the shedding period, the required shedding load amount and the reference response electricity price, from the upper-layer operator.
Step 4.2: the central server issues parameters. In the parameter issuing stage, the central server initializes the model parameters and delivers the original parameters to each local server; when the global model is updated, the central server broadcasts the updated global parameters of the pricing model for the local servers to download. Before the parameters are issued, the central server applies PHE semi-homomorphic encryption to them.
The key feature of the PHE semi-homomorphic encryption algorithm is its support for additive homomorphism and scalar-multiplication homomorphism. Assuming the results of homomorphically encrypting u and v are [[u]] and [[v]] respectively, the following rules hold for PHE:
Additive homomorphism: Dec_sk([[u]] ⊕ [[v]]) = Dec_sk([[u + v]])  (29)
Scalar-multiplication homomorphism: Dec_sk([[u]] ⊙ m) = Dec_sk([[u·m]])  (30)
where Dec denotes the decryption function; sk denotes the key used for decryption; ⊕ indicates that multiplication can be performed on the ciphertexts; ⊙ indicates taking the m-th power of the ciphertext content.
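The patent does not name a concrete PHE implementation; as one possible illustration, the Paillier cryptosystem (available, for example, in the open-source python-paillier package, imported as `phe`) exhibits exactly the two homomorphic properties written above:

```python
# pip install phe  -- python-paillier, one possible PHE choice; not mandated by the patent.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

u, v, m = 0.37, -0.12, 3                      # e.g. two gradient components and a scalar
enc_u, enc_v = public_key.encrypt(u), public_key.encrypt(v)

# Additive homomorphism, cf. formula (29): Dec([[u]] (+) [[v]]) = u + v
assert abs(private_key.decrypt(enc_u + enc_v) - (u + v)) < 1e-9

# Scalar-multiplication homomorphism, cf. formula (30): Dec([[u]] (.) m) = u * m
assert abs(private_key.decrypt(enc_u * m) - u * m) < 1e-9
```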
Step 4.3: using the model parameters of step 4.2, each local server carries out local pricing model training according to step 3.6 and uploads the iterated gradient parameters, encrypted, to the global central server. After each local model decrypts the received global parameters, local training of the virtual power plant's day-ahead demand response pricing optimization model is carried out directly according to step 3, using the demand response data of the power users contained in the local unit. In this step the local servers do not share their data sets but train the local models directly with their own data, so leakage of users' private data is avoided.
During the training of the local model, parameters such as the gradients are updated and calculated; the model parameters after training are shown in the following formula.
where ω_(n)^s is the parameter of the n-th local model at the (s+1)-th iteration round of the global model; η is the learning rate of the algorithm iteration; ∇ denotes the gradient-descent calculation function.
Step 4.4: the central server aggregates the local model parameters uploaded in step 4.3 and calculates and updates the global model gradient parameters. When the local training reaches the upper limit of rounds, each local server applies PHE semi-homomorphic encryption to the parameter information, such as gradients, obtained after local model training and then uploads it to the central server, so that the key training progress of the global model is achieved without directly transmitting local data outwards. The central server aggregates the local model parameters according to the improved weighted-average algorithm to generate the global parameters, encrypts them and sends them to each local server. This parameter co-training process, an encrypted horizontal federated learning training framework based on an improved weighted average, is illustrated in fig. 2.
Assume that N units are managed inside the virtual power plant, each unit has k_n (n = 1, 2, …, N) users participating in the demand response, and the total number of users over all units is K. Then, according to the gradient averaging algorithm, when the central server performs the (s+1)-th iteration, the model gradient weighting of the n-th local server in the FedAvg framework is:

ω^{s+1} = Σ_{n=1}^{N} (k_n / K) · ω_(n)^s  (32)
where k_n is the number of users inside the n-th unit that participate in the demand response load adjustment; K is the total number of users and N is the total number of units; s is the iteration round of the model; ω^s is the global model weight update result after the s-th iteration of the central server; ω_(n)^s is the new local model weight obtained by the n-th user unit performing gradient descent with its local data.
In actual calculation, the number of users and the data quality differ between units; directly taking a weighted average of the gradients easily amplifies the influence of low-quality small users polluting the model parameters, and makes it hard to distinguish the contributions of high-quality and poor-quality user data in the response process, which greatly reduces the overall optimality of the pricing model. The invention therefore adopts an improved weighted-average federated algorithm for model gradient aggregation. The improved weighted-average federated algorithm is developed from the traditional federated averaging algorithm by adding, on the original basis, a weight that measures data accuracy and calculation quality. Its core idea is to divide the training sample data into test samples and original training samples: the original training samples perform pricing training on the initial global model according to the traditional weighted-average algorithm, while the test samples measure the accuracy of the model training of the previous step and are used to gauge data quality and training effect. The training accuracy q_n is calculated as:

q_n = C_n / X_n  (33)
where q_n represents the training accuracy of the n-th user unit; C_n represents the number of samples for which the model training result on the n-th local server is correct when compared with the test samples; X_n represents the total number of test samples.
After the model training accuracy of each local server is obtained, the weighting factor that only considers the number of users in a unit is replaced by a weighting factor that takes the training accuracy into account, so the improved weighted-average federated algorithm can be redefined; the calculation process is shown in the following formula.

ω^{s+1} = Σ_{n=1}^{N} α_(n)^{s+1} · ω_(n)^s  (34)
where ω^{s+1} is the global model parameter updated after the s-th iteration round, which is also the parameter broadcast to each local model at the beginning of the (s+1)-th round; ω_(n)^s is the parameter of the n-th local model in the s-th iteration round; α_(n)^{s+1} is the training quality factor of the n-th unit in the (s+1)-th iteration round of the global model; q_(n)^{s+1} is the training accuracy obtained after testing that unit; l is an adjustable hyper-parameter used to adjust the impact of the training quality factor. Throughout the iteration process the sum of all α_(n)^{s+1} equals 1, so the convergence of equation (34) in the improved algorithm can be guaranteed as long as equation (32) in the algorithm converges.
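A minimal numpy sketch of the quality-aware aggregation of formulas (32)-(34); the exact functional form of the training quality factor α_(n)^{s+1} is not reproduced in this text, so the weighting used below (α proportional to k_n·q_n^l, normalized to sum to 1) is an assumed reading of the symbol description, and all numbers are illustrative:

```python
import numpy as np

def improved_weighted_average(local_params, k, q, l=1.0):
    """Aggregate local model parameters with quality-aware weights.

    local_params: one parameter vector per local server (np.ndarray each)
    k: number of demand-response users per unit
    q: test-set training accuracy q_n = C_n / X_n per unit, cf. formula (33)
    l: adjustable hyper-parameter weighting the quality factor
    """
    k, q = np.asarray(k, dtype=float), np.asarray(q, dtype=float)
    alpha = k * q ** l                 # assumed form of the quality factor
    alpha = alpha / alpha.sum()        # quality factors sum to 1, as stated above
    return sum(a * w for a, w in zip(alpha, local_params))

# Three local servers (units A, B, C); parameters, user counts and accuracies are made up.
params = [np.array([0.20, 0.50]), np.array([0.30, 0.40]), np.array([0.10, 0.60])]
print(improved_weighted_average(params, k=[300, 280, 320], q=[0.95, 0.90, 0.97], l=2.0))
```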
Step 4.5: judge whether the global pricing model training is completed; if yes, end; otherwise, encrypt the latest global model parameters and send them to each local server. Repeat steps 4.2-4.4 until the global pricing model training is completed, obtaining a day-ahead electricity price scheme that takes new energy consumption optimization into account. The flow of the deep reinforcement learning based demand response electricity price decision algorithm for new energy consumption users is shown in fig. 3.
Step 4.6: apply the pricing model trained in step 4.5 to make the global pricing decision for the virtual power plant's day-ahead response; report the decided electricity price and the expected response to the upper-layer distribution network power trading center; the power trading center organizes centralized bidding among the virtual power plants and publishes the result information; the virtual power plant adjusts its day-ahead response pricing strategy according to the bidding result and reports it again; decision, bidding and optimization are repeated until the final day-ahead response electricity price decision scheme considering new energy consumption inside the virtual power plant is determined.
Step 4.7: after the final response electricity price scheme is determined, the virtual power plant management center broadcasts the electricity price scheme obtained in step 4.6 to each internal unit, incentivizing users to respond in the day-ahead period and adjust their load energy use; according to the users' actual response performance on the following day, the local servers update their databases and calculate the model errors; the central server aggregates the error situation of each unit, updates the global model parameters, and participates in the actual response settlement of the upper-layer power trading center. The above demand response day-ahead price decision flow considering user privacy protection is shown in fig. 4.
To verify the effectiveness of the method provided by the invention, the following case study is carried out:
(1) Scenario setting
To verify the effectiveness of the proposed federated learning based demand response day-ahead pricing method that takes user privacy into account, this section uses a city's power demand response dataset from May 17, 2020 to May 16, 2021 for case analysis. The dataset comes from a virtual power plant demonstration project; there are 900 users, divided into groups A, B and C according to their energy consumption and response behavior characteristics and directly managed by different local servers. The dataset was divided into training, validation and test sets in a 7:1:2 ratio. Tests were carried out in the following three scenarios.
Scenario 1: the virtual power plant does not organize demand response, and load energy use uniformly follows a fixed electricity price standard of 0.75 yuan/kWh.
Scenario 2: the virtual power plant sets time-of-use electricity prices. The original peak, flat and valley price standards are 0.95, 0.70 and 0.65 yuan/kWh respectively.
Scenario 3: the virtual power plant performs day-ahead demand response pricing for the 24 hours of the pricing period, with a time granularity of 24 points per day, and the day-ahead pricing range is [0.5, 1.25] yuan/kWh.
In the tests, the peak periods are set to 9:00-14:00 and 19:00-23:00, the flat periods to 07:00-09:00 and 14:00-19:00, and the valley periods to 23:00-7:00. When day-ahead demand response peak clipping is carried out, the upper peak-load threshold is set to 4200 kW and the valley-load threshold to 1300 kW.
(2) Parameter setting
The invention uses the LSTM algorithm to build the model that simulates users' demand response behavior habits, and a corresponding model is built for each response period. In practical application the input training data of each period's model differ, but the basic parameters are kept consistent; the specific settings are shown in Table 1. The A3C algorithm in reinforcement learning is used to make the optimal electricity price decisions; the invention divides the valid user data in the virtual power plant into three groups, so the number of edge trainers is 3. The specific parameter settings are shown in Table 2.
Table 1 basic parameter settings of LSTM algorithm
Table 2 A3C algorithm parameter settings
To ensure the privacy and security of user data, the invention trains the electricity price decision model with an improved weighted-average horizontal federated learning framework. Under this framework, after finishing a round of local training, the 3 local reinforcement learning models upload their key gradient parameters to the control center, which performs a unified improved weighted average to obtain the global gradient update result, thereby optimizing the global model training effect without transmitting data directly. The key parameters of the federated learning framework are shown in Table 3.
Table 3 federal learning framework parameter settings
The algorithm is written in Python 3.7 and implemented on the TensorFlow 2.2 platform with Keras version 2.3.1. It runs on a computer with an AMD Ryzen 7 5800H CPU at 3.20 GHz and 16 GB of memory. The federated algorithm is deployed on a single machine.
(3) Load energy response and pricing
The invention selects a typical day for the response pricing study. The users' load energy use in the three scenarios is shown in fig. 5. As can be seen from fig. 5, the original user load has two daily consumption peaks, at 11:00-15:00 and 19:00-21:00, while consumption shows an obvious valley during 0:00-5:00; the maximum load peak-valley difference reaches 2130 kW. On that day the virtual power plant management center receives load peak-clipping response demands for 7 peak periods, on which basis day-ahead pricing optimization is carried out; the day-ahead pricing range is [0.5, 1.25] yuan/kWh with an adjustment step of 0.05 yuan/kWh.
Time-of-use pricing and day-ahead pricing are applied to the second and third scenarios respectively, and users are organized to carry out demand response load transfer. After the adjustment, the electricity prices and the corresponding user load adjustments of each period are shown in figs. 6 and 7 respectively. From the response results, although different prices are set for the peak, flat and valley periods in the second scenario, the attribution of specific periods is fixed and cannot be adjusted according to the actual load energy use of the day. Therefore, time-of-use pricing can cut part of the peak load, but on the one hand the load peak is not transferred sufficiently and some periods still exceed the limit; on the other hand, the peak-valley difference in the second scenario is 2040 kW, leaving considerable adjustable room.
The method provided by the invention is applied to the third scenario, and day-ahead electricity price decision optimization targeted at the users' response behavior characteristics in each period is carried out. The algorithm sets different response prices and distributes them to users on the previous day for overall energy use optimization. Because the price in peak periods rises significantly, users tend to transfer their consumption to adjacent or valley periods, satisfying their electricity demand at a lower price while the total consumption does not decrease. The electricity price decision in this process requires more computation than in the first two scenarios, but the period-by-period price formulation is more flexible. As the comparison of load adjustments in fig. 7 shows, the method used in the third scenario can stimulate users' response potential to a greater extent in the same period and increase the flexible load adjustment, giving a better regulation effect on load energy use. Finally, the third scenario controls the load peak-valley difference to 1290 kW, a good peak-clipping result.
(4) Algorithm performance analysis
The invention adjusts, by controlling the value of the parameter γ, the degree to which users' historical participation in demand response is referenced during model optimization, and obtains the response accuracy under different training settings; the results are shown in fig. 8.
As can be seen from fig. 8, in terms of the accuracy of users' day-ahead response, the response error of the decentralized training model is large: the average deviations of the three groups are concentrated in the 7%-9% range overall, i.e. the response accuracy is concentrated around 90%-93%. The model accuracy of the federated training framework provided by the invention is concentrated around 95%-97%, the users' overall response deviation is small, and the day-ahead response effect is good. The reason is that in actual decentralized training the amount of data in a single group is limited, the randomness of users' actual behavior is hard to handle well for a single user type, and the adaptability to users with various behavior habits is insufficient, which limits the decentralized training effect. Federated training achieves an overall optimized training effect through the improved weighting of parameters, and can formulate the day-ahead electricity price strategy more accurately for users' response behavior habits, thereby improving the day-ahead response effect.
On the other hand, the influence of the reference weight of the user contribution factor on the response effect is observed by comparing the response accuracy corresponding to different γ values. The invention uses γ to control the influence weight of the user's historical contribution factor: the larger γ is, the greater the influence of the contribution factor in the optimization objective, i.e. the more the virtual power plant considers increasing the benefit of high-response-quality users in the pricing process. As shown in fig. 8, compared with γ = 0 (the user's historical response contribution is not considered at all), increasing γ improves the response effect to a certain extent: the response accuracy rises slightly and the error distribution gradually concentrates, i.e. user response performance becomes more stable. However, when γ is too large, e.g. close to 1, the model training effect instead tends to decline, mainly because over-reliance on historical experience severely reduces the benefits of users whose previous response effect was poor, forming a vicious circle and to some extent failing to stimulate the load adjustment enthusiasm of other users. The tests show that when γ is around 0.8 the user response accuracy is high, the response deviation is stable, and the overall effect is best.
In terms of the algorithm's transmitted data volume and privacy performance, if the federated learning framework were not used, each local server would directly upload the original user data to the central server, and the data volume involved would mainly be the product of the number of training days, the daily load acquisition points and the number of users. In the federated training process, the data volume the proposed algorithm needs to transmit is mainly determined by the number of model parameters and the number of global update rounds. The number of global optimization rounds is set to 200, but the model parameters involve the LSTM user behavior models for 24 periods and the deep reinforcement learning decision model, and a single Actor network in the reinforcement learning part alone contains 100 × 50 node gradient parameters, so the total traffic is large. However, the process keeps the load energy data from leaving the local site and strengthens the protection of users' privacy and security.
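A back-of-envelope comparison of the two transmission schemes described above, using the example settings mentioned in this section (365 training days, 24 daily load points, 900 users; 200 global rounds, 24 period models, a 100 × 50 Actor gradient block); the counts are illustrative only and ignore encryption overhead and the other network parameters:

```python
# Rough count of transmitted values under the two schemes (illustrative only).
days, points_per_day, users = 365, 24, 900
raw_upload = days * points_per_day * users                 # centralizing raw load data

rounds, period_models, actor_params = 200, 24, 100 * 50
federated_upload = rounds * period_models * actor_params   # gradient traffic, Actor part only

print(f"raw data values uploaded:      {raw_upload:,}")
print(f"federated parameter values:    {federated_upload:,}")
# The federated traffic is larger here, but the raw load data never leave the local site.
```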
The foregoing is merely illustrative embodiments of the present invention, and the present invention is not limited thereto, and any changes or substitutions that may be easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (10)

1. The new energy consumption electricity price decision method considering user privacy protection is characterized by comprising the following steps of:
step 1: establishing a user demand response potential evaluation model, and evaluating the user demand response potential according to the user demand response potential evaluation model to obtain a potential evaluation result, wherein the potential evaluation result comprises a demand response load adjustment amount and a contribution factor for evaluating the participation of the user in demand response;
step 2: constructing a multi-objective optimization model for new energy consumption;
step 3: based on the potential evaluation result and the multi-objective optimization model, training a local pricing model based on a reinforcement learning algorithm;
step 4: based on the trained local pricing model, joint training of a local pricing algorithm is carried out, and global day-ahead electricity price decision taking user privacy protection into consideration is realized.
2. The new energy consumption electricity price deciding method considering user privacy protection according to claim 1, wherein the step 1 comprises the steps of:
step 1.1: considering the data requirements for predicting the user's demand response load adjustment amount, and organizing the dataset required for training the potential evaluation model;
step 1.2: based on the dataset organized in step 1.1, establishing an LSTM-based user demand response potential evaluation model, and predicting the demand response load adjustment amount d_i,t of a user under a given electricity price;
step 1.3: according to the demand response load adjustment amount d_i,t of step 1.2 and the subsequent actual demand response data of users, solving the contribution factor D_M for evaluating user participation in demand response.
3. The new energy consumption price decision method considering user privacy protection according to claim 2, wherein step 1.1 specifically comprises:
when simulating the response behavior of the user, dividing a day into T periods and correspondingly establishing T LSTM networks, denoted L_j (j = 1, 2, …, T) respectively; the j-th period on the n-th day is denoted t = nT + j, and the network for period t is trained with data of the corresponding period in historical dates; the input data contain the price λ_dr,t of the current period and the response prices and response load adjustments of the previous m periods:
I_t = {(λ_dr,t-m, d_t-m), (λ_dr,t-m+1, d_t-m+1), …, (λ_dr,t-1, d_t-1), λ_dr,t}  (1)
in training, for each LSTM model the input dataset of the network is {I_1, I_2, …, I_{t-1}} and the output is the user response dataset {d_1, d_2, …, d_{t-1}}; after training is completed, inputting I_t of the target period t yields the load adjustment d_t of the user's simulated response in that period.
4. The new energy consumption price decision method considering user privacy protection according to claim 3, wherein step 1.2 specifically comprises:
the specific training process of the LSTM model is completed by an input gate, a forget gate and an output gate; the input gate combines the current input content I_t and the output d_{t-1} of the previous period to update information and selectively store the updated part in the cell state c_t; the forget gate determines the degree to which the cell state is forgotten or updated and decides which information is forgotten; the output gate calculates, based on the updated cell state, the output at this moment, i.e. the user's response load adjustment; the above process is expressed as:
f_t = σ(W_f[h_{t-1}, I_t] + b_f)  (2)
z_t = tanh(W_z[h_{t-1}, I_t] + b_z)  (3)
i_t = σ(W_i[h_{t-1}, I_t] + b_i)  (4)
c_t = f_t·c_{t-1} + i_t·z_t  (5)
o_t = σ(W_o[h_{t-1}, I_t] + b_o)  (6)
d_t = o_t·tanh(c_t)  (7)
wherein W_i, W_z, W_f, W_o respectively denote the weight matrices of the input gate, the candidate state, the forget gate and the output gate connecting the input with the LSTM memory unit, and b_i, b_z, b_f, b_o respectively denote the corresponding bias vectors of the input gate, the candidate state, the forget gate and the output gate of the LSTM memory unit;
in the actual training process, a fully connected layer is added after the LSTM neuron layer to process the final output; the historical dataset is divided into a test set and a validation set, the root mean square error RMSE is selected as the loss function fed back to the LSTM model, and the mean absolute percentage error MAPE is used as the error evaluation of the LSTM prediction; their calculation formulas are respectively:
wherein d and d̂ represent the actual and predicted values of the user response, respectively.
5. The new energy consumption price decision method considering user privacy protection according to claim 4, wherein step 1.3 specifically comprises:
the main components of the contribution factor include the response amount, the response participation rate and the compliance situation, wherein the response amount represents the load the user actually reduces or adjusts during the peak response period; the response participation rate is the proportion of the times the user actually participates in responses among all responses the user is asked to make in one quarter; the compliance situation measures the deviation between the user's actual load reduction and the promised load reduction:
wherein γ_i represents the response participation rate of the i-th user in a quarter and f_i represents the number of times the i-th user participates in demand response; Δd_i is the compliance measurement index of user i, d_i,t is the predicted demand response of user i in period t, d_Ac,i,t is the actual response load, d_avg,t is the average response of all users in the virtual power plant at time t, and a default-penalty strength parameter is also applied;
the process of calculating the contribution factor is as follows:
X_i = (d_i, γ_i, Δd_i)  (13)
D_Mi = sqrt((X_i − μ)^T Σ^(−1) (X_i − μ))  (14)
wherein D_Mi is the contribution factor of user i calculated by the Mahalanobis distance within the quarter; X_i is the calculation vector of user i, consisting of the response amount, the response participation rate and the compliance situation; μ is the sample mean vector of all users; Σ^(−1) is the inverse of the covariance matrix of the multidimensional random variable.
6. The new energy consumption electricity price deciding method considering user privacy protection as claimed in claim 1, wherein the step 2 comprises the steps of:
step 2.1: calculating the power fluctuation of the total energy output in the park, with δ_t representing the total energy output power fluctuation rate in the park in period t:
wherein P_G,t is the actual output of all energy sources in the local park, including the new energy, the micro gas turbines and the energy storage equipment; Δt is the time interval between two periods;
step 2.2: jointly determining the target response amount d_target,t of users from the new energy output and the power fluctuation rate δ_t obtained in step 2.1;
Step 2.3: combining the user contribution factor obtained in the step 1.3 and the response load adjustment quantity d of the user i,t Response to electricity price lambda with demand dr,t Is variable based on the target response d of the user target,t And establishing a multi-objective optimization model for smooth consumption of new energy.
7. The new energy consumption price decision method considering user privacy protection according to claim 6, wherein step 2.2 specifically comprises:
firstly, the total load after users respond cannot exceed the maximum output of all power supplies; subsequently, the load amount d_target,t that users are specifically required to respond with is calculated according to the fluctuation rate limits; the specific process is as follows:
(1) if δ_t < δ_drop, where δ_drop is the lower limit of the power reduction rate and is a negative value, then
d_target,t = P_G,t-1 · δ_drop · Δt  (15)
(2) if δ_t > δ_rise, where δ_rise is the upper limit of the power increase rate and is a positive value, then
d_target,t = P_G,t-1 · δ_rise · Δt  (16)
(3) if δ_drop ≤ δ_t ≤ δ_rise, then
d_target,t = 0  (17).
8. The new energy consumption price decision method considering user privacy protection as claimed in claim 7, wherein in step 2.3, the multi-objective optimization objective B of the multi-objective optimization model includes the new energy consumption benefit B_R, the power fluctuation B_F and the user demand response benefit B_U; the new energy consumption benefit B_R is optimized mainly by reducing curtailed electricity and increasing new energy generation; the power fluctuation B_F considers the feeder power fluctuation cost caused by the output of all energy sources in the park after the micro gas turbine is added; the user demand response benefit B_U mainly comprises two parts, the energy cost saved by users and the response deviation penalty, wherein the response deviation is the difference between the target response value and the actual response value;
the objective function and constraint conditions of the multi-objective optimization model are as follows:
max B(λ) = B_R + B_F + B_U  (18)
wherein η represents the preference weight between the two parts, in the range [0,1]; when η = 0 the optimization objective only considers smooth consumption; T_new and T_dr respectively represent the numbers of periods over which the power fluctuation and the demand response are calculated; λ_o,t is the electricity purchase price before demand response in period t; λ_flu is the power fluctuation cost-loss coefficient; λ_dr,t is the demand response electricity price and λ_p,t is the response deviation penalty coefficient; λ_dr,min and λ_dr,max limit the real-time electricity price; P_R,t is the actual new energy output in period t, P_Ro,t is the predicted output before the new energy is adjusted in period t, and P_Rmax,t, P_Rmin,t represent the limits of the new energy output in period t; d_i,t represents the user's predicted demand response amount, which must lie within the user's response capability range d_max; D_Mi is the contribution factor of user i calculated by the Mahalanobis distance within the quarter, and the parameter γ is used to control the degree of influence of the user contribution factor.
9. The new energy consumption electricity price deciding method considering user privacy protection as claimed in claim 6, wherein the step 3 comprises the steps of:
step 3.1: based on the user target response obtained in step 2.2, establishing the reinforcement learning pricing environment of the local new energy park; specifically, the external bidding price λ_vpp,t of the virtual power plant in the multi-objective optimization model takes the value determined by the distribution network operator, and the variables in the optimization objective mainly consider the response electricity price λ_dr,t inside the new energy local park and the resulting user response load adjustment d_i,t; reinforcement learning is performed with the A3C distributed algorithm, where X denotes the number of interaction iterations (x = 1, 2, …, X), and the A3C pricing process mainly involves four elements: the state S = {s_x}, the action A = {a_x}, the return R = {r_x} and the state transition probability P(s_{x+1} | s_x, a_x); the state s_x represents information such as the demand response target, the bidding result and the internal response price at the x-th iteration; the action a_x represents the response electricity price formulated inside the virtual power plant; the return r_x represents the benefit obtained by the virtual power plant through demand response; and the state transition probability P(s_{x+1} | s_x, a_x) is the probability that, when the VPP takes action a_x, the state transfers from s_x to s_{x+1};
step 3.2: based on the local reinforcement learning pricing environment established in step 3.1, the Actor network makes the demand response electricity price decision: given the input state s_x, the Actor network outputs the probability distribution P of each candidate electricity price in the allowed range being selected as the response price, and accordingly gives the demand response price λ_dr,t inside the virtual power plant;
Step 3.3: based on the response potential evaluation result of the step 1.2 and the multi-objective optimization model of the step 2.3, the Critic network evaluates the electricity price decision action of the step 3.2, and guides the decision action according to the optimization target so as to improve the income and speed of the decision of the Actor network;
the Critic network outputs V(s_x), the expected benefit obtainable when decisions are made with the current Actor network, and the benefit of the pricing decision is measured with the optimization objective function:
V(s_x) = E[R_x | s = s_x]  (22)
wherein E[·] denotes the mathematical expectation and γ is a discount factor whose value range is [0,1];
the temporal difference error TD-error is used to evaluate the difference between the V(s_x) output by the Critic network and the true value; specifically, after the demand response price λ_dr,t inside the VPP is obtained from the Actor network, the user response potential evaluation model is used to predict the users' load adjustment d_t, the benefit r_x is calculated with the objective function, and the next state s_{x+1} is obtained; the Critic network then uses s_x and s_{x+1} to calculate the outputs V(s_x) and V(s_{x+1}), from which the TD-error value δ of this decision is obtained:
δ = r_x + γV(s_{x+1}) − V(s_x)  (24)
step 3.4: judging the property of the pricing action and adjusting the future pricing direction according to the evaluation result in the step 3.3;
for each day-ahead decided electricity price a_x and the response benefit r_x it obtains: if r_x is higher than the Critic network's expectation of the benefit (V(s_x) − γV(s_{x+1})), the corresponding δ is positive and the current electricity price decision action is an effective positive experience for training the Actor network; otherwise, the action is regarded as a negative experience for the Actor network training and is strategically avoided in the subsequent pricing process;
step 3.5: according to the TD-error value δ calculated in step 3.3, updating the gradient parameters of the local reinforcement learning pricing model;
the loss function of the Critic network is denoted loss_critic and its gradient parameters are denoted ε; the calculation and iteration formulas are expressed as:
loss_critic = δ²  (25)
wherein loss_critic is the loss function of the Critic network; ε is the gradient parameter of the Critic network; β is the learning rate, and the training goal of the Critic network is to minimize the TD-error;
the loss function of the Actor network is denoted loss_actor and its gradient parameters are denoted θ; the calculation and iteration formulas are expressed as:
loss_actor = δ·log P(a_x | s_x, θ) + c·H(P(s_x, θ))  (27)
wherein loss_actor is the loss function of the Actor network; θ is the gradient parameter of the Actor network; α is the learning rate; H is the entropy of the probability distribution P and c is its coefficient;
step 3.6: and (3) repeating the steps 3.2-3.5 until the training of the reinforcement learning pricing model of the local new energy park is completed.
10. The new energy consumption electricity price deciding method considering user privacy protection as claimed in claim 9, wherein the step 4 comprises the steps of:
step 4.1: the virtual power plant management center obtains load reduction information from an upper layer operator, wherein the load reduction information comprises a reduction period, load quantity required to be reduced and reference response electricity price;
step 4.2: the central server issues parameters: in the parameter issuing stage, the central server initializes the model parameters and delivers the original parameters to each local server; when the global model is updated, the central server broadcasts the updated global parameters of the pricing model for the local servers to download; before issuing, the central server applies PHE semi-homomorphic encryption to the parameters;
the key feature of the PHE semi-homomorphic encryption algorithm is its support for additive homomorphism and scalar-multiplication homomorphism; assuming the results of homomorphically encrypting u and v are [[u]] and [[v]], the following rules hold for PHE:
additive homomorphism: Dec_sk([[u]] ⊕ [[v]]) = Dec_sk([[u + v]])  (29)
scalar-multiplication homomorphism: Dec_sk([[u]] ⊙ m) = Dec_sk([[u·m]])  (30)
where Dec represents the decryption function; sk represents the key used for decryption; ⊕ indicates that multiplication can be performed on the ciphertexts; ⊙ indicates taking the m-th power of the ciphertext content;
step 4.3: using the model parameters of step 4.2, each local server carries out local pricing model training according to step 3.6 and uploads the iterated gradient parameters, encrypted, to the global central server; after decrypting the received global parameters, each local model directly carries out local training of the virtual power plant's day-ahead demand response pricing optimization model according to step 3, using the demand response data of the power users contained in its local unit; the local servers do not share data sets but train the local models directly with their own data, avoiding leakage of users' private data;
in the training process of the local model, parameters such as the gradients are updated and calculated, and the model parameters after training are shown in the formula:
wherein ω_(n)^s is the parameter of the n-th local model at the (s+1)-th iteration round of the global model; η is the learning rate of the algorithm iteration; ∇ denotes the gradient-descent calculation function;
step 4.4: the central server aggregates the local model parameters uploaded in step 4.3 and calculates and updates the global model gradient parameters: when the local training reaches the upper limit of rounds, each local server applies PHE semi-homomorphic encryption to the parameter information, such as gradients, obtained after local model training and uploads it to the central server, realizing the key training progress of the global model without directly transmitting local data outwards; the central server aggregates the local model parameters according to the improved weighted-average algorithm, generates the global parameters, encrypts them and sends them to each local server;
assuming that N units are managed inside the virtual power plant, each unit has k_n users participating in the demand response, where n = 1, 2, …, N, and the total number of users over all units is K; according to the gradient averaging algorithm, when the central server performs the (s+1)-th iteration, the model gradient weighting of the n-th local server in the FedAvg framework is as follows:
wherein k_n is the number of users inside the n-th unit that participate in the demand response load adjustment; K is the total number of users and N is the total number of units; s is the iteration round of the model; ω^s is the global model weight update result after the s-th iteration of the central server; ω_(n)^s is the new local model weight obtained by the n-th user unit performing gradient descent with its local data;
model gradient aggregation is carried out with an improved weighted-average federated algorithm, which adds, on the original basis, a weight measuring data accuracy and calculation quality; the training sample data are divided into test samples and original training samples; the original training samples perform pricing training on the original global model according to the traditional weighted-average algorithm, while the test samples measure the accuracy of the model training of the previous step and are used to gauge data quality and model training effect; the training accuracy q_n is calculated as follows:
wherein q_n represents the training accuracy of the n-th user unit; C_n represents the number of samples for which the model training result on the n-th local server is correct when compared with the test samples; X_n represents the total number of test samples;
after obtaining the model training accuracy of each local server, the weighting factor that only considers the number of users in a unit is replaced by a weighting factor that takes the training accuracy into account, and the improved weighted-average federated algorithm is redefined; the calculation process is as follows:
wherein ω^{s+1} is the global model parameter updated after the s-th iteration round, which is also the parameter broadcast to each local model at the beginning of the (s+1)-th round; ω_(n)^s is the parameter of the n-th local model in the s-th iteration round; α_(n)^{s+1} is the training quality factor of the n-th unit in the (s+1)-th iteration round of the global model; q_(n)^{s+1} is the training accuracy obtained after testing that unit; l is an adjustable hyper-parameter used to adjust the influence of the training quality factor; since the sum of all α_(n)^{s+1} equals 1 throughout the iteration process, the convergence of formula (34) in the improved algorithm can be guaranteed as long as formula (32) in the algorithm converges;
step 4.5: judging whether the global pricing model training is completed; if yes, ending, otherwise encrypting the latest global model parameters and sending them to each local server, and repeating steps 4.2-4.4 until the global pricing model training is completed, whereby a day-ahead electricity price scheme considering new energy consumption optimization is obtained;
step 4.6: applying the pricing model trained in the step 4.5 to carry out global pricing decision of day-ahead response of the virtual power plant; reporting the decision electricity price and the expected response condition to an upper power distribution network power transaction center; the electric power transaction center organizes centralized bidding of each virtual power plant and discloses result information; the virtual power plant adjusts the day-ahead response pricing strategy according to the bidding result and reports the day-ahead response pricing strategy again; repeatedly deciding, bidding and optimizing until a day-ahead response electricity price decision scheme for finally considering new energy consumption in the virtual power plant is determined;
Step 4.7: after determining a final response electricity price scheme, broadcasting the electricity price scheme obtained in the step 4.6 to each unit inside by the virtual power plant management center, exciting a user to respond in the future, and adjusting load energy; according to the actual response performance of the user on the second day, the local server updates a database and calculates a model error; the central server aggregates the error conditions of each unit, updates the global model parameters and participates in actual response settlement of the upper-layer power transaction center.
CN202310714951.4A 2023-06-15 2023-06-15 New energy consumption price decision method considering user privacy protection Pending CN116862551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310714951.4A CN116862551A (en) 2023-06-15 2023-06-15 New energy consumption price decision method considering user privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310714951.4A CN116862551A (en) 2023-06-15 2023-06-15 New energy consumption price decision method considering user privacy protection

Publications (1)

Publication Number Publication Date
CN116862551A true CN116862551A (en) 2023-10-10

Family

ID=88222464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310714951.4A Pending CN116862551A (en) 2023-06-15 2023-06-15 New energy consumption price decision method considering user privacy protection

Country Status (1)

Country Link
CN (1) CN116862551A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117135693A (en) * 2023-10-27 2023-11-28 四川长虹新网科技有限责任公司 Real-time service distribution method based on federal learning under multi-AP environment
CN117135693B (en) * 2023-10-27 2024-01-23 四川长虹新网科技有限责任公司 Real-time service distribution method based on federal learning under multi-AP environment
CN117674303A (en) * 2024-02-02 2024-03-08 华北电力大学 Virtual power plant parallel control method based on data value threshold
CN117674303B (en) * 2024-02-02 2024-05-14 华北电力大学 Virtual power plant parallel control method based on data value threshold
CN117811722A (en) * 2024-03-01 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Global parameter model construction method, secret key generation method, device and server

Similar Documents

Publication Publication Date Title
Ye et al. A scalable privacy-preserving multi-agent deep reinforcement learning approach for large-scale peer-to-peer transactive energy trading
Cui et al. A two-stage robust energy sharing management for prosumer microgrid
Cheng et al. Game-theoretic approaches applied to transactions in the open and ever-growing electricity markets from the perspective of power demand response: An overview
Kong et al. Online pricing of demand response based on long short-term memory and reinforcement learning
He et al. A community sharing market with PV and energy storage: An adaptive bidding-based double-side auction mechanism
CN116862551A (en) New energy consumption price decision method considering user privacy protection
Veit et al. Simulating the dynamics in two-settlement electricity markets via an agent-based approach
Mahvi et al. Optimal bidding strategy in a competitive electricity market based on agent-based approach and numerical sensitivity analysis
Lu et al. A hybrid deep learning-based online energy management scheme for industrial microgrid
Kong et al. Refined peak shaving potential assessment and differentiated decision-making method for user load in virtual power plants
Soriano et al. Peer-to-peer energy trades based on multi-objective optimization
Alqahtani et al. Dynamic energy scheduling and routing of a large fleet of electric vehicles using multi-agent reinforcement learning
Chen et al. Customized rebate pricing mechanism for virtual power plants using a hierarchical game and reinforcement learning approach
Safdarian et al. Coalitional game theory based value sharing in energy communities
Reddy et al. Computational intelligence for demand response exchange considering temporal characteristics of load profile via adaptive fuzzy inference system
Gao et al. Distributed energy trading and scheduling among microgrids via multiagent reinforcement learning
Liu et al. Dynamic bidding strategy for a demand response aggregator in the frequency regulation market
Wang et al. A hybrid incentive program for managing electric vehicle charging flexibility
Wang et al. Demand side management and peer-to-peer energy trading for industrial users using two-level multi-agent reinforcement learning
Tsaousoglou et al. Flexibility aggregation of temporally coupled resources in real-time balancing markets using machine learning
Ren et al. Reinforcement Learning-Based Bi-Level strategic bidding model of Gas-fired unit in integrated electricity and natural gas markets preventing market manipulation
Vincent et al. Sustainable microgrid design with peer-to-peer energy trading involving government subsidies and uncertainties
Chen et al. The competition and equilibrium in power markets under decarbonization and decentralization
Han et al. Optimization of transactive energy systems with demand response: A cyber‐physical‐social system perspective
CN114169916A (en) Market member quotation strategy making method suitable for novel power system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination