Dynamic demand response pricing method based on fuzzy reinforcement learning
Technical Field
The invention relates to a dynamic demand response pricing method based on fuzzy reinforcement learning.
Background
With the development of communication technology in the power distribution network, demand-side response provides flexible adjustment at the load end and has become an effective means of improving grid reliability and reducing energy losses. Price-based demand response lets users change their electricity consumption patterns according to a real-time price signal, thereby reshaping the load curve. Dynamic demand response pricing is a decision-making process that seeks a reasonable electricity price with which to allocate the system's power services. Existing demand response pricing schemes often employ deterministic models, such as time-of-use pricing, which do not reflect the uncertainty of the real-time dynamic energy market. Dynamic pricing schemes typically rely on linear pricing models, lack a well-reasoned pricing process, and fail to capture the complexity of demand response. It is therefore necessary to build a demand response model that reflects the uncertainty of the load's response.
Reinforcement Learning (RL) is a branch of machine learning inspired by behavioral psychology and is well suited to decision problems. An RL agent maximizes the cumulative reward of its decisions by repeatedly taking actions in an uncertain environment and learning from the outcomes. Applying a reinforcement learning algorithm to the pricing model makes it possible to take the uncertainty and flexibility of the power market fully into account, and thus to solve the dynamic demand response pricing problem under uncertainty.
Disclosure of Invention
The invention aims to overcome the defects of traditional dynamic demand response pricing models and provides a dynamic demand response pricing method based on a fuzzy reinforcement learning algorithm, which incorporates the uncertainty and flexibility of the power market into the electricity-price decision.
The technical scheme adopted by the invention is as follows:
a dynamic demand response pricing method based on fuzzy reinforcement learning comprises the following steps:
s1, establishing a layered power market model, including a fuzzy load demand response model, a load aggregator optimization model, and their objective function model;
and S2, solving the model established in the step S1 by using a fuzzy reinforcement learning algorithm to obtain the optimal retail electricity price.
Further, in step S1, establishing the fuzzy load demand response model specifically includes:
s11, establishing the basic load model and the interruptible load model:
in the fuzzy load demand response model, the load comprises an interruptible load and a basic load which does not participate in demand response; since the basic load does not respond to price, its model is
$$e^{b}_{t,n} = E^{b}_{t,n}$$
where $e^{b}_{t,n}$ and $E^{b}_{t,n}$ respectively represent the energy consumption and the actual energy demand of user n in time period t; t ∈ {1, 2, …, T}, where T is the total number of time periods in a day; n ∈ {1, 2, …, N}, where N is the total number of users; and the superscript b indicates the basic load;
the interruptible load model satisfies:
$$\xi_t = (\xi_a, \xi_b, \xi_c)$$
$$\xi_a, \xi_b, \xi_c < 0$$
$$\lambda_{t,n} \ge \pi_t$$
where E[·] represents the fuzzy expected value; $e^{c}_{t,n}$ and $E^{c}_{t,n}$ respectively represent the interruptible energy consumption and the energy demand of user n in time period t; $\xi_t$ is the price elasticity coefficient of time period t, whose value is less than zero and which is a triangular fuzzy number; $\lambda_{t,n}$ denotes the retail electricity price for user n in time period t; $\pi_t$ denotes the wholesale electricity price in time period t; the superscript c indicates the interruptible load; and the subscripts a, b and c denote the start, middle and end points of the triangular fuzzy number, respectively;
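The formula image for the interruptible load model itself is not reproduced in this text; a plausible reconstruction, assuming the standard price-elasticity form used in demand response studies, is
$$E\!\left[e^{c}_{t,n}\right] = E^{c}_{t,n}\left(1 + \xi_t \cdot \frac{\lambda_{t,n} - \pi_t}{\pi_t}\right)$$
so that raising the retail price above the wholesale price reduces the expected interruptible consumption (since $\xi_t < 0$). For a triangular fuzzy number (a, b, c), the fuzzy expected value can be taken as E[(a, b, c)] = (a + 2b + c)/4, as in credibility theory.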
s12, determining the user's minimum-cost objective model from the basic load model and the interruptible load model, in which one term is the expected value of the total actual load consumption and another is the dissatisfaction degree of user n in time period t, with
$$\alpha_n > 0, \qquad \beta_n > 0$$
where $\alpha_n$ and $\beta_n$ are reaction parameters of the load to the curtailed load amount, and $D_{\min}$ and $D_{\max}$ represent the minimum and maximum load-shedding amounts of the load, respectively.
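The user's cost formulas are likewise omitted from this text; a sketch of a plausible form, assuming the commonly used quadratic dissatisfaction cost and writing $\varphi_{t,n}$ for the dissatisfaction degree (both the functional form and the symbol are assumptions of this sketch), is
$$\min \; C_n = \sum_{t=1}^{T}\left(\lambda_{t,n}\,E\!\left[e_{t,n}\right] + \varphi_{t,n}\right), \qquad \varphi_{t,n} = \alpha_n D_{t,n}^{2} + \beta_n D_{t,n}$$
where $D_{t,n} = E_{t,n} - E\!\left[e_{t,n}\right]$ is the curtailed load amount, subject to $D_{\min} \le D_{t,n} \le D_{\max}$.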
Further, in step S1, the load aggregator optimization model is established so that the aggregator earns the maximum profit from the margin between the retail electricity price and the wholesale electricity price.
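A minimal sketch of such a profit model, assuming the aggregator's profit is its retail-wholesale margin over the expected consumption (a reconstruction, as the source formula is omitted here), is
$$\max_{\lambda_{t,n}} \; R = \sum_{t=1}^{T}\sum_{n=1}^{N}\left(\lambda_{t,n} - \pi_t\right)E\!\left[e_{t,n}\right]$$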
further, in step S1, when the cost of the user and the profit of the load aggregator are considered simultaneously, the objective function model is:
in the formula, rho epsilon [0,1] represents the weight relation of the user cost and the load aggregation quotient.
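A plausible form of this combined objective, assuming a weighted trade-off of the aggregator's profit R against the users' costs $C_n$ (a reconstruction, not the verbatim formula of the source), is
$$\max \; F = \rho\,R - (1-\rho)\sum_{n=1}^{N} C_n$$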
Further, step S2 specifically includes:
step S21: initializing the parameters, including: the energy demand $E_{t,n}$ of the load; the price elasticity coefficient $\xi_t$; the reaction parameters $\alpha_n$, $\beta_n$ of the load to the curtailed load amount; the minimum and maximum load-shedding amounts $D_{\min}$, $D_{\max}$ of the load; the wholesale electricity price $\pi_t$; the weight factor θ of the reward; and the weight ρ between the user's cost and the load aggregator's profit;
step S22: initializing the Q table $Q(e_{t,n} \mid E_{t,n}, \lambda_{t,n})$ with every element set to zero, and setting the time period t = 0 and the iteration count k = 0;
step S23: observing the users' energy demand $E_{t,n}$ at t = 1;
step S24: selecting the retail electricity price $\lambda_{t,n}$ with the ε-greedy strategy;
step S25: calculating the reward, i.e. the value of the objective function; observing the users' energy demand $E_{t+1,n}$ in time period t + 1; and updating the FQ value;
step S26: judging whether the maximum time period T is reached; if yes, going to the next step; otherwise, setting t = t + 1 and returning to step S24;
step S27: judging whether the Q table has converged to its maximum; if yes, going to the next step; otherwise, setting k = k + 1 and returning to step S23;
step S28: outputting the optimal retail prices for the T time periods of one day.
Further, in step S24, the state-action value function is:
$$V(s_{k+1}) = \max_{a} FQ(s_{k+1}, a)$$
where FQ(·) denotes the FQ value, which is a fuzzy expected value; k denotes the iteration count; and a is an action selectable in state $s_{k+1}$;
the action is then selected according to the ε-greedy principle:
$$a_k = \begin{cases} \arg\max_{a} FQ(s_k, a), & x > \varepsilon \\ \text{a random action}, & x \le \varepsilon \end{cases}$$
where x is a random number in the interval [0, 1] and ε denotes the search rate.
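As an illustration of this selection rule, the following minimal Python sketch applies ε-greedy selection to a tabular FQ function; the function and variable names (select_action, fq, and so on) are illustrative assumptions, not identifiers from the invention:

```python
import random

def select_action(fq, state, actions, epsilon):
    """Epsilon-greedy selection over a tabular FQ function.

    fq      -- dict mapping (state, action) to a defuzzified FQ value
    actions -- candidate retail prices selectable in this state
    epsilon -- the search (exploration) rate in [0, 1]
    """
    x = random.random()  # random number in the interval [0, 1]
    if x > epsilon:
        # exploit: take the action maximizing FQ(s, a)
        return max(actions, key=lambda a: fq.get((state, a), 0.0))
    # explore: take a random action
    return random.choice(actions)
```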
Further, in step S25, the FQ value may be updated by the following equation:
$$FQ(s_k, a_k) \leftarrow FQ(s_k, a_k) + \alpha_k\left[r(s_k, a_k) + \gamma \max_{a} FQ(s_{k+1}, a) - FQ(s_k, a_k)\right]$$
where $\alpha_k$ denotes the learning factor, γ the discount factor, and $r(s_k, a_k)$ the reward obtained by selecting action $a_k$ in state $s_k$.
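To make the update concrete, here is a hedged Python sketch in which FQ values and rewards are triangular fuzzy numbers stored as (a, b, c) triples and compared through the credibilistic expected value (a + 2b + c)/4; the triple representation and helper names are assumptions of this sketch, not part of the disclosure:

```python
def expected_value(tri):
    """Credibilistic expected value of a triangular fuzzy number (a, b, c)."""
    a, b, c = tri
    return (a + 2.0 * b + c) / 4.0

def update_fq(fq, s, a, reward, s_next, actions, alpha, gamma):
    """One fuzzy Q-learning step:
    FQ(s,a) <- FQ(s,a) + alpha * [r + gamma * max_a' FQ(s',a') - FQ(s,a)].
    The max is taken by expected value; the update itself is component-wise,
    which is valid triangular fuzzy arithmetic when alpha, gamma lie in [0, 1].
    """
    zero = (0.0, 0.0, 0.0)
    best_next = max((fq.get((s_next, a2), zero) for a2 in actions),
                    key=expected_value)
    old = fq.get((s, a), zero)
    fq[(s, a)] = tuple(o + alpha * (r + gamma * b - o)
                       for o, r, b in zip(old, reward, best_next))
```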
Compared with the prior art, the invention has the following beneficial effects:
In operation, the method fully accounts for the uncertainty of the load, remedying the defect of dynamic demand response pricing models that ignore the fuzzy uncertainty of the load response. It suits a power market environment that changes in real time, improves the rationality of dynamic pricing and the computational efficiency, finds the real-time optimal pricing strategy through the optimization algorithm, and thereby improves grid reliability and reduces energy imbalance.
Drawings
FIG. 1 is a schematic diagram of a tiered power market model.
Fig. 2 is a schematic flow chart of solving to obtain the optimal retail electricity price based on the fuzzy reinforcement learning algorithm.
Detailed Description
The embodiments are further described below with reference to the accompanying drawings.
A dynamic demand response pricing method based on fuzzy reinforcement learning comprises the following steps:
s1, establishing a layered power market model, including a fuzzy load demand response model, a load aggregator optimization model, and their objective function model;
and S2, solving the model established in the step S1 by using a fuzzy reinforcement learning algorithm to obtain the optimal retail electricity price.
As shown in fig. 1, energy is sold by the power producer to the load aggregator at the wholesale price, and then sold by the load aggregator to consumers at the retail price. The information exchanged among the three parties mainly comprises purchase prices and electricity consumption. The information exchange and retail-price decision mechanism between the load aggregator and the consumers is the dynamic load demand response pricing method based on fuzzy reinforcement learning provided by this embodiment.
Specifically, in step S1, establishing the fuzzy load demand response model includes:
s11, establishing the basic load model and the interruptible load model:
in the fuzzy load demand response model, the load comprises an interruptible load and a basic load which does not participate in demand response; since the basic load does not respond to price, its model is
$$e^{b}_{t,n} = E^{b}_{t,n}$$
where $e^{b}_{t,n}$ and $E^{b}_{t,n}$ respectively represent the energy consumption and the actual energy demand of user n in time period t; t ∈ {1, 2, …, T}, where T is the total number of time periods in a day; n ∈ {1, 2, …, N}, where N is the total number of users; and the superscript b indicates the basic load;
the interruptible load model satisfies:
$$\xi_t = (\xi_a, \xi_b, \xi_c)$$
$$\xi_a, \xi_b, \xi_c < 0$$
$$\lambda_{t,n} \ge \pi_t$$
where E[·] represents the fuzzy expected value; $e^{c}_{t,n}$ and $E^{c}_{t,n}$ respectively represent the interruptible energy consumption and the energy demand of user n in time period t; $\xi_t$ is the price elasticity coefficient of time period t, whose value is less than zero and which is a triangular fuzzy number; $\lambda_{t,n}$ denotes the retail electricity price for user n in time period t; $\pi_t$ denotes the wholesale electricity price in time period t; the superscript c indicates the interruptible load; and the subscripts a, b and c denote the start, middle and end points of the triangular fuzzy number, respectively;
s12, determining the user's minimum-cost objective model from the basic load model and the interruptible load model, in which one term is the expected value of the total actual load consumption and another is the dissatisfaction degree of user n in time period t, with
$$\alpha_n > 0, \qquad \beta_n > 0$$
where $\alpha_n$ and $\beta_n$ are reaction parameters of the load to the curtailed load amount, and $D_{\min}$ and $D_{\max}$ represent the minimum and maximum load-shedding amounts of the load, respectively.
Specifically, in step S1, the load aggregator optimization model is established so that the aggregator earns the maximum profit from the margin between the retail electricity price and the wholesale electricity price, as set forth above.
Specifically, in step S1, when the user's cost and the load aggregator's profit are considered simultaneously, the objective function model combines the two through the weight ρ ∈ [0, 1] between the user's cost and the load aggregator's profit, as set forth above.
Specifically, as shown in fig. 2, step S2 includes:
step S21: initializing the parameters, including: the energy demand $E_{t,n}$ of the load; the price elasticity coefficient $\xi_t$; the reaction parameters $\alpha_n$, $\beta_n$ of the load to the curtailed load amount; the minimum and maximum load-shedding amounts $D_{\min}$, $D_{\max}$ of the load; the wholesale electricity price $\pi_t$; the weight factor θ of the reward; and the weight ρ between the user's cost and the load aggregator's profit;
step S22: initializing the Q table $Q(e_{t,n} \mid E_{t,n}, \lambda_{t,n})$ with every element set to zero, and setting the time period t = 0 and the iteration count k = 0;
step S23: observing the users' energy demand $E_{t,n}$ at t = 1;
step S24: selecting the retail electricity price $\lambda_{t,n}$ with the ε-greedy strategy; at this point the state-action value function is
$$V(s_{k+1}) = \max_{a} FQ(s_{k+1}, a)$$
where FQ(·) denotes the FQ value, which is a fuzzy expected value, k denotes the iteration count, and a is an action selectable in state $s_{k+1}$; the action is then selected according to the ε-greedy principle:
$$a_k = \begin{cases} \arg\max_{a} FQ(s_k, a), & x > \varepsilon \\ \text{a random action}, & x \le \varepsilon \end{cases}$$
where x is a random number in the interval [0, 1] and ε denotes the search rate;
step S25: calculating the reward, i.e. the value of the objective function; observing the users' energy demand $E_{t+1,n}$ in time period t + 1; and updating the FQ value by
$$FQ(s_k, a_k) \leftarrow FQ(s_k, a_k) + \alpha_k\left[r(s_k, a_k) + \gamma \max_{a} FQ(s_{k+1}, a) - FQ(s_k, a_k)\right]$$
where $\alpha_k$ denotes the learning factor, γ the discount factor, and $r(s_k, a_k)$ the reward obtained by selecting action $a_k$ in state $s_k$;
step S26: judging whether the maximum time period T is reached; if yes, going to the next step; otherwise, setting t = t + 1 and returning to step S24;
step S27: judging whether the Q table has converged to its maximum; if yes, going to the next step; otherwise, setting k = k + 1 and returning to step S23;
step S28: outputting the optimal retail prices for the T time periods of one day.
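To make the flow of steps S21 to S28 concrete, the following self-contained Python sketch implements the outer loop with scalar (defuzzified) Q-values for brevity; every name and numeric value in it (observe_demand, compute_reward, the candidate price grid, the wholesale price, and so on) is an illustrative assumption rather than part of the disclosure:

```python
import random

T = 24                                         # time periods per day (assumed)
PRICES = [0.30 + 0.05 * i for i in range(10)]  # candidate retail prices (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # learning factor, discount, search rate
WHOLESALE = 0.35                               # assumed wholesale price pi_t
MAX_ITER = 5000

def observe_demand(t):
    """Placeholder for observing the users' energy demand E_{t,n}."""
    return 100.0 + 20.0 * random.random()

def compute_reward(price, demand):
    """Placeholder for the objective function; here, aggregator margin only."""
    return (price - WHOLESALE) * demand

q = {}                                         # step S22: Q table defaults to zero

for k in range(MAX_ITER):                      # step S27: iterate until converged
    demand = observe_demand(1)                 # step S23: observe demand at t = 1
    for t in range(T):                         # steps S24 to S26: sweep one day
        state = (t, round(demand))
        if random.random() > EPSILON:          # step S24: epsilon-greedy price
            price = max(PRICES, key=lambda a: q.get((state, a), 0.0))
        else:
            price = random.choice(PRICES)
        reward = compute_reward(price, demand)      # step S25: reward, then
        next_demand = observe_demand(t + 1)         # observe demand at t + 1
        next_state = (t + 1, round(next_demand))
        best_next = max(q.get((next_state, a), 0.0) for a in PRICES)
        old = q.get((state, price), 0.0)
        q[(state, price)] = old + ALPHA * (reward + GAMMA * best_next - old)
        demand = next_demand

# step S28: for an observed demand level, the learned optimal retail price in
# period t is the price a maximizing q[((t, demand), a)]
```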
The load aggregator collects the consumers' electricity demand $E_{t,n}$ and initial parameters such as the user dissatisfaction coefficients, maximizes the objective function through the fuzzy reinforcement learning based dynamic load demand response pricing method, issues the computed optimized retail price to the consumers, and feeds the electricity demand back to the power production department, which then uses it to guide power production.
The method searches for a reasonable electricity price while accounting for the fuzzy uncertainty of the load response. Addressing the defect that existing dynamic demand response pricing models ignore this fuzzy uncertainty, it provides a load demand response model, a service provider model and an objective function model, together with the steps of a dynamic demand response pricing algorithm based on fuzzy reinforcement learning, so that the uncertainty of the load response is fully considered and the method adapts to a dynamically changing power market environment.
Although the present invention has been described with reference to the above embodiments, it should be understood that the present invention is not limited to the above embodiments, and other embodiments and modifications may be made by those skilled in the art without departing from the scope of the present invention.