Dynamic demand response pricing method based on fuzzy reinforcement learning
Technical Field
The invention relates to a dynamic demand response pricing method based on fuzzy reinforcement learning.
Background
With the development of communication technology in the power distribution network, demand-side response provides flexible adjustment at the load end and has become an effective means of improving grid reliability and reducing energy losses. Price-based demand response lets users change their electricity consumption patterns according to a real-time price signal, thereby reshaping the load curve. Dynamic demand response pricing is a decision-making process that seeks a reasonable electricity price with which to allocate the system's power services. Existing demand response pricing schemes often employ deterministic models, such as time-of-use pricing, which do not reflect the uncertainty of the real-time dynamic energy market. Dynamic pricing schemes typically rely on linear pricing models, lack a well-reasoned pricing process, and fail to capture the complexity of demand response. It is therefore necessary to build a demand response model that reflects the uncertainty of the load's response.
Reinforcement Learning (RL) is a branch of machine learning inspired by behavioral psychology and is well suited to decision problems. An RL agent maximizes the cumulative reward of its decisions by repeatedly taking actions in an uncertain environment and learning from the outcomes. Applying a reinforcement learning algorithm to the pricing model makes it possible to take the uncertainty and flexibility of the power market fully into account, and thus to solve the dynamic demand response pricing problem under uncertainty.
Disclosure of Invention
The invention aims to overcome the defects of traditional dynamic demand response pricing models and provides a dynamic demand response pricing method based on a fuzzy reinforcement learning algorithm, which incorporates the uncertainty and flexibility of the power market into the electricity-price decision.
The technical scheme adopted by the invention is as follows:
a dynamic demand response pricing method based on fuzzy reinforcement learning comprises the following steps:
s1, establishing a layered power market model, including a fuzzy load demand response model, a load aggregator optimization model, and their objective function model;
and S2, solving the model established in the step S1 by using a fuzzy reinforcement learning algorithm to obtain the optimal retail electricity price.
Further, in step S1, establishing the fuzzy load demand response model specifically includes:
s11, establishing the basic load model and the interruptible load model:
in the fuzzy load demand response model, the load comprises an interruptible load and a basic load which does not participate in demand response; since the basic load does not respond to price, its model is
$$e^{b}_{t,n} = E^{b}_{t,n}$$
where $e^{b}_{t,n}$ and $E^{b}_{t,n}$ respectively represent the energy consumption and the actual energy demand of user n in time period t; t ∈ {1, 2, …, T}, where T is the total number of time periods in a day; n ∈ {1, 2, …, N}, where N is the total number of users; and the superscript b indicates the basic load;
the interruptible load model satisfies:
$$\xi_t = (\xi_a, \xi_b, \xi_c)$$
$$\xi_a, \xi_b, \xi_c < 0$$
$$\lambda_{t,n} \ge \pi_t$$
where E[·] represents the fuzzy expected value; $e^{c}_{t,n}$ and $E^{c}_{t,n}$ respectively represent the interruptible energy consumption and the energy demand of user n in time period t; $\xi_t$ is the price elasticity coefficient of time period t, whose value is less than zero and which is a triangular fuzzy number; $\lambda_{t,n}$ denotes the retail electricity price for user n in time period t; $\pi_t$ denotes the wholesale electricity price in time period t; the superscript c indicates the interruptible load; and the subscripts a, b and c denote the start, middle and end points of the triangular fuzzy number, respectively;
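The formula image for the interruptible load model itself is not reproduced in this text; a plausible reconstruction, assuming the standard price-elasticity form used in demand response studies, is
$$E\!\left[e^{c}_{t,n}\right] = E^{c}_{t,n}\left(1 + \xi_t \cdot \frac{\lambda_{t,n} - \pi_t}{\pi_t}\right)$$
so that raising the retail price above the wholesale price reduces the expected interruptible consumption (since $\xi_t < 0$). For a triangular fuzzy number (a, b, c), the fuzzy expected value can be taken as E[(a, b, c)] = (a + 2b + c)/4, as in credibility theory.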
s12, determining the user's minimum-cost objective model from the basic load model and the interruptible load model, in which one term is the expected value of the total actual load consumption and another is the dissatisfaction degree of user n in time period t, with
$$\alpha_n > 0, \qquad \beta_n > 0$$
where $\alpha_n$ and $\beta_n$ are reaction parameters of the load to the curtailed load amount, and $D_{\min}$ and $D_{\max}$ represent the minimum and maximum load-shedding amounts of the load, respectively.
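The user's cost formulas are likewise omitted from this text; a sketch of a plausible form, assuming the commonly used quadratic dissatisfaction cost and writing $\varphi_{t,n}$ for the dissatisfaction degree (both the functional form and the symbol are assumptions of this sketch), is
$$\min \; C_n = \sum_{t=1}^{T}\left(\lambda_{t,n}\,E\!\left[e_{t,n}\right] + \varphi_{t,n}\right), \qquad \varphi_{t,n} = \alpha_n D_{t,n}^{2} + \beta_n D_{t,n}$$
where $D_{t,n} = E_{t,n} - E\!\left[e_{t,n}\right]$ is the curtailed load amount, subject to $D_{\min} \le D_{t,n} \le D_{\max}$.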
Further, in step S1, the load aggregator optimization model is established so that the aggregator earns the maximum profit from the margin between the retail electricity price and the wholesale electricity price.
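A minimal sketch of such a profit model, assuming the aggregator's profit is its retail-wholesale margin over the expected consumption (a reconstruction, as the source formula is omitted here), is
$$\max_{\lambda_{t,n}} \; R = \sum_{t=1}^{T}\sum_{n=1}^{N}\left(\lambda_{t,n} - \pi_t\right)E\!\left[e_{t,n}\right]$$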
further, in step S1, when the cost of the user and the profit of the load aggregator are considered simultaneously, the objective function model is:
in the formula, rho epsilon [0,1] represents the weight relation of the user cost and the load aggregation quotient.
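A plausible form of this combined objective, assuming a weighted trade-off of the aggregator's profit R against the users' costs $C_n$ (a reconstruction, not the verbatim formula of the source), is
$$\max \; F = \rho\,R - (1-\rho)\sum_{n=1}^{N} C_n$$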
Further, step S2 specifically includes:
step S21: initializing the parameters, including: the energy demand $E_{t,n}$ of the load; the price elasticity coefficient $\xi_t$; the reaction parameters $\alpha_n$, $\beta_n$ of the load to the curtailed load amount; the minimum and maximum load-shedding amounts $D_{\min}$, $D_{\max}$ of the load; the wholesale electricity price $\pi_t$; the weight factor θ of the reward; and the weight ρ between the user's cost and the load aggregator's profit;
step S22: initializing the Q table $Q(e_{t,n} \mid E_{t,n}, \lambda_{t,n})$ with every element set to zero, and setting the time period t = 0 and the iteration count k = 0;
step S23: observing the users' energy demand $E_{t,n}$ at t = 1;
step S24: selecting the retail electricity price $\lambda_{t,n}$ with the ε-greedy strategy;
step S25: calculating the reward, i.e. the value of the objective function; observing the users' energy demand $E_{t+1,n}$ in time period t + 1; and updating the FQ value;
step S26: judging whether the maximum time period T is reached; if yes, going to the next step; otherwise, setting t = t + 1 and returning to step S24;
step S27: judging whether the Q table has converged to its maximum; if yes, going to the next step; otherwise, setting k = k + 1 and returning to step S23;
step S28: outputting the optimal retail prices for the T time periods of one day.
Further, in step S24, the state-action value function is:
$$V(s_{k+1}) = \max_{a} FQ(s_{k+1}, a)$$
where FQ(·) denotes the FQ value, which is a fuzzy expected value; k denotes the iteration count; and a is an action selectable in state $s_{k+1}$;
the action is then selected according to the ε-greedy principle:
$$a_k = \begin{cases} \arg\max_{a} FQ(s_k, a), & x > \varepsilon \\ \text{a random action}, & x \le \varepsilon \end{cases}$$
where x is a random number in the interval [0, 1] and ε denotes the search rate.
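As an illustration of this selection rule, the following minimal Python sketch applies ε-greedy selection to a tabular FQ function; the function and variable names (select_action, fq, and so on) are illustrative assumptions, not identifiers from the invention:

```python
import random

def select_action(fq, state, actions, epsilon):
    """Epsilon-greedy selection over a tabular FQ function.

    fq      -- dict mapping (state, action) to a defuzzified FQ value
    actions -- candidate retail prices selectable in this state
    epsilon -- the search (exploration) rate in [0, 1]
    """
    x = random.random()  # random number in the interval [0, 1]
    if x > epsilon:
        # exploit: take the action maximizing FQ(s, a)
        return max(actions, key=lambda a: fq.get((state, a), 0.0))
    # explore: take a random action
    return random.choice(actions)
```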
Further, in step S25, the FQ value may be updated by the following equation:
$$FQ(s_k, a_k) \leftarrow FQ(s_k, a_k) + \alpha_k\left[r(s_k, a_k) + \gamma \max_{a} FQ(s_{k+1}, a) - FQ(s_k, a_k)\right]$$
where $\alpha_k$ denotes the learning factor, γ the discount factor, and $r(s_k, a_k)$ the reward obtained by selecting action $a_k$ in state $s_k$.
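To make the update concrete, here is a hedged Python sketch in which FQ values and rewards are triangular fuzzy numbers stored as (a, b, c) triples and compared through the credibilistic expected value (a + 2b + c)/4; the triple representation and helper names are assumptions of this sketch, not part of the disclosure:

```python
def expected_value(tri):
    """Credibilistic expected value of a triangular fuzzy number (a, b, c)."""
    a, b, c = tri
    return (a + 2.0 * b + c) / 4.0

def update_fq(fq, s, a, reward, s_next, actions, alpha, gamma):
    """One fuzzy Q-learning step:
    FQ(s,a) <- FQ(s,a) + alpha * [r + gamma * max_a' FQ(s',a') - FQ(s,a)].
    The max is taken by expected value; the update itself is component-wise,
    which is valid triangular fuzzy arithmetic when alpha, gamma lie in [0, 1].
    """
    zero = (0.0, 0.0, 0.0)
    best_next = max((fq.get((s_next, a2), zero) for a2 in actions),
                    key=expected_value)
    old = fq.get((s, a), zero)
    fq[(s, a)] = tuple(o + alpha * (r + gamma * b - o)
                       for o, r, b in zip(old, reward, best_next))
```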
Compared with the prior art, the invention has the following beneficial effects:
In operation, the method fully accounts for the uncertainty of the load, remedying the defect of dynamic demand response pricing models that ignore the fuzzy uncertainty of the load response. It suits a power market environment that changes in real time, improves the rationality of dynamic pricing and the computational efficiency, finds the real-time optimal pricing strategy through the optimization algorithm, and thereby improves grid reliability and reduces energy imbalance.
Drawings
FIG. 1 is a schematic diagram of a tiered power market model.
Fig. 2 is a schematic flow chart of solving to obtain the optimal retail electricity price based on the fuzzy reinforcement learning algorithm.
Detailed Description
The embodiments are further described below with reference to the accompanying drawings.
A dynamic demand response pricing method based on fuzzy reinforcement learning comprises the following steps:
s1, establishing a layered power market model, including a fuzzy load demand response model, a load aggregator optimization model, and their objective function model;
and S2, solving the model established in the step S1 by using a fuzzy reinforcement learning algorithm to obtain the optimal retail electricity price.
As shown in fig. 1, energy is sold by the power producer to the load aggregator at the wholesale price, and then sold by the load aggregator to consumers at the retail price. The information exchanged among the three parties mainly comprises purchase prices and electricity consumption. The information exchange and retail-price decision mechanism between the load aggregator and the consumers is the dynamic load demand response pricing method based on fuzzy reinforcement learning provided by this embodiment.
Specifically, in step S1, establishing the fuzzy load demand response model includes:
s11, establishing the basic load model and the interruptible load model:
in the fuzzy load demand response model, the load comprises an interruptible load and a basic load which does not participate in demand response; since the basic load does not respond to price, its model is
$$e^{b}_{t,n} = E^{b}_{t,n}$$
where $e^{b}_{t,n}$ and $E^{b}_{t,n}$ respectively represent the energy consumption and the actual energy demand of user n in time period t; t ∈ {1, 2, …, T}, where T is the total number of time periods in a day; n ∈ {1, 2, …, N}, where N is the total number of users; and the superscript b indicates the basic load;
the interruptible load model satisfies:
$$\xi_t = (\xi_a, \xi_b, \xi_c)$$
$$\xi_a, \xi_b, \xi_c < 0$$
$$\lambda_{t,n} \ge \pi_t$$
where E[·] represents the fuzzy expected value; $e^{c}_{t,n}$ and $E^{c}_{t,n}$ respectively represent the interruptible energy consumption and the energy demand of user n in time period t; $\xi_t$ is the price elasticity coefficient of time period t, whose value is less than zero and which is a triangular fuzzy number; $\lambda_{t,n}$ denotes the retail electricity price for user n in time period t; $\pi_t$ denotes the wholesale electricity price in time period t; the superscript c indicates the interruptible load; and the subscripts a, b and c denote the start, middle and end points of the triangular fuzzy number, respectively;
s12, determining the user's minimum-cost objective model from the basic load model and the interruptible load model, in which one term is the expected value of the total actual load consumption and another is the dissatisfaction degree of user n in time period t, with
$$\alpha_n > 0, \qquad \beta_n > 0$$
where $\alpha_n$ and $\beta_n$ are reaction parameters of the load to the curtailed load amount, and $D_{\min}$ and $D_{\max}$ represent the minimum and maximum load-shedding amounts of the load, respectively.
Specifically, in step S1, the load aggregator optimization model is established so that the aggregator earns the maximum profit from the margin between the retail electricity price and the wholesale electricity price, as set forth above.
Specifically, in step S1, when the user's cost and the load aggregator's profit are considered simultaneously, the objective function model combines the two through the weight ρ ∈ [0, 1] between the user's cost and the load aggregator's profit, as set forth above.
Specifically, as shown in fig. 2, step S2 includes:
step S21: initializing the parameters, including: the energy demand $E_{t,n}$ of the load; the price elasticity coefficient $\xi_t$; the reaction parameters $\alpha_n$, $\beta_n$ of the load to the curtailed load amount; the minimum and maximum load-shedding amounts $D_{\min}$, $D_{\max}$ of the load; the wholesale electricity price $\pi_t$; the weight factor θ of the reward; and the weight ρ between the user's cost and the load aggregator's profit;
step S22: initializing the Q table $Q(e_{t,n} \mid E_{t,n}, \lambda_{t,n})$ with every element set to zero, and setting the time period t = 0 and the iteration count k = 0;
step S23: observing the users' energy demand $E_{t,n}$ at t = 1;
step S24: selecting the retail electricity price $\lambda_{t,n}$ with the ε-greedy strategy; at this point the state-action value function is
$$V(s_{k+1}) = \max_{a} FQ(s_{k+1}, a)$$
where FQ(·) denotes the FQ value, which is a fuzzy expected value, k denotes the iteration count, and a is an action selectable in state $s_{k+1}$; the action is then selected according to the ε-greedy principle:
$$a_k = \begin{cases} \arg\max_{a} FQ(s_k, a), & x > \varepsilon \\ \text{a random action}, & x \le \varepsilon \end{cases}$$
where x is a random number in the interval [0, 1] and ε denotes the search rate;
step S25: calculating the reward, i.e. the value of the objective function; observing the users' energy demand $E_{t+1,n}$ in time period t + 1; and updating the FQ value by
$$FQ(s_k, a_k) \leftarrow FQ(s_k, a_k) + \alpha_k\left[r(s_k, a_k) + \gamma \max_{a} FQ(s_{k+1}, a) - FQ(s_k, a_k)\right]$$
where $\alpha_k$ denotes the learning factor, γ the discount factor, and $r(s_k, a_k)$ the reward obtained by selecting action $a_k$ in state $s_k$;
step S26: judging whether the maximum time period T is reached; if yes, going to the next step; otherwise, setting t = t + 1 and returning to step S24;
step S27: judging whether the Q table has converged to its maximum; if yes, going to the next step; otherwise, setting k = k + 1 and returning to step S23;
step S28: outputting the optimal retail prices for the T time periods of one day.
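To make the flow of steps S21 to S28 concrete, the following self-contained Python sketch implements the outer loop with scalar (defuzzified) Q-values for brevity; every name and numeric value in it (observe_demand, compute_reward, the candidate price grid, the wholesale price, and so on) is an illustrative assumption rather than part of the disclosure:

```python
import random

T = 24                                         # time periods per day (assumed)
PRICES = [0.30 + 0.05 * i for i in range(10)]  # candidate retail prices (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # learning factor, discount, search rate
WHOLESALE = 0.35                               # assumed wholesale price pi_t
MAX_ITER = 5000

def observe_demand(t):
    """Placeholder for observing the users' energy demand E_{t,n}."""
    return 100.0 + 20.0 * random.random()

def compute_reward(price, demand):
    """Placeholder for the objective function; here, aggregator margin only."""
    return (price - WHOLESALE) * demand

q = {}                                         # step S22: Q table defaults to zero

for k in range(MAX_ITER):                      # step S27: iterate until converged
    demand = observe_demand(1)                 # step S23: observe demand at t = 1
    for t in range(T):                         # steps S24 to S26: sweep one day
        state = (t, round(demand))
        if random.random() > EPSILON:          # step S24: epsilon-greedy price
            price = max(PRICES, key=lambda a: q.get((state, a), 0.0))
        else:
            price = random.choice(PRICES)
        reward = compute_reward(price, demand)      # step S25: reward, then
        next_demand = observe_demand(t + 1)         # observe demand at t + 1
        next_state = (t + 1, round(next_demand))
        best_next = max(q.get((next_state, a), 0.0) for a in PRICES)
        old = q.get((state, price), 0.0)
        q[(state, price)] = old + ALPHA * (reward + GAMMA * best_next - old)
        demand = next_demand

# step S28: for an observed demand level, the learned optimal retail price in
# period t is the price a maximizing q[((t, demand), a)]
```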
The load aggregator collects the consumers' electricity demand $E_{t,n}$ and initial parameters such as the user dissatisfaction coefficients, maximizes the objective function through the fuzzy reinforcement learning based dynamic load demand response pricing method, issues the computed optimized retail price to the consumers, and feeds the electricity demand back to the power production department, which then uses it to guide power production.
The method searches for a reasonable electricity price while accounting for the fuzzy uncertainty of the load response. Addressing the defect that existing dynamic demand response pricing models ignore this fuzzy uncertainty, it provides a load demand response model, a service provider model and an objective function model, together with the steps of a dynamic demand response pricing algorithm based on fuzzy reinforcement learning, so that the uncertainty of the load response is fully considered and the method adapts to a dynamically changing power market environment.
Although the present invention has been described with reference to the above embodiments, it should be understood that the present invention is not limited to the above embodiments, and other embodiments and modifications may be made by those skilled in the art without departing from the scope of the present invention.