CN113052638B

CN113052638B - Price demand response-based determination method and system

Info

Publication number: CN113052638B
Application number: CN202110366209.XA
Authority: CN
Inventors: 秦家虎; 万艳妮
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2023-11-24
Anticipated expiration: 2041-04-06
Also published as: CN113052638A

Abstract

The invention discloses a method and a system for determining price demand response, which are used for modeling a dynamic retail price pricing problem of an electric company as a Markov decision process, determining retail price making action according to the state of all load units at the current moment and by utilizing a price selection probability-greedy strategy, acquiring immediate return of benefits and the state of all load units at the next moment, updating a reference action value function obtained in the last iteration into a target action value function, taking the target action value function as an optimal action value function when the current moment reaches the terminal moment and the absolute value of the difference value between the target action value function and the reference action value function is not larger than a difference value threshold, and determining an optimal retail price strategy according to the optimal action value function, thereby calculating the optimal energy consumption of a schedulable load. When the action value function is determined, the influence of the current electricity price on the instant response of the load and the response in a future period of time is considered, so that the accuracy of response based on price demand is improved.

Description

Price demand response-based determination method and system

Technical Field

The invention relates to the technical field of power grids, in particular to a price demand response-based determination method and system.

Background

Smart grids are a typical information physical system that integrates advanced detection, control and communication technologies into a physical power system to provide reliable energy supply, facilitate active participation of loads, and ensure stable operation of the grid system. Based on the feature of smart grid information physical fusion, power demand response (demand response) has become a research hotspot in the field of energy management (energy management), and the purpose of the power demand response (demand response) is to change the energy use mode of a load according to time-varying electricity prices or rewards/punishment incentives, so as to achieve the goals of reducing energy cost on the demand side and the like. In other words, the power demand response is a means to remodel the manner in which the load energy is used by price or incentive means to achieve more efficient energy management.

Currently, existing research efforts focus mainly on two branches of demand response, price-based demand response (price-based demand response) and incentive-based demand response (incentive-based demand response). Among them, as a general demand response based on price demand response, it is desirable to change the energy usage pattern of the end user by a time-dependent electricity price pricing mechanism such as a time-of-use pricing mechanism and a real-time pricing mechanism.

The existing price demand response is mostly based on a deterministic price mechanism, such as a time-of-use electricity price pricing mechanism, a day-ahead electricity price pricing mechanism or a linear price model. However, the deterministic price mechanism cannot truly characterize the uncertainty and flexibility of the dynamic power market, so the accuracy of the existing price demand response is not high.

Disclosure of Invention

In view of the above, the invention discloses a method and a system for determining response based on price demand, so as to realize real characterization of uncertainty and flexibility of dynamic electric power market and improve accuracy of response based on price demand.

A price demand response based determination method, comprising:

modeling a dynamic retail electricity pricing problem for an electric utility company as a markov decision process;

monitoring states of all load units at the current moment, marking the states as a first state, and selecting one target retail price making action at the current moment within an allowable retail price range by utilizing a price selection probability-greedy strategy;

calculating the immediate return of the income after the target retail electricity price making action is executed, monitoring the states of all load units at the moment next to the current moment, and marking the states as a second state;

Updating a reference action value function into a target action value function based on the first state, the second state, the target retail price setting action and the immediate return of the income, wherein the reference action value function is an action value function obtained in the last iteration;

judging whether the current moment reaches the terminal moment or not;

if yes, judging whether the absolute value of the difference value of the target action value function and the reference action value function is not larger than a difference value threshold value;

if so, taking the target action value function as an optimal action value function, and determining an optimal retail electricity price strategy according to the optimal action value function;

and calculating the optimal energy consumption of the schedulable load according to the optimal retail electricity price strategy.

Optionally, the specific meaning of the electricity price selection probability-greedy strategy is as follows: and randomly selecting a retail electricity price from the action set with epsilon probability, or selecting the retail electricity price corresponding to the maximum action value function with 1-epsilon probability, wherein epsilon represents the electricity price selection probability.

Optionally, the benefit immediate return r ^t The expression of (2) is as follows:

r ^t ＝ρU _t -(1-ρ)C _t ；

wherein ρ ε [0,1 ]]Is a weight parameter representing the relative social value of the profit of the electric company and the comprehensive cost of the load unit, U _t Represents the net income of the electric company at the time t, C _t The total cost on the load side at time t is shown.

Optionally, the net benefit U of the electric company at time t _t The expression of (2) is as follows:

in the method, in the process of the invention,for a non-schedulable load set, +.>Representing non-schedulable load +.>Retail electricity price received at time t, +.>Representing non-schedulable load +.>The energy consumption at the time t is marked with the upper index n representing the non-schedulable load mark, the lower index n representing the index of the load unit, the lower index t representing the index of the time index +.>Representing schedulable load +.>The energy consumption at time t, superscript d indicates the schedulable load identification, ++>For a schedulable load set, +.>Representing schedulable load +.>Retail electricity price, eta, received at time t _t Representing schedulable load +.>Wholesale electricity price at time t and satisfies +.> The method comprises the steps that total electric energy purchased by an electric company to a power grid operator at the time t is represented, and a superscript tot represents a total electric energy identifier;

comprehensive cost C of load side at time t _t The expression of (2) is as follows:

in the method, in the process of the invention,representing schedulable load +.>At time t, the dissatisfaction caused by the reduced energy consumption requirement.

Optionally, degree of dissatisfactionThe expression of (2) is as follows:

in the method, in the process of the invention,representing schedulable load +.>At time t the dissatisfaction caused by the reduced energy consumption requirement, And->Representing two dependence on schedulable load +.>Dissatisfaction coefficient of->Representing schedulable loadsIs decreased in demand, is->Representation canScheduling load->At time t the energy demand, superscript d, indicates the schedulable load identification,/i>For a schedulable load set, +.>Representing schedulable load +.>Energy consumption at time t.

Alternatively, the load may be scheduledIs->The following inequality is satisfied:

in the method, in the process of the invention,and->Respectively represent schedulable load->And the minimum demand reduction amount and the maximum demand reduction amount of (c) are both known amounts.

Optionally, the expression of the target action value function is as follows:

in which Q ^k (s ^t ，a ^t ) Representing the state s from all load units at the kth iteration as the target action value function ^t Starting, a target retail electricity price making action a is executed ^t Is defined as the cumulative future discount returns ofWhere γ represents the discount factor, e 0,1]Is the learning rate, representing newly obtained Q ^k Value pair Q ^k-1 Degree of coverage of value, Q ^k-1 (s ^t ，a ^t ) Representing the reference motion value function, s ^t+1 Representing the states of all load units at time t+1, a ^t+1 Retail electricity price establishment operation at time t+1, Q ^k-1 (s ^t+1 ，a ^t+1 ) Representing the state s of k-1 iterations from all loads ^t+1 Proceeding, execution a ^t+1 Is provided for the accumulated future discount returns.

Optionally, the expression of the optimal retail electricity price policy is as follows:

in the formula, pi ^* (s ^t ) For optimal retail electricity price strategy, Q ^* (s ^t ，a ^t ) As the optimal action value function, A is an action set, and A= { a ¹ ，a ² ，…，a ^T The value range of the time t is: t=1, 2,..t, T represents the total number of time intervals, s ^t Representing the states of all load units at time t, a ^t And represents retail electricity price making action at time t.

Alternatively, the expression of the optimal energy consumption amount is as follows:

in the method, in the process of the invention,the optimal energy consumption is represented, the subscript n represents a load unit index, the subscript t represents a time index, the subscript d represents a schedulable load identifier, and the load unit index is represented by +_>Representing schedulable load +.>Energy demand at time t, +.>Representing schedulable load +.>Retail electricity price, eta, received at time t _t Representing schedulable load +.>Wholesale electricity price at time t and satisfies +.>μ _t The electricity rate elastic coefficient represents the ratio of energy demand to change with retail electricity rate change at time t.

Optionally, the determining method further includes: initializing the action value function, which specifically comprises:

and acquiring known prior parameter data, carrying the prior parameter data into the predetermined action value function, and initializing the action value function, wherein the initial value of the action value function is 0.

A price demand response based determination system comprising:

the modeling unit is used for modeling the dynamic retail electricity price pricing problem of the electric company into a Markov decision process;

the action selection unit is used for monitoring the states of all load units at the current moment, marking the states as a first state, and selecting one target retail electricity price making action at the current moment within the allowable retail electricity price range by utilizing an electricity price selection probability-greedy strategy;

the return calculation unit is used for calculating return immediately after the target retail electricity price making action is executed, monitoring the states of all load units at the moment next to the current moment and recording the states as a second state;

the function updating unit is used for updating a reference action value function into a target action value function based on the first state, the second state, the target retail electricity price making action and the immediate return of the benefit, wherein the reference action value function is the action value function obtained in the last iteration;

the first judging unit is used for judging whether the current moment reaches the terminal moment;

a second judging unit configured to judge whether an absolute value of a difference between the target action value function and the reference action value function is not greater than a difference threshold value, if the first judging unit judges yes;

The electricity price strategy determining unit is used for taking the target action value function as an optimal action value function and determining an optimal retail electricity price strategy according to the optimal action value function under the condition that the second judging unit judges that the target action value function is yes;

and the energy consumption calculation unit is used for calculating the optimal energy consumption of the schedulable load according to the optimal retail electricity price strategy.

According to the technical scheme, the method and the system for determining the price demand response are disclosed, the dynamic retail price pricing problem of the electric company is modeled as a Markov decision process, the states of all load units at the current moment are monitored to be recorded as a first state, a target retail price making action at the current moment is selected by utilizing a price selection probability-greedy strategy in the allowable retail price range, the immediate return of the income after the target retail price making action is executed is calculated, the states of all load units at the next moment at the current moment are monitored and recorded as a second state, the reference action value function obtained in the last iteration is updated as a target action value function based on the first state, the second state, the target retail price making action and the immediate return of the income, when the current moment reaches the terminal moment and the absolute value of the difference value of the target action value function and the reference action value function is not larger than the difference threshold, the optimal retail price making action value function is determined by adopting the Markov decision process according to the optimal retail price decision strategy, and further the optimal energy consumption of the schedulable load is calculated according to the optimal retail price decision strategy, and accordingly the demand response is determined. According to the invention, the dynamic retail price pricing problem of the electric company is modeled as a Markov decision process, when the optimal retail price strategy is determined according to the optimal action value function, the influence of the current price on the instant response of the load is considered, and the influence of the current price on the load response in a future period is considered, so that the uncertainty and flexibility of the dynamic electric market can be truly depicted, and the accuracy of the response based on price demand is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only embodiments of the present invention, and other drawings can be obtained according to the disclosed drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a price demand response-based determination method disclosed in an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a price demand response-based determination system according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the invention discloses a method and a system for determining response based on price demand, wherein the method and the system are used for modeling a dynamic retail price pricing problem of an electric company as a Markov decision process, monitoring states of all load units at the current moment as a first state, selecting a target retail price formulating action at the current moment within an allowable retail price range by utilizing a price selection probability-greedy strategy, calculating immediate return of benefits after the target retail price formulating action is executed, monitoring states of all load units at the next moment at the current moment as a second state, immediately returning the states of all load units at the next moment and the benefits, updating a reference action value function obtained in the last iteration as a target action value function based on the first state, the second state and the target retail price formulating action, and when the current moment reaches a terminal moment and the absolute value of a difference value of the target action value function and the reference action value function is not larger than a difference threshold value, determining the optimal retail price strategy according to the optimal action value function by adopting the Markov decision process, and further calculating optimal energy consumption of a schedulable load according to the optimal retail price formulating strategy, thereby realizing the determination of response based on price demand. According to the invention, the dynamic retail price pricing problem of the electric company is modeled as a Markov decision process, when the optimal retail price strategy is determined according to the optimal action value function, the influence of the current price on the instant response of the load is considered, and the influence of the current price on the load response in a future period is considered, so that the uncertainty and flexibility of the dynamic electric market can be truly depicted, and the accuracy of the response based on price demand is improved.

It should be specifically noted that the price-based demand response to be protected by the present invention is specifically: problems with price demand response based in residential retail power markets. The retail power market includes a utility company and a finite setWherein N refers to the total number of load units in the retail power market. In practical application, the upper-layer power company establishes retail electricity prices for lower-layer load units of all services, and when the lower-layer load units receive retail electricity price signals, the lower-layer load units respond to the electricity prices in real time, so that own energy consumption strategies are determined and transmitted to the power company. Thus, in the residential retail power market framework, the goal of price demand based response is to +.>In coordination of limited set +.>To maximize the social benefits of the power system, including the utility of the utility company and the overall cost of the load side, where T represents the total number of time intervals.

Referring to fig. 1, a flowchart of a determining method based on price demand response, which is applied to a processor in an electric power company, is disclosed in an embodiment of the present invention, and the determining method includes:

Step S101, modeling a dynamic retail electricity price pricing problem of an electric company as a Markov decision process;

step S102, monitoring states of all load units at the current moment, marking the states as a first state, and selecting a target retail price making action at the current moment within an allowable retail price range by utilizing a price selection probability-greedy strategy;

due to the initial state s ^t Refers to the time index and the energy requirements e of all load units at the moment t _t And the information is stored in the computer as prior parameter data, so that the initial state of the initialization parameter data acquisition can be queried only by randomly selecting the initial time t.

In this embodiment, the initial state s is monitored ^t And selecting a target retail price formulating action a at the current moment within the allowable retail price range by utilizing a price selection probability-greedy strategy ^t The electricity price selection probability is expressed by epsilon, and the electricity price selection probability-greedy strategy is a criterion of electricity price selection, and has the following specific meaning: randomly selecting a retail electricity price θ from action set A with epsilon probability _t Or selecting retail electricity price theta corresponding to the maximum action value function with the probability of 1-epsilon _t 。

Step S103, calculating the immediate return of the income after the target retail electricity price making action is executed, monitoring the states of all load units at the next moment of the current moment, and marking the states as a second state;

in this embodiment, after the immediate return of the calculated benefit, the states s of all the load units at the next time t, i.e. time t+1, are also monitored ^t+1 。

Wherein the immediate return of revenue is calculated according to equation (1), equation (1) is as follows:

r ^t ＝ρU _t -(1-ρ)C _t (1)；

U _t The expression of (2) is shown in the formula (2), and the formula (2) is as follows:

in the method, in the process of the invention,for a non-schedulable load set, +.>Representing non-schedulable load +.>Retail electricity price received at time t, +.>Representing non-schedulable load +.>The energy consumption at the time t is marked with the upper index n representing the non-schedulable load mark, the lower index n representing the index of the load unit, the lower index t representing the index of the time index +.>Representing schedulable load +.>The energy consumption at time t, superscript d indicates the schedulable load identification, ++ >For a schedulable load set, +.>Representing schedulable load +.>Retail electricity price, eta, received at time t _t Representing schedulable load +.>Wholesale electricity price at time t and satisfies +.> The total electric energy purchased by the electric power company to the electric network operator at the time t is represented, and the superscript tot represents the total electric energy identifier.

C _t The expression of (3) is shown in the formula (3), and the formula (3) is as follows:

Step S104, updating a reference action value function into a target action value function based on the first state, the second state, the target retail electricity price making action and the immediate return of the benefit;

the reference action value function is an action value function obtained by the previous iteration;

target action value function Q ^k (s ^t ，a ^t ) The expression of (2) is shown in the formula (4), and the formula (4) is as follows:

in which Q ^k (s ^t ，a ^t ) As a function of the target action value, representing the state s from all load cells at the kth iteration ^t Starting, a target retail electricity price making action a is executed ^t Is defined as the cumulative future discount returns ofWhere γ represents the discount factor, e 0,1]Is the learning rate, representing newly obtained Q ^k Value pair Q ^k-1 Degree of coverage of value, Q ^k-1 (s ^t ，a ^t ) Representing the reference motion value function, s ^t+1 Representing the states of all load units at time t+1, a ^t+1 Retail electricity price system representing time t+1Fixed motion, Q ^k-1 (s ^t-1 ，a ^t+1 ) Representing the state s of k-1 iterations from all loads ^t+1 Proceeding, execution a ^t+1 Is provided for the accumulated future discount returns.

Step S105, judging whether the current moment reaches the terminal moment, if so, executing step S106;

when the current time T does not reach the terminal time T, the process returns to step S102.

Step S106, judging whether the absolute value of the difference between the target action value function and the reference action value function is not larger than a difference threshold value, if so, continuing to execute step S107;

in the present embodiment, when |Q ^k -Q ^k-1 When the delta is not more than the delta, continuing to execute the step S108, otherwise, return to step S102, wherein Q ^k-1 As a function of the reference motion value, Q ^k Delta is a difference threshold value as a function of the target motion value.

Step S107, taking the target action value function as an optimal action value function, and determining an optimal retail electricity price strategy according to the optimal action value function by adopting a Markov decision process;

specifically, the optimal retail electricity price strategy is shown in formula (5), and formula (5) is as follows:

in the formula, pi ^* (s ^t ) For optimal retail electricity price strategy, Q ^* (s ^t ，a ^t ) As the optimal action value function, A is an action set, and A= { a ¹ ，a ² ，…，a ^T }。

And step S108, calculating the optimal energy consumption of the schedulable load according to the optimal retail electricity price strategy.

Wherein, the expression of the optimal energy consumption is shown in formula (6), and formula (6) is as follows:

In summary, the invention discloses a price demand response-based determination method, which is characterized in that a dynamic retail price pricing problem of an electric company is modeled as a Markov decision process, states of all load units at the current moment are monitored as a first state, a target retail price formulating action at the current moment is selected by utilizing a price selection probability-greedy strategy within an allowable retail price range, immediate return of benefits after the target retail price formulating action is calculated and executed, states of all load units at the next moment at the current moment are monitored and recorded as a second state, a reference action value function obtained in the last iteration is updated as a target action value function based on the first state, the second state, the target retail price formulating action and the immediate return of benefits, when the current moment reaches a terminal moment and the absolute value of the difference value of the target action value function and the reference action value function is not larger than a difference threshold, the target action value function is used as an optimal action value function, the optimal retail price strategy is determined by adopting the Markov decision process according to the optimal action value function, and further optimal energy consumption of a schedulable load is calculated according to the optimal retail price decision strategy, so that the price demand response-based on the price demand response is determined. According to the invention, the dynamic retail price pricing problem of the electric company is modeled as a Markov decision process, when the optimal retail price strategy is determined according to the optimal action value function, the influence of the current price on the instant response of the load is considered, and the influence of the current price on the load response in a future period is considered, so that the uncertainty and flexibility of the dynamic electric market can be truly depicted, and the accuracy of the response based on price demand is improved.

In addition, the present invention utilizes reinforcement learning algorithms to address price-based demand response problems in unknown power market environments (i.e., retail electricity prices and load energy consumption are uncertain and random).

To further optimize the above embodiment, before step S102, the operation value function needs to be initialized, where the process of initializing the operation value function includes:

and acquiring known prior parameter data, bringing the prior parameter data into a predetermined action value function, and initializing the action value function.

Wherein the a priori parameter data comprises: energy requirement e of load unit _t Coefficient of dissatisfactionAnd->Coefficient of elasticity mu of electricity price _t Price eta of wholesale power generation _t Weights and methodsHeavy parameters ρ, etc., t represents time.

The initial value of the action value function being 0, i.e. Q ^k (s ^t ，a ^t ) At this time, the iteration number k takes a value of 1, i.e., k=1, and the time t takes a value of 1, i.e., t=1, i.e., Q ⁰ (s，a)＝0。

In this embodiment, the value range of the time t is: t=1, 2,..t, T represents the total number of time intervals.

The variable parameters in the action value function include: s is(s) ^t And a ^t ，s ^t Representing the state of all load units at time t, i.e. the energy demand e of all load units at time t _t Energy consumption p _t A time index t; a, a ^t Representing retail electricity price making action at time t, namely retail electricity price theta made by electric company for all load units at time t _t 。

It should be noted that, because the power system model relates to information interaction between the power company and the load unit, in order to facilitate understanding of the technical solution to be protected by the present invention, a mathematical model between the power company and the load unit is described below.

The loads are generally classified into two types according to the preference of users and the energy consumption characteristics of the loads, namely, the loads can be scheduledAnd non-schedulable load->That is to say->

(one) schedulable load: the energy consumption of a general schedulable load is expressed as shown in equation (7), equation (7) is as follows:

in the method, in the process of the invention,and->Respectively represent schedulable load->The energy consumption and the energy demand at the time t, wherein the energy demand refers to the electric energy expected to be consumed by the load unit before the retail electricity price signal is received, the energy consumption information refers to the electric energy actually consumed by the load unit after the retail electricity price signal is received, the subscript n represents the index of the load unit, the subscript t represents the index of the time index, and the superscript d represents the schedulable load identification. Mu (mu) _t The electricity rate elastic coefficient represents the ratio of energy demand to change with retail electricity rate change at time t. />And eta _t Respectively represent schedulable load->Retail electricity prices received at time t and wholesale electricity prices at time t, and satisfy +.>

Equation (7) shows that the actual energy consumption of the schedulable load depends not only on the energy demand information, but also on the reduction of energy demand due to retail price change. When the real energy consumption of the schedulable load unit n isWhen it means the remaining->Are not satisfied and thus may lead to dissatisfaction with the load user.

To characterize this degree of dissatisfaction, a dissatisfaction function shown in equation (8) is defined, equation (8) being as follows:

in the method, in the process of the invention,representing schedulable load +.>At time t the dissatisfaction caused by the reduced energy consumption requirement,and->Representing two dependence on schedulable load +.>Dissatisfaction coefficient of->Representing schedulable loadsIs decreased in demand, is->Representing schedulable load +.>At time t the energy demand, superscript d, indicates the schedulable load identification,/i>For a schedulable load set, +.>Representing schedulable load +.>Energy consumption at time t.

Equation (8) shows that a larger demand reduction results in a higher degree of dissatisfaction of the load cell.

Furthermore, the load may be scheduledThe required decrease amount of (2) cannot exceed the allowable range thereof, specifically as shown in the inequality (9), the inequality (9) is as follows:

in the method, in the process of the invention,and->Respectively represent schedulable load->And the minimum demand reduction amount and the maximum demand reduction amount of (c) are both known amounts. Once->And->Knowing, the schedulable load n true energy consumption can be determined accordinglyIs not limited in terms of the range of (a).

(II) non-schedulable load: generally, the energy requirements of non-dispatchable loads are not freely transferable and curtailable, so the energy requirements of these loads must be met strictly at any time.

Thus, forIn the invention, the formula (10) is satisfied, and the formula (10) is as follows:

in the method, in the process of the invention,and->Respectively represent non-schedulable load->Energy consumption and energy demand at time t, wherein superscript n denotes a non-schedulable load identification.

Thus, from a load point of view, the goal is to minimize the overall cost on the load side by determining the optimal energy consumption combination for all loads, i.e

Where p represents the energy consumption vector of all participating load units over the time period T,representing non-schedulable load +.>Retail electricity prices received at time t.

It can be seen that the above equation consists of two parts, corresponding to two types of loads, respectively. Concrete embodiments For the first itemRepresenting the cost of power for a non-dispatchable load to purchase power from the utility company, the second item +.>Representing the cost of electricity and unsatisfactory costs of a schedulable load purchasing power from an electric utility.

For convenience of subsequent discussion and writing, the comprehensive cost of the load side at the time t is defined as C _t Therefore, the above can be further described as

The utility company acts as an intermediary between the end user and the power producer, first buying power from the grid operator at a wholesale price and then selling the purchased power to different types of load units on the load side at a retail price. The goal of the utility company is therefore to maximize revenue by doing business in wholesale and retail markets, the mathematical model of which can be expressed as:

wherein θ represents a retail price vector formulated by the utility company for all load units throughout the time period T;represents the total power purchased by the utility company from the grid operator at time t, where the superscript tot represents the total power identification,θ _n andrepresenting the lower and upper bounds, respectively, of retail electricity prices established by the utility company for the load units. It can be seen that the objective function in the mathematical model consists of three terms, wherein +_>Representing the return of the utility to sell electrical energy to non-dispatchable load units,/- >Indicating the return of the utility company to sell electrical energy to the dispatchable load unit,/->Representing the cost of electricity for an electric utility to purchase electrical energy from a grid operator. Similarly, for convenience of presentation, the net benefit of the utility at time t is defined as U _t The objective function in the above formula is therefore further denoted +.>

Generally, when the power loss is not considered and the power balance criterion is followed, the total purchased power of the electric power company is equal to the total power consumption of the load side at any time, that is, as shown in the formula (11), the formula (11) is as follows:

in modeling utility and load units, it can be found that price-based demand response is closely related to utility revenue and load unit cost. From a social perspective, the goal of the system is therefore to maximize the social benefits including utility benefits and load composite costs, namely, as shown in equation (12), equation (12) is as follows:

wherein ρ ε [0,1] is a weight parameter representing the relative social value of the revenue of the utility company and the comprehensive cost of the load unit. The larger ρ means that from the social point of view, the more interest is paid to the benefits of the electric company; on the contrary, the influence of the comprehensive cost of the load on the social benefits is more concerned.

In order to formulate a dynamic retail electricity price which can adapt to flexible load change in an unknown electricity market environment, the invention firstly models the retail electricity market by utilizing a reinforcement learning framework.

Specifically, the electric company acts as an agent; all load units are used as environments; retail electricity prices are acted on the environment as agents; the energy demand, the energy consumption and the time of the load are taken as states; the social benefit (i.e., a weighted sum of utility returns and the overall cost of the load unit) is rewarded.

Second, dynamic retail electricity pricing problems are modeled using a Markov decision process, which is also typically the first step in using reinforcement learning algorithms. Without loss of generality, the markov decision process is represented by a five-tuple < S, a, R, P, γ >, wherein each element has the following meaning:

1) State set: s= { S ¹ ，s ² ，…，s ^T (s is therein ^t ＝(e _t ，p _t T) represents the state at time t, and the energy requirements e of all load units at time t _t Energy consumption p _t And the time index t.

2) Action set: a= { a ¹ ，a ² ，…，a ^T (wherein a) ^t ＝θ _t Representing the action of the intelligent agent at the moment t, namely retail electricity price theta established by the electric company for all load units at the moment t _t 。

3) A return set: r= { R ¹ ，r ² ，…，r ^T -where r ^t ＝ρU _t -(1-ρ)C _t And the return of the system at the time t is indicated, namely the social benefit at the current time.

4) State transition matrix:wherein->

Indicating the probability that the environment will transition to state s' at the next time after state s takes action a. Since the energy demand and energy consumption of a load are affected by many factors, it is difficult to obtain the state transition probability thereof. The power market environment is unknown in the present invention, so a model-free Q-learning approach is employed to solve the dynamic retail pricing problem.

5) Discount factor: gamma e 0,1 indicates the importance of the subsequent rewards relative to the current rewards.

Defining a strategy pi: S→A, i.e. mapping of states to actions, the pricing problem of retail electricity prices translates into finding an optimal strategy pi ^* Maximizing cumulative return of the system, i.eSince the goal of the system is to maximize the social benefit over the entire time period and the social value reported at any time is equal, γ=1 is taken in the present invention.

After modeling the dynamic retail price pricing problem for the utility company as a markov decision process, the utility company is analyzed by a Q-learning algorithm (a model-free reinforcement learning algorithm) for how to select the optimal retail price while interacting with all load units to achieve the utility system objective.

The basic principle of the Q-learning algorithm is to assign an action-value function Q (s, a) to each state-action pair (s, a), and then update the function in each iteration to obtain the optimal action-value function Q ^* (s, a). The optimal action value function is defined as taking action a from state s, then taking the optimal policy pi ^* And satisfies the bellman equation, i.e

Where S 'e S, a' e A represents the state at the next moment and the action taken, respectively, and r (S, a) represents the starting from state S, takingGet immediate return after action a, Q ^* (s ', a') means that starting from state s ', action a' is performed, after which an optimal policy pi is taken ^* Is the maximum cumulative future discount return. Gamma e [0,1 ]]The discount factor is represented, and the influence of the current retail price on the load instant response is considered in the algorithm, and the influence of the current retail price on the load response in a future period of time is reflected. Thus, once the optimal action value function Q is obtained ^* (s ^t ，a ^t ) The optimal retail price strategy shown in the formula (5) can be directly obtained according to the following formula.

Corresponding to the embodiment of the method, the invention also discloses a price demand response-based determination system.

Referring to fig. 2, a schematic structural diagram of a price demand response-based determination system according to an embodiment of the present invention is disclosed, the system being applied to a processor in an electric power company, the system comprising:

a modeling unit 201 for modeling a dynamic retail electricity pricing problem of an electric company as a markov decision process;

an action selecting unit 202, configured to monitor states of all load units at a current time, record the states as a first state, and select a target retail price setting action at the current time within an allowable retail price range by using a price selection probability-greedy strategy;

a return calculation unit 203, configured to calculate a return immediately after the target retail electricity price making action is executed, monitor the states of all load units at the time next to the current time, and record the states as a second state;

a function updating unit 204, configured to update a reference action value function to a target action value function based on the first state, the second state, the target retail price setting action, and the immediate return of the benefit, where the reference action value function is an action value function obtained in the last iteration;

a first determining unit 205, configured to determine whether the current time reaches a terminal time;

A second judging unit 206, configured to judge whether an absolute value of a difference between the target action value function and the reference action value function is not greater than a difference threshold value, if the first judging unit 205 judges yes;

an electricity price policy determining unit 207 configured to take the target action value function as an optimal action value function and determine an optimal retail electricity price policy according to the optimal action value function, if the second determining unit 206 determines that the target action value function is yes;

an energy consumption calculation unit 208 for calculating an optimal energy consumption amount of the schedulable load according to the optimal retail electricity price strategy.

The specific operation principle of each component in the system embodiment should be specifically described, please refer to the corresponding portion of the method embodiment, and the detailed description is omitted herein.

In summary, the invention discloses a price demand response-based determining system, which models a dynamic retail price pricing problem of an electric company as a Markov decision process, monitors states of all load units at the current moment as a first state, selects a target retail price formulating action at the current moment by using a price selection probability-greedy strategy within an allowable retail price range, calculates a return immediately after executing the target retail price formulating action, monitors states of all load units at the next moment at the current moment as a second state, and immediately returns the states of all load units at the current moment and the return, updates a reference action value function obtained in the last iteration as a target action value function based on the first state, the second state and the target retail price formulating action, and when the current moment reaches a terminal moment and a difference absolute value of the target action value function and the reference action value function is not larger than a difference threshold, determines an optimal retail price strategy by using the Markov decision process according to the optimal action value function, and further calculates optimal energy consumption of a schedulable load according to the optimal retail price decision strategy, thereby realizing the determination of price demand response. According to the invention, the dynamic retail price pricing problem of the electric company is modeled as a Markov decision process, when the optimal retail price strategy is determined according to the optimal action value function, the influence of the current price on the instant response of the load is considered, and the influence of the current price on the load response in a future period is considered, so that the uncertainty and flexibility of the dynamic electric market can be truly depicted, and the accuracy of the response based on price demand is improved.

It should be specifically noted that, outside the field of demand-side energy management listed in the present invention, the present invention may also be applied to decision problems in other unknown environments in the smart grid, such as power balance on both supply and demand sides, and optimal generator set scheduling problems.

The definition of the state space, the action space and the return in the Markov decision process is not unique, and the definition can be redefined according to other targets of the system or the individual; in addition, the selection of the learning rate in the Q-learning algorithm has a great influence on the convergence of the algorithm, so that the selection of the learning rate can be further analyzed and discussed.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of determining a price demand response, comprising:

judging whether the current moment reaches the terminal moment or not;

calculating the optimal energy consumption of the schedulable load according to the optimal retail electricity price strategy;

the expression of the optimal energy consumption is as follows:

in the method, in the process of the invention,the optimal energy consumption is represented, the subscript n represents a load unit index, the subscript t represents a time index, the subscript d represents a schedulable load identifier, and the load unit index is represented by +_ >Representing schedulable load +.>Energy demand at time t, +.>Representing schedulable load +.>Retail electricity price, eta, received at time t _t Representing schedulable load +.>Wholesale electricity price at time t and satisfies +.>μ _t The electricity price elastic coefficient is used for representing the ratio of energy demand to retail electricity price change at the time t;

the return on return r ^t The expression of (2) is as follows:

r ^t ＝ρU _t -(1-ρ)C _t ；

wherein ρ ε [0,1 ]]Is a weight parameter representing the phase of the return of the utility company and the integrated cost of the load unitFor social value, U _t Represents the net income of the electric company at the time t, C _t Representing the comprehensive cost of the load side at the time t;

net income U of electric power company at time t _t The expression of (2) is as follows:

in the method, in the process of the invention,for a non-schedulable load set, +.>Representing non-schedulable load +.>Retail electricity price received at time t, +.>Representing non-schedulable load +.>The energy consumption at the time t is marked with the upper index n representing the non-schedulable load mark, the lower index n representing the index of the load unit, the lower index t representing the index of the time index +.>Representing schedulable load +.>The energy consumption at time t, superscript d indicates the schedulable load identification, ++>For a schedulable load set, +.>Representing schedulable loadsRetail electricity price, eta, received at time t _t Representing schedulable load +.>Wholesale electricity prices at time t and satisfy The method comprises the steps that total electric energy purchased by an electric company to a power grid operator at the time t is represented, and a superscript tot represents a total electric energy identifier;

2. The method of claim 1, wherein the specific meaning of the price choosing probability-greedy strategy is: and randomly selecting a retail electricity price from the action set with epsilon probability, or selecting the retail electricity price corresponding to the maximum action value function with 1-epsilon probability, wherein epsilon represents the electricity price selection probability.

3. The method of determining according to claim 1, wherein the degree of dissatisfactionThe expression of (2) is as follows:

in the method, in the process of the invention,representing schedulable load +.>Dissatisfaction caused by reduced energy consumption demand at time t, < >>Andrepresenting two dependence on schedulable load +.>Dissatisfaction coefficient of->Representing schedulable load +.>Is decreased in demand, is->Representing schedulable load +.>At time t, the energy demand, superscript d, represents the schedulable load identification,for a schedulable load set, +. >Representing schedulable load +.>Energy consumption at time t.

4. A method of determining as claimed in claim 3, wherein the load is schedulableIs->The following inequality is satisfied:

5. The determination method according to claim 1, wherein the expression of the target action value function is as follows:

in which Q ^k (s ^t ，a ^t ) Representing the state s from all load units at the kth iteration as the target action value function ^t Starting, a target retail electricity price making action a is executed ^t Is defined as the cumulative future discount returns ofWherein gamma represents a discount factor, ">Is the learning rate, representing newly obtained Q ^k Value pair Q ^k-1 Degree of coverage of value, Q ^k-1 (s ^t ，a ^t ) Representing the reference motion value function, s ^t+1 Representing the states of all load units at time t+1, a ^t+1 Retail electricity price establishment operation at time t+1, Q ^k-1 (s ^t+1 ，a ^t+1 ) Representing the state s of k-1 iterations from all loads ^t+1 Proceeding, execution a ^t+1 Is provided for the accumulated future discount returns.

6. The method of determining according to claim 1, wherein the expression of the optimal retail price strategy is as follows:

7. The determination method according to claim 1, characterized in that the determination method further comprises: initializing the action value function, which specifically comprises:

8. A price demand response based determination system comprising:

the energy consumption calculation unit is used for calculating the optimal energy consumption of the schedulable load according to the optimal retail electricity price strategy;

The expression of the optimal energy consumption is as follows:

in the method, in the process of the invention,the optimal energy consumption is represented, the subscript n represents a load unit index, the subscript t represents a time index, the subscript d represents a schedulable load identifier, and the load unit index is represented by +_>Representing schedulable load +.>Energy demand at time t, +.>Representing schedulable load +.>Retail electricity price, eta, received at time t _t Representing schedulable load +.>The price of wholesale electricity at time t,and satisfy->μ _t The electricity price elastic coefficient is used for representing the ratio of energy demand to retail electricity price change at the time t;

the return on return r ^t The expression of (2) is as follows:

r ^t ＝ρU _t -(1-ρ)C _t ；

wherein ρ ε [0,1 ]]Is a weight parameter representing the relative social value of the profit of the electric company and the comprehensive cost of the load unit, U _t Represents the net income of the electric company at the time t, C _t Representing the comprehensive cost of the load side at the time t;

in the method, in the process of the invention,for a non-schedulable load set, +.>Representing non-schedulable load +.>Retail electricity price received at time t, +.>Representing non-schedulable load +.>The energy consumption at the time t is marked with the upper index n representing the non-schedulable load mark, the lower index n representing the index of the load unit, and the lower index t representing the time Index between->Representing schedulable load +.>The energy consumption at time t, superscript d indicates the schedulable load identification, ++>For a schedulable load set, +.>Representing schedulable loadsRetail electricity price, eta, received at time t _t Representing schedulable load +.>Wholesale electricity prices at time t and satisfy The method comprises the steps that total electric energy purchased by an electric company to a power grid operator at the time t is represented, and a superscript tot represents a total electric energy identifier;