CN113052638A

CN113052638A - Price demand response-based determination method and system

Info

Publication number: CN113052638A
Application number: CN202110366209.XA
Authority: CN
Inventors: 秦家虎 (Qin Jiahu); 万艳妮 (Wan Yanni)
Original assignee: University of Science and Technology of China (USTC)
Current assignee: University of Science and Technology of China (USTC)
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2021-06-29
Anticipated expiration: 2041-04-06
Also published as: CN113052638B

Abstract

The invention discloses a price demand response-based determination method and system. The dynamic retail electricity pricing problem of a utility company is modeled as a Markov decision process. Based on the states of all load units at the current moment, a retail-price-setting action is selected with an ε-greedy price selection strategy (ε being the price selection probability), and the immediate revenue return and the states of all load units at the next moment are obtained. The reference action value function obtained in the previous iteration is then updated into a target action value function. When the current moment reaches the terminal moment and the absolute difference between the target and reference action value functions does not exceed a difference threshold, the target action value function is taken as the optimal action value function, from which the optimal retail electricity price strategy is determined; the optimal energy consumption of the schedulable loads is then calculated. Because the action value function accounts for the influence of the current electricity price both on the immediate load response and on the load response over a future period, the accuracy of price-based demand response is improved.

Description

Price demand response-based determination method and system

Technical Field

The invention relates to the technical field of power grids, in particular to a price demand response-based determination method and system.

Background

The smart grid is a typical cyber-physical system that integrates advanced sensing, control and communication technologies into the physical power system to provide a reliable energy supply, encourage active participation of loads and keep the grid operating stably. Building on this cyber-physical integration, power demand response has become a research hotspot in energy management. Its aim is to change how loads use energy in response to time-varying electricity prices or reward/penalty incentives, for example to reduce energy costs on the demand side. In other words, demand response reshapes load energy usage through price or incentive signals to achieve more efficient energy management.

Existing research focuses mainly on two branches of demand response: price-based demand response and incentive-based demand response. Price-based demand response, a commonly used form, aims to change the energy usage patterns of end users through time-varying electricity prices such as time-of-use pricing and real-time pricing.

Most existing price-based demand response schemes rely on deterministic price mechanisms, such as time-of-use pricing, day-ahead pricing or linear price models. Such deterministic mechanisms cannot faithfully characterize the uncertainty and flexibility of a dynamic electricity market, which limits the accuracy of existing price-based demand response methods.

Disclosure of Invention

In view of the above, the invention discloses a price demand response-based determination method and system that faithfully capture the uncertainty and flexibility of the dynamic electricity market and improve the accuracy of price-based demand response.

A price demand response based determination method, comprising:

modeling the dynamic retail electricity pricing problem of a utility company as a Markov decision process;

monitoring the states of all load units at the current moment, recording them as a first state, and selecting a target retail-price-setting action for the current moment within an allowable retail price range by using an ε-greedy price selection strategy;

calculating the immediate revenue return after the target retail-price-setting action is executed, monitoring the states of all load units at the moment following the current moment, and recording them as a second state;

updating a reference action value function into a target action value function based on the first state, the second state, the target retail-price-setting action and the immediate revenue return, wherein the reference action value function is the action value function obtained in the previous iteration;

judging whether the current moment has reached the terminal moment;

if so, judging whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold;

if so, taking the target action value function as the optimal action value function, and determining the optimal retail electricity price strategy according to the optimal action value function;

and calculating the optimal energy consumption of the schedulable loads according to the optimal retail electricity price strategy.

Optionally, the ε-greedy price selection strategy means the following: with probability ε, a retail price is selected at random from the action set; with probability 1 − ε, the retail price with the largest action value is selected, where ε denotes the price selection probability.
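The ε-greedy rule can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the allowable retail price range has been discretized into a finite list of candidate prices and that the action values of the current state are held in a dictionary keyed by price; `select_price` and its parameters are hypothetical names.

```python
import random


def select_price(q_row, actions, epsilon, rng=random):
    """Epsilon-greedy choice of a retail price (hypothetical helper).

    q_row   -- dict: candidate price -> current action value for this state
               (prices missing from the dict are treated as value 0.0)
    actions -- list of allowed retail prices (assumed discretization)
    epsilon -- price selection probability in [0, 1]
    """
    if rng.random() < epsilon:
        # Explore: any price in the action set, uniformly at random.
        return rng.choice(actions)
    # Exploit: price with the largest action value (first one on ties).
    return max(actions, key=lambda a: q_row.get(a, 0.0))
```

With `epsilon=0` the rule is purely greedy; a larger ε spends more steps trying prices whose action values are still uncertain.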

Optionally, the immediate revenue return $r^t$ is expressed as follows:

$$r^t = \rho U_t - (1-\rho) C_t$$

where $\rho\in[0,1]$ is a weight parameter expressing the relative social value of the utility's revenue and of the combined cost of the load units, $U_t$ is the net income of the utility at time t, and $C_t$ is the combined cost of the load side at time t.

Optionally, the net income $U_t$ of the utility at time t is expressed as follows:

$$U_t = \sum_{n\in\mathcal{N}^n}\theta^n_{n,t}\,p^n_{n,t} + \sum_{n\in\mathcal{N}^d}\theta^d_{n,t}\,p^d_{n,t} - \eta_t\,p^{tot}_t$$

where $\mathcal{N}^n$ is the set of non-schedulable loads; $\theta^n_{n,t}$ is the retail electricity price received by non-schedulable load n at time t; $p^n_{n,t}$ is the energy consumption of non-schedulable load n at time t; the superscript n denotes the non-schedulable load identifier, the subscript n the load-unit index and the subscript t the time index; $p^d_{n,t}$ is the energy consumption of schedulable load n at time t, the superscript d denoting the schedulable load identifier; $\mathcal{N}^d$ is the set of schedulable loads; $\theta^d_{n,t}$ is the retail electricity price received by schedulable load n at time t and $\eta_t$ is the wholesale electricity price at time t, the retail price lying within the allowable retail price range; and $p^{tot}_t$ is the total electric energy the utility purchases from the grid operator at time t, the superscript tot denoting the total-energy identifier.

The combined cost $C_t$ of the load side at time t is expressed as follows:

$$C_t = \sum_{n\in\mathcal{N}^n}\theta^n_{n,t}\,p^n_{n,t} + \sum_{n\in\mathcal{N}^d}\left(\theta^d_{n,t}\,p^d_{n,t} + \phi^d_{n,t}\right)$$

where $\phi^d_{n,t}$ is the dissatisfaction of schedulable load n caused by reducing its energy demand at time t.
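The following sketch evaluates the utility's net income, the load-side combined cost and the immediate reward for a single time slot. It assumes $U_t$ is the retail revenue from all loads minus $\eta_t p^{tot}_t$, that $C_t$ is the retail payments of all loads plus the schedulable loads' dissatisfaction, and, as an added assumption, that the purchased energy $p^{tot}_t$ equals the total consumption of all loads. The function name and the one-list-entry-per-load layout are hypothetical.

```python
def immediate_reward(theta_n, p_n, theta_d, p_d, phi_d, eta, rho):
    """r^t = rho * U_t - (1 - rho) * C_t for one time slot (hypothetical helper).

    theta_n, p_n        -- retail prices / consumptions of non-schedulable loads
    theta_d, p_d, phi_d -- retail prices / consumptions / dissatisfaction
                           of schedulable loads
    eta                 -- wholesale electricity price at time t
    rho                 -- weight parameter in [0, 1]
    """
    revenue_n = sum(th * p for th, p in zip(theta_n, p_n))
    revenue_d = sum(th * p for th, p in zip(theta_d, p_d))
    # Assumption: energy bought from the grid operator equals total consumption.
    p_tot = sum(p_n) + sum(p_d)
    u_t = revenue_n + revenue_d - eta * p_tot   # utility net income U_t
    c_t = revenue_n + revenue_d + sum(phi_d)    # load-side combined cost C_t
    return rho * u_t - (1 - rho) * c_t
```

For example, `immediate_reward([1.0], [2.0], [1.0], [3.0], [0.5], eta=0.5, rho=0.5)` gives $U_t = 2.5$, $C_t = 5.5$ and $r^t = -1.5$. Under these assumptions the retail payments cancel at ρ = 0.5, so the reward then depends only on the wholesale cost and the dissatisfaction.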

Optionally, the dissatisfaction $\phi^d_{n,t}$ is expressed as follows:

$$\phi^d_{n,t} = \frac{\alpha_n}{2}\left(\Delta^d_{n,t}\right)^2 + \beta_n\,\Delta^d_{n,t},\qquad \Delta^d_{n,t} = e^d_{n,t} - p^d_{n,t},\quad n\in\mathcal{N}^d$$

where $\phi^d_{n,t}$ is the dissatisfaction of schedulable load n caused by reducing its energy demand at time t; $\alpha_n$ and $\beta_n$ are two load-dependent dissatisfaction coefficients of schedulable load n; $\Delta^d_{n,t}$ is the demand reduction of schedulable load n; $e^d_{n,t}$ is the energy demand of schedulable load n at time t, the superscript d denoting the schedulable load identifier; $\mathcal{N}^d$ is the set of schedulable loads; and $p^d_{n,t}$ is the energy consumption of schedulable load n at time t.
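A minimal sketch of the dissatisfaction term, assuming the quadratic form $\phi^d_{n,t} = \frac{\alpha_n}{2}(\Delta^d_{n,t})^2 + \beta_n\Delta^d_{n,t}$ with $\Delta^d_{n,t} = e^d_{n,t} - p^d_{n,t}$, a form commonly used for load-dissatisfaction models; treat it as an assumption. The helper name is hypothetical.

```python
def dissatisfaction(e, p, alpha, beta):
    """Quadratic dissatisfaction of one schedulable load (assumed form).

    e     -- energy demand e^d_{n,t}
    p     -- actual energy consumption p^d_{n,t}
    alpha -- quadratic dissatisfaction coefficient (alpha_n)
    beta  -- linear dissatisfaction coefficient (beta_n)
    """
    delta = e - p  # demand reduction Delta^d_{n,t}
    return 0.5 * alpha * delta ** 2 + beta * delta
```

With α_n = 1 and β_n = 0.5, cutting a demand of 10 down to 8 yields 3.0 and cutting it to 7 yields 6.0, so a larger reduction produces greater dissatisfaction.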

Optionally, the demand reduction $\Delta^d_{n,t}$ of schedulable load n satisfies the following inequality:

$$\Delta^{\min}_{n} \le \Delta^d_{n,t} \le \Delta^{\max}_{n}$$

where $\Delta^{\min}_{n}$ and $\Delta^{\max}_{n}$ are respectively the minimum and maximum demand reductions of schedulable load n, and both are known.

Optionally, the target action value function is expressed as follows:

$$Q^k(s^t,a^t) = (1-e)\,Q^{k-1}(s^t,a^t) + e\left[r^t + \gamma\max_{a^{t+1}\in A} Q^{k-1}(s^{t+1},a^{t+1})\right]$$

where the target action value function $Q^k(s^t,a^t)$ is the cumulative discounted future return obtained at the k-th iteration by starting from the state $s^t$ of all load units and executing the target retail-price-setting action $a^t$, the cumulative discounted future return being defined as $\sum_{i=0}^{T-t}\gamma^{i}\,r^{t+i}$; γ is the discount factor; $e\in[0,1]$ is the learning rate, i.e., the degree to which the newly obtained $Q^k$ value overrides the $Q^{k-1}$ value; $Q^{k-1}(s^t,a^t)$ is the reference action value function; $s^{t+1}$ is the state of all load units at time t+1; $a^{t+1}$ is the retail-price-setting action at time t+1; and $Q^{k-1}(s^{t+1},a^{t+1})$ is the cumulative discounted future return at iteration k−1 obtained by starting from state $s^{t+1}$ and executing $a^{t+1}$.
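A minimal sketch of one update step, assuming a tabular action value function stored as a dictionary keyed by (state, price) and the standard Q-learning form in which the next-step value is maximized over the action set. The parameter `lr` stands for the learning rate e; the `terminal` flag (no future return after the last time slot) is an added convenience. All names are hypothetical.

```python
def q_update(q_prev, s, a, r, s_next, actions, lr, gamma, terminal=False):
    """Return Q^k from Q^{k-1} after observing (s, a, r, s_next).

    q_prev   -- dict (state, action) -> value; missing keys read as 0.0
    lr       -- learning rate in [0, 1] (the text's e)
    gamma    -- discount factor
    terminal -- True at the last time slot, where no future return remains
    """
    q_new = dict(q_prev)  # keep Q^{k-1} intact as the reference function
    best_next = 0.0 if terminal else max(q_prev.get((s_next, b), 0.0) for b in actions)
    target = r + gamma * best_next
    q_new[(s, a)] = (1 - lr) * q_prev.get((s, a), 0.0) + lr * target
    return q_new
```

Returning a copy rather than mutating `q_prev` keeps the reference function available for the convergence test $|Q^k - Q^{k-1}| \le \delta$.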

Optionally, the optimal retail electricity price strategy is expressed as follows:

$$\pi^*(s^t) = \arg\max_{a^t\in A} Q^*(s^t,a^t)$$

where $\pi^*(s^t)$ is the optimal retail electricity price strategy, $Q^*(s^t,a^t)$ is the optimal action value function, A is the action set, $A = \{a^1, a^2, \dots, a^T\}$, the time t takes the values t = 1, 2, …, T, with T the total number of time intervals, $s^t$ is the state of all load units at time t, and $a^t$ is the retail-price-setting action at time t.
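Once the action values have converged, extracting the optimal retail price strategy is a per-state argmax. A minimal sketch over a tabular dictionary keyed by (state, price); the function name and table layout are hypothetical.

```python
def greedy_policy(q, states, actions):
    """pi*(s) = argmax_a Q*(s, a) for each state.

    q -- dict (state, action) -> value; missing entries read as 0.0.
    Ties go to the first action listed in `actions`.
    """
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}
```

For a table with $Q^*(0, 0.5) = 1$, $Q^*(0, 0.6) = 2$, $Q^*(1, 0.5) = 3$ and $Q^*(1, 0.6) = 0$, the strategy sets price 0.6 in state 0 and price 0.5 in state 1.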

Optionally, the optimal energy consumption is expressed as follows:

$$p^{d*}_{n,t} = e^d_{n,t}\left(1 + \mu_t\,\frac{\theta^d_{n,t} - \eta_t}{\eta_t}\right)$$

where $p^{d*}_{n,t}$ is the optimal energy consumption, the subscript n denotes the load-unit index, the subscript t the time index and the superscript d the schedulable load identifier; $e^d_{n,t}$ is the energy demand of schedulable load n at time t; $\theta^d_{n,t}$ is the retail electricity price received by schedulable load n at time t under the optimal retail electricity price strategy and $\eta_t$ is the wholesale electricity price at time t, the retail price lying within the allowable retail price range; and $\mu_t$ is the price elasticity coefficient, i.e., the rate at which the energy demand changes with the retail electricity price at time t.
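A sketch of the consumption response used to obtain the optimal energy consumption, assuming the linear price-elasticity form $p^d_{n,t} = e^d_{n,t}\bigl(1 + \mu_t(\theta^d_{n,t} - \eta_t)/\eta_t\bigr)$ with a negative $\mu_t$, so that a higher retail price lowers consumption. Plugging in the price chosen by the optimal strategy gives the optimal consumption. The helper name is hypothetical.

```python
def schedulable_consumption(e, theta, eta, mu):
    """Price-elastic energy consumption of one schedulable load (assumed form).

    e     -- energy demand before the price signal
    theta -- retail price received (e.g. from the optimal strategy)
    eta   -- wholesale price, must be positive
    mu    -- price elasticity coefficient (typically negative)
    """
    if eta <= 0:
        raise ValueError("wholesale price must be positive")
    return e * (1 + mu * (theta - eta) / eta)
```

For e = 10, η = 0.5 and μ = −0.5, a retail price of 0.6 gives about 9.0 and a price of 0.8 gives about 7.0, while a retail price equal to the wholesale price leaves consumption at the demand.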

Optionally, the determination method further includes initializing the action value function, which specifically includes:

obtaining known prior parameter data, substituting the prior parameter data into the predetermined action value function, and initializing the action value function with an initial value of 0.

A price demand response based determination system comprising:

the modeling unit is used for modeling the dynamic retail electricity pricing problem of the utility company as a Markov decision process;

the action selection unit is used for monitoring the states of all load units at the current moment, recording them as a first state, and selecting a target retail-price-setting action for the current moment within an allowable retail price range by using the ε-greedy price selection strategy;

the return calculation unit is used for calculating the immediate revenue return after the target retail-price-setting action is executed, monitoring the states of all load units at the moment following the current moment, and recording them as a second state;

the function updating unit is used for updating a reference action value function into a target action value function based on the first state, the second state, the target retail-price-setting action and the immediate revenue return, wherein the reference action value function is the action value function obtained in the previous iteration;

the first judgment unit is used for judging whether the current moment has reached the terminal moment;

the second judgment unit is used for judging, when the first judgment unit judges yes, whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold;

the electricity price strategy determination unit is used for taking the target action value function as the optimal action value function when the second judgment unit judges yes, and determining the optimal retail electricity price strategy according to the optimal action value function;

and the energy consumption calculation unit is used for calculating the optimal energy consumption of the schedulable loads according to the optimal retail electricity price strategy.

It can be seen from the above technical solutions that the invention discloses a price demand response-based determination method and system. The dynamic retail electricity pricing problem of the utility company is modeled as a Markov decision process. The states of all load units at the current moment are monitored and recorded as a first state, and a target retail-price-setting action for the current moment is selected within the allowable retail price range with the ε-greedy price selection strategy. After the action is executed, the immediate revenue return is calculated, and the states of all load units at the next moment are monitored and recorded as a second state. Based on the first state, the second state, the action and the immediate revenue return, the reference action value function from the previous iteration is updated into a target action value function. When the current moment reaches the terminal moment and the absolute difference between the target and reference action value functions does not exceed the difference threshold, the target action value function is taken as the optimal action value function; the optimal retail electricity price strategy is determined from it, and the optimal energy consumption of the schedulable loads is then calculated from that strategy.
Because the pricing problem is modeled as a Markov decision process, the optimal retail electricity price strategy derived from the optimal action value function accounts not only for the immediate response of the loads to the current price but also for their response over a future period. This allows the uncertainty and flexibility of the dynamic electricity market to be captured more faithfully and improves the accuracy of price-based demand response.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the disclosed drawings without creative efforts.

FIG. 1 is a flowchart of a price demand response-based determination method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a price demand response-based determination system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiments of the invention disclose a price demand response-based determination method and system. The utility's dynamic retail pricing problem is modeled as a Markov decision process; at each moment the states of all load units are monitored, a retail-price-setting action is selected with the ε-greedy strategy within the allowable price range, the immediate revenue return and the next states are observed, and the action value function from the previous iteration is updated. Once the terminal moment is reached and the change of the action value function does not exceed the difference threshold, the result is taken as the optimal action value function, from which the optimal retail electricity price strategy and then the optimal energy consumption of the schedulable loads are determined.

It should be noted that the price-based demand response protected by the invention is specifically a price-based demand response problem for the residential retail electricity market. This market comprises one utility company and a finite set of load units $\mathcal{N} = \{1, 2, \dots, N\}$, where N is the total number of load units in the retail electricity market. In practice, the upper-level utility sets retail electricity prices for all the lower-level load units it serves; on receiving the retail price signals, the load units respond to the prices in real time, determine their own energy consumption strategies and report them to the utility. Thus, in the residential retail market framework, the goal of price-based demand response is to coordinate the energy consumption of the finite set of load units through dynamic retail prices over a finite horizon $\mathcal{T} = \{1, 2, \dots, T\}$, so as to maximize the social welfare of the power system (comprising the utility's revenue and the combined cost of the load side), where T is the total number of time intervals.

Referring to FIG. 1, which is a flowchart of a price demand response-based determination method according to an embodiment of the present invention, the method is applied to a processor of the utility company and comprises:

step S101, modeling a dynamic retail electricity price pricing problem of an electric power company into a Markov decision process;

step S102, monitoring the states of all load units at the current moment, recording the states as a first state, and selecting a target retail price making action at the current moment by using a price selection probability-greedy strategy within an allowable retail price range;

Because the initial state $s^t$ consists of the time index and the energy demand $e_t$ of all load units at time t, and this information is stored in the computer as prior parameter data, the initial state can be looked up in the prior parameter data by randomly selecting an initial time t.

In this embodiment, the initial state $s^t$ is monitored, and the target retail-price-setting action $a^t$ for the current moment is selected within the allowable retail price range with the ε-greedy price selection strategy, where ε denotes the price selection probability. This strategy is the price selection rule: with probability ε, a retail price $\theta_t$ is selected at random from the action set A; with probability 1 − ε, the retail price $\theta_t$ with the largest action value is selected.

Step S103, calculating the immediate revenue return after the target retail-price-setting action is executed, monitoring the states of all load units at the moment following the current moment, and recording them as a second state;

In this embodiment, after the immediate revenue return is calculated, the states $s^{t+1}$ of all load units at the moment following the current time t, i.e., at time t+1, are monitored.

The immediate revenue return is calculated according to formula (1):

$$r^t = \rho U_t - (1-\rho) C_t \tag{1}$$

$U_t$ is given by equation (2):

$$U_t = \sum_{n\in\mathcal{N}^n}\theta^n_{n,t}\,p^n_{n,t} + \sum_{n\in\mathcal{N}^d}\theta^d_{n,t}\,p^d_{n,t} - \eta_t\,p^{tot}_t \tag{2}$$

where $\mathcal{N}^n$ is the set of non-schedulable loads; $\theta^n_{n,t}$ and $p^n_{n,t}$ are respectively the retail electricity price received by and the energy consumption of non-schedulable load n at time t; $\mathcal{N}^d$ is the set of schedulable loads; $\theta^d_{n,t}$ and $p^d_{n,t}$ are respectively the retail electricity price received by and the energy consumption of schedulable load n at time t; $\eta_t$ is the wholesale electricity price at time t, the retail prices lying within the allowable retail price range; and $p^{tot}_t$ is the total electric energy the utility purchases from the grid operator at time t, the superscript tot denoting the total-energy identifier.

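$C_t$ is given by equation (3):

$$C_t = \sum_{n\in\mathcal{N}^n}\theta^n_{n,t}\,p^n_{n,t} + \sum_{n\in\mathcal{N}^d}\left(\theta^d_{n,t}\,p^d_{n,t} + \phi^d_{n,t}\right) \tag{3}$$

where $\phi^d_{n,t}$ is the dissatisfaction of schedulable load n caused by reducing its energy demand at time t.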

Step S104, updating a reference action value function into a target action value function based on the first state, the second state, the target retail price making action and the income immediate return;

the reference action value function is an action value function obtained by last iteration;

The target action value function $Q^k(s^t,a^t)$ is given by equation (4):

$$Q^k(s^t,a^t) = (1-e)\,Q^{k-1}(s^t,a^t) + e\left[r^t + \gamma\max_{a^{t+1}\in A} Q^{k-1}(s^{t+1},a^{t+1})\right] \tag{4}$$

where $Q^k(s^t,a^t)$ is the cumulative discounted future return obtained at the k-th iteration by starting from the state $s^t$ of all load units and executing the target retail-price-setting action $a^t$, the cumulative discounted future return being defined as $\sum_{i=0}^{T-t}\gamma^{i}\,r^{t+i}$; γ is the discount factor; $e\in[0,1]$ is the learning rate, i.e., the degree to which the newly obtained $Q^k$ value overrides the $Q^{k-1}$ value; $Q^{k-1}(s^t,a^t)$ is the reference action value function; $s^{t+1}$ is the state of all load units at time t+1; $a^{t+1}$ is the retail-price-setting action at time t+1; and $Q^{k-1}(s^{t+1},a^{t+1})$ is the cumulative discounted future return at iteration k−1 obtained by starting from state $s^{t+1}$ and executing $a^{t+1}$.

Step S105, judging whether the current time reaches the terminal time, if so, executing step S106;

If the current time t has not reached the terminal time T, the process returns to step S102.

Step S106, judging whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold value, if so, continuing to execute step S107;

In this embodiment, when $|Q^k - Q^{k-1}| \le \delta$, step S107 is executed; otherwise, the process returns to step S102. Here $Q^{k-1}$ is the reference action value function, $Q^k$ is the target action value function, and δ is the difference threshold.
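Steps S102 to S107 can be put together in a toy, self-contained loop. This is a sketch under several stated assumptions, not the patent's implementation: a single schedulable load; a state reduced to the time index t (the patent's state also carries demand and consumption); a finite list of candidate prices; the linear price-elasticity response $p = e(1 + \mu(\theta - \eta)/\eta)$ clipped to $0 \le e - p \le \Delta^{\max}$; the quadratic dissatisfaction $\frac{\alpha}{2}\Delta^2 + \beta\Delta$; purchased energy equal to consumption; and a Q-learning update with a max over next prices. `train_pricing_q` and all of its parameters are hypothetical.

```python
import random


def train_pricing_q(demand, wholesale, prices, mu, alpha, beta, rho,
                    delta_max, epsilon=0.1, lr=0.5, gamma=0.9,
                    tol=1e-4, max_episodes=5000, seed=0):
    """Toy tabular Q-learning for steps S102-S107 (hypothetical setup).

    One schedulable load; state = time index t; action = a retail price
    from `prices`. Returns (greedy price per t, Q table, iterations used).
    """
    rng = random.Random(seed)
    T = len(demand)
    q = {(t, a): 0.0 for t in range(T) for a in prices}  # Q^0 = 0
    k = 0
    while k < max_episodes:
        k += 1
        q_ref = dict(q)  # reference action value function Q^{k-1}
        for t in range(T):
            # S102: epsilon-greedy price selection
            if rng.random() < epsilon:
                a = rng.choice(prices)
            else:
                a = max(prices, key=lambda x: q[(t, x)])
            # S103: assumed load response and immediate reward
            e, eta = demand[t], wholesale[t]
            p = e * (1 + mu * (a - eta) / eta)
            p = min(max(p, e - delta_max), e)  # keep 0 <= e - p <= delta_max
            red = e - p
            phi = 0.5 * alpha * red ** 2 + beta * red
            u = (a - eta) * p  # U_t with purchased energy = consumption
            c = a * p + phi    # C_t
            r = rho * u - (1 - rho) * c
            # S104: each (t, a) is updated at most once per pass and t + 1 has
            # not been visited yet, so the right-hand side still equals Q^{k-1}
            best_next = 0.0 if t == T - 1 else max(q[(t + 1, b)] for b in prices)
            q[(t, a)] = (1 - lr) * q[(t, a)] + lr * (r + gamma * best_next)
        # S105/S106: terminal time reached, compare Q^k with Q^{k-1}
        if max(abs(q[key] - q_ref[key]) for key in q) <= tol:
            break
    # S107: greedy strategy from the converged action values
    policy = [max(prices, key=lambda x: q[(t, x)]) for t in range(T)]
    return policy, q, k
```

Because each state t is visited once per pass, updating the table in place leaves every right-hand-side value at its iteration-(k−1) value, so the loop matches the $Q^k$/$Q^{k-1}$ form of step S104. With ε > 0 the stopping test may fire while rarely explored prices are still poorly estimated; `max_episodes` only guards against non-termination.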

Step S107, taking the target action value function as an optimal action value function, and determining an optimal retail electricity price strategy according to the optimal action value function by adopting a Markov decision process;

Specifically, the optimal retail electricity price strategy is given by formula (5):

$$\pi^*(s^t) = \arg\max_{a^t\in A} Q^*(s^t,a^t) \tag{5}$$

where $\pi^*(s^t)$ is the optimal retail electricity price strategy, $Q^*(s^t,a^t)$ is the optimal action value function, and A is the action set, $A = \{a^1, a^2, \dots, a^T\}$.

Step S108, calculating the optimal energy consumption of the schedulable loads according to the optimal retail electricity price strategy.

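The optimal energy consumption is given by formula (6):

$$p^{d*}_{n,t} = e^d_{n,t}\left(1 + \mu_t\,\frac{\theta^d_{n,t} - \eta_t}{\eta_t}\right) \tag{6}$$

where $p^{d*}_{n,t}$ is the optimal energy consumption of schedulable load n at time t, $e^d_{n,t}$ is its energy demand at time t, $\theta^d_{n,t}$ is the retail electricity price it receives at time t under the optimal retail electricity price strategy, $\eta_t$ is the wholesale electricity price at time t, the retail price lies within the allowable retail price range, and $\mu_t$ is the price elasticity coefficient, i.e., the rate at which the energy demand changes with the retail price at time t.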

To sum up, the invention discloses a price demand response-based determination method in which the utility's dynamic retail pricing problem is modeled as a Markov decision process, the action value function is learned iteratively with ε-greedy price selection until its change at the terminal moment does not exceed the difference threshold, and the resulting optimal retail electricity price strategy is used to calculate the optimal energy consumption of the schedulable loads.

In addition, the present invention utilizes a reinforcement learning algorithm to solve the price-based demand response problem in an unknown electricity market environment (i.e., retail electricity prices and load energy consumption are uncertain and random).

To further optimize the above embodiment, before step S102, the action value function needs to be initialized, and the process of initializing the action value function includes:

obtaining known prior parameter data, substituting the prior parameter data into a predetermined action value function, and initializing the action value function.

The prior parameter data include the energy demand $e_t$ of the load units, the dissatisfaction coefficients $\alpha_n$ and $\beta_n$, the price elasticity coefficient $\mu_t$, the wholesale electricity price $\eta_t$, the weight parameter ρ, and so on, where t denotes time.

The action value function is initialized to 0, i.e., $Q^0(s, a) = 0$, with the iteration counter starting at k = 1 and the time at t = 1.

In this embodiment, the time t takes the values t = 1, 2, …, T, where T is the total number of time intervals.

The variable parameters of the action value function are $s^t$ and $a^t$. $s^t$ is the state of all load units at time t, i.e., the energy demand $e_t$ and the energy consumption $p_t$ of all load units at time t together with the time index t; $a^t$ is the retail-price-setting action at time t, i.e., the retail prices $\theta_t$ set by the utility for all load units at time t.

It should be noted that, since the power system model relates to information interaction between the power company and the load unit, in order to facilitate understanding of the technical solution to be protected by the present invention, a mathematical model between the power company and the load unit is described below.

According to user preferences and the energy consumption characteristics of the loads, loads are generally divided into two categories, schedulable loads $n\in\mathcal{N}^d$ and non-schedulable loads $n\in\mathcal{N}^n$, that is, $\mathcal{N} = \mathcal{N}^d \cup \mathcal{N}^n$.

(I) Schedulable loads: the energy consumption of a typical schedulable load $n\in\mathcal{N}^d$ is expressed by equation (7):

$$p^d_{n,t} = e^d_{n,t}\left(1 + \mu_t\,\frac{\theta^d_{n,t} - \eta_t}{\eta_t}\right) \tag{7}$$

where $p^d_{n,t}$ and $e^d_{n,t}$ are respectively the energy consumption and the energy demand of schedulable load n at time t. The energy demand is the electric energy the load unit expects to consume before receiving the retail price signal, and the energy consumption is the electric energy it actually consumes after receiving the signal; the subscript n denotes the load-unit index, the subscript t the time index and the superscript d the schedulable load identifier. $\mu_t$ is the price elasticity coefficient, i.e., the rate at which the energy demand changes with the retail price at time t. $\theta^d_{n,t}$ and $\eta_t$ are respectively the retail electricity price received by schedulable load n at time t and the wholesale electricity price at time t, and the retail price lies within the allowable retail price range.

Equation (7) shows that the actual energy consumption of a schedulable load depends not only on its energy demand but also on the demand reduction caused by changes in the retail electricity price. When the actual energy consumption of schedulable load unit n is $p^d_{n,t} < e^d_{n,t}$, the remaining demand $e^d_{n,t} - p^d_{n,t}$ is not met, which may cause dissatisfaction for the load user.

To characterize this dissatisfaction, a dissatisfaction function is defined by equation (8):

$$\phi^d_{n,t} = \frac{\alpha_n}{2}\left(\Delta^d_{n,t}\right)^2 + \beta_n\,\Delta^d_{n,t},\qquad \Delta^d_{n,t} = e^d_{n,t} - p^d_{n,t},\quad n\in\mathcal{N}^d \tag{8}$$

where $\phi^d_{n,t}$ is the dissatisfaction of schedulable load n at time t, $\alpha_n$ and $\beta_n$ are two load-dependent dissatisfaction coefficients of schedulable load n, $\Delta^d_{n,t}$ is its demand reduction, $\mathcal{N}^d$ is the set of schedulable loads, and $p^d_{n,t}$ is the energy consumption of schedulable load n at time t.

Equation (8) shows that a greater reduction in demand results in a higher level of dissatisfaction of the load unit.
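A minimal sketch of the dissatisfaction function, assuming the quadratic form of equation (8) with positive coefficients; the function name and the numbers are hypothetical. With α_n, β_n > 0 and a non-negative reduction it reproduces the monotonic behaviour stated above.

```python
def dissatisfaction(demand: float, consumption: float, alpha: float, beta: float) -> float:
    """Dissatisfaction phi_{n,t} of a schedulable load, quadratic form of equation (8)."""
    reduction = demand - consumption  # Delta_{n,t} = e^d_{n,t} - p^d_{n,t}
    return 0.5 * alpha * reduction ** 2 + beta * reduction


# Hypothetical coefficients alpha_n = 1.0, beta_n = 0.5 and demand e^d_{n,t} = 10.
for consumption in (10.0, 9.0, 8.0):
    # prints 0.0, 1.0, 3.0: dissatisfaction grows with the demand reduction
    print(consumption, dissatisfaction(10.0, consumption, alpha=1.0, beta=0.5))
```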

In addition, the demand reduction $\Delta_{n,t}$ of schedulable load $n \in \mathcal{N}^d$ cannot exceed its allowable range, as shown in inequality (9):

$$\Delta^{\min}_n \le \Delta_{n,t} \le \Delta^{\max}_n \qquad (9)$$

where $\Delta^{\min}_n$ and $\Delta^{\max}_n$ denote the minimum and maximum demand reduction of schedulable load n, respectively, and are both known. Once $e^d_{n,t}$ and these bounds are known, the range of the actual energy consumption of schedulable load n follows directly: $e^d_{n,t} - \Delta^{\max}_n \le p^d_{n,t} \le e^d_{n,t} - \Delta^{\min}_n$.
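The consumption range implied by inequality (9) can be sketched as follows. The projection helper `clip_consumption` is a hypothetical addition, one simple way to keep a candidate consumption feasible; the text itself only states the range.

```python
from typing import Tuple


def consumption_bounds(demand: float, min_reduction: float,
                       max_reduction: float) -> Tuple[float, float]:
    """Feasible range of p^d_{n,t} implied by inequality (9)."""
    if min_reduction > max_reduction:
        raise ValueError("min_reduction must not exceed max_reduction")
    # Delta_min <= e - p <= Delta_max  <=>  e - Delta_max <= p <= e - Delta_min
    return demand - max_reduction, demand - min_reduction


def clip_consumption(consumption: float, demand: float, min_reduction: float,
                     max_reduction: float) -> float:
    """Project a candidate consumption onto the feasible range (hypothetical helper)."""
    low, high = consumption_bounds(demand, min_reduction, max_reduction)
    return min(max(consumption, low), high)


print(consumption_bounds(10.0, 0.0, 3.0))     # (7.0, 10.0)
print(clip_consumption(5.0, 10.0, 0.0, 3.0))  # 7.0
```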

(II) Non-schedulable load: in general, the energy demand of non-schedulable loads cannot be shifted or curtailed at will, so the energy demand of these loads must be met strictly at all times.

Thus, in the present invention, each non-schedulable load $n \in \mathcal{N}^n$ satisfies equation (10):

$$p^n_{n,t} = e^n_{n,t}, \quad n \in \mathcal{N}^n \qquad (10)$$

where $p^n_{n,t}$ and $e^n_{n,t}$ denote the energy consumption and energy demand of non-schedulable load n at time t, respectively, and the superscript n identifies non-schedulable loads.

Therefore, from the perspective of the loads, the goal is to minimize the total load-side cost by determining the optimal energy consumption of all loads, i.e.

$$\min_{\mathbf{p}} \sum_{t=1}^{T}\left[\sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t} + \sum_{n \in \mathcal{N}^d}\left(\theta_{n,t}\,p^d_{n,t} + \phi_{n,t}\right)\right]$$

where $\mathbf{p}$ is the energy consumption vector of all participating load units over the whole time period T, and $\theta_{n,t}$ for $n \in \mathcal{N}^n$ is the retail price received by non-schedulable load n at time t.

The formula above consists of two parts, corresponding to the two types of load. The first term, $\sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t}$, is the electricity cost the non-schedulable loads pay the power company; the second term, $\sum_{n \in \mathcal{N}^d}(\theta_{n,t}\,p^d_{n,t} + \phi_{n,t})$, is the electricity cost plus the dissatisfaction cost of the schedulable loads that buy power from the power company.

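For convenience in the subsequent discussion, the combined load-side cost at time t is defined as

$$C_t = \sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t} + \sum_{n \in \mathcal{N}^d}\left(\theta_{n,t}\,p^d_{n,t} + \phi_{n,t}\right),$$

so the formula above can be written as $\min_{\mathbf{p}} \sum_{t=1}^{T} C_t$.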
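The combined cost $C_t$ at a single time step can be computed directly from per-load prices, consumptions and dissatisfaction values. This is a sketch with hypothetical names and numbers; the dissatisfaction values would come from equation (8).

```python
from typing import Sequence


def load_side_cost(prices_nonsched: Sequence[float], consumption_nonsched: Sequence[float],
                   prices_sched: Sequence[float], consumption_sched: Sequence[float],
                   dissatisfaction_sched: Sequence[float]) -> float:
    """Combined load-side cost C_t at a single time step t."""
    if len(prices_nonsched) != len(consumption_nonsched):
        raise ValueError("non-schedulable prices and consumptions differ in length")
    if not len(prices_sched) == len(consumption_sched) == len(dissatisfaction_sched):
        raise ValueError("schedulable inputs differ in length")
    # Electricity bill of the non-schedulable loads.
    cost_nonsched = sum(theta * p for theta, p in zip(prices_nonsched, consumption_nonsched))
    # Electricity bill plus dissatisfaction cost of the schedulable loads.
    cost_sched = sum(theta * p + phi for theta, p, phi
                     in zip(prices_sched, consumption_sched, dissatisfaction_sched))
    return cost_nonsched + cost_sched


# One non-schedulable and two schedulable loads (hypothetical numbers).
print(load_side_cost([0.5], [4.0], [0.75, 0.25], [8.0, 4.0], [1.0, 0.5]))  # 10.5
```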

The electric power company, as the intermediary between end users and electricity producers, first purchases electric energy from the grid operator at the wholesale price and then sells it to the different types of load units on the load side at retail prices. Its goal is therefore to maximize its revenue from trading in the wholesale and retail markets, which can be modeled as:

$$\max_{\boldsymbol{\theta}} \sum_{t=1}^{T}\left[\sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t} + \sum_{n \in \mathcal{N}^d} \theta_{n,t}\,p^d_{n,t} - \eta_t\,E^{tot}_t\right] \quad \text{s.t.} \quad \underline{\theta}_n \le \theta_{n,t} \le \overline{\theta}_n$$

where $\boldsymbol{\theta}$ is the vector of retail prices set by the power company for all load units over the whole time period T; $E^{tot}_t$ is the total electric energy the power company purchases from the grid operator at time t, the superscript tot identifying total energy; and $\underline{\theta}_n$ and $\overline{\theta}_n$ are the lower and upper bounds of the retail price the power company sets for load unit n. The objective function consists of three terms: $\sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t}$ is the revenue from selling electricity to the non-schedulable load units, $\sum_{n \in \mathcal{N}^d} \theta_{n,t}\,p^d_{n,t}$ is the revenue from selling electricity to the schedulable load units, and $\eta_t\,E^{tot}_t$ is the cost of purchasing electricity from the grid operator. Likewise, for convenience, the net revenue of the power company at time t is defined as $U_t = \sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t} + \sum_{n \in \mathcal{N}^d} \theta_{n,t}\,p^d_{n,t} - \eta_t\,E^{tot}_t$, so the objective function above can be written as $\max_{\boldsymbol{\theta}} \sum_{t=1}^{T} U_t$.

In general, when power losses are neglected and the power balance rule is followed, the total electric energy purchased by the power company equals the total energy consumption of the load side at every time, as shown in equation (11):

$$E^{tot}_t = \sum_{n \in \mathcal{N}^n} p^n_{n,t} + \sum_{n \in \mathcal{N}^d} p^d_{n,t} \qquad (11)$$
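Combining the utility's net revenue $U_t$ with the power balance of equation (11) gives the short computation below. It is a sketch with hypothetical names and numbers, and it assumes the price and consumption sequences have matching lengths.

```python
from typing import Sequence


def utility_net_revenue(prices_nonsched: Sequence[float], consumption_nonsched: Sequence[float],
                        prices_sched: Sequence[float], consumption_sched: Sequence[float],
                        wholesale: float) -> float:
    """Net revenue U_t of the power company at one time step (equal-length inputs assumed)."""
    revenue = (sum(theta * p for theta, p in zip(prices_nonsched, consumption_nonsched))
               + sum(theta * p for theta, p in zip(prices_sched, consumption_sched)))
    # Power balance, equation (11): purchased energy equals total load consumption.
    total_purchased = sum(consumption_nonsched) + sum(consumption_sched)
    return revenue - wholesale * total_purchased


print(utility_net_revenue([0.5], [4.0], [0.75, 0.25], [8.0, 4.0], wholesale=0.25))  # 5.0
```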

The models of the power company and the load units show that price-based demand response is closely tied to both the revenue of the power company and the cost of the load units. From a social perspective, therefore, the goal of the system is to maximize the social benefit, which combines the revenue of the power company and the combined cost of the loads, as shown in equation (12):

$$\max_{\boldsymbol{\theta}} \sum_{t=1}^{T}\left[\rho\,U_t - (1-\rho)\,C_t\right] \qquad (12)$$

where ρ ∈ [0, 1] is a weight parameter representing the relative social value of the power company's revenue and the combined cost of the load units. The larger ρ is, the more weight the social objective places on the power company's profit; conversely, the smaller ρ is, the more weight it places on the combined cost of the loads.
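The bracketed term of equation (12) is a one-line weighted combination; the sweep over ρ below, using hypothetical values of $U_t$ and $C_t$, illustrates how the weight shifts the social benefit between the two extremes described above.

```python
def social_benefit(net_revenue: float, load_cost: float, rho: float) -> float:
    """Social benefit rho * U_t - (1 - rho) * C_t, the bracketed term of equation (12)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * net_revenue - (1.0 - rho) * load_cost


# Sweep the weight for hypothetical values U_t = 5.0 and C_t = 10.5.
for rho in (0.0, 0.5, 1.0):
    print(rho, social_benefit(5.0, 10.5, rho))  # -10.5, -2.75, 5.0
```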

To establish a dynamic retail price that adapts to flexible load changes in an unknown power market environment, the method first models the retail power market within a reinforcement learning framework.

Specifically, the power company acts as the agent; all load units form the environment; the retail price is the action the agent applies to the environment; the energy demand, energy consumption and time of the loads constitute the state; and the social benefit (i.e., the weighted combination of the power company's revenue and the combined cost of the load units) is the reward.

Second, the dynamic retail pricing problem is modeled as a Markov decision process, which is typically the first step in applying a reinforcement learning algorithm. Without loss of generality, the Markov decision process is represented by the five-tuple ⟨S, A, R, P, γ⟩, whose elements are as follows:

1) State set: $S = \{s^1, s^2, \ldots, s^T\}$, where $s^t = (\mathbf{e}_t, \mathbf{p}_t, t)$ is the state at time t, composed of the energy demand $\mathbf{e}_t$ and energy consumption $\mathbf{p}_t$ of all load units and the time index t.

2) Action set: $A = \{a^1, a^2, \ldots, a^T\}$, where $a^t = \boldsymbol{\theta}_t$ is the action of the agent at time t, i.e., the retail prices $\boldsymbol{\theta}_t$ set by the power company for all load units at time t.

3) Reward set: $R = \{r^1, r^2, \ldots, r^T\}$, where $r^t = \rho U_t - (1-\rho) C_t$ is the reward of the system at time t, i.e., the social benefit at the current time.

4) State transition matrix: $P = [P(s' \mid s, a)]$, where $P(s' \mid s, a)$ is the probability that, after action a is taken in state s, the environment transitions to state s' at the next time. Because the energy demand and energy consumption of the loads are influenced by many factors, these state transition probabilities are difficult to obtain. The power market environment is unknown in the present invention, so model-free Q-learning is used to solve the dynamic retail pricing problem.

5) Discount factor: γ ∈ [0, 1] expresses the importance of future rewards relative to the current reward.
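Tabular Q-learning needs a finite action set, whereas each retail price is continuous within its bounds $[\underline{\theta}_n, \overline{\theta}_n]$. One common workaround, assumed here and not stated in the text, is to discretize each load's price range into a few levels and take the Cartesian product as A. The function names and bounds below are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def price_grid(lower: float, upper: float, levels: int) -> List[float]:
    """Evenly spaced retail price levels within [lower, upper] for one load unit."""
    if levels < 2 or lower > upper:
        raise ValueError("need levels >= 2 and lower <= upper")
    step = (upper - lower) / (levels - 1)
    return [lower + k * step for k in range(levels)]


def action_set(bounds: Sequence[Tuple[float, float]], levels: int) -> List[Tuple[float, ...]]:
    """Discrete action set A: every combination of per-load price levels."""
    grids = [price_grid(lower, upper, levels) for lower, upper in bounds]
    return list(itertools.product(*grids))


# Two load units with hypothetical price bounds and three levels each: 3 * 3 = 9 actions.
actions = action_set([(0.5, 1.0), (0.4, 0.8)], levels=3)
print(len(actions), actions[0])  # 9 (0.5, 0.4)
```

The product grows as levels^N with the number of load units N, so this only suits small examples.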

With a policy π: S → A defined as a mapping from states to actions, the retail pricing problem becomes that of finding an optimal policy π^* that maximizes the cumulative return of the system, i.e.

$$\pi^* = \arg\max_{\pi} \sum_{t=1}^{T} \gamma^{t-1}\, r^t$$

Since the goal of the system is to maximize the social benefit over the entire time period, and the reward at every time has equal social value, γ = 1 in the present invention.
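The cumulative return being maximized reduces to a plain sum when γ = 1. The sketch below, with hypothetical per-step rewards, shows both that case and a discounted one for comparison.

```python
from typing import Sequence


def cumulative_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Discounted sum of r^1..r^T, weighting r^t by gamma^(t-1)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum(gamma ** k * r for k, r in enumerate(rewards))


rewards = [-1.0, -2.0, -0.5]  # hypothetical per-step social benefits
print(cumulative_return(rewards))             # gamma = 1: plain sum, -3.5
print(cumulative_return(rewards, gamma=0.5))  # -1.0 - 1.0 - 0.125 = -2.125
```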

After the dynamic retail pricing problem of the power company is modeled as a Markov decision process, the Q-learning algorithm (a model-free reinforcement learning algorithm) is used to determine how the power company selects the optimal retail price while interacting with all load units to achieve the power system objective.

The basic principle of the Q-learning algorithm is to assign an action value function Q(s, a) to each state-action pair (s, a) and update it in each iteration until the optimal action value function Q^*(s, a) is obtained. The optimal action value function is defined as the maximum cumulative future discounted return obtained by starting from state s, taking action a, and thereafter following the optimal policy π^*; it satisfies the Bellman equation

$$Q^*(s, a) = \mathbb{E}\left[r(s, a) + \gamma \max_{a' \in A} Q^*(s', a')\right]$$

where s' ∈ S and a' ∈ A denote the state and action at the next time, respectively; r(s, a) is the immediate reward after taking action a in state s; Q^*(s', a') is the maximum cumulative future discounted return obtained by starting from state s', performing action a', and thereafter following the optimal policy π^*; and γ ∈ [0, 1] is the discount factor. The algorithm therefore accounts for the influence of the current retail price on the load response over a future period as well as on the immediate load response. Once the optimal action value function Q^*(s^t, a^t) is obtained, the optimal retail price policy shown in formula (5) follows directly from

$$\pi^*(s^t) = \arg\max_{a^t \in A} Q^*(s^t, a^t)$$

Corresponding to the method embodiment above, the invention also discloses a price demand response-based determination system.

Referring to Fig. 2, which is a schematic structural diagram of a price demand response-based determination system disclosed in an embodiment of the present invention, the system is applied to a processor of an electric power company and includes:

a modeling unit 201, configured to model the dynamic retail pricing problem of the power company as a Markov decision process;

an action selection unit 202, configured to monitor the states of all load units at the current time, record them as the first state, and select the target retail price action for the current time within the allowable retail price range by using the electricity price selection probability-greedy strategy;

a reward calculation unit 203, configured to calculate the immediate revenue return after the target retail price action is executed, and to monitor the states of all load units at the next time and record them as the second state;

a function updating unit 204, configured to update a reference action value function into a target action value function based on the first state, the second state, the target retail price action and the immediate revenue return, the reference action value function being the action value function obtained in the last iteration;

a first judging unit 205, configured to judge whether the current time has reached the terminal time;

a second judging unit 206, configured to judge, if the first judging unit 205 judges yes, whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold;

an electricity price policy determination unit 207, configured to, if the second judging unit 206 judges yes, take the target action value function as the optimal action value function and determine the optimal retail price policy according to the optimal action value function;

and an energy consumption calculating unit 208, configured to calculate the optimal energy consumption of the schedulable loads according to the optimal retail price policy.

It should be noted that, for the specific working principle of each component in the system embodiment, please refer to the corresponding part of the method embodiment, which is not described herein again.
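To show how units 202 to 208 fit together, the sketch below runs the whole loop on a hypothetical toy market: ε-greedy price selection as in claim 2, the immediate return, an action value update, the terminal-time check, the |target − reference| ≤ threshold stopping test, and finally the schedulable consumption under the learned prices. Everything concrete here is an assumption: the market numbers, the discretized price levels, the standard tabular Q-learning update with learning rate `lr`, and the reduction of the state s^t = (e_t, p_t, t) to the time index t alone. It is an illustration of the workflow, not a reproduction of the invention's results.

```python
import random
from collections import defaultdict

# Hypothetical toy market: one schedulable and one non-schedulable load over T = 3 steps.
T = 3
PRICES = (0.40, 0.50, 0.60)        # discretized retail prices (action set A), all >= eta_t
WHOLESALE = (0.25, 0.30, 0.35)     # eta_t
DEMAND_SCHED = (6.0, 8.0, 5.0)     # e^d_t
DEMAND_NONSCHED = (3.0, 3.0, 3.0)  # p^n_t = e^n_t by equation (10)
MU, DISSAT_A, DISSAT_B, RHO = -0.5, 0.2, 0.1, 0.5


def consumption(t, price):
    """Schedulable consumption from equation (7)."""
    eta = WHOLESALE[t]
    return DEMAND_SCHED[t] * (1 + MU * (price - eta) / eta)


def reward(t, price):
    """Immediate return r^t = rho * U_t - (1 - rho) * C_t."""
    p_d, p_n = consumption(t, price), DEMAND_NONSCHED[t]
    cut = DEMAND_SCHED[t] - p_d
    phi = 0.5 * DISSAT_A * cut ** 2 + DISSAT_B * cut       # equation (8)
    u = price * (p_n + p_d) - WHOLESALE[t] * (p_n + p_d)  # U_t with equation (11)
    c = price * p_n + price * p_d + phi                   # C_t
    return RHO * u - (1 - RHO) * c


def train(episodes=500, lr=0.1, epsilon=0.1, gamma=1.0, tol=1e-4, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a); the state is reduced to the time index t
    for _ in range(episodes):
        q_ref = dict(q)  # reference action value function from the last iteration
        for t in range(T):
            if rng.random() < epsilon:  # explore with probability epsilon
                a = rng.choice(PRICES)
            else:                       # exploit with probability 1 - epsilon
                a = max(PRICES, key=lambda x: q[(t, x)])
            r = reward(t, a)
            best_next = 0.0 if t == T - 1 else max(q[(t + 1, x)] for x in PRICES)
            q[(t, a)] += lr * (r + gamma * best_next - q[(t, a)])
        # Terminal time reached: stop once |Q_target - Q_ref| <= tol for every entry.
        keys = set(q) | set(q_ref)
        if max(abs(q.get(k, 0.0) - q_ref.get(k, 0.0)) for k in keys) <= tol:
            break
    policy = [max(PRICES, key=lambda x: q[(t, x)]) for t in range(T)]
    return q, policy


q, policy = train()
for t, price in enumerate(policy):
    print(f"t={t}: retail price {price:.2f}, schedulable consumption {consumption(t, price):.2f}")
```

With ρ = 0.5 the retail revenue and the retail bill cancel in the reward, so in this toy setting the agent effectively trades wholesale purchase cost against load dissatisfaction.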

To sum up, the invention discloses a price demand response-based determination system. The system models the dynamic retail pricing problem of the power company as a Markov decision process. It monitors the states of all load units at the current time and records them as the first state, selects the target retail price action for the current time within the allowable retail price range by using the electricity price selection probability-greedy strategy, calculates the immediate revenue return after executing that action, and monitors the states of all load units at the next time, recording them as the second state. Based on the first state, the second state, the target retail price action and the immediate revenue return, it updates the reference action value function obtained in the last iteration into the target action value function. When the current time reaches the terminal time and the absolute value of the difference between the target and reference action value functions is not greater than the difference threshold, the target action value function is taken as the optimal action value function, the optimal retail price policy is determined from it, and the optimal energy consumption of the schedulable loads is then calculated from that policy, thereby realizing price demand response-based determination.
Because the dynamic retail pricing problem of the power company is modeled as a Markov decision process, determining the optimal retail price policy from the optimal action value function accounts for the influence of the current electricity price not only on the immediate response of the loads but also on their response over a future period. The uncertainty and flexibility of the dynamic power market can therefore be captured realistically, which improves the accuracy of price demand response-based determination.

It should be noted that, besides demand-side energy management, the present invention can also be applied to decision-making problems in other unknown environments in the smart grid, such as supply-demand power balance and optimal generator unit scheduling.

The state space, action space and reward definition of the Markov decision process are not unique and can be redefined according to other system-level or individual objectives. In addition, the choice of learning rate in the Q-learning algorithm strongly affects the convergence of the algorithm, so the learning rate selection merits further analysis.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A price demand response-based determination method, comprising:

monitoring the states of all load units at the current moment, recording the states as a first state, and selecting a target retail electricity price making action at the current moment by using an electricity price selection probability-greedy strategy within an allowable retail electricity price range;

judging whether the current time reaches the terminal time;

2. The determination method according to claim 1, wherein the electricity price selection probability-greedy strategy specifically means: randomly selecting a retail price from the action set with probability ε, or selecting the retail price corresponding to the maximum action value function with probability 1-ε, where ε denotes the price selection probability.

3. The determination method according to claim 1, wherein the immediate revenue return $r^t$ is expressed as follows:

$r^t = \rho U_t - (1-\rho) C_t$;

4. The determination method according to claim 3, wherein the net revenue $U_t$ of the power company at time t is expressed as follows:

$$U_t = \sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t} + \sum_{n \in \mathcal{N}^d} \theta_{n,t}\,p^d_{n,t} - \eta_t\,E^{tot}_t$$

where $\mathcal{N}^n$ is the set of non-schedulable loads; $\theta_{n,t}$ is the retail price received by non-schedulable load $n \in \mathcal{N}^n$ at time t; $p^n_{n,t}$ is the energy consumption of non-schedulable load n at time t; $\mathcal{N}^d$ is the set of schedulable loads; $\theta_{n,t}$ is the retail price received by schedulable load $n \in \mathcal{N}^d$ at time t; $p^d_{n,t}$ is the energy consumption of schedulable load n at time t; $\eta_t$ is the wholesale price at time t, with $\theta_{n,t} \ge \eta_t$; and $E^{tot}_t$ is the total electric energy purchased by the power company from the grid operator at time t, the superscript tot identifying total electric energy;

the combined load-side cost $C_t$ at time t is expressed as follows:

$$C_t = \sum_{n \in \mathcal{N}^n} \theta_{n,t}\,p^n_{n,t} + \sum_{n \in \mathcal{N}^d}\left(\theta_{n,t}\,p^d_{n,t} + \phi_{n,t}\right)$$

where $\phi_{n,t}$ denotes the dissatisfaction of schedulable load n at time t.

5. The determination method according to claim 4, wherein the dissatisfaction $\phi_{n,t}$ is expressed as follows:

$$\phi_{n,t} = \frac{\alpha_n}{2}\left(\Delta_{n,t}\right)^2 + \beta_n\,\Delta_{n,t}, \quad \Delta_{n,t} = e^d_{n,t} - p^d_{n,t}, \quad n \in \mathcal{N}^d$$

where $\phi_{n,t}$ denotes the dissatisfaction of schedulable load n at time t; $\alpha_n$ and $\beta_n$ are the two dissatisfaction coefficients of schedulable load n; $\Delta_{n,t}$ is the demand reduction of schedulable load n; $\mathcal{N}^d$ is the set of schedulable loads; and $p^d_{n,t}$ is the energy consumption of schedulable load n at time t.

6. The determination method according to claim 5, wherein the demand reduction $\Delta_{n,t}$ of schedulable load $n \in \mathcal{N}^d$ satisfies the following inequality:

$$\Delta^{\min}_n \le \Delta_{n,t} \le \Delta^{\max}_n$$

where $\Delta^{\min}_n$ and $\Delta^{\max}_n$ denote the minimum and maximum demand reduction of schedulable load n, respectively, and are both known.

7. The determination method according to claim 1, wherein the target action value function is expressed as follows:

8. The determination method according to claim 1, wherein the optimal retail electricity price policy is expressed as follows:

$$\pi^*(s^t) = \arg\max_{a^t \in A} Q^*(s^t, a^t)$$

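9. The determination method according to claim 1, wherein the optimal energy consumption is expressed as follows:

$$p^{d*}_{n,t} = e^d_{n,t}\left(1 + \mu_t\,\frac{\theta^*_{n,t} - \eta_t}{\eta_t}\right)$$

where $p^{d*}_{n,t}$ is the optimal energy consumption of schedulable load $n \in \mathcal{N}^d$ at time t; $e^d_{n,t}$ is the energy demand of schedulable load n at time t; $\mu_t$ is the price elasticity coefficient at time t; $\theta^*_{n,t}$ is the retail price received by schedulable load n at time t under the optimal retail price policy; and $\eta_t$ is the wholesale price at time t, with $\theta^*_{n,t} \ge \eta_t$.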

10. The determination method according to claim 1, further comprising initializing the action value function, which specifically includes:

11. A price demand response-based determination system, comprising: