CN113052638A - Price demand response-based determination method and system - Google Patents

Publication number
CN113052638A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110366209.XA
Other languages
Chinese (zh)
Other versions
CN113052638B (en
Inventor
秦家虎
万艳妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202110366209.XA
Publication of CN113052638A
Application granted
Publication of CN113052638B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0206Price or cost determination based on market factors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for determining a price-based demand response. The dynamic retail electricity pricing problem of an electric power company is modeled as a Markov decision process. At each moment, a retail electricity pricing action is selected with an electricity price selection probability-greedy (ε-greedy) strategy according to the states of all load units at the current moment, and the immediate return and the states of all load units at the next moment are obtained. The reference action value function obtained in the previous iteration is then updated into a target action value function. When the current moment reaches the terminal moment and the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold, the target action value function is taken as the optimal action value function, the optimal retail electricity price strategy is determined from it, and the optimal energy consumption of the dispatchable loads is then calculated. Because the action value function accounts for the influence of the current electricity price on both the immediate response of the loads and their response over a future period, the accuracy of the price-based demand response is improved.

Description

Price demand response-based determination method and system
Technical Field
The invention relates to the technical field of power grids, in particular to a price demand response-based determination method and system.
Background
The smart grid is a typical cyber-physical system that integrates advanced sensing, control and communication technologies into the physical power system in order to provide a reliable energy supply, promote active participation of loads, and ensure stable operation of the grid. Building on this cyber-physical integration, power demand response has become a research hotspot in the field of energy management. Its aim is to change the energy usage pattern of loads through time-varying electricity prices or reward/penalty incentives, for example to reduce energy costs on the demand side. In other words, power demand response reshapes load energy usage by means of prices or incentives to achieve more efficient energy management.
Existing research focuses mainly on two branches of demand response: price-based demand response and incentive-based demand response. Price-based demand response, the more commonly used of the two, aims to change the energy usage pattern of end users through time-dependent electricity prices such as time-of-use pricing and real-time pricing.
Existing price-based demand response methods mostly rely on a deterministic price mechanism, such as a time-of-use pricing mechanism, a day-ahead pricing mechanism or a linear price model. A deterministic price mechanism, however, cannot realistically characterize the uncertainty and flexibility of the dynamic electricity market, so the accuracy of existing price-based demand response methods is low.
Disclosure of Invention
In view of the above, the invention discloses a method and a system for determining a price-based demand response, so as to realistically characterize the uncertainty and flexibility of the dynamic power market and to improve the accuracy of the price-based demand response.
A price demand response based determination method, comprising:
modeling a dynamic retail electricity price pricing problem of an electric power company as a Markov decision process;
monitoring the states of all load units at the current moment, recording them as a first state, and selecting a target retail electricity pricing action for the current moment within the allowable retail electricity price range by using an electricity price selection probability-greedy strategy;
calculating the immediate return obtained after the target retail electricity pricing action is executed, monitoring the states of all load units at the moment following the current moment, and recording them as a second state;
updating a reference action value function into a target action value function based on the first state, the second state, the target retail electricity pricing action and the immediate return, wherein the reference action value function is the action value function obtained in the previous iteration;
judging whether the current time reaches the terminal time;
if so, judging whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold value;
if so, taking the target action value function as an optimal action value function, and determining an optimal retail electricity price strategy according to the optimal action value function;
and calculating the optimal energy consumption of the dispatchable load according to the optimal retail electricity price strategy.
Optionally, the electricity price selection probability-greedy strategy (an ε-greedy strategy) is defined as follows: with probability ε, one retail electricity price is selected at random from the action set; with probability 1-ε, the retail electricity price with the maximum action value is selected, where ε denotes the electricity price selection probability.
Optionally, the immediate return $r_t$ is expressed as follows:
$r_t = \rho U_t - (1-\rho) C_t$
where $\rho \in [0, 1]$ is a weight parameter representing the relative social value of the utility's net income and the comprehensive cost of the load units, $U_t$ denotes the net income of the utility at time t, and $C_t$ denotes the comprehensive cost of the load side at time t.
Optionally, the net income $U_t$ of the utility at time t is expressed as follows:
[Formula image BDA0003007626190000021]
The symbols of this formula are given as inline images (BDA0003007626190000031 to BDA00030076261900000313) and denote, in order: the set of non-dispatchable loads; the retail electricity price received by a non-dispatchable load at time t; the energy consumption of a non-dispatchable load at time t; the energy consumption of a dispatchable load at time t; the set of dispatchable loads; the retail electricity price received by a dispatchable load at time t; the wholesale electricity price $\eta_t$ at time t, which satisfies a constraint given as an image; and the total electric energy purchased by the utility from the grid operator at time t. In these symbols the superscript n identifies a non-dispatchable load, the superscript d a dispatchable load and the superscript tot the total electric energy, while the subscript n is the load-unit index and the subscript t the time index.
The comprehensive cost $C_t$ of the load side at time t is expressed as follows:
[Formula image BDA00030076261900000314]
in which one term denotes the level of dissatisfaction of a dispatchable load caused by its reduced energy consumption at time t.
Optionally, the degree of dissatisfaction of a dispatchable load is expressed as follows:
[Formula image BDA00030076261900000318]
The formula involves the level of dissatisfaction of a dispatchable load caused by its reduced energy consumption at time t, two dissatisfaction coefficients that depend on the dispatchable load, the demand reduction amount of the dispatchable load, its energy demand at time t and its energy consumption at time t; the symbols themselves are given as inline images. The superscript d identifies a dispatchable load, and the loads are drawn from the set of dispatchable loads.
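The dissatisfaction formula itself survives only as an image placeholder, so its exact form cannot be read from this text. As a hedged illustration only, the sketch below assumes a quadratic form in the demand reduction with two load-dependent coefficients, a shape that is common in the demand response literature; the function name, the parameter names `alpha` and `beta`, and the quadratic form are all assumptions, not the patent's confirmed expression.

```python
def dissatisfaction(alpha, beta, demand, consumption):
    """Assumed quadratic dissatisfaction of one dispatchable load.

    alpha, beta: the two load-dependent dissatisfaction coefficients
                 (hypothetical names; the source gives them only as images).
    demand:      energy demand of the load at time t.
    consumption: energy actually consumed by the load at time t.
    """
    reduction = demand - consumption          # demand reduction amount
    # Assumed form: (alpha / 2) * reduction^2 + beta * reduction
    return 0.5 * alpha * reduction ** 2 + beta * reduction
```

Under this assumed form the dissatisfaction is zero when the load consumes its full demand and grows faster than linearly as the reduction increases.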
Optionally, the demand reduction amount of a dispatchable load satisfies the following inequality:
[Formula image BDA0003007626190000047]
in which the two bounds are the minimum and maximum demand reduction amounts of the dispatchable load, both of which are known quantities.
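The inequality is only available as an image placeholder. A minimal sketch, assuming that the demand reduction equals demand minus consumption and that it must lie between a known lower and upper bound, shows how a candidate consumption could be projected back into the feasible range; the function and parameter names are hypothetical.

```python
def clip_consumption(demand, consumption, min_reduction, max_reduction):
    """Return a consumption whose reduction (demand - consumption)
    lies in [min_reduction, max_reduction]."""
    if min_reduction > max_reduction:
        raise ValueError("min_reduction must not exceed max_reduction")
    reduction = demand - consumption
    # Clamp the reduction to the known bounds, then map back to consumption.
    reduction = min(max(reduction, min_reduction), max_reduction)
    return demand - reduction
```

For example, with a demand of 10 and bounds of 0 and 5, a proposed consumption of 2 (a reduction of 8) would be raised to 5.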
Optionally, the target action value function is expressed as follows:
[Formula image BDA00030076261900000411]
where $Q_k(s_t, a_t)$ is the target action value function; it represents, at the k-th iteration, the cumulative future discounted return obtained by starting from the state $s_t$ of all load units and executing the target retail electricity pricing action $a_t$, and is defined as
[Formula image BDA00030076261900000412]
where $\gamma$ denotes the discount factor; the learning rate, which lies in [0, 1], represents the degree to which the newly acquired $Q_k$ value overwrites the $Q_{k-1}$ value; $Q_{k-1}(s_t, a_t)$ denotes the reference action value function; $s_{t+1}$ denotes the state of all load units at time t+1; $a_{t+1}$ denotes the retail electricity pricing action at time t+1; and $Q_{k-1}(s_{t+1}, a_{t+1})$ denotes the cumulative future discounted return at iteration k-1 obtained by starting from the state $s_{t+1}$ of all load units and executing $a_{t+1}$.
Optionally, the optimal retail electricity price strategy is expressed as follows:
$\pi^*(s_t) = \arg\max_{a_t \in A} Q^*(s_t, a_t)$
where $\pi^*(s_t)$ is the optimal retail electricity price strategy, $Q^*(s_t, a_t)$ is the optimal action value function, and A is the action set, $A = \{a_1, a_2, \ldots, a_T\}$. The time index t takes the values t = 1, 2, …, T, where T denotes the total number of time intervals; $s_t$ denotes the state of all load units at time t, and $a_t$ denotes the retail electricity pricing action at time t.
Optionally, the optimal energy consumption is expressed as follows:
[Formula image BDA0003007626190000051]
The formula involves the optimal energy consumption of a dispatchable load, its energy demand at time t, the retail electricity price it receives at time t, the wholesale electricity price $\eta_t$ at time t, which satisfies a constraint given as an image, and the electricity price elasticity coefficient $\mu_t$, which indicates the rate at which the energy demand changes with the retail electricity price at time t. The subscript n is the load-unit index, the subscript t the time index and the superscript d the dispatchable-load identifier.
Optionally, the determining method further includes: initializing the action value function, specifically including:
obtaining known prior parameter data, substituting the prior parameter data into the predetermined action value function, and initializing the action value function, wherein the initial value of the action value function is 0.
A price demand response based determination system comprising:
the modeling unit is used for modeling the dynamic retail electricity price pricing problem of the power company into a Markov decision process;
the action selection unit is used for monitoring the states of all load units at the current moment, recording them as a first state, and selecting a target retail electricity pricing action for the current moment within the allowable retail electricity price range by using an electricity price selection probability-greedy strategy;
the return calculating unit is used for calculating the immediate return obtained after the target retail electricity pricing action is executed, monitoring the states of all load units at the moment following the current moment, and recording them as a second state;
a function updating unit, configured to update a reference action value function into a target action value function based on the first state, the second state, the target retail electricity pricing action and the immediate return, where the reference action value function is the action value function obtained in the previous iteration;
the first judgment unit is used for judging whether the current moment reaches the terminal moment;
a second judgment unit, configured to judge, when the first judgment unit judges yes, whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold;
the electricity price strategy determining unit is used for taking the target action value function as the optimal action value function when the second judgment unit judges yes, and determining the optimal retail electricity price strategy according to the optimal action value function;
and the energy consumption calculating unit is used for calculating the optimal energy consumption of the dispatchable load according to the optimal retail electricity price strategy.
As can be seen from the above technical solutions, the invention discloses a method and a system for determining a price-based demand response. The dynamic retail electricity pricing problem of an electric power company is modeled as a Markov decision process. The states of all load units at the current moment are monitored and recorded as a first state, and a target retail electricity pricing action for the current moment is selected within the allowable retail electricity price range by using the electricity price selection probability-greedy strategy. The immediate return obtained after executing that action is calculated, and the states of all load units at the following moment are monitored and recorded as a second state. Based on the first state, the second state, the action and the immediate return, the reference action value function obtained in the previous iteration is updated into a target action value function. When the current moment reaches the terminal moment and the absolute value of the difference between the target and reference action value functions is not greater than a difference threshold, the target action value function is taken as the optimal action value function, the optimal retail electricity price strategy is determined from it, and the optimal energy consumption of the dispatchable loads is calculated from that strategy, thereby realizing the determination of the price-based demand response.
Because the pricing problem is modeled as a Markov decision process, determining the optimal retail electricity price strategy from the optimal action value function takes into account the influence of the current electricity price on both the immediate response of the loads and their response over a future period. The uncertainty and flexibility of the dynamic power market can therefore be characterized realistically, and the accuracy of the price-based demand response is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the disclosed drawings without creative efforts.
FIG. 1 is a flow chart of a method for determining a response based on price demand according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a price demand response-based determination system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method and a system for determining a price-based demand response. The dynamic retail electricity pricing problem of an electric power company is modeled as a Markov decision process. The states of all load units at the current moment are monitored and recorded as a first state, and a target retail electricity pricing action for the current moment is selected within the allowable retail electricity price range by using the electricity price selection probability-greedy strategy. The immediate return obtained after executing that action is calculated, and the states of all load units at the following moment are monitored and recorded as a second state. Based on the first state, the second state, the action and the immediate return, the reference action value function obtained in the previous iteration is updated into a target action value function. When the current moment reaches the terminal moment and the absolute value of the difference between the target and reference action value functions is not greater than a difference threshold, the target action value function is taken as the optimal action value function, the optimal retail electricity price strategy is determined from it, and the optimal energy consumption of the dispatchable loads is calculated from that strategy, thereby realizing the determination of the price-based demand response.
Because the pricing problem is modeled as a Markov decision process, determining the optimal retail electricity price strategy from the optimal action value function takes into account the influence of the current electricity price on both the immediate response of the loads and their response over a future period. The uncertainty and flexibility of the dynamic power market can therefore be characterized realistically, and the accuracy of the price-based demand response is improved.
It should be particularly noted that the price-based demand response to be protected by the invention is specifically the price-based demand response problem of the residential retail power market. The retail power market comprises a power company and a finite set of load units [symbol image BDA0003007626190000081], wherein N is the total number of load units in the retail power market. In practical application, the upper-layer power company sets retail electricity prices for all the lower-layer load units it serves. When a lower-layer load unit receives a retail electricity price signal, it responds to the price in real time, determines its own energy consumption strategy, and transmits that strategy to the power company. Thus, in the residential retail power market framework, the goal of price-based demand response is to coordinate, over a finite time horizon [symbol image BDA0003007626190000082], the finite set of load units [symbol image BDA0003007626190000083] by means of dynamic retail electricity prices, thereby maximizing the social benefit of the power system (which comprises the utility's income and the comprehensive cost on the load side), where T denotes the total number of time intervals.
FIG. 1 is a flowchart of a price demand response-based determination method according to an embodiment of the invention. The method is applied to a processor of an electric power company and comprises:
step S101, modeling the dynamic retail electricity pricing problem of an electric power company as a Markov decision process;
step S102, monitoring the states of all load units at the current moment, recording them as a first state, and selecting a target retail electricity pricing action for the current moment within the allowable retail electricity price range by using an electricity price selection probability-greedy strategy;
The initial state $s_t$ comprises the time index and the energy demand $e_t$ of all load units at time t. Because this information is stored in the computer as prior parameter data, the initial state can be looked up from the initialized parameter data once an initial time t has been selected at random.
In this embodiment, the initial state $s_t$ is monitored, and a target retail electricity pricing action $a_t$ for the current moment is selected within the allowable retail electricity price range by using the electricity price selection probability-greedy strategy. The electricity price selection probability is denoted by ε, and the strategy is the criterion for selecting the electricity price: with probability ε, a retail electricity price $\theta_t$ is selected at random from the action set A; with probability 1-ε, the retail electricity price $\theta_t$ with the maximum action value is selected.
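The selection rule described above can be sketched in a few lines. The sketch assumes a tabular representation in which `q_row` holds the action values of every candidate retail price for the current state; the function name, the list representation and the tie-breaking rule (first maximiser) are illustrative assumptions.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick the index of a retail price from the action set.

    q_row:   action values Q(s_t, a) for every price a in the action set.
    epsilon: electricity price selection probability (exploration rate).
    """
    if rng.random() < epsilon:
        # With probability epsilon: choose any price at random.
        return rng.randrange(len(q_row))
    # With probability 1 - epsilon: choose the price with the largest value
    # (ties resolved in favour of the lowest index).
    return q_row.index(max(q_row))
```

A larger ε makes the utility try more prices before committing, while ε = 0 always picks the currently best-valued price.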
Step S103, calculating the immediate return obtained after the target retail electricity pricing action is executed, monitoring the states of all load units at the moment following the current moment, and recording them as a second state;
In this embodiment, after the immediate return is calculated, the states $s_{t+1}$ of all load units at the moment following the current moment t, that is, at time t+1, are monitored.
The immediate return is calculated according to formula (1):
$r_t = \rho U_t - (1-\rho) C_t$    (1)
where $\rho \in [0, 1]$ is a weight parameter representing the relative social value of the utility's net income and the comprehensive cost of the load units, $U_t$ denotes the net income of the utility at time t, and $C_t$ denotes the comprehensive cost of the load side at time t.
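Formula (1) is a weighted trade-off between the utility's net income and the load-side cost. A minimal sketch, with hypothetical function and argument names, makes the role of the weight explicit:

```python
def immediate_return(rho, utility_net_income, load_cost):
    """Immediate return r_t = rho * U_t - (1 - rho) * C_t from formula (1)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * utility_net_income - (1.0 - rho) * load_cost
```

With rho = 1 only the utility's net income counts, with rho = 0 only the load-side cost counts, and rho = 0.5 weighs both equally.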
$U_t$ is given by formula (2):
[Formula image BDA0003007626190000091]    (2)
The symbols of this formula are given as inline images (BDA0003007626190000092 to BDA00030076261900000914) and denote, in order: the set of non-dispatchable loads; the retail electricity price received by a non-dispatchable load at time t; the energy consumption of a non-dispatchable load at time t; the energy consumption of a dispatchable load at time t; the set of dispatchable loads; the retail electricity price received by a dispatchable load at time t; the wholesale electricity price $\eta_t$ at time t, which satisfies a constraint given as an image; and the total electric energy purchased by the utility from the grid operator at time t. In these symbols the superscript n identifies a non-dispatchable load, the superscript d a dispatchable load and the superscript tot the total electric energy, while the subscript n is the load-unit index and the subscript t the time index.
$C_t$ is given by formula (3):
[Formula image BDA00030076261900000915]    (3)
in which one term denotes the level of dissatisfaction of a dispatchable load caused by its reduced energy consumption at time t.
Step S104, updating a reference action value function into a target action value function based on the first state, the second state, the target retail electricity pricing action and the immediate return;
The reference action value function is the action value function obtained in the previous iteration.
The target action value function $Q_k(s_t, a_t)$ is given by formula (4):
[Formula image BDA0003007626190000103]    (4)
where $Q_k(s_t, a_t)$ is the target action value function; it represents, at the k-th iteration, the cumulative future discounted return obtained by starting from the state $s_t$ of all load units and executing the target retail electricity pricing action $a_t$, and is defined as
[Formula image BDA0003007626190000104]
where $\gamma$ denotes the discount factor; the learning rate, which lies in [0, 1], represents the degree to which the newly acquired $Q_k$ value overwrites the $Q_{k-1}$ value; $Q_{k-1}(s_t, a_t)$ denotes the reference action value function; $s_{t+1}$ denotes the state of all load units at time t+1; $a_{t+1}$ denotes the retail electricity pricing action at time t+1; and $Q_{k-1}(s_{t+1}, a_{t+1})$ denotes the cumulative future discounted return at iteration k-1 obtained by starting from the state $s_{t+1}$ of all load units and executing $a_{t+1}$.
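The image for formula (4) is not available in this text. The quantities it is described with (a learning rate, a discount factor, the immediate return and the previous-iteration value of the next state) match the standard tabular Q-learning update, so the sketch below assumes that form, including the maximum over next actions; this is an assumption and not a transcription of the patent's formula. The table layout `q[state][action]` and all names are hypothetical.

```python
def q_update(q, s, a, r, s_next, lr, gamma):
    """One assumed Q-learning step on a table q[state][action].

    Q_k(s, a) = (1 - lr) * Q_{k-1}(s, a)
                + lr * (r + gamma * max_a' Q_{k-1}(s_next, a'))
    """
    target = r + gamma * max(q[s_next])       # bootstrapped return estimate
    q[s][a] = (1.0 - lr) * q[s][a] + lr * target
    return q[s][a]
```

With lr = 0 the old value is kept unchanged, and with lr = 1 it is fully overwritten by the new estimate, which matches the description of the learning rate as a degree of overwriting.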
Step S105, judging whether the current moment has reached the terminal moment, and if so, executing step S106;
If the current time t has not reached the terminal time T, the process returns to step S102.
Step S106, judging whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold, and if so, executing step S107;
In this embodiment, when $|Q_k - Q_{k-1}| \le \delta$, step S107 is executed; otherwise the process returns to step S102, where $Q_{k-1}$ is the reference action value function, $Q_k$ is the target action value function, and $\delta$ is the difference threshold.
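The source does not say how the absolute difference between two value functions is reduced to a single number. A minimal sketch, assuming tabular value functions and taking the largest entry-wise absolute difference as the measure, could look as follows; the function name and the choice of the maximum are assumptions.

```python
def has_converged(q_new, q_old, delta):
    """True when no table entry changed by more than delta.

    q_new, q_old: tables indexed as q[state][action] with identical shapes.
    delta:        difference threshold.
    """
    largest_change = max(
        abs(new - old)
        for row_new, row_old in zip(q_new, q_old)
        for new, old in zip(row_new, row_old)
    )
    return largest_change <= delta
```

Using the maximum is the strictest common choice: every state-action value must have settled before the iteration stops.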
Step S107, taking the target action value function as the optimal action value function, and determining the optimal retail electricity price strategy according to the optimal action value function by adopting the Markov decision process;
Specifically, the optimal retail electricity price strategy is given by formula (5):
$\pi^*(s_t) = \arg\max_{a_t \in A} Q^*(s_t, a_t)$    (5)
where $\pi^*(s_t)$ is the optimal retail electricity price strategy, $Q^*(s_t, a_t)$ is the optimal action value function, and A is the action set, $A = \{a_1, a_2, \ldots, a_T\}$.
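Once the action value function has converged, the strategy is read off by picking, for every state, the price with the largest value. A minimal sketch under the assumption of a tabular value function (`q[state][action]`) and a list of candidate prices; both names are hypothetical.

```python
def greedy_policy(q, prices):
    """Return, for each state, the retail price with the largest action value.

    q:      converged table indexed as q[state][action].
    prices: candidate retail prices; prices[i] belongs to action i.
    """
    return [prices[row.index(max(row))] for row in q]
```

For instance, `greedy_policy([[0.1, 0.5], [0.7, 0.2]], [1.0, 2.0])` selects the second price in the first state and the first price in the second state.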
And S108, calculating the optimal energy consumption of the dispatchable load according to the optimal retail electricity price strategy.
The expression of the optimal energy consumption is shown in formula (6), and formula (6) is as follows:
Figure BDA0003007626190000112
In the formula, (Figure BDA0003007626190000113) represents the optimal energy consumption, where the subscript n denotes the load unit index, the subscript t denotes the time index, and the superscript d identifies a dispatchable load; (Figure BDA0003007626190000114) represents the energy demand of schedulable load (Figure BDA0003007626190000115) at time t; (Figure BDA0003007626190000116) represents the retail electricity price received by schedulable load (Figure BDA0003007626190000117) at time t; η_t represents the wholesale electricity price at time t for schedulable load (Figure BDA0003007626190000118) and satisfies (Figure BDA0003007626190000119); and μ_t is the electricity price elasticity coefficient, which indicates the rate at which the energy demand changes with the retail electricity price at time t.
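Steps S104 to S107 can be illustrated with a small tabular sketch. It assumes the standard Q-learning update (with a maximum over next actions), integer-indexed states and actions, and a max-norm reading of |Q_k − Q_{k-1}|. The function names and the default values of `lr`, `gamma`, and `delta` are hypothetical and are not taken from the patent.

```python
import numpy as np

def q_update(q_prev, s, a, r, s_next, lr=0.1, gamma=1.0):
    """Return Q_k from the reference table Q_{k-1} after one observed transition.

    Assumption: standard Q-learning target r + gamma * max_a' Q_{k-1}(s', a').
    """
    q_new = q_prev.copy()
    target = r + gamma * np.max(q_prev[s_next])
    # The learning rate controls how far the new value overwrites the old one.
    q_new[s, a] = (1.0 - lr) * q_prev[s, a] + lr * target
    return q_new

def has_converged(q_new, q_prev, delta=1e-3):
    """Step S106 check, assuming the largest entry-wise difference is compared with delta."""
    return bool(np.max(np.abs(q_new - q_prev)) <= delta)

def greedy_policy(q):
    """Step S107: for each state, pick the action index with the largest action value."""
    return np.argmax(q, axis=1)

# Two states, three candidate retail prices, all action values initialized to 0.
q0 = np.zeros((2, 3))
q1 = q_update(q0, s=0, a=1, r=2.0, s_next=1, lr=0.5)
```

In a full training loop the update would be repeated for t = 1, …, T in each iteration, and the convergence check would be applied once the terminal time is reached.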
In summary, the invention discloses a price demand response-based determination method. The dynamic retail electricity price pricing problem of an electric power company is modeled as a Markov decision process. The states of all load units at the current time are monitored and recorded as a first state, and a target retail electricity price making action for the current time is selected within the allowable retail electricity price range using the electricity price selection probability-greedy strategy. The immediate return of income after executing the target retail electricity price making action is calculated, and the states of all load units at the next time are monitored and recorded as a second state. Based on the first state, the second state, the target retail electricity price making action, and the immediate return of income, the reference action value function obtained in the previous iteration is updated to a target action value function. When the current time reaches the terminal time and the absolute value of the difference between the target action value function and the reference action value function is not greater than the difference threshold, the target action value function is taken as the optimal action value function, the optimal retail electricity price strategy is determined from the optimal action value function, and the optimal energy consumption of the dispatchable load is then calculated according to that strategy, thereby realizing determination based on price demand response.
The dynamic retail electricity price pricing problem of the power company is modeled into a Markov decision process, and when the optimal retail electricity price strategy is determined according to the optimal action value function, the influence of the current electricity price on the instant response of the load is considered, and the influence of the current electricity price on the response of the load in a period of time in the future is also considered, so that the uncertainty and the flexibility of the dynamic power market can be truly depicted, and the accuracy based on price demand response is improved.
In addition, the present invention utilizes a reinforcement learning algorithm to solve the price-based demand response problem in an unknown electricity market environment (i.e., retail electricity prices and load energy consumption are uncertain and random).
To further optimize the above embodiment, before step S102, the action value function needs to be initialized, and the process of initializing the action value function includes:
obtaining known prior parameter data, substituting the prior parameter data into a predetermined action value function, and initializing the action value function.
The prior parameter data comprise: the energy demand e_t of the load units, the dissatisfaction coefficients (Figure BDA0003007626190000121) and (Figure BDA0003007626190000122), the electricity price elasticity coefficient μ_t, the wholesale electricity price η_t, the weight parameter ρ, and so on, where t denotes the time.
The initial value of the action value function is 0, i.e., Q_0(s, a) = 0; the iteration counter is initialized to k = 1 and the time to t = 1.
In this embodiment, the time t takes the values t = 1, 2, …, T, where T represents the total number of time intervals.
The variable parameters in the action value function are s_t and a_t. s_t represents the state of all load units at time t, namely the energy demand e_t, the energy consumption p_t, and the time index t of all load units; a_t represents the retail electricity price making action at time t, namely the retail electricity price θ_t set by the electric power company for all load units at time t.
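The initialization described above can be sketched as follows. The sketch assumes that states and actions have been discretized so they can serve as dictionary keys; `make_q_table` and `make_state` are hypothetical helper names, not part of the patent.

```python
from collections import defaultdict

def make_q_table():
    # Every unseen (state, action) pair reads as 0.0, matching Q_0(s, a) = 0.
    return defaultdict(float)

def make_state(energy_demand, energy_consumption, t):
    # s_t = (e_t, p_t, t); tuples keep the state hashable.
    return (tuple(energy_demand), tuple(energy_consumption), t)

q = make_q_table()
k = 1  # iteration counter
t = 1  # time index, t = 1, 2, ..., T
s1 = make_state([1.0, 2.0], [0.9, 2.0], t)
initial_value = q[(s1, 0.5)]  # action a_t = retail price 0.5 (illustrative value)
```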
It should be noted that, since the power system model relates to information interaction between the power company and the load unit, in order to facilitate understanding of the technical solution to be protected by the present invention, a mathematical model between the power company and the load unit is described below.
According to user preferences and the energy consumption characteristics of the loads, loads are generally classified into two categories, namely schedulable loads (Figure BDA0003007626190000131) and non-schedulable loads (Figure BDA0003007626190000132); that is, (Figure BDA0003007626190000133).
(I) Schedulable load: the energy consumption of a typical schedulable load is expressed by equation (7), which is as follows:
Figure BDA0003007626190000134
In the formula, (Figure BDA0003007626190000135) and (Figure BDA0003007626190000136) respectively represent the energy consumption and energy demand of schedulable load (Figure BDA0003007626190000137) at time t. Energy demand refers to the electric energy that the load unit expects to consume before receiving the retail electricity price signal, and energy consumption refers to the electric energy that the load unit actually consumes after receiving the retail electricity price signal. The subscript n denotes the load unit index, the subscript t denotes the time index, and the superscript d identifies a schedulable load. μ_t is the electricity price elasticity coefficient, which indicates the rate at which the energy demand changes with the retail electricity price at time t. (Figure BDA0003007626190000138) and η_t respectively represent the retail electricity price received by schedulable load (Figure BDA0003007626190000139) at time t and the wholesale electricity price at time t, and satisfy (Figure BDA00030076261900001310).
Equation (7) shows that the actual energy consumption of a schedulable load depends not only on its energy demand but also on the reduction in energy demand caused by changes in the retail electricity price. When the actual energy consumption of schedulable load unit n is (Figure BDA00030076261900001311), the remaining demand (Figure BDA00030076261900001312) is not satisfied, which may cause dissatisfaction for the load user.
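The relationship between demand, price, and consumption can be made concrete with a hedged sketch. Equation (7) is only available as an image here, so the code assumes a linear price elasticity form, p = e · (1 + μ · (θ − η) / η) with μ ≤ 0 and θ ≥ η; this form and the function name are assumptions for illustration and may differ from the patent's exact expression.

```python
def dispatchable_consumption(demand, retail_price, wholesale_price, elasticity):
    """Assumed linear elasticity model: consumption falls as the retail price
    rises above the wholesale price (elasticity is expected to be <= 0)."""
    relative_markup = (retail_price - wholesale_price) / wholesale_price
    return demand * (1.0 + elasticity * relative_markup)

demand = 10.0
consumption = dispatchable_consumption(demand, retail_price=3.0,
                                       wholesale_price=2.0, elasticity=-0.5)
unmet_demand = demand - consumption  # the part of the demand left unsatisfied
```

With these illustrative numbers a 50% markup and an elasticity of −0.5 reduce consumption by 25% of the demand.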
To characterize this dissatisfaction, a dissatisfaction function is defined as shown in equation (8), equation (8) being as follows:
Figure BDA00030076261900001313
In the formula, (Figure BDA00030076261900001314) represents the level of dissatisfaction of schedulable load (Figure BDA00030076261900001315) caused by the reduced energy consumption at time t; (Figure BDA0003007626190000141) and (Figure BDA0003007626190000142) represent two dissatisfaction coefficients that depend on schedulable load (Figure BDA0003007626190000143); (Figure BDA0003007626190000144) represents the demand reduction of schedulable load (Figure BDA0003007626190000145); (Figure BDA0003007626190000146) represents the energy demand of schedulable load (Figure BDA0003007626190000147) at time t, where the superscript d identifies a schedulable load; (Figure BDA0003007626190000148) is the set of schedulable loads; and (Figure BDA0003007626190000149) represents the energy consumption of schedulable load (Figure BDA00030076261900001410) at time t.
Equation (8) shows that a greater reduction in demand results in a higher level of dissatisfaction of the load unit.
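A dissatisfaction function with two load-dependent coefficients that grows with the demand reduction can be sketched as below. Equation (8) is not reproduced in this text, so the quadratic form (α/2)·x² + β·x, with x the demand reduction and α, β > 0, is an assumption chosen only because it matches the described behavior; the names `alpha` and `beta` are likewise hypothetical.

```python
def dissatisfaction(demand, consumption, alpha, beta):
    """Assumed convex quadratic dissatisfaction in the demand reduction."""
    reduction = demand - consumption  # unmet part of the energy demand
    return 0.5 * alpha * reduction ** 2 + beta * reduction

small_cut = dissatisfaction(demand=10.0, consumption=7.5, alpha=2.0, beta=1.0)
large_cut = dissatisfaction(demand=10.0, consumption=7.0, alpha=2.0, beta=1.0)
```

Under this assumed form, the larger cut yields the larger dissatisfaction, in line with the statement about equation (8).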
In addition, the demand reduction of schedulable load (Figure BDA00030076261900001411) cannot exceed its allowable range, as shown in inequality (9), which is as follows:
Figure BDA00030076261900001412
In the formula, (Figure BDA00030076261900001413) and (Figure BDA00030076261900001414) respectively represent the minimum and maximum demand reduction of schedulable load (Figure BDA00030076261900001415), both of which are known quantities. Once (Figure BDA00030076261900001416) and (Figure BDA00030076261900001417) are known, the range of the actual energy consumption (Figure BDA00030076261900001418) of schedulable load n can be determined accordingly.
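How the consumption range follows from the reduction bounds can be sketched as follows, assuming the demand reduction is defined as demand minus consumption, so that the bounds on the reduction translate directly into bounds on the consumption. The function names are hypothetical.

```python
def consumption_range(demand, min_reduction, max_reduction):
    """Return (lowest, highest) admissible consumption for one schedulable load."""
    if min_reduction > max_reduction:
        raise ValueError("min_reduction must not exceed max_reduction")
    return demand - max_reduction, demand - min_reduction

def clip_consumption(consumption, demand, min_reduction, max_reduction):
    """Project a candidate consumption value onto the admissible range."""
    low, high = consumption_range(demand, min_reduction, max_reduction)
    return min(max(consumption, low), high)

bounds = consumption_range(10.0, 1.0, 4.0)
clipped = clip_consumption(5.0, 10.0, 1.0, 4.0)
```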
(II) non-schedulable load: generally, the energy requirements of non-dispatchable loads cannot be transferred and curtailed at will, so the energy requirements of these loads must be strictly met at all times.
Thus, for (Figure BDA00030076261900001419), formula (10) is satisfied in the present invention, and formula (10) is as follows:
Figure BDA00030076261900001420
In the formula, (Figure BDA00030076261900001421) and (Figure BDA00030076261900001422) respectively represent the energy consumption and energy demand of non-dispatchable load (Figure BDA00030076261900001423) at time t, where the superscript n identifies a non-dispatchable load.
Thus, from the load's point of view, the goal is to minimize the overall cost on the load side by determining the optimal energy consumption combination for all loads, i.e.,
Figure BDA00030076261900001424
where p represents the energy consumption vector of all participating load units over the entire time period T, and (Figure BDA0003007626190000151) represents the retail electricity price received by non-schedulable load (Figure BDA0003007626190000152) at time t.
It can be seen that the above formula consists of two parts, corresponding to the two types of loads. Specifically, the first term (Figure BDA0003007626190000153) represents the electricity cost of non-dispatchable loads purchasing electricity from the electric power company, and the second term (Figure BDA0003007626190000154) represents the electricity cost and dissatisfaction cost of dispatchable loads purchasing electricity from the electric power company.
For convenience in the subsequent discussion, the comprehensive cost of the load side at time t is defined as C_t, so the above formula can be further written as
Figure BDA0003007626190000155
The electric power company, as an intermediary between the end user and the electric power producer, first purchases electric energy from the grid operator at a wholesale price and then sells the purchased electric energy to different types of load units on the load side at a retail price. The goal of the electric company is therefore to maximize revenue by trading in wholesale and retail markets, the mathematical model of which can be expressed as:
Figure BDA0003007626190000156
where θ represents the retail electricity price vector set by the electric power company for all load units over the entire time period T; (Figure BDA0003007626190000157) represents the total electric energy purchased by the electric power company from the grid operator at time t, where the superscript tot identifies total electric energy; and θ_n and (Figure BDA0003007626190000158) respectively represent the lower and upper bounds of the retail electricity price set by the electric power company for the load unit. It can be seen that the objective function in the mathematical model consists of three terms: (Figure BDA0003007626190000161) represents the revenue of the electric power company from selling electricity to non-dispatchable load units, (Figure BDA0003007626190000162) represents the revenue of the electric power company from selling electricity to dispatchable load units, and (Figure BDA0003007626190000163) represents the cost to the electric power company of purchasing electricity from the grid operator. Likewise, for convenience of presentation, the net revenue of the electric power company at time t is defined as U_t, so the objective function in the above formula can be further written as
Figure BDA0003007626190000164
Generally, when power loss is not considered and a power balance rule is followed, the total purchased power of the electric power company is equal to the total power consumption of the load side at any time, that is, as shown in equation (11), equation (11) is as follows:
Figure BDA0003007626190000165
From the models of the electric power company and the load units, it can be seen that price-based demand response is closely tied to the revenue of the electric power company and the cost of the load units. From a social perspective, therefore, the goal of the system is to maximize the social benefit, which combines the revenue of the electric power company and the comprehensive cost of the loads, as shown in equation (12), where equation (12) is as follows:
Figure BDA0003007626190000166
where ρ ∈ [0, 1] is a weight parameter representing the relative social value of the revenue of the electric power company and the comprehensive cost of the load units. The larger ρ is, the more weight society places on the revenue of the electric power company; conversely, the smaller ρ is, the more weight is placed on the comprehensive cost of the loads.
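The per-time-step quantities behind the social benefit can be sketched from the term-by-term descriptions above: U_t is sales revenue minus the wholesale purchase cost (with purchased energy equal to total consumption, per equation (11)), C_t is the electricity bills plus dissatisfaction costs, and the weighted benefit is ρU_t − (1 − ρ)C_t as used for the return in this document. The function names and the flat list layout (one entry per load unit, with a dissatisfaction of 0 for non-dispatchable loads) are assumptions for illustration.

```python
def utility_net_revenue(retail_prices, consumptions, wholesale_price):
    """U_t: revenue from all load units minus the cost of the purchased energy."""
    revenue = sum(price * p for price, p in zip(retail_prices, consumptions))
    purchase_cost = wholesale_price * sum(consumptions)
    return revenue - purchase_cost

def load_side_cost(retail_prices, consumptions, dissatisfactions):
    """C_t: electricity bills of all load units plus dissatisfaction costs."""
    bills = sum(price * p for price, p in zip(retail_prices, consumptions))
    return bills + sum(dissatisfactions)

def social_benefit(u_t, c_t, rho):
    """Weighted benefit rho * U_t - (1 - rho) * C_t, with rho in [0, 1]."""
    return rho * u_t - (1.0 - rho) * c_t

u_t = utility_net_revenue([3.0, 4.0], [2.0, 1.0], wholesale_price=2.0)
c_t = load_side_cost([3.0, 4.0], [2.0, 1.0], dissatisfactions=[0.5, 0.0])
benefit = social_benefit(u_t, c_t, rho=0.5)
```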
In order to establish a dynamic retail electricity price capable of adapting to flexible load change in an unknown electric power market environment, the method firstly utilizes a reinforcement learning framework to model the retail electric power market.
Specifically, the electric power company acts as the agent; all load units form the environment; the retail electricity price that the agent applies to the environment is the action; the energy demand, energy consumption, and time of the loads form the state; and the social benefit (i.e., the weighted combination of the electric power company's revenue and the comprehensive cost of the load units) is the reward.
Second, the dynamic retail electricity price pricing problem is modeled using a Markov decision process, which is also typically the first step in using a reinforcement learning algorithm. Without loss of generality, the Markov decision process is represented by a five-tuple <S, A, R, P, γ>, where the meaning of each element is as follows:
1) State set: S = {s_1, s_2, …, s_T}, where s_t = (e_t, p_t, t) represents the state at time t, which consists of the energy demand e_t and energy consumption p_t of all load units and the time index t.
2) Action set: A = {a_1, a_2, …, a_T}, where a_t = θ_t represents the action of the agent at time t, i.e., the retail electricity price θ_t set by the electric power company for all load units at time t.
3) Return set: R = {r_1, r_2, …, r_T}, where r_t = ρU_t − (1 − ρ)C_t represents the return of the system at time t, i.e., the social benefit at the current time.
4) State transition matrix: (Figure BDA0003007626190000171), where (Figure BDA0003007626190000172) represents the probability that the environment transitions to state s' at the next time after action a is taken in state s. Since the energy demand and energy consumption of the loads are influenced by many factors, the state transition probabilities are difficult to obtain. The electric power market environment is treated as unknown in the present invention, and therefore a model-free Q-learning method is employed to solve the dynamic retail pricing problem.
5) Discount factor: γ ∈ [0, 1] denotes the importance of the subsequent reward relative to the current reward.
Define a strategy π: S → A, i.e., a mapping from states to actions. The retail electricity pricing problem then becomes that of finding an optimal strategy π* that maximizes the cumulative return of the system, i.e.,
Figure BDA0003007626190000173
Since the goal of the system is to maximize the social benefit over the entire time period and the social value of the return is equal at every time, γ = 1 in the present invention.
After modeling the dynamic retail price pricing problem for the utility as a markov decision process, a Q-learning algorithm (a model-free reinforcement learning algorithm) is used to analyze how the utility selects the optimal retail price while interacting with all load units to achieve the power system objective.
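While interacting with the load units, the agent selects retail prices with the electricity price selection probability-greedy (ε-greedy) strategy used by the method: a random price with probability ε, otherwise the price with the largest action value. A minimal sketch, assuming the action values for the current state are given as a list indexed by candidate price; the function name and the optional `rng` parameter are hypothetical.

```python
import random

def epsilon_greedy(action_values, epsilon, rng=random):
    """Return the index of the selected retail price.

    With probability epsilon explore (uniform random index); otherwise exploit
    by returning the index of the largest action value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda i: action_values[i])

greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
random_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0)
```

Setting ε = 0 makes the choice purely greedy, while ε = 1 makes it purely random; intermediate values trade exploration against exploitation.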
The basic principle of the Q-learning algorithm is to assign an action value function Q(s, a) to each state-action pair (s, a) and then update this function in each iteration to obtain the optimal action value function Q*(s, a). The optimal action value function is defined as the maximum cumulative future discounted return obtained by starting from state s, taking action a, and thereafter following the optimal strategy π*; it satisfies the Bellman equation, i.e.,
Figure BDA0003007626190000181
where s' ∈ S and a' ∈ A respectively represent the state and the action taken at the next time, r(s, a) represents the immediate return obtained after action a is taken in state s, and Q*(s', a') represents the maximum cumulative future discounted return obtained by starting from state s', executing action a', and thereafter following the optimal strategy π*. γ ∈ [0, 1] is the discount factor; it allows the algorithm to account for the influence of the current retail electricity price on load response over a future period as well as on the immediate load response. Therefore, once the optimal action value function Q*(s_t, a_t) is obtained, the optimal retail electricity price strategy shown in formula (5) can be obtained directly.
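For reference, the Bellman optimality equation as it is commonly written for Q-learning with a deterministic next state is given below. It uses only the symbols defined in the surrounding text, but it is the textbook form and is stated here as an assumption, since the figure containing the patent's own equation is not reproduced in this text.

```latex
Q^{*}(s, a) = r(s, a) + \gamma \max_{a' \in A} Q^{*}(s', a')
```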
Corresponding to the embodiment of the method, the invention also discloses a system for determining the price demand response.
Referring to fig. 2, a schematic structural diagram of a price demand response-based determination system disclosed in an embodiment of the present invention, the system is applied to a processor in an electric power company, and the system includes:
a modeling unit 201, configured to model a dynamic retail price pricing problem of an electric power company as a markov decision process;
an action selection unit 202, configured to monitor the states of all load units at the current time, record them as a first state, and select a target retail electricity price making action for the current time within the allowable retail electricity price range using the electricity price selection probability-greedy strategy;
a reward calculation unit 203, configured to calculate the immediate return of income after the target retail electricity price making action is executed, monitor the states of all load units at the next time, and record them as a second state;
a function updating unit 204, configured to update a reference action value function to a target action value function based on the first state, the second state, the target retail electricity price making action, and the immediate return of revenue, where the reference action value function is an action value function obtained through last iteration;
a first judging unit 205, configured to judge whether the current time reaches a terminal time;
a second judging unit 206, configured to judge, if the first judging unit 205 judges yes, whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold;
an electricity price strategy determination unit 207, configured to, if the second judging unit 206 judges yes, take the target action value function as the optimal action value function and determine the optimal retail electricity price strategy according to the optimal action value function;
and the energy consumption calculating unit 208 is used for calculating the optimal energy consumption of the dispatchable load according to the optimal retail electricity price strategy.
It should be noted that, for the specific working principle of each component in the system embodiment, please refer to the corresponding part of the method embodiment, which is not described herein again.
In summary, the invention discloses a price demand response-based determination system. The dynamic retail electricity price pricing problem of an electric power company is modeled as a Markov decision process. The states of all load units at the current time are monitored and recorded as a first state, and a target retail electricity price making action for the current time is selected within the allowable retail electricity price range using the electricity price selection probability-greedy strategy. The immediate return of income after executing the target retail electricity price making action is calculated, and the states of all load units at the next time are monitored and recorded as a second state. Based on the first state, the second state, the target retail electricity price making action, and the immediate return of income, the reference action value function obtained in the previous iteration is updated to a target action value function. When the current time reaches the terminal time and the absolute value of the difference between the target action value function and the reference action value function is not greater than the difference threshold, the target action value function is taken as the optimal action value function, the optimal retail electricity price strategy is determined from the optimal action value function, and the optimal energy consumption of the dispatchable load is then calculated according to that strategy, thereby realizing determination based on price demand response.
The dynamic retail electricity price pricing problem of the power company is modeled into a Markov decision process, and when the optimal retail electricity price strategy is determined according to the optimal action value function, the influence of the current electricity price on the instant response of the load is considered, and the influence of the current electricity price on the response of the load in a period of time in the future is also considered, so that the uncertainty and the flexibility of the dynamic power market can be truly depicted, and the accuracy based on price demand response is improved.
It should be noted that, besides the field of energy management on the demand side, the present invention can also be applied to decision-making problems in other unknown environments in the smart grid, such as power balance on both sides of supply and demand, scheduling problems of optimal generator sets, and the like.
The state space, the action space and the return definition in the Markov decision process are not unique, and can be redefined according to other targets of a system or an individual; in addition, the selection of the learning rate in the Q-learning algorithm has a great influence on the convergence of the algorithm, so that the selection of the learning rate can be further analyzed and discussed.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A price demand response-based determination method, comprising:
modeling a dynamic retail electricity price pricing problem of an electric power company as a Markov decision process;
monitoring the states of all load units at the current moment, recording the states as a first state, and selecting a target retail electricity price making action at the current moment by using an electricity price selection probability-greedy strategy within an allowable retail electricity price range;
calculating the immediate return of income after the target retail electricity price making action is executed, monitoring the states of all load units at the next time, and recording them as a second state;
updating a reference action value function into a target action value function based on the first state, the second state, the target retail electricity price making action and the immediate return of income, wherein the reference action value function is an action value function obtained by last iteration;
judging whether the current time reaches the terminal time;
if so, judging whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold value;
if so, taking the target action value function as an optimal action value function, and determining an optimal retail electricity price strategy according to the optimal action value function;
and calculating the optimal energy consumption of the dispatchable load according to the optimal retail electricity price strategy.
2. The determination method according to claim 1, wherein the electricity price selection probability-greedy strategy means: randomly selecting a retail electricity price from the action set with probability ε, or selecting the retail electricity price corresponding to the maximum action value function with probability 1 − ε, where ε represents the electricity price selection probability.
3. The determination method of claim 1, wherein the expression of the immediate return of income r_t is as follows:
r_t = ρU_t − (1 − ρ)C_t
in the formula, ρ ∈ [0, 1] is a weight parameter representing the relative social value of the revenue of the electric power company and the comprehensive cost of the load units, U_t represents the net revenue of the electric power company at time t, and C_t represents the comprehensive cost of the load side at time t.
4. The determination method of claim 3, wherein the expression of the net revenue U_t of the electric power company at time t is as follows:
Figure FDA0003007626180000021
in the formula, (Figure FDA0003007626180000022) is the set of non-schedulable loads; (Figure FDA0003007626180000023) represents the retail electricity price received by non-schedulable load (Figure FDA0003007626180000024) at time t; (Figure FDA0003007626180000025) represents the energy consumption of non-schedulable load (Figure FDA0003007626180000026) at time t, where the superscript n identifies a non-dispatchable load, the subscript n denotes the load unit index, and the subscript t denotes the time index; (Figure FDA0003007626180000027) represents the energy consumption of schedulable load (Figure FDA0003007626180000028) at time t, where the superscript d identifies a schedulable load; (Figure FDA0003007626180000029) is the set of schedulable loads; (Figure FDA00030076261800000210) represents the retail electricity price received by schedulable load (Figure FDA00030076261800000211) at time t; η_t represents the wholesale electricity price at time t for schedulable load (Figure FDA00030076261800000212) and satisfies (Figure FDA00030076261800000213); and (Figure FDA00030076261800000214) represents the total electric energy purchased by the electric power company from the grid operator at time t, where the superscript tot identifies total electric energy;
the expression of the comprehensive cost C_t of the load side at time t is as follows:
Figure FDA00030076261800000215
in the formula, (Figure FDA00030076261800000216) represents the level of dissatisfaction of schedulable load (Figure FDA00030076261800000217) caused by the reduced energy consumption at time t.
5. The determination method according to claim 4, wherein the expression of the degree of dissatisfaction (Figure FDA00030076261800000218) is as follows:
Figure FDA00030076261800000219
in the formula, (Figure FDA0003007626180000031) represents the level of dissatisfaction of schedulable load (Figure FDA0003007626180000032) caused by the reduced energy consumption at time t; (Figure FDA0003007626180000033) and (Figure FDA0003007626180000034) represent two dissatisfaction coefficients that depend on schedulable load (Figure FDA0003007626180000035); (Figure FDA0003007626180000036) represents the demand reduction of schedulable load (Figure FDA0003007626180000037); (Figure FDA0003007626180000038) represents the energy demand of schedulable load (Figure FDA0003007626180000039) at time t, where the superscript d identifies a schedulable load; (Figure FDA00030076261800000310) is the set of schedulable loads; and (Figure FDA00030076261800000311) represents the energy consumption of schedulable load (Figure FDA00030076261800000312) at time t.
6. The determination method of claim 5, wherein the demand reduction (Figure FDA00030076261800000314) of schedulable load (Figure FDA00030076261800000313) satisfies the following inequality:
Figure FDA00030076261800000315
in the formula, (Figure FDA00030076261800000316) and (Figure FDA00030076261800000317) respectively represent the minimum and maximum demand reduction of schedulable load (Figure FDA00030076261800000318), both of which are known quantities.
7. The determination method of claim 1, wherein the target action value function is expressed as follows:
Figure FDA00030076261800000319
in the formula, Q_k(s_t, a_t), as the target action value function, represents the cumulative future discounted return obtained at the k-th iteration by starting from state s_t of all load units and executing the target retail electricity price making action a_t, and is defined as (Figure FDA00030076261800000320), where γ represents the discount factor; the learning rate takes values in [0, 1] and represents the degree to which the newly acquired Q_k value overwrites the Q_{k-1} value; Q_{k-1}(s_t, a_t) represents the reference action value function; s_{t+1} represents the state of all load units at time t+1; a_{t+1} represents the retail electricity price making action at time t+1; and Q_{k-1}(s_{t+1}, a_{t+1}) represents the cumulative future discounted return obtained at iteration k-1 by starting from state s_{t+1} of all load units and executing a_{t+1}.
8. The determination method according to claim 1, wherein the expression of the optimal retail electricity price strategy is as follows:
Figure FDA0003007626180000041
in the formula, pi*(st) For optimal retail electricity price strategy, Q*(st,at) For the optimal action value function, A is the action set, and A ═ a1,a2,…,aTThe value range of the time t is: t1, 2, T represents the total number of time intervals, stRepresenting the state of all load units at time t, atIndicating a retail electricity price making action at time t.
9. The determination method of claim 1, wherein the optimal energy consumption is expressed as follows:
Figure FDA0003007626180000042
where
Figure FDA0003007626180000043
denotes the optimal energy consumption, the subscript n denotes the load unit, the subscript t denotes the time index, the subscript d identifies a schedulable load,
Figure FDA0003007626180000044
denotes the energy demand of the schedulable load
Figure FDA0003007626180000045
at time t,
Figure FDA0003007626180000046
denotes the retail electricity price received by the schedulable load
Figure FDA0003007626180000047
at time t, $\eta_t$ denotes the wholesale electricity price for the schedulable load
Figure FDA0003007626180000048
at time t, which satisfies
Figure FDA0003007626180000049
and $\mu_t$ is the electricity price elasticity coefficient, which indicates the rate at which the energy demand changes with the retail electricity price at time t.
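The quantities named in claim 9 can be combined in a small worked example. The formula itself is available only as a figure, so the sketch below assumes a linear price-elasticity form that is common in the demand response literature: consumption equals demand scaled by one plus the elasticity times the relative markup of the retail price over the wholesale price. Both that form and the constraint that the retail price is at least the wholesale price are assumptions, and the name `energy_consumption` is hypothetical.

```python
def energy_consumption(demand: float, retail_price: float,
                       wholesale_price: float, elasticity: float) -> float:
    """Assumed linear elasticity model: demand * (1 + elasticity * relative markup)."""
    if wholesale_price <= 0:
        raise ValueError("wholesale_price must be positive")
    if retail_price < wholesale_price:
        # Assumption: the retailer never prices below the wholesale price.
        raise ValueError("this sketch assumes retail_price >= wholesale_price")
    markup = (retail_price - wholesale_price) / wholesale_price
    return demand * (1.0 + elasticity * markup)


# A negative elasticity means consumption falls as the retail price rises.
consumption = energy_consumption(demand=10.0, retail_price=0.15,
                                 wholesale_price=0.10, elasticity=-0.5)
```

Under this assumed model, a retail price equal to the wholesale price leaves the demand unchanged, and a larger markup reduces consumption in proportion to the elasticity coefficient.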
10. The determination method of claim 1, further comprising initializing the action value function, which specifically includes:
obtaining known prior parameter data, substituting the prior parameter data into the predetermined action value function, and initializing the action value function, wherein the initial value of the action value function is 0.
11. A price demand response-based determination system, comprising:
a modeling unit, configured to model the dynamic retail electricity pricing problem of the power company as a Markov decision process;
an action selection unit, configured to monitor the states of all load units at the current moment, record them as a first state, and select a target retail electricity price making action for the current moment within the allowable retail electricity price range by using a price selection probability-greedy strategy;
a return calculation unit, configured to calculate the immediate return after the target retail electricity price making action is executed, monitor the states of all load units at the moment following the current moment, and record them as a second state;
a function updating unit, configured to update a reference action value function to a target action value function based on the first state, the second state, the target retail electricity price making action, and the immediate return, wherein the reference action value function is the action value function obtained in the previous iteration;
a first judgment unit, configured to judge whether the current moment has reached the terminal moment;
a second judgment unit, configured to judge, when the first judgment unit judges yes, whether the absolute value of the difference between the target action value function and the reference action value function is not greater than a difference threshold;
an electricity price strategy determining unit, configured to take the target action value function as an optimal action value function when the second judgment unit judges yes, and to determine an optimal retail electricity price strategy according to the optimal action value function; and
an energy consumption calculating unit, configured to calculate the optimal energy consumption of the schedulable load according to the optimal retail electricity price strategy.
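The units of claim 11 can be read as one training loop, sketched below under several assumptions. The sketch uses tabular Q-learning, interprets the "price selection probability-greedy strategy" as an epsilon-greedy rule, and compares the table against a per-episode snapshot for the threshold check. The environment `toy_step`, its linear elasticity profit model, the `min_episodes` safeguard against stopping before any learning has happened, and all parameter values are hypothetical and not taken from the patent.

```python
import random
from collections import defaultdict


def toy_step(t, state, price, wholesale=0.10, demand=10.0, elasticity=-0.5):
    """Hypothetical environment: retailer profit under an assumed elasticity model."""
    consumption = demand * (1.0 + elasticity * (price - wholesale) / wholesale)
    reward = (price - wholesale) * consumption   # immediate return
    return reward, state + 1                     # the toy state is the time index


def train(prices, horizon, step, epsilon=0.1, learning_rate=0.1, gamma=0.9,
          threshold=1e-4, min_episodes=1, max_episodes=10_000, seed=0):
    """Tabular Q-learning with epsilon-greedy price selection.

    Returns (Q table as a dict, number of episodes run).
    """
    rng = random.Random(seed)
    q = defaultdict(float)                       # action values initialised to 0
    episode = 0
    while episode < max_episodes:
        episode += 1
        reference = dict(q)                      # reference action value function
        state = 0                                # first state
        for t in range(horizon):                 # run until the terminal time
            if rng.random() < epsilon:
                price = rng.choice(prices)       # explore
            else:
                price = max(prices, key=lambda p: q[(state, p)])  # exploit
            reward, next_state = step(t, state, price)            # second state
            best_next = max(q[(next_state, p)] for p in prices)
            q[(state, price)] = ((1.0 - learning_rate) * q[(state, price)]
                                 + learning_rate * (reward + gamma * best_next))
            state = next_state
        # Largest absolute change of any table entry during this episode.
        change = max(abs(v - reference.get(k, 0.0)) for k, v in q.items())
        if episode >= min_episodes and change <= threshold:
            break
    return dict(q), episode


prices = [0.10, 0.15, 0.20, 0.25]
q, episodes = train(prices, horizon=3, step=toy_step, epsilon=0.3, min_episodes=2000)
policy = {t: max(prices, key=lambda p: q.get((t, p), 0.0)) for t in range(3)}
```

In this toy setting the profit does not depend on the state, so the learned strategy should settle on the same price in every interval. A threshold check on a single episode can fire early when that episode happens to visit only entries that have stopped changing, which is why the sketch adds the `min_episodes` guard.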
CN202110366209.XA 2021-04-06 2021-04-06 Price demand response-based determination method and system Active CN113052638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110366209.XA CN113052638B (en) 2021-04-06 2021-04-06 Price demand response-based determination method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110366209.XA CN113052638B (en) 2021-04-06 2021-04-06 Price demand response-based determination method and system

Publications (2)

Publication Number Publication Date
CN113052638A (en) 2021-06-29
CN113052638B (en) 2023-11-24

Family

ID=76517587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110366209.XA Active CN113052638B (en) 2021-04-06 2021-04-06 Price demand response-based determination method and system

Country Status (1)

Country Link
CN (1) CN113052638B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190036488A (en) * 2017-09-27 2019-04-04 한양대학교 에리카산학협력단 Real-time decision method and system for industrial load management in a smart grid
CN110378058A (en) * 2019-07-26 2019-10-25 中民新能投资集团有限公司 A kind of method for building up for the electro thermal coupling microgrid optimal response model comprehensively considering reliability and economy
KR20190132193A (en) * 2018-05-18 2019-11-27 한양대학교 에리카산학협력단 A Dynamic Pricing Demand Response Method and System for Smart Grid Systems
CN111105126A (en) * 2019-10-30 2020-05-05 国网浙江省电力有限公司舟山供电公司 Power grid service value making method based on reinforcement learning of user side demand response

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
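MING JIN et al.: "Microgrid to enable optimal distributed energy retail and end-user demand response", Applied Energy *
ZHAI Yafei; LIU Jichun; LIU Junyong: "Pricing strategy of electricity retailers under multiple electricity price forms and load types", Distribution & Utilization (供用电), no. 08 *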

Also Published As

Publication number Publication date
CN113052638B (en) 2023-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant