CN115423539A - Demand response incentive price determination method and device considering user satisfaction


Publication number
CN115423539A
CN115423539A CN202211367309.5A CN202211367309A CN115423539A CN 115423539 A CN115423539 A CN 115423539A CN 202211367309 A CN202211367309 A CN 202211367309A CN 115423539 A CN115423539 A CN 115423539A
Authority
CN
China
Prior art keywords
user
model
incentive
price
demand response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211367309.5A
Other languages
Chinese (zh)
Inventor
丁贵立
韩威
许志浩
王宗耀
康兵
朱卓航
高永民
蒋善旗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Institute of Technology
Original Assignee
Nanchang Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Institute of Technology
Priority to CN202211367309.5A
Publication of CN115423539A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of intelligent power utilization and relates to a method and device for determining a demand response incentive price that considers user satisfaction. The method establishes an incentive-based demand response model based on a hierarchical power market and a carbon emission trading market; the model comprises a power grid operator model, a user side model and an objective function model. The demand response potential level of each user is determined through a clustering algorithm; the response rate of each user and the incentive elasticity coefficients for different time periods are determined from historical response data; and the incentive-based demand response model is solved with the Q-learning algorithm from reinforcement learning to obtain the optimal incentive subsidy price. The method comprehensively considers influencing factors such as the carbon emission trading market and user dissatisfaction, adjusts the model parameters according to the users' energy consumption characteristics and the real-time trading price of the carbon trading market, and outputs subsidy prices, helping the power supply company formulate a sound demand response regulation strategy and allocate power resources reasonably.

Description

Demand response incentive price determination method and device considering user satisfaction
Technical Field
The invention belongs to the technical field of power supply and demand management, and relates to a demand response incentive price determining method and device considering user satisfaction.
Background
Demand-side management is an important part of building a new type of power system. As an important means of demand-side management, demand response effectively alleviates the supply-demand imbalance during peak load periods of the power grid and avoids the carbon emissions, ecological pollution and cost of building new power plants and supporting grid facilities. Demand response takes two forms: price-based demand response, which guides users to adjust their electricity use by changing the electricity price, and incentive-based demand response, which encourages users to reduce electricity use through subsidy rewards or discounts. Compared with price-based demand response, incentive-based demand response is more flexible for short-term peak shaving and valley filling and is more attractive to users.
At present, domestic research on demand response incentive prices builds models from the perspectives of uncertainty and leader-follower (Stackelberg) games. These are model-based methods: they depend on a carefully designed model, require all or most of the environment information to be known, use complex algorithms, scale and adapt poorly, and do not consider the influence of the carbon emission trading market on user-side electricity consumption behavior.
Reinforcement learning is a data-driven method that does not require extensive knowledge of the environment. It estimates the long-term value of a policy by continuously observing and exploring the interaction data between an agent and the environment, and finally obtains a policy that optimizes the objective.
Disclosure of Invention
To address the above problems, the invention provides a method and device for determining a demand response incentive price considering user satisfaction. The invention can formulate a fine-grained subsidy price strategy under limited information and incentivize users to complete energy-saving responses more accurately.
The technical scheme adopted by the invention is as follows:
A demand response incentive price determining method considering user satisfaction, comprising the steps of:
S1, establishing an incentive-based demand response model based on a hierarchical power market and a carbon emission trading market, wherein the model comprises a power grid operator model, a user side model and an objective function model;
S2, determining the demand response potential level of each user through a clustering algorithm, and determining the response rate of each user and the incentive elasticity coefficients for different time periods from historical response data;
S3, inputting the demand response potential level, the response rate and the incentive elasticity coefficients of the user into the incentive-based demand response model as model parameters, and solving the model with the Q-learning algorithm from reinforcement learning to obtain the optimal incentive subsidy price.
Further preferably, the power grid operator model comprises a carbon emission profit model, a reduced electricity purchase cost model and an incentive cost model;
the power grid operator model is as follows:
Figure 764376DEST_PATH_IMAGE001
the carbon emission profit model is as follows:
Figure 592655DEST_PATH_IMAGE002
the reduced electricity purchase cost model is as follows:
Figure 576661DEST_PATH_IMAGE003
the incentive cost model is as follows:
Figure 857600DEST_PATH_IMAGE004
wherein
Figure 212184DEST_PATH_IMAGE005
In the formulas, F_Gr represents the grid benefit, C_Gr the carbon emission revenue, C_RC the reduced electricity purchase cost, and C_IC the incentive cost; i denotes a user, t denotes time, n is the set of users, and T is the set of all time periods in a day;
Figure 160548DEST_PATH_IMAGE006
represents the response load of user i participating in the activity during time period t,
Figure 315455DEST_PATH_IMAGE007
Figure 614849DEST_PATH_IMAGE008
represents the original power demand of user i in period t;
Figure 513404DEST_PATH_IMAGE009
represents the incentive elasticity coefficient of user i, that is, the percentage change in load demand during the period caused by a 1% change in the subsidy;
Figure 316275DEST_PATH_IMAGE010
represents the incentive subsidy price that user i obtains for participating in demand response during time period t, which is set by the grid operator,
Figure 642083DEST_PATH_IMAGE011
represents the minimum incentive subsidy price;
Figure 897615DEST_PATH_IMAGE012
represents the trading unit price of standard coal at time t;
Figure 068702DEST_PATH_IMAGE013
represents the trading unit price of carbon dioxide in the local carbon trading market at time t;
Figure 991659DEST_PATH_IMAGE014
represents the real-time wholesale electricity price of the power market at time t.
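The elasticity definition above can be illustrated with a minimal sketch. Because the formula images are not reproduced in this text, the sketch assumes that the load reduction equals the original demand multiplied by the elasticity coefficient and by the relative increase of the subsidy price over the minimum subsidy price. The function name `response_load` and all parameter names are hypothetical.

```python
def response_load(original_demand, elasticity, price, min_price):
    """Assumed response-load rule (not taken from the patent's formula image).

    A 1% increase of the subsidy price over the minimum price is assumed
    to reduce the load by `elasticity` percent of the original demand.
    """
    if min_price <= 0:
        raise ValueError("min_price must be positive")
    if price < min_price:
        # Below the minimum subsidy the user is assumed not to respond.
        return 0.0
    relative_change = (price - min_price) / min_price
    reduction = original_demand * elasticity * relative_change
    # A user cannot shed more load than the original demand.
    return min(reduction, original_demand)


# Example: 2 kWh demand, elasticity 0.5, subsidy raised from 0.2 to 0.3.
example_reduction = response_load(2.0, 0.5, 0.3, 0.2)
```

Under these assumptions a 50% rise in the subsidy with an elasticity of 0.5 sheds roughly a quarter of the original demand.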
Further preferably, the user side model comprises an electricity-saving fee profit model, a subsidy profit model and a user dissatisfaction model;
the user side model is as follows:
Figure 488368DEST_PATH_IMAGE015
the electricity cost savings model is as follows:
Figure 496775DEST_PATH_IMAGE016
the subsidy profit model is as follows:
Figure 737133DEST_PATH_IMAGE017
the user dissatisfaction model is as follows:
Figure 514596DEST_PATH_IMAGE018
Figure 791993DEST_PATH_IMAGE019
wherein F_Ur represents the user benefit, C_Sr the electricity cost savings, C_SI the subsidy revenue, and C_Ud the user dissatisfaction,
Figure 271385DEST_PATH_IMAGE020
representing the electricity price of the user i at the time t;
Figure 800587DEST_PATH_IMAGE021
is the response rate of user i, calculated from historical response performance,
Figure 687683DEST_PATH_IMAGE022
representing the historical response load of user i,
Figure 542507DEST_PATH_IMAGE023
representing historical power demand of user i;
Figure 774774DEST_PATH_IMAGE024
a demand response potential level representing user i;
Figure 107666DEST_PATH_IMAGE025
∈[0,1],
Figure 577831DEST_PATH_IMAGE026
∈[0,1]。
further preferably, the objective function model is:
Figure 869135DEST_PATH_IMAGE027
in the formula, ρ is the benefit weighting factor, expressing the relative importance of the grid operator's benefit and the user's benefit, with ρ ∈ [0,1]. When implementing demand response, ρ can be set within (0.5, 1) if grid-side cost control is the priority, or within (0, 0.5) if user benefit is emphasized.
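Since the formula images are not reproduced in this text, the following is only one plausible reading of how the component models combine, written with the symbol names defined above. The signs (revenues and savings added, incentive cost and dissatisfaction subtracted) and the convex-combination form of the objective are assumptions, not a transcription of the patent's formulas.

```latex
F_{Gr} = C_{Gr} + C_{RC} - C_{IC}, \qquad
F_{Ur} = C_{Sr} + C_{SI} - C_{Ud}, \qquad
\max \; \rho\, F_{Gr} + (1-\rho)\, F_{Ur}, \quad \rho \in [0,1]
```

Under this reading, ρ close to 1 makes the optimization favor the grid operator's benefit, and ρ close to 0 favors the user's benefit, which matches the guidance on choosing ρ.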
Further preferably, in step S2, the electricity load curves of the users in the activity implementation area for the previous week are obtained, the energy consumption characteristics of different users are identified by clustering the load curves, and the users' demand response potential levels are assigned according to these characteristics.
Preferably, in step S2, historical response data of users who participated in demand response activities are obtained, and a neural network is used to identify the response rate of user i
Figure 057540DEST_PATH_IMAGE028
and the incentive elasticity coefficients for different time periods
Figure 194123DEST_PATH_IMAGE029
Preferably, in step S3, the decision optimization problem of the incentive-based demand response model based on the hierarchical power market and the carbon emission trading market is modeled as a finite Markov decision process (MDP) for the agent to learn. The grid operator first sets an incentive subsidy price for the user; the user then responds by reducing its power load, and the benefits of the user and the grid are fed back to the grid operator; the grid operator then resets the incentive subsidy price according to the user's load reduction and the current total benefit. The iteration stops when the total benefit of the user and the grid reaches its maximum or the convergence condition is met, and the incentive subsidy price at that point is the optimal incentive-based demand response strategy.
Further preferably, the step S3 procedure is as follows:
step S31, initializing parameters, including: the demand response potential level of user i
Figure 518794DEST_PATH_IMAGE030
, the incentive elasticity coefficient
Figure 715420DEST_PATH_IMAGE031
, the response rate of user i
Figure 391121DEST_PATH_IMAGE028
, the revenue weighting factor ρ, the discount coefficient α, the learning rate θ, the greedy coefficient ε,
Figure 385939DEST_PATH_IMAGE032
Figure 2734DEST_PATH_IMAGE033
Figure 916463DEST_PATH_IMAGE034
Figure 644116DEST_PATH_IMAGE035
Figure 959691DEST_PATH_IMAGE036
and T; and setting the value interval of
Figure 764966DEST_PATH_IMAGE037
;
step S32, initializing the Q table with every element set to zero, and setting the time period t = 0 and the user i = 0;
step S33, observing response load of user i participating in activity in t =0 time period
Figure 165991DEST_PATH_IMAGE038
Step S34, selecting incentive subsidy prices obtained by participation of user i in demand response in t period by greedy strategy
Figure 431756DEST_PATH_IMAGE039
step S35, calculating the reward
Figure 867417DEST_PATH_IMAGE040
, observing the response load of user i participating in the activity during time period t + 1
Figure 560435DEST_PATH_IMAGE041
, and updating the Q value;
step S36, judging whether the maximum time period T has been reached; if so, proceeding to the next step; otherwise, setting t = t + 1 and returning to step S34;
step S37, judging whether the Q table has converged to the maximum value; if so, proceeding to the next step; otherwise, setting i = i + 1 and returning to step S33;
step S38, outputting the optimal incentive subsidy prices for the T time periods in one day.
Further preferably, in step S35, the Q value may be calculated by the following equation:
Figure 979915DEST_PATH_IMAGE042
where
Figure 783792DEST_PATH_IMAGE043
is the Q value of user i at time t,
Figure 73959DEST_PATH_IMAGE044
is the response load of user i at time t,
Figure 203458DEST_PATH_IMAGE045
is the incentive subsidy price obtained by user i at time t, α is the discount coefficient, and
Figure 579076DEST_PATH_IMAGE046
is the Q value of user i at time t + 1;
Figure 186643DEST_PATH_IMAGE047
is updated in each iteration until the maximum accumulated discounted return is obtained or the maximum number of iterations is reached, and the Q value is updated as follows:
Figure 596896DEST_PATH_IMAGE048
where
Figure 366138DEST_PATH_IMAGE049
is the Q value of user i at time t in the next iteration,
Figure 229052DEST_PATH_IMAGE050
is the response load of user i at time t in the next iteration,
Figure 640310DEST_PATH_IMAGE051
is the incentive subsidy price of user i at time t in the next iteration,
Figure 905070DEST_PATH_IMAGE052
is the Q value of user i at time t + 1 in the next iteration, and θ is the learning rate, with value range [0,1].
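For orientation, the standard tabular Q-learning update is written below using the roles the text assigns to α (discount coefficient) and θ (learning rate). The symbols e for the response load (state), λ for the incentive subsidy price (action) and r for the reward are notation assumed here for illustration; the patent's own formula image is not reproduced, so this is a textbook form rather than a transcription.

```latex
Q(e_{i,t}, \lambda_{i,t}) \leftarrow (1-\theta)\, Q(e_{i,t}, \lambda_{i,t})
  + \theta \left[ r_{i,t} + \alpha \max_{\lambda'} Q(e_{i,t+1}, \lambda') \right]
```

In this form the bracketed term is the new estimate of the discounted return, and θ controls how far the stored Q value moves toward it.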
The invention also provides a demand response incentive price determining apparatus considering user satisfaction, comprising a non-volatile computer storage medium storing computer-executable instructions that may perform a demand response incentive price determining method considering user satisfaction in any of the above method embodiments.
The beneficial effects of the invention are as follows. For the two types of demand response participants, power grid operators and residential users, a demand response incentive strategy optimization model that considers carbon emissions and user dissatisfaction is established for the smart grid and dual-carbon setting, and the model is solved through reinforcement learning. The grid operator guides users to save electricity by issuing different subsidy prices to them, which addresses peak shaving and valley filling in specific time periods while maximizing the combined benefit of the grid and the users. The method comprehensively considers influencing factors such as the carbon emission trading market and user dissatisfaction. To account for the randomness and dispersion of user loads, model parameters including the price elasticity coefficients and dissatisfaction parameters are identified from real historical residential demand response data, the users' response rates are determined from the same data, and the users' demand response potential levels are obtained through a clustering algorithm, so that the sensitivity of different users to prices in different time periods is accurately described. The model parameters are adjusted according to the users' energy consumption characteristics and the real-time trading price of the carbon trading market, and subsidy prices are output, helping the power supply company formulate a sound demand response regulation strategy, guiding local users to use electricity reasonably and avoid consumption peaks, and achieving peak shaving, valley filling and reasonable allocation of power resources.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Figure 2 is the power demand of 3 users.
Fig. 3 is a Q-value convergence diagram.
Fig. 4 is the optimum incentive subsidy price and power consumption situation for the user 1 for each time period.
Fig. 5 is the optimum incentive subsidy price and power consumption situation of the user 2 per time period.
Fig. 6 is the optimum incentive subsidy price and power consumption situation of the user 3 per time period.
Fig. 7 is a schematic structural diagram of an electronic device.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a demand response incentive price determining method considering user satisfaction, comprising the steps of:
S1, establishing an incentive-based demand response model based on a hierarchical power market and a carbon emission trading market, wherein the model comprises a power grid operator model, a user side model and an objective function model;
S2, determining the demand response potential level of each user through a clustering algorithm, and determining the response rate of each user and the incentive elasticity coefficients for different time periods from historical response data;
S3, inputting the demand response potential level, the response rate and the incentive elasticity coefficients of the user into the incentive-based demand response model as model parameters, and solving the model with the Q-learning algorithm from reinforcement learning to obtain the optimal incentive subsidy price.
In step S1 of this embodiment, the power grid operator model includes a carbon emission profit model, a reduced electricity purchase cost model, and an incentive cost model;
the power grid operator model is as follows:
Figure 382230DEST_PATH_IMAGE001
the carbon emission profit model is as follows:
Figure 466861DEST_PATH_IMAGE053
the reduced electricity purchase cost model is as follows:
Figure 416231DEST_PATH_IMAGE054
the incentive cost model is as follows:
Figure 394552DEST_PATH_IMAGE055
wherein
Figure 787487DEST_PATH_IMAGE056
In the formulas, F_Gr represents the grid benefit, C_Gr the carbon emission revenue, C_RC the reduced electricity purchase cost, and C_IC the incentive cost; i denotes a user, t denotes time, n is the set of users, and T is the set of all time periods in a day;
Figure 608681DEST_PATH_IMAGE057
represents the response load of user i participating in the activity during time period t,
Figure 112475DEST_PATH_IMAGE058
Figure 335515DEST_PATH_IMAGE059
represents the original power demand of user i in period t;
Figure 899351DEST_PATH_IMAGE009
represents the incentive elasticity coefficient of user i, that is, the percentage change in load demand during the period caused by a 1% change in the subsidy;
Figure 473421DEST_PATH_IMAGE060
represents the incentive subsidy price that user i obtains for participating in demand response during time period t, which is set by the grid operator,
Figure 249747DEST_PATH_IMAGE061
represents the minimum incentive subsidy price;
Figure 592873DEST_PATH_IMAGE062
represents the trading unit price of standard coal at time t;
Figure 327610DEST_PATH_IMAGE013
represents the trading unit price of carbon dioxide in the local carbon trading market at time t;
Figure 857818DEST_PATH_IMAGE063
represents the real-time wholesale electricity price of the power market at time t.
The user side model in step S1 of this embodiment includes an electricity-saving fee profit model, a subsidy profit model, and a user dissatisfaction model;
the user side model is as follows:
Figure 828048DEST_PATH_IMAGE015
the electricity cost savings model is as follows:
Figure 776412DEST_PATH_IMAGE064
the subsidy profit model is as follows:
Figure 665740DEST_PATH_IMAGE017
the user dissatisfaction model is as follows:
Figure 699555DEST_PATH_IMAGE065
Figure 326671DEST_PATH_IMAGE019
wherein F_Ur represents the user benefit, C_Sr the electricity cost savings, C_SI the subsidy revenue, and C_Ud the user dissatisfaction,
Figure 395121DEST_PATH_IMAGE066
representing the electricity price of the user i at the time t;
Figure 720929DEST_PATH_IMAGE021
is the response rate of user i, calculated from historical response performance,
Figure 242040DEST_PATH_IMAGE067
representing the historical response load of user i,
Figure 678707DEST_PATH_IMAGE068
representing historical power demand of user i;
Figure 867243DEST_PATH_IMAGE030
a demand response potential level representing user i;
Figure 363952DEST_PATH_IMAGE069
∈[0,1],
Figure 106780DEST_PATH_IMAGE070
∈[0,1]。
The purpose of establishing the incentive-based demand response model based on the hierarchical power market and the carbon emission trading market is to maximize the combined benefit of the power grid and the user, so the objective function model is as follows:
Figure 612716DEST_PATH_IMAGE071
where ρ is the benefit weighting factor, expressing the relative importance of the grid operator's benefit and the user's benefit, with ρ ∈ [0,1]. When implementing demand response, ρ can be set within (0.5, 1) if grid-side cost control is the priority, or within (0, 0.5) if user benefit is emphasized.
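The effect of the weighting factor can be shown with a small sketch. It assumes the objective is a convex combination of grid benefit and user benefit, which is consistent with the description of ρ but is not confirmed by the formula image. The function names and the benefit figures in `candidates` are hypothetical illustration values.

```python
def total_benefit(rho, grid_benefit, user_benefit):
    """Weighted total benefit, assuming the form rho*F_Gr + (1-rho)*F_Ur."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * grid_benefit + (1.0 - rho) * user_benefit


def best_price(rho, candidates):
    """Return the candidate subsidy price with the largest weighted benefit.

    `candidates` maps a subsidy price to a (grid_benefit, user_benefit) pair.
    """
    return max(candidates, key=lambda p: total_benefit(rho, *candidates[p]))


# Made-up benefits: a low subsidy favors the grid, a high one favors users.
candidates = {0.2: (10.0, 2.0), 0.5: (6.0, 8.0)}
grid_focused = best_price(0.8, candidates)  # rho in (0.5, 1)
user_focused = best_price(0.2, candidates)  # rho in (0, 0.5)
```

With these illustrative numbers, the grid-focused setting selects the lower subsidy and the user-focused setting selects the higher one.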
In step S2 of this embodiment, the demand response potential level of user i
Figure 718076DEST_PATH_IMAGE072
is determined through a clustering algorithm: the power load curves of the users in the activity implementation area for the previous week are obtained, the energy consumption characteristics of different users are identified by clustering the load curves, and the users' demand response potential levels are assigned according to these characteristics.
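The clustering step can be sketched as follows. The text does not name a specific clustering algorithm, so this example assumes plain k-means on the load curves, with deterministic seeding by total consumption, and it assumes that clusters with higher average load map to higher potential levels. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two load curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_load_curves(curves, k, iterations=20):
    """Plain k-means on load curves; returns (labels, centroids)."""
    n = len(curves)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of curves")
    # Deterministic seeding: spread the seeds across the range of total load.
    order = sorted(range(n), key=lambda i: sum(curves[i]))
    if k == 1:
        seeds = [order[n // 2]]
    else:
        seeds = [order[round(j * (n - 1) / (k - 1))] for j in range(k)]
    centroids = [list(curves[s]) for s in seeds]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: nearest centroid for every curve.
        labels = [min(range(k), key=lambda j: squared_distance(c, centroids[j]))
                  for c in curves]
        # Update step: each centroid becomes the mean of its member curves.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def potential_levels(labels, centroids):
    """Assumed mapping: clusters ranked by total load, level 1 = lowest."""
    ranked = sorted(range(len(centroids)), key=lambda j: sum(centroids[j]))
    level_of_cluster = {cluster: rank + 1 for rank, cluster in enumerate(ranked)}
    return [level_of_cluster[lab] for lab in labels]


# Four toy users with four periods each (illustrative values only).
curves = [[1.0, 1.0, 1.0, 1.0], [1.2, 0.9, 1.0, 1.0],
          [4.0, 5.0, 4.0, 5.0], [4.2, 4.9, 4.0, 5.0]]
labels, centroids = cluster_load_curves(curves, k=2)
levels = potential_levels(labels, centroids)
```

In practice the curves would be the previous week's measured load profiles, and the mapping from cluster to potential level would follow whatever energy consumption characteristics the operator considers relevant.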
In step S2 of this embodiment, the response rate of user i
Figure 995473DEST_PATH_IMAGE028
and the incentive elasticity coefficients for different time periods
Figure 474865DEST_PATH_IMAGE029
are determined from historical response data: historical response data of users who participated in demand response activities are obtained, and a neural network is used to identify the response rate of user i
Figure 004066DEST_PATH_IMAGE028
and the incentive elasticity coefficients for different time periods
Figure 885304DEST_PATH_IMAGE029
, so that the sensitivity of different users to prices in different time periods is accurately described.
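The embodiment identifies these parameters with a neural network, whose architecture is not described. As a much simpler stand-in, and only to make the quantities concrete, the sketch below assumes the response rate is the ratio of historical response load to historical power demand and fits the elasticity as a least-squares slope through the origin. The function names and the sample data are hypothetical.

```python
def response_rate(historical_response, historical_demand):
    """Assumed response rate: total response load over total demand."""
    total_demand = sum(historical_demand)
    if total_demand <= 0:
        raise ValueError("historical demand must be positive")
    return sum(historical_response) / total_demand


def fit_elasticity(price_changes_pct, load_changes_pct):
    """Least-squares slope through the origin (a stand-in for the
    neural-network identification): percent load change per percent
    subsidy change within one time period."""
    denominator = sum(p * p for p in price_changes_pct)
    if denominator == 0:
        raise ValueError("need at least one non-zero price change")
    numerator = sum(p * l for p, l in zip(price_changes_pct, load_changes_pct))
    return numerator / denominator


# Illustrative history: kWh shed vs. kWh demanded in three past events.
rate = response_rate([0.5, 0.25, 0.25], [4.0, 2.0, 2.0])
# Illustrative history for one period: subsidy change (%) vs. load change (%).
elasticity = fit_elasticity([10.0, 20.0, 30.0], [5.0, 10.0, 15.0])
```

Fitting one elasticity per time period, for example separately for valley, flat and peak hours, would give the period-specific coefficients the text refers to.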
In step S3 of this embodiment, the decision optimization problem of the incentive-based demand response model based on the hierarchical power market and the carbon emission trading market is modeled as a finite Markov decision process (MDP) for the agent to learn. The grid operator first sets an incentive subsidy price for the user; the user then responds by reducing its power load, and the benefits of the user and the grid are fed back to the grid operator; the grid operator then resets the incentive subsidy price according to the user's load reduction and the current total benefit. The iteration stops when the total benefit of the user and the grid reaches its maximum or the convergence condition is met, and the incentive subsidy price at that point is the optimal incentive-based demand response strategy.
Step S3 of this embodiment specifically includes:
step S31, initializing parameters, including: the demand response potential level of user i
Figure 474548DEST_PATH_IMAGE030
, the incentive elasticity coefficient
Figure 712674DEST_PATH_IMAGE031
, the response rate of user i
Figure 045567DEST_PATH_IMAGE028
, the revenue weighting factor ρ, the discount coefficient α, the learning rate θ, the greedy coefficient ε,
Figure 541456DEST_PATH_IMAGE073
Figure 729861DEST_PATH_IMAGE074
Figure 600865DEST_PATH_IMAGE075
Figure 456694DEST_PATH_IMAGE076
Figure 512375DEST_PATH_IMAGE077
and T; and setting the value interval of
Figure 329021DEST_PATH_IMAGE078
;
step S32, initializing the Q table with every element set to zero, and setting the time period t = 0 and the user i = 0;
step S33, observing response load of user i participating in activity in t =0 time period
Figure 128350DEST_PATH_IMAGE079
Step S34, selecting incentive subsidy prices obtained by participation of user i in demand response in t period by greedy strategy
Figure 589418DEST_PATH_IMAGE080
step S35, calculating the reward
Figure 940634DEST_PATH_IMAGE081
, observing the response load of user i participating in the activity during time period t + 1
Figure 510156DEST_PATH_IMAGE082
, and updating the Q value;
step S36, judging whether the maximum time period T has been reached; if so, proceeding to the next step; otherwise, setting t = t + 1 and returning to step S34;
step S37, judging whether the Q table has converged to the maximum value; if so, proceeding to the next step; otherwise, setting i = i + 1 and returning to step S33;
step S38, outputting the optimal incentive subsidy prices for the T time periods in one day.
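Steps S31 to S38 can be sketched as a small tabular Q-learning loop. This is a simplified illustration under several assumptions: the state is reduced to the time period index rather than the observed response load, a single user is trained over a fixed number of episodes instead of testing convergence, and the reward is supplied by the caller. The names `learn_subsidy_prices`, `candidate_prices` and the toy reward are hypothetical. Parameter names follow the text: `alpha` is the discount coefficient, `theta` the learning rate and `epsilon` the greedy coefficient.

```python
import random


def learn_subsidy_prices(prices, num_periods, reward, episodes=500,
                         alpha=0.9, theta=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning over (time period, subsidy price) pairs."""
    rng = random.Random(seed)
    # S32: Q table initialised to zero.
    q = [[0.0] * len(prices) for _ in range(num_periods)]
    for _ in range(episodes):
        for t in range(num_periods):
            # S34: epsilon-greedy choice of the subsidy price for period t.
            if rng.random() < epsilon:
                a = rng.randrange(len(prices))
            else:
                a = max(range(len(prices)), key=lambda j: q[t][j])
            # S35: observe the reward and update the Q value.
            r = reward(t, prices[a])
            future = max(q[t + 1]) if t + 1 < num_periods else 0.0
            q[t][a] = (1 - theta) * q[t][a] + theta * (r + alpha * future)
    # S38: greedy price per period after training.
    best = [prices[max(range(len(prices)), key=lambda j: q[t][j])]
            for t in range(num_periods)]
    return best, q


# Toy reward: each period has a made-up price that maximises total benefit.
candidate_prices = [0.1, 0.2, 0.3, 0.4]
targets = [0.2, 0.4, 0.3]
best, q_table = learn_subsidy_prices(
    candidate_prices, 3, lambda t, p: -(p - targets[t]) ** 2)
```

In a fuller implementation the reward would be the weighted total benefit of the grid operator and the user for the chosen subsidy price, and training would stop once the Q table stops changing.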
In step S35 of this embodiment, the Q value can be calculated by the following equation:
Figure 847596DEST_PATH_IMAGE042
where
Figure 163171DEST_PATH_IMAGE083
is the Q value of user i at time t,
Figure 685288DEST_PATH_IMAGE084
is the response load of user i at time t,
Figure 86314DEST_PATH_IMAGE085
is the incentive subsidy price obtained by user i at time t, α is the discount coefficient, and
Figure 447736DEST_PATH_IMAGE086
is the Q value of user i at time t + 1;
Figure 883397DEST_PATH_IMAGE087
is updated in each iteration until the maximum accumulated discounted return is obtained or the maximum number of iterations is reached, and the Q value is updated as follows:
Figure 576415DEST_PATH_IMAGE088
where
Figure 730316DEST_PATH_IMAGE089
is the Q value of user i at time t in the next iteration,
Figure 799772DEST_PATH_IMAGE090
is the response load of user i at time t in the next iteration,
Figure 824360DEST_PATH_IMAGE091
is the incentive subsidy price of user i at time t in the next iteration,
Figure 953858DEST_PATH_IMAGE092
is the Q value of user i at time t + 1 in the next iteration, and θ is the learning rate, which represents the degree to which the new Q value replaces the old one, with value range [0,1]. When θ = 0, the agent only exploits previous knowledge and does not explore new knowledge; when θ = 1, the agent discards previous knowledge and only explores new knowledge. In practical applications, θ therefore takes a value between 0 and 1.
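The role of θ at its two extremes can be checked with a short sketch of a single update. It assumes the standard Q-learning form, in which the old value is blended with the discounted target; the function name `q_update` and the numbers are hypothetical.

```python
def q_update(old_q, reward, best_next_q, alpha, theta):
    """One Q-value update: blend the old value with the new target.

    alpha is the discount coefficient and theta the learning rate,
    following the naming used in the text.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    target = reward + alpha * best_next_q
    return (1.0 - theta) * old_q + theta * target


keep_old = q_update(2.0, 1.0, 4.0, alpha=0.5, theta=0.0)  # old value kept
take_new = q_update(2.0, 1.0, 4.0, alpha=0.5, theta=1.0)  # old value discarded
blended = q_update(2.0, 1.0, 4.0, alpha=0.5, theta=0.5)   # halfway between
```

With θ = 0 the stored value never changes, and with θ = 1 it is overwritten by the latest target, so an intermediate value trades stability against responsiveness.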
For the purpose of facilitating an understanding of the present invention, a demand response incentive price determination method of the present invention, which considers both carbon emissions and customer satisfaction, will be described in more detail with reference to examples of the process:
the experiment considers the incentive type DR consisting of a power grid company and a plurality of users, takes 24h a day as a complete optimization cycle, and is divided into 24 periods, and each period lasts for 1h. The ratio of the peak to the peak (07.
3 users are selected to participate in the experiment, and the related parameter information is as follows:
Figure 63897DEST_PATH_IMAGE093
Figure 671465DEST_PATH_IMAGE094
Figure 816138DEST_PATH_IMAGE095
the electricity demand of the user is shown in fig. 2 below.
The Q values obtained by iterating the method of the invention are shown in fig. 3. Initially, the grid operator does not know how to choose an action that yields a higher Q value; as the iterations proceed, the grid operator learns from the environment through trial and error, and the Q value increases and eventually converges to a maximum.
The optimal subsidy prices of 3 users at different time periods are output, and the power demand, the actual power consumption and the response electricity saving amount of the 3 users in the scene are shown in the figures 4, 5 and 6.
The response rate of user 1 is 0.125. Fig. 4 shows that this user's sensitivity to the incentive subsidy price is low. In the valley period, the fluctuations of the incentive subsidy price and of the electricity saving amount tend to be consistent; in the flat period, the incentive subsidy price fluctuates greatly, but the electricity saving amount fluctuates relatively little and always stays at a low level. Through analysis, the demand response participation willingness of the user during 7-00; during peak periods 19.
The response rate of user 2 is 0.21. Fig. 5 shows that, relative to user 1, this user's price sensitivity is clearly higher in the valley and peak periods; the electricity saving amount stays at a high level in the late peak period, and its trend is consistent with that of the incentive subsidy price. Analysis shows that the user's willingness to save electricity during 11-00-19 of the flat period is low, and changes in the incentive subsidy price have no obvious correlation with changes in the user's electricity saving amount; in the valley period, the incentive subsidy price does not fluctuate noticeably while the electricity saving amount fluctuates widely, which may be related to the switching on and off of high-power appliances such as air conditioners; during the peak period (07.
The response rate of user 3 is 0.18. The price sensitivity is lower than that of user 2, and the electricity saving amount performs well in the valley, flat and peak periods. As can be seen from Fig. 6, the trend of the user's electricity saving amount in the flat and peak periods substantially coincides with the trend of the incentive subsidy price. Analysis shows that in the valley period the incentive subsidy price changes little while the electricity saving amount changes widely, which is related to the presence of elderly people and children in the household: the incentive subsidy price has little influence on electricity use necessary for daily life, such as running an air conditioner at night; in the normal period of 13; in the peak period of 20; the user may be listed as a preferred client for participating in demand response activities during flat hours (13-00-19).
In summary, when a demand response activity is carried out, users who behave similarly to user 3 are listed as first-priority invitation clients, users who behave similarly to user 2 as second-priority invitation clients, and users who behave similarly to user 1 as third-priority invitation clients. If the grid load changes sharply within a short time and emergency peak clipping and valley filling is required, the high price sensitivity of user 3 and user 2 can be exploited by raising the subsidy, so that peak clipping and valley filling is achieved in a short time.
In still other embodiments, an embodiment of the present invention further provides a non-transitory computer storage medium storing computer-executable instructions that may perform a method for demand response incentive price determination considering customer satisfaction in any of the above method embodiments.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of a demand response incentive price determining apparatus that comprehensively considers carbon emissions and user satisfaction, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes a memory remotely located from the processor, and the remote memory may be connected via a network to a demand response incentive price determining device that considers both carbon emissions and customer satisfaction. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides a demand response incentive price determining apparatus considering user satisfaction, including a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions may execute a demand response incentive price determining method considering user satisfaction in any of the above method embodiments.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 7, the electronic device includes: one or more processors 310 and memory 320, with one processor 310 being an example in fig. 7. The electronic device may further include: an input device 330 and an output device 340. The processor 310, the memory 320, the input device 330, and the output device 340 may be connected by a bus or other means, as exemplified by the bus connection in fig. 7. The memory 320 is a non-volatile computer-readable storage medium as described above. The processor 310 executes various functional applications of the server and data processing, i.e., the above-described demand response incentive price determination method in consideration of user satisfaction, by executing nonvolatile software programs, instructions, and modules stored in the memory 320. The input device 330 may receive input numeric or character information and generate key signal inputs related to a user setting and function control of a demand response incentive price determining apparatus in consideration of carbon emissions and user satisfaction. The output device 340 may include a display device such as a display screen.
The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
As an embodiment, the electronic device is applied to a demand response incentive price determination device comprehensively considering carbon emission and user satisfaction, and is used for a client, and comprises: at least one processor; and at least one memory communicatively coupled to the processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to execute the instructions stored by the computer storage medium.
The above-described embodiments of the apparatus are merely illustrative, and units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A demand response incentive price determining method considering user satisfaction, comprising the steps of:
S1, establishing an incentive-based demand response model based on a layered power market and a carbon emission trading market, wherein the incentive-based demand response model comprises a grid operator model, a user-side model and an objective function model;
S2, determining the demand response potential level of the user through a clustering algorithm; determining the response rate of the user and the incentive elasticity coefficients of different periods from historical response data;
S3, inputting the demand response potential level, the response rate and the incentive elasticity coefficient of the user as model parameters into the incentive-based demand response model, and solving the incentive-based demand response model with a Q-learning algorithm from reinforcement learning to obtain the optimal incentive subsidy price.
2. The demand response incentive price determining method of claim 1, wherein the grid operator model comprises a carbon emission revenue model, a reduced electricity purchase cost model, an incentive cost model;
the grid operator model is as follows:
Figure 577491DEST_PATH_IMAGE001
the carbon emission profit model is as follows:
Figure 689673DEST_PATH_IMAGE002
the reduced electricity purchase cost model is as follows:
Figure 179560DEST_PATH_IMAGE003
the incentive cost model is as follows:
Figure 444319DEST_PATH_IMAGE004
wherein
Figure 181200DEST_PATH_IMAGE005
In the formula, F_Gr represents the grid operator profit, C_Gr represents the carbon emission revenue, C_RC represents the reduced electricity purchase cost, and C_IC represents the incentive cost; i denotes a user and t denotes a time period; n is the set of users and T is the set of all time periods in the day;
Figure 265830DEST_PATH_IMAGE006
represents the response load of user i participating in the activity during period t,
Figure 684042DEST_PATH_IMAGE007
Figure 803308DEST_PATH_IMAGE008
represents the original power demand of user i in period t;
Figure 711090DEST_PATH_IMAGE009
represents the incentive elasticity coefficient of user i, i.e., the percentage adjustment in load demand caused by a 1% change in the subsidy during the period;
Figure 876492DEST_PATH_IMAGE010
represents the incentive subsidy price that user i obtains for participating in demand response during period t, which is set by the grid operator,
Figure 301657DEST_PATH_IMAGE011
represents the minimum incentive subsidy price;
Figure 603326DEST_PATH_IMAGE012
represents the trading unit price of standard coal at time t;
Figure 167162DEST_PATH_IMAGE013
represents the trading unit price of carbon dioxide in the local carbon trading market at time t;
Figure 741232DEST_PATH_IMAGE014
represents the real-time wholesale electricity price of the power market at time t.
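The elasticity definition above can be made concrete with a small sketch. It assumes a constant-elasticity relation in which the percentage load reduction equals the elasticity coefficient times the percentage by which the subsidy exceeds the minimum subsidy price; this form, the function name `response_load` and the sample values are assumptions for illustration, since the formula of the claim is only available as a figure.

```python
def response_load(demand, elasticity, price, min_price):
    """Load reduction implied by a constant-elasticity response (assumed form)."""
    if min_price <= 0:
        raise ValueError("min_price must be positive")
    if price <= min_price:
        return 0.0  # no reduction at or below the minimum subsidy price
    # Relative increase of the subsidy over the minimum subsidy price.
    relative_change = (price - min_price) / min_price
    # A 1% subsidy increase yields an `elasticity`% load reduction.
    reduction = demand * elasticity * relative_change
    return min(reduction, demand)  # a user cannot shed more than its demand


# Hypothetical values: demand 4.0, elasticity 0.5, subsidy 50% above the minimum.
example = response_load(demand=4.0, elasticity=0.5, price=3.0, min_price=2.0)
```

The cap at the original demand is a design choice of this sketch, not something stated in the claim.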
3. The demand response incentive price determining method according to claim 2, wherein the user-side model comprises an electricity-charge savings profit model, a subsidy profit model, and a user dissatisfaction model;
the user-side model is as follows:
Figure 783137DEST_PATH_IMAGE015
the electricity-charge savings profit model is as follows:
Figure 798366DEST_PATH_IMAGE016
the subsidy profit model is as follows:
Figure 788231DEST_PATH_IMAGE017
the user dissatisfaction model is as follows:
Figure 334750DEST_PATH_IMAGE018
Figure DEST_PATH_IMAGE019
wherein F_Ur represents the user profit, C_Sr represents the profit from saved electricity charges, C_SI represents the subsidy profit, and C_Ud represents the user dissatisfaction;
Figure 960773DEST_PATH_IMAGE020
represents the electricity price of user i at time t;
Figure 909137DEST_PATH_IMAGE021
is the response rate of user i, calculated from the historical response effect;
Figure 064044DEST_PATH_IMAGE022
represents the historical response load of user i;
Figure 097859DEST_PATH_IMAGE023
represents the historical power demand of user i;
Figure 996413DEST_PATH_IMAGE024
represents the demand response potential level of user i;
Figure 268126DEST_PATH_IMAGE021
∈[0,1],
Figure 593934DEST_PATH_IMAGE024
∈[0,1].
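As an illustration of how a response rate might be obtained from historical data, the sketch below assumes it is the ratio of total historical response load to total historical power demand, clamped to [0,1]. That ratio, the function name `response_rate` and the sample data are assumptions; the claim only states that the rate is calculated from the historical response effect.

```python
def response_rate(hist_response, hist_demand):
    """Share of historical demand that was actually reduced (assumed definition)."""
    total_demand = sum(hist_demand)
    if total_demand <= 0:
        raise ValueError("historical demand must be positive")
    rate = sum(hist_response) / total_demand
    return max(0.0, min(1.0, rate))  # keep the result inside [0, 1]


# Made-up history: three events with reductions and the matching demands.
rate = response_rate(hist_response=[1.0, 2.0, 1.0], hist_demand=[8.0, 16.0, 8.0])
```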
4. The method of claim 3, wherein the objective function model is:
Figure 849466DEST_PATH_IMAGE025
where ρ is the profit weight factor, which expresses the relative importance of the profit of the grid operator and the profit of the user, and ρ ∈ [0,1].
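One common way to combine two profits with a single weight factor is a convex combination. The sketch below assumes that form, ρ times the grid operator profit plus (1 − ρ) times the user profit; the function name `total_objective` and the numbers are hypothetical, and the objective function of the claim itself is the one shown in the figure.

```python
def total_objective(grid_profit, user_profit, rho):
    """Weighted total profit; rho = 1 counts only the grid operator (assumed form)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * grid_profit + (1.0 - rho) * user_profit


balanced = total_objective(grid_profit=10.0, user_profit=4.0, rho=0.5)
grid_only = total_objective(grid_profit=10.0, user_profit=4.0, rho=1.0)
user_only = total_objective(grid_profit=10.0, user_profit=4.0, rho=0.0)
```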
5. The method as claimed in claim 1, wherein in step S2, the power load curve of the previous week of the user in the activity execution region is obtained, the power consumption characteristics of different users are identified by clustering the load curves, and the demand response incentive price is classified according to the power consumption characteristics.
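The clustering of load curves could be sketched as follows. This is a minimal illustration assuming plain k-means with Euclidean distance; the claim does not name the clustering algorithm, and the function name `kmeans`, the four-point toy curves and the seeding of centroids from the first k curves are hypothetical simplifications.

```python
from math import dist


def kmeans(curves, k, iterations=20):
    """Plain k-means; the first k curves seed the centroids (deterministic)."""
    centroids = [list(c) for c in curves[:k]]
    labels = [0] * len(curves)
    for _ in range(iterations):
        # Assignment step: each curve joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(c, centroids[j]))
                  for c in curves]
        # Update step: each centroid becomes the point-wise mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Toy load curves (made-up data): two evening-peaking users, two flat users.
curves = [
    [1.0, 1.0, 3.0, 3.0],
    [0.2, 0.2, 0.3, 0.2],
    [1.1, 0.9, 3.2, 2.8],
    [0.3, 0.1, 0.2, 0.3],
]
labels, centroids = kmeans(curves, k=2)
```

Users that share a label have similar consumption shapes; real load curves would have one value per metering interval over the previous week rather than four points.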
6. The method of claim 1, wherein in step S2, historical response data of users participating in demand response activities is obtained, and a neural network is used to identify the response rate of user i
Figure 551712DEST_PATH_IMAGE021
and the incentive elasticity coefficient of different periods
Figure 271406DEST_PATH_IMAGE026
7. The method for demand response incentive price determination in view of customer satisfaction according to claim 1, wherein step S3 specifically comprises: modeling the decision optimization problem of the incentive-based demand response model, based on a hierarchical power market and a carbon emission trading market, as a finite MDP (Markov decision process) for an agent to learn; the grid operator sets an incentive subsidy price for the user; the user responds by reducing its own power load, and the profits of the user and the grid are fed back to the grid operator; the grid operator resets the incentive subsidy price according to the load reduction of the user and the current total profit; the iteration process stops when the total profit of the user and the grid reaches its maximum value or a convergence condition is met, and the incentive subsidy price at that moment is taken as the optimal incentive-based demand response strategy.
8. The demand response incentive price determining method of claim 4, wherein the step S3 comprises the steps of:
Step S31, initializing parameters, including: the demand response potential level of user i
Figure 768115DEST_PATH_IMAGE024
the incentive elasticity coefficient
Figure 714206DEST_PATH_IMAGE027
the response rate of user i
Figure 751301DEST_PATH_IMAGE021
the profit weight factor ρ, the response rate of user i
Figure 732026DEST_PATH_IMAGE021
the discount coefficient α, the learning rate θ, the greedy coefficient ε,
Figure 417215DEST_PATH_IMAGE028
Figure 912918DEST_PATH_IMAGE029
Figure 691387DEST_PATH_IMAGE030
Figure 588936DEST_PATH_IMAGE031
Figure 693027DEST_PATH_IMAGE032
t, and setting the interval of
Figure 738344DEST_PATH_IMAGE033
;
Step S32, initializing a Q table in which every element is zero, and setting the time period t = 0 and the user i = 0;
Step S33, observing the response load of user i participating in the activity in the period t = 0
Figure 71236DEST_PATH_IMAGE034
;
Step S34, selecting, by the greedy strategy, the incentive subsidy price that user i obtains for participating in demand response in period t
Figure 72559DEST_PATH_IMAGE035
Step S35, calculating the reward
Figure 832704DEST_PATH_IMAGE036
observing the response load of user i participating in the activity during period t+1
Figure 552268DEST_PATH_IMAGE037
and updating the Q value;
Step S36, judging whether the maximum time period T is reached; if so, going to the next step; otherwise, setting t = t+1 and returning to step S34;
Step S37, judging whether the Q table has converged to the maximum value; if so, going to the next step; otherwise, setting i = i+1 and returning to step S33;
Step S38, outputting the optimal incentive subsidy prices for the T time periods in one day.
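The loop of steps S31 to S38 can be sketched as tabular Q-learning with an ε-greedy choice of the subsidy price. The sketch is simplified and is not the claimed procedure itself: it handles a single user, uses the time period as the only state, repeats the day for a fixed number of episodes instead of testing convergence, and takes the reward from a hypothetical stand-in function `toy_reward`. The names `learn_prices`, `TARGETS` and `PRICES` are likewise assumptions.

```python
import random


def learn_prices(reward, n_periods, prices, episodes=500,
                 theta=0.1, alpha=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (period, price) pairs for a single user."""
    rng = random.Random(seed)
    # S32: Q table initialised to zero, one row per period, one column per price.
    q = [[0.0] * len(prices) for _ in range(n_periods)]
    for _ in range(episodes):
        for t in range(n_periods):  # S36: advance through the periods of a day
            # S34: epsilon-greedy choice of the subsidy price.
            if rng.random() < epsilon:
                a = rng.randrange(len(prices))
            else:
                a = max(range(len(prices)), key=lambda j: q[t][j])
            r = reward(t, prices[a])  # S35: reward for this period and price
            future = max(q[t + 1]) if t + 1 < n_periods else 0.0
            # S35: move the old Q value towards the new estimate.
            q[t][a] += theta * (r + alpha * future - q[t][a])
    # S38: greedy price for every period of the day.
    return [prices[max(range(len(prices)), key=lambda j: q[t][j])]
            for t in range(n_periods)]


# Toy reward (hypothetical): highest when the price hits a per-period target.
TARGETS = [0.2, 0.4, 0.5, 0.3]
PRICES = [0.1, 0.2, 0.3, 0.4, 0.5]


def toy_reward(t, price):
    return -100.0 * (price - TARGETS[t]) ** 2


best = learn_prices(toy_reward, n_periods=4, prices=PRICES)
```

In a fuller implementation, the reward would be the weighted total profit of the grid operator and the user, and the outer loop would stop once the Q table stops changing.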
9. The method for demand response incentive price determination according to claim 8, wherein, in step S35, the Q value is calculated according to the following equation:
Figure 688851DEST_PATH_IMAGE038
where
Figure 544680DEST_PATH_IMAGE039
is the Q value of user i at time t,
Figure 678990DEST_PATH_IMAGE040
is the response load of user i at time t,
Figure 885849DEST_PATH_IMAGE041
is the incentive subsidy price that user i obtains at time t, α is the discount coefficient, and
Figure 826123DEST_PATH_IMAGE042
is the Q value of user i at time t+1;
Figure 536459DEST_PATH_IMAGE043
is updated in each iteration until the maximum accumulated discounted return is obtained or the maximum number of iterations is reached, and the Q value is updated as follows:
Figure 966303DEST_PATH_IMAGE044
where
Figure 411191DEST_PATH_IMAGE045
is the Q value of the next iteration of user i at time t,
Figure 144704DEST_PATH_IMAGE046
is the response load of the next iteration of user i at time t,
Figure 460279DEST_PATH_IMAGE047
is the subsidy price of the next iteration of user i at time t,
Figure 247975DEST_PATH_IMAGE048
is the Q value of the next iteration of user i at time t+1, and θ is the learning rate, with a value range of [0,1].
10. A demand response incentive price determining apparatus considering user satisfaction comprising a non-volatile computer storage medium having computer-executable instructions stored thereon, wherein the computer-executable instructions are executable to perform the method of demand response incentive price determining considering user satisfaction of any of claims 1-9.
CN202211367309.5A 2022-11-03 2022-11-03 Demand response incentive price determination method and device considering user satisfaction Pending CN115423539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211367309.5A CN115423539A (en) 2022-11-03 2022-11-03 Demand response incentive price determination method and device considering user satisfaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211367309.5A CN115423539A (en) 2022-11-03 2022-11-03 Demand response incentive price determination method and device considering user satisfaction

Publications (1)

Publication Number Publication Date
CN115423539A true CN115423539A (en) 2022-12-02

Family

ID=84207841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211367309.5A Pending CN115423539A (en) 2022-11-03 2022-11-03 Demand response incentive price determination method and device considering user satisfaction

Country Status (1)

Country Link
CN (1) CN115423539A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523273A (en) * 2023-07-04 2023-08-01 广东电网有限责任公司广州供电局 Demand response characteristic analysis method for industrial users
CN117114455A (en) * 2023-10-25 2023-11-24 广东电网有限责任公司中山供电局 Demand response scheduling method and device based on user participation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523273A (en) * 2023-07-04 2023-08-01 广东电网有限责任公司广州供电局 Demand response characteristic analysis method for industrial users
CN116523273B (en) * 2023-07-04 2023-09-22 广东电网有限责任公司广州供电局 Demand response characteristic analysis method for industrial users
CN117114455A (en) * 2023-10-25 2023-11-24 广东电网有限责任公司中山供电局 Demand response scheduling method and device based on user participation
CN117114455B (en) * 2023-10-25 2024-02-13 广东电网有限责任公司中山供电局 Demand response scheduling method and device based on user participation

Similar Documents

Publication Publication Date Title
US20230006444A1 (en) Methods and systems for an automated utility marketplace platform
CN115423539A (en) Demand response incentive price determination method and device considering user satisfaction
Rieger et al. Estimating the benefits of cooperation in a residential microgrid: A data-driven approach
Meng et al. An optimal real-time pricing for demand-side management: A Stackelberg game and genetic algorithm approach
Zhang et al. Bi-level stochastic real-time pricing model in multi-energy generation system: A reinforcement learning approach
Mbuwir et al. Battery scheduling in a residential multi-carrier energy system using reinforcement learning
Liu et al. A home energy management system incorporating data-driven uncertainty-aware user preference
US9805321B2 (en) Eco score analytics system
Dukovska et al. Local energy exchange considering heterogeneous prosumer preferences
Li et al. Exploring potential of energy flexibility in buildings for energy system services
Gao et al. Bounded rationality based multi-VPP trading in local energy markets: a dynamic game approach with different trading targets
Zhou et al. Eliciting private user information for residential demand response
Bandyopadhyay et al. Solar panels and smart thermostats: The power duo of the residential sector?
Casella et al. Hvac power conservation through reverse auctions and machine learning
Wang et al. Reward fairness-based optimal distributed real-time pricing to enable supply–demand matching
Shu et al. Dynamic incentive strategy for voluntary demand response based on TDP scheme
Nekouei et al. A game-theoretic analysis of demand response in electricity markets
Zeng et al. Holistic modeling framework of demand response considering multi-timescale uncertainties for capacity value estimation
Zois et al. Integrated platform for automated sustainable demand response in smart grids
Bâra et al. A value sharing method for heterogeneous energy communities archetypes
Liu et al. Improved reinforcement learning-based real-time energy scheduling for prosumer with elastic loads in smart grid
Yang et al. Reinforcement Learning-Based Market Game Model Considering Virtual Power Plants
Razzak et al. Leveraging Deep Q-Learning to maximize consumer quality of experience in smart grid
Asadinejad Electricity Market Designs for Demand Response from Residential Customers
Lu et al. Assessing the Impact of Demand Response on Renewable Energy Exploitation in Smart Grids with Multi-dimensional Uncertainties

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination