CN111461803A - Method and system for selecting bidding strategy for cross-country power market price reinforcement learning - Google Patents

Method and system for selecting bidding strategy for cross-country power market price reinforcement learning

Info

Publication number
CN111461803A
CN111461803A (application CN201910048373.9A)
Authority
CN
China
Prior art keywords
bidding strategy
bidding
strategy
reinforcement learning
algorithm model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910048373.9A
Other languages
Chinese (zh)
Inventor
李俊辉
白小保
周海明
张志峰
茹海波
张帅
郑磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910048373.9A
Publication of CN111461803A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0611: Request for offers or quotes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply


Abstract

The invention provides a method and a system, based on reinforcement learning, for selecting a bidding strategy for price quotation in a cross-country power market. The method acquires a bidding strategy set; substitutes the set into a pre-established reinforcement-learning RE (Roth-Erev) algorithm model and calculates, in wheel-disc (roulette-wheel) mode, the behavior tendency corresponding to the selected bidding strategy; iteratively calculates a probability selection function for each bidding strategy in the set, according to the behavior tendency corresponding to the strategy selected by the power trading operator, until a convergence condition is met; and selects a bidding strategy based on the probability selection function that satisfies the convergence condition.

Description

Method and system for selecting bidding strategy for cross-country power market price reinforcement learning
Technical Field
The invention relates to a method and a system, in particular to a method and a system for selecting a bidding strategy for cross-country electric power market price reinforcement learning.
Background
In the global energy internet, market union is an important means of promoting transnational electric power trading, and it occurs between countries and between regions. Within a global electric power market union, however, the decision process among multiple electric power market operators and the interaction process among multiple electric power suppliers are complex dynamic problems that are difficult to analyze and calculate with traditional analytical methods; this is especially pronounced in medium- and long-term electric power market trading.
At present, two methods are mainly used to solve transnational power market transactions. The first is based on traditional optimization theory: it applies a multi-level architecture, takes the production-benefit optimization problem of the power generator as its core, and realizes power market transaction optimization through the optimal power flow of a transcontinental backbone power network. The second is based on stochastic optimization: using a Monte Carlo method, it plays out a trading game under incomplete information, starting from the operator's optimal quotation, so that the game result reaches a Nash equilibrium.
However, owing to the particularity of the power market, power market transactions are constrained by multiple parties. Even under the assumptions of complete information and single-period trading, the existence and uniqueness of a Nash equilibrium is a widely recognized difficulty. Moreover, the global energy internet presents a complex market-union model: the operator's optimal declared price must be achieved over multiple trading periods under incomplete information, and the generator's optimal production benefit is difficult to solve from an analytical mathematical model.
With the development of artificial intelligence technology, reinforcement learning has emerged as an effective computational method for optimal-strategy problems. Reinforcement learning is a machine learning method based on the conditioned-reflex principle of animal learning; a reinforcement learning system mainly comprises an environment and agents. Common reinforcement learning algorithms include the Q-learning method and the Roth-Erev (RE) method, among others; the basic framework is shown in Fig. 2.
The Agent comprises three modules: an input module I, a reinforcement module R, and a policy module P. The input module I converts the state describing the environment into a form the Agent can accept and provides the input X to the policy module. The reinforcement module assigns each state of the environment a value r; this reinforcement signal can be obtained directly or indirectly from the state of the environment and is closely related to the subjective goal. The policy module P is the most critical module: its main function is to update the Agent's knowledge through a learning mechanism while enabling the Agent to select an action according to a certain policy and act on the environment.
In the power-union scenario of a cross-country power market, this learning-mechanism model exhibits two problems. First, if a strategy's action produces a very large negative profit and the corresponding behavior tendency becomes negative, the selection probability may become negative, which violates the definition of a probability. Second, if the profit is 0, the behavior tendency of every behavior strategy shrinks by the same proportion, so the selection probability of each strategy remains unchanged and learning stalls.
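These two failure modes of the proportional RE selection rule can be checked numerically. The sketch below is purely illustrative; the function name and the example propensity values are hypothetical, not from the patent:

```python
import math

def proportional_probs(propensities):
    """Classical proportional RE selection: p_m = q_m / sum(q_j)."""
    total = sum(propensities)
    return [q / total for q in propensities]

# Problem 1: a large negative profit can drive a propensity negative,
# which makes the "probability" negative -- not a valid distribution.
q = [5.0, -2.0, 4.0]
p = proportional_probs(q)
assert any(x < 0 for x in p)

# Problem 2: with zero profit every propensity shrinks by the same
# factor (1 - r), and the proportional rule is scale-invariant, so the
# selection probabilities never change and learning stalls.
r = 0.1
q2 = [3.0, 6.0, 1.0]
q2_next = [(1 - r) * x for x in q2]
assert all(math.isclose(a, b)
           for a, b in zip(proportional_probs(q2), proportional_probs(q2_next)))
```

The second assertion is exactly the "learning stops" condition the patent describes: the update changes every propensity, yet the induced distribution is unchanged.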
Disclosure of Invention
To solve these problems, the invention provides a method and a system for selecting a bidding strategy, via reinforcement learning, for a cross-country power market. The RE reinforcement learning algorithm is optimized and applied to a cross-country power-market-union scenario, so that the optimal overall price and the optimal production benefit of power generators across all power trading market unions are realized over multiple trading periods under incomplete information.
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
a reinforcement learning cross-country power market bid strategy selection method, the method comprising:
acquiring a bidding strategy set;
substituting the bidding strategy set into a pre-established reinforcement learning RE algorithm model, and calculating the behavior tendency corresponding to the selected bidding strategy in a wheel disc mode;
iteratively calculating a probability selection function of each bidding strategy in the bidding strategy set according to the behavior tendency corresponding to the bidding strategy selected by the power transaction operator until a convergence condition is met;
and selecting a bidding strategy based on the probability selection function meeting the convergence condition.
Preferably, the building of the reinforcement learning RE algorithm model includes:
determining a response function of the reinforcement learning RE algorithm model based on the competitive bidding income of the power transaction operator in the current round;
and obtaining the reinforcement learning RE algorithm model based on the response function of the reinforcement learning RE algorithm model.
Further, the response function in the reinforcement learning RE algorithm model is determined by the following formula:

R_im(D) = profit_ik(D)·(1 - e),   if m = k
R_im(D) = q_im(D)·e/(M - 1),      if m ≠ k

where R_im(D) is the response function of the reinforcement learning RE algorithm model, profit_ik(D) is the bidding income of the power trading operator in round D when strategy k is selected, D is the current round number, k is the index of the selected bidding strategy, M is the total number of bidding strategies, and e is the experience parameter.
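As a minimal sketch, the response function above might be written as follows. The function name, argument names, and the reading of M as the number of candidate strategies are assumptions of this sketch, not definitions from the patent:

```python
def response(profit_k, q_m, m, k, e, M):
    """R_im(D): the played strategy k is reinforced with (1 - e) of the
    round profit; every other strategy m receives a spillover of
    e / (M - 1) applied to its own current behavior tendency q_m."""
    if m == k:
        return profit_k * (1 - e)
    return q_m * e / (M - 1)
```

With e = 0.2 and M = 3, a played strategy earning profit 10 is reinforced by 8.0, while an unplayed strategy with tendency 2.0 receives 0.2.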
Further, obtaining the bidding revenue of the electricity trading operator in the current round comprises:
respectively generating quotations based on each bidding strategy in the bidding strategy set;
and determining the bidding income of the power transaction operator in the current turn based on the clearing information and the bidding strategy corresponding to the quoted price.
Further, generating a quotation based on each bidding strategy in the bidding strategy set comprises:
initializing the power trading operator's bidding strategy set

A_i = {a_1, a_2, …, a_M},

the initial function c_i(q_Gi), the initial behavior tendency q_im(0), the initial selection probability p_im(0), the constraint conditions, and the price, where i denotes the i-th power trading operator;

the power trading operator selects a bidding strategy a_m ∈ A_i and generates the corresponding quotation f_i(q_Gi) = c_i(q_Gi);

wherein the initial behavior tendency q_im(0) = q_i(0), the initial selection probability p_im(0) is 1/M, and M is the total number of bidding strategies.
Further, determining the bidding income of the power trading operator in the current round based on the clearing information and the bidding strategy corresponding to the quotation comprises:

after all operators submit quotations, clearing information is formulated according to predefined clearing rules and fed back to the power trading operator, which forwards it to the power generators;

the power trading operator obtains the bidding income of the current round according to the clearing information and the selected bidding strategy, wherein the clearing information comprises the clearing price and the awarded (winning-bid) electricity quantity.
Preferably, the behavior tendency corresponding to the selected bidding strategy is determined by the following formula:

q_im(D+1) = (1 - r)·q_im(D) + R_im(D)

where q_im(D) is the behavior tendency of selecting bidding strategy a_m in round D, q_im(D+1) is the behavior tendency of selecting bidding strategy a_m in round D+1, r is the forgetting factor, and R_im(D) is the response function of the reinforcement learning RE algorithm model.
Further, the probability selection function of each bidding strategy in the bidding strategy set is determined according to the following formula:

p_im(D) = exp(q_im(D)/c) / Σ_{j=1..M} exp(q_ij(D)/c)

where p_im(D) is the probability selection function of the power trading operator selecting bidding strategy a_m in round D, c is the cooling coefficient, q_ij(D) is the behavior tendency corresponding to the operator's j-th bidding strategy in round D, and M is the total number of bidding strategies.
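A minimal Python sketch of this Gibbs-form probability selection function follows. The function name is hypothetical, and the subtraction of the maximum propensity before exponentiating is a standard numerical-stability trick, not part of the patent formula:

```python
import math

def gibbs_selection_probs(propensities, c):
    """Gibbs (Boltzmann) selection with cooling coefficient c.
    Exponentiation keeps every probability strictly positive even when a
    behavior tendency is negative, and uniform shrinkage of all
    propensities still changes the distribution, so learning does not
    stall -- addressing both problems of the proportional rule."""
    mx = max(propensities)  # shift for numerical stability; probabilities unchanged
    weights = [math.exp((q - mx) / c) for q in propensities]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with propensities [5.0, -2.0, 4.0] and c = 1.0, all three probabilities are positive and sum to 1, unlike the proportional rule on the same inputs.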
A reinforcement learning cross-country power market bid strategy selection system, the system comprising:
the obtaining module is used for obtaining a bidding strategy set;
the determining module is used for substituting the bidding strategy set into a pre-established reinforcement learning RE algorithm model and calculating the behavior tendency corresponding to the selected bidding strategy in a wheel disc mode;
the iterative computation module is used for iteratively computing a probability selection function of each bidding strategy in the bidding strategy set according to the behavior tendency corresponding to the bidding strategy selected by the power transaction operator until a convergence condition is met;
and the selection module is used for selecting the bidding strategy based on the probability selection function meeting the convergence condition.
Preferably, the determining module includes:
the determining unit is used for determining a response function of the reinforcement learning RE algorithm model based on the competitive bidding income of the power transaction operator in the current turn;
and the obtaining unit is used for obtaining the reinforcement learning RE algorithm model based on the response function of the reinforcement learning RE algorithm model.
Compared with the closest prior art, the technical scheme provided by the invention has the following beneficial effects:
the invention provides a method and a system for selecting a bidding strategy for strengthening learning cross-country power market quotation, which can be applied to a cross-country power market power combination scene, and a bidding strategy set is obtained; substituting the bidding strategy set into a pre-established reinforcement learning RE algorithm model, and calculating the behavior tendency corresponding to the selected bidding strategy in a wheel disc mode; the problem of negative value behavior tendency and learning interruption of a reinforcement learning RE general algorithm model is solved, clear price selection in a cross-country electric power market electric power combined scene is stable, and powerful technical support can be provided for price strategies of operators.
The probability selection function of each bidding strategy in the set is iteratively calculated, according to the behavior tendency corresponding to the strategy selected by the power trading operator, until a convergence condition is met, and a bidding strategy is selected based on the converged probability selection function. Iterating the probability selection function to convergence improves the accuracy of strategy selection and brings the selected result closer to the actual situation.
Drawings
FIG. 1 is a flow chart of the reinforcement-learning-based cross-country power market bidding strategy selection method provided in an embodiment of the present invention;
FIG. 2 is a basic framework diagram of a reinforcement learning system provided in the background of the invention;
fig. 3 is a flowchart of a cross-country operator electricity trading market quotation algorithm provided in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The method mainly optimizes the RE reinforcement learning algorithm and applies the algorithm to a cross-country power market union scene, so that the optimal overall price and the optimal production benefit of a power generator in all power market unions are realized under the condition of multiple trading periods and incomplete information.
Introduction of basic principle:
suppose that the optional policy set of the electric power trade operator Agent i of each country is A (a)1,a2,…,ai,am) If the game is repeated, the game is played in the D-th round strategy akIs selected and Age is calculatednti the benefit of this round is profitik(D) Then at round D +1, for strategy amThe trend update formula of (1) is as follows:
q_im(D+1) = (1 - r)·q_im(D) + R_im(D)    (2-1)

where the response function is

R_im(D) = profit_ik(D)·(1 - e),   if m = k
R_im(D) = q_im(D)·e/(M - 1),      if m ≠ k    (2-2)
In these formulas, r is a forgetting factor: it suppresses the unbounded growth of each behavior tendency over time, reduces the weight of past experience, and strengthens the influence of new strategies. e is an experience parameter that encourages the Agent to generate diverse quotation strategies in the early stage of the repeated game.
At this point, the selection probability formula for strategy a_m is as follows:

p_im(D) = q_im(D) / Σ_{j=1..M} q_ij(D)    (2-3)
and the Agent i selects the next round of strategy behaviors according to the new selection probability and a wheel disc mode.
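The wheel-disc (roulette-wheel) draw mentioned above can be sketched as follows; the helper name is hypothetical, and the probabilities are assumed to sum to 1:

```python
import random

def roulette_select(probs, rng=random):
    """Roulette-wheel selection: draw u ~ U(0, 1) and walk the cumulative
    distribution until it first exceeds u, returning that strategy index."""
    u = rng.random()
    cum = 0.0
    for idx, p in enumerate(probs):
        cum += p
        if u < cum:
            return idx
    return len(probs) - 1  # guard against floating-point shortfall
```

A strategy with probability 1 is always drawn, and strategies with probability 0 are never drawn, as expected of the wheel.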
Each behavior i in the feasible domain is marked with a tendency coefficient q_i and a probability coefficient p_i, and both are updated according to the income of each bidding round. With suitably adapted coefficients, a convergence state can be reached, i.e., the probability p_r that some behavior r is selected approaches 1. This means that once the agent reaches the converged state, the quotation always performs the same action r in the feasible domain. Each generator-set agent uses the RE reinforcement learning algorithm to make the price decision in each round of trading: starting from equal selection probabilities for the various quotation strategies, it searches for the best declared price over repeated auctions so as to maximize profit, updates its quotation according to the learning algorithm, and repeats this process until final equilibrium is reached.
In the power-union scenario of a transnational power market, the RE algorithm model has the following two problems. First, if a strategy's action produces a very large negative profit and the behavior tendency becomes negative, the selection probability becomes negative, which does not accord with the definition of a probability. Second, if profit_ik(D) is 0, the behavior tendency of every behavior strategy shrinks by the same proportion, so the selection probability of each strategy remains unchanged and learning stops. To solve these problems, a method for selecting a bidding strategy, via reinforcement learning, for cross-country power market price quotation is provided. As shown in Figs. 1 and 3, the specific operation steps are as follows:
S1: acquire a bidding strategy set;
S2: substitute the bidding strategy set into a pre-established reinforcement learning RE algorithm model and calculate, in wheel-disc (roulette-wheel) mode, the behavior tendency corresponding to the selected bidding strategy;
S3: according to the behavior tendency corresponding to the bidding strategy selected by the power trading operator, iteratively calculate the probability selection function of each bidding strategy in the set until the convergence condition is satisfied;
S4: select a bidding strategy based on the probability selection function satisfying the convergence condition.
In step S1, the building of the reinforcement learning RE algorithm model includes:
determining a response function of the reinforcement learning RE algorithm model based on the competitive bidding income of the power transaction operator in the current round;
and obtaining the reinforcement learning RE algorithm model based on the response function of the reinforcement learning RE algorithm model.
Wherein, the response function in the reinforcement learning RE algorithm model is determined by the following formula:

R_im(D) = profit_ik(D)·(1 - e),   if m = k
R_im(D) = q_im(D)·e/(M - 1),      if m ≠ k

where R_im(D) is the response function of the reinforcement learning RE algorithm model, profit_ik(D) is the bidding income of the power trading operator in round D when strategy k is selected, D is the current round number, k is the index of the selected bidding strategy, M is the total number of bidding strategies, and e is the experience parameter.
Wherein obtaining the competitive bidding revenue of the electricity trading operator in the current round comprises:
step a, respectively generating quotations based on each bidding strategy in a bidding strategy set;
and b, determining the bidding income of the power transaction operator in the current round based on the clearing information and the bidding strategy corresponding to the quoted price.
In step a, generating a quotation based on each bidding strategy in the bidding strategy set comprises:

initializing the power trading operator's bidding strategy set

A_i = {a_1, a_2, …, a_M},

the initial function c_i(q_Gi), the initial behavior tendency q_im(0), the initial selection probability p_im(0), the constraint conditions, and the price, where i denotes the i-th power trading operator;

the power trading operator selects a bidding strategy a_m ∈ A_i and generates the corresponding quotation f_i(q_Gi) = c_i(q_Gi);

wherein the initial behavior tendency q_im(0) = q_i(0), the initial selection probability p_im(0) is 1/M, and M is the total number of bidding strategies.
In step b, determining the bidding income of the power trading operator in the current round based on the clearing information and the bidding strategy corresponding to the quotation comprises:

after all operators submit quotations, clearing information is formulated according to predefined clearing rules and fed back to the power trading operator, which forwards it to the power generators;

the power trading operator obtains the bidding income of the current round according to the clearing information and the selected bidding strategy, wherein the clearing information comprises the clearing price and the awarded (winning-bid) electricity quantity.
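The round income can be sketched under the assumption that it equals the clearing price times the awarded electricity quantity minus the operator's cost for that quantity. This revenue formula and all names below are assumptions of the sketch; the patent only states that income is derived from the clearing information:

```python
def round_profit(clearing_price, awarded_quantity, cost_fn):
    """Hypothetical round income for an operator: revenue at the market
    clearing price for the awarded (winning-bid) quantity, minus the
    operator's cost c_i(q) of supplying that quantity."""
    return clearing_price * awarded_quantity - cost_fn(awarded_quantity)
```

For instance, at a clearing price of 50, an awarded quantity of 10, and a linear cost of 30 per unit, the round income would be 200 under this assumption.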
In step S2, the behavior tendency corresponding to the selected bidding strategy is determined according to the following formula:

q_im(D+1) = (1 - r)·q_im(D) + R_im(D)

where q_im(D) is the behavior tendency of selecting bidding strategy a_m in round D, q_im(D+1) is the behavior tendency of selecting bidding strategy a_m in round D+1, r is the forgetting factor, and R_im(D) is the response function of the reinforcement learning RE algorithm model.
In step S3, the probability selection function of each bidding strategy in the bidding strategy set is determined according to the following formula:

p_im(D) = exp(q_im(D)/c) / Σ_{j=1..M} exp(q_ij(D)/c)

where p_im(D) is the probability selection function of the power trading operator selecting bidding strategy a_m in round D, c is the cooling coefficient, q_ij(D) is the behavior tendency corresponding to the operator's j-th bidding strategy in round D, and M is the total number of bidding strategies.
Further, the convergence condition in step S3 is user-defined; when the convergence condition is not satisfied, the process returns to step S1, otherwise the process ends.
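Putting steps S1 to S4 together, a self-contained sketch of the modified RE loop might look as follows. This is an illustrative toy, not the patented system: the fixed per-strategy profits stand in for the market-clearing step, and every name and parameter value is an assumption of the sketch:

```python
import math
import random

def run_re_bidding(strategy_profits, rounds=500, r=0.1, e=0.2, c=5.0, seed=0):
    """Toy modified Roth-Erev loop: Gibbs selection probabilities with
    cooling coefficient c, roulette-wheel strategy draw, and the
    propensity update q(D+1) = (1 - r)*q(D) + R(D)."""
    rng = random.Random(seed)
    K = len(strategy_profits)
    q = [0.0] * K                      # initial behavior tendencies q_im(0)

    def probs():
        mx = max(q)                    # shift for numerical stability
        w = [math.exp((x - mx) / c) for x in q]
        s = sum(w)
        return [x / s for x in w]

    for _ in range(rounds):
        p = probs()
        # roulette-wheel draw of this round's bidding strategy
        u, cum, k = rng.random(), 0.0, K - 1
        for idx, pi in enumerate(p):
            cum += pi
            if u < cum:
                k = idx
                break
        profit = strategy_profits[k]   # stand-in for the clearing step
        for m in range(K):             # propensity update for every strategy
            R = profit * (1 - e) if m == k else q[m] * e / (K - 1)
            q[m] = (1 - r) * q[m] + R
    return probs()
```

With one clearly dominant strategy, the returned selection probabilities concentrate on it as the loop converges, mirroring the "probability of some behavior approaches 1" convergence state described above.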
From this specific embodiment, the key point of the invention lies in the choice of the response function and the probability selection function in the reinforcement learning RE algorithm model; accordingly, the invention protects the combined application of these two functions (and similar functions) together with the modified reinforcement learning RE algorithm model in the electric power market.
Example (b):
table 1 lists the results of the simulation in which AVE refers to the mean, SD refers to the standard deviation, and S% represents the percentage of the standard deviation relative to the mean. Experiments show that the average clearing price of the new algorithm is higher than that of the general algorithm, the value of the standard deviation calculated by the general algorithm is larger than that of the new algorithm, and the fluctuation of the clearing price of the new algorithm is reduced after modification, the clearing price is more accurate, the quotation of each operator is closer to the actual current situation, and the benefit of each power operator is favorably ensured.
Table 1: Average clearing price (k¥/MWh)
Based on the same inventive concept, the application also provides a system for selecting the bidding strategy for the cross-country power market through reinforcement learning, which comprises the following steps:
the obtaining module is used for obtaining a bidding strategy set;
the determining module is used for substituting the bidding strategy set into a pre-established reinforcement learning RE algorithm model and calculating the behavior tendency corresponding to the selected bidding strategy in a wheel disc mode;
the iterative computation module is used for iteratively computing a probability selection function of each bidding strategy in the bidding strategy set according to the behavior tendency corresponding to the bidding strategy selected by the power transaction operator until a convergence condition is met;
and the selection module is used for selecting the bidding strategy based on the probability selection function meeting the convergence condition.
Wherein, the determining module further comprises:
the determining unit is used for determining a response function of the reinforcement learning RE algorithm model based on the competitive bidding income of the power transaction operator in the current turn;
and the obtaining unit is used for obtaining the reinforcement learning RE algorithm model based on the response function of the reinforcement learning RE algorithm model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (10)

1. A method for selecting a bidding strategy, via reinforcement learning, for cross-country power market quotation, characterized by comprising the following steps:
acquiring a bidding strategy set;
substituting the bidding strategy set into a pre-established reinforcement learning RE algorithm model, and calculating the behavior tendency corresponding to the selected bidding strategy in a wheel disc mode;
iteratively calculating a probability selection function of each bidding strategy in the bidding strategy set according to the behavior tendency corresponding to the bidding strategy selected by the power transaction operator until a convergence condition is met;
and selecting a bidding strategy based on the probability selection function meeting the convergence condition.
2. The method of claim 1, wherein the building of the reinforcement learning RE algorithm model comprises:
determining a response function of the reinforcement learning RE algorithm model based on the competitive bidding income of the power transaction operator in the current round;
and obtaining the reinforcement learning RE algorithm model based on the response function of the reinforcement learning RE algorithm model.
3. The method of claim 2, wherein the response function in the reinforcement learning RE algorithm model is determined by the following equation;
R_im(D) = profit_ik(D)·(1 - e),   if m = k
R_im(D) = q_im(D)·e/(M - 1),      if m ≠ k

where R_im(D) is the response function of the reinforcement learning RE algorithm model, profit_ik(D) is the bidding income of the power trading operator in round D when strategy k is selected, D is the current round number, k is the index of the selected bidding strategy, M is the total number of bidding strategies, and e is the experience parameter.
4. The method of claim 3, wherein obtaining the bidding income of the power transaction operator in the current round comprises:
generating a quotation based on each bidding strategy in the bidding strategy set, respectively;
and determining the bidding income of the power transaction operator in the current round based on the clearing information and the bidding strategy corresponding to the quotation.
5. The method of claim 4, wherein the generating quotations based on each bidding strategy in the bidding strategy set respectively comprises:
initializing the bidding strategy set of the power transaction operator, S_i = {s_i1, s_i2, …, s_iM}, the initial function c_i(q_Gi), the initial behavior tendency q_im(0), the initial selection probability p_im(0), the constraint conditions and the price, where i denotes the i-th power transaction operator;
the power transaction operator selects a bidding strategy s_im ∈ S_i and generates the corresponding quotation f_i(q_Gi) = c_i(q_Gi);
wherein the initial behavior tendency q_im(0) = q_i(0), the initial selection probability p_im(0) is 1/M, and M is the total number of bidding strategies.
6. The method of claim 4, wherein determining the bidding income of the power transaction operator in the current round based on the clearing information and the bidding strategy corresponding to the quotation comprises:
after all operators have submitted their quotations, formulating clearing information according to predefined clearing rules, feeding the clearing information back to the power transaction operator, and sending the clearing information from the power transaction operator to the power generators;
the power transaction operator obtains the bidding income of the current round according to the clearing information and the selected bidding strategy;
wherein the clearing information comprises: the clearing price and the winning-bid electric quantity.
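Claim 6 names the inputs of the income calculation (the clearing price and the winning-bid electric quantity from the clearing information, plus the selected strategy) but not the formula itself. A plausible sketch is margin times awarded quantity; this formula, and the unit-cost input, are assumptions, not quoted from the claim.

```python
def bidding_income(clearing_price, awarded_quantity, unit_cost):
    """Hypothetical per-round bidding income:
    (clearing price - unit cost) x winning-bid electric quantity."""
    return (clearing_price - unit_cost) * awarded_quantity
```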
7. The method of claim 1, wherein the behavior tendency corresponding to the selected bidding strategy is determined by:
q_im(D+1) = (1 − r) · q_im(D) + R_im(D)
where q_im(D) is the behavior tendency of selecting bidding strategy m in round D, q_im(D+1) is the behavior tendency of selecting bidding strategy m in round D+1, r is the recency (forgetting) parameter, and R_im(D) is the response function of the reinforcement learning RE algorithm model.
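The update of claim 7 is a one-line recurrence: the old tendency is decayed and the response is added. The function and parameter names below are illustrative, and treating r as a recency parameter follows the standard Roth-Erev formulation.

```python
def update_tendency(q_m, R_m, r=0.1):
    """q_im(D+1) = (1 - r) * q_im(D) + R_im(D):
    decay the old behavior tendency, then add the round's response."""
    return (1 - r) * q_m + R_m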
8. The method of claim 7, wherein the probability selection function of each bidding strategy in the bidding strategy set is determined by:
p_im(D) = e^(q_im(D)/c) / Σ_{j=1}^{M} e^(q_ij(D)/c)
where p_im(D) is the probability that the power transaction operator selects bidding strategy m in round D, c is the cooling coefficient, q_ij(D) is the behavior tendency corresponding to the j-th bidding strategy in round D, M is the total number of bidding strategies, and e is the base of the natural logarithm.
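The probability selection function and the roulette-wheel draw it feeds can be sketched together. The Boltzmann form with cooling coefficient c is assumed from the claim's variable list; the names are illustrative.

```python
import math
import random

def selection_probabilities(q, c=2.0):
    """p_im(D) = exp(q_im/c) / sum_j exp(q_ij/c), over one operator's strategies."""
    w = [math.exp(qi / c) for qi in q]
    s = sum(w)
    return [wi / s for wi in w]

def roulette_pick(p, rng=random):
    """Roulette-wheel selection: walk the cumulative distribution past a
    uniform draw and return the index of the slice it lands in."""
    x, acc = rng.random(), 0.0
    for m, pm in enumerate(p):
        acc += pm
        if x < acc:
            return m
    return len(p) - 1                    # guard against rounding at the tail
```

A larger cooling coefficient c keeps the probabilities close to uniform (more exploration); a smaller c sharpens them toward the highest-tendency strategy.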
9. A system for reinforcement-learning-based bidding strategy selection in a cross-country power market, the system comprising:
an obtaining module, configured to obtain a bidding strategy set;
a determining module, configured to substitute the bidding strategy set into a pre-established reinforcement learning RE algorithm model and calculate, in a roulette-wheel manner, the behavior tendency corresponding to the selected bidding strategy;
an iterative computation module, configured to iteratively calculate a probability selection function for each bidding strategy in the bidding strategy set according to the behavior tendency corresponding to the bidding strategy selected by the power transaction operator, until a convergence condition is met;
and a selection module, configured to select a bidding strategy based on the probability selection function that meets the convergence condition.
10. The system of claim 9, wherein the determining module comprises:
a determining unit, configured to determine a response function of the reinforcement learning RE algorithm model based on the bidding income of the power transaction operator in the current round;
and an obtaining unit, configured to obtain the reinforcement learning RE algorithm model based on the response function of the reinforcement learning RE algorithm model.
CN201910048373.9A 2019-01-18 2019-01-18 Method and system for selecting bidding strategy for cross-country power market price reinforcement learning Pending CN111461803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910048373.9A CN111461803A (en) 2019-01-18 2019-01-18 Method and system for selecting bidding strategy for cross-country power market price reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910048373.9A CN111461803A (en) 2019-01-18 2019-01-18 Method and system for selecting bidding strategy for cross-country power market price reinforcement learning

Publications (1)

Publication Number Publication Date
CN111461803A true CN111461803A (en) 2020-07-28

Family

ID=71684914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910048373.9A Pending CN111461803A (en) 2019-01-18 2019-01-18 Method and system for selecting bidding strategy for cross-country power market price reinforcement learning

Country Status (1)

Country Link
CN (1) CN111461803A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348621A (en) * 2020-08-21 2021-02-09 国网吉林省电力有限公司 Generator quotation model based on RE Learning algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916400A (en) * 2010-07-29 2010-12-15 中国电力科学研究院 ACE (Agent-based Computational Economics) simulation method of electricity market by adopting cooperative particle swarm algorithm
CN107644370A (en) * 2017-09-29 2018-01-30 中国电力科学研究院 Price competing method and system are brought in a kind of self-reinforcing study together

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feng Heng: "Simulation Method for the Behavior of Electricity Market Participants Based on Intelligent Agents", CNKI Outstanding Master's Theses Full-text Database, pages 10-71 *

Similar Documents

Publication Publication Date Title
Zhao et al. Jointly learning to recommend and advertise
Vytelingum The structure and behaviour of the continuous double auction
CN110796477A (en) Advertisement display method and device, electronic equipment and readable storage medium
Baranwal et al. A truthful and fair multi-attribute combinatorial reverse auction for resource procurement in cloud computing
CN111798280B (en) Multimedia information recommendation method, device and equipment and storage medium
CN109711871B (en) Potential customer determination method, device, server and readable storage medium
WO2019105235A1 (en) Pricing method and device, and computer-readable storage medium
Brânzei et al. Proportional dynamics in exchange economies
CN102163304A (en) Method and system for collaborative networking with optimized inter-domain information quality assessment
CN111192161A (en) Electric power market trading object recommendation method and device
Alcalde et al. Competition for procurement shares
WO2020104806A1 (en) Real-time bidding
Aggarwal et al. Multi-channel auction design in the autobidding world
CN107527128B (en) Resource parameter determination method and equipment for advertisement platform
KR20220017379A (en) E-bidding consulting system based on competitor prediction
CN111461803A (en) Method and system for selecting bidding strategy for cross-country power market price reinforcement learning
Bertsimas et al. Optimal bidding in online auctions
CN110555742A (en) Generation method and system for generator agent quotation
Boyer et al. Common-value auction versus posted-price selling: an agent-based model approach
Chandlekar et al. Multi-unit double auctions: equilibrium analysis and bidding strategy using DDPG in smart-grids
Agapitos et al. On the genetic programming of time-series predictors for supply chain management
US8321262B1 (en) Method and system for generating pricing recommendations
Mayer et al. Accounting for price dependencies in simultaneous sealed-bid auctions
US20050144081A1 (en) Method and system for predicting the outcome of an online auction
KR102451291B1 (en) E-bidding consulting system based on competitor prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination