CN110276698A - Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning - Google Patents


Info

Publication number: CN110276698A
Application number: CN201910519858.1A
Authority: CN (China)
Prior art keywords: layer, reinforcement learning, renewable energy, double, agent
Legal status: Granted; currently Active
Other languages: Chinese (zh)
Other versions: CN110276698B (en)
Inventors: 王建春, 陈张宇, 刘�东, 黄玉辉, 孙健, 李峰, 殷小荣, 吉兰芳, 孙宏斌, 戴晖, 吴晓飞, 芦苇, 戴易见, 徐晓春, 李佑伟, 汤同峰
Assignees: Shanghai Jiaotong University; HuaiAn Power Supply Co. of State Grid Jiangsu Electric Power Co., Ltd.
Application CN201910519858.1A filed by Shanghai Jiaotong University and HuaiAn Power Supply Co. of State Grid Jiangsu Electric Power Co., Ltd.; published as CN110276698A and granted as CN110276698B.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply


Abstract

The invention discloses a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning. The method comprises the following main steps: 1) constructing a double-layer random decision optimization model of distributed renewable energy trading; 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to the theoretical framework of the algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism; 3) solving an estimate of the optimal Q-value function by an iterative calculation method on the basis of the framework of step 2); 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation. The invention takes the uncertainty in distributed renewable energy trading into account, and can raise generator revenue while accounting for risk and at the same time maximize the comprehensive benefit.

Description

Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning
Technical Field
The invention relates to the field of intelligent power distribution networks, in particular to a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning.
Background
With the progress and development of society, the global demand for green, clean and efficient power keeps growing, and more and more distributed renewable energy sources are connected to the power distribution network. Distributed energy sources offer efficient energy utilization, low losses, little pollution, flexible operation and good system economy. Their development, however, still faces problems of grid connection, power supply quality, capacity storage and fuel supply.
Although distributed photovoltaic and wind power generation is free of fuel costs, its construction, operation and maintenance costs are high. At present, new-energy distributed generators in China mainly profit through electricity-price subsidies from the state and local governments. However, as distributed power penetration increases, this profit model grows increasingly inconsistent with market laws. Subsidizing distributed generators through the subscription fees of users instead can help generators participate in market competition and quote reasonably according to their potential benefits and generation costs, so that social benefit is improved to the greatest extent. Meanwhile, when various uncertain information such as generator quotes, distributed power output fluctuation and user subscriptions is considered, the model can be solved by a multi-agent double-layer collaborative reinforcement learning method, which rapidly computes an optimal scheduling decision, reduces risk and improves economic benefit.
Disclosure of Invention
In order to overcome the defects of existing transaction decision methods, the invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning. A double-layer stochastic programming model of distributed energy under various uncertain information, such as generator quotes, distributed power output fluctuation and user subscriptions, is considered; the model is solved by a multi-agent double-layer collaborative reinforcement learning method, so that the optimal scheduling decision can be rapidly computed, risk is reduced and economic benefit is improved.
The invention realizes the aim through the following technical scheme:
a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to the theoretical framework of the algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism; the function approximator estimates the Q value using a set of adjustable parameters and features extracted from the state-action space; the approximator establishes a mapping from the parameter space to the Q-value function over the state-action space, the mapping may be linear or nonlinear, and solvability can be analysed with a linear mapping, the typical form of the function approximator being:

Q̂(s, a) = θ^T φ(s, a)

where θ is the adjustable approximation parameter vector, φ(s, a) is the feature vector of the state-action pair composed of basis functions (BFs), and (·)^T denotes the matrix transpose operation;
step 3) solving an estimate of the optimal Q-value function by an iterative calculation method on the basis of the framework in step 2);
and 4) solving an optimization model by using the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation.
Preferably, the double-layer random decision optimization model for the distributed renewable energy transaction in step 1) includes an upper-layer planning modeling and a lower-layer planning modeling, which respectively correspond to two parts of an energy transaction link.
Preferably, the chance-constrained programming constructed in the upper-layer planning modeling maximizes the optimistic value of the objective function; the optimization target is maximum economic benefit, the constraints consist of objective constraint limits and chance constraint limits, and the mathematical expression of the upper-layer planning modeling is as follows:
constraint function:
in the formula:
λ: the time-of-use quote for generation trading, where λt is the quote at time t;
ξ: a random variable caused by the unknown quotes of bidders;
the random variable caused by the uncertain deviation between the real and predicted wind-power and photovoltaic output;
the generator revenue under the scenario of quote λ with ξ and the output-deviation variable;
β: the confidence level of the borne risk;
the expected revenue satisfied at confidence β;
q(t,ξ): the bid amount of the generator in period t under scenario ξ, obtained in the lower-layer planning;
the per-unit-electricity user new-energy subscription compensation under scenario ξ, obtained by the lower-layer decision (a lower-layer decision output);
c_base: the cost per unit of generated electricity;
the penalty fine of the generator under scenario ξ with the output-deviation variable;
γ: the unit fine for unfinished electricity;
the unbalanced electric quantity by which the bid amount exceeds the maximum output at time t under scenario ξ;
the actual output upper limit of the distributed power supply at time t under the scenario;
T: one time period, with a default value of one hour.
Preferably, the lower-layer planning modeling optimizes scheduling and allocates the bid amount among the generators for each bidding scenario, targeting the comprehensive benefit of market operation; the mathematical expression of the lower-layer planning is as follows:
constraint function:
in the formula: n is a radical ofpv、Nwp-the total number of photovoltaic and wind power generators in the area, L-the total number of power consumers in the area,-unit cost of purchasing electricity from the external grid at time t,the electricity purchasing cost from No. i photovoltaic and wind power generators at the moment t,-t time point purchasing electric power from the external grid,purchasing electricity from No. i photovoltaic and wind power generators at the moment t,-load of No. i electricity consumer at time t, comppv、compwp-per-degree electricity subscription compensation paid in renewable energy sources such as photovoltaic, wind power and the like within the user subscription range, Qload-pv-i、Qload-wp-iPhotovoltaic and wind power generation system for I number user to settle accounts on current daySubscription of electric quantity, Qpv、Qwp、Qgrid-photovoltaic, wind-electric, external electric quantity, upsilon, consumed in the area of the daypv、υwp-ratio of photovoltaic to wind power generation in the area of the day, αi、βi-the photovoltaic and wind power ratio subscribed by the i-th user,and (4) the maximum generating capacity at the time t reported by the No. i photovoltaic and wind power generator.
Preferably, in step 2), multiple agents are used to separately handle the randomness of the upper-layer and lower-layer planning modeling and their mutual iteration; the double-layer collaborative reinforcement algorithm introduces a diffusion strategy into the reinforcement learning process and brings an adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the collaborative reinforcement learning algorithm can adapt to the randomness and uncertainty caused by distributed renewable energy and to the complex calculation of the double-layer random decision optimization model; in addition, to avoid storing a large Q-value table, a function approximator is used to record the Q values of complex continuous state and action spaces.
Preferably, the diffusion strategy achieves faster convergence and a lower mean square deviation than a consensus strategy; the strategy is as follows:

ψ_i(k+1) = x_i(k) + f(x_i(k))
x_i(k+1) = Σ_{j∈N_i} b_ij ψ_j(k+1)

where ψ_i(k+1) is the intermediate term introduced by the diffusion strategy, and x_i(k+1) is the state updated by combining all the intermediate terms available to agent i; N_i is the set of nodes adjacent to agent i; b_ij is the weight assigned by agent i to neighboring agent j; the matrix B = [b_ij] ∈ R^{n×n} is defined as the topology matrix of the microgrid communication network; the topology matrix B is a stochastic matrix, i.e. B·1_n = 1_n, where 1_n ∈ R^n is the all-ones vector.
The invention has the following beneficial effects:
1. the double-layer decision optimization model established by the invention can comprehensively consider the uncertainty situation caused by the random variables and make better decisions. It is therefore well suited for optimization decisions for distributed generators.
2. The algorithm provided by the invention is a double-layer collaborative reinforcement learning algorithm, can be well integrated into a two-layer random decision optimization model, and provides a new idea for intensive energy trading decision of a future information network and an energy network.
3. The invention introduces a plurality of agents to respectively process the randomness problem of the upper and lower layers of planning and the mutual iteration of the upper and lower layers, so that the collaborative reinforcement learning algorithm is more suitable for the problem of the double-layer planning.
4. Multi-agent double-layer collaborative reinforcement learning, as a multi-agent reinforcement learning algorithm with self-learning and collaborative learning capabilities, is well suited to solving large-scale distributed-access energy problems with strong randomness and uncertainty. After a certain amount of training and updating, the algorithm can perform dynamic optimization rapidly while guaranteeing stable global convergence.
5. A diffusion strategy is introduced into the reinforcement learning process, so that distributed information exchange can be realized within the microgrid, the calculation cost is reduced, faster convergence is achieved, and a mean square deviation lower than that of the consensus strategy can be reached.
Drawings
FIG. 1 is an overall framework diagram of the present invention;
FIG. 2 is a flow chart of the multi-agent double-layer collaborative reinforcement learning of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
The distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning disclosed by the invention takes a power distribution network as a medium, simultaneously schedules a distributed power supply and a controllable load, and realizes economic benefit optimization, and the optimization object and model of the method are schematically shown in figure 1.
The invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, which comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism;
step 3) solving an estimate of the optimal Q-value function by an iterative calculation method on the basis of the framework in step 2);
and 4) solving an optimization model by using the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation.
The double-layer random decision optimization model for the distributed renewable energy transaction in step 1) comprises an upper-layer planning modeling and a lower-layer planning modeling, which respectively correspond to the two parts of the energy transaction link.
In step 2), multiple agents are used to separately handle the randomness of the upper-layer and lower-layer planning modeling and their mutual iteration; the double-layer collaborative reinforcement algorithm introduces a diffusion strategy into the reinforcement learning process and brings an adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the collaborative reinforcement learning algorithm can adapt to the randomness and uncertainty caused by distributed renewable energy and to the complex calculation of the double-layer random decision optimization model. To avoid storing a large Q-value table, a function approximator is used to record the Q values of complex continuous state and action spaces.
The iterative calculation flow of step 3) comprises the following steps (see FIG. 2):
S1: initialize θ0, ω0;
S2: repeat for k = 1 to T;
S3: each agent computes in turn, i = 1 to n;
S4: compute the feature vector φ(s_i(k), a_i(k)) and the state s_i(k);
S5: select the action a_i(k) according to the policy π;
S6: observe the reward value r_i(k);
S7: compute the TD error δ_i(k);
S8: compute the Q-value estimate;
S9: update the parameters θ_i(k), ω_i(k);
S10: return to S3;
S11: return to S2;
S12: return the result.
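As a concrete illustration of the S1 to S12 flow, the following Python sketch runs the per-step Greedy-GQ update (TD error, gradient-correction term, and updates of θ and ω) on a toy one-dimensional environment; the feature map, reward, and learning rates are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def feature(s, a, n_feat=8, n_actions=2):
    """Toy state-action feature map (an assumption, not from the patent):
    one cosine block per discrete action."""
    phi = np.zeros(n_feat * n_actions)
    phi[a * n_feat:(a + 1) * n_feat] = np.cos(np.arange(n_feat) * s)
    return phi

def greedy_gq_step(theta, omega, s, a, r, s2, alpha, beta,
                   gamma=0.9, n_actions=2):
    """One Greedy-GQ update (steps S4-S9): TD error with a gradient
    correction term, then updates of theta and omega."""
    phi = feature(s, a)
    q_next = [theta @ feature(s2, b) for b in range(n_actions)]
    phi2 = feature(s2, int(np.argmax(q_next)))          # greedy next action
    delta = r + gamma * (theta @ phi2) - theta @ phi     # S7: TD error
    theta = theta + alpha * (delta * phi - gamma * (omega @ phi) * phi2)
    omega = omega + beta * (delta - omega @ phi) * phi   # S8-S9: updates
    return theta, omega

# S1: initialise theta0, omega0; one agent shown here for brevity
theta, omega = np.zeros(16), np.zeros(16)
rng = np.random.default_rng(0)
for k in range(200):                                     # S2: k = 1 .. T
    s = rng.uniform(0.0, 1.0)
    a = int(rng.integers(0, 2))                          # S5: behaviour policy
    r = 1.0 if a == 1 else 0.0                           # S6: observed reward
    s2 = rng.uniform(0.0, 1.0)
    theta, omega = greedy_gq_step(theta, omega, s, a, r, s2,
                                  alpha=0.05, beta=0.1)
```

After training, the learned Q estimate prefers the rewarded action: θᵀφ(s, 1) exceeds θᵀφ(s, 0).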
The basic steps and explanation of the application of the distributed renewable energy framework of multi-agent double-layer collaborative reinforcement learning are as follows:
A1: decompose the objective and constraint functions of the upper- and lower-layer plans into the respective rewards of the reinforcement learning algorithm, serving as reward reference values. The upper-layer objective is to be maximized and is set as a forward reward; the lower-layer objective seeks the lowest price and is set as a reverse reward. The constraints of both layers serve as penalty terms, whose coefficients are set according to the actual debugging situation; the requirement is that the penalty coefficient of a strong constraint is far greater than the reward-term coefficient, and that of a weak constraint is greater than the reward-term coefficient.
A2: construct the first reinforcement learning module, which is essentially a combination of multiple reinforcement learning agents. The lower-layer plan forms one reinforcement learning agent unit as a module; at the upper layer, since there are several generators, each generator forms a reinforcement learning agent unit; finally, the upper-layer and lower-layer agent units are integrated by an overall agent unit, shown as agent II in FIG. 1, whose reward structure takes maximizing the total reward of all agent units as its objective.
A3: establish the function approximator. Storing the Q-value table occupies a large amount of computer resources; the function approximator is used to reduce this occupation and increase the calculation speed.
A4: establish the collaborative reinforcement learning working mechanism; to accelerate the multi-agent calculation, the adapt-then-combine (ATC) diffusion strategy is integrated into the parameter update process of Greedy-GQ.
A5: construct the second reinforcement learning module, taking agent II as the environment of this agent, and establish the update strategy using a conventional Q-learning (or Sarsa, DQN, etc.) update rule.
Modeling upper layer planning:
The chance-constrained program constructed in the upper-layer planning maximizes the optimistic value of the objective function, with maximum economic benefit as the goal; the constraints consist of objective constraint limits and chance constraint limits. Moreover, the upper-layer optimization targets an optimistic value of the economic benefit (i.e. the economic benefit obtained is better than a given value at a certain confidence level) so as to minimize the operating cost of the distribution network. The objective constraint limits are constraints on deterministic objects, including the unit generation cost, the unit fine for unfinished generation, and the upper and lower limits of the actual output of the distributed power supply. The chance constraint limits are constraints on the uncertain objects of the distribution network, including probability constraints on the degree of borne risk, power-flow security limits, and so on. Sources of uncertainty include distributed photovoltaic and wind-power output, generator quotes, and the deviation of conventional load forecasts.
Therefore, the mathematical expression of the upper-level planning modeling is as follows:
constraint function:
in the formula:
λ: the time-of-use quote for generation trading, where λt is the quote at time t;
ξ: a random variable caused by the unknown quotes of bidders;
the random variable caused by the uncertain deviation between the real and predicted wind-power and photovoltaic output;
the generator revenue under the scenario of quote λ with ξ and the output-deviation variable;
β: the confidence level of the borne risk;
the expected revenue satisfied at confidence β;
q(t,ξ): the bid amount of the generator in period t under scenario ξ, obtained in the lower-layer planning;
the per-unit-electricity user new-energy subscription compensation under scenario ξ, obtained by the lower-layer decision (a lower-layer decision output);
c_base: the cost per unit of generated electricity;
the penalty fine of the generator under scenario ξ with the output-deviation variable;
γ: the unit fine for unfinished electricity;
the unbalanced electric quantity by which the bid amount exceeds the maximum output at time t under scenario ξ;
the actual output upper limit of the distributed power supply at time t under the scenario;
T: one time period, with a default value of one hour.
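The optimistic value at confidence β used above can be estimated by scenario sampling: draw output scenarios, evaluate the generator revenue in each, and take the largest value that at least a fraction β of the scenarios reach. The sketch below illustrates this; the prices, the bid amount, and the output distribution are assumed example values, not taken from the patent.

```python
import numpy as np

def optimistic_revenue(revenues, beta):
    """Largest value f_bar such that at least a fraction beta of the
    sampled revenues reach f_bar (a scenario estimate of the
    confidence-beta optimistic value)."""
    r = np.sort(np.asarray(revenues))
    idx = int(np.floor((1.0 - beta) * len(r)))
    return r[idx]

rng = np.random.default_rng(1)
# assumed example values: quote, unit cost and unit fine (per kWh), bid
lam, c_base, gamma_pen = 0.6, 0.3, 0.2
q_bid = 100.0                                  # bid amount for the period
actual = rng.normal(95.0, 10.0, size=5000)     # sampled output scenarios
delivered = np.minimum(q_bid, actual)          # energy actually delivered
shortfall = np.maximum(q_bid - actual, 0.0)    # unfinished electricity
revenue = (lam * delivered                     # payment for delivered energy
           - c_base * delivered                # generation cost
           - gamma_pen * shortfall)            # fine for the shortfall
f_bar = optimistic_revenue(revenue, beta=0.9)
```

By construction, at least 90 % of the sampled scenarios yield a revenue of at least f_bar.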
Modeling of a lower layer plan:
The lower-layer planning optimizes scheduling and allocates the bid amounts of the generators, targeting the comprehensive benefit of market operation. The lower-level programming model is in fact a market-clearing scheduling model for the regional retail market, and its accuracy determines whether the regional market can operate properly according to the rules. Since energy storage is neglected, the electricity purchase sources in the region include both the distributed generators and the external grid, and the sum of the electricity purchase costs of all periods constitutes the cost of the system. In addition, considering that users are willing to pay a certain cost to subscribe to new energy and enjoy green power, this user group can also be included in the overall benefit. The optimization goal is therefore to minimize the electricity purchase cost while increasing the subscription fees for green electricity.
Therefore, the mathematical expression of the underlying plan modeling is as follows:
constraint function:
in the formula:
N_pv, N_wp: the total numbers of photovoltaic and wind-power generators in the area;
L: the total number of power consumers in the area;
the unit cost of purchasing electricity from the external grid at time t;
the cost of purchasing electricity from the i-th photovoltaic or wind-power generator at time t;
the electric power purchased from the external grid at time t;
the electricity purchased from the i-th photovoltaic or wind-power generator at time t;
the load of the i-th power consumer at time t;
comp_pv, comp_wp: the per-kWh subscription compensation paid for photovoltaic, wind power and other renewable energy within the user's subscription range;
Q_load-pv-i, Q_load-wp-i: the photovoltaic and wind-power subscription electricity settled and payable by user i on the current day;
Q_pv, Q_wp, Q_grid: the photovoltaic, wind-power and external electricity consumed in the area on the current day;
υ_pv, υ_wp: the proportions of photovoltaic and wind power generation in the area on the current day;
α_i, β_i: the photovoltaic and wind-power proportions subscribed by user i;
the maximum generating capacity at time t reported by the i-th photovoltaic or wind-power generator.
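To make the lower-layer market-clearing idea concrete, the following sketch dispatches one period's load by merit order: the cheapest generator offers are taken first up to their reported maxima, and the external grid covers the remainder. The offer prices and capacities are assumed example values; the actual lower-layer model additionally handles subscription compensation and multi-period constraints.

```python
def dispatch(load, offers, grid_price):
    """Merit-order split of one period's load among generator offers
    (price, max_energy) and the external grid; returns (plan, cost)."""
    plan, cost, remaining = {}, 0.0, load
    for name, (price, cap) in sorted(offers.items(), key=lambda kv: kv[1][0]):
        take = min(cap, remaining)             # cheapest offers first
        plan[name] = take
        cost += price * take
        remaining -= take
    plan["grid"] = remaining                   # external grid covers the rest
    cost += grid_price * remaining
    return plan, cost

# assumed example offers: (price per kWh, reported maximum energy in kWh)
offers = {"pv1": (0.35, 40.0), "wp1": (0.30, 50.0), "pv2": (0.40, 30.0)}
plan, cost = dispatch(load=100.0, offers=offers, grid_price=0.55)
```

Here the 100 kWh load is met entirely by the distributed generators (50 from wp1, 40 from pv1, 10 from pv2), so no grid purchase is needed.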
A function approximator:
The function approximator estimates the Q value using a set of adjustable parameters and features extracted from the state-action space. The approximator thereby establishes a mapping from the parameter space to the Q-value function over the state-action space. The mapping may be linear or nonlinear; solvability can be analysed with a linear mapping. A typical form of the linear approximator is:

Q̂(s, a) = θ^T φ(s, a)

where θ is the adjustable approximation parameter vector and φ(s, a) is the feature vector of the state-action pair, which can be derived from the basis functions:

φ(s, a) = [BF_1(s, a), …, BF_M(s, a)]^T

where each BF_m(·) is a basis function (BF), such as a Gaussian radial BF, centered at a selected fixed point of the state space. Typically, the sets of fixed points corresponding to the BFs are evenly distributed in the state space. Herein, all vectors are regarded as column vectors unless otherwise specified, and (·)^T denotes the matrix transpose operation. Radial basis function neural networks have been used in stochastic nonlinear interconnected systems and have been shown to have good generalization performance.
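A minimal sketch of such a linear approximator with Gaussian radial BFs, assuming evenly spaced fixed points on a one-dimensional state space and one feature block per discrete action (the dimensions and parameters are illustrative, not from the patent):

```python
import numpy as np

def rbf_features(s, a, centers, sigma, n_actions):
    """Gaussian radial basis features phi(s, a): the state s is scored
    against fixed, evenly spaced centers, and the scores are placed in
    the block belonging to the discrete action a."""
    bf = np.exp(-((s - centers) ** 2) / (2.0 * sigma ** 2))
    phi = np.zeros(len(centers) * n_actions)
    phi[a * len(centers):(a + 1) * len(centers)] = bf
    return phi

centers = np.linspace(0.0, 1.0, 5)   # fixed points, evenly distributed
theta = np.ones(10)                  # adjustable parameter vector theta
phi = rbf_features(0.3, a=1, centers=centers, sigma=0.2, n_actions=2)
q_hat = float(theta @ phi)           # Q-hat(s, a) = theta^T phi(s, a)
```

Only the block of the chosen action is nonzero, so each action keeps its own slice of the parameter vector.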
Diffusion strategy:
The reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process and brings an adapt-then-combine (ATC) mechanism into the algorithm. The diffusion strategy achieves faster convergence and a lower mean square deviation than the consensus strategy. Furthermore, the diffusion strategy responds better to continuous real-time signals and is insensitive to the neighbor weights. The basic idea of the diffusion strategy is to combine cooperation terms based on the neighboring states during each agent's own state update. Consider an agent with state x_i and the dynamics

x_i(k+1) = x_i(k) + f(x_i(k))

The diffusion strategy replaces this update as follows:

ψ_i(k+1) = x_i(k) + f(x_i(k))
x_i(k+1) = Σ_{j∈N_i} b_ij ψ_j(k+1)

where ψ_i(k+1) is the intermediate term introduced by the diffusion strategy and x_i(k+1) is the state updated by combining all the intermediate terms available to agent i. N_i is the set of nodes adjacent to agent i. In addition, b_ij is the weight assigned by agent i to neighboring agent j. Here, the matrix B = [b_ij] ∈ R^{n×n} is defined as the topology matrix of the microgrid communication network. In general, the topology matrix B is a stochastic matrix, which means B·1_n = 1_n, where 1_n ∈ R^n is the all-ones vector.
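The adapt-then-combine update can be sketched as follows; the 3-agent topology matrix, the local dynamics f, and the use of the current mean inside f are illustrative assumptions (here B is doubly stochastic, so the agents reach consensus at the average of their initial states):

```python
import numpy as np

def atc_diffusion_step(x, B, f):
    """One adapt-then-combine step: every agent first adapts its own
    state (intermediate term psi), then combines its neighbours'
    intermediate terms using the weights b_ij of the topology matrix."""
    psi = x + f(x)        # adapt:   psi_i(k+1) = x_i(k) + f(x_i(k))
    return B @ psi        # combine: x_i(k+1) = sum_j b_ij psi_j(k+1)

# assumed 3-agent topology matrix (row-stochastic: B @ 1 = 1)
B = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
x = np.array([1.0, 4.0, 7.0])
for _ in range(50):
    # illustrative local dynamics: a weak pull toward the current mean
    x = atc_diffusion_step(x, B, f=lambda v: 0.1 * (v.mean() - v))
```

After a few dozen steps the three states agree to machine precision, illustrating the distributed information exchange the strategy provides.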
A collaborative reinforcement learning algorithm is obtained by integrating the adapt-then-combine (ATC) diffusion strategy into the parameter update process of Greedy-GQ.
Note that the proposed collaborative reinforcement learning algorithm introduces two intermediate vectors that are combined into the actual approximation parameter vector θ_i(k+1) and the correction parameter vector ω_i(k+1). In the proposed algorithm, the learning-rate parameters α(k) and β(k) are set under the conditions P(1) to P(4):

α(k) > 0, β(k) > 0        P(1)
Σ_k α(k) = ∞, Σ_k α(k)² < ∞        P(2)
Σ_k β(k) = ∞, Σ_k β(k)² < ∞        P(3)
α(k)/β(k) → 0        P(4)
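Step-size schedules satisfying P(1) to P(4) are easy to construct; for instance α(k) ~ 1/k and β(k) ~ 1/k^0.6 are both positive, sum to infinity while their squares are summable, and satisfy α(k)/β(k) → 0. A small sketch (the constants a0, b0 are illustrative):

```python
def step_sizes(k, a0=1.0, b0=1.0):
    """Step-size schedules satisfying P(1)-P(4): both positive and
    non-summable with summable squares, and alpha(k)/beta(k) -> 0,
    so theta evolves on the slower timescale."""
    alpha = a0 / (k + 1)             # ~ 1/k
    beta = b0 / (k + 1) ** 0.6       # ~ 1/k^0.6, decays more slowly
    return alpha, beta

ratios = [step_sizes(k)[0] / step_sizes(k)[1] for k in (1, 10, 100, 1000)]
```

The ratio α(k)/β(k) = (k+1)^(-0.4) shrinks monotonically toward zero, which is exactly condition P(4).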
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (6)

1. A distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning is characterized by comprising the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to the theoretical framework of the algorithm, and establishing a function approximator; the function approximator estimates the Q value using a set of adjustable parameters and features extracted from the state-action space; the approximator establishes a mapping from the parameter space to the Q-value function over the state-action space, the mapping may be linear or nonlinear, and solvability can be analysed with a linear mapping, the typical form of the function approximator being:

Q̂(s, a) = θ^T φ(s, a)

where θ is the adjustable approximation parameter vector, φ(s, a) is the feature vector of the state-action pair composed of basis functions (BFs), and (·)^T denotes the matrix transpose operation;
step 3) solving an estimate of the optimal Q-value function by an iterative calculation method on the basis of the framework in step 2);
and 4) solving an optimization model by using the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation.
2. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: the distributed renewable energy trading double-layer random decision optimization model in the step 1) comprises an upper-layer planning modeling and a lower-layer planning modeling, and the upper-layer planning modeling and the lower-layer planning modeling respectively correspond to two parts of an energy trading link.
3. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that: the chance-constrained programming constructed in the upper-layer planning modeling maximizes the optimistic value of the objective function; its optimization target is maximum economic benefit, the constraints consist of objective constraint limits and chance constraint limits, and the mathematical expression of the upper-layer planning modeling is as follows:
constraint function:
wherein lambda is a time-shared quote for the generator, wherein lambdatIs the quote at time t, ξ -a random variable that is not known to the bidder,random variables caused by uncertainty of deviation of real values and predicted values of wind power and photovoltaic,when the quote is λ, ξ withGenerator revenue in the scenario, β -confidence in risk,-meeting expected revenue at β confidence, qt,ξξ, the power generator obtained in the lower layer planning is marked with power in the time period t,under the scene of- ξ, the new energy subscription compensation of the unit electric quantity user obtained by lower layer decision, cbase-the cost per unit of electricity generation,the generator at ξ andpenalty fines in the scene, gamma-unit fines for outstanding electricity,time t, ξ scenario with power exceedingThe unbalanced electric quantity of the maximum output under the scene,-atUnder the scene, at the actual output upper limit of the distributed power supply at the moment T, T-one time period, the default value is one hour.
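The chance constraint of the upper layer — requiring that revenue meets the expected level with confidence β under the random forecast-error scenario ξ — can be checked empirically by Monte Carlo sampling. The sketch below is illustrative only: the revenue model, the Gaussian forecast-error distribution, and every numeric value are assumptions, not the patent's formulation.

```python
import random

def revenue(quote, actual_output, awarded, c_base=0.30, gamma=0.50):
    """Hypothetical single-period generator revenue: income minus generation
    cost, minus a per-unit penalty gamma for awarded energy not delivered."""
    delivered = min(actual_output, awarded)
    shortfall = awarded - delivered
    return quote * delivered - c_base * delivered - gamma * shortfall

def chance_constraint_holds(quote, awarded, revenue_floor,
                            beta=0.9, n=20000, seed=7):
    """Estimate P(revenue >= revenue_floor) over sampled scenarios xi and
    compare the empirical probability against the confidence level beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # scenario xi: actual output fluctuates around its forecast
        actual = max(0.0, rng.gauss(mu=100.0, sigma=10.0))
        if revenue(quote, actual, awarded) >= revenue_floor:
            hits += 1
    return hits / n >= beta

ok = chance_constraint_holds(quote=0.45, awarded=90.0, revenue_floor=10.0)
```

Raising `revenue_floor` (or `awarded`) eventually violates the constraint, which is how a quote/bid pair would be screened before entering the lower-layer dispatch.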
4. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that: the lower-layer planning modeling optimizes the dispatch and allocates the bid-winning amount of each power generator for the bidding scenarios, taking the comprehensive benefit of market operation as the objective; the mathematical expression of the lower-layer planning is as follows:
constraint function:
in the formula: Npv and Nwp are the total numbers of photovoltaic and wind power generators in the area; L is the total number of power consumers in the area; the model includes the unit cost of purchasing electricity from the external grid at time t, the cost of purchasing electricity from the i-th photovoltaic and wind power generators at time t, the power purchased from the external grid at time t, the power purchased from the i-th photovoltaic and wind power suppliers at time t, and the load of the i-th power consumer at time t; comppv and compwp are the per-kWh subscription compensation paid to users subscribing to renewable energy such as photovoltaic and wind power; Qload-pv-i and Qload-wp-i are the photovoltaic and wind subscription electricity of user i settled on the same day for which compensation is receivable; Qpv, Qwp and Qgrid are the photovoltaic, wind and external-grid electricity consumed in the area on that day; υpv and υwp are the proportions of photovoltaic and wind generation in the area on that day; αi and βi are the photovoltaic and wind power proportions subscribed by the i-th user; and the maximum generating capacities at time t are those declared by the i-th photovoltaic and wind power generators.
5. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: in step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and the mutual iteration between them; the double-layer collaborative reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process and incorporates an adaptive combination (adapt-then-combine, ATC) mechanism into the reinforcement learning algorithm, so that it can adapt both to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the double-layer stochastic decision optimization model; to avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of complex continuous state and action spaces.
6. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 5, characterized in that: the diffusion strategy achieves faster convergence and a lower mean-square deviation than the uniform (consensus) strategy; the combination step of the diffusion strategy is:
xi(k+1) = Σ_{j∈Ni} bij ψj(k+1)
where ψj(k+1) is the intermediate term introduced by the diffusion strategy, and xi(k+1) is the state of agent i updated by combining all the intermediate terms; Ni is the set of nodes adjacent to agent i; bij is the weight assigned by agent i to neighboring agent j; the matrix B = [bij] ∈ Rⁿˣⁿ is defined as the topology matrix of the microgrid communication network; B is a stochastic matrix satisfying B1n = 1n, where 1n ∈ Rⁿ is the all-ones vector.
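The combine step of the diffusion strategy — each agent averaging its neighbors' intermediate estimates with the weights of a row-stochastic topology matrix B — can be sketched as follows; the 3-agent network and its weights are illustrative assumptions:

```python
def combine(psi, B):
    """Diffusion combine step: x_i = sum over j in N_i of b_ij * psi_j.

    psi: each agent's intermediate estimate (after its local adaptation);
    B:   row-stochastic topology matrix, b_ij > 0 only for neighbors j."""
    n = len(psi)
    # sanity check that B is row-stochastic: B @ 1_n = 1_n
    for row in B:
        assert abs(sum(row) - 1.0) < 1e-9, "B must be row-stochastic"
    return [sum(B[i][j] * psi[j] for j in range(n)) for i in range(n)]

# hypothetical fully connected 3-agent network: each agent weights
# itself 0.5 and each of its two neighbors 0.25
B = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
x = combine([1.0, 2.0, 6.0], B)
```

Because this B is doubly stochastic, the combine step pulls the agents' states toward their common average while preserving the network-wide sum, which is the mechanism behind the faster convergence claimed for the diffusion strategy.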
CN201910519858.1A 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning Active CN110276698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910519858.1A CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Publications (2)

Publication Number Publication Date
CN110276698A true CN110276698A (en) 2019-09-24
CN110276698B CN110276698B (en) 2022-08-02

Family

ID=67960916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910519858.1A Active CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Country Status (1)

Country Link
CN (1) CN110276698B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990793A (en) * 2019-12-07 2020-04-10 国家电网有限公司 Scheduling optimization method for electric-thermal gas coupling micro-energy source station
CN111064229A (en) * 2019-12-18 2020-04-24 广东工业大学 Wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning
CN111200285A (en) * 2020-02-12 2020-05-26 燕山大学 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
CN112612206A (en) * 2020-11-27 2021-04-06 合肥工业大学 Multi-agent collaborative decision-making method and system for uncertain events
CN112714165A (en) * 2020-12-22 2021-04-27 声耕智能科技(西安)研究院有限公司 Distributed network cooperation strategy optimization method and device based on combination mechanism
CN112859591A (en) * 2020-12-23 2021-05-28 华电电力科学研究院有限公司 Reinforced learning control system for operation optimization of energy system
CN113378456A (en) * 2021-05-21 2021-09-10 青海大学 Multi-park comprehensive energy scheduling method and system
CN113421004A (en) * 2021-06-30 2021-09-21 国网山东省电力公司潍坊供电公司 Transmission and distribution cooperative active power distribution network distributed robust extension planning system and method
CN113555870A (en) * 2021-07-26 2021-10-26 国网江苏省电力有限公司南通供电分公司 Q-learning photovoltaic prediction-based power distribution network multi-time scale optimization scheduling method
CN113743583A (en) * 2021-08-07 2021-12-03 中国航空工业集团公司沈阳飞机设计研究所 Intelligent agent invalid behavior switching inhibition method based on reinforcement learning
CN113780622A (en) * 2021-08-04 2021-12-10 华南理工大学 Multi-micro-grid power distribution system distributed scheduling method based on multi-agent reinforcement learning
CN114021815A (en) * 2021-11-04 2022-02-08 东南大学 Extensible energy management cooperation method for community containing large-scale production and consumption persons
CN114611813A (en) * 2022-03-21 2022-06-10 特斯联科技集团有限公司 Community hot-cold water circulation optimal scheduling method and system based on hydrogen energy storage
CN117350515A (en) * 2023-11-21 2024-01-05 安徽大学 Ocean island group energy flow scheduling method based on multi-agent reinforcement learning
CN117559387A (en) * 2023-10-18 2024-02-13 东南大学 VPP internal energy optimization method and system based on deep reinforcement learning dynamic pricing
WO2024084125A1 (en) * 2022-10-19 2024-04-25 Aalto University Foundation Sr Trained optimization agent for renewable energy time shifting

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325608A (en) * 2018-06-01 2019-02-12 国网上海市电力公司 Consider the distributed generation resource Optimal Configuration Method of energy storage and meter and photovoltaic randomness
US20190072916A1 (en) * 2017-09-07 2019-03-07 Hitachi, Ltd. Learning control system and learning control method



Also Published As

Publication number Publication date
CN110276698B (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN110276698B (en) Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning
Li et al. Distributed tri-layer risk-averse stochastic game approach for energy trading among multi-energy microgrids
Cheng et al. Game-theoretic approaches applied to transactions in the open and ever-growing electricity markets from the perspective of power demand response: An overview
Adetunji et al. A review of metaheuristic techniques for optimal integration of electrical units in distribution networks
Aghaei et al. Risk-constrained offering strategy for aggregated hybrid power plant including wind power producer and demand response provider
Varkani et al. A new self-scheduling strategy for integrated operation of wind and pumped-storage power plants in power markets
Chen et al. Research on day-ahead transactions between multi-microgrid based on cooperative game model
Maity et al. Simulation and pricing mechanism analysis of a solar-powered electrical microgrid
CN109190802B (en) Multi-microgrid game optimization method based on power generation prediction in cloud energy storage environment
Gao et al. A multiagent competitive bidding strategy in a pool-based electricity market with price-maker participants of WPPs and EV aggregators
CN112381263B (en) Block chain-based distributed data storage multi-microgrid pre-day robust electric energy transaction method
Adil et al. Energy trading among electric vehicles based on Stackelberg approaches: A review
CN111082451A (en) Incremental distribution network multi-objective optimization scheduling model based on scene method
CN111311012A (en) Multi-agent-based micro-grid power market double-layer bidding optimization method
Gao et al. Distributed energy trading and scheduling among microgrids via multiagent reinforcement learning
Chuang et al. Deep reinforcement learning based pricing strategy of aggregators considering renewable energy
CN111553750A (en) Energy storage bidding strategy method considering power price uncertainty and loss cost
Liu et al. Research on bidding strategy of thermal power companies in electricity market based on multi-agent deep deterministic policy gradient
CN112217195A (en) Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
CN112686693A (en) Method, system, equipment and storage medium for predicting marginal electricity price of electric power spot market
Peng et al. Review on bidding strategies for renewable energy power producers participating in electricity spot markets
CN117578409A (en) Multi-energy complementary optimization scheduling method and system in power market environment
CN115422728A (en) Robust optimization virtual power plant optimization control system based on stochastic programming
CN116307029A (en) Double-layer optimal scheduling method and system for promoting coordination of source storage among multiple virtual grids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant