CN110276698B - Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning - Google Patents

Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning Download PDF

Info

Publication number
CN110276698B
CN110276698B CN201910519858.1A CN201910519858A CN110276698B CN 110276698 B CN110276698 B CN 110276698B CN 201910519858 A CN201910519858 A CN 201910519858A CN 110276698 B CN110276698 B CN 110276698B
Authority
CN
China
Prior art keywords
layer
reinforcement learning
double
agent
photovoltaic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910519858.1A
Other languages
Chinese (zh)
Other versions
CN110276698A (en
Inventor
王建春
陈张宇
刘�东
黄玉辉
孙健
李峰
殷小荣
吉兰芳
孙宏斌
戴晖
吴晓飞
芦苇
戴易见
徐晓春
李佑伟
汤同峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Shanghai Jiaotong University
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University, HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd filed Critical Shanghai Jiaotong University
Priority to CN201910519858.1A priority Critical patent/CN110276698B/en
Publication of CN110276698A publication Critical patent/CN110276698A/en
Application granted granted Critical
Publication of CN110276698B publication Critical patent/CN110276698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Public Health (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, which comprises the following main steps of: 1) constructing a double-layer random decision optimization model of distributed renewable energy trading; 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism; 3) calculating an estimated value of an optimal Q value function by using an iterative calculation method on the basis of the frame in the step 2); 4) and solving an optimization model by using the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation. The invention considers the uncertainty in the distributed renewable energy transaction, can improve the income of the power generator in consideration of risks, and simultaneously maximizes the comprehensive benefit.

Description

Distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning
Technical Field
The invention relates to the field of intelligent power distribution networks, in particular to a distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning.
Background
With the progress and development of society, the global demand for green, clean and efficient power is larger and larger, more and more distributed renewable energy sources are connected to a power distribution network, and the distributed energy sources have the characteristics of reasonable energy efficiency utilization, small loss, less pollution, flexible operation, good system economy and the like. The development mainly has the problems of grid connection, power supply quality, capacity storage, fuel supply and the like.
Distributed photovoltaic and wind power generation, while free of fuel costs, are high in construction, operating and maintenance costs. At present, the new energy distributed generator in China is mainly subsidized for profit through the electricity price of the state and local governments. However, as distributed power penetration increases, the profitability model is significantly less consistent with market laws. The distributed generators are subsidized through the subscription fee of the users, the generators can be helped to participate in market competition, and reasonable quotation is carried out according to the potential benefits and the power generation cost of the generators, so that the social benefits are improved to the maximum extent. Meanwhile, various uncertain information such as power generator quotation, distributed power supply output fluctuation, user subscription and the like are considered, model solution can be carried out through a multi-agent double-layer collaborative reinforcement learning solution method, an optimal scheduling decision can be rapidly calculated, risks are reduced, and economic benefits are improved.
Disclosure of Invention
In order to overcome the defects of the existing transaction decision method, the invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, a distributed energy double-layer random planning model under various uncertain information such as power generator quotation, distributed power supply output fluctuation, user subscription and the like is considered, model solution is carried out through a multi-agent double-layer collaborative reinforcement learning solution method, the optimal scheduling decision can be rapidly calculated, the risk is reduced, and the economic benefit is improved.
The invention realizes the purpose through the following technical scheme:
a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer cooperative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer cooperative reinforcement learning algorithm, and establishing a function approximator and a cooperative reinforcement learning work mechanism; the function approximator estimates a Q value by adopting a series of adjustable parameters and characteristics extracted from a state action space, the approximator establishes mapping of the state action of a function from the parameter space to the Q value to the space, the mapping can be linear or nonlinear, solvability can be analyzed by utilizing linear mapping, and the typical form of the function approximator is as follows:
Figure BDA0002096301310000011
Figure BDA0002096301310000021
wherein
Figure BDA0002096301310000022
Is an adjustable approximate parameter vector and is characterized in that,
Figure BDA0002096301310000023
is the feature vector of the state-action pair,
Figure BDA0002096301310000024
is the Basis Function (BF) (. DEG) T Representing a matrix transpose operation;
step 3) solving an estimation value of an optimal Q value function by using an iterative calculation method on the basis of the frame in the step 2);
and 4) solving an optimization model by using the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation.
Preferably, the double-layer random decision optimization model for the distributed renewable energy transaction in step 1) includes an upper-layer planning modeling and a lower-layer planning modeling, which respectively correspond to two parts of an energy transaction link.
Preferably, the opportunity constraint programming for maximizing the optimistic value of the objective function constructed in the upper-level programming modeling has the optimization goal of maximum economic benefit, the constraint condition is composed of objective constraint limits and opportunity constraint limits, and the mathematical expression of the upper-level programming modeling is as follows:
Figure BDA0002096301310000025
constraint function:
Figure BDA0002096301310000026
Figure BDA0002096301310000027
Figure BDA0002096301310000028
Figure BDA0002096301310000029
wherein λ -power generation trade time-sharing quotation, wherein λ t Is the quote at time t, ξ -a random variable caused by the bid being unknown to the bidder,
Figure BDA00020963013100000210
random variables caused by uncertainty of deviation of real values and predicted values of wind power and photovoltaic,
Figure BDA00020963013100000211
when the quoted price is lambda, in xi and
Figure BDA00020963013100000219
generator revenue under the scenario, beta-risk tolerance confidence,
Figure BDA00020963013100000213
-satisfy expected yield at confidence of β, q t,ξ ξ scene, the electricity generator obtained in the lower layer planning draws the electricity quantity in the time period t,
Figure BDA00020963013100000220
and in xi scene, the new energy subscription compensation (lower layer decision output) of the unit electric quantity user obtained by the lower layer decision, c base -the cost per unit of electricity generation,
Figure BDA00020963013100000215
the power generator is xi and
Figure BDA00020963013100000216
penalty fines in the scene, gamma-unit fines for outstanding electricity,
Figure BDA00020963013100000217
moment-t, over-charging in xi scene
Figure BDA0002096301310000031
The unbalanced electric quantity of the maximum output under the scene,
Figure BDA0002096301310000032
-at
Figure BDA0002096301310000033
And in the scene, at the moment T, the actual output upper limit of the distributed power supply is T-one time period, and the default value is one hour.
Preferably, the lower-layer planning modeling is used for optimizing scheduling and allocating bid amount in each power generator aiming at a bidding scenario and with market operation comprehensive benefits as a target, and the mathematical expression of the lower-layer planning is as follows:
Figure BDA0002096301310000034
constraint function:
Figure BDA0002096301310000035
Figure BDA0002096301310000036
Figure BDA0002096301310000037
Figure BDA0002096301310000038
Figure BDA0002096301310000039
Figure BDA00020963013100000310
Figure BDA00020963013100000311
Figure BDA00020963013100000312
Figure BDA00020963013100000313
Figure BDA00020963013100000314
Figure BDA00020963013100000315
in the formula: n is a radical of pv 、N wp -the total number of photovoltaic and wind power generators in the area, L-the total number of power consumers in the area,
Figure BDA0002096301310000041
-unit cost of purchasing electricity from the external grid at time t,
Figure BDA0002096301310000042
the electricity purchasing cost from No. i photovoltaic and wind power generators at the moment t,
Figure BDA0002096301310000043
-t time point purchasing electric power from the external grid,
Figure BDA0002096301310000044
purchasing electricity from No. i photovoltaic and wind power generators at the moment t,
Figure BDA0002096301310000045
-load of No. i electricity consumer at time t, comp pv 、comp wp -per-degree electricity subscription compensation paid in renewable energy sources such as photovoltaic, wind power and the like within the user subscription range, Q load-pv-i 、Q load-wp-i I number user settles the charge-receivable photovoltaic and wind power subscription electric quantity on the same day, Q pv 、Q wp 、Q grid -photovoltaic, wind-electric, external electric quantity, upsilon, consumed in the area of the day pv 、υ wp -ratio of photovoltaic to wind power generation in the region of the day, α i 、β i -the photovoltaic and wind power ratio subscribed by the i-th user,
Figure BDA0002096301310000046
and (4) the maximum generating capacity at the time t reported by the No. i photovoltaic and wind power generator.
Preferably, in the step 2), a plurality of agents are utilized to respectively process the randomness problem of the upper-layer planning modeling and the lower-layer planning modeling and the mutual iteration of the upper-layer planning modeling and the lower-layer planning modeling; the double-layer collaborative reinforcement algorithm introduces a diffusion strategy in the reinforcement learning process, and introduces an adaptive combination (ATC) mechanism into the reinforcement learning algorithm, and the collaborative reinforcement learning algorithm can adapt to randomness and uncertainty caused by distributed renewable energy sources and can adapt to the problem of complex calculation of a double-layer random decision optimization model; in addition, to avoid the storage of a large number of Q value tables, a function approximator is used to record the Q values of complex continuous states and motion spaces.
Preferably, the diffusion strategy can achieve faster convergence and can achieve a lower mean square deviation than a uniform strategy, which is as follows:
Figure BDA0002096301310000047
Figure BDA0002096301310000048
wherein
Figure BDA0002096301310000049
Is an intermediate term, x, introduced by the diffusion strategy i (k +1) is the state updated by combining all intermediate terms of agent i; n is a radical of i Is a set of points adjacent to agent i; b ij Is the weight assigned by agent i to neighboring agent j; defining a matrix B ═ B ij ]∈R n×n As a topological matrix of the microgrid communication network; the topology matrix B is a random matrix, B1 n =1 n In which 1 is n ∈R n Is a unit vector.
Has the advantages that:
1. the double-layer decision optimization model established by the invention can comprehensively consider the uncertainty situation caused by the random variables and make better decisions. It is therefore well suited for optimization decisions for distributed generators.
2. The algorithm provided by the invention is a double-layer collaborative reinforcement learning algorithm, can be well integrated into a two-layer random decision optimization model, and provides a new idea for intensive energy trading decision of a future information network and an energy network.
3. The invention introduces a plurality of agents to respectively process the randomness problem of the upper and lower layers of planning and the mutual iteration of the upper and lower layers, so that the collaborative reinforcement learning algorithm is more suitable for the problem of the double-layer planning.
4. The multi-agent double-layer collaborative reinforcement learning is used as a multi-agent reinforcement learning algorithm with self-learning and collaborative learning capabilities, and is more suitable for solving the large-scale distributed access energy problem with strong randomness and uncertainty. After certain training and updating, the algorithm can quickly carry out dynamic optimization, and meanwhile, the stability of global convergence is guaranteed.
5. A diffusion strategy is introduced in the reinforcement learning process, so that distributed information exchange can be realized in the microgrid, the calculation cost is reduced, faster convergence can be realized, and the mean square deviation lower than that of a consistent strategy can be achieved.
Drawings
FIG. 1 is an overall frame diagram of the present invention;
FIG. 2 is a flow chart of multi-agent dual-tier collaborative reinforcement learning according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
The distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning disclosed by the invention takes a power distribution network as a medium, simultaneously schedules a distributed power supply and a controllable load, and realizes economic benefit optimization, and the optimization object and model of the method are schematically shown in figure 1.
The invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, which comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism;
step 3) solving an estimation value of an optimal Q value function by using an iterative calculation method on the basis of the frame in the step 2);
and 4) solving an optimization model by using the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation.
The double-layer random decision optimization model for the distributed renewable energy transaction in the step 1) comprises an upper-layer planning modeling and a lower-layer planning modeling, which respectively correspond to two parts of an energy transaction link
In the step 2), a plurality of agents are used for respectively processing the randomness problem of the upper-layer planning modeling and the lower-layer planning modeling and the mutual iteration of the upper-layer planning modeling and the lower-layer planning modeling; the double-layer collaborative reinforcement algorithm introduces a diffusion strategy in the reinforcement learning process, and introduces an adaptive combination (ATC) mechanism into the reinforcement learning algorithm, and the collaborative reinforcement learning algorithm can adapt to randomness and uncertainty caused by distributed renewable energy sources and can adapt to the problem of complex calculation of a double-layer random decision optimization model; to avoid the storage of a large number of Q-value tables, a function approximator is used to record the Q-values of complex continuous states and motion spaces.
The step 3) iterative computation flow comprises the following steps (see fig. 2):
s1 initialization theta 0 ,ω 0
S2, repeating the times k is 1to T
S3, each agent calculates i-1 to n in turn
S4 calculating the feature vector
Figure BDA0002096301310000061
And state s i (k)
S5 selecting action a according to strategy pi i (k)
S6 observing the prize value r i (k)
TD error delta S7 i (k)
S8 estimation
Figure BDA0002096301310000062
S9, updating the parameter theta i (k),ω i (k)
S10 Return to S3
S11 Return to S2
S12: and returning the result.
The basic steps and explanation of the application of the distributed renewable energy framework of multi-agent double-layer collaborative reinforcement learning are as follows:
a1: decomposing and writing the target function and the constraint function of the upper and lower layer plans into respective rewards of a reinforcement learning algorithm to serve as reference values of rewards, wherein the target function of the upper layer plan is expected to be the maximum and is set as forward rewards, the target function of the lower layer plan is expected to be the lowest in price and is set as reverse rewards, the constraint conditions of the upper and lower layer plans are used as penalty items, coefficients are set according to actual debugging conditions, the requirement is that the penalty coefficient of the strong constraint is far greater than the Reward item coefficient, and the weak constraint is greater than the Reward item coefficient.
A2: the method comprises the steps of constructing a first reinforcement learning module which is essentially a combination of two (usually a plurality of) reinforcement learning intelligent agents, establishing a reinforcement learning intelligent agent unit by taking a lower-layer plan as a module, establishing a reinforcement learning intelligent agent unit by taking each power generator as a module at an upper layer due to a plurality of power generators, and finally integrating the intelligent agent unit at the upper layer and the intelligent agent unit at the lower layer through a whole intelligent agent unit, wherein as shown in an intelligent agent II in figure 1, the Reward structure of the intelligent agent II is that the maximum total Reward of each intelligent agent unit is the maximum target.
A3: and establishing a function approximator. The storage of the Q value occupies a large amount of resources of the computer, so as to reduce the occupation of the computer resources and increase the calculation speed.
A4: establishing a cooperative reinforcement learning working mechanism, and establishing a parameter updating process of integrating an adaptive combination (ATC) diffusion strategy into Greedy-GQ in order to accelerate the calculation efficiency of a multi-agent.
A5: and constructing a second reinforcement learning module, taking the agent II as an environment of the agent, and establishing an updating strategy by using a conventional Q learning (or Sarsar, DQN and the like) updating rule.
Modeling upper layer planning:
and the opportunity constraint planning of the optimistic value of the maximized objective function constructed in the upper-layer planning aims at maximizing economic benefit, and the constraint condition consists of objective constraint limit and opportunity constraint limit. Moreover, the upper layer optimization aims at an optimistic value of the economic benefit (i.e. the economic benefit obtained is better than the value at a certain confidence) to minimize the operation cost of the distribution network. The objective constraint limits are constraint conditions aiming at the deterministic objects and comprise unit power generation cost, unit unfinished power generation amount fine, upper and lower limits of actual processing of the distributed power supply and the like. The opportunity constraint limit is a constraint condition aiming at the distribution network uncertainty object, and comprises a probability constraint for bearing risk execution degree, a power flow safety limit and the like. Sources of uncertainty factors include uncertainty of distributed photovoltaic, wind power output, generator bid, uncertainty of traditional load forecast deviation, and the like.
Therefore, the mathematical expression of the upper-level planning modeling is as follows:
Figure BDA0002096301310000071
constraint function:
Figure BDA0002096301310000072
Figure BDA0002096301310000073
Figure BDA0002096301310000074
Figure BDA0002096301310000075
in the formula
lambda-Power Generation trade time-of-sale quotes, where lambda t Is quoted at time t
Xi-random variable caused by unknown quote of bidder
Figure BDA0002096301310000076
Random variable caused by uncertainty of deviation between wind power and photovoltaic true value and predicted value
Figure BDA0002096301310000077
When the quote is lambda, in xi and
Figure BDA0002096301310000078
revenue of power generator under scene
Beta-bearing risk confidence
Figure BDA0002096301310000081
Satisfying expected yield at beta confidence
q t,ξ In xi scene, the power marking amount of the power generator obtained in the lower layer planning in the time period t
Figure BDA0002096301310000082
Xi scene, unit electric quantity user new energy subscription compensation obtained by lower layer decision (lower layer decision output)
c base -cost per unit of electricity generation
Figure BDA0002096301310000083
-the generator is xi and
Figure BDA0002096301310000084
penalty fines under scene
Gamma-unit fine of unfinished electricity
Figure BDA0002096301310000085
Moment t, bid amount exceedes in xi scene
Figure BDA0002096301310000086
Unbalanced electric quantity of maximum output under scene
Figure BDA0002096301310000087
-at
Figure BDA0002096301310000088
Actual output upper limit of distributed power supply at time t under scene
T-a period of time, with a default value of one hour.
Modeling of a lower layer plan:
and the lower-layer planning optimizes the scheduling and the allocation of the right to bid of each power plant by taking the comprehensive benefits of market operation as a target. The lower level programming model is actually a market-balanced scheduling model for the regional retail market. The accuracy of the model determines whether the regional market can function properly according to the rules. Due to the neglect of energy storage, the electricity purchase sources in the region include both the distributed generator and the external grid, and the sum of the electricity purchase cost of each period constitutes the cost source of the system. In addition, considering that the user is willing to pay a certain cost to order new energy and enjoy green power, this user group may also be included in the overall benefit. Therefore, the optimal goal can be to minimize the cost of electricity purchase and increase the subscription fees for green electricity.
Therefore, the mathematical expression of the underlying plan modeling is as follows:
Figure BDA0002096301310000089
constraint function:
Figure BDA00020963013100000810
Figure BDA00020963013100000811
Figure BDA0002096301310000091
Figure BDA0002096301310000092
Figure BDA0002096301310000093
Figure BDA0002096301310000094
Figure BDA0002096301310000095
Figure BDA0002096301310000096
Figure BDA0002096301310000097
Figure BDA0002096301310000098
Figure BDA0002096301310000099
in the formula:
N pv 、N wp -total number of photovoltaic and wind power generation suppliers in area
L-total number of power consumers in area
Figure BDA00020963013100000910
-unit cost of purchasing electricity from external grid at time t
Figure BDA00020963013100000911
-the electricity purchasing cost from No. i photovoltaic and wind power generator at the moment t
Figure BDA00020963013100000912
-purchasing electric power from external grid at time t
Figure BDA00020963013100000913
-purchasing electricity from No. i photovoltaic and wind power generator at time t
Figure BDA00020963013100000914
Load capacity of No. i power consumer at time t
comp pv 、comp wp The subscription range of the user is per-degree electricity subscription compensation paid in renewable energy sources such as photovoltaic energy, wind power energy and the like
Q load-pv-i 、Q load-wp-i The photovoltaic and wind power subscription electric quantity which is paid by the i-number user in the current day settlement
Q pv 、Q wp 、Q grid -photovoltaic, wind power, external electrical quantities consumed within the area of the day
υ pv 、υ wp -ratio of photovoltaic to wind power generation in the region of the same day
α i 、β i -photovoltaic and wind power ratio subscribed by No. i user
Figure BDA0002096301310000101
-maximum generated energy at time t reported by No. i photovoltaic and wind power generator
A function approximator:
the function approximator estimates the Q value using a series of adjustable parameters and features extracted from the state action space. The approximator then builds a mapping of the state contribution from the parameter space to the Q-value function to the space. The mapping may be linear or non-linear. Solvability may be analyzed using linear mapping. A typical form of a linear approximator is as follows:
Figure BDA0002096301310000102
wherein
Figure BDA0002096301310000103
Is an adjustable approximate parameter vector and is characterized in that,
Figure BDA0002096301310000104
is the feature vector of the state-action pair, which can be derived from the following equation:
Figure BDA0002096301310000105
wherein
Figure BDA0002096301310000106
Is a Basis Function (BF), such as gaussian radial BF, centered at a selected motionless point in the state space. Typically, the BFs sets corresponding to fixed points are evenly distributed in the state space. Herein, all vectors are considered to be column vectors if not specified. (.) T Representing a matrix transpose operation. Radial basis function neural networks have been used in random nonlinear interconnect systems and have been shown to have good generalization performance.
Diffusion strategy:
the reinforcement learning algorithm introduces a diffusion strategy in the reinforcement learning process, and introduces an adaptive combination (ATC) mechanism into the reinforcement learning algorithm. The diffusion strategy may achieve faster convergence and may achieve a lower mean square deviation than the uniform strategy. Furthermore, the flooding strategy has better response performance to continuous real-time signals and is insensitive to neighboring weights. The basic idea of the flooding strategy is to combine collaboration items based on neighboring states during the self-state update process of each agent. Consider having state x i And its dynamic characteristics.
x i (k+1)=x i (k)+f(x i (k))
The diffusion strategy is as follows:
Figure BDA0002096301310000107
Figure BDA0002096301310000111
wherein
Figure BDA0002096301310000112
Is an intermediate term, x, introduced by the diffusion strategy i (k +1) is the state updated by combining all the intermediate terms of agent i. N is a radical of i Is a set of points adjacent to agent i. In addition, b ij Is the weight assigned by agent i to neighboring agent j. Here, we can define a matrix B ═ B ij ]∈R n×n As a topology matrix of the microgrid communication network. In general, the topology matrix B is a random matrix, which means B1 n =1 n In which 1 is n ∈R n Is a unit vector.
By integrating an adaptive combination (ATC) diffusion strategy into the parameter updating process of Greedy-GQ, a cooperative reinforcement learning algorithm is provided.
Figure BDA0002096301310000113
Figure BDA0002096301310000114
Figure BDA0002096301310000115
Figure BDA0002096301310000116
It is noted that the proposed cooperative reinforcement learning algorithm introduces two intermediate vectors:
Figure BDA0002096301310000117
and
Figure BDA0002096301310000118
actual approximation parameter vector θ i (k +1) and correction parameter vector ω i (k +1) is the combination of the intermediate vectors of the neighboring agents. In the proposed algorithm, the learning rate parameters α (k) and β (k) can be set with the conditions P (1) to P (4).
α(k)>0,β(k)>0 P(1)
Figure BDA0002096301310000119
Figure BDA00020963013100001110
α(k)/β(k)→0 P(4)
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (3)

1. A distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning is characterized by comprising the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy transactions, wherein the double-layer random decision optimization model of distributed renewable energy transactions in the step 1) comprises an upper-layer planning modeling and a lower-layer planning modeling, and respectively corresponds to two parts of an energy transaction link;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establishing a function approximator; the function approximator adopts a series of adjustable parameters and characteristics extracted from a state action space to estimate a Q value, the approximator establishes mapping of state action of a parameter space to a Q value function to the space, the mapping is linear or nonlinear, solvability is analyzed by utilizing linear mapping, and the typical form of the function approximator is as follows:
Figure FDA0003699062350000011
Figure FDA0003699062350000012
wherein
Figure FDA0003699062350000013
Is an adjustable approximate parameter vector and is characterized in that,
Figure FDA0003699062350000014
is the feature vector of the state-action pair,
Figure FDA0003699062350000015
is the Basis Function (BF); (.) T Representing a matrix transpose operation;
step 3) solving an estimated value of an optimal Q value function by using an iterative computation method on the basis of the frame in the step 2);
step 4) solving an optimization model by using a trained multi-agent double-layer collaborative reinforcement learning algorithm to complete optimization calculation; the opportunity constraint planning of the optimistic value of the maximized objective function constructed in the upper-layer planning modeling has the optimization target of the maximum economic benefit, the constraint condition consists of objective constraint limit and opportunity constraint limit, and the mathematical expression of the upper-layer planning modeling is as follows:
Figure FDA0003699062350000016
constraint function:
Figure FDA0003699062350000017
Figure FDA0003699062350000018
Figure FDA0003699062350000019
Figure FDA0003699062350000021
wherein λ -power generation trade time-sharing quotation, wherein λ t Is the quote at time t, ξ -a random variable caused by the bid being unknown to the bidder,
Figure FDA0003699062350000022
random variables caused by uncertainty of deviation of real values and predicted values of wind power and photovoltaic,
Figure FDA0003699062350000023
when the quoted price is lambda, in xi and
Figure FDA0003699062350000024
generator revenue under the scenario, beta-risk tolerance confidence,
Figure FDA0003699062350000025
-satisfy expected yield at confidence of β, q t Xi scene, the electric power quantity of the generator obtained in the lower layer planning in the time period t, c s ξ And c, under xi scene, the new energy subscription compensation of unit electric quantity users obtained by lower layer decision base -the cost per unit of electricity generation,
Figure FDA0003699062350000026
the power generator is xi and
Figure FDA0003699062350000027
penalty fines in the scene, gamma-unit fines for outstanding electricity,
Figure FDA0003699062350000028
moment-t, over-rated electric quantity in xi scene
Figure FDA0003699062350000029
The unbalanced electric quantity of the maximum output under the scene,
Figure FDA00036990623500000210
-at
Figure FDA00036990623500000211
In a scene, at the moment T, the actual output upper limit of the distributed power supply is T-one time period, and the default value is one hour;
the lower-layer planning modeling is used for optimizing scheduling and distributing bid amount of each power generator aiming at bidding scenes and taking market operation comprehensive benefits as targets, and the mathematical expression of the lower-layer planning is as follows:
Figure FDA00036990623500000212
constraint function:
Figure FDA00036990623500000213
Figure FDA00036990623500000214
Figure FDA00036990623500000215
Figure FDA00036990623500000216
Figure FDA00036990623500000217
Figure FDA00036990623500000218
Figure FDA00036990623500000219
Figure FDA0003699062350000031
Figure FDA0003699062350000032
Figure FDA0003699062350000033
Figure FDA0003699062350000034
in the formula: n is a radical of pv 、N wp -the total number of photovoltaic and wind power generators in the area, L-the total number of power consumers in the area,
Figure FDA0003699062350000035
-unit cost of purchasing electricity from the external grid at time t,
Figure FDA0003699062350000036
the electricity purchasing cost from No. i photovoltaic and wind power generators at the moment t,
Figure FDA0003699062350000037
-t time point purchasing electric power from the external grid,
Figure FDA0003699062350000038
purchasing electricity from No. i photovoltaic and wind power generators at the moment t,
Figure FDA0003699062350000039
-load of No. i electricity consumer at time t, comp pv 、comp wp -per degree electricity subscription compensation paid in photovoltaic, wind power renewable energy, Q, user subscription scope load-pv-i 、Q load-wp-i I number user settles the charge-receivable photovoltaic and wind power subscription electric quantity on the same day, Q pv 、Q wp 、Q grid -photovoltaic, wind-electric, external electric quantity, upsilon, consumed in the area of the day pv 、υ wp -ratio of photovoltaic to wind power generation in the region of the day, α i 、β i -the photovoltaic and wind power ratio subscribed by the i-th user,
Figure FDA00036990623500000310
and (4) the maximum generating capacity at the time t reported by the No. i photovoltaic and wind power generator.
2. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: in the step 2), a plurality of agents are used for respectively processing the randomness problem of the upper-layer planning modeling and the lower-layer planning modeling and the mutual iteration of the upper-layer planning modeling and the lower-layer planning modeling; the double-layer collaborative reinforcement learning algorithm introduces a diffusion strategy in the reinforcement learning process, and introduces a self-adaptive combined ATC mechanism into the reinforcement learning algorithm, and the double-layer collaborative reinforcement learning algorithm can adapt to randomness and uncertainty caused by distributed renewable energy sources and can adapt to the problem of complex calculation of a double-layer random decision optimization model; to avoid the storage of a large number of Q-value tables, a function approximator is used to record the Q-values of complex continuous states and motion spaces.
3. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that: the diffusion strategy can achieve faster convergence and can achieve lower mean square deviation than the uniform strategy, and the diffusion strategy is as follows:
Figure FDA00036990623500000311
Figure FDA0003699062350000041
wherein,
Figure FDA0003699062350000042
is an intermediate term, x, introduced by the diffusion strategy i (k +1) is the state updated by combining all intermediate terms of agent i; n is a radical of i Is a set of points adjacent to agent i; b ij Is the weight assigned by agent i to neighboring agent j; here, a matrix B ═ B is defined ij ]∈R n×n As a topological matrix of the microgrid communication network; the topology matrix B is a random matrix, B1 n =1 n In which 1 is n ∈R n Is a unit vector.
CN201910519858.1A 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning Active CN110276698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910519858.1A CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910519858.1A CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Publications (2)

Publication Number Publication Date
CN110276698A CN110276698A (en) 2019-09-24
CN110276698B true CN110276698B (en) 2022-08-02

Family

ID=67960916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910519858.1A Active CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Country Status (1)

Country Link
CN (1) CN110276698B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990793B (en) * 2019-12-07 2024-03-15 国家电网有限公司 Scheduling optimization method for electric heating gas coupling micro energy station
CN111064229B (en) * 2019-12-18 2023-04-07 广东工业大学 Wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning
CN111200285B (en) * 2020-02-12 2023-12-19 燕山大学 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
CN112612206B (en) * 2020-11-27 2022-11-08 合肥工业大学 Multi-agent collaborative decision-making method and system for uncertain events
CN112714165B (en) * 2020-12-22 2023-04-04 声耕智能科技(西安)研究院有限公司 Distributed network cooperation strategy optimization method and device based on combination mechanism
CN112859591B (en) * 2020-12-23 2022-10-21 华电电力科学研究院有限公司 Reinforced learning control system for operation optimization of energy system
CN113378456B (en) * 2021-05-21 2023-04-07 青海大学 Multi-park comprehensive energy scheduling method and system
CN113421004B (en) * 2021-06-30 2023-05-26 国网山东省电力公司潍坊供电公司 Transmission and distribution cooperative active power distribution network distributed robust extension planning system and method
CN113555870B (en) * 2021-07-26 2023-10-13 国网江苏省电力有限公司南通供电分公司 Q-learning photovoltaic prediction-based power distribution network multi-time scale optimal scheduling method
CN113780622B (en) * 2021-08-04 2024-03-12 华南理工大学 Multi-agent reinforcement learning-based distributed scheduling method for multi-microgrid power distribution system
CN113743583B (en) * 2021-08-07 2024-02-02 中国航空工业集团公司沈阳飞机设计研究所 Method for inhibiting switching of invalid behaviors of intelligent agent based on reinforcement learning
CN114021815B (en) * 2021-11-04 2023-06-27 东南大学 Scalable energy management collaboration method for community containing large-scale producers and consumers
CN114611813B (en) * 2022-03-21 2022-09-27 特斯联科技集团有限公司 Community hot-cold water circulation optimal scheduling method and system based on hydrogen energy storage
WO2024084125A1 (en) * 2022-10-19 2024-04-25 Aalto University Foundation Sr Trained optimization agent for renewable energy time shifting
CN117559387B (en) * 2023-10-18 2024-06-21 东南大学 VPP internal energy optimization method and system based on deep reinforcement learning dynamic pricing
CN117350515B (en) * 2023-11-21 2024-04-05 安徽大学 Ocean island group energy flow scheduling method based on multi-agent reinforcement learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6820815B2 (en) * 2017-09-07 2021-01-27 株式会社日立製作所 Learning control system and learning control method
CN109325608B (en) * 2018-06-01 2022-04-01 国网上海市电力公司 Distributed power supply optimal configuration method considering energy storage and considering photovoltaic randomness

Also Published As

Publication number Publication date
CN110276698A (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN110276698B (en) Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning
Crespo-Vazquez et al. A community-based energy market design using decentralized decision-making under uncertainty
Adetunji et al. A review of metaheuristic techniques for optimal integration of electrical units in distribution networks
Wang et al. Stochastic cooperative bidding strategy for multiple microgrids with peer-to-peer energy trading
Aghaei et al. Risk-constrained offering strategy for aggregated hybrid power plant including wind power producer and demand response provider
Chen et al. Research on day-ahead transactions between multi-microgrid based on cooperative game model
Varkani et al. A new self-scheduling strategy for integrated operation of wind and pumped-storage power plants in power markets
CN109190802B (en) Multi-microgrid game optimization method based on power generation prediction in cloud energy storage environment
Ghadimi et al. PSO based fuzzy stochastic long-term model for deployment of distributed energy resources in distribution systems with several objectives
Gao et al. A multiagent competitive bidding strategy in a pool-based electricity market with price-maker participants of WPPs and EV aggregators
Adil et al. Energy trading among electric vehicles based on Stackelberg approaches: A review
CN112381263B (en) Block chain-based distributed data storage multi-microgrid pre-day robust electric energy transaction method
CN111030188A (en) Hierarchical control strategy containing distributed and energy storage
CN111082451A (en) Incremental distribution network multi-objective optimization scheduling model based on scene method
CN112001752A (en) Multi-virtual power plant dynamic game transaction behavior analysis method based on limited rationality
CN111311012A (en) Multi-agent-based micro-grid power market double-layer bidding optimization method
CN111553750A (en) Energy storage bidding strategy method considering power price uncertainty and loss cost
Gao et al. Distributed energy trading and scheduling among microgrids via multiagent reinforcement learning
Chuang et al. Deep reinforcement learning based pricing strategy of aggregators considering renewable energy
CN112217195A (en) Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
Liu et al. Research on bidding strategy of thermal power companies in electricity market based on multi-agent deep deterministic policy gradient
CN116207739A (en) Optimal scheduling method and device for power distribution network, computer equipment and storage medium
Gao et al. Bounded rationality based multi-VPP trading in local energy markets: a dynamic game approach with different trading targets
Peng et al. Review on bidding strategies for renewable energy power producers participating in electricity spot markets

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant