CN110276698B - Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning - Google Patents
- Publication number: CN110276698B
- Application number: CN201910519858.1A
- Authority: CN (China)
- Prior art keywords: layer, reinforcement learning, double, agent, photovoltaic
- Prior art date: 2019-06-17
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, comprising the following main steps: 1) constructing a double-layer random decision optimization model of distributed renewable energy trading; 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning and training according to its theoretical framework, and establishing a function approximator and a collaborative reinforcement learning working mechanism; 3) calculating an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2); 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation. The invention accounts for the uncertainty in distributed renewable energy transactions, can improve the risk-aware revenue of generators, and simultaneously maximizes the comprehensive benefit.
Description
Technical Field
The invention relates to the field of intelligent power distribution networks, in particular to a distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning.
Background
With social progress and development, global demand for green, clean and efficient power keeps growing, and more and more distributed renewable energy sources are connected to the power distribution network. Distributed energy sources offer reasonable energy-efficiency utilization, low losses, little pollution, flexible operation and good system economy. Their development, however, still faces problems such as grid connection, power supply quality, capacity storage and fuel supply.
Distributed photovoltaic and wind power generation, while free of fuel costs, carry high construction, operation and maintenance costs. At present, new-energy distributed generators in China profit mainly through electricity-price subsidies from the state and local governments. As distributed power penetration increases, however, this profit model increasingly departs from market laws. Subsidizing distributed generators through user subscription fees can help generators participate in market competition and quote reasonably according to their potential benefits and generation costs, thereby maximizing social benefits. Meanwhile, by considering various uncertain information such as generator quotations, distributed power output fluctuation and user subscriptions, the model can be solved with a multi-agent double-layer collaborative reinforcement learning method, which rapidly computes an optimal scheduling decision, reduces risk and improves economic benefit.
Disclosure of Invention
In order to overcome the defects of existing transaction decision methods, the invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning. It considers a distributed-energy double-layer random planning model under various uncertain information such as generator quotations, distributed power output fluctuation and user subscriptions, and solves the model with a multi-agent double-layer collaborative reinforcement learning method, so that the optimal scheduling decision can be rapidly calculated, risk is reduced and economic benefit is improved.
The invention realizes the purpose through the following technical scheme:
a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning and training according to its theoretical framework, and establishing a function approximator and a collaborative reinforcement learning working mechanism; the function approximator estimates the Q value using a set of adjustable parameters and features extracted from the state-action space, thereby establishing a mapping from the parameter space to the Q-value function over the state-action space; the mapping can be linear or nonlinear, and solvability can be analyzed using a linear mapping; the typical form of the function approximator is

$$\hat{Q}(s,a) = \theta^{T}\phi(s,a)$$

where $\theta$ is the adjustable approximation parameter vector, $\phi(s,a)$ is the feature vector of the state-action pair built from basis functions (BFs), and $(\cdot)^{T}$ denotes matrix transposition;
step 3) solving an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2);
step 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation.
Preferably, the double-layer random decision optimization model for distributed renewable energy transactions in step 1) includes upper-layer planning modeling and lower-layer planning modeling, corresponding respectively to the two stages of the energy trading process.
Preferably, the upper-layer planning modeling constructs a chance-constrained program that maximizes the optimistic value of the objective function; the optimization goal is maximum economic benefit, and the constraints consist of objective constraint limits and chance constraint limits. The mathematical expression of the upper-layer planning modeling is as follows:

Constraint function:

where:
- $\lambda$ — time-of-use quotation of the power generation trade, with $\lambda_t$ the quote at time $t$;
- $\xi$ — random variable arising because rival bids are unknown to the bidder;
- a random variable arising from the deviation of actual wind and photovoltaic output from its predicted value;
- the generator revenue, under the scenario given by the two random variables, when the quote is $\lambda$;
- $\beta$ — risk-tolerance confidence level;
- the expected revenue satisfied at confidence $\beta$;
- $q_{t,\xi}$ — winning-bid electricity of the generator in period $t$ under scenario $\xi$, obtained from the lower-layer planning;
- the per-unit-electricity new-energy subscription compensation to users under scenario $\xi$, obtained from the lower-layer decision (lower-layer output);
- $c_{base}$ — unit power-generation cost;
- the penalty fine of the generator under the scenario given by the two random variables;
- $\gamma$ — unit fine for undelivered electricity;
- the unbalanced electricity at time $t$ when the winning bid under scenario $\xi$ exceeds the maximum available output under the output scenario;
- the actual output upper limit of the distributed power supply at time $t$ under the output scenario;
- $T$ — length of one time period, one hour by default.
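From these definitions, the core of the upper-layer chance-constrained program can be written in the standard optimistic-value form; the expression below is a reconstruction from the stated definitions (writing $\tilde{\xi}$ for the output-deviation random variable, $f$ for the generator revenue and $\bar{f}$ for the optimistic revenue level), not the patent's verbatim formula:

$$\max_{\lambda}\ \bar{f} \quad \text{s.t.} \quad \Pr\big\{ f(\lambda, \xi, \tilde{\xi}) \ge \bar{f} \big\} \ge \beta$$

That is, the quotation $\lambda$ is chosen so that the revenue achieved exceeds $\bar{f}$ with probability at least $\beta$.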
Preferably, the lower-layer planning modeling, for each bidding scenario and with the comprehensive benefit of market operation as its objective, optimizes dispatch and allocates the winning-bid amount among the generators. The mathematical expression of the lower-layer planning is as follows:

Constraint function:

where:
- $N_{pv}$, $N_{wp}$ — total numbers of photovoltaic and wind power generators in the area;
- $L$ — total number of power consumers in the area;
- the unit cost of purchasing electricity from the external grid at time $t$;
- the cost of purchasing electricity from photovoltaic or wind generator No. $i$ at time $t$;
- the electric power purchased from the external grid at time $t$;
- the electric power purchased from photovoltaic or wind generator No. $i$ at time $t$;
- the load of power consumer No. $i$ at time $t$;
- $comp_{pv}$, $comp_{wp}$ — per-kWh subscription compensation paid for photovoltaic, wind and similar renewable energy within the user's subscription scope;
- $Q_{load\text{-}pv\text{-}i}$, $Q_{load\text{-}wp\text{-}i}$ — photovoltaic and wind subscription electricity payable by user No. $i$ in the same-day settlement;
- $Q_{pv}$, $Q_{wp}$, $Q_{grid}$ — photovoltaic, wind and external electricity consumed in the area on the day;
- $\upsilon_{pv}$, $\upsilon_{wp}$ — shares of photovoltaic and wind generation in the area on the day;
- $\alpha_i$, $\beta_i$ — photovoltaic and wind ratios subscribed by user No. $i$;
- the maximum generating capacity at time $t$ reported by photovoltaic or wind generator No. $i$.
Preferably, in step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and their mutual iteration. The double-layer collaborative reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process and incorporates the adapt-then-combine (ATC) mechanism; the collaborative reinforcement learning algorithm can thus accommodate the randomness and uncertainty brought by distributed renewable energy as well as the complex computation of the double-layer random decision optimization model. In addition, to avoid storing large Q-value tables, a function approximator is used to record the Q values of complex continuous state and action spaces.
Preferably, the diffusion strategy achieves faster convergence and a lower mean-square deviation than a consensus strategy. The diffusion strategy is:

$$\psi_i(k+1) = x_i(k) + f\big(x_i(k)\big), \qquad x_i(k+1) = \sum_{j \in N_i} b_{ij}\, \psi_j(k+1),$$

where $\psi_i(k+1)$ is the intermediate term introduced by the diffusion strategy and $x_i(k+1)$ is the state updated by combining all intermediate terms available to agent $i$; $N_i$ is the set of nodes adjacent to agent $i$; $b_{ij}$ is the weight agent $i$ assigns to neighboring agent $j$; the matrix $B = [b_{ij}] \in R^{n \times n}$ is defined as the topology matrix of the microgrid communication network; the topology matrix $B$ is a stochastic matrix, $B \mathbf{1}_n = \mathbf{1}_n$, where $\mathbf{1}_n \in R^n$ is the all-ones vector.
Beneficial effects:
1. The double-layer decision optimization model established by the invention can comprehensively consider the uncertainty brought by the random variables and make better decisions; it is therefore well suited to the optimization decisions of distributed generators.
2. The proposed algorithm, a double-layer collaborative reinforcement learning algorithm, integrates well into the double-layer random decision optimization model and provides a new approach to energy trading decisions for future integrated information and energy networks.
3. The invention introduces multiple agents to handle, respectively, the randomness of the upper- and lower-layer planning and their mutual iteration, making the collaborative reinforcement learning algorithm better suited to the double-layer planning problem.
4. Multi-agent double-layer collaborative reinforcement learning, as a multi-agent reinforcement learning algorithm with self-learning and collaborative-learning capabilities, is well suited to large-scale distributed-energy-access problems with strong randomness and uncertainty. After a certain amount of training and updating, the algorithm can rapidly carry out dynamic optimization while guaranteeing the stability of global convergence.
5. A diffusion strategy is introduced in the reinforcement learning process, enabling distributed information exchange within the microgrid, reducing computation cost, achieving faster convergence and reaching a mean-square deviation lower than that of a consensus strategy.
Drawings
FIG. 1 is an overall framework diagram of the present invention;
FIG. 2 is a flow chart of multi-agent dual-tier collaborative reinforcement learning according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
The distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning disclosed by the invention takes the power distribution network as a medium, simultaneously schedules distributed power supplies and controllable loads, and realizes economic-benefit optimization; the optimization objects and model of the method are shown schematically in Figure 1.
The invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, which comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism;
step 3) solving an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2);
step 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation.
The double-layer random decision optimization model for the distributed renewable energy transaction in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, corresponding respectively to the two stages of the energy trading process.
In step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and their mutual iteration. The double-layer collaborative reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process and incorporates the adapt-then-combine (ATC) mechanism; the collaborative reinforcement learning algorithm can thus accommodate the randomness and uncertainty brought by distributed renewable energy as well as the complex computation of the double-layer random decision optimization model. To avoid storing large Q-value tables, a function approximator is used to record the Q values of complex continuous state and action spaces.
The iterative computation flow of step 3) comprises the following steps (see FIG. 2):
S1: initialize $\theta_0$, $\omega_0$
S2: repeat for $k = 1$ to $T$
S3: each agent computes in turn, $i = 1$ to $n$
S5: select action $a_i(k)$ according to policy $\pi$
S6: observe the reward value $r_i(k)$
S7: compute the TD error $\delta_i(k)$
S9: update the parameters $\theta_i(k)$, $\omega_i(k)$
S10: return to S3
S11: return to S2
S12: return the result.
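The loop below is a minimal runnable sketch of this flow for a single layer of agents, combining a linear Q-value approximator with Greedy-GQ-style two-timescale updates and the ATC diffusion combine step; the toy environment, feature construction, topology and hyper-parameters are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, n_feat, gamma = 3, 4, 16, 0.95

# Ring-topology combination matrix B (row-stochastic, so B @ 1 = 1).
B = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        B[i, j % n_agents] = 1.0 / 3.0

centers = rng.normal(size=(n_feat, 2))  # Gaussian RBF centers over a 2-D state

def phi(state, action):
    """Feature vector of a state-action pair: Gaussian RBFs of the state,
    placed in the block corresponding to the chosen action."""
    rbf = np.exp(-np.sum((centers - state) ** 2, axis=1))
    feat = np.zeros(n_feat * n_actions)
    feat[action * n_feat:(action + 1) * n_feat] = rbf
    return feat

def greedy_action(theta_i, state):
    return int(np.argmax([theta_i @ phi(state, a) for a in range(n_actions)]))

theta = np.zeros((n_agents, n_feat * n_actions))  # S1: initialize theta_0, omega_0
omega = np.zeros_like(theta)
states = rng.normal(size=(n_agents, 2))

for k in range(1, 501):                           # S2: repeat for k = 1..T
    alpha, beta = k ** -0.9, k ** -0.6            # two-timescale step sizes
    psi_theta, psi_omega = theta.copy(), omega.copy()
    for i in range(n_agents):                     # S3: each agent in turn
        a = (greedy_action(theta[i], states[i])   # S5: epsilon-greedy action
             if rng.random() > 0.1 else int(rng.integers(n_actions)))
        next_state = states[i] + rng.normal(scale=0.1, size=2)
        r = -float(np.sum(next_state ** 2))       # S6: observe the reward
        f = phi(states[i], a)
        f_next = phi(next_state, greedy_action(theta[i], next_state))
        delta = r + gamma * theta[i] @ f_next - theta[i] @ f  # S7: TD error
        # Adapt step (Greedy-GQ-style) producing the intermediate vectors
        psi_theta[i] = theta[i] + alpha * (delta * f - gamma * (omega[i] @ f) * f_next)
        psi_omega[i] = omega[i] + beta * (delta - omega[i] @ f) * f
        states[i] = next_state
    theta, omega = B @ psi_theta, B @ psi_omega   # S9: ATC combine step
print("parameter norm per agent:", np.linalg.norm(theta, axis=1))
```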
The basic steps of applying the multi-agent double-layer collaborative reinforcement learning framework to distributed renewable energy, with explanations, are as follows:
A1: the objective functions and constraint functions of the upper- and lower-layer plans are decomposed and written into the respective rewards of the reinforcement learning algorithm as reward reference values: the upper-layer objective function is to be maximized and is set as a positive reward; the lower-layer objective (price) is to be minimized and is set as a negative reward; the constraints of both layers serve as penalty terms whose coefficients are set according to actual tuning, with the requirement that the penalty coefficients of strong constraints be far greater than the reward-term coefficients and those of weak constraints greater than the reward-term coefficients (see the sketch after this list).
A2: the method comprises the steps of constructing a first reinforcement learning module which is essentially a combination of two (usually a plurality of) reinforcement learning intelligent agents, establishing a reinforcement learning intelligent agent unit by taking a lower-layer plan as a module, establishing a reinforcement learning intelligent agent unit by taking each power generator as a module at an upper layer due to a plurality of power generators, and finally integrating the intelligent agent unit at the upper layer and the intelligent agent unit at the lower layer through a whole intelligent agent unit, wherein as shown in an intelligent agent II in figure 1, the Reward structure of the intelligent agent II is that the maximum total Reward of each intelligent agent unit is the maximum target.
A3: and establishing a function approximator. The storage of the Q value occupies a large amount of resources of the computer, so as to reduce the occupation of the computer resources and increase the calculation speed.
A4: establishing a cooperative reinforcement learning working mechanism, and establishing a parameter updating process of integrating an adaptive combination (ATC) diffusion strategy into Greedy-GQ in order to accelerate the calculation efficiency of a multi-agent.
A5: and constructing a second reinforcement learning module, taking the agent II as an environment of the agent, and establishing an updating strategy by using a conventional Q learning (or Sarsar, DQN and the like) updating rule.
Upper-layer planning modeling:
The chance-constrained program constructed in the upper-layer planning maximizes the optimistic value of the objective function, targeting maximum economic benefit, with constraints composed of objective constraint limits and chance constraint limits. The upper-layer optimization targets an optimistic value of the economic benefit (i.e., the economic benefit achieved is better than this value at a given confidence) while minimizing the operating cost of the distribution network. The objective constraint limits are constraints on deterministic objects, including the unit power-generation cost, the unit fine for undelivered electricity, and the upper and lower limits of the actual output of the distributed power supply. The chance constraint limits are constraints on the uncertain objects of the distribution network, including probabilistic constraints on the degree of risk borne and power-flow security limits. Sources of uncertainty include distributed photovoltaic and wind power output, generator bids, and the deviation of conventional load forecasts.
Therefore, the mathematical expression of the upper-level planning modeling is as follows:
constraint function:
where:
- $\lambda$ — time-of-use quotation of the power generation trade, with $\lambda_t$ the quote at time $t$;
- $\xi$ — random variable caused by rival bids being unknown to the bidder;
- a random variable caused by the deviation of actual wind and photovoltaic output from its predicted value;
- $\beta$ — risk-tolerance confidence level;
- $q_{t,\xi}$ — winning-bid electricity of the generator in period $t$ under scenario $\xi$, obtained from the lower-layer planning;
- the per-unit-electricity new-energy subscription compensation to users under scenario $\xi$, obtained from the lower-layer decision (lower-layer output);
- $c_{base}$ — unit power-generation cost;
- $\gamma$ — unit fine for undelivered electricity;
- $T$ — length of one time period, one hour by default.
Lower-layer planning modeling:
The lower-layer planning optimizes dispatch and the allocation of winning bids among the generators, targeting the comprehensive benefit of market operation. The lower-layer planning model is in fact a market-clearing dispatch model for the regional retail market, and its accuracy determines whether the regional market can operate properly according to the rules. Since energy storage is neglected, electricity in the region is purchased both from the distributed generators and from the external grid, and the sum of the purchase costs over all periods constitutes the system's cost. In addition, since users are willing to pay a certain cost to subscribe to new energy and enjoy green power, this user group can also be included in the overall benefit. The optimization goal is therefore to minimize the electricity-purchase cost while increasing the subscription fees collected for green electricity.
Therefore, the mathematical expression of the underlying plan modeling is as follows:
constraint function:
where:
- $N_{pv}$, $N_{wp}$ — total numbers of photovoltaic and wind power generators in the area;
- $L$ — total number of power consumers in the area;
- $comp_{pv}$, $comp_{wp}$ — per-kWh subscription compensation paid for photovoltaic, wind and similar renewable energy within the user's subscription scope;
- $Q_{load\text{-}pv\text{-}i}$, $Q_{load\text{-}wp\text{-}i}$ — photovoltaic and wind subscription electricity payable by user No. $i$ in the same-day settlement;
- $Q_{pv}$, $Q_{wp}$, $Q_{grid}$ — photovoltaic, wind and external electricity consumed in the area on the day;
- $\upsilon_{pv}$, $\upsilon_{wp}$ — shares of photovoltaic and wind generation in the area on the day;
- $\alpha_i$, $\beta_i$ — photovoltaic and wind ratios subscribed by user No. $i$.
A function approximator:
The function approximator estimates the Q value using a set of adjustable parameters and features extracted from the state-action space, thereby establishing a mapping from the parameter space to the Q-value function over the state-action space. The mapping may be linear or nonlinear; solvability can be analyzed using a linear mapping. A typical linear approximator is

$$\hat{Q}(s,a) = \theta^{T}\phi(s,a)$$

where $\theta$ is the adjustable approximation parameter vector and $\phi(s,a)$ is the feature vector of the state-action pair, built from basis functions (BFs) such as Gaussian radial BFs centered at selected fixed points in the state space. Typically, the sets of BFs corresponding to the fixed points are evenly distributed over the state space. Herein, all vectors are column vectors unless otherwise specified, and $(\cdot)^{T}$ denotes matrix transposition. Radial-basis-function neural networks have been used in stochastic nonlinear interconnected systems and have shown good generalization performance.
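A minimal sketch of such a linear approximator with Gaussian radial BFs over a one-dimensional state (the centers, width and dimensions are illustrative assumptions):

```python
import numpy as np

centers = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # fixed points, evenly spread

def gaussian_rbf(state, width=0.25):
    """phi(s): Gaussian radial basis functions centered at the fixed points."""
    s = np.atleast_1d(state)
    return np.exp(-np.sum((centers - s) ** 2, axis=1) / (2 * width ** 2))

theta = np.zeros(len(centers))        # adjustable approximation parameter vector
q_hat = theta @ gaussian_rbf(0.3)     # Q-hat = theta^T phi for one action block
print(q_hat)                          # 0.0 before any learning
```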
Diffusion strategy:
The reinforcement learning algorithm introduces a diffusion strategy into the learning process by incorporating the adapt-then-combine (ATC) mechanism. The diffusion strategy achieves faster convergence and a lower mean-square deviation than a consensus strategy; moreover, it responds better to continuous real-time signals and is insensitive to the neighbor weights. Its basic idea is that, during each agent's self-state update, cooperation terms based on the neighbors' states are combined. Consider an agent with state $x_i$ and dynamics

$$x_i(k+1) = x_i(k) + f\big(x_i(k)\big).$$

Under the diffusion strategy this becomes

$$\psi_i(k+1) = x_i(k) + f\big(x_i(k)\big), \qquad x_i(k+1) = \sum_{j \in N_i} b_{ij}\, \psi_j(k+1),$$

where $\psi_i(k+1)$ is the intermediate term introduced by the diffusion strategy and $x_i(k+1)$ is the state updated by combining all intermediate terms available to agent $i$; $N_i$ is the set of nodes adjacent to agent $i$, and $b_{ij}$ is the weight agent $i$ assigns to neighboring agent $j$. The matrix $B = [b_{ij}] \in R^{n \times n}$ is defined as the topology matrix of the microgrid communication network. In general, the topology matrix $B$ is a stochastic matrix, meaning $B \mathbf{1}_n = \mathbf{1}_n$, where $\mathbf{1}_n \in R^n$ is the all-ones vector.
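For illustration, a hypothetical three-agent weight matrix satisfying the row-stochastic condition $B\mathbf{1}_n = \mathbf{1}_n$, together with one combine step:

```python
import numpy as np

B = np.array([[0.50, 0.25, 0.25],     # b_ij: weight agent i assigns to neighbor j
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
assert np.allclose(B @ np.ones(3), np.ones(3))      # row sums equal 1

psi = np.array([[1.0, 2.0],           # intermediate terms psi_j(k+1), one row each
                [3.0, 4.0],
                [5.0, 6.0]])
x_next = B @ psi                      # combine: x_i(k+1) = sum_j b_ij psi_j(k+1)
```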
By integrating the adapt-then-combine (ATC) diffusion strategy into the parameter-update process of Greedy-GQ, a collaborative reinforcement learning algorithm is obtained.
It is noted that the proposed collaborative reinforcement learning algorithm introduces two intermediate vectors, one for the approximation parameters and one for the correction parameters; the actual approximation parameter vector $\theta_i(k+1)$ and correction parameter vector $\omega_i(k+1)$ are combinations of the neighboring agents' intermediate vectors. In the proposed algorithm, the learning-rate parameters $\alpha(k)$ and $\beta(k)$ are set to satisfy conditions P(1) through P(4), including:

$$\alpha(k) > 0, \quad \beta(k) > 0 \qquad \text{P(1)}$$

$$\alpha(k)/\beta(k) \to 0 \qquad \text{P(4)}$$
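As a hypothetical illustration (not taken from the patent), polynomially decaying schedules satisfy P(1) and P(4):

$$\alpha(k) = k^{-0.9}, \qquad \beta(k) = k^{-0.6}, \qquad \frac{\alpha(k)}{\beta(k)} = k^{-0.3} \to 0.$$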
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.
Claims (3)
1. A distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning is characterized by comprising the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy transactions, wherein the double-layer random decision optimization model in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, corresponding respectively to the two stages of the energy trading process;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning and training according to its theoretical framework, and establishing a function approximator; the function approximator estimates the Q value using a set of adjustable parameters and features extracted from the state-action space, thereby establishing a mapping from the parameter space to the Q-value function over the state-action space; the mapping is linear or nonlinear, and solvability is analyzed using a linear mapping; the typical form of the function approximator is

$$\hat{Q}(s,a) = \theta^{T}\phi(s,a)$$

where $\theta$ is the adjustable approximation parameter vector, $\phi(s,a)$ is the feature vector of the state-action pair built from basis functions (BFs), and $(\cdot)^{T}$ denotes matrix transposition;
step 3) solving an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2);
step 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation; the chance-constrained program constructed in the upper-layer planning modeling maximizes the optimistic value of the objective function, the optimization target being maximum economic benefit, with constraints composed of objective constraint limits and chance constraint limits; the mathematical expression of the upper-layer planning modeling is as follows:

Constraint function:

where:
- $\lambda$ — time-of-use quotation of the power generation trade, with $\lambda_t$ the quote at time $t$;
- $\xi$ — random variable arising because rival bids are unknown to the bidder;
- a random variable arising from the deviation of actual wind and photovoltaic output from its predicted value;
- the generator revenue, under the scenario given by the two random variables, when the quote is $\lambda$;
- $\beta$ — risk-tolerance confidence level;
- the expected revenue satisfied at confidence $\beta$;
- $q_{t,\xi}$ — winning-bid electricity of the generator in period $t$ under scenario $\xi$, obtained from the lower-layer planning;
- the per-unit-electricity new-energy subscription compensation to users under scenario $\xi$, obtained from the lower-layer decision;
- $c_{base}$ — unit power-generation cost;
- the penalty fine of the generator under the scenario given by the two random variables;
- $\gamma$ — unit fine for undelivered electricity;
- the unbalanced electricity at time $t$ when the winning bid under scenario $\xi$ exceeds the maximum available output under the output scenario;
- the actual output upper limit of the distributed power supply at time $t$ under the output scenario;
- $T$ — length of one time period, one hour by default;
the lower-layer planning modeling, for each bidding scenario and with the comprehensive benefit of market operation as its objective, optimizes dispatch and allocates the winning-bid amount among the generators; the mathematical expression of the lower-layer planning is as follows:

Constraint function:

where:
- $N_{pv}$, $N_{wp}$ — total numbers of photovoltaic and wind power generators in the area;
- $L$ — total number of power consumers in the area;
- the unit cost of purchasing electricity from the external grid at time $t$;
- the cost of purchasing electricity from photovoltaic or wind generator No. $i$ at time $t$;
- the electric power purchased from the external grid at time $t$;
- the electric power purchased from photovoltaic or wind generator No. $i$ at time $t$;
- the load of power consumer No. $i$ at time $t$;
- $comp_{pv}$, $comp_{wp}$ — per-kWh subscription compensation paid for photovoltaic and wind renewable energy within the user's subscription scope;
- $Q_{load\text{-}pv\text{-}i}$, $Q_{load\text{-}wp\text{-}i}$ — photovoltaic and wind subscription electricity payable by user No. $i$ in the same-day settlement;
- $Q_{pv}$, $Q_{wp}$, $Q_{grid}$ — photovoltaic, wind and external electricity consumed in the area on the day;
- $\upsilon_{pv}$, $\upsilon_{wp}$ — shares of photovoltaic and wind generation in the area on the day;
- $\alpha_i$, $\beta_i$ — photovoltaic and wind ratios subscribed by user No. $i$;
- the maximum generating capacity at time $t$ reported by photovoltaic or wind generator No. $i$.
2. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: in step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and their mutual iteration; the double-layer collaborative reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process and incorporates the adapt-then-combine (ATC) mechanism; the double-layer collaborative reinforcement learning algorithm can accommodate the randomness and uncertainty brought by distributed renewable energy as well as the complex computation of the double-layer random decision optimization model; to avoid storing large Q-value tables, a function approximator is used to record the Q values of complex continuous state and action spaces.
3. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that: the diffusion strategy achieves faster convergence and a lower mean-square deviation than a consensus strategy, the diffusion strategy being:

$$\psi_i(k+1) = x_i(k) + f\big(x_i(k)\big), \qquad x_i(k+1) = \sum_{j \in N_i} b_{ij}\, \psi_j(k+1),$$

where $\psi_i(k+1)$ is the intermediate term introduced by the diffusion strategy and $x_i(k+1)$ is the state updated by combining all intermediate terms available to agent $i$; $N_i$ is the set of nodes adjacent to agent $i$; $b_{ij}$ is the weight agent $i$ assigns to neighboring agent $j$; the matrix $B = [b_{ij}] \in R^{n \times n}$ is defined as the topology matrix of the microgrid communication network; the topology matrix $B$ is a stochastic matrix, $B \mathbf{1}_n = \mathbf{1}_n$, where $\mathbf{1}_n \in R^n$ is the all-ones vector.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910519858.1A | 2019-06-17 | 2019-06-17 | Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110276698A | 2019-09-24 |
| CN110276698B | 2022-08-02 |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |