CN110276698A - Distribution type renewable energy trade decision method based on the study of multiple agent bilayer cooperative reinforcing - Google Patents
- Publication number: CN110276698A (application CN201910519858.1A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F30/20 — Design optimisation, verification or simulation (G06F, electric digital data processing; G06F30/00, computer-aided design)
- G06Q50/06 — Energy or water supply (G06Q50/00, ICT specially adapted for implementation of business processes of specific business sectors)
Abstract
The invention discloses a distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning. The method comprises the following main steps: 1) construct a double-layer stochastic decision optimization model of distributed renewable energy trading; 2) introduce a multi-agent double-layer collaborative reinforcement learning algorithm, carry out learning and training according to its theoretical framework, and establish a function approximator and a collaborative reinforcement learning working mechanism; 3) on the basis of the framework in step 2), obtain an estimate of the optimal Q-value function by iterative calculation; 4) solve the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation. The invention takes the uncertainty in distributed renewable energy trading into account, and can raise power generation revenue while accounting for risk, at the same time maximizing the comprehensive benefit.
Description
Technical Field
The invention relates to the field of intelligent power distribution networks, in particular to a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning.
Background
With the progress and development of society, the global demand for green, clean and efficient power keeps growing, and more and more distributed renewable energy sources are connected to the power distribution network. Distributed energy sources are characterized by efficient energy utilization, low losses, little pollution, flexible operation and good system economy. Their development, however, still faces problems of grid connection, power supply quality, capacity storage and fuel supply.
Distributed photovoltaic and wind power generation, while free of fuel costs, carry high construction, operation and maintenance costs. At present, distributed new energy generators in China mainly profit through electricity-price subsidies from the state and local governments. However, as distributed power penetration increases, this profit model increasingly conflicts with market laws. Subsidizing the distributed generators through users' subscription fees instead can help the generators participate in market competition and quote reasonably according to their potential benefits and generation costs, so that social benefit is maximized. Meanwhile, by taking into account various kinds of uncertain information, such as generator quotations, distributed power output fluctuation and user subscriptions, the model can be solved by a multi-agent double-layer collaborative reinforcement learning method; the optimal scheduling decision can be calculated rapidly, risk is reduced, and economic benefit is improved.
Disclosure of Invention
In order to overcome the shortcomings of existing trading decision methods, the invention provides a distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning. A double-layer stochastic planning model of distributed energy under various kinds of uncertain information, such as generator quotations, distributed power output fluctuation and user subscriptions, is considered; the model is solved by a multi-agent double-layer collaborative reinforcement learning method, so that the optimal scheduling decision can be calculated rapidly, risk is reduced, and economic benefit is improved.
The invention realizes the aim through the following technical scheme:
a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning and training according to the theoretical framework of that algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism; the function approximator estimates the Q value using a set of adjustable parameters together with features extracted from the state-action space; the approximator establishes a mapping from the parameter space to the Q-value function over the state-action space; the mapping may be linear or nonlinear, and solvability can be analysed using a linear mapping; the typical linear form of the function approximator is
Q(s, a; θ) = θ^T φ(s, a)
where θ is the adjustable approximation parameter vector, φ(s, a) is the feature vector of the state-action pair composed of basis functions (BFs), and (·)^T denotes the matrix transpose operation;
step 3) solving for an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2);
step 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation.
Preferably, the double-layer stochastic decision optimization model for distributed renewable energy trading in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, which correspond to the two parts of the energy trading link respectively.
Preferably, the upper-layer planning modeling constructs a chance-constrained program that maximizes the optimistic value of the objective function; the optimization target is maximum economic benefit, and the constraint conditions consist of objective constraint limits and chance constraint limits. The mathematical expression of the upper-layer planning model (objective and constraint functions given in the original figures) uses the following quantities:
λ — time-of-use quotation of the power generation trade, where λt is the quotation at time t;
ξ — random variable arising from the unknown quotations of the bidders;
a random variable arising from the uncertainty of the deviation between actual and predicted wind power and photovoltaic output;
the generator revenue under scenario ξ when the quotation is λ;
β — confidence level of the borne risk;
the expected revenue satisfied at confidence β;
qt,ξ — bid-winning electricity of the generator in period t under scenario ξ, obtained from the lower-layer planning;
the new-energy subscription compensation per unit of electricity obtained from the lower-layer decision under scenario ξ (lower-layer decision output);
cbase — generation cost per unit of electricity;
the penalty fine of the generator under scenario ξ;
γ — fine per unit of unfinished electricity;
the unbalanced electricity by which the amount in period t under scenario ξ exceeds the maximum available output in that scenario;
the upper limit of the actual output of the distributed power source at time t under the scenario;
T — one time period, one hour by default.
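The chance constraint in the upper layer — the generator's revenue meets the expected target with confidence β — can be checked empirically on sampled scenarios. A minimal sketch, assuming normally distributed scenario revenues; the distribution, target and numbers are illustrative, not from the patent.

```python
import random

def chance_constraint_holds(revenues, target, beta):
    """Estimate P(revenue >= target) from sampled scenarios and compare to beta."""
    hit = sum(1 for r in revenues if r >= target)
    return hit / len(revenues) >= beta

random.seed(0)
# Sampled generator revenues under random bid/output scenarios (illustrative).
scenarios = [1000 + random.gauss(0, 50) for _ in range(5000)]
print(chance_constraint_holds(scenarios, target=950, beta=0.80))
```

In a full model this check would sit inside the upper-layer optimization, accepting only quotation strategies whose sampled revenues satisfy the confidence level.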
Preferably, the lower-layer planning modeling optimizes scheduling and allocates the bid-winning amounts among the generators for a given bidding scenario, with the comprehensive benefit of market operation as the target. The mathematical expression of the lower-layer planning (objective and constraint functions given in the original figures) uses the following quantities:
Npv, Nwp — total numbers of photovoltaic and wind power generators in the area;
L — total number of power consumers in the area;
the unit cost of purchasing electricity from the external grid at time t;
the cost of purchasing electricity from photovoltaic and wind power generator i at time t;
the electric power purchased from the external grid at time t;
the electricity purchased from photovoltaic and wind power generator i at time t;
the load of power consumer i at time t;
comppv, compwp — per-unit electricity subscription compensation paid for renewable energy such as photovoltaic and wind power within the user's subscription range;
Qload-pv-i, Qload-wp-i — photovoltaic and wind power subscription electricity settled by user i on the current day;
Qpv, Qwp, Qgrid — photovoltaic, wind power and external electricity consumed in the area on the current day;
υpv, υwp — proportions of photovoltaic and wind power generation in the area on the current day;
αi, βi — photovoltaic and wind power proportions subscribed by user i;
the maximum generating capacity at time t reported by photovoltaic and wind power generator i.
Preferably, in step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and the mutual iteration between them. The double-layer collaborative reinforcement algorithm introduces a diffusion strategy into the reinforcement learning process, embedding an adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the collaborative reinforcement learning algorithm can thus adapt both to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the double-layer stochastic decision optimization model. In addition, to avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of the complex continuous state and action spaces.
Preferably, the diffusion strategy achieves faster convergence and a lower mean-square deviation than a consensus strategy, and reads as follows:
ψi(k+1) = xi(k) + f(xi(k))
xi(k+1) = Σ_{j∈Ni} bij ψj(k+1)
where ψi(k+1) is the intermediate term introduced by the diffusion strategy, and xi(k+1) is the state updated by combining all intermediate terms available to agent i; Ni is the set of nodes adjacent to agent i; bij is the weight assigned by agent i to neighbouring agent j. The matrix B = [bij] ∈ R^{n×n} is defined as the topology matrix of the microgrid communication network; B is a stochastic matrix, i.e. B 1n = 1n, where 1n ∈ R^n is the vector of all ones.
Advantageous effects:
1. The double-layer decision optimization model established by the invention can comprehensively consider the uncertainty introduced by the random variables and make better decisions. It is therefore well suited to optimization decisions for distributed generators.
2. The proposed algorithm, a double-layer collaborative reinforcement learning algorithm, integrates well with the two-layer stochastic decision optimization model and provides a new idea for energy trading decisions under the integration of future information networks and energy networks.
3. The invention introduces multiple agents to handle, respectively, the randomness of the upper- and lower-layer planning and their mutual iteration, which makes the collaborative reinforcement learning algorithm better suited to the double-layer planning problem.
4. Multi-agent double-layer collaborative reinforcement learning, as a multi-agent reinforcement learning algorithm with self-learning and collaborative-learning capabilities, is well suited to large-scale distributed energy-access problems with strong randomness and uncertainty. After a certain amount of training and updating, the algorithm can perform dynamic optimization quickly while guaranteeing stable global convergence.
5. A diffusion strategy is introduced into the reinforcement learning process, so that distributed information exchange can be realized within the microgrid, the computational cost is reduced, faster convergence is achieved, and the mean-square deviation is lower than that of a consensus strategy.
Drawings
FIG. 1 is an overall framework diagram of the present invention;
FIG. 2 is a flow chart of multi-agent dual-tier collaborative reinforcement learning according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
The distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning disclosed by the invention takes the power distribution network as the medium, schedules the distributed power sources and the controllable loads simultaneously, and realizes economic benefit optimization; the optimization object and model are shown schematically in figure 1.
The invention provides a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, which comprises the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning training according to a theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism;
step 3) solving for an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2);
step 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation.
The double-layer stochastic decision optimization model for distributed renewable energy trading in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, which correspond to the two parts of the energy trading link respectively.
In step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and the mutual iteration between them. The double-layer collaborative reinforcement algorithm introduces a diffusion strategy into the reinforcement learning process, embedding an adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the collaborative reinforcement learning algorithm can thus adapt both to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the double-layer stochastic decision optimization model. To avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of the complex continuous state and action spaces.
The iterative calculation flow of step 3) comprises the following steps (see fig. 2):
S1: initialize θ0, ω0;
S2: repeat for k = 1 to T;
S3: each agent computes in turn, i = 1 to n;
S4: compute the feature vector and the state si(k);
S5: select action ai(k) according to policy π;
S6: observe the reward value ri(k);
S7: compute the TD error δi(k);
S8: compute the estimate of the Q value;
S9: update the parameters θi(k), ωi(k);
S10: return to S3;
S11: return to S2;
S12: return the result.
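The S1–S12 flow is a nested loop: initialise the parameters, then at every step each agent extracts features, acts, observes a reward, computes a TD error, and updates its parameters. A minimal single-agent linear-TD sketch of this skeleton on a toy chain environment; the exact Greedy-GQ update with the ω correction vector is omitted, and the environment, exploration policy and all constants are illustrative assumptions.

```python
import random

def features(state, action, n_states=4, n_actions=2):
    """One-hot feature vector phi(s, a) for the linear approximator (S4)."""
    phi = [0.0] * (n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi

def q_value(theta, state, action):
    """Linear estimate Q(s, a; theta) = theta^T phi(s, a)."""
    return sum(t * f for t, f in zip(theta, features(state, action)))

random.seed(1)
theta = [0.0] * 8            # S1: initialise the parameter vector
alpha, gamma = 0.1, 0.9
state = 0
for k in range(2000):        # S2: repeat for k = 1..T
    action = random.randrange(2)          # S5 (here: pure random exploration)
    # Toy environment: action 1 advances a 4-state chain; reaching the
    # terminal state 3 pays reward 1 and the chain restarts at state 0.
    next_state = min(state + 1, 3) if action == 1 else 0
    reward = 1.0 if next_state == 3 else 0.0          # S6: observe reward
    best_next = max(q_value(theta, next_state, a) for a in range(2))
    delta = reward + gamma * best_next - q_value(theta, state, action)  # S7
    phi = features(state, action)
    theta = [t + alpha * delta * f for t, f in zip(theta, phi)]  # S8/S9 update
    state = 0 if next_state == 3 else next_state
print(q_value(theta, 2, 1))   # learned value of advancing from state 2
```

In the multi-agent version, S3 iterates this inner update over agents i = 1..n and the diffusion step combines the neighbours' parameters between iterations.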
The basic steps of applying the multi-agent double-layer collaborative reinforcement learning framework to distributed renewable energy, with explanation, are as follows:
A1: decompose the objective and constraint functions of the upper- and lower-layer plans into the respective rewards of the reinforcement learning algorithm, serving as the reference values of the reward. The upper-layer objective is to be maximized and is set as a forward reward; the lower-layer objective seeks the lowest price and is set as a reverse reward. The constraint conditions of the upper and lower layers serve as penalty terms whose coefficients are set according to the actual debugging situation, with the requirement that the penalty coefficient of a hard constraint is far greater than the reward coefficient, and that of a soft constraint is greater than the reward coefficient.
A2: construct the first reinforcement learning module, which is essentially a combination of two (in general several) reinforcement learning agents. The lower-layer plan forms one reinforcement learning agent unit as a module; on the upper layer, since there are several generators, each generator forms its own reinforcement learning agent unit; finally the upper- and lower-layer agent units are integrated through an overall agent unit, shown as agent II in figure 1, whose reward structure takes the maximum total reward of all agent units as its objective.
A3: establish a function approximator. Storing Q values directly occupies a large amount of computer resources; the approximator reduces this occupation and increases the calculation speed.
A4: establish the collaborative reinforcement learning working mechanism; to accelerate the computation of the multiple agents, integrate the adapt-then-combine (ATC) diffusion strategy into the parameter updating process of Greedy-GQ.
A5: construct the second reinforcement learning module, taking agent II as the environment of this agent, and establish the update strategy using a conventional Q-learning (or Sarsa, DQN, etc.) update rule.
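A1's reward construction — a forward reward for the upper-layer objective, a reverse reward for the lower-layer cost, and penalty terms whose hard-constraint coefficient is far larger than the reward coefficient — can be sketched as a single composite reward function. All coefficient values here are illustrative placeholders, not the patent's tuned values.

```python
def reward(upper_benefit, lower_cost, hard_violation, soft_violation,
           w_benefit=1.0, w_cost=1.0, w_hard=1000.0, w_soft=10.0):
    """Composite reward per A1: benefit forward, cost reversed, penalties.

    w_hard >> w_benefit enforces hard constraints; w_soft > w_benefit
    enforces soft ones. The coefficients would be tuned during debugging.
    """
    return (w_benefit * upper_benefit
            - w_cost * lower_cost
            - w_hard * max(0.0, hard_violation)
            - w_soft * max(0.0, soft_violation))

print(reward(120.0, 40.0, 0.0, 0.0))   # feasible case
print(reward(120.0, 40.0, 0.5, 0.0))   # a hard violation dominates the reward
```

The `max(0.0, ...)` terms make the penalties one-sided: a satisfied constraint contributes nothing, so the agent is only steered away from infeasible regions.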
Upper-layer planning modeling:
The chance-constrained program constructed in the upper-layer planning maximizes the optimistic value of the objective function, aiming at maximum economic benefit; the constraint conditions consist of objective constraint limits and chance constraint limits. The upper-layer optimization targets an optimistic value of the economic benefit (i.e., the economic benefit obtained is better than a given value at a certain confidence level) while minimizing the operation cost of the distribution network. The objective constraint limits are constraints on deterministic objects, including the unit generation cost, the fine per unit of unfinished generation, and the upper and lower limits of the actual output of the distributed power sources. The chance constraint limits are constraints on the uncertain objects of the distribution network, including the probability constraint on the degree of borne risk and the power-flow security limits. Sources of uncertainty include distributed photovoltaic and wind power output, generator quotations, and the deviation of traditional load forecasts.
Therefore, the mathematical expression of the upper-level planning modeling is as follows:
constraint function:
in the formula:
λ — time-of-use quotation of the power generation trade, where λt is the quotation at time t;
ξ — random variable arising from the unknown quotations of the bidders;
random variable arising from the uncertainty of the deviation between actual and predicted wind power and photovoltaic output;
generator revenue under scenario ξ when the quotation is λ;
β — confidence level of the borne risk;
expected revenue satisfied at confidence β;
qt,ξ — bid-winning electricity of the generator in period t under scenario ξ, obtained from the lower-layer planning;
new-energy subscription compensation per unit of electricity obtained from the lower-layer decision under scenario ξ (lower-layer decision output);
cbase — generation cost per unit of electricity;
penalty fine of the generator under scenario ξ;
γ — fine per unit of unfinished electricity;
unbalanced electricity by which the amount in period t under scenario ξ exceeds the maximum available output in that scenario;
upper limit of the actual output of the distributed power source at time t under the scenario;
T — one time period, with a default value of one hour.
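Putting the listed quantities together, a generator's scenario revenue is, roughly, the bid-winning energy settled at the quotation plus subscription compensation, minus the generation cost and the fine γ on unbalanced (unfinished) energy. A minimal per-scenario sketch; the patent's exact objective is given in its figures and may contain further terms, and all numbers here are illustrative.

```python
def generator_revenue(lam, q, comp, c_base, gamma_fine, unbalanced):
    """Scenario revenue of one generator over T periods.

    lam[t]        quotation at period t (lambda_t)
    q[t]          bid-winning energy in period t (from the lower layer)
    comp          subscription compensation per unit energy (lower-layer output)
    c_base        generation cost per unit energy
    gamma_fine    fine per unit of unbalanced energy (gamma)
    unbalanced[t] energy exceeding the scenario's maximum available output
    """
    settle = sum((lam[t] + comp - c_base) * q[t] for t in range(len(q)))
    fine = gamma_fine * sum(unbalanced)
    return settle - fine

rev = generator_revenue(lam=[0.5, 0.6], q=[10.0, 8.0], comp=0.05,
                        c_base=0.30, gamma_fine=0.4, unbalanced=[0.0, 2.0])
print(round(rev, 2))
```

In the chance-constrained upper layer, this revenue would be evaluated per sampled scenario ξ and compared against the expected target at confidence β.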
Lower-layer planning modeling:
The lower-layer planning optimizes scheduling and allocates the bid-winning amounts among the generators with the comprehensive benefit of market operation as the target. The lower-level planning model is in fact a market-clearing scheduling model for the regional retail market, and its accuracy determines whether the regional market can operate properly according to the rules. Since energy storage is neglected, the electricity purchased in the region comes from both the distributed generators and the external grid, and the sum of the purchase costs over all periods constitutes the system's cost. In addition, considering that users are willing to pay a certain cost to subscribe to new energy and enjoy green power, this user group can also be included in the overall benefit. The optimization goal is therefore to minimize the sum of the electricity purchase cost and the subscription compensation paid for green electricity.
Therefore, the mathematical expression of the lower-layer planning modeling is as follows:
constraint function:
in the formula:
Npv, Nwp — total numbers of photovoltaic and wind power generators in the area;
L — total number of power consumers in the area;
unit cost of purchasing electricity from the external grid at time t;
cost of purchasing electricity from photovoltaic and wind power generator i at time t;
electric power purchased from the external grid at time t;
electricity purchased from photovoltaic and wind power generator i at time t;
load of power consumer i at time t;
comppv, compwp — per-unit electricity subscription compensation paid for renewable energy such as photovoltaic and wind power within the user's subscription range;
Qload-pv-i, Qload-wp-i — photovoltaic and wind power subscription electricity settled by user i on the current day;
Qpv, Qwp, Qgrid — photovoltaic, wind power and external electricity consumed in the area on the current day;
υpv, υwp — proportions of photovoltaic and wind power generation in the area on the current day;
αi, βi — photovoltaic and wind power proportions subscribed by user i;
maximum generating capacity at time t reported by photovoltaic and wind power generator i.
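The lower-layer objective described above — the external-grid and distributed-generator purchase costs plus the green-power subscription compensation paid out — can be written down directly from the listed quantities. A minimal sketch with illustrative numbers; the patent's actual expression is in its figures and may differ in detail.

```python
def daily_cost(grid_price, grid_power, gen_prices, gen_powers,
               comp_pv, comp_wp, q_sub_pv, q_sub_wp):
    """Total daily cost: energy purchases plus subscription compensation.

    grid_price[t], grid_power[t]        external-grid price and purchase at t
    gen_prices[i][t], gen_powers[i][t]  per-generator price and purchase at t
    comp_pv, comp_wp                    compensation per unit subscribed energy
    q_sub_pv, q_sub_wp                  settled subscribed PV / wind energy
    """
    purchase = sum(p * w for p, w in zip(grid_price, grid_power))
    purchase += sum(p * w for prices, powers in zip(gen_prices, gen_powers)
                    for p, w in zip(prices, powers))
    compensation = comp_pv * q_sub_pv + comp_wp * q_sub_wp
    return purchase + compensation

total = daily_cost(grid_price=[0.6, 0.7], grid_power=[5.0, 4.0],
                   gen_prices=[[0.30, 0.35]], gen_powers=[[10.0, 8.0]],
                   comp_pv=0.05, comp_wp=0.04, q_sub_pv=12.0, q_sub_wp=6.0)
print(round(total, 2))
```

Minimizing this quantity subject to the load-balance and reported-capacity constraints is what the lower-layer clearing performs per scenario.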
A function approximator:
The function approximator estimates the Q value using a set of adjustable parameters together with features extracted from the state-action space. The approximator then establishes a mapping from the parameter space to the Q-value function over the state-action space. The mapping may be linear or nonlinear; solvability can be analysed using a linear mapping. The typical form of a linear approximator is
Q(s, a; θ) = θ^T φ(s, a)
where θ is the adjustable approximation parameter vector and φ(s, a) is the feature vector of the state-action pair, which can be written as
φ(s, a) = [φ1(s, a), φ2(s, a), …, φm(s, a)]^T
where each φl is a basis function (BF), such as a Gaussian radial BF centred at a selected fixed point in the state space. Typically, the sets of BFs corresponding to the fixed points are evenly distributed in the state space. Herein, all vectors are taken to be column vectors unless otherwise specified, and (·)^T denotes the matrix transpose operation. Radial basis function neural networks have been used in stochastic nonlinear interconnected systems and have shown good generalization performance.
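The linear approximator — Q(s, a; θ) = θ^T φ(s, a) with Gaussian radial basis functions centred on fixed points spread evenly over the state space — can be sketched directly. The centres, width, parameter values and the per-action feature layout are illustrative assumptions, not the patent's configuration.

```python
import math

def gaussian_rbf_features(state, action, centers, width, n_actions):
    """phi(s, a): Gaussian radial BFs over the state, replicated per action.

    The feature block of the chosen action holds exp(-(s - c)^2 / (2 w^2))
    for each centre c; all other actions' blocks are zero.
    """
    block = [math.exp(-(state - c) ** 2 / (2 * width ** 2)) for c in centers]
    phi = [0.0] * (len(centers) * n_actions)
    phi[action * len(centers):(action + 1) * len(centers)] = block
    return phi

def q_hat(theta, state, action, centers, width, n_actions):
    """Linear value estimate Q(s, a; theta) = theta^T phi(s, a)."""
    phi = gaussian_rbf_features(state, action, centers, width, n_actions)
    return sum(t * f for t, f in zip(theta, phi))

centers = [0.0, 0.5, 1.0]                  # fixed points evenly spread on [0, 1]
theta = [0.2, -0.1, 0.4, 0.0, 0.3, 0.1]   # 3 BFs x 2 actions
print(q_hat(theta, 0.5, 0, centers, width=0.25, n_actions=2))
```

Only the 2m parameters in θ are stored, rather than a Q-value table over the continuous state space, which is exactly the resource saving A3 describes.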
Diffusion strategy:
The algorithm introduces a diffusion strategy into the reinforcement learning process, embedding an adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm. The diffusion strategy achieves faster convergence and a lower mean-square deviation than the consensus strategy. Furthermore, the diffusion strategy responds better to continuous real-time signals and is insensitive to the neighbour weights. Its basic idea is that each agent, while updating its own state, combines collaboration terms based on its neighbours' states. Consider an agent with state xi and the dynamics
xi(k+1) = xi(k) + f(xi(k))
The diffusion strategy then reads:
ψi(k+1) = xi(k) + f(xi(k))
xi(k+1) = Σ_{j∈Ni} bij ψj(k+1)
where ψi(k+1) is the intermediate term introduced by the diffusion strategy and xi(k+1) is the state updated by combining all intermediate terms available to agent i. Ni is the set of nodes adjacent to agent i, and bij is the weight assigned by agent i to neighbouring agent j. We define the matrix B = [bij] ∈ R^{n×n} as the topology matrix of the microgrid communication network. In general, the topology matrix B is a stochastic matrix, which means B 1n = 1n, where 1n ∈ R^n is the vector of all ones.
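The two ATC steps — each agent first forms the intermediate term ψi(k+1) = xi(k) + f(xi(k)), then combines its neighbours' intermediate terms with the weights bij — can be sketched for a small network. The topology matrix B below is an illustrative row-stochastic example (each row sums to 1, so B 1 = 1), not a matrix from the patent.

```python
def atc_diffusion_step(x, f, B):
    """One adapt-then-combine step over all agents.

    x: agent states x_i(k); f: local update map;
    B: row-stochastic combination matrix, B[i][j] = b_ij
       (zero for non-neighbours), rows summing to 1.
    """
    psi = [xi + f(xi) for xi in x]                   # adaptation (intermediate)
    return [sum(B[i][j] * psi[j] for j in range(len(x)))
            for i in range(len(x))]                  # combination

# Three agents on a line 1-2-3; each averages over itself and its neighbours.
B = [[0.5, 0.5, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 0.5, 0.5]]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in B)  # row-stochastic check

x = [0.0, 3.0, 9.0]
for _ in range(50):
    x = atc_diffusion_step(x, f=lambda v: 0.0, B=B)   # pure combination
print([round(v, 3) for v in x])   # states agree (consensus) when f = 0
```

With f = 0 the iteration is pure averaging and all states converge to a common value, which illustrates the distributed information exchange in the microgrid; in the algorithm proper, f carries each agent's local Greedy-GQ parameter update.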
A collaborative reinforcement learning algorithm is obtained by integrating the adapt-then-combine (ATC) diffusion strategy into the parameter updating process of Greedy-GQ.
It should be noted that the proposed collaborative reinforcement learning algorithm introduces two intermediate vectors, one for the actual approximation parameter vector θi(k+1) and one for the correction parameter vector ωi(k+1). In the proposed algorithm, the learning-rate parameters α(k) and β(k) are set according to conditions P(1) to P(4):
α(k) > 0, β(k) > 0 — P(1)
Σ_k α(k) = ∞, Σ_k α(k)² < ∞ — P(2)
Σ_k β(k) = ∞, Σ_k β(k)² < ∞ — P(3)
α(k)/β(k) → 0 — P(4)
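Schedules meeting the two-timescale conditions P(1)–P(4) (positive step sizes, the usual stochastic-approximation summability requirements, and a vanishing ratio α(k)/β(k)) can be chosen as inverse powers of k. A sketch with illustrative exponents; the patent does not prescribe these particular values.

```python
def alpha(k):
    """Slow step size, e.g. alpha(k) = (k + 1)^-0.9 (positive, square-summable)."""
    return (k + 1) ** -0.9

def beta(k):
    """Fast step size, e.g. beta(k) = (k + 1)^-0.6, so alpha(k)/beta(k) -> 0."""
    return (k + 1) ** -0.6

# The ratio alpha/beta = (k + 1)^-0.3 decreases toward zero, satisfying P(4).
for k in [0, 10, 1000, 100000]:
    print(k, alpha(k) / beta(k))
```

The separation of timescales lets the correction vector ω (fast scale β) track its target while the main parameter vector θ (slow scale α) moves quasi-statically.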
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.
Claims (6)
1. A distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning is characterized by comprising the following steps:
step 1) constructing a double-layer random decision optimization model of distributed renewable energy trading;
step 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning and training according to the theoretical framework of that algorithm, and establishing a function approximator; the function approximator estimates the Q value using a set of adjustable parameters together with features extracted from the state-action space; the approximator establishes a mapping from the parameter space to the Q-value function over the state-action space; the mapping may be linear or nonlinear, and solvability can be analysed using a linear mapping; the typical linear form of the function approximator is
Q(s, a; θ) = θ^T φ(s, a)
where θ is the adjustable approximation parameter vector, φ(s, a) is the feature vector of the state-action pair composed of basis functions (BFs), and (·)^T denotes the matrix transpose operation;
step 3) solving for an estimate of the optimal Q-value function by iterative calculation on the basis of the framework in step 2);
step 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation.
2. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: the double-layer stochastic decision optimization model for distributed renewable energy trading in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, which correspond to the two parts of the energy trading link respectively.
3. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that: the upper-layer planning model is a chance-constrained program maximizing the optimistic value of the objective function; its optimization target is the maximum economic benefit, and the constraint conditions consist of deterministic constraint limits and chance constraint limits; the mathematical expression of the upper-layer planning model is:

max f̄,  f(λ, ξ) = Σ_t [(λ_t − comp_{t,ξ} − c_base) · q_{t,ξ} − γ · ΔQ_{t,ξ}]

constraint function:

Pr{ f(λ, ξ) ≥ f̄ } ≥ β,  0 ≤ q_{t,ξ} ≤ P^max_{t,ξ}

wherein λ is the time-of-use quote of the generator and λ_t is the quote in period t; ξ is a random variable unknown to the bidder, caused by the uncertainty of the deviation between the real and predicted values of wind and photovoltaic power; f(λ, ξ) is the generator revenue under scenario ξ when the quote is λ; β is the confidence level of the risk, and f̄ is the revenue met with confidence β; q_{t,ξ} is the awarded power of the generator in period t under scenario ξ, obtained from the lower-layer planning; comp_{t,ξ} is the per-unit-electricity new-energy subscription compensation to users under scenario ξ, obtained from the lower-layer decision; c_base is the generation cost per unit of electricity; γ · ΔQ_{t,ξ} is the penalty fine of the generator under scenario ξ, with γ the unit fine for undelivered electricity and ΔQ_{t,ξ} the unbalanced electricity by which the awarded power in period t exceeds the maximum output under scenario ξ; P^max_{t,ξ} is the actual output upper limit of the distributed power supply at time t under scenario ξ; T is one time period, with a default value of one hour.
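As a hedged illustration of the chance constraint Pr{f ≥ f̄} ≥ β, the optimistic value can be estimated empirically over Monte Carlo scenarios of ξ; the scenario-revenue formula below follows the per-unit terms defined above, but the function names and all numbers are assumptions for the sketch:

```python
import numpy as np

def optimistic_value(revenues, beta=0.9):
    """Largest f_bar such that Pr{f >= f_bar} >= beta over sampled scenarios."""
    r = np.sort(np.asarray(revenues))[::-1]  # revenues in descending order
    k = int(np.ceil(beta * len(r)))          # at least k scenarios must reach f_bar
    return r[k - 1]

def scenario_revenue(lam, q, comp, c_base, gamma_pen, q_unbalanced):
    """Per-scenario generator revenue: sale income minus subscription
    compensation and generation cost, minus the penalty for undelivered energy."""
    return np.sum((lam - comp - c_base) * q) - gamma_pen * np.sum(q_unbalanced)

# toy usage: one period, five revenue scenarios
rev = scenario_revenue(np.array([100.0]), np.array([1.0]),
                       comp=5.0, c_base=40.0, gamma_pen=50.0,
                       q_unbalanced=np.array([0.1]))
f_bar = optimistic_value([10, 20, 30, 40, 50], beta=0.8)
```

Here `f_bar` is the largest revenue level still reached in at least a fraction β of the sampled scenarios, which is the empirical counterpart of the optimistic value f̄.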
4. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that: the lower-layer planning model optimizes the dispatch and allocates the awarded electricity of each power generator for the given bidding scenario, with the comprehensive benefit of market operation as the objective; the mathematical expression of the lower-layer planning is:

min Σ_t [ c^grid_t · P^grid_t + Σ_{i=1}^{N_pv} c^pv_{i,t} · P^pv_{i,t} + Σ_{i=1}^{N_wp} c^wp_{i,t} · P^wp_{i,t} ] + Σ_{i=1}^{L} ( comp_pv · Q_{load-pv-i} + comp_wp · Q_{load-wp-i} )

constraint function:

Σ_{i=1}^{L} L_{i,t} = P^grid_t + Σ_{i=1}^{N_pv} P^pv_{i,t} + Σ_{i=1}^{N_wp} P^wp_{i,t},  0 ≤ P^pv_{i,t} ≤ P^pv,max_{i,t},  0 ≤ P^wp_{i,t} ≤ P^wp,max_{i,t}

in the formula: N_pv, N_wp are the total numbers of photovoltaic and wind power generators in the area; L is the total number of power consumers in the area; c^grid_t is the unit cost of purchasing electricity from the external grid in period t; c^pv_{i,t}, c^wp_{i,t} are the costs of purchasing electricity from the i-th photovoltaic and wind power generator in period t; P^grid_t is the power purchased from the external grid in period t; P^pv_{i,t}, P^wp_{i,t} are the power purchased from the i-th photovoltaic and wind power supplier in period t; L_{i,t} is the load of the i-th power consumer in period t; comp_pv, comp_wp are the per-kWh subscription compensation paid to users for the photovoltaic and wind energy they subscribe to; Q_{load-pv-i}, Q_{load-wp-i} are the photovoltaic and wind subscription electricity settled by user i on the day, for which compensation is receivable; Q_pv, Q_wp, Q_grid are the photovoltaic, wind and external electricity consumed in the area on the day; υ_pv, υ_wp are the proportions of photovoltaic and wind power generation in the area on the day; α_i, β_i are the photovoltaic and wind proportions subscribed by the i-th user; P^pv,max_{i,t}, P^wp,max_{i,t} are the maximum generating capacity in period t declared by the i-th photovoltaic and wind power generator.
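A minimal sketch of the lower-layer allocation idea, assuming a simple merit-order (cheapest-first) dispatch rather than the patent's full model; the function name, prices and capacities are illustrative only:

```python
import numpy as np

def dispatch(load, prices, caps):
    """Allocate the regional load across bidders (e.g. PV, wind, external grid)
    in ascending price order, respecting each bidder's declared capacity."""
    prices = np.asarray(prices, dtype=float)
    alloc = np.zeros(len(prices))
    remaining = load
    for i in np.argsort(prices):          # cheapest source first
        take = min(remaining, caps[i])
        alloc[i] = take
        remaining -= take
        if remaining <= 0:
            break
    cost = float(alloc @ prices)
    return alloc, cost

# toy usage: PV, wind, external grid with declared caps; 100 units of load
alloc, cost = dispatch(100, prices=[0.30, 0.25, 0.55], caps=[40, 30, 1000])
```

The cheaper renewable bids are exhausted first and the external grid only covers the residual load, which mirrors the lower layer's goal of maximizing the comprehensive market benefit by minimizing purchase cost.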
5. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: in step 2), multiple agents are used to handle, respectively, the randomness of the upper-layer and lower-layer planning models and the mutual iteration between the two layers; the double-layer collaborative reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process and incorporates an adaptive-combination ATC (adapt-then-combine) mechanism into the reinforcement learning algorithm, so that the algorithm can adapt both to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the double-layer random decision optimization model; to avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of complex continuous state and action spaces.
6. The distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning according to claim 5, characterized in that: the diffusion strategy achieves faster convergence and a lower mean-square deviation than the consensus strategy; the diffusion strategy is:

x_i(k+1) = Σ_{j∈N_i} b_ij · ψ_j(k+1)

wherein ψ_j(k+1) is the intermediate term introduced by the diffusion strategy through the local update of agent j, and x_i(k+1) is the state of agent i updated by combining the intermediate terms of all its neighbours; N_i is the set of nodes adjacent to agent i; b_ij is the weight assigned by agent i to neighbouring agent j; the matrix B = [b_ij] ∈ R^{n×n} is defined as the topology matrix of the microgrid communication network; the topology matrix B is a stochastic matrix satisfying B · 1_n = 1_n, where 1_n ∈ R^n is the all-ones vector.
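The adapt-then-combine (ATC) diffusion step with a row-stochastic topology matrix B can be sketched as follows; the 3-agent topology, step size, and gradient-style local update are assumptions for the sketch, not details from the patent:

```python
import numpy as np

def diffusion_step(x, grads, B, mu=0.1):
    """ATC diffusion: each agent takes a local (adapt) step producing an
    intermediate term psi, then combines its neighbours' terms via B."""
    psi = x - mu * grads          # adapt: local update of each agent
    return B @ psi                # combine: x_i(k+1) = sum_j b_ij * psi_j(k+1)

# 3 fully connected agents; B is row-stochastic, i.e. B @ 1_n = 1_n
B = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
assert np.allclose(B @ np.ones(3), np.ones(3))

x = np.array([1.0, 2.0, 3.0])     # current states of the agents
x_next = diffusion_step(x, grads=np.zeros(3), B=B)
```

With zero local gradients the step reduces to pure neighbour averaging, and because this particular B is doubly stochastic the network average of the states is preserved while their spread shrinks, which is the mechanism behind the lower mean-square deviation claimed for diffusion.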
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910519858.1A CN110276698B (en) | 2019-06-17 | 2019-06-17 | Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110276698A true CN110276698A (en) | 2019-09-24 |
CN110276698B CN110276698B (en) | 2022-08-02 |
Family
ID=67960916
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910519858.1A Active CN110276698B (en) | 2019-06-17 | 2019-06-17 | Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276698B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190072916A1 (en) * | 2017-09-07 | 2019-03-07 | Hitachi, Ltd. | Learning control system and learning control method |
CN109325608A (en) * | 2018-06-01 | 2019-02-12 | 国网上海市电力公司 | Consider the distributed generation resource Optimal Configuration Method of energy storage and meter and photovoltaic randomness |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110990793B (en) * | 2019-12-07 | 2024-03-15 | 国家电网有限公司 | Scheduling optimization method for electric heating gas coupling micro energy station |
CN110990793A (en) * | 2019-12-07 | 2020-04-10 | 国家电网有限公司 | Scheduling optimization method for electric-thermal gas coupling micro-energy source station |
CN111064229A (en) * | 2019-12-18 | 2020-04-24 | 广东工业大学 | Wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning |
CN111064229B (en) * | 2019-12-18 | 2023-04-07 | 广东工业大学 | Wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning |
CN111200285A (en) * | 2020-02-12 | 2020-05-26 | 燕山大学 | Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory |
CN111200285B (en) * | 2020-02-12 | 2023-12-19 | 燕山大学 | Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory |
CN112612206A (en) * | 2020-11-27 | 2021-04-06 | 合肥工业大学 | Multi-agent collaborative decision-making method and system for uncertain events |
CN112714165A (en) * | 2020-12-22 | 2021-04-27 | 声耕智能科技(西安)研究院有限公司 | Distributed network cooperation strategy optimization method and device based on combination mechanism |
CN112859591A (en) * | 2020-12-23 | 2021-05-28 | 华电电力科学研究院有限公司 | Reinforced learning control system for operation optimization of energy system |
CN113378456A (en) * | 2021-05-21 | 2021-09-10 | 青海大学 | Multi-park comprehensive energy scheduling method and system |
CN113421004A (en) * | 2021-06-30 | 2021-09-21 | 国网山东省电力公司潍坊供电公司 | Transmission and distribution cooperative active power distribution network distributed robust extension planning system and method |
CN113555870A (en) * | 2021-07-26 | 2021-10-26 | 国网江苏省电力有限公司南通供电分公司 | Q-learning photovoltaic prediction-based power distribution network multi-time scale optimization scheduling method |
CN113555870B (en) * | 2021-07-26 | 2023-10-13 | 国网江苏省电力有限公司南通供电分公司 | Q-learning photovoltaic prediction-based power distribution network multi-time scale optimal scheduling method |
CN113780622A (en) * | 2021-08-04 | 2021-12-10 | 华南理工大学 | Multi-micro-grid power distribution system distributed scheduling method based on multi-agent reinforcement learning |
CN113780622B (en) * | 2021-08-04 | 2024-03-12 | 华南理工大学 | Multi-agent reinforcement learning-based distributed scheduling method for multi-microgrid power distribution system |
CN113743583B (en) * | 2021-08-07 | 2024-02-02 | 中国航空工业集团公司沈阳飞机设计研究所 | Method for inhibiting switching of invalid behaviors of intelligent agent based on reinforcement learning |
CN113743583A (en) * | 2021-08-07 | 2021-12-03 | 中国航空工业集团公司沈阳飞机设计研究所 | Intelligent agent invalid behavior switching inhibition method based on reinforcement learning |
CN114021815B (en) * | 2021-11-04 | 2023-06-27 | 东南大学 | Scalable energy management collaboration method for community containing large-scale producers and consumers |
CN114021815A (en) * | 2021-11-04 | 2022-02-08 | 东南大学 | Extensible energy management cooperation method for community containing large-scale production and consumption persons |
CN114611813A (en) * | 2022-03-21 | 2022-06-10 | 特斯联科技集团有限公司 | Community hot-cold water circulation optimal scheduling method and system based on hydrogen energy storage |
WO2024084125A1 (en) * | 2022-10-19 | 2024-04-25 | Aalto University Foundation Sr | Trained optimization agent for renewable energy time shifting |
CN117559387A (en) * | 2023-10-18 | 2024-02-13 | 东南大学 | VPP internal energy optimization method and system based on deep reinforcement learning dynamic pricing |
CN117350515A (en) * | 2023-11-21 | 2024-01-05 | 安徽大学 | Ocean island group energy flow scheduling method based on multi-agent reinforcement learning |
CN117350515B (en) * | 2023-11-21 | 2024-04-05 | 安徽大学 | Ocean island group energy flow scheduling method based on multi-agent reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN110276698B (en) | 2022-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276698B (en) | Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning | |
Li et al. | Distributed tri-layer risk-averse stochastic game approach for energy trading among multi-energy microgrids | |
Cheng et al. | Game-theoretic approaches applied to transactions in the open and ever-growing electricity markets from the perspective of power demand response: An overview | |
Adetunji et al. | A review of metaheuristic techniques for optimal integration of electrical units in distribution networks | |
Aghaei et al. | Risk-constrained offering strategy for aggregated hybrid power plant including wind power producer and demand response provider | |
Varkani et al. | A new self-scheduling strategy for integrated operation of wind and pumped-storage power plants in power markets | |
Chen et al. | Research on day-ahead transactions between multi-microgrid based on cooperative game model | |
Maity et al. | Simulation and pricing mechanism analysis of a solar-powered electrical microgrid | |
CN109190802B (en) | Multi-microgrid game optimization method based on power generation prediction in cloud energy storage environment | |
Gao et al. | A multiagent competitive bidding strategy in a pool-based electricity market with price-maker participants of WPPs and EV aggregators | |
CN112381263B (en) | Block chain-based distributed data storage multi-microgrid pre-day robust electric energy transaction method | |
Adil et al. | Energy trading among electric vehicles based on Stackelberg approaches: A review | |
CN111082451A (en) | Incremental distribution network multi-objective optimization scheduling model based on scene method | |
CN111311012A (en) | Multi-agent-based micro-grid power market double-layer bidding optimization method | |
Gao et al. | Distributed energy trading and scheduling among microgrids via multiagent reinforcement learning | |
Chuang et al. | Deep reinforcement learning based pricing strategy of aggregators considering renewable energy | |
CN111553750A (en) | Energy storage bidding strategy method considering power price uncertainty and loss cost | |
Liu et al. | Research on bidding strategy of thermal power companies in electricity market based on multi-agent deep deterministic policy gradient | |
CN112217195A (en) | Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology | |
CN116451880B (en) | Distributed energy optimization scheduling method and device based on hybrid learning | |
CN112686693A (en) | Method, system, equipment and storage medium for predicting marginal electricity price of electric power spot market | |
Peng et al. | Review on bidding strategies for renewable energy power producers participating in electricity spot markets | |
CN117578409A (en) | Multi-energy complementary optimization scheduling method and system in power market environment | |
CN115422728A (en) | Robust optimization virtual power plant optimization control system based on stochastic programming | |
CN116307029A (en) | Double-layer optimal scheduling method and system for promoting coordination of source storage among multiple virtual grids |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||