CN110276698B - Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning - Google Patents

Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Info

Publication number
CN110276698B
CN110276698B (application CN201910519858.1A)
Authority
CN
China
Prior art keywords
reinforcement learning
layer
renewable energy
power
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910519858.1A
Other languages
Chinese (zh)
Other versions
CN110276698A (en)
Inventor
王建春
陈张宇
刘东
黄玉辉
孙健
李峰
殷小荣
吉兰芳
孙宏斌
戴晖
吴晓飞
芦苇
戴易见
徐晓春
李佑伟
汤同峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Shanghai Jiao Tong University
Original Assignee
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd and Shanghai Jiao Tong University
Priority to CN201910519858.1A
Publication of CN110276698A
Application granted
Publication of CN110276698B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning, which comprises the following main steps: 1) constructing a two-layer stochastic decision optimization model of distributed renewable energy trading; 2) introducing a multi-agent double-layer collaborative reinforcement learning algorithm, carrying out learning and training according to the theoretical framework of the algorithm, and establishing a function approximator and a collaborative reinforcement learning working mechanism; 3) calculating an estimate of the optimal Q-value function with an iterative calculation method on the basis of the framework of step 2); 4) solving the optimization model with the trained multi-agent double-layer collaborative reinforcement learning algorithm to complete the optimization calculation. The invention accounts for the uncertainty in distributed renewable energy transactions, can improve the income of generators while considering risk, and simultaneously maximizes the comprehensive benefit.

Description

Distributed renewable energy trading decision method based on multi-agent double-layer collaborative reinforcement learning

Technical Field

The invention relates to the field of smart power distribution networks, and in particular to a distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning.

Background Art

With the progress and development of society, the global demand for green, clean, and efficient electric power keeps growing, and more and more distributed renewable energy sources are connected to the distribution network. Distributed energy features efficient energy utilization, low losses, little pollution, flexible operation, and good system economy, but its development still faces problems such as grid connection, power supply quality, capacity reserve, and fuel supply.

Although distributed photovoltaic and wind power generation have no fuel cost, their construction, operation, and maintenance costs are high. At present, China's distributed new-energy generators profit mainly through national and local government electricity price subsidies. However, as the penetration of distributed generation increases, this profit model clearly no longer conforms to market rules. Subsidizing distributed generators through user subscription fees can help generators participate in market competition and quote reasonably according to their own potential benefits and generation costs, thereby maximizing social benefit. Meanwhile, by considering multiple kinds of uncertain information such as generator quotations, distributed generation output fluctuations, and user subscriptions, the model can be solved with a multi-agent double-layer collaborative reinforcement learning method, which quickly computes the optimal dispatch decision, reduces risk, and improves economic benefit.

Summary of the Invention

To overcome the shortcomings of existing transaction decision-making methods, the invention proposes a distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning. It builds a two-layer stochastic programming model of distributed energy under multiple kinds of uncertain information, such as generator quotations, distributed generation output fluctuations, and user subscriptions, and solves the model with a multi-agent double-layer collaborative reinforcement learning method, which quickly computes the optimal dispatch decision, reduces risk, and improves economic benefit.

The invention achieves the above object through the following technical solution:

A distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning, comprising the following steps:

Step 1) Build a two-layer stochastic decision optimization model for distributed renewable energy trading;

Step 2) Introduce a multi-agent double-layer collaborative reinforcement learning algorithm; carry out learning and training according to the theoretical framework of the algorithm, and establish a function approximator and a collaborative reinforcement learning working mechanism. The function approximator uses a set of adjustable parameters and features extracted from the state-action space to estimate the Q value; the approximator establishes a mapping from the parameter space to the Q-value function over the state-action space. The mapping can be linear or nonlinear, and a linear mapping can be used to analyze solvability. The typical form of the function approximator is as follows:

Q(s, a; θ) = θTφ(s, a)

φ(s, a) = [ψ1(s, a), ψ2(s, a), …, ψN(s, a)]T

where θ is an adjustable approximation parameter vector, φ(s, a) is the feature vector of the state-action pair, ψ are the basis functions (BFs), and (·)T denotes matrix transposition;

Step 3) On the basis of the framework in step 2), use an iterative calculation method to obtain an estimate of the optimal Q-value function;

Step 4) Use the trained multi-agent double-layer collaborative reinforcement learning algorithm to solve the optimization model and complete the optimization calculation.

Preferably, the two-layer stochastic decision optimization model for distributed renewable energy trading in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, corresponding respectively to the two parts of the energy trading process.

Preferably, the upper-layer planning modeling constructs a chance-constrained program that maximizes the optimistic value of the objective function; the optimization objective is maximal economic benefit, and the constraints consist of objective constraints and chance constraints. The mathematical expression of the upper-layer planning model is as follows:

[Objective function rendered as an image in the original publication.]

Constraint functions:

[Four constraint expressions rendered as images in the original publication.]

where:

λ — the generator's time-of-use quotation, λt being the quotation at time t;
ξ — a random variable caused by the unknown quotations of competing bidders;
(symbol rendered as image) — a random variable caused by the uncertain deviation between the actual and forecast wind and photovoltaic output;
(symbol rendered as image) — the generator's revenue under scenario ξ and the output-deviation scenario when the quotation is λ;
β — the confidence level of risk tolerance;
(symbol rendered as image) — the expected revenue attained at confidence β;
qt,ξ — under scenario ξ, the electricity this generator wins in period t, obtained from the lower-layer planning;
csξ — under scenario ξ, the per-unit-electricity new-energy subscription compensation obtained from the lower-layer decision (the lower-layer decision output);
cbase — unit generation cost;
(symbol rendered as image) — the generator's default penalty under scenario ξ and the output-deviation scenario;
γ — the unit penalty for uncompleted electricity;
(symbol rendered as image) — at time t, the unbalanced electricity by which the winning bid in scenario ξ exceeds the maximum output of the output-deviation scenario;
(symbol rendered as image) — the actual upper output limit of the distributed generation at time t in the output-deviation scenario;
T — one period, with a default value of one hour.

Preferably, the lower-layer planning modeling is used, for the bidding scenario, to optimize dispatch and allocate each generator's winning electricity with the comprehensive benefit of market operation as the objective. The mathematical expression of the lower-layer planning is as follows:

[Objective function rendered as an image in the original publication.]

Constraint functions:

[Eleven constraint expressions rendered as images in the original publication.]

where:

Npv, Nwp — the total numbers of photovoltaic and wind power producers in the region;
L — the total number of electricity users in the region;
(symbol rendered as image) — the unit cost of purchasing electricity from the external grid at time t;
(symbol rendered as image) — the cost of purchasing electricity from photovoltaic/wind producer i at time t;
(symbol rendered as image) — the electricity purchased from the external grid at time t;
(symbol rendered as image) — the electricity purchased from photovoltaic/wind producer i at time t;
(symbol rendered as image) — the load of electricity user i at time t;
comppv, compwp — the per-kWh subscription compensation paid by users whose subscription covers renewable energy such as photovoltaic and wind power;
Qload-pv-i, Qload-wp-i — the photovoltaic and wind subscription electricity payable by user i in the day's settlement;
Qpv, Qwp, Qgrid — the photovoltaic, wind, and external electricity consumed in the region that day;
υpv, υwp — the proportions of photovoltaic and wind generation in the region that day;
αi, βi — the proportions of photovoltaic and wind power subscribed by user i;
(symbol rendered as image) — the maximum generation at time t declared by photovoltaic/wind producer i.

Preferably, in step 2), multiple agents are used to handle, respectively, the randomness inherent in the upper-layer and lower-layer planning models and the mutual iteration between the two layers. The two-layer collaborative reinforcement algorithm introduces a diffusion strategy into the reinforcement learning process, bringing the adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the collaborative reinforcement learning algorithm adapts to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the two-layer stochastic decision optimization model. In addition, to avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of the complex continuous state and action spaces.

Preferably, the diffusion strategy achieves faster convergence and a lower mean-square deviation than a consensus strategy; the diffusion strategy is as follows:

x̃i(k+1) = xi(k) + f(xi(k))

xi(k+1) = Σj∈Ni bij x̃j(k+1)

where x̃i(k+1) (a symbol rendered as an image in the original) is an intermediate term introduced by the diffusion strategy, and xi(k+1) is the state updated by combining all the intermediate terms available to agent i; Ni is the set of nodes adjacent to agent i; bij is the weight assigned by agent i to the neighboring agent j. A matrix B = [bij] ∈ Rn×n is defined as the topology matrix of the microgrid communication network; the topology matrix B is a stochastic matrix, B1n = 1n, where 1n ∈ Rn is the all-ones vector.

Beneficial effects:

1. The two-layer decision optimization model established by the invention can comprehensively consider the uncertain scenarios caused by these random variables and make better decisions; it is therefore well suited to the optimal decision-making of distributed generators.

2. The algorithm proposed in the invention is a two-layer collaborative reinforcement learning algorithm that can be integrated well into the two-layer stochastic decision optimization model, providing a new approach to intensive energy trading decisions for future information and energy networks.

3. The invention introduces multiple agents to handle, respectively, the randomness of the upper- and lower-layer planning and the iteration between the two layers, making the collaborative reinforcement learning algorithm better suited to two-layer planning problems.

4. Multi-agent double-layer collaborative reinforcement learning, as a multi-agent reinforcement learning algorithm with self-learning and collaborative learning capabilities, is better suited to large-scale distributed-access energy problems with strong randomness and uncertainty. After a certain amount of training updates, the algorithm can perform dynamic optimization quickly while keeping global convergence stable.

5. A diffusion strategy is introduced into the reinforcement learning process, enabling distributed information exchange in the microgrid while reducing computational cost; it achieves faster convergence and a lower mean-square deviation than a consensus strategy.

Brief Description of the Drawings

Fig. 1 is the overall framework diagram of the invention;

Fig. 2 is the flowchart of the multi-agent double-layer collaborative reinforcement learning of the invention.

Detailed Description of the Embodiments

The invention is further described below with reference to the accompanying drawings and specific embodiments.

The distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning uses the distribution network as the medium and dispatches distributed generation and controllable loads simultaneously to optimize economic benefit; the optimization objects and model are shown schematically in Fig. 1.

The invention proposes a distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning, the method comprising the following steps:

Step 1) Build a two-layer stochastic decision optimization model for distributed renewable energy trading;

Step 2) Introduce a multi-agent double-layer collaborative reinforcement learning algorithm; carry out learning and training according to the theoretical framework of the algorithm, and establish a function approximator and the collaborative reinforcement learning working mechanism;

Step 3) On the basis of the framework in step 2), use an iterative calculation method to obtain an estimate of the optimal Q-value function;

Step 4) Use the trained multi-agent double-layer collaborative reinforcement learning algorithm to solve the optimization model and complete the optimization calculation.

The two-layer stochastic decision optimization model for distributed renewable energy trading in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, corresponding respectively to the two parts of the energy trading process.

In step 2), multiple agents are used to handle, respectively, the randomness inherent in the upper-layer and lower-layer planning models and the mutual iteration between the two layers. The two-layer collaborative reinforcement algorithm introduces a diffusion strategy into the reinforcement learning process, bringing the adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the collaborative reinforcement learning algorithm adapts to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the two-layer stochastic decision optimization model. To avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of the complex continuous state and action spaces.

The iterative calculation process of step 3) comprises the following steps (see Fig. 2):

S1: Initialize θ0, ω0

S2: Repeat for k = 1 to T

S3: Each agent computes in turn, i = 1 to n

S4: Compute the feature vector and the state si(k)

S5: Select the action ai(k) according to policy π

S6: Observe the reward value ri(k)

S7: Compute the TD error δi(k)

S8: Estimate the approximate Q value (expression rendered as an image in the original)

S9: Update the parameters θi(k), ωi(k)

S10: Return to S3

S11: Return to S2

S12: Return the result.
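For concreteness, the loop S1-S12 can be sketched in Python; this is a minimal illustrative skeleton under assumed toy dynamics, in which the environment, feature map, policy, and reward are stand-ins for the quantities named in S4-S8 rather than the patent's actual market model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_feat, T, gamma = 3, 8, 200, 0.95   # illustrative sizes
theta = np.zeros((n_agents, n_feat))           # S1: approximation parameters
omega = np.zeros((n_agents, n_feat))           # S1: correction parameters
centers = np.linspace(-1.0, 1.0, n_feat)       # fixed RBF centers

def phi(state, action):
    """S4: toy Gaussian-RBF feature vector of a state-action pair."""
    z = state.mean() + 0.1 * action
    return np.exp(-(z - centers) ** 2)

for k in range(T):                             # S2: repeat k = 1..T
    for i in range(n_agents):                  # S3: each agent in turn
        s = rng.normal(size=2)                 # stand-in for state s_i(k)
        a = int(rng.integers(0, 2))            # S5: action from policy pi
        f = phi(s, a)                          # S4: feature vector
        r = -np.abs(s).sum()                   # S6: observed reward r_i(k)
        s2 = rng.normal(size=2)                # successor state
        f2 = phi(s2, int(rng.integers(0, 2)))  # successor features
        delta = r + gamma * theta[i] @ f2 - theta[i] @ f   # S7: TD error
        alpha, beta = 1 / (k + 1) ** 0.9, 1 / (k + 1) ** 0.6
        # S8-S9: Greedy-GQ style estimate and parameter update
        theta[i] += alpha * (delta * f - gamma * (omega[i] @ f) * f2)
        omega[i] += beta * (delta - f @ omega[i]) * f
```

In the full algorithm the S9 update is followed by the ATC combination over neighboring agents, which is sketched in the diffusion strategy section further below.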

The basic steps of applying the multi-agent double-layer collaborative reinforcement learning framework to distributed renewable energy are explained as follows:

A1: Decompose the objective functions and constraint functions of the upper- and lower-layer planning into the respective Rewards of the reinforcement learning algorithm as reward reference values. The upper-layer objective function should attain the maximum expectation, so it is set as a positive reward; the lower-layer objective function seeks the lowest price, so it is set as a reverse reward. The constraints of both layers serve as penalty terms whose coefficients are set according to actual tuning: the penalty coefficients of strong constraints must be far larger than the reward coefficients, while those of weak constraints need only exceed the reward coefficients, as sketched below.
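As an illustration of A1, the reward composition might be coded as follows; the coefficient values and function names are assumptions for the sketch, not values from the patent.

```python
def upper_reward(profit, strong_violations, weak_violations,
                 k_reward=1.0, k_strong=100.0, k_weak=2.0):
    """Upper layer: the objective (expected profit) enters as a positive
    reward; constraint violations enter as penalties, with the strong-
    constraint coefficient far larger than the reward coefficient."""
    return (k_reward * profit
            - k_strong * sum(strong_violations)
            - k_weak * sum(weak_violations))

def lower_reward(purchase_cost, strong_violations, weak_violations,
                 k_reward=1.0, k_strong=100.0, k_weak=2.0):
    """Lower layer: the objective seeks the lowest price, so it enters
    as a reverse (negative) reward, with the same penalty structure."""
    return (-k_reward * purchase_cost
            - k_strong * sum(strong_violations)
            - k_weak * sum(weak_violations))
```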

A2: Build the first reinforcement learning module, which is in essence a combination of two (usually more) reinforcement learning agents. The lower-layer planning forms one reinforcement learning agent unit as a module; in the upper layer, since there are multiple generators, each generator forms a reinforcement learning agent unit as a module. Finally, the upper-layer agent units and the lower-layer agent unit are integrated through one overall agent unit, shown as Agent II in Fig. 1; the Reward of Agent II is constructed to maximize the total reward of all the agent units.

A3: Build the function approximator. Storing Q values would occupy a large amount of computing resources; the approximator reduces this occupation while speeding up computation.

A4: Establish the collaborative reinforcement learning working mechanism. To improve the computational efficiency among the multiple agents, an adapt-then-combine (ATC) diffusion strategy is integrated into the parameter update process of Greedy-GQ.

A5: Build the second reinforcement learning module, taking Agent II as the agent's environment and establishing the update policy with conventional Q-learning (or Sarsa, DQN, etc.) update rules.

Upper-layer planning modeling:

The upper-layer planning constructs a chance-constrained program that maximizes the optimistic value of the objective function; the optimization objective is maximal economic benefit, and the constraints consist of objective constraints and chance constraints. Moreover, the upper-layer optimization takes the optimistic value of economic benefit as the target (i.e., with a given confidence level the obtained benefit exceeds this value), minimizing the distribution network's operating cost. The objective constraints are conditions on deterministic objects, including the unit generation cost, the penalty per unit of uncompleted generation, and the upper and lower limits of actual distributed generation output. The chance constraints are conditions on the uncertain objects of the distribution network, including probability constraints on risk tolerance and power flow security limits. The sources of uncertainty include distributed photovoltaic and wind power output, the uncertainty of generators' bids, and the deviation of conventional load forecasts.

Therefore, the mathematical expression of the upper-layer planning model is as follows:

[Objective function rendered as an image in the original publication.]

Constraint functions:

[Four constraint expressions rendered as images in the original publication.]

where:

λ — the generator's time-of-use quotation, λt being the quotation at time t;
ξ — a random variable caused by the unknown quotations of competing bidders;
(symbol rendered as image) — a random variable caused by the uncertain deviation between the actual and forecast wind and photovoltaic output;
(symbol rendered as image) — the generator's revenue under scenario ξ and the output-deviation scenario when the quotation is λ;
β — the confidence level of risk tolerance;
(symbol rendered as image) — the expected revenue attained at confidence β;
qt,ξ — under scenario ξ, the electricity this generator wins in period t, obtained from the lower-layer planning;
csξ — under scenario ξ, the per-unit-electricity new-energy subscription compensation obtained from the lower-layer decision (the lower-layer decision output);
cbase — unit generation cost;
(symbol rendered as image) — the generator's default penalty under scenario ξ and the output-deviation scenario;
γ — the unit penalty for uncompleted electricity;
(symbol rendered as image) — at time t, the unbalanced electricity by which the winning bid in scenario ξ exceeds the maximum output of the output-deviation scenario;
(symbol rendered as image) — the actual upper output limit of the distributed generation at time t in the output-deviation scenario;
T — one period, with a default value of one hour.
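Since the chance constraint bounds the probability that revenue reaches the optimistic value, it can be checked numerically by scenario sampling. The sketch below is an illustrative assumption about how such a check could be coded, with a caller-supplied revenue function over sampled (ξ, output-deviation) scenario pairs; the toy revenue function exists only for the example.

```python
import numpy as np

def chance_constraint_ok(quote, f_bar, beta, revenue, scenarios):
    """Estimate Pr{revenue >= f_bar} over sampled scenarios and
    compare it with the risk-tolerance confidence level beta."""
    hits = sum(revenue(quote, xi, dev) >= f_bar for xi, dev in scenarios)
    return hits / len(scenarios) >= beta

# illustrative use with a toy revenue function and random scenarios
rng = np.random.default_rng(1)
scenarios = [(rng.normal(), rng.normal()) for _ in range(1000)]
toy_revenue = lambda quote, xi, dev: quote * (10.0 + xi) - abs(dev)
print(chance_constraint_ok(5.0, 40.0, 0.9, toy_revenue, scenarios))
```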

Lower-layer planning modeling:

The lower-layer planning, aiming at the comprehensive benefit of market operation, optimizes the dispatch of each power plant and the allocation of winning bids. The lower-layer programming model is in fact a market-equilibrium dispatch model of the regional retail market, and its accuracy determines whether the regional market can operate normally according to the rules. Since energy storage is ignored, the region's electricity purchase sources comprise distributed generators and the external grid, and the sum of purchase costs over all periods constitutes the system's cost. In addition, since users are willing to pay a certain cost to subscribe to new energy and enjoy green power, these user payments can also be included in the comprehensive benefit. The optimization target therefore minimizes the power purchase cost while crediting the green-power subscription fees.

Therefore, the mathematical expression of the lower-layer planning model is as follows:

[Objective function rendered as an image in the original publication.]

Constraint functions:

[Eleven constraint expressions rendered as images in the original publication.]

where:

Npv, Nwp — the total numbers of photovoltaic and wind power producers in the region;
L — the total number of electricity users in the region;
(symbol rendered as image) — the unit cost of purchasing electricity from the external grid at time t;
(symbol rendered as image) — the cost of purchasing electricity from photovoltaic/wind producer i at time t;
(symbol rendered as image) — the electricity purchased from the external grid at time t;
(symbol rendered as image) — the electricity purchased from photovoltaic/wind producer i at time t;
(symbol rendered as image) — the load of electricity user i at time t;
comppv, compwp — the per-kWh subscription compensation paid by users whose subscription covers renewable energy such as photovoltaic and wind power;
Qload-pv-i, Qload-wp-i — the photovoltaic and wind subscription electricity payable by user i in the day's settlement;
Qpv, Qwp, Qgrid — the photovoltaic, wind, and external electricity consumed in the region that day;
υpv, υwp — the proportions of photovoltaic and wind generation in the region that day;
αi, βi — the proportions of photovoltaic and wind power subscribed by user i;
(symbol rendered as image) — the maximum generation at time t declared by photovoltaic/wind producer i.

Function approximator:

The function approximator uses a set of adjustable parameters and features extracted from the state-action space to estimate the Q value. The approximator then establishes a mapping from the parameter space to the Q-value function over the state-action space. The mapping can be linear or nonlinear; a linear mapping can be used to analyze solvability. The typical form of a linear approximator is as follows:

Q(s, a; θ) = θTφ(s, a)

where θ is an adjustable approximation parameter vector and φ(s, a) is the feature vector of the state-action pair, obtained from:

φ(s, a) = [ψ1(s, a), ψ2(s, a), …, ψN(s, a)]T

where ψi(s, a) are basis functions (BFs), for example Gaussian radial BFs whose centers are selected fixed points in the state space. In general, the set of BFs corresponding to the fixed points is distributed uniformly over the state space. Unless otherwise specified, all vectors herein are column vectors, and (·)T denotes matrix transposition. Radial basis neural networks have been used in stochastic nonlinear interconnected systems and have demonstrated good generalization performance.
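A minimal sketch of such a linear approximator with Gaussian radial BFs, assuming a one-dimensional state space with centers laid out uniformly over [0, 1] (all sizes illustrative):

```python
import numpy as np

class LinearQ:
    """Q(s, a; theta) = theta^T phi(s, a) with Gaussian radial BFs
    centered on fixed points spread uniformly over the state space."""
    def __init__(self, n_centers=10, n_actions=2, width=0.2):
        self.centers = np.linspace(0.0, 1.0, n_centers)
        self.width = width
        self.theta = np.zeros(n_centers * n_actions)  # adjustable parameters

    def phi(self, s, a):
        """Feature vector of a state-action pair (one RBF block per action)."""
        rbf = np.exp(-((s - self.centers) ** 2) / (2 * self.width ** 2))
        feat = np.zeros_like(self.theta)
        feat[a * len(self.centers):(a + 1) * len(self.centers)] = rbf
        return feat

    def q(self, s, a):
        return self.theta @ self.phi(s, a)

approx = LinearQ()
print(approx.q(0.4, 1))   # 0.0 before any learning
```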

Diffusion strategy:

The reinforcement learning algorithm introduces a diffusion strategy into the learning process, bringing the adapt-then-combine (ATC) mechanism into the algorithm. The diffusion strategy can achieve faster convergence and a lower mean-square deviation than a consensus strategy. In addition, the diffusion strategy responds better to continuous real-time signals and is insensitive to the neighbor weights. Its basic idea is to incorporate cooperation terms based on neighboring states into each agent's self-state update. Consider an agent i with state xi and the dynamics:

xi(k+1) = xi(k) + f(xi(k))

The diffusion strategy is as follows:

x̃i(k+1) = xi(k) + f(xi(k))

xi(k+1) = Σj∈Ni bij x̃j(k+1)

where x̃i(k+1) (a symbol rendered as an image in the original) is an intermediate term introduced by the diffusion strategy, and xi(k+1) is the state updated by combining all the intermediate terms available to agent i. Ni is the set of nodes adjacent to agent i. Furthermore, bij is the weight assigned by agent i to the neighboring agent j. Here, a matrix B = [bij] ∈ Rn×n can be defined as the topology matrix of the microgrid communication network. In general, the topology matrix B is a stochastic matrix, which means B1n = 1n, where 1n ∈ Rn is the all-ones vector.
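The adapt-then-combine steps can be sketched directly; the topology matrix B and the local dynamics f below are illustrative, with B chosen stochastic so that B1n = 1n as required above.

```python
import numpy as np

B = np.array([[0.6, 0.4, 0.0],    # stochastic topology matrix:
              [0.2, 0.6, 0.2],    # b_ij > 0 only for neighbors j of i,
              [0.0, 0.4, 0.6]])   # and each row sums to one (B @ 1 = 1)
assert np.allclose(B.sum(axis=1), 1.0)

f = lambda x: -0.1 * x            # illustrative local update dynamics
x = np.array([1.0, 0.0, -1.0])    # agent states x_i(0)

for k in range(50):
    x_tilde = x + f(x)            # adapt: intermediate term per agent
    x = B @ x_tilde               # combine: weighted neighbor averaging
print(x)                          # states driven toward agreement near 0
```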

The collaborative reinforcement learning algorithm is obtained by integrating the adapt-then-combine (ATC) diffusion strategy into the parameter update process of Greedy-GQ:

θ̃i(k+1) = θi(k) + α(k)[δi(k)φi(k) − γ(ωi(k)Tφi(k))φ̂i(k+1)]

ω̃i(k+1) = ωi(k) + β(k)[δi(k) − φi(k)Tωi(k)]φi(k)

θi(k+1) = Σj∈Ni bij θ̃j(k+1)

ωi(k+1) = Σj∈Ni bij ω̃j(k+1)

Note that the proposed collaborative reinforcement learning algorithm introduces two intermediate vectors, θ̃i(k+1) and ω̃i(k+1); the actual approximation parameter vector θi(k+1) and the correction parameter vector ωi(k+1) are combinations of the intermediate vectors of neighboring agents. In the proposed algorithm, the learning-rate parameters α(k) and β(k) can be set with conditions P(1)–P(4):

α(k) > 0, β(k) > 0    P(1)

Σk α(k) = ∞, Σk α(k)² < ∞    P(2)

Σk β(k) = ∞, Σk β(k)² < ∞    P(3)

α(k)/β(k) → 0    P(4)
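Schedules of the form 1/(k+1)^p with 0.5 < p ≤ 1 are a standard way to meet such conditions; the sketch below assumes the summability conditions P(2)–P(3) as reconstructed above and uses illustrative exponents, with α decaying faster than β so that α(k)/β(k) → 0.

```python
# Illustrative two-timescale schedules: both positive (P1); the sums
# diverge while the squared sums converge because 0.5 < p <= 1 (P2, P3);
# and alpha decays faster than beta, so alpha(k)/beta(k) -> 0 (P4).
alpha = lambda k: 1.0 / (k + 1) ** 0.9
beta = lambda k: 1.0 / (k + 1) ** 0.6

print(alpha(10) / beta(10), alpha(10_000) / beta(10_000))  # ratio shrinks
```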

The above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify or equivalently replace the specific embodiments of the invention, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention falls within the protection scope of the pending claims of the invention.

Claims (3)

1. A distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning, characterized by comprising the following steps:

Step 1) Build a two-layer stochastic decision optimization model for distributed renewable energy trading; the two-layer stochastic decision optimization model for distributed renewable energy trading in step 1) comprises upper-layer planning modeling and lower-layer planning modeling, corresponding respectively to the two parts of the energy trading process;

Step 2) Introduce a multi-agent double-layer collaborative reinforcement learning algorithm; carry out learning and training according to the theoretical framework of the multi-agent double-layer collaborative reinforcement learning algorithm, and establish a function approximator; the function approximator uses a set of adjustable parameters and features extracted from the state-action space to estimate the Q value, and the approximator establishes a mapping from the parameter space to the Q-value function over the state-action space; the mapping is linear or nonlinear, and a linear mapping is used to analyze solvability; the typical form of the function approximator is as follows:
Q(s, a; θ) = θTφ(s, a)

φ(s, a) = [ψ1(s, a), ψ2(s, a), …, ψN(s, a)]T
where θ is an adjustable approximation parameter vector, φ(s, a) is the feature vector of the state-action pair, ψ are the basis functions (BFs), and (·)T denotes matrix transposition;
Step 3) On the basis of the framework in step 2), use an iterative calculation method to obtain an estimate of the optimal Q-value function;

Step 4) Use the trained multi-agent double-layer collaborative reinforcement learning algorithm to solve the optimization model and complete the optimization calculation; the upper-layer planning modeling constructs a chance-constrained program that maximizes the optimistic value of the objective function, the optimization objective being maximal economic benefit, with constraints composed of objective constraints and chance constraints; the mathematical expression of the upper-layer planning model is as follows:
[Objective function rendered as an image in the original publication.]

Constraint functions:

[Four constraint expressions rendered as images in the original publication.]
where:

λ — the generator's time-of-use quotation, λt being the quotation at time t;
ξ — a random variable caused by the unknown quotations of competing bidders;
(symbol rendered as image) — a random variable caused by the uncertain deviation between the actual and forecast wind and photovoltaic output;
(symbol rendered as image) — the generator's revenue under scenario ξ and the output-deviation scenario when the quotation is λ;
β — the confidence level of risk tolerance;
(symbol rendered as image) — the expected revenue attained at confidence β;
qt,ξ — under scenario ξ, the electricity this generator wins in period t, obtained from the lower-layer planning;
csξ — under scenario ξ, the per-unit-electricity new-energy subscription compensation obtained from the lower-layer decision;
cbase — unit generation cost;
(symbol rendered as image) — the generator's default penalty under scenario ξ and the output-deviation scenario;
γ — the unit penalty for uncompleted electricity;
(symbol rendered as image) — at time t, the unbalanced electricity by which the winning bid in scenario ξ exceeds the maximum output of the output-deviation scenario;
(symbol rendered as image) — the actual upper output limit of the distributed generation at time t in the output-deviation scenario;
T — one period, with a default value of one hour;
the lower-layer planning modeling is used, for the bidding scenario, to optimize dispatch and allocate each generator's winning electricity with the comprehensive benefit of market operation as the objective; the mathematical expression of the lower-layer planning is as follows:
[Objective function rendered as an image in the original publication.]

Constraint functions:

[Eleven constraint expressions rendered as images in the original publication.]
where:

Npv, Nwp — the total numbers of photovoltaic and wind power producers in the region;
L — the total number of electricity users in the region;
(symbol rendered as image) — the unit cost of purchasing electricity from the external grid at time t;
(symbol rendered as image) — the cost of purchasing electricity from photovoltaic/wind producer i at time t;
(symbol rendered as image) — the electricity purchased from the external grid at time t;
(symbol rendered as image) — the electricity purchased from photovoltaic/wind producer i at time t;
(symbol rendered as image) — the load of electricity user i at time t;
comppv, compwp — the per-kWh subscription compensation paid by users whose subscription covers photovoltaic and wind renewable energy;
Qload-pv-i, Qload-wp-i — the photovoltaic and wind subscription electricity payable by user i in the day's settlement;
Qpv, Qwp, Qgrid — the photovoltaic, wind, and external electricity consumed in the region that day;
υpv, υwp — the proportions of photovoltaic and wind generation in the region that day;
αi, βi — the proportions of photovoltaic and wind power subscribed by user i;
(symbol rendered as image) — the maximum generation at time t declared by photovoltaic/wind producer i.
2. The distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning according to claim 1, characterized in that: in step 2), multiple agents are used to handle, respectively, the randomness inherent in the upper-layer and lower-layer planning models and the mutual iteration between the two layers; the two-layer collaborative reinforcement learning algorithm introduces a diffusion strategy into the reinforcement learning process, bringing the adapt-then-combine (ATC) mechanism into the reinforcement learning algorithm; the two-layer collaborative reinforcement learning algorithm adapts to the randomness and uncertainty brought by distributed renewable energy and to the computational complexity of the two-layer stochastic decision optimization model; to avoid storing a large number of Q-value tables, a function approximator is used to record the Q values of the complex continuous state and action spaces.

3. The distributed renewable energy transaction decision-making method based on multi-agent double-layer collaborative reinforcement learning according to claim 2, characterized in that the diffusion strategy achieves faster convergence and a lower mean-square deviation than a consensus strategy, the diffusion strategy being as follows:
x̃i(k+1) = xi(k) + f(xi(k))

xi(k+1) = Σj∈Ni bij x̃j(k+1)
where x̃i(k+1) (a symbol rendered as an image in the original) is an intermediate term introduced by the diffusion strategy, and xi(k+1) is the state updated by combining all the intermediate terms available to agent i; Ni is the set of nodes adjacent to agent i; bij is the weight assigned by agent i to the neighboring agent j; here, a matrix B = [bij] ∈ Rn×n is defined as the topology matrix of the microgrid communication network; the topology matrix B is a stochastic matrix, B1n = 1n, where 1n ∈ Rn is the all-ones vector.
CN201910519858.1A 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning Active CN110276698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910519858.1A CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910519858.1A CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Publications (2)

Publication Number Publication Date
CN110276698A CN110276698A (en) 2019-09-24
CN110276698B true CN110276698B (en) 2022-08-02

Family

ID=67960916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910519858.1A Active CN110276698B (en) 2019-06-17 2019-06-17 Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning

Country Status (1)

Country Link
CN (1) CN110276698B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990793B (en) * 2019-12-07 2024-03-15 国家电网有限公司 Scheduling optimization method for electric-heat-gas coupled micro-energy station
CN111064229B (en) * 2019-12-18 2023-04-07 广东工业大学 Wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning
CN111200285B (en) * 2020-02-12 2023-12-19 燕山大学 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
CN112612206B (en) * 2020-11-27 2022-11-08 合肥工业大学 Multi-agent collaborative decision-making method and system for uncertain events
CN112714165B (en) * 2020-12-22 2023-04-04 声耕智能科技(西安)研究院有限公司 Distributed network cooperation strategy optimization method and device based on combination mechanism
CN112859591B (en) * 2020-12-23 2022-10-21 华电电力科学研究院有限公司 Reinforced learning control system for operation optimization of energy system
CN113378456B (en) * 2021-05-21 2023-04-07 青海大学 Multi-Park Integrated Energy Scheduling Method and System
CN113421004B (en) * 2021-06-30 2023-05-26 国网山东省电力公司潍坊供电公司 Transmission and distribution cooperative active power distribution network distributed robust extension planning system and method
CN113555870B (en) * 2021-07-26 2023-10-13 国网江苏省电力有限公司南通供电分公司 Q-learning photovoltaic prediction-based power distribution network multi-time scale optimal scheduling method
CN113780622B (en) * 2021-08-04 2024-03-12 华南理工大学 Multi-agent reinforcement learning-based distributed scheduling method for multi-microgrid power distribution system
CN113743583B (en) * 2021-08-07 2024-02-02 中国航空工业集团公司沈阳飞机设计研究所 Method for inhibiting switching of invalid behaviors of intelligent agent based on reinforcement learning
CN114021815B (en) * 2021-11-04 2023-06-27 东南大学 Scalable energy management collaboration method for community containing large-scale producers and consumers
CN114611813B (en) * 2022-03-21 2022-09-27 特斯联科技集团有限公司 Community hot-cold water circulation optimal scheduling method and system based on hydrogen energy storage
CN115115211A (en) * 2022-06-24 2022-09-27 中国电力科学研究院有限公司 Multi-microgrid system layered reinforcement learning optimization method and system and storage medium
WO2024084125A1 (en) * 2022-10-19 2024-04-25 Aalto University Foundation Sr Trained optimization agent for renewable energy time shifting
CN117559387B (en) * 2023-10-18 2024-06-21 东南大学 VPP internal energy optimization method and system based on deep reinforcement learning dynamic pricing
CN117350515B (en) * 2023-11-21 2024-04-05 安徽大学 A method for energy flow scheduling of offshore island groups based on multi-agent reinforcement learning
CN117833359A (en) * 2023-12-28 2024-04-05 国网山东省电力公司东营供电公司 Distributed photovoltaic management method and device based on optimal consumption interval

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6820815B2 (en) * 2017-09-07 2021-01-27 株式会社日立製作所 Learning control system and learning control method
CN109325608B (en) * 2018-06-01 2022-04-01 国网上海市电力公司 Distributed power supply optimal configuration method considering energy storage and considering photovoltaic randomness

Also Published As

Publication number Publication date
CN110276698A (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN110276698B (en) Distributed renewable energy transaction decision method based on multi-agent double-layer collaborative reinforcement learning
Wang et al. Stochastic cooperative bidding strategy for multiple microgrids with peer-to-peer energy trading
Maity et al. Simulation and pricing mechanism analysis of a solar-powered electrical microgrid
Liu et al. Comparison of centralized and peer-to-peer decentralized market designs for community markets
CN110518580B (en) An active distribution network operation optimization method considering active optimization of microgrid
Li et al. Two-stage community energy trading under end-edge-cloud orchestration
CN112149914A (en) Method for optimizing and configuring power market resources under multi-constraint condition
CN113378456A (en) Multi-park comprehensive energy scheduling method and system
CN114169916B (en) Market member quotation strategy formulation method suitable for novel power system
Chuang et al. Deep reinforcement learning based pricing strategy of aggregators considering renewable energy
CN111311012A (en) Multi-agent-based micro-grid power market double-layer bidding optimization method
Yin et al. Equilibrium stability of asymmetric evolutionary games of multi-agent systems with multiple groups in open electricity market
CN114399369B (en) Multi-benefit-body random dynamic transaction game method at power distribution network side
CN111259315B (en) Decentralized scheduling method of multi-subject coordinated pricing mode
CN111553750A (en) Energy storage bidding strategy method considering power price uncertainty and loss cost
CN112636327A (en) Block chain transaction and energy storage system based solution
CN118157143A (en) GA-MADRL-PPO combination-based distributed photovoltaic optimal scheduling strategy method, device and system
Li et al. A data-driven joint chance-constrained game for renewable energy aggregators in the local market
Xu et al. Deep reinforcement learning and blockchain for peer-to-peer energy trading among microgrids
CN114204549A (en) Wind-solar-storage cluster joint optimization operation method considering energy storage sharing
Fang et al. Multiagent reinforcement learning with learning automata for microgrid energy management and decision optimization
CN117114877A (en) Medium-and-long-term power transaction method and system based on virtual power plant
Gao et al. Decision-making method of sharing mode for multi-microgrid system considering risk and coordination cost
Fan et al. Medium and long-term electricity trading considering renewable energy participation
CN118917753A (en) Cluster division-based power distribution network double-layer planning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant