WO2023019536A1 - Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning - Google Patents

Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning

Info

Publication number
WO2023019536A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
value
action
state
photovoltaic module
Prior art date
Application number
PCT/CN2021/113655
Other languages
English (en)
French (fr)
Inventor
吴新亚
胡磊
王磊
Original Assignee
上海电气电站设备有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海电气电站设备有限公司
Priority to PCT/CN2021/113655 priority Critical patent/WO2023019536A1/zh
Publication of WO2023019536A1 publication Critical patent/WO2023019536A1/zh

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05FSYSTEMS FOR REGULATING ELECTRIC OR MAGNETIC VARIABLES
    • G05F1/00Automatic systems in which deviations of an electric quantity from one or more predetermined values are detected at the output of the system and fed back to a device within the system to restore the detected quantity to its predetermined value or values, i.e. retroactive systems
    • G05F1/66Regulating electric power
    • G05F1/67Regulating electric power to the maximum power available from a generator, e.g. from solar cell

Definitions

  • the present invention relates to a tracking control method in the technical field of solar tracking brackets, in particular to an intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning, built on a policy network and a value network.
  • photovoltaic tracking generally uses the solar trajectory tracking method, which calculates the sun angle in real time from data such as latitude and longitude, date and time, and keeps the angle between the normal direction of the photovoltaic module and the incident sunlight at a minimum.
  • under special weather conditions such as cloudy and rainy days, and in bifacial module applications, this tracking method cannot guarantee that the module is at the optimal angle, resulting in a loss of power generation. Therefore, developing an AI intelligent tracking algorithm and control system with artificial intelligence technology, to further increase the plant-wide power generation of photovoltaic power stations beyond traditional tracking, will have great market value and application prospects.
  • the present invention provides a photovoltaic module intelligent solar tracking method based on deep reinforcement learning, which corrects the tracking angle of the photovoltaic module according to environmental state data such as temperature, humidity, radiation and wind speed, so as to maximize power generation.
  • the present invention is realized through the following technical solution and includes the following steps: first, collecting state data of photovoltaic modules under various weather conditions; second, establishing a policy network and a value network; third, training the policy network and the value network; fourth, intelligently tracking the photovoltaic modules based on the policy network.
  • the state data under various weather conditions include, but are not limited to, air temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle.
  • the model of the policy network is expressed as π(a|s; θ); it makes a decision based on the observed state s to control the module bracket to act, and its output is the probability distribution of action a.
  • the value network q(a, s; w) takes the state quantity s and action a as input to evaluate the value of the action decision a made in state s, and its output is defined by the power change before and after the module adjustment.
  • the steps of training the policy network and the value network are:
  • Step 1: observe the current state s_t;
  • Step 2: based on the observed state s_t, make an action decision through the decision network π(·|s_t; θ_t) and determine the optimal action a_t in the current state;
  • Step 3: execute the action a_t to correct the tracking angle of the module, and obtain the new state s_{t+1} and reward r_t after the angle correction, where r_t is defined as the power change;
  • Step 4: based on the new observed state s_{t+1}, make an action decision through the decision network π(·|s_{t+1}; θ_t) and determine the optimal action ã_{t+1} in the new state;
  • Step 5: evaluate the values in the two states through the value network, q_t = q(a_t, s_t; w_t) and q_{t+1} = q(ã_{t+1}, s_{t+1}; w_t);
  • Step 6: compute the value estimation error δ_t = q_t - (r_t + γ·q_{t+1}), where the discount rate γ is a hyperparameter of the value network;
  • Step 7: differentiate the value network, d_{w,t} = ∂q(a_t, s_t; w)/∂w at w = w_t;
  • Step 8: update the parameters of the value network, w_{t+1} = w_t - α·δ_t·d_{w,t};
  • Step 9: differentiate the policy network, d_{θ,t} = ∂ln π(a_t|s_t; θ)/∂θ at θ = θ_t;
  • Step 10: update the parameters of the policy network, θ_{t+1} = θ_t + β·q_t·d_{θ,t}.
  • the present invention includes two neural network models. One is called the decision network, expressed as π(a|s; θ), where θ is the parameter of the network, s is the input state, and a is the action decision made according to state s; it makes decisions based on the observed state s to control the module bracket to act.
  • the other network is called the value network, expressed as q(a, s; w), where w is the parameter of the value network.
  • the state quantity s input to the policy network can contain various data that affect module power generation, such as temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle.
  • the output of the policy network is the probability distribution of action a, such as the probabilities that the module bracket makes a "+ correction", "- correction" or "no correction" action given state s; the optimal tracking action can then be selected according to the output probability distribution.
  • the value network q(a, s; w) takes the state quantity s and action a as input and is mainly used to evaluate the value of the action decision a made in state s.
  • the output feedback of the value function is defined by the power change before and after the module adjustment.
  • when the power increases, the value is positive, and the greater the increase, the higher the value; conversely, when the power decreases, the value is negative, and the greater the decrease, the lower the value. Driven by this value feedback, training pushes the neural network toward higher-value actions (those that increase module power).
  • compared with the prior art, the present invention has the following beneficial effects: it is based on big data and autonomously learns the optimal tracking strategy of photovoltaic modules through reinforcement learning algorithms, giving flexible application and wide applicability; it can automatically correct the tracking angle of the modules according to weather, site and bifacial module conditions, maximizing their power generation efficiency.
  • Fig. 1 shows the decision network model used by the present invention to formulate the optimal module tracking strategy.
  • Fig. 2 shows the value network model used by the present invention to evaluate the effect of tracking decisions.
  • Fig. 3 is a flow chart of training the policy network and the value network in the present invention.
  • collect various environmental data, such as temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle, and form these data into the state quantity s input to the decision network π(a|s; θ). The parameter of the decision network is θ.
  • the output of the decision network is the probability distribution over executing action a, and the action with the highest probability can be selected as the optimal action.
  • the state quantity s and the action decision a made by the decision network are used as the input of the value network q(a, s; w); as shown in Figure 2, the value network can evaluate the value of the current state quantity s and action a. If the generated power rises after performing action a, the value increases; if the generated power falls, the value decreases.
  • the parameters of the decision network π(a|s; θ) and the value network q(a, s; w) are randomly initialized. Therefore, at the beginning, the two networks cannot make optimal action decisions or give accurate value evaluations from the state quantity s; a certain amount of training is needed to gradually improve their performance.
  • the specific training process is shown in Figure 3. Its first step: based on the observed state s_t, an action decision is made through the decision network π(·|s_t; θ_t) to determine the optimal action a_t in the current state.

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Sustainable Development (AREA)
  • Sustainable Energy (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

An intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning, in the technical field of solar tracking brackets, comprising the following steps: first, collecting state data of photovoltaic modules under various weather conditions; second, establishing a policy network and a value network; third, training the policy network and the value network; fourth, intelligently tracking the photovoltaic modules based on the policy network. The method is based on reinforcement learning and requires no manually designed tracking strategy for the photovoltaic modules, which makes it easier to implement in practical applications. The method is driven by big data from photovoltaic power generation operation; its solar tracking control strategy for the photovoltaic modules is learned entirely from the state parameters under the value objective of maximizing power generation. The method can automatically determine the tracking strategy of bifacial modules under special weather and undulating site conditions and maximize power generation.

Description

Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning
Technical Field
The present invention relates to a tracking control method in the technical field of solar tracking brackets, in particular to an intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning, built on a policy network and a value network.
Background Art
Against the national background of carbon peaking and carbon neutrality, solar photovoltaic power generation has enormous development potential. With the arrival of the era of grid-parity photovoltaic power, continuously reducing the levelized cost of electricity (LCOE) has become an industry trend, and advanced tracking technology further helps to reduce it.
At present, photovoltaic tracking generally uses the solar trajectory tracking method, which calculates the sun angle in real time from data such as latitude and longitude, date and time, and keeps the angle between the normal direction of the photovoltaic module and the incident sunlight at a minimum. Under special weather conditions such as cloudy and rainy days, and in bifacial module applications, this tracking method cannot guarantee that the module is at the optimal angle, resulting in a loss of power generation. Therefore, developing an AI intelligent tracking algorithm and control system with artificial intelligence technology, to further increase the plant-wide power generation of photovoltaic power stations beyond traditional tracking, will have great market value and application prospects.
Summary of the Invention
To overcome the shortcomings of the prior-art solar trajectory tracking method under special weather and bifacial module conditions, the present invention provides an intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning, which can correct the tracking angle of photovoltaic modules according to environmental state data such as air temperature, humidity, radiation and wind speed, thereby maximizing power generation.
The present invention is realized through the following technical solution and comprises the following steps: first, collecting state data of photovoltaic modules under various weather conditions; second, establishing a policy network and a value network; third, training the policy network and the value network; fourth, intelligently tracking the photovoltaic modules based on the policy network.
Further, in the present invention, the state data under various weather conditions include, but are not limited to, air temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle.
Further, in the present invention, the model of the policy network is expressed as π(a|s; θ) and the model of the value network as q(a, s; w), where a is the action decision made by the decision network according to state s, s is the input state, θ is the parameter of the policy network, and w is the parameter of the value network. The policy network π(a|s; θ) makes decisions based on the observed state s to control the module bracket to act; its output is the probability distribution of action a. The value network q(a, s; w) takes the state quantity s and action a as input to evaluate the value of the action decision a made in state s; its output is defined by the power change before and after the module adjustment.
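As a concrete, non-limiting illustration of the two network models π(a|s; θ) and q(a, s; w), the sketch below implements them as small randomly initialised one-hidden-layer networks in Python with NumPy. The layer sizes, the 7-feature state vector and the one-hot 3-action encoding are assumptions chosen for illustration; the patent does not fix the architectures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 7 state features (temperature, humidity, radiation, wind
# speed, time, latitude/longitude, current angle) and 3 bracket actions.
STATE_DIM, N_ACTIONS, HIDDEN = 7, 3, 16

def init_mlp(in_dim, out_dim, hidden=HIDDEN):
    """Randomly initialised one-hidden-layer network (parameters start random)."""
    return {"W1": rng.normal(0, 0.1, (hidden, in_dim)), "b1": np.zeros(hidden),
            "W2": rng.normal(0, 0.1, (out_dim, hidden)), "b2": np.zeros(out_dim)}

def forward(p, x):
    h = np.tanh(p["W1"] @ x + p["b1"])
    return p["W2"] @ h + p["b2"]

def policy(theta, s):
    """pi(a|s; theta): softmax probability distribution over the actions."""
    z = forward(theta, s)
    e = np.exp(z - z.max())
    return e / e.sum()

def value(w, s, a):
    """q(a, s; w): scalar value of taking action a (one-hot encoded) in state s."""
    return float(forward(w, np.concatenate([s, np.eye(N_ACTIONS)[a]]))[0])

theta = init_mlp(STATE_DIM, N_ACTIONS)   # policy-network parameters
w = init_mlp(STATE_DIM + N_ACTIONS, 1)   # value-network parameters

s = rng.normal(size=STATE_DIM)           # one observed state
probs = policy(theta, s)                 # distribution over the 3 actions
a = int(np.argmax(probs))                # most probable action
q_value = value(w, s, a)                 # its estimated value
```

The softmax output matches the patent's description of the policy output as a probability distribution over bracket actions, while the value network consumes the state concatenated with the chosen action.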
Further, in the present invention, the steps for training the policy network and the value network are:
Step 1: observe the current state s_t;
Step 2: based on the observed state s_t, make an action decision through the decision network π(·|s_t; θ_t) and determine the optimal action a_t in the current state;
Step 3: execute the action a_t to correct the tracking angle of the module, and obtain the new state s_{t+1} and reward r_t after the angle correction, where r_t is defined as the power change;
Step 4: based on the new observed state s_{t+1}, make an action decision through the decision network π(·|s_{t+1}; θ_t) and determine the optimal action ã_{t+1} in the new state;
Step 5: evaluate the values in the two states through the value network, q_t = q(a_t, s_t; w_t) and q_{t+1} = q(ã_{t+1}, s_{t+1}; w_t);
Step 6: compute the value estimation error δ_t = q_t - (r_t + γ·q_{t+1}), where the discount rate γ is a hyperparameter of the value network;
Step 7: differentiate the value network, d_{w,t} = ∂q(a_t, s_t; w)/∂w at w = w_t;
Step 8: update the parameters of the value network, w_{t+1} = w_t - α·δ_t·d_{w,t};
Step 9: differentiate the policy network, d_{θ,t} = ∂ln π(a_t|s_t; θ)/∂θ at θ = θ_t;
Step 10: update the parameters of the policy network, θ_{t+1} = θ_t + β·q_t·d_{θ,t}.
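The ten training steps above amount to one iteration of an actor-critic update. The sketch below walks through them in order, using linear models as stand-ins for the two networks so that the derivatives d_{w,t} and d_{θ,t} can be written analytically; the environment stub `step_env`, the feature sizes and the learning rates α, β and discount rate γ are all illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
D, A = 7, 3                            # illustrative state and action dimensions
ALPHA, BETA, GAMMA = 0.01, 0.01, 0.9   # assumed learning rates and discount rate

def pi(theta, s):
    """Policy pi(a|s; theta): softmax over linear scores (network stand-in)."""
    z = theta @ s
    e = np.exp(z - z.max())
    return e / e.sum()

def q(w, s, a):
    """Value q(a, s; w): per-action linear model (network stand-in)."""
    return w[a] @ s

def train_step(theta, w, s_t, step_env):
    """One iteration of Steps 1-10 of the training procedure."""
    # Steps 1-2: observe s_t and pick the optimal action under the policy.
    a_t = int(np.argmax(pi(theta, s_t)))
    # Step 3: execute the action; observe the new state and power-change reward.
    s_next, r_t = step_env(s_t, a_t)
    # Step 4: optimal action in the new state.
    a_next = int(np.argmax(pi(theta, s_next)))
    # Step 5: value estimates in both states.
    q_t, q_next = q(w, s_t, a_t), q(w, s_next, a_next)
    # Step 6: value estimation error delta_t = q_t - (r_t + gamma * q_{t+1}).
    delta = q_t - (r_t + GAMMA * q_next)
    # Steps 7-8: for a linear critic, d q / d w[a_t] = s_t; gradient descent on w.
    w = w.copy()
    w[a_t] -= ALPHA * delta * s_t
    # Steps 9-10: d log pi(a_t|s_t) / d theta for softmax; gradient ascent on theta.
    p = pi(theta, s_t)
    theta = theta + BETA * q_t * (np.eye(A)[a_t] - p)[:, None] * s_t[None, :]
    return theta, w, s_next

# Hypothetical environment stub: random next state, reward favours action 0.
def step_env(s, a):
    return rng.normal(size=D), 1.0 if a == 0 else -1.0

theta = rng.normal(0, 0.1, (A, D))
w = rng.normal(0, 0.1, (A, D))
s = rng.normal(size=D)
for _ in range(50):
    theta, w, s = train_step(theta, w, s, step_env)
```

Note how Step 8 descends the squared value error while Step 10 ascends the value-weighted log-likelihood, mirroring the opposite signs in the two update formulas.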
The present invention comprises two neural network models. One is called the decision network, expressed as π(a|s; θ), where θ is the parameter of the network, s is the input state, and a is the action decision made by the decision network according to state s. The other network is called the value network, expressed as q(a, s; w), where w is the parameter of the value network.
The policy network π(a|s; θ) makes decisions based on the observed state s to control the module bracket to act. The state quantity s input to the policy network can contain various data that affect module power generation, such as air temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle. The output of the policy network is the probability distribution of action a, for example the probabilities that the module bracket makes a "+ correction", "- correction" or "no correction" action given state s; the optimal tracking action can then be selected according to the output probability distribution.
The value network q(a, s; w) takes the state quantity s and action a as input and is mainly used to evaluate the value of the action decision a made in state s. The output feedback of the value function is defined by the power change before and after the module adjustment: when the power increases, the value is positive, and the greater the increase, the higher the value; conversely, when the power decreases, the value is negative, and the greater the decrease, the lower the value. Driven by this value feedback, training pushes the neural network toward higher-value actions (those that increase module power).
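A minimal expression of this value feedback, assuming the reward signal is simply the measured power difference across an adjustment:

```python
def reward(power_before, power_after):
    """Value feedback: the power change from adjusting the module.
    Positive (and larger) when power rises more; negative when it falls."""
    return power_after - power_before

assert reward(100.0, 105.0) > 0                    # power rose: positive value
assert reward(100.0, 92.0) < 0                     # power fell: negative value
assert reward(100.0, 92.0) < reward(100.0, 98.0)   # bigger drop, lower value
```

This signed, magnitude-sensitive signal is what lets training rank candidate bracket actions rather than merely accept or reject them.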
Compared with the prior art, the present invention has the following beneficial effects: it is based on big data and autonomously learns the optimal tracking strategy of photovoltaic modules through reinforcement learning algorithms, giving flexible application and wide applicability; it can automatically correct the tracking angle of the modules according to weather, site and bifacial module conditions, maximizing their power generation efficiency.
Brief Description of the Drawings
Fig. 1 shows the decision network model used by the present invention to formulate the optimal module tracking strategy;
Fig. 2 shows the value network model used by the present invention to evaluate the effect of tracking decisions;
Fig. 3 is a flow chart of training the policy network and the value network in the present invention.
Detailed Description of Embodiments
Embodiments of the present invention are described in detail below with reference to the drawings. The embodiments are based on the technical solution of the present invention and give detailed implementations and specific operating procedures, but the protection scope of the present invention is not limited to the following embodiments.
Embodiment
Various environmental data are collected, such as air temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle, and these data form the state quantity s input to the decision network π(a|s; θ), as shown in Fig. 1. The parameter of the decision network is θ; given the input state quantity s, the decision network outputs the probability distribution over executing action a, and the action with the highest probability can be selected as the optimal action.
The state quantity s and the action decision a made by the decision network are used as the input of the value network q(a, s; w), as shown in Fig. 2; the value network can then evaluate the value of the current state quantity s and action a. If the generated power rises after executing action a, the value increases; if the generated power falls, the value decreases.
The parameters θ and w of the decision network π(a|s; θ) and the value network q(a, s; w) are randomly initialized. Therefore, at the beginning, the decision network π(a|s; θ) and the value network q(a, s; w) cannot make optimal action decisions or give accurate value evaluations from the state quantity s; a certain amount of training is needed to gradually improve the performance of the networks. The specific training process is shown in Fig. 3, with the following steps. First, based on the observed state s_t, an action decision is made through the decision network π(·|s_t; θ_t) to determine the optimal action a_t in the current state. After action a_t is executed and the state quantity changes, the new state s_{t+1} is observed and the reward r_t, i.e. the power change, is determined. Based on the new observed state s_{t+1}, the decision network determines the optimal action ã_{t+1} in the new state. The value network then computes the values in the two states s_t and s_{t+1}, q_t = q(a_t, s_t; w_t) and q_{t+1} = q(ã_{t+1}, s_{t+1}; w_t), from which the value estimation error δ_t = q_t - (r_t + γ·q_{t+1}) is computed. Differentiating the value network gives d_{w,t} = ∂q(a_t, s_t; w)/∂w at w = w_t, and the parameters of the value network are updated by gradient descent, w_{t+1} = w_t - α·δ_t·d_{w,t}. Finally, differentiating the policy network gives d_{θ,t} = ∂ln π(a_t|s_t; θ)/∂θ at θ = θ_t, and the parameters of the policy network are updated by gradient ascent, θ_{t+1} = θ_t + β·q_t·d_{θ,t}.
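To make the embodiment's flow concrete, the sketch below runs such a training loop end-to-end on a toy simulated tracker in which generated power is a cosine of the module/sun misalignment. The environment, the state features, the 1-degree correction steps and the hyperparameters are all invented for illustration and are not part of the patent; sampling actions from the policy's distribution (rather than always taking the argmax) is added here so the toy agent explores.

```python
import numpy as np

rng = np.random.default_rng(7)
ACTIONS = np.array([+1.0, -1.0, 0.0])  # "+ correction", "- correction", "none" (degrees)
ALPHA, BETA, GAMMA = 0.05, 0.05, 0.9   # assumed learning rates and discount rate

def power(angle, sun):
    """Toy generated power: maximal when the module faces the sun."""
    return np.cos(np.radians(angle - sun))

def features(angle, sun):
    """Assumed state features built from the module/sun misalignment."""
    m = np.radians(sun - angle)
    return np.array([np.sin(m), np.cos(m), 1.0])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.zeros((3, 3))               # decision-network parameters
w = np.zeros((3, 3))                   # value-network parameters (one row per action)

angle, sun = 0.0, 20.0                 # module starts 20 degrees behind the sun
for _ in range(2000):
    s = features(angle, sun)
    p = softmax(theta @ s)
    a = rng.choice(3, p=p)             # sample an action (adds exploration)
    new_angle = angle + ACTIONS[a]
    r = power(new_angle, sun) - power(angle, sun)  # reward: power change
    s2 = features(new_angle, sun)
    a2 = int(np.argmax(softmax(theta @ s2)))
    q_t, q_next = w[a] @ s, w[a2] @ s2
    delta = q_t - (r + GAMMA * q_next)             # value estimation error
    w[a] -= ALPHA * delta * s                      # gradient-descent critic update
    theta += BETA * q_t * (np.eye(3)[a] - p)[:, None] * s[None, :]  # actor update
    angle = new_angle

# Action distribution the learned policy assigns when the module lags the sun.
probs_behind = softmax(theta @ features(0.0, 20.0))
```

The loop is the embodiment's cycle in miniature: observe, act, read off the power change as reward, update the critic by descent and the actor by ascent.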
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific implementations described; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the present invention.

Claims (6)

  1. An intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning, characterized by comprising the following steps:
    first, collecting state data of photovoltaic modules under various weather conditions;
    second, establishing a policy network and a value network;
    third, training the policy network and the value network;
    fourth, intelligently tracking the photovoltaic modules based on the policy network.
  2. The intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning according to claim 1, characterized in that the state data under various weather conditions include, but are not limited to, air temperature, humidity, radiation, wind speed, time, latitude and longitude, and the current angle.
  3. The intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning according to claim 1, characterized in that the model of the policy network is expressed as π(a|s; θ) and the model of the value network as q(a, s; w), where a is the action decision made by the decision network according to state s, s is the input state, θ is the parameter of the policy network, and w is the parameter of the value network;
    the policy network π(a|s; θ) makes decisions based on the observed state s to control the module bracket to act, and its output is the probability distribution of action a;
    the value network q(a, s; w) takes the state quantity s and action a as input to evaluate the value of the action decision a made in state s, and its output is the power-change value before and after the module adjustment.
  4. The intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning according to claim 1, characterized in that the steps for training the policy network and the value network are:
    Step 1: observe the current state s_t;
    Step 2: based on the observed state s_t, make an action decision through the decision network π(·|s_t; θ_t) and determine the optimal action a_t in the current state;
    Step 3: execute the action a_t to correct the tracking angle of the module, and obtain the new state s_{t+1} and reward r_t after the angle correction, where r_t is defined as the power change;
    Step 4: based on the new observed state s_{t+1}, make an action decision through the decision network π(·|s_{t+1}; θ_t) and determine the optimal action ã_{t+1} in the new state;
    Step 5: evaluate the values in the two states through the value network, q_t = q(a_t, s_t; w_t) and q_{t+1} = q(ã_{t+1}, s_{t+1}; w_t);
    Step 6: compute the value estimation error δ_t = q_t - (r_t + γ·q_{t+1}), where the discount rate γ is a hyperparameter of the value network;
    Step 7: differentiate the value network, d_{w,t} = ∂q(a_t, s_t; w)/∂w at w = w_t;
    Step 8: update the parameters of the value network, w_{t+1} = w_t - α·δ_t·d_{w,t};
    Step 9: differentiate the policy network, d_{θ,t} = ∂ln π(a_t|s_t; θ)/∂θ at θ = θ_t;
    Step 10: update the parameters of the policy network, θ_{t+1} = θ_t + β·q_t·d_{θ,t}.
  5. The intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning according to claim 3, characterized in that the probability distribution of action a includes, but is not limited to, the probabilities that the module bracket makes a "+ correction", "- correction" or "no correction" action according to state s.
  6. The intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning according to claim 3, characterized in that, for the power-change value, the value is positive when the power increases, and the greater the increase, the higher the value; conversely, the value is negative when the power decreases, and the greater the decrease, the lower the value.
PCT/CN2021/113655 2021-08-20 2021-08-20 Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning WO2023019536A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113655 WO2023019536A1 (zh) 2021-08-20 2021-08-20 Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113655 WO2023019536A1 (zh) 2021-08-20 2021-08-20 Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
WO2023019536A1 (zh) 2023-02-23

Family

ID=85239336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113655 WO2023019536A1 (zh) 2021-08-20 2021-08-20 Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning

Country Status (1)

Country Link
WO (1) WO2023019536A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610037A (zh) * 2023-07-17 2023-08-18 中国海洋大学 Comprehensive air volume optimization control method for the ventilation system of an offshore platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243439A1 (en) * 2007-03-28 2008-10-02 Runkle Paul R Sensor exploration and management through adaptive sensing framework
CN108803321A (zh) * 2018-05-30 2018-11-13 清华大学 基于深度强化学习的自主水下航行器轨迹跟踪控制方法
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency
CN111324167A (zh) * 2020-02-27 2020-06-23 上海电力大学 一种光伏发电最大功率点跟踪控制方法与装置
CN113139655A (zh) * 2021-03-31 2021-07-20 北京大学 一种基于强化学习的目标追踪的训练方法、追踪方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243439A1 (en) * 2007-03-28 2008-10-02 Runkle Paul R Sensor exploration and management through adaptive sensing framework
CN108803321A (zh) * 2018-05-30 2018-11-13 清华大学 基于深度强化学习的自主水下航行器轨迹跟踪控制方法
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency
CN111324167A (zh) * 2020-02-27 2020-06-23 上海电力大学 一种光伏发电最大功率点跟踪控制方法与装置
CN113139655A (zh) * 2021-03-31 2021-07-20 北京大学 一种基于强化学习的目标追踪的训练方法、追踪方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610037A (zh) * 2023-07-17 2023-08-18 中国海洋大学 Comprehensive air volume optimization control method for the ventilation system of an offshore platform
CN116610037B (zh) * 2023-07-17 2023-09-29 中国海洋大学 Comprehensive air volume optimization control method for the ventilation system of an offshore platform

Similar Documents

Publication Publication Date Title
Stefenon et al. Photovoltaic power forecasting using wavelet Neuro-Fuzzy for active solar trackers
CN104834215B (zh) BP neural network PID control algorithm based on mutated particle swarm optimization
CN105955394B (zh) Photovoltaic system MPPT method based on ant colony optimization and a variable-step perturb-and-observe algorithm
CN111561732B (zh) Artificial-intelligence-based heating regulation method and system for heat exchange stations
CN104965558A (zh) Maximum power tracking method and device for photovoltaic power generation systems considering haze factors
CN110009135B (zh) Wind power prediction method based on broad learning
WO2023019536A1 (zh) Intelligent solar tracking method for photovoltaic modules based on deep reinforcement learning
CN105787592A (zh) Ultra-short-term wind power prediction method for wind turbines based on an improved RBF network
CN110648006A (zh) Day-ahead optimal dispatch method considering wind-solar correlation
CN115617083A (zh) Automatic angle adjustment method for photovoltaic solar panels based on the PPO algorithm
CN111509785A (zh) Method, system and storage medium for multi-source optimal coordinated control of power grids
CN112149905A (zh) Short-term power prediction method for photovoltaic power stations based on wavelet transform and wavelet neural networks
CN112101626A (zh) Distributed photovoltaic power generation prediction method and system
CN110336285B (zh) Optimal economic power flow calculation method for power systems
CN115632394A (zh) Transient model construction and parameter identification method for photovoltaic power stations based on the PPO algorithm
CN113555908B (zh) Energy storage optimization configuration method for smart distribution networks
CN106026103A (zh) Probabilistic power flow calculation method for wind farm integration
TW202018534A (zh) Maximum power tracking method for photovoltaic module arrays
CN116046018B (zh) Temperature compensation method applied to MEMS gyroscopes
CN114429233A (zh) Station-level optimal power control method and system based on a clustering algorithm
TW202141216A (zh) Photovoltaic device and maximum power tracking method thereof
CN114530848B (zh) Multi-time-scale dynamic partitioning method for photovoltaic-storage virtual power plants
CN108054751B (zh) Method for determining the optimal access capacity of renewable energy in a power grid system
CN110555225A (zh) RBPF-SLAM calculation method based on a hierarchical particle swarm optimization algorithm
CN112631365B (zh) Multi-peak MPPT control method for photovoltaic power generation based on SCASL

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21953779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE