JP7090440B2

JP7090440B2 - Control systems, learning devices, control devices, and control methods

Info

Publication number: JP7090440B2
Application number: JP2018049924A
Authority: JP
Inventors: 房二堀部
Original assignee: Lixil Corp
Current assignee: Lixil Corp
Priority date: 2018-03-16
Filing date: 2018-03-16
Publication date: 2022-06-24
Anticipated expiration: 2038-03-16
Also published as: JP2019162003A

Description

The present invention relates to a control system, a learning device, a control device, and a control method.

Conventionally, in a wind power generation system, the rotation speed of a wind turbine at which the generated power (regenerative power) is maximized for the wind speed received by the wind turbine is uniquely determined as a characteristic value specific to each wind turbine. For example, the rotation speed of the wind turbine is controlled by adjusting the torque of the wind turbine according to the wind speed, so that the maximum generated power for that wind speed is obtained.
Wind conditions change depending on the installation location of the wind turbine and the season. It is therefore preferable that control be performed according to the state in which the wind turbine is installed, the environment of the installation location, the season, and the like.
As a countermeasure, a technique has been disclosed in which a machine learning engine is made to learn the relationship between the wind speed and the control parameter that maximizes the regenerative power output from each wind turbine, and each wind turbine is controlled according to the control parameter output from the trained machine learning engine (for example, Patent Document 1). As a machine learning technique, there is also a method of using reinforcement learning to learn the relationship between the wind speed and the control parameter that maximizes the regenerative power. In this case, by giving a reward according to the magnitude of the regenerative power generated as a result of controlling the wind turbine, the relationship between the wind speed and the control parameter that maximizes the regenerative power can be learned.

Patent Document 1: Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2013-524744

However, if a high reward is given simply whenever the regenerative power increases, a high reward may be given even when the control parameter was inappropriate and the regenerative power increased only because the wind speed happened to change favorably. Conversely, even when the control parameter was appropriate, the reward will be low if the regenerative power decreased because the wind speed happened to change unexpectedly. As a result, there has been a problem that the learning model learns incorrectly.

The present invention has been made in view of such circumstances, and an object thereof is to provide a control device for a wind power generation system that can perform appropriate control according to the relationship between wind speed and regenerative power.

In order to solve the above problem, one embodiment of the present invention is a control system that controls regenerative power generated by a wind power generation system, the control system comprising: a learning unit that learns correspondence information between wind speed and the regenerative power based on wind speed information indicating the wind speed at the installation location of a wind turbine of the wind power generation system, rotation information regarding the rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed; a storage unit that stores the correspondence information between the wind speed and the regenerative power; a state detection unit that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system; a determination unit that determines a command value of the power information based on the rotation information and the wind speed information detected by the state detection unit and on the correspondence information; a control unit that controls the regenerative power based on the command value determined by the determination unit; and a reward calculation unit that calculates the reward based on the wind speed information, the rotation information, and the power information detected by the state detection unit, wherein the reward calculation unit calculates the reward according to the rotation speed of the wind turbine indicated by the rotation information when the wind speed indicated by the wind speed information is equal to or higher than a predetermined strong wind threshold, and calculates the reward according to the power information when the wind speed indicated by the wind speed information is lower than the predetermined strong wind threshold.

Further, one embodiment of the present invention is the above control system, wherein, when the wind speed indicated by the wind speed information detected by the state detection unit is lower than the predetermined strong wind threshold, the reward calculation unit calculates, based on the regenerative power indicated by the power information detected by the state detection unit and a reference power stored in advance, a higher reward the more the regenerative power exceeds the reference power.

Further, one embodiment of the present invention is the above control system, wherein, when the wind speed indicated by the wind speed information detected by the state detection unit is equal to or higher than the predetermined strong wind threshold, the reward calculation unit calculates, based on the rotation speed of the wind turbine indicated by the rotation information detected by the state detection unit and a rotation threshold stored in advance, a higher reward when the rotation speed is below the rotation threshold, and a higher reward the further the rotation speed is below the rotation threshold.
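As a non-authoritative illustration, the reward rule described in the embodiments above can be sketched in Python. The names `RewardConfig` and `compute_reward`, the numeric defaults, and the linear reward shapes are all assumptions made for this sketch; the source states only that the reward switches on the strong wind threshold and in which direction it should grow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # All three values are illustrative assumptions; the source gives no numbers.
    strong_wind_threshold: float = 15.0  # wind speed, m/s
    reference_power: float = 100.0       # regenerative power, W
    rotation_threshold: float = 300.0    # rotation speed, rpm


def compute_reward(wind_speed: float, rotation_speed: float,
                   regen_power: float,
                   cfg: RewardConfig = RewardConfig()) -> float:
    """Return a reward that switches on the strong wind threshold."""
    if wind_speed >= cfg.strong_wind_threshold:
        # Strong wind: the reward depends on the rotation speed only.
        # The further the rotation speed is below the threshold, the higher
        # the reward. A negative value above the threshold is an assumption
        # of this sketch, not something the source specifies.
        return (cfg.rotation_threshold - rotation_speed) / cfg.rotation_threshold
    # Otherwise: the reward depends on the regenerative power only.
    # The more the power exceeds the reference power, the higher the reward.
    return (regen_power - cfg.reference_power) / cfg.reference_power
```

With this shape, a lucky rise in power during strong wind does not raise the reward, because the power term is not consulted at all in that branch.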

Further, one embodiment of the present invention is the above control system, wherein the reward calculation unit calculates the reward based on the wind speed information acquired by the state detection unit earlier than the time at which the power information was detected by the state detection unit.
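One way to pair a power reading with an earlier wind speed reading is sketched below. The class `WindHistory`, its method names, and the fixed `delay` argument are hypothetical; the source says only that wind speed information acquired before the power detection time is used, not how the earlier sample is selected.

```python
from bisect import bisect_right


class WindHistory:
    """Time-ordered log of wind speed samples (hypothetical helper)."""

    def __init__(self):
        self._times = []   # sample timestamps, in seconds
        self._speeds = []  # wind speed at each timestamp

    def record(self, t: float, wind_speed: float) -> None:
        # Samples are assumed to arrive in increasing time order.
        self._times.append(t)
        self._speeds.append(wind_speed)

    def speed_before(self, t_power: float, delay: float) -> float:
        """Return the latest sample taken at or before t_power - delay."""
        i = bisect_right(self._times, t_power - delay)
        if i == 0:
            raise LookupError("no wind sample old enough")
        return self._speeds[i - 1]
```

The reward for a power reading taken at time `t_power` would then be computed from `speed_before(t_power, delay)` rather than from the wind speed measured at `t_power` itself.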

Further, one embodiment of the present invention is a learning device comprising a learning unit that learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of a wind turbine of a wind power generation system, rotation information regarding the rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed, wherein the reward is calculated based on the wind speed information, the rotation information, and the power information detected when a power control parameter for controlling the regenerative power is set in the wind power generation system, is calculated according to the rotation speed of the wind turbine indicated by the rotation information when the wind speed indicated by the wind speed information is equal to or higher than a predetermined strong wind threshold, and is calculated according to the power information when the wind speed indicated by the wind speed information is lower than the predetermined strong wind threshold.

Further, one embodiment of the present invention is a control device comprising: a learning unit that learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of a wind turbine of a wind power generation system, rotation information regarding the rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed; a state detection unit that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system; a determination unit that determines a command value of the power information based on the rotation information and the wind speed information detected by the state detection unit and on the correspondence information; a control unit that controls the regenerative power based on the command value determined by the determination unit; and a reward calculation unit that calculates the reward based on the wind speed information, the rotation information, and the power information detected by the state detection unit, wherein the reward calculation unit calculates the reward according to the rotation speed of the wind turbine indicated by the rotation information when the wind speed indicated by the wind speed information is equal to or higher than a predetermined strong wind threshold, and calculates the reward according to the power information when the wind speed indicated by the wind speed information is lower than the predetermined strong wind threshold.

Further, one embodiment of the present invention is a control device comprising: a state detection unit that detects rotation information regarding the rotation of a wind turbine of a wind power generation system, power information regarding the regenerative power generated by the wind power generation system, and wind speed information indicating the wind speed at the installation location of the wind turbine, each obtained when a power control parameter for controlling the regenerative power generated by the wind power generation system is set in the wind power generation system; a determination unit that determines a command value of the power information based on the rotation information and the wind speed information detected by the state detection unit and on correspondence information between the wind speed and the regenerative power; and a control unit that controls the regenerative power based on the command value determined by the determination unit, wherein the correspondence information is information learned based on the wind speed information, the rotation information, the power information, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed, and the reward is calculated based on the wind speed information, the rotation information, and the power information detected by the state detection unit, is calculated according to the rotation speed of the wind turbine indicated by the rotation information when the wind speed indicated by the wind speed information is equal to or higher than a predetermined strong wind threshold, and is calculated according to the power information when the wind speed indicated by the wind speed information is lower than the predetermined strong wind threshold.

Further, one embodiment of the present invention is a control method for controlling regenerative power generated by a wind power generation system, in which: a learning unit learns correspondence information between wind speed and the regenerative power based on wind speed information indicating the wind speed at the installation location of a wind turbine of the wind power generation system, rotation information regarding the rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed; a storage unit is made to store correspondence information between the wind speed and the rotation; a state detection unit detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system; a determination unit determines a command value of the power information based on the rotation information and the wind speed information detected by the state detection unit and on the correspondence information; a control unit controls the regenerative power based on the command value determined by the determination unit; and a reward calculation unit calculates the reward based on the wind speed information, the rotation information, and the power information detected by the state detection unit, calculating the reward according to the rotation speed of the wind turbine indicated by the rotation information when the wind speed indicated by the wind speed information is equal to or higher than a predetermined strong wind threshold, and calculating the reward according to the power information when the wind speed indicated by the wind speed information is lower than the predetermined strong wind threshold.

As described above, according to the present invention, appropriate control can be performed according to the relationship between wind speed and regenerative power.

A block diagram showing an example of the schematic configuration of the wind power generation system 1 according to the first embodiment.
A block diagram showing an example of the configuration of the control device 60 and the learning device 70 of the wind power generation system 1 according to the first embodiment.
A diagram showing an example of the reward conditions during strong wind according to the first embodiment.
A flowchart showing an operation example of the control device 60 according to the first embodiment.
A block diagram showing an example of the configuration of the control device 60A according to the second embodiment.
A diagram explaining the process by which the response time estimation unit 69 according to the second embodiment estimates the response time.
A diagram showing an example of the reward conditions according to the third embodiment.
A diagram explaining the process by which the reward calculation unit 63B according to the third embodiment calculates the reward.
A flowchart showing an operation example of the control system 50C according to the third embodiment.
A block diagram showing an example of the schematic configuration of the wind power generation system 1D according to a modification of the fourth embodiment.
A block diagram showing an example of the schematic configuration of the wind power generation system 1E according to a modification of the fifth embodiment.

Hereinafter, the control system, learning device, and control device of the embodiments will be described with reference to the drawings.

<First Embodiment>
FIG. 1 is a block diagram showing an example of the schematic configuration of a wind power generation system 1 according to the first embodiment. The wind power generation system 1 includes a wind power generator main body 10 and a control system 50. Various kinds of information are exchanged between the wind power generator main body 10 and the control system 50.
As shown in FIG. 1, for example, a control parameter for controlling the wind power generator main body 10 is output from the control system 50 to the wind power generator main body 10.
In addition, for example, a state parameter indicating the state of the wind power generator main body 10 is output from the wind power generator main body 10 to the control system 50.

The control parameter is, for example, a power control parameter that controls the amount of regenerative power generated by the generator 30.
The state parameters are, for example, wind speed information indicating the wind speed at the installation location of the wind turbine 20, rotation information indicating the rotation speed of the wind turbine 20 (hereinafter also simply referred to as the rotation speed), and power information indicating the amount of regenerative power generated by the generator 30.

The wind power generator main body 10 includes a wind turbine 20, a generator 30, a rectifying/boosting unit 31, a voltage detection unit 32, a current detection unit 33, a wind speed sensor 41, and a rotation speed sensor 42.
The wind turbine 20 is configured, for example, as a vertical-axis wind turbine, such as a straight-blade vertical-axis wind turbine in which a plurality of straight blades are connected around a rotating shaft extending in the vertical direction so as to rotate together.

The wind turbine 20 is connected, for example, to the rotor of the generator 30 (described later) via a rotating shaft, and rotates together with the rotor of the generator 30. Here, the rotor of the generator 30 rotates at a rotation speed corresponding to the amount of regenerative power generated by the generator 30. The amount of regenerative power is subject to MPPT (Maximum Power Point Tracking) control by the control system 50, described later. The rotation speed of the wind turbine 20 is therefore controlled indirectly through the MPPT control by the control system 50.

The generator 30 is a device that converts the rotational force of the wind turbine 20 into electric power. It is configured, for example, as a three-phase AC generator, and generates AC power when its rotor, which is connected to the rotating shaft of the wind turbine 20, rotates in conjunction with the wind turbine 20. The generator 30 supplies the generated AC power to the rectifying/boosting unit 31. In addition to operating as a generator that supplies generated power to the rectifying/boosting unit 31, the generator 30 also operates as an electric motor supplied with AC power from the rectifying/boosting unit 31. The generator 30 operates as an electric motor, for example, when assist control is performed to assist rotation at startup of the wind turbine 20.

The rectifying/boosting unit 31 converts the AC power generated by the generator 30 into DC power and converts (boosts) the voltage of the converted DC power. The rectifying/boosting unit 31 is, for example, a boost chopper circuit. Here, the DC power output from the rectifying/boosting unit 31 is an example of "regenerative power".

The rectifying/boosting unit 31 includes, for example, a PWM (Pulse Width Modulation) servo controller, and changes the voltage of the DC power based on the MPPT control from the control system 50. Specifically, the rectifying/boosting unit 31 performs switching according to a PWM signal from the control system 50 and outputs DC power whose voltage follows changes in the duty ratio of the PWM signal.
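For orientation only, the textbook relation between duty ratio and output voltage for an ideal boost converter in continuous conduction, V_out = V_in / (1 - D), can be written as a short Python function. This relation is not stated in the source, the function name is hypothetical, and the actual behavior of the rectifying/boosting unit 31 (losses, load dependence, control loop) may differ.

```python
def ideal_boost_output_voltage(v_in: float, duty: float) -> float:
    """Output voltage of an ideal boost converter in continuous conduction.

    Textbook idealization (lossless components, steady state); it is an
    assumption of this sketch, not a formula given in the source.
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty ratio must satisfy 0 <= duty < 1")
    # A larger duty ratio gives a larger step-up of the input voltage.
    return v_in / (1.0 - duty)
```

Under this idealization a duty ratio of 0.5 doubles a 12 V input to 24 V, which is the sense in which the output voltage follows the duty ratio of the PWM signal.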

The rectifying/boosting unit 31 is a circuit that operates as a boost chopper circuit when the generator 30 is made to generate power, and operates as an inverter when the generator 30 is operated as an electric motor, for example during assist control. The power supplied during assist control may come from a battery (not shown) of the wind power generation system 1.

The voltage detection unit 32 is a known voltmeter; it detects the output voltage of the rectifying/boosting unit 31 and outputs the detected output voltage to the control system 50.
The current detection unit 33 is a known ammeter; it detects the output current of the rectifying/boosting unit 31 and outputs the detected output current to the control system 50.

The wind speed sensor 41 is a known wind speed sensor. It is provided, for example, at a predetermined position near the wind turbine 20 (for example, a part of the wind turbine 20 other than the rotor blades) and detects the speed of the wind received by the wind turbine. The wind speed sensor 41 outputs information indicating the detected wind speed to the control system 50.

The rotation speed sensor 42 detects the rotation speed of the wind turbine 20. The rotation speed sensor 42 may be any sensor capable of detecting the rotation speed of the rotating shaft portion (not shown) of the wind turbine 20, and various known rotation speed sensors can be used. The rotation speed sensor 42 outputs information indicating the detected rotation speed to the control system 50.

The control system 50 includes a control device 60 and a learning device 70.
The control device 60 determines the power control parameter using the learning device 70, based on the wind speed detected by the wind speed sensor 41, the rotation speed detected by the rotation speed sensor 42, and the amount of regenerative power detected by the voltage detection unit 32 and the current detection unit 33. By setting the determined power control parameter in the wind power generator main body 10, the control device 60 controls the amount of regenerative power generated by the generator 30 and also (indirectly) controls the rotation speed of the wind turbine 20.

Here, the wind speed detected by the wind speed sensor 41 is an example of "wind speed information". The rotation speed detected by the rotation speed sensor 42 is an example of "rotation information". The amount of regenerative power detected by the voltage detection unit 32 and the current detection unit 33 is an example of "power information".

The learning device 70 is, for example, a device that performs reinforcement learning. In this case, the learning device 70 corresponds to the agent, that is, the learning subject in reinforcement learning, and advances its learning through interaction with the controlled object (in the present embodiment, the wind power generator main body 10) so as to control the controlled object more appropriately.

The following describes, as an example, the case where the learning device 70 performs reinforcement learning, but the present invention is not limited to this. The learning device 70 only needs to learn, based on the state of the controlled object (wind power generator main body 10), so that the parameters for controlling the controlled object become more appropriate. The learning device 70 may perform supervised learning, unsupervised learning, or other learning. Here, the state of the controlled object (wind power generator main body 10) refers to the state of the wind power generator main body 10 and its surroundings, for example variables indicated by the state parameters, such as the wind speed at the wind turbine 20, the rotation speed of the wind turbine 20, and the amount of power generated by the generator 30. The state here includes not only states that change from moment to moment, such as the wind speed mentioned above, but also predetermined states, for example the limit value of the rotation speed of the wind turbine 20, the upper and lower limits of the rotational torque of the wind turbine 20, and the maximum amount of power that the generator 30 can generate.

In the present embodiment, the learning device 70 outputs a power control parameter that controls the amount of regenerative power generated by the generator 30, observes the change in state caused by setting the output power control parameter in the wind power generator main body 10, and determines the next power control parameter according to the change in state.

The learning device 70 also receives a reward corresponding to the change in the state of the wind power generator main body 10 caused by setting the power control parameter output by the learning device 70 in the wind power generator main body 10. Using the reward as a clue, the learning device 70 judges whether the power control parameter it output was good or bad, advances its learning accordingly, and becomes able to output more suitable power control parameters.
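The output, observe, reward cycle described above can be sketched with a small epsilon-greedy value table. This is a deliberately simplified stand-in: the source does not name a learning algorithm, and the class `TabularAgent`, its parameters, and the idea of using a discretized wind speed bin as the state are assumptions of this sketch. It performs a one-step value update and ignores state transitions, so it is simpler than full reinforcement learning.

```python
import random
from collections import defaultdict


class TabularAgent:
    """Epsilon-greedy value table over (state, parameter) pairs (illustrative)."""

    def __init__(self, actions, epsilon=0.1, alpha=0.5, seed=0):
        self.actions = list(actions)  # candidate power control parameters
        self.epsilon = epsilon        # probability of trying a random parameter
        self.alpha = alpha            # step size of the value update
        self.q = defaultdict(float)   # value estimate per (state, action)
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore occasionally; otherwise output the best-valued parameter.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward):
        # Move the estimate for this pair toward the reward that was received.
        key = (state, action)
        self.q[key] += self.alpha * (reward - self.q[key])
```

In use, each cycle would call `act` with the current state, set the returned parameter in the controlled object, and then call `learn` with the reward computed from the newly observed state.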

図２は、本発明の一実施形態に係る風力発電システム１の制御装置６０の構成の一例を示すブロック図である。
図２に示すように、制御装置６０は、パラメータ取得部６１と、状態検出部６２と、報酬算出部６３と、報酬出力部６４とを備える。また、学習装置７０は、強化学習部７１を備える。ここで、強化学習部７１は、「学習部」の一例である。 FIG. 2 is a block diagram showing an example of the configuration of the control device 60 of the wind power generation system 1 according to the embodiment of the present invention.
As shown in FIG. 2, the control device 60 includes a parameter acquisition unit 61, a state detection unit 62, a reward calculation unit 63, and a reward output unit 64. Further, the learning device 70 includes a reinforcement learning unit 71. Here, the reinforcement learning unit 71 is an example of the “learning unit”.

パラメータ取得部６１は、強化学習部７１から出力される電力制御パラメータを取得する。パラメータ取得部６１は、取得した電力制御パラメータを、風力発電機本体１０に対して出力する。 The parameter acquisition unit 61 acquires the power control parameter output from the reinforcement learning unit 71. The parameter acquisition unit 61 outputs the acquired power control parameters to the wind power generator main body 10.

状態検出部６２は、風力発電機本体１０の状態を示す状態パラメータを検出する。状態パラメータは、風力発電機本体１０に含まれる風車２０や発電機３０に関する情報であり、例えば、風速センサ４１により検出された風速、回転速度センサ４２により検出された回転速度、及び発電機３０により発電された回生電力を示す情報である。状態検出部６２は、検出した状態パラメータを、報酬算出部６３に出力する。 The state detection unit 62 detects a state parameter indicating the state of the wind power generator main body 10. The state parameters are information about the wind turbine 20 and the generator 30 included in the wind power generator main body 10, and are, for example, the wind speed detected by the wind speed sensor 41, the rotation speed detected by the rotation speed sensor 42, and the generator 30. It is information indicating the regenerated electric power generated. The state detection unit 62 outputs the detected state parameter to the reward calculation unit 63.

報酬算出部６３は、状態検出部６２により検出された風速情報、及び電力情報に基づいて、報酬を算出する。報酬算出部６３は、予め定めた所定の報酬条件に応じて報酬を算出する。報酬算出部６３は、算出した報酬を報酬出力部６４に出力する。 The reward calculation unit 63 calculates the reward based on the wind speed information and the electric power information detected by the state detection unit 62. The reward calculation unit 63 calculates the reward according to a predetermined reward condition. The reward calculation unit 63 outputs the calculated reward to the reward output unit 64.

報酬出力部６４は、報酬算出部６３から取得した報酬を、強化学習部７１に出力する。 The reward output unit 64 outputs the reward acquired from the reward calculation unit 63 to the reinforcement learning unit 71.

ここで、報酬条件は、例えば、風速が強風でない通常時においては、風速に対する回生電力の電力量に応じて決定される。つまり、発電量が大きい程、高い報酬が算出される。また、強風時においては、風車２０の安全性に応じて、風車２０の回転速度がより適切に制御されたと判定される場合に、より高い報酬が得られるように設定される。風車２０の安全性とは、風車２０が機械的な耐用限界等を考慮した場合に安全に回転することができる程度であり、例えば、風速が適切である通常状態においては風車２０の回転数が多少増加しても風車２０が破損する等の危険性がなく安全であるが、風速が強い強風状態で制御する場合においては回転数が多少増加した場合でも風車２０が破損する可能性があり危険であるというような、風車２０の安全性を示す度合である。なお、通常状態とは風速が所定の強風閾値未満であることをいい、強風状態とは風速が所定の強風閾値以上であることをいう。 Here, the reward condition is determined, for example, according to the amount of regenerative power relative to the wind speed in normal times when the wind is not strong. That is, the larger the amount of power generated, the higher the calculated reward. In strong wind, the reward condition is set so that a higher reward is obtained when the rotation speed of the wind turbine 20 is determined to have been controlled more appropriately in view of the safety of the wind turbine 20. The safety of the wind turbine 20 is the degree to which the wind turbine 20 can rotate safely when its mechanical durability limit and the like are taken into consideration. For example, in the normal state where the wind speed is moderate, a slight increase in the rotation speed of the wind turbine 20 is safe because there is no danger of the wind turbine 20 being damaged, whereas under control in the strong wind state, even a slight increase in the rotation speed is dangerous because the wind turbine 20 may be damaged. The normal state means that the wind speed is less than a predetermined strong wind threshold value, and the strong wind state means that the wind speed is equal to or higher than the predetermined strong wind threshold value.

風力発電では、強風時においては、特に風車２０の回転数を抑制して風車２０が過回転となって破損してしまうことを抑制する必要がある。風車２０の回転速度が過多になった場合には風車２０を強制停止するような安全対策が講じられる場合がほとんどである。風車２０が強制停止に至れば、回生電力を得ることができなくなってしまう。このため、強風時においては、風車２０の回転数が過多とならないように制御され、強制停止に至らないように制御されることが望ましい。 In wind power generation, particularly in strong wind, it is necessary to limit the rotation speed of the wind turbine 20 so that the wind turbine 20 does not over-rotate and break. In most cases, a safety measure is provided that forcibly stops the wind turbine 20 when its rotation speed becomes excessive. Once the wind turbine 20 is forcibly stopped, regenerative power can no longer be obtained. For this reason, in strong wind it is desirable to control the wind turbine 20 so that its rotation speed does not become excessive and a forced stop is not reached.

そこで、本実施形態では、報酬算出部６３は、風速が強風ではない通常状態においては、発電された回生電力が大きい程、高い報酬を算出する。また、報酬算出部６３は、風速が強風である強風状態においては、風車２０の回転数が過多とならないように制御されている場合に、高い報酬を算出する。 Therefore, in the present embodiment, the reward calculation unit 63 calculates a higher reward as the generated regenerative power is larger in a normal state where the wind speed is not a strong wind. Further, the reward calculation unit 63 calculates a high reward when the rotation speed of the wind turbine 20 is controlled so as not to be excessive in a strong wind state where the wind speed is a strong wind.

強化学習部７１は、風力発電機本体１０から状態パラメータを取得する。また、強化学習部７１は、報酬出力部６４から報酬を取得する。強化学習部７１は、取得した状態パラメータ、及び風速に対する風車２０の安全性を示す情報に基づいて、報酬を手掛かりとしてより高い報酬が得られるように学習を進め、より適切な電力制御パラメータを出力する。 The reinforcement learning unit 71 acquires a state parameter from the wind power generator main body 10. Further, the reinforcement learning unit 71 acquires a reward from the reward output unit 64. Based on the acquired state parameters and information indicating the safety of the wind turbine 20 with respect to the wind speed, the reinforcement learning unit 71 uses the reward as a clue to advance learning so that a higher reward is obtained, and outputs more appropriate power control parameters.

図３は、第１の実施形態に係る報酬条件の例を示す図である。図３（ａ）は、風速が通常状態である場合における報酬条件の例、図３（ｂ）は、風速が強風状態である場合における報酬条件の例をそれぞれ示す。
図３（ａ）に示すように、風速が通常状態である場合において、風速に応じて基準となる発電電力を示す基準電力が予め決定される。この例では、通常の風速である風速１に対して基準電力１が、通常の風速である風速２に対して基準電力２が、それぞれ対応する基準となる発電電力である。また、報酬は、風速に応じた基準電力より発電された回生電力が高い（大きい）ほど、より高い報酬となる。 FIG. 3 is a diagram showing an example of the reward condition according to the first embodiment. FIG. 3A shows an example of a reward condition when the wind speed is in a normal state, and FIG. 3B shows an example of a reward condition when the wind speed is in a strong wind state.
As shown in FIG. 3A, when the wind speed is in the normal state, the reference power indicating the reference generated power is determined in advance according to the wind speed. In this example, the reference power 1 corresponds to the wind speed 1 which is the normal wind speed, and the reference power 2 corresponds to the wind speed 2 which is the normal wind speed. In addition, the higher (larger) the regenerative power generated than the reference power according to the wind speed, the higher the reward.

ここで、回生電力の電力量は、風速Ｖの３乗（Ｖ＾３）に比例することが知られている。このため、基準電力は、例えば、Ｐ／Ｖ＾３を指標として決定されてよい。 Here, it is known that the electric energy of the regenerative electric power is proportional to the cube of the wind speed V (V ^ 3). Therefore, the reference power may be determined using, for example, P / V ^ 3 as an index.
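
The cubic dependence matches the standard expression for the power a rotor extracts from the wind. As a brief sketch (the air density \rho, the swept area A, and the power coefficient C_p are symbols introduced here for illustration and do not appear in the original text):

```latex
P = \tfrac{1}{2}\,\rho\,A\,C_p\,V^{3}
\quad\Longrightarrow\quad
\frac{P}{V^{3}} = \tfrac{1}{2}\,\rho\,A\,C_p
```

Under these assumptions, P/V^3 no longer depends on the wind speed itself, which is presumably why it is convenient as an index: measurements taken at different wind speeds can be compared on a common scale.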

なお、風力発電においては、風車２０が受ける風速に対する回生電力の最大値は風車ごとに固有の特性値として一義的に定められている。そして、この回生電力の最大値が、風速に応じた回生電力の目標値として設定され、回生電力の目標値に近づくように回生電力が制御される。基準電力は、例えば、回生電力の最大値から所定のマージン値を減少させた値に設定される。 In wind power generation, the maximum value of regenerative power with respect to the wind speed received by the wind turbine 20 is uniquely determined as a characteristic value peculiar to each wind turbine. Then, the maximum value of this regenerative power is set as a target value of the regenerative power according to the wind speed, and the regenerative power is controlled so as to approach the target value of the regenerative power. The reference power is set to, for example, a value obtained by subtracting a predetermined margin value from the maximum value of the regenerative power.

図３（ｂ）に示すように、風速が強風状態である場合において、強風時において基準となる風車２０の回転速度の基準を示す強風基準速度が予め決定される。また、報酬は、風車２０の回転速度が強風基準回転速度より低いほど、より高い報酬となる。
この場合において、単に風車２０の回転速度が強風基準回転速度より低いほど高い報酬を与えた場合、風車の回転速度が０（ゼロ）となった場合にも高い報酬が算出されてしまう場合がある。このための対策として、風車２０の回転速度の下限値が設定されてもよい。この場合、報酬は、風車２０の回転速度が強風基準回転速度より低く、尚且つ、風車２０の回転速度が風車２０の回転速度の下限値より高い（大きい）場合に、より高い報酬となる。 As shown in FIG. 3B, when the wind speed is in a strong wind state, a strong wind reference speed indicating a reference of the rotation speed of the wind turbine 20 as a reference in a strong wind is determined in advance. Further, the reward becomes higher as the rotation speed of the wind turbine 20 is lower than the strong wind reference rotation speed.
In this case, if a higher reward were simply given the lower the rotation speed of the wind turbine 20 is below the strong wind reference rotation speed, a high reward could be calculated even when the rotation speed of the wind turbine becomes 0 (zero). As a countermeasure, a lower limit value of the rotation speed of the wind turbine 20 may be set. In this case, the reward is higher when the rotation speed of the wind turbine 20 is lower than the strong wind reference rotation speed and also higher (larger) than the lower limit value of the rotation speed of the wind turbine 20.
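
The strong wind condition with a lower limit can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name, the linear form of the reward, and the fixed penalty at or below the lower limit are assumptions. The text only states that the reward is higher when the rotation speed is below the strong wind reference speed and above the lower limit.

```python
def strong_wind_reward(rotation_speed: float,
                       reference_speed: float,
                       lower_limit: float,
                       penalty: float = -1.0) -> float:
    """Reward for the strong wind state (hypothetical linear form)."""
    # At or below the lower limit (including a stopped rotor) no high reward
    # is given; the fixed penalty value is an assumption of this sketch.
    if rotation_speed <= lower_limit:
        return penalty
    # Above the lower limit, a lower rotation speed earns a higher reward.
    # The value is positive below the reference speed and negative above it.
    return (reference_speed - rotation_speed) / (reference_speed - lower_limit)
```

With a reference speed of 100 and a lower limit of 20 (arbitrary units), a rotor at 40 scores higher than one at 60, while a stopped rotor receives the penalty instead of the highest reward.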

図４は、第１の実施形態に係る制御装置６０の動作例を示すフローチャートである。
まず、状態検出部６２は、状態パラメータを取得する（ステップＳ１１）。具体的には、状態検出部６２は、電力情報、風速情報、及び回転情報を取得する。状態検出部６２は、検出した状態パラメータを報酬算出部６３に出力する。 FIG. 4 is a flowchart showing an operation example of the control device 60 according to the first embodiment.
First, the state detection unit 62 acquires a state parameter (step S11). Specifically, the state detection unit 62 acquires power information, wind speed information, and rotation information. The state detection unit 62 outputs the detected state parameter to the reward calculation unit 63.

次に、報酬算出部６３は、報酬を算出する。
報酬算出部６３は、風速が通常であるか否かを判定する（ステップＳ１２）。報酬算出部６３は、風速が通常である場合、風速に応じた基準電力を取得する（ステップＳ１３）。基準電力は、例えば、制御装置６０の図示しない電力情報記憶部に記憶される。報酬算出部６３は、回生電力と基準電力との差分を算出し（ステップＳ１４）、差分に応じた報酬を算出する（ステップＳ１５）。具体的には、報酬算出部６３は、回生電力が基準電力に比べて大きい程、より高い報酬を算出する。 Next, the reward calculation unit 63 calculates the reward.
The reward calculation unit 63 determines whether or not the wind speed is normal (step S12). When the wind speed is normal, the reward calculation unit 63 acquires the reference power according to the wind speed (step S13). The reference power is stored, for example, in a power information storage unit (not shown) of the control device 60. The reward calculation unit 63 calculates the difference between the regenerative power and the reference power (step S14), and calculates the reward according to the difference (step S15). Specifically, the reward calculation unit 63 calculates a higher reward as the regenerative power is larger than the reference power.

一方、ステップＳ１２において、風速が通常ではなく強風である場合、報酬算出部６３は、強風時における風車２０の回転速度の基準となる強風基準速度を取得する。強風基準速度は、例えば、制御装置６０の図示しない回転情報記憶部に記憶される。報酬算出部６３は、回転速度と強風基準速度との差分を算出し（ステップＳ１７）、差分に応じた報酬を算出する（ステップＳ１５）。具体的には、報酬算出部６３は、回転速度が強風基準速度に比べて小さいほど、より高い報酬を算出する。 On the other hand, in step S12, when the wind speed is not normal but strong wind, the reward calculation unit 63 acquires a strong wind reference speed which is a reference of the rotation speed of the wind turbine 20 at the time of strong wind. The strong wind reference speed is stored, for example, in a rotation information storage unit (not shown) of the control device 60. The reward calculation unit 63 calculates the difference between the rotation speed and the strong wind reference speed (step S17), and calculates the reward according to the difference (step S15). Specifically, the reward calculation unit 63 calculates a higher reward as the rotation speed is smaller than the strong wind reference speed.
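
The flow of FIG. 4 can be sketched as a single function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the reference power is passed in as a callable instead of being read from a storage unit, and the reward is taken to be the raw difference, whereas the text only says that the reward is calculated "according to the difference".

```python
from typing import Callable


def calculate_reward(wind_speed: float,
                     regenerative_power: float,
                     rotation_speed: float,
                     strong_wind_threshold: float,
                     reference_power: Callable[[float], float],
                     strong_wind_reference_speed: float) -> float:
    """Branch on the wind state, then reward the difference from a reference."""
    if wind_speed < strong_wind_threshold:
        # Normal state (steps S13 to S15): more power than the reference
        # for this wind speed gives a higher reward.
        return regenerative_power - reference_power(wind_speed)
    # Strong wind state (step S17, then S15): a rotation speed further
    # below the strong wind reference speed gives a higher reward.
    return strong_wind_reference_speed - rotation_speed
```

For example, with a hypothetical reference curve `lambda v: 0.5 * v ** 3` and a threshold of 15, a reading of 70 at wind speed 5 is rewarded relative to the reference value 62.5, while at wind speed 20 only the rotation speed matters.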

＜第２の実施形態＞
次に第２の実施形態について説明する。
本実施形態では、制御システム５０Ａの制御装置６０Ａが応答時間を考慮した報酬を算出する点において、他の実施形態と相違する。以下では、上述した実施形態と異なる点を説明し、上述した実施形態と同一または類似の機能を有する構成に同一の符号を付し、その説明を省略する。 <Second embodiment>
Next, the second embodiment will be described.
This embodiment differs from other embodiments in that the control device 60A of the control system 50A calculates a reward in consideration of the response time. Hereinafter, the points different from those of the above-described embodiment will be described, and the same reference numerals will be given to the configurations having the same or similar functions as those of the above-described embodiments, and the description thereof will be omitted.

図５は、第２の実施形態に係る風力発電システム１Ａの構成の一例を示すブロック図である。図５に示すように、制御装置６０Ａは、応答時間推定部６９を備える。
応答時間推定部６９は、応答時間を推定する。応答時間は、風力発電機本体１０に電力制御パラメータが設定された時刻から、発電機３０により発電される回生電力が応答するまでの時刻である。回生電力が応答するとは、設定された電力制御パラメータに応じた電力値が出力されることである。 FIG. 5 is a block diagram showing an example of the configuration of the wind power generation system 1A according to the second embodiment. As shown in FIG. 5, the control device 60A includes a response time estimation unit 69.
The response time estimation unit 69 estimates the response time. The response time is the time from the time when the power control parameter is set in the wind power generator main body 10 to the time when the regenerated electric power generated by the generator 30 responds. The response of the regenerative power means that the power value corresponding to the set power control parameter is output.

応答時間推定部６９は、風速情報と電力情報とを取得し、取得した風速情報の時系列変化と電力情報の時系列変化とを比較する。応答時間推定部６９は、ある電力情報の時系列変化を示す時間区間に対し、その電力情報の時系列変化と変化の傾向が一致する、又は似ている風速情報が示される時間区間を抽出する。 The response time estimation unit 69 acquires wind speed information and electric power information, and compares the time-series change of the acquired wind speed information with the time-series change of the electric power information. For a time interval showing a time-series change of certain electric power information, the response time estimation unit 69 extracts the time interval in which the wind speed information shows a change tendency that matches or resembles that time-series change of the electric power information.

応答時間推定部６９は、例えば、ある電力情報の時系列変化を示す時間区間と、その時間区間に所定の検索単位時間遡った時間区間における電力情報の時系列変化との類似度合を算出する。応答時間推定部６９は、類似度合として、双方の時間区間の先頭の値から変化の方向を逐次比較する。変化の方向とは、例えば、値が増加しているか、又は減少しているかである。応答時間推定部６９は、変化の方向が一致している場合に類似、一致していない場合に非類似とし、時間区間の全体における類似する値の総和を類似度合として算出する。 The response time estimation unit 69 calculates, for example, the degree of similarity between the time interval showing the time-series change of a certain electric power information and the time-series change of the electric power information in the time interval retroactive to a predetermined search unit time in the time interval. The response time estimation unit 69 sequentially compares the direction of change from the value at the beginning of both time intervals as the degree of similarity. The direction of change is, for example, whether the value is increasing or decreasing. The response time estimation unit 69 makes it similar when the directions of change match, and dissimilar when they do not match, and calculates the sum of similar values in the entire time interval as the degree of similarity.

応答時間推定部６９は、電力情報の時間区間を所定の検索単位時間ずつ遡った風速情報の時間区間に対する類似度合をそれぞれ算出し、算出した類似度合が最も高い風速情報の時間区間との差分を応答時間として推定する。応答時間推定部６９は、推定した応答時間を報酬算出部６３に出力する。 The response time estimation unit 69 calculates the degree of similarity for each time interval of the wind speed information obtained by going back from the time interval of the electric power information in steps of a predetermined search unit time, and estimates, as the response time, the difference between the time interval of the electric power information and the time interval of the wind speed information with the highest degree of similarity. The response time estimation unit 69 outputs the estimated response time to the reward calculation unit 63.
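
The search described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: both series are sampled at the same uniform interval, the search unit time is expressed as a number of samples (`step`), and all function and parameter names are hypothetical. The degree of similarity is the number of positions at which both series change in the same direction.

```python
from typing import List, Optional, Sequence


def change_directions(values: Sequence[float]) -> List[int]:
    """Return +1, -1 or 0 for each pair of consecutive samples."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def similarity(power_window: Sequence[float],
               wind_window: Sequence[float]) -> int:
    """Count the positions where both series move in the same direction."""
    pairs = zip(change_directions(power_window), change_directions(wind_window))
    return sum(1 for p, w in pairs if p == w)


def estimate_response_time(power: Sequence[float],
                           wind: Sequence[float],
                           end: int,
                           window: int,
                           step: int,
                           max_steps: int) -> Optional[int]:
    """Return the lag (in samples) of the best-matching wind window.

    The power window is power[end - window:end]. Wind windows of the same
    length are tried at lags step, 2 * step, ... going back in time.
    """
    power_window = power[end - window:end]
    best_lag: Optional[int] = None
    best_score = -1
    for k in range(1, max_steps + 1):
        lag = k * step
        start = end - window - lag
        if start < 0:
            break  # not enough wind history for this lag
        score = similarity(power_window, wind[start:start + window])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

When several lags tie, this sketch keeps the smallest one; the text does not say how ties are resolved.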

ここで、応答時間は、風車２０の持つ慣性（イナーシャ）により発生する。また、応答時間は、風速の変化に応じた時間である。つまり、風速の変化に対して回転速度に一定でない遅れが発生する。このため、応答時間推定部６９は、風速に応じた応答時間を推定する。すなわち、応答時間推定部６９は、風速が変化する度に応答時間を推定する処理を行うことが望ましい。実際の風速は時々刻々と変化するため、応答時間推定部６９は、電力情報を取得する度に、その取得した電力情報に応じた応答時間を推定するようにしてよい。 Here, the response time is generated by the inertia of the wind turbine 20. The response time is a time corresponding to a change in wind speed. That is, a non-constant delay occurs in the rotation speed with respect to the change in the wind speed. Therefore, the response time estimation unit 69 estimates the response time according to the wind speed. That is, it is desirable that the response time estimation unit 69 performs a process of estimating the response time each time the wind speed changes. Since the actual wind speed changes from moment to moment, the response time estimation unit 69 may estimate the response time according to the acquired power information each time the power information is acquired.

報酬算出部６３は、応答時間推定部６９により推定された応答時間に応じて、報酬を算出する。報酬算出部６３は、風速情報を取得した時点より、応答時間だけ時間が経過した後の電力情報に基づいて、報酬を算出する。報酬算出部６３は、例えば、風速が加速した時刻において回生電力を増加させるような電力制御パラメータが設定された場合、その時刻から応答時間経過後に、実際に発電された回生電力が増加していた場合に、適切な制御が行われたとしてより高い報酬を算出する。逆に、報酬算出部６３は、風速が加速した時刻において回生電力を増加させるような電力制御パラメータが設定された場合、その時刻から応答時間経過後に、実際に発電された回生電力が減少していた場合に、不適切な制御が行われたとしてより低い報酬を算出する。 The reward calculation unit 63 calculates the reward according to the response time estimated by the response time estimation unit 69. The reward calculation unit 63 calculates the reward based on the electric power information after the response time has elapsed from the time when the wind speed information is acquired. For example, when the power control parameter for increasing the regenerated power is set at the time when the wind speed accelerates, the reward calculation unit 63 increases the regenerated power actually generated after the response time elapses from that time. In some cases, a higher reward is calculated as appropriate control is taken. On the contrary, when the power control parameter that increases the regenerated power is set at the time when the wind speed accelerates, the reward calculation unit 63 reduces the regenerated power actually generated after the response time elapses from that time. If so, a lower reward is calculated for improper control.
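
The idea of judging a control action by the power observed one response time later can be sketched as follows. The names are hypothetical, the response time is expressed as a number of samples, and comparing against the power at the moment the parameter was set is an assumption. The text describes only the case where an increase was requested; the symmetric case for a requested decrease is an extension made for this sketch.

```python
from typing import Optional, Sequence


def delayed_power_reward(power: Sequence[float],
                         t: int,
                         response_lag: int,
                         requested_change: int) -> Optional[float]:
    """Score a control action by the power observed one response time later.

    requested_change is +1 if the parameter set at sample t was meant to
    raise the regenerative power and -1 if it was meant to lower it.
    """
    t_response = t + response_lag
    if t_response >= len(power):
        return None  # the response has not been observed yet
    actual_change = power[t_response] - power[t]
    # Positive when the power moved in the requested direction,
    # negative when it moved the other way.
    return requested_change * actual_change
```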

図６は、第２の実施形態に係る応答時間推定部６９が応答時間を推定する処理を説明する図である。図６の上側では回生電力と時間の関係を示し、図６の下側では風速と時間の関係を示している。
図６に示すように、応答時間推定部６９がある時刻ｔ２における電力情報Ｐ（ｔ２）を取得した場合を例に説明する。まず、応答時間推定部６９は、電力情報Ｐ（ｔ２）を取得すると、時刻ｔ２を含む時間区間Ｔ１における電力情報の時系列変化を取得する。次に、応答時間推定部６９は、時間区間Ｔ１における電力情報の時系列変化と、時間区間Ｔ１から所定の検索単位時間αだけ遡った時間区間Ｔ２における風力発電の時系列変化との類似度合を算出する。次に、応答時間推定部６９は、時間区間Ｔ１における電力情報の時系列変化と、時間区間Ｔ１から所定の検索単位時間２αだけ遡った時間区間Ｔ３における風力発電の時系列変化との類似度合を算出する。このように、応答時間推定部６９は、順次、所定の検索単位時間αずつ遡った時間区間における風力発電の時系列変化との類似度合を算出し、算出した類似度合が最も大きい時間区間と、時間区間Ｔ１との差分を応答時間Ｄと推定する。この例では、時間区間Ｔ１における電力情報の時系列変化と、時間区間Ｔ１から時間２α遡った時間区間Ｔ３における風力情報の時系列変化の類似度合が最も大きくなる例を示している。この場合、応答時間推定部６９は、時間区間Ｔ１と時間区間Ｔ３との差分Ｄ（＝２α）を応答時間と推定する。 FIG. 6 is a diagram illustrating a process in which the response time estimation unit 69 according to the second embodiment estimates the response time. The upper part of FIG. 6 shows the relationship between the regenerative power and time, and the lower part of FIG. 6 shows the relationship between the wind speed and time.
As shown in FIG. 6, a case where the response time estimation unit 69 acquires the power information P(t2) at a certain time t2 will be described as an example. First, on acquiring the power information P(t2), the response time estimation unit 69 acquires the time-series change of the power information in the time interval T1 that includes time t2. Next, the response time estimation unit 69 calculates the degree of similarity between the time-series change of the power information in the time interval T1 and the time-series change of the wind speed information in the time interval T2, which goes back from the time interval T1 by the predetermined search unit time α. It then calculates the degree of similarity between the time-series change of the power information in the time interval T1 and the time-series change of the wind speed information in the time interval T3, which goes back from the time interval T1 by 2α. In this way, the response time estimation unit 69 successively calculates the degree of similarity for time intervals going back in steps of the search unit time α, and estimates the difference between the time interval with the highest degree of similarity and the time interval T1 as the response time D. In this example, the degree of similarity is highest between the power information in the time interval T1 and the wind speed information in the time interval T3, which goes back from T1 by 2α. In this case, the response time estimation unit 69 estimates the difference D (= 2α) between the time interval T1 and the time interval T3 as the response time.

＜第３の実施形態＞
ここでは、第３の実施形態について、説明する。本実施形態では、風速の変化の方向に応じて報酬条件が異なる点において、他の実施形態と相違する。以下では、上述した実施形態と異なる点を説明し、上述した実施形態と同一または類似の機能を有する構成に同一の符号を付し、その説明を省略する。 <Third embodiment>
Here, the third embodiment will be described. This embodiment is different from other embodiments in that the reward conditions differ depending on the direction of change in wind speed. Hereinafter, the points different from those of the above-described embodiment will be described, and the same reference numerals will be given to the configurations having the same or similar functions as those of the above-described embodiments, and the description thereof will be omitted.

図７は、第３の実施形態に係る報酬条件の例を示す図である。図７に示すように、本実施形態では、風速が増加している場合と減速している場合とで異なる報酬条件となる。
風速が増加している場合、報酬算出部６３Ｂは、他の実施形態と同様に、風速に応じた基準電力と回生電力の電力量に応じて、回生電力の電力量がより大きい場合により高い報酬を算出する。
風速が減速している場合、報酬算出部６３Ｂは、その風速が減速している時点における回生電力の電力量に応じて報酬を算出せず、その後に風速が加速した時点を考慮した回生電力の電力量に応じて報酬を算出する。
具体的には、報酬算出部６３Ｂは、風速が減速している場合、その時点から風速が加速に変化した後、所定時間が経過するまでの間の平均の風速に応じた基準電力を取得する。
また、報酬算出部６３Ｂは、風速が減速している場合、その時点から風速が加速に変化した後、所定時間が経過するまでの間の回生電力の平均である平均電力量を取得する。
そして、報酬算出部６３Ｂは、平均の風速に応じた基準電力と平均電力量とに基づいて、報酬を算出する。なお、報酬算出部６３Ｂは、応答時間を考慮した風速情報や電力情報を用いて報酬を算出してもよいし、応答時間を考慮せずに報酬を算出してもよい。 FIG. 7 is a diagram showing an example of the reward condition according to the third embodiment. As shown in FIG. 7, in the present embodiment, the reward conditions are different depending on whether the wind speed is increasing or decelerating.
When the wind speed is increasing, the reward calculation unit 63B calculates, as in the other embodiments, a higher reward when the amount of regenerative power is larger, based on the reference power corresponding to the wind speed and the amount of regenerative power.
When the wind speed is decelerating, the reward calculation unit 63B does not calculate the reward according to the amount of regenerative power at the time when the wind speed is decelerating. Instead, it calculates the reward according to an amount of regenerative power that takes into account the time at which the wind speed subsequently accelerates.
Specifically, when the wind speed is decelerating, the reward calculation unit 63B acquires the reference power corresponding to the average wind speed over the period from that time until a predetermined time has elapsed after the wind speed changes to acceleration.
Further, when the wind speed is decelerating, the reward calculation unit 63B acquires the average electric energy which is the average of the regenerative power from that time until the predetermined time elapses after the wind speed changes to acceleration.
Then, the reward calculation unit 63B calculates the reward based on the reference power and the average electric energy according to the average wind speed. The reward calculation unit 63B may calculate the reward by using the wind speed information or the electric power information in consideration of the response time, or may calculate the reward without considering the response time.

風力発電システム１においては、風速が減速する場合、風速に応じた最大の回生電力が得られる回転数で風車を回転させ続けると、発電負荷により風車の回転が失速してしまう場合がある。このため、風速が減速する場合には、風速の減速の度合い（単位時間あたりの風速の変化量）を考慮して、回生電力がなるべく維持されるように制御されることが望ましい。
しかしながら、風速が減速しているにも関わらず、その時点における回生電力の電力量に応じて、電力量が大きい場合に高い報酬を算出してしまうと、風車の回転が失速させてまで大きな電力量が得られるような制御がなされた場合にも高い報酬が算出されてしまうことになる。このような報酬が与えられた場合、強化学習部７１に、誤った学習をさせてしまう。
この対策として、本実施形態の報酬算出部６３Ｂは、風速が減速している場合、その時点から風速が加速に変化した後、所定時間が経過するまでの間の平均の風速に応じた基準電力を取得する。
なお、報酬算出部６３Ｂは、例えば、今回取得した風速情報と前回取得した風速情報との差分を算出し、算出した差分が正である場合に風速が加速していると判定し、差分が負である場合に風速が減速していると判定する。この場合、報酬算出部６３Ｂは、差分が０（ゼロ）である場合には、風速が加速していると判定してもよいし、風速が減速していると判定してもよい。 In the wind power generation system 1, when the wind speed is decelerated, if the wind turbine is continuously rotated at a rotation speed at which the maximum regenerative power corresponding to the wind speed can be obtained, the rotation of the wind turbine may stall due to the power generation load. Therefore, when the wind speed is decelerated, it is desirable to control the regenerative power so as to be maintained as much as possible in consideration of the degree of deceleration of the wind speed (the amount of change in the wind speed per unit time).
However, if a high reward is calculated for a large amount of power according to the amount of regenerative power at that time even though the wind speed is decelerating, a high reward will also be calculated for control that obtains a large amount of power at the cost of stalling the rotation of the wind turbine. Giving such a reward causes the reinforcement learning unit 71 to learn incorrectly.
As a countermeasure, when the wind speed is decelerating, the reward calculation unit 63B of the present embodiment acquires the reference power corresponding to the average wind speed over the period from that time until a predetermined time has elapsed after the wind speed changes to acceleration.
The reward calculation unit 63B calculates, for example, the difference between the wind speed information acquired this time and the wind speed information acquired last time, and if the calculated difference is positive, it is determined that the wind speed is accelerating, and the difference is negative. If, it is determined that the wind speed is decelerating. In this case, when the difference is 0 (zero), the reward calculation unit 63B may determine that the wind speed is accelerating or may determine that the wind speed is decelerating.
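
The determination above amounts to taking the sign of the difference between two consecutive samples. A minimal sketch follows, in which the function name, the string labels, and the `zero_is_accelerating` flag are assumptions introduced for illustration:

```python
def wind_trend(current_speed: float,
               previous_speed: float,
               zero_is_accelerating: bool = True) -> str:
    """Classify the wind as 'accelerating' or 'decelerating' from two samples."""
    difference = current_speed - previous_speed
    if difference > 0:
        return "accelerating"
    if difference < 0:
        return "decelerating"
    # The text allows either label for a zero difference; the flag picks one.
    return "accelerating" if zero_is_accelerating else "decelerating"
```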

図８は、第３の実施形態に係る報酬算出部６３Ｂが報酬を算出する処理を説明する図である。図８の上側では回生電力と時間の関係を示し、図８の下側では風速と時間の関係を示している。図８では、報酬算出部６３Ｂが応答時間Ｄを考慮して報酬を算出する場合を例示している。 FIG. 8 is a diagram illustrating a process in which the reward calculation unit 63B according to the third embodiment calculates a reward. The upper part of FIG. 8 shows the relationship between the regenerative power and time, and the lower part of FIG. 8 shows the relationship between the wind speed and time. FIG. 8 illustrates a case where the reward calculation unit 63B calculates the reward in consideration of the response time D.

図８に示すように、報酬算出部６３Ｂがある時刻ｔ３における風速情報Ｖｍを取得した場合を例に説明する。まず、報酬算出部６３Ｂは、時刻ｔ３における風速の変化の方向を算出する。この例では、時刻ｔ３は風速が減速する減速区間Ｖｇに含まれている場合を示している。
報酬算出部６３Ｂは、時刻ｔ３における風速が減速している場合、その時刻ｔ３から風速が加速に変化した時刻ｔ４の後、所定時間Ｔ４が経過するまでの時間区間Ｖａｖｅの間の平均の風速、つまり風速情報Ｖｍ～Ｖｍ＃までの平均の風速に応じた基準電力を取得する。なお、報酬算出部６３Ｂは、所定時間Ｔ４を風車の特性に応じて任意に決定してよい。 As shown in FIG. 8, a case where the reward calculation unit 63B acquires the wind speed information Vm at a certain time t3 will be described as an example. First, the reward calculation unit 63B calculates the direction of change in the wind speed at time t3. In this example, the time t3 shows the case where the time t3 is included in the deceleration section Vg at which the wind speed decelerates.
When the wind speed at time t3 is decelerating, the reward calculation unit 63B acquires the reference power corresponding to the average wind speed during the time interval Vave, which runs from time t3 until the predetermined time T4 has elapsed after time t4, at which the wind speed changes to acceleration; that is, the average of the wind speed information Vm to Vm#. The reward calculation unit 63B may determine the predetermined time T4 arbitrarily according to the characteristics of the wind turbine.

また、報酬算出部６３Ｂは、時間区間Ｖａｖｅに対応する時間区間Ｐａｖｅにおける回生電力の平均である平均電力量、つまり電力情報Ｐｍ～Ｐｍ＃までの平均の電力量を取得する。なお、この例では、時間区間Ｖａｖｅに対し、所定の応答時間Ｄだけ遅延させた区間が時間区間Ｐａｖｅに相当する場合を示している。
報酬算出部６３Ｂは、時間区間Ｖａｖｅにおける平均の風速に応じた基準電力に対し、時間区間Ｐａｖｅにおける平均の電力量が大きい程、より高い報酬を算出する。 Further, the reward calculation unit 63B acquires the average electric energy, which is the average of the regenerative power in the time interval Pave corresponding to the time interval Vave, that is, the average of the electric power information Pm to Pm#. In this example, the time interval Pave corresponds to the time interval Vave delayed by the predetermined response time D.
The reward calculation unit 63B calculates a higher reward the larger the average electric energy in the time interval Pave is relative to the reference power corresponding to the average wind speed in the time interval Vave.
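
The averaging over the two time intervals can be sketched as follows. This is an illustration under several assumptions: the series are uniformly sampled lists, the predetermined time T4 (`hold`) and the response time D (`lag`) are given as sample counts, time t4 is taken to be the sample from which the wind speed starts to rise, the reference power is a caller-supplied function, and the reward is the raw difference between the average power and the reference power. All names are hypothetical.

```python
from statistics import mean
from typing import Callable, Optional, Sequence


def averaged_reward(wind: Sequence[float],
                    power: Sequence[float],
                    t3: int,
                    hold: int,
                    lag: int,
                    reference_power: Callable[[float], float]) -> Optional[float]:
    """Reward for a decelerating wind, averaged up to `hold` samples past t4."""
    # t4: first sample at or after t3 from which the wind speed starts to rise.
    t4 = next((i for i in range(t3, len(wind) - 1) if wind[i + 1] > wind[i]), None)
    if t4 is None:
        return None  # the wind has not turned to acceleration yet
    end = t4 + hold
    if end >= len(wind) or end + lag >= len(power):
        return None  # the averaging windows are not fully observed yet
    mean_wind = mean(wind[t3:end + 1])                 # wind interval t3..end
    mean_power = mean(power[t3 + lag:end + lag + 1])   # same interval, delayed by lag
    return mean_power - reference_power(mean_wind)
```

Returning `None` while the windows are incomplete reflects that, in this scheme, the reward for a decelerating sample can only be computed after the fact.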

図９は、第３の実施形態に係る制御システム５０Ｃの動作例を示すフローチャートである。
まず、状態検出部６２は、状態パラメータを取得する（ステップＳ２１）。具体的には、状態検出部６２は、電力情報、風速情報、及び回転情報を取得する。状態検出部６２は、検出した状態パラメータを報酬算出部６３Ｂに出力する。 FIG. 9 is a flowchart showing an operation example of the control system 50C according to the third embodiment.
First, the state detection unit 62 acquires a state parameter (step S21). Specifically, the state detection unit 62 acquires power information, wind speed information, and rotation information. The state detection unit 62 outputs the detected state parameter to the reward calculation unit 63B.

次に、報酬算出部６３Ｂは、報酬を算出する。
報酬算出部６３Ｂは、風速が加速であるか否かを判定する（ステップＳ２２）。報酬算出部６３Ｂは、風速が加速である場合、風速に対応する回生電力を取得する（ステップＳ２３）。報酬算出部６３Ｂは、風速の基準電力と回生電力とに応じた報酬を算出する（ステップＳ２４）。具体的には、報酬算出部６３Ｂは、回生電力が基準電力に比べて大きい程、より高い報酬を算出する。 Next, the reward calculation unit 63B calculates the reward.
The reward calculation unit 63B determines whether or not the wind speed is accelerating (step S22). When the wind speed is accelerating, the reward calculation unit 63B acquires the regenerative power corresponding to the wind speed (step S23). The reward calculation unit 63B calculates a reward according to the reference power of the wind speed and the regenerative power (step S24). Specifically, the reward calculation unit 63B calculates a higher reward as the regenerative power is larger than the reference power.

一方、ステップＳ２２において、風速が加速ではなく減速である場合、報酬算出部６３Ｂは、その後風速が加速に変化した後、所定時間経過するまでの間における風速の平均値を取得する（ステップＳ２５）。報酬算出部６３Ｂは、風速の平均値を算出した時間区間に対応する時間区間における回生電力の電力量の平均を算出する（ステップＳ２６）。報酬算出部６３Ｂは、風速の平均値に応じた基準電力と回生電力の平均値とに応じた報酬を算出する（ステップＳ２７）。 On the other hand, when the wind speed is found in step S22 to be decelerating rather than accelerating, the reward calculation unit 63B acquires the average value of the wind speed over the period until a predetermined time has elapsed after the wind speed subsequently changes to acceleration (step S25). The reward calculation unit 63B calculates the average amount of regenerative power in the time interval corresponding to the time interval over which the average wind speed was calculated (step S26). The reward calculation unit 63B calculates a reward according to the reference power corresponding to the average wind speed and the average regenerative power (step S27).

(Fourth Embodiment)
Next, a fourth embodiment will be described.
The present embodiment differs from the embodiments described above in that a control device 60D (see FIG. 10) controls the rotation speed of the wind turbine 20 using a trained model.
FIG. 10 is a block diagram showing an example of the schematic configuration of a wind power generation system 1D according to a modified example of the fourth embodiment. As shown in FIG. 10, the control device 60D includes a trained model storage unit 65, a determination unit 66, and a control unit 67.

The trained model storage unit 65 stores a trained model. The trained model is a database in which information (relationship information) indicating the relationship between the state of the wind power generator main body 10, which is the controlled object, and the control applied to the wind power generator main body 10 is stored. Given the state of the wind power generator main body 10, the trained model estimates a parameter that serves as an index for controlling the wind power generator main body 10 in that state (hereinafter referred to as a control index parameter).

Here, the control index parameter is information that serves as an index for controlling the wind power generator main body 10. It may be the control parameter itself, or it may be information used to derive the control parameter.

For example, when the control index parameter is an index for controlling the regenerative power, it may be the power control parameter itself, a target value of the regenerative power, or a relative indication of how to control the regenerative power, such as increasing or decreasing it.

The trained model may be, for example, one created through learning by the reinforcement learning unit 71 in the embodiments described above. Alternatively, it may be a trained model that has learned the relationship between state and control for the wind power generation system of another wind turbine, one that has a structure similar to that of the wind turbine 20 and is installed in an area similar to the area where the wind turbine 20 is installed.

The determination unit 66 determines control information concerning the control of the wind power generator main body 10 based on the acquired control index parameter. The control information here indicates the control determined according to the control index parameter and is, for example, a command value of power information concerning the regenerative power. That is, the determination unit 66 determines the command value of the power information based on the power index parameter and outputs the determined command value to the control unit 67.

The power information here includes, for example, information indicating how the regenerative power is to change, such as whether to increase or decrease it, as well as information indicating the manner of the change, such as whether to change it stepwise or immediately.

The control unit 67 determines a control parameter for controlling the wind power generator main body 10 based on the control information determined by the determination unit 66. For example, based on the power information determined by the determination unit 66, the control unit 67 determines a power control parameter that controls the regenerative power so that it approaches the command value. The control unit 67 outputs the determined control parameter to the wind power generator main body 10 via the parameter acquisition unit 61.
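The flow from the trained model through the determination unit 66 to the control unit 67 might be sketched as below. Everything concrete here is an assumption made for illustration: the trained model is reduced to a dictionary keyed by a wind speed bin, the control index parameter is a relative instruction ("increase", "decrease", "hold"), and the control unit uses a simple proportional update with a hypothetical `gain`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PowerCommand:
    """Command value of the power information (hypothetical structure)."""
    target_watts: float
    gradual: bool  # True: change stepwise, False: change immediately


def estimate_control_index(trained_model: Dict[int, str], wind_bin: int) -> str:
    """Trained model lookup: state (wind speed bin) -> control index parameter."""
    return trained_model.get(wind_bin, "hold")


def determine_command(index: str, current_power: float, step: float = 10.0) -> PowerCommand:
    """Determination unit: turn a relative index into a concrete power target."""
    delta = {"increase": step, "decrease": -step, "hold": 0.0}[index]
    return PowerCommand(target_watts=max(0.0, current_power + delta), gradual=True)


def next_control_parameter(
    command: PowerCommand, current_power: float, current_param: float, gain: float = 0.01
) -> float:
    """Control unit: nudge the power control parameter so the power approaches the target."""
    error = command.target_watts - current_power
    return min(1.0, max(0.0, current_param + gain * error))
```

Keeping the three stages as separate functions mirrors the separation of the storage, determination, and control units, so the source of the control index parameter can be swapped without touching the later stages.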

(Fifth Embodiment)
Next, a fifth embodiment will be described.
The present embodiment differs from the embodiments described above in that the rotation speed of the wind turbine 20 is controlled using either the control index parameter (hereinafter simply referred to as a parameter) that a control device 60E outputs using the trained model or the parameter output by the learning device 70.
FIG. 11 is a block diagram showing an example of the schematic configuration of a wind power generation system 1E according to a modified example of the fifth embodiment. As shown in FIG. 11, the control device 60E includes a selection unit 68.

The selection unit 68 outputs to the determination unit 66 either the parameter output from the trained model stored in the trained model storage unit 65 or the parameter output by the learning device 70. The selection unit 68 may decide which of the two to select according to a predetermined phase. For example, in a learning phase, in which the learning device 70 is made to learn the control of the rotation speed of the wind turbine 20, the selection unit 68 selects the parameter output by the learning device 70. In a control phase, in which a trained model that has learned the control of the rotation speed is stored in the trained model storage unit 65 and is used to control the rotation speed of the wind turbine 20, the selection unit 68 selects the parameter output from that trained model.
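The phase-based selection described above amounts to a two-way switch. A minimal sketch follows, in which the `Phase` enum and the function name are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import TypeVar

T = TypeVar("T")


class Phase(Enum):
    LEARNING = "learning"  # the learning device 70 is still learning the control
    CONTROL = "control"    # a trained model is stored and used for control


def select_parameter(phase: Phase, learner_output: T, trained_model_output: T) -> T:
    """Selection unit: forward exactly one of the two candidate parameters."""
    if phase is Phase.LEARNING:
        return learner_output
    return trained_model_output
```

Because the determination unit 66 receives the same kind of parameter in either phase, the stages after the selection unit do not need to know which source produced it.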

In at least one of the embodiments described above, the content learned by the reinforcement learning unit 71 may be stored in the trained model storage unit 65 or in another storage unit (not shown), and learning may then be continued from the stored content. In this way, a model that has learned a certain amount of basic control common to wind turbines 20 can be further trained so that control suits the wind conditions of the area where the wind turbine 20 is installed, seasonal wind conditions, differences in wind conditions between day and night, the weather, and the like.

As described above, the control system 50E of the fifth embodiment is a control system that controls the regenerative power generated by the wind power generation system 1. It includes: a reinforcement learning unit 71 that learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of the wind turbine 20 of the wind power generation system 1, rotation information concerning the rotation of the wind turbine 20, power information concerning the regenerative power generated by the wind power generation system 1, relationship information between wind speed and regenerative power, and a reward corresponding to the power information for the wind speed; a trained model storage unit 65 that stores the correspondence information between wind speed and regenerative power; a state detection unit 62 that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system 1; a determination unit 66 that determines a command value of the power information based on the rotation information and wind speed information detected by the state detection unit 62 and on the correspondence information; and a control unit 67 that controls the regenerative power based on the command value determined by the determination unit 66. The control system 50E of the fifth embodiment can therefore have the reinforcement learning unit 71 learn from a reward corresponding to the power information for the wind speed, and can perform control suited to the relationship between wind speed and regenerative power.

The control system 50 of the first embodiment further includes a reward calculation unit 63 that calculates the reward based on the wind speed information and the power information detected by the state detection unit 62. The control system 50 of the first embodiment can therefore perform more appropriate control according to the relationship between the wind speed and the regenerative power detected by the state detection unit 62.

In the control system 50 of the first embodiment, the reward calculation unit 63 calculates the reward using different reward conditions depending on whether the wind speed indicated by the wind speed information is equal to or higher than a predetermined strong wind threshold. Calculating the reward in strong wind under the same conditions as in normal wind could cause erroneous learning, for example learning that makes it difficult to keep the wind turbine safe. Because the control system 50 of the first embodiment can change the reward conditions between normal wind and strong wind, it can perform more appropriate control.

In the control system 50 of the first embodiment, when the wind speed indicated by the wind speed information is below the predetermined strong wind threshold, the reward calculation unit 63 calculates a first-level reward if the rotation speed of the wind turbine 20 indicated by the rotation information is equal to or higher than a predetermined rotation threshold, and a second-level reward, lower than the first level, if the rotation speed is below the rotation threshold. Under normal conditions, the control system 50 of the first embodiment can thus calculate a higher reward when the control increases the amount of power generated, and can perform more appropriate control.

In the control system 50 of the first embodiment, when the wind speed indicated by the wind speed information is equal to or higher than the predetermined strong wind threshold, the reward calculation unit 63 calculates a third-level reward if the rotation speed of the wind turbine 20 indicated by the rotation information is equal to or higher than the predetermined rotation threshold, and a fourth-level reward, higher than the third level, if the rotation speed is below the rotation threshold. In strong wind, the control system 50 of the first embodiment can thus calculate a higher reward when the rotation speed is held down and the wind turbine is controlled safely, and can perform more appropriate control.
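The four reward levels described above can be summarized in one small function. The sketch below assumes illustrative numeric values for the levels; the description only fixes their ordering (the first level is higher than the second, and the fourth is higher than the third), so the default numbers and the function name are hypothetical.

```python
from typing import Tuple


def threshold_reward(
    wind_speed: float,
    rotation_speed: float,
    strong_wind_threshold: float,
    rotation_threshold: float,
    levels: Tuple[float, float, float, float] = (1.0, 0.0, -1.0, 1.0),
) -> float:
    """Return the reward level for the given wind speed and rotation speed."""
    level1, level2, level3, level4 = levels
    # Only the ordering of the levels comes from the description.
    if not (level1 > level2 and level4 > level3):
        raise ValueError("levels must satisfy level1 > level2 and level4 > level3")
    fast = rotation_speed >= rotation_threshold
    if wind_speed < strong_wind_threshold:
        # Normal wind: fast rotation (more generation) earns the higher reward.
        return level1 if fast else level2
    # Strong wind: slow rotation (safer operation) earns the higher reward.
    return level3 if fast else level4
```

The same rotation speed can therefore be rewarded in normal wind and penalized in strong wind, which is the change of reward conditions that the strong wind threshold introduces.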

In the control system 50 of the first embodiment, the reward calculation unit 63 may calculate the reward based on wind speed information acquired before the time at which the power information was detected. Even when a certain response time passes between the setting of the power control parameter and the power actually reaching the set value, the control system 50 of the first embodiment can calculate the reward according to the earlier wind speed and can therefore perform more appropriate control.

The learning device 70 of the first embodiment includes the reinforcement learning unit 71, which learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of the wind turbine 20 of the wind power generation system 1, rotation information concerning the rotation of the wind turbine 20, power information concerning the regenerative power generated by the wind power generation system 1, relationship information between wind speed and regenerative power, and a reward corresponding to the power information for the wind speed. The learning device 70 of the first embodiment can therefore learn how the regenerative power should be controlled, using the reward based on the power for a given wind speed as a cue, so that more appropriate control can be performed.

The control device 60 of the first embodiment includes: the reinforcement learning unit 71, which learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of the wind turbine 20 of the wind power generation system 1, rotation information concerning the rotation of the wind turbine 20, power information concerning the regenerative power generated by the wind power generation system 1, relationship information between wind speed and regenerative power, and a reward corresponding to the power information for the wind speed; a state detection unit 62 that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system 1; a determination unit 66 that determines a command value of the power information based on the rotation information and wind speed information detected by the state detection unit 62 and on the correspondence information; and a control unit 67 that controls the regenerative power based on the command value determined by the determination unit 66. The control device 60 of the first embodiment can therefore control the regenerative power using the power control parameter output by the reinforcement learning unit 71, which learns from a reward corresponding to the power information for the wind speed, so that more appropriate control can be performed.

The control device 60 of the fourth embodiment includes: a state detection unit 62 that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system 1; a determination unit 66 that determines a command value of the power information based on the rotation information and wind speed information detected by the state detection unit 62 and on correspondence information (for example, the correspondence information between wind speed and regenerative power stored in the trained model storage unit 65); and a control unit 67 that controls the regenerative power based on the command value determined by the determination unit 66. The control device 60 of the fourth embodiment can therefore perform control according to a power control parameter based on the correspondence information, so that more appropriate control can be performed.

As described above, the control system 50E of the fifth embodiment is a control system that controls the regenerative power generated by the wind power generation system 1. It includes: a reinforcement learning unit 71 that learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of the wind turbine 20 of the wind power generation system 1, rotation information concerning the rotation of the wind turbine 20, power information concerning the regenerative power generated by the wind power generation system 1, relationship information between wind speed and regenerative power, and a reward corresponding to the power information obtained after the response time in controlling the regenerative power with respect to the wind speed has elapsed; a trained model storage unit 65 that stores the correspondence information between wind speed and regenerative power; a state detection unit 62 that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system 1; a determination unit 66 that determines a command value of the power information based on the rotation information and wind speed information detected by the state detection unit 62 and on the correspondence information; and a control unit 67 that controls the regenerative power based on the command value determined by the determination unit 66. The control system 50E of the fifth embodiment can therefore have the reinforcement learning unit 71 learn from a reward corresponding to power information that takes the response time to the wind speed into account, and can perform control in accordance with the response time.

The control system 50A of the second embodiment further includes a response time estimation unit 69 that estimates the response time based on time-series changes in the wind speed information and in the power information. Based on the response time estimated by the response time estimation unit 69 and on the wind speed information and power information detected by the state detection unit 62, the reward calculation unit 63A calculates a reward corresponding to the power information obtained after the response time has elapsed from the time at which the wind speed information was detected. Because the response time estimation unit 69 estimates the response time from the time-series changes of both the wind speed information and the power information, the control system 50A of the second embodiment can estimate a response time that reflects the current time-series changes even when the response time varies with how the wind speed is changing, and can perform control according to a more appropriate response time.
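This description does not state how the response time estimation unit 69 derives the response time from the two time series. One possible approach, shown here purely as an assumption, is to pick the lag at which the power series correlates best with the earlier wind speed series and then pair each wind sample with the power observed that many samples later. The function names and the use of Pearson correlation are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def estimate_response_lag(wind: Sequence[float], power: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which power best follows the earlier wind speed."""
    if len(wind) != len(power) or max_lag >= len(wind) - 1:
        raise ValueError("series must have equal length and be longer than max_lag + 1")
    scores = {
        lag: pearson(wind[: len(wind) - lag], power[lag:]) for lag in range(max_lag + 1)
    }
    return max(scores, key=scores.get)


def delayed_pairs(
    wind: Sequence[float], power: Sequence[float], lag: int
) -> List[Tuple[float, float]]:
    """Pair each wind sample with the power observed `lag` samples later."""
    return list(zip(wind[: len(wind) - lag], power[lag:]))
```

The pairs returned by `delayed_pairs` are what a reward calculation would consume, so that each wind speed is judged against the power it actually led to rather than the power measured at the same instant.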

In the control system 50B of the third embodiment, the reward calculation unit 63B calculates the reward using different reward conditions depending on whether the wind speed indicated by the wind speed information is in an acceleration section, in which the wind speed is increasing. Even in a case where calculating the reward during deceleration under the same reward conditions as during acceleration would reduce the total amount of regenerated energy, the control system 50B of the third embodiment can change the reward conditions between acceleration and deceleration and can therefore perform more appropriate control.

In the control system 50B of the third embodiment, when the wind speed indicated by the wind speed information is in an acceleration section, the reward calculation unit 63B calculates a reward according to the difference between the reference power value for the wind speed and the power value indicated by the power information. The control system 50B of the third embodiment can thus learn control that increases the generated power during acceleration, when there is almost no possibility that the wind turbine 20 will stall, and can perform more appropriate control.

In the control system 50B of the third embodiment, when the wind speed indicated by the wind speed information is in a deceleration section, in which the wind speed is decreasing, the reward calculation unit 63B calculates a reward according to the power value indicated by the power information corresponding to wind speeds that include the wind speed information obtained after the wind speed has turned to acceleration. When increasing the regenerative power during deceleration could cause the wind turbine 20 to stall, the control system 50B of the third embodiment can thus learn control that increases the total amount of power generated by raising the regenerative power after the wind speed turns to acceleration, without increasing the generated power during deceleration, and can perform more appropriate control.

In the control system 50B of the third embodiment, when the wind speed indicated by the wind speed information is in a deceleration section, the reward calculation unit 63B calculates a reward according to the average wind speed over the period from the time of that wind speed until a predetermined time has elapsed and the average of the power values indicated by the power information corresponding to the wind speed information over the same period. When increasing the regenerative power during deceleration could cause the wind turbine 20 to stall, the control system 50B of the third embodiment can thus learn control that increases the total amount of power generated by raising the regenerative power afterward, without increasing the generated power during deceleration, and can perform more appropriate control.

The learning device 70 of the second embodiment includes the reinforcement learning unit 71, which learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of the wind turbine 20 of the wind power generation system 1, rotation information concerning the rotation of the wind turbine 20, power information concerning the regenerative power generated by the wind power generation system 1, relationship information between wind speed and regenerative power, and a reward corresponding to the power information obtained after the response time in controlling the regenerative power with respect to the wind speed has elapsed. The learning device 70 of the first embodiment can therefore learn how the regenerative power should be controlled, using the reward based on the power for a given wind speed as a cue, so that more appropriate control can be performed.

The control device 60A of the second embodiment includes: the reinforcement learning unit 71, which learns correspondence information between wind speed and regenerative power based on wind speed information indicating the wind speed at the installation location of the wind turbine 20 of the wind power generation system 1, rotation information concerning the rotation of the wind turbine 20, power information concerning the regenerative power generated by the wind power generation system 1, relationship information between wind speed and regenerative power, and a reward corresponding to the power information obtained after the response time in controlling the regenerative power with respect to the wind speed has elapsed; a state detection unit 62 that detects the rotation information, the power information, and the wind speed information obtained when a power control parameter for controlling the regenerative power is set in the wind power generation system 1; a determination unit 66 that determines a command value of the power information based on the rotation information and wind speed information detected by the state detection unit 62 and on the correspondence information; and a control unit 67 that controls the regenerative power based on the command value determined by the determination unit 66. The control device 60 of the first embodiment can therefore control the regenerative power using the power control parameter output by the reinforcement learning unit 71, which learns from a reward corresponding to the power information for the wind speed, so that more appropriate control can be performed.

All or part of the processing performed by each of the control system 50, the control device 60, and the learning device 70 in the embodiments described above may be implemented by a computer. In that case, a program for implementing these functions may be recorded on a computer-readable recording medium, and the functions may be implemented by having a computer system read and execute the program recorded on the medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into the computer system. The "computer-readable recording medium" may also include media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a certain time, such as volatile memory inside the computer system that acts as the server or client in that case. The program may implement only some of the functions described above, may implement them in combination with a program already recorded in the computer system, or may be implemented using a programmable logic device such as an FPGA.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like within a range that does not depart from the gist of the invention.

1 Wind power generation system
10 Wind power generator body
20 Wind turbine
30 Generator
31 Rectification and boosting unit
32 Voltage detection unit
33 Current detection unit
41 Wind speed sensor
42 Rotation speed sensor
50 Control system
60 Control device
61 Parameter acquisition unit
62 State detection unit
63 Reward calculation unit
64 Reward output unit
65 Trained model storage unit
66 Determination unit
67 Control unit
68 Selection unit
69 Response time estimation unit
70 Learning device
71 Reinforcement learning unit

Claims

A control system that controls regenerative power generated by a wind power generation system, the control system comprising:
a learning unit that learns correspondence information between wind speed and the regenerative power, based on wind speed information indicating the wind speed at the installation location of a wind turbine of the wind power generation system, rotation information regarding rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed;
a storage unit that stores the correspondence information between the wind speed and the regenerative power;
a state detection unit that detects the rotation information, the power information, and the wind speed information when a power control parameter for controlling the regenerative power is set in the wind power generation system;
a determination unit that determines a command value for the power information based on the correspondence information and on the rotation information and the wind speed information detected by the state detection unit;
a control unit that controls the regenerative power based on the command value determined by the determination unit; and
a reward calculation unit that calculates the reward based on the wind speed information, the rotation information, and the power information detected by the state detection unit,
wherein the reward calculation unit calculates the reward according to the rotation speed of the wind turbine indicated in the rotation information when the wind speed indicated in the wind speed information is equal to or higher than a predetermined strong wind threshold, and calculates the reward according to the power information when the wind speed indicated in the wind speed information is less than the predetermined strong wind threshold.

The control system according to claim 1, wherein, when the wind speed indicated in the wind speed information detected by the state detection unit is less than the predetermined strong wind threshold, the reward calculation unit calculates the reward based on the regenerative power indicated in the power information detected by the state detection unit and a reference power stored in advance, such that the higher the regenerative power, the higher the reward.

The control system according to claim 1 or 2, wherein, when the wind speed indicated in the wind speed information detected by the state detection unit is equal to or higher than the predetermined strong wind threshold, the reward calculation unit calculates the reward based on the rotation speed of the wind turbine indicated in the rotation information detected by the state detection unit and a rotation threshold stored in advance, such that the reward is higher when the rotation speed is smaller than the rotation threshold, and higher the smaller the rotation speed is.
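
The reward switching recited in the preceding claims can be illustrated with a minimal sketch. This is not the patented implementation: the class `RewardConfig`, the function `calculate_reward`, the units, and the linear reward shapes are all assumptions chosen only to show one way the two branches could behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical stored constants (names and units are assumptions)."""
    strong_wind_threshold: float  # wind speed, e.g. m/s
    reference_power: float        # reference regenerative power, e.g. W
    rotation_threshold: float     # rotation speed, e.g. rpm


def calculate_reward(wind_speed: float,
                     rotation_speed: float,
                     regenerative_power: float,
                     cfg: RewardConfig) -> float:
    """Return a reward that depends on rotation speed in strong wind
    and on regenerative power otherwise."""
    if wind_speed >= cfg.strong_wind_threshold:
        # Strong wind: positive when below the rotation threshold,
        # and larger the smaller the rotation speed is (assumed linear).
        return (cfg.rotation_threshold - rotation_speed) / cfg.rotation_threshold
    # Normal wind: larger regenerative power gives a larger reward,
    # scaled by the stored reference power (assumed linear).
    return regenerative_power / cfg.reference_power
```

With this shape, a learner is pushed toward maximum power in ordinary wind and toward keeping the rotor slow in strong wind; any monotonic function with the same ordering would serve equally well for illustration.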

The control system according to any one of claims 1 to 3, wherein the reward calculation unit calculates the reward based on wind speed information acquired by the state detection unit at a time earlier than the time at which the power information was detected by the state detection unit.
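
Pairing a power measurement with an earlier wind speed sample can be sketched as a timestamped history lookup. The class `WindSpeedHistory`, its method names, and the idea of subtracting a fixed response time are assumptions for illustration; the claim itself only requires that the wind speed information predate the power information.

```python
from bisect import bisect_right
from collections import deque
from typing import Optional


class WindSpeedHistory:
    """Hypothetical buffer of (time, wind speed) samples in time order."""

    def __init__(self, max_samples: int = 1000) -> None:
        self._samples = deque(maxlen=max_samples)

    def record(self, time_s: float, wind_speed: float) -> None:
        # Samples are assumed to arrive with non-decreasing timestamps.
        self._samples.append((time_s, wind_speed))

    def wind_speed_before(self, power_time_s: float,
                          response_time_s: float) -> Optional[float]:
        """Return the latest wind speed sampled at or before
        power_time_s - response_time_s, or None if none exists."""
        target = power_time_s - response_time_s
        times = [t for t, _ in self._samples]
        index = bisect_right(times, target) - 1
        if index < 0:
            return None
        return self._samples[index][1]
```

The returned wind speed would then be passed to the reward calculation in place of the wind speed measured at the same instant as the power.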

A learning device comprising a learning unit that learns correspondence information between wind speed and regenerative power, based on wind speed information indicating the wind speed at the installation location of a wind turbine of a wind power generation system, rotation information regarding rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed,
wherein the reward is calculated based on the wind speed information, the rotation information, and the power information detected when a power control parameter for controlling the regenerative power is set in the wind power generation system, is calculated according to the rotation speed of the wind turbine indicated in the rotation information when the wind speed indicated in the wind speed information is equal to or higher than a predetermined strong wind threshold, and is calculated according to the power information when the wind speed indicated in the wind speed information is less than the predetermined strong wind threshold.

A control device comprising:
a learning unit that learns correspondence information between wind speed and regenerative power, based on wind speed information indicating the wind speed at the installation location of a wind turbine of a wind power generation system, rotation information regarding rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed;
a state detection unit that detects the rotation information, the power information, and the wind speed information when a power control parameter for controlling the regenerative power is set in the wind power generation system;
a determination unit that determines a command value for the power information based on the correspondence information and on the rotation information and the wind speed information detected by the state detection unit;
a control unit that controls the regenerative power based on the command value determined by the determination unit; and
a reward calculation unit that calculates the reward based on the wind speed information, the rotation information, and the power information detected by the state detection unit,
wherein the reward calculation unit calculates the reward according to the rotation speed of the wind turbine indicated in the rotation information when the wind speed indicated in the wind speed information is equal to or higher than a predetermined strong wind threshold, and calculates the reward according to the power information when the wind speed indicated in the wind speed information is less than the predetermined strong wind threshold.

A control device comprising:
a state detection unit that detects, when a power control parameter for controlling regenerative power generated by a wind power generation system is set in the wind power generation system, rotation information regarding rotation of a wind turbine of the wind power generation system, power information regarding the regenerative power generated by the wind power generation system, and wind speed information indicating the wind speed at the installation location of the wind turbine;
a determination unit that determines a command value for the power information based on the rotation information and the wind speed information detected by the state detection unit and on correspondence information between the wind speed and the regenerative power; and
a control unit that controls the regenerative power based on the command value determined by the determination unit,
wherein the correspondence information is information learned based on the wind speed information, the rotation information, the power information, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed, and
the reward is calculated based on the wind speed information, the rotation information, and the power information detected by the state detection unit, is calculated according to the rotation speed of the wind turbine indicated in the rotation information when the wind speed indicated in the wind speed information is equal to or higher than a predetermined strong wind threshold, and is calculated according to the power information when the wind speed indicated in the wind speed information is less than the predetermined strong wind threshold.

A control method for controlling regenerative power generated by a wind power generation system, wherein:
a learning unit learns correspondence information between wind speed and the regenerative power, based on wind speed information indicating the wind speed at the installation location of a wind turbine of the wind power generation system, rotation information regarding rotation of the wind turbine, power information regarding the regenerative power generated by the wind power generation system, relationship information between the wind speed and the regenerative power, and a reward corresponding to the power information for the wind speed;
a storage unit stores the correspondence information between the wind speed and the regenerative power;
a state detection unit detects the rotation information, the power information, and the wind speed information when a power control parameter for controlling the regenerative power is set in the wind power generation system;
a determination unit determines a command value for the power information based on the correspondence information and on the rotation information and the wind speed information detected by the state detection unit;
a control unit controls the regenerative power based on the command value determined by the determination unit; and
a reward calculation unit calculates the reward based on the wind speed information, the rotation information, and the power information detected by the state detection unit, calculating the reward according to the rotation speed of the wind turbine indicated in the rotation information when the wind speed indicated in the wind speed information is equal to or higher than a predetermined strong wind threshold, and calculating the reward according to the power information when the wind speed indicated in the wind speed information is less than the predetermined strong wind threshold.
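
The learn, store, and determine steps of the method can be pictured with a small tabular sketch. The claims do not fix a learning algorithm; the epsilon-greedy value table below, the class `CorrespondenceLearner`, its parameters, and the simplification of using only binned wind speed as the state (ignoring rotation information) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class CorrespondenceLearner:
    """Hypothetical table mapping a wind speed bin to a value estimate
    for each candidate power command (the 'correspondence information')."""

    def __init__(self, commands, bin_width=1.0, learning_rate=0.1,
                 epsilon=0.1, seed=0):
        self.commands = list(commands)
        self.bin_width = bin_width
        self.learning_rate = learning_rate
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        # Storage step: one list of value estimates per wind speed bin.
        self._values = defaultdict(lambda: [0.0] * len(self.commands))

    def _bin(self, wind_speed):
        return int(wind_speed // self.bin_width)

    def determine_command(self, wind_speed, explore=True):
        """Determination step: pick the best-valued command for this bin,
        occasionally exploring a random one."""
        values = self._values[self._bin(wind_speed)]
        if explore and self._rng.random() < self.epsilon:
            index = self._rng.randrange(len(self.commands))
        else:
            index = max(range(len(values)), key=values.__getitem__)
        return self.commands[index]

    def learn(self, wind_speed, command, reward):
        """Learning step: move the stored estimate toward the reward."""
        values = self._values[self._bin(wind_speed)]
        index = self.commands.index(command)
        values[index] += self.learning_rate * (reward - values[index])
```

In a control cycle under these assumptions, the detected wind speed is passed to `determine_command`, the resulting command is applied, the reward is computed from the newly detected state, and `learn` updates the table.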