JP7384311B1

JP7384311B1 - Driving support device, driving support method and program

Info

Publication number: JP7384311B1
Application number: JP2023109143A
Authority: JP
Inventors: 智志桐生
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 2023-07-03
Filing date: 2023-07-03
Publication date: 2023-11-21
Anticipated expiration: 2043-07-03

Abstract

[Problem] To provide a technology that can realize highly accurate operation support even when sufficient data on a target plant does not exist. [Solution] An operation support device according to one aspect of the present disclosure is an operation support device for supporting the operation of a target plant. It includes a pre-learning unit, a fine-tuning unit, a recommended setting value calculation unit, and a proposal unit. The pre-learning unit uses operation record data of plants other than the target plant to train a model that takes operation data representing the state of a plant as input and outputs recommended setting values for the plant. The fine-tuning unit fine-tunes the model trained by the pre-learning unit using operation record data of the target plant. Each time operation data is acquired from the target plant, the recommended setting value calculation unit uses the model to calculate recommended setting values from that operation data. The proposal unit proposes the recommended setting values calculated by the recommended setting value calculation unit to an operator of the target plant. [Selected drawing] FIG. 3

Description

The present disclosure relates to an operation support device, an operation support method, and a program.

In plants such as combustion furnaces, operators achieve stable operation by changing set values based on measured values, such as the combustion state of the furnace, while taking past conditions into account. This places a heavy burden on operators, and various techniques have been proposed to support their work. For example, Patent Document 1 describes a technique that uses a model created with a neural network to output recommended operations according to the operating state of a plant.

Patent Document 1: JP2019-159675A

However, depending on the plant targeted for operation support (hereinafter, the "target plant"), there may not be enough data representing measured values of the target plant's state, its set values, and so on. In such cases, the technique described in Patent Document 1, for example, may not be able to create a highly accurate model.

The present disclosure has been made in view of the above points. It provides a technology that can realize highly accurate operation support even when there is insufficient data on the target plant.

An operation support device according to one aspect of the present disclosure is an operation support device for supporting the operation of a target plant. It includes:
a pre-learning unit that uses operation record data of plants other than the target plant to train a model that takes operation data representing the state of a plant as input and outputs recommended setting values for the plant;
a fine-tuning unit that fine-tunes the model trained by the pre-learning unit using operation record data of the target plant;
a recommended setting value calculation unit that, each time operation data is acquired from the target plant, uses the model to calculate recommended setting values from the operation data of the target plant; and
a proposal unit that proposes the recommended setting values calculated by the recommended setting value calculation unit to an operator of the target plant.

This provides a technology that can realize highly accurate operation support even when sufficient data on the target plant does not exist.

FIG. 1 is a diagram showing an example of the overall configuration of the plant control system according to the present embodiment.
FIG. 2 is a diagram showing an example of the hardware configuration of the operation support device according to the present embodiment.
FIG. 3 is a diagram showing an example of the functional configuration of the operation support device according to the present embodiment.
FIG. 4 is a flowchart showing an example of the offline processing according to the present embodiment.
FIG. 5 is a flowchart showing an example of the online processing according to the present embodiment.
FIG. 6 is a diagram (part 1) showing an example of ranking model outputs.
FIG. 7 is a diagram (part 2) showing an example of ranking model outputs.
FIG. 8 is a diagram schematically showing a waste incineration plant in one example.
FIG. 9 is a diagram schematically showing the recommended setting values of each model in one example.
FIG. 10 is a diagram schematically showing operator set values in one example.
FIG. 11 is a diagram schematically showing learning of the reward function in one example.
FIG. 12 is a diagram schematically showing model learning in one example.
FIG. 13 is a diagram schematically showing the results of model learning in one example.

An embodiment of the present invention is described below. The following embodiment describes a plant control system 1 that includes an operation support device 10. The operation support device 10 can realize highly accurate operation support even when there is insufficient data on the target plant (measured values of the target plant's state and set values of the target plant). Here, the target plant is the plant whose operation is supported by the operation support device 10.

Hereinafter, data composed of measured values of the state of a plant is referred to as "operation data." Operation data consists of values obtained by measuring, with various sensors, physical quantities that represent various states of the plant (for example, temperature, pressure, flow rate, and gas concentration). That is, operation data is multivariate data composed of variables representing the respective physical quantities. These variables are also called "state variables."

In the following, "plant operation record data" refers to a pair consisting of the operation data at a certain time and the set value that the operator applied to the plant when that operation data was obtained (that is, the set value at that time). For example, let x_t be the operation data at time t, and let y_t be the set value that the operator applied to the plant when x_t was obtained. The plant operation record data is then expressed as (x_t, y_t). The operation data x_t at time t may also be multivariate data representing the time series of state variables measured between time t-1 and time t.
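A small data structure can illustrate the record pairing described above. The following is a minimal Python sketch that assumes operation data and set values are plain sequences of floats. The names `OperationRecord`, `make_records`, and `window_operation_data` are hypothetical, as are the sample values; none of them come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class OperationRecord:
    """One plant operation record (x_t, y_t); a hypothetical structure."""
    t: int                   # time index
    x: Tuple[float, ...]     # operation data x_t: measured state variables
    y: Tuple[float, ...]     # set value(s) y_t applied by the operator at time t


def make_records(xs: Sequence[Sequence[float]],
                 ys: Sequence[Sequence[float]],
                 start_t: int = 1) -> List[OperationRecord]:
    """Pair the operation data and set values observed at the same time index."""
    if len(xs) != len(ys):
        raise ValueError("operation data and set values must have the same length")
    return [OperationRecord(t=start_t + i, x=tuple(x), y=tuple(y))
            for i, (x, y) in enumerate(zip(xs, ys))]


def window_operation_data(samples: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Flatten the state-variable samples measured between t-1 and t into one x_t."""
    return tuple(v for sample in samples for v in sample)


# Hypothetical values: two state variables (e.g. a temperature and a flow rate), one set value
records = make_records([[850.0, 1.2], [860.0, 1.1]], [[40.0], [42.0]])
x_window = window_operation_data([[850.0, 1.2], [852.0, 1.19], [855.0, 1.15]])
```

`window_operation_data` covers the optional case in which x_t holds the whole time series measured between t-1 and t. Here that series is simply concatenated, which is one possible encoding among many.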

Furthermore, because the description below also uses plant operation record data of plants other than the target plant, the following symbols distinguish the two: the target plant is denoted p', and each plant other than the target plant is denoted p ∈ [P]. Here, P is the total number of plants other than the target plant.

<Example of overall configuration of plant control system 1>
FIG. 1 shows an example of the overall configuration of the plant control system 1 according to this embodiment. As shown in FIG. 1, the plant control system 1 includes an operation support device 10, an operator terminal 20, and a target plant 30. The operation support device 10 and the operator terminal 20 are communicably connected via an arbitrary communication network. Likewise, the operator terminal 20 and the target plant 30 are communicably connected via an arbitrary communication network, as are the target plant 30 and the operation support device 10.

The operation support device 10 trains, offline, a model that calculates recommended setting values from the operation data of the target plant 30 (hereinafter also called the "recommended setting value calculation model"). Online, each time it acquires operation data from the target plant 30, the operation support device 10 calculates recommended setting values with the trained recommended setting value calculation model, proposes them to the operator, and dynamically retrains the model. A recommended setting value is a set value recommended to the operator. The recommended setting value calculation model is a machine learning model that takes operation data as input and calculates and outputs recommended setting values. "Online" refers to the state in which the target plant 30 is in operation. "Offline" refers to any other state, for example, before the target plant 30 starts operating or while its operation is stopped.

The operator terminal 20 is one of various terminals operated by an operator of the target plant 30, for example, a PC (personal computer), smartphone, tablet terminal, or wearable device. The operator refers to the recommended setting values proposed by the operation support device 10, decides the set values to actually apply to the target plant 30, and then operates the operator terminal 20 to apply them. The operation of the target plant 30 is thereby controlled by those set values. A set value is a value, such as a manipulated variable, that is set in the target plant 30.

The target plant 30 is any of various plants subject to operation support. It is not limited to a specific kind of plant; one example is a waste incineration plant. The target plant 30 also need not be a plant at all. For example, it may be a supply-and-demand system such as an energy management system.

The overall configuration of the plant control system 1 shown in FIG. 1 is an example, and other configurations may be used. For example, the operation support device 10 and the operator terminal 20 may be integrated, and there may be a plurality of operator terminals 20.

<Example of hardware configuration of operation support device 10>
FIG. 2 shows an example of the hardware configuration of the operation support device 10 according to this embodiment. As shown in FIG. 2, the operation support device 10 includes an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a RAM (Random Access Memory) 105, a ROM (Read Only Memory) 106, an auxiliary storage device 107, and a processor 108. These pieces of hardware are communicably connected to one another via a bus 109.

The input device 101 is, for example, a keyboard, mouse, touch panel, or physical buttons. The display device 102 is, for example, a display or display panel. The operation support device 10 may omit the input device 101, the display device 102, or both.

The external I/F 103 is an interface with external devices such as a recording medium 103a. Examples of the recording medium 103a include a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.

The communication I/F 104 is an interface for connecting the operation support device 10 to a communication network. The RAM 105 is a volatile semiconductor memory (storage device) that temporarily holds programs and data. The ROM 106 is a nonvolatile semiconductor memory (storage device) that retains programs and data even when the power is turned off. The auxiliary storage device 107 is a nonvolatile storage device, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), that stores programs and data. The processor 108 is any of various arithmetic devices, such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit).

The hardware configuration shown in FIG. 2 is an example, and the operation support device 10 may have other hardware configurations. For example, it may include a plurality of auxiliary storage devices 107 or a plurality of processors 108, or various hardware other than that illustrated.

<Example of functional configuration of operation support device 10>
FIG. 3 shows an example of the functional configuration of the operation support device 10 according to this embodiment. As shown in FIG. 3, the operation support device 10 includes an offline processing unit 201 and an online processing unit 202. These units are realized, for example, by processing that one or more programs installed in the operation support device 10 cause the processor 108 or the like to execute. The operation support device 10 also includes a plant operation record storage unit 203 and a target plant operation record storage unit 204. Each storage unit is realized by a storage area of a storage device such as the auxiliary storage device 107. Alternatively, the plant operation record storage unit 203, the target plant operation record storage unit 204, or both may be realized by a storage area of a storage device connected to the operation support device 10 via a communication network.

The offline processing unit 201 trains the recommended setting value calculation model offline. For this it uses the plant operation record data of plants other than the target plant 30 and past plant operation record data of the target plant 30 (hereinafter also called "target plant operation record data"). The offline processing unit 201 includes a pre-learning unit 211 and a fine-tuning unit 212. The pre-learning unit 211 pre-trains the recommended setting value calculation model using the plant operation record data of plants other than the target plant 30. The fine-tuning unit 212 fine-tunes the pre-trained model using the target plant operation record data. As a result, the pre-trained recommended setting value calculation model is adapted to the target plant 30, yielding the trained recommended setting value calculation model.

Each time operation data is acquired online from the target plant 30, the online processing unit 202 calculates recommended setting values using that operation data and the trained recommended setting value calculation model, and then retrains the model. The online processing unit 202 includes a recommended setting value calculation unit 221, a proposal unit 222, a reward function learning unit 223, and a model learning unit 224.
The recommended setting value calculation unit 221 calculates recommended setting values using the operation data acquired from the target plant 30 and the trained recommended setting value calculation model.
The proposal unit 222 transmits the recommended setting values calculated by the recommended setting value calculation unit 221 to the operator terminal 20, thereby proposing them to the operator.
The reward function learning unit 223 uses the target plant operation record data obtained online to learn a reward function. This function outputs a higher reward the closer a recommended setting value is to the set value the operator actually applied to the target plant 30.
The model learning unit 224 uses the target plant operation record data obtained online and the reward function learned by the reward function learning unit 223. With them, it retrains the recommended setting value calculation model so as to maximize the reward calculated by that reward function.

The plant operation record storage unit 203 stores the plant operation record data of plants other than the target plant 30. This data is given to the operation support device 10 offline.

The target plant operation record storage unit 204 stores the target plant operation record data of the target plant 30. Part of this data is given to the operation support device 10 offline, and the rest is created online from the collected operation data and set values. The amount of target plant operation record data given offline is assumed to be small compared with the plant operation record data stored in the plant operation record storage unit 203. In other words, the target plant operation record data available offline is assumed to be too little to train an accurate recommended setting value calculation model on its own.

<Offline processing>
The offline processing according to this embodiment is described below with reference to FIG. 4. The offline processing is executed before the online processing described later.

The pre-learning unit 211 of the offline processing unit 201 pre-trains the recommended setting value calculation model using the plant operation record data of plants other than the target plant 30 (step S101). For example, the pre-learning unit 211 uses a known optimization method to calculate the parameters of the recommended setting value calculation model according to Equation (1). The recommended setting value calculation model with these parameters set is obtained as the pre-trained recommended setting value calculation model.

Here, f^n(·; w_n) is the n-th recommended setting value calculation model, w_n (n ∈ [N]) is the parameter of the n-th model, and N is the total number of recommended setting value calculation models. t_p is a variable representing the time index of plant p, and T_p is the time index of the last entry in the plant operation record data of plant p. Hereinafter, the n-th recommended setting value calculation model may be written simply as f^n, omitting the parameter w_n. A time index is also referred to simply as a time.

In addition, x^p_{t_p} is the operation data included in the plant operation record data of plant p at time t_p, and y^p_{t_p} is the set value included in that record. Equation (1) means the following: each recommended setting value calculation model is trained to minimize the error between two values. The first is the recommended setting value that the model calculates from the operation data of each plant p other than the target plant 30 at each time t_p = 1, …, T_p. The second is the set value actually applied at that time.
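Equation (1) itself is not reproduced in this text, so the following Python sketch only illustrates the idea under stated assumptions. The assumptions are: each of the N models is linear, each set value is a scalar, the error is squared error, and the "known optimization method" is plain stochastic gradient descent on records pooled from all plants p ∈ [P]. To obtain N distinct models, each model here uses a different L2 penalty. All function names, penalties, and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], float]  # (operation data x, scalar set value y); assumed


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Assumed linear model f(x; w) = w[0] + sum_i w[i + 1] * x[i]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def sgd_fit(records: Sequence[Record], w_init: Sequence[float], lr: float = 0.1,
            epochs: int = 500, l2: float = 0.0, seed: int = 0) -> List[float]:
    """Minimize squared error (plus an optional L2 penalty) by stochastic gradient descent."""
    rng = random.Random(seed)
    w = list(w_init)
    data = list(records)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(w, x) - y          # derivative of 0.5 * (f(x; w) - y)^2 w.r.t. f
            w[0] -= lr * err                 # the intercept is not penalized
            for i, xi in enumerate(x):
                w[i + 1] -= lr * (err * xi + l2 * w[i + 1])
    return w


def pretrain(plant_records: Sequence[Sequence[Record]],
             l2_per_model: Sequence[float]) -> List[List[float]]:
    """Pre-train N = len(l2_per_model) models on the pooled records of all plants p in [P]."""
    pooled = [r for recs in plant_records for r in recs]
    dim = len(pooled[0][0])
    return [sgd_fit(pooled, [0.0] * (dim + 1), l2=l2, seed=n)
            for n, l2 in enumerate(l2_per_model)]


# Hypothetical data for P = 2 other plants, both following y = 2 + 3x
plant_a = [([x], 2.0 + 3.0 * x) for x in (0.0, 0.25, 0.5)]
plant_b = [([x], 2.0 + 3.0 * x) for x in (0.75, 1.0)]
models = pretrain([plant_a, plant_b], l2_per_model=[0.0, 0.01])
```

Pooling all plants into one training set is the simplest reading of a sum over p and t_p. A real implementation might instead weight plants differently or use nonlinear models such as neural networks.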

N is an integer of 1 or more, and preferably 2 or more. The N recommended setting values calculated by the N recommended setting value calculation models are all proposed to the operator. When N ≥ 2, the operator can therefore compare multiple recommended setting values when deciding the actual set value.

Next, the fine-tuning unit 212 of the offline processing unit 201 fine-tunes the pre-trained recommended setting value calculation model obtained in step S101, using the target plant operation record data (step S102). For example, the fine-tuning unit 212 uses a known optimization method to calculate the parameters of the recommended setting value calculation model according to Equation (2). The recommended setting value calculation model with these parameters set is obtained as the trained recommended setting value calculation model.

Here, t_{p'} is a variable representing the time index of the target plant 30, and -T^1_{p'} is the time index of the first entry in the target plant operation record data of the target plant 30. In addition, x^{p'}_{t_{p'}} is the operation data included in the target plant operation record data of the target plant 30 at time t_{p'}, and y^{p'}_{t_{p'}} is the set value included in that record. Equation (2) means the following: each recommended setting value calculation model is trained to minimize the error between two values. The first is the recommended setting value that the model calculates from the operation data of the target plant 30 at each time t_{p'} = -T^1_{p'}, …, 0. The second is the set value actually applied at that time.
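Equation (2) is not reproduced in this text. The following minimal Python sketch of the fine-tuning step assumes a linear model with a scalar set value and a squared-error objective minimized by stochastic gradient descent. The key point it shows is that optimization starts from the pre-trained parameters w_n rather than from scratch, using only the few target plant records at times t_{p'} = -T^1_{p'}, …, 0. The function names, learning rate, epoch count, and data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Assumed linear model f(x; w) = w[0] + sum_i w[i + 1] * x[i]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fine_tune(w_pretrained: Sequence[float],
              target_records: Dict[int, Tuple[Sequence[float], float]],
              lr: float = 0.1, epochs: int = 300, seed: int = 0) -> List[float]:
    """Continue squared-error SGD from the pre-trained parameters on target plant data.

    target_records maps a time index t_{p'} (from -T^1_{p'} to 0) to (x, y).
    """
    rng = random.Random(seed)
    w = list(w_pretrained)               # start from w_n, not from a fresh initialization
    data = list(target_records.values())
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(w, x) - y      # derivative of 0.5 * (f(x; w) - y)^2 w.r.t. f
            w[0] -= lr * err
            for i, xi in enumerate(x):
                w[i + 1] -= lr * err * xi
    return w


# Hypothetical: the pre-trained model is y = 2 + 3x; the target plant has an offset, y = 2.5 + 3x
w_pre = [2.0, 3.0]
target = {-2: ([0.0], 2.5), -1: ([0.5], 4.0), 0: ([1.0], 5.5)}
w_ft = fine_tune(w_pre, target)
```

With noise-free toy data this runs to convergence. In practice, a smaller learning rate or fewer epochs would keep the fine-tuned model closer to what was learned from the other plants.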

As described above, the operation support device 10 according to this embodiment first pre-trains the recommended setting value calculation model using the plant operation record data of plants other than the target plant 30. It then fine-tunes the model using the target plant operation record data of the target plant 30. This transfers knowledge of the other plants to the recommended setting value calculation model. As a result, an accurate model can be obtained even when only a small amount of target plant operation record data is available (for example, shortly after the target plant 30 has started operating).

<Online processing>
The online processing according to this embodiment is described below with reference to FIG. 5. Steps S201 to S205 in FIG. 5 are executed repeatedly each time operation data is acquired from the target plant 30. The description assumes that the online start time is t_{p'} = 1 and that the target plant operation record storage unit 204 stores target plant operation record data for times t_{p'} = 1, …, T^2_{p'} - 1. It covers the case in which the operation data at time t_{p'} = T^2_{p'} has just been acquired from the target plant 30.

The recommended setting value calculation unit 221 of the online processing unit 202 calculates recommended setting values using the operation data acquired from the target plant 30 and the trained recommended setting value calculation models (step S201). That is, for n = 1, …, N, the recommended setting value calculation unit 221 calculates the recommended setting value f^n(x^{p'}_{T^2_{p'}}; w_n), giving N recommended setting values in total. Immediately after the offline processing, w_n is the value calculated in step S102. After step S205 (described later) has been executed, w_n is the value updated in that step.

Next, the proposal unit 222 of the online processing unit 202 transmits the N recommended setting values calculated in step S201 to the operator terminal 20 (step S202), thereby proposing them to the operator.

It is assumed below that the operator refers to the N recommended setting values, decides the set value to actually apply to the target plant 30, and then operates the operator terminal 20 to apply it. It is also assumed that this set value is transmitted from the operator terminal 20 to the operation support device 10 as the set value at time t_{p'} = T^2_{p'}. The operation data and set value at time t_{p'} = T^2_{p'} are then stored as a pair in the target plant operation record storage unit 204, as the target plant operation record data for that time.
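The following Python sketch shows one online iteration covering steps S201 and S202, followed by storage of the operator's choice. It assumes linear models with scalar set values. `online_step` is a hypothetical function, and the operator's decision is simulated by a callback, because in the described system the decision arrives from the operator terminal 20. The model parameters and the averaging "operator" are made-up values for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Assumed linear model f^n(x; w_n) = w[0] + sum_i w[i + 1] * x[i]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def online_step(models: Sequence[Sequence[float]],
                t: int,
                x_t: Sequence[float],
                operator_decides: Callable[[List[float]], float],
                store: Dict[int, Tuple[Tuple[float, ...], float]]) -> List[float]:
    """One online iteration for newly acquired operation data x_t at time t = T^2_{p'}."""
    # Step S201: one recommended setting value per model f^n, n = 1..N
    recommendations = [predict(w, x_t) for w in models]
    # Step S202: propose them; here the operator's response is simulated by a callback
    y_t = operator_decides(recommendations)
    # Store the pair (x_t, y_t) as the target plant operation record data for time t
    store[t] = (tuple(x_t), y_t)
    return recommendations


store: Dict[int, Tuple[Tuple[float, ...], float]] = {}
models = [[2.0, 3.0], [2.2, 2.8]]            # hypothetical N = 2 trained models
recs = online_step(models, t=1, x_t=[0.5],
                   operator_decides=lambda r: round(sum(r) / len(r), 2),
                   store=store)
```

Keying the store by time index keeps each (x_t, y_t) pair tied to its time, which the later reward and retraining steps read back.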

Next, the reward function learning unit 223 of the online processing unit 202 learns a reward function using the target plant operation record data obtained online, that is, the data for times t_{p'} = 1, …, T^2_{p'} (step S203). For example, the reward function learning unit 223 uses a known optimization method to calculate the parameters of the reward function according to Equation (3). The reward function with these parameters set is obtained as the learned reward function.

Here, r_θ is the reward function and θ is its parameter. The reward function r_θ takes operating data and a setting value (or a recommended setting value) as input and outputs a value representing the reward. Equation (3) yields a reward function r_θ that outputs a higher reward the closer the recommended setting value is to the setting value the operator actually applied to the target plant 30.
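Equation (3) is not reproduced in this text. As a minimal sketch, assume it maximizes, over the online records, the gap between the reward of the operator's actual setting and the reward of each model's recommendation, i.e. the sum over t_{p'} and n of r_θ(x, y) − r_θ(x, f^n(x)), with a linear reward r_θ(x, y) = θ · φ(x, y). The feature map `features`, the L2 penalty `lam` (added here only because a linear objective of this kind is otherwise unbounded), and all function names are illustrative assumptions, not the patent's formulation.

```python
import numpy as np


def features(x, y):
    """Hypothetical feature map phi(x, y): setting values plus state-setting interactions."""
    return np.concatenate([y, np.outer(x, y).ravel()])


def reward(theta, x, y):
    """Assumed linear reward r_theta(x, y) = theta . phi(x, y)."""
    return float(theta @ features(x, y))


def learn_reward(X, Y_actual, Y_recommended, lam=1.0, lr=0.1, steps=500):
    """Gradient ascent on (assumed form of equation (3) plus an L2 term)
        J(theta) = mean_t sum_n [r(x_t, y_t) - r(x_t, yhat_t^n)] - lam / 2 * ||theta||^2

    X: (T, dx) operating data, Y_actual: (T, dy) settings applied by the operator,
    Y_recommended: (T, N, dy) recommendations of the N models.
    """
    T, N, _ = Y_recommended.shape
    # For a linear reward, the gradient of the bracketed term is a fixed feature difference.
    diff = np.zeros_like(features(X[0], Y_actual[0]))
    for t in range(T):
        for n in range(N):
            diff += features(X[t], Y_actual[t]) - features(X[t], Y_recommended[t, n])
    diff /= T
    theta = np.zeros_like(diff)
    for _ in range(steps):
        theta += lr * (diff - lam * theta)  # gradient of J(theta)
    return theta
```

With the L2 term the maximizer is θ = diff / lam, so on the training records the learned reward ranks the operator's settings above the recommendations on average.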

Next, the model learning unit 224 of the online processing unit 202 retrains the recommended setting value calculation model using the target plant operation record data obtained while online (that is, the target plant operation record data for times t_{p'} = 1 to T_{p'}^2) and the reward function learned in step S203 above (step S204). For example, for n = 1, ..., N, the model learning unit 224 initializes φ_n ← w_n and then calculates the parameter φ_n of the recommended setting value calculation model by equation (4) below, using a known optimization method or reinforcement learning method. A recommended setting value calculation model with this parameter φ_n set is thereby obtained as the retrained recommended setting value calculation model.

By equation (4), for each n = 1, ..., N, a recommended setting value calculation model f^n that maximizes the reward (that is, a recommended setting value calculation model f^n that imitates the operator's actual operation) is obtained.

Then, the model learning unit 224 of the online processing unit 202 updates w_n ← φ_n for n = 1, ..., N (step S205), where φ_n is the parameter calculated in step S204 above. Consequently, once new operating data is acquired from the target plant 30, the processing from step S201 onward is executed using the parameters w_n updated in this step.
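Steps S204 and S205 can be sketched as follows, assuming linear models f^n(x) = φ_n x and a differentiable reward supplied through its gradient with respect to the setting value. Equation (4) is not reproduced in this text, so the objective used here (the mean reward of each model's own recommendations over the online records) and plain gradient ascent are assumptions; `relearn_models` and `reward_grad` are hypothetical names. The warm start φ_n ← w_n and the final update w_n ← φ_n follow the description.

```python
import numpy as np


def relearn_models(W, X, reward_grad, lr=0.05, steps=300):
    """Retrain each assumed linear model f^n(x) = phi_n @ x to maximize the mean reward of its output.

    W: list of initial parameter matrices w_n, shape (dy, dx)
    X: (T, dx) operating data recorded online
    reward_grad(x, y): gradient of the reward r(x, y) with respect to the setting value y
    Returns the new parameters, i.e. the update w_n <- phi_n of step S205.
    """
    new_W = []
    for w in W:
        phi = w.copy()  # warm start: phi_n <- w_n
        for _ in range(steps):
            grad = np.zeros_like(phi)
            for x in X:
                y = phi @ x  # recommended setting value of model n
                # chain rule for a linear model: dr/dphi = (dr/dy) x^T
                grad += np.outer(reward_grad(x, y), x)
            phi += lr * grad / len(X)
        new_W.append(phi)
    return new_W
```

For example, with the concave stand-in reward r(x, y) = −||y − A x||², whose gradient with respect to y is −2(y − A x), every φ_n converges toward the hypothetical matrix A regardless of its starting value w_n.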

As described above, each time operating data is acquired while online, the operation support device 10 according to the present embodiment proposes recommended setting values to the operator and dynamically learns the reward function and the recommended setting value calculation model from the setting values actually applied to the target plant 30. This yields a recommended setting value calculation model that accurately imitates the operator's operation, so that accurate recommended setting values can be proposed to the operator.

In the online process shown in FIG. 5, the reward function and the recommended setting value calculation model are learned at every iteration. However, once a predetermined condition is satisfied, for example, only steps S201 and S202 may be executed, without further learning of the reward function and the recommended setting value calculation model. Examples of such a condition include the number of repetitions of steps S201 to S205 exceeding a predetermined number (which may be one) and each parameter w_n having converged.
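The control flow of FIG. 5, including the option to stop learning once a condition is met, can be sketched as the loop below. The callables `recommend`, `operator` and `update`, the scalar parameters, and the two stopping rules (an iteration cap and a convergence tolerance) are illustrative assumptions standing in for steps S201 to S205.

```python
def run_online(observations, params, recommend, operator, update, max_updates=3, tol=1e-6):
    """Hypothetical skeleton of the online process (steps S201 to S205).

    observations: iterable of operating data x, one per time t_p'
    params: list of scalar model parameters w_n (scalars are an assumption)
    recommend(w, x): recommended setting value of one model (S201)
    operator(x, recs): setting value the operator actually applies after the proposal (S202)
    update(params, records): new parameters after learning the reward function and
        retraining the models (S203 to S205)
    """
    records = []  # (operating data, applied setting value) pairs, cf. storage unit 204
    n_updates = 0
    learning = True
    for x in observations:
        recs = [recommend(w, x) for w in params]  # S201
        y = operator(x, recs)                      # S202, then the operator's decision
        records.append((x, y))
        if learning:
            new_params = update(params, records)   # S203 to S205
            shift = max(abs(a - b) for a, b in zip(new_params, params))
            params = new_params
            n_updates += 1
            # Assumed stopping rules: iteration cap or converged parameters;
            # afterwards only S201 and S202 run.
            if n_updates >= max_updates or shift < tol:
                learning = False
    return params, records, n_updates
```

Every observation is still recorded after learning stops, so learning could be resumed later from the full record set.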

<Other examples of the reward function learning method>
Other examples of the reward function learning method in step S203 of FIG. 5 are described below.

・Other example 1 of the reward function learning method
Instead of equation (3) above, the parameters of the reward function may be calculated by equation (5) below.

Here, σ is the sigmoid function. Because equation (5) maximizes the logarithm of the sigmoid of the reward difference, which is bounded above by zero, more stable learning can be expected than with equation (3).
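Equation (5) is likewise not reproduced here. As a minimal sketch, assume it replaces the raw reward gap of equation (3) by log σ of that gap (the same log-sigmoid form used in Bradley-Terry preference models), with a linear reward, so that each row of `D` holds the feature difference φ(x, y_actual) − φ(x, ŷ) for one (time, model) pair. The small L2 term `lam` and the function names are assumptions.

```python
import numpy as np


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -z)


def learn_reward_logsig(D, lr=0.5, steps=300, lam=1e-3):
    """Gradient ascent on J(theta) = mean_m log sigmoid(theta . D[m]) - lam / 2 * ||theta||^2.

    D: (M, k) array; row m is phi(x, y_actual) - phi(x, y_recommended) for one pair,
    so theta . D[m] is the reward gap under an assumed linear reward.
    """
    theta = np.zeros(D.shape[1])
    for _ in range(steps):
        z = D @ theta
        # d/dz log sigmoid(z) = sigmoid(-z): well-separated pairs contribute little
        weights = np.exp(log_sigmoid(-z))
        theta += lr * (D.T @ weights / len(D) - lam * theta)
    return theta


def objective(theta, D, lam=1e-3):
    """Value of J(theta) for the same assumed objective."""
    return float(np.mean(log_sigmoid(D @ theta)) - 0.5 * lam * theta @ theta)
```

Since the per-pair gradient weight σ(−z) vanishes as the gap z grows, pairs that are already well separated stop pushing θ outward, which is one way to read the stability remark above.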

・Other example 2 of the reward function learning method
The reward function may also be learned by ranking, at each time t_{p'} and for every combination of two recommended setting value calculation models, the outputs of those two models (that is, their recommended setting values) according to a predetermined criterion.

More specifically, when the outputs of any two recommended setting value calculation models are compared at each time t_{p'}, the higher-ranked recommended setting value is denoted f^good and the lower-ranked one f^bad, and each pair of f^good and f^bad is assigned an identifying index d, numbered in order from 1. The parameters of the reward function are then calculated by equation (6) below using {(d, f^good, f^bad) | d = 1, ..., D}, where D is the total number of pairs of f^good and f^bad.

By equation (6), the reward function can be learned solely from the rankings of pairs of recommended setting value calculation models at each time t_{p'}, without using the setting values the operator actually applied to the target plant 30.

Various ranking criteria can be adopted; for example, the larger recommended setting value may be ranked higher.

As an example, let N = 3 and rank the larger recommended setting value higher. FIG. 6 shows (d, f^good, f^bad) for the case where f^2 > f^3 > f^1 at t_{p'} = 1 and f^1 > f^3 > f^2 at t_{p'} = 2. As shown in FIG. 6, at t_{p'} = 1, since f^2 > f^3 > f^1, (d, f^good, f^bad) = (1, f^2, f^1), (2, f^3, f^1), (3, f^2, f^3). Similarly, at t_{p'} = 2, since f^1 > f^3 > f^2, (d, f^good, f^bad) = (4, f^1, f^2), (5, f^1, f^3), (6, f^3, f^2).
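The pair list of FIG. 6 can be generated mechanically. The sketch below assumes that each model's recommended setting value at a given time has been reduced to one comparable number (in the waste incineration example the outputs are time series, so some scalar summary would be needed) and applies the criterion "larger value ranks higher"; the function name and the skipping of exact ties are assumptions.

```python
from itertools import combinations


def ranked_pairs(outputs_by_time):
    """Build [(d, good, bad)] from per-time model outputs.

    outputs_by_time: list over times t_p' of dicts {model index n: recommended value}.
    The larger recommended value is ranked higher; d is numbered from 1 across all times.
    """
    pairs = []
    d = 1
    for outputs in outputs_by_time:
        for i, j in combinations(sorted(outputs), 2):
            if outputs[i] == outputs[j]:
                continue  # assumption: an exact tie gives no ordered pair
            good, bad = (i, j) if outputs[i] > outputs[j] else (j, i)
            pairs.append((d, good, bad))
            d += 1
    return pairs


if __name__ == "__main__":
    # Values ordered as in FIG. 6: f^2 > f^3 > f^1 at t_p' = 1, f^1 > f^3 > f^2 at t_p' = 2
    print(ranked_pairs([{1: 0.1, 2: 0.9, 3: 0.5}, {1: 0.9, 2: 0.1, 3: 0.5}]))
    # [(1, 2, 1), (2, 3, 1), (3, 2, 3), (4, 1, 2), (5, 1, 3), (6, 3, 2)]
```

Model indices stand in for f^n, so the printed list matches the six rows of FIG. 6.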

・Other example 3 of the reward function learning method
In other example 2 above, the outputs of two recommended setting value calculation models may be allowed to tie in rank. In addition, at a given time t_{p'}, it may be possible to rank only the outputs of two specific recommended setting value calculation models (in other words, some combinations of recommended setting value calculation models may not be rankable at a given time t_{p'}).

In this case, the parameters of the reward function may be calculated by equation (7) below using {(d, a, f^good, f^bad) | d = 1, ..., D}.

Here, a is the reciprocal of the number of pairs of f^good and f^bad obtained at the same time t_{p'}. It is a parameter for taking the expected value (average) of the difference between the reward for f^good and the reward for f^bad at each time t_{p'}.

As an example, let N = 3 and rank the larger recommended setting value higher. FIG. 7 shows (d, a, f^good, f^bad) for the case where f^2 > f^3 = f^1 at t_{p'} = 1 and f^1 > f^3 > f^2 at t_{p'} = 2, and where only f^1 and f^2 can be ranked at t_{p'} = 3, with f^1 > f^2. As shown in FIG. 7, at t_{p'} = 1, since f^2 > f^3 = f^1, (d, a, f^good, f^bad) = (1, 1/2, f^2, f^1), (2, 1/2, f^2, f^3). Similarly, at t_{p'} = 2, since f^1 > f^3 > f^2, (d, a, f^good, f^bad) = (3, 1/3, f^1, f^2), (4, 1/3, f^1, f^3), (5, 1/3, f^3, f^2). At t_{p'} = 3, only f^1 and f^2 can be ranked and f^1 > f^2, so (d, a, f^good, f^bad) = (6, 1, f^1, f^2).
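The weighted list of FIG. 7 can be generated in the same way as the FIG. 6 list, except that ties are allowed (they yield no pair) and a model that cannot be ranked at a given time is simply absent from that time's dictionary; a is then one over the number of pairs formed at that time. The sketch assumes scalar model outputs and uses hypothetical function names; the weighted sum at the end is one plausible reading of equation (7), which is not reproduced in this text.

```python
from fractions import Fraction
from itertools import combinations


def weighted_ranked_pairs(outputs_by_time):
    """Build [(d, a, good, bad)] from per-time model outputs.

    outputs_by_time: list over times t_p' of dicts {model index n: recommended value};
    a model that cannot be ranked at that time is left out of the dict.
    Ties produce no pair; a = 1 / (number of pairs formed at that time).
    """
    result = []
    d = 1
    for outputs in outputs_by_time:
        pairs = []
        for i, j in combinations(sorted(outputs), 2):
            if outputs[i] == outputs[j]:
                continue  # tied: same rank, nothing to compare
            pairs.append((i, j) if outputs[i] > outputs[j] else (j, i))
        for good, bad in pairs:
            result.append((d, Fraction(1, len(pairs)), good, bad))
            d += 1
    return result


def weighted_gap(weighted_pairs, reward_gaps):
    """Assumed objective: sum_d a_d * (r(f^good) - r(f^bad)); reward_gaps[k] is the k-th gap."""
    return sum(a * gap for (_, a, _, _), gap in zip(weighted_pairs, reward_gaps))
```

Because a averages the gaps within each time, every time t_{p'} carries equal weight regardless of how many pairs it contributes: with all gaps equal to 1, the FIG. 7 list sums to 3, one per time.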

<Other examples of the method for learning the recommended setting value calculation model>
Other examples of the method for retraining the recommended setting value calculation model in step S204 of FIG. 5 are described below.

・Learning method 1 for the recommended setting value calculation model
When retraining the recommended setting value calculation model, virtual operating data and the recommended setting values obtained by inputting that virtual operating data into the recommended setting value calculation model may be used instead of the actual operating data and recommended setting values. The virtual operating data may be created, for example, by sampling from the distribution of the actual operating data.
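A minimal sketch of learning method 1, assuming the distribution of the actual operating data is approximated by a multivariate Gaussian fitted to the recorded samples (the text says only "for example, by sampling"); `make_virtual_batch` and the linear stand-in model are hypothetical.

```python
import numpy as np


def make_virtual_batch(X_actual, model, n_samples, rng):
    """Sample virtual operating data and pair it with the model's recommended setting values.

    Assumption: the operating-data distribution is modelled as a Gaussian with the
    sample mean and covariance of the recorded data X_actual, shape (T, dx).
    """
    mean = X_actual.mean(axis=0)
    cov = np.cov(X_actual, rowvar=False)
    X_virtual = rng.multivariate_normal(mean, cov, size=n_samples)
    Y_virtual = np.array([model(x) for x in X_virtual])  # recommendations for virtual data
    return X_virtual, Y_virtual
```

The resulting (X_virtual, Y_virtual) pairs can then replace the recorded pairs in the retraining step, which lets the number of training samples exceed the number of actual records.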

・Learning method 2 for the recommended setting value calculation model
With x denoting the distribution of the actual operating data, the parameter φ_n of the recommended setting value calculation model may be calculated by equation (8) below instead of equation (4) above.

Here, KL is the Kullback-Leibler divergence and L is the log-likelihood. β and γ are predetermined parameters (hyperparameters).
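Equation (8) is not reproduced in this text, so the following is only one possible reading, built on assumptions: each model is treated as a linear-Gaussian distribution y ~ N(φ_n x, σ²I); the KL term compares the model after retraining with the model before retraining (parameter w_n) on the recorded operating data; and L is the log-likelihood of the operator's actual settings under the retrained model. All names are hypothetical.

```python
import numpy as np


def regularized_objective(phi, w, X, Y_actual, reward, beta, gamma, sigma=1.0):
    """Hypothetical reading of equation (8) for linear-Gaussian models y ~ N(phi @ x, sigma^2 I):

        mean reward of phi's recommendations
        - beta  * mean KL(N(phi x, sigma^2 I) || N(w x, sigma^2 I))
        + gamma * mean log-likelihood of the operator's actual settings under phi
    """
    total_r = total_kl = total_ll = 0.0
    dy = phi.shape[0]
    for x, y in zip(X, Y_actual):
        mu_new, mu_old = phi @ x, w @ x
        total_r += reward(x, mu_new)
        # KL between Gaussians with the same isotropic covariance
        total_kl += np.sum((mu_new - mu_old) ** 2) / (2 * sigma ** 2)
        # Gaussian log-density of the actually applied setting y
        total_ll += (-0.5 * np.sum((y - mu_new) ** 2) / sigma ** 2
                     - 0.5 * dy * np.log(2 * np.pi * sigma ** 2))
    n = len(X)
    return total_r / n - beta * total_kl / n + gamma * total_ll / n
```

Under this reading, β limits how far one retraining step may move a model away from its pre-update behaviour, while γ pulls it toward the settings the operator actually chose.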

<Example>
An example of the present embodiment is described below.

In this example, a waste incineration plant is assumed as the target plant 30. FIG. 8 shows a schematic diagram of the waste incineration plant. As shown in FIG. 8, in a waste incineration plant, waste and air are fed into a combustion furnace, the heat generated by combustion is converted into steam, and steam and exhaust gases such as carbon monoxide (CO) are output. Because the steam is generally used for power generation and similar purposes, the amount of steam generated should be increased and kept stable. On the other hand, increasing steam generation requires increasing the waste feed rate and the air flow rate, which can cause incomplete combustion and, as a result, a higher CO concentration. The waste feed rate and the air flow rate must therefore be controlled appropriately. The waste feed rate is controlled through the operating speed of equipment called a feeder, and the air flow rate through the opening angle of a valve or similar device. The steam flow rate and the exhaust gas concentration (CO concentration) are measured by sensors or the like.

Accordingly, the state variables of the waste incineration plant are x_1: feeder speed, x_2: air flow rate, x_3: steam flow rate, and x_4: exhaust gas concentration. In this example, the operating data at time t_{p'} is the time series of each state variable x_1, x_2, x_3, x_4 from time t_{p'} − 1 to time t_{p'}. As recommended setting value calculation models, two models f^1 and f^2 are assumed, each of which takes the operating data at time t_{p'} as input and calculates time series of the recommended setting values of x_1 and x_2 for future times.
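The operating data of this example can be assembled as follows. The sketch assumes timestamped samples, a window that includes t_{p'} and excludes t_{p'} − 1 (the text does not say which end is included), and hypothetical variable keys for x_1 to x_4.

```python
# Hypothetical keys for the state variables x_1 to x_4 of the waste incineration plant
STATE_VARIABLES = ("feeder_speed", "air_flow", "steam_flow", "co_concentration")


def operating_data(log, t):
    """Return {variable: [values]} for the samples with t - 1 < time <= t.

    log: list of (time, {variable name: measured value}) in time order.
    The half-open window is an assumption.
    """
    window = [sample for time, sample in log if t - 1 < time <= t]
    return {name: [sample[name] for sample in window] for name in STATE_VARIABLES}
```

A recommended setting value model would take this window as input and return future time series only for feeder_speed and air_flow (x_1 and x_2), the two quantities the operator can set.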

FIG. 9 shows the inputs and outputs of the models f^1 and f^2 pre-learned and fine-tuned in this example. As shown in FIG. 9, the models f^1 and f^2 take the operating data at time t_{p'} as input and calculate and output time series of the recommended setting values of x_1 and x_2 (that is, the feeder speed and the air flow rate) for future times. Model f^1 calculates recommended setting values that maintain the past setting values, whereas model f^2 calculates recommended setting values that raise them.

FIG. 10 shows the setting values the operator applied to the waste incineration plant in this example. As shown in FIG. 10, the operator, referring to the recommended setting values of both f^1 and f^2, chose on the basis of his or her own experience to lower both the feeder speed and the air flow rate.

FIG. 11 shows how the reward function r_θ is learned in this example. As shown in FIG. 11, the parameter θ of the reward function is learned so that the reward is high at the setting values the operator actually applied.

FIG. 12 shows how the recommended setting value calculation model f^1 is retrained in this example. As shown in FIG. 12, with the initial value of the parameter φ_1 set to w_1, the parameter φ_1 of the model f^1 is learned using the reward function r_θ. Likewise, for the recommended setting value calculation model f^2, the initial value of the parameter φ_2 is set to w_2, and the parameter φ_2 of the model f^2 is learned using the reward function r_θ.

FIG. 13 shows the result of learning the recommended setting value calculation models f^1 and f^2 in this example. As shown in FIG. 13, because retraining is performed online, when operating data representing the same state is acquired afterwards, the model f^1 calculates recommended setting values close to the setting values the operator actually applies. This shows that a recommended setting value calculation model imitating the operator's actual operation is obtained.

<Summary>
As described above, offline, the operation support device 10 according to the present embodiment pre-learns the recommended setting value calculation model using the plant operation record data of plants other than the target plant 30 and then fine-tunes it using the target plant operation record data of the target plant 30. Online, each time operating data is acquired, the operation support device 10 proposes to the operator the recommended setting values calculated by the recommended setting value calculation model and dynamically learns the reward function and the recommended setting value calculation model from the setting values actually applied to the target plant 30.
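The offline stage can be illustrated with a linear stand-in model: least squares on the pooled records of other plants for pre-learning, followed by a few gradient steps on the small target-plant record set for fine-tuning. The linear model, the squared-error loss, the step size and the function names are all assumptions; the document does not specify the model class or the fine-tuning procedure.

```python
import numpy as np


def pretrain(X_other, Y_other):
    """Assumed pre-learning: least-squares fit of a linear model Y ~ X @ W.T on other plants' records."""
    W_T, *_ = np.linalg.lstsq(X_other, Y_other, rcond=None)
    return W_T.T


def fine_tune(W, X_target, Y_target, lr=0.05, steps=100):
    """Assumed fine-tuning: a few gradient steps on the target plant's records, starting from W."""
    W = W.copy()
    for _ in range(steps):
        err = X_target @ W.T - Y_target          # (T, dy) residuals
        grad = err.T @ X_target / len(X_target)  # gradient of 0.5 * mean squared residual norm
        W -= lr * grad
    return W


def mse(W, X, Y):
    """Mean squared error of the linear model on (X, Y)."""
    return float(np.mean((X @ W.T - Y) ** 2))
```

Starting from the pre-learned weights means the few target records only have to correct the difference between plants rather than fit the model from scratch.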

As a result, the operation support device 10 according to the present embodiment can obtain an accurate recommended setting value calculation model offline even when only a small amount of target plant operation record data is available. In addition, because the operation support device 10 retrains the recommended setting value calculation model online from the setting values actually applied to the target plant 30, it can obtain a recommended setting value calculation model that accurately imitates the operator's actual operation. Furthermore, by estimating the reward function online, the operation support device 10 can stabilize the retraining of the recommended setting value calculation model.

The present invention is not limited to the embodiment specifically disclosed above; various modifications, changes, and combinations with known techniques are possible without departing from the scope of the claims.

1 Plant control system
10 Operation support device
20 Operator terminal
30 Target plant
101 Input device
102 Display device
103 External I/F
103a Recording medium
104 Communication I/F
105 RAM
106 ROM
107 Auxiliary storage device
108 Processor
109 Bus
201 Offline processing unit
202 Online processing unit
203 Plant operation record storage unit
204 Target plant operation record storage unit
211 Pre-learning unit
212 Fine-tuning unit
221 Recommended setting value calculation unit
222 Proposal unit
223 Reward function learning unit
224 Model learning unit

Claims

An operation support device for supporting the operation of a target plant, comprising:
a pre-learning unit that uses operation record data of plants other than the target plant to learn a model that receives operating data representing the state of a plant as input and outputs recommended setting values for the plant;
a fine-tuning unit that fine-tunes the model learned by the pre-learning unit using operation record data of the target plant;
a recommended setting value calculation unit that, each time operating data is acquired from the target plant, calculates recommended setting values from the operating data of the target plant using the model;
a proposal unit that proposes the recommended setting values calculated by the recommended setting value calculation unit to an operator of the target plant; and
a model learning unit that retrains the model so as to maximize the sum of rewards output by a reward function, the reward function having been learned, from the operating data acquired from the target plant, the recommended setting values calculated by inputting that operating data into the model, and the setting values actually set in the target plant by the operator of the target plant, so as to output a higher reward the closer a recommended setting value is to the setting value actually set.

The operation support device according to claim 1, comprising:
a reward function learning unit that learns the reward function using the operating data acquired from the target plant, the recommended setting values calculated by the recommended setting value calculation unit, and the setting values actually set in the target plant by the operator of the target plant.

The operation support device, wherein the model learning unit:
retrains the model using, in place of the operating data acquired from the target plant, virtual operating data sampled from the distribution of that operating data, and, in place of the recommended setting values calculated by inputting the acquired operating data into the model, the recommended setting values calculated by inputting the virtual operating data into the model.

The operation support device according to claim 2, wherein the model learning unit:
further retrains the model so as to minimize the Kullback-Leibler divergence between the distributions of the operating data before and after retraining the model and to maximize the log-likelihood of the model.

The operation support device according to any one of claims 2 to 4, wherein the reward function learning unit:
learns the reward function so as to maximize the difference obtained by subtracting the reward calculated by the reward function from the operating data acquired from the target plant and the recommended setting value from the reward calculated by the reward function from that operating data and the setting value.

The operation support device according to claim 5, wherein the reward function learning unit:
learns the reward function so as to maximize the logarithm of the sigmoid function value of the difference.

The operation support device according to any one of claims 2 to 4, wherein the reward function learning unit:
ranks, for each combination of two of a plurality of the models, the recommended setting values calculated by those models; and
learns the reward function so as to maximize the difference between the reward calculated by the reward function from the higher-ranked recommended setting value and the reward calculated by the reward function from the lower-ranked recommended setting value.

The operation support device according to claim 7, wherein the reward function learning unit:
learns the reward function so as to maximize the expected value or the average of the differences.

An operation support method in which an operation support device for supporting the operation of a target plant executes:
a pre-learning procedure of learning, using operation record data of plants other than the target plant, a model that receives operating data representing the state of a plant as input and outputs recommended setting values for the plant;
a fine-tuning procedure of fine-tuning the model learned in the pre-learning procedure using operation record data of the target plant;
a recommended setting value calculation procedure of calculating, each time operating data is acquired from the target plant, recommended setting values from the operating data of the target plant using the model;
a proposal procedure of proposing the recommended setting values calculated in the recommended setting value calculation procedure to an operator of the target plant; and
a model learning procedure of retraining the model so as to maximize the sum of rewards output by a reward function, the reward function having been learned, from the operating data acquired from the target plant, the recommended setting values calculated by inputting that operating data into the model, and the setting values actually set in the target plant by the operator of the target plant, so as to output a higher reward the closer a recommended setting value is to the setting value actually set.

A program for causing an operation support device for supporting the operation of a target plant to execute:
a pre-learning procedure of learning, using operation record data of plants other than the target plant, a model that receives operating data representing the state of a plant as input and outputs recommended setting values for the plant;
a fine-tuning procedure of fine-tuning the model learned in the pre-learning procedure using operation record data of the target plant;
a recommended setting value calculation procedure of calculating, each time operating data is acquired from the target plant, recommended setting values from the operating data of the target plant using the model;
a proposal procedure of proposing the recommended setting values calculated in the recommended setting value calculation procedure to an operator of the target plant; and
a model learning procedure of retraining the model so as to maximize the sum of rewards output by a reward function, the reward function having been learned, from the operating data acquired from the target plant, the recommended setting values calculated by inputting that operating data into the model, and the setting values actually set in the target plant by the operator of the target plant, so as to output a higher reward the closer a recommended setting value is to the setting value actually set.