JP7484868B2

JP7484868B2 - Operation system, operation method, and operation program, as well as evaluation model generation device, evaluation model generation method, and evaluation model generation program

Info

Publication number: JP7484868B2
Application number: JP2021175652A
Authority: JP
Inventors: 豪 ▲高▼見; 宏明鹿子木; 恵一郎小渕; 陽太古川
Original assignee: Yokogawa Electric Corp
Current assignee: Yokogawa Electric Corp
Priority date: 2021-10-27
Filing date: 2021-10-27
Publication date: 2024-05-16
Anticipated expiration: 2041-10-27
Also published as: JP2023065072A

Description

The present invention relates to an operation system, an operation method, and an operation program, as well as to an evaluation model generation device, an evaluation model generation method, and an evaluation model generation program.

Patent Document 1 states that "a learning process is executed for a first model that, in response to input of measurement data, outputs recommended control parameters indicating a first type of control content recommended for increasing a reward value determined by a preset reward function."
[Prior Art Documents]
[Patent Documents]
[Patent Document 1] JP2021-086283A
[Patent Document 2] JP2020-027556A
[Patent Document 3] JP2019-020885A

In a first aspect of the present invention, an operation system is provided. The operation system may include an evaluation model generation device that generates, by machine learning, an evaluation model that outputs an index evaluating the state of equipment with respect to a targeted goal, based on an operation target for the equipment and the state of the equipment. The operation system may include an operation model generation device that generates, by reinforcement learning in which the output of the evaluation model is used as at least part of a reward, an operation model that outputs an action according to the state of the equipment. The operation system may include a control device that applies, to a control target in the equipment, a manipulated variable based on the action that the operation model outputs according to the state of the equipment.

The evaluation model generation device may update the evaluation model based on the state of the equipment observed while the control target is controlled using the operation model.

The operation model generation device may update the operation model by reinforcement learning in which the output of the updated evaluation model is used as at least part of the reward.

The control device may control the control target using the updated operation model.

The evaluation model generation device may include an operation target acquisition unit that acquires the operation target. The evaluation model generation device may include a state data acquisition unit that acquires state data indicating the state of the equipment. The evaluation model generation device may include a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of (a) a correlation between time and at least one physical quantity included in the state data and (b) a correlation between at least two physical quantities included in the state data. The evaluation model generation device may include a labeling unit that labels the correlation data using a labeling model. The evaluation model generation device may include an evaluation model generation unit that generates the evaluation model using the labeled correlation data.

The evaluation model generation device may further include an evaluation model determination unit that determines the validity of the evaluation model.

The evaluation model generation device may further include an evaluation model output unit that outputs the evaluation model when the evaluation model is determined to be valid.

The evaluation model generation device may further include a labeling model update unit that updates the labeling model when the evaluation model is determined to be valid.

The evaluation model generation device may further include a teacher label acquisition unit that acquires teacher labels for at least a portion of the correlation data. The labeling model update unit may generate a labeling model for updating, separate from the initial labeling model generated based on the teacher labels.

In a second aspect of the present invention, an operation method is provided. The operation method may include generating, by machine learning, an evaluation model that outputs an index evaluating the state of equipment with respect to a targeted goal, based on an operation target for the equipment and the state of the equipment. The operation method may include generating, by reinforcement learning in which the output of the evaluation model is used as at least part of a reward, an operation model that outputs an action according to the state of the equipment. The operation method may include applying, to a control target in the equipment, a manipulated variable based on the action that the operation model outputs according to the state of the equipment.

In a third aspect of the present invention, an operation program is provided. The operation program may be executed by a computer. The operation program may cause the computer to function as an evaluation model generation device that generates, by machine learning, an evaluation model that outputs an index evaluating the state of equipment with respect to a targeted goal, based on an operation target for the equipment and the state of the equipment. The operation program may cause the computer to function as an operation model generation device that generates, by reinforcement learning in which the output of the evaluation model is used as at least part of a reward, an operation model that outputs an action according to the state of the equipment. The operation program may cause the computer to function as a control device that applies, to a control target in the equipment, a manipulated variable based on the action that the operation model outputs according to the state of the equipment.

In a fourth aspect of the present invention, an evaluation model generation device is provided. The evaluation model generation device may include an operation target acquisition unit that acquires an operation target for equipment. The evaluation model generation device may include a state data acquisition unit that acquires state data indicating the state of the equipment. The evaluation model generation device may include a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of (a) a correlation between time and at least one physical quantity included in the state data and (b) a correlation between at least two physical quantities included in the state data. The evaluation model generation device may include a labeling unit that labels the correlation data using a labeling model. The evaluation model generation device may include an evaluation model generation unit that uses the labeled correlation data to generate an evaluation model that outputs an index evaluating the state of the equipment with respect to a targeted goal, based on the operation target for the equipment and the state of the equipment.

In a fifth aspect of the present invention, an evaluation model generation method is provided. The evaluation model generation method may include acquiring an operation target for equipment. The evaluation model generation method may include acquiring state data indicating the state of the equipment. The evaluation model generation method may include generating, based on the operation target, correlation data indicating at least one of (a) a correlation between time and at least one physical quantity included in the state data and (b) a correlation between at least two physical quantities included in the state data. The evaluation model generation method may include labeling the correlation data using a labeling model. The evaluation model generation method may include using the labeled correlation data to generate an evaluation model that outputs an index evaluating the state of the equipment with respect to a targeted goal, based on the operation target for the equipment and the state of the equipment.

In a sixth aspect of the present invention, an evaluation model generation program is provided. The evaluation model generation program may be executed by a computer. The evaluation model generation program may cause the computer to function as an operation target acquisition unit that acquires an operation target for equipment. The evaluation model generation program may cause the computer to function as a state data acquisition unit that acquires state data indicating the state of the equipment. The evaluation model generation program may cause the computer to function as a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of (a) a correlation between time and at least one physical quantity included in the state data and (b) a correlation between at least two physical quantities included in the state data. The evaluation model generation program may cause the computer to function as a labeling unit that labels the correlation data using a labeling model. The evaluation model generation program may cause the computer to function as an evaluation model generation unit that uses the labeled correlation data to generate an evaluation model that outputs an index evaluating the state of the equipment with respect to a targeted goal, based on the operation target for the equipment and the state of the equipment.

Note that the above summary of the invention does not enumerate all of the features of the present invention. Subcombinations of these groups of features may also constitute inventions.

An example of a block diagram of the operation system 100 according to this embodiment, shown together with the equipment 10 and the headquarters 20.
An example of a block diagram of the evaluation model generation device 200 in the operation system 100 according to this embodiment.
An example of the teacher data that the labeling unit 220 inputs to a learner when generating the initial labeling model.
An example of a design of the learner used by the labeling unit 220.
Another example of a design of the learner used by the labeling unit 220.
An example of unlabeled correlation data that the labeling unit 220 is to label.
An example of an I/O list for the equipment 10.
An example of a segment diagram for the equipment 10.
An example of the labeling data output by the labeling data output unit 222.
An example of a processing flow in the labeling function unit 210 of the evaluation model generation device 200.
An example of a block diagram of the evaluation model generation unit 250.
An example of an output of the evaluation model.
Another example of an output of the evaluation model.
An example of a processing flow in the machine learning function unit 230 of the evaluation model generation device 200.
An example of a block diagram of the operation model generation device 300 in the operation system 100 according to this embodiment.
An example of an operation model generated by the operation model generation device 300.
An example of an action decision table.
An example of a processing flow in the operation model generation device 300.
An example of a reinforcement learning flow in the operation model generation unit 316.
An example of a block diagram of the control device 400 in the operation system 100 according to this embodiment.
An example of a processing flow in the control device 400.
An example of a computer 9900 in which multiple aspects of the present invention may be embodied in whole or in part.

The present invention is described below through embodiments, but the following embodiments do not limit the invention as set out in the claims. Furthermore, not every combination of features described in the embodiments is essential to the solution provided by the invention.

FIG. 1 shows an example of a block diagram of the operation system 100 according to this embodiment, together with the equipment 10 and the headquarters 20. The blocks are functional blocks separated by function and need not correspond to the actual device configuration. That is, a single block in the figure need not be implemented by a single device, and separate blocks in the figure need not be implemented by separate devices. The same applies to the block diagrams that follow.

The equipment 10 is a facility, apparatus, or the like in which control targets such as actuators are installed. For example, the equipment 10 may be a plant, or may be a composite apparatus that combines multiple devices. Examples of plants include industrial plants such as chemical and biotechnology plants, plants that manage and control wellheads and their surroundings in gas fields and oil fields, plants that manage and control hydroelectric, thermal, or nuclear power generation, plants that manage and control energy harvesting such as solar and wind power, and plants that manage and control water supply, sewage, dams, and the like.

The headquarters 20 is the central organization of the business that runs the equipment 10, and may be, for example, the head office of the business operator. The management team responsible for running the business may be based at the headquarters 20. The management team specifies an operation target to the operation system 100. Here, an operation target is a target set for operating the equipment 10, and may include, for example, target items and target values.

In the operation system 100 according to this embodiment, an evaluation model that outputs an index evaluating the state of the equipment 10 is generated by machine learning, and an operation model is generated by reinforcement learning in which the output of the evaluation model is used as at least part of the reward. The operation system 100 then controls the control target in the equipment 10 using the operation model generated in this way.

The operation system 100 includes an evaluation model generation device 200, an operation model generation device 300, and a control device 400.

The evaluation model generation device 200 generates, by machine learning, an evaluation model that outputs an index evaluating the state of the equipment 10 with respect to a targeted goal, based on the operation target for the equipment 10 and the state of the equipment 10. The evaluation model generation device 200 supplies the generated evaluation model to the operation model generation device 300.

The operation model generation device 300 generates, by reinforcement learning in which the output of the evaluation model generated by the evaluation model generation device 200 is used as at least part of the reward, an operation model that outputs an action according to the state of the equipment 10. The operation model generation device 300 supplies the generated operation model to the control device 400.
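The phrase "at least part of the reward" can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the specification: the function `make_reward`, the stand-in `toy_evaluation_model`, the additional term `energy_penalty`, and the weighting scheme are all hypothetical.

```python
from typing import Callable, Sequence

State = Sequence[float]


def make_reward(evaluation_model: Callable[[State], float],
                extra_terms: Sequence[Callable[[State], float]] = (),
                weight: float = 1.0) -> Callable[[State], float]:
    """Build a reward function that contains the evaluation model output.

    The evaluation model output is one term of the reward; any further
    terms are optional, which mirrors "at least part of the reward".
    """
    def reward(state: State) -> float:
        total = weight * evaluation_model(state)
        for term in extra_terms:
            total += term(state)
        return total
    return reward


# Hypothetical stand-in for a learned evaluation model:
# the index is highest when the first physical quantity is near 50.
def toy_evaluation_model(state: State) -> float:
    return -abs(state[0] - 50.0)


# Hypothetical additional reward term: a small penalty on the second quantity.
def energy_penalty(state: State) -> float:
    return -0.1 * state[1]


reward_fn = make_reward(toy_evaluation_model, [energy_penalty])
print(reward_fn([48.0, 10.0]))
```

A reinforcement learning routine would call `reward_fn` on each observed state. Passing no extra terms makes the evaluation model output the entire reward.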

The control device 400 applies, to the control target in the equipment 10, a manipulated variable based on the action that the operation model generated by the operation model generation device 300 outputs according to the state of the equipment 10. In other words, the control device 400 functions as an AI (Artificial Intelligence) controller that uses the operation model generated by reinforcement learning.

At this time, the evaluation model generation device 200 updates the evaluation model based on the state of the equipment 10 observed while the control target is controlled using the operation model. In response, the operation model generation device 300 updates the operation model by reinforcement learning in which the output of the updated evaluation model is used as at least part of the reward. The control device 400 then controls the control target using the updated operation model.

In this way, in the operation system 100 according to this embodiment, AI automatically finds bottlenecks (potential faults) in the operation and generates an index for improvement in the form of the evaluation model. AI then proceeds by trial and error based on the given index and generates an operation model that indicates a better way of operating. The AI controller then controls the control target using that operation model. The operation system 100 according to this embodiment thereby provides an environment in which the equipment 10 can be controlled autonomously using AI technology. The operation system 100 also updates the evaluation model and the operation model based on the state of the equipment under such AI control, and controls the control target using the updated operation model. As a result, the operation system 100 according to this embodiment can autonomously run a loop that improves the operation of the equipment 10. This is described in detail below.
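The improvement loop described above can be sketched in Python on a toy problem. This is only an illustration under stated assumptions: the one-variable plant in `simulate`, the constant `TARGET`, and the way `fit_evaluation_model` derives its index are all hypothetical, and plain random trial and error over a single gain stands in for reinforcement learning.

```python
import random

TARGET = 10.0  # hypothetical operation target for a single physical quantity


def simulate(gain, steps=20, x0=0.0):
    """Toy plant: each step moves the state toward TARGET by the fraction `gain`."""
    x, states = x0, []
    for _ in range(steps):
        x += gain * (TARGET - x)
        states.append(x)
    return states


def fit_evaluation_model(states):
    """Stand-in for the evaluation model generation device.

    Returns an index in [0, 1]; its scale is refit from the states
    observed under the current operation model.
    """
    spread = max(abs(s - TARGET) for s in states) or 1.0
    return lambda x: 1.0 - min(abs(x - TARGET) / spread, 1.0)


def train_operation_model(evaluate, current_gain, rng, trials=30):
    """Stand-in for reinforcement learning (random search, not true RL).

    Keeps the candidate gain whose trajectory earns the highest total reward.
    """
    def total_reward(g):
        return sum(evaluate(x) for x in simulate(g))

    best = current_gain
    for _ in range(trials):
        candidate = min(max(best + rng.uniform(-0.2, 0.2), 0.0), 1.0)
        if total_reward(candidate) > total_reward(best):
            best = candidate
    return best


def improvement_loop(cycles=3, seed=0):
    rng = random.Random(seed)
    gain = 0.05                      # initial operation model
    gains = [gain]
    for _ in range(cycles):
        states = simulate(gain)                      # control with the current model
        evaluate = fit_evaluation_model(states)      # update the evaluation model
        gain = train_operation_model(evaluate, gain, rng)  # update the operation model
        gains.append(gain)
    return gains


print(improvement_loop())
```

Each cycle follows the order given in the text: control with the current operation model, update the evaluation model from the resulting states, then update the operation model against the updated evaluation model.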

FIG. 2 shows an example of a block diagram of the evaluation model generation device 200 in the operation system 100 according to this embodiment. The evaluation model generation device 200 may be a computer such as a PC (personal computer), a tablet computer, a smartphone, a workstation, a server computer, or a general-purpose computer, or may be a computer system in which multiple computers are connected. Such a computer system is also a computer in the broad sense. The evaluation model generation device 200 may also be implemented by one or more virtual computer environments that can run within a computer. Alternatively, the evaluation model generation device 200 may be a dedicated computer designed for generating evaluation models, or may be dedicated hardware realized by dedicated circuitry. If it can connect to the Internet, the evaluation model generation device 200 may also be realized by cloud computing.

The evaluation model generation device 200 includes a labeling function unit 210 and a machine learning function unit 230. The figure shows, as an example, the case where the labeling function unit 210 and the machine learning function unit 230 are configured as a single device, but the configuration is not limited to this. The labeling function unit 210 and the machine learning function unit 230 may be configured as separate devices.

The labeling function unit 210 includes an operation target acquisition unit 212, a state data acquisition unit 214, a correlation data generation unit 216, a teacher label acquisition unit 218, a labeling unit 220, a labeling data output unit 222, and a labeling model update unit 224. That is, the evaluation model generation device 200 includes each of these units.

The operation target acquisition unit 212 acquires the operation target for the equipment 10. For example, the operation target acquisition unit 212 acquires the operation target from the headquarters 20 via a network. However, the acquisition route is not limited to this. The operation target acquisition unit 212 may acquire the operation target from another device, via various memory devices, or via user input. The operation target acquisition unit 212 supplies the acquired operation target to the correlation data generation unit 216.

The state data acquisition unit 214 acquires state data indicating the state of the equipment 10. For example, the state data acquisition unit 214 acquires, as state data, the various physical quantities measured by the sensors installed in the equipment 10, as a time series from the equipment 10 via a network. However, the acquisition route is not limited to this. The state data acquisition unit 214 may acquire the state data from another device, via various memory devices, or via user input. The state data acquisition unit 214 supplies the acquired state data to the correlation data generation unit 216.

The correlation data generation unit 216 generates, based on the operation target acquired by the operation target acquisition unit 212, correlation data indicating at least one of (a) a correlation between time and at least one physical quantity included in the state data acquired by the state data acquisition unit 214 and (b) a correlation between at least two physical quantities included in the state data. In doing so, the correlation data generation unit 216 may generate correlation data that includes graph images plotting these correlations. The correlation data generation unit 216 supplies the generated correlation data to the labeling unit 220.
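The two kinds of correlation named above can be sketched in Python as follows. This is a simplification under stated assumptions: the text describes correlation data that may include graph images, whereas this sketch reduces each pair to a Pearson correlation coefficient. The function names, the record layout, and the idea that the operation target supplies a list of quantity names are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_data(state_data: Dict[str, List[float]],
                     timestamps: List[float],
                     quantities: List[str]) -> List[dict]:
    """Build correlation records for the quantities selected by the operation target."""
    records = []
    # (a) each physical quantity against time
    for name in quantities:
        records.append({"kind": "quantity-time", "x": "time", "y": name,
                        "r": pearson(timestamps, state_data[name])})
    # (b) each pair of physical quantities
    for i, a in enumerate(quantities):
        for b in quantities[i + 1:]:
            records.append({"kind": "quantity-quantity", "x": a, "y": b,
                            "r": pearson(state_data[a], state_data[b])})
    return records


# Hypothetical sensor readings sampled at four time points.
state = {"temperature": [20.0, 22.0, 24.0, 26.0], "pressure": [8.0, 6.0, 4.0, 2.0]}
records = correlation_data(state, [0.0, 1.0, 2.0, 3.0], ["temperature", "pressure"])
for rec in records:
    print(rec)
```

A fuller version could render each pair as a plot image, closer to the graph images mentioned in the text.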

The teacher label acquisition unit 218 acquires teacher labels for at least a portion of the correlation data generated by the correlation data generation unit 216. For example, the teacher label acquisition unit 218 acquires the teacher labels through input from a user such as a domain expert. However, the acquisition route is not limited to this. The teacher label acquisition unit 218 may acquire the teacher labels from another device, via a network, or via various memory devices. The teacher label acquisition unit 218 supplies the acquired teacher labels to the labeling unit 220.

The labeling unit 220 labels the correlation data generated by the correlation data generation unit 216, using a labeling model generated based on the teacher labels acquired by the teacher label acquisition unit 218. For example, the labeling unit 220 generates an initial labeling model by inputting teacher data to a learner, where the teacher data is at least a portion of the correlation data generated by the correlation data generation unit 216 with the teacher labels acquired by the teacher label acquisition unit 218 attached. The labeling unit 220 then labels the unlabeled correlation data using the initial labeling model and the labeling model for updating described later. The description above takes as an example the case where the correlation data in the teacher data is generated by the correlation data generation unit 216, but the teacher data is not limited to this. The correlation data in the teacher data may instead be prepared by a domain expert or the like. In that case, rather than acquiring teacher labels from the expert, the teacher label acquisition unit 218 may acquire the teacher data itself, with the teacher labels already attached, and supply it to the labeling unit 220. The labeling unit 220 supplies the labeled correlation data to the labeling data output unit 222.
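The two-stage flow above, fitting an initial labeling model on teacher data and then labeling unlabeled correlation data, can be sketched in Python. The sketch makes several assumptions that are not in the text: a nearest-centroid classifier stands in for the learner, each piece of correlation data is reduced to a two-element feature vector, and the label names "stable" and "degrading" are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]


def fit_labeling_model(teacher_data: List[Tuple[Features, str]]) -> Dict[str, List[float]]:
    """Fit a nearest-centroid model: one mean feature vector per teacher label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in teacher_data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def label_correlation_data(model: Dict[str, List[float]], features: Features) -> str:
    """Assign the label whose centroid is closest to the given feature vector."""
    def squared_distance(centroid: List[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical teacher data: (features of correlation data, expert-assigned label).
teacher = [
    ([0.9, 0.1], "stable"),
    ([0.8, 0.2], "stable"),
    ([-0.7, 0.9], "degrading"),
    ([-0.9, 0.8], "degrading"),
]
initial_model = fit_labeling_model(teacher)

# Label correlation data that has no teacher label yet.
unlabeled = [[0.85, 0.15], [-0.8, 0.7]]
labels = [label_correlation_data(initial_model, f) for f in unlabeled]
print(labels)
```

An image-based learner operating on the graph images would replace the feature vectors here. The shape of the flow stays the same: fit on the labeled portion, then predict on the rest.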

The labeling data output unit 222 generates labeling data based on the correlation data labeled by the labeling unit 220 and on sensor data (measured values of physical quantities). The labeling data output unit 222 outputs the generated labeling data to the machine learning function unit 230.

The labeling model update unit 224 acquires the result of the evaluation model determination described later. The labeling model update unit 224 then updates the labeling model when the evaluation model is determined to be valid. In doing so, the labeling model update unit 224 may generate a labeling model for updating, separate from the initial labeling model generated based on the teacher labels.

The machine learning function unit 230 includes a labeling data acquisition unit 240, an evaluation model generation unit 250, an evaluation model determination unit 260, and an evaluation model output unit 270. That is, the evaluation model generation device 200 includes the labeling data acquisition unit 240, the evaluation model generation unit 250, the evaluation model determination unit 260, and the evaluation model output unit 270.

The labeling data acquisition unit 240 acquires the labeling data output by the labeling data output unit 222. The labeling data acquisition unit 240 supplies the acquired labeling data to the evaluation model generation unit 250.

The evaluation model generation unit 250 generates an evaluation model using the labeling data acquired by the labeling data acquisition unit 240, that is, the labeling data generated based on the sensor data and the correlation data labeled by the labeling unit 220. Based on the operation goal of the equipment 10 and the state of the equipment 10, the evaluation model outputs an index that evaluates the state of the equipment 10 with respect to the target goal. The evaluation model generation unit 250 supplies the generated evaluation model to the evaluation model determination unit 260 and the evaluation model output unit 270.

The evaluation model determination unit 260 determines the validity of the evaluation model generated by the evaluation model generation unit 250. When the evaluation model determination unit 260 determines that the evaluation model is valid, it notifies the labeling model update unit 224 and the evaluation model output unit 270 of that result. In response, the labeling model update unit 224 updates the labeling model.

When the evaluation model is determined to be valid, the evaluation model output unit 270 outputs the evaluation model to the operation model generation device 300.

First, the processing in the labeling function unit 210 of the evaluation model generation device 200 is described in detail using data examples and a flow.

FIG. 3 shows an example of the teacher data that the labeling unit 220 inputs to the learner when generating the initial labeling model. As shown in this figure, the teacher data input to the learner may include "operation goal", "target", "tag information 1", "tag information 2", "graph image", and "label". Here, "operation goal" may include a "target item" indicating the item targeted and a "target value" for that target item. "Target" may include the "category" of the target segment and the target "control loop". "Tag information 1" may include the "name", the type of "physical quantity", the "minimum value", the "maximum value", and the "unit" of tag 1. Likewise, "tag information 2" may include the "name", the type of "physical quantity", the "minimum value", the "maximum value", and the "unit" of tag 2.

The "graph image" may be an image that graphs the correlation between tag 1 and tag 2. For example, "xxx1.jpg" and "xxx2.jpg" may be time-series images plotted with the concentration of tag 1 on the vertical axis and the time of tag 2 on the horizontal axis. Similarly, "xxx3.jpg" and "xxx4.jpg" may be time-series images plotted with the acceleration of tag 1 on the vertical axis and the time of tag 2 on the horizontal axis. "xxx5.jpg" and "xxx6.jpg" may be distribution (scatter plot) images plotted with the temperature of tag 1 on the vertical axis and the concentration of tag 2 on the horizontal axis. In other words, the vertical axis of the "graph image" is defined by "tag information 1", and the horizontal axis of the "graph image" is defined by "tag information 2".

The "label" is a teacher label assigned by an expert or the like. The labeling unit 220 may generate the initial labeling model by inputting teacher data including these items into the learner. This figure shows, as an example, the case where both "OK" labels and "NG" labels are used, but the embodiment is not limited to this. Only "OK" labels may be used, or only "NG" labels may be used. In particular, in applications that search for bottlenecks in operation, at least the "NG" label should be used.
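
The record layout described for FIG. 3 can be pictured as a small data structure. The following is only an illustrative sketch: the class names `TagInfo` and `CorrelationRecord`, the field names, and the sample values are hypothetical and do not appear in the specification. A record whose `label` is `None` corresponds to unlabeled correlation data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagInfo:
    """One 'tag information' entry (hypothetical field names)."""
    name: str
    physical_quantity: str
    minimum: float
    maximum: float
    unit: str


@dataclass(frozen=True)
class CorrelationRecord:
    """One row of teacher data; label is None while the row is unlabeled."""
    target_item: str        # operation goal: item targeted
    target_value: float     # operation goal: target value for that item
    category: str           # target: category of the target segment
    control_loop: str       # target: control loop
    tag1: TagInfo           # defines the vertical axis of the graph image
    tag2: TagInfo           # defines the horizontal axis of the graph image
    graph_image: str        # file name of the graph image
    label: Optional[str] = None  # "OK", "NG", or None

    def is_labeled(self) -> bool:
        return self.label is not None


# Hypothetical sample values, for illustration only.
tag1 = TagInfo("CI-101", "concentration", 0.0, 100.0, "%")
tag2 = TagInfo("time", "time", 0.0, 3600.0, "s")
unlabeled = CorrelationRecord("quality", 99.0, "reactor", "loop-1",
                              tag1, tag2, "xxx1.jpg")
teacher = CorrelationRecord("quality", 99.0, "reactor", "loop-1",
                            tag1, tag2, "xxx1.jpg", label="NG")
```

Keeping the label optional lets the same structure carry both teacher data and the data that the labeling model is later asked to label.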

FIG. 4 shows one example of the design of the learner used by the labeling unit 220. The labeling unit 220 may use any of various learning algorithms for such a learner and, as an example, may use deep learning. The learner may have a functional unit that classifies operation goals, a functional unit that classifies targets, a functional unit that classifies physical quantities (tag information), and a functional unit that classifies graph images. In this case, as shown in this figure, the learner may be designed to have layers that respond to the respective items, such as a goal classification layer that classifies operation goals, a target classification layer that classifies targets, a physical quantity classification layer that classifies physical quantities, and a graph classification layer that classifies graph images.

FIG. 5 shows another example of the design of the learner used by the labeling unit 220. Instead of providing the learner with layers that respond to the respective items as in FIG. 4, the learner may be designed, as shown in this figure, to have separate models, each capable of performing the classification of one layer, such as a goal classification model that classifies operation goals, a target classification model that classifies targets, a physical quantity classification model that classifies physical quantities, and a graph classification model that classifies graph images.

FIG. 6 shows an example of the unlabeled correlation data that the labeling unit 220 is to label. As shown in this figure, the unlabeled correlation data may include "operation goal", "target", "tag information 1", "tag information 2", and "graph image". These items may be the same as the corresponding items of the teacher data shown in FIG. 3, so their description is omitted here.

FIG. 7 shows an example of an I/O list for the equipment 10. The I/O list is a list of information on each device provided in the equipment 10. Such an I/O list is referred to as appropriate when generating correlation data and labeling data.

FIG. 8 shows an example of a segment diagram for the equipment 10. The segment diagram shows the configuration of the segments in the equipment 10. Like the I/O list, such a segment diagram is referred to as appropriate when generating correlation data and labeling data.

FIG. 9 shows an example of the labeling data output by the labeling data output unit 222. As shown in this figure, the labeling data may include labeling data with an "OK" label and labeling data with an "NG" label. Each piece of labeling data includes the sensor data corresponding to a sensor ID (tag name). In other words, the labeling data indicates which measured values of one or more sensors receive an "OK" label and which receive an "NG" label.
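
One way to picture how labels and sensor data might be combined into labeling data of this kind is sketched below. This is a sketch under stated assumptions, not the method of the specification: it assumes that each label applies to a time period and that each sensor row carries a timestamp. The function name `build_labeling_data` and the tag name `T1` are hypothetical.

```python
def build_labeling_data(labeled_periods, sensor_rows):
    """Attach an "OK"/"NG" label to each sensor row that falls in a labeled period.

    labeled_periods: list of (start, end, label), covering start <= t < end.
    sensor_rows: list of (timestamp, {sensor_id: measured_value}).
    Rows outside every labeled period are left out of the result.
    """
    labeling_data = []
    for timestamp, values in sensor_rows:
        for start, end, label in labeled_periods:
            if start <= timestamp < end:
                labeling_data.append(
                    {"timestamp": timestamp, "label": label, **values}
                )
                break  # use the first matching period only
    return labeling_data


periods = [(0, 10, "OK"), (10, 20, "NG")]
rows = [(5, {"T1": 1.0}), (12, {"T1": 9.0}), (25, {"T1": 3.0})]
result = build_labeling_data(periods, rows)
```

Here the row at timestamp 25 is dropped because no label covers it, so `result` holds one "OK" row and one "NG" row.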

The labeling function unit 210 generates and outputs labeling data such as that shown in this figure. This is described in detail below using a flow.

FIG. 10 shows an example of the processing flow in the labeling function unit 210 of the evaluation model generation device 200. The labeling function unit 210 of the evaluation model generation device 200 may execute the labeling process according to, for example, the flow shown in this figure.

In step S1002, the evaluation model generation device 200 acquires the operation goal of the equipment 10. For example, the operation goal acquisition unit 212 acquires the operation goal from the headquarters 20 via a network. Such an operation goal may include, for example, a target item indicating the item targeted and a target value for that target item. As an example, when the equipment 10 is a plant, the operation goal acquisition unit 212 may acquire a plant KPI (Key Performance Indicator) as the operation goal. The operation goal acquisition unit 212 supplies the acquired operation goal to the correlation data generation unit 216.

In step S1004, the evaluation model generation device 200 acquires state data indicating the state of the equipment 10. For example, the state data acquisition unit 214 acquires, in time series via a network, various physical quantities measured by the various sensors provided in the equipment 10 as state data. Such physical quantities may include, for example, temperature, concentration, acceleration, and pressure at various locations in the equipment 10. The state data acquisition unit 214 supplies the acquired state data to the correlation data generation unit 216.

In step S1006, the evaluation model generation device 200 generates, based on the operation goal, correlation data indicating at least one of a correlation between at least one physical quantity included in the state data and time, and a correlation between at least two physical quantities included in the state data. For example, the correlation data generation unit 216 generates such correlation data from the state data acquired in step S1004, based on the operation goal acquired in step S1002. As an example, the correlation data generation unit 216 fills in the "operation goal" item from the operation goal acquired in step S1002. The correlation data generation unit 216 also fills in the "target", "tag information 1", and "tag information 2" items exhaustively by referring to the I/O list shown in FIG. 7 and the segment diagram shown in FIG. 8. The correlation data generation unit 216 then defines the horizontal axis by "tag information 1" and the vertical axis by "tag information 2" to graph the correlation between tag 1 and tag 2, and enters the resulting image (for example, a time-series image or a distribution image) in the "graph image" item. In this way, the correlation data generation unit 216 generates, for example, the correlation data of the teacher data shown in FIG. 3 other than the "teacher label" item, and the unlabeled correlation data shown in FIG. 6. The correlation data generation unit 216 supplies the generated correlation data to the labeling unit 220.
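
The exhaustive filling-in of tag pairs in step S1006 can be sketched as a simple enumeration. The sketch below assumes that "exhaustive" means pairing every physical quantity with time and every unordered pair of physical quantities with each other. The function name `enumerate_tag_pairs` and the literal `"time"` are hypothetical, and lookups against the I/O list and segment diagram are omitted.

```python
from itertools import combinations


def enumerate_tag_pairs(quantities):
    """Return (tag 1, tag 2) candidates for graph images.

    First every quantity against time (time-series images), then every
    unordered pair of quantities (distribution images).
    """
    pairs = [(q, "time") for q in quantities]
    pairs += list(combinations(quantities, 2))
    return pairs


pairs = enumerate_tag_pairs(["temperature", "concentration", "pressure"])
```

For n quantities this yields n time-series candidates plus n(n-1)/2 distribution candidates, which is why narrowing the candidates by operation goal and target segment matters in practice.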

In step S1008, the evaluation model generation device 200 determines whether a labeling model exists. For example, the evaluation model generation device 200 determines whether a labeling model for labeling unlabeled correlation data has already been generated. If it is determined that no labeling model exists (none has been generated) (No), the evaluation model generation device 200 advances the process to step S1010.

In step S1010, the evaluation model generation device 200 acquires teacher labels for at least a part of the correlation data. For example, the teacher label acquisition unit 218 accepts input of teacher labels in response to the correlation data generated in step S1006 being displayed. The expert or the like then assesses the correlation data based on the graph images, attaching an "OK" label to correlation data judged to have "no problem" and an "NG" label to correlation data judged to be "problematic or suspicious". In other words, the expert or the like labels which data to focus on and what is wrong with it. In this way, for example, the teacher label acquisition unit 218 acquires teacher labels via user input. The teacher label acquisition unit 218 supplies the acquired teacher labels to the labeling unit 220.

In step S1012, the evaluation model generation device 200 generates the initial labeling model. For example, the labeling unit 220 generates teacher data such as that shown in FIG. 3 by attaching the teacher labels acquired in step S1010 to at least a part of the correlation data generated in step S1006. The labeling unit 220 then generates the initial labeling model by inputting this teacher data into a learner such as that shown in FIG. 4 or FIG. 5. The evaluation model generation device 200 then returns the process to step S1002 and continues the flow. From this point on, the evaluation model generation device 200 uses the labeling model generated in this way to label unlabeled correlation data.

Once the initial labeling model has been generated in step S1012, the evaluation model generation device 200 determines in step S1008 that a labeling model exists (has already been generated) and advances the process to step S1014.

In step S1014, the evaluation model generation device 200 labels the correlation data generated in step S1006 using the labeling model generated based on the teacher labels acquired in step S1010. At this time, the range of data used in the graph images may be tied to the operation goal in advance, or may be narrowed down using the data that was used to generate the initial labeling model. For example, the labeling unit 220 inputs the unlabeled correlation data to the initial labeling model generated in step S1012 and to the update labeling model. In response, the labeling model classifies the operation goal, the target, the physical quantity, and then the graph image. That is, the labeling model identifies data with a common or similar operation goal, data with a common or similar target, and data with a common or similar physical quantity. The labeling model then compares the graph images of the data identified in this way and, for unlabeled correlation data, attaches an "OK" label when the graph image is similar to a graph image labeled "OK" and an "NG" label when the graph image is similar to a graph image labeled "NG". In doing so, the labeling model may convert the data into an image and determine similarity in graph form, or may use an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory), or the like to determine whether the data waveforms are similar.

The labeling unit 220 may attach an "OK" label to correlation data that both the initial labeling model and the update labeling model classify as "OK". Similarly, the labeling unit 220 may attach an "NG" label to correlation data that both the initial labeling model and the update labeling model classify as "NG". That is, the labeling unit 220 may label the correlation data by the logical AND of the classification result of the initial labeling model and the classification result of the update labeling model. However, the initial labeling model and the update labeling model may give different classification results. In such a case, the labeling unit 220 may give priority to the classification result of the initial labeling model. Alternatively, the labeling unit 220 may give priority to the classification result of the update labeling model. As a further alternative, the labeling unit 220 may label the correlation data by the logical OR of the classification result of the initial labeling model and the classification result of the update labeling model. The labeling unit 220 supplies the correlation data labeled in this way to the labeling data output unit 222.
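
The ways of combining the two classification results can be sketched as a single function with a selectable policy. Several points here are assumptions rather than statements of the specification: under the logical AND policy a disagreement is assumed to leave the data unlabeled, and under the logical OR policy "NG" is assumed to be the value treated as true, so that either model flagging "NG" yields "NG". The function name `combine_labels` and the policy names are hypothetical.

```python
from typing import Optional


def combine_labels(initial: str, update: str, policy: str = "and") -> Optional[str]:
    """Combine "OK"/"NG" results of the initial and update labeling models."""
    if initial == update:
        return initial  # both models agree under every policy
    if policy == "and":
        return None  # assumption: disagreement leaves the data unlabeled
    if policy == "prefer_initial":
        return initial
    if policy == "prefer_update":
        return update
    if policy == "or":
        return "NG"  # assumption: "NG" from either model wins
    raise ValueError(f"unknown policy: {policy}")


agreed = combine_labels("OK", "OK")
disputed = combine_labels("OK", "NG", policy="prefer_initial")
```

Treating "NG" as the dominant value under the OR policy is a conservative choice for bottleneck detection, where missing a problem is usually costlier than a false alarm; the opposite convention would be equally consistent with the text.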

In step S1016, the evaluation model generation device 200 outputs the labeling data. For example, the labeling data output unit 222 generates labeling data such as that shown in FIG. 9 based on the sensor data and the correlation data labeled in step S1014. The labeling data output unit 222 then outputs the generated labeling data to the machine learning function unit 230.

In step S1018, the evaluation model generation device 200 determines whether it has acquired a determination result indicating that the evaluation model is valid. If it is determined that no such result has been acquired (No), the evaluation model generation device 200 ends the flow. If it is determined that such a result has been acquired (Yes), the evaluation model generation device 200 advances the process to step S1020.

In step S1020, having determined that a determination result indicating that the evaluation model is valid has been acquired, the evaluation model generation device 200 updates the labeling model. For example, the labeling model update unit 224 generates an update labeling model separately from the initial labeling model generated based on the teacher labels, and updates that update labeling model. In general, teacher labels assigned by an expert or the like are more reliable than labels assigned by a learner. Therefore, by updating the separately generated update labeling model rather than repeatedly updating the initial labeling model, the labeling model update unit 224 can prevent the influence of the teacher labels assigned by the expert or the like from being gradually diluted. However, the embodiment is not limited to this and does not exclude the case where the labeling model update unit 224 updates the initial labeling model.

The labeling function unit 210 of the evaluation model generation device 200 may execute the labeling process in this way, for example. Next, the processing in the machine learning function unit 230 of the evaluation model generation device 200 is described in detail using data examples and a flow.

FIG. 11 shows an example of a block diagram of the evaluation model generation unit 250. The evaluation model generation unit 250 has a plurality of learning units 252a, 252b, ..., 252n (collectively referred to as "learning units 252"), and the plurality of learning units 252 perform learning in parallel. The learning unit 252a includes a preprocessing unit 254a and a machine learning unit 256a. Similarly, the learning unit 252b includes a preprocessing unit 254b and a machine learning unit 256b, and the learning unit 252c includes a preprocessing unit 254c and a machine learning unit 256c. The preprocessing units 254a, 254b, ..., 254n are collectively referred to as "preprocessing units 254", and the machine learning units 256a, 256b, ..., 256n are collectively referred to as "machine learning units 256".

The preprocessing unit 254 preprocesses the labeling data. For example, the preprocessing unit 254 applies processing such as standardization, normalization, low-pass filtering, high-pass filtering, and principal component analysis to the labeling data. The preprocessing unit 254 supplies the preprocessed labeling data to the machine learning unit 256.
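
Three of the preprocessing operations named above can be sketched in a few lines. This is a minimal illustration under assumptions: standardization is taken as zero mean and unit variance, normalization as min-max scaling to the range 0 to 1, and the low-pass filter as a trailing moving average. The specification does not fix any of these definitions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # constant signal: nothing to scale
    return [(v - m) / s for v in values]


def normalize(values):
    """Min-max scale into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window=3):
    """Simple low-pass filter: mean of the current and previous samples."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Giving each learning unit a different subset or ordering of such functions is one simple way to make the resulting evaluation models differ from one another.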

The machine learning unit 256 generates an evaluation model by a machine learning algorithm, using the labeling data preprocessed by the preprocessing unit 254 as learning data.

In this way, the evaluation model generation unit 250 has a plurality of learning units 252, each of which includes a machine learning unit 256. As a result, the evaluation model generation unit 250 generates a plurality of evaluation models, one from each learning unit 252. It is preferable that the learning units 252 differ from one another in at least one of the processing performed by the preprocessing unit 254 and the algorithm of the machine learning unit 256. This allows the evaluation model generation unit 250 to generate a plurality of mutually different evaluation models. An evaluation model generated in this way may, for example, output a numerical value indicating whether given sensor data is closer to the OK teacher data or to the NG teacher data.
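
The idea of several learning units producing different evaluation models can be sketched with a deliberately simple stand-in algorithm. In the sketch below, a nearest-centroid score replaces whatever machine learning algorithm is actually used, each learning unit differs only in its preprocessing function, and each input is a single number. All names, including `train_centroid_model` and `build_models`, are hypothetical.

```python
from statistics import mean


def train_centroid_model(ok_values, ng_values):
    """Return a scorer: positive near the OK centroid, negative near the NG one."""
    ok_centroid, ng_centroid = mean(ok_values), mean(ng_values)

    def score(x):
        return abs(x - ng_centroid) - abs(x - ok_centroid)

    return score


def build_models(ok_values, ng_values, preprocessors):
    """Train one evaluation model per learning unit (one per preprocessor)."""
    models = {}
    for name, pre in preprocessors.items():
        scorer = train_centroid_model(
            [pre(v) for v in ok_values], [pre(v) for v in ng_values]
        )
        # Bind pre and scorer now so each model keeps its own pipeline.
        models[name] = lambda x, pre=pre, scorer=scorer: scorer(pre(x))
    return models


models = build_models(
    ok_values=[1.0, 2.0],
    ng_values=[9.0, 10.0],
    preprocessors={"raw": lambda v: v, "scaled": lambda v: v / 10},
)
```

Each entry of `models` maps a sensor value to a signed number, matching the description of an output that indicates closeness to the OK or the NG teacher data.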

FIG. 12 shows an example of the output of an evaluation model. As an example, this figure shows the output of an evaluation model whose goal is to improve the quality targeted by the operation goal. The vertical axis of this figure represents a health index. As an example, such an evaluation model outputs a health index of 0 when the target goal is estimated to equal the target value. The evaluation model outputs a value further above 0 the better the target goal is estimated to be relative to the target value, and a value further below 0 the worse the target goal is estimated to be relative to the target value.

The horizontal axis of this figure represents time. As an example, the first half of the horizontal axis shows the case where data from a period labeled "OK" is input to the evaluation model. In this case, the higher the rate at which the evaluation model outputs values of 0 or more during that period, the higher its rate of agreement with the labeling, and the more valid the evaluation model can be said to be. Similarly, the second half of the horizontal axis shows the case where data from a period labeled "NG" is input to the evaluation model. In this case, the higher the rate at which the evaluation model outputs values less than 0 during that period, the higher its rate of agreement with the labeling, and the more valid the evaluation model can be said to be.
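
The agreement criterion described for FIG. 12 reduces to a simple rate: an output of 0 or more counts as agreeing with an "OK" label, and an output below 0 counts as agreeing with an "NG" label. The following sketch computes that rate. The function name `agreement_rate` is hypothetical, and the specification does not state what rate is required for a model to be judged valid.

```python
def agreement_rate(outputs, labels):
    """Fraction of samples where the health index sign matches the label.

    outputs: health index values from the evaluation model.
    labels:  "OK" or "NG" for the period each value belongs to.
    """
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    hits = sum(
        1 for y, label in zip(outputs, labels) if (y >= 0) == (label == "OK")
    )
    return hits / len(labels)


rate = agreement_rate([0.5, 0.0, -0.2, 0.3], ["OK", "OK", "NG", "NG"])
```

In this example three of the four samples agree with their labels; the last one is a positive output during an "NG" period.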

FIG. 13 shows another example of the output of an evaluation model. As an example, this figure shows the output of an evaluation model whose goal is to extend the catalyst use (cost reduction) targeted by the operation goal. The vertical axis of this figure represents the health index, and the horizontal axis represents time. In general, a catalyst gradually decreases over time. Therefore, as indicated by the arrow in this figure, the more monotonically decreasing its output is, the more correctly the evaluation model captures this phenomenon, and the more valid it can be said to be. The machine learning function unit 230 of the evaluation model generation device 200 generates an evaluation model capable of outputting results such as these.
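
The degree to which an output series is monotonically decreasing can be quantified in many ways, and the specification does not name one. As one hedged possibility, the sketch below uses the fraction of consecutive steps that do not increase; the function name `monotonic_decrease_ratio` is hypothetical.

```python
def monotonic_decrease_ratio(outputs):
    """Fraction of consecutive steps where the value does not increase.

    Returns 1.0 for a perfectly non-increasing series (and for series with
    fewer than two points, which contain no step that could violate it).
    """
    steps = list(zip(outputs, outputs[1:]))
    if not steps:
        return 1.0
    return sum(1 for prev, cur in steps if cur <= prev) / len(steps)


steady = monotonic_decrease_ratio([5.0, 4.0, 4.0, 3.0])
noisy = monotonic_decrease_ratio([5.0, 6.0, 4.0, 3.0])
```

A rank correlation between time and output would be an alternative measure that is less sensitive to small sample-to-sample noise.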

FIG. 14 shows an example of the processing flow in the machine learning function unit 230 of the evaluation model generation device 200. The machine learning function unit 230 of the evaluation model generation device 200 may execute the process of generating an evaluation model by machine learning according to, for example, the flow shown in this figure.

In step S1410, the evaluation model generation device 200 acquires labeling data. For example, the labeling data acquisition unit 240 acquires the labeling data output in step S1016 of the flow of FIG. 10. As an example, the labeling data acquisition unit 240 may acquire labeling data such as that shown in FIG. 9.

In step S1420, the evaluation model generation device 200 samples determination data. For example, the labeling data acquisition unit 240 samples a part of the labeling data acquired in step S1410 as determination data. As an example, the labeling data acquisition unit 240 may sample the determination data at random from the acquired labeling data. Alternatively, the labeling data acquisition unit 240 may use the first half of the acquired labeling data as machine learning data and sample the second half as determination data. The labeling data acquisition unit 240 supplies the sampled determination data to the evaluation model determination unit 260, and supplies the remaining labeling data to the evaluation model generation unit 250 as machine learning data.
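
The two sampling options in step S1420 can be sketched as follows. The sampling ratio, the fixed seed, and the function names `split_random` and `split_halves` are assumptions made for illustration; the specification does not state how much data is set aside for determination.

```python
import random


def split_random(data, ratio=0.2, seed=0):
    """Randomly set aside a fraction of the data as determination data.

    Returns (machine_learning_data, determination_data), each in original order.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    picked = set(rng.sample(range(len(data)), int(len(data) * ratio)))
    learning = [d for i, d in enumerate(data) if i not in picked]
    determination = [d for i, d in enumerate(data) if i in picked]
    return learning, determination


def split_halves(data):
    """Use the first half for machine learning and the second for determination."""
    mid = len(data) // 2
    return data[:mid], data[mid:]


learning, determination = split_random(list(range(10)), ratio=0.3)
first, second = split_halves([1, 2, 3, 4])
```

For time-series data such as a gradually degrading catalyst, the half-and-half split avoids leaking later samples into training, whereas the random split gives both sets similar coverage of operating conditions.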

ステップＳ１４３０において、評価モデル生成装置２００は、ステップＳ１４２０において供給されたラベリングデータ、すなわち、図１０のフローにおけるステップＳ１０１４においてラベリングされた相関データとセンサデータとに基づき生成されたラベリングデータを用いて、設備１０における操業目標および設備１０における状態に基づいて対象とする目標について設備１０における状態を評価した指標を出力する評価モデルを生成する。 In step S1430, the evaluation model generating device 200 uses the labeling data supplied in step S1420, i.e., the labeling data generated based on the correlation data and sensor data labeled in step S1014 in the flow of FIG. 10, to generate an evaluation model that outputs an index evaluating the state of the equipment 10 for a target goal, based on the operation goal of the equipment 10 and the state of the equipment 10.

より詳細には、ステップＳ１４３２において、評価モデル生成装置２００は、ラベリングデータを前処理する。例えば、前処理部２５４ａ、２５４ｂ、・・・、２５４ｎはそれぞれ、ステップＳ１４２０において供給されたラベリングデータに対して、標準化処理、正規化処理、ローパスフィルタ、ハイパスフィルタ、および、主成分分析等の処理を実行する。この際、前処理部２５４ａ、２５４ｂ、・・・、２５４ｎは、それぞれ異なる処理内容を実行してよい。前処理部２５４ａ、２５４ｂ、・・・、２５４ｎは、それぞれ、前処理したラベリングデータを機械学習部２５６ａ、２５６ｂ、・・・、２５６ｎへ供給する。 More specifically, in step S1432, the evaluation model generating device 200 preprocesses the labeling data. For example, the preprocessing units 254a, 254b, ..., 254n each perform processing such as standardization, normalization, low-pass filtering, high-pass filtering, and principal component analysis on the labeling data supplied in step S1420. At this time, the preprocessing units 254a, 254b, ..., 254n may each perform different processing. The preprocessing units 254a, 254b, ..., 254n supply the preprocessed labeling data to the machine learning units 256a, 256b, ..., 256n, respectively.

ステップＳ１４３４において、評価モデル生成装置２００は、機械学習を実行する。例えば、機械学習部２５６ａ、２５６ｂ、・・・、２５６ｎはそれぞれ、ステップＳ１４３２において前処理されたラベリングデータを学習データとして、機械学習のアルゴリズムにより評価モデルを生成する。この際、機械学習部２５６ａ、２５６ｂ、・・・、２５６ｎは、それぞれ異なるアルゴリズムにより機械学習を実行してよい。したがって、学習部２５２ａ、２５２ｂ、・・・、２５２ｎは、前処理内容および機械学習アルゴリズムの少なくともいずれかが異なる処理を実行することによって、それぞれ異なる複数の評価モデルを生成してよい。このような評価モデルは、一例として、データが入力されたことに応じて図１２や図１３のような結果を出力するモデルであってよい。評価モデル生成部２５０は、このようにして生成された評価モデルを評価モデル判定部２６０および評価モデル出力部２７０へ供給する。 In step S1434, the evaluation model generating device 200 performs machine learning. For example, the machine learning units 256a, 256b, ..., 256n each generate an evaluation model using a machine learning algorithm, using the labeling data preprocessed in step S1432 as learning data. At this time, the machine learning units 256a, 256b, ..., 256n may perform machine learning using different algorithms. Therefore, the learning units 252a, 252b, ..., 252n may generate multiple different evaluation models by performing processing in which at least one of the preprocessing content and the machine learning algorithm is different. As an example, such an evaluation model may be a model that outputs a result such as that shown in FIG. 12 or FIG. 13 in response to input data. The evaluation model generating unit 250 supplies the evaluation model generated in this manner to the evaluation model determining unit 260 and the evaluation model output unit 270.
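As a rough illustration of steps S1432 and S1434, the following Python sketch builds several candidate evaluation models that differ in preprocessing, learning algorithm, or both. Everything here is an assumption for illustration: the data is reduced to a single scalar feature, and the two toy learners (`fit_midpoint`, `fit_gap`) merely stand in for whatever machine learning algorithms the learning units 252 actually use. Each candidate returns a signed score, with 0 or more read as "OK".

```python
from statistics import mean, pstdev


def fit_standardize(column):
    """Standardization: zero mean, unit variance (falls back to 1.0 if constant)."""
    mu, sigma = mean(column), pstdev(column) or 1.0
    return lambda x: (x - mu) / sigma


def fit_min_max(column):
    """Normalization to the range [0, 1] (falls back to span 1.0 if constant)."""
    lo, span = min(column), (max(column) - min(column)) or 1.0
    return lambda x: (x - lo) / span


def _split(zs, labels):
    ok = [z for z, y in zip(zs, labels) if y == "OK"]
    ng = [z for z, y in zip(zs, labels) if y == "NG"]
    return ok, ng


def fit_midpoint(zs, labels):
    """Toy learner A: boundary halfway between the OK and NG class means."""
    ok, ng = _split(zs, labels)
    sign = 1.0 if mean(ok) >= mean(ng) else -1.0
    boundary = (mean(ok) + mean(ng)) / 2
    return lambda z: sign * (z - boundary)


def fit_gap(zs, labels):
    """Toy learner B: boundary halfway between the closest OK and NG samples."""
    ok, ng = _split(zs, labels)
    sign = 1.0 if mean(ok) >= mean(ng) else -1.0
    boundary = (min(ok) + max(ng)) / 2 if sign > 0 else (max(ok) + min(ng)) / 2
    return lambda z: sign * (z - boundary)


PREPROCESSORS = {"standardize": fit_standardize, "min_max": fit_min_max}
LEARNERS = {"midpoint": fit_midpoint, "gap": fit_gap}


def build_candidates(xs, labels):
    """Return one evaluation model per (preprocessing, learner) combination."""
    candidates = {}
    for p_name, fit_pre in PREPROCESSORS.items():
        transform = fit_pre(xs)
        zs = [transform(x) for x in xs]
        for l_name, fit_learner in LEARNERS.items():
            score = fit_learner(zs, labels)
            # Bind transform and score now so each candidate keeps its own pair.
            candidates[(p_name, l_name)] = lambda x, t=transform, s=score: s(t(x))
    return candidates
```

Producing several candidates at once means a later validity check can discard the poor ones and keep whichever combination generalizes to the held-out judgment data.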

ステップＳ１４４０において、評価モデル生成装置２００は、評価モデルの妥当性を判定する。例えば、評価モデル判定部２６０は、ステップＳ１４２０においてサンプリングされた判定用データを、評価モデル生成部２５０における複数の学習部２５２によって生成された複数の評価モデルのそれぞれに入力することによって、複数の評価モデルのそれぞれについて妥当性を判定する。 In step S1440, the evaluation model generating device 200 determines the validity of the evaluation model. For example, the evaluation model determination unit 260 determines the validity of each of the multiple evaluation models by inputting the judgment data sampled in step S1420 into each of the multiple evaluation models generated by the multiple learning units 252 in the evaluation model generating unit 250.

この際、一例として、生成された評価モデルが図１２のような結果を出力する評価モデルである場合、「ＯＫ」ラベルが付された判定用データを入力したことに応じて、評価モデルが０以上の値を出力する割合（「ＯＫ」ラベルに対する正答率）が予め定めらえた閾値を超える場合に、評価モデル判定部２６０は、当該評価モデルが妥当であると判定してよい。これに代えて、または、加えて、生成された評価モデルが図１２のような結果を出力する評価モデルである場合、「ＮＧ」ラベルが付された判定用データを入力したことに応じて、評価モデルが０未満の値を出力する割合（「ＮＧ」ラベルに対する正答率）が予め定めらえた閾値を超える場合に、評価モデル判定部２６０は、当該評価モデルが妥当であると判定してもよい。他の例として、生成された評価モデルが図１３のような結果を出力する評価モデルである場合、判定用データを入力したことに応じて、評価モデルが単調減少性を有する結果を出力している場合に、評価モデル判定部２６０は、当該評価モデルが妥当であると判定してもよい。 In this case, as an example, if the generated evaluation model outputs a result such as that shown in FIG. 12, the evaluation model determination unit 260 may determine that the evaluation model is valid when the proportion of judgment data labeled "OK" for which the evaluation model outputs a value of 0 or more (the correct answer rate for the "OK" label) exceeds a predetermined threshold. Alternatively, or in addition, if the generated evaluation model outputs a result such as that shown in FIG. 12, the evaluation model determination unit 260 may determine that the evaluation model is valid when the proportion of judgment data labeled "NG" for which the evaluation model outputs a value less than 0 (the correct answer rate for the "NG" label) exceeds a predetermined threshold. As another example, if the generated evaluation model outputs a result such as that shown in FIG. 13, the evaluation model determination unit 260 may determine that the evaluation model is valid when the evaluation model outputs a monotonically decreasing result in response to input of the judgment data.
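The validity criteria above can be sketched in Python as follows. The function names and the default threshold of 0.9 are assumptions for illustration; the sign convention (0 or more for "OK", less than 0 for "NG") follows the paragraph above. This sketch requires both correct answer rates to exceed the threshold, which is only one of the combinations the text permits, and it treats "monotonically decreasing" as non-increasing.

```python
def label_accuracy(model, samples, label):
    """Correct answer rate for one label.

    samples is a list of (input, label) pairs; "OK" inputs count as correct
    when the model outputs 0 or more, "NG" inputs when it outputs less than 0.
    """
    targets = [x for x, y in samples if y == label]
    if not targets:
        return 0.0
    if label == "OK":
        hits = sum(1 for x in targets if model(x) >= 0)
    else:
        hits = sum(1 for x in targets if model(x) < 0)
    return hits / len(targets)


def is_valid_signed(model, samples, threshold=0.9):
    """FIG. 12 style check: both correct answer rates must exceed the threshold."""
    return (label_accuracy(model, samples, "OK") > threshold
            and label_accuracy(model, samples, "NG") > threshold)


def is_monotonically_decreasing(outputs):
    """FIG. 13 style check: each output is no larger than the previous one."""
    return all(b <= a for a, b in zip(outputs, outputs[1:]))
```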

生成された複数の評価モデルのうち、いずれの評価モデルも妥当でないと判定された場合（Ｎｏの場合）、評価モデル生成装置２００は、処理をステップＳ１４１０に戻してフローを継続する。一方、生成された複数の評価モデルのうち、少なくともいずれかの評価モデルが妥当であると判定された場合（Ｙｅｓの場合）、評価モデル生成装置２００は、処理をステップＳ１４５０へ進める。 If it is determined that none of the multiple evaluation models generated are valid (No), the evaluation model generating device 200 returns the process to step S1410 and continues the flow. On the other hand, if it is determined that at least one of the multiple evaluation models generated is valid (Yes), the evaluation model generating device 200 advances the process to step S1450.

ステップＳ１４５０において、評価モデル生成装置２００は、判定結果をフィードバックする。例えば、評価モデル判定部２６０は、評価モデルが妥当であると判定した旨を、当該判定を得るにあたったラベリングデータ、すなわち、ステップＳ１４１０において取得されたラベリングデータを特定する情報とともに、ラベリングモデル更新部２２４へ通知する。これに応じて、ラベリングモデル更新部２２４は、ラベリングモデルを更新する。また、評価モデル判定部２６０は、評価モデルが妥当であると判定した旨を、当該妥当であると判定した評価モデルを識別する情報とともに、評価モデル出力部２７０へ通知する。 In step S1450, the evaluation model generating device 200 feeds back the determination result. For example, the evaluation model determination unit 260 notifies the labeling model update unit 224 that it has determined the evaluation model to be valid, together with information identifying the labeling data that led to that determination, i.e., the labeling data acquired in step S1410. In response, the labeling model update unit 224 updates the labeling model. In addition, the evaluation model determination unit 260 notifies the evaluation model output unit 270 that it has determined the evaluation model to be valid, together with information identifying the evaluation model determined to be valid.

ステップＳ１４６０において、評価モデル生成装置２００は、評価モデルを出力する。例えば、評価モデル出力部２７０は、ステップＳ１４５０において、評価モデルが妥当である旨を通知された場合に、妥当であると判定された評価モデルを、操業モデル生成装置３００へ出力する。 In step S1460, the evaluation model generating device 200 outputs the evaluation model. For example, if the evaluation model output unit 270 is notified in step S1450 that the evaluation model is valid, the evaluation model output unit 270 outputs the evaluation model that is determined to be valid to the operation model generating device 300.

評価モデル生成装置２００の機械学習機能部２３０は、例えばこのようにして機械学習による評価モデルの生成処理を実行してよい。すなわち、評価モデル生成装置２００は、図１０のフローによりラベリング処理を実行し、図１４のフローにより評価モデルの生成処理を実行する。そして、評価モデル生成装置２００は、生成した評価モデルを操業モデル生成装置３００へ出力する。 The machine learning function unit 230 of the evaluation model generating device 200 may execute the process of generating an evaluation model by machine learning, for example, in this manner. That is, the evaluation model generating device 200 executes the labeling process according to the flow of FIG. 10, and executes the process of generating an evaluation model according to the flow of FIG. 14. Then, the evaluation model generating device 200 outputs the generated evaluation model to the operation model generating device 300.

図１５は、本実施形態に係る操業システム１００における操業モデル生成装置３００のブロック図の一例を示す。操業モデル生成装置３００についても、評価モデル生成装置２００と同様、コンピュータであってよく、複数のコンピュータが接続されたコンピュータシステムであってもよい。また、操業モデル生成装置３００は、コンピュータ内で１または複数実行可能な仮想コンピュータ環境によって実装されてもよい。これに代えて、操業モデル生成装置３００は、操業モデルの生成用に設計された専用コンピュータであってもよく、専用回路によって実現された専用ハードウェアであってもよい。また、インターネットに接続可能な場合、操業モデル生成装置３００は、クラウドコンピューティングにより実現されてもよい。 Figure 15 shows an example of a block diagram of the operation model generating device 300 in the operation system 100 according to this embodiment. Like the evaluation model generating device 200, the operation model generating device 300 may be a computer or a computer system in which multiple computers are connected. The operation model generating device 300 may also be implemented by one or more virtual computer environments executable within a computer. Alternatively, the operation model generating device 300 may be a dedicated computer designed for generating an operation model, or may be dedicated hardware realized by a dedicated circuit. Furthermore, if an Internet connection is available, the operation model generating device 300 may be realized by cloud computing.

操業モデル生成装置３００は、評価モデル取得部３１２と、学習環境データ取得部３１４と、操業モデル生成部３１６と、学習操作指示部３１８と、操業モデル判定部３２０と、操業モデル出力部３２２とを備える。 The operation model generating device 300 includes an evaluation model acquisition unit 312, a learning environment data acquisition unit 314, an operation model generating unit 316, a learning operation instruction unit 318, an operation model determination unit 320, and an operation model output unit 322.

評価モデル取得部３１２は、評価モデル出力部２３８が出力した評価モデルを、例えば、ネットワークを介して取得する。しかしながら、これに限定されるものではない。評価モデル取得部３１２は、評価モデルを、各種メモリデバイスを介して取得してもよいし、ユーザ入力を介して取得してもよい。評価モデル取得部３１２は、取得した評価モデルを、操業モデル生成部３１６へ供給する。 The evaluation model acquisition unit 312 acquires the evaluation model output by the evaluation model output unit 238, for example, via a network. However, the acquisition route is not limited to a network. The evaluation model acquisition unit 312 may acquire the evaluation model via various memory devices or via user input. The evaluation model acquisition unit 312 supplies the acquired evaluation model to the operation model generating unit 316.

学習環境データ取得部３１４は、学習環境における状態を示す学習環境データを、ネットワークを介して取得する。しかしながら、これに限定されるものではない。学習環境データ取得部３１４は、学習環境データを、各種メモリデバイスを介して取得してもよいし、ユーザ入力を介して取得してもよい。学習環境データ取得部３１４は、取得した学習環境データを操業モデル生成部３１６へ供給する。 The learning environment data acquisition unit 314 acquires learning environment data indicating the state of the learning environment via a network. However, the acquisition route is not limited to a network. The learning environment data acquisition unit 314 may acquire the learning environment data via various memory devices or via user input. The learning environment data acquisition unit 314 supplies the acquired learning environment data to the operation model generating unit 316.

操業モデル生成部３１６は、学習環境データ取得部３１４が取得した学習環境データを用いて、評価モデル取得部３１２が取得した評価モデルの出力を報酬の少なくとも一部とした強化学習により、設備１０における状態に応じた行動を出力する操業モデルを生成する。操業モデル生成部３１６は、生成した操業モデルを操業モデル判定部３２０および操業モデル出力部３２２へ供給する。 The operation model generation unit 316 uses the learning environment data acquired by the learning environment data acquisition unit 314 to generate an operation model that outputs behavior according to the state of the facility 10 through reinforcement learning using the output of the evaluation model acquired by the evaluation model acquisition unit 312 as at least a part of the reward. The operation model generation unit 316 supplies the generated operation model to the operation model determination unit 320 and the operation model output unit 322.

学習操作指示部３１８は、強化学習中の操業モデルが出力する行動に基づく操作量を、学習環境における制御対象へ与える。 The learning operation instruction unit 318 applies the operation amount based on the behavior output by the operation model during reinforcement learning to the control object in the learning environment.

操業モデル判定部３２０は、操業モデル生成部３１６が生成した操業モデルの妥当性を判定する。操業モデル判定部３２０は、操業モデルが妥当であると判定した場合に、その旨を操業モデル出力部３２２へ通知する。 The operation model determination unit 320 determines the validity of the operation model generated by the operation model generation unit 316. If the operation model determination unit 320 determines that the operation model is valid, it notifies the operation model output unit 322 of that effect.

操業モデル出力部３２２は、操業モデルが妥当であると判定された場合に、操業モデルを制御装置４００へ出力する。このような操業モデル生成装置３００における処理の詳細を、データ例やフローを用いて詳細に説明する。 If the operation model is determined to be valid, the operation model output unit 322 outputs the operation model to the control device 400. The processing in the operation model generating device 300 is described in detail below using example data and flows.

図１６は、操業モデル生成装置３００が生成する操業モデルの一例を示す。操業モデルは、サンプリングされた状態データの集合を示す状態ｓと各状態下に取られた行動ａとの組み合わせ（ｓ，ａ）と、報酬によって計算されたウエイトｗとで構成される。なお、このようなウエイトを計算するための報酬の少なくとも一部として、評価モデル生成装置２００が生成した評価モデルの出力が用いられる。本図においては、一例として、状態ｓ＝（ＴＩ００１，ＴＩ００２，ＴＩ００３，ＦＩ００１，ＦＩ００２，ＶＩ００１）とした場合を示している。そして、本図においては、例えば、ｓ＝（－２．４７８０３，－２．４８４１３，－０．０７３２４，２９．７１１９１，２４．２５１１，７０）の状態下でａ＝１の行動が取られた場合に、報酬によって計算されたウエイトがｗ＝１４４．１４８４であることを意味している。このような操業モデルにより次の行動が決定される。 Figure 16 shows an example of an operation model generated by the operation model generating device 300. The operation model is composed of a combination (s, a) of a state s indicating a set of sampled state data and an action a taken under each state, and a weight w calculated by the reward. Note that the output of the evaluation model generated by the evaluation model generating device 200 is used as at least a part of the reward for calculating such weights. In this figure, as an example, a case where state s = (TI001, TI002, TI003, FI001, FI002, VI001) is shown. In this figure, for example, when an action a = 1 is taken under a state s = (-2.47803, -2.48413, -0.07324, 29.71191, 24.2511, 70), the weight calculated by the reward is w = 144.1484. The next action is determined by such an operation model.

図１７は、行動決定テーブルの一例を示す。行動決定テーブルは、入力された状態ｓと取り得る行動ａとで構成される。本図においては、一例として、入力された状態がｓ＝（０．１，０．２，０．４，０．３，０．８，０．２）であり、取り得る行動がａ＝（－３，－１，０，１，３）の５つである場合を示している。例えば、このような行動決定テーブルを図１６に示される操業モデルに入力することにより、次の行動が決定される。これについてフローを用いて詳細に説明する。 Figure 17 shows an example of an action decision table. The action decision table is composed of an input state s and possible actions a. In this figure, as an example, the input state s = (0.1, 0.2, 0.4, 0.3, 0.8, 0.2) is shown, and the possible actions are a = (-3, -1, 0, 1, 3). For example, by inputting such an action decision table into the operation model shown in Figure 16, the next action is decided. This will be explained in detail using a flow chart.
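As an illustration of the action determination table of FIG. 17, the following Python sketch pairs one input state with every candidate action. The function name `build_action_table` and the representation of each row as a flat tuple (state values followed by the action) are assumptions for illustration.

```python
def build_action_table(state, candidate_actions):
    """Return one row per candidate action: the state values followed by the action."""
    return [tuple(state) + (action,) for action in candidate_actions]


# Values taken from the example in FIG. 17.
state = (0.1, 0.2, 0.4, 0.3, 0.8, 0.2)
candidate_actions = (-3, -1, 0, 1, 3)
table = build_action_table(state, candidate_actions)
```

Each row has the same layout as the (s, a) part of a sample in the operation model of FIG. 16, so a row can be compared directly against the stored samples.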

図１８は、操業モデル生成装置３００における処理フローの一例を示す。操業モデル生成装置３００は、例えば本図に示されるフローにより、操業モデルの生成処理を実行してよい。 Figure 18 shows an example of a processing flow in the operation model generation device 300. The operation model generation device 300 may execute the generation process of the operation model, for example, according to the flow shown in this figure.

ステップＳ１８０２において、操業モデル生成装置３００は、評価モデルを取得する。例えば、評価モデル取得部３１２は、図１４のフローにおけるステップＳ１４６０において出力された評価モデルを、ネットワークを介して取得する。評価モデル取得部３１２は、取得した評価モデルを、操業モデル生成部３１６へ供給する。 In step S1802, the operation model generation device 300 acquires an evaluation model. For example, the evaluation model acquisition unit 312 acquires the evaluation model output in step S1460 in the flow of FIG. 14 via a network. The evaluation model acquisition unit 312 supplies the acquired evaluation model to the operation model generation unit 316.

ステップＳ１８０４において、操業モデル生成装置３００は、強化学習により操業モデルを生成する。例えば、操業モデル生成部３１６は、ステップＳ１８０２において取得された評価モデルの出力を報酬の少なくとも一部とした強化学習により、設備１０における状態に応じた行動を出力する操業モデルを生成する。一例として、操業モデル生成部３１６は、図１６に示されるような操業モデルを生成する。この詳細については、別フローを用いて後述する。操業モデル生成部３１６は、生成した操業モデルを操業モデル判定部３２０および操業モデル出力部３２２へ供給する。 In step S1804, the operation model generation device 300 generates an operation model by reinforcement learning. For example, the operation model generation unit 316 generates an operation model that outputs an action according to the state of the equipment 10 by reinforcement learning using the output of the evaluation model acquired in step S1802 as at least a part of the reward. As an example, the operation model generation unit 316 generates an operation model as shown in FIG. 16. Details of this will be described later using a separate flow. The operation model generation unit 316 supplies the generated operation model to the operation model determination unit 320 and the operation model output unit 322.

ステップＳ１８０６において、操業モデル生成装置３００は、操業モデルの妥当性を判定する。例えば、操業モデル判定部３２０は、ステップＳ１８０４において生成された操業モデルの妥当性を判定する。一例として、操業モデル判定部３２０は、評価モデル生成装置２００により設定された目標設定、操作端、および、観測点の情報に基づいて、プラントシミュレータにユーザによる操作を入れた際のデータ（ａ）、もしくは、ユーザによる操作の過去データ（ｂ）をリファレンスデータとして用意する。次に、操業モデル判定部３２０は、生成された操業モデルに対して、プラントシミュレータ上で操業モデルを動作させる（ｃ）。この際、操業モデル判定部３２０は、プラントシミュレータに代えて、実機を用いてもよい。そして、操業モデル判定部３２０は、（ｃ）により出力された結果と、（ａ）または（ｂ）とを比較することで、操業モデルの妥当性を判定する。すなわち、操業モデル判定部３２０は、プラントシミュレータにユーザによる操作を入れた際のリファレンスデータと、ＡＩにより操作した際の結果とを比較することで、操業モデルの妥当性を判定する。そして、操業モデル判定部３２０は、ＡＩにより操作した際の結果の方が高い場合に、生成された操業モデルが妥当（良好）であると判定する。操業モデル判定部３２０は、操業モデルが妥当であると判定した場合に、その旨を操業モデル出力部３２２へ通知する。 In step S1806, the operation model generating device 300 determines the validity of the operation model. For example, the operation model determination unit 320 determines the validity of the operation model generated in step S1804. As an example, based on the target setting, operation terminal, and observation point information set by the evaluation model generating device 200, the operation model determination unit 320 prepares, as reference data, either data (a) obtained when a user operates the plant simulator or past data (b) of operations by a user. Next, the operation model determination unit 320 runs the generated operation model on the plant simulator (c). At this time, the operation model determination unit 320 may use the actual equipment instead of the plant simulator. Then, the operation model determination unit 320 determines the validity of the operation model by comparing the result output in (c) with (a) or (b). That is, the operation model determination unit 320 determines the validity of the operation model by comparing the reference data obtained when a user operates the plant simulator with the result obtained when the AI operates it. If the result of operation by the AI is higher, the operation model determination unit 320 determines that the generated operation model is valid (good). If the operation model determination unit 320 determines that the operation model is valid, it notifies the operation model output unit 322 accordingly.
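The comparison in step S1806 can be sketched in Python as follows. The specification does not say which quantity is compared, so this sketch assumes, purely for illustration, that each run is summarized by the mean of the evaluation index recorded over the run; `run_score` and `operation_model_is_valid` are hypothetical names.

```python
from statistics import mean


def run_score(evaluation_indices):
    """Assumed summary of one run: the mean evaluation index over the run."""
    return mean(evaluation_indices)


def operation_model_is_valid(ai_indices, reference_indices):
    """Valid (good) when the AI-operated run scores higher than the reference run.

    ai_indices        : indices recorded while the operation model runs (c).
    reference_indices : indices from user operation on the simulator (a)
                        or from past user operation data (b).
    """
    return run_score(ai_indices) > run_score(reference_indices)
```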

ステップＳ１８０８において、操業モデル生成装置３００は、操業モデルを出力する。例えば、操業モデル出力部３２２は、ステップＳ１８０６において操業モデルが妥当であると判定された場合に、操業モデルを制御装置４００へ出力する。 In step S1808, the operation model generating device 300 outputs the operation model. For example, if the operation model is determined to be valid in step S1806, the operation model output unit 322 outputs the operation model to the control device 400.

図１９は、操業モデル生成部３１６における強化学習フローの一例を示す。操業モデル生成部３１６は、例えば本図に示されるフローにより、図１８のステップＳ１８０４における処理を実行してよい。 Figure 19 shows an example of a reinforcement learning flow in the operation model generation unit 316. The operation model generation unit 316 may execute the process in step S1804 in Figure 18, for example, according to the flow shown in this figure.

ステップＳ１９０２において、操業モデル生成装置３００は、学習環境データを取得する。例えば、学習環境データ取得部３１４は、学習環境における状態を示す学習環境データを、ネットワークを介して取得する。このような学習環境としては、設備１０の挙動を模擬するシミュレータが用いられてもよいし、実際の設備１０が用いられてもよい。例えば、設備１０がプラントである場合、学習環境として、プラントシミュレータが用いられてもよいし、実プラントが用いられてもよい。学習環境データ取得部３１４は、取得した学習環境データを操業モデル生成部３１６へ供給する。 In step S1902, the operation model generating device 300 acquires learning environment data. For example, the learning environment data acquiring unit 314 acquires learning environment data indicating the state in the learning environment via a network. As such a learning environment, a simulator that simulates the behavior of the equipment 10 may be used, or the actual equipment 10 may be used. For example, if the equipment 10 is a plant, a plant simulator may be used as the learning environment, or the actual plant may be used. The learning environment data acquiring unit 314 supplies the acquired learning environment data to the operation model generating unit 316.

ステップＳ１９０４において、操業モデル生成装置３００は、行動を決定する。例えば、操業モデル生成部３１６は、ランダムに行動を決定する。なお、上述の説明では、操業モデル生成部３１６がランダムに行動を決定する場合を一例として示したが、これに限定されるものではない。操業モデル生成部３１６が行動を決定するにあたって、例えば、ＦＫＤＰＰ（ＦａｃｔｏｒｉａｌＫｅｒｎｅｌＤｙｎａｍｉｃＰｏｌｉｃｙＰｒｏｇｒａｍｍｉｎｇ）等の既知のＡＩアルゴリズムが用いられてもよい。このようなカーネル法を用いる場合、操業モデル生成部３１６は、学習環境データにより得られたセンサ値から状態ｓのベクトルを生成する。次に、操業モデル生成部３１６は、状態ｓと、取り得る全ての行動ａとの組み合わせを、例えば図１７に示されるような行動決定テーブルとして生成する。そして、操業モデル生成部３１６は、行動決定テーブルを、例えば図１６に示されるような操業モデルへ入力する。これに応じて、操業モデルは、行動決定テーブルの各行と、操業モデルのうちのウエイト列を除いた各サンプルデータとの間でカーネル計算を行い、各サンプルデータとの間の距離をそれぞれ算出する。そして、操業モデルは、各サンプルデータについて算出した距離にそれぞれのウエイト列の値を乗算したものを順次足し合わせ、各行動における報酬期待値を計算する。操業モデルは、このようにして計算された報酬期待値が最も高くなる行動を選択する。操業モデル生成部３１６は、例えばこのようにして、更新中の操業モデルを用いて報酬期待値が最も高いと判断された行動を選択することにより行動を決定してもよい。学習時においては、操業モデル生成部３１６は、ランダムに行動を決定するか、操業モデルを用いて行動を決定するかを適宜選択しながら行動を決定すればよい。操業モデル生成部３１６は、決定した行動を学習操作指示部３１８へ供給する。 In step S1904, the operation model generating device 300 determines an action. For example, the operation model generating unit 316 determines an action at random. Although the case where the operation model generating unit 316 determines an action at random has been shown above as an example, the determination is not limited to this. When the operation model generating unit 316 determines an action, a known AI algorithm such as FKDPP (Factorial Kernel Dynamic Policy Programming) may be used. When such a kernel method is used, the operation model generating unit 316 generates a vector of state s from the sensor values obtained from the learning environment data. Next, the operation model generating unit 316 generates the combinations of the state s and all possible actions a as an action determination table such as that shown in FIG. 17. Then, the operation model generating unit 316 inputs the action determination table into an operation model such as that shown in FIG. 16. In response, the operation model performs a kernel calculation between each row of the action determination table and each sample of the operation model, excluding the weight column, and thereby calculates the distance between that row and each sample. The operation model then multiplies the distance calculated for each sample by the corresponding value in the weight column and sums the products to calculate the expected reward for each action. The operation model selects the action with the highest expected reward calculated in this way. For example, the operation model generating unit 316 may determine the action in this manner, by selecting the action judged to have the highest expected reward using the operation model being updated. During learning, the operation model generating unit 316 may determine the action while choosing as appropriate between determining it at random and determining it using the operation model. The operation model generating unit 316 supplies the determined action to the learning operation instruction unit 318.
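The kernel-based action selection described above can be sketched in Python as follows. The specification only says that a "kernel calculation" yields a distance to each sample, so the Gaussian kernel and the `bandwidth` parameter used here are assumptions for illustration, as are all function names. The expected reward of a row is the weighted sum of its kernel values against the stored samples.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity between two (state, action) vectors; 1.0 when they coincide."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def expected_reward(row, samples, weights, bandwidth=1.0):
    """Sum of (kernel value against each sample) x (that sample's weight)."""
    return sum(w * gaussian_kernel(row, s, bandwidth)
               for s, w in zip(samples, weights))


def select_action(state, candidate_actions, samples, weights, bandwidth=1.0):
    """Score every row of the action determination table and pick the best action.

    samples : list of (state..., action) tuples, i.e. the operation model
              without its weight column.
    weights : the weight column, in the same order as samples.
    """
    table = [tuple(state) + (a,) for a in candidate_actions]
    values = [expected_reward(row, samples, weights, bandwidth) for row in table]
    best = max(range(len(table)), key=values.__getitem__)
    return candidate_actions[best], values
```

In this sketch a row that lies close to a heavily weighted sample receives a high expected reward, so the selected action tends to repeat what was rewarded in similar states.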

ステップＳ１９０６において、操業モデル生成装置３００は、学習環境へ操作を指示する。例えば、学習操作指示部３１８は、ステップＳ１９０４において決定された行動を、学習環境における制御対象の値（バルブ値等）に加算した操作量を、学習環境における制御対象へ与える。これにより学習環境の状態が変化する。 In step S1906, the operation model generation device 300 instructs the learning environment to perform an operation. For example, the learning operation instruction unit 318 adds the action determined in step S1904 to the value of the control object in the learning environment (valve value, etc.) to obtain an operation amount, and gives the operation amount to the control object in the learning environment. This changes the state of the learning environment.
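The operation amount in step S1906 is the current value of the control target plus the determined action. A minimal Python sketch follows; the clamping to a 0 to 100 range is an assumption added for illustration (for example, a valve opening in percent) and is not stated in the specification.

```python
def operation_amount(current_value, action, lower=0.0, upper=100.0):
    """Add the action to the current control target value.

    The result is clamped to [lower, upper]; the limits are hypothetical.
    """
    return min(upper, max(lower, current_value + action))
```

For example, with a current valve value of 70 and an action of 3, the operation amount given to the control target is 73.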

ステップＳ１９０８において、操業モデル生成装置３００は、学習環境データを取得する。例えば、学習環境データ取得部３１４は、ステップＳ１９０２と同様、学習環境における状態を示す学習環境データを取得する。すなわち、学習環境データ取得部３１４は、決定された行動に基づく操作量が制御対象へ与えたことに応じて変化した後の学習環境の状態を取得する。学習環境データ取得部３１４は、取得した学習環境データを操業モデル生成部３１６へ供給する。 In step S1908, the operation model generation device 300 acquires learning environment data. For example, the learning environment data acquisition unit 314 acquires learning environment data indicating the state in the learning environment, similar to step S1902. That is, the learning environment data acquisition unit 314 acquires the state of the learning environment after it has changed in response to the operation amount based on the determined action being applied to the control object. The learning environment data acquisition unit 314 supplies the acquired learning environment data to the operation model generation unit 316.

ステップＳ１９１０において、操業モデル生成装置３００は、報酬値を算出する。例えば、操業モデル生成部３１６は、評価モデルの出力に少なくとも部分的に基づき、報酬値を算出する。一例として、操業モデル生成部３１６は、ステップＳ１９０８において取得された学習環境データを、図１８のステップＳ１８０２において取得された評価モデルへ入力したことに応じて評価モデルが出力する指標をそのまま用いて報酬値を算出してもよいし、評価モデルによってＯＫと判断された場合に１、ＮＧと判断された場合に０として報酬値を算出してもよい。 In step S1910, the operation model generating device 300 calculates a reward value. For example, the operation model generating unit 316 calculates the reward value based at least in part on the output of the evaluation model. As an example, the operation model generating unit 316 may calculate the reward value by directly using the index output by the evaluation model in response to inputting the learning environment data acquired in step S1908 into the evaluation model acquired in step S1802 of FIG. 18, or may calculate the reward value as 1 when the evaluation model judges the data to be OK, and as 0 when the evaluation model judges the data to be NG.
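The two reward options in step S1910 can be sketched in Python as follows. The function name and the `mode` parameter are assumptions for illustration. The binary mode assumes the FIG. 12 convention, in which an index of 0 or more corresponds to "OK".

```python
def reward_from_index(index, mode="raw"):
    """Convert the evaluation model's index into a reward value.

    mode="raw"    : use the index as it is.
    mode="binary" : 1 when the index indicates OK (0 or more), otherwise 0.
    """
    if mode == "raw":
        return float(index)
    if mode == "binary":
        return 1.0 if index >= 0 else 0.0
    raise ValueError(f"unknown mode: {mode}")
```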

ステップＳ１９１２において、操業モデル生成装置３００は、行動の決定に応じた状態の取得処理が、指定されたステップ回数を超えたかどうか判定する。なお、このようなステップ回数は、予めユーザにより指定されたものであってもよいし、学習対象期間（例えば１０日間等）を基に定められたものであってもよい。上述の処理が指定されたステップ回数を超えていないと判定された場合（Ｎｏの場合）、操業モデル生成装置３００は、処理をステップＳ１９０４に戻してフローを継続する。操業モデル生成装置３００は、このような行動の決定に応じた状態の取得処理を指定されたステップ回数実行する。 In step S1912, the operation model generating device 300 determines whether the process of acquiring the state according to the action decision has exceeded a specified number of steps. Note that such a number of steps may be specified in advance by the user, or may be determined based on the learning period (e.g., 10 days, etc.). If it is determined that the above process has not exceeded the specified number of steps (No), the operation model generating device 300 returns the process to step S1904 and continues the flow. The operation model generating device 300 executes the process of acquiring the state according to the action decision for the specified number of steps.

ステップＳ１９１２において、上述の処理が指定されたステップ回数を超えたと判定された場合（Ｙｅｓの場合）、操業モデル生成装置３００は、処理をステップＳ１９１４へ進める。ステップＳ１９１４において、操業モデル生成装置３００は、操業モデルを更新する。例えば、操業モデル生成部３１６は、図１６に示される操業モデルにおけるウエイト列の値を上書きするほか、これまでに保存されていない新たなサンプルデータを操業モデルに追加する。 If it is determined in step S1912 that the above-mentioned processing has exceeded the specified number of steps (Yes), the operation model generation device 300 advances the processing to step S1914. In step S1914, the operation model generation device 300 updates the operation model. For example, the operation model generation unit 316 overwrites the values of the weight column in the operation model shown in FIG. 16, and adds new sample data that has not been saved so far to the operation model.

ステップＳ１９１６において、操業モデル生成装置３００は、操業モデルの更新処理が、指定された繰り返し回数を超えたかどうか判定する。なお、このような繰り返し回数は、予めユーザにより指定されたものであってもよいし、操業モデルの妥当性に応じて定められたものであってもよい。上述の処理が指定された繰り返し回数を超えていないと判定された場合（Ｎｏの場合）、操業モデル生成装置３００は、処理をステップＳ１９０２へ戻してフローを継続する。 In step S1916, the operation model generating device 300 determines whether the update process of the operation model has exceeded a specified number of repetitions. Note that such a number of repetitions may be specified in advance by the user, or may be determined according to the validity of the operation model. If it is determined that the above process has not exceeded the specified number of repetitions (No), the operation model generating device 300 returns the process to step S1902 and continues the flow.

ステップＳ１９１６において、上述の処理が指定された繰り返し回数を超えたと判定された場合（Ｙｅｓ）の場合、操業モデル生成装置３００は、フローを終了する。操業モデル生成装置３００は、例えばこのようにして、評価モデルの出力を報酬の少なくとも一部とした強化学習により、設備１０における状態に応じた行動を出力する操業モデルを生成することができる。 If it is determined in step S1916 that the above-mentioned process has exceeded the specified number of repetitions (Yes), the operation model generation device 300 ends the flow. In this way, for example, the operation model generation device 300 can generate an operation model that outputs an action according to the state of the equipment 10 by reinforcement learning in which the output of the evaluation model is at least a part of the reward.
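The overall structure of the flow in FIG. 19 can be sketched as two nested loops. The following Python sketch is an illustration under several assumptions: `ToyValveEnv` is a hypothetical learning environment with a single valve value, the choice between a random action and a model-based action is made with a fixed probability `epsilon`, and `choose_action`, `evaluate`, and `update_model` are caller-supplied placeholders for the operation model, the evaluation model, and the update in step S1914.

```python
import random


class ToyValveEnv:
    """Hypothetical learning environment: one valve value is the whole state."""

    def __init__(self, value=40.0):
        self.value = value

    def observe(self):
        return (self.value,)

    def apply(self, action):
        self.value += action


def run_learning_flow(env, choose_action, evaluate, update_model,
                      n_steps, n_iterations,
                      actions=(-3, -1, 0, 1, 3), epsilon=0.2, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iterations):                  # S1916: repeat the update
        state = env.observe()                      # S1902: acquire state
        episode = []
        for _ in range(n_steps):                   # S1912: repeat the steps
            if rng.random() < epsilon:             # S1904: random action ...
                action = rng.choice(actions)
            else:                                  # ... or model-based action
                action = choose_action(state, actions)
            env.apply(action)                      # S1906: instruct operation
            next_state = env.observe()             # S1908: acquire new state
            reward = evaluate(next_state)          # S1910: calculate reward
            episode.append((state, action, reward))
            state = next_state
        update_model(episode)                      # S1914: update the model
```

Collecting a batch of (state, action, reward) records before each update mirrors the flow, in which the operation model is updated only after the specified number of steps has been executed.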

図２０は、本実施形態に係る操業システム１００における制御装置４００のブロック図の一例を示す。制御装置４００は、例えば、ＤＣＳ（ＤｉｓｔｒｉｂｕｔｅｄＣｏｎｔｒｏｌＳｙｓｔｅｍ：分散制御システム）や中規模向け計装システムにおけるコントローラであってもよいし、リアルタイムＯＳコントローラ等であってもよい。 Figure 20 shows an example of a block diagram of the control device 400 in the operation system 100 according to this embodiment. The control device 400 may be, for example, a controller in a distributed control system (DCS) or a medium-scale instrumentation system, or may be a real-time OS controller, etc.

制御装置４００は、操業モデル取得部４１２と、実環境データ取得部４１４と、制御部４１６と、実操作指示部４１８とを備える。 The control device 400 includes an operation model acquisition unit 412, an actual environment data acquisition unit 414, a control unit 416, and an actual operation instruction unit 418.

操業モデル取得部４１２は、操業モデル出力部３２２が出力した操業モデルを、例えば、ネットワークを介して取得する。しかしながら、これに限定されるものではない。操業モデル取得部４１２は、操業モデルを、各種メモリデバイスを介して取得してもよいし、ユーザ入力を介して取得してもよい。操業モデル取得部４１２は、取得した操業モデルを、制御部４１６へ供給する。 The operation model acquisition unit 412 acquires the operation model output by the operation model output unit 322, for example, via a network. However, the acquisition route is not limited to a network. The operation model acquisition unit 412 may acquire the operation model via various memory devices or via user input. The operation model acquisition unit 412 supplies the acquired operation model to the control unit 416.

実環境データ取得部４１４は、実環境、すなわち、設備１０における状態を示す実環境データを取得する。このような実環境データは、前述の状態データと同様のデータであってよい。実環境データ取得部４１４は、取得した実環境データを制御部４１６へ供給する。 The real-environment data acquisition unit 414 acquires real-environment data indicating the state of the real environment, i.e., the equipment 10. Such real-environment data may be data similar to the state data described above. The real-environment data acquisition unit 414 supplies the acquired real-environment data to the control unit 416.

制御部４１６は、実環境データ取得部４１４が取得した実環境、すなわち、設備１０の状態に応じて操業モデル取得部４１２が取得した操業モデルが出力する行動に基づく操作量を決定する。制御部４１６は、決定した操作量を実操作指示部４１８へ供給する。 The control unit 416 determines the amount of operation based on the behavior output by the operation model acquired by the operation model acquisition unit 412 according to the actual environment acquired by the actual environment data acquisition unit 414, i.e., the state of the equipment 10. The control unit 416 supplies the determined amount of operation to the actual operation instruction unit 418.

実操作指示部４１８は、制御部４１６が決定した操作量を、実環境、すなわち、設備１０における制御対象へ与える。 The actual operation instruction unit 418 applies the operation amount determined by the control unit 416 to the control target in the actual environment, i.e., the facility 10.

図２１は、制御装置４００における処理フローの一例を示す。制御装置４００は、例えば本図に示されるフローにより、制御対象の制御処理を実行してよい。 Figure 21 shows an example of a processing flow in the control device 400. The control device 400 may execute control processing of the control target, for example, according to the flow shown in this figure.

ステップＳ２１０２において、制御装置４００は、操業モデルを取得する。例えば、操業モデル取得部４１２は、図１８のステップＳ１８０８において出力された操業モデルを、ネットワークを介して取得する。操業モデル取得部４１２は、取得した操業モデルを、制御部４１６へ供給する。 In step S2102, the control device 400 acquires an operation model. For example, the operation model acquisition unit 412 acquires the operation model output in step S1808 of FIG. 18 via a network. The operation model acquisition unit 412 supplies the acquired operation model to the control unit 416.

ステップＳ２１０４において、制御装置４００は、実環境データを取得する。例えば、実環境データ取得部４１４は、実環境における状態を示す実環境データを取得する。このような実環境データは、上述の設備１０における状態を示す状態データと同様のデータであってよい。実環境データ取得部４１４は、取得した実環境データを制御部４１６へ供給する。 In step S2104, the control device 400 acquires real-world environment data. For example, the real-world data acquisition unit 414 acquires real-world environment data indicating the state in the real environment. Such real-world environment data may be data similar to the state data indicating the state in the equipment 10 described above. The real-world data acquisition unit 414 supplies the acquired real-world environment data to the control unit 416.

ステップＳ２１０６において、制御装置４００は、行動を決定する。例えば、制御部４１６は、操業モデルを用いて報酬期待値が最も高いと判断された行動を選択することにより行動を決定する。制御部４１６は、決定した行動を実操作指示部４１８へ供給する。 In step S2106, the control device 400 determines an action. For example, the control unit 416 determines the action by selecting the action that is determined to have the highest expected reward value using the operation model. The control unit 416 supplies the determined action to the actual operation instruction unit 418.

ステップＳ２１０８において、制御装置４００は、実環境へ操作を指示する。例えば、実操作指示部４１８は、ステップＳ２１０６において決定された行動を、設備１０における制御対象の値に加算した操作量を、設備１０における制御対象へ与える。これにより実環境の状態が変化する。 In step S2108, the control device 400 instructs the real environment to perform an operation. For example, the actual operation instruction unit 418 adds the action determined in step S2106 to the value of the control target in the facility 10 and applies the resulting operation amount to the control target in the facility 10. This changes the state of the real environment.

ステップＳ２１１０において、制御装置４００は、ＡＩ制御を終了するかどうか判定する。ＡＩ制御を終了すると判定された場合（Ｙｅｓの場合）、制御装置４００はフローを終了する。ＡＩ制御を終了しないと判定された場合（Ｎｏの場合）、制御装置４００は、処理をステップＳ２１０４へ戻してフローを継続する。 In step S2110, the control device 400 determines whether to end AI control. If it is determined that AI control is to be ended (Yes), the control device 400 ends the flow. If it is determined that AI control is not to be ended (No), the control device 400 returns the process to step S2104 and continues the flow.
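
As a non-limiting illustration, the flow of steps S2104 to S2110 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the control device 400: the operation model is represented by a hypothetical function `expected_reward(state, action)`, the candidate actions form a small discrete set, and `ToyPlant` and `make_step_limit` are illustrative stand-ins for the real environment and the end-of-control decision.

```python
from typing import Callable, List, Sequence


def select_action(expected_reward: Callable[[float, float], float],
                  state: float,
                  candidate_actions: Sequence[float]) -> float:
    """S2106: pick the candidate action with the highest expected reward."""
    return max(candidate_actions, key=lambda a: expected_reward(state, a))


def run_control_loop(expected_reward: Callable[[float, float], float],
                     read_state: Callable[[], float],
                     read_value: Callable[[], float],
                     apply_amount: Callable[[float], None],
                     should_stop: Callable[[], bool],
                     candidate_actions: Sequence[float]) -> List[float]:
    """Repeat S2104 to S2108 until S2110 decides to end AI control."""
    applied = []
    while True:
        state = read_state()                                      # S2104
        action = select_action(expected_reward, state,
                               candidate_actions)                 # S2106
        operation_amount = read_value() + action                  # S2108: value + action
        apply_amount(operation_amount)
        applied.append(operation_amount)
        if should_stop():                                         # S2110
            return applied


class ToyPlant:
    """Hypothetical stand-in for the control target (a single setting)."""

    def __init__(self, value: float) -> None:
        self.value = value

    def read(self) -> float:
        return self.value

    def apply(self, amount: float) -> None:
        self.value = amount


def make_step_limit(n: int) -> Callable[[], bool]:
    """Hypothetical end condition: stop after n control cycles."""
    count = 0

    def should_stop() -> bool:
        nonlocal count
        count += 1
        return count >= n

    return should_stop
```

In this sketch the stop check runs after the operation is applied, which mirrors the order of the flow: a "No" at S2110 returns to S2104 rather than to S2102, so the operation model is acquired once and reused.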

従来、例えば、特許文献１のように、強化学習されたモデルを用いて制御対象を制御するＡＩ制御技術が知られている。しかしながら、ＡＩ制御技術においては、報酬値を算出するための報酬関数をユーザが経験や勘等により事前に設定しておく必要があった。このように人の手が介在する場合、操業サイクルを解決まで導くためには、複数の労働力を用いた年単位の長期間に及ぶ作業が必要である等、莫大な手間と時間を要していた。また、労働力不足や人員配置ミスによる遅延や中断、遠隔地・危険地での作業を伴う可能性等も考慮する必要があった。さらに、熟練オペレータの経験や勘を用いても、常に迅速で最適な判断ができるとも限らない。長期間にわたるプラント管理の場合、同レベルのスキルを受け継ぐ後継者の確保も容易ではない。また、個人のスキルは一面的になることが多く、異なる部門や機能間で情報を共有し、複数の問題を網羅的に把握して解決するには限界があった。 Conventionally, AI control technology that controls a control target using a model trained by reinforcement learning has been known, for example, as in Patent Document 1. With such technology, however, the user had to set the reward function for calculating the reward value in advance, based on experience, intuition, and the like. When human intervention is involved in this way, bringing an operation cycle to a solution takes enormous time and effort; for example, it can require years of work by multiple workers. Delays and interruptions due to labor shortages or staffing errors, and the possibility of work in remote or hazardous locations, also had to be considered. Furthermore, even the experience and intuition of a skilled operator do not always yield quick and optimal decisions. For long-term plant management, securing successors who inherit the same level of skill is not easy either. In addition, individual skills tend to be one-sided, which limits how far information can be shared across departments and functions and how comprehensively multiple problems can be understood and solved.

これに対して、本実施形態に係る操業システム１００においては、ＡＩが自動的に操業におけるボトルネック（ポテンシャルフォルト）を探し出し、改善のための指標を評価モデルとして生成する。そして、ＡＩが与えられた指標を基に試行錯誤を行い、より良い操業方法を指示する操業モデルを生成する。そして、ＡＩコントローラが当該操業モデルを用いて制御対象をＡＩ制御する。これにより、本実施形態に係る操業システム１００によれば、ＡＩ技術を用いて設備１０を自律制御可能な環境を提供する。そして、本実施形態に係る操業システム１００は、このようなＡＩ制御下における設備の状態に基づいて、評価モデルおよび操業モデルを更新し、更新された操業モデルを用いて制御対象をＡＩ制御する。これにより、本実施形態に係る操業システム１００によれば、設備１０における操業を改善するループを自律的に回すことができる。したがって、本実施形態に係る操業システム１００によれば、これまで行っていたデータ収集・調査のＰＤＣＡサイクルを２４時間３６５日休まず継続的かつ高速に行い、もって、プラント等の生産性・効率性を半永久的に高め続けることを可能とする。また、状況に応じた意思決定を客観的かつ包括的に行うことができるので、熟練オペレータの退職によるスキル継承リスク等にとらわれず、長期間にわたって積み上げた知識を多角的に活用することができる。 In contrast, in the operation system 100 according to the present embodiment, the AI automatically finds bottlenecks (potential faults) in the operation and generates an index for improvement as an evaluation model. The AI then performs trial and error based on the given index and generates an operation model that indicates a better operation method. The AI controller then uses the operation model to perform AI control of the control target. The operation system 100 according to the present embodiment thus provides an environment in which the facility 10 can be autonomously controlled using AI technology. The operation system 100 further updates the evaluation model and the operation model based on the state of the facility under such AI control, and performs AI control of the control target using the updated operation model. The operation system 100 can thereby autonomously run a loop that improves the operation of the facility 10. Accordingly, the PDCA cycle of data collection and investigation that has been performed so far can be run continuously and at high speed, 24 hours a day, 365 days a year, making it possible to keep raising the productivity and efficiency of plants and the like semi-permanently. In addition, because decisions can be made objectively and comprehensively according to the situation, knowledge accumulated over a long period can be used from multiple angles without being constrained by risks such as the loss of skills when experienced operators retire.
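
As a non-limiting illustration, the self-improving loop described above can be sketched in Python as follows. This is a sketch of the data flow only, under stated assumptions: `fit_evaluation_model`, `train_operation_model`, and `run_ai_control` are hypothetical callables standing in for the evaluation model generation device 200, the operation model generation device 300, and the control device 400, respectively, and a state is simplified to a single number.

```python
from typing import Callable, List, Sequence

State = float
EvaluationModel = Callable[[State], float]  # state -> index (used as reward)
OperationModel = Callable[[State], float]   # state -> action


def improvement_loop(
    fit_evaluation_model: Callable[[Sequence[State]], EvaluationModel],
    train_operation_model: Callable[[EvaluationModel], OperationModel],
    run_ai_control: Callable[[OperationModel], List[State]],
    initial_states: Sequence[State],
    cycles: int,
) -> List[State]:
    """Run the generate -> learn -> control -> update cycle `cycles` times."""
    states = list(initial_states)
    for _ in range(cycles):
        # Generate (or update) the evaluation model from the states seen so far.
        evaluation_model = fit_evaluation_model(states)
        # Reinforcement learning with the evaluation model's output as reward.
        operation_model = train_operation_model(evaluation_model)
        # States observed under AI control feed the next update.
        states.extend(run_ai_control(operation_model))
    return states
```

The point of the sketch is the feedback path: states observed while the operation model controls the facility are appended to the data from which the next evaluation model is generated.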

本発明の様々な実施形態は、フローチャートおよびブロック図を参照して記載されてよく、ここにおいてブロックは、（１）操作が実行されるプロセスの段階または（２）操作を実行する役割を持つ装置のセクションを表わしてよい。特定の段階およびセクションが、専用回路、コンピュータ可読媒体上に格納されるコンピュータ可読命令と共に供給されるプログラマブル回路、および／またはコンピュータ可読媒体上に格納されるコンピュータ可読命令と共に供給されるプロセッサによって実装されてよい。専用回路は、デジタルおよび／またはアナログハードウェア回路を含んでよく、集積回路（ＩＣ）および／またはディスクリート回路を含んでよい。プログラマブル回路は、論理ＡＮＤ、論理ＯＲ、論理ＸＯＲ、論理ＮＡＮＤ、論理ＮＯＲ、および他の論理操作、フリップフロップ、レジスタ、フィールドプログラマブルゲートアレイ（ＦＰＧＡ）、プログラマブルロジックアレイ（ＰＬＡ）等のようなメモリ要素等を含む、再構成可能なハードウェア回路を含んでよい。 Various embodiments of the present invention may be described with reference to flow charts and block diagrams, where the blocks may represent (1) stages of a process in which operations are performed or (2) sections of an apparatus responsible for performing the operations. Particular stages and sections may be implemented by dedicated circuitry, programmable circuitry provided with computer readable instructions stored on a computer readable medium, and/or a processor provided with computer readable instructions stored on a computer readable medium. Dedicated circuitry may include digital and/or analog hardware circuitry and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuitry including logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, memory elements such as flip-flops, registers, field programmable gate arrays (FPGAs), programmable logic arrays (PLAs), and the like.

コンピュータ可読媒体は、適切なデバイスによって実行される命令を格納可能な任意の有形なデバイスを含んでよく、その結果、そこに格納される命令を有するコンピュータ可読媒体は、フローチャートまたはブロック図で指定された操作を実行するための手段を作成すべく実行され得る命令を含む、製品を備えることになる。コンピュータ可読媒体の例としては、電子記憶媒体、磁気記憶媒体、光記憶媒体、電磁記憶媒体、半導体記憶媒体等が含まれてよい。コンピュータ可読媒体のより具体的な例としては、フロッピー（登録商標）ディスク、ディスケット、ハードディスク、ランダムアクセスメモリ（ＲＡＭ）、リードオンリメモリ（ＲＯＭ）、消去可能プログラマブルリードオンリメモリ（ＥＰＲＯＭまたはフラッシュメモリ）、電気的消去可能プログラマブルリードオンリメモリ（ＥＥＰＲＯＭ）、静的ランダムアクセスメモリ（ＳＲＡＭ）、コンパクトディスクリードオンリメモリ（ＣＤ-ＲＯＭ）、デジタル多用途ディスク（ＤＶＤ）、ブルーレイ（ＲＴＭ）ディスク、メモリスティック、集積回路カード等が含まれてよい。 A computer-readable medium may include any tangible device capable of storing instructions that are executed by a suitable device, such that the computer-readable medium having instructions stored thereon comprises an article of manufacture that includes instructions that can be executed to create means for performing the operations specified in the flowchart or block diagram. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, semiconductor storage media, and the like. More specific examples of computer-readable media may include floppy disks, diskettes, hard disks, random access memories (RAMs), read-only memories (ROMs), erasable programmable read-only memories (EPROMs or flash memories), electrically erasable programmable read-only memories (EEPROMs), static random access memories (SRAMs), compact disk read-only memories (CD-ROMs), digital versatile disks (DVDs), Blu-ray (RTM) disks, memory sticks, integrated circuit cards, and the like.

コンピュータ可読命令は、アセンブラ命令、命令セットアーキテクチャ（ＩＳＡ）命令、マシン命令、マシン依存命令、マイクロコード、ファームウェア命令、状態設定データ、またはＳｍａｌｌｔａｌｋ（登録商標）、ＪＡＶＡ（登録商標）、Ｃ＋＋等のようなオブジェクト指向プログラミング言語、および「Ｃ」プログラミング言語または同様のプログラミング言語のような従来の手続型プログラミング言語を含む、１または複数のプログラミング言語の任意の組み合わせで記述されたソースコードまたはオブジェクトコードのいずれかを含んでよい。 The computer readable instructions may include either assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk®, JAVA®, C++, etc., and conventional procedural programming languages such as the "C" programming language or similar programming languages.

コンピュータ可読命令は、汎用コンピュータ、特殊目的のコンピュータ、若しくは他のプログラム可能なデータ処理装置のプロセッサまたはプログラマブル回路に対し、ローカルにまたはローカルエリアネットワーク（ＬＡＮ）、インターネット等のようなワイドエリアネットワーク（ＷＡＮ）を介して提供され、フローチャートまたはブロック図で指定された操作を実行するための手段を作成すべく、コンピュータ可読命令を実行してよい。プロセッサの例としては、コンピュータプロセッサ、処理ユニット、マイクロプロセッサ、デジタル信号プロセッサ、コントローラ、マイクロコントローラ等を含む。 The computer-readable instructions may be provided to a processor or programmable circuit of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and the computer-readable instructions may be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.

図２２は、本発明の複数の態様が全体的または部分的に具現化されてよいコンピュータ９９００の例を示す。コンピュータ９９００にインストールされたプログラムは、コンピュータ９９００に、本発明の実施形態に係る装置に関連付けられる操作または当該装置の１または複数のセクションとして機能させることができ、または当該操作または当該１または複数のセクションを実行させることができ、および／またはコンピュータ９９００に、本発明の実施形態に係るプロセスまたは当該プロセスの段階を実行させることができる。そのようなプログラムは、コンピュータ９９００に、本明細書に記載のフローチャートおよびブロック図のブロックのうちのいくつかまたはすべてに関連付けられた特定の操作を実行させるべく、ＣＰＵ９９１２によって実行されてよい。 FIG. 22 shows an example of a computer 9900 in which aspects of the present invention may be embodied in whole or in part. A program installed on the computer 9900 may cause the computer 9900 to function as an apparatus according to an embodiment of the present invention or as one or more sections of the apparatus, or to perform operations associated with the apparatus or the one or more sections, and/or may cause the computer 9900 to perform a process according to an embodiment of the present invention or steps of the process. Such a program may be executed by the CPU 9912 to cause the computer 9900 to perform certain operations associated with some or all of the blocks of the flowcharts and block diagrams described herein.

本実施形態によるコンピュータ９９００は、ＣＰＵ９９１２、ＲＡＭ９９１４、グラフィックコントローラ９９１６、およびディスプレイデバイス９９１８を含み、それらはホストコントローラ９９１０によって相互に接続されている。コンピュータ９９００はまた、通信インターフェイス９９２２、ハードディスクドライブ９９２４、ＤＶＤドライブ９９２６、およびＩＣカードドライブのような入／出力ユニットを含み、それらは入／出力コントローラ９９２０を介してホストコントローラ９９１０に接続されている。コンピュータはまた、ＲＯＭ９９３０およびキーボード９９４２のようなレガシの入／出力ユニットを含み、それらは入／出力チップ９９４０を介して入／出力コントローラ９９２０に接続されている。 The computer 9900 according to this embodiment includes a CPU 9912, a RAM 9914, a graphics controller 9916, and a display device 9918, which are interconnected by a host controller 9910. The computer 9900 also includes input/output units such as a communication interface 9922, a hard disk drive 9924, a DVD drive 9926, and an IC card drive, which are connected to the host controller 9910 via an input/output controller 9920. The computer also includes legacy input/output units such as a ROM 9930 and a keyboard 9942, which are connected to the input/output controller 9920 via an input/output chip 9940.

ＣＰＵ９９１２は、ＲＯＭ９９３０およびＲＡＭ９９１４内に格納されたプログラムに従い動作し、それにより各ユニットを制御する。グラフィックコントローラ９９１６は、ＲＡＭ９９１４内に提供されるフレームバッファ等またはそれ自体の中にＣＰＵ９９１２によって生成されたイメージデータを取得し、イメージデータがディスプレイデバイス９９１８上に表示されるようにする。 The CPU 9912 operates according to the programs stored in the ROM 9930 and the RAM 9914, thereby controlling each unit. The graphics controller 9916 retrieves image data generated by the CPU 9912 into a frame buffer or the like provided in the RAM 9914 or into itself, and causes the image data to be displayed on the display device 9918.

通信インターフェイス９９２２は、ネットワークを介して他の電子デバイスと通信する。ハードディスクドライブ９９２４は、コンピュータ９９００内のＣＰＵ９９１２によって使用されるプログラムおよびデータを格納する。ＤＶＤドライブ９９２６は、プログラムまたはデータをＤＶＤ－ＲＯＭ９９０１から読み取り、ハードディスクドライブ９９２４にＲＡＭ９９１４を介してプログラムまたはデータを提供する。ＩＣカードドライブは、プログラムおよびデータをＩＣカードから読み取り、および／またはプログラムおよびデータをＩＣカードに書き込む。 The communication interface 9922 communicates with other electronic devices via a network. The hard disk drive 9924 stores programs and data used by the CPU 9912 in the computer 9900. The DVD drive 9926 reads programs or data from the DVD-ROM 9901 and provides the programs or data to the hard disk drive 9924 via the RAM 9914. The IC card drive reads programs and data from an IC card and/or writes programs and data to an IC card.

ＲＯＭ９９３０はその中に、アクティブ化時にコンピュータ９９００によって実行されるブートプログラム等、および／またはコンピュータ９９００のハードウェアに依存するプログラムを格納する。入／出力チップ９９４０はまた、様々な入／出力ユニットをパラレルポート、シリアルポート、キーボードポート、マウスポート等を介して、入／出力コントローラ９９２０に接続してよい。 The ROM 9930 stores therein a boot program, etc., executed by the computer 9900 upon activation, and/or a program that depends on the hardware of the computer 9900. The input/output chip 9940 may also connect various input/output units to the input/output controller 9920 via a parallel port, a serial port, a keyboard port, a mouse port, etc.

プログラムが、ＤＶＤ－ＲＯＭ９９０１またはＩＣカードのようなコンピュータ可読媒体によって提供される。プログラムは、コンピュータ可読媒体から読み取られ、コンピュータ可読媒体の例でもあるハードディスクドライブ９９２４、ＲＡＭ９９１４、またはＲＯＭ９９３０にインストールされ、ＣＰＵ９９１２によって実行される。これらのプログラム内に記述される情報処理は、コンピュータ９９００に読み取られ、プログラムと、上記様々なタイプのハードウェアリソースとの間の連携をもたらす。装置または方法が、コンピュータ９９００の使用に従い情報の操作または処理を実現することによって構成されてよい。 The programs are provided by a computer-readable medium such as a DVD-ROM 9901 or an IC card. The programs are read from the computer-readable medium and installed in the hard disk drive 9924, RAM 9914, or ROM 9930, which are also examples of computer-readable media, and executed by the CPU 9912. The information processing described in these programs is read by the computer 9900, and brings about cooperation between the programs and the various types of hardware resources described above. An apparatus or method may be constructed by realizing the manipulation or processing of information according to the use of the computer 9900.

例えば、通信がコンピュータ９９００および外部デバイス間で実行される場合、ＣＰＵ９９１２は、ＲＡＭ９９１４にロードされた通信プログラムを実行し、通信プログラムに記述された処理に基づいて、通信インターフェイス９９２２に対し、通信処理を命令してよい。通信インターフェイス９９２２は、ＣＰＵ９９１２の制御下、ＲＡＭ９９１４、ハードディスクドライブ９９２４、ＤＶＤ－ＲＯＭ９９０１、またはＩＣカードのような記録媒体内に提供される送信バッファ処理領域に格納された送信データを読み取り、読み取られた送信データをネットワークに送信し、またはネットワークから受信された受信データを記録媒体上に提供される受信バッファ処理領域等に書き込む。 For example, when communication is performed between the computer 9900 and an external device, the CPU 9912 may execute a communication program loaded into the RAM 9914 and instruct the communication interface 9922 to perform communication processing based on the processing described in the communication program. Under the control of the CPU 9912, the communication interface 9922 reads transmission data stored in a transmission buffer processing area provided in the RAM 9914, the hard disk drive 9924, the DVD-ROM 9901, or a recording medium such as an IC card, and transmits the read transmission data to the network, or writes reception data received from the network to a reception buffer processing area or the like provided on the recording medium.

また、ＣＰＵ９９１２は、ハードディスクドライブ９９２４、ＤＶＤドライブ９９２６（ＤＶＤ－ＲＯＭ９９０１）、ＩＣカード等のような外部記録媒体に格納されたファイルまたはデータベースの全部または必要な部分がＲＡＭ９９１４に読み取られるようにし、ＲＡＭ９９１４上のデータに対し様々なタイプの処理を実行してよい。ＣＰＵ９９１２は次に、処理されたデータを外部記録媒体にライトバックする。 The CPU 9912 may also cause all or a necessary portion of a file or database stored on an external recording medium such as a hard disk drive 9924, a DVD drive 9926 (DVD-ROM 9901), an IC card, etc. to be read into the RAM 9914, and perform various types of processing on the data on the RAM 9914. The CPU 9912 then writes back the processed data to the external recording medium.

様々なタイプのプログラム、データ、テーブル、およびデータベースのような様々なタイプの情報が記録媒体に格納され、情報処理を受けてよい。ＣＰＵ９９１２は、ＲＡＭ９９１４から読み取られたデータに対し、本開示の随所に記載され、プログラムの命令シーケンスによって指定される様々なタイプの操作、情報処理、条件判断、条件分岐、無条件分岐、情報の検索／置換等を含む、様々なタイプの処理を実行してよく、結果をＲＡＭ９９１４に対しライトバックする。また、ＣＰＵ９９１２は、記録媒体内のファイル、データベース等における情報を検索してよい。例えば、各々が第２の属性の属性値に関連付けられた第１の属性の属性値を有する複数のエントリが記録媒体内に格納される場合、ＣＰＵ９９１２は、第１の属性の属性値が指定される、条件に一致するエントリを当該複数のエントリの中から検索し、当該エントリ内に格納された第２の属性の属性値を読み取り、それにより予め定められた条件を満たす第１の属性に関連付けられた第２の属性の属性値を取得してよい。 Various types of information, such as various types of programs, data, tables, and databases, may be stored in the recording medium and undergo information processing. The CPU 9912 may perform various types of processing on the data read from the RAM 9914, including various types of operations, information processing, conditional judgment, conditional branching, unconditional branching, information search/replacement, etc., as described throughout this disclosure and specified by the instruction sequence of the program, and write back the results to the RAM 9914. The CPU 9912 may also search for information in a file, database, etc. in the recording medium. For example, if multiple entries each having an attribute value of a first attribute associated with an attribute value of a second attribute are stored in the recording medium, the CPU 9912 may search for an entry that matches a condition in which an attribute value of the first attribute is specified from among the multiple entries, read the attribute value of the second attribute stored in the entry, and thereby obtain the attribute value of the second attribute associated with the first attribute that satisfies a predetermined condition.
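
As a non-limiting illustration, the entry search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each entry is represented as a (first attribute value, second attribute value) pair held in memory, and the function name `find_second_attribute` and the sample tag names are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def find_second_attribute(entries: Iterable[Tuple[A, B]],
                          condition: Callable[[A], bool]) -> Optional[B]:
    """Return the second attribute of the first entry whose first
    attribute satisfies the condition, or None if no entry matches."""
    for first, second in entries:
        if condition(first):
            return second
    return None


# Hypothetical entries: (sensor tag, latest measured value).
entries = [("TI-101", 351.2), ("FI-202", 12.5)]
flow = find_second_attribute(entries, lambda tag: tag == "FI-202")
```

A linear scan is used here for simplicity; when the first attribute is looked up by exact match, a dictionary keyed on that attribute would serve the same purpose.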

上で説明したプログラムまたはソフトウェアモジュールは、コンピュータ９９００上またはコンピュータ９９００近傍のコンピュータ可読媒体に格納されてよい。また、専用通信ネットワークまたはインターネットに接続されたサーバーシステム内に提供されるハードディスクまたはＲＡＭのような記録媒体が、コンピュータ可読媒体として使用可能であり、それによりプログラムを、ネットワークを介してコンピュータ９９００に提供する。 The above-described program or software module may be stored on a computer-readable medium on the computer 9900 or in the vicinity of the computer 9900. Also, a recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can be used as a computer-readable medium, thereby providing the program to the computer 9900 via the network.

以上、本発明を実施の形態を用いて説明したが、本発明の技術的範囲は上記実施の形態に記載の範囲には限定されない。上記実施の形態に、多様な変更または改良を加えることが可能であることが当業者に明らかである。その様な変更または改良を加えた形態も本発明の技術的範囲に含まれ得ることが、特許請求の範囲の記載から明らかである。 The present invention has been described above using an embodiment, but the technical scope of the present invention is not limited to the scope described in the above embodiment. It is clear to those skilled in the art that various modifications and improvements can be made to the above embodiment. It is clear from the claims that forms with such modifications or improvements can also be included in the technical scope of the present invention.

特許請求の範囲、明細書、および図面中において示した装置、システム、プログラム、および方法における動作、手順、ステップ、および段階等の各処理の実行順序は、特段「より前に」、「先立って」等と明示しておらず、また、前の処理の出力を後の処理で用いるのでない限り、任意の順序で実現しうることに留意すべきである。特許請求の範囲、明細書、および図面中の動作フローに関して、便宜上「まず、」、「次に、」等を用いて説明したとしても、この順で実施することが必須であることを意味するものではない。 It should be noted that the processes, such as operations, procedures, steps, and stages, in the devices, systems, programs, and methods shown in the claims, specification, and drawings may be executed in any order, unless the order is explicitly indicated by "before," "prior to," or the like, or unless the output of a previous process is used in a later process. Even if an operational flow in the claims, specification, or drawings is described using "first," "next," or the like for convenience, this does not mean that the processes must be performed in that order.

１０設備
２０本部
１００操業システム
２００評価モデル生成装置
２１０ラベリング機能部
２１２操業目標取得部
２１４状態データ取得部
２１６相関データ生成部
２１８教師ラベル取得部
２２０ラベリング部
２２２ラベリングデータ出力部
２２４ラベリングモデル更新部
２３０機械学習機能部
２４０ラベリングデータ取得部
２５０評価モデル生成部
２５２学習部
２５４前処理部
２５６機械学習部
２６０評価モデル判定部
２７０評価モデル出力部
３００操業モデル生成装置
３１２評価モデル取得部
３１４学習環境データ取得部
３１６操業モデル生成部
３１８学習操作指示部
３２０操業モデル判定部
３２２操業モデル出力部
４００制御装置
４１２操業モデル取得部
４１４実環境データ取得部
４１６制御部
４１８実操作指示部
９９００コンピュータ
９９０１ＤＶＤ－ＲＯＭ
９９１０ホストコントローラ
９９１２ＣＰＵ
９９１４ＲＡＭ
９９１６グラフィックコントローラ
９９１８ディスプレイデバイス
９９２０入／出力コントローラ
９９２２通信インターフェイス
９９２４ハードディスクドライブ
９９２６ＤＶＤドライブ
９９３０ＲＯＭ
９９４０入／出力チップ
９９４２キーボード

10 Facility
20 Headquarters
100 Operation system
200 Evaluation model generation device
210 Labeling function unit
212 Operation target acquisition unit
214 State data acquisition unit
216 Correlation data generation unit
218 Teacher label acquisition unit
220 Labeling unit
222 Labeling data output unit
224 Labeling model update unit
230 Machine learning function unit
240 Labeling data acquisition unit
250 Evaluation model generation unit
252 Learning unit
254 Preprocessing unit
256 Machine learning unit
260 Evaluation model determination unit
270 Evaluation model output unit
300 Operation model generation device
312 Evaluation model acquisition unit
314 Learning environment data acquisition unit
316 Operation model generation unit
318 Learning operation instruction unit
320 Operation model determination unit
322 Operation model output unit
400 Control device
412 Operation model acquisition unit
414 Real environment data acquisition unit
416 Control unit
418 Actual operation instruction unit
9900 Computer
9901 DVD-ROM
9910 Host controller
9912 CPU
9914 RAM
9916 Graphics controller
9918 Display device
9920 Input/output controller
9922 Communication interface
9924 Hard disk drive
9926 DVD drive
9930 ROM
9940 Input/output chip
9942 Keyboard

Claims

An operation system comprising:
an evaluation model generation device that generates, by machine learning, an evaluation model that outputs an index evaluating a state of a facility with respect to a goal of interest, based on an operation target of the facility and the state of the facility;
an operation model generation device that generates, by reinforcement learning using an output of the evaluation model as at least a part of a reward, an operation model that outputs an action according to the state of the facility; and
a control device that applies, to a control target in the facility, an operation amount based on the action output by the operation model in accordance with the state of the facility,
wherein the evaluation model generation device updates the evaluation model based on the state of the facility when the control target is controlled using the operation model, and
the operation model generation device updates the operation model by reinforcement learning using an output of the updated evaluation model as at least a part of a reward.

The operation system according to claim 1, wherein the control device controls the control target using the updated operation model.

An operation system comprising:
an evaluation model generation device that generates, by machine learning, an evaluation model that outputs an index evaluating a state of a facility with respect to a goal of interest, based on an operation target of the facility and the state of the facility;
an operation model generation device that generates, by reinforcement learning using an output of the evaluation model as at least a part of a reward, an operation model that outputs an action according to the state of the facility; and
a control device that applies, to a control target in the facility, an operation amount based on the action output by the operation model in accordance with the state of the facility,
wherein the evaluation model generation device comprises:
an operation target acquisition unit that acquires the operation target;
a state data acquisition unit that acquires state data indicating the state of the facility;
a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of a correlation between time and at least one physical quantity included in the state data and a correlation between at least two physical quantities included in the state data;
a labeling unit that labels the correlation data using a labeling model; and
an evaluation model generation unit that generates the evaluation model using the labeled correlation data.

The operation system according to claim 3, wherein the evaluation model generation device further comprises an evaluation model determination unit that determines the validity of the evaluation model.

The operation system according to claim 4, wherein the evaluation model generation device further comprises an evaluation model output unit that outputs the evaluation model when the evaluation model is determined to be valid.

The operation system according to claim 4 or 5, wherein the evaluation model generation device further comprises a labeling model update unit that updates the labeling model when the evaluation model is determined to be valid.

The operation system according to claim 6, wherein the evaluation model generation device further comprises a teacher label acquisition unit that acquires teacher labels for at least a part of the correlation data, and
the labeling model update unit generates an updated labeling model separately from an initial labeling model generated based on the teacher labels.

An operation method comprising:
generating, by machine learning, an evaluation model that outputs an index evaluating a state of a facility with respect to a goal of interest, based on an operation target of the facility and the state of the facility;
generating, by reinforcement learning using an output of the evaluation model as at least a part of a reward, an operation model that outputs an action according to the state of the facility; and
applying, to a control target in the facility, an operation amount based on the action output by the operation model in accordance with the state of the facility,
wherein generating the evaluation model includes updating the evaluation model based on the state of the facility when the control target is controlled using the operation model, and
generating the operation model includes updating the operation model by reinforcement learning using an output of the updated evaluation model as at least a part of a reward.

An operation method comprising:
generating, by machine learning, an evaluation model that outputs an index evaluating a state of a facility with respect to a goal of interest, based on an operation target of the facility and the state of the facility;
generating, by reinforcement learning using an output of the evaluation model as at least a part of a reward, an operation model that outputs an action according to the state of the facility; and
applying, to a control target in the facility, an operation amount based on the action output by the operation model in accordance with the state of the facility,
wherein generating the evaluation model includes:
acquiring the operation target;
acquiring state data indicating the state of the facility;
generating, based on the operation target, correlation data indicating at least one of a correlation between time and at least one physical quantity included in the state data and a correlation between at least two physical quantities included in the state data;
labeling the correlation data using a labeling model; and
generating the evaluation model using the labeled correlation data.

An operation program that, when executed by a computer, causes the computer to function as:
an evaluation model generation device that generates, by machine learning, an evaluation model that outputs an index evaluating a state of a facility with respect to a goal of interest, based on an operation target of the facility and the state of the facility;
an operation model generation device that generates, by reinforcement learning using an output of the evaluation model as at least a part of a reward, an operation model that outputs an action according to the state of the facility; and
a control device that applies, to a control target in the facility, an operation amount based on the action output by the operation model in accordance with the state of the facility,
wherein the evaluation model generation device updates the evaluation model based on the state of the facility when the control target is controlled using the operation model, and
the operation model generation device updates the operation model by reinforcement learning using an output of the updated evaluation model as at least a part of a reward.

An operation program that, when executed by a computer, causes the computer to function as:
an evaluation model generation device that generates, by machine learning, an evaluation model that outputs an index evaluating a state of a facility with respect to a goal of interest, based on an operation target of the facility and the state of the facility;
an operation model generation device that generates, by reinforcement learning using an output of the evaluation model as at least a part of a reward, an operation model that outputs an action according to the state of the facility; and
a control device that applies, to a control target in the facility, an operation amount based on the action output by the operation model in accordance with the state of the facility,
wherein the evaluation model generation device comprises:
an operation target acquisition unit that acquires the operation target;
a state data acquisition unit that acquires state data indicating the state of the facility;
a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of a correlation between time and at least one physical quantity included in the state data and a correlation between at least two physical quantities included in the state data;
a labeling unit that labels the correlation data using a labeling model; and
an evaluation model generation unit that generates the evaluation model using the labeled correlation data.

An evaluation model generation device comprising:
an operation target acquisition unit that acquires an operation target of a facility;
a state data acquisition unit that acquires state data indicating a state of the facility;
a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of a correlation between time and at least one physical quantity included in the state data and a correlation between at least two physical quantities included in the state data;
a labeling unit that labels the correlation data using a labeling model; and
an evaluation model generation unit that generates, using the labeled correlation data, an evaluation model that outputs an index evaluating the state of the facility with respect to a goal of interest, based on the operation target of the facility and the state of the facility,
wherein the evaluation model outputs the index used as at least a part of a reward in reinforcement learning of an operation model that outputs an action according to the state of the facility.

An evaluation model generation method comprising:
acquiring an operation target of a facility;
acquiring state data indicating a state of the facility;
generating, based on the operation target, correlation data indicating at least one of a correlation between time and at least one physical quantity included in the state data and a correlation between at least two physical quantities included in the state data;
labeling the correlation data using a labeling model; and
generating, using the labeled correlation data, an evaluation model that outputs an index evaluating the state of the facility with respect to a goal of interest, based on the operation target of the facility and the state of the facility,
wherein the evaluation model outputs the index used as at least a part of a reward in reinforcement learning of an operation model that outputs an action according to the state of the facility.

An evaluation model generation program that, when executed by a computer, causes the computer to function as:
an operation target acquisition unit that acquires an operation target of a facility;
a state data acquisition unit that acquires state data indicating a state of the facility;
a correlation data generation unit that generates, based on the operation target, correlation data indicating at least one of a correlation between time and at least one physical quantity included in the state data and a correlation between at least two physical quantities included in the state data;
a labeling unit that labels the correlation data using a labeling model; and
an evaluation model generation unit that generates, using the labeled correlation data, an evaluation model that outputs an index evaluating the state of the facility with respect to a goal of interest, based on the operation target of the facility and the state of the facility,
wherein the evaluation model outputs the index used as at least a part of a reward in reinforcement learning of an operation model that outputs an action according to the state of the facility.