JP2019067238A

JP2019067238A - Control device, control method and control program

Info

Publication number: JP2019067238A
Application number: JP2017193547A
Authority: JP
Inventors: 健一郎島田; Kenichiro Shimada; 知範泉谷; Tomonori Izumitani; 大地木村; Daichi Kimura; 恵介切通; Keisuke Kiritoshi
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2017-10-03
Filing date: 2017-10-03
Publication date: 2019-04-25

Abstract

To simply and accurately execute optimum control for a real environment. SOLUTION: A control device 10 collects data acquired by a sensor 21 installed in a control target facility 20; determines a control content by using a model for determining the control content of the control target facility 20, with the collected data as input; and controls the control target facility 20 on the basis of the control content. The control device 10 then trains the model such that a higher reward is given the closer the value of the data obtained by the sensor 21 after the control is performed is to a predetermined value. SELECTED DRAWING: Figure 1

Description

The present invention relates to a control device, a control method, and a control program.

Conventionally, control devices, air conditioning, and various other equipment have been controlled in a variety of environments such as factories, plants, buildings, and data centers. In the conventional approach, a human generally decides thresholds, control contents, and the like and turns them into rules in order to create an optimum temperature or a stable control state.

Japanese Unexamined Patent Application Publication No. 2017-142654

However, the conventional approach has a problem in that optimum control for a real environment cannot be executed simply and accurately. For example, when a human decides thresholds, control contents, and the like and turns them into rules to create an optimum temperature or a stable control state, an expert has to determine the rules manually, which takes effort and cannot be done simply.

It is also conceivable to learn thresholds, control contents, and the like by reinforcement learning and to control the equipment automatically. In the reward design of such reinforcement learning, however, only a simple reward design is used, so optimum control cannot be executed accurately.

To solve the problems described above and achieve the object, the control device of the present invention comprises: collection means for collecting data acquired by a sensor installed in a control target facility; control means for determining a control content by using a model for determining the control content of the control target facility, with the data collected by the collection means as input, and for controlling the control target facility based on the control content; and learning means for training the model such that a higher reward is given the closer the value of the data obtained by the sensor after the control by the control means is to a predetermined value.

The control method of the present invention is a control method executed by a control device, and includes: a collection step of collecting data acquired by a sensor installed in a control target facility; a control step of determining a control content by using a model for determining the control content of the control target facility, with the data collected in the collection step as input, and controlling the control target facility based on the control content; and a learning step of training the model such that a higher reward is given the closer the value of the data obtained by the sensor after the control in the control step is to a predetermined value.

The control program of the present invention causes a computer to execute: a collection step of collecting data acquired by a sensor installed in a control target facility; a control step of determining a control content by using a model for determining the control content of the control target facility, with the data collected in the collection step as input, and controlling the control target facility based on the control content; and a learning step of training the model such that a higher reward is given the closer the value of the data obtained by the sensor after the control in the control step is to a predetermined value.

According to the present invention, optimum control for a real environment can be executed simply and accurately.

FIG. 1 is a block diagram showing a configuration example of a control system according to the first embodiment.
FIG. 2 is a diagram showing an example of a real environment in which the control device according to the first embodiment performs optimum control.
FIG. 3 is a block diagram showing a configuration example of the control device according to the first embodiment.
FIG. 4 is a diagram showing an example of data stored in the sensor data storage unit.
FIG. 5 is a diagram explaining the flow of the optimum control process in the control device according to the first embodiment.
FIG. 6 is a diagram explaining the parallel processing of optimum control learning in the control device according to the first embodiment.
FIG. 7 is a diagram explaining reward assignment in the control device according to the first embodiment.
FIG. 8 is a flowchart showing an example of the flow of processing in the control device according to the first embodiment.
FIG. 9 is a diagram showing an outline of the processing of a control device according to the second embodiment.
FIG. 10 is a diagram showing a computer that executes a control program.

Embodiments of the control device, the control method, and the control program according to the present application are described in detail below with reference to the drawings. The control device, the control method, and the control program according to the present application are not limited by these embodiments.

[First Embodiment]
The following describes, in order, the configuration of the control system 100 according to the first embodiment, the configuration of the control device 10, and the flow of processing of the control device 10, and finally the effects of the first embodiment.

[Configuration of Control System]
FIG. 1 is a block diagram showing a configuration example of the control system according to the first embodiment. The control system 100 according to the first embodiment includes a control device 10 and a plurality of control target facilities 20A to 20C, which are real environments, and the control device 10 and the control target facilities 20A to 20C are connected to one another via a network 30. The configuration shown in FIG. 1 is merely an example, and the specific configuration and the number of each device are not particularly limited. When the control target facilities 20A to 20C are described without distinction, they are referred to as the control target facility 20 as appropriate.

The control device 10 collects data acquired by the sensors 21 installed in the control target facilities 20A to 20C. With the collected sensor data as input, the control device 10 then determines control contents by using a model, such as a neural network model, for determining the control contents of the control target facilities 20A to 20C, and controls the control target facilities 20A to 20C based on those control contents.

Subsequently, the control device 10 performs reinforcement learning on the model such that a higher reward is given the closer the value of the data obtained by the sensors 21 after the control is performed is to a predetermined value. For example, the control device 10 gives a stepwise higher reward the closer the value is to the mean of a preset upper limit and a preset lower limit.

A plurality of sensors 21 are installed in each of the control target facilities 20A to 20C. The control target facility is, for example, equipment or a reactor in a plant, building air conditioning, or a rack in a data center. Here, the control target facilities 20A to 20C are assumed to be located far apart from one another. As a specific example, as illustrated in FIG. 2, the control target facility 20 consists of tanks installed in a plant, and each of the six tanks is provided with a temperature sensor (not shown). FIG. 2 is a diagram showing an example of a real environment in which the control device according to the first embodiment performs optimum control.

In this example, each sensor 21 acquires tank temperature data and transmits it to the control device 10. Because the control device 10 performs reinforcement learning on data acquired from the real environment, the data also reflects external factors of the real environment that cannot be obtained in a virtual environment. Randomness therefore increases, and learning can be carried out under a variety of conditions.

The control device 10 also determines, for example, the control content for each of the six tanks according to the collected data of the sensors 21, and adjusts the temperature of each tank by air cooling, chilled water, or the like. The temperature is adjusted automatically so that it always stays at the optimum value.

[Configuration of Control Device]
Next, the configuration of the control device 10 is described with reference to FIG. 3. FIG. 3 is a block diagram showing a configuration example of the control device according to the first embodiment. As shown in FIG. 3, the control device 10 includes a communication processing unit 11, a control unit 12, and a storage unit 13. The processing of each unit of the control device 10 is described below.

The communication processing unit 11 controls communication of various kinds of information. For example, the communication processing unit 11 exchanges sensor data with the control target facility 20.

The storage unit 13 stores data and programs required for the various processes performed by the control unit 12 and, as the part most closely related to the present invention, includes a sensor data storage unit 13a. The storage unit 13 is, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.

The sensor data storage unit 13a temporarily stores same-time data of the sensors 21 collected from the sensors 21 of each control target facility 20 by the collection unit 12a described later. For example, as illustrated in FIG. 4, the sensor data storage unit 13a stores states 1 to 3 of real environment A, states 1 to 3 of real environment B, and states 1 to 3 of real environment C as the data acquired by the sensors 21 at time 12:00. Here, a "state" is any of the various kinds of data acquired by a sensor 21, such as temperature, pressure, sound, or vibration. FIG. 4 is a diagram showing an example of data stored in the sensor data storage unit.

As a specific example, states 1 to 3 of real environment A may be the temperature values of temperature sensors installed at different locations in the plant, or they may be data from different types of sensors 21, with state 1 being a temperature value, state 2 a pressure value, and state 3 a vibration value. The following description takes as an example the case where each state is the temperature value of a temperature sensor installed at a different location.

In the example of FIG. 4, the sensor data storage unit 13a stores, for time 12:00, "40" degrees as state 1, "31" degrees as state 2, and "17" degrees as state 3 of real environment A; "70" degrees as state 1, "80" degrees as state 2, and "66" degrees as state 3 of real environment B; and "50" degrees as state 1, "45" degrees as state 2, and "56" degrees as state 3 of real environment C.

The control unit 12 has an internal memory for storing programs that define various processing procedures and for storing required data, and executes various processes with them. As the parts most closely related to the present invention, it includes a collection unit 12a, a control unit 12b, and a learning unit 12c. The control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The collection unit 12a collects data acquired by the sensors 21 installed in the control target facility 20. Specifically, the collection unit 12a collects the data of each sensor 21 installed in each of the plurality of control target facilities 20 and buffers same-time data in the sensor data storage unit 13a.

For example, the collection unit 12a periodically (for example, every minute) receives data from the sensors 21 installed in a control target facility 20 such as a factory or a plant, and buffers the data in the sensor data storage unit 13a. The data acquired by the sensors 21 is, for example, various data such as temperature, pressure, sound, and vibration relating to the factory that is the control target facility, or to equipment or a reactor in the plant.
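
The same-time buffering described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the disclosed implementation: the class name `SnapshotBuffer`, its `add` method, and the use of a timestamp string as the key are all hypothetical. The readings are the 12:00 values given for FIG. 4.

```python
from collections import defaultdict


class SnapshotBuffer:
    """Holds readings per timestamp until every environment has reported."""

    def __init__(self, environments):
        self.environments = set(environments)
        # timestamp -> {environment: [state values]}
        self._pending = defaultdict(dict)

    def add(self, timestamp, environment, states):
        """Store one reading; return the full snapshot once it is complete."""
        if environment not in self.environments:
            raise KeyError(f"unknown environment: {environment}")
        self._pending[timestamp][environment] = list(states)
        if set(self._pending[timestamp]) == self.environments:
            # All environments have reported for this time: release the snapshot.
            return self._pending.pop(timestamp)
        return None


buffer = SnapshotBuffer(["A", "B", "C"])
first = buffer.add("12:00", "A", [40, 31, 17])     # incomplete, returns None
second = buffer.add("12:00", "B", [70, 80, 66])    # incomplete, returns None
snapshot = buffer.add("12:00", "C", [50, 45, 56])  # complete snapshot
```

Readings that arrive late from a distant facility simply wait in the buffer, so the models only ever see a snapshot in which all environments share the same time.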

With the data collected by the collection unit 12a as input, the control unit 12b determines a control content by using a model for determining the control content of the control target facility 20, and controls the control target facility 20 based on that control content. The control unit 12b inputs the data of each sensor 21 collected by the collection unit 12a into the respective models, determines the respective control contents, and controls each control target facility 20 based on its control content.

For example, the control unit 12b stores the data of each sensor collected by the collection unit 12a in the sensor data storage unit 13a, inputs data of the same time into the respective models simultaneously, determines the respective control contents, and controls each control target facility 20 based on its control content.

The learning unit 12c trains the model such that a higher reward is given the closer the value of the data obtained by the sensors 21 after the control by the control unit 12b is to a predetermined value. For example, the learning unit 12c gives a higher reward the closer the value of the data obtained by the sensors 21 after the control by the control unit 12b is to the mean of a preset upper limit and a preset lower limit. In other words, the learning unit 12c gives a stepwise higher reward the closer the value is to the center of the stable range.
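
One way to read "stepwise" is as a reward that rises in discrete levels toward the midpoint of the two limits. The Python sketch below illustrates that reading only. The function name `stepwise_reward`, the number of levels, the equal-width bands, and the zero reward outside the limits are assumptions that do not appear in the source.

```python
def stepwise_reward(x, upper, lower, levels=4):
    """Return an integer reward that rises in steps toward the midpoint.

    Assumption: the range between lower and upper is split into `levels`
    equal-width rings around the midpoint; values outside the range get 0.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    midpoint = (upper + lower) / 2
    half_width = (upper - lower) / 2
    distance = abs(x - midpoint)
    if distance > half_width:
        return 0
    # Ring 0 is the innermost band; the outermost ring is levels - 1.
    ring = min(int(distance / half_width * levels), levels - 1)
    return levels - ring


# Hypothetical limits of 60 and 40 degrees, so the midpoint is 50.
center = stepwise_reward(50, upper=60, lower=40)
edge = stepwise_reward(60, upper=60, lower=40)
outside = stepwise_reward(61, upper=60, lower=40)
```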

The flow of the optimum control process in the control device 10 according to the first embodiment is now described with reference to FIG. 5. FIG. 5 is a diagram explaining the flow of the optimum control process in the control device according to the first embodiment. As illustrated in FIG. 5, the control device 10 collects the data of the sensors 21 installed in the real environments A to C as states 1 to 3 of real environment A, states 1 to 3 of real environment B, and states 1 to 3 of real environment C. The control device 10 then performs optimum control learning in parallel for the real environments A to C and adopts the best model among the models applied to the real environments A to C.

The parallel processing of optimum control learning is now described concretely with reference to FIG. 6. FIG. 6 is a diagram explaining the parallel processing of optimum control learning in the control device according to the first embodiment. As shown in FIG. 6, the control device 10 inputs the data of the sensors 21 of the real environments A to C into the respective models and obtains the control content for the control target facility 20 as the output of each model. The control device 10 then controls the real environments A to C based on the respective control contents.

The control device 10 then acquires the control results of the real environments A to C. Specifically, the control device 10 acquires the values of the data obtained by the sensors 21 of the real environments A to C after the control is performed. Subsequently, the control device 10 trains each model such that a higher reward is given the closer the value of the data obtained by the sensors 21 after the control is performed is to the mean of a preset upper limit and a preset lower limit.

Reward assignment is now described concretely using the example of FIG. 7. FIG. 7 is a diagram explaining reward assignment in the control device according to the first embodiment. As shown in FIG. 7, the control device 10 trains each model such that a higher reward is given the closer the value of the data obtained by the sensors 21 after the control is performed is to the mean of the upper limit of the desired appropriate value (for example, an upper limit temperature) and the lower limit of the desired appropriate value (for example, a lower limit temperature), and a lower reward is given the farther the value is from that mean.

For example, using the value x of the data obtained by the sensors 21 after the control is performed, a preset upper limit θ_1, and a preset lower limit θ_2, the control device 10 calculates the reward to be given as -a(x - θ_1)(x - θ_2). Here, a is a variable that can be changed arbitrarily.
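
For a positive a, the expression -a(x - θ_1)(x - θ_2) is a downward-opening parabola: it is zero at both limits, peaks at the midpoint (θ_1 + θ_2)/2 with value a(θ_1 - θ_2)^2/4, and is negative outside the limits. The Python sketch below evaluates it directly. The function name `quadratic_reward`, the default a = 1.0, and the limits of 60 and 40 degrees are illustrative assumptions, not values from the source.

```python
def quadratic_reward(x, theta_1, theta_2, a=1.0):
    """Reward -a(x - theta_1)(x - theta_2), as stated in the description.

    theta_1 is the preset upper limit and theta_2 the preset lower limit.
    Assumption: a is positive, so the reward peaks midway between them.
    """
    return -a * (x - theta_1) * (x - theta_2)


# Hypothetical limits: upper 60 degrees, lower 40 degrees.
at_midpoint = quadratic_reward(50, theta_1=60, theta_2=40)  # 100.0
at_upper = quadratic_reward(60, theta_1=60, theta_2=40)     # zero at the limit
beyond = quadratic_reward(70, theta_1=60, theta_2=40)       # -300.0
```

Increasing a sharpens the difference in reward between the midpoint and the limits without moving the peak.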

After performing this learning, the control device 10 adopts the best model among the models of the real environments A to C and updates all the models to that best model. The update may be performed every time learning is performed or at any other timing. The best model may be determined automatically from predetermined conditions or manually.
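
The adoption step can be sketched in Python as follows. The source leaves open how the best model is chosen, so the criterion used here (the highest score per environment, for example a mean recent reward) is an assumption, as are the function name `adopt_best_model`, the dictionary representation of a model, and the numeric scores, which are made up for illustration.

```python
import copy


def adopt_best_model(models, scores):
    """Overwrite every environment's model with a copy of the best one.

    Assumption: `scores` maps each environment to a number where higher
    is better, for example the mean reward over a recent window.
    """
    best_env = max(scores, key=scores.get)
    best = models[best_env]
    # Deep copies keep the environments independent in later training.
    return best_env, {env: copy.deepcopy(best) for env in models}


# Hypothetical model parameters and scores for real environments A to C.
models = {"A": {"w": [0.1, 0.2]}, "B": {"w": [0.5, 0.4]}, "C": {"w": [0.3, 0.3]}}
scores = {"A": 12.0, "B": 48.5, "C": 30.0}
best_env, models = adopt_best_model(models, scores)
```

Copying rather than sharing one object lets each environment keep exploring from the same starting point after the update.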

[Processing Procedure of Control Device]
Next, an example of the processing procedure of the control device 10 according to the first embodiment is described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of the flow of processing in the control device according to the first embodiment.

As illustrated in FIG. 8, when the collection unit 12a collects data of the sensors 21 in the control target facility 20 (Step S101: Yes), it stores the collected data in the sensor data storage unit 13a (Step S102).

The control unit 12b then determines whether same-time data has been collected for all the real environments (Step S103). If same-time data has not been collected for all the real environments (Step S103: No), the process returns to Step S101, and Steps S101 to S103 are repeated until same-time data has been collected for all the real environments.

If same-time data has been collected for all the real environments (Step S103: Yes), the control unit 12b inputs the data of each sensor 21 collected by the collection unit 12a into the respective models (Step S104). The control unit 12b then determines the respective control contents (Step S105) and controls each control target facility 20 based on its control content (Step S106).

Subsequently, the learning unit 12c gives the model a higher reward the closer the value of the data obtained by the sensors 21 after the control by the control unit 12b is to a predetermined value (Step S107). For example, the learning unit 12c gives a higher reward the closer the value of the data obtained by the sensors 21 after the control by the control unit 12b is to the mean of a preset upper limit and a preset lower limit. The control device 10 then adopts the best model among the models applied to the real environments A to C (Step S108).
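
Steps S104 to S107 can be pictured as one pass over a complete same-time snapshot. The Python sketch below is a toy illustration under several assumptions: the function `control_cycle` and its callback parameters are hypothetical, a proportional rule stands in for the learned model, the plant responds exactly as commanded, the environments are processed one after another rather than in parallel, and the model update itself is omitted so that only the reward computation is shown.

```python
def control_cycle(snapshot, policies, actuate, reward_fn):
    """Run steps S104 to S107 once for a complete same-time snapshot.

    snapshot:  {environment: [sensor values]}
    policies:  {environment: callable(states) -> control content}
    actuate:   callable(environment, control) -> sensor values after control
    reward_fn: callable(value) -> reward
    """
    rewards = {}
    for env, states in snapshot.items():
        control = policies[env](states)   # S104, S105: model decides control
        after = actuate(env, control)     # S106: control the facility
        # S107: mean reward over the post-control sensor values
        rewards[env] = sum(reward_fn(v) for v in after) / len(after)
    return rewards


snapshot = {"A": [40, 31, 17], "B": [70, 80, 66]}
target = 50  # hypothetical midpoint of the upper and lower limits


def make_policy(gain):
    # Hypothetical proportional rule standing in for a learned model.
    return lambda states: [gain * (target - s) for s in states]


def actuate(env, control):
    # Toy plant: each tank temperature moves by exactly the commanded amount.
    return [s + c for s, c in zip(snapshot[env], control)]


policies = {"A": make_policy(1.0), "B": make_policy(0.5)}
rewards = control_cycle(snapshot, policies, actuate, lambda v: -abs(v - target))
```

In this toy run the policy for environment A reaches the target exactly, so it earns the higher reward and would be the candidate adopted in Step S108.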

[Effects of First Embodiment]
The control device 10 according to the first embodiment collects data acquired by the sensors 21 installed in the control target facility 20, determines a control content by using a model for determining the control content of the control target facility 20 with the collected data as input, and controls the control target facility 20 based on that control content. The control device 10 then trains the model such that a higher reward is given the closer the value of the data obtained by the sensors 21 after the control is performed is to a predetermined value. The control device 10 can therefore execute optimum control for a real environment simply and accurately. In other words, by applying, for example, a reward scheme in which the reward becomes higher toward the center of the stable range, and thereby raising the value of the more desirable outcome, the control device 10 can raise the probability that reinforcement learning produces an optimal solution.

The control device 10 also collects the data of each sensor 21 installed in each of the plurality of control target facilities 20, inputs the collected data of each sensor 21 into the respective models to determine the respective control contents, and controls each control target facility 20 based on its control content. The control device 10 then trains each model such that a higher reward is given the closer the value of the data of each sensor 21 after the control is performed is to a predetermined value. As a result, learning in the real environment also covers external factors of the real environment that cannot be obtained in a virtual environment. By learning from the real environments in parallel, the control device 10 further increases randomness and can carry out learning under a variety of conditions.

The control device 10 also stores the collected data of each sensor in the sensor data storage unit 13a, inputs data of the same time into the respective models simultaneously to determine the respective control contents, and controls each control target facility 20 based on its control content. When performing parallel processing in real environments, the control device 10 buffers the time differences that arise, which makes it possible to learn simultaneously in similar environments. Time differences caused, for example, by the distance between real environments are thus absorbed by buffering, and a state of simultaneous execution can be created.

[Second Embodiment]
In the first embodiment described above, the control device 10 buffers the data of each sensor in the sensor data storage unit 13a, inputs data of the same time into the respective models simultaneously, determines the respective control contents, and controls each control target facility based on its control content. However, the invention is not limited to this. For example, the control device may simultaneously collect the data of the sensors installed in each of a plurality of control target facilities, input the collected sensor data into a model simultaneously to determine the control contents, and control the control target facilities simultaneously based on those control contents.

The following therefore describes the case where the control device according to the second embodiment simultaneously collects the data of the sensors installed in each of a plurality of control target facilities, inputs the collected sensor data into a model simultaneously to determine the control contents, and controls the control target facilities simultaneously based on those control contents. Descriptions of configurations and processes that are the same as those of the control device 10 according to the first embodiment are omitted.

The collection unit 12a of the control device according to the second embodiment simultaneously collects the data of the sensors 21 installed in the plurality of control target facilities 20. The control unit 12b of the control device according to the second embodiment simultaneously inputs the collected data of the sensors 21 into the model to determine the respective control contents, and simultaneously controls the control target facilities 20 on the basis of those control contents.

An outline of the processing of the control device according to the second embodiment is now described with reference to FIG. 9. FIG. 9 is a diagram showing an outline of the processing of the control device according to the second embodiment. As shown in FIG. 9, the control device according to the second embodiment simultaneously collects the data of the sensors 21 installed in real environments A to C, as states 1 to 3 of real environment A, states 1 to 3 of real environment B, and states 1 to 3 of real environment C. Here, real environments A to C are assumed to be physically close to one another.

The control device according to the second embodiment then simultaneously inputs the data of the sensors 21 of real environments A to C into the model and determines the control content for the control target facilities 20. The control device according to the second embodiment then simultaneously controls the control target facilities 20 in environments A to C on the basis of the determined control content.
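One possible reading of this simultaneous input is that the readings of all environments are concatenated into a single joint state, so that one model can take mutual influence between nearby environments into account. The sketch below illustrates that reading only. The functions `make_linear_model` and `control_all`, the fixed linear weights, and the one-output-per-environment layout are hypothetical and are not specified by the source, which does not state the model architecture.

```python
def make_linear_model(weights):
    """Toy stand-in for the shared model: one weight row per output."""
    def model(joint_state):
        return [sum(w * x for w, x in zip(row, joint_state)) for row in weights]
    return model


def control_all(model, readings, env_order):
    """Feed every environment's sensor data to one model in a single call.

    `readings` maps env_id -> list of sensor values collected at the same time.
    Returns env_id -> control content (assumed: one output value per environment).
    """
    # Concatenate the per-environment readings into one joint input vector.
    joint_state = [value for env in env_order for value in readings[env]]
    outputs = model(joint_state)
    if len(outputs) != len(env_order):
        raise ValueError("model must return one control value per environment")
    return dict(zip(env_order, outputs))


# Usage: the control for A depends on A's first sensor and B's first sensor,
# mimicking mutual influence between nearby environments.
readings = {"A": [1.0, 2.0], "B": [3.0, 4.0]}
model = make_linear_model([[1, 0, 1, 0], [0, 0, 0, 1]])
controls = control_all(model, readings, ["A", "B"])
```

A batched alternative, in which the same model is applied to each environment's state independently, would also fit the wording "simultaneously input"; the joint-state form is chosen here only because the stated effect of the second embodiment mentions mutual influence.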

[Effect of Second Embodiment]
The control device according to the second embodiment simultaneously collects the data of the sensors 21 installed in a plurality of control target facilities, simultaneously inputs the collected data of the sensors 21 into a model to determine the respective control contents, and simultaneously controls the control target facilities 20 on the basis of those control contents. In other words, the control device according to the second embodiment assumes, for example, that real environments located close to one another influence each other in some way, and learns from them simultaneously with that influence included, which enables more efficient learning.

[System Configuration, etc.]
The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the one illustrated, and all or part of each device may be functionally or physically distributed or integrated in any unit according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed in each device may be realized by a CPU and a program analyzed and executed by that CPU, or may be realized as hardware using wired logic.

Among the processes described in the present embodiment, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and the drawings may be changed as desired unless otherwise specified.

[Program]
A program in which the processing executed by the control device described in the above embodiments is written in a computer-executable language can also be created. For example, a control program in which the processing executed by the control device 10 according to the embodiment is written in a computer-executable language can be created. In this case, the same effects as those of the above embodiments are obtained when a computer executes the control program. Furthermore, the control program may be recorded on a computer-readable recording medium, and the same processing as in the above embodiments may be realized by having a computer read and execute the control program recorded on that medium.

FIG. 10 is a diagram showing a computer that executes the control program. As illustrated in FIG. 10, the computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070, and these units are connected by a bus 1080.

As illustrated in FIG. 10, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

As illustrated in FIG. 10, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the control program described above is stored, for example in the hard disk drive 1090, as a program module in which the instructions to be executed by the computer 1000 are written.

The various data described in the above embodiments are stored as program data in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes the various processing procedures.

The program module 1093 and the program data 1094 of the control program need not be stored in the hard disk drive 1090. They may, for example, be stored on a removable storage medium and read by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and the program data 1094 of the control program may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)) and read by the CPU 1020 via the network interface 1070.

The above embodiments and their modifications are included in the technology disclosed in the present application and, likewise, in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS
10 Control device
11 Communication processing unit
12 Control unit
12a Collection unit
12b Control unit
12c Learning unit
13 Storage unit
13a Sensor data storage unit
20, 20A to 20C Control target facility
21 Sensor
100 Control system

Claims

A control device comprising:
collection means for collecting data acquired by a sensor installed in a control target facility;
control means for determining control content by using a model that takes the data collected by the collection means as input and determines the control content of the control target facility, and for controlling the control target facility on the basis of the control content; and
learning means for training the model such that a higher reward is given as the value of the data acquired by the sensor after the control by the control means is closer to a predetermined value.

The control device according to claim 1, wherein
the collection means collects data of each sensor installed in each of a plurality of control target facilities,
the control means inputs the data of each sensor collected by the collection means into a respective model to determine the respective control content, and controls each control target facility on the basis of the respective control content, and
the learning means trains each model such that a higher reward is given as the value of the data of the corresponding sensor after the control by the control means is closer to a predetermined value.

The control device according to claim 2, wherein the control means stores the data of each sensor collected by the collection means in a storage unit, inputs the data for the same time into the respective models simultaneously to determine the respective control contents, and controls each control target facility on the basis of the respective control content.

The control device according to claim 1, wherein
the collection means simultaneously collects data of each sensor installed in each of a plurality of control target facilities, and
the control means simultaneously inputs the data of each sensor collected by the collection means into the model to determine the respective control contents, and simultaneously controls the control target facilities on the basis of the control contents.

The control device according to claim 1, wherein the learning means trains the model such that a higher reward is given as the value of the data acquired by the sensor after the control by the control means is closer to the average of a predetermined upper limit and a predetermined lower limit.

A control method executed by a control device, the method comprising:
a collection step of collecting data acquired by a sensor installed in a control target facility;
a control step of determining control content by using a model that takes the data collected in the collection step as input and determines the control content of the control target facility, and controlling the control target facility on the basis of the control content; and
a learning step of training the model such that a higher reward is given as the value of the data acquired by the sensor after the control in the control step is closer to a predetermined value.

A control program that causes a computer to execute:
a collection step of collecting data acquired by a sensor installed in a control target facility;
a control step of determining control content by using a model that takes the data collected in the collection step as input and determines the control content of the control target facility, and controlling the control target facility on the basis of the control content; and
a learning step of training the model such that a higher reward is given as the value of the data acquired by the sensor after the control in the control step is closer to a predetermined value.