JP2018123989A

JP2018123989A - Thermal comfort device and control content determination method

Info

Publication number: JP2018123989A
Application number: JP2017014870A
Authority: JP
Inventors: 伸裕見市; Nobuhiro Miichi; 智彦藤田; Tomohiko Fujita; 真梨奈大野; Marina ONO
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2017-01-30
Filing date: 2017-01-30
Publication date: 2018-08-09

Abstract

PROBLEM TO BE SOLVED: To provide a thermal comfort device capable of achieving thermal comfort suitable for an individual user. SOLUTION: A thermal comfort device 100 includes: a first acquisition unit 110 for acquiring first indoor environment information; a determination unit 120 for (i) determining control content of equipment 500 for performing indoor environment control for thermal comfort from the first indoor environment information according to a control content determination rule or for (ii) determining the control content of the equipment 500 randomly; an output unit 130 for outputting the control content; a second acquisition unit 140 for acquiring comfort information which shows comfort of a user indoors; and an update unit 150 for updating the control content determination rule based on the first indoor environment information and the comfort information. The determination unit 120 selects the determination of (ii) with the probability ε. SELECTED DRAWING: Figure 1

Description

The present invention relates to a thermal comfort device and a control content determination method that determine the control content of a device that performs indoor environment control (indoor climate control) for thermal comfort.

Conventionally, a system has been proposed that achieves both energy saving and thermal comfort by controlling the operating conditions of above-floor air conditioning based on the detection results of a human presence sensor and controlling the operating conditions of underfloor air conditioning based on time schedule data (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2015-117929 (JP2015-117929A)

However, in the conventional technology, the operating conditions of the air conditioning equipment are controlled based on a predetermined time schedule or the like. As a result, even if a comfortable indoor environment can be provided to one user, only an uncomfortable indoor environment may be provided to another user. In other words, it is difficult for the conventional technology to achieve thermal comfort adapted to individual users.

Therefore, the present invention provides a thermal comfort device and a control content determination method for thermal comfort that can achieve thermal comfort adapted to individual users.

A thermal comfort device according to one aspect of the present invention includes: a first acquisition unit that acquires first indoor environment information; a determination unit that (i) determines, from the first indoor environment information and according to a control content determination rule, the control content of a device that performs indoor environment control for thermal comfort, or (ii) determines the control content of the device at random; an output unit that outputs the determined control content; a second acquisition unit that acquires comfort information indicating the comfort of a user indoors; and an update unit that updates the control content determination rule based on the first indoor environment information and the comfort information. The determination unit selects the determination of (ii) with probability ε.

A control content determination method for thermal comfort according to one aspect of the present invention includes: a first acquisition step of acquiring first indoor environment information; a determination step of (i) determining, from the first indoor environment information and according to a control content determination rule, the control content of a device that performs indoor environment control for thermal comfort, or (ii) determining the control content of the device at random; an output step of outputting the determined control content; a second acquisition step of acquiring comfort information indicating the comfort of a user indoors; and an update step of updating the control content determination rule based on the first indoor environment information and the comfort information. In the determination step, the determination of (ii) is selected with probability ε.

Note that these general or specific aspects may be implemented as a system, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or as any combination of a system, an integrated circuit, a computer program, and a recording medium.

The thermal comfort device according to one aspect of the present invention can achieve thermal comfort adapted to individual users.

FIG. 1 is a block diagram showing the configuration of the thermal comfort system according to the first embodiment.
FIG. 2 is a conceptual diagram illustrating an example of a neural network in the thermal comfort device according to the first embodiment.
FIG. 3 is a diagram illustrating a specific example of a plurality of control contents in the first embodiment.
FIG. 4 is a flowchart showing the processing of the thermal comfort device according to the first embodiment.
FIG. 5 is a diagram illustrating an example of a graphical user interface for inputting comfort information according to the first embodiment.
FIG. 6 is a block diagram illustrating the functional configuration of the thermal comfort device according to the second embodiment.
FIG. 7 is a diagram illustrating a specific example of a plurality of control contents and notice difficulty level information in the second embodiment.
FIG. 8 is a flowchart showing the processing of the thermal comfort device according to the second embodiment.
FIG. 9 is a block diagram illustrating the functional configuration of the thermal comfort device according to the third embodiment.
FIG. 10 is a flowchart showing the processing of the thermal comfort device according to the third embodiment.

Hereinafter, embodiments will be described in detail with reference to the drawings.

Note that each of the embodiments described below shows a general or specific example. The numerical values, shapes, materials, components, arrangement positions and connection forms of the components, steps, order of steps, and the like shown in the following embodiments are merely examples and are not intended to limit the scope of the claims. Among the components in the following embodiments, components that are not recited in the independent claims, which represent the broadest concept, are described as optional components.

Each figure is a schematic diagram and is not necessarily drawn precisely. In the figures, the same reference signs are assigned to the same or similar components and processing steps.

(Embodiment 1)
[Configuration of thermal comfort system]
First, the overall configuration of the thermal comfort system will be described. FIG. 1 is a block diagram illustrating the configuration of a thermal comfort system 10 according to the first embodiment. The thermal comfort system 10 according to the present embodiment includes a thermal comfort device 100, a sensor 200, an input device 300, a control device 400, and a device 500.

The thermal comfort device 100 determines the control content for the device 500 and outputs that control content to the control device 400. Details of the thermal comfort device 100 will be described later.

The sensor 200 detects the indoor environment and outputs the detection result as indoor environment information. Indoor environment information is information about the environment within a partitioned space. Specifically, the indoor environment information is, for example, information about the indoor temperature, humidity, or airflow controlled by the device 500.

The input device 300 receives, from the user, input of comfort information indicating indoor comfort. For example, the input device 300 is a smartphone or a tablet computer and receives the comfort information via a touch display. Alternatively, the input device 300 may be a microphone, in which case it receives voice input from the user. The input device 300 may also be a mechanical push button, a keyboard, a mouse, or the like.

The control device 400 controls the device 500 based on the control content output from the thermal comfort device 100. Specifically, the control device 400 transmits a control signal corresponding to the control content to the device 500. The control device 400 may be built into the device 500.

The device 500 performs indoor environment control for thermal comfort. Specifically, the device 500 performs indoor air conditioning. The device 500 is, for example, an air conditioner, an air purifier, a ventilation fan, an electric fan, or floor heating. The present embodiment describes the case where the device 500 includes a plurality of types of devices, but the device 500 may be a single device.

[Configuration of thermal comfort device]
Next, the detailed configuration of the thermal comfort device 100 will be described. As shown in FIG. 1, the thermal comfort device 100 includes a first acquisition unit 110, a determination unit 120, an output unit 130, a second acquisition unit 140, and an update unit 150.

The thermal comfort device 100 is implemented by, for example, a processor and a memory. For example, when the processor executes a software program stored in the memory, the processor functions as the first acquisition unit 110, the determination unit 120, the output unit 130, the second acquisition unit 140, and the update unit 150. Alternatively, the thermal comfort device 100 may be implemented as one or more dedicated electronic circuits corresponding to the first acquisition unit 110, the determination unit 120, the output unit 130, the second acquisition unit 140, and the update unit 150.

The first acquisition unit 110 acquires indoor environment information (first indoor environment information) from the sensor 200. For example, the first acquisition unit 110 acquires the indoor environment information by processing the output signal of the sensor 200.

The determination unit 120 either (i) determines the control content of the device 500 from the indoor environment information according to the control content determination rule, or (ii) determines the control content of the device 500 at random. That is, the determination unit 120 selectively executes one determination from among a plurality of determinations including the determination of (i) and the determination of (ii). In doing so, the determination unit 120 selects the determination of (ii) with probability ε, where ε is a predetermined value greater than 0 and less than 1. For example, the determination unit 120 selects the determination of (i) with probability 1 - ε and the determination of (ii) with probability ε.
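As a minimal sketch of this selection, the following Python function draws a uniform random number and takes decision (ii) when it falls below ε. The function name, the list of estimated values passed in, and the greedy choice used for decision (i) are illustrative assumptions rather than details fixed by this description.

```python
import random


def choose_control(q_values, epsilon, rng=random):
    """Return the index of the selected control content (hypothetical helper).

    q_values: estimated value of each candidate control content.
    epsilon:  probability of decision (ii); must satisfy 0 < epsilon < 1.
    rng:      any object providing random() and randrange(), for testability.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must satisfy 0 < epsilon < 1")
    if rng.random() < epsilon:
        # Decision (ii): pick a control content uniformly at random.
        return rng.randrange(len(q_values))
    # Decision (i): follow the rule; here assumed to be "highest estimated value".
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

A small ε favors the learned rule, while a larger ε spends more time trying alternatives; the description only requires that ε lie strictly between 0 and 1.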

The control content determination rule is represented by, for example, a neural network for estimating the value of each of a plurality of control contents from the indoor environment information. The control content determination rule is stored in a storage unit (not shown). Details of the neural network will be described later with reference to FIG. 2.

The output unit 130 outputs the control content determined by the determination unit 120. Here, the output unit 130 outputs the control content to the control device 400.

The second acquisition unit 140 acquires comfort information indicating the comfort of the user indoors. This comfort information includes information input by the user via the input device 300. For example, the second acquisition unit 140 acquires, from the input device 300, a value that the user has input to indicate the comfort of the indoor environment.

Alternatively, the second acquisition unit 140 may acquire the comfort information by receiving a voice signal from the input device 300 and detecting, through voice recognition, the utterance of a predetermined keyword. The predetermined keyword is a keyword defined in advance that indicates the comfort of the user, such as "hot" or "cold".
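The keyword step can be sketched as follows, assuming the voice recognizer has already produced a text transcript (the recognizer itself is out of scope here). The keyword table, the comfort values assigned to each keyword, and the function name are hypothetical; the description only gives "hot" and "cold" as example keywords.

```python
# Hypothetical keyword table: each keyword maps to an assumed comfort value.
KEYWORD_COMFORT = {"暑い": -1.0, "寒い": -1.0, "hot": -1.0, "cold": -1.0}


def comfort_from_transcript(transcript, keyword_comfort=KEYWORD_COMFORT):
    """Return (keyword, comfort value) for the earliest keyword found, else None."""
    text = transcript.lower()
    # Record the position of every keyword that occurs in the transcript.
    hits = [(text.find(k), k) for k in keyword_comfort if k in text]
    if not hits:
        return None
    _, keyword = min(hits)  # earliest occurrence wins
    return keyword, keyword_comfort[keyword]
```

This uses plain substring matching, so an English keyword would also match inside a longer word; a real system would more likely match on recognized word boundaries.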

The update unit 150 updates the control content determination rule used by the determination unit 120 based on the indoor environment information acquired by the first acquisition unit 110 and the comfort information acquired by the second acquisition unit 140. Specifically, the update unit 150 updates the values of the plurality of control contents using a value based on the comfort information as a reward. The update unit 150 then updates the parameters of the neural network (for example, the weights w) based on the updated values. In other words, the update unit 150 learns to determine control content adapted to the user through reinforcement learning based on the values of the plurality of control contents.

[Description of neural network]
Here, the neural network in the present embodiment will be described with reference to FIG. 2. FIG. 2 is a conceptual diagram illustrating an example of the neural network in the thermal comfort device 100 according to the first embodiment. This neural network is a multi-layer artificial neural network: a mathematical model for estimating the value Q_ai of each of a plurality of control contents a_i (i = 1 to n) in the environment s indicated by the indoor environment information.
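To make the mapping from the environment s to the n values concrete, here is a small forward pass written in plain Python. The layer sizes, the tanh activation, the random initialization, and the scaled sensor readings are all assumptions chosen for illustration; the description does not specify the network architecture.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create one fully connected layer as (weights, biases)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one fully connected layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def estimate_values(state, layers):
    """Return one estimated value per control content for the given state."""
    h = state
    for layer in layers[:-1]:
        h = dense(h, layer, math.tanh)  # hidden layers
    return dense(h, layers[-1])  # linear output layer: one value per control content


rng = random.Random(0)
layers = [init_layer(3, 8, rng), init_layer(8, 4, rng)]  # 3 inputs, n = 4 outputs
state = [0.24, 0.55, 0.10]  # assumed scaled temperature, humidity, airflow
q = estimate_values(state, layers)
```

The output list has one entry per control content, so its index can be passed directly to whatever selection step chooses among them.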

[Specific examples of control contents]
FIG. 3 is a diagram illustrating a specific example of a plurality of control contents in the first embodiment. Here, the case where the device 500 includes an air conditioner and floor heating will be described.

Each of the control contents a_1 to a_n is a control content set that contains control contents for a plurality of devices. For example, control content a_1 includes an air conditioner airflow level of "low", an air conditioner target temperature of 25 degrees Celsius, and a floor heating target temperature of 25 degrees Celsius. The value of each of the control contents a_1 to a_n in the environment s is estimated by the neural network.
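One way to represent such control content sets is as immutable records enumerated from the allowed settings of each device. In the sketch below, only the first entry (low airflow, 25 and 25 degrees Celsius) comes from the example above; the class name, field names, and the other candidate settings are assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ControlContent:
    """One control content set covering the air conditioner and floor heating."""
    ac_fan: str            # air conditioner airflow level
    ac_target_c: int       # air conditioner target temperature (deg C)
    floor_target_c: int    # floor heating target temperature (deg C)


# Assumed candidate settings for each controllable quantity.
AC_FAN = ("low", "high")
AC_TARGET_C = (25, 26, 27)
FLOOR_TARGET_C = (25, 27)

# Every combination of settings becomes one control content a_1 ... a_n.
CONTROL_CONTENTS = [
    ControlContent(fan, ac, floor)
    for fan, ac, floor in product(AC_FAN, AC_TARGET_C, FLOOR_TARGET_C)
]
```

With these assumed settings n is 2 x 3 x 2 = 12, which shows how quickly the number of control contents grows as devices or settings are added.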

[Operation of thermal comfort device]
Next, the operation of the thermal comfort device 100 configured as described above will be described with reference to FIGS. 4 and 5.

FIG. 4 is a flowchart showing the processing of the thermal comfort device 100 according to the first embodiment. This processing is executed, for example, periodically at predetermined time intervals. Alternatively, it may be executed according to a predetermined time schedule or in response to an instruction from the user.

First, the first acquisition unit 110 acquires indoor environment information (S110). The determination unit 120 then estimates the value of each control content from the indoor environment information using the neural network (S120).

Subsequently, the determination unit 120 branches according to the probability ε (S130). Here, the determination unit 120 selects the determination of (i) with probability 1 - ε and the determination of (ii) with probability ε.

When the determination of (ii) is selected (ε in S130), the determination unit 120 determines the control content at random (S140). That is, the determination unit 120 randomly selects one control content from among the plurality of control contents. In other words, the determination unit 120 determines the control content without depending on the values estimated by the neural network.

On the other hand, when the determination of (i) is selected (1 - ε in S130), the determination unit 120 determines the control content based on the estimated values (S150). For example, the determination unit 120 selects the control content with the highest value from among the plurality of control contents.

The output unit 130 outputs the control content determined in step S140 or step S150 (S160). The device 500 is thereby controlled based on the determined control content.

Thereafter, the second acquisition unit 140 acquires comfort information (S170). For example, when the input device 300 has a display, the second acquisition unit 140 acquires a value indicating the comfort of the user indoors via a graphical user interface (GUI) such as the one illustrated in FIG. 5. In the GUI of FIG. 5, the comfort value is input using a slider, but the GUI is not limited to this. The GUI may include a text box in which a numerical value is entered directly, buttons for increasing or decreasing the value, or a combination of these.

Subsequently, the update unit 150 updates the values of the plurality of control contents based on the indoor environment information and the comfort information (S180). At this time, a value based on the comfort information is used as the reward in reinforcement learning. The value based on the comfort information is a value indicating comfort, for example, a value that increases as comfort increases.
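The description does not fix a particular update formula for step S180. One common choice, shown here purely as an assumption, is the one-step Q-learning update, where r is the value based on the comfort information, a is the control content that was output, s' is the environment observed after the control is applied, and the learning rate α and discount factor γ are hyperparameters not specified in this document:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

Under this assumed form, a high comfort value raises the estimated value of the control content that produced it, which makes that control content more likely to be chosen the next time the same environment is observed.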

Furthermore, the update unit 150 updates the parameters of the neural network based on the updated values (S190). That is, the update unit 150 learns the parameters of the multi-layer neural network by supplying the updated value of each control content as a teacher (supervisory) signal.
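The idea of fitting parameters to an updated value used as a teacher signal can be sketched with a deliberately simplified model. The code below swaps the multi-layer network for a single linear unit and applies plain gradient descent on the squared error; the function names, learning rate, state vector, and target value are all assumptions for illustration.

```python
def q_value(weights, state):
    """Linear stand-in for the network: estimated value of one control content."""
    return sum(w * x for w, x in zip(weights, state))


def update_parameters(weights, state, target, lr=0.1):
    """One gradient step on 0.5 * (target - Q)^2 with respect to the weights."""
    error = target - q_value(weights, state)
    # The gradient of the loss w.r.t. each weight is -error * x, so step along +error * x.
    return [w + lr * error * x for w, x in zip(weights, state)]


weights = [0.0, 0.0]
state = [1.0, 0.5]   # assumed scaled indoor environment readings
target = 1.0         # updated value used as the teacher signal
for _ in range(200):
    weights = update_parameters(weights, state, target)
```

After repeated steps the estimate for this state approaches the teacher signal. A multi-layer network would apply the same principle through backpropagation.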

By internally repeating the processing of steps S180 and S190, so-called deep reinforcement learning is performed. The deep reinforcement learning method is not particularly limited, and conventional techniques may be used. A detailed description of deep reinforcement learning is therefore omitted.

Note that the comfort information need not be acquired every time the control content is determined. That is, step S170 may be skipped. In this case, the update unit 150 may learn the value of each control content using a predetermined value (for example, 0) as the reward.

[Effects]
As described above, the thermal comfort device 100 according to the present embodiment includes: a first acquisition unit 110 that acquires first indoor environment information; a determination unit 120 that (i) determines, from the first indoor environment information and according to a control content determination rule, the control content of the device 500 that performs indoor environment control for thermal comfort, or (ii) determines the control content of the device 500 at random; an output unit 130 that outputs the determined control content; a second acquisition unit 140 that acquires comfort information indicating the comfort of the user indoors; and an update unit 150 that updates the control content determination rule based on the first indoor environment information and the comfort information. The determination unit 120 selects the determination of (ii) with probability ε.

With this configuration, the update unit 150 can update the control content determination rule based on the comfort information. The thermal comfort device 100 can therefore learn a control content determination rule suited to improving the user's comfort and can achieve thermal comfort adapted to individual users. Furthermore, because the determination unit 120 selects a random determination with probability ε, it can explore for the optimal control content without being bound by the current control content determination rule. In other words, the thermal comfort device 100 can balance exploration against exploitation of what it has learned, and can update the control content determination rule effectively.

In the thermal comfort device 100 according to the present embodiment, the control content determination rule is represented by a neural network for estimating the value of each of the plurality of control contents from the indoor environment information. The update unit 150 updates the values of the plurality of control contents using a value based on the comfort information as a reward, and updates the parameters of the neural network based on the updated values.

With this configuration, so-called deep reinforcement learning can be applied to the thermal comfort device 100, and the thermal comfort device 100 can construct a control content determination rule better suited to the user. As a result, the thermal comfort device 100 can achieve thermal comfort suited to individual users.

In the thermal comfort device 100 according to the present embodiment, the second acquisition unit 140 may acquire the comfort information by detecting, through voice recognition, the utterance of a predetermined keyword.

With this configuration, the thermal comfort device 100 can reduce the burden on the user of inputting comfort information and can improve user convenience.

(Embodiment 2)
Next, the second embodiment will be described. The second embodiment differs from the first embodiment mainly in that one or more control contents are extracted from the plurality of control contents based on a notice difficulty level, which indicates how difficult it is for the user to notice a change in the control content, and the control content is determined at random from among the extracted control contents. The second embodiment is described below with a focus on the differences from the first embodiment.

[Configuration of thermal comfort device]
The detailed configuration of the thermal comfort device according to the second embodiment will be described. FIG. 6 is a block diagram illustrating the functional configuration of the thermal comfort device 100A according to the second embodiment. As shown in FIG. 6, the thermal comfort device 100A includes a first acquisition unit 110, a determination unit 120A, an output unit 130, a second acquisition unit 140, an update unit 150A, and a detection unit 160A.

The determination unit 120A either (i) determines, from the indoor environment information and according to the control content determination rule, the control content of the device that performs indoor environment control for thermal comfort, or (ii) determines the control content at random. In case (ii), the determination unit 120A refers to notice difficulty level information and determines the control content at random from among one or more control contents whose notice difficulty level satisfies a predetermined condition.

The notice difficulty level information is information in which a notice difficulty level is associated with each of the plurality of control contents. The notice difficulty level is a value indicating how difficult it is for the user to notice a change in the control content. For example, the notice difficulty level information is a table in which a value representing how hard the change is for the user to notice is associated with each of the plurality of control contents. The notice difficulty level information is stored in a storage unit (not shown).

The predetermined condition is a condition for selecting control contents whose changes are difficult for the user to notice. For example, the predetermined condition is that the notice difficulty level is greater than a predetermined threshold.

For example, the determination unit 120A refers to the notice difficulty level information and extracts, from the plurality of control contents, one or more control contents whose notice difficulty level is greater than the threshold. The determination unit 120A then determines the control content at random from among the extracted control contents.
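The extract-then-sample step can be sketched as below. The table is modeled as a dictionary from a control content identifier to its notice difficulty level; apart from the value 50 for a2, which appears in the FIG. 7 example, the identifiers, values, threshold, and the fallback of returning None when nothing qualifies are assumptions.

```python
import random


def pick_unnoticeable(difficulty, threshold, rng=random):
    """Randomly pick a control content whose notice difficulty exceeds the threshold.

    difficulty: dict mapping control content id -> notice difficulty level.
    Returns None when no control content satisfies the condition (assumed behavior).
    """
    candidates = [cid for cid, level in difficulty.items() if level > threshold]
    if not candidates:
        return None
    return rng.choice(candidates)


# Hypothetical table; only a2 = 50 is taken from the example in this document.
difficulty = {"a1": 10, "a2": 50, "a3": 80}
choice = pick_unnoticeable(difficulty, threshold=40, rng=random.Random(0))
```

Restricting random choices to hard-to-notice changes lets the device keep exploring while limiting how often exploration is perceived by the user.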

FIG. 7 is a diagram illustrating a specific example of a plurality of control contents and notice difficulty level information in the second embodiment. In FIG. 7, each control content is defined as a value relative to the previous control content (an amount of change). For example, control content a_2 indicates that the airflow level of the air conditioner is "increased by 5", the target temperature of the air conditioner is "unchanged", and the target temperature of the floor heating is "increased by 1 degree". The notice difficulty level of control content a_2 is "50".

The detection unit 160A detects whether the user has noticed a change in the control content. That is, when the control content has been determined randomly, the detection unit 160A determines whether the user has noticed the resulting change in the control content. For example, when the user manually instructs a change of the control content, the detection unit 160A detects that the user has noticed the change. A manual instruction by the user indirectly indicates that the user feels uncomfortable with the indoor environment. The detection unit 160A can therefore detect whether randomly determining the control content is causing the user discomfort.

As in Embodiment 1, the update unit 150A updates the control content determination rule based on the indoor environment information and the comfort information. In the present embodiment, when the control content has been determined randomly, the update unit 150A additionally updates the notice difficulty level information based on the detection result of the detection unit 160A. For example, when it is detected that the user noticed the change in the control content, the update unit 150A decreases the notice difficulty level of the determined control content. Conversely, when it is not detected that the user noticed the change, the update unit 150A increases the notice difficulty level of the determined control content.
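The update of the notice difficulty level can be sketched as below. The source only states the direction of the change (decrease when noticed, increase otherwise); the fixed step size and the clamping range of 0 to 100 are assumptions for this example.

```python
def update_notice_difficulty(table, content, noticed, step=5, low=0, high=100):
    """Return a copy of the table with the chosen content's difficulty adjusted.

    step, low and high are hypothetical parameters; the source does not give
    a step size or a value range.
    """
    delta = -step if noticed else step
    updated = dict(table)
    # Clamp so repeated updates stay inside the assumed value range.
    updated[content] = max(low, min(high, updated[content] + delta))
    return updated
```

Returning a new table instead of mutating the input keeps the previous state available for comparison, which is a design choice of this sketch rather than something the source requires.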

[Operation of the thermal comfort device]
Next, the operation of the thermal comfort device 100A configured as described above will be described with reference to FIG. 8. FIG. 8 is a flowchart showing the processing of the thermal comfort device 100A according to Embodiment 2.

When determination (ii) is selected in step S130 (ε in S130), the determination unit 120A extracts one or more control contents from the plurality of control contents based on the notice difficulty level information (S132A). For example, the determination unit 120A refers to the notice difficulty level information in FIG. 7 and extracts, from the plurality of control contents a1 to an, the control contents (for example, a1 and a2) whose notice difficulty level is greater than a predetermined threshold (for example, 50).

The determination unit 120A then randomly determines the control content from among the extracted control contents (S140A).

Thereafter, steps S160 to S190 are executed. If the control content was not determined randomly (No in S192A), the processing ends. If the control content was determined randomly (Yes in S192A), the detection unit 160A detects whether the user has noticed the change in the control content (S194A), and the update unit 150A updates the notice difficulty level information based on the detection result of the detection unit 160A (S196A).

[Effects]
As described above, in the thermal comfort device 100A according to the present embodiment, when making a random determination, the determination unit 120A refers to the notice difficulty level information, in which each of the plurality of control contents is associated with a notice difficulty level indicating how difficult it is for the user to notice a change in the control content, and randomly determines the control content from among one or more control contents whose notice difficulty level satisfies the predetermined condition.

With this configuration, the determination unit 120A can reduce the likelihood that the user notices a change in the control content when the control content is determined randomly. That is, the determination unit 120A can suppress the selection, in a random determination, of control content that causes the user discomfort.

The thermal comfort device 100A according to the present embodiment further includes the detection unit 160A, which detects whether the user has noticed a change in the control content. When the control content has been determined randomly, the update unit 150A additionally updates the notice difficulty level information based on the detection result of the detection unit 160A.

With this configuration, the update unit 150A can update the notice difficulty level information based on the result of detecting whether the user noticed the change in the control content, and can thereby improve the notice difficulty level information. The determination unit 120A can therefore further suppress the selection, in a random determination, of control content that causes the user discomfort.

(Embodiment 3)
Next, Embodiment 3 will be described. Embodiment 3 differs from Embodiment 1 mainly in that one or more control contents are extracted from the plurality of control contents based on a response time, which is the time until the control content is reflected in the indoor environment, and the control content is randomly determined from among the extracted one or more control contents. Embodiment 3 is described below with a focus on the differences from Embodiment 1.

[Configuration of the thermal comfort device]
The detailed configuration of the thermal comfort device according to Embodiment 3 will now be described. FIG. 9 is a block diagram illustrating the functional configuration of the thermal comfort device 100B according to Embodiment 3. As shown in FIG. 9, the thermal comfort device 100B includes a first acquisition unit 110, a determination unit 120B, an output unit 130, a second acquisition unit 140, and an update unit 150B.

The determination unit 120B (i) determines, according to the control content determination rule, the control content of the device that performs indoor environment control for thermal comfort from the indoor environment information, or (ii) randomly determines the control content. In case (ii), the determination unit 120B refers to response time information and randomly determines the control content from among one or more control contents whose response time satisfies a predetermined condition.

The response time information is information for deriving, for each of the plurality of control contents, the time until that control content is reflected in the indoor environment (hereinafter referred to as the response time). The response time information is stored in a storage unit (not shown). For example, the response time information indicates a function for each control element included in the control contents. In this case, for example, the response time T for a change from control content ax to control content ay is derived according to Equation 1.
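Equation 1 itself does not appear in this text. Based on the later remark that the response time in Equation 1 is the sum of fi(axi, ayi), it presumably has the following form, where the summation runs over the control elements i:

```latex
T = \sum_{i} f_i\left(a_{xi},\, a_{yi}\right) \qquad \text{(Equation 1, reconstructed)}
```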

Here, i is a value identifying each control element included in the control content. For example, i = 1 denotes the airflow strength of the air conditioner, i = 2 denotes the target temperature of the air conditioner, and i = 3 denotes the target temperature of the floor heating. axi and ayi are the values (for example, an airflow strength value or a target temperature) of control element i in control contents ax and ay, respectively. fi is the response time function for the control element identified by i. For example, f1 is a function for deriving the response time of the airflow strength of the air conditioner.

The response time information defines the functions fi so that, for example, a short response time is derived for a change in airflow strength and a long response time is derived for a change in target temperature.

The predetermined condition is a condition for selecting control content with a short response time. For example, the predetermined condition is that the response time is shorter than a predetermined threshold time.

Specifically, the determination unit 120B, for example, refers to the response time information and extracts, from the plurality of control contents, one or more control contents whose response time is shorter than the threshold time. The determination unit 120B then randomly determines the control content from among the extracted one or more control contents.
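The derivation of the response time as a sum over control elements, followed by the threshold-based extraction, might look like the sketch below. The three response time functions and their coefficients (in minutes) are hypothetical; the source only says that airflow changes should yield short response times and target temperature changes long ones.

```python
# Hypothetical per-element response time functions f_i(a_xi, a_yi), in minutes.
# Control contents are tuples: (airflow strength, AC target temp, floor heating target temp).
RESPONSE_FUNCS = (
    lambda x, y: 0.5 * abs(y - x),   # i = 1: air conditioner airflow strength
    lambda x, y: 10.0 * abs(y - x),  # i = 2: air conditioner target temperature
    lambda x, y: 30.0 * abs(y - x),  # i = 3: floor heating target temperature
)


def response_time(current, candidate, funcs=RESPONSE_FUNCS):
    """Response time T for changing from `current` to `candidate` (sum over elements)."""
    return sum(f(x, y) for f, x, y in zip(funcs, current, candidate))


def extract_fast(current, candidates, threshold, funcs=RESPONSE_FUNCS):
    """Keep only the candidates whose response time is below the threshold time."""
    return [c for c in candidates if response_time(current, c, funcs) < threshold]
```

A random determination would then draw from the list returned by `extract_fast`, for example with `random.choice`.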

As in Embodiment 1, the update unit 150B updates the control content determination rule based on the indoor environment information and the comfort information. In the present embodiment, the update unit 150B additionally updates the response time information based on indoor environment information acquired after the control content is output (second indoor environment information). For example, the update unit 150B uses the indoor environment information to detect the time from when the control content is output until the indoor environment matches that control content, and updates the response time information based on the detected time.
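One possible reading of this update is sketched below: the elapsed time is measured from timestamped readings, and the stored estimate is blended with the measurement. The tolerance used to decide that the environment "matches" the control content and the exponential-moving-average blending are assumptions; the source does not specify either.

```python
def detect_response_time(output_time, readings, target, tolerance=0.5):
    """Return the elapsed time until the first reading within `tolerance` of `target`.

    `readings` is a list of (timestamp, value) pairs in chronological order.
    Returns None if the target is never reached.
    """
    for timestamp, value in readings:
        if timestamp >= output_time and abs(value - target) <= tolerance:
            return timestamp - output_time
    return None


def blend_estimate(old_estimate, measured, weight=0.2):
    """Move the stored response time estimate toward the measured time (assumed EMA)."""
    return (1 - weight) * old_estimate + weight * measured
```

A small `weight` makes the stored estimate robust to a single noisy measurement, at the cost of adapting more slowly.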

[Operation of the thermal comfort device]
Next, the operation of the thermal comfort device 100B configured as described above will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the processing of the thermal comfort device 100B according to Embodiment 3.

When determination (ii) is selected in step S130 (ε in S130), the determination unit 120B derives the response time of each of the plurality of control contents based on the response time information (S132B). The determination unit 120B then extracts one or more control contents from the plurality of control contents based on the derived response times (S134B). For example, the determination unit 120B extracts, from the plurality of control contents, one or more control contents whose response time is shorter than a predetermined threshold time.

The determination unit 120B then randomly determines the control content from among the extracted one or more control contents (S140B).

Thereafter, steps S160 to S190 are executed. If the control content was not determined randomly (No in S192B), the processing ends. If the control content was determined randomly (Yes in S192B), the first acquisition unit 110 acquires indoor environment information (S194B), and the update unit 150B updates the response time information based on the indoor environment information acquired in step S194B (S196B).

[Effects]
As described above, in the thermal comfort device 100B according to the present embodiment, the determination unit 120B refers to the response time information, which is used to derive, for each of the plurality of control contents, the response time until that control content is reflected in the indoor environment, and randomly determines the control content from among one or more control contents whose response time satisfies the predetermined condition.

With this configuration, the determination unit 120B can limit the selection of control content with a long response time when determining the control content randomly. That is, in a random determination the determination unit 120B can select control content for which learning can be performed in a relatively short time, which reduces the discomfort that random determination of the control content causes the user.

In the thermal comfort device 100B according to the present embodiment, the first acquisition unit 110 further acquires second indoor environment information after the control content is output, and the update unit 150B updates the response time information based on the second indoor environment information.

With this configuration, the update unit 150B can update the response time information based on the second indoor environment information, and can thereby improve the response time information. The determination unit 120B can therefore further reduce the discomfort that random determination of the control content causes the user.

(Other embodiments)
The thermal comfort device according to one or more aspects of the present invention has been described above based on the embodiments, but the present invention is not limited to these embodiments. Forms obtained by applying various modifications conceivable by those skilled in the art to the embodiments, and forms constructed by combining components of different embodiments, may also be included within the scope of one or more aspects of the present invention, as long as they do not depart from the gist of the present invention.

For example, Embodiment 2 and Embodiment 3 may be implemented in combination. That is, in determination (ii), the determination unit may refer to both the notice difficulty level information and the response time information, and randomly determine the control content from among one or more control contents whose notice difficulty level satisfies a first predetermined condition and whose response time satisfies a second predetermined condition.
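The combined condition can be sketched as a single filter. The dictionaries, IDs, and thresholds below are hypothetical, and the two conditions are taken to be "difficulty greater than a threshold" and "response time shorter than a threshold", following the examples given for Embodiments 2 and 3.

```python
def extract_candidates(contents, difficulty, response_time, min_difficulty, max_time):
    """Keep control contents that are both hard to notice and quick to take effect.

    `difficulty` and `response_time` map a control content ID to its value.
    """
    return [
        c for c in contents
        if difficulty[c] > min_difficulty and response_time[c] < max_time
    ]
```

The random determination then draws from the returned list, as in the single-condition cases.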

In each of the above embodiments, the second acquisition unit 140 acquires the comfort information based on information received from the input device 300, but it may acquire the comfort information based on information received from the sensor 200 as well as from the input device 300. For example, the second acquisition unit 140 may acquire the comfort information by correcting the information received from the input device 300 using the information received from the sensor 200. Specifically, the second acquisition unit 140 may correct the information received from the input device 300 based on the user's facial expression, brain waves, or heart rate. In this case, the sensor 200 may include an image sensor, a brain wave sensor, or a heart rate sensor.

In each of the above embodiments, deep reinforcement learning is used to learn how to determine control content adapted to the user, but the learning method is not limited to deep reinforcement learning. For example, the control content determination rule may be represented by a single-layer neural network rather than a multi-layer neural network. The control content determination rule may also be represented by a mathematical model other than a neural network (for example, linear regression or a support vector machine).

Each of the above embodiments mainly describes two determinations, namely (i) determining the control content of the device 500 from the indoor environment information according to the control content determination rule and (ii) randomly determining the control content of the device 500, but the determinations need not be limited to these two. For example, one determination may be selected from among three or more determinations. That is, the determination unit may selectively execute any one of a plurality of determinations including determination (i) and determination (ii), provided that determination (ii) is selected with probability ε.
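A selection among several determinations in which (ii) is chosen with probability ε can be sketched as follows. How the remaining probability 1 - ε is divided among the other determinations is not stated in the source; the weight dictionary used here is an assumption.

```python
import random


def select_determination(epsilon, other_weights, rng=random):
    """Return "ii" with probability epsilon, otherwise one of the other determinations.

    `other_weights` maps the names of the non-random determinations (for example
    "i") to relative weights; this split is a hypothetical design choice.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")
    if rng.random() < epsilon:
        return "ii"
    names = list(other_weights)
    weights = [other_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With only `{"i": 1.0}` as the alternative, this reduces to the two-way choice described in the embodiments.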

In each of the above embodiments, the thermal comfort device is realized as a single device, but it may be realized as a plurality of devices connected to one another. For example, the thermal comfort device may be realized by cloud computing.

In Embodiment 2 above, the notice difficulty level information is updated, but it does not necessarily have to be updated. In that case, the thermal comfort device 100A need not include the detection unit 160A.

In Embodiment 3 above, the response time information is updated, but it does not necessarily have to be updated. In that case, the thermal comfort device 100B need not include the detection unit 160A.

In Embodiment 3 above, the response time T is derived according to Equation 1, but the calculation method and formula are not limited to this. For example, in Equation 1 the response time is the sum of fi(axi, ayi), but it may instead be the maximum value of fi(axi, ayi). The response time may also be defined by a table or a neural network rather than by a formula.
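The two aggregation options differ only in how the per-element response times are combined, as the sketch below shows. The function names and the way the per-element functions are passed in are assumptions for illustration.

```python
def response_time_sum(current, candidate, funcs):
    """Response time as the sum of f_i(a_xi, a_yi) over all control elements."""
    return sum(f(x, y) for f, x, y in zip(funcs, current, candidate))


def response_time_max(current, candidate, funcs):
    """Response time as the largest f_i(a_xi, a_yi) over all control elements."""
    return max(f(x, y) for f, x, y in zip(funcs, current, candidate))
```

When several control elements change at once, the maximum never exceeds the sum as long as every fi is non-negative, so the choice of aggregation affects which control contents pass a given threshold time.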

Some or all of the components of the thermal comfort device in each of the above embodiments may be configured as a single system LSI (Large Scale Integration). For example, the thermal comfort device 100 may be configured as a system LSI having the first acquisition unit 110, the determination unit 120, the output unit 130, the second acquisition unit 140, and the update unit 150.

A system LSI is a super-multifunctional LSI manufactured by integrating a plurality of components on a single chip. Specifically, it is a computer system that includes a microprocessor, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. A computer program is stored in the ROM. The system LSI achieves its functions through the microprocessor operating according to the computer program.

Although the term system LSI is used here, it may also be called an IC, LSI, super LSI, or ultra LSI depending on the degree of integration. The method of circuit integration is not limited to LSI and may be realized with a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.

Furthermore, if integrated circuit technology that replaces LSI emerges from advances in semiconductor technology or from another derived technology, the functional blocks may naturally be integrated using that technology. Application of biotechnology is one possibility.

One aspect of the present invention may be not only such a thermal comfort device but also a control content determination method for thermal comfort whose steps correspond to the characteristic components included in the thermal comfort device. One aspect of the present invention may also be a computer program that causes a computer to execute each characteristic step included in the control content determination method. One aspect of the present invention may also be a non-transitory computer-readable recording medium on which such a computer program is recorded.

In each of the above embodiments, each component may be configured with dedicated hardware or may be realized by executing a software program suited to that component. Each component may be realized by a program execution unit, such as a CPU or a processor, reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. The software that realizes the thermal comfort device and the like of each of the above embodiments is the following program.

That is, this program causes a computer to execute a control content determination method including: a first acquisition step of acquiring first indoor environment information; a determination step of (i) determining, according to a control content determination rule, the control content of a device that performs indoor environment control for thermal comfort from the first indoor environment information, or (ii) randomly determining the control content of the device; an output step of outputting the determined control content; a second acquisition step of acquiring comfort information indicating the comfort of a user indoors; and an update step of updating the control content determination rule based on the first indoor environment information and the comfort information. In the determination step, determination (ii) is selected with probability ε.
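The steps of this method can be sketched in a few lines. Note that this sketch swaps in a simple lookup table of estimated values in place of the neural-network rule and deep reinforcement learning used in the embodiments, and that the class name, the learning rate, and the string-valued states are all hypothetical.

```python
import random
from collections import defaultdict


class ThermalComfortSketch:
    """Table-based stand-in for the control content determination rule."""

    def __init__(self, contents, epsilon=0.1, learning_rate=0.5, rng=None):
        self.contents = list(contents)
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()
        self.values = defaultdict(float)  # (state, content) -> estimated value

    def determine(self, state):
        """Return (control content, was_random) for the given indoor state."""
        if self.rng.random() < self.epsilon:
            # Determination (ii): random choice, selected with probability epsilon.
            return self.rng.choice(self.contents), True
        # Determination (i): follow the current rule (highest estimated value).
        best = max(self.contents, key=lambda c: self.values[(state, c)])
        return best, False

    def update(self, state, content, comfort):
        """Move the stored value toward the comfort feedback used as the reward."""
        key = (state, content)
        self.values[key] += self.learning_rate * (comfort - self.values[key])
```

In use, each cycle would acquire the indoor state, call `determine`, output the result to the device, acquire the user's comfort feedback, and call `update`.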

100, 100A, 100B Thermal comfort device
110 First acquisition unit
120, 120A, 120B Determination unit
130 Output unit
140 Second acquisition unit
150, 150A, 150B Update unit
160A Detection unit

Claims

A thermal comfort device comprising:
a first acquisition unit that acquires first indoor environment information;
a determination unit that (i) determines, according to a control content determination rule, control content of a device that performs indoor environment control for thermal comfort from the first indoor environment information, or (ii) randomly determines the control content of the device;
an output unit that outputs the determined control content;
a second acquisition unit that acquires comfort information indicating the comfort of a user indoors; and
an update unit that updates the control content determination rule based on the first indoor environment information and the comfort information,
wherein the determination unit selects determination (ii) with a probability ε.

The thermal comfort device according to claim 1, wherein
the control content determination rule is represented by a neural network for estimating the value of each of a plurality of control contents from indoor environment information, and
the update unit updates the values of the plurality of control contents using a value based on the comfort information as a reward, and updates parameters of the neural network based on the updated values.

The thermal comfort device according to claim 1 or 2, wherein
the second acquisition unit acquires the comfort information by detecting, through voice recognition, an utterance of a predetermined keyword.

The thermal comfort device according to any one of claims 1 to 3, wherein,
in determination (ii), the determination unit refers to notice difficulty level information, in which each of a plurality of control contents is associated with a notice difficulty level indicating how difficult it is for the user to notice a change in the control content, and randomly determines the control content from among one or more control contents whose notice difficulty level satisfies a predetermined condition.

The thermal comfort device according to claim 4, further comprising
a detection unit that detects whether the user has noticed a change in the control content,
wherein, when determination (ii) is selected, the update unit further updates the notice difficulty level information based on a detection result of the detection unit.

The thermal comfort device according to any one of claims 1 to 5, wherein,
in determination (ii), the determination unit refers to response time information for deriving, for each of a plurality of control contents, a response time that is the time until the control content is reflected in the indoor environment, and randomly determines the control content from among one or more control contents whose response time satisfies a predetermined condition.

The thermal comfort device according to claim 6, wherein
the first acquisition unit further acquires second indoor environment information after the control content is output, and
the update unit updates the response time information based on the second indoor environment information.

A control content determination method comprising:
a first acquisition step of acquiring first indoor environment information;
a determination step of (i) determining, according to a control content determination rule, control content of a device that performs indoor environment control for thermal comfort from the first indoor environment information, or (ii) randomly determining the control content of the device;
an output step of outputting the determined control content;
a second acquisition step of acquiring comfort information indicating the comfort of a user indoors; and
an update step of updating the control content determination rule based on the first indoor environment information and the comfort information,
wherein, in the determination step, determination (ii) is selected with a probability ε.

A program for causing a computer to execute the control content determination method according to claim 8.