JP2020103496A - Environment control system and environment control method

Info

Publication number: JP2020103496A
Application number: JP2018244069A
Authority: JP
Inventors: Yuri Fujiwara; Kentaro Yamauchi; Kazuki Harada; Yoshinobu Kawase; Jumpei Yabuki
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2020-07-09
Also published as: JP2024050544A

Abstract

To provide an environment control system capable of effectively awakening a user. SOLUTION: An environment control system 10 includes: a first acquisition unit 110 acquiring input information including information on a user's physiological index; a determination unit 120 determining, from the input information and according to a control content determination rule, control contents for awakening the user, the control contents being those of environment control equipment 400 including light output equipment; a control unit 130 executing control of the environment control equipment 400 on the basis of the determined control contents; a second acquisition unit 150 acquiring evaluation information showing the user's evaluation of the executed control; and an update unit 180 updating the control content determination rule by machine learning using a value based on the evaluation information as a reward. SELECTED DRAWING: Figure 1

Description

The present invention relates to an environment control system and an environment control method for determining the control content of an environment control device for awakening a user.

Conventionally, various techniques for awakening a user have been proposed. Patent Document 1 discloses a wake-up device that notifies a sleeping person that it is time to get up, based on the sleep state of the sleeping person.

Patent Document 1: JP 2014-023571 A

An environment control system that controls the environment in order to awaken a user is desired to be able to awaken the user effectively.

Therefore, the present invention provides an environment control system and an environment control method that can effectively awaken a user.

An environment control system according to one aspect of the present invention includes: a first acquisition unit that acquires input information including physiological index information of a user; a determination unit that determines, from the input information and according to a control content determination rule, control content for awakening the user, the control content being that of an environment control device including a device that outputs light; a control unit that executes control of the environment control device based on the determined control content; a second acquisition unit that acquires evaluation information indicating the user's evaluation of the executed control; and an update unit that updates the control content determination rule by machine learning using a value based on the evaluation information as a reward.

An environment control method according to one aspect of the present invention includes: a first acquisition step of acquiring input information including physiological index information of a user; a determination step of determining, from the input information and according to a control content determination rule, control content for awakening the user, the control content being that of an environment control device including a device that outputs light; a control step of executing control of the environment control device based on the determined control content; a second acquisition step of acquiring evaluation information indicating the user's evaluation of the executed control; and an update step of updating the control content determination rule by machine learning using a value based on the evaluation information as a reward.

Note that these general or specific aspects may be implemented as a system, a method, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or as any combination of a system, an integrated circuit, a computer program, and a recording medium.

The environment control system and the environment control method according to one aspect of the present invention can effectively awaken a user.

FIG. 1 is a block diagram showing the functional configuration of the environment control system according to the first embodiment.
FIG. 2 is a diagram for explaining the outline of awakening control.
FIG. 3 is a diagram showing a plurality of types of lighting devices.
FIG. 4 is a flowchart of the operation of the control device according to the first embodiment during awakening control.
FIG. 5 is a diagram showing information that can be used as input information.
FIG. 6 is a diagram for explaining control parameters for awakening control.
FIG. 7 is a chromaticity diagram for explaining the change of the emission color in awakening control.
FIG. 8 is a flowchart of the reward calculation operation of the control device according to the first embodiment.
FIG. 9 is a diagram showing items that can be used to calculate a reward.
FIG. 10 is a diagram showing the relationship between the items for determining an individual reward and the physiological index information that can be used to determine that individual reward.
FIG. 11 is a first diagram for explaining control parameters for sleep control.
FIG. 12 is a second diagram for explaining control parameters for sleep control.
FIG. 13 is a block diagram showing the functional configuration of the environment control system according to the second embodiment.

Hereinafter, embodiments will be described with reference to the drawings. Each of the embodiments described below shows a general or specific example. The numerical values, shapes, materials, constituent elements, arrangement positions and connection forms of the constituent elements, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Among the constituent elements in the following embodiments, those not recited in the independent claims are described as optional constituent elements.

Each drawing is a schematic diagram and is not necessarily drawn precisely. In the drawings, substantially identical configurations are given the same reference numerals, and duplicate descriptions may be omitted or simplified.

In this specification, numerical values and numerical ranges are not expressions with a strict meaning only; they also cover substantially equivalent ranges, for example, differences of about several percent.

(Embodiment 1)
[Overview]
Hereinafter, the environment control system according to the first embodiment will be described. FIG. 1 is a block diagram showing the functional configuration of the environment control system according to the first embodiment. The environment control system 10 according to the first embodiment is an awakening support system for awakening a user who is asleep. The environment control system 10 performs awakening control, which awakens the user by using a device that outputs light, such as a lighting device. FIG. 2 is a diagram for explaining the outline of awakening control.

As shown in FIG. 2, the awakening control is, for example, control that gradually increases the brightness of the light emitted by the lighting device from the start time of the target period to the end time of the target period. This allows the environment control system 10 to awaken the user comfortably.

Here, the optimal values of the control content of the awakening control (the length of the target period, the maximum brightness, the amount of change in brightness, the brightness change function (that is, the shape of the brightness curve), and so on) are considered to differ depending on the physiological index information indicating the user's physical and mental state before or during sleep. Therefore, if the control content of the awakening control is fixed regardless of the user's physical and mental state, the effect of awakening the user comfortably may be reduced.

Therefore, the environment control system 10 has a learning device 100a constructed in advance by machine learning. When physiological index information is given to the learning device 100a as input information, the learning device 100a outputs the control content of the awakening control that is considered optimal. This allows the environment control system 10 to awaken the user effectively (that is, comfortably).

As shown in FIG. 2, the environment control system 10 can also perform sleep control for helping the user sleep comfortably during the sleep induction period or the sleep period. The control performed during the sleep induction period is also referred to as light fluctuation control. As described later, the control content of the sleep control is used as input information for causing the learning device 100a to output the control content of the awakening control.

Note that sleep here includes naps. A nap is a short sleep, for example, a sleep taken by interrupting work partway through, and is shallower than the long sleep that a person who is active during the day takes at night (hereinafter also referred to as main sleep).

As shown in FIG. 1, the environment control system 10 specifically includes a control device 100, a sensor 200, an input device 300, and an environment control device 400. Each of these devices is described in detail below.

[Sensor]
The sensor 200 detects the user's physical and mental state and outputs physiological index information indicating the detected state to the control device 100. Such a sensor 200 is, for example, a heart rate monitor, a camera, a thermometer, an electroencephalograph, a saliva sensor, a perspiration sensor, a respiration sensor, a body movement sensor, or a blood flow sensor. Each sensor 200 that outputs such physiological index information to the control device 100 may be a contact sensor that touches the user or a non-contact sensor.

The sensor 200 may also include a sensor that detects the user's behavior and outputs, to the control device 100, behavior index information that directly or indirectly indicates the detected behavior. Such a sensor 200 is, for example, a user interface system that detects the user's typing speed or number of mouse clicks, or a pressure sensor provided on the chair in which the user sits in order to detect the user's absence from the seat (the frequency and number of times the user leaves the seat). The sensor 200 may be a voice recognition system that detects the number or state of the user's conversations, or a vital sensor (specifically, a sleep meter, an electroencephalograph, or the like) that detects the user's sleep duration, sleep quality, or wake-up time and bedtime. The user's sleep duration, sleep quality, or wake-up time and bedtime may be detected by running an application on a mobile terminal such as a smartphone, or by a dedicated wearable terminal.

The sensor 200 may also include a sensor that detects environmental information around the user and outputs the detected environmental information to the control device 100. Such a sensor 200 is, for example, a solar radiation sensor, a received light amount sensor, a temperature sensor, an odor sensor, a microphone, or a CO₂ concentration sensor. The number of sensors 200 included in the environment control system 10 is not particularly limited.

[Input device]
The input device 300 is a user interface device that accepts operations by which the user inputs information to the control device 100. Based on the user's operation, the input device 300 outputs, for example, behavior index information (medication history, eating and drinking history), subjective index information, schedule information, and user information to the control device 100.

In addition, based on the user's operation, the input device 300 outputs to the control device 100 the user's evaluation information on awakening control that is in progress or has been completed. The evaluation information is used to calculate the reward in machine learning.

The input device 300 is, for example, a mobile terminal such as a smartphone or a tablet terminal, but may be a wearable device such as a smartwatch. The input device 300 may also be a microphone, a mechanical push button, a keyboard, a mouse, or the like. The number of input devices 300 included in the environment control system 10 is not particularly limited.

[Environment control device]
The environment control device 400 is a device for controlling the environment around the user (the light environment, air environment, temperature environment, or the like) and is controlled by the control device 100 in the awakening control. Specifically, the environment control device 400 is a device that outputs light, such as a lighting device (that is, a device that stimulates the user with light).

The environment control system 10 may include a plurality of types of lighting devices as the environment control device 400. FIG. 3 is a diagram showing a plurality of types of lighting devices. As shown in FIG. 3, the plurality of types of lighting devices include, for example, lighting devices that provide direct lighting such as downlights, lighting devices that provide indirect lighting such as cove lighting and cornice lighting, and upper awakening lighting and lower awakening lighting. In the awakening control, these lighting devices may be controlled individually, or may be grouped and controlled group by group.

The environment control device 400 may also include devices other than lighting devices that control the environment in order to awaken the user. In addition to light, the environment control device 400 may include devices that stimulate the user with images, sound, scent, vibration, temperature and humidity, airflow, tactile sensation, and the like. Specifically, the environment control device 400 may further include an air conditioner, an air purifier, a ventilation fan, an electric fan, floor heating, or the like. The environment control device 400 may include a device that opens and closes light-shielding equipment (for example, blinds and curtains) covering a window. The environment control device 400 may include a video device, an audio device, or an audiovisual device. The environment control device 400 may include an aroma diffuser. The environment control device 400 may also include a massager.

[Control device]
Next, the control device 100 will be described. The control device 100 acquires physiological index information as input information and, based on the acquired input information, determines the control content of the environment control device 400 in the awakening control. The control device 100 also outputs, to the environment control device 400, a control signal for controlling the environment control device 400 according to the determined control content. The control device 100 is implemented by, for example, a microcomputer, but may be implemented by a processor or the like.

Specifically, the control device 100 includes a first acquisition unit 110, a determination unit 120, a control unit 130, a second acquisition unit 150, a reward calculation unit 160, a reward condition setting unit 170, an update unit 180, and a storage unit 190. Of these constituent elements, those other than the control unit 130 constitute the learning device 100a. The reward calculation unit 160, the reward condition setting unit 170, the update unit 180, and the storage unit 190 constitute the learning unit 100b.

The first acquisition unit 110 acquires input information including the user's physiological index information.

The determination unit 120 determines, from the input information and according to the control content determination rule, the control content for awakening the user. Specifically, the determination unit 120 determines the control content of the environment control device 400 that is controlled in the awakening control for awakening the user. The control content determination rule is stored in the storage unit 190.

The control unit 130 controls the environment control device 400 based on the control content determined by the determination unit 120. Specifically, the control unit 130 outputs a control signal corresponding to the control content to the environment control device 400.

The second acquisition unit 150 acquires evaluation information indicating the user's evaluation of the awakening control. The evaluation information includes information indicating an evaluation of the control executed by the control unit 130, for example, information input by the user via the input device 300 after the awakening control. The second acquisition unit 150 acquires the information output by the input device 300 as the evaluation information, but may also acquire the information output by the sensor 200 as the evaluation information.

The reward calculation unit 160 calculates a reward based on the evaluation information acquired by the second acquisition unit 150. Details of the reward calculation process are described later.

The reward condition setting unit 170 sets the conditions under which the reward calculation unit 160 calculates the reward. As described later, a condition is, for example, a weighting coefficient. The conditions may be stored in the storage unit 190 in advance, and when the conditions are fixed, the reward condition setting unit 170 may be omitted.
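As a rough illustration of how weighting coefficients could act as reward conditions, the sketch below combines several individual rewards into one scalar by a weighted sum. The weighted-sum form, the function name `weighted_reward`, and the item names `comfort` and `wake_latency` are assumptions made for this example only; the actual reward calculation is the one described later in the specification.

```python
from typing import Mapping


def weighted_reward(individual_rewards: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Combine individual rewards into one scalar reward.

    Assumption: the reward is a plain weighted sum. Items without an
    explicit weighting coefficient default to a weight of 1.0.
    """
    return sum(weights.get(name, 1.0) * value
               for name, value in individual_rewards.items())


# Hypothetical items and weights, for illustration only.
rewards = {"comfort": 1.0, "wake_latency": 0.5}
weights = {"comfort": 2.0, "wake_latency": 1.0}
total = weighted_reward(rewards, weights)
```

Keeping the weights in a separate mapping mirrors the split between the reward calculation unit 160 and the reward condition setting unit 170: the conditions can be changed without touching the calculation itself.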

The update unit 180 updates the control content determination rule by machine learning that uses, as a reward, a value based on the evaluation information acquired by the second acquisition unit 150.

The storage unit 190 is a storage device that stores the control content determination rule, usage history information of the environment control system 10, the user's schedule information output by the input device 300, and the like. The storage unit 190 is implemented by, for example, a semiconductor memory.

[Operation during awakening control]
Next, the operation of the control device 100 during awakening control will be described. FIG. 4 is a flowchart of the operation of the control device 100 during awakening control.

First, the first acquisition unit 110 acquires input information including the user's physiological index information (S110). FIG. 5 is a diagram showing physiological index information that can be used as input information. As shown in FIG. 5, the physiological index information includes heartbeat (pulse wave), blinking and line of sight, eye movement, pupil fluctuation, skin temperature (peripheral, nose, forehead), facial expression (emotion), brain waves, saliva, head movement, sweating (normal sweating, mental sweating), respiration, body movement, blood flow (brain and peripheral), and the like. The heartbeat used as physiological index information includes not only the heart rate but also the frequency components of heart rate variability, such as LF, HF, and HF/LF. The physiological index information may include information indicating how these items change over time (time information).

The first acquisition unit 110 acquires at least one of these pieces of physiological index information as input information. The first acquisition unit 110 acquires the physiological index information from, for example, the sensor 200, but may acquire it from the input device 300 or may acquire physiological index information stored in the storage unit 190.

Next, according to the control content determination rule stored in the storage unit 190, the determination unit 120 determines, from the input information, the control content of the awakening control for awakening the user, that is, the control content of the environment control device 400 including a device that outputs light (S120).

The control content determination rule is represented by, for example, an action value function that determines the value of a control content. The action value function is one example of a value function. For example, the determination unit 120 uses the action value function to determine, as the control content for the input information, the control content estimated from that input information to yield the highest reward (for example, the control content that obtains the maximum reward for the awakening effect).
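The selection step described above can be sketched as a lookup over a tabular action value function. This is a minimal sketch under stated assumptions: the table representation, the discretized state label `"high_heart_rate"`, the candidate control contents, and the name `select_control_content` are all hypothetical and do not come from the specification, which does not fix how the action value function is stored.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical representation: Q[(state, control_content)] -> estimated value.
QTable = Dict[Tuple[Hashable, Hashable], float]


def select_control_content(q: QTable,
                           state: Hashable,
                           candidates: Sequence[Hashable]) -> Hashable:
    """Return the candidate control content with the highest estimated value.

    Unseen (state, control content) pairs are treated as having value 0.0.
    On ties, the first candidate in the sequence is returned.
    """
    return max(candidates, key=lambda content: q.get((state, content), 0.0))


# Control contents here are (target period in minutes, maximum brightness in %).
candidates = [(10, 50), (20, 100), (30, 80)]
q = {
    ("high_heart_rate", (10, 50)): 0.2,
    ("high_heart_rate", (20, 100)): 0.9,
}
best = select_control_content(q, "high_heart_rate", candidates)
```

A purely greedy choice like this never tries alternatives; a practical learner would usually mix in some exploration, which the sketch leaves out for brevity.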

The control content that yields the highest reward may differ from user to user. Therefore, when the environment control system 10 is shared by a plurality of users, the determination unit 120 may calculate each user's reward for a control content according to a control content determination rule that differs for each user.

The control content of the awakening control includes control parameters such as those shown in FIG. 6. FIG. 6 is a diagram for explaining control parameters for awakening control. The vertical axis of FIG. 6 indicates the brightness of the light emitted by the lighting device, and the horizontal axis of FIG. 6 indicates time.

The awakening control is control that gradually increases the brightness of the light emitted by the lighting device from the start time of the target period toward the end time of the target period and then holds it constant. This allows the environment control system 10 to awaken the user comfortably. Examples of the control parameters determined by the determination unit 120 in this case are (a) the length of the target period, (b) the maximum brightness, (c) the time required to reach the maximum brightness, (d) the time for which the maximum brightness is maintained, and (e) the shape of the brightness curve up to the maximum brightness. Curve shapes include linear, upward convex, and downward convex shapes.
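The parameters (a) to (e) can be read as defining a brightness profile over time. The sketch below is one possible interpretation, not the patented implementation: it assumes the ramp starts at the beginning of the target period, so the hold time (d) is simply the period minus the rise time, and it uses quadratic curves as stand-ins for the upward convex and downward convex shapes. The function name and the shape labels are hypothetical.

```python
def brightness_at(t: float, period: float, max_brightness: float,
                  rise_time: float, shape: str = "linear") -> float:
    """Brightness at time t (same unit as period) after the target period starts.

    period         (a) length of the target period
    max_brightness (b) maximum brightness
    rise_time      (c) time required to reach the maximum brightness
    hold time      (d) is derived as period - rise_time (assumption)
    shape          (e) "linear", "convex_up", or "convex_down"
    """
    if not 0 < rise_time <= period:
        raise ValueError("rise_time must be positive and not exceed period")
    if t <= 0:
        return 0.0
    if t >= rise_time:
        # Hold phase: brightness stays at the maximum (also after the period).
        return float(max_brightness)
    x = t / rise_time  # normalized progress through the ramp, 0 < x < 1
    if shape == "linear":
        y = x
    elif shape == "convex_up":
        y = 1.0 - (1.0 - x) ** 2  # rises quickly, then levels off
    elif shape == "convex_down":
        y = x ** 2  # rises slowly, then accelerates
    else:
        raise ValueError(f"unknown shape: {shape}")
    return max_brightness * y


# Example: 30-minute target period, 100 % maximum, 20-minute ramp.
profile = [brightness_at(t, 30, 100, 20, "convex_down") for t in range(0, 31, 5)]
```

Because the profile is a pure function of time, a controller could sample it at whatever interval the lighting device accepts dimming commands.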

In the awakening control, the emission color (the chromaticity of the light emitted by the lighting device) may be changed instead of, or in addition to, the brightness. When the emission color is changed, the vertical axis of FIG. 6 is read as the emission color. FIG. 7 is a chromaticity diagram for explaining the change of the emission color in awakening control.

For example, when the chromaticity is changed from point b to point a on the chromaticity diagram shown in FIG. 7, examples of the control parameters determined by the determination unit 120 are (a) the length of the target period, (b) the chromaticity of point a, (c) the time required for the chromaticity to go from point b to point a, (d) the time for which the chromaticity of point a is maintained, and (e) the shape of the chromaticity curve up to the chromaticity of point a. Curve shapes include linear, upward convex, and downward convex shapes.
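One way to picture the chromaticity change is as interpolation between two points on the chromaticity diagram. The sketch below assumes the points are given as (x, y) chromaticity coordinates and that the path from point b to point a is a straight line in that plane; neither the coordinate values nor the straight path is stated in the specification, and the function name is hypothetical.

```python
from typing import Tuple

Chromaticity = Tuple[float, float]  # assumed (x, y) chromaticity coordinates


def chromaticity_at(t: float, rise_time: float,
                    point_b: Chromaticity, point_a: Chromaticity,
                    shape: str = "linear") -> Chromaticity:
    """Chromaticity at time t while moving from point b (start) to point a."""
    if rise_time <= 0:
        raise ValueError("rise_time must be positive")
    x = min(max(t / rise_time, 0.0), 1.0)  # clamp progress to [0, 1]
    if shape == "linear":
        p = x
    elif shape == "convex_up":
        p = 1.0 - (1.0 - x) ** 2
    elif shape == "convex_down":
        p = x ** 2
    else:
        raise ValueError(f"unknown shape: {shape}")
    # Blend the two points; p = 0 gives point b and p = 1 gives point a exactly.
    return ((1.0 - p) * point_b[0] + p * point_a[0],
            (1.0 - p) * point_b[1] + p * point_a[1])


# Illustrative coordinates only: a warm start point and a cooler end point.
start_b = (0.45, 0.41)
end_a = (0.31, 0.33)
halfway = chromaticity_at(10, 20, start_b, end_a)
```

After `rise_time` the function keeps returning point a, which corresponds to parameter (d), the time for which the chromaticity of point a is maintained.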

In the awakening control, an air conditioner may be controlled in addition to the lighting device. Awakening control that targets the air conditioner is control that gradually increases the strength of the airflow produced by the air conditioner from the start time of the target period toward the end time of the target period and then holds it constant. In other words, the vertical axis of FIG. 6 is read as the strength of the airflow.

In this case, examples of the control parameters determined by the determination unit 120 are (a) the length of the target period, (b) the maximum airflow, (c) the time required to reach the maximum airflow, (d) the time for which the maximum airflow is maintained, and (e) the shape of the airflow strength curve up to the maximum airflow. Curve shapes include linear, upward convex, and downward convex shapes.

After step S120, the control unit 130 executes control of the environment control device 400 (that is, the awakening control) based on the determined control content (S130). Specifically, the control unit 130 outputs a control signal corresponding to the control content to the environment control device 400.

Next, the second acquisition unit 150 acquires evaluation information indicating the user's evaluation of the awakening control executed in step S130 (S140). The second acquisition unit 150 acquires information output by the input device 300 as the evaluation information, but may instead acquire information output by the sensor 200 as the evaluation information.

Next, the reward calculation unit 160 calculates a reward based on the evaluation information acquired in step S140 (S150). Details of the reward calculation operation are described later. Note that the evaluation information in step S140 is acquired after the awakening control, but may be acquired during the awakening control.

Next, the update unit 180 updates the control content determination rule by machine learning using the reward calculated by the reward calculation unit 160 (S160). Through reinforcement learning based on that reward, the update unit 180 learns to determine control content adapted to the user (that is, the control content that yields the largest reward for that user). As described above, in Embodiment 1 the update unit 180 updates the control content determination rule by updating the action value function.

A method of updating the action value function is described below. Q-learning and TD learning are known as typical reinforcement learning methods; Q-learning is used as the example here. Q-learning is a method of learning the value Q(s, a) of selecting control content a under the user's state s indicated by the input information. In a given state s, the control content a with the highest value Q(s, a) is selected as the optimal control content. The learning device 100a (update unit 180) selects various control contents a under a given state s, and a reward is given for the control content a selected at that time. In this way, the learning device 100a learns to select better control content, that is, it learns the correct value Q(s, a). The update formula for the value Q(s, a) can be expressed, for example, by Formula 1.

Here, s_t represents the state at time t, and a_t represents the control content at time t. The control content a_t changes the state to s_{t+1}. r_{t+1} represents the reward obtained through that change of state. The term with max is the Q value obtained when the control content a_{t+1} with the highest Q value known at that time is selected under the state s_{t+1}, multiplied by γ. Here, γ is a parameter in the range 0 < γ ≤ 1 and is called the discount rate. α is the learning coefficient and is in the range 0 < α ≤ 1.
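Formula 1 itself is not reproduced in this text. For reference, the standard Q-learning update, which is consistent with the symbols defined above (a max term multiplied by γ, and a learning coefficient α), can be written as follows. This is the textbook form and is given here as an assumption about what Formula 1 shows.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
```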

The above method is one example of a reinforcement learning method. Any existing method may be used for the reinforcement learning, such as a method using a neural network or a method combining reinforcement learning with deep learning.
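As a minimal sketch of the tabular Q-learning case described above, the following class stores Q(s, a) in a dictionary and applies the standard update. The class name `QTable`, the epsilon-greedy exploration used to "select various control contents", and the state and action labels in the usage note are illustrative assumptions and do not come from the source.

```python
import random
from collections import defaultdict


class QTable:
    """Tabular action value function Q(s, a) (hypothetical helper)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)  # candidate control contents
        self.alpha = alpha            # learning coefficient, 0 < alpha <= 1
        self.gamma = gamma            # discount rate, 0 < gamma <= 1
        self.epsilon = epsilon        # exploration rate (assumed, not in the source)
        self.q = defaultdict(float)   # unseen (state, action) pairs start at 0.0
        self.rng = random.Random(seed)

    def best_action(self, state):
        """Return the control content with the highest Q value in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose(self, state):
        """Epsilon-greedy choice: mostly exploit, occasionally explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state):
        """Apply one Q-learning update for the observed transition."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

For instance, with `alpha=0.5` and an empty table, `update("drowsy", "fast_ramp", 1.0, "awake")` moves Q("drowsy", "fast_ramp") from 0.0 to 0.5. A real system would need to discretize the input information into states, which this sketch leaves open.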

[Reward calculation operation]
Next, details of the reward calculation operation in step S150 are described. FIG. 8 is a flowchart of the reward calculation operation.

First, the reward calculation unit 160 determines the individual reward Fa for the feeling of comfort (S210). The reward calculation unit 160 determines the individual reward Fa using, for example, subjective index information that is output by the input device 300 and indicates the user's subjective evaluation result as the evaluation information, but may instead determine the individual reward Fa using the user's heart rate, detected by a heart rate monitor used as the sensor 200, as the evaluation information. In this case, the higher the feeling of comfort indicated by the subjective index information or the heart rate, the larger the value of the individual reward Fa.

Next, the reward calculation unit 160 determines the individual reward Fb for the feeling of refreshment (S220). The reward calculation unit 160 determines the individual reward Fb using, for example, the user's subjective index information output by the input device 300 as the evaluation information. In this case, the higher the feeling of refreshment indicated by the subjective index information, the larger the value of the individual reward Fb.

Next, the reward calculation unit 160 determines the individual reward Fc for the user's motivation (S230). The reward calculation unit 160 determines the individual reward Fc using, for example, the user's subjective index information output by the input device 300 as the evaluation information, but may instead determine the individual reward Fc using the user's typing speed, detected by a user interface system used as the sensor 200, as the evaluation information. In this case, the greater the improvement in motivation indicated by the subjective index information, the larger the value of the individual reward Fc, and the faster the typing speed, the larger the value of the individual reward Fc.

Next, the reward calculation unit 160 determines the individual reward Fd for the drowsiness level (S240). The reward calculation unit 160 determines the individual reward Fd using, for example, the user's subjective index information output by the input device 300 as the evaluation information, but may instead determine the individual reward Fd using the number of times the user blinks in a predetermined period, detected by a camera used as the sensor 200, as the evaluation information. In this case, the greater the reduction in the drowsiness level indicated by the subjective index information, the larger the value of the individual reward Fd, and the fewer the blinks after awakening compared with the blinks before falling asleep, the larger the value of the individual reward Fd.

Next, the reward calculation unit 160 determines the individual reward Fe for the improvement in the concentration level after awakening (S250). The reward calculation unit 160 determines the individual reward Fe using, for example, the user's subjective index information output by the input device 300 as the evaluation information, but may instead determine the individual reward Fe using the user's typing speed, detected by a user interface system used as the sensor 200, as the evaluation information. In this case, the greater the improvement in the concentration level indicated by the subjective index information, the larger the value of the individual reward Fe, and the faster the typing speed, the larger the value of the individual reward Fe.

Then, the reward calculation unit 160 determines the reward F for the control content based on the individual rewards Fa to Fe determined in steps S210 to S250 (S260). Here, the reward calculation unit 160 may calculate the reward F as a weighted sum of the individual rewards Fa to Fe. For example, the reward calculation unit 160 may calculate the reward F based on Formula 2.

F = w1 × Fa + w2 × Fb + w3 × Fc + w4 × Fd + w5 × Fe ... (Formula 2)

w1 to w5 are the weights of the respective items, set by the reward condition setting unit 170, and are an example of the reward conditions. In other words, the reward condition setting unit 170 sets the weights w1 to w5 that the reward calculation unit 160 uses in the weighted sum.
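The weighted sum of the individual rewards can be sketched as below. The function name `total_reward`, the dictionary keys, and the numeric values in the usage note are illustrative assumptions. The source defines only the symbols Fa to Fe and w1 to w5.

```python
def total_reward(individual, weights):
    """Return the weighted sum of individual rewards (Fa..Fe with w1..w5).

    individual: mapping from item name to its individual reward.
    weights:    mapping from item name to the weight set for that item.
    """
    missing = set(individual) - set(weights)
    if missing:
        raise KeyError(f"no weight set for: {sorted(missing)}")
    return sum(weights[item] * value for item, value in individual.items())
```

For example, with equal weights of 0.2 and individual rewards `{"comfort": 1.0, "refresh": 0.5, "motivation": 0.0, "drowsiness": 1.0, "concentration": 0.5}`, the result is approximately 0.6. Changing the weights is how the reward conditions would shift which items dominate the reward.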

Note that the reward condition setting unit 170 may change the conditions (for example, the weights) according to at least one of the weather, the season, and the time of day when the user awakens. For example, when the weather, season, and time of day are included in the input information acquired by the first acquisition unit 110, the reward condition setting unit 170 can acquire information on the weather, season, and time of day from the first acquisition unit 110. The reward condition setting unit 170 may also acquire information on the season and time of day from a general-purpose timer IC (timer circuit) that measures the current time, a real-time clock IC, or the like provided in the control device 100.

The reward calculation operation has been described above, but it is only an example. For instance, the items used to determine individual rewards (feeling of comfort, feeling of refreshment, motivation, drowsiness level, and concentration level) are examples, and the reward need only be calculated by determining an individual reward for at least one item. Individual rewards may also be determined for other items in calculating the reward. FIG. 9 is a diagram showing items that can be used to calculate the reward.

In the reward calculation operation described above, information indicating the subjective evaluation result, behavior index information (typing speed), or physiological index information (heart rate and blinks) was used as the evaluation information to determine the individual rewards. Other physiological index information may also be used to determine the individual rewards in the reward calculation operation. FIG. 10 is a diagram showing the relationship between the items for determining individual rewards and the physiological index information that can be used to determine those individual rewards.

As shown in FIG. 10, the items for determining individual rewards include items indicating the state of the user during the awakening control and items indicating the state of the user after awakening. The items for determining individual rewards also include items defined based on the change in physiological index information between before falling asleep and after awakening. In this way, an individual reward may be determined in absolute terms based on the state of the user during the awakening control or after awakening, or in relative terms based on the change in physiological index information between before falling asleep and after awakening.
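The distinction between absolute and relative individual rewards can be illustrated with the sketch below. The function names, the normalization to fixed ranges, and the clipping are all assumptions made for illustration. The source does not state how the individual rewards are scaled.

```python
def absolute_reward(value, worst, best):
    """Score a single measurement on a 0..1 scale between worst and best."""
    if best == worst:
        raise ValueError("best and worst must differ")
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))  # clip to the assumed 0..1 range


def relative_reward(before, after, lower_is_better=True):
    """Score the change between a pre-sleep and a post-awakening measurement.

    Returns the fractional improvement, clipped to -1..1. With
    lower_is_better=True (for example, blink count), a decrease is rewarded.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    change = (before - after) / before
    if not lower_is_better:
        change = -change
    return max(-1.0, min(1.0, change))
```

For example, `relative_reward(20, 15)` gives 0.25 for a blink count that fell from 20 to 15 over the same measurement window, while `absolute_reward(4, 1, 5)` gives 0.75 for a subjective rating of 4 on a 1 to 5 scale.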

[Other input information 1]
The input information is not limited to physiological index information. Other input information is described below with reference to FIG. 5.

The first acquisition unit 110 may acquire behavior index information indicating the user's behavior as input information. The behavior index information includes the typing speed at which the user operates a keyboard; sleep duration, sleep quality, wake-up time, and bedtime; the number of mouse clicks; the frequency and number of times the user leaves the seat; the number and state of conversations; break time; medication history; eating and drinking history; and the like. The behavior index information may include information indicating how these items change over time (time information). The behavior index information is acquired from the sensor 200, for example, but may be acquired from the input device 300.

The first acquisition unit 110 may also acquire subjective index information as input information. The subjective index information includes drowsiness, motivation, physical condition and physical fatigue, stress, concentration, performance, degree of tension or relaxation, degree of irritation, anger, sadness, and the like. The subjective index information may include information indicating how these items change over time (time information). The subjective index information is acquired from the input device 300, for example.

The first acquisition unit 110 may also acquire schedule information indicating the user's schedule as input information. The schedule information includes activity plans (such as meetings) for the day of use, the preceding day, and the following day; the work state (busy, away from the seat, in a meeting, and so on); the season to which the day of use belongs; the time of day of use; and the like. The schedule information is acquired from the input device 300, for example.

The first acquisition unit 110 may also acquire, as input information, usage history information on the user's use of the environment control system 10 (awakening control). The usage history information includes the duration and time of use, the day of the week, timing, input information, output information, rewards, and the like. The usage history information is acquired from the storage unit 190, for example. In other words, the usage history information is stored in the storage unit 190.

The first acquisition unit 110 may also acquire environment information on the user's surroundings as input information. The environment information includes the weather (amount of solar radiation), the amount of received light (that is, light environment information), the season and time of day, the ambient temperature, humidity, odor, the sound environment, the CO2 concentration, and the like.

The first acquisition unit 110 may also acquire user information as input information. The user information includes sex, age, race, place of origin, occupation, constitution (such as photosensitivity), medical history (including insomnia and the like), and so on. The user information is acquired from the input device 300, for example.

[Other input information 2]
The first acquisition unit 110 may acquire control parameters of sleep control as input information. Specifically, the first acquisition unit 110 may acquire, as input information, the control parameters of the sleep control that was performed in the sleep induction period (or the sleep period) preceding (for example, immediately preceding) the awakening control about to be performed. The control parameters of sleep control are described below. FIGS. 11 and 12 are diagrams for explaining the control parameters of sleep control. The vertical axes in FIGS. 11 and 12 indicate the brightness of the light emitted by the lighting device, and the horizontal axes indicate time.

In the following, among the relaxation period, the sleep induction period, and the sleep period, the description focuses on the control content for the sleep induction period (in other words, the control content of light fluctuation control).

As shown in FIG. 11, the control parameters of sleep control include the length of the entire sleep induction period, the maximum brightness and the minimum brightness, the rise time required to raise the brightness, the time for which the maximum brightness is maintained, the fall time required to lower the brightness, the time for which the minimum brightness is maintained, the cycle, a rise curve indicating how the brightness changes when it is raised (for example, the slope), and a fall curve indicating how the brightness changes when it is lowered (for example, the slope). The maximum brightness and the minimum brightness mean the maximum and minimum values of the brightness when the brightness is changed periodically.

At least one of the maximum brightness, the minimum brightness, the various times, and the various curves may change over time. FIG. 12 shows an example in which the maximum brightness changes over time.

As shown in FIG. 12, the control parameters of sleep control may further include a change start time at which the maximum brightness starts to change, a change end period at which the change in the maximum brightness ends, a change curve indicating how the maximum brightness changes, and a change target value indicating the maximum brightness after the change.
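The periodic brightness fluctuation and the time-varying maximum brightness described above can be sketched as follows. The function names, the linear rise, fall, and change curves, and the assumption that each cycle starts at the minimum brightness are illustrative choices. The source lists the parameters but does not fix these details.

```python
def max_brightness_at(t, initial_max, target_max, change_start, change_end):
    """Maximum brightness at time t, changing linearly toward the target value."""
    if t <= change_start:
        return initial_max
    if t >= change_end:
        return target_max
    frac = (t - change_start) / (change_end - change_start)
    return initial_max + (target_max - initial_max) * frac


def fluctuation_level(t, min_b, max_b, rise, hold_max, fall, hold_min):
    """Brightness at time t for one repeating rise/hold/fall/hold cycle."""
    period = rise + hold_max + fall + hold_min
    if period <= 0:
        raise ValueError("the cycle must have a positive length")
    phase = t % period  # position within the current cycle
    if phase < rise:
        return min_b + (max_b - min_b) * phase / rise  # rising segment
    phase -= rise
    if phase < hold_max:
        return max_b                                   # hold at maximum
    phase -= hold_max
    if phase < fall:
        return max_b - (max_b - min_b) * phase / fall  # falling segment
    return min_b                                       # hold at minimum
```

To model the FIG. 12 case, the value returned by `max_brightness_at` can be passed as `max_b` at each time step, so that the peaks of successive cycles move toward the change target value.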

In sleep control, the emission color (the chromaticity of the light emitted by the lighting device) may be changed instead of, or in addition to, the brightness. When the emission color is changed, the control parameters of sleep control may include control parameters for the emission color in addition to the control parameters for the brightness described above. For example, when the emission color is changed from point b to point a on the chromaticity diagram shown in FIG. 7, the control parameters of sleep control include the chromaticity of point a and the chromaticity of point b, the time taken to reach the chromaticity of point a, the time for which the chromaticity of point a is maintained, the time taken to reach the chromaticity of point b, the time for which the chromaticity of point b is maintained, the cycle, the shape of the curve up to the chromaticity of point a, and the shape of the curve up to the chromaticity of point b.

In this case as well, at least one of the chromaticity of point a, the chromaticity of point b, the various periods, and the various curves may change over time. For example, when the chromaticity of point a changes over time, the control parameters of sleep control may include a change start time at which the chromaticity of point a starts to change, a change end period at which the change in the chromaticity of point a ends, a change curve indicating how the chromaticity of point a changes, and a change target value indicating the chromaticity of point a after the change.

[Modification 1]
As described above, the second acquisition unit 150 can acquire the evaluation information while the awakening control is being executed. The reward calculation unit 160 can then calculate the reward during the awakening control based on the acquired evaluation information, and the update unit 180 can update the control content determination rule (action value function) during the awakening control based on the calculated reward. As a result, the determination unit 120 can change the control content (that is, determine the control content again) while the awakening control is being executed.

When the control content is determined again during the awakening control in this way, the environment control system 10 can determine control content better suited to the user's state at that time, and can therefore awaken the user effectively.

[Modification 2]
The storage unit 190 may store control content determined in the past by the determination unit 120. This allows the environment control system 10 to use the control content stored in the storage unit 190 as a substitute when, for some reason, the input information cannot be acquired and the control content cannot be determined. The control content may be stored in the storage unit 190 as part of the usage history information described above, or on its own. For example, the storage unit 190 may store the control content in association with the input information used to determine it, or without such an association.
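The fallback behavior described in Modification 2 can be sketched as below. `ControlContentStore` and `decide_with_fallback` are hypothetical names, and representing missing input information as `None` is an assumption made for this illustration.

```python
class ControlContentStore:
    """In-memory stand-in for the stored control contents (hypothetical)."""

    def __init__(self):
        self._history = []  # list of (input_info, control_content) pairs

    def save(self, content, input_info=None):
        """Store control content, optionally with the input that produced it."""
        self._history.append((input_info, content))

    def latest(self):
        """Return the most recently stored control content, or None if empty."""
        return self._history[-1][1] if self._history else None


def decide_with_fallback(input_info, decide, store):
    """Determine control content, reusing stored content if input is missing."""
    if input_info is None:
        fallback = store.latest()
        if fallback is None:
            raise LookupError("no input information and no stored control content")
        return fallback
    content = decide(input_info)
    store.save(content, input_info)
    return content
```

Here the most recent stored content is reused. Because each entry keeps the input information alongside the content, a variant could instead pick the stored content whose input most resembles a partially available input.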

[Effects]
As described above, the environment control system 10 includes: a first acquisition unit 110 that acquires input information including the user's physiological index information; a determination unit 120 that determines, from the input information and in accordance with a control content determination rule, control content for awakening the user, the control content being that of the environment control device 400, which includes a device that outputs light; a control unit 130 that executes control of the environment control device 400 based on the determined control content; a second acquisition unit 150 that acquires evaluation information indicating the user's evaluation of the executed control; and an update unit 180 that updates the control content determination rule by machine learning that uses a value based on the evaluation information as a reward.

Such an environment control system 10 can determine the control content based on a control content determination rule learned in association with the physiological index information and the evaluation information. The environment control system 10 can therefore awaken the user effectively.

Also, for example, the control content determination rule includes a value function that defines the value of control content, and the update unit 180 updates the value function.

Such an environment control system 10 can determine the control content based on a value function learned in association with the physiological index information and the evaluation information.

Also, for example, the environment control system 10 further includes a reward calculation unit 160 that calculates the reward based on the evaluation information, and a reward condition setting unit 170 that sets the conditions under which the reward calculation unit 160 calculates the reward.

Such an environment control system 10 can calculate the reward according to the conditions set by the reward condition setting unit 170. For example, if the conditions are set according to the user's preferences, control content that matches those preferences is more likely to be determined.

Also, for example, the environment control system 10 further includes a storage unit 190 that stores the control content determined by the determination unit 120.

By reading out the control content stored in the storage unit 190 (for example, the previous control content), such an environment control system 10 can perform control for awakening the user even when the input information cannot be acquired and the control content cannot be determined.

Also, for example, the update unit 180 updates the control content determination rule based on the evaluation information acquired by the second acquisition unit 150 while the environment control device 400 is being controlled with the control content. The determination unit 120 further determines the control content for the input information again during the control, in accordance with the control content determination rule updated while the environment control device 400 is being controlled.

Such an environment control system 10 can update the control content determination rule and change the control content according to the state of the user during the control. In other words, by learning during the control, the environment control system 10 can determine control content better suited to the user's state at that time, and can therefore awaken the user even more effectively.

Also, for example, the input information further includes at least one of the user's behavior index information, the user's schedule information, usage history information on the user's use of the environment control system 10, and environment information.

Such an environment control system 10 can determine the control content based on a control content determination rule learned by associating at least one of the behavior index information, the schedule information, the usage history information, and the environment information with the evaluation information.

Also, for example, the control unit 130 can further execute sleep control for putting the user to sleep using the environment control device 400, and the input information further includes the control parameters of the sleep control.

Such an environment control system 10 can determine the control content based on a control content determination rule learned by associating the control parameters of the sleep control with the evaluation information.

Also, for example, the environment control device 400 includes another device besides the device that outputs light. The determination unit 120 determines the control content of the environment control device 400 including that other device.

Such an environment control system 10 can perform control for awakening the user using both the device that outputs light and the other device.

Also, for example, the other device is an air conditioner.

Such an environment control system 10 can perform control for awakening the user using both the device that outputs light and the air conditioner.

また、例えば、ユーザを覚醒させるための制御内容は、対象期間において行われる光を出力する機器によって出力される光の明るさを増加させる覚醒制御における、（ａ）対象期間の長さ、（ｂ）最大明るさ、（ｃ）最大明るさに到達するまでの所要時間、（ｄ）最大明るさが維持される時間、及び、（ｅ）最大明るさに到達するまでの明るさのカーブの形状の少なくとも１つを含む。 In addition, for example, the control content for waking up the user is (a) the length of the target period in the awakening control for increasing the brightness of the light output by the device that outputs light performed in the target period, (b) ) Maximum brightness, (c) time required to reach maximum brightness, (d) time when maximum brightness is maintained, and (e) shape of curve of brightness until reaching maximum brightness At least one of

Such an environment control system 10 can determine the control parameters of the awakening control as the control content.
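As an illustration only, parameters (a) to (e) could be combined into a brightness schedule as in the following Python sketch. The class name WakeLightPlan, the function brightness_at, the 0-to-1 dimming scale, the three curve names, and the choice to place the hold interval at the end of the target period are all assumptions made for this example; the embodiment does not prescribe them.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class WakeLightPlan:
    """Hypothetical container for parameters (a) to (e) of the awakening control."""
    period_s: float        # (a) length of the target period, in seconds
    max_brightness: float  # (b) maximum brightness, as a 0..1 dimming level
    ramp_s: float          # (c) time required to reach the maximum brightness
    hold_s: float          # (d) time for which the maximum brightness is held
    curve: str = "linear"  # (e) ramp shape: "linear", "ease_in", or "s_curve"

    def __post_init__(self):
        # Assumed constraint: ramp and hold must both fit inside the period.
        if self.ramp_s < 0 or self.hold_s < 0:
            raise ValueError("ramp_s and hold_s must be non-negative")
        if self.ramp_s + self.hold_s > self.period_s:
            raise ValueError("ramp_s + hold_s must not exceed period_s")


def brightness_at(plan: WakeLightPlan, t: float) -> float:
    """Return the dimming level t seconds after the target period starts."""
    if t < 0 or t > plan.period_s:
        return 0.0  # outside the target period the wake-up light is off
    # Assumption: the hold interval ends exactly when the target period ends.
    ramp_start = plan.period_s - plan.hold_s - plan.ramp_s
    if t < ramp_start:
        return 0.0
    if t >= ramp_start + plan.ramp_s:
        return plan.max_brightness
    x = (t - ramp_start) / plan.ramp_s  # ramp progress in [0, 1)
    if plan.curve == "ease_in":
        shaped = x * x
    elif plan.curve == "s_curve":
        shaped = 0.5 - 0.5 * math.cos(math.pi * x)
    else:
        shaped = x
    return plan.max_brightness * shaped
```

For example, WakeLightPlan(1800, 0.8, 600, 300) describes a 30-minute target period in which the light stays off for 15 minutes, ramps linearly to 80% over 10 minutes, and holds that level for the final 5 minutes.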

Further, an environment control method executed by a computer such as the environment control system 10 includes: a first acquisition step (S110) of acquiring input information including physiological index information of the user; a determination step (S120) of determining, from the input information and according to the control content determination rule, control content for awakening the user, the control content being control content of the environment control device 400 including the device that outputs light; a control step (S130) of executing control of the environment control device 400 based on the determined control content; a second acquisition step (S140) of acquiring evaluation information indicating the user's evaluation of the executed control; and an update step (S160) of updating the control content determination rule by machine learning that uses a value based on the evaluation information as a reward.
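The sequence of steps above can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: it assumes a tabular action value function, epsilon-greedy selection, an incremental update toward the observed reward, and a reward equal to the user's numeric rating. The names WakeController and run_once and the three callback parameters are hypothetical.

```python
import random
from collections import defaultdict


class WakeController:
    """Toy control content determination rule backed by an action value table."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, seed=0):
        self.actions = list(actions)   # candidate control contents
        self.alpha = alpha             # learning rate (assumed)
        self.epsilon = epsilon         # exploration probability (assumed)
        self.q = defaultdict(float)    # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def decide(self, state):
        """Determination step (S120): pick a control content for this state."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward):
        """Update step (S160): move the estimate toward the observed reward."""
        key = (state, action)
        self.q[key] += self.alpha * (reward - self.q[key])


def run_once(controller, read_inputs, apply_control, read_evaluation):
    """Run one pass through the method; the three callbacks are hypothetical."""
    state = read_inputs()                     # S110: must be hashable
    action = controller.decide(state)         # S120
    apply_control(action)                     # S130
    rating = read_evaluation()                # S140
    reward = float(rating)                    # assumed: reward equals the rating
    controller.update(state, action, reward)  # S160
    return action, reward
```

In a real system the state would be derived from the physiological index information (for example, a discretized sleep stage), and the reward would be computed from the evaluation information under whatever conditions the system has configured.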

(Embodiment 2)
[Configuration]
In Embodiment 2, another configuration of the environment control system is described with reference to FIG. 13. FIG. 13 is a block diagram showing the functional configuration of the environment control system according to Embodiment 2.

As shown in FIG. 13, the environment control system 10b according to Embodiment 2 includes a plurality of individual environment control systems 10c and a server device 500 communicably connected to each of the plurality of individual environment control systems 10c.

Each of the plurality of individual environment control systems 10c has, for example, the same configuration as the environment control system 10 of Embodiment 1, although FIG. 13 omits the components other than the first acquisition unit 110 and the storage unit 190. The number of individual environment control systems 10c included in the environment control system 10b is not particularly limited.

The server device 500 acquires, from each of the plurality of individual environment control systems 10c, at least one of the input information acquired by the first acquisition unit 110 and the learning results stored in the storage unit 190 (for example, the control content, the reward, and the updated action value function), and manages it centrally. The server device 500 stores the acquired information in the storage unit 510. This allows the learning results and other information of the individual environment control systems 10c to be shared.

Note that at least one of the control devices 100 included in the plurality of individual environment control systems 10c may function as the server device. That is, the environment control system 10b need not include the server device 500 separately from the individual environment control systems 10c. In this case, the plurality of individual environment control systems 10c are communicably connected to each other and exchange at least one of the input information, the evaluation information, and the learning results. The update unit 180 then updates the action value function based on at least one of the input information and the learning results acquired from another individual environment control system 10c.

[Effects]
As described above, the environment control system 10b includes a plurality of individual environment control systems 10c, each having the first acquisition unit 110, the determination unit 120, the control unit 130, the second acquisition unit 150, and the update unit 180. The plurality of individual environment control systems 10c are communicably connected to each other and exchange at least one of the input information and the learning results. The update unit 180 then updates the control content determination rule based on at least one of the input information and the learning results acquired from another individual environment control system 10c.

In such an environment control system 10b, the update unit 180 can update the action value function of its own device based on the input information and the like acquired by another individual environment control system 10c. This improves the learning accuracy of the own device and yields a more appropriate action value function.
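The embodiment does not specify how learning results received from another individual environment control system are folded into the local action value function. One possible approach, shown here purely as an assumption, is a count-weighted average of two action value tables. The function name merge_action_values and the per-entry observation counts are hypothetical and are not part of the described system.

```python
def merge_action_values(own, own_counts, peer, peer_counts):
    """Merge a peer's action value table into the local one.

    own / peer:               dict mapping (state, action) -> estimated value
    own_counts / peer_counts: dict mapping (state, action) -> number of
                              observations behind each estimate (assumed to
                              be tracked by each system)
    Returns new dictionaries; the inputs are left unmodified.
    """
    merged = dict(own)
    merged_counts = dict(own_counts)
    for key, peer_value in peer.items():
        n_own = merged_counts.get(key, 0)
        n_peer = peer_counts.get(key, 0)
        total = n_own + n_peer
        if total == 0:
            continue  # neither side has evidence for this entry
        own_value = merged.get(key, 0.0)
        # Weight each estimate by how many observations support it.
        merged[key] = (own_value * n_own + peer_value * n_peer) / total
        merged_counts[key] = total
    return merged, merged_counts
```

Weighting by observation count keeps a peer with little experience from overwriting a well-supported local estimate, while still letting the local system adopt values for states it has never encountered.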

Further, the environment control system 10b includes the plurality of individual environment control systems 10c and the server device 500 communicably connected to each of the plurality of individual environment control systems 10c.

Such an environment control system 10b can centrally manage at least one of the input information, the evaluation information, and the learning results. Further, when the server device 500 includes a learning unit, the action value function can be updated by machine learning based on the input information, the evaluation information, and the like acquired from each of the plurality of individual environment control systems 10c. This improves the learning accuracy and yields an even more appropriate action value function. In this case, the server device 500 preferably includes a processor or the like capable of faster processing than the control device 100.

(Other embodiments)
Although the embodiments have been described above, the present invention is not limited to these embodiments.

For example, in the above embodiments, the environment control system may determine a plurality of patterns of control content as patterns recommended to the user, and the user may select one of these patterns using an input device. In this case, the environment control system executes control according to the selected control content.
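A minimal Python sketch of this variation follows, assuming the control content determination rule is available as a table of estimated values keyed by (state, control content). The function name recommend_patterns and the table layout are illustrative assumptions, not details given in the embodiments.

```python
def recommend_patterns(values, state, k=3):
    """Return up to k control contents with the highest estimated value.

    values: dict mapping (state, control_content) -> estimated value
    state:  the current input-information state (hypothetical encoding)
    """
    candidates = [(content, value)
                  for (s, content), value in values.items()
                  if s == state]
    # Highest estimated value first; sort is stable, so ties keep table order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [content for content, _ in candidates[:k]]
```

The user would then pick one entry from the returned list through the input device, and the system would execute control with that entry.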

Further, the above embodiments describe an example in which a control parameter (that is, control content) of the sleep control is used as input information to determine the control content of the environment control device in the awakening control. Conversely, however, a control parameter (that is, control content) of the awakening control may be used as input information to determine the control content of the environment control device in the sleep control.

Further, in the above embodiments, the place where the awakening control is performed is not particularly limited. The awakening control is performed, for example, in a nap room of an office, but may also be performed in a house, a medical facility, a nursing care facility, or the like.

Further, in the above embodiments, the environment control system is realized by a plurality of devices, but it may be realized as a single device. When the environment control system is realized by a plurality of devices, the components described in the above embodiments may be distributed among the devices in any manner. The environment control system may also be realized as a client-server system.

Further, in the above embodiments, the method of communication between devices is not particularly limited. Communication between devices is, for example, wireless communication using a communication standard such as specified low-power radio, ZigBee (registered trademark), Bluetooth (registered trademark), or Wi-Fi (registered trademark), but may be wired communication. A relay device (not shown) may also be interposed in the communication between devices.

Further, in the above embodiments, processing executed by a specific processing unit may be executed by another processing unit. The order of a plurality of processes may be changed, and a plurality of processes may be executed in parallel.

Further, in the above embodiments, each component may be realized by executing a software program suited to that component. Each component may be realized by a program execution unit, such as a CPU or a processor, reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.

Further, each component may be realized by hardware. For example, each component may be a circuit (or an integrated circuit). These circuits may form a single circuit as a whole or may be separate circuits. Each of these circuits may be a general-purpose circuit or a dedicated circuit.

Further, general or specific aspects of the present invention may be realized as a system, a device, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium. For example, the present invention may be realized as the control device according to the above embodiments. The present invention may also be realized as a program for causing a computer to execute the environment control method according to the above embodiments, or as a computer-readable non-transitory recording medium on which such a program is recorded.

In addition, the present invention also includes forms obtained by applying various modifications conceivable by those skilled in the art to each embodiment, and forms realized by arbitrarily combining the components and functions of the embodiments without departing from the spirit of the present invention.

10, 10b Environment control system
10c Individual environment control system
110 First acquisition unit
120 Determination unit
130 Control unit
150 Second acquisition unit
160 Reward calculation unit
170 Reward condition setting unit
180 Update unit
190 Storage unit
400 Environment control device
500 Server device

Claims

An environment control system comprising:
a first acquisition unit that acquires input information including physiological index information of a user;
a determination unit that determines, from the input information and according to a control content determination rule, control content for awakening the user, the control content being control content of an environment control device including a device that outputs light;
a control unit that executes control of the environment control device based on the determined control content;
a second acquisition unit that acquires evaluation information indicating the user's evaluation of the executed control; and
an update unit that updates the control content determination rule by machine learning using a value based on the evaluation information as a reward.

The environment control system according to claim 1, wherein the control content determination rule includes a value function that determines the value of the control content, and
the update unit updates the value function.

The environment control system according to claim 1, further comprising:
a reward calculation unit that calculates the reward based on the evaluation information; and
a reward condition setting unit that sets a condition under which the reward calculation unit calculates the reward.

The environment control system according to claim 1, further comprising a storage unit that stores the control content determined by the determination unit.

The update unit updates the control content determination rule based on the evaluation information acquired by the second acquisition unit while controlling the environment control device with the control content,
The determination unit further determines the control content for the input information during the control again according to the control content determination rule updated while controlling the environment control device. Environmental control system.

The input information further includes at least one of behavior index information of the user, schedule information of the user, usage history information of the environment control system of the user, and environment information. The environment control system according to item 1.

The environment control system according to claim 1, wherein the control unit is further capable of executing sleep control for putting the user to sleep using the environment control device, and
the input information further includes a control parameter of the sleep control.

The environment control device further includes a device other than the device that outputs the light,
The environment control system according to claim 1, wherein the determination unit determines the control content of the environment control device including the other device.

The environment control system according to claim 8, wherein the other device is an air conditioner.

The environment control system according to claim 1, wherein the control content for awakening the user includes at least one of the following parameters of control that is performed in a target period and increases the brightness of the light output by the device that outputs the light: (a) the length of the target period, (b) the maximum brightness, (c) the time required to reach the maximum brightness, (d) the time for which the maximum brightness is maintained, and (e) the shape of the brightness curve up to the maximum brightness.

A plurality of individual environment control systems having the first acquisition unit, the determination unit, the control unit, the second acquisition unit, and the update unit,
Each of the plurality of individual environment control systems are communicably connected to each other, and mutually communicate at least one of the input information and the learning result,
The update unit updates the control content determination rule based on at least one of the input information and the learning result acquired from another individual environment control system. Environmental control system.

The environment control system according to claim 11, comprising:
the plurality of individual environment control systems; and
a server device communicably connected to each of the plurality of individual environment control systems.

An environment control method comprising:
a first acquisition step of acquiring input information including physiological index information of a user;
a determination step of determining, from the input information and according to a control content determination rule, control content for awakening the user, the control content being control content of an environment control device including a device that outputs light;
a control step of executing control of the environment control device based on the determined control content;
a second acquisition step of acquiring evaluation information indicating the user's evaluation of the executed control; and
an update step of updating the control content determination rule by machine learning using a value based on the evaluation information as a reward.