JP7309069B2

JP7309069B2 - Learning device and reasoning device for control of air conditioner

Info

Publication number: JP7309069B2
Application number: JP2022530391A
Authority: JP
Inventors: 洋志守安
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2020-06-09
Filing date: 2020-06-09
Publication date: 2023-07-14
Anticipated expiration: 2040-06-09
Also published as: WO2021250770A1; JPWO2021250770A1

Description

TECHNICAL FIELD

The present disclosure relates to a learning device and an inference device for controlling an air conditioner.

Methods for optimal operation control of air conditioning systems are known. For example, Patent Literature 1 describes a method of controlling the operation of an air conditioner by determining an energy consumption function from measurement data obtained while the air conditioning system is operating.

Patent Literature 1: JP 2006-207929 A

However, in the operation control method of Patent Literature 1, the temperature sensor that detects the room temperature is fixed at a predetermined location in the room. As a result, it may not be possible to bring the temperature at the location where the user is to the set temperature. This problem arises in particular when the airflow is disturbed by furniture or other fixtures in the room.

SUMMARY OF THE INVENTION

Therefore, an object of the present disclosure is to provide a learning device and an inference device for operation control of an air conditioner that can bring the temperature at the location where the user is to the set temperature.

A learning device for an air conditioner of the present disclosure includes: a data acquisition unit that acquires learning data including a state, which comprises the position of a user of the air conditioner and the difference between the detected temperature at the user's position and the set temperature of an indoor unit of the air conditioner, and the set air volume and the set air direction of the indoor unit in that state; and a model generation unit that uses the learning data to generate a trained model for inferring the set air volume and the set air direction of the indoor unit from the position of the user and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

An inference device for an air conditioner of the present disclosure includes: a data acquisition unit that acquires a state comprising the position of a user of the air conditioner and the difference between the temperature at the user's position and the set temperature of an indoor unit of the air conditioner; and an inference unit that infers the set air volume and the set air direction of the indoor unit from the state acquired by the data acquisition unit, using a trained model for inferring the set air volume and the set air direction of the indoor unit from the position of the user and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

According to the present disclosure, the temperature at the location where the user is can be brought to the set temperature.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the configuration of an air conditioning system according to an embodiment.
FIG. 2 is a diagram representing data input to or output from the learning device 10 and the inference device 30 of Embodiment 1.
FIG. 3 is a diagram representing an example of user positions.
FIG. 4 is a diagram showing the configuration of the learning device 10 of Embodiment 1.
FIG. 5 is a flowchart of the learning process of the learning device 10.
FIG. 6 is a diagram showing the configuration of the inference device 30.
FIG. 7 is a flowchart showing the procedure by which the inference device 30 infers the set air volume and the set air direction of the indoor unit.
FIG. 8 is a diagram representing data input to or output from the learning device 10 and the inference device 30 of Embodiment 2.
FIG. 9 is a diagram showing the configuration of the learning device 10 of Embodiment 3.
FIGS. 10(a) to 10(j) are diagrams showing examples of data obtained from the portable sensor 3 and the control device 2.
FIG. 11 is a diagram showing the user positions in FIGS. 10(a) to 10(j).
FIGS. 12(a) to 12(c) are diagrams for explaining a method of increasing data.
FIGS. 13(a) to 13(c) are diagrams for explaining a method of increasing data.
FIG. 14 is a diagram showing the hardware configuration of the learning device 10, the inference device 30, or the control device 2.

Embodiments will be described below with reference to the drawings.
Embodiment 1.
FIG. 1 is a diagram showing the configuration of an air conditioning system according to an embodiment.

The air conditioning system includes an air conditioner 1, a control device 2, a portable sensor 3, a learning device 10, a trained model storage unit 20, and an inference device 30.

The portable sensor 3 can be carried by the user. The portable sensor 3 can detect temperature. It can therefore detect both the user's position and the temperature at the user's position.

The learning device 10 generates a trained model that infers the set air volume and the set air direction of the indoor unit from the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

The trained model storage unit 20 stores the trained model generated by the learning device 10.

In accordance with the trained model stored in the trained model storage unit 20, the inference device 30 estimates, from the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit, the set air volume and the set air direction of the indoor unit of the air conditioner that appropriately bring the location where the user is to the set temperature.

The control device 2 controls the air conditioner 1 based on, among other things, the inference result of the inference device 30.
FIG. 2 is a diagram representing data input to or output from the learning device 10 and the inference device 30 of Embodiment 1.

B1 (action) is the set air volume and the set air direction of the indoor unit. B2 (state) is the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit. C (output) is the set air volume and the set air direction of the indoor unit. D (reward criterion) is the amount of change per unit time in the detected temperature at the user's position.

FIG. 3 is a diagram representing an example of user positions.
When the user carries the portable sensor 3, the portable sensor 3 can detect the user's position. Detecting the temperature at the user's position makes it possible to control the airflow at the location where the user is.

FIG. 4 is a diagram showing the configuration of the learning device 10 of Embodiment 1. The learning device 10 includes a data acquisition unit 12 and a model generation unit 13.

The data acquisition unit 12 acquires learning data including B1 (action) and B2 (state). That is, the data acquisition unit 12 acquires learning data including the set air volume and the set air direction of the indoor unit, the user's position, and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

The model generation unit 13 generates a trained model that infers C (output) from B2 (state), using the learning data including B1 (action) and B2 (state) acquired by the data acquisition unit 12. That is, using learning data that includes the set air volume and the set air direction of the indoor unit, the user's position, and the difference between the detected temperature at the user's position and the set temperature of the indoor unit, the model generation unit 13 generates a trained model that infers the set air volume and the set air direction of the indoor unit from the user's position and that temperature difference. The model generation unit 13 stores the generated trained model in the trained model storage unit 20.

A known algorithm such as supervised learning, unsupervised learning, or reinforcement learning can be used as the learning algorithm of the model generation unit 13. As an example, the case where reinforcement learning is applied is described here. In reinforcement learning, an agent (the acting entity) in an environment observes the current state (the parameters of the environment) and decides what action to take. The agent's actions change the environment dynamically, and the agent receives a reward according to the change in the environment. The agent repeats this process and learns the policy that yields the greatest reward over a series of actions. Q-learning or TD learning (temporal difference learning), both representative reinforcement learning methods, can be used. For example, in the case of Q-learning, the general update rule for the action-value function Q(s, a) is expressed by Equation (1).
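
Equation (1) itself is not reproduced in this text. Assuming it is the standard Q-learning update, which is consistent with the symbols the description defines for Equation (1) (state s_t, action a_t, reward r_{t+1}, discount rate γ, learning coefficient α), it would read:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
\tag{1}
```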

In Equation (1), s_t represents the state of the environment at time t, and a_t represents the action at time t. The action a_t changes the state to s_{t+1}. r_{t+1} represents the reward received as a result of that state change, γ represents the discount rate, and α represents the learning coefficient. Here γ is in the range 0 < γ ≤ 1 and α is in the range 0 < α ≤ 1. B1 (action) corresponds to the action a_t, and B2 (state) corresponds to the state s_t. That is, the set air volume and the set air direction of the indoor unit form the action a_t, and the user's position together with the difference between the detected temperature at the user's position and the set temperature of the indoor unit forms the state s_t. Q-learning learns the best action a_t in the state s_t at time t.

With the update rule expressed by Equation (1), if the action value Q of the action a with the highest Q value at time t+1 is greater than the action value Q of the action a executed at time t, the action value Q is increased; in the opposite case, the action value Q is decreased. In other words, the action-value function Q(s, a) is updated so that the action value Q of the action a at time t approaches the best action value at time t+1. As a result, the best action value in a given environment propagates successively to the action values in the environments that preceded it.
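
The update described above can be sketched in Python as a single tabular Q-learning step. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the dictionary-based table, the discretised state and action tuples, the function name `q_update`, and the values chosen for the learning coefficient and discount rate are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best action value reachable from the next state (the max term of the update).
    best_next = max(q[(next_state, a)] for a in actions)
    # Move Q(state, action) toward reward + gamma * best_next by a fraction alpha.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical discretisation (an assumption for this sketch):
#   state  = (distance bin, angle bin, temperature-difference bin)
#   action = (air volume level, air direction level)
ACTIONS = [(volume, direction) for volume in range(3) for direction in range(3)]
q_table = defaultdict(float)

new_value = q_update(q_table, (2, 1, 3), (1, 2), reward=1.0,
                     next_state=(2, 1, 2), actions=ACTIONS)
```

Because every entry starts at zero here, a positive reward raises the value of the executed action and a negative reward lowers it, which is the behaviour the paragraph above describes.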

When the trained model is generated by reinforcement learning as described above, the model generation unit 13 includes a reward calculation unit 14 and a function update unit 15.

The reward calculation unit 14 calculates a reward based on B1 (action) and B2 (state). That is, the reward calculation unit 14 calculates a reward based on the set air volume and the set air direction of the indoor unit, the user's position, and the difference between the detected temperature at the user's position and the set temperature of the indoor unit. The reward calculation unit 14 calculates the reward r based on the amount of temperature change per unit time at the user's position. For example, the reward calculation unit 14 increases the reward r (for example, gives a reward of "1") when the amount of temperature change per unit time at the user's position has increased, and reduces the reward r (for example, gives a reward of "-1") when the amount of temperature change per unit time at the user's position has decreased.
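
The reward rule of this embodiment can be sketched as a small function. The function name, the argument names, and the treatment of an unchanged rate (a reward of 0) are assumptions made for illustration; the description only specifies the "1" and "-1" cases.

```python
def temperature_change_reward(previous_rate, current_rate):
    """Reward based on the temperature change per unit time at the user's position.

    Returns 1 when the rate of change has increased and -1 when it has decreased.
    """
    if current_rate > previous_rate:
        return 1
    if current_rate < previous_rate:
        return -1
    # Assumption: the description does not cover an unchanged rate, so return 0.
    return 0


reward_up = temperature_change_reward(previous_rate=0.2, current_rate=0.5)
reward_down = temperature_change_reward(previous_rate=0.5, current_rate=0.2)
```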

The function update unit 15 updates the function for determining the set air volume and the set air direction of the indoor unit in accordance with the reward calculated by the reward calculation unit 14, and outputs the function to the trained model storage unit 20. For example, in the case of Q-learning, the function update unit 15 uses the action-value function Q(s_t, a_t) expressed by Equation (1) as the function for calculating the set air volume and the set air direction of the indoor unit.

The learning described above is executed repeatedly. The trained model storage unit 20 stores the action-value function Q(s_t, a_t) updated by the function update unit 15, that is, the trained model.

FIG. 5 is a flowchart of the learning process of the learning device 10.
In step S101, the data acquisition unit 12 acquires learning data including the set air volume and the set air direction of the indoor unit, the user's position, and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

In step S102, the model generation unit 13 calculates a reward based on the set air volume and the set air direction of the indoor unit, the user's position, and the difference between the detected temperature at the user's position and the set temperature of the indoor unit. Specifically, the reward calculation unit 14 decides whether to increase or decrease the reward based on the amount of temperature change per unit time at the user's position.

If the reward calculation unit 14 decides to increase the reward, the process proceeds to step S103. If the reward calculation unit 14 decides to decrease the reward, the process proceeds to step S104.

In step S103, the reward calculation unit 14 increases the reward.
In step S104, the reward calculation unit 14 decreases the reward.

In step S105, the function update unit 15 updates the action-value function Q(s_t, a_t) expressed by Equation (1), which is stored in the trained model storage unit 20, based on the reward calculated by the reward calculation unit 14.

The learning device 10 repeatedly executes steps S101 to S105 above and stores the resulting action-value function Q(s_t, a_t) as the trained model.
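
Steps S101 to S105 can be put together as a short training loop. The sketch below assumes the learning data is available as logged transitions and that states and actions are discretised tuples; the record layout, the name `train_q_table`, and the hyperparameter values are hypothetical. It also treats any rate that has not increased as the "decrease" branch, which is a simplification.

```python
from collections import defaultdict


def train_q_table(transitions, actions, alpha=0.1, gamma=0.9, epochs=1):
    """Run steps S101 to S105 over logged transitions and return the Q table.

    Each transition is (state, action, previous_rate, current_rate, next_state),
    where the rates are temperature changes per unit time at the user's position.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, previous_rate, current_rate, next_state in transitions:  # S101
            # S102 to S104: increase the reward if the rate rose, otherwise decrease it.
            reward = 1 if current_rate > previous_rate else -1
            # S105: update the action-value function with the Q-learning rule.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q


ACTIONS = [(0, 0), (1, 1)]
logged = [((0, 0, 2), (1, 1), 0.1, 0.3, (0, 0, 1))]
q_table = train_q_table(logged, ACTIONS)
```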

In the present embodiment, the learning device 10 stores the trained model in the trained model storage unit 20 provided outside the learning device 10, but the trained model storage unit 20 may instead be provided inside the learning device 10.

FIG. 6 is a diagram showing the configuration of the inference device 30. The inference device 30 includes a data acquisition unit 31 and an inference unit 32.

The data acquisition unit 31 acquires the input B2 (state). That is, the data acquisition unit 31 acquires the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

The inference unit 32 reads, from the trained model storage unit 20, the trained model for inferring the set air volume and the set air direction of the indoor unit from the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

The inference unit 32 infers the output C using the data acquired by the data acquisition unit 31 and the trained model. That is, by inputting the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit, both acquired by the data acquisition unit 31, into the trained model, the inference unit 32 can infer the set air volume and the set air direction of the indoor unit that bring the temperature at the location where the user is to the set temperature.

For example, the inference unit 32 reads the action-value function Q(s_t, a_t) from the trained model storage unit 20 as the trained model. For the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit (state s_t), the inference unit 32 obtains the set air volume and the set air direction of the indoor unit (action a_t) based on the action-value function Q(s, a).
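
One common way to obtain an action from a learned action-value function is to pick the action with the highest value for the current state. The sketch below assumes that greedy selection and a dictionary-based Q table keyed by (state, action); the description does not state the selection rule, and the names and values are hypothetical.

```python
def infer_action(q, state, actions):
    """Return the action with the highest learned value for the given state."""
    # Unseen (state, action) pairs default to 0.0 (an assumption of this sketch).
    return max(actions, key=lambda action: q.get((state, action), 0.0))


# Hypothetical learned values for one state:
#   state  = (distance bin, angle bin, temperature-difference bin)
#   action = (air volume level, air direction level)
learned_q = {
    ((2, 1, 3), (0, 0)): 0.2,
    ((2, 1, 3), (2, 1)): 0.7,
}
candidate_actions = [(0, 0), (2, 1), (1, 1)]

best_action = infer_action(learned_q, (2, 1, 3), candidate_actions)
```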

In the present embodiment, the set air volume and the set air direction of the indoor unit are output using the trained model learned by the model generation unit of the air conditioner. Alternatively, a trained model may be acquired from another air conditioner, and the set air volume and the set air direction of the indoor unit may be output based on that trained model.

FIG. 7 is a flowchart showing the procedure by which the inference device 30 infers the set air volume and the set air direction of the indoor unit.

In step S201, the data acquisition unit 31 acquires the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit.

In step S202, the inference unit 32 inputs the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit into the trained model stored in the trained model storage unit 20.

In step S203, the inference unit 32 obtains the set air volume and the set air direction of the indoor unit from the trained model. The inference unit 32 outputs the obtained set air volume and set air direction of the indoor unit to the control device 2.

In step S204, the control device 2 controls the air conditioner 1 using the set air volume and the set air direction of the indoor unit that were output.
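
Steps S201 to S204 can be sketched end to end as follows. Everything specific here is an assumption for illustration: the sensor reading layout, the bin widths used to discretise the state, the greedy choice of action, and the `apply_settings` callback standing in for the control device 2.

```python
def build_state(distance_m, angle_deg, detected_temp_c, set_temp_c,
                distance_step=1.0, angle_step=30.0, temp_step=1.0):
    """Discretise the user's position and the temperature difference into a state."""
    temp_diff = detected_temp_c - set_temp_c
    return (int(distance_m // distance_step),
            int(angle_deg // angle_step),
            round(temp_diff / temp_step))


def control_step(q, actions, reading, set_temp_c, apply_settings):
    """Run one pass of S201 to S204 and return the chosen (volume, direction)."""
    # S201: acquire the user's position and the temperature difference.
    state = build_state(reading["distance_m"], reading["angle_deg"],
                        reading["temp_c"], set_temp_c)
    # S202 and S203: feed the state to the trained model and obtain the action.
    volume, direction = max(actions, key=lambda a: q.get((state, a), 0.0))
    # S204: hand the settings to the controller (hypothetical callback).
    apply_settings(volume, direction)
    return volume, direction


applied = []
chosen = control_step(
    q={((2, 1, 2), (2, 1)): 0.5},
    actions=[(0, 0), (2, 1)],
    reading={"distance_m": 2.5, "angle_deg": 45.0, "temp_c": 27.0},
    set_temp_c=25.0,
    apply_settings=lambda volume, direction: applied.append((volume, direction)),
)
```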

In the present embodiment, the case where reinforcement learning is applied as the learning algorithm used by the inference unit has been described, but the learning algorithm is not limited to this. Besides reinforcement learning, supervised learning, unsupervised learning, semi-supervised learning, or the like can also be applied.

Deep learning, which learns the extraction of the feature quantities themselves, can also be used as the learning algorithm of the model generation unit 13. Alternatively, machine learning may be performed according to another known method, such as a neural network, genetic programming, functional logic programming, or a support vector machine.

The learning device 10 and the inference device 30 may be, for example, devices separate from the control device 2 that are connected to the control device 2 via a network. The learning device 10 and the inference device 30 may also be built into the control device 2. Furthermore, the learning device 10 and the inference device 30 may reside on a cloud server.

The model generation unit 13 may learn the set air volume and the set air direction of the indoor unit using learning data acquired from a plurality of air conditioners. The model generation unit 13 may acquire learning data from a plurality of air conditioners used in the same area, or may learn the set air volume and the set air direction of the indoor unit using learning data collected from a plurality of air conditioners operating independently in different areas. An air conditioner from which learning data is collected may also be added to or removed from the set of targets partway through. Furthermore, a learning device that has learned the set air volume and the set air direction of the indoor unit for one air conditioner may be applied to a different air conditioner, and may relearn and update the set air volume and the set air direction of the indoor unit for that other air conditioner.

As described above, according to the present embodiment, the learning device generates a trained model that infers the set air volume and the set air direction of the indoor unit from the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit. In accordance with the trained model, the inference device can then estimate, from the user's position and that temperature difference, the set air volume and the set air direction of the indoor unit of the air conditioner that appropriately bring the location where the user is to the set temperature.

Embodiment 2.
The present embodiment concerns a reward criterion different from that of Embodiment 1.

FIG. 8 is a diagram representing data input to or output from the learning device 10 and the inference device 30 of Embodiment 2.

B1 (action) is the set air volume and the set air direction of the indoor unit. B2 (state) is the user's position and the difference between the detected temperature at the user's position and the set temperature of the indoor unit. C (output) is the set air volume and the set air direction of the indoor unit. D (reward criterion) is the user's operation of setting the air volume or the air direction.

The reward calculation unit 14 calculates a reward based on B1 (action) and B2 (state). That is, the reward calculation unit 14 calculates a reward based on the set air volume and the set air direction of the indoor unit, the user's position, and the difference between the detected temperature at the user's position and the set temperature of the indoor unit. The reward calculation unit 14 calculates the reward r based on the user's operation of setting the air volume or the air direction. For example, the reward calculation unit 14 increases the reward r (for example, gives a reward of "1") when the user has not performed an operation to set the air volume or the air direction, and reduces the reward r (for example, gives a reward of "-1") when the user has performed such an operation.
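
The reward criterion of this embodiment reduces to a single check on whether the user overrode the settings. A minimal sketch, with a hypothetical function name and a boolean flag standing in for however the setting operation is actually detected:

```python
def user_operation_reward(user_changed_settings: bool) -> int:
    """Return 1 if the user left the air volume and direction alone, else -1."""
    return -1 if user_changed_settings else 1


reward_accepted = user_operation_reward(user_changed_settings=False)
reward_overridden = user_operation_reward(user_changed_settings=True)
```

The intuition is that a manual change suggests the inferred settings did not suit the user, so the action that preceded it is penalised.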

Embodiment 3.
FIG. 9 is a diagram showing the configuration of the learning device of Embodiment 3.

The learning device 10 of Embodiment 3 differs from the learning device 10 of Embodiment 1 in that it includes a data extension unit 62.

For an unacquired position, that is, a position other than the user positions included in the learning data acquired by the data acquisition unit 12, the data extension unit 62 generates extended data that includes the difference between the detected temperature at the unacquired position and the set temperature of the indoor unit of the air conditioner, and the set air volume and the set air direction of the indoor unit. The extended data is generated from the values at a user position included in the learning data (the difference between the detected temperature and the set temperature of the indoor unit, and the set air volume and the set air direction of the indoor unit), based on the difference between the unacquired position and that user position.

FIGS. 10(a) to 10(j) are diagrams showing examples of data obtained from the portable sensor 3 and the control device 2. FIG. 11 is a diagram showing the user positions in FIGS. 10(a) to 10(j).

The user's position and the detected temperature at the user's position are obtained by the portable sensor 3. The set air volume and the set air direction of the indoor unit are obtained from the control device 2.

The user's position is represented by polar coordinates (x, y) centered on the indoor unit, where x is the distance and y is the angle. The difference between the detected temperature at the user's position and the set temperature of the indoor unit is represented by the temperature difference T at the user's position (x, y). The set air volume of the indoor unit is represented by the air volume W set by the control device 2 for the user's position (x, y). The set air direction of the indoor unit is represented by the air direction D set by the control device 2 for the user's position (x, y).

In FIGS. 10(a) to 10(e), the angle of the user's position is held at a constant value ya while the distance takes the values xa, xb, xc, xd, and xe. In FIGS. 10(f) to 10(j), the angle of the user's position is held at a constant value yb while the distance takes the values xa, xb, xc, xd, and xe.

As shown in FIG. 11, the data for the user position (xf, ya) is generated from the data for the user position (xb, ya) and the data for the user position (xc, ya).

The data extension unit 62 generates the data for the user position (xg, ya) from the data for the user position (xc, ya) and the data for the user position (xd, ya). The data extension unit 62 generates the data for the user position (xd, yc) from the data for the user position (xd, ya) and the data for the user position (xd, yb).

FIGS. 12(a) to 12(c) are diagrams for explaining a method of augmenting the data.

As shown in FIG. 12(a), the data extension unit 62 generates the difference T(xf, ya) between the detected temperature at the user position (xf, ya) and the set temperature of the indoor unit by linearly interpolating between the difference T(xb, ya) between the detected temperature at the user position (xb, ya) and the set temperature of the indoor unit and the difference T(xc, ya) between the detected temperature at the user position (xc, ya) and the set temperature of the indoor unit.

As shown in FIG. 12(b), the data extension unit 62 generates the air volume W(xf, ya) set by the control device 2 at the user position (xf, ya) by linearly interpolating between the air volume W(xb, ya) set by the control device 2 at the user position (xb, ya) and the air volume W(xc, ya) set by the control device 2 at the user position (xc, ya).

As shown in FIG. 12(c), the data extension unit 62 generates the wind direction D(xf, ya) set by the control device 2 at the user position (xf, ya) by linearly interpolating between the wind direction D(xb, ya) set by the control device 2 at the user position (xb, ya) and the wind direction D(xc, ya) set by the control device 2 at the user position (xc, ya).
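The interpolation of FIGS. 12(a) to 12(c) can be written compactly as follows. The description does not state how the interpolation weights are chosen; this sketch assumes they are proportional to distance along the ray at angle ya, with xb < xf < xc. The symbol λ is introduced here for illustration only and does not appear in the original.

```latex
\[
\lambda = \frac{x_f - x_b}{x_c - x_b}, \qquad 0 < \lambda < 1
\]
\[
\begin{aligned}
T(x_f, y_a) &= (1-\lambda)\,T(x_b, y_a) + \lambda\,T(x_c, y_a),\\
W(x_f, y_a) &= (1-\lambda)\,W(x_b, y_a) + \lambda\,W(x_c, y_a),\\
D(x_f, y_a) &= (1-\lambda)\,D(x_b, y_a) + \lambda\,D(x_c, y_a).
\end{aligned}
\]
```

Under this assumption, a position xf midway between xb and xc gives λ = 1/2, so each generated value is the mean of the two measured values.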

FIGS. 13(a) to 13(c) are diagrams for explaining a method of augmenting the data.

As shown in FIG. 13(a), the data extension unit 62 generates the difference T(xd, yf) between the detected temperature at the user position (xd, yf) and the set temperature of the indoor unit by linearly interpolating between the difference T(xd, ya) between the detected temperature at the user position (xd, ya) and the set temperature of the indoor unit and the difference T(xd, yb) between the detected temperature at the user position (xd, yb) and the set temperature of the indoor unit.

As shown in FIG. 13(b), the data extension unit 62 generates the air volume W(xd, yf) set by the control device 2 at the user position (xd, yf) by linearly interpolating between the air volume W(xd, ya) set by the control device 2 at the user position (xd, ya) and the air volume W(xd, yb) set by the control device 2 at the user position (xd, yb).

As shown in FIG. 13(c), the data extension unit 62 generates the wind direction D(xd, yf) set by the control device 2 at the user position (xd, yf) by linearly interpolating between the wind direction D(xd, ya) set by the control device 2 at the user position (xd, ya) and the wind direction D(xd, yb) set by the control device 2 at the user position (xd, yb).
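The data extension described with reference to FIGS. 12 and 13 can be illustrated by the following minimal Python sketch. It is not the implementation of the data extension unit 62. The names `Sample`, `lerp`, and `interpolate` are hypothetical, the interpolation weight is assumed to be proportional to the offset in distance or in angle, and the wind direction D is treated as a plain number, which ignores any angular wrap-around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One learning-data record (hypothetical layout)."""
    x: float  # distance from the indoor unit
    y: float  # angle around the indoor unit
    T: float  # detected temperature minus set temperature
    W: float  # set air volume
    D: float  # set wind direction (treated as a plain number here)


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t


def interpolate(p: Sample, q: Sample, x: float, y: float) -> Sample:
    """Generate a record at (x, y) located between measured records p and q.

    p and q must share either the same angle (FIG. 12 case) or the same
    distance (FIG. 13 case). The weight is assumed proportional to the offset.
    """
    if p.y == q.y == y and p.x != q.x:
        t = (x - p.x) / (q.x - p.x)  # interpolate along the distance
    elif p.x == q.x == x and p.y != q.y:
        t = (y - p.y) / (q.y - p.y)  # interpolate along the angle
    else:
        raise ValueError("p, q and the target must share one coordinate")
    if not 0.0 <= t <= 1.0:
        raise ValueError("target position must lie between p and q")
    return Sample(x, y, lerp(p.T, q.T, t), lerp(p.W, q.W, t), lerp(p.D, q.D, t))


# Illustrative values only: two records at the same distance, different angles.
p = Sample(x=2.0, y=30.0, T=1.0, W=2.0, D=10.0)
q = Sample(x=2.0, y=60.0, T=3.0, W=4.0, D=20.0)
mid = interpolate(p, q, x=2.0, y=45.0)
print(mid)
```

The generated record can then be appended to the measured records before they are passed to the model generation unit as learning data.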

Modifications.

The present disclosure is not limited to the embodiments described above.

(1) FIG. 14 is a diagram showing the hardware configuration of the learning device 10, the reasoning device 30, or the control device 2.

The operations of the learning device 10, the reasoning device 30, and the control device 2 can be implemented in digital circuit hardware or in software. When their functions are implemented in software, the learning device 10, the reasoning device 30, and the control device 2 include, for example, a processor 51 and a memory 52 connected by a bus 53 as shown in FIG. 15, and the processor 51 executes a program stored in the memory 52.

(2) When a plurality of indoor units are present in a room, they may be operated in conjunction with one another to search for the optimum wind direction and wind speed settings.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present disclosure is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 air conditioner, 2 control device, 3 portable sensor, 10 learning device, 12, 31 data acquisition unit, 13 model generation unit, 14 reward calculation unit, 15 function update unit, 20 learned model storage unit, 30 reasoning device, 32 inference unit, 51 processor, 52 memory, 53 bus.

Claims

1. A learning device for controlling an air conditioner, comprising:
a data acquisition unit that acquires learning data including a state, the state including a position of a user of the air conditioner and a difference between a detected temperature at the user's position and a set temperature of an indoor unit of the air conditioner, and a set air volume and a set wind direction of the indoor unit in the state; and
a model generation unit that uses the learning data to generate a learned model for inferring the set air volume of the indoor unit and the set wind direction of the indoor unit from the position of the user of the air conditioner and the difference between the detected temperature at the user's position and the set temperature of the indoor unit of the air conditioner,
wherein the model generation unit generates the learned model by Q-learning, and
the model generation unit increases the reward when the amount of temperature change per unit time at the user's position increases, and decreases the reward when the amount of temperature change per unit time at the user's position decreases.

2. The learning device for controlling an air conditioner according to claim 1, wherein said data acquisition unit acquires said state from data output from a portable sensor.

3. The learning device for controlling an air conditioner according to claim 1, further comprising a data extension unit that, for an unacquired position other than the user's position included in the learning data acquired by the data acquisition unit, generates extended data including the difference between the detected temperature at the unacquired position and the set temperature of the indoor unit of the air conditioner, and the set air volume and set wind direction of the indoor unit, based on the difference between the unacquired position and the user's position included in the learning data acquired by the data acquisition unit, using the difference between the detected temperature at the user's position included in the learning data and the set temperature of the indoor unit of the air conditioner, and the set air volume and set wind direction of the indoor unit,
wherein said model generation unit further uses the extended data generated by said data extension unit as said learning data.

4. A reasoning device for controlling an air conditioner, comprising:
a data acquisition unit that acquires a state including a position of a user of the air conditioner and a difference between a temperature at the user's position and a set temperature of an indoor unit of the air conditioner; and
an inference unit that acquires the learned model generated by the learning device for controlling an air conditioner according to any one of claims 1 to 3 and uses the learned model to infer a set air volume of the indoor unit and a set wind direction of the indoor unit from the state acquired by the data acquisition unit.

5. The reasoning device for controlling an air conditioner according to claim 4, wherein said data acquisition unit acquires said state from data output from a portable sensor.