JP6719434B2

JP6719434B2 - Device control device, device control method, and device control system

Info

Publication number: JP6719434B2
Application number: JP2017183400A
Authority: JP
Inventors: 豊典高木
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-09-25
Filing date: 2017-09-25
Publication date: 2020-07-08
Anticipated expiration: 2037-09-25
Also published as: JP2019061334A

Description

The present invention relates to a device control device, a device control method, and a device control system that control a device based on a user's motion.

Device control devices have been developed that issue instructions to various devices installed in a house by detecting a user's motion (gesture). Here, a user's motion does not mean directly operating the operation section of a device; it refers to the various motions a user performs in the same space as the operation section of the device or in a space apart from it. When the device control device detects that the user has performed a predetermined motion, it can turn on, turn off, or adjust devices such as lighting, air conditioners, and AV (audio-visual) equipment based on that motion.

Patent Document 1 describes a technique in which a user terminal detects a motion performed by the user and starts communication with the party associated with that motion.

Patent Document 1: Japanese Patent No. 5820058

However, when instructions are given to a device by the user's motions alone, the user must memorize, in addition to the motion that designates the device to be controlled, various motions corresponding to each operation of the device, such as turning it on, turning it off, and adjusting it. The technique described in Patent Document 1 detects, as motions, flicking the hand that holds the user terminal, shaking the user terminal, and the like. Controlling devices by motion therefore has the difficulty that the user must memorize many different motions.

The present invention has been made in view of the above points, and an object thereof is to provide a device control device, a device control method, and a device control system that allow a user to easily control a device based on a motion.

A device control device according to a first aspect of the present invention includes: a motion detection unit that detects a motion of a user from an image obtained by imaging the user; a voice detection unit that detects a voice uttered by the user; and a device control unit that performs, on a device designated by the motion detected by the motion detection unit, control corresponding to the voice detected by the voice detection unit.

The device control unit may perform the control corresponding to the voice detected by the voice detection unit on the device designated by both the motion detected by the motion detection unit and the voice detected by the voice detection unit.

When a first device designated by the motion detected by the motion detection unit differs from a second device designated by the voice detected by the voice detection unit, the device control unit may operate the first device and the second device intermittently.

When the first device designated by the motion detected by the motion detection unit differs from the second device designated by the voice detected by the voice detection unit, the device control unit may perform the control on either the first device or the second device in accordance with a predetermined rule.

The device control unit may perform the control on either the first device or the second device in accordance with the predetermined rule set for each user.

The device control device may further include a learning unit that learns by associating the voice detected by the voice detection unit with the device to be controlled, and the device control unit may perform the control corresponding to the voice detected by the voice detection unit on the device that the learning unit has associated with that voice.

When a plurality of the devices are designated by the motion detected by the motion detection unit, the device control unit may operate the plurality of devices intermittently.

When a plurality of the devices are designated by the motion detected by the motion detection unit, the device control unit may output audio indicating the plurality of devices.

When a plurality of the devices are designated by the motion detected by the motion detection unit, the device control unit may perform the control on at least one of the plurality of devices in accordance with priorities set in advance for the plurality of devices.

The device control unit may perform the control when the duration of the motion detected by the motion detection unit and the loudness of the voice detected by the voice detection unit are equal to or greater than a predetermined threshold.
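The threshold condition above can be illustrated with a minimal Python sketch. The source only says "predetermined threshold", so the separate thresholds for duration and loudness, their values, and the names `Detection` and `should_control` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    gesture_seconds: float  # how long the motion was held
    voice_level_db: float   # loudness of the detected voice


# Hypothetical values; the source does not give concrete thresholds.
MIN_GESTURE_SECONDS = 1.0
MIN_VOICE_LEVEL_DB = 50.0


def should_control(d: Detection) -> bool:
    """Return True only when both the duration and the loudness reach their thresholds."""
    return (d.gesture_seconds >= MIN_GESTURE_SECONDS
            and d.voice_level_db >= MIN_VOICE_LEVEL_DB)
```

Requiring both conditions is one reading of the text; it would tend to filter out brief incidental gestures and quiet background speech.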

The device control unit may perform the control corresponding to the voice detected by the voice detection unit on a device located within a range designated by the motion detected by the motion detection unit.

The device control unit may perform the control corresponding to the voice detected by the voice detection unit on a device located within the range that is designated by the user moving out of that range.

A device control method according to a second aspect of the present invention includes the steps of: detecting a motion of a user from an image obtained by imaging the user; detecting a voice uttered by the user; and performing, on a device designated by the motion detected in the motion detecting step, control corresponding to the voice detected in the voice detecting step.

A device control system according to a third aspect of the present invention includes a device and a device control device that controls the device. The device control device includes: a motion detection unit that detects a motion of a user from an image obtained by imaging the user; a voice detection unit that detects a voice uttered by the user; and a device control unit that performs, on the device designated by the motion detected by the motion detection unit, control corresponding to the voice detected by the voice detection unit. The device operates in accordance with the control by the device control unit.

According to the present invention, a user can more easily control a device based on the user's motion.

Schematic diagram of the device control system according to the first embodiment.
Schematic diagram showing the user's motion and voice detected by the control server according to the first embodiment.
Schematic diagram showing the words that the control server according to the first embodiment extracts from the user's voice.
Schematic diagram showing the control database and the learning database used by the control server according to the first embodiment.
Block diagram of the device control system according to the first embodiment.
Flowchart of the device control method according to the first embodiment.
Schematic diagram of the device control system according to the second embodiment.
Flowchart of the device control method according to the third embodiment.
Schematic diagram showing the range designated by the user's motion and voice in the fourth embodiment.
Flowchart of the device control method according to the fifth embodiment.

(First embodiment)
FIG. 1 is a schematic diagram of a device control system S according to the present embodiment. The device control system S includes a control server 1 serving as a device control device, one or more sensor devices 2 that acquire various kinds of information, one or more devices 3 controlled by the control server 1, and a gateway 4 that relays communication. The device control system S may also include other equipment such as additional servers and terminals.

The device 3 is a controlled device that is turned on, turned off, adjusted, or otherwise controlled upon receiving a control signal from the control server 1. Any device, such as lighting, an air conditioner, or AV equipment, can be used as the device 3. The device 3 may be operated not only under the control of the control server 1 but also by direct user operation. The device 3 has a communication module and communicates with the control server 1 via the gateway 4 by wired or wireless communication.

The sensor device 2 measures and acquires information such as images, sound, and temperature. The sensor device 2 has the various sensors needed to acquire this information. The sensor device 2 may be built into the device 3 or installed independently. The sensor device 2 has a communication module and communicates with the control server 1 via the gateway 4 by wired or wireless communication.

The gateway 4 is a relay device that enables communication by performing processing such as protocol conversion on the signals exchanged among the sensor device 2, the device 3, and the control server 1. The gateway 4 is connected to each of the sensor device 2 and the device 3 by a wired or wireless connection, and is connected to the control server 1 via an arbitrary network N such as a local area network or the Internet.

The control server 1 is located outside the building in which the sensor device 2, the device 3, and the gateway 4 are installed, and is connected to the network N by wired or wireless communication. The control server 1 is configured as a single computer or as a cloud, that is, a collection of computer resources. The control server 1 receives, via the gateway 4, a signal indicating the information measured by the sensor device 2. Next, the control server 1 determines the device 3 to be controlled and the control content based on the information from the sensor device 2. The control server 1 then transmits a signal indicating the control content to the device 3 via the gateway 4.

[Explanation of user motion and voice]
FIG. 2 is a schematic diagram showing the user's motion and voice detected by the control server 1 according to the present embodiment. The user's motion is a motion that designates the device to be controlled. In the example of FIG. 2, the user's motion is forming the hand into a predetermined shape, more specifically pointing a finger at the device 3 to be controlled. The user's voice designates the device 3 to be controlled and also designates the control content for the device 3. In the example of FIG. 2, it is the utterance "あれつけて" ("turn that on") spoken by the user.

The control server 1 treats a motion and a voice that are detected simultaneously or within a predetermined time of each other as one set. That is, the control server 1 detects, as one set, a detected motion and a voice detected within a predetermined time before or after the motion is detected. The control server 1 selects, as the control target, the device 3 located in the pointing direction of the motion (that is, on the extension line of the finger). If the device 3 indicated by the voice has already been learned, the control server 1 instead selects, as the control target, the device 3 associated with the voice rather than the one indicated by the motion. The control server 1 then determines the control content corresponding to the voice (for example, turning on the device 3) for the selected device.
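As a rough illustration of grouping a motion and a voice into one set, the following Python sketch pairs a gesture with an utterance detected within a time window around it. The window length, the class names, and the choice of the nearest utterance when several fall inside the window are assumptions; the source only specifies "a predetermined time" before or after the motion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gesture:
    timestamp: float  # seconds


@dataclass
class Utterance:
    timestamp: float  # seconds
    text: str


PAIRING_WINDOW_S = 2.0  # hypothetical "predetermined time"


def pair_with_voice(gesture: Gesture,
                    utterances: List[Utterance],
                    window: float = PAIRING_WINDOW_S) -> Optional[Utterance]:
    """Return the utterance closest in time to the gesture, if any lies within the window."""
    candidates = [u for u in utterances
                  if abs(u.timestamp - gesture.timestamp) <= window]
    if not candidates:
        return None
    # Assumption: when several utterances qualify, the nearest one in time wins.
    return min(candidates, key=lambda u: abs(u.timestamp - gesture.timestamp))
```

For example, a pointing gesture at t = 10.0 s would be paired with an utterance at t = 10.8 s but not with one at t = 15.0 s.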

The user's motion shown in FIG. 2 is an example, and the control server 1 is configured to detect any predetermined motion that can be determined from an image obtained by imaging the user. The user's motion is not limited to a single motion; two or more motions may be used in combination. Likewise, the user's voice shown in FIG. 2 is an example, and the control server 1 is configured to detect predetermined voices indicating the specific words and adjustment words described later.

[Explanation of the words extracted from the user's voice]
FIG. 3A and FIG. 3B are schematic diagrams showing the words that the control server 1 according to the present embodiment extracts from the user's voice. The control server 1 extracts words from the voice measured by the sensor device 2 and determines the control target and the control content based on those words. In the present embodiment, the voice includes a specific word and an adjustment word. The specific word is a word that identifies the device 3 to be controlled. The adjustment word is a word that indicates the control content for the device 3.

In the example of FIG. 3A, the user's voice includes the specific word "あれ" ("that") and the adjustment word "つけて" ("turn on"). Accordingly, the control server 1 causes the device 3 located in the direction designated by the motion to turn on, which corresponds to "つけて". In the example of FIG. 3B, the user's voice includes the specific word "あのエアコン" ("that air conditioner") and the adjustment word "消して" ("turn off"). Accordingly, the control server 1 causes the device 3 that is an air conditioner located in the direction designated by the motion to turn off, which corresponds to "消して".

Because the control server 1 performs control by combining motion and voice in this way, the user does not have to designate both the control target and the control content by motion alone, and therefore does not need to memorize many different motions.

[Description of databases]
FIG. 4A is a schematic diagram showing the control database D used by the control server 1 according to the present embodiment. The control database D holds the information used to determine the device 3 to be controlled and the control content from the user's motion and voice. The control database D stores a device number D1, position information D2, and a type D3 in association with one another.
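One way to picture a control database record is as a small typed structure. The Python sketch below is only illustrative: the class name `DeviceRecord`, the field names, and the sample rows are hypothetical, and the source does not prescribe any particular data format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DeviceRecord:
    device_number: str                    # D1: identifier of the device
    position: Tuple[float, float, float]  # D2: X, Y, Z relative to a reference point
    device_type: str                      # D3: e.g. television, air conditioner, light


# Hypothetical sample contents, not taken from the source.
CONTROL_DB: List[DeviceRecord] = [
    DeviceRecord("001", (1.0, 0.5, 2.4), "light"),
    DeviceRecord("002", (3.2, 0.0, 1.0), "television"),
    DeviceRecord("003", (0.0, 2.0, 2.2), "air conditioner"),
]


def find_by_number(db: List[DeviceRecord], device_number: str) -> Optional[DeviceRecord]:
    """Look up a single record by its device number."""
    return next((r for r in db if r.device_number == device_number), None)


def devices_of_type(db: List[DeviceRecord], device_type: str) -> List[DeviceRecord]:
    """Restrict the candidate devices to those of one type."""
    return [r for r in db if r.device_type == device_type]
```

Filtering by type corresponds to the case where the user's voice names a kind of device and the selection is limited to that kind.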

The device number D1 is a number for identifying the device 3. The device number D1 may be, for example, a serial number assigned to each registered device 3, or a MAC (Media Access Control) address assigned to the device 3 in advance.

The position information D2 indicates the position of the device 3 and is expressed, for example, as three-dimensional coordinates X, Y, Z relative to a predetermined reference point of the building. The position of the device 3 is used to identify the device 3 located in the pointing direction when the user performs a pointing motion. The position of the device 3 is estimated by the control server 1, for example, from an image of the device 3 captured by the sensor device 2. Alternatively, the user may set the position of each device 3 in the control server 1 using, for example, an application on a mobile terminal. In that case, the user sets the position of the device 3 in the control server 1 by, for example, entering coordinate information in the application or placing a mark on a map of the building interior displayed in the application. Although three-dimensional coordinates are used in the present embodiment, two-dimensional coordinates that omit the height coordinate may be used instead. In that case, the device 3 is identified by its horizontal position without distinguishing height.
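The source leaves the pointing-direction computation to known techniques. As one possible sketch, the Python code below picks the device whose position makes the smallest angle with the pointing ray, within a tolerance. The angular tolerance, the function names, and the device coordinates are assumptions introduced here for illustration.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def angle_between(u: Vec3, v: Vec3) -> float:
    """Angle in radians between two vectors; pi if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return math.pi
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding error
    return math.acos(cos)


def select_pointed_device(origin: Vec3,
                          direction: Vec3,
                          devices: Dict[str, Vec3],
                          max_angle_deg: float = 10.0) -> Optional[str]:
    """Return the device number closest to the pointing ray, or None if none is within tolerance."""
    best_id: Optional[str] = None
    best_angle = math.radians(max_angle_deg)  # hypothetical tolerance
    for device_number, position in devices.items():
        to_device = tuple(p - o for p, o in zip(position, origin))
        angle = angle_between(direction, to_device)
        if angle <= best_angle:
            best_id, best_angle = device_number, angle
    return best_id
```

With two-dimensional coordinates, the same idea applies after dropping the height component from `origin`, `direction`, and the device positions.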

The type D3 indicates the type of the device 3, for example television, air conditioner, or light. When the user's voice designates a type, the type is used to restrict the selection of the device 3 to that type. The type of the device 3 is estimated by the control server 1, for example, based on the appearance of the device 3 in an image obtained by the sensor device 2. Alternatively, the user may set the type of each device 3 in the control server 1 using, for example, an application on a mobile terminal. In that case, for example, the application stores a plurality of device types, and the user sets the type of each device 3 in the control server 1 by selecting one of those types for it. The application may also allow the user to add and set an arbitrary type of device 3.

FIG. 4B is a schematic diagram showing the learning database DD used by the control server 1 according to the present embodiment. The learning database DD holds the learned associations between specific words and the devices 3 to be controlled. The learning database DD stores a device number DD1, a learning position DD2, and a learned word DD3 in association with one another.

The device number DD1 is a number for identifying the device 3 and corresponds to the device number D1 in the control database D. The learning position DD2 indicates the position of the user at the time of learning and is expressed, for example, as three-dimensional coordinates X, Y, Z relative to a predetermined reference point of the building. The learned word DD3 indicates the specific word that was learned.

When the control server 1 has controlled a device 3 designated as the control target by the user's motion, it stores in the learning database DD, in association with one another, that device 3 (device number DD1), the position of the user (learning position DD2), and the specific word extracted from the voice (learned word DD3). Thereafter, when a specific word that matches the learned word DD3 is detected in the vicinity of the learning position DD2 (for example, within a predetermined range centered on the learning position DD2), the control server 1 selects, as the control target, the device 3 with the device number DD1 associated with that learning position DD2 and learned word DD3. Thus, once a specific word has been learned at a given user position, the device 3 to be controlled can be identified from the specific word alone, even if no user motion is detected.
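The store-then-look-up behavior of the learning database can be sketched in Python as follows. The radius used for "vicinity", the exact-match comparison of words, and the names `LearnedEntry`, `learn`, and `lookup` are assumptions; the source states only that a predetermined range centered on the learning position may be used.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LearnedEntry:
    device_number: str       # DD1
    learning_position: Vec3  # DD2: where the user stood when the word was learned
    learned_word: str        # DD3: the specific word that was spoken


NEAR_RADIUS = 1.0  # hypothetical radius of the "predetermined range"


def learn(db: List[LearnedEntry], device_number: str,
          user_position: Vec3, specific_word: str) -> None:
    """Record the association after a gesture-designated device has been controlled."""
    db.append(LearnedEntry(device_number, user_position, specific_word))


def lookup(db: List[LearnedEntry], user_position: Vec3,
           specific_word: str, radius: float = NEAR_RADIUS) -> Optional[str]:
    """Return the learned device number if the word matches near a learning position."""
    for entry in db:
        if (entry.learned_word == specific_word
                and math.dist(entry.learning_position, user_position) <= radius):
            return entry.device_number
    return None
```

A caller would try `lookup` first and fall back to gesture-based selection only when it returns `None`, which mirrors the priority given to learned associations in the text.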

The control database D and the learning database DD shown in FIG. 4A and FIG. 4B are examples, and their contents and formats may be set arbitrarily as long as they can be used in the device control method according to the present embodiment. In FIG. 4A and FIG. 4B, the control database D and the learning database DD are shown as tables of character strings for readability, but they may be represented in any data format (file format) that the control server 1 can interpret, for example binary data or text data. The control database D and the learning database DD may be stored inside the control server 1 or in a storage device external to the control server 1.

[Configuration of device control system S]
FIG. 5 is a block diagram of the device control system S according to the present embodiment. In FIG. 5, the arrows indicate the main data flows, and there may be data flows other than those shown. Each block in FIG. 5 represents a functional unit, not a hardware (device) unit. The blocks shown in FIG. 5 may therefore be implemented within a single device or distributed across multiple devices. Data may be exchanged between blocks by any means, such as a data bus, a network, or a portable storage medium.

The sensor device 2 has an image sensor 21 and a voice sensor 22. The image sensor 21 is a camera having an imaging element that outputs information representing a captured image. The control server 1 uses the images captured by the image sensor 21 to detect the user's motion. The control server 1 also estimates the position of the user from images captured from multiple directions by a plurality of image sensors 21. Alternatively, a distance sensor that measures distance by irradiating the subject (the user) with infrared rays or ultrasonic waves may be provided to detect the position of the user.

The voice sensor 22 is a microphone that outputs information representing sound acquired from the surroundings. The voice sensor 22 is preferably equipped with, for example, a plurality of directional microphones so that the direction from which a voice originates can be identified.

The control server 1 has a control unit 11, a communication unit 12, and a storage unit 13. The control unit 11 has a motion detection unit 111, a voice detection unit 112, a device selection unit 113, a device control unit 114, and a learning unit 115.

The communication unit 12 is a communication interface for communicating with the sensor device 2 and the device 3 via the gateway 4. The communication unit 12 performs predetermined processing on communication signals received from the sensor device 2 and the device 3 to obtain data, and inputs the obtained data to the control unit 11. The communication unit 12 also performs predetermined processing on data input from the control unit 11 to generate communication signals, and transmits the generated communication signals to the sensor device 2 and the device 3.

The storage unit 13 is a storage medium that includes a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. The storage unit 13 stores in advance the program executed by the control unit 11.

The control unit 11 is, for example, a processor such as a CPU (Central Processing Unit), and functions as the motion detection unit 111, the voice detection unit 112, the device selection unit 113, the device control unit 114, and the learning unit 115 by executing the program stored in the storage unit 13. At least some of the functions of the control unit 11 may be performed by an electric circuit. At least some of the functions of the control unit 11 may also be performed by a program executed over a network.

The device control system S according to the present embodiment is not limited to the specific configuration shown in FIG. 5. For example, the control server 1 is not limited to a single device and may be configured as two or more physically separate devices connected by wire or wirelessly.

The motion detection unit 111 detects the motion and position of the user from the images captured by the image sensor 21 of the sensor device 2. Specifically, the motion detection unit 111 applies a three-dimensional recognition image analysis technique to the acquired images to detect the user's motion and position. A known technique can be used for the three-dimensional recognition image analysis. When the user performs no motion, the motion detection unit 111 detects only the position of the user.

In parallel with the motion detection by the motion detection unit 111, the voice detection unit 112 detects the specific word and the adjustment word from the sound acquired from the voice sensor 22 of the sensor device 2. Specifically, the voice detection unit 112 applies a speech recognition technique to the acquired sound and extracts words from it. A known technique can be used for the speech recognition.

The voice detection unit 112 then detects the specific word and the adjustment word contained in the extracted words. For example, the specific words and adjustment words are defined in advance as lists, and the voice detection unit 112 detects the specific word and the adjustment word by comparing the extracted words against those lists. The voice detection unit 112 further determines, as the control content, the content designated by the detected adjustment word (on, off, adjust, and so on).
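The list comparison described above can be sketched in Python using the example words from FIG. 3. The substring matching, the preference for the longest matching specific word, and the names `SPECIFIC_WORDS`, `ADJUSTMENT_WORDS`, and `detect_words` are assumptions; the source says only that the extracted words are compared with predefined lists.

```python
from typing import Optional, Tuple

# Words taken from the FIG. 3 examples; a real list would be longer.
SPECIFIC_WORDS = ["あれ", "あのエアコン"]
# Maps each adjustment word to the control content it designates.
ADJUSTMENT_WORDS = {"つけて": "on", "消して": "off"}


def detect_words(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (specific word, control content) found in the recognized text."""
    # Assumption: prefer the longest matching specific word so that a more
    # descriptive phrase wins over a shorter one contained in the same text.
    specific = next(
        (w for w in sorted(SPECIFIC_WORDS, key=len, reverse=True) if w in text),
        None,
    )
    adjustment = next((w for w in ADJUSTMENT_WORDS if w in text), None)
    control = ADJUSTMENT_WORDS[adjustment] if adjustment is not None else None
    return specific, control
```

For the utterance "あれつけて" this yields the specific word "あれ" and the control content "on".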

Here, the voice detection unit 112 treats as its detection target the audio acquired at the same time as the image used by the motion detection unit 111 was captured, or within a predetermined time before or after that time. This allows a motion and a voice to be detected as one pair.
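The pairing of motion and voice by capture time can be illustrated with a small filter. The two-second window, the timestamp format (seconds as floats), and the function name are assumptions made for this sketch; the description only requires "a predetermined time".

```python
def pair_voice_with_motion(image_time, utterances, window=2.0):
    """Keep the utterances captured within `window` seconds of the image.

    utterances: list of (timestamp_in_seconds, recognized_text) tuples.
    """
    return [text for t, text in utterances if abs(t - image_time) <= window]
```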

The device selection unit 113 determines whether the learning database DD contains learning content corresponding to the user's position detected by the motion detection unit 111 (learning position DD2) and the specific word detected by the voice detection unit 112 (learning word DD3). Here, when the position detected by the motion detection unit 111 is in the vicinity of the position indicated by the learning content (for example, within a predetermined range centered on the learning position DD2), the device selection unit 113 determines that the detected position corresponds to the position indicated by the learning content. When learning content corresponding to the detected user position and specific word exists, the device selection unit 113 acquires the learning content indicating the device 3 (device number DD1) associated with that position and specific word.
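A lookup of this kind might look like the sketch below. The record layout (dictionaries with `device`, `position`, and `word` keys standing in for DD1, DD2, and DD3), the Euclidean distance test, and the one-metre radius are illustrative assumptions; the description does not fix how "vicinity" is measured.

```python
import math


def find_learned_device(learning_db, position, word, radius=1.0):
    """Return the learned device for (position, word), or None if nothing matches.

    learning_db: list of dicts with keys 'device' (DD1), 'position' (DD2), 'word' (DD3).
    A detected position matches when it lies within `radius` of the learned position.
    """
    for entry in learning_db:
        if entry["word"] == word and math.dist(entry["position"], position) <= radius:
            return entry["device"]
    return None
```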

The device selection unit 113 then selects the device 3 indicated by the acquired learning content as the control target. Thus, when the specific word detected by the voice detection unit 112 has already been learned at the user's position detected by the motion detection unit 111, the device selection unit 113 can select the control target device 3 without using the user's motion.

When the learning database DD contains no learning content corresponding to the user's position detected by the motion detection unit 111 and the specific word detected by the voice detection unit 112, the device selection unit 113 selects, as the control target, the device 3 corresponding to the user's motion detected by the motion detection unit 111. Specifically, using the position information D2 of each device 3 in the control database D, the device selection unit 113 determines as the control target the device 3 (that is, the device number D1) located on the extension line of the user's finger, starting from the user's position detected by the motion detection unit 111.
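One way to test whether a device lies "on the extension line of the finger" is a ray test in floor-plan coordinates. The sketch below assumes two-dimensional positions, a pointing direction already estimated by the image analysis, and a hypothetical tolerance for how far off the line a device may be; none of these details are given in the description.

```python
import math


def devices_on_pointing_line(origin, direction, devices, tolerance=0.5):
    """Return (distance_along_ray, device_id) pairs, nearest device first.

    origin: (x, y) user position; direction: (dx, dy) pointing direction (non-zero);
    devices: dict mapping device id to (x, y) position (D1 -> D2).
    """
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    hits = []
    for device_id, (px, py) in devices.items():
        dx, dy = px - origin[0], py - origin[1]
        along = dx * ux + dy * uy       # projection of the device onto the ray
        if along <= 0:
            continue                    # device is beside or behind the user
        off = abs(dx * uy - dy * ux)    # perpendicular distance from the ray
        if off <= tolerance:
            hits.append((along, device_id))
    return sorted(hits)
```

Returning the distance along the ray with each hit keeps the information needed later to choose between several devices on the same line.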

Furthermore, when selecting the device 3 corresponding to the user's motion detected by the motion detection unit 111 as the control target, the device selection unit 113 may also use the specific word detected by the voice detection unit 112. In this case, the device selection unit 113 selects, from among the plurality of devices 3 located on the extension line of the finger starting from the user's position, the one device 3 narrowed down by the specific word.

Specifically, when a specific word indicating a type of device 3 (for example, "air conditioner") is detected, the device selection unit 113 selects, from among the plurality of devices 3 corresponding to the motion, the device 3 associated in the control database D with the type D3 that matches the type indicated by the specific word.

Furthermore, the device selection unit 113 may use the specific word to limit the range of positions of selectable devices 3. In this case, it is defined in advance in the control server 1 that, for example, the specific word "that" indicates something far away and the specific word "this" indicates something nearby. When the specific word "that" is detected, the device selection unit 113 selects the farthest of the plurality of devices 3 corresponding to the motion; when the specific word "this" is detected, it selects the nearest of those devices 3.
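The two narrowing rules (by device type and by "that"/"this") can be sketched together. Applying both filters in sequence is an assumption of this sketch, since the description presents them as separate options; the candidate record layout and the function name are likewise hypothetical.

```python
def narrow_candidates(candidates, specific_words):
    """Narrow the devices found on the pointing line using the specific words.

    candidates: list of dicts with keys 'id', 'type' (D3), and 'distance' from the user.
    """
    # Rule 1: a specific word naming a device type keeps only devices of that type.
    known_types = {c["type"] for c in candidates}
    wanted = [w for w in specific_words if w in known_types]
    if wanted:
        candidates = [c for c in candidates if c["type"] in wanted]
    if not candidates:
        return []
    # Rule 2: "that" picks the farthest device, "this" picks the nearest one.
    if "that" in specific_words:
        return [max(candidates, key=lambda c: c["distance"])]
    if "this" in specific_words:
        return [min(candidates, key=lambda c: c["distance"])]
    return candidates
```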

In this way, by selecting the control target device 3 from a combination of motion and voice, the control server 1 can accept the selection of a device 3 more flexibly.

To suppress erroneous detection of motions, control may be performed based on the duration of the user's motion and the loudness of the user's voice. Specifically, the motion detection unit 111, the voice detection unit 112, and the device selection unit 113 determine the control target and control content based on the motion and voice only when the same motion has continued for a predetermined time and the loudness of the voice is at or above a predetermined threshold. This suppresses erroneous control of a device 3 caused by detecting everyday motions and speech that are not intended as instructions to the device 3.
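The duration and loudness gate can be sketched as a single predicate. The sample format (time-ordered motion labels), the one-second hold time, and the loudness threshold of 50 in unspecified units are placeholders chosen for illustration only.

```python
def should_accept(motion_samples, volume, min_duration=1.0, min_volume=50.0):
    """Accept a command only if the latest motion was held long enough and spoken loudly enough.

    motion_samples: time-ordered list of (timestamp_in_seconds, motion_label).
    """
    if volume < min_volume or not motion_samples:
        return False
    last_time, last_label = motion_samples[-1]
    start = last_time
    # Walk backwards to find when the current motion label began.
    for t, label in reversed(motion_samples):
        if label != last_label:
            break
        start = t
    return last_time - start >= min_duration
```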

When the motion detection unit 111, the voice detection unit 112, and the device selection unit 113 have determined the control target and control content associated with the user's motion and voice, the device control unit 114 generates a control signal for performing control according to the determined control content and transmits it via the communication unit 12 to the determined control target device 3. The device 3 performs a predetermined operation such as turning on, turning off, or adjusting, according to the control signal received from the control server 1 via the gateway 4.

When the specific word detected by the voice detection unit 112 has not yet been learned at the user's position detected by the motion detection unit 111, the learning unit 115 learns the device 3 selected for that position and specific word. Specifically, when the device control unit 114 executes control, the learning unit 115 registers in the learning database DD the user's position detected by the motion detection unit 111, the specific word detected by the voice detection unit 112, and the device 3 selected as the control target by the device selection unit 113.

The learning unit 115 may register a combination of device 3, user position, and specific word in the learning database DD as learning content the first time control is executed for that combination. Alternatively, the learning unit 115 may count how many times each combination of controlled device 3, user position, and specific word occurs, and register in the learning database DD only the combinations that reach a predetermined count.
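The count-based variant can be sketched with a counter keyed by the combination. The class name `Learner`, the use of a coarse position label (a hypothetical quantized cell such as "sofa") in place of raw coordinates, and the default threshold are assumptions; setting `threshold=1` reproduces the register-on-first-use variant.

```python
from collections import Counter


class Learner:
    """Registers a (position, word) -> device mapping once it has been seen often enough."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = Counter()
        self.learning_db = {}  # (position_cell, word) -> device

    def record(self, position_cell, word, device):
        """Call each time control is executed for this combination."""
        key = (position_cell, word, device)
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.learning_db[(position_cell, word)] = device
```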

Because constantly transmitting images and audio from the sensor device 2 to the control server 1 imposes a large communication load, the sensor device 2 may start transmitting images and audio to the control server 1 only when it detects, from the images, a motion close to a predetermined motion. In this case, the sensor device 2 is configured as a computer that executes motion detection software, and motions close to the predetermined motions are registered in it in advance. When the sensor device 2, using the motion detection software, detects a pre-registered motion in an image captured by the image sensor 21, it starts transmitting images and audio to the control server 1. This reduces the amount of data transmitted from the sensor device 2 to the control server 1.

[Flowchart of the device control method]
FIG. 6 is a flowchart of the device control method according to the present embodiment. The flowchart of FIG. 6 starts, for example, when the user performs a predetermined start operation on the control server 1.

First, the motion detection unit 111 detects the user's motion and position from the image captured by the image sensor 21 of the sensor device 2, and the voice detection unit 112 detects specific words and adjustment words in the audio acquired from the voice sensor 22 of the sensor device 2 (S11). When no motion by the user can be detected in step S11, the motion detection unit 111 detects only the user's position.

When the motion detection unit 111 and the voice detection unit 112 do not detect a position and a voice in step S11 (NO in S12), the control server 1 returns to step S11 and repeats the processing. When the motion detection unit 111 and the voice detection unit 112 detect a position and a voice in step S11 (YES in S12), the device selection unit 113 determines whether the learning database DD contains learning content corresponding to the position detected by the motion detection unit 111 in step S11 (learning position DD2) and the specific word detected by the voice detection unit 112 (learning word DD3) (S13).

When learning content corresponding to the detected position and specific word exists (YES in S14), the device selection unit 113 acquires the learning content indicating the device 3 (device number DD1) associated with that position and specific word. The device selection unit 113 then selects the device 3 indicated by the acquired learning content as the control target (S15).

When no learning content corresponding to the detected position and specific word exists (NO in S14), the device selection unit 113 selects, as the control target, the device 3 corresponding to the user's motion detected by the motion detection unit 111 in step S11 (S16).

The voice detection unit 112 determines, as the control content, the content (on, off, adjustment, etc.) designated by the adjustment word it detected in step S11 (S17).

The device control unit 114 generates a control signal for performing control according to the control content determined by the voice detection unit 112 in step S17, and transmits it via the communication unit 12 to the control target device 3 selected by the device selection unit 113 in step S15 or S16 (S18). At this time, when the specific word detected by the voice detection unit 112 has not yet been learned at the user's position detected by the motion detection unit 111 in step S11, the learning unit 115 learns, for that position and specific word, the device 3 selected by the device selection unit 113 in step S16.

When a predetermined end condition (for example, the user performing a predetermined end operation on the control server 1) is not satisfied (NO in S19), the control server 1 returns to step S11 and repeats the processing. When the predetermined end condition is satisfied (YES in S19), the control server 1 ends the processing.

[Processing when motion and voice conflict]
The content indicated by the user's motion and the content indicated by the user's voice may conflict. One example is the case where the first device 3 (a television) selected by the user's motion (pointing at the television) differs from the second device 3 (an air conditioner) selected from the learning content for the specific word (the utterance "that air conditioner"). Possible causes are that the user mistakenly made a conflicting motion and utterance, or that the user who made the motion and the user who spoke are different people.

In this case, the control server 1 performs processing to resolve the conflict between motion and voice after step S17 and before step S18. Specifically, the device control unit 114 causes each of the plurality of devices 3 selected by the motion and the voice to perform an operation such as blinking a lamp (that is, an intermittent operation). The motion detection unit 111, the voice detection unit 112, and the device selection unit 113 then accept a reselection of the control target and control content by detecting the user's motion and voice again. In this way, the control server 1 can inform the user that the motion and voice conflicted and have the user select the control target device 3 again. With the selected device 3 as the control target device 3, the device control unit 114 performs the processing from step S18 onward.

As another method, when the motion and voice conflict, the control server 1 may select the control target device 3 based on either the motion or the voice, according to a predetermined rule indicating which of the two takes priority. With the selected device 3 as the control target device 3, the device control unit 114 performs the processing from step S18 onward.

The control server 1 may also decide, for each user, whether motion or voice takes priority when the two conflict. In this case, a rule indicating whether motion or voice takes priority is set in the control server 1 in advance for each user. The device selection unit 113 identifies the user from the image or the audio and, based on the setting for the identified user, selects the control target device 3 from either the motion or the voice. The user is identified by applying an image recognition technique to the image or a speech recognition technique to the audio; any known technique can be used for either. With the selected device 3 as the control target device 3, the device control unit 114 performs the processing from step S18 onward.
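The fixed-rule and per-user variants of conflict resolution can be sketched in one function. The rule table, the user names, and the default of preferring motion are hypothetical; user identification itself is outside the sketch and is assumed to have already produced a user name.

```python
def resolve_conflict(motion_device, voice_device, user=None, user_rules=None, default="motion"):
    """Choose the control target when motion and voice point at different devices.

    user_rules: optional dict mapping a user name to "motion" or "voice".
    Users without an entry fall back to the system-wide `default` rule.
    """
    if motion_device == voice_device:
        return motion_device  # no conflict to resolve
    priority = (user_rules or {}).get(user, default)
    return motion_device if priority == "motion" else voice_device
```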

[Effects of the first embodiment]
The control server 1 according to the present embodiment determines the control target and control content by combining the user's motion and voice. The user therefore does not need to memorize a variety of motions and can easily control devices using motions.

Furthermore, by learning to associate the device 3 designated by the user's motion with the specific word, the control server 1 makes it possible, from the next time onward, to select the control target device 3 using only the learned specific word, without any motion. That is, the first time, the control target device 3 must be specified by combining the words "turn that on" with a pointing motion, but from the next time onward the same device 3 can be specified by the words "turn that on" alone. Because the user can have the control server 1 automatically learn which device 3 a specific word such as "that" or "this" refers to, user convenience is improved.

(Second Embodiment)
In the first embodiment, the control server 1 is located outside the building and communicates with the sensor device 2 and the devices 3 via the network N. Depending on the communication conditions of the network N, detecting motions and voices and executing device control may therefore take time. In addition, constantly transmitting images and audio from the sensor device 2 to the control server 1 imposes a large communication load. In the present embodiment, by contrast, a control server 1-2 is provided inside the building in addition to the control server 1-1 located outside the building.

FIG. 7 is a schematic diagram of the device control system S according to the present embodiment. In addition to the sensor device 2, devices 3, and gateway 4, which are the same as in the first embodiment, the device control system S includes a control server 1-1 connected to the network N outside the building and a control server 1-2 connected between the gateway 4 and the network N inside the building. The control server 1-1 and the control server 1-2 each have the same configuration as the control server 1 of FIG. 5.

In the present embodiment, the speed and accuracy of motion and voice detection are balanced by dividing processing and data between the control server 1-2 inside the building and the control server 1-1 outside the building. For example, the control server 1-2 inside the building performs detection processing with a small computational cost, and the control server 1-1 outside the building performs detection processing with a large computational cost. Alternatively, for example, the storage unit 13 of the control server 1-2 inside the building stores a small portion of the control database D and learning database DD, and the storage unit 13 of the control server 1-1 outside the building stores a large portion of them.

First, the control server 1-2 inside the building detects motion and voice based on the information from the sensor device 2. When the control server 1-2 cannot detect a motion and a voice, it forwards the information from the sensor device 2 to the control server 1-1 outside the building, and the control server 1-1 detects motion and voice based on that information. Whichever of the control server 1-2 and the control server 1-1 made the detection controls the device 3 based on the detected motion and voice.
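The local-first, remote-fallback flow can be sketched as follows. The two detector callables are placeholders for the lightweight detection on the in-building server and the heavier detection on the external server, and returning `None` to mean "could not detect" is a convention chosen for this sketch.

```python
def detect_with_fallback(sensor_data, local_detect, remote_detect):
    """Try the in-building detector first and forward to the external one only on failure.

    local_detect / remote_detect: callables that return a detection result, or None
    when nothing could be detected. Returns (result, which_server_answered).
    """
    result = local_detect(sensor_data)
    if result is not None:
        return result, "local"
    # Local detection failed: forward the same sensor data to the external server.
    return remote_detect(sensor_data), "remote"
```

Only the failed cases cross the network in this arrangement, which is the source of the reduced communication load the embodiment aims for.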

[Effects of the second embodiment]
In general, the control server 1-2 inside the building incurs no communication load because it does not go through the network N, but its hardware configuration is constrained by installation space and installation cost. The control server 1-1 outside the building, on the other hand, can have a high-performance hardware configuration and can therefore process at high speed, but it incurs a communication load. The device control system S according to the present embodiment therefore performs simple detection on the control server 1-2 inside the building and complex detection on the control server 1-1 outside the building. This configuration improves detection speed while maintaining the accuracy of motion and voice detection.

(Third Embodiment)
In the present embodiment, when the user's motion and voice produce a plurality of control target candidates, the user is notified of the candidates and can then select and control one device 3 among them.

[Flowchart of the device control method]
FIG. 8 is a flowchart of the device control method according to the present embodiment. In the flowchart of FIG. 8, steps S20 to S22 are performed after steps S15 and S16 and before step S17 of the flowchart of FIG. 6. The other steps are the same as in the flowchart of FIG. 6, so their description is omitted below.

When the device selection unit 113 has selected a plurality of devices 3 as control targets in step S15 or step S16, those devices 3 are treated as a plurality of control target candidates. When there are not multiple control target candidates (NO in S20), the control server 1 performs steps S17 to S19 in the same way as in the flowchart of FIG. 6. When there are multiple control target candidates (YES in S20), the device control unit 114 notifies the user of the candidates by a predetermined method (S21).

As one notification method, the device control unit 114 causes the devices 3 that are control target candidates to perform an operation such as blinking a lamp (that is, an intermittent operation). With this method, the user can learn which devices are candidates by seeing the devices 3 blink.

As another notification method, the device selection unit 113 selects a predetermined number of devices 3, two or more (for example, three), from the control target candidates according to a priority order stored in advance. The device control unit 114 then outputs, from a speaker (not shown) provided in the sensor device 2, audio indicating the devices 3 selected by the device selection unit 113. The audio indicating a device 3 may be audio stored in advance in the control server 1 for identifying that device 3, or audio set in the control server 1 by the user (for example, a recording of the user's voice). With this method, the user can learn the control target candidates by listening, without looking at the devices 3 themselves.

Having been notified of the control target candidates, the user makes a motion and speaks again so as to designate one control target device 3. The motion detection unit 111, the voice detection unit 112, and the device selection unit 113 accept a reselection of the control target and control content by detecting the user's motion and voice again (S22). The reselection in step S22 is accepted in the same way as in steps S11 to S16. The control server 1 then returns to step S20 and repeats the processing.

As another method, the device selection unit 113 may determine one device 3 among the control target candidates as the control target according to priorities stored in advance in the control server 1. In this case, the acceptance of a reselection in step S22 of the flowchart of FIG. 8 is omitted.

[Effects of the third embodiment]
The control server 1 according to the present embodiment provides the same effects as the first embodiment and, when the user's motion and voice produce a plurality of control target candidates, notifies the user of the candidates and accepts a reselection. Thus, even when a plurality of devices are selected unintentionally, for example because of the positions of the devices 3 or the learning content, the user learns that multiple candidates arose and can select a device 3 again, which suppresses control of an unintended device 3.

(Fourth Embodiment)
The present embodiment makes it possible to control devices 3, through the user's motion and voice, based on the room (range) in which the devices 3 are located.

[Designating a range R by pointing]
FIG. 9 is a schematic diagram showing a range R designated by the user's motion and voice in the present embodiment. FIG. 9 shows the interior of the building viewed from above. In FIG. 9, the range R designated by the user's motion and voice is drawn with a broken line.

Range information indicating the ranges R into which the building interior is divided is stored in the control server 1 in advance. Each range R is defined as a coordinate range relative to a predetermined reference point of the building. A range R is set, for example, for each room.

The motion detection unit 111 detects the user's motion and position from the image captured by the image sensor 21 of the sensor device 2. The voice detection unit 112 detects specific words and adjustment words in the audio acquired from the voice sensor 22 of the sensor device 2.

Words that designate a range R (for example, "that room") are stored in the control server 1 in advance. When the specific word detected by the voice detection unit 112 includes a word designating a range R, the device selection unit 113 identifies, from the range information, the range R located on the extension line of the user's finger, starting from the user's position detected by the motion detection unit 111.

When the specific word detected by the voice detection unit 112 includes a type of device 3 (for example, "air conditioner"), the device selection unit 113 selects, from among the devices 3 located within the identified range R, the device 3 associated in the control database D with the type D3 that matches the type indicated by the specific word. When the specific word detected by the voice detection unit 112 does not include a type of device 3, the device selection unit 113 selects all devices 3 located within the identified range R. With this configuration, the user can control the devices 3 located in a room by designating that room with a motion and voice.
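Selecting devices inside a designated range R, with an optional type filter, can be sketched as below. Representing a range as an axis-aligned rectangle `(xmin, ymin, xmax, ymax)` in floor-plan coordinates and the device record layout are assumptions; the description only states that a range is a coordinate range relative to a reference point of the building.

```python
def devices_in_range(devices, room, device_type=None):
    """Return the ids of devices inside `room`, optionally restricted to one type.

    devices: list of dicts with keys 'id' (D1), 'position' (D2), and 'type' (D3).
    room: (xmin, ymin, xmax, ymax) rectangle standing in for the range R.
    device_type: None selects every device in the room.
    """
    xmin, ymin, xmax, ymax = room
    selected = []
    for device in devices:
        x, y = device["position"]
        inside = xmin <= x <= xmax and ymin <= y <= ymax
        if inside and (device_type is None or device["type"] == device_type):
            selected.append(device["id"])
    return selected
```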

[Designation of range R by user movement]
Further, when the user moves out of a range R (room) and then, within a predetermined time, utters a voice containing an adjustment word such as "turn off", the device selection unit 113 may select the devices 3 located in the range R that the user was in before moving. In other words, the device selection unit 113 treats the user's movement out of the range R as a motion by which the user designates the range R.

Likewise, when the user moves out of a range R (room) and then, within a predetermined time, utters a voice containing both a specific word indicating a type of device 3 and an adjustment word, such as "turn off the air conditioner", the device selection unit 113 selects, from among the devices 3 located in the range R that the user was in before moving, the device 3 associated in the control database D with a type D3 that matches the type indicated by the specific word. With this configuration, the user can control the devices 3 in the room that the user has just left without pointing at that room.
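As a non-limiting illustration, the use of the user's exit from a range R as a designating motion might be sketched in Python as follows. The class name `ExitTracker`, the representation of a range by its name, and the 10-second default for the predetermined time are assumptions made for this sketch only.

```python
from typing import Optional


class ExitTracker:
    """Remembers the range the user most recently left and when."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        # timeout_s stands in for the "predetermined time"; the value is assumed.
        self.timeout_s = timeout_s
        self._current: Optional[str] = None
        self._left: Optional[str] = None
        self._left_at: Optional[float] = None

    def update_position(self, range_name: Optional[str], now: float) -> None:
        """Record the range that currently contains the user (None if outside all)."""
        if self._current is not None and range_name != self._current:
            self._left = self._current
            self._left_at = now
        self._current = range_name

    def range_for_utterance(self, now: float) -> Optional[str]:
        """Range designated by the exit, if the utterance came within the timeout."""
        if self._left is None or self._left_at is None:
            return None
        if now - self._left_at > self.timeout_s:
            return None
        return self._left
```

When `range_for_utterance` returns a range, the devices in that range can then be narrowed by the spoken device type, if any, in the same way as for a range designated by pointing.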

[Effects of the fourth embodiment]
The control server 1 according to the present embodiment provides the same effects as the first embodiment and, in addition, controls the devices 3 located in a room designated by motion and voice. This improves convenience for the user. For example, even when the user cannot see a device 3 in a particular room, the user can control the device 3 in that room by designating the room.

(Fifth Embodiment)
Whereas the first embodiment detects the user's motion and voice at all times, the present embodiment starts detecting the user's motion and voice only when a start voice from the user has been detected. This reduces the load placed on the control server 1 by motion and voice detection.

[Flowchart of device control method]
FIG. 10 is a flowchart of the device control method according to the present embodiment. In the flowchart of FIG. 10, steps S23 and S24 are performed before step S11 of the flowchart of FIG. 6. The other steps are the same as those in the flowchart of FIG. 6, so their description is omitted below.

First, until the start voice is detected, the control server 1 does not perform motion detection by the motion detection unit 111, and starts detection of the start voice by the voice detection unit 112. The voice detection unit 112 detects a predetermined start voice from the voice acquired from the voice sensor 22 of the sensor device 2 (S23). The start voice is a voice stored in the control server 1, such as "OK" or "Hello". The start voice may also be a voice that the user has recorded in the control server 1 in advance. For example, the voice detection unit 112 detects the start voice when the frequency spectrum of the voice acquired from the voice sensor 22 of the sensor device 2 contains at least a predetermined proportion of the frequency spectrum of the pre-recorded start voice. The predetermined proportion is a proportion at which the acquired voice can be regarded as matching the start voice, for example 90%. Alternatively, the voice detection unit 112 may detect the start voice when the degree of correlation between the frequency spectrum pattern of the voice acquired from the voice sensor 22 and the frequency spectrum pattern of the pre-recorded start voice is equal to or greater than a predetermined value.
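As a non-limiting illustration, the correlation-based variant of start voice detection might be sketched in Python as follows. The sketch assumes that the acquired frame and the pre-recorded start voice are sample sequences of equal length, computes magnitude spectra with a naive discrete Fourier transform, and uses the Pearson correlation coefficient as the degree of correlation. The function names are hypothetical, and the threshold of 0.9 is an assumed value; the embodiment states only that the degree of correlation is compared with a predetermined value.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no pattern to correlate
    return cov / math.sqrt(var_a * var_b)


def is_start_voice(frame: Sequence[float], template: Sequence[float],
                   threshold: float = 0.9) -> bool:
    """True when the spectral patterns correlate at or above the threshold."""
    if len(frame) != len(template):
        raise ValueError("frame and template must have the same length")
    return pearson(magnitude_spectrum(frame),
                   magnitude_spectrum(template)) >= threshold
```

Because only spectral patterns are compared and no language is extracted, a check of this kind stays far cheaper than full speech recognition; a practical system would likely use a fast Fourier transform rather than the quadratic-time transform shown here.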

The start voice serves only as a trigger for starting detection of the user's motion and voice; the device 3 is controlled only when the user subsequently performs a motion and utters a voice. Because the start voice does not need to be detected with high accuracy, computationally expensive processing, such as applying speech recognition technology to extract language from the start voice, is not performed. This greatly reduces the load on the control server 1 until the start voice is detected.

When the voice detection unit 112 does not detect the start voice in step S23 (NO in S24), the control server 1 returns to step S23 and repeats the process. When the voice detection unit 112 detects the start voice in step S23 (YES in S24), the control server 1 performs steps S1 to S19 in the same manner as in the flowchart of FIG. 6. The control server 1 may also stop motion detection once a predetermined time has elapsed since the start voice was detected, and return to step S23 to detect the start voice again. Detecting motion from images only during a predetermined period after the start voice is detected is preferable because it reduces the processing load and the communication load on the control server 1.
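As a non-limiting illustration, the gating of motion detection by the start voice, including the optional return to step S23 after the predetermined time, might be sketched in Python as follows. The class name `DetectionGate` and the 30-second default window are assumptions made for this sketch; the result of start voice detection is supplied by the caller.

```python
from typing import Optional


class DetectionGate:
    """Enables motion and voice detection only for a window after the start voice."""

    def __init__(self, window_s: float = 30.0) -> None:
        # window_s stands in for the "predetermined time"; the value is assumed.
        self.window_s = window_s
        self._started_at: Optional[float] = None

    def on_start_voice(self, now: float) -> None:
        """Called when step S23 detects the start voice (YES in S24)."""
        self._started_at = now

    def is_active(self, now: float) -> bool:
        """Whether motion detection should currently run."""
        if self._started_at is None:
            return False  # still waiting for the start voice (S23)
        if now - self._started_at >= self.window_s:
            self._started_at = None  # window elapsed: return to S23
            return False
        return True
```

While `is_active` returns False, only the lightweight start voice check needs to run, so image analysis is confined to the window that follows each start voice.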

The control server 1 may restrict the controllable devices 3, or the functions of the devices 3, based on feature information of the start voice detected by the voice detection unit 112. The feature information of a voice is, for example, the pitch of the voice or a voiceprint. The voice detection unit 112 detects the specific word and the adjustment word from the voice acquired from the voice sensor 22 of the sensor device 2, and also acquires the feature information of the voice. The device control unit 114 then determines, based on the feature information of the voice detected by the voice detection unit 112, whether to permit the control determined by the motion detection unit 111, the voice detection unit 112, and the device selection unit 113.

Specifically, for example, the device control unit 114 estimates the user's age from the pitch of the voice, permits the control when the estimated age is equal to or greater than a predetermined age, and otherwise rejects the control. As another example, the device control unit 114 identifies the user from the voiceprint of the voice, permits the control when the age of the identified user is equal to or greater than a predetermined age, and otherwise rejects the control. The device control unit 114 may also determine whether to permit control separately for each operation of the device 3 (turning on, turning off, adjusting, and so on). The rules for restriction based on the feature information of the start voice may be stored in the control server 1 in advance, or may be set freely by the user.

With this configuration, when the user is a child, for example, the control server 1 can restrict control of safety-related devices 3 such as heating devices (for example, an electric cooktop or an electric heater).
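As a non-limiting illustration, a per-operation permission check of the kind described above might be sketched in Python as follows. The rule table, the device type strings, the operation names, and the minimum age of 12 are all hypothetical; the user's age is taken as an input, since the embodiment does not specify how it is estimated from the pitch or looked up from the voiceprint.

```python
from typing import Dict, Tuple

# Hypothetical restriction rules: (device type, operation) -> minimum age.
# Pairs that are absent from the table are unrestricted.
RESTRICTION_RULES: Dict[Tuple[str, str], int] = {
    ("electric cooktop", "on"): 12,
    ("electric cooktop", "adjust"): 12,
    ("electric heater", "on"): 12,
    ("electric heater", "adjust"): 12,
}


def is_control_permitted(age: int, device_type: str, operation: str,
                         rules: Dict[Tuple[str, str], int] = RESTRICTION_RULES) -> bool:
    """Permit the control unless a rule sets a minimum age above the user's age."""
    min_age = rules.get((device_type, operation))
    return min_age is None or age >= min_age
```

In this sketch turning a heater off is deliberately left unrestricted, which reflects the statement that permission may be decided separately for each operation of the device 3.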

[Effects of the fifth embodiment]
The control server 1 according to the present embodiment provides the same effects as the first embodiment and, in addition, starts detecting the user's motion and voice when the start voice is detected. Because the control server 1 therefore does not need to monitor the user's motion and voice at all times, the load on the control server 1 can be reduced. Furthermore, by determining whether to permit control based on the feature information of the user's voice, the control server 1 can restrict control of specific devices 3 and improve safety.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, the specific form in which devices are distributed or integrated is not limited to the above embodiments, and all or part of the devices may be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by any combination of a plurality of the embodiments are also included in the embodiments of the present invention. The effects of a new embodiment produced by such a combination include the effects of the original embodiments.

The control unit 11 (processor) of the control server 1 is the entity that executes each step (process) included in the device control methods shown in FIGS. 6, 8, and 10. That is, the control unit 11 reads, from the storage unit 13, a device control program for executing the device control methods shown in FIGS. 6, 8, and 10, and executes the device control program to control each unit of the control server 1, thereby executing the device control methods shown in FIGS. 6, 8, and 10. Some of the steps included in the device control methods shown in FIGS. 6, 8, and 10 may be omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.

S device control system
1 control server
111 motion detection unit
112 voice detection unit
113 device selection unit
114 device control unit
115 learning unit
3 device

Claims

A device control device comprising:
a motion detection unit that detects a motion of a user from an image obtained by capturing the user;
a voice detection unit that detects a voice uttered by the user; and
a device control unit that performs, on a device designated by the motion detected by the motion detection unit, control corresponding to the voice detected by the voice detection unit,
wherein the device control unit performs the control corresponding to the voice detected by the voice detection unit on the device designated by the motion detected by the motion detection unit and by the voice detected by the voice detection unit, and
wherein, when a first device designated by the motion detected by the motion detection unit and a second device designated by the voice detected by the voice detection unit are different, the device control unit performs the control on one of the first device and the second device in accordance with a predetermined rule.

A device control device comprising:
a motion detection unit that detects a motion of a user from an image obtained by capturing the user;
a voice detection unit that detects a voice uttered by the user; and
a device control unit that performs, on a device designated by the motion detected by the motion detection unit, control corresponding to the voice detected by the voice detection unit,
wherein the device control unit performs the control corresponding to the voice detected by the voice detection unit on the device designated by the motion detected by the motion detection unit and by the voice detected by the voice detection unit, and
wherein, when a first device designated by the motion detected by the motion detection unit and a second device designated by the voice detected by the voice detection unit are different, the device control unit intermittently operates the first device and the second device.

A device control device comprising:
a motion detection unit that detects a motion of a user from an image obtained by capturing the user;
a voice detection unit that detects a voice uttered by the user; and
a device control unit that performs, on a device designated by the motion detected by the motion detection unit, control corresponding to the voice detected by the voice detection unit,
wherein the device control unit performs the control corresponding to the voice detected by the voice detection unit on the device located in a range designated by the motion detected by the motion detection unit, the motion indicating that the user has moved out of the range.

The device control device according to claim 1,
wherein the device control unit performs the control on one of the first device and the second device in accordance with the predetermined rule, the predetermined rule being set for each user.

The device control device according to any one of claims 1 to 4,
further comprising a learning unit that learns an association between the voice detected by the voice detection unit and the device to be controlled,
wherein the device control unit performs the control corresponding to the voice detected by the voice detection unit on the device that the learning unit has associated with the voice detected by the voice detection unit.

The device control device according to any one of claims 1 to 5,
wherein, when a plurality of the devices are designated by the motion detected by the motion detection unit, the device control unit intermittently operates the plurality of devices.

The device control device according to any one of claims 1 to 5,
wherein, when a plurality of the devices are designated by the motion detected by the motion detection unit, the device control unit outputs a sound indicating the plurality of devices.

The device control device according to any one of claims 1 to 5,
wherein, when a plurality of the devices are designated by the motion detected by the motion detection unit, the device control unit performs the control on at least one of the plurality of devices in accordance with priorities set in advance for the plurality of devices.

The device control device according to any one of claims 1 to 8,
wherein the device control unit performs the control when the length of the motion detected by the motion detection unit and the volume of the voice detected by the voice detection unit are equal to or greater than a predetermined threshold value.

A device control method comprising:
a step of detecting a motion of a user from an image obtained by capturing the user;
a step of detecting a voice uttered by the user; and
a step of performing, on a device designated by the motion detected in the step of detecting the motion, control corresponding to the voice detected in the step of detecting the voice,
wherein, in the step of performing the control, the control corresponding to the voice detected in the step of detecting the voice is performed on the device designated by the motion detected in the step of detecting the motion and by the voice detected in the step of detecting the voice, and
wherein, in the step of performing the control, when a first device designated by the motion detected in the step of detecting the motion and a second device designated by the voice detected in the step of detecting the voice are different, the control is performed on one of the first device and the second device in accordance with a predetermined rule.

A device control system comprising a device and a device control device that controls the device,
wherein the device control device includes:
a motion detection unit that detects a motion of a user from an image obtained by capturing the user;
a voice detection unit that detects a voice uttered by the user; and
a device control unit that performs, on the device designated by the motion detected by the motion detection unit, control corresponding to the voice detected by the voice detection unit,
wherein the device control unit performs the control corresponding to the voice detected by the voice detection unit on the device designated by the motion detected by the motion detection unit and by the voice detected by the voice detection unit,
wherein, when a first device designated by the motion detected by the motion detection unit and a second device designated by the voice detected by the voice detection unit are different, the device control unit performs the control on one of the first device and the second device in accordance with a predetermined rule, and
wherein the device operates in accordance with the control performed by the device control unit.