JP2018036902A

JP2018036902A - Equipment operation system, equipment operation method, and equipment operation program

Info

Publication number: JP2018036902A
Application number: JP2016170107A
Authority: JP
Inventors: Yuji Shinomura (篠村 祐司); Naoki Fujiwara (藤原 直樹); Kenji Izumi (泉 賢二)
Original assignee: Shimane Prefecture
Current assignee: Shimane Prefecture
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2018-03-08

Abstract

PROBLEM TO BE SOLVED: To provide an equipment operation system that prevents erroneous input, caused by inputs not intended as operations, to operation target equipment that is operated on the basis of gesture input and voice input. SOLUTION: An equipment operation system for operating equipment on the basis of gesture input and voice input comprises: operation intention confirmation means for confirming the presence or absence of an operation intention without executing an operation of the equipment by the gesture input or the voice input; and operation instruction means for instructing an operation of the equipment by at least one of the gesture input and the voice input. When it is determined that a specific gesture indicating an operation intention has been detected as the gesture input and that a specific voice keyword indicating an operation intention has been detected as the voice input, the operation intention confirmation means determines that an operation intention is present and proceeds to processing by the operation instruction means. SELECTED DRAWING: Figure 2

Description

The present invention relates to a device operation system, a device operation method, and a device operation program.

Description of the Related Art. Gesture input devices are conventionally known that use gesture recognition as an input user interface: the user's body movements and hand signs are captured as images by an imaging device such as a camera and recognized by image processing. For example, it is known to change the content shown on a display device in front of a vehicle's driver's seat by gesture operation (Patent Document 1), and to switch the viewpoint of an HMD (Head Mounted Display) by gesture input (Patent Document 2). In the HMD of Patent Document 2, the display can also be switched using input means other than gestures, such as button/switch input, line-of-sight input, voice input, myoelectric input, and electroencephalogram input.

Operating a device through a plurality of input means that include gesture input is also described in Patent Documents 3 and 4. Patent Document 3 describes a multimodal input user interface that integrates a speech recognition result with other input results, including gesture input, and outputs the interpretation with the highest likelihood. Patent Document 4 describes recognizing, for each type of input information among two or more different types, semantic information expressing what the input operation means, and combining two or more pieces of semantic information to make the target device execute a predetermined operation.

Patent Document 1: JP 2014-8818 A
Patent Document 2: JP 2014-115457 A
Patent Document 3: JP 2006-48628 A
Patent Document 4: JP 2012-103840 A

However, the conventional techniques have the problem that an unintended gesture input, for example, can cause a malfunction. When an operator makes some movement unconsciously, it is not easy for the system to determine whether that movement was intended by the operator as an input operation or was not meant as an operation at all. For example, gesture-operable signage (digital signage) might erroneously detect the movement of a person who simply walks past in front of it as an input operation, and then malfunction.

The present invention has been made in view of the above problem, and its object is to provide a device operation system, a device operation method, and a device operation program that can prevent erroneous input, caused by movements or utterances not intended as operations, to an operation target device that is operable on the basis of gesture input and voice input.

The invention described in one embodiment for solving the above problem is a device operation system that operates a device on the basis of gesture input and voice input, comprising: operation intention confirmation means for confirming the presence or absence of an operation intention without executing any operation of the device in response to gesture input or voice input; and operation instruction means for instructing an operation of the device by at least one of gesture input and voice input. When it is determined that a specific gesture indicating an operation intention has been detected as the gesture input and a specific voice keyword indicating an operation intention has been detected as the voice input, the operation intention confirmation means determines that an operation intention is present and proceeds to processing by the operation instruction means.

The invention described in another embodiment is a device operation method for operating a device on the basis of gesture input and voice input, comprising: an operation intention confirmation step of confirming the presence or absence of an operation intention without executing any operation of the device in response to gesture input or voice input; and an operation instruction step of instructing an operation of the device by at least one of gesture input and voice input. When it is determined in the operation intention confirmation step that a specific gesture indicating an operation intention has been detected as the gesture input and a specific voice keyword indicating an operation intention has been detected as the voice input, it is determined that an operation intention is present and the method proceeds to the operation instruction step.

The invention described in yet another embodiment is a device operation program that causes a computer to execute a device operation method for operating a device on the basis of gesture input and voice input. The device operation method comprises: an operation intention confirmation step of confirming the presence or absence of an operation intention without executing any operation of the device in response to gesture input or voice input; and an operation instruction step of instructing an operation of the device by at least one of gesture input and voice input. When it is determined in the operation intention confirmation step that a specific gesture indicating an operation intention has been detected as the gesture input and a specific voice keyword indicating an operation intention has been detected as the voice input, it is determined that an operation intention is present and the method proceeds to the operation instruction step.

FIG. 1 is a block diagram showing a configuration example of the device operation system according to the embodiment.
FIG. 2 is a flowchart showing an example of the operation of the device operation system according to the embodiment.
FIG. 3 is a flowchart showing an example of the gesture input determination process.
FIG. 4 is a flowchart showing an example of the voice input determination process.
FIG. 5 is a flowchart showing an example of the flag determination process.
FIG. 6 is a diagram showing an example of a timing chart of the processes of FIGS. 3 to 5.
FIGS. 7 to 9 are diagrams each showing a feedback display presented on a display device when one is provided.
FIG. 10 shows an example of light emission by a light-emitting lamp.
FIG. 11 is a diagram for explaining the setting of a priority operator.
FIG. 12 is a diagram for explaining switching of the priority operator.
FIG. 13 is a diagram for explaining the setting of multiple operators.
FIG. 14 is an explanatory diagram showing signage equipped with the device operation system and an operator operating it.
FIG. 15 is a diagram showing the TOP screen of the signage.
FIG. 16 is a diagram showing the relationship between operators and the feedback display.
FIG. 17 is a diagram showing the hierarchical structure of the signage menu items.
FIG. 18 is a diagram showing a configuration example of the device operation system when the operation target device is a home appliance.
FIGS. 19 to 22 are diagrams explaining the operation of the device operation system when the operation target device is a home appliance.
FIG. 23 is a diagram showing another configuration example of the device operation system when the operation target device is a home appliance.
FIGS. 24 to 26 are diagrams explaining the operation of the device operation system when the operation target device is a home appliance.
FIG. 27 is a diagram showing a configuration example of the device operation system incorporated into a control system for automobile equipment.
FIG. 28 is a diagram explaining the operation intention determination phase when the operation target device is automobile equipment.
FIGS. 29 to 32 are diagrams explaining the operation instruction phase when the operation target device is automobile equipment.

Embodiments of the present invention are described in detail below. The device operation system described in the embodiments is configured as a system that operates various devices, such as signage, PCs, smartphones, HMDs, home appliances, and automobile equipment, by gesture input and voice input.

(System configuration)
FIG. 1 is a block diagram showing a configuration example of the device operation system according to the embodiment. As shown in FIG. 1, the device operation system 1 comprises a central processing unit (CPU) 11, a read-only memory (ROM) 12, a random access memory (RAM) 13, a voice input unit 14, an imaging unit 15, an output unit 16, a device function processing unit 17, and connection means 10 that connects them.

The device function processing unit 17 performs the processing through which the operation target device delivers its own function; for a lighting fixture, for example, it is the lighting device itself. The device function processing unit 17 may be a separate device, or it may be integrated with the CPU 11, ROM 12, and RAM 13.

The CPU 11 is connected to the components 12 through 17 via connection means 10 such as a bus and transfers control signals and data. In response to voice input at the voice input unit 14 and gesture input via the imaging unit 15, it executes the programs and performs the arithmetic processing that realize the operation of the device operation system 1 as a whole.

The connection means 10 connects the components of the device operation system 1, for example as a bus. When the components are independent devices, the connection means 10 connects them by wire or wirelessly.

The ROM 12 stores the programs and data required for the operation of the device operation system 1 as a whole. These programs may be stored on a recording medium such as a DVD-ROM, HDD, or SSD, loaded into the RAM 13, and executed by the CPU 11 to perform the processing of the device operation system 1 of this embodiment. Instead of such a ROM, the required programs and data can of course also be obtained from a cloud service.

The RAM 13 temporarily holds the programs, created according to software, that perform the voice input processing and gesture input processing described later, as well as voice input data and gesture input data.

The voice input unit 14 is the means for voice input; for example, a sound-collecting microphone or a directional microphone can be used.

The imaging unit 15 is the means for gesture input. An RGB camera, an infrared camera, or a range image camera (for example, time-of-flight) that can capture the movement of the operator's body, or an ultrasonic camera or stereo camera that can detect that movement, can be used. The installation position of the imaging unit 15 may be chosen according to which of the operator's movements are used for the operation determination. Gesture input is detected from the movement of the operator's body captured by the imaging unit 15; for example, when hand movements serve as gesture input, the hand movement is extracted from the captured body movement and detected as the gesture input.

The output unit 16 provides various outputs; for example, a display device for visual feedback or a speaker for audio feedback can be used. When the operation target device itself has a display device, speaker, or the like, these may also serve as the output unit 16.

(Processing flow)
FIG. 2 is a flowchart showing an example of the operation of the device operation system according to the embodiment. FIG. 3 is a flowchart showing an example of the gesture input determination process, FIG. 4 is a flowchart showing an example of the voice input determination process, FIG. 5 is a flowchart showing an example of the flag determination process S1, and FIG. 6 is a timing chart showing an example of the processes of FIGS. 3 to 5. The operation of the device operation system is described below on the basis of FIGS. 2 to 6, with reference to FIG. 1.

When the device operation system 1 is powered on, for example, the program that executes the operation shown in FIG. 2 starts, and the gesture input determination process shown in FIG. 3 and the voice input determination process shown in FIG. 4 begin. These processing programs and the data they require are stored in advance in storage means such as the ROM 12. In the device operation system 1, the CPU 11 shown in FIG. 1 cooperates with the other units of the system and, while running the gesture input determination process and the voice input determination process on the operator's input, executes in sequence an operation intention determination phase F1, which determines the operator's intention, and an operation instruction phase F2, which actually instructs operations.

The operation intention determination phase F1 is the phase in which, when some input is received, the system determines whether the operator intends to operate the device; no operation is actually performed on the device during this phase. The subsequent operation instruction phase F2 is the phase in which, once it has been confirmed that the operator intends to operate, the device is actually operated in concrete ways according to the operator's input.

In the operation intention determination phase F1, operation input through both means, gesture input and voice input, is accepted, but even when input through both means is complete, nothing is done to operate the actual device. That is, in principle the device function processing unit 17 performs no processing during phase F1. Besides confirming the operator's intention, phase F1 can also be used to confirm that the system is recognizing the operator's inputs (gesture input and voice input), or to let the operator learn which input conditions the system can recognize (where gestures should be made, how loudly to speak, and so on). Because input during the operation intention determination phase is not treated as input for actually operating the device, inputs not intended as operations cannot trigger inadvertent operation.

In the operation intention determination phase F1, the operator's intention to operate is confirmed by executing the flag determination process S1 shown in FIG. 5, based on the specific gesture determination flag set by the gesture input determination process of FIG. 3 and the specific voice determination flag set by the voice input determination process of FIG. 4.
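Because the internals of the flag determination process S1 (FIG. 5) are not reproduced in this section, the following Python sketch assumes only what the abstract states: phase F1 hands over to phase F2 once both the specific gesture determination flag and the specific voice determination flag are ON. The names `Phase`, `flag_determination`, and `next_phase` are hypothetical.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical labels for the two phases in FIG. 2."""
    INTENTION_DETERMINATION = auto()  # phase F1: inputs never operate the device
    OPERATION_INSTRUCTION = auto()    # phase F2: inputs operate the device


def flag_determination(gesture_flag: bool, voice_flag: bool) -> bool:
    """Sketch of S1: an operation intention is present only when the specific
    gesture flag and the specific voice flag are both ON."""
    return gesture_flag and voice_flag


def next_phase(phase: Phase, gesture_flag: bool, voice_flag: bool) -> Phase:
    """Advance from F1 to F2 once S1 confirms intention; otherwise stay put.
    How the system later returns from F2 to F1 is not modeled here."""
    if phase is Phase.INTENTION_DETERMINATION and flag_determination(gesture_flag, voice_flag):
        return Phase.OPERATION_INSTRUCTION
    return phase


# Usage: a gesture alone is not enough; gesture plus keyword moves to F2.
phase = Phase.INTENTION_DETERMINATION
phase = next_phase(phase, gesture_flag=True, voice_flag=False)  # still F1
phase = next_phase(phase, gesture_flag=True, voice_flag=True)   # now F2
```

Requiring both flags is what keeps a passer-by's stray movement, or a keyword overheard in conversation, from reaching phase F2 on its own.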

The gesture input determination process is described first. In the gesture input determination process shown in FIG. 3, the system determines that a specific gesture has been made and switches the specific gesture determination flag from OFF to ON. First, as shown in FIG. 3, the imaging unit 15 acquires one frame of an image of the operator (S31), and gesture input detection processing is performed on that frame (S32).

The gesture input detection process (S32) determines whether a gesture is present in the acquired image and, if one is, further determines what kind of gesture it is. "What kind of gesture" refers to the content of the gesture, for example: the shape of the hand, such as a fist ("rock") or an open palm ("paper"); the orientation of the palm, such as facing right or facing left; the state of the arm, such as bent or extended; or any other state of a body part such as the hands, feet, or head, together with its position where necessary. These are output as the determination result.

Next, the specific gesture determination flag setting process is performed (S33). In this process, a specific gesture determination is made as to whether a specific gesture is being performed, based on the gesture input detection result for the current frame from S32 and the history of past gesture input detection results; the determination flag is then set ON or OFF (also called valid or invalid, respectively) according to the result.

In the specific gesture determination process, each of a predetermined number of gesture input detection results, from past frames up to the current frame (the current frame alone is also acceptable), is checked for a match with a specific gesture stored in advance. The specific gestures are stored beforehand in a database or the like provided in the ROM 12 or similar storage; each gesture input is compared with the stored specific gestures for a match (including a partial match), and when they match, the specific gesture is determined to have been detected. In the operation intention determination phase F1, matches can be checked against one or more predetermined specific gestures that indicate an intention to operate.
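To illustrate the output of S32 and the matching step, the sketch below represents a detection result as a small record and treats each stored specific gesture as a template. The source does not define "partial match"; here it is assumed to mean that only the attributes a template specifies must agree. All names and attribute values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Iterable, Optional


@dataclass(frozen=True)
class GestureResult:
    """Hypothetical container for one S32 detection result. The source lists
    hand shape, palm orientation, and arm state among possible contents."""
    hand_shape: Optional[str] = None       # e.g. "rock" (fist) or "paper" (open palm)
    palm_direction: Optional[str] = None   # e.g. "left" or "right"
    arm_state: Optional[str] = None        # e.g. "bent" or "extended"


def matches(detected: GestureResult, template: GestureResult) -> bool:
    """Partial match (assumed interpretation): every attribute that the stored
    template specifies must equal the detected value; attributes the template
    leaves as None are ignored."""
    for f in fields(GestureResult):
        expected = getattr(template, f.name)
        if expected is not None and getattr(detected, f.name) != expected:
            return False
    return True


def is_specific_gesture(detected: GestureResult,
                        templates: Iterable[GestureResult]) -> bool:
    """True if the detection matches any stored specific gesture
    (`templates` stands in for the database in the ROM 12)."""
    return any(matches(detected, t) for t in templates)
```

For instance, a template `GestureResult(hand_shape="paper")` accepts an open palm regardless of palm direction or arm state.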

If, among the predetermined number of gesture input detection results from past frames up to the current frame, the number that match the specific gesture is at or above a threshold, the determination flag for that specific gesture is changed to ON (or kept ON); if it is below the threshold, the flag is changed to OFF (or kept OFF). For example, with a flag-change threshold of 3, the flag can be changed to ON when the specific gesture is determined to have been detected in 3 of 5 consecutive frames including the current frame, and changed to OFF when the specific gesture was not detected in 3 of those frames. Basing the determination on a threshold over a predetermined number of frames prevents erroneous detection caused by momentary body shake and the like. The threshold for turning the flag ON may differ from the threshold for turning it OFF.
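The window-and-threshold rule can be sketched as follows. The window of 5 frames and ON threshold of 3 follow the example above; the separate OFF threshold reflects the note that the two thresholds may differ, and its default value is an assumption. With equal thresholds, the flag is simply ON whenever at least 3 of the last 5 frames matched. The class name is hypothetical.

```python
from collections import deque


class GestureFlag:
    """Sketch of S33: set the specific gesture flag from a sliding window.

    window: number of most recent frames considered (5 in the example)
    on_threshold: matches needed to turn the flag ON (3 in the example)
    off_threshold: the flag turns OFF when matches fall below this value
        (an assumed parameter; equal to on_threshold by default)
    """

    def __init__(self, window: int = 5, on_threshold: int = 3,
                 off_threshold: int = 3):
        # deque(maxlen=...) silently drops the oldest frame once full.
        self.history = deque(maxlen=window)
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.flag = False

    def update(self, matched: bool) -> bool:
        """Record whether the current frame matched, then update the flag."""
        self.history.append(matched)
        count = sum(self.history)
        if not self.flag and count >= self.on_threshold:
            self.flag = True
        elif self.flag and count < self.off_threshold:
            self.flag = False
        return self.flag
```

Setting `off_threshold` below `on_threshold` gives hysteresis: once ON, the flag survives a brief dip in matches, which is one way to realize different ON and OFF thresholds.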

After the specific gesture determination flag setting process (S33), the gesture detection result history storage process (S34) is performed. In this process, the content of the gesture detected in the gesture input detection process S32 is recorded and stored as history. The history may, for example, be stored successively in a database or the like provided in the RAM 13. Alternatively, the history may record only the presence or absence of a specific gesture indicating an operation intention during the operation intention determination phase F1, and record the content of the gesture detected in S32 during the operation instruction phase F2.

The history storage process (S34) need only follow the gesture input detection process S32; it may also be performed before the specific gesture determination flag setting process S33. When the flag setting process uses only the current frame, rather than the gesture detection results of multiple frames as described above, the history storage process S34 can be omitted.

When the history storage process (S34) is complete, processing returns to image acquisition (S31), and gesture input determination is performed for the next frame.

The gesture input determination process above is described for the case of one image per frame, but multiple images may be acquired per frame. In that case, the specific gesture may be a set of still images or a moving image. Also, to detect a gesture that unfolds over time, such as a swipe in which the hand is waved left or right, the images acquired in each frame over a certain period (a series of still images) may be stored and gesture detection performed on the consecutive images.
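For a gesture that unfolds over time, such as a swipe, one simple approach (an assumption, not the method described in the source) is to track the hand's horizontal position across the stored consecutive frames and require both a large enough net displacement and mostly one-directional motion. The input format, function name, and both thresholds are hypothetical.

```python
from typing import Optional, Sequence


def detect_swipe(xs: Sequence[float], min_distance: float = 0.3,
                 min_monotonic_ratio: float = 0.8) -> Optional[str]:
    """Classify a horizontal swipe from hand x-positions in consecutive frames.

    xs: normalized hand x-coordinates (0.0 = left edge, 1.0 = right edge),
        one per stored frame, oldest first (assumed input format).
    Returns "right", "left", or None when no swipe is recognized.
    """
    if len(xs) < 2:
        return None
    displacement = xs[-1] - xs[0]
    if abs(displacement) < min_distance:
        return None  # hand did not travel far enough
    steps = [b - a for a, b in zip(xs, xs[1:])]
    # Most frame-to-frame steps must move in the swipe direction, so that
    # back-and-forth jitter is not mistaken for a swipe.
    same_direction = sum(1 for s in steps if s * displacement > 0)
    if same_direction / len(steps) < min_monotonic_ratio:
        return None
    return "right" if displacement > 0 else "left"
```

A production system would more likely use a trained temporal model, but the same idea applies: the decision is made over a sequence of frames rather than a single image.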

In history storage, the time stamp at which each gesture input was detected may also be saved together with the determination result. The specific gesture determination flag setting process can then exclude gesture inputs whose time stamps are too old.
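A minimal sketch of time-stamped history storage follows, assuming entries are appended in time order and that "too old" means older than a fixed age. The class name and `max_age_s` are hypothetical; timestamps can be passed explicitly (for testing) or taken from `time.monotonic()`.

```python
import time
from collections import deque
from typing import Deque, List, Optional, Tuple


class GestureHistory:
    """Sketch of S34 with time stamps.

    max_age_s: entries older than this are ignored by the flag setting
        process (assumed value; the source only says "too old").
    Entries are assumed to be added in non-decreasing timestamp order.
    """

    def __init__(self, max_age_s: float = 1.0):
        self.max_age_s = max_age_s
        self.entries: Deque[Tuple[float, bool]] = deque()  # (timestamp, matched)

    def add(self, matched: bool, timestamp: Optional[float] = None) -> None:
        ts = time.monotonic() if timestamp is None else timestamp
        self.entries.append((ts, matched))

    def recent(self, now: Optional[float] = None) -> List[bool]:
        """Discard stale entries and return the match results still in range."""
        now = time.monotonic() if now is None else now
        while self.entries and now - self.entries[0][0] > self.max_age_s:
            self.entries.popleft()
        return [matched for _, matched in self.entries]
```

The list returned by `recent()` can feed the window-and-threshold check of S33, so that a stale match no longer counts toward turning the flag ON.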

Next, the voice input determination process is described. In the voice input determination process shown in FIG. 4, the system determines whether a specific voice input has been received and changes the specific voice determination flag accordingly. First, voice is acquired by the voice input unit 14 (S41), and voice recognition is performed on the acquired voice (S42). When some voice input occurs after a silent state (a state at or below a predetermined sound input level), voice acquisition in S41 starts and continues for a predetermined time. Alternatively, instead of for a fixed time, voice may be acquired continuously until the silent state returns. The voice recognition process (S42) converts the voice signal into language by extracting voice keywords from the acquired voice.
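The start and stop rules for voice acquisition (S41) can be illustrated on a stream of per-chunk sound levels. This sketch assumes the levels have already been computed (for example as the RMS of each audio chunk) and models both variants described above: capture for a fixed number of chunks, or capture until the level returns to silence. The function name and parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def segment_utterances(levels: Sequence[float], silence_level: float,
                       fixed_chunks: Optional[int] = None) -> List[range]:
    """Find capture segments in a stream of per-chunk sound levels (S41 sketch).

    A segment starts at any chunk louder than silence_level that is not
    already inside a segment. If fixed_chunks is given, capture lasts that
    many chunks (the "predetermined time" variant); otherwise it continues
    until the level drops back to silence_level or below.
    Returns the chunk index ranges that would be passed to recognition (S42).
    """
    segments: List[range] = []
    i, n = 0, len(levels)
    while i < n:
        if levels[i] <= silence_level:
            i += 1  # still silent: keep waiting
            continue
        start = i
        if fixed_chunks is not None:
            end = min(start + fixed_chunks, n)
        else:
            end = start
            while end < n and levels[end] > silence_level:
                end += 1
        segments.append(range(start, end))
        i = end
    return segments
```

The "until silence" variant adapts to utterance length, while the fixed-length variant bounds the recognition workload per utterance.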

Next, the specific voice determination flag setting process is performed (S43). This process turns the determination flag ON or OFF according to whether a specific voice keyword has been uttered. That is decided by checking whether any recognized voice keyword matches a specific voice keyword. The specific voice keywords are stored in advance in a database or the like provided in the ROM 12 or elsewhere. Each recognized voice keyword is compared with them, and a specific voice keyword is judged to have been detected when the recognized keyword matches one of them. In the operation intention determination phase F1, matching can be checked against one or more specific voice keywords that indicate a predetermined operation intention.

Alternatively, the specific voice determination flag may be turned ON or OFF by checking the history data of voice keyword detection results, within a certain time up to and including the current frame, for a match with a specific voice keyword. If a match is found and a plurality of voice keywords has not been detected within that time in the history data including the current frame, the determination flag is turned ON and kept ON for a certain time. This prevents ordinary conversation from being judged as an utterance intended by the operator. Even when the current frame contains no specific voice keyword, the flag is maintained if a specific voice keyword was detected within the past certain time. When the determination flag does not need to change, its state is maintained as is.
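A minimal sketch of one reading of this history-based rule follows. The flag is ON only while a specific keyword lies inside a look-back window and the window does not contain too many keywords overall, which would suggest conversation. The function, its parameters, and the `max_keywords` threshold are assumptions for illustration. The plain matching rule of FIG. 6(b) corresponds to dropping the conversation check.

```python
from typing import Iterable, Set, Tuple


def specific_voice_flag(history: Iterable[Tuple[float, str]], now: float,
                        specific_keywords: Set[str], window: float,
                        max_keywords: int = 1) -> bool:
    """Return the specific voice determination flag from keyword history.

    history: (timestamp, recognized keyword) pairs, as saved in S44.
    window:  look-back time in seconds; also how long the flag stays ON.
    max_keywords: more keywords than this inside the window is treated as
                  ordinary conversation (an interpretive assumption).
    """
    recent = [kw for ts, kw in history if 0 <= now - ts <= window]
    if not any(kw in specific_keywords for kw in recent):
        return False  # no specific keyword in the window: flag OFF
    # Specific keyword present: ON unless the window looks like conversation.
    return len(recent) <= max_keywords
```

Because the flag is recomputed from time-stamped history, it stays ON for `window` seconds after the keyword without a separate timer.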

After the specific voice determination flag setting process (S43), the voice keyword detection results are saved as history (S44). The voice keywords extracted in the voice recognition process (S42) are recorded together with their time stamps. Storing the time stamps as well as the keywords makes it possible to reset a flag once a predetermined time has elapsed. It also makes it easy to check, in S43, which keywords occurred within the past certain time.

The history saving process (S44) only needs to come after S42, so it may also be performed before S43. If the specific voice determination flag setting process does not use the history and the flag is not reset after a predetermined time, the history saving process S44 can be omitted.

When the history saving process (S44) is complete, the process returns to the voice acquisition process (S41), and the voice input determination process is performed on the next voice.

Returning to FIG. 2, the flag determination process S1 determines operation intention from the states of the flags that hold the results of the gesture input determination process and the voice input determination process described above. It follows the procedure shown in FIG. 5, which is described here. First, operation intention is determined from the gesture input and voice input determination results (S51), by checking whether the specific gesture determination flag and the specific voice determination flag are ON or OFF. Specifically, when both flags are OFF, or when only one of the two flags is ON, it is determined that there is no operation intention (the process proceeds to S53).

However, when only one of the gesture determination flag and the voice determination flag is ON, a standby mode flag can be turned ON (S54) to display a prompt for the input that would change the other flag. While the standby mode flag is ON, the display means or the like (corresponding to the output unit 16 in FIG. 1) can indicate that the system is waiting for the other input.

On the other hand, when both the gesture determination flag and the voice determination flag are ON, it is determined that there is an operation intention (the process proceeds to S52). If the standby mode flag was turned ON in S54, it is returned to OFF once an operation intention is determined.
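The decision in S51 to S54 reduces to a small truth table, sketched here. The result type, the `waiting_for` hint for the prompt display, and the function name are hypothetical. The source does not say what happens to the standby flag when both flags are OFF, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntentionResult:
    intention: bool             # True -> S52 (operation intention present)
    standby: bool               # standby-mode flag (S54)
    waiting_for: Optional[str]  # which input the display should prompt for


def determine_intention(gesture_flag: bool, voice_flag: bool,
                        standby: bool = False) -> IntentionResult:
    if gesture_flag and voice_flag:
        # Both ON: intention present; clear any standby flag set in S54.
        return IntentionResult(True, False, None)
    if gesture_flag:
        return IntentionResult(False, True, "voice")    # prompt for the voice keyword
    if voice_flag:
        return IntentionResult(False, True, "gesture")  # prompt for the gesture
    # Both OFF: no intention (S53); standby flag left as it was.
    return IntentionResult(False, standby, None)
```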

The relationship among the gesture input determination process of FIG. 3, the voice input determination process of FIG. 4, and the flag determination process of FIG. 5 is now described. In FIG. 6, (a) shows the gesture input determination process, (b) the voice input determination process, and (c) the flag determination process. The frame counts differ among (a) to (c), which shows that the three processes run asynchronously.

In the gesture determination process of FIG. 6(a), “Goo” (a closed fist, as in rock-paper-scissors) is the specific gesture indicating operation intention. The specific gesture determination flag is turned ON when two of the gestures detected in the past three frames, including the current frame, are judged to be the specific gesture. It is turned OFF when none of them is. In this example, the specific gesture has been detected twice in a row at the third frame, so the flag is turned ON there. At the ninth frame, “Goo” has not been detected in any of the past three frames, so the flag is turned OFF.
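The three-frame rule can be sketched as a sliding window with hysteresis. The flag turns ON at two or more hits, turns OFF at zero hits, and otherwise keeps its previous state. The function name and parameters are hypothetical, and the input sequence in the usage below is made up rather than taken from FIG. 6(a).

```python
from collections import deque
from typing import Deque, Iterable, List


def gesture_flag_timeline(detections: Iterable[str], target: str = "goo",
                          window: int = 3, on_count: int = 2) -> List[bool]:
    """Return the specific gesture determination flag after each frame."""
    recent: Deque[bool] = deque(maxlen=window)  # hits in the last `window` frames
    flag = False
    timeline: List[bool] = []
    for gesture in detections:
        recent.append(gesture == target)
        hits = sum(recent)
        if hits >= on_count:
            flag = True
        elif hits == 0:
            flag = False
        # 0 < hits < on_count: keep the previous state
        timeline.append(flag)
    return timeline
```

With the sequence `["paper", "goo", "goo", "goo", "paper", "goo", "paper", "paper", "paper"]`, the flag turns ON at the third frame and OFF at the ninth. The gap between the ON and OFF thresholds keeps the flag from flickering on single missed detections.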

In the voice input determination process of FIG. 6(b), “start” is the specific voice keyword indicating operation intention, and the specific voice determination flag is turned ON when a recognized voice keyword matches it. In the first frame, the operator says “weather forecast”, which differs from the specific voice keyword, so the flag remains OFF. In the second frame, the operator says “start”, which matches, so the flag turns ON at the second frame.

In the flag determination process of FIG. 6(c), both the specific gesture determination flag and the specific voice determination flag are confirmed to be ON at the fifth frame. It is therefore determined that there is an operation intention, and both flags are reset to OFF as necessary.
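Because the three processes run asynchronously, the flags they share need some form of synchronization. The sketch below assumes each determination loop runs in its own thread. A lock lets the flag determination read both flags and reset them in one atomic step. `FlagBoard` and its methods are hypothetical names.

```python
import threading


class FlagBoard:
    """Shared flags written by independent (asynchronous) determination loops.

    Hypothetical structure: the gesture and voice processes each write only
    their own flag; the flag determination process reads both and resets
    them atomically when an operation intention is found.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._gesture = False
        self._voice = False

    def set_gesture(self, value: bool) -> None:
        with self._lock:
            self._gesture = value

    def set_voice(self, value: bool) -> None:
        with self._lock:
            self._voice = value

    def consume_intention(self) -> bool:
        """Return True if both flags are ON, resetting both to OFF."""
        with self._lock:
            if self._gesture and self._voice:
                self._gesture = self._voice = False
                return True
            return False
```

Doing the check and the reset under one lock prevents another thread from changing a flag between the two steps.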

Returning again to FIG. 2, the flag determination process S1 in the operation intention determination phase F1 determines whether an operation intention exists. On that basis it is decided whether to move to the operation instruction phase F2 (S2). If there is an operation intention (S2: Yes), the process moves to the operation instruction phase F2. If not (S2: No), it does not move to F2 and returns to the flag determination process S1.

The operation instruction phase F2 is the phase in which user input is interpreted as an operation instruction and processing is performed according to that instruction. In this phase, input need not use both gesture and voice. An operation can be executed once at least one of the gesture input determination process of FIG. 3 and the voice input determination process of FIG. 4 has completed.

In the operation instruction phase F2, the specific gestures in the gesture input determination process (FIG. 3) are no longer the gesture used to confirm operation intention. Instead, they denote the content of actual operations. Likewise, the specific voice keywords in the voice input determination process (FIG. 4) are no longer the keyword used to confirm operation intention but keywords that denote the content of actual operations. When there are multiple operations, each has its own specific gesture and specific voice keyword. Each input determination process (FIGS. 3 and 4) turns a separate flag ON or OFF for each specific gesture and each specific voice keyword. The specific gesture determination flags and specific voice determination flags thus indicate which operation's input has been accepted.

In the operation instruction phase F2, whether an operation instruction has been given is determined from the results of these processes (S3). The decision uses the specific gesture determination flags and specific voice determination flags that the input determination processes (FIGS. 3 and 4) may change. An operation instruction can be judged to have been given when at least one of the specific gesture determination flag and the specific voice determination flag is ON. Depending on the operation, however, both flags being ON may be made the condition for the instruction.
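A sketch of the S3 check with per-operation flags follows. By default either flag is enough to trigger an operation, while operations listed in a hypothetical `requires_both` set need both flags. The operation names and the alphabetical tie-breaking order are illustrative assumptions.

```python
from typing import Dict, Optional, Set


def find_instruction(gesture_flags: Dict[str, bool], voice_flags: Dict[str, bool],
                     requires_both: Optional[Set[str]] = None) -> Optional[str]:
    """Return the operation whose flags satisfy its input condition, if any.

    gesture_flags / voice_flags: one flag per operation (hypothetical names).
    requires_both: operations that need gesture AND voice; all others need
    at least one of the two.
    """
    requires_both = requires_both or set()
    for op in sorted(set(gesture_flags) | set(voice_flags)):  # deterministic order
        g = gesture_flags.get(op, False)
        v = voice_flags.get(op, False)
        satisfied = (g and v) if op in requires_both else (g or v)
        if satisfied:
            return op
    return None
```

Requiring both inputs only for selected operations is one way to guard destructive commands, such as power-off, while keeping everyday commands single-modal.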

When it is determined that an operation instruction has been given (S3: Yes), the operation target system is controlled (S4). The system issues to the operation target the instruction that matches the predetermined input pattern (gesture only, voice only, or gesture and voice) judged to have been input in the determination processes of FIGS. 3 and 4. Controlling the operation target system (S4) means that the device function processing unit 17 operates a function of the target device. For lighting, for example, this means turning the light on or dimming it. For a TV, it means turning the TV on or off, selecting a channel, or adjusting the volume.

If S3 determines that there is no operation instruction, or cannot determine it (S3: No), or once control of the operation target system (S4) is complete, it is determined whether the operation instruction phase F2 has ended (S5).

The operation instruction phase is determined to have ended (S5: Yes) in any of these cases:
(1) “End” is given as an operation instruction.
(2) No operation instruction is given within a certain time.
(3) Both the gesture determination flag and the voice determination flag remain OFF for a certain time.
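The three end conditions of S5 can be checked as shown below. This is a minimal sketch: the state fields, the timeout values, and the function name are assumptions, and the source only says "a certain time" without giving values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionPhaseState:
    last_instruction: Optional[str]        # most recent accepted instruction
    last_instruction_time: float           # time (s) of that instruction or of phase entry
    both_flags_off_since: Optional[float]  # None while either flag is ON


def instruction_phase_finished(state: InstructionPhaseState, now: float,
                               idle_timeout: float = 30.0,
                               flags_off_timeout: float = 10.0) -> bool:
    if state.last_instruction == "end":                     # condition (1)
        return True
    if now - state.last_instruction_time >= idle_timeout:   # condition (2)
        return True
    if (state.both_flags_off_since is not None
            and now - state.both_flags_off_since >= flags_off_timeout):  # condition (3)
        return True
    return False
```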

If the operation instruction phase is determined not to have ended (S5: No), the process returns to the operation instruction determination process S3 at the start of the operation instruction phase F2.

If the operation instruction phase is determined to have ended (S5: Yes), the process returns to the operation intention determination phase F1 and performs the flag determination process S1 again.

(Operation feedback)
In the device operation system of this embodiment, operation results are fed back to the operator. Feedback can be shown on display means or output from an audio output device. The form of the feedback display depends on whether the output unit 16 (see FIG. 1) includes a display device. FIGS. 7 to 9 show feedback displays presented on the display device when one is provided.

In the operation intention determination phase F1, feedback can be given when the gesture determination flag and the voice determination flag change. In the operation instruction phase, feedback can be given when at least one of them changes.

Feedback for gesture input and feedback for voice input can be shown as separate displays, so the operator can tell which input each display refers to. As shown in FIG. 7, a palm-shaped icon can show gesture input feedback and a speech-balloon icon can show voice input feedback. As also shown in FIG. 7, the display state, such as the icon color, may differ between the operation intention determination phase and the operation instruction phase.

For example, the feedback display shows one of four display modes according to the input status:
- gesture (voice) input standby
- gesture (voice) input in progress
- gesture (voice) input completed
- gesture (voice) input failed

By viewing the display mode, the operator can check the status of the input.

The remaining time of the operation instruction phase may also be shown as feedback. For example, when the phase is set to end automatically after a predetermined time without input, the display may show a message such as “The operation instruction phase will end in 30 seconds if nothing is done.” The remaining time may also be indicated by a change in the color of the gesture lamp, a meter display, or a separate state transition lamp showing the operation intention and operation instruction phases.

When hand movements are used as gestures, for example, the hand must be guided to a position suitable for hand recognition. The operator's gesture position may be unsuitable: the hand may be too close to or too far from the sensor, offset from the camera, or not in the proper shape. In such cases, the display device can indicate the problem as feedback to the operator, as shown in FIGS. 8 and 9.

In FIG. 8, when the hand is at the “middle” depth and inside the “spot”, the display indicates that the hand is within the appropriate position range. When the hand is too far forward or back, the display prompts the operator to move the hand to the appropriate position.

As shown in FIG. 9, the depth of the hand may be appropriate while the hand is offset left, right, up, or down from the spot, or while the proper motion (gesture or pose) is not being performed. In these cases, the display prompts the operator to move the hand to the appropriate position or to perform the proper motion.
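The checks behind FIGS. 8 and 9 can be sketched as a function that turns a measured hand position into guidance hints. It assumes normalized, unmirrored image coordinates with y increasing downward and a depth value from a depth sensor. The spot center, tolerance, near and far limits, and message texts are all hypothetical.

```python
from typing import List, Tuple


def hand_guidance(x: float, y: float, depth: float, *,
                  center: Tuple[float, float] = (0.5, 0.5),
                  tolerance: float = 0.15,
                  near: float = 0.4, far: float = 0.8) -> List[str]:
    """Return guidance hints; an empty list means the hand is in the spot.

    Assumptions: x, y are normalized image coordinates (y grows downward,
    image not mirrored); depth is in metres; all limits are illustrative.
    """
    hints: List[str] = []
    # Depth check (FIG. 8): too close to or too far from the sensor.
    if depth < near:
        hints.append("too close: move your hand away")
    elif depth > far:
        hints.append("too far: move your hand closer")
    # Planar offset check (FIG. 9): outside the spot left/right/up/down.
    dx, dy = x - center[0], y - center[1]
    if dx < -tolerance:
        hints.append("move right")
    elif dx > tolerance:
        hints.append("move left")
    if dy < -tolerance:
        hints.append("move down")
    elif dy > tolerance:
        hints.append("move up")
    return hints
```

A hand-shape check, the "proper motion" case, would need a classifier and is left out of this sketch.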

A light-emitting lamp may be provided as the output unit 16 (see FIG. 1) instead of a display device such as a monitor. FIG. 10 shows examples of lamp light emission. Separate lamps can be provided for gesture and voice. Each lamp changes its lighting state when the corresponding determination flag indicates that an input has been made, giving feedback to the operator. In the example of FIG. 10, each lamp shows one of five display modes according to the status:
- gesture (voice) input standby in the operation intention determination phase
- gesture (voice) input standby in the operation instruction phase
- gesture (voice) input in progress
- gesture (voice) input completed
- gesture (voice) input failed

The five display modes can be distinguished by color.
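One way to derive the lamp state for each modality (one lamp for gesture, one for voice) from the phase and the input status is sketched below. The enum names, the `lamp_status` function, and especially the color assignments are hypothetical. The actual colors in FIG. 10 are not reproduced in the text.

```python
from enum import Enum
from typing import Optional


class LampStatus(Enum):
    STANDBY_INTENTION = "standby (operation intention phase)"
    STANDBY_INSTRUCTION = "standby (operation instruction phase)"
    IN_PROGRESS = "input in progress"
    COMPLETED = "input completed"
    FAILED = "input failed"


# Hypothetical colors; FIG. 10's actual color assignment is not given here.
LAMP_COLOR = {
    LampStatus.STANDBY_INTENTION: "white",
    LampStatus.STANDBY_INSTRUCTION: "blue",
    LampStatus.IN_PROGRESS: "yellow",
    LampStatus.COMPLETED: "green",
    LampStatus.FAILED: "red",
}


def lamp_status(phase: str, capturing: bool, accepted: Optional[bool]) -> LampStatus:
    """Status for one lamp (gesture or voice).

    phase: "intention" or "instruction"; capturing: input currently being
    acquired; accepted: True/False once the determination is done, else None.
    """
    if accepted is True:
        return LampStatus.COMPLETED
    if accepted is False:
        return LampStatus.FAILED
    if capturing:
        return LampStatus.IN_PROGRESS
    return (LampStatus.STANDBY_INTENTION if phase == "intention"
            else LampStatus.STANDBY_INSTRUCTION)
```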

When the output unit 16 includes an audio output device such as a speaker, feedback can also be given to the operator by voice instead of by display. Display feedback and voice feedback may of course be used together.

(Priority operator determination)
In the device operation system of this embodiment, a priority operator can be determined in the operation intention determination phase F1, so that one of several people is given priority to operate. FIG. 11 illustrates setting the priority operator, FIG. 12 illustrates switching the priority operator, and FIG. 13 illustrates setting multiple operators.

In the operation intention determination phase F1, when image processing detects that the gesture input of only one person matches the specific gesture, that person becomes an operator candidate. When a voice input matching the voice keyword is then detected, the system determines which operator candidate spoke, using the sound source direction, mouth movements in the captured image, and the like. That candidate is identified as the operator candidate who spoke.

As shown in FIG. 11, when an operator candidate (dotted frame) performs the specific motion and the voice input, that candidate is determined to be the priority operator (solid frame). The priority operator is given authority for subsequent operations, and their gesture input is tracked.

As shown in FIG. 12, the priority operator can also be switched. In FIG. 12, the priority operator (solid frame) finishes operating, and gesture input and voice input stop. From this state, another person (dotted frame) performs the operation input for operation intention determination. When this new operator candidate performs the specific motion and voice input, that person is determined to be the priority operator (solid frame). The new priority operator is given authority for subsequent operations, and their motion is tracked.

As shown in FIG. 13, there may be several priority operators (solid frames). Each person judged to be an operator in the operation intention determination becomes a priority operator and is given operation authority.
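Selecting a priority operator by sound-source direction could look like the sketch below. Only people showing the specific gesture are candidates, and the candidate whose direction is closest to the estimated sound direction is chosen if the difference is within a tolerance. It assumes each person's azimuth as seen from the microphone array is already known. It ignores the mouth-movement cue and covers only the single-operator case of FIG. 11. `Person`, `select_priority_operator`, and `tolerance_deg` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    person_id: int
    azimuth_deg: float       # direction of the person seen from the microphones (assumed known)
    specific_gesture: bool   # gesture matched the specific gesture


def select_priority_operator(people: List[Person], sound_azimuth_deg: float,
                             tolerance_deg: float = 15.0) -> Optional[int]:
    """Return the id of the gesturing person who most plausibly spoke, if any."""
    candidates = [p for p in people if p.specific_gesture]  # operator candidates
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(p.azimuth_deg - sound_azimuth_deg))
    if abs(best.azimuth_deg - sound_azimuth_deg) <= tolerance_deg:
        return best.person_id
    return None  # the speaker does not match any candidate
```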

Next, the operation with specific operation target devices is described.
(Configuration example for signage)
First, the operation of the device operation system when the operation target device is signage is described with reference to FIGS. 1 and 2 and FIGS. 14 to 17. FIG. 14 is an explanatory diagram showing signage 20 equipped with the device operation system of FIG. 1, together with an operator operating it. The case where the Map function of the signage 20 is used is described as an example. FIG. 15 shows the TOP screen of the signage. FIG. 16 shows the relationship between the operator and the feedback display.

As shown in FIG. 14, for example, the signage 20 equipped with the device operation system includes:
- a monitor 21 functioning as the output unit 16
- a camera sensor 22 functioning as the imaging unit 15
- a microphone 23 functioning as the voice input unit
- a speaker 24 functioning as an output unit

In the example of FIG. 14, components corresponding to the CPU 11, ROM 12, and RAM 13 of FIG. 1 are integrated into the signage 20.

As shown in FIG. 15, in the input standby state the monitor 21 of the signage 20 displays the TOP screen. It shows the selectable items “Map”, “News”, and “Forecast” together with feedback display items. In this state, operation inputs (gesture input and voice input) are not accepted as operation instructions for the device. Suppose the operator now holds a hand up toward the signage and says “OK”, the voice keyword for starting operation. The system then obtains a gesture input from the image captured by the camera sensor 22 and a voice input from the sound captured by the microphone 23. When the gesture input matches the specific gesture and the voice input contains the voice keyword, both the gesture determination flag and the voice determination flag switch to ON. The feedback display then changes to show that gesture input and voice input have been made.

Based on both the gesture determination flag and the voice determination flag, the device operation system determines that there is an operation intention and moves from the operation intention determination phase to the operation instruction phase. Importantly, an input made in the operation intention determination phase only causes the transition to the operation instruction phase: the item screen does not change and the Map function is not selected. Besides “OK”, any keyword such as “operation” or “start” can be used as the operation start keyword. The gesture indicating operation intention may also be something other than holding up a hand. If the operator speaks the operation start keyword too quietly, the feedback display does not change to indicate voice input. The operator can therefore find the necessary speaking volume by, for example, adjusting the volume until the feedback display changes.

Next, in the operation instruction phase shown in FIG. 16, the operator says the keyword “Map” while holding a hand up toward the signage 20. The system switches to the Map function, and the initial screen of the “Map” content is displayed on the monitor 21. In this way, the operation instruction phase accepts gesture and voice inputs from the operator and performs the processing that corresponds to each input. For example, a voice input of a keyword operates the corresponding on-screen icon. A gesture input of holding the hand to the left or right moves the on-screen cursor and selects functions such as icons. A gesture of holding the hand still for a certain time confirms the function, such as an icon, selected by the cursor. In the example of FIG. 16, the “Map” content is selected by both gesture input and voice input in the operation instruction phase, but either one alone may be used.
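The "hold the hand still for a certain time to confirm" gesture is essentially dwell selection. The minimal sketch below restarts a timer whenever the hand moves more than a small jitter tolerance and fires once when the dwell time is reached. `DwellSelector`, `dwell_seconds`, and `max_jitter` are hypothetical names with illustrative values. The code uses `math.dist`, which requires Python 3.8 or later.

```python
import math
from typing import Optional, Tuple


class DwellSelector:
    """Confirm a selection when the hand stays (nearly) still for dwell_seconds.

    Hypothetical helper: positions are normalized image coordinates; the
    dwell time and jitter tolerance are illustrative values.
    """

    def __init__(self, dwell_seconds: float = 1.5, max_jitter: float = 0.05) -> None:
        self.dwell_seconds = dwell_seconds
        self.max_jitter = max_jitter
        self._anchor: Optional[Tuple[float, float]] = None
        self._since = 0.0
        self._fired = False

    def update(self, x: float, y: float, t: float) -> bool:
        """Feed one hand position at time t; returns True once per dwell."""
        if self._anchor is None or math.dist(self._anchor, (x, y)) > self.max_jitter:
            # Hand moved: restart the dwell timer at the new position.
            self._anchor, self._since, self._fired = (x, y), t, False
            return False
        if not self._fired and t - self._since >= self.dwell_seconds:
            self._fired = True  # confirm only once until the hand moves again
            return True
        return False
```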

As described above, when the operation target device is signage, the device operation system of this embodiment prevents erroneous input caused by inputs that were not intended as operations.

FIG. 17 shows the hierarchical structure of the signage menu items. From the TOP screen, “News” or “Forecast” can be selected in addition to “Map”, and lower-level items can then be selected under each. Besides descending one level at a time, the operator can reach a lower level directly by keyword input. For example, to select “News-Soccer”, the operator says the keyword “soccer news” while the TOP screen is displayed in the operation instruction phase, and the system jumps directly to “News-Soccer”.
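Keyword jumps across the menu hierarchy can be sketched with a menu tree plus a shortcut table. An utterance is first checked against the shortcuts, then against the children of the current screen, then against full "News-Soccer" style names. Only "Map", "News", "Forecast", and "News-Soccer" come from the text. The children "Economy" and "Weekly", the table names, and `resolve_keyword` are placeholders for illustration.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]

# Menu tree as far as the text names it; "Economy" and "Weekly" are
# placeholder children added only for illustration.
MENU: Dict[str, dict] = {
    "Map": {},
    "News": {"Soccer": {}, "Economy": {}},
    "Forecast": {"Weekly": {}},
}

# Spoken shortcuts that jump straight to a deep item (hypothetical table).
JUMP_KEYWORDS: Dict[str, Path] = {"soccer news": ("News", "Soccer")}


def build_index(tree: Dict[str, dict], prefix: Path = ()) -> Dict[str, Path]:
    """Map lower-cased "News-Soccer"-style names to their menu path."""
    index: Dict[str, Path] = {}
    for name, children in tree.items():
        path = prefix + (name,)
        index["-".join(path).lower()] = path
        index.update(build_index(children, path))
    return index


def resolve_keyword(utterance: str, current: Path = ()) -> Optional[Path]:
    """Return the menu path an utterance selects from the `current` screen."""
    key = utterance.strip().lower()
    if key in JUMP_KEYWORDS:          # jump across levels
        return JUMP_KEYWORDS[key]
    node = MENU
    for name in current:              # walk down to the current screen
        node = node[name]
    for child in node:                # descend one level
        if child.lower() == key:
            return current + (child,)
    return build_index(MENU).get(key)  # full "News-Soccer" style name
```

A flat shortcut table keeps the number of spoken steps at one regardless of menu depth, which is the burden-reduction point made in the text.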

Jumping across the operation hierarchy in this way reduces the burden on the operator. With a hierarchical operation menu, reaching the desired result can take a long time, which may increase the operator's physical burden. In gesture input, the operator generally operates by making a motion toward the sensor that indicates the intended operation. This calls for a motion with some physical load rather than an ergonomically comfortable posture such as standing still. Suppose, for example, that holding a hand up toward the sensor is the gesture operation. Keeping a hand raised for a long time places a physical burden on the operator. The more instructions the operator gives the operation target, the longer the gestures inevitably take. Conversely, reducing the number of instructions also reduces the number of operation items available, which impairs convenience. Jumping across the operation hierarchy therefore reduces the operator's burden.

(Configuration example for smartphones)
The operation of the device operation system when the target device is a smartphone is described with reference to FIGS. 1 and 2, taking the use of the mail function on a smartphone as an example.

When the smartphone is in the input standby state and the operator holds a hand over it and says the keyword "operation", both the gesture determination flag and the voice determination flag become valid. The feedback display on the smartphone screen switches to show that both a gesture input and a voice input have been received, and it is determined that there is an intention to operate. At this stage the system only moves to the operation instruction phase; it does not yet switch to the mail function.
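The two-flag check can be sketched as a small state machine in which neither input alone leaves the intention phase and confirming the intention changes only the phase. This is a minimal sketch under stated assumptions: the gesture and keyword labels are hypothetical recognizer outputs, and any time window within which both inputs must arrive is not modeled because the text does not specify one.

```python
from enum import Enum, auto


class Phase(Enum):
    INTENTION = auto()    # operation intention determination phase
    INSTRUCTION = auto()  # operation instruction phase


class IntentionGate:
    """Both flags must be set before the system leaves the intention phase."""

    def __init__(self, intent_gesture="hand_over", intent_keyword="operation"):
        # Hypothetical labels for the outputs of the gesture and speech recognizers.
        self.intent_gesture = intent_gesture
        self.intent_keyword = intent_keyword
        self.gesture_flag = False
        self.voice_flag = False
        self.phase = Phase.INTENTION

    def on_gesture(self, gesture):
        if self.phase is Phase.INTENTION and gesture == self.intent_gesture:
            self.gesture_flag = True
        return self._update()

    def on_voice(self, keyword):
        if self.phase is Phase.INTENTION and keyword == self.intent_keyword:
            self.voice_flag = True
        return self._update()

    def _update(self):
        if self.phase is Phase.INTENTION and self.gesture_flag and self.voice_flag:
            # Intention confirmed: only the phase changes; no device command runs here.
            self.phase = Phase.INSTRUCTION
            self.gesture_flag = self.voice_flag = False
        return self.phase
```

Because each handler only sets its own flag, the gesture and voice inputs may arrive in either order.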

In the operation instruction phase, each gesture or voice input from the operator triggers the corresponding processing. For example, a spoken keyword operates the on-screen icon corresponding to that keyword, and a gesture of holding the hand to the left or right moves the on-screen cursor to select a function such as an icon. Holding the hand still for a certain period confirms the function, such as the icon, currently selected by the cursor.
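The instruction-phase behavior (keyword selects an icon directly, left/right gestures move a cursor, a still hand held long enough confirms) can be sketched as follows. The icon names and the 1.5-second dwell threshold are hypothetical; the text says only "a certain period". Timestamps are passed in so the dwell logic can be exercised without a real clock.

```python
class InstructionHandler:
    """Sketch of instruction-phase handling on a smartphone home screen."""

    def __init__(self, icons, dwell_seconds=1.5):
        self.icons = list(icons)            # hypothetical on-screen icons
        self.cursor = 0
        self.dwell_seconds = dwell_seconds  # assumed dwell threshold
        self._hold_start = None
        self.launched = []                  # functions confirmed so far

    def on_voice(self, keyword):
        """A spoken keyword operates the matching icon directly."""
        if keyword in self.icons:
            self.cursor = self.icons.index(keyword)
            self.launched.append(keyword)
            return keyword
        return None

    def on_swipe(self, direction):
        """Holding the hand left or right moves the cursor (wrapping around)."""
        step = {"left": -1, "right": 1}[direction]
        self.cursor = (self.cursor + step) % len(self.icons)
        self._hold_start = None
        return self.icons[self.cursor]

    def on_hold(self, now):
        """Call repeatedly while the hand is held still; `now` is in seconds."""
        if self._hold_start is None:
            self._hold_start = now
            return None
        if now - self._hold_start >= self.dwell_seconds:
            self._hold_start = None
            chosen = self.icons[self.cursor]
            self.launched.append(chosen)
            return chosen
        return None

    def on_release(self):
        """The hand moved or left the camera view: restart the dwell timer."""
        self._hold_start = None
```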

In a device operation system that targets a smartphone, the operation intention determination phase may also be skipped in urgent situations such as an incoming call. In that case, when a telephone or video call arrives while the smartphone is in the input standby state, receiving a hand-over gesture together with the spoken keyword "moshi moshi" (the Japanese telephone greeting) skips the intention determination and starts the call.
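The incoming-call shortcut amounts to one extra rule checked before the normal intention gate. In this minimal sketch the gesture and keyword are treated as arriving together, which is a simplification; the state fields and label strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhoneState:
    intention_confirmed: bool = False
    ringing: bool = False
    in_call: bool = False


def on_inputs(state, gesture, keyword):
    """Handle one gesture/keyword pair received together (a simplification)."""
    hand_over = gesture == "hand_over"
    # Urgent path: answer an incoming call without the intention check.
    if state.ringing and hand_over and keyword == "moshi moshi":
        state.ringing = False
        state.in_call = True
        return "answered"
    # Normal path: the two-flag intention check must pass first.
    if not state.intention_confirmed:
        if hand_over and keyword == "operation":
            state.intention_confirmed = True
            return "intention_confirmed"
        return "ignored"
    return "instruction"
```

The shortcut is deliberately narrow: it applies only while a call is ringing, so "moshi moshi" has no effect in ordinary standby.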

Thus, even when the target device is a smartphone, the device operation system of this embodiment prevents erroneous input caused by inputs not intended as operations.

(Configuration example for home appliances)
Here, the operation of the device operation system when the target device is a home appliance is described with reference to FIGS. 1 and 2 and FIGS. 18 to 26.

[Configuration example 1]
In configuration example 1, the target devices are the television and the lighting in a living room. As shown in FIG. 18, the television 31 and the lighting 32 each include a device function processing unit 17, while the voice input unit 14, the imaging unit 15, and the output unit 16 may be configured as a separate apparatus. The voice input unit 14, the imaging unit 15, and the output unit 16 of the device operation system 1 in FIG. 1 can be installed, for example, on a shelf or wall of the living room, where the operator can easily make inputs and check the feedback. These units can be integrated into one apparatus together with the CPU 11, the ROM 12, and the RAM 13. In that case, the apparatus communicates with the television 31 and the lighting 32 through the wired or wireless connection means 10, and the device function processing unit 17 of each device executes processing based on the operation instruction. This example describes turning on the lighting and then the television, starting from a state in which both the television 31 and the lighting 32 are OFF.

First, when the operator holds a hand toward the imaging unit 15 and says the keyword "operation", the gesture determination flag and the voice determination flag each become valid. Once both flags are valid, the feedback display on the output unit 16 lights up, it is determined that there is an intention to operate, and the system moves to the operation instruction phase.

Next, in the operation instruction phase, as shown in FIG. 19, when the operator holds a hand toward the imaging unit 15 and says the keyword "lighting", the feedback display lights up and the lighting 32 is selected as the operation target. If the lighting 32 is off at that point, it may be turned on when selected. Further, as shown in FIG. 20, when the operator says the keyword "dimming" and holds a hand to the right toward the imaging unit 15, the feedback display lights up, the system determines that it has been instructed to increase the light intensity, and it makes the lighting 32 brighter. When the operator then says "dimming" and holds a hand to the left toward the imaging unit 15, the feedback display lights up, the system determines that it has been instructed to reduce the intensity, and it lowers the brightness of the lighting 32.
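The lighting sequence pairs a keyword with a gesture direction. The sketch below assumes a brightness scale of 0 to 100 % changed in fixed 20-point steps; neither the scale nor the step size comes from the text, and the gesture labels are hypothetical.

```python
class DimmableLight:
    """Hypothetical light model: brightness 0-100 %, changed in fixed steps."""

    def __init__(self, step=20):
        self.on = False
        self.level = 0
        self.step = step

    def select(self):
        # Optional behavior from the text: selecting an unlit light turns it on.
        if not self.on:
            self.on = True
            self.level = max(self.level, self.step)

    def dim(self, direction):
        """'right' brightens, 'left' darkens; clamp to the 0-100 range."""
        delta = self.step if direction == "right" else -self.step
        self.level = min(100, max(0, self.level + delta))
        self.on = self.level > 0
        return self.level


def dispatch(light, keyword, gesture=None):
    """Route one keyword + gesture pair received in the instruction phase."""
    if keyword == "lighting" and gesture == "hand_over":
        light.select()
    elif keyword == "dimming" and gesture in ("right", "left"):
        light.dim(gesture)
    return light.level
```

Requiring the keyword as well as the direction means a stray left or right hand movement alone does not change the brightness.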

As shown in FIG. 21, when the operator says the keyword "TV" while operating the lighting 32, the feedback display lights up and the operation target switches to the TV 31. If the TV 31 was originally OFF, it may be switched ON when it becomes the operation target. When the operator then says the keyword "volume", the feedback display lights up and the volume can be adjusted, either by voice input or by gesture input. As shown in FIG. 22, switching from the lighting 32 to the TV 31 may also be done by combining voice input and gesture input.
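Switching targets can be modeled as "device-name keywords change the active device; function keywords go to the active device". In this sketch the device table and the "channel" function are hypothetical, and powering a device on at selection follows the optional behavior described in the text.

```python
class MultiDeviceController:
    """Sketch: a device-name keyword switches the active target; other
    keywords are routed to whichever device is currently active."""

    def __init__(self):
        # Hypothetical device table; "channel" is illustrative only.
        self.devices = {
            "lighting": {"power": False, "functions": {"dimming"}},
            "TV": {"power": False, "functions": {"volume", "channel"}},
        }
        self.active = None

    def on_keyword(self, keyword):
        if keyword in self.devices:
            self.active = keyword
            # Optional: power the device on when it becomes the target.
            self.devices[keyword]["power"] = True
            return f"selected {keyword}"
        if self.active and keyword in self.devices[self.active]["functions"]:
            return f"{self.active}: adjust {keyword}"
        return "ignored"
```

A function keyword that does not belong to the active device (for example "dimming" while the TV is active) is ignored rather than silently applied elsewhere.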

In this way, the device operation system of this embodiment can switch between and operate multiple home appliances.

[Configuration example 2]
In configuration example 2, the target device is the lighting. As shown in FIG. 23, unlike configuration example 1, the device operation system 1 in this example has no display unit; feedback is given instead by the audio output unit 16.

First, as shown in FIG. 24, when the operator holds a hand toward the imaging unit 15 and says the keyword "lighting", the gesture determination flag and the voice determination flag each become valid. Once both flags are valid, the audio output unit 16 gives feedback by announcing "Voice input detected", it is determined that there is an intention to operate, and the system switches to the operation instruction phase.

Next, as shown in FIG. 25, when the operator says the keyword "dimming" and holds a hand to the right toward the imaging unit 15, the audio output unit 16 gives feedback by announcing "Voice input detected" and "Gesture input detected". The system determines that it has been instructed to dim the lighting, turns the lighting on, and adjusts the brightness according to the gesture input. When the operator then holds a hand to the left toward the imaging unit 15, the audio output unit 16 announces "Gesture input detected" and the brightness is adjusted according to that gesture. Further, as shown in FIG. 26, when the operator says the keyword "turn off", the system determines that it has been instructed to turn off the lighting and turns off the lighting 32. If no operation is made for a certain period after the light is turned off, the system returns to the operation intention determination phase F1. From then on, holding a hand to the right toward the imaging unit 15 no longer operates the lighting; to adjust it again, the operator must start over by confirming the intention to operate.
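The fallback to phase F1 after a period of inactivity can be sketched as a timeout checked on each incoming command. The 10-second timeout is an assumed value (the text says only "a certain period"), and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class PhaseTimer:
    """Sketch: drop back to the intention phase (F1) after inactivity."""

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed value
        self.clock = clock
        self.in_instruction = False
        self.last_input = None

    def confirm_intention(self):
        """Called once the gesture and voice flags are both valid."""
        self.in_instruction = True
        self.last_input = self.clock()

    def accept(self, command):
        """Return True if the command is executed, False if it is rejected."""
        now = self.clock()
        if self.in_instruction and now - self.last_input > self.timeout_s:
            self.in_instruction = False  # back to intention determination phase F1
        if not self.in_instruction:
            return False
        self.last_input = now  # each accepted command restarts the timer
        return True
```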

Thus, even when the target device is a home appliance, the device operation system of this embodiment prevents erroneous input caused by inputs not intended as operations.

(Configuration example for automobile equipment)
The device operation system of this embodiment can also be configured as a control system for automobile equipment. Its operation when the target device is automobile equipment is described with reference to FIGS. 1 and 2 and FIGS. 27 to 32. FIG. 27 shows an example configuration of the device operation system incorporated into a control system for automobile equipment, FIG. 28 illustrates the operation intention determination phase, and FIGS. 29 to 32 illustrate the operation instruction phase.

As shown in FIG. 27, in the control system for automobile equipment, a HUD (head-up display) 41, a car navigation system 42, and a speaker 43 can function as the output unit 16 (see FIG. 1). A camera 44 and a microphone 45 mounted on the panel behind the steering wheel can function as the imaging unit 15 (see FIG. 1) and the voice input unit 14 (see FIG. 1), respectively. The instrument panel 46 may also be used as the output unit 16 (see FIG. 1).

Automobile equipment that can be operated includes, for example, the car navigation system, audio system, HUD, air conditioner, side mirrors, electronic mirrors, rear-view monitor, lights, power windows, trunk, and wipers. The system can control map operation and camera monitor switching on the car navigation system; volume, channel, and similar settings on the audio system; the speed display, distance-to-destination display, and similar items on the head-up display; air volume, airflow direction, and similar settings on the air conditioner; the orientation of the side mirrors; turning the lights on and off; opening and closing the power windows; opening and closing the trunk; and the operation of the wipers.

First, when the device operation system is in the standby state, as shown in FIG. 28, if the operator raises both thumbs and says the keyword "start", both the gesture determination flag and the voice determination flag become valid. The feedback display on the HUD 41 or other output switches to show that both a gesture input and a voice input have been received, it is determined that there is an intention to operate, and the system moves to the operation instruction phase.

Next, in the operation instruction phase, when the operator says "car navigation", the feedback display on the HUD 41 or other output switches to show that a voice input has been received, and the car navigation system 42 is selected, for example by turning it ON.

With the car navigation system selected, when the operator makes a push motion with the right thumb as shown in FIG. 29, the feedback display on the HUD 41 or other output switches to show that a gesture input has been received, and destination navigation information appears on the screen of the car navigation system 42.

Also with the car navigation system selected, when the operator makes a push motion with both thumbs as shown in FIG. 30, the feedback display on the HUD 41 or other output switches to show that a gesture input has been received, and the music shown on the car navigation system 42 starts playing. A further push motion with the right thumb again switches the feedback display to show that a gesture input has been received and pauses the music.
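In this passage the right-thumb push means different things depending on context: it shows destination information, but pauses music that is already playing. The sketch below assumes that reading (playback state decides what a right-thumb push does); the text does not say how the system resolves the two meanings, and the class and label names are hypothetical.

```python
class CarNavGestures:
    """Sketch of thumb-push handling while the car navigation system is selected."""

    def __init__(self):
        self.playing = False
        self.screen = "map"

    def on_push(self, thumbs):
        """`thumbs` is the set of hands whose thumb pushed, e.g. {"right"}."""
        if thumbs == {"left", "right"}:
            self.playing = True
            return "play"
        if thumbs == {"right"}:
            # Assumption: while music plays, a right-thumb push pauses it;
            # otherwise it brings up destination navigation information.
            if self.playing:
                self.playing = False
                return "pause"
            self.screen = "destination"
            return "show destination"
        return "ignored"
```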

To adjust the air volume of the air conditioner starting from the state in which the car navigation system is selected, as shown in FIG. 31, the operator says "air conditioner"; the feedback display on the HUD 41 or other output switches to show that a voice input has been received, and the air conditioner is selected, for example by turning it ON. When the operator then says "air volume" and makes a push motion with the right thumb, the feedback display switches to show that both a gesture input and a voice input have been received, and the air volume of the air conditioner increases. When the push motion stops, the air volume adjustment stops.

Alternatively, as shown in FIG. 32, the air volume can be increased from the state in which the car navigation system is selected without saying "air conditioner": the operator says the specific voice keyword "air volume", which is unique to the air conditioner, and makes a push motion with the right thumb.
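The shortcut works because a device-specific keyword identifies its device unambiguously. One way to sketch this is a registry of function keywords per device: a keyword owned by exactly one device switches the selection implicitly, and the fan speed rises while the push is held. The registry contents, the English keyword strings, and the seven-step fan range are all assumptions for illustration.

```python
# Hypothetical registry modeled on the equipment list in the text; the
# keyword strings are English stand-ins for the spoken Japanese keywords.
DEVICE_FUNCTIONS = {
    "car_navigation": {"map", "camera monitor"},
    "audio": {"volume", "channel"},
    "air_conditioner": {"air volume", "wind direction"},
}


def owner_of(keyword):
    """Return the single device that owns a function keyword, else None."""
    owners = [dev for dev, funcs in DEVICE_FUNCTIONS.items() if keyword in funcs]
    return owners[0] if len(owners) == 1 else None


class CarController:
    def __init__(self):
        self.selected = None
        self.fan_level = 0

    def on_keyword(self, keyword):
        if keyword in DEVICE_FUNCTIONS:
            # Explicit selection by device name (the FIG. 31 path).
            self.selected = keyword
        elif (owner := owner_of(keyword)) is not None:
            # Device-specific keyword: switch implicitly (the FIG. 32 path).
            self.selected = owner
        return self.selected

    def fan_tick(self, pushing, max_level=7):
        """Call periodically; fan speed rises one step per tick while the push is held."""
        if self.selected == "air_conditioner" and pushing:
            self.fan_level = min(max_level, self.fan_level + 1)
        return self.fan_level
```

Requiring exactly one owner keeps the implicit switch safe: a keyword shared by two devices would leave the current selection unchanged.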

Thus, even when the target device is automobile equipment, the device operation system of this embodiment prevents erroneous input caused by inputs not intended as operations.

1 Device operation system
11 CPU
12 ROM
13 RAM
14 Voice input unit
15 Imaging unit
16 Output unit
17 Device function processing unit
10 Connection means
20 Signage
21 Monitor
22 Camera sensor
23 Microphone
24 Speaker
31 Television
32 Lighting
41 HUD
42 Car navigation system
43 Speaker
44 Camera
45 Microphone
46 Instrument panel

Claims

1. A device operation system for operating a device based on gesture input and voice input, comprising:
operation intention confirmation means for confirming, by gesture input and voice input, the presence or absence of an operation intention without executing an operation of the device; and
operation instruction means for instructing an operation of the device by at least one of gesture input and voice input,
wherein the operation intention confirmation means determines that the operation intention is present and shifts to processing by the operation instruction means when it determines that a specific gesture indicating the operation intention has been detected as the gesture input and a specific voice keyword indicating the operation intention has been detected as the voice input.

2. The device operation system according to claim 1, wherein the operation intention confirmation means comprises: gesture input determination means for determining whether the gesture input is the specific gesture and setting a specific gesture determination flag; voice input determination means for determining whether the voice input is the specific voice keyword and setting a specific voice determination flag; and flag determination means for determining the presence or absence of the operation intention based on the specific gesture determination flag and the specific voice determination flag.

3. The device operation system according to claim 2, wherein the gesture input determination means stores a history of gesture inputs detected over a plurality of times and sets the specific gesture determination flag based on the plurality of gesture inputs stored in the history.

4. The device operation system according to claim 2 or 3, wherein the gesture input determination means, the voice input determination means, and the flag determination means perform processing asynchronously with one another.

5. The device operation system according to claim 1, wherein the device to be operated is any one of signage, a PC, a smartphone, an HMD, a home appliance, and automobile equipment.

6. A device operation method for operating a device based on gesture input and voice input, comprising:
an operation intention confirmation step of confirming, by gesture input and voice input, the presence or absence of an operation intention without executing an operation of the device; and
an operation instruction step of instructing an operation of the device by at least one of gesture input and voice input,
wherein, in the operation intention confirmation step, when it is determined that a specific gesture indicating the operation intention has been detected as the gesture input and a specific voice keyword indicating the operation intention has been detected as the voice input, it is determined that the operation intention is present and the method proceeds to the operation instruction step.

7. A device operation program for causing a computer to execute a device operation method for operating a device based on gesture input and voice input, the device operation method comprising:
an operation intention confirmation step of confirming, by gesture input and voice input, the presence or absence of an operation intention without executing an operation of the device; and
an operation instruction step of instructing an operation of the device by at least one of gesture input and voice input,
wherein, in the operation intention confirmation step, when it is determined that a specific gesture indicating the operation intention has been detected as the gesture input and a specific voice keyword indicating the operation intention has been detected as the voice input, it is determined that the operation intention is present and the method proceeds to the operation instruction step.