JP2010238145A

JP2010238145A - Information output device, remote control method and program

Info

Publication number: JP2010238145A
Application number: JP2009087910A
Authority: JP
Inventors: Kunio Sato; 邦雄佐藤
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2010-10-21

Abstract

PROBLEM TO BE SOLVED: To provide an information output device, a remote control method, and a program that facilitate control by intuitive user gestures.

SOLUTION: A CPU 32 detects a face in an image captured by an imaging part 44, detects the eyes, ears, and mouth in the detected face, and detects a gesture in the vicinity of the eyes, ears, or mouth. It then specifies the processing related to the characteristic object (eyes, ears, or mouth) targeted by the gesture and the instruction associated with the detected gesture, and controls the processing related to that characteristic object based on the instruction.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an information output device, a remote control method, and a program that detect human motion and perform control accordingly.

Various user interfaces (UIs) have been proposed as man-machine interfaces through which a person operates a machine. One example is the operation panel provided on many electrical appliances and electronic devices. The user enters instructions into the appliance or device by operating this panel. Remote controllers (hereinafter "remote controls") for operating electrical appliances and electronic devices from a distance also exist. A remote control spares the user from having to approach the appliance or device and operate its panel directly.

As a technique related to remote operation, Patent Document 1, for example, discloses identifying a photographed person in an input image from a camera and controlling the camera and its pan head according to body and hand gestures made by that person.

Patent Document 1: JP 2005-51472 A

However, in the method described in Patent Document 1, the setting item to be operated is identified by the gesture itself. As the number of operable setting items grows, so does the number of gestures the user must memorize, and the user may no longer be able to remember which gesture corresponds to which setting item. The user may also fail to recall the gesture needed for the setting item to be operated, so that the operation takes time and effort.

The present invention has been made in view of these problems, and its object is to provide an information output device, a remote control method, and a program that readily enable control by intuitive user gestures.

To achieve the above object, the information output device according to claim 1 comprises: imaging means; output means for outputting information; detection means for detecting, from an image captured by the imaging means, an image of a sensory organ of the remote-controlling person that corresponds to the type of the information; motion detection means for detecting, when the detection means detects the image of the sensory organ, a predetermined motion made by the remote-controlling person with respect to the sensory organ; and control means for controlling, when the motion detection means detects the predetermined motion, the output of the information by the output means with the control content specified by the motion detected by the motion detection means.

In the information output device according to claim 2, the information is an image, and the sensory organ corresponding to the type of the information is an eye of the remote-controlling person.

In the information output device according to claim 3, the information is audio, and the sensory organ corresponding to the type of the information is the mouth or an ear of the remote-controlling person.

The information output device according to claim 4 further comprises: detection determination means for determining whether the detection means made a detection when the output means output information; additional information input means for inputting additional information when the detection determination means determines that a detection was made; and information adding means for adding the additional information input by the additional information input means to the information output by the output means.

The information output device according to claim 5 further comprises voice input means, and the information adding means adds voice input through the voice input means to the information.

The remote control method according to claim 6 is a method of controlling a device that implements a plurality of types of functions, comprising: a detection step of detecting a predetermined feature from a captured image; a motion detection step of detecting, when the predetermined feature is detected in the detection step, a predetermined motion made by the remote-controlling person with respect to that feature; and a control step of controlling, when the predetermined motion is detected in the motion detection step, the function specified by the predetermined feature among the plurality of types of functions, with the control content specified by the motion detected in the motion detection step.

The remote control program according to claim 7 causes a computer to function as: detection means for detecting a predetermined feature from a captured image; motion detection means for detecting, when the detection means detects the predetermined feature, a predetermined motion made by the remote-controlling person with respect to that feature; and control means for controlling, when the motion detection means detects the predetermined motion, the function specified by the predetermined feature among the plurality of types of functions, with the control content specified by the motion detected by the motion detection means.

According to the present invention, control of setting items by gestures can be realized easily.

FIG. 1 is an external perspective view of a digital photo frame including a gesture input device that is an information output device according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the internal electrical configuration of the digital photo frame 1 of FIG. 1.
FIG. 3 is a flowchart showing the instruction input processing executed by the control circuit 30 of the digital photo frame of FIG. 1.
FIG. 4 is a diagram explaining the detection of sensory organs in an image.
FIG. 5 is a diagram showing the gesture input table.
FIG. 6 is a diagram showing examples of gestures.
FIG. 7 is a diagram showing an operation example of the digital photo frame 1.
FIG. 8 is a diagram showing an overview of a system comprising a remote control device that is an information output device according to a second embodiment of the present invention.
FIG. 9 is a block diagram showing the internal electrical configuration of the remote control device of FIG. 8.

Embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
First, a first embodiment of the present invention will be described.

[Basic configuration]
FIG. 1 is an external perspective view of a digital photo frame including a gesture input device that is an information output device according to the first embodiment of the present invention. The digital photo frame 1 is constructed by mounting, in a main body case 5, a liquid crystal display device 10, a camera lens 12, an operation panel 14 consisting of a plurality of push buttons, a speaker 16, a circuit board forming a control circuit (see FIG. 2), and other parts. The liquid crystal display device 10, the camera lens 12, the speaker 16, and the like are arranged on the front side of the main body case 5. A support member 19 for keeping the digital photo frame 1 standing upright is attached to the back side of the main body case 5. On a side of the main body case 5 are provided the operation panel 14 and an opening/closing member 20 that opens and closes a slot (not shown) into which a memory card 50 (see FIG. 2) is inserted.

[Electrical configuration]
FIG. 2 is a block diagram showing the internal electrical configuration of the digital photo frame 1. The digital photo frame 1 includes a control circuit 30 and, connected to the control circuit 30, an imaging unit 44, the liquid crystal display device 10, the operation panel 14, the speaker 16, a microphone 18, and the like. The control circuit 30 is composed of a CPU 32, a ROM 34, a RAM 36, a display control circuit 38 for driving the liquid crystal display device 10, an audio control circuit 40 for driving the speaker 16 and the microphone 18, a card I/F 42, and the like. A memory card 50 is detachably connected to the card I/F 42 through a card slot (not shown) in the body of the digital photo frame 1. The gesture input device, which is the information output device of this embodiment, consists of the imaging unit 44 and the control circuit 30. Based on image data captured by the imaging unit 44, the control circuit 30 detects a motion, such as a gesture, of the remote operator being photographed, executes the processing corresponding to the detected motion, and controls the components whose functions are subject to control, such as the liquid crystal display device 10 and the speaker 16.

The imaging unit 44 includes the camera lens 12, a CCD 46 serving as the image sensor, and a unit circuit (CDS/AGC/AD) 48.

The CCD 46 converts the light from the subject projected through the camera lens 12 into an electric signal and outputs it to the unit circuit 48 as an imaging signal.

The unit circuit 48 includes a CDS (Correlated Double Sampling) circuit that samples and holds the imaging signal output from the CCD 46 by correlated double sampling, an AGC (Automatic Gain Control) circuit that automatically adjusts the gain of the imaging signal sampled by the CDS circuit, and an A/D converter that converts the analog imaging signal gain-adjusted by the AGC circuit into a digital signal. The imaging signal output from the CCD 46 is thus sent to the CPU 32 as a digital signal via the unit circuit 48.
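The order of the three stages in the unit circuit 48 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the circuit of this publication: the function names, the peak-based gain rule, and the 8-bit resolution are hypothetical, and a real CDS/AGC/AD front end works on analog signals in hardware.

```python
def correlated_double_sample(reset_level, signal_level):
    """CDS stage: subtract the pixel's reset level from its signal level."""
    return signal_level - reset_level


def apply_gain(samples, target_peak):
    """AGC stage (simplified assumption): scale so the largest sample hits target_peak."""
    peak = max(samples, default=0.0)
    if peak <= 0:
        return list(samples)  # nothing to amplify
    gain = target_peak / peak
    return [s * gain for s in samples]


def quantize(samples, full_scale, bits):
    """A/D stage: map 0..full_scale onto integer codes, clipping out-of-range input."""
    levels = (1 << bits) - 1
    return [max(0, min(levels, round(s / full_scale * levels))) for s in samples]


def unit_circuit(pixel_pairs, target_peak=1.0, bits=8):
    """Chain the stages in the order CDS -> AGC -> A/D."""
    sampled = [correlated_double_sample(r, s) for r, s in pixel_pairs]
    return quantize(apply_gain(sampled, target_peak), target_peak, bits)
```

Each element of `pixel_pairs` is a hypothetical `(reset_level, signal_level)` pair for one pixel.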

The CPU 32 is a one-chip microcomputer. It executes application programs stored in the ROM 34 to perform image processing, shake correction, and compression and decompression (for example, in JPEG format) of the image data sent from the unit circuit 48, and it controls each part of the digital photo frame 1 according to the control program stored in the ROM 34.

The ROM 34 stores the control program for controlling each part of the digital photo frame 1, application programs for executing various functions, including detecting a gesture and performing the processing corresponding to the detected gesture, and various information needed to execute those functions. This information includes the feature data used for image recognition, covering images of the face as a whole and of sensory organs such as the mouth, ears, and eyes, and the feature data used for detecting gestures.

The RAM 36 is used as a buffer memory that temporarily stores the image data captured by the CCD 46 and sent to the CPU 32, and as the working memory of the CPU 32. It also stores an image database of images to be displayed as a slide show on the liquid crystal display device 10 as information, an audio database of audio to be output from the speaker 16 as information, and, as needed, a gesture input table that relates gestures associated with human sensory organs to the corresponding processing.

The display control circuit 38 controls the liquid crystal display device 10 so that, on instruction from the CPU 32, image data read from the image database in the RAM 36 is displayed on the liquid crystal display device 10. The liquid crystal display device 10 displays images under the control of the display control circuit 38.

The audio control circuit 40 controls the speaker 16 so that, on instruction from the CPU 32, audio data read from the audio database in the RAM 36 is output from the speaker 16. The speaker 16 outputs audio under the control of the audio control circuit 40. In addition, on instruction from the CPU 32, the audio control circuit 40 converts the analog audio input from the microphone 18 into digital form and stores it in the RAM 36, as information to be added, in association with the image currently being displayed.

The operation panel 14 includes a plurality of push buttons, such as a power button, a selection button, an enter button, and a mode selection button, and inputs to the CPU 32 the operation signal corresponding to the button the user presses.

The memory card 50 stores image data, audio data, and the like as information. By operating the operation panel 14, the user can set the source from which image data or audio data is read to the memory card 50, the RAM 36, or both.

[Instruction input processing]
The gesture input device configured in this way, which is the information output device of this embodiment, identifies an instruction from the user (remote operator) according to the motion (gesture) detected by the imaging unit 44, and controls the component of the digital photo frame 1 that has the relevant function so that the digital photo frame 1 performs the processing corresponding to the instruction. In this embodiment, the motion detected by the imaging unit 44 is described as a human gesture. The instruction input processing executed in the control circuit 30 when the imaging unit 44 detects a human gesture while the digital photo frame 1 is running a slide show is described below with reference to FIG. 3.

As shown in FIG. 3, the CPU 32 first starts the slide show (step S10). That is, the CPU 32 reads image data from the image database stored in the RAM 36 one image at a time at fixed intervals and displays it on the liquid crystal display device 10.

Next, the CPU 32 controls the imaging unit 44 to start capturing images repeatedly at a predetermined interval (step S12). In this processing, the CPU 32 has the unit circuit 48 process the imaging signal output from the CCD 46, acquires the result as digital image data, and temporarily stores the acquired image data in the RAM 36 in sequence.

Next, the CPU 32 performs image recognition on the image data temporarily stored in the RAM 36 and determines whether the image contains a human face (step S14). If it determines that there is a human face (YES in step S14), processing moves to step S16. If it determines that there is no human face (NO in step S14), processing moves to step S24. Various methods can be used for this image recognition. In this embodiment, for example, the CPU 32 recognizes the contours and feature points of the subject in the image data temporarily stored in the RAM 36, together with their positional relationships, computes feature data that expresses them numerically, and compares the computed feature data with the feature data of recognition targets stored in advance in the ROM 34. In step S14, therefore, the CPU 32 computes feature data from the image data temporarily stored in the RAM 36 and compares it with the face feature data for image recognition stored in advance in the ROM 34.
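The comparison of computed feature data against stored feature data might look like the following minimal sketch. The publication does not specify how the two sets of feature data are compared, so the Euclidean distance, the threshold, and the function names here are all assumptions made for illustration.

```python
import math


def feature_distance(candidate, template):
    """Euclidean distance between two numeric feature vectors of equal length."""
    if len(candidate) != len(template):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(candidate, template)))


def matches_template(candidate, template, threshold):
    """Hypothetical step S14 decision: treat the subject as a face when the
    computed features are close enough to the stored face features."""
    return feature_distance(candidate, template) <= threshold
```

In this picture `template` plays the role of the face feature data held in the ROM 34 and `candidate` the feature data computed from the captured frame.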

When the CPU 32 determines in step S14 that the image contains a human face (YES in step S14), it proceeds to step S16 and determines whether the image contains a sensory organ, for example a mouth, an ear, or an eye (step S16). Specifically, as shown in FIG. 4(a), the CPU 32 identifies the face region F in the image data temporarily stored in the RAM 36 and applies the image recognition described above to region F to determine whether a mouth, ears, or eyes are present. If at least one of them is present (YES in step S16), processing moves to step S18. If none of them is present (NO in step S16), processing moves to step S24.

Next, in step S18, the CPU 32 detects a gesture directed at the sensory organ detected in step S16 and determines what kind of gesture is being made (step S18). If, for example, the CPU 32 determined in step S16 that a mouth is present, it cuts out from the image data temporarily stored in the RAM 36 the image of region B around the mouth, positioned relative to the mouth as shown in FIG. 4(b), and stores it in the RAM 36. Similarly, if eyes were determined to be present, it cuts out and stores the image of region A around the eyes, and if ears were determined to be present, it cuts out and stores the image of region C around the ears, both as shown in FIG. 4(b). The CPU 32 accumulates the image data of regions A, B, and C over a predetermined period (for example, if the predetermined period is 3 seconds and the imaging unit 44 captures one image every 1/4 second, 12 images each of regions A, B, and C are accumulated in the RAM 36) and calculates motion vectors from the changes in those images over the period. The motion vectors may be calculated using, for example, the representative point matching method or the block matching method. The CPU 32 then detects the gesture by comparing the acquired image data and motion vectors with the feature data stored in the ROM 34. The region around the mouth may be large enough to contain the palms of both hands, or slightly larger. The regions around the ears and eyes may be sized in the same way, but the sizes may be chosen per implementation, taking into account the processing capability of the CPU, the expected size of the gestures, and so on. When this processing ends, processing moves to step S20.
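The motion vector calculation in step S18 can be illustrated with a minimal block matching sketch in pure Python. Block matching is one of the methods the text names, but everything else here is an assumption: the sum-of-absolute-differences cost, the exhaustive search window, the grayscale list-of-lists frame format, and all function names are hypothetical.

```python
def crop(frame, top, left, height, width):
    """Cut a height x width block out of a 2-D list of pixel values."""
    return [row[left:left + width] for row in frame[top:top + height]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def block_motion_vector(prev, curr, top, left, size, search):
    """Return the (dy, dx) shift, within +/- search pixels, that best maps the
    size x size block at (top, left) in prev onto curr."""
    rows, cols = len(curr), len(curr[0])
    ref = crop(prev, top, left, size, size)
    best = (0, 0)
    best_cost = sad(ref, crop(curr, top, left, size, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > rows or x + size > cols:
                continue  # candidate block would fall outside the frame
            cost = sad(ref, crop(curr, y, x, size, size))
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best


def frames_in_window(window_seconds, frame_interval_seconds):
    """Number of frames accumulated over the observation window."""
    return round(window_seconds / frame_interval_seconds)
```

With the figures given in the text, `frames_in_window(3, 0.25)` yields the 12 frames per region mentioned above. A gesture detector would then compare the sequence of vectors from consecutive frames against stored gesture features.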

In step S20, the CPU 32 compares the gesture detected in step S18 with the gestures registered in the gesture input table in the ROM 34 for the sensory organ detected in step S16, and determines whether any detected gesture matches a registered gesture. If it determines that there is a matching gesture (YES in step S20), processing moves to step S22. Otherwise (NO in step S20), processing moves to step S24.

In step S22, the CPU 32 controls the liquid crystal display device 10 or the speaker 16 based on the processing program corresponding to the gesture determined to match in step S20. Processing then moves to step S24.

In step S24, the CPU 32 determines whether there is an instruction to end the slide show. For example, the CPU 32 determines whether such an instruction has been entered through the operation panel 14. If it determines that there is an instruction to end the slide show, this routine ends. If there is no such instruction, processing returns to step S12 and the slide show continues. The image data captured by the imaging unit 44 and used for gesture detection may be erased before step S24 is executed.
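The control flow of steps S12 through S24 can be summarized in a short sketch. It is a simplification under stated assumptions: the detectors are passed in as hypothetical callables, gesture detection is shown per frame rather than over an accumulated window of frames, and none of the names come from the publication.

```python
def run_instruction_input(frames, detect_face, detect_organs, detect_gesture,
                          table, execute, stop_requested):
    """Simplified loop over captured frames following the flow of FIG. 3."""
    for frame in frames:                                    # S12: next captured frame
        if detect_face(frame):                              # S14: face present?
            for organ in detect_organs(frame):              # S16: eyes, ears, mouth
                gesture = detect_gesture(frame, organ)      # S18: gesture at that organ
                instruction = table.get((organ, gesture))   # S20: registered gesture?
                if instruction is not None:
                    execute(instruction)                    # S22: control display or audio
        if stop_requested():                                # S24: end of slide show?
            break
```

Every path through the loop body reaches the S24 check, which mirrors the flowchart: a missing face, a missing organ, and an unregistered gesture all fall through to step S24.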

In other words, while the slide show is being displayed on the digital photo frame 1, the CPU 32 uses the imaging unit 44 to detect gestures directed at a person's mouth, ears, or eyes within the shooting range of the imaging unit 44. When it detects such a gesture, the CPU 32 refers to the gesture input table to decide what processing to execute and, according to that decision, performs display control of the liquid crystal display device 10, audio output control of the speaker 16, or audio input control of the microphone 18.

Next, the contents of the gesture input table stored in the ROM 34 are described with reference to FIG. 5.

The gesture input table associates gestures directed at human sensory organs, namely the eyes, ears, and mouth, with instructions for the processing the CPU 32 should perform on the liquid crystal display device 10 or the speaker 16. Once the CPU 32 has identified an instruction for the liquid crystal display device 10 or the speaker 16, it reads the processing corresponding to that instruction from the ROM 34 and controls the liquid crystal display device 10 and the speaker 16. Specifically, the human sensory organs in this embodiment are the eyes, ears, and mouth, and the gestures directed at them are, for example, movements of the palm or fingers near the eyes, near the ears, or near the mouth.

Examples of gestures in the gesture input table, and of the processing the CPU 32 performs in response, are as follows. As shown in FIG. 5(a), when the gesture near the eyes is forming the shape of eyeglasses with both hands in front of the face (see FIG. 6(a)), photo display (the slide show) is started. When the gesture near the eyes is holding an outstretched palm above both eyes like a visor, the slide show is paused and the same photo stays on display for a while. When the gesture near the eyes is bringing the thumb and index finger together and then apart in front of the eyes, part of the image is enlarged. When the gesture near the eyes is covering the eyelids with the palms, the backlight of the liquid crystal display device 10 is turned off (MUTE).

As shown in FIG. 5(b), when the gesture near the mouth is making a circle (○) with the index finger and thumb in front of the mouth, BGM playback is started. When the gesture near the mouth is the "shh" motion of raising one index finger in front of the mouth (see FIG. 6(b)), the volume is lowered. When the gesture near the mouth is separating a joined index finger and thumb in front of the mouth, the volume is raised. When the gesture near the mouth is covering both ears with the mouth [sic], audio output is stopped (MUTE). When the gesture near the ears is plugging the ears with the index fingers (see FIG. 6(c)), the volume is lowered. When the gesture near the ears is cupping a palm to the ear, the volume is raised. When the gesture near the ears is covering both ears with the palms, audio output is stopped (MUTE).

As shown in FIG. 5(c), when the gesture near the mouth is miming holding a microphone in front of the mouth (see FIG. 6(d)), additional recording of audio to the photo is started. When the gesture near the mouth is miming moving the microphone away from the mouth, the recording volume is lowered. When the gesture near the mouth is miming bringing the microphone closer to the mouth, the recording volume is raised.
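The correspondences in FIG. 5 can be pictured as a lookup keyed by the pair (feature, gesture). The following Python sketch is only an illustration: the string labels such as `pinch_open` and `zoom_image` are hypothetical names invented here, and the patent does not specify how the table is encoded in the ROM 34.

```python
from typing import Optional

# Hypothetical labels for a subset of the gestures described for FIG. 5.
# Keys are (feature, gesture); values are the processing to perform.
GESTURE_TABLE = {
    ("eye", "glasses_shape"): "start_slideshow",
    ("eye", "visor_hand"): "pause_slideshow",
    ("eye", "pinch_open"): "zoom_image",
    ("eye", "cover"): "backlight_off",
    ("mouth", "finger_circle"): "play_bgm",
    ("mouth", "shh"): "volume_down",
    ("mouth", "pinch_open"): "volume_up",
    ("mouth", "hold_mic"): "record_voice_memo",
    ("mouth", "mic_away"): "record_level_down",
    ("mouth", "mic_closer"): "record_level_up",
    ("ear", "plug_with_finger"): "volume_down",
    ("ear", "palm_to_ear"): "volume_up",
    ("ear", "cover"): "mute_audio",
}


def lookup_action(feature: str, gesture: str) -> Optional[str]:
    """Return the processing bound to a gesture at a feature, or None."""
    return GESTURE_TABLE.get((feature, gesture))
```

Because the key includes the feature, the same hand shape (`pinch_open` here) resolves to enlarging the image at the eyes and to raising the volume at the mouth.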

FIG. 7 shows a usage example of the digital photo frame 1 equipped with the gesture input device, which is the information output device of the present embodiment. To lower the volume of the BGM output from the digital photo frame 1, the user raises an index finger in front of the mouth and makes the "shh" motion, as shown in FIG. 7, and the BGM volume is lowered. In the gesture input device of the present embodiment, the BGM volume can also be lowered by plugging the ears with the index fingers, as shown in FIG. 6(c).

As described above, in the present embodiment, the CPU 32 detects a face from the image captured by the imaging unit 44, detects features such as the eyes, ears, and mouth from the face, and detects what gesture is being made at the eyes, ears, or mouth. The CPU 32 then refers to the gesture input table stored in the ROM 34, identifies the processing content associated with the gesture for the processing related to that feature, and controls the processing related to the detected feature on the basis of that processing content. For example, as shown in FIG. 5, when the gesture of moving the pinched thumb and index finger apart is made in front of the eyes, display processing related to the eyes is controlled (for example, the photo display is enlarged), and when the same gesture is made in front of the mouth, audio output processing related to the mouth is controlled (for example, the BGM volume is adjusted). Because the same gesture can designate a different operation instruction depending on whether its target is the eyes, the ears, or the mouth, the number of gestures required can be kept small even when there are many items to instruct. In addition, because the processing related to the sensory organ that is the target of the gesture is controlled according to the gesture, the user can give operation instructions intuitively.
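The flow summarized above (face, then features, then gesture, then table lookup, then control) might be sketched as follows. This is a minimal illustration under stated assumptions: the detectors are passed in as callables because the patent does not name any detection algorithm, and `process_frame` and all labels are hypothetical names introduced here.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def process_frame(
    frame: Any,
    detect_face: Callable[[Any], Optional[Box]],
    detect_features: Callable[[Any, Box], Dict[str, Box]],
    detect_gesture: Callable[[Any, str, Box], Optional[str]],
    table: Dict[Tuple[str, str], str],
    execute: Callable[[str], None],
) -> Optional[str]:
    """Run one pass of the flow and return the action performed, if any."""
    face = detect_face(frame)
    if face is None:
        return None  # no face in the frame, so nothing to control
    for feature, region in detect_features(frame, face).items():
        gesture = detect_gesture(frame, feature, region)
        if gesture is None:
            continue  # no gesture at this feature
        action = table.get((feature, gesture))
        if action is not None:
            execute(action)
            return action
    return None


# Demo with stand-in detectors (all values are made up for illustration).
executed = []
result = process_frame(
    frame="frame-0",
    detect_face=lambda frame: (100, 80, 200, 200),
    detect_features=lambda frame, face: {
        "eye": (140, 130, 120, 40),
        "mouth": (160, 220, 80, 40),
    },
    detect_gesture=lambda frame, feature, region: "shh" if feature == "mouth" else None,
    table={("mouth", "shh"): "volume_down"},
    execute=executed.append,
)
```

In the demo, the stand-in gesture detector reports "shh" only at the mouth, so the single action `volume_down` is executed.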

In the present embodiment, the gesture made on a feature is detected from an image of the detected feature and its surrounding region. Gestures can therefore be detected from a small amount of image data, without using all of the image data captured by the imaging unit 44, which reduces the processing load on the CPU 32.
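Restricting gesture detection to the feature and its surroundings amounts to cropping a region of interest before further processing. The sketch below assumes the feature is given as a bounding box and that the surrounding region is obtained by enlarging the box by a relative margin; the margin and the helper names `expand_box` and `crop` are assumptions made here, not details from the patent.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def expand_box(box: Box, margin: float, width: int, height: int) -> Box:
    """Grow a feature box by a relative margin, clamped to the image bounds."""
    x, y, w, h = box
    mx, my = int(w * margin), int(h * margin)
    left = max(0, x - mx)
    top = max(0, y - my)
    right = min(width, x + w + mx)
    bottom = min(height, y + h + my)
    return left, top, right - left, bottom - top


def crop(image: Sequence[Sequence[int]], box: Box) -> List[Sequence[int]]:
    """Return only the rows and columns of the image that fall inside the box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


# Demo on a small 64 x 48 image with a made-up mouth box.
image = [[0] * 64 for _ in range(48)]
roi_box = expand_box((20, 10, 10, 6), margin=1.0, width=64, height=48)
roi = crop(image, roi_box)
```

Here the region handed to gesture detection holds 30 x 18 pixels instead of the full 64 x 48, which is the kind of reduction the paragraph refers to.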

The gesture input table is not limited to the one shown in FIG. 5, and other correspondences may be used. For example, one parameter may be associated with gestures on a plurality of sensory organs: forming a circle with the fingers of one hand in front of the mouth while forming a circle with the fingers of the other hand beside the ear turns communication on, and covering the mouth with one hand while covering the ear with the other turns communication off.
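One way to represent a parameter that depends on gestures at several sensory organs at once is to key the table by a set of (feature, gesture) pairs. The sketch below is an assumption about representation only; the labels and the function `lookup_combination` are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Observation = Tuple[str, str]  # (feature, gesture)

# Hypothetical labels for the two combinations described in the text.
COMBINATION_TABLE = {
    frozenset({("mouth", "finger_circle"), ("ear", "finger_circle")}): "communication_on",
    frozenset({("mouth", "cover"), ("ear", "cover")}): "communication_off",
}


def lookup_combination(observed: Iterable[Observation]) -> Optional[str]:
    """Match the full set of simultaneously observed gestures."""
    key: FrozenSet[Observation] = frozenset(observed)
    return COMBINATION_TABLE.get(key)
```

Using a `frozenset` makes the lookup independent of the order in which the two gestures are reported, and a single gesture from a pair does not match on its own.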

The gesture input table may be freely configurable by the user. Although the gesture input table is described in the present embodiment as being stored in the ROM 34, the present invention is not limited to this, and the table may be stored in an electrically rewritable memory such as the RAM 36. The user can then operate the operation panel 14 to freely set gestures and their corresponding processing in the gesture input table according to the user's preference.
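A user-configurable table could be modeled as read-only factory defaults copied into a writable table that the user then edits. This is a sketch under assumptions: `FACTORY_BINDINGS`, `GestureTable`, and the labels are hypothetical, and the read-only mapping merely stands in for ROM contents.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional, Tuple

Key = Tuple[str, str]  # (feature, gesture)

# Read-only defaults, standing in for a table held in ROM.
FACTORY_BINDINGS: Mapping[Key, str] = MappingProxyType({
    ("mouth", "shh"): "volume_down",
    ("ear", "cover"): "mute_audio",
})


class GestureTable:
    """Writable copy of the bindings, standing in for a table held in RAM."""

    def __init__(self, defaults: Mapping[Key, str]) -> None:
        self._bindings: Dict[Key, str] = dict(defaults)

    def bind(self, feature: str, gesture: str, action: str) -> None:
        """Set or replace the processing bound to a gesture at a feature."""
        self._bindings[(feature, gesture)] = action

    def lookup(self, feature: str, gesture: str) -> Optional[str]:
        return self._bindings.get((feature, gesture))


# Demo: the user rebinds "shh" at the mouth to a different action.
table = GestureTable(FACTORY_BINDINGS)
table.bind("mouth", "shh", "pause_slideshow")
```

The user's change affects only the writable copy; the defaults remain available for a reset.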

Furthermore, in the present embodiment, the features detected by the CPU 32 are human sensory organs, so operation instructions can be classified by sensory organ: for example, display-related operations are associated with gestures near the eyes, volume-related operations with gestures near the ears, and voice-related operations with gestures near the mouth. Associating gestures with operations related to the function of each sensory organ makes the gestures familiar and easy to remember. Because the interface is based on sensory organs, it can also be used as a common interface in countries around the world with different languages and cultures. In addition, a component having the function to be controlled can be operated remotely without using a third device such as a remote controller.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. For example, when the image captured by the imaging unit 44 contains a plurality of faces, the face closest to the center or the face that appears largest may be selected, and the function specified by the sensory organ targeted by a gesture may be controlled on the basis of gestures near the eyes, ears, and mouth of the selected face.
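The two selection rules mentioned for frames containing several faces can be sketched as below. The sketch assumes each face is a bounding box, measures "closest to the center" by the distance between the box center and the frame center, and measures "largest" by box area; these measures and the function name `select_face` are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def select_face(
    faces: List[Box], frame_width: int, frame_height: int, policy: str = "center"
) -> Optional[Box]:
    """Pick one face: the one nearest the frame center, or the largest one."""
    if not faces:
        return None
    if policy == "largest":
        return max(faces, key=lambda f: f[2] * f[3])
    if policy != "center":
        raise ValueError(f"unknown policy: {policy}")
    cx, cy = frame_width / 2, frame_height / 2

    def squared_distance(face: Box) -> float:
        x, y, w, h = face
        return (x + w / 2 - cx) ** 2 + (y + h / 2 - cy) ** 2

    return min(faces, key=squared_distance)


# Demo: a large face in the corner and a small face at the center of a 640 x 480 frame.
faces = [(0, 0, 100, 100), (300, 220, 40, 40)]
central = select_face(faces, 640, 480, policy="center")
largest = select_face(faces, 640, 480, policy="largest")
```

The two policies can disagree, as in the demo, so which one to use is a design choice the text leaves open.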

Although the feature detected first is a human sensory organ in the above description, the present invention is not limited to this. As long as the feature can be included in the imaging range of the imaging unit 44 and recognized by the CPU 32, the feature may be an object associated with a sensory organ, such as glasses. In this case, the CPU 32 may detect a gesture near the set feature, determine the operation instruction, and control the function (the display function, in the case of the eyes) specified by the sensory organ associated with the object (the eyes, in the case of glasses).
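Resolving an object to its sensory organ and then to a function is a two-step lookup. In the sketch below, glasses-to-eye and eye-to-display come from the text; the labels for the ear and mouth functions and the name `resolve_function` are assumptions chosen for illustration.

```python
from typing import Optional

# Objects associated with a sensory organ; only the glasses example appears in the text.
OBJECT_TO_ORGAN = {"glasses": "eye"}

# Function controlled through each organ; the label strings are hypothetical.
ORGAN_TO_FUNCTION = {"eye": "display", "ear": "volume", "mouth": "voice"}


def resolve_function(feature: str) -> Optional[str]:
    """Map a detected feature (organ or associated object) to the function it selects."""
    organ = OBJECT_TO_ORGAN.get(feature, feature)
    return ORGAN_TO_FUNCTION.get(organ)
```

A gesture near glasses therefore selects the same function as a gesture near the eyes, while an unregistered object selects nothing.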

An image of the owner may also be registered in advance. When a face is detected from the captured image, it is determined whether the face is the owner's, and the eyes, ears, and mouth are detected only when it is, so that the function to be controlled responds only to the owner's gestures.

In the above embodiment, the present invention is applied to a digital photo frame, but the present invention is not limited to this. For example, it may be applied as a control unit of electric appliances and electronic devices such as televisions, refrigerators, and air conditioners, and it suffices that at least one of the eyes, ears, and mouth described in the above embodiment is detected and the function specified by the detected sensory organ is controlled. Alternatively, a computer provided in the imaging unit, including a CPU and a memory, can be operated by a program that causes the computer to function as each of the means described above. The program can be distributed via a communication line, or written to a recording medium such as a CD-ROM and distributed.

Furthermore, in the digital photo frame 1 shown in FIG. 1, the gesture input device, which is the information output device, and the components having the functions to be controlled, such as the liquid crystal display device 10 and the speaker 16, are housed inside the main body case 5. However, the information output device may be separated from the components having the functions to be controlled. A case in which the gesture input device and the components having the functions to be controlled are separated is described in the next embodiment.

[Second Embodiment]
FIG. 8 is a diagram showing an outline of a system including a remote control device, which is the information output device according to the second embodiment of the present invention, and FIG. 9 is a control block diagram of that remote control device. Components similar to those of the first embodiment are given the same reference numerals, and detailed description thereof is omitted. In addition to the features of the gesture input device, which is the information output device of the first embodiment, the remote control device of the present embodiment is characterized in that the remote control device is separated from the components having the functions to be controlled. As shown in FIG. 8, the remote control device 70 has a function of transmitting, by infrared, commands for controlling AV equipment such as the digital television receiver 150, the DVD recording/reproducing device 160, the video recording/reproducing device 170, the satellite broadcast tuner 180, and the terrestrial digital broadcast tuner 190, and a function of wirelessly transmitting commands for, for example, turning a bathroom water heater (not shown) on and off. When a command from the remote control device 70 cannot be transmitted directly to the AV equipment by infrared, the command is transmitted to the AV equipment via the relay device 120.

As shown in FIG. 9, the remote control device 70 includes a control circuit 100 and, connected to the control circuit 100, an imaging unit 110, an operation panel 112, an infrared generation unit 104, and an antenna 108. The control circuit 100 includes the CPU 32, the ROM 34, the RAM 36, an infrared transmission circuit 102, and a wireless transmission circuit 106. In the configuration of the remote control device 70 shown in FIG. 9, components similar to those of the control circuit 30 described with reference to FIG. 2 are given the same reference numerals, and detailed description thereof is omitted. The imaging unit 110 can be realized with the same configuration as the imaging unit 44 described with reference to FIG. 2, so its detailed description is also omitted. In the present embodiment, the gesture input table stored in the ROM 34 of the control circuit 100 contains, in addition to the gestures related to human sensory organs and the corresponding processing items described with reference to FIG. 5, the device that executes the processing and the transmission method.

Like the control circuit 30 described with reference to FIG. 2, the CPU 32 of the control circuit 100 detects, on the basis of image data captured by the imaging unit 110, human sensory organs such as the mouth, ears, and eyes within the imaging range of the imaging unit 110, and gestures near them. When the CPU 32 detects a gesture on a human sensory organ, it refers to the gesture input table and determines the device corresponding to that sensory organ, the transmission method, and the processing content. If the detected sensory organ corresponds to a device that exchanges information by infrared, the CPU 32 controls the infrared transmission circuit 102 so that the infrared generation unit 104 transmits the determined processing content to that device as an infrared signal. On receiving the infrared signal, the device executes the processing command indicated by the signal and performs the designated processing. If the detected sensory organ corresponds to a device that exchanges information by wireless communication, the CPU 32 controls the wireless transmission circuit 106 so that the antenna 108 transmits the determined processing content to that device as a wireless signal. On receiving the wireless signal, the device executes the processing command corresponding to the detected gesture indicated by the signal and performs the designated processing.
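The extended table of the second embodiment, in which each entry also names a device and a transmission method, might be sketched as follows. Everything concrete here is an assumption made for illustration: the `RemoteEntry` fields, the device and command strings, the two example bindings, and the sender callables that stand in for the infrared transmission circuit 102 and the wireless transmission circuit 106.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class RemoteEntry:
    device: str   # which device executes the processing
    method: str   # "infrared" or "radio"
    command: str  # processing command to transmit


# Hypothetical bindings; the patent leaves the concrete assignments to the table.
REMOTE_TABLE: Dict[Tuple[str, str], RemoteEntry] = {
    ("mouth", "shh"): RemoteEntry("digital_tv_150", "infrared", "VOLUME_DOWN"),
    ("eye", "cover"): RemoteEntry("water_heater", "radio", "POWER_OFF"),
}

Sender = Callable[[str, str], None]  # (device, command) -> None


def dispatch(
    feature: str,
    gesture: str,
    table: Dict[Tuple[str, str], RemoteEntry],
    senders: Dict[str, Sender],
) -> Optional[RemoteEntry]:
    """Look up the entry and send its command over the entry's transmission method."""
    entry = table.get((feature, gesture))
    if entry is None:
        return None
    senders[entry.method](entry.device, entry.command)
    return entry


# Demo: record what would be transmitted instead of driving real hardware.
sent = []
senders = {
    "infrared": lambda device, command: sent.append(("infrared", device, command)),
    "radio": lambda device, command: sent.append(("radio", device, command)),
}
dispatch("mouth", "shh", REMOTE_TABLE, senders)
dispatch("eye", "cover", REMOTE_TABLE, senders)
```

Keeping the transmission method in the table entry means the dispatch code does not need to know which devices use infrared and which use radio.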

In addition to the gesture input table, the ROM 34 of the control circuit 100 stores the commands used to remotely operate the various devices by infrared remote control or wireless remote control for each of these processes.

The gesture input table may be freely configurable by the user. Although the gesture input table is stored in the ROM 34 in the present embodiment, the present invention is not limited to this, and the table may be stored in an electrically rewritable memory such as the RAM 36. The user can then operate the operation panel 112 to associate gestures with commands corresponding to functions of the model to be remotely operated and enter these settings in the gesture input table according to the user's preference. For example, the user's gestures for remotely operating the digital television receiver 150 and the DVD recording/reproducing device 160 may be classified so that gestures near the eyes correspond to operation instructions for controlling the DVD recording/reproducing device 160 and gestures near the mouth correspond to operation instructions for controlling the digital television receiver 150.

The present invention is not limited to the above embodiments. Without departing from its spirit, the above embodiments may be combined in various ways and modified in various ways.

DESCRIPTION OF SYMBOLS
1 Digital photo frame
10 Liquid crystal display device
12 Camera lens
14, 112 Operation panel
16 Speaker
18 Microphone
30, 100 Control circuit
32 CPU
34 ROM
36 RAM
44, 110 Imaging unit
70 Remote control device (second embodiment)
102 Infrared transmission circuit
104 Infrared generation unit
106 Wireless transmission circuit
108 Antenna
120 Relay device
150 Digital television receiver
160 DVD recording/reproducing device

Claims

1. An information output device comprising:
imaging means;
output means for outputting information;
detection means for detecting, from an image captured by the imaging means, an image of a sensory organ, corresponding to the type of the information, of a person performing remote control;
motion detection means for detecting, when the detection means detects the image of the sensory organ, a predetermined motion made on the sensory organ by the person performing remote control; and
control means for controlling, when the motion detection means detects the predetermined motion, the output of the information by the output means with control content specified by the detected motion.

2. The information output apparatus according to claim 1, wherein the information is an image, and the sensory organ corresponding to the type of information is an eye of the person who performs the remote control.

3. The information output apparatus according to claim 1, wherein the information is a voice, and the sensory organ corresponding to the type of information is a mouth or an ear of the person who performs remote control.

4. The information output device according to claim 1, further comprising:
detection determination means for determining whether the detection means has made a detection while the output means is outputting information;
additional information input means for inputting additional information when the detection determination means determines that a detection has been made; and
information adding means for adding the additional information input by the additional information input means to the information output by the output means.

5. The information output device according to claim 4, further comprising voice input means,
wherein the information adding means adds the voice input by the voice input means to the information.

6. A remote control method for controlling a device that realizes a plurality of types of functions, the method comprising:
a detection step of detecting a predetermined feature from a captured image;
a motion detection step of detecting, when the predetermined feature is detected in the detection step, a predetermined motion made on the feature by a person performing remote control; and
a control step of controlling, when the predetermined motion is detected in the motion detection step, the function specified by the predetermined feature among the plurality of types of functions with control content specified by the detected motion.

7. A program causing a computer to function as:
detection means for detecting a predetermined feature from a captured image;
motion detection means for detecting, when the detection means detects the predetermined feature, a predetermined motion made on the feature by a person performing remote control; and
control means for controlling, when the motion detection means detects the predetermined motion, the function specified by the predetermined feature among a plurality of types of functions with control content specified by the detected motion.