JP2007249595A

JP2007249595A - Display, projector, display system, display method, display program, and recording medium

Info

Publication number: JP2007249595A
Application number: JP2006071862A
Authority: JP
Inventors: Takashi Kakiuchi (垣内崇)
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2006-03-15
Filing date: 2006-03-15
Publication date: 2007-09-27
Anticipated expiration: 2026-03-15
Also published as: JP4984583B2

Abstract

PROBLEM TO BE SOLVED: To provide a projector that can be operated and controlled hands-free.

SOLUTION: The projector includes a moving image input part 32 for inputting a face image, a face image recognition part 44 for recognizing a face action from the face image, a command determination part 48 for determining, on the basis of the face action, a command instructing reproduction of content, and an output part 38 for displaying and outputting the content reproduced according to the command. A face image is input from the moving image input part 32 and tracked in time series to recognize the face action; the command instructing reproduction of the content is determined from the face action; and the output part 38 displays and outputs the content reproduced according to that command. A command instructing a content reproduction operation is thus determined from a series of face actions.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a display device, a projector, a display system, a display method, a display program, and a recording medium for instructing a content reproduction operation by means of images.

Systems that recognize an operator's operation instructions by analyzing images and/or sound have conventionally been used.

Patent Document 1 discloses a digital content display method that monitors a user, determines interaction information from the user's actions, and changes how content is presented. In this method, interaction information is identified from the user's actions and the display of the content is changed accordingly.

Patent Document 2 discloses a method and apparatus for predicting events from visual cues such as gaze, facial pose, body posture, hand gestures, and facial expressions.

Patent Document 3 discloses an information processing apparatus that analyzes captured images of an operator and executes instruction commands according to the operator's actions. In this apparatus, arbitrary actions are registered in association with instruction commands for each operator, so that the desired processing can be executed with actions that suit each operator's preference.

Patent Document 4 discloses a human face motion detection method that can reduce the burden of input with conventional devices and the burden on the user. In this method, an image containing a face image is input, the direction of the person's attention is extracted from the image, changes in parts of the face are detected, and it is determined whether a detected change is an operation command.

Patent Document 5 discloses an operation instruction device that identifies the content of a user's gesture from a captured image, extracts a plurality of command candidates, and outputs the command candidates on the basis of voice input.

Patent Document 1: JP 2005-108210 A (published April 21, 2005)
Patent Document 2: JP 2004-515982 A (published Japanese translation of a PCT application; published May 27, 2004)
Patent Document 3: JP 2005-92419 A (published April 7, 2005)
Patent Document 4: JP H9-305743 A (published November 28, 1997)
Patent Document 5: JP 2002-182680 A (published June 26, 2002)

However, with systems such as those described above, it is difficult, for example during a presentation, to identify a command from images of the presenter, and if simple motions are associated with commands, misrecognition of commands increases.

The present invention has been made in view of the above problems, and its object is to provide a display device, a projector, a display system, a display method, a display program, and a recording medium on which the program is recorded, with which commands can be specified more reliably by simple motions of the face.

To solve the above problems, the display device of the present invention includes an image input unit that inputs a face image, a face image recognition means that recognizes a face motion from the face image, a command determination means that determines, on the basis of the face motion, a command instructing reproduction of content, and an output unit that displays and outputs the content reproduced according to the command.

With this configuration, a face image is input from the image input unit, the face image is tracked in time series to recognize the face motion, a command instructing reproduction of the content is determined from the face motion, and the output unit displays and outputs the content reproduced according to the command. A command instructing a reproduction operation that displays content can thus be determined from a series of face motions.

With this configuration, a display device can be realized that the user operates hands-free, by making face motions toward its image input unit, without a physical input device such as a remote controller.

In the above configuration, the display device of the present invention further includes a command storage unit that stores face motions in association with commands, and the command determination means refers to the command storage unit to determine the command corresponding to the face motion.

With this configuration, a table associating face motions with commands is stored in the command storage unit, and the command determination means refers to this table to determine the command. Because face motions and commands can be defined freely, commands can be set, for example, according to the display device or the content to be displayed, and face motions can be set individually for each user who instructs display output.
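To make the table-based lookup concrete, here is a minimal Python sketch of a command storage unit that holds a common standard table plus per-user overrides. The motion labels (`"nod"`, `"shake_head"`, `"open_mouth"`) and command names are hypothetical; the patent does not define a vocabulary or data format, and `CommandStore` is an illustrative name only.

```python
from typing import Dict, Optional

# Hypothetical face-motion labels and command names; the patent does not
# fix a vocabulary, so these strings are illustrative only.
STANDARD_TABLE: Dict[str, str] = {
    "nod": "next_page",
    "shake_head": "previous_page",
    "open_mouth": "pause",
}


class CommandStore:
    """Maps face motions to commands, with optional per-user overrides."""

    def __init__(self, standard: Dict[str, str]) -> None:
        self.standard = dict(standard)
        self.per_user: Dict[str, Dict[str, str]] = {}

    def register(self, user: str, motion: str, command: str) -> None:
        # Store a user-specific association that takes precedence
        # over the common standard table.
        self.per_user.setdefault(user, {})[motion] = command

    def lookup(self, motion: str, user: Optional[str] = None) -> Optional[str]:
        # Check the user's own table first, then fall back to the standard one.
        if user is not None and motion in self.per_user.get(user, {}):
            return self.per_user[user][motion]
        return self.standard.get(motion)
```

Checking the per-user table before the standard one is one way to support both options the text mentions (user-specific and common standard associations); the patent does not say which takes priority.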

In the display device of the present invention, when a first face motion is input, the command determination means enters a command input state in which it accepts command input, and determines a command on the basis of a second face motion recognized after entering the command input state.

With this configuration, inputting the first face motion puts the device into a state that accepts command input, and the command is then given by the second face motion, which reduces misrecognition of commands. For example, by assigning to the first face motion a distinctive motion that is rarely made in ordinary behavior, and assigning a simple motion to the second face motion, commands can be given by combinations of simple motions while reducing the misrecognition and malfunction that would otherwise result from making a similar motion by chance.

In the display device of the present invention, the first face motion is a motion in which the face is held still while facing a predetermined direction, and the second face motion is a motion in which at least part of the face moves.

With this configuration, holding the face still in the predetermined direction puts the device into a state that accepts command input, and moving part of the face then fixes the type of command, which reduces misrecognition of command input.
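The two-stage input can be pictured as a small state machine. The Python sketch below assumes that face motions arrive already classified as string labels; the trigger label `"still_facing_device"`, the motion labels, and the class name are hypothetical. Unrecognized motions leave the decider in the command input state; a practical device would probably also need a timeout, which is not modeled here.

```python
from typing import Dict, Optional


class TwoStageCommandDecider:
    """Idle -> command input state -> command decided (then back to idle)."""

    def __init__(self, table: Dict[str, str],
                 trigger: str = "still_facing_device") -> None:
        self.table = table      # second face motion -> command
        self.trigger = trigger  # hypothetical label for the first face motion
        self.armed = False      # True while in the command input state

    def feed(self, motion: str) -> Optional[str]:
        """Consume one recognized face motion; return a command or None."""
        if not self.armed:
            # Only the distinctive first motion arms the decider; simple
            # motions made by chance while idle are ignored.
            if motion == self.trigger:
                self.armed = True
            return None
        # In the command input state: interpret the second motion.
        command = self.table.get(motion)
        if command is not None:
            self.armed = False  # a command was decided; return to idle
        return command
```

For example, feeding `"nod"` while idle returns `None`, but feeding `"still_facing_device"` followed by `"nod"` returns the command mapped to `"nod"`.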

One example of the predetermined direction is the direction of the display device.

The display device of the present invention further includes an input state notification means for notifying the user that the command input state has been entered, and the command determination means causes the input state notification means to give this notification when the command input state is entered.

With this configuration, the input state notification means tells the user that the command input state has been entered, so the user can confirm this before inputting the second face motion. The user can therefore input the second face motion reliably after recognizing and confirming that the device is in the command input state.

In the display device of the present invention, the face image recognition means determines whether identification information registered for identifying a person matches feature information extracted from the face image, and the command determination means determines a command on the basis of the face motion recognized from a face image whose extracted feature information matches the identification information.

With this configuration, the command is determined after the face image has been identified using the identification information, so the user giving instructions can be identified and only instructions from that user accepted. Consequently, even when face images of several people appear in the image data, the user can be identified and the command determined.
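A minimal sketch of this identification filter follows, assuming each detected face yields a numeric feature vector and that a match means cosine similarity at or above a threshold. The feature representation, the use of cosine similarity, the 0.9 threshold, and all names are assumptions for illustration; the patent only states that registered identification information is compared with features extracted from the face image.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class FaceObservation:
    features: Sequence[float]  # hypothetical feature vector for one detected face
    motion: str                # face motion recognized for that face


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(features: Sequence[float],
             registered: Dict[str, Sequence[float]],
             threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching registered user, or None if none reaches threshold."""
    best_user: Optional[str] = None
    best_score = threshold
    for user, reference in registered.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def motions_from_registered_users(faces: List[FaceObservation],
                                  registered: Dict[str, Sequence[float]],
                                  threshold: float = 0.9) -> List[str]:
    """Keep only face motions whose face matches registered identification info."""
    return [f.motion for f in faces
            if identify(f.features, registered, threshold) is not None]
```

With several faces in view, only the motions of faces that match a registered user are passed on for command determination.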

The display device of the present invention further includes a voice input unit that inputs voice and a voice recognition means that recognizes the input voice, and the command determination means determines the command on the basis of both the face motion and the recognized voice.

With this configuration, the command is determined from the result of the voice recognition means in addition to face image recognition, so even if command recognition from the face image fails, the command can be corrected by voice input.
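One possible way to let voice correct a face-based command is sketched below in Python. It assumes a correction window (3 seconds here, an arbitrary value) within which a spoken command replaces the most recent face-based decision; the policy, the window, and the names `FusedDecider` and `Decision` are hypothetical. Undoing a playback step that has already been executed is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    command: str
    source: str    # "face" or "voice"
    time_s: float  # time at which the decision was made, in seconds


class FusedDecider:
    """Decision log in which a timely voice command corrects a face command."""

    def __init__(self, correction_window_s: float = 3.0) -> None:
        self.window = correction_window_s
        self.log: List[Decision] = []

    def on_face(self, command: str, t: float) -> None:
        self.log.append(Decision(command, "face", t))

    def on_voice(self, command: str, t: float) -> None:
        last = self.log[-1] if self.log else None
        if last is not None and last.source == "face" and t - last.time_s <= self.window:
            # Treat the spoken command as a correction of the face-based one.
            self.log[-1] = Decision(command, "voice", t)
        else:
            # Otherwise the spoken command stands on its own.
            self.log.append(Decision(command, "voice", t))

    def commands(self) -> List[str]:
        return [d.command for d in self.log]
```

A spoken command shortly after a face-based one replaces it; a spoken command arriving later is simply added as a new decision.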

A projector can be constructed by providing the display device described above, a content storage unit that stores the content to be reproduced, and a reproduction unit that reproduces content from the content storage unit according to commands from the display device and inputs it to the output unit of the display device.

With this configuration, the effects described above can be obtained in a projector.

A display system can be constructed from the display device described above and a reproduction device that reproduces the content according to commands from the display device.

With this configuration, a display system having the same effects as described above can be built easily.

To solve the above problems, the display method of the present invention is a display method in a display device that includes an image input unit that inputs a face image and an output unit that displays and outputs content reproduced according to a command instructing reproduction of the content. The method includes a step in which a face image recognition means recognizes a face motion from the face image, and a step in which a command determination means determines the command on the basis of the face motion.

With this configuration, a face image can be input, a face motion recognized from the input face image, and a command instructing a content reproduction operation determined from the face motion. A command instructing a reproduction operation that displays content can thus be determined from a series of face motions.

With this method, a display method can be realized in which the user controls operation hands-free, by making face motions toward the image input means of the display device, without a physical input device such as a remote controller.

The above display method can be executed on a computer under computer control. Furthermore, by storing the display program in a computer-readable recording medium, the program can be executed on any computer.

As described above, the display device according to the present invention inputs a face image from the image input unit, tracks the face image in time series to recognize the face motion, determines from the face motion a command instructing reproduction of content, and displays and outputs, through the output unit, the content reproduced according to the command. A command instructing a content reproduction operation can thus be determined from a series of face motions.

An embodiment of the present invention is described below with reference to FIGS. 1 to 7. In this embodiment, a projector provided with the functions of the units described below is taken as an example of the display device, but the invention is not limited to this. A display device provided with the functions of the units described below may instead be connected to an external content reproduction device so as to function as a display system.

FIG. 1 is a block diagram showing the schematic configuration of the units of the projector 10 according to this embodiment. The projector (display device) 10 includes a reproduction unit (reproduction means) 20, an input/output unit 30, a processing unit 40, and a storage unit 50. The input/output unit 30 includes a moving image input unit (image input means) 32, a voice input unit (voice input means) 34, a notification unit (input notification means) 36, and an output unit (video output means) 38. The processing unit 40 includes a time-series image recording unit 42, a face image recognition unit (face image recognition means) 44, a voice recognition unit (voice recognition means) 46, and a command determination unit (command determination means) 48. The storage unit 50 includes a time-series image storage unit 52, an identification image storage unit 54, a command storage unit (command recording means) 56, and a content storage unit (content storage means) 58.

First, the reproduction unit 20 is described. The reproduction unit 20 reads content from the content storage unit 58 according to instructions from the command determination unit 48 of the processing unit 40 and outputs it to the output unit 38. The output content may be a slide show that displays image data in sequence, or presentation data in which actions within a slide advance in response to operations. Any content that can be operated according to instructions from the command determination unit 48 may be used.

Next, the units of the input/output unit 30 are described. The moving image input unit 32 is a video input device for inputting video from outside as image data. Specifically, the moving image input unit 32 may be, for example, a camera using a CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) image sensor, or an input terminal for inputting video from external AV (audio-visual) equipment or an external camera. Any input device may be used as long as it can input video of the user instructing the reproduction operation of the projector 10.

The voice input unit 34 is an audio input device for inputting sound from outside as audio data. Specifically, it is an audio input device using a dynamic microphone, a condenser microphone, or the like. The voice input unit 34 is used mainly for inputting the user's content reproduction instructions by voice.

The notification unit 36 is a means for notifying the user that the command determination unit 48 of the processing unit 40 has entered the command input state. Specifically, the notification unit 36 is an output device such as a light-emitting means that notifies the user by lighting or blinking a lamp, an LED (light-emitting diode), or the like, or a sound-emitting means that notifies the user by sound using a beep, synthesized speech, or the like. The notification unit 36 may be any device that can notify the user of the command input state.

The output unit 38 is a display output device for displaying the content output from the reproduction unit 20. In this embodiment, a projection means that projects images onto a screen through an optical lens is used. The output unit 38 may be any output device that can output content according to commands.

Next, the units of the processing unit 40 are described. The time-series image recording unit 42 records the video input from the moving image input unit 32 in the time-series image storage unit 52, either as a sequence of still images or as a moving image input as a stream. In this embodiment, still images captured at 100 ms intervals are recorded in chronological order, but this is not limiting.
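The sampling described above (still images at 100 ms intervals, kept in chronological order) could be sketched as a bounded buffer. The Python code below assumes frames arrive with millisecond timestamps at the camera's own rate and that only the most recent `capacity` frames are needed (50 frames, about 5 s, is an arbitrary choice); `TimeSeriesImageStore` is a hypothetical name.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class TimeSeriesImageStore:
    """Keep the most recent frames, sampled at a fixed interval."""

    def __init__(self, interval_ms: int = 100, capacity: int = 50) -> None:
        self.interval_ms = interval_ms
        # deque with maxlen silently drops the oldest frame when full.
        self.frames: Deque[Tuple[int, Any]] = deque(maxlen=capacity)
        self._next_due_ms = 0

    def offer(self, t_ms: int, frame: Any) -> bool:
        """Record the frame only if the sampling interval has elapsed."""
        if t_ms < self._next_due_ms:
            return False
        self.frames.append((t_ms, frame))
        self._next_due_ms = t_ms + self.interval_ms
        return True

    def timestamps(self) -> List[int]:
        return [t for t, _ in self.frames]
```

With a camera delivering a frame roughly every 33 ms, only about every third or fourth frame is kept, so the stored sequence stays close to the 100 ms spacing.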

The face image recognition unit 44 identifies the face image region in the current image input from the moving image input unit 32 and inputs it to the command determination unit 48. To identify the face image, the face image recognition unit 44 may track the face image of a particular person by referring to past images in the time-series image storage unit 52, or it may identify the face image region of a particular person on the basis of identification images registered in the identification image storage unit 54. Using past images or identification images makes identification of the face image by the face image recognition unit 44 easier and more reliable.

The voice recognition unit 46 recognizes the voice input from the voice input unit 34 and inputs a voice signal to the command determination unit 48. The voice recognition unit 46 may also remove noise from the input voice, separate multiple voices, and select only the voice produced at a specific frequency.

The command determination unit 48 determines a command from the face images supplied by the face image recognition unit 44 and/or the voice signal supplied by the voice recognition unit 46. Specifically, it identifies a face motion from the input face images and selects the corresponding command, or it identifies the input voice signal and selects the corresponding command. The method of identifying a face motion from multiple face images is described in detail later.

Next, the units of the storage unit 50 are described. The storage units within the storage unit 50 may be partitions of a single storage device, or may be built from separate storage elements according to their use. They may also be external storage media such as flash memory or optical discs, or recording devices such as semiconductor memory, magnetic storage media, or optical storage media.

The time-series image storage unit 52 is a storage unit for recording the time-series images input from the time-series image recording unit 42. In this embodiment, the time-series image storage unit 52 stores chronologically consecutive images as still images. Alternatively, images may be stored as differences from previous images, or in a streaming format or the like. It is also possible to detect the face region in each still image and record only the face region image, or to analyze the face region image and extract and record feature quantities that serve as command elements (described later), such as whether the eyes and mouth are open or closed.

The identification image storage unit 54 is a storage device in which identification information, referred to by the face image recognition unit 44 to identify the user, is registered in advance. The identification information may be recorded as face images or audio information, or as feature quantities extracted from face images and audio information.

The command storage unit 56 is a storage device for storing a table that associates face motions with operation commands. Face motions may be associated with operation commands for each user, or a common set of standard commands may be defined. Operation commands may also be set as appropriate for the operations performed by the reproduction unit 20.

The content storage unit 58 is a storage device for accumulating content data to be reproduced by the reproduction unit 20. The content storage unit 58 may be a drive device for reading content data from various recording media, or it may download data over a network. The content storage unit 58 may be any storage device that can provide content data in response to requests from the reproduction unit 20.

In the example above, all the functional units are provided in the projector 10, but the configuration is not limited to this. FIG. 2 is a block diagram showing the schematic configuration of the units of the projector 10 according to another embodiment of the present invention. Blocks having the same functions as in the embodiment of FIG. 1 are given the same reference numerals, and their description is omitted.

The projector 10 includes the input/output unit 30, the processing unit 40, and the storage unit 50. The PC 60 includes the reproduction unit 20 and the content storage unit 58. This embodiment differs from the embodiment of FIG. 1 in that the reproduction unit 20 and the content storage unit 58 are provided in the PC 60; the rest of the configuration is the same.

The projector 10 sends commands instructing operations to the reproduction unit 20 of the PC 60, and the PC 60 retrieves the content corresponding to each command from the content storage unit 58, reproduces it, and has it output by the output unit 38 in the projector 10. The reproduction unit 20 may be presentation software running on the PC 60, and the content storage unit 58 may be a storage media drive installed in the PC 60.

Next, an actual presentation using the projector 10 of this embodiment is described with reference to FIG. 3. FIG. 3 is a schematic diagram showing a presentation given with the projector 10 and the PC 60 shown in FIG. 2.

The projector 10 projects content onto the screen 80 while capturing the face region 72 of the presenter 70 with the camera of the moving image input unit 32. While checking the content projected on the screen 80, the presenter 70 makes face motions toward the camera of the moving image input unit 32 of the projector 10. The projector 10 extracts a series of face image regions in chronological order from the face region 72 of the presenter 70 captured by the moving image input unit 32, and recognizes the face motion.

The projector 10 then sends the command corresponding to the face motion to the PC 60 as a trigger signal. On receiving the trigger signal, the PC 60 reproduces the corresponding content and sends it to the projector 10 as display data. The projector 10 receives the display data and projects the content onto the screen 80 from its output unit 38.
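The PC side of this exchange might look like the following Python sketch, in which each trigger command moves a slide index and the selected slide stands in for the display data returned to the projector. The command names, the clamping at the first and last slides, and the class name `PresentationPlayer` are assumptions; the patent leaves the trigger-signal format and transport unspecified.

```python
from typing import List


class PresentationPlayer:
    """Hypothetical PC-side reproduction unit driven by trigger commands."""

    def __init__(self, slides: List[str]) -> None:
        if not slides:
            raise ValueError("need at least one slide")
        self.slides = slides
        self.index = 0

    def on_trigger(self, command: str) -> str:
        """Apply a trigger command and return the slide to send back for display."""
        if command == "next_page":
            self.index = min(self.index + 1, len(self.slides) - 1)
        elif command == "previous_page":
            self.index = max(self.index - 1, 0)
        elif command == "first_page":
            self.index = 0
        # Unknown commands leave the current slide unchanged.
        return self.slides[self.index]
```

Returning the current slide even for unknown commands means the projector always receives valid display data, whatever trigger it sent.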

As described above, when the presenter 70 makes face motions toward the projector 10, commands for controlling the content reproduced on the PC 60 are sent as trigger signals, and the projector 10 receives the display data and projects it onto the screen 80. The presenter 70 can therefore control the content and advance the presentation without a separate operator for content reproduction and without holding a remote controller or the like.

次に、顔動作の認識処理の流れについて、図４を参照して説明する。図４は、顔動作の認識処理の流れを示すフロー図である。認識処理が開始されると、まず、各種パラメータのリセットが行われる。 Next, the flow of facial motion recognition processing will be described with reference to FIG. FIG. 4 is a flowchart showing the flow of facial motion recognition processing. When the recognition process is started, first, various parameters are reset.

Ｓ１０１において、顔画像認識部４４は、カウンタおよびシーケンス履歴のリセットを行う。Ｓ１０２において、顔画像認識部４４は、タイマのカウントを開始して、１００ｍｓごとに顔検出処理を実行する。 In S101, the face image recognition unit 44 resets the counter and the sequence history. In S102, the face image recognition unit 44 starts the timer and executes face detection processing every 100 ms.

なお、シーケンス履歴とは、一連の顔動作を時系列の流れに沿って繋げたもので、具体例をあげると、顔を静止する動作、顔を傾ける動作、口を開閉する動作、瞼を開閉する動作、頷く動作、顔を横に振る動作などの顔動作である。また、シーケンス履歴では、同じ動作であっても、各動作における時間経過の長さによって別のシーケンスと見なしても良い。 The sequence history is a series of face motions linked in time order. Examples of such face motions include holding the face still, tilting the face, opening and closing the mouth, opening and closing the eyelids, nodding, and shaking the head from side to side. In the sequence history, the same motion may also be treated as a different sequence depending on how long it lasts.

Ｓ１０３において、前回の画像データおよび今回の画像データを比較することで、顔領域の移動を追跡する。顔領域の移動が前回の検出位置から一定以下の場合（Ｓ１０３でＹＥＳ）、処理はＳ１０４に進む。顔領域の移動が前回の検出位置から一定の値を超えた場合（Ｓ１０３でＮＯ）、処理はＳ１０１に戻ってカウンタのリセットを行う。 In S103, the movement of the face area is tracked by comparing the previous image data with the current image data. If the face area has moved no more than a fixed amount from the previously detected position (YES in S103), the process proceeds to S104. If the movement exceeds that amount (NO in S103), the process returns to S101 and the counter is reset.

Ｓ１０４において、顔画像認識部４４はカウンタを＋１し、Ｓ１０５において、カウンタの値から経過時間を判断する。そして、顔画像認識部４４は経過時間に基づいて、ユーザが一定時間顔を静止させる動作（コマンド入力動作）を行ったかどうか判断する。カウンタが２０以上だった場合（Ｓ１０５でＹＥＳ）、処理はＳ１０６へ進む。カウンタが２０より少ない場合（Ｓ１０５でＮＯ）、顔の静止時間が２秒を越していないと判断し、処理はＳ１０２に戻ってカウントを再開する。 In S104, the face image recognition unit 44 increments the counter, and in S105 it determines the elapsed time from the counter value. Based on the elapsed time, the face image recognition unit 44 judges whether the user has held the face still for a certain period (the command input motion). Because detection runs every 100 ms, a counter value of 20 corresponds to 2 seconds. If the counter is 20 or more (YES in S105), the process proceeds to S106. If the counter is less than 20 (NO in S105), the face is judged not yet to have been still for 2 seconds, and the process returns to S102 to continue counting.
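The stillness check in S103 to S105 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes the face detector already yields one face-area center per 100 ms sample, and the movement threshold `MAX_SHIFT_PX` is a hypothetical value, since the text only says "a certain value".

```python
import math

SAMPLE_INTERVAL_MS = 100  # S102: face detection runs every 100 ms
STILL_FRAMES = 20         # S105: 20 samples x 100 ms = 2 s of stillness
MAX_SHIFT_PX = 10.0       # hypothetical S103 threshold ("a certain value" in the text)


def still_index(face_centers, max_shift=MAX_SHIFT_PX, still_frames=STILL_FRAMES):
    """Return the sample index at which the face has been still for
    `still_frames` consecutive comparisons, or None if that never happens.

    face_centers: sequence of (x, y) face-area positions, one per sample.
    """
    counter = 0
    for i in range(1, len(face_centers)):
        # S103: compare the previous and current face-area positions
        if math.dist(face_centers[i - 1], face_centers[i]) > max_shift:
            counter = 0  # NO in S103: back to S101, reset the counter
            continue
        counter += 1  # S104
        if counter >= still_frames:  # YES in S105
            return i
    return None
```

For a face that never moves, the 20th comparison (sample index 20, i.e. 2,000 ms after the first sample) is the point at which the sketch reports the command input motion.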

なお、上記の例では、顔を一定時間、カメラに向かって静止させる顔動作をコマンド入力動作として説明したが、これに限るものではない。カメラに向かって一定時間、瞼を閉じる顔動作であっても良いし、２回頷く顔動作であっても良い。ただし、顔をカメラに向かって静止させる動作であった場合、より少ない顔動作で容易にコマンドを入力することができ、誤認識を起こしにくいコマンド入力動作とすることができる。 In the above example, holding the face still toward the camera for a certain time was described as the command input motion, but the invention is not limited to this. The command input motion may instead be closing the eyes for a certain time while facing the camera, or nodding twice. However, holding the face still toward the camera allows a command to be input easily with fewer face motions and gives a command input motion that is less prone to misrecognition.

Ｓ１０６において、顔画像認識部４４は、口の開度を計測し、開／閉状態を判断する。Ｓ１０７において、顔の静止と、口の開閉の各顔動作とを一連のシーケンスとして、シーケンス履歴として更新を行う。なお、ここでは、顔の静止および口の開閉を連続する顔動作としてシーケンス履歴に記録したが、これに限らない。顔を用いる連続する動作の組み合わせであれば、どのようなものでもよい。また、顔動作およびコマンド操作の対応テーブルの例について、詳細は後述する。 In S106, the face image recognition unit 44 measures how far the mouth is open and determines whether it is open or closed. In S107, the stillness of the face and the subsequent mouth opening and closing motions are treated as one sequence, and the sequence history is updated accordingly. Here, holding the face still and opening and closing the mouth are recorded in the sequence history as consecutive face motions, but the invention is not limited to this; any combination of consecutive motions using the face may be used. An example table associating face motions with command operations is described in detail later.

Ｓ１０８において、コマンド決定部４８は、シーケンス履歴が規定のものと一致するか判断する。すなわち、一連の顔動作がコマンドと対応付けられているか判断する。シーケンス履歴が規定通りの場合（Ｓ１０８でＹＥＳ）、処理はＳ１０９へ進む。シーケンス履歴が規定通りでない場合（Ｓ１０８でＮＯ）、処理はＳ１０２に戻り、再びカウントを進める処理を繰り返す。 In S108, the command determination unit 48 determines whether the sequence history matches a predefined one, that is, whether the series of face motions is associated with a command. If it matches (YES in S108), the process proceeds to S109. If it does not (NO in S108), the process returns to S102 and the counting process is repeated.

Ｓ１０９において、コマンド決定部４８は、一連の顔動作に対応するコマンドをトリガ信号として再生部２０に送信する。その後、処理はＳ１０１に戻り、再びカウンタおよびシーケンス履歴をリセットして、プロジェクタ１０によるコンテンツの表示が終了するまで、上記の処理を繰り返す。 In S109, the command determination unit 48 transmits the command corresponding to the series of face motions to the reproduction unit 20 as a trigger signal. The process then returns to S101, the counter and the sequence history are reset, and the above processing is repeated until the projector 10 finishes displaying the content.
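Steps S101 to S109 can be combined into a small generator, shown here as a hedged sketch rather than the embodiment's code. Assumptions not in the text: the `Sample` type and its fields, the 10-pixel movement threshold, and the `table` argument that maps a tuple of recorded mouth states to a hypothetical command name. Durations are ignored; only changes of mouth state are recorded in the history.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Sample:
    center: Tuple[float, float]  # face-area position found in S102
    mouth_open: bool             # open/closed result of the S106 measurement


def recognize(samples: Iterable[Sample], table: Dict[tuple, str],
              max_shift: float = 10.0, still_frames: int = 20) -> Iterator[str]:
    """Yield commands by following the S101-S109 loop of FIG. 4 (sketch)."""
    counter, history = 0, []  # S101: reset counter and sequence history
    prev = None
    for s in samples:  # S102: one sample per 100 ms
        moved = prev is not None and math.dist(prev, s.center) > max_shift
        prev = s.center
        if moved:  # NO in S103: return to S101
            counter, history = 0, []
            continue
        counter += 1  # S104
        if counter < still_frames:  # NO in S105: keep counting
            continue
        state = "open" if s.mouth_open else "closed"  # S106
        if not history or history[-1] != state:  # S107: record state changes only
            history.append(state)
        command = table.get(tuple(history))  # S108
        if command is not None:
            yield command  # S109: would be sent to the reproduction unit 20
            counter, history = 0, []  # back to S101
```

Because the counter and history are reset after each command, the speaker has to hold the face still again before the next command, mirroring the return to S101.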

次に、顔動作が規定通りであるかを判定するためのアルゴリズムについて、一例を図５および図６を参照して説明する。図５は、人間の顔の特徴を示すための模式図である。図６は、本実施形態で用いるアルゴリズムによって算出される画素値を示すグラフである。 Next, an example of an algorithm for determining whether the face motion is as specified will be described with reference to FIGS. 5 and 6. FIG. 5 is a schematic diagram for illustrating the characteristics of a human face. FIG. 6 is a graph showing pixel values calculated by the algorithm used in this embodiment.

図５内の大きい円は、人間の顔を模式的に示すものである。大きい円の中にある２つの小さい円は、人間の眼を模式的に示すものである。小さい２つの円の下にある円弧は、人間の口を模式的に示すものである。 A large circle in FIG. 5 schematically shows a human face. Two small circles within the large circle schematically represent the human eye. The arc under the two small circles schematically represents the human mouth.

ここで、図５内の矩形は、顔領域を示す枠線とする。顔領域は正方形で特定され、縦および横の幅はＤで示される。この場合、人間の口は顔領域として特定した矩形内の、下辺からＤ／３以内の距離に存在するものとして、口の開閉について検出を行う。Ｄは、ここでは１７９画素とする。なお、顔領域の特定には、従来使用されている各種のアルゴリズムを適用できる。ここでは、両目および唇を含む正方形の顔領域を特定したものとして、説明を行う。 Here, the rectangle in FIG. 5 is a frame indicating the face area. The face area is specified as a square whose height and width are both D. Mouth opening and closing is detected on the assumption that the mouth lies within a distance of D/3 from the bottom side of the rectangle identified as the face area. Here, D is 179 pixels. Various conventional algorithms can be used to identify the face area; the description here assumes that a square face area containing both eyes and the lips has been identified.

図６は、検出した顔領域を示す矩形内の画素値から算出した値を示している。具体的には、顔領域の矩形内において、垂直な直線を領域内の中心に、当該領域内の下辺から１／３の位置まで引いた直線Ｌ上の画素値Ｐについて、次の式（１）を適用したものである。 FIG. 6 shows values calculated from the pixel values inside the rectangle representing the detected face area. Specifically, a vertical straight line L is drawn through the center of the face-area rectangle, from the bottom side up to one third of its height, and the following formula (1) is applied to each pixel value P on the line L.

$$P = \frac{R}{G + B} \tag{1}$$
なお、縦軸は画素値Ｐの値を示し、横軸は顔領域を示す矩形のＹ座標を示している。ここで、Ｒ、Ｇ、およびＢは、各画素における赤色、緑色、および青色の強度を示す値である。本実施形態では、ＲＧＢの各値は０から２５５の範囲を取るものとして説明する。なお、ここでは、０が暗い側の画素値を、２５５が明るい側の画素値を示すものとする。 In FIG. 6, the vertical axis indicates the pixel value P and the horizontal axis indicates the Y coordinate within the face-area rectangle. R, G, and B are the red, green, and blue intensities of each pixel. In this embodiment, each RGB value is assumed to range from 0 to 255, where 0 is the dark end and 255 the bright end.

図６内において、画素値Ｐが０．５〜１の間に水平に引かれた太い直線Ｍは下記の式（２）および式（１）から計算される値である。 In FIG. 6, the thick horizontal line M, which lies between P = 0.5 and P = 1, is a threshold value calculated from formula (2) below together with formula (1).

$$M = \frac{\sum_{Y \in V_1} P(Y) + \sum_{Y \in V_2} P(Y)}{|V_1| + |V_2|} \times 1.3 \tag{2}$$
すなわち、直線Ｌの両端から、それぞれ略３％のエリアＶ_１（Ｙ＝０〜５）およびＶ_２（Ｙ＝１７４〜１７９）の範囲内における画素値Ｐに対して、平均をとり１.３倍したものである。上記（２）式に実際の数値を当てはめると、Ｍの値は略０．８９となる。 That is, M is 1.3 times the average of the pixel values P over the areas V₁ (Y = 0 to 5) and V₂ (Y = 174 to 179), each covering roughly 3% of the line L at either end, where |V₁| and |V₂| are the numbers of pixels in those areas. Substituting the actual values into formula (2) gives M ≈ 0.89.

上記の計算式において、Ｙ座標の両端からＰ＞Ｍを満たすＹの最小値および最大値を検索する。Ｙ＝６１のとき、Ｐ＝Ｍとなる。また、Ｙ＝１２２のとき、Ｐ＝Ｍとなる。直線Ｍ上で、Ｙ＝６１の点をａ、Ｙ＝１２２の点をｂとした場合、ａ−ｂ間をＮとする。Ｎ＞１７９／２を満たすとき、顔画像に示される人物は口を開けていると判断する。 Using these values, the Y coordinate is scanned inward from both ends to find the smallest and largest Y satisfying P > M. In this example, P = M at Y = 61 and at Y = 122. Let a be the point on line M at Y = 61 and b the point at Y = 122, and let N be the distance between a and b. When N > 179/2, the person shown in the face image is judged to have the mouth open.
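The mouth-open test built from formulas (1) and (2) can be sketched as below. Hedged assumptions: the input is the list of (R, G, B) values sampled along line L, indexed by Y; `max(G + B, 1)` is an added guard against division by zero that the text does not mention; the band of 6 pixels per end (Y = 0 to 5 and Y = 174 to 179) and D = 179 are taken from the example values above; the function name is illustrative.

```python
def mouth_is_open(profile, band=6, gain=1.3, d=179):
    """Judge mouth opening from RGB samples along line L (index = Y).

    Returns (is_open, M, N): M is the threshold of formula (2) and N is the
    distance between the smallest and largest Y with P > M.
    """
    # Formula (1): P = R / (G + B); the max(..., 1) guard is an added assumption
    p = [r / max(g + b, 1) for r, g, b in profile]
    # Formula (2): mean of P over the end areas V1 and V2, times 1.3
    ends = p[:band] + p[-band:]
    m = sum(ends) / len(ends) * gain
    # Find the smallest and largest Y satisfying P > M (points a and b)
    above = [y for y, value in enumerate(p) if value > m]
    if not above:
        return False, m, 0
    n = max(above) - min(above)
    return n > d / 2, m, n
```

With a synthetic 180-sample profile of skin-colored pixels (200, 150, 130) and lip-colored pixels (180, 60, 60) at Y = 61 to 122, the sketch finds N = 61, which does not exceed 179/2, so it reports the mouth as closed.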

上記のアルゴリズム中の各数値は一例であり、異なっていても良い。また、計算式や割合なども、別のもので実現することができる。アルゴリズムとして、例えば、顔器官の輪郭抽出技術を用いても良い。 Each numerical value in the above algorithm is only an example and may differ. The formulas and ratios may likewise be replaced with others. For example, a technique for extracting the contours of facial organs may also be used as the algorithm.

次に、再生部２０に指示するコマンドと、顔動作との対応について、図７を参照して説明する。図７の（ａ）および（ｂ）は、それぞれ、コマンドおよび顔動作の対応の一例を示す表である。 Next, the correspondence between face motions and the commands issued to the reproduction unit 20 will be described with reference to FIG. 7. FIGS. 7(a) and 7(b) are tables each showing an example of the correspondence between commands and face motions.

図７（ａ）では、「コマンド１」として、「プレゼンテーションを次に進める」コマンドを、口を閉じた状態から、０.５秒未満の間隔で口を開け、閉じる顔動作に対応付けている。また、「コマンド２」として、「プレゼンテーションを前に進める」コマンドを、口を閉じた状態から、０．５秒以上の期間だけ口を開けて、再び口を閉じ、さらに、０．５秒未満の期間だけ口を開けた後、口を閉じる顔動作と対応付けている。 In FIG. 7(a), "command 1", the "advance the presentation to the next item" command, is associated with a face motion in which, starting with the mouth closed, the mouth is opened for less than 0.5 seconds and then closed. "Command 2", the "move the presentation back to the previous item" command, is associated with a face motion in which, starting with the mouth closed, the mouth is opened for 0.5 seconds or more, closed again, then opened for less than 0.5 seconds and closed.

また、「コマンド３」として、「音声／動画の停止」コマンドを、口を閉じた状態から、０.５秒以上の期間だけ口を開けた後、口を閉じる顔動作と対応付けている。また、「コマンド４」として、「音声／動画の再生」コマンドを、口を閉じた状態から、０．５秒以上の期間だけ口を開けた後、また口を閉じて、さらに０．５秒以上の期間だけ口を開けた後、また口を閉じる顔動作と対応付けている。 "Command 3", the "stop audio/video" command, is associated with a face motion in which, starting with the mouth closed, the mouth is opened for 0.5 seconds or more and then closed. "Command 4", the "play audio/video" command, is associated with a face motion in which, starting with the mouth closed, the mouth is opened for 0.5 seconds or more, closed, opened again for 0.5 seconds or more, and closed again.

上記のように、口を開いている時間を０．５秒より長いものと短いものとで区別すると、２回までの口の開閉で４通りのコマンドを規定できる。上記の例では、よく使うコマンドに対して短いアクションを割り当てている。また、口の開時間が長短入り混じっている場合には、エラーとしてコマンド扱いしないものとする。 As described above, distinguishing mouth-open durations longer than 0.5 seconds from shorter ones allows four commands to be defined with at most two openings and closings of the mouth. In this example, shorter actions are assigned to frequently used commands. If long and short mouth-open durations are mixed, the input is treated as an error and not as a command.
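The FIG. 7(a) table can be sketched as a lookup keyed on the pattern of mouth-open durations. The command strings are hypothetical labels, and the sketch assumes the mouth-open durations of one input sequence have already been segmented; the text does not say how the end of a sequence is detected. Patterns absent from the table, such as short-then-long, return None and are not treated as commands.

```python
OPEN_THRESHOLD_S = 0.5  # boundary between short (< 0.5 s) and long (>= 0.5 s)

# FIG. 7(a): pattern of mouth-open durations -> command (hypothetical labels)
COMMANDS_7A = {
    ("short",): "next",             # command 1
    ("long", "short"): "previous",  # command 2
    ("long",): "stop",              # command 3
    ("long", "long"): "play",       # command 4
}


def classify(open_durations_s):
    """Map each mouth-open duration in seconds to 'short' or 'long'."""
    return tuple("short" if d < OPEN_THRESHOLD_S else "long" for d in open_durations_s)


def command_for(open_durations_s):
    """Return the command for one sequence of mouth openings, or None."""
    return COMMANDS_7A.get(classify(open_durations_s))
```

Keeping the table as plain data makes it easy to swap in a different assignment of patterns to commands without touching the classification logic.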

図７（ｂ）では、「コマンド１」として、「プレゼンテーションを次に進める」コマンドを、「顔を正面→顔を右向き→顔を正面→顔を右向き→顔を正面」という一連の顔動作に対応付けている。また、「コマンド２」として、「プレゼンテーションを前に進める」コマンドを、「顔を正面→顔を左向き→顔を正面→顔を左向き→顔を正面」という一連の顔動作と対応付けている。 In FIG. 7(b), "command 1", the "advance the presentation to the next item" command, is associated with the series of face motions "face front → face right → face front → face right → face front". "Command 2", the "move the presentation back to the previous item" command, is associated with the series "face front → face left → face front → face left → face front".

また、「コマンド３」として、「音声／動画の停止」コマンドを、「顔を正面→顔を下向き→顔を正面」という一連の顔動作と対応付けている。また、「コマンド４」として、「音声／動画の再生」コマンドを、「顔を正面→顔を下向き→顔を正面→顔を下向き→顔を正面」という顔動作と対応付けている。 "Command 3", the "stop audio/video" command, is associated with the series "face front → face down → face front", and "command 4", the "play audio/video" command, with the series "face front → face down → face front → face down → face front".
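Similarly, FIG. 7(b) can be sketched by collapsing per-frame head-direction labels into a sequence of distinct directions and looking that sequence up. The labels `front`, `right`, `left`, and `down` are assumed to come from some head-pose estimator not described here, and the command strings are hypothetical.

```python
from itertools import groupby

# FIG. 7(b): sequence of face directions -> command (hypothetical labels)
COMMANDS_7B = {
    ("front", "right", "front", "right", "front"): "next",     # command 1
    ("front", "left", "front", "left", "front"): "previous",   # command 2
    ("front", "down", "front"): "stop",                        # command 3
    ("front", "down", "front", "down", "front"): "play",       # command 4
}


def collapse(frame_labels):
    """Merge runs of identical per-frame labels, e.g. F, F, D, F -> F, D, F."""
    return tuple(label for label, _ in groupby(frame_labels))


def command_for_directions(frame_labels):
    """Return the command for one sequence of per-frame directions, or None."""
    return COMMANDS_7B.get(collapse(frame_labels))
```

Collapsing runs first means the lookup does not depend on how many frames each direction is held.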

以上のように、本実施形態のプロジェクタ１０は、顔画像を入力する動画入力部３２と、顔画像から顔動作を認識する顔画像認識部４４と、コンテンツの再生を指示するコマンドを顔動作に基づいて決定するコマンド決定部４８と、前記コマンドに基づいて再生されたコンテンツを表示出力する出力部３８とを備えているので、動画入力部３２から画像データを入力し、画像データの中から顔画像が含まれる領域を検索し、時系列に沿った複数の顔画像を追跡することでコマンドを指示する講演者７０を特定し、追跡した顔画像から再生装置を指示するコマンドを決定することで一連の顔動作を用いてコンテンツの再生操作を指示するコマンドを決定することができる。 As described above, the projector 10 of this embodiment includes the moving image input unit 32 that inputs face images, the face image recognition unit 44 that recognizes face motions from the face images, the command determination unit 48 that determines, based on the face motion, a command instructing content reproduction, and the output unit 38 that displays and outputs the content reproduced according to the command. Image data is input from the moving image input unit 32, regions containing face images are searched for in the image data, and the speaker 70 giving commands is identified by tracking multiple face images over time. A command for the reproduction device is then determined from the tracked face images, so that a series of face motions can be used to determine a command instructing a content reproduction operation.

上記の構成を用いることで、プロジェクタ１０の動画入力部３２に向かって顔動作を行うことによって、リモートコントローラなどの物理的な入力デバイスを用いることなく、ハンドフリーな状態で操作制御することができるプロジェクタを実現することができる。 With the above configuration, making face motions toward the moving image input unit 32 of the projector 10 makes it possible to realize a projector that can be operated hands-free, without a physical input device such as a remote controller.

また、本実施形態のプロジェクタ１０では、顔動作をコマンドと関連付けて記録するコマンド記憶部５６をさらに備え、コマンド決定部４８は、コマンド記憶部５６を参照して対応するコマンドを決定するので、コマンド記憶部５６に顔動作とコマンドとを対応付けたテーブルを記憶し、コマンド決定部４８はテーブルを参照してコマンドを決定することによって、顔動作およびコマンドを独自に設定することができる。そのため、プロジェクタ１０や表示するコンテンツに応じたコマンドを設定し、表示出力を指示する講演者７０ごとに顔動作を独自に設定することができる。 The projector 10 of this embodiment further includes the command storage unit 56, which stores face motions in association with commands, and the command determination unit 48 refers to the command storage unit 56 to determine the corresponding command. Because a table associating face motions with commands is stored in the command storage unit 56 and the command determination unit 48 determines commands by referring to that table, face motions and commands can be defined freely. Commands can therefore be set according to the projector 10 or the content being displayed, and face motions can be customized for each speaker 70 who controls the display output.
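A table in the command storage unit 56 that lets each speaker customize face motions might look like the following minimal sketch. `CommandStore`, its methods, and the motion and command strings are hypothetical; `collections.ChainMap` is used so that a speaker's own entries override the shared defaults without copying them.

```python
from collections import ChainMap


class CommandStore:
    """Hypothetical in-memory stand-in for the command storage unit 56."""

    def __init__(self, default_table):
        self._default = dict(default_table)  # shared face motion -> command table
        self._per_speaker = {}               # speaker id -> overriding entries

    def register(self, speaker_id, motion, command):
        """Add or replace one speaker-specific face motion."""
        self._per_speaker.setdefault(speaker_id, {})[motion] = command

    def lookup(self, speaker_id, motion):
        """Search the speaker's entries first, then the shared defaults."""
        table = ChainMap(self._per_speaker.get(speaker_id, {}), self._default)
        return table.get(motion)
```

Speakers without custom entries simply fall through to the defaults.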

また、本実施形態のプロジェクタ１０では、コマンド決定部４８は、第１の顔動作に基づいてコマンドの入力を受け付けるコマンド入力状態に入り、コマンド入力状態になった後に第２の顔動作が入力されることによってコマンドの決定を行うので、第１の顔動作を入力することでコマンドの入力を受け付ける状態にし、その後、第２の顔動作を入力することで、コマンドの誤認識を低減させることができる。このため、例えば、第１の顔動作として頻繁に同様の動作を行うことの少ない、特徴のある動作を設定し、第２の顔動作として簡単な動作を設定することで、簡単な動作の組み合わせによって、偶然、同様の動作を行うことによる誤認識および誤動作を低減することができるという効果を奏する。 In the projector 10 of this embodiment, the command determination unit 48 enters a command input state, in which command input is accepted, based on a first face motion, and determines the command when a second face motion is input after that state has been entered. Because the first face motion puts the device into a state that accepts commands and the second face motion then specifies the command, misrecognition of commands can be reduced. For example, by choosing as the first face motion a distinctive motion that is rarely made in ordinary behavior and as the second face motion a simple motion, misrecognition and malfunction caused by accidentally making a similar combination of simple motions can be reduced.

また、本実施形態のプロジェクタ１０では、第１の顔動作は、顔を所定の方向に向けて静止する動作であり、第２の顔動作は、顔の少なくとも一部を動かす動作であるので、顔を所定の方向に向けて静止することでコマンドの入力を受け付ける状態にし、その後、顔の一部を動かすことによってコマンドの種類を確定することで、コマンド入力の誤認識を低減させることができる。なお、上記の所定の方向には、プロジェクタ１０に備えられたカメラの方向があげられる。 In the projector 10 of this embodiment, the first face motion is holding the face still while facing a predetermined direction, and the second face motion is moving at least part of the face. Holding the face still in the predetermined direction puts the device into the command input state, and moving part of the face then fixes the type of command, which reduces misrecognition of command input. The predetermined direction may be, for example, the direction of the camera provided in the projector 10.

また、本実施形態のプロジェクタ１０では、コマンド入力状態に入ったことを講演者７０に通知する通知部３６をさらに備え、コマンド決定部４８は、コマンド入力状態に入った場合に、通知部３６に通知させるので、入力状態通知手段を用いてコマンド入力状態にはいったことを講演者７０に通知し、講演者７０はコマンド入力状態であることを確認し、その後、第２の顔動作を入力することができる。これによって、講演者７０はコマンド入力状態であることを把握し、確認した上で第２のコマンドの入力を確実に行うことができるという効果を奏する。 The projector 10 of this embodiment further includes the notification unit 36, which notifies the speaker 70 that the command input state has been entered, and the command determination unit 48 causes the notification unit 36 to issue this notification when the command input state is entered. The speaker 70 can thus confirm that the device is in the command input state before inputting the second face motion, and can therefore input the second face motion reliably.
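The two-stage input described above (a first face motion that enters the command input state, a notification to the speaker, then a second face motion that selects the command) can be sketched as a small state machine. The class name, the motion strings, the `notify` callback standing in for the notification unit 36, and the choice to leave the command input state after any second motion are all assumptions for illustration.

```python
class TwoStageCommandInput:
    """Sketch of the command input state handled by the command determination unit 48."""

    def __init__(self, arm_motion, commands, notify=None):
        self.arm_motion = arm_motion          # first face motion, e.g. holding still
        self.commands = commands              # second face motion -> command
        self.notify = notify or (lambda: None)
        self.armed = False                    # True while in the command input state

    def feed(self, motion):
        """Process one recognized face motion; return a command or None."""
        if not self.armed:
            if motion == self.arm_motion:
                self.armed = True
                self.notify()                 # tell the speaker input is now accepted
            return None
        self.armed = False                    # any second motion ends the input state
        return self.commands.get(motion)
```

Motions made outside the command input state are ignored, which is where the reduction in accidental commands comes from.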

また、本実施形態のプロジェクタ１０では、顔画像認識部４４は、人物を識別するために識別画像記憶部５４に登録された識別情報と、顔画像から抽出された特徴情報とが一致するかを判断し、コマンド決定部４８は、識別情報と一致する特徴情報が抽出される顔画像から認識された顔動作に基づいてコマンドを決定するので、識別情報に基づいて顔画像を識別した上でコマンドを決定するため、指示を出している講演者７０を特定して、特定した講演者７０からの指示のみを受け付けるようにすることができる。これによって、複数の人物の顔画像が画像データ内に存在する場合でも、講演者７０を識別してコマンドの決定を行うことができるという効果を奏する。 In the projector 10 of this embodiment, the face image recognition unit 44 determines whether identification information registered in the identification image storage unit 54 for identifying a person matches feature information extracted from a face image, and the command determination unit 48 determines the command based on the face motion recognized from the face image whose extracted feature information matches the identification information. Because the command is determined only after the face image has been identified from the identification information, the speaker 70 giving instructions can be singled out and only instructions from that speaker accepted. As a result, the speaker 70 can be identified and commands determined even when the image data contains the faces of several people.
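Accepting face motions only from the registered speaker can be sketched as below. The text does not specify the feature representation or the matching rule, so this assumes fixed-length feature vectors compared by cosine similarity against a hypothetical threshold of 0.9; `select_speaker_motion` and its arguments are illustrative names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_speaker_motion(faces, registered, threshold=0.9):
    """Return the face motion of the detected face that best matches the
    registered identification features, or None if no face reaches threshold.

    faces: iterable of (feature_vector, face_motion), one per detected face.
    """
    best_motion, best_score = None, threshold
    for features, motion in faces:
        score = cosine(features, registered)
        if score >= best_score:
            best_motion, best_score = motion, score
    return best_motion
```

Faces of audience members that do not match the registered features are simply ignored, so their motions never reach command determination.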

また、本実施形態のプロジェクタ１０では、音声を入力する音声入力部３４と、入力された音声を認識する音声認識部４６とをさらに備え、コマンド決定部４８は、顔動作に加え、認識された音声に基づいてコマンドを決定するので、顔画像による認識に加えて、音声認識手段による音声の認識結果に基づいてコマンドを決定するため、顔画像によるコマンドの認識に失敗した場合でも、音声によるコマンド入力によってコマンドを訂正することができる。 The projector 10 of this embodiment further includes the voice input unit 34 that inputs voice and the voice recognition unit 46 that recognizes the input voice, and the command determination unit 48 determines the command based on the recognized voice in addition to the face motion. Because the command is determined from the speech recognition result as well as from the face image, a command can be corrected by voice input even when recognizing it from the face image fails.
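One way to combine the two recognizers is to let a recognized voice command, when present, take precedence over the face-based result, so that a misrecognized face motion can be corrected by speaking. The text does not fix the combination rule; the precedence policy, the vocabulary in `VOICE_COMMANDS`, and the function name are assumptions.

```python
# Hypothetical vocabulary: recognized utterance -> command
VOICE_COMMANDS = {"next": "next", "back": "previous", "stop": "stop", "play": "play"}


def decide_command(face_command, transcript=None):
    """Combine the face-motion result with an optional speech transcript.

    A transcript that maps to a known command overrides the face result;
    otherwise the face result (possibly None) is kept.
    """
    voice_command = None
    if transcript:
        voice_command = VOICE_COMMANDS.get(transcript.strip().lower())
    return voice_command if voice_command is not None else face_command
```

Unrecognized speech leaves the face-based decision unchanged, so background talk does not cancel a valid face command.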

最後に、プロジェクタ１０の各ブロック、特に処理部４０の顔画像認識部４４、音声認識部４６、および、コマンド決定部４８は、ハードウェアロジックによって構成してもよいし、次のようにＣＰＵを用いてソフトウェアによって実現してもよい。 Finally, each block of the projector 10, in particular the face image recognition unit 44, the voice recognition unit 46, and the command determination unit 48 of the processing unit 40, may be implemented in hardware logic or in software using a CPU, as follows.

すなわち、処理部４０は、各機能を実現する制御プログラムの命令を実行するＣＰＵ（central processing unit）、上記プログラムを格納したＲＯＭ（read only memory）、上記プログラムを展開するＲＡＭ（random access memory）、上記プログラムおよび各種データを格納するメモリ等の記憶装置（記録媒体）などを備えている。そして、本発明の目的は、上述した機能を実現するソフトウェアであるプロジェクタ１０の制御プログラムのプログラムコード（実行形式プログラム、中間コードプログラム、ソースプログラム）をコンピュータで読み取り可能に記録した記録媒体を、上記プロジェクタ１０に供給し、そのコンピュータ（またはＣＰＵやＭＰＵ）が記録媒体に記録されているプログラムコードを読み出し実行することによっても、達成可能である。 That is, the processing unit 40 includes a CPU (central processing unit) that executes the instructions of a control program implementing each function, a ROM (read only memory) storing the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium), such as a memory, storing the program and various data. The object of the present invention can also be achieved by supplying the projector 10 with a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program of the projector 10, which is software implementing the functions described above, is recorded in computer-readable form, and having the computer (or CPU or MPU) read and execute the program code recorded on the recording medium.

上記記録媒体としては、例えば、磁気テープやカセットテープ等のテープ系、フロッピー（登録商標）ディスク／ハードディスク等の磁気ディスクやＣＤ−ＲＯＭ／ＭＯ／ＭＤ／ＤＶＤ／ＣＤ−Ｒ等の光ディスクを含むディスク系、ＩＣカード（メモリカードを含む）／光カード等のカード系、あるいはマスクＲＯＭ／ＥＰＲＯＭ／ＥＥＰＲＯＭ／フラッシュＲＯＭ等の半導体メモリ系などを用いることができる。 Examples of the recording medium include tape systems such as magnetic tape and cassette tape; disk systems including magnetic disks such as floppy (registered trademark) disks and hard disks and optical disks such as CD-ROM, MO, MD, DVD, and CD-R; card systems such as IC cards (including memory cards) and optical cards; and semiconductor memory systems such as mask ROM, EPROM, EEPROM, and flash ROM.

また、プロジェクタ１０を通信ネットワークと接続可能に構成し、通信ネットワークを介して上記プログラムコードを供給してもよい。この通信ネットワークとしては、特に限定されず、例えば、インターネット、イントラネット、エキストラネット、ＬＡＮ、ＩＳＤＮ、ＶＡＮ、ＣＡＴＶ通信網、仮想専用網（Virtual Private Network）、電話回線網、移動体通信網、衛星通信網等が利用可能である。また、通信ネットワークを構成する伝送媒体としては、特に限定されず、例えば、ＩＥＥＥ１３９４、ＵＳＢ、電力線搬送、ケーブルＴＶ回線、電話線、ＡＤＳＬ回線等の有線でも、ＩｒＤＡやリモコンのような赤外線、Ｂｌｕｅｔｏｏｔｈ（登録商標）、８０２．１１無線、ＨＤＲ、携帯電話網、衛星回線、地上波デジタル網等の無線でも利用可能である。なお、本発明は、上記プログラムコードが電子的な伝送で具現化された、搬送波に埋め込まれたコンピュータデータ信号の形態でも実現され得る。 The projector 10 may also be configured to be connectable to a communication network, and the program code may be supplied via that network. The communication network is not particularly limited; for example, the Internet, an intranet, an extranet, a LAN, ISDN, a VAN, a CATV communication network, a virtual private network, a telephone line network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network is also not particularly limited: wired media such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL lines can be used, as can wireless media such as infrared (IrDA or remote control), Bluetooth (registered trademark), 802.11 wireless, HDR, mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

以上のように、本発明に係るプロジェクタ１０は、一連の顔動作を用いてコンテンツを表示させる再生操作を指示するコマンドを決定することが出来るので、プレゼンテーションシステムなどの任意の表示装置に適用できる。 As described above, the projector 10 according to the present invention can determine a command for instructing a reproduction operation for displaying content using a series of face motions, and thus can be applied to any display device such as a presentation system.

【図１】本発明の一実施形態であるプロジェクタの各部の概略構成を示した機能ブロック図である。 FIG. 1 is a functional block diagram showing the schematic configuration of each unit of the projector according to an embodiment of the invention.
【図２】本発明の別の実施形態である表示システムの各部の概略構成を示した機能ブロック図である。 FIG. 2 is a functional block diagram showing the schematic configuration of each unit of the display system according to another embodiment of the invention.
【図３】上記の表示システムを用いて、プレゼンテーションを行う様子を示した概略図である。 FIG. 3 is a schematic diagram showing a presentation being given with the above display system.
【図４】上記の表示システムを用いた、顔動作の認識処理の流れを示すフロー図である。 FIG. 4 is a flowchart showing the flow of the face motion recognition process in the above display system.
【図５】上記の表示システムにおいて、計算に用いる人間の顔の特徴を示すための模式図である。 FIG. 5 is a schematic diagram showing the features of a human face used for the calculation in the above display system.
【図６】上記の表示システムで用いるアルゴリズムによって算出される画素値を示すグラフである。 FIG. 6 is a graph showing the pixel values calculated by the algorithm used in the above display system.
【図７】（ａ）および（ｂ）は、それぞれ、上記の表示システムで用いるコマンドおよび顔動作の対応の一例を示す表である。 FIGS. 7(a) and 7(b) are tables each showing an example of the correspondence between commands and face motions used in the above display system.

Explanation of symbols

１０プロジェクタ（表示装置）
２０再生部（再生手段）
３０入出力部
３２動画入力部（画像入力装置）
３４音声入力部（音声入力装置）
３６通知部（入力通知手段）
３８出力部（映像出力手段）
４０処理部
４２時系列画像記録部
４４顔画像認識部（顔画像認識手段）
４６音声認識部（音声認識手段）
４８コマンド決定部（コマンド決定手段）
５０記憶部
５２時系列画像記憶部
５４識別画像記憶部
５６コマンド記憶部（コマンド記憶装置）
５８コンテンツ記憶部（コンテンツ記憶装置）

10 Projector (display device)
20 Reproduction unit (reproduction means)
30 Input/output unit
32 Moving image input unit (image input device)
34 Voice input unit (voice input device)
36 Notification unit (input notification means)
38 Output unit (video output means)
40 Processing unit
42 Time-series image recording unit
44 Face image recognition unit (face image recognition means)
46 Voice recognition unit (voice recognition means)
48 Command determination unit (command determination means)
50 Storage unit
52 Time-series image storage unit
54 Identification image storage unit
56 Command storage unit (command storage device)
58 Content storage unit (content storage device)

Claims

1. A display device comprising:
an image input unit for inputting a face image;
face image recognition means for recognizing a face motion from the face image;
command determination means for determining, based on the face motion, a command instructing reproduction of content; and
an output unit for displaying and outputting content reproduced based on the command.

2. The display device according to claim 1, further comprising a command storage unit that stores the face motion in association with the command,
wherein the command determination means determines the command corresponding to the face motion by referring to the command storage unit.

3. The display device according to claim 2, wherein, when a first face motion is input, the command determination means enters a command input state for accepting command input, and determines the command based on a second face motion recognized after entering the command input state.

4. The display device, wherein the first face motion is a motion of holding the face still in a predetermined direction, and the second face motion is a motion of moving at least part of the face.

5. The display device according to claim 3, further comprising input state notification means for notifying the user that the command input state has been entered,
wherein the command determination means causes the input state notification means to issue the notification when the command input state is entered.

6. The display device according to claim 1, wherein the face image recognition means recognizes whether identification information registered for identifying a person matches feature information extracted from the face image, and
the command determination means determines the command based on a face motion recognized from a face image from which feature information matching the identification information is extracted.

7. The display device according to claim 1, further comprising:
a voice input unit for inputting voice; and
voice recognition means for recognizing the input voice,
wherein the command determination means determines the command based on the face motion and the recognized voice.

8. A projector comprising:
the display device according to claim 1;
a content storage unit that stores the content to be reproduced; and
a reproduction unit that reproduces content from the content storage unit based on a command from the display device and inputs the reproduced content to the output unit of the display device.

9. A display system comprising: the display device according to claim 1; and a playback device that plays back the content based on a command from the display device.

10. A display method in a display device that includes an image input unit for inputting a face image and an output unit for displaying and outputting content reproduced based on a command instructing reproduction of the content, the method comprising:
a step of recognizing a face motion from the face image; and
a command determination step of determining the command based on the face motion.

11. A display program for operating the display device according to any one of claims 1 to 7, the program causing a computer to function as each of the above means.

12. A computer-readable recording medium on which the display program according to claim 11 is recorded.