JP4781181B2

JP4781181B2 - User interface program, apparatus and method, information processing system

Info

Publication number: JP4781181B2
Application number: JP2006188635A
Authority: JP
Inventors: 智一掛
Original assignee: Sony Interactive Entertainment Inc; Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2006-07-07
Filing date: 2006-07-07
Publication date: 2011-09-28
Anticipated expiration: 2026-07-07
Also published as: JP2008015942A

Description

The present invention relates to user interface technology.

Conventionally, systems are known in which a user's movements are captured with an imaging device such as a video camera and a moving image of the user is shown on a screen so that the user can input commands or play a game (see, for example, Patent Document 1). In such an image processing apparatus, a command can be input when the user's moving image touches a menu screen or an object arranged on the screen. That is, the user's moving image itself functions as an input interface.
Patent Document 1: Japanese Patent Laid-Open No. 2005-216061

In applications, such as games, that use a user's moving image as an input interface as described above, it is important to draw out the actions of the user operating the application in a natural way, for example through on-screen effects. If the application demands unnatural movements that do not occur in everyday activity, the user may lose interest in it. Furthermore, when no instructions are shown on the screen, the user cannot tell what to do to perform a desired operation, and may be unable to start or continue the application.

The present invention has been made in view of these problems, and its object is to provide a technique for realizing an interface that is natural and easy for the user to use in an application that uses a user's moving image as an input interface.

One aspect of the present invention is a user interface program that causes a computer to provide: a region specifying function that specifies the region of a user's face from an image of the user captured by an imaging device; a motion detection function that detects a motion of the user's face using the image of the specified face region; and an event execution function that holds a table associating face motions with events, searches the table for the event corresponding to the face motion detected by the motion detection function, and executes that event.

According to this aspect, input can be given to the computer through face movement, which is a very simple and low-effort motion, so an input interface that is natural and easy for the user to use can be provided.
Note that the captured image of the user may be shown on the display screen as is, or a mirror image of the user may be shown. The user's image need not be shown on the screen at all. An image of a character that moves in the same way as the user may be displayed instead. Moreover, since it is sufficient that the image of the face region can be specified, the user may be wearing a mask or the like.

An “event” includes an operation instruction corresponding to a button operation on a game controller, a television remote control, or the like, as well as the on-screen display of an object. Operation instructions include displaying help, cancel, confirm, select, displaying a menu, exit, input of a movement direction, and so on.

The motion detection function may detect a tilt of the user's face as the motion. The motion detection function may also determine about which axis of the three-dimensional coordinates defined on the screen the detected tilt of the user's face occurs, and the event execution function may execute a different event depending on the tilt axis determined by the motion detection function. By determining whether the face rotates about the x, y, or z axis, three different events can be executed using face tilt alone. Different events may also be executed depending on the direction of rotation about the axis, that is, clockwise or counterclockwise.
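As a rough illustration of axis-dependent events, the following sketch maps a pair of tilt axis and rotation direction to an event name. The event names, the `cw`/`ccw` labels, and the function `event_for_tilt` are hypothetical and chosen only for illustration; they are not taken from the table of the embodiment.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (tilt axis, rotation direction) -> event name.
# The axis letters follow the x/y/z wording above; the events are examples only.
TILT_EVENTS: Dict[Tuple[str, str], str] = {
    ("x", "cw"): "menu_down",
    ("x", "ccw"): "menu_up",
    ("y", "cw"): "move_right",
    ("y", "ccw"): "move_left",
    ("z", "cw"): "confirm",
    ("z", "ccw"): "cancel",
}


def event_for_tilt(axis: str, direction: str) -> Optional[str]:
    """Return the event registered for this axis and direction, or None."""
    return TILT_EVENTS.get((axis, direction))
```

With three axes and two directions, such a table can hold up to six distinct events driven by face tilt alone.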

The region specifying function further specifies, within the specified face region, the regions of the parts that make up the face, and the face motions detected by the motion detection function include motions of those parts. Face parts include the eyes, mouth, nose, ears, eyebrows, pupils, tongue, and so on, but are not limited to these as long as the part can be identified by face recognition technology. This allows a variety of inputs to be given to the computer simply by changing facial expression. In addition, because no whole-body movement is involved, operation while sitting in a chair or lying down, which was difficult with conventional applications of this kind, also becomes possible.

The events executed by the event execution function may include operation instructions for an application running on the computer. In this case, it is preferable that at least some of the face motions associated with operation instructions in the table resemble the motions the user would make to express those instructions in the real world.

The motion detection function may count the number of repetitions of the user's face motion, and the event execution function may execute a different event depending on that number. The motion detection function may also measure how long a face motion is sustained and determine that the face motion has been performed when the measured time reaches or exceeds a threshold. Users often make face motions unconsciously, but including conditions such as repetition count and duration suppresses erroneous operation.
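The duration condition can be pictured with a minimal sketch that measures time in frames. It assumes the detector outputs one boolean per frame indicating whether the motion is present; the function name `motion_sustained` and the frame-based threshold are assumptions made for illustration.

```python
from typing import Iterable


def motion_sustained(flags: Iterable[bool], min_frames: int) -> bool:
    """Return True once the motion has been seen in min_frames consecutive frames."""
    run = 0
    for present in flags:
        run = run + 1 if present else 0  # any gap resets the run
        if run >= min_frames:
            return True
    return False
```

A brief, unconscious motion that lasts only a frame or two never reaches the threshold, so it does not trigger an event.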

When the event associated with the face motion detected by the motion detection function is the on-screen display of an object, the event execution function may include an object generation function that generates the object. The object generation function may change the display form of the object with the passage of time or as the tilt angle of the face increases or decreases. Changing the display form here includes changing the color or size of the object, changing its speed of movement or rotation, and so on. By changing the object in response to the user's motion, the user can be guided toward making the intended input motion.
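One possible way to tie the display form to the tilt angle is a clamped linear mapping from angle to object size. The sketch below is only an example: the function `object_scale`, the 40-degree full-tilt value, and the scale range are assumed values, not parameters stated for the object generation function.

```python
def object_scale(tilt_deg: float, full_tilt_deg: float = 40.0,
                 min_scale: float = 1.0, max_scale: float = 2.0) -> float:
    """Scale factor that grows linearly with the tilt angle, clamped at both ends."""
    ratio = max(0.0, min(abs(tilt_deg) / full_tilt_deg, 1.0))
    return min_scale + (max_scale - min_scale) * ratio
```

Because the object grows as the face tilts further, the user receives continuous feedback on how far the motion has progressed.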

Note that mutual substitution of the components and expressions of the present invention among methods, systems, computer programs, recording media storing computer programs, and the like is also effective as an aspect of the present invention.

According to the present invention, an interface that is natural and easy for the user to use can be provided in an application that uses a user's moving image as an input interface.

Embodiment 1
FIG. 1 shows the configuration of an image processing system 100 that uses an image processing apparatus 10 according to an embodiment of the present invention. The image processing system 100 comprises a camera 2, which is an imaging device that captures a moving image of the user, the image processing apparatus 10, and a display device 4 for displaying images.

The image processing apparatus 10 continuously captures, in time series, the moving image obtained by photographing with the camera 2 a user 6 facing the display device 4, generates a mirrored moving image, and displays the mirrored moving image 8 on the display device 4 in real time. The image processing apparatus 10 can also composite object images such as menus and cursors with the mirrored moving image 8 and display them on the display device 4. While watching the mirrored moving image and the objects, the user 6 operates an application, such as a game or an album viewer, executed by the image processing apparatus 10. When the user performs a predetermined motion, an image reflecting that motion is displayed on the screen and some kind of instruction is given to the application.

The mirrored moving image is generated by the image processing apparatus 10 applying mirror processing (left-right reversal of the image) to the moving image captured from the camera 2. This mirror processing lets the user operate the application as if looking into a mirror. The mirror processing may instead be performed by a function of the camera 2.
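The left-right reversal itself is simple. The following sketch assumes a frame is stored as a list of pixel rows; the names `mirror_frame` and `mirror_x` are illustrative only.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def mirror_frame(frame: Sequence[Sequence[Pixel]]) -> List[List[Pixel]]:
    """Return a left-right reversed copy of a frame stored as rows of pixels."""
    return [list(reversed(row)) for row in frame]


def mirror_x(x: int, width: int) -> int:
    """Map a column index in the camera image to its column in the mirrored image."""
    return width - 1 - x
```

Applying `mirror_frame` twice returns the original frame, and `mirror_x` gives the corresponding coordinate conversion for positions detected in the unmirrored image.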

The camera 2 is preferably a CCD camera, but may be another analog or digital video camera. As illustrated, the camera 2 is preferably installed on top of the display device 4. This is because photographing the motion of the user facing the display device 4 allows the user to operate the application while watching his or her own image.

FIG. 2 is a hardware configuration diagram of the image processing apparatus 10. The image processing apparatus 10 includes a CPU 12, a main memory 14, a GPU 16, an input/output control device 18, a drive device 20, a display control device 24, and a DMAC 26. These are connected to one another via a bus 28.

The CPU 12 runs an operating system to control the image processing apparatus 10 as a whole, reads programs and data from a recording medium 50 loaded in the drive device 20 into the main memory 14, and executes various processes accordingly. The CPU 12 also performs geometry processing on three-dimensional object data read from the recording medium 50 to express the shape, movement, and so on of objects, and passes the result to the GPU 16.

The GPU 16 holds drawing data, reads out the necessary drawing data in accordance with commands from the CPU 12, performs rendering, and draws into a frame memory (not shown). The DMAC 26 controls DMA transfers for the circuits connected to the bus 28.

The drive device 20 is a drive that uses a removable recording medium 50 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory. The image processing apparatus 10 functions as a user interface device by reading, via the drive device 20, the recording medium 50 storing a user interface program that has the functions described below.

An image signal from the camera 2 is input to the input/output control device 18. An external storage device such as a hard disk drive, a communication device such as a modem or terminal adapter (TA) for communicating with external equipment via a network, a TV tuner, a printer, and the like may also be connected to the input/output control device 18. A controller 52 for inputting data to the image processing apparatus 10 may be connected as well.

The display device 4 is connected to the display control device 24. The display control device 24 generates a video signal so that the data drawn into the frame memory by the GPU 16 can be displayed on the display device 4.
Note that the user interface program may be built into the image processing apparatus 10 in advance instead of being loaded from a recording medium as described above.

FIG. 3 shows the configuration of the user interface device according to the present embodiment. These components are realized by a CPU, memory, a program loaded into the memory, and so on; what is depicted here are functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination of the two.

The image input unit 102 takes in the moving image captured by the camera 2 one frame at a time. If the input image is analog, it is converted to a digital image by A/D conversion before being taken in.

The image reversing unit 104 applies mirror processing to the image taken in by the image input unit 102 to generate the mirrored moving image.

The region specifying unit 106 specifies the region corresponding to the user's face in each frame image making up the mirrored moving image. The region specifying unit 106 includes a face/part detection unit 122, a hand detection unit 124, and a reference image storage unit 126.

The face/part detection unit 122 retrieves the face reference image stored in the reference image storage unit 126 and matches it against the input image to specify the image of the region corresponding to the user's face. Once the face region is specified, the regions within it corresponding to the parts that make up the face, such as the eyes, nose, and mouth, are specified by matching against a reference image for each part.

The reference image storage unit 126 holds reference images for each of the front, side, upward, and downward orientations. The face/part detection unit 122 matches each of these reference images against the input image and detects the face using the best match, so the face region can be specified whichever way the user's face is turned. In addition, for a frontal face, taking the state in which both eyes are level as the neutral position, the tilt angle of the face can also be detected by matching tilted versions of this reference image against the input image.
The hand detection unit 124 performs motion detection in a predetermined part of the screen to detect movement of the user's hand. Any known technique can be used for this detection.

The motion detection unit 108 refers to several frames of the face region images and part region images specified by the region specifying unit 106 to detect motions of the user's face and of its parts. The detailed configuration of the motion detection unit 108 is described later with reference to FIG. 4.

The event execution unit 110 holds a table associating face motions and part motions with events, searches the table for the event corresponding to the face or part motion detected by the motion detection unit 108, and executes that event. An “event” includes an operation instruction for the application and the on-screen display of an object. Operation instructions include displaying help, cancel, confirm, select, displaying a menu, exit, input of a movement direction, and so on. The detailed configuration of the event execution unit 110 is described later with reference to FIG. 4.

The image composition unit 116 draws, into a frame memory (not shown), a composite image combining the mirrored moving image generated by the image reversing unit 104 with the object image generated by the event execution unit 110. The display control unit 118 controls display of the image composited by the image composition unit 116 on the display device 4.

In relation to the hardware configuration shown in FIG. 2, the image input unit 102 is implemented by the input/output control device 18 and the like. The image reversing unit 104, the region specifying unit 106, the motion detection unit 108, and the event execution unit 110 are implemented by the CPU 12, the main memory 14, the GPU 16, and the like. The image composition unit 116 is implemented by the GPU 16, and the display control unit 118 by the GPU 16 and the display control device 24.

FIG. 4 is a functional block diagram showing the detailed configuration of the motion detection unit 108 and the event execution unit 110. These functional blocks can likewise be realized in various forms by hardware only, software only, or a combination of the two.

The motion detection unit 108 includes a face tilt axis determination unit 150, a displacement amount determination unit 152, a part motion determination unit 154, and a screen state acquisition unit 156.
The face tilt axis determination unit 150 determines about which axis of the three-dimensional coordinates defined on the screen the user's face is tilted. As described above, the region specifying unit 106 holds reference images for front, side, upward, and downward faces, and when specifying the face region it can also identify which way the face is oriented. The face tilt axis determination unit 150 therefore obtains the face orientation specified by the region specifying unit 106 over several frames and determines the tilt axis of the face from the change in orientation during that interval. Specific examples of face tilt axes are described later. A change in face orientation can also be determined from a change in the number of visible eyes. For example, if both eyes appear in one frame but only one eye appears in a later frame, it can be determined that the face has been turned sideways.
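The orientation-based determination can be sketched as follows, assuming the region specifying unit reports one of the labels "front", "side", "up", or "down" per frame. The returned names "vertical" and "horizontal" (the axis about which the face turned) are assumptions of this sketch; how they correspond to the x, y, and z axes of the figures is not fixed here.

```python
from typing import Optional, Sequence

# Assumed labels: a left-right turn is rotation about the vertical axis and an
# up-down nod is rotation about the horizontal axis.
_SIDEWAYS = {"side"}
_VERTICAL = {"up", "down"}


def tilt_axis(orientations: Sequence[str]) -> Optional[str]:
    """Infer the rotation axis from per-frame orientation labels.

    The first and last frames are compared; None means there was no change
    that this sketch recognizes.
    """
    if len(orientations) < 2:
        return None
    first, last = orientations[0], orientations[-1]
    if first == last:
        return None
    changed = {first, last}
    if changed & _SIDEWAYS and not changed & _VERTICAL:
        return "vertical"
    if changed & _VERTICAL and not changed & _SIDEWAYS:
        return "horizontal"
    return None
```

A tilt within the image plane (the case detected by matching tilted reference images) is not covered by this sketch.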

After the face tilt axis determination unit 150 has determined the tilt axis of the face, the displacement amount determination unit 152 calculates the angle through which the user's face is tilted in the plane perpendicular to that axis. When the face in the input image is tilted, the region specifying unit 106 detects the face region by matching against a tilted reference image, so the tilt angle of the face can be obtained by calculating the tilt angle of that reference image. The displacement amount determination unit 152 may determine that the face is tilted when the tilt angle of the face reaches a predetermined threshold, for example 40 degrees or more. The displacement amount determination unit 152 may also measure how long the user keeps the face tilted. This can be calculated from the number of frames over which a tilted reference image was used to specify the face region. The displacement amount determination unit 152 may determine that the face is tilted when the measured time reaches or exceeds a predetermined threshold. Furthermore, the displacement amount determination unit 152 may count the number of times the face tilt is repeated.
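The angle threshold and the repetition count can be combined in a short sketch. It assumes one tilt angle per frame, in degrees, and uses the 40-degree example value from the text as the default; the function name `count_tilts` and the rising-edge counting rule are assumptions made for illustration.

```python
from typing import Iterable


def count_tilts(angles_deg: Iterable[float], threshold_deg: float = 40.0) -> int:
    """Count how many times the tilt angle rises to the threshold or beyond.

    A new tilt is counted only after the angle has dropped back below the
    threshold, so one long tilt spanning many frames counts once.
    """
    count = 0
    tilted = False
    for angle in angles_deg:
        now_tilted = abs(angle) >= threshold_deg
        if now_tilted and not tilted:
            count += 1
        tilted = now_tilted
    return count
```

For the sequence 0, 20, 45, 50, 10, 41, 5 the threshold is crossed twice, so the function counts two tilts.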

The part motion determination unit 154 obtains, over several frames, the images of the regions of each face part specified by the region specifying unit 106, and detects the motion of each part based on the differences between frames. Motions to be detected include, for example, opening and closing of the eyes (blinking) and opening and closing of the mouth.
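Frame differencing for a single part can be sketched as below. The sketch assumes each part region is a small grayscale patch of equal size in every frame and uses the mean absolute pixel difference as the measure of change; the function names and the threshold are hypothetical, and other difference measures would serve equally well.

```python
from typing import List, Sequence

Patch = Sequence[Sequence[int]]  # grayscale part region, rows of 0-255 values


def mean_abs_diff(a: Patch, b: Patch) -> float:
    """Mean absolute pixel difference between two equally sized patches."""
    total = 0
    pixels = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            pixels += 1
    return total / pixels if pixels else 0.0


def part_moved(frames: Sequence[Patch], threshold: float) -> List[bool]:
    """For each pair of consecutive frames, report whether the part changed."""
    return [mean_abs_diff(prev, curr) >= threshold
            for prev, curr in zip(frames, frames[1:])]
```

A blink would then appear as two changes in quick succession: one when the eye closes and one when it reopens.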

The screen state acquisition unit 156 acquires the state of the screen displayed on the display device 4 while the user is making a face motion or hand motion. The screen state here means the arrangement and movement of objects on the screen and the position of the user's mirrored moving image within the screen. Acquiring this screen state enables controls such as the contact determination described later, or executing an event in response to a face motion made while a specific object is displayed.

The event execution unit 110 includes an input condition determination unit 158, a table storage unit 160, an operation instruction unit 162, an object generation unit 112, and an object data storage unit 114.
The table storage unit 160 stores a table associating events with the face motions determined or detected by the face tilt axis determination unit 150, the displacement amount determination unit 152, and the part motion determination unit 154. Face motions include the face tilt axis, the face tilt angle, and part motions; events include operation instructions, such as confirm and cancel, for the application executed by the image processing apparatus 10, and the display of objects on the screen. Examples of the combinations of face motions and events defined in the table are described later with reference to FIG. 8.

The input condition determination unit 158 searches the table storage unit 160 for the event corresponding to the face motion determined or detected by the face tilt axis determination unit 150, the displacement amount determination unit 152, and the part motion determination unit 154, and further determines whether the conditions defined in the table are satisfied.

The object data storage unit 114 stores object data to be displayed on the screen of the display device 4. The object data includes menu images, cursor images, images for decorating the user's actions, images making up a game executed on the image processing apparatus, and so on.

The object generation unit 112 reads object data from the object data storage unit 114 and generates object images. When the input condition determination unit 158 determines that the conditions defined in the table are satisfied, the object generation unit 112 generates the object image likewise defined in the table. The object generation unit 112 may also change the display form of an object over time, or as the face tilt angle calculated by the motion detection unit 108 increases or decreases.
When the input condition determination unit 158 determines that the conditions defined in the table are satisfied, the operation instruction unit 162 executes the operation associated with them in the table.

The input condition determination unit 158 also performs contact determination between the hand and objects. When it determines that the hand in the mirrored image has touched an object arranged on the screen, it instructs the operation instruction unit 162 and the object generation unit 112 to execute the event associated with that object.
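One simple way to perform such a contact determination is an overlap test between bounding rectangles. The sketch below assumes both the detected hand region and the object are described by axis-aligned rectangles in screen coordinates; the patent does not specify the test, so `rects_touch` is only an illustrative choice.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in screen pixels


def rects_touch(hand: Rect, obj: Rect) -> bool:
    """Return True when the two axis-aligned rectangles overlap."""
    hx, hy, hw, hh = hand
    ox, oy, ow, oh = obj
    return hx < ox + ow and ox < hx + hw and hy < oy + oh and oy < hy + hh
```

Rectangles that merely share an edge are treated as not touching in this version; using `<=` instead would count edge contact as well.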

FIG. 5 is a flowchart of the process that realizes the input interface based on face motion according to the present embodiment.
First, an image of the user is acquired by the image input unit 102, and the user's mirrored moving image created by the image reversing unit 104 is displayed on the display device 4 (S10). Next, the region specifying unit 106 specifies the user's face region and part regions (S12). The motion detection unit 108 detects face motion and part motion using the face region images and part region images, respectively, over several frames (S14). The motion detection unit 108 further determines the tilt axis and displacement amount of the detected face motion (S16).

The input condition determination unit 158 determines whether an event associated with the face motion or part motion determined in S16 is defined in the table in the table storage unit 160 (S18). If none is defined (N in S18), the flow ends. If one is defined (Y in S18), it determines whether conditions such as the displacement amount, duration, and repetition count detected in S16 satisfy the conditions defined in the table (S20). If the conditions are not satisfied (N in S20), the flow ends. If they are satisfied (Y in S20), the event execution unit 110 executes the event associated in the table (S22).
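Steps S18 to S22 can be summarized in a minimal sketch. The `Motion` and `Entry` records, their field names, and the motion labels are hypothetical stand-ins for whatever the table in the table storage unit 160 actually holds; only the order of the checks follows the flowchart.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Motion:
    name: str            # e.g. "tilt_z" or "mouth_open" (hypothetical labels)
    angle_deg: float = 0.0
    held_frames: int = 0
    repeats: int = 1


@dataclass
class Entry:
    event: str
    min_angle_deg: float = 0.0
    min_held_frames: int = 0
    min_repeats: int = 1


def select_event(table: Dict[str, Entry], motion: Motion) -> Optional[str]:
    """Return the event to execute, or None if the flow ends without one."""
    entry = table.get(motion.name)          # S18: is an event defined?
    if entry is None:
        return None                         # N in S18
    conditions_met = (                      # S20: check the table's conditions
        motion.angle_deg >= entry.min_angle_deg
        and motion.held_frames >= entry.min_held_frames
        and motion.repeats >= entry.min_repeats
    )
    return entry.event if conditions_met else None  # S22 on Y, end on N
```

For example, with an entry requiring at least 40 degrees held for 10 frames, a 45-degree tilt held for 12 frames yields the event, while the same tilt held for 3 frames yields nothing.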

FIG. 6 is a flowchart showing the method of detecting the face or parts in S12 of FIG. 5.
First, the region specifying unit 106 acquires one frame of the input image (S30). Filtering is applied to this image to remove noise so that feature quantities can be extracted effectively (S32). Next, pattern matching is performed to compare the face reference image stored in the reference image storage unit 126 with the input image (S34). The face reference image is created from the face images of tens to hundreds of different people; the higher the similarity to this reference image, the more likely it is that the corresponding part of the image is a face. The region specifying unit 106 therefore calculates the similarity to the reference image for a given region of the input image (S36). In many cases the position and size of the face differ between the reference image and the input image. The input image may therefore be offset, or enlarged or reduced, before the similarity between the input image and the reference image is calculated. Various methods exist for calculating similarity, and any of them can be used. Since these methods are known, a detailed description is omitted.

The area specifying unit 106 compares the similarity with a predetermined threshold and determines whether the similarity is equal to or greater than the threshold (S38). If it is less than the threshold (N in S38), the region is determined not to be a face, so the process returns to S34 and matching against the reference image continues for other parts of the input image. If the similarity is equal to or greater than the threshold (Y in S38), the face area is determined (S40).
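The matching loop of S34 to S40 can be illustrated with a small sliding-window search. The specification leaves the similarity measure open, so the measure used here (one minus the normalized mean absolute difference of 8-bit grayscale values) is an assumption chosen for brevity, as are the function names `similarity` and `find_face`. Offsetting is modeled by the window position; scaling of the input image is not modeled.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, 0 to 255


def similarity(patch: Image, ref: Image) -> float:
    # Assumed measure: 1.0 for identical patches, 0.0 for maximally different.
    total = sum(
        abs(p - r)
        for prow, rrow in zip(patch, ref)
        for p, r in zip(prow, rrow)
    )
    n = len(ref) * len(ref[0])
    return 1.0 - total / (255.0 * n)


def find_face(image: Image, ref: Image, threshold: float) -> Optional[Tuple[int, int, int, int]]:
    rh, rw = len(ref), len(ref[0])
    for y in range(len(image) - rh + 1):
        for x in range(len(image[0]) - rw + 1):
            patch = [row[x:x + rw] for row in image[y:y + rh]]  # S34
            if similarity(patch, ref) >= threshold:             # S36, S38
                return (x, y, rw, rh)                           # S40: area determined
    return None  # no region reached the threshold


ref = [[255, 0], [0, 255]]
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 255, 0],
]
```

The same loop, run inside the determined face area with a part reference image, would correspond to the part search of S42 to S48.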

Once the face area has been determined, the areas of the face parts, for example the eyes, nose, and mouth, are searched for within the image of the face area. A portion of the image within the face area is compared with the reference image of a part (S42), and the similarity is calculated in the same manner as above (S44). For each part, the area specifying unit 106 compares the similarity with a predetermined threshold and determines whether the similarity is equal to or greater than the threshold (S46). If it is less than the threshold (N in S46), the process returns to S42 and pattern matching against the reference image continues for other portions of the face area image. If the similarity is equal to or greater than the threshold (Y in S46), the part area is determined (S48). Other methods may be used to detect the part areas. For example, after the eye and mouth areas have been determined, the position of the nose may be estimated from the relative positions of the eyes and mouth, and the nose area determined from that estimate.
In this way, the position and size of the user's face and the positions of parts such as the eyes, nose, and mouth can be detected from the input image.

Because detection of a face area by similarity calculation is well known, further explanation is omitted. The face area may also be specified by a method other than the one described above.

FIGS. 7(a) to 7(c) are schematic diagrams showing examples of face tilt axes. Each of (a) to (c) shows the mirror image of the user displayed on the display device. Three-dimensional coordinates are assumed in the space within the screen. As shown in (a), when the user tilts the head to one side, the motion is determined to be a tilt about the x axis. As shown in (b), when the user looks up or down, the motion is determined to be a tilt about the y axis. As shown in (c), when the user turns the face to the left or right, the motion is determined to be a tilt about the z axis. As described above, the face tilt axis is determined from changes in the face orientation. For example, if the face orientation is determined to be frontal in one frame and left-facing several frames later, the face tilt axis can be determined to be the z axis. The same applies to the x and y axes.
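The tilt-axis determination from orientation changes can be sketched as below. The orientation labels and the function name `tilt_axis` are hypothetical; in particular, the labels `tilted_left` and `tilted_right` for the x-axis case are an assumption, since the text only says that the x and y axes are handled in the same way as the z axis.

```python
from typing import List, Optional

# Axis convention taken from the description of FIG. 7:
# x = tilting the head sideways, y = looking up or down, z = turning left or right.
AXIS_BY_ORIENTATION = {
    "tilted_left": "x",
    "tilted_right": "x",
    "up": "y",
    "down": "y",
    "left": "z",
    "right": "z",
}


def tilt_axis(orientations: List[str]) -> Optional[str]:
    """Return the tilt axis given per-frame orientation labels, oldest first.

    The sequence must start from a frontal face; the first non-frontal
    orientation observed afterwards decides the axis.
    """
    if not orientations or orientations[0] != "front":
        return None
    for label in orientations[1:]:
        axis = AXIS_BY_ORIENTATION.get(label)
        if axis is not None:
            return axis
    return None  # the face stayed frontal
```

For instance, a frontal frame followed a few frames later by a left-facing frame yields the z axis, matching the example in the text.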

FIG. 8 shows an example of the table 40, stored in the table storage unit, that associates face motions with events. The "motion part" column 30 indicates the part, such as the face, eyes, or mouth, whose motion was detected by the motion detection unit 108. The "motion content" column 32 indicates the motion detected by the motion detection unit 108: the direction of rotation if the motion part is the face, or opening and closing if it is the eyes or mouth. "Hiding" the eyes refers to the case where no eye image is detected where it should be within the face area because the eyes are covered, for example by a hand. The "condition" column 34 gives the condition for determining that the motion in the motion content column 32 has been performed; it is evaluated by the input condition determination unit 158. The condition includes the face tilt angle detected by the displacement amount determination unit 152, the duration of the motion, the number of repetitions of the motion, and so on.

The "object" column 36 indicates the object to be displayed on the screen when the motion defined in columns 30 to 34 of a row is satisfied. As illustrated, objects include a "?" mark, character strings such as "OK" and "Usage Guide", and symbols such as a circle and a cross; erasing an object that is currently displayed is also included. The "operation instruction" column 38 indicates the operation instruction to be issued to the application running on the image processing apparatus when the motion defined in columns 30 to 34 of a row is satisfied. Operation instructions include displaying help, cancel, confirm, select, displaying a menu, exit, and input of a movement direction.
As an example, referring to the second row from the top of the table 40, when the detected user's face rotates 40 degrees about the x axis, the object generation unit 112 displays the object "?" on the screen, and the operation instruction unit 162 instructs the application to display a help menu.
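One way to represent a row of the table 40 in code is shown below. This is only a sketch: the class `TableRow`, the function `lookup`, and the string identifiers are hypothetical, only an angle condition is modeled, and the single row shown is the one spelled out in the text (a 40-degree rotation about the x axis). Treating the condition as "at least 40 degrees" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TableRow:
    part: str                   # "motion part" column 30
    content: str                # "motion content" column 32
    min_angle_deg: float        # "condition" column 34 (angle only, for brevity)
    obj: Optional[str]          # "object" column 36
    instruction: Optional[str]  # "operation instruction" column 38


TABLE_40: List[TableRow] = [
    TableRow("face", "rotate_x", 40.0, "?", "show_help"),
]


def lookup(part: str, content: str, angle_deg: float) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (object, operation instruction) for the first matching row."""
    for row in TABLE_40:
        if (row.part, row.content) == (part, content) and angle_deg >= row.min_angle_deg:
            return row.obj, row.instruction
    return None
```

Keeping the object and the operation instruction as separate fields mirrors the separation between the object generation unit 112 and the operation instruction unit 162.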

The face motion associated with an operation instruction in the table 40 is preferably a motion close to the one the user would make to express that instruction in the real world. For example, when people do not understand something, they tilt their head to one side as shown in FIG. 7(a). This movement is therefore determined to be a tilt about the x axis and is associated with an event that displays a help menu for resolving what is not understood. When people agree, they nod as shown in FIG. 7(b); this movement is determined to be a tilt about the y axis and is associated with an event that issues a confirm instruction. When people disagree, they shake their head from side to side as shown in FIG. 7(c); this movement is determined to be a rotation about the z axis and is associated with an event that issues a cancel instruction. Naturally, the combinations of motions and events are not limited to these, and a motion may be associated with a completely unrelated event or with an event whose meaning is the opposite of the motion. In the latter case, requiring the user to make the opposite motion can make a game more entertaining.

Because the meaning of a human gesture can differ depending on the user's ethnicity or country, a plurality of tables may be prepared in advance so that, when the user selects a country, the table suited to it is used.

Several examples of screen operation using the face interface according to the present embodiment, based on the image processing system configuration described above, are given below.

Example 1.
FIGS. 9(a) to 9(c) show an example of screen changes in an example in which help appears when the user tilts the head. When something unclear comes up while the application is running and the user tilts the face about the x axis (FIG. 9(a)), the motion detection unit 108 determines the tilt axis and tilt angle of the face through detection of the face area. When the tilt angle of the face satisfies the condition defined in the table, the object generation unit 112 displays the object "?" near the head of the user's mirror image on the screen (FIG. 9(b)). When the user touches the "?" object with a hand, the input condition determination unit 158 determines that contact has occurred, and the object generation unit displays the characters "Usage Guide" on the screen (FIG. 9(c)). A help screen is then displayed. The help screen may also be operable by motions of the face parts or hands, or by a controller or the like.

FIGS. 10(a) to 10(c) show an example of screen changes in an example in which help disappears when the user shakes the head. While the characters "Usage Guide" are displayed on the screen (FIG. 10(a)), if the user no longer needs help and shakes the head (FIG. 10(b)), the motion detection unit 108 determines the tilt axis and tilt angle of the face through detection of the face area. When the input condition determination unit 158 determines that the number of head shakes has reached the condition defined in the table, the object generation unit 112 produces an effect in which the "Usage Guide" character string object, which was displayed near the head of the user's mirror image, flies toward the left edge of the screen and disappears (FIG. 10(c)). The object may also be made to disappear from the screen when the user brushes the "Usage Guide" character string object away with a hand.

Because the user may make the motions shown in FIGS. 9 and 10 unconsciously, it is preferable, in order to prevent erroneous operation, that a motion be recognized only after it has been repeated several times. Different events may also be associated with a motion depending on the number of repetitions.
In addition to associating events with rotation about the x, y, and z axes, separate events may be assigned to movements of the face away from the front in particular directions, for example toward the upper right, lower right, upper left, and lower left.
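Counting repetitions and selecting an event by repetition count can be sketched as follows. The per-frame boolean input, the function names `count_repetitions` and `pick_event`, and the example thresholds and event names are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def count_repetitions(active: List[bool]) -> int:
    """Count how many times a motion started in a sequence of per-frame flags.

    A repetition is counted on each transition from inactive to active,
    so one long, continuous motion counts once.
    """
    count = 0
    previous = False
    for flag in active:
        if flag and not previous:
            count += 1
        previous = flag
    return count


def pick_event(count: int, events_by_min_count: Dict[int, str]) -> Optional[str]:
    """Return the event whose minimum repetition count is the largest one reached."""
    eligible = [n for n in events_by_min_count if n <= count]
    return events_by_min_count[max(eligible)] if eligible else None


# Hypothetical mapping: two shakes dismiss help, four shakes cancel.
EVENTS = {2: "dismiss_help", 4: "cancel"}
```

With this mapping, a single unconscious head movement triggers nothing, which is the purpose of requiring repetition.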

Example 2.
FIGS. 11(a) to 11(d) show an example of screen changes in an example in which the duration of a motion is evaluated.
When something unclear comes up while the application is running and the user tilts the face about the x axis (FIG. 11(a)), the motion detection unit 108 determines the tilt axis and tilt angle of the face through detection of the face area. When the input condition determination unit 158 determines that the tilt angle of the face has become equal to or greater than the angle specified by the condition defined in the table, the object generation unit 112 displays a small object "?" near the head of the user's mirror image on the screen (FIG. 11(b)). If the user keeps the face tilted, the object gradually grows over time (FIG. 11(c)). When it is determined that this state has continued for a predetermined time (for example, 3 seconds) or longer, the object disappears and the object generation unit 112 displays the "Usage Guide" character string object near the head of the mirror image (FIG. 11(d)). A help screen in which items can be searched by operating a controller or the like is then displayed. If the user returns the face to its original position after the object "?" has appeared, the object gradually shrinks and disappears from the screen.
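The duration-based behavior of FIG. 11 can be sketched as a function from the current tilt angle and hold time to a display state. The function name `help_object_state`, the state labels, the linear growth, and the default values other than the 3-second hold time (given as an example in the text) are assumptions; the 40-degree default is borrowed from the table example. The gradual shrinking when the face returns is simplified to immediate removal.

```python
from typing import Tuple


def help_object_state(
    angle_deg: float,
    held_s: float,
    min_angle_deg: float = 40.0,
    hold_s: float = 3.0,
    min_scale: float = 0.3,
) -> Tuple[str, float]:
    """Return (object kind, display scale) for the current tilt and hold time."""
    if angle_deg < min_angle_deg:
        return ("none", 0.0)  # condition not met: nothing is shown
    if held_s >= hold_s:
        return ("usage_guide", 1.0)  # held long enough: second object replaces "?"
    # "?" grows linearly from min_scale toward full size while the tilt is held.
    scale = min_scale + (1.0 - min_scale) * held_s / hold_s
    return ("question_mark", scale)
```

Replacing `held_s / hold_s` with a ratio of tilt angles would give the angle-driven variant, in which the object grows with the tilt angle instead of with time.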

To prevent erroneous operation, an element based on the angle may also be incorporated. For example, when the user begins to tilt the face about the x axis, a small object "?" is displayed near the user's mirror image on the screen. As the tilt angle increases, the object grows, and when the tilt angle reaches or exceeds a predetermined magnitude, the object disappears and the "Usage Guide" character string object is displayed near the head of the mirror image. If the user reduces the tilt of the face after the object "?" has appeared, the object shrinks accordingly and disappears from the screen. The display position of the object may be shifted according to the tilt angle of the face. If another menu is already displayed on the screen, the object generation unit 112 may change the position at which the object appears so that the object does not overlap that menu. The system may also be configured so that the user can specify the position at which the object appears via a controller (not shown).

The position at which the object "?" is displayed may be changed according to the direction in which the user tilts the face. For example, if the object is displayed on an extension of the central axis of the user's face, the object appears near the head whether the user tilts the face to the right or to the left. The display position of the object "?" may also be changed according to the speed at which the user tilts the face.

By adding the duration and magnitude of a motion to the determination in this way, operation instructions such as select and confirm can be prevented from being triggered by an accidental motion of the user, which makes operation more reliable. In addition, by displaying an object when a predetermined motion is detected and then changing the display mode of the object according to the subsequent motion, the user can visually recognize whether the motion being made is correct. In other words, the user is naturally prompted either to continue or to stop the current motion.
In combination with this, a sound control unit (not shown) may output a predetermined sound effect in step with changes in the display mode of the object. For example, in the example of FIG. 11, if the volume of the sound effect is raised as the object gradually grows, the user can be informed audibly that the motion is being detected by the apparatus.

FIG. 11 was described using the help motion as an example, but the same approach can be applied to confirm and cancel.

Example 3.
All or part of the functions of a typical game controller may be replaced by face motions. For example, the direction in which the face turns, that is, up, down, left, or right, is treated as equivalent to an up, down, left, or right input on the controller's directional pad. In addition, among the face parts, opening and closing the right eye is made to correspond to pressing the circle button (whose typical function is "confirm"), and opening and closing the left eye to pressing the cross button (whose typical function is "cancel"). This allows more complex operation instructions to be expressed by face motions.
A menu screen corresponding to the face motion may also be displayed on the display device. For example, the menu screen may be displayed in the direction the face is turned. Objects representing select, execute, cancel, and so on may be displayed in the menu screen so that they can be selected by face motions.

Embodiment 2.
Examples 1 to 3 described cases in which the user's mirror image is displayed on the display device. However, as long as the user's motion can be recognized, the mirror image does not need to be displayed.

Examples in which the user's mirror image is not displayed are described below.

Example 4.
Consider an application that runs a slideshow of digital photos stored on a recording medium. Suppose the user rotates the face about the z axis while a photo is displayed on the display device. When the motion detection unit 108 detects this motion, the event execution unit 110 moves the currently displayed photo in the direction the face was turned and displays the next photo on the display device. Suppose also that, while a plurality of photos are displayed on the display device, the user rotates the face about the y axis, that is, nods. When the motion detection unit 108 detects this motion, the event execution unit 110 selects the photo that has the focus at that moment and enlarges it to fill the entire screen of the display device. Furthermore, when the motion detection unit 108 detects opening and closing of the eyes while a photo is displayed on the display device, the event execution unit 110 ends the slideshow.
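The slideshow behavior can be sketched as a small state holder that reacts to detected motions. The class `Slideshow`, the motion labels, and the wrap-around at the last photo are assumptions; the sliding animation in the direction of the face is not modeled.

```python
class Slideshow:
    """Minimal state for the slideshow example (hypothetical structure)."""

    def __init__(self, photo_count: int) -> None:
        self.photo_count = photo_count
        self.index = 0          # photo currently shown or focused
        self.enlarged = False   # True when the focused photo fills the screen
        self.running = True

    def handle(self, part: str, motion: str) -> None:
        if not self.running:
            return  # motions are ignored after the slideshow has ended
        if part == "face" and motion == "rotate_z":
            # Turning the face left or right advances to the next photo.
            self.index = (self.index + 1) % self.photo_count
            self.enlarged = False
        elif part == "face" and motion == "rotate_y":
            # Nodding selects the focused photo and enlarges it.
            self.enlarged = True
        elif part == "eyes" and motion == "open_close":
            # Opening and closing the eyes ends the slideshow.
            self.running = False
```

A caller would feed each motion reported by the detection stage into `handle` and redraw the screen from `index` and `enlarged`.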

Example 5.
Consider a first-person game, that is, a game in which the viewpoint of the user facing the display device is shown on the screen. In such a game, when a fork in the road is displayed on the screen, the motion detection unit 108 detects the direction in which the user turns the face, and the event execution unit 110 creates and displays the image that results when the road in that direction is chosen. If, in addition, a downward view is displayed when the user looks down and the sky becomes visible when the user looks up, the user's sense of being inside the virtual space of the game can be strengthened.
The user's mirror image may be displayed in a corner of the screen. This lets the user know that the image processing apparatus is recognizing the user's motion.

The present invention has been described above with reference to several embodiments. According to the present invention, input can be given by face movement, which is a very simple motion that places little burden on the user, so a natural and simple input interface can be provided.
In an interactive user interface, it is also important to respond immediately to an action and to show in real time what the current state is. In the present invention, an object is displayed on the screen in response to the user's motion, so the user can be informed that the motion is being recognized by the image processing apparatus. In addition, by changing the display form of the object shown on the screen according to the user's motion, the user can visually recognize whether the current motion leads to the desired function and is naturally guided to continue or stop the motion. Furthermore, while interacting with the image, the user can learn how to operate and play without being aware of it.

The present invention has been described above based on the embodiments. These embodiments are illustrative; those skilled in the art will understand that various modifications are possible in the combination of the components and processes, and that such modifications also fall within the scope of the present invention.

Any combination of the components described in the embodiments, and any conversion of the expression of the present invention between a method, an apparatus, a system, a computer program, a recording medium, and the like, is also valid as an aspect of the present invention. In addition, the methods described as flowcharts in this specification include not only processes executed in time series in the stated order but also processes executed in parallel or individually.

An application program such as a game and the user interface program of the present invention may be provided on the same recording medium or on separate recording media. The user interface program of the present invention may also be built into a game device or the like in advance.

In the embodiments, the mirror image of the user captured by the camera is displayed on the display device as is, but a character may be displayed instead of the user and moved in accordance with the user's motion. The character may be humanoid, or, as long as the motions of the face and hands can be recognized, it may be an animal or an inanimate object. Objects corresponding simply to the eyes, nose, and mouth may be displayed.
Even in this case, specification of the face and part areas and detection of motion based on the user's mirror image are performed continuously, and events associated with the detected motions occur in the same way as in the embodiments described above.

The motion detection according to the present invention may be used as an aid to speech recognition. When the image processing apparatus has a voice input function that recognizes speech uttered by the user and inputs the corresponding characters, the mouth motion made when the user utters "yes" or "no" can also be taken into account in issuing a confirm or cancel operation instruction.

FIG. 1 is a diagram showing the configuration of an image processing system using an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a hardware configuration diagram of the image processing apparatus.
FIG. 3 is a diagram showing the configuration of the user interface apparatus according to the present embodiment.
FIG. 4 is a functional block diagram showing the detailed configuration of the motion detection unit and the event execution unit.
FIG. 5 is a flowchart for realizing the input interface based on face motion according to the present embodiment.
FIG. 6 is a flowchart showing the face or part detection method in S12 of FIG. 5.
FIGS. 7(a) to 7(c) are schematic diagrams showing examples of face tilt axes.
FIG. 8 is a diagram showing an example of the table, stored in the table storage unit, that associates face motions with events.
FIGS. 9(a) to 9(c) are diagrams showing an example of screen changes in an example in which help appears when the user tilts the head.
FIGS. 10(a) to 10(c) are diagrams showing an example of screen changes in an example in which help disappears when the user shakes the head.
FIGS. 11(a) to 11(d) are diagrams showing an example of screen changes in an example in which the duration of a motion is evaluated.

Explanation of symbols

2 camera, 4 display device, 10 image processing apparatus, 40 table, 102 image input unit, 104 image inversion unit, 106 area specifying unit, 108 motion detection unit, 110 event execution unit, 112 object generation unit, 114 object data storage unit, 116 image composition unit, 118 display control unit, 120 image comparison unit, 122 face/part detection unit, 124 hand detection unit, 126 reference image storage unit, 150 face tilt axis determination unit, 152 displacement amount determination unit, 154 part motion determination unit, 156 screen state acquisition unit, 158 input condition determination unit, 160 table storage unit, 162 operation instruction unit.

Claims

1. A user interface program causing a computer to exhibit:
an area specifying function for specifying an area of a user's face from an image of the user captured by an imaging device;
a motion detection function for detecting, using the image of the specified face area, a tilt of the user's face as one of the face motions; and
an event execution function for holding a table that associates face motions with events, searching the table for the event corresponding to the face motion detected by the motion detection function, and executing that event,
wherein the event execution function has an object generation function for displaying a first object on the screen as the event corresponding to the face tilt detected by the motion detection function, and
the object generation function changes the display size of the first object as the face tilt angle increases or decreases, and displays on the screen a second object different from the first object when the face tilt angle exceeds a predetermined magnitude.

2. The user interface program according to claim 1, wherein the motion detection function determines about which axis of three-dimensional coordinates defined on the screen the detected tilt of the user's face occurs, and
the event execution function executes different events according to the face tilt axis determined by the motion detection function.

3. The user interface program according to claim 1, wherein the area specifying function further specifies, within the specified face area, the areas of the parts constituting the face, and
the face motion detected by the motion detection function includes motion of the parts.

4. The user interface program according to claim 1, further comprising a display function for displaying a menu screen on the screen according to the face motion.

5. The user interface program according to claim 4, wherein the event execution function allows any one of selection, execution, and cancellation on the menu screen to be specified by a face motion.

6. The user interface program according to any one of claims 1 to 3, wherein the event executed by the event execution function includes an operation instruction for an application executed on the computer, and at least some of the face motions associated with operation instructions in the table are motions that approximate the motion the user makes when expressing that operation instruction in the real world.

The motion detection function counts the number of repetitions of the user's face motion,
The user interface program according to any one of claims 1 to 3, wherein the event execution function executes different events according to the number of repetitions.

The motion detection function counts the number of repetitions of the user's face motion,
The user interface program according to any one of claims 1 to 3, wherein the event execution function executes the event associated with the face motion in the table when the number of repetitions reaches a threshold.
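
The two repetition-based variants above can be sketched as table lookups. The motion names, event names, and threshold value below are hypothetical placeholders, since the claims do not enumerate them. The first function selects a different event per repetition count. The second fires the event tied to a motion only once the count reaches a threshold.

```python
from typing import Optional

# Hypothetical table keyed by (face motion, repetition count).
EVENTS_BY_COUNT = {
    ("nod", 1): "select",
    ("nod", 2): "execute",
}

# Hypothetical table keyed by face motion alone, plus an assumed threshold.
EVENT_BY_MOTION = {"shake": "cancel"}
REPEAT_THRESHOLD = 3


def event_for_count(motion: str, count: int) -> Optional[str]:
    """Different events depending on how many times the motion was repeated."""
    return EVENTS_BY_COUNT.get((motion, count))


def event_at_threshold(motion: str, count: int) -> Optional[str]:
    """The event for a motion, but only once the repetition count reaches the threshold."""
    if count >= REPEAT_THRESHOLD:
        return EVENT_BY_MOTION.get(motion)
    return None
```

Requiring several repetitions before acting is one plausible way to keep an incidental head movement from triggering an event.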

The area specifying function holds reference images for frontal, sideways, upward, and downward faces and detects the face orientation by matching the reference images against the image of the specified face area,
The user interface program according to claim 2, wherein the motion detection function determines the tilt axis of the face according to the change in face orientation across a plurality of time points.

The motion detection function detects the motion of the user's mouth,
The user interface program according to claim 3, further comprising a voice input function that recognizes speech uttered by the user and inputs the corresponding characters, wherein the voice input function refers to the mouth motion detected by the motion detection function when recognizing the speech.

The user interface program according to any one of claims 1 to 10, wherein the motion detection function measures the time during which a face motion continues and determines that the face motion has been performed when the measured time exceeds a threshold value.
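
A minimal sketch of this duration check follows. The class name `HoldDetector`, the per-frame `update` interface, and the default threshold of one second are assumptions made for illustration. The claim states only that the motion counts as performed once its measured duration exceeds a threshold.

```python
from typing import Optional


class HoldDetector:
    """Reports a face motion as performed only after it has continued long enough."""

    def __init__(self, threshold_s: float = 1.0):
        self.threshold_s = threshold_s              # assumed threshold, in seconds
        self.current: Optional[str] = None          # motion seen in the previous frame
        self.started_at: Optional[float] = None     # time the current motion began

    def update(self, motion: Optional[str], now_s: float) -> bool:
        """Feed the motion observed in the current frame (None if no motion)."""
        if motion != self.current:
            # The motion changed, so restart the timer.
            self.current = motion
            self.started_at = now_s
            return False
        # Same motion as before: it counts once its duration exceeds the threshold.
        return motion is not None and (now_s - self.started_at) > self.threshold_s
```

For example, feeding `"tilt_left"` at 0.0 s, 0.5 s, and 1.2 s returns `False`, `False`, and then `True`.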

The user interface program according to claim 11, wherein the object generation function changes the display form of the object as that time elapses.

13. The user interface program according to claim 12, further causing the computer to exhibit an image composition function that composites a mirror image of the user captured by the imaging device with the object to generate a single image.

A recording medium storing the program according to any one of claims 1 to 13.

A user interface device comprising:
an area specifying unit that specifies an area of the user's face from the image of the user captured by the imaging device;
a motion detection unit that detects the tilt of the user's face as one of the face motions using an image of the specified face area; and
an event execution unit that holds a table associating face motions with events, searches the table for the event corresponding to the face motion detected by the motion detection unit, and executes the event,
wherein the event execution unit includes an object generation unit that displays a first object on the screen as the event corresponding to the face inclination detected by the motion detection unit, and
the object generation unit changes the display size of the first object to be larger or smaller as the face inclination angle increases or decreases, and, when the face inclination angle exceeds a predetermined value, displays on the screen a second object different from the first object.

A user interface method in which:
an area of the user's face is specified from the image of the user captured by the imaging device;
the tilt of the user's face is detected as one of the face motions using an image of the specified face area;
a first object is displayed on the screen as the event corresponding to the detected face inclination; and
the display size of the first object is changed to be larger or smaller as the face inclination angle increases or decreases, and, when the face inclination angle exceeds a predetermined magnitude, a second object different from the first object is displayed on the screen.

An information processing system comprising:
imaging means for imaging a user;
display means for displaying the captured image of the user;
area specifying means for specifying an area of the user's face from the image of the user;
motion detection means for detecting the tilt of the user's face as one of the face motions using an image of the specified face area; and
event execution means for holding a table associating face motions with events, searching the table for the event corresponding to the face motion detected by the motion detection means, and executing the event,
wherein the event execution means includes object generation means for displaying a first object on the screen as the event corresponding to the face inclination detected by the motion detection means, and
the object generation means changes the display size of the first object to be larger or smaller as the face inclination angle increases or decreases, and, when the face inclination angle exceeds a predetermined magnitude, displays on the screen a second object different from the first object.