JP2013161406A

JP2013161406A - Data input device, display device, data input method, and data input program

Info

Publication number: JP2013161406A
Application number: JP2012024951A
Authority: JP
Inventors: Tomoya Shimura; 智哉紫村; Yasutaka Wakabayashi; 保孝若林; Ko Imai; 巧今井; Daisuke Murayama; 大輔村山; Kenichi Iwauchi; 謙一岩内
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2012-02-08
Filing date: 2012-02-08
Publication date: 2013-08-19
Anticipated expiration: 2032-02-08
Also published as: JP5964603B2

Abstract

PROBLEM TO BE SOLVED: To enable comfortable operation regardless of the level of mastery of an operation method, and to prevent malfunction. SOLUTION: A first position detection unit acquires first positional information representing the position of a part of the body of each user included in video captured by an imaging apparatus. A user information analysis unit determines the user on the basis of the first positional information and detects user information including information representing the shape of a part of the body of the user included in the video captured by the imaging apparatus. A control unit executes processing corresponding to the user information detected by the user information analysis unit.

Description

The present invention relates to a data input device, a display device, a data input method, and a data input program.

Various devices and methods have been proposed as user interfaces for operating equipment such as computers and game machines. For game machines in particular, systems have been proposed that detect the user's movement (motion capture) and operate the machine according to the posture of the user's whole body.
For example, the interface device described in Patent Document 1 recognizes the shape and movement of an object in an image captured by a CCD (Charge Coupled Device) camera and shows the recognized shape and movement on a display. When the user makes a motion such as a hand gesture toward the CCD camera, the motion is shown on the display screen of the device. The user can then select a virtual switch or the like shown on the display screen with an arrow cursor icon that is displayed in response to the motion.
The video display device described in Patent Document 2 identifies users, detects each user's position and motion, determines from the detected position and motion which user is the viewing target of the displayed video, and controls an application that displays video in accordance with that user.

Patent Document 1: JP 2004-78977 A
Patent Document 2: JP 2009-87026 A

However, with the interface device described in Patent Document 1, the user may be unable to tell intuitively which motions to make and which operations they enable. For example, the input methods for operations such as confirming an input (a motion that substitutes for a mouse click) or moving displayed video content (a motion that substitutes for a mouse drag) may not be easy for the user. One way to confirm an input corresponding to a click is for the user to hold the hand still for longer than a predetermined time. In this case, the user needs to know in advance that the hand must be held in place to confirm the input. In addition, if the predetermined time is too long, the user may not be able to operate comfortably.

Further, with the video display device described in Patent Document 2, when a user moves, it may not be possible to determine whether the motion is intended as an operation input or serves some other purpose. When several users watch the video displayed on the device at the same time, a motion by one user may produce an operation input that another user did not intend. Moreover, when several users move at the same time and the device accepts an operation input from each of them, processing for mutually contradictory operation inputs is instructed, which can cause malfunction. This may interfere with viewing of the video.

The present invention has been made in view of these problems. It enables comfortable operation regardless of how well the user has mastered the operation method, and suppresses the occurrence of malfunction.

(1) The present invention has been made to solve the above problems. One aspect of the present invention is a data input device comprising: a first position detection unit that acquires first position information representing the position of a part of the body of each user shown in video captured by an imaging device; a user information analysis unit that determines a user based on the first position information and detects user information including information representing the shape of a part of the body of the user shown in the video captured by the imaging device; and a control unit that executes processing corresponding to the user information detected by the user information analysis unit.

(2) Another aspect of the present invention is the data input device described above, wherein the user information analysis unit includes a second position detection unit that acquires second position information representing where another part of the body, different from the part of the body of each user, is located, and the control unit executes the processing corresponding to the user information for a user whose first position information and second position information are in a predetermined relationship.

(3) Another aspect of the present invention is the data input device described above, wherein the part of the body is a hand.

(4) Another aspect of the present invention is the data input device described above, wherein the other part, different from the part of the body, is at least one of the face and the eyes.

(5) Another aspect of the present invention is the data input device described above, wherein the imaging device is installed at a position higher than the height of the face or the eyes.

(6) Another aspect of the present invention is the data input device described above, further comprising a display unit that displays the video captured by the imaging device, wherein the imaging device is installed at a position higher than the display unit.

(7) Another aspect of the present invention is the data input device described above, wherein the imaging device includes a plurality of imaging units provided at mutually different positions, the first position detection unit acquires first position information including distance information based on the video captured by the plurality of imaging units, and the user information analysis unit determines the user based on the first position information acquired by the first position detection unit.

(8) Another aspect of the present invention is the data input device described above, further comprising a display unit that displays, after a predetermined time has elapsed since the user information analysis unit started detecting the user information, a guide image representing the relationship between the processing to be executed and the shape of a part of the user's body.

(9) Another aspect of the present invention is the data input device described above, wherein the display unit does not display the guide image when the control unit has executed the processing corresponding to the user information detected by the user information analysis unit before the predetermined time has elapsed since the user information analysis unit detected the user information.

(10) Another aspect of the present invention is the data input device described above, wherein the user information analysis unit estimates feature information representing a feature of the user based on the video captured by the imaging device, and the display unit displays a guide image whose display mode differs according to the feature information.

(11) Another aspect of the present invention is a display device comprising the data input device described above.

(12) Another aspect of the present invention is a data input method in a data input device that inputs data based on video captured by an imaging device, the method comprising: a first step in which the data input device acquires first position information representing the position of a part of the body of each user shown in the video captured by the imaging device; a second step in which the data input device determines a user based on the first position information and detects user information including information representing the shape of a part of the body of the user shown in the video captured by the imaging device; and a third step in which the data input device executes processing corresponding to the detected user information.

(13) Another aspect of the present invention is the data input method described above, wherein the second step includes a fourth step of acquiring second position information representing where another part of the body, different from the part of the body of each user, is located, and the third step executes the processing corresponding to the user information for a user whose first position information and second position information are in a predetermined relationship.

(14) Another aspect of the present invention is a data input program that causes a computer of a data input device, which inputs data based on video captured by an imaging device, to execute: a first procedure of acquiring first position information representing the position of a part of the body of each user shown in the video captured by the imaging device; a second procedure of determining a user based on the first position information and detecting user information including information representing the shape of a part of the body of the user shown in the video captured by the imaging device; and a third procedure of executing processing corresponding to the detected user information.

(15) Another aspect of the present invention is the data input program described above, wherein the second procedure includes a fourth procedure of acquiring second position information representing where another part of the body, different from the part of the body of each user, is located, and the third procedure executes the processing corresponding to the user information for a user whose first position information and second position information are in a predetermined relationship.

According to the present invention, comfortable operation is possible regardless of how well the user has mastered the operation method, and the occurrence of malfunction is suppressed.

FIG. 1 is a conceptual diagram showing how the display device according to an embodiment of the present invention is used.
FIG. 2 is a plan view showing the positional relationship between the users and the display device.
FIG. 3 is a block diagram showing the configuration of the display device according to the embodiment.
FIG. 4 is a conceptual diagram showing an example of a left image and a right image.
FIG. 5 is a conceptual diagram showing an example of image blocks.
FIG. 6 is a conceptual diagram showing the positional relationship of the imaging planes.
FIG. 7 is a schematic diagram showing the configuration of the user information analysis unit according to the embodiment.
FIG. 8 is a conceptual diagram showing an example of the operation start detection range.
FIG. 9 is a conceptual diagram showing another example of the operation start detection range.
FIG. 10 is a conceptual diagram showing an example of a guide image.
FIG. 11 is a schematic diagram showing the timing at which the control unit according to the embodiment performs processing.
FIG. 12 is a flowchart showing the data input process according to the embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a conceptual diagram showing how the display device 10 according to this embodiment is used.
In FIG. 1, the display device 10 is a device that displays video, for example a television receiver, a digital signage (electronic signboard) device, or a video conference device. The display device 10 has an imaging device 11 at the center of the lower edge of its front face, and a display unit 12 that covers most of the front face.
The imaging device 11 is, for example, a stereo camera that captures video of the area in front of it. The imaging device 11 includes, for example, imaging units 110a and 110b, which capture video from positions separated from each other in the left-right direction. The imaging units 110a and 110b are each a camera unit. The display unit 12 is, for example, a display that shows video. The display device 10 also includes a speaker (not shown) that outputs sound.

The operator 13 is a user who operates the display device 10. The operator 13 faces the front of the display device 10 and takes a predetermined posture, for example a hand motion or a body gesture. The user information analysis unit 201 (FIG. 3) built into the display device 10 acquires user information representing the posture taken by a part of the body of the operator 13 as shown in the image captured by the imaging device 11. The user information includes, for example, information representing a hand shape such as a pointing finger or a clenched fist, and how the hand is moved. The display device 10 executes processing (a function or action) corresponding to the user information acquired via the imaging device 11. In this way, the operator 13 can operate the processing of the display device 10 through hand shapes such as a pointing finger or a clenched fist and through the way the hand is moved.

An operable region is set in advance in the display device 10. It is a region defined relative to the device's own position in which operations by the operator 13 are accepted. As the operable region, the display device 10 is set with, for example, an upper limit (for example, 3 m) on the operable distance, which is the distance in the forward direction from the center of the display device 10 to the position of the hand of the operator 13. The left-right extent of the operable region can be set, for example, within the viewing angle of the imaging device 11; in that case, no separate left-right setting is needed. The display device 10 does not accept operations by an inoperable person 14, a user who is farther away than the upper limit of the operable distance. The process for determining which users' operations are accepted is described in detail later.

FIG. 2 is a plan view showing the positional relationship between the users and the display device 10.
In FIG. 2, up and down on the page correspond to the directions toward the back and toward the front of the display device 10, respectively. The positional relationship among the operator 13, the inoperable person 14, and the display device 10 shown in FIG. 2 is the same as in FIG. 1. In FIG. 2, the operator 13 is in front of the display device 10, at a distance from it shorter than the upper limit of the operable distance. The inoperable person 14, by contrast, is in front of the display device 10 at a distance longer than the upper limit of the operable distance.

With the operable distance (its upper limit) set as described above, the control unit 22 (FIG. 3) limits the occasions on which several users operate the device at the same time, and the occasions on which images are input that show motions made for purposes other than operation or motions that could be mistaken for an operation (for example, a hand wave by a passerby, in the case of digital signage placed near a road). As a result, even when the display device 10 is used by several users at once, as with a digital signage device installed in a public place, processing that the users viewing the video did not intend is avoided.

The example in FIG. 1 shows the imaging device 11 installed at the lower edge of the front face of the display device 10, but the arrangement is not limited to this. For example, the imaging device 11 may be installed at the upper edge of the front face of the display device 10, or at a position away from the display device 10.
The imaging device 11 may be installed at a position higher than the face of the operator 13, and in particular higher than the eyes. For this purpose, the height of the imaging device 11 is determined in advance, taking into account the height of the floor on which the operator 13 stands and the average human height. Alternatively, when the display device 10 is installed at a relatively low position, for example on the floor, the imaging device 11 may be installed at a position higher than the display unit 12.
With this arrangement, the imaging device 11 can capture video of the body of the operator 13 from a position higher than the operator's face, which prevents the face from being occluded by a hand shape such as a pointing finger or a clenched fist, or by the movement of the hand. The control unit 22 can therefore stably execute processing that uses the face image of the operator 13, such as identifying the operator, detecting the position of the face, and detecting operations. These processes will be described later.

[Configuration of display device]
Next, the configuration of the display device 10 according to the present embodiment will be described.
FIG. 3 is a block diagram illustrating the configuration of the display device 10 according to the present embodiment.
The display device 10 includes a data input device 2a or a display control device 2b. The data input device 2a includes an imaging device 11, an image processing device 20, an information DB 21 (database), and a control unit 22.
The imaging device 11 generates a video signal representing the captured video and outputs the generated video signal to the image processing device 20. Based on the video signal input from the imaging device 11, the image processing device 20 acquires operator information representing the operator it has identified, first spatial information representing the position of a part of the operator's body, and user information representing the shape taken by a part of the operator's body. The image processing device 20 outputs the acquired operator information, first spatial information, and user information to the display control device 2b as detection information.

The display control device 2b includes an information DB 21, a control unit 22, and a display unit 12.
The information DB 21 stores display information to be displayed in response to operation inputs based on the video signal representing the video of the operator 13. The display information is, for example, a video signal representing video content, text information representing news or the like, content information representing content received from a network, or a guide image signal representing a guide image (operation guide). Details of the guide image will be described later.

The control unit 22 extracts the first spatial information and the user information from the detection information input from the image processing device 20. When the position of the operator 13 represented by the extracted first spatial information is within the predetermined operable region, the control unit 22 performs the processing corresponding to the extracted user information. Here, the control unit 22 determines, for example, whether the distance of the operator 13 represented by the first spatial information is smaller than the preset upper limit of the operable distance. The processing corresponding to the user information is processing related to various kinds of video display, for example displaying a guide image, displaying video content, searching for information on a network, saving video content or news related to the retrieved information, and displaying saved information.
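The operable-distance check performed by the control unit 22 can be sketched as follows. This is only a minimal illustration, not the implementation in the publication: the `Detection` record, the `handlers` table, and the function names are hypothetical, and only the 3 m example limit and the "distance smaller than the upper limit" test are taken from the description.

```python
from dataclasses import dataclass

# Example upper limit of the operable distance given in the description (3 m).
OPERABLE_DISTANCE_LIMIT_M = 3.0


@dataclass
class Detection:
    """Hypothetical stand-in for one user's entry in the detection information."""
    user_id: int             # which user the entry belongs to
    hand_distance_m: float   # first spatial information: forward distance to the hand
    hand_shape: str          # user information: e.g. "point" or "fist"


def accepts_operation(det, limit_m=OPERABLE_DISTANCE_LIMIT_M):
    """True when the hand is closer than the upper limit of the operable distance."""
    return det.hand_distance_m < limit_m


def dispatch(detections, handlers, limit_m=OPERABLE_DISTANCE_LIMIT_M):
    """Run the handler for each user inside the operable region; ignore the rest."""
    results = []
    for det in detections:
        if not accepts_operation(det, limit_m):
            continue  # an inoperable person: too far away, so no processing
        handler = handlers.get(det.hand_shape)
        if handler is not None:
            results.append(handler(det))
    return results
```

A caller would register one handler per recognized hand shape, for example `{"point": select_content}`, where `select_content` is again a hypothetical function. Users beyond the limit are skipped regardless of the gesture they make.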

The control unit 22 stores information that it has been instructed to save in the information DB 21 as display information. The control unit 22 reads display information that it has been instructed to display from the information DB 21 and outputs a video signal representing the read display information to the display unit 12. The control unit 22 stops outputting display information that it has been instructed to stop.
The display unit 12 displays the video signal input from the control unit 22 as video. It thereby shows the video content or news video selected by the operation of the operator 13, or the guide image.
In this way, the display control device 2b executes processing to select the content indicated by the user information included in the detection information input from the image processing device 20, and processing to display the selected content.

Next, a more detailed configuration of the data input device 2a will be described.
The imaging device 11 includes imaging units 110a and 110b. The imaging units 110a and 110b generate video signals representing the captured video and output the generated video signals to the image processing device 20. The imaging unit 110a outputs the generated video signal to the user information analysis unit 201. The imaging units 110a and 110b are, for example, cameras that include an optical system with a lens that collects light incident from a subject, and an imaging element that converts the collected light into an electrical signal. The imaging elements of the imaging units 110a and 110b are, for example, CCD (Charge Coupled Device) elements or CMOS (Complementary Metal Oxide Semiconductor) elements.

The image processing device 20 includes a distance calculation unit 200 and a user information analysis unit 201.
The distance calculation unit 200 receives a video signal from each of the imaging units 110a and 110b. Based on the input video signals, the distance calculation unit 200 calculates distance information representing the distance from the imaging device 11 to a subject (for example, the operator 13), using, for example, a stereo matching method.

[Calculation of distance information]
Here, a method of calculating distance information using block matching, which is one kind of stereo matching, will be described. In stereo matching, the parallax value of the video captured by the imaging units 110a and 110b is calculated as the distance value. In the following description, an image at a certain point in time that is included in the video captured by the imaging unit 110a is called the left image. The image at that same point in time that is included in the video captured by the imaging unit 110b is called the right image.

In stereo matching, the right image block, that is, the region that corresponds to a left image block (a partial region of the left image), is searched for. The description here takes a left image and a right image captured at the same time as an example.
FIG. 4 is a conceptual diagram illustrating an example of the left image and the right image.
FIG. 4 shows a left image 40 on the left side and a right image 41 on the right side.
The distance calculation unit 200 sets, in the left image 40, a left image block (window) 400 centered on the pixel of interest. The nine squares in the left image block 400, three in the horizontal direction by three in the vertical direction, each represent a pixel. In FIG. 4, the horizontal distance from the right edge of the left image 40 to the right edge of the left image block 400 is L pixels (the width of L pixels). L is an integer greater than or equal to 1.

In the right image 41, the distance calculation unit 200 sets, as the initial position, a right image block 410 that has the same vertical coordinate as the left image block 400 and whose right end is L + d2 pixels from the right end of the right image 41. Here, d2 is a preset integer value representing the maximum disparity value. The size and shape of the right image block 410 are the same as those of the left image block 400.
The distance calculation unit 200 calculates an index value between the left image block 400 and the right image block 410. It shifts the right image block 410 until the distance from the right end of the right image 41 to the right end of the block becomes L pixels, and calculates the index value at each shifted position. Based on the calculated index values, the distance calculation unit 200 determines the position of the right image block 410 that corresponds to the left image block 400. For example, when the SAD (Sum of Absolute Differences) value is used as the index value, it selects the right image block 410 at the position where the SAD value is smallest. This position is the point of interest corresponding to the pixel of interest in the left image 40. The absolute value of the difference between the horizontal coordinates of the point of interest and the pixel of interest is the disparity (parallax). The distance calculation unit 200 performs this processing for each pixel included in the left image 40 and generates, as distance information, disparity information (also called a disparity map) representing the disparity value of each pixel in the video captured by the imaging unit 110a. The disparity is larger the shorter the distance from the imaging device 11 to the subject, and smaller the longer that distance. The distance calculation unit 200 outputs the generated distance information to the user information analysis unit 201.
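The search described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation in the specification: the function names `sad` and `disparity_at`, the use of nested lists for images, and the column-based indexing (measured from the left rather than from the right end of the image) are all assumptions made for the example.

```python
def sad(left, right, row, col_l, col_r, half=1):
    """Sum of absolute differences between two (2*half+1) x (2*half+1) windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def disparity_at(left, right, row, col, max_disp, half=1):
    """Return the shift d in [0, max_disp] whose right-image window has the smallest SAD.

    The right window starts at the maximum shift (d2 in the text) and is moved
    step by step until it sits at the same position as the left window.
    """
    best_d, best_cost = 0, None
    for d in range(max_disp, -1, -1):
        col_r = col - d
        if col_r - half < 0:
            continue  # the window would fall outside the right image
        cost = sad(left, right, row, col, col_r, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Calling `disparity_at` for every pixel of the left image would yield the per-pixel disparity map; border handling and tie-breaking are left out here and would need a deliberate choice in practice.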

The disparity map is a grayscale bitmap image in which each pixel holds a disparity value expressed as an integer with a predetermined number of bits (for example, 0 to 255 in the case of 8 bits). The distance calculation unit 200 may also convert the disparity into the distance in the object space from the imaging device 11 to the subject, based on camera parameters such as the baseline length (the spacing between the imaging unit 110a and the imaging unit 110b), and generate distance information representing the converted distance. In that case, instead of distance information representing the disparity value of each pixel, the distance calculation unit 200 may generate as the distance information a grayscale bitmap image of the converted distances (a depth map).
Note that the imaging units 110a and 110b may instead be arranged at different coordinates in the vertical direction, and the disparity may be calculated from the images each of them captures. In that case, the distance calculation unit 200 takes an image block in the image captured by one of the imaging units 110a and 110b as the reference and searches for the corresponding image block by shifting an image block in the image captured by the other imaging unit in the vertical direction.
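The text does not give the conversion from disparity to distance. A common choice for rectified, parallel-axis stereo cameras is the pinhole relation Z = f·B/d, which the following sketch assumes; the function name and the parameters `focal_length_px` and `baseline_m` are hypothetical and not taken from the specification.

```python
def disparity_to_distance(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity in pixels to a distance in meters (assumed pinhole model).

    Assumes rectified cameras with parallel optical axes, a focal length
    expressed in pixels, and a baseline expressed in meters.
    """
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px
```

The relation is consistent with the statement above that disparity grows as the subject gets closer: halving the distance doubles the disparity.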

The distance calculation unit 200 uses, for example, Expression (1) when calculating the SAD value.

In Expression (1), x_i is the pixel value of each pixel, for example each green (G) pixel, included in the left image block 400. The value 8 in Expression (1) is an example tied to the number of pixels included in one image block (here, the nine pixels x_0 to x_8). The pixels corresponding to the pixel values x_0 to x_8 are ordered from the left end to the right end within each row, and from the top row to the bottom row, as shown on the left side of FIG. 5. x_ai is the pixel value of each pixel included in the right image block 410. The pixels corresponding to the pixel values x_a0 to x_a8 are ordered in the same way, from the left end to the right end within each row and from the top row to the bottom row, as shown on the right side of FIG. 5.
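Expression (1) itself is not reproduced in this text. Judging only from the symbol definitions given for it (pixel values x_0 to x_8 in the left image block and x_a0 to x_a8 in the right image block), it presumably takes the standard sum-of-absolute-differences form shown below; this is a reconstruction under that assumption, not a copy of the original expression.

```latex
\mathrm{SAD} = \sum_{i=0}^{8} \left| x_i - x_{ai} \right| \tag{1}
```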

The index value is not limited to the SAD value. Any other index value may be used as long as it represents the correlation between the pixel values included in the left image block 400 and the pixel values included in the right image block 410, for example an SSD (Sum of Squared Differences) value or a DP (Dynamic Programming) value.
The window size, that is, the size of the left image block 400 and the right image block 410, is not limited to 3 horizontal pixels × 3 vertical pixels as described above. It may be larger, for example 5 × 5 pixels or 9 × 9 pixels, or it may be a size such as 4 × 4 pixels for which the center coordinate is offset from the point of interest. The direction in which the right image block 410 is shifted is not limited to left to right as described above and may be right to left. The pixel values used for the left image block 400 and the right image block 410 are not limited to the signal values of green (G) pixels; they may be the signal values of pixels of another color, for example red (R), signal values of pixels based on another color system, or any combination of these.

The block matching method described above assumes that a coordinate in the left image 40 and the corresponding coordinate in the right image 41 are shifted in the horizontal direction but not in the vertical direction, that is, that the epipolar lines of the left image 40 and the right image 41 coincide. The imaging units 110a and 110b are arranged with their optical axes parallel, as described above, so that the epipolar lines (also called auxiliary lines) coincide. To make the epipolar lines coincide, the distance calculation unit 200 may apply a coordinate transformation to the captured image signals, based on the camera parameters of the imaging units 110a and 110b that it has acquired in advance, so that the optical axes of the left image 40 and the right image 41 become parallel. This coordinate transformation is called rectification or shift correction. The distance calculation unit 200 generates the distance information after performing this processing.

As shown in FIG. 6, the epipolar lines are the lines 56 and 57 along which the epipolar plane 53 intersects the imaging surfaces 54 and 55 of the two imaging units 110a and 110b, respectively. The epipolar plane 53 is the plane passing through three points: the focal points 50 and 51 of the lenses of the two imaging units 110a and 110b, and a feature point 52 in the object space.
When the imaging units 110a and 110b are arranged so that their optical axes are parallel, the epipolar lines 56 and 57 are horizontal lines with the same vertical coordinate in the left image 40 and the right image 41.

[Analysis of user information]
Next, the configuration of the user information analysis unit 201 according to the present embodiment will be described.
FIG. 7 is a schematic diagram illustrating the configuration of the user information analysis unit 201 according to the present embodiment.
The user information analysis unit 201 includes a face detection unit 30, an eye position detection unit 31, a hand position detection unit 32, a hand shape / fingertip position detection unit 33, an attention position detection unit 35, a feature information analysis unit 37, an operator determination unit 39, and a detection information output unit 41.

[User face detection]
The face detection unit 30 detects the region representing the image of the operator's face in the video signal input from the imaging unit 110a. It generates two-dimensional face region information representing the two-dimensional coordinates of a representative point (for example, the centroid) of the detected face region and the two-dimensional coordinates of the upper, lower, left, and right ends of that region. From the distance information input from the distance calculation unit 200, the face detection unit 30 extracts the distance values for the pixels at the two-dimensional coordinates represented by the two-dimensional face region information. It then converts these two-dimensional coordinates and the corresponding distance values into three-dimensional coordinates in the object space to generate three-dimensional face position information.
To detect the face region, the face detection unit 30 extracts from the input image signal, for example, the pixels whose values fall within a preset range of color signal values representing the color of a face (for example, skin color).
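The text does not spell out how a two-dimensional image coordinate and its distance value are converted into three-dimensional coordinates in the object space. One plausible way, sketched here under the assumption of a pinhole camera and a distance value measured along the optical axis, is back-projection. The function name and the intrinsic parameters `focal_length_px`, `cx`, and `cy` are hypothetical.

```python
def pixel_to_3d(u, v, depth_m, focal_length_px, cx, cy):
    """Back-project image pixel (u, v) with depth depth_m to camera coordinates.

    (cx, cy) is the assumed principal point in pixels. The result is (X, Y, Z)
    in meters, with Z along the optical axis of the camera.
    """
    x = (u - cx) * depth_m / focal_length_px
    y = (v - cy) * depth_m / focal_length_px
    return (x, y, depth_m)
```

The same conversion would apply to the eye, hand, and fingertip positions, since each is described as a pair of two-dimensional coordinates and a distance value.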

Note that the face detection unit 30 may include a storage unit in which a grayscale (monochrome) image signal representing a human face is stored in advance. In that case, the face detection unit 30 calculates, for each image block containing a plurality of pixels, the correlation value between the grayscale image signal read from the storage unit and the input image signal, and detects as the face region the image blocks whose correlation value is greater than a predetermined threshold.
Alternatively, the face detection unit 30 may calculate a feature amount (for example, Haar-like features) from the input image signal and detect the face region by performing predetermined processing (for example, the AdaBoost algorithm) on the calculated feature amount. The method by which the face detection unit 30 detects the face region is not limited to those described above; any method that detects a face region from the input image signal may be used.
The face detection unit 30 outputs a face image signal representing the image of the detected face to the feature information analysis unit 37 and the eye position detection unit 31. It outputs the generated three-dimensional face position information and two-dimensional face region information to the operator determination unit 39, and outputs the generated three-dimensional face position information to the detection information output unit 41 as part of the detection information.

[Eye position detection]
The eye position detection unit 31 detects the eye region from the face image represented by the face image signal input from the face detection unit 30. It calculates two-dimensional eye position coordinates, which are a representative point (for example, the centroid) of the detected eye region. From the distance information input from the distance calculation unit 200, it extracts the distance value of the pixel located at the detected eye position coordinates. It converts the pair of the calculated two-dimensional eye position coordinates and the extracted distance value into three-dimensional eye position coordinates in the object space to generate three-dimensional eye position information. The eye position detection unit 31 outputs the three-dimensional eye position information representing the calculated three-dimensional eye position coordinates to the attention position detection unit 35 and the operator determination unit 39. It also outputs an eye region signal representing the image of the detected eye region and two-dimensional eye position information representing the calculated two-dimensional eye position coordinates to the operator determination unit 39.

To detect the eye region, the eye position detection unit 31 includes, for example, a storage unit in which a previously captured template image of eyes is stored. The eye position detection unit 31 may use a template matching method in which it reads the eye template image from the storage unit and matches it against the input face image signal. Alternatively, it may detect the eye region within the face region represented by the input face image signal by using eye position information representing a preset positional relationship of the eyes on a face (for example, a previously measured face region and the positions of both eyes). It may also calculate a feature amount (for example, Haar-like features) from the input face image signal and detect the eye region by performing predetermined discrimination processing (for example, the AdaBoost algorithm) on the calculated feature amount.
The method by which the eye position detection unit 31 detects the eye region is not limited to those described above; any method that detects an eye region from the face image signal may be used.
As the eye region to be detected, the eye position detection unit 31 may output an eye region signal representing, rather than only the centroid of both eyes, the position of the left eye, the position of the right eye, or all of these.

[Detection of hand position]
The hand position detection unit 32 detects the region representing the image of the operator's hand in the video signal input from the imaging unit 110a, and calculates the position of the detected hand.
To detect the region representing the hand image, the hand position detection unit 32 extracts from the input video signal, for example, the pixels whose values fall within a preset range of color signal values representing the color of the surface of a hand (for example, skin color). As the hand position, it calculates the two-dimensional coordinate value of a representative point (for example, the centroid) of the detected region. It extracts the distance value corresponding to the calculated coordinate value from the distance information input from the distance calculation unit 200, and converts the pair of the calculated two-dimensional coordinate value and the corresponding distance value into three-dimensional coordinates in the object space to generate three-dimensional hand position information. The hand position detection unit 32 outputs a hand image signal representing the image of the detected hand region, together with hand position information representing the calculated two-dimensional coordinate value of the representative point, to the hand shape / fingertip position detection unit 33. It also outputs the hand position information to the operator determination unit 39.
Alternatively, to detect the region representing the hand image, the hand position detection unit 32 may use the distance information input from the distance calculation unit 200 to extract, from the video signal input from the imaging unit 110a, the image lying within a distance range defined by a predetermined start point and end point in the depth direction relative to the three-dimensional face position represented by the three-dimensional face position information input from the face detection unit 30. The predetermined distance range is, for example, a range in front of the three-dimensional face position (on the display device 10 side). This prevents the hand of another person in front of or behind the operator from being recognized as the operator's hand.
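The depth-range variant of hand detection can be illustrated as a simple mask over a depth map. The sketch below assumes that the distance values are depths in meters measured from a camera on the display side, so that "in front of the face" means a smaller depth; the function name and the offsets `start_m` and `end_m` are hypothetical example values, not figures from the specification.

```python
def hand_candidate_mask(depth_map, face_depth_m, start_m=0.1, end_m=0.8):
    """Mark pixels whose depth lies in a band in front of the face.

    The band runs from start_m to end_m in front of the face, that is, from
    face_depth_m - end_m (nearest to the display) to face_depth_m - start_m.
    """
    near = face_depth_m - end_m
    far = face_depth_m - start_m
    return [[near <= z <= far for z in row] for row in depth_map]
```

Pixels belonging to someone standing well in front of or behind the operator fall outside the band and are excluded, which is the effect the paragraph above describes.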

[Detection of hand shape and fingertip position]
The hand shape / fingertip position detection unit 33 detects the shape of the hand based on the hand image signal and the hand position information input from the hand position detection unit 32.
To detect the shape of the hand, the hand shape / fingertip position detection unit 33 detects the contour of the hand from the hand image signal, for example by edge extraction. Within the detected contour, it searches for protrusions whose radius of curvature falls within a predetermined range (for example, 6 to 12 mm) as images of finger regions. In this search, it checks whether such a protrusion is present in a search region of a given radius around the representative point represented by the hand position information, and updates the search region concentrically by changing the radius step by step. It counts the number of fingers based on the detected finger regions. It detects the apex of each detected protrusion as the two-dimensional coordinates representing the fingertip position of each finger. From the distance information input from the distance calculation unit 200, it extracts the distance value of the pixel located at the two-dimensional coordinates of each fingertip. From the pair of the extracted distance value and the two-dimensional fingertip coordinates, it generates three-dimensional fingertip position information representing three-dimensional coordinates in the object space. The hand shape / fingertip position detection unit 33 outputs the generated three-dimensional fingertip position information to the attention position detection unit 35. It also outputs a finger image signal representing the detected finger regions, count information representing the number of fingers, and two-dimensional fingertip position information representing the two-dimensional fingertip coordinates to the detection information output unit 41 as part of the detection information.

[Detection of attention position]
The attention position detection unit 35 detects the attention position, that is, the position the operator is paying attention to, based on the three-dimensional eye position information input from the eye position detection unit 31 and the three-dimensional fingertip position information input from the hand shape / fingertip position detection unit 33. For example, it calculates as the attention position the point at which the straight line connecting the eye position represented by the three-dimensional eye position information and the fingertip position represented by the three-dimensional fingertip position information intersects the display device 10. It converts the calculated intersection (three-dimensional coordinates in the object space) into the two-dimensional image coordinate system of the image displayed by the display device 10, and generates attention position information representing the converted coordinates. The attention position detection unit 35 outputs the generated attention position information to the detection information output unit 41 as part of the detection information.
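The eye-to-fingertip line and its intersection with the display can be sketched as a line-plane intersection. The coordinate system here is an assumption made for the example: the display surface is the plane z = 0, z is the distance from the display in meters, and the origin is the center of the screen. The function names, the screen dimensions, and the resolution parameters are hypothetical.

```python
def attention_point_on_display(eye, fingertip):
    """Intersect the ray from the eye through the fingertip with the plane z = 0.

    eye and fingertip are (x, y, z) tuples in meters. Returns (x, y) on the
    display plane, or None if the ray does not head toward the display.
    """
    ex, ey, ez = eye
    fx, fy, fz = fingertip
    dz = fz - ez
    if dz >= 0:
        return None  # the fingertip is not closer to the display than the eye
    t = -ez / dz  # parameter at which the ray reaches z = 0
    return (ex + t * (fx - ex), ey + t * (fy - ey))


def to_screen_pixels(point_m, width_m, height_m, res_x, res_y):
    """Map a point on the display plane (meters, origin at center) to pixels."""
    x, y = point_m
    u = (x / width_m + 0.5) * res_x
    v = (0.5 - y / height_m) * res_y  # image rows grow downward
    return (u, v)
```

A real system would additionally need the calibrated pose of the camera relative to the display to bring the eye and fingertip coordinates into this display-centered frame.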

[Analysis of feature information]
The feature information analysis unit 37 generates feature information representing attributes of the user (for example, age, gender, and facial expression) based on the face image signal input from the face detection unit 30. The feature information representing age (age information) is not limited to a specific age and may be information representing a predetermined age group (for example, teens, twenties, childhood, youth, young to middle-aged adulthood, or later years). The feature information representing gender (gender information) is information representing male or female. The feature information representing facial expression (expression information) is, for example, information representing whether or not the user is smiling. The expression information may include a smile level representing the degree to which the face is smiling.

To generate age information or gender information, the feature information analysis unit 37 includes, for example, a storage unit storing face image signals representing images of the faces of people whose age is known and images of the faces of people whose gender is known. It calculates an index value between the image of the detected face region and the face image signals read from the storage unit, and determines the age information or gender information based on the calculated index value. For example, it may calculate a similarity as the index value and select the age information or gender information for which the similarity is highest. Alternatively, it may calculate Gabor features as the index value and discriminate the age information or gender information using an SVM (Support Vector Machine).

To generate expression information, the feature information analysis unit 37 includes, for example, a storage unit in which contour information for each facial part (for example, eyes and mouth) in image information representing an image of a smiling person, and position information representing the position of each part, are stored in advance. It generates contour information and position information for each part from the image of the detected face region, and matches them against the contour information and position information read from the storage unit.

In this way, the feature information analysis unit 37 estimates feature information representing age, gender, or facial expression. The method of estimating the feature information is not limited to those described above; any method that estimates age, gender, or facial expression from image information representing a face may be used.
The feature information analysis unit 37 outputs the generated feature information to the detection information output unit 41 as part of the detection information.

[Determination of operator]
Next, the operator determination process will be described.
Based on the three-dimensional face position information input from the face detection unit 30, the operator determination unit 39 determines as operators those users, among the users whose face regions have been detected, who are located in the region within a predetermined operable distance (see FIGS. 1 and 2). For example, it determines as an operator a user whose face, according to the distance represented by the three-dimensional face position information, is located closer to the imaging device 11 than a predetermined upper limit of the operable distance. In this way, operators are identified from among the users whose faces the face detection unit 30 has detected. Conversely, the operator determination unit 39 determines a user located outside the predetermined operable region to be a person who cannot operate the device.
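As a minimal sketch of the distance-based determination, the function below splits detected users into operators and non-operators by comparing each face distance with an upper limit. The function name, the dictionary layout, and the example limit are assumptions for illustration only.

```python
def classify_users(face_distances_m, max_operable_distance_m):
    """Split users into (operators, non_operators) by face distance from the camera.

    face_distances_m maps a user id to the distance in meters taken from that
    user's three-dimensional face position.
    """
    operators, non_operators = [], []
    for user_id, distance in face_distances_m.items():
        if distance < max_operable_distance_m:
            operators.append(user_id)
        else:
            non_operators.append(user_id)
    return operators, non_operators
```

For instance, with a hypothetical limit of 3.0 m, a user at 1.8 m would be treated as an operator and a user at 4.2 m would not.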

The operator determination unit 39 determines whether a part of the body (for example, a hand) of a user located in the operable region is within the operation start detection range. The operation start detection range is a defined region that is contained in the operable region and is narrower than it. The operator determination unit 39 sets the operation start detection range based on the two-dimensional face position information input from the face detection unit 30 and the two-dimensional eye position information input from the eye position detection unit 31. This makes it possible to detect that an operator has started an operation while also avoiding accepting operations from several people at once, so that only one operator's operations are accepted.

Next, an example of the operation start detection range will be described.
FIG. 8 is a conceptual diagram illustrating an example of the operation start detection range.
FIG. 8 shows the front of the operator 13 on the left side and the left side view of the operator 13 on the right side. The operation start detection region 133 is, for example, a region that contains the line segment 134, which connects the left eye 131-1 and the right eye 131-2 in the image of the operator 13 represented by the video signal input from the imaging device 11 and represents the height of both eyes. Specifically, it is the region between the line segment 135, which is parallel to the line segment 134 and touches the upper end of the face 132, and the line segment 136, which is parallel to the line segment 134 and touches the lower end of the face 132. The operator determination unit 39 calculates the operation start detection region based on the two-dimensional face region information input from the face detection unit 30 and the two-dimensional eye position information input from the eye position detection unit 31.

図７に戻り、操作者判別部３９は、手位置検出部３２から入力された手位置情報が表わす代表点が、操作開始検出領域の範囲内にある場合、その代表点に手を所在させ、その操作開始検出領域に顔面が所在するユーザを操作者１３と判断する。その場合、操作者判別部３９は、その操作者に係る検出情報に基づいて操作が開始されたことを表す操作開始信号を検出情報出力部４１に出力する。即ち、操作者判別部３９は、このように操作開始検出領域を定めることにより顔面と同じ高さに手を移動させたユーザを操作者１３と判断する。
操作者判別部３９は、手位置検出部３２から入力された手位置情報が表わす代表点が、操作開始検出領域の範囲外に離れた場合、その代表点に手を所在させた操作者１３は操作を終了したと判断する。その場合、操作者判別部３９は、その操作者に係る検出情報に基づいて操作が終了されたことを表す操作終了信号を検出情報出力部４１に出力する。即ち、操作者判別部３９が、ある操作者１３について操作開始信号を検出情報出力部４１に出力してから、操作終了信号を出力されるまでの間は、表示装置１０は、操作者１３と判断されたユーザの手の形状に基づく操作入力を受け付ける。他のユーザについて手位置情報が表す代表点が操作開始検出領域の範囲内にあったとしても、その間、表示装置１０は、他のユーザからの操作を受け付けない。 Returning to FIG. 7, when the representative point represented by the hand position information input from the hand position detection unit 32 is within the operation start detection area, the operator determination unit 39 determines that the user who has placed a hand at that representative point, and whose face is located in that operation start detection area, is the operator 13. In that case, the operator determination unit 39 outputs, to the detection information output unit 41, an operation start signal indicating that an operation has started, based on the detection information relating to that operator. That is, by defining the operation start detection area in this way, the operator determination unit 39 determines that a user who has moved a hand to the same height as the face is the operator 13.
When the representative point represented by the hand position information input from the hand position detection unit 32 moves outside the operation start detection area, the operator determination unit 39 determines that the operator 13 who had placed a hand at that representative point has ended the operation. In that case, the operator determination unit 39 outputs, to the detection information output unit 41, an operation end signal indicating that the operation has ended, based on the detection information relating to that operator. In other words, from the time the operator determination unit 39 outputs an operation start signal for a given operator 13 to the detection information output unit 41 until it outputs the operation end signal, the display device 10 accepts operation input based on the hand shape of the user determined to be the operator 13. Even if the representative point represented by the hand position information of another user is within the operation start detection area, the display device 10 does not accept operations from that other user during this period.

操作者判別部３９は、操作可能領域に所在する他のユーザの有無を確認し、他のユーザがいると判断された場合、上述のように、いると判断された他のユーザが操作者１３か否かを判断する。なお、他のユーザが複数名いる場合には、操作者判別部３９は、手位置情報が表わす代表点が操作開始検出領域の中心に最も近接する１名のユーザを操作者１３として定める。これにより、表示装置１０は、新たな１名の操作者１３からのみの操作入力を受け付け、同時に２名のユーザからの操作入力を受け付けない。 The operator determination unit 39 checks whether any other user is present in the operable area, and when it determines that another user is present, it determines, as described above, whether that other user is the operator 13. When there are several other users, the operator determination unit 39 selects as the operator 13 the one user whose representative point, as represented by the hand position information, is closest to the center of the operation start detection area. As a result, the display device 10 accepts operation input only from the one new operator 13 and does not accept operation input from two users at the same time.
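The single-operator rule described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name `OperatorSelector`, the per-frame `candidates` dictionary, and the event tuples are hypothetical and do not appear in the specification, and the 2D Euclidean distance to the center of each user's own detection area is an assumed measure of "closest".

```python
import math

class OperatorSelector:
    """Tracks at most one active operator at a time (illustrative only)."""

    def __init__(self):
        self.active = None  # id of the current operator, or None

    def update(self, candidates):
        """candidates: {user_id: (hand_xy, area_center_xy, hand_in_area)}.

        Returns a list of ("start" | "end", user_id) events for this frame.
        """
        events = []
        if self.active is not None:
            entry = candidates.get(self.active)
            if entry is None or not entry[2]:
                # The operator's hand left the detection area: operation ends.
                events.append(("end", self.active))
                self.active = None
            else:
                # An operator is active, so every other user is ignored.
                return events
        # No active operator: pick the user whose hand is closest to the
        # center of the operation start detection area.
        inside = {u: math.dist(h, c)
                  for u, (h, c, ok) in candidates.items() if ok}
        if inside:
            self.active = min(inside, key=inside.get)
            events.append(("start", self.active))
        return events
```

While one operator is active, `update` returns no events for other users even if their hands are inside their own detection areas, which mirrors the exclusion described in the text.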

図８では、操作者は、底面に対して水平な姿勢（例えば、起立）を取っている。しかし、操作者は、このような姿勢を取るとは限らず、例えば、操作者は底面上に横たわることがある。このような場合にも、操作者判別部３９は、上述のように操作開始検出範囲を定めて操作者１３を判断することによって、操作者の姿勢によらず安定した判断が可能であり、誤検出を回避することができる。このことを次に説明する。 In FIG. 8, the operator takes a posture that is horizontal with respect to the bottom surface (for example, standing). However, the operator does not always take such a posture; for example, the operator may lie on the bottom surface. Even in such a case, by defining the operation start detection range and determining the operator 13 as described above, the operator determination unit 39 can make a stable determination regardless of the operator's posture and can avoid erroneous detection. This is described next.

図９は、操作開始検出範囲のその他の例を表す概念図である。
図９に示す例では、操作者１３は、底面１３７の上を横たわっている。このとき、左眼１３１−１及び右眼１３１−２を結ぶ線分１３４と底面１３７となす角度θは、０°よりも９０°に近い角度（例えば、２０°）である。
この場合、操作者判別部３９は、目位置検出部３１から入力された２次元目位置情報に基づいて左眼１３１−１及び右眼１３１−２を結ぶ線分１３４を定める。操作者判別部３９は、顔検出部３０から入力された２次元顔面領域情報に基づいて顔面の上端に接し線分１３４に平行な線分１３５と顔面の下端に接し線分１３４に平行な線分１３６を定める。操作者判別部３９は、線分１３５と線分１３６に挟まれる領域を操作開始検出範囲１３３と定める。このように操作者判別部３９は、操作者１３の身体の一部である顔面の位置に基づいて操作開始検出範囲１３３を定め、操作入力に係る身体の他部である手との位置関係に基づいて操作を受け付ける操作者を判別する。 FIG. 9 is a conceptual diagram illustrating another example of the operation start detection range.
In the example shown in FIG. 9, the operator 13 is lying on the bottom surface 137. At this time, the angle θ formed by the line segment 134 connecting the left eye 131-1 and the right eye 131-2 and the bottom surface 137 is an angle closer to 90 ° than 0 ° (for example, 20 °).
In this case, the operator determination unit 39 defines the line segment 134 connecting the left eye 131-1 and the right eye 131-2 based on the two-dimensional eye position information input from the eye position detection unit 31. Based on the two-dimensional face area information input from the face detection unit 30, the operator determination unit 39 defines a line segment 135 that touches the upper end of the face and is parallel to the line segment 134, and a line segment 136 that touches the lower end of the face and is parallel to the line segment 134. The operator determination unit 39 defines the region sandwiched between the line segment 135 and the line segment 136 as the operation start detection range 133. In this way, the operator determination unit 39 defines the operation start detection range 133 based on the position of the face, which is one part of the body of the operator 13, and determines the operator whose operation is to be accepted based on its positional relationship with the hand, which is another part of the body used for operation input.
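One way to express the posture-independent range test is to measure every point by its signed offset from the eye line along that line's normal. The following is a minimal sketch, assuming 2D image coordinates and assuming that `face_top` and `face_bottom` are the points of the face outline touched by the line segments 135 and 136; the function name and arguments are hypothetical.

```python
import math

def in_start_band(left_eye, right_eye, face_top, face_bottom, hand):
    """Return True if `hand` lies between the two lines that are parallel to
    the eye line and pass through `face_top` and `face_bottom`."""
    ex, ey = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    norm = math.hypot(ex, ey)
    if norm == 0:
        raise ValueError("eye positions coincide")
    # Unit normal to the eye line (line segment 134).
    nx, ny = -ey / norm, ex / norm

    def offset(p):
        # Signed distance of p from the eye line, measured along the normal.
        return (p[0] - left_eye[0]) * nx + (p[1] - left_eye[1]) * ny

    lo, hi = sorted((offset(face_top), offset(face_bottom)))
    return lo <= offset(hand) <= hi
```

Because the band is defined relative to the eye line rather than to the image axes, the same test applies whether the operator is upright or lying down.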

なお、上述では、操作者判別部３９は、２次元目位置情報と２次元顔面領域情報に基づいて操作開始検出範囲を定める例について説明したが、本実施形態ではこれに限られない。操作者判別部３９は、被写空間上の３次元座標を表す３次元目位置情報と３次元顔領域情報に基づいて操作開始検出範囲を定めてもよい。その場合、操作者判別部３９は、３次元手位置情報が表す被写空間上の手位置が奥行方向の操作開始検出範囲（図８における操作開始検出開始位置（奥行方向）から開始される操作開始検出領域（奥行方向））にも含まれるか否かによって操作を受け付ける操作者か否かを判別するようにしてもよい。これにより、操作開始検出範囲と手位置との関係を、奥行方向の座標も考慮して誤検出を回避することができる。 In the above description, an example was given in which the operator determination unit 39 defines the operation start detection range based on the two-dimensional eye position information and the two-dimensional face area information, but the present embodiment is not limited to this. The operator determination unit 39 may define the operation start detection range based on three-dimensional eye position information and three-dimensional face area information, which represent three-dimensional coordinates in the object space. In that case, the operator determination unit 39 may determine whether a user is an operator whose operation is to be accepted depending on whether the hand position in the object space represented by the three-dimensional hand position information is also included in the operation start detection range in the depth direction (the operation start detection area (depth direction) that begins at the operation start detection start position (depth direction) in FIG. 8). This makes it possible to avoid erroneous detection by also taking the depth coordinate into account in the relationship between the operation start detection range and the hand position.

図７に戻り、検出情報出力部４１には、顔検出部３０、特徴情報解析部３７及び操作者判別部から検出情報が入力される。検出情報出力部４１は、操作者判別部３９から操作開始信号が入力された場合、入力された操作開始信号に係る操作者の検出情報を制御部２２に出力する。検出情報出力部４１は、検出情報出力部４１は、操作者判別部３９から操作終了信号が入力された場合、入力された操作終了信号に係る操作者の検出情報の出力を終了する。
なお、ユーザ情報解析部２０１は、上述した方法や特徴量に限られず、入力された映像信号に基づいてユーザの特徴やユーザが指示する操作に係る情報を検出してもよい。 Returning to FIG. 7, detection information is input to the detection information output unit 41 from the face detection unit 30, the feature information analysis unit 37, and the operator determination unit. When an operation start signal is input from the operator determination unit 39, the detection information output unit 41 outputs detection information of the operator related to the input operation start signal to the control unit 22. When the operation end signal is input from the operator determination unit 39, the detection information output unit 41 ends the output of the operator detection information related to the input operation end signal.
Note that the user information analysis unit 201 is not limited to the above-described method and feature amount, and may detect information related to the user's feature and the operation instructed by the user based on the input video signal.

〔案内画像の表示〕
制御部２２は、情報ＤＢ２１から案内画像信号を読み出し、読み出した案内画像信号を表示部１２に出力する。表示部１２は、制御部２２から入力された案内画像信号に基づく案内画像を表示する。
案内画像とは、表示装置１０が行う処理の種類毎に、手の形状を表す画像を含む画像である。即ち、案内画像は、表示されている手の形状のうちの一つを操作者が実行して、対応する処理を表示装置１０に指示できることを表す。また、手の形状とは、静止した形状だけではなく時間経過に伴って変化する手の動作も含む。
制御部２２が、案内画像信号を表示部１２に出力する時期は、例えば、検出情報出力部４１から検出情報が入力された時点である。この時期は、操作者１３が操作開始検出領域（図８、９参照）内に手を挙げている時期に相当する。 [Guidance image display]
The control unit 22 reads the guide image signal from the information DB 21 and outputs the read guide image signal to the display unit 12. The display unit 12 displays a guide image based on the guide image signal input from the control unit 22.
The guidance image is an image including an image representing the shape of the hand for each type of processing performed by the display device 10. That is, the guidance image indicates that the operator can execute one of the displayed hand shapes and instruct the display device 10 to perform a corresponding process. The hand shape includes not only a stationary shape but also a hand motion that changes with time.
The time when the control unit 22 outputs the guidance image signal to the display unit 12 is, for example, the time when the detection information is input from the detection information output unit 41. This time corresponds to a time when the operator 13 raises his / her hand in the operation start detection area (see FIGS. 8 and 9).

ここで、本実施形態に係る案内画像の一例について説明する。
図１０は、案内画像の一例を表す概念図である。
図１０において、長方形の外枠は、表示部１２の画面の表示領域を表す。表示部１２の下方の破線で示された長方形は、案内画像８０を表す。案内画像８０は、４種類の処理の種類毎に手の形状又は動作を表す画像８１−８４を含む。画像８１−８４の真上には、それぞれ、対応する処理の内容を表す文字列が表示されている。例えば、画像８１は、両手を広げて左右に往復して移動させることを表し、画像８１の真上には「電源を切る」という電源断を表す文字列が表示されている。画像８２は、両手の全ての指を握りながら上下に移動させることを表し、画像８２の真上には「番組表を見る」という電子番組表（ＥＰＧ、ＥｌｅｃｔｒｏｎｉｃＰｒｏｇｒａｍＧｕｉｄｅ）を表示することを表す文字列が表示されている。画像８３は、両手の人差し指と中指を上方に立てて奥行方向に移動させることを表し、画像８３の真上には「音を大きく」という音量を大きくすることを表す文字列が表されている。画像８４は、両手の人差し指と中指を下方に立てて奥行方向に移動させることを表し、画像８４の真上には「音を小さく」という音量を小さくすることを表す文字列が表されている。即ち、操作に習熟していない操作者１３であっても案内画像を視聴することによって、目的とする操作を行なうためにとるべき手の形状および動作を把握することができる。
また、案内画像はこれに限定されたものではなく、手の形状および動作を視覚的に表す動画（アニメーション）で構成された案内画像であってもよい。例えば、画像８１では、両手を広げて左右に往復して移動する動画（アニメーション）で案内画像に表示してもよい。 Here, an example of the guidance image according to the present embodiment will be described.
FIG. 10 is a conceptual diagram illustrating an example of a guidance image.
In FIG. 10, the rectangular outer frame represents the display area of the screen of the display unit 12. The rectangle drawn with a broken line in the lower part of the display unit 12 represents the guidance image 80. The guidance image 80 includes images 81 to 84, each representing a hand shape or motion for one of four types of processing. Directly above each of the images 81 to 84, a character string describing the corresponding process is displayed. For example, the image 81 represents opening both hands and moving them back and forth to the left and right, and the character string “turn off the power”, indicating power-off, is displayed directly above the image 81. The image 82 represents moving both hands up and down with all fingers clenched, and the character string “view the program guide”, indicating display of the electronic program guide (EPG), is displayed directly above the image 82. The image 83 represents pointing the index and middle fingers of both hands upward and moving them in the depth direction, and the character string “turn up the sound”, indicating a volume increase, is displayed directly above the image 83. The image 84 represents pointing the index and middle fingers of both hands downward and moving them in the depth direction, and the character string “turn down the sound”, indicating a volume decrease, is displayed directly above the image 84. Thus, even an operator 13 who is not proficient in the operation can learn, by viewing the guidance image, the hand shape and motion needed to perform the intended operation.
The guidance image is not limited to this form, and may instead consist of a moving image (animation) that visually represents the hand shape and motion. For example, the image 81 may be shown in the guidance image as an animation of both hands opened and moving back and forth to the left and right.

〔手形状認識による操作〕
制御部２２は、検出情報出力部４１から入力された検出情報から、３次元顔面位置情報、指画像信号、本数情報、２次元指先位置情報、注目位置情報、特徴情報を抽出する。手の形状を表す手形状情報は、抽出された情報のうち、指画像信号、本数情報、２次元指先位置を含んで構成される。手形状情報は、一時的又は静止した手の形状、即ち姿勢を表す情報であってもよいし、時間経過によって変化する手の形状、即ち動作を表す情報であってもよい。
制御部２２は、予め操作対象となる処理の種別毎に手形状情報を対応付けられて記憶された記憶部を備える。制御部２２は、記憶部から手形状情報を読み出し、読み出した手形状情報と入力された手形状情報を照合する。制御部２２は、照合に成功した手形状情報に対応した処理を実行する。例えば、入力された手形状情報が、画像８２に示されるように両手の全ての指を握りながら上下に移動させることを表す場合、制御部２２は、番組表情報を情報ＤＢ２１から読み出し、読み出した番組表情報を表示部１２に出力する。
上述の手の形状は、図１０に示したものに限られず、表示装置１０の処理の種類に対する指示として特定できればよい。例えば、伸ばしている指の本数、手の方向、親指の先端と人差し指の先端を接触させた形状であってもよい。また、本実施形態では、操作者の手の形状に限られず、身体の他の部分の形状を用いてもよい。 [Operation by hand shape recognition]
The control unit 22 extracts three-dimensional face position information, finger image signals, number information, two-dimensional fingertip position information, attention position information, and feature information from the detection information input from the detection information output unit 41. Hand shape information representing the shape of the hand is configured to include the finger image signal, the number information, and the two-dimensional fingertip position among the extracted information. The hand shape information may be information representing a temporary or stationary hand shape, that is, a posture, or may be information representing a hand shape that changes over time, that is, an action.
The control unit 22 includes a storage unit that stores in advance hand shape information in association with each type of process to be operated. The control unit 22 reads the hand shape information from the storage unit, and collates the read hand shape information with the input hand shape information. The control unit 22 executes processing corresponding to the hand shape information that has been successfully verified. For example, in the case where the input hand shape information represents moving up and down while grasping all fingers of both hands as shown in the image 82, the control unit 22 reads the program guide information from the information DB 21 and reads it out. The program guide information is output to the display unit 12.
The hand shapes described above are not limited to those illustrated in FIG. 10; they need only be identifiable as an instruction for a type of processing of the display device 10. For example, the instruction may be given by the number of extended fingers, the direction of the hand, or a shape in which the tip of the thumb touches the tip of the index finger. In the present embodiment, the input is also not limited to the shape of the operator's hand, and the shape of another part of the body may be used.
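The collation of input hand shape information against stored entries can be illustrated with a lookup table. This is a simplified sketch: the descriptor `HandShape`, its fields, the process names, and the use of exact matching are all assumptions made for illustration. The specification's hand shape information consists of finger image signals, finger counts, and fingertip positions, and it does not state how the comparison is carried out.

```python
from typing import NamedTuple, Optional

class HandShape(NamedTuple):
    hands: int            # number of hands used
    fingers: int          # extended fingers per hand
    motion: str           # "left-right", "up-down", "depth", or "none"
    pointing: str = "up"  # direction of the extended fingers

# Hypothetical stored table, loosely following images 81 to 84 of FIG. 10.
GESTURES = {
    HandShape(2, 5, "left-right"): "power_off",
    HandShape(2, 0, "up-down"): "show_program_guide",
    HandShape(2, 2, "depth", "up"): "volume_up",
    HandShape(2, 2, "depth", "down"): "volume_down",
}

def match(shape: HandShape) -> Optional[str]:
    """Return the process registered for `shape`, or None if collation fails."""
    return GESTURES.get(shape)
```

A `None` result corresponds to a failed collation, which is the case in which the text later has the control unit display the guidance image.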

〔案内画像の表示タイミングの制御〕
操作に習熟した操作者にとって、案内画像８０が表示されてから操作を行うことが煩わしい場合がある。例えば、制御部２２は、検出情報出力部４１から検出情報が入力された時点で案内画像を表示しなくともよい。但し、制御部２２は、記憶部から読み出した手形状情報と入力された手形状情報との照合に失敗した場合に、案内画像を表示する。これにより、操作者に対して照合に失敗した場合に所望の処理を行わせるための操作内容の確認を促すことができる。
また、制御部２２は、操作者毎に異なったタイミングで案内画像を表示してもよい。例えば、初めて操作を行う操作者や、操作に不慣れな操作者に対して、Ｉ．案内画像を表示する場合（Ｉ．案内画像表示）がある。また、操作に習熟している操作者に対して、案内画像を表示せず当初から検出情報を受け付ける場合（ＩＩ．直接操作）がある。 [Control of guide image display timing]
For an operator who is familiar with the operation, it may be troublesome to perform the operation after the guidance image 80 is displayed. For example, the control unit 22 may not display the guidance image when the detection information is input from the detection information output unit 41. However, the control unit 22 displays a guide image when the verification of the hand shape information read from the storage unit and the input hand shape information fails. Accordingly, it is possible to prompt the operator to confirm the operation content for performing a desired process when the verification fails.
The control unit 22 may also display the guidance image at a different timing for each operator. For example, for an operator performing an operation for the first time or an operator unfamiliar with the operation, the guidance image may be displayed (I. guidance image display). For an operator who is proficient in the operation, detection information may be accepted from the outset without displaying the guidance image (II. direct operation).

次に、本実施形態に係る制御部２２が処理を行うタイミングについて説明する。
図１１は、本実施形態に係る制御部２２が処理を行うタイミングを表す概略図である。
図１１において、横軸は時刻、縦軸は、上から下へ順にＩ．案内画像表示、ＩＩ．直接操作を表す。横軸の左端は、ともに制御部２２が最初に検出情報が入力された時点を表す。
Ｉ．案内画像表示について、制御部２２は、検出情報が最初に入力されてから予め定めた時間Ｔ１だけ経過するまでの間、案内画像信号の出力を待機する。制御部２２は、時間Ｔ１が経過した時点において、案内画像信号を表示部１２に出力する。制御部２２は、検出情報出力部４１から検出情報が入力されると、入力された検出情報に対応する処理を実行し、表示部１２は、表示している案内画像を消去する。
ＩＩ．直接操作について、制御部２２は、検出情報が最初に入力されてから、検出情報出力部４１からの検出情報を待ち受ける。制御部２２は、検出情報出力部４１から検出情報が入力されると、入力された検出情報に対応する処理を実行する。 Next, timing when the control unit 22 according to the present embodiment performs processing will be described.
FIG. 11 is a schematic diagram illustrating the timing at which the control unit 22 according to the present embodiment performs processing.
In FIG. 11, the horizontal axis represents time, and the vertical axis represents, from top to bottom, I. guidance image display and II. direct operation. In both cases, the left end of the horizontal axis represents the time at which detection information is first input to the control unit 22.
I. Regarding the guide image display, the control unit 22 waits for the output of the guide image signal until the predetermined time T1 elapses after the detection information is first input. The control unit 22 outputs a guide image signal to the display unit 12 when the time T1 has elapsed. When the detection information is input from the detection information output unit 41, the control unit 22 executes a process corresponding to the input detection information, and the display unit 12 erases the displayed guide image.
II. For direct operation, the control unit 22 waits for detection information from the detection information output unit 41 after detection information is first input. When the detection information is input from the detection information output unit 41, the control unit 22 executes a process corresponding to the input detection information.

制御部２２が、案内画像を表示するタイミングを区別するために、例えば、ユーザ情報解析部２０１は、ユーザ認識部を備え、操作者を認識し、認識した操作者毎の習熟度を判断してもよい。ここで、ユーザ認識部は、予め操作者毎に顔面画像信号を記憶させておいた記憶部を備え、顔検出部３０から入力された顔面画像信号と照合がとれた操作者を認識する。制御部２２は、認識された操作者による検出情報に対応する処理を実行すると、実行されたことを表す処理実行信号をユーザ認識部に出力する。ユーザ認識部は、制御部２２から処理実行信号が入力された回数を累積し、累積した回数を表す回数情報を、検出情報出力部４１を通じて制御部２２に出力する。制御部２２は、ユーザ認識部から入力された回数情報が表す回数が、予め設定された回数（例えば、３０回）を越えない場合、当該操作者は習熟していないと判断し、案内画像を表示する。即ち、制御部２２は、上述のＩ．案内画像に順じたタイミングで処理を行う。制御部２２は、ユーザ認識部から入力された回数情報が表す回数が、予め設定された回数を越えた場合、当該操作者は習熟したと判断し、案内画像を表示する。即ち、制御部２２は、上述のＩＩ．直接操作に順じたタイミングで処理を行う。なお、本実施形態では、操作者の習熟度が高いほど、案内画像を表示するまでの時間Ｔ１が短くなるようにしてもよい。
これにより、操作に習熟した操作者による操作入力に対して、案内画像の表示が行われず、対応する処理が行われる。そのため、本実施形態に係る表示装置１０では、操作の習熟の有無に関わらず快適に操作入力を行うことができる。 In order for the control unit 22 to distinguish the timing at which the guidance image is displayed, the user information analysis unit 201 may, for example, include a user recognition unit that recognizes the operator and determines the proficiency of each recognized operator. Here, the user recognition unit includes a storage unit in which a facial image signal is stored in advance for each operator, and recognizes the operator whose stored signal matches the facial image signal input from the face detection unit 30. When the control unit 22 executes a process corresponding to detection information from the recognized operator, it outputs a process execution signal indicating the execution to the user recognition unit. The user recognition unit accumulates the number of times the process execution signal has been input from the control unit 22, and outputs count information representing the accumulated number to the control unit 22 through the detection information output unit 41. When the number represented by the count information input from the user recognition unit does not exceed a preset number (for example, 30), the control unit 22 determines that the operator is not proficient and displays the guidance image. That is, the control unit 22 performs processing at the timing according to I. guidance image display described above. When the number represented by the count information input from the user recognition unit exceeds the preset number, the control unit 22 determines that the operator is proficient and displays a guidance image. That is, the control unit 22 performs processing at the timing according to II. direct operation described above.
In the present embodiment, the time T1 until the guidance image is displayed may be made shorter as the operator's proficiency increases.
As a result, for operation input by a proficient operator, the guidance image is not displayed and the corresponding process is performed. The display device 10 according to the present embodiment therefore allows comfortable operation input regardless of the operator's level of proficiency.
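The count-based proficiency decision can be sketched as follows. The class `UsageCounter` and its method names are hypothetical; only the threshold of 30 and the rule of treating a count that does not exceed it as not proficient come from the text. Face recognition itself is outside this sketch, so operators are identified by an arbitrary key.

```python
PROFICIENCY_THRESHOLD = 30  # example value given in the text

class UsageCounter:
    """Accumulates executed operations per recognized operator."""

    def __init__(self):
        self.counts = {}

    def record_execution(self, operator_id):
        # Called once per process execution signal for this operator.
        self.counts[operator_id] = self.counts.get(operator_id, 0) + 1

    def timing_mode(self, operator_id):
        # A count that does not exceed the threshold means "not yet proficient".
        if self.counts.get(operator_id, 0) > PROFICIENCY_THRESHOLD:
            return "II. direct operation"
        return "I. guidance image display"
```

An operator who has never been seen has a count of zero and therefore falls under timing I.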

情報ＤＢ２１には、複数の種類の案内画像信号を、予め定めた距離区分と対応付けて記憶させておき、制御部２２は抽出した３次元顔面距離情報が表す距離を含む距離区分に対応する案内画像信号を読み出すようにしてもよい。そして、制御部２２は、制御読み出した案内画像信号を表示部１２に出力し、表示部１２は、制御部２２から入力された案内画像信号が表す案内画像を表示する。例えば、表示装置１０からの操作者１３の距離が長い区分ほど、１つの案内画像に含まれる処理の種類の数を減少させ、各処理に対する操作を表す画面の面積（画素数）を増加した画像を表す案内画像信号を情報ＤＢ２１に記憶させておく。また、処理内容を表示する文字を大きくした画像を表す案内画像信号を記憶させておいてもよい。このようにして、操作者１３は、表示部１２からの距離が長くなっても大きく表示された操作内容を明確に把握できるようになる。 The information DB 21 may store a plurality of types of guidance image signals in association with predetermined distance sections, and the control unit 22 may read the guidance image signal corresponding to the distance section that contains the distance represented by the extracted three-dimensional face distance information. The control unit 22 then outputs the read guidance image signal to the display unit 12, and the display unit 12 displays the guidance image represented by the guidance image signal input from the control unit 22. For example, for sections in which the operator 13 is farther from the display device 10, the information DB 21 stores guidance image signals representing images that contain fewer types of processing per guidance image and devote a larger screen area (number of pixels) to the operation for each process. Guidance image signals representing images in which the characters describing the processing content are enlarged may also be stored. In this way, the operator 13 can clearly understand the enlarged operation content even at a greater distance from the display unit 12.
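Selecting a guidance image by distance section amounts to a range lookup. The sketch below assumes illustrative section boundaries of 1.5 m and 3.0 m and invented layout parameters (`items`, `scale`); the specification gives no concrete values, so these are placeholders only.

```python
from bisect import bisect_right

# Hypothetical section boundaries in metres and one layout per section.
BOUNDARIES = [1.5, 3.0]
LAYOUTS = [
    {"items": 4, "scale": 1.0},  # distance < 1.5 m
    {"items": 3, "scale": 1.5},  # 1.5 m <= distance < 3.0 m
    {"items": 2, "scale": 2.0},  # distance >= 3.0 m
]

def guidance_layout(distance_m):
    """Return the layout for the distance section containing `distance_m`."""
    return LAYOUTS[bisect_right(BOUNDARIES, distance_m)]
```

Farther sections list fewer processes and enlarge each one, following the tendency described in the text.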

情報ＤＢ２１には、複数の種類の案内画像信号を特徴情報と対応付けて記憶させておき、制御部２２は抽出した特徴情報に対応する案内画像信号を読み出し、読み出した案内画像信号を表示部１２に出力するようにしてもよい。例えば、操作者１３の年齢層が高いほど、１つの案内画像に含まれる処理の種類の数を減少させ、各処理に対する操作を表す画面の面積（画素数）を増加した画像を表す案内画像信号を情報ＤＢ２１に記憶させておく。また、処理内容を表示する文字を他の年齢層よりも大きくした画像を表す案内画像信号を記憶させておいてもよい。このようにして、年齢の高い操作者１３でも、操作内容を明確に把握できるようになる。また、音量調整の刻み幅を他の年齢層よりも大きくした画像を表す画像案内信号を情報ＤＢ２１記憶させておき、制御部２２は、その刻み幅に基づき音量調整に係る処理を行ってもよい。これにより、年齢の高い操作者１３に対しては、他の年齢層よりも音量の調整量を大きくして便宜を図ることができる。 The information DB 21 may store a plurality of types of guidance image signals in association with feature information, and the control unit 22 may read the guidance image signal corresponding to the extracted feature information and output the read guidance image signal to the display unit 12. For example, for higher age groups of the operator 13, the information DB 21 stores guidance image signals representing images that contain fewer types of processing per guidance image and devote a larger screen area (number of pixels) to the operation for each process. Guidance image signals representing images in which the characters describing the processing content are larger than for other age groups may also be stored. In this way, even an older operator 13 can clearly understand the operation content. In addition, a guidance image signal representing an image in which the volume adjustment step size is larger than for other age groups may be stored in the information DB 21, and the control unit 22 may perform the volume adjustment processing based on that step size. This allows the volume to be adjusted in larger increments for an older operator 13 than for other age groups, for the operator's convenience.

例えば、操作者１３の性別が女性の場合には、背景を赤色や桃色等の暖色系で表した案内画像信号や、アニメーション映画の登場人物の表した案内画像信号を記憶させておいてもよい。このようにして、操作者１３が女性であっても、親近感をもって表示装置１０を操作できるようになる。
また、小児（例えば、１０歳以下）である操作者１３の年齢層に対して、処理内容を表示する文字をひらがなで表した案内画像信号や、アニメーション映画の登場人物の表した案内画像信号を記憶させておいてもよい。このようにして、操作者１３が小児であっても、表示された操作内容を把握でき、親近感をもって表示装置１０を操作できるようになる。
また、小児や高齢者（例えば、６０歳以上）である操作者１３の年齢層に対しては、処理内容を動画像で表す案内画像信号を記憶させておいてもよい。これにより、小児や高齢者でも操作方法を表示された動画を視聴して直感的に把握することができる。 For example, when the gender of the operator 13 is female, a guide image signal whose background is expressed in a warm color system such as red or pink, or a guide image signal expressed by a character in an animated movie may be stored. . In this way, even if the operator 13 is a woman, the display device 10 can be operated with a sense of familiarity.
In addition, for the age group of an operator 13 who is a child (for example, 10 years old or younger), a guidance image signal in which the characters describing the processing content are written in hiragana, or a guidance image signal depicting a character from an animated film, may be stored. In this way, even if the operator 13 is a child, the displayed operation content can be understood and the display device 10 can be operated with a sense of familiarity.
Further, for the age group of the operator 13 who is a child or an elderly person (for example, 60 years old or older), a guide image signal representing the processing content as a moving image may be stored. Thereby, even a child or an elderly person can view and intuitively grasp the moving image on which the operation method is displayed.

制御部２２は、操作者１３が意図しない微細な手の形状の変化を検出しないことで誤操作を回避している。そのために、制御部２２は、一定時間間隔毎に予め定めた距離や位置の閾値よりも手の形状が変化した場合、その変化を検出したと判定し、その後、上述の手形状情報の照合を行う。ここで、制御部２２は、抽出した３次元顔面距離情報が表す距離が長いほど、距離の閾値を大きくする。これにより、表示装置１０からの距離が長いほど、操作者１３による手形状の変化を大きくしなければ、制御部２２は、その変化を受け付けなくなる。ひいては、操作者１３は、表示装置１０からの距離が長いほど、手形状の変化をより大きくして操作入力することが促され、撮像装置１１の解像度の影響を低減することができる。 The control unit 22 avoids an erroneous operation by not detecting a minute hand shape change that is not intended by the operator 13. For this purpose, the control unit 22 determines that the change has been detected when the shape of the hand changes from a predetermined distance or position threshold at predetermined time intervals, and then checks the above-described hand shape information. Do. Here, the control unit 22 increases the distance threshold as the distance represented by the extracted three-dimensional face distance information is longer. Accordingly, as the distance from the display device 10 is longer, the control unit 22 cannot accept the change unless the change in the hand shape by the operator 13 is increased. As a result, as the distance from the display device 10 is longer, the operator 13 is prompted to perform an operation input with a larger change in hand shape, and the influence of the resolution of the imaging device 11 can be reduced.
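The distance-dependent change threshold can be illustrated as below. The text states only that the threshold grows with the operator's distance; the linear scaling, the base value of 0.02 m, the reference distance of 1 m, and the function names are assumptions chosen for this sketch.

```python
import math

def change_threshold(face_distance_m, base_threshold=0.02, reference_m=1.0):
    """Threshold (metres) that grows in proportion to the operator's distance.

    Distances below `reference_m` use the base threshold unchanged.
    """
    return base_threshold * max(face_distance_m, reference_m) / reference_m

def shape_changed(prev_points, curr_points, face_distance_m):
    """True if any tracked point moved farther than the scaled threshold."""
    limit = change_threshold(face_distance_m)
    return any(math.dist(p, c) > limit
               for p, c in zip(prev_points, curr_points))
```

With these placeholder values, a 3 cm fingertip movement counts as a change at 1 m but is ignored at 3 m, so a distant operator must make a larger gesture.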

制御部２２は、小児である操作者１３の年齢層に対して、他の年齢層よりも手形状の変化を検出するための距離の閾値を小さくしてもよい。これにより、手の大きさが十分発達していない小児であっても、表示装置１０を快適に操作することができる。また、制御部２２は、操作者１３の年齢層が高いほど検出するための時間間隔を長くしてもよい。これにより、動作が緩慢な高齢者でも表示装置１０を快適に操作することができる。
これにより、本実施形態では、操作者の特徴を表す特徴情報毎に案内画像の表示、その他の操作に係る処理を変更されるため、操作者の特徴に関わらず快適な操作を実現させることができる。 The control unit 22 may make the distance threshold for detecting a change in hand shape smaller for the age group of an operator 13 who is a child than for other age groups. As a result, even a child whose hands are not yet fully grown can operate the display device 10 comfortably. The control unit 22 may also lengthen the time interval used for detection as the age group of the operator 13 becomes higher. As a result, even an elderly person whose movements are slow can operate the display device 10 comfortably.
Thus, in the present embodiment, the display of the guidance image and other operation-related processing are changed according to the feature information representing the operator's features, so comfortable operation can be achieved regardless of those features.

〔処理フロー〕
次に、本実施形態に係るデータ入力処理について説明する。
図１２は、本実施形態に係るデータ入力処理を表すフローチャートである。
（ステップＳ９００）撮像部１１０ａ、１１０ｂは、それぞれ映像を撮像し、撮像した映像信号を距離算出部２００に出力する。撮像部１１０ａは、映像信号をユーザ情報解析部２０１の顔検出部３０、手位置検出部３２に出力する。その後、ステップＳ９０１に進む。
（ステップＳ９０１）距離算出部２００は、撮像部１１０ａ、１１０ｂから各々入力された映像信号に基づいて撮像装置１１から操作者までの距離を、例えばステレオマッチング方式を用いて算出し、算出した距離情報を生成する。距離算出部２００は、生成した距離情報をユーザ情報解析部２０１の顔検出部３０、目位置検出部３１、手位置検出部３２、手形状・指先位置検出部３３に出力する。その後、ステップＳ９０２に出力する。 [Process flow]
Next, data input processing according to the present embodiment will be described.
FIG. 12 is a flowchart showing data input processing according to the present embodiment.
(Step S900) The imaging units 110a and 110b each capture a video and output the captured video signal to the distance calculation unit 200. The imaging unit 110a outputs the video signal to the face detection unit 30 and the hand position detection unit 32 of the user information analysis unit 201. Thereafter, the process proceeds to step S901.
(Step S901) The distance calculation unit 200 calculates the distance from the imaging device 11 to the operator based on the video signals input from the imaging units 110a and 110b, using, for example, a stereo matching method, and generates distance information representing the calculated distance. The distance calculation unit 200 outputs the generated distance information to the face detection unit 30, the eye position detection unit 31, the hand position detection unit 32, and the hand shape / fingertip position detection unit 33 of the user information analysis unit 201. Thereafter, the process proceeds to step S902.
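For reference, the usual relation between stereo disparity and distance for a rectified camera pair is Z = f * B / d. The sketch below shows only that standard relation; it is not taken from this passage, and the function and parameter names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Distance Z = f * B / d for a rectified stereo pair.

    disparity_px: horizontal shift of a matched point between the two images
    focal_length_px: focal length expressed in pixels
    baseline_m: spacing between the two imaging units in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px
```

A smaller disparity corresponds to a more distant point, which is why depth resolution degrades as the operator moves away from the imaging device.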

（ステップＳ９０２）顔検出部３０は、撮像部１１０ａから入力された映像信号が表す操作者の顔面の画像を表す領域を検出する。顔検出部３０は、検出した顔面の領域に基づいて２次元顔面領域情報を生成する。顔検出部３０は、距離算出部２００から入力された距離情報から、２次元顔領域情報が表す２次元座標の画素に係る距離値を抽出する。顔検出部３０は、前述の２次元座標と対応する距離値を、被写空間における３次元座標に変換して、３次元顔面位置情報を生成する。顔検出部３０は、検出した顔面の画像を表す顔面画像信号を特徴情報解析部３７と目位置検出部３１に出力する。顔検出部３０は、生成した３次元顔面位置情報及び２次元顔面領域情報を操作者判別部３９に出力する。顔検出部３０は、生成した３次元顔面位置情報を検出情報の一部として検出情報出力部４１に出力する。 (Step S902) The face detection unit 30 detects an area representing an image of the face of the operator represented by the video signal input from the imaging unit 110a. The face detection unit 30 generates two-dimensional face area information based on the detected face area. The face detection unit 30 extracts the distance value related to the pixel of the two-dimensional coordinate represented by the two-dimensional face area information from the distance information input from the distance calculation unit 200. The face detection unit 30 converts the distance value corresponding to the above-described two-dimensional coordinates into three-dimensional coordinates in the object space, and generates three-dimensional face position information. The face detection unit 30 outputs a face image signal representing the detected face image to the feature information analysis unit 37 and the eye position detection unit 31. The face detection unit 30 outputs the generated 3D face position information and 2D face area information to the operator determination unit 39. The face detection unit 30 outputs the generated three-dimensional face position information to the detection information output unit 41 as part of the detection information.

The eye position detection unit 31 detects an eye region in the face image represented by the face image signal input from the face detection unit 30. The eye position detection unit 31 calculates eye position coordinates based on the detected eye region. The eye position detection unit 31 extracts the distance value of the pixel located at the detected eye position coordinates from the distance information input from the distance calculation unit 200. The eye position detection unit 31 converts the pair of the calculated two-dimensional eye position coordinates and the extracted distance value into three-dimensional eye position coordinates in the object space, and generates three-dimensional eye position information. The eye position detection unit 31 outputs the three-dimensional eye position information representing the calculated three-dimensional eye position coordinates to the attention position detection unit 35 and the operator determination unit 39. The eye position detection unit 31 outputs an eye region signal representing the image of the detected eye region and two-dimensional eye position information representing the calculated two-dimensional eye position coordinates to the operator determination unit 39. Thereafter, the process proceeds to step S903.
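The conversion of a two-dimensional pixel coordinate and its distance value into three-dimensional coordinates in the object space, used above for the face and the eyes, can be sketched as follows. This is only an illustration under a pinhole camera model in which the distance value is taken as the depth along the optical axis. The intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function name are assumptions that do not appear in this specification.

```python
def pixel_to_object_space(u, v, distance, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth into camera-centred 3D coordinates.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels (all hypothetical).
    The returned tuple is (x, y, z) in the same unit as `distance`.
    """
    x = (u - cx) * distance / fx
    y = (v - cy) * distance / fy
    return (x, y, distance)


if __name__ == "__main__":
    # A pixel 100 px to the right of the image centre, seen at a depth of 2 m.
    print(pixel_to_object_space(420, 240, 2.0, 1000.0, 1000.0, 320.0, 240.0))
```

The same helper could be applied to the hand and fingertip coordinates in step S903, since those steps describe the same kind of conversion.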

(Step S903) The hand position detection unit 32 detects a region representing the image of the operator's hand in the video signal input from the imaging unit 110a, and calculates two-dimensional coordinate values representing the position of the detected hand. The hand position detection unit 32 extracts the distance value corresponding to the calculated coordinate values from the distance information input from the distance calculation unit 200, and converts the pair of the calculated two-dimensional coordinate values and the corresponding distance value into three-dimensional coordinates in the object space to generate three-dimensional hand position information. The hand position detection unit 32 outputs a hand image signal representing the image of the detected hand region and hand position information representing the two-dimensional coordinate values of the calculated representative point to the hand shape / fingertip position detection unit 33. The hand position detection unit 32 also outputs the hand position information to the operator determination unit 39.

The hand shape / fingertip position detection unit 33 detects the shape of the hand based on the hand image signal and the hand position information input from the hand position detection unit 32. The hand shape / fingertip position detection unit 33 searches for images of finger regions based on the detected hand shape and counts the number of fingers. The hand shape / fingertip position detection unit 33 detects the fingertip position of each finger as two-dimensional coordinates, and extracts the distance value of the pixel located at the detected two-dimensional coordinates from the distance information input from the distance calculation unit 200. From the pair of the extracted distance value and the two-dimensional coordinates of the fingertip, the hand shape / fingertip position detection unit 33 generates three-dimensional fingertip position information representing three-dimensional coordinates in the object space. The hand shape / fingertip position detection unit 33 outputs the generated three-dimensional fingertip position information to the attention position detection unit 35. The hand shape / fingertip position detection unit 33 outputs a finger image signal representing the detected finger regions, number information representing the number of fingers, and two-dimensional fingertip position information representing the two-dimensional coordinates of the fingertips to the detection information output unit 41 as part of the detection information. Thereafter, the process proceeds to step S904.

(Step S904) Based on the three-dimensional face position information input from the face detection unit 30, the operator determination unit 39 determines, as the operator, a user who is located in a predetermined operable region among the users whose face regions have been detected. The operator determination unit 39 determines the operation start detection range based on the two-dimensional face area information input from the face detection unit 30 and the two-dimensional eye position information input from the eye position detection unit 31. The operator determination unit 39 determines whether or not the hand of the operator located in the operable region is located in the operation start detection range. In this way, it detects that the operator has started an operation. If it is determined that the operation has started (step S904 Yes), the process proceeds to step S905. If it is determined that the operation has not started (step S904 No), the process proceeds to step S900.
As described above, the operator determination unit 39 may instead determine the operation start detection range based on three-dimensional eye position information and three-dimensional face area information representing three-dimensional coordinates in the object space. In that case, the operator determination unit 39 may determine whether or not a user is an operator whose operation is accepted according to whether the hand position in the object space represented by the three-dimensional hand position information is included in the operation start detection range in the depth direction (the operation start detection region (depth direction) beginning at the operation start detection start position (depth direction) in FIG. 8). Because the relationship between the operation start detection range and the hand position then also takes the depth coordinate into account, erroneous detection can be avoided.
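The two checks in step S904 (face inside the operable region, hand inside the operation start detection range, including the depth direction) can be sketched as follows. This is a minimal illustration that represents both regions as axis-aligned boxes in the object space. The class `Range3D`, the function name, and all numeric values are hypothetical, and the way the operation start detection range is derived from the face and eye positions is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range3D:
    """Axis-aligned region in the object space, one (min, max) pair per axis."""
    x: tuple
    y: tuple
    z: tuple  # depth direction

    def contains(self, point):
        # A point is inside when every coordinate lies within its (min, max) pair.
        return all(lo <= value <= hi
                   for value, (lo, hi) in zip(point, (self.x, self.y, self.z)))


def operation_started(face_position, hand_position, operable_region, start_range):
    """Return True when the user is an operator and the hand is in the start range."""
    is_operator = operable_region.contains(face_position)
    return is_operator and start_range.contains(hand_position)


if __name__ == "__main__":
    operable = Range3D((-1.0, 1.0), (-1.0, 1.0), (0.5, 3.0))
    start = Range3D((-0.3, 0.3), (-0.3, 0.3), (0.5, 1.8))
    print(operation_started((0.0, 0.0, 2.0), (0.1, 0.0, 1.6), operable, start))
```

Including the depth axis in `start_range` mirrors the variation described above: a hand that overlaps the range in the image plane but lies outside it in depth is not treated as the start of an operation.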

(Step S905) From the detection information input from the detection information output unit 41, the control unit 22 extracts the finger image signal, the number information, and the two-dimensional fingertip position as hand shape information representing the shape of the operator's hand. Thereafter, the process proceeds to step S906.
(Step S906) The control unit 22 reads hand shape information from the storage unit and collates the read hand shape information with the input hand shape information. If the collation between the read hand shape information and the input hand shape information succeeds (step S906 Yes), the control unit 22 proceeds to step S910. If the collation fails (step S906 No), the control unit 22 proceeds to step S907.
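The branch in step S906 can be sketched as a lookup of the input hand shape information in a table of stored hand shapes. The sketch below is a deliberate simplification that collates only the number of fingers. The specification states that the hand shape information also contains the finger image signal and the two-dimensional fingertip positions, but it does not state here how they are compared. The table contents and process names are hypothetical.

```python
def collate_hand_shape(finger_count, registered_shapes):
    """Return the process registered for this finger count, or None if collation fails."""
    return registered_shapes.get(finger_count)


def step_after_collation(finger_count, registered_shapes):
    """Return the next step label and the matched process (None on failure)."""
    process = collate_hand_shape(finger_count, registered_shapes)
    if process is None:
        return ("S907", None)   # collation failed: check the guide image wait time
    return ("S910", process)    # collation succeeded: execute the matched process


if __name__ == "__main__":
    # Hypothetical stored hand shape information: finger count -> process name.
    stored = {1: "process_a", 2: "process_b", 5: "process_c"}
    print(step_after_collation(2, stored))
    print(step_after_collation(3, stored))
```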

(Step S907) The control unit 22 determines whether or not the guide image output standby time (T1) has elapsed. If the guide image output standby time (T1) has elapsed (step S907 Yes), the process proceeds to step S908.
If the guide image output standby time (T1) has not elapsed (step S907 No), the control unit 22 proceeds to step S900.
(Step S908) After the predetermined time T1 has elapsed, the control unit 22 reads a guide image signal from the information DB 21 and outputs the read guide image signal to the display unit 12. The display unit 12 displays a guide image based on the guide image signal input from the control unit 22. Thereafter, the process proceeds to step S900.
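The decision in step S907 reduces to comparing an elapsed time with the guide image output standby time T1. The following sketch assumes that the elapsed time is measured in seconds from the start of detection of the user information, and treats "has elapsed" as "elapsed time is at least T1". The function and parameter names are hypothetical.

```python
def step_after_failed_collation(elapsed_s, guide_wait_s):
    """Return the next step label after a failed collation in step S906.

    elapsed_s: seconds since detection of the user information started (assumed reference).
    guide_wait_s: guide image output standby time T1 in seconds.
    """
    if elapsed_s >= guide_wait_s:
        return "S908"  # T1 has elapsed: display the guide image
    return "S900"      # T1 has not elapsed: continue without the guide image


if __name__ == "__main__":
    print(step_after_failed_collation(5.0, 3.0))
    print(step_after_failed_collation(1.0, 3.0))
```

With this structure, an operator who produces a registered hand shape before T1 elapses never reaches step S908, so no guide image is displayed for that operator.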

(Step S910) If a guide image is being displayed, the display unit 12 erases it. The control unit 22 executes the processing corresponding to the hand shape information for which the collation succeeded. This processing is, for example, one of the operations related to the functions of the display device 10 described above. The control unit 22 also reads, from the information DB 21, an image signal representing an image to be displayed when performing that operation (for example, a screen related to the user interface) and outputs the read image signal to the display unit 12. Thereafter, the process proceeds to step S911.

(Step S911) The control unit 22 determines whether or not the operation by the operator has ended. For example, the control unit 22 determines that the operation has ended when an operation input indicating power-off is input. If it is determined that the operation has not ended (step S911 No), the process proceeds to step S900. If it is determined that the operation has ended (step S911 Yes), the data input process ends.

In the above description, an example was given in which distance information representing the distance to the operator's body is acquired based on the image signals input from the two imaging units 110a and 110b, but the present embodiment is not limited to this. For example, one of the imaging units 110a and 110b may be replaced with a distance measuring unit that acquires distance information to the subject by another method, and the distance measuring unit may input the acquired distance information to the user information analysis unit 201.
This distance measuring unit uses, for example, the TOF (Time of Flight) method to acquire distance information. In the TOF method, for example, an LED (Light Emitting Diode) is used as a light source, and the arrival time from when a light beam is emitted until the light reflected from the subject is received is measured. By measuring the arrival time for each of a set of predetermined divided regions, distance information can be acquired for each planar position in the object space. The light source is, for example, an infrared light emitting diode that emits infrared light invisible to the human eye, but is not limited to this. For example, it may be a laser light source capable of pulsed emission. Alternatively, a modulator capable of modulating the amplitude or phase of the light beam may be provided, and the arrival time may be calculated based on the phase difference between the emitted light and the reflected light.
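The two TOF variants described above (direct measurement of the arrival time, and calculation of the arrival time from a phase difference) can be illustrated with the following sketch. It assumes that the measured arrival time covers the round trip from the light source to the subject and back, and that the beam is modulated sinusoidally at a single known frequency. The function names and the modulation frequency are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_arrival_time(arrival_time_s):
    """Distance to the subject for a round-trip arrival time (out and back)."""
    return SPEED_OF_LIGHT_M_PER_S * arrival_time_s / 2.0


def arrival_time_from_phase(phase_difference_rad, modulation_frequency_hz):
    """Arrival time implied by the phase difference between emitted and reflected light."""
    return phase_difference_rad / (2.0 * math.pi * modulation_frequency_hz)


if __name__ == "__main__":
    # Direct measurement: a 20 ns round trip.
    print(distance_from_arrival_time(20e-9))
    # Phase measurement: a phase difference of pi at a 10 MHz modulation frequency.
    print(distance_from_arrival_time(arrival_time_from_phase(math.pi, 10e6)))
```

Because a phase difference is only known modulo 2π, the phase-based variant in this sketch is unambiguous only for arrival times shorter than one modulation period.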

In the above description, an example was given in which the control unit 22 performs the type of processing corresponding to hand shape information as the detection information input from the user information analysis unit, but the present embodiment is not limited to this. In the present embodiment, the part of the body is not limited to the hand, and user information including information representing the shape of another part, such as an arm, the neck, the head, or the torso, may be used. In addition, the shape in the user information is not limited to a static shape and may include a motion or posture, that is, a change in shape over time. Three-dimensional position information may be used in the user information instead of two-dimensional position information.

As described above, according to the present embodiment, first position information representing the position of a part of the body of each user shown in the video captured by the imaging device 11 is acquired. In addition, the user is determined based on the first position information, user information including information representing the shape of a part of the user's body shown in the video captured by the imaging device 11 is detected, and processing corresponding to the detected user information is executed. Therefore, even if a plurality of operators attempt operations at the same time by changing the shape of their bodies, the processes for those operations do not conflict, and comfortable operation is possible.
In addition, according to the present embodiment, whether the guide image is displayed depends on the timing at which the processing corresponding to the user information is executed, and feature information representing the user's features is estimated based on the video captured by the imaging device so that a guide image whose display mode differs according to the feature information is displayed. Therefore, comfortable operation is possible for users with various features.

Note that a part of the display device 10 in the above-described embodiment, for example, the distance calculation unit 200, the face detection unit 30, the eye position detection unit 31, the hand position detection unit 32, the hand shape / fingertip position detection unit 33, the attention position detection unit 35, the feature information analysis unit 37, the operator determination unit 39, the detection information output unit 41, and the control unit 22, may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the function may be realized by causing a computer system to read and execute the program recorded on the recording medium. Here, the "computer system" is a computer system built into the display device 10 and includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. The program may realize only a part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.
In addition, a part or all of the display device 10 in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the display device 10 may be made into an individual processor, or some or all of the blocks may be integrated into a single processor. The method of circuit integration is not limited to LSI, and a dedicated circuit or a general-purpose processor may be used. Further, if an integrated circuit technology that replaces LSI emerges as a result of progress in semiconductor technology, an integrated circuit based on that technology may be used.

An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to the one described above, and various design changes and the like can be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
10 ... display device, 2a ... data input device, 11 ... imaging device, 20 ... image processing device,
200 ... distance calculation unit, 201 ... user information analysis unit, 30 ... face detection unit, 31 ... eye position detection unit,
32 ... hand position detection unit, 33 ... hand shape / fingertip position detection unit, 35 ... attention position detection unit,
37 ... feature information analysis unit, 39 ... operator determination unit, 41 ... detection information output unit,
2b ... display control device, 12 ... display unit, 21 ... information DB, 22 ... control unit

Claims

A data input device comprising:
a first position detection unit that acquires first position information representing a position of a part of the body of each user shown in video captured by an imaging device;
a user information analysis unit that determines a user based on the first position information and detects user information including information representing a shape of a part of the body of the user shown in the video captured by the imaging device; and
a control unit that executes processing corresponding to the user information detected by the user information analysis unit.

The data input device according to claim 1, wherein the user information analysis unit includes a second position detection unit that acquires second position information representing a position of another part of the body of each user that differs from the part of the body, and
the control unit executes processing corresponding to the user information for a user for whom the first position information and the second position information have a predetermined relationship.

The data input device according to claim 1, wherein the part of the body is a hand.

The data input device according to claim 2, wherein the other part different from the part of the body is at least one of a face and an eye.

The data input device according to claim 4, wherein the imaging device is installed at a position higher than a height of the face or eyes.

The data input device according to claim 5, further comprising a display unit that displays video captured by the imaging device,
wherein the imaging device is installed at a position higher than the display unit.

The data input device according to claim 6, wherein the imaging device includes a plurality of imaging units provided at different positions,
the first position detection unit acquires first position information including distance information based on video captured by the plurality of imaging units, and
the user information analysis unit determines a user based on the first position information acquired by the first position detection unit.

The data input device according to claim 1, further comprising a display unit that displays a guide image representing a relationship between a process to be executed and a shape of a part of the user's body after a predetermined time has elapsed since the user information analysis unit started detecting the user information.

The data input device according to claim 8, wherein the display unit does not display the guide image when the control unit executes processing corresponding to the user information detected by the user information analysis unit before the predetermined time elapses after the user information analysis unit detects the user information.

The data input device according to claim 8, wherein the user information analysis unit estimates feature information representing features of the user based on the video captured by the imaging device, and
the display unit displays a guide image whose display mode differs according to the feature information.

A display device comprising the data input device according to claim 1.

A data input method in a data input device that inputs data based on video captured by an imaging device, the method comprising:
a first step in which the data input device acquires first position information representing a position of a part of the body of each user shown in the video captured by the imaging device;
a second step in which the data input device determines a user based on the first position information and detects user information including information representing a shape of a part of the body of the user shown in the video captured by the imaging device; and
a third step of executing processing corresponding to the detected user information.

The data input method according to claim 12, wherein the second step includes a fourth step of acquiring second position information representing a position of another part of the body of each user that differs from the part of the body, and
in the third step, processing corresponding to the user information is executed for a user for whom the first position information and the second position information have a predetermined relationship.

A data input program for causing a computer of a data input device that inputs data based on video captured by an imaging device to execute:
a first procedure of acquiring first position information representing a position of a part of the body of each user shown in the video captured by the imaging device;
a second procedure of determining a user based on the first position information and detecting user information including information representing a shape of a part of the body of the user shown in the video captured by the imaging device; and
a third procedure of executing processing corresponding to the detected user information.

The data input program according to claim 14, wherein the second procedure includes a fourth procedure of acquiring second position information representing a position of another part of the body of each user that differs from the part of the body, and
in the third procedure, processing corresponding to the user information is executed for a user for whom the first position information and the second position information have a predetermined relationship.