JP4575829B2 - Display screen position analysis device and display screen position analysis program

Info

Publication number: JP4575829B2
Application number: JP2005099897A
Authority: JP
Inventors: 保明金次; 則好浦谷; 一晃小峯
Original assignee: NHK Engineering Services Inc; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2005-03-30
Filing date: 2005-03-30
Publication date: 2010-11-04
Anticipated expiration: 2025-03-30
Also published as: JP2006277666A

Abstract

PROBLEM TO BE SOLVED: To provide a display screen position analysis device capable of stably analyzing the position on a display screen pointed to by an operator, without depending on the position of the operator and without requiring calibration.

SOLUTION: This display screen position analysis device 1 is provided with an image input means 10 for inputting frame images of an operator captured by two or more cameras, a distance image generation means 11 for generating, based on parallax, a distance image showing the distance from the display screen, a fingertip detection means 14 for detecting the fingertip position giving the minimum distance within a prediction area of the distance image, a viewpoint detection means (glabella detection means) 16 for detecting the viewpoint position of the operator from the frame image, and a screen position analysis means (cursor position analysis means) 18 for analyzing the position on the display screen based on the fingertip position, the viewpoint position, and the distance image. The fingertip detection means 14 has a prediction area setting means (prediction area setting part) 14c for setting the prediction area.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a technique for analyzing a position on a display screen selected by an operator, and in particular to a display screen position analysis device and a display screen position analysis program that analyze the position on a display screen at which an operator points.

Conventionally, an operator selects a position on a display screen with a pointing device such as a mouse, trackball, joystick, or tablet. To make a selection, the operator must hold and operate the device. These devices do not let the operator directly select the intended position on the display screen; instead, the operator must manipulate the device to move an on-screen cursor to the intended position, which requires training and familiarity. In addition, a mouse, for example, needs a flat surface such as a desk to move on, which restricts the environment in which the operator can work.

A touch panel is a pointing device that lets the operator directly select the intended location on the display screen. However, a touch panel requires the operator to touch the display screen. This is convenient when the viewing distance is short, as with a computer, but when the viewing distance is long, as when watching television, the operator cannot reach the display screen from the viewing position and must move to it.

For this reason, techniques that detect the movement or position of the operator's finger or body and set a position on the display screen accordingly have been studied. For example, a technique has been developed that detects the operator's body movement by irradiating the operator with laser light. Techniques have also been disclosed in which the operator is photographed by a camera, the position of the operator's fingertip is detected from the captured video, and the cursor position on the display screen is set from that position (see Non-Patent Document 1 and Non-Patent Document 2). Because the cursor position is determined by the fingertip position, the operator can move the cursor to the vicinity of a desired position, and select it, by pointing at that position.

Furthermore, there is a method that assumes the operator's position and sets the cursor position based on this assumed position and the detected fingertip position. Because this method takes the operator's position into account, the cursor can be moved to a position on the display screen close to the extension of the fingertip as seen by the operator.

Non-Patent Document 1: Lee Y. and two others, "Real-time recognition of hand movement using silhouette images and its application", Proceedings of the Human Interface Symposium, 1997, Vol. 13, pp. 665-672
Non-Patent Document 2: Harakawa Kenichi, "System for operating a computer with the movement of a finger", Image Lab, April 1, 2000, Vol. 11, No. 4, pp. 29-33

However, the technique of irradiating the operator with laser light may harm the operator's body or eyes, and may interfere with viewing of the display screen. The techniques of Non-Patent Documents 1 and 2 set the cursor position based only on the fingertip position, so depending on where the operator is, the cursor moves to a place offset from the position the operator pointed at, which is not practical. Furthermore, methods that detect the position of a part of the operator's body can fail to detect it or can detect it erroneously, so the cursor position becomes unstable.

In addition, the method that assumes the operator's position requires calibration to establish that position. Calibration must therefore be repeated every time the operator moves while viewing the display screen, which is not practical.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a display screen position analysis device and a display screen position analysis program that can stably analyze the position on the display screen lying on the extension of the fingertip as seen by the operator, without depending on the operator's position and without requiring calibration.

To solve the above problem, the display screen position analysis device according to claim 1 receives video of a subject including an operator captured by at least two cameras and, based on the frame images constituting the video, analyzes the position on the display screen of a display device indicated by the operator's fingertip. The device comprises image input means, background image storage means, distance image generation means, background difference processing means, fingertip detection means, viewpoint detection means, and screen position analysis means.

With this configuration, the display screen position analysis device stores, in the background image storage means, a background image, which is an image of the operator's background captured by the cameras. The device also uses the image input means to input, in time series, the frame images constituting the video of the operator captured by at least two cameras, and uses the distance image generation means to generate, based on the parallax between these frame images, a distance image, which is an image indicating the distance of the photographed subject from the display screen.

In the display screen position analysis device, the background difference processing means has: a difference processing unit that generates a difference image between the background image stored in the background image storage means and the frame image input from the image input means; a mask image generation unit that masks pixels whose difference in the difference image is smaller than a threshold, thereby generating a mask image indicating the mask area, that is, the area of the frame image in which the operator is not imaged; and a background image selection unit that, when the amount of change in the difference indicated by the difference image is at or below a predetermined amount, judges the frame image to be a background image and stores it in the background image storage means. The fingertip detection means detects, based on the distance image generated by the distance image generation means, the fingertip position, which is the position in the frame image of the part whose distance from the display screen is shortest within a prediction area of predetermined size. The prediction area setting means of the fingertip detection means calculates a predicted position of the fingertip in the current distance image from the fingertip positions detected for earlier frame images, and sets a prediction area containing that predicted position in the distance image. Because the device thus tracks the fingertip position when setting the prediction area, it can detect the fingertip position stably. The fingertip determination means of the fingertip detection means detects the fingertip position within the non-mask area, which is the area excluding the mask area indicated by the mask image generated by the mask image generation unit, and determines whether the detected fingertip position actually represents the fingertip based on the distance at the fingertip position and the average of the distances in the non-mask area.
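The fingertip search described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the function names `predict_position` and `find_fingertip`, the square shape of the prediction area, and the linear extrapolation used to predict the next position are all hypothetical, since the text does not specify them.

```python
def predict_position(prev, prev2):
    """Extrapolate the next fingertip position from the last two detections.

    Linear extrapolation is an assumed motion model; the text only says the
    prediction is based on positions detected for earlier frame images.
    """
    return (2 * prev[0] - prev2[0], 2 * prev[1] - prev2[1])


def find_fingertip(distance, mask, center, half_size):
    """Return (row, col) of the unmasked pixel nearest the display screen
    inside the square prediction area around `center`, or None.

    distance: 2-D list of distances from the display screen
    mask:     2-D list of booleans, True where the operator is NOT imaged
    """
    rows, cols = len(distance), len(distance[0])
    r0, c0 = center
    best = None
    for r in range(max(0, r0 - half_size), min(rows, r0 + half_size + 1)):
        for c in range(max(0, c0 - half_size), min(cols, c0 + half_size + 1)):
            if mask[r][c]:
                continue  # skip background (masked) pixels
            if best is None or distance[r][c] < distance[best[0]][best[1]]:
                best = (r, c)
    return best
```

Restricting the search to the prediction area and to unmasked pixels means that a nearby object outside the operator's silhouette, even one closer to the screen than the fingertip, is never returned.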

Furthermore, the viewpoint detection means detects, from the frame image input from the image input means, the viewpoint position, which is the position between the operator's eyes in the frame image. The screen position analysis means then analyzes the position on the display screen indicated by the fingertip, based on the fingertip position detected by the fingertip detection means, the viewpoint position detected by the viewpoint detection means, and the distances indicated by the distance image at the fingertip position and the viewpoint position.

As a result, the display screen position analysis device detects the fingertip position and the viewpoint position from the video of the operator and, using these positions together with the distance image generated from parallax, which indicates the distance of the subject from the display screen, can determine the three-dimensional positions of the operator's fingertip and of the point between the operator's eyes. From these three-dimensional positions, the device finds where the extension of the line running from between the operator's eyes through the fingertip intersects the display screen, and thereby analyzes the position on the display screen indicated by the operator's fingertip.
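The intersection of the eye-to-fingertip line with the display screen can be sketched as a line-plane intersection. The following Python fragment is a hedged illustration only: the coordinate system (display screen in the plane z = 0, with z equal to the distance from the screen) and the function name `pointed_screen_position` are assumptions introduced here and do not appear in the source.

```python
def pointed_screen_position(eye, fingertip):
    """Intersect the line from `eye` through `fingertip` with the plane z = 0.

    eye, fingertip: (x, y, z) tuples in an assumed screen coordinate system
    where z is the distance from the display screen.
    Returns (x, y) on the screen plane, or None if the fingertip is not
    closer to the screen than the eye (the line never reaches the screen).
    """
    xe, ye, ze = eye
    xf, yf, zf = fingertip
    if ze <= zf:
        return None
    # Point on the line: P(t) = eye + t * (fingertip - eye); solve z(t) = 0.
    t = ze / (ze - zf)
    return (xe + t * (xf - xe), ye + t * (yf - ye))
```

For example, with the point between the eyes at (0, 0, 2) and the fingertip at (1, 0.5, 1), the parameter t is 2 and the indicated screen position is (2, 1). Because both three-dimensional positions are measured for every frame, the result follows the operator as the operator moves.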

Moreover, because the device detects the fingertip position from the part that has changed relative to the background image, that is, from the area in which the operator is imaged, based on the difference image between the background image and the frame image, it can detect the fingertip position stably. The average distance from the display screen in the non-mask area represents the operator's average distance from the display screen, and the distance at the fingertip position represents the distance from the display screen of the subject at the detected fingertip position. The device can therefore determine whether the detected fingertip position corresponds to the operator's fingertip based on how much closer to the display screen that subject is than the operator as a whole.
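One way to picture this determination is the following Python sketch. The decision rule used here, requiring the candidate to be at least `min_gap` closer to the screen than the operator's average distance, is an assumption made for illustration; the text states only that the decision is based on the two distances. The names `is_pointing` and `min_gap` are hypothetical.

```python
def is_pointing(distance, mask, fingertip, min_gap):
    """Decide whether a candidate fingertip position is a real fingertip.

    distance:  2-D list of distances from the display screen
    mask:      2-D list of booleans, True where the operator is NOT imaged
    fingertip: (row, col) of the candidate position
    min_gap:   assumed threshold on how much nearer the candidate must be
               than the operator's average distance
    """
    values = [
        distance[r][c]
        for r in range(len(distance))
        for c in range(len(distance[0]))
        if not mask[r][c]
    ]
    if not values:
        return False  # no operator pixels at all
    mean_distance = sum(values) / len(values)
    r, c = fingertip
    return mean_distance - distance[r][c] >= min_gap
```

With an arm held out toward the screen the candidate is much nearer than the body average and is accepted; with the arm lowered the nearest point is only marginally nearer and is rejected.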

In addition, even when the background changes, the device stores the changed background image in the background image storage means, so only the area in which the operator is imaged is set as the non-mask area, and the fingertip can be detected stably.
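The automatic background update might be sketched as follows. This is an illustration under assumptions rather than the method of the source: the "amount of change in the difference" is interpreted here as the summed absolute change between the current and the previous difference image, and the names `maybe_update_background`, `store`, and `max_change` are hypothetical.

```python
def maybe_update_background(frame, diff, prev_diff, store, max_change):
    """Store `frame` as the new background when the difference image is stable.

    frame:      2-D list of pixel values for the current frame image
    diff:       difference image for the current frame
    prev_diff:  difference image for the preceding frame
    store:      dict standing in for the background image storage means
    max_change: assumed limit on the summed change between difference images
    Returns True if the background was updated.
    """
    change = sum(
        abs(a - b)
        for row_a, row_b in zip(diff, prev_diff)
        for a, b in zip(row_a, row_b)
    )
    if change <= max_change:
        store["background"] = [row[:] for row in frame]  # keep a copy
        return True
    return False
```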

The display screen position analysis program according to claim 2 receives video of a subject including an operator captured by at least two cameras and, in order to analyze, based on the frame images constituting the video, the position on the display screen of a display device indicated by the operator's fingertip, causes a computer provided with background image storage means, which stores a background image that is an image of the operator's background captured by the cameras, to function as image input means, distance image generation means, background difference processing means, fingertip detection means, viewpoint detection means, and screen position analysis means.

With this configuration, the display screen position analysis program uses the image input means to input, in time series, the frame images constituting the video of the operator captured by at least two cameras, and uses the distance image generation means to generate, based on the parallax between these frame images, a distance image, which is an image indicating the distance of the photographed subject from the display screen.

In the display screen position analysis program, the background difference processing means has: a difference processing unit that generates a difference image between the background image stored in the background image storage means and the frame image input from the image input means; a mask image generation unit that masks pixels whose difference in the difference image is smaller than a threshold, thereby generating a mask image indicating the mask area, that is, the area of the frame image in which the operator is not imaged; and a background image selection unit that, when the amount of change in the difference indicated by the difference image is at or below a predetermined amount, judges the frame image to be a background image and stores it in the background image storage means. The fingertip detection means detects, based on the distance image generated by the distance image generation means, the fingertip position, which is the position in the frame image of the part whose distance from the display screen is shortest within a prediction area of predetermined size. The prediction area setting means of the fingertip detection means calculates a predicted position of the fingertip in the current distance image from the fingertip positions detected for earlier frame images, and sets a prediction area containing that predicted position in the distance image. The fingertip determination means of the fingertip detection means detects the fingertip position within the non-mask area, which is the area excluding the mask area indicated by the mask image generated by the mask image generation unit, and determines whether the detected fingertip position actually represents the fingertip based on the distance at the fingertip position and the average of the distances in the non-mask area.

Furthermore, the viewpoint detection means detects, from the frame image input from the image input means, the viewpoint position, which is the position between the operator's eyes in the frame image. The screen position analysis means then analyzes the position on the display screen indicated by the fingertip, based on the fingertip position detected by the fingertip detection means, the viewpoint position detected by the viewpoint detection means, and the distances indicated by the distance image at the fingertip position and the viewpoint position.

As a result, the display screen position analysis program detects the fingertip position and the viewpoint position from the video of the operator and, using these positions together with the distance image generated from parallax, which indicates the distance of the subject from the display screen, can determine the three-dimensional positions of the operator's fingertip and of the point between the operator's eyes. From these three-dimensional positions, the program finds where the extension of the line running from between the operator's eyes through the fingertip intersects the display screen, and thereby analyzes the position on the display screen indicated by the operator's fingertip. Because the program also tracks the fingertip position when setting the prediction area, it can detect the fingertip position stably.

The display screen position analysis device and the display screen position analysis program according to the present invention provide the following advantageous effects.

According to the invention of claim 1 or claim 2, the position on the display screen indicated by the operator's fingertip is analyzed from video of the operator, so the operator can designate a position on the display screen simply by pointing, without operating a device such as a remote control. In addition, because the position of the operator's fingertip and the position between the operator's eyes are both detected, the position on the display screen lying on the extension of the line from between the eyes to the fingertip can be analyzed regardless of where the operator is, and no calibration is needed when the operator's position changes.

Furthermore, there is no need to irradiate the operator with laser light or to attach a position detector or similar equipment to the operator's body, so the operator is neither harmed nor made uncomfortable while viewing the display screen. Because tracking the fingertip allows its position to be detected stably, the position on the display screen at which the operator points can also be analyzed stably.

According to the invention of claim 1 or claim 2, the fingertip position is detected only from the area in which the operator is imaged, so objects other than the operator that are near the display screen are not detected as the fingertip, and the position at which the operator points can be analyzed stably. In addition, whether the detected fingertip position corresponds to the operator's fingertip is determined by how much closer to the display screen the subject at that position is than the operator as a whole. Consequently, no position is detected when the operator has lowered the arm and is not pointing at the display screen, and a position on the display screen is analyzed only while the operator is pointing.

According to the invention of claim 1 or claim 2, the background image is stored in the background image storage means automatically, so the operator does not need to register a background image in advance or when the background changes.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[Configuration of the cursor position analysis device]
With reference to FIG. 1 and FIG. 2, the configuration of a cursor position analysis device (display screen position analysis device) 1 according to an embodiment of the present invention will be described. FIG. 1 is an explanatory diagram outlining the cursor position analysis device according to the embodiment. FIG. 2 is a block diagram showing the configuration of the cursor position analysis device according to the embodiment.

As shown in FIG. 1, the cursor position analysis device 1 receives from outside video of an operator U captured by a stereoscopic camera C, which has a plurality of cameras and is installed near the display screen of a display device D such as a television receiver. It calculates the positions of the operator U's fingertip and glabella (the viewpoint, located between the eyes) shown in the video, analyzes the position on the display screen of the display device D that lies on the extension of the line from the glabella to the fingertip, and displays a cursor S at that position. As shown in FIG. 2, the cursor position analysis device 1 here includes image input means 10, distance image generation means 11, background difference processing means 12, background image accumulation means 13, fingertip detection means 14, face detection means 15, glabella detection means 16, glabella position setting means 17, cursor position analysis means 18, cursor display means 19, and video input means 20.

The image input means 10 inputs, in time series, the frame images constituting the video of the operator U captured by the stereoscopic camera C. These frame images are captured by a plurality of cameras, and the frame images captured at the same time from different positions are input together. The input frame images are output to the distance image generation means 11, the background difference processing means 12, and the face detection means 15.

The distance image generation means 11 generates, based on the frame images input from the image input means 10, a distance image indicating the distance from the display screen of the subject captured in the frame images. The generated distance image is output to the fingertip detection means 14 and the cursor position analysis means 18.

Frame images captured by a plurality of cameras at different positions are input to the image input means 10. The distance image generation means 11 calculates the parallax between these frame images and calculates distance from the parallax, thereby generating, as the distance image, an image indicating the distance from the cameras installed near the display screen. Because the stereoscopic camera C is installed near the display screen here, an image indicating distance from the camera serves as a distance image indicating distance from the display screen. The parallax can be calculated by, for example, block matching. A method of generating the distance image by block matching is described below.

First, for each frame image, the distance image generation means 11 sets, for every pixel, a block centered on that pixel. A block is a group of a predetermined number of pixels, for example 8 × 8 pixels. The distance image generation means 11 then matches the block of the frame image of interest against blocks of the other frame image. The offset between the center pixels of the two best-matching blocks is the parallax value between the frame images. Sub-pixel processing can be used to resolve the offset more finely than the one-pixel spacing. The distance image generation means 11 takes this parallax value as the value of the pixel and then calculates the parallax value for the next pixel of the frame image of interest in the same way. In this manner, the distance image generation means 11 obtains parallax values for all pixels of the frame image of interest and generates a parallax image having these as its pixel values.
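The block matching step can be illustrated with a small pure-Python sketch. It makes several assumptions not stated in the source: the two frame images are rectified so that matching blocks lie on the same row, the matching cost is the sum of absolute differences (SAD), blocks have odd size `2 * half + 1` so that they have a center pixel, and no sub-pixel refinement is done. The names `block_sad`, `disparity_image`, `half`, and `max_disp` are hypothetical.

```python
def block_sad(left, right, r, c_left, c_right, half):
    """Sum of absolute differences between two blocks centered on row r."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[r + dr][c_left + dc] - right[r + dr][c_right + dc])
    return total


def disparity_image(left, right, half=1, max_disp=4):
    """Compute an integer parallax (disparity) image for the `left` frame.

    For each pixel whose block fits inside the image, the block is compared
    with blocks shifted 0..max_disp pixels to the left in `right`, and the
    shift with the lowest SAD becomes the pixel value. Border pixels stay 0.
    """
    rows, cols = len(left), len(left[0])
    disp = [[0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            best_d, best_cost = 0, None
            for d in range(max_disp + 1):
                if c - d - half < 0:
                    break  # shifted block would leave the image
                cost = block_sad(left, right, r, c, c - d, half)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[r][c] = best_d
    return disp
```

A practical system would more likely rely on an optimized stereo matcher than on nested Python loops; the sketch is meant only to make the per-pixel search concrete.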

The distance image generation means 11 then generates the distance image by converting the parallax values of the parallax image into pixel values corresponding to distance. For this purpose, calibration is performed in advance to obtain the internal parameters of the cameras (camera parameters), and the distance image generation means 11 uses these camera parameters to convert parallax values into distances. In this way, the distance image generation means 11 can generate a distance image that indicates, for each pixel of the frame image of interest, the distance from the display screen of the subject corresponding to that pixel.
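The conversion from parallax to distance can be sketched with the standard relation for a rectified, parallel stereo pair, Z = f · B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the parallax in pixels. The source says only that camera parameters are used, so this particular formula, and the names `disparity_to_distance`, `focal_px`, and `baseline_m`, are assumptions made for illustration.

```python
def disparity_to_distance(disp, focal_px, baseline_m):
    """Convert a parallax image to a distance image using Z = f * B / d.

    disp:       2-D list of parallax values in pixels
    focal_px:   focal length in pixels (from prior camera calibration)
    baseline_m: distance between the two cameras in meters
    Pixels with zero parallax are treated as infinitely far away.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else float("inf") for d in row]
        for row in disp
    ]
```

For example, with an assumed focal length of 500 pixels and a baseline of 0.25 m, a parallax of 50 pixels corresponds to a distance of 2.5 m.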

Alternatively, the stereoscopic camera C may be installed at a position away from the display screen. In that case, the distance image generation means 11 calculates the distance from one of the cameras based on the parallax, then calculates the three-dimensional position in real space corresponding to each pixel, and from that derives the distance image indicating the distance from the display screen. The stereoscopic camera C only needs to include at least two cameras; with two or more cameras, the distance image generation means 11 can generate a distance image from the frame images based on parallax.

The background difference processing means 12 generates a difference image between a background image, which is an image of the background captured without the operator U and stored in the background image storage means 13 described later, and the frame image input from the image input means 10, and generates a mask image indicating the background region. Here, the background difference processing means 12 includes a difference processing unit 12a, a mask image generation unit 12b, and a background image selection unit 12c.

The difference processing unit 12a generates a difference image between the background image stored in the background image storage means 13 and the frame image input from the image input means 10. The difference processing unit 12a calculates the difference between the frame image and the background image pixel by pixel to obtain the difference image. The generated difference image is output to the mask image generation unit 12b and the background image selection unit 12c. The signal used by the difference processing unit 12a to calculate the difference can be a luminance signal, a color difference signal, a color signal, an I signal, or a combination of these.

Based on the difference image generated by the difference processing unit 12a, the mask image generation unit 12b generates a mask image indicating the region of the frame image input from the image input means 10 in which the operator U does not appear. The mask image generation unit 12b can generate the mask image by treating as masked those pixels whose difference, as given by the difference image, is smaller than a threshold. The threshold can be based on, for example, the variance or the maximum value of each pixel value observed when the background image was captured. The generated mask image is output to the closest block determination unit 14b of the fingertip detection means 14.
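The per-pixel difference and thresholding steps can be sketched as below. Images are plain lists of rows holding a single scalar per pixel (for example luminance), and a single global threshold is used; both choices, and the function names, are simplifying assumptions for this example.

```python
def difference_image(frame, background):
    """Absolute per-pixel difference between a frame and the background."""
    return [
        [abs(f - b) for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def mask_image(diff, threshold):
    """True marks a masked (background) pixel: its difference is below the
    threshold, so the operator is assumed not to appear there."""
    return [[d < threshold for d in row] for row in diff]
```

For instance, with a background of `[[10, 10], [10, 10]]`, a frame of `[[10, 60], [11, 10]]`, and a threshold of 5, only the top-right pixel is left unmasked.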

Based on the difference image generated by the difference processing unit 12a, the background image selection unit 12c determines whether the frame image input from the image input means 10 is an image of the background. A frame image determined to be a background image is stored in the background image storage means 13. As a result, even if no background image has been stored in the background image storage means 13 in advance, the background image selection unit 12c can automatically select an image to serve as the background image from among the frame images.

Here, the background image selection unit 12c can determine that a frame image is an image of the background when the sum of the differences (the amount of change) given by the difference image generated by the difference processing unit 12a is smaller than a predetermined value. In this embodiment, the background image selection unit 12c stores each image determined to be a background image in the background image storage means 13 and, to reduce the influence of camera noise, also generates a background image whose pixel values are the per-pixel averages of the multiple background frame images stored in the background image storage means 13, and stores that background image in the background image storage means 13.
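The selection and averaging behavior can be sketched as a small class. The class name `BackgroundStore`, the parameter `change_limit`, and the decision to accept the very first frame unconditionally (to bootstrap an empty store) are hypothetical choices for this illustration; the text itself also allows a background image to be stored in advance.

```python
class BackgroundStore:
    """Keeps frames judged to be background and their per-pixel average."""

    def __init__(self, change_limit):
        self.change_limit = change_limit  # the "predetermined value"
        self.frames = []                  # frames judged to be background
        self.background = None            # averaged background image

    def offer(self, frame):
        """Store `frame` if it looks like background; return True if stored."""
        if self.background is not None:
            change = sum(
                abs(f - b)
                for frame_row, bg_row in zip(frame, self.background)
                for f, b in zip(frame_row, bg_row)
            )
            if change >= self.change_limit:
                return False              # too much change: operator present
        self.frames.append(frame)
        n = len(self.frames)
        # Averaging the stored frames reduces the influence of camera noise.
        self.background = [
            [sum(fr[y][x] for fr in self.frames) / n for x in range(len(frame[0]))]
            for y in range(len(frame))
        ]
        return True
```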

The background image storage means (background image memory means) 13 stores background images, which are images of the background, and is a general storage device such as a semiconductor memory or a hard disk. In this embodiment, the background image storage means 13 stores both the images determined to be background images by the background image selection unit 12c and the background image generated by the background image selection unit 12c. Alternatively, a background image may be stored in the background image storage means 13 in advance, or the most recent image determined to be a background image by the background image selection unit 12c may be stored as the background image. The stored background image is read by the difference processing unit 12a when it generates a difference image, and by the background image selection unit 12c when it generates a background image.

The fingertip detection means 14 detects the position of the fingertip of the operator U in the frame image input from the image input means 10, based on the distance image generated by the distance image generation means 11 and the mask image generated by the background difference processing means 12. The fingertip detection means 14 includes a candidate block search unit 14a, a closest block determination unit 14b, a fingertip determination unit 14c, and a prediction region setting unit 14d.

The candidate block search unit 14a divides into blocks both the non-mask region of the distance image generated by the distance image generation means 11 (that is, the region excluding the mask region indicated by the mask image generated by the background difference processing means 12) and the prediction region set by the prediction region setting unit 14d described later, and extracts a predetermined number of blocks for which the distance indicated by the distance image is short. The extracted blocks are output to the closest block determination unit 14b.

A block consists of multiple pixels, for example a region of 8 × 8 pixels. The non-mask region is the region of the frame image in which the operator U appears. By extracting short-distance blocks from this region of the distance image, the candidate block search unit 14a can therefore remove the influence of the background and extract blocks that contain the region corresponding to the fingertip of the operator U. The prediction region is the region in which, based on the movement of the fingertip, the block corresponding to the fingertip is predicted to exist with high probability. By also extracting short-distance blocks from this prediction region, the fingertip position can be tracked and blocks containing the region corresponding to the fingertip can be extracted stably. In this embodiment, the candidate block search unit 14a extracts a predetermined number of blocks in ascending order of distance.
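The candidate search can be sketched as follows. How a single distance is assigned to a block is not stated in the text, so taking the mean of the block's unmasked pixels is an assumption of this example, as are the function name and the representation of the prediction region as a set of block coordinates `(block_row, block_col)`.

```python
def candidate_blocks(distance, mask, block=8, count=3, predicted=()):
    """Return up to `count` tuples (mean_distance, block_row, block_col),
    nearest first.

    A block is eligible if it contains at least one unmasked pixel, or if
    it belongs to the prediction region `predicted`.
    """
    rows, cols = len(distance), len(distance[0])
    found = []
    for by in range(0, rows - block + 1, block):
        for bx in range(0, cols - block + 1, block):
            cells = [(y, x) for y in range(by, by + block)
                     for x in range(bx, bx + block)]
            unmasked = [distance[y][x] for y, x in cells if not mask[y][x]]
            if unmasked:
                values = unmasked
            elif (by // block, bx // block) in predicted:
                values = [distance[y][x] for y, x in cells]
            else:
                continue                  # pure background block: skip
            found.append((sum(values) / len(values), by // block, bx // block))
    found.sort()                          # ascending order of distance
    return found[:count]
```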

The closest block determination unit 14b selects one of the blocks extracted by the candidate block search unit 14a as the block indicating the fingertip position. The closest block determination unit 14b receives information on the prediction region, that is, the region of the current distance image in which the prediction region setting unit 14d described later has predicted, from the fingertip position in the previous frame image, that the block corresponding to the fingertip exists. If any of the blocks extracted by the candidate block search unit 14a lies inside this prediction region, the closest block determination unit 14b selects the block with the shortest distance among the blocks inside the prediction region; if none lies inside it, the unit selects the block with the shortest distance outside the prediction region. The selected block is output to the fingertip determination unit 14c.
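The selection rule above reduces to a few lines. The tuple layout `(distance, block_row, block_col)` and the representation of the prediction region as a set of block coordinates are assumptions carried over for illustration.

```python
def closest_block(candidates, predicted):
    """Pick the fingertip block from `candidates`.

    Candidates inside the prediction region take priority; among the
    chosen pool, the block with the shortest distance wins. Returns None
    when there are no candidates at all.
    """
    inside = [c for c in candidates if (c[1], c[2]) in predicted]
    pool = inside if inside else candidates
    return min(pool) if pool else None
```

Note that a nearer block outside the prediction region is ignored whenever any candidate lies inside it, which is what keeps the tracked fingertip from jumping to another body part.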

The fingertip determination unit (fingertip determination means) 14c compares the distance of the block selected by the closest block determination unit 14b with the average distance of the blocks corresponding to the image of the operator U, and determines whether the selected block corresponds to the fingertip. The fingertip determination unit 14c calculates the average distance over the region other than the mask region indicated by the mask image generated by the background difference processing means 12, and judges whether the difference between this average and the distance of the block selected by the closest block determination unit 14b is greater than a predetermined value. If the difference is greater than the predetermined value, the fingertip determination unit 14c determines that the block corresponds to the fingertip; it outputs the position of this block on the distance image (the fingertip position) to the prediction region setting unit 14d and the cursor position analysis means 18, and outputs a detection start signal, indicating that a block corresponding to the fingertip has been detected, to the eyebrow position setting means 17. If the difference is not greater than the predetermined value, the fingertip determination unit 14c determines that the block does not correspond to the fingertip and outputs a non-detection signal to the cursor display means 19.

In other words, the non-mask region is the region corresponding to the image of the operator U, and the average distance over this region can be regarded as the distance from the camera that captured the frame image to the operator U. The predetermined value is set to, for example, about the length of an arm. If the part closest to the camera within the prediction region is closer than the average distance over the non-mask region by at least the predetermined value, it can be concluded that the operator U has raised an arm and is pointing a fingertip toward the display screen. If it is not closer by at least the predetermined value, it can be concluded that the operator U is not pointing at the display screen and that no cursor needs to be displayed. Instead of the average distance over the non-mask region, the distance at the position between the eyebrows detected by the eyebrow detection means 16 described later may be used.
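The arm-length test can be sketched as below. The parameter name `reach` and its default of 0.5 (in the same unit as the distance image, for example metres) are hypothetical; the text only says the predetermined value is set to roughly the length of an arm.

```python
def is_fingertip(block_distance, distance, mask, reach=0.5):
    """Return True if the selected block is nearer than the operator's
    average distance by more than `reach`.

    `distance` and `mask` are lists of rows; masked pixels (True) are
    background and are excluded from the average.
    """
    body = [
        d
        for dist_row, mask_row in zip(distance, mask)
        for d, masked in zip(dist_row, mask_row)
        if not masked
    ]
    if not body:
        return False                      # no operator pixels at all
    mean_body = sum(body) / len(body)
    return mean_body - block_distance > reach
```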

The prediction region setting unit (prediction region setting means) 14d sets the prediction region, that is, the region of the next frame's distance image in which the block corresponding to the fingertip is predicted to exist, based on the fingertip position input from the fingertip determination unit 14c. Here, the prediction region setting unit 14d estimates the speed at which the fingertip moves on the distance image from the positions of the blocks corresponding to the fingertip (fingertip positions) detected in the distance images of the current frame and the preceding frame, and from that speed estimates the fingertip position in the distance image of the next frame. A general Kalman filter, for example, can be used for this estimation. The prediction region setting unit 14d then sets a region of a predetermined size centered on the estimated fingertip position as the prediction region. The prediction region is output to the candidate block search unit 14a and the closest block determination unit 14b.
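The sketch below shows the idea with a plain constant-velocity extrapolation from the last two fingertip positions. This is deliberately simpler than the Kalman filter mentioned in the text and is not a substitute for it; the function name, the block-coordinate representation `(row, col)`, and the square region of half-width `radius` are assumptions of this example.

```python
def predict_region(previous, current, radius, rows, cols):
    """Return the set of block coordinates forming the prediction region.

    The next position is extrapolated as current + (current - previous),
    and the region is the square of half-width `radius` around it,
    clipped to a grid of `rows` x `cols` blocks.
    """
    vy, vx = current[0] - previous[0], current[1] - previous[1]
    cy, cx = current[0] + vy, current[1] + vx
    return {
        (y, x)
        for y in range(cy - radius, cy + radius + 1)
        for x in range(cx - radius, cx + radius + 1)
        if 0 <= y < rows and 0 <= x < cols
    }
```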

The face detection means 15 detects the face of the operator U in the frame image input from the image input means 10. This face detection is preprocessing that lets the eyebrow detection means 16, described later, quickly detect the point between the eyebrows (between the eyes of the operator U) in the frame image. If detection of that point can be performed in real time, the cursor position analysis device 1 may omit the face detection means 15, and the eyebrow detection means 16 may detect the point directly from the frame image input from the image input means 10. The detected face region is output to the eyebrow detection means 16. If no face is detected, the face detection means 15 notifies the eyebrow position setting means 17 that no face was detected. Because the face detection means 15 narrows the range in which the eyebrow detection means 16 searches for the position between the eyebrows, the processing speed can be improved.

The face detection means 15 can use a general face detection method. For example, it can use a method based on color information, in which a region of the frame image containing colors close to a preset face color is judged to be the face region. Variants of this method compute the color difference in RGB space, use only the color difference signals, or use the I axis.

The face detection means 15 may also use the face image recognition algorithm included in the OpenCV library as its face detection method. OpenCV is a computer vision software library released as open source for developers by Intel (registered trademark). The face recognition algorithm included in OpenCV was proposed by Paul Viola et al. and improved by Rainer Lienhart et al. This algorithm detects faces using a detector trained on several hundred samples. The detector has a hierarchical structure and can detect objects at high speed.

The face detection means 15 may also restrict the range in which it searches for a face based on, for example, the position between the eyebrows in the previous frame as selected by the eyebrow candidate point selection unit 16b of the eyebrow detection means 16 described later.

The eyebrow detection means (viewpoint detection means) 16 detects the position between the eyebrows within the face image region detected by the face detection means 15. In this embodiment, the eyebrow detection means 16 detects this position using a ring filter, and includes a ring filter unit 16a and an eyebrow candidate point selection unit 16b. Although the case of using a ring filter is described here, the eyebrow detection means 16 is not limited to this method and can be realized by various methods for detecting the position between the eyebrows.

The ring filter unit 16a extracts candidates for the image of the point between the eyebrows from the face image region input from the face detection means 15, using a one-dimensional ring filter. Because the ring filter unit 16a extracts candidates based on one-dimensional information, multiple candidates are extracted, including regions other than the point between the eyebrows. The extracted candidates are output to the eyebrow candidate point selection unit 16b.

The eyebrow candidate point selection unit 16b selects, from the multiple candidates extracted by the ring filter unit 16a, one candidate as the image of the point between the eyebrows of the operator U, and detects its position. In this embodiment, the eyebrow candidate point selection unit 16b matches each candidate against a two-dimensional template of the region between the eyebrows stored in advance in a storage means (not shown), and detects the position of the best-matching candidate. The detected position is output to the eyebrow position setting means 17.

The eyebrow position setting means 17 sets the eyebrow position, which is the position of the point between the eyebrows of the operator U in the frame image. The eyebrow position setting means 17 receives either the notification from the face detection means 15 that no face was detected or the position information from the eyebrow candidate point selection unit 16b, and sets the eyebrow position accordingly. In addition, based on the detection start signal input from the fingertip determination unit 14c of the fingertip detection means 14, the eyebrow position setting means 17 locks (fixes) the eyebrow position for as long as the fingertip detection means 14 continues to detect the fingertip. The eyebrow position set here is output to the cursor position analysis means 18. The method by which the eyebrow position setting means 17 sets the eyebrow position is described below.

When the detection start signal is input from the fingertip determination unit 14c of the fingertip detection means 14 for both the previous frame and the current frame, the eyebrow position setting means 17 uses the eyebrow position set in the previous frame as the eyebrow position for the current frame. When information indicating the position between the eyebrows is input from the eyebrow candidate point selection unit 16b and the detection start signal has not been input from the fingertip determination unit 14c for both the previous frame and the current frame, the eyebrow position setting means 17 uses the position input from the eyebrow candidate point selection unit 16b as the eyebrow position.

The eyebrow position setting means 17 also uses the eyebrow position set in the previous frame as the eyebrow position for the current frame when the variation in the position input from the eyebrow candidate point selection unit 16b has stayed at or below a predetermined value for at least a predetermined time. With these settings, the eyebrow position setting means 17 can lock the eyebrow position of the operator U while the operator U performs a series of actions to specify where the cursor S is to be displayed on the display screen, which stabilizes the movement of the cursor S. The eyebrow position setting means 17 can likewise lock the eyebrow position when the position of the operator U's eyebrows fluctuates only slightly, which also stabilizes the movement of the cursor S.

Furthermore, when the notification that no face was detected is input from the face detection means 15, the eyebrow position setting means 17 sets the center of the frame image as the eyebrow position. When no position information is input from the eyebrow candidate point selection unit 16b, the eyebrow position setting means 17 sets the most recently detected position as the eyebrow position if the eyebrow detection means 16 has detected the point between the eyebrows in the past, and otherwise sets the center of the face region detected by the face detection means 15.
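The rules for setting the eyebrow position can be collected into one function, as sketched below. The order in which the rules are tried is not fully specified in the text, so the priority used here (lock first, then the no-face fallback, then the fresh detection, then the last detection, then the face-region center) is an assumption, as are all parameter names. The additional lock for small variations over a predetermined time is omitted for brevity.

```python
def set_eyebrow_position(prev_position, tip_prev, tip_now, detected,
                         last_detected, face_box, frame_size):
    """Return the eyebrow position (x, y) for the current frame.

    prev_position : position set in the previous frame, or None
    tip_prev/now  : detection start signal for previous / current frame
    detected      : position detected in this frame, or None
    last_detected : most recent position detected in the past, or None
    face_box      : (x0, y0, x1, y1) of the detected face, or None
    frame_size    : (width, height) of the frame image
    """
    if tip_prev and tip_now and prev_position is not None:
        return prev_position              # lock while pointing continues
    if face_box is None:                  # no face detected: frame center
        return (frame_size[0] // 2, frame_size[1] // 2)
    if detected is not None:
        return detected
    if last_detected is not None:
        return last_detected
    x0, y0, x1, y1 = face_box             # fall back to face-region center
    return ((x0 + x1) // 2, (y0 + y1) // 2)
```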

The cursor position analysis means (on-screen position analysis means) 18 analyzes the position on the display screen of the display device D that lies on the extension of the line from the point between the eyebrows of the operator U through the fingertip, based on the distance image generated by the distance image generation means 11, the fingertip position detected by the fingertip determination unit 14c, and the eyebrow position set by the eyebrow position setting means 17, and sets that position as the position of the cursor S. The cursor position analysis means 18 includes a three-dimensional position calculation unit 18a and a cursor position calculation unit 18b.

The three-dimensional position calculation unit 18a calculates the three-dimensional positions of the fingertip and of the point between the eyebrows of the operator U, based on the distance image generated by the distance image generation means 11, the fingertip position detected by the fingertip determination unit 14c, and the eyebrow position set by the eyebrow position setting means 17. For this purpose, calibration is performed in advance to obtain the camera parameters. The three-dimensional position calculation unit 18a can then calculate the two three-dimensional positions from the camera parameters, the fingertip position and the eyebrow position, and the distances from the display screen to the fingertip and to the point between the eyebrows that the distance image indicates at those positions. The calculated three-dimensional positions are output to the cursor position calculation unit 18b.

The cursor position calculation unit 18b analyzes the position on the display screen of the display device D indicated by the fingertip of the operator U, which is the position at which the cursor S is displayed, based on the three-dimensional positions of the fingertip and of the point between the eyebrows calculated by the three-dimensional position calculation unit 18a. The positional relationship between the camera and the display screen is assumed to be registered in the cursor position analysis device 1 in advance. The cursor position calculation unit 18b obtains the equation of the straight line passing through the two three-dimensional positions, calculates the intersection of this line with the display screen, and takes the position of the intersection on the display screen as the position of the cursor S. In this way, the cursor position calculation unit 18b can find the position on the display screen that lies on the extension of the fingertip as seen by the operator U. The resulting position of the cursor S is output to the cursor display means 19. The data describing the positional relationship between the camera and the display screen are fixed values as long as that relationship does not change.
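The intersection step can be sketched as follows, assuming for simplicity a coordinate system in which the display screen is the plane z = 0, with z increasing toward the operator and all points given in the same length unit. That coordinate convention and the function name are assumptions for this example; in practice the registered camera-to-screen relationship would be used to bring the points into such a frame.

```python
def cursor_on_screen(eye, fingertip):
    """Intersect the line eye -> fingertip with the screen plane z = 0.

    `eye` and `fingertip` are (x, y, z) tuples. Returns the (x, y)
    position on the screen plane, or None if the fingertip is not
    between the eye and the screen.
    """
    ex, ey, ez = eye
    fx, fy, fz = fingertip
    if ez <= fz:
        return None                       # line never reaches the screen
    # Point on the line: eye + t * (fingertip - eye); solve for z = 0.
    t = ez / (ez - fz)
    return (ex + t * (fx - ex), ey + t * (fy - ey))
```

For example, with the eye at (0, 0, 2) and the fingertip at (0.25, 0.125, 1), the line reaches the screen at (0.5, 0.25). Converting this point into pixel coordinates of the display is a separate scaling step not shown here.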

The cursor display means 19 receives, from the video input means 20 described later, the video to be displayed on the display screen, adds an image of the cursor S at the position calculated by the cursor position calculation unit 18b to each of the frame images making up that video, converts the result into a displayable output format, and outputs it to the display device D.

The video input means 20 receives the video to be displayed on the display screen of the display device D. The received video is output to the cursor display means 19.

As described above, the cursor position analysis device 1 can, based on video of the operator U, display the cursor S at the position on the display screen that lies on the extension of the fingertip as seen by the operator U. The operator U can therefore easily set the position of the cursor S without holding a device such as a remote controller. Because the cursor position analysis device 1 also detects the position of the point between the eyebrows (the viewpoint) of the operator U, it can display the cursor S on the extension of the fingertip as seen by the operator U wherever the operator U is located.

In addition, the cursor position analysis device 1 eliminates the need to perform a corrective calibration when the position of the operator U changes. Furthermore, because the cursor position analysis device 1 neither emits laser light nor requires the operator U to wear a detector or the like for position detection, it does not harm the operator U or cause the discomfort of wearing a detector.

The cursor position analysis device 1 also uses the eyebrow position setting means 17 to lock the detection of the point between the eyebrows, which is prone to false detection, while the fingertip continues to be detected, so that the cursor S can be moved stably. Furthermore, the cursor position analysis device 1 uses the prediction region setting unit 14d of the fingertip detection means 14 to track the fingertip position and set the prediction region, and detects the next fingertip position within that prediction region, which prevents false detection and allows the fingertip position to be detected stably.

In the description above, the distance image generation means 11 of the cursor position analysis device 1 converts the parallax values into distances from the display screen and generates a distance image in which each pixel is associated with its distance. However, it suffices for the distance image generation means 11 to generate a distance image that indicates the distance from the display screen to the subject corresponding to each pixel of the frame image. For example, a parallax image in which each pixel is associated with a parallax value representing that distance may be generated and used as the distance image, with the three-dimensional position calculation unit 18a of the cursor position analysis means 18 converting the parallax values into distances.

Also, in the description above, the fingertip detection means 14 sets a prediction region and detects the fingertip position from the distance image based on this prediction region and the mask region. However, the cursor position analysis device 1 of the present invention need not set a prediction region. In that case, the cursor position analysis device 1 does not include the prediction region setting unit 14d. The candidate block search unit 14a of the fingertip detection means 14 divides the non-mask region of the distance image into blocks and extracts a predetermined number of blocks with short distances, the nearest block determination unit 14b selects the block with the shortest distance from among the extracted blocks, and the fingertip determination unit 14c determines whether the selected block corresponds to the fingertip.

The cursor position analysis device 1 can also be realized on a computer by implementing each means as a functional program, and these functional programs can be combined and run as a cursor position analysis program.

[Operation of the cursor position analysis device]
Next, the operation of the cursor position analysis device 1 of the present invention will be described with reference to FIGS. 3 to 6 (and FIG. 2 as appropriate). First, with reference to FIG. 3, the operation in which the cursor position analysis device 1 receives video of the operator U from outside and displays the cursor S at the position on the display screen of the display device D indicated by the fingertip of the operator U shown in that video will be described. FIG. 3 is a flowchart showing the operation of the cursor position analysis device of the present invention.

The cursor position analysis device 1 receives from outside, via the image input means 10, frame images of the video of the operator U captured by the stereo camera C and, via the video input means 20, frame images of the video to be displayed on the display screen (step S10). Here, the stereo camera C has a plurality of cameras, and the image input means 10 receives a plurality of videos captured from different positions.

Next, through the distance image generation operation of the distance image generation means 11 described later, the cursor position analysis device 1 generates, from the frame images input from the image input means 10 in step S10, a distance image indicating the distance from the display screen of the subject captured in the frame images (step S11), and proceeds to step S19.

Also, after the frame images are input in step S10, the difference processing unit 12a of the background difference processing means 12 generates a difference image between the background image stored in the background image accumulation means 13 and the frame image input from the image input means 10 in step S10 (step S12). The mask image generation unit 12b of the background difference processing means 12 then generates, based on the difference image generated in step S12, a mask image indicating the region in which the operator U is not captured (step S13).
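The background subtraction of steps S12 and S13 can be sketched as follows. This is a minimal illustration, not the patented implementation: the use of an absolute per-pixel difference on grayscale values, the function names, and the threshold value are all assumptions made for the example. The rule that pixels whose difference is below a threshold are masked follows the wording of the claims.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def difference_image(background: Image, frame: Image) -> Image:
    """Per-pixel absolute difference between background and frame (cf. step S12)."""
    return [[abs(f - b) for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def mask_image(diff: Image, threshold: int) -> List[List[bool]]:
    """True marks a masked pixel, i.e. one that barely differs from the
    background and so is taken not to show the operator (cf. step S13)."""
    return [[d < threshold for d in row] for row in diff]


# Hypothetical 2x3 images: the operator occupies the lower-right pixels.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 12, 90], [11, 80, 85]]
diff = difference_image(background, frame)
mask = mask_image(diff, threshold=20)  # threshold is an assumed value
```

Only the unmasked pixels (`False` entries) are passed on to fingertip detection, which keeps static background objects near the screen from being mistaken for a fingertip.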

The background image selection unit 12c of the background difference processing means 12 then determines, based on the difference image generated in step S12, whether the frame image input from the image input means 10 in step S10 is an image of the background (step S14). If it is a background image (Yes in step S14), the background image selection unit 12c generates a background image based on the frame image input in step S10 and the background images already accumulated in the background image accumulation means 13, stores this frame image and the generated background image in the background image accumulation means 13 (step S15), and the process proceeds to step S16. If it is not a background image (No in step S14), the process proceeds directly to step S19.

Also, after the frame images are input in step S10, the face detection means 15 detects the image of the face of the operator U from the frame image input from the image input means 10 in step S10 (step S16). The ring filter unit 16a of the inter-eyebrow detection means 16 then extracts inter-eyebrow candidates using a ring filter (step S17). Further, the inter-eyebrow candidate point selection unit 16b of the inter-eyebrow detection means 16 selects one candidate from those extracted in step S17 and detects the inter-eyebrow position (step S18), and the process proceeds to step S19.

Then, through the fingertip detection operation of the fingertip detection means 14 described later, the cursor position analysis device 1 detects the fingertip position based on the distance image generated in step S11 and the mask image generated in step S13 (step S19), and proceeds to step S20. The fingertip determination unit 14c of the fingertip detection means 14 then determines whether a fingertip position has been detected (step S20). Specifically, the fingertip determination unit 14c calculates the average of the distances indicated by the distance image within the region other than the mask region indicated by the mask image generated in step S13, and determines whether the difference between this average and the distance indicated by the distance image at the fingertip position detected in step S19 is greater than a predetermined value.
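The decision of step S20 can be illustrated with a short sketch. It assumes the distance image and mask are plain nested lists, that the fingertip is expected to be closer to the screen than the average of the unmasked region (so the signed difference "average minus fingertip distance" is tested), and that `min_gap` stands in for the predetermined value; none of these details are fixed by the text.

```python
from typing import List, Tuple


def is_fingertip(distance: List[List[float]],
                 mask: List[List[bool]],
                 tip: Tuple[int, int],
                 min_gap: float) -> bool:
    """Return True when the candidate at `tip` (row, col) is sufficiently
    closer to the screen than the average of the unmasked region."""
    # Collect distances of all pixels outside the mask region.
    values = [d for drow, mrow in zip(distance, mask)
              for d, m in zip(drow, mrow) if not m]
    if not values:
        return False  # nothing but background: no fingertip
    mean = sum(values) / len(values)
    row, col = tip
    return mean - distance[row][col] > min_gap


# Hypothetical distance image in metres; nothing is masked.
distance = [[2.0, 2.0, 1.0], [2.0, 1.9, 2.1]]
mask = [[False] * 3, [False] * 3]
pointing = is_fingertip(distance, mask, tip=(0, 2), min_gap=0.5)
not_pointing = is_fingertip(distance, mask, tip=(0, 2), min_gap=1.0)
```

When the operator is not pointing, the nearest point of the body is close to the body's average distance, so the test fails and no cursor is drawn.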

If no fingertip position is detected (No in step S20), the cursor display means 19 outputs the frame image input from the video input means 20 in step S10 as it is, without adding the image of the cursor S (step S21), and the process returns to step S10 to repeat the operations from the input of the frame images of the video of the operator U and of the video to be displayed on the display screen.

If a fingertip position is detected (Yes in step S20), the prediction region setting unit 14d sets the prediction region for the distance image of the next frame based on the fingertip position detected in step S19 and the fingertip position detected for the previous frame (step S22). The cursor position analysis device 1 then sets the inter-eyebrow position through the inter-eyebrow position setting operation of the inter-eyebrow position setting means 17 described later (step S23).
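One way to picture step S22 is the sketch below. The text only states that the prediction region is set from the current and previous fingertip positions; the linear extrapolation, the square window, and the names `half_size`, `width`, and `height` are assumptions chosen for illustration.

```python
from typing import Tuple

Point = Tuple[int, int]            # (x, y) in pixels
Region = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def prediction_region(prev: Point, curr: Point, half_size: int,
                      width: int, height: int) -> Region:
    """Extrapolate the fingertip motion by one frame and return a square
    window around the predicted position, clamped to the image bounds."""
    # Assumed motion model: the fingertip keeps its last frame-to-frame velocity.
    px = curr[0] + (curr[0] - prev[0])
    py = curr[1] + (curr[1] - prev[1])
    left = max(0, px - half_size)
    top = max(0, py - half_size)
    right = min(width - 1, px + half_size)
    bottom = min(height - 1, py + half_size)
    return (left, top, right, bottom)


region = prediction_region(prev=(100, 80), curr=(110, 78),
                           half_size=16, width=320, height=240)
edge_region = prediction_region(prev=(10, 10), curr=(4, 4),
                                half_size=16, width=320, height=240)
```

Clamping keeps the window valid when the predicted position falls at or beyond the image border.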

Next, the three-dimensional position calculation unit 18a of the cursor position analysis means 18 calculates the three-dimensional positions of the fingertip and the inter-eyebrow point of the operator U based on the distance image generated in step S11, the fingertip position detected in step S19, and the inter-eyebrow position set in step S23 (step S24). The cursor position calculation unit 18b then calculates, based on the three-dimensional positions of the fingertip and the inter-eyebrow point calculated in step S24, the intersection of the straight line passing through these two points with the display screen, and takes the position of this intersection on the display screen as the cursor position (step S25).
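The geometry of step S25 is a line-plane intersection. The sketch below assumes a coordinate system, not stated in the text, in which the display screen lies in the plane z = 0 and z is the distance from the screen; the function name and the sample coordinates are hypothetical.

```python
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]


def cursor_position(eye: Point3D, fingertip: Point3D,
                    eps: float = 1e-9) -> Optional[Tuple[float, float]]:
    """Intersect the line through the inter-eyebrow point and the fingertip
    with the screen plane z = 0 and return the (x, y) of the intersection.
    Returns None when the line is parallel to the screen."""
    ex, ey, ez = eye
    fx, fy, fz = fingertip
    dz = ez - fz
    if abs(dz) < eps:
        return None  # both points at the same distance: no intersection
    # Points on the line: eye + t * (fingertip - eye); z becomes 0 at t = ez / dz.
    t = ez / dz
    return (ex + t * (fx - ex), ey + t * (fy - ey))


# Eye 4 units from the screen, fingertip 3 units from the screen.
cursor = cursor_position(eye=(0.0, 2.0, 4.0), fingertip=(1.0, 1.5, 3.0))
```

Because the line is anchored at the operator's own viewpoint, the result does not depend on where the operator stands, which is consistent with the calibration-free behavior described for the device.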

The cursor display means 19 then adds the image of the cursor S to the frame image input from the video input means 20 in step S10, at the cursor position calculated in step S25, and outputs the result (step S26). The process then returns to step S10 to repeat the operations from the input of the frame images of the video of the operator U and of the video to be displayed on the display screen.

Through the above operation, the cursor position analysis device 1 can, based on video of the operator U, display the cursor at the position on the display screen that lies on the extension of the fingertip as seen by the operator U.

(Distance image generation operation)
Next, with reference to FIG. 4 (and FIGS. 1 and 3 as appropriate), the distance image generation operation (step S11 in FIG. 3), in which the distance image generation means 11 of the cursor position analysis device 1 generates a distance image based on the frame images input from the image input means 10, will be described. FIG. 4 is a flowchart showing the distance image generation operation of the cursor position analysis device of the present invention.

First, the cursor position analysis device 1 sets, for the frame images input from the image input means 10 in step S10 of FIG. 3, a block centered on each pixel (step S30). The distance image generation means 11 then matches the block of the frame image of interest against the blocks of the other frame image, and takes the displacement of the center pixel of the best-matching block as the parallax value of that pixel (step S31).

The distance image generation means 11 then determines whether matching has been completed for all pixels of the frame image of interest (step S32). If it has not (No in step S32), the distance image generation means 11 sets the next pixel (step S33), and the process returns to step S31 to repeat the operations from the matching of the block of this pixel in the frame image of interest against the blocks of the other frame image.

If matching has been completed for all pixels (Yes in step S32), the distance image generation means 11 converts the parallax values calculated by matching in step S31 into distances to generate the distance image (step S34), and the operation ends.
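Steps S30 to S34 describe block matching followed by a parallax-to-distance conversion. The sketch below is an illustration under several assumptions not stated in the text: the two frame images are rectified so that matching is a horizontal search, the matching cost is the sum of absolute differences (SAD), and the conversion uses the standard stereo relation distance = focal length x baseline / parallax, which equals the distance from the display screen only if the cameras lie in the screen plane.

```python
from typing import List

Image = List[List[int]]


def sad(left: Image, right: Image, y: int, xl: int, xr: int, half: int) -> int:
    """Sum of absolute differences between the block centred on (xl, y) in
    `left` and the block centred on (xr, y) in `right`."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
    return total


def disparity_at(left: Image, right: Image, y: int, x: int,
                 half: int, max_disp: int) -> int:
    """Parallax of pixel (x, y): the shift of the best-matching block (cf. step S31)."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # block would leave the image
        cost = sad(left, right, y, x, xr, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def disparity_to_distance(d: int, focal_px: float, baseline: float) -> float:
    """Convert a parallax value to a distance (cf. step S34); zero parallax maps to infinity."""
    return focal_px * baseline / d if d > 0 else float("inf")


# Hypothetical 3x10 images: the bright pattern is shifted 2 pixels between views.
left = [[0, 0, 0, 0, 5, 9, 5, 0, 0, 0] for _ in range(3)]
right = [[0, 0, 5, 9, 5, 0, 0, 0, 0, 0] for _ in range(3)]
d = disparity_at(left, right, y=1, x=5, half=1, max_disp=3)
z = disparity_to_distance(d, focal_px=100.0, baseline=0.5)
```

Running `disparity_at` for every pixel corresponds to the loop of steps S31 to S33; a production system would normally use an optimized library routine rather than this per-pixel search.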

(Fingertip detection operation)
Next, with reference to FIG. 5 (and FIGS. 1 and 3 as appropriate), the fingertip detection operation (step S19 in FIG. 3), in which the candidate block search unit 14a and the nearest block determination unit 14b of the fingertip detection means 14 detect the fingertip position from the distance image generated by the distance image generation means 11, will be described. FIG. 5 is a flowchart showing the fingertip detection operation of the cursor position analysis device of the present invention.

First, the candidate block search unit 14a of the fingertip detection means 14 takes the distance image generated in step S11 of FIG. 3 and divides into blocks the non-mask region indicated by the mask image generated in step S13 of FIG. 3 and the prediction region set in step S22 of FIG. 3 (step S40). It then extracts a predetermined number of blocks for which the distance indicated by the distance image is short (step S41).

The nearest block determination unit 14b then determines whether the blocks extracted in step S41 are included in the prediction region (step S42). If they are (Yes in step S42), the nearest block determination unit 14b selects the block with the shortest distance among the blocks within the prediction region, takes the position of that block as the fingertip position (step S43), and the operation ends.

If they are not included in the prediction region (No in step S42), the nearest block determination unit 14b selects the block with the shortest distance among the blocks outside the prediction region, takes the position of that block as the fingertip position (step S44), and the operation ends.
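The selection logic of steps S41 to S44 can be sketched as below. The sketch assumes the blocks of the non-mask region have already been built and are given as hypothetical `(x, y, distance)` tuples, and that the prediction region is an inclusive rectangle; how a block's distance is computed from its pixels is left open by the text and is not modeled here.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, float]       # (x, y, distance from the screen)
Region = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def select_fingertip(blocks: List[Block], region: Region,
                     num_candidates: int) -> Optional[Tuple[int, int]]:
    """Pick the fingertip block, preferring candidates inside the prediction region."""
    # Step S41: keep the predetermined number of nearest blocks.
    candidates = sorted(blocks, key=lambda b: b[2])[:num_candidates]
    if not candidates:
        return None
    left, top, right, bottom = region
    # Step S42: which candidates fall inside the prediction region?
    inside = [b for b in candidates
              if left <= b[0] <= right and top <= b[1] <= bottom]
    # Step S43 (inside non-empty) or step S44 (all candidates lie outside).
    pool = inside if inside else candidates
    x, y, _ = min(pool, key=lambda b: b[2])
    return (x, y)


blocks = [(10, 10, 1.2), (50, 40, 0.9), (52, 44, 1.0),
          (200, 100, 0.7), (120, 90, 2.0)]
tracked = select_fingertip(blocks, region=(40, 30, 60, 50), num_candidates=3)
untracked = select_fingertip(blocks, region=(0, 0, 5, 5), num_candidates=3)
```

In the first call the globally nearest block at (200, 100) is ignored in favor of the nearest block inside the prediction region, which is how tracking suppresses a momentary false candidate.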

(Inter-eyebrow position setting operation)
Next, with reference to FIG. 6 (and FIGS. 1 and 3 as appropriate), the inter-eyebrow position setting operation (step S23 in FIG. 3), in which the inter-eyebrow position setting means 17 of the cursor position analysis device 1 sets the inter-eyebrow position, will be described. FIG. 6 is a flowchart showing the inter-eyebrow position setting operation of the cursor position analysis device of the present invention.

First, the inter-eyebrow position setting means 17 determines whether the detection start signal, which is generated when it is determined in step S20 of FIG. 3 that a fingertip has been detected, has been input for both the current frame and the immediately preceding frame (step S50). If it has (Yes in step S50), the inter-eyebrow position setting means 17 sets the inter-eyebrow position that was set for the preceding frame as the inter-eyebrow position of the current frame (step S52), and the operation ends.

If it has not (No in step S50), the inter-eyebrow position setting means 17 determines whether the inter-eyebrow position was detected in step S18 of FIG. 3 and information on that position has been input from the inter-eyebrow candidate point selection unit 16b of the inter-eyebrow detection means 16 (step S51). If it has not been input (No in step S51), the process proceeds directly to step S55. If it has been input (Yes in step S51), the inter-eyebrow position setting means 17 analyzes the amount of positional variation relative to the previously input inter-eyebrow position information and determines whether the variation has remained at or below a predetermined value for at least a predetermined time (step S53).

If the variation has remained at or below the predetermined value for at least the predetermined time (Yes in step S53), the inter-eyebrow position setting means 17 sets the inter-eyebrow position that was set for the preceding frame as the inter-eyebrow position of the current frame (step S52), and the operation ends. Otherwise (No in step S53), the inter-eyebrow position setting means 17 sets the inter-eyebrow position detected in step S18 of FIG. 3 as the inter-eyebrow position (step S54), and the operation ends.

If information on the inter-eyebrow position has not been input (No in step S51), the inter-eyebrow position setting means 17 determines whether an inter-eyebrow position has been detected previously (step S55). If one has (Yes in step S55), the inter-eyebrow position setting means 17 sets the most recently detected inter-eyebrow position as the inter-eyebrow position (step S56).

If none has been detected (No in step S55), the inter-eyebrow position setting means 17 determines whether it has been notified that no face was detected in step S16 of FIG. 3 (step S57).

If it has been notified (Yes in step S57), the inter-eyebrow position setting means 17 sets the center of the face region detected in step S16 of FIG. 3 as the inter-eyebrow position (step S58), and the operation ends. If it has not been notified (No in step S57), the inter-eyebrow position setting means 17 sets the center of the frame image as the inter-eyebrow position (step S59), and the operation ends.
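The fallback order of steps S50 to S59 can be summarized in one function. This is a sketch under assumptions: all parameter names are hypothetical, the stability test of step S53 is passed in as a precomputed flag, and the face-region center is assumed to be used only when a face was actually detected (`face_center` is `None` otherwise), which is one reading of the step S57 branch.

```python
from typing import Optional, Tuple

Point = Tuple[int, int]


def set_inter_eyebrow_position(*, tracking_now: bool, tracking_prev: bool,
                               prev_set: Optional[Point],
                               detected: Optional[Point], stable: bool,
                               last_detected: Optional[Point],
                               face_center: Optional[Point],
                               frame_center: Point) -> Optional[Point]:
    """Choose the inter-eyebrow position for the current frame."""
    # Step S50: fingertip detected in both this frame and the previous one,
    # so lock the position set for the previous frame (step S52).
    if tracking_now and tracking_prev:
        return prev_set
    # Step S51: a fresh detection is available for this frame.
    if detected is not None:
        # Step S53: small variation for long enough keeps the previous setting
        # (step S52); otherwise the new detection is adopted (step S54).
        return prev_set if stable else detected
    # Steps S55/S56: fall back to the most recent detection, if any.
    if last_detected is not None:
        return last_detected
    # Steps S57 to S59: face-region center if available, else frame center.
    return face_center if face_center is not None else frame_center


locked = set_inter_eyebrow_position(
    tracking_now=True, tracking_prev=True, prev_set=(5, 5), detected=(7, 7),
    stable=False, last_detected=(7, 7), face_center=(9, 9), frame_center=(160, 120))
fresh = set_inter_eyebrow_position(
    tracking_now=True, tracking_prev=False, prev_set=(5, 5), detected=(7, 7),
    stable=False, last_detected=(7, 7), face_center=(9, 9), frame_center=(160, 120))
fallback = set_inter_eyebrow_position(
    tracking_now=False, tracking_prev=False, prev_set=None, detected=None,
    stable=False, last_detected=None, face_center=None, frame_center=(160, 120))
```

The first branch is what keeps the cursor steady while pointing continues: a new, possibly erroneous detection is ignored as long as the fingertip is being tracked.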

An explanatory diagram for describing an overview of the cursor position analysis device according to an embodiment of the present invention.
A block diagram showing the configuration of the cursor position analysis device according to an embodiment of the present invention.
A flowchart showing the operation of the cursor position analysis device of the present invention.
A flowchart showing the distance image generation operation of the cursor position analysis device of the present invention.
A flowchart showing the fingertip detection operation of the cursor position analysis device of the present invention.
A flowchart showing the inter-eyebrow position setting operation of the cursor position analysis device of the present invention.

Explanation of symbols

1 Cursor position analysis device (display screen position analysis device)
10 Image input means
11 Distance image generation means
12 Background difference processing means
13 Background image accumulation means (background image storage means)
14 Fingertip detection means
14c Fingertip determination unit (fingertip determination means)
14d Prediction region setting unit (prediction region setting means)
16 Inter-eyebrow detection means (viewpoint detection means)
18 Cursor position analysis means (on-screen position analysis means)

Claims

A display screen position analysis device that receives video of a subject including an operator, captured by at least two cameras, and analyzes, based on the frame images constituting the video, the position on the display screen of a display device indicated by the fingertip of the operator, the device comprising:
image input means for inputting the frame images in time series;
background image storage means for storing a background image, which is an image of the background of the operator captured by the cameras;
distance image generation means for generating, based on the parallax of the frame images input from the image input means, a distance image, which is an image indicating the distance of the captured subject from the display screen;
background difference processing means having a difference processing unit that generates a difference image between the background image stored in the background image storage means and the frame image input from the image input means, a mask image generation unit that masks pixels whose difference indicated by the difference image is smaller than a threshold and thereby generates a mask image indicating a mask region of the frame image in which the operator is not captured, and a background image selection unit that, when the amount of change of the difference indicated by the difference image is equal to or less than a predetermined amount, determines that the frame image is the background image and stores it in the background image storage means;
fingertip detection means for detecting, based on the distance image generated by the distance image generation means, a fingertip position, which is the position in the frame image of the portion having the shortest distance within a prediction region of a predetermined size;
viewpoint detection means for detecting, from the frame image input from the image input means, a viewpoint position, which is the position between the eyes of the operator in the frame image; and
on-screen position analysis means for analyzing the position on the display screen indicated by the fingertip, based on the fingertip position detected by the fingertip detection means, the viewpoint position detected by the viewpoint detection means, and the distances indicated by the distance image at the fingertip position and the viewpoint position,
wherein
the fingertip detection means has:
prediction region setting means for calculating, based on the fingertip position detected for a frame image input before the frame image, a predicted position of the fingertip in the distance image, and setting, in the distance image, the prediction region including the predicted position; and
fingertip determination means for detecting the fingertip position from a non-mask region, which is the region excluding the mask region indicated by the mask image generated by the mask image generation unit, and determining, based on the distance at the fingertip position and the average value of the distance in the non-mask region, whether the fingertip position indicates the position of the fingertip, and
the on-screen position analysis means analyzes the position on the display screen indicated by the fingertip when the fingertip detection means determines that the fingertip position is a position corresponding to the fingertip.

A display screen position analysis program for receiving video of a subject including an operator, captured by at least two cameras, and analyzing, based on the frame images constituting the video, the position on the display screen of a display device indicated by the fingertip of the operator, the program causing a computer provided with background image storage means for storing a background image, which is an image of the background of the operator captured by the cameras, to function as:
image input means for inputting the frame images in time series;
distance image generation means for generating, based on the parallax of the frame images input from the image input means, a distance image, which is an image indicating the distance of the captured subject from the display screen;
background difference processing means having a difference processing unit that generates a difference image between the background image stored in the background image storage means and the frame image input from the image input means, a mask image generation unit that masks pixels whose difference indicated by the difference image is smaller than a threshold and thereby generates a mask image indicating a mask region of the frame image in which the operator is not captured, and a background image selection unit that, when the amount of change of the difference indicated by the difference image is equal to or less than a predetermined amount, determines that the frame image is the background image and stores it in the background image storage means;
fingertip detection means for detecting, based on the distance image generated by the distance image generation means, a fingertip position, which is the position in the frame image of the portion having the shortest distance within a prediction region of a predetermined size;
viewpoint detection means for detecting, from the frame image input from the image input means, a viewpoint position, which is the position between the eyes of the operator in the frame image; and
on-screen position analysis means for analyzing the position on the display screen indicated by the fingertip, based on the fingertip position detected by the fingertip detection means, the viewpoint position detected by the viewpoint detection means, and the distances indicated by the distance image at the fingertip position and the viewpoint position,
wherein the fingertip detection means has:
prediction region setting means for calculating, based on the fingertip position detected for a frame image input before the frame image, a predicted position of the fingertip in the distance image, and setting, in the distance image, the prediction region including the predicted position; and
fingertip determination means for detecting the fingertip position from a non-mask region, which is the region excluding the mask region indicated by the mask image generated by the mask image generation unit, and determining, based on the distance at the fingertip position and the average value of the distance in the non-mask region, whether the fingertip position indicates the position of the fingertip, and
the on-screen position analysis means analyzes the position on the display screen indicated by the fingertip when the fingertip detection means determines that the fingertip position is a position corresponding to the fingertip.