JP2007148663A - Object-tracking device, object-tracking method, and program

Info

Publication number: JP2007148663A
Application number: JP2005340788A
Authority: JP
Inventors: Ikoku Go; 偉国呉
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-11-25
Filing date: 2005-11-25
Publication date: 2007-06-14
Anticipated expiration: 2025-11-25
Also published as: JP4682820B2

Abstract

PROBLEM TO BE SOLVED: To provide an object-tracking device, an object-tracking method, and a program that track an object at high speed and with high precision.

SOLUTION: A hand-area position forecasting part 43 forecasts a plurality of positions of a hand area in a current-time image based on the position and moving speed of the hand area in a previous-time image. A forecasted-area evaluation part 44 evaluates the forecast positions based on the current-time image and a preliminarily registered hand-area initial model. A hand-area tracking result estimating part 45 then tracks the object based on each of the evaluated positions.

COPYRIGHT: (C)2007, JPO&INPIT

Description

本発明は、人と機器との間のインタフェースに関し、入力画像から、例えば、人の手の領域といったオブジェクトを検出し、そのオブジェクトを追跡するオブジェクト追跡装置及びオブジェクト追跡方法、並びにオブジェクトを追跡する処理を実行させるプログラムに関するものである。 The present invention relates to an interface between a person and a device. More specifically, it relates to an object tracking device and an object tracking method that detect an object, such as a human hand region, from an input image and track that object, and to a program that causes the object tracking process to be executed.

人と機器との間の自然なインタフェースを実現するには、人の手振り・身振りを含めて、操作者の様々な行動や音声などの情報を統合的に解析し、人の存在や意図などを検出し認識することが不可欠である。そこで、従来から人の行動解析に関する様々な技術が検討されてきた。これらを大別すると、人の顔に着目したもの（例えば、顔の向き、表情、視線、唇の動きなど）と、人の手足の動き（手振り、指差し、ジェスチャ、動きイベント、歩行パターンなど）に着目したものに分けることができる。その中でも一般環境での手領域の検出・認識は最も重要な技術の一つであり、数多く検討されている。例えば、人の手を左右に動かす往復運動に着目した手振り検出方法（例えば、非特許文献１参照。）、データグローブなどの特殊なものを取り付ける手法（例えば、非特許文献２参照。）、人の肌色情報などを用いる手法（例えば、非特許文献３及び非特許文献４参照。）などが提案されている。 To realize a natural interface between people and devices, it is essential to analyze information such as the operator's various actions and voice in an integrated manner, including hand and body gestures, and to detect and recognize a person's presence and intention. Various techniques for analyzing human behavior have therefore been studied. They can be broadly divided into those that focus on the human face (for example, face orientation, facial expression, line of sight, and lip movement) and those that focus on the movement of human limbs (for example, hand waving, pointing, gestures, motion events, and walking patterns). Among them, detection and recognition of the hand region in a general environment is one of the most important technologies and has been studied extensively. Proposed approaches include a hand-waving detection method that focuses on the reciprocating motion of a hand moved left and right (see, for example, Non-Patent Document 1), a method in which special equipment such as a data glove is worn (see, for example, Non-Patent Document 2), and methods that use human skin color information (see, for example, Non-Patent Documents 3 and 4).

入江、梅田: 濃淡値の時系列変化を利用した画像からの手振りの検出、 P2-8, 画像の認識・理解シンポジウム（MIRU'02）（名工大（名古屋市），7 月30日-8 月1 日，2002） Irie, Umeda: Detection of hand waving from images using time-series changes in gray values, P2-8, Symposium on Image Recognition and Understanding (MIRU'02) (Nagoya Institute of Technology, Nagoya City, July 30 to August 1, 2002)

星野: 非言語コミュニケーションが可能なロボットハンドの設計、文科省科研費特定領域研究「IT の深化の基盤を拓く情報学研究」研究項目A03「人間の情報処理の理解とその応用に関する研究」第4 回柱研究会（芝蘭会館（京都市），7 月2 日，2003） Hoshino: Design of a robot hand capable of non-verbal communication, Ministry of Education, Culture, Sports, Science and Technology Grant-in-Aid for Scientific Research on Priority Areas "Informatics research that pioneers the foundation of IT deepening", Research item A03 "Research on the understanding of human information processing and its application", 4th Pillar Research Meeting (Shiran Kaikan, Kyoto City, July 2, 2003)

伊藤、尾関、中村、大田: 映像インデキシングのための手と把持物体のロバストな認識と追跡、画像センシングシンポジウム、 pp.143-162, (2003) Ito, Ozeki, Nakamura, Ota: Robust recognition and tracking of hands and grasped objects for video indexing, Symposium on Image Sensing, pp. 143-162 (2003)

入江、梅田：ジェスチャ認識に基づくインテリジェントルームの構築、第4 回動画像処理実利用化ワークショップ、平成１5 年3 月6 日−7 日 Irie, Umeda: Construction of an intelligent room based on gesture recognition, 4th Workshop on Practical Application of Moving Image Processing, March 6-7, 2003

ところで、ジェスチャなどの人の行動は、人の手足といったオブジェクトの動きを時間的に追わなければ把握することができない。つまり、上記特許文献１〜４に記載された手領域検出は、時間的に継続して行ってやらなければならない。 By the way, a person's action such as a gesture cannot be grasped unless the movement of an object such as a person's limb is followed in time. That is, the hand region detection described in Patent Documents 1 to 4 must be performed continuously in time.

しかしながら、上記特許文献１〜４に記載された手領域検出を入力画像毎に行った場合、装置の処理負担が大きくなってしまい、手領域を追跡することが困難となる。 However, when the hand area detection described in Patent Documents 1 to 4 is performed for each input image, the processing burden on the apparatus increases, and it becomes difficult to track the hand area.

本発明は、上記問題点を解消するためになされたものであり、高速且つ、高精度にオブジェクトを追跡することができるオブジェクト追跡装置及びオブジェクト追跡方法、並びにプログラムを提供することを目的とする。 The present invention has been made to solve the above problems, and an object of the present invention is to provide an object tracking device, an object tracking method, and a program capable of tracking an object at high speed and with high accuracy.

上述した課題を解決するために、本発明に係るオブジェクト追跡装置は、画像から追跡対象のオブジェクトを追跡するオブジェクト追跡装置において、上記オブジェクトのオブジェクト位置を推定するための推測オブジェクトに対し、前時刻画像の推測オブジェクトの情報から推測オブジェクトが移動する予測範囲を算出し、当該予測範囲、前時刻画像の推測オブジェクトの推測オブジェクト位置及び動き速度に基づいて現時刻画像の推測オブジェクト位置を予測する位置予測手段と、上記位置予測手段で予測された複数の推測オブジェクト位置を、当該推測オブジェクトの現時刻画像と予め登録されたオブジェクト画像とのオブジェクト相似性に基づいて評価する位置評価手段と、上記位置評価手段で評価された各推測オブジェクト位置に基づいて現時刻画像のオブジェクト位置を推定する推定手段とを備えることにより、上述の課題を解決する。 To solve the problems described above, an object tracking device according to the present invention tracks an object to be tracked from an image and comprises: position prediction means that, for estimated objects used to estimate the object position of the object, calculates a prediction range over which an estimated object moves from information on the estimated object in the previous-time image, and predicts estimated object positions in the current-time image based on the prediction range and on the estimated object position and motion speed of the estimated object in the previous-time image; position evaluation means that evaluates the plurality of estimated object positions predicted by the position prediction means based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and estimation means that estimates the object position in the current-time image based on each estimated object position evaluated by the position evaluation means.

また、本発明に係るオブジェクト追跡方法は、画像から追跡対象のオブジェクトを追跡するオブジェクト追跡方法において、上記オブジェクトのオブジェクト位置を推定するための推測オブジェクトに対し、前時刻画像の推測オブジェクトの情報から推測オブジェクトが移動する予測範囲を算出し、当該予測範囲、前時刻画像の推測オブジェクトの推測オブジェクト位置及び動き速度に基づいて現時刻画像の推測オブジェクト位置を予測する位置予測工程と、上記位置予測工程で予測された複数の推測オブジェクト位置を、当該推測オブジェクトの現時刻画像と予め登録されたオブジェクト画像とのオブジェクト相似性に基づいて評価する位置評価工程と、上記位置評価工程で評価された各推測オブジェクト位置に基づいて現時刻画像のオブジェクト位置を推定する推定工程とを有することにより、上述の課題を解決する。 Likewise, an object tracking method according to the present invention tracks an object to be tracked from an image and comprises: a position prediction step of calculating, for estimated objects used to estimate the object position of the object, a prediction range over which an estimated object moves from information on the estimated object in the previous-time image, and predicting estimated object positions in the current-time image based on the prediction range and on the estimated object position and motion speed of the estimated object in the previous-time image; a position evaluation step of evaluating the plurality of estimated object positions predicted in the position prediction step based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and an estimation step of estimating the object position in the current-time image based on each estimated object position evaluated in the position evaluation step.

また、本発明に係るプログラムは、画像から追跡対象のオブジェクトを追跡する処理を実行させるプログラムにおいて、上記オブジェクトのオブジェクト位置を推定するための推測オブジェクトに対し、前時刻画像の推測オブジェクトの情報から推測オブジェクトが移動する予測範囲を算出し、当該予測範囲、前時刻画像の推測オブジェクトの推測オブジェクト位置及び動き速度に基づいて現時刻画像の推測オブジェクト位置を予測する位置予測工程と、上記位置予測工程で予測された複数の推測オブジェクト位置を、当該推測オブジェクトの現時刻画像と予め登録されたオブジェクト画像とのオブジェクト相似性に基づいて評価する位置評価工程と、上記位置評価工程で評価された各推測オブジェクト位置に基づいて現時刻画像のオブジェクト位置を推定する推定工程とを有することにより、上述の課題を解決する。 Likewise, a program according to the present invention causes a process of tracking an object to be tracked from an image to be executed, the process comprising: a position prediction step of calculating, for estimated objects used to estimate the object position of the object, a prediction range over which an estimated object moves from information on the estimated object in the previous-time image, and predicting estimated object positions in the current-time image based on the prediction range and on the estimated object position and motion speed of the estimated object in the previous-time image; a position evaluation step of evaluating the plurality of estimated object positions predicted in the position prediction step based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and an estimation step of estimating the object position in the current-time image based on each estimated object position evaluated in the position evaluation step.

本発明によれば、追跡対象のオブジェクトのオブジェクト位置を推定するための推測オブジェクトに対し、前時刻画像の推測オブジェクトの情報から推測オブジェクトが移動する予測範囲を算出し、当該予測範囲、前時刻画像の推測オブジェクトの推測オブジェクト位置及び動き速度に基づいて現時刻画像の推測オブジェクト位置を予測し、予測された複数の推測オブジェクト位置を、当該推測オブジェクトの現時刻画像と予め登録されたオブジェクト画像とのオブジェクト相似性に基づいて評価し、評価された各推測オブジェクト位置に基づいて現時刻画像のオブジェクト位置を推定することにより、高速且つ、高精度にオブジェクトを追跡することができる。 According to the present invention, for estimated objects used to estimate the object position of the object to be tracked, a prediction range over which an estimated object moves is calculated from information on the estimated object in the previous-time image; estimated object positions in the current-time image are predicted based on the prediction range and on the estimated object position and motion speed of the estimated object in the previous-time image; the plurality of predicted estimated object positions are evaluated based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and the object position in the current-time image is estimated based on each evaluated estimated object position. The object can thereby be tracked at high speed and with high accuracy.

以下、本発明の具体的な実施の形態について、図面を参照しながら詳細に説明する。本発明の具体例として示すオブジェクト追跡装置は、画像内のオブジェクトとして手領域を検出し、この手領域の前時刻の情報から手領域が移動する予測範囲を算出し、算出された予測範囲、前時刻画像の手領域の位置及び動き速度に基づいて現時刻画像の推測手領域位置を予測し、この手領域予測位置を、現時刻画像の分布特性と予め登録された手領域画像に基づいて評価することにより、現時刻の手領域を決定するものである。 Hereinafter, specific embodiments of the present invention will be described in detail with reference to the drawings. The object tracking device shown as a specific example detects a hand region as the object in an image, calculates a prediction range over which the hand region moves from the previous-time information on the hand region, predicts estimated hand region positions in the current-time image based on the calculated prediction range and on the position and motion speed of the hand region in the previous-time image, and determines the hand region at the current time by evaluating these predicted positions based on the distribution characteristics of the current-time image and a pre-registered hand region image.

図１は、手領域を追跡する手領域追跡装置の全体的な処理を示す図である。撮像手段１０１で撮像された現時刻画像は、画像取り込み手段１０２により取り込まれる。画像取り込み手段１０２により取り込まれた現時刻画像は、後述するように、手領域追跡処理や手領域検出処理に用いられる。モード判別手段１０３は、追跡モードであるか否かを判別する。 FIG. 1 is a diagram illustrating an overall process of a hand region tracking device that tracks a hand region. The current time image captured by the imaging unit 101 is captured by the image capturing unit 102. The current time image captured by the image capturing unit 102 is used for hand region tracking processing and hand region detection processing, as will be described later. The mode determination unit 103 determines whether or not the tracking mode is set.

追跡モードでは、追跡結果読込手段１０４により前時刻（ｔ−１）の手領域追跡結果が読み込まれるとともに、予め登録された手領域モデル１０５が読み込まれる。そして、追跡処理手段１０６により手領域の追跡が行われ、追跡結果保存手段１０７により現時刻（ｔ）の追跡結果が保存される。現時刻（ｔ）の追跡結果保存後、モード判別手段１０３にて次のモードが判別される。 In the tracking mode, the tracking result reading means 104 reads the hand region tracking result at the previous time (t−1) and also reads the hand region model 105 registered in advance. Then, tracking of the hand region is performed by the tracking processing unit 106, and the tracking result at the current time (t) is stored by the tracking result storing unit 107. After saving the tracking result at the current time (t), the mode determination unit 103 determines the next mode.

追跡処理手段１０６は、後述するように、前時刻画像の手領域情報から手領域が移動する予測範囲を算出し、算出された予測範囲、前時刻画像の手領域の位置及び動き速度に基づいて現時刻画像の推測手領域位置を複数予測する位置予測手段と、位置予測手段により予測された複数の推測手領域位置を、現時刻画像と予め登録された手領域モデル画像との相似性に基づいて評価する位置評価手段と、評価手段により評価された各推測手領域位置に基づいて手領域を追跡する追跡手段とを有している。 As will be described later, the tracking processing unit 106 has position prediction means that calculates a prediction range over which the hand region moves from the hand region information of the previous-time image and predicts a plurality of estimated hand region positions in the current-time image based on the calculated prediction range and on the position and motion speed of the hand region in the previous-time image; position evaluation means that evaluates the plurality of estimated hand region positions predicted by the position prediction means based on the similarity between the current-time image and a pre-registered hand region model image; and tracking means that tracks the hand region based on each estimated hand region position evaluated by the position evaluation means.

ここで、位置予測手段は、手領域が移動する予測範囲を、前時刻画像の手領域の大きさ及び動き速度に基づいて算出することが好ましい。また、位置予測手段は、前時刻画像の手領域位置を、前時刻画像の相似性の評価による確率に基づいて取得することが好ましい。 Here, it is preferable that the position predicting unit calculates a prediction range in which the hand region moves based on the size and the movement speed of the hand region of the previous time image. Moreover, it is preferable that a position prediction means acquires the hand region position of a previous time image based on the probability by evaluation of the similarity of a previous time image.

また、位置評価手段は、予測された各推測手領域位置を、現時刻画像の分布特性と予め登録された手領域画像の分布特性との相似性に基づいて評価することが好ましい。また、位置評価手段は、現時刻画像の分布特性と予め登録された手領域モデル画像の分布特性との相似性を、位置予測手段で予測された推測手領域位置の中心からの距離に基づいて評価することが好ましい。また、位置評価手段は、現時刻画像の分布特性と予め登録された手領域モデル画像の分布特性との相似性を、予め定められたマスク領域に基づいて評価することが好ましい。 Further, it is preferable that the position evaluation means evaluate each estimated hand region position based on the similarity between the distribution characteristics of the current time image and the distribution characteristics of the hand region image registered in advance. Further, the position evaluation means determines the similarity between the distribution characteristics of the current time image and the distribution characteristics of the hand area model image registered in advance based on the distance from the center of the estimated hand area position predicted by the position prediction means. It is preferable to evaluate. Further, the position evaluation means preferably evaluates the similarity between the distribution characteristics of the current time image and the distribution characteristics of the hand region model image registered in advance based on a predetermined mask region.

また、追跡手段は、位置評価手段で評価された推測手領域位置が手領域である確率を、この推測手領域位置の評価結果に基づいて算出し、算出された各推測手領域位置の確率に基づいて手領域を追跡することが好ましい。 The tracking means calculates the probability that the estimated hand area position evaluated by the position evaluation means is a hand area based on the evaluation result of the estimated hand area position, and calculates the probability of each estimated hand area position. It is preferable to track the hand region based on it.
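The predict, evaluate, and estimate cycle described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: the function names, the uniform sampling, and the formula for the prediction range (half the region size plus the speed) are all hypothetical, and `similarity` stands in for whatever comparison against the registered hand region model is used.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def predict_positions(prev: Point, velocity: Point, size: float,
                      n: int, rng: random.Random) -> List[Point]:
    """Draw n candidate positions around prev + velocity.

    The spread (prediction range) grows with region size and speed.
    The specific formula is an assumption made for illustration.
    """
    speed = (velocity[0] ** 2 + velocity[1] ** 2) ** 0.5
    spread = 0.5 * size + speed
    cx, cy = prev[0] + velocity[0], prev[1] + velocity[1]
    return [(cx + rng.uniform(-spread, spread),
             cy + rng.uniform(-spread, spread)) for _ in range(n)]


def estimate_position(candidates: List[Point],
                      similarity: Callable[[Point], float]) -> Point:
    """Weight each candidate by its similarity score and return the mean."""
    scores = [max(similarity(c), 0.0) for c in candidates]
    total = sum(scores)
    if total == 0.0:
        # No candidate resembles the model: fall back to the plain mean.
        weights = [1.0 / len(candidates)] * len(candidates)
    else:
        # Normalised scores play the role of per-candidate probabilities.
        weights = [s / total for s in scores]
    x = sum(w * c[0] for w, c in zip(weights, candidates))
    y = sum(w * c[1] for w, c in zip(weights, candidates))
    return (x, y)
```

Normalising the scores so that they sum to one is one simple way to read "the probability that an estimated hand region position is the hand region"; other weightings are equally compatible with the text.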

また、追跡処理手段１０６は、所定の位置関係を有する複数のカメラからの画像に基づいて各画素の３次元空間内の位置を推定した距離情報を取得する距離情報取得手段と、上記距離情報取得手段で取得された距離情報に基づいて画像からオブジェクトの背景を除去する背景除去手段と有していることが好ましい。 Further, the tracking processing unit 106 preferably has distance information acquisition means that acquires distance information in which the position of each pixel in three-dimensional space is estimated from images from a plurality of cameras having a predetermined positional relationship, and background removal means that removes the background of the object from the image based on the distance information acquired by the distance information acquisition means.
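One simple way to remove the background from distance information is to discard every pixel farther away than a cutoff. The Python sketch below assumes that a per-pixel depth map is already available (how it is obtained from the cameras is outside the sketch); `remove_background`, `max_depth`, and `fill` are hypothetical names, and the fixed cutoff is an assumption rather than something the text specifies.

```python
from typing import List, Optional


def remove_background(image: List[List[int]],
                      depth: List[List[Optional[float]]],
                      max_depth: float,
                      fill: int = 0) -> List[List[int]]:
    """Keep pixels nearer than max_depth and replace all others with fill.

    A depth of None marks a pixel whose distance could not be estimated;
    such pixels are treated as background.
    """
    out = []
    for img_row, depth_row in zip(image, depth):
        out.append([p if (d is not None and d <= max_depth) else fill
                    for p, d in zip(img_row, depth_row)])
    return out
```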

一方、モード判別手段１０３において、追跡モードでないと判別された場合、手領域検出手段１０８は、現時刻（ｔ）の画像から手領域を検出する処理を行う。手領域検出判別手段１０９は、手領域検出手段１０８にて手領域が検出されたか否かを判別する。手領域が検出された場合、手領域モデル１０５を手領域登録手段１１０に登録し、モード切替手段１１１により追跡モードに切り替えられる。手領域が検出されなかった場合、モード判別手段１０３にて次のモードが判別される。 On the other hand, when the mode determining unit 103 determines that the mode is not the tracking mode, the hand region detecting unit 108 performs processing for detecting the hand region from the image at the current time (t). The hand region detection determination unit 109 determines whether or not the hand region detection unit 108 has detected a hand region. When the hand region is detected, the hand region model 105 is registered in the hand region registration unit 110 and switched to the tracking mode by the mode switching unit 111. When the hand region is not detected, the mode determination unit 103 determines the next mode.
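The switching between detection mode and tracking mode described above can be sketched as a per-frame step function. This is a minimal Python sketch: `run_step`, `detect`, and `track` are hypothetical names, and representing a hand region as an (x, y, width, height) tuple is an assumption made only for illustration.

```python
from typing import Callable, Optional, Tuple

Region = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) box


def run_step(tracking: bool,
             frame: object,
             previous: Optional[Region],
             detect: Callable[[object], Optional[Region]],
             track: Callable[[object, Region], Region]
             ) -> Tuple[bool, Optional[Region]]:
    """Process one frame and return (next_mode_is_tracking, hand_region)."""
    if tracking and previous is not None:
        # Tracking mode: update the previous result using the current frame.
        return True, track(frame, previous)
    # Detection mode: search the current frame for a hand region.
    found = detect(frame)
    # Switch to tracking mode only when a hand region was actually detected.
    return found is not None, found
```

Calling `run_step` once per captured frame and feeding its outputs back in as `tracking` and `previous` reproduces the loop: detection repeats until a hand is found, after which each frame is handled by the tracker.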

続いて、モード判別手段１０３において追跡モードでないと判別された場合について説明する。 Next, a case where the mode determining unit 103 determines that the mode is not the tracking mode will be described.

図２は、手領域を検出する手領域検出手段１０８に対応した手領域検出装置の構成を示すブロック図である。この手領域検出装置１０は、撮像部１１と、撮像部１１で撮像された画像を取り込む画像取込部１２と、画像取込部１２で取り込んだ画像から肌色の候補領域を選定する肌色候補領域選定部１３と、肌色候補選定部１３で選定された肌色の候補領域から手候補領域を抽出する手候補領域抽出部１４と、手候補領域抽出部１４で抽出された手候補領域の形状の複雑度を算出する形状複雑度算出部１５と、形状複雑度算出部１５で算出された複雑度に基づいて手領域を検出する手領域検出部１６とを備えて構成される。 FIG. 2 is a block diagram showing a configuration of a hand region detection device corresponding to the hand region detection means 108 for detecting the hand region. The hand region detection device 10 includes an imaging unit 11, an image capturing unit 12 that captures an image captured by the imaging unit 11, and a skin color candidate region that selects a skin color candidate region from the image captured by the image capturing unit 12. The selection unit 13, the hand candidate region extraction unit 14 that extracts a hand candidate region from the skin color candidate region selected by the skin color candidate selection unit 13, and the complexity of the shape of the hand candidate region extracted by the hand candidate region extraction unit 14 A shape complexity calculating unit 15 for calculating the degree and a hand region detecting unit 16 for detecting a hand region based on the complexity calculated by the shape complexity calculating unit 15 are configured.

撮像部１１は、ＣＣＤ（Charge Coupled Device）等の撮像素子を有するカメラで構成されている。 The imaging unit 11 includes a camera having an imaging element such as a CCD (Charge Coupled Device).

画像取込部１２は、撮像部１１からＪＰＥＧ（Joint Photographic Experts Group）等のフォーマットの画像を取り込む。 The image capturing unit 12 captures an image in a format such as JPEG (Joint Photographic Experts Group) from the imaging unit 11.

肌色候補領域選定部１３は、人間の統計的肌色特徴に基づいた肌色モデルを用いて、画像内の各画素が肌色モデルの色であるかどうかを判定することによって、肌色候補領域を選定する。この肌色モデルは、色相、彩度、ＲＧＢ情報等を用いた統計的な処理により得られる。 The skin color candidate region selection unit 13 selects a skin color candidate region by determining whether each pixel in the image is the color of the skin color model using a skin color model based on human statistical skin color characteristics. This skin color model is obtained by statistical processing using hue, saturation, RGB information, and the like.

手候補領域抽出部１４は、肌色候補領域選定部１３で選定された各領域に対して、例えば、色相、彩度の分布等の画像分布特性を計算し、それらの領域内に例えば色相と彩度等の色特徴が類似する画素が最も大きい領域を手候補領域として抽出する。 For each region selected by the skin color candidate region selection unit 13, the hand candidate region extraction unit 14 calculates image distribution characteristics such as the hue and saturation distributions, and extracts, as a hand candidate region, the region within it that contains the largest number of pixels with similar color features such as hue and saturation.

形状複雑度算出部１５は、手候補領域抽出部１４で抽出された各手候補領域に対して、例えば、面積（画素数）と周辺長との比や手候補領域の中心からの距離に基づいて形状複雑度を算出する。 For each hand candidate region extracted by the hand candidate region extraction unit 14, the shape complexity calculation unit 15 calculates the shape complexity based on, for example, the ratio of the area (number of pixels) to the perimeter, or the distance from the center of the hand candidate region.

手領域検出部１６は、形状複雑度算出部１５で算出された各手候補領域の複雑度を、例えば、予め設定された閾値と比較評価することにより手領域を検出する。 The hand region detection unit 16 detects the hand region by comparing and evaluating the complexity of each hand candidate region calculated by the shape complexity calculation unit 15 with, for example, a preset threshold value.

なお、上述の手領域検出装置１０の構成では、撮像部１１から画像を取得することとしたが、ＤＶＤ（Digital Versatile Disc）、インターネット等の様々な媒体から画像を取得するようにしてもよい。 In the configuration of the hand region detection device 10 described above, an image is acquired from the imaging unit 11, but an image may be acquired from various media such as a DVD (Digital Versatile Disc) and the Internet.

続いて、入力画像より手領域を検出するための処理について図面を参照して説明する。図３〜図６は、ＲＧＢ情報を用いて入力画像から肌色候補を選定する際の画像例を示す図である。 Next, processing for detecting a hand region from an input image will be described with reference to the drawings. 3 to 6 are diagrams illustrating image examples when selecting skin color candidates from an input image using RGB information.

肌色候補領域選定部１３は、画像取込部１２により取り込んだ、例えば、図３に示す入力画像に対して、文献（M. J. Jones and J. M. Rehg: Statistical color models with application to skin detection, Proc. IEEE Conf.Computer Vision and Pattern Recognition, vol. 2, pp. 274-280, 1999.）に記載された各画素のＲＧＢ情報を用いることにより、図４のように肌色モデル色を検出する。そして、各画素が肌色モデル色であるかどうかを判定することにより、図５に示すように肌色候補領域Ａ〜Ｄを選定する。これにより、図３に示す入力画像に対して図６に示すような肌色候補領域Ａ〜Ｄを決定することができる。なお、ＲＧＢ情報に限らず、例えば、学習により得た肌色の統計的な情報（先見的知識）を用いて、各画素が肌色であるかどうかを判定し、画像のフィルタリング処理を用いた領域統合処理によって、画像内の肌色の候補領域を選定してもよい。 For an input image captured by the image capturing unit 12, such as the one shown in FIG. 3, the skin color candidate region selection unit 13 detects skin color model colors as shown in FIG. 4 by using the RGB information of each pixel as described in the literature (M. J. Jones and J. M. Rehg: Statistical color models with application to skin detection, Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 274-280, 1999). By determining whether each pixel is a skin color model color, it selects skin color candidate regions A to D as shown in FIG. 5. In this way, skin color candidate regions A to D as shown in FIG. 6 can be determined for the input image shown in FIG. 3. The selection is not limited to RGB information: for example, statistical information on skin color obtained by learning (a priori knowledge) may be used to determine whether each pixel is skin colored, and skin color candidate regions in the image may then be selected by region integration using image filtering.
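A per-pixel skin test based on RGB statistics can be sketched as a comparison of two color histograms, in the spirit of the histogram models in the cited paper. The Python sketch below is only an illustration: the function names, the 8 bins per channel, the `ratio` threshold, and the training samples a caller would supply are all assumptions, not values taken from the paper or from this document.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Bin = Tuple[int, int, int]

BINS_PER_CHANNEL = 8  # assumed coarse quantisation of each 8-bit channel


def rgb_bin(pixel: RGB) -> Bin:
    """Quantise an 8-bit RGB pixel into a coarse histogram bin."""
    step = 256 // BINS_PER_CHANNEL
    return (pixel[0] // step, pixel[1] // step, pixel[2] // step)


def build_model(samples: List[RGB]) -> Dict[Bin, float]:
    """Normalised color histogram estimated from labelled sample pixels."""
    counts: Dict[Bin, int] = {}
    for s in samples:
        b = rgb_bin(s)
        counts[b] = counts.get(b, 0) + 1
    return {b: c / len(samples) for b, c in counts.items()}


def is_skin(pixel: RGB, skin: Dict[Bin, float], non_skin: Dict[Bin, float],
            ratio: float = 1.0, eps: float = 1e-6) -> bool:
    """Label a pixel as skin when P(rgb | skin) / P(rgb | non-skin) exceeds ratio.

    eps avoids division by zero for bins never seen in training; a color
    unseen in both models is labelled non-skin.
    """
    b = rgb_bin(pixel)
    return skin.get(b, 0.0) + eps > ratio * (non_skin.get(b, 0.0) + eps)
```

Connected groups of pixels that pass the test would then form the skin color candidate regions; the grouping step is omitted here.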

手候補領域抽出部１４は、図３に示すような入力画像から色相画像Ｈｕｅ（ｉ，ｊ）及び彩度画像Ｓｔａ（ｉ，ｊ）を計算し、入力画像を色相画像及び彩度画像に変換する。なお、色相はスペクトル上での色の位置を表しており、０度〜３６０度の角度により表されるものである。また、彩度は色の鮮やかさを表しており、灰色に近づくと値は小さくなり原色に近づくと値は大きくなる。 The hand candidate area extraction unit 14 calculates a hue image Hue (i, j) and a saturation image Sta (i, j) from the input image as shown in FIG. 3, and converts the input image into a hue image and a saturation image. To do. The hue represents the position of the color on the spectrum, and is represented by an angle of 0 degrees to 360 degrees. The saturation represents the vividness of the color. The value decreases as the color approaches gray, and the value increases as the color approaches the primary color.
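The conversion of an RGB input image into a hue image and a saturation image can be sketched with Python's standard `colorsys` module. The text does not say which color model defines hue and saturation, so the use of HSV here is an assumption, and `hue_saturation_images` is a hypothetical name.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[int, int, int]


def hue_saturation_images(rgb_image: List[List[RGB]]
                          ) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (hue image in degrees 0-360, saturation image in 0-1)."""
    hue, sat = [], []
    for row in rgb_image:
        # colorsys works on floats in [0, 1] and returns hue in [0, 1).
        hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in row]
        hue.append([h * 360.0 for h, _, _ in hsv])
        sat.append([s for _, s, _ in hsv])
    return hue, sat
```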

また、手候補領域抽出部１４は、肌色候補領域選定部１３により選定された各肌色候補領域Ａ〜Ｄに対し、色相分布（ヒストグラム）ｈ_ｍ（ｋ）及び彩度分布（ヒストグラム）ｓ_ｍ（ｋ）（ｋ＝０，１，２，．．．，Ｋ）をそれぞれ（１）式及び（２）式により計算する。 For each skin color candidate region A to D selected by the skin color candidate region selection unit 13, the hand candidate region extraction unit 14 also calculates a hue distribution (histogram) h_m(k) and a saturation distribution (histogram) s_m(k) (k = 0, 1, 2, ..., K) by equations (1) and (2), respectively.

ここで、Ｒ_ｍは各肌色補領域を示し、ｍ（ｍ＝１，２，．．．，Ｍ）は、肌色候補領域の数である。また、ｇ（Ｈｕｅ（ｉ，ｊ））及びｇ（Ｓｔａ（ｉ，ｊ））は、それぞれ色相画像Ｈｕｅ（ｉ，ｊ）及び彩度画像Ｓｔａ（ｉ，ｊ）の画素（ｉ，ｊ）における輝度値を示す。また、δ[ｘ]は、ｘが０の場合１であり、ｘが０でない場合０である。 Here, R_m denotes each skin color candidate region, and m (m = 1, 2, ..., M) numbers the skin color candidate regions. g(Hue(i, j)) and g(Sta(i, j)) denote the pixel (luminance) values at pixel (i, j) of the hue image Hue(i, j) and the saturation image Sta(i, j), respectively. δ[x] is 1 when x is 0 and 0 otherwise.
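Equations (1) and (2) themselves are not reproduced in this text. A form consistent with the definitions given here, offered as an assumption rather than as the published formulas, counts for each level k the pixels of region R_m whose value equals k:

```latex
h_m(k) = \sum_{(i,j) \in R_m} \delta\left[\, g(\mathrm{Hue}(i,j)) - k \,\right],
\qquad
s_m(k) = \sum_{(i,j) \in R_m} \delta\left[\, g(\mathrm{Sta}(i,j)) - k \,\right],
\qquad k = 0, 1, 2, \ldots, K
```

Under this reading each histogram sums to the number of pixels in R_m.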

上記（１）式を用いて、例えば、図７に示す色相画像の肌色候補領域Ｂの色相分布ｈ_ｍ（ｋ）を計算すると、図８に示すようなヒストグラムを得ることができる。また、上記（２）式を用いて、例えば、図９に示す彩度画像の肌色候補領域Ｂの彩度分布ｓ_ｍ（ｋ）を計算すると、図１０に示すようなヒストグラムを得ることができる。 For example, when the hue distribution h_m(k) of skin color candidate region B of the hue image shown in FIG. 7 is calculated using equation (1), a histogram as shown in FIG. 8 is obtained. Similarly, when the saturation distribution s_m(k) of skin color candidate region B of the saturation image shown in FIG. 9 is calculated using equation (2), a histogram as shown in FIG. 10 is obtained.

図１１及び図１２は、上述のヒストグラムから検出された肌色の色相領域及び彩度領域の例である。また、図１３及び図１４は、それぞれ肌色の色相領域及び彩度領域に基づいて手候補領域が検出された色相画像及び彩度画像である。 FIGS. 11 and 12 are examples of flesh color hue areas and saturation areas detected from the above-described histogram. FIGS. 13 and 14 are a hue image and a saturation image in which hand candidate areas are detected based on the hue area and the saturation area of the skin color, respectively.

各肌色候補領域Ａ〜Ｄの色相分布ｈ_ｍ（ｋ）及び彩度分布ｓ_ｍ（ｋ）から、手候補領域抽出部１４は図１１及び図１２に示すように最も画素数が多い色相の値ｈ＿Ｌｅｖｅｌ及び彩度の値ｓ＿Ｌｅｖｅｌ並びにその分散ｈ＿ｖａｒ、ｓ＿ｖａｒを検出する。 From the hue distribution h_m(k) and saturation distribution s_m(k) of each skin color candidate region A to D, the hand candidate region extraction unit 14 detects, as shown in FIGS. 11 and 12, the hue value h_Level and the saturation value s_Level that have the largest number of pixels, together with their variances h_var and s_var.
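Finding the level with the most pixels and a variance for a region's histogram can be sketched as below. The sketch is in Python with hypothetical function names; the text does not state how the variance is computed, so taking it about the mean of the distribution is an assumption.

```python
from typing import Iterable, List, Tuple


def histogram(values: Iterable[int], levels: int) -> List[int]:
    """Count how many pixels of a region take each quantised level."""
    counts = [0] * levels
    for v in values:
        counts[v] += 1
    return counts


def dominant_level(hist: List[int]) -> Tuple[int, float]:
    """Return (level with the most pixels, variance of the distribution)."""
    n = sum(hist)
    if n == 0:
        raise ValueError("empty histogram")
    # The first level reaching the maximum count wins on ties.
    level = max(range(len(hist)), key=lambda k: hist[k])
    mean = sum(k * c for k, c in enumerate(hist)) / n
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(hist)) / n
    return level, var
```

Applied to the hue values of one candidate region this yields the counterparts of h_Level and h_var; applied to the saturation values it yields s_Level and s_var.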

ここで、肌色は色相の値が低いため、各肌色候補領域Ａ〜Ｄの色相画像Ｈｕｅ（ｉ，ｊ）における色相の値がｈ＿Ｌｅｖｅｌ±ｈ＿ｖａｒ以内で、且つ、その値が肌色の色相初期値Ｈ＿Ｉｎｉｔｉａｌより小さい画素の領域を肌色の色相領域（画像）Ｈｍ（ｉ，ｊ）とする。これにより、図１３に示すような各肌色候補領域Ａ〜Ｄの肌色の色相領域Ｈｍ（ｉ，ｊ）が求められる。なお、図１３に示す肌色候補領域Ａ〜Ｄにおいて肌色の色相領域Ｈｍ（ｉ，ｊ）は黒色で示している。 Because skin color has a low hue value, the region of pixels in the hue image Hue(i, j) of each skin color candidate region A to D whose hue value lies within h_Level ± h_var and is smaller than the initial skin color hue value H_Initial is taken as the skin color hue region (image) Hm(i, j). This gives the skin color hue region Hm(i, j) of each skin color candidate region A to D as shown in FIG. 13. In the skin color candidate regions A to D shown in FIG. 13, the skin color hue region Hm(i, j) is shown in black.

また、肌色は彩度の値が高いため、各肌色候補領域Ａ〜Ｄの彩度画像Ｓｔａ（ｉ，ｊ）における彩度の値がｓ＿Ｌｅｖｅｌ±ｓ＿ｖａｒ以内で、且つ、その値が肌色の彩度初期値Ｓ＿Ｉｎｉｔｉａｌより大きい画素の領域を肌色の彩度領域（画像）Ｓｍ（ｉ，ｊ）とする。これにより、図１４に示すような各肌色候補領域Ａ〜Ｄの肌色の彩度領域Ｓｍ（ｉ，ｊ）が求められる。なお、図１４に示す肌色候補領域Ａ〜Ｄにおいて肌色の彩度領域Ｓｍ（ｉ，ｊ）は白色で示している。 Likewise, because skin color has a high saturation value, the region of pixels in the saturation image Sta(i, j) of each skin color candidate region A to D whose saturation value lies within s_Level ± s_var and is larger than the initial skin color saturation value S_Initial is taken as the skin color saturation region (image) Sm(i, j). This gives the skin color saturation region Sm(i, j) of each skin color candidate region A to D as shown in FIG. 14. In the skin color candidate regions A to D shown in FIG. 14, the skin color saturation region Sm(i, j) is shown in white.

そして、肌色の色相領域Ｈｍ（ｉ，ｊ）と彩度領域Ｓｍ（ｉ，ｊ）とが重なる画素領域を手候補領域Ｒｍとする。すなわち、図１５に示す手候補領域のように各肌色候補領域Ａ〜Ｄにおいて図１３に示す黒色の領域と図１４に示す白色の領域とが重なる領域である。 The pixel region where the skin color hue region Hm(i, j) and the saturation region Sm(i, j) overlap is then taken as the hand candidate region Rm. That is, as in the hand candidate regions shown in FIG. 15, it is the region in each skin color candidate region A to D where the black region shown in FIG. 13 and the white region shown in FIG. 14 overlap.
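The two per-pixel conditions and their overlap can be sketched as a single mask computation. In the Python sketch below the function name, the boolean-mask representation of a candidate region, and the treatment of h_var and s_var as half-widths of the accepted band are assumptions made for illustration.

```python
from typing import List


def hand_candidate_mask(hue: List[List[float]], sat: List[List[float]],
                        region: List[List[bool]],
                        h_level: float, h_var: float, h_initial: float,
                        s_level: float, s_var: float, s_initial: float
                        ) -> List[List[bool]]:
    """Mark pixels that lie in both the hue region Hm and the saturation region Sm."""
    mask = []
    for hue_row, sat_row, reg_row in zip(hue, sat, region):
        row = []
        for h, s, inside in zip(hue_row, sat_row, reg_row):
            # Hue region: near the dominant hue and below the initial hue value.
            in_hm = abs(h - h_level) <= h_var and h < h_initial
            # Saturation region: near the dominant saturation and above the initial value.
            in_sm = abs(s - s_level) <= s_var and s > s_initial
            row.append(inside and in_hm and in_sm)
        mask.append(row)
    return mask
```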

このように色相と彩度とが類似する画素が最も大きい領域、つまり主導的な色を示す領域を手の候補領域として抽出することにより、人の肌色の個人差や実環境での照明変化に対するロバスト性を高めることができる。 By extracting as the hand candidate region the region with the largest number of pixels of similar hue and saturation, that is, the region showing the dominant color, robustness against individual differences in skin color and against illumination changes in real environments can be improved.

形状複雑度算出部１５は、手候補領域抽出部１４で抽出された手候補領域Ｒｍに対し、例えば、その領域の面積（画素数）及び周辺長に基づいて手候補領域の複雑度を計算する。 The shape complexity calculation unit 15 calculates the complexity of the hand candidate region based on, for example, the area (number of pixels) and the peripheral length of the hand candidate region Rm extracted by the hand candidate region extraction unit 14. .

ここで、Ｌは手の候補領域の周囲長、Ｓは手候補領域の面積及びδは形状の複雑度を示す。 Here, L is the perimeter of the hand candidate region, S is the area of the hand candidate region, and δ is the complexity of the shape.
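The complexity formula itself (equation (3)) is not reproduced in this text. The Python sketch below shows how the two inputs, the area S and the perimeter L, can be measured on a binary mask, and then combines them with a stand-in measure, 1 − 4πS/L², which is an assumption chosen only because it grows as an outline becomes more ragged. It is not the patent's definition of δ, so thresholds quoted in the text, such as 0.9, should not be expected to carry over to it.

```python
import math
from typing import List, Tuple


def area_and_perimeter(mask: List[List[bool]]) -> Tuple[int, int]:
    """Return (pixel count, number of pixel edges facing background or image border)."""
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            area += 1
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                inside = 0 <= ni < rows and 0 <= nj < cols
                if not inside or not mask[ni][nj]:
                    perimeter += 1
    return area, perimeter


def stand_in_complexity(area: int, perimeter: int) -> float:
    """Hypothetical measure: lower for compact shapes, higher for ragged outlines."""
    if perimeter == 0:
        return 0.0
    return 1.0 - 4.0 * math.pi * area / (perimeter ** 2)
```

With this stand-in, a comb-like mask (a crude open hand) scores higher than a solid square of similar size, which matches the qualitative behaviour the text relies on.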

手領域検出部１６は、形状複雑度算出部１５で算出された各手候補領域の形状の複雑度を、予め決められた手の形状に応じた複雑度と比較評価する。図１６は、手の形状に応じた複雑度の算出（学習）結果例である。例えば、指が５本開いている状態、つまり手が開いている状態（Ｋ１）の形状複雑度は、手が開いていない状態（Ｋ６）の形状複雑度と大きく違う。これを考慮して経験的に手領域を検出するための閾値を決める。例えば、頭部領域が手領域として検出されないように、４本以上の指が開いている状態（δ＞０．９）の形状複雑度を手領域の閾値とする。そして、手候補領域の複雑度が決められた閾値より大きい場合を手領域として検出する。 The hand region detection unit 16 compares and evaluates the complexity of the shape of each hand candidate region calculated by the shape complexity calculation unit 15 with the complexity corresponding to a predetermined hand shape. FIG. 16 is an example of the calculation (learning) result of the complexity according to the hand shape. For example, the shape complexity in the state where five fingers are open, that is, the state where the hand is open (K1) is significantly different from the shape complexity in the state where the hand is not open (K6). Considering this, a threshold value for detecting the hand region is determined empirically. For example, the shape complexity in a state where four or more fingers are open (δ> 0.9) is set as the threshold value of the hand region so that the head region is not detected as the hand region. A case where the complexity of the hand candidate area is larger than a predetermined threshold is detected as a hand area.

図１７は、５本の指が開いていない状態、つまり、手が開いていない状態の画像を示し、図１８は、そのときの手候補領域の形状複雑度を示す。この図１７に示す画像が入力された場合、手領域検出部１６は、手候補領域ｂ、ｃの複雑度を閾値と比較する。そして、例えば、複雑度の閾値をδ＝０．９と設定した場合、手候補領域ｂ及び手候補領域ｃとも複雑度が閾値よりも低いため、手領域検出部１６はこれらを手領域として検出しない。 FIG. 17 shows an image in which none of the five fingers is open, that is, the hand is not open, and FIG. 18 shows the shape complexity of the hand candidate regions at that time. When the image shown in FIG. 17 is input, the hand region detection unit 16 compares the complexity of hand candidate regions b and c with the threshold. For example, when the complexity threshold is set to δ = 0.9, the complexity of both hand candidate region b and hand candidate region c is below the threshold, so the hand region detection unit 16 does not detect either of them as a hand region.

また、図１９は、人差し指と中指の２本の指が開いている状態を示し、図２０は、そのときの手候補領域の形状複雑度を示す。この図１９に示す２本の指が開いている状態の画像が入力され、複雑度の閾値をδ＝０．９と設定した場合も、手候補領域ｂ及び手候補領域ｃとも複雑度が閾値よりも低いため、手領域検出部１６はこれらを手領域として検出しない。 FIG. 19 shows a state in which the two fingers, the index finger and the middle finger, are open, and FIG. 20 shows the shape complexity of the hand candidate region at that time. Even when an image with two fingers open as shown in FIG. 19 is input and the complexity threshold is set to δ = 0.9, the complexity of both the hand candidate area b and the hand candidate area c is a threshold. Therefore, the hand region detection unit 16 does not detect these as hand regions.

また、図２１は、５本の指が全て開いている状態、つまり手が開いている状態を示し、図２２は、そのときの手候補領域の形状複雑度を示す。この図２１に示すように手が開いている状態の画像が入力され、手候補領域の形状の複雑度の閾値をδ＝０．９と設定した場合、手候補領域ｃの複雑度は閾値より小さいが手候補領域ｂの複雑度は閾値より大きいため、手領域検出部１６は、手候補領域ｂを手領域として検出する。 FIG. 21 shows a state in which all five fingers are open, that is, the hand is open, and FIG. 22 shows the shape complexity of the hand candidate regions at that time. When the image of FIG. 21 with the hand open is input and the shape complexity threshold is set to δ = 0.9, the complexity of hand candidate region c is below the threshold but the complexity of hand candidate region b is above it, so the hand region detection unit 16 detects hand candidate region b as a hand region.

なお、上述のように手候補領域の面積及び周辺長に基づいて複雑度を計算することに限らず、図２３に示すように、手候補領域の中心を検出し、中心からの距離に基づいて計算してもよい。この場合、（４）式のように距離関数ｄ（θ）を定義すればよい。 The complexity need not be calculated from the area and perimeter of the hand candidate region as described above. As shown in FIG. 23, the center of the hand candidate region may instead be detected and the complexity calculated from the distances from that center. In this case, a distance function d(θ) may be defined as in equation (4).

図２４は、（４）式による距離関数のグラフである。ここで、距離関数ｄ（θ）の極大値が指の形状を示す閾値（Ｔｈｄ）以上である場合、その極大値の数を指の本数と推定することができる。 FIG. 24 is a graph of the distance function according to equation (4). Here, when the maximum value of the distance function d (θ) is equal to or greater than the threshold (Thd) indicating the shape of the finger, the number of maximum values can be estimated as the number of fingers.

手領域検出部１６は、図２５に示すように閾値（Ｔｈｄ）以上の極大値の数に応じて手領域を検出する。例えば、閾値（Ｔｈｄ）以上の極大値の数の閾値を４、つまり指が４本以上検出された場合、その手候補領域を手領域として検出する。 As shown in FIG. 25, the hand region detection unit 16 detects a hand region according to the number of local maximum values equal to or greater than a threshold value (Thd). For example, when the threshold of the number of maximum values equal to or greater than the threshold (Thd) is 4, that is, when four or more fingers are detected, the hand candidate area is detected as a hand area.

また、図２４に示すようにｈ（ｋ）（ｋ＝１，２，．．．，Ｋ）を定義し、ｈ（ｋ）がある閾値以上の場合、指と推定するようにしてもよい。この場合、手領域検出部１６は閾値以上のｈ（ｋ）に応じて手領域を検出する。例えば、閾値を５とすると指が５本検出された場合、その手候補領域を手領域として検出することとなる。なお、これらの指を検出するための閾値や手領域を検出するための閾値は、学習によって決めてもよい。 Also, as shown in FIG. 24, h (k) (k = 1, 2,..., K) may be defined, and if h (k) is equal to or greater than a certain threshold value, it may be estimated as a finger. In this case, the hand region detection unit 16 detects the hand region according to h (k) equal to or greater than the threshold value. For example, when the threshold value is 5, when five fingers are detected, the hand candidate area is detected as a hand area. Note that the threshold for detecting these fingers and the threshold for detecting the hand region may be determined by learning.
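The finger-counting rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: equation (4) is not reproduced in this text, so the sketch simply assumes that d(θ) is available as a list of center-to-contour distances sampled at uniform angles over one full turn. The names `count_fingers` and `is_hand` and the default `min_fingers=4` are hypothetical.

```python
def count_fingers(d, thd):
    """Count local maxima of a sampled distance function d(theta) that reach thd.

    d is assumed to hold center-to-contour distances sampled at uniform
    angles over one full turn, so the sequence is treated as circular.
    """
    n = len(d)
    count = 0
    for i in range(n):
        prev_val = d[i - 1]          # wraps to the last sample when i == 0
        next_val = d[(i + 1) % n]    # wraps to the first sample at the end
        # Strict peak test; flat-topped peaks are ignored in this sketch.
        if d[i] >= thd and d[i] > prev_val and d[i] > next_val:
            count += 1
    return count


def is_hand(d, thd, min_fingers=4):
    """Accept the candidate region when enough finger-like peaks are found."""
    return count_fingers(d, thd) >= min_fingers
```

In practice the sampled d(θ) would probably need smoothing before peak detection, since contour noise produces spurious maxima.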

このように人の統計的肌色特徴に基づいた肌色の候補領域を選定し、それらの領域内の画像分布特性による手の候補領域を抽出し、この手の候補領域の形状複雑度を評価することにより、人の肌色の個人差や照明変化によることなく、実環境でもロバストな手領域検出を行うことができる。 By selecting skin color candidate regions based on statistical human skin color features, extracting hand candidate regions from the image distribution characteristics within those regions, and evaluating the shape complexity of the hand candidate regions in this way, robust hand region detection can be performed even in a real environment, regardless of individual differences in skin color or changes in lighting.

また、形状複雑度の評価によって、手の開いている時の領域特徴とその他の領域特徴とを区別することができるため、ロバストな手領域検出を行うことができる。 In addition, since the region feature when the hand is open can be distinguished from other region features by evaluating the shape complexity, robust hand region detection can be performed.

なお、この手領域検出装置１０は、統計的肌色特徴に基づいた肌色の候補領域を選定し、それらの領域内の画像分布特性による手の候補領域を抽出することとしたが、これに限られるものではなく、肌色の候補領域選定と手候補領域抽出とを統合し、ＲＧＢ処理や画像分布処理等を任意の順番で行い手候補領域を絞るようにしてもよい。 The hand region detection apparatus 10 selects skin color candidate regions based on statistical skin color features and extracts hand candidate regions based on image distribution characteristics in these regions, but is not limited thereto. Instead, the selection of skin color candidate regions and the extraction of hand candidate regions may be integrated, and the hand candidate regions may be narrowed down by performing RGB processing, image distribution processing, and the like in an arbitrary order.

図２５は、上述した手領域検出装置の他の構成（その１）を示すブロック図である。この手領域検出装置２０は、実環境のロバスト性をより高めるために、例えば、フレーム間の差分情報を用いて、検出された手の領域に対して、さらに動いたもの（領域）を正しい手領域として決定する。なお、上述した手領域検出装置１０と同一な構成には同一の符号を付し、説明を省略する。 FIG. 25 is a block diagram showing another configuration (part 1) of the hand region detection device described above. To further improve robustness in a real environment, this hand region detection device 20 uses, for example, inter-frame difference information and determines as the correct hand region only those detected hand regions that have also moved. Components identical to those of the hand region detection device 10 described above are given the same reference numerals, and their description is omitted.

この手領域検出装置２０は、撮像部１１と、撮像部１１で撮像された画像を取り込む画像取込部１２と、画像取込部１２で取り込んだ画像から肌色の候補領域を選定する肌色候補領域選定部１３と、肌色候補選定部１３で選定された肌色の候補領域から手候補領域を抽出する手候補領域抽出部１４と、手候補領域抽出部１４で抽出された手候補領域の形状の複雑度を算出する形状複雑度算出部１５と、形状複雑度算出部１５で算出された複雑度に基づいて手領域を検出する手領域検出部１６と、手領域検出部１６で検出された手領域を含む画像を所定フレーム分遅延する遅延回路部２１と、上記遅延回路部２１で所定フレーム分遅延された画像と現フレームの画像とを比較し動き領域を検出する動き領域検出部２２と、上記動き領域検出部２２で検出された動き領域に基づいて手領域を決定する手領域決定部２３とを備えて構成される。 The hand region detection device 20 comprises: an imaging unit 11; an image capturing unit 12 that captures the image taken by the imaging unit 11; a skin color candidate region selection unit 13 that selects skin color candidate regions from the image captured by the image capturing unit 12; a hand candidate region extraction unit 14 that extracts hand candidate regions from the skin color candidate regions selected by the skin color candidate region selection unit 13; a shape complexity calculation unit 15 that calculates the shape complexity of the hand candidate regions extracted by the hand candidate region extraction unit 14; a hand region detection unit 16 that detects a hand region based on the complexity calculated by the shape complexity calculation unit 15; a delay circuit unit 21 that delays the image containing the hand region detected by the hand region detection unit 16 by a predetermined number of frames; a motion region detection unit 22 that detects a motion region by comparing the image delayed by the delay circuit unit 21 with the image of the current frame; and a hand region determination unit 23 that determines the hand region based on the motion region detected by the motion region detection unit 22.

遅延回路部２１は、入力された所定フレーム分の画像を一時的に保持するバッファを有しており、例えば入力された画像と、それよりも１フレーム前の画像とを動き領域検出部２２に出力する。 The delay circuit unit 21 has a buffer that temporarily holds a predetermined number of input frames and outputs, for example, the input image and the image one frame before it to the motion region detection unit 22.

動き領域検出部２２は、例えば、入力された１フレーム分異なる２つの画像を比較し動き領域を検出する。ここで、動き領域検出部２２は、手領域検出部１６で検出された手領域近傍の動きを検出することが好ましい。 For example, the motion region detection unit 22 compares two input images that differ by one frame and detects a motion region. Here, it is preferable that the motion region detection unit 22 detects the motion in the vicinity of the hand region detected by the hand region detection unit 16.

手領域決定部２３は、動き領域検出部２２で検出された動き領域が手領域検出部１６で検出された手領域か否を判別し、検出された動き領域が手領域である場合のみ、その領域を手領域として決定する。なお、手領域検出部１６で検出された手領域が動き領域検出部２２で検出された動き領域であるか否かを判別し、検出された手領域が動き領域である場合、手領域として決定してもよい。 The hand region determination unit 23 determines whether a motion region detected by the motion region detection unit 22 is a hand region detected by the hand region detection unit 16, and determines that region to be a hand region only when it is. Alternatively, it may determine whether a hand region detected by the hand region detection unit 16 is a motion region detected by the motion region detection unit 22, and determine it to be a hand region when it is.

続いて、手領域検出装置２０の動作について説明する。 Next, the operation of the hand region detection device 20 will be described.

この手領域検出装置２０は上述した手領域検出装置１０と同様にして手領域検出部１６で手領域を検出するため、そこまでの詳細な説明を省略する。手領域検出装置２０は、上述したように入力画像から人の統計的肌色特徴に基づいた肌色の候補領域を選定し、それらの領域内の画像分布特性による手の候補領域を抽出し、この手の候補領域の形状複雑度に基づいて手領域を検出する。手領域が検出された画像は遅延回路部２１に入力される。 Since the hand region detection device 20 detects the hand region by the hand region detection unit 16 in the same manner as the hand region detection device 10 described above, detailed description thereof will be omitted. As described above, the hand region detection device 20 selects a skin color candidate region based on a person's statistical skin color feature from the input image, extracts a hand candidate region based on image distribution characteristics in these regions, and extracts the hand region. The hand region is detected based on the shape complexity of the candidate region. The image in which the hand area is detected is input to the delay circuit unit 21.

図２６及び図２７は、遅延回路部２１から出力される撮像時刻が異なる画像例をそれぞれ示すものである。また、図２８は、遅延回路２１から出力された図２６及び図２７の画像に基づいて動き領域を検出した画像例である。 FIG. 26 and FIG. 27 show examples of images with different imaging times output from the delay circuit unit 21, respectively. FIG. 28 is an example of an image in which a motion region is detected based on the images of FIGS. 26 and 27 output from the delay circuit 21.

遅延回路部２１は、入力された画像をバッファに保持し、例えば図２６に示すような画像Ｉｍ（ｔ）と、図２７に示すような画像Ｉｍ(t＋1)を動き領域検出部２２に出力する。 The delay circuit unit 21 holds the input images in its buffer and outputs, for example, the image Im(t) shown in FIG. 26 and the image Im(t+1) shown in FIG. 27 to the motion region detection unit 22.

動き領域検出部２２は、入力された画像Ｉｍ（ｔ）と画像Ｉｍ(t＋1)とから、例えば、フレーム間差分処理によって動き情報を検出する。これにより、図２８に示すように、手部分と顔部分を検出することができる。 The motion region detection unit 22 detects motion information from the input image Im (t) and the image Im (t + 1), for example, by inter-frame difference processing. Thereby, as shown in FIG. 28, a hand part and a face part can be detected.

手領域決定部２３は、検出された動き領域が手領域検出部１６で検出された手領域か否かを判別する。例えば、図２８に示すように動き領域が検出された手部分と顔部分のうち、手領域検出部１６で検出された手領域である手部分を最終的な手領域として決定する（図２９参照）。 The hand region determination unit 23 determines whether each detected motion region is a hand region detected by the hand region detection unit 16. For example, of the hand part and the face part in which motion was detected as shown in FIG. 28, the hand part, which is the hand region detected by the hand region detection unit 16, is determined to be the final hand region (see FIG. 29).
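The combination of inter-frame differencing and hand region confirmation can be sketched as below. This is only an illustration under stated assumptions: frames are grayscale images stored as nested lists, hand regions are axis-aligned boxes, and a region counts as "moving" when a minimum fraction of its pixels changed. The function names, the box format and the two thresholds (`diff_thresh`, `min_fraction`) are hypothetical and do not come from the patent.

```python
def motion_mask(prev, curr, diff_thresh):
    """Inter-frame difference: flag pixels whose gray level changed by more than diff_thresh."""
    return [[abs(c - p) > diff_thresh for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def moving_fraction(mask, box):
    """Fraction of moving pixels inside box = (top, left, bottom, right), ends exclusive."""
    top, left, bottom, right = box
    total = (bottom - top) * (right - left)
    moving = sum(mask[y][x] for y in range(top, bottom) for x in range(left, right))
    return moving / total


def confirm_hands(prev, curr, hand_boxes, diff_thresh=20, min_fraction=0.2):
    """Keep only the detected hand boxes that also contain enough motion."""
    mask = motion_mask(prev, curr, diff_thresh)
    return [b for b in hand_boxes if moving_fraction(mask, b) >= min_fraction]
```

A static hand-shaped region, such as a hand printed on a poster, produces no frame difference and is therefore dropped by `confirm_hands`.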

図３０は、肌色候補領域の選定例を示し、図３１は、手領域の検出例を示す。上述した動き情報を用いることにより、図３０に示すように肌色候補選定部１３で選定された肌色候補領域Ｅ、Ｆ、Ｇ、Ｈに示す手領域からポスターに描写された肌色候補領域Ｇ、Ｈの手領域を除くことができ、図３１のように確実に最終的な手領域を検出することができる。 FIG. 30 shows an example of skin color candidate region selection, and FIG. 31 shows an example of hand region detection. By using the motion information described above, the hand regions in skin color candidate regions G and H, which are depicted on a poster, can be excluded from the hand regions in skin color candidate regions E, F, G, and H selected by the skin color candidate selection unit 13 as shown in FIG. 30, so that the final hand region is reliably detected as shown in FIG. 31.

一方、動き情報を用いなかった場合、図３２及び図３３に示すように、肌色候補領域Ｅ、Ｆ、Ｇ、Ｈから、肌色候補領域Ｅ、Ｇ内の手領域を最終的に検出してしまう。つまり、ポスターに描写された手も手領域として検出してしまう。 If, on the other hand, the motion information is not used, the hand regions in skin color candidate regions E and G end up being detected from among skin color candidate regions E, F, G, and H, as shown in FIGS. 32 and 33. In other words, the hand depicted on the poster is also detected as a hand region.

なお、上述した手領域検出装置２０は、複雑度に基づいて手領域を検出した後、動き領域を検出し、手領域と動き領域とから最終的な手領域を決定することとしたが、画像取込部１２からの入力画像から動き領域を検出し、手領域を決定してもよい。 The above-described hand region detection device 20 detects the hand region based on the complexity, then detects the motion region, and determines the final hand region from the hand region and the motion region. A hand region may be determined by detecting a motion region from an input image from the capturing unit 12.

また、図３４は、上述した手領域検出装置の他の構成（その２）を示すブロック図である。この手領域検出装置３０は、実環境のロバスト性をより高めるために、例えば、ステレオカメラを用いて、距離情報を推定し、所定距離以上にある肌色候補領域を背景として除去する。なお、上述した手領域検出装置１０と同一な構成には同一の符号を付し、説明を省略する。 FIG. 34 is a block diagram showing another configuration (part 2) of the hand region detection device described above. To further improve robustness in a real environment, this hand region detection device 30 estimates distance information using, for example, a stereo camera, and removes skin color candidate regions located at or beyond a predetermined distance as background. Components identical to those of the hand region detection device 10 described above are given the same reference numerals, and their description is omitted.

この手領域検出装置３０は、撮像部１１と、撮像部１１で撮像された画像を取り込む画像取込部１２と、画像取込部１２で取り込んだ画像から肌色の候補領域を選定する肌色候補領域選定部１３と、肌色候補選定部１３で選定された肌色の候補領域から手候補領域を抽出する手候補領域抽出部１４と、手候補領域抽出部１４で抽出された手候補領域の形状の複雑度を算出する形状複雑度算出部１５と、形状複雑度算出部１５で算出された複雑度に基づいて手領域を検出する手領域検出部１６と、基準撮像部３１と、撮像部１１と基準撮像部３１とに基づいて画像内の被写体の距離情報を取得する距離情報取得部３２と、手領域検出部１６で検出された手領域と距離情報取得部３２で取得した距離情報とに基づいて手領域を決定する手領域決定部３３とを備えて構成される。 The hand region detection device 30 comprises: an imaging unit 11; an image capturing unit 12 that captures the image taken by the imaging unit 11; a skin color candidate region selection unit 13 that selects skin color candidate regions from the image captured by the image capturing unit 12; a hand candidate region extraction unit 14 that extracts hand candidate regions from the skin color candidate regions selected by the skin color candidate region selection unit 13; a shape complexity calculation unit 15 that calculates the shape complexity of the hand candidate regions extracted by the hand candidate region extraction unit 14; a hand region detection unit 16 that detects a hand region based on the complexity calculated by the shape complexity calculation unit 15; a reference imaging unit 31; a distance information acquisition unit 32 that acquires distance information for subjects in the image using the imaging unit 11 and the reference imaging unit 31; and a hand region determination unit 33 that determines the hand region based on the hand region detected by the hand region detection unit 16 and the distance information acquired by the distance information acquisition unit 32.

基準撮像部３１は、ＣＣＤ等の撮像素子を有するカメラで構成され、撮像部１１を用いてステレオカメラとして構成される。なお、さらに複数のカメラでステレオカメラを構成することが好ましい。 The reference imaging unit 31 is configured by a camera having an imaging element such as a CCD, and is configured as a stereo camera using the imaging unit 11. In addition, it is preferable to further configure a stereo camera with a plurality of cameras.

距離情報取得部３２は、撮像部１１及び基準撮像部３１を用いて、例えば、「ステレオ法」に基づき対象物との距離を測定する。ここで言うステレオ法とは、いわゆる「三角測量」の原理により所定の位置関係を持つ複数のカメラの視点（投影中心）から撮像した画像を用いて、シーンすなわち撮像画像中の各点の３次元空間内の位置を推定し、投影中心との距離を測定する。 The distance information acquisition unit 32 uses the imaging unit 11 and the reference imaging unit 31 to measure the distance to the object based on the “stereo method”, for example. The stereo method referred to here is a three-dimensional view of each point in a scene, that is, a captured image using images captured from the viewpoints (projection centers) of a plurality of cameras having a predetermined positional relationship based on the principle of so-called “triangulation”. Estimate the position in space and measure the distance to the projection center.

手領域決定部３３は、手領域検出部１６で検出された手領域を距離情報取得部３２で取得された距離情報に基づいて最終的な手領域を決定する。例えば、手領域検出部１６で検出された手領域が所定距離以下又は所定距離以上若しくは所定の距離範囲内に属するか否かに応じて最終的な手領域を決定する。 The hand region determination unit 33 determines the final hand region based on the distance information acquired by the distance information acquisition unit 32 for the hand region detected by the hand region detection unit 16. For example, the final hand region is determined according to whether the hand region detected by the hand region detection unit 16 is equal to or smaller than a predetermined distance, equal to or greater than a predetermined distance, or within a predetermined distance range.
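The distance-based selection can be sketched as below. The patent only states that distance is measured by the stereo method (triangulation); the sketch additionally assumes a rectified, parallel two-camera rig, for which depth follows the standard relation Z = f·B/d (focal length f in pixels, baseline B, disparity d). The function names, the `(label, distance)` representation of hand regions and the `near_m`/`far_m` limits are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified parallel stereo pair (assumed geometry)."""
    if disparity_px <= 0:
        return float("inf")  # no measurable disparity: treat as infinitely far
    return focal_px * baseline_m / disparity_px


def select_by_distance(hands, near_m=None, far_m=None):
    """Keep hand regions whose distance lies inside the optional [near_m, far_m] range.

    hands is a list of (label, distance_m) pairs.
    """
    kept = []
    for label, dist in hands:
        if near_m is not None and dist < near_m:
            continue
        if far_m is not None and dist > far_m:
            continue
        kept.append(label)
    return kept
```

Passing only `far_m` keeps nearby hands, passing only `near_m` keeps distant ones, and passing both selects a distance band, which mirrors the three alternatives listed in the paragraph above.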

続いて、手領域検出装置３０の動作について説明する。 Next, the operation of the hand region detection device 30 will be described.

手領域検出装置３０は上述した手領域検出装置１０と同様にして手領域検出部１６で手領域を検出するため、そこまでの説明を省略する。すなわち、入力画像から人の統計的肌色特徴に基づいた肌色の候補領域を選定し、それらの領域内の画像分布特性による手の候補領域を抽出し、この手の候補領域の形状複雑度に基づいて手領域を検出する。 Since the hand region detection device 30 detects the hand region by the hand region detection unit 16 in the same manner as the hand region detection device 10 described above, description thereof is omitted. In other words, skin color candidate regions based on human statistical skin color features are selected from the input image, hand candidate regions based on image distribution characteristics in those regions are extracted, and based on the shape complexity of the hand candidate regions To detect the hand area.

図３５〜図３９は、距離情報を用いて入力画像から肌色候補を選定する際の画像例を示す図である。図３５に示すように二人の人物が手を広げている画像が画像取込部１２に入力された場合、肌色候補領域選定部１３により肌色候補領域Ｉ、Ｊ、Ｋ、Ｌが選定される（図３６参照）。そして、手領域検出部１６は、図３７に示すように形状複雑度に基づいて肌色候補領域Ｉ、Ｋ内の手領域を検出する。 FIGS. 35 to 39 are diagrams illustrating image examples when skin color candidates are selected from the input image using the distance information. As shown in FIG. 35, when an image in which two persons are spreading their hands is input to the image capturing unit 12, the skin color candidate regions I, J, K, and L are selected by the skin color candidate region selection unit 13. (See FIG. 36). Then, the hand region detection unit 16 detects the hand regions in the skin color candidate regions I and K based on the shape complexity as shown in FIG.

また、距離情報取得部３２は、撮像部１１、３１で撮像された画像に基づいて、例えば撮像部１１から被写体までの距離情報を取得し、手領域決定部３３に出力する。手領域決定部３３は手候補領域抽出部１４で抽出された手候補領域ｉ、ｊ、ｋ、ｌから、図３８に示すように距離情報に応じた色の濃淡に基づいて手領域を決定する。図３９は、撮像部１１により近い位置の人物の手領域を検出し、それよりも遠い位置の人物の手領域を除去したものである。 The distance information acquisition unit 32 acquires, for example, distance information from the imaging unit 11 to the subject based on the images captured by the imaging units 11 and 31, and outputs it to the hand region determination unit 33. The hand region determination unit 33 determines the hand region from among the hand candidate regions i, j, k, and l extracted by the hand candidate region extraction unit 14, based on the shading that represents the distance information as shown in FIG. 38. FIG. 39 shows the result of detecting the hand region of the person closer to the imaging unit 11 and removing the hand region of the person farther away.

このように距離情報を用いることにより、後方又は一定の距離にいる人物の手のみを検出することができる。 Thus, by using the distance information, it is possible to detect only the hand of a person behind or at a certain distance.

なお、上述した手領域検出装置３０は、複雑度に基づいて手領域を検出した後、距離情報に基づいて最終的な手領域を決定することとしたが、距離情報に基づいて画像取込部１２からの入力画像から手領域を検出する範囲を決定し、その範囲内で手領域を決定するようにしてもよい。 The hand region detection device 30 described above detects hand regions based on the complexity and then determines the final hand region based on the distance information. Alternatively, the range in which to search for hand regions may first be determined from the input image supplied by the image capturing unit 12 based on the distance information, and the hand region then determined within that range.

次に、上述した手領域検出装置１０，２０，３０により検出された手領域を追跡する手領域追跡装置について説明する。 Next, a hand region tracking device that tracks the hand regions detected by the hand region detection devices 10, 20, and 30 described above will be described.

図４０は、手領域追跡装置の構成を示すブロック図である。この手領域追跡装置４０は、撮像部４１と、撮像部４１で撮像された画像を取り込む画像取込部４２と、前時刻の手領域追跡結果から現時刻の推測手領域位置を予測する手領域位置予測部４３と、手領域予測部４３で予測された推測手領域位置を現時刻の画像の分布特性に基づいて評価する予測領域評価部４４と、予測領域評価部４４における評価結果に基づいて現時刻の手領域位置を推定する手領域追跡結果推定部４５と、手領域追跡結果推定部４５で推定された現時刻の手領域位置を出力する追跡結果出力部４６と、追跡結果出力部４６から出力された現時刻の手領域を記憶する追跡結果記憶部４７とを備えて構成されている。ここで、撮像部４１及び画像取込部４２及び追跡結果記憶部４７は、図１に示す画像取込手段１０２、画像取込手段１０２及び追跡結果保存手段１０７にそれぞれ対応している。また、手領域予測部４３及び予測領域評価部４４は、図１に示す追跡結果読込手段１０４を含み、手領域予測部４３、予測領域評価部４４、手領域追跡結果推定部４５及び追跡結果出力部４６は、図１に示す追跡処理手段１０６を含む構成となっている。 FIG. 40 is a block diagram showing the configuration of the hand region tracking device. The hand region tracking device 40 comprises: an imaging unit 41; an image capturing unit 42 that captures the image taken by the imaging unit 41; a hand region position prediction unit 43 that predicts candidate hand region positions at the current time from the hand region tracking result of the previous time; a prediction region evaluation unit 44 that evaluates the candidate hand region positions predicted by the hand region position prediction unit 43 based on the distribution characteristics of the image at the current time; a hand region tracking result estimation unit 45 that estimates the hand region position at the current time based on the evaluation result of the prediction region evaluation unit 44; a tracking result output unit 46 that outputs the hand region position at the current time estimated by the hand region tracking result estimation unit 45; and a tracking result storage unit 47 that stores the hand region at the current time output from the tracking result output unit 46. Here, the imaging unit 41, the image capturing unit 42, and the tracking result storage unit 47 correspond respectively to the image capturing means 102, the image capturing means 102, and the tracking result saving means 107 shown in FIG. 1. The hand region position prediction unit 43 and the prediction region evaluation unit 44 include the tracking result reading means 104 shown in FIG. 1, and the hand region position prediction unit 43, the prediction region evaluation unit 44, the hand region tracking result estimation unit 45, and the tracking result output unit 46 include the tracking processing means 106 shown in FIG. 1.

撮像部４１は、ＣＣＤ（Charge Coupled Device）等の撮像素子を有するカメラで構成されている。 The imaging unit 41 is configured by a camera having an imaging element such as a CCD (Charge Coupled Device).

画像取込部４２は、撮像部４１からＪＰＥＧ（Joint Photographic Experts Group）等のフォーマットの画像を取り込む。 The image capturing unit 42 captures an image in a format such as JPEG (Joint Photographic Experts Group) from the imaging unit 41.

手領域位置予測部４３は、前時刻（ｔ−１）の追跡結果を追跡結果記憶部４７から読込む追跡結果読込部４３１と、予め用意された手の動き予測モデル４３２に基づいて手領域の現時刻（ｔ）の存在可能な予測位置ｓ_ｋ（ｔ）を算出する予測位置計算部４３３とを備えている。 The hand region position prediction unit 43 includes a tracking result reading unit 431 that reads the tracking result of the previous time (t−1) from the tracking result storage unit 47, and a predicted position calculation unit 433 that calculates the predicted positions s_k(t) where the hand region may exist at the current time (t), based on a hand motion prediction model 432 prepared in advance.

追跡結果読込部４３１は、前時刻の手領域位置ｓ_ｋ（ｔ−１）をその手領域確率π_ｋ（ｔ−１）に基づいて読み込む。つまり、前時刻（ｔ−１）において、手領域の確率が高い候補位置を読み込んで、現時刻ｔの追跡位置ｓ_ｋ（ｔ）を予測する。 The tracking result reading unit 431 reads the hand region position s _k (t−1) at the previous time based on the hand region probability π _k (t−1). That is, at the previous time (t−1), a candidate position with a high hand region probability is read, and the tracking position s _k (t) at the current time t is predicted.

例えば、確率π_ｋ（ｔ−１）を［０，１］に正規化する。つまり、確率π_ｋ（ｔ−１）がその閾値Ｔより大きいものを前時刻の手領域位置とする。この場合、閾値Ｔより大きいＫ個の手領域位置の確率の総和Ｐ_Ｋは、（５）式で算出される。 For example, the probability π_k(t−1) is normalized to [0, 1], and the positions whose probability π_k(t−1) exceeds a threshold T are taken as the hand region positions at the previous time. In this case, the sum P_K of the probabilities of the K hand region positions that exceed the threshold T is calculated by equation (5).

また、各手領域位置における寄与率λ_ｋ(ｔ−１)は、（６）式で算出される。 Further, the contribution ratio λ _k (t−1) at each hand region position is calculated by the equation (6).

また、前時刻における追跡対象の予測位置の総数をＮ（Ｎ＞Ｋ）とし、前時刻における各手領域位置が選ばれる回数ｎ_ｋ（ｔ−１）を（７）式により算出する。 In addition, the total number of predicted positions of the tracking target at the previous time is N (N> K), and the number of times n _k (t−1) at which each hand region position is selected at the previous time is calculated by Expression (7).

つまり、確率π_ｋ（ｔ−１）がある閾値Ｔより大きい手領域位置は、候補としてｎ_ｋ（ｔ−１）回選ばれて、次に説明するように、現時刻の予測位置が算出される。 That is, each hand region position whose probability π_k(t−1) exceeds the threshold T is selected as a candidate n_k(t−1) times, and the predicted positions at the current time are calculated from these candidates as described next.
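Equations (5) to (7) are not reproduced in this text, so the following is only a plausible reading of the selection step, not the patent's exact formulas. It assumes that (5) is the plain sum P_K = Σ π_k(t−1) over positions above the threshold, that (6) is the ratio λ_k = π_k / P_K, and that (7) allocates n_k = λ_k · N samples, rounded to an integer. The function name `selection_counts` and the rounding are assumptions.

```python
def selection_counts(probs, threshold, n_total):
    """Allocate n_total predictions among positions whose probability exceeds threshold.

    Returns a dict {index k: number of times position k is selected}.
    """
    kept = {k: p for k, p in enumerate(probs) if p > threshold}
    p_sum = sum(kept.values())              # assumed form of equation (5)
    counts = {}
    for k, p in kept.items():
        lam = p / p_sum                     # assumed form of equation (6)
        counts[k] = round(lam * n_total)    # assumed form of equation (7); rounding is a choice
    return counts
```

Because of the rounding, the counts may not sum exactly to N in general; a real implementation would need a rule for distributing the remainder.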

手領域予測位置計算部４３３は、（８）式のように、現時刻（ｔ）の追跡対象の存在可能な予測位置ｓ_ｋ（ｔ）を、前時刻の追跡対象の位置ｓ_ｋ（ｔ−１）と、前時刻の追跡対象の動き速度ｖ（ｔ−１）と、現時刻の追跡対象の予測位置の範囲Δｄ（ｔ）とに基づいて、算出する。 As in equation (8), the hand region predicted position calculation unit 433 calculates the predicted position s_k(t) where the tracking target may exist at the current time (t) from the position s_k(t−1) of the tracking target at the previous time, the motion speed v(t−1) of the tracking target at the previous time, and the range Δd(t) of the predicted position of the tracking target at the current time.

この（８）式において、α（ｔ−１）及びβ（ｔ）はそれぞれ、ｖ（ｔ−１）及びΔｄ（ｔ）に対するウェイト（係数）である。 In this equation (8), α (t−1) and β (t) are weights (coefficients) for v (t−1) and Δd (t), respectively.

図４１は、予測位置ｓ_ｋ（ｔ）を算出する演算器の構成を模式的に示す図である。現時刻の予測位置ｓ_ｋ（ｔ）は、加算器４３３１、４３３２により前時刻の追跡対象の位置ｓ_ｋ（ｔ−１）と前時刻の追跡対象の動き速度ｖ（ｔ−１）と予測範囲Δｄ（ｔ）とが加算される。 FIG. 41 schematically shows the configuration of the arithmetic unit that calculates the predicted position s_k(t). The predicted position s_k(t) at the current time is obtained by adding, in adders 4331 and 4332, the position s_k(t−1) of the tracking target at the previous time, the motion speed v(t−1) of the tracking target at the previous time, and the prediction range Δd(t).

予測範囲Δｄ（ｔ）は、（９）式に示すように、平均値ａｍ＝０、分散σ（ｔ）の正規乱数発生器４３３３によってランダムに与える。 The prediction range Δd (t) is randomly given by a normal random number generator 4333 having an average value am = 0 and a variance σ (t) as shown in the equation (9).

また、追跡対象の予測範囲をその対象の動き量に従って動的に決めるために、分散σ（ｔ）を（１０）式に示すように算出した。この分散σ（ｔ）は、絶対値回路４３３４により算出された前時刻の追跡対象の速度（動き量）の大きさの絶対値｜ｖ（ｔ−１）｜と、追跡対象領域の大きさによるパラメータｒ２（ｔ）とに基づいて演算器４３３５にて算出される。 Further, in order to dynamically determine the prediction range of the tracking target according to the amount of motion of the target, the variance σ (t) was calculated as shown in the equation (10). This variance σ (t) depends on the absolute value | v (t−1) | of the magnitude of the speed (motion amount) of the tracking target at the previous time calculated by the absolute value circuit 4334 and the size of the tracking target area. Based on the parameter r2 (t), it is calculated by the calculator 4335.

この（１０）式において、｜ｖ（ｔ−１）｜は、前時刻の追跡対象の速度（動き量）の大きさの絶対値を示し、ｋ１は、その動き量の貢献度を示す定数である。 In equation (10), |v(t−1)| is the absolute value of the speed (amount of motion) of the tracking target at the previous time, and k1 is a constant representing the contribution of that amount of motion.

また、ｒ２（ｔ）は、追跡対象領域の大きさによるパラメータであり、（１１）式に示すように追跡対象の大きさに従って分散値σ（ｔ）を決めた。つまり予測範囲を動的に決めることとした。 Also, r2 (t) is a parameter depending on the size of the tracking target area, and the variance value σ (t) is determined according to the size of the tracking target as shown in equation (11). In other words, the prediction range was decided dynamically.

この（１１）式において、ＯｂｊＳｉｚｅ（ｔ−１）は、前時刻の追跡対象の大きさを示す。そして、このＯｂｊＳｉｚｅ（ｔ−１）がある閾値Ｔ０以上の場合、ｒ２（ｔ）はその大きさに比例して算出される。一方、ＯｂｊＳｉｚｅ（ｔ−１）がある閾値Ｔ０より小さい場合、ｒ２（ｔ）はある定数Ｃｏｎｓｔによって決められる。これは、前時刻の追跡結果が間違っている可能性が高いので、予測範囲が大きくなるようにある定数Ｃｏｎｓｔとしたものである。つまり、ＯｂｊＳｉｚｅ（ｔ−１）は、前時刻の追跡位置において、色相画像Ｈｕｅと彩度画像Ｓｍの２値化処理により算出された肌色領域の面積とすることができる。例えば、図４２に示すように前時刻において検出した手領域ａが手の場合、２値化処理により、図４３に示すような肌色領域ｂを算出すると、ある閾値Ｔ０以上となる。一方、図４４に示すように前時刻において検出した手領域ｃが手でない場合、図４５に示すような肌色領域ｄを算出すると、ある閾値Ｔ０よりも小さくなる。 In equation (11), ObjSize(t−1) denotes the size of the tracking target at the previous time. When ObjSize(t−1) is equal to or greater than a threshold T0, r2(t) is calculated in proportion to that size. When ObjSize(t−1) is smaller than the threshold T0, r2(t) is set to a constant Const. In the latter case the tracking result at the previous time is likely to be wrong, so the constant Const is chosen to enlarge the prediction range. ObjSize(t−1) can be taken as the area of the skin color region obtained at the previous tracking position by binarizing the hue image Hue and the saturation image Sm. For example, when the hand region a detected at the previous time is actually a hand, as shown in FIG. 42, the skin color region b obtained by binarization, shown in FIG. 43, has an area equal to or greater than the threshold T0. When the hand region c detected at the previous time is not a hand, as shown in FIG. 44, the skin color region d shown in FIG. 45 has an area smaller than the threshold T0.
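The prediction model of equations (8) to (11) can be sketched as follows. The equations themselves are not reproduced in this text, so every formula here is an assumption chosen to match the prose: s_k(t) = s_k(t−1) + α·v(t−1) + β·Δd(t) for (8), Δd(t) drawn from a zero-mean normal distribution with variance σ(t) for (9), σ(t) = k1·|v(t−1)| + r2(t) for (10), and r2(t) = k2·ObjSize(t−1) when ObjSize(t−1) ≥ T0, otherwise Const, for (11). The proportionality constant `k2`, the additive form of σ(t), and all function names are hypothetical.

```python
import math
import random


def size_term(obj_size, t0, k2, const):
    """Assumed equation (11): proportional to size if large enough, else a fixed constant."""
    return k2 * obj_size if obj_size >= t0 else const


def spread(v_prev, obj_size, k1, k2, t0, const):
    """Assumed equation (10): variance grows with previous speed and with the size term."""
    return k1 * math.hypot(*v_prev) + size_term(obj_size, t0, k2, const)


def predict_position(s_prev, v_prev, sigma, alpha=1.0, beta=1.0, rng=random):
    """Assumed equations (8) and (9): previous position + weighted velocity + normal noise.

    sigma is treated as a variance, so its square root is passed to gauss(),
    which expects a standard deviation.
    """
    std = math.sqrt(sigma)
    return tuple(s + alpha * v + beta * rng.gauss(0.0, std)
                 for s, v in zip(s_prev, v_prev))
```

Calling `predict_position` once per selected candidate yields the set of predicted positions; a fast-moving or poorly tracked target gets a larger σ(t) and therefore a wider scatter of predictions.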

このように手領域位置予測部４３は、前時刻（ｔ−１）における手の追跡結果を追跡結果記憶部４７から読込み、手の大きさや手の動き速度に基づいた手の動き予測モデルに基づいて現時刻（ｔ）の存在可能な手領域の予測位置ｓ_ｋ（ｔ）を算出することにより、高速且つ高精度に手領域を予測することができる。 In this way, the hand region position prediction unit 43 reads the hand tracking result of the previous time (t−1) from the tracking result storage unit 47 and calculates the predicted positions s_k(t) where the hand region may exist at the current time (t) using a hand motion prediction model based on the size and motion speed of the hand, which allows the hand region to be predicted quickly and accurately.

手領域評価部４４は、手領域位置予測部４３で予測された手領域の予測位置ｓ_ｋ（ｔ）の予測領域画像のＨＳＶ分布と、画像取込部４２から入力された現時刻画像のＨＳＶ分布を算出する画像分布算出部４４１と、画像分布算出部４４１で算出されたＨＳＶ分布画像と追跡結果記憶部４７に記憶された手領域モデル４４２とに基づいて予測領域を評価する評価値計算部４４３とを備えている。この手領域モデル４４２は、例えば、手領域画像の肌色特徴を表すヒストグラム等の情報を含むものであり、予測領域画像のヒストグラムと比較される。 The hand region evaluation unit 44 calculates the HSV distribution of the predicted region image of the hand region predicted position s _k (t) predicted by the hand region position prediction unit 43 and the HSV of the current time image input from the image capturing unit 42. An image distribution calculation unit 441 that calculates the distribution, and an evaluation value calculation unit that evaluates the prediction region based on the HSV distribution image calculated by the image distribution calculation unit 441 and the hand region model 442 stored in the tracking result storage unit 47 443. The hand region model 442 includes, for example, information such as a histogram representing the skin color feature of the hand region image, and is compared with the histogram of the predicted region image.

手領域評価部４４は、入力された予測位置ｓ_ｋ（ｔ）の予測領域画像（２４Ｂｉｔ,ＲＧＢ）を、ＨＳＶ画像（８Ｂｉｔ）に変換する。このＨＳＶ画像は、色をＨ(Hue/色相,３Ｂｉｔ)、Ｓ(Saturation/彩度，３Ｂｉｔ)、Ｖ(Value/明度，２Ｂｉｔ)で表したものである。例えば、図４６に示すようなＲＧＢ画像をＨＳＶ画像に変換した場合、図４７に示すようなＨＳＶ画像が得られる。また、図４８は、図４７に示すＨＳＶ画像の手領域ａのヒストグラムを示すものである。このヒストグラムは、人の肌色に対して特徴的な分布を示す。 The hand region evaluation unit 44 converts the predicted region image (24 Bit, RGB) at the input predicted position s _k (t) into an HSV image (8 Bit). In this HSV image, colors are represented by H (Hue / hue, 3 bits), S (Saturation / saturation, 3 bits), and V (value / lightness, 2 bits). For example, when an RGB image as shown in FIG. 46 is converted into an HSV image, an HSV image as shown in FIG. 47 is obtained. FIG. 48 shows a histogram of the hand region a of the HSV image shown in FIG. This histogram shows a characteristic distribution with respect to human skin color.
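The conversion from 24-bit RGB to an 8-bit HSV code with 3 bits of hue, 3 bits of saturation and 2 bits of value can be sketched as below. The bit widths come from the description above; the bit order (hue in the high bits, value in the low bits), the uniform quantization and the function name `rgb_to_hsv8` are assumptions made for illustration. The sketch uses the standard-library `colorsys.rgb_to_hsv`, which takes and returns values in [0, 1].

```python
import colorsys


def rgb_to_hsv8(r, g, b):
    """Pack an 8-bit-per-channel RGB pixel into one byte: HHH SSS VV (assumed layout)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hq = min(int(h * 8), 7)   # 3 bits of hue, 8 levels
    sq = min(int(s * 8), 7)   # 3 bits of saturation, 8 levels
    vq = min(int(v * 4), 3)   # 2 bits of value, 4 levels
    return (hq << 5) | (sq << 2) | vq
```

Each pixel then maps to a single code in the range 0 to 255, which is the index k used when building a 256-bin histogram of the predicted region.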

そこで、予測領域のＨＳＶ画像の分布特性を、（１２）式により求める。 Therefore, the distribution characteristic of the HSV image in the prediction region is obtained by the equation (12).

この（１２）式において、ＨＳＶ（ｉ,ｊ）はＨＳＶ画像の輝度値であり、δ（ｘ）はｘ＝０のときに１、その他には０の値を与えるデルタ関数である。つまり、ｋが０〜２５５の値をとり、ＨＳＶ（ｉ,ｊ）の輝度値と同じ場合、分布特性関数ｈｉｓｔｏｇｒａｍ（ｋ）はｐ_{（ｉ,ｊ）}（ｒ）の値だけ増加する。 In this equation (12), HSV (i, j) is a luminance value of the HSV image, and δ (x) is a delta function that gives a value of 1 when x = 0, and 0 otherwise. That is, when k takes a value of 0 to 255 and is the same as the brightness value of HSV (i, j), the distribution characteristic function histogram (k) increases by the value of p _{(i, j)} (r).

ｐ_{（ｉ,ｊ）}（ｒ）は、（１３）式に示すように、現時刻の画像の座標位置（ｉ,ｊ）と予測領域の中心位置（ｘｃ,ｙｃ）との距離ｒに逆比例する関数であり、分布特性関数ｈｉｓｔｏｇｒａｍ（ｋ）の重み付けを行っている。つまり、中心付近の画素は貢献度が高くなり、中心から離れた画素になるにつれて貢献度が低くなる。これにより、背景の影響を低減することができる。 p _{(i, j)} (r) is inversely proportional to the distance r between the coordinate position (i, j) of the image at the current time and the center position (xc, yc) of the prediction region, as shown in equation (13). The distribution characteristic function histogram (k) is weighted. That is, the pixel near the center has a higher contribution, and the contribution becomes lower as the pixel becomes farther from the center. Thereby, the influence of a background can be reduced.

α(i, j) depends on, for example, a preset mask region. For example, when a face region detected by a face detector is used as the mask region and the prediction region includes part of the face region, α(i, j) is a function that lowers the probability that the pixel (i, j) belongs to the hand region. As shown in equation (15), α(i, j) is determined by whether the pixel (i, j) is included in a mask region such as a face region.
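A minimal sketch of such a mask-dependent factor is given below. Equation (15) is not reproduced in this text, so the rectangular mask, the function name alpha, and the value 0.5 used for masked pixels are all assumptions for illustration.

```python
def alpha(i, j, mask_rect, penalty=0.5):
    """Return a weight multiplier for pixel (i, j).

    mask_rect = (x0, y0, x1, y1) is a hypothetical face rectangle that
    includes x0, y0 and excludes x1, y1. Pixels inside it are down-weighted.
    """
    x0, y0, x1, y1 = mask_rect
    inside = x0 <= i < x1 and y0 <= j < y1
    return penalty if inside else 1.0
```

Multiplying each pixel's histogram contribution by this factor would reduce the influence of face pixels that fall inside a prediction region.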

Adjusting the probability based on the mask region as well as the histogram of the HSV image prevents hand region tracking errors even when the HSV values (characteristics) of the hand region closely resemble those of the face region, so that robust hand region tracking can be achieved.

The prediction region evaluation unit 44 also evaluates each prediction region by calculating, as in equation (16), the matching cost between the histogram calculated by equation (12) and the hand region model registered in the tracking result storage unit 47.
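One possible matching cost is sketched below. Equation (16) is not reproduced in this text; the Bhattacharyya-based distance used here is a substitute that is commonly used to compare normalized histograms, not necessarily the cost of the specification, and matching_cost is a hypothetical name.

```python
import math


def matching_cost(hist, model):
    """Distance between two normalized histograms (0 = identical, 1 = disjoint)."""
    # Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint ones.
    bc = sum(math.sqrt(p * q) for p, q in zip(hist, model))
    # Clamp at zero to guard against small floating-point overshoot.
    return math.sqrt(max(0.0, 1.0 - bc))
```

A lower cost means the prediction region resembles the registered hand region model more closely.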

Evaluating the similarity between the hand region model registered in the tracking result storage unit 47 and the prediction region by matching in this way makes robust hand region tracking possible.

The hand region tracking result estimation unit 45 includes a probability calculation unit 451 that calculates a probability from the matching cost calculated by the prediction region evaluation unit 44, and a tracking result estimation unit 452 that estimates the tracking result based on the probabilities of all prediction regions.

Using the matching cost calculated by the prediction region evaluation unit 44, the probability calculation unit 451 calculates the probability π_k(t) that the region is a hand region by equation (17). The calculated probability π_k(t) is stored in the tracking result storage unit 47 together with the predicted position s_k(t).
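The conversion from matching cost to probability might look like the following sketch. Equation (17) is not reproduced here, so the exponential mapping, the sharpness parameter, and the normalization over all predicted positions are assumptions.

```python
import math


def costs_to_probabilities(costs, sharpness=10.0):
    """Map matching costs to probabilities that sum to 1 (lower cost, higher probability)."""
    scores = [math.exp(-sharpness * c) for c in costs]  # assumed mapping
    total = sum(scores)
    return [s / total for s in scores]
```

Any monotonically decreasing mapping would preserve the ranking of the predicted positions; the exponential is used here only because it is simple and always positive.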

The tracking result estimation unit 452 estimates the tracking result (position) by equation (18) from the predicted position s_k(t) of each tracking target at the current time and the probability π_k(t) obtained from its similarity evaluation.

Here, S(t) denotes the position of the tracking target (for example, a hand or a face) at the current time, and K denotes the number of predicted positions at which the tracking target may exist at the current time.
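Assuming that equation (18), which is not reproduced in this text, is a probability-weighted mean of the K predicted positions, the estimate could be computed as in the sketch below. The function name and the (x, y) tuple representation are hypothetical.

```python
def estimate_position(positions, probs):
    """Probability-weighted mean of K predicted (x, y) positions."""
    total = sum(probs)  # dividing by the sum tolerates unnormalized probabilities
    x = sum(p * sx for (sx, _), p in zip(positions, probs)) / total
    y = sum(p * sy for (_, sy), p in zip(positions, probs)) / total
    return (x, y)


# Two candidates: the second is three times as likely as the first.
print(estimate_position([(0.0, 0.0), (10.0, 0.0)], [0.25, 0.75]))  # (7.5, 0.0)
```

Under this assumption the estimate is pulled toward the more probable candidates rather than jumping to the single best one.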

By estimating the tracking result (position) from the predicted position s_k(t) of each tracking target at the current time and the probability π_k(t) obtained from its similarity evaluation, the hand region tracking result estimation unit 45 can track the hand region at high speed and with high accuracy.

The tracking result output unit 46 outputs the hand region position S(t), size, probability, and the like estimated by the hand region tracking result estimation unit 45 to the tracking result storage unit 47.

The tracking result storage unit 47 stores the tracking result output from the tracking result output unit 46. It also supplies the tracking result of the previous time to the hand region position prediction unit 43. In addition, it supplies, for example, the hand region model at a position with a high estimated probability to the prediction region evaluation unit 44, thereby updating the initial hand region model.

Next, the overall operation of the hand region detection devices 10, 20, and 30 and the hand region tracking device 40 described above will be described with reference to the flowchart shown in FIG. 49.

First, the object tracking device, which includes the hand region detection devices 10, 20, and 30 and the hand region tracking device 40, determines whether it is in the tracking mode for tracking a hand region (step S1). That is, it determines whether a hand region to be tracked has been detected and registered.

If it is determined in step S1 that the device is not in the hand region tracking mode, that is, if a hand region needs to be registered, the process proceeds to step S2 and a hand region is detected. This detection can be performed using the hand region detection devices 10, 20, and 30 described above. Specifically, hand candidate regions are detected from the input image based on statistical human skin color features, the shape complexity of each hand candidate region is calculated, and the hand region is detected from the hand candidate regions based on this shape complexity.

In step S3, the object tracking device determines whether a hand region has been detected. If a hand region has been detected, the process proceeds to step S4, and the hand region is registered in the tracking result storage unit 47 as the hand region model. If no hand region has been detected, the process returns to step S1 and the mode is determined again.

If it is determined in step S1 that the device is in the tracking mode, the hand region position prediction unit 43 reads the tracking result of the previous time (t-1) from the tracking result storage unit 47 (step S5). The prediction region evaluation unit 44 reads the hand region model from the tracking result storage unit 47 (step S6).

In step S7, the hand region tracking device 40 performs hand region tracking processing as shown in FIGS. 50 to 53. FIGS. 50 to 53 show, respectively, an example input image at the previous time (t-1), example predicted hand region positions at the current time (t), an example image at the current time (t), and an example estimated hand region position at the current time (t).

First, the hand region position prediction unit 43 calculates predicted hand region positions at the current time (t) from the tracking result of the previous time (t-1) read in step S5, based on a hand motion prediction model. For example, when a tracking result s_k(t-1) such as that shown in FIG. 50 is input, predicted hand region positions s_k1(t), s_k2(t), s_k3(t), s_k4(t), and s_k5(t) such as those shown in FIG. 51 are calculated based on the hand motion prediction model. The prediction region evaluation unit 44 then evaluates the similarity of each predicted hand region position obtained by the hand region position prediction unit 43, based on the current-time image such as that shown in FIG. 52.
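The generation of several predicted positions from one previous result can be sketched as follows. The hand motion prediction model itself is defined elsewhere in the specification; here a constant-velocity shift plus uniform random scatter inside a square search range is assumed, and all names are hypothetical.

```python
import random


def predict_positions(prev_pos, velocity, search_radius, k=5, seed=None):
    """Return k candidate (x, y) positions for the current time.

    prev_pos and velocity are (x, y) tuples from the previous tracking result.
    """
    rng = random.Random(seed)
    # Assumed constant-velocity motion: shift the previous position by the velocity.
    cx = prev_pos[0] + velocity[0]
    cy = prev_pos[1] + velocity[1]
    # Scatter candidates uniformly inside the assumed prediction range.
    return [
        (cx + rng.uniform(-search_radius, search_radius),
         cy + rng.uniform(-search_radius, search_radius))
        for _ in range(k)
    ]
```

With k=5 this yields five candidates, corresponding in number to s_k1(t) through s_k5(t) in the example of FIG. 51.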

The hand region tracking result estimation unit 45 calculates, for each predicted position evaluated by the prediction region evaluation unit 44, the probability π_k(t) from its similarity evaluation against the hand region model. For example, for the image shown in FIG. 52, the predicted positions in descending order of probability are s_k5(t), s_k3(t), s_k2(t), s_k4(t), and s_k1(t). Then, as shown in FIG. 53, the tracking position S(t) is estimated from the predicted position s_k(t) of each tracking target at the current time and the probability π_k(t) obtained from its similarity evaluation.

In step S8, the hand region tracking device 40 stores tracking results such as the tracking position S(t), size, and probability estimated in the tracking processing of step S7 in the tracking result storage unit 47, and ends the hand region tracking processing for the current time (t). The time is then incremented, and the tracking processing for the next time is performed.
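The control flow of steps S1 to S8 can be summarized in the sketch below. The callables detect_hand and track_hand, the dictionary used as storage, and the seeding of the first tracking result with the detected region are hypothetical stand-ins for the detection devices, the tracking device, and the tracking result storage unit 47.

```python
def process_frame(state, frame, detect_hand, track_hand):
    """Run one pass of the S1-S8 flow for a single frame and return the state."""
    if state.get("model") is None:      # S1: no registered hand region, so not tracking
        region = detect_hand(frame)     # S2: try to detect a hand region
        if region is not None:          # S3: was a hand region detected?
            state["model"] = region     # S4: register it as the hand region model
            state["result"] = region    # assumption: it also seeds the previous result
        return state                    # back to S1 on the next frame
    prev = state["result"]              # S5: read the previous tracking result
    model = state["model"]              # S6: read the hand region model
    state["result"] = track_hand(frame, prev, model)  # S7: track, S8: store
    return state
```

Calling this function once per frame reproduces the loop of the flowchart: detection repeats until a hand region is registered, after which every frame goes through the tracking branch.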

In this way, an evaluation value (likelihood) that each prediction region is a hand region is calculated by evaluating the similarity between the image features of the predicted hand region and the image features of the hand region model registered by hand region detection (initialization); the probability that each prediction region is a hand region is calculated from this evaluation value; and the final hand region tracking result is determined from the positions and probabilities of these prediction regions. The hand region can thereby be tracked at high speed and with high accuracy.

In the hand region tracking device 40 described above, distance information may also be used to reduce the influence of the background, for example when there is a background region whose HSV distribution characteristics resemble those of the hand region. FIG. 54 shows the configuration of a hand region tracking device 50 that estimates distance information using, for example, a stereo camera and removes regions at or beyond a predetermined distance as background. Components identical to those of the hand region tracking device 40 described above are given the same reference numerals, and their description is omitted.

The hand region tracking device 50 includes: an imaging unit 51; a distance information estimation unit 52 that estimates distance information of objects in the image based on the images captured by the imaging unit 51; a distance image calculation unit 53 that calculates the image region of objects at a predetermined distance based on the distance information estimated by the distance information estimation unit 52; a background image removal unit 54 that removes the background image located farther away than the distance image calculated by the distance image calculation unit 53; a hand region position prediction unit 43 that predicts estimated hand region positions at the current time from the hand region tracking result of the previous time; a prediction region evaluation unit 44 that evaluates the estimated hand region positions predicted by the hand region position prediction unit 43 based on the distribution characteristics of the current-time image; a hand region tracking result estimation unit 45 that estimates the hand region position at the current time based on the evaluation results of the prediction region evaluation unit 44; a tracking result output unit 46 that outputs the hand region position at the current time estimated by the hand region tracking result estimation unit 45; and a tracking result storage unit 47 that stores the current-time hand region output from the tracking result output unit 46.

The imaging unit 51 consists of two or more cameras installed in a predetermined positional relationship.

The distance information estimation unit 52 uses images captured from a plurality of viewpoints (projection centers) having a predetermined positional relationship to estimate, by the principle of so-called triangulation, the distance between each point in the captured image and the projection center.
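For a rectified pair of parallel cameras, triangulation reduces to the standard relation Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity in pixels. The specification does not state the camera geometry, so the rectified setup, the function name, and the parameter names in the sketch below are assumptions.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or a failed match).
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

For instance, with a focal length of 500 pixels and a baseline of 0.1 m, a disparity of 50 pixels corresponds to a depth of about 1 m, and halving the disparity doubles the depth.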

The distance image calculation unit 53 uses a preset threshold D_0 on the distance (depth) from the camera to calculate a distance image consisting of, for example, the pixels whose distance is smaller than the threshold D_0.

The background image removal unit 54 masks the region outside the distance image calculated by the distance image calculation unit 53 as the background image.

FIGS. 55 to 58 show an example of image processing in the hand region tracking device 50, which removes regions at or beyond a predetermined distance as background. For example, when an image such as that shown in FIG. 55 is captured by the imaging unit 51, the hand region tracking device 50 estimates the distance information of each pixel with the distance information estimation unit 52 and, as shown in FIG. 56, cuts out the person region at a distance smaller than a threshold D_0 as the foreground with the distance image calculation unit 53. The background image removal unit 54 generates a mask image that masks everything other than the person region cut out by the distance image calculation unit 53. In this mask image, for example, the gray level of the mask region is set to 0, as shown in FIG. 57. The background image removal unit 54 then removes the background from the image captured by the imaging unit 51 using the mask image. The current-time image input to the prediction region evaluation unit 44 is therefore an HSV image from which the background has been removed by the mask image, as shown in FIG. 58. The subsequent processing is the same as in the hand region tracking device 40 described above.
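The thresholding and masking steps can be sketched together as follows. The nested-list image representation, the function name remove_background, and the use of one depth value per pixel are assumptions; the fill value 0 follows the mask gray level mentioned for FIG. 57.

```python
def remove_background(image, depth, d0, fill=0):
    """Keep pixels closer than d0 and replace all others with the fill value.

    image and depth are 2-D lists of the same shape; depth holds one
    distance per pixel in the same unit as d0.
    """
    return [
        [pix if d < d0 else fill for pix, d in zip(img_row, depth_row)]
        for img_row, depth_row in zip(image, depth)
    ]


# Pixels at 2.0 and 1.0 are not closer than d0 = 1.0, so they are masked.
print(remove_background([[10, 20], [30, 40]], [[0.5, 2.0], [1.0, 0.9]], 1.0))
# [[10, 0], [0, 40]]
```

Pixels exactly at the threshold are treated as background here, matching the description of removing regions at or beyond the predetermined distance.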

Removing skin color candidate regions at or beyond the predetermined distance as background in this way means that, for example, even if another person's hand is in the background, only the hand region within the predetermined distance is tracked, which further improves robustness in real environments.

The present invention is not limited to the embodiments described above, and various modifications are of course possible without departing from the gist of the invention. For example, although the present embodiment tracks a human hand region as the object in the image, a face region may be tracked instead. In that case, as with hand region tracking, the face region is first detected and registered by a face detector. The face region can then be tracked by the tracking processing described above.

Although the above embodiment has been described as a hardware configuration, the invention is not limited to this, and any of the processing can also be realized by causing a CPU (Central Processing Unit) to execute a computer program. In that case, the computer program can be provided recorded on a recording medium, or provided by transmission over the Internet or another transmission medium.

FIG. 1 is a diagram for explaining the overall processing of a hand region tracking device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a hand region detection device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of an input image.
FIG. 4 is a diagram showing an example of image processing using RGB information.
FIG. 5 is a diagram showing an example of selecting skin color candidate regions.
FIG. 6 is a diagram showing an example image in which skin color candidate regions have been selected.
FIG. 7 is a diagram showing an example of image processing by hue.
FIG. 8 is a diagram showing an example of the hue distribution of a skin color candidate region.
FIG. 9 is a diagram showing an example of image processing by saturation.
FIG. 10 is a diagram showing an example of the saturation distribution of a skin color candidate region.
FIG. 11 is a diagram for explaining detection of the skin color hue region.
FIG. 12 is a diagram for explaining detection of the skin color saturation region.
FIG. 13 is a diagram showing an example of detecting the hue region in a hue image.
FIG. 14 is a diagram showing an example of detecting the hue region in a saturation image.
FIG. 15 is a diagram showing an example of detecting hand candidate regions.
FIG. 16 is a diagram for explaining complexity according to the shape of the hand.
FIG. 17 is a diagram showing an example of an input image with the hand closed.
FIG. 18 is a diagram showing the calculated shape complexity with the hand closed.
FIG. 19 is a diagram showing an example of an input image with two fingers extended.
FIG. 20 is a diagram showing the calculated shape complexity with two fingers extended.
FIG. 21 is a diagram showing an example of an input image with the hand open.
FIG. 22 is a diagram showing the calculated shape complexity with the hand open.
FIG. 23 is a diagram for explaining shape complexity based on the distance from the center of a hand candidate region.
FIG. 24 is a diagram showing the calculated distance from the center.
FIG. 25 is a block diagram showing another configuration (No. 2) of the hand region detection device according to an embodiment of the present invention.
FIG. 26 is a diagram showing an example of an image at time t.
FIG. 27 is a diagram showing an example of an image at time t+1.
FIG. 28 is a diagram showing an example of detecting a motion region.
FIG. 29 is a diagram showing an example in which a hand region is detected from a motion region.
FIG. 30 is a diagram showing an example of detecting skin color candidate regions.
FIG. 31 is a diagram showing an example of determining the hand region.
FIG. 32 is a diagram showing an example of detecting skin color candidate regions when motion region detection is not performed.
FIG. 33 is a diagram showing an example of a hand region detection result when motion region detection is not performed.
FIG. 34 is a block diagram showing another configuration (No. 3) of the hand region detection device according to an embodiment of the present invention.
FIG. 35 is a diagram showing an example of an input image.
FIG. 36 is a diagram showing an example of detecting skin color candidate regions.
FIG. 37 is a diagram showing an example of a hand region detection result.
FIG. 38 is a diagram showing an example in which the distance of hand candidate regions is estimated from distance information.
FIG. 39 is a diagram showing a hand region detection result based on distance information.
FIG. 40 is a block diagram showing the configuration of a hand region tracking device according to an embodiment of the present invention.
FIG. 41 is a diagram for explaining the processing in the hand region position prediction unit.
FIG. 42 is a diagram showing an example of an image when a hand region was detected at the previous time.
FIG. 43 is a diagram showing an example of an image when the area of the hand region is measured by binarization.
FIG. 44 is a diagram showing an example of an image when no hand region was detected at the previous time.
FIG. 45 is a diagram showing an example of an image when the area of the hand region is measured by binarization.
FIG. 46 is a diagram showing an example of a captured RGB image.
FIG. 47 is a diagram showing an example of an RGB image converted into an HSV image.
FIG. 48 is a diagram showing the histogram of the hand region in an HSV image.
FIG. 49 is a flowchart of the tracking operation in an embodiment of the present invention.
FIG. 50 is a diagram showing an example image of the hand region position at the previous time.
FIG. 51 is a diagram showing an example of predicted hand region positions at the current time.
FIG. 52 is a diagram showing an example image used when the predicted hand region positions are evaluated.
FIG. 53 is a diagram showing an example image of the tracked position of the hand region.
FIG. 54 is a block diagram showing another configuration (No. 2) of the hand region tracking device according to an embodiment of the present invention.
FIG. 55 is a diagram showing an example of an image at the current time.
FIG. 56 is a diagram showing an example of a distance image based on distance information.
FIG. 57 is a diagram showing an example of a mask image.
FIG. 58 is a diagram showing an example of an image after background removal using the mask image.

Explanation of symbols

10 hand region detection device, 11 imaging unit, 12 image capture unit, 13 skin color candidate region selection unit, 14 hand candidate region extraction unit, 15 shape complexity calculation unit, 16 hand region detection unit, 20 hand region detection device, 21 delay circuit unit, 22 motion region detection unit, 23 hand region determination unit, 30 hand region detection device, 31 reference imaging unit, 32 distance information acquisition unit, 33 hand region determination unit
40 hand region tracking device, 41 imaging unit, 42 image capture unit, 43 hand region position prediction unit, 44 prediction region evaluation unit, 45 hand region tracking result estimation unit, 46 tracking result output unit, 47 tracking result storage unit, 50 hand region tracking device, 51 imaging unit, 52 distance information estimation unit, 53 distance image calculation unit, 54 background image removal unit

Claims

1. An object tracking device that tracks an object to be tracked from an image, comprising:
position prediction means for calculating, for an estimated object used to estimate the object position of the object, a prediction range within which the estimated object moves from information on the estimated object in the previous-time image, and for predicting estimated object positions in the current-time image based on the prediction range and on the estimated object position and movement speed of the estimated object in the previous-time image;
position evaluation means for evaluating the plurality of estimated object positions predicted by the position prediction means based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and
estimation means for estimating the object position in the current-time image based on each estimated object position evaluated by the position evaluation means.

2. The object tracking device according to claim 1, wherein the position prediction means calculates the prediction range based on one of the size, color, and shape of the object in the previous-time image and the movement speed.

3. The object tracking device according to claim 1, wherein the position prediction means predicts the estimated object positions in the current-time image independently from each of a plurality of estimated objects in the previous-time image.

4. The object tracking device according to claim 1, wherein the position evaluation means evaluates each estimated object position predicted by the position prediction means based on the similarity between the distribution characteristics of the current-time image of the estimated object and the distribution characteristics of the pre-registered object image.

5. The object tracking device according to claim 4, wherein the position evaluation means evaluates the similarity between the distribution characteristics of the current-time image of the estimated object and the distribution characteristics of the pre-registered object image based on the distance from the center of the estimated object position predicted by the position prediction means.

6. The object tracking device according to claim 4, wherein the position evaluation means evaluates the similarity between the distribution characteristics of the current-time image of the estimated object and the distribution characteristics of the pre-registered object image based on another object tracked at the same time.

7. The object tracking device according to claim 6, wherein the position evaluation means changes the evaluation of the estimated object position based on the evaluation of the other object when the estimated object position predicted by the position prediction means is included in the object region of the other object.

8. The object tracking device according to claim 6, wherein the other object is a human face region.

9. The object tracking device according to claim 1, wherein the tracking means calculates, based on the evaluation result of the object position, the probability that each estimated object position evaluated by the position evaluation means is an object region, and estimates the object position in the current-time image based on the probability of each estimated object position.

10. The object tracking device according to claim 1, further comprising:
distance information acquisition means for acquiring distance information obtained by estimating the position of each pixel in three-dimensional space based on images from a plurality of cameras having a predetermined positional relationship; and
background removal means for removing the background of the object from the image based on the distance information acquired by the distance information acquisition means.

11. The object tracking device according to claim 1, wherein the object is a human hand region.

12. An object tracking method for tracking an object to be tracked from an image, comprising:
a position prediction step of calculating, for an estimated object used to estimate the object position of the object, a prediction range within which the estimated object moves from information on the estimated object in the previous-time image, and of predicting estimated object positions in the current-time image based on the prediction range and on the estimated object position and movement speed of the estimated object in the previous-time image;
a position evaluation step of evaluating the plurality of estimated object positions predicted in the position prediction step based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and
an estimation step of estimating the object position in the current-time image based on each estimated object position evaluated in the position evaluation step.

13. A program for executing processing that tracks an object to be tracked from an image, the processing comprising:
a position prediction step of calculating, for an estimated object used to estimate the object position of the object, a prediction range within which the estimated object moves from information on the estimated object in the previous-time image, and of predicting estimated object positions in the current-time image based on the prediction range and on the estimated object position and movement speed of the estimated object in the previous-time image;
a position evaluation step of evaluating the plurality of estimated object positions predicted in the position prediction step based on the object similarity between the current-time image of the estimated object and a pre-registered object image; and
an estimation step of estimating the object position in the current-time image based on each estimated object position evaluated in the position evaluation step.