JP4153818B2

JP4153818B2 - Gesture recognition device, gesture recognition method, and gesture recognition program

Info

Publication number: JP4153818B2
Application number: JP2003096271A
Authority: JP
Inventors: 信男檜垣; 貴通嶋田
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2003-03-31
Filing date: 2003-03-31
Publication date: 2008-09-24
Anticipated expiration: 2023-03-31
Also published as: JP2004302992A

Abstract

PROBLEM TO BE SOLVED: To provide a gesture recognition device capable of reducing the amount of calculation required for posture recognition processing or gesture recognition processing. SOLUTION: The gesture recognition device 4 is provided with: a face and finger position detection means 41 for detecting a face position and finger positions in the real space of a target person; and a posture/gesture recognition means 42 for recognizing the posture or gesture of the target person on the basis of the face position and finger positions detected by the face and finger position detection means 41. The posture/gesture recognition means 42 detects the "relative positional relation between the face position and the finger positions" and the "displacement of finger positions based on the face position" from "the face position in the real space" and "the finger positions in the real space", and compares the detected results with the posture data or the gesture data stored in the posture/gesture data storage part 42A to recognize the posture or gesture of the target person. COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、カメラによって対象人物を撮像した画像から、対象人物のポスチャ（姿勢）又はジェスチャ（動作）を認識するための装置、方法及びプログラムに関する。
【０００２】
【従来の技術】
従来、カメラによって対象人物を撮像した画像から、対象人物の動きの特徴を示す点（特徴点）を検出し、その特徴点に基づいて対象人物のジェスチャを推定するジェスチャ認識手法が数多く提案されている（例えば、特許文献１参照）。
【０００３】
【特許文献１】
特開２０００−１４９０２５号公報（第３−６頁、第１図）
【０００４】
【発明が解決しようとする課題】
しかし、従来のジェスチャ認識手法では、対象人物のジェスチャを認識する際に、前記特徴点を一々検出する必要があるため、ポスチャ認識処理又はジェスチャ認識処理に要する計算量が多くなるという問題があった。
【０００５】
本発明は、以上のような問題点に鑑みてなされたものであり、ポスチャ認識処理又はジェスチャ認識処理に要する計算量を減らすことのできるジェスチャ認識装置を提供することを目的とする。
【０００６】
【課題を解決するための手段】
請求項１に記載のジェスチャ認識装置は、カメラによって対象人物を撮像した画像から、前記対象人物のポスチャ又はジェスチャを認識するための装置であって、前記撮像画像から生成した前記対象人物の輪郭情報と肌色領域情報に基づいて、前記対象人物の実空間上における顔位置と手先位置を検出する顔・手先位置検出手段と、前記顔位置と手先位置から、前記顔位置と前記手先位置との相対的な位置関係及び前記顔位置を基準とした際の前記手先位置の変動を検出し、その検出結果と、顔位置と手先位置との相対的な位置関係及び顔位置を基準とした際の手先位置の変動に対応するポスチャ又はジェスチャを記したポスチャデータ又はジェスチャデータとを比較することにより、前記対象人物のポスチャ又はジェスチャを認識するポスチャ・ジェスチャ認識手段と、を備え、前記ポスチャ・ジェスチャ認識手段は、前記相対的な位置関係について、前記対象人物の手が入る大きさの判定領域を設定し、前記手の面積と前記判定領域の面積とを比較することにより、前記顔位置と前記手先位置との相対的な位置関係が類似しているポスチャ又はジェスチャを区別することを特徴とする。
【０００７】
この装置は、まず、顔・手先位置検出手段によって、画像から生成した対象人物の輪郭情報と肌色領域情報に基づいて、対象人物の実空間上における顔位置と手先位置とを検出する。次に、ポスチャ・ジェスチャ認識手段によって、「顔位置及び手先位置から、顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」を検出する。そして、その検出結果と、「顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」に対応するポスチャ又はジェスチャを記したポスチャデータ又はジェスチャデータとを比較することにより、対象人物のポスチャ又はジェスチャを認識する。
【０００８】
なお、ポスチャ・ジェスチャ認識手段が調べる「顔位置と手先位置との相対的な位置関係」とは、具体的には、「顔位置及び手先位置の高さ」及び「顔位置及び手先位置のカメラからの距離」のことである（請求項２）。このように構成すると、「顔位置の高さ」と「手先位置の高さ」との比較、及び「顔位置のカメラからの距離」と「手先位置のカメラからの距離」との比較により、「顔位置と手先位置との相対的な位置関係」を容易に検出することができる。また、ポスチャ・ジェスチャ認識手段は、「画像上における顔位置及び手先位置の水平方向のずれ」を調べることにより、「顔位置と手先位置との相対的な位置関係」を検出することもできる。
【０００９】
また、ポスチャ・ジェスチャ認識手段は、対象人物のポスチャ又はジェスチャを認識する際に、パターンマッチング法を用いることもできる（請求項３）。このように構成すると、「顔位置と手先位置との相対的な位置関係」と「顔位置を基準とした際の手先位置の変動」とからなる「入力パターン」を、予め記憶しておいたポスチャデータ又はジェスチャデータと重ね合わせて、最も似ているパターンを探すことにより、対象人物のポスチャ又はジェスチャを容易に認識することができる。
【００１０】
また、ポスチャ・ジェスチャ認識手段は、対象人物の手が入る大きさの判定領域を設定し、手の面積と判定領域の面積とを比較することにより、顔位置と手先位置との相対的な位置関係が類似しているポスチャ又はジェスチャを区別することができる（請求項１）。このように構成すると、例えば、共に手先位置の高さが顔位置の高さよりも低く、カメラから手先位置までの距離がカメラから顔位置までの距離よりも短いため、互いに区別しづらい、「ＨＡＮＤＳＨＡＫＥ」というポスチャ（図９（ｄ）参照）と、「ＣＯＭＥＨＥＲＥ」というジェスチャ（図１０（ｃ）参照）とを区別することが可能となる。具体的には、手先の面積が判定領域である判定円の面積の１／２よりも大きい場合は「ＣＯＭＥＨＥＲＥ」であると判定し、手先の面積が判定円の面積の１／２以下である場合は「ＨＡＮＤＳＨＡＫＥ」であると判定することにより、両者を区別する。
【００１１】
請求項４に記載のジェスチャ認識方法は、カメラによって対象人物を撮像した画像から、前記対象人物のポスチャ又はジェスチャを認識するための方法であって、前記画像から生成した前記対象人物の輪郭情報と肌色領域情報に基づいて、前記対象人物の実空間上における顔位置と手先位置を顔・手先位置検出手段により検出する顔・手先位置検出ステップと、前記顔位置と手先位置から、ポスチャ・ジェスチャ認識手段により、前記顔位置と前記手先位置との相対的な位置関係及び前記顔位置を基準とした際の前記手先位置の変動を検出し、その検出結果と、顔位置と手先位置との相対的な位置関係及び顔位置を基準とした際の手先位置の変動に対応するポスチャ又はジェスチャを記したポスチャデータ又はジェスチャデータとを比較することにより、前記対象人物のポスチャ又はジェスチャを認識するポスチャ・ジェスチャ認識ステップと、を含み、前記ポスチャ・ジェスチャ認識ステップは、前記ポスチャ・ジェスチャ認識手段により、前記相対的な位置関係について、前記対象人物の手が入る大きさの判定領域を設定し、前記手の面積と前記判定領域の面積とを比較することにより、前記顔位置と前記手先位置との相対的な位置関係が類似しているポスチャ又はジェスチャを区別することを特徴とする。
【００１２】
この方法は、まず、顔・手先位置検出ステップにおいて、画像から生成した対象人物の輪郭情報と肌色領域情報とに基づいて、対象人物の実空間上における顔位置と手先位置とを検出する。次に、ポスチャ・ジェスチャ認識ステップにおいて、「顔位置及び手先位置から顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」を検出する。そして、その検出結果と、「顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」に対応するポスチャ又はジェスチャを記したポスチャデータ又はジェスチャデータとを比較することにより、対象人物のポスチャ又はジェスチャを認識する。
【００１３】
請求項５に記載のジェスチャ認識プログラムは、カメラによって対象人物を撮像した画像から、前記対象人物のポスチャ又はジェスチャを認識するために、コンピュータを、前記画像から生成した前記対象人物の輪郭情報と肌色領域情報に基づいて、前記対象人物の実空間上における顔位置と手先位置を検出する顔・手先位置検出手段、前記顔位置と手先位置から、前記顔位置と前記手先位置との相対的な位置関係及び前記顔位置を基準とした際の前記手先位置の変動を検出し、その検出結果と、顔位置と手先位置との相対的な位置関係及び顔位置を基準とした際の手先位置の変動に対応するポスチャ又はジェスチャを記したポスチャデータ又はジェスチャデータとを比較することにより、前記対象人物のポスチャ又はジェスチャを認識するポスチャ・ジェスチャ認識手段、として機能させ、前記ポスチャ・ジェスチャ認識手段は、前記相対的な位置関係について、前記対象人物の手が入る大きさの判定領域を設定し、前記手の面積と前記判定領域の面積とを比較することにより、前記顔位置と前記手先位置との相対的な位置関係が類似しているポスチャ又はジェスチャを区別するようにすることを特徴とする。
【００１４】
このプログラムは、まず、顔・手先位置検出手段によって、画像から生成した対象人物の輪郭情報と肌色領域情報に基づいて、対象人物の実空間上における顔位置と手先位置を検出する。次に、ポスチャ・ジェスチャ認識手段によって、「顔位置及び手先位置から顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」を検出する。そして、その検出結果と、「顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」に対応するポスチャ又はジェスチャを記したポスチャデータ又はジェスチャデータとを比較することにより、対象人物のポスチャ又はジェスチャを認識する。
【００１５】
【発明の実施の形態】
以下、本発明の実施の形態について、適宜図面を参照して詳細に説明する。ここでは、まず、本発明に係るジェスチャ認識装置を含むジェスチャ認識システムの構成について図１〜図１９を参照して説明し、その後、ジェスチャ認識システムの動作について図２０及び図２１を参照して説明する。
【００１６】
（ジェスチャ認識システムＡの構成）
まず、本発明に係るジェスチャ認識装置４を含むジェスチャ認識システムＡの全体構成について図１を参照して説明する。図１はジェスチャ認識システムＡの全体構成を示すブロック図である。
【００１７】
図１に示すように、ジェスチャ認識システムＡは、図示しない対象人物を撮像する２台のカメラ１（１ａ，１ｂ）と、カメラ１で撮像された画像（撮像画像）を解析して各種情報を生成する撮像画像解析装置２と、撮像画像解析装置２で生成された各種情報に基づいて対象人物の輪郭を抽出する輪郭抽出装置３と、撮像画像解析装置２で生成された各種情報と、輪郭抽出装置３で抽出された対象人物の輪郭（輪郭情報）に基づいて、対象人物のポスチャ（姿勢）又はジェスチャ（動作）を認識するジェスチャ認識装置４とから構成されている。以下、カメラ１、撮像画像解析装置２、輪郭抽出装置３、ジェスチャ認識装置４について、順に説明する。
【００１８】
（カメラ１）
カメラ１（１ａ，１ｂ）はカラーＣＣＤカメラであり、右カメラ１ａと左カメラ１ｂは、左右に距離Ｂだけ離れて並設されている。ここでは、右カメラ１ａを基準カメラとしている。カメラ１ａ，１ｂで撮像された画像（撮像画像）は、フレーム毎に図示しないフレームグラバに記憶された後、撮像画像解析装置２に同期して入力される。
【００１９】
なお、カメラ１ａ，１ｂで撮像した画像（撮像画像）は、図示しない補正機器によりキャリブレーション処理とレクティフィケーション処理を行い、画像補正した後に撮像画像解析装置２に入力される。
【００２０】
（撮像画像解析装置２）
撮像画像解析装置２は、カメラ１ａ，１ｂから入力された画像（撮像画像）を解析して、「距離情報」、「動き情報」、「エッジ情報」、「肌色領域情報」を生成する装置である（図１参照）。
【００２１】
図２は、図１に示したジェスチャ認識システムＡに含まれる撮像画像解析装置２と輪郭抽出装置３の構成を示すブロック図である。図２に示すように、撮像画像解析装置２は、「距離情報」を生成する距離情報生成部２１と、「動き情報」を生成する動き情報生成部２２と、「エッジ情報」を生成するエッジ情報生成部２３と、「肌色領域情報」を生成する肌色領域情報生成部２４とから構成されている。
【００２２】
（距離情報生成部２１）
距離情報生成部２１は、同時刻にカメラ１ａ，１ｂで撮像された２枚の撮像画像の視差に基づいて、各画素についてカメラ１からの距離を検出する。具体的には、基準カメラであるカメラ１ａで撮像された第１の撮像画像と、カメラ１ｂで撮像された第２の撮像画像とからブロック相関法を用いて視差を求め、その視差から三角法を用いて、カメラ１から「各画素に撮像された物」までの距離を求める。そして、求めた距離を第１の撮像画像の各画素に対応付けて、距離を画素値で表現した距離画像Ｄ１（図３（ａ）参照）を生成する。この距離画像Ｄ１が距離情報となる。図３（ａ）の例では、同一の距離に対象人物Ｃが存在している。
【００２３】
なお、ブロック相関法とは、第１の撮像画像と第２の撮像画像とで特定の大きさの同一ブロック（例えば８×３画素）を比較し、第１の撮像画像と第２の撮像画像とでブロック内の被写体が何画素分ずれているかを調べることにより視差を検出する方法である。
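The block correlation and triangulation steps of the distance information generation unit 21 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes rectified grayscale images stored as lists of rows, uses the sum of absolute differences as the block similarity measure (the text does not name one), reads "8×3 pixels" as 8 wide by 3 high, and applies the standard rectified-stereo relation Z = f·B/d. The names `sad`, `block_disparity`, `depth_from_disparity`, `focal_px`, and `max_disp` are hypothetical.

```python
def sad(ref, other, x, y, shift, bw, bh):
    """Sum of absolute differences between the block at (x, y) in ref
    and the block at (x + shift, y) in other."""
    return sum(
        abs(ref[y + j][x + i] - other[y + j][x + i + shift])
        for j in range(bh)
        for i in range(bw)
    )


def block_disparity(ref, other, x, y, bw=8, bh=3, max_disp=16):
    """Return the horizontal shift (pixels) that best aligns the block.

    Assumption: ref comes from the right (reference) camera, so the same
    object appears further to the right in the other image.
    """
    width = len(ref[0])
    shifts = [d for d in range(max_disp + 1) if x + bw + d <= width]
    return min(shifts, key=lambda d: sad(ref, other, x, y, d, bw, bh))


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulation for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # no measurable parallax
    return focal_px * baseline_m / disparity_px
```

Repeating this for every pixel of the first captured image and storing the resulting depth as the pixel value would give a distance image in the sense of D1.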
【００２４】
（動き情報生成部２２）
動き情報生成部２２は、基準カメラであるカメラ１ａで時系列に撮像した「時刻ｔ」における「撮像画像（ｔ）」と、「時刻ｔ＋Δｔ」における「撮像画像（ｔ＋Δｔ）」との差分に基づいて、対象人物の動きを検出する。具体的には、「撮像画像（ｔ）」と「撮像画像（ｔ＋Δｔ）」との差分をとり、各画素の変位を調べる。そして、調べた変位に基づいて変位ベクトルを求め、求めた変位ベクトルを画素値で表わした差分画像Ｄ２（図３（ｂ）参照）を生成する。この差分画像Ｄ２が動き情報となる。図３（ｂ）の例では、対象人物Ｃの左腕に動きが検出されている。
【００２５】
（エッジ情報生成部２３）
エッジ情報生成部２３は、基準カメラであるカメラ１ａで撮像された画像（撮像画像）における各画素の濃淡情報又は色情報に基づいて、その撮像画像内に存在するエッジを抽出したエッジ画像を生成する。具体的には、撮像画像における各画素の輝度に基づいて、輝度が大きく変化する部分をエッジとして検出し、そのエッジのみからなるエッジ画像Ｄ３（図３（ｃ）参照）を生成する。このエッジ画像Ｄ３がエッジ情報となる。
【００２６】
エッジの検出は、例えばＳｏｂｅｌオペレータを画素毎に乗算し、行又は列単位で、隣の線分と所定の差がある線分をエッジ（横エッジ又は縦エッジ）として検出する。なお、Ｓｏｂｅｌオペレータとは、ある画素の近傍領域の画素に対して重み係数を持つ係数行列のことである。
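As a rough illustration of edge detection with the Sobel operator, the sketch below applies the two standard 3×3 Sobel kernels to a grayscale image and marks pixels whose gradient magnitude exceeds a threshold. The thresholding rule and the names `sobel_edges` and `threshold` are assumptions made for this example; the text only says that segments differing from their neighbours by a predetermined amount are treated as edges.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to horizontal edges


def sobel_edges(gray, threshold):
    """Return a binary edge image (1 = edge) for a 2-D list of luminance values."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left as non-edges
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges
```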
【００２７】
（肌色領域情報生成部２４）
肌色領域情報生成部２４は、基準カメラであるカメラ１ａで撮像された画像（撮像画像）から、その撮像画像内に存在する対象人物の肌色領域を抽出する。具体的には、撮像画像における全画素のＲＧＢ値を、色相、明度、彩度からなるＨＬＳ空間に変換し、色相、明度、彩度が予め設定された閾値の範囲内にある画素を肌色領域として抽出する（図３（ｄ）参照）。図３（ｄ）の例では、対象人物Ｃの顔が肌色領域Ｒ１として抽出され、手が肌色領域Ｒ２として抽出されている。この肌色領域Ｒ１，Ｒ２が肌色領域情報となる。
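The skin color extraction described above (convert RGB to HLS, then keep pixels whose hue, lightness, and saturation fall inside preset ranges) can be sketched with Python's standard `colorsys` module. The numeric ranges below are placeholder assumptions chosen only to make the example run; the patent does not disclose its thresholds, and `is_skin` and `skin_mask` are hypothetical names.

```python
import colorsys


def is_skin(r, g, b,
            h_range=(0.0, 0.11), l_range=(0.2, 0.85), s_range=(0.15, 1.0)):
    """True if an 8-bit RGB pixel falls inside the (assumed) HLS skin ranges."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return (h_range[0] <= h <= h_range[1]
            and l_range[0] <= l <= l_range[1]
            and s_range[0] <= s <= s_range[1])


def skin_mask(rgb_image):
    """Return a 2-D list of booleans marking skin-colored pixels."""
    return [[is_skin(*pixel) for pixel in row] for row in rgb_image]
```

Connected regions of `True` pixels in the mask would then play the role of the skin color areas R1 (face) and R2 (hand).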
【００２８】
撮像画像解析装置２で生成された「距離情報（距離画像Ｄ１）」、「動き情報（差分画像Ｄ２）」、「エッジ情報（エッジ画像Ｄ３）」は、輪郭抽出装置３に入力される。また、撮像画像解析装置２で生成された「距離情報（距離画像Ｄ１）」と「肌色領域情報（肌色領域Ｒ１，Ｒ２）」は、ジェスチャ認識装置４に入力される。
【００２９】
（輪郭抽出装置３）
輪郭抽出装置３は、撮像画像解析装置２で生成された「距離情報（距離画像Ｄ１）」、「動き情報（差分画像Ｄ２）」、「エッジ情報（エッジ画像Ｄ３）」に基づいて、対象人物の輪郭を抽出する装置である（図１参照）。
【００３０】
図２に示すように、輪郭抽出装置３は、対象人物が存在する距離である「対象距離」を設定する対象距離設定部３１と、「対象距離」に基づいた「対象距離画像」を生成する対象距離画像生成部３２と、「対象距離画像内」における「対象領域」を設定する対象領域設定部３３と、「対象領域内」から「対象人物の輪郭」を抽出する輪郭抽出部３４とから構成されている。
【００３１】
（対象距離設定部３１）
対象距離設定部３１は、撮像画像解析装置２で生成された距離画像Ｄ１（図３（ａ）参照）と、差分画像Ｄ２（図３（ｂ）参照）とに基づいて、対象人物が存在する距離である「対象距離」を設定する。具体的には、距離画像Ｄ１における同一の画素値を有する画素を一群（画素群）として、差分画像Ｄ２における前記画素群の画素値を累計する。そして、画素値の累計値が所定値よりも大きい、かつ、カメラ１に最も近い距離にある領域に、最も動き量の多い移動物体、即ち対象人物が存在しているとみなし、その距離を対象距離とする（図４（ａ）参照）。図４（ａ）の例では、対象距離は２．２ｍに設定されている。対象距離設定部３１で設定された対象距離は、対象距離画像生成部３２に入力される。
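The target distance rule (accumulate the motion values of all pixels that share the same distance, then take the nearest distance whose total exceeds a predetermined value) can be sketched as below. The sketch assumes that the distance image and the difference image are equally sized 2-D lists and that distances are already quantised so equal values compare equal; `target_distance` and `min_motion` are hypothetical names.

```python
from collections import defaultdict


def target_distance(distance_image, motion_image, min_motion):
    """Return the nearest distance whose accumulated motion exceeds min_motion,
    or None if no distance qualifies."""
    motion_by_distance = defaultdict(int)
    for d_row, m_row in zip(distance_image, motion_image):
        for d, m in zip(d_row, m_row):
            motion_by_distance[d] += m          # accumulate motion per distance
    candidates = [d for d, total in motion_by_distance.items()
                  if total > min_motion]
    return min(candidates) if candidates else None
```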
【００３２】
（対象距離画像生成部３２）
対象距離画像生成部３２は、撮像画像解析装置２で生成された距離画像Ｄ１（図３（ａ）参照）を参照し、対象距離設定部３１で設定された対象距離±αｍに存在する画素に対応する画素をエッジ画像Ｄ３（図３（ｃ）参照）から抽出した「対象距離画像」を生成する。具体的には、距離画像Ｄ１における対象距離設定部３１から入力された対象距離±αｍに対応する画素を求める。そして、求められた画素のみをエッジ情報生成部２３で生成されたエッジ画像Ｄ３から抽出し、対象距離画像Ｄ４（図４（ｂ）参照）を生成する。したがって、対象距離画像Ｄ４は、対象距離に存在する対象人物をエッジで表現した画像になる。対象距離画像生成部３２で生成された対象距離画像Ｄ４は、対象領域設定部３３と輪郭抽出部３４に入力される。
【００３３】
（対象領域設定部３３）
対象領域設定部３３は、対象距離画像生成部３２で生成された対象距離画像Ｄ４（図４（ｂ）参照）内における「対象領域」を設定する。具体的には、対象距離画像Ｄ４の縦方向の画素値を累計したヒストグラムＨを生成し、ヒストグラムＨにおける度数が最大となる位置を、対象人物Ｃの水平方向における中心位置として特定する（図５（ａ）参照）。そして、特定された中心位置の左右に特定の大きさ（例えば０．５ｍ）の範囲を対象領域Ｔとして設定する（図５（ｂ）参照）。なお、対象領域Ｔの縦方向の範囲は、特定の大きさ（例えば２ｍ）に設定される。また、対象領域Ｔを設定する際は、カメラ１のチルト角や高さ等のカメラパラメータを参照して、対象領域Ｔの設定範囲を補正する。対象領域設定部３３で設定された対象領域Ｔは、輪郭抽出部３４に入力される。
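The horizontal placement of the target area T can be sketched as a column histogram followed by an arg-max. The example assumes the target distance image D4 is a 2-D list of edge values and that the 0.5 m half-width has already been converted to pixels by the caller, since that conversion depends on the target distance and camera parameters that the text does not spell out. `target_region` and `half_width_px` are hypothetical names.

```python
def target_region(target_distance_image, half_width_px):
    """Return (center_x, left_x, right_x) of the target area in pixel columns."""
    width = len(target_distance_image[0])
    # Histogram H: accumulate pixel values down each column.
    histogram = [sum(row[x] for row in target_distance_image)
                 for x in range(width)]
    center = max(range(width), key=histogram.__getitem__)
    left = max(0, center - half_width_px)
    right = min(width - 1, center + half_width_px)
    return center, left, right
```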
【００３４】
（輪郭抽出部３４）
輪郭抽出部３４は、対象距離画像生成部３２で生成された対象距離画像Ｄ４（図４（ｂ）参照）において、対象領域設定部３３で設定された対象領域Ｔ内から対象人物Ｃの輪郭Ｏを抽出する（図５（ｃ）参照）。具体的には、対象人物Ｃの輪郭Ｏを抽出する際は、「Ｓｎａｋｅｓ」と呼ばれる閉曲線からなる動的輪郭モデルを用いた手法（以下、「スネーク手法」という）を用いる。なお、スネーク手法とは、動的輪郭モデルである「Ｓｎａｋｅｓ」を、予め定義されたエネルギ関数が最小となるように収縮変形させることにより、対象物の輪郭を抽出する手法である。輪郭抽出部３４で抽出された対象人物Ｃの輪郭Ｏは、「輪郭情報」としてジェスチャ認識装置４に入力される（図１参照）。
【００３５】
ジェスチャ認識装置４は、撮像画像解析装置２で生成された「距離情報」及び「肌色領域情報」と、輪郭抽出装置３で生成された「輪郭情報」とに基づいて、対象人物のポスチャ又はジェスチャを認識し、その認識結果を出力する装置である（図１参照）。
【００３６】
図６は、図１に示したジェスチャ認識システムＡに含まれるジェスチャ認識装置４の構成を示すブロック図である。図６に示すように、ジェスチャ認識装置４は、対象人物Ｃの実空間上における顔位置と手先位置を検出する顔・手先位置検出手段４１と、顔・手先位置検出手段４１によって検出された顔位置と手先位置に基づいて、対象人物のポスチャ又はジェスチャを認識するポスチャ・ジェスチャ認識手段４２とを備えている。
【００３７】
顔・手先位置検出手段４１は、実空間上における対象人物の「頭頂部の位置（頭頂部位置）」を検出する頭位置検出部４１Ａと、対象人物の「顔の位置（顔位置）」を検出する顔位置検出部４１Ｂと、対象人物の「手の位置（手位置）」を検出する手位置検出部４１Ｃと、対象人物の「手先の位置（手先位置）」を検出する手先位置検出部４１Ｄとから構成されている。なお、ここでいう「手」とは、腕（ａｒｍ）と手（Ｈａｎｄ）とからなる部位のことであり、「手先」とは、手（Ｈａｎｄ）の指先のことである。
【００３８】
（頭位置検出部４１Ａ）
頭位置検出部４１Ａは、輪郭抽出装置３で生成された輪郭情報に基づいて、対象人物Ｃの「頭頂部位置」を検出する。頭頂部位置の検出方法について図７（ａ）を参照して説明すると、まず、輪郭Ｏで囲まれた領域における重心Ｇを求める（１）。次に、頭頂部位置を探索するための領域（頭頂部位置探索領域）Ｆ１を設定する（２）。頭頂部位置探索領域Ｆ１の横幅（Ｘ軸方向の幅）は、重心ＧのＸ座標を中心にして、予め設定されている人間の平均肩幅Ｗとなるようにする。なお、人間の平均肩幅Ｗは、撮像画像解析装置２で生成された距離情報を参照して設定される。また、頭頂部位置探索領域Ｆ１の縦幅（Ｙ軸方向の幅）は、輪郭Ｏを覆うことができるような幅に設定される。そして、頭頂部位置探索領域Ｆ１内における輪郭Ｏの上端点を、頭頂部位置ｍ１とする（３）。頭位置検出部４１Ａで検出された頭頂部位置ｍ１は、顔位置検出部４１Ｂに入力される。
【００３９】
（顔位置検出部４１Ｂ）
顔位置検出部４１Ｂは、頭位置検出部４１Ａで検出された頭頂部位置ｍ１と、撮像画像解析装置２で生成された肌色領域情報とに基づいて、対象人物Ｃの「顔位置」を検出する。顔位置の検出方法について図７（ｂ）を参照して説明すると、まず、顔位置を探索するための領域（顔位置探索領域）Ｆ２を設定する（４）。顔位置探索領域Ｆ２の範囲は、頭頂部位置ｍ１を基準にして、予め設定されている「おおよそ人間の頭部を覆う大きさ」となるようにする。なお、顔位置探索領域Ｆ２の範囲は、撮像画像解析装置２で生成された距離情報を参照して設定される。
【００４０】
次に、顔位置探索領域Ｆ２内における肌色領域Ｒ１の重心を、画像上における顔位置ｍ２とする（５）。肌色領域Ｒ１については、撮像画像解析装置２で生成された肌色領域情報を参照する。そして、画像上における顔位置ｍ２（Ｘｆ，Ｙｆ）から、撮像画像解析装置２で生成された距離情報を参照して、実空間上における顔位置ｍ２ｔ（Ｘｆｔ，Ｙｆｔ，Ｚｆｔ）を求める。
【００４１】
顔位置検出部４１Ｂで検出された「画像上における顔位置ｍ２」は、手位置検出部４１Ｃと手先位置検出部４１Ｄに入力される。また、顔位置検出部４１Ｂで検出された「実空間上における顔位置ｍ２ｔ」は、図示しない記憶手段に記憶され、ポスチャ・ジェスチャ認識手段４２のポスチャ・ジェスチャ認識部４２Ｂ（図６参照）において対象人物Ｃのポスチャ又はジェスチャを認識する際に使用される。
【００４２】
（手位置検出部４１Ｃ）
手位置検出部４１Ｃは、撮像画像解析装置２で生成された肌色領域情報と、輪郭抽出装置３で生成された輪郭情報とに基づいて、対象人物Ｃの「手位置」を検出する。なお、ここでは、肌色領域情報は、顔位置ｍ２周辺を除いた領域の情報を用いる。手位置の検出方法について図８（ａ）を参照して説明すると、まず、手位置を探索するための領域（手位置探索領域）Ｆ３（Ｆ３Ｒ，Ｆ３Ｌ）を設定する（６）。手位置探索領域Ｆ３は、顔位置検出部４１Ｂで検出された顔位置ｍ２を基準にして、予め設定されている「手が届く範囲（左右の手の届く範囲）」となるようにする。なお、手位置探索領域Ｆ３の大きさは、撮像画像解析装置２で生成された距離情報を参照して設定される。
【００４３】
次に、手位置探索領域Ｆ３内における肌色領域Ｒ２の重心を、画像上における手位置ｍ３とする（７）。肌色領域Ｒ２については、撮像画像解析装置２で生成された肌色領域情報を参照する。なお、ここでは、肌色領域情報は、顔位置ｍ２周辺を除いた領域の情報を用いる。図８（ａ）の例では、肌色領域は手位置探索領域Ｆ３（Ｌ）においてのみ存在しているので、手位置ｍ３は手位置探索領域Ｆ３（Ｌ）においてのみ検出される。また、図８（ａ）の例では、対象人物は長袖の服を着ており、手首より先しか露出していないので、手（ＨＡＮＤ）の位置が手位置ｍ３となる。手位置検出部４１Ｃで検出された「画像上における手位置ｍ３」は、手先位置検出部４１Ｄに入力される。
【００４４】
（手先位置検出部４１Ｄ）
手先位置検出部４１Ｄは、顔位置検出部４１Ｂで検出された顔位置ｍ２と、手位置検出部４１Ｃで検出された手位置ｍ３とに基づいて、対象人物Ｃの「手先位置」を検出する。手先位置の検出方法について図８（ｂ）を参照して説明すると、まず、手位置探索領域Ｆ３Ｌ内において、手先位置を探索するための領域（手先位置探索範囲）Ｆ４を設定する（８）。手先位置探索範囲Ｆ４は、手位置ｍ３を中心にして、予め設定されている「おおよそ手を覆う大きさ」となるようにする。なお、手先位置探索範囲Ｆ４の範囲は、撮像画像解析装置２で生成された距離情報を参照して設定される。
【００４５】
続いて、手先位置探索範囲Ｆ４における肌色領域Ｒ２の上下左右の端点ｍ４ａ〜ｍ４ｄを検出する（９）。肌色領域Ｒ２については、撮像画像解析装置２で生成された肌色領域情報を参照する。そして、上下端点間（ｍ４ａ、ｍ４ｂ間）の垂直方向距離ｄ１と、左右端点間（ｍ４ｃ、ｍ４ｄ間）の水平方向距離ｄ２とを比較し、距離が長い方を手が伸びている方向と判断する（１０）。図８（ｂ）の例では、垂直方向距離ｄ１の方が水平方向距離ｄ２よりも距離が長いので、手先は上下方向に伸びていると判断される。
【００４６】
次に、画像上における顔位置ｍ２と、画像上における手位置ｍ３との位置関係に基づいて、上下端点ｍ４ａ，ｍ４ｂのどちら（もしくは左右端点ｍ４ｃ，ｍ４ｄのどちらか）が手先位置であるかを判断する。具体的には、手位置ｍ３が顔位置ｍ２から遠い場合は、手は伸びているとみなし、顔位置ｍ２から遠い方の端点を手先位置（画像上における手先位置）ｍ４と判断する。逆に、手位置ｍ３が顔位置ｍ２に近い場合は、肘を曲げているとみなし、顔位置ｍ２に近い方の端点を手先位置ｍ４と判断する。図８（ｂ）の例では、手位置ｍ３が顔位置ｍ２から遠く、上端点ｍ４ａが下端点ｍ４ｂよりも顔位置ｍ２から遠いので、上端点ｍ４ａが手先位置ｍ４であると判断する（１１）。
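Steps (9) to (11) of the fingertip search can be sketched as follows: find the extreme points of the skin area, pick the longer axis as the direction in which the hand extends, and then choose the end point by whether the hand is far from or near to the face. The sketch assumes image coordinates with y growing downward, treats a tie between d1 and d2 as vertical, and introduces a hypothetical `far_threshold` because the text does not say how "far" and "near" are decided.

```python
import math


def fingertip(skin_pixels, face, hand, far_threshold):
    """skin_pixels: iterable of (x, y) inside search range F4.
    face, hand: (x, y) positions m2 and m3 on the image."""
    pixels = list(skin_pixels)
    top = min(pixels, key=lambda p: p[1])       # m4a
    bottom = max(pixels, key=lambda p: p[1])    # m4b
    left = min(pixels, key=lambda p: p[0])      # m4c
    right = max(pixels, key=lambda p: p[0])     # m4d
    d1 = bottom[1] - top[1]                     # vertical extent
    d2 = right[0] - left[0]                     # horizontal extent
    ends = (top, bottom) if d1 >= d2 else (left, right)
    # Arm extended: take the end farther from the face.
    # Elbow bent: take the end nearer to the face.
    pick = max if math.dist(hand, face) > far_threshold else min
    return pick(ends, key=lambda p: math.dist(p, face))
```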
【００４７】
そして、画像上における手先位置ｍ４（Ｘｈ，Ｙｈ）から、撮像画像解析装置２で生成された距離情報を参照して、実空間上における手先位置ｍ４ｔ（Ｘｈｔ，Ｙｈｔ，Ｚｈｔ）を求める。手先位置検出部４１Ｄで検出された「実空間上における手先位置ｍ４ｔ」は、図示しない記憶手段に記憶され、ポスチャ・ジェスチャ認識手段４２のポスチャ・ジェスチャ認識部４２Ｂ（図６参照）において対象人物Ｃのポスチャ又はジェスチャを認識する際に使用される。
【００４８】
（ポスチャ・ジェスチャ認識手段４２）
ポスチャ・ジェスチャ認識手段４２は、ポスチャデータ及びジェスチャデータを記憶するポスチャ・ジェスチャデータ記憶部４２Ａと、顔・手先位置検出手段４１によって検出された「実空間上における顔位置ｍ２ｔ」及び「実空間上における手先位置ｍ４ｔ」に基づいて、対象人物のポスチャ又はジェスチャを認識するポスチャ・ジェスチャ認識部４２Ｂとから構成されている（図６参照）。
【００４９】
（ポスチャ・ジェスチャデータ記憶部４２Ａ）
ポスチャ・ジェスチャデータ記憶部４２Ａは、ポスチャデータＰ１〜Ｐ６（図９参照）とジェスチャデータＪ１〜Ｊ４（図１０参照）を記憶している。ポスチャデータＰ１〜Ｐ６とジェスチャデータＪ１〜Ｊ４は、「実空間上における顔位置と手先位置との相対的な位置関係」及び「顔位置を基準とした際の手先位置の変動」に対応するポスチャ又はジェスチャを記したデータである。なお、「顔位置と手先位置との相対的な位置関係」とは、具体的には、「顔位置及び手先位置の高さ」と、「顔位置及び手先位置のカメラ１からの距離」とのことである。また、ポスチャ・ジェスチャ認識手段４２は、「画像上における顔位置及び手先位置の水平方向のずれ」を調べることにより、「顔位置と手先位置との相対的な位置関係」を検出することもできる。ポスチャデータＰ１〜Ｐ６とジェスチャデータＪ１〜Ｊ４は、ポスチャ・ジェスチャ認識部４２Ｂにおいて対象人物のポスチャ又はジェスチャを認識する際に使用される。
【００５０】
ポスチャデータＰ１〜Ｐ６について図９を参照して説明すると、図９（ａ）に示す「ポスチャＰ１：ＦＡＣＥＳＩＤＥ」は「こんにちは」、図９（ｂ）に示す「ポスチャＰ２：ＨＩＧＨＨＡＮＤ」は「追従開始」、図９（ｃ）に示す「ポスチャＰ３：ＳＴＯＰ」は「止まれ」、図９（ｄ）に示す「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」は「握手」、図９（ｅ）に示す「ポスチャＰ５：ＳＩＤＥＨＡＮＤ」は「手の方向を見よ」、図９（ｆ）に示す「ポスチャＰ６：ＬＯＷＨＡＮＤ」は「手の方向に曲がれ」を意味するポスチャである。
【００５１】
また、ジェスチャＪ１〜Ｊ４について図１０を参照して説明すると、図１０（ａ）に示す「ジェスチャＪ１：ＨＡＮＤＳＷＩＮＧ」は「注意せよ」、図１０（ｂ）に示す「ジェスチャＪ２：ＢＹＥＢＹＥ」は「さようなら」、図１０（ｃ）に示す「ジェスチャＪ３：ＣＯＭＥＨＥＲＥ」は「接近せよ」、図１０（ｄ）に示す「ジェスチャＪ４：ＨＡＮＤＣＩＲＣＬＩＮＧ」は「旋回せよ」を意味するジェスチャである。
【００５２】
なお、本実施の形態では、ポスチャ・ジェスチャデータ記憶部４２Ａ（図６参照）は、ポスチャデータＰ１〜Ｐ６（図９参照）とジェスチャデータＪ１〜Ｊ４（図１０参照）を記憶しているが、ポスチャ・ジェスチャデータ記憶部４２Ａに記憶させるポスチャデータとジェスチャデータは任意に設定することができる。また、各ポスチャと各ジェスチャの意味も任意に設定することができる。
【００５３】
ポスチャ・ジェスチャ認識部４２Ｂは、顔・手先位置検出手段４１によって検出された「実空間上における顔位置ｍ２ｔ」及び「実空間上における手先位置ｍ４ｔ」から、「顔位置ｍ２ｔと手先位置ｍ４ｔとの相対的な位置関係」及び「顔位置ｍ２ｔを基準とした際の手先位置ｍ４ｔの変動」を検出し、その検出結果と、ポスチャ・ジェスチャデータ記憶部４２Ａに記憶されているポスチャデータＰ１〜Ｐ６（図９参照）又はジェスチャデータＪ１〜Ｊ４（図１０参照）とを比較することにより、対象人物のポスチャ又はジェスチャを認識する。なお、ポスチャ・ジェスチャ認識部４２Ｂでの認識結果は履歴として保存される。
【００５４】
次に、図１１〜図１４に示すフローチャートを参照して、ポスチャ・ジェスチャ認識部４２Ｂにおけるポスチャ又はジェスチャの認識方法について詳しく説明する。ここでは、まず、図１１に示すフローチャートを参照してポスチャ・ジェスチャ認識部４２Ｂでの処理の概略について説明し、その後、図１２に示すフローチャートを参照して図１１に示したフローチャートにおける「ステップＳ１：ポスチャ認識処理」について説明し、図１３及び図１４に示すフローチャートを参照して図１１に示したフローチャートにおける「ステップＳ４：ポスチャ・ジェスチャ認識処理」について説明する。
【００５５】
（ポスチャ・ジェスチャ認識部４２Ｂでの処理の概略）
図１１は、ポスチャ・ジェスチャ認識部４２Ｂでの処理の概略を説明するためのフローチャートである。図１１に示すフローチャートを参照して、まず、ステップＳ１では、ポスチャＰ１〜Ｐ４（図９参照）の認識を試みる。続いて、ステップＳ２では、ステップＳ１においてポスチャを認識できたかどうかを判断する。ここで、ポスチャを認識できたと判断された場合はステップＳ３に進み、ポスチャを認識できなかったと判断された場合はステップＳ４に進む。そして、ステップＳ３では、ステップＳ１において認識されたポスチャを認識結果として出力し、処理を終了する。
【００５６】
ステップＳ４では、ポスチャＰ５，Ｐ６（図９参照）又はジェスチャＪ１〜Ｊ４（図１０参照）の認識を試みる。次に、ステップＳ５では、ステップＳ４においてポスチャ又はジェスチャを認識できたかどうかを判断する。ここで、ポスチャ又はジェスチャを認識できたと判断された場合はステップＳ６に進み、ポスチャ又はジェスチャを認識できなかったと判断された場合はステップＳ８に進む。
【００５７】
ステップＳ６では、過去の所定数のフレーム（例えば１０フレーム）において、同一のポスチャ又はジェスチャを所定回数（例えば５回）以上認識できたかどうかを判断する。ここで、同一のポスチャ又はジェスチャを所定回数以上認識できたと判断された場合はステップＳ７に進み、同一のポスチャ又はジェスチャを所定回数以上認識できなかったと判断された場合はステップＳ８に進む。
【００５８】
そして、ステップＳ７では、ステップＳ４において認識されたポスチャ又はジェスチャを認識結果として出力し、処理を終了する。また、ステップＳ８では、ポスチャ又はジェスチャを認識できなかった、即ち認識不能であると出力し、処理を終了する。
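The confirmation logic of steps S5 to S8 (accept a result only if the same posture or gesture was recognized a given number of times within a recent window of frames) can be sketched with a bounded queue. The defaults of 10 frames and 5 recognitions follow the examples in the text; counting the current frame as part of the window is an assumption, and `RecognitionHistory` is a hypothetical name.

```python
from collections import Counter, deque


class RecognitionHistory:
    """Keeps the most recent per-frame results and confirms repeated ones."""

    def __init__(self, window=10, min_count=5):
        self.results = deque(maxlen=window)   # old frames drop out automatically
        self.min_count = min_count

    def update(self, label):
        """label: result of step S4 for this frame, or None if unrecognized.
        Returns the confirmed label (step S7) or None (step S8)."""
        self.results.append(label)
        if label is None:
            return None
        if Counter(self.results)[label] >= self.min_count:
            return label
        return None
```

Postures P1 to P4 recognized in step S1 bypass this check in the flowchart of FIG. 11 and are output immediately in step S3.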
【００５９】
（ステップＳ１：ポスチャ認識処理）
図１２は、図１１に示したフローチャートにおける「ステップＳ１：ポスチャ認識処理」について説明するためのフローチャートである。図１２に示すフローチャートを参照して、まず、ステップＳ１１では、顔・手先位置検出手段４１から対象人物の実空間上における顔位置ｍ２ｔ及び手先位置ｍ４ｔ（以降、「入力情報」という）が入力される。続いて、ステップＳ１２では、顔位置ｍ２ｔ及び手先位置ｍ４ｔに基づいて、カメラ１から手先までの距離（以降、「手先距離」という）と、カメラ１から顔までの距離（以降、「顔距離」という）とを比較し、手先距離と顔距離とがほぼ同じであるかどうか、つまり手先距離と顔距離との差が所定値以下であるかどうかを判断する。ここで、両者がほぼ同じ距離であると判断された場合はステップＳ１３に進み、両者がほぼ同じではないと判断された場合はステップＳ１８に進む。
【００６０】
ステップＳ１３では、手先の高さ（以降、「手先高さ」という）と、顔の高さ（以降、「顔高さ」という）とを比較し、手先高さと顔高さとがほぼ同じであるかどうか、つまり手先高さと顔高さとの差が所定値以下であるかどうかを判断する。ここで、両者がほぼ同じであると判断された場合はステップＳ１４に進み、両者がほぼ同じではないと判断された場合はステップＳ１５に進む。ステップＳ１４では、入力情報に対応するポスチャは、「ポスチャＰ１：ＦＡＣＥＳＩＤＥ」（図９（ａ）参照）であるという認識結果を出力し、処理を終了する。
【００６１】
ステップＳ１５では、手先高さと顔高さとを比較し、手先高さが顔高さよりも高いかどうかを判断する。ここで、手先高さが顔高さよりも高いと判断された場合はステップＳ１６に進み、手先高さが顔高さよりも高くはないと判断された場合はステップＳ１７に進む。そして、ステップＳ１６では、入力情報に対応するポスチャは、「ポスチャＰ２：ＨＩＧＨＨＡＮＤ」（図９（ｂ）参照）であるという認識結果を出力し、処理を終了する。また、ステップＳ１７では、入力情報に対応するポスチャは「無し」という認識結果を出力し、処理を終了する。
【００６２】
ステップＳ１８では、手先高さと顔高さとを比較し、手先高さと顔高さとがほぼ同じであるかどうか、つまり手先高さと顔高さとの差が所定値以下であるかどうかを判断する。ここで、両者がほぼ同じであると判断された場合はステップＳ１９に進み、両者がほぼ同じではないと判断された場合はステップＳ２０に進む。そして、ステップＳ１９では、入力情報に対応するポスチャは、「ポスチャＰ３：ＳＴＯＰ」（図９（ｃ）参照）であるという認識結果を出力し、処理を終了する。
【００６３】
ステップＳ２０では、手先高さと顔高さとを比較し、手先高さが顔高さよりも低いかどうかを判断する。ここで、手先高さが顔高さよりも低いと判断された場合はステップＳ２１に進み、手先高さが顔高さよりも低くはないと判断された場合はステップＳ２２に進む。そして、ステップＳ２１では、入力情報に対応するポスチャは「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」（図９（ｄ）参照）であるという認識結果を出力し、処理を終了する。また、ステップＳ２２では、入力情報に対応するポスチャは「無し」という認識結果を出力し、処理を終了する。
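Steps S11 to S22 amount to a small decision tree over two comparisons, camera distance and height. The sketch below is one possible reading of that tree. It assumes real-space positions are (x, y, z) tuples with y as height and z as distance from the camera, both in metres, and it takes HANDSHAKE to be the case where the fingertip is lower than the face, as stated in paragraph 【００１０】. The tolerances and the name `recognize_posture` are hypothetical.

```python
def recognize_posture(face, hand, dist_tol=0.1, height_tol=0.1):
    """Return "FACESIDE", "HIGHHAND", "STOP", "HANDSHAKE", or None."""
    face_h, face_d = face[1], face[2]
    hand_h, hand_d = hand[1], hand[2]
    same_distance = abs(hand_d - face_d) <= dist_tol    # step S12
    same_height = abs(hand_h - face_h) <= height_tol    # steps S13 / S18
    if same_distance:
        if same_height:
            return "FACESIDE"                           # step S14 (P1)
        if hand_h > face_h:
            return "HIGHHAND"                           # step S16 (P2)
        return None                                     # step S17
    if same_height:
        return "STOP"                                   # step S19 (P3)
    if hand_h < face_h:
        return "HANDSHAKE"                              # step S21 (P4)
    return None                                         # step S22
```

Because the tree needs only the two real-space points, no per-frame feature point tracking is involved, which is the source of the reduced calculation the invention aims at.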
【００６４】
（ステップＳ４：ポスチャ・ジェスチャ認識処理）
図１３は、図１１に示したフローチャートにおける「ステップＳ４：ポスチャ・ジェスチャ認識処理」について説明するための第１のフローチャートである。図１３に示すフローチャートを参照して、まず、ステップＳ３１では、入力情報（対象人物の実空間上における顔位置ｍ２ｔ及び手先位置ｍ４ｔ）が入力される。続いて、ステップＳ３２では、顔位置ｍ２ｔを基準とした場合の手先位置ｍ４ｔの標準偏差を求め、求めた標準偏差に基づいて、手の動きの有無を判断する。具体的には、手先位置ｍ４ｔの標準偏差が所定値以下の場合は手の動きが無いと判断し、所定値よりも大きい場合は手の動きが有ると判断する。ここで、手の動きが無いと判断された場合はステップＳ３３に進み、手の動きが有ると判断された場合はステップＳ３６に進む。
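The motion test of step S32 can be sketched with the population standard deviation from Python's `statistics` module. The text does not say how the three coordinate axes are combined, so this example assumes the largest per-axis deviation is compared with the threshold; `hand_is_moving` and `threshold` are hypothetical names.

```python
import statistics


def hand_is_moving(face_track, hand_track, threshold):
    """face_track, hand_track: equally long lists of (x, y, z) positions,
    one entry per frame. Returns True if the fingertip moves relative
    to the face by more than the threshold."""
    relative = [tuple(h - f for h, f in zip(hand, face))
                for face, hand in zip(face_track, hand_track)]
    # Standard deviation per axis of the face-relative fingertip position.
    spread = max(statistics.pstdev(axis) for axis in zip(*relative))
    return spread > threshold
```

Measuring the fingertip relative to the face means that a person who walks while holding the hand still is not reported as gesturing.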
【００６５】
ステップＳ３３では、手先高さが顔高さのすぐ下であるかどうかを判断する。ここで、手先高さが顔高さのすぐ下であると判断された場合はステップＳ３４に進み、手先高さが顔高さのすぐ下ではないと判断された場合はステップＳ３５に進む。そして、ステップＳ３４では、入力情報に対応するポスチャ又はジェスチャは、「ポスチャＰ５：ＳＩＤＥＨＡＮＤ」（図９（ｅ）参照）であるという認識結果を出力し、処理を終了する。また、ステップＳ３５では、入力情報に対応するポスチャ又はジェスチャは、「ポスチャＰ６：ＬＯＷＨＡＮＤ」（図９（ｆ）参照）であるという認識結果を出力し、処理を終了する。
【００６６】
ステップＳ３６では、手先高さと顔高さとを比較し、手先高さが顔高さよりも高いかどうかを判断する。ここで、手先高さが顔高さよりも高いと判断された場合はステップＳ３７に進み、手先高さが顔高さよりも高くはないと判断された場合はステップＳ４１（図１４参照）に進む。そして、ステップＳ３７では、手先距離と顔距離とを比較し、手先距離と顔距離とがほぼ同じであるかどうか、つまり手先距離と顔距離との差が所定値以下であるかどうかを判断する。ここで、両者がほぼ同じであると判断された場合はステップＳ３８に進み、両者がほぼ同じではないと判断された場合はステップＳ４０に進む。
【００６７】
ステップＳ３８では、手先が左右に振れているかどうかを判断する。ここで、２フレーム間における左右方向のずれから、手先が左右に振れていると判断された場合はステップＳ３９に進み、手先が左右に振れていないと判断された場合はステップＳ４０に進む。そして、ステップＳ３９では、入力情報に対応するポスチャ又はジェスチャは、「ジェスチャＪ１：ＨＡＮＤＳＷＩＮＧ」（図１０（ａ）参照）であるという認識結果を出力し、処理を終了する。また、ステップＳ４０では、入力情報に対応するポスチャ又はジェスチャは「無し」という認識結果を出力して処理を終了する。
【００６８】
図１４は、図１１に示したフローチャートにおける「ステップＳ４：ポスチャ・ジェスチャ認識処理」について説明するための第２のフローチャートである。図１４に示すフローチャートを参照して、ステップＳ４１では、手先距離と顔距離とを比較し、手先距離が顔距離よりも短いかどうかを判断する。ここで、手先距離が顔距離よりも短いと判断された場合はステップＳ４２に進み、手先距離が顔距離よりも短くはないと判断された場合はステップＳ４７に進む。
【００６９】
ステップＳ４２では、手先が左右に振れているかどうかを判断する。ここで、２フレーム間における左右方向のずれから、手先が左右に振れていると判断された場合はステップＳ４３に進み、手先が左右に振れていないと判断された場合はステップＳ４４に進む。そして、ステップＳ４３では、入力情報に対応するポスチャ又はジェスチャは、「ジェスチャＪ２：ＢＹＥＢＹＥ」（図１０（ｂ）参照）であるという認識結果を出力し、処理を終了する。
【００７０】
ステップＳ４４では、手先が上下に振れているかどうかを判断する。ここで、２フレーム間における上下方向のずれから、手先が上下に振れていると判断された場合はステップＳ４５に進み、手先が上下に振れていないと判断された場合はステップＳ４６に進む。そして、ステップＳ４５では、入力情報に対応するポスチャ又はジェスチャは、「ジェスチャＪ３：ＣＯＭＥＨＥＲＥ」（図１０（ｃ）参照）であるという認識結果を出力し、処理を終了する。また、ステップＳ４６では、入力情報に対応するポスチャ又はジェスチャは「無し」という認識結果を出力して処理を終了する。
【００７１】
ステップＳ４７では、手先距離と顔距離とを比較し、手先距離と顔距離とがほぼ同じであるかどうか、つまり手先距離と顔距離との差が所定値以下であるかどうかを判断する。ここで、両者がほぼ同じであると判断された場合はステップＳ４８に進み、両者がほぼ同じではないと判断された場合はステップＳ５０に進む。ステップＳ４８では、手先が左右に振れているかどうかを判断する。ここで、手先が左右に振れていると判断された場合はステップＳ４９に進み、手先が左右に振れていないと判断された場合はステップＳ５０に進む。
【００７２】
そして、ステップＳ４９では、入力情報に対応するポスチャ又はジェスチャは、「ジェスチャＪ４：ＨＡＮＤＣＩＲＣＬＩＮＧ」（図１０（ｄ）参照）であるという認識結果を出力し、処理を終了する。また、ステップＳ５０では、入力情報に対応するポスチャ又はジェスチャは「無し」という認識結果を出力して処理を終了する。
【００７３】
以上のようにして、ポスチャ・ジェスチャ認識部４２Ｂは、顔・手先位置検出手段４１から入力された入力情報（対象人物の実空間上における顔位置ｍ２ｔ及び手先位置ｍ４ｔ）から、「顔位置ｍ２ｔと手先位置ｍ４ｔとの相対的な位置関係」及び「顔位置ｍ２ｔを基準とした際の手先位置ｍ４ｔの変動」を検出し、その検出結果と、ポスチャ・ジェスチャデータ記憶部４２Ａに記憶されたポスチャＰ１〜Ｐ６（図９参照）及びジェスチャデータＪ１〜Ｊ４（図１０参照）とを比較することにより、対象人物のポスチャ又はジェスチャを認識することができる。
【００７４】
なお、ポスチャ・ジェスチャ認識部４２Ｂでは前記した方法以外にも、次の変形例１及び変形例２の方法によっても対象人物のポスチャ又はジェスチャを認識することができる。以下、図１５〜図１７を参照して「ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例１」について説明し、図１８及び図１９を参照して「ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例２」について説明する。
【００７５】
（ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例１）
変形例１では、対象人物のポスチャ又はジェスチャを認識する際に、パターンマッチング法を用いている。図１５は、「ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例１」について説明するためのフローチャートである。図１５に示すフローチャートを参照して、まず、ステップＳ６１では、ポスチャ又はジェスチャの認識を試みる。このとき、ポスチャ・ジェスチャ認識部４２Ｂは、顔・手先位置検出手段４１から入力された入力情報（対象人物の実空間上における顔位置ｍ２ｔ及び手先位置ｍ４ｔ）と「顔位置ｍ２ｔを基準とした際の手先位置ｍ４ｔの変動」とからなる「入力パターン」を、ポスチャ・ジェスチャデータ記憶部４２Ａに記憶されているポスチャＰ１１〜Ｐ１６（図１６参照）又はジェスチャデータＪ１１〜Ｊ１４（図１７参照）と重ね合わせて、最も似ているパターンを探すことにより、対象人物のポスチャ又はジェスチャを認識する。なお、ポスチャ・ジェスチャデータ記憶部４２Ａには、予めパターンマッチング用のポスチャデータＰ１１〜Ｐ１６（図１６参照）とジェスチャデータＪ１１〜Ｊ１４（図１７参照）とを記憶させておく。
【００７６】
続いて、ステップＳ６２では、ステップＳ６１においてポスチャ又はジェスチャを認識できたかどうかを判断する。ここで、ポスチャ又はジェスチャを認識できたと判断された場合はステップＳ６３に進み、ポスチャ又はジェスチャを認識できなかったと判断された場合はステップＳ６５に進む。
【００７７】
ステップＳ６３では、過去の所定数のフレーム（例えば１０フレーム）において、同一のポスチャ又はジェスチャを所定回数（例えば５回）以上認識できたかどうかを判断する。ここで、同一のポスチャ又はジェスチャを所定回数以上認識できたと判断された場合はステップＳ６４に進み、同一のポスチャ又はジェスチャを所定回数以上認識できなかったと判断された場合はステップＳ６５に進む。
【００７８】
そして、ステップＳ６４では、ステップＳ６１において認識されたポスチャ又はジェスチャを認識結果として出力し、処理を終了する。また、ステップＳ６５では、ポスチャ又はジェスチャを認識することができなかった、即ち認識不能であると出力し、処理を終了する。
【００７９】
以上のようにして、ポスチャ・ジェスチャ認識部４２Ｂは、パターンマッチング法を用いて、顔・手先位置検出手段４１から入力された入力情報と「顔位置ｍ２ｔを基準とした際の手先位置ｍ４ｔの変動」とからなる「入力パターン」を、ポスチャ・ジェスチャデータ記憶部４２Ａに記憶されたポスチャＰ１１〜Ｐ１６（図１６参照）及びジェスチャデータＪ１１〜Ｊ１４（図１７参照）とパターンマッチングさせることにより、対象人物のポスチャ又はジェスチャを認識することができる。
【００８０】
（ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例２）
変形例２では、対象人物の手先が入る大きさの判定円Ｅを設定し、手先の面積と判定円の面積とを比較することにより、顔位置ｍ２ｔと手先位置ｍ４ｔとの相対的な位置関係が類似している「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」（図９（ｄ）参照）と「ジェスチャＪ３：ＣＯＭＥＨＥＲＥ」（図１０（ｃ）参照）とを区別している。なお、「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」と「ジェスチャＪ３：ＣＯＭＥＨＥＲＥ」は、共に手先位置の高さが顔位置の高さよりも低く、カメラから手先位置までの距離がカメラから顔位置までの距離よりも短いため、互いに区別しづらい。
【００８１】
図１８は、ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例２について説明するためのフローチャートである。図１８に示すフローチャートを参照して、まず、ステップＳ７１では、手位置ｍ３を中心とする判定円（判定領域）Ｅを設定する（図１９参照）。判定円Ｅの大きさは、判定円Ｅ内に手が入るような大きさに設定される。なお、判定円Ｅの大きさ（直径）は、撮像画像解析装置２で生成された距離情報を参照して設定される。図１９の例では、判定円Ｅの直径は２０ｃｍに設定されている。
【００８２】
続いて、ステップＳ７２では、判定円Ｅ内における肌色領域Ｒ２の面積Ｓｈが判定円Ｅの面積Ｓの１／２よりも大きいかどうかを判断する。なお、肌色領域Ｒ２については、撮像画像解析装置２で生成された肌色領域情報を参照する。ここで、肌色領域Ｒ２の面積Ｓｈが判定円Ｅの面積Ｓの１／２よりも大きいと判断された場合（図１９（ｂ）参照）はステップＳ７３に進み、肌色領域Ｒ２の面積Ｓｈが判定円Ｅの面積Ｓの１／２よりも大きくはない、即ち１／２以下であると判断された場合（図１９（ｃ）参照）はステップＳ７４に進む。
【００８３】
ステップＳ７３では、入力情報に対応するポスチャ又はジェスチャは、「ジェスチャＪ３：ＣＯＭＥＨＥＲＥ」（図１０（ｃ）参照）であるという判定結果を出力し、処理を終了する。また、ステップＳ７４では、入力情報に対応するポスチャ又はジェスチャは、「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」（図９（ｄ）参照）であるという判定結果を出力し、処理を終了する。
【００８４】
以上のようにして、ポスチャ・ジェスチャ認識部４２Ｂは、対象人物の手先が入る大きさの判定円Ｅを設定し、判定円Ｅ内における肌色領域Ｒ２の面積Ｓｈと判定円Ｅの面積とを比較し、手先の面積が判定円の面積の１／２よりも大きい場合は「ＣＯＭＥＨＥＲＥ」であると判定し、手先の面積が判定円の面積の１／２以下である場合は「ＨＡＮＤＳＨＡＫＥ」であると判定することにより、「ジェスチャＪ３：ＣＯＭＥＨＥＲＥ」と「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」とを区別することができる。
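The decision circle test summarized above can be sketched by counting skin pixels inside the circle and comparing the count with half of the circle's area. The sketch uses the "greater than 1/2" rule of the summary paragraph, approximates area by pixel count, and assumes the caller has already converted the circle size (20 cm diameter in the example of FIG. 19) into a pixel radius using the distance information. `classify_by_decision_circle` and `radius_px` are hypothetical names.

```python
import math


def classify_by_decision_circle(skin_pixels, hand_center, radius_px):
    """skin_pixels: iterable of (x, y) pixels of skin area R2.
    hand_center: hand position m3 on the image (circle center)."""
    cx, cy = hand_center
    # Area Sh: skin pixels that fall inside decision circle E.
    inside = sum(1 for x, y in skin_pixels
                 if (x - cx) ** 2 + (y - cy) ** 2 <= radius_px ** 2)
    circle_area = math.pi * radius_px ** 2      # area S
    return "COMEHERE" if inside > circle_area / 2 else "HANDSHAKE"
```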
【００８５】
（ジェスチャ認識システムＡの動作）
次に、ジェスチャ認識システムＡの動作について図１に示すジェスチャ認識システムＡの全体構成を示すブロック図と、図２０及び図２１に示すフローチャートを参照して説明する。図２０は、ジェスチャ認識システムＡの動作における「撮像画像解析ステップ」と「輪郭抽出ステップ」を説明するために示すフローチャートであり、図２１は、ジェスチャ認識システムＡの動作における「顔・手先位置検出ステップ」と「ポスチャ・ジェスチャ認識ステップ」を説明するために示すフローチャートである。
【００８６】
＜撮像画像解析ステップ＞
図２０に示すフローチャートを参照して、撮像画像解析装置２では、カメラ１ａ，１ｂから撮像画像が入力されると（ステップＳ１０１）、距離情報生成部２１において、撮像画像から距離情報である距離画像Ｄ１（図３（ａ）参照）を生成し（ステップＳ１０２）、動き情報生成部２２において、撮像画像から動き情報である差分画像Ｄ２（図３（ｂ）参照）を生成する（ステップＳ１０３）。また、エッジ情報生成部２３において、撮像画像からエッジ情報であるエッジ画像Ｄ３（図３（ｃ）参照）を生成し（ステップＳ１０４）、肌色領域情報生成部２４において、撮像画像から肌色領域情報である肌色領域Ｒ１，Ｒ２（図３（ｄ）参照）を抽出する（ステップＳ１０５）。
【００８７】
＜輪郭抽出ステップ＞
引き続き図２０に示すフローチャートを参照して、輪郭抽出装置３では、まず、対象距離設定部３１において、ステップＳ１０２とステップＳ１０３で生成された距離画像Ｄ１と差分画像Ｄ２から、対象人物が存在する距離である対象距離を設定する（ステップＳ１０６）。続いて、対象距離画像生成部３２において、ステップＳ１０４で生成されたエッジ画像Ｄ３からステップＳ１０６で設定された対象距離に存在する画素を抽出した対象距離画像Ｄ４（図４（ｂ）参照）を生成する（ステップＳ１０７）。
【００８８】
次に、対象領域設定部３３において、ステップＳ１０７で生成された対象距離画像Ｄ４内における対象領域Ｔ（図５（ｂ）参照）を設定する（ステップＳ１０８）。そして、輪郭抽出部３４において、ステップＳ１０８で設定された対象領域Ｔ内から、対象人物Ｃの輪郭Ｏ（図５（ｃ）参照）を抽出する（ステップＳ１０９）。
【００８９】
＜顔・手先位置検出ステップ＞
図２１に示すフローチャートを参照して、ジェスチャ認識装置４の顔・手先位置検出手段４１では、まず、頭位置検出部４１Ａにおいて、ステップＳ１０９で生成された輪郭情報に基づいて、対象人物Ｃの頭頂部位置ｍ１（図７（ａ）参照）を検出する（ステップＳ１１０）。
【００９０】
続いて、顔位置検出部４１Ｂにおいて、ステップＳ１１０で検出された頭頂部位置ｍ１と、ステップＳ１０５で生成された肌色領域情報とに基づいて、「画像上における顔位置ｍ２」（図７（ｂ）参照）を検出し、検出された「画像上における顔位置ｍ２（Ｘｆ，Ｙｆ）」から、ステップＳ１０２で生成された距離情報を参照して、「実空間上における顔位置ｍ２ｔ（Ｘｆｔ，Ｙｆｔ，Ｚｆｔ）」を求める（ステップＳ１１１）。
【００９１】
次に、手位置検出部４１Ｃにおいて、ステップＳ１１１で検出された「画像上における顔位置ｍ２」から、「画像上における手位置ｍ３」（図８（ａ）参照）を検出する（ステップＳ１１２）。
【００９２】
そして、手先位置検出部４１Ｄにおいて、顔位置検出部４１Ｂで検出された「画像上における顔位置ｍ２」と、手位置検出部４１Ｃで検出された手位置ｍ３とに基づいて、「画像上における手先位置ｍ４」（図８（ｂ）参照）を検出し、検出された「画像上における手先位置ｍ４（Ｘｈ，Ｙｈ）」から、ステップＳ１０２で生成された距離情報を参照して、「実空間上における手先位置ｍ４ｔ（Ｘｈｔ，Ｙｈｔ，Ｚｈｔ）」を求める（ステップＳ１１３）。
【００９３】
＜ポスチャ・ジェスチャ認識ステップ＞
引き続き図２１に示すフローチャートを参照して、ジェスチャ認識装置４のポスチャ・ジェスチャ認識手段４２では、ポスチャ・ジェスチャ認識部４２Ｂにおいて、ステップＳ１１１及びステップＳ１１３で求められた「実空間上における顔位置ｍ２ｔ（Ｘｆｔ，Ｙｆｔ，Ｚｆｔ）」と「実空間上における手先位置ｍ４ｔ（Ｘｈｔ，Ｙｈｔ，Ｚｈｔ）」とから、「顔位置ｍ２ｔと手先位置ｍ４ｔとの相対的な位置関係」及び「顔位置ｍ２ｔを基準とした際の手先位置ｍ４ｔの変動」を検出し、その検出結果と、ポスチャ・ジェスチャデータ記憶部４２Ａに記憶されたポスチャＰ１〜Ｐ６（図９参照）及びジェスチャデータＪ１〜Ｊ４（図１０参照）とを比較することにより、対象人物のポスチャ又はジェスチャを認識する（ステップＳ１１４）。ポスチャ・ジェスチャ認識部４２Ｂにおけるポスチャ又はジェスチャの認識方法の詳細については、前記したのでここでは省略する。
【００９４】
以上、ジェスチャ認識システムＡについて説明したが、このジェスチャ認識システムＡに含まれるジェスチャ認識装置４は、コンピュータにおいて各手段を各機能プログラムとして実現することも可能であり、各機能プログラムを結合してジェスチャ認識プログラムとして動作させることも可能である。
【００９５】
また、このジェスチャ認識システムＡは、例えば自律ロボットに適用することができる。その場合、自律ロボットは、例えば人が手を差し出すとそのポスチャを「ポスチャＰ４：ＨＡＮＤＳＨＡＫＥ」（図９（ｄ）参照）と認識することや、人が手を振るとそのジェスチャを「ジェスチャＪ１：ＨＡＮＤＳＷＩＮＧ」（図１０（ａ）参照）と認識することが可能となる。
【００９６】
なお、ポスチャやジェスチャによる指示は、音声による指示と比べて、周囲の騒音により左右されない、音声が届かないような状況でも指示が可能である、言葉では表現が難しい（又は冗長になる）指示を簡潔に行うことができる、という利点がある。
【００９７】
【発明の効果】
以上、詳細に説明したように、本発明によれば、対象人物のジェスチャを認識する際に、特徴点（対象人物の動きの特徴を示す点）を一々検出する必要が無いため、ポスチャ認識処理又はジェスチャ認識処理に要する計算量を減らすことができる。
【図面の簡単な説明】
【図１】ジェスチャ認識システムＡの全体構成を示すブロック図である。
【図２】図１に示したジェスチャ認識システムＡに含まれる撮像画像解析装置２と輪郭抽出装置３の構成を示すブロック図である。
【図３】（ａ）は距離画像Ｄ１、（ｂ）は差分画像Ｄ２、（ｃ）はエッジ画像Ｄ３、（ｄ）は肌色領域Ｒ１，Ｒ２を示す図である。
【図４】対象距離を設定する方法を説明するための図である。
【図５】対象領域Ｔを設定する方法と、対象領域Ｔ内から対象人物Ｃの輪郭Ｏを抽出する方法を説明するための図である。
【図６】図１に示したジェスチャ認識システムＡに含まれるジェスチャ認識装置４の構成を示すブロック図である。
【図７】（ａ）は頭頂部位置ｍ１の検出方法を説明するための図であり、（ｂ）は顔位置ｍ２の検出方法を説明するための図である。
【図８】（ａ）は手位置ｍ３の検出方法を説明するための図であり、（ｂ）は手先位置ｍ４の検出方法を説明するための図である。
【図９】ポスチャデータＰ１〜Ｐ６を示す図である。
【図１０】ジェスチャデータＪ１〜Ｊ４を示す図である。
【図１１】ポスチャ・ジェスチャ認識部４２Ｂでの処理の概略を説明するためのフローチャートである。
【図１２】図１１に示したフローチャートにおける「ステップＳ１：ポスチャ認識処理」について説明するためのフローチャートである。
【図１３】図１１に示したフローチャートにおける「ステップＳ４：ポスチャ・ジェスチャ認識処理」について説明するための第１のフローチャートである。
【図１４】図１１に示したフローチャートにおける「ステップＳ４：ポスチャ・ジェスチャ認識処理」について説明するための第２のフローチャートである。
【図１５】ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例１について説明するためのフローチャートである。
【図１６】ポスチャデータＰ１１〜Ｐ１６を示す図である。
【図１７】ジェスチャデータＪ１１〜Ｊ１４を示す図である。
【図１８】ポスチャ・ジェスチャ認識部４２Ｂでの処理の変形例２について説明するためのフローチャートである。
【図１９】（ａ）は判定円Ｅの設定方法を説明するための図である。また、（ｂ）は肌色領域Ｒ２の面積Ｓｈが判定円Ｅの面積Ｓの１／２よりも大きい場合を示す図であり、（ｃ）は肌色領域Ｒ２の面積Ｓｈが判定円Ｅの面積Ｓの１／２以下である場合を示す図である。
【図２０】ジェスチャ認識システムＡの動作における「撮像画像解析ステップ」と「輪郭抽出ステップ」を説明するために示すフローチャートである。
【図２１】ジェスチャ認識システムＡの動作における「顔・手先位置検出ステップ」と「ポスチャ・ジェスチャ認識ステップ」を説明するために示すフローチャートである。
【符号の説明】
Ａジェスチャ認識システム
１カメラ
２撮像画像解析装置
３輪郭抽出装置
４ジェスチャ認識装置
４１顔・手先位置検出手段
４１Ａ頭位置検出部
４１Ｂ顔位置検出部
４１Ｃ手位置検出部
４１Ｄ手先位置検出部
４２ポスチャ・ジェスチャ認識手段
４２Ａポスチャ・ジェスチャデータ記憶部
４２Ｂポスチャ・ジェスチャ認識部
[0001]
[Technical Field of the Invention]
The present invention relates to an apparatus, a method, and a program for recognizing a posture (pose) or a gesture (motion) of a target person from an image of the target person captured by a camera.
[0002]
[Prior art]
Conventionally, many gesture recognition methods have been proposed in which points indicating the characteristics of a target person's motion (feature points) are detected from an image of the target person captured by a camera, and the target person's gesture is estimated based on those feature points (see, for example, Patent Document 1).
[0003]
[Patent Document 1]
Japanese Unexamined Patent Publication No. 2000-149025 (pages 3-6, FIG. 1)
[0004]
[Problems to be solved by the invention]
However, conventional gesture recognition methods must detect each of these feature points individually when recognizing the gesture of the target person, so the amount of calculation required for posture recognition processing or gesture recognition processing becomes large.
[0005]
The present invention has been made in view of the above problems, and an object of the present invention is to provide a gesture recognition device that can reduce the amount of calculation required for posture recognition processing or gesture recognition processing.
[0006]
[Means for Solving the Problems]
The gesture recognition device according to claim 1 is a device for recognizing a posture or gesture of a target person from an image of the target person captured by a camera. It comprises face / hand position detection means for detecting the face position and the hand position of the target person in real space based on contour information and skin color area information of the target person generated from the captured image, and posture / gesture recognition means for detecting, from the face position and the hand position, the relative positional relationship between the face position and the hand position and the fluctuation of the hand position with respect to the face position, and for recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data that describe the postures or gestures corresponding to the relative positional relationship between the face position and the hand position and to the fluctuation of the hand position with respect to the face position. The posture / gesture recognition means sets, with respect to the relative positional relationship, a determination area large enough to contain the hand of the target person, and compares the area of the hand with the area of the determination area, thereby distinguishing postures or gestures in which the relative positional relationship between the face position and the hand position is similar.
[0007]
In this device, the face / hand position detection means first detects the face position and the hand position of the target person in real space based on the contour information and skin color area information of the target person generated from the image. Next, the posture / gesture recognition means detects, from the face position and the hand position, “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position”. The detection result is then compared with posture data or gesture data describing the postures or gestures that correspond to “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position”, and the posture or gesture of the target person is thereby recognized.
[0008]
Specifically, the “relative positional relationship between the face position and the hand position” examined by the posture / gesture recognition means refers to “the heights of the face position and the hand position” and “the distances of the face position and the hand position from the camera” (claim 2). With this configuration, the “relative positional relationship between the face position and the hand position” can be detected easily by comparing the “height of the face position” with the “height of the hand position”, and the “distance of the face position from the camera” with the “distance of the hand position from the camera”. The posture / gesture recognition means can also detect the “relative positional relationship between the face position and the hand position” by examining “the horizontal offset between the face position and the hand position on the image”.
[0009]
The posture / gesture recognition means may also use a pattern matching method when recognizing the posture or gesture of the target person (claim 3). With this configuration, an “input pattern” consisting of “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position” is superimposed on the posture data or gesture data stored in advance, and the most similar pattern is searched for, so that the posture or gesture of the target person can be recognized easily.
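The search for the most similar stored pattern can be sketched as a nearest-neighbour lookup. This is only one way to realize pattern matching and is not taken from the patent: it assumes each pattern is a fixed-length tuple of numbers (for example the face-relative hand offsets plus a motion measure), uses Euclidean distance as the similarity measure, and rejects matches beyond a hypothetical `max_distance`. The function name and the sample values in the usage note are likewise illustrative.

```python
import math


def match_pattern(input_pattern, stored_patterns, max_distance):
    """stored_patterns: dict mapping a posture/gesture label to its pattern.
    Returns the label of the closest pattern, or None if nothing is close."""
    best_label, best_distance = None, math.inf
    for label, pattern in stored_patterns.items():
        distance = math.dist(input_pattern, pattern)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= max_distance else None
```

For instance, with stored patterns `{"HIGHHAND": (0.3, 0.3, 0.0, 0.0), "HANDSHAKE": (0.1, -0.5, -0.5, 0.0)}`, an input of `(0.28, 0.33, 0.02, 0.01)` would be matched to `"HIGHHAND"`.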
[0010]
In addition, the posture / gesture recognition means sets a determination area large enough to contain the hand of the target person and compares the area of the hand with the area of the determination area, and can thereby distinguish postures or gestures in which the relative positional relationship between the face position and the hand position is similar (claim 1). With this configuration it becomes possible to distinguish, for example, the posture “HANDSHAKE” (see FIG. 9D) from the gesture “COMEHERE” (see FIG. 10C), which are hard to tell apart because in both cases the hand position is lower than the face position and the distance from the camera to the hand position is shorter than the distance from the camera to the face position. Specifically, the two are distinguished by determining “COMEHERE” when the area of the hand is larger than 1/2 of the area of the determination circle serving as the determination area, and “HANDSHAKE” when the area of the hand is 1/2 or less of the area of the determination circle.
[0011]
The gesture recognition method according to claim 4 is a method for recognizing a posture or gesture of a target person from an image of the target person captured by a camera, and includes: a face / hand position detection step in which face / hand position detection means detects the face position and the hand position of the target person in real space based on contour information and skin color area information of the target person generated from the image; and a posture / gesture recognition step in which posture / gesture recognition means detects, from the face position and the hand position, the relative positional relationship between the face position and the hand position and the fluctuation of the hand position with respect to the face position, and recognizes the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to the relative positional relationship between the face position and the hand position and to the fluctuation of the hand position with respect to the face position. In the posture / gesture recognition step, the posture / gesture recognition means sets, with respect to the relative positional relationship, a determination area of a size into which the hand of the target person fits, and compares the area of the hand with the area of the determination area, thereby distinguishing postures or gestures in which the relative positional relationship between the face position and the hand position is similar.
[0012]
In this method, first, in the face / hand position detection step, the face position and the hand position of the target person in real space are detected based on the contour information and skin color area information of the target person generated from the image. Next, in the posture / gesture recognition step, “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position” are detected from the face position and the hand position. Then, the posture or gesture of the target person is recognized by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position”.
[0013]
The gesture recognition program according to claim 5 causes a computer, in order to recognize a posture or gesture of a target person from an image of the target person captured by a camera, to function as: face / hand position detection means for detecting the face position and the hand position of the target person in real space based on contour information and skin color area information of the target person generated from the image; and posture / gesture recognition means for detecting, from the face position and the hand position, the relative positional relationship between the face position and the hand position and the fluctuation of the hand position with respect to the face position, and for recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to the relative positional relationship between the face position and the hand position and to the fluctuation of the hand position with respect to the face position. The posture / gesture recognition means sets, with respect to the relative positional relationship, a determination area of a size into which the hand of the target person fits, and compares the area of the hand with the area of the determination area, thereby distinguishing postures or gestures in which the relative positional relationship between the face position and the hand position is similar.
[0014]
In this program, first, the face / hand position detection means detects the face position and the hand position of the target person in real space based on the contour information and skin color area information of the target person generated from the image. Next, the posture / gesture recognition means detects “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position” from the face position and the hand position. Then, the posture or gesture of the target person is recognized by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to “the relative positional relationship between the face position and the hand position” and “the fluctuation of the hand position with respect to the face position”.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings as appropriate. First, the configuration of a gesture recognition system including a gesture recognition device according to the present invention will be described with reference to FIGS. 1 to 19, and then the operation of the gesture recognition system will be described with reference to FIGS. 20 and 21.
[0016]
(Configuration of gesture recognition system A)
First, the overall configuration of a gesture recognition system A including a gesture recognition device 4 according to the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the overall configuration of the gesture recognition system A.
[0017]
As shown in FIG. 1, the gesture recognition system A includes two cameras 1 (1a, 1b) that capture a target person (not shown), a captured image analysis device 2 that analyzes the images (captured images) captured by the cameras 1 and generates various information, a contour extraction device 3 that extracts the contour of the target person based on the information generated by the captured image analysis device 2, and a gesture recognition device 4 that recognizes a posture or a gesture (motion) of the target person based on the information generated by the captured image analysis device 2 and the contour (contour information) of the target person extracted by the contour extraction device 3. Hereinafter, the camera 1, the captured image analysis device 2, the contour extraction device 3, and the gesture recognition device 4 will be described in order.
[0018]
(Camera 1)
The cameras 1 (1a, 1b) are color CCD cameras, and the right camera 1a and the left camera 1b are arranged side by side, separated horizontally by a distance B. Here, the right camera 1a is used as the reference camera. The images (captured images) captured by the cameras 1a and 1b are stored frame by frame in a frame grabber (not shown) and then input to the captured image analysis device 2 in synchronization.
[0019]
Note that the images (captured images) captured by the cameras 1a and 1b are input to the captured image analysis device 2 after being corrected through calibration processing and rectification processing by a correction device (not shown).
[0020]
(Captured image analysis device 2)
The captured image analysis device 2 is a device that analyzes the images (captured images) input from the cameras 1a and 1b and generates “distance information”, “motion information”, “edge information”, and “skin color area information” (see FIG. 1).
[0021]
FIG. 2 is a block diagram showing the configurations of the captured image analysis device 2 and the contour extraction device 3 included in the gesture recognition system A shown in FIG. 1. As shown in FIG. 2, the captured image analysis device 2 includes a distance information generation unit 21 that generates “distance information”, a motion information generation unit 22 that generates “motion information”, an edge information generation unit 23 that generates “edge information”, and a skin color area information generation unit 24 that generates “skin color area information”.
[0022]
(Distance information generation unit 21)
The distance information generation unit 21 detects the distance from the camera 1 for each pixel based on the parallax between the two captured images captured by the cameras 1a and 1b at the same time. Specifically, the parallax is obtained by the block correlation method from the first captured image, captured by the camera 1a serving as the reference camera, and the second captured image, captured by the camera 1b, and the distance from the camera 1 to the “object captured at each pixel” is obtained from the parallax by triangulation. Then, the obtained distance is associated with each pixel of the first captured image, and a distance image D1 (see FIG. 3A) in which the distance is expressed as a pixel value is generated. This distance image D1 serves as the distance information. In the example of FIG. 3A, the target person C is located at a single, uniform distance.
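As a rough illustration of the triangulation step, for rectified stereo cameras separated by the distance B the distance Z follows the standard relation Z = f × B / d, where d is the parallax in pixels and f is the focal length in pixels. The focal length and the function name below are assumptions made for this sketch; the text itself only states that triangulation is used.

```python
def distance_from_parallax(parallax_px: float, baseline_m: float,
                           focal_length_px: float) -> float:
    """Distance from the camera by triangulation: Z = f * B / d.

    parallax_px     -- parallax d between the two images, in pixels
    baseline_m      -- spacing B between cameras 1a and 1b, in meters
    focal_length_px -- focal length f in pixels (assumed known from calibration)
    """
    if parallax_px <= 0:
        raise ValueError("parallax must be positive to triangulate a distance")
    return focal_length_px * baseline_m / parallax_px
```

For example, with an assumed baseline of 0.1 m and focal length of 440 pixels, a parallax of 20 pixels corresponds to a distance of about 2.2 m.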
[0023]
In the block correlation method, a block of a specific size (for example, 8 × 3 pixels) at the same position is compared between the first captured image and the second captured image, and the parallax is detected by examining by how many pixels the subject in the block is shifted between the two images.
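A minimal sketch of such a block comparison is shown below. It assumes that the images are rectified grayscale arrays (lists of rows), that “8 × 3” means 8 pixels wide by 3 pixels high, and that similarity is scored by the sum of absolute differences; the text does not state which correlation measure is used, and the function name and the search range are hypothetical.

```python
def block_parallax(ref, other, row, col, block_h=3, block_w=8, max_shift=16):
    """Return the horizontal shift (in pixels) at which the block whose
    top-left corner is (row, col) in the reference image best matches
    the second image."""

    def sad(shift):
        # Sum of absolute differences between the reference block and the
        # block displaced by `shift` pixels in the second image.
        total = 0
        for dy in range(block_h):
            for dx in range(block_w):
                total += abs(ref[row + dy][col + dx]
                             - other[row + dy][col + dx + shift])
        return total

    width = len(other[0])
    # Only consider shifts that keep the displaced block inside the image.
    candidates = [s for s in range(max_shift + 1) if col + block_w + s <= width]
    return min(candidates, key=sad)
```

The search runs toward larger column indices because, with the right camera 1a as the reference, the same subject appears further to the right in the image of the left camera 1b.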
[0024]
(Motion information generation unit 22)
The motion information generation unit 22 detects the movement of the target person based on the difference between the “captured image (t)” at “time t” and the “captured image (t + Δt)” at “time t + Δt”, which are captured in time series by the camera 1a serving as the reference camera. Specifically, the difference between the “captured image (t)” and the “captured image (t + Δt)” is taken, and the displacement of each pixel is examined. Then, a displacement vector is obtained based on the examined displacement, and a difference image D2 (see FIG. 3B) in which the obtained displacement vector is expressed as a pixel value is generated. This difference image D2 serves as the motion information. In the example of FIG. 3B, motion is detected in the left arm of the target person C.
[0025]
(Edge information generation unit 23)
The edge information generation unit 23 generates an edge image in which the edges existing in the captured image are extracted, based on the density information or color information of each pixel in the image (captured image) captured by the camera 1a serving as the reference camera. Specifically, based on the luminance of each pixel in the captured image, portions where the luminance changes greatly are detected as edges, and an edge image D3 (see FIG. 3C) consisting only of the edges is generated. This edge image D3 serves as the edge information.
[0026]
For example, the Sobel operator is applied to each pixel, and a line segment that differs from the adjacent line segment by a predetermined amount is detected as an edge (horizontal edge or vertical edge) in units of rows or columns. Note that the Sobel operator is an example of a coefficient matrix having weighting factors for the pixels in the vicinity of a given pixel.
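The following sketch shows the standard 3 × 3 Sobel coefficient matrices applied to a grayscale image. It is a simplification: it marks a pixel as an edge when the sum of the absolute horizontal and vertical responses reaches a threshold, rather than reproducing the row-by-row and column-by-column line segment comparison described above. The function name and the threshold are assumptions.

```python
# Standard Sobel coefficient matrices (weights for the 3 x 3 neighborhood).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def sobel_edges(gray, threshold):
    """Return a binary edge image (1 = edge) for a grayscale image given
    as a list of rows. Border pixels are left as 0."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            # A large luminance change in either direction counts as an edge.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges
```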
[0027]
(Skin color area information generation unit 24)
The skin color area information generation unit 24 extracts the skin color areas of the target person existing in the captured image from the image (captured image) captured by the camera 1a serving as the reference camera. Specifically, the RGB values of all the pixels in the captured image are converted into the HLS space consisting of hue, lightness, and saturation, and the pixels whose hue, lightness, and saturation fall within preset threshold ranges are extracted as skin color areas (see FIG. 3D). In the example of FIG. 3D, the face of the target person C is extracted as the skin color area R1, and the hand is extracted as the skin color area R2. The skin color areas R1 and R2 serve as the skin color area information.
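The HLS thresholding can be sketched with the `colorsys` module of the Python standard library. The threshold ranges below are placeholder values chosen only for illustration; the text says the ranges are preset but does not give them, and the function names are hypothetical.

```python
import colorsys


def is_skin(r, g, b,
            hue_range=(0.0, 0.14),      # assumed range, not from the text
            light_range=(0.2, 0.9),     # assumed range, not from the text
            sat_range=(0.15, 1.0)):     # assumed range, not from the text
    """Return True if an 8-bit RGB pixel falls inside the preset HLS ranges."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return (hue_range[0] <= h <= hue_range[1]
            and light_range[0] <= l <= light_range[1]
            and sat_range[0] <= s <= sat_range[1])


def skin_mask(image):
    """Binary mask (1 = skin color) for an image given as rows of (R, G, B)."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```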
[0028]
The “distance information (distance image D1)”, “motion information (difference image D2)”, and “edge information (edge image D3)” generated by the captured image analysis device 2 are input to the contour extraction device 3. Further, “distance information (distance image D1)” and “skin color area information (skin color areas R1, R2)” generated by the captured image analysis apparatus 2 are input to the gesture recognition apparatus 4.
[0029]
(Contour extraction device 3)
The contour extraction device 3 is a device that extracts the contour of the target person based on the “distance information (distance image D1)”, “motion information (difference image D2)”, and “edge information (edge image D3)” generated by the captured image analysis device 2 (see FIG. 1).
[0030]
As shown in FIG. 2, the contour extraction device 3 includes a target distance setting unit 31 that sets the “target distance”, which is the distance at which the target person exists, a target distance image generation unit 32 that generates a “target distance image” based on the “target distance”, a target area setting unit 33 that sets a “target area” within the “target distance image”, and a contour extraction unit 34 that extracts the “contour of the target person” from within the “target area”.
[0031]
(Target distance setting unit 31)
The target distance setting unit 31 sets the “target distance”, which is the distance at which the target person exists, based on the distance image D1 (see FIG. 3A) and the difference image D2 (see FIG. 3B) generated by the captured image analysis device 2. Specifically, the pixels having the same pixel value in the distance image D1 are grouped (into a pixel group), and the pixel values of the difference image D2 are accumulated for each pixel group. Then, the moving object with the largest amount of movement, that is, the target person, is regarded as existing in the region whose accumulated pixel value is larger than a predetermined value and which is closest to the camera 1, and that distance is set as the target distance (see FIG. 4A). In the example of FIG. 4A, the target distance is set to 2.2 m. The target distance set by the target distance setting unit 31 is input to the target distance image generation unit 32.
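The grouping and accumulation described above might look like the following sketch. It assumes that the distance image D1 and the difference image D2 are given as equally sized lists of rows and that the motion values in D2 are non-negative scalars; the function name and the `min_motion` parameter are hypothetical.

```python
from collections import defaultdict


def set_target_distance(distance_image, difference_image, min_motion):
    """Return the distance closest to the camera whose accumulated motion
    exceeds min_motion, or None if no distance qualifies."""
    motion_by_distance = defaultdict(float)
    # Group pixels by their distance value and accumulate the motion values
    # of the corresponding pixels in the difference image.
    for dist_row, diff_row in zip(distance_image, difference_image):
        for dist, diff in zip(dist_row, diff_row):
            motion_by_distance[dist] += diff
    candidates = [d for d, total in motion_by_distance.items()
                  if total > min_motion]
    # Among the groups with enough motion, take the one nearest the camera.
    return min(candidates) if candidates else None
```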
[0032]
(Target distance image generation unit 32)
The target distance image generation unit 32 refers to the distance image D1 generated by the captured image analysis device 2 (see FIG. 3A) and generates a “target distance image” by extracting, from the edge image D3 (see FIG. 3C), the pixels corresponding to the pixels that exist within the target distance ± α m set by the target distance setting unit 31. Specifically, the pixels in the distance image D1 that correspond to the target distance ± α m input from the target distance setting unit 31 are obtained. Then, only the obtained pixels are extracted from the edge image D3 generated by the edge information generation unit 23, and a target distance image D4 (see FIG. 4B) is generated. The target distance image D4 is therefore an image that represents, by edges, the target person existing at the target distance. The target distance image D4 generated by the target distance image generation unit 32 is input to the target area setting unit 33 and the contour extraction unit 34.
[0033]
(Target area setting unit 33)
The target area setting unit 33 sets a “target area” in the target distance image D4 (see FIG. 4B) generated by the target distance image generation unit 32. Specifically, a histogram H is generated by accumulating the pixel values of the target distance image D4 in the vertical direction, and the position where the frequency in the histogram H is maximum is specified as the horizontal center position of the target person C (see FIG. 5A). Then, a range of a specific size (for example, 0.5 m) to the left and right of the specified center position is set as the target area T (see FIG. 5B). The vertical range of the target area T is set to a specific size (for example, 2 m). When the target area T is set, its range is corrected with reference to camera parameters such as the tilt angle and height of the camera 1. The target area T set by the target area setting unit 33 is input to the contour extraction unit 34.
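A minimal sketch of the histogram step is given below. It assumes the target distance image D4 is a list of rows of edge values, and that the half width of the target area has already been converted from meters to pixels (the conversion and the camera parameter correction are not covered). Both function names are hypothetical.

```python
def horizontal_center(target_distance_image):
    """Accumulate pixel values column by column (histogram H) and return
    the column index where the accumulated value is largest."""
    width = len(target_distance_image[0])
    histogram = [sum(row[x] for row in target_distance_image)
                 for x in range(width)]
    return max(range(width), key=lambda x: histogram[x])


def target_region(center_x, half_width_px, image_width):
    """Left and right bounds of the target area T, clipped to the image."""
    left = max(0, center_x - half_width_px)
    right = min(image_width - 1, center_x + half_width_px)
    return left, right
```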
[0034]
(Contour extraction unit 34)
The contour extraction unit 34 extracts the contour O of the target person C from within the target area T set by the target area setting unit 33, in the target distance image D4 (see FIG. 4B) generated by the target distance image generation unit 32 (see FIG. 5C). Specifically, the contour O of the target person C is extracted by a technique that uses a dynamic contour model consisting of a closed curve called “Snakes” (hereinafter referred to as the “snake technique”). The snake technique extracts the contour of an object by contracting and deforming “Snakes”, the dynamic contour model, so that a predefined energy function is minimized. The contour O of the target person C extracted by the contour extraction unit 34 is input to the gesture recognition device 4 as “contour information” (see FIG. 1).
[0035]
The gesture recognition device 4 is a device that recognizes the posture or gesture of the target person based on the “distance information” and “skin color area information” generated by the captured image analysis device 2 and the “contour information” generated by the contour extraction device 3, and outputs the recognition result (see FIG. 1).
[0036]
FIG. 6 is a block diagram showing the configuration of the gesture recognition device 4 included in the gesture recognition system A shown in FIG. 1. As shown in FIG. 6, the gesture recognition device 4 includes face / hand position detection means 41 that detects the face position and the hand tip position of the target person C in real space, and posture / gesture recognition means 42 that recognizes the posture or gesture of the target person based on the face position and the hand tip position detected by the face / hand position detection means 41.
[0037]
The face / hand position detection means 41 includes a head position detection unit 41A that detects the “top position” (the position of the top of the head) of the target person in real space, a face position detection unit 41B that detects the “face position” of the target person, a hand position detection unit 41C that detects the “hand position” of the target person, and a hand tip position detection unit 41D that detects the “hand tip position” of the target person. Here, the “hand” is the part made up of the arm and the hand, and the “hand tip” is the fingertip of the hand.
[0038]
(Head position detection unit 41A)
The head position detection unit 41A detects the “top position” of the target person C based on the contour information generated by the contour extraction device 3. The method for detecting the top position will be described with reference to FIG. 7A. First, the center of gravity G of the area enclosed by the contour O is obtained (1). Next, an area (top position search area) F1 for searching for the top position is set (2). The horizontal width (width in the X-axis direction) of the top position search area F1 is set to a preset average human shoulder width W centered on the X coordinate of the center of gravity G. The average human shoulder width W is set with reference to the distance information generated by the captured image analysis device 2. The vertical width (width in the Y-axis direction) of the top position search area F1 is set to a width that covers the contour O. Then, the upper end point of the contour O within the top position search area F1 is taken as the top position m1 (3). The top position m1 detected by the head position detection unit 41A is input to the face position detection unit 41B.
[0039]
(Face position detection unit 41B)
The face position detection unit 41B detects the “face position” of the target person C based on the top position m1 detected by the head position detection unit 41A and the skin color area information generated by the captured image analysis device 2. The face position detection method will be described with reference to FIG. 7B. First, an area (face position search area) F2 for searching for the face position is set (4). The range of the face position search area F2 is set, with the top position m1 as a reference, to a preset “size that roughly covers a human head”. The range of the face position search area F2 is set with reference to the distance information generated by the captured image analysis device 2.
[0040]
Next, the center of gravity of the skin color area R1 in the face position search area F2 is set as the face position m2 on the image (5). For the skin color region R1, the skin color region information generated by the captured image analysis device 2 is referred to. Then, the face position m2t (Xft, Yft, Zft) in the real space is obtained from the face position m2 (Xf, Yf) on the image with reference to the distance information generated by the captured image analysis apparatus 2.
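Taking the center of gravity of the skin color area inside a search area can be sketched as follows. The mask layout (a list of rows of 0/1 values), the rectangular `(left, top, right, bottom)` search area with inclusive bounds, and the function name are assumptions; the conversion from the image position to the real-space position is not shown because the text does not detail it.

```python
def skin_centroid(skin_mask, search_area):
    """Center of gravity (x, y) of the skin color pixels that lie inside
    search_area, or None if the area contains no skin color pixel."""
    left, top, right, bottom = search_area
    sum_x = sum_y = count = 0
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if skin_mask[y][x]:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return sum_x / count, sum_y / count
```

The same helper would apply to the hand position m3, using the hand position search area F3 and a mask from which the surroundings of the face position m2 have been excluded.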
[0041]
The “face position m2 on the image” detected by the face position detection unit 41B is input to the hand position detection unit 41C and the hand tip position detection unit 41D. The “face position m2t in real space” detected by the face position detection unit 41B is stored in a storage unit (not shown) and is used when the posture / gesture recognition unit 42B (see FIG. 6) of the posture / gesture recognition means 42 recognizes the posture or gesture of the target person C.
[0042]
(Hand position detection unit 41C)
The hand position detection unit 41C detects the “hand position” of the target person C based on the skin color area information generated by the captured image analysis device 2 and the contour information generated by the contour extraction device 3. Here, the skin color area information used is that of the area excluding the surroundings of the face position m2. The hand position detection method will be described with reference to FIG. 8A. First, an area (hand position search area) F3 (F3R, F3L) for searching for the hand position is set (6). The hand position search area F3 is set, with the face position m2 detected by the face position detection unit 41B as a reference, to a preset “range that the hands can reach (the range that the left and right hands reach)”. Note that the size of the hand position search area F3 is set with reference to the distance information generated by the captured image analysis device 2.
[0043]
Next, the center of gravity of the skin color area R2 in the hand position search area F3 is taken as the hand position m3 on the image (7). For the skin color area R2, the skin color area information generated by the captured image analysis device 2 is referred to. Here, the skin color area information used is that of the area excluding the surroundings of the face position m2. In the example of FIG. 8A, the skin color area exists only in the hand position search area F3L, so the hand position m3 is detected only in the hand position search area F3L. Also, in the example of FIG. 8A, the target person is wearing long-sleeved clothes and only the part beyond the wrist is exposed, so the position of the hand itself (HAND) becomes the hand position m3. The “hand position m3 on the image” detected by the hand position detection unit 41C is input to the hand tip position detection unit 41D.
[0044]
(Hand tip position detection unit 41D)
The hand tip position detection unit 41D detects the “hand tip position” of the target person C based on the face position m2 detected by the face position detection unit 41B and the hand position m3 detected by the hand position detection unit 41C. The method for detecting the hand tip position will be described with reference to FIG. 8B. First, an area (hand tip position search range) F4 for searching for the hand tip position is set within the hand position search area F3L (8). The hand tip position search range F4 is set, with the hand position m3 as its center, to a preset “size that roughly covers the hand”. The range of the hand tip position search range F4 is set with reference to the distance information generated by the captured image analysis device 2.
[0045]
Subsequently, the upper, lower, left, and right end points m4a to m4d of the skin color area R2 within the hand tip position search range F4 are detected (9). For the skin color area R2, the skin color area information generated by the captured image analysis device 2 is referred to. Then, the vertical distance d1 between the upper and lower end points (between m4a and m4b) is compared with the horizontal distance d2 between the left and right end points (between m4c and m4d), and the direction with the longer distance is determined to be the direction in which the hand extends (10). In the example of FIG. 8B, the vertical distance d1 is longer than the horizontal distance d2, so it is determined that the hand extends in the vertical direction.
[0046]
Next, based on the positional relationship between the face position m2 on the image and the hand position m3 on the image, it is determined which of the upper and lower end points m4a and m4b (or the left and right end points m4c and m4d) is the hand tip position. Specifically, when the hand position m3 is far from the face position m2, the arm is regarded as extended, and the end point farther from the face position m2 is determined to be the hand tip position (hand tip position on the image) m4. Conversely, when the hand position m3 is close to the face position m2, the elbow is regarded as bent, and the end point closer to the face position m2 is determined to be the hand tip position m4. In the example of FIG. 8B, the hand position m3 is far from the face position m2 and the upper end point m4a is farther from the face position m2 than the lower end point m4b, so the upper end point m4a is determined to be the hand tip position m4 (11).
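Steps (10) and (11) can be sketched as below. The text does not quantify “far” and “close”, so the `far_threshold` parameter is an assumption, as are the function name, the use of Euclidean distance on image coordinates, and the choice of the vertical direction when d1 and d2 are equal.

```python
import math


def hand_tip(face, hand, top, bottom, left, right, far_threshold):
    """Pick the hand tip position m4 among the end points m4a to m4d.

    All points are (x, y) image coordinates; `face` is m2, `hand` is m3.
    """
    d1 = abs(bottom[1] - top[1])    # vertical extent between m4a and m4b
    d2 = abs(right[0] - left[0])    # horizontal extent between m4c and m4d
    # The longer extent gives the direction in which the hand extends
    # (ties go to the vertical direction, an arbitrary choice).
    ends = (top, bottom) if d1 >= d2 else (left, right)

    def distance_to_face(point):
        return math.dist(point, face)

    if math.dist(hand, face) > far_threshold:
        # Arm regarded as extended: the end point farther from the face.
        return max(ends, key=distance_to_face)
    # Elbow regarded as bent: the end point closer to the face.
    return min(ends, key=distance_to_face)
```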
[0047]
Then, the hand tip position m4t (Xht, Yht, Zht) in real space is obtained from the hand tip position m4 (Xh, Yh) on the image with reference to the distance information generated by the captured image analysis device 2. The “hand tip position m4t in real space” detected by the hand tip position detection unit 41D is stored in a storage unit (not shown) and is used when the posture / gesture recognition unit 42B (see FIG. 6) of the posture / gesture recognition means 42 recognizes the posture or gesture of the target person C.
[0048]
(Posture / gesture recognition means 42)
The posture / gesture recognition means 42 includes a posture / gesture data storage unit 42A that stores posture data and gesture data, and a posture / gesture recognition unit 42B that recognizes the posture or gesture of the target person based on the “face position m2t in real space” and the “hand tip position m4t in real space” detected by the face / hand position detection means 41 (see FIG. 6).
[0049]
(Posture / gesture data storage unit 42A)
The posture / gesture data storage unit 42A stores posture data P1 to P6 (see FIG. 9) and gesture data J1 to J4 (see FIG. 10). The posture data P1 to P6 and the gesture data J1 to J4 are data describing the postures or gestures corresponding to “the relative positional relationship between the face position and the hand position in real space” and “the fluctuation of the hand position with respect to the face position”. The “relative positional relationship between the face position and the hand position” specifically refers to “the heights of the face position and the hand position” and “the distances of the face position and the hand position from the camera 1”. The posture / gesture recognition means 42 can also detect “the relative positional relationship between the face position and the hand position” by examining “the horizontal offset between the face position and the hand position on the image”. The posture data P1 to P6 and the gesture data J1 to J4 are used when the posture / gesture recognition unit 42B recognizes the posture or gesture of the target person.
[0050]
The postures P1 to P6 will be described with reference to FIG. 9. “Posture P1: FACE SIDE” shown in FIG. 9 (a) is a posture meaning “Hello”. “Posture P2: HIGH HAND” is shown in FIG. 9 (b), and the posture P3 shown in FIG. 9 (c) is a posture meaning “Stop”. “Posture P4: HANDSHAKE” shown in FIG. 9 (d) is a posture meaning “Handshake”, “Posture P5: SIDE HAND” shown in FIG. 9 (e) is a posture meaning “Look in the direction of the hand”, and “Posture P6: LOW HAND” shown in FIG. 9 (f) is a posture meaning “Turn in the direction of the hand”.
[0051]
The gestures J1 to J4 will be described with reference to FIG. 10. “Gesture J1: HAND SWING” shown in FIG. 10 (a) is a gesture meaning “Caution”, “Gesture J2: BYE BYE” shown in FIG. 10 (b) is a gesture meaning “Goodbye”, “Gesture J3: COME HERE” shown in FIG. 10 (c) is a gesture meaning “Come closer”, and “Gesture J4: HAND CIRCLING” shown in FIG. 10 (d) is a gesture meaning “Turn”.
[0052]
In the present embodiment, the posture / gesture data storage unit 42A (see FIG. 6) stores posture data P1 to P6 (see FIG. 9) and gesture data J1 to J4 (see FIG. 10). Posture data and gesture data to be stored in the posture / gesture data storage unit 42A can be arbitrarily set. The meaning of each posture and each gesture can also be set arbitrarily.
[0053]
The posture / gesture recognition unit 42B detects “the relative positional relationship between the face position m2t and the hand tip position m4t” and “the fluctuation of the hand tip position m4t with respect to the face position m2t” from the “face position m2t in real space” and the “hand tip position m4t in real space” detected by the face / hand position detection means 41, and recognizes the posture or gesture of the target person by comparing the detection result with the posture data P1 to P6 (see FIG. 9) or the gesture data J1 to J4 (see FIG. 10) stored in the posture / gesture data storage unit 42A. The recognition results of the posture / gesture recognition unit 42B are stored as a history.
[0054]
Next, the method by which the posture / gesture recognition unit 42B recognizes a posture or gesture will be described in detail with reference to flowcharts. First, the outline of the processing in the posture / gesture recognition unit 42B will be described with reference to the flowchart shown in FIG. 11. Then, “Step S1: posture recognition processing” in the flowchart shown in FIG. 11 will be described with reference to the flowchart shown in FIG. 12, and “Step S4: posture / gesture recognition processing” in the flowchart shown in FIG. 11 will be described with reference to the subsequent flowcharts.
[0055]
(Outline of processing in the posture / gesture recognition unit 42B)
FIG. 11 is a flowchart for explaining an outline of processing in the posture / gesture recognition unit 42B. Referring to the flowchart shown in FIG. 11, first, in step S1, recognition of postures P1 to P4 (see FIG. 9) is attempted. Subsequently, in step S2, it is determined whether or not the posture has been recognized in step S1. If it is determined that the posture has been recognized, the process proceeds to step S3. If it is determined that the posture has not been recognized, the process proceeds to step S4. In step S3, the posture recognized in step S1 is output as a recognition result, and the process ends.
[0056]
In step S4, recognition of postures P5 and P6 (see FIG. 9) or gestures J1 to J4 (see FIG. 10) is attempted. Next, in step S5, it is determined whether or not a posture or gesture has been recognized in step S4. If it is determined that the posture or gesture has been recognized, the process proceeds to step S6. If it is determined that the posture or gesture has not been recognized, the process proceeds to step S8.
[0057]
In step S6, it is determined whether the same posture or gesture has been recognized a predetermined number of times (for example, 5 times) or more in a predetermined number of past frames (for example, 10 frames). If it is determined that the same posture or gesture has been recognized the predetermined number of times or more, the process proceeds to step S7. If it is determined that it has not, the process proceeds to step S8.
[0058]
In step S7, the posture or gesture recognized in step S4 is output as a recognition result, and the process ends. In step S8, it is output that the posture or gesture could not be recognized, that is, it cannot be recognized, and the process is terminated.
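The control flow of steps S1 to S8 can be sketched as follows. This is a minimal illustration, not the implementation of the posture / gesture recognition unit 42B: the class name `OutlineRecognizer` and the two callables are hypothetical, the values 10 and 5 are the example values given in the text, and it is an assumption that the current frame counts toward the history and that only frames reaching step S4 are recorded.

```python
from collections import deque
from typing import Callable, Deque, Optional

HISTORY_FRAMES = 10  # "predetermined number of frames" (example value from the text)
MIN_HITS = 5         # "predetermined number of times" (example value from the text)

Recognizer = Callable[[object], Optional[str]]


class OutlineRecognizer:
    """Sketch of steps S1 to S8; both recognizers are supplied by the caller."""

    def __init__(self, step_s1: Recognizer, step_s4: Recognizer) -> None:
        self._step_s1 = step_s1  # attempts postures P1 to P4
        self._step_s4 = step_s4  # attempts postures P5, P6 and gestures J1 to J4
        # Assumption: the history holds only the results of step S4.
        self._history: Deque[Optional[str]] = deque(maxlen=HISTORY_FRAMES)

    def process_frame(self, frame: object) -> Optional[str]:
        posture = self._step_s1(frame)           # S1
        if posture is not None:                  # S2
            return posture                       # S3
        label = self._step_s4(frame)             # S4
        self._history.append(label)
        if label is None:                        # S5
            return None                          # S8: recognition impossible
        hits = sum(1 for past in self._history if past == label)
        if hits >= MIN_HITS:                     # S6
            return label                         # S7
        return None                              # S8: recognition impossible
```

With these example values, a gesture found by step S4 is reported only from the fifth matching frame within the last ten frames, which suppresses single-frame misdetections.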
[0059]
(Step S1: Posture recognition processing)
FIG. 12 is a flowchart for explaining “step S1: posture recognition processing” in the flowchart shown in FIG. 11. Referring to the flowchart shown in FIG. 12, first, in step S11, the face position m2t and the hand position m4t in the real space of the target person (hereinafter referred to as “input information”) are input from the face / hand position detection means 41. Subsequently, in step S12, based on the face position m2t and the hand position m4t, the distance from the camera 1 to the hand (hereinafter referred to as “hand distance”) is compared with the distance from the camera 1 to the face (hereinafter referred to as “face distance”) to determine whether or not the hand distance and the face distance are substantially the same, that is, whether or not the difference between the hand distance and the face distance is equal to or less than a predetermined value. If it is determined that the two are substantially the same, the process proceeds to step S13. If it is determined that they are not substantially the same, the process proceeds to step S18.
[0060]
In step S13, the height of the hand (hereinafter referred to as “hand height”) is compared with the height of the face (hereinafter referred to as “face height”) to determine whether or not the hand height and the face height are substantially the same, that is, whether or not the difference between the hand height and the face height is equal to or less than a predetermined value. If it is determined that the two are substantially the same, the process proceeds to step S14. If it is determined that the two are not substantially the same, the process proceeds to step S15. In step S14, a recognition result indicating that the posture corresponding to the input information is “posture P1: FACE SIDE” (see FIG. 9A) is output, and the process ends.
[0061]
In step S15, the hand height is compared with the face height to determine whether the hand height is higher than the face height. If it is determined that the hand height is higher than the face height, the process proceeds to step S16. If it is determined that the hand height is not higher than the face height, the process proceeds to step S17. In step S16, a recognition result indicating that the posture corresponding to the input information is “posture P2: HIGH HAND” (see FIG. 9B) is output, and the process ends. In step S17, a recognition result indicating that the posture corresponding to the input information is “none” is output, and the process ends.
[0062]
In step S18, the hand height and the face height are compared, and it is determined whether or not the hand height and the face height are substantially the same, that is, whether or not the difference between the hand height and the face height is a predetermined value or less. If it is determined that both are substantially the same, the process proceeds to step S19. If it is determined that both are not substantially the same, the process proceeds to step S20. In step S19, a recognition result indicating that the posture corresponding to the input information is “posture P3: STOP” (see FIG. 9C) is output, and the process ends.
[0063]
In step S20, the hand height is compared with the face height to determine whether the hand height is lower than the face height. If it is determined that the hand height is lower than the face height, the process proceeds to step S21. If it is determined that the hand height is not lower than the face height, the process proceeds to step S22. In step S21, a recognition result indicating that the posture corresponding to the input information is “posture P4: HANDSHAKE” (see FIG. 9D) is output, and the process ends. In step S22, a recognition result indicating that the posture corresponding to the input information is “none” is output, and the process ends.
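The decision tree of steps S12 to S22 can be sketched as below. The coordinate convention (y is height with up as positive, z is distance from the camera), the metre units, and the two tolerance values are assumptions made for illustration; the text only speaks of a “predetermined value”. The HANDSHAKE branch is taken when the hand is lower than the face, in line with the description of posture P4 as having a hand position lower than the face position.

```python
from typing import NamedTuple, Optional


class Point3D(NamedTuple):
    x: float
    y: float  # height (assumed: larger value means higher)
    z: float  # distance from the camera


DIST_TOL = 0.10    # hypothetical "predetermined value" for distance, in metres
HEIGHT_TOL = 0.10  # hypothetical "predetermined value" for height, in metres


def recognize_posture_s1(face: Point3D, hand: Point3D,
                         dist_tol: float = DIST_TOL,
                         height_tol: float = HEIGHT_TOL) -> Optional[str]:
    """Return one of postures P1 to P4, or None when no posture applies."""
    same_distance = abs(hand.z - face.z) <= dist_tol   # S12
    same_height = abs(hand.y - face.y) <= height_tol   # S13 / S18
    if same_distance:
        if same_height:
            return "P1: FACE SIDE"    # S14
        if hand.y > face.y:           # S15
            return "P2: HIGH HAND"    # S16
        return None                   # S17
    if same_height:
        return "P3: STOP"             # S19
    if hand.y < face.y:               # S20
        return "P4: HANDSHAKE"        # S21
    return None                       # S22
```

Only two comparisons per frame (height and distance) are needed, which reflects the stated aim of reducing the amount of calculation.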
[0064]
(Step S4: posture / gesture recognition processing)
FIG. 13 is a first flowchart for explaining “step S4: posture / gesture recognition processing” in the flowchart shown in FIG. 11. Referring to the flowchart shown in FIG. 13, first, in step S31, input information (the face position m2t and the hand position m4t in the real space of the target person) is input. Subsequently, in step S32, the standard deviation of the hand position m4t relative to the face position m2t is obtained, and the presence or absence of hand movement is determined based on the obtained standard deviation. Specifically, when the standard deviation of the hand position m4t is equal to or smaller than a predetermined value, it is determined that there is no hand movement, and when it is larger than the predetermined value, it is determined that there is hand movement. If it is determined that there is no hand movement, the process proceeds to step S33. If it is determined that there is hand movement, the process proceeds to step S36.
[0065]
In step S33, it is determined whether or not the hand height is just below the face height. If it is determined that the hand height is immediately below the face height, the process proceeds to step S34. If it is determined that the hand height is not immediately below the face height, the process proceeds to step S35. In step S34, a recognition result indicating that the posture or gesture corresponding to the input information is “posture P5: SIDE HAND” (see FIG. 9E) is output, and the process ends. In step S35, a recognition result indicating that the posture or gesture corresponding to the input information is “posture P6: LOW HAND” (see FIG. 9F) is output, and the process ends.
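Steps S32 to S35 can be sketched as follows. The text does not say how the standard deviation of a three-dimensional position is reduced to a single value, over how many frames it is taken, or what “just below” means numerically. The sketch therefore assumes the square root of the summed per-axis variances over a caller-supplied window, and the thresholds `threshold` and `just_below` are hypothetical values in metres.

```python
import math
from statistics import pvariance
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z); y is height, assumed up-positive


def relative_hand_std(faces: Sequence[Vec3], hands: Sequence[Vec3]) -> float:
    """Standard deviation of the hand position measured relative to the face."""
    rel = [tuple(h - f for h, f in zip(hand, face))
           for face, hand in zip(faces, hands)]
    # Assumption: combine the axes as the root of the summed variances.
    return math.sqrt(sum(pvariance(axis) for axis in zip(*rel)))


def hand_is_moving(faces: Sequence[Vec3], hands: Sequence[Vec3],
                   threshold: float = 0.05) -> bool:
    """S32: movement is present when the deviation exceeds the threshold."""
    return relative_hand_std(faces, hands) > threshold


def recognize_static_posture(face: Vec3, hand: Vec3,
                             just_below: float = 0.30) -> str:
    """S33 to S35: used only when no hand movement was found in S32."""
    drop = face[1] - hand[1]  # how far the hand is below the face
    if 0.0 < drop <= just_below:
        return "P5: SIDE HAND"  # S34
    return "P6: LOW HAND"       # S35
```

Measuring the hand relative to the face means that a person who walks while keeping the hand still is not mistaken for someone moving the hand.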
[0066]
In step S36, the hand height is compared with the face height to determine whether the hand height is higher than the face height. If it is determined that the hand height is higher than the face height, the process proceeds to step S37. If it is determined that the hand height is not higher than the face height, the process proceeds to step S41 (see FIG. 14). In step S37, the hand distance and the face distance are compared, and it is determined whether or not the hand distance and the face distance are substantially the same, that is, whether or not the difference between the hand distance and the face distance is equal to or less than a predetermined value. If it is determined that both are substantially the same, the process proceeds to step S38. If it is determined that both are not substantially the same, the process proceeds to step S40.
[0067]
In step S38, it is determined whether or not the hand is swung from side to side. Here, if it is determined from the horizontal shift between two frames that the hand is swinging left and right, the process proceeds to step S39. If it is determined that the hand is not swinging left and right, the process proceeds to step S40. In step S39, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J1: HAND SWING” (see FIG. 10A) is output, and the process ends. In step S40, a recognition result indicating that the posture or gesture corresponding to the input information is “none” is output, and the process ends.
[0068]
FIG. 14 is a second flowchart for explaining “step S4: posture / gesture recognition processing” in the flowchart shown in FIG. 11. Referring to the flowchart shown in FIG. 14, in step S41, the hand distance is compared with the face distance to determine whether the hand distance is shorter than the face distance. If it is determined that the hand distance is shorter than the face distance, the process proceeds to step S42. If it is determined that the hand distance is not shorter than the face distance, the process proceeds to step S47.
[0069]
In step S42, it is determined whether or not the hand is swung from side to side. Here, if it is determined that the hand is swinging left and right based on the shift in the horizontal direction between the two frames, the process proceeds to step S43. If it is determined that the hand is not swinging left and right, the process proceeds to step S44. In step S43, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J2: BYE BYE” (see FIG. 10B) is output, and the process ends.
[0070]
In step S44, it is determined whether or not the hand is swung up and down. If it is determined from the vertical shift between the two frames that the hand is swinging up and down, the process proceeds to step S45. If it is determined that the hand is not swinging up and down, the process proceeds to step S46. In step S45, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J3: COME HERE” (see FIG. 10C) is output, and the process ends. In step S46, the recognition result that the posture or gesture corresponding to the input information is “none” is output, and the process ends.
[0071]
In step S47, the hand distance and the face distance are compared, and it is determined whether or not the hand distance and the face distance are substantially the same, that is, whether or not the difference between the hand distance and the face distance is a predetermined value or less. If it is determined that both are substantially the same, the process proceeds to step S48. If it is determined that both are not substantially the same, the process proceeds to step S50. In step S48, it is determined whether or not the hand is swung from side to side. If it is determined that the hand is swinging left and right, the process proceeds to step S49. If it is determined that the hand is not swinging left and right, the process proceeds to step S50.
[0072]
In step S49, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J4: HAND CIRCLING” (see FIG. 10D) is output, and the process ends. In step S50, the recognition result that the posture or gesture corresponding to the input information is “none” is output, and the process ends.
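The branch for a moving hand (steps S36 to S50) can be sketched as below. The coordinate convention (y is height, z is distance from the camera), the tolerance `dist_tol`, and the helper `swung` with its `min_shift` value are assumptions for illustration; the text only states that swinging is judged from the shift between two frames.

```python
from typing import NamedTuple, Optional


class Point3D(NamedTuple):
    x: float  # horizontal position
    y: float  # height (assumed: larger value means higher)
    z: float  # distance from the camera


def swung(prev_hand: Point3D, curr_hand: Point3D, axis: int,
          min_shift: float = 0.05) -> bool:
    """Hypothetical swing test: shift between two frames along one axis.

    axis 0 is left-right, axis 1 is up-down.
    """
    return abs(curr_hand[axis] - prev_hand[axis]) >= min_shift


def recognize_gesture(face: Point3D, hand: Point3D,
                      swings_lr: bool, swings_ud: bool,
                      dist_tol: float = 0.10) -> Optional[str]:
    """Return one of gestures J1 to J4, or None when no gesture applies."""
    same_distance = abs(hand.z - face.z) <= dist_tol
    if hand.y > face.y:                       # S36: hand higher than face
        if same_distance and swings_lr:       # S37, S38
            return "J1: HAND SWING"           # S39
        return None                           # S40
    if hand.z < face.z:                       # S41: hand nearer to the camera
        if swings_lr:                         # S42
            return "J2: BYE BYE"              # S43
        if swings_ud:                         # S44
            return "J3: COME HERE"            # S45
        return None                           # S46
    if same_distance and swings_lr:           # S47, S48
        return "J4: HAND CIRCLING"            # S49
    return None                               # S50
```

A caller would typically compute `swings_lr` and `swings_ud` with `swung(prev, curr, 0)` and `swung(prev, curr, 1)` before calling `recognize_gesture`.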
[0073]
As described above, the posture / gesture recognition unit 42B detects, from the input information (the face position m2t and the hand position m4t in the real space of the target person) input from the face / hand position detection means 41, the “relative positional relationship between the face position m2t and the hand position m4t” and the “variation of the hand position m4t relative to the face position m2t”. By comparing the detection result with the posture data P1 to P6 (see FIG. 9) and the gesture data J1 to J4 (see FIG. 10) stored in the posture / gesture data storage unit 42A, the posture or gesture of the target person can be recognized.
[0074]
Note that, in addition to the method described above, the posture / gesture recognition unit 42B can also recognize the posture or gesture of the target person by the following Modifications 1 and 2. Hereinafter, “Modification 1 of processing in the posture / gesture recognition unit 42B” will be described with reference to FIGS. 15 to 17, and “Modification 2 of processing in the posture / gesture recognition unit 42B” will be described with reference to FIGS. 18 and 19.
[0075]
(Modification 1 of processing in the posture / gesture recognition unit 42B)
In the first modification, a pattern matching method is used to recognize the posture or gesture of the target person. FIG. 15 is a flowchart for explaining “Modification 1 of processing in the posture / gesture recognition unit 42B”. Referring to the flowchart shown in FIG. 15, first, in step S61, recognition of a posture or a gesture is attempted. At this time, the posture / gesture recognition unit 42B superimposes an “input pattern”, which consists of the input information (the face position m2t and the hand position m4t in the real space of the target person) input from the face / hand position detection means 41 and the “variation of the hand position m4t relative to the face position m2t”, on the posture data P11 to P16 (see FIG. 16) and the gesture data J11 to J14 (see FIG. 17) stored in the posture / gesture data storage unit 42A, and recognizes the posture or gesture of the target person by searching for the most similar pattern. The posture data P11 to P16 (see FIG. 16) and the gesture data J11 to J14 (see FIG. 17) for pattern matching are stored in advance in the posture / gesture data storage unit 42A.
[0076]
Subsequently, in step S62, it is determined whether or not a posture or a gesture has been recognized in step S61. If it is determined that the posture or gesture has been recognized, the process proceeds to step S63. If it is determined that the posture or gesture has not been recognized, the process proceeds to step S65.
[0077]
In step S63, it is determined whether the same posture or gesture has been recognized a predetermined number of times (for example, 5 times) or more in a predetermined number of past frames (for example, 10 frames). If it is determined that the same posture or gesture has been recognized the predetermined number of times or more, the process proceeds to step S64. Otherwise, the process proceeds to step S65.
[0078]
In step S64, the posture or gesture recognized in step S61 is output as the recognition result, and the process ends. In step S65, a result indicating that no posture or gesture could be recognized, that is, that recognition is impossible, is output, and the process ends.
[0079]
As described above, the posture / gesture recognition unit 42B can recognize the posture or gesture of the target person by using the pattern matching method to match an “input pattern”, consisting of the input information input from the face / hand position detection means 41 and the “variation of the hand position m4t relative to the face position m2t”, against the posture data P11 to P16 (see FIG. 16) and the gesture data J11 to J14 (see FIG. 17) stored in the posture / gesture data storage unit 42A.
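One possible shape of the matching in step S61 is sketched below. The text does not specify how patterns are represented or how similarity is measured, so everything here is an assumption: patterns are taken to be equal-length sequences of hand positions relative to the face, similarity is the mean squared distance, and `max_distance` is a hypothetical rejection threshold. The template contents in the usage lines are made-up values, not the data of FIGS. 16 and 17.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def build_input_pattern(faces: Sequence[Vec3],
                        hands: Sequence[Vec3]) -> List[Vec3]:
    """Hand positions expressed relative to the face, one entry per frame."""
    return [tuple(h - f for h, f in zip(hand, face))
            for face, hand in zip(faces, hands)]


def pattern_distance(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """Mean squared distance between two superimposed patterns."""
    if not a or len(a) != len(b):
        raise ValueError("patterns must be non-empty and of equal length")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return total / len(a)


def match_pattern(input_pattern: Sequence[Vec3],
                  templates: Dict[str, Sequence[Vec3]],
                  max_distance: float = 0.05) -> Optional[str]:
    """Return the label of the most similar template, or None if none is close."""
    best_label: Optional[str] = None
    best = float("inf")
    for label, template in templates.items():
        d = pattern_distance(input_pattern, template)
        if d < best:
            best_label, best = label, d
    return best_label if best <= max_distance else None


# Usage with made-up templates (hypothetical values).
templates = {
    "P11": [(0.2, 0.0, 0.0)] * 3,
    "J11": [(0.1, 0.3, 0.0), (0.3, 0.3, 0.0), (0.1, 0.3, 0.0)],
}
faces = [(0.0, 1.6, 2.0)] * 3
hands = [(0.1, 1.9, 2.0), (0.3, 1.9, 2.0), (0.1, 1.9, 2.0)]
result = match_pattern(build_input_pattern(faces, hands), templates)
```

Returning `None` when even the best template is too far away corresponds to the “not recognized” branch from step S62 to step S65.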
[0080]
(Modification 2 of processing in the posture / gesture recognition unit 42B)
In the second modification, a determination circle E having a size that allows the hand of the target person to enter is set, and the area of the hand is compared with the area of the determination circle, in order to distinguish between “posture P4: HANDSHAKE” (see FIG. 9D) and “gesture J3: COME HERE” (see FIG. 10C), for which the relative positional relationship between the face position m2t and the hand position m4t is similar. Note that “posture P4: HANDSHAKE” and “gesture J3: COME HERE” are difficult to distinguish from each other because, in both, the hand position is lower than the face position and the distance from the camera to the hand position is shorter than the distance from the camera to the face position.
[0081]
FIG. 18 is a flowchart for explaining a second modification of the processing in the posture / gesture recognition unit 42B. Referring to the flowchart shown in FIG. 18, first, in step S71, a determination circle (determination area) E centered on the hand position m3 is set (see FIG. 19). The size of the determination circle E is set to a size that allows the hand to enter the determination circle E. The size (diameter) of the determination circle E is set with reference to the distance information generated by the captured image analysis device 2. In the example of FIG. 19, the diameter of the determination circle E is set to 20 cm.
[0082]
Subsequently, in step S72, it is determined whether or not the area Sh of the skin color region R2 in the determination circle E is ½ or more of the area S of the determination circle E. For the skin color region R2, the skin color area information generated by the captured image analysis device 2 is referred to. If it is determined that the area Sh of the skin color region R2 is ½ or more of the area S of the determination circle E (see FIG. 19B), the process proceeds to step S73. If it is determined that the area Sh of the skin color region R2 is not ½ or more of the area S of the determination circle E, that is, it is smaller than ½ (see FIG. 19C), the process proceeds to step S74.
[0083]
In step S73, a determination result indicating that the posture or gesture corresponding to the input information is “gesture J3: COME HERE” (see FIG. 10C) is output, and the process ends. In step S74, a determination result indicating that the posture or gesture corresponding to the input information is “posture P4: HANDSHAKE” (see FIG. 9D) is output, and the process ends.
[0084]
As described above, the posture / gesture recognition unit 42B sets the determination circle E having a size that allows the hand of the target person to enter, and compares the area Sh of the skin color region R2 in the determination circle E with the area S of the determination circle E. If the area of the hand is larger than 1/2 of the area of the determination circle, the input is determined to be “COME HERE”, and if the area of the hand is equal to or less than 1/2 of the area of the determination circle, it is determined to be “HANDSHAKE”. In this way, “gesture J3: COME HERE” and “posture P4: HANDSHAKE” can be distinguished.
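The area comparison of step S72 can be sketched as follows. The sketch assumes a binary skin color mask (rows of booleans) and a circle given in pixel units; converting the 20 cm diameter into pixels from the distance information is not shown. It is also an assumption that the area S is counted as the number of pixels inside the circle rather than computed analytically. The comparison uses the “½ or more” wording of step S72.

```python
from typing import Sequence, Tuple


def classify_by_determination_circle(skin_mask: Sequence[Sequence[bool]],
                                     center: Tuple[int, int],
                                     radius_px: int) -> str:
    """Distinguish COME HERE from HANDSHAKE by the skin area inside circle E.

    skin_mask[y][x] is True where the pixel belongs to the skin color region.
    center is (x, y) of the hand position on the image, in pixels.
    """
    cx, cy = center
    circle_area = 0  # area S, counted in pixels (assumption)
    skin_area = 0    # area Sh of the skin color region inside the circle
    for y, row in enumerate(skin_mask):
        for x, is_skin in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius_px ** 2:
                circle_area += 1
                if is_skin:
                    skin_area += 1
    if circle_area == 0:
        raise ValueError("determination circle lies outside the image")
    if 2 * skin_area >= circle_area:   # S72: Sh is 1/2 or more of S
        return "J3: COME HERE"         # S73
    return "P4: HANDSHAKE"             # S74
```

Comparing `2 * skin_area` with `circle_area` keeps the test in integer arithmetic and avoids rounding at the ½ boundary.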
[0085]
(Operation of gesture recognition system A)
Next, the operation of the gesture recognition system A will be described with reference to the block diagram of FIG. 1 showing the overall configuration of the gesture recognition system A and the flowcharts shown in FIGS. 20 and 21. FIG. 20 is a flowchart for explaining the “captured image analysis step” and the “contour extraction step” in the operation of the gesture recognition system A, and FIG. 21 is a flowchart for explaining the “face / hand position detection step” and the “posture / gesture recognition step” in the operation of the gesture recognition system A.
[0086]
<Captured image analysis step>
Referring to the flowchart shown in FIG. 20, in the captured image analysis device 2, when captured images are input from the cameras 1a and 1b (step S101), the distance information generation unit 21 generates a distance image D1 (see FIG. 3A) as distance information from the captured images (step S102), and the motion information generation unit 22 generates a difference image D2 (see FIG. 3B) as motion information from the captured images (step S103). Further, the edge information generation unit 23 generates an edge image D3 (see FIG. 3C) as edge information from the captured images (step S104), and the skin color area information generation unit 24 extracts skin color regions R1 and R2 (see FIG. 3D) as skin color area information from the captured images (step S105).
[0087]
<Contour extraction step>
Still referring to the flowchart shown in FIG. 20, in the contour extraction device 3, first, the target distance setting unit 31 sets a target distance, which is the distance at which the target person exists, based on the distance image D1 and the difference image D2 generated in steps S102 and S103 (step S106). Subsequently, the target distance image generation unit 32 generates a target distance image D4 (see FIG. 4B) by extracting, from the edge image D3 generated in step S104, the pixels existing at the target distance set in step S106 (step S107).
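Step S107 can be illustrated with the sketch below, which keeps only those edge pixels whose measured distance is close to the target distance. The images are assumed to be equally sized lists of rows, and the band width `tolerance` is a hypothetical parameter; the text only says that pixels existing at the target distance are extracted.

```python
from typing import List, Sequence


def make_target_distance_image(edge_image: Sequence[Sequence[int]],
                               distance_image: Sequence[Sequence[float]],
                               target_distance: float,
                               tolerance: float = 0.5) -> List[List[int]]:
    """Return D4: edge pixels of D3 that lie at the target distance.

    A pixel is kept when its distance in D1 is within `tolerance`
    (an assumed band) of the target distance; all others become 0.
    """
    return [
        [edge if abs(dist - target_distance) <= tolerance else 0
         for edge, dist in zip(edge_row, dist_row)]
        for edge_row, dist_row in zip(edge_image, distance_image)
    ]


# Usage with a tiny 2 x 3 example (made-up values).
d3 = [[255, 255, 0],
      [255, 0, 255]]
d1 = [[2.0, 4.0, 2.0],
      [2.2, 2.0, 5.0]]
d4 = make_target_distance_image(d3, d1, target_distance=2.0)
```

Restricting the edge image to one distance band removes background edges before the target area T and the contour O are determined.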
[0088]
Next, the target area setting unit 33 sets the target area T (see FIG. 5B) in the target distance image D4 generated in step S107 (step S108). Then, the contour extracting unit 34 extracts the contour O (see FIG. 5C) of the target person C from the target region T set in step S108 (step S109).
[0089]
<Face / hand position detection step>
Referring to the flowchart shown in FIG. 21, in the face / hand position detection means 41 of the gesture recognition device 4, first, the head position detection unit 41A detects the top position m1 of the head of the target person C (see FIG. 7A) based on the contour information generated in step S109 (step S110).
[0090]
Subsequently, the face position detection unit 41B detects the “face position m2 on the image” (see FIG. 7B) based on the top position m1 detected in step S110 and the skin color area information generated in step S105, and obtains the “face position m2t (Xft, Yft, Zft) in real space” from the detected “face position m2 (Xf, Yf) on the image” by referring to the distance information generated in step S102 (step S111).
[0091]
Next, the hand position detection unit 41C detects the “hand position m3 on the image” (see FIG. 8A) based on the “face position m2 on the image” detected in step S111 (step S112).
[0092]
Then, the hand position detection unit 41D detects the “hand position m4 on the image” (see FIG. 8B) based on the “face position m2 on the image” detected by the face position detection unit 41B and the hand position m3 detected by the hand position detection unit 41C, and obtains the “hand position m4t (Xht, Yht, Zht) in real space” from the detected “hand position m4 (Xh, Yh) on the image” by referring to the distance information generated in step S102 (step S113).
[0093]
<Posture and gesture recognition step>
Still referring to the flowchart shown in FIG. 21, in the posture / gesture recognition means 42 of the gesture recognition device 4, the posture / gesture recognition unit 42B detects the “relative positional relationship between the face position m2t and the hand position m4t” and the “variation of the hand position m4t relative to the face position m2t” from the “face position m2t (Xft, Yft, Zft) in real space” and the “hand position m4t (Xht, Yht, Zht) in real space” obtained in steps S111 and S113, and recognizes the posture or gesture of the target person by comparing the detection result with the posture data P1 to P6 (see FIG. 9) and the gesture data J1 to J4 (see FIG. 10) stored in the posture / gesture data storage unit 42A (step S114). Since the details of the posture or gesture recognition method in the posture / gesture recognition unit 42B have been described above, they are omitted here.
[0094]
Although the gesture recognition system A has been described above, each means of the gesture recognition device 4 included in the gesture recognition system A can also be implemented as a function program on a computer, and these function programs can be combined to operate as a gesture recognition program.
[0095]
The gesture recognition system A can be applied to an autonomous robot, for example. In this case, when a person holds out a hand, for example, the autonomous robot can recognize the posture as “posture P4: HANDSHAKE” (see FIG. 9D), and when a person waves a hand, it can recognize the gesture as “gesture J1: HAND SWING” (see FIG. 10A).
[0096]
It should be noted that, compared with instructions by voice, instructions by postures and gestures have the advantages that they are not affected by ambient noise, that they can be given even in situations where a voice would not reach, and that instructions which are difficult to express in words (or which would become long-winded) can be given simply.
[0097]
[Effect of the invention]
As described above in detail, according to the present invention, when recognizing the posture or gesture of the target person, it is not necessary to detect feature points (points indicating the characteristics of the movement of the target person) one by one, so the amount of calculation required for the posture recognition processing or the gesture recognition processing can be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a gesture recognition system A.
FIG. 2 is a block diagram showing a configuration of a captured image analysis device 2 and a contour extraction device 3 included in the gesture recognition system A shown in FIG. 1.
FIG. 3A is a diagram illustrating a distance image D1, FIG. 3B is a diagram illustrating a difference image D2, FIG. 3C is a diagram illustrating an edge image D3, and FIG. 3D is a diagram illustrating skin color regions R1 and R2.
FIG. 4 is a diagram for explaining a method of setting a target distance.
FIG. 5 is a diagram for explaining a method for setting a target region T and a method for extracting a contour O of a target person C from within the target region T.
FIG. 6 is a block diagram showing a configuration of a gesture recognition device 4 included in the gesture recognition system A shown in FIG. 1.
FIG. 7A is a diagram for explaining a method for detecting the top position m1, and FIG. 7B is a diagram for explaining a method for detecting the face position m2.
FIG. 8A is a diagram for explaining a method for detecting the hand position m3, and FIG. 8B is a diagram for explaining a method for detecting the hand position m4.
FIG. 9 is a diagram showing posture data P1 to P6.
FIG. 10 is a diagram showing gesture data J1 to J4.
FIG. 11 is a flowchart for explaining an outline of processing in a posture / gesture recognition unit 42B.
FIG. 12 is a flowchart for explaining “step S1: posture recognition processing” in the flowchart shown in FIG. 11.
FIG. 13 is a first flowchart for explaining “step S4: posture / gesture recognition processing” in the flowchart shown in FIG. 11.
FIG. 14 is a second flowchart for explaining “step S4: posture / gesture recognition processing” in the flowchart shown in FIG. 11.
FIG. 15 is a flowchart for explaining a first modification of the processing in the posture / gesture recognition unit 42B.
FIG. 16 is a diagram showing posture data P11 to P16.
FIG. 17 is a diagram showing gesture data J11 to J14.
FIG. 18 is a flowchart for explaining a second modification of the processing in the posture / gesture recognition unit 42B.
FIG. 19A is a diagram for explaining a method for setting a determination circle E, FIG. 19B is a diagram showing a case where the area Sh of the skin color region R2 is larger than ½ of the area S of the determination circle E, and FIG. 19C is a diagram showing a case where the area Sh of the skin color region R2 is ½ or less of the area S of the determination circle E.
FIG. 20 is a flowchart shown for explaining the “captured image analysis step” and the “contour extraction step” in the operation of the gesture recognition system A.
FIG. 21 is a flowchart for explaining a “face / hand position detection step” and a “posture / gesture recognition step” in the operation of the gesture recognition system A.
[Explanation of symbols]
A Gesture recognition system
1 Camera
2 Captured image analysis device
3 Contour extraction device
4 Gesture recognition device
41 Face / hand position detection means
41A Head position detection unit
41B Face position detection unit
41C Hand position detection unit
41D Hand position detection unit
42 Posture / gesture recognition means
42A Posture / gesture data storage unit
42B Posture / gesture recognition unit

Claims

A gesture recognition device for recognizing a posture or a gesture of a target person from an image obtained by capturing the target person with a camera, comprising:
a face / hand position detecting means for detecting a face position and a hand position in the real space of the target person based on contour information and skin color area information of the target person generated from the captured image; and
a posture / gesture recognition means for detecting, from the face position and the hand position, a relative positional relationship between the face position and the hand position and a variation of the hand position relative to the face position, and for recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to the relative positional relationship between the face position and the hand position and to the variation of the hand position relative to the face position,
wherein the posture / gesture recognition means sets, for the relative positional relationship, a determination area of a size that the hand of the target person can enter, and compares the area of the hand with the area of the determination area, thereby distinguishing postures or gestures for which the relative positional relationship between the face position and the hand position is similar.

The gesture recognition device according to claim 1, wherein the relative positional relationship between the face position and the hand position is detected by comparing their heights and their distances from the camera.

The gesture recognition device according to claim 1, wherein the posture / gesture recognition means recognizes the posture or gesture of the target person using a pattern matching method.

A gesture recognition method for recognizing a posture or gesture of a target person from an image of the target person captured by a camera, comprising:
a face / hand position detecting step in which a face / hand position detecting means detects a face position and a hand position in the real space of the target person based on contour information and skin color area information of the target person generated from the image; and
a posture / gesture recognition step in which a posture / gesture recognition means detects, from the face position and the hand position, a relative positional relationship between the face position and the hand position and a variation of the hand position relative to the face position, and recognizes the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to the relative positional relationship between the face position and the hand position and to the variation of the hand position relative to the face position,
wherein, in the posture / gesture recognition step, the posture / gesture recognition means sets, for the relative positional relationship, a determination area of a size that the hand of the target person can enter, and compares the area of the hand with the area of the determination area, thereby distinguishing postures or gestures for which the relative positional relationship between the face position and the hand position is similar.

A gesture recognition program for causing a computer, in order to recognize a posture or gesture of a target person from an image obtained by capturing the target person with a camera, to function as:
a face / hand position detecting means for detecting a face position and a hand position in the real space of the target person based on contour information and skin color area information of the target person generated from the image; and
a posture / gesture recognition means for detecting, from the face position and the hand position, a relative positional relationship between the face position and the hand position and a variation of the hand position relative to the face position, and for recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing the posture or gesture corresponding to the relative positional relationship between the face position and the hand position and to the variation of the hand position relative to the face position,
wherein the posture / gesture recognition means sets, for the relative positional relationship, a determination area of a size that the hand of the target person can enter, and compares the area of the hand with the area of the determination area, thereby distinguishing postures or gestures for which the relative positional relationship between the face position and the hand position is similar.