JP3657463B2 - Motion recognition system and recording medium on which motion recognition program is recorded

Info

Publication number: JP3657463B2
Application number: JP18424299A
Authority: JP
Inventors: 通広大野; 宏之赤木
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1999-06-29
Filing date: 1999-06-29
Publication date: 2005-06-08
Anticipated expiration: 2019-06-29
Also published as: JP2001016606A

Description

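[0001]
[Technical Field of the Invention]
The present invention relates to a motion recognition system that recognizes the shape and motion of a hand by processing time-series image data containing an image of, for example, a hand.
[0002]
[Prior Art]
Conventionally, keyboards as key input devices and mice as pointing devices have been in general use as user interfaces for information processing equipment such as personal computers. However, operating a keyboard or mouse requires a certain level of skill, so these devices are difficult for beginners to operate.
[0003]
Moreover, when using a keyboard or mouse, the user must remember how each operation relates to the system's response. With a keyboard, for example, the user must learn the functions of keys such as Ctrl and Alt; with a mouse, the user must learn the difference between a single click and a double click and the different functions of the left and right buttons. Learning such a wide variety of operations and functions one by one is a heavy burden for beginners.
[0004]
In recent years, therefore, there have been many attempts to use the human body, that is, gestures and hand movements, as a simple and intuitive user interface. Using gestures and hand movements as a user interface requires inputting information on the posture, shape, and motion of the body with an input device such as a camera, recognizing the content of that body information by analyzing the input, and assigning a meaning, such as a specific command, to the recognized motion.
[0005]
Examples of methods proposed for user interfaces that use the human body are given below. IEICE Transactions D-II, Vol. J80-D-II, No. 6, pp. 1571-1580 (1997), "Real-time gesture recognition from moving images for building interactive systems: application to a virtual conducting system" (Document 1) discloses a method that extracts the arm from images captured by a CCD camera and analyzes the trajectory of its movement to recognize gestures in real time. Japanese Unexamined Patent Publication No. 2-144675 (Document 2) discloses a method in which the user wears a glove with each finger joint painted a different color, and finger movement is recognized from the captured images using the glove colors as cues.
[0006]
The Journal of the Institute of Television Engineers of Japan, Vol. 48, No. 8, pp. 960-965 (1994), "Fundamental technologies for realizing virtual environments" (Document 3) discloses a method of inputting finger movements to a computer with a glove-type sensor device called a data glove. Eizo Joho (I), 1992/9, pp. 55-60, "Person extraction using infrared and visible images" (Document 4) discloses a method that takes an infrared image and a visible image as input, extracts person candidate regions from the infrared image, and then extracts skin color regions within those candidate regions in the visible image to locate the face and hands.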
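[0007]
[Problems to Be Solved by the Invention]
When visible images are used as input, as in the method of Document 1, the hardest problem is extracting the recognition target region, such as a hand or fingers, from the input image. Hands and fingers are generally extracted by using the luminance and color information of the image to extract skin color regions. However, luminance and color information make the body hard to distinguish from the background when the background contains regions close to skin color, and their values fluctuate readily with environmental conditions such as lighting. This approach therefore lacks stable recognition.
[0008]
To address this problem, Document 1, for example, places a blackout curtain behind the subject, and comparatively many proposals rely on such a special environment. Many other proposals, such as the method of Document 2, raise detection accuracy by attaching a marker to the body part to be recognized.
[0009]
When a device dedicated to motion input, such as a data glove, is used as in Document 3, the stability of hand region extraction and motion acquisition is no longer a concern. However, putting on the input device before each operation is a nuisance, and the device is expensive for a user interface, so it is difficult in many respects to use as a substitute for a keyboard or mouse.
[0010]
When infrared images are used, as in Document 4, the human body region is easy to extract because in ordinary environments the temperature difference between the body and the background is large. However, infrared imaging devices are generally expensive and often large, and they are hard to put to uses other than special ones such as intruder surveillance, so they are unlikely to spread to ordinary households.
[0011]
The present invention was made to solve the above problems. Its object is to provide a motion recognition system that recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, and that detects the shape and motion of the target with high accuracy at low cost.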
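[0012]
[Means for Solving the Problems]
To solve the above problems, the motion recognition system according to the present invention recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, and is characterized by comprising: motion detection means for extracting regions with motion from the time-series image data; color detection means for extracting, from the time-series image data, regions containing a color that characterizes the target; and region integration means for extracting as the target region, based on the detection results of the motion detection means and the color detection means, a region that both has motion and contains the color that characterizes the target.
[0013]
With this configuration, the region integration means extracts the target region from the time-series image data based on the regions with motion extracted by the motion detection means and the regions containing the color that characterizes the recognition target extracted by the color detection means. Compared with a configuration that extracts the target region from luminance and color information alone, as in the prior art described above, the target region can be extracted more accurately and more reliably. For example, even if the background contains a region whose color resembles the color that characterizes the target, the background is essentially stationary, so the motion detection means does not extract it as a candidate for the target region. The target can therefore be extracted appropriately without a special environment such as a blackout curtain behind it.
[0014]
In addition, because no contact-type input device such as a data glove is needed, troublesome steps such as fitting a special device to the hand are eliminated. Contact-type input devices such as data gloves are also generally expensive, so doing without one reduces the cost of the system.
[0015]
Furthermore, because the above configuration only needs to detect regions with motion and regions containing the color that characterizes the target, ordinary visible image data suffices. This removes the need for an expensive, large image input device capable of capturing special image data such as infrared images.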
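[0016]
In the motion recognition system according to the present invention, in the above configuration, the motion detection means may create, from two sets of image data at different times in the time-series image data, a difference image whose pixel values are the differences in luminance at each pixel, and detect regions with motion based on this difference image.
[0017]
With this configuration, the motion detection means detects regions with motion based on a difference image whose pixel values are the per-pixel luminance differences between two sets of image data at different times, so regions with motion can be detected accurately and with little computation.
[0018]
In the motion recognition system according to the present invention, in the above configuration, the motion detection means may divide the difference image into blocks of a predetermined size, create a block image whose block values are the average or the sum of the luminance values of the pixels in each block, and extract as a region with motion any region that is formed by connecting blocks whose block values exceed a predetermined threshold and whose area is within a predetermined range.
[0019]
With this configuration, the motion detection means creates a block image from the difference image and extracts as a region with motion any region, formed by connecting blocks whose block values exceed a predetermined threshold, whose area is within a predetermined range. Of the moving regions, only those that occupy a reasonably large area are extracted. Thus, even if a small object other than the target is moving in the background, it can be excluded from the candidates for the target region, which raises the accuracy of target region detection.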
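[0020]
In the motion recognition system according to the present invention, in the above configuration, the color detection means may extract, as a region containing the color that characterizes the target, a pixel region of the image data in which the pixel value of each color component satisfies a predetermined condition.
[0021]
With this configuration, the color detection means extracts a pixel region in which the pixel value of each color component satisfies a predetermined condition as a region containing the color that characterizes the target, so such regions can be detected accurately. In addition, by changing the condition on each color component as appropriate, the system can respond appropriately to changes in background and lighting.
[0022]
In the motion recognition system according to the present invention, in the above configuration, the color detection means may extract a region as a region containing the color that characterizes the target when the region is formed by connecting pixel regions of the image data in which the pixel value of each color component satisfies a predetermined condition and the shape and area of the resulting region satisfy predetermined conditions.
[0023]
With this configuration, the color detection means extracts a region as a region containing the color that characterizes the target when the pixel value of each color component satisfies a predetermined condition and the shape and area of the region formed by connecting those pixel regions satisfy predetermined conditions. Detection therefore takes into account not only the color condition but also the shape and area of the region. Thus, even if the background contains a region whose color is similar to the color that characterizes the target, the shape and area conditions allow such a region to be excluded from the candidates, which raises the accuracy of detecting regions containing the color that characterizes the target.
[0024]
In the motion recognition system according to the present invention, in the above configuration, the condition on the pixel value of each color component, used when extracting regions containing the color that characterizes the target, may be determined based on the results of extracting such regions up to the current time.
[0025]
With this configuration, the condition on the pixel value of each color component is determined from the extraction results up to the current time. Even if the environment, such as the background or lighting, changes during motion recognition, the condition on each color component can be changed accordingly. In other words, the accuracy of extracting regions containing the color that characterizes the target is maintained even when the environment changes.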
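[0026]
In the motion recognition system according to the present invention, in the above configuration, the region integration means may additionally extract as the target region a region that the region integration means extracted as the target region at a predetermined past time and that, at the current time, contains the color that characterizes the target.
[0027]
With this configuration, in addition to regions that have motion at the current time and contain the color that characterizes the target, the region integration means also extracts regions that it extracted as the target region at a predetermined past time and that contain that color at the current time. The target can therefore be extracted as the target region even while it is almost motionless.
[0028]
The motion recognition system according to the present invention, in the above configuration, may further comprise shape analysis means for analyzing the shape of the target region extracted by the region integration means.
[0029]
With this configuration, the shape analysis means analyzes the shape of the target region extracted by the region integration means, so the state of the shape can be recognized as a kind of code representing the shape. In other words, the widely varying shapes of the target region can be classified into a number of categories.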
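[0030]
In the motion recognition system according to the present invention, in the above configuration, the shape analysis means may approximate the contour of the target region with a plurality of straight lines whose lengths fall within a predetermined range, and recognize the shape of the target region from the slopes, lengths, and positional relationships of these lines.
[0031]
With this configuration, the shape analysis means recognizes the shape of the target region from the slopes, lengths, and positional relationships of the straight lines that approximate its contour, so only the minimum necessary shape analysis is performed.
[0032]
The motion recognition system according to the present invention, in the above configuration, may further comprise motion recognition means for recognizing the direction of movement of the target region by tracking over time the shape of the target region analyzed by the shape analysis means.
[0033]
With this configuration, the motion recognition means recognizes the direction of movement of the target region by tracking over time the shape analyzed by the shape analysis means, so the state of movement can be recognized as a kind of code representing the movement. In other words, the widely varying movements of the target region can be classified into a number of categories.
[0034]
In the motion recognition system according to the present invention, in the above configuration, the extraction of the target region by the region integration means and the analysis of the shape by the shape analysis means may be performed on image data from different times.
[0035]
With this configuration, the extraction of the target region and the analysis of the shape are performed on image data from different times, so the amount of processing per unit time is reduced. Even a system with somewhat lower computing performance can therefore process smoothly without stalling.
[0036]
In the motion recognition system according to the present invention, in the above configuration, the target may be a human hand.
[0037]
With this configuration, the region of a human hand is extracted, its shape analyzed, and its motion recognized. By assigning meanings to, for example, the number of extended fingers, their orientation, and the direction of movement, and recognizing these, the system can function as an interface that, for example, sends control commands to an externally connected system such as an information processing device. This provides a user interface operated intuitively, without the user having to learn complicated operations.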
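[0038]
A recording medium according to the present invention stores a motion recognition program that recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, and is characterized in that the program causes a computer to execute: a process of extracting regions with motion from the time-series image data; a process of extracting, from the time-series image data, regions containing a color that characterizes the target; and a process of extracting as the target region, based on the detection results of the motion detection means and the color detection means, a region that both has motion and contains the color that characterizes the target.
[0039]
With this configuration, the target region is extracted from the time-series image data based on the regions with motion and the regions containing the color that characterizes the recognition target. Compared with a configuration that extracts the target region from luminance and color information alone, as in the prior art described above, the target region can be extracted more accurately and more reliably. For example, even if the background contains a region whose color resembles the color that characterizes the target, the background is essentially stationary, so it is not extracted as a region with motion. The target can therefore be extracted appropriately without a special environment such as a blackout curtain behind it.
[0040]
In addition, because no contact-type input device such as a data glove is needed, troublesome steps such as fitting a special device to the hand are eliminated. Contact-type input devices such as data gloves are also generally expensive, so doing without one reduces the cost of the system.
[0041]
Furthermore, because the above configuration only needs to detect regions with motion and regions containing the color that characterizes the target, ordinary visible image data suffices. This removes the need for an expensive, large image input device capable of capturing special image data such as infrared images.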
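[0042]
[Embodiments of the Invention]
One embodiment of the present invention is described below with reference to Figs. 1 to 7.
[0043]
Fig. 1 is a block diagram showing the schematic configuration of the motion recognition system according to this embodiment. The system comprises a frame memory 1, a motion detection unit (motion detection means) 2, a skin color detection unit (color detection means) 3, a region integration unit (region integration means) 4, a shape analysis unit (shape analysis means) 5, and a motion recognition unit (motion recognition means) 6. As shown in Fig. 1, the system is connected to a video input device 7 and an information processing device 8.
[0044]
The video input device 7 is, for example, a CCD (charge-coupled device) camera. CCD cameras are becoming widespread because their prices have fallen sharply in recent years and because markets such as videoconferencing systems and video calls over the Internet are expanding. A camcorder, which is already in wide general use, can also serve as the video input device 7. Because the video input device 7 only needs to be able to capture visible images, it can be a relatively inexpensive device.
[0045]
The information processing device 8 is, for example, a personal computer, and performs various kinds of information processing. It may also be, for example, a computer that controls the operation of yet another device.
[0046]
Each processing unit of the motion recognition system is described in detail below.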
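[0047]
The frame memory 1 temporarily stores the frame image data transferred in sequence from the video input device 7. It has enough capacity to hold at least two frames, and when a new frame image arrives it discards the frame image with the earliest input time, so that the stored frames are updated in sequence. Frame images may be reduced in size before storage to lighten subsequent processing.
[0048]
The motion detection unit 2 detects moving regions in the image by comparing two frame images stored in the frame memory 1.
[0049]
If the transfer rate of the video input device 7 is high, for example about 30 frames per second, and the moving region moves slowly, the difference between images at two consecutive times becomes so small that the moving region can no longer be detected. In that case, the frame memory 1 need not store every frame image output by the video input device 7; it can store every second or every third frame instead. Alternatively, the capacity of the frame memory 1 can be made large enough that a difference between images arises even when the moving region moves slowly. In this case the motion detection unit 2 compares the frame image at the current time with the earliest frame image held in the frame memory 1.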
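The frame-skipping and oldest-versus-newest comparison described for the frame memory can be sketched as a small ring buffer. This is a minimal illustration under stated assumptions, not the implementation of the patent: the class name `FrameMemory` and the parameters `capacity` and `stride` are hypothetical, and frames are treated as opaque objects.

```python
from collections import deque


class FrameMemory:
    """Ring buffer of recent frames that keeps every `stride`-th input frame."""

    def __init__(self, capacity=2, stride=1):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        # A bounded deque drops the oldest frame automatically when full.
        self.frames = deque(maxlen=capacity)
        self.stride = stride
        self._count = 0

    def push(self, frame):
        """Store the frame if it falls on the stride; return True if stored."""
        keep = self._count % self.stride == 0
        self._count += 1
        if keep:
            self.frames.append(frame)
        return keep

    def pair(self):
        """Return (oldest, newest) stored frames, or None until two exist."""
        if len(self.frames) < 2:
            return None
        return self.frames[0], self.frames[-1]
```

With `stride=2` the buffer keeps every second frame; raising `capacity` instead widens the time gap between the two frames that `pair()` returns.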
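[0050]
The operation of the motion detection unit 2 is now described in detail with reference to Figs. 3(a) to 3(d). The motion detection unit 2 first creates a difference image in which the value of each pixel is the absolute value of the difference between the luminance values of that pixel in the two frame images stored in the frame memory 1. Fig. 3(a) shows an example of such a difference image. If the frame images are in RGB format, the value of just one of the RGB components (generally G) may be used as the luminance value to reduce computation.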
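A minimal sketch of the difference image, assuming each frame is a list of rows of `(R, G, B)` tuples and that the G component stands in for luminance as the text suggests. The function name `difference_image` and the `channel` parameter are illustrative only.

```python
def difference_image(prev, curr, channel=1):
    """Absolute per-pixel difference of one channel (G by default) of two RGB frames."""
    return [
        [abs(c[channel] - p[channel]) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]
```

Taking the absolute value makes the result independent of which frame is the earlier one, so brightening and darkening pixels both count as motion.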
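[0051]
Next, as shown in Fig. 3(b), the motion detection unit 2 divides the difference image into blocks of a predetermined size. For example, if the difference image is 320×240 pixels and each block is 16×16 pixels, the difference image becomes an image of 20×15 blocks. The average of the pixel values in each block is taken as that block's block value, and a block image is created from these block values, as shown in Fig. 3(c). A block whose block value is at or below a predetermined threshold has its block value set to 0, which eliminates small moving regions that are not the motion detection target. In Fig. 3(c), for convenience, the magnitude of each block value is represented by the size of the area displayed within the block.
[0052]
Although the average of the pixel values in each block is used as the block value above, this is not a limitation; for example, the sum of the pixel values in each block may be used instead.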
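The block reduction can be sketched as follows, assuming the difference image is a list of rows of numbers whose dimensions are multiples of the block size. The function name `block_image` and its parameters are hypothetical, and the threshold value in any call is an illustrative choice, not one given in the text.

```python
def block_image(diff, block=16, threshold=0.0, use_sum=False):
    """Reduce a difference image (list of rows) to a grid of block values."""
    rows, cols = len(diff) // block, len(diff[0]) // block
    grid = []
    for by in range(rows):
        grid_row = []
        for bx in range(cols):
            total = sum(
                diff[y][x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            )
            value = total if use_sum else total / (block * block)
            # Values at or below the threshold are zeroed to drop tiny motions.
            grid_row.append(value if value > threshold else 0)
        grid.append(grid_row)
    return grid
```

For a 320×240 difference image and 16×16 blocks this yields 15 rows of 20 block values, matching the 20×15 block image in the example above.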
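[0053]
Next, as shown in Fig. 3(d), the motion detection unit 2 binarizes the block image of Fig. 3(c) to detect only the blocks with large motion. The threshold used for this binarization is determined automatically by analyzing the block values of the block image with a method such as discriminant analysis. In the binarized image, where blocks detected as having large motion are adjacent, the area of the region formed by connecting them is computed. If the area of the connected region exceeds a predetermined threshold, the region is extracted as a region with motion. The area threshold is set, for example, to one thirtieth of the area of the whole block image.
[0054]
As described above, the motion detection unit 2 creates a block image from the difference image of two frame images stored in the frame memory 1 and extracts the regions with motion in the image by analyzing this block image.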
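The binarization and region-growing step might look like the sketch below. It assumes the discriminant analysis method means the threshold that maximizes between-class variance (commonly known as Otsu's method) and that "adjacent" means 4-connectivity; both are assumptions, as are the names `otsu_threshold`, `connected_regions`, and `moving_regions`. Regions are returned as sets of `(row, col)` block coordinates.

```python
def otsu_threshold(values):
    """Threshold t that maximizes between-class variance; the upper class is v > t."""
    levels = sorted(set(values))
    best_t, best_score = levels[0], -1.0
    n = len(values)
    for t in levels[:-1]:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def connected_regions(mask):
    """Group 4-connected True cells; return a list of sets of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen, regions = set(), []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or (y, x) in seen:
                continue
            stack, region = [(y, x)], set()
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                region.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def moving_regions(blocks, min_fraction=1 / 30):
    """Binarize the block image and keep connected regions above the area threshold."""
    values = [v for row in blocks for v in row]
    t = otsu_threshold(values)
    mask = [[v > t for v in row] for row in blocks]
    min_area = len(values) * min_fraction
    return [r for r in connected_regions(mask) if len(r) > min_area]
```

On a 20×15 block image the default `min_fraction` corresponds to an area threshold of 10 blocks, so an isolated moving block is discarded while a hand-sized patch survives.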
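[0055]
Next, the skin color detection unit 3 is described in detail. The skin color detection unit 3 receives the image data stored in the frame memory 1 as the frame image at the current time. It normalizes the RGB values of each pixel of the input image data by the conversion shown below and creates a chromaticity image. The purpose of the normalization is to remove uneven illumination and extract only the chromaticity components.
[0056]
[Equation 1]

[0057]
Next, pixels that satisfy the skin color conditions are detected among the pixels of the normalized chromaticity image. For (r, g, b) of Equation (1), the skin color conditions are expressed by Equations (2) to (6) below.
r_min ≤ r ≤ r_max (2)
g_min ≤ g ≤ g_max (3)
b_min ≤ b ≤ b_max (4)
r > g (5)
r > b (6)
[0058]
Here, r_min, g_min, and b_min are the minimum values for r, g, and b respectively, and r_max, g_max, and b_max are the maximum values. How these minimum and maximum values are determined is described later.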
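The skin color test of Equations (2) to (6) can be sketched as below. Equation (1) is not reproduced in this text, so the sketch assumes the common chromaticity normalization r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B); that formula, the function names, and the numeric limits used in the usage note are all assumptions for illustration.

```python
def chromaticity(rgb):
    """Assumed normalization: divide each component by R + G + B."""
    R, G, B = rgb
    total = R + G + B
    if total == 0:
        return (0.0, 0.0, 0.0)  # black pixel: no chromaticity information
    return (R / total, G / total, B / total)


def is_skin(rgb, limits):
    """Apply conditions (2) to (6); limits = ((r_min, r_max), (g_min, g_max), (b_min, b_max))."""
    r, g, b = chromaticity(rgb)
    (r_min, r_max), (g_min, g_max), (b_min, b_max) = limits
    return (
        r_min <= r <= r_max      # (2)
        and g_min <= g <= g_max  # (3)
        and b_min <= b <= b_max  # (4)
        and r > g                # (5)
        and r > b                # (6)
    )
```

With hypothetical limits `((0.35, 0.60), (0.20, 0.40), (0.10, 0.35))`, the pixels `(200, 120, 90)` and `(100, 60, 45)` give the same answer because they differ only in brightness, which is the point of normalizing before thresholding.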
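[0059]
Next, like the motion detection unit 2, the skin color detection unit 3 divides the chromaticity image into a plurality of blocks and creates a block image in which a block's value is 1 when the number of pixels in the block that satisfy the skin color conditions exceeds a predetermined threshold, and 0 when it is at or below the threshold. The blocks of this block image are the same size as the blocks of the block image created by the motion detection unit 2.
[0060]
Then, as in the motion detection unit 2, adjacent blocks whose block value is 1 are connected and the area of each connected region is computed. If the area of a connected region exceeds a predetermined threshold, the region is extracted as a skin color region candidate.
[0061]
The skin color detection unit 3 further analyzes the shape of the regions extracted as skin color region candidates. In general, regions showing a face or a hand are close to circular or elliptical in the block image. The skin color regions can therefore be narrowed down using the circularity of each region as a guide. With L as the perimeter of a region and A as its area, the circularity C is given by Equation (7).
C = L²/A (7)
[0062]
The smaller the value of C, the closer the region is judged to be to a circle. Regions whose circularity C is below a predetermined threshold are therefore extracted as skin color regions.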
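Equation (7) can be evaluated on a block region as in the sketch below. For an ideal circle, L = 2πr and A = πr², so C = 4π ≈ 12.57, which is the smallest value any shape can reach. How the perimeter of a block region is measured is not specified in the text; counting exposed block edges is an assumption made here, as is the function name `circularity`.

```python
def circularity(region):
    """C = L**2 / A for a set of (row, col) blocks; L counts exposed block edges."""
    area = len(region)
    perimeter = sum(
        (ny, nx) not in region
        for (y, x) in region
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
    )
    return perimeter ** 2 / area
```

Under this edge-counting convention a compact 3×3 square scores 16, while a 1×9 strip of the same area scores about 44, so a threshold between the two would keep the compact region and reject the elongated one.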
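[0063]
The size and shape of a skin color region change very little between consecutive frames. The area and shape thresholds can therefore be given suitable initial values and thereafter determined from the detection results at the previous time. That is, the area threshold is set slightly smaller than the area of the region detected nearby at the previous time, and the shape threshold slightly larger than the circularity at the previous time.
[0064]
Next, the region integration unit 4 is described in detail. The region integration unit 4 receives the block image created by the motion detection unit 2 and the block image created by the skin color detection unit 3. When the area of the overlap between a moving region detected by the motion detection unit 2 and a skin color region detected by the skin color detection unit 3 exceeds a predetermined threshold, the region integration unit 4 extracts that region as a body region candidate. The threshold may be set, for example, to one third of the area of the skin color region.
[0065]
To handle times when the body part is almost motionless, the region that the region integration unit 4 extracted as a body region candidate one time step earlier is stored, and a region is also extracted as a body region candidate when the overlap between that stored region and the skin color region at the current time exceeds a predetermined threshold.
[0066]
Fig. 4 summarizes this processing. In Fig. 4, A is a region detected as a moving region by the motion detection unit 2, B is a region detected as a body region candidate by the region integration unit 4 at the previous time, and C is a region detected as a skin color region by the skin color detection unit 3. The region integration unit 4 detects the region where the image obtained by adding A and B overlaps the image of C as the moving skin color region, that is, the body region candidate, shown as D in the figure. If no region is detected as a moving skin color region, processing returns to the input of the next frame image, and the processing units downstream of the region integration unit 4 are skipped.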
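The integration summarized in Fig. 4 amounts to D = (A ∪ B) ∩ C with an overlap test. The sketch below follows that summary, assuming regions are sets of `(row, col)` block coordinates and that the overlap must exceed one third of each skin color region; the function name `integrate` and the parameter `min_overlap` are hypothetical.

```python
def integrate(motion, previous, skin_regions, min_overlap=1 / 3):
    """Return D: parts of skin regions (C) supported by motion (A) or the previous candidate (B)."""
    support = motion | previous
    candidates = []
    for skin in skin_regions:
        overlap = skin & support
        if len(overlap) > min_overlap * len(skin):
            candidates.append(overlap)
    return candidates
```

A skin-colored patch of wall never overlaps the motion mask, so it is dropped, while a hand that has stopped moving is still found through the candidate stored from the previous time step.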
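[0067]
The skin color condition values r_min, g_min, b_min, r_max, g_max, and b_max used by the skin color detection unit 3 are updated based on the body region candidates detected by the region integration unit 4. This is explained below with reference to Figs. 5(a), 5(b), and 6.
[0068]
Fig. 5(a) shows a region detected as a body region candidate by the region integration unit 4. Projecting this region onto the chromaticity image created by the skin color detection unit 3, as shown in Fig. 5(b), gives the region enclosed by the broken line. For each of the r, g, and b chromaticity values of the pixels inside the broken line, the number of pixels at each pixel value is accumulated at every time step. From the accumulated counts, a histogram is created with chromaticity value on the horizontal axis and pixel count on the vertical axis. Fig. 6 shows the histogram for the r component.
[0069]
In the histogram for each color component, the peak pixel count is detected, and the skin color condition values are updated when the peak exceeds a predetermined threshold. This threshold is set so that its ratio to the peak value of the histogram is a predetermined value; in Fig. 6 it is shown by the broken line. The range of color values whose frequency is at or above the threshold is taken as the skin color range, and the skin color condition values are determined accordingly. In Fig. 6, this determines the values of r_min and r_max.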
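One way to read the histogram update is sketched below for a single color component. It assumes chromaticity values in [0, 1] quantized into equal-width bins, a cutoff at a fixed fraction of the peak (the broken line in Fig. 6), and a separate minimum peak count before any update happens. The bin count, `ratio`, `min_peak`, and both function names are assumptions, not values from the text.

```python
def accumulate(hist, values):
    """Add chromaticity values in [0, 1] to a histogram with len(hist) equal-width bins."""
    bins = len(hist)
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1


def skin_range(hist, ratio=0.5, min_peak=1):
    """Return (low, high) chromaticity bounds, or None if the peak is too small."""
    peak = max(hist)
    if peak < min_peak:
        return None
    cutoff = ratio * peak
    kept = [i for i, count in enumerate(hist) if count >= cutoff]
    bins = len(hist)
    # The range spans from the first to the last bin at or above the cutoff.
    return kept[0] / bins, (kept[-1] + 1) / bins
```

Calling `accumulate` once per frame with the r values of the candidate region, then `skin_range`, would yield updated r_min and r_max; the same pair of calls would be repeated for g and b.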
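[0070]
Determining the skin color detection conditions from past detection results in this way makes it possible to adapt to subtle changes in lighting conditions, changes in the background, and the like.
[0071]
Next, the shape analysis unit 5 is described in detail. The shape analysis unit 5 analyzes the shape of the body region candidate extracted by the region integration unit 4 to recognize what the body is expressing. Here the body part of interest is taken to be a hand, and the number of fingers shown by the hand is determined.
[0072]
Because the hand candidate region extracted by the region integration unit 4 is a region of the block image, the details of its shape are unclear when the region is small. The shape is therefore analyzed as follows.
[0073]
First, the hand candidate region is projected onto the chromaticity image created by the skin color detection unit 3, and the region overlapping the candidate region is extracted. Within that region, the shape region formed by the pixels that satisfy the skin color conditions of Equations (2) to (6) is extracted, and isolated-point removal, hole filling, contour smoothing, and similar processing are applied to it. This may yield several shape regions within the candidate region, in which case the shape region with the largest area is analyzed.
[0074]
The shape can be analyzed, for example, by the following procedure. First, the contour of the extracted shape region is extracted. Next, the contour is approximated by a plurality of straight lines of a certain length. Of these lines, those with nearly the same slope are selected as contour straight lines. Fig. 7 shows a concrete example of this shape analysis.
[0075]
In Fig. 7, the thin line is the contour of the extracted shape region, and the thick lines L1 to L6 are the selected contour straight lines. By choosing any two of these contour straight lines that lie on either side of the shape region and examining the width, length, area, positional relationship, and so on of the region between them, finger regions, the palm region, and the like can be detected.
[0076]
In the example of Fig. 7, the region between contour straight lines L2 and L3 and the region between L4 and L5 have nearly the same length and width, so they can be inferred to be finger regions. Because no other similar regions exist, the number of fingers can also be inferred to be two. The region between L1 and L6 has a much larger area than the finger regions, so it can be inferred to be the palm region. Furthermore, because the finger regions lie above the palm region in the image, the fingers can be inferred to be pointing upward.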
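The reasoning applied to Fig. 7 can be illustrated with a deliberately simplified sketch. It does not perform contour extraction or line fitting; it assumes the strips between paired contour straight lines have already been measured, and it uses a hypothetical rule that any strip at least `palm_factor` times the area of the smallest strip is the palm. The dictionary keys, the factor, and the function name are all assumptions.

```python
def classify_strips(strips, palm_factor=3.0):
    """strips: dicts with 'width', 'length', and 'y' (vertical center, rows grow downward).

    Returns (finger_count, direction), where direction is 'up', 'down', or None.
    """
    areas = [s["width"] * s["length"] for s in strips]
    limit = palm_factor * min(areas)
    palm = [s for s, a in zip(strips, areas) if a >= limit]
    fingers = [s for s, a in zip(strips, areas) if a < limit]
    if not palm:
        return len(fingers), None  # no palm found, so no direction can be inferred
    palm_y = sum(s["y"] for s in palm) / len(palm)
    finger_y = sum(s["y"] for s in fingers) / len(fingers)
    # Smaller row index means higher in the image.
    return len(fingers), "up" if finger_y < palm_y else "down"
```

For two narrow strips near the top of the image and one wide strip below them, this returns two fingers pointing up, mirroring the conclusion drawn from Fig. 7.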
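[0077]
As described above, the shape analysis unit 5 extracts a shape region from the body region candidate extracted by the region integration unit 4 and analyzes the shape the body is taking based on the contour straight lines obtained from that shape region.
[0078]
Next, the motion recognition unit 6 is described in detail. The motion recognition unit 6 determines the direction of movement by tracking, at every time step, the position of the recognized body region, for example the hand region. The position of the hand region can be assumed not to change greatly in one time step. So if the hand regions at the current and previous times are close together and have the same number and orientation of fingers, the direction of the straight line joining the centroids of the two hand regions can be regarded as the direction of movement.
[0079]
On the other hand, when the hand is moved toward the video input device 7, or away from it, the centroid position changes little. However, the area of the hand region increases when the hand approaches the video input device 7 and decreases when it moves away, so the motion can be identified from this change.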
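The centroid and area cues can be combined as in the sketch below. Quantizing the direction into four labels, the minimum shift `min_shift`, and the area ratio `area_ratio` are assumptions added for illustration; the text itself only says that the line joining the centroids gives the direction and that area growth or shrinkage indicates approach or retreat. Regions are sets of `(row, col)` coordinates.

```python
import math


def centroid(region):
    """Mean (row, col) of a set of coordinates."""
    n = len(region)
    return (sum(y for y, _ in region) / n, sum(x for _, x in region) / n)


def classify_motion(prev, curr, min_shift=1.0, area_ratio=1.3):
    """Label the motion between two hand regions from centroid shift and area change."""
    (py, px), (cy, cx) = centroid(prev), centroid(curr)
    dy, dx = cy - py, cx - px
    if math.hypot(dx, dy) >= min_shift:
        if abs(dx) >= abs(dy):
            return "right" if dx > 0 else "left"
        return "down" if dy > 0 else "up"
    # Centroid barely moved: fall back to the change in area.
    if len(curr) >= area_ratio * len(prev):
        return "approach"
    if len(prev) >= area_ratio * len(curr):
        return "recede"
    return "still"
```

A caller would apply this only after checking that the two regions are close together and show the same finger count and orientation, as the description requires.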
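[0080]
By assigning some meaning to the number of fingers and direction of movement recognized in this way, a variety of inputs for controlling the information processing device 8 connected to the motion recognition system become possible. The system can also pass the recognized number of fingers and the determined position to the information processing device 8 while it is tracking the hand. Thus, for example, by displaying the trajectory of the hand on a monitor, the user can check the motion that the motion recognition system is recognizing. This capability can also be used to guide the user in entering motions.
[0081]
Next, the flow of processing in the motion recognition system according to this embodiment is described with reference to the flowchart in Fig. 2. When processing starts, frame images captured by the video input device 7 are stored in sequence in the frame memory 1 (step 1, hereinafter S1).
[0082]
Next, the motion detection unit 2 detects moving regions from the two frame images, for the current and previous times, stored in the frame memory 1 (S2). In doing so, the motion detection unit 2 creates a block image consisting of a plurality of blocks from the difference image of the two frames and detects moving regions from this block image.
[0083]
Next, the skin color detection unit 3 detects skin color regions from the frame image at the current time stored in the frame memory 1 (S3). In doing so, it creates a chromaticity image from the current frame image, applies the skin color detection conditions to each pixel of the chromaticity image to extract skin color regions as a block image, and narrows the skin color regions down to body regions by examining the circularity of each region.
[0084]
Next, the region integration unit 4 integrates the block image of moving regions created by the motion detection unit 2 with the block image of skin color regions created by the skin color detection unit 3 (S4). It is then determined from the result whether a body region has been detected (S5).
[0085]
If no body region has been detected (NO in S5), the subsequent processing is skipped and processing starts again from S1. If a body region has been detected (YES in S5), the chromaticity of each pixel in the body region is obtained, and the skin color detection conditions are updated based on the histogram relating chromaticity value to pixel count (S6).
[0086]
Next, the shape analysis unit 5 creates a shape region from the body region and analyzes the shape based on it (S7). In this analysis, the contour of the shape region is detected, the contour is approximated by contour straight lines of a certain length, and regions such as the hand are analyzed by examining these lines.
[0087]
If the shape analysis unit 5 does not recognize a region such as a hand (NO in S8), the subsequent processing is skipped and processing starts again from S1. If it does recognize such a region (YES in S8), the motion recognition unit 6 recognizes the motion of the recognized region (S9).
[0088]
If no motion is recognized in S9 (NO in S10), the subsequent processing is skipped and processing starts again from S1. If a motion is recognized in S9 (YES in S10), the recognition result is output to the information processing device 8 (S11).
[0089]
It is then determined whether a command to end motion recognition has been issued (S12). If not (NO in S12), processing starts again from S1; if so (YES in S12), processing ends.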
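The control flow of steps S1 to S12 can be sketched as a loop with early exits. The stages are passed in as callables so the skeleton stays independent of how each unit is implemented. The function name `run`, its parameter names, and the use of iterator exhaustion to stand in for the end command of S12 are assumptions.

```python
def run(frames, detect_body, update_colors, analyze_shape, recognize_motion, emit):
    """Drive S1 to S12: later stages run only when the earlier ones succeed."""
    for frame in frames:                 # S1; exhausting `frames` plays the role of S12
        body = detect_body(frame)        # S2 to S4: motion, skin color, integration
        if body is None:                 # S5: NO, go back to S1
            continue
        update_colors(body)              # S6: refresh the skin color conditions
        hand = analyze_shape(body)       # S7
        if hand is None:                 # S8: NO
            continue
        motion = recognize_motion(hand)  # S9
        if motion is None:               # S10: NO
            continue
        emit(motion)                     # S11: output to the information processing device
```

Each stage signals failure by returning `None`, which keeps the three NO branches of the flowchart visible as three `continue` statements.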
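[0090]
The embodiment described above assumes that all processing, from video input to motion recognition, is performed on the input image of the same time step. However, if all of this cannot be done in one time step, for example because the system lacks processing power, the processing up to the extraction of body region candidates by the region integration unit 4 and the processing from the shape analysis unit 5 onward can be performed on input images from different times. This is explained in more detail below.
[0091]
Suppose, for example, that a hand candidate region is extracted at time 0. For the input image at time 1, which is later than time 0, a chromaticity image is first created according to Equation (1). The candidate region from time 0 is projected onto this chromaticity image, and skin color regions are detected within the projected region using the skin color detection conditions from time 0. The result is used as the hand candidate region for the subsequent shape analysis and motion recognition. In other words, at time 1, the body region candidate in the input image at time 1 is set based on the region extracted as a body region candidate by the region integration unit 4 at time 0, and the processing from the shape analysis unit 5 onward is applied to it.
[0092]
As described above, the motion recognition system according to this embodiment extracts the target region with the region integration unit 4, based on the regions with motion extracted by the motion detection unit 2 and the skin color regions extracted by the skin color detection unit 3 from the frame image data input at every time step from the video input device 7. Compared with a configuration that extracts the target region from luminance and color information alone, as in the prior art described above, the target region can be extracted more accurately and more reliably. For example, even if the background contains a region whose color resembles the color that characterizes the target, the background is essentially stationary, so the motion detection means does not extract it as a candidate for the target region. The target can therefore be extracted appropriately without a special environment such as a blackout curtain behind it.
[0093]
In addition, because no contact-type input device such as a data glove is needed, troublesome steps such as fitting a special device to the hand are eliminated. Contact-type input devices such as data gloves are also generally expensive, so doing without one reduces the cost of the system.
[0094]
Furthermore, because the above configuration only needs to detect regions with motion and skin color regions, ordinary visible image data suffices. This removes the need for an expensive, large image input device capable of capturing special image data such as infrared images.
[0095]
The motion recognition system described above can also be realized by writing the processing performed by the motion detection unit 2, skin color detection unit 3, region integration unit 4, shape analysis unit 5, and motion recognition unit 6 as a program executable on a computer and running that program on a computer. The program is stored on a computer-readable recording medium. Examples of such media include media that carry the program in a fixed manner: tape media such as magnetic tape and cassette tape; disk media, including magnetic disks such as floppy disks and hard disks and optical disks such as CD-ROM, MO, MD, and DVD; card media such as IC cards (including memory cards) and optical cards; and semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM.
[0096]
The medium may also carry the program in a fluid manner, as when the program is downloaded from a communication network. When the program is downloaded from a communication network in this way, the download program may be stored in the main device in advance or installed from another recording medium.
[0097]
What is stored on the recording medium is not limited to a program; it may be data.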
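[0098]
[Effects of the Invention]
As described above, the motion recognition system according to the present invention recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, and comprises: motion detection means for extracting regions with motion from the time-series image data; color detection means for extracting, from the time-series image data, regions containing a color that characterizes the target; and region integration means for extracting as the target region, based on the detection results of the motion detection means and the color detection means, a region that both has motion and contains the color that characterizes the target.
[0099]
This has the effect that the target region can be extracted more accurately and more reliably. For example, even if the background contains a region whose color resembles the color that characterizes the target, the background is essentially stationary, so the motion detection means does not extract it as a candidate for the target region. The target can therefore be extracted appropriately without a special environment such as a blackout curtain behind it.
[0100]
In addition, because no contact-type input device such as a data glove is needed, troublesome steps such as fitting a special device to the hand are eliminated and the cost of the system is reduced.
[0101]
Furthermore, because ordinary visible image data suffices, there is no need for an expensive, large image input device capable of capturing special image data such as infrared images.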
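[0102]
In the motion recognition system according to the present invention, the motion detection means may create, from two sets of image data at different times in the time-series image data, a difference image whose pixel values are the differences in luminance at each pixel, and detect regions with motion based on this difference image.
[0103]
In addition to the above effects, this has the effect that regions with motion can be detected accurately and with little computation.
[0104]
In the motion recognition system according to the present invention, the motion detection means may divide the difference image into blocks of a predetermined size, create a block image whose block values are the average or the sum of the luminance values of the pixels in each block, and extract as a region with motion any region that is formed by connecting blocks whose block values exceed a predetermined threshold and whose area is within a predetermined range.
[0105]
In addition to the above effects, of the moving regions only those that occupy a reasonably large area are then extracted. Thus, even if a small object other than the target is moving in the background, it can be excluded from the candidates for the target region. This has the effect of raising the accuracy of target region detection.
[0106]
In the motion recognition system according to the present invention, the color detection means may extract, as a region containing the color that characterizes the target, a pixel region of the image data in which the pixel value of each color component satisfies a predetermined condition.
[0107]
In addition to the above effects, this has the effect that regions containing the color that characterizes the target can be detected accurately. It also has the effect that, by changing the condition on each color component as appropriate, the system can respond appropriately to changes in background and lighting.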
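[0108]
In the motion recognition system according to the present invention, the color detection means may extract a region as a region containing the color that characterizes the target when the region is formed by connecting pixel regions of the image data in which the pixel value of each color component satisfies a predetermined condition and the shape and area of the resulting region satisfy predetermined conditions.
[0109]
In addition to the above effects, detection then takes into account not only the color condition but also the shape and area of the region. Thus, even if the background contains a region whose color is similar to the color that characterizes the target, the shape and area conditions allow such a region to be excluded from the candidates. This has the effect of raising the accuracy of detecting regions containing the color that characterizes the target.
[0110]
In the motion recognition system according to the present invention, the condition on the pixel value of each color component, used when extracting regions containing the color that characterizes the target, may be determined based on the results of extracting such regions up to the current time.
[0111]
In addition to the above effects, even if the environment, such as the background or lighting, changes during motion recognition, the condition on the pixel value of each color component can then be changed accordingly. In other words, this has the effect that the accuracy of extracting regions containing the color that characterizes the target is maintained even when the environment changes.
[0112]
In the motion recognition system according to the present invention, the region integration means may additionally extract as the target region a region that the region integration means extracted as the target region at a predetermined past time and that, at the current time, contains the color that characterizes the target.
[0113]
In addition to the above effects, this has the effect that the target can be extracted as the target region even while it is almost motionless.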
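[0114]
The motion recognition system according to the present invention may further comprise shape analysis means for analyzing the shape of the target region extracted by the region integration means.
[0115]
In addition to the above effects, the state of the shape of the target region can then be recognized as a kind of code representing the shape. In other words, this has the effect that the widely varying shapes of the target region can be classified into a number of categories.
[0116]
In the motion recognition system according to the present invention, the shape analysis means may approximate the contour of the target region with a plurality of straight lines whose lengths fall within a predetermined range, and recognize the shape of the target region from the slopes, lengths, and positional relationships of these lines.
[0117]
In addition to the above effects, this has the effect that only the minimum necessary shape analysis is performed.
[0118]
The motion recognition system according to the present invention may further comprise motion recognition means for recognizing the direction of movement of the target region by tracking over time the shape of the target region analyzed by the shape analysis means.
[0119]
In addition to the above effects, the state of movement of the target region can then be recognized as a kind of code representing the movement. In other words, this has the effect that the widely varying movements of the target region can be classified into a number of categories.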
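[0120]
In the motion recognition system according to the present invention, the extraction of the target region by the region integration means and the analysis of the shape by the shape analysis means may be performed on image data from different times.
[0121]
In addition to the above effects, the amount of processing per unit time is then reduced, which has the effect that even a system with somewhat lower computing performance can process smoothly without stalling.
[0122]
In the motion recognition system according to the present invention, the target may be a human hand.
[0123]
In addition to the above effects, by assigning meanings to, for example, the number of extended fingers, their orientation, and the direction of movement, and recognizing these, the system can then function as an interface that, for example, sends control commands to an externally connected system such as an information processing device. This has the effect of providing a user interface operated intuitively, without the user having to learn complicated operations.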
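[0124]
A recording medium according to the present invention stores a motion recognition program that recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, the program causing a computer to execute: a process of extracting regions with motion from the time-series image data; a process of extracting, from the time-series image data, regions containing a color that characterizes the target; and a process of extracting as the target region, based on the detection results of the motion detection means and the color detection means, a region that both has motion and contains the color that characterizes the target.
[0125]
This has the effect that the target region can be extracted more accurately and more reliably. For example, even if the background contains a region whose color resembles the color that characterizes the target, the background is essentially stationary, so it is not extracted as a region with motion. The target can therefore be extracted appropriately without a special environment such as a blackout curtain behind it.
[0126]
In addition, because no contact-type input device such as a data glove is needed, troublesome steps such as fitting a special device to the hand are eliminated and the cost of the system is reduced.
[0127]
Furthermore, because ordinary visible image data suffices, there is no need for an expensive, large image input device capable of capturing special image data such as infrared images.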
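[Brief Description of the Drawings]
[Fig. 1] A block diagram showing the schematic configuration of a motion recognition system according to one embodiment of the present invention.
[Fig. 2] A flowchart showing the flow of processing performed in the motion recognition system.
[Fig. 3] (a) to (d) are explanatory diagrams showing the images that the motion detection unit of the motion recognition system creates during processing.
[Fig. 4] An explanatory diagram schematically showing the processing in the region integration unit of the motion recognition system.
[Fig. 5] (a) is an explanatory diagram showing a region detected as a body region candidate by the region integration unit, and (b) is an explanatory diagram showing the region of (a) projected onto the chromaticity image created by the skin color detection unit.
[Fig. 6] A histogram, with chromaticity value on the horizontal axis and pixel count on the vertical axis, created by accumulating at every time step the number of pixels at each pixel value for the chromaticity values of the pixels in the region enclosed by the broken line in Fig. 5(b).
[Fig. 7] An explanatory diagram showing the processing performed in the shape analysis unit of the motion recognition system.
[Explanation of Reference Signs]
1 Frame memory
2 Motion detection unit (motion detection means)
3 Skin color detection unit (color detection means)
4 Region integration unit (region integration means)
5 Shape analysis unit (shape analysis means)
6 Motion recognition unit (motion recognition means)
7 Video input device
8 Information processing device
[0001]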
[Technical Field of the Invention]
The present invention relates to a motion recognition system that recognizes the shape and motion of a hand by processing time-series image data containing an image of, for example, a hand.
[0002]
[Prior Art]
Conventionally, keyboards as key input devices and mice as pointing devices have been in general use as user interfaces for information processing equipment such as personal computers. However, operating a keyboard or mouse requires a certain level of skill, so these devices are difficult for beginners to operate.
[0003]
Moreover, when using a keyboard or mouse, the user must remember how each operation relates to the system's response. With a keyboard, for example, the user must learn the functions of keys such as Ctrl and Alt; with a mouse, the user must learn the difference between a single click and a double click and the different functions of the left and right buttons. Learning such a wide variety of operations and functions one by one is a heavy burden for beginners.
[0004]
In recent years, therefore, there have been many attempts to use the human body, that is, gestures and hand movements, as a simple and intuitive user interface. Using gestures and hand movements as a user interface requires inputting information on the posture, shape, and motion of the body with an input device such as a camera, recognizing the content of that body information by analyzing the input, and assigning a meaning, such as a specific command, to the recognized motion.
[0005]
Examples of methods proposed for user interfaces that use the human body are given below. IEICE Transactions D-II, Vol. J80-D-II, No. 6, pp. 1571-1580 (1997), "Real-time gesture recognition from moving images for building interactive systems: application to a virtual conducting system" (Document 1) discloses a method that extracts the arm from images captured by a CCD camera and analyzes the trajectory of its movement to recognize gestures in real time. Japanese Unexamined Patent Publication No. 2-144675 (Document 2) discloses a method in which the user wears a glove with each finger joint painted a different color, and finger movement is recognized from the captured images using the glove colors as cues.
[0006]
The Journal of the Institute of Television Engineers of Japan, Vol. 48, No. 8, pp. 960-965 (1994), "Fundamental technologies for realizing virtual environments" (Document 3) discloses a method of inputting finger movements to a computer with a glove-type sensor device called a data glove. Eizo Joho (I), 1992/9, pp. 55-60, "Person extraction using infrared and visible images" (Document 4) discloses a method that takes an infrared image and a visible image as input, extracts person candidate regions from the infrared image, and then extracts skin color regions within those candidate regions in the visible image to locate the face and hands.
[0007]
[Problems to Be Solved by the Invention]
When visible images are used as input, as in the method of Document 1, the hardest problem is extracting the recognition target region, such as a hand or fingers, from the input image. Hands and fingers are generally extracted by using the luminance and color information of the image to extract skin color regions. However, luminance and color information make the body hard to distinguish from the background when the background contains regions close to skin color, and their values fluctuate readily with environmental conditions such as lighting. This approach therefore lacks stable recognition.
[0008]
To address this problem, Document 1, for example, places a blackout curtain behind the subject, and comparatively many proposals rely on such a special environment. Many other proposals, such as the method of Document 2, raise detection accuracy by attaching a marker to the body part to be recognized.
[0009]
When a device dedicated to motion input, such as a data glove, is used as in Document 3, the stability of hand region extraction and motion acquisition is no longer a concern. However, putting on the input device before each operation is a nuisance, and the device is expensive for a user interface, so it is difficult in many respects to use as a substitute for a keyboard or mouse.
[0010]
When infrared images are used, as in Document 4, the human body region is easy to extract because in ordinary environments the temperature difference between the body and the background is large. However, infrared imaging devices are generally expensive and often large, and they are hard to put to uses other than special ones such as intruder surveillance, so they are unlikely to spread to ordinary households.
[0011]
The present invention was made to solve the above problems. Its object is to provide a motion recognition system that recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, and that detects the shape and motion of the target with high accuracy at low cost.
[0012]
[Means for Solving the Problems]
To solve the above problems, the motion recognition system according to the present invention recognizes the shape and motion of a specific target by processing time-series image data containing an image of that target, and is characterized by comprising: motion detection means for extracting regions with motion from the time-series image data; color detection means for extracting, from the time-series image data, regions containing a color that characterizes the target; and region integration means for extracting as the target region, based on the detection results of the motion detection means and the color detection means, a region that both has motion and contains the color that characterizes the target.
[0013]
With this configuration, the region integration means extracts the target region from the time-series image data based on the regions with motion extracted by the motion detection means and the regions containing the color that characterizes the recognition target extracted by the color detection means. Compared with a configuration that extracts the target region from luminance and color information alone, as in the prior art described above, the target region can be extracted more accurately and more reliably. For example, even if the background contains a region whose color resembles the color that characterizes the target, the background is essentially stationary, so the motion detection means does not extract it as a candidate for the target region. The target can therefore be extracted appropriately without a special environment such as a blackout curtain behind it.
[0014]
In addition, because no contact-type input device such as a data glove is needed, troublesome steps such as fitting a special device to the hand are eliminated. Contact-type input devices such as data gloves are also generally expensive, so doing without one reduces the cost of the system.
[0015]
Furthermore, because the above configuration only needs to detect regions with motion and regions containing the color that characterizes the target, ordinary visible image data suffices. This removes the need for an expensive, large image input device capable of capturing special image data such as infrared images.
[0016]
In the motion recognition system according to the present invention, in the above configuration, the motion detection means may create, from two sets of image data at different times in the time-series image data, a difference image whose pixel values are the differences in luminance at each pixel, and detect regions with motion based on this difference image.
[0017]
With this configuration, the motion detection means detects regions with motion based on a difference image whose pixel values are the per-pixel luminance differences between two sets of image data at different times, so regions with motion can be detected accurately and with little computation.
[0018]
In the motion recognition system according to the present invention, in the above configuration, the motion detection means may divide the difference image into blocks of a predetermined size, create a block image whose block values are the average or the sum of the luminance values of the pixels in each block, and extract as a region with motion any region that is formed by connecting blocks whose block values exceed a predetermined threshold and whose area is within a predetermined range.
[0019]
With this configuration, the motion detection means creates a block image from the difference image and extracts as a region with motion any region, formed by connecting blocks whose block values exceed a predetermined threshold, whose area is within a predetermined range. Of the moving regions, only those that occupy a reasonably large area are extracted. Thus, even if a small object other than the target is moving in the background, it can be excluded from the candidates for the target region, which raises the accuracy of target region detection.
[0020]
In the motion recognition system according to the present invention, in the above configuration, the color detection unit may extract, from the image data, a pixel region in which the pixel value of each color component satisfies a predetermined condition as a region including a color characterizing the target.
[0021]
According to the above configuration, the color detection unit extracts the pixel region in which the pixel value of each color component satisfies the predetermined condition as the region including the color that characterizes the target. Therefore, the region including the color that characterizes the target can be detected accurately. Further, by appropriately changing the conditions for each color component, it is possible to cope appropriately with changes in the background and illumination.
[0022]
Further, in the motion recognition system according to the present invention, in the above configuration, when the shape and area of a region formed by connecting pixel regions in which the pixel value of each color component satisfies a predetermined condition in the image data themselves satisfy a predetermined condition, the color detection unit may extract that region as a region including a color characterizing the target.
[0023]
According to the above configuration, the color detection unit extracts a region as a region including a color characterizing the target when the pixel value of each color component satisfies the predetermined condition and the shape and area of the region formed by connecting such pixel regions also satisfy a predetermined condition. The region including the color characterizing the target is therefore detected in consideration of not only the color condition but also the shape and area of the region. Consequently, for example, even if the background contains a region having the same color as the color that characterizes the target, such a region can be excluded from the candidates by the conditions on shape and area. It is thus possible to increase the accuracy of detecting the region including the color that characterizes the target.
[0024]
Further, in the motion recognition system according to the present invention, in the above configuration, the condition on the pixel value of each color component used when extracting the region including the color characterizing the target may be determined based on the results of extracting the region including the color characterizing the target up to the current time.
[0025]
According to the above configuration, the condition on the pixel value of each color component is determined based on the results of extracting the region including the color that characterizes the target up to the current time. Therefore, even if the environment changes during motion recognition, the condition on the pixel value of each color component can be changed to follow the change. That is, even when the environment changes, the accuracy of extracting the region including the color characterizing the target can be maintained.
[0026]
Further, in the motion recognition system according to the present invention, in the above configuration, the region integration unit may also extract, as the target region, a region that was extracted as the target region by the region integration unit at a predetermined past time and that includes a color characterizing the target at the current time.
[0027]
According to the above configuration, in addition to a region that is moving at the current time and includes a color characterizing the target, the region integration unit also extracts, as the target region, a region that was extracted as the target region at a predetermined past time and that includes the color characterizing the target at the current time. The target can therefore be extracted as the target region even when it is hardly moving.
[0028]
Moreover, the motion recognition system according to the present invention may be configured to further include a shape analysis unit that analyzes the shape of the target region extracted by the region integration unit in the above configuration.
[0029]
According to the above configuration, the shape analysis unit analyzes the shape of the target region extracted by the region integration unit, so that the shape of the target region can be recognized as a code indicating the shape. That is, the shape of the target region, which varies in many ways, can be classified into a plurality of categories.
[0030]
Further, in the motion recognition system according to the present invention, in the above configuration, the shape analysis unit may approximate the contour line of the target region with a plurality of straight lines whose lengths are within a predetermined range, and may recognize the shape of the target region based on the inclination, length, and positional relationship of the straight lines.
[0031]
According to the above configuration, the shape analysis unit approximates the outline of the target area with a plurality of straight lines having a predetermined length, and recognizes the shape of the target area based on the inclination, length, and positional relationship of the straight line. Therefore, the minimum necessary shape analysis can be performed.
[0032]
Further, the motion recognition system according to the present invention may, in the above configuration, further include a motion recognition unit that recognizes the direction of movement of the target region by tracking, over time, the shape of the target region analyzed by the shape analysis unit.
[0033]
According to the above configuration, the motion recognition unit recognizes the direction of movement of the target region by tracking, over time, the shape of the target region analyzed by the shape analysis unit, so that the movement of the target region can be recognized as a code indicating the movement. In other words, the movement of the target region, which varies in many ways, can be classified into a plurality of categories.
[0034]
Further, in the motion recognition system according to the present invention, in the above configuration, the extraction of the target region in the region integration unit and the shape analysis in the shape analysis unit may be performed on image data at different times.
[0035]
According to the above configuration, since the extraction of the target region in the region integration unit and the shape analysis in the shape analysis unit are performed on image data at different times, the amount of processing performed in one unit of time can be reduced. Therefore, even a system with somewhat lower computing performance can perform the processing smoothly without stalling.
[0036]
Further, the motion recognition system according to the present invention may be configured such that, in the above configuration, the target is a human hand.
[0037]
According to the above configuration, the region of a human hand is extracted and subjected to shape analysis and motion recognition. Therefore, for example, by assigning meanings to the number of fingers presented, their orientation, and the direction of movement, the system can function as an interface for transmitting control commands to an externally connected system such as an information processing apparatus. As a result, a user interface based on intuitive operation can be realized without the user having to learn complicated operations.
[0038]
Further, the recording medium on which the motion recognition program according to the present invention is recorded is a recording medium on which is recorded a motion recognition program for recognizing the shape and motion of a specific target by processing time-series image data including an image of the target, characterized in that the motion recognition program causes a computer to execute: a process of extracting a region with motion from the time-series image data; a process of extracting a region including a color characterizing the target from the time-series image data; and a process of extracting, as a target region, a region that is moving and that includes the color characterizing the target, based on the detection results of the motion detection unit and the color detection unit.
[0039]
According to the above configuration, the target region is extracted from the time-series image data based on the region with motion and the region including the color characterizing the motion recognition target. Therefore, compared with a configuration in which the target region is extracted based only on luminance information and color information, as in the related art, the target region can be extracted more accurately and with high reliability. For example, even if the background contains a region of the same color as the color that characterizes the target, the background is basically motionless, so such a region is not extracted as a moving region. Therefore, the target can be extracted appropriately without preparing a special environment such as a dark curtain in the background.
[0040]
Further, since a contact-type input device such as a data glove is not required, troublesome work such as mounting a special device on the hand or the like can be eliminated. At the same time, a contact input device such as a data glove is generally expensive. Therefore, the cost of the system can be reduced by eliminating such an input device.
[0041]
Further, in the above configuration, since it is only necessary to detect a moving region and a region including a color characterizing the target, the required image data may be ordinary visible image data. Therefore, for example, it is possible to eliminate the need for an expensive and large-sized image input device that can input special image data such as an infrared image.
[0042]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described with reference to FIGS. 1 to 7 as follows.
[0043]
FIG. 1 is a block diagram showing a schematic configuration of the motion recognition system according to the present embodiment. The motion recognition system includes a frame memory 1, a motion detection unit (motion detection means) 2, a skin color detection unit (color detection means) 3, a region integration unit (region integration means) 4, a shape analysis unit (shape analysis means) 5, and a motion recognition unit (motion recognition means) 6. Further, as shown in FIG. 1, the motion recognition system is connected to a moving image input device 7 and an information processing device 8.
[0044]
The moving image input device 7 is constituted by a CCD (Charge Coupled Device) camera, for example. CCD cameras are becoming widespread owing to the rapid drop in price in recent years and the expansion of markets such as video conference systems and video calls over the Internet. In addition, a video camera of a type that is already widely used can serve as the moving image input device 7. Thus, since the moving image input device 7 may be any device capable of inputting a visible image, it can be configured from a relatively inexpensive device.
[0045]
The information processing apparatus 8 is constituted by a personal computer, for example, and performs various information processing. Further, for example, it may be a computer for controlling the operation of another device.
[0046]
Hereinafter, each processing unit included in the motion recognition system will be described in detail.
[0047]
The frame memory 1 temporarily stores frame image data sequentially transferred from the moving image input device 7. The frame memory 1 has a storage capacity that can hold at least two frame images. When a new frame image is input, the frame memory 1 deletes the frame image with the earliest input time and stores the new one, so that the stored frame images are updated sequentially. In order to reduce the amount of subsequent processing, the frame image may be reduced in size before being stored.
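The update rule of the frame memory can be sketched as a fixed-capacity queue. This is a minimal illustration only: the class name `FrameMemory` and its methods are hypothetical and do not appear in the specification, and frames are represented by placeholder strings.

```python
from collections import deque


class FrameMemory:
    """Fixed-capacity frame store; the earliest frame is dropped when full."""

    def __init__(self, capacity=2):
        if capacity < 2:
            raise ValueError("capacity must be at least 2 frames")
        # deque(maxlen=...) discards the oldest entry automatically on append.
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        self._frames.append(frame)

    def pair(self):
        """Return (earliest, current) frames, or None until two frames exist."""
        if len(self._frames) < 2:
            return None
        return self._frames[0], self._frames[-1]


memory = FrameMemory(capacity=2)
for name in ["f0", "f1", "f2"]:
    memory.push(name)
earliest, current = memory.pair()
```

With a larger `capacity`, `pair()` still returns the earliest and the current frame, which matches the option of comparing frames that are further apart in time.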
[0048]
The motion detection unit 2 detects a motion region moving in the image by comparing two frame images stored in the frame memory 1.
[0049]
In the detection operation of the motion detection unit 2, when the transfer rate of the moving image input device 7 is high, for example about 30 frames per second, and the motion region moves slowly, the difference between the images at two successive times is extremely small, so the motion region may not be detected. In such a case, the frame memory 1 may store frame images at intervals of one or two frames instead of storing all the frame images output from the moving image input device 7. As another method, the storage capacity of the frame memory 1 may be made large enough that a difference arises between images even when the motion region moves slowly. In this case, the motion detection unit 2 compares the frame image at the current time with the frame image at the earliest time among the frame images stored in the frame memory 1.
[0050]
Here, the operation of the motion detection unit 2 will be described in detail with reference to FIGS. 3 (a) to 3 (d). First, the motion detection unit 2 creates a difference image in which the value of each pixel is the absolute value of the difference between the luminance values of the corresponding pixels in the two frame images stored in the frame memory 1. An example of this difference image is shown in FIG. When the color format of the frame image is the RGB format, the value of only one of the RGB components (generally G) may simply be used as the luminance value in order to reduce the amount of calculation.
[0051]
Next, as shown in FIG. 3B, the motion detection unit 2 divides the difference image into blocks each having a predetermined size. For example, when the size of the difference image is 320 × 240 pixels, if the size of one block is 16 × 16 pixels, the difference image is an image composed of 20 × 15 blocks. Then, an average value of pixel values included in each block is set as a block value in each block, and a block image is created based on this block value as shown in FIG. Here, for a block whose block value is equal to or less than a predetermined threshold, by setting the block value of the block to 0, a minute motion region different from the motion detection target is excluded. In FIG. 3C, for convenience, the size of the block value is represented by the size of the display area in each block.
[0052]
In the above, the average of the pixel values included in each block is used as the block value of that block. However, the present invention is not limited to this; for example, the sum of the pixel values included in each block may be used as the block value.
[0053]
Next, as illustrated in FIG. 3D, the motion detection unit 2 binarizes the block image illustrated in FIG. 3C to detect only blocks with large motion. The threshold used for the binarization is determined automatically by analyzing the block values of the block image using a method such as discriminant analysis. Then, in the binarized image, when blocks detected as having large motion are adjacent to each other, the area of the region formed by connecting these blocks is obtained. If the area of the connected region exceeds a predetermined threshold, the region is extracted as a motion region. The threshold for the area is set to, for example, 1/30 of the area of the entire block image.
[0054]
As described above, the motion detection unit 2 creates a block image from a difference image between two frame images stored in the frame memory 1 and, based on the analysis of the block image, identifies a region with motion in the image. The operation to extract is performed.
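The sequence of difference image, block image, binarization, and connected-area test can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: frames are plain lists of luminance rows, the binarization threshold is chosen by maximizing the between-class variance (commonly known as Otsu's method, taken here as one reading of "discriminant analysis"), blocks are connected through their four neighbors, and the function names and the `noise_floor` default are hypothetical.

```python
def abs_difference(prev, curr):
    """Per-pixel absolute luminance difference of two equally sized frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def block_means(image, block):
    """Average the pixel values over non-overlapping block x block tiles."""
    rows, cols = len(image) // block, len(image[0]) // block
    return [[sum(image[by * block + y][bx * block + x]
                 for y in range(block) for x in range(block)) / (block * block)
             for bx in range(cols)] for by in range(rows)]


def otsu_threshold(values):
    """Threshold that maximizes the between-class variance, or None."""
    best_t, best_score = None, -1.0
    for t in sorted(set(values))[:-1]:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / len(values), len(high) / len(values)
        score = w0 * w1 * (sum(low) / len(low) - sum(high) / len(high)) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def connected_regions(mask):
    """Group True blocks that touch through their four neighbors."""
    seen, regions = set(), []
    for y, row in enumerate(mask):
        for x, on in enumerate(row):
            if not on or (y, x) in seen:
                continue
            stack, region = [(y, x)], set()
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                region.add((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < len(mask) and 0 <= nx < len(mask[0])
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def detect_motion(prev, curr, block=16, noise_floor=4.0, min_area_ratio=1 / 30):
    """Return motion regions as sets of (block_row, block_col)."""
    blocks = block_means(abs_difference(prev, curr), block)
    # Block values at or below the noise floor are set to 0.
    blocks = [[v if v > noise_floor else 0.0 for v in row] for row in blocks]
    flat = [v for row in blocks for v in row]
    threshold = otsu_threshold(flat)
    if threshold is None:          # all block values equal: nothing moved
        return []
    mask = [[v > threshold for v in row] for row in blocks]
    min_area = min_area_ratio * len(flat)
    return [r for r in connected_regions(mask) if len(r) > min_area]


# Toy 8 x 8 frames: a 4 x 4 patch changes strongly, one corner pixel flickers.
prev = [[0] * 8 for _ in range(8)]
curr = [[0] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        curr[y][x] = 100
curr[7][7] = 40
regions = detect_motion(prev, curr, block=2)
```

In this toy input the flickering corner block survives the noise floor but falls below the automatically chosen binarization threshold, so only the four blocks covering the patch are reported.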
[0055]
Next, the skin color detection unit 3 will be described in detail. The skin color detection unit 3 receives the image data stored in the frame memory 1 as the frame image at the current time. The RGB values of each pixel in the input image data are normalized by the following conversion to create a chromaticity image. The purpose of this normalization is to remove uneven illumination and extract only the chromaticity component.
[0056]
[Expression 1]

[0057]
Next, for each pixel in the normalized chromaticity image, a pixel that satisfies the condition of skin color is detected. The condition of skin color is expressed by the following formulas (2) to (6) with respect to (r, g, b) in formula (1).
r_min ≤ r ≤ r_max   (2)
g_min ≤ g ≤ g_max   (3)
b_min ≤ b ≤ b_max   (4)
r > g   (5)
r > b   (6)
[0058]
Here, r_min, g_min, and b_min represent the minimum values for r, g, and b, respectively, and r_max, g_max, and b_max represent the maximum values. How these minimum and maximum values are determined will be described later.
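The per-pixel skin color test of formulas (2) to (6) can be sketched as below. Expression (1) is not reproduced in this text, so the sketch assumes the usual chromaticity normalization r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B); this form is an assumption and is not confirmed by the source. The numeric limits are hypothetical values chosen only for the example.

```python
def chromaticity(R, G, B):
    """Assumed form of expression (1): each component divided by R + G + B."""
    total = R + G + B
    if total == 0:
        return (0.0, 0.0, 0.0)   # black pixel: no chromaticity information
    return (R / total, G / total, B / total)


def is_skin(rgb, limits):
    """Apply conditions (2) to (6) to one RGB pixel."""
    r, g, b = chromaticity(*rgb)
    (r_min, r_max), (g_min, g_max), (b_min, b_max) = limits
    return (r_min <= r <= r_max and      # (2)
            g_min <= g <= g_max and      # (3)
            b_min <= b <= b_max and      # (4)
            r > g and                    # (5)
            r > b)                       # (6)


# Hypothetical limits; in the embodiment these are updated from past detections.
LIMITS = ((0.35, 0.55), (0.25, 0.37), (0.15, 0.35))

bright_skin = is_skin((200, 140, 110), LIMITS)
dim_skin = is_skin((100, 70, 55), LIMITS)      # same chromaticity, half as bright
gray = is_skin((100, 100, 100), LIMITS)
```

Under this assumed normalization, the bright and the dim pixel have identical chromaticity, which illustrates why the normalization reduces the influence of uneven illumination.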
[0059]
Next, like the motion detection unit 2, the skin color detection unit 3 divides the chromaticity image into a plurality of blocks and generates a block image in which the block value is 1 when the number of pixels satisfying the skin color condition in the block exceeds a predetermined threshold, and 0 when it is equal to or less than the threshold. The size of each block in this block image is assumed to be the same as the size of each block in the block image created by the motion detection unit 2.
[0060]
Subsequently, similar to the motion detection unit 2, in the block image, among the blocks having a block value of 1, adjacent blocks are connected to obtain the area of the connected region. When the area of the connected region exceeds a predetermined threshold, this region is extracted as a skin color region candidate.
[0061]
Further, the skin color detection unit 3 analyzes the shape of the region extracted as a skin color region candidate as described above. In general, a region where a face or a hand is displayed has a shape close to a circle or an ellipse on a block image. Based on this, the skin color region can be narrowed down based on the circularity of the region. The circularity C of the region is obtained by the following equation (7), where L is the perimeter of the region and A is the area.
C = L² / A   (7)
[0062]
The smaller the value of the circularity C, the closer the region is to a circle. Therefore, a region whose circularity C is smaller than a predetermined threshold is extracted as a skin color region.
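As a worked illustration of formula (7), the circularity of a few ideal shapes can be computed directly. The shapes and dimensions below are chosen only for the example; for a perfect circle the value is 4π (about 12.57), which is the minimum, and elongated shapes give larger values.

```python
import math


def circularity(perimeter, area):
    """C = L^2 / A from formula (7); smaller values are closer to a circle."""
    return perimeter ** 2 / area


circle = circularity(2 * math.pi * 5.0, math.pi * 5.0 ** 2)   # radius 5
square = circularity(4 * 6.0, 6.0 ** 2)                       # side 6
strip = circularity(2 * (1.0 + 8.0), 1.0 * 8.0)               # 1 x 8 rectangle
```

A threshold placed between the values for the square and the strip would keep compact regions such as a face or a hand and reject long, thin regions.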
[0063]
The size and shape of the skin color area hardly change between consecutive frames. Accordingly, the threshold value relating to the area and shape can be determined based on the detection result of the previous time after determining an appropriate initial value. In other words, the area threshold may be set to a value slightly smaller than the area detected in the vicinity at the previous time, and the shape threshold may be set to a value slightly larger than the circularity at the previous time.
[0064]
Next, the region integration unit 4 will be described in detail. The region integration unit 4 receives the block image created by the motion detection unit 2 and the block image created by the skin color detection unit 3. When the area of the overlapping region between the motion region detected by the motion detection unit 2 and the skin color region detected by the skin color detection unit 3 exceeds a predetermined threshold, the region integration unit 4 extracts that region as a body region candidate. This threshold may be set to, for example, one third of the area of the skin color region.
[0065]
In addition, to handle times when the body part is hardly moving, the region extracted as a body region candidate by the region integration unit 4 one time step earlier is stored, and when the area of the overlap between that region and the skin color region at the current time exceeds a predetermined threshold, the overlap is also extracted as a body region candidate.
[0066]
The above processing is summarized as shown in FIG. In FIG. 4, A denotes the region detected as a motion region by the motion detection unit 2, B denotes the region detected as a body region candidate at the previous time by the region integration unit 4, and C denotes the region detected as a skin color region by the skin color detection unit 3. The region integration unit 4 detects the overlap between the image obtained by adding A and B and the image C as the moving skin color region indicated by D in FIG. 4. If no region is detected as a moving skin color region, the process returns to the input of the next frame image, and the processing in the units following the region integration unit 4 is not performed.
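The combination of regions A, B, and C into D can be written as set operations on block coordinates. The sketch below is a simplified reading: it applies a single overlap threshold (one third of the skin color area, as in the example value given above) to the combined overlap, whereas the description applies the threshold to the motion overlap and to the previous-candidate overlap separately. The function name and the set representation are hypothetical.

```python
def integrate_regions(motion, previous_body, skin, min_overlap_ratio=1 / 3):
    """Return D = (A or B) and C as a set of blocks, or an empty set."""
    candidate = (motion | previous_body) & skin
    if not skin or len(candidate) <= min_overlap_ratio * len(skin):
        return set()
    return candidate


skin = {(0, 1), (1, 1), (2, 2)}                       # C

# Moving hand: motion region A overlaps the skin color region.
moving = integrate_regions({(0, 0), (0, 1), (1, 1)}, set(), skin)

# Stationary hand: no motion, but the previous candidate B still overlaps C.
stationary = integrate_regions(set(), {(0, 1), (1, 1)}, skin)

# Skin-colored background that never moved and was never a candidate.
background = integrate_regions(set(), set(), skin)
```

The third case returns an empty set, which corresponds to skipping the later processing and waiting for the next frame.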
[0067]
The skin color condition values r_min, g_min, b_min, r_max, g_max, and b_max used in the skin color detection unit 3 are updated based on the result detected as a body region candidate in the region integration unit 4. This will be described below with reference to FIGS. 5A and 5B and FIG.
[0068]
FIG. 5A shows a region detected as a body region candidate in the region integration unit 4. As shown in FIG. 5B, when this region is projected onto the chromaticity image created by the skin color detection unit 3, it corresponds to the area surrounded by the broken line in the figure. For the chromaticity values r, g, and b of the pixels included in the area surrounded by the broken line, the number of pixels at each value is accumulated at each time. Based on the accumulated result, a histogram is created with the chromaticity value on the horizontal axis and the number of pixels on the vertical axis. FIG. 6 shows the histogram for the r component.
[0069]
In the histogram corresponding to each color component, the peak value of the number of pixels is detected, and when the peak value exceeds a predetermined threshold, the skin color condition values are updated. This threshold value may be set so that the ratio to the peak value of the histogram becomes a predetermined value. In FIG. 6, the value indicated by the broken line represents this threshold. Then, the range of chromaticity values whose frequency is equal to or greater than the set threshold is defined as the skin color range, and the skin color condition values are determined accordingly. That is, the values of r_min and r_max are determined as shown in FIG.
[0070]
As described above, if the skin color detection conditions are determined based on past detection results, it is possible to adapt to subtle changes in illumination conditions, background changes, and the like.
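One way to derive a new range such as (r_min, r_max) from the histogram is sketched below. The details are assumptions made for illustration: the bin count, the `peak_ratio` used for the cutoff line, and the `min_peak` guard are hypothetical parameters, the histogram is built from a single batch of values rather than accumulated over time, and if the bins above the cutoff are not contiguous the sketch simply spans from the lowest to the highest of them.

```python
def update_range(values, bins=20, peak_ratio=0.5, min_peak=10):
    """Return (low, high) for one chromaticity component, or None if no update.

    values: chromaticity values in [0, 1] of the pixels in the body region.
    """
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    peak = max(hist)
    if peak <= min_peak:            # too few pixels: keep the old condition
        return None
    cutoff = peak_ratio * peak      # cutoff line relative to the peak
    kept = [i for i, count in enumerate(hist) if count >= cutoff]
    return kept[0] / bins, (kept[-1] + 1) / bins


# 20 pixels near r = 0.42, 15 near r = 0.47, and 2 outliers near r = 0.72.
r_values = [0.42] * 20 + [0.47] * 15 + [0.72] * 2
r_range = update_range(r_values)
too_few = update_range([0.42] * 3)
```

The outliers fall below the cutoff and do not widen the range, while a region with too few pixels leaves the previous condition values in place.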
[0071]
Next, the shape analysis unit 5 will be described in detail. The shape analysis unit 5 recognizes what is represented by the body by analyzing the shape of the candidate region of the body region extracted by the region integration unit 4. Here, it is assumed that the target body is a hand, and the number of fingers shown in the hand is specified.
[0072]
Since the candidate region of the hand extracted by the region integration unit 4 is a region on the block image, when the region is small, a detailed portion of the shape becomes unclear. Therefore, the shape is analyzed by the following method.
[0073]
First, the hand candidate region is projected onto the chromaticity image created by the skin color detection unit 3, and the area overlapping the candidate region is extracted. Then, within that area, a shape region formed by the pixels satisfying the skin color conditions of formulas (2) to (6) is extracted, and isolated point removal, hole filling, smoothing of the contour, and similar processing are applied to the shape region. As a result, a plurality of shape regions may be extracted from the candidate region. In that case, the shape analysis is performed on the shape region having the largest area.
[0074]
The analysis of the shape can be performed, for example, by the following procedure. First, the contour line is extracted from the extracted shape region. Next, the extracted contour line is approximated by a plurality of straight lines having a certain length. Among these plural straight lines, straight lines having substantially the same inclination are selected as contour straight lines. A specific example of this shape analysis is shown in FIG.
[0075]
In FIG. 7, the portion indicated by the thin line is the contour line of the extracted shape region, and the thick lines indicated by L1 to L6 are the selected contour straight lines. From these contour straight lines, pairs of contour straight lines that sandwich the shape region are selected, and by examining the width, length, area, positional relationship, and so on of each sandwiched region, a finger region or a palm region can be detected.
[0076]
For example, in the example shown in FIG. 7, the region sandwiched between the contour straight lines L2 and L3 and the region sandwiched between the contour straight lines L4 and L5 have substantially the same length and the same width, so each can be estimated to be a finger region. In addition, since there are no other similar regions, the number of fingers can be estimated to be two. Further, since the region sandwiched between the contour straight lines L1 and L6 is considerably larger than the finger regions, it can be estimated to be a palm region. Further, since the finger regions are located in the upper part of the image and the palm region in the lower part, the fingers are estimated to be pointing upward.
[0077]
As described above, the shape analysis unit 5 extracts a shape region from the body region candidate extracted by the region integration unit 4 and, based on the contour straight lines obtained from this shape region, analyzes what shape the region has.
[0078]
Next, the motion recognition unit 6 will be described in detail. The motion recognition unit 6 identifies the direction of movement by tracking the position of the recognized body region, for example the hand region, at each time. Since the position of the hand region can be assumed not to change greatly within one time step, when the hand region at the current time and the hand region at the previous time are close in position and have the same number and orientation of fingers, the direction of the straight line connecting the centroids of the two hand regions can be regarded as the direction of movement.
[0079]
On the other hand, when the hand is moved toward or away from the moving image input device 7, the position of the centroid does not change much. However, the area of the hand region increases when the hand approaches the moving image input device 7 and decreases when it moves away, so these motions can be identified from the change in area.
[0080]
By assigning meanings to the number of fingers and the direction of movement recognized as described above, various inputs for controlling the information processing apparatus 8 connected to the motion recognition system can be performed. In addition, the system described above can input the number of recognized fingers and the identified position to the information processing apparatus 8 while tracking the movement of the hand. Therefore, for example, by displaying the locus of the hand movement on a monitor, the user can check the movement recognized by the motion recognition system. This function can also be used to guide the user in operation input.
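The two cues described above, centroid displacement and area change, can be combined in a small classifier. This is a sketch under assumptions: hand regions are sets of (row, column) cells, the thresholds `min_shift` and `area_ratio` and the returned labels are hypothetical, and the check that the number and orientation of fingers are unchanged is omitted.

```python
import math


def centroid(region):
    """Centroid (x, y) of a region given as a set of (row, col) cells."""
    xs = [col for _, col in region]
    ys = [row for row, _ in region]
    return sum(xs) / len(region), sum(ys) / len(region)


def classify_motion(prev_region, curr_region, min_shift=1.0, area_ratio=1.3):
    """Label the movement between two consecutive hand regions."""
    (px, py), (cx, cy) = centroid(prev_region), centroid(curr_region)
    dx, dy = cx - px, cy - py
    if math.hypot(dx, dy) >= min_shift:
        if abs(dx) >= abs(dy):
            return "right" if dx > 0 else "left"
        return "down" if dy > 0 else "up"      # image rows grow downward
    # Centroid almost fixed: use the change in area instead.
    growth = len(curr_region) / len(prev_region)
    if growth >= area_ratio:
        return "toward camera"
    if growth <= 1 / area_ratio:
        return "away from camera"
    return "still"


small = {(r, c) for r in range(1, 3) for c in range(1, 3)}     # 2 x 2, centered
large = {(r, c) for r in range(0, 4) for c in range(0, 4)}     # 4 x 4, same center
shifted = {(r, c + 3) for r, c in small}                       # moved 3 cells right

sideways = classify_motion(small, shifted)
approach = classify_motion(small, large)
recede = classify_motion(large, small)
```

Mapping each label to a command for the connected information processing apparatus is then a matter of a lookup table chosen by the application.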
[0081]
Next, the flow of processing in the motion recognition system according to the present embodiment will be described with reference to the flowchart shown in FIG. When the processing is started, frame images taken by the moving image input device 7 are sequentially stored in the frame memory 1 (step 1, hereinafter referred to as S1).
[0082]
Next, based on the two frame images of the current time and the previous time stored in the frame memory 1, a motion region is detected by the motion detection unit 2 (S2). At this time, the motion detection unit 2 creates a block image including a plurality of blocks based on the difference image between the two frame images, and detects the motion region based on the block image.
[0083]
Next, based on the frame image of the current time stored in the frame memory 1, the skin color region is detected by the skin color detection unit 3 (S3). At this time, the skin color detection unit 3 creates a chromaticity image from the frame image at the current time, extracts skin color regions as a block image based on the skin color detection condition for each pixel in the chromaticity image, and narrows down the skin color region serving as the body region by examining the circularity of each region.
[0084]
Next, the region integration unit 4 integrates the block image related to the motion region created by the motion detection unit 2 and the block image related to the skin color region created by the skin color detection unit 3 (S4). Based on the result of this integration, it is determined whether a body region has been detected (S5).
[0085]
If the body region is not detected (NO in S5), the process from S1 is started again without performing the subsequent processes. On the other hand, when the body region is detected (YES in S5), the chromaticity of each pixel in the body region is detected, and the skin color detection condition is updated based on the histogram indicating the relationship between the chromaticity value and the number of pixels. (S6).
[0086]
Next, the shape analysis unit 5 creates a shape region based on the body region, and performs shape analysis based on the shape region (S7). In this shape analysis, a contour line of a shape region is detected, this contour line is approximated by a contour straight line having a certain length, and the contour straight line is analyzed to analyze a region such as a hand.
[0087]
When the shape analysis unit 5 does not recognize an area such as a hand (NO in S8), the process from S1 is started again without performing the subsequent processes. On the other hand, when the region such as the hand is recognized (YES in S8), the motion recognition unit 6 recognizes the motion of the region such as the hand recognized above (S9).
[0088]
If the motion is not recognized in S9 (NO in S10), the process from S1 is started again without performing the subsequent processes. On the other hand, when the motion is recognized in S9 (YES in S10), the recognition result is output to the information processing apparatus 8 (S11).
[0089]
Thereafter, it is determined whether or not there is an end command for the motion recognition process (S12). If there is no end command (NO in S12), the process starts again from S1; if there is an end command (YES in S12), the process is terminated.
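The control flow of steps S1 to S12 can be outlined as a loop with early returns to the frame input. The sketch below shows only the branching: every processing stage is passed in as a hypothetical callable, the end command of S12 is modeled simply as the frame source running out, and the stub functions at the bottom exist solely to exercise the branches.

```python
def run_recognition(frames, detect_motion, detect_skin, integrate,
                    update_skin_condition, analyze_shape, recognize_motion, emit):
    """Process frames in order, following the branches of S1 to S12."""
    previous = None
    for frame in frames:                            # S1: store the next frame
        if previous is None:
            previous = frame                        # need two frames to compare
            continue
        motion = detect_motion(previous, frame)     # S2
        skin = detect_skin(frame)                   # S3
        previous = frame
        body = integrate(motion, skin)              # S4
        if not body:                                # S5: NO, back to S1
            continue
        update_skin_condition(frame, body)          # S6
        shape = analyze_shape(frame, body)          # S7
        if shape is None:                           # S8: NO, back to S1
            continue
        result = recognize_motion(shape)            # S9
        if result is None:                          # S10: NO, back to S1
            continue
        emit(result)                                # S11
    # S12: the loop ends when no more frames arrive (stand-in for an end command)


# Stub stages: frame 2 has no skin color region, frame 3 yields no motion label.
outputs, updates = [], []
run_recognition(
    frames=[0, 1, 2, 3],
    detect_motion=lambda prev, curr: {curr},
    detect_skin=lambda frame: set() if frame == 2 else {frame},
    integrate=lambda motion, skin: motion & skin,
    update_skin_condition=lambda frame, body: updates.append(frame),
    analyze_shape=lambda frame, body: frame,
    recognize_motion=lambda shape: None if shape == 3 else "move%d" % shape,
    emit=outputs.append,
)
```

Passing the stages as parameters keeps the loop independent of how each unit is realized, which also makes it easy to run the region extraction and the shape analysis on different frames when processing capacity is limited.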
[0090]
In the embodiment described above, all processes from the input of a moving image to the recognition of the motion are assumed to be performed on the input image of a single time. However, when the system lacks the processing capability to perform all the processes at one time, the processing up to the extraction of body region candidates by the region integration unit 4 and the processing from the shape analysis unit 5 onward may be performed on input images at different times. This is described in more detail below.
[0091]
For example, assume that a hand candidate region is extracted at time 0. For an input image at time 1, which is later than time 0, a chromaticity image is first created according to equation (1). Then, the candidate region at time 0 is projected onto the chromaticity image, and the skin color region is detected within the projected region based on the skin color detection condition at time 0. This region is taken as the hand candidate region, and the subsequent shape analysis and motion recognition are performed on it. That is, at time 1, a body region candidate in the input image at time 1 is set based on the region extracted as a body region candidate by the region integration unit 4 at time 0, and the processing from the shape analysis unit 5 onward is performed on this candidate region.
[0092]
As described above, in the motion recognition system according to this embodiment, the region integration unit 4 extracts the target region from the frame image data input from the moving image input device 7 at each time, based on the region with motion extracted by the motion detection unit 2 and the skin color region extracted by the skin color detection unit 3. Compared with a configuration in which the target region is extracted from luminance information and color information alone, as in the related art, the target region can therefore be extracted more accurately and with higher reliability. For example, even if the background contains an area of a color similar to the color characterizing the object, the background is basically motionless, so that area is not extracted as a candidate for the target region by the motion detection unit. Therefore, the object can be extracted appropriately without preparing a special environment such as a dark curtain in the background.
[0093]
Further, since a contact-type input device such as a data glove is not required, troublesome work such as mounting a special device on the hand or the like can be eliminated. At the same time, a contact input device such as a data glove is generally expensive. Therefore, the cost of the system can be reduced by eliminating such an input device.
[0094]
In the above configuration, since it is only necessary to detect a moving region and a skin color region, the necessary image data may be visible image data that is generally used. Therefore, for example, it is possible to eliminate the need for an expensive and large-sized image input device that can input special image data such as an infrared image.
[0095]
The motion recognition system described above can also be realized by describing the processing performed in the motion detection unit 2, the skin color detection unit 3, the region integration unit 4, the shape analysis unit 5, and the motion recognition unit 6 as a computer-executable program and executing this program on a computer. This program is stored in a computer-readable recording medium. Examples of this recording medium include tape media such as magnetic tapes and cassette tapes; magnetic disks such as floppy disks and hard disks; optical disks such as CD-ROM, MO, MD, and DVD; card media such as IC cards (including memory cards) and optical cards; and media that carry the program in a fixed manner, including semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM.
[0096]
A medium that carries the program in a non-fixed manner, such that the program is downloaded from a communication network, may also be used. When the program is downloaded from a communication network in this way, the program used for downloading may be stored in the main device in advance, or may be installed from another recording medium.
[0097]
Further, the content stored in the recording medium is not limited to a program, and may be data.
[0098]
[Effects of the invention]
As described above, the motion recognition system according to the present invention is a motion recognition system that recognizes the shape and motion of an object by processing time-series image data including an image of a specific object, and comprises: motion detection means for extracting a region having motion from the time-series image data; color detection means for extracting a region including a color characterizing the object from the time-series image data; and region integration means for extracting, as a target region, a region that has motion and includes a color characterizing the object, based on the detection results of the motion detection means and the color detection means.
[0099]
Thereby, there is an effect that the target region can be extracted more accurately and with high reliability. For example, even if there is an area of a color similar to the color characterizing the object in the background, the background is basically non-moving, so that it is not extracted as a candidate for the object area by the motion detection means. Therefore, there is an effect that it is possible to appropriately extract the object without having to use a special environment such as a dark curtain on the background.
[0100]
In addition, since a contact-type input device such as a data glove is not required, troublesome work such as attaching a special device to the hand can be eliminated, and at the same time the cost of the system can be reduced.
[0101]
In addition, since ordinary visible image data suffices as the necessary image data, there is no need to use, for example, an expensive and large image input device capable of inputting special image data such as infrared images.
[0102]
Further, in the motion recognition system according to the present invention, the motion detection means may create a difference image whose pixel values are the differences between the luminance values of corresponding pixels in two pieces of image data at different times in the time-series image data, and may detect a region having motion based on the difference image.
[0103]
Thereby, in addition to the above-described effect, there is an effect that a region with movement can be detected accurately and with a small amount of arithmetic processing.
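A difference image of this kind can be sketched in a few lines. The example assumes luminance images stored as nested lists of integers and takes the absolute value of the difference, which is an assumption on our part (the text only says "difference"); `difference_image` is a hypothetical name.

```python
from typing import List

Image = List[List[int]]


def difference_image(prev: Image, curr: Image) -> Image:
    """Per-pixel absolute luminance difference between two frames."""
    if len(prev) != len(curr) or any(
            len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same size")
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]
```

Pixels that did not change between the two times come out as 0, so static background contributes nothing to the later motion test; the cost is one subtraction per pixel.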
[0104]
In the motion recognition system according to the present invention, the motion detection means may divide the difference image into blocks each having a predetermined size, create a block image whose block values are the average or integrated value of the luminance values of the pixels included in each block, and extract, as a region having motion, a region formed by connecting blocks whose block values exceed a predetermined threshold when the area of that region is within a predetermined range.
[0105]
As a result, in addition to the above-described effects, only a region that occupies a certain range is extracted from the moving region. Therefore, for example, even when a small object different from the target is moving in the background, it can be removed from the target region candidates. Therefore, it is possible to increase the accuracy of detection of the target region.
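The block-based step can be sketched as follows, under several assumptions that are not taken from the text: block values are averages, blocks are connected through their four edge neighbors, and area is counted in blocks. The names `block_image` and `moving_regions` and all parameter values are illustrative.

```python
from typing import List, Tuple

Block = Tuple[int, int]


def block_image(diff: List[List[int]], block: int) -> List[List[float]]:
    """Average the difference image over non-overlapping block x block tiles."""
    rows, cols = len(diff) // block, len(diff[0]) // block
    return [[sum(diff[br * block + y][bc * block + x]
                 for y in range(block) for x in range(block)) / (block * block)
             for bc in range(cols)]
            for br in range(rows)]


def moving_regions(blocks: List[List[float]], threshold: float,
                   min_area: int, max_area: int) -> List[List[Block]]:
    """Group above-threshold blocks (4-connectivity assumed) and keep only
    groups whose block count lies in [min_area, max_area]."""
    rows, cols = len(blocks), len(blocks[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or blocks[r][c] <= threshold:
                continue
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:  # depth-first flood fill over neighboring blocks
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and blocks[ny][nx] > threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if min_area <= len(region) <= max_area:
                regions.append(sorted(region))
    return regions
```

The lower area bound is what discards a small moving object in the background; the upper bound would reject changes that cover most of the frame.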
[0106]
Further, the motion recognition system according to the present invention may be configured such that the color detection means extracts, as a region including a color characterizing the object, a pixel region in which the pixel value of each color component in the image data satisfies a predetermined condition.
[0107]
Thereby, in addition to the above-described effect, there is an effect that it is possible to accurately detect a region including a color that characterizes a target. Further, by appropriately changing the conditions for each color component, it is possible to appropriately cope with changes in the background and illumination.
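One simple form such a per-component condition could take is an inclusive range for each color component. The sketch below assumes RGB tuples and range conditions; both the representation and the numeric ranges used in any call are hypothetical, not values from the patent.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Range = Tuple[int, int]


def color_mask(image: List[List[Pixel]],
               ranges: Sequence[Range]) -> List[List[bool]]:
    """Mark pixels whose every color component lies in its (low, high) range."""
    return [[all(lo <= value <= hi for value, (lo, hi) in zip(pixel, ranges))
             for pixel in row]
            for row in image]
```

Because the condition is passed in as data, adapting to a change in background or illumination amounts to calling the same function with different ranges.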
[0108]
In the motion recognition system according to the present invention, the color detection means may extract a region formed by connecting pixel regions in which the pixel value of each color component in the image data satisfies a predetermined condition, as a region including a color characterizing the object, when the shape and area of that region satisfy predetermined conditions.
[0109]
Thereby, in addition to the above-described effect, not only the color condition but also the shape and area of the region are taken into consideration, and the region including the color characterizing the object is detected. Therefore, for example, even if there is a region having the same color as the color that characterizes the object in the background, such a region can be excluded from the candidates depending on the conditions based on the shape and area. Therefore, there is an effect that it is possible to improve the accuracy of detection of a region including a color characterizing the object.
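As one possible shape condition, the sketch below uses circularity, defined here as 4πA/P² (1 for an ideal circle, smaller for elongated shapes). This particular formula, the edge-counting perimeter, and the thresholds are assumptions for illustration; the text above only requires that shape and area satisfy predetermined conditions.

```python
import math
from typing import Set, Tuple

Cell = Tuple[int, int]


def perimeter(region: Set[Cell]) -> int:
    """Count cell edges that border a cell outside the region."""
    return sum((y + dy, x + dx) not in region
               for (y, x) in region
               for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))


def circularity(region: Set[Cell]) -> float:
    """4*pi*area / perimeter**2; returns 0.0 for an empty region."""
    if not region:
        return 0.0
    p = perimeter(region)
    return 4 * math.pi * len(region) / (p * p)


def passes_shape_test(region: Set[Cell], min_area: int,
                      min_circularity: float) -> bool:
    """Keep a region only if it is large enough and compact enough."""
    return len(region) >= min_area and circularity(region) >= min_circularity
```

With this measure a compact blob such as a palm scores higher than a thin strip of similar color, such as a wooden table edge, so the strip can be dropped from the candidates.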
[0110]
In addition, the motion recognition system according to the present invention may be configured such that the condition on the pixel value of each color component used to extract the region including a color characterizing the object is determined based on the extraction results, up to the current time, of the region including that color.
[0111]
As a result, in addition to the above effects, even if an environmental change such as a change in the background or lighting state occurs during motion recognition, the condition on the pixel value of each color component can be changed accordingly. That is, even if the environment changes, the accuracy of extracting the region including the color characterizing the object can be maintained.
[0112]
In the motion recognition system according to the present invention, the region integration means may further extract, as the target region, a region that was extracted as the target region by the region integration means at a predetermined past time and that includes a color characterizing the object at the current time.
[0113]
Thereby, in addition to the above effect, there is an effect that the target can be extracted as the target region even when the target is hardly moving.
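The integration rule can be sketched on boolean masks: the past target region is added to the current motion region, the result is intersected with the current color region, and the overlap is accepted only when its area exceeds a threshold, as in claim 1. The mask representation, the function name, and returning `None` for "no candidate" are assumptions made for the example.

```python
from typing import List, Optional

Mask = List[List[bool]]


def integrate_with_history(motion: Mask, previous_target: Mask, color: Mask,
                           area_threshold: int) -> Optional[Mask]:
    """(motion OR previous target) AND color, kept only if large enough."""
    candidate = [[(m or p) and c for m, p, c in zip(mrow, prow, crow)]
                 for mrow, prow, crow in zip(motion, previous_target, color)]
    area = sum(cell for row in candidate for cell in row)
    return candidate if area > area_threshold else None
```

If the hand stops moving, the motion mask is empty, but the cells that were part of the target earlier and still show the characteristic color continue to be returned.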
[0114]
The motion recognition system according to the present invention may further include a shape analysis unit that analyzes the shape of the target region extracted by the region integration unit.
[0115]
Thereby, in addition to the above effect, the state of the shape of the target region can be recognized by a certain code indicating the shape. That is, it is possible to classify the shape of the target region that changes variously into a plurality of categories.
[0116]
Further, in the motion recognition system according to the present invention, the shape analysis means may approximate the contour line of the target region by a plurality of straight lines whose lengths fall within a predetermined range, and recognize the shape of the target region from the inclination, length, and positional relationship of these straight lines.
[0117]
Thereby, in addition to the above-described effects, there is an effect that a necessary minimum shape analysis can be performed.
[0118]
The motion recognition system according to the present invention may further include motion recognition means for recognizing the direction of movement of the target region by tracking, over time, the shape of the target region analyzed by the shape analysis means.
[0119]
Thereby, in addition to the above-described effect, the state of movement of the target region can be recognized by a certain code indicating movement. In other words, there is an effect that it is possible to classify the movement of the target area that changes variously into a plurality of categories.
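Classifying movement into a small set of codes can be sketched as below. The example tracks only the center of the target region between two times and maps the displacement to one of five codes; the choice of the center as the tracked feature, the code names, and the `min_shift` dead zone are all assumptions, not details from the patent. Image coordinates are assumed, with y increasing downward.

```python
from typing import Tuple

Point = Tuple[float, float]


def direction_code(prev_center: Point, curr_center: Point,
                   min_shift: float = 5.0) -> str:
    """Map the displacement of the region center to a movement code."""
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    if max(abs(dx), abs(dy)) < min_shift:
        return "still"                       # below the dead zone
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"        # y grows downward in images
```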
[0120]
The motion recognition system according to the present invention may be configured to perform extraction of a target region in the region integration unit and shape analysis in the shape analysis unit on image data at different times.
[0121]
As a result, in addition to the above effects, the amount of processing performed per unit time can be reduced. Therefore, even in a system with somewhat lower computing performance, processing can be performed smoothly without stalling.
[0122]
The motion recognition system according to the present invention may be configured such that the object is a human hand.
[0123]
Thus, in addition to the above effects, by assigning meanings to, for example, the number of fingers presented, their direction, and the direction of their movement, and by recognizing these, the system can function as an interface for transmitting control commands to, for example, an externally connected information processing apparatus. As a result, a user interface based on intuitive operation can be realized without the user having to learn complicated operations.
[0124]
Further, the recording medium on which the motion recognition program according to the present invention is recorded is a recording medium that records a motion recognition program for recognizing the shape and motion of an object by processing time-series image data including an image of a specific object, the motion recognition program causing a computer to execute: a process of extracting a region having motion from the time-series image data; a process of extracting a region including a color characterizing the object from the time-series image data; and a process of extracting, as a target region, a region that has motion and includes a color characterizing the object, based on the results of the motion detection and the color detection.
[0125]
Thereby, there is an effect that the target region can be extracted more accurately and with high reliability. For example, even if there is an area of the same color as the color that characterizes the object in the background, the background is basically non-moving, so that it is not extracted as a moving area. Therefore, there is an effect that it is possible to appropriately extract the object without having to use a special environment such as a dark curtain on the background.
[0126]
In addition, since a contact-type input device such as a data glove is not required, troublesome work such as attaching a special device to the hand can be eliminated, and the cost of the system can be reduced.
[0127]
In addition, since ordinary visible image data suffices as the necessary image data, there is no need to use, for example, an expensive and large image input device capable of inputting special image data such as infrared images.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a motion recognition system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a flow of processing performed in the motion recognition system.
FIGS. 3A to 3D are explanatory diagrams showing images created when the motion detection unit included in the motion recognition system performs processing.
FIG. 4 is an explanatory diagram schematically showing processing in a region integration unit included in the motion recognition system.
FIG. 5A is an explanatory diagram showing a region detected as a body region candidate in the region integration unit, and FIG. 5B is an explanatory diagram showing that region projected onto the chromaticity image created in the skin color detection unit.
FIG. 6 is a histogram of the chromaticity values of the pixels included in the area surrounded by the broken line, with the chromaticity value on the horizontal axis and the number of pixels on the vertical axis.
FIG. 7 is an explanatory diagram illustrating processing performed in a shape analysis unit included in the motion recognition system.
[Explanation of symbols]
1 Frame memory
2 Motion detection unit (motion detection means)
3 Skin color detection unit (color detection means)
4 Region integration unit (region integration means)
5 Shape analysis unit (shape analysis means)
6 Motion recognition unit (motion recognition means)
7 Moving image input device
8 Information processing apparatus

Claims

In a motion recognition system that recognizes the shape and motion of an object by processing time-series image data including an image of the specific object,
Motion detection means for extracting a region having motion from the time-series image data;
Color detection means for extracting a region including a color characterizing the object from the time-series image data;
Region integration means for extracting a target region based on the detection results of the motion detection means and the color detection means;
A motion recognition system characterized in that, when the area of the region in which (a) the region obtained by adding the region previously detected as the target region by the region integration means to the motion region detected by the motion detection means overlaps (b) the color region detected by the color detection means exceeds a predetermined threshold, the region integration means sets that overlapping region as a candidate for the target region.

The motion recognition system according to claim 1, wherein the motion detection means creates a difference image whose pixel values are the differences between the luminance values of corresponding pixels in two pieces of image data at different times in the time-series image data, and detects a region having motion based on the difference image.

The motion recognition system according to claim 2, wherein the motion detection means divides the difference image into blocks each having a predetermined size, creates a block image whose block values are the average or integrated value of the luminance values of the pixels included in each block, and extracts, as a region having motion, a region formed by connecting blocks whose block values exceed a predetermined threshold when the area of that region is within a predetermined range.

The motion recognition system according to claim 1, wherein the color detection unit extracts a pixel region in which the pixel value of each color component satisfies a predetermined condition in the image data as a region including a color characterizing the object.

The motion recognition system according to claim 4, wherein the color detection means extracts a region formed by connecting pixel regions in which the pixel value of each color component in the image data satisfies a predetermined condition, as a region including a color characterizing the object, when the shape and area of that region satisfy predetermined conditions.

The motion recognition system according to claim 4 or 5, wherein the condition on the pixel value of each color component used when extracting the region including the color characterizing the object is determined based on the extraction results, up to the current time, of the region including the color characterizing the object.

The motion recognition system according to claim 4 or 5, wherein the condition on the pixel value of each color component used when extracting the region including the color characterizing the object is updated based on the extraction results, up to the current time, of the region including the color characterizing the object.

The motion recognition system according to claim 1, further comprising shape analysis means for analyzing the shape of the target area extracted by the area integration means.

The motion recognition system according to claim 8, wherein the shape analysis means approximates the contour line of the target area with a plurality of straight lines whose lengths fall within a predetermined range, and recognizes the shape of the target area based on the inclination, length, and positional relationship of these straight lines.

A motion recognition system further comprising motion recognition means for recognizing the direction of movement of the target area by tracking, over time, the shape of the target area analyzed by the shape analysis means.

The motion recognition system according to claim 8, wherein the extraction of the target region in the region integration unit and the analysis of the shape in the shape analysis unit are performed on image data at different times.

The motion recognition system according to claim 1, wherein the object is a human hand.

In a recording medium recording a motion recognition program for recognizing the shape and motion of an object by processing time-series image data including an image of the specific object,
A process of extracting a moving region from the time-series image data;
A process of extracting a region including a color characterizing the object from the time-series image data;
A process of setting, as a candidate for the target region, the region in which (a) the region obtained by adding the region previously detected as the target region to the motion region detected by the process of extracting the moving region overlaps (b) the detected color region, when the area of that overlapping region exceeds a predetermined threshold;
A computer-readable recording medium characterized by recording a motion recognition program for causing a computer to execute the above processes.

The motion recognition system according to any one of claims 1 to 12, wherein the color detection means narrows down a region including a color based on a circularity of the region including a color.