JP3637226B2

JP3637226B2 - Motion detection method, motion detection device, and recording medium

Info

Publication number: JP3637226B2
Application number: JP02393999A
Authority: JP
Inventors: 康晋山内; 功雄三原; 明森下; 直子梅木; 美和子土井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-02-01
Filing date: 1999-02-01
Publication date: 2005-04-13
Anticipated expiration: 2019-02-01
Also published as: JP2000222585A

Description

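[0001]
[Technical field of the invention]
The present invention relates to a motion detection method that detects motion not only in the planar direction but also in the depth direction, and to a motion detection apparatus using that method.
[0002]
The present invention also relates to a motion recognition method that detects motion not only in the planar direction but also in the depth direction and recognizes that motion, and to a motion recognition apparatus using that method.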
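[0003]
[Prior art]
Conventionally, the following methods have been used to extract the motion of a recognition target with an imaging device such as a video camera.
[0004]
The first is a technique called optical flow. It focuses on a specific block image, measures in which direction within the plane a given image region of interest has moved between adjacent frames, and estimates that direction. A typical way to identify the movement direction of the target image in the next frame is to compute a similarity between temporally adjacent frames. For block images of the same size in the neighborhood of the target image region, the correlation coefficient with the block image of interest in the previous frame is computed, and the direction toward the block with the highest coefficient is taken as the estimated motion vector.
[0005]
This technique is widely used in robot vision, for example for tracking human faces. Because correlation coefficients are easy to compute in hardware, real-time tracking is possible by adding a dedicated accelerator board that computes them. When the block image of interest does not change greatly in two dimensions, the technique can extract planar-direction motion components with considerable accuracy. However, because the target image is a two-dimensional image acquired by a video camera or the like, three-dimensional motion information including the depth direction cannot be detected.
[0006]
As an extension of optical flow, sensors have also appeared that estimate motion vectors by extracting difference information between adjacent pixels in hardware. Because they yield motion components in real time, they are coming into use in entertainment fields such as games and in surveillance systems. In this case too, two-dimensional motion information at the pixel level is extracted, but motion information in the depth direction cannot be extracted.
[0007]
Besides detecting motion over the whole image, another approach is to track feature points in the image. Consider, for example, tracking the motion of the tip of a hand. First, from images of the hand captured by multiple cameras, the portion corresponding to the tip of the hand must be defined as a feature point and its three-dimensional position detected. To do this, the feature point (the tip of the hand) is extracted from the images acquired at the same time by each camera, and its three-dimensional spatial position is obtained from its screen coordinates by triangulation. Then, from the spatial positions obtained in time series, the three-dimensional motion of the feature point, here the tip of the hand, can be detected. However, this approach requires calibration beforehand, that is, adjustment of the various parameters of each camera, followed by cumbersome feature point extraction and correspondence matching, so it cannot be called a general-purpose technique.
[0008]
There is also a technique called motion capture, in which sensors are attached in advance to parts that characterize the motion, such as joints, the sensor locations are extracted from the captured images, and two-dimensional or three-dimensional motion is measured. Compared with the technique described above, feature point extraction and correspondence matching are lighter, but the cost of the overall system is high and there are many constraints on operating it. In addition, the user must wear cumbersome dedicated sensor devices, so the technique is far from usable by ordinary users.
[0009]
As described above, conventional methods for extracting three-dimensional motion, including depth information, from an image sequence have had various problems.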
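[0010]
[Problem to be solved by the invention]
As explained above, conventional techniques acquire the recognition target with a video camera or the like as an image that carries only two-dimensional information. The three-dimensional motion of the target therefore had to be extracted from two-dimensional information alone, and it was impossible to recognize three-dimensional motion including the depth direction with high accuracy.
[0011]
An object of the present invention is therefore to provide a motion detection method capable of detecting three-dimensional motion with high accuracy, and a motion detection apparatus using that method.
[0012]
A further object of the present invention is to provide a motion recognition method capable of recognizing three-dimensional motion with high accuracy, and a motion recognition apparatus using that method.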
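[0013]
[Means for solving the problem]
(1) The motion detection method of the present invention divides an acquired distance image into small regions of a predetermined size; detects, between successively acquired distance images, planar-direction motion for each pair of similar small regions; calculates depth information for each small region; and detects depth-direction motion between the similar small regions based on that depth information. Three-dimensional motion can thereby be detected with high accuracy.
[0014]
(2) The motion detection method of the present invention divides an acquired distance image into small regions of a predetermined size; calculates depth information for each small region; detects, between successively acquired distance images, depth-direction motion for each small region based on the depth information; and detects planar-direction motion between a source small region and a small region whose depth information has been corrected based on the detected depth-direction motion and which is similar to the source small region. Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be detected with higher accuracy.
[0015]
(3) The motion recognition method of the present invention divides an acquired distance image into small regions of a predetermined size; detects, between successively acquired distance images, planar-direction motion for each pair of similar small regions; calculates depth information for each small region; detects depth-direction motion between the similar small regions based on that depth information; and recognizes the motion from the planar-direction motion and the depth-direction motion. Three-dimensional motion can thereby be recognized with high accuracy.
[0016]
(4) The motion recognition method of the present invention divides an acquired distance image into small regions of a predetermined size; calculates depth information for each small region; detects, between successively acquired distance images, depth-direction motion for each small region based on the depth information; detects planar-direction motion between a source small region and a small region whose depth information has been corrected based on the detected depth-direction motion and which is similar to the source small region; and recognizes the motion from the planar-direction motion and the depth-direction motion. Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be recognized with higher accuracy.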
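[0017]
(5) The motion detection apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
first detection means for detecting, between distance images successively acquired by the image acquisition means, planar-direction motion for each pair of similar small regions;
calculation means for calculating depth information for each small region; and
second detection means for detecting depth-direction motion between the similar small regions based on the depth information calculated by the calculation means.
Three-dimensional motion can thereby be detected with high accuracy.
[0018]
(6) The motion detection apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
calculation means for calculating depth information for each small region;
first detection means for detecting, between distance images successively acquired by the image acquisition means, depth-direction motion for each small region based on the depth information;
correction means for correcting the depth information based on the depth-direction motion detected by the first detection means; and
second detection means for detecting planar-direction motion between a source small region and a small region whose depth information has been corrected by the correction means and which is similar to the source small region. Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be detected with higher accuracy.
[0019]
(7) The motion recognition apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
first detection means for detecting, between distance images successively acquired by the image acquisition means, planar-direction motion for each pair of similar small regions;
calculation means for calculating depth information for each small region;
second detection means for detecting depth-direction motion between the similar small regions based on the depth information calculated by the calculation means; and
recognition means for recognizing the motion from the planar-direction motion and the depth-direction motion detected between the similar small regions.
Three-dimensional motion can thereby be recognized with high accuracy.
[0020]
(8) The motion recognition apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
calculation means for calculating depth information for each small region;
first detection means for detecting, between distance images successively acquired by the image acquisition means, depth-direction motion for each small region based on the depth information;
correction means for correcting the depth information based on the depth-direction motion detected by the first detection means;
second detection means for detecting planar-direction motion between a source small region and a small region whose depth information has been corrected by the correction means and which is similar to the source small region; and
recognition means for recognizing the motion from the planar-direction motion and the depth-direction motion detected between the similar small regions.
Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be recognized with higher accuracy.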
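[0021]
[Embodiments of the invention]
Embodiments of the present invention are described below with reference to the drawings.
(First embodiment)
FIG. 1 is an overall configuration diagram of an image recognition apparatus according to the first embodiment of the present invention.
[0022]
The image recognition apparatus of this embodiment comprises: an image acquisition unit 1 having imaging means for acquiring a distance image stream; an image storage unit 2 that stores distance images acquired by the image acquisition unit 1; a block division unit 3 that divides the distance image stored in the image storage unit 2 into small regions (block images) of a predetermined size serving as the unit of motion detection; a block division unit 4 that divides the distance image acquired by the image acquisition unit 1 into small regions (block images) of a predetermined size serving as the unit of motion detection; a planar-direction motion detection unit 5 that detects, for each block image, planar-direction motion (a planar-direction motion vector) between the distance image stored in the image storage unit 2 (the sample distance image) and the distance image acquired by the image acquisition unit 1 (the latest distance image); a distance value calculation unit 6 that calculates the depth information (distance value) of a block image; a depth-direction motion detection unit 7 that detects depth-direction motion (depth-direction motion information, more specifically a depth-direction motion vector) from the distance values, calculated by the distance value calculation unit 6, of the source block image and the destination block image estimated from the planar-direction motion (planar-direction motion information, more specifically the planar-direction motion vector) detected by the planar-direction motion detection unit 5; a motion recognition unit 8 that recognizes the motion by referring to the planar-direction motion information detected by the planar-direction motion detection unit 5, the depth-direction motion information detected by the depth-direction motion detection unit 7, and a template 9; and the template 9, in which the motions to be recognized are registered.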
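[0023]
First, the image acquisition unit 1 and the distance image are described.
[0024]
The image acquisition unit 1 acquires, at predetermined intervals (for example, every 1/30 second), an image of the recognition target object (for example, a human hand, face, or whole body) whose values are depth values reflecting the three-dimensional shape of the object (hereinafter called a distance image). It can be realized using, for example, the image acquisition method of Japanese Patent Application No. Hei 8-274949.
[0025]
Because a distance image is acquired at every predetermined interval, holding these images sequentially in memory or the like, inside or outside the image acquisition unit 1, yields a moving image of the object made up of distance images (hereinafter called a distance image stream). With an acquisition interval of t seconds, the distance image stream is obtained as a set of multiple frames: the latest distance image, the distance image from t seconds before the latest (hereinafter, one frame before), the distance image from 2t seconds before the latest (two frames before, and so on), and so forth.
[0026]
As shown in FIG. 2, the image acquisition unit 1 consists mainly of a light emitting unit 101, a light receiving unit 103, a reflected light extraction unit 102, and a timing signal generation unit 104.
[0027]
The light emitting unit 101 emits light whose intensity varies over time in accordance with the timing signal generated by the timing signal generation unit 104. This light illuminates the target object in front of the light emitting unit.
[0028]
The light receiving unit 103 detects the amount of light emitted by the light emitting unit 101 that is reflected by the target object.
[0029]
The reflected light extraction unit 102 extracts the spatial intensity distribution of the reflected light received by the light receiving unit 103. Because this spatial intensity distribution can be treated as an image, it is called a reflected light image or a distance image.
[0030]
The light receiving unit 103 generally receives not only the light from the light emitting unit 101 reflected by the object but also ambient light such as illumination and sunlight. The reflected light extraction unit 102 therefore takes the difference between the amount of light received while the light emitting unit 101 is emitting and the amount received while it is not, and thereby extracts only the component of light from the light emitting unit 101 that is reflected by the target object.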
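The ambient-light removal described in paragraph [0030] can be sketched as a per-pixel subtraction. This is a minimal illustration, not the circuit of the reflected light extraction unit 102: the function name `extract_reflected`, the list-of-lists frame layout, and the clamping of negative differences to zero are all assumptions made for the example.

```python
def extract_reflected(lit_frame, unlit_frame):
    """Subtract the emitter-off frame from the emitter-on frame, pixel by pixel.

    Both frames are lists of rows of received-light amounts. Clamping negative
    differences to 0 is an assumption of this sketch; the text only says that
    the difference of the two amounts is taken.
    """
    return [
        [max(on - off, 0) for on, off in zip(row_on, row_off)]
        for row_on, row_off in zip(lit_frame, unlit_frame)
    ]


# Hypothetical 2x2 frames: the ambient light is identical in both captures.
lit = [[120, 200], [90, 60]]
unlit = [[100, 100], [90, 70]]
reflected = extract_reflected(lit, unlit)
print(reflected)
```

Only the light that originates from the emitter survives the subtraction, which is why the resulting image can be related to distance.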
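[0031]
The reflected light extraction unit 102 extracts, from the reflected light received by the light receiving unit 103, its intensity distribution, that is, a reflected light image (distance image) such as that shown in FIG. 3.
[0032]
For simplicity, FIG. 3 shows an 8 × 8 pixel reflected light image that is part of a 256 × 256 pixel reflected light image.
[0033]
The reflected light from an object decreases sharply as the distance to the object increases. When the surface of the object scatters light uniformly, the amount of light received per pixel of the reflected light image decreases in inverse proportion to the square of the distance to the object.
[0034]
Each pixel value of the reflected light image represents the amount of reflected light received by the unit light-receiving element corresponding to that pixel. Reflected light is affected by the properties of the object (whether it reflects specularly, scatters, or absorbs light), the orientation of the object, the distance to the object, and so on. When the whole object scatters light uniformly, however, the amount of reflected light is closely related to the distance to the object. A hand has this property, so the reflected light image obtained when a hand is held out in front of the image acquisition unit 1 reflects the distance to the hand, the tilt of the hand (parts of it lie at different distances), and so on, giving a three-dimensional image such as that shown in FIG. 4.
[0035]
FIG. 5 shows an example of the external appearance of the light emitting unit 101 and the light receiving unit 103 that make up the image acquisition unit 1, as described, for example, in Japanese Patent Application No. Hei 9-299648. The light receiving unit 103, consisting of a circular lens and an area sensor behind it (not shown), is placed at the center. Around the circular lens, along its outline, several (for example, six) light emitting units 101, each consisting of an LED that emits light such as infrared light, are arranged at equal intervals.
[0036]
Light emitted from the light emitting units 101 is reflected by the object, collected by the lens of the light receiving unit 103, and received by the area sensor behind the lens. The area sensor is, for example, a sensor arranged as a 256 × 256 matrix, and the intensity of the reflected light received by each element of the matrix becomes the corresponding pixel value. The image acquired in this way is the distance image, that is, the intensity distribution of the reflected light shown in FIG. 3.
[0037]
FIG. 3 shows part of the distance image data (8 × 8 pixels out of 256 × 256 pixels). In this example, the value of each cell in the matrix (the pixel value) expresses the intensity of the acquired reflected light in 8 bits, that is, 256 levels. For example, a cell with the value 255 is at the position closest to the image acquisition unit 1, and a cell with the value 0 is so far from the image acquisition unit 1 that the reflected light does not reach it.
[0038]
FIG. 4 shows the whole of the matrix-form distance image data of FIG. 3 in three dimensions. This example shows distance image data of a human hand.
[0039]
FIG. 6 shows an example of a distance image of a hand acquired by the image acquisition unit 1. The distance image is a three-dimensional image with depth information, for example 64 pixels in the x-axis (horizontal) direction, 64 pixels in the y-axis (vertical) direction, and 256 levels in the z-axis (depth) direction. FIG. 6 renders the distance values of the distance image, that is, the levels along the z axis, in grayscale. In this rendering, the closer a color is to black, the nearer the point is to the image acquisition unit 1, and the closer it is to white, the farther away. Completely white areas indicate that there is no image there, or that anything present is so distant that it is effectively absent.
[0040]
The intensity of the reflected light from an object decreases in inverse proportion to the square of the distance d to the object. That is, letting Q(i, j) be the pixel value of pixel (i, j) in the distance image,
Q(i, j) = K / d^2 … (1)
holds.
[0041]
Here, K is a coefficient adjusted so that, for example, the value of Q(i, j) is 255 when d = 0.5 m. Solving the above equation for d gives the distance value.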
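Solving equation (1) for d gives d = sqrt(K / Q(i, j)). The sketch below works this through for the calibration example in the text (pixel value 255 at d = 0.5 m), which fixes K = 255 × 0.5² = 63.75. The function name `pixel_to_distance` and the choice to return infinity for a zero pixel value are assumptions of this example.

```python
import math

D_REF = 0.5                # metres: distance at which the pixel value is Q_REF (from the text)
Q_REF = 255                # pixel value observed at D_REF
K = Q_REF * D_REF ** 2     # coefficient of equation (1): 63.75


def pixel_to_distance(q):
    """Invert Q = K / d^2 to recover the distance d in metres.

    A pixel value of 0 means no reflected light reached the sensor, so the
    distance is reported as infinity (an assumption of this sketch).
    """
    if q <= 0:
        return math.inf
    return math.sqrt(K / q)


print(pixel_to_distance(255))    # the calibration point, 0.5 m
print(pixel_to_distance(63.75))  # a quarter of the intensity, twice the distance
```

Because intensity falls with the square of the distance, halving the distance resolution near the sensor costs far more pixel levels than it does far away, which is worth keeping in mind when comparing raw pixel differences at different ranges.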
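[0042]
Next, each component of the motion recognition apparatus of FIG. 1 is described in more detail.
[0043]
The image storage unit 2 stores, out of the distance images contained in the distance image stream acquired by the image acquisition unit 1, the distance image that is always a few frames before the latest one (for example, always one frame before). This image is hereinafter called the sample distance image.
[0044]
How many frames back the sample distance image should be is decided from information such as the distance image acquisition interval (frame rate) of the image acquisition unit 1 and the speed of the object's motion. For example, if N frames can be acquired during a series of movements in which the two-dimensional projected image of the object does not change, the sample image may be chosen freely from among the distance images 1 to N frames before the latest.
[0045]
The block division unit 3 divides the distance image stored in the image storage unit 2 (the sample distance image) into block images, which are the unit of motion detection.
[0046]
The block division unit 4 divides the distance image newly acquired by the image acquisition unit 1 (the latest distance image) into block images, which are the unit of motion detection.
[0047]
Here, the acquired distance image is divided into block images of equal size. For example, if the frame size of the distance image is 64 pixels in the x-axis (horizontal) direction and 64 pixels in the y-axis (vertical) direction, dividing it into eight equal parts in each of the x and y directions gives block images of 8 pixels in the x direction by 8 pixels in the y direction, that is, 8 × 8 = 64 pixels each.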
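The equal-size block division of paragraph [0047] can be sketched as follows. The function name `split_into_blocks`, the dictionary keyed by block coordinates (x block, y block), and the synthetic test frame are assumptions of this example, not structures named in the source.

```python
def split_into_blocks(image, block_h, block_w):
    """Divide a 2D image (list of rows) into equal-sized, non-overlapping blocks.

    Returns a dict mapping (block_x, block_y) to the block's rows.
    """
    rows, cols = len(image), len(image[0])
    if rows % block_h or cols % block_w:
        raise ValueError("frame size must be a multiple of the block size")
    blocks = {}
    for by in range(rows // block_h):
        for bx in range(cols // block_w):
            blocks[(bx, by)] = [
                row[bx * block_w:(bx + 1) * block_w]
                for row in image[by * block_h:(by + 1) * block_h]
            ]
    return blocks


# Synthetic 64 x 64 frame divided into an 8 x 8 grid of 8 x 8-pixel blocks.
frame = [[(x + y) % 256 for x in range(64)] for y in range(64)]
blocks = split_into_blocks(frame, 8, 8)
print(len(blocks))  # number of blocks in the frame
```

Keying blocks by their grid position makes the later steps (neighbor search and motion vectors in block units) straightforward to express.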
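[0048]
FIG. 7 shows the distance image of FIG. 6 divided into an 8 × 8 grid of block images.
[0049]
Next, how the planar-direction motion detection unit 5 actually detects planar-direction motion information from the sample distance image stored in the image storage unit 2 and the latest distance image newly acquired by the image acquisition unit 1 is described with reference to the flowchart of FIG. 8.
[0050]
First, among the block images of the sample distance image divided by the block division unit 3, the block image for which motion information is to be calculated (the target block) is set (step S1).
[0051]
The similarity between block images of the latest distance image divided by the block division unit 4 (search blocks) and the target block set in step S1 is calculated in order to search for the destination block to which the object is estimated to have moved (step S2).
[0052]
If the block division units 3 and 4 divide with the same block size, then for one target block in the sample distance image, up to (x-direction frame size / x-direction block size) × (y-direction frame size / y-direction block size) search blocks can be taken on the latest distance image by shifting one block at a time in the x and y directions. In practice, however, frames that are close in time are very highly correlated, so it is often sufficient to search only the range offset by one block in each of the x and y directions from the target block.
[0053]
Here, as shown in FIG. 9, both the latest distance image G2 and the sample distance image G1 from one frame before are divided into block images of a fixed size, and the similarity between block images is calculated. For example, when a distance image with a frame size of 64 pixels in the x direction and 64 pixels in the y direction is divided into eight parts in each of the x and y directions, the block size in both directions is 64 / 8 = 8 pixels. The search range on the latest distance image G2 is the block images b11, b12, b13, b21, b23, b31, b32, and b33, which are offset by one block vertically and horizontally from the target block a22 in the sample distance image G1.
[0054]
The search blocks in the latest distance image need not be chosen, as they are here, so that they do not overlap one another. Equal-sized block images can also be taken at arbitrary positions by shifting one pixel at a time in the x and y directions around the block image in the latest distance image that lies at the same position as the target block a22, up to (x-direction frame size − x-direction block size) × (y-direction frame size − y-direction block size) of them.
[0055]
The similarity C_zw-xy between a search block on the latest distance image (whose position, measured in blocks, is the x-th block in the x direction and the y-th block in the y direction, written (x, y)) and the target block on the sample distance image (whose position, measured in blocks, is the z-th block in the x direction and the w-th block in the y direction, written (z, w)) can be obtained, for example, from the following equation (2).
[0056]
[Math. 1]

[0057]
Using equation (2), the similarity between the target block (for example, target block a22 in FIG. 9) and each of the surrounding search blocks on the latest distance image (for example, search blocks b11, b12, b13, b21, b23, b31, b32, and b33 in FIG. 9) is calculated (step S3). In FIG. 9, calculating with equation (2) the similarity between the target block a22 in the sample distance image G1 and the search blocks around its position in the latest distance image G2 gives 0.1 for search block b22, 0.2 for search block b32, 0.9 for search block b33, and 0 for the other search blocks.
[0058]
When all search blocks have been processed, the planar-direction motion detection unit 5 moves on to the planar-direction motion vector calculation (step S4).
[0059]
In the planar-direction motion vector calculation, the block image most similar to the target block is extracted from the similarities calculated above between the target block on the sample distance image and the search blocks on the latest distance image.
[0060]
When the similarity is calculated with equation (2), the search block with the highest value is extracted. In FIG. 9, for example, the search block in the latest distance image most similar to the target block a22 in the sample distance image is b33, so the vector from the target block a22 (here, an end point or the center of a22) to the search block b33 (here, an end point or the center of b33) is the final planar-direction motion vector.
[0061]
For example, if the target block is at position (2, 2) measured in blocks (that is, the second block in the x direction and the second block in the y direction) and the search block with the highest similarity is at position (3, 3), the estimated block displacement in the plane is (3, 3) − (2, 2) = (1, 1). The final planar-direction motion vector is therefore 1 × (x-direction block size) in the x direction and 1 × (y-direction block size) in the y direction.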
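Steps S1 to S4 can be sketched as a neighborhood search for the most similar block. Equation (2) is not reproduced in this text, so the `similarity` function below uses a plain normalized correlation as an assumed stand-in, not the patent's formula. The names `planar_motion` and `radius`, and the tiny 2 × 2 blocks, are likewise hypothetical.

```python
import math


def similarity(block_a, block_b):
    """Normalized correlation of two equal-sized blocks (assumed stand-in for eq. (2))."""
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    denom = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if denom == 0:
        return 0.0  # an empty block is treated as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / denom


def planar_motion(sample_blocks, latest_blocks, target, block_size, radius=1):
    """Return (motion vector in pixels, destination block) for one target block.

    Search blocks are those within `radius` blocks of the target position on
    the latest image; the most similar one is taken as the destination.
    """
    tx, ty = target
    best, best_score = target, -1.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            pos = (tx + dx, ty + dy)
            if pos not in latest_blocks:
                continue  # outside the frame
            score = similarity(sample_blocks[target], latest_blocks[pos])
            if score > best_score:
                best, best_score = pos, score
    vector = ((best[0] - tx) * block_size, (best[1] - ty) * block_size)
    return vector, best


# Toy data: the pattern at block (2, 2) of the sample image reappears at (3, 3).
empty = [[0, 0], [0, 0]]
sample_blocks = {(2, 2): [[9, 1], [1, 9]]}
latest_blocks = {(x, y): empty for x in (1, 2, 3) for y in (1, 2, 3)}
latest_blocks[(2, 2)] = [[1, 0], [0, 0]]
latest_blocks[(3, 3)] = [[9, 1], [1, 9]]
vector, destination = planar_motion(sample_blocks, latest_blocks, (2, 2), block_size=2)
print(vector, destination)
```

With the block displacement (1, 1) and a block size of 2 pixels, the motion vector is (2, 2) pixels, mirroring the (3, 3) − (2, 2) = (1, 1) example in paragraph [0061].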
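[0062]
FIG. 9 illustrates the similarities calculated with equation (2) for the search blocks within one block above, below, left, and right of the target block, together with the planar-direction motion vector finally detected.
[0063]
Steps S1 to S4 are repeated with every block image on the sample distance image as the target block, and the planar-direction motion vectors are calculated (step S5).
[0064]
The distance value calculation unit 6 obtains the distance value in the z-axis (depth) direction of a block image of a distance image. This corresponds to obtaining depth information that represents the distance to the object captured in that block image.
[0065]
FIG. 10 conceptually shows the case where the distance value of a block image is obtained by averaging all the pixel values that make up the block image. For example, the distance value D_xy of the block image at coordinates (x, y) can be obtained from the following equation (3) by averaging all the pixel values in that block image. In FIG. 10, the discrete pixel values (here corresponding to distance values) of two adjacent block images in one frame (a first block image and a second block image) are drawn as a continuous smooth curve.
[Math. 2]

[0066]
Equation (3) may be used simply to take the average of the pixel values in the block image as the distance value as it is. Alternatively, when applying equation (3), the distance value d obtained for each pixel in the block image with, for example, equation (1) may be used as F_xy, or the distance value d may be obtained with equation (1) from the average pixel value given by equation (3). For simplicity, the result is called the distance value of the block image in all of these cases.
[0067]
The distance value of a block image need not be calculated by the averaging described here. It is also possible to define a rectangular solid whose base is the block image on the xy plane and use the z-axis height at which its volume equals the sum of the distance values making up the block image, or simply to use the median of the distance values making up the block image.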
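The block distance value of paragraphs [0065] to [0067] can be sketched as below. Equation (3) is not reproduced in this text; from the description it is read here as the arithmetic mean of the block's pixel values, which is an interpretation rather than a quotation. The function name `block_distance_value` and the `method` parameter are assumptions of the example.

```python
from statistics import mean, median


def block_distance_value(block, method="mean"):
    """Reduce a block image (list of rows) to a single representative distance value.

    "mean" follows the averaging described for equation (3); "median" is the
    alternative mentioned in paragraph [0067].
    """
    pixels = [v for row in block for v in row]
    if method == "mean":
        return mean(pixels)
    if method == "median":
        return median(pixels)
    raise ValueError(f"unknown method: {method}")


block = [[10, 20], [30, 100]]
print(block_distance_value(block))            # average of the four pixels
print(block_distance_value(block, "median"))  # less sensitive to the outlier 100
```

The median variant illustrates why the text offers alternatives: a single bright pixel pulls the mean to 40 but leaves the median at 25.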
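[0068]
For every block image making up the distance image, the depth-direction motion detection unit 7 calculates depth-direction motion information by subtracting the distance value of the source block image on the sample distance image from the distance value of the block image on the latest distance image that is estimated to be the destination according to the planar-direction motion vector detected by the planar-direction motion detection unit 5.
[0069]
FIG. 11 shows how the depth-direction motion detection unit 7 calculates depth-direction motion information from the distance values of the source block image on the sample distance image and the destination block image on the latest distance image. The depth-direction motion information calculated here is finally output as the depth-direction motion vector, whose direction is the depth direction and whose magnitude is the difference between the distance values of the source and destination block images.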
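The depth-direction step of the first embodiment reduces to one subtraction per block once the destination block is known. The sketch below assumes block distance values are already available as dictionaries keyed by block position; the names `depth_motion` and `motion_3d` are hypothetical.

```python
def depth_motion(sample_values, latest_values, source, destination):
    """Depth-direction motion: destination distance value minus source distance value."""
    return latest_values[destination] - sample_values[source]


def motion_3d(planar_vector, dz):
    """Combine a planar-direction vector (dx, dy) with the depth component dz."""
    return (planar_vector[0], planar_vector[1], dz)


# Hypothetical distance values for the blocks of the FIG. 9 style example.
sample_values = {(2, 2): 120.0}
latest_values = {(3, 3): 150.0}
dz = depth_motion(sample_values, latest_values, source=(2, 2), destination=(3, 3))
print(motion_3d((8, 8), dz))
```

Whether a positive value means the object approached or receded depends on which form of distance value is used: with raw pixel values a larger value means nearer, while with metric distances from equation (1) it means farther.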
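[0070]
As described above, not only planar-direction motion information but also depth-direction motion information can be detected from the distance image stream acquired by the image acquisition unit 1.
[0071]
If the block division in the block division unit 3 is made finer, that is, if each frame is divided into more blocks, more detailed motion information can be detected. Conversely, if the division is made coarser, motion of larger imaged objects can be detected quickly at the object level.
[0072]
The motion recognition unit 8 is described later.
(Second embodiment)
In the first embodiment, planar-direction motion information is obtained first and depth-direction motion information afterwards. Strictly speaking, however, the object moves in the depth direction as well as in the plane, so the planar-direction motion information obtained first is affected by the projection of the depth-direction motion onto the plane.
[0073]
To solve this problem, the second embodiment first obtains depth-direction motion information for all block images. At this point it is not known which block image on the latest distance image corresponds to a given block image on the sample distance image, so depth-direction motion information is obtained for several block images on the latest distance image near the position corresponding to the target block on the sample distance image. Next, this depth-direction motion information is used to correct the depth information (that is, the distance value) of the block images, and planar-direction motion information is obtained for the corrected block images of the latest distance image. In this way, accurate planar-direction motion information free of the depth-direction motion can be obtained, and the depth-direction motion information can then be determined from it.
[0074]
FIG. 12 is an overall configuration diagram of an image recognition apparatus according to the second embodiment of the present invention.
[0075]
The image recognition apparatus of this embodiment comprises: an image acquisition unit 1 having imaging means for acquiring a distance image stream; an image storage unit 12 that stores distance images acquired by the image acquisition unit 1; a block division unit 13 that divides the distance image stored in the image storage unit 12 into small regions (block images) of a predetermined size serving as the unit of motion detection; a distance value calculation unit 14 that calculates the depth information (distance value) of the block images divided by the block division unit 13; a block division unit 15 that divides the distance image acquired by the image acquisition unit 1 into small regions (block images) of a predetermined size serving as the unit of motion detection; a distance value calculation unit 16 that calculates the depth information (distance value) of the block images divided by the block division unit 15; a depth-direction motion detection unit 17 that detects, for each block image, depth-direction motion (a depth-direction motion vector) between the distance image stored in the image storage unit 12 (the sample distance image) and the distance image acquired by the image acquisition unit 1 (the latest distance image); a depth information correction unit 18 that corrects the depth information of the block images of the distance image acquired by the image acquisition unit 1 based on the depth-direction motion information detected by the depth-direction motion detection unit 17; a planar-direction motion detection unit 19 that detects planar-direction motion (a planar-direction motion vector) from the depth-corrected block images of the latest distance image and the block images of the sample distance image; a motion recognition unit 8 that recognizes the motion by referring to the planar-direction motion vector detected by the planar-direction motion detection unit 19, the depth-direction motion vector detected by the depth-direction motion detection unit 17, and a template 9; and the template 9, in which the motions to be recognized are registered.
[0076]
The image acquisition unit 1, the image storage unit 12, the block division units 13 and 15, and the distance value calculation unit 14 are the same as the image acquisition unit 1, the image storage unit 2, the block division units 3 and 4, and the distance value calculation unit 6 of FIG. 1, so their description is omitted and only the differences are described.
[0077]
The distinguishing feature of the motion recognition apparatus shown in FIG. 12 is that the depth information (distance value) of each block image is calculated in advance by the distance value calculation units 14 and 16, for both the latest distance image acquired by the image acquisition unit 1 and the sample distance image stored in the image storage unit 12.
[0078]
The depth-direction motion detection unit 17 detects the depth-direction motion of block images on the latest distance image relative to the block image on the sample distance image that is the target of motion detection (the target block); that is, it calculates depth-direction motion information.
[0079]
The depth-direction motion detection processing in the depth-direction motion detection unit 17 is described with reference to the flowchart of FIG. 13.
[0080]
First, among the block images on the sample distance image, the block image for which motion information is to be calculated (the target block) is set (step S21).
[0081]
Depth-direction motion information is calculated by comparing the distance value of a block image on the latest distance image acquired by the image acquisition unit 1 that is to be searched for motion information (a search block) with the distance value of the target block on the sample distance image set in step S21 (step S22). The search blocks, that is, the block images on the latest distance image that may be the destination of the target block on the sample distance image, are selected in the same way as in the first embodiment.
[0082]
For example, subtracting the distance value of the target block on the sample distance image from the distance value of a search block gives the depth-direction motion information of the target block for that search block.
[0083]
At this stage it is not clear to which search block on the latest distance image the target block on the sample distance image has moved, but the depth-direction motion information obtained here for each search block is what ultimately becomes the depth-direction motion vector.
[0084]
The processing of step S22 is performed for all search blocks on the latest distance image (step S23). When step S22 has been completed for all search blocks, processing returns to step S21, another block image on the sample distance image is set as the target block, and step S22 is performed for all block images on the sample distance image (step S24).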
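[0085]
The depth information correction unit 18 transforms the block images on the latest distance image based on the depth-direction motion information obtained for each search block by the depth-direction motion detection unit 17, correcting the depth-direction motion component so that the planar-direction motion detection unit 19 can extract the planar-direction motion component accurately.
[0086]
The depth information correction processing in the depth information correction unit 18 is described with reference to the flowchart of FIG. 14.
[0087]
First, among the block images on the sample distance image, the block image for which motion information is to be calculated (the target block) is set (step S31). For each block image on the latest distance image that was searched for motion information by the depth-direction motion detection unit 17, a transformation that corrects the distance values is applied to the whole block image, based on the depth-direction motion information obtained at that search block position (step S32).
[0088]
In general, when the target object moves parallel to the depth direction, that motion appears in the distance image of the object as a change in distance value, that is, a change in shading. Consequently, even if the correlation between block images of the latest distance image and the sample distance image, which were acquired at different times, is taken as in the planar-direction motion detection unit 5 of the first embodiment, no similarity may be found between them, and in that case the planar-direction motion component cannot be detected.
[0089]
In the second embodiment, therefore, the depth information correction unit 18 applies a depth correction to the latest distance image so that this depth-dependent component is removed from the similarity calculation between block images in the planar-direction motion detection unit 19.
[0090]
Here, the image of each search block on the latest distance image is transformed so that its distance value equals the distance value of the target block. The simplest way to change only the distance value without altering the pixel pattern of the search block is to shift the pixel values of all pixels in the block along the z axis by the depth-direction motion information of that block image.
[0091]
For example, the pixel values in the block image at position (x, y) on the latest distance image, measured in blocks as the x-th block in the x direction and the y-th block in the y direction, can be corrected in the depth direction with the following equation (4).
[0092]
[Math. 3]

[0093]
FIG. 15 shows how the depth information correction unit 18 raises a block image on the latest distance image in the z-axis (depth) direction by the depth-direction motion information detected for that block image, so that the distance value of the block image matches the distance value of the target block image on the sample distance image.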
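The correction of paragraphs [0090] to [0093] can be sketched as a uniform shift of the block's pixel values. Equation (4) is not reproduced in this text; the sketch reads it as subtracting the block's depth-direction motion (search block distance value minus target block distance value) from every pixel, which is an interpretation of the description. The function name `correct_block_depth` and the use of the mean as the distance value are assumptions.

```python
def correct_block_depth(search_block, target_value):
    """Shift every pixel of a search block so its mean equals the target block's value.

    Returns (corrected block, shift), where shift is the depth-direction motion
    of this block: its own distance value minus the target's distance value.
    The pixel pattern (relative differences between pixels) is left unchanged.
    """
    pixels = [v for row in search_block for v in row]
    shift = sum(pixels) / len(pixels) - target_value
    corrected = [[v - shift for v in row] for row in search_block]
    return corrected, shift


# Hypothetical search block whose mean (115) differs from the target's (100).
search_block = [[100, 110], [120, 130]]
corrected, shift = correct_block_depth(search_block, target_value=100)
print(shift)      # depth-direction motion removed from the block
print(corrected)  # same pattern, now centered on the target's distance value
```

After this shift, a similarity measure applied to the corrected block and the target block compares only their shapes, not their overall depth offset, which is the stated purpose of the correction.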
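[0094]
The depth information correction processing above is performed for all search blocks on the latest distance image (step S33). When step S32 has been completed for all search blocks, processing returns to step S31, another block image on the sample distance image is set as the target block, and step S32 is performed for all block images on the sample distance image (step S34).
[0095]
Next, the planar-direction motion detection unit 19 detects the planar-direction motion information of the target block from the image of the target block on the sample distance image and the transformed distance image obtained by correcting the distance values in the depth information correction unit 18.
[0096]
The processing flow in the planar-direction motion detection unit 19 is the same as that described for the planar-direction motion detection unit 5 of the first embodiment and the flowchart of FIG. 8, with the search blocks replaced by the block images corrected by the depth information correction unit 18.
[0097]
Finally, for each target block on the sample distance image, the relative vector to the block with the highest correlation found by the planar-direction motion detection unit 19 is detected as the planar-direction motion vector, and the depth-direction motion information detected beforehand by the depth-direction motion detection unit 17 for that destination block (the block with the highest correlation) is detected as the depth-direction motion vector.
[0098]
In the configuration of the first embodiment, depth-direction motion information needs to be detected only once, so depth-direction motion can be detected quickly, but for objects with a large depth-direction motion component the planar-direction motion component may be missed. In the configuration of the second embodiment, by contrast, the depth information of the latest distance image is corrected based on the depth-direction motion information, and planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected. The three-dimensional motion information of an object having a depth-direction motion component can therefore be detected with higher accuracy.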
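(Display of motion detection results in the first and second embodiments, and description of the motion recognition unit 8)
FIGS. 17 and 18 show display examples in which the three-dimensional motion information (planar-direction motion vectors and depth-direction motion vectors) detected by the planar-direction motion detection unit 5 and the depth-direction motion detection unit 7 of the first embodiment, or by the depth-direction motion detection unit 17 and the planar-direction motion detection unit 19 of the second embodiment, is expressed as three-dimensional vectors and shown on a display device.
[0099]
Here, for example, a distance image of 64 × 64 pixels is divided into eight equal parts vertically and eight equal parts horizontally, and the motion vector of each block image is displayed three-dimensionally as an arrow.
[0100]
Displaying a motion vector three-dimensionally means, for example, classifying the movement distance into three levels, 0 to 5 cm, 5 to 10 cm, and 10 to 15 cm, as shown in FIG. 19, and drawing the arrows in a different color or line type for each level according to the depth-direction motion information.
[0101]
In the screen display examples of FIGS. 17 and 18, the scales at the bottom of the screen show the bias of the motion components along each of the X, Y, and Z axes for the whole frame, obtained by combining the motion vectors of the individual block images. In FIG. 17, for example, the motion scale for the X axis is biased to the right, which shows that the motion vectors as a whole point to the right, and this can be interpreted as the imaged hand moving to the right. Likewise, by observing the Y-axis component and the depth (Z-axis) component of the motion vectors, the overall motion of the imaged object in the Y-axis direction or the depth (Z-axis) direction can also be detected.
[0102]
In this way, it is possible not only to detect motion information at specific places in the frame image, that is, in each block image, but also to obtain motion information for a region by combining the motion vectors of the blocks. This can be used for higher-level motion detection such as recognizing the motion of an object.
[0103]
For example, the motion recognition unit 8 of FIGS. 1 and 12 combines the planar-direction and depth-direction motion vectors detected from the block images obtained by dividing a 64 × 64 pixel distance image into eight equal parts vertically and eight equal parts horizontally, and calculates a three-dimensional motion vector for each of the four block images obtained by dividing the image into two equal parts vertically and two equal parts horizontally.
[0104]
These motion vectors are matched against the template 9, such as that shown in FIG. 16, to recognize what kind of motion has been detected. The template 9 is held in a predetermined memory.
[0105]
In the template 9 shown in FIG. 16, the three-dimensional motion vectors of the four block images corresponding to each motion to be recognized (for example, movement to the upper right, movement to the upper left, clockwise rotation, and counterclockwise rotation) are registered in advance.
[0106]
The motion recognition unit 8 combines motion vectors such as those shown in FIG. 17 to calculate four three-dimensional motion vectors and, by referring to the template 9 such as that shown in FIG. 16, can recognize that the motion detected from this distance image is movement to the upper right. Likewise, by combining motion vectors such as those shown in FIG. 18 into four three-dimensional motion vectors and referring to the template 9 such as that shown in FIG. 16, it can recognize that the motion detected from this distance image is clockwise rotation.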
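The recognition step of paragraphs [0103] to [0106] can be sketched as merging the 8 × 8 grid of block vectors into four quadrant vectors and comparing them with registered templates. The source does not state the matching rule or the template contents, so the dot-product score, the template entries, and the names `merge_to_quadrants` and `recognize` are all assumptions of this example. Image coordinates are assumed to have y pointing down.

```python
def merge_to_quadrants(vectors, grid=8):
    """Sum per-block 3D vectors into four quadrant vectors keyed by (qx, qy)."""
    half = grid // 2
    sums = {(qx, qy): [0.0, 0.0, 0.0] for qx in (0, 1) for qy in (0, 1)}
    for (bx, by), vec in vectors.items():
        quadrant = sums[(bx // half, by // half)]
        for axis in range(3):
            quadrant[axis] += vec[axis]
    return {key: tuple(value) for key, value in sums.items()}


def recognize(quadrants, templates):
    """Return the name of the template with the highest summed dot product.

    The scoring rule is an assumption; templates are expected to hold
    unit-length direction vectors so that no template wins by magnitude alone.
    """
    def score(template):
        return sum(
            a * b
            for key, direction in template.items()
            for a, b in zip(quadrants[key], direction)
        )
    return max(templates, key=lambda name: score(templates[name]))


# Hypothetical templates: one direction vector per quadrant (x right, y down).
templates = {
    "move_right": {(0, 0): (1, 0, 0), (1, 0): (1, 0, 0),
                   (0, 1): (1, 0, 0), (1, 1): (1, 0, 0)},
    "rotate_clockwise": {(0, 0): (1, 0, 0), (1, 0): (0, 1, 0),
                         (1, 1): (-1, 0, 0), (0, 1): (0, -1, 0)},
}

# Every block moves 8 pixels to the right with no depth change.
vectors = {(bx, by): (8.0, 0.0, 0.0) for bx in range(8) for by in range(8)}
quadrants = merge_to_quadrants(vectors)
print(recognize(quadrants, templates))
```

A practical matcher would likely normalize the quadrant vectors and apply a minimum-score threshold so that frames with no clear motion are not forced into a category; both are omitted here to keep the sketch short.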
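[0107]
Each of the components above, except the imaging portion of the image acquisition unit 1, can also be implemented in software. That is, the procedures described above can be recorded on a machine-readable recording medium as a program that a computer can execute, and distributed.
[0108]
The present invention is not limited to the embodiments described above and can be carried out with various modifications within its technical scope.
[0109]
[Effect of the invention]
As described above, according to the present invention, three-dimensional motion can be detected with high accuracy. Three-dimensional motion can also be recognized with high accuracy.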
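[Brief description of the drawings]
[FIG. 1] A diagram schematically showing a configuration example of the image recognition apparatus according to the first embodiment of the present invention.
[FIG. 2] A schematic configuration diagram of the image acquisition unit.
[FIG. 3] A diagram showing a distance image in matrix form.
[FIG. 4] A diagram showing a distance image in three dimensions.
[FIG. 5] A diagram showing an example of the external appearance of the light emitting unit and the light receiving unit that make up the image acquisition unit.
[FIG. 6] A diagram showing a specific example of a distance image.
[FIG. 7] A diagram showing the distance image of FIG. 6 after division.
[FIG. 8] A flowchart for explaining the planar-direction motion detection processing.
[FIG. 9] A diagram for explaining the planar-direction motion detection processing in concrete terms.
[FIG. 10] A diagram for conceptually explaining the depth information (distance value) of a block image.
[FIG. 11] A diagram for explaining the depth-direction motion detection processing in concrete terms.
[FIG. 12] A diagram schematically showing a configuration example of the image recognition apparatus according to the second embodiment of the present invention.
[FIG. 13] A flowchart for explaining the depth-direction motion detection processing.
[FIG. 14] A flowchart for explaining the depth information correction processing.
[FIG. 15] A diagram for explaining the depth information correction processing in concrete terms.
[FIG. 16] A diagram showing an example of a template in which the motions to be recognized are registered.
[FIG. 17] A diagram showing a display example in which three-dimensional motion information (planar-direction motion vectors and depth-direction motion vectors) is expressed as three-dimensional vectors and shown on a display device.
[FIG. 18] A diagram showing a display example in which three-dimensional motion information (planar-direction motion vectors and depth-direction motion vectors) is expressed as three-dimensional vectors and shown on a display device.
[FIG. 19] A diagram showing an example of the three-dimensional display of vectors representing three-dimensional motion information (planar-direction motion vectors and depth-direction motion vectors).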
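[Explanation of reference numerals]
1 … image acquisition unit
2 … image storage unit
3, 4 … block division unit
5 … planar-direction motion detection unit
6 … distance value calculation unit
7 … depth-direction motion detection unit
8 … motion recognition unit
9 … template
12 … image storage unit
13 … block division unit
14 … distance value calculation unit
15 … block division unit
16 … distance value calculation unit
17 … depth-direction motion detection unit
18 … depth information correction unit
19 … planar-direction motion detection unit

[0001]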
[Technical field of the invention]
The present invention relates to a motion detection method for detecting motion in the depth direction as well as the planar direction, and a motion detection apparatus using the same.
[0002]
The present invention also relates to a motion recognition method that detects motion in the depth direction as well as in the plane direction and recognizes the motion, and a motion recognition apparatus using the motion recognition method.
[0003]
[Prior art]
Conventionally, when trying to extract the movement of a recognition object using an imaging device such as a video camera, the following method has been taken.
[0004]
The first is a technique called optical flow. In this method, attention is paid to a specific block image, and a direction in which a certain image area has moved in a plane between adjacent frames is measured and the direction thereof is estimated. A typical method for specifying the moving direction of the target image in the next frame is to calculate the similarity between adjacent frames in time series. A correlation coefficient with the block image of interest in the previous frame is calculated for block images of the same size in the vicinity of the target image area, and the direction to the block with the highest coefficient is estimated as a motion vector.
[0005]
This technique is widely used in robot vision, for example for tracking human faces. Because correlation coefficients are easy to compute in hardware, real-time tracking is possible by adding a dedicated accelerator board that computes them. When the block image of interest does not change greatly in two dimensions, the technique can extract planar-direction motion components with considerable accuracy. However, because the target image is a two-dimensional image acquired by a video camera or the like, three-dimensional motion information including the depth direction cannot be detected.
[0006]
In addition, as an example of development of the optical flow, a sensor that estimates a motion vector by extracting difference information between adjacent pixels in a hardware manner has also appeared. Since motion components can be taken in real time, they are being used in entertainment fields such as games or surveillance systems. Also in this case, two-dimensional motion information at the pixel level is extracted, but it is impossible to extract motion information in the depth direction.
[0007]
Besides detecting motion over the whole image, another approach is to track feature points in the image. Consider, for example, tracking the motion of the tip of a hand. First, from images of the hand captured by multiple cameras, the portion corresponding to the tip of the hand must be defined as a feature point and its three-dimensional position detected. To do this, the feature point (the tip of the hand) is extracted from the images acquired at the same time by each camera, and its three-dimensional spatial position is obtained from its screen coordinates by triangulation. Then, from the spatial positions obtained in time series, the three-dimensional motion of the feature point, here the tip of the hand, can be detected. However, this approach requires calibration beforehand, that is, adjustment of the various parameters of each camera, followed by cumbersome feature point extraction and correspondence matching, so it cannot be called a general-purpose technique.
[0008]
There is also a technique called motion capture in which a sensor is previously attached to a part such as a joint that characterizes the movement, the sensor part is extracted from the captured image, and two-dimensional or three-dimensional movement is measured. In this method, the feature point extraction and association processing is lighter than the method introduced above, but the cost of the entire system is high, and there are many restrictions on operating the system. Furthermore, it is necessary to wear a specific sensor device that is bothersome, and it is not very usable for general users.
[0009]
As described above, the conventional method has various problems in the method of extracting a three-dimensional motion including depth information from an image series.
[0010]
[Problem to be Solved by the Invention]
As described above, conventional methods acquire the recognition target with a video camera or the like as an image that carries only two-dimensional information. The three-dimensional motion of the target therefore had to be extracted from two-dimensional information alone, and it was impossible to recognize three-dimensional motion including the depth direction with high accuracy.
[0011]
Therefore, an object of the present invention is to provide a motion detection method and a motion detection apparatus using the same, which can detect a three-dimensional motion with high accuracy.
[0012]
It is another object of the present invention to provide a motion recognition method capable of recognizing three-dimensional motion with high accuracy and a motion recognition apparatus using the same.
[0013]
[Means for Solving the Problems]
(1) The motion detection method of the present invention divides an acquired distance image into small regions of a predetermined size; detects, between successively acquired distance images, planar-direction motion for each pair of similar small regions; calculates depth information for each small region; and detects depth-direction motion between the similar small regions based on that depth information. Three-dimensional motion can thereby be detected with high accuracy.
[0014]
(2) The motion detection method of the present invention divides an acquired distance image into small regions of a predetermined size; calculates depth information for each small region; detects, between successively acquired distance images, depth-direction motion for each small region based on the depth information; and detects planar-direction motion between a source small region and a small region whose depth information has been corrected based on the detected depth-direction motion and which is similar to the source small region. Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be detected with higher accuracy.
[0015]
(3) The motion recognition method of the present invention divides an acquired distance image into small regions of a predetermined size; detects, between successively acquired distance images, planar-direction motion for each pair of similar small regions; calculates depth information for each small region; detects depth-direction motion between the similar small regions based on that depth information; and recognizes the motion from the planar-direction motion and the depth-direction motion. Three-dimensional motion can thereby be recognized with high accuracy.
[0016]
(4) The motion recognition method of the present invention divides an acquired distance image into small regions of a predetermined size; calculates depth information for each small region; detects, between successively acquired distance images, depth-direction motion for each small region based on the depth information; detects planar-direction motion between a source small region and a small region whose depth information has been corrected based on the detected depth-direction motion and which is similar to the source small region; and recognizes the motion from the planar-direction motion and the depth-direction motion. Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be recognized with higher accuracy.
[0017]
(5) The motion detection apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
first detection means for detecting, between distance images successively acquired by the image acquisition means, planar-direction motion for each pair of similar small regions;
calculation means for calculating depth information for each small region; and
second detection means for detecting depth-direction motion between the similar small regions based on the depth information calculated by the calculation means.
Three-dimensional motion can thereby be detected with high accuracy.
[0018]
(6) The motion detection apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
calculation means for calculating depth information for each small region;
first detection means for detecting, between distance images successively acquired by the image acquisition means, depth-direction motion for each small region based on the depth information;
correction means for correcting the depth information based on the depth-direction motion detected by the first detection means; and
second detection means for detecting planar-direction motion between a source small region and a small region whose depth information has been corrected by the correction means and which is similar to the source small region. Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be detected with higher accuracy.
[0019]
(7) The motion recognition apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
first detection means for detecting, between distance images successively acquired by the image acquisition means, planar-direction motion for each pair of similar small regions;
calculation means for calculating depth information for each small region;
second detection means for detecting depth-direction motion between the similar small regions based on the depth information calculated by the calculation means; and
recognition means for recognizing the motion from the planar-direction motion and the depth-direction motion detected between the similar small regions.
Three-dimensional motion can thereby be recognized with high accuracy.
[0020]
(8) The motion recognition apparatus of the present invention comprises: image acquisition means for acquiring a distance image;
division means for dividing the distance image acquired by the image acquisition means into small regions of a predetermined size;
calculation means for calculating depth information for each small region;
first detection means for detecting, between distance images successively acquired by the image acquisition means, depth-direction motion for each small region based on the depth information;
correction means for correcting the depth information based on the depth-direction motion detected by the first detection means;
second detection means for detecting planar-direction motion between a source small region and a small region whose depth information has been corrected by the correction means and which is similar to the source small region; and
recognition means for recognizing the motion from the planar-direction motion and the depth-direction motion detected between the similar small regions.
Because planar-direction motion information is detected after the shading change in the distance image caused by the depth-direction motion component has been corrected, the three-dimensional motion of an object having a depth-direction motion component can be recognized with higher accuracy.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
FIG. 1 is an overall configuration diagram of an image recognition apparatus according to a first embodiment of the present invention.
[0022]
The image recognition apparatus according to the present embodiment includes: an image acquisition unit 1 including an imaging unit for acquiring a distance image stream; an image storage unit 2 that stores a distance image acquired by the image acquisition unit 1; a block dividing unit 3 that divides the distance image stored in the image storage unit 2 into small areas (block images) of a predetermined size serving as motion detection units; a block dividing unit 4 that divides the distance image acquired by the image acquisition unit 1 into small areas (block images) of a predetermined size serving as motion detection units; a plane direction motion detection unit 5 that detects, for each block image, the motion in the plane direction (plane direction motion vector) between the distance image stored in the image storage unit 2 (sample distance image) and the distance image acquired by the image acquisition unit 1 (latest distance image); a distance value calculation unit 6 that calculates depth information (distance value) of a block image; a depth direction motion detection unit 7 that detects the motion in the depth direction (depth direction motion information, more specifically a depth direction motion vector) based on the distance values calculated by the distance value calculation unit 6 for the movement-source block image and for the movement-destination block image estimated from the plane direction motion (plane direction motion information, more specifically the plane direction motion vector) detected by the plane direction motion detection unit 5; a motion recognition unit 8 that recognizes the motion by referring to the plane direction motion information detected by the plane direction motion detection unit 5, the depth direction motion information detected by the depth direction motion detection unit 7, and a template 9; and the template 9, in which motions to be recognized are registered.
[0023]
First, the image acquisition unit 1 and the distance image will be described.
[0024]
The image acquisition unit 1 acquires the recognition target object (for example, a human hand, face, or whole body) as an image having depth values that reflect its three-dimensional shape (hereinafter referred to as a distance image) at predetermined time intervals (for example, every 1/30 second). It can be realized using, for example, the image acquisition method of Japanese Patent Application No. 8-274949.
[0025]
Since distance images are acquired at predetermined time intervals, holding them sequentially in a memory or the like inside or outside the image acquisition unit 1 yields a moving image of the object based on distance images (hereinafter referred to as a distance image stream). When the distance image acquisition interval is t seconds, the distance image stream is obtained as a collection of distance images of a plurality of frames: the latest distance image, the distance image t seconds before the latest (hereinafter referred to as one frame before), the distance image 2t seconds before the latest (two frames before), and so on.
[0026]
As shown in FIG. 2, the image acquisition unit 1 mainly includes a light emitting unit 101, a light receiving unit 103, a reflected light extracting unit 102, and a timing signal generating unit 104.
[0027]
The light emitting unit 101 emits light whose intensity varies with time in accordance with the timing signal generated by the timing signal generating unit 104. This light is applied to the target object in front of the light emitting unit.
[0028]
The light receiving unit 103 detects the amount of light that was emitted from the light emitting unit 101 and reflected by the target object.
[0029]
The reflected light extraction unit 102 extracts a spatial intensity distribution of the reflected light received by the light receiving unit 103. Since the spatial intensity distribution of the reflected light can be captured as an image, it is called a reflected light image or a distance image.
[0030]
In general, the light receiving unit 103 simultaneously receives not only the light emitted from the light emitting unit 101 and reflected by the object but also external light such as illumination light and sunlight. Therefore, the reflected light extraction unit 102 takes the difference between the amount of light received while the light emitting unit 101 is emitting light and the amount received while it is not, thereby extracting only the component of the light from the light emitting unit 101 that was reflected by the target object.
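The difference operation described above can be sketched in a few lines. This is a minimal illustration, assuming each frame is a list of rows of received-light amounts; the function name `extract_reflected_light` and the clamping of negative differences to 0 are assumptions, not part of the original description.

```python
def extract_reflected_light(lit_frame, unlit_frame):
    """Subtract the frame captured with the emitter off from the frame captured with it on.

    Negative differences (possible with sensor noise) are clamped to 0; this clamping
    is an assumption made for the sketch.
    """
    return [
        [max(lit - unlit, 0) for lit, unlit in zip(lit_row, unlit_row)]
        for lit_row, unlit_row in zip(lit_frame, unlit_frame)
    ]


# Hypothetical 2 x 2 frames: emitter on, then emitter off (external light only).
lit = [[120, 200], [90, 60]]
unlit = [[20, 50], [90, 70]]
reflected = extract_reflected_light(lit, unlit)
```

Pixels where the two frames are equal, or where the unlit frame is brighter, end up as 0, which corresponds to no reflected light from the emitter reaching that pixel.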
[0031]
The reflected light extraction unit 102 extracts the intensity distribution, that is, a reflected light image (distance image) as shown in FIG. 3 from the reflected light received by the light receiving unit 103.
[0032]
For the sake of simplicity, FIG. 3 shows a case of a reflected light image of 8 × 8 pixels that is a part of a reflected light image of 256 × 256 pixels.
[0033]
The reflected light from the object decreases significantly as the distance of the object increases. When the surface of the object scatters light uniformly, the amount of received light per pixel of the reflected light image decreases in inverse proportion to the square of the distance to the object.
[0034]
Each pixel value of the reflected light image represents the amount of reflected light received by the unit light receiving element corresponding to that pixel. Reflected light is affected by the properties of the object (whether it reflects light specularly, scatters it, absorbs it, and so on), the orientation of the object, the distance to the object, and the like; however, when the entire object scatters light uniformly, the amount of reflected light is closely related to the distance to the object. Since a hand has such properties, the reflected light image obtained when a hand is held out in front of the image acquisition unit 1 reflects the distance to the hand, the inclination of the hand (distances that differ from part to part), and so on, and a three-dimensional image as shown in FIG. 4 can be obtained.
[0035]
FIG. 5 shows an example of the appearance of the light emitting unit 101 and the light receiving unit 103 constituting the image acquisition unit 1, as described in, for example, Japanese Patent Application No. 9-299648. A light receiving unit 103, composed of a circular lens and an area sensor (not shown) behind it, is arranged at the center, and a plurality of (for example, six) light emitting units 101, each composed of an LED that emits light such as infrared light, are arranged at equal intervals along the outline of the circular lens.
[0036]
The light emitted from the light emitting unit 101 is reflected by the object, collected by the lens of the light receiving unit 103, and received by the area sensor behind the lens. The area sensor is, for example, a set of sensors arranged in a 256 × 256 matrix, and the intensity of the reflected light received by each sensor in the matrix becomes a pixel value. The image acquired in this way is a distance image, that is, the intensity distribution of the reflected light as shown in FIG. 3.
[0037]
FIG. 3 shows a part of the distance image data (8 × 8 pixels out of 256 × 256 pixels). In this example, the value of each cell in the matrix (pixel value) indicates the intensity of the acquired reflected light in 8-bit, 256-level gradation. For example, a cell with the value “255” is closest to the image acquisition unit 1, and a cell with the value “0” is so far from the image acquisition unit 1 that the reflected light does not reach it.
[0038]
FIG. 4 three-dimensionally shows the entire distance image data in the matrix format as shown in FIG. This example shows the case of distance image data of a human hand.
[0039]
FIG. 6 shows an example of a distance image of a hand acquired by the image acquisition unit 1. The distance image is a three-dimensional image having depth information, for example an image with 64 pixels in the x-axis (horizontal) direction, 64 pixels in the y-axis (vertical) direction, and 256 gradations in the z-axis (depth) direction. FIG. 6 represents the distance values of the distance image, that is, the gradations in the z-axis direction, in gray scale. In this case, the closer the color is to black, the nearer the point is to the image acquisition unit 1, and the closer the color is to white, the farther away it is. A place where the color is completely white indicates that there is no image there, or that any object there is so far away that it is equivalent to there being none.
[0040]
The intensity of reflected light from an object decreases in inverse proportion to the square of the distance d to the object. That is, the pixel value Q(i, j) of each pixel (i, j) in the distance image can be expressed as
Q(i, j) = K / d² ... (1)
[0041]
Here, K is a coefficient adjusted so that, for example, the value of Q(i, j) becomes “255” when d = 0.5 m. The distance value can be obtained by solving the above equation for d.
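Solving equation (1) for d gives d = sqrt(K / Q). The following is a minimal sketch of that conversion, assuming the example calibration in the text (pixel value 255 at d = 0.5 m, so K = 255 × 0.5² = 63.75). The function name `pixel_to_distance` and the treatment of a zero pixel value as an infinite distance are assumptions made for illustration.

```python
import math

# Calibration taken from the example in the text: Q = 255 when d = 0.5 m.
D_REF_M = 0.5
Q_REF = 255
K = Q_REF * D_REF_M ** 2  # 63.75


def pixel_to_distance(q):
    """Return the distance d in meters for pixel value q, using d = sqrt(K / q).

    A pixel value of 0 means no reflected light was received; returning infinity
    for that case is an assumption of this sketch.
    """
    if q <= 0:
        return math.inf
    return math.sqrt(K / q)
```

For example, `pixel_to_distance(255)` recovers the calibration distance of 0.5 m, and a pixel value one quarter as large corresponds to twice the distance.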
[0042]
Next, each component of the motion recognition apparatus in FIG. 1 will be described in more detail.
[0043]
The image storage unit 2 always stores, among the distance images included in the distance image stream acquired by the image acquisition unit 1, the distance image from a fixed number of frames earlier (for example, always one frame earlier); this image is hereinafter referred to as the sample distance image.
[0044]
Here, how many frames earlier the sample distance image is taken from is determined based on information such as the distance image acquisition interval (frame rate) of the image acquisition unit 1 and the speed at which the object moves. For example, if N frames can be acquired during a series of operations in which the two-dimensional projection image of the object does not change, the sample distance image may be chosen arbitrarily from among the distance images 1 to N frames before.
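The relationship between the latest distance image and the sample distance image can be sketched as a small frame buffer. This is only an illustration, assuming a fixed lag of `lag` frames; the class name `DistanceImageStream` and its methods are hypothetical and do not appear in the original description.

```python
from collections import deque


class DistanceImageStream:
    """Keeps recent frames so the frame `lag` frames back can serve as the sample image."""

    def __init__(self, lag=1):
        if lag < 1:
            raise ValueError("lag must be at least 1 frame")
        # Hold the latest frame plus the `lag` frames before it.
        self._frames = deque(maxlen=lag + 1)

    def push(self, frame):
        """Add a newly acquired distance image; the oldest one is dropped when full."""
        self._frames.append(frame)

    def latest(self):
        return self._frames[-1] if self._frames else None

    def sample(self):
        # Until lag + 1 frames have arrived there is no sample distance image yet.
        if len(self._frames) < self._frames.maxlen:
            return None
        return self._frames[0]


stream = DistanceImageStream(lag=1)
for name in ("frame0", "frame1", "frame2"):
    stream.push(name)
```

With `lag=1`, after three frames have been pushed the latest image is `frame2` and the sample image is `frame1`, matching the "always one frame before" example.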
[0045]
The block dividing unit 3 is for dividing the distance image (sample distance image) stored in the image storage unit 2 into block images serving as motion detection units.
[0046]
The block division unit 4 is for dividing the distance image (latest distance image) newly acquired by the image acquisition unit 1 into block images serving as motion detection units.
[0047]
Here, the acquired distance image is assumed to be divided into block images of the same size. For example, when the frame size of the target distance image is 64 pixels in the x-axis (horizontal) direction and 64 pixels in the y-axis (vertical) direction, dividing it into 8 equal parts in each of the x-axis and y-axis directions yields block images of 8 pixels in the x-axis direction by 8 pixels in the y-axis direction, that is, 8 × 8 = 64 pixels each.
[0048]
FIG. 7 shows a state in which the distance image shown in FIG. 6 is divided into 8 × 8 block images.
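The block division can be sketched as follows. This is a minimal illustration, assuming the distance image is a list of rows and that the frame size is an exact multiple of the block size; the function name `split_into_blocks` is hypothetical, and block positions are 0-based here, whereas the text counts blocks from 1.

```python
def split_into_blocks(image, block_h, block_w):
    """Split a 2D image (list of rows) into equally sized blocks.

    Returns a dict mapping the block position (bx, by), measured in block
    units, to the block's pixels as a list of rows.
    """
    height, width = len(image), len(image[0])
    if height % block_h or width % block_w:
        raise ValueError("frame size must be a multiple of the block size")
    blocks = {}
    for by in range(height // block_h):
        for bx in range(width // block_w):
            rows = image[by * block_h:(by + 1) * block_h]
            blocks[(bx, by)] = [row[bx * block_w:(bx + 1) * block_w] for row in rows]
    return blocks


# Hypothetical 64 x 64 frame with 256 gradations, divided into 8 x 8-pixel blocks.
frame = [[(x + y) % 256 for x in range(64)] for y in range(64)]
blocks = split_into_blocks(frame, 8, 8)
```

For the 64 × 64 example in the text this yields 8 × 8 = 64 blocks of 64 pixels each.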
[0049]
Next, the process by which the plane direction motion detection unit 5 obtains plane direction motion information from the sample distance image stored in the image storage unit 2 and the latest distance image newly acquired by the image acquisition unit 1 will be described with reference to the flowchart shown in FIG.
[0050]
First, among the block images on the sample distance image divided by the block dividing unit 3, a block image (target block) for which motion information is calculated is set (step S1).
[0051]
The similarity between each block image (search block) on the latest distance image divided by the block dividing unit 4 and the target block set in step S1 is calculated, and the destination block to which the object is estimated to have moved is searched for (step S2).
[0052]
Assuming that the block dividing units 3 and 4 divide the images into blocks of the same size, the search blocks on the latest distance image subject to similarity calculation can be taken by shifting one block at a time in the x-axis and y-axis directions, giving at most (x-axis direction frame size / x-axis direction block size) × (y-axis direction frame size / y-axis direction block size) block images. In practice, however, the correlation between frames that are close in time is very high, so it is often sufficient to search within a range shifted by one block in each of the x-axis and y-axis directions from the target block.
[0053]
Here, as shown in FIG. 9, both the latest distance image G2 and the sample distance image G1 one frame before are divided into block images of a certain size, and the similarity between the block images is obtained. For example, when a distance image having a frame size of 64 pixels in the x-axis direction and 64 pixels in the y-axis direction is divided into 8 in each of the x-axis direction and the y-axis direction, the block sizes in the x-axis direction and the y-axis direction are both 64 / 8 = 8 pixels. The search range on the latest distance image G2 consists of the block images b11, b12, b13, b21, b23, b31, b32, and b33, which are shifted by one block vertically and horizontally with respect to the target block a22 in the sample distance image G1.
[0054]
Note that the search blocks in the latest distance image need not be selected so that they do not overlap one another, as above. They may be chosen arbitrarily by shifting one pixel at a time in the x-axis and y-axis directions around the block image at the same position as the target block a22, giving at most (x-axis direction frame size − x-axis direction block size) × (y-axis direction frame size − y-axis direction block size) search blocks.
[0055]
The similarity Czw-xy between a search block on the latest distance image (at position (x, y) in block units, that is, the x-th block in the x-axis direction and the y-th block in the y-axis direction) and the target block on the sample distance image (at position (z, w) in block units, that is, the z-th block in the x-axis direction and the w-th block in the y-axis direction) is obtained from, for example, the following equation (2).
[0056]
[Expression 1]

[0057]
Using equation (2), the similarity between the target block (for example, the target block a22 in FIG. 9) and every search block around it on the latest distance image (for example, the search blocks b11, b12, b13, b21, b23, b31, b32, and b33 in FIG. 9) is calculated (step S3). FIG. 9 shows the result of obtaining, using equation (2), the similarities between the target block a22 in the sample distance image G1 and the search blocks in the latest distance image G2: the similarity with the search block b22 is “0.1”, that with the search block b32 is “0.2”, that with the search block b33 is “0.9”, and that with every other search block is “0”.
[0058]
When the processing is completed for all the search blocks, the plane direction motion detection unit 5 proceeds to the plane direction motion vector calculation process (step S4).
[0059]
In the plane direction motion vector calculation processing, the block image most similar to the target block is extracted from the similarity calculation result of the search block on the latest distance image with respect to the target block on the sample distance image obtained above.
[0060]
When the similarity is calculated using equation (2), the search block with the highest value is extracted. For example, in FIG. 9, the search block in the latest distance image most similar to the target block a22 in the sample distance image is b33, so the vector from the target block a22 (here, from an end point or the center position of the target block a22) to the search block b33 (here, to the corresponding end point or the center position of the search block b33) becomes the final motion vector in the plane direction.
[0061]
For example, if the position of the target block is (2, 2) in block units (that is, the second block in the x-axis direction and the second block in the y-axis direction) and the position of the search block with the highest similarity is likewise (3, 3), the block is estimated to have moved in the plane direction by (3, 3) − (2, 2) = (1, 1) blocks, and the final plane direction motion vector has magnitude 1 × (block size in the x-axis direction) in the x-axis direction and 1 × (block size in the y-axis direction) in the y-axis direction.
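Steps S2 to S4 can be sketched as a neighborhood search. Because equation (2) is not reproduced in this text, the sketch substitutes a normalized correlation coefficient as the similarity measure; that choice, the function names `similarity` and `plane_motion_vector`, and the inclusion of the unshifted position as a candidate are all assumptions made for illustration.

```python
import math


def similarity(block_a, block_b):
    """Normalized correlation of two equally sized blocks (assumed stand-in for equation (2))."""
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    norm = math.sqrt(sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b))
    return cov / norm if norm else 0.0  # flat blocks carry no pattern to correlate


def plane_motion_vector(target_pos, target_block, latest_blocks, block_size):
    """Search the blocks within one block of target_pos and return (vector in pixels, best position)."""
    tx, ty = target_pos
    best_pos, best_score = target_pos, -math.inf
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            pos = (tx + dx, ty + dy)
            if pos not in latest_blocks:
                continue  # candidate lies outside the frame
            score = similarity(target_block, latest_blocks[pos])
            if score > best_score:
                best_pos, best_score = pos, score
    vector = ((best_pos[0] - tx) * block_size[0], (best_pos[1] - ty) * block_size[1])
    return vector, best_pos


# Hypothetical data: the pattern of target block (2, 2) reappears at (3, 3) in the latest image.
pattern = [[x + 8 * y for x in range(8)] for y in range(8)]
flat = [[0] * 8 for _ in range(8)]
latest_blocks = {(x, y): flat for x in range(1, 4) for y in range(1, 4)}
latest_blocks[(3, 3)] = pattern
vector, best = plane_motion_vector((2, 2), pattern, latest_blocks, (8, 8))
```

Here the best match is found at (3, 3), so the block displacement is (1, 1) and the motion vector is one block size in each axis, as in the worked example above.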
[0062]
FIG. 9 illustrates the similarities calculated using equation (2) for the search blocks within one block vertically and horizontally of the target block, together with the finally detected motion vector in the plane direction.
[0063]
Steps S1 to S4 are repeated for all block images on the sample distance image as the target block, and a plane direction motion vector is calculated (step S5).
[0064]
The distance value calculation unit 6 obtains a distance value in the z-axis direction (depth direction) of the block image on the distance image. This corresponds to obtaining depth information representative of the distance to the target object imaged in the block image.
[0065]
FIG. 10 conceptually shows a case where the distance value of a block image is obtained by averaging all pixel values constituting the block image. For example, the distance value Dxy of the block image at the coordinates (x, y) can be obtained from the following equation (3) by averaging all pixel values in the block image. In FIG. 10, the pixel values (here corresponding to distance values) of two adjacent block images (a first block image and a second block image) in one frame are drawn as a continuous smooth curve.
[Expression 2]

[0066]
Note that the average of the pixel values in the block image obtained from equation (3) may be used as the distance value as it is; alternatively, when using equation (3), the distance value d obtained from equation (1) for each pixel in the block image may be used as Fxy, or the distance value d may be obtained by applying equation (1) to the average pixel value obtained from equation (3). For simplicity of explanation, all of these are referred to here as the distance value of the block image.
[0067]
The calculation of the distance value of the block image need not be limited to the averaging method described here. For example, a rectangular solid whose base is the block image on the xy plane and whose volume is equal to the sum of the distance values constituting the block image may be defined and its height in the z-axis direction used, or an intermediate value of the distance values constituting the block image may be used.
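The averaging approach, and one alternative representative value, can be sketched as follows. This is a minimal illustration, assuming a block is a list of rows of pixel values; the function names are hypothetical, and interpreting the "intermediate value" as the statistical median is an assumption.

```python
from statistics import mean, median


def block_distance_mean(block):
    """Representative distance value: the average of all pixel values in the block."""
    return mean(v for row in block for v in row)


def block_distance_median(block):
    """Alternative representative value: the median of the pixel values (assumed reading)."""
    return median(v for row in block for v in row)


# Hypothetical 2 x 2 block with one outlying pixel.
block = [[10, 20], [30, 100]]
avg = block_distance_mean(block)
mid = block_distance_median(block)
```

In this small example the single large pixel pulls the average up to 40 while the median stays at 25, which shows why the choice of representative value can matter for blocks that straddle an object boundary.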
[0068]
For every block image constituting the distance image, the depth direction motion detection unit 7 calculates depth direction motion information by taking the difference between the distance value of the block image on the latest distance image estimated to be the movement destination from the plane direction motion vector detected by the plane direction motion detection unit 5 and the distance value of the movement-source block image on the sample distance image.
[0069]
FIG. 11 shows how the depth direction motion detection unit 7 calculates the depth direction motion information from the distance values of the movement-source block image on the sample distance image and the movement-destination block image on the latest distance image. The depth direction motion information calculated here is finally output as a depth direction motion vector (a vector whose direction is the depth direction and whose magnitude is the difference in distance value between the movement-source and movement-destination block images).
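The depth direction calculation of the first embodiment can be sketched as a single subtraction once the destination block is known. This is a minimal illustration, assuming per-block distance values are held in dicts keyed by block position; the function name `depth_motion` is hypothetical, and the sign convention (destination minus source) is an assumption.

```python
def depth_motion(sample_distance, latest_distance, source_pos, plane_shift):
    """Depth direction motion of the block at source_pos.

    sample_distance / latest_distance map block positions to distance values.
    plane_shift is the plane direction motion in block units, so the destination
    block is source_pos + plane_shift. The result is destination minus source.
    """
    dest_pos = (source_pos[0] + plane_shift[0], source_pos[1] + plane_shift[1])
    return latest_distance[dest_pos] - sample_distance[source_pos]


# Hypothetical distance values in meters: the block moved from (2, 2) to (3, 3)
# and came 0.25 m closer to the image acquisition unit.
sample_distance = {(2, 2): 1.20}
latest_distance = {(3, 3): 0.95}
dz = depth_motion(sample_distance, latest_distance, (2, 2), (1, 1))
```

Combined with the plane direction motion vector, `dz` gives the third component of the block's three-dimensional motion.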
[0070]
As described above, it is possible to detect not only the plane direction motion information but also the depth direction motion information from the distance image stream acquired by the image acquisition unit 1.
[0071]
If the block dividing unit 3 divides each frame more finely, finer motion information can be detected; conversely, if it divides each frame more coarsely, motion detection at a coarse, overall level can be performed at high speed on a larger imaging target.
[0072]
The motion recognition unit 8 will be described later.
(Second Embodiment)
In the first embodiment, the plane direction motion information is obtained first, and the depth direction motion information is obtained afterwards. Strictly speaking, however, when the object moves not only in the plane direction but also in the depth direction, the component obtained by projecting the depth direction motion onto the plane direction affects the plane direction motion information obtained first.
[0073]
Therefore, in the second embodiment, in order to solve this problem, depth direction motion information is first obtained for all block images. At this stage it is not known which block image on the latest distance image corresponds to a given block image on the sample distance image, so depth direction motion information is obtained for a plurality of block images on the latest distance image in the vicinity of the position corresponding to the target block on the sample distance image. Next, the depth information (that is, the distance value) of each block image is corrected using the depth direction motion information, and plane direction motion information is obtained for the corrected block images of the latest distance image. In this way, accurate plane direction motion information that does not include the influence of depth direction motion can be acquired, and the depth direction motion information can be determined based on it.
[0074]
FIG. 12 is an overall configuration diagram of an image recognition apparatus according to the second embodiment of the present invention.
[0075]
The image recognition apparatus according to the present embodiment includes: an image acquisition unit 1 including an imaging unit for acquiring a distance image stream; an image storage unit 12 that stores a distance image acquired by the image acquisition unit 1; a block division unit 13 that divides the distance image stored in the image storage unit 12 into small areas (block images) of a predetermined size serving as motion detection units; a distance value calculation unit 14 that calculates depth information (distance value) of the block images divided by the block division unit 13; a block division unit 15 that divides the distance image acquired by the image acquisition unit 1 into small areas (block images) of a predetermined size serving as motion detection units; a distance value calculation unit 16 that calculates depth information (distance value) of the block images divided by the block division unit 15; a depth direction motion detection unit 17 that detects, for each block image, the motion in the depth direction (depth direction motion vector) between the distance image stored in the image storage unit 12 (sample distance image) and the distance image acquired by the image acquisition unit 1 (latest distance image); a depth information correction unit 18 that corrects the depth information of the block images on the distance image acquired by the image acquisition unit 1 based on the depth direction motion information detected by the depth direction motion detection unit 17; a plane direction motion detection unit 19 that detects the motion in the plane direction (plane direction motion vector) from the block images on the latest distance image whose depth information has been corrected and the block images on the sample distance image; a motion recognition unit 8 that recognizes the motion by referring to the plane direction motion vector detected by the plane direction motion detection unit 19, the depth direction motion vector detected by the depth direction motion detection unit 17, and a template 9; and the template 9, in which motions to be recognized are registered.
[0076]
The image acquisition unit 1, the image storage unit 12, the block division units 13 and 15, and the distance value calculation unit 14 are the same as the image acquisition unit 1, the image storage unit 2, the block division units 3 and 4, and the distance value calculation unit 6 of FIG. 1, respectively, so their description is omitted and only the differing parts are described.
[0077]
In the motion recognition apparatus shown in FIG. 12, the depth information (distance value) of each block image in the latest distance image acquired by the image acquisition unit 1 and in the sample distance image stored in the image storage unit 12 is calculated in advance by the distance value calculation units 14 and 16.
[0078]
The depth direction motion detection unit 17 detects the motion in the depth direction of a block image on the latest distance image relative to the block image (target block) on the sample distance image that is the motion detection target (that is, it calculates depth direction motion information).
[0079]
Here, the depth direction motion detection processing in the depth direction motion detection unit 17 will be described with reference to the flowchart shown in FIG.
[0080]
First, among the block images on the sample distance image, a block image (target block) that is a target for motion information calculation is set (step S21).
[0081]
The depth direction motion information is calculated by comparing the distance value of a block image (search block) on the latest distance image acquired by the image acquisition unit 1 that is a target of the motion information search with the distance value of the target block on the sample distance image set in step S21 (step S22). The search blocks, that is, the block images on the latest distance image that can be the movement destination of the target block on the sample distance image, are selected in the same way as in the first embodiment.
[0082]
For example, by subtracting the distance value of the target block on the sample distance image from the distance value of the search block, it is possible to obtain the depth direction motion information of the target block in the search block.
[0083]
At this stage, it is not yet known to which search block on the latest distance image the target block on the sample distance image has moved, but the depth direction motion information obtained here for one of the search blocks will become the final depth direction motion vector.
[0084]
The process of step S22 is performed on all search blocks on the latest distance image (step S23). When the process of step S22 is completed for all the search blocks, the process returns to step S21, another block image on the sample distance image is set as the target block, and this is repeated until the process of step S22 has been performed for all the block images on the sample distance image (step S24).
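Steps S21 to S23 for a single target block can be sketched as follows. This is a minimal illustration, assuming per-block distance values are held in dicts keyed by block position and that the search range is the target position and its eight neighbors; the function name `candidate_depth_motions` is hypothetical.

```python
def candidate_depth_motions(sample_distance, latest_distance, target_pos):
    """Depth direction motion of the target block with respect to each search block.

    For every search block within one block of target_pos on the latest image,
    returns (search block distance value) - (target block distance value).
    """
    tx, ty = target_pos
    candidates = {}
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            pos = (tx + dx, ty + dy)
            if pos in latest_distance:  # skip positions outside the frame
                candidates[pos] = latest_distance[pos] - sample_distance[target_pos]
    return candidates


# Hypothetical distance values for a target block in the corner of a 2 x 2-block frame.
sample_distance = {(0, 0): 100}
latest_distance = {(0, 0): 100, (1, 0): 120, (0, 1): 90, (1, 1): 130}
candidates = candidate_depth_motions(sample_distance, latest_distance, (0, 0))
```

Repeating this for every target block on the sample distance image corresponds to the outer loop closed by step S24; which candidate is finally used is decided only after the plane direction search.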
[0085]
The depth information correction unit 18 converts the block images on the latest distance image based on the depth direction motion information of each search block obtained by the depth direction motion detection unit 17, so that the plane direction motion detection unit 19 can accurately extract the motion component in the plane direction without being affected by the motion component in the depth direction.
[0086]
Here, the depth information correction processing in the depth information correction unit 18 will be described with reference to the flowchart shown in FIG.
[0087]
First, among the block images on the sample distance image, a block image (target block) that is a target for motion information calculation is set (step S31). Then, for the block images on the latest distance image that are the targets of the motion information search, a conversion process that corrects the distance value of the entire block image is performed based on the depth direction motion information obtained by the depth direction motion detection unit 17 at each search block position (step S32).
[0088]
In general, when a target object moves parallel to the depth direction, the movement appears as a change in distance value, that is, a change in shading, on the distance image obtained by imaging the target object. Therefore, even if the correlation between block images is computed between the latest distance image and the sample distance image acquired at different times, as is done by the plane direction motion detection unit 5 in the first embodiment, in some cases no similarity is found between them and the motion component in the plane direction cannot be detected.
[0089]
Therefore, in the second embodiment, the depth information correction unit 18 corrects the depth component of the latest distance image, thereby removing the depth-direction-dependent component described above from the similarity calculation between block images in the plane direction motion detection unit 19.
[0090]
Here, a conversion process is applied to the image of each search block on the latest distance image so that its distance value becomes equal to the distance value of the target block. The simplest way to change only the distance value without changing the pixel pattern that forms the search block is to shift the pixel values of all the pixels constituting the block in the z-axis direction by the depth direction motion information of that block image.
[0091]
For example, the pixel values in the block image at position (x, y) in block units on the latest distance image (the x-th block in the x-axis direction and the y-th block in the y-axis direction) can be corrected in the depth direction using the following equation (4).
[0092]
[Equation 3]

[0093]
FIG. 15 shows how the depth information correction unit 18 raises a block image on the latest distance image in the z-axis (depth) direction by the depth direction motion information detected for that block image, so that its distance value matches the distance value of the target block on the sample distance image.
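The shift described above can be sketched as a per-pixel subtraction. Because equation (4) is not reproduced in this text, the sign convention below is an assumption: the depth direction motion is taken as (search block distance value) minus (target block distance value), and that amount is subtracted from every pixel of the search block. The function names are hypothetical.

```python
def block_mean(block):
    """Distance value of a block, taken here as the average of its pixel values."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def correct_depth(search_block, depth_motion):
    """Shift every pixel by the same amount so only the block's distance value changes."""
    return [[v - depth_motion for v in row] for row in search_block]


# Hypothetical blocks: the search block has the same pattern as the target block,
# offset uniformly by 5 in the depth direction.
target_block = [[10, 20], [30, 40]]
search_block = [[15, 25], [35, 45]]
depth_motion = block_mean(search_block) - block_mean(target_block)
corrected = correct_depth(search_block, depth_motion)
```

After the correction the two blocks have the same distance value and, in this example, identical pixel patterns, so a similarity measure that is sensitive to the overall level would no longer be disturbed by the depth direction motion.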
[0094]
The depth information correction process described above is performed on all search blocks on the latest distance image (step S33). When the process of step S32 is completed for all the search blocks, the process returns to step S31, another block image on the sample distance image is set as the target block, and this is repeated until the process of step S32 has been performed for all the block images on the sample distance image (step S34).
[0095]
Next, the plane direction motion detection unit 19 detects the plane direction motion information of the target block from the image of the target block on the sample distance image and the converted distance image whose distance values have been corrected by the depth information correction unit 18.
[0096]
The flow of processing in the plane direction motion detection unit 19 is the same as that given by the flowchart and description of the plane direction motion detection unit 5 in the first embodiment, with the block images on the latest distance image replaced by the block images corrected by the depth information correction unit 18.
[0097]
Finally, for each target block on the sample distance image, the relative vector to the block with the highest correlation detected by the plane direction motion detection unit 19 is taken as the plane direction motion vector, and the depth direction motion information previously detected by the depth direction motion detection unit 17 for that destination block (the block with the highest correlation) is taken as the depth direction motion vector.
[0098]
In the configuration of the first embodiment, the depth direction motion information detection process needs to be performed only once, so motion information in the depth direction can be detected at high speed; however, for an object with a large motion component in the depth direction, detection of the motion component in the plane direction may fail. In the configuration of the second embodiment, on the other hand, the depth information of the latest distance image is corrected based on the depth direction motion information, so that the change in shading of the distance image caused by the motion component in the depth direction is compensated before the plane direction motion is detected. The three-dimensional motion information of an object having a motion component in the depth direction can therefore be detected with higher accuracy.
(Display of motion detection result and description of motion recognition unit 8 in the first and second embodiments)
FIGS. 17 and 18 show display examples in which the three-dimensional motion information (plane direction motion vector and depth direction motion vector) detected by the plane direction motion detection unit 5 and the depth direction motion detection unit 7 of the first embodiment, or by the depth direction motion detection unit 17 and the plane direction motion detection unit 19 of the second embodiment, is represented by three-dimensional vectors and displayed on a predetermined display device.
[0099]
Here, for example, a distance image composed of 64 × 64 pixels is divided into eight equal parts vertically and eight equal parts horizontally, and the motion vector of each block image is displayed three-dimensionally with an arrow.
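The block division used in this example can be sketched as below. The function name `split_into_blocks` and the row-major list-of-lists image representation are assumptions made for illustration only.

```python
def split_into_blocks(image, blocks_per_side=8):
    """Divide a square image into blocks_per_side x blocks_per_side blocks.

    image is a list of rows. The result is indexed as
    blocks[block_row][block_col][pixel_row][pixel_col].
    """
    size = len(image) // blocks_per_side  # pixels per block side
    return [
        [
            [row[bc * size:(bc + 1) * size]
             for row in image[br * size:(br + 1) * size]]
            for bc in range(blocks_per_side)
        ]
        for br in range(blocks_per_side)
    ]


# A 64 x 64 test image whose pixel value encodes its own position.
image = [[r * 64 + c for c in range(64)] for r in range(64)]
blocks = split_into_blocks(image)
```

With a 64 × 64 image and eight divisions per side, each block image is 8 × 8 pixels, and one motion vector is later associated with each of the 64 blocks.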
[0100]
For example, as shown in FIG. 19, the motion vectors may be displayed in three stages: the movement distance is classified into the three ranges of 0 to 5 cm, 5 to 10 cm, and 10 to 15 cm, and each range is displayed with a different color or a different type of line segment.
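A minimal sketch of such a three-stage classification follows. The text does not state how boundary values are handled or which colors are used, so the half-open ranges, the treatment of distances of 10 cm or more as the third stage, and the color names in `STAGE_COLORS` are all assumptions.

```python
import math

# Hypothetical display colors for the three stages.
STAGE_COLORS = ("blue", "green", "red")


def movement_stage(vector_cm):
    """Classify a 3D motion vector (in cm) by its length.

    Returns 0 for [0, 5) cm, 1 for [5, 10) cm, and 2 for 10 cm or more.
    """
    length = math.sqrt(sum(c * c for c in vector_cm))
    if length < 5:
        return 0
    if length < 10:
        return 1
    return 2


stage = movement_stage((3, 4, 0))  # vector length is exactly 5 cm
color = STAGE_COLORS[stage]
```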
[0101]
In the screen display examples shown in FIGS. 17 and 18, the scales displayed at the bottom of the screen show the motion components of the entire frame in the X-, Y-, and Z-axis directions, obtained by combining the motion vectors of the individual block images. For example, in the case of FIG. 17, the motion scale in the X-axis direction is shifted to the right; that is, the motion vectors as a whole point to the right, and the captured hand can be interpreted as moving to the right. Similarly, for the Y-axis direction and the depth direction (Z-axis direction), the overall movement of the imaged object can be detected by observing the Y-axis direction component and the depth component (Z-axis direction component) of the motion vectors.
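The frame-level scale values can be sketched as a combination of the per-block vectors. The text does not say whether the combination is a sum or an average; the sketch below assumes an average, and the function name `frame_motion` and the convention that positive X points to the right are likewise assumptions.

```python
def frame_motion(block_vectors):
    """Average the per-block (x, y, z) motion vectors of one frame."""
    n = len(block_vectors)
    return tuple(sum(v[i] for v in block_vectors) / n for i in range(3))


# Illustrative block vectors that mostly point in the +X direction.
vectors = [(2, 0, 1), (4, 0, -1), (3, 0, 0)]
overall = frame_motion(vectors)
moving_right = overall[0] > 0
```

Here the depth components of the individual blocks cancel out, while the X component stays positive, which corresponds to an X-axis scale shifted to the right.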
[0102]
In this way, it is possible not only to detect motion information at a specific location in a frame image, that is, in each block image, but also to determine motion information for a region by combining the motion vectors of the blocks. This can also be used for higher-order motion detection such as motion recognition.
[0103]
For example, the motion recognition unit 8 in FIGS. 1 and 12 takes a distance image composed of 64 × 64 pixels that has been divided into eight equal parts vertically and eight equal parts horizontally, combines the plane direction and depth direction motion vectors detected from the individual block images, and calculates a three-dimensional motion vector for each of the four block images obtained by dividing the image into two equal parts vertically and two equal parts horizontally.
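The reduction from 8 × 8 block vectors to four quadrant vectors can be sketched as follows. Combining by component-wise summation is an assumption (an average would serve equally well for direction), and the function name `combine_into_quadrants` is hypothetical.

```python
def combine_into_quadrants(grid):
    """Combine an n x n grid of (x, y, z) block vectors into a 2 x 2 grid.

    grid[row][col] is the motion vector of one block. Each output entry
    is the component-wise sum of the vectors in that quadrant.
    """
    n = len(grid)
    half = n // 2
    quadrants = [[(0, 0, 0)] * 2 for _ in range(2)]
    for r in range(n):
        for c in range(n):
            qr, qc = r // half, c // half
            q = quadrants[qr][qc]
            v = grid[r][c]
            quadrants[qr][qc] = (q[0] + v[0], q[1] + v[1], q[2] + v[2])
    return quadrants


# 8 x 8 blocks that all move one unit in the +X direction.
grid = [[(1, 0, 0)] * 8 for _ in range(8)]
quadrants = combine_into_quadrants(grid)
```

Each quadrant covers 4 × 4 = 16 blocks, so in this uniform example every quadrant vector is (16, 0, 0).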
[0104]
The type of the detected motion is recognized by comparing these motion vectors with the template 9. The template 9 is held in a predetermined memory.
[0105]
In the template 9 shown in FIG. 16, the three-dimensional motion vectors of the four block images are registered in advance for each motion to be recognized (for example, movement to the upper right, movement to the upper left, clockwise rotation, and counterclockwise rotation).
[0106]
The motion recognition unit 8 combines the motion vectors as shown in FIG. 17 to calculate four three-dimensional motion vectors and, by referring to the template 9, can recognize the motion detected from this distance image as a movement to the upper right. Likewise, by combining the motion vectors as shown in FIG. 18 to calculate four three-dimensional motion vectors and referring to the template 9 shown in FIG. 16, the motion detected from this distance image can be recognized as a clockwise rotation.
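The template comparison can be sketched as below. The text does not specify the comparison measure or the registered vector values, so the following are assumptions: the sum of cosine similarities over the four quadrants is used as the score, the quadrants are ordered top-left, top-right, bottom-left, bottom-right, positive X points right and positive Y points up, and the vectors in `TEMPLATES` are illustrative values rather than those of the template 9.

```python
import math


def cosine(a, b):
    """Cosine similarity of two 3D vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize(quadrant_vectors, templates):
    """Return the name of the template most similar to the observed vectors."""
    def score(name):
        return sum(cosine(v, t)
                   for v, t in zip(quadrant_vectors, templates[name]))
    return max(templates, key=score)


# Hypothetical templates; quadrant order: TL, TR, BL, BR.
TEMPLATES = {
    "upper right": [(1, 1, 0)] * 4,
    "upper left": [(-1, 1, 0)] * 4,
    "clockwise rotation": [(1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)],
    "counterclockwise rotation": [(-1, -1, 0), (-1, 1, 0), (1, -1, 0), (1, 1, 0)],
}

# Noisy observation of a clockwise rotation.
observed = [(3, 2.5, 0.2), (2, -2, 0), (-2.5, 3, -0.1), (-3, -2, 0)]
result = recognize(observed, TEMPLATES)
```

Cosine similarity compares directions only, so the recognition in this sketch does not depend on the speed of the motion; a measure that also weighs vector length would be a reasonable alternative.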
[0107]
Each of the above components can be implemented as software except for the imaging portion of the image acquisition unit 1. That is, the above-described procedures can be recorded and distributed on a machine-readable recording medium as a program that can be executed by a computer.
[0108]
The present invention is not limited to the above-described embodiment, and can be implemented with various modifications within the technical scope thereof.
[0109]
[Effects of the invention]
As described above, according to the present invention, three-dimensional movement can be detected with high accuracy. In addition, it is possible to recognize three-dimensional movement with high accuracy.
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a configuration example of an image recognition apparatus according to a first embodiment of the present invention.
FIG. 2 is a schematic configuration diagram of an image acquisition unit.
FIG. 3 is a diagram showing a distance image in a matrix.
FIG. 4 is a diagram showing a three-dimensional distance image.
FIG. 5 is a diagram illustrating an example of the appearance of a light emitting unit and a light receiving unit that constitute an image acquisition unit.
FIG. 6 is a diagram showing a specific example of a distance image.
FIG. 7 is a diagram illustrating a state in which the distance image in FIG. 6 is divided.
FIG. 8 is a flowchart for explaining a planar direction motion detection processing operation.
FIG. 9 is a diagram for specifically explaining planar direction motion detection processing.
FIG. 10 is a diagram for conceptually explaining depth information (distance value) of a block image.
FIG. 11 is a diagram for specifically explaining depth direction motion detection processing.
FIG. 12 is a diagram schematically showing a configuration example of an image recognition apparatus according to a second embodiment of the present invention.
FIG. 13 is a flowchart for explaining depth direction motion detection processing.
FIG. 14 is a flowchart for explaining depth information correction processing.
FIG. 15 is a diagram for specifically explaining depth information correction processing.
FIG. 16 is a diagram showing an example of a template in which a motion to be recognized is registered.
FIG. 17 is a diagram showing a display example when three-dimensional motion information (planar direction motion vector, depth direction motion vector) is represented by a three-dimensional vector and displayed on a predetermined display device.
FIG. 18 is a diagram illustrating a display example when three-dimensional motion information (planar direction motion vector, depth direction motion vector) is represented by a three-dimensional vector and displayed on a predetermined display device.
FIG. 19 is a diagram illustrating a three-dimensional display example of a vector representing three-dimensional motion information (planar direction motion vector, depth direction motion vector).
[Explanation of symbols]
1 ... Image acquisition unit
2 ... Image storage unit
3, 4 ... Block division unit
5 ... Planar direction motion detector
6 ... Distance value calculation unit
7 ... Depth direction motion detector
8 ... Motion recognition unit
9 ... Template
12 ... Image storage unit
13 ... Block division unit
14 ... Distance value calculation unit
15 ... Block division unit
16 ... Distance value calculation unit
17 ... Depth direction motion detector
18 ... Depth information correction unit
19 ... Planar direction motion detector

Claims

A first step of acquiring a first distance image and a second distance image in which each pixel value indicates a distance value in the depth direction in time series;
A second step of dividing the first and second distance images into small regions of arbitrary size;
A third step for determining a small region most similar between the first and second range images;
A fourth step of calculating, for each of the most similar small areas between the first and second distance images, a representative pixel value of a pixel value in the small area;
A fifth step of calculating a difference between representative pixel values of the most similar small regions to obtain a motion amount in the depth direction;
A sixth step of calculating a planar motion vector from the difference in position of each of the most similar small regions on the first and second distance images;
A motion detection method characterized by comprising the above steps.

A first step of acquiring a first distance image and a second distance image in which each pixel value indicates a distance value in the depth direction in time series;
A second step of dividing the first and second distance images into small regions of arbitrary size;
A third step of calculating a representative pixel value of a pixel value in the small region for each of the small regions of the first and second distance images;
A fourth step of correcting each pixel value of each of a plurality of small regions around a first small region in the second distance image so that the representative pixel value of each of the plurality of small regions becomes equal to the representative pixel value of the first small region of the first distance image;
A fifth step of obtaining, from the plurality of small regions whose pixel values have been corrected, a second small region most similar to the first small region of the first distance image;
A sixth step of calculating a plane direction motion vector from a difference between the position of the first small region on the first distance image and the position of the second small region on the second distance image;
A seventh step of calculating a difference between the representative pixel value of the first small region of the first distance image and the representative pixel value of the second small region of the second distance image to obtain a motion amount in the depth direction;
A motion detection method characterized by comprising the above steps.

The motion detection method according to claim 2, wherein, in the first step, the first distance image and the second distance image are acquired in time series by an image acquisition apparatus including a light emitting unit that irradiates an imaging target with light, a light receiving unit that receives reflected light from the imaging target, and an image generation unit that generates, from an intensity distribution of the reflected light received by the light receiving unit, a distance image in which each pixel value indicates a distance value in the depth direction.

The motion detection method according to claim 1, further comprising a seventh step of recognizing a motion of the imaging target from the plane direction motion vector and the depth direction motion amount calculated from each small region in the fifth and sixth steps.

The motion detection method according to claim 2, further comprising an eighth step of recognizing a motion of the imaging target from the plane direction motion vector and the depth direction motion amount detected from each small region in the sixth and seventh steps.

Distance image generating means for generating a distance image in which each pixel value indicates a distance value in the depth direction;
A dividing unit configured to divide the time-series first distance image and the second distance image generated by the distance image generating unit into small regions of an arbitrary size;
Means for determining a small region most similar between the first and second distance images;
Means for calculating a representative pixel value of a pixel value in the small region for each of the most similar small regions between the first and second distance images;
Means for calculating a difference between representative pixel values of each of the most similar small regions to obtain a motion amount in the depth direction;
Means for calculating a planar motion vector from the difference in position of the most similar small regions on the first and second distance images;
A motion detection apparatus characterized by comprising the above means.

Distance image generating means for generating a distance image in which each pixel value indicates a distance value in the depth direction;
A dividing unit configured to divide the time-series first distance image and the second distance image generated by the distance image generating unit into small regions of an arbitrary size;
Means for calculating a representative pixel value of a pixel value in each small area of the first and second distance images;
Means for correcting each pixel value of each of a plurality of small regions around a first small region in the second distance image so that the representative pixel value of each of the plurality of small regions becomes equal to the representative pixel value of the first small region of the first distance image;
Means for obtaining, from the plurality of small regions whose pixel values have been corrected, a second small region most similar to the first small region of the first distance image;
Means for calculating a plane direction motion vector from the difference between the position of the first small region on the first distance image and the position of the second small region on the second distance image;
Means for calculating the difference between the representative pixel value of the first small region of the first distance image and the representative pixel value of the second small region of the second distance image to obtain a motion amount in the depth direction;
A motion detection apparatus characterized by comprising the above means.

The motion detection device according to claim 6 or 7, further comprising light emitting means for irradiating an imaging target with light and light receiving means for receiving reflected light from the imaging target, wherein the distance image generating means generates, from the intensity distribution of the reflected light received by the light receiving means, a distance image in which each pixel value indicates a distance value in the depth direction.

The motion detection device according to claim 6, further comprising means for recognizing the movement of the imaging target from the plane direction motion vector and the depth direction motion amount calculated from each small region of the first and second distance images.

The motion detection device according to claim 7, further comprising means for recognizing the movement of the imaging target from the plane direction motion vector and the depth direction motion amount calculated from each small region of the first and second distance images.

A computer including distance image generation means for generating a distance image in which each pixel value indicates a distance value in the depth direction,
A first step of dividing the time-series first distance image and the second distance image acquired by the distance image generation means into small regions of an arbitrary size;
A second step of determining a subregion that is most similar between the first and second range images;
A third step of calculating a representative pixel value of a pixel value in the small region for each of the most similar small regions between the first and second distance images;
A fourth step of calculating a difference between representative pixel values of each of the most similar small regions to obtain a motion amount in the depth direction;
A fifth step of calculating a planar motion vector from a difference in position of each of the most similar small regions on the first and second distance images;
A machine-readable recording medium on which a program for causing the computer to execute the above steps is recorded.

A computer including distance image generation means for generating a distance image in which each pixel value indicates a distance value in the depth direction,
A first step of dividing the time-series first distance image and the second distance image acquired by the distance image generation means into small regions of an arbitrary size;
A second step of calculating a representative pixel value of a pixel value in the small area for each of the small areas of the first and second distance images;
A third step of correcting each pixel value of each of a plurality of small regions around a first small region in the second distance image so that the representative pixel value of each of the plurality of small regions becomes equal to the representative pixel value of the first small region of the first distance image;
A fourth step of obtaining, from the plurality of small regions whose pixel values have been corrected, a second small region most similar to the first small region of the first distance image;
A fifth step of calculating a plane direction motion vector from a difference between the position of the first small region on the first distance image and the position of the second small region on the second distance image;
A sixth step of calculating the difference between the representative pixel value of the first small region of the first distance image and the representative pixel value of the second small region of the second distance image to obtain a motion amount in the depth direction;
A machine-readable recording medium on which a program for causing the computer to execute the above steps is recorded.