JP3577875B2 - Moving object extraction device

Info

Publication number: JP3577875B2
Application number: JP05132397A
Authority: JP
Inventors: 基孫中; 利和藤岡; 和史水澤; 武久田中
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-03-06
Filing date: 1997-03-06
Publication date: 2004-10-20
Anticipated expiration: 2017-03-06
Also published as: JPH10255057A

Description

【０００１】
【発明の属する技術分野】
本発明は、監視装置等において動画像から動領域を検出して正確に移動物体を抽出するための技術に関する。
【０００２】
【従来の技術】
近年、広域の監視や設備監視のために、多数のカメラを用いた監視システムが増大している。このため、観測者が複数の映像をモニタリングする非効率な監視作業を排除して、対象物体の動き等を捉えて状況把握が行える自動監視システムの開発が期待されている。特に、複数のカメラ映像で広域の観測区域の移動物体を追跡する手法としては、映像内での対象物体の領域を正確に抽出し、その対象物体の足元位置を求めて、その位置を観測空間の平面位置に変換し、対象物体の追跡を行ったり、同一の観測区域を２台のカメラで撮影して２つの映像に写る同一の物体の対応付けを行い、その２台のカメラの視差を利用して対象物体の奥行き距離を求めるステレオ処理を行う方法等がある。
【０００３】
【発明が解決しようとする課題】
しかしながら、前記従来の技術では、大きな課題が残されている。例えば、対象物体の足元位置を算出する手法では、対象物体の重なりや停止物体の位置および対象の下部が遮蔽物で隠れた場合に足元位置を求めることが難しい。また、ステレオ処理では、映像内の同一パターンマッチングは、エッジ情報が少ないものや視差によりパターンのマッチングがとれない対象物体の存在等の難しさがある。
【０００４】
本発明は前記従来の問題点に鑑みてなされたもので、複数映像の統合方法として処理負荷が少ない自動生成型のテンプレートマッチング法に基づくテンプレートの位置から多視点的な処理を導入することにより正確に移動物体を抽出することを目的とする。
【０００５】
【課題を解決するための手段】
課題を解決するために本発明は、カメラ映像を入力する入力手段と、前記の入力手段から得られた映像データから動領域を検出する動領域抽出手段と、前記の動領域抽出手段から得られた動領域から各動領域別に領域のサイズや画素数の動領域の情報を検出するラベリング処理手段と、前記ラベリング処理手段より検出された各動領域の情報を用いて、その動領域と対象とする移動物体との類似度を抽出する形状情報抽出手段と、前記類似度に基づいて前記入力手段から得られた映像データと前記形状情報抽出手段から得られた動領域の情報から、探索範囲に既にテンプレートが存在する場合には最も近いテンプレートを更新し、その範囲にテンプレートが存在しない場合はテンプレートを生成し、そのテンプレートの情報を移動物体の情報として出力するテンプレート処理手段とを備えたカメラ映像処理系と、前記複数のカメラ映像処理系からの情報を統合する統合処理手段と、前記統合処理手段から得られた移動物体の位置や分類情報を処理して出力する出力処理手段より構成されている。
【０００６】
これにより、本発明では上記の課題を動画像処理の特徴を生かすような処理手順で空間位置と各画像内の画素位置との写像関数が既知の複数のカメラ映像を用い、対象領域の大まかな足元位置情報を抽出し、この不安定な情報を多視点的処理と時系列処理を施すことにより、精度の高い対象領域の運動軌跡（トレース）を求めることができ正確に移動物体を抽出することができる。
【０００７】
【発明の実施の形態】
本発明の請求項１に記載の発明は、カメラ映像を入力する入力手段と、前記の入力手段から得られた映像データから動領域を検出する動領域抽出手段と、前記の動領域抽出手段から得られた動領域から各動領域別に領域のサイズや画素数の動領域の情報を検出するラベリング処理手段と、前記ラベリング処理手段より検出された各動領域の情報を用いて、その動領域と対象とする移動物体との類似度を抽出する形状情報抽出手段と、前記類似度に基づいて前記入力手段から得られた映像データと前記形状情報抽出手段から得られた動領域の情報から、探索範囲に既にテンプレートが存在する場合には最も近いテンプレートを更新し、その範囲にテンプレートが存在しない場合はテンプレートを生成し、そのテンプレートの情報を移動物体の情報として出力するテンプレート処理手段とを備えたカメラ映像処理系と、前記複数のカメラ映像処理系からの情報を統合する統合処理手段と、前記統合処理手段から得られた移動物体の位置や分類情報を処理して出力する出力処理手段より構成される移動物体抽出装置としたものであり、本構成により移動物体の追跡に形状特徴抽出手段とテンプレートマッチング法を組み合わせたテンプレート処理を用いて多視点的な移動物体情報の統合・同定処理することにより動画像処理の特徴を生かした簡易な処理で、精度の高い移動物体の追跡を行うことができるという作用を有する。
【０００８】
本発明の請求項２に記載の発明は、複数のカメラで撮影される観測区域は、どの区域も２台以上のカメラ映像により重複して撮影されていることを特徴とする請求項１記載の移動物体抽出装置としたものであり、請求項１記載の発明効果をより大きくできるという作用を有する。
【０００９】
本発明の請求項３に記載の発明は、複数のカメラを設置する場合に４台を一組として、これらのカメラをカメラ１、カメラ２、カメラ３、カメラ４とした場合に、カメラ１とカメラ２およびカメラ３とカメラ４を並列又は内側に向いた方向に、カメラ１、２とカメラ３、４は対向に設置しながら、どの観測区域も２台以上のカメラ映像により重複して撮影できるようにカメラを増やしてゆくことを特徴とする請求項１、２記載の移動物体抽出装置としたものであり、請求項１、２記載の発明効果を高速道路等の長く伸びた領域でだせるという作用を有する。
【００１０】
本発明の請求項４に記載の発明は、入力手段が、可視カメラや赤外線カメラまたはそれらの組み合わせによることを特徴とする請求項１〜３記載の移動物体抽出装置としたものであり、請求項１〜３までの記載の発明効果をだすための入力手段の限定であり、可視カメラと赤外線カメラの双方の長所・短所を補って効果を引き出すことができるという作用を有する。
【００１１】
本発明の請求項５に記載の発明は、統合処理手段では、各カメラ映像処理系のテンプレート処理手段からテンプレートの位置、テンプレートの対象分類、テンプレートの通し番号およびテンプレートの位置に存在する対象の映像内における領域の推定下限位置情報を用いて観測区域の移動物体の位置を算出することを特徴とする請求項１〜４記載の移動物体抽出装置としたものであり、請求項１〜４記載の発明効果をだすための各カメラ映像処理系からの情報を統合する場合に多くの数や種類の移動物体を追跡できるという作用を有する。
【００１２】
本発明の請求項６に記載の発明は、統合処理手段では、各カメラの観測空間における設置位置と映像内の画素と観測空間の平面位置の写像関数が既知であり、各映像処理系のテンプレート処理で抽出されたテンプレートの中心点Ａとそのテンプレートが示す対象領域の推定下限点Ｂを観測空間の平面位置にプロットし、点Ａと点Ｂを直線で結ぶ。そして、複数の映像処理系から得られる前記の直線の交点を対象物体の位置候補とすることを特徴とする請求項１〜５記載の移動物体抽出装置としたものであり、本発明の多視点的な移動物体情報の統合・同定処理するために統合処理手段で移動物体の位置精度を高めることができるという作用を有する。
【００１３】
本発明の請求項７に記載の発明は、統合処理手段から得られた対象物体の位置候補とそれまで得られた対象物体の位置およびテンプレートの対象分類の情報を用いて観測空間における最短距離処理にて対応付けを時間軸に従って行うことを出力処理手段で行うことを特徴とする請求項６記載の移動物体抽出装置としたものであり、本発明の多視点的な移動物体情報の統合・同定処理するために出力処理手段で時間軸での移動物体の誤認識を減少させることができるという作用を有する。
【００１４】
本発明の請求項８に記載の発明は、テンプレート処理手段は、カメラ映像から２次元フィルタ処理を行い、抽出したエッジ映像データに基づいてテンプレートを生成することを特徴とする請求項１〜７記載の移動物体抽出装置としたものであり、本発明で記載のテンプレート処理手段でエッジ情報を用いることによりパターンマッチング処理が効果的に行うことができるという作用を有する。
【００１５】
本発明の請求項９に記載の発明は、形状情報抽出手段は、ラベリング処理手段より抽出された各動領域別の領域の大きさを示す画素数や領域のサイズより対象とする物体の類似度を計算し、その類似度に基づいて、対象物体を示すテンプレートを生成する位置情報を抽出することを特徴とする請求項１〜８記載の移動物体抽出装置としたものであり、本発明において多くの数や種類の移動物体を追跡できるという作用を有する。
【００１６】
本発明の請求項１０に記載の発明は、形状情報抽出手段から抽出される類似度は、人間と車などの複数の対象物体の類似度を算出することを特徴とする請求項１〜９記載の移動物体抽出装置としたものであり、本発明において多くの種類の移動物体を追跡できるという作用を有する。
【００１８】
本発明の請求項１１に記載の発明は、テンプレート処理手段では、テンプレートの生成時にテンプレートの属性を示すために、テンプレートの番号、テンプレートの生成時に対象物体が何であったかを示す対象物体、テンプレートの現在の位置、位置更新をした場合の元の位置、更新した回数を示す更新回数、テンプレートが消滅対象となった場合に消滅を示す消滅フラグおよびテンプレートの内容を示すＴＸＳ＊ＴＹＳの画素値（ＴＸＳ、ＴＹＳはテンプレートの大きさを示す）を持つように設定することを特徴とする請求項１記載の移動物体抽出装置としたものであり、本発明における多数の種類や量および処理の柔軟性を持たせることができる作用を有する。
【００１９】
本発明の請求項１２に記載の発明は、テンプレート処理手段では、テンプレートの生成は、形状特徴手段より抽出された類似度が適切な閾値以上の場合にテンプレートの属性を設定したテンプレートを生成させることを特徴とする請求項１または１１記載の移動物体抽出装置としたものであり、形状特徴手段からの類似度をテンプレート処理の生成に活用して、テンプレートの生成を行うことができる作用を有する。
【００２０】
本発明の請求項１３に記載の発明は、テンプレート処理手段では、テンプレートの生成は、形状特徴手段より抽出された類似度が所定の閾値以上の場合にテンプレートの生成する適切な位置を形状特徴手段より抽出されたテンプレート設定位置にテンプレートを生成させることを特徴とする請求項１または１１〜１２記載の移動物体抽出装置としたものであり、本発明において形状特徴手段から類似度以外にテンプレートの生成位置情報を出力し、精度の高いテンプレート生成が行えることができる作用を有する。
【００２１】
本発明の請求項１４に記載の発明は、テンプレート処理手段では、抽出された動領域において形状特徴手段より抽出された類似度が所定の閾値以上の場合に、その動領域の周辺を探索して、対象となる物体と同一のテンプレートが存在する場合には、そのテンプレートの位置を更新し、テンプレートの内容を移動した位置の画素値に入れ替えることを特徴とする請求項１または１１〜１３記載の移動物体抽出装置としたものであり、一つの対象物体に重複してテンプレートの生成を防ぐことができる作用を有する。
【００２２】
本発明の請求項１５に記載の発明は、テンプレート処理手段では、入力手段から映像信号が転送される毎に、テンプレートの位置更新がされないテンプレートについては、テンプレートの現在の位置を中心にテンプレートマッチング処理を実行し、誤差が最も少ない位置にテンプレートを移動し、テンプレートの属性を示す更新回数を加増し、移動前の位置を元の位置に設定することを行うことを特徴とする請求項１または１１〜１４記載の移動物体抽出装置としたものであり、テンプレートマッチング法の有効な使用が行えることができる作用を有する。
【００２３】
本発明の請求項１６に記載の発明は、テンプレート処理手段では、入力手段から映像信号が転送される毎に、全てのテンプレートに対して、位置更新の処理を行った後に、テンプレートの動きベクトルを求めて、同一動きベクトルがもつテンプレートが複数存在した場合には、その複数のテンプレートのなかから位置や更新回数に応じて、一つのテンプレートに消滅フラグを立てて、消滅対象となったことを示し、この消滅フラグが立っているテンプレートが連続して消滅対象となった場合には、このテンプレートを消滅させることを特徴とする請求項１または１１〜１５記載の移動物体抽出装置としたものであり、不要なテンプレートを消滅させて精度の高いテンプレート処理を行えることができる作用を有する。
【００２４】
以下に、本発明の実施の形態について、図１から図６を用いて説明する。
（実施の形態１）
図１は、本発明の実施の形態１の移動物体抽出装置のブロック構成図を示す。図１において、７から９はカメラ映像処理系を示し、１はカメラ映像を入力する入力手段、２は入力手段１から得られた映像データから動領域を検出する動領域抽出手段、３は動領域抽出手段２から得られた動領域から各動領域別に領域の大きさや画素数等の動領域の情報を検出するラベリング処理手段、４はラベリング処理手段３より検出された各動領域の情報を用いて、その動領域が対象とする物体にどの程度似ているかを示す情報を抽出する形状情報抽出手段、５は入力手段１から得られた映像データと形状特徴抽出手段４から得られた動領域の情報に基づいてテンプレートの操作を行うテンプレート処理手段とから構成され、６は複数のカメラ映像処理系７から９からの情報を統合する統合処理手段、１０は統合処理手段６から得られた移動物体の位置や分類情報を処理してデスプレイ等に表示する出力処理手段より構成される。
【００２５】
本発明のカメラ映像処理系７〜９である入力手段１からテンプレート処理手段５までの大きな流れは、形状特徴を用いた対象物体の判別法とテンプレートマッチング法を組み合わせた処理である。テンプレートマッチングは、撮影環境の変化に対して安定した処理であるとともに、計算負荷が比較的軽い処理であるが、テンプレートの生成方法、テンプレートによる移動物体の追跡を行った場合に対象物体の動きや大きさの変動に対して追跡が不安定となる。そこで、本発明では移動物体の形状特徴から対象物体の類似度を求めて、その類似度を用いてテンプレート生成／位置更新およびテンプレートマッチングを行い、移動物体の追跡をすることで、処理負荷が少なく、高精度の移動物体のテンプレートマッチング法にしている。
【００２６】
以下に各処理手段毎に詳細に説明する。
入力手段１は、カメラとして例えば可視カメラまたは赤外線カメラ等を用いるものとする。本発明では複数のカメラが存在するが、同一種類のカメラのみでも、異なる種類のカメラが混在する場合もある。その用途に応じて選択が可能とする。
【００２７】
また、複数のカメラで撮影される観測区域は、どの区域も２台以上のカメラ映像により重複して撮影されていることを条件として、設置の例としては、図２に示すように４台を一組として、これらのカメラをカメラ１、カメラ２、カメラ３、カメラ４とした場合に、カメラ１とカメラ２およびカメラ３とカメラ４を並列又は内側に向いた方向に、カメラ１、２とカメラ３、４は対向に設置しながら、どの観測区域も２台以上のカメラ映像により重複して撮影できるようにカメラを増やしてゆく方法もある。図２の監視エリアとしては、高速道路を含めた自動車道路等が想定される。
【００２８】
動領域抽出処理手段２は、本発明では対象物体の全体の検出が重要で、足元まで含めた全体領域が１つの領域として抽出されることが望ましく、このため、入力手段１からの映像データから動領域を抽出する際、対象物体の領域の分離する割合が少ないのであれば映像の１フレーム間の差分する方法や背景映像を算出して、その背景を用いて現時点での映像との差分をとる方法を用いるものとする。また、前記の方法で対象物体の領域が分離するような場合には、映像の１フレーム間の差分結果を数フレームに渡って累積するフレーム間累積差分を行ってから最適な閾値で２値化し、動き領域抽出する方法を用いるものとする。
【００２９】
ラベリング処理手段３は、動領域抽出手段２で検出された２値化された動き領域からラベリング処理により対象物体（例えば、人物や車）の領域を区分するものである。一般的には、動領域抽出手段２で検出された動き領域には対象物体以外の領域が含まれることがあり、影や映り込みおよび映像に上乗されたノイズがこれらの例である。そこで、本手段では、これらの不要な部分を除去して実際の対象物体の領域に近い部分を一つの領域として抽出する領域整形としての役目も含まれている。
【００３０】
形状情報抽出処理手段４は、ラベリング処理手段３で検出される動領域には、対象物体の重なり、分離や影を含む場合も含まれる。従って、本発明で後述するテンプレートを用いて対象物体を追尾を行う場合、新たな動領域に対してテンプレートを生成させるためのルールが必要となる。形状情報抽出処理４では、動領域のサイズやフェレ比・全体の傾きなどの形状特徴を用いて、動領域が追尾すべき対象物体であるかどうかの確からしさを示す“類似度”を算出し、これによって判別を行うルールを設定した。室内では人物を対象として監視・追尾を行う。
【００３１】
この場合には、動領域が単一の人物かそれ以外の物体（複数の人物が重なったものや影など）かが問題となる。これは人物の実サイズがほぼ決まっていることから判別できる。この判別ルールについて以下に説明する。
【００３２】
画像上での人物のサイズＳｉｚｅはカメラからの距離を介して射影変換によって実サイズと関係づけられる。
【００３３】
そこで、この射影変換を一次式Ｓｉｚｅ＝ａ×Ｙ_{ｂｏｔｔｏｍ}＋ｂで近似できる。ここでＹ_{ｂｏｔｔｏｍ}は動領域下端の画像上の座標であり、人物であれば足元の位置にほぼ対応する。長さの次元を持つサイズとしては動領域の外接矩形の幅ｗ・高さｈ、動領域の面積（画素数）の平方根ｒなどがある。ｗ、ｈ、ｒはいずれも上記の一次式でよく近似できる。従ってｗ、ｈ、ｒは主成分分析により一つの量に情報圧縮できる。そこでＳｉｚｅをＳｉｚｅ＝α_ｗ×ｗ＋α_ｈ×ｈ＋α_ｒ×ｒと定義する。ここでα_ｗ、α_ｈ、α_ｒは第一主成分軸への射影係数でα_ｗ^２＋α_ｈ^２＋α_ｒ^２＝１を満たす。Ｓｉｚｅは前述の一次式でよく近似できるので、適切なフィッティング方法（最小二乗法など）により前記式の係数ａ、ｂを求め、この式によって「足元がＹ_{ｂｏｔｔｏｍ}にある人物の標準サイズ」を決定した。
【００３４】
この標準サイズを用いて判別ルールを次のように定めた。動領域の人物類似度ＬをＬ＝１００×（１−｜Ｓｉｚｅ−Ｓｉｚｅ^０｜／Ｓｉｚｅ^０）と定義し、Ｌ≧Ｔｈを満たす場合に人物とする。ただし、ここでＳｉｚｅ^０は動領域のＹ_{ｂｏｔｔｏｍ}から求めた標準サイズ（ａ×Ｙ_{ｂｏｔｔｏｍ}＋ｂ）、Ｔｈは適切なしきい値である。
【００３５】
なお、上記は人物についてのルールを示したが、車と人の判断等では、Ｙ_{ｂｏｔｔｏｍ} と動領域の面積の平方根ｒの一次近似式を用いて、その近似式の上下関係で人か車の判断を行う方法もある。
【００３６】
テンプレート処理手段５のテンプレート処理は、形状情報抽出処理手段４で算出された類似度を用いて類似度が高い場合にテンプレートの作成候補位置を用いてテンプレートの生成および位置修正を行う。また、類似度が低い場合には、テンプレートマッチングを実施し、テンプレートの位置を変更する。
【００３７】
図３にテンプレートの生成や位置変更の例を示す。具体的なテンプレートの生成や位置変更は、検出された動領域の類似度が適当な閾値以上である場合に（図３（１））形状情報抽出処理手段４から得られる位置（人の場合には頭部）を中心にテンプレートを探索し（図３（２））、その範囲にテンプレートがない場合に新しくテンプレートを発生させる。また、探索範囲に既にテンプレートが存在する場合には、最も近いテンプレートを形状情報抽出処理手段４で指示された更新位置に移動し（図３（３））、テンプレートの画素値を移動した位置の画素値に全て入れ替える。
【００３８】
テンプレートマッチングは、各フレーム画像からＳｏｂｅｌフィルタ処理で抽出したエッジ画像に対して行う。このため、テンプレートに保存される画素値はエッジ画像から抽出された値である。なお、テンプレートマッチング法において、テンプレートに似たパターンを探索する手法としては、最短距離法・ＳＳＤＡ（ＳｅｑｕｅｎｔｉａｌＳｉｍｉｌａｒｉｔｙＤｅｔｅｃｔｉｏｎＡｌｇｏｒｉｔｈｍ）・相関法・統計的手法等がある。本発明では高速性も兼ねそろえたＳＳＤＡを用いた。しかし、ＳＳＤＡではパターンが大きく変化した場合にテンプレートがエッジの少ないエリアに外れる。これを防ぐため、テンプレート内のエッジ情報の加算値と参照パターンのエッジ情報の加算値との差の絶対値をＳＳＤＡを実行時の誤差に加えている。
【００３９】
また、本手段ではテンプレ−トの消滅がある。各テンプレートは位置変更時に算出される移動ベクトルが同一のテンプレートが複数存在する場合には該当のテンプレートは消滅対象となる。
【００４０】
次に、テンプレート処理手段５の内部のブロック構成図を図４に示す。テンプレート処理手段５は、エッジ抽出処理手段１１、テンプレート生成・位置変更処理手段１４、テンプレートマッチング処理手段１２、エッジ特徴処理手段１３およびテンプレート後処理手段１５により構成される。
【００４１】
処理の流れを以下に説明する。
エッジ抽出手段１１は、入力手段１から入力されるカメラ映像１０４をフレーム毎にエッジ抽出処理を行う。このエッジ抽出手段で得られた情報は、テンプレートの現在の位置を中心に決められた範囲でマッチング処理手段１２による前述したマッチング処理やエッジ特徴処理手段１３でエッジ特徴処理を施される。エッジ特徴処理では前述したようにテンプレート内のエッジ値の合計値とエッジ画像のテンプレートが合わせられる領域のエッジ値の合計値との差を行う処理等のエッジパターンとテンプレートが持つパターンとのマッチングが行われる。テンプレートマッチング処理とエッジ特徴処理の誤差は、テンプレート後処理手段１５で加算され、加算値が最も少ない位置にテンプレートは更新される。
【００４２】
また、形状情報抽出処理手段４からの得られた動領域の位置やサイズ情報とその類似度は、テンプレート生成・位置変更処理手段１４に入力され、類似度が適当な閾値以上の場合については図３で示した処理が施される。類似度が適当な閾値よりも大きい場合にはこの手段での処理が優先される。なお、テンプレート後処理手段１５では前述したテンプレートの消滅処理を行う。
【００４３】
次に、本発明におけるテンプレート処理手段５のテンプレートの構造を図５に示す。テンプレートは、生成される時に以下の属性とテンプレートが生成された場所のエッジパターンを持つようにつくられる。つまり、テンプレートには、テンプレートの通し番号、テンプレートの映像内の現在位置、変更前の位置、対象物体（これは対象物体の識別子である。）、減点ｆｌａｇ、静／動ｆｌａｇ、更新回数およびエッジパターンである。
【００４４】
各属性について以下に説明する。各テンプレートは現在の画像上の座標と前回の座標とを保持している。前回の座標を保持するのは、移動量ベクトルを求めるためである。移動量ベクトルは同一方向に動くテンプレートを見付けるためのもので、同一方向に連続して動くテンプレートは消滅対象となる。更新回数は生成時点で０が設定され、更新される度に１ずつ増える。更新回数はテンプレートの強度を示す値でもあり、この値が低いものは消滅対象になりやすい。減点ｆｌａｇは移動量ベクトルが同一方向に動くテンプレートが２枚以上あった場合にｏｎとなり、後述する処理で更新回数が減少されるか、消滅処理が行われる。静／動ｆｌａｇは類似度が適当な閾値以上を示す領域を示すテンプレートについてｏｎとなる。このｆｌａｇにより更新時のテンプレートの更新の割合を変える。
【００４５】
テンプレートの更新方法を説明する。基本的には前回のテンプレートの画素値と新たにマッチングしたパターンの画素値との重み付き加算により行う。すなわち、テンプレート内の各座標における新しい画素値は、前回までのテンプレート画素値と、テンプレートがマッチングしたパターンのテンプレート内画素値とを重み付きで加算した値とする。
【００４６】
つまり、対象物体が静止状態に近いときに更新量を増やす。更新を行った場合、消滅flagがOFFであれば更新回数が＋１される。
【００４７】
最後にテンプレートの消滅について説明する。前述した移動量ベクトルが同一のテンプレートが複数存在する場合に消滅ｆｌａｇをＯＮとする。消滅ｆｌａｇがＯＮの場合、テンプレート更新時に更新回数を次のように変更する。
【００４８】
更新回数＞ T1 の場合更新回数＝ T2
T1 ＞更新回数＞ T2 の場合更新回数＝ T3
更新回数＜ T3 の場合消滅
ここで、Ｔ１、Ｔ２、Ｔ３は T1>T2>T3 の関係がある。従って、３回連続して消滅flagが立つ場合にはテンプレートを消滅させる。なお、テンプレートの対象物体（対象の識別子）は、屋外シーンのように人物や車両の判別が必要な場合に使用する。
【００４９】
統合処理手段６は、各カメラ映像処理系７〜９を前述した処理にて生成されたテンプレートの位置を多視点的な手法にて統合を行っている。具体的な例を図６に示す。図６において、カメラ１とカメラ２は、ほぼ同一の観測エリアを違う角度から撮影している。このとき、カメラ１の映像とカメラ２の映像はそれぞれテンプレート処理にて移動物体を追跡している。この追跡はテンプレートにて行っており、テンプレートの位置は対象の一部（人の場合には頭部）を示す。このため、カメラが実空間に対してどの様に設置されているか既知の場合でも対象物体の床面の位置（人の場合には足元位置）が正確に求められない。そこで、形状情報抽出処理手段４で示したように、人物の高さｈは、人物の動領域の下端のＹ座標Ｙ_{ｂｏｔｔｏｍ}との関係は一次式で近似できるので、各カメラ映像内のテンプレートの中心点から、その中心点を床面方向に垂直に延ばした点で対象の推定ボトム位置をこの一次式から近似的に推測することができる。
【００５０】
例えば、図６においては、カメラ映像１（図６（ａ））ではテンプレートの中心点Ａ、対象の推定ボトム点Ｂ、カメラ映像２（図６（ｂ））ではカメラ映像１のＡ、Ｂに対応する点としてＡ｀、Ｂ｀となる。この各映像から抽出された各点をカメラの設置位置が既知とすると映像歪みがそれ程大きく無い場合には線形の式にて変換できる。
【００５１】
図６の右側に対象のトレースを行う実空間のイメージ図（図６（ｃ））を示す。この実空間においてＡ、Ｂ点およびＡ｀、Ｂ｀点を繋いだ直線を実空間にプロットする。この場合には、各映像のテンプレート処理が正確に行われていると仮定すると、対象物体が２台のカメラを結ぶ線上にある場合を除けば、対応する対象物体の実空間の位置に交点を持つことが幾何学的に証明できる。そこで、この交点を対象物体の実空間の位置とすることができる。
【００５２】
出力処理手段１０は、統合処理手段６で交点を求めた後、時間軸に従って、過去に存在した対象物体の位置と交点との最短距離による対応付けを行いながら実空間の位置をプロットすることにより各対象物体のトレースが行える。テンプレートマッチング処理では、必ずしもテンプレートの位置が正確に対象物体の位置を示していない場合も存在するが、最短距離による対応付けにてロバスト性を確保している。また、交点が定まらない場合や複数の対象物体が存在した場合に架空の交点が発生する可能性があるが、過去の履歴を用いた対応付けにて処理を行っている。
【００５３】
次に、本発明の具体例を以下に説明する。
図１の実施例としては、室内での人物の監視を行うシステムがある。このとき、一台のカメラでは正しく人物を抽出して追跡することが難しい場合が多々ある。例えば、床面積が大きかったり、机等で対象物体の下部が隠蔽したり、床面に対象の映り込みや影があったりする場合である。そこで、２台以上のカメラを設置し、複数の移動物体の追跡を自動生成型のテンプレートマッチング法を用いて多視点的な移動物体情報の統合・同定処理することにより、精度が高く、簡易なシステムが実現できる。
【００５４】
また、屋外の映像監視システムでは広域の監視エリアでの車や人などの混在する場面が多い。この場合にも、２台以上のカメラを設置し、複数の移動物体の追跡を自動生成型のテンプレートマッチング法を用いて多視点的な移動物体情報の統合・同定処理することにより、精度が高く、簡易なシステムが実現できる。
【００５５】
また、図２に示したように幅はそれ程広く無いが、横に広く伸びた高速道路や一般道における監視システムではカメラの設置を図２に示すようにしながら規則的に監視エリアを重複して撮影し、その撮影した映像を用いて複数の車の追跡を自動生成型のテンプレートマッチング法を用いて多視点的な車情報の統合・同定処理することにより、精度が高く、簡易なシステムが実現できる。
【００５６】
【発明の効果】
以上説明したように、本発明では、自動監視システムなどにおける移動物体の追跡に自動生成型のテンプレートマッチング法を用いて多視点的な移動物体情報の統合・同定処理することにより動画像処理の特徴を生かした簡易な処理で、精度の高い移動物体の追跡を行うことができ、自動生成型のテンプレートマッチング法により対象物体の重なりや停止物体の位置を簡単に求められるために簡易な処理で、精度の高い移動物体の追跡を行うことができる。
【００５７】
また、ステレオ処理に見られるパターンマッチングの難しさや非効率性を省略できる。さらに、カメラの設置位置を分散させて広い区域の観測ができる。
【図面の簡単な説明】
【図１】本発明の実施の形態１の移動物体抽出装置のブロック構成図
【図２】本発明の実施の形態１の移動物体抽出装置における複数カメラの配置例図
【図３】本発明の実施の形態１の移動物体抽出装置のテンプレート処理手段における生成時や位置変更時におけるテンプレートの探索例を示す図
【図４】本発明の実施の形態１の移動物体抽出装置のテンプレート処理手段のブロック構成図
【図５】本発明の実施の形態１の移動物体抽出装置のテンプレートの構造図
【図６】本発明の実施の形態１の移動物体抽出装置の統合処理手段の処理例を示す図
【符号の説明】
１　入力手段
２　動領域抽出手段
３　ラベリング処理手段
４　形状情報抽出処理手段
５　テンプレート処理手段
６　統合処理手段
７、８、９　カメラ映像処理系
１０　出力処理手段
１１　エッジ抽出処理手段
１２　マッチング処理手段
１３　エッジ特徴処理手段
１４　テンプレート生成・位置変更処理手段
１５　テンプレート後処理手段

[0001]
[Technical field of the invention]
The present invention relates to a technique for detecting moving regions in a moving image and accurately extracting moving objects, for use in monitoring devices and the like.
[0002]
[Prior art]
In recent years, monitoring systems that use a large number of cameras have become increasingly common for wide-area and facility monitoring. There is therefore demand for automatic monitoring systems that eliminate the inefficient task of having an observer watch multiple video feeds and that can instead assess the situation by capturing the movement of target objects. Known methods for tracking moving objects across a wide observation area with multiple camera images include the following. In one, the region of the target object is accurately extracted from the image, the foot position of the object is found, and that position is converted to a planar position in the observation space in order to track the object. In another, the same observation area is captured by two cameras, the same object is matched between the two images, and stereo processing uses the parallax between the two cameras to obtain the depth of the target object.
[0003]
[Problems to be solved by the invention]
However, the conventional techniques leave major problems unresolved. For example, with the method that calculates the foot position of the target object, the foot position is difficult to obtain when target objects overlap, when an object is stationary, or when the lower part of the target is hidden by an occluding object. With stereo processing, matching the same pattern between images is difficult for target objects that have little edge information or whose patterns cannot be matched because of parallax.
[0004]
The present invention has been made in view of the above problems of the conventional art. Its object is to extract moving objects accurately by introducing multi-viewpoint processing that starts from template positions obtained with a template matching method using automatically generated templates, which imposes a small processing load, as the method of integrating multiple images.
[0005]
[Means for Solving the Problems]
To solve these problems, the present invention comprises: a plurality of camera image processing systems, each having input means for inputting a camera image; moving region extraction means for detecting moving regions from the video data obtained from the input means; labeling processing means for detecting, for each moving region obtained from the moving region extraction means, information on that region such as its size and number of pixels; shape information extraction means for using the information on each moving region detected by the labeling processing means to extract the similarity between that moving region and a target moving object; and template processing means which, based on the similarity and using the video data obtained from the input means and the moving region information obtained from the shape information extraction means, updates the nearest template when a template already exists within the search range, generates a template when none exists within that range, and outputs the template information as moving object information; integration processing means for integrating the information from the plurality of camera image processing systems; and output processing means for processing and outputting the position and classification information of the moving objects obtained from the integration processing means.
[0006]
As a result, the present invention addresses the above problems with a processing procedure that exploits the characteristics of moving image processing. Using multiple camera images for which the mapping function between spatial positions and pixel positions in each image is known, it extracts rough foot position information for the target region and then applies multi-viewpoint processing and time-series processing to this unstable information. This yields a highly accurate motion trajectory (trace) of the target region, so that moving objects can be extracted accurately.
[0007]
[Embodiments of the invention]
The invention according to claim 1 is a moving object extraction device comprising: a plurality of camera image processing systems, each having input means for inputting a camera image; moving region extraction means for detecting moving regions from the video data obtained from the input means; labeling processing means for detecting, for each moving region obtained from the moving region extraction means, information on that region such as its size and number of pixels; shape information extraction means for using the information on each moving region detected by the labeling processing means to extract the similarity between that moving region and a target moving object; and template processing means which, based on the similarity and using the video data obtained from the input means and the moving region information obtained from the shape information extraction means, updates the nearest template when a template already exists within the search range, generates a template when none exists within that range, and outputs the template information as moving object information; integration processing means for integrating the information from the plurality of camera image processing systems; and output processing means for processing and outputting the position and classification information of the moving objects obtained from the integration processing means. With this configuration, template processing that combines shape feature extraction with template matching is used to track moving objects, and multi-viewpoint moving object information is integrated and identified, so that moving objects can be tracked with high accuracy by simple processing that exploits the characteristics of moving image processing.
[0008]
The invention according to claim 2 is the moving object extraction device according to claim 1, wherein every part of the observation area captured by the plurality of cameras is covered redundantly by two or more camera images. This enhances the effect of the invention according to claim 1.
[0009]
The invention according to claim 3 is the moving object extraction device according to claim 1 or 2, wherein, when a plurality of cameras is installed, four cameras form one set, referred to as camera 1, camera 2, camera 3, and camera 4; cameras 1 and 2, and likewise cameras 3 and 4, are placed side by side or angled inward; cameras 1 and 2 are placed facing cameras 3 and 4; and further cameras are added in the same way so that every observation area is covered redundantly by two or more camera images. This allows the effects of the inventions according to claims 1 and 2 to be obtained over long, extended areas such as expressways.
[0010]
The invention according to claim 4 is the moving object extraction device according to any of claims 1 to 3, wherein the input means is a visible-light camera, an infrared camera, or a combination of the two. This limits the input means used to obtain the effects of the inventions according to claims 1 to 3, and allows the strengths of visible-light and infrared cameras to compensate for each other's weaknesses.
[0011]
The invention according to claim 5 is the moving object extraction device according to any of claims 1 to 4, wherein the integration processing means calculates the positions of moving objects in the observation area using the following information from the template processing means of each camera image processing system: the template position, the template's target classification, the template's serial number, and the estimated lower-limit position, within the image, of the region of the target located at the template position. When the information from the camera image processing systems is integrated to obtain the effects of the inventions according to claims 1 to 4, this makes it possible to track moving objects of many numbers and types.
[0012]
The invention according to claim 6 is the moving object extraction device according to any of claims 1 to 5, wherein, in the integration processing means, the installation position of each camera in the observation space and the mapping function between image pixels and planar positions in the observation space are known; the center point A of a template extracted by the template processing of each image processing system and the estimated lower-limit point B of the target region indicated by that template are plotted at planar positions in the observation space; points A and B are connected by a straight line; and the intersections of the straight lines obtained from the plurality of image processing systems are taken as position candidates for the target object. This allows the integration processing means to improve the positional accuracy of moving objects when integrating and identifying multi-viewpoint moving object information.
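The intersection step in claim 6 can be illustrated with a minimal sketch. It assumes that points A and B from each camera have already been mapped onto the floor plane by that camera's known mapping function; the function name `line_intersection` and the tolerance `eps` are hypothetical and do not come from the patent.

```python
def line_intersection(a1, b1, a2, b2, eps=1e-9):
    """Intersect the infinite line through a1, b1 with the line through a2, b2.

    All points are (x, y) positions on the floor plane. Returns None when the
    two lines are (nearly) parallel, which corresponds to the degenerate case
    where no usable intersection exists.
    """
    x1, y1 = a1
    x2, y2 = b1
    x3, y3 = a2
    x4, y4 = b2
    d1x, d1y = x2 - x1, y2 - y1          # direction of line 1 (A to B)
    d2x, d2y = x4 - x3, y4 - y3          # direction of line 2 (A' to B')
    denom = d1x * d2y - d1y * d2x        # 2-D cross product of the directions
    if abs(denom) < eps:
        return None
    # Parameter t along line 1 at which it meets line 2.
    t = ((x3 - x1) * d2y - (y3 - y1) * d2x) / denom
    return (x1 + t * d1x, y1 + t * d1y)
```

With more than two cameras, one candidate would be produced per pair of lines; how such candidates are merged is not specified in this passage.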
[0013]
The invention according to claim 7 is the moving object extraction device according to claim 6, wherein the output processing means associates, along the time axis and by shortest-distance processing in the observation space, the position candidates of the target object obtained from the integration processing means with the positions of the target objects obtained so far, also using the target classification of the templates. This allows the output processing means to reduce misrecognition of moving objects over time when integrating and identifying multi-viewpoint moving object information.
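One possible reading of the shortest-distance association in claim 7 is sketched below. The greedy matching order, the gating distance `max_distance`, and the data layout are assumptions made for illustration; the patent states only that candidates are associated with earlier positions by shortest distance, using the target classification.

```python
import math

def associate(tracks, candidates, max_distance):
    """Greedy shortest-distance association on the floor plane.

    tracks:     {track_id: (x, y, target_class)} positions obtained so far
    candidates: [(x, y, target_class), ...] intersections from the current frame
    Returns {track_id: candidate_index}. Candidates left unmatched (for example
    spurious intersections) are simply not assigned.
    """
    pairs = []
    for track_id, (tx, ty, tcls) in tracks.items():
        for idx, (cx, cy, ccls) in enumerate(candidates):
            if ccls != tcls:
                continue                      # only match the same target class
            dist = math.hypot(cx - tx, cy - ty)
            if dist <= max_distance:
                pairs.append((dist, track_id, idx))
    pairs.sort()                              # closest pairs are matched first
    assigned, used = {}, set()
    for dist, track_id, idx in pairs:
        if track_id in assigned or idx in used:
            continue
        assigned[track_id] = idx
        used.add(idx)
    return assigned
```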
[0014]
The invention according to claim 8 is the moving object extraction device according to any of claims 1 to 7, wherein the template processing means applies two-dimensional filtering to the camera image and generates templates based on the extracted edge image data. Using edge information in the template processing means allows pattern matching to be performed effectively.
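As an illustration of the two-dimensional filtering in claim 8, the sketch below applies a 3×3 Sobel operator, which paragraph [0038] of the original description names as the edge filter. Using |Gx| + |Gy| as the edge strength and leaving border pixels at zero are assumptions of this sketch, not details given in the patent.

```python
def sobel_edges(image):
    """Return a Sobel edge-strength image for a grayscale image (list of rows).

    Edge strength is approximated as |Gx| + |Gy|. Border pixels, where the
    3x3 window does not fit, are left at 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```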
[0015]
The invention according to claim 9 is the moving object extraction device according to any of claims 1 to 8, wherein the shape information extraction means calculates the similarity to the target object from the number of pixels, which indicates the size of each moving region extracted by the labeling processing means, and from the region size, and, based on that similarity, extracts the position information at which a template representing the target object is to be generated. This makes it possible to track moving objects of many numbers and types.
[0016]
The invention according to claim 10 is the moving object extraction device according to any of claims 1 to 9, wherein the shape information extraction means calculates similarities for a plurality of target objects, such as people and cars. This makes it possible to track many types of moving objects.
[0018]
The invention according to claim 11 is the moving object extraction device according to claim 1, wherein, when a template is generated, the template processing means gives it the following attributes: a template number; a target object indicating what the target object was when the template was generated; the current position of the template; the original position before a position update; an update count indicating the number of updates; a deletion flag indicating that the template has become a candidate for deletion; and TXS × TYS pixel values representing the template contents (TXS and TYS denote the template size). This gives the invention flexibility in handling many types and numbers of objects and in its processing.
[0019]
The invention according to claim 12 is the moving object extraction device according to claim 1 or 11, wherein the template processing means generates a template, with its attributes set, when the similarity extracted by the shape feature means is equal to or greater than an appropriate threshold. The similarity from the shape feature means is thus used to drive template generation.
[0020]
The invention according to claim 13 is the moving object extraction device according to any of claims 1, 11, and 12, wherein, when the similarity extracted by the shape feature means is equal to or greater than a predetermined threshold, the template processing means generates the template at the template setting position extracted by the shape feature means. Because the shape feature means outputs the template generation position in addition to the similarity, templates can be generated with high accuracy.
[0021]
The invention according to claim 14 is the moving object extraction device according to any of claims 1 and 11 to 13, wherein, when the similarity extracted by the shape feature means for an extracted moving region is equal to or greater than a predetermined threshold, the template processing means searches the surroundings of that moving region and, if a template for the same target object exists, updates the position of that template and replaces the template contents with the pixel values at the new position. This prevents duplicate templates from being generated for a single target object.
[0022]
The invention according to claim 15 is the moving object extraction device according to any of claims 1 and 11 to 14, wherein, each time a video signal is transferred from the input means, the template processing means, for each template whose position has not been updated, performs template matching centered on the current position of the template, moves the template to the position with the smallest error, increments the update count attribute of the template, and stores the pre-move position as the original position. This allows the template matching method to be used effectively.
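The matching step in claim 15 can be sketched as follows. Paragraph [0038] of the original description states that SSDA (Sequential Similarity Detection Algorithm) is used on edge images, and that the absolute difference between the summed edge values of the template and of the reference pattern is added to the error. The sketch follows that description under several assumptions: positions refer to the template's top-left corner, the early-abort check is done once per template row, and the names `ssda_match` and `radius` are hypothetical.

```python
def ssda_match(edges, template, cx, cy, radius):
    """Find the best template position near (cx, cy) in an edge image.

    The error at each position is the sum of absolute differences plus the
    edge-sum penalty |sum(template) - sum(patch)|. Accumulation is abandoned
    as soon as the running error reaches the best error found so far, which
    is the early-termination idea behind SSDA.
    Returns ((x, y), error) for the top-left corner of the best match.
    """
    th, tw = len(template), len(template[0])
    height, width = len(edges), len(edges[0])
    t_sum = sum(map(sum, template))
    best_err, best_pos = None, None
    for y0 in range(max(0, cy - radius), min(height - th, cy + radius) + 1):
        for x0 in range(max(0, cx - radius), min(width - tw, cx + radius) + 1):
            p_sum = sum(sum(edges[y0 + j][x0:x0 + tw]) for j in range(th))
            err = abs(t_sum - p_sum)          # edge-sum penalty term
            aborted = False
            for j in range(th):
                for i in range(tw):
                    err += abs(edges[y0 + j][x0 + i] - template[j][i])
                if best_err is not None and err >= best_err:
                    aborted = True            # cannot beat the current best
                    break
            if not aborted:
                best_err, best_pos = err, (x0, y0)
    return best_pos, best_err
```

After a match, the caller would store the previous position, move the template to the returned position, and increment its update count, as the claim describes.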
[0023]
The invention according to claim 16 is the moving object extraction device according to any of claims 1 and 11 to 15, wherein, each time a video signal is transferred from the input means, the template processing means, after performing position update processing on all templates, obtains the motion vector of each template; when a plurality of templates have the same motion vector, it sets the deletion flag on one of them, chosen according to position and update count, to mark it as a candidate for deletion; and when a template whose deletion flag is set becomes a deletion candidate in succession, it deletes that template. This removes unnecessary templates and enables highly accurate template processing.
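A rough sketch of the deletion logic in claim 16 follows. Paragraphs [0047] and [0048] of the original description give a stepwise rule with thresholds T1 > T2 > T3 but leave the boundary cases open, so the treatment of boundaries here is an assumption chosen so that three consecutive flags remove a template. Flagging the template with the lowest update count in each group is also an assumption, motivated by the statement in [0044] that templates with a low update count are more likely to be deleted. All names are hypothetical.

```python
from collections import defaultdict

def flag_duplicates(templates):
    """Return indices of templates to flag as deletion candidates.

    templates: list of dicts with 'pos', 'prev_pos' (both (x, y)) and 'updates'.
    Templates sharing the same motion vector form a group; in each group of
    two or more, the one with the lowest update count is flagged (assumption).
    """
    groups = defaultdict(list)
    for idx, t in enumerate(templates):
        vec = (t["pos"][0] - t["prev_pos"][0], t["pos"][1] - t["prev_pos"][1])
        groups[vec].append(idx)
    flagged = []
    for members in groups.values():
        if len(members) > 1:
            flagged.append(min(members, key=lambda i: templates[i]["updates"]))
    return flagged

def penalize(updates, t1, t2, t3):
    """Apply the update-count penalty for a flagged template (t1 > t2 > t3).

    Returns the new update count, or None when the template should be deleted.
    Boundary handling is an assumption: counts above t1 drop to t2, counts
    above t3 drop to t3, and anything at or below t3 is deleted.
    """
    if updates > t1:
        return t2
    if updates > t3:
        return t3
    return None
```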
[0024]
An embodiment of the present invention is described below with reference to FIGS. 1 to 6.
(Embodiment 1)
FIG. 1 shows a block diagram of the moving object extraction device according to Embodiment 1 of the present invention. In FIG. 1, reference numerals 7 to 9 denote camera image processing systems. Each consists of input means 1 for inputting a camera image; moving region extraction means 2 for detecting moving regions from the video data obtained from input means 1; labeling processing means 3 for detecting, for each moving region obtained from moving region extraction means 2, information such as the region's size and number of pixels; shape information extraction means 4 for using the information on each moving region detected by labeling processing means 3 to extract information indicating how closely that region resembles the target object; and template processing means 5 for operating on templates based on the video data obtained from input means 1 and the moving region information obtained from shape information extraction means 4. Reference numeral 6 denotes integration processing means for integrating the information from the camera image processing systems 7 to 9, and 10 denotes output processing means for processing the position and classification information of moving objects obtained from integration processing means 6 and showing it on a display or the like.
[0025]
The overall flow from input means 1 to template processing means 5 in the camera image processing systems 7 to 9 combines a method of discriminating target objects by shape features with a template matching method. Template matching is stable against changes in the imaging environment and has a relatively light computational load. However, it leaves open how templates are to be generated, and tracking a moving object with a template becomes unstable when the motion or size of the target object varies. The present invention therefore obtains the similarity to the target object from the shape features of the moving object and uses that similarity for template generation, position updating, and template matching to track the moving object, giving a template matching method for moving objects with a small processing load and high accuracy.
[0026]
Each processing means is described in detail below.
Input means 1 uses, for example, a visible-light camera or an infrared camera. The present invention involves a plurality of cameras; these may all be of the same type, or different types may be mixed. The choice can be made to suit the application.
[0027]
The observation area captured by the plurality of cameras is subject to the condition that every part of it is covered redundantly by two or more camera images. As one installation example, shown in FIG. 2, four cameras form a set, referred to as camera 1, camera 2, camera 3, and camera 4. Cameras 1 and 2, and likewise cameras 3 and 4, are placed side by side or angled inward, and cameras 1 and 2 are placed facing cameras 3 and 4; further cameras are then added so that every observation area remains covered by two or more camera images. The monitoring area assumed in FIG. 2 is a motor road, including expressways.
[0028]
In the present invention, it is important that the moving region extraction means 2 detects the entire target object, and it is preferable that the whole object, including the feet, be extracted as one region. When the region of the target object is rarely split into pieces, the moving region is extracted either by taking the difference between consecutive frames of the video or by computing a background image and taking the difference between it and the current image. When the region of the target object is split by these methods, an inter-frame cumulative difference is obtained by accumulating the inter-frame difference results over several frames, and the result is binarized with an optimal threshold to extract the moving region.
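As an illustration only, the inter-frame cumulative difference followed by binarization might be sketched as follows. The function name, the list-of-rows image representation, and the threshold value are assumptions made for this sketch; the text does not specify how the optimal threshold is chosen.

```python
def cumulative_difference_mask(frames, threshold):
    """Accumulate absolute inter-frame differences and binarize.

    frames: list of grayscale images, each a list of rows of ints.
    threshold: hypothetical binarization threshold (not given in the text).
    """
    height, width = len(frames[0]), len(frames[0][0])
    acc = [[0] * width for _ in range(height)]
    # Sum |frame[t] - frame[t-1]| over consecutive frame pairs.
    for prev, cur in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                acc[y][x] += abs(cur[y][x] - prev[y][x])
    # Binarize: 1 marks a motion pixel.
    return [[1 if v >= threshold else 0 for v in row] for row in acc]


# A bright pixel moving one step to the right over three frames.
frames = [
    [[0, 0, 0], [0, 0, 0]],
    [[0, 50, 0], [0, 0, 0]],
    [[0, 0, 50], [0, 0, 0]],
]
mask = cumulative_difference_mask(frames, threshold=40)
```

Accumulating over several frames lets both the old and the new position of the moving pixel contribute, which is why the resulting region is less likely to be split than with a single frame difference.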
[0029]
The labeling processing means 3 separates the regions of the target objects (for example, a person or a car) from the binarized moving regions detected by the moving region extraction means 2 by labeling. In general, the moving regions detected by the moving region extraction means 2 may include areas other than the target object, such as shadows, reflections, and noise added to the video. Therefore, this means also serves as region shaping, removing these unnecessary parts and extracting the part close to the actual region of the target object as a single region.
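A minimal sketch of such a labeling step is given below, assuming 4-connected flood fill and a simple pixel-count filter as the region shaping. Both choices, and the names `label_regions` and `min_pixels`, are assumptions for illustration; the text does not state which labeling algorithm or shaping rule is used.

```python
from collections import deque


def label_regions(mask, min_pixels=1):
    """4-connected labeling of a binary mask.

    Returns a list of dicts with the bounding box and pixel count of each
    region. Regions smaller than min_pixels (a hypothetical noise
    threshold) are discarded.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited motion pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(xs) >= min_pixels:
                regions.append({
                    "x_min": min(xs), "x_max": max(xs),
                    "y_top": min(ys), "y_bottom": max(ys),
                    "pixels": len(xs),
                })
    return regions


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
regions = label_regions(mask, min_pixels=2)
```

The bounding box gives the width w and height h of the circumscribed rectangle, and the pixel count gives the area, which are the quantities the shape information extraction uses.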
[0030]
The moving regions detected by the labeling processing means 3 include cases in which target objects overlap, are split, or include shadows. Therefore, when a target object is tracked using the templates described later, a rule for generating a template for a new moving region is required. The shape information extraction processing means 4 calculates a "similarity" that indicates how likely the moving region is to be the target object to be tracked, using shape features such as the size of the moving region, its Feret ratio, and its overall inclination, and sets a rule for making the determination. The following considers monitoring and tracking of persons indoors.
[0031]
In this case, the question is whether a moving region is a single person or something else (such as several overlapping persons or a shadow). This can be judged from the fact that the actual size of a person is roughly fixed. The determination rule is described below.
[0032]
The size Size of the person on the image is related to the actual size by projective transformation via the distance from the camera.
[0033]
Therefore, this projective transformation can be approximated by the linear expression Size = a × Y_bottom + b, where Y_bottom is the image coordinate of the lower end of the moving region and corresponds approximately to the position of the person's feet. Quantities with the dimension of length include the width w and height h of the circumscribed rectangle of the moving region and the square root r of the area (number of pixels) of the moving region. Each of w, h, and r is well approximated by the linear expression above, so w, h, and r can be compressed into a single quantity by principal component analysis. Size is then calculated as Size = α_w × w + α_h × h + α_r × r, where α_w, α_h, and α_r are the projection coefficients onto the first principal component axis and satisfy α_w² + α_h² + α_r² = 1. Since Size is well approximated by the linear expression above, the coefficients a and b were obtained by a suitable fitting method (such as least squares), and this expression was used to define the standard size of a person at a given foot position.
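The combination of w, h, and r into Size and the fit of the coefficients a and b can be sketched as follows. The projection coefficients and the sample data are hypothetical values chosen for illustration, and the principal component analysis that would normally produce the coefficients is not shown.

```python
import math


def combined_size(w, h, r, alpha):
    """Size = alpha_w * w + alpha_h * h + alpha_r * r."""
    a_w, a_h, a_r = alpha
    return a_w * w + a_h * h + a_r * r


def fit_line(ys, sizes):
    """Ordinary least-squares fit of size = a * y + b."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_s = sum(sizes) / n
    sxy = sum((y - mean_y) * (s - mean_s) for y, s in zip(ys, sizes))
    syy = sum((y - mean_y) ** 2 for y in ys)
    a = sxy / syy
    b = mean_s - a * mean_y
    return a, b


# Hypothetical unit-norm projection coefficients (equal weights).
alpha = (1 / math.sqrt(3),) * 3
# Hypothetical samples (Y_bottom, w, h, r) of single-person regions.
samples = [(100, 10, 30, 15), (200, 20, 60, 30), (300, 30, 90, 45)]
ys = [s[0] for s in samples]
sizes = [combined_size(s[1], s[2], s[3], alpha) for s in samples]
a, b = fit_line(ys, sizes)
```

With a and b fitted once per camera from regions known to be single persons, the standard size at any foot position is simply a × Y_bottom + b.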
[0034]
The discrimination rule was defined as follows using the standard size. The person similarity L of a moving region is calculated as L = 100 × (1 − |Size − Size⁰| / Size⁰), and the region is judged to be a person when L ≧ Th. Here, Size⁰ is the standard size (a × Y_bottom + b) at the Y_bottom of the moving region, and Th is an appropriate threshold.
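The rule above translates directly into a few lines of code. In this sketch the coefficient values and the threshold are hypothetical; only the formula for L and the comparison L ≧ Th come from the text.

```python
def person_similarity(size, y_bottom, a, b):
    """L = 100 * (1 - |Size - Size0| / Size0), with Size0 = a * Y_bottom + b."""
    size0 = a * y_bottom + b
    return 100.0 * (1.0 - abs(size - size0) / size0)


def is_person(size, y_bottom, a, b, th):
    """A region is judged to be a person when L >= Th."""
    return person_similarity(size, y_bottom, a, b) >= th


# Hypothetical coefficients and threshold; Size0 = 0.5 * 200 + 10 = 110.
a, b, th = 0.5, 10.0, 80.0
l_single = person_similarity(size=100.0, y_bottom=200, a=a, b=b)
l_merged = person_similarity(size=220.0, y_bottom=200, a=a, b=b)
```

A region close to the standard size scores near 100, while a region twice the standard size, such as two overlapping persons, scores 0 and is rejected.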
[0035]
The above is the rule for a person. There is also a method in which a first-order approximation of the square root r of the area of the moving region as a function of Y_bottom is used, and a person or a car is discriminated according to whether the region lies above or below that approximate expression.
[0036]
The template processing means 5 uses the similarity calculated by the shape information extraction processing means 4. When the similarity is high, it generates a template or corrects the template position using the template creation candidate position. When the similarity is low, it changes the template position by template matching.
[0037]
FIG. 3 shows an example of template generation and position change. Specifically, when the similarity of a detected moving region is equal to or greater than an appropriate threshold (FIG. 3 (1)), the search range is examined for existing templates (FIG. 3 (2)), and a new template is generated if there is none in that range. If a template already exists in the search range, the closest template is moved to the update position designated by the shape information extraction processing means 4 (FIG. 3 (3)), and all pixel values of the template are replaced with the pixel values at the updated position.
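The generate-or-move decision can be sketched as below. The dictionary layout, the circular search range, and the names `generate_or_update` and `search_radius` are assumptions for this sketch, and the replacement of the template's pixel values is omitted for brevity.

```python
import math


def generate_or_update(templates, candidate, similarity, threshold,
                       search_radius):
    """Apply the generate / move rule for one moving region.

    templates: list of dicts with "pos", "prev" and "updates" keys.
    candidate: template position (x, y) proposed for the moving region.
    Returns "ignored", "generated" or "updated".
    """
    if similarity < threshold:
        return "ignored"  # left to ordinary template matching
    in_range = [t for t in templates
                if math.dist(t["pos"], candidate) <= search_radius]
    if not in_range:
        # No template in the search range: create a new one.
        templates.append({"pos": candidate, "prev": candidate, "updates": 0})
        return "generated"
    # Otherwise move the closest template to the designated position.
    nearest = min(in_range, key=lambda t: math.dist(t["pos"], candidate))
    nearest["prev"] = nearest["pos"]
    nearest["pos"] = candidate
    nearest["updates"] += 1
    return "updated"


templates = []
r1 = generate_or_update(templates, (10, 10), 90, 80, 5)
r2 = generate_or_update(templates, (12, 11), 85, 80, 5)
r3 = generate_or_update(templates, (40, 40), 85, 80, 5)
r4 = generate_or_update(templates, (40, 40), 50, 80, 5)
```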
[0038]
Template matching is performed on an edge image extracted from each frame by Sobel filtering, so the pixel values stored in a template are values extracted from the edge image. Methods for searching for a pattern similar to a template include the shortest distance method, SSDA (Sequential Similarity Detection Algorithm), the correlation method, and statistical methods. The present invention uses SSDA because of its speed. However, with SSDA, when the pattern changes significantly, the template tends to settle in an area with few edges. To prevent this, the absolute value of the difference between the sum of the edge values in the template and the sum of the edge values of the reference pattern is added to the error when SSDA is executed.
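A hedged sketch of SSDA with the added edge-sum term follows. It assumes that the error is a sum of absolute differences, that the early-abandon check is made once per template row, and that the search covers a square window; these details and the name `ssda_match` are illustrative assumptions, not taken from the text.

```python
def ssda_match(edge, template, cx, cy, radius):
    """Search offsets within +/-radius of (cx, cy) for the best match.

    edge: edge image (list of rows); template: small edge pattern.
    Error = sum of absolute differences + |sum(template) - sum(patch)|.
    Returns (best_x, best_y, best_error) for the patch's top-left corner.
    """
    th, tw = len(template), len(template[0])
    t_sum = sum(map(sum, template))
    best = (cx, cy, float("inf"))
    y_lo, y_hi = max(0, cy - radius), min(len(edge) - th, cy + radius)
    x_lo, x_hi = max(0, cx - radius), min(len(edge[0]) - tw, cx + radius)
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            p_sum = sum(sum(edge[y + j][x:x + tw]) for j in range(th))
            err = abs(t_sum - p_sum)  # edge-amount penalty
            for j in range(th):
                for i in range(tw):
                    err += abs(edge[y + j][x + i] - template[j][i])
                if err >= best[2]:
                    break  # SSDA: abandon this candidate early
            if err < best[2]:
                best = (x, y, err)
    return best


edge = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 9], [9, 9]]
result = ssda_match(edge, template, cx=1, cy=1, radius=1)
```

Because the penalty is non-negative and is added before the pixel differences, a candidate patch with far fewer edges than the template is rejected early, which is the behavior the added term is meant to encourage.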
[0039]
This means also handles the disappearance (deletion) of templates. If a plurality of templates have the same movement vector, calculated when their positions are changed, those templates become candidates for deletion.
[0040]
Next, FIG. 4 shows a block diagram of the inside of the template processing means 5. The template processing unit 5 includes an edge extraction processing unit 11, a template generation / position change processing unit 14, a template matching processing unit 12, an edge feature processing unit 13, and a template post-processing unit 15.
[0041]
The processing flow will be described below.
The edge extraction processing means 11 performs edge extraction on each frame of the camera image 104 input from the input means 1. On the resulting edge information, the matching processing means 12 performs the matching described above and the edge feature processing means 13 performs edge feature processing, both within a range determined around the current position of the template. As described above, the edge feature processing compares the edge pattern with the template pattern, for example by taking the difference between the sum of the edge values in the template and the sum of the edge values of the edge image region that the template is matched against. The template post-processing means 15 adds the errors from the template matching processing and the edge feature processing, and the template is moved to the position where the sum is smallest.
[0042]
The position and size information of the moving region and the similarity obtained from the shape information extraction processing means 4 are input to the template generation / position change processing means 14, which performs the processing shown in FIG. 3. If the similarity is greater than an appropriate threshold, the processing by this means takes priority. The template post-processing means 15 also performs the template disappearance processing described above.
[0043]
Next, FIG. 5 shows the structure of a template used by the template processing means 5 in the present invention. When a template is created, it is given the following attributes together with the edge pattern at the place where it was created. That is, a template consists of the serial number of the template, the current position of the template in the video, the position before the change, the target object (an identifier of the target object), the deduction flag, the still / moving flag, the number of updates, and the edge pattern.
[0044]
Each attribute is described below. Each template holds its current image coordinates and its previous coordinates. The previous coordinates are held in order to obtain the movement vector. The movement vector is used to find templates that move in the same direction, and templates that continue to move in the same direction become candidates for deletion. The number of updates is set to 0 at generation and increases by one at each update. It also indicates the strength of the template, and templates with a low value are more likely to be deleted. The deduction flag is turned ON when two or more templates have movement vectors in the same direction; in that case the number of updates is reduced, or the template is deleted, in the processing described later. The still / moving flag is turned ON for a template that indicates a region whose similarity is equal to or greater than an appropriate threshold, and the update rate applied when the template is updated is changed according to this flag.
[0045]
A method for updating a template will be described. Basically, the template is updated by a weighted addition of the pixel value of the previous template and the pixel value of the newly matched pattern. The quantities involved are the new pixel value at a coordinate in the template, the previous pixel value of the template at that coordinate, and the pixel value at that coordinate of the pattern that the template matched.
[0046]
That is, when the target object is close to a stationary state, the update amount is increased. When an update is performed and the disappearance flag is OFF, the number of updates is incremented by one.
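Since the update expression itself is not reproduced here, the following is only one possible form of such a weighted addition, new = (1 − k) × old + k × matched. The blending form and the two rate values are assumptions for this sketch; the text states only that the update amount is larger when the target is close to stationary.

```python
def update_template(old_pixels, matched_pixels, still):
    """Blend old template pixels with the newly matched pattern.

    old_pixels, matched_pixels: lists of rows with equal dimensions.
    still: True when the template's still / moving flag is ON.
    The rates 0.5 and 0.25 are hypothetical illustration values.
    """
    rate = 0.5 if still else 0.25  # assumed update rates
    return [
        [(1.0 - rate) * o + rate * m for o, m in zip(orow, mrow)]
        for orow, mrow in zip(old_pixels, matched_pixels)
    ]


old = [[0, 8]]
matched = [[8, 8]]
updated_still = update_template(old, matched, still=True)
updated_moving = update_template(old, matched, still=False)
```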
[0047]
Finally, the disappearance of the template will be described. When a plurality of templates having the same movement amount vector exist, the disappearance flag is set to ON. When the disappearance flag is ON, the number of updates is changed as follows when updating the template.
[0048]
When update count > T1: update count = T2
When T1 > update count > T2: update count = T3
When update count < T3: the template disappears
Here, T1, T2, and T3 satisfy T1 > T2 > T3. Therefore, when the disappearance flag is set three times in succession, the template is erased. The target object (target identifier) attribute of a template is used when it is necessary to distinguish between a person and a vehicle, as in an outdoor scene.
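The stepwise reduction can be sketched as follows. The rules as stated use strict inequalities and leave the boundary cases open, so this sketch assumes boundary handling chosen so that a template whose count starts above T1 is erased on the third consecutive flagged update, as the text describes. The function name and the use of None to signal erasure are also assumptions.

```python
def apply_disappearance(update_count, t1, t2, t3):
    """Return the reduced update count, or None if the template is erased.

    Called once per update while the disappearance flag is ON.
    Boundary handling is an assumption: counts above T1 drop to T2,
    counts above T3 drop to T3, and anything else is erased.
    """
    if not t1 > t2 > t3:
        raise ValueError("thresholds must satisfy T1 > T2 > T3")
    if update_count > t1:
        return t2
    if update_count > t3:
        return t3
    return None  # erased


# Hypothetical thresholds and a strong template (count above T1).
T1, T2, T3 = 8, 5, 2
history = []
count = 10
while count is not None:
    count = apply_disappearance(count, T1, T2, T3)
    history.append(count)
```

A weak template whose count is already at or below T3 is erased on its first flagged update under this assumption.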
[0049]
The integration processing means 6 integrates, by a multi-viewpoint method, the positions of the templates generated by the camera image processing systems 7 to 9 in the processing described above. A specific example is shown in FIG. 6, in which camera 1 and camera 2 photograph substantially the same observation area from different angles. A moving object is tracked by template processing in both the camera 1 video and the camera 2 video. Because this tracking uses a template, the template position indicates only part of the target (the head, in the case of a person). For this reason, even if the installation of the camera in real space is known, the floor position of the target object (the foot position, in the case of a person) cannot be obtained accurately. However, as described for the shape information extraction processing means 4, the height h of a person can be approximated by a linear expression in Y_bottom, the Y coordinate of the lower end of the person's moving region. Using this linear expression, the bottom position of the target can therefore be roughly estimated as the point reached by extending a line from the center point of the template in each camera image perpendicularly toward the floor.
[0050]
For example, in FIG. 6, camera image 1 (FIG. 6A) shows the center point A of the template and the estimated bottom point B of the target, and camera image 2 (FIG. 6B) shows the points A′ and B′ corresponding to A and B of camera image 1. Assuming that the camera installation positions are known, each point extracted from each image can be converted to real space by a linear equation, provided that the image distortion is not too large.
[0051]
The right side of FIG. 6 (FIG. 6C) shows a conceptual view of the real space in which the target is traced. The straight line through points A and B and the straight line through points A′ and B′ are plotted in this real space. If the template processing of each video has been performed correctly, it can be shown geometrically that these lines intersect at the position of the corresponding target object in real space, except when the target object lies on the line connecting the two cameras. This intersection can therefore be taken as the position of the target object in real space.
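Once the four points have been mapped to plane coordinates in real space, the candidate position is an ordinary two-line intersection. The sketch below assumes that mapping has already been applied; the function name, the tuple representation, and the tolerance `eps` are illustrative assumptions.

```python
def line_intersection(a, b, a2, b2, eps=1e-9):
    """Intersect line AB with line A'B' on the floor plane.

    Points are (x, y) tuples in real-space plane coordinates.
    Returns None if the lines are parallel or coincident, which is the
    case when the target lies on the line connecting the two cameras.
    """
    dx1, dy1 = b[0] - a[0], b[1] - a[1]
    dx2, dy2 = b2[0] - a2[0], b2[1] - a2[1]
    denom = dx1 * dy2 - dy1 * dx2  # 2D cross product of the directions
    if abs(denom) < eps:
        return None
    # Solve A + t * d1 = A' + s * d2 for t.
    t = ((a2[0] - a[0]) * dy2 - (a2[1] - a[1]) * dx2) / denom
    return (a[0] + t * dx1, a[1] + t * dy1)


# Hypothetical plane coordinates for A, B (camera 1) and A', B' (camera 2).
position = line_intersection((0, 0), (1, 1), (4, 0), (3, 1))
degenerate = line_intersection((0, 0), (1, 0), (0, 1), (1, 1))
```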
[0052]
After the integration processing means 6 has obtained the intersection, the output processing means 10 plots the position in real space while associating it, along the time axis, with the nearest past position of a target object. In this way each target object can be traced. In template matching, the template position does not always indicate the position of the target object accurately, but robustness is secured by associating positions at the shortest distance. In addition, an intersection may fail to be determined, and when a plurality of target objects exist a false intersection may occur; these cases are handled by association using the past history.
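A minimal sketch of shortest-distance association is shown below, assuming a greedy assignment in which each track accepts at most one candidate per frame. The gating distance `max_dist` and the rule of starting a new track for unmatched candidates are assumptions for illustration; the text does not describe how unmatched or false intersections are filtered beyond the use of past history.

```python
import math


def associate(tracks, candidates, max_dist):
    """Append each candidate to the track whose last position is nearest.

    tracks: list of lists of (x, y) positions, oldest first.
    candidates: intersection points found in the current frame.
    max_dist: hypothetical gate; farther candidates start a new track.
    """
    used = set()
    for point in candidates:
        best, best_d = None, max_dist
        for i, track in enumerate(tracks):
            d = math.dist(track[-1], point)
            if i not in used and d <= best_d:
                best, best_d = i, d
        if best is None:
            tracks.append([point])
            used.add(len(tracks) - 1)
        else:
            tracks[best].append(point)
            used.add(best)
    return tracks


tracks = [[(0, 0)], [(10, 10)]]
tracks = associate(tracks, [(9, 9), (1, 0), (50, 50)], max_dist=5)
```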
[0053]
Next, specific examples of the present invention will be described below.
As an embodiment of FIG. 1, consider a system that monitors persons indoors. It is often difficult to extract and track a person correctly with a single camera, for example when the floor area is large, when the lower part of the target object is hidden by a desk or the like, or when the target casts a reflection or shadow on the floor. Therefore, two or more cameras are installed, and a plurality of moving objects are tracked by integrating and identifying multi-viewpoint moving object information using the automatically generated template matching method. In this way, such a system can be realized.
[0054]
Likewise, in an outdoor video surveillance system there are many scenes in which vehicles and people are mixed in a wide monitoring area. In this case as well, two or more cameras are installed, and a plurality of moving objects are tracked by integrating and identifying multi-viewpoint moving object information using the automatically generated template matching method, so that a simple system can be realized.
[0055]
As shown in FIG. 2, in a surveillance system for a highway or general road, which is not very wide but extends a long way horizontally, cameras are installed as in FIG. 2 so that their monitoring areas overlap regularly. A plurality of vehicles are then tracked from the captured images by integrating and identifying multi-viewpoint vehicle information using the automatically generated template matching method, so that a highly accurate, simple system can be realized.
[0056]
[Effects of the Invention]
As described above, in the present invention, a moving object in an automatic monitoring system or the like is tracked by integrating and identifying multi-viewpoint moving object information using an automatically generated template matching method. This makes it possible to track a moving object with high accuracy by simple processing that exploits the characteristics of moving image processing, and the automatically generated template matching method makes it easy to obtain the positions of overlapping or stopped objects.
[0057]
Further, the difficulty and inefficiency of the pattern matching required in stereo processing can be avoided. In addition, a wide area can be observed by distributing the camera installation positions.
[Brief description of the drawings]
FIG. 1 is a block diagram of a moving object extracting apparatus according to a first embodiment of the present invention.
FIG. 2 is a layout example of a plurality of cameras in the moving object extraction device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of searching for a template at the time of generation or position change in the template processing means of the moving object extraction device according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing a template processing unit of the moving object extracting apparatus according to the first embodiment of the present invention.
FIG. 5 is a structural diagram of a template of the moving object extraction device according to the first embodiment of the present invention.
FIG. 6 is a diagram illustrating a processing example of an integrated processing unit of the moving object extraction device according to the first embodiment of the present invention.
[Explanation of symbols]
1 Input means
2 Moving area extraction means
3 Labeling processing means
4 Shape information extraction processing means
5 Template processing means
6 Integrated processing means
7, 8, 9 Camera image processing systems
10 Output processing means
11 Edge extraction processing means
12 Matching processing means
13 Edge feature processing means
14 Template generation / position change processing means
15 Template post-processing means

Claims

1. A moving object extraction device that receives a plurality of camera images and tracks a moving object in an observation area that can be photographed by those camera images, comprising: a plurality of camera image processing systems, each including input means for inputting a camera image, moving region extraction means for detecting moving regions from the image data obtained from the input means, labeling processing means for detecting, for each moving region obtained from the moving region extraction means, moving region information such as the size of the region and the number of pixels, shape information extraction means for extracting, using the information on each moving region detected by the labeling processing means, the similarity between the moving region and the target moving object, and template processing means that, based on the similarity and using the image data obtained from the input means and the moving region information obtained from the shape information extraction means, updates the closest template if a template already exists in the search range, generates a template if no template exists in that range, and outputs the template information as moving object information; integration processing means for integrating the information from the plurality of camera image processing systems; and output processing means for processing and outputting the position and classification information of the moving object obtained from the integration processing means.

2. The moving object extracting apparatus according to claim 1, wherein each of the observation areas photographed by a plurality of cameras is photographed in an overlapping manner by two or more camera images.

3. The moving object extraction device according to claim 1, wherein the plurality of cameras are installed in sets of four, and when the cameras of a set are referred to as camera 1, camera 2, camera 3, and camera 4, cameras 1 and 2 and cameras 3 and 4 are each arranged in parallel or facing inward, cameras 1 and 2 are installed facing cameras 3 and 4, and the number of cameras is increased so that any observation area is photographed in an overlapping manner by two or more camera images.

4. The moving object extraction device according to claim 1, wherein the input unit is a visible camera, an infrared camera, or a combination thereof.

5. The moving object extraction device according to claim 1, wherein the integration processing means calculates the position of the moving object in the observation area using the template position, the template object classification, the template serial number, and the estimated lower-end position of the region in the target image at the template position, obtained from the template processing means of each camera image processing system.

6. The moving object extraction device according to claim 1, wherein, in the integration processing means, the installation position of each camera in the observation space and the mapping function between pixels in the video and plane positions in the observation space are known, the center point A of the template extracted in the template processing of each video processing system and the estimated lower-end point B of the target region indicated by that template are plotted at plane positions in the observation space, points A and B are connected by a straight line, and the intersections of the straight lines obtained from a plurality of video processing systems are taken as candidate positions of the target object.

7. The moving object extraction device according to claim 6, wherein the output processing means associates positions along the time axis by shortest-distance processing in the observation space, using the candidate positions of the target object obtained from the integration processing means, the positions of the target object obtained so far, and the target classification of the template.

8. The moving object extraction device according to any one of claims 1 to 7, wherein the template processing means applies two-dimensional filtering to the camera image and generates a template based on the extracted edge image data.

9. The moving object extraction device according to any one of claims 1 to 8, wherein the shape information extracting means calculates the similarity to the target object, for each moving region extracted by the labeling processing means, from the size of the region and the number of pixels indicating the size of the region, and, based on the similarity, extracts the position information at which a template indicating the object is to be generated.

10. The moving object extraction device according to claim 1, wherein the shape information extracting means calculates similarities to a plurality of kinds of target object, such as a person and a car.

11. The moving object extraction device according to claim 1, wherein, when generating a template, the template processing means gives the template, as its attributes, a template number, a target object indicating what the target object was when the template was generated, the current position of the template, the original position from before the position was updated, the number of updates, a disappearance flag that is set when the template becomes a candidate for disappearance, and TXS × TYS pixel values representing the contents of the template (where TXS and TYS indicate the size of the template).

12. The moving object extraction device according to claim 1 or 11, wherein the template processing means generates a template with the set of template attributes when the similarity extracted by the shape information extracting means is equal to or greater than a predetermined threshold.

13. The moving object extraction device according to claim 1, 11, or 12, wherein, when the similarity extracted by the shape information extracting means is equal to or greater than a predetermined threshold, the template processing means generates the template at the template setting position extracted by the shape information extracting means as the appropriate position for generating a template.

14. The moving object extraction device according to any one of claims 1 and 11 to 13, wherein, when the similarity extracted by the shape information extracting means for an extracted moving region is equal to or greater than a predetermined threshold, the template processing means searches around the moving region and, if a template for the same target object is found, updates the position of that template and replaces the contents of the template with the pixel values at the new position.

15. The moving object extraction device according to claim 1, wherein, each time a video signal is transferred from the input means, the template processing means performs, for each template whose position has not been updated, template matching centered on the current position of the template, moves the template to the position with the smallest error, increases the number of updates that is an attribute of the template, and sets the position before the move as the original position.

16. The moving object extraction device according to claim 1, wherein, each time a video signal is transferred from the input means, the template processing means obtains the motion vector of each template after performing position update processing for all templates, and, if a plurality of templates have the same motion vector, sets the disappearance flag on one of those templates according to its position and number of updates to indicate that it has become a candidate for erasure, and erases a template when its disappearance flag has been set continuously.