JP3914864B2

JP3914864B2 - Pattern recognition apparatus and method

Info

Publication number: JP3914864B2
Application number: JP2002351820A
Authority: JP
Inventors: Osamu Yamaguchi (山口 修)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-12-13
Filing date: 2002-12-03
Publication date: 2007-05-16
Anticipated expiration: 2022-12-03
Also published as: JP2003242509A

Description

【０００１】
【発明の属する技術分野】
本発明は、パターン認識方法に関する。
【０００２】
【従来の技術】
画像から特定の物体の位置、姿勢、形状を検出、認識する技術は、コンピュータビジョンの中で重要である。従来より、複雑な背景の下で対象の位置、大きさを正しく検出するための様々な方法が考えられている（非特許文献１参照）。
【０００３】
この中の特徴点ベースの対象検出法として、ジオメトリックハッシング法（Geometric Hashing)（非特許文献２参照）がある。
【０００４】
このジオメトリックハッシング法は、複数の特徴点の集合によって幾何学的な対象を表現し、平行移動、拡大、回転に不変な構造表現を用いてモデルの記述を行う。そして、ハフ（Ｈｏｕｇｈ）変換（非特許文献３参照）に類似した、対象モデルに関して仮説の投票と多数決原理によって、対象を検出するというものである。この方法では、対象の識別と検出を同時に行うことができる。
【０００５】
なお、ここでハッシングとは、複数の検索対象に対して高速な検索を行うために、ハッシュ関数による索引を付加することで、検索対象を効率的にメモリ上に分散させ、高速な対象へのアクセスを可能とするアルゴリズムの名称である。
【０００６】
一方、見え方指向（Appearance based）の考え方として、濃淡情報のテンプレートマッチングを基本としたいくつかの方法がある（非特許文献４参照）。
【０００７】
この中の濃淡情報を用いたマッチングは、パターン全体の大域的な情報を用いているため、ノイズに対してロバストである。
【０００８】
【非特許文献１】
L.G.Brown:”A Survey of Image Registration Techniques",ACM Computing Surveys,Vol.24,No.4,pp.325-376（白井（訳）：「画像の位置合わせ手法の概観」，コンピュータサイエンス acm computing surveys '92 bit別冊,pp.77-120)
【０００９】
【非特許文献２】
Lamdan,Y.and Wolfson,H.J.:”Geometric Hashing:a general and efficient model-based recognition scheme",Proceeding of International Conference Computer Vision,pp.238-249,1988
【００１０】
【非特許文献３】
P.V.C.Hough,”Method and means for recognizing complex patterns."U.S.Patent 3069654,1962.
【００１１】
【非特許文献４】
H.A.Rowley,S.Baluja,and T.Kanade,”Rotational Invariant Neural Network-Based Face Detection",in Proceedings,IEEE Conference on Computer Vision and Pattern Recognition.pp.38-44,1998
【００１２】
【発明が解決しようとする課題】
ジオメトリックハッシング法は、様々な拡張がなされてきた。しかし、幾何学的な対象を検出する方法には適しているが、テクスチャを持つ対象、例えば、図２−２０１に示した顔のように、目、鼻、口などといった部品や頬の陰影などの濃淡情報をもつ複雑な対象では困難となる。
【００１３】
例として、図２−２０２のように特徴点検出を行って、図２−２０３の特徴点集合から検出を行うといった場合を考える。この場合、点集合だけでは顔のモデルを表現することが困難であり、複雑な背景下ではさらに余分な候補点が増加しエラーを引き起こすことになる。
【００１４】
一方、見え方指向の考え方においても、文献４の顔検出の例からわかるように、予め回転させたテンプレートを複数個用意するなどの、モデル表現を多重にする必要や、複数の（テンプレート）マッチングを数段階のピラミッド画像に対して行う必要がある。すなわち、複数の変形（変換）に対するテンプレートを多重にもつ必要がある。また、全体的なパターンの類似度の評価にはロバストであるが、部分的な変形の発見や局所的なノイズには弱いといった側面もある。さらには、従来の顔領域検出で行われているように、検出精度を向上させるために、誤認識パターンを利用する方法がある。その面においても、誤認識パターンをどのようにして収集するのがよいかという問題が残る。
【００１５】
従来のこれらの検出法を改善するために、要求される性質としては、
・部分的な情報の登録とモデルの多重記述
・姿勢を規定しないための不変量に基づいたモデルの記述法
・ノイズに強い検出メカニズム
などが挙げられる。
【００１６】
そこで、本発明では、これらの性質を満たすように、ハッシュテーブルを用いた部分画像の分散モデル表現と効率的なパターン検索により、高速かつ正しい精度のよい物体認識、物体抽出を行うことを目的とする。
【００１７】
【課題を解決するための手段】
本発明は、認識対象物が撮影された画像と、予め登録したモデルとを比較して、前記認識対象物の認識を行うパターン認識装置であって、前記認識対象物が撮影された画像を入力する画像入力手段と、前記入力した画像中から複数の特徴点を抽出する特徴点抽出手段と、前記抽出された複数の特徴点の中から３点の特徴点の組み合わせを選択する特徴点選択手段と、前記選択された３点の特徴点の中の一つの特徴点を原点して他の２つの特徴点への２つの基準ベクトルと、これら２つの基準ベクトルの間の角度θを含む基底を計算する基底計算手段と、前記計算された各基底における前記２つの基準ベクトルのそれぞれの長さＬ１、Ｌ２を同じ長さに変換し、前記角度θを直角に変換して、前記認識対象物の部分パターンを前記画像から抽出する部分パターン抽出手段と、２つの基準ベクトルの長さＬ１、Ｌ２の比と、前記２つの基準ベクトルの間の角度θからなるインデックスパラメータに基づいて分割された複数の登録場所から構成され、かつ、前記モデルの部分パターンが、その部分パターンに関するインデックスパラメータである前記２つの基準ベクトルのそれぞれの長さＬ１、Ｌ２の比と、前記２つの基準ベクトルの間の角度θに対応した登録場所にハッシュ関数によって登録されたテーブルを記憶するテーブル記憶手段と、前記抽出された認識対象物の部分パターンに対応する前記基底の２つの基準ベクトルについて同じ長さに変換する前のそれぞれの長さＬ１、Ｌ２の比と、前記角度θに基づいて、前記記憶されたテーブルの登録場所をハッシュ関数によって決定するインデックス検索手段と、前記決定されたテーブルの登録場所に登録された前記モデルの部分パターンと、前記認識対象物の部分パターンの類似度とを判定するパターン類似度計算手段と、を有することを特徴とするパターン認識装置である。
【００２６】
本発明であると、テーブルを用いた部分パターンによる分散モデル表現と効率的なパターン検索が可能となる。
【００２７】
また、部分パターンを高速に検索し、複数の検索結果を用いて、正しい物体の位置、姿勢を検出、また対象の識別が可能である。
【００２８】
また、ノイズ、隠れを含んだ対象の検出、識別も安定して行うことが可能となる。
【００２９】
また、テーブル登録手段での効率的なモデルの保存が可能となり、検索時の時間短縮などの効果がある。
【００３０】
【発明の実施の形態】
（第１の実施例）
以下、本発明の第１の実施例のパターン認識装置について図面に基づいて説明する。
【００３１】
本実施例は、ある画像中から対象とする物体の位置と姿勢を検出する方法である。認識対象物とする物体はどのようなものでも構わないが、本実施例では、図４−４０１のような、模様を持った（一つの面に「Ａ」の文字が記載された）直方体の箱の場合を説明に用いる。
【００３２】
本実施例では、この箱の任意の位置と任意の姿勢を写した画像を一つのモデルとして、様々な位置や姿勢の箱に関する画像を複数のモデルとして登録する。
【００３３】
そして、同じ箱が撮影された検出対象画像について、どのモデルと類似しているかを判断して、その検出対象画像に撮影された箱の位置と姿勢を認識するものである。
【００３４】
そこで、以下の説明では「モデルの登録」と「投票によるモデルの検出と識別」の２つのフェーズに分けて説明する。
【００３５】
（１）パターン認識装置の構成
図１は、本実施例のパターン認識装置の構成図である。
【００３６】
図１に示すように、パターン認識装置は、画像入力部１、特徴点抽出部２、特徴点選択部３、基底計算部４、部分パターン構成部５、インデックス計算部６、テーブル登録部７、パターン類似度計算部８、仮説情報生成部９及び物体認識部１０から構成される。
【００３７】
このうち画像入力部１は、ＣＣＤカメラなどによって構成され、その他の特徴点抽出部２、特徴点選択部３、基底計算部４、部分パターン構成部５、インデックス計算部６、テーブル登録部７、パターン類似度計算部８、仮説情報生成部９及び物体認識部１０の各機能は、パソコンなどのコンピュータに記憶されているプログラムによって実現される。
【００３８】
（２）モデルの登録
認識対象物のモデルの登録方法について図１、図３、図４を用いて説明する。
【００３９】
（２−１）モデル登録の処理の流れ
処理の流れを図３のフローチャートに基づいて説明する。
【００４０】
まず、図１の画像入力部１において、認識対象物を含む画像を入力する（図３のステップ３０１）。
【００４１】
次に、特徴点抽出部２において、その画像に対して、特徴点の検出を行う（図３のステップ３０２）。
【００４２】
この特徴点の検出の手法としては、非特許文献５〔福井和広、山口修：“形状情報とパターン照合の組合せによる顔特徴点抽出”，信学論(D-II),vol.J80-D-II No.8(1997)〕で提案している分離度フィルタによるものでよい。また、角点（コーナー）を検出するHarrisのフィルタ〔非特許文献６参照：C.J.Harris and M.Stephens,A combined corner and edge Detector.In Proc.4th Alvey Vision Conference,Manchester,pages 147-151,1988.〕のようなものを用いてもよく、用途や認識対象物に合わせて方式を選択すればよい。
【００４３】
次に、特徴点選択部３において、認識対象物を含む部分の特徴点を選択するが、ここでは、特徴点の中から、３つの点の組み合わせを全て求める（図３のステップ３０３）。
【００４４】
そして、その組み合わせ毎に部分パターンの登録を行う。それぞれの３点の組み合わせから、基底計算部４にて、２つのベクトルを決定し、それに基づいた基底情報を用いて、部分パターン構成部５において、周辺の領域の部分濃淡パターンを切り出す（図３のステップ３０５）。
【００４５】
それぞれの切り出された部分濃淡パターンは、インデックス計算部６において、ハッシュ関数によって登録場所が計算され（図３のステップ３０６）、テーブル登録部７が管理するハッシュテーブルに登録される（図３のステップ３０７）。
【００４６】
これを繰り返し、全ての組合わせが求まった時点で一つのモデルの登録が終了する（図３のステップ３０４）。
【００４７】
複数のモデルを登録する場合には、上記処理を繰り返す。
【００４８】
（２−２）モデル登録の具体例
モデル対象である箱が撮影された図４−４０１の画像をモデル登録する場合について説明する。
【００４９】
モデル対象である箱の特徴点を検出した結果を図４−４０２に示す。
【００５０】
これら特徴点の中から、予め主要な点（以下、着目点という）を選択しておき、それらの点の３点の組み合わせを全て求める。この選択した３点の特徴点の組み合わせを特徴点グループという。
【００５１】
図４−４０３は、これら特徴点グループの中から１つの特徴点グループを図示したもので、その３点の特徴点から２つの基準ベクトルを求める。
【００５２】
図４−４０４は、２つの基準ベクトルで張られる基底を示したもので、２つの基準ベクトルのそれぞれの長さＬ１、Ｌ２とその間の角度θを計算しておく。
【００５３】
なお、「基底」とは、いくつかの基準ベクトルから、座標系を構成するもので、原点と座標軸の情報をもつものである。ここでは、２つの基準ベクトルの始点を原点とし、各基準ベクトルの方向を座標軸の方向とする。そして、特徴点グループの定義として、特徴点Ｆ１、Ｆ２、Ｆ３でグループを形成する場合に、特徴点Ｆ１を原点としたものと、特徴点Ｆ２を原点としたものとでは異なるグループを形成するものとする。
【００５４】
次に、その基底に対して、その原点と２つの基準ベクトルの周辺の画像パターンを切り出す（図４−４０５）。切り出しを行う場合、２つの基準ベクトルが張る基底の座標系を直交座標系に変換して画像パターンを切り出す。
【００５５】
この画像を切り出すために、２つの基準ベクトルが張る基底の座標系を直交座標系に変換する。すなわち、基準ベクトルのなす角度は直角に、２つの基準ベクトルの長さは同じになるように、画像全体を変形させる。
【００５６】
その後、基底の原点を中心とした、予め決められた範囲の濃淡画像（ｍ×ｎピクセルの画像）を部分パターンとして切り出す。
【００５７】
切り出された濃淡画像は、各種の幾何学的変換に対して不変な形式となる。このように濃淡画像に対して予め座標変換を行って登録、検索することで、後のパターン同士の比較の際に、濃淡画像に対する幾何学的変換を行うことなく類似性を判断することが可能となる。
【００５８】
図４−４０６が、切り出された部分濃淡画像を表しており、本実施例では、正方形の濃淡画像を表している。
【００５９】
切り出された部分画像は、基準ベクトルの長さ、方向によっては、歪んだ見え方になる場合がある。
【００６０】
図４−４０７は、同様に別の３点の組み合わせから、基底の構成と部分パターンの切り出しを行ったものである。
【００６１】
それぞれの切り出された部分パターンは、ハッシュ関数によって計算されたテーブル（以下、ハッシュテーブルという）の所定の登録場所に登録する。この登録内容は、部分濃淡画像に加え、着目点の種類や、認識対象物全体における部分パターンの相対的な位置情報などを同時に含んでもよい。これについては後述する。
【００６２】
（２−３）ハッシュテーブルへの登録
ハッシュ関数の持つ性質として、与えた基底に対して、平行移動、回転、拡大縮小変換に対しての不変性を利用する。ある同じ条件の基底を持つものは、これらの変換を施したとしても、同じ返り値を返す関数を定義する。
【００６３】
ここで、基底を構成する２つの基準ベクトル間の角度と２つのベクトルの長さの比については、平行移動、回転、拡大、縮小といった幾何学的変換に対して不変であるため、この不変性を用いたハッシュ関数の構成を考える。すなわち、２つの基準ベクトル間の角度と２つのベクトルの長さの比を、幾何学的変換に関して不変なパラメータ（以下、インデックスパラメータという）として、ハッシュテーブルの登録場所を決定する。
【００６４】
ハッシュ関数Ｈは、以下のように３点の特徴点ｐ１，ｐ２，ｐ３を引数として与え、インデックスパラメータである長さの比Ｒ_ａｘｓと角度θ_ａｘｓを求めた後、量子化し、ハッシュテーブルの位置を返す。ここで、ｐ１、ｐ２、ｐ３の各位置は、絶対座標系の原点を基準にして表されている。また、基底における原点は、特徴点ｐ１とする。
【００６５】
【数１】

図４−４０８は、２次元的にハッシュテーブルを表しており、縦軸は２つの基準ベクトルの長さの比Ｒ_ａｘｓ、横軸は２つの基準ベクトルのなす角度θ_ａｘｓを表すものとする。
【００６６】
ここで、各基底毎に求まった２つのベクトルの間の角度θ_ａｘｓ、長さの比Ｒ_ａｘｓを求め、その値が示すハッシュテーブルの位置に、切り出した部分濃淡パターンなどを登録する。
【００６７】
なお、これらのθ_ａｘｓ、Ｒ_ａｘｓについては、誤差も考慮して、適当な量子化を行ってよい。
【００６８】
各ハッシュテーブル上の（登録される）実体は、次のように表現する。なお、この形式、種類に限定されるものではない。
【００６９】
【数２】

Ａｘｓは、３つの特徴点に基づいた座標系情報を表す。具体的には３点の座標位置、２つの基底ベクトルの情報などを含む。
【００７０】
Ｌａｂｅｌは、３つの特徴点がどの着目点に対応しているかを表す。具体的には、図７−７０７で示されるように、各頂点にラベル（Ｆ１〜Ｆ１２）がつけられており、Ａｘｓで示される３点のそれぞれが、どのラベルに対応しているかを記述しておく。
【００７１】
Ｉｎｖは、変換に対する不変量であり、先に示したθ_ａｘｓ、Ｒ_ａｘｓなどが含まれる。
【００７２】
Ｒｅｌは、物体を囲む領域を表現した点集合を相対的に表現したもので、検出結果の場所を示すためや再度その領域を抽出することなどに利用される。具体例としては、図９のような矩形を表すための頂点集合の４点の座標位置で記述する。
【００７３】
ＳＰａｔは、特徴点に囲まれた局所画像であり、切り出した部分濃淡パターンを示す。
【００７４】
それぞれの組み合わせ全てに対して同様の処理を行い、モデルを登録する。
【００７５】
これまで１枚の画像に対しての処理を述べたが、同じ認識対象物で撮影条件が異なる別の画像を用いて、同様に登録を行うことで、さまざまな環境で撮影された認識対象物の認識が可能となる。
【００７６】
（２−４）ハッシングアルゴリズムにおける衝突の対応
しかし、このような登録を逐次行っていった場合、ある３点の組み合わせにより同じハッシュ関数の値を持つ場合、その部分パターンが同じ場所に登録されてしまうことになる。すなわち、一般のハッシングアルゴリズムにおける“衝突”(collision) が発生する。
【００７７】
図５−５０１のようにハッシュテーブルの各位置には、複数の候補パターンをリスト構造としてつなげて複数個持てるようにし、衝突が発生した場合には、図５−５０２のように新たな照合の候補として複数個を持つ。この方法は一般にチェイン法と呼ばれている。照合の候補が多くなった場合には、図５−５０３のように、多数の候補が存在することとなる。
【００７８】
ここで、同じ認識対象物の複数のモデルの登録を行った場合、各照合データのパターンの類似度を判断し、その類似度が高い場合は、一つのモデルとして併合してしまうことを考える。
【００７９】
例えば、図５−５０４のように、パターン同士の類似度が似ているなどの条件を満たす場合には、新たなモデルのパターンを用意し、一つにまとめる。まとめ方としては、複数のモデルの平均パターンを作成するなどが一例として挙げられる。また、一つにまとめる場合に、導入した平均パターンが元のモデルとの類似度が低い場合は、別のモデルとして別に扱うようにしてもよい。
【００８０】
（３）投票によるモデルの検出と識別
次に、画像中から認識対象物を検出する方法について説明する。
【００８１】
認識対象物の検出アルゴリズムをわかりやすく説明するために、ハッシュテーブルを用いたモデルの選択、仮説の生成と、仮説の統合、検証による認識とに分けて説明する。
【００８２】
（３−１）ハッシュテーブルを用いたモデルの選択
（３−１−１）モデルの選択処理の説明
ハッシュテーブルを用いたモデルの選択の処理の流れについて図６に基づいて説明する。
【００８３】
まず、認識対象物とする画像を、図１の画像入力部１に画像を読み込む（図６のステップ６０１）。
【００８４】
次に、特徴点抽出部２において、認識対象物とする画像に対して特徴点の抽出を行う（図６のステップ６０２）。
【００８５】
次に、検出された特徴点から、特徴点選択部３において、検出された特徴点の組合わせを選び（図６のステップ６０３）、組み合わせが全て選ばれるまで（図６のステップ６０４）、逐次行われる。
【００８６】
各組み合わせに対して、基底計算部４にて基底を計算する（図６のステップ６０５）。
【００８７】
そして、その基底におけるハッシュ関数のインデックスパラメータをインデックス計算部６にて計算する（図６のステップ６０６）。
【００８８】
テーブル登録部７において、そのインデックスパラメータに対応するハッシュテーブルの登録場所を検索する（図６のステップ６０７）。
【００８９】
これは、登録パターンが存在するかどうかにより判断が分かれる（図６のステップ６０８）。
【００９０】
登録パターンが存在する場合は、部分パターン構成部５で、周辺の部分パターンを切り出し、登録パターンと部分パターンとの類似度をパターン類似度計算部８において計算して比較する。なお、パターン同士の類似度の計算法については、一般の濃淡パターンに対する類似度の計算法、例えば、正規化相関、ＳＳＤ、単純類似度などでもよい。このパターン同士の類似度計算の方法については問わない。
【００９１】
登録パターンが存在しない場合は、モデルが存在しないため、類似度の計算は行われない（図６のステップ６０８）。
【００９２】
モデルの選択が行われた後、３点の組み合わせによる周辺領域の部分パターンが類似している場合、検出したい対象領域の一部である可能性を持っていることから、仮説情報生成部９において、検出対象領域の仮説情報を生成する。これを全ての検出された特徴点の組合わせについて行い、仮説情報の生成処理を繰り返す。これについては、後述する。
【００９３】
（３−１−２）モデルの選択処理の具体例
これまでの手順を図７を用いて具体的に説明する。
【００９４】
本処理の目的は、入力した画像に基づいて、先に説明したモデル登録で登録を行った箱の位置、姿勢の検出を行うことである。
【００９５】
図７−７０１が、認識対象物である箱が撮影された入力画像である。この画像からモデル登録と同様に特徴点検出を行う（図７−７０２）。
【００９６】
次に、特徴点の中からモデル登録と同様にある３点を選択する。
【００９７】
その３点に基づいて作成した基底情報により、周辺の領域の濃淡パターンを切り出す（図７−７０３）。
【００９８】
次に、基底情報の不変量に対応するハッシュテーブル（図７−７０４）の場所を検索する。
【００９９】
登録パターンが存在し、かつ、類似度が設定した閾値を超えた場合、仮説が生成される。
【０１００】
図７−７０５のように、３つの適合するパターンがあった場合、３つのそれぞれの仮説が生成される。
【０１０１】
（３−２）仮説情報の生成
仮説情報の内容としては、特徴点の場所の情報、位置や大きさの情報、濃淡パターン情報、モデルパターンとの類似度などの情報を含む。一例として仮説を次の３つ組で定義する。なお、仮説情報の定義はこれに限らない。
【０１０２】
【数３】

ＴＲｅｌは、相対的な位置情報であるＲｅｌを選択した３点にしたがって変換した位置情報であり、画像中での物体の存在領域を表している。
【０１０３】
ＴＬａｂｅｌは、特徴点の中で、どの着目点に対応するのかを記述するためのものである。
【０１０４】
Ｐｓｉｍは、モデルとして登録されている濃淡パターンと選択された部分パターンとの類似度を表す。
【０１０５】
検出時に選択された３点から決定されるパターン情報をハッシュテーブルに記述されているものと同様に次のように表す。
【０１０６】
【数４】

なお、φは空集合を表し、Ｌａｂｅｌ_ｘがφとなっているのは、着目点の情報は３点の選択だけでは決定しないためである。
【０１０７】
また、同じハッシュ関数の値を持つ部分に存在する部分モデルを
【数５】

とする。
【０１０８】
すなわち、検索の結果で同じ場所にあるため、Ｉｎｖ_ｘ＝Ｉｎｖ_ｍが成立している。仮説情報を生成するためには、以下のような３つの関数によってそれぞれの要素が計算される。
【０１０９】
【数６】

ＦｕｎｃＬａｂは、対応する着目点のラベルを計算する関数である。ＦｕｎｃＬａｂは、Ａｘｓ_ｘの各点に対し、Ａｘｓ_ｍの順序に対応したＬａｂｅｌ_ｍのラベルをＴＬａｂｅｌとして返す。
【０１１０】
ＦｕｎｃＧｅｏｍは、選択された基底に基づいて物体の存在位置を計算する関数である。
【０１１１】
具体的には、以下の式で計算される。ここで（．，．）は座標を表す。
【０１１２】
【数７】

となる。
【０１１３】
ＦｕｎｃＳｉｍは、濃淡パターン同士の類似度を計算する関数である。パターン同士の類似度を求める方法には、前述したように様々なものがあるが、例えば、単純類似度の場合は以下の式で計算される。
【０１１４】
【数８】

各ハッシュテーブルに登録されている部分モデルの内容に基づき、それぞれの仮説の内容を計算する。なお、先に類似度Ｐｓｉｍの値に対して閾値を設定するように説明したが、類似度値は低いものについても、すべて仮説情報を生成するという使い方でもよい。
【０１１５】
各仮説情報は、検出対象領域の一部であるという情報をもっているため、これらの情報を統合することによって物体の認識が行える。
【０１１６】
（３−３）仮説情報の統合、検証による認識
部分パターンの類似度を用いて、検出対象領域の一部であるという仮説情報を前述のように生成し、全ての仮説情報を仮説空間に投票し、結果を統合する。これは、物体認識部１０において処理される。
【０１１７】
本実施例では、モデルの登録時に設定した着目点の場所を特定できるような検出方法について説明を行う。
【０１１８】
（３−３−１）仮説投票処理の具体例
仮説情報を投票するための仮説空間は、物体認識部１０に含まれる仮説統合部で管理される。図７−７０６は、仮説空間を図的に表したものである。図７−７０７は、モデルの登録の際に指定した着目点の場所を表しており、仮説空間は、それぞれの着目点に対応したそれぞれの仮説投票箱により構成される。
【０１１９】
図７−７０３に示した３点が選ばれた場合の仮説として、図７−７０５のようにパターンの類似度が閾値を超えたものが、３つの存在する場合を考える。それぞれの仮説情報はどの着目点の組み合わせで得られたものであるかを記述している。例えば、着目点Ｆ８、Ｆ４、Ｆ５である。
【０１２０】
それぞれの着目点Ｆ１、Ｆ２、・・・・、Ｆ１２に対応する仮説投票箱に、パターンの類似度値だけ投票を行う。すなわち、Ｌａｂｅｌに記述されている仮説投票箱にＰｓｉｍの示す類似度値を加算する。
【０１２１】
なお、通常、同じ着目点に対しての投票については、複数の別の特徴点位置を持つものが存在する。この場合、各仮説投票箱で、異なる座標値をもつものは、それぞれの座標ごとに投票値を管理しておく。これは、複数の物体を検出する際や間違った検出内容を削除する際に有用である。
【０１２２】
このように、全ての組み合わせに関する全ての仮説情報に対して、投票を行う。
【０１２３】
次に、投票値の多かったものを組み合わせから、順に得票値の高いものを各特徴点の検出結果として、その座標値を出力する（図７−７０８）。この座標値が、認識対象物の画像に撮影されている箱の特徴点の座標値となる。
【０１２４】
これについては、ある閾値を設定し、その設定値以上の得票値を得たもののみを出力するなどの処理を行って良い。図７の例では、３つの特徴点が選択され、その位置に着目点として指定されている特徴点が検出されている。
【０１２５】
なお、選択された特徴点の組み合わせから、全体のパターンの位置がどこにあるのかを計算し、それを出力としてよい（図７−７０９）。この場合は、物体の存在位置を示すＴＲｅｌの情報を組み合わせた結果とすればよい。
【０１２６】
（３−３−２）仮説投票処理の説明
図８は、仮説統合部の処理フローチャートを示す。
【０１２７】
図８のステップ８０１で、一つの仮説情報を受け取ると、その仮説情報を構成する３つの特徴点それぞれの着目点の種類に対し、各着目点の座標値が既に同じものがあるかどうかを調べる（図８のステップ８０２）。
【０１２８】
同じものがある場合は、その仮説投票箱の投票値に先に求めた類似度を加算し、投票値を更新する（図８のステップ８０３）。
【０１２９】
同じものがない場合は、結果となる座標が異なるため、その仮説投票箱のなかに別のカウンターを用意し（図８のステップ８０４）、類似度を加算して更新を開始する。全ての仮説情報の投票が終了したかどうかを判定し（図８のステップ８０５）、終了していなければ、仮説情報の投票を繰り返す。
【０１３０】
終了した場合、それぞれの仮説投票箱の投票値の中から、得票値の高いものを出力し、検出結果とする（図８のステップ８０６）。
【０１３１】
（第２の実施例）
次に、第２の実施例について図９，１０に基づいて説明する。
【０１３２】
第２の実施例では、部分的な画像に加えて、大域的な画像の切り出しを行ったパターンを併用した検出方法について述べる。
【０１３３】
（１）顔領域検出に対する大域画像を利用した認識法
図９は、顔領域検出に対する大域画像を利用した認識法に関して、各種のデータについて図示したものである。
【０１３４】
図９−９０１は顔を含む画像である。着目点として、３つの特徴点（両鼻孔と片側の口端）が選択されており、大域画像（顔全体）の領域を表す矩形を示している。
【０１３５】
図９−９０２は、その３点から張られる基底を表し、その間の角度、ベクトルの長さ比は、ハッシュテーブルのインデックスとして利用する。
【０１３６】
図９−９０３は、その３点から張られる基底により、切り取られる周辺の濃淡パターンを表す。
【０１３７】
これに加え同時に図９−９０４のように、顔領域全体の濃淡パターンも切り出しておく。この場合のテーブルへの登録内容は、
【数９】

となり、全体の濃淡パターンＧＰａｔを追加したものとなる。
【０１３８】
さらに、その基底情報によって生成される座標系によって、顔全体の領域を表す矩形を図９−９０５から図９−９０６のように表現を替え、各頂点の相対的な座標Ｒｅｌを計算しておく。これは、検出された場合に相対的な全体領域の位置がどのようになるかを求めるために利用する。
【０１３９】
モデルの登録時には、前述した方法では、部分パターン（図９−９０３）を登録していたが、これに追加して全体パターン（図９−９０４）と図９−９０６で示した全体領域の相対座標位置も追加する。
【０１４０】
認識時には、同様に、部分パターンを切り出すが、それと同時に全体領域の切り出しを行うことも必要となる。選択された基底情報に対して、全体領域の相対座標位置を計算し、実際の画像上の座標位置をもとめ、その領域の濃淡パターンを全体パターン候補として利用する。
【０１４１】
全体パターンの利用法としては、モデルの検索の際に、部分パターン同士の類似度に加え、全体パターン同士の類似度計算の結果を利用することが挙げられる。
【０１４２】
（２）顔認識の具体例
顔認識を例にすると、図１０のように、（ａ）（ｂ）は異なる仮説情報によって基底が選択されている様子（図１０−１００１，１００２）を示し、それぞれ部分パターンＳＰａｔと全体パターンＧＰａｔが切り出されている。このとき、２つの異なる仮説が同じ認識対象物を示しているかどうかの判断を行うために、検出内容の示す全体パターン同士を比較することで判断を行うことができる。
【０１４３】
（ａ）（ｂ）については、全体パターンとしては、類似した場所が切り出されている（図１０−１００４，１００５）。さらに異なる仮説情報からなる検出内容（ｃ）を示す図１０−１００３では、図１０−１００６のように異なる全体パターンが切り出され、このパターンと図１０−１００４，１００５のパターンを比較することで、この仮説情報が適当でないことが判断できる。
【０１４４】
これは、モデルと認識対象物とのパターン比較ではなく、既に仮説情報として登録されているもの同士の間でのパターン比較により、同一認識対象物であるかどうかの検証に利用できる。
【０１４５】
（第３の実施例）
次に、第３の実施例について説明する。
【０１４６】
第３の実施例では、認識対象物を同定、識別する実施例である。
【０１４７】
認識対象物の姿勢がある程度規定されている場合には、同様のハッシュテーブルによるモデル表現の考え方によって認識対象物の識別が可能である。
【０１４８】
例えば、図１１のように物体Ａ（図１１−１１０１）、物体Ｂ（図１１−１１０２）に対して、特徴点抽出（図１１−１１０３，１１０４）を行い、その全ての組み合わせにより、部分パターンを切り出し、ハッシュテーブルに登録する。この際、Ｌａｂｅｌに対して登録時に物体の名称についても登録しておく（図１１−１１０５）。
【０１４９】
認識時には、同様の手順を行い、投票された特徴点を含む物体の名称を出力することで、どの物体が存在するかという認識対象物の同定も可能となる。
【０１５０】
例えば、人物認識を行う場合には、複数の人物の顔データをテーブルに登録しておき、最も得票値を得た人物を、その人物の認識結果とするなどにも応用できる。
【０１５１】
（第４の実施例）
次に、画像中から複数の対象を認識し、それらの名称とその存在領域を結果とする認識の実施例について、図１２に基づいて説明する。
【０１５２】
図１２−１２０１のように２つの物体Ａ、Ｂ（図１２−１２０４，１２０５）があり、それらの位置と名称を見つけることを目的とする。ハッシュテーブルには、いくつかの物体が予め登録されており、その名称をアルファベットで識別することにする。
【０１５３】
まず、これまでの実施例と同様に、入力画像（図１２−１２０２）に対し、特徴点抽出を行う。
【０１５４】
次に、任意の３点の特徴点を選択し、基底と部分パターンを計算する（図１２−１２０３）。
【０１５５】
基底に対して計算される不変量からハッシュテーブルの検索位置（図１２−１２０７）を決定し、そこに登録されているモデルとの比較を行う。
【０１５６】
図１２−１２０８に示すように、この例では、４つのモデルが登録されているとする。これらの登録内容は、各部分パターンと、どの物体であるかという名称のラベル、そして相対的に表した場合の、存在領域を表すための点集合Ｒｅｌが記述されている。
【０１５７】
今、４つの登録内容と、図１２−１２０６に示した部分パターンとの比較を行い、パターン同士の類似度が設定した閾値を越える場合に仮説情報が生成される。
【０１５８】
図１２では、図１２−１２０９に対応する部分パターンからのみ仮説情報が生成された場合を示し、そのパターン類似度Ｓｉｍが仮説投票箱に投票される。
【０１５９】
投票を行うための仮説空間（図１２-１２１１）は、存在が仮定される領域毎に仮説投票箱をもつ。初期状態では、仮説投票箱は存在せず、ある仮説情報が生成された時点で、その存在位置と名称毎に仮説投票箱が用意される。
【０１６０】
図１２-１２１１では、既に５つの仮説投票箱が存在しているが、異なる場所、名称の仮説情報が新たに生成される場合には、新たな仮説投票箱が用意される。また同じ場所、名称の仮説投票箱が既に存在しており、同じ場所、名称の仮説情報が生成された場合には、仮説投票箱には、その類似度の積算のみを行う。
【０１６１】
図１２−１２０９から生成された仮説情報では、既に同じ位置、名称の仮説投票箱Ｒ３が存在しているため、類似度Ｓｉｍ（図１２−１２１０）をＲ３の値に積算する。
【０１６２】
このように投票を順次行い、すべての特徴点の組み合わせについて行った結果、図１２-１２１１の投票結果から、２つの物体の検出結果を得る。図１２-１２１２はその結果を示し、２つの物体の位置と名称を得ることができる。
【０１６３】
ここで、仮説情報の仮説空間（図１２−１２１１）において、Ｒ２とＲ３は、ほぼ同じ位置に同じ物体があるが、仮説情報を逐次的に生成する過程で、２つの仮説投票箱が用意されてしまっている。このような場合には、２つを統合もしくは、いずれかを選択するという処理を行うことで、一つの結果を出力すればよい。また、Ｒ４とＲ５のように、同じ位置で異なる名称の仮説情報が存在する場合には、投票値の高いものを選択する処理を行い、結果とする。
【０１６４】
（変更例）
（１）変更例１
上記実施例では、認識対象物として箱を例に説明を行ったが、認識対象物に対してはこれを問わない。例えば、人間の顔、車両など、さまざまな物体の認識に利用可能である。複数の対象を見つけるようにも変更できる。複数の対象を見つける場合には、仮説統合部にて、空間的な位置のことなる複数の物体に対して、別々に得票値の結果をチェックすることにより実現できる。
【０１６５】
（２）変更例２
また、特徴点の組み合わせに用いた点の数は、３点に限定するものではない。その際、ハッシュ関数に用いたような長さ比や角度などの不変量の種類、数、算出法についても限定しない。例えば、透視変換に対する不変量として、５点の組み合わせから計算される複比（compound ratio）などを用いることもできる。
【０１６６】
例えば、前述した実施例では２つの不変量を用いたが、１つだけとして、一次元のハッシュテーブルでもよいし、３つめの不変量を採用し、３次元のハッシュテーブルを利用してもよい。
【０１６７】
また、ハッシュテーブルに用いるインデックスパラメータの全てが不変量である必要はない。例えば、物体の大きさの変化や方向の変化が限定されているような状況では、長さの情報や方向の情報をインデックスパラメータとして利用することで高速な検索が可能になる。
【０１６８】
（３）変更例３
ハッシュテーブルへの登録時に、誤差を考慮して、各テーブルの登録場所の近傍に重複して登録しておいてもよい。
【０１６９】
また、逆に登録はそのままで、認識時にハッシュ関数で計算された場所とその近傍を検索するようにしてもよい。
【０１７０】
（４）変更例４
上記実施例では、濃淡パターンを用いたが、その濃淡パターンを別のパターンに変化させたものでもよい。
【０１７１】
例えば、フーリエ変換、log-polar変換などを施すといったものや、エッジ画像、そのほかのパターンを用いて類似性を比較してもよい。
【０１７２】
また、濃淡画像を対象としたが、カラー画像にも適用できる。カラー画像がＲＧＢカラー表現の場合、各Ｒ，Ｇ，Ｂに対して同様の処理を行うといった適用方法や、別の色表現に変換した後、その色表現でのパターンを用いたものでもよい。
【０１７３】
さらに、類似性の比較の際に色情報を用いることができる。例えば、非特許文献７〔M. Swain and D. Ballard, "Color indexing,"Int. Journal of Computer vision, vol. 22, pages 11-32, December 1991〕で提案されている色ヒストグラムを用いて、領域内の色ヒストグラム同士を比較してもよい。
【０１７４】
（５）変更例５
高速化についてもさまざまな変更が考えられる。
【０１７５】
例えば、ランダムハフ変換で採用されているように、特徴点をランダムに選択し、投票を行うのと同様な手法をとることができる。この場合、特徴点をランダムに選択するだけでなく、生成する仮説情報をランダムに選択するなどの方法もある。
【０１７６】
（６）変更例６
また、高速化のために基底情報を構成するベクトルの長さ、角度、などに対する、いろいろな制約を利用してもよい。
【０１７７】
例えば、路面を走行する車を検出する場合に、車が１８０°回転した姿勢を探索することは必要がないため、探索範囲、パラメータに対する制限を行うことで検索の効率化ができる。
【０１７８】
また、仮説情報の生成、投票などは、並列化が可能であるため、複数の計算機構（並列計算機、グリッド環境、ＰＣクラスタ）を用いた実装などを行ってもよい。
【０１７９】
（７）変更例７
特徴点抽出部２については、分離度フィルタを用いた検出、コーナー点検出を実施例で示したが、点だけでなく、線、領域、円、円弧といった幾何学的対象でもよい。その場合、それぞれの幾何学的対象に応じた特徴点を用意し、特徴点選択部３に渡せばよい。
【０１８０】
例えば、区分的な直線検出を行い、その両端の２点を特徴点としたり、また、直線の中央点を特徴点とするといったように、幾何学的対象の検出を利用してもよい。
【０１８１】
このとき、特徴点に対しての複数の種類が選択可能な場合には、その特徴点の種類をキーとして検索を行ってもよい。例えば、円形の特徴点（Ｅ）と角点の特徴点（Ｋ）を検出可能な場合、モデル登録の際に３点を選択したとき、その組み合わせが８通り発生する。
【０１８２】
（Ｅ，Ｅ，Ｅ，）（Ｅ，Ｅ，Ｋ）（Ｅ，Ｋ，Ｋ）（Ｅ，Ｋ，Ｅ）（Ｋ，Ｅ，Ｅ）（Ｋ，Ｋ，Ｅ）（Ｋ，Ｅ，Ｋ）（Ｋ，Ｋ，Ｋ）
この情報を用いて、認識時に３点の選択を行った場合に、その組み合わせが同じかどうかを用いて、モデル選択を行ってもよい。これは、モデルの選択を高速にするだけでなく、部分パターンの周辺に、どのような特徴点が存在しているかを正しく評価することができるため、認識精度も向上する。もちろん、特徴点の種類、数は限定しない。
【０１８３】
（８）変更例８
テーブル登録部７において、実施例では、衝突が起こった際のモデルのデータ構造について、リスト構造として複数のモデルをもつことを示したが、このデータ構造に関してはこの限りではない。
【０１８４】
例えば、ツリー構造にすることで、同じインデックスパラメータをもつ複数のモデルの探索を効率化することができる。
【０１８５】
また、この同一のインデックスパラメータをもつモデルに対して、何らかの順序構造を定義して、検索の際の効率化を向上させることも可能である。
【０１８６】
（９）変更例９
本実施例では、正解パターンとの類似性のみを利用した。マッチングの際に、非正解パターンを利用することで、誤認識を軽減できる。具体例としては、非正解パターンとの類似度を投票空間から減算するなどの方法がある。
【０１８７】
以上、本発明はその趣旨を逸脱しない範囲で種々変更して実施することが可能である。
【０１８８】
【発明の効果】
本発明は、局所的な画像情報と効率的な登録方法に基づいた、認識対象物の検出を行う新しい方式である。従来の見え方指向の方法に対して、計算コストの削減が可能となる。
【０１８９】
また、高速に類似度を検知することが可能になり、検出精度が向上する。
【０１９０】
さらに、部分的な情報を用いているため、オクルージョンにも強い。またハッシュテーブルを利用した効率的なモデル探索が行われるため、高速な検索への適用が可能となった。
【図面の簡単な説明】
【図１】本発明の第１の実施例を示す物体認識装置の構成図である。
【図２】特徴点検出処理を施した画像である。
【図３】モデルの登録時のフローチャートである。
【図４】モデルの登録方法である。
【図５】モデルの衝突に関する処理方法である。
【図６】検出時のフローチャート（仮説情報の生成）である。
【図７】検出方法である。
【図８】統合のフローチャートである。
【図９】部分パターン情報についての図である。
【図１０】第２の実施例における大域パターン情報の利用についての図である。
【図１１】認識対象物の識別の図である。
【図１２】第４の実施例における説明図である。
【符号の説明】
１画像入力部
２特徴点抽出部
３特徴点選択部
４基底計算部
５部分パターン構成部
６インデックス計算部
７テーブル登録部
８パターン類似度計算部
９仮説情報生成部
１０物体認識部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a pattern recognition method.
[0002]
[Prior art]
Technology for detecting and recognizing the position, posture, and shape of a specific object from an image is important in computer vision. Conventionally, various methods for correctly detecting the position and size of an object under a complicated background have been considered (see Non-Patent Document 1).
[0003]
Among them, there is a geometric hashing method (see Non-Patent Document 2) as a feature point-based object detection method.
[0004]
In this geometric hashing method, a geometric object is represented by a set of feature points, and the model is described using a structural representation that is invariant to translation, scaling, and rotation. Then, in a manner similar to the Hough transform (see Non-Patent Document 3), the object is detected by voting on hypotheses about the object model and applying the majority rule. With this method, object identification and detection can be performed simultaneously.
[0005]
Here, hashing is the name of an algorithm that attaches an index computed by a hash function to each of a plurality of search targets, so that the targets are distributed efficiently in memory and can be accessed at high speed.
[0006]
On the other hand, among appearance-based approaches, there are several methods built on template matching of grayscale information (see Non-Patent Document 4).
[0007]
Matching that uses grayscale information in these methods is robust against noise because it uses global information from the entire pattern.
[0008]
[Non-Patent Document 1]
L. G. Brown: “A Survey of Image Registration Techniques”, ACM Computing Surveys, Vol. 24, No. 4, pp. 325-376 (Japanese translation by Shirai: “Overview of Image Registration Techniques”, Computer Science: ACM Computing Surveys '92, bit special issue, pp. 77-120)
[0009]
[Non-Patent Document 2]
Lamdan, Y. and Wolfson, H. J.: “Geometric Hashing: a general and efficient model-based recognition scheme”, Proceeding of International Conference Computer Vision, pp. 238-249, 1988
[0010]
[Non-Patent Document 3]
P. V. C. Hough, “Method and means for recognizing complex patterns,” U.S. Patent 3069654, 1962.
[0011]
[Non-Patent Document 4]
H. A. Rowley, S. Baluja, and T. Kanade, “Rotational Invariant Neural Network-Based Face Detection”, in Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, pp. 38-44, 1998
[0012]
[Problems to be solved by the invention]
Various extensions have been made to the geometric hashing method. However, although it is suited to detecting geometric objects, it has difficulty with textured objects, for example a complex object such as the face shown in FIG. 2-201, which carries grayscale information such as parts like the eyes, nose, and mouth and the shading of the cheeks.
[0013]
As an example, consider a case where feature points are detected as shown in FIG. 2-202 and detection is then performed from the feature point set of FIG. 2-203. In this case, it is difficult to represent a face model with the point set alone, and under a complicated background extra candidate points increase further and cause errors.
[0014]
On the other hand, the appearance-based approach also has drawbacks. As the face detection example in Non-Patent Document 4 shows, the model representation must be multiplexed, for example by preparing a plurality of pre-rotated templates, and (template) matching must be performed multiple times on several levels of pyramid images. That is, multiple templates must be held for a plurality of deformations (transformations). Moreover, although the approach is robust when evaluating overall pattern similarity, it is weak at finding partial deformation and weak against local noise. Furthermore, there is a method that uses misrecognition patterns to improve detection accuracy, as is done in conventional face area detection. Even there, the problem remains of how best to collect the misrecognition patterns.
[0015]
In order to improve these conventional detection methods, the required properties include:
・ Registration of partial information and multiple description of models
・ A model description method based on invariants, so that the posture need not be specified
・ A noise-resistant detection mechanism
[0016]
Therefore, an object of the present invention is to satisfy these properties and to perform fast and accurate object recognition and object extraction by means of a distributed model representation of partial images using a hash table and an efficient pattern search.
[0017]
[Means for Solving the Problems]
The present invention is a pattern recognition apparatus that recognizes a recognition object by comparing an image in which the recognition object is captured with a pre-registered model, the apparatus comprising: image input means for inputting the image in which the recognition object is captured; feature point extraction means for extracting a plurality of feature points from the input image; feature point selection means for selecting a combination of three feature points from among the extracted plurality of feature points; basis calculation means for calculating a basis that includes two reference vectors from one of the selected three feature points, taken as the origin, to the other two feature points, and the angle θ between these two reference vectors; partial pattern extraction means for extracting a partial pattern of the recognition object from the image by converting the lengths L1 and L2 of the two reference vectors in each calculated basis to the same length and converting the angle θ to a right angle; table storage means for storing a table that is composed of a plurality of registration locations divided on the basis of index parameters consisting of the ratio of the lengths L1 and L2 of the two reference vectors and the angle θ between the two reference vectors, and in which each partial pattern of the model is registered by a hash function at the registration location corresponding to the index parameters for that partial pattern, namely the ratio of the lengths L1 and L2 of the two reference vectors and the angle θ between them; index search means for determining, by the hash function, a registration location in the stored table on the basis of the angle θ and the ratio of the lengths L1 and L2 that the two reference vectors of the basis corresponding to the extracted partial pattern of the recognition object had before being converted to the same length; and pattern similarity calculation means for determining the similarity between the partial pattern of the model registered at the determined registration location of the table and the partial pattern of the recognition object.
[0026]
According to the present invention, it is possible to represent a distributed model by a partial pattern using a table and perform efficient pattern search.
[0027]
In addition, it is possible to search a partial pattern at high speed, detect the correct position and orientation of an object, and identify a target using a plurality of search results.
[0028]
In addition, objects affected by noise or occlusion can be detected and identified stably.
[0029]
In addition, it is possible to save the model efficiently in the table registration means, and there is an effect of shortening the time for searching.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
A pattern recognition apparatus according to a first embodiment of the present invention will be described below with reference to the drawings.
[0031]
This embodiment is a method for detecting the position and orientation of a target object in an image. The recognition target may be any object, but this embodiment uses for explanation a rectangular box bearing a pattern (the character “A” is written on one face), as shown in FIG. 4-401.
[0032]
In the present embodiment, an image capturing the box at an arbitrary position and in an arbitrary posture is treated as one model, and images of the box at various positions and postures are registered as a plurality of models.
[0033]
Then, it is determined which model is similar to the detection target image in which the same box is captured, and the position and orientation of the box captured in the detection target image are recognized.
[0034]
Therefore, the following description will be divided into two phases of “model registration” and “model detection and identification by voting”.
[0035]
(1) Configuration of pattern recognition device
FIG. 1 is a configuration diagram of a pattern recognition apparatus according to the present embodiment.
[0036]
As shown in FIG. 1, the pattern recognition apparatus comprises an image input unit 1, a feature point extraction unit 2, a feature point selection unit 3, a basis calculation unit 4, a partial pattern configuration unit 5, an index calculation unit 6, a table registration unit 7, a pattern similarity calculation unit 8, a hypothesis information generation unit 9, and an object recognition unit 10.
[0037]
Among these, the image input unit 1 is constituted by a CCD camera or the like, and the functions of the other units, namely the feature point extraction unit 2, feature point selection unit 3, basis calculation unit 4, partial pattern configuration unit 5, index calculation unit 6, table registration unit 7, pattern similarity calculation unit 8, hypothesis information generation unit 9, and object recognition unit 10, are realized by a program stored in a computer such as a personal computer.
[0038]
(2) Model registration
A method for registering a model of a recognition object will be described with reference to FIGS. 1, 3, and 4.
[0039]
(2-1) Model registration process flow
The flow of processing will be described based on the flowchart of FIG. 3.
[0040]
First, in the image input unit 1 of FIG. 1, an image including a recognition target is input (step 301 in FIG. 3).
[0041]
Next, the feature point extraction unit 2 detects feature points for the image (step 302 in FIG. 3).
[0042]
As the feature point detection method, the separability filter proposed in Non-Patent Document 5 [Kazuhiro Fukui, Osamu Yamaguchi: “Face Feature Point Extraction by Combination of Shape Information and Pattern Matching”, IEICE Transactions (D-II), vol. J80-D-II, No. 8 (1997)] may be used. Alternatively, a filter that detects corner points, such as the Harris filter [see Non-Patent Document 6: C. J. Harris and M. Stephens, “A combined corner and edge detector”, in Proc. 4th Alvey Vision Conference, Manchester, pages 147-151, 1988], may be used; the method may be chosen to suit the application and the recognition object.
[0043]
Next, the feature point selection unit 3 selects the feature points of the part including the recognition target. Here, all combinations of three points are obtained from the feature points (step 303 in FIG. 3).
[0044]
A partial pattern is registered for each combination. For each combination of three points, the basis calculation unit 4 determines two vectors, and using the basis information derived from them, the partial pattern configuration unit 5 cuts out the partial grayscale pattern of the surrounding area (step 305 in FIG. 3).
[0045]
For each extracted partial grayscale pattern, the index calculation unit 6 calculates a registration location using the hash function (step 306 in FIG. 3), and the pattern is registered in the hash table managed by the table registration unit 7 (step 307 in FIG. 3).
[0046]
This is repeated and registration of one model is completed when all combinations are obtained (step 304 in FIG. 3).
[0047]
When registering a plurality of models, the above process is repeated.
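The registration loop of steps 303 to 307 can be sketched in Python as follows. This is only an illustrative outline and not the implementation described in the patent: the function name `register_model`, the callables `index_fn` and `patch_fn`, the entry fields, and the dictionary-of-lists table are all assumptions introduced here. Each three-point combination is tried with each of its points as the origin, in line with the definition of a feature point group given in paragraph [0053].

```python
from collections import defaultdict
from itertools import combinations


def register_model(table, label, feature_points, index_fn, patch_fn):
    """Register every feature point group of one model image in `table`.

    table          : mapping from hash location to a list of entries (assumed layout)
    label          : hypothetical name identifying the model
    feature_points : list of (x, y) points of interest
    index_fn       : callable (origin, a, b) -> hashable table location
    patch_fn       : callable (origin, a, b) -> partial grayscale pattern
    Returns the number of entries added.
    """
    added = 0
    for trio in combinations(feature_points, 3):          # step 303
        for i in range(3):                                # each point as origin
            origin = trio[i]
            a, b = [p for j, p in enumerate(trio) if j != i]
            patch = patch_fn(origin, a, b)                # step 305
            location = index_fn(origin, a, b)             # step 306
            table[location].append(                       # step 307
                {"label": label, "basis": (origin, a, b), "patch": patch}
            )
            added += 1
    return added                                          # loop end: step 304


# Usage with placeholder callables; real ones would compute the index
# parameters and cut out the grayscale patch.
table = defaultdict(list)
count = register_model(
    table, "box", [(0, 0), (1, 0), (0, 1), (2, 2)],
    index_fn=lambda o, a, b: (0, 0),
    patch_fn=lambda o, a, b: [[0]],
)
```

Registering several models amounts to calling the function once per model image with the same `table`.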
[0048]
(2-2) Specific example of model registration
A case will be described in which the image of FIG. 4-401, in which the box serving as the model target is photographed, is registered as a model.
[0049]
The result of detecting the feature points of the model target box is shown in FIG. 4-402.
[0050]
From these feature points, main points (hereinafter referred to as points of interest) are selected in advance, and all combinations of three of those points are obtained. Each such combination of three selected feature points is called a feature point group.
[0051]
FIG. 4-403 illustrates one feature point group from among these feature point groups, and two reference vectors are obtained from the three feature points.
[0052]
FIG. 4-404 shows the basis spanned by the two reference vectors. The lengths L1 and L2 of the two reference vectors and the angle θ between them are calculated.
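As a minimal sketch of this calculation, the following Python function derives the two reference vectors, their lengths L1 and L2, and the angle θ from three points. The function name `compute_basis`, the dictionary layout, and the choice to reject coincident points are assumptions made for illustration only.

```python
import math


def compute_basis(origin, p2, p3):
    """Return the two reference vectors, their lengths, and the angle between them."""
    v1 = (p2[0] - origin[0], p2[1] - origin[1])
    v2 = (p3[0] - origin[0], p3[1] - origin[1])
    l1 = math.hypot(v1[0], v1[1])
    l2 = math.hypot(v2[0], v2[1])
    if l1 == 0.0 or l2 == 0.0:
        raise ValueError("reference vectors must have non-zero length")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (l1 * l2)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return {"origin": origin, "v1": v1, "v2": v2, "l1": l1, "l2": l2, "theta": theta}


basis = compute_basis((0, 0), (2, 0), (0, 1))
```

Here θ is returned in radians in the range [0, π].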
[0053]
A “basis” constitutes a coordinate system from several reference vectors and carries information on the origin and the coordinate axes. Here, the common starting point of the two reference vectors is the origin, and the direction of each reference vector is the direction of a coordinate axis. As for the definition of a feature point group, when a group is formed from feature points F1, F2, and F3, the group with F1 as the origin and the group with F2 as the origin are treated as different groups.
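One way to enumerate feature point groups under this definition is sketched below. The generator name `feature_point_groups` is hypothetical. Whether swapping the two non-origin points should yield a further distinct group is not stated in the text, so this sketch assumes it does not and keeps them in their input order.

```python
from itertools import combinations


def feature_point_groups(points):
    """Yield (origin, a, b) for every 3-point combination and every choice of origin."""
    for trio in combinations(points, 3):
        for i in range(3):
            origin = trio[i]
            a, b = [p for j, p in enumerate(trio) if j != i]
            yield origin, a, b


groups = list(feature_point_groups(["F1", "F2", "F3", "F4"]))
```

With n points of interest this produces three groups for each of the n(n-1)(n-2)/6 combinations.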
[0054]
Next, for that basis, the image pattern around the origin and the two reference vectors is cut out (FIG. 4-405). When cutting out, the coordinate system of the basis spanned by the two reference vectors is converted into an orthogonal coordinate system.
[0055]
In order to cut out this image, the base coordinate system spanned by the two reference vectors is converted into an orthogonal coordinate system. That is, the entire image is deformed so that the angle formed by the reference vectors is a right angle and the lengths of the two reference vectors are the same.
[0056]
Thereafter, a grayscale image (an image of m × n pixels) in a predetermined range centered on the base origin is cut out as a partial pattern.
[0057]
The grayscale image cut out in this way is invariant to various geometric transformations. Because the coordinate conversion is applied to the grayscale image in advance, before registration and search, similarity can be judged in the subsequent pattern comparison without applying any geometric transformation to the grayscale image.
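The cut-out step can be sketched as follows, as one possible reading rather than the implementation of the embodiment. The sketch samples the image on a regular grid in base coordinates, which is equivalent to deforming the image so that the two reference vectors become orthogonal and equal in length. The function name `cut_partial_pattern`, the parameters `size` and `extent`, nearest-neighbor sampling, and the zero fill outside the image are all assumptions.

```python
def cut_partial_pattern(image, p1, p2, p3, size=8, extent=1.0):
    """Cut a size x size pattern around p1 in the base (p2 - p1, p3 - p1).

    image is a list of rows of gray values; points are (x, y) tuples.
    The grid covers base coordinates from -extent to +extent on both axes.
    size must be at least 2.
    """
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p1[0], p3[1] - p1[1])
    height, width = len(image), len(image[0])
    pattern = []
    for j in range(size):
        b = -extent + 2.0 * extent * j / (size - 1)  # coordinate along v2
        row = []
        for i in range(size):
            a = -extent + 2.0 * extent * i / (size - 1)  # coordinate along v1
            x = p1[0] + a * v1[0] + b * v2[0]
            y = p1[1] + a * v1[1] + b * v2[1]
            xi, yi = int(round(x)), int(round(y))
            inside = 0 <= xi < width and 0 <= yi < height
            row.append(image[yi][xi] if inside else 0)
        pattern.append(row)
    return pattern
```

Because the sampling grid is tied to the base, the same object photographed at twice the size gives the same pattern when the corresponding three points are selected.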
[0058]
FIG. 4-406 shows the cut-out partial grayscale image, which in this embodiment is a square grayscale image.
[0059]
The cut out partial image may appear distorted depending on the length and direction of the reference vector.
[0060]
FIG. 4-407 shows the base configuration and partial pattern cut out from another three-point combination.
[0061]
Each extracted partial pattern is registered at a predetermined registration location in a table calculated by a hash function (hereinafter referred to as a hash table). This registered content may include, in addition to the partial grayscale image, the type of the point of interest and the relative position information of the partial pattern in the entire recognition target. This will be described later.
[0062]
(2-3) Registration in hash table
The hash function exploits the invariance of a given base under translation, rotation, and scaling. That is, a function is defined that returns the same value for bases satisfying the same conditions, even after these transformations are applied.
[0063]
Here, the angle between the two reference vectors constituting the base and the ratio of the lengths of the two vectors are invariant under geometric transformations such as translation, rotation, enlargement, and reduction, so a hash function that uses this invariance is considered. That is, the registration location in the hash table is determined using the angle between the two reference vectors and the ratio of their lengths as parameters that are invariant under the geometric transformations (hereinafter referred to as index parameters).
[0064]
The hash function H takes three feature points p1, p2, and p3 as arguments as follows, obtains the index parameters, namely the length ratio R_axs and the angle θ_axs, quantizes them, and returns a position in the hash table. Here, the positions of p1, p2, and p3 are expressed with reference to the origin of the absolute coordinate system. The origin of the base is the feature point p1.
[0065]
[Expression 1]

FIG. 4-408 represents the hash table two-dimensionally; the vertical axis is the ratio R_axs of the lengths of the two reference vectors, and the horizontal axis is the angle θ_axs between the two reference vectors.
[0066]
The angle θ_axs between the two vectors and the length ratio R_axs are obtained for each base, and the extracted partial grayscale pattern and related information are registered at the hash table position indicated by those values.
[0067]
These values θ_axs and R_axs may be quantized appropriately in consideration of errors.
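A hash function of this kind might look like the following sketch. It is not the formula of Expression 1, which is not reproduced here. The name `hash_index`, the bin counts, the clamp `max_ratio`, and the choice of L2/L1 as the ratio are all assumptions made for illustration.

```python
import math


def hash_index(p1, p2, p3, ratio_bins=16, angle_bins=18, max_ratio=4.0):
    """Quantize (R_axs, theta_axs) of the base with origin p1 into a table index."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p1[0], p3[1] - p1[1])
    l1 = math.hypot(v1[0], v1[1])
    l2 = math.hypot(v2[0], v2[1])
    # Assumption: the ratio is L2 / L1, clamped so that it fits the table.
    ratio = min(l2 / l1, max_ratio)
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (l1 * l2)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # in [0, pi]
    r_bin = min(int(ratio / max_ratio * ratio_bins), ratio_bins - 1)
    t_bin = min(int(theta / math.pi * angle_bins), angle_bins - 1)
    return r_bin, t_bin
```

Applying a translation, rotation, and uniform scaling to all three points leaves the returned index unchanged, except when rounding error pushes a value across a bin boundary, which is one reason to choose the quantization with errors in mind.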
[0068]
Each entry registered in the hash table is expressed as follows. The format and the types are not limited to this.
[0069]
[Expression 2]

Axs represents coordinate system information based on three feature points. Specifically, it includes information on three coordinate positions, two basis vectors, and the like.
[0070]
Label represents which points of interest the three feature points correspond to. Specifically, as shown in FIG. 7-707, labels (F1 to F12) are attached to the respective vertices, and the label corresponding to each of the three points indicated by Axs is recorded.
[0071]
Inv is the set of invariants under the transformations, and includes θ_axs, R_axs, and the like shown above.
[0072]
Rel is a relative representation of a set of points representing the area surrounding the object, and is used to indicate the location of a detection result or to extract the area again. As a specific example, it is described by the four coordinate positions of the vertex set representing a rectangle, as shown in FIG.
[0073]
SPat is a local image surrounded by feature points, and indicates a cut-out partial shading pattern.
[0074]
The same processing is performed for all the combinations, and the model is registered.
[0075]
The processing for one image has been described so far. By registering, in the same way, different images of the same recognition object taken under different shooting conditions, the recognition object can be recognized when it is captured in various environments.
[0076]
(2-4) Handling collisions in the hashing algorithm
However, when such registration is performed sequentially, different combinations of three points may yield the same hash function value, in which case their partial patterns are registered at the same location. That is, what a general hashing algorithm calls a “collision” occurs.
[0077]
As shown in FIG. 5-501, a plurality of candidate patterns can be attached to each position of the hash table as a list structure, so that each position holds multiple collation candidates when a collision occurs. This method is generally called the chain method. When the number of collation candidates increases, a large number of candidates exist as shown in FIG.
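The chain method can be sketched with a dictionary of lists. The class names `Entry` and `ChainedHashTable` are hypothetical; the fields of `Entry` mirror the items Axs, Label, Inv, Rel, and SPat described for a registered entry, but their concrete types are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entry:
    axs: tuple    # coordinates of the three feature points of the base
    label: tuple  # labels of the corresponding points of interest
    inv: tuple    # invariants such as (R_axs, theta_axs)
    rel: tuple    # relative coordinates of the surrounding region
    spat: list    # cut-out partial grayscale pattern


class ChainedHashTable:
    """Hash table whose positions each hold a list of candidate entries."""

    def __init__(self) -> None:
        self._bins: Dict[Tuple[int, int], List[Entry]] = defaultdict(list)

    def register(self, index: Tuple[int, int], entry: Entry) -> None:
        # A collision simply appends to the list at that position.
        self._bins[index].append(entry)

    def lookup(self, index: Tuple[int, int]) -> List[Entry]:
        # Returns a copy; an empty list means no model is registered here.
        return list(self._bins.get(index, []))
```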
[0078]
Here, when a plurality of models of the same recognition target object are registered, the similarity between the patterns of the collation data is judged, and patterns with high similarity are merged into one model.
[0079]
For example, as shown in FIG. 5-504, when a condition is satisfied, such as the patterns being sufficiently similar to each other, a new model pattern is prepared and the patterns are combined into one. One way to combine them is to create the average pattern of the plurality of models. When the resulting average pattern has low similarity to an original model, that model may be handled separately as another model.
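A merge rule of this kind could be sketched as below. The function names, the use of cosine similarity between patterns flattened to lists, and the threshold value are assumptions; the text does not fix any of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two patterns given as flat lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_if_similar(pat_a, pat_b, threshold=0.95):
    """Return [average] when the two patterns can be merged, else both."""
    if cosine_similarity(pat_a, pat_b) < threshold:
        return [pat_a, pat_b]
    average = [(x + y) / 2.0 for x, y in zip(pat_a, pat_b)]
    # Keep the originals if the average drifts too far from either one.
    if min(cosine_similarity(average, pat_a),
           cosine_similarity(average, pat_b)) < threshold:
        return [pat_a, pat_b]
    return [average]
```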
[0080]
(3) Model detection and identification by voting
Next, a method for detecting a recognition target object from an image will be described.
[0081]
In order to explain the detection algorithm for the recognition target object in an easy-to-understand manner, the description will be divided into model selection using a hash table, hypothesis generation, hypothesis integration, and recognition by verification.
[0082]
(3-1) Model selection using hash table
(3-1-1) Description of model selection processing
A flow of model selection processing using a hash table will be described with reference to FIG.
[0083]
First, an image to be recognized is read into the image input unit 1 in FIG. 1 (step 601 in FIG. 6).
[0084]
Next, the feature point extraction unit 2 extracts feature points from the image to be recognized (step 602 in FIG. 6).
[0085]
Next, the feature point selection unit 3 selects a combination of the detected feature points (step 603 in FIG. 6), and this continues until all combinations have been selected (step 604 in FIG. 6).
[0086]
For each combination, the basis calculation unit 4 calculates the basis (step 605 in FIG. 6).
[0087]
Then, the index parameter of the hash function in the base is calculated by the index calculation unit 6 (step 606 in FIG. 6).
[0088]
The table registration unit 7 searches the registration location of the hash table corresponding to the index parameter (step 607 in FIG. 6).
[0089]
It is then determined whether a registered pattern exists at that location (step 608 in FIG. 6).
[0090]
When there is a registered pattern, the partial pattern configuration unit 5 cuts out a peripheral partial pattern, and the pattern similarity calculation unit 8 calculates and compares the similarity between the registered pattern and the partial pattern. As a method for calculating the degree of similarity between patterns, a method for calculating the degree of similarity with respect to a general shading pattern, for example, normalized correlation, SSD, simple similarity, or the like may be used. There is no limitation on the method of calculating the similarity between the patterns.
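The three measures named above can be sketched as follows for patterns flattened to lists of gray values. The function names are hypothetical. Normalized correlation is taken here in its zero-mean form, and simple similarity as the squared cosine between the two patterns; both are common definitions but are assumptions with respect to this document.

```python
import math


def ssd(a, b):
    """Sum of squared differences; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalized_correlation(a, b):
    """Zero-mean normalized correlation in [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def simple_similarity(a, b):
    """Squared cosine of the angle between the two patterns, in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm2 = sum(x * x for x in a) * sum(y * y for y in b)
    return dot * dot / norm2 if norm2 else 0.0
```

Note that SSD is a distance rather than a similarity, so a threshold on it has the opposite sense to a threshold on the other two.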
[0091]
If there is no registered pattern, the similarity is not calculated because there is no model (step 608 in FIG. 6).
[0092]
After the model is selected, if the partial pattern of the area around the three-point combination is similar to it, that area may be part of the target area to be detected, so hypothesis information for the detection target area is generated. This is done for all combinations of the detected feature points, repeating the hypothesis information generation process. This will be described later.
[0093]
(3-1-2) Specific example of model selection processing
The procedure so far will be specifically described with reference to FIG.
[0094]
The purpose of this processing is to detect the position and orientation of the box registered in the model registration described above based on the input image.
[0095]
FIG. 7-701 is an input image obtained by photographing a box that is a recognition target. Feature points are detected from this image in the same manner as model registration (FIG. 7-702).
[0096]
Next, three points are selected from the feature points as in model registration.
[0097]
Based on the base information created based on these three points, the shading pattern of the surrounding area is cut out (FIG. 7-703).
[0098]
Next, the location of the hash table (FIG. 7-704) corresponding to the invariant of the base information is searched.
[0099]
If a registered pattern exists and the similarity exceeds a set threshold, a hypothesis is generated.
[0100]
As in FIG. 7-705, if there are three matching patterns, three respective hypotheses are generated.
[0101]
(3-2) Generation of hypothesis information
The contents of the hypothesis information include information such as feature point location information, position and size information, shading pattern information, and similarity to model patterns. As an example, the hypothesis is defined by the following triplet. The definition of hypothesis information is not limited to this.
[0102]
[Equation 3]

TRel is position information obtained by converting Rel, which is relative position information, according to the selected three points, and represents an existing area of an object in the image.
[0103]
TLabel is for describing which point of interest corresponds to among the feature points.
[0104]
Psim represents the similarity between the gray pattern registered as a model and the selected partial pattern.
[0105]
The pattern information determined from the three points selected at the time of detection is expressed as follows in the same manner as described in the hash table.
[0106]
[Expression 4]

φ represents the empty set; Label_x is φ because it cannot be determined merely by selecting three points.
[0107]
Also, a partial model that exists at the location with the same hash function value is expressed as follows.
[Equation 5]

[0108]
In other words, because the search results are in the same place, Inv_x = Inv_m holds. In order to generate hypothesis information, each element is calculated by the following three functions.
[0109]
[Formula 6]

FuncLab is a function that calculates the labels of the corresponding points of interest. For each point of Axs_x, FuncLab returns as TLabel the label in Label_m that corresponds to the order of the points in Axs_m.
[0110]
FuncGeom is a function that calculates the location of an object based on a selected base.
[0111]
Specifically, it is calculated by the following formula. Here, (.,.) represents coordinates.
[0112]
[Expression 7]

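Since the formula itself is not reproduced here, the following sketch shows one plausible form of this computation, assuming that Rel stores each point as coefficients (a, b) in the base coordinate system and that the mapping to the image is p1 + a·v1 + b·v2. The function names `to_relative` and `func_geom` are hypothetical.

```python
def to_relative(points, p1, p2, p3):
    """Express image points as (a, b) coefficients in the base with origin p1."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p1[0], p3[1] - p1[1])
    det = v1[0] * v2[1] - v1[1] * v2[0]  # nonzero unless the points are collinear
    rel = []
    for x, y in points:
        dx, dy = x - p1[0], y - p1[1]
        a = (dx * v2[1] - dy * v2[0]) / det
        b = (v1[0] * dy - v1[1] * dx) / det
        rel.append((a, b))
    return rel


def func_geom(rel, p1, p2, p3):
    """Map relative coordinates back into the image using the selected base."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p1[0], p3[1] - p1[1])
    return [(p1[0] + a * v1[0] + b * v2[0],
             p1[1] + a * v1[1] + b * v2[1]) for a, b in rel]
```

For example, a rectangle registered relative to a model base is recovered at twice the size and at the shifted position when the matching base in the input image is twice as large and translated.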
[0113]
FuncSim is a function for calculating the similarity between grayscale patterns. As described above, there are various methods for obtaining the similarity between patterns. For example, in the case of simple similarity, it is calculated by the following formula.
[0114]
[Equation 8]

Based on the content of the partial models registered at each hash table location, the content of each hypothesis is calculated. Note that although a threshold was set on the similarity value Psim in the description above, hypothesis information may also be generated for candidates with low similarity values.
[0115]
Since each hypothesis information has information that it is a part of the detection target region, the object can be recognized by integrating these pieces of information.
[0116]
(3-3) Recognition by integration and verification of hypothesis information
Using the similarity of the partial patterns, hypothesis information that is a part of the detection target region is generated as described above, all hypothesis information is voted on the hypothesis space, and the results are integrated. This is processed in the object recognition unit 10.
[0117]
In the present embodiment, a detection method that can identify the location of the point of interest set at the time of model registration will be described.
[0118]
(3-3-1) Specific example of hypothesis voting process
A hypothesis space for voting hypothesis information is managed by a hypothesis integration unit included in the object recognition unit 10. FIG. 7-706 graphically represents the hypothesis space. FIG. 7-707 shows the location of the point of interest specified at the time of model registration, and the hypothesis space is constituted by each hypothesis voting box corresponding to each point of interest.
[0119]
As the hypotheses when the three points shown in FIG. 7-703 are selected, consider a case where there are three patterns whose similarity exceeds the threshold, as shown in FIG. 7-705. Each piece of hypothesis information describes which combination of points of interest it corresponds to, for example the points of interest F8, F4, and F5.
[0120]
Voting is performed on the hypothesis voting box corresponding to each of the points of interest F1, F2, and so on. That is, the similarity value indicated by Psim is added to the hypothesis voting boxes described in Label.
[0121]
Usually, votes for the same point of interest arrive with several different feature point positions. In this case, each hypothesis voting box manages a separate vote value for each distinct coordinate value. This is useful when detecting a plurality of objects or when discarding erroneous detections.
[0122]
In this way, voting is performed for all hypothesis information related to all combinations.
[0123]
Next, the candidates are examined in descending order of vote value, and those with the highest vote values are output in order as the detection result for each feature point (FIG. 7-708). These coordinate values are the coordinate values of the feature points of the box photographed in the image of the recognition object.
[0124]
For this, a certain threshold value may be set, and processing such as outputting only a vote value that is equal to or higher than the set value may be performed. In the example of FIG. 7, three feature points are selected, and feature points designated as points of interest at the positions are detected.
[0125]
It should be noted that the position of the entire pattern is calculated from the selected combination of feature points and may be used as an output (FIG. 7-709). In this case, the result of combining TRel information indicating the position of the object may be used.
[0126]
(3-3-2) Explanation of hypothesis voting process
FIG. 8 shows a process flowchart of the hypothesis integration unit.
[0127]
When one piece of hypothesis information is received in step 801 of FIG. 8, it is checked, for the type of point of interest of each of the three feature points constituting the hypothesis information, whether a vote with the same coordinate value already exists (step 802 in FIG. 8).
[0128]
If one exists, the similarity obtained previously is added to the vote value in the hypothesis voting box to update the vote value (step 803 in FIG. 8).
[0129]
If none exists, the resulting coordinates are different, so another counter is prepared in the hypothesis voting box (step 804 in FIG. 8), and its vote value is started by adding the similarity. It is then determined whether all of the hypothesis information has been voted (step 805 in FIG. 8), and if not, voting of hypothesis information is repeated.
[0130]
When voting is completed, the entries with high vote values are output from among the vote values in the respective hypothesis voting boxes and used as the detection result (step 806 in FIG. 8).
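The voting flow of FIG. 8 can be sketched as follows. The representation of a hypothesis as a tuple (labels, coordinates, Psim), the function names `vote` and `detect`, and the exact matching of coordinate values are assumptions; a practical implementation would probably quantize coordinates before comparing them.

```python
from collections import defaultdict


def vote(hypotheses):
    """Accumulate similarity in one voting box per point of interest.

    Each box keeps a separate counter for every distinct coordinate value.
    """
    boxes = defaultdict(lambda: defaultdict(float))
    for labels, coords, psim in hypotheses:
        for label, coord in zip(labels, coords):
            # A new counter is created on first use (step 804);
            # otherwise the existing vote value is updated (step 803).
            boxes[label][coord] += psim
    return boxes


def detect(boxes, threshold=0.0):
    """Return the best coordinate per point of interest (step 806)."""
    result = {}
    for label, counters in boxes.items():
        coord, value = max(counters.items(), key=lambda item: item[1])
        if value >= threshold:
            result[label] = (coord, value)
    return result
```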
[0131]
(Second embodiment)
Next, a second embodiment will be described with reference to FIGS.
[0132]
In the second embodiment, a detection method using a pattern obtained by cutting out a global image in addition to a partial image will be described.
[0133]
(1) Recognition method using global image for face area detection
FIG. 9 illustrates various data regarding a recognition method using a global image for face area detection.
[0134]
FIG. 9-901 is an image including a face. Three feature points (both nostrils and one mouth end) are selected as points of interest, and a rectangle representing the region of the global image (the entire face) is shown.
[0135]
FIG. 9-902 shows the base spanned by the three points; the angle between the vectors and the ratio of their lengths are used as the index of the hash table.
[0136]
FIG. 9-903 shows the surrounding grayscale pattern cut out using the base spanned by the three points.
[0137]
In addition to this, the shading pattern of the entire face region is also cut out as shown in FIG. 9-904. In this case, the contents registered in the table are:
[Equation 9]

That is, the grayscale pattern of the entire region, GPat, is added.
[0138]
Further, the rectangle representing the entire face area shown in FIG. 9-905 is re-expressed in the coordinate system generated from the base information, as shown in FIG. 9-906, and the relative coordinates Rel of each vertex are calculated. These are used to determine the relative position of the entire area when it is detected.
[0139]
In the method described above, the partial pattern (FIG. 9-903) was registered at the time of model registration. Here, the entire pattern (FIG. 9-904) and the relative coordinate positions of the entire area shown in FIG. are added as well.
[0140]
At the time of recognition, a partial pattern is similarly cut out, but it is also necessary to cut out the entire area at the same time. With respect to the selected base information, the relative coordinate position of the entire area is calculated, the actual coordinate position on the image is obtained, and the shading pattern of that area is used as the entire pattern candidate.
[0141]
As a method of using the whole pattern, it is possible to use the result of the similarity calculation between the entire patterns in addition to the similarity between the partial patterns when searching for the model.
[0142]
(2) Specific examples of face recognition
Taking face recognition as an example, (a) and (b) in FIG. 10 show bases selected on the basis of different hypothesis information (FIGS. 10-1001 and 1002); for each, a partial pattern SPat and an entire pattern GPat are cut out. To determine whether two different hypotheses indicate the same recognition target, the entire patterns indicated by their detected contents can be compared.
[0143]
In (a) and (b), similar places are cut out as the entire pattern (FIGS. 10-1004 and 1005). In contrast, for the detection content (c) in FIG. 10-1003, which comes from different hypothesis information, a different entire pattern is cut out as shown in FIG. 10-1006; by comparing this pattern with the patterns of FIGS. 10-1004 and 1005, it can be determined that this hypothesis information is not appropriate.
[0144]
In this way, whether two hypotheses indicate the same recognition object can be verified by comparing patterns between pieces of already registered hypothesis information, rather than by comparing patterns between the model and the recognition object.
[0145]
(Third embodiment)
Next, a third embodiment will be described.
[0146]
The third embodiment is an embodiment for identifying the recognition object.
[0147]
When the orientation of the recognition target is defined to some extent, the recognition target can be identified by the same model representation concept using a hash table.
[0148]
For example, as shown in FIG. 11, feature point extraction (FIGS. 11-1103 and 1104) is performed on objects A (FIG. 11-1101) and B (FIG. 11-1101), and the partial patterns obtained from all combinations are registered in the hash table. At this time, the name of the object is also registered in Label (FIG. 11-1105).
[0149]
At the time of recognition, the same procedure is followed and the name of the object containing the voted feature points is output, so that it is possible to identify which object is present.
[0150]
For example, for person recognition, the face data of a plurality of persons is registered in the table, and the person who obtains the highest vote value is taken as the recognition result.
[0151]
(Fourth embodiment)
Next, an embodiment that recognizes a plurality of objects in an image and outputs their names and existence areas as the result will be described with reference to FIG.
[0152]
There are two objects A and B (FIGS. 12-1204 and 1205) as shown in FIG. 12-121, and the purpose is to find their positions and names. Several objects are registered in the hash table in advance, and their names are identified by letters of the alphabet.
[0153]
First, as in the previous embodiments, feature point extraction is performed on the input image (FIGS. 12-1220).
[0154]
Next, arbitrary three feature points are selected, and bases and partial patterns are calculated (FIG. 12-1203).
[0155]
The search position (FIG. 12-1207) of the hash table is determined from the invariant calculated for the base and compared with the model registered there.
[0156]
As shown in FIG. 12-1208, in this example, it is assumed that four models are registered. These registered contents describe each partial pattern, a label indicating which object it is, and a point set Rel for representing an existing area when represented relatively.
[0157]
Now, the four registered contents are compared with the partial patterns shown in FIGS. 12-1206, and hypothesis information is generated when the similarity between the patterns exceeds a set threshold.
[0158]
FIG. 12 shows a case where hypothesis information is generated only from the partial pattern corresponding to FIG. 12-1209, and the pattern similarity Sim is voted on the hypothesis voting box.
[0159]
The hypothesis space for voting (FIG. 12-1211) has a hypothesis voting box for each region in which an object is hypothesized to exist. In the initial state, there is no hypothesis voting box. When a piece of hypothesis information is generated, a hypothesis voting box is prepared for each existence position and name.
[0160]
In FIG. 12-1211, five hypothesis ballot boxes already exist. However, when new hypothesis information of different places and names is generated, a new hypothesis ballot box is prepared. If hypothesis voting boxes with the same location and name already exist and hypothesis information with the same location and name is generated, only the similarity is accumulated in the hypothesis voting box.
[0161]
In the hypothesis information generated from FIG. 12-1209, the hypothesis ballot box R3 having the same position and name already exists, so the similarity Sim (FIG. 12-1210) is added to the value of R3.
[0162]
As a result of voting sequentially in this way over all combinations of feature points, two object detection results are obtained from the voting results shown in FIGS. FIG. 12-1212 shows these results, which give the positions and names of the two objects.
[0163]
Here, in the hypothesis space (FIG. 12-1211), R2 and R3 represent the same object at almost the same position, but two hypothesis voting boxes have been prepared because hypothesis information is generated sequentially. In such a case, a single result may be output by integrating the two or selecting one of them. Also, when hypothesis information with different names exists at the same position, as with R4 and R5, the one with the higher vote value is selected as the result.
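The on-demand creation of voting boxes keyed by name and position, and the selection between different names at the same position, can be sketched as follows. The names `accumulate` and `resolve_same_region`, the representation of a region as a tuple, and the exact matching of regions are assumptions; merging boxes at almost the same position, as with R2 and R3, would additionally need a tolerance that is not shown.

```python
def accumulate(hypotheses):
    """Create one voting box per (name, region) and accumulate similarity."""
    boxes = {}
    for name, region, sim in hypotheses:
        key = (name, region)
        # The box is created when this name and region first appear.
        boxes[key] = boxes.get(key, 0.0) + sim
    return boxes


def resolve_same_region(boxes):
    """For each region, keep the name with the highest vote value."""
    best = {}
    for (name, region), value in boxes.items():
        if region not in best or value > best[region][1]:
            best[region] = (name, value)
    return best
```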
[0164]
(Modifications)
(1) Modification 1
In the above embodiments, a box was used as the example of the recognition target object, but the recognition target object is not limited to this. For example, the method can be used to recognize various objects such as human faces and vehicles. It can also be modified to find multiple targets. Finding a plurality of objects can be realized by having the hypothesis integration unit separately check the vote values for objects at different spatial positions.
[0165]
(2) Modification 2
Further, the number of points used for a combination of feature points is not limited to three. Likewise, the type, number, and calculation method of the invariants used in the hash function, such as the length ratio and the angle, are not limited. For example, a cross-ratio calculated from a combination of five points can be used as an invariant under perspective transformation.
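The text does not specify which five-point quantity is meant. As one illustration, the sketch below uses a well-known projective invariant of five coplanar points built from 3×3 determinants of homogeneous coordinates; choosing this particular form, and the name `five_point_invariant`, are assumptions.

```python
def _det3(a, b, c):
    """Determinant of the matrix whose columns are (x, y, 1) for a, b, c."""
    return (a[0] * (b[1] - c[1])
            - b[0] * (a[1] - c[1])
            + c[0] * (a[1] - b[1]))


def five_point_invariant(p1, p2, p3, p4, p5):
    """Ratio of determinant products that is unchanged by a plane homography.

    Assumes no three of the points involved in a determinant are collinear,
    so that the denominator is nonzero.
    """
    numerator = _det3(p4, p3, p1) * _det3(p5, p2, p1)
    denominator = _det3(p4, p2, p1) * _det3(p5, p3, p1)
    return numerator / denominator
```

Every point appears equally often in the numerator and the denominator, so the per-point scale factors introduced by a perspective transformation cancel, as does the determinant of the transformation itself.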
[0166]
For example, two invariants are used in the above-described embodiment, but only one may be used with a one-dimensional hash table, or a third invariant may be added and a three-dimensional hash table used.
[0167]
In addition, all index parameters used for the hash table need not be invariant. For example, in a situation where the change in the size of the object and the change in the direction are limited, it is possible to perform a high-speed search by using the length information and the direction information as index parameters.
[0168]
(3) Modification 3
When registering in the hash table, the pattern may also be registered in the neighborhood of its registration location in the table, in consideration of errors.
[0169]
Conversely, the registration may be left as is, and the location calculated by the hash function at the time of recognition and the vicinity thereof may be searched.
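The second variant, searching the computed location and its neighborhood at recognition time, can be sketched as follows. The table is assumed to be a dictionary from two-dimensional bin indices to lists of entries; `neighbor_indices`, `lookup_with_neighbors`, and the `radius` parameter are hypothetical names.

```python
def neighbor_indices(index, shape, radius=1):
    """Return the bins within `radius` of `index` that lie inside the table."""
    r, t = index
    found = []
    for dr in range(-radius, radius + 1):
        for dt in range(-radius, radius + 1):
            rr, tt = r + dr, t + dt
            if 0 <= rr < shape[0] and 0 <= tt < shape[1]:
                found.append((rr, tt))
    return found


def lookup_with_neighbors(table, index, shape, radius=1):
    """Collect the candidates registered at `index` and in adjacent bins."""
    candidates = []
    for idx in neighbor_indices(index, shape, radius):
        candidates.extend(table.get(idx, []))
    return candidates
```

Registering in neighboring bins instead moves the extra cost from recognition to registration and increases the table size.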
[0170]
(4) Modification 4
In the above embodiments, the grayscale pattern is used as it is, but it may be converted into another pattern.
[0171]
For example, the similarity may be compared using a Fourier transform, a log-polar transform, an edge image, or other patterns.
[0172]
Further, although it is intended for gray images, it can also be applied to color images. When the color image is an RGB color expression, an application method in which similar processing is performed on each of R, G, and B, or a pattern using the color expression after conversion to another color expression may be used.
[0173]
Furthermore, color information can be used in the similarity comparison. For example, using the color histogram proposed in Non-Patent Document 7 [M. Swain and D. Ballard, "Color indexing," Int. Journal of Computer vision, vol. 22, pages 11-32, December 1991], the color histograms within the regions may be compared.
[0174]
(5) Modification 5
Various changes can be considered for speeding up.
[0175]
For example, as in the randomized Hough transform, feature points may be selected at random for voting. In this case, not only the feature points but also the hypothesis information to be generated may be selected at random.
[0176]
(6) Modification 6
Also, various constraints on the length, angle, etc. of the vectors constituting the base information may be used for speeding up.
[0177]
For example, when a vehicle traveling on a road surface is detected, it is not necessary to search for a posture in which the vehicle has rotated 180 °. Therefore, search efficiency can be improved by limiting the search range and parameters.
[0178]
In addition, since hypothesis information generation, voting, and the like can be performed in parallel, implementation using a plurality of calculation mechanisms (parallel computer, grid environment, PC cluster) may be performed.
[0179]
(7) Modification 7
For the feature point extraction unit 2, detection using a separability filter and corner point detection are shown in the embodiment, but not only points but also geometric objects such as lines, regions, circles, and arcs may be used. In that case, a feature point corresponding to each geometric object is prepared and passed to the feature point selection unit 3.
[0180]
For example, geometrical object detection may be used, such as performing piecewise straight line detection and using two points at both ends as feature points, or using the center point of a straight line as a feature point.
[0181]
At this time, when a plurality of types of feature points can be selected, a search may be performed using the types of the feature points as keys. For example, when a circular feature point (E) and a corner feature point (K) can be detected, when three points are selected during model registration, eight combinations are generated.
[0182]
(E, E, E) (E, E, K) (E, K, K) (E, K, E) (K, E, E) (K, K, E) (K, E, K) (K, K, K)
If three points are selected at the time of recognition using this information, model selection may be performed using whether or not the combination is the same. This not only speeds up the model selection, but also accurately evaluates what feature points exist around the partial pattern, thereby improving recognition accuracy. Of course, the type and number of feature points are not limited.
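Using the feature point types as part of the search key can be sketched as follows. The names `FEATURE_TYPES`, `type_combinations`, and `make_key`, and the idea of combining the type triple with the hash index in one dictionary key, are assumptions for illustration.

```python
from itertools import product

FEATURE_TYPES = ("E", "K")  # circular and corner feature points


def type_combinations(n_points=3):
    """All ordered type assignments for a group of n_points feature points."""
    return list(product(FEATURE_TYPES, repeat=n_points))


def make_key(types, index):
    """Combine the type triple with the hash table index into one key."""
    return (tuple(types), index)
```

With two types and three points, `type_combinations()` yields the eight combinations listed above; a lookup with a different type triple at the same index finds nothing, which prunes candidates before any pattern comparison.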
[0183]
(8) Modification 8
For the table registration unit 7, the embodiment showed a list structure holding a plurality of models as the model data structure for collisions, but the data structure is not limited to this.
[0184]
For example, a tree structure can improve the efficiency of searching for a plurality of models having the same index parameter.
[0185]
In addition, it is possible to define some order structure for the model having the same index parameter to improve the efficiency of the search.
[0186]
(9) Modification 9
In the above embodiments, only the similarity to the correct pattern is used. When matching is performed, erroneous recognition can be reduced by also using incorrect patterns. As a specific example, the similarity to an incorrect pattern may be subtracted from the vote value in the voting space.
[0187]
As described above, the present invention can be implemented with various modifications without departing from the spirit of the present invention.
[0188]
[Effects of the invention]
The present invention provides a new method for detecting and recognizing objects based on local image information, together with an efficient registration method. The calculation cost can be reduced compared with conventional appearance-based methods.
[0189]
In addition, the similarity can be detected at high speed, and the detection accuracy is improved.
[0190]
Furthermore, because it uses partial information, it is strong against occlusion. In addition, since an efficient model search using a hash table is performed, it can be applied to a high-speed search.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an object recognition apparatus according to a first embodiment of the present invention.
FIG. 2 is an image subjected to a feature point detection process.
FIG. 3 is a flowchart at the time of model registration.
FIG. 4 is a diagram illustrating the model registration method.
FIG. 5 is a diagram illustrating the processing method for model collisions.
FIG. 6 is a flowchart (generation of hypothesis information) at the time of detection.
FIG. 7 is a diagram illustrating the detection method.
FIG. 8 is a flowchart of integration.
FIG. 9 is a diagram of partial pattern information.
FIG. 10 is a diagram regarding the use of global pattern information in the second embodiment.
FIG. 11 is a diagram for identifying a recognition object.
FIG. 12 is an explanatory diagram of the fourth embodiment.
[Explanation of symbols]
1 Image input unit
2 Feature point extraction unit
3 Feature point selection unit
4 Basis calculation unit
5 Partial pattern construction unit
6 Index calculation unit
7 Table registration unit
8 Pattern similarity calculation unit
9 Hypothesis information generation unit
10 Object recognition unit

Claims

A pattern recognition apparatus that recognizes a recognition object by comparing an image in which the recognition object is captured with a pre-registered model, the apparatus comprising:
Image input means for inputting an image in which the recognition object is captured;
Feature point extraction means for extracting a plurality of feature points from the input image;
Feature point selection means for selecting a combination of three feature points from the plurality of extracted feature points;
Basis calculation means for calculating a basis including two reference vectors extending from one of the three selected feature points to the other two feature points, and the angle θ between these two reference vectors;
Partial pattern extraction means for extracting a partial pattern of the recognition object from the image by converting the lengths L1 and L2 of the two reference vectors of the calculated basis to the same length and converting the angle θ to a right angle;
Table storage means for storing a table that is composed of a plurality of registration locations divided based on index parameters consisting of the ratio of the lengths L1 and L2 of two reference vectors and the angle θ between the two reference vectors, and in which a partial pattern of the model is registered, by a hash function, at the registration location corresponding to the index parameters of that partial pattern, namely the ratio of the lengths L1 and L2 of its two reference vectors and the angle θ between them;
Index search means for determining, by a hash function, a registration location of the stored table based on the ratio of the lengths L1 and L2, before conversion to the same length, of the two reference vectors of the basis corresponding to the extracted partial pattern of the recognition object, and on the angle θ; and
Pattern similarity calculation means for determining the similarity between the partial pattern of the model registered at the determined registration location of the table and the partial pattern of the recognition object.
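The index computation recited in this claim can be illustrated with a short sketch. The functions `basis_index` and `hash_index`, the logarithmic quantization of the length ratio, and the bin counts are assumptions chosen for this example; the claim itself does not fix a particular hash function.

```python
import math


def basis_index(p0, p1, p2):
    """Return (L1 / L2, theta) for the basis formed at p0 toward p1 and p2."""
    v1 = (p1[0] - p0[0], p1[1] - p0[1])
    v2 = (p2[0] - p0[0], p2[1] - p0[1])
    l1 = math.hypot(*v1)
    l2 = math.hypot(*v2)
    if l1 == 0 or l2 == 0:
        raise ValueError("the three feature points must be distinct")
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    theta = math.atan2(abs(cross), dot)  # unsigned angle in [0, pi]
    return l1 / l2, theta


def hash_index(ratio, theta, ratio_bins=8, angle_bins=8, max_log_ratio=2.0):
    """Quantize (ratio, theta) into a registration location (assumed scheme)."""
    # Work on log(ratio) so that L1/L2 and L2/L1 are treated symmetrically.
    r = max(-max_log_ratio, min(max_log_ratio, math.log(ratio)))
    ri = min(ratio_bins - 1, int((r + max_log_ratio) / (2 * max_log_ratio) * ratio_bins))
    ai = min(angle_bins - 1, int(theta / math.pi * angle_bins))
    return ri, ai
```

Because the ratio and the angle do not change under translation, rotation, or uniform scaling, the same three points observed at a different position, orientation, and size map to the same registration location, for example `hash_index(*basis_index((0, 0), (2, 0), (1, 2)))` and `hash_index(*basis_index((5, 7), (5, 13), (-1, 10)))`.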

The pattern recognition apparatus according to claim 1, further comprising:
Image input means for inputting a model image of a model to be registered in the table;
Feature point extraction means for extracting a plurality of feature points from the input model image;
Feature point selection means for selecting a combination of three feature points from the plurality of extracted feature points;
Basis calculation means for calculating a basis including two reference vectors extending from one of the three selected feature points to the other two feature points, and the angle θ between these two reference vectors;
Partial pattern extraction means for extracting a partial pattern of the model corresponding to the basis from the model image by converting the lengths L1 and L2 of the two reference vectors of the calculated basis to the same length and converting the angle θ to a right angle;
Index determination means for determining, by a hash function, a registration location of the table using, as index parameters, the ratio of the lengths L1 and L2, before conversion to the same length, of the two reference vectors of the basis corresponding to the partial pattern of the model, and the angle θ; and
Table registration means for registering the partial pattern of the model at the determined registration location of the table.
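Converting the two reference vectors to the same length and the angle θ to a right angle amounts to sampling the image on a grid expressed in the coordinate system of the basis. The sketch below illustrates this with nearest-neighbor sampling. The function name `extract_partial_pattern`, the patch size, and the choice to sample only the parallelogram spanned by the two vectors are assumptions for this example; points are given as (x, y) and the image as a list of rows.

```python
def extract_partial_pattern(image, p0, p1, p2, size=4):
    """Sample a size x size patch on the grid spanned by the two reference vectors."""
    height, width = len(image), len(image[0])
    v1 = (p1[0] - p0[0], p1[1] - p0[1])
    v2 = (p2[0] - p0[0], p2[1] - p0[1])
    patch = []
    for j in range(size):
        t = j / (size - 1)  # normalized coordinate along the second vector
        row = []
        for i in range(size):
            s = i / (size - 1)  # normalized coordinate along the first vector
            x = p0[0] + s * v1[0] + t * v2[0]
            y = p0[1] + s * v1[1] + t * v2[1]
            # Nearest-neighbor lookup, clamped to the image border.
            xi = min(width - 1, max(0, round(x)))
            yi = min(height - 1, max(0, round(y)))
            row.append(image[yi][xi])
        patch.append(row)
    return patch
```

In the normalized coordinates (s, t) both reference vectors have unit length and are orthogonal, so patches taken from differently scaled or sheared bases become directly comparable.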

The pattern recognition apparatus according to claim 1, further comprising:
Hypothesis information generating means for generating, based on the similarity between the two partial patterns, hypothesis information indicating where in the image the recognition object exists, and for voting the generated hypothesis information into a hypothesis space; and
Object recognition means for determining the identity, position, and orientation of the recognition object based on the number of votes of hypothesis information cast in each hypothesis space.

The pattern recognition apparatus according to claim 3, wherein
the hypothesis information generating means obtains the number of votes for each feature point by voting, into hypothesis voting boxes provided for each feature point in the hypothesis space, hypothesis information consisting of the similarity between the two partial patterns belonging to the feature point corresponding to each model at the searched location of the table, and
the object recognition means determines that a feature point whose number of votes in the hypothesis voting box exceeds a threshold is a feature point corresponding to the recognition object.

The pattern recognition apparatus according to claim 3, wherein
the hypothesis information generating means obtains the number of votes for each existence position by voting, into hypothesis voting boxes provided for each existence position in the hypothesis space, hypothesis information including the similarity between the two partial patterns belonging to the existence position of the recognition object at the searched location of the table, and
the object recognition means determines that an existence position whose number of votes in the hypothesis voting box exceeds a threshold is an existence position corresponding to the recognition object.
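Position-wise voting with a threshold can be sketched as follows. The quantization of positions into square cells, the use of the similarity as the vote weight, and the names `vote_positions` and `detect` are assumptions made for this example.

```python
from collections import defaultdict


def vote_positions(hypotheses, cell=8):
    """Accumulate similarity-weighted votes in boxes keyed by quantized (x, y)."""
    boxes = defaultdict(float)
    for (x, y), sim in hypotheses:
        boxes[(int(x // cell), int(y // cell))] += sim
    return boxes


def detect(boxes, threshold):
    """Return the voting boxes whose accumulated votes exceed the threshold."""
    return sorted(box for box, score in boxes.items() if score > threshold)
```

Each hypothesis is a pair of a hypothesized existence position and the similarity that produced it, so several partial patterns that agree on the position reinforce the same box while isolated false matches stay below the threshold.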

The pattern recognition apparatus according to claim 3, wherein
the hypothesis information generating means obtains the number of votes for each feature point by voting, into hypothesis voting boxes provided for each type of recognition object in the hypothesis space, hypothesis information including the similarity between the two partial patterns belonging to each type of recognition object at the searched location of the table, and
the object recognition means determines that a type whose number of votes in the hypothesis voting box exceeds a threshold is the type corresponding to the recognition object.

The pattern recognition apparatus according to claim 2, wherein
the table registration means is capable of merging a plurality of models having the same property.

A pattern recognition method for recognizing a recognition object by comparing an image in which the recognition object is captured with a pre-registered model, the method comprising:
An image input step of inputting an image in which the recognition object is captured;
A feature point extraction step of extracting a plurality of feature points from the input image;
A feature point selection step of selecting a combination of three feature points from the plurality of extracted feature points;
A basis calculation step of calculating a basis including two reference vectors extending from one of the three selected feature points to the other two feature points, and the angle θ between these two reference vectors;
A partial pattern extraction step of extracting a partial pattern of the recognition object from the image by converting the lengths L1 and L2 of the two reference vectors of the calculated basis to the same length and converting the angle θ to a right angle;
A table storage step of storing a table that is composed of a plurality of registration locations divided based on index parameters consisting of the ratio of the lengths L1 and L2 of two reference vectors and the angle θ between the two reference vectors, and in which a partial pattern of the model is registered, by a hash function, at the registration location corresponding to the index parameters of that partial pattern, namely the ratio of the lengths L1 and L2 of its two reference vectors and the angle θ between them;
An index search step of determining, by a hash function, a registration location of the stored table based on the ratio of the lengths L1 and L2, before conversion to the same length, of the two reference vectors of the basis corresponding to the extracted partial pattern of the recognition object, and on the angle θ; and
A pattern similarity calculation step of determining the similarity between the partial pattern of the model registered at the determined registration location of the table and the partial pattern of the recognition object.

A program for causing a computer to realize a pattern recognition method for recognizing a recognition object by comparing an image in which the recognition object is captured with a pre-registered model, the program causing the computer to realize:
An image input function of inputting an image in which the recognition object is captured;
A feature point extraction function of extracting a plurality of feature points from the input image;
A feature point selection function of selecting a combination of three feature points from the plurality of extracted feature points;
A basis calculation function of calculating a basis including two reference vectors extending from one of the three selected feature points to the other two feature points, and the angle θ between these two reference vectors;
A partial pattern extraction function of extracting a partial pattern of the recognition object from the image by converting the lengths L1 and L2 of the two reference vectors of the calculated basis to the same length and converting the angle θ to a right angle;
A table storage function of storing a table that is composed of a plurality of registration locations divided based on index parameters consisting of the ratio of the lengths L1 and L2 of two reference vectors and the angle θ between the two reference vectors, and in which a partial pattern of the model is registered, by a hash function, at the registration location corresponding to the index parameters of that partial pattern, namely the ratio of the lengths L1 and L2 of its two reference vectors and the angle θ between them;
An index search function of determining, by a hash function, a registration location of the stored table based on the ratio of the lengths L1 and L2, before conversion to the same length, of the two reference vectors of the basis corresponding to the extracted partial pattern of the recognition object, and on the angle θ; and
A pattern similarity calculation function of determining the similarity between the partial pattern of the model registered at the determined registration location of the table and the partial pattern of the recognition object.