JP4199594B2

JP4199594B2 - OBJECT IDENTIFICATION DEVICE, ITS PROGRAM, AND RECORDING MEDIUM CONTAINING THE PROGRAM

Info

Publication number: JP4199594B2
Application number: JP2003150248A
Authority: JP
Inventors: 良規草地; 章鈴木; 哲也杵渕; 賢一荒川; 直己伊藤; 知彦有川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-28
Filing date: 2003-05-28
Publication date: 2008-12-17
Anticipated expiration: 2023-05-28
Also published as: JP2004355183A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像内に、どのようなオブジェクトが写っているかを識別する画像識別技術に属し、その具体的な産業応用システムとして、例えば画像検索システムなどが挙げられる。
【０００２】
【従来の技術】
画像認識技術は、画像データ内のある領域がどの対象に属するかを特定する技術である。画像認識方式は、大きく分けて、パターンマッチング方式と統計的識別方式がある。
【０００３】
パターンマッチング方式では、入力画像の中にあらかじめ作成した標準パターンと同じ物があるか、あるいは近いものがあるかを検出する。標準パターンは、そのパターンを良く示す特徴を用いて表現される。代表的な手法であるテンプレートマッチング方法では、濃淡画像を特徴としたテンプレートや、濃淡画像を微分した微分濃淡画像を特徴とするテンプレートを標準パターンとして利用するのが一般的である。
【０００４】
統計的識別方式では、照合したい２枚の画像が単純な平行移動の関係にない場合や、入力画像が特徴パラメータで記述されている場合、画像の直接照合はできず、特徴記述間のマッチングが必要になる。入力画像の特徴ベクトルとある対象の同時生起確率を計算して、これが最大となる対象を入力画像が属する対象とする。これは入力画像の特徴ベクトルから見て、最も確かな対象を選んだことになる。
【０００５】
しかしながら、これらの従来の手法では、以下の２つの問題があった。
１．照合すべきパターンが膨大であるため、計算量が大きくなり、現実的な時間では識別できない。
２．識別すべきオブジェクト数が膨大になると、統計的方式では、識別精度を高めるために特徴ベクトルの次元が大きい必要がある。多次元での生起確率を求めるためには、大量の学習サンプルデータが必須であり、現実的ではない。そのため、ある程度のサンプル数を用いて正規分布を求めて生起確率を推定するが、実際の分布が正規分布ではなく、識別精度を高めるには限界がある。
【０００６】
この問題点を解く従来方法としては、テンプレートを圧縮する非特許文献１などの技術がある。
【０００７】
【非特許文献１】
村瀬洋、Ｓ．Ｋ．Ｎａｙａｒ著「２次元照合による３次元物体認識−パラメトリック固有空間法−」、信学論、Ｊ７７−Ｄ−ＩＩ，Ｎｏ．１１，ｐｐ．２１７９−２１８７，１９９４年１１月
【０００８】
【発明が解決しようとする課題】
上述のように、従来の手法では、
１．照合すべきパターンが膨大であるため、計算量が大きくなり、現実的な時間では識別できない。
２．識別すべきオブジェクト数が膨大になると、統計的方式では、識別精度を高めるために特徴ベクトルの次元が大きい必要がある。多次元での生起確率を求めるためには、大量の学習サンプルデータが必須であり、現実的ではない。そのため、ある程度のサンプル数を用いて正規分布を求めて生起確率を推定するが、実際の分布が正規分布ではなく、識別精度を高めるには限界がある。
以上の２つの問題点があり、この問題点を解くための、テンプレートを圧縮するという非特許文献１などの従来方法では、ある程度のオブジェクト数までは対処できるが、大きく増大する場合には対処できないという問題がある。
【０００９】
本発明は、上記従来の技術の問題点を解決するためになされたものであり、オブジェクト数が膨大に増えても判別に必要とする計算時間は増大させず、また、判別を高速に行うことができる方法および装置を提供することが課題である。
【００１０】
【課題を解決するための手段】
上記課題を解決するために、本発明は、登録された複数のオブジェクトを識別するためにオブジェクトの複数画像を用いてオブジェクトを登録するオブジェクト学習装置であって、各オブジェクトの特徴データとして特徴ベクトルを入力する入力手段と、前記各オブジェクトの特徴ベクトル間の類似度からオブジェクトを木構造として分類するオブジェクト分類手段と、前記分類された木構造の各ノードに属する特徴ベクトルを主成分分析する主成分分析手段と、を有することを特徴とするオブジェクト学習装置を、その手段とする。
【００１１】
あるいは、オブジェクトの複数画像を用いて登録された複数のオブジェクトを識別するオブジェクト識別装置であって、識別したい対象の特徴ベクトルを入力する入力手段と、複数のオブジェクトを分類した木構造のルートノードにおいて特徴ベクトルを圧縮する圧縮手段と、前記木構造の各ノードに連結するすべての子ノードにおいて圧縮された特徴ベクトルをさらに圧縮する子ノード圧縮手段と、前記木構造の各ノードにおいて圧縮された特徴ベクトルが下層のどのノードに属するかを判別する判別手段と、前記木構造の末端ノードにおいて該当するカテゴリを出力する出力手段と、を有することを特徴とするオブジェクト識別装置を、その手段とする。
【００１２】
あるいは、上記オブジェクト識別装置において、前記入力手段は、各オブジェクトの画像データを入力する画像入力手段と、画像データから特徴を抽出する特徴抽出手段と、を有することを特徴とするオブジェクト識別装置を、その手段とする。
【００１３】
あるいは、上記オブジェクト識別装置において、前記オブジェクト分類手段は、親ノードにおいて、各オブジェクト同士の特徴ベクトルの相関を計算する相関計算手段と、相関の距離値によってオブジェクトを複数の子ノードに分類する子ノード分類手段と、を有することを特徴とするオブジェクト識別装置を、その手段とする。
【００１４】
あるいは、上記オブジェクト識別装置において、前記判別手段は、各ノードにおいて、圧縮された特徴ベクトルと、子ノードにおいて圧縮された子ノード圧縮特徴ベクトルの間の距離である部分空間距離を計算する部分空間距離計算手段と、最小の部分空間距離を示す子ノードを選択する選択手段と、を有することを特徴とするオブジェクト識別装置を、その手段とする。
【００１５】
あるいは、登録された複数のオブジェクトを識別するためにオブジェクトの複数画像を用いてオブジェクトを登録するオブジェクト学習方法であって、各オブジェクトの特徴データとして特徴ベクトルを入力する入力段階と、前記各オブジェクトの特徴ベクトル間の類似度からオブジェクトを木構造として分類するオブジェクト分類段階と、前記分類された木構造の各ノードに属する特徴ベクトルを主成分分析する主成分分析段階と、を有することを特徴とするオブジェクト学習方法を、その手段とする。
【００１６】
あるいは、オブジェクトの複数画像を用いて登録された複数のオブジェクトを識別するオブジェクト識別方法であって、識別したい対象の特徴ベクトルを入力する入力段階と、複数のオブジェクトを分類した木構造のルートノードにおいて特徴ベクトルを圧縮する圧縮段階と、前記木構造の各ノードに連結するすべての子ノードにおいて圧縮された特徴ベクトルをさらに圧縮する子ノード圧縮段階と、前記木構造の各ノードにおいて圧縮された特徴ベクトルが下層のどのノードに属するかを判別する判別段階と、前記木構造の末端ノードにおいて該当するカテゴリを出力する出力段階と、を有することを特徴とするオブジェクト識別方法を、その手段とする。
【００１７】
あるいは、上記オブジェクト識別方法において、前記入力段階は、各オブジェクトの画像データを入力する画像入力段階と、画像データから特徴を抽出する特徴抽出段階と、を有することを特徴とするオブジェクト識別方法を、その手段とする。
【００１８】
あるいは、上記オブジェクト識別方法において、前記オブジェクト分類段階は、親ノードにおいて、各オブジェクト同士の特徴ベクトルの相関を計算する相関計算段階と、相関の距離値によってオブジェクトを複数の子ノードに分類する子ノード分類段階と、を有することを特徴とするオブジェクト識別方法を、その手段とする。
【００１９】
あるいは、上記オブジェクト識別方法において、前記判別段階は、各ノードにおいて、圧縮された特徴ベクトルと、子ノードにおいて圧縮された子ノード圧縮特徴ベクトルの間の距離である部分空間距離を計算する部分空間距離計算段階と、最小の部分空間距離を示す子ノードを選択する選択段階と、を有することを特徴とするオブジェクト識別方法を、その手段とする。
【００２０】
あるいは、上記オブジェクト学習方法における段階を、コンピュータに実行させるためのプログラムとしたことを特徴とするオブジェクト学習プログラムを、その手段とする。
【００２１】
あるいは、上記オブジェクト学習方法における段階を、コンピュータに実行させるためのプログラムとし、該プログラムを、該コンピュータが読み取りできる記録媒体に記録したことを特徴とするオブジェクト学習プログラムを記録した記録媒体を、その手段とする。
【００２２】
あるいは、上記オブジェクト識別方法における段階を、コンピュータに実行させるためのプログラムとしたことを特徴とするオブジェクト識別プログラムを、その手段とする。
【００２３】
あるいは、上記オブジェクト識別方法における段階を、コンピュータに実行させるためのプログラムとし、該プログラムを、該コンピュータが読み取りできる記録媒体に記録したことを特徴とするオブジェクト識別プログラムを記録した記録媒体を、その手段とする。
【００２４】
本発明では、複数のオブジェクトを登録する手段や段階において、各オブジェクトの特徴ベクトル間の類似度からオブジェクトを木構造として分類することにより、カテゴリ数が増えても判別に必要とする計算時間をそれほど増大させない。また、分類された木構造の各ノードに属する特徴ベクトルを主成分分析し、オブジェクトを識別する手段や段階において、木構造の各ノードにおいて主成分を用いて特徴ベクトルを圧縮し、下層のどのノードに属するかを判別することにより、部分空間距離を高速に求めることを可能とし、特徴ベクトルが下層のどのノードに属するかを高速に判別可能とする。
【００２５】
【発明の実施の形態】
以下、本発明の実施の形態について図を用いて詳細に説明する。
【００２６】
図１は、本発明の一実施の形態例によるシステム構成を示す図であって、本発明を翻訳システムに適用した例であり、本システムにより、ユーザは、撮影した文字／記号／商標／看板の画像を基に、その文字／記号／商標／看板の翻訳情報をみることができる。ただし、文字／記号／商標／看板の翻訳辞書を作成していることが前提となる。本システムは、本発明による学習装置１と識別装置２、および翻訳装置３等の登録情報蓄積検索装置から構成される。本発明は、学習装置１により文字／記号／商標／看板の異なるフォントの画像を含んだ複数画像を用いて文字／記号／商標／看板を登録し、識別装置２により登録された複数の文字／記号／商標／看板を識別する。翻訳装置３は、文字／記号／商標／看板名と登録情報を蓄積しておき、文字／記号／商標／看板名から翻訳情報を検索する装置であり、一般のデータベースにより構築できるため本実施の形態例では詳細を記載しない。
【００２７】
本発明は、上述のように学習装置１、識別装置２から構成されている。学習装置１は、文字／記号／商標／看板の複数視点画像から識別に必要な情報を求め、蓄積装置に蓄積する。識別装置２は、ユーザが入力する画像と学習装置１の蓄積装置に蓄積された識別に必要な情報を利用し、画像に撮影された文字／記号／商標／看板を識別する。
【００２８】
以下の例では、「あ」、「い」、「う」、「え」、「お」、「か」の６種類の文字を登録／識別する場合を例に説明する。ただし、本例は、オブジェクトを６種類に限定するものではなく、何種類にでも拡張可能である。
【００２９】
図２は、本発明の第１の実施の形態例の構成を示した図であって、学習装置１および識別装置２の詳細を説明している。
【００３０】
学習装置１は、各オブジェクトの特徴ベクトルを入力する入力手段１１と、各オブジェクトの特徴ベクトル間の類似度からオブジェクトを木構造として分類するオブジェクト分類手段１２と、前記木構造のノードにおいて所属するオブジェクトの特徴ベクトルを用いて主成分分析する主成分分析手段１３と、主成分分析の結果得られた固有ベクトルおよび主成分を蓄積する蓄積手段１４から構成されている。
【００３１】
識別装置２は、識別したい対象の特徴ベクトルを入力する入力手段２１と、前記木構造のルートノードにおいて特徴ベクトルとルートノードの固有ベクトルを用いて圧縮する圧縮手段２２と、前記木構造の各ノードに連結するすべての子ノードにおいて圧縮された特徴ベクトルをさらに圧縮する子ノード圧縮手段２３と、前記木構造の各ノードにおいて圧縮された特徴ベクトルが下層のどのノードに属するかを判別する判別手段２４と、前記木構造の末端ノードにおいて該当するカテゴリを出力する出力手段２５より構成される。
【００３２】
図３は、本発明の第２の実施の形態例の構成を示した図であって、学習装置１および識別装置２の詳細を説明している。前述の第１の実施の形態例と異なる点は、前記入力手段２１が、オブジェクトの画像を入力する画像入力手段２１１と、画像から識別に必要な特徴を抽出する特徴抽出手段２１２から構成される点である。
【００３３】
以下、本発明の方法の実施の形態例を説明する。
【００３４】
まず、学習段階について説明する。図４は、本発明の学習段階の一実施の形態例を説明した図であって、学習装置１の動作例を示したフローチャートである。学習装置１では以下の処理を行う。
１．全画像数Ｈ分繰り返し［１］
２．対象画像の入力
３．繰り返し［１］終了
４．全画像数Ｈ分繰り返し［２］
５．特徴の抽出
６．繰り返し［２］終了
７．オブジェクトの分類
８．木構造のノード数Ｊ分繰り返し［３］
９．主成分分析
１０．繰り返し［３］終了
１１．主成分、圧縮特徴の蓄積。
【００３５】
以下、図を利用しながら図２、図３の各手段における動作例について詳細に説明する。
【００３６】
図５は、本発明のオブジェクト分類手段１２の一実施の形態例を説明した図である。オブジェクト分類手段１２は、オブジェクトの特徴ベクトル間の相関を計算する相関計算手段１２１と、オブジェクト同士を木構造に分類する子ノード分類手段１２２より構成される。
【００３７】
図６は、本実施の形態例によるオブジェクト分類の動作例を示したフローチャートである。
１．残りカテゴリ数＝オブジェクト数とする。
２．残りカテゴリ数が１になるまで繰り返し［１］
３．残りカテゴリ数分繰り返し［２］
４．残りカテゴリ集団からカテゴリを１つ選択
５．残りカテゴリ数−１分繰り返し［３］
６．選択したカテゴリと残りカテゴリ集団に属するすべてのカテゴリと特徴の相関を計算する。
７．繰り返し［３］終了
８．相関値が閾値以上のカテゴリを統合して統合カテゴリとし、新カテゴリ集団に移す。相関値が閾値以上となるカテゴリが複数存在する場合は、それらのカテゴリをすべて統合する。
９．統合カテゴリ内の特徴ベクトルの平均を求め、代表特徴とする。
１０．繰り返し［２］終了。
１１．新カテゴリに属するカテゴリをすべて残りカテゴリに移す。
１２．残りカテゴリ数を更新、８で統合が１度も起こらなかった場合、閾値を更新。
１３．繰り返し［１］終了。
【００３８】
ただし、閾値およびその更新方法はあらかじめ入力されている。統合が起こらなかった場合、統合を促すために、閾値を下げるように設定する。例えば、
更新後の閾値＝現在の閾値−０．１
ただし、閾値＜０の場合は、閾値＝０とする。
【００３９】
図７はオブジェクト分類の過程を示したフローチャートであり、各状態について説明する。
【００４０】
状態１は、残りカテゴリから「あ」を選択し、残りのカテゴリとの相関を計算した状態である。「い」との相関値が高いため、「あ」と「い」を統合して統合カテゴリとし、新カテゴリとする。次に、残りカテゴリから「う」を選択し、残りのカテゴリとの相関を計算する（状態２）。「え」との相関値が高いため、「う」と「え」を統合して統合カテゴリとし、新カテゴリとする。次に残りカテゴリから「お」を選択し、残りのカテゴリとの相関を計算する（状態３）。「か」との相関値が高いため、「お」と「か」を統合して統合カテゴリとし、新カテゴリとする。新カテゴリを残りカテゴリに移す（状態４）。
【００４１】
上記の過程を残りカテゴリ数が状態４において１つになるまで繰り返す。その結果構成される木構造の例を図８に示す。各段階での統合カテゴリをノード、最上位のノードをルートノード、最下位のノードを末端ノード、ルートノードから末端ノードまでの距離を木構造の段数と呼ぶ。
【００４２】
図９は、各ノードにおいて主成分分析する主成分分析手段１３の一実施の形態例であり、図９（ａ）のルートノード（あいうえおか）では、各カテゴリの特徴ベクトルＤ０（あ）、Ｄ０（い）、Ｄ０（う）、Ｄ０（え）、Ｄ０（お）、Ｄ０（か）を主成分分析し、固有ベクトルＳ（あいうえおか）と各圧縮特徴ベクトルＤ１（あ）、Ｄ１（い）、Ｄ１（う）、Ｄ１（え）、Ｄ１（お）、Ｄ１（か）を得る。図９（ｂ）のノード（あい）では、各カテゴリの特徴ベクトルＤ１（あ）、Ｄ１（い）を主成分分析し、固有ベクトルＳ（あい）と各圧縮特徴ベクトルＤ２（あ）、Ｄ２（い）を得る。図９（ｃ）のノード「あ」では、特徴ベクトルＤ２（あ）を主成分分析し、固有ベクトルＳ（あ）と圧縮特徴ベクトルＤ３（あ）を得る。
【００４３】
次に、本発明の方法の一実施の形態例による識別段階について説明する。
【００４４】
図１０は、識別段階での動作例を示したフローチャートである。
１．識別したい画像の入力
２．画像から特徴を抽出する。
３．ルートノードを選択し、特徴ベクトルを圧縮する。
４．木構造の段数分（本例では３）繰り返し［１］
５．選択されたノードの子ノード数分繰り返し［２］
６．子ノードにおいて子ノード圧縮特徴ベクトルを計算する。
７．繰り返し［２］終了
８．判別をする。
９．最小の部分空間距離を示す子ノードを選択する。
１０．繰り返し［１］終了
１１．選択されたノードのカテゴリを出力する。
【００４５】
図１１は、本発明の判別手段２４の一実施の形態例を示した図であって、各ノードにおいて、圧縮された特徴ベクトルと、子ノードにおいて圧縮された子ノード圧縮特徴ベクトルの間の距離（部分空間距離）を計算する部分空間距離計算手段２４１と、最小の部分空間距離を示す子ノードを選択する選択手段２４２より構成される。
【００４６】
図１２は、本発明による判別の流れの一実施の形態例を示したフローチャートである。
１．子ノード数分繰り返し［１］
２．圧縮特徴ベクトルと子ノード圧縮特徴ベクトルから、部分空間距離を計算する。
３．繰り返し［１］終了
４．部分空間距離が最も小さい子ノードを選択する。
【００４７】
図１３は、識別を行う場合の流れの一実施の形態例を示したフローチャートである。
【００４８】
特徴ベクトルＤ０（入力）が入力されると、固有ベクトルＳ（あいうえおか）を利用してＤ０（入力）を圧縮し、Ｄ１（入力）を生成する。次に、ルートノードの子ノードである「あい」、「うえ」、「おか」の各子ノード上で、各固有ベクトルＳ（あい）、Ｓ（うえ）、Ｓ（おか）を用いて子ノード圧縮特徴ベクトルＤ２１（入力）、Ｄ２２（入力）、Ｄ２３（入力）を求める。
【００４９】
Ｄ１（入力）とＤ２１（入力）から部分空間距離Ｌ２１（入力）を求める。Ｄ１（入力）とＤ２２（入力）から部分空間距離Ｌ２２（入力）を求める。Ｄ１（入力）とＤ２３（入力）から部分空間距離Ｌ２３（入力）を求める。Ｌ２１（入力）、Ｌ２２（入力）、Ｌ２３（入力）の中から、最小となるノード（ここでは「あい」）を選択する。
【００５０】
さらに、「あい」の子ノードである「あ」、「い」の各子ノード上で、各固有ベクトルＳ（あ）、Ｓ（い）を用いて子ノード圧縮特徴ベクトルＤ３１（入力）、Ｄ３２（入力）を求める。Ｄ２１（入力）とＤ３１（入力）から部分空間距離Ｌ３１（入力）を求める。Ｄ２１（入力）とＤ３２（入力）から部分空間距離Ｌ３２（入力）を求める。Ｌ３１（入力）、Ｌ３２（入力）の中から、最小となるノード（ここでは「あ」）を選択する。最後に「あ」を出力する。
【００５１】
なお、図２、図３、図５、および図１１を用いて説明した各装置における各部の一部もしくは全部の機能をコンピュータのプログラムで構成し、そのプログラムをコンピュータを用いて実行して本発明を実現することができること、あるいは、図４、図６〜図１０、図１２、および図１３を用いて説明した各段階での処理の段階をコンピュータのプログラムで構成し、そのプログラムをコンピュータに実行させることができることは言うまでもなく、コンピュータでその機能を実現するためのプログラム、あるいは、コンピュータにその処理の段階を実行させるためのプログラムを、そのコンピュータが読み取り可能な記録媒体、例えば、フレキシブルディスクや、ＭＯ、ＲＯＭ、メモリカード、ＣＤ、ＤＶＤ、リムーバブルディスクなどに記録して、保存したり、配布したりすることが可能である。また、上記のプログラムをインターネットや電子メールなど、ネットワークを通して提供することも可能である。このように、記録媒体やネットワークにより提供されたプログラムをコンピュータにインストールすることで、本発明が実施可能となる。
【００５２】
【発明の効果】
以上説明したように、本発明によれば、複数のオブジェクトを登録する手段や段階において、各オブジェクトの特徴ベクトル間の類似度からオブジェクトを木構造として分類するため、カテゴリ数が増えても判別に必要とする計算時間はそれほど増大しない。また、分類された木構造の各ノードに属する特徴ベクトルを主成分分析し、オブジェクトを識別する手段や段階において、木構造の各ノードにおいて主成分を用いて特徴ベクトルを圧縮するため、部分空間距離を高速に求めることができ、特徴ベクトルが下層のどのノードに属するかを高速に判別できる。
【図面の簡単な説明】
【図１】本発明の一実施の形態例によるシステム構成を示す図
【図２】本発明の第１の実施の形態例の構成を示した図
【図３】本発明の第２の実施の形態例の構成を示した図
【図４】本発明の学習段階の一実施の形態例を説明したフローチャート
【図５】本発明によるオブジェクト分類手段の一実施の形態例を説明した図
【図６】本発明の一実施の形態例によるオブジェクト分類の動作例を示したフローチャート
【図７】本発明の一実施の形態例によるオブジェクト分類の過程を示すフローチャート
【図８】本発明の一実施の形態例によるオブジェクトの分類結果の例を示す図
【図９】（ａ），（ｂ），（ｃ）は、本発明の一実施の形態例による各ノードにおける主成分分析の流れを示す図
【図１０】本発明の一実施の形態例による識別段階での動作例を示したフローチャート
【図１１】本発明による判別手段の一実施の形態例を示した図
【図１２】本発明による判別の流れの一実施の形態例を示したフローチャート
【図１３】本発明による識別を行う場合の流れの一実施の形態例を示したフローチャート
【符号の説明】
１…学習装置
１１…入力手段
１１１…画像入力手段
１１２…特徴抽出手段
１２…オブジェクト分類手段
１２１…相関計算手段
１２２…子ノード分類手段
１３…主成分分析手段
１４…蓄積手段
２…識別装置
２１…入力手段
２１１…画像入力手段
２１２…特徴抽出手段
２２…圧縮手段
２３…子ノード圧縮手段
２４…判別手段
２４１…部分空間距離計算手段
２４２…選択手段
２５…出力手段
３…翻訳装置
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to image identification technology for identifying what kind of object appears in an image. A specific industrial application is, for example, an image retrieval system.
[0002]
[Prior art]
The image recognition technique is a technique for specifying to which object a certain area in image data belongs. Image recognition methods are broadly divided into pattern matching methods and statistical identification methods.
[0003]
The pattern matching method detects whether the input image contains something identical or close to a standard pattern created in advance. The standard pattern is expressed using features that characterize the pattern well. In template matching, a typical technique, the standard pattern is generally a template whose feature is a grayscale image, or a template whose feature is a differential grayscale image obtained by differentiating the grayscale image.
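As an illustration of template matching on grayscale values, the following is a minimal sketch that slides a template over an image and keeps the position with the smallest sum of squared differences. The function names and the choice of sum of squared differences as the matching score are assumptions made for this example; the patent does not prescribe a particular score.

```python
def ssd(patch, template):
    """Sum of squared differences between two equally sized grayscale patches."""
    return sum((p - t) ** 2
               for patch_row, template_row in zip(patch, template)
               for p, t in zip(patch_row, template_row))


def match_template(image, template):
    """Slide the template over the image (lists of rows of gray levels).

    Returns (row, col, score) of the position with the smallest score.
    """
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ssd(patch, template)
            if best is None or score < best[2]:
                best = (r, c, score)
    return best
```

The cost grows with the number of templates times the number of positions, which is the computational burden the description goes on to criticize.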
[0004]
In the statistical identification method, when the two images to be matched are not related by a simple translation, or when the input image is described by feature parameters, the images cannot be matched directly and matching between feature descriptions is required. The joint (co-occurrence) probability of the feature vector of the input image and each candidate object is calculated, and the object that maximizes it is taken as the object to which the input image belongs. In other words, the object that is most probable given the feature vector of the input image is selected.
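The decision rule above can be sketched as follows, assuming a one-dimensional feature and a normal-distribution model per object. The model parameters and category names are hypothetical values chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def classify(x, models):
    """Return the category c that maximizes the joint probability p(x, c) = p(x | c) * p(c).

    models: dict mapping category name -> (mean, variance, prior).
    """
    best_name, best_joint = None, -1.0
    for name, (mean, var, prior) in models.items():
        joint = gaussian_pdf(x, mean, var) * prior
        if joint > best_joint:
            best_name, best_joint = name, joint
    return best_name


# Hypothetical per-category models: (mean, variance, prior probability)
MODELS = {"a": (0.0, 1.0, 0.5), "i": (3.0, 1.0, 0.5)}
```

With a high-dimensional feature vector the same rule needs a multivariate density per object, which is where the sample-size problem raised in the next paragraph comes from.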
[0005]
However, these conventional methods have the following two problems.
1. Because the number of patterns to be matched is enormous, the amount of computation grows and identification cannot be completed in a realistic time.
2. When the number of objects to be identified becomes enormous, the statistical method needs a feature vector of high dimension to raise the identification accuracy. Obtaining occurrence probabilities in many dimensions requires a large amount of training sample data, which is not realistic. A normal distribution is therefore fitted using a limited number of samples and the occurrence probability is estimated from it, but the actual distribution is not normal, so there is a limit to how far the identification accuracy can be raised.
[0006]
Conventional methods for solving these problems include techniques that compress the templates, such as Non-Patent Document 1.
[0007]
[Non-Patent Document 1]
Hiroshi Murase and S. K. Nayar, "Three-dimensional object recognition by two-dimensional matching: parametric eigenspace method", IEICE Transactions (Shingakuron), J77-D-II, No. 11, pp. 2179-2187, November 1994.
[0008]
[Problems to be solved by the invention]
As mentioned above, in the conventional method,
1. Because the number of patterns to be matched is enormous, the amount of computation grows and identification cannot be completed in a realistic time.
2. When the number of objects to be identified becomes enormous, the statistical method needs a feature vector of high dimension to raise the identification accuracy. Obtaining occurrence probabilities in many dimensions requires a large amount of training sample data, which is not realistic. A normal distribution is therefore fitted using a limited number of samples and the occurrence probability is estimated from it, but the actual distribution is not normal, so there is a limit to how far the identification accuracy can be raised.
These are the two problems. Conventional methods for solving them, such as the template compression of Non-Patent Document 1, can cope with up to a certain number of objects but cannot cope when the number of objects increases greatly.
[0009]
The present invention has been made to solve the above problems of the prior art. Its object is to provide a method and a device that do not increase the calculation time required for discrimination even when the number of objects grows enormously, and that can perform discrimination at high speed.
[0010]
[Means for Solving the Problems]
To solve the above problems, the present invention provides, as its means, an object learning device that registers objects using a plurality of images of each object so that the registered objects can be identified, the device comprising: input means for inputting a feature vector as feature data of each object; object classification means for classifying the objects into a tree structure based on the similarity between the feature vectors of the objects; and principal component analysis means for performing principal component analysis on the feature vectors belonging to each node of the classified tree structure.
[0011]
Alternatively, the means is an object identification device that identifies a plurality of objects registered using a plurality of images of each object, the device comprising: input means for inputting a feature vector of the target to be identified; compression means for compressing the feature vector at the root node of a tree structure into which the plurality of objects are classified; child node compression means for further compressing the compressed feature vector at every child node connected to each node of the tree structure; discrimination means for determining, at each node of the tree structure, which lower-layer node the compressed feature vector belongs to; and output means for outputting the corresponding category at a terminal node of the tree structure.
[0012]
Alternatively, the means is the above object identification device in which the input means comprises image input means for inputting image data of each object and feature extraction means for extracting features from the image data.
[0013]
Alternatively, the means is the above object identification device in which the object classification means comprises correlation calculation means for calculating, at a parent node, the correlation between the feature vectors of the objects, and child node classification means for classifying the objects into a plurality of child nodes according to the correlation distance value.
[0014]
Alternatively, the means is the above object identification device in which the discrimination means comprises subspace distance calculation means for calculating, at each node, a subspace distance, which is the distance between the compressed feature vector and the child node compressed feature vector compressed at a child node, and selection means for selecting the child node with the smallest subspace distance.
[0015]
Alternatively, the means is an object learning method that registers objects using a plurality of images of each object so that the registered objects can be identified, the method comprising: an input step of inputting a feature vector as feature data of each object; an object classification step of classifying the objects into a tree structure based on the similarity between the feature vectors of the objects; and a principal component analysis step of performing principal component analysis on the feature vectors belonging to each node of the classified tree structure.
[0016]
Alternatively, the means is an object identification method that identifies a plurality of objects registered using a plurality of images of each object, the method comprising: an input step of inputting a feature vector of the target to be identified; a compression step of compressing the feature vector at the root node of a tree structure into which the plurality of objects are classified; a child node compression step of further compressing the compressed feature vector at every child node connected to each node of the tree structure; a discrimination step of determining, at each node of the tree structure, which lower-layer node the compressed feature vector belongs to; and an output step of outputting the corresponding category at a terminal node of the tree structure.
[0017]
Alternatively, the means is the above object identification method in which the input step comprises an image input step of inputting image data of each object and a feature extraction step of extracting features from the image data.
[0018]
Alternatively, the means is the above object identification method in which the object classification step comprises a correlation calculation step of calculating, at a parent node, the correlation between the feature vectors of the objects, and a child node classification step of classifying the objects into a plurality of child nodes according to the correlation distance value.
[0019]
Alternatively, the means is the above object identification method in which the discrimination step comprises a subspace distance calculation step of calculating, at each node, a subspace distance, which is the distance between the compressed feature vector and the child node compressed feature vector compressed at a child node, and a selection step of selecting the child node with the smallest subspace distance.
[0020]
Alternatively, the means is an object learning program that causes a computer to execute the steps of the above object learning method.
[0021]
Alternatively, the means is a recording medium on which an object learning program is recorded, the program causing a computer to execute the steps of the above object learning method and being recorded on a recording medium readable by the computer.
[0022]
Alternatively, the means is an object identification program that causes a computer to execute the steps of the above object identification method.
[0023]
Alternatively, the means is a recording medium on which an object identification program is recorded, the program causing a computer to execute the steps of the above object identification method and being recorded on a recording medium readable by the computer.
[0024]
In the present invention, the means or step for registering a plurality of objects classifies the objects into a tree structure based on the similarity between their feature vectors, so the calculation time required for discrimination does not increase much even when the number of categories grows. In addition, principal component analysis is performed on the feature vectors belonging to each node of the classified tree structure, and the means or step for identifying an object compresses the feature vector at each node of the tree structure using the principal components and determines which lower-layer node it belongs to. This makes it possible to obtain the subspace distance quickly and to determine quickly which lower-layer node the feature vector belongs to.
[0025]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0026]
FIG. 1 shows a system configuration according to an embodiment of the present invention, in which the invention is applied to a translation system. With this system, a user can view translation information for a character, symbol, trademark, or signboard based on a photographed image of it. This presupposes that a translation dictionary for characters, symbols, trademarks, and signboards has been prepared. The system consists of the learning device 1 and the identification device 2 according to the present invention, and a registered-information storage and retrieval device such as the translation device 3. In the present invention, the learning device 1 registers characters/symbols/trademarks/signboards using a plurality of images, including images in different fonts, and the identification device 2 identifies the registered characters/symbols/trademarks/signboards. The translation device 3 stores character/symbol/trademark/signboard names together with registered information and retrieves translation information from a name. Because it can be built with an ordinary database, its details are not described in this embodiment.
[0027]
The present invention includes the learning device 1 and the identification device 2 as described above. The learning device 1 obtains information necessary for identification from a plurality of viewpoint images of characters / symbols / trademarks / signboards and accumulates the information in a storage device. The identification device 2 identifies the character / symbol / trademark / signboard photographed in the image by using the image input by the user and the information necessary for identification stored in the storage device of the learning device 1.
[0028]
In the following, the case of registering and identifying six characters, “a” (あ), “i” (い), “u” (う), “e” (え), “o” (お), and “ka” (か), is described as an example. This example does not limit the number of objects to six; it can be extended to any number.
[0029]
FIG. 2 is a diagram showing a configuration of the first exemplary embodiment of the present invention, and details of the learning device 1 and the identification device 2 are described.
[0030]
The learning device 1 consists of input means 11 for inputting the feature vector of each object, object classification means 12 for classifying the objects into a tree structure based on the similarity between their feature vectors, principal component analysis means 13 for performing principal component analysis at each node of the tree structure using the feature vectors of the objects belonging to that node, and storage means 14 for storing the eigenvectors and principal components obtained by the principal component analysis.
[0031]
The identification device 2 consists of input means 21 for inputting the feature vector of the target to be identified, compression means 22 for compressing the feature vector at the root node of the tree structure using the eigenvector of the root node, child node compression means 23 for further compressing the compressed feature vector at every child node connected to each node of the tree structure, discrimination means 24 for determining, at each node of the tree structure, which lower-layer node the compressed feature vector belongs to, and output means 25 for outputting the corresponding category at a terminal node of the tree structure.
[0032]
FIG. 3 shows the configuration of the second embodiment of the present invention and details the learning device 1 and the identification device 2. It differs from the first embodiment in that the input means 21 consists of image input means 211 for inputting an image of an object and feature extraction means 212 for extracting from the image the features needed for identification.
[0033]
Hereinafter, embodiments of the method of the present invention will be described.
[0034]
First, the learning stage will be described. FIG. 4 is a diagram illustrating an embodiment of the learning stage according to the present invention, and is a flowchart illustrating an operation example of the learning device 1. The learning device 1 performs the following processing.
1. Repeat for the total number of images H [1]
2. Input the target image
3. End of repeat [1]
4. Repeat for the total number of images H [2]
5. Extract features
6. End of repeat [2]
7. Classify the objects
8. Repeat for the number of nodes J in the tree structure [3]
9. Principal component analysis
10. End of repeat [3]
11. Store the principal components and compressed features.
[0035]
In the following, an example of the operation of each means in FIGS. 2 and 3 will be described in detail with reference to the drawings.
[0036]
FIG. 5 is a diagram for explaining an embodiment of the object classification means 12 of the present invention. The object classification unit 12 includes a correlation calculation unit 121 that calculates a correlation between feature vectors of objects, and a child node classification unit 122 that classifies the objects into a tree structure.
[0037]
FIG. 6 is a flowchart showing an operation example of object classification according to the present embodiment.
1. Set the number of remaining categories to the number of objects.
2. Repeat until the number of remaining categories is 1 [1]
3. Repeat for the number of remaining categories [2]
4. Select one category from the remaining category group.
5. Repeat (number of remaining categories − 1) times [3]
6. Calculate the feature correlation between the selected category and every category in the remaining category group.
7. End of repeat [3]
8. Merge the categories whose correlation value is at or above the threshold into an integrated category and move it to the new category group. If several categories have a correlation value at or above the threshold, merge all of them.
9. Compute the average of the feature vectors in the integrated category and use it as the representative feature.
10. End of repeat [2].
11. Move all categories in the new category group to the remaining category group.
12. Update the number of remaining categories. If no merge occurred in step 8, update the threshold.
13. End of repeat [1].
[0038]
The threshold and its update rule are given in advance. When no merge occurs, the threshold is lowered to encourage merging. For example,
updated threshold = current threshold − 0.1
where the threshold is set to 0 if the result is less than 0.
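The classification flow of FIG. 6, together with the threshold rule above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the correlation is taken to be the normalized correlation (cosine) of two feature vectors, the initial threshold of 0.9 is a hypothetical value, the representative feature of a merged category is the mean of the representatives being merged, and the sketch raises an error if nothing can be merged even at threshold 0. All function and field names are invented for the example.

```python
import math


def correlation(u, v):
    """Normalized correlation (cosine of the angle) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_tree(features, threshold=0.9, step=0.1):
    """features: dict mapping category name -> feature vector.

    Returns the root node; a node is {"name", "feature", "children"}.
    """
    remaining = [{"name": n, "feature": list(f), "children": []}
                 for n, f in features.items()]
    while len(remaining) > 1:                       # repeat [1]
        new_group, merged_any = [], False
        while remaining:                            # repeat [2]
            selected = remaining.pop(0)
            similar, rest = [], []
            for other in remaining:                 # repeat [3]
                if correlation(selected["feature"], other["feature"]) >= threshold:
                    similar.append(other)
                else:
                    rest.append(other)
            remaining = rest
            if similar:                             # step 8: merge into one category
                merged_any = True
                members = [selected] + similar
                selected = {
                    "name": "".join(m["name"] for m in members),
                    "feature": mean_vector([m["feature"] for m in members]),  # step 9
                    "children": members,
                }
            new_group.append(selected)
        remaining = new_group                       # step 11
        if not merged_any:                          # step 12: lower the threshold
            if threshold <= 0.0:
                raise ValueError("no pair of categories can be merged at threshold 0")
            threshold = max(0.0, threshold - step)
    return remaining[0]
```

With six feature vectors in which “a”/“i”, “u”/“e”, and “o”/“ka” are pairwise similar, the first pass produces the nodes “ai”, “ue”, and “oka”, and a later pass with a lowered threshold joins them under a single root.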
[0039]
FIG. 7 is a flowchart showing the process of object classification, and each state will be described.
[0040]
State 1 is the state in which “a” has been selected from the remaining categories and its correlation with the other remaining categories has been calculated. Because the correlation value with “i” is high, “a” and “i” are merged into an integrated category, which becomes a new category. Next, “u” is selected from the remaining categories and its correlation with the other remaining categories is calculated (state 2). Because the correlation value with “e” is high, “u” and “e” are merged into an integrated category, which becomes a new category. Next, “o” is selected from the remaining categories and its correlation with the other remaining categories is calculated (state 3). Because the correlation value with “ka” is high, “o” and “ka” are merged into an integrated category, which becomes a new category. The new categories are then moved to the remaining categories (state 4).
[0041]
The above process is repeated until the number of remaining categories becomes one in state 4. An example of the resulting tree structure is shown in FIG. The integrated category at each stage is called a node, the highest node is a root node, the lowest node is a terminal node, and the distance from the root node to the terminal node is called the number of stages in the tree structure.
[0042]
FIG. 9 shows an embodiment of the principal component analysis means 13, which performs principal component analysis at each node. At the root node (aiueoka) in FIG. 9(a), the feature vectors D0(a), D0(i), D0(u), D0(e), D0(o), and D0(ka) of the categories are subjected to principal component analysis to obtain the eigenvector S(aiueoka) and the compressed feature vectors D1(a), D1(i), D1(u), D1(e), D1(o), and D1(ka). At the node (ai) in FIG. 9(b), the feature vectors D1(a) and D1(i) are subjected to principal component analysis to obtain the eigenvector S(ai) and the compressed feature vectors D2(a) and D2(i). At the node “a” in FIG. 9(c), the feature vector D2(a) is subjected to principal component analysis to obtain the eigenvector S(a) and the compressed feature vector D3(a).
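The per-node analysis can be sketched as follows: compute the principal axes of the feature vectors that belong to a node, then compress a vector by projecting it onto those axes. This is a minimal sketch under assumptions the patent does not state: the vectors are mean-centered before analysis, the eigenvectors are found by power iteration with deflation, and the names `pca_basis` and `compress` are hypothetical.

```python
import math


def pca_basis(vectors, k, iters=100):
    """Return (mean, basis): the mean vector and up to k orthonormal principal axes.

    Power iteration with deflation on the covariance matrix. The fixed start
    vector is a simplification: an axis exactly orthogonal to it would be missed.
    """
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    basis = []
    for _ in range(k):
        start = [1.0 / (i + 1) for i in range(dim)]
        start_norm = math.sqrt(sum(x * x for x in start))
        v = [x / start_norm for x in start]
        eigenvalue = 0.0
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            eigenvalue = math.sqrt(sum(x * x for x in w))
            if eigenvalue < 1e-12:
                break
            v = [x / eigenvalue for x in w]
        if eigenvalue < 1e-12:
            break                                   # no variance left to explain
        basis.append(v)
        # Deflation: remove the component just found from the covariance matrix.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return mean, basis


def compress(x, mean, basis):
    """Project x onto the principal axes: one coefficient per basis vector."""
    centered = [a - m for a, m in zip(x, mean)]
    return [sum(b * c for b, c in zip(axis, centered)) for axis in basis]
```

In the notation of FIG. 9, `basis` plays the role of S at a node and `compress` maps a D0 vector to D1 at the root, a D1 vector to D2 at the next level, and so on.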
[0043]
Next, the identification step according to an embodiment of the method of the present invention will be described.
[0044]
FIG. 10 is a flowchart showing an operation example in the identification stage.
1. Input the image to be identified.
2. Extract features from the image.
3. Select the root node and compress the feature vector.
4. Repeat for the number of stages in the tree structure (3 in this example) [1]
5. Repeat for the number of child nodes of the selected node [2]
6. Compute the child node compressed feature vector at the child node.
7. End of repeat [2]
8. Perform discrimination.
9. Select the child node with the smallest subspace distance.
10. End of repeat [1]
11. Output the category of the selected node.
[0045]
FIG. 11 shows an embodiment of the discrimination means 24 of the present invention. It consists of subspace distance calculation means 241, which calculates at each node the distance (subspace distance) between the compressed feature vector and the child node compressed feature vector compressed at a child node, and selection means 242, which selects the child node with the smallest subspace distance.
[0046]
FIG. 12 is a flowchart showing an embodiment of the determination flow according to the present invention.
1. Repeat for the number of child nodes [1].
2. Calculate a subspace distance from the compressed feature vector and the child node compressed feature vector.
3. End of repeat [1].
4. Select the child node with the smallest subspace distance.
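The discrimination flow of FIG. 12 can be sketched as follows. The text does not give a formula for the subspace distance, so this sketch assumes one: the norm of the residual left when the vector compressed at the current node is projected onto a child node's eigenvectors and projected back. The names `subspace_distance` and `discriminate`, and the representation of a child as a `(name, mean, S)` tuple, are hypothetical.

```python
import numpy as np

def subspace_distance(x, mean, S):
    """Assumed definition: residual norm of x with respect to a child subspace."""
    z = S @ (x - mean)          # child node compressed feature vector
    recon = mean + S.T @ z      # back-projection into the current node's space
    return float(np.linalg.norm(x - recon))

def discriminate(x, children):
    """Compute the distance to every child, then select the smallest one.

    children: list of (name, mean, S) tuples, S having orthonormal rows.
    Returns (name_of_selected_child, {name: distance}).
    """
    distances = {name: subspace_distance(x, mean, S) for name, mean, S in children}
    selected = min(distances, key=distances.get)
    return selected, distances

# Toy example: two child subspaces in a 3-dimensional compressed space.
children = [
    ("Ai", np.zeros(3), np.array([[1.0, 0.0, 0.0]])),
    ("Ue", np.zeros(3), np.array([[0.0, 1.0, 0.0]])),
]
selected, distances = discriminate(np.array([2.0, 0.1, 0.0]), children)
```

In this toy case the input lies almost entirely along the first axis, so the residual with respect to "Ai" is much smaller than the residual with respect to "Ue".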
[0047]
FIG. 13 is a flowchart showing an embodiment of a flow for performing identification.
[0048]
When the feature vector D0 (input) is input, D0 (input) is compressed using the eigenvector S (Aiueoka) to generate D1 (input). Next, at each of the child nodes "Ai", "Ue", and "Oka" of the root node, the eigenvectors S (Ai), S (Ue), and S (Oka) are used to obtain the child node compressed feature vectors D21 (input), D22 (input), and D23 (input).
[0049]
A subspace distance L21 (input) is obtained from D1 (input) and D21 (input). A subspace distance L22 (input) is obtained from D1 (input) and D22 (input). A subspace distance L23 (input) is obtained from D1 (input) and D23 (input). The node with the smallest of L21 (input), L22 (input), and L23 (input) is selected (here, "Ai").
[0050]
Further, at each of the child nodes "A" and "I" of "Ai", the eigenvectors S (A) and S (I) are used to obtain the child node compressed feature vectors D31 (input) and D32 (input). A subspace distance L31 (input) is obtained from D21 (input) and D31 (input). A subspace distance L32 (input) is obtained from D21 (input) and D32 (input). The node with the smaller of L31 (input) and L32 (input) is selected (here, "A"). Finally, "A" is output.
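The descent just described, from "Aiueoka" through "Ai" to "A", can be traced with a small self-contained Python sketch. Everything concrete in it is assumed for illustration: the `Node` class, the residual-norm definition of the subspace distance, the zero means, and the axis-aligned toy eigenvectors. Only the node names and the order of operations follow the text.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Node:
    name: str
    mean: np.ndarray                 # mean in the parent's compressed space
    S: np.ndarray                    # eigenvectors as rows, shape (k, dim_in)
    children: list = field(default_factory=list)

    def compress(self, x):
        return self.S @ (x - self.mean)

    def distance(self, x):
        # Assumed subspace distance: residual after projecting x onto this node.
        z = self.compress(x)
        return float(np.linalg.norm(x - (self.mean + self.S.T @ z)))

def identify(root, d0):
    """Descend from the root, always following the child with the smallest distance."""
    node, x = root, root.compress(d0)        # D0(input) -> D1(input)
    path = [root.name]
    while node.children:
        node = min(node.children, key=lambda c: c.distance(x))
        x = node.compress(x)                 # e.g. D1(input) -> D21(input)
        path.append(node.name)
    return node.name, path

def build_toy_tree():
    """Hypothetical tree: Aiueoka -> {Ai, Ue, Oka} -> two leaf categories each."""
    def leaf(name, axis):
        return Node(name, np.zeros(2), np.eye(2)[axis:axis + 1])
    def mid(name, first_axis, leaves):
        return Node(name, np.zeros(6), np.eye(6)[first_axis:first_axis + 2], leaves)
    return Node("Aiueoka", np.zeros(6), np.eye(6), [
        mid("Ai", 0, [leaf("A", 0), leaf("I", 1)]),
        mid("Ue", 2, [leaf("U", 0), leaf("E", 1)]),
        mid("Oka", 4, [leaf("O", 0), leaf("Ka", 1)]),
    ])

d0_input = np.array([0.9, 0.2, 0.05, 0.0, 0.0, 0.1])
category, path = identify(build_toy_tree(), d0_input)
print(category, path)
```

At each level only the children of the selected node are evaluated, so the input is never compared against the categories under "Ue" or "Oka" once "Ai" has been chosen.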
[0051]
It goes without saying that some or all of the functions of the units in the apparatuses described with reference to FIGS. 2, 3, 5, and 11 can be implemented as a computer program and realized by executing the program on a computer, and that the processing steps described with reference to FIG. 4, FIGS. 6 to 10, FIG. 12, and FIG. 13 can likewise be implemented as a computer program and executed on a computer. A program for causing the computer to realize these functions, or to execute these processing steps, can be recorded on a computer-readable recording medium, for example a flexible disk, MO, ROM, memory card, CD, DVD, or removable disk, and can be stored or distributed in that form. The program can also be provided through a network such as the Internet or by electronic mail. In this way, the present invention can be implemented by installing on a computer a program provided by a recording medium or a network.
[0052]
【Effect of the Invention】
As described above, according to the present invention, in the means and stage for registering a plurality of objects, the objects are classified into a tree structure based on the similarity between the feature vectors of the objects, so that even when the number of categories increases, the calculation time required for discrimination does not increase greatly. In addition, the feature vectors belonging to each node of the classified tree structure are subjected to principal component analysis, and in the means and stage for identifying an object, the feature vector is compressed using the principal components at each node of the tree structure, so that it can be determined at high speed which node the feature vector belongs to.
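The statement about calculation time can be made concrete with a rough count of distance evaluations. The sketch below assumes an idealized balanced tree in which every node has the same number of children. That is not stated in the source, where the tree is built from feature similarity and need not be balanced, and the function names are hypothetical.

```python
def tree_depth(n_categories, branching):
    """Smallest depth at which a balanced tree has at least n_categories leaves."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    depth, capacity = 0, 1
    while capacity < n_categories:
        capacity *= branching
        depth += 1
    return depth

def comparisons(n_categories, branching):
    """Return (flat, tree) counts of distance evaluations for one input.

    flat: one evaluation per category, as in exhaustive matching.
    tree: `branching` evaluations at each level of the descent.
    """
    flat = n_categories
    tree = branching * tree_depth(n_categories, branching)
    return flat, tree

flat_1k, tree_1k = comparisons(1_000, 10)
flat_1m, tree_1m = comparisons(1_000_000, 10)
```

Under this idealization the exhaustive count grows linearly with the number of categories, while the tree count grows only with the depth of the tree.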
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a system configuration according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a configuration of a first embodiment of the present invention.
FIG. 3 is a diagram illustrating a second embodiment of the present invention.
FIG. 4 is a flowchart illustrating an embodiment of the learning stage of the present invention.
FIG. 5 is a diagram illustrating an embodiment of the object classification means according to the present invention.
FIG. 6 is a flowchart showing an operation example of object classification according to an embodiment of the present invention.
FIG. 7 is a flowchart showing a process of object classification according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating examples of object classification results according to an embodiment of the present invention.
FIGS. 9A, 9B, and 9C are diagrams illustrating a flow of principal component analysis at each node according to an embodiment of the present invention.
FIG. 10 is a flowchart showing an operation example of the identification stage according to an embodiment of the present invention.
FIG. 11 is a diagram showing an embodiment of the discrimination means according to the present invention.
FIG. 12 is a flowchart showing an embodiment of the discrimination flow according to the present invention.
FIG. 13 is a flowchart showing an embodiment of a flow for performing identification according to the present invention.
[Description of symbols]
1 ... Learning apparatus
11 ... Input means
111 ... Image input means
112 ... Feature extraction means
12 ... Object classification means
121 ... Correlation calculation means
122 ... Child node classification means
13 ... Principal component analysis means
14 ... Accumulation means
2 ... Identification apparatus
21 ... Input means
211 ... Image input means
212 ... Feature extraction means
22 ... Compression means
23 ... Child node compression means
24 ... Discrimination means
241 ... Subspace distance calculation means
242 ... Selection means
25 ... Output means
3 ... Translation device

Claims

An object identification device for identifying a plurality of registered objects using a plurality of images of an object, comprising:
  input means for inputting a feature vector of an object to be identified;
  compression means for compressing the feature vector at a root node of a tree structure into which the plurality of objects are classified;
  child node compression means for further compressing the compressed feature vector at all child nodes connected to each node of the tree structure;
  discrimination means for determining to which lower-layer node the feature vector compressed at each node of the tree structure belongs; and
  output means for outputting the corresponding category at an end node of the tree structure.

The object identification device according to claim 1, wherein the input means includes:
  image input means for inputting image data of each object; and
  feature extraction means for extracting features from the image data.

The object identification device according to claim 1, wherein the discrimination means includes:
  subspace distance calculating means for calculating, at each node, a subspace distance that is the distance between the compressed feature vector and the child node compressed feature vector compressed at a child node; and
  selecting means for selecting the child node exhibiting the minimum subspace distance.

A program that causes a computer to function as each means that constitutes the object identification device according to claim 1.

A recording medium storing the program according to claim 4.