JP2004192310A

JP2004192310A - Object learning device, object identifying device, these methods, these programs and medium for recording these programs

Info

Publication number: JP2004192310A
Application number: JP2002359042A
Authority: JP
Inventors: Yoshinori Kusachi; 良規草地; Akira Suzuki; 章鈴木; Tetsuya Kinebuchi; 哲也杵渕; Kenichi Arakawa; 賢一荒川; Naoki Ito; 直己伊藤; Tomohiko Arikawa; 知彦有川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-12-11
Filing date: 2002-12-11
Publication date: 2004-07-08

Abstract

PROBLEM TO BE SOLVED: To limit the amount of data even when the number of parameters is large, and to identify a learned and registered object in a target image without lowering the correlation value. SOLUTION: In a learning device 1, parameter images of each object, such as deformed images and different-density images, and divided sub-template images are created. Each pixel value of these images is converted into light energy and logarithmically converted; the horizontal and vertical differential components are calculated to obtain the differential direction and intensity; a differential direction histogram is calculated as the feature vector; the feature vectors are compressed; and, for the whole images and for the sub-templates, compressed spaces and subspaces are calculated and registered as the information needed for identification. In an identification device 2, an object region is cut out from an image captured by the user; logarithmic conversion, differential intensity and direction calculation, and differential direction histogram calculation are performed in the same manner; votes are cast according to the distance to the sub-template subspaces; and an identification result is output according to the distance to the whole subspace. COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、画像内に、どのようなオブジェクトが写っているかを識別する画像識別技術に属し、その具体的な産業応用システムとして、例えば画像検索システムなどが挙げられる。
【０００２】
【従来の技術】
画像認識技術は、画像データ内のある領域がどの対象に属するかを特定する技術である。画像認識方式は、大きく分けて、パターンマッチング方式と統計的識別方式がある。
【０００３】
パターンマッチング方式では、入力画像の中にあらかじめ作成した標準パターンと同じ物があるか、あるいは近いものがあるかを検出する。標準パターンは、そのパターンを良く示す特徴を用いて表現される。代表的な手法であるテンプレートマッチング方法では、濃淡画像を特徴としたテンプレートや、濃淡画像を微分した微分濃淡画像を特徴とするテンプレートを標準パターンとして利用するのが一般的である。
【０００４】
統計的識別方式では、照合したい２枚の画像が単純な平行移動の関係にない場合や、入力画像が特徴パラメータで記述されている場合、画像の直接照合はできず、特徴記述間のマッチングが必要になるが、その際、入力画像の特徴ベクトルとある対象の同時生起確率を計算して、これが最大となる対象を入力画像が属す対象とする。これは入力画像の特徴ベクトルから見て、最も確かな対象を選んだことになる。
【０００５】
なお、後述の［発明が解決しようとする課題］の欄で述べる本発明の課題１を解決する方法として、テンプレートを圧縮する技術が非特許文献１に開示されている。しかし、この技術には照明条件がシミュレーション可能である環境下でないと利用できないなどの制約がある。また、この技術では同じく本発明の課題２を解決することはできない。
【０００６】
【非特許文献１】
村瀬洋、Ｓ．Ｋ．Ｎａｙｅｒ、「２次元照合による３次元物体認識−パラメトリック固有空間法−」、電子情報通信学会論文誌（Ｄ−ＩＩ）、１９９４年１１月、第Ｊ７７−Ｄ−ＩＩ巻、第１１号、ｐ．２１７９−２１８７
【０００７】
【発明が解決しようとする課題】
しかしながら、上記従来技術の手法では、以下の２つの問題があった。
【０００８】
（１）オブジェクト物のパラメータ、撮影条件のパラメータが多い場合にデータ量が指数関数的に大きくなるため、対応ができなくなる。例えば、屋外などで、照明条件が大きく変化し、かつ、オブジェクトの見え方が撮影位置によって大きく変化するオブジェクトの場合、パターンマッチング方式では、必要になるテンプレート数（パラメータの組み合わせ数）が膨大になり、現実的ではなかった。また、統計的識別方式においても、オブジェクト物の同時生起確率を計算するために、膨大なデータが必要となり、現実的ではなかった。
【０００９】
（２）また、上記問題（１）からテンプレート数を抑えると、用意したテンプレートと完全に合致しないオブジェクトに対しては、相関値が下がってしまい、認識できない。例えば、ゴシック体の文字のみをテンプレートとして登録すると、楷書体の文字に対しては、相関値が下がってしまう。また、例えば濃い文字と薄い文字では、背景の色や模様によっては相関値が下がってしまう。
【００１０】
本発明は、上記従来技術の問題（１）を解決することを課題１とし、上記従来技術の問題（２）を解決することを課題２とする。
【００１１】
【課題を解決するための手段】
本発明では、上記課題を解決するために、オブジェクトの複数画像を用いてオブジェクトを登録する学習装置と、入力画像から該登録された複数のオブジェクトを識別する識別装置とからなる装置における該学習装置であって、複数画像に対してオブジェクトの画像をあらかじめ想定されているパラメータの組み合わせ毎に該パラメータに従って変換した全パラメータ画像を作成する全パラメータ画像作成手段と、該全パラメータ画像に対してあらかじめ定められた注目点を中心としてあらかじめ定められた大きさの画像であるサブテンプレート画像を複数個作成するサブテンプレート画像作成手段と、該全パラメータ画像および該サブテンプレート画像から特徴を抽出した全体特徴ベクトルおよびサブテンプレート特徴ベクトルを計算する特徴抽出手段と、１特徴ベクトルを１サンプルとみなして該全体特徴ベクトルに主成分分析を行い全体圧縮空間および全体圧縮特徴ベクトルを計算し、かつ、該サブテンプレート特徴ベクトル群毎に主成分分析を行いサブテンプレート圧縮空間およびサブテンプレート圧縮特徴ベクトルを計算する圧縮手段と、該全体圧縮特徴ベクトルを対象毎に主成分分析を行って全体部分空間を計算し、かつ、該サブテンプレート圧縮特徴ベクトルを対象毎に主成分分析を行ってサブテンプレート部分空間を計算する部分空間作成手段と、該全体圧縮空間および該全体部分空間および該サブテンプレート圧縮空間および該サブテンプレート部分空間を登録・蓄積する蓄積手段と、を有することを特徴とするオブジェクト学習装置を、その解決の手段とする。
【００１２】
あるいは、オブジェクトの複数画像を用いてオブジェクトを登録する学習装置と、入力画像から該登録された複数のオブジェクトを識別する識別装置とからなる装置における該識別装置であって、照合したい入力画像の一部を選択してオブジェクト領域を切り出す画像切り出し手段と、該切り出された画像の特徴ベクトルを抽出する特徴抽出手段と、該特徴ベクトルをサブテンプレート部分空間に投影したサブテンプレート圧縮特徴ベクトルを計算し、各オブジェクトの各サブテンプレート部分空間との距離を求めるサブテンプレート部分空間距離計算手段と、オブジェクトがあらかじめ想定されている座標変換の一定範囲内のパラメータの組み合わせによって変形していることを仮定し、位置座標と各々の該パラメータの組み合わせに対しての確からしさを示すテーブルを用意し、パラメータの組み合わせが正しいことを前提としたときのオブジェクトの位置座標を算出し、該サブテンプレート部分空間距離計算手段にて計算した圧縮空間までの距離を該座標および座標変換パラメータで定められるテーブルの値に加算する投票手段と、定められた閾値以下の該テーブルの値を示すオブジェクトの座標位置および座標変換パラメータおよび対象を推定し求めるパラメータ推定手段と、該推定された座標位置および座標変換パラメータの画像を切り出し、対応する全体部分空間との距離を計算する全体部分空間距離計算手段と、定められた閾値以下の該距離を示した全体画像のパラメータを識別結果として出力する出力手段と、を有することを特徴とするオブジェクト識別装置を、その解決の手段とする。
【００１３】
あるいは、上記のオブジェクト学習装置において、前記全パラメータ画像生成手段は、複数画像に対してオブジェクトの画像をあらかじめ想定されている座標変換のパラメータの組み合わせ毎に該パラメータに従って変換した変形全体画像を作成する変形全体画像作成手段と、各変形全体画像に対してオブジェクト領域の画素値を定められた画素値に変換した異濃度画像を作成する異濃度画像作成手段と、を有することを特徴とするオブジェクト学習装置を、その解決の手段とする。
【００１４】
あるいは、上記のオブジェクト学習装置において、前記特徴抽出手段は、各異濃度画像およびサブテンプレート画像の各画素値を光エネルギーに変換し光エネルギーに対して対数変換する対数変換手段と、各異濃度画像およびサブテンプレート画像の横方向の微分と縦方向の微分成分を計算して微分の方向と強さを計算する微分強度方向計算手段と、各異濃度画像およびサブテンプレート画像に対し、定められた領域内の各画素の微分方向を定められた段階に量子化し、微分の強さを段階毎に累積加算した微分方向ヒストグラムを作成する微分方向ヒストグラム化手段と、を有することを特徴とするオブジェクト学習装置を、その解決の手段とする。
【００１５】
あるいは、上記のオブジェクト学習装置において、前記部分空間作成手段は、寄与率の合計が一定以上になる固有空間を構成する固有空間構成手段と、全体特徴ベクトルを固有空間に投影した際の座標および全体特徴ベクトルのパラメータである変形パラメータおよび濃度パラメータを記載したパラメータ対応表を作成する対応表作成手段と、を有することを特徴とするオブジェクト学習装置を、その解決の手段とする。
【００１６】
あるいは、上記のオブジェクト識別装置において、前記全体部分空間距離計算手段は、対応する全体部分空間との距離を計算する代わりにパラメータ対応表に記載されている座標との距離を計算するパラメータ距離計算手段を有することを特徴とするオブジエクト識別装置を、その解決の手段とする。
【００１７】
あるいは、オブジェクトの複数画像を用いてオブジェクトを登録する学習段階と、入力画像から該登録された複数のオブジェクトを識別する識別段階とを有する方法におけるオブジェクト学習方法であって、オブジェクトを登録する学習段階は、複数画像に対してオブジェクトの画像をあらかじめ想定されているパラメータの組み合わせ毎に該パラメータに従って変換した全パラメータ画像を作成する全パラメータ画像作成段階と、該全パラメータ画像に対してあらかじめ定められた注目点を中心としてあらかじめ定められた大きさの画像であるサブテンプレート画像を複数個作成するサブテンプレート画像作成段階と、該全パラメータ画像および該サブテンプレート画像から全体特徴ベクトルおよびサブテンプレート特徴ベクトルを抽出する特徴軸出段階と、１特徴ベクトルを１サンプルとみなして該全体特徴ベクトルに主成分分析を行い全体圧縮空間および全体圧縮特徴ベクトルを計算し、かつ、該サブテンプレート特徴ベクトル群毎に主成分分析を行いサブテンプレート圧縮空間およびサブテンプレート圧縮特徴ベクトルを計算する圧縮段階と、該全体圧縮特徴ベクトルを対象毎に主成分分析を行って全体部分空間を計算し、かつ、該サブテンプレート圧縮特徴ベクトルを対象毎に主成分分析を行ってサブテンプレート部分空間を計算する部分空間作成段階と、該全体圧縮空間および該全体部分空間および該サブテンプレート圧縮空間および該サブテンプレート部分空間を登録・蓄積する蓄積段階と、を有することを特徴とするオブジェクト学習方法を、その解決の手段とする。
【００１８】
あるいは、オブジェクトの複数画像を用いてオブジェクトを登録する学習段階と、該登録された複数のオブジェクトを識別する識別段階を有する方法におけるオブジェクト識別方法であって、オブジェクトを識別する識別段階は、照合したい入力画像の一部を選択してオブジェクト領域を切り出す画像切り出し段階と、該切り出された画像の特徴ベクトルを抽出する特徴抽出段階と、該特徴ベクトルをサブテンプレート部分空間に投影したサブテンプレート圧縮特徴ベクトルを計算し、各オブジェクトの各サブテンプレート部分空間との距離を求めるサブテンプレート部分空間距離計算段階と、オブジェクトがあらかじめ想定されている座標変換の一定範囲内のパラメータの組み合わせによって変形していることを仮定し、位置座標と各々の該パラメータの組み合わせに対しての確からしさを示すテーブルを用意し、パラメータの組み合わせが正しいことを前提としたときのオブジェクトの位置座標を算出し、該サブテンプレート部分空間距離計算段階にて計算した圧縮空間までの距離を該座標および座標変換パラメータで定められるテーブルの値に加算する投票段階と、定められた閾値以下の該テーブルの値を示すオブジェクトの座標位置および座標変換パラメータおよび対象を推定し求めるパラメータ推定段階と、該推定された座標位置および座標変換パラメータの画像を切り出し、対応する全体部分空間との距離を計算する全体部分空間距離計算段階と、定められた閾値以下の該距離を示した全体画像のパラメータを識別結果として出力する出力段階と、を有することを特徴とするオブジェクト識別方法を、その解決の手段とする。
【００１９】
あるいは、上記のオブジェクト学習方法において、前記全パラメータ画像生成段階は、複数画像に対してオブジェクトの画像をあらかじめ想定されている座標変換のパラメータの組み合わせ毎に該パラメータに従って変換した変形全体画像を作成する変形全体画像作成段階と、各変形全体画像に対してオブジェクト領域の画素値を定められた画素値に変換した異濃度画像を作成する異濃度画像作成段階と、を有することを特徴とするオブジェクト学習方法を、その解決の手段とする。
【００２０】
あるいは、上記のオブジェクト学習方法において、前記特徴抽出段階は、各異濃度画像およびサブテンプレート画像の各画素値を光エネルギーに変換し光エネルギーに対して対数変換をする対数変換段階と、各異濃度画像およびサブテンプレート画像の横方向の微分と縦方向の微分成分を計算して微分の方向と強さを計算する微分強度方向計算段階と、各異濃度画像およびサブテンプレート画像に対し、定められた領域内の各画素の微分方向を定められた段階に量子化し、微分の強さを段階毎に累積加算した微分方向ヒストグラムを作成する微分方向ヒストグラム化段階と、を有することを特徴とするオブジェクト学習方法を、その解決の手段とする。
【００２１】
あるいは、上記のオブジェクト学習方法において、前記部分空間作成段階は、寄与率の合計が一定以上になる固有空間を構成する固有空間構成段階と、全体特徴ベクトルを固有空間に投影した際の座標および全体特徴ベクトルのパラメータである変形パラメータおよび濃度パラメータを記載したパラメータ対応表を作成する対応表作成段階と、を有することを特徴とするオブジェクト学習方法を、その解決の手段とする。
【００２２】
あるいは、上記のオブジェクト識別方法において、前記全体部分空間距離計算段階は、対応する全体部分空間との距離を計算する代わりにパラメータ対応表に記載されている座標との距離を計算するパラメータ距離計算段階を有することを特徴とするオブジェクト識別方法を、その解決の手段とする。
【００２３】
あるいは、上記のオブジェクト学習方法における段階を、コンピュータに実行させるためのプログラムとしたことを特徴とするオブジェクト学習プログラムを、その解決の手段とする。
【００２４】
あるいは、上記のオブジェクト識別方法における段階を、コンピュータに実行させるためのプログラムとしたことを特徴とするオブジェクト識別プログラムを、その解決の手段とする。
【００２５】
あるいは、上記のオブジェクト学習方法における段階を、コンピュータに実行させるためのプログラムとし、該プログラムを、該コンピュータが読み取りできる記録媒体に記録したことを特徴とするオブジェクト学習プログラムを記録した記録媒体を、その解決の手段とする。
【００２６】
あるいは、上記のオブジェクト識別方法における段階を、コンピュータに実行させるためのプログラムとし、該プログラムを、該コンピュータが読み取りできる記録媒体に記録したことを特徴とするオブジェクト識別プログラムを記録した記録媒体を、その解決の手段とする。
【００２７】
本発明では、
（１）画像の各画素値を光エネルギーに変換し光エネルギーに対して対数変換をし、横方向の微分と縦方向の微分成分を計算して微分の方向と強さを計算し、定められた領域内の各画素の微分方向を定められた段階に量子化し、微分の強さを段階毎に累積加算した微分方向ヒストグラムを特徴として利用する。
【００２８】
（２）全対象の画像（異なるフォント、文字領域の異なる濃度、幾何変形などを含む）の特徴ベクトルに対して主成分分析を行って特徴ベクトルを圧縮し、次に対象毎に主成分分析を行ってさらに圧縮し、該圧縮空間にて判定を行う。
【００２９】
（３）各対象のテンプレートをあらかじめ定められた注目点を中心としてあらかじめ定められた大きさの画像であるサブテンプレート画像を作成し、サブテンプレート毎にテンプレートマッチングを行い、あらかじめ想定されている座標変換の一定範囲内のパラメータの組み合わせによって変形していることを仮定し、位置座標と各々の該パラメータの組み合わせに対しての確からしさを示すテーブルを用意し、パラメータの組み合わせが正しいことを前提としたときのオブジェクトの位置座標を算出し、該座標および座標変換パラメータで定められるテーブルの値から差し引き、最小のテーブル値を示す対象の座標位置および座標変換パラメータおよび対象カテゴリを求め、推定された座標位置および座標変換パラメータの画像を切り出し、対応するテンプレートとの距離を計算する。
【００３０】
本発明によると、上記の手段（１）〜（３）によって、以下の３つの作用を得ることができる。
【００３１】
手段（１）により、画素値の対数をとっているため、照明光の大きさの変動による影響は小さくなる。また、微分の方向は照明光による変化の影響を受けない。また、定められた領域内で微分方向のヒストグラムを計算するため、変形や位置ずれに対し変動が小さくなり、用意したテンプレートと完全に合致しない対象でも、相関値を高く保つことができる。すなわち、照明光の変化による画素値の変化を吸収し、かつ変形・位置ずれに強い特徴を用いるため、照明による変化と微小な位置ずれを考慮しなくてもよくなり、必要なテンプレートの数を大幅に削減でき、課題１を解決することができる。
【００３２】
手段（２）により、各画像を圧縮するため、照合に必要なテンプレートの数を大幅に減らすことができる。また、フォントの変形と文字領域の濃さの変化を圧縮した固有空間上で判定を行うため、従来は困難だったフォント・濃さの異なるオブジェクトに対しても、相関値を下げずに判定を行うことができ、課題２を解決することができる。
【００３３】
手段（３）により、各オブジェクトの変形が局所に注目すると変形が小さい特性を利用するため（参照：特願２００２−０４８９８１「画像処理装置およびその方法およびその方法を記録した記録媒体」）、照合に必要なテンプレートの数を大幅に減らすことができ、課題１を解決することができる。
【００３４】
【発明の実施の形態】
以下、本発明の実施の形態例について図を用いて詳細に説明する。
【００３５】
図１は、本発明を翻訳システムに適用した一実施の形態例を示す図であり、本システムにより、ユーザは、撮影した文字／記号／商標／看板の画像を基に、その文字／記号／商標／看板の翻訳情報をみることができる。ただし、文字／記号／商標／看板の翻訳辞書を作成していることが前提となる。本システムは、本発明および登録情報蓄積検索装置（翻訳装置３）から構成される。本発明は、文字／記号／商標／看板の異なるフォントの画像含んだ複数画像を用いて文字／記号／商標／看板を登録し、登録された複数の文字／記号／商標／看板を識別する。翻訳装置３は、文字／記号／商標／看板名と登録情報を蓄積しておき、文字／記号／商標／看板名から翻訳情報を検索する装置であり、一般のデータベースにより構築できるため、本実施の形態例では詳細を記載しない。
【００３６】
本発明は、学習装置１、識別装置２から構成される。学習装置１は、文字／記号／商標／看板までの距離が一定である複数視点画像から識別に必要な情報を求め、図略の蓄積装置に蓄積する。識別装置２は、ユーザが入力する画像と蓄積装置に蓄積された識別に必要な情報を利用し、画像に撮影された文字／記号／商標／看板を識別する。
【００３７】
図２は、本発明の第１の実施の形態例を示した図であって、学習装置１および識別装置２の詳細を説明している。
【００３８】
学習装置１は、文字／記号／商標／看板の画像の全変化要素である全パラメータ（例えば幾何変換パラメータ、背景濃度パラメータ、文字領域濃度パラメータなど）の全組み合わせに対して、文字／記号／商標／看板の画像を変換した全パラメータ画像を作成する全パラメータ画像作成手段１１と、全パラメータ画像に対してあらかじめ定められた注目点を中心としてあらかじめ定められた大きさの画像であるサブテンプレート画像を複数個作成するサブテンプレート画像作成手段１２と、全パラメータ画像およびサブテンプレート画像から特徴を抽出した全体特徴ベクトルおよびサブテンプレート特徴ベクトルを計算する特徴抽出手段１３と、１特徴ベクトルを１サンプルとみなして全体特徴ベクトルに主成分分析を行い全体圧縮空間および全体圧縮特徴ベクトルを計算し、かつ、サブテンプレート特徴ベクトル群毎に主成分分析を行いサブテンプレート圧縮空間およびサブテンプレート圧縮特徴ベクトルを計算する圧縮手段１４と、全体圧縮特徴ベクトルを対象毎に主成分分析を行って全体部分空間を計算し、かつ、サブテンプレート圧縮特徴ベクトルを対象毎に主成分分析を行ってサブテンプレート部分空間を計算する部分空間作成手段１５と、前記全体圧縮空間および全体部分空間およびサブテンプレート圧縮空間およびサブテンプレート部分空間を蓄積する蓄積手段と圧縮空間を蓄積する蓄積手段（以下、２つの蓄積手段を併せて蓄積手段４と呼ぶ）から構成されている。
【００３９】
識別装置２は、照合したい入力画像（識別対象の画像）の一部を選択して文字／記号／商標／看板領域を切り出す画像切り出し手段２１と、切り出された画像の特徴ベクトルを抽出する特徴抽出手段２２と、抽出された特徴ベクトルをサブテンプレート部分空間に投影したサブテンプレート圧縮特徴ベクトルを計算し、各文字／記号／商標／看板の各サブテンプレート部分空間との距離を求めるサブテンプレート部分空間距離計算手段２３と、文字／記号／商標／看板があらかじめ想定されている座標変換の一定範囲内のパラメータの組み合わせによって変形していることを仮定し、位置座標と各々の該パラメータの組み合わせに対しての確からしさを示すテーブルを用意し、パラメータの組み合わせが正しいことを前提としたときの文字／記号／商標／看板の位置座標を算出し、サブテンプレート圧縮空間距離計算手段２３にて計算した圧縮空間までの距離を、該座標および座標変換パラメータで定められるテーブルの値から差し引く投票手段２４と、定められた閾値以下のテーブル値を示す文字／記号／商標／看板の座標位置および座標変換パラメータおよび対象を求めるパラメータ推定手段２５と、推定された座標位置および座標変換パラメータの画陵を切り出し、対応する全体部分空間との距離を計算する全体部分空間空間距離計算手段２６と、定められた閾値以下の距離を示した全体画像のパラメータを識別結果として出力する出力手段２７より構成される。
【００４０】
図３は、本発明の第２の実施の形態例を説明した図である。以下、図を利用しながら各手段について詳細に説明する。
【００４１】
以下の例では、「電」、「信」、「話」の３種類の文字を登録／識別する場合を例に説明する。ただし、本例は、文字を３種類に限定するものではなく、何種類にでも拡張可能である。また、以後の表現では、画像の名前をＩとすると、各画素値をＩ（ｘ，ｙ）で表現する。
【００４２】
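First, in the learning device 1, the all-parameter image creating means 11 of the first embodiment consists of a deformed whole image creating means 11a, which creates deformed whole images by transforming the images of the character/symbol/trademark/signboard in the plurality of images according to each combination of coordinate transformation parameters assumed in advance, and a different-density image creating means 11b, which creates different-density images by converting the pixel values of the character/symbol/trademark/signboard region of each deformed whole image to specified pixel values.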
【００４３】
ここで、複数画像とは、パラメータ化が困難な、各文字／記号／商標／看板の様々な種類の画像群のことである。「電」の例の一部を図４に示す。以後、「電」の複数画像数をＳ（電）、「信」の複数画像数をＳ（信）、「話」の複数画像数をＳ（話）のように表す。変形画像作成手段１１ａでは、複数画像を入力とし、定められた幾何変換に基づき、画像を変形させた変形全体画像を作成する。「電」をアフィン変換に基づいて変形した変形全体画像の一部の例を図５に示す。以後、変形全体画像数をＨ（電）のように表す。
【００４４】
異濃度画像作成手段１１ｂでは、各変形全体画像に対し、文字領域の濃度を変換する。「電」の変形全体画像の一部の文字領域を変換した異濃度画像の例を図６に示す。以後、異濃度画像数をＪ（電）のように表す。
【００４５】
サブテンプレート作成手段１２は、全パラメータ画像に対してあらかじめ定められた注目点を中心としてあらかじめ定められた大きさの画像であるサブテンプレート画像を複数個作成する。図７では、「電」を単純に９分割し、各分割画像の中心を注目点としている例を示している。以後、この分割数をＴと表す。
【００４６】
また、第１の実施の形態例における特徴抽出手段１３は、各異濃度画像およびサブテンプレート画像の各画素値を光エネルギーに変換し光エネルギーに対して対数変換をする対数変換手段１３ａと、各異濃度画像およびサブテンプレート画像の横方向の微分と縦方向の微分成分を計算して微分の方向と強さを計算する微分強度方向計算手段１３ｂと、各異濃度画像およびサブテンプレート画像に対し、定められた領域内の各画素の微分方向を定められた段階に量子化し、微分の強さを段階毎に累積加算した微分方向ヒストグラムを作成する微分方向ヒストグラム化手段１３ｃにより構成される。
【００４７】
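The logarithmic conversion means 13a converts each pixel of the input image to light energy and then applies a logarithmic conversion to that energy. With F the characteristic function of the CCD and v the light-energy value, an ordinary pixel value is given by pixel value = F(v), so the energy is recovered as v = F^-1(pixel value). If F is unknown, v is set equal to the pixel value. The logarithmic conversion replaces each value by, for example, log_10(1 + v(x, y)).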
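As a minimal sketch of the conversion described above, the following function applies log_10(1 + v) to every pixel. The function names and the optional `inverse_response` argument (a stand-in for F^-1) are assumptions made for illustration; they do not come from the patent.

```python
import math

def to_light_energy(pixel, inverse_response=None):
    """Recover light energy v = F^-1(pixel).

    inverse_response is a hypothetical callable standing in for F^-1.
    When F is unknown, the pixel value itself is used as v.
    """
    return pixel if inverse_response is None else inverse_response(pixel)

def log_transform(image, inverse_response=None):
    """Apply v -> log10(1 + v) to every pixel of a 2-D list image."""
    return [[math.log10(1.0 + to_light_energy(p, inverse_response)) for p in row]
            for row in image]
```

Taking the logarithm turns a multiplicative change in illumination into an additive offset, which is why the later differentiation step is less sensitive to lighting.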
【００４８】
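Fig. 8 shows an example of the differential intensity/direction calculation means 13b. Take the horizontal direction of the original image I as the x axis and the vertical direction as the y axis. The image is X pixels wide by Y pixels high, so the image size is X × Y. First, the Sobel operator is applied to the original image to generate an x-direction differential image Dx, which holds the derivative in the x direction, and a y-direction differential image Dy, which holds the derivative in the y direction. With the Sobel operator, pixel values are obtained according to the following equations.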
【００４９】
【数１】

【００５０】
The Sobel operator is only one example; other methods may be used.
【００５１】
Next, each pixel of the differential intensity image Di and the differential direction image Dd is obtained as follows.
【００５２】
【数２】
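The equation images for the Sobel step and for Di and Dd are not reproduced in this text, so the following is only a sketch under common conventions, not the patent's own formulas: the standard 3×3 Sobel kernels, Di as the gradient magnitude, and Dd as the angle given by `atan2`. Border handling (left at zero) is also an assumption.

```python
import math

def sobel_gradients(img):
    """Return (Dx, Dy) for a 2-D list image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    dx = [[0.0] * w for _ in range(h)]
    dy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Right column minus left column, centre row weighted by 2.
            dx[y][x] = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                        - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            # Lower row minus upper row, centre column weighted by 2.
            dy[y][x] = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                        - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
    return dx, dy

def intensity_and_direction(dx, dy):
    """Return (Di, Dd): gradient magnitude and angle in radians."""
    di = [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(dx, dy)]
    dd = [[math.atan2(b, a) for a, b in zip(rx, ry)] for rx, ry in zip(dx, dy)]
    return di, dd
```

For an image with a single vertical edge, Dy is zero at interior pixels and the direction is 0 radians, pointing along the x axis.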

【００５３】
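Fig. 9 shows an example of the differential direction histogram means 13c. In this example, the image is divided into 4 (n) regions, and within each region the direction is quantized into 5 (m) levels to build a histogram, so the feature is represented by a 20 (n × m)-dimensional vector. Hereinafter this is called the feature vector.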
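The histogram step can be sketched as follows. The 2×2 layout of the four regions and the bin range of [-pi, pi) are assumptions; the text fixes only n = 4 regions, m = 5 levels, and that the differential intensity is accumulated per level.

```python
import math

def direction_histogram(di, dd, grid=(2, 2), bins=5):
    """Build an (n*m)-dimensional feature: n = grid cells, m = direction bins.

    di: differential intensity image, dd: differential direction image (radians).
    The 2x2 cell layout and the [-pi, pi) bin range are assumptions.
    """
    h, w = len(di), len(di[0])
    rows, cols = grid
    feature = [0.0] * (rows * cols * bins)
    for y in range(h):
        for x in range(w):
            cell = (y * rows // h) * cols + (x * cols // w)
            # Map the angle to a bin index in 0..bins-1.
            t = (dd[y][x] + math.pi) / (2 * math.pi)
            b = min(int(t * bins), bins - 1)
            # Accumulate the intensity, not a simple count.
            feature[cell * bins + b] += di[y][x]
    return feature
```

Because each pixel contributes to a coarse cell and a coarse direction bin, small shifts or deformations move little weight between entries, which matches the robustness argument made for this feature.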
【００５４】
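The compression means 14 of the first embodiment consists of a principal component analysis means 14a, which treats each feature vector as one sample, performs principal component analysis on the whole feature vectors to calculate the whole compressed space and whole compressed feature vectors, and performs principal component analysis on each group of sub-template feature vectors to calculate the sub-template compressed spaces and sub-template compressed feature vectors. This compresses each 20 (n × m)-dimensional vector to M dimensions. The principal component analysis means 14a performs principal component analysis on a set of input vectors and calculates the eigenvectors and contribution ratios, in the following steps.
(1) Obtain the covariance matrix of the input vectors.
(2) Obtain the eigenvalues (contribution ratios) and eigenvectors of the covariance matrix.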
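The two steps above can be sketched in plain Python. This is an illustration only: power iteration is swapped in for a full eigen-solver and returns just the leading eigenpair, and dividing by the number of samples (rather than by one less) is an assumption.

```python
def covariance_matrix(samples):
    """Step (1): covariance matrix of a list of equal-length feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    centered = [[s[i] - mean[i] for i in range(d)] for s in samples]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=200):
    """Step (2), leading pair only: power iteration on a symmetric matrix.

    The start vector must not be orthogonal to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0 / d ** 0.5] * d
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
        value = norm  # converges to the largest eigenvalue
    return value, v
```

In practice a library eigen-solver would return all eigenpairs at once; the contribution ratio of each eigenvector is its eigenvalue divided by the sum of all eigenvalues.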
【００５５】
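The subspace creating means 15 of the first embodiment consists of an eigenspace constructing means 15a, which performs principal component analysis for each target and for each sub-template of each target and constructs an eigenspace in which the sum of the contribution ratios is at or above a fixed value, and a correspondence table creating means 15b, which creates a parameter correspondence table listing the coordinates obtained by projecting each all-parameter image onto the corresponding eigenspace together with the parameters of the whole image (deformation parameters and density parameters).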
【００５６】
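The eigenspace constructing means 15a performs principal component analysis on the whole compressed feature vectors (M dimensions) or the sub-template compressed feature vectors, calculates the eigenvectors and contribution ratios, and constructs the eigenspace from the eigenvectors and their contribution ratios.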
【００５７】
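Fig. 10 shows an example of constructing eigenspaces in which the sum of the contribution ratios is at or above a fixed value. For example, if principal component analysis of the compressed feature vectors of「電」shows that the contribution ratios of 10 eigenvectors sum to 80% or more, the eigenspace is constructed from the 10 eigenvectors with the highest contribution ratios. Likewise, if 12 eigenvectors reach 80% for「信」, its eigenspace is constructed from the top 12 eigenvectors, and if 9 eigenvectors reach 80% for「話」, its eigenspace is constructed from the top 9. These eigenspaces are called whole subspaces. The figure draws the original space in three dimensions, but it is actually an m × n-dimensional space; the eigenspaces are drawn as two-dimensional planes, but they are actually 10-, 12-, and 9-dimensional spaces. Fig. 11 shows an example of the parameter correspondence table. In this example, the table lists, for each all-parameter image, the font name, the affine parameters, the character-region density, and the coordinates of the image in the eigenspace. The sub-template feature vectors are compressed in the same way, and the resulting eigenspaces are called sub-template subspaces.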
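The rule for choosing the eigenspace dimension can be sketched as below. The function name is hypothetical, and the 0.8 default mirrors the 80% figure used in the example; the contribution ratio is taken to be each eigenvalue divided by the sum of all eigenvalues.

```python
def eigenspace_dimension(eigenvalues, threshold=0.8):
    """Smallest number of leading eigenvectors whose contribution
    ratios sum to at least `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)
```

Each target gets its own dimension this way, which is how「電」,「信」, and「話」end up with 10, 12, and 9 eigenvectors in the example.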
【００５８】
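The storage means 4 stores the constructed eigenspaces (whole subspaces and sub-template subspaces). Fig. 12 shows the format in which the eigenspaces are stored; there is one entry per character/symbol/trademark/signboard. Each entry holds the name of the character/symbol/trademark/signboard, the number of eigenvectors, and each eigenvector (M dimensions).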
【００５９】
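Next, in the identification device 2, the image cutting-out means 21 cuts out a region of a specified size (the same size as the images in the learning stage: X × Y) centered on a pixel of the image to be identified.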
【００６０】
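The feature extraction means 22 of the first embodiment consists of a logarithmic conversion means 22a, a differential intensity/direction calculation means 22b, and a differential direction histogram means 22c, which work in the same way as in the learning stage (learning device) described above.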
【００６１】
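The sub-template subspace distance calculating means 23 obtains the distance between the calculated compressed feature vector E and each sub-template subspace of each character/symbol/trademark/signboard. Fig. 13 shows the distances between E and「電」,「信」, and「話」.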
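One common way to compute such a distance is the length of the residual left after projecting the vector onto the subspace. The sketch below assumes the subspace is given by a mean vector and a list of orthonormal eigenvectors; the text does not say whether a mean is subtracted, so that part is an assumption.

```python
def distance_to_subspace(vector, mean, basis):
    """Distance from `vector` to the affine subspace mean + span(basis).

    `basis` is assumed to be a list of orthonormal eigenvectors.
    """
    centered = [v - m for v, m in zip(vector, mean)]
    residual_sq = sum(c * c for c in centered)
    for e in basis:
        coeff = sum(c * x for c, x in zip(centered, e))
        residual_sq -= coeff * coeff
    # Guard against tiny negative values caused by rounding.
    return max(residual_sq, 0.0) ** 0.5
```

A vector that lies inside the subspace gives a distance of 0, so smaller values mean a better match to that target.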
【００６２】
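The voting means 24 assumes that the character/symbol/trademark/signboard has been deformed by some combination of parameters within a fixed range of a coordinate transformation assumed in advance. It prepares a table, such as the one in Fig. 14, that indicates the likelihood of each combination of position coordinates and parameters; calculates the position coordinates of the character/symbol/trademark/signboard on the premise that the parameter combination is correct; and adds the distance to the compressed space, calculated by the sub-template subspace distance calculating means 23, to the table value specified by those coordinates and coordinate transformation parameters.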
【００６３】
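With the affine transformation matrix defined as
【００６４】
【数３】

【００６５】
Fig. 15 shows the positions at which each sub-template casts its vote when the affine parameters (a, b, c, d) are (1, 1, 1, 1); the votes are cast at the center position of the character. This means casts votes for every combination of affine parameters.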
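The voting and threshold steps can be sketched as follows for one target and one affine hypothesis. Several details are assumptions, because the matrix in the equation above is not reproduced in this text: the parameters are taken to form the 2×2 matrix [[a, b], [c, d]], each sub-template votes for the center implied by its transformed offset, and only centers that received a vote from every sub-template are compared with the threshold. All names are hypothetical.

```python
from collections import defaultdict

def cast_votes(detections, offsets, affine):
    """Accumulate sub-template distances at the object center each one implies.

    detections: list of (sub_index, (x, y), distance).
    offsets[sub_index]: offset of that sub-template from the object center
    in the undeformed template. affine = (a, b, c, d), assumed [[a, b], [c, d]].
    """
    a, b, c, d = affine
    table = defaultdict(lambda: [0.0, 0])  # center -> [distance sum, vote count]
    for sub_index, (x, y), distance in detections:
        ox, oy = offsets[sub_index]
        center = (round(x - (a * ox + b * oy)), round(y - (c * ox + d * oy)))
        table[center][0] += distance
        table[center][1] += 1
    return table

def estimate(table, n_sub, threshold):
    """Centers voted for by every sub-template with summed distance <= threshold."""
    return [center for center, (total, count) in table.items()
            if count == n_sub and total <= threshold]
```

Running `cast_votes` once per affine hypothesis and per target, and keeping the hypothesis alongside each surviving center, gives the candidate list that the later whole-subspace check refines.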
【００６６】
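The parameter estimation means 25 obtains, from the likelihood table, the coordinate position, coordinate transformation parameters, and target of each character/symbol/trademark/signboard whose table value is at or below a specified threshold.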
【００６７】
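The whole subspace distance calculating means 26 of the first embodiment can be implemented by a parameter distance calculating means 26a, which cuts out the image at the estimated coordinate position and coordinate transformation parameters and calculates the distance to the coordinates listed in the parameter correspondence table. For example, suppose the cut-out image is projected onto the whole subspace of「電」and yields (5, 5, 6, 9, 10, 4, 1, 7, 2, 5). According to the parameter correspondence table in Fig. 11, this matches the coordinates of「電」with font name Mincho, affine parameters (2, 5, 6, 9), and density 254, so the parameter distance is 0.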
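The lookup in this example can be sketched as a nearest-row search over the parameter correspondence table. The row layout (a dict with `font`, `affine`, `density`, and `coords` keys) and the use of Euclidean distance are assumptions; the first row reuses the values from the worked example, and the second row is made up to give the search something to reject.

```python
def nearest_parameter_entry(coords, table):
    """Return (distance, row) for the table row closest to `coords`.

    Each row is a dict with hypothetical keys:
    'font', 'affine', 'density', 'coords'.
    """
    def dist(row):
        return sum((p - q) ** 2 for p, q in zip(coords, row["coords"])) ** 0.5
    best = min(table, key=dist)
    return dist(best), best

# Illustrative table for one target; only the first row comes from the text.
table = [
    {"font": "Mincho", "affine": (2, 5, 6, 9), "density": 254,
     "coords": (5, 5, 6, 9, 10, 4, 1, 7, 2, 5)},
    {"font": "Gothic", "affine": (1, 0, 0, 1), "density": 128,
     "coords": (0, 1, 0, 1, 0, 1, 0, 1, 0, 1)},
]
distance, row = nearest_parameter_entry((5, 5, 6, 9, 10, 4, 1, 7, 2, 5), table)
```

Unlike the distance to the whole subspace, this variant also returns which font, deformation, and density the input most resembles.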
【００６８】
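The output means 27 outputs, as the identification result, each character/symbol/trademark/signboard whose distance to the coordinates in the eigenspace is at or below a specified threshold, together with the parameters at that time. In the example, the result is「電」with affine parameters (2, 5, 6, 9) and character-region density 254.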
【００６９】
図１６は、本発明の第３の実施の形態例としての学習段階を説明した図であって、学習装置の動作例を示したフローチャートである。学習装置では、以下の処理を行う。
１．登録文字／記号／商標／看板数（本例では３）分繰り返し［１］
２．文字／記号／商標／看板の変形画像数Ｈ分繰り返し［２］
３．変形画像の作成
４．繰り返し［２］終了
５．異濃度画像数Ｊ分繰り返し［３］
６．異濃度画像の作成
７．繰り返し［３］終了
８．異濃度画像数Ｊ×サブテンプレート分割数Ｔ分繰り返し［４］
９．サブテンプレート画像の作成
１０．繰り返し［４］終了
１１．サブテンプレート画像数（Ｊ×Ｔ）＋Ｊ分繰り返し［５］
１２．画素数分繰り返し［６］
１３．画素値を対数変換する
１４．繰り返し［６］終了
１５．画素数分繰り返し［７］
１６．微分強度方向の計算（例については記載済み）
１７．繰り返し［７］終了
１８．分割数（ｎ）分繰り返し［８］
１９．微分方向ヒストグラムの計算（例については記載済み）
２０．繰り返し［８］終了
２１．サブテンプレート分割数（Ｔ）＋１回繰り返し［９］
２２．サブテンプレート毎に主成分分析または全体画像の主成分分析
２３．固有空間の構成
２４．固有空間の蓄積
２５．繰り返し［９］終了
２６．繰り返し［５］終了
２７．繰り返し［１］終了
図１７は、本発明の第３の実施の形態例における識別段階を説明した図であって、識別装置の動作例を示したフローチャートである。識別装置では以下の処理を行う。
１．識別対象画像の入力
２．対象数（本例では３）分繰り返し［１］
３．入力画像の画素数分繰り返し［２］
４．領域の切り出し
５．切り出し領域画素数分繰り返し［３］
６．対数変換
７．繰り返し［３］終了
８．切り出し領域画素数分繰り返し［４］
９．微分強度方向計算
１０．繰り返し［４］終了
１１．分割数（ｎ）分繰り返し［５］
１２．微分方向ヒストグラム計算
１３．繰り返し［５］終了
１４．サブテンプレート画像数（Ｊ×Ｔ）分繰り返し［６］
１５．サブテンプレート部分空間との距離計算
１６．投票
１７．繰り返し［６］終了
１８．パラメータ推定
１９．繰り返し［２］終了
２０．繰り返し［１］終了
２１．投票値が閾値以下の（対象、画素座標）のパラメータ数分繰り返し［７］
２２．全体部分空間との距離計算
２３．繰り返し［７］終了
２４．出力
２５．終了
ただし、２２．の全体部分空間の距離計算の代わりに、パラメータ距離計算を行う場合もある。
【００７０】
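Needless to say, some or all of the functions of the units of the learning device and the identification device described with Figs. 2 and 3 can be implemented as computer programs for the respective devices and executed on computers to realize the present invention, and the learning stage and identification stage shown in Figs. 16 and 17 can each be implemented as a computer program and executed on a computer. A program for realizing these functions on a computer, or for causing a computer to execute these stages, can be recorded on a computer-readable recording medium, such as a flexible disk, MO, ROM, memory card, CD, DVD, or removable disk, and stored or distributed. The programs can also be provided over a network, such as the Internet or electronic mail. The present invention can then be carried out by installing the programs provided on a recording medium or over a network on the respective computers.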
【００７１】
なお、上記実施の形態例では、学習装置（学習段階）と識別装置（識別段階）を分けて説明したが、一体の装置（段階）として構成しても構わない。この場合には、上記それぞれのプログラムは、一体となった一つのプログラムとして構成され、一つのコンピュータで実行可能となる。
【００７２】
【発明の効果】
以上述べたように本発明によれば、画像の各画素値を光エネルギーに変換し光エネルギーに対して対数変換をし、横方向の微分と縦方向の微分成分を計算して微分の方向と強さを計算し、定められた領域内の各画素の微分方向を定められた段階に量子化し、微分の強さを段階毎に累積加算した微分方向ヒストグラムを特徴として利用するため、照明光の変化による画素値の変化を吸収し、かつ変形・位置ずれに強い特徴を用いているため、照明による変化と微小な位置ずれを考慮しなくてもよくなり、必要なテンプレートの数を大幅に削減することができる。
【００７３】
また、全対象の画像（異なるフォント、文字領域の異なる濃度、幾何変形などを含む）の特徴ベクトルに対して主成分分析を行って特徴ベクトルを圧縮し、次に対象毎に主成分分析を行ってさらに圧縮するため、照合に必要なテンプレートの数を大幅に減らすことができる。また、フォントの変形と文字領域の濃さの変化を圧縮した固有空間上で判定を行うため、従来は困難だったフォント・濃さの異なるオブジェクトに対しても、相関値を下げずに判定を行うことができる。
【００７４】
さらに、各対象のテンプレートをあらかじめ定められた注目点を中心としてあらかじめ定められた大きさの画像であるサブテンプレート画像を作成し、サブテンプレート毎にテンプレートマッチングを行い、あらかじめ想定されている座標変換の一定範囲内のパラメータの組み合わせによって変形していることを仮定し、位置座標と各々の該パラメータの組み合わせに対しての確からしさを示すテーブルを用意し、パラメータの組み合わせが正しいことを前提としたときのオブジェクトの位置座標を算出し、該座標および座標変換パラメータで定められるテーブルの値から差し引き、最小のテーブル値を示す対象の座標位置および座標変換パラメータおよび対象カテゴリを求め、推定された座標位置および座標変換パラメータの画像を切り出し、対応するテンプレートとの距離を計算するため、照合に必要なテンプレートの数を大幅に減らすことができ、課題１を解決することができる。
【図面の簡単な説明】
【図１】本発明を翻訳システムに適用した一実施の形態例を示す図である。
【図２】本発明の第１の実施の形態例を示す図であって、学習装置および識別装置の詳細を説明する図である。
【図３】本発明の第１の実施の形態例を示す図であって、図２のさらに詳細を説明する図である。
【図４】複数画像「電」の例を示す図である。
【図５】変形画像「電」の例を示す図である。
【図６】異濃度画像「電」の例を示す図である。
【図７】サブテンプレート画像「電」の例を示す図である。
【図８】本実施の形態例の微分方向計算手段の例を示す図である。
【図９】本実施の形態例の微分方向ヒストグラム化手段の例を説明する図である。
【図１０】寄与率の合計が一定以上となる固有空間の構成例を示す図である。
【図１１】パラメータ対応表の例を示す図である。
【図１２】固有空間を蓄積するフォーマットの例を示す図である。
【図１３】固有空間までの距離を示す図である。
【図１４】確からしさを表すテーブルの例を示す図である。
【図１５】本実施の形態例の投票手段を説明する図である。
【図１６】本発明の第３の実施の形態例として学習段階を説明した図であって、学習装置の動作例を示したフローチャートである。
【図１７】本発明の第３の実施の形態例における識別段階の実施の形態例を説明した図であって、識別装置の動作例を示したフローチャートである。
【符号の説明】
１…学習装置
１１…全パラメータ画像作成手段
１１ａ…変形画像作成手段
１１ｂ…異濃度画像作成手段
１２…サブテンプレート作成手段
１３…特徴抽出手段
１３ａ…対数変換手段
１３ｂ…微分強度方向計算手段
１３ｃ…微分方向ヒストグラム化手段
１４…圧縮手段
１４ａ…主成分分析手段
１５…部分空間作成手段
１５ａ…固有空間構成手段
１５ｂ…対応表作成手段
２…識別装置
２１…画像切り出し手段
２２…特徴抽出手段
２２ａ…対数変換手段
２２ｂ…微分強度方向計算手段
２２ｃ…微分方向ヒストグラム化手段
２３…サブテンプレート部分空間距離計算手段
２４…投票手段
２５…パラメータ推定手段
２６…全体部分空間距離計算手段
２６ａ…パラメータ距離計算手段
２７…出力手段
３…翻訳装置
４…蓄積手段

[0001]
[Technical field of the invention]
The present invention belongs to image identification technology, which identifies what kind of object appears in an image; one specific industrial application is, for example, an image search system.
[0002]
[Prior art]
Image recognition technology identifies which target a given region in image data belongs to. Image recognition methods are broadly divided into pattern matching methods and statistical identification methods.
[0003]
The pattern matching method detects whether the input image contains something identical or similar to a standard pattern created in advance. The standard pattern is expressed using features that characterize the pattern well. In template matching, a representative technique, the standard pattern is generally a template whose feature is a grayscale image, or a template whose feature is a differential grayscale image obtained by differentiating a grayscale image.
[0004]
In the statistical identification method, when the two images to be matched are not related by a simple translation, or when the input image is described by feature parameters, the images cannot be matched directly and the feature descriptions must be matched instead. In that case, the co-occurrence probability of the feature vector of the input image and each target is calculated, and the target that maximizes it is taken as the target to which the input image belongs. This amounts to choosing the most probable target given the feature vector of the input image.
[0005]
Non-Patent Document 1 discloses a template compression technique as a way of solving Problem 1 of the present invention, which is described later in the section [Problems to be solved by the invention]. However, that technique has restrictions; for example, it can be used only in environments where the lighting conditions can be simulated. Nor can it solve Problem 2 of the present invention.
[0006]
[Non-patent document 1]
Hiroshi Murase, S. K. Nayer, "Three-dimensional object recognition by two-dimensional matching: parametric eigenspace method", IEICE Transactions (D-II), November 1994, Vol. J77-D-II, No. 11, pp. 2179-2187
[0007]
[Problems to be solved by the invention]
However, the above-mentioned conventional technique has the following two problems.
[0008]
(1) When the object and the shooting conditions have many parameters, the amount of data grows exponentially and becomes unmanageable. For example, for an object photographed outdoors, where the lighting conditions change greatly and the appearance of the object varies greatly with the shooting position, the number of templates (the number of parameter combinations) that the pattern matching method needs becomes enormous and impractical. The statistical identification method likewise needs an enormous amount of data to calculate the co-occurrence probability for each object, which is also impractical.
[0009]
(2) If the number of templates is reduced because of problem (1), the correlation value drops for objects that do not exactly match the prepared templates, and they cannot be recognized. For example, if only Gothic-typeface characters are registered as templates, the correlation value drops for characters in regular script (kaisho). Similarly, between dark and light characters, the correlation value may drop depending on the background color or pattern.
[0010]
The present invention takes solving problem (1) of the conventional technology as Problem 1, and solving problem (2) of the conventional technology as Problem 2.
[0011]
[Means for Solving the Problems]
To solve the above problems, the present invention provides, as a means of solution, an object learning device, being the learning device in an apparatus that comprises a learning device for registering objects using a plurality of images of each object and an identification device for identifying the registered objects in an input image. The object learning device comprises: all-parameter image creating means for creating, from the plurality of images, all-parameter images in which the image of the object is converted according to each combination of parameters assumed in advance; sub-template image creating means for creating, from each all-parameter image, a plurality of sub-template images, each an image of a predetermined size centered on a predetermined point of interest; feature extraction means for calculating whole feature vectors and sub-template feature vectors, which are features extracted from the all-parameter images and the sub-template images; compression means for treating each feature vector as one sample, performing principal component analysis on the whole feature vectors to calculate a whole compressed space and whole compressed feature vectors, and performing principal component analysis on each group of sub-template feature vectors to calculate sub-template compressed spaces and sub-template compressed feature vectors; subspace creating means for performing principal component analysis on the whole compressed feature vectors for each target to calculate whole subspaces, and on the sub-template compressed feature vectors for each target to calculate sub-template subspaces; and storage means for registering and storing the whole compressed space, the whole subspaces, the sub-template compressed spaces, and the sub-template subspaces.
[0012]
Alternatively, the means of solution is an object identification device, being the identification device in an apparatus that comprises a learning device for registering objects using a plurality of images of each object and an identification device for identifying the registered objects in an input image. The object identification device comprises: image cutting-out means for selecting a part of the input image to be matched and cutting out an object region; feature extraction means for extracting a feature vector from the cut-out image; sub-template subspace distance calculating means for calculating a sub-template compressed feature vector by projecting the feature vector onto a sub-template subspace, and obtaining the distance to each sub-template subspace of each object; voting means for assuming that the object has been deformed by a combination of parameters within a fixed range of a coordinate transformation assumed in advance, preparing a table that indicates the likelihood of each combination of position coordinates and parameters, calculating the position coordinates of the object on the premise that the parameter combination is correct, and adding the distance to the compressed space calculated by the sub-template subspace distance calculating means to the table value specified by those coordinates and coordinate transformation parameters; parameter estimation means for estimating the coordinate position, coordinate transformation parameters, and target of each object whose table value is at or below a specified threshold; whole subspace distance calculating means for cutting out the image at the estimated coordinate position and coordinate transformation parameters and calculating its distance to the corresponding whole subspace; and output means for outputting, as the identification result, the parameters of each whole image whose distance is at or below a specified threshold.
[0013]
Alternatively, the means of solution is the above object learning device in which the all-parameter image creating means comprises: deformed whole image creating means for creating deformed whole images by transforming the image of the object in the plurality of images according to each combination of coordinate transformation parameters assumed in advance; and different-density image creating means for creating different-density images by converting the pixel values of the object region of each deformed whole image to specified pixel values.
[0014]
Alternatively, the means of solution is the above object learning device in which the feature extraction means comprises: logarithmic conversion means for converting each pixel value of each different-density image and sub-template image into light energy and applying a logarithmic conversion to the light energy; differential intensity/direction calculation means for calculating the horizontal and vertical differential components of each different-density image and sub-template image and, from them, the direction and intensity of the differential; and differential direction histogram means for quantizing, for each different-density image and sub-template image, the differential direction of each pixel within a specified region into a specified number of levels and creating a differential direction histogram by accumulating the differential intensity for each level.
[0015]
Alternatively, the means of solution is the above object learning device in which the subspace creating means comprises: eigenspace constructing means for constructing an eigenspace in which the sum of the contribution ratios is at or above a fixed value; and correspondence table creating means for creating a parameter correspondence table that lists the coordinates obtained by projecting each whole feature vector onto the eigenspace together with the deformation parameters and density parameters, which are the parameters of that whole feature vector.
[0016]
Alternatively, the means of solution is the above object identification device in which the whole subspace distance calculating means has parameter distance calculating means that, instead of calculating the distance to the corresponding whole subspace, calculates the distance to the coordinates listed in the parameter correspondence table.
[0017]
Alternatively, the means of solution is an object learning method in a method that has a learning stage for registering objects using a plurality of images of each object and an identification stage for identifying the registered objects in an input image. The learning stage comprises: an all-parameter image creating step of creating, from the plurality of images, all-parameter images in which the image of the object is converted according to each combination of parameters assumed in advance; a sub-template image creating step of creating, from each all-parameter image, a plurality of sub-template images, each an image of a predetermined size centered on a predetermined point of interest; a feature extraction step of extracting whole feature vectors and sub-template feature vectors from the all-parameter images and the sub-template images; a compression step of treating each feature vector as one sample, performing principal component analysis on the whole feature vectors to calculate a whole compressed space and whole compressed feature vectors, and performing principal component analysis on each group of sub-template feature vectors to calculate sub-template compressed spaces and sub-template compressed feature vectors; a subspace creating step of performing principal component analysis on the whole compressed feature vectors for each target to calculate whole subspaces, and on the sub-template compressed feature vectors for each target to calculate sub-template subspaces; and a storage step of registering and storing the whole compressed space, the whole subspaces, the sub-template compressed spaces, and the sub-template subspaces.
[0018]
Alternatively, the means of solution is an object identification method in a method that has a learning stage for registering objects using a plurality of images of each object and an identification stage for identifying the registered objects. The identification stage comprises: an image cutting-out step of selecting a part of the input image to be matched and cutting out an object region; a feature extraction step of extracting a feature vector from the cut-out image; a sub-template subspace distance calculating step of calculating a sub-template compressed feature vector by projecting the feature vector onto a sub-template subspace, and obtaining the distance to each sub-template subspace of each object; a voting step of assuming that the object has been deformed by a combination of parameters within a fixed range of a coordinate transformation assumed in advance, preparing a table that indicates the likelihood of each combination of position coordinates and parameters, calculating the position coordinates of the object on the premise that the parameter combination is correct, and adding the distance to the compressed space calculated in the sub-template subspace distance calculating step to the table value specified by those coordinates and coordinate transformation parameters; a parameter estimation step of estimating the coordinate position, coordinate transformation parameters, and target of each object whose table value is at or below a specified threshold; a whole subspace distance calculating step of cutting out the image at the estimated coordinate position and coordinate transformation parameters and calculating its distance to the corresponding whole subspace; and an output step of outputting, as the identification result, the parameters of each whole image whose distance is at or below a specified threshold.
[0019]
Alternatively, in the above object learning method, the all-parameter image creating step comprises: a deformed whole image creating step of creating, for each combination of coordinate conversion parameters assumed in advance, deformed whole images obtained by converting the plurality of images of the object according to the parameters; and a different-density image creating step of creating, for each deformed whole image, different-density images in which the pixel values of the object area are converted into predetermined pixel values. An object learning method characterized by this is a means for solving the problem.
[0020]
Alternatively, in the above object learning method, the feature extraction step comprises: a logarithmic conversion step of converting each pixel value of each different-density image and sub-template image into light energy and performing logarithmic conversion on the light energy; a differential intensity direction calculation step of calculating the horizontal and vertical differential components of each different-density image and sub-template image to obtain the direction and intensity of the differential; and a differential direction histogram forming step of, for each different-density image and sub-template image, quantizing the differential direction of each pixel in a predetermined region into a predetermined number of stages and creating a differential direction histogram by cumulatively adding the differential intensity for each stage. An object learning method characterized by this is a means for solving the problem.
[0021]
Alternatively, in the above object learning method, the subspace creation step comprises: an eigenspace configuration step of configuring an eigenspace in which the sum of the contribution ratios is equal to or more than a predetermined value; and a correspondence table creation step of creating a parameter correspondence table in which the deformation parameters and density parameters, which are the parameters of each feature vector, are described. An object learning method characterized by this is a means for solving the problem.
[0022]
Alternatively, in the above object identification method, the whole subspace distance calculation step has, instead of calculating the distance to the corresponding whole subspace, a parameter distance calculation step of calculating the distance to the coordinates described in the parameter correspondence table. An object identification method characterized by this is a means for solving the problem.
[0023]
Alternatively, an object learning program characterized by being a program for causing a computer to execute the steps of the above object learning method is a means for solving the problem.
[0024]
Alternatively, an object identification program characterized by being a program for causing a computer to execute the steps of the above object identification method is a means for solving the problem.
[0025]
Alternatively, a recording medium on which an object learning program is recorded, characterized in that the steps of the above object learning method are recorded on a computer-readable recording medium as a program for causing a computer to execute them, is a means for solving the problem.
[0026]
Alternatively, a recording medium on which an object identification program is recorded, characterized in that the steps of the above object identification method are recorded on a computer-readable recording medium as a program for causing a computer to execute them, is a means for solving the problem.
[0027]
In the present invention,
(1) Each pixel value of the image is converted into light energy, logarithmic conversion is performed on the light energy, and the horizontal and vertical differential components are calculated to obtain the direction and intensity of the differential. The differential direction of each pixel in a predetermined region is quantized into a predetermined number of stages, and the differential direction histogram obtained by cumulatively adding the differential intensity for each stage is used as the feature.
[0028]
(2) Principal component analysis is performed on the feature vectors of all target images (including different fonts, different densities of character regions, geometric deformation, etc.) to compress the feature vectors, principal component analysis is then performed for each target to compress them further, and the determination is made in the compressed space.
[0029]
(3) For each target template, sub-template images, each an image of a predetermined size centered on a predetermined point of interest, are created, and template matching is performed for each sub-template. Assuming that the target is deformed by a combination of parameters within a certain range of coordinate conversion assumed in advance, a table showing the likelihood for the position coordinates and each combination of the parameters is prepared. The position coordinates of the object on the assumption that the combination of parameters is correct are calculated, and the distance obtained by the matching is subtracted from the value of the table defined by the coordinates and the coordinate conversion parameters. The target coordinate position, coordinate conversion parameters, and target category showing the minimum table value are then obtained, an image is cut out at the estimated coordinate position and coordinate conversion parameters, and its distance to the corresponding template is calculated.
[0030]
According to the present invention, the following three functions can be obtained by the above means (1) to (3).
[0031]
Since means (1) takes the logarithm of the pixel value, the influence of variation in the intensity of the illumination light is reduced. Further, the direction of the differential is not affected by changes in the illumination light. In addition, since the differential direction histogram is calculated within a predetermined region, its variation under deformation and displacement is small, and the correlation value can be kept high even for a target that does not completely match the prepared template. In other words, a feature that absorbs changes in pixel values due to changes in illumination light and is robust against deformation and displacement is used, making it unnecessary to consider changes due to illumination and small displacements. This can significantly reduce the number of required templates and solves Problem 1.
[0032]
By means (2), since each image is compressed, the number of templates required for collation can be significantly reduced. In addition, since the judgment is performed in an eigenspace that compresses font variation and changes in the density of the character area, objects with different fonts and densities can be judged without lowering the correlation value, which was difficult in the past, and Problem 2 can be solved.
[0033]
Means (3) uses the characteristic that the deformation of each object is small when attention is focused on a local part (see Japanese Patent Application No. 2002-048981, “Image processing apparatus and its method and recording medium on which the method is recorded”). This can significantly reduce the number of templates required for collation and solves Problem 1.
[0034]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0035]
FIG. 1 is a diagram showing an embodiment in which the present invention is applied to a translation system. With this system, a user can view translation information for a character / symbol / trademark / signboard based on a photographed image of that character / symbol / trademark / signboard. It is assumed that a translation dictionary of characters / symbols / trademarks / signboards has already been created. This system comprises the present invention and a registered information storage and retrieval device (translation device 3). The present invention registers each character / symbol / trademark / signboard using a plurality of images, including images of it in different fonts, and identifies the registered plurality of characters / symbols / trademarks / signboards. The translation device 3 is a device that stores the names and registration information of characters / symbols / trademarks / signboards and searches for translation information from a name. It can be constructed using a general database, so its details are not described in this embodiment.
[0036]
The present invention includes a learning device 1 and an identification device 2. The learning device 1 obtains information necessary for identification from a plurality of viewpoint images in which the distance to a character / symbol / trademark / signboard is constant, and stores the information in a storage device (not shown). The identification device 2 identifies characters / symbols / trademarks / signboards captured in the image by using an image input by the user and information necessary for identification stored in the storage device.
[0037]
FIG. 2 is a diagram illustrating the first embodiment of the present invention, and illustrates details of the learning device 1 and the identification device 2.
[0038]
The learning device 1 comprises: an all-parameter image creating means 11 for creating all-parameter images obtained by converting the image of a character / symbol / trademark / signboard for every combination of all parameters (for example, geometric conversion parameters, background density parameters, character area density parameters, etc.) that are the change elements of the character / symbol / trademark / signboard image; a sub-template image creating means 12 for creating, for each all-parameter image, a plurality of sub-template images, each an image of a predetermined size centered on a predetermined point of interest; a feature extracting means 13 for calculating overall feature vectors and sub-template feature vectors by extracting features from the all-parameter images and the sub-template images; a compression means 14 for performing principal component analysis on the overall feature vectors, treating one feature vector as one sample, to calculate an overall compressed space and overall compressed feature vectors, and performing principal component analysis on each group of sub-template feature vectors to calculate a sub-template compressed space and sub-template compressed feature vectors; a subspace creating means 15 for performing principal component analysis on the overall compressed feature vectors for each object to calculate a whole subspace, and performing principal component analysis on the sub-template compressed feature vectors for each object to calculate a sub-template subspace; and a storage means for storing the sub-template compressed space and the sub-template subspace and a storage means for storing the compressed space (hereinafter, the two storage means are collectively referred to as storage means 4).
[0039]
The identification device 2 comprises: an image extraction means 21 for selecting a part of the input image to be collated (the image to be identified) and extracting a character / symbol / trademark / signboard area; a feature extraction means 22 for extracting a feature vector of the extracted image; a sub-template subspace distance calculating means 23 for projecting the extracted feature vector onto the sub-template subspace to calculate a sub-template compressed feature vector and calculating the distance to each sub-template subspace of each character / symbol / trademark / signboard; a voting means 24 for, assuming that the character / symbol / trademark / signboard is deformed by a combination of parameters within a predetermined range of coordinate conversion assumed in advance, preparing a table showing the likelihood for the position coordinates and each combination of the parameters, calculating the position coordinates of the character / symbol / trademark / signboard on the assumption that the combination of parameters is correct, and subtracting the distance calculated by the sub-template subspace distance calculating means 23 from the value of the table defined by the coordinates and the coordinate conversion parameters; a parameter estimating means 25 for obtaining the coordinate position, the coordinate conversion parameters, and the object of the character / symbol / trademark / signboard showing a table value equal to or less than a predetermined threshold value; a whole subspace distance calculating means 26 for cutting out an image at the estimated coordinate position and coordinate conversion parameters and calculating the distance to the corresponding whole subspace; and an output means 27 for outputting, as an identification result, the parameters of the whole image showing a distance equal to or less than a predetermined threshold value.
[0040]
FIG. 3 is a diagram illustrating a second embodiment of the present invention. Hereinafter, each means will be described in detail with reference to the drawings.
[0041]
In the following, the case where three types of characters, “den”, “shin”, and “talk”, are registered and identified is described as an example. However, the method is not limited to three types of characters and can be extended to any number of types. In the following expressions, if the name of an image is I, each pixel value is represented by I (x, y).
[0042]
First, in the learning device 1, the all-parameter image creating means 11 of the first embodiment consists of: a deformed whole image creating means 11a for creating, for each combination of coordinate conversion parameters assumed in advance, deformed whole images obtained by converting the plurality of images of the character / symbol / trademark / signboard according to the parameters; and a different-density image creating means 11b for creating, for each deformed whole image, different-density images in which the pixel values of the character / symbol / trademark / signboard area are converted into predetermined pixel values.
[0043]
Here, the plurality of images is a group of images of each character / symbol / trademark / signboard covering variations that are difficult to parameterize. FIG. 4 shows part of an example for “den”. Hereinafter, the number of images in the plurality of images of “den” is expressed as S (den), that of “shin” as S (shin), and that of “talk” as S (talk). The deformed whole image creating means 11a receives the plurality of images as input and creates deformed whole images by deforming each image based on a predetermined geometric transformation. FIG. 5 shows part of an example of the deformed whole images obtained by deforming “den” based on the affine transformation. Hereinafter, the total number of deformed images is represented as H (den).
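The deformation step can be sketched as follows. This is only a minimal sketch under stated assumptions: the function name `affine_warp`, the 2 × 2 matrix layout [[a, b], [c, d]], deformation about the image center, nearest-neighbour sampling, and the white background value are illustrative choices that the specification does not fix.

```python
def affine_warp(image, a, b, c, d, background=255):
    """Deform `image` (a list of rows) about its center with [[a, b], [c, d]].

    Hypothetical helper: uses inverse mapping with nearest-neighbour sampling,
    so every output pixel looks up the source pixel it came from.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is singular")
    # Inverse of the 2x2 matrix.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = x - cx, y - cy
            sx = round(ia * u + ib * v + cx)
            sy = round(ic * u + id_ * v + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

Calling this once per assumed parameter combination for each of the S input images would yield the H deformed whole images.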
[0044]
The different-density image creating means 11b converts the density of the character area for each of the deformed whole images. FIG. 6 shows an example of different-density images obtained by converting the character region of some of the deformed whole images of “den”. Hereinafter, the number of different-density images is represented as J (den).
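As a rough illustration of the different-density conversion, the sketch below assumes that the character area is given as a boolean mask of the same size as the image. The mask representation and the name `make_density_variants` are assumptions made for this example and do not appear in the specification.

```python
def make_density_variants(image, char_mask, densities):
    """Return one copy of `image` per density value.

    In each copy, pixels where `char_mask` is True (the character area)
    are replaced by the density value; background pixels are kept as-is.
    """
    variants = []
    for density in densities:
        variant = [
            [density if char_mask[y][x] else image[y][x]
             for x in range(len(image[0]))]
            for y in range(len(image))
        ]
        variants.append(variant)
    return variants
```

Applying this to every deformed whole image gives the J different-density images for one character.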
[0045]
The sub-template creating means 12 creates a plurality of sub-template images each having an image of a predetermined size centered on a predetermined point of interest for all parameter images. FIG. 7 shows an example in which “den” is simply divided into nine parts, and the center of each divided image is set as a point of interest. Hereinafter, this division number is represented by T.
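The simple nine-way division described for FIG. 7 can be sketched as below. This assumes a 3 × 3 grid whose cell size divides the image size evenly; the function name `split_into_subtemplates` is hypothetical.

```python
def split_into_subtemplates(image, rows=3, cols=3):
    """Split `image` into rows x cols patches.

    Returns a list of (point_of_interest, patch) pairs, where the point of
    interest is the (x, y) center of the patch in whole-image coordinates.
    """
    h, w = len(image), len(image[0])
    sub_h, sub_w = h // rows, w // cols
    subtemplates = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * sub_h, c * sub_w
            patch = [row[left:left + sub_w] for row in image[top:top + sub_h]]
            center = (left + sub_w // 2, top + sub_h // 2)
            subtemplates.append((center, patch))
    return subtemplates
```

With the default grid, T = 9 sub-template images are produced per different-density image.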
[0046]
The feature extracting means 13 in the first embodiment consists of: a logarithmic conversion means 13a that converts each pixel value of each different-density image and sub-template image into light energy and performs logarithmic conversion on the light energy; a differential intensity direction calculating means 13b that calculates the horizontal and vertical differential components of each different-density image and sub-template image to obtain the direction and intensity of the differential; and a differential direction histogram forming means 13c that, for each different-density image and sub-template image, quantizes the differential direction of each pixel in a predetermined region into a predetermined number of stages and creates a differential direction histogram by cumulatively adding the differential intensity for each stage.
[0047]
The logarithmic conversion means 13a converts each pixel of the input image into light energy and performs logarithmic conversion on the light energy. The pixel value of a general image is expressed as pixel value = F (v), where F is the characteristic function of the CCD and v is the light energy value, so the light energy is obtained as v = F^-1 (pixel value). If F is unknown, v = pixel value is used. The logarithmic conversion is performed as, for example, v (x, y) = log10 (1 + v (x, y)).
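The conversion above can be written as a short sketch. The parameter name `inverse_response` stands in for F^-1 and is an assumption of this example; when it is omitted, the pixel value itself is used as the light energy, as the text describes for an unknown F.

```python
import math


def log_convert(image, inverse_response=None):
    """Apply v = F^-1(pixel) followed by log10(1 + v) to every pixel."""
    # Fall back to the identity when the CCD characteristic F is unknown.
    f_inv = inverse_response if inverse_response is not None else (lambda p: p)
    return [[math.log10(1.0 + f_inv(p)) for p in row] for row in image]
```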
[0048]
FIG. 8 shows an example of the differential intensity direction calculating means 13b. The horizontal and vertical directions of the original image I are taken as the x-axis and y-axis, respectively. The image is X pixels wide by Y pixels high, so the image size is X × Y. First, a Sobel operator is applied to the original image to generate an x-direction differential image Dx, in which the x-direction differential is calculated, and a y-direction differential image Dy, in which the y-direction differential is calculated. The Sobel operator obtains a pixel value according to the following equation.
[0049]
(Equation 1)

[0050]
However, the use of the Sobel operator is an example, and other methods may be used.
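For reference, a minimal sketch of the differential images Dx and Dy using the conventional 3 × 3 Sobel kernels is given below. The kernel signs, the downward-pointing y-axis, and leaving border pixels at zero are assumptions of this sketch; the exact form used in the specification's equation may differ.

```python
# Conventional Sobel kernels (assumed; y grows downward).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image):
    """Return (Dx, Dy) for `image`; border pixels are left at 0.0."""
    h, w = len(image), len(image[0])
    dx = [[0.0] * w for _ in range(h)]
    dy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            dx[y][x] = gx
            dy[y][x] = gy
    return dx, dy
```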
[0051]
Next, each pixel of the differential intensity image Di and the differential direction image Dd is obtained by the following means.
[0052]
(Equation 2)

[0053]
FIG. 9 shows an example of the differential direction histogram forming means 13c. In this example, a histogram is created by quantizing the direction into 5 (m) steps in each of the regions obtained by dividing the image into 4 (n), and the feature is represented by a 20 (n × m) -dimensional vector. Hereinafter, this is called a feature vector.
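The histogram feature can be sketched as follows. The intensity is taken here as the Euclidean norm of (Dx, Dy) and the direction as atan2(Dy, Dx); these are the conventional definitions and are assumed, as are the 2 × 2 region layout for n = 4 and the function name `direction_histogram`.

```python
import math


def direction_histogram(dx, dy, grid=(2, 2), bins=5):
    """Build an (n x m)-dimensional feature from differential images.

    The image is divided into grid[0] x grid[1] regions (n regions); in each
    region the direction is quantized into `bins` steps (m) and the
    differential intensity is accumulated per step.
    """
    h, w = len(dx), len(dx[0])
    rows, cols = grid
    feature = [0.0] * (rows * cols * bins)
    for y in range(h):
        for x in range(w):
            strength = math.hypot(dx[y][x], dy[y][x])
            if strength == 0.0:
                continue  # no gradient, nothing to accumulate
            angle = math.atan2(dy[y][x], dx[y][x]) % (2 * math.pi)
            step = min(int(angle / (2 * math.pi) * bins), bins - 1)
            region = (y * rows // h) * cols + (x * cols // w)
            feature[region * bins + step] += strength
    return feature
```

With the defaults, the result is the 20-dimensional feature vector described above.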
[0054]
The compressing means 14 in the first embodiment consists of a principal component analysis means 14a that performs principal component analysis on the overall feature vectors, treating one feature vector as one sample, to calculate an overall compressed space and overall compressed feature vectors, and that also performs principal component analysis on each group of sub-template feature vectors to calculate a sub-template compressed space and sub-template compressed feature vectors. As a result, the 20 (n × m) -dimensional vector is compressed to M dimensions. The principal component analysis means 14a performs principal component analysis on a plurality of input vectors and calculates eigenvectors and contribution rates. This is calculated by the following procedure.
(1) Find the covariance matrix of the input vector group.
(2) Eigenvalues (contribution rates) and eigenvectors of the covariance matrix are obtained.
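The two steps above can be sketched in plain Python. The eigen-decomposition here uses power iteration with deflation, which is a technique swapped in for illustration only (the specification does not say how the eigenvectors are computed); all function names are hypothetical, and the contribution rate is taken as each eigenvalue divided by the trace of the covariance matrix.

```python
import math


def covariance_matrix(samples):
    """Step (1): mean vector and covariance matrix of the input vectors."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    return mean, cov


def top_eigenpairs(matrix, count, iterations=500):
    """Step (2): leading (eigenvalue, eigenvector) pairs by power iteration."""
    dim = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        vec = [float(i + 1) for i in range(dim)]  # arbitrary non-zero start
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(dim))
                   for i in range(dim)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:
                break
            vec = [v / norm for v in nxt]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        value = sum(vec[i] * work[i][j] * vec[j]
                    for i in range(dim) for j in range(dim))
        pairs.append((value, vec))
        # Deflation: remove the component that was just found.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(dim)]
                for i in range(dim)]
    return pairs


def contribution_rates(eigenvalues, cov):
    """Share of the total variance (trace of `cov`) held by each eigenvalue."""
    total = sum(cov[i][i] for i in range(len(cov)))
    return [value / total for value in eigenvalues]
```

Keeping the M eigenvectors with the largest contribution rates gives the compressed space onto which each 20-dimensional feature vector is projected.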
[0055]
The subspace creating means 15 in the first embodiment consists of an eigenspace forming means 15a that performs principal component analysis for each object and for each sub-template of the object and forms an eigenspace in which the sum of the contribution rates is equal to or more than a certain value, and a correspondence table creating means 15b that creates a parameter correspondence table describing the coordinates obtained when each all-parameter image is projected onto the corresponding eigenspace together with the parameters (deformation parameters and density parameters) of that whole image.
[0056]
The eigenspace forming means 15a performs principal component analysis on the overall compressed feature vectors (M dimensions) or the sub-template compressed feature vectors, calculates the eigenvectors and contribution rates, and configures an eigenspace from the eigenvectors and their contribution rates.
[0057]
FIG. 10 is a configuration example of an eigenspace in which the sum of the contribution rates is equal to or more than a certain value. For example, the compressed feature vector of “den” is subjected to the principal component analysis, and when the sum of the contribution rates of the 10-dimensional eigenvectors is 80% or more, the eigenspace is configured by the top 10 eigenvectors of the contribution rates. In addition, the compressed feature vector of “shin” is subjected to principal component analysis, and when the total contribution rate of the 12-dimensional eigenvectors is 80% or more, an eigenspace is formed by the top 12 eigenvectors of the contribution rate. Also, the compressed feature vector of “talk” is subjected to principal component analysis, and when the sum of the contribution ratios of the nine-dimensional eigenvectors is 80% or more, the eigenspace is configured with the top nine eigenvectors of the contribution ratio. These eigenspaces are called whole subspaces. Although the original space is represented in three dimensions in this figure, it is actually an m × n-dimensional space. Although the eigenspace is represented by a two-dimensional plane, it is actually a 10, 12, or 9-dimensional space. FIG. 11 shows an example of the parameter correspondence table. In this example, the font name, the affine parameters, the character area density, and the coordinate values of the image on the eigenspace are described for all the parameter images. Similarly, the feature vector of the subtemplate is also compressed, and the created eigenspace is called a subtemplate subspace.
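The rule "keep the top eigenvectors until the summed contribution rate reaches a given value" can be sketched as below. The function name `select_eigenspace` and the default of 0.8 (taken from the 80% example above) are illustrative, and the eigenvalues are assumed to be non-negative with a non-zero sum.

```python
def select_eigenspace(eigenvalues, eigenvectors, min_contribution=0.8):
    """Keep the highest-contribution eigenvectors until the cumulative
    contribution rate is at least `min_contribution`.

    Returns (basis, cumulative_contribution).
    """
    order = sorted(range(len(eigenvalues)),
                   key=lambda i: eigenvalues[i], reverse=True)
    total = sum(eigenvalues)
    basis, cumulative = [], 0.0
    for i in order:
        basis.append(eigenvectors[i])
        cumulative += eigenvalues[i] / total
        if cumulative >= min_contribution:
            break
    return basis, cumulative
```

Because the cut-off depends on each character's own eigenvalue spectrum, the resulting dimension differs per character, as with the 10, 12, and 9 dimensions in the example.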
[0058]
The accumulating means 4 accumulates the configured eigenspaces (whole subspaces and sub-template subspaces). FIG. 12 shows a format for storing an eigenspace, in which there is an entry for each character / symbol / trademark / signboard. Each entry stores the character / symbol / trademark / signboard name, the number of eigenvectors, and each eigenvector (M dimensions).
[0059]
Next, in the identification device 2, the image cutout means 21 cuts out an area of a predetermined size (the same size as the image at the learning stage: X × Y) centering on a certain pixel of the image to be identified.
[0060]
The feature extracting means 22 in the first embodiment consists of a logarithmic conversion means 22a, a differential intensity direction calculating means 22b, and a differential direction histogram forming means 22c, each of which is the same as the corresponding means of the learning device 1.
[0061]
The sub-template subspace distance calculating means 23 calculates the distance between the calculated compressed feature vector E and each sub-template subspace of each character / symbol / trademark / signboard. FIG. 13 shows the distances between E and “den”, “shin”, and “talk”.
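One common way to compute such a distance is the length of the residual left after projecting onto the subspace, sketched below. This assumes that the subspace basis is orthonormal (as eigenvectors of a covariance matrix are) and that the subspace passes through a stored mean vector; both points, and the name `distance_to_subspace`, are assumptions rather than details given in the specification.

```python
import math


def distance_to_subspace(vector, mean, basis):
    """Distance from `vector` to the subspace through `mean` spanned by the
    orthonormal vectors in `basis`."""
    centered = [v - m for v, m in zip(vector, mean)]
    residual = centered[:]
    for axis in basis:
        coeff = sum(c * a for c, a in zip(centered, axis))
        # Subtract the part of the vector that lies along this axis.
        residual = [r - coeff * a for r, a in zip(residual, axis)]
    return math.sqrt(sum(r * r for r in residual))
```

Evaluating this against the subspace of each registered character gives one distance per character, as illustrated in FIG. 13.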
[0062]
The voting means 24 assumes that the character / symbol / trademark / signboard is deformed by a combination of parameters within a certain range of coordinate conversion assumed in advance, prepares a table as shown in FIG. 14 showing the likelihood for the position coordinates and each combination of the parameters, and calculates the position coordinates of the character / symbol / trademark / signboard on the assumption that the combination of parameters is correct. The calculated distance to the compressed space is added to the value of the table defined by the coordinates and the coordinate conversion parameters.
[0063]
FIG. 15 shows the positions at which each sub-template votes when the affine parameters (a, b, c, d) are (1, 1, 1, 1), with the affine transformation matrix defined as follows.
[0064]
[Equation 3]

[0065]
In this means, voting is performed for each affine parameter combination.
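A single vote can be sketched as follows. The table layout (a dictionary keyed by object, object position, and parameter combination), the 2 × 2 matrix convention [[a, b], [c, d]] applied to the sub-template's offset from the object origin, and the name `cast_vote` are all assumptions of this sketch; the specification's own definition of the affine parameters may differ. The distance is added to the table cell, so smaller totals indicate a more likely hypothesis.

```python
from collections import defaultdict


def cast_vote(table, obj, match_xy, offset_xy, params, distance):
    """Add `distance` to the table cell for the object position implied by
    one sub-template match under one affine parameter combination.

    match_xy  : where the sub-template was matched in the input image
    offset_xy : the sub-template's offset from the object origin (undeformed)
    params    : (a, b, c, d), assumed to form the matrix [[a, b], [c, d]]
    """
    a, b, c, d = params
    ox, oy = offset_xy
    mx, my = match_xy
    # Deform the offset, then step back from the match to the object origin.
    origin = (mx - round(a * ox + b * oy), my - round(c * ox + d * oy))
    table[(obj, origin, params)] += distance
    return origin


# Usage: two sub-templates of "den" agreeing on the same object origin.
votes = defaultdict(float)
cast_vote(votes, "den", (10, 10), (2, -3), (1, 0, 0, 1), 0.5)
cast_vote(votes, "den", (6, 16), (-2, 3), (1, 0, 0, 1), 0.25)
```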
[0066]
The parameter estimating means 25 obtains, from the likelihood table, the coordinate position, the coordinate conversion parameters, and the object of the character / symbol / trademark / signboard showing a table value equal to or less than the predetermined threshold value.
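The threshold test on the table can be sketched as below, assuming the table is a dictionary from a hypothesis key (object, coordinate position, coordinate conversion parameters) to its accumulated value. The function name `estimate_parameters` and returning the candidates best-first are choices made for this example.

```python
def estimate_parameters(table, threshold):
    """Return the hypothesis keys whose table value is <= threshold,
    ordered from the smallest value upward."""
    hits = [(value, key) for key, value in table.items() if value <= threshold]
    hits.sort(key=lambda hit: hit[0])
    return [key for _, key in hits]
```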
[0067]
The whole subspace distance calculating means 26 in the first embodiment consists of a parameter distance calculating means 26c that cuts out an image at the estimated coordinate position and coordinate conversion parameters and calculates the distance to the coordinates described in the parameter correspondence table. For example, if the cut-out image of “den” is projected onto the whole subspace and becomes (5, 5, 6, 9, 10, 4, 1, 7, 2, 5), then according to the parameter correspondence table of FIG. 11 this matches the coordinate value of “den” whose font name is Mincho, whose affine parameters are (2, 5, 6, 9), and whose density is 254, so the parameter distance is 0.
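The lookup in this example can be sketched as a nearest-entry search over the parameter correspondence table. The Mincho row uses the values quoted above; the second row is a hypothetical entry added only so that the search has something to compare against, and the dictionary layout and the name `nearest_parameters` are likewise assumptions.

```python
import math

# Rows of a parameter correspondence table for "den".
TABLE = [
    {"font": "Mincho", "affine": (2, 5, 6, 9), "density": 254,
     "coords": (5, 5, 6, 9, 10, 4, 1, 7, 2, 5)},   # values from the example
    {"font": "Gothic", "affine": (1, 0, 0, 1), "density": 128,
     "coords": (0, 0, 0, 0, 0, 0, 0, 0, 0, 0)},    # hypothetical row
]


def nearest_parameters(coords, table):
    """Return (row, distance) for the table row whose eigenspace coordinates
    are closest (Euclidean distance) to `coords`."""
    best = min(table, key=lambda row: math.dist(coords, row["coords"]))
    return best, math.dist(coords, best["coords"])
```

The row returned with a distance at or below the threshold supplies the font name, affine parameters, and density that are reported with the result.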
[0068]
The output unit 27 outputs a character / symbol / trademark / signboard whose distance from the coordinates in the eigenspace is equal to or less than a predetermined threshold value and parameters at that time as an identification result. In the example, “den”, the affine parameters (2, 5, 6, 9) at that time, and the character area density 254 are the results.
[0069]
FIG. 16 is a diagram illustrating a learning stage as a third embodiment of the present invention, and is a flowchart illustrating an operation example of the learning device. The learning device performs the following processing.
1. Repeat for registered characters / symbols / trademarks / signboards (3 in this example) [1]
2. Repeat for the number of deformed images of characters / symbols / trademarks / signboards H [2]
3. Creating a deformed image
4. Repeat [2] end
5. Repeat for the number of different density images J [3]
6. Creating different density images
7. Repeat [3] End
8. Repeat for the number of different density images J x the number of sub template divisions T [4]
9. Creating sub template images
10. Repeat [4] end
11. Repeat for the number of sub-template images (J × T) + J [5]
12. Repeat for the number of pixels [6]
13. Logarithmic conversion of pixel values
14. Repeat [6] End
15. Repeat for the number of pixels [7]
16. Calculation of differential intensity direction (example already described)
17. Repeat [7] End
18. Repeat for the number of divisions (n) [8]
19. Calculation of differential direction histogram (example already described)
20. Repeat [8] End
21. Repeat for the sub-template division number (T) + 1 [9]
22. Principal component analysis for each sub-template or principal component analysis of whole image
23. Composition of eigenspace
24. Eigenspace accumulation
25. Repeat [9] End
26. Repeat [5] end
27. Repeat [1] End
FIG. 17 is a diagram illustrating an identification step in the third embodiment of the present invention, and is a flowchart illustrating an operation example of the identification device. The identification device performs the following processing.
1. Input of identification target image
2. Repeat for the number of objects (3 in this example) [1]
3. Repeat for the number of pixels of the input image [2]
4. Extract area
5. Repeat for the number of cutout area pixels [3]
6. Logarithmic transformation
7. Repeat [3] End
8. Repeat for the number of pixels in the cutout area [4]
9. Differential intensity direction calculation
10. Repeat [4] end
11. Repeat for the number of divisions (n) [5]
12. Differential direction histogram calculation
13. Repeat [5] end
14. Repeat for the number of sub-template images (J × T) [6]
15. Distance calculation with sub-template subspace
16. Vote
17. Repeat [6] End
18. Parameter estimation
19. Repeat [2] end
20. Repeat [1] End
21. Repeat for the number of parameters whose voting value is equal to or less than the threshold (target, pixel coordinates) [7]
22. Distance calculation with whole subspace
23. Repeat [7] End
24. Output
25. End
However, in step 22, parameter distance calculation may in some cases be performed instead of the distance calculation with the whole subspace.
[0070]
Note that some or all of the functions of each means in the learning device and the identification device described with reference to FIGS. 2 and 3 can be configured as a computer program for each device, and the present invention can be realized by executing the programs on computers. Likewise, the learning stage and the identification stage shown in FIGS. 16 and 17 can each be configured as a computer program and executed by a computer. Needless to say, a program for realizing these functions on a computer, or a program for causing a computer to execute each step, can be recorded on a computer-readable recording medium, for example a flexible disk, MO, ROM, memory card, CD, DVD, or removable disk, and can be stored or distributed. Further, it is also possible to provide the above program through a network such as the Internet or e-mail. As described above, the present invention can be implemented by installing a program provided by a recording medium or a network on each computer.
[0071]
In the above embodiment, the learning device (learning stage) and the identification device (identification stage) are described separately, but may be configured as an integrated device (stage). In this case, the respective programs are configured as one integrated program and can be executed by one computer.
[0072]
[Effect of the invention]
As described above, according to the present invention, each pixel value of an image is converted into light energy, logarithmic conversion is performed on the light energy, the horizontal and vertical differential components are calculated to obtain the direction and intensity of the differential, the differential direction of each pixel in a predetermined region is quantized into a predetermined number of stages, and the differential direction histogram obtained by cumulatively adding the differential intensity for each stage is used as the feature. Because this feature absorbs changes in pixel values due to changes in illumination and is robust against deformation and displacement, changes due to illumination and minute displacements no longer need to be considered, and the number of required templates can be greatly reduced.
[0073]
In addition, the feature vectors of all target images (including different fonts, different densities of character areas, geometric deformation, etc.) are subjected to principal component analysis to compress the feature vectors, and principal component analysis is then performed for each target, so the number of templates required for collation can be greatly reduced. Furthermore, since the judgment is performed in an eigenspace that compresses font variation and changes in the density of the character area, objects with different fonts and densities can be judged without lowering the correlation value, which was difficult in the past.
[0074]
Furthermore, for each target template, sub-template images, each an image of a predetermined size centered on a predetermined point of interest, are created, and template matching is performed for each sub-template. Assuming that the target is deformed by a combination of parameters within a certain range of coordinate conversion assumed in advance, a table showing the likelihood for the position coordinates and each combination of the parameters is prepared. The position coordinates of the object on the assumption that the combination of parameters is correct are calculated, and the distance obtained by the matching is subtracted from the value of the table defined by the coordinates and the coordinate conversion parameters. The target coordinate position, coordinate conversion parameters, and target category showing the minimum table value are obtained, an image is cut out at the estimated coordinate position and coordinate conversion parameters, and its distance to the corresponding template is calculated. This can significantly reduce the number of templates required for matching and solves Problem 1.
[Brief description of the drawings]
FIG. 1 is a diagram showing an embodiment in which the present invention is applied to a translation system.
FIG. 2 is a diagram illustrating a first embodiment of the present invention, and is a diagram illustrating details of a learning device and an identification device.
FIG. 3 is a diagram showing a first embodiment of the present invention, and is a diagram for explaining further details of FIG. 2.
FIG. 4 is a diagram illustrating an example of a plurality of images “den”.
FIG. 5 is a diagram illustrating an example of a deformed image “den”.
FIG. 6 is a diagram illustrating an example of a different-density image “den”.
FIG. 7 is a diagram illustrating an example of a sub-template image “D”.
FIG. 8 is a diagram illustrating an example of a differential direction calculation unit according to the embodiment.
FIG. 9 is a diagram for explaining an example of a differential direction histogram forming means according to the embodiment.
FIG. 10 is a diagram illustrating a configuration example of an eigenspace in which the sum of contribution rates is equal to or greater than a certain value.
FIG. 11 is a diagram showing an example of a parameter correspondence table.
FIG. 12 is a diagram illustrating an example of a format for storing an eigenspace.
FIG. 13 is a diagram illustrating a distance to an eigenspace.
FIG. 14 is a diagram showing an example of a table representing certainty.
FIG. 15 is a diagram illustrating voting means of the present embodiment.
FIG. 16 is a diagram illustrating a learning stage as a third embodiment of the present invention, and is a flowchart illustrating an operation example of a learning device.
FIG. 17 is a diagram for explaining an embodiment of the identification stage in the third embodiment of the present invention, and is a flowchart showing an operation example of the identification device.
[Explanation of symbols]
1 ... Learning device
11 ... All-parameter image creating means
11a ... Deformed image creating means
11b ... Different-density image creating means
12 ... Sub-template image creating means
13 ... Feature extraction means
13a ... Logarithmic conversion means
13b ... Differential intensity direction calculating means
13c ... Differential direction histogram generating means
14 ... Compression means
14a ... Principal component analysis means
15 ... Subspace creation means
15a ... Eigenspace composing means
15b ... Correspondence table creating means
2 ... Identification device
21 ... Image cutout means
22 ... Feature extraction means
22a ... Logarithmic conversion means
22b ... Differential intensity direction calculating means
22c ... Differential direction histogram generating means
23 ... Sub-template subspace distance calculation means
24 ... Voting means
25 ... Parameter estimation means
26 ... Whole subspace distance calculation means
26a ... Parameter distance calculation means
27 ... Output means
3 ... Translation device
4 ... Storage means

Claims

A learning device for registering an object using a plurality of images of an object, and an identification device for identifying the registered plurality of objects from an input image, wherein the learning device includes:
All-parameter image creating means for creating all-parameter images obtained by converting an image of an object with respect to a plurality of images for each combination of parameters that are assumed in advance,
Sub-template image creating means for creating a plurality of sub-template images which are images of a predetermined size centered on a predetermined point of interest for all the parameter images,
Feature extraction means for calculating an overall feature vector and a sub-template feature vector that extract features from the all-parameter image and the sub-template image;
Compression means for regarding one feature vector as one sample, performing a principal component analysis on the overall feature vectors to calculate a whole compressed space and whole compressed feature vectors, and performing a principal component analysis on each sub-template feature vector group to calculate a sub-template compressed space and sub-template compressed feature vectors;
Subspace creation means for performing a principal component analysis on the whole compressed feature vectors for each object to calculate a whole subspace, and performing a principal component analysis on the sub-template compressed feature vectors for each object to calculate a sub-template subspace;
Accumulating means for registering and accumulating the entire compressed space, the entire subspace, the sub-template compressed space, and the sub-template subspace;
An object learning device, comprising:

A learning device for registering an object using a plurality of images of the object, and an identification device for identifying the plurality of registered objects from an input image, wherein the identification device includes:
Image cutout means for selecting a part of the input image to be collated and cutting out an object area,
Feature extracting means for extracting a feature vector of the cut-out image;
A sub-template subspace distance calculation means for calculating a sub-template compressed feature vector by projecting the feature vector onto the sub-template subspace, and calculating a distance between each object and each sub-template subspace;
Voting means for, assuming that the object is deformed by a combination of coordinate conversion parameters within a certain range assumed in advance, preparing a table indicating position coordinates and certainty for each combination of the parameters, calculating the position coordinates of the object on the assumption that the combination is correct, and converting the distance to the compressed space calculated by the sub-template subspace distance calculating means and adding it to the value of the table defined by the coordinates and the coordinate conversion parameters;
Parameter estimating means for estimating the coordinate position, the coordinate conversion parameters, and the object for which the value of the table is equal to or less than a predetermined threshold value;
Whole subspace distance calculating means for extracting an image of the estimated coordinate position and the coordinate conversion parameter, and calculating a distance to a corresponding whole subspace;
Output means for outputting, as an identification result, parameters of the entire image indicating the distance that is equal to or less than a predetermined threshold,
An object identification device, comprising:

The all-parameter image generation means includes:
A transformed whole image creating means for creating a transformed whole image obtained by transforming an image of an object with respect to a plurality of images for each combination of parameters of coordinate conversion assumed in advance,
A different-density image creating means for creating a different-density image in which the pixel value of the object area is converted into a determined pixel value for each of the transformed entire images;
The object learning device according to claim 1, comprising:

The feature extracting means includes:
Logarithmic conversion means for converting each pixel value of each different density image and sub-template image to light energy and logarithmically converting the light energy,
Differential intensity direction calculating means for calculating the direction and strength of the differentiation by calculating the horizontal and vertical differential components of each different density image and sub-template image,
Differential direction histogram generating means for quantizing, for each different-density image and sub-template image, the differential direction of each pixel in a defined area into a predetermined number of stages, and creating a differential direction histogram by cumulatively adding the differential intensity for each stage,
The object learning device according to claim 3, comprising:

The subspace creation means includes:
Eigenspace composing means for composing an eigenspace in which the sum of contribution rates is equal to or more than a certain value,
Correspondence table creating means for creating a parameter correspondence table describing deformation parameters and density parameters that are the parameters of the overall feature vector and the parameters of the overall feature vector when projected onto the eigenspace,
The object learning device according to claim 1, comprising:

The object identification device according to claim 2, wherein the whole subspace distance calculation means includes parameter distance calculation means for calculating a distance to the coordinates described in a parameter correspondence table instead of calculating a distance to the corresponding whole subspace.

An object learning method in a method having a learning step of registering an object using a plurality of images of the object and an identification step of identifying the registered plurality of objects from an input image,
The learning step of registering the object comprises:
An all-parameter image creating step of creating an all-parameter image obtained by converting an image of an object with respect to a plurality of images in accordance with the parameter for each combination of parameters previously assumed;
A sub-template image creating step of creating a plurality of sub-template images of an image of a predetermined size centered on a predetermined point of interest for all the parameter images;
A feature axis setting step of extracting an overall feature vector and a sub-template feature vector from the all-parameter image and the sub-template image;
A compression step of regarding one feature vector as one sample, performing a principal component analysis on the overall feature vectors to calculate a whole compressed space and whole compressed feature vectors, and performing a principal component analysis on each sub-template feature vector group to calculate a sub-template compressed space and sub-template compressed feature vectors;
A subspace creation step of performing a principal component analysis on the whole compressed feature vectors for each object to calculate a whole subspace, and performing a principal component analysis on the sub-template compressed feature vectors for each object to calculate a sub-template subspace;
An accumulation step of registering and accumulating the entire compressed space and the entire subspace and the sub-template compressed space and the sub-template subspace;
An object learning method, comprising:

An object identification method in a method having a learning step of registering an object using a plurality of images of the object and an identification step of identifying the registered plurality of objects,
The identification step for identifying the object comprises:
An image clipping step of selecting a part of the input image to be collated and cutting out an object area,
A feature extraction step of extracting a feature vector of the cut-out image;
A sub-template subspace distance calculation step of calculating a sub-template compressed feature vector by projecting the feature vector onto a sub-template subspace, and obtaining a distance between each object and each sub-template subspace;
A voting step of, assuming that the object is deformed by a combination of coordinate conversion parameters within a certain range assumed in advance, preparing a table indicating position coordinates and certainty for each combination of the parameters, calculating the position coordinates of the object on the assumption that the combination is correct, and converting the distance to the compressed space calculated in the sub-template subspace distance calculation step and adding it to the value of the table defined by the coordinates and the coordinate conversion parameters;
A parameter estimation step of estimating the coordinate position, the coordinate conversion parameters, and the object for which the value of the table is equal to or less than a predetermined threshold value;
A whole subspace distance calculation step of cutting out an image at the estimated coordinate position with the estimated coordinate conversion parameters, and calculating a distance to the corresponding whole subspace;
An output step of outputting, as an identification result, parameters of the entire image indicating the distance equal to or less than a predetermined threshold,
An object identification method comprising:

The all parameter image generation step includes:
A transformed whole image creating step of creating a transformed whole image obtained by transforming an image of an object with respect to a plurality of images for each combination of parameters of a coordinate transformation that is assumed in advance,
A different-density image creating step of creating a different-density image by converting the pixel value of the object area into a determined pixel value for each of the transformed entire images;
The object learning method according to claim 7, comprising:

The feature extraction step includes:
A logarithmic conversion step of converting each pixel value of each different density image and sub-template image into light energy and performing logarithmic conversion on the light energy,
A differential intensity direction calculating step of calculating a horizontal differential and a vertical differential component of each different density image and the sub-template image to calculate the direction and intensity of the differential,
A differential direction histogram generating step of quantizing, for each different-density image and sub-template image, the differential direction of each pixel in a defined area into a predetermined number of stages, and creating a differential direction histogram by cumulatively adding the differential intensity for each stage;
The object learning method according to claim 9, comprising:

The subspace creation step includes:
An eigenspace construction step of constituting an eigenspace in which the sum of the contribution rates is equal to or more than a certain value;
A correspondence table creation step of creating a parameter correspondence table describing deformation parameters and density parameters, which are parameters of the overall feature vector and the parameters of the overall feature vector when projected onto the eigenspace,
The object learning method according to claim 7, comprising:

The object identification method according to claim 8, wherein the whole subspace distance calculation step includes a parameter distance calculation step of calculating a distance to the coordinates described in a parameter correspondence table instead of calculating a distance to the corresponding whole subspace.

An object learning program for causing a computer to execute the steps in the object learning method according to any one of claims 7, 9, 10, and 11.

An object identification program for causing a computer to execute the steps in the object identification method according to claim 8 or 12.

A recording medium recording an object learning program, wherein a program for causing a computer to execute the steps in the object learning method according to any one of claims 7, 9, 10, and 11 is recorded on a recording medium readable by the computer.

A recording medium recording an object identification program, wherein a program for causing a computer to execute the steps in the object identification method according to claim 8 or 12 is recorded on a recording medium readable by the computer.