JP3696212B2

JP3696212B2 - Generation of image used for matching in pattern recognition, and method, apparatus, and program for pattern recognition using the image

Info

Publication number: JP3696212B2
Application number: JP2003035565A
Authority: JP
Inventors: 雄志三田; 敏充金子; 修堀
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-02-13
Filing date: 2003-02-13
Publication date: 2005-09-14
Anticipated expiration: 2023-02-13
Also published as: JP2004246618A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像処理の技術分野におけるパターン認識に係わり、特に、パターン認識における照合（マッチング）に用いられる画像の生成ならびに同画像を用いたパターン認識のための方法、装置、およびプログラムに関する。
【０００２】
【従来の技術】
複数の画像同士の類似性を判定する方法として画像照合がある。画像照合は、画像中から特定の物体を検出する際や、画像検索を行う際に用いられる。画像照合では、あらかじめ準備しておいた見本画像（テンプレートとも呼ばれる）と対象画像の２枚を用い、何らかの測度に基づきこれら２枚の画像の類似度を評価する。よく用いられる類似度にはSSD(Sum of Square Difference)、SAD(Sum of Absolute Difference)あるいは正規化相関がある。
【０００３】
これらの類似度を用いる評価方法は、いずれも見本画像および対象画像において同じ位置の画素の濃淡値を比較するものであるが、見本画像と対象画像が撮影された照明条件の違いに起因する輝度変動やノイズの混入に対して照合精度が低下するという問題がある。このような問題を軽減する方法として、増分符号相関（下記非特許文献１参照）がある。増分符号相関では、見本画像および対象画像のそれぞれにおいて、水平方向に隣接する画素同士の濃淡値の増分（大小関係）を符号として表現し、その符号の一致数を類似度とする。増分符号相関法によれば、符号が逆転しない範囲の輝度変動やノイズ混入に対して頑強な照合を行うことができることが知られている。
【０００４】
また、増分符号相関を拡張した定性的３値表現も知られている（下記非特許文献２参照）。定性的３値表現においては、水平方向だけでなく垂直方向に隣接する画素同士についても評価することにより、照合の方向依存性を緩和している。さらに、濃淡値の大小関係および同値関係を３値で表すことにより、一様な濃淡を持つ画像に対する照合精度を向上している。
【０００５】
これら２つの方法は、特に単一の見本画像を与えることを前提とする課題に有効である。例えば、記号・ランドマークの識別や画像の位置あわせといった課題が挙げられる。しかし、単一の見本画像では不十分な課題もある。例えば、画像に含まれる人物の顔を検出するという課題が挙げられる。顔は、目・鼻・口の相対的な位置関係など基本的な構造は共通しているものの、それらの部位の大きさ・肌の色・ヒゲの有無など個々人で異なる特徴を有している。したがって、単一の見本画像だけでは個々人の差異を表現できないという問題がある。
【０００６】
素直には照合に用いる見本画像を複数にすることが考えられる。しかしながら、見本画像の数を増やせばその分、処理時間が増大するという問題がある。また、どの画像を見本として選択するかによって、検出精度も大幅に変化する。
【０００７】
見本画像を複数用意することに代えて、定性的３値表現（非特許文献２）では、収集した多数の顔の平均画像を見本として用いている。平均画像は、各画素における濃淡値の平均を求めることによって作成される。個々人の差異がある画素はぼかされ、共通の濃淡値を持つ画素は濃淡値が保存されるので、顔に共通の特徴を際立たせる効果がある。しかし、平均的な顔との差異が大きい顔については、正しく照合できないという問題がある。平均的な顔との差異が大きい部分についても顔らしさを定量的に評価し、顔でない画像との識別に利用する必要がある。
【０００８】
【非特許文献１】
村瀬一朗，金子俊一，五十嵐悟，「増分符号相関によるロバスト画像照合」，電子情報通信学会論文誌D-II, Vol.J83-D-II, No.5, pp.1323-1331, 2000
【０００９】
【非特許文献２】
山口修，福井和広，「定性的３値表現に基づく画像マッチング」，信学技報PRMU2002-34, 2002
【００１０】
【発明が解決しようとする課題】
従来の定性的３値表現のように平均画像を用いるのではなく、複数の見本画像を用いる新たな手法を提供するにあたり、処理時間を増大させることなく、しかも照合のロバスト性を向上することが望まれる。本発明はかかる事情を考慮してなされたものであり、パターン認識における照合（マッチング）に用いられる画像の生成ならびに同画像を用いたパターン認識のための方法、装置、およびプログラムを提供することを目的とする。
【００１１】
【課題を解決するための手段】
本発明は、各々濃淡値画像からなる複数の見本画像から、パターン認識における照合に用いられる画像を生成する方法、装置、ならびにプログラムを開示する。まず前記複数の見本画像において２つの画素の異なる組合せについての濃淡差分値を計算する。前記濃淡差分値を所定の量子化レベルで量子化し、該量子化レベルに依存する所定の量子化値のいずれかを画素値として有する複数の見本量子化画像を生成する。前記量子化値ごとに、前記複数の見本量子化画像の各画素において当該量子化値が生起する確率を計算し、この計算された前記生起確率の値を画素値として有する見本確率画像を前記量子化値ごとに生成する。そして、この見本確率画像をパターン認識における照合に用いられる画像とする。
【００１２】
また、本発明は上記のように生成された見本確率画像を用いて対象画像との照合を行うパターン認識方法、装置、ならびにプログラムを開示する。まず濃淡値画像からなる対象画像において２つの画素の異なる組合せについての濃淡差分値を計算する。前記濃淡差分値を所定の量子化レベルで量子化し、該量子化レベルに依存する所定の量子化値のいずれかを画素値として有する対象量子化画像を生成する。そして、前記対象量子化画像と、見本確率画像との類似度を計算する。
【００１３】
【発明の実施の形態】
（第１の実施形態）まず、本発明の第１の実施の形態について、図１〜図４を参照して説明する。図１は本発明の第１実施形態に係る画像処理装置の概略構成を示すブロック図である。本実施形態に係る画像処理装置は、汎用のコンピュータを用いて実現することができる。ＣＰＵ、メモリ、入出力インターフェース、キーボード、ディスプレイといったコンピュータの基本構成要素については図示省略してある。図１に示すように画像処理装置１は見本画像処理部１１、対象画像処理部１２、照合部１３を有する。これら構成要素は画像処理装置１が果たす機能に対応しており、例えばコンピュータプログラムとして実現することができる。画像処理装置１が果たす機能は大別すると２つある。その一つは見本画像の処理であり、もう一つは見本画像と対象画像との照合処理である。前者の見本画像処理においては、複数の見本濃淡値画像から照合処理に用いられる見本確率画像を生成する。後者の照合処理においては、生成された見本確率画像を用いて対象画像との照合を行い、対象画像と見本確率画像との類似性を判定する。本実施形態は見本画像処理および照合処理の両者を実行する画像処理装置に関するものであるが、見本画像処理および照合処理のいずれか一方のみを実行する画像処理装置についても本発明の実施形態に含まれる。
【００１４】
見本画像処理部１１により処理される画像を見本画像I、見本量子化画像Q、見本確率画像Pと称する。見本画像Iは対象画像との照合に用いられる見本確率画像Pの元となる画像であり、濃淡値を画素値として有する。見本画像処理部１１は複数の見本画像Iに基づいて複数の見本量子化画像Qおよび見本確率画像Pを生成する。また、対象画像処理部１２により処理される画像は対象画像I'および対象量子化画像Q'である。対象画像I'は見本確率画像Pとの照合のため本実施形態の画像処理装置１に対して与えられる画像であり、見本画像Iと同様に濃淡値画像からなる。対象画像処理部１２は対象画像I'に基づいて対象量子化画像Q'を生成する。なお、見本画像I、見本量子化画像Q、見本確率画像P、対象画像I'、対象量子化画像Q'はいずれも図示しないハードディスク装置等に記憶保持される。
【００１５】
図２は、本実施形態に係る画像処理装置において実行される一連の処理手順を示すフローチャートである。見本画像と対象画像の画像サイズは等しいものとし、Ｗ×Ｈ画素とする。見本画像I内の位置(x,y)の画素の濃淡値をI(x,y)とし、同様に、対象画像I’内の位置(x,y)の画素の濃淡値をI’(x,y)とする。
【００１６】
ステップS101において、見本画像Iにおける２つの画素の濃度差分値を計算する。２つの画素を組合せるには幾つか方法があり、例えば、画素I(x,y)とこの画素I(x,y)に対して水平方向に隣接した右隣の画素I(x+1,y)とを組み合わせることができる。あるいは画素I(x,y)と、この画素I(x,y)に対して垂直方向に隣接した下隣の画素I(x,y+1)とを組合せることもできる。このように画素の組合せを選択する方法は、一般的な画像の性質として、注目画素と隣接画素との相関は高いとの知見に基づく。なお、画素I(x,y)とI(x+2,y)，I(x,y)とI(x+1,y+1)といった組合せとしてもよい。本実施形態では、隣接画素を選択する場合を例として説明を行う。この場合、濃度差分値は、I(x+1,y) − I(x,y)またはI(x,y+1) − I(x,y)のように計算できる。画像の濃淡値が０〜２５５の２５６階調で表現される場合、濃度差分値は−２５５〜＋２５５までの５１１階調となる。
【００１７】
ステップS102では、２つの画素の濃度差分値を所定の量子化レベルで量子化する。
【００１８】
例えば量子化レベルを２とするとき、見本量子化画像Qの各画素値を以下の数式に従って求めることにより、見本画像Iを当該量子化レベルで量子化することができる。
【００１９】
【数１】

【００２０】
これは、水平方向に隣接した画素との差分値に基づく量子化であるが、垂直方向に隣接する画素との差分値に基づく量子化においては、以下の数式を用いればよい。
【００２１】
【数２】

【００２２】
また、上式両方を用いて、見本量子化画像を水平方向および垂直方向について２枚作成しておいてもよい。
【００２３】
例えば量子化レベルを３とするとき、定性的３値表現（上述の非特許文献２参照）によれば、見本量子化画像Qの各画素値を以下の式に従って算出することができる。
【００２４】
【数３】

【００２５】
量子化レベルが２の場合と同様に、水平方向の隣接画素でなく垂直方向の隣接画素を用いてもよい。あるいは、それら両方を用いてもよい。
【００２６】
なお、水平方向の隣接画素を用いて作成した見本量子化画像は、（Ｗ−１）×Ｈ画素の大きさとなり、垂直方向の隣接画素を用いて作成した見本量子化画像は、Ｗ×（Ｈ−１）画素の大きさとなる。
【００２７】
上式では、隣接画素間の濃度差分値の符号に応じて量子化を行っているが、以下の数式にしたがって量子化を行ってもよい。
【００２８】
【数４】

【００２９】
ここで、t₁およびt₂は量子化のためのしきい値であり、例えばt₁=t₂=5のように設定して濃淡差分値が±５の範囲は同値であるとみなして量子化を行うように定めることができる。
【００３０】
また、量子化レベルをLとして一般化し、
【数５】

【００３１】
のようにして量子化を行ってもよい。なお、量子化レベルLを４以上としてもよいが、明るさの変動やノイズの混入に対する頑強性を確保するには、２もしくは３の量子化レベルを用いると良いことが報告されている（上述の非特許文献１、２参照）。そこで本実施形態では、上述した定性的３値表現を用いるものとして説明する。言うまでもなく、本発明は定性的３値表現に限定されない。
【００３２】
図３は、各見本画像に対する定性的３値表現による見本量子化画像の例を示す図である。ここでは複数の人物の顔画像を使用し、あらかじめ手入力した目鼻の位置がほぼ一致するように７枚の見本画像Iが作成されている。見本量子化画像Qは、濃度差分値の符号に応じて、白、黒、灰（図ではハッチング）の３つの明るさで表現されている。なお、ここでは顔画像を例としたが、目、鼻、口などの顔の各部位をそれぞれ切り出した画像を用いてもよい。
【００３３】
準備された見本画像Iのすべてを対象にステップS101およびS102が実行される。ステップS103においては、すべての見本画像Iに対して見本量子化画像Q（ここでは水平方向および垂直方向の計１４枚）が作成されたかどうかを判定し、次のステップへ移る。
【００３４】
ステップS104では、見本確率画像Pを作成する。見本確率画像Pの各画素値は、見本量子化画像Qの各量子化レベルの生起確率とする。レベルｌ(0≦l≦L)に対応する見本確率画像の各画素値P_l(x,y)は、見本画像の総数Nおよびn番目の見本量子化画像の画素値Q_n(x,y)により、以下の数式により算出される。
【００３５】
【数６】

【００３６】
図４は６００枚の見本画像から作成した見本確率画像の例を示す図である。作成される見本確率画像Pの枚数は、見本画像Iにおける２つの画素の選択方法と量子化レベルLに応じて決定される。見本画像Iの枚数を６００枚としているが、あくまで一例であり、高い照合精度を得られるよう適切な枚数を実験的に求めるのがよい。図４では、定性的３値表現により量子化を行っているため、画素の選択は水平方向と垂直方向の２通りであり、量子化レベルは３であるので、６枚の見本確率画像Pが作成されている。
【００３７】
図４において、301は、６００枚の見本画像から作成した平均画像であり、見本確率画像Pとの比較を行うために示した。302〜304は水平方向に隣接する画素を用いて作成した見本確率画像Pである。302は、顔の各位置において右隣の画素の濃淡値が大きい確率を各画素の値として保持している。303は同値となる確率を、304は小さい確率を表している。確率が高い（１に近い）ほど明るく、逆に確率が低い（０に近い）ほど暗く表示されている。例えば、頬の辺りは一様な濃淡を持つ場合が多いので、見本確率画像303の頬の領域は明るく表示されている。また、目や鼻の付近では濃淡が大きく変化するので、見本確率画像302もしくは304では目鼻の付近で明るい領域と暗い領域が現れている。305〜307は垂直方向に隣接する画素を用いて作成した見本確率画像であり、それぞれ注目画素の下の画素値が大きい確率、同値となる確率、小さい確率を表している。目、鼻、口の付近では濃淡の変化が激しく、頬の付近では濃淡が変化しないため、それを反映した結果が現れている。
【００３８】
次にステップS105およびS106では、対象画像I'に対してステップS101およびS102と同様の処理を行う。例えば、見本画像Iを定性的３値表現によって量子化した場合、対象画像I’についても定性的３値表現による量子化を行い、対象量子化画像Q’を作成する。
【００３９】
ステップS107では、ステップS104において作成された見本確率画像Pと、ステップS106において作成された対象量子化画像Q’とを用いて、類似度の算出を行う。類似度は対象量子化画像Q’の各画素値の生起確率を見本確率画像Pから取得し、その重み付きの乗算結果として以下の数式により定義する。
【００４０】
【数７】

【００４１】
ここで、W_Q'(x,y)は、対象量子化画像Q’の各画素値に対する重み係数であり、定性的３値表現を用いた場合では、隣接画素の濃淡差分値の符号に対する重みとなる。一般に、同値符号の発生頻度は他の符号の発生頻度に比べて小さい。同値符号が発生したときの重みを大きくとることによって、類似性判定の精度を向上させることができる。例えば、W_Q'(x,y)=-1＝W_Q'(x,y)=1＝１，W_Q'(x,y)=0＝２のように同値符号が発生したときの重みを他の符号の２倍になるように設定する。顔画像では同値符号はほとんど発生しないが、一様な濃淡を持つ背景は同値符号が多く発生するので、顔と一様な背景を識別する際には、このような重み付けは有効に作用する。
【００４２】
なお、十分な見本サンプルが集めらない場合、すなわち見本画像の総数Nが小さいとき、P_Q'(x,y)(x,y)=0となることがある。このとき類似度は０となってしまい、他の画素における確率値は考慮されなくなるという問題がある。そこで、
【数８】

のように、見本確率画像Pの画素値に対して下限値αを設定しておき、αを下回る確率値が得られた場合には、その値をαで置き換えることを行う。例えば、α=0.01のように小さい値を設定しておくことにより、この問題に対処することが可能となる。
【００４３】
類似度は、上式の対数をとり、
【数９】

としてもよい。
また、次式のように見本確率画像Pの各画素値の平均値を類似度としてもよい。
【００４４】
【数１０】

【００４５】
最後に、ステップS108では、ステップS107において算出された類似度に基づき、見本画像群Iに対する対象画像I'の類似性を判定する。実験的に決定したしきい値を用い、類似度がしきい値を上回っているならば、「類似している」と判定する。逆に、類似度がしきい値より低いならば、「類似していない」と判定する。例えば、見本画像Iとして顔画像を用いているとき、類似度がしきい値以上ならば「顔である」と判定し、しきい値未満であれば「顔でない」と判定する。
【００４６】
（第２の実施形態）次に、本発明の第２の実施形態を図５および図６を参照して説明する。図５は、第２実施形態に係る画像処理装置において実行される一連の手順のフローチャートである。ステップS401〜S404およびS409〜S412はそれぞれ第１実施形態の図２に示したフローチャートにおけるステップS101〜S104およびS105〜S108とほぼ同一の処理である。図５と図２では、新たにステップS405〜S408が挿入されていること、およびステップS411の類似度算出方法が異なる。以下では、これらの相違点についてのみ説明する。
【００４７】
図５におけるステップS405〜S408は、見本画像によく似ているが異なる偽の見本画像から偽見本確率画像を作成するための処理を示している。パターン認識では、しばしば見本画像によく似た紛らわしい画像が出現する。これを「偽見本画像」と称する。パターン認識では偽見本画像と見本画像とを識別する必要性が生じる。
【００４８】
偽見本画像は、例えば図２で示したフローチャートにおいて、見本画像Iと類似していると判定された画像の中で、見本画像Iとは異なる画像を収集することによって得られる。あるいは、単純に見本が含まれない画像を大量に収集してもよい。このような偽見本画像群から作成した偽見本確率画像を用いることにより、紛らわしい画像を正しく識別することが可能となる。ステップS405〜S408は、図２に示したフローチャートにおけるステップS101〜S104に対応しており、見本画像の代わりに偽見本画像を用いる点のみ異なる。
【００４９】
ステップS411では、見本画像群Iから作成した見本確率画像P、偽見本画像群から作成した偽見本確率画像P^F、対象画像から作成した対象量子化画像Q'の３枚の画像から、見本画像群Iと対象画像I'との類似度を算出する。類似度は、各画素におけるPとP^Fの比を用いて以下のように定義される。
【００５０】
【数１１】

【００５１】
また、上式の対数をとり、
【数１２】

としてもよい。
また、次式のように各画素における見本確率画像Pと偽見本確率画像P^Fとの比の平均値を類似度としてもよい。
【００５２】
【数１３】

【００５３】
偽見本画像群を用いることにより、見本と偽見本との差異を強調した類似度を算出することが可能となる。
【００５４】
図６は、見本画像中の３箇所の位置における隣接画像との濃淡差分値のヒストグラムを示したものである。濃淡値は０〜２５５の２５６階調で表現されるため、濃淡差分値は−２５５〜＋２５５の５１１階調となる。位置によって、濃淡差分値の分布に偏りが生じており、この偏りが顔の特性を表している。例えば、目は周囲に比べて濃淡値が低いので、目の付近Ｐ１またはＰ２で濃淡差分値を求めると、０よりも大きい方もしくは小さい方にヒストグラムの分布が偏る。これに対し、周囲の濃淡値とほとんど差がない鼻の付近Ｐ３では、濃淡差分値が０となる頻度が高く、これを中心とした分布が形成されている。
【００５５】
顔でない対象画像では、濃淡差分値の分布が顔とは異なっていると考えられる。偽見本画像では、画像中の位置によって見本画像の分布と近い場合と異なる場合があり、確率画像同士の比を取ることによって分布の違いを強調した類似度を求めることができる。
【００５６】
（第３の実施形態）
次に、本発明の第３の実施形態について図７および図８を参照して説明する。第３の実施形態は顔検出への応用例に関する。本実施形態では、見本画像と対象画像の大きさが同一であることを前提としている。しかし、入力画像中の顔の大きさは必ずしも見本画像と一致しないという問題がある。図７に示される顔検出のための一連の処理手順は、このような画像サイズの不一致に対応することができるよう構成されている。
【００５７】
まずステップS601において、見本確率画像を作成する。見本確率画像の作成手順は第１実施形態に示したものと同様である。なお、顔とよく似ているが顔ではない偽見本確率画像を同時に作成しておいてもよい（第２実施形態参照）。次にステップS602において、入力画像の大きさを様々な尺度で拡大、縮小した複数の画像を作成し記憶する。拡大、縮小の尺度を密に変化させれば、大きさが少しずつ異なる顔を含んだ入力画像群が作成される。これにより、いずれかの入力画像には見本画像とほぼ同一の大きさの顔が含まれることになる。
【００５８】
次に図８に示すように、それぞれの入力画像７０２〜７０４に走査ウィンドウ７０５を設置する。走査ウィンドウ７０５の大きさは見本確率画像７０１を作成するための見本画像と同一とする。この走査ウィンドウ７０５を入力画像の端から少しずつずらしながら、ウィンドウ７０５内部の画像を切り出す（ステップS603）。ウィンドウ７０５内部の画像を対象画像として、類似性の判定を行う（ステップS604）。類似性判定の方法は上述の通りである。偽見本確率画像を用いた類似度に基づき類似性判定を行ってもよい。入力画像中のすべての領域を走査したかどうかを判定し（ステップS605）、走査が終了していれば記憶した他の大きさの異なる入力画像に対してステップS603〜S605を繰り返す。最終的に、ステップS607において、類似性判定の結果「顔に類似している」と判定された領域の情報を出力する。
【００５９】
図８から分かるように、入力画像702および703の「顔」は走査ウィンドウ７０５よりも大きいため、顔の一部しかウィンドウ７０５内に含まれない。これらは、「顔でない」領域と判定されてしまう。一方、入力画像704に含まれる顔は、走査ウィンドウと同程度の大きさであるため、「顔である」と判定されることになる。
【００６０】
なお、本発明は上述した実施形態に限定されず種々変形して実施可能である。
【００６１】
【発明の効果】
以上述べたように、本発明によれば、パターン認識の画像照合における処理時間を増大させることなく、しかも照合のロバスト性を向上することができる。
【図面の簡単な説明】
【図１】本発明の第１実施形態に係る画像処理装置の概略構成を示すブロック図
【図２】本発明の第１実施形態に係る画像処理装置において実行される一連の処理手順を示すフローチャート
【図３】各見本画像に対する定性的３値表現による見本量子化画像の例を示す図
【図４】見本画像から作成した見本確率画像の例を示す図
【図５】本発明の第２実施形態に係る画像処理装置において実行される一連の手順のフローチャート
【図６】見本画像中の３箇所の位置における隣接画像との濃淡差分値のヒストグラムを示した図
【図７】本発明の第３実施形態に係る顔検出のための一連の処理手順を示すフローチャート
【図８】入力画像のサイズを異ならせて顔検出を行っている様子を示す図
【符号の説明】
１…画像処理装置、１１…見本画像処理部、１２…対象画像処理部、１３…照合（マッチング）部
[0001]
[Technical Field of the Invention]
The present invention relates to pattern recognition in the technical field of image processing, and more particularly, to a method, an apparatus, and a program for generating an image used for matching in pattern recognition and for pattern recognition using the image.
[0002]
[Prior art]
Image matching is a method for determining the similarity between images. It is used to detect a specific object in an image or to perform image retrieval. Image matching uses two images, a sample image prepared in advance (also called a template) and a target image, and evaluates their similarity based on some measure. Commonly used similarity measures include SSD (Sum of Square Difference), SAD (Sum of Absolute Difference), and normalized correlation.
[0003]
All of the evaluation methods that use these similarity measures compare the gray values of pixels at the same position in the sample image and the target image. However, matching accuracy degrades under luminance variations caused by differences in the illumination conditions under which the sample image and the target image were captured, and under noise contamination. One method for mitigating this problem is incremental sign correlation (see Non-Patent Document 1 below). In incremental sign correlation, the increment (magnitude relationship) of the gray values of horizontally adjacent pixels is expressed as a sign in each of the sample image and the target image, and the number of matching signs is used as the similarity. Incremental sign correlation is known to provide matching that is robust to luminance variations and noise contamination as long as the signs are not inverted.
[0004]
A qualitative ternary expression that extends incremental sign correlation is also known (see Non-Patent Document 2 below). The qualitative ternary expression evaluates pairs of vertically adjacent pixels as well as horizontally adjacent ones, which reduces the direction dependency of matching. Furthermore, representing the greater-than, less-than, and equal relations between gray values with three values improves matching accuracy for images with uniform shading.
[0005]
These two methods are particularly effective for tasks that assume a single sample image is given, such as identifying symbols and landmarks or aligning images. For some tasks, however, a single sample image is insufficient. One example is detecting human faces in an image. Faces share a basic structure, such as the relative positions of the eyes, nose, and mouth, but differ between individuals in features such as the size of those parts, skin color, and the presence or absence of a beard. A single sample image therefore cannot represent these individual differences.
[0006]
A straightforward remedy is to use a plurality of sample images for matching. However, increasing the number of sample images increases the processing time accordingly. In addition, the detection accuracy varies greatly depending on which images are selected as samples.
[0007]
Instead of preparing a plurality of sample images, the qualitative ternary expression (Non-Patent Document 2) uses, as the sample, an average image of many collected faces. The average image is created by averaging the gray values at each pixel. Pixels that differ between individuals are blurred, while pixels with a common gray value retain that value, which has the effect of emphasizing the features common to faces. However, a face that differs greatly from the average face cannot be matched correctly. It is necessary to quantitatively evaluate face-likeness even for portions that differ greatly from the average face, and to use this evaluation to discriminate faces from non-face images.
[0008]
[Non-Patent Document 1]
Ichiro Murase, Shunichi Kaneko, Satoru Igarashi, “Robust Image Matching by Incremental Sign Correlation”, IEICE Transactions D-II, Vol.J83-D-II, No.5, pp.1323-1331, 2000
[0009]
[Non-Patent Document 2]
Osamu Yamaguchi, Kazuhiro Fukui, “Image matching based on qualitative ternary expression”, IEICE Technical Report PRMU2002-34, 2002
[0010]
[Problems to be solved by the invention]
In providing a new technique that uses a plurality of sample images rather than an average image as in the conventional qualitative ternary expression, it is desirable to improve the robustness of matching without increasing the processing time. The present invention has been made in view of these circumstances, and its object is to provide a method, an apparatus, and a program for generating an image used for matching in pattern recognition and for pattern recognition using the image.
[0011]
[Means for Solving the Problems]
The present invention discloses a method, an apparatus, and a program for generating, from a plurality of sample images each consisting of a gray value image, an image used for matching in pattern recognition. First, grayscale difference values are calculated for different combinations of two pixels in the plurality of sample images. The grayscale difference values are quantized at a predetermined quantization level to generate a plurality of sample quantized images, each having as pixel values one of the predetermined quantized values that depend on the quantization level. For each quantized value, the probability that the quantized value occurs at each pixel of the plurality of sample quantized images is calculated, and a sample probability image having the calculated occurrence probabilities as pixel values is generated for each quantized value. These sample probability images are used as the images for matching in pattern recognition.
[0012]
The present invention also discloses a pattern recognition method, apparatus, and program for matching against a target image using the sample probability images generated as described above. First, grayscale difference values are calculated for different combinations of two pixels in a target image consisting of a gray value image. The grayscale difference values are quantized at a predetermined quantization level to generate a target quantized image having as pixel values one of the predetermined quantized values that depend on the quantization level. Then, the similarity between the target quantized image and the sample probability images is calculated.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
(First Embodiment) First, a first embodiment of the present invention will be described with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing a schematic configuration of an image processing apparatus according to the first embodiment of the present invention. The image processing apparatus according to the present embodiment can be realized using a general-purpose computer. The basic components of the computer, such as a CPU, memory, input/output interface, keyboard, and display, are not shown. As shown in FIG. 1, the image processing apparatus 1 includes a sample image processing unit 11, a target image processing unit 12, and a matching unit 13. These components correspond to the functions performed by the image processing apparatus 1 and can be realized, for example, as a computer program. The functions performed by the image processing apparatus 1 are roughly divided into two. One is sample image processing, and the other is matching between the sample image and the target image. In sample image processing, a sample probability image used for matching is generated from a plurality of sample gray value images. In matching, the generated sample probability image is matched against the target image, and the similarity between the target image and the sample probability image is determined. The present embodiment relates to an image processing apparatus that executes both sample image processing and matching, but an image processing apparatus that executes only one of them is also included in the embodiments of the present invention.
[0014]
The images processed by the sample image processing unit 11 are referred to as a sample image I, a sample quantized image Q, and a sample probability image P. The sample image I is an image that is a source of the sample probability image P used for collation with the target image, and has a gray value as a pixel value. The sample image processing unit 11 generates a plurality of sample quantized images Q and sample probability images P based on the plurality of sample images I. The images processed by the target image processing unit 12 are a target image I ′ and a target quantized image Q ′. The target image I ′ is an image given to the image processing apparatus 1 of the present embodiment for collation with the sample probability image P, and is composed of a gray value image like the sample image I. The target image processing unit 12 generates a target quantized image Q ′ based on the target image I ′. Note that the sample image I, the sample quantized image Q, the sample probability image P, the target image I ′, and the target quantized image Q ′ are all stored and held in a hard disk device or the like (not shown).
[0015]
FIG. 2 is a flowchart showing a series of processing steps executed in the image processing apparatus according to the present embodiment. Assume that the sample image and the target image have the same image size and have W × H pixels. The gray value of the pixel at the position (x, y) in the sample image I is I (x, y), and similarly, the gray value of the pixel at the position (x, y) in the target image I ′ is I ′ (x , y).
[0016]
In step S101, a density difference value between two pixels in the sample image I is calculated. There are several ways to pair two pixels. For example, a pixel I(x, y) can be paired with the pixel I(x+1, y) immediately to its right in the horizontal direction. Alternatively, the pixel I(x, y) can be paired with the pixel I(x, y+1) immediately below it in the vertical direction. Selecting pixel pairs in this way is based on the observation that, as a general property of images, a pixel of interest is highly correlated with its adjacent pixels. Pairs such as I(x, y) and I(x+2, y), or I(x, y) and I(x+1, y+1), may also be used. The present embodiment is described using adjacent pixels as an example. In this case, the density difference value can be calculated as I(x+1, y) − I(x, y) or I(x, y+1) − I(x, y). When the gray values of an image are expressed in 256 gradations from 0 to 255, the density difference value takes 511 gradations from −255 to +255.
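The difference computation of step S101 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function names, the `img[y][x]` list-of-rows layout, and the tiny example image are all assumptions introduced here.

```python
def horizontal_diff(img):
    """Return I(x+1, y) - I(x, y) for every pixel that has a right neighbour."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]


def vertical_diff(img):
    """Return I(x, y+1) - I(x, y) for every pixel that has a lower neighbour."""
    return [[img[y + 1][x] - img[y][x] for x in range(len(img[0]))]
            for y in range(len(img) - 1)]


# Hypothetical image with W = 3, H = 2 and gray values in 0..255.
image = [[10, 10, 200],
         [0, 255, 255]]

print(horizontal_diff(image))  # [[0, 190], [255, 0]]   -> (W-1) x H values
print(vertical_diff(image))    # [[-10, 245, 55]]       -> W x (H-1) values
```

Each difference lies between −255 and +255, and the result is one column (or one row) smaller than the input because border pixels have no neighbour on that side.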
[0017]
In step S102, the density difference value between the two pixels is quantized at a predetermined quantization level.
[0018]
For example, when the quantization level is 2, the sample image I can be quantized at the quantization level by obtaining each pixel value of the sample quantized image Q according to the following formula.
[0019]
[Expression 1]
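The formula image for Expression 1 is not reproduced in this text. A form consistent with the surrounding description and with the usual definition of incremental sign correlation would be the following; the output labels 1 and 0, and the assignment of a zero difference to 1, are assumptions.

```latex
Q(x,y) =
\begin{cases}
1 & \text{if } I(x+1,y) - I(x,y) \ge 0 \\
0 & \text{otherwise}
\end{cases}
```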

[0020]
This is quantization based on the difference from the horizontally adjacent pixel. For quantization based on the difference from the vertically adjacent pixel, the following formula may be used.
[0021]
[Expression 2]
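The formula image for Expression 2 is likewise missing. Under the same assumptions as for the horizontal case (labels 1 and 0, zero difference mapped to 1), the vertical counterpart would read:

```latex
Q(x,y) =
\begin{cases}
1 & \text{if } I(x,y+1) - I(x,y) \ge 0 \\
0 & \text{otherwise}
\end{cases}
```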

[0022]
Further, both of the above formulas may be used to create two sample quantized images, one for the horizontal direction and one for the vertical direction.
[0023]
For example, when the quantization level is 3, according to the qualitative ternary expression (see Non-Patent Document 2 above), each pixel value of the sample quantized image Q can be calculated according to the following equation.
[0024]
[Equation 3]
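The formula image for Equation 3 is not reproduced here. A form consistent with the description (three values for the greater, equal, and smaller relations) is sketched below for the horizontal case. The labels 1, 0, and −1 are inferred from the weights W defined for Q′ = −1, 0, 1 in paragraph [0041], and which sign of the difference maps to +1 is an assumption.

```latex
Q(x,y) =
\begin{cases}
\phantom{-}1 & \text{if } I(x+1,y) - I(x,y) > 0 \\
\phantom{-}0 & \text{if } I(x+1,y) - I(x,y) = 0 \\
-1 & \text{if } I(x+1,y) - I(x,y) < 0
\end{cases}
```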

[0025]
Similar to the case where the quantization level is 2, adjacent pixels in the vertical direction may be used instead of adjacent pixels in the horizontal direction. Alternatively, both of them may be used.
[0026]
Note that a sample quantized image created using horizontally adjacent pixels has a size of (W−1) × H pixels, and one created using vertically adjacent pixels has a size of W × (H−1) pixels.
[0027]
In the above equation, quantization is performed according to the sign of the density difference value between adjacent pixels, but quantization may be performed according to the following equation.
[0028]
[Expression 4]
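The formula image for Expression 4 is missing. Based on the description of the thresholds that follows, a plausible form for the horizontal case is given below; whether the boundaries at t₁ and −t₂ are inclusive is an assumption.

```latex
Q(x,y) =
\begin{cases}
\phantom{-}1 & \text{if } I(x+1,y) - I(x,y) > t_1 \\
\phantom{-}0 & \text{if } -t_2 \le I(x+1,y) - I(x,y) \le t_1 \\
-1 & \text{if } I(x+1,y) - I(x,y) < -t_2
\end{cases}
```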

[0029]
Here, t₁ and t₂ are thresholds for quantization. For example, setting t₁ = t₂ = 5 causes grayscale difference values within the range of ±5 to be regarded as equal when quantizing.
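The thresholded quantization of step S102 can be sketched as follows. The function names and example values are hypothetical, and treating a difference exactly equal to a threshold as "equal" is an assumption; with t1 = t2 = 0 the function reduces to quantization by the sign of the difference.

```python
def quantize_ternary(diff, t1=0, t2=0):
    """Map a grayscale difference to +1, 0 or -1 using thresholds t1 and t2."""
    if diff > t1:
        return 1
    if diff < -t2:
        return -1
    return 0  # difference within [-t2, t1] is treated as "equal"


def quantize_image(diff_img, t1=0, t2=0):
    """Quantize every value of a difference image (list of rows)."""
    return [[quantize_ternary(d, t1, t2) for d in row] for row in diff_img]


# Hypothetical difference image.
diffs = [[0, 190, -3],
         [7, -5, -6]]

print(quantize_image(diffs))              # [[0, 1, -1], [1, -1, -1]]
print(quantize_image(diffs, t1=5, t2=5))  # [[0, 1, 0], [1, 0, -1]]
```

With the thresholds set to 5, the small differences −3 and −5 are absorbed into the "equal" class, which is the effect described for t₁ = t₂ = 5.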
[0030]
Also, generalizing the quantization level as L, quantization may be performed according to the following formula.
[Equation 5]

[0031]
The quantization level L may be set to 4 or more in this way, but it has been reported that a quantization level of 2 or 3 is preferable for ensuring robustness against brightness variations and noise contamination (see Non-Patent Documents 1 and 2 above). The present embodiment is therefore described using the qualitative ternary expression described above. Needless to say, the present invention is not limited to the qualitative ternary expression.
[0032]
FIG. 3 is a diagram showing examples of sample quantized images obtained by the qualitative ternary expression for each sample image. Here, face images of a plurality of persons are used, and seven sample images I have been prepared so that the manually entered positions of the eyes and nose substantially coincide. Each sample quantized image Q is rendered in three brightness levels, white, black, and gray (hatched in the figure), according to the sign of the density difference value. Although face images are used as an example here, images cut out of individual facial parts such as the eyes, nose, and mouth may be used instead.
[0033]
Steps S101 and S102 are executed for all of the prepared sample images I. In step S103, it is determined whether sample quantized images Q (here, 14 in total for the horizontal and vertical directions) have been created for all of the sample images I, and the process then proceeds to the next step.
[0034]
In step S104, a sample probability image P is created. Each pixel value of the sample probability image P is the occurrence probability of a quantization level in the sample quantized images Q. Each pixel value P_l(x, y) of the sample probability image corresponding to level l (0 ≦ l ≦ L) is calculated by the following formula from the total number N of sample images and the pixel value Q_n(x, y) of the n-th sample quantized image.
[0035]
[Formula 6]
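The formula image for Formula 6 is not reproduced. Since P_l(x, y) is described as the occurrence probability of level l over the N sample quantized images, a consistent form is the relative frequency below. The indicator notation δ is introduced here for illustration, and "Q_n(x, y) = l" should be read as "the quantized value at (x, y) in the n-th image is the value associated with level l".

```latex
P_l(x,y) = \frac{1}{N} \sum_{n=1}^{N} \delta\bigl(Q_n(x,y),\, l\bigr),
\qquad
\delta(a,b) =
\begin{cases}
1 & \text{if } a = b \\
0 & \text{otherwise}
\end{cases}
```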

[0036]
FIG. 4 is a diagram showing an example of sample probability images created from 600 sample images. The number of sample probability images P created is determined by the method of selecting two pixels in the sample image I and by the quantization level L. The number of sample images I is 600 here, but this is only an example, and an appropriate number that yields high matching accuracy should be determined experimentally. In FIG. 4, quantization is performed by the qualitative ternary expression, so there are two pixel selections (horizontal and vertical) and the quantization level is 3; six sample probability images P are therefore created.
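Step S104 can be sketched as a per-pixel relative frequency over the sample quantized images. The sketch below assumes the ternary labels −1, 0, and 1 and a `q[y][x]` list-of-rows layout; the function name and the four tiny example images are hypothetical. It handles one pixel-pairing direction, so calling it once for the horizontal images and once for the vertical images would yield the six probability images mentioned above.

```python
def probability_images(quantized_samples, levels=(-1, 0, 1)):
    """Return {level: image}, where image[y][x] is the fraction of samples
    whose quantized value at (x, y) equals that level."""
    n = len(quantized_samples)
    height = len(quantized_samples[0])
    width = len(quantized_samples[0][0])
    return {
        level: [[sum(q[y][x] == level for q in quantized_samples) / n
                 for x in range(width)]
                for y in range(height)]
        for level in levels
    }


# Four hypothetical 2x1 sample quantized images (one row, two columns).
samples = [
    [[1, 0]],
    [[1, -1]],
    [[1, 0]],
    [[-1, 0]],
]

prob = probability_images(samples)
print(prob[1])   # [[0.75, 0.0]]
print(prob[0])   # [[0.0, 0.75]]
print(prob[-1])  # [[0.25, 0.25]]
```

At every pixel the probabilities of the three levels sum to 1, because each sample contributes exactly one level there.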
[0037]
In FIG. 4, 301 is the average image created from the 600 sample images, shown for comparison with the sample probability images P. Reference numerals 302 to 304 denote sample probability images P created using horizontally adjacent pixels. Image 302 holds, as the value of each pixel, the probability that the gray value of the pixel to the right is larger at that position of the face. Image 303 represents the probability that it is equal, and image 304 the probability that it is smaller. Higher probabilities (closer to 1) are displayed brighter, and lower probabilities (closer to 0) darker. For example, the cheeks often have uniform shading, so the cheek region of sample probability image 303 is displayed brightly. Near the eyes and nose, the shading changes greatly, so bright and dark regions appear near the eyes and nose in sample probability image 302 or 304. Reference numerals 305 to 307 denote sample probability images created using vertically adjacent pixels, and represent the probability that the pixel below the pixel of interest has a larger value, an equal value, and a smaller value, respectively. The shading changes sharply near the eyes, nose, and mouth and does not change near the cheeks, and the images reflect this.
[0038]
Next, in steps S105 and S106, processing similar to that in steps S101 and S102 is performed on the target image I ′. For example, when the sample image I is quantized by qualitative ternary expression, the target image I ′ is also quantized by qualitative ternary expression to create a target quantized image Q ′.
[0039]
In step S107, the similarity is calculated using the sample probability image P created in step S104 and the target quantized image Q′ created in step S106. The similarity is defined by the following formula as the weighted product of the occurrence probabilities of the pixel values of the target quantized image Q′, looked up in the sample probability image P.
[0040]
[Expression 7]
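The formula image for Expression 7 is missing. The text describes a weighted product of the probabilities P looked up at the quantized values of Q′, so one plausible reading is the following. The symbol S for the similarity is introduced here, and the placement of the weight as a multiplicative factor on each probability is an assumption that cannot be confirmed from this text.

```latex
S = \prod_{x,y} W_{Q'(x,y)} \, P_{Q'(x,y)}(x,y)
```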

[0041]
Here, W_{Q′(x, y)} is a weighting coefficient for each pixel value of the target quantized image Q′; when the qualitative ternary expression is used, it is a weight for the sign of the grayscale difference value between adjacent pixels. In general, the equal-value sign occurs less frequently than the other signs. Giving a larger weight to the equal-value sign can improve the accuracy of the similarity determination. For example, the weights are set as W_{Q′(x, y)=−1} = W_{Q′(x, y)=1} = 1 and W_{Q′(x, y)=0} = 2, so that the weight for the equal-value sign is twice that of the other signs. The equal-value sign rarely occurs in face images but occurs frequently in backgrounds with uniform shading, so this weighting is effective in distinguishing a face from a uniform background.
[0042]
If not enough samples can be collected, that is, if the total number N of sample images is small, P_{Q′(x, y)}(x, y) = 0 may occur. In that case the similarity becomes 0, and the probability values at the other pixels are no longer taken into account. This is addressed as follows.
[Equation 8]
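The formula image for Equation 8 is not reproduced. The accompanying text says that probability values below a lower limit α are replaced with α, which can be written as below; the tilde notation for the floored probability is introduced here for illustration.

```latex
\tilde{P}_l(x,y) =
\begin{cases}
P_l(x,y) & \text{if } P_l(x,y) \ge \alpha \\
\alpha & \text{otherwise}
\end{cases}
\;=\; \max\bigl(P_l(x,y),\, \alpha\bigr)
```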

As in the above formula, a lower limit α is set for the pixel values of the sample probability image P, and any probability value below α is replaced with α. Setting a small value such as α = 0.01 makes it possible to deal with this problem.
[0043]
Alternatively, the logarithm of the above similarity formula may be used as the similarity:
[Equation 9]

Further, the average of the pixel values of the sample probability image P may be used as the similarity, as in the following formula.
[0044]
[Expression 10]

[0045]
Finally, in step S108, the similarity of the target image I′ to the sample image group I is determined based on the similarity calculated in step S107. Using an experimentally determined threshold, the target image is determined to be “similar” if the similarity exceeds the threshold and “not similar” if the similarity is below it. For example, when face images are used as the sample images I, the target image is determined to be “a face” if the similarity is equal to or greater than the threshold and “not a face” if it is less than the threshold.
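Steps S107 and S108 can be sketched together as follows. This is an illustration under stated assumptions, not the formula of the embodiment: the weight is applied as a multiplicative factor on each probability, the probability images and target images are tiny hypothetical examples, and the threshold 0.5 is arbitrary (the text says it is determined experimentally).

```python
import math


def similarity(target_q, prob, weights, alpha=0.01, use_log=False):
    """Weighted product (or sum of logs) of the probabilities that the
    sample probability images assign to the target's quantized values."""
    total = 0.0 if use_log else 1.0
    for y, row in enumerate(target_q):
        for x, level in enumerate(row):
            p = max(prob[level][y][x], alpha)  # lower limit alpha avoids zeros
            term = weights[level] * p
            if use_log:
                total += math.log(term)
            else:
                total *= term
    return total


# Hypothetical probability images for levels 1, 0, -1 (one row, two columns).
prob = {1: [[0.75, 0.0]], 0: [[0.0, 0.75]], -1: [[0.25, 0.25]]}
weights = {-1: 1, 0: 2, 1: 1}  # equal-value sign weighted twice, as in [0041]

typical = [[1, 0]]   # matches the most frequent level at both pixels
atypical = [[0, 1]]  # levels never seen in the samples at those pixels

threshold = 0.5      # hypothetical; to be determined experimentally
print(similarity(typical, prob, weights) >= threshold)   # True
print(similarity(atypical, prob, weights) >= threshold)  # False
```

Because of the lower limit α, the atypical target still receives a small positive score instead of exactly 0, so the log variant remains defined.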
[0046]
(Second Embodiment) Next, a second embodiment of the present invention will be described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart of a series of procedures executed in the image processing apparatus according to the second embodiment. Steps S401 to S404 and S409 to S412 are substantially the same as steps S101 to S104 and S105 to S108, respectively, in the flowchart shown in FIG. 2 of the first embodiment. FIG. 5 differs from FIG. 2 in that steps S405 to S408 are newly inserted and in the similarity calculation method of step S411. Only these differences are described below.
[0047]
Steps S405 to S408 in FIG. 5 show processing for creating a false sample probability image from false sample images, which closely resemble the sample images but are different from them. In pattern recognition, confusing images that closely resemble the sample images often appear. These are referred to as “false sample images”. Pattern recognition needs to distinguish false sample images from sample images.
[0048]
The false sample images are obtained, for example, by collecting images that differ from the sample image I from among the images determined to be similar to the sample image I in the flowchart shown in FIG. Alternatively, a large number of images that simply do not contain the sample may be collected. Using a false sample probability image created from such a false sample image group makes it possible to correctly identify confusing images. Steps S405 to S408 correspond to steps S101 to S104 in the flowchart shown in FIG. 2 and differ only in that false sample images are used instead of the sample images.
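Because the same procedure is applied to both groups, a single routine can produce either the sample probability image or the false sample probability image. The sketch below is a simplified illustration under stated assumptions: differences are taken only between horizontally adjacent pixels, quantization is ternary (0 = decrease, 1 = roughly equal, 2 = increase), and the tolerance `eps` is a hypothetical parameter. None of these names come from the specification.

```python
def quantize_image(image, eps=0):
    """Ternary-quantize differences between horizontally adjacent pixels."""
    quantized = []
    for row in image:
        codes = []
        for left, right in zip(row, row[1:]):
            diff = right - left
            if diff > eps:
                codes.append(2)   # grayscale value increases
            elif diff < -eps:
                codes.append(0)   # grayscale value decreases
            else:
                codes.append(1)   # roughly equal
        quantized.append(codes)
    return quantized

def probability_image(images, levels=3, eps=0):
    """Per-pixel occurrence probability of each quantized value over an image group."""
    quantized_images = [quantize_image(img, eps) for img in images]
    height = len(quantized_images[0])
    width = len(quantized_images[0][0])
    counts = [[[0] * width for _ in range(height)] for _ in range(levels)]
    for q_img in quantized_images:
        for y in range(height):
            for x in range(width):
                counts[q_img[y][x]][y][x] += 1
    n = len(images)
    return [[[c / n for c in row] for row in plane] for plane in counts]

# Four toy 1x2 grayscale images; each yields a single quantized pixel.
group = [[[1, 5]], [[2, 9]], [[7, 7]], [[8, 3]]]
P = probability_image(group)
print(P)
```

Calling `probability_image` on the sample image group would give P, and calling it on the false sample image group would give P^F.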
[0049]
In step S411, the similarity between the sample image group I and the target image I ′ is calculated from three images: the sample probability image P created from the sample image group I, the false sample probability image P^F created from the false sample image group, and the target quantized image Q ′ created from the target image. The similarity is defined using the ratio of P to P^F at each pixel, as follows.
[0050]
[Expression 11]

[0051]
The logarithm of the above formula may also be used as the similarity.
[Expression 12]
Alternatively, the average of the ratio of the sample probability image P to the false sample probability image P^F at each pixel may be used as the similarity, as follows.
[0052]
[Formula 13]

[0053]
By using the false sample image group, it is possible to calculate a similarity degree that emphasizes the difference between the sample and the false sample.
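The logarithmic form of the ratio-based similarity can be sketched as a sum of per-pixel log ratios. This is an illustrative sketch only: both probability images are assumed to be stored as `image[q][y][x]`, and flooring P^F at α as well as P (to avoid division by zero) is an assumption made here rather than something stated in the text. The toy values are hypothetical.

```python
import math

def log_ratio_similarity(P, PF, Q, alpha=0.01):
    """Sum over pixels of log(P / P^F), looked up with the target quantized image Q."""
    total = 0.0
    for y, row in enumerate(Q):
        for x, q in enumerate(row):
            p = max(P[q][y][x], alpha)    # sample probability, floored
            pf = max(PF[q][y][x], alpha)  # false sample probability, floored (assumption)
            total += math.log(p / pf)
    return total

# Toy 1x2 probability images with three quantized values.
P = [[[0.1, 0.6]], [[0.2, 0.3]], [[0.7, 0.1]]]
PF = [[[0.4, 0.2]], [[0.3, 0.4]], [[0.3, 0.4]]]

Q_sample_like = [[2, 0]]  # values that are frequent in the samples
Q_false_like = [[0, 2]]   # values that are frequent in the false samples

print(log_ratio_similarity(P, PF, Q_sample_like))  # positive
print(log_ratio_similarity(P, PF, Q_false_like))   # negative
```

In this form, pixels where the two distributions agree contribute values near zero, so the score is dominated by the positions where samples and false samples differ.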
[0054]
FIG. 6 shows histograms of the grayscale difference values between adjacent pixels at three positions in the sample image. Since a grayscale value is expressed in 256 levels from 0 to 255, a grayscale difference value takes one of 511 levels from −255 to +255. Depending on the position, the distribution of the grayscale difference values is biased, and this bias represents the characteristics of the face. For example, since the grayscale value of the eyes is lower than that of their surroundings, the histogram is biased toward values larger or smaller than 0 when the grayscale difference value is obtained near the eyes, at P1 or P2. On the other hand, near the nose, at P3, where there is almost no difference from the surrounding grayscale values, the grayscale difference value is most frequently zero, and the distribution is centered around zero.
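A histogram of this kind can be computed with a short sketch. It assumes the difference is taken as the right-hand neighbor minus the pixel itself, which is an illustrative choice; the function name and the three toy images are hypothetical and are not the data behind FIG. 6.

```python
from collections import Counter

def difference_histogram(images, y, x):
    """Histogram of image[y][x + 1] - image[y][x] over a group of grayscale images."""
    return Counter(img[y][x + 1] - img[y][x] for img in images)

# Three toy 1x3 grayscale images with values in 0..255.
images = [
    [[10, 200, 200]],
    [[20, 210, 205]],
    [[15, 205, 205]],
]

edge_hist = difference_histogram(images, 0, 0)  # dark-to-bright transition
flat_hist = difference_histogram(images, 0, 1)  # nearly uniform region

print(edge_hist)
print(flat_hist)
```

The first position gives a histogram concentrated far from zero, like the positions near the eyes, while the second is concentrated around zero, like the position near the nose.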
[0055]
In a target image that is not a face, the distribution of the grayscale difference values is expected to differ from that of a face. Depending on the position in the image, the distribution for the false sample images may be close to that of the sample images or may differ from it, and taking the ratio of the probability images yields a similarity that emphasizes the differences in distribution.
[0056]
(Third embodiment)
Next, a third embodiment of the present invention will be described with reference to FIGS. The third embodiment relates to an application to face detection. In the present embodiment, it is assumed that the sample image and the target image have the same size. However, the size of a face in the input image does not necessarily match that of the sample image. The series of processing procedures for face detection shown in FIG. 7 is configured to cope with such a size mismatch.
[0057]
First, in step S601, a sample probability image is created. The procedure for creating the sample probability image is the same as that shown in the first embodiment. A false sample probability image, based on images that resemble a face but are not faces, may be created at the same time (see the second embodiment). In step S602, a plurality of images are created by enlarging and reducing the input image at various scales, and these images are stored. If the scale is varied in fine steps, a group of input images containing faces of slightly different sizes is obtained. As a result, one of the input images contains a face of the same size as the sample image.
[0058]
Next, as shown in FIG. 8, a scanning window 705 is placed in each of the input images 702 to 704. The size of the scanning window 705 is the same as that of the sample images used to create the sample probability image 701. The image inside the window 705 is cut out while the scanning window 705 is shifted little by little from the edge of the input image (step S603). Similarity determination is performed using the image inside the window 705 as the target image (step S604). The method for determining similarity is as described above. The similarity determination may also be based on the similarity that uses the false sample probability image. It is determined whether or not all areas in the input image have been scanned (step S605), and if scanning has been completed, steps S603 to S605 are repeated for the other stored input images having different sizes. Finally, in step S607, information on the areas determined to be “similar to a face” as a result of the similarity determination is output.
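The scanning procedure can be sketched as a loop over scales and window positions. This is an illustration only: nearest-neighbor resizing is used as a stand-in for whatever scaling method is actually employed, and `is_match` is a hypothetical callback standing in for the similarity determination of step S604. The toy "classifier" below simply compares the window with a fixed pattern.

```python
def resize_nearest(image, scale):
    """Resize a grayscale image (list of rows) by nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[image[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(new_w)] for y in range(new_h)]

def detect(image, window_h, window_w, scales, is_match, step=1):
    """Scan every scaled copy of the image with a fixed-size window."""
    hits = []
    for scale in scales:                      # resized copies (cf. step S602)
        scaled = resize_nearest(image, scale)
        h, w = len(scaled), len(scaled[0])
        for top in range(0, h - window_h + 1, step):
            for left in range(0, w - window_w + 1, step):
                # Cut out the window contents (cf. step S603).
                window = [row[left:left + window_w]
                          for row in scaled[top:top + window_h]]
                if is_match(window):          # similarity decision (cf. step S604)
                    hits.append((scale, top, left))
    return hits

# Toy input: the pattern [0, 128, 255] drawn at twice the window size.
image = [[0, 0, 128, 128, 255, 255],
         [0, 0, 128, 128, 255, 255]]
pattern = [[0, 128, 255]]

hits = detect(image, 1, 3, scales=[1.0, 0.5], is_match=lambda win: win == pattern)
print(hits)  # [(0.5, 0, 0)]
```

At the original scale the pattern is larger than the window, so no window matches; only the copy reduced to half size produces a hit.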
[0059]
As can be seen from FIG. 8, since the faces in the input images 702 and 703 are larger than the scanning window 705, only a part of the face is included in the window 705. These regions are determined to be “non-face” regions. On the other hand, since the face included in the input image 704 is about the same size as the scanning window, it is determined to be a “face”.
[0060]
The present invention is not limited to the above-described embodiment, and can be implemented with various modifications.
[0061]
[Effects of the Invention]
As described above, according to the present invention, the robustness of image matching in pattern recognition can be improved without increasing the processing time.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an image processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing a series of processing procedures executed in the image processing apparatus according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of a sample quantized image by qualitative ternary expression for each sample image.
FIG. 4 is a diagram showing an example of a sample probability image created from the sample image.
FIG. 5 is a flowchart of a series of procedures executed in the image processing apparatus according to a second embodiment of the present invention.
FIG. 6 is a diagram showing histograms of grayscale difference values between adjacent pixels at three positions in the sample image.
FIG. 7 is a flowchart showing a series of processing procedures for face detection according to a third embodiment of the present invention.
FIG. 8 is a diagram showing how face detection is performed by changing the size of the input image.
[Description of Symbols]
1 ... Image processing apparatus, 11 ... Sample image processing unit, 12 ... Target image processing unit, 13 ... Collation (matching) unit

Claims

A method for generating an image used for matching in pattern recognition from a plurality of sample images each consisting of a grayscale image, the method comprising:
calculating grayscale difference values for different combinations of two pixels in the plurality of sample images;
quantizing the grayscale difference values at a predetermined quantization level, and generating a plurality of sample quantized images having, as pixel values, any of the predetermined quantized values corresponding to the quantization level;
calculating, for each quantized value, a probability that the quantized value occurs in each pixel of the plurality of sample quantized images;
generating, for each quantized value, a sample probability image having the calculated occurrence probability value as a pixel value; and
using the sample probability image as an image used for matching in pattern recognition.

The method according to claim 1, wherein the grayscale difference value is calculated for different combinations of two adjacent pixels.

The method according to claim 1, wherein the grayscale difference value is quantized with three quantization levels.

The method according to claim 1, further comprising:
calculating a grayscale difference value for different combinations of two pixels in a plurality of false sample images each consisting of a grayscale image;
quantizing the grayscale difference value at a predetermined quantization level, and generating a plurality of false sample quantized images having, as pixel values, any of the predetermined quantized values corresponding to the quantization level;
calculating, for each quantized value, a probability that the quantized value occurs in each pixel of the plurality of false sample quantized images; and
generating, for each quantized value, a false sample probability image having the calculated occurrence probability value as a pixel value.

A pattern recognition method comprising:
calculating a grayscale difference value for different combinations of two pixels in a target image consisting of a grayscale image;
quantizing the grayscale difference value at a predetermined quantization level, and generating a target quantized image having, as a pixel value, any of the predetermined quantized values corresponding to the quantization level; and
calculating a similarity between the target quantized image and a sample probability image generated according to the method according to claim 1.

The method according to claim 5, wherein the step of calculating the similarity includes a step of acquiring an occurrence probability value of each pixel value in the target quantized image from the sample probability image and calculating the product of the occurrence probability values of all pixels as the similarity.

The method according to claim 5, wherein the step of calculating the similarity includes a step of obtaining an occurrence probability value of each pixel value in the target quantized image from the sample probability image and calculating an average value of the occurrence probability values of all pixels as the similarity.

The method according to claim 5, wherein a false sample probability image generated according to the method according to claim 4 is used in the similarity calculation in addition to the sample probability image.

An image processing apparatus for generating an image used for matching in pattern recognition from a plurality of sample images each consisting of a grayscale image, the apparatus comprising:
means for calculating a grayscale difference value for different combinations of two pixels in the plurality of sample images;
means for quantizing the grayscale difference value at a predetermined quantization level and generating a plurality of sample quantized images having, as pixel values, any of the predetermined quantized values corresponding to the quantization level;
means for calculating, for each quantized value, a probability that the quantized value occurs in each pixel of the plurality of sample quantized images; and
means for generating, for each quantized value, a sample probability image having the calculated occurrence probability value as a pixel value.

A pattern recognition apparatus comprising:
means for calculating a grayscale difference value for different combinations of two pixels in a target image consisting of a grayscale image;
means for quantizing the grayscale difference value at a predetermined quantization level and generating a target quantized image having, as a pixel value, any of the predetermined quantized values corresponding to the quantization level; and
means for calculating a similarity between the target quantized image and a sample probability image generated by the image processing apparatus according to claim 9.

An image processing program for generating an image used for matching in pattern recognition from a plurality of sample images each consisting of a grayscale image, the program causing a computer to execute:
a procedure for calculating a grayscale difference value for different combinations of two pixels in the plurality of sample images;
a procedure for quantizing the grayscale difference value at a predetermined quantization level and generating a plurality of sample quantized images having, as pixel values, any of the predetermined quantized values corresponding to the quantization level;
a procedure for calculating, for each quantized value, a probability that the quantized value occurs in each pixel of the plurality of sample quantized images; and
a procedure for generating, for each quantized value, a sample probability image having the calculated occurrence probability value as a pixel value.

A pattern recognition program that causes a computer to execute:
a procedure for calculating a grayscale difference value for different combinations of two pixels in a target image consisting of a grayscale image;
a procedure for quantizing the grayscale difference value at a predetermined quantization level and generating a target quantized image having, as a pixel value, any of the predetermined quantized values corresponding to the quantization level; and
a procedure for calculating a similarity between the target quantized image and a sample probability image generated by the program according to claim 11.