JP2004152087A

JP2004152087A - Method and apparatus for extracting feature vector of image

Info

Publication number: JP2004152087A
Application number: JP2002317610A
Authority: JP
Inventors: Sadataka Akahori; 貞登赤堀
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2002-10-31
Filing date: 2002-10-31
Publication date: 2004-05-27

Abstract

PROBLEM TO BE SOLVED: To extract feature vectors reflecting variance of distribution shapes in component signal values of images, while being low in dimension for effectively performing comparison, sorting, semantic decision or the like of two-dimensional images.
SOLUTION: In respect to at least one kind of component signal value, an accumulated histogram of the component signal value assigned to a plurality of pixels constituting the two-dimensional images is created, and the values of n quantile points of the accumulated histogram are extracted as the feature vectors. The values of n are not smaller than 3, and natural numbers equal to or smaller than the dimensional number of the accumulated histogram.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、２次元画像の特徴を表す特徴量を抽出する方法および装置に関し、特に、有効に低次元化された特徴量を抽出する方法および装置に関する。
【０００２】
【従来の技術】
２次元のデジタル写真画像や動画の１フレームの比較や分類を行うために、それらの画像またはそれらをブロックに分割したブロック画像から、当該画像の特徴を示す特徴量を抽出することが従来から行われている。また、このような特徴量の抽出は、２次元画像に含まれている撮影対象が何であるか、すなわち２次元画像の意味を判定するためにも用いることができる。意味の判定を行うことにより、特定の撮影対象に相当する画像領域のみに画像処理を施して高画質化を図ること等が可能となる。たとえば、色に関する特徴量に基づいて人物の肌に相当する画像領域を特定し、その画像領域のみに対し雑音を取り除く処理を施し、美しい肌色に仕上げる技術等が提案されている（たとえば、特許文献１参照）。
【０００３】
かかる２次元画像の特徴量として有用なものとして、ヒストグラムを用いたものが知られている。たとえば、ＲＧＢ表色系の各成分について、０から２５５の各濃度値を示す画素数を計数してヒストグラムを作成し、それらの３×２５６個の計数値をそのまま特徴量とする手法や、ＲＧＢ表色系の色空間を４×４×４個等の領域に分割して、各領域に属する画素数を計数したヒストグラムを作成し、各計数値を特徴量とする手法がある（たとえば、特許文献２参照）。
【０００４】
また、上記のようにヒストグラムの計数値をそのまま２次元画像の特徴量として用いるのではなく、色相、彩度および明度について作成したヒストグラムの各々について、その代表値として平均値と標準偏差を求めて、特徴量として用いる方法も知られている（たとえば、特許文献３参照）。この方法では、ヒストグラムの計数値をそのまま特徴量として用いる場合に比べ、抽出される特徴量の数、すなわち特徴量の次元を小さくすることができる。
【０００５】
一方、画像処理の分野ではないが、特に正規分布を示さない頻度分布に関し、平均値や標準偏差に代わるより適当な分布の代表値としてｎ分位点の値を用いることが、計測器の異常検出等の分野において提案されている（たとえば、特許文献４参照）。ここで、「ｎ分位点の値」とは、分布を調べる対象であるＮ個の値を小さい順に並べた場合に、小さい方からＮ／ｎ番目、２Ｎ／ｎ番目・・・Ｎ（ｎ−１）／ｎ番目となる各値を指す。
【０００６】
【特許文献１】
特公平５−６２８７９号公報
【０００７】
【特許文献２】
特開２０００−３５３１７３号公報
【０００８】
【特許文献３】
特開平７−２９００７号公報
【０００９】
【特許文献４】
特開平８−１６６８２０号公報
【００１０】
【発明が解決しようとする課題】
上記に説明したヒストグラムの計数値をそのまま２次元画像の特徴量として使用する方法は有効ではあるが、抽出される特徴量の数、すなわち特徴量の次元が一般に大きくなり、特徴量に基づいて画像の比較、分類、意味判定等を行う際の計算量が膨大になるという欠点がある。かかる問題は、精度を向上させるためにヒストグラムの刻み幅を細かくすればするほど深刻になる。一方、特徴量の次元を低く抑えるためにヒストグラムの刻み幅を粗くすれば、特徴量の数値精度が低くなってしまう。
【００１１】
ヒストグラムの平均値と標準偏差を特徴量として使用する方法によれば、特徴量の次元は小さく抑えることができる。しかしながら、２次元画像をなす各画素が担う輝度成分、色成分、エッジ成分等の成分信号値の分布は、様々な分布形状を示すものであり、特に複数の撮影対象や一様でないテクスチャーを含む画像については、分布形状の複雑さは高くなる。したがって、ヒストグラムの代表値を特徴量として使用する場合には、その代表値は分布形状の相違を適切に反映するものであることが好ましい。この点で、平均値および標準偏差は、２次元画像の特徴量として用いる代表値としては必ずしも適当でなく、特徴量に基づく画像の比較、分類、意味判定等の精度を低下させてしまう可能性が高い。
【００１２】
本発明は、かかる事情に鑑み、数値精度を保ちながら有効に低次元化され、かつ成分信号値の分布形状の相違を適切に反映した特徴量を、２次元画像から抽出することを目的とするものである。
【００１３】
【課題を解決するための手段】
すなわち、本発明に係る画像の特徴量を抽出する方法は、２次元画像から、該２次元画像の特徴を表す特徴量を抽出する方法であって、少なくとも１種の成分信号値に関し、その２次元画像をなす複数の画素に割り当てられた該成分信号値の累積ヒストグラムを作成する工程と、該累積ヒストグラムのｎ分位点の値を、特徴量として抽出する工程を含み、上記のｎが、３以上、上記の累積ヒストグラムの次元数未満の、自然数であることを特徴とする方法である。
【００１４】
また、本発明に係る画像の特徴量を抽出する装置は、２次元画像から、該２次元画像の特徴を表す特徴量を抽出する装置であって、少なくとも１種の成分信号値に関し、その２次元画像をなす複数の画素に割り当てられた該成分信号値の累積ヒストグラムを作成する手段と、該累積ヒストグラムのｎ分位点の値を、特徴量として抽出する手段を備え、上記のｎが、３以上、上記の累積ヒストグラムの次元数未満の、自然数であることを特徴とする装置である。
【００１５】
ここで、「特徴量」とは、ある２次元画像の特徴を示すパラメータとなる量の総称であって、たとえば色の特徴、輝度の特徴、奥行情報、該画像に含まれるエッジの特徴等を示す特徴量が含まれ得る。
【００１６】
また、「成分信号値」とは、２次元画像の各画素に割り当てられた当該画像の１成分の信号値であり、たとえば輝度成分の信号値や色成分の信号値が含まれる。さらに、輝度成分の分布等から導出されたエッジ画像等、何らかの処理を施した画像の各画素の信号値や、それらの信号値の絶対値を取ったものや規格化したもの等も、「成分信号値」に含まれるものとする。
【００１７】
また、「ｎ分位点」とは、分布の代表点の１種であって、分布を調べる対象であるＮ個の値を小さい順に並べた場合に、小さい方からＮ／ｎ番目、２Ｎ／ｎ番目・・・Ｎ（ｎ−１）／ｎ番目となる各値が、「ｎ分位点の値」となる。言い換えれば、これらのＮ個の値の累積ヒストグラムを作成して、縦軸をｎ等分に分割した点に対応する横軸の値を特定することにより、「ｎ分位点の値」を求めることができる。たとえば、四分位点であれば、規格化された累積ヒストグラムの縦軸の値０．２５、０．５０および０．７５に対応する横軸の値が、四分位点の値になる。
【００１８】
ヒストグラムや累積ヒストグラムの「次元数」とは、ヒストグラムや累積ヒストグラムの作成時において想定されている柱の本数を指すものとする。
【００１９】
また、本発明は、上記の特徴量に基づいて、さらに２次元画像の意味を判定するものであってもよい。ここで、２次元画像の「意味を判定する」とは、その画像が何を撮影した画像であるかを判定することを指す。この意味の判定は、２種以上の成分信号値に関して上記の特徴量を抽出し、該２種以上の成分信号値に関する特徴量の組合せに基づいて行うことが好ましい。これらの２種以上の成分信号値は、輝度成分の信号値、色成分の信号値およびエッジ成分の信号値のうちの少なくとも２つが含まれるように選択することができる。これらの場合において、上記の意味の判定は、自己組織化マップを用いて行ってもよい。
【００２０】
なお、上記のｎの値は、２０以下であることが好ましい。
【００２１】
また、特徴量を抽出する対象である上記の２次元画像は、全体画像であってもよいし、全体画像を分割して得られたブロック画像であってもよい。
【００２２】
ここで、「全体画像」とは、撮影したデジタル写真画像や、動画の１フレームの、１枚分全体に相当する２次元画像を指すものとする。一方、「ブロック画像」とは、全体画像をいくつかの領域（ブロック）に分割した各画像片を指し、たとえば、１０２４×１２８０画素の全体画像を３２×３２画素の大きさに分割したそれぞれの画像片等がこれに相当する。なお、上記の「意味を判定する」ことには、たとえば、ブロック画像について、「空」、「建物」、「草原」等のいずれの対象が撮影されたブロックであるかを判定することや、全体画像について、「人物写真」、「建物の写真」、「海の風景写真」等のいずれであるかを判定することが含まれる。
【００２３】
【発明の効果】
本発明の画像の特徴量を抽出する方法および装置は、成分信号値の累積ヒストグラムのｎ分位点の値を特徴量として抽出するものであり、ｎは累積ヒストグラムの次元数未満の自然数であるので、特徴量の次元を有効に圧縮し、画像の比較、分類、意味判定等を行う際の計算量を抑えることができる。加えて、ｎが３以上の自然数であるので、ヒストグラムの平均値や標準偏差を特徴量として用いる場合と異なり、成分信号値の分布形状の相違を特徴量に反映させることができ、信頼性の高い比較、分類、意味判定等を行うことができる。さらに、基となる累積ヒストグラムの刻み幅を細かくしておけば、ｎ分位点の値を特徴量とすることにより特徴量の次元を圧縮しても、特徴量の数値精度は高く保つことができる。
【００２４】
特に、当該特徴量を２次元画像の意味の判定に用いる場合には、複数の撮影対象や一様でないテクスチャーを含み、成分信号値の分布形状が複雑な画像についても、適切な意味の判定が期待できる。さらに、１種のみではなく２種以上の成分信号値、たとえば輝度成分の信号値、色成分の信号値およびエッジ成分の信号値のうちの少なくとも２つに関する特徴量の組合せに基づいて画像の意味を判定することとすれば、意味判定の信頼性を格段に高めることができる。
【００２５】
また、特に自己組織化マップを用いた意味判定においては、抽出した各特徴量を成分とする特徴ベクトルを、自己組織化マップ上の多数の参照特徴ベクトルの全てと比較する計算を行わなくてはならないので、上記の特徴量の次元圧縮の効果は極めて重要である。
【００２６】
【発明の実施の形態】
以下、図面により、本発明の例示的な実施形態を詳細に説明する。
【００２７】
図１は、本発明による画像の特徴量を抽出する方法または装置を利用した、２次元の全体画像に含まれる各画像領域の意味特定処理の手順を示したフローチャートである。この処理は、全体画像中の個々の画像領域、すなわち「空」、「建物」、「草原」等の撮影対象のいずれかに対応すると考えられる個々の有意な領域について、その意味を特定するものであり、その後、意味に基づく画像分類や、各意味に対応する画像領域ごとに区別された条件による画像処理を行うために有用な処理である。まず、ステップ１０において処理対象である全体画像を表す画像データが読み込まれ、ステップ１２において適当な画像領域が特定され、ステップ１４において全体画像がブロック画像に分割され、ステップ１６において各ブロック画像から特徴量が抽出されて各ブロック画像の意味が判定され、それに基づいて各画像領域の意味が特定される。これらの各ステップのうち、本発明は特にステップ１６において使用される特徴量の抽出方法および装置に関するものであるが、他のステップについても、以下、順を追って説明していく。
【００２８】
ステップ１２における画像領域の特定手法の例については、図２を用いて説明する。
【００２９】
図２の（ａ）は処理対象である原画像としての全体画像を示す。まず、この原画像を構成する各画素に関し、隣接する画素の色の特徴を比較して、類似画素を統合することとする。ここで、色の特徴を比較して類似画素を統合するとは、たとえば、ＲＧＢ表色系で表された原画像の各成分信号値、すなわちＲ、ＧおよびＢの各成分の濃度値を、隣接画素間でそれぞれ比較して、いずれの成分信号値の差もが所定の閾値を超える場合に、それらの画素を統合する等の処理を行うことである。ＲＧＢ表色系に代えて、ＹＣＣ表色系で表された各成分信号値を比較してもよい。この比較および統合は、上記の閾値等の所定の基準によりそれ以上の統合が起こらなくなるまで順次繰り返され、類似の色の特徴を有する画素からなる区域が拡大していく。この類似画素の統合が完了した後の状態が、図２の（ｂ）の状態であるとする。
【００３０】
ここに、図２の（ｂ）に示した画像を構成する各区域のうち、周囲長が所定の長さより短い区域を「微小区域」と呼び、周囲長が該所定の長さ以上である区域を「非微小区域」と呼ぶこととする。図２の（ｂ）においては、区域２０および２２等は微小区域、区域２４、２６および２８等は非微小区域である。
【００３１】
次に、図２の（ｂ）の画像を構成する各区域を隣接する区域と比較して、統合可能なものをさらに統合するのであるが、この区域の統合の基準は、微小区域と非微小区域で異なる。微小区域については、１の非微小区域に完全に包含されている微小区域（たとえば非微小区域２６に完全に包含されている微小区域２０）は、その１の非微小区域に統合されるものとする。また、２以上の非微小区域と境界を接する微小区域は、接する境界の長さが長い方の非微小区域に統合されるものとする。この基準によれば、微小区域統合後の状態は、図２の（ｃ）のようになる。
【００３２】
非微小区域については、当該非微小区域をなす画素の平均の色の特徴を、隣接する各非微小区域をなす画素の平均の色の特徴と比較し、類似の度合いが閾値等による所定の基準を超える隣接非微小区域がある場合は、統合が行われる。たとえば、図２の（ｃ）における非微小区域２４の平均の色の特徴について、非微小区域２６の平均の色の特徴との類似の度合いは上記の所定の基準を超えるが、非微小区域２８の平均の色の特徴との類似の度合いは上記の所定の基準以下である場合は、当該非微小区域２４は、非微小区域２６と統合され、非微小区域２８とは統合されない。かかる所定の基準による非微小区域の統合の最終的な結果は、たとえば図２の（ｄ）のようになる。この最終的な状態の画像を構成する各領域が、「画像領域」として特定される。
【００３３】
以上、図１のステップ１２における画像領域への分割手法の例を図２を用いて説明したが、このステップ１２における画像領域への分割が、他のいかなる周知の手法によるものでもよいことは言うまでもない。
【００３４】
図１に戻って、ステップ１４では、処理対象である全体画像がブロック画像に分割される。本実施形態では、全体画像は１０２４×１２８０画素のデジタル写真画像であるとし、ブロック画像は各々３２×３２画素の画像であるとする。分割された全体画像を、図３に示す。なお、図では、説明の便宜のため、実際よりも粗い分割で示してある。
【００３５】
続いて、図１のステップ１６において、分割された各ブロック画像から特徴量が抽出され、各ブロック画像の意味が判定され、それに基づいて各画像領域の意味が特定される。ステップ１６において行われる処理の詳細な工程を、図４のフローチャートに示す。
【００３６】
まず、図４のステップ４０において、図２の（ｄ）のように特定された複数の画像領域のうちの１の画像領域に包含されるブロック画像が特定される。ここで、１の画像領域に包含されるブロック画像とは、その画像領域に完全に包含されているブロック画像を言い、画像領域間の境界にまたがるブロック画像は含まないものとする。
【００３７】
次に、ステップ４２において、ステップ４０で特定されたブロック画像の１つについて、そのブロック画像に含まれる各画素が担う輝度成分の信号値の分布から、輝度に関する特徴量が抽出される。ここで輝度成分の信号値とは、本実施形態ではＹＣＣ表色系で表された当該ブロック画像のＹ成分の信号値を指すが、これに限られない。
【００３８】
このステップ４２において行われる処理を、図５のフローチャートにより詳細に示す。
【００３９】
まず、図５のステップ６０において、現在の特徴量抽出対象であるブロック画像に関し、各画素が担う輝度成分の信号値のヒストグラムを作成する。このヒストグラムは、たとえば現在のブロック画像が、図３中のブロック画像３０のような輝度が略一様の１つの撮影対象（ここでは空）のみを含むものであれば、図６の（ａ）の左側に示すような形状になる。図３中のブロック画像３２のような、それぞれ輝度が略一様の複数の撮影対象（ここでは建物の壁部分と窓部分）を含むブロック画像については、図６の（ｂ）の左側に示すような、複数のピークを有する形状になる。図３中のブロック画像３４のような、色が略一様であるが輝度にむらのある撮影対象（ここでは草原）を含むブロック画像については、図６の（ｃ）の左側に示すような形状になる。これらの例では、ヒストグラムの次元数は２５６である。
【００４０】
次に、図５のステップ６２において、ステップ６０で作成されたヒストグラムを基に、輝度成分の各信号値の累積計数値を示す累積ヒストグラムが作成され、規格化される。この累積ヒストグラムは、上記に例示した図３中のブロック画像３０、３２および３４に関しては、それぞれ図６の（ａ）、（ｂ）および（ｃ）の右側のような形状になる。このように、累積ヒストグラムの形状は、現在のブロック画像に含まれる撮影対象の数やテクスチャーを反映した、該ブロック画像に特有の形状となる。
【００４１】
続いて、図５のステップ６４において、上記の規格化された累積ヒストグラムから３つの四分位点の値が特定され、輝度に関する特徴量とされる。具体的には、図６に示すように、規格化された累積ヒストグラムの縦軸の値０．２５、０．５０および０．７５のそれぞれに対応する横軸の値が、四分位点の値として特定され、これらの３つの値が輝度に関する特徴量とされる。これらの四分位点の値や間隔は、累積ヒストグラムの形状の違い、すなわちもとのヒストグラムにおける輝度成分の信号値の分布形状の違いを反映したものとなっている。
【００４２】
上記のようにして輝度に関する３つの特徴量が抽出された後、図４に戻ってステップ４４において、同じ現在のブロック画像に関し、各画素が担う色成分の信号値から、さらに追加の特徴量が抽出される。本実施形態では、色成分の信号値として、ＹＣＣ表色系で表された当該ブロック画像の２つの色差成分（すなわち、Ｃｒ成分とＣｂ成分）の各信号値を用い、これら２つの色成分の信号値について、上記と同様の手法により、規格化された累積ヒストグラムを作成し、四分位点の値を特徴量として特定する。すなわち、２つの色差成分についてそれぞれ３つの四分位点の値が特徴量として特定されるので、合計で６つの色に関する特徴量が抽出されることになる。なお、色成分の信号値としては、本実施形態で使用するＣｒ成分とＣｂ成分の各信号値に限らず、たとえばＲＧＢ表色系における各成分の信号値等を用いてもよい。
【００４３】
次に、ステップ４６において、同じ現在のブロック画像に関し、エッジ成分の信号値から、さらに追加の特徴量が抽出される。本実施形態では、当該ブロック画像のＹＣＣ表色系におけるＹ成分の画像に対し、図７に示すようなエッジ検出用のフィルターを適用することにより求めた縦エッジ画像の絶対値成分の信号値、ならびに横エッジ画像の絶対値成分の信号値の２種類が、エッジ成分の信号値として使用する。ここでも、上記と同様の手法により、これら２種類のエッジ成分の信号値について、規格化された累積ヒストグラムを作成し、四分位点の値を特徴量として特定する。これにより、合計で６つのエッジの特徴に関する特徴量が抽出される。なお、エッジ成分の信号値は、上記の本実施形態で使用されるものに限られず、たとえば、絶対値成分ではなくエッジ画像のそのままの信号値を用いてもよいし、４方向のエッジ検出用フィルターを用いて導出した４種類の信号値等を用いてもよい。
【００４４】
以上のステップ４２から４６により、現在のブロック画像についての合計１５個の特徴量が抽出が終了すると、ステップ４８に進んで、該ブロック画像の意味の判定が行われる。本実施形態では、これら１５個の特徴量を成分とする「特徴ベクトル」を、「自己組織化マップ」上に写像する方法を用いて、ブロック画像の意味を判定する。
【００４５】
「自己組織化マップ」とは、図８に示すように、複数の参照特徴ベクトルに対応する点が２次元的に配されたマップであり、互いに類似する参照特徴ベクトルに対応する点は互いに近い位置に配置されている。この自己組織化マップには、やはり図８に示すように、対応する大きさの意味のマップが付随している。この意味のマップは、自己組織化マップ上の各参照特徴ベクトルに対応する画像が「空」の画像である確率の分布マップ、「建物」の画像である確率の分布マップ等の、複数の確率分布マップが重ね合わされたものである。この自己組織化マップおよび意味のマップは、予め、「空」や「建物」の画像であることが分かっている多数の画像の特徴ベクトルを、コンピュータに学習させることにより作成されている。
【００４６】
この学習過程の例を概略的に説明すると、学習前の状態においては、自己組織化マップ上には、様々な参照特徴ベクトルが、ランダムに分布している。また、各確率分布マップの各点の初期値は、０とされている。ここに、学習対象として、まず「空」であることが分かっている１枚の画像の特徴ベクトルが入力されると、自己組織化マップ上において、この入力された特徴ベクトルに最も類似する参照特徴ベクトルが特定される。この特定は、たとえば、入力された特徴ベクトルとのユークリッド距離が最も小さい参照特徴ベクトルを探索する等により行われる。すると、自己組織化マップ上においてその特定された参照特徴ベクトルの近傍、たとえば７×７の範囲にある参照特徴ベクトルが、上記の特定された参照特徴ベクトルに近づくように（すなわち、特定された参照特徴ベクトルとの類似度が高まるように）修正される。一方、「空」の画像である確率の分布マップ上では、上記の特定された参照特徴ベクトルに対応する点およびその７×７の範囲の近傍の点に、たとえば「１」の頻度値が加算される。次に、「建物」であることが分かっている１枚の画像の特徴ベクトルが入力されると、上記と同様に、自己組織化マップ上において、最も類似する参照特徴ベクトルの特定、および近傍の参照特徴ベクトルの修正が行われる。一方、「建物」の画像である確率の分布マップ上では、特定された参照特徴ベクトルに対応する点およびその近傍の点に、「１」の頻度値が加算される。このような学習を繰り返すと、自己組織化マップ上では、類似の特徴を示す参照特徴ベクトルが、徐々に互いに近い位置に集まってくる。一方、それぞれの確率分布マップ上でも、徐々に島状の頻度の分布が形成されていく。学習が進んで類似の参照特徴ベクトルが集合してくるにしたがって、当初７×７であった参照特徴ベクトルの修正を行う近傍の大きさは、徐々に小さくされていく。学習終了後、それぞれの確率分布マップは規格化されて重ね合わされ、図８に示すような意味のマップが形成される。
【００４７】
図４のステップ４８では、現在のブロック画像から抽出された特徴ベクトルは、上記のような学習により予め導出された自己組織化マップ上の、最も類似する参照特徴ベクトルに対応する点に写像される。ここでも、類似度の評価は、両ベクトル間のユークリッド距離等を指標として行われる。そして、自己組織化マップの写像先の点に対応する、意味のマップ上の点が参照され、「空」、「建物」等の「意味」のうち、その点において最も高い確率を示している意味が、現在のブロック画像の意味とされる。
【００４８】
ここで、本実施形態では四分位点の値を特徴量としているので、５種類の異なる成分信号値（すなわち、輝度成分、２つの色成分、２つのエッジ成分の各信号値）に基づきながら、特徴ベクトルの次元が１５次元に抑えられており、しかもなお成分信号値の分布形状の相違が特徴ベクトルに反映されている。さらに、基となる累積ヒストグラムの刻み幅を細かくすることにより、特徴量の数値精度も高く保つことができる。したがって、抽出された特徴ベクトルと参照特徴ベクトルの比較等に要する計算量を抑えながら、信頼性の高い意味判定を行うことができる。
【００４９】
続いて、図４のステップ５０において、現在の画像領域に含まれるブロック画像がまだ残っているかどうかが確認され、現在の画像領域に含まれる全てのブロック画像の意味が判定されるまで、ステップ４２から５０が繰り返される。
【００５０】
現在の画像領域に含まれる全てのブロック画像の意味の判定が終了すると、ステップ５２において、各ブロック画像の判定された意味のうち最多のものが、現在の画像領域の意味として特定される。たとえば、図２の（ｄ）の画像領域２８に含まれるブロック画像の中には、「水」等の意味に判定されるものも混在し得るが、大半は「空」の意味に判定されるので、画像領域２８は「空」の領域として特定される。
【００５１】
続いて、ステップ５４において、まだ意味を特定していない画像領域が残っているかどうかが確認され、全ての画像領域の意味が特定されるまで、図４に示した工程が繰り返される。
【００５２】
なお、本実施形態では、四分位点の値を特徴量としているが、本発明はｎ分位点の値を特徴量として抽出する点を特徴とするものであり、ｎの値は４に限られず、３以上、累積ヒストグラムの次元数未満の自然数であれば、いかなる値でもよい。抽出した特徴量を画像の意味の判定に用いる場合には、意味判定のための計算量が膨大になることを防止する観点から、ｎの値は３以上２０以下程度が好ましい。
【００５３】
また、上記の実施形態では、輝度成分、色成分およびエッジ成分の各信号値から抽出した特徴量を併用して画像の意味を判定したが、これに限られず、輝度成分、色成分およびエッジ成分の一部のみを使用してもよいし、これらに代えてあるいは加えて、奥行情報等に関する他の成分信号値を使用してもよい。
【００５４】
さらに、上記の実施形態では、画像の意味の判定方法として自己組織化マップを用いた方法を利用したが、これに限らず他の方法を利用してもよい。たとえば、意味が分かっている多数の画像を予め特徴量空間に写像して学習し、各意味の画像の特徴量空間における重心座標とばらつき（標準偏差）を求めておいて、意味判定の対象である画像から特徴量を抽出して特徴量空間に写像し、学習で求めた各意味の画像の重心とのマハラノビス距離に基づいて意味を判定する方法等を採用してもよい。また、主成分分析により、さらに低次元化された特徴ベクトルを用いてもよい。
【００５５】
また、上記では、本発明の１つの実施形態として、２次元の全体画像をブロック画像に分割し、それぞれのブロック画像の意味を判定することにより、該全体画像に含まれる各画像領域の意味を特定する処理について説明したが、本発明の別の実施形態として、全体画像の意味判定処理への適用も考えられる。この場合、たとえば、「人物写真」、「建物の写真」、「海の風景写真」等であることが分かっている全体画像を予め学習して自己組織化マップと意味のマップを作成しておき、意味判定対象である全体画像から、累積ヒストグラムの作成およびｎ分位点の特定により抽出した複数の特徴量を成分とする特徴ベクトルを、自己組織化マップ上に写像する。ここでも、意味判定方法は自己組織化マップを用いたものに限られないことは言うまでもない。
【００５６】
なお、本発明による画像の特徴量を抽出する方法および装置は、上記に説明したような画像の意味判定処理だけでなく、画像の比較、分類等、他の様々な画像処理にも応用できるものである。
【００５７】
以上、本発明の実施形態について詳細に述べたが、これらの実施形態は例示的なものに過ぎず、本発明の技術的範囲は、本明細書中の特許請求の範囲のみによって定められるべきものであることは言うまでもない。
【図面の簡単な説明】
【図１】本発明の１つの実施形態である、２次元の全体画像に含まれる各画像領域の意味特定処理の手順を示したフローチャート
【図２】図１の意味特定処理における画像領域の特定手法の例を示した工程図
【図３】ブロック画像に分割された全体画像を示した図
【図４】図１の意味特定処理における意味特定手法の詳細な工程を示したフローチャート
【図５】図４の輝度に関する特徴量の抽出工程をさらに詳細に示したフローチャート
【図６】輝度成分の信号値のヒストグラムおよび累積ヒストグラムの例を示した図
【図７】エッジ成分の信号値の導出に用いられるエッジ検出用フィルターの例を示した図
【図８】自己組織化マップおよび対応する意味のマップの例を示した概念図
[0001]
[Technical field of the invention]
The present invention relates to a method and apparatus for extracting feature quantities representing features of a two-dimensional image, and more particularly to a method and apparatus for extracting feature quantities that have been effectively reduced in dimension.
[0002]
[Prior art]
To compare or classify two-dimensional digital photographic images or single frames of moving images, it has been conventional practice to extract feature amounts indicating the features of an image from the image itself or from block images obtained by dividing it into blocks. Such feature amount extraction can also be used to determine what photographed subject a two-dimensional image contains, that is, the meaning of the two-dimensional image. Determining the meaning makes it possible, for example, to improve image quality by applying image processing only to the image region corresponding to a specific photographed subject. For example, a technique has been proposed in which an image region corresponding to a person's skin is identified based on color-related feature amounts, and noise removal is applied only to that region to produce a beautiful skin color (see, for example, Patent Document 1).
[0003]
Histogram-based feature amounts are known to be useful for such two-dimensional images. For example, there is a technique in which, for each component of the RGB color system, a histogram is created by counting the number of pixels at each density value from 0 to 255, and the 3 × 256 count values are used directly as feature amounts. There is also a technique in which the color space of the RGB color system is divided into, for example, 4 × 4 × 4 regions, a histogram is created by counting the number of pixels belonging to each region, and each count value is used as a feature amount (see, for example, Patent Document 2).
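The second conventional technique can be sketched as follows. This is only an illustration, assuming 8-bit RGB pixels given as tuples; the function name `rgb_cube_histogram` and the sample pixels are hypothetical and do not come from the cited document.

```python
def rgb_cube_histogram(pixels, bins_per_channel=4):
    """Count the pixels falling in each cell of a coarsely divided RGB cube."""
    step = 256 // bins_per_channel  # width of one cell along each channel
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Flatten the (r, g, b) cell coordinates into a single index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        counts[idx] += 1
    return counts


# Hypothetical sample pixels (8-bit RGB).
pixels = [(0, 0, 0), (10, 20, 30), (255, 255, 255), (200, 10, 130)]
features = rgb_cube_histogram(pixels)  # 4 x 4 x 4 = 64 count values
```

Even this coarse division yields 64 feature amounts per image, which is the dimensionality problem the description goes on to discuss.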
[0004]
Also, instead of using the count value of the histogram as it is as the feature quantity of the two-dimensional image as described above, an average value and a standard deviation are obtained as representative values for each of the histograms created for hue, saturation, and brightness. A method of using it as a feature quantity is also known (see, for example, Patent Document 3). In this method, the number of extracted feature quantities, that is, the dimension of the feature quantity can be reduced as compared with the case where the count value of the histogram is used as it is as the feature quantity.
[0005]
On the other hand, although not in the field of image processing, it has been proposed in fields such as anomaly detection for measuring instruments to use n-quantile values, instead of the mean or standard deviation, as more appropriate representative values of a distribution, particularly for frequency distributions that are not normal (see, for example, Patent Document 4). Here, the “n-quantile values” are the values that, when the N values whose distribution is being examined are arranged in ascending order, are the (N/n)-th, (2N/n)-th, ..., (N(n−1)/n)-th from the smallest.
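The rank-based definition above can be written down directly. The sketch below is a minimal illustration; the text does not say how to treat ranks that are not whole numbers, so rounding them down (with a minimum rank of 1) is an assumption, and the function name `n_quantile_values` is hypothetical.

```python
def n_quantile_values(values, n):
    """Return the (k*N/n)-th smallest values for k = 1 .. n-1 (1-based ranks)."""
    ordered = sorted(values)
    total = len(ordered)
    result = []
    for k in range(1, n):
        # Assumption: fractional ranks are rounded down, but never below 1.
        rank = max(1, (k * total) // n)
        result.append(ordered[rank - 1])
    return result


samples = [7, 1, 5, 3, 8, 2, 6, 4]
quartiles = n_quantile_values(samples, 4)  # ranks 2, 4 and 6 of the sorted data
```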
[0006]
[Patent Document 1]
Japanese Patent Publication No. 5-62879
[0007]
[Patent Document 2]
JP 2000-353173 A
[0008]
[Patent Document 3]
JP-A-7-29007
[0009]
[Patent Document 4]
JP-A-8-166820
[0010]
[Problems to be solved by the invention]
The method described above, in which the histogram count values are used directly as the feature amounts of the two-dimensional image, is effective, but the number of extracted feature amounts, that is, the dimension of the feature amounts, generally becomes large, with the drawback that the amount of calculation required for image comparison, classification, semantic determination, and the like based on the feature amounts becomes enormous. This problem becomes more serious as the bin width of the histogram is made finer to improve accuracy. Conversely, if the bin width is made coarser to keep the dimension of the feature amounts low, the numerical accuracy of the feature amounts decreases.
[0011]
With the method that uses the mean and standard deviation of the histogram as feature amounts, the dimension of the feature amounts can be kept small. However, the distributions of component signal values carried by the pixels of a two-dimensional image, such as the luminance component, color components, and edge components, take various shapes, and the distribution shape becomes particularly complex for images containing a plurality of photographed subjects or non-uniform textures. Therefore, when representative values of a histogram are used as feature amounts, it is preferable that they appropriately reflect differences in distribution shape. In this respect, the mean and standard deviation are not necessarily appropriate as representative values to be used as feature amounts of a two-dimensional image, and are highly likely to reduce the accuracy of image comparison, classification, semantic determination, and the like based on the feature amounts.
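A small constructed example may make this limitation concrete. The two sample sets below are hypothetical values chosen for illustration, not data from the patent: they share the same mean and (population) standard deviation, yet their quartile values differ because one distribution has two peaks and the other has one. The helper `quartile_values` uses the rank-based definition and assumes the sample count is a multiple of 4.

```python
from statistics import mean, pstdev


def quartile_values(values):
    """Values at ranks N/4, 2N/4 and 3N/4 (1-based) of the sorted data."""
    ordered = sorted(values)
    total = len(ordered)
    return [ordered[(k * total) // 4 - 1] for k in (1, 2, 3)]


# Hypothetical luminance samples: two equal peaks versus one peak with two outliers.
two_peaks = [80] * 4 + [120] * 4
one_peak = [60] + [100] * 6 + [140]

stats_two = (mean(two_peaks), pstdev(two_peaks))
stats_one = (mean(one_peak), pstdev(one_peak))
quart_two = quartile_values(two_peaks)
quart_one = quartile_values(one_peak)
```

Comparing `stats_two` with `stats_one` shows no difference between the two sets, whereas `quart_two` and `quart_one` separate them.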
[0012]
In view of these circumstances, an object of the present invention is to extract, from a two-dimensional image, feature amounts that are effectively reduced in dimension while maintaining numerical accuracy and that appropriately reflect differences in the distribution shape of the component signal values.
[0013]
[Means for Solving the Problems]
That is, the method for extracting feature amounts of an image according to the present invention is a method for extracting, from a two-dimensional image, feature amounts representing the features of the two-dimensional image, the method comprising, for at least one type of component signal value, a step of creating a cumulative histogram of the component signal values assigned to a plurality of pixels constituting the two-dimensional image, and a step of extracting the n-quantile values of the cumulative histogram as feature amounts, wherein n is a natural number of 3 or more and less than the number of dimensions of the cumulative histogram.
[0014]
Likewise, the apparatus for extracting feature amounts of an image according to the present invention is an apparatus for extracting, from a two-dimensional image, feature amounts representing the features of the two-dimensional image, the apparatus comprising, for at least one type of component signal value, means for creating a cumulative histogram of the component signal values assigned to a plurality of pixels constituting the two-dimensional image, and means for extracting the n-quantile values of the cumulative histogram as feature amounts, wherein n is a natural number of 3 or more and less than the number of dimensions of the cumulative histogram.
[0015]
Here, “feature amount” is a general term for quantities that serve as parameters indicating the features of a two-dimensional image, and may include, for example, feature amounts indicating color features, luminance features, depth information, features of edges contained in the image, and the like.
[0016]
The “component signal value” is the signal value of one component of the image assigned to each pixel of the two-dimensional image, and includes, for example, the signal value of a luminance component and the signal value of a color component. Furthermore, the signal value of each pixel of an image that has undergone some processing, such as an edge image derived from the distribution of the luminance component, as well as the absolute values or normalized values of such signal values, are also included in the “component signal value”.
[0017]
The “n-quantile” is one kind of representative point of a distribution. When the N values whose distribution is being examined are arranged in ascending order, the values that are (N/n)-th, (2N/n)-th, ..., (N(n−1)/n)-th from the smallest are the “n-quantile values”. In other words, the “n-quantile values” can be obtained by creating a cumulative histogram of these N values and identifying the horizontal-axis values corresponding to the points that divide the vertical axis into n equal parts. For example, in the case of quartiles, the horizontal-axis values corresponding to 0.25, 0.50, and 0.75 on the vertical axis of the normalized cumulative histogram are the quartile values.
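Reading the n-quantile values off a cumulative histogram might look like the sketch below. It assumes that the value reported for each level is the first bin whose normalized cumulative frequency reaches k/n (the text does not specify interpolation or tie handling), and it compares integers rather than fractions to avoid rounding effects. The function name and the sample data are hypothetical.

```python
def quantiles_from_histogram(hist, n):
    """Return the bin indices where the normalized cumulative histogram reaches k/n."""
    total = sum(hist)
    cumulative = []
    running = 0
    for count in hist:
        running += count
        cumulative.append(running)
    values = []
    for k in range(1, n):
        # cumulative[i] / total >= k / n, written with integers only.
        values.append(next(i for i, c in enumerate(cumulative) if c * n >= k * total))
    return values


# Hypothetical 256-bin histogram built from 20 sample values.
hist = [0] * 256
for v in [10] * 5 + [50] * 10 + [200] * 5:
    hist[v] += 1
quartiles = quantiles_from_histogram(hist, 4)
```

With a 256-bin histogram and n = 4, the 256 counts are reduced to three feature amounts, each still resolved to a single bin.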
[0018]
The “number of dimensions” of a histogram or cumulative histogram refers to the number of bins (bars) assumed when the histogram or cumulative histogram is created.
[0019]
Further, the present invention may additionally determine the meaning of the two-dimensional image based on the above feature amounts. Here, “determining the meaning” of a two-dimensional image means determining what the image depicts. This determination is preferably performed by extracting the above feature amounts for two or more types of component signal values and using the combination of the feature amounts for those types. The two or more types of component signal values can be selected so as to include at least two of a luminance component signal value, a color component signal value, and an edge component signal value. In these cases, the meaning determination may be performed using a self-organizing map.
[0020]
The value of n is preferably 20 or less.
[0021]
Further, the above-described two-dimensional image that is a target from which the feature amount is extracted may be an entire image or a block image obtained by dividing the entire image.
[0022]
Here, the “whole image” refers to a two-dimensional image corresponding to the whole of one captured digital photographic image or one frame of a moving image. The “block image”, on the other hand, refers to each image piece obtained by dividing the whole image into several regions (blocks); for example, each of the image pieces obtained by dividing a whole image of 1024 × 1280 pixels into pieces of 32 × 32 pixels corresponds to this. Note that “determining the meaning” includes, for example, determining for a block image which subject, such as “sky”, “building”, or “grass”, the block depicts, and determining for a whole image whether it is a “portrait photograph”, a “photograph of a building”, a “seascape photograph”, or the like.
[0023]
[Effects of the invention]
The method and apparatus for extracting image feature amounts of the present invention extract the n-quantile values of the cumulative histogram of component signal values as feature amounts, and n is a natural number less than the number of dimensions of the cumulative histogram. The dimension of the feature amounts is therefore effectively compressed, and the amount of calculation for image comparison, classification, semantic determination, and the like can be reduced. In addition, since n is a natural number of 3 or more, differences in the distribution shape of the component signal values can be reflected in the feature amounts, unlike the case where the mean or standard deviation of the histogram is used, so that highly reliable comparison, classification, semantic determination, and the like can be performed. Furthermore, if the bin width of the underlying cumulative histogram is made fine, the numerical accuracy of the feature amounts can be kept high even though their dimension is compressed by using the n-quantile values.
[0024]
In particular, when the feature amounts are used to determine the meaning of a two-dimensional image, appropriate meaning determination can be expected even for images that contain a plurality of photographed subjects or non-uniform textures and whose component signal values have complicated distribution shapes. Furthermore, if the meaning of the image is determined based on a combination of feature amounts for two or more types of component signal values rather than only one, for example at least two of the luminance component, color component, and edge component signal values, the reliability of the semantic determination can be markedly improved.
[0025]
In particular, in the semantic determination using the self-organizing map, it is necessary to perform a calculation for comparing the feature vector having each extracted feature quantity as a component with all of the many reference feature vectors on the self-organizing map. Therefore, the effect of dimensional compression of the above feature quantity is extremely important.
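The comparison against every reference feature vector can be sketched as a nearest-neighbor search. The Japanese original (paragraph 【００４６】) names Euclidean distance as one possible similarity measure, and that is assumed here; the function name, the three-dimensional vectors, and the tiny map are hypothetical illustration values.

```python
import math


def best_matching_unit(feature, reference_vectors):
    """Index of the reference vector closest to `feature` (Euclidean distance)."""
    return min(range(len(reference_vectors)),
               key=lambda i: math.dist(feature, reference_vectors[i]))


# Hypothetical map with only three reference vectors of dimension 3.
references = [[10, 20, 30], [100, 110, 120], [200, 210, 220]]
winner = best_matching_unit([95, 105, 130], references)
```

The work done by this search grows with the number of reference vectors multiplied by the vector dimension, which is why a low-dimensional feature vector matters for this kind of determination.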
[0026]
[Embodiments of the invention]
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the drawings.
[0027]
FIG. 1 is a flowchart showing the procedure for identifying the meaning of each image region contained in a two-dimensional whole image, using the method or apparatus for extracting image feature amounts according to the present invention. This process identifies the meaning of each image region in the whole image, that is, each significant region considered to correspond to one of the photographed subjects such as “sky”, “building”, or “grass”. It is useful for subsequent image classification based on meaning and for image processing under conditions that differ for each image region according to its meaning. First, image data representing the whole image to be processed is read in step 10, appropriate image regions are identified in step 12, the whole image is divided into block images in step 14, and in step 16 feature amounts are extracted from each block image, the meaning of each block image is determined, and the meaning of each image region is identified on that basis. Of these steps, the present invention relates in particular to the feature amount extraction method and apparatus used in step 16, but the other steps are also described below in order.
[0028]
An example of the image region specifying method in step 12 will be described with reference to FIG.
[0029]
FIG. 2A shows the whole image as the original image to be processed. First, for each pixel constituting the original image, the color features of adjacent pixels are compared and similar pixels are integrated. Here, comparing color features and integrating similar pixels means, for example, comparing each component signal value of the original image represented in the RGB color system, that is, the density value of each of the R, G, and B components, between adjacent pixels, and integrating those pixels when the difference in every component signal value exceeds a predetermined threshold. Instead of the RGB color system, component signal values represented in the YCC color system may be compared. This comparison and integration is repeated until no further integration occurs under the predetermined criterion such as the above threshold, so that areas composed of pixels with similar color features grow. The state after this integration of similar pixels is assumed to be the state shown in FIG. 2B.
[0030]
Here, among the areas constituting the image shown in FIG. 2B, an area whose perimeter is shorter than a predetermined length is referred to as a “micro area”, and an area whose perimeter is equal to or greater than the predetermined length is referred to as a “non-micro area”. In FIG. 2B, areas 20 and 22 and the like are micro areas, and areas 24, 26, and 28 and the like are non-micro areas.
[0031]
Next, each area constituting the image of FIG. 2B is compared with its adjacent areas, and areas that can be integrated are further integrated; the criterion for this integration differs between micro areas and non-micro areas. A micro area that is completely contained in one non-micro area (for example, micro area 20, which is completely contained in non-micro area 26) is integrated into that non-micro area. A micro area that borders two or more non-micro areas is integrated into the non-micro area with which it shares the longest boundary. Under this criterion, the state after integration of the micro areas is as shown in FIG. 2C.
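Both micro-area rules reduce to choosing the neighboring non-micro area with the longest shared boundary, since an enclosed micro area has exactly one neighbor. The sketch below assumes the shared boundary lengths have already been measured; the function name and the lengths are hypothetical, and only area numbers 24 and 26 are borrowed from FIG. 2 for readability.

```python
def merge_target(shared_boundary):
    """Pick the non-micro area that a micro area is integrated into.

    shared_boundary maps each neighboring non-micro area id to the length
    (in pixels) of the boundary it shares with the micro area.
    """
    return max(shared_boundary, key=shared_boundary.get)


# A micro area enclosed by area 26 has a single neighbor (length is hypothetical).
enclosed = merge_target({26: 18})
# A micro area touching two non-micro areas (lengths are hypothetical).
touching_two = merge_target({24: 7, 26: 12})
```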
[0032]
For a non-micro area, the average color feature of the pixels forming that area is compared with the average color feature of the pixels forming each adjacent non-micro area, and integration is performed if there is an adjacent non-micro area whose degree of similarity exceeds a predetermined criterion such as a threshold. For example, if the degree of similarity between the average color feature of non-micro area 24 in FIG. 2C and that of non-micro area 26 exceeds the predetermined criterion, while the degree of similarity with the average color feature of non-micro area 28 is at or below the criterion, non-micro area 24 is integrated with non-micro area 26 and is not integrated with non-micro area 28. The final result of integrating the non-micro areas according to this criterion is, for example, as shown in FIG. 2D. Each region constituting the image in this final state is identified as an “image region”.
[0033]
An example of the method for dividing the image into image regions in step 12 of FIG. 1 has been described above with reference to FIG. 2. Needless to say, the division into image regions in step 12 may be performed by any other known method.
[0034]
Returning to FIG. 1, in step 14, the entire image to be processed is divided into block images. In the present embodiment, it is assumed that the entire image is a digital photographic image having 1024 × 1280 pixels, and the block images are images each having 32 × 32 pixels. The divided whole image is shown in FIG. In the figure, for the convenience of explanation, it is shown in a coarser division than actual.
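The block division in step 14 can be sketched as enumerating the top-left corner of each block. This assumes the image dimensions are multiples of the block size, as they are for 1024 × 1280 and 32; the function name is hypothetical, and the text does not say which of the two dimensions is the height.

```python
def block_origins(height, width, block=32):
    """Top-left (row, column) of every full block in a height x width image."""
    return [(r, c)
            for r in range(0, height - block + 1, block)
            for c in range(0, width - block + 1, block)]


# Assumption: 1024 rows by 1280 columns; the block count is the same either way.
origins = block_origins(1024, 1280)
```

For this embodiment the division yields 32 × 40 = 1280 block images, each of which is later reduced to a short feature vector.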
[0035]
Subsequently, in step 16 of FIG. 1, feature amounts are extracted from each of the divided block images, the meaning of each block image is determined, and the meaning of each image region is identified based on those determinations. The detailed steps of the process performed in step 16 are shown in the flowchart of FIG. 4.
[0036]
First, in step 40 of FIG. 4, a block image included in one image region among the plurality of image regions specified as shown in FIG. 2D is specified. Here, the block image included in one image region refers to a block image completely included in the image region, and does not include a block image straddling the boundary between the image regions.
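The selection in step 40 can be illustrated with a per-pixel label map. This is a sketch under the assumption that step 12 produced a two-dimensional array of region ids; the function name and the toy 4 × 6 map with 2 × 2 blocks are hypothetical.

```python
def blocks_inside_region(labels, region, block=32):
    """Origins of the blocks whose pixels all carry the label `region`."""
    height, width = len(labels), len(labels[0])
    inside = []
    for r in range(0, height - block + 1, block):
        for c in range(0, width - block + 1, block):
            if all(labels[y][x] == region
                   for y in range(r, r + block)
                   for x in range(c, c + block)):
                inside.append((r, c))
    return inside


# Toy label map: region 1 on the left, region 2 on the right. The boundary
# cuts through the middle column of 2 x 2 blocks, so those blocks are skipped.
labels = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
]
region1_blocks = blocks_inside_region(labels, 1, block=2)
region2_blocks = blocks_inside_region(labels, 2, block=2)
```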
[0037]
Next, in step 42, for one of the block images identified in step 40, a feature value related to luminance is extracted from the distribution of the luminance component signal values carried by each pixel included in the block image. Here, the signal value of the luminance component refers to the signal value of the Y component of the block image represented in the YCC color system in the present embodiment, but is not limited thereto.
[0038]
The processing performed in step 42 is shown in detail in the flowchart of FIG. 5.
[0039]
First, in step 60 of FIG. 5, a histogram of the luminance component signal values carried by the pixels is created for the block image that is the current feature amount extraction target. For example, if the current block image contains only one photographed object of substantially uniform luminance (here, the sky), such as the block image 30 in FIG. 3, the histogram has the shape shown on the left side of FIG. 6A. For a block image containing a plurality of photographed objects each of substantially uniform luminance (here, a wall portion and a window portion of a building), such as the block image 32 in FIG. 3, the histogram has a plurality of peaks, as shown on the left side of FIG. 6B. For a block image containing a photographed object of substantially uniform color but uneven brightness (here, a grassland), such as the block image 34 in FIG. 3, the histogram has the shape shown on the left side of FIG. 6C. In these examples, the number of dimensions of the histogram is 256.
[0040]
Next, in step 62 in FIG. 5, a cumulative histogram indicating the cumulative count value of each signal value of the luminance component is created and normalized based on the histogram created in step 60. The cumulative histogram has the shapes as shown on the right side of FIGS. 6A, 6B, and 6C with respect to the block images 30, 32, and 34 in FIG. 3 exemplified above. As described above, the shape of the cumulative histogram is a shape unique to the block image that reflects the number of shooting targets and the texture included in the current block image.
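Steps 60 and 62 can be sketched as follows. This is a minimal illustration, assuming the signal values are integers in the range 0 to `levels - 1` and that normalization means dividing the cumulative counts by the number of pixels. The function name is hypothetical.

```python
def normalized_cumulative_histogram(values, levels=256):
    """Count integer signal values in [0, levels) and return the cumulative
    histogram scaled so that its last bin equals 1.0."""
    histogram = [0] * levels
    for v in values:
        histogram[v] += 1

    total = len(values)
    cumulative = []
    running = 0
    for count in histogram:
        running += count
        cumulative.append(running / total)
    return cumulative


# Four pixels with values 0, 0, 1 and 3 on a 4-level scale.
cumulative = normalized_cumulative_histogram([0, 0, 1, 3], levels=4)
# cumulative == [0.5, 0.75, 0.75, 1.0]
```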
[0041]
Subsequently, in step 64 of FIG. 5, the values of the three quartiles are identified from the normalized cumulative histogram and used as the feature amounts relating to luminance. Specifically, as shown in FIG. 6, the values on the horizontal axis corresponding to the values 0.25, 0.50, and 0.75 on the vertical axis of the normalized cumulative histogram are identified, and these three values are used as the feature amounts relating to luminance. The values of these quartiles and the intervals between them reflect the difference in the shape of the cumulative histogram, that is, the difference in the distribution shape of the luminance component signal values in the original histogram.
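Reading the quartile values off a normalized cumulative histogram can be sketched as follows. The specification does not state how a level that falls between two bins is resolved; this sketch assumes the first signal value at which the cumulative fraction reaches the level is taken. The function name and the two example histograms are made up for illustration.

```python
def quartile_values(cumulative):
    """Return the signal values at which a normalized cumulative histogram
    first reaches 0.25, 0.50 and 0.75."""
    features = []
    for level in (0.25, 0.50, 0.75):
        for value, fraction in enumerate(cumulative):
            if fraction >= level:
                features.append(value)
                break
    return features


# Uniform block: every pixel has the value 200.
uniform = [0.0] * 200 + [1.0] * 56
# Two-peak block: half of the pixels at 50, half at 200.
two_peaks = [0.0] * 50 + [0.5] * 150 + [1.0] * 56

uniform_features = quartile_values(uniform)      # [200, 200, 200]
two_peak_features = quartile_values(two_peaks)   # [50, 50, 200]
```

In the uniform case the three values coincide, whereas in the two-peak case the gap between the second and third values exposes the second peak, which is the sense in which the quartile values and their intervals reflect the distribution shape.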
[0042]
After the three feature amounts relating to luminance have been extracted as described above, the process returns to FIG. 4, and in step 44 additional feature amounts are extracted for the same current block image from the color component signal values carried by the pixels. In the present embodiment, the signal values of the two color difference components (that is, the Cr component and the Cb component) of the block image expressed in the YCC color system are used as the color component signal values. For each of these signal values, a normalized cumulative histogram is created by the same method as described above, and the values of the quartiles are identified as feature amounts. That is, since the values of three quartiles are identified as feature amounts for each of the two color difference components, a total of six feature amounts relating to color are extracted. The color component signal values are not limited to the signal values of the Cr component and the Cb component used in the present embodiment; for example, the signal values of each component in the RGB color system may be used.
[0043]
Next, in step 46, additional feature amounts are extracted from the edge component signal values for the same current block image. In the present embodiment, two kinds of signal values are used as the edge component signal values: the signal value of the absolute value component of the vertical edge image and the signal value of the absolute value component of the horizontal edge image, each obtained by applying an edge detection filter as shown in FIG. 7 to the Y component image of the block image in the YCC color system. Again, a normalized cumulative histogram is created for each of these two kinds of edge component signal values by the same method as described above, and the values of the quartiles are identified as feature amounts. As a result, a total of six feature amounts relating to edges are extracted. Note that the edge component signal values are not limited to those used in this embodiment. For example, the signal values of the edge images may be used as they are rather than their absolute values, or four kinds of signal values derived using edge detection filters for four directions may be used.
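The derivation of the edge component signal values can be sketched as follows. The filter of FIG. 7 is not reproduced in the text, so the Sobel-type 3 × 3 kernels below are an assumption, as are the function name and the choice to skip the one-pixel border of the block.

```python
# Assumed Sobel-type kernels; not necessarily the filter shown in FIG. 7.
VERTICAL_EDGE_KERNEL = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
HORIZONTAL_EDGE_KERNEL = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def absolute_edge_image(luma, kernel):
    """Apply a 3x3 kernel to the interior pixels of a luminance image and
    return the absolute values of the filter responses."""
    height, width = len(luma), len(luma[0])
    result = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            response = sum(
                kernel[dy][dx] * luma[y + dy - 1][x + dx - 1]
                for dy in range(3)
                for dx in range(3)
            )
            row.append(abs(response))
        result.append(row)
    return result


# A 4 x 4 luminance patch with a vertical boundary between columns 1 and 2.
patch = [[0, 0, 10, 10] for _ in range(4)]
vertical = absolute_edge_image(patch, VERTICAL_EDGE_KERNEL)      # [[40, 40], [40, 40]]
horizontal = absolute_edge_image(patch, HORIZONTAL_EDGE_KERNEL)  # [[0, 0], [0, 0]]
```

With kernels of this kind the responses can exceed the 0 to 255 range of the input, so the histogram built from them would need a correspondingly wider range or rescaled values.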
[0044]
When the extraction of a total of 15 feature amounts for the current block image has been completed in steps 42 to 46 above, the process proceeds to step 48, where the meaning of the block image is determined. In the present embodiment, the meaning of the block image is determined by mapping a “feature vector” having these 15 feature amounts as components onto a “self-organizing map”.
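Assembling the 15-component feature vector from the five component signals can be sketched as follows. The sketch assumes every component has already been quantized to integers in the range 0 to 255 (for the edge components this implies some rescaling or clipping that the text does not specify). The function names and the tiny four-pixel "block" are illustrative only.

```python
def quartile_features(values, levels=256):
    """Quartile values of integer signals in [0, levels), read off a
    cumulative histogram (first value at which each level is reached)."""
    histogram = [0] * levels
    for v in values:
        histogram[v] += 1

    total = len(values)
    features, running, k = [], 0, 1
    for value, count in enumerate(histogram):
        running += count
        # Record this value for every quartile level first reached here.
        while k < 4 and running * 4 >= k * total:
            features.append(value)
            k += 1
    return features


def feature_vector(components):
    """Concatenate the three quartile values of each component signal."""
    vector = []
    for values in components:
        vector.extend(quartile_features(values))
    return vector


# Hypothetical signals: Y, Cr, Cb, vertical edge, horizontal edge.
block_components = [
    [200, 200, 201, 202],
    [120, 121, 121, 122],
    [130, 130, 130, 131],
    [0, 1, 1, 3],
    [0, 0, 2, 5],
]
vector = feature_vector(block_components)  # 5 components x 3 quartiles = 15 values
```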
[0045]
As shown in FIG. 8, the “self-organizing map” is a map in which points corresponding to a plurality of reference feature vectors are arranged two-dimensionally, with points corresponding to similar reference feature vectors placed close to each other. As also shown in FIG. 8, the self-organizing map is accompanied by a meaning map of corresponding size. This meaning map is a superposition of a plurality of probability distribution maps, such as a map of the probability that the image corresponding to each reference feature vector on the self-organizing map is a “sky” image and a map of the probability that it is a “building” image. The self-organizing map and the meaning map are created in advance by having a computer learn the feature vectors of a large number of images known to be images of “sky”, “building”, and so on.
[0046]
An example of this learning process is outlined below. Before learning, various reference feature vectors are distributed randomly on the self-organizing map, and the initial value of every point in each probability distribution map is set to 0. When the feature vector of an image known to be “sky” is input as a learning target, the reference feature vector most similar to the input feature vector is identified on the self-organizing map, for example by searching for the reference feature vector with the smallest Euclidean distance from the input feature vector. The reference feature vectors in the neighborhood of the identified reference feature vector on the self-organizing map, for example within a 7 × 7 range, are then modified so as to approach the identified reference feature vector (that is, so that their similarity to it is increased). Meanwhile, on the “sky” probability distribution map, a frequency value of, for example, “1” is added to the point corresponding to the identified reference feature vector and to the points within the surrounding 7 × 7 range. Next, when the feature vector of an image known to be a “building” is input, the most similar reference feature vector is identified on the self-organizing map and the neighboring reference feature vectors are modified, while on the “building” probability distribution map a frequency value of “1” is added to the point corresponding to the identified reference feature vector and to the neighboring points. As such learning is repeated, reference feature vectors indicating similar features gradually gather at positions close to each other on the self-organizing map, while an island-like frequency distribution gradually forms on each probability distribution map.
As learning progresses and similar reference feature vectors gather, the size of the neighborhood in which reference feature vectors are modified, initially 7 × 7, is gradually reduced. After learning is complete, the probability distribution maps are normalized and superimposed to form a meaning map as shown in FIG. 8.
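A single learning step of this kind can be sketched as follows. Several points are assumptions rather than details from the specification: the sketch pulls the neighborhood toward the input vector, as in the conventional Kohonen update; the learning rate `rate` is hypothetical; and the map is stored as a nested list of reference vectors with one frequency map per meaning. A `radius` of 3 corresponds to the 7 × 7 range.

```python
def train_step(som, counts, vector, label, radius=3, rate=0.5):
    """One learning step on a self-organizing map.

    som:    rows x cols grid of reference feature vectors (lists of floats).
    counts: dict mapping each meaning to a rows x cols frequency map.
    Returns the grid position of the best-matching reference vector.
    """
    rows, cols = len(som), len(som[0])

    # Best match: smallest squared Euclidean distance to the input vector.
    best = min(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: sum(
            (a - b) ** 2 for a, b in zip(som[rc[0]][rc[1]], vector)
        ),
    )

    # Update the square neighborhood, clipped at the map border.
    for r in range(max(0, best[0] - radius), min(rows, best[0] + radius + 1)):
        for c in range(max(0, best[1] - radius), min(cols, best[1] + radius + 1)):
            # Assumed update rule: move part of the way toward the input.
            som[r][c] = [w + rate * (x - w) for w, x in zip(som[r][c], vector)]
            counts[label][r][c] += 1
    return best


# 3 x 3 map of 2-component reference vectors; radius 1 covers the whole map.
som = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
som[1][1] = [1.0, 1.0]
counts = {"sky": [[0] * 3 for _ in range(3)]}
best = train_step(som, counts, [1.0, 1.0], "sky", radius=1, rate=0.5)
```

Repeating such steps over many labeled images while shrinking `radius` would correspond to the gradual reduction of the neighborhood described above.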
[0047]
In step 48 of FIG. 4, the feature vector extracted from the current block image is mapped to the point corresponding to the most similar reference feature vector on the self-organizing map derived in advance by learning as described above. Again, the similarity is evaluated using the Euclidean distance between the two vectors as an index. The point on the meaning map corresponding to the mapping destination on the self-organizing map is then referred to, and the meaning showing the highest probability among the “meanings” such as “sky” and “building” is determined to be the meaning of the current block image.
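The lookup in step 48 can be sketched as follows, assuming the meaning map stores, at each map position, a dictionary from meaning to probability. This data layout and the function name are assumptions made for the example.

```python
def classify(som, meaning_map, vector):
    """Map a feature vector to its most similar reference vector and return
    the most probable meaning stored at that position of the meaning map."""
    best_row, best_col = min(
        ((r, c) for r in range(len(som)) for c in range(len(som[0]))),
        # Minimizing the squared distance also minimizes the Euclidean distance.
        key=lambda rc: sum(
            (a - b) ** 2 for a, b in zip(som[rc[0]][rc[1]], vector)
        ),
    )
    probabilities = meaning_map[best_row][best_col]
    return max(probabilities, key=probabilities.get)


# A 1 x 2 map with one-component reference vectors and made-up probabilities.
som = [[[0.0], [10.0]]]
meaning_map = [[
    {"sky": 0.9, "building": 0.1},
    {"sky": 0.2, "building": 0.8},
]]
meaning = classify(som, meaning_map, [8.0])  # "building"
```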
[0048]
Here, in the present embodiment, because the values of the quartiles are used as the feature amounts, the dimension of the feature vector, which is based on five different component signal values (that is, the signal values of the luminance component, the two color components, and the two edge components), is kept to 15, while the differences in the distribution shapes of the component signal values are still reflected in the feature vector. Further, by making the step size of the underlying cumulative histogram small, the numerical accuracy of the feature amounts can be kept high. Therefore, highly reliable meaning determination can be performed while suppressing the amount of calculation required to compare the extracted feature vector with the reference feature vectors.
[0049]
Subsequently, in step 50 of FIG. 4, it is checked whether any block images included in the current image area remain, and steps 42 to 50 are repeated until the meanings of all the block images included in the current image area have been determined.
[0050]
When the meanings of all the block images included in the current image area have been determined, in step 52 the meaning determined for the largest number of block images is specified as the meaning of the current image area. For example, some of the block images included in the image area 28 in FIG. 2D may be determined to mean “water”, but most are determined to mean “sky”. Therefore, the image area 28 is specified as a “sky” area.
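The majority decision of step 52 can be sketched as follows. The function name is hypothetical, and the specification does not say how ties are resolved; here the meaning encountered first among the tied ones is returned, which follows from the behavior of `Counter.most_common`.

```python
from collections import Counter


def region_meaning(block_meanings):
    """Return the meaning assigned to the largest number of block images."""
    return Counter(block_meanings).most_common(1)[0][0]


# Most blocks of the region are "sky"; a few were judged to be "water".
meaning = region_meaning(["sky", "water", "sky", "sky", "water", "sky"])
# meaning == "sky"
```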
[0051]
Subsequently, in step 54, it is confirmed whether or not there remains an image area whose meaning has not yet been specified, and the process shown in FIG. 4 is repeated until the meanings of all the image areas are specified.
[0052]
In the present embodiment, the values of the quartiles are used as the feature amounts. However, the present invention is characterized in that the values of the n-quantiles are extracted as the feature amounts, and the value of n is not limited to 4; any value may be used as long as it is a natural number of 3 or more and no greater than the number of dimensions of the cumulative histogram. When the extracted feature amounts are used to determine the meaning of an image, n is preferably about 3 or more and 20 or less, so as to prevent the amount of calculation for the meaning determination from becoming enormous.
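The generalization from quartiles to n-quantiles can be sketched as follows. As before, the sketch assumes integer signal values and takes the first value at which the cumulative count reaches each level k/n; integer arithmetic is used for the comparison to avoid rounding issues. The function name is hypothetical, and only the lower bound on n is checked.

```python
def n_quantile_features(values, n, levels=256):
    """Return the n - 1 signal values at which the cumulative histogram of
    integer values in [0, levels) first reaches k/n, for k = 1 .. n - 1."""
    if n < 3:
        raise ValueError("n must be a natural number of 3 or more")

    histogram = [0] * levels
    for v in values:
        histogram[v] += 1

    total = len(values)
    features, running, k = [], 0, 1
    for value, count in enumerate(histogram):
        running += count
        # running / total >= k / n, written without division.
        while k < n and running * n >= k * total:
            features.append(value)
            k += 1
    return features


ramp = list(range(100))  # one pixel at each value 0 .. 99
quartiles = n_quantile_features(ramp, 4)   # [24, 49, 74]
deciles = n_quantile_features(ramp, 10)    # nine values: 9, 19, ..., 89
```

Each component signal contributes n − 1 feature amounts, so with five component signals the feature vector has 5 × (n − 1) components, which gives the 15 dimensions of the embodiment for n = 4.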
[0053]
In the above embodiment, the meaning of the image is determined using feature amounts extracted from the signal values of the luminance, color, and edge components. However, the present invention is not limited to this: only some of the luminance, color, and edge components may be used, or, instead of or in addition to these, other component signal values, such as those relating to depth information, may be used.
[0054]
Furthermore, in the above embodiment, a method using a self-organizing map is used to determine the meaning of an image. However, the present invention is not limited to this, and other methods may be used. For example, a large number of images whose meanings are known may be mapped into the feature amount space and learned in advance, so that the barycentric coordinates and the variation (standard deviation) in the feature amount space are obtained for the images of each meaning. A feature amount may then be extracted from a given image and mapped into the feature amount space, and its meaning determined based on the Mahalanobis distance from the center of gravity of the images of each meaning obtained by learning. Further, feature vectors whose dimension has been further reduced by principal component analysis may be used.
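The centroid-based alternative can be sketched as follows. The sketch is a simplification: it uses only the per-dimension standard deviation, which is the Mahalanobis distance for a diagonal covariance matrix, and it assumes every standard deviation is nonzero. The function names and training vectors are made up for illustration.

```python
import math


def fit_class(vectors):
    """Return (centroid, per-dimension standard deviation) of the training
    feature vectors of one meaning."""
    count, dims = len(vectors), len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / count for d in range(dims)]
    std = [
        math.sqrt(sum((v[d] - centroid[d]) ** 2 for v in vectors) / count)
        for d in range(dims)
    ]
    return centroid, std


def diagonal_mahalanobis(vector, centroid, std):
    """Mahalanobis distance under the assumed diagonal covariance."""
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(vector, centroid, std))
    )


def nearest_meaning(vector, classes):
    """Return the meaning whose centroid is nearest to the vector."""
    return min(
        classes, key=lambda name: diagonal_mahalanobis(vector, *classes[name])
    )


# Two-component training vectors for two hypothetical meanings.
classes = {
    "sky": fit_class([[200.0, 10.0], [220.0, 14.0]]),
    "building": fit_class([[80.0, 60.0], [120.0, 100.0]]),
}
meaning = nearest_meaning([205.0, 11.0], classes)  # "sky"
```

A full Mahalanobis distance would use the inverse covariance matrix of each class, which also accounts for correlation between feature amounts.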
[0055]
In the above description, as one embodiment of the present invention, a process has been described in which a two-dimensional whole image is divided into block images and the meaning of each block image is determined in order to specify the meaning of each image area included in the whole image. As another embodiment of the present invention, application to a process of determining the meaning of the entire image is also conceivable. In this case, for example, a self-organizing map and a meaning map are created by learning in advance whole images that are known to be “personal photographs”, “building photographs”, “seascape photographs”, and so on. A plurality of feature amounts are then extracted from the whole image that is the target of the meaning determination by creating cumulative histograms and identifying the n-quantiles, and a feature vector having these feature amounts as components is mapped onto the self-organizing map. Again, it goes without saying that the meaning determination method is not limited to one using a self-organizing map.
[0056]
The image feature extraction method and apparatus according to the present invention can be applied not only to image meaning determination processing as described above but also to various other kinds of image processing, such as image comparison and classification.
[0057]
Although embodiments of the present invention have been described in detail above, these embodiments are merely illustrative, and it goes without saying that the technical scope of the present invention should be defined only by the claims in this specification.
[Brief description of the drawings]
FIG. 1 is a flowchart showing the procedure of a process for specifying meaning of each image area included in a two-dimensional whole image according to an embodiment of the present invention.
FIG. 2 is a process diagram showing an example of an image area specifying method in the meaning specifying process of FIG. 1;
FIG. 3 shows an entire image divided into block images.
FIG. 4 is a flowchart showing detailed steps of a meaning specifying method in the meaning specifying process of FIG. 1;
FIG. 5 is a flowchart showing in more detail the process of extracting feature values related to luminance in FIG. 4;
FIG. 6 is a diagram showing an example of a histogram of signal values of luminance components and an example of a cumulative histogram.
FIG. 7 is a diagram showing an example of an edge detection filter used for deriving edge component signal values;
FIG. 8 is a conceptual diagram showing an example of a self-organizing map and a corresponding meaning map;

Claims

1. A method for extracting, from a two-dimensional image, a feature amount representing a feature of the two-dimensional image, the method comprising:
creating, for at least one type of component signal value, a cumulative histogram of the component signal values assigned to a plurality of pixels forming the two-dimensional image; and
extracting the values of the n-quantiles of the cumulative histogram as the feature amount,
wherein n is a natural number of 3 or more and no greater than the number of dimensions of the cumulative histogram.

2. The method according to claim 1, further comprising determining a meaning of the two-dimensional image based on the feature amount.

3. The method according to claim 2, wherein the feature amount is extracted for two or more types of component signal values, and the meaning of the two-dimensional image is determined based on a combination of the feature amounts for the two or more types of component signal values.

4. The method according to claim 3, wherein the two or more types of component signal values include at least two of a luminance component signal value, a color component signal value, and an edge component signal value.

5. The method according to any one of claims 2 to 4, wherein in the step of determining the meaning, the meaning is determined using a self-organizing map.

6. The method according to claim 2, wherein n is 20 or less.

7. The method according to claim 1, wherein the two-dimensional image is a whole image.

8. The method according to claim 1, wherein the two-dimensional image is a block image obtained by dividing an entire image.

9. An apparatus for extracting, from a two-dimensional image, a feature amount representing a feature of the two-dimensional image, the apparatus comprising:
means for creating, for at least one type of component signal value, a cumulative histogram of the component signal values assigned to a plurality of pixels forming the two-dimensional image; and
means for extracting the values of the n-quantiles of the cumulative histogram as the feature amount,
wherein n is a natural number of 3 or more and no greater than the number of dimensions of the cumulative histogram.

10. The apparatus according to claim 9, further comprising means for determining the meaning of the two-dimensional image based on the feature amount.

11. The apparatus according to claim 10, wherein the feature amount is extracted for two or more types of component signal values, and the meaning of the two-dimensional image is determined based on a combination of the feature amounts for the two or more types of component signal values.

12. The apparatus according to claim 11, wherein the two or more types of component signal values include at least two of a luminance component signal value, a color component signal value, and an edge component signal value.

13. The apparatus according to any one of claims 10 to 12, wherein the means for determining the meaning determines the meaning using a self-organizing map.

14. The apparatus according to claim 10, wherein n is 20 or less.

15. The apparatus according to claim 9, wherein the two-dimensional image is a whole image.

16. The apparatus according to claim 9, wherein the two-dimensional image is a block image obtained by dividing an entire image.