JP5836779B2 - Image processing method, image processing apparatus, imaging apparatus, and program

Info

Publication number: JP5836779B2
Application number: JP2011265047A
Authority: JP
Inventors: 俊太舘; 裕輔御手洗; 克彦森
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2011-12-02
Filing date: 2011-12-02
Publication date: 2015-12-24
Anticipated expiration: 2031-12-02
Also published as: JP2013117860A

Description

The present invention relates to an image processing method, an image processing apparatus, an imaging apparatus, and a program.

A conventional technique for dividing an image into per-subject regions is the method of Non-Patent Document 1. That method stores feature quantities, such as texture and pixel color, for each subject category. It then uses these feature quantities to compute the likelihood that each pixel belongs to each category, and performs region division by assigning pixels to categories while taking the consistency between pixels into account.

Non-Patent Document 1: Pushmeet Kohli, L'ubor Ladicky and Philip H. S. Torr, Int. J. of Computer Vision, 82(3), 302-324, 2009.
Non-Patent Document 2: Jitendra Malik, Serge Belongie, Thomas Leung and Jianbo Shi, Int. J. of Computer Vision, 43(1), 7-27, 2001.
Non-Patent Document 3: A. Rabinovich, T. Lange, J. Buhmann, S. Belongie, IEEE Conf. on Computer Vision and Pattern Recognition, 2006.
Non-Patent Document 4: J. Carreira, C. Sminchisescu, IEEE Conf. on Computer Vision and Pattern Recognition, 2010.
Non-Patent Document 5: A. Gionis, P. Indyk, R. Motwani, Proc. of the 25th Very Large Database Conf., 1999.
Non-Patent Document 6: J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, Proc. of 10th IEEE Int. Conf. on Computer Vision, 2005.
Non-Patent Document 7: A. Saxena, S. H. Chung, A. Y. Ng, Int. J. of Computer Vision, 76(1), 53-69, 2008.
Non-Patent Document 8: J. Sivic, A. Zisserman, Proc. of the 9th IEEE Int. Conf. on Computer Vision, 2003.
Non-Patent Document 9: O. Chum, M. Perdoch, J. Matas, IEEE Conf. on Computer Vision and Pattern Recognition, 2009.
Non-Patent Document 10: B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, A. Zisserman, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2006.
Non-Patent Document 11: D. Blei, A. Ng, and M. Jordan, J. of Machine Learning Research, 3: 993-1022, 2003.
Non-Patent Document 12: G. Kim, A. Torralba, Annual Conf. on Neural Information Processing Systems, 2009.
Non-Patent Document 13: G. Wang, Y. Zhang, L. Fei-Fei, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2006.
Non-Patent Document 14: K. Barnard, D. Forsyth, Proc. of 8th IEEE Int. Conf. on Computer Vision, 2001.

The method of Non-Patent Document 1 requires a set of example images together with teacher data indicating which subject each pixel of each example image belongs to, so that the features of the subjects can be learned in advance. The categories of subjects that can be segmented are therefore limited, and the desired division result may not be obtained for an unknown subject.

Methods that do not restrict the subject have also been studied, such as that of Non-Patent Document 2, which performs region division using cues such as the similarity of local features. With this kind of method, however, a subject composed of several parts with different appearances is unavoidably over-segmented into those parts.

The present invention provides a technique capable of dividing an input image into mutually different regions according to the type of subject, without being given teacher data for the subjects in advance.

An image processing apparatus according to one aspect of the present invention that achieves the above object divides an input image into regions for each subject in the input image, and comprises:
dividing means for dividing the input image into a candidate region group representing candidates for a plurality of per-subject regions;
acquisition means for acquiring, from an image database, a similar image group that is similar to each candidate region of the candidate region group;
extraction means for extracting, from the similar image group, features common to the similar images as feature components of the candidate regions; and
region identification means for comparing the feature components with the feature amount of each pixel of the input image, identifying which candidate region of the candidate region group each pixel belongs to, and outputting the identification as the result of region division.

According to the present invention, an input image can be divided into mutually different regions according to the type of subject without providing teacher data for the subjects in advance.

FIG. 1 is a diagram illustrating the configuration of the image processing apparatus of the first embodiment.
FIG. 2 is a diagram illustrating the concept of the region component extraction operation.
FIG. 3 is a diagram comparing the region component extraction operation of the conventional method with that of the embodiment.
FIG. 4 is a diagram illustrating the operation of the region component extraction unit.
FIG. 5 is a diagram illustrating the concept of the similar region extraction operation.
FIG. 6 is a diagram illustrating the operation of the region identification unit.
FIG. 7 is a diagram illustrating the operation of the similar image acquisition unit.
FIG. 8 is a diagram illustrating the operation of the similar image acquisition unit in further detail.
FIG. 9 is a diagram illustrating the configuration of the image processing apparatus of the second embodiment.
FIG. 10 is a diagram illustrating the overall operation of the image processing apparatus of the second embodiment.
FIG. 11 is a diagram illustrating the configuration of the image processing apparatus of the third embodiment.
FIG. 12 is a diagram illustrating the operation of the region label identification unit.
FIG. 13 is a diagram illustrating the configuration of the image processing apparatus of the fifth embodiment.
FIG. 14 is a diagram illustrating the overall operation of the image processing apparatus of the fifth embodiment.

(First embodiment)
An image processing method (an image region division method) and an image processing apparatus 10 according to an embodiment of the present invention are described in detail below with reference to the drawings. An imaging apparatus 20 includes an imaging unit that captures an image, and the image processing apparatus 10, which takes the captured image as an input image and divides it into regions for each subject in the input image. Elements with the same reference numeral in different drawings operate in the same way, and their description is not repeated. The image processing method and apparatus of the embodiment concern an image processing technique for dividing an input image into a plurality of regions appropriately according to the subjects. Segmenting subjects appropriately facilitates many kinds of image processing, such as subject recognition, scene recognition, and subject-dependent image quality correction. The input image may be a still image or a moving image. Subjects include every kind of object: living things such as people and dogs, artificial objects such as buildings and tools, and natural objects such as mountains and the sky.

The operation of the image processing apparatus 10 is described with reference to FIG. 1. The region division unit 101 divides the input image into a candidate region group representing candidates for a plurality of per-subject regions. Next, the similar image acquisition unit 102 acquires, from the image database 103, a plurality of similar images (a similar image group) for each region of the candidate region group. The region component extraction unit 104 extracts from the similar image group K components that are common to the similar images, as feature components (region components) of the candidate regions. Each region component relates to a main subject in the input image. The region identification unit 105 compares the region components with the feature amount of each pixel of the input image. From the comparison, it determines which of the K region components (candidate regions) each pixel belongs to, labels the pixel accordingly, and outputs the labeling as the per-subject region division result. In this way the region division result for the input image is finally obtained.

The region division method of the present invention is partly related to the method disclosed in Non-Patent Document 10. Because the commonalities and functional differences between the two methods help in understanding the effect of the present method, they are described in detail below.

The schematic diagram of FIG. 3 illustrates the difference between the two methods conceptually. FIG. 3(a) shows the operation of Non-Patent Document 10, which extracts the principal components of regions from all images in the image database. Because it extracts components common to all images without giving weight to any particular image, components of subjects that appear widely in images in general, such as "sky" and "ground", are extracted as the top-ranked components. In contrast, a subject such as "cow", which appears in only a relatively small number of images and whose appearance varies with its pose, may not be extracted properly. The components of such a subject may be ignored as residual components, or extracted mixed with the components of other subjects. This can be understood by analogy with principal component analysis, where the lower-ranked components contain a large amount of noise.

By contrast, as shown in FIG. 3(b), the region division method of the present invention first performs region division on a single input image to extract candidate regions for the subjects. It then acquires regions similar to the candidate regions from the image database and generates an image set (similar image group) restricted to images related to the input image. The subsequent region component extraction unit 104 can apply a method of the same kind as that of Non-Patent Document 10 to this image set. Because the image set is restricted, however, only the components related to the subject regions of the input image are extracted, and with higher accuracy. This is the difference between the method of Non-Patent Document 10 and the present method.

(Region division unit 101)
Next, the operation of the region division unit 101 is described. The region division unit 101 extracts a plurality of regions from the input image as candidates for subject regions; specifically, it extracts a plurality of regions that can each be regarded as a coherent unit in the image. Region extraction at this stage relies on the similarity of local image features such as pixel color, texture, and pixel position. Many of the resulting regions are likely to cut out only part of a subject, or to contain several subjects. To allow for such extraction failures, the region division unit 101 extracts candidate regions in a sufficient number of variations. Candidate regions may be cut out multiple times; that is, they are allowed to overlap one another partially.

Specific methods for a region division unit 101 with these properties include clustering, as in Non-Patent Document 3, and graph cuts, as in Non-Patent Document 4. These methods have controllable parameters, such as the number of clusters or the position of the center pixel for the graph cut. By varying these control parameters and performing region division several times, a plurality of candidate regions can be obtained.
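The idea of pooling candidate regions from several parameter settings can be sketched as follows. This is only an illustration under simplifying assumptions: pixels are reduced to scalar intensities, plain one-dimensional k-means stands in for the clustering of Non-Patent Document 3, and the function names `kmeans_labels` and `candidate_regions` are hypothetical.

```python
def kmeans_labels(values, k, iters=20):
    """Cluster scalar pixel values into k groups; return one label per pixel."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialization: centers at evenly spaced quantiles.
    centers = [ordered[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def candidate_regions(values, cluster_counts=(2, 3)):
    """Run the segmentation once per cluster count and pool the regions.

    Each region is a frozenset of pixel indices. Regions produced by
    different runs are allowed to overlap.
    """
    regions = set()
    for k in cluster_counts:
        labels = kmeans_labels(values, k)
        for c in range(k):
            region = frozenset(i for i, lab in enumerate(labels) if lab == c)
            if region:
                regions.add(region)
    return regions
```

With the pixel values `[0, 1, 2, 50, 51, 52, 100, 101, 102]`, the two-cluster run yields a region covering the first six pixels while the three-cluster run splits it in two, so the pooled set contains overlapping candidates.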

(Similar image acquisition unit 102)
Next, the operation of the similar image acquisition unit 102 is described. For each candidate region in the input image obtained in the preceding stage, the similar image acquisition unit 102 acquires images that contain a similar region (similar images). Similar images are acquired per candidate region, and the resulting images are passed to the subsequent stage as a similar image group.

Techniques for retrieving similar images are well studied in the field known as image retrieval. For example, Non-Patent Documents 8 and 9 describe methods for quickly searching an image database for partial images with a distinctive appearance, such as logos. These methods may be used for the similar image acquisition unit 102, but they can be sensitive to variations in appearance, such as changes in the pose of the object in the region. In the present embodiment, therefore, similar regions are acquired more flexibly using the method described below.

The specific operation of the similar image acquisition unit 102 is described below with reference to the flowchart of FIG. 7. First, the similar image acquisition unit 102 calculates a feature amount for each candidate region by a predetermined method (S701, S702). Next, using this feature amount as the query, it searches the image database 103 and acquires similar regions (region samples) (S703). The specific search method is described later. The unit ranks (sorts) the retrieved similar regions in descending order of similarity (S704), and outputs the images of the top-ranked similar regions as a set of similar images (S705). The output is limited to images at or above a predetermined rank. Alternatively, it may be limited to images whose similarity score is at or above a predetermined threshold, or to images that satisfy both the rank and the score conditions.
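The ranking and selection of steps S704 and S705 can be sketched as below. The function name `select_similar` and its parameters `max_rank` and `min_score` are hypothetical; the text only states that a rank condition, a score condition, or both may be applied.

```python
def select_similar(scored_regions, max_rank=None, min_score=None):
    """Rank (region_id, similarity) pairs and keep the best ones.

    Regions are sorted by descending similarity (S704). A region is kept
    (S705) only if it passes every condition that was supplied: rank at or
    above max_rank, and similarity at or above min_score.
    """
    ranked = sorted(scored_regions, key=lambda pair: pair[1], reverse=True)
    selected = []
    for rank, (region_id, score) in enumerate(ranked, start=1):
        if max_rank is not None and rank > max_rank:
            continue
        if min_score is not None and score < min_score:
            continue
        selected.append(region_id)
    return selected
```

For example, with hits `[("a", 0.2), ("b", 0.9), ("c", 0.6), ("d", 0.4)]`, `max_rank=3` keeps `b`, `c`, `d`, while `min_score=0.5` keeps only `b` and `c`.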

Various feature amounts can be used for a region, including features used in existing image recognition techniques. For example, SIFT (Scale Invariant Feature Transform) features can be computed at a plurality of feature points in the region and converted into a histogram feature by the bag-of-words method. Other options include the position of the region in the image, the second moments representing the shape of the region, and the color histogram of the pixels in the region. HOG or LBP features can also be used, as can a higher-dimensional feature formed by concatenating several of these feature vectors. Because candidate regions are generally irregular in shape, computing HOG, LBP, or similar features requires cropping the image, including the surrounding area, with the bounding rectangle of the region and converting it into a square image. Details of these features are widely available in prior work such as Non-Patent Documents 6 and 7 and are omitted here.
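As a rough illustration of the bag-of-words conversion, the sketch below assigns each local descriptor to its nearest codeword and builds a normalized histogram. It assumes the descriptors (for instance SIFT vectors) and the codebook have already been computed elsewhere; the function name `bow_histogram` and the toy two-dimensional data are hypothetical.

```python
def bow_histogram(descriptors, codebook):
    """Convert local descriptors into a normalized bag-of-words histogram.

    Each descriptor votes for the codeword with the smallest squared
    Euclidean distance. The result has one bin per codeword and sums to 1
    (or is all zeros when there are no descriptors).
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

The histogram length equals the codebook size V, so regions of any shape or size map to feature vectors of the same dimension.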

(Similar region search method (S703))
Next, the method of searching for similar regions is described in detail. The image database 103 is assumed to store in advance a set of sample images on the order of a hundred thousand to a million. Hashing, a common technique for fast nearest-neighbor search, is therefore used here; specifically, for example, the technique called locality-sensitive hashing (LSH) disclosed in Non-Patent Document 5.

For fast search, LSH registers the data offline. First, the images in the image database 103 are divided into regions in advance by the same method as the region division unit 101, and a feature amount is calculated for each resulting region. Each feature amount is then transformed by a predetermined function into a hash key, a variable of lower dimension than the original feature amount. The data of each region is registered with its hash key as the address, producing a table called a hash table. Similar data samples are registered together at the same address of the hash table. This is the operation performed when data is registered in the image database 103.
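The offline registration can be sketched as follows. The text does not specify the hash function, so this sketch assumes a sign-of-random-projection hash, which is one common LSH family and not necessarily the one used in Non-Patent Document 5. The names `make_hash_function` and `build_hash_table` are hypothetical.

```python
import random
from collections import defaultdict


def make_hash_function(dim, n_bits, seed=0):
    """Return a function mapping a dim-dimensional feature to an n_bits key.

    Assumption: each bit is the sign of the projection onto a random
    Gaussian direction, so nearby features tend to share the same key.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_key(feature):
        return tuple(
            int(sum(p * x for p, x in zip(plane, feature)) >= 0.0)
            for plane in planes
        )

    return hash_key


def build_hash_table(region_features, hash_key):
    """Register every region id under the hash key of its feature."""
    table = defaultdict(list)
    for region_id, feature in region_features.items():
        table[hash_key(feature)].append(region_id)
    return table
```

The key has `n_bits` dimensions, fewer than the original feature, and each table address collects the ids of regions whose features hash alike.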

Next, the operation at search time is described with reference to FIG. 8, which is a more detailed flowchart of the similar image acquisition unit 102 described with FIG. 7. First, the candidate region used as the query is converted into a feature amount (S801, S802), which is further converted into a hash key (S803). The hash table is then looked up with the hash key, and any data registered at the corresponding address is acquired as similar regions (S804, S805). If the determination in step S805 finds no similar region (S805-No), the process proceeds to step S812. If a similar region exists (S805-Yes), the process proceeds to step S806, and the original high-dimensional feature amount, from before conversion into the hash key, is retrieved for each similar region (S806, S807). From these feature amounts the exact distance between the query candidate region and the similar region is calculated (S808). This is repeated for all similar regions. The calculated distances are then compared, the samples (similar regions) are sorted in ascending order of distance (S810), and the similar regions whose distance is below a predetermined threshold are output (S811). The Euclidean distance, the chi-square distance, or the like may be used as the distance. This process is repeated for all candidate regions (S812). This completes the detailed operation of the similar image acquisition unit 102. The design of accurate hash functions for nearest-neighbor search has been studied extensively; see Non-Patent Document 5 for details.
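The exact-distance re-ranking of steps S806 to S811 can be sketched as follows, here with the chi-square distance between histogram features. The sketch assumes the hash lookup has already returned a bucket of region ids; the names `chi_square_distance` and `rerank_bucket` and the parameter `max_distance` are hypothetical, and the 0.5 factor is one common convention for this distance.

```python
def chi_square_distance(p, q):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def rerank_bucket(query, bucket, features, max_distance):
    """Re-rank the regions found in a hash bucket by exact distance.

    bucket   : region ids registered at the query's hash address
    features : region id -> original high-dimensional feature
    Returns the ids sorted by ascending distance, keeping only those
    closer than max_distance. An empty bucket yields an empty list.
    """
    scored = sorted((chi_square_distance(query, features[r]), r) for r in bucket)
    return [r for d, r in scored if d < max_distance]
```

The hash lookup is fast but coarse, so the exact distance on the original features decides which bucket members are actually output.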

As another variation of the similar image acquisition unit 102, a graph structure of image similarity may be used, which can be expected to make similar-image queries more flexible. The graph here is one represented by a similarity matrix in which each region is a node and the similarity between regions is an edge. A specific example of operation is as follows. First, neighboring regions are found by hashing as described above. Next, the similarity matrix between regions is used to search for further data samples that are close to those neighboring regions. For each such data sample, the distance to the candidate region is evaluated using the high-dimensional feature amount. In this way, similar images can be queried while taking into account the relationships among images as a manifold, which hashing alone does not readily capture.
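One possible reading of this graph-based variation is sketched below: the regions found by hashing are expanded by one hop along strong edges of the similarity graph, and every resulting sample is then checked against the query with the exact distance. The one-hop expansion, the sparse dictionary representation of the similarity matrix, and the names `expand_with_graph` and `query_with_graph` are assumptions made for illustration.

```python
import math


def expand_with_graph(seeds, similarity, min_similarity):
    """Add the graph neighbors of the seed regions.

    similarity is a sparse similarity matrix stored as
    region id -> {neighbor id -> similarity}.
    """
    expanded = set(seeds)
    for s in seeds:
        for other, sim in similarity.get(s, {}).items():
            if sim >= min_similarity:
                expanded.add(other)
    return expanded


def query_with_graph(query, seeds, similarity, features, min_similarity, max_distance):
    """Expand hash hits through the graph, then filter by exact distance."""
    candidates = expand_with_graph(seeds, similarity, min_similarity)
    return sorted(
        r for r in candidates if math.dist(query, features[r]) < max_distance
    )
```

A region that missed the query's hash bucket can still be reached this way if it is strongly connected to a region that did land in the bucket.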

As another variation of the image database 103, an image search server on the Internet may be used. Such a server runs agents on the network of web services, automatically collects new images from the network, and adds them to the database. With an image database 103 of this kind, the similar image acquisition unit 102 and the region division unit 101 can be expected to operate correctly even when the input image contains a novel subject. For example, suppose the input image shows a subject with an unknown specific shape, such as a newly released home appliance. Even if the user has no knowledge of the subject, correct region division can be expected provided that enough images of the product have been posted on the web and collected into the database.

(Region component extraction unit 104)
The operation of the region component extraction unit 104 is described next. The purpose of this module is to extract the region components from the set of similar images (similar image group) obtained by the preceding similar image acquisition unit 102.

What this embodiment calls a region component is explained concretely with reference to FIG. 2. FIG. 2(a) shows several regions of grassland extracted from images. The example grassland regions vary slightly in appearance, and some also contain extraneous elements such as the legs of a cow. Each region in FIG. 2(a) is converted into a feature amount as shown in FIG. 2(b); the figure illustrates conversion into a histogram feature using the SIFT features and bag-of-words method described earlier. In the figure, w_1, ..., w_V represent the distribution over the codebook. From this set of feature amounts, a principal component common to the feature amounts, as in FIG. 2(c), is extracted and taken as the overall component of regions 1, 2, and 3. The region component isolates only the features common to grassland and is expressed as a distribution over the codebook.

Several methods exist for this kind of component extraction. For example, the latent Dirichlet allocation (LDA) method of Non-Patent Document 10 can be used. Models other than LDA, such as those based on the Dirichlet process, have also been well studied (Non-Patent Document 13). The description here centers on LDA, which is well regarded for the accuracy of its component extraction, but note that embodiments of the present invention do not restrict the component extraction method to LDA.

LDA is one of the document generation models widely used in natural language processing, and is expressed by the following equation (1).
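The equation itself did not survive in this text. Assuming that equation (1) is the standard LDA joint distribution given in Non-Patent Document 11, and writing it with the symbols w, z, θ, α and β that the description uses, it would read:

```latex
p(\theta, z, w \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
\tag{1}
```

Here p(θ | α) is the Dirichlet prior on the topic mixing ratio, p(z_n | θ) selects the topic behind the n-th word, and p(w_n | z_n, β) is the probability that this topic generates the word w_n.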

LDA is briefly described below. The above equation is a probabilistic model of a document containing K topics generating N words. Here w denotes the sequence of N generated words w_1, ..., w_N. Each of the K topics Z_k is a probability distribution that determines the probability of generating each word of the vocabulary w_1, ..., w_V, where V is the total number of possible words. Z_k is generated from a multinomial distribution with β_k as its hyperparameter. z is the sequence of N probability distributions z_1, ..., z_N. z_n is a distribution formed as a mixture of the K topics Z_1, ..., Z_K and is involved in generating the n-th word. The mixing ratio is determined by the parameter θ, which is generated from a Dirichlet distribution with α as its hyperparameter. During learning, the hyperparameters are learned from the given example data by a variational method or the like.
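The generative process described here can be illustrated with a small sampler. This follows the standard LDA formulation (draw θ from a Dirichlet distribution, then for each word draw a topic from θ and a word from that topic) and is only a sketch of the forward model, not of the variational learning step. The names `sample_dirichlet` and `generate_document` are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a mixing ratio theta from a Dirichlet distribution with parameter alpha."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topics, n_words, seed=0):
    """Generate n_words word indices from K topics.

    topics[k][v] is the probability that topic Z_k generates word w_v.
    Returns the sampled mixing ratio theta and the list of word indices.
    """
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        k = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this word
        v = rng.choices(range(len(topics[k])), weights=topics[k])[0]  # word from topic
        words.append(v)
    return theta, words
```

Learning runs this model in reverse: given observed words (in the image setting, codebook entries), it estimates the topics and mixing ratios that most plausibly generated them.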

By treating the topics Z_k as latent variables of the image feature amounts and taking the bag-of-words codebook as the vocabulary w_1, ..., w_V, LDA can be applied to image recognition. Each topic Z_k extracted by LDA is then a probability distribution over the occurrence of the codebook entries w_1, ..., w_V. LDA itself is disclosed in Non-Patent Document 11, and its application to image recognition in Non-Patent Document 10 and elsewhere; see those documents for details.

Next, the detailed operation of the region component extraction unit 104 using LDA will be described with reference to the flowchart of FIG. 4. First, the region component extraction unit 104 receives from the similar image acquisition unit 102 the similar region group and the similar image group containing the similar regions. Next, the region component extraction unit 104 reads the region division results for all similar images (S401, S402). These are the same region division results that were used earlier in the operation of the similar image acquisition unit 102: multiple division results obtained by varying the control parameters and stored in the image database 103. Next, from all the divided regions that were read, the regions that overlap a similar region are extracted (S403, S404, S405).

The step of extracting regions that overlap a similar region (S403, S404, S405) is important for improving the quality of the similar regions, so it is illustrated schematically in FIG. 5. FIG. 5 shows a candidate region 501 in the input image and a similar region 502 extracted as being related to the candidate region 501. It also shows regions 503a, 503b, and 503c that overlap the similar region 502. Regions that overlap part of the similar region 502 are extracted together with the similar region. By also extracting the overlapping regions 503 (503a, 503b, 503c) as data samples, more suitable data for extracting the region components can be expected even when the similar region 502 covers only part of the subject.
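The overlap extraction of steps S403 to S405 can be sketched as below. This is an illustrative sketch only: representing each region as a set of pixel coordinates, and treating any shared pixel as an overlap, are assumptions, and the function name `overlapping_regions` is hypothetical.

```python
def overlapping_regions(similar_region, segments):
    """Return the segments that share at least one pixel with similar_region.

    similar_region : set of (row, col) pixel coordinates
    segments       : list of sets of (row, col), one per divided region
    """
    return [seg for seg in segments if seg & similar_region]

similar = {(0, 0), (0, 1), (1, 0), (1, 1)}
segments = [
    {(1, 1), (1, 2), (2, 1)},  # overlaps the similar region at (1, 1)
    {(3, 3), (3, 4)},          # disjoint from the similar region
    {(0, 0), (0, 1)},          # contained in the similar region
]
# The similar region and every overlapping region together form the data samples.
samples = [similar] + overlapping_regions(similar, segments)
```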

Returning to FIG. 4, the processing from step S401 to step S406 is repeated for all similar images.

Next, the region component extraction unit 104 reads from the image database 103 the feature amounts of all the regions obtained in steps S403 to S405 and, using them as data samples, executes LDA, the technique described above (S407). As the result of LDA, K topics Z_1, ..., Z_K are obtained and taken as the region components (S408). Although a predetermined number K is used here as the number of region components, a fairly large K may instead be chosen in advance and the topics whose mixing-ratio parameter θ_k is small may then be deleted. In that case, a variable number of topics, no greater than K, is obtained.
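The optional pruning of topics with a small mixing ratio might look like the following sketch. The cut-off value `min_ratio` and the function name `prune_topics` are assumptions; the text does not state how small θ_k must be for a topic to be deleted.

```python
def prune_topics(topics, theta, min_ratio):
    """Keep only the topics whose mixing ratio theta_k is at least min_ratio."""
    return [z for z, t in zip(topics, theta) if t >= min_ratio]

topics = ["Z_1", "Z_2", "Z_3", "Z_4"]  # placeholders for the topic distributions
theta = [0.50, 0.02, 0.45, 0.03]       # toy mixing ratios
region_components = prune_topics(topics, theta, min_ratio=0.05)
```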

(Region identification unit 105)
Next, the operation of the region identification unit 105 will be described in detail. The K region components Z_1, ..., Z_K extracted by the region component extraction unit 104 are sent to the region identification unit 105. The region identification unit 105 then identifies the category of each pixel of the input image based on these region components.

The specific operation flow will be described with reference to FIG. 6. For each pixel in the input image, region division is performed centered on that pixel (S601 to S602). The Bag-of-words feature amount of each region is calculated (S603). The Kullback-Leibler distance (KL distance) between this feature amount and each of the K region components Z_1, ..., Z_K is then obtained; the distance is calculated for all K region components (S604 to S606).
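The distance computation of steps S604 to S606 can be sketched as follows. The direction of the divergence (feature relative to component) and the small smoothing constant `eps` used to avoid taking the logarithm of zero are assumptions; neither is specified in the text, and the function names are hypothetical.

```python
import math

def kl_distance(p, q, eps=1e-10):
    """Kullback-Leibler divergence D(p || q) between two histograms.

    Both inputs are smoothed by eps and renormalized, so empty bins
    do not produce log(0) or division by zero.
    """
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))

def distances_to_components(feature, components):
    """KL distance from one Bag-of-words feature to each of the K components."""
    return [kl_distance(feature, z) for z in components]

feature = [3, 1, 0, 0]  # toy codeword counts for one region
components = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]  # toy Z_1, Z_2
dists = distances_to_components(feature, components)
nearest = min(range(len(dists)), key=dists.__getitem__)  # index of closest component
```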

In step S607, it is determined whether the smallest of the K Kullback-Leibler distances (KL distances) is equal to or less than a threshold. If the minimum KL distance is not equal to or less than the predetermined threshold (S607-No), the process proceeds to step S608 and the pixel of interest is labeled with the region component having the minimum KL distance: the k-th label is assigned on the basis that the pixel of interest belongs to the closest region component Z_k (S608). If, on the other hand, the minimum KL distance is equal to or less than the predetermined threshold (S607-Yes), the process proceeds to step S609 and the pixel of interest is treated as a blank region: a pixel whose KL distance is equal to or less than the predetermined threshold is regarded as belonging to no region and is assigned no label (S609). Consequently, the number of label types is at most K + 1, including the region whose attribution is undetermined. The processing from step S601 to step S610 is repeated for all pixels of the input image, and the final labeling result is output as the region division result (S611). Note that the labels given to the regions here are only type numbers; no semantic labels such as “cow” or “grassland” are identified.
By performing the above processing, the input image can be divided into different regions according to the type of subject without using teacher data for the subjects.

(Second Embodiment)
As a second embodiment of the present invention, an extended method that can further improve the accuracy of the region division performed in the first embodiment will be described. FIG. 9 shows a block diagram of the image processing apparatus 10 of this embodiment. Processing elements with the same reference numerals as in FIG. 1 operate in the same way, and their description is omitted. In the configuration shown in FIG. 9, the processing from the region dividing unit 101 onward is repeated until the region division result of the region identification unit 105 no longer changes (until the change falls to or below a predetermined change rate). In this respect, the configuration of FIG. 9 differs from that of FIG. 1 in the first embodiment.

The operation of the image processing apparatus 10 of this embodiment will be described in detail with reference to the overall flowchart of FIG. 10. First, the same processing as in the first embodiment is performed to obtain a region division result from the input image. The region division result output from the region identification unit 105 is acquired as a candidate region group (S1001). Next, the region dividing unit 101 (second dividing unit) applies a graph cut algorithm to each candidate region of the region division result acquired in step S1001. Specifically, the pixels of each region and the pixels of all other regions are given as the initial values of the foreground and the background, respectively, and the graph cut algorithm is applied. At this time, the parameters of the graph cut algorithm are varied to obtain a plurality of region division results (S1002 to S1006). The region dividing unit 101 (second dividing unit) takes the candidate regions obtained in this way as a new candidate region group (second candidate region group). For the new candidate region group (second candidate region group), similar regions are searched for again by the same processing as in the first embodiment, and the region division processing is performed (S1007).

In step S1007, a similar image group similar to each candidate region of the new candidate region group is acquired from the image database. Features common to the similar images are extracted from the similar image group as feature components of the candidate region. The extracted feature components are compared with the feature amount of each pixel of the input image, each pixel is labeled to indicate which candidate region of the new candidate region group it belongs to, and the labeling result is output as the result of region division for each subject (result of the second region division).

In step S1008, it is determined whether the result of the second region division has converged with respect to the result of the region division. If it is determined that the result of the second region division has converged, the region identification step outputs the result of the second region division as the result of region division for each subject. If it is determined that the result has not converged, the above processing is repeated until the region division result no longer changes (for example, until the change falls to or below a predetermined change rate) (S1008). With this arrangement, the accuracy of the region division result can be expected to improve with each iteration.
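The convergence test of step S1008 can be sketched as a loop that stops once the fraction of relabeled pixels falls to or below a change rate. In this sketch the callable `refine` is a hypothetical stand-in for one pass of steps S1001 to S1007, and the iteration cap `max_iters` is an added safeguard that the text does not mention.

```python
def change_rate(prev_labels, new_labels):
    """Fraction of pixels whose label differs between two labelings (flat lists)."""
    if len(prev_labels) != len(new_labels):
        raise ValueError("labelings must cover the same pixels")
    if not prev_labels:
        return 0.0
    changed = sum(1 for a, b in zip(prev_labels, new_labels) if a != b)
    return changed / len(prev_labels)

def iterate_until_converged(labels, refine, max_change=0.01, max_iters=20):
    """Repeat refine() until the labeling changes by at most max_change."""
    for _ in range(max_iters):
        new_labels = refine(labels)
        if change_rate(labels, new_labels) <= max_change:
            return new_labels
        labels = new_labels
    return labels

# Toy stand-in for one re-segmentation pass: clamp every label to at most 1.
toy_refine = lambda ls: [min(l, 1) for l in ls]
result = iterate_until_converged([0, 2, 1, 3], toy_refine)
```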

Although a method of generating a new candidate region group from the first region division result by graph cut has been shown here, the method of generating the candidate region group is not limited to this. The following is another possible embodiment. For example, the same processing as in the first embodiment is performed to obtain the first division result. Next, from the first candidate region group, only those candidate regions that overlap the first division result by a predetermined ratio or more are extracted. These can be taken as the second candidate region group, and region division can be performed again.

(Third Embodiment)
As a third embodiment of the present invention, a method of identifying subject names such as “cow” or “grassland” (subject recognition) for the result of the region division performed in the first embodiment will be described. The subject recognition method of this embodiment is an extension of the first embodiment. FIG. 11 shows a block diagram of the image processing apparatus 10 of this embodiment. The image processing apparatus 10 shown in FIG. 11 further includes a region label identification unit 106 in addition to the configuration of the block diagram shown in FIG. 1. The region label identification unit 106 uses the tags associated with each image stored in the image database to set annotations on the region division result of the region identification unit 105, and outputs them as the recognition result of the subjects in the input image. The image database 103 of FIG. 11 is assumed to store tags associated with each image. The tags are assumed either to have been extracted from related information such as web page headings when the images were automatically collected from web pages, or to have been attached manually, as in image databases such as Flickr.

The operation flow of the region label identification unit 106 of the image processing apparatus 10 of this embodiment will be described with reference to FIG. 12. First, it is assumed that the K candidate region categories and their region components have already been obtained (S1201). Next, example data (similar images) of regions are obtained for each category. To do so, a hash key is first created from the feature amount of the region component (S1202), using the same method as described in the first embodiment. Next, a similar region group is obtained from the hash table (S1203). Next, all tags of the images containing the similar region group are read from the image database 103 (S1204). Next, the words that appear most frequently in the set of tags are extracted and taken as the annotation result (S1205). Extraction by frequency of appearance is the simplest way of extracting words; more accurate annotation is possible if the ontological distance between words is considered or the rarity of each word is taken into account. Techniques for setting such annotations have long been studied, and this embodiment does not limit the means of realization to any one of them. Prior techniques such as Non-Patent Document 14 are cited here as reference methods, and a detailed description is omitted.
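The frequency-based word extraction of step S1205 can be sketched as below. Counting each tag at most once per image and returning the `top_n` most frequent tags are assumptions made for this illustration; the function name `annotate` is hypothetical.

```python
from collections import Counter

def annotate(tag_lists, top_n=1):
    """Return the top_n tags that occur in the largest number of images.

    tag_lists : one list of tags per image containing a similar region
    """
    # set(tags) counts a tag once per image even if it is repeated there.
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return [tag for tag, _ in counts.most_common(top_n)]

tag_lists = [
    ["cow", "grass", "farm"],
    ["cow", "sky"],
    ["grass", "cow"],
]
annotation = annotate(tag_lists)
```

Weighting by word rarity, as the text suggests, would replace the raw counts with a score that discounts tags common across the whole database.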

(Fourth Embodiment)
As a fourth embodiment of the present invention, another form derived from the operations of the region component extraction unit 104 and the region identification unit 105 described in the first embodiment will be described. This embodiment is partly identical to the first embodiment and differs only in the operations of the region component extraction unit 104 and the region identification unit 105. To avoid repetition, only the operations of these two units are described here.

In this embodiment, the graph structure of image similarity is used in the operations of the region component extraction unit 104 and the region identification unit 105. Non-Patent Document 12 discloses a technique called link analysis, which is one example of a method applicable to this purpose. It is the same kind of algorithm that Internet search engines use to extract web pages on which links are concentrated as pages of high importance.

Non-Patent Document 12 is outlined below. This technique cuts out a plurality of regions from the images of a database by a predetermined method. Each cut-out region is regarded as a node, and a graph is created in which the similarity between regions forms the edges between nodes. Next, the nodes that form the core of this graph structure are found and taken as representative example images of the subjects contained in the image database. Such a node is called a hub. There may be more than one hub.

Next, the details of a form in which the method of Non-Patent Document 12 is applied to the region component extraction unit 104 will be described. The region component extraction unit 104 first cuts out a region group from the similar image group obtained in the preceding processing. The regions may be cut out by the same method as the region dividing unit 101, or by a method using superpixels as described in Non-Patent Document 12. Next, with each region as a node, the distances between the feature amounts of the regions are obtained and a graph is created. Next, K nodes that serve as hubs of the graph are extracted. The K hubs can be regarded as representative example images of K categories.
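Graph construction and hub extraction might be sketched as follows. Note the assumptions: a PageRank-style score is used here as a stand-in for the link analysis of Non-Patent Document 12, whose exact procedure is not reproduced in the text; the connection radius, damping factor, and function names are likewise hypothetical.

```python
import math

def build_graph(features, radius):
    """Connect regions whose feature distance is at most radius.

    Returns {node: {neighbor: distance}} with symmetric edges.
    """
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            if d <= radius:
                graph[i][j] = d
                graph[j][i] = d
    return graph

def top_hubs(graph, k, iters=50, damping=0.85):
    """Rank nodes by a PageRank-style score and return the k highest."""
    n = len(graph)
    score = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in graph}
        for v, nbrs in graph.items():
            if not nbrs:
                continue  # simplification: an isolated node passes on no score
            share = damping * score[v] / len(nbrs)
            for u in nbrs:
                new[u] += share
        score = new
    return sorted(graph, key=lambda v: score[v], reverse=True)[:k]

# Toy features: node 3 lies between nodes 0-2, node 4 is an outlier.
features = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (10, 10)]
graph = build_graph(features, radius=0.8)
hubs = top_hubs(graph, k=1)
```

Taking the k top-scoring nodes can return several hubs from the same cluster; a practical version would likely need an additional step to spread the hubs across categories.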

Next, the region identification unit 105 will be described. For each pixel of the input image, the region identification unit 105 generates one region by the same method as described in the first embodiment. The similar region most similar to this region is retrieved from the image database 103. Next, it is determined which of the K categories this similar region belongs to. Specifically, the geodesic distance from the similar region to each hub is obtained, and the pixel of interest is assigned to the category of the nearest hub. Instead of a single similar region, a predetermined number of similar regions may be retrieved and the category determined by voting.
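The assignment to the nearest hub by geodesic distance can be sketched with a shortest-path search over the similarity graph. Using Dijkstra's algorithm for the geodesic distance and the adjacency-dictionary graph format are assumptions of this sketch; the node names and edge weights are toy values.

```python
import heapq

def geodesic_distances(graph, source):
    """Shortest-path (geodesic) distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, w in graph[v].items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def nearest_hub(graph, node, hubs):
    """Index (category) of the hub closest to node, or None if none is reachable."""
    dist = geodesic_distances(graph, node)
    reachable = [(dist[h], k) for k, h in enumerate(hubs) if h in dist]
    return min(reachable)[1] if reachable else None

graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 5.0},
    "d": {"c": 5.0},
    "e": {},  # isolated region
}
hubs = ["a", "d"]  # hub of category 0 and hub of category 1
category = nearest_hub(graph, "b", hubs)
```

For the voting variant, `nearest_hub` would be called once per retrieved similar region and the most frequent category taken.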

The above describes an embodiment of a region division method using the network structure of image similarity. In this embodiment, by using a graph of the similarity between regions to extract the region components, the region components can be extracted while taking better account of the manifold nature of images.

(Fifth Embodiment)
As another form of the image processing method (region division method) according to the present invention, a method of performing region division of candidate regions based on region category components obtained by arbitrary means will be described. In the first embodiment the region components were obtained from similar images, but the configuration for obtaining the region components is not limited to this. This embodiment describes another form applicable when the region components are obtained by an external configuration. FIG. 13 shows a configuration example of the image processing apparatus 10 of this embodiment, and FIG. 14 shows its operation flow. The candidate region group divided by the region dividing unit 101 is sent to the region identification unit 105. The region identification unit 105 reads the K region components from the region component database 107 and then calculates a feature amount for each candidate region (S1401, S1402). The Kullback-Leibler distance to each of the K region components is calculated (S1403, S1404, S1405). The above processing is repeated for all candidate regions of the input image (S1406).

Next, the following processing is performed for each pixel of the input image. The candidate regions containing the pixel of interest are selected, and from among them the candidate region with the smallest distance to any of the region components is chosen (S1407, S1408). If this Kullback-Leibler distance (KL distance) is equal to or less than the threshold (S1409-Yes), the pixel is assigned to the category of that region component (S1410). If, on the other hand, the distance is greater than the threshold (S1409-No), the pixel of interest is treated as a blank region and is not labeled (S1411). The above processing is repeated for all pixels of the input image (S1412). The labeling result for all pixels is then output as the region division result, and the processing ends (S1413).
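The per-pixel labeling of steps S1407 to S1411 can be sketched as follows, assuming the KL distances between every candidate region and every region component have already been computed. The data layout (regions as sets of pixel coordinates, `distances[r][k]` as a nested list) and the function name `label_pixel` are assumptions for illustration.

```python
def label_pixel(pixel, regions, distances, threshold):
    """Return the component index assigned to pixel, or None for a blank pixel.

    regions   : list of sets of (row, col), the candidate regions
    distances : distances[r][k] = KL distance of region r to component k
    """
    best = None  # (distance, component index) over all regions containing pixel
    for r, region in enumerate(regions):
        if pixel not in region:
            continue
        for k, d in enumerate(distances[r]):
            if best is None or d < best[0]:
                best = (d, k)
    if best is not None and best[0] <= threshold:
        return best[1]
    return None  # no containing region, or closest component too far away

regions = [{(0, 0), (0, 1)}, {(0, 1), (1, 1)}]
distances = [[0.9, 0.2], [0.05, 0.8]]  # toy KL distances, K = 2
pixels = [(0, 0), (0, 1), (1, 1), (2, 2)]
labels = {p: label_pixel(p, regions, distances, threshold=0.3) for p in pixels}
```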

(Other Embodiments)
The present invention can also be realized by executing the following processing: software (a program) that realizes the functions of the above embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads out and executes the program.

Claims

An image processing method for dividing an input image into regions for each subject in the input image,
A dividing step of dividing the input image into candidate region groups representing a plurality of region candidates for each subject;
An acquisition step in which acquisition means acquires, from an image database, a similar image group similar to each candidate region of the candidate region group;
An extraction step in which extraction means extracts a feature common to the similar images from the similar image group as a feature component of the candidate region;
A region identification step in which region identification means compares the feature component with the feature amount of each pixel of the input image, identifies which candidate region each pixel belongs to, and outputs the result as a result of region division;
An image processing method comprising:

The image processing method according to claim 1, further comprising a second dividing step in which second dividing means divides the result of the region division output in the region identification step into a new candidate region group representing a plurality of region candidates for each subject, wherein
The acquisition step acquires a similar image group similar to each candidate region of the new candidate region group divided in the second dividing step from the image database,
The extraction step extracts features common to similar images from the similar image group as feature components of candidate regions, and
The region identification step compares the extracted feature component with the feature amount of each pixel of the input image, identifies which candidate region of the new candidate region group each pixel belongs to, and outputs the result as a result of second region division.

The image processing method according to claim 2, wherein the region identification step includes a determination step of determining whether the result of the second region division has converged with respect to the result of the region division,
when it is determined in the determination step that the result of the second region division has converged, the region identification step outputs the result of the second region division as a result of region division for each subject, and
when it is determined in the determination step that the result of the second region division has not converged, the second dividing step divides the result of the second region division into a new candidate region group representing a plurality of region candidates for each subject.

The image processing method according to claim 1, further comprising a setting step in which setting means sets an annotation on the result of region division in the region identification step, using a tag associated with each image stored in the image database, and outputs the result as a recognition result of the subject in the input image.

An image processing apparatus that divides an input image into regions for each subject in the input image,
Dividing means for dividing the input image into candidate region groups representing candidates for a plurality of regions for each subject;
Acquisition means for acquiring a similar image group similar to each candidate area of the candidate area group from an image database;
Extraction means for extracting features common to similar images from the similar image group as feature components of candidate regions;
Region identification means for comparing the feature component with the feature amount of each pixel of the input image, identifying which candidate region each pixel belongs to, and outputting the result as a result of region division;
An image processing apparatus comprising:

The image processing apparatus according to claim 5, further comprising second dividing means for dividing the result of the region division output by the region identification means into a new candidate region group representing a plurality of region candidates for each subject, wherein
The acquisition means acquires a similar image group similar to each candidate region of the new candidate region group divided by the second dividing means from the image database,
The extraction means extracts features common to similar images from the similar image group as feature components of candidate regions, and
The region identification means compares the extracted feature component with the feature amount of each pixel of the input image, identifies which candidate region of the new candidate region group each pixel belongs to, and outputs the result as a result of second region division.

The image processing apparatus according to claim 6, wherein the region identification means has determination means for determining whether the result of the second region division has converged with respect to the result of the region division,
when the determination means determines that the result of the second region division has converged, the region identification means outputs the result of the second region division as a result of region division for each subject, and
when the determination means determines that the result of the second region division has not converged, the second dividing means divides the result of the second region division into a new candidate region group representing a plurality of region candidates for each subject.

The image processing apparatus according to claim 5, further comprising setting means for setting an annotation on the result of region division by the region identification means, using a tag associated with each image stored in the image database, and outputting the result as a recognition result of the subject in the input image.

An imaging device comprising:
Imaging means for capturing an image; and
The image processing apparatus according to any one of claims 5 to 8, which takes the image captured by the imaging means as an input image and divides the input image into regions for each subject in the input image.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 5 to 8.