JP6699048B2

JP6699048B2 - Feature selecting device, tag related area extracting device, method, and program

Info

Publication number: JP6699048B2
Application number: JP2016141830A
Authority: JP
Inventors: 数藤　恭子; 崇之梅田; 和彦村崎; 光太山口; 岡谷　貴之
Original assignee: Tohoku University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Tohoku University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2016-07-19
Filing date: 2016-07-19
Publication date: 2020-05-27
Anticipated expiration: 2036-07-19
Also published as: JP2018013887A

Description

The present invention relates to a feature selection device, a tag-related area extraction device, a method, and a program.

With the improvement of communication environments, the spread of devices with image-capture functions (digital cameras, smartphones, tablets, etc.), and the accompanying growth of SNS (social networking service) and EC (electronic commerce) sites, the number of images distributed over networks has become enormous. To organize and search such large volumes of content efficiently, there is growing demand for techniques that analyze images automatically.

One such analysis technique automatically extracts the region of a specific object contained in an image using a detector trained in advance (see, e.g., Non-Patent Documents 1 and 2).

Another known technique extracts image regions that are likely to contain objects and feeds each extracted region to a Deep Convolutional Neural Network (DCNN) already trained for object recognition, thereby determining whether each region contains an object (see, e.g., Non-Patent Document 3).

Non-Patent Document 1: Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008, June). "A discriminatively trained, multiscale, deformable part model." In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1-8). IEEE.
Non-Patent Document 2: Malisiewicz, T., Gupta, A., & Efros, A. A. (2011, November). "Ensemble of exemplar-SVMs for object detection and beyond." In Computer Vision (ICCV), 2011 IEEE International Conference on (pp. 89-96). IEEE.
Non-Patent Document 3: Girshick, R. (2015). "Fast R-CNN." In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440-1448).

Generating detectors such as those described in Non-Patent Documents 1 and 2 requires a large amount of training data in which images containing a specific object are associated with region information for that object within each image.

Likewise, training a DCNN for object recognition as described in Non-Patent Document 3 requires a large set of images and tags. Although this training data does not explicitly require the region of a specific object in each image, it basically consists of images showing a single object, so training data that implicitly contains region information for a specific object is still needed.

As noted above, SNS sites hold large numbers of images that users have tagged and posted, and EC sites hold large numbers of clothing images to which content providers have attached tags such as colors and patterns. However, producing the training data conventionally required, namely data in which region information for each tag is linked in addition to the image and the tag, takes considerable manual effort.

A further problem is that it is unclear which image features are effective for identifying the region that corresponds to a tag in an image.

The present invention was made to solve the above problems, and one object is to provide a feature selection device, method, and program capable of acquiring image features that are effective for image identification.

Another object is to provide a tag-related area extraction device, method, and program capable of accurately extracting the area associated with a tag in an image.

To achieve the above objects, the feature selection device according to the present invention includes: an image feature generation unit that, based on a positive set (a set of images to which a tag representing a specific object contained in an image is attached) and a negative set (a set of images to which the tag is not attached), generates, for each of a plurality of image features obtained from the images, a histogram representing the distribution of that image feature over the images in the positive set and a histogram representing its distribution over the images in the negative set; an image feature distribution comparison unit that calculates, for each of the plurality of image features, the distance between the positive-set histogram and the negative-set histogram generated by the image feature generation unit; and a feature descriptor selection unit that selects the N image features with the largest distances calculated by the image feature distribution comparison unit.

The feature selection method according to the present invention includes: a step in which an image feature generation unit, based on the positive set (a set of images to which a tag representing a specific object contained in an image is attached) and the negative set (a set of images to which the tag is not attached), generates, for each of a plurality of image features obtained from the images, a histogram representing the distribution of that image feature over the images in the positive set and a histogram representing its distribution over the images in the negative set; a step in which an image feature distribution comparison unit calculates, for each of the plurality of image features, the distance between the positive-set histogram and the negative-set histogram generated by the image feature generation unit; and a step in which a feature descriptor selection unit selects the N image features with the largest distances calculated by the image feature distribution comparison unit.

The image feature generation unit of the present invention may also, based on the positive set, the negative set, and a pre-trained neural network, input each image in the positive set and each image in the negative set into the neural network and, treating the output of each unit of the neural network as one of the plurality of image features, generate for each unit a histogram representing the distribution of its output over the positive-set images and a histogram representing the distribution of its output over the negative-set images. The feature descriptor selection unit may then select the outputs of the N units with the largest distances calculated by the image feature distribution comparison unit.

The tag-related area extraction device of the present invention includes: the above feature selection device; a mask generation unit that generates a plurality of masks of different sizes for masking an input image; a feature descriptor generation unit that, for the output of each unit selected by the feature descriptor selection unit, inputs into the neural network each mask image (the input image masked by each of the masks generated by the mask generation unit) and generates, as an image feature descriptor, an image representing the average of that unit's outputs obtained from the mask images; a feature descriptor normalization unit that, for the output of each selected unit, generates a normalized image feature descriptor by normalizing the image feature descriptor generated by the feature descriptor generation unit; and a tag relevance calculation unit that calculates, for each pixel of the input image, the relevance between the tag and that pixel by summing the normalized image feature descriptors generated by the feature descriptor normalization unit, each weighted according to the distance obtained by the image feature distribution comparison unit for the corresponding selected unit.

The tag relevance calculation unit may further extract, as the area associated with the tag, the region consisting of pixels whose relevance is equal to or greater than a predetermined value.

A CNN (Convolutional Neural Network) may be used as the neural network.

The program according to the present invention causes a computer to function as each unit of the above feature selection device or the above tag-related area extraction device.

According to the feature selection device, method, and program of the present invention, for each of a plurality of image features obtained from images, a histogram of the feature over the positive-set images and a histogram of the feature over the negative-set images are generated, the distance between the two histograms is calculated for each image feature, and the N image features with the largest distances are selected as image feature descriptors. This makes it possible to acquire image features that are effective for image identification.

According to the tag-related area extraction device, method, and program of the present invention, each image in the positive set and each image in the negative set is input into a neural network; for the output of each unit, the distance between the positive-set histogram and the negative-set histogram is calculated, and the outputs of the N units with the largest distances are selected. A plurality of masks of different sizes are generated for the input image; for the output of each selected unit, each mask image (the input image masked by each mask) is input into the neural network, and an image representing the average of that unit's outputs over the mask images is generated as an image feature descriptor and then normalized. Finally, the relevance between the tag and each pixel of the input image is calculated by summing the normalized image feature descriptors with weights according to the distances of the corresponding units. This makes it possible to accurately extract the area associated with a tag in an image.

FIG. 1 is a block diagram showing the configuration of the tag-related area extraction device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example configuration of the tag-related area extraction unit of the tag-related area extraction device according to the embodiment.
FIG. 3 is a flowchart showing the tag-related area extraction processing routine in the tag-related area extraction device according to the embodiment.
FIG. 4 is a diagram showing an example of experimental results obtained using the embodiment.
FIG. 5 is a diagram showing an example of experimental results obtained using the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Overview of the Embodiment of the Present Invention>

In the embodiment of the present invention, the area strongly associated with a tag is extracted from an image by using both the change in the distribution of image features depending on whether the tag is present and the differences between the feature descriptors obtained from each region of the image.

Specifically, some local feature descriptor is applied to an image set, and the distance between the distribution of image features obtained from the images to which a tag is attached and the distribution obtained from the images to which it is not attached is calculated; tags are then selected in descending order of this distance as tags that are easy to recognize. By preparing many feature descriptors, one obtains, together with the tags whose feature distributions are far apart, the feature descriptors that produce those large distances. These can be regarded as feature descriptors that are useful for visual recognition.

Using these descriptors, an image feature description characteristic of the tagged image set can also be obtained. By using the feature descriptors thus obtained to compare this characteristic image feature description with the feature descriptions obtained from each region of an image, the regions strongly associated with the tag can be obtained.

The following description assumes, as the tagged image set, fashion coordination (outfit) images posted to an SNS together with their associated tags.

<Configuration of the Tag-Related Area Extraction Device According to the Embodiment of the Present Invention>

Next, the configuration of the tag-related area extraction device according to the embodiment of the present invention is described. As shown in FIG. 1, the tag-related area extraction device 100 according to the embodiment can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the tag-related area extraction processing routine described later. Functionally, as shown in FIG. 1, the tag-related area extraction device 100 includes an input unit 10, a calculation unit 20, and an output unit 40. The tag-related area extraction device 100 extracts areas related to a tag from a set of tagged images.

The input unit 10 receives a tagged image set D. A tag indicates that an image contains a specific object. For each tag u, the tagged image set D includes a positive set D_u^+, the set of images to which tag u is attached, and a negative set D_u^-, the set of images to which tag u is not attached. The input unit 10 also receives, for each tag u, an input image I(x, y) to which that tag is attached.
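To make the structure of D concrete, the following is a minimal Python sketch, assuming each image is represented only by an ID and the set of tags attached to it. The function `split_by_tag` and the toy data are hypothetical and not part of the patent.

```python
from typing import Iterable, List, Set, Tuple


def split_by_tag(dataset: Iterable[Tuple[str, Set[str]]],
                 tag: str) -> Tuple[List[str], List[str]]:
    """Split (image_id, tags) pairs into the positive set D_u^+ and negative set D_u^-."""
    positive: List[str] = []
    negative: List[str] = []
    for image_id, tags in dataset:
        # An image belongs to D_u^+ exactly when tag u is attached to it.
        (positive if tag in tags else negative).append(image_id)
    return positive, negative


# Hypothetical toy data: image IDs with their attached tags.
D = [("img1", {"red", "floral"}), ("img2", {"white"}), ("img3", {"red"})]
pos, neg = split_by_tag(D, "red")
print(pos, neg)  # ['img1', 'img3'] ['img2']
```

Every image appears in exactly one of the two sets for a given tag, so the same collection yields a different positive/negative split for each tag u.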

The calculation unit 20 includes an image set database 22, an image feature generation unit 24, an image feature distribution comparison unit 26, a feature descriptor selection unit 28, and a tag-related area extraction unit 30.

The image set database 22 stores the tagged image set D received by the input unit 10, which contains the positive set D_u^+ and the negative set D_u^- for each tag u.

For each tag u, the image feature generation unit 24 generates image features for all images in the positive set D_u^+ and the negative set D_u^- stored in the image set database 22, using a pre-trained neural network.

Specifically, the image feature generation unit 24 first inputs each image in the positive set D_u^+ and each image in the negative set D_u^- stored in the image set database 22 into the pre-trained neural network.

This embodiment describes, as an example, the case where the output of each unit of a pre-trained neural network is used as an image feature obtained from the image, and where a Convolutional Neural Network (CNN) is used as that pre-trained network. A CNN can be regarded as a local feature descriptor: it holds many filters internally, and the output of each filter can be used as a separate feature descriptor. In this embodiment, the CNN filters have been trained in advance, for example on an image dataset for object recognition. Hereinafter, a filter used in each layer of the CNN is called a unit.

Next, for each unit i of the neural network (each unit's output serving as one image feature), the image feature generation unit 24 generates a histogram P_i^+ representing the distribution of that unit's output over the images in the positive set D_u^+ and a histogram P_i^- representing the distribution of its output over the images in the negative set D_u^-.
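The text does not state how a unit's output (a feature map, for a convolutional unit) is reduced to the values that are histogrammed, nor how the bins are chosen. The sketch below assumes one scalar per image and unit (for example, a spatially averaged activation) and a fixed value range shared by the positive and negative sets, so that the bins of P_i^+ and P_i^- line up. `unit_histogram` and its parameters are illustrative names.

```python
from typing import List, Sequence


def unit_histogram(values: Sequence[float], lo: float, hi: float,
                   bins: int, eps: float = 1e-6) -> List[float]:
    """Normalized histogram of one unit's outputs, one value per image.

    `eps` pseudo-counts (an assumption) keep every bin non-zero so that a later
    KL computation never divides by zero.
    """
    width = (hi - lo) / bins
    counts = [0.0] * bins
    for v in values:
        # Map the value to a bin index, clamping out-of-range values to the edges.
        k = min(max(int((v - lo) / width), 0), bins - 1)
        counts[k] += 1.0
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]


# Hypothetical pooled outputs of one unit i over positive and negative images.
pos_outputs = [0.8, 0.9, 0.7, 0.85]
neg_outputs = [0.1, 0.3, 0.2, 0.9]
P_pos = unit_histogram(pos_outputs, 0.0, 1.0, bins=4)
P_neg = unit_histogram(neg_outputs, 0.0, 1.0, bins=4)
```

Sharing `lo`, `hi`, and `bins` between the two sets is the design choice that matters here: histograms with different bin edges cannot be compared bin by bin.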

For each tag u and each unit i, the image feature distribution comparison unit 26 calculates the distance between the histogram P_i^+ of the positive set D_u^+ and the histogram P_i^- of the negative set D_u^-, both generated by the image feature generation unit 24.

In this embodiment, the Kullback-Leibler distance (hereinafter, KL distance) is used as an example of the distance between the histograms P_i^+ and P_i^-.

For tag u, the KL distance S_i(u|D) between the positive set D_u^+ and the negative set D_u^-, obtained from the tagged image set D stored in the image set database 22, is computed by equation (1) as a sum over the histogram bins x, where x denotes a value output by the unit.
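The equation itself is not reproduced in this text. Assuming the standard Kullback-Leibler form taken from the positive-set histogram to the negative-set histogram (the direction is an assumption, since the KL divergence is asymmetric), equation (1) would read:

```latex
S_i(u \mid D) = \sum_{x} P_i^{+}(x)\,\log \frac{P_i^{+}(x)}{P_i^{-}(x)} \qquad (1)
```

The value is zero when the two histograms coincide and grows as the positive-set distribution moves away from the negative-set distribution.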

For example, if u is a tag that is not easy to recognize visually, the distribution of image features in the positive set D_u^+ is close to random, so the difference between the image feature distributions of D_u^+ and D_u^- is small. Conversely, if u is a tag that is easy to recognize visually, the difference between the image feature distributions of D_u^+ and D_u^- is large.
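The contrast described in this paragraph can be checked numerically. The following sketch computes S_i(u|D) under an assumed direction (P_i^+ relative to P_i^-); the histograms are made-up illustrations, not results from the patent, and `kl_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def kl_distance(p_pos: Sequence[float], p_neg: Sequence[float]) -> float:
    """S_i(u|D) = sum_x P_i^+(x) * log(P_i^+(x) / P_i^-(x)).

    Assumes both histograms are normalized and that P_i^- has no zero bin where
    P_i^+ is non-zero (e.g. after adding pseudo-counts).
    """
    # Bins with P_i^+(x) == 0 contribute 0 by the convention 0 * log 0 = 0.
    return sum(p * math.log(p / q) for p, q in zip(p_pos, p_neg) if p > 0)


# Illustrative (made-up) histograms, not data from the patent.
negative = [0.25, 0.25, 0.25, 0.25]
hard_tag = [0.26, 0.24, 0.25, 0.25]  # positive set barely differs from negative set
easy_tag = [0.70, 0.10, 0.10, 0.10]  # positive set concentrated, e.g. a color tag
print(kl_distance(hard_tag, negative) < kl_distance(easy_tag, negative))  # True
```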

Consequently, for color-name tags such as "red" and "white", or texture tags such as "border" (horizontal stripes) and "floral", the KL distance S_i(u|D) becomes large.

The image feature distribution comparison unit 26 then outputs the KL distance S_i(u|D) for each of the units.

For each tag u, the feature descriptor selection unit 28 selects, as image features, the N units with the largest distances calculated by the image feature distribution comparison unit 26.

Specifically, the feature descriptor selection unit 28 takes as input the KL distances S_i(u|D) calculated by the image feature distribution comparison unit 26, selects N units as image features in descending order of S_i(u|D), and denotes the set of these top-N units by θ_u.
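Selecting θ_u amounts to sorting the units by S_i(u|D) and keeping the first N. A minimal sketch with hypothetical unit IDs and scores; it also returns the sum Z of the selected scores, which the text later uses as a normalizer.

```python
from typing import Dict, List, Tuple


def select_top_units(scores: Dict[int, float], n: int) -> Tuple[List[int], float]:
    """Return theta_u, the IDs of the n units with the largest S_i(u|D),
    together with Z, the sum of their scores."""
    theta = sorted(scores, key=lambda i: scores[i], reverse=True)[:n]
    z = sum(scores[i] for i in theta)
    return theta, z


# Hypothetical KL distances S_i(u|D) for four units of the network.
scores = {0: 0.05, 1: 0.80, 2: 0.31, 3: 0.64}
theta_u, Z = select_top_units(scores, n=2)
print(theta_u)  # [1, 3]
```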

For each tag u, the tag-related area extraction unit 30 extracts the area associated with the tag from the input image to which the tag is attached, based on the set θ_u of the top-N units by KL distance S_i(u|D) selected by the feature descriptor selection unit 28. The tag-related area extraction unit 30 includes a mask generation unit 32, a feature descriptor generation unit 34, a feature descriptor normalization unit 36, and a tag relevance calculation unit 38.

The mask generation unit 32 generates a plurality of masks of different sizes for masking the input image I(x, y) received by the input unit 10. Each mask is a patch cut from the corresponding position of the dataset's mean image, which has the same resolution as the image to be masked. The mask sizes are, for example, one-tenth, one-fifth, and one-third of the input image I(x, y).
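A minimal sketch of applying one mask, assuming grayscale images stored as nested lists, a square mask, and that the example sizes (one-tenth, one-fifth, one-third) refer to the side length relative to the shorter image side; all of these are assumptions, and `apply_mask` is a hypothetical name. The occluded pixels are copied from the dataset mean image at the same position, as described above.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values (assumption)

# Example mask sizes from the text, interpreted here as fractions of the image side.
MASK_FRACTIONS = (1 / 10, 1 / 5, 1 / 3)


def apply_mask(image: Image, mean_image: Image, x: int, y: int, frac: float) -> Image:
    """Occlude a square centred at (x, y) with the same region of the dataset mean image."""
    h, w = len(image), len(image[0])
    side = max(1, round(frac * min(h, w)))
    top, left = y - side // 2, x - side // 2
    masked = [row[:] for row in image]  # copy so the input image is left unchanged
    for r in range(max(0, top), min(h, top + side)):
        for c in range(max(0, left), min(w, left + side)):
            masked[r][c] = mean_image[r][c]
    return masked


img = [[1.0] * 10 for _ in range(10)]
mean = [[0.0] * 10 for _ in range(10)]
masked = apply_mask(img, mean, x=5, y=5, frac=MASK_FRACTIONS[1])  # 2x2 patch
print(sum(map(sum, masked)))  # 96.0
```

Filling the hole with the mean image rather than with zeros keeps the masked input closer to the statistics the network saw during training, which is presumably why the text specifies the dataset mean.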

For each tag u and for the output of each unit in the set θ_u selected for that tag by the feature descriptor selection unit 28, the feature descriptor generation unit 34 inputs into the neural network each mask image, that is, the input image masked by each of the masks generated by the mask generation unit 32. It then generates, as the image feature descriptor for each unit in θ_u, an image representing the average of that unit's outputs obtained from the mask images.

The feature descriptor generation unit 34 hides part of the input image I(x, y) with a mask specified by (x, y) and, for the output of unit i, generates the image feature descriptor A_i(x, y) from the partially occluded image. The function that produces an image feature descriptor from such a mask is defined as a_i(x, y). A_i(x, y) is the i-th feature descriptor (here, the output of the i-th unit), taken as the average of the outputs obtained for the multiple masks.
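The construction of A_i(x, y) resembles occlusion-based sensitivity analysis: for every position (x, y), each mask size is applied, the occluded image is passed through the network, and unit i's responses are averaged. The sketch below assumes that `unit_output` returns a single scalar for unit i (a hypothetical stand-in for a forward pass through the CNN) and uses a toy unit so that it runs without a network. The square, mean-image-filled mask follows the same assumptions as a plain grayscale nested-list image.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values (assumption)


def occlude(image: Image, mean_image: Image, x: int, y: int, frac: float) -> Image:
    """Replace a square of side frac * min(H, W) centred at (x, y) with the mean image."""
    h, w = len(image), len(image[0])
    side = max(1, round(frac * min(h, w)))
    top, left = y - side // 2, x - side // 2
    out = [row[:] for row in image]
    for r in range(max(0, top), min(h, top + side)):
        for c in range(max(0, left), min(w, left + side)):
            out[r][c] = mean_image[r][c]
    return out


def occlusion_descriptor(image: Image, mean_image: Image,
                         unit_output: Callable[[Image], float],
                         fracs: Sequence[float]) -> Image:
    """A_i(x, y): unit i's output averaged over all mask sizes centred at (x, y)."""
    h, w = len(image), len(image[0])
    return [[sum(unit_output(occlude(image, mean_image, x, y, f)) for f in fracs) / len(fracs)
             for x in range(w)]
            for y in range(h)]


# Toy "unit" (hypothetical): responds to the top-left 2x2 corner of the image.
def corner_unit(im: Image) -> float:
    return sum(im[r][c] for r in range(2) for c in range(2))


img = [[1.0] * 4 for _ in range(4)]
mean = [[0.0] * 4 for _ in range(4)]
A = occlusion_descriptor(img, mean, corner_unit, fracs=(0.25, 0.5))
print(A[0][0], A[3][3])  # 3.0 4.0
```

Evaluated densely, this needs one forward pass per pixel and mask size; a practical implementation would likely use a stride between mask positions, which the text does not specify.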

For each tag u and for the output of each unit i in the set θ_u selected by the feature descriptor selection unit 28, the feature descriptor normalization unit 36 generates a normalized image feature descriptor R_i(x, y) by normalizing the image feature descriptor A_i(x, y) generated by the feature descriptor generation unit 34.

Specifically, the feature descriptor normalization unit 36 first computes the mean image of the image feature descriptors A_i(x, y) of the units i. Then, for each unit i, it takes A_i(x, y) and finds the point where the difference between A_i(x, y) and the mean image reaches its maximum; if the difference at that point is negative, the entire A_i(x, y) is multiplied by -1. Finally, it normalizes the image feature descriptor A_i(x, y) of each unit i to values between 0 and 1, and the result is the normalized image feature descriptor R_i(x, y).
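The normalization can be sketched as follows. The text leaves two points open, so they are treated as assumptions here: the mean image is taken pixelwise over the selected units in θ_u, and the "maximum" difference is the one with the largest absolute value. `normalize_descriptors` is a hypothetical name.

```python
from typing import Dict, List

Image = List[List[float]]


def normalize_descriptors(descs: Dict[int, Image]) -> Dict[int, Image]:
    """Map each descriptor A_i(x, y) to R_i(x, y) in [0, 1].

    Assumptions: the mean image is the pixelwise mean over the selected units,
    and the "maximum difference" is the one with the largest absolute value.
    """
    units = list(descs)
    h, w = len(descs[units[0]]), len(descs[units[0]][0])
    mean = [[sum(descs[i][r][c] for i in units) / len(units) for c in range(w)]
            for r in range(h)]
    normalized: Dict[int, Image] = {}
    for i in units:
        a = descs[i]
        diffs = [a[r][c] - mean[r][c] for r in range(h) for c in range(w)]
        # Flip the whole map if its strongest deviation from the mean is negative.
        sign = -1.0 if max(diffs, key=abs) < 0 else 1.0
        flipped = [[sign * v for v in row] for row in a]
        lo = min(min(row) for row in flipped)
        hi = max(max(row) for row in flipped)
        span = hi - lo
        # Min-max rescale to [0, 1]; a constant map becomes all zeros.
        normalized[i] = [[(v - lo) / span if span > 0 else 0.0 for v in row]
                         for row in flipped]
    return normalized


A = {0: [[1.0, 2.0], [3.0, 4.0]],   # deviates upward from the mean
     1: [[1.0, 0.0], [1.0, 1.0]]}   # deviates downward, so it gets flipped
R = normalize_descriptors(A)
print(R[1])  # [[0.0, 1.0], [0.0, 0.0]]
```

One plausible reading of the sign flip: if occluding the tag-related area lowers a unit's output, A_i(x, y) dips there, and flipping makes that area the brightest part of R_i(x, y).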

For each tag u, the tag relevance calculation unit 38 calculates, for each pixel of the input image I(x, y), the relevance between the tag and that pixel. It does so by summing the normalized image feature descriptors R_i(x, y), generated by the feature descriptor normalization unit 36 for each unit i in the set θ_u selected by the feature descriptor selection unit 28, each weighted according to the KL distance S_i(u|D) obtained for that unit.

The tag relevance calculation unit 38 calculates the tag-related area M(x, y | u, I), which represents the relevance between tag u and each pixel of the input image I(x, y), by the following equation (2):

$$M(x, y \mid u, I) = \frac{1}{Z} \sum_{i \in \theta_u} S_i(u \mid D)\, R_i(x, y) \qquad (2)$$

Here, S_i(u|D) is the KL distance, obtained from the tagged image set D for tag u, between the histogram P_i^+ of the positive set D_u^+ and the histogram P_i^- of the negative set D_u^-, and Z is the sum of S_i(u|D) over the N units with the largest S_i(u|D).
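With S_i(u|D), R_i(x, y), and Z defined as in this paragraph, the tag-related area is a score-weighted average of the normalized descriptors. A minimal sketch, assuming the descriptors are nested lists keyed by unit ID; the function name and data are hypothetical.

```python
from typing import Dict, List

Image = List[List[float]]


def tag_relevance_map(norm_descs: Dict[int, Image], scores: Dict[int, float]) -> Image:
    """M(x, y | u, I) = (1 / Z) * sum_{i in theta_u} S_i(u|D) * R_i(x, y).

    `norm_descs` maps each selected unit i in theta_u to R_i; `scores` holds S_i(u|D).
    Z, the sum of S_i(u|D) over theta_u, keeps M in [0, 1] when every R_i is.
    """
    theta = list(norm_descs)
    z = sum(scores[i] for i in theta)
    h, w = len(norm_descs[theta[0]]), len(norm_descs[theta[0]][0])
    return [[sum(scores[i] * norm_descs[i][r][c] for i in theta) / z for c in range(w)]
            for r in range(h)]


R = {1: [[1.0, 0.0]], 3: [[0.0, 1.0]]}   # hypothetical normalized descriptors
S = {1: 0.75, 3: 0.25}                   # hypothetical KL distances
print(tag_relevance_map(R, S))  # [[0.75, 0.25]]
```

Units whose positive and negative histograms differ more (larger S_i) therefore contribute more to the final map.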

The tag relevance calculation unit 38 may further extract, as the area related to the tag, the area made up of pixels whose degree of association, calculated according to equation (2) above, is equal to or greater than a predetermined value.
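The optional thresholding step is a per-pixel comparison. In this sketch:
- `threshold` stands in for the unspecified "predetermined value";
- the function name is hypothetical;
- the extracted area is returned as a set of (x, y) coordinates.

```python
from typing import List, Set, Tuple


def extract_tag_region(relevance: List[List[float]],
                       threshold: float) -> Set[Tuple[int, int]]:
    """Pixels (x, y) whose relevance M(x, y|u, I) is at least the threshold."""
    return {(x, y)
            for y, row in enumerate(relevance)
            for x, value in enumerate(row)
            if value >= threshold}
```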

<Operation of the Tag-Related Area Extraction Device According to the Embodiment of the Present Invention>

Next, the operation of the tag-related area extraction device 100 according to the embodiment of the present invention will be described. The input unit 10 receives the tagged image set D, which is stored in the image set database 22. When an input image I(x, y) is then input, the tag-related area extraction device 100 executes, for each tag u, the tag-related area extraction routine shown in FIG. 3.

First, in step S100, the image feature generation unit 24 acquires the positive set D_u^+ and the negative set D_u^- stored in the image set database 22.

Next, in step S102, the image feature generation unit 24 inputs to the CNN each of the images in the positive set D_u^+ and each of the images in the negative set D_u^- acquired in step S100. Then, for each unit i of the CNN, it generates two histograms:
- P_i^+, representing the distribution of the unit's output over the images in the positive set D_u^+;
- P_i^-, representing the distribution of the unit's output over the images in the negative set D_u^-.
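Step S102 can be sketched as follows. The sketch assumes the CNN unit outputs have already been computed, one dict of unit id to scalar activation per image. How a spatial feature map would be reduced to one scalar per image is not stated here and is left to the caller.

Two further choices are ours, not the text's:
- the bin count;
- the value range shared by both sets for each unit.

Using the same bins for both sets keeps P_i^+ and P_i^- directly comparable. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width) if width > 0 else 0
        counts[min(max(k, 0), bins - 1)] += 1  # clamp so v == hi lands in the last bin
    return [c / len(values) for c in counts]


def unit_histograms(pos_outputs: List[Dict[int, float]],
                    neg_outputs: List[Dict[int, float]],
                    bins: int = 10) -> Dict[int, Tuple[List[float], List[float]]]:
    """For each unit i, return (P_i^+, P_i^-) built on a shared bin range."""
    result = {}
    for i in pos_outputs[0]:
        pos = [out[i] for out in pos_outputs]
        neg = [out[i] for out in neg_outputs]
        lo, hi = min(pos + neg), max(pos + neg)
        result[i] = (histogram(pos, lo, hi, bins), histogram(neg, lo, hi, bins))
    return result
```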

In step S104, the image feature distribution comparison unit 26 calculates, according to equation (1) above, the distance for each of the plurality of units between two histograms generated in step S102: the histogram P_i^+ of the positive set D_u^+ and the histogram P_i^- of the negative set D_u^-.
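Equation (1) is not reproduced in this section; the text only says it is a Kullback-Leibler distance. This minimal sketch rests on two assumptions:
- the direction is KL(P_i^+ || P_i^-);
- a small epsilon guards against empty bins in the negative-set histogram.

The function name is hypothetical.

```python
import math
from typing import Sequence


def kl_distance(p_pos: Sequence[float], p_neg: Sequence[float],
                eps: float = 1e-10) -> float:
    """KL(P+ || P-) between two histograms defined on the same bins."""
    # Bins with zero positive-set mass contribute nothing (0 * log 0 := 0).
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_pos, p_neg) if p > 0)
```

A unit whose output distribution barely changes between the tagged and untagged images scores near zero. A unit that separates them well scores high.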

In step S106, the feature descriptor selection unit 28 selects the top N units with respect to the distance calculated in step S104 and takes them as the unit set θ_u.
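Step S106 is a top-N selection over the per-unit distances. This sketch uses the standard library's `heapq.nlargest`; the function name is hypothetical.

```python
import heapq
from typing import Dict, List


def select_top_units(distances: Dict[int, float], n: int) -> List[int]:
    """Return theta_u: ids of the n units with the largest histogram distance."""
    # Iterating a dict yields its keys; rank them by their distance values.
    return heapq.nlargest(n, distances, key=distances.get)
```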

In step S108, the mask generation unit 32 generates a plurality of masks of different sizes for masking the input image I(x, y) received by the input unit 10.
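The text does not specify the shape or placement of the masks. This sketch assumes, purely for illustration, one plausible scheme:
- square occluders of several side lengths;
- slid over the image with half-overlapping strides;
- masked pixels set to a fill value of 0.

`SquareMask`, `generate_masks`, and `apply_mask` are hypothetical names.

```python
from typing import List, NamedTuple


class SquareMask(NamedTuple):
    x0: int    # left edge of the occluded square
    y0: int    # top edge of the occluded square
    size: int  # side length in pixels

    def covers(self, x: int, y: int) -> bool:
        return (self.x0 <= x < self.x0 + self.size
                and self.y0 <= y < self.y0 + self.size)


def generate_masks(width: int, height: int, sizes: List[int]) -> List[SquareMask]:
    """Square masks of each size, placed with a stride of half the side length."""
    masks = []
    for s in sizes:
        stride = max(1, s // 2)  # assumption: half-overlapping windows
        for y0 in range(0, height - s + 1, stride):
            for x0 in range(0, width - s + 1, stride):
                masks.append(SquareMask(x0, y0, s))
    return masks


def apply_mask(image: List[List[float]], mask: SquareMask,
               fill: float = 0.0) -> List[List[float]]:
    """Return a copy of the image with the masked square set to `fill`."""
    return [[fill if mask.covers(x, y) else v for x, v in enumerate(row)]
            for y, row in enumerate(image)]
```

Small masks localize finely but need many forward passes; large masks are cheap but coarse. Mixing sizes trades between the two.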

In step S110, the feature descriptor generation unit 34 handles the output of each unit i in the set θ_u of units selected in step S106. It inputs to the neural network each mask image, that is, the input image masked by each of the masks generated in step S108. Then, for the output of each unit i, it generates the image feature descriptor A_i(x, y): an image representing the average of the outputs of unit i obtained from the mask images.
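Step S110 can be read as an occlusion-sensitivity map. This sketch assumes:
- square masks given as (x0, y0, size);
- occluded pixels set to 0;
- A_i(x, y) is the mean unit output over all mask images whose mask covers (x, y).

`unit_output` is a hypothetical stand-in for running the network and reading the output of unit i.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]    # 2-D grid indexed [y][x]
Mask = Tuple[int, int, int]  # (x0, y0, size): hypothetical square occluder


def occlude(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Copy of the image with the square region of the mask set to `fill`."""
    x0, y0, s = mask
    return [[fill if (x0 <= x < x0 + s and y0 <= y < y0 + s) else v
             for x, v in enumerate(row)]
            for y, row in enumerate(image)]


def feature_descriptor(image: Image, masks: Sequence[Mask],
                       unit_output: Callable[[Image], float]) -> Image:
    """A_i(x, y): mean unit output over the mask images whose mask covers (x, y)."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for mask in masks:
        out = unit_output(occlude(image, mask))  # one forward pass per mask image
        x0, y0, s = mask
        for y in range(y0, min(y0 + s, h)):
            for x in range(x0, min(x0 + s, w)):
                total[y][x] += out
                count[y][x] += 1
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(w)]
            for y in range(h)]
```

Take a toy `unit_output` that sums pixel values. Occluding the brightest pixel lowers the output most, so the most influential pixel receives the smallest value in A_i. This inverted polarity is one plausible reason the normalization step flips the sign of some descriptors.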

In step S112, the feature descriptor normalization unit 36 generates the normalized image feature descriptor R_i(x, y) for the output of each unit i in the set θ_u of units selected in step S106. It does so by normalizing the image feature descriptor A_i(x, y) generated in step S110.

In step S114, the tag relevance calculation unit 38 calculates the degree of association between the tag and each pixel of the input image I(x, y) according to equation (2) above. It sums the normalized image feature descriptors R_i(x, y), generated in step S112 for the units i in the set θ_u selected in step S106. Each descriptor is weighted according to the KL distance S_i(u|D) obtained for that unit.

In step S116, the output unit 40 outputs the result: the degree of association between the tag and each pixel of the input image I(x, y) calculated in step S114. The process then ends.

<Experimental Example>
FIG. 4 compares, for real images, tag-related areas extracted manually with tag-related areas obtained by the tag-related area extraction device according to the embodiment of the present invention. FIG. 5 shows the tag-related areas extracted as the number N of units varies, where the units are selected in descending order of the histogram distance S_i(u|D). The optimal value of N differs from tag to tag. A larger N increases the amount of computation but extracts the tag-related area in more detail.

As described above, the tag-related area extraction device according to the embodiment of the present invention performs the following steps:
- input each image in the positive set and each image in the negative set to the CNN;
- calculate, for the output of each unit of the CNN, the distance between the positive-set histogram and the negative-set histogram;
- select the outputs of the top N units with respect to that distance;
- generate a plurality of masks of different sizes for masking the input image;
- for the output of each selected unit, input to the CNN each mask image (the input image masked by each of the masks), and generate as an image feature descriptor an image representing the average of the unit's outputs over the mask images;
- normalize each image feature descriptor to obtain a normalized image feature descriptor;
- calculate, for each pixel of the input image, the degree of association between the tag and the pixel by summing the normalized image feature descriptors with weights according to the distances obtained for the selected units.

As a result, the area of an image related to a tag can be extracted with high accuracy.

Furthermore, according to the embodiment of the present invention, a related area for each tag can be learned from data that only associates images with the tags of what they contain. Such data need not include any region information.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the embodiment above applies the present invention to a tag-related area extraction device. The invention is not limited to this, however, and may also be applied to a feature selection device. In that case, the feature selection device includes the image feature generation unit 24, the image feature distribution comparison unit 26, and the feature descriptor selection unit 28.

This feature selection device can obtain image features that are effective for image identification. Based on the positive set and the negative set, it proceeds as follows:
- for each of a plurality of image features obtained from images, generate a histogram of that feature over the positive-set images and a histogram over the negative-set images;
- for each image feature, calculate the distance between its positive-set and negative-set histograms;
- select the top N image features with respect to that distance as image feature descriptors.

The embodiment above also uses the output of each unit of a pre-trained neural network as each image feature obtained from an image. The invention is not limited to this, and other image features may be used.

Similarly, the embodiment above uses a CNN as the neural network. The invention is not limited to this, and other neural networks may be used.

Similarly, the embodiment above uses the Kullback-Leibler distance as the distance between histograms. The invention is not limited to this, and other distances may be used.
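To illustrate swapping in another distance, this sketch uses two examples of our own choosing, neither named in the text:
- the Jensen-Shannon divergence, a symmetric and bounded alternative to KL;
- the L1 distance, a simpler alternative.

Passing the distance as a function keeps the unit ranking independent of that choice. All names below are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Histogram = Sequence[float]


def _kl(p: Histogram, q: Histogram) -> float:
    """Plain KL(p || q); callers must ensure q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: Histogram, q: Histogram) -> float:
    """Symmetric divergence against the midpoint histogram m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def l1_distance(p: Histogram, q: Histogram) -> float:
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def unit_distances(hists: Dict[int, Tuple[Histogram, Histogram]],
                   distance: Callable[[Histogram, Histogram], float]) -> Dict[int, float]:
    """Score every unit's (P_i^+, P_i^-) pair with the chosen distance."""
    return {i: distance(pos, neg) for i, (pos, neg) in hists.items()}
```

Unlike KL, the Jensen-Shannon divergence never divides by an empty bin, because the midpoint m is positive wherever p or q is. It therefore needs no epsilon smoothing.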

The tag-related area extraction device 100 described above contains a computer system. If a WWW system is used, this "computer system" also includes a homepage providing environment (or display environment).

Although this specification describes an embodiment in which the program is pre-installed, the program may also be provided stored on a computer-readable recording medium.

10 Input unit
20 Computation unit
22 Image set database
24 Image feature generation unit
26 Image feature distribution comparison unit
28 Feature descriptor selection unit
30 Tag-related area extraction unit
32 Mask generation unit
34 Feature descriptor generation unit
36 Feature descriptor normalization unit
38 Tag relevance calculation unit
40 Output unit
100 Tag-related area extraction device

Claims

A feature selection device comprising:
an image feature generation unit that generates histograms based on a positive set, which is a set of images to which a tag representing a specific object contained in the images is attached, and a negative set, which is a set of images to which the tag is not attached; for each of a plurality of image features obtained from an image, it generates a histogram representing the distribution of the image feature over the images included in the positive set and a histogram representing the distribution of the image feature over the images included in the negative set;
an image feature distribution comparison unit that calculates, for each of the plurality of image features, the distance between the histogram of the positive set and the histogram of the negative set generated by the image feature generation unit; and
a feature descriptor selection unit that selects the top N image features with respect to the distance calculated by the image feature distribution comparison unit,
wherein the image feature generation unit, based on the positive set, the negative set, and a pre-trained neural network, inputs each of the images included in the positive set and each of the images included in the negative set to the neural network; treating the output of each unit of the neural network as one of the plurality of image features obtained from an image, it generates, for the output of each unit, a histogram representing the distribution of the unit's output over the images included in the positive set and a histogram representing the distribution of the unit's output over the images included in the negative set, and
the feature descriptor selection unit selects the outputs of the top N units with respect to the distance calculated by the image feature distribution comparison unit.

A tag-related area extraction device comprising:
the feature selection device according to claim 1;
a mask generation unit that generates a plurality of masks of different sizes for masking an input image;
a feature descriptor generation unit that, for the output of each unit selected by the feature descriptor selection unit, inputs to the neural network each mask image, that is, the input image masked by each of the plurality of masks generated by the mask generation unit, and generates, as an image feature descriptor, an image representing the average of the unit's outputs obtained from the mask images;
a feature descriptor normalization unit that, for the output of each unit selected by the feature descriptor selection unit, generates a normalized image feature descriptor by normalizing the image feature descriptor generated by the feature descriptor generation unit; and
a tag relevance calculation unit that calculates, for each pixel of the input image, the degree of association between the tag and the pixel; it does so by summing the normalized image feature descriptors generated by the feature descriptor normalization unit for the outputs of the selected units, each weighted according to the distance obtained by the image feature distribution comparison unit for that unit's output.

The tag-related area extraction device according to claim 2, wherein the tag relevance calculation unit further extracts, as the area related to the tag, an area made up of pixels whose degree of association is equal to or greater than a predetermined value.

The feature selection device according to claim 1, wherein a CNN (Convolutional Neural Network) is used as the neural network.

The tag-related area extraction device according to claim 2 or 3, wherein a CNN (Convolutional Neural Network) is used as the neural network.

A feature selection method comprising:
a step in which an image feature generation unit generates histograms based on a positive set, which is a set of images to which a tag representing a specific object contained in the images is attached, and a negative set, which is a set of images to which the tag is not attached; for each of a plurality of image features obtained from an image, it generates a histogram representing the distribution of the image feature over the images included in the positive set and a histogram representing the distribution of the image feature over the images included in the negative set;
a step in which an image feature distribution comparison unit calculates, for each of the plurality of image features, the distance between the histogram of the positive set and the histogram of the negative set generated by the image feature generation unit; and
a step in which a feature descriptor selection unit selects the top N image features with respect to the distance calculated by the image feature distribution comparison unit,
wherein, in the step of generating the histograms, the image feature generation unit, based on the positive set, the negative set, and a pre-trained neural network, inputs each of the images included in the positive set and each of the images included in the negative set to the neural network; treating the output of each unit of the neural network as one of the plurality of image features obtained from an image, it generates, for the output of each unit, a histogram representing the distribution of the unit's output over the images included in the positive set and a histogram representing the distribution of the unit's output over the images included in the negative set, and
in the step of selecting the image features, the feature descriptor selection unit selects the outputs of the top N units with respect to the distance calculated by the image feature distribution comparison unit.

A program for causing a computer to function as each unit of the feature selection device according to claim 1 or 4, or of the tag-related area extraction device according to claim 2, 3, or 5.