JP6778625B2

JP6778625B2 - Image search system, image search method and image search program

Info

Publication number: JP6778625B2
Application number: JP2017015717A
Authority: JP
Inventors: 悠一吉田
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2017-01-31
Filing date: 2017-01-31
Publication date: 2020-11-04
Anticipated expiration: 2037-01-31
Also published as: JP2018124740A

Description

The present invention relates to an image search system, an image search method, and an image search program for searching a large number of reference images for the reference image corresponding to a query image.

Conventionally, specific object recognition has been performed in which the reference image corresponding to a query image is retrieved from a large number of reference images and the object captured in the query image is thereby identified. In image search, as advance preparation, a large number of images, each capturing a specific object, are prepared for a wide variety of objects and recorded in a database as reference images. Further, for each reference image, a feature amount indicating the features of the reference image is calculated and recorded in the database in association with that reference image. When an image search is performed, an image captured by a photographing device is used as the query image. A feature amount is calculated for the query image and compared with the feature amount of each reference image, and the reference image whose feature amount best matches that of the query image is selected as the corresponding reference image. The object captured in the selected reference image can be regarded as the object captured in the query image, so the object captured in the query image is identified (see, for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2015-111339

In recent years, the size of the images that photographing devices can capture has been increasing. When an image search is performed with such a large image as the query image, calculating the feature amount and selecting the reference image take time, and it becomes difficult to complete the image search within a practical time.
An object of the present invention is to provide an image search system, an image search method, and an image search program capable of performing image search at high speed.

A first aspect of the present invention is an image search system comprising: an attention level image generation unit that generates, from a query image, an attention level image representing an attention level indicating the possibility that an object belonging to a specific category exists at each position of the query image; a region of interest generation unit that generates a region of interest from the attention level image based on the attention level; a specific category query image generation unit that cuts out a region corresponding to the region of interest from the query image to generate a specific category query image; and a reference image search unit that searches a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image, wherein the attention level image generation unit uses a deep convolutional neural network trained using images in which objects belonging to a plurality of categories including the specific category are captured.

In this aspect, a region of interest is generated based on the attention level indicating the possibility that an object belonging to a specific category exists at each position of the query image, a region corresponding to the region of interest is cut out from the query image to generate a specific category query image, and the reference image search is performed based on that specific category query image. Because the search operates on the cut-out image rather than on the entire query image, the reference image search process, and hence the image search as a whole, can be performed at high speed.

A second aspect of the present invention is the image search system further comprising a query image reduction unit that reduces the size of the query image, wherein the attention level image generation unit generates the attention level image from the query image reduced by the query image reduction unit.

In this aspect, the attention level image is generated from the reduced, smaller query image. The attention level image generation process can therefore be performed at high speed, making the image search faster still.

A third aspect of the present invention is an image search method comprising: an attention level image generation step of generating, from a query image, an attention level image representing an attention level indicating the possibility that an object belonging to a specific category exists at each position of the query image; a region of interest generation step of generating a region of interest from the attention level image based on the attention level; a specific category query image generation step of cutting out a region corresponding to the region of interest from the query image to generate a specific category query image; and a reference image search step of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image, wherein the attention level image generation step uses a deep convolutional neural network trained using images in which objects belonging to a plurality of categories including the specific category are captured.
This aspect provides the same effect as the first aspect.

A fourth aspect of the present invention is an image search program that causes a computer to realize: an attention level image generation function of generating, from a query image, an attention level image representing an attention level indicating the possibility that an object belonging to a specific category exists at each position of the query image; a region of interest generation function of generating a region of interest from the attention level image based on the attention level; a specific category query image generation function of cutting out a region corresponding to the region of interest from the query image to generate a specific category query image; and a reference image search function of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image, wherein the attention level image generation function uses a deep convolutional neural network trained using images in which objects belonging to a plurality of categories including the specific category are captured.
This aspect provides the same effect as the first aspect.

According to the present invention, image search can be performed at high speed.

Schematic diagram showing an outline of the image search method of each embodiment of the present invention.
Block diagram showing the image search system of the first embodiment of the present invention.
Schematic diagram showing the neural network unit of the first embodiment of the present invention.
Flow chart showing the image search method of the first embodiment of the present invention.
Flow chart showing the image search method of the first embodiment of the present invention.
Flow chart showing the region of interest generation step of the second embodiment of the present invention.
Flow chart showing the region of interest generation step of the third embodiment of the present invention.

An outline of the image search method of each embodiment of the present invention will be described with reference to FIG. 1.
This outline presents only the basic concept, to aid understanding of the embodiments. Various modifications of the image search method of the present invention are conceivable, and the method is not limited to the processing shown in this outline.

In the image search method of each embodiment, when the reference image corresponding to a query image is retrieved from a large number of reference images, the query image is preprocessed so that the reference image search process can be performed at high speed. In the preprocessing, a region in which an object belonging to the specific category to be recognized is likely to exist is cut out from the query image, and the reference image search process is performed based on the cut-out image.

Specifically, as shown in FIG. 1, an attention level image 83 is first generated from the query image 81; the attention level image 83 represents the attention level, which indicates the possibility that an object belonging to a specific category exists at each position of the query image 81. Next, attention regions a and b are extracted from the generated attention level image 83 based on the attention level, and a region of interest D is generated based on the extracted attention regions a and b. Various methods can be used to determine the attention regions a and b and to set the region of interest D; specific examples are described in the embodiments. Then, the region D' corresponding to the region of interest D is cut out from the query image 81 to generate a specific category query image 85. The reference image search process is performed based on the specific category query image 85 generated in this way.

A first embodiment of the present invention will be described with reference to FIGS. 2 to 5.
The image search system of the present embodiment will be described with reference to FIGS. 2 and 3.
As shown in FIG. 2, the image search system of the present embodiment has a photographing unit 10 that captures images. A camera of a mobile device, an in-vehicle camera, or the like is used as the photographing unit 10. The photographing unit 10 captures relatively large images. Here, the size of an image refers to its number of pixels. The image captured by the photographing unit 10 serves as the query image 81.

The image search system has an arithmetic control unit 12 that searches the large number of reference images recorded in the database unit 60 for the reference image corresponding to the query image 81. The arithmetic control unit 12 has the functions described below. A program for causing the arithmetic control unit 12 to realize these functions is also included in the scope of the present invention.

The arithmetic control unit 12 has a query image reduction unit 15 that reduces the size of the query image 81. The query image 81 is reduced as appropriate to a size suitable for generation of the attention level image 83 by the attention level image generation unit 20 described below. To reduce the image size, appropriate coarse-graining is performed, for example by replacing a plurality of pixels with a single pixel whose value is the average of their pixel values. Reducing the image size therefore lowers the resolution.
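The averaging-based coarse-graining described above can be sketched as follows. This is a minimal illustration rather than the implementation of the query image reduction unit 15: it assumes a grayscale image stored as a list of rows and an integer reduction factor, and the name downscale_by_averaging is hypothetical.

```python
def downscale_by_averaging(image, factor):
    """Replace each factor x factor block of pixels with one pixel holding the block mean.

    Assumption: `image` is a list of equal-length rows of numbers, and any
    rows or columns left over after the last full block are dropped.
    """
    height, width = len(image), len(image[0])
    reduced = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        reduced.append(row)
    return reduced


small = downscale_by_averaging([[0, 2, 4, 6],
                                [2, 4, 6, 8]], 2)
```

A reduction such as 4000 × 3000 to 256 × 256 does not correspond to an integer factor, so a practical implementation would need area-weighted averaging or a library resampling routine.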

The arithmetic control unit 12 has an attention level image generation unit 20 that generates, from the query image 81 reduced by the query image reduction unit 15, an attention level image 83 representing the attention level for a specific category. The attention level indicates the possibility that an object belonging to a given category exists at each position of the query image. As a method of generating such an attention level image, the technique described in, for example, B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning Deep Features for Discriminative Localization," Computer Vision and Pattern Recognition (CVPR), 2016 can be used.

The attention level image generation unit 20 has a neural network unit 21 that calculates attention levels for various categories, and an image drawing unit 22 that draws the attention level image 83 representing the attention level for the specific category.

Various categories, such as DVD/CD jackets, posters, and road signs, are used as the categories. The specific category is selected as appropriate according to the purpose of the image search system. For example, in an image search system that identifies the DVD/CD shown in a captured image of a DVD/CD jacket and obtains related information such as the contents of that DVD/CD, the DVD/CD jacket category is selected as the specific category. Likewise, in an image search system that identifies the road sign shown in a captured image and obtains the content of that road sign, such as a speed limit or one-way restriction, the road sign category is selected as the specific category.

As shown in FIG. 3, the neural network unit 21 uses a deep convolutional neural network. The deep convolutional neural network is formed by stacking an input layer 23, a large number of intermediate layers 24, a fully connected layer 28, and an output layer 29, and each intermediate layer 24 is formed by stacking a convolutional layer 25, an activation layer 26, and a pooling layer 27.

The deep convolutional neural network has an input layer 23 to which an image is input.
The deep convolutional neural network also has a large number of intermediate layers 24.
Each intermediate layer 24 has a convolutional layer 25 that extracts features at each position of the image. That is, as shown in Equation (1), for each unit l and for the input x from the preceding layer at a given position (i, j) of the image, the convolutional layer 25 adds a bias β to the weighted sum Σαx taken over all units of the input x, and passes the result to the next layer as the output y.
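As a rough illustration of the weighted sum plus bias performed by the convolutional layer 25, the sketch below applies a single kernel to a single-channel image. The kernel size, the absence of padding, and the names conv2d, kernel, and bias are assumptions made for the example; the kernel entries play the role of the weights α and the bias that of β.

```python
def conv2d(image, kernel, bias=0.0):
    """Compute y[i][j] = sum over (p, q) of kernel[p][q] * image[i+p][j+q], plus bias.

    Assumption: a single input channel and no padding, so the output is
    smaller than the input by (kernel size - 1) in each direction.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [[sum(kernel[p][q] * image[i + p][j + q]
                 for p in range(kernel_h)
                 for q in range(kernel_w)) + bias
             for j in range(out_w)]
            for i in range(out_h)]


features = conv2d([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                  kernel=[[1, 0],
                          [0, 1]],
                  bias=1)
```

In an actual network each unit l has its own kernel and bias, and the sum also runs over all channels of the preceding layer.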

Each intermediate layer 24 has an activation layer 26 that contributes to improved convergence and learning speed. As shown in Equation (2), for each unit l, the activation layer 26 passes the response of an activation function f to the input x from the convolutional layer 25 to the next layer as the output y. ReLU (rectified linear unit) is used as the activation function. Another suitable function, such as a sigmoid function, may be used instead.

Each intermediate layer 24 has a pooling layer 27 that compresses information by discarding local variations in the image. That is, as shown in Equation (3), for each unit l, the pooling layer 27 performs average pooling, taking the average of the inputs x from the activation layer 26 within a small region M containing m elements. Another suitable pooling method, such as max pooling, may be used instead.
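The activation and pooling stages can be sketched together as follows. This is an illustrative example only: the square, non-overlapping pooling window and the names relu and average_pool are assumptions, with size × size playing the role of the m elements in the small region M.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every element of a 2-D feature map."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def average_pool(feature_map, size):
    """Average each non-overlapping size x size window (m = size * size elements)."""
    height, width = len(feature_map), len(feature_map[0])
    return [[sum(feature_map[i + p][j + q]
                 for p in range(size)
                 for q in range(size)) / (size * size)
             for j in range(0, width - size + 1, size)]
            for i in range(0, height - size + 1, size)]


activated = relu([[-1, 2, 3, -4],
                  [5, -6, 7, 8]])
pooled = average_pool(activated, 2)
```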

The deep convolutional neural network also has a fully connected layer 28 that calculates the attention level for each category c. That is, as shown in Equation (4), the fully connected layer 28 calculates, for each category c, the weighted sum S (= Σωx) over all units of the input x from the preceding layer.
Then, as shown in Equation (5), the fully connected layer 28 takes the response of the softmax function to the weighted sum S as the attention level P for each category c.
Further, the deep convolutional neural network has an output layer 29 that outputs the attention level P for each category c to the image drawing unit 22.
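The per-category weighted sum S and its normalization into an attention level P can be sketched as below, taking the normalization to be the standard softmax (normalized exponential) function. The function names and the example weights are hypothetical; subtracting the maximum score before exponentiation is a common numerical-stability measure and is not taken from the description.

```python
import math


def fully_connected(inputs, weights):
    """Return one weighted sum S_c = sum_k weights[c][k] * inputs[k] per category c."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def softmax(scores):
    """Map scores to values P_c = exp(S_c) / sum_c' exp(S_c') that sum to 1."""
    peak = max(scores)  # subtracted for numerical stability; does not change the result
    exponentials = [math.exp(s - peak) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]


scores = fully_connected([1.0, 2.0], [[1.0, 0.0],
                                      [0.0, 1.0],
                                      [1.0, 1.0]])
attention = softmax(scores)
```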

In the deep convolutional neural network, parameters such as the weights α and ω and the bias β described above are determined in advance by training with a large number of images in which objects belonging to each category c are captured. That is, the discrepancy between the ideal output Q and the actual output R is measured by the cross entropy E shown in Equation (6).
The parameters, such as the weights α and ω and the bias β, are determined by updating them successively using error backpropagation, as shown in Equation (7), so that the cross entropy E is minimized.
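The training criterion can be illustrated with the standard cross entropy between an ideal output Q and an actual output R, followed by a plain gradient-descent update. This is a sketch under assumptions: the learning rate, the update rule, and the names cross_entropy and gradient_step are not taken from the description, and the gradients are supplied directly instead of being computed by backpropagation.

```python
import math


def cross_entropy(ideal, actual):
    """Return E = -sum_c Q_c * log(R_c), skipping terms where Q_c is zero."""
    return -sum(q * math.log(r) for q, r in zip(ideal, actual) if q > 0)


def gradient_step(params, grads, learning_rate):
    """Move each parameter against its gradient dE/dparam (assumed update rule)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


loss = cross_entropy([0, 1, 0], [0.2, 0.5, 0.3])
updated = gradient_step([1.0, -2.0], [0.5, -1.0], learning_rate=0.1)
```

With a one-hot ideal output, E reduces to the negative log of the value assigned to the correct category, so E approaches zero as that value approaches 1.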

As shown in FIG. 2, the attention level image generation unit 20 has the image drawing unit 22, which draws the attention level image 83 representing the attention level P for the specific category C from the attention levels P for the categories c input from the neural network unit 21.
The attention level image 83 generated by the attention level image generation unit 20 has the same size as the reduced query image 81.

The arithmetic control unit 12 has an attention level image enlargement unit 35 that, based on the reduction ratio of the query image 81, enlarges the attention level image 83 to the same size as the query image 81 before reduction. To enlarge the image, appropriate interpolation is performed, for example by replacing a single pixel with a plurality of pixels having the same pixel value as that pixel.
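Replacing one pixel with several pixels of the same value amounts to nearest-neighbour enlargement, which can be sketched as follows. The function name enlarge_nearest and the integer index mapping are assumptions for illustration; the mapping also handles non-integer ratios such as 256 to 4000.

```python
def enlarge_nearest(image, out_height, out_width):
    """Enlarge an image by copying, for each output pixel, the source pixel it falls in."""
    in_height, in_width = len(image), len(image[0])
    return [[image[y * in_height // out_height][x * in_width // out_width]
             for x in range(out_width)]
            for y in range(out_height)]


enlarged = enlarge_nearest([[1, 2],
                            [3, 4]], 4, 4)
```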

The arithmetic control unit 12 has a region of interest generation unit 40 that generates the region of interest D from the enlarged attention level image 83 based on the attention level.
Specifically, the region of interest generation unit 40 has an attention region extraction unit 41 that extracts one or more regions having an attention level equal to or higher than a predetermined threshold as attention regions a and b. The region of interest generation unit 40 has an attention region selection unit 42 that selects one or more attention regions from the attention regions a and b extracted by the attention region extraction unit 41. In the present embodiment, the attention region selection unit 42 selects all of the attention regions a and b extracted by the attention region extraction unit 41. The region of interest generation unit 40 has a region of interest setting unit 43 that sets the rectangular region circumscribing all of the selected attention regions a and b as the region of interest D. The region of interest D indicates the region of the query image 81 in which an object belonging to the specific category is likely to exist.
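The thresholding, region extraction, and circumscribing-rectangle steps can be sketched as follows. This is one possible reading rather than the described implementation: it assumes that an attention region is a 4-connected group of pixels at or above the threshold, and the function names are hypothetical.

```python
def extract_attention_regions(attention, threshold):
    """Return one list of (row, col) pixels per 4-connected region >= threshold."""
    height, width = len(attention), len(attention[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or attention[r][c] < threshold:
                continue
            seen[r][c] = True
            stack, region = [(r, c)], []
            while stack:  # depth-first flood fill of one region
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and attention[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def circumscribing_rectangle(regions):
    """Return (top, left, bottom, right), inclusive, enclosing all given regions."""
    pixels = [pixel for region in regions for pixel in region]
    rows = [y for y, _ in pixels]
    cols = [x for _, x in pixels]
    return min(rows), min(cols), max(rows), max(cols)


attention_map = [[0.9, 0.1, 0.0, 0.0],
                 [0.8, 0.1, 0.0, 0.7],
                 [0.0, 0.0, 0.0, 0.6]]
regions = extract_attention_regions(attention_map, threshold=0.5)
roi = circumscribing_rectangle(regions)  # all regions selected, as in this embodiment
```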

The arithmetic control unit 12 has a specific category query image generation unit 45 that cuts out the region corresponding to the region of interest D from the query image 81 to generate the specific category query image 85. Cutting out an image means separating part of the image without scaling it. The cut-out image therefore has fewer pixels than the original, but it has not been reduced, so its resolution is not lowered.
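Cutting out without scaling simply copies the pixels inside the rectangle, as in the sketch below. The function name crop and the inclusive (top, left, bottom, right) convention are assumptions for illustration.

```python
def crop(image, top, left, bottom, right):
    """Copy the pixels inside an inclusive rectangle; pixel values are unchanged."""
    return [row[left:right + 1] for row in image[top:bottom + 1]]


patch = crop([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]], top=0, left=1, bottom=1, right=2)
```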

The arithmetic control unit 12 has a specific category query image reduction unit 50 that reduces the specific category query image 85. The specific category query image 85 is reduced as appropriate to a size suitable for the reference image search by the reference image search unit 52 described below.

The arithmetic control unit 12 has a reference image search unit 52 that searches the large number of reference images recorded in the database unit 60 for the reference image corresponding to the specific category query image 85.
The reference image search unit 52 has a feature amount calculation unit 55 that calculates the feature amount of the specific category query image 85. Specifically, the feature amount calculation unit 55 has a feature point detection unit 56 that extracts edges from the specific category query image 85 to generate an edge image and detects a plurality of feature points from the edge image. The feature amount calculation unit 55 has a local feature amount extraction unit 57 that extracts a local feature amount at each feature point. SIFT feature amounts are used as the local feature amounts, and each SIFT feature amount is obtained as an N-dimensional vector. The feature amount calculation unit 55 has a binary conversion unit 58 that converts the SIFT feature amounts into binary codes. As shown in Equation (8), the binary conversion unit 58 converts a SIFT feature amount v into a binary code h expressed in binary values. Here, d is the size of the binary code after conversion, and w is an N-row, d-column matrix composed of vectors obtained by random sampling, according to a normal distribution, from points on a hypersphere of radius 1 in N dimensions.
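One common way to turn a real-valued vector into a binary code with random unit vectors is to keep the sign of each projection. The sketch below follows that approach as an assumption: the thresholding at zero, the fixed seed, and the names random_projection_matrix and to_binary_code are illustrative and are not taken from Equation (8).

```python
import math
import random


def random_projection_matrix(n_dims, code_bits, seed=0):
    """Return code_bits random unit vectors of length n_dims (the columns of w).

    Normalizing a vector of independent Gaussian samples yields a point
    distributed uniformly on the unit hypersphere.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(code_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(value * value for value in direction))
        columns.append([value / norm for value in direction])
    return columns


def to_binary_code(feature, projections):
    """Assumed rule: bit k is 1 when the projection of the feature on column k is >= 0."""
    return [1 if sum(w * v for w, v in zip(column, feature)) >= 0 else 0
            for column in projections]


projections = random_projection_matrix(n_dims=4, code_bits=8)
code = to_binary_code([1.0, 0.5, -0.2, 0.3], projections)
```

A real SIFT descriptor would typically have N = 128 dimensions; the 4-dimensional vector here only keeps the example small.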

The database unit 60, for its part, has a reference image recording unit 61 in which a large number of reference images are recorded. As the reference images, for each of a wide variety of categories, many images in which objects belonging to that category are captured are recorded. For example, DVD/CD jackets, posters, road signs, and the like are used as categories; various kinds of DVD/CD jackets are used as objects belonging to the DVD/CD jacket category, and various road signs, such as speed limit and one-way signs, are used as objects belonging to the road sign category.

The database unit 60 has a feature amount recording unit 62 in which the feature amount calculated for each reference image is recorded in association with that reference image. As with the feature amounts calculated by the feature amount calculation unit 55, binary codes obtained by binary conversion of the SIFT feature amount at each feature point are used as the feature amounts.

The database unit 60 has a related information recording unit 63 in which related information on each reference image is recorded in association with that reference image. For example, when the reference image is a DVD/CD jacket, the related information is the contents of the DVD/CD: director and cast information for a movie DVD, or composer and performer information for a music CD. When the reference image is a traffic sign, the related information is the content of the traffic sign, such as a speed limit or one-way restriction.

The reference image search unit 52 also has a reference image selection unit 65 that selects, from the large number of reference images recorded in the database unit 60, the reference image corresponding to the specific category query image 85. Specifically, the reference image selection unit 65 compares the feature amount of each feature point of the specific category query image 85 with the feature amounts of all feature points of all reference images, and selects the reference image feature point whose feature amount best matches as the feature point corresponding to that feature point of the specific category query image 85. Here, the feature amount with the smallest Hamming distance, which indicates the degree of mismatch between two binary codes, is taken to be the best-matching feature amount. The reference image selection unit 65 then selects, from among all reference images, the reference image having the largest number of feature points corresponding to feature points of the specific category query image 85 as the reference image corresponding to the specific category query image 85.
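The matching procedure above (nearest feature point by Hamming distance, then a vote per reference image) can be sketched as a brute-force search. The data layout, the image identifiers, and the function names are hypothetical, ties are broken by identifier order for determinism, and at least one query code is assumed.

```python
from collections import Counter


def hamming_distance(code_a, code_b):
    """Count the bit positions at which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def select_reference_image(query_codes, reference_codes):
    """Return the id of the reference image that wins the most nearest-neighbour votes.

    Assumption: `reference_codes` maps an image id to the list of binary
    codes of its feature points.
    """
    votes = Counter()
    for query_code in query_codes:
        candidates = [(hamming_distance(query_code, ref_code), image_id)
                      for image_id, codes in reference_codes.items()
                      for ref_code in codes]
        _, best_id = min(candidates)  # smallest Hamming distance wins
        votes[best_id] += 1
    return votes.most_common(1)[0][0]


references = {"jacket_a": [[0, 0, 0, 0], [1, 1, 0, 0]],
              "jacket_b": [[1, 1, 1, 1]]}
queries = [[0, 0, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1]]
best = select_reference_image(queries, references)
```

Comparing every query code with every reference code scales with the product of the two counts, so a large database would normally rely on an index instead of this exhaustive loop.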

The object captured in the query image 81 and the object captured in the reference image corresponding to the specific category query image 85 can be regarded as objects of the same type. Since the type of the object captured in each reference image is specified in advance, the type of the object captured in the query image 81 is thereby identified.
The image search system has a related information reading unit 70 that reads the related information associated with the selected reference image.

The image search system has an output unit 75 that produces output based on the read related information. For example, when a DVD/CD jacket is photographed with the camera of a mobile device, the output unit 75 displays the related information of the DVD/CD on the screen of the mobile device; when a road sign is photographed with an in-vehicle camera, the output unit 75 automatically controls the vehicle based on the content of the road sign.

The image search method of the first embodiment of the present invention will be described with reference to FIGS. 4 and 5.
Shooting step (S10)
In the shooting step (S10), a relatively large image is captured. In the present embodiment, a DVD/CD jacket is photographed with the camera of a mobile device, and the image size is 4000 × 3000 pixels. The captured image becomes the query image 81.

Query image reduction step (S15)
In the query image reduction step (S15), the query image 81 is reduced to a size suitable for generating the attention level image 83 in the attention level image generation step (S20) described below. In the present embodiment, the query image 81 is reduced from 4000 × 3000 pixels to 256 × 256 pixels.

Attention level image generation step (S20)
In the attention level image generation step (S20), the attention level image 83 is generated from the reduced query image 81 using the deep convolutional neural network, as described above. The attention level image 83 has the same size as the reduced query image 81, which is 256 × 256 pixels in the present embodiment.

Attention level image enlargement step (S25)
In the attention level image enlargement step (S25), the attention level image 83 is enlarged, based on the reduction ratio of the query image 81, to the same size as the query image 81 before reduction. In the present embodiment, the attention level image 83 is enlarged from 256 × 256 pixels to 4000 × 3000 pixels.
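The reduction in S15 and the enlargement in S25 are both plain resampling operations. A minimal sketch follows, assuming nearest-neighbor interpolation; the specification does not name an interpolation method, and `resize_nearest` is a hypothetical helper, not part of the described system.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize a (H, W) or (H, W, C) array to (out_h, out_w) by nearest-neighbor sampling."""
    in_h, in_w = image.shape[:2]
    # For every output row/column, pick the source row/column it maps back to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# S15: 4000 x 3000 pixel query image (width x height) reduced to 256 x 256.
query = np.zeros((3000, 4000, 3), dtype=np.uint8)
reduced = resize_nearest(query, 256, 256)

# S25: a 256 x 256 attention level image enlarged back to the query image size.
attention = np.arange(256 * 256, dtype=np.float32).reshape(256, 256)
enlarged = resize_nearest(attention, 3000, 4000)
```

Note that reducing 4000 × 3000 to 256 × 256 changes the aspect ratio, so the horizontal and vertical scale factors differ; the enlargement undoes each factor separately.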

Region of interest generation step (S30)
In the region of interest generation step (S30), the region of interest D is generated from the enlarged attention level image 83 based on the attention level. Specifically, the attention region extraction step (S31) extracts from the attention level image 83 one or more attention regions a, b whose attention level is equal to or higher than a predetermined threshold. The attention region selection step (S32) then selects all of the extracted attention regions a, b. The region of interest setting step (S33) sets the rectangular region circumscribing all of the selected attention regions a, b as the region of interest D.
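Steps S31 to S33 can be illustrated with a small thresholding and connected-component sketch. The use of 4-connectivity to delimit an attention region, the function names, and the inclusive (top, left, bottom, right) rectangle format are assumptions made for this example; the specification only states that regions at or above a threshold are extracted and enclosed by a circumscribing rectangle.

```python
from collections import deque
import numpy as np

def extract_attention_regions(attention, threshold):
    """S31: group pixels with attention >= threshold into 4-connected regions.

    Returns a list of regions, each a list of (row, col) pixels.
    """
    mask = attention >= threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    regions = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(int(r0), int(c0))]), []
        while queue:  # breadth-first flood fill of one region
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        regions.append(pixels)
    return regions

def circumscribing_rectangle(regions):
    """S33: smallest axis-aligned rectangle (top, left, bottom, right), inclusive."""
    rows = [r for region in regions for r, _ in region]
    cols = [c for region in regions for _, c in region]
    return min(rows), min(cols), max(rows), max(cols)

# Toy attention level image with two separate high-attention regions.
attention = np.zeros((6, 8))
attention[1:3, 1:3] = 0.9   # region "a", 4 pixels
attention[4, 5:7] = 0.8     # region "b", 2 pixels
regions = extract_attention_regions(attention, threshold=0.5)
roi = circumscribing_rectangle(regions)  # S32 of the first embodiment: all regions are used
```

The pure-Python flood fill is meant for small arrays only; on a full 4000 × 3000 attention level image a library labeling routine would be the practical choice.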

Specific category query image generation step (S35)
In the specific category query image generation step (S35), the region D' corresponding to the region of interest D is cut out from the query image 81 to generate the specific category query image 85. In the present embodiment, the query image 81 is 4000 × 3000 pixels, whereas the specific category query image 85 is 2000 × 1800 pixels.

Specific category query image reduction step (S40)
In the specific category query image reduction step (S40), the specific category query image 85 is reduced to a size suitable for the reference image search in the reference image search step (S42) described below. In the present embodiment, the specific category query image 85 is reduced from 2000 × 1800 pixels to 320 × 240 pixels.

Reference image search step (S42)
In the reference image search step (S42), the feature amount calculation step (S45) calculates the feature amounts of the specific category query image 85. Specifically, within the feature amount calculation step (S45), the feature point detection step (S46) extracts edges from the specific category query image 85 to generate an edge image and detects a plurality of feature points from the edge image. The local feature amount extraction step (S47) extracts a SIFT feature amount at each feature point. The binary conversion step (S48) converts each SIFT feature amount into a binary code.
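This passage does not state how the binary conversion step (S48) maps a SIFT feature amount to a binary code. Purely as an illustration, the sketch below assumes one common approach, taking the sign of random linear projections; the 64-bit code length, the projection matrix, and the random stand-in descriptors are all assumptions and should not be read as the method of the specification.

```python
import numpy as np

def binarize(descriptors, projection):
    """Map real-valued descriptors (N, 128) to 0/1 codes (N, B) via the sign of projections."""
    return (descriptors @ projection > 0).astype(np.uint8)

rng = np.random.default_rng(0)
projection = rng.standard_normal((128, 64))   # hypothetical 128-dim -> 64-bit mapping
descriptors = rng.random((5, 128))            # stand-in for 5 SIFT feature amounts
codes = binarize(descriptors, projection)

# Packing the bits gives 8 bytes per feature point, which keeps the database compact.
packed = np.packbits(codes, axis=1)
```

Whatever conversion is used, the same mapping has to be applied to the reference images and to the query so that Hamming distances between their codes are comparable.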

In the reference image search step (S42), the reference image selection step (S50) selects, from the many reference images recorded in the database, the reference image corresponding to the specific category query image 85. Specifically, the binary code of each feature point of the specific category query image 85 is compared with the binary codes of all feature points of all reference images recorded in the database, and the reference-image feature point with the smallest Hamming distance is selected as the feature point corresponding to that feature point of the specific category query image 85. Then, from among all reference images, the reference image with the largest number of feature points corresponding to feature points of the specific category query image 85 is selected as the reference image corresponding to the specific category query image 85.

The image search system and image search method of the present embodiment have the following effects.
In the image search system and image search method of the present embodiment, a deep convolutional neural network is applied to the reduced query image 81 to generate the attention level image 83, which represents, at each position of the query image 81, the attention level indicating the likelihood that an object belonging to a specific category is present. Then, based on the attention level image 83 enlarged to the same size as the query image 81 before reduction, a region of interest D where an object belonging to the specific category is likely to be present in the query image 81 is generated, the region D' corresponding to the region of interest D is cut out from the query image 81 as the specific category query image 85, and the reference image search is performed based on the specific category query image 85. Because the reference image search is thus performed on the specific category query image 85, which is smaller than the query image 81, the image search can be performed at high speed. Furthermore, because the attention level image 83 is generated from the reduced, small query image 81, its generation by the deep convolutional neural network is fast, which speeds up the image search further. Although the specific category query image 85 is smaller than the query image 81, it is cut out from the query image 81, so its resolution is not lower than that of the query image 81. Search accuracy comparable to that obtained when the unmodified query image 81 is used for the search is therefore achieved.
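To make the size reductions concrete, the pixel counts quoted for the first embodiment can be compared directly. This is simple arithmetic on the stated image sizes; treating pixel count as a proxy for processing cost is an assumption of this sketch, not a claim made in the specification.

```python
def pixel_count(width, height):
    """Number of pixels in a width x height image."""
    return width * height

query = pixel_count(4000, 3000)          # 12,000,000 pixels (query image 81)
cropped = pixel_count(2000, 1800)        # 3,600,000 pixels (specific category query image 85)
reduced_query = pixel_count(256, 256)    # 65,536 pixels (input to the neural network)
reduced_crop = pixel_count(320, 240)     # 76,800 pixels (input to the reference image search)

crop_fraction = cropped / query                 # share of the query image kept by the crop
network_input_ratio = query / reduced_query     # how much smaller the network input is
print(crop_fraction, network_input_ratio)
```

The crop keeps 30% of the original pixels, and the neural network sees an input roughly 183 times smaller than the captured image.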

In the present embodiment, the attention level image is generated from the reduced query image. However, when a computer with sufficient computing power is used, the attention level image may be generated from the query image as is, without reducing it.

Also, in the present embodiment, the attention level image is generated from the reduced query image, the attention level image is enlarged, the region of interest is generated from the enlarged attention level image, and the region corresponding to the region of interest is cut out from the query image to generate the specific category query image. Alternatively, the region of interest may be generated before the attention level image is enlarged; the region of interest is then enlarged based on the reduction ratio of the query image, and the region corresponding to the enlarged region of interest is cut out from the query image to generate the specific category query image.
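The alternative order described above (generate the region of interest at the reduced size, then scale it up) only requires mapping rectangle coordinates by the reduction ratio. A minimal sketch follows; the end-exclusive (top, left, bottom, right) box format, the rounding choices, and the example box are assumptions made for illustration.

```python
import numpy as np

def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)

def scale_box(box, reduced_size, original_size):
    """Map an end-exclusive box (top, left, bottom, right) from reduced to original coordinates.

    Sizes are (height, width). Near edges round down and far edges round up,
    so the scaled box never loses pixels of the original region.
    """
    top, left, bottom, right = box
    (rh, rw), (oh, ow) = reduced_size, original_size
    return (top * oh // rh, left * ow // rw,
            ceil_div(bottom * oh, rh), ceil_div(right * ow, rw))

# Hypothetical region of interest found on the 256 x 256 attention level image.
box_small = (64, 32, 192, 160)
box_full = scale_box(box_small, (256, 256), (3000, 4000))

# Cut the corresponding region out of the full-size query image.
query = np.zeros((3000, 4000, 3), dtype=np.uint8)
top, left, bottom, right = box_full
specific_category_query = query[top:bottom, left:right]
```

Scaling four coordinates is much cheaper than enlarging the whole attention level image, which is the practical appeal of this variant.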

A second embodiment of the present invention will be described with reference to FIG. 6.
As shown in FIG. 6, in the region of interest generation step (S60) of the image search method of the present embodiment, the attention region extraction step (S61) is the same as in the first embodiment. The attention region selection step (S62) selects, from the one or more attention regions a, b, c extracted in the attention region extraction step (S61), the attention region a with the largest area. The region of interest setting step (S63) is the same as in the first embodiment.

A third embodiment of the present invention will be described with reference to FIG. 7.
As shown in FIG. 7, in the region of interest generation step (S65) of the image search method of the present embodiment, the attention region extraction step (S66) is the same as in the first embodiment. The attention region selection step (S67) selects, from the one or more attention regions a, b, c extracted in the attention region extraction step (S66), the attention regions a, b whose area is equal to or larger than a predetermined threshold. The region of interest setting step (S68) is the same as in the first embodiment.
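The three embodiments differ only in the attention region selection step: all regions (first), the single largest region (second), or every region whose area reaches a threshold (third). A compact sketch of the three policies follows, assuming each region is represented as a list of pixels so that its area is the pixel count; the function name, mode strings, and toy regions are hypothetical.

```python
def select_attention_regions(regions, mode="all", min_area=0):
    """Select attention regions according to one of the three described policies.

    regions:  list of regions, each a list of (row, col) pixels
    mode:     "all" (first embodiment), "largest" (second), "min_area" (third)
    min_area: area threshold in pixels, used only when mode == "min_area"
    """
    if mode == "all":
        return list(regions)
    if mode == "largest":
        return [max(regions, key=len)] if regions else []
    if mode == "min_area":
        return [region for region in regions if len(region) >= min_area]
    raise ValueError(f"unknown mode: {mode}")

# Toy regions a, b, c with areas 6, 4 and 1.
a = [(0, c) for c in range(6)]
b = [(2, c) for c in range(4)]
c = [(4, 0)]
regions = [a, b, c]

first = select_attention_regions(regions, "all")
second = select_attention_regions(regions, "largest")
third = select_attention_regions(regions, "min_area", min_area=4)
```

With these toy areas the third policy keeps a and b and drops the small region c, mirroring the a, b, c example in the text.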

15 Query image reduction unit
20 Attention level image generation unit
40 Region of interest generation unit
41 Attention region extraction unit
42 Attention region selection unit
43 Region of interest setting unit
45 Specific category query image generation unit
52 Reference image search unit
81 Query image
83 Attention level image
85 Specific category query image
a, b, c Attention regions
D Region of interest

Claims

An image search system comprising:
an attention level image generation unit that generates, from a query image, an attention level image representing, at each position of the query image, an attention level indicating the likelihood that an object belonging to a specific category is present;
a region of interest generation unit that generates a region of interest from the attention level image based on the attention level;
a specific category query image generation unit that generates a specific category query image by cutting out, from the query image, a region corresponding to the region of interest; and
a reference image search unit that searches a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image,
wherein the attention level image generation unit uses a deep convolutional neural network trained on images of objects belonging to a plurality of categories including the specific category.

The image search system according to claim 1, further comprising a query image reduction unit that reduces the size of the query image,
wherein the attention level image generation unit generates the attention level image from the query image reduced by the query image reduction unit.

The image search system according to claim 1, wherein the region of interest generation unit comprises:
an attention region extraction unit that extracts, from the attention level image, one or more attention regions whose attention level is equal to or higher than a predetermined threshold;
an attention region selection unit that selects one or more attention regions from the one or more attention regions extracted by the attention region extraction unit; and
a region of interest setting unit that sets, as the region of interest, a rectangular region circumscribing all of the attention regions selected by the attention region selection unit.

The image search system according to claim 3, wherein the attention region selection unit selects all of the attention regions extracted by the attention region extraction unit.

The image search system according to claim 3, wherein the attention region selection unit selects the attention region with the largest area among the one or more attention regions extracted by the attention region extraction unit.

The image search system according to claim 3, wherein the attention region selection unit selects, from the one or more attention regions extracted by the attention region extraction unit, the attention regions whose area is equal to or larger than a predetermined threshold.

An image search method comprising:
an attention level image generation step of generating, from a query image, an attention level image representing, at each position of the query image, an attention level indicating the likelihood that an object belonging to a specific category is present;
a region of interest generation step of generating a region of interest from the attention level image based on the attention level;
a specific category query image generation step of generating a specific category query image by cutting out, from the query image, a region corresponding to the region of interest; and
a reference image search step of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image,
wherein the attention level image generation step uses a deep convolutional neural network trained on images of objects belonging to a plurality of categories including the specific category.

An image search program that causes a computer to realize:
an attention level image generation function of generating, from a query image, an attention level image representing, at each position of the query image, an attention level indicating the likelihood that an object belonging to a specific category is present;
a region of interest generation function of generating a region of interest from the attention level image based on the attention level;
a specific category query image generation function of generating a specific category query image by cutting out, from the query image, a region corresponding to the region of interest; and
a reference image search function of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image,
wherein the attention level image generation function uses a deep convolutional neural network trained on images of objects belonging to a plurality of categories including the specific category.