JP2018124740A

JP2018124740A - Image retrieval system, image retrieval method and image retrieval program

Info

Publication number: JP2018124740A
Application number: JP2017015717A
Authority: JP
Inventors: 悠一吉田; Yuichi Yoshida
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2017-01-31
Filing date: 2017-01-31
Publication date: 2018-08-09
Anticipated expiration: 2037-01-31
Also published as: JP6778625B2

Abstract

PROBLEM TO BE SOLVED: To provide an image retrieval system capable of performing image retrieval at high speed. SOLUTION: The image retrieval system generates, from a query image (81), an attention level image (83) representing a degree of attention that indicates the possibility that an object belonging to a specific category exists at each position of the query image (81), generates a region of interest (D) from the attention level image (83) on the basis of the degree of attention, cuts out a region (D') corresponding to the region of interest (D) from the query image (81) to generate a specific category query image (85), and searches a database in which multiple reference images are recorded for a reference image corresponding to the specific category query image (85). SELECTED DRAWING: Figure 1

Description

The present invention relates to an image search system, an image search method, and an image search program for searching a large number of reference images for a reference image corresponding to a query image.

Conventionally, specific object recognition has been performed in which a reference image corresponding to a query image is searched for among a large number of reference images, and the object photographed in the query image is thereby identified. For the image search, as advance preparation, a large number of images of specific objects are prepared for a wide variety of objects and recorded in a database as reference images. In addition, a feature amount indicating the features of each reference image is calculated and recorded in the database in association with that reference image. When an image search is performed, an image captured by a photographing device is used as the query image. A feature amount is calculated for the query image and compared with the feature amount of each reference image, and the reference image whose feature amount has the highest degree of coincidence with that of the query image is selected as the corresponding reference image. The object photographed in the selected reference image can be regarded as the object photographed in the query image, so the object photographed in the query image is identified (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2015-111339

In recent years, the size of images that can be captured by photographing devices has increased. When an image search is performed using such a large image as the query image, calculating the feature amount and selecting the reference image take time, and it becomes difficult to execute the image search within a practical time.
An object of the present invention is to provide an image search system, an image search method, and an image search program capable of performing an image search at high speed.

A first aspect of the present invention is an image search system including: an attention level image generation unit that generates, from a query image, an attention level image representing a degree of attention that indicates the possibility that an object belonging to a specific category exists at each position of the query image; a region-of-interest generation unit that generates a region of interest from the attention level image based on the degree of attention; a specific category query image generation unit that generates a specific category query image by cutting out a region corresponding to the region of interest from the query image; and a reference image search unit that searches a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image.

In this aspect, a region of interest is generated based on the degree of attention indicating the possibility that an object belonging to a specific category exists at each position of the query image, a region corresponding to the region of interest is cut out from the query image to generate a specific category query image, and the reference image is searched for based on that specific category query image. Therefore, the reference image search process, and hence the image search as a whole, can be performed at high speed.

A second aspect of the present invention is the image search system further including a query image reduction unit that reduces the size of the query image, wherein the attention level image generation unit generates the attention level image from the query image reduced by the query image reduction unit.

In this aspect, the attention level image is generated from the reduced, small query image. Therefore, the attention level image generation process can be performed at high speed, and the image search can be performed at even higher speed.

A third aspect of the present invention is an image search method including: an attention level image generation step of generating, from a query image, an attention level image representing a degree of attention that indicates the possibility that an object belonging to a specific category exists at each position of the query image; a region-of-interest generation step of generating a region of interest from the attention level image based on the degree of attention; a specific category query image generation step of generating a specific category query image by cutting out a region corresponding to the region of interest from the query image; and a reference image search step of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image.
This aspect provides the same effects as the first aspect.

A fourth aspect of the present invention is an image search program that causes a computer to realize: an attention level image generation function of generating, from a query image, an attention level image representing a degree of attention that indicates the possibility that an object belonging to a specific category exists at each position of the query image; a region-of-interest generation function of generating a region of interest from the attention level image based on the degree of attention; a specific category query image generation function of generating a specific category query image by cutting out a region corresponding to the region of interest from the query image; and a reference image search function of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image.
This aspect provides the same effects as the first aspect.

According to the present invention, an image search can be performed at high speed.

FIG. 1 is a schematic diagram showing an outline of the image search method of each embodiment of the present invention.
FIG. 2 is a block diagram showing the image search system of the first embodiment of the present invention.
FIG. 3 is a schematic diagram showing the neural network unit of the first embodiment of the present invention.
FIG. 4 is a flowchart showing the image search method of the first embodiment of the present invention.
FIG. 5 is a flowchart showing the image search method of the first embodiment of the present invention.
FIG. 6 is a flowchart showing the region-of-interest generation step of the second embodiment of the present invention.
FIG. 7 is a flowchart showing the region-of-interest generation step of the third embodiment of the present invention.

An outline of the image search method of each embodiment of the present invention will be described with reference to FIG. 1.
This outline presents only the basic concept in order to aid understanding of each embodiment. Various modifications of the image search method of the present invention are conceivable, and the method is not limited to the processing shown in this outline.

In the image search method of each embodiment, when a reference image corresponding to a query image is searched for among a large number of reference images, the query image is preprocessed so that the reference image search process can be performed at high speed. In the preprocessing, a region in which an object belonging to the specific category to be recognized is likely to exist is cut out from the query image, and the reference image search process is performed based on the cut-out image.

Specifically, as shown in FIG. 1, an attention level image 83 is first generated from the query image 81. The attention level image 83 represents the degree of attention, which indicates the possibility that an object belonging to the specific category exists at each position of the query image 81. Next, attention areas a and b are extracted from the generated attention level image 83 based on the degree of attention, and a region of interest D is generated based on the extracted attention areas a and b. Various methods can be used to determine the attention areas a and b and to set the region of interest D; specific examples are described in the embodiments. Subsequently, a region D' corresponding to the region of interest D is cut out from the query image 81 to generate a specific category query image 85. The reference image search process is performed based on the specific category query image 85 generated in this way.

A first embodiment of the present invention will be described with reference to FIGS. 2 to 5.
The image search system of this embodiment will be described with reference to FIGS. 2 and 3.
As shown in FIG. 2, the image search system of this embodiment includes a photographing unit 10 that captures images. A camera of a mobile device, an in-vehicle camera of a vehicle, or the like is used as the photographing unit 10. The photographing unit 10 captures relatively large images. Here, the size of an image refers to its number of pixels. An image captured by the photographing unit 10 serves as the query image 81.

The image search system includes an arithmetic control unit 12 that searches the large number of reference images recorded in a database unit 60 for a reference image corresponding to the query image 81. The arithmetic control unit 12 has the functions described below. A program for causing the arithmetic control unit 12 to realize these functions is also included in the scope of the present invention.

The arithmetic control unit 12 includes a query image reduction unit 15 that reduces the size of the query image 81. The query image 81 is reduced as appropriate to a size suitable for generation of the attention level image 83 by the attention level image generation unit 20 described below. To reduce the image size, appropriate coarse-graining is performed, for example by replacing a plurality of pixels with a single pixel whose value is the average of their pixel values. Reducing the image size therefore lowers the resolution.
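The coarse-graining described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a grayscale image stored as a list of rows and an integer reduction factor that divides both dimensions, and the function name `downscale_by_block_average` is hypothetical.

```python
def downscale_by_block_average(image, factor):
    """Shrink a 2-D grayscale image by replacing each factor x factor
    block of pixels with a single pixel holding the block's mean value."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of factor")
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Collect every pixel value inside the current block.
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        reduced.append(row)
    return reduced


image = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [8, 8, 1, 3],
    [8, 8, 5, 7],
]
small = downscale_by_block_average(image, 2)
print(small)  # [[2.0, 10.0], [8.0, 4.0]]
```

Each output pixel summarizes four input pixels, which is why the text notes that the resolution drops along with the size.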

The arithmetic control unit 12 includes an attention level image generation unit 20 that generates, from the query image 81 reduced by the query image reduction unit 15, an attention level image 83 representing the degree of attention for a specific category. The degree of attention indicates the possibility that an object belonging to a given category exists at each position of the query image. As a method of generating such an attention level image, the technique described in, for example, B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning Deep Features for Discriminative Localization," Computer Vision and Pattern Recognition (CVPR), 2016 can be used.

The attention level image generation unit 20 includes a neural network unit 21 that calculates the degree of attention for various categories, and an image drawing unit 22 that draws the attention level image 83 representing the degree of attention for a specific category.

Various categories such as DVD/CD jackets, posters, and road signs are used as the categories. The specific category is selected as appropriate according to the purpose of the image search system. For example, in an image search system that identifies the DVD/CD shown in a photographed image of a DVD/CD jacket and obtains related information such as the contents of that DVD/CD, the DVD/CD jacket category is selected as the specific category. Likewise, in an image search system that identifies the road sign shown in a photographed image and obtains the content of that road sign, such as a speed restriction or one-way traffic, the road sign category is selected as the specific category.

As shown in FIG. 3, the neural network unit 21 uses a deep convolutional neural network. The deep convolutional neural network is formed by stacking an input layer 23, a large number of intermediate layers 24, a fully connected layer 28, and an output layer 29. Each intermediate layer 24 is formed by stacking a convolution layer 25, an activation layer 26, and a pooling layer 27.

The deep convolutional neural network has an input layer 23 to which an image is input.
The deep convolutional neural network also has a large number of intermediate layers 24.
Each intermediate layer 24 has a convolution layer 25 that extracts features at each position of the image. That is, as shown in Equation (1), for each unit l, the convolution layer 25 takes the input x from the previous layer at a given position (i, j) of the image, adds a bias β to the weighted sum Σαx over all units of the input x, and passes the result to the next layer as the output y.

Each intermediate layer 24 has an activation layer 26 that contributes to improved convergence and learning speed. As shown in Equation (2), for each unit l, the activation layer 26 passes to the next layer, as the output y, the response of an activation function f to the input x from the convolution layer 25. ReLU (rectified linear unit) is used as the activation function. Other appropriate functions, such as a sigmoid function, may also be used as the activation function.

Each intermediate layer 24 has a pooling layer 27 that compresses information by discarding local variations in the image. That is, as shown in Equation (3), for each unit l, the pooling layer 27 performs average pooling, which takes the average of the inputs x from the activation layer 26 within a small region M containing m elements. Other appropriate pooling methods, such as max pooling, may also be used.
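The three operations of an intermediate layer can be sketched together as below. Equations (1) to (3) are not reproduced in this text, so this is only an illustration under stated assumptions: the convolution is simplified to a 1 × 1 kernel (the weighted sum runs over input channels only, whereas a real convolution layer would also sum over a spatial neighbourhood), and all function names and example values are hypothetical.

```python
def conv1x1(channels, weights, bias):
    """At each position (i, j), output sum_k weights[k] * channels[k][i][j] + bias."""
    height, width = len(channels[0]), len(channels[0][0])
    return [[sum(w * ch[i][j] for w, ch in zip(weights, channels)) + bias
             for j in range(width)]
            for i in range(height)]


def relu(feature_map):
    """Rectified linear unit: negative responses are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def average_pool(feature_map, size):
    """Replace each non-overlapping size x size region by its mean."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - size + 1, size):
        row = []
        for left in range(0, width - size + 1, size):
            window = [feature_map[y][x]
                      for y in range(top, top + size)
                      for x in range(left, left + size)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Two input channels of a 2 x 4 feature map (illustrative values).
channels = [
    [[1, 2, 0, 6], [3, 0, 2, 2]],
    [[2, 0, 1, 1], [0, 4, 3, 0]],
]
convolved = conv1x1(channels, weights=[1.0, -1.0], bias=0.5)
activated = relu(convolved)
pooled = average_pool(activated, 2)
print(pooled)  # [[1.5, 2.0]]
```

Swapping `average_pool` for a function that takes `max(window)` would give the max pooling variant that the text mentions as an alternative.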

The deep convolutional neural network further has a fully connected layer 28 that calculates the degree of attention for each category c. That is, as shown in Equation (4), for each category c, the fully connected layer 28 calculates the weighted sum S (= Σωx) over all units of the input x from the previous layer.
Then, as shown in Equation (5), the fully connected layer 28 takes the response of the softmax function to the weighted sum S as the degree of attention P for each category c.
In addition, the deep convolutional neural network has an output layer 29 that outputs the degree of attention P for each category c to the image drawing unit 22.
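The fully connected layer can be sketched as a weighted sum per category followed by a normalizing function. Equations (4) and (5) are not reproduced in this text, so the sketch assumes the standard softmax form; the function names and the example weights are hypothetical.

```python
import math


def weighted_sums(features, weights_per_category):
    """S_c = sum_k omega[c][k] * x[k] for every category c."""
    return [sum(w * x for w, x in zip(weights, features))
            for weights in weights_per_category]


def softmax(scores):
    """Map scores to positive values that sum to 1.
    Subtracting the maximum first avoids overflow in exp()."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


features = [1.0, 2.0]                       # input x from the previous layer
omega = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one weight row per category
scores = weighted_sums(features, omega)     # [1.0, 2.0, 3.0]
attention = softmax(scores)                 # degree of attention P per category
print(attention)
```

The category with the largest weighted sum receives the largest value of P, and the values over all categories sum to 1.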

In the deep convolutional neural network, parameters such as the weights α and ω and the bias β described above are determined in advance by learning using a large number of photographed images of objects belonging to each category c. Specifically, the discrepancy between the ideal output Q and the actual output R is measured by the cross entropy E shown in Equation (6).
The parameters are determined by successively updating the weights α and ω, the bias β, and other parameters using the error backpropagation method, as shown in Equation (7), so that the cross entropy E is minimized.
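The training objective can be illustrated on a single softmax layer. Equations (6) and (7) are not reproduced in this text, so the sketch assumes the usual cross entropy E = -Σ Q log R and a plain gradient-descent update; it does not propagate errors through earlier layers as full backpropagation would, and the learning rate and all names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, features):
    """Actual output R of a single softmax layer."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


def cross_entropy(target, predicted):
    """E = -sum_c Q_c * log(R_c); terms with Q_c = 0 contribute nothing."""
    return -sum(q * math.log(r) for q, r in zip(target, predicted) if q > 0)


def gradient_step(weights, features, target, learning_rate):
    """One gradient-descent update of the layer's weights.
    For softmax with cross entropy, dE/dS_c = R_c - Q_c,
    so dE/dw[c][k] = (R_c - Q_c) * x[k]."""
    predicted = predict(weights, features)
    return [[w - learning_rate * (r - q) * x for w, x in zip(row, features)]
            for row, r, q in zip(weights, predicted, target)]


features = [1.0, 2.0]
target = [1.0, 0.0]                 # ideal output Q: the image shows category 0
weights = [[0.0, 0.0], [0.0, 0.0]]

loss_before = cross_entropy(target, predict(weights, features))
weights = gradient_step(weights, features, target, learning_rate=0.1)
loss_after = cross_entropy(target, predict(weights, features))
print(loss_before, loss_after)      # the loss decreases after the update
```

Repeating the update over many training images is what "successively updating" the parameters amounts to in this simplified setting.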

As shown in FIG. 2, the attention level image generation unit 20 includes an image drawing unit 22 that, from the degrees of attention P for the categories c input from the neural network unit 21, draws the attention level image 83 representing the degree of attention P for a specific category C.
The attention level image 83 generated by the attention level image generation unit 20 has the same size as the reduced query image 81.

The arithmetic control unit 12 includes an attention level image enlargement unit 35 that, based on the reduction ratio of the query image 81, enlarges the attention level image 83 to the same size as the query image 81 before reduction. To enlarge the image size, appropriate interpolation is performed, for example by replacing a single pixel with a plurality of pixels having the same pixel value as that pixel.
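The pixel-replication enlargement mentioned above corresponds to nearest-neighbour upsampling, which can be sketched as follows. This is an illustrative assumption about one way to do it; the function name `enlarge_nearest` is hypothetical, and the integer mapping also handles ratios that are not whole numbers.

```python
def enlarge_nearest(image, out_height, out_width):
    """Enlarge a 2-D image by copying, for every output pixel,
    the value of the source pixel it falls on."""
    in_height, in_width = len(image), len(image[0])
    return [[image[y * in_height // out_height][x * in_width // out_width]
             for x in range(out_width)]
            for y in range(out_height)]


attention = [
    [1, 2],
    [3, 4],
]
enlarged = enlarge_nearest(attention, 4, 4)
for row in enlarged:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

No new values are created, so the enlarged attention level image carries exactly the information of the small one, only on the pixel grid of the original query image.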

The arithmetic control unit 12 includes a region-of-interest generation unit 40 that generates the region of interest D from the enlarged attention level image 83 based on the degree of attention.
Specifically, the region-of-interest generation unit 40 includes an attention area extraction unit 41 that extracts one or more regions having a degree of attention equal to or greater than a predetermined threshold as attention areas a and b. The region-of-interest generation unit 40 also includes an attention area selection unit 42 that selects one or more attention areas from the one or more attention areas a and b extracted by the attention area extraction unit 41. In this embodiment, the attention area selection unit 42 selects all of the attention areas a and b extracted by the attention area extraction unit 41. The region-of-interest generation unit 40 further includes a region-of-interest setting unit 43 that sets, as the region of interest D, a rectangular region circumscribing all of the selected attention areas a and b. The region of interest D indicates the region of the query image 81 in which an object belonging to the specific category is likely to exist.
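For this embodiment, where every extracted attention area is selected, the circumscribing rectangle can be computed directly from the thresholded pixels without labelling individual areas. The sketch below assumes that simplification; the function name, the inclusive coordinate convention, and the example values are hypothetical.

```python
def region_of_interest(attention, threshold):
    """Return (top, left, bottom, right), inclusive, of the rectangle
    circumscribing every pixel whose attention is >= threshold,
    or None when no pixel reaches the threshold."""
    hits = [(y, x)
            for y, row in enumerate(attention)
            for x, value in enumerate(row)
            if value >= threshold]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return min(ys), min(xs), max(ys), max(xs)


attention = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0, 0.0],   # first attention area (two pixels)
    [0.0, 0.1, 0.0, 0.0, 0.7],   # second attention area (one pixel)
    [0.0, 0.0, 0.0, 0.0, 0.1],
]
roi = region_of_interest(attention, threshold=0.5)
print(roi)  # (1, 1, 2, 4)
```

A variant that selects only some attention areas would first need to separate them, for example by connected-component labelling, before taking the bounding rectangle.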

The arithmetic control unit 12 includes a specific category query image generation unit 45 that generates the specific category query image 85 by cutting out the region corresponding to the region of interest D from the query image 81. Cutting out an image means separating a part of the image without scaling it. Cutting out therefore makes the image itself smaller, but the image is not scaled down, so the resolution does not decrease.
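The difference between cutting out and reducing can be seen in a short sketch: cropping copies the original pixel values unchanged. The rectangle convention (inclusive top, left, bottom, right) and the function name are assumptions made for illustration.

```python
def crop(image, top, left, bottom, right):
    """Cut out the inclusive rectangle from a 2-D image.
    Pixel values are copied as they are; nothing is averaged or resampled."""
    return [row[left:right + 1] for row in image[top:bottom + 1]]


query = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
]
cut = crop(query, top=1, left=1, bottom=2, right=2)
print(cut)  # [[22, 23], [32, 33]]
```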

The arithmetic control unit 12 includes a specific category query image reduction unit 50 that reduces the specific category query image 85. The specific category query image 85 is reduced as appropriate to a size suitable for the reference image search by the reference image search unit 52 described below.

The arithmetic control unit 12 includes a reference image search unit 52 that searches the large number of reference images recorded in the database unit 60 for a reference image corresponding to the specific category query image 85.
The reference image search unit 52 includes a feature amount calculation unit 55 that calculates the feature amount of the specific category query image 85. Specifically, the feature amount calculation unit 55 includes a feature point detection unit 56 that extracts edges from the specific category query image 85 to generate an edge image and detects a plurality of feature points from the edge image. The feature amount calculation unit 55 also includes a local feature amount extraction unit 57 that extracts a local feature amount at each feature point. SIFT feature amounts are used as the local feature amounts, and each SIFT feature amount is obtained as an N-dimensional vector. The feature amount calculation unit 55 further includes a binary conversion unit 58 that converts the SIFT feature amounts into binary codes. As shown in Equation (8), the binary conversion unit 58 converts a SIFT feature amount v into a binary code h expressed in binary values. Here, d is the size of the binary code after conversion, and w is a matrix of N rows and d columns formed from vectors obtained by random sampling, according to a normal distribution, from points on a hypersphere of radius 1 in N dimensions.
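Equation (8) is not reproduced in this text. One common way to realize such a conversion is random-projection hashing, where bit k of the code records the sign of the inner product between the feature vector and the k-th column of w. The sketch below assumes that form; the sampling procedure (normalized Gaussian vectors), the seed, the small dimensions, and all names are illustrative assumptions rather than the patent's definition.

```python
import math
import random


def random_unit_vectors(dim, count, seed=0):
    """Draw `count` directions on the unit hypersphere in `dim` dimensions
    by normalizing vectors of independent Gaussian samples."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        vectors.append([c / norm for c in v])
    return vectors


def to_binary_code(feature, projections):
    """Bit k is 1 when the feature lies on the positive side of projection k."""
    return [1 if sum(p * f for p, f in zip(proj, feature)) >= 0 else 0
            for proj in projections]


N, d = 8, 16                        # toy sizes; a SIFT descriptor is typically 128-D
projections = random_unit_vectors(N, d)    # plays the role of the columns of w
feature = [0.3, -1.2, 0.5, 0.0, 2.1, -0.7, 0.9, 1.4]
code = to_binary_code(feature, projections)
print(code)
```

Because only the sign of each projection is kept, scaling the feature vector leaves the code unchanged, and similar vectors tend to share most bits, which is what makes a bitwise comparison of codes meaningful.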

The database unit 60 includes a reference image recording unit 61 in which a large number of reference images are recorded. As the reference images, for each of a wide variety of categories, many images of objects belonging to that category are recorded. For example, DVD/CD jackets, posters, road signs, and the like are used as categories. Various kinds of DVD/CD jackets are used as objects belonging to the DVD/CD jacket category, and various road signs, such as speed restriction and one-way signs, are used as objects belonging to the road sign category.

The database unit 60 includes a feature amount recording unit 62 in which the feature amount calculated for each reference image is recorded in association with that reference image. As with the feature amount calculated by the feature amount calculation unit 55, binary codes obtained by binary conversion of the SIFT feature amount at each feature point are used as the feature amount.

The database unit 60 includes a related information recording unit 63 in which related information on each reference image is recorded in association with that reference image. For example, when the reference image is a DVD/CD jacket, the related information is the contents of the DVD/CD: information on the director and cast for a movie DVD, or on the composer and performers for a music CD. When the reference image is a traffic sign, the related information is the content of the traffic sign, such as a speed limit or one-way traffic.

The reference image search unit 52 includes a reference image selection unit 65 that selects, from the large number of reference images recorded in the database unit 60, the reference image corresponding to the specific category query image 85. Specifically, the reference image selection unit 65 compares the feature amount of each feature point of the specific category query image 85 with the feature amounts of all feature points of all reference images, and selects the reference image feature point whose feature amount has the highest degree of coincidence as the feature point corresponding to that feature point of the specific category query image 85. Here, the feature amount with the smallest Hamming distance, which indicates the degree of mismatch between two binary codes, is regarded as the feature amount with the highest degree of coincidence. The reference image selection unit 65 then selects, from among all reference images, the reference image having the largest number of feature points corresponding to feature points of the specific category query image 85 as the reference image corresponding to the specific category query image 85.
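The selection procedure above amounts to a nearest-neighbour match per query feature point followed by a vote per reference image. The sketch below is a brute-force illustration under assumed data structures (a dictionary from image identifier to its list of binary codes); the identifiers and codes are made up, and a practical system would use an index rather than comparing against every code.

```python
def hamming_distance(code_a, code_b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def select_reference_image(query_codes, reference_codes):
    """Return the identifier of the reference image that owns the nearest
    code for the largest number of query feature points.
    `reference_codes` must contain at least one code."""
    votes = {image_id: 0 for image_id in reference_codes}
    for query in query_codes:
        # Nearest reference feature point over all images (ties broken by id).
        _, best_id = min(
            (hamming_distance(query, code), image_id)
            for image_id, codes in reference_codes.items()
            for code in codes
        )
        votes[best_id] += 1
    return max(votes, key=votes.get)


reference_codes = {
    "dvd_a": [[0, 0, 0, 0], [1, 1, 0, 0]],
    "dvd_b": [[1, 1, 1, 1], [0, 1, 1, 1]],
}
query_codes = [[0, 0, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1]]
best = select_reference_image(query_codes, reference_codes)
print(best)  # dvd_a
```

In this example two of the three query feature points find their nearest code in "dvd_a" and one in "dvd_b", so "dvd_a" is selected.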

The object photographed in the query image 81 and the object photographed in the reference image corresponding to the specific category query image 85 can be regarded as objects of the same type. Because the type of the object photographed in each reference image is identified in advance, the type of the object photographed in the query image 81 is thereby identified.
The image search system includes a related information reading unit 70 that reads out the related information associated with the selected reference image.

The image search system includes an output unit 75 that produces output based on the read related information. For example, when a DVD/CD jacket is photographed with the camera of a mobile device, the output unit 75 displays the related information on that DVD/CD on the screen of the mobile device. When a road sign is photographed with an in-vehicle camera, the output unit 75 automatically controls the vehicle based on the content of the road sign.

図４及び図５を参照し、本発明の第１実施形態の画像検索方法について説明する。 The image search method according to the first embodiment of the present invention will be described with reference to FIGS. 4 and 5.
撮影ステップ（Ｓ１０） Photographing step (S10)
撮影ステップ（Ｓ１０）では、比較的サイズの大きな画像を撮影する。本実施形態では、モバイルデバイスのカメラによりＤＶＤ／ＣＤジャケットが撮影されており、画像のサイズは４０００×３０００画素である。撮影された画像がクエリ画像８１となる。 In the photographing step (S10), a relatively large image is photographed. In this embodiment, a DVD/CD jacket is photographed by the camera of a mobile device, and the image size is 4000 × 3000 pixels. The photographed image becomes the query image 81.

クエリ画像縮小ステップ（Ｓ１５） Query image reduction step (S15)
クエリ画像縮小ステップ（Ｓ１５）では、クエリ画像８１を縮小する。クエリ画像８１は、以下に述べる注目度画像生成ステップ（Ｓ２０）における注目度画像８３の生成に適したサイズに適宜縮小される。本実施形態では、クエリ画像８１のサイズを４０００×３０００画素から２５６×２５６画素まで縮小している。 In the query image reduction step (S15), the query image 81 is reduced. The query image 81 is reduced as appropriate to a size suitable for generating the attention level image 83 in the attention level image generation step (S20) described below. In this embodiment, the query image 81 is reduced from 4000 × 3000 pixels to 256 × 256 pixels.

注目度画像生成ステップ（Ｓ２０） Attention level image generation step (S20)
注目度画像生成ステップ（Ｓ２０）では、上述したように深層畳込ニューラルネットワークを用いて、縮小したクエリ画像８１から注目度画像８３を生成する。注目度画像８３のサイズは、縮小したクエリ画像８１のサイズと同一であり、本実施形態では２５６×２５６画素である。 In the attention level image generation step (S20), the attention level image 83 is generated from the reduced query image 81 using the deep convolutional neural network, as described above. The attention level image 83 is the same size as the reduced query image 81, which is 256 × 256 pixels in this embodiment.

注目度画像拡大ステップ（Ｓ２５） Attention level image enlargement step (S25)
注目度画像拡大ステップ（Ｓ２５）では、クエリ画像８１の縮小率に基づいて、注目度画像８３を縮小前のクエリ画像８１と同一のサイズに拡大する。本実施形態では、注目度画像８３のサイズを２５６×２５６画素から４０００×３０００画素まで拡大する。 In the attention level image enlargement step (S25), the attention level image 83 is enlarged, based on the reduction ratio of the query image 81, to the same size as the query image 81 before reduction. In this embodiment, the attention level image 83 is enlarged from 256 × 256 pixels to 4000 × 3000 pixels.
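The enlargement in step S25 can be illustrated with a minimal sketch. The description does not state which interpolation is used, so nearest-neighbour sampling is an assumption here, as are the function name `enlarge_nearest` and the list-of-rows representation of the attention level image. Because a 256 × 256 map is stretched to 4000 × 3000, the horizontal and vertical scale factors differ (15.625 and about 11.72).

```python
def enlarge_nearest(attention, out_width, out_height):
    """Enlarge a 2-D attention map (a list of rows) by nearest-neighbour sampling.

    Assumption: the patent does not name an interpolation method; nearest
    neighbour is used here only because it is the simplest to show.
    """
    in_height = len(attention)
    in_width = len(attention[0])
    enlarged = []
    for y in range(out_height):
        # Map the output row back to the source row it came from.
        src_y = min(in_height - 1, y * in_height // out_height)
        row = attention[src_y]
        enlarged.append(
            [row[min(in_width - 1, x * in_width // out_width)]
             for x in range(out_width)]
        )
    return enlarged


# Toy example: a 2 x 2 map enlarged to 4 (wide) x 6 (high).
small = [[0.1, 0.9],
         [0.2, 0.8]]
big = enlarge_nearest(small, out_width=4, out_height=6)
```

In the embodiment the same call would take a 256 × 256 map with `out_width=4000` and `out_height=3000`.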

興味領域生成ステップ（Ｓ３０） Region of interest generation step (S30)
興味領域生成ステップ（Ｓ３０）では、拡大された注目度画像８３から注目度に基づいて興味領域Ｄを生成する。即ち、注目領域抽出ステップ（Ｓ３１）では、注目度画像８３において所定の閾値以上の注目度を有する１つ以上の注目領域ａ，ｂを抽出する。そして、注目領域選択ステップ（Ｓ３２）では、抽出された全ての注目領域ａ，ｂを選択する。興味領域設定ステップ（Ｓ３３）では、選択された全ての注目領域ａ，ｂに外接する長方形状の領域を興味領域Ｄに設定する。 In the region of interest generation step (S30), the region of interest D is generated from the enlarged attention level image 83 based on the attention level. Specifically, in the attention area extraction step (S31), one or more attention areas a and b having an attention level equal to or higher than a predetermined threshold are extracted from the attention level image 83. In the attention area selection step (S32), all the extracted attention areas a and b are selected. In the region of interest setting step (S33), a rectangular region circumscribing all the selected attention areas a and b is set as the region of interest D.
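Steps S31 to S33 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: attention areas are taken to be 4-connected groups of pixels at or above the threshold (the description does not define connectivity), the attention level image is a list of rows, and the function names are hypothetical.

```python
from collections import deque


def extract_attention_areas(attention, threshold):
    """S31 (sketch): return connected areas, each a list of (x, y) pixels,
    whose attention level is >= threshold. 4-connectivity is an assumption."""
    height, width = len(attention), len(attention[0])
    seen = [[False] * width for _ in range(height)]
    areas = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or attention[y][x] < threshold:
                continue
            # Breadth-first flood fill from this unvisited pixel.
            queue, area = deque([(x, y)]), []
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                area.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and attention[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            areas.append(area)
    return areas


def circumscribing_rectangle(areas):
    """S33 (sketch): inclusive (left, top, right, bottom) enclosing all areas."""
    xs = [x for area in areas for x, _ in area]
    ys = [y for area in areas for _, y in area]
    return min(xs), min(ys), max(xs), max(ys)


attention = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.6],
]
areas = extract_attention_areas(attention, threshold=0.5)  # S31
selected = areas                                           # S32: select all
region_d = circumscribing_rectangle(selected)              # S33
```

Here two separate attention areas are found, and the region of interest D is the single rectangle that encloses both.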

特定カテゴリクエリ画像生成ステップ（Ｓ３５） Specific category query image generation step (S35)
特定カテゴリクエリ画像生成ステップ（Ｓ３５）では、クエリ画像８１から興味領域Ｄに対応する領域Ｄ´を切り出して、特定カテゴリクエリ画像８５を生成する。本実施形態では、クエリ画像８１のサイズが４０００×３０００画素であるのに対して、特定カテゴリクエリ画像８５のサイズは２０００×１８００画素である。 In the specific category query image generation step (S35), the region D′ corresponding to the region of interest D is cut out from the query image 81 to generate the specific category query image 85. In this embodiment, the query image 81 is 4000 × 3000 pixels, whereas the specific category query image 85 is 2000 × 1800 pixels.
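The cut-out in step S35 amounts to slicing the full-size query image. A minimal sketch, assuming the image is a list of pixel rows and the region D′ is given as an inclusive (left, top, right, bottom) rectangle; the name `crop` and both representations are illustrative only.

```python
def crop(image, rect):
    """Cut out the inclusive rectangle (left, top, right, bottom) from an
    image stored as a list of rows. Pixels are copied unchanged, so the
    cut-out keeps the resolution of the source image."""
    left, top, right, bottom = rect
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 5 x 4 image whose pixel value encodes its position as 10 * y + x.
query_image = [[10 * y + x for x in range(5)] for y in range(4)]
specific_category_query = crop(query_image, (1, 1, 3, 2))
```

Cropping changes only the extent of the image, not its sampling density, which is the property the description later relies on when it discusses search accuracy.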

特定カテゴリクエリ画像縮小ステップ（Ｓ４０） Specific category query image reduction step (S40)
特定カテゴリクエリ画像縮小ステップ（Ｓ４０）では、特定カテゴリクエリ画像８５を縮小する。特定カテゴリクエリ画像８５のサイズは、以下に述べる参照画像検索ステップ（Ｓ４２）における参照画像の検索に適したサイズに適宜設定される。本実施形態では、特定カテゴリクエリ画像８５のサイズは２０００×１８００画素から３２０×２４０画素まで縮小される。 In the specific category query image reduction step (S40), the specific category query image 85 is reduced. The size of the specific category query image 85 is set as appropriate to a size suitable for the reference image search in the reference image search step (S42) described below. In this embodiment, the specific category query image 85 is reduced from 2000 × 1800 pixels to 320 × 240 pixels.

参照画像検索ステップ（Ｓ４２） Reference image search step (S42)
参照画像検索ステップ（Ｓ４２）において、特徴量算出ステップ（Ｓ４５）では、特定カテゴリクエリ画像８５の特徴量が算出される。即ち、特徴量算出ステップ（Ｓ４６）において、特徴点検出ステップ（Ｓ４６）では、特定カテゴリクエリ画像８５からエッジを抽出してエッジ画像を生成し、エッジ画像から複数の特徴点を検出する。局所特徴量抽出ステップ（Ｓ４７）では、各特徴点においてＳＩＦＴ特徴量を抽出する。バイナリ変換ステップ（Ｓ４８）では、ＳＩＦＴ特徴量をバイナリコードに変換する。 Within the reference image search step (S42), the feature amount calculation step (S45) calculates the feature amount of the specific category query image 85. Specifically, within the feature amount calculation step (S45), the feature point detection step (S46) extracts edges from the specific category query image 85 to generate an edge image and detects a plurality of feature points from the edge image. The local feature amount extraction step (S47) extracts a SIFT feature amount at each feature point. The binary conversion step (S48) converts the SIFT feature amounts into binary codes.

参照画像検索ステップ（Ｓ４２）において、参照画像選択ステップ（Ｓ５０）では、データベースに記録されている多数の参照画像から、特定カテゴリクエリ画像８５に対応する参照画像を選択する。即ち、検索画像の各特徴点のバイナリコードと、データベースに記録されている全参照画像の全特徴点のバイナリコードとを比較し、ハミング距離が最も近い参照画像の特徴点を、当該特定カテゴリクエリ画像８５の特徴点に対応する特徴点として選択する。そして、全参照画像の内、特定カテゴリクエリ画像８５の特徴点に対応する特徴点の数が最も多い参照画像を、特定カテゴリクエリ画像８５に対応する参照画像として選択する。 Within the reference image search step (S42), the reference image selection step (S50) selects the reference image corresponding to the specific category query image 85 from the many reference images recorded in the database. Specifically, the binary code of each feature point of the search image, that is, the specific category query image 85, is compared with the binary codes of all feature points of all reference images recorded in the database, and the reference-image feature point with the smallest Hamming distance is selected as the feature point corresponding to that feature point of the specific category query image 85. Then, from among all reference images, the reference image having the largest number of feature points corresponding to feature points of the specific category query image 85 is selected as the reference image corresponding to the specific category query image 85.
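The selection in step S50 can be sketched as a nearest-neighbour match followed by a vote. This is a minimal illustration under assumptions: binary codes are held as Python integers, the database is a plain dictionary from a hypothetical reference image identifier to its list of codes, the comparison is brute force, and ties go to the first candidate found. None of these details is specified in the description.

```python
from collections import Counter


def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(code_a ^ code_b).count("1")


def select_reference_image(query_codes, database):
    """For each query feature point, find the reference feature point with
    the smallest Hamming distance, then return the reference image that
    collected the most corresponding feature points."""
    votes = Counter()
    for query_code in query_codes:
        best_id, best_distance = None, None
        for image_id, codes in database.items():
            for code in codes:
                distance = hamming_distance(query_code, code)
                if best_distance is None or distance < best_distance:
                    best_id, best_distance = image_id, distance
        if best_id is not None:
            votes[best_id] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical 8-bit codes; real codes derived from SIFT would be longer.
database = {
    "jacket_a": [0b10110010, 0b01101100],
    "jacket_b": [0b11110000, 0b00001111],
}
query_codes = [0b10110011, 0b01101000, 0b00001110]
best = select_reference_image(query_codes, database)
```

Two of the three query feature points find their nearest code in `jacket_a`, so that reference image is selected. The cost of this brute-force loop grows with the number of query feature points, which is one reason a smaller query image shortens the search.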

本実施形態の画像検索システム及び画像検索方法は以下の効果を奏する。 The image search system and image search method of this embodiment have the following effects.
本実施形態の画像検索システム及び画像検索方法では、縮小されたクエリ画像８１に基づき、深層畳込ニューラルネットワークを用いてクエリ画像８１の各位置において特定のカテゴリに属する物体が存在する可能性を示す注目度を表す注目度画像８３を生成している。そして、縮小前のクエリ画像８１と同じサイズに拡大した注目度画像８３に基づいて、クエリ画像８１において特定のカテゴリに属する物体の存在する可能性の高い興味領域Ｄを生成し、クエリ画像８１から当該興味領域Ｄに対応する領域Ｄ´を特定カテゴリクエリ画像８５として切り出し、当該特定カテゴリクエリ画像８５に基づいて参照画像の検索を行っている。このため、クエリ画像８１よりも小さなサイズの特定カテゴリクエリ画像８５に基づいて、参照画像の検索処理を行うこととなるため、画像検索を高速で行うことが可能となっている。さらに、縮小されたサイズの小さなクエリ画像８１に基づいて、注目度画像８３の生成を行っているため、深層畳込ニューラルネットワークを用いた注目度画像８３の生成処理を高速で行うことができ、画像検索をさらに高速で行うことが可能となっている。なお、特定カテゴリクエリ画像８５のサイズはクエリ画像８１のサイズよりも小さくなっているが、特定カテゴリクエリ画像８５はクエリ画像８１から切り出されて生成されており、特定カテゴリクエリ画像８５の解像度はクエリ画像８１の解像度から低下しているわけではない。このため、クエリ画像８１をそのまま用いて画像検索を行った場合と同程度の検索精度が実現されている。 In the image search system and image search method of this embodiment, a deep convolutional neural network is applied to the reduced query image 81 to generate the attention level image 83, which represents, at each position of the query image 81, the attention level indicating the possibility that an object belonging to a specific category exists. Then, based on the attention level image 83 enlarged to the same size as the query image 81 before reduction, the region of interest D, in which an object belonging to the specific category is likely to exist in the query image 81, is generated; the region D′ corresponding to the region of interest D is cut out from the query image 81 as the specific category query image 85; and the reference image search is performed based on the specific category query image 85. Because the reference image search is performed on the specific category query image 85, which is smaller than the query image 81, the image search can be performed at high speed. Furthermore, because the attention level image 83 is generated from the small, reduced query image 81, the generation of the attention level image 83 by the deep convolutional neural network is fast, which speeds up the image search further. Although the specific category query image 85 is smaller than the query image 81, it is generated by cutting it out of the query image 81, so its resolution is not lower than that of the query image 81. As a result, search accuracy comparable to that obtained by searching with the query image 81 as is can be achieved.
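The size reductions behind this speed argument can be made concrete with the pixel counts given for the first embodiment. The short calculation below only compares image areas; treating processing cost as roughly proportional to pixel count is an assumption, not a statement from the description.

```python
def pixel_count(width, height):
    """Number of pixels in a width x height image."""
    return width * height


query = pixel_count(4000, 3000)         # query image 81 as photographed
reduced_query = pixel_count(256, 256)   # input to the neural network (S15)
cropped = pixel_count(2000, 1800)       # specific category query image 85 (S35)
reduced_crop = pixel_count(320, 240)    # size used for the search (S40)

crop_share = cropped / query            # fraction of the query kept by the crop
network_share = reduced_query / query   # fraction seen by the neural network
```

With these figures the crop keeps 30% of the pixels of the query image, and the neural network processes roughly 0.5% of them.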

本実施形態では、縮小したクエリ画像に基づいて注目度画像を生成しているが、充分な演算能力を有するコンピュータを用いる場合には、クエリ画像を縮小することなく、クエリ画像をそのまま用いて、注目度画像の生成を行うようにしてもよい。 In this embodiment, the attention level image is generated from the reduced query image. However, when a computer with sufficient computing power is used, the attention level image may be generated from the query image as is, without reducing it.

また、本実施形態では、縮小されたクエリ画像から注目度画像を生成し、注目度画像を拡大した後に拡大した注目度画像から興味領域を生成し、クエリ画像から興味領域に対応する領域を切り出して、特定カテゴリクエリ画像を生成している。しかしながら、注目度画像を拡大する前に興味領域を生成し、クエリ画像の縮小率に基づいて興味領域を拡大して、クエリ画像から拡大した興味領域に対応する領域を切り出し、特定カテゴリクエリ画像を生成するようにしてもよい。 In this embodiment, the attention level image is generated from the reduced query image, the attention level image is enlarged, the region of interest is generated from the enlarged attention level image, and the region corresponding to the region of interest is cut out from the query image to generate the specific category query image. Alternatively, the region of interest may be generated before the attention level image is enlarged; the region of interest is then enlarged based on the reduction ratio of the query image, and the region corresponding to the enlarged region of interest is cut out from the query image to generate the specific category query image.
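The alternative order described here, generating the region of interest at the reduced size and enlarging only the rectangle, can be sketched as a coordinate mapping. The rounding rule (floor for the leading edge, ceiling for the trailing edge), the inclusive rectangle convention, and the name `scale_rectangle` are assumptions made for illustration.

```python
import math


def scale_rectangle(rect, reduced_size, original_size):
    """Map an inclusive (left, top, right, bottom) rectangle found on the
    reduced image to the coordinates of the original image."""
    left, top, right, bottom = rect
    reduced_width, reduced_height = reduced_size
    original_width, original_height = original_size
    scale_x = original_width / reduced_width
    scale_y = original_height / reduced_height
    # Round outward so the enlarged rectangle covers every source pixel.
    new_left = math.floor(left * scale_x)
    new_top = math.floor(top * scale_y)
    new_right = min(original_width, math.ceil((right + 1) * scale_x)) - 1
    new_bottom = min(original_height, math.ceil((bottom + 1) * scale_y)) - 1
    return new_left, new_top, new_right, new_bottom


# Hypothetical region of interest on the 256 x 256 attention level image.
region_on_reduced = (64, 64, 191, 191)
region_on_query = scale_rectangle(region_on_reduced, (256, 256), (4000, 3000))
```

Only four coordinates are scaled instead of a full 4000 × 3000 map, so this variant avoids enlarging the attention level image at all.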

図６を参照し、本発明の第２実施形態について説明する。 The second embodiment of the present invention will be described with reference to FIG. 6.
図６に示されるように、本実施形態の画像検索方法において、興味領域生成ステップ（Ｓ６０）では、注目領域抽出ステップ（Ｓ６１）は第１実施形態と同様である。注目領域選択ステップ（Ｓ６２）では、注目領域抽出ステップ（Ｓ６１）で抽出された１つ以上の注目領域ａ，ｂ，ｃの内、最大の面積を有する注目領域ａを選択する。また、興味領域設定ステップ（Ｓ６３）は第１実施形態と同様である。 As shown in FIG. 6, in the region of interest generation step (S60) of the image search method of this embodiment, the attention area extraction step (S61) is the same as in the first embodiment. In the attention area selection step (S62), the attention area a having the largest area is selected from among the one or more attention areas a, b, and c extracted in the attention area extraction step (S61). The region of interest setting step (S63) is also the same as in the first embodiment.

図７を参照し、本発明の第３実施形態について説明する。 The third embodiment of the present invention will be described with reference to FIG. 7.
図７に示されるように、本実施形態の画像検索方法において、興味領域生成ステップ（Ｓ６５）では、注目領域抽出ステップ（Ｓ６６）は第１実施形態と同様である。注目領域選択ステップ（Ｓ６７）では、注目領域抽出ステップ（Ｓ６６）で抽出された１つ以上の注目領域ａ，ｂ，ｃの内、所定の閾値以上の面積を有する注目領域ａ，ｂを選択する。また、興味領域設定ステップ（Ｓ６８）は第１実施形態と同様である。 As shown in FIG. 7, in the region of interest generation step (S65) of the image search method of this embodiment, the attention area extraction step (S66) is the same as in the first embodiment. In the attention area selection step (S67), the attention areas a and b having an area equal to or larger than a predetermined threshold are selected from among the one or more attention areas a, b, and c extracted in the attention area extraction step (S66). The region of interest setting step (S68) is also the same as in the first embodiment.
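The three embodiments differ only in the attention area selection step. A minimal side-by-side sketch, assuming each attention area is a list of (x, y) pixels and its area is its pixel count; the function names and the toy areas are hypothetical.

```python
def select_all(areas):
    """First embodiment (S32): keep every extracted attention area."""
    return list(areas)


def select_largest(areas):
    """Second embodiment (S62): keep only the attention area with the
    largest area, measured here as the number of pixels."""
    return [max(areas, key=len)]


def select_by_min_area(areas, min_area):
    """Third embodiment (S67): keep attention areas whose area is at
    least the given threshold."""
    return [area for area in areas if len(area) >= min_area]


# Toy attention areas a, b, c with 6, 4 and 1 pixels respectively.
area_a = [(x, y) for x in range(3) for y in range(2)]
area_b = [(x, y) for x in range(10, 12) for y in range(10, 12)]
area_c = [(20, 20)]
areas = [area_a, area_b, area_c]

first = select_all(areas)
second = select_largest(areas)
third = select_by_min_area(areas, min_area=4)
```

With these toy areas the first embodiment keeps a, b, and c, the second keeps only a, and the third keeps a and b, so small isolated responses such as c no longer stretch the circumscribing rectangle.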

１５クエリ画像縮小部
２０注目度画像生成部
４０興味領域生成部
４１注目領域抽出部
４２注目領域選択部
４３興味領域設定部
４５特定カテゴリクエリ画像生成部
５２参照画像検索部
８１クエリ画像
８３注目度画像
８５特定カテゴリクエリ画像
ａ，ｂ，ｃ注目領域
Ｄ興味領域
DESCRIPTION OF SYMBOLS
15 Query image reduction unit
20 Attention level image generation unit
40 Region of interest generation unit
41 Attention area extraction unit
42 Attention area selection unit
43 Region of interest setting unit
45 Specific category query image generation unit
52 Reference image search unit
81 Query image
83 Attention level image
85 Specific category query image
a, b, c Attention areas
D Region of interest

Claims

An image search system comprising:
an attention level image generation unit that generates, from a query image, an attention level image representing, at each position of the query image, an attention level indicating the possibility that an object belonging to a specific category exists;
a region of interest generation unit that generates a region of interest from the attention level image based on the attention level;
a specific category query image generation unit that generates a specific category query image by cutting out a region corresponding to the region of interest from the query image; and
a reference image search unit that searches a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image.

The image search system according to claim 1, further comprising a query image reduction unit that reduces the size of the query image,
wherein the attention level image generation unit generates the attention level image from the query image reduced by the query image reduction unit.

The image search system according to claim 1,
wherein the attention level image generation unit uses a deep convolutional neural network trained on images in which objects belonging to a plurality of categories, including the specific category, are photographed.

The image search system according to claim 1,
wherein the region of interest generation unit comprises:
an attention area extraction unit that extracts, from the attention level image, one or more attention areas in which the attention level is equal to or greater than a predetermined threshold;
an attention area selection unit that selects one or more attention areas from the one or more attention areas extracted by the attention area extraction unit; and
a region of interest setting unit that sets, as the region of interest, a rectangular region circumscribing all the attention areas selected by the attention area selection unit.

The image search system according to claim 4,
wherein the attention area selection unit selects all the attention areas extracted by the attention area extraction unit.

The image search system according to claim 4,
wherein the attention area selection unit selects the attention area having the largest area from among the one or more attention areas extracted by the attention area extraction unit.

The image search system according to claim 4,
wherein the attention area selection unit selects, from among the one or more attention areas extracted by the attention area extraction unit, each attention area having an area equal to or larger than a predetermined threshold.

An image search method comprising:
an attention level image generation step of generating, from a query image, an attention level image representing, at each position of the query image, an attention level indicating the possibility that an object belonging to a specific category exists;
a region of interest generation step of generating a region of interest from the attention level image based on the attention level;
a specific category query image generation step of generating a specific category query image by cutting out a region corresponding to the region of interest from the query image; and
a reference image search step of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image.

An image search program that causes a computer to realize:
an attention level image generation function of generating, from a query image, an attention level image representing, at each position of the query image, an attention level indicating the possibility that an object belonging to a specific category exists;
a region of interest generation function of generating a region of interest from the attention level image based on the attention level;
a specific category query image generation function of generating a specific category query image by cutting out a region corresponding to the region of interest from the query image; and
a reference image search function of searching a database in which a plurality of reference images are recorded for a reference image corresponding to the specific category query image.