JP2015056077A

JP2015056077A - Image retrieval device, system, program, and method using image based binary feature vector

Info

Publication number: JP2015056077A
Application number: JP2013189872A
Authority: JP
Inventors: 祐介内田; Yusuke Uchida; 茂之酒澤; Shigeyuki Sakasawa
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-09-12
Filing date: 2013-09-12
Publication date: 2015-03-23
Anticipated expiration: 2033-09-12
Also published as: JP6041439B2

Abstract

PROBLEM TO BE SOLVED: To provide an image retrieval device, program, and method capable of keeping the reference information database as small as possible while also reducing the amount of computation performed during image retrieval.

SOLUTION: The image retrieval device comprises: hash codebook storage means for storing a hash codebook for each hash value; bit number codebook storage means for storing a set of high-entropy bit numbers for each hash value; local feature extraction means for extracting a set of binary feature vectors from an image; hash means for outputting the hash value most similar to a binary feature vector by using the hash codebook storage means; bit string generation means for generating a new bit string from the bit number set of that hash value by using the bit number codebook storage means; inverted index storage means for storing the bit strings of reference images in association with each hash value; and retrieval means for retrieving a reference image from the bit strings of a query image.

Description

The present invention relates to a technique for retrieving images using binary feature vectors.

In recent years, image recognition and retrieval techniques based on local feature points have attracted attention. Feature vector extraction algorithms used for object recognition include SIFT (Scale-Invariant Feature Transform) (see, for example, Non-Patent Document 1) and SURF (Speeded Up Robust Features), both of which are robust to changes in rotation and scale. SIFT analyzes characteristic local regions using a scale space and describes feature vectors that are invariant to scale change and rotation; it extracts a set of 128-dimensional feature vectors from one image. SURF is faster than SIFT and extracts a set of 64-dimensional feature vectors from one image. Whereas SIFT has a high processing cost and makes real-time matching difficult, SURF speeds up processing by using integral images.

Meanwhile, as mobile terminals such as smartphones and tablets have become widespread, content retrieval is required to use less memory and to match faster. In particular, in image recognition for augmented reality (AR) applications, content must be retrieved even faster than with SIFT or SURF so that processing can run in real time. For this reason, binary feature vector extraction algorithms such as FAST (Features from Accelerated Segment Test) (see, for example, Non-Patent Document 2) and FREAK (Fast Retina Keypoint) (see, for example, Non-Patent Document 3) are attracting attention. These extract feature vectors faster than SIFT and SURF, and the extracted feature vectors are also more compact.

JP 2010-282581 A

J. Sivic et al., "Video Google: A Text Retrieval Approach to Object Matching in Videos," in Proc. ICCV, 2003.
E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in Proc. ICCV, 2011.
A. Alahi, R. Ortiz, and P. Vandergheynst, "FREAK: Fast Retina Keypoint," in Proc. CVPR, 2012.
"Correlation coefficient", [online], [retrieved August 31, 2013], Internet <URL:http://ja.wikipedia.org/wiki/%E7%9B%B8%E9%96%A2%E4%BF%82%E6%95%B0>

On mobile terminals such as smartphones, there are applications that can immediately search for reference images similar to a captured query image. Image retrieval (recognition) techniques, however, require a large amount of reference information about the recognition targets to be held in advance as a database. An image retrieval application installed on a smartphone therefore has to carry the reference information database itself.

However, the more reference images are to be recognized, the larger the database becomes, and as a result the application itself grows in size. Users tend to be reluctant to install a large application on a smartphone with relatively little memory. In addition, the larger the database, the greater the amount of computation (and computation time) the image retrieval application needs for a search.

Of course, an image retrieval application installed on a smartphone can also download just the database file over the network as needed. However, the larger the database file, the longer the download takes, which may impair usability.

Accordingly, an object of the present invention is to provide an image search device, program, and method that can keep the reference information database as small as possible and also reduce the amount of computation during image search.

According to the present invention, there is provided an image search device that searches a large number of reference images for a reference image similar to a query image, the device comprising:
hash codebook storage means for storing, for each hash value, a hash codebook used to quantize binary feature vectors;
bit number codebook storage means for storing, for each hash value, a set of bit numbers with high information content (entropy);
local feature extraction means for extracting a set of binary feature vectors from the query image and the reference images;
hash means for outputting, using the hash codebook storage means, the hash value most similar to a binary feature vector;
bit string generation means for selecting a partial bit string from the binary feature vector by referring to the bit number set of that hash value in the bit number codebook storage means;
inverted index storage means for storing, for each hash value, the bit strings of the reference images in association with that hash value; and
search means for searching for a reference image from the bit strings of the query image using the inverted index storage means.

According to another embodiment of the image search device of the present invention,
the device preferably further comprises hash codebook generation means for clustering a large number of binary feature vectors taken from many reference images and storing, in the hash codebook storage means, the representative binary feature vector of each cluster together with the hash value that serves as its identifier.

According to another embodiment of the image search device of the present invention,
the hash codebook generation means preferably uses the k-means method or the k-medoids method for clustering.

According to another embodiment of the image search device of the present invention,
the device preferably further comprises bit number codebook generation means that, for the many binary feature vectors contained in one cluster sharing the same hash value, derives a bit number sequence listing the bit numbers of bits that carry a large amount of information and have a small correlation with other bits, and stores that bit number sequence for each hash value in the bit number codebook storage means.

According to another embodiment of the image search device of the present invention,
in order to add bit numbers to the bit number sequence one at a time, the bit number codebook generation means preferably:
generates, for the many binary feature vectors contained in one cluster sharing the same hash value, a provisional bit number sequence in which the bits are ordered starting from the bit whose average value (over 0/1) is closest to 0.5 (the bit with the highest information content); and
taking the bits of the provisional bit number sequence in order from the top, adds the bit number of a bit to the bit number sequence if the absolute value of its correlation coefficient with the other bits already in the bit number sequence is at or below a predetermined threshold, and repeats this until the bit number sequence reaches a predetermined length.
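The greedy selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: binary feature vectors are held as Python integers with bit 0 as the least significant bit, the correlation coefficient is the Pearson coefficient, a candidate is accepted only if it passes the threshold against every bit already selected, and the function and parameter names (`select_bit_numbers`, `max_count`, `corr_threshold`) are hypothetical.

```python
from math import sqrt


def select_bit_numbers(vectors, n_bits, max_count, corr_threshold):
    """Greedily pick informative, weakly correlated bit positions.

    vectors: list of ints, each an n_bits-wide binary feature vector
             belonging to one cluster (one hash value).
    Returns a list of bit numbers (assumption: bit 0 is the LSB).
    """
    n = len(vectors)
    # columns[b][j] is bit b of the j-th vector (0 or 1).
    columns = [[(v >> b) & 1 for v in vectors] for b in range(n_bits)]
    means = [sum(col) / n for col in columns]

    def correlation(a, b):
        # Pearson correlation coefficient between two bit columns.
        cov = sum((x - means[a]) * (y - means[b])
                  for x, y in zip(columns[a], columns[b])) / n
        var_a = means[a] * (1 - means[a])
        var_b = means[b] * (1 - means[b])
        if var_a == 0 or var_b == 0:
            return 0.0  # assumption: a constant bit counts as uncorrelated
        return cov / sqrt(var_a * var_b)

    # Provisional sequence: bits whose mean is closest to 0.5 come first.
    provisional = sorted(range(n_bits), key=lambda b: abs(means[b] - 0.5))

    selected = []
    for b in provisional:
        if all(abs(correlation(b, s)) <= corr_threshold for s in selected):
            selected.append(b)
        if len(selected) == max_count:
            break
    return selected
```

For example, with the four 3-bit vectors `[0, 3, 4, 7]`, bits 0 and 1 always agree, so only one of them is kept and bit 2 fills the second slot.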

According to another embodiment of the image search device of the present invention,
the inverted index storage means preferably registers, for the reference images, an inverted index that associates each hash value LIDn with a list of pairs, each pair consisting of a reference image identifier and a bit string.

According to another embodiment of the image search device of the present invention,
the search means preferably:
receives, for the query image, a plurality of pairs each consisting of a hash value LIDn and a bit string;
for each pair of hash value and bit string of the query image, obtains the inverted index list corresponding to that hash value;
calculates the dissimilarity between the query image bit string and each reference image bit string contained in the obtained list;
calculates a score value that becomes higher as the dissimilarity becomes lower;
accumulates the score value for each reference image; and
outputs the reference image with the highest score value as the search result.
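The lookup-and-vote procedure above can be sketched as follows. This is a minimal illustration, assuming that bit strings are Python integers, that dissimilarity is the Hamming distance, and that the score is simply `n_bits - distance` (one possible score that grows as dissimilarity shrinks; the text does not fix the score at this point). The names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def hamming(x, y):
    """Hamming distance between two bit strings held as ints."""
    return bin(x ^ y).count("1")


def build_inverted_index(reference_entries):
    """reference_entries: iterable of (image_id, lid, bit_string) tuples.

    Returns a dict mapping each hash value to a list of
    (image_id, bit_string) pairs.
    """
    index = defaultdict(list)
    for image_id, lid, bits in reference_entries:
        index[lid].append((image_id, bits))
    return index


def search(index, query_pairs, n_bits):
    """query_pairs: list of (lid, bit_string) pairs of the query image."""
    scores = defaultdict(float)
    for lid, q_bits in query_pairs:
        # Only reference bit strings under the same hash value are compared.
        for image_id, r_bits in index.get(lid, []):
            d = hamming(q_bits, r_bits)
            scores[image_id] += n_bits - d  # assumed score: higher when d is lower
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Because only the list under the query's own hash value is scanned, the number of Hamming comparisons per query feature depends on the length of that one list rather than on the size of the whole database.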

According to another embodiment of the image search device of the present invention,
the search means preferably accumulates, for the reference images, only the score values corresponding to a predetermined number (K) of top results with the smallest dissimilarity.

According to another embodiment of the image search device of the present invention,
in the search means, the predetermined number (K) of top results is preferably either a preset fixed number or, given a preset dissimilarity threshold TH, the number of results whose dissimilarity is at or below the threshold TH.

According to another embodiment of the image search device of the present invention,
in the search means, the addition score corresponding to the i-th of the top K results is preferably set to a larger value when the K-th dissimilarity is large compared with the i-th dissimilarity.

According to another embodiment of the image search device of the present invention,
in the search means, the addition score corresponding to the i-th of the top K results is preferably one of the following:
(1) the square of the K-th dissimilarity minus the square of the i-th dissimilarity;
(2) the ratio of the square of the K-th dissimilarity to the square of the i-th dissimilarity, minus 1;
(3) the square of the ratio of the K-th dissimilarity to the i-th dissimilarity, minus 1; or
(4) the square of the quantity obtained by subtracting 1 from the ratio of the K-th dissimilarity to the i-th dissimilarity.
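Writing d_i for the i-th smallest dissimilarity and d_K for the K-th, the four options read as d_K² − d_i², d_K²/d_i² − 1, (d_K/d_i)² − 1, and (d_K/d_i − 1)². The sketch below evaluates them for a list of dissimilarities. It is illustrative only: the function name `addition_scores`, the `variant` selector, and the `floor` parameter are hypothetical, and clamping the dissimilarities to `floor` in the ratio variants is an assumption added here to avoid division by zero, which the text does not address.

```python
def addition_scores(dissimilarities, k, variant=1, floor=1.0):
    """Scores for the k smallest dissimilarities, in ascending order of d_i.

    variant selects option (1)-(4) from the text.
    floor is an assumed lower clamp used only by the ratio variants.
    """
    top = sorted(dissimilarities)[:k]
    if not top:
        return []
    d_k = top[-1]  # K-th (largest) dissimilarity among the top k
    scores = []
    for d_i in top:
        if variant == 1:
            scores.append(d_k ** 2 - d_i ** 2)
            continue
        # Assumption: clamp both values so the ratio is always defined.
        num = max(d_k, floor)
        den = max(d_i, floor)
        if variant == 2:
            scores.append(num ** 2 / den ** 2 - 1)
        elif variant == 3:
            scores.append((num / den) ** 2 - 1)
        elif variant == 4:
            scores.append((num / den - 1) ** 2)
        else:
            raise ValueError("variant must be 1, 2, 3, or 4")
    return scores
```

In every variant the K-th result itself receives a score of 0, and results much closer than the K-th receive larger scores. Options (2) and (3) are algebraically the same quantity.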

According to another embodiment of the image search device of the present invention,
the local feature extraction means preferably extracts binary feature vectors based on ORB (Oriented FAST and Rotated BRIEF), FREAK (Fast Retina Keypoint), or BRISK (Binary Robust Invariant Scalable Keypoints).

According to the present invention, there is provided a program that causes a computer installed in a device, which searches a large number of reference images for a reference image similar to a query image, to function as:
hash codebook storage means for storing, for each hash value, a hash codebook used to quantize binary feature vectors;
bit number codebook storage means for storing, for each hash value, a set of bit numbers with high information content (entropy);
local feature extraction means for extracting a set of binary feature vectors from the query image and the reference images;
hash means for outputting, using the hash codebook storage means, the hash value most similar to a binary feature vector;
bit string generation means for selecting a partial bit string from the binary feature vector by referring to the bit number set of that hash value in the bit number codebook storage means;
inverted index storage means for storing, for each hash value, the bit strings of the reference images in association with that hash value; and
search means for searching for a reference image from the bit strings of the query image using the inverted index storage means.

According to the present invention, there is provided a method of searching, using a device, a large number of reference images for a reference image similar to a query image,
wherein the device comprises:
a hash codebook storage unit that stores, for each hash value, a hash codebook used to quantize binary feature vectors;
a bit number codebook storage unit that stores, for each hash value, a set of bit numbers with high information content (entropy); and
an inverted index storage unit that stores, for each hash value, the bit strings of the reference images in association with that hash value,
and the method comprises:
a first step of extracting a set of binary feature vectors from the query image and the reference images;
a second step of outputting, using the hash codebook storage unit, the hash value most similar to a binary feature vector;
a third step of selecting a partial bit string from the binary feature vector by referring to the bit number set of that hash value in the bit number codebook storage unit; and
a fourth step of searching for a reference image from the bit strings of the query image using the inverted index storage unit.

According to the image search device, program, and method of the present invention, the reference information database can be kept as small as possible, and the amount of computation during image search can also be reduced.

A functional configuration diagram of the image search device according to the present invention.
A functional configuration diagram of the search information distribution server according to the present invention.
A functional configuration diagram of a terminal capable of communicating with the search information distribution server of FIG. 2.
An explanatory diagram showing the image search processing according to the present invention.
An explanatory diagram showing the generation of bit strings according to the present invention.
An explanatory diagram showing the data structure of the inverted index registration unit.
An explanatory diagram showing the processing in the search unit.
A flowchart showing the processing of the search unit according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a functional configuration diagram of the image search device according to the present invention.
FIG. 2 is a functional configuration diagram of the search information distribution server according to the present invention.
FIG. 3 is a functional configuration diagram of a terminal capable of communicating with the search information distribution server of FIG. 2.

The image search device 1 receives a large number of reference images (images to be searched) in advance, and generates and stores a database (codebooks and an index). The invention can accordingly be divided into the following two embodiments.
(1) Image search device (standalone): As shown in FIG. 1, the image search device 1 receives a large number of reference images from a user, and generates and stores a database for those reference images. The image search device 1 then receives a query image (search key image) from the user, uses the database to search for reference images matching the query image, and outputs the search result to the user.

(2) Server-client system: As shown in FIG. 2, the search information distribution server 2 receives a large number of reference images, and generates and stores a database for those reference images.
The terminal 1, acting as a client, downloads the database information from the search information distribution server 2 via a network. The database information is stored in a storage area (memory or disk space) of the terminal 1.
Then, as shown in FIG. 3, the terminal 1 receives a query image from the user, uses the database to search for reference images matching the query image, and outputs the search result to the user.

FIG. 1 differs from FIG. 2 and FIG. 3 in system configuration, but the functional components are exactly the same. The following description refers to FIG. 1 as the representative case.

As shown in FIG. 1, the image search device 1 has a hash codebook storage unit 101, a bit number codebook storage unit 102, an inverted index storage unit 103, a local feature extraction unit 11, a hash unit 12, a bit string generation unit 13, a search unit 14, a hash codebook generation unit 15, and a bit number codebook generation unit 16. These functional components are realized by executing an image search program that causes a computer installed in the device to function. For example, when images are searched on a smartphone, the image search device 1 may be an installable image search application. The processing flow of these functional components can also be understood as an image search method using the device.

FIG. 4 is an explanatory diagram showing the image search processing according to the present invention. The following description refers to FIG. 4 where appropriate.

[Local feature extraction unit 11]
The local feature extraction unit 11 extracts a set of binary feature vectors of local features from the query image and the reference images. According to the present invention, ORB (Oriented FAST and Rotated BRIEF) or FREAK (Fast Retina Keypoint) is used as the binary feature vector extraction algorithm. With ORB, a set of 256-bit binary feature vectors is extracted from one piece of content. BRIEF (Binary Robust Independent Elementary Features), for example, is a binary-code feature description designed for fast matching. The present invention uses ORB, which adds rotation invariance to the BRIEF feature description. In particular, ORB maintains accuracy equal to or better than SIFT and SURF while achieving a speed-up of several hundred times.

ORB consists of two steps: "feature point detection processing" and "feature vector description processing".

(Feature point detection processing)
The ORB feature point detection processing uses FAST to detect keypoints quickly. Because FAST is not robust to scale changes, the image is converted to several sizes and feature points are extracted from the image at each size.
In addition, the original FAST has no algorithm for calculating keypoint orientation, which is needed for rotation invariance. ORB therefore adopts Oriented FAST to obtain rotation invariance. By describing features relative to the orientation, the same keypoint yields the same feature value even if the input image is rotated. For this purpose, the direction vector from the center of the keypoint to the luminance centroid of the patch is used.
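The orientation from the keypoint center toward the luminance centroid can be illustrated with the first-order image moments of the patch. The sketch below is a simplified assumption-laden example rather than ORB's actual implementation: it uses a square patch with odd side length instead of a circular one, takes the patch center as the keypoint, and the name `patch_orientation` is hypothetical.

```python
from math import atan2


def patch_orientation(patch):
    """Angle (radians) from the patch center to its luminance centroid.

    patch: square list of lists of luminance values with odd side length,
           centered on the keypoint. Image coordinates are assumed, so the
           y axis points downward.
    """
    r = len(patch) // 2
    m10 = 0  # first-order moment along x
    m01 = 0  # first-order moment along y
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - r) * value
            m01 += (y - r) * value
    # The centroid lies in direction (m10, m01) from the center.
    return atan2(m01, m10)
```

Rotating the patch rotates the centroid with it, so describing the sampling pattern relative to this angle is what lets the same keypoint produce the same descriptor under rotation.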

(Feature vector description processing)
Next, in the ORB feature vector description processing, a binary feature vector is extracted for each detected feature point by the BRIEF feature vector descriptor. Each bit is obtained from the magnitude relationship between the luminances of two pixels around the feature point.
BRIEF describes keypoint features as binary codes. SIFT and SURF use high-dimensional real-valued vectors for feature description, which increases both memory use and the cost of similarity computation. By using the BRIEF description adopted in ORB, features are described as binary codes, which saves memory, and the Hamming distance is used for similarity computation, which keeps the processing cost down.
BRIEF generates a binary code from the sign of the luminance difference between two points selected at random within the patch. The pixels are selected at random according to a Gaussian distribution centered on the keypoint position. ORB, in contrast, selects the pixels by learning in order to match with higher accuracy. Pixel positions are used for feature description when the bit variance of the pair is large and the correlation among the N pairs is low, since such pairs give binary codes with high descriptive power. The N pairs are narrowed down using a greedy algorithm.
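A BRIEF-style binary test can be sketched in a few lines. This is an illustration only: the sampling pairs are passed in explicitly rather than drawn from a Gaussian or learned as in ORB, the patch is not smoothed or rotated, the convention that a bit is 1 when the first pixel is darker than the second is an assumption, and the name `brief_descriptor` is hypothetical.

```python
def brief_descriptor(patch, pairs):
    """Build a binary code from pairwise luminance comparisons.

    patch: 2-D list of luminance values around the keypoint.
    pairs: list of ((x1, y1), (x2, y2)) pixel coordinate pairs;
           pair number i determines bit i of the result.
    """
    code = 0
    for bit, ((x1, y1), (x2, y2)) in enumerate(pairs):
        # Assumed convention: bit is 1 when the first pixel is darker.
        if patch[y1][x1] < patch[y2][x2]:
            code |= 1 << bit
    return code
```

With 256 pairs this yields a 256-bit vector that fits in 32 bytes, and two such vectors can be compared with an XOR followed by a bit count.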

[Hash codebook storage unit 101]
The hash codebook storage unit 101 stores, for each hash value s, the hash codebook used to quantize binary feature vectors. As shown in FIG. 4, the codebook holds code vectors B_1 to B_S that have the same bit length as the binary feature vector B. For ORB binary feature vectors, this is, for example, 256 bits.

[Hash codebook generation unit 15]
The hash codebook generation unit 15 clusters a large number of binary feature vectors taken from many reference images. The clustering may use the k-medoids method with the Hamming distance, or simply the k-means method. The hash codebook generation unit 15 takes the representative binary feature vector of each cluster as a code vector, associates it with the hash value that serves as its identifier, and stores it in the hash codebook storage unit 101.
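As a rough illustration of k-medoids clustering under the Hamming distance, the sketch below alternates between assigning each vector to its nearest medoid and replacing each medoid with the cluster member that minimizes the total distance to the other members. The initial medoids are supplied by the caller, the iteration cap is arbitrary, and the name `k_medoids` is hypothetical; none of these details come from the text.

```python
def hamming(x, y):
    """Hamming distance between two binary vectors held as ints."""
    return bin(x ^ y).count("1")


def k_medoids(vectors, initial_medoids, iterations=10):
    """Return a list of code vectors; the list index serves as the hash value."""
    medoids = list(initial_medoids)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest medoid.
        clusters = [[] for _ in medoids]
        for v in vectors:
            nearest = min(range(len(medoids)),
                          key=lambda s: hamming(v, medoids[s]))
            clusters[nearest].append(v)
        # Update step: pick the member with the smallest total distance.
        new_medoids = [
            min(c, key=lambda m: sum(hamming(m, v) for v in c)) if c else medoids[s]
            for s, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids
```

Unlike k-means, every medoid is an actual member of the data set, so the code vectors remain valid binary vectors without any rounding step.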

[Hash unit 12]
Using the hash codebook storage unit 101, the hash unit 12 outputs the hash value LID of the code vector B_s that is most similar to the binary feature vector B output from the local feature extraction unit 11. The hash unit 12 returns the same hash value with high probability for binary feature vectors that are similar. Matching can therefore be sped up by comparing only binary feature vectors that share the same hash value.

"Most similar" may mean, for example, having the smallest Hamming distance.
LID = argmin_s ham(B, B_s)
ham(x, y): Hamming distance between x and y
The hash unit 12 outputs the hash value LID_n (= s) of the n-th (n = 1 to N) binary feature vector of the query image to the bit string generation unit 13.
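The argmin over code vectors can be written directly in Python. In this sketch, binary vectors are Python integers, the codebook is a plain list whose 0-based index plays the role of the hash value s, and ties go to the lowest index; these representation choices and the name `hash_value` are assumptions for illustration.

```python
def hamming(x, y):
    """ham(x, y): number of differing bits between x and y."""
    return bin(x ^ y).count("1")


def hash_value(b, codebook):
    """Return LID = argmin_s ham(B, B_s).

    b: binary feature vector as an int.
    codebook: list of code vectors as ints; index s is the hash value.
    """
    return min(range(len(codebook)), key=lambda s: hamming(b, codebook[s]))
```

For instance, with `codebook = [0b0000, 0b1111, 0b1010]`, the vector `0b1010` maps to hash value 2 and `0b0001` maps to hash value 0.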

Because feature vector matching accounts for much of the database search time, speed-up is achieved by matching only against the most similar feature vectors.

[Bit number codebook storage unit 102]
The bit number codebook storage unit 102 stores, for each hash value s, a set of bit numbers with high information content (entropy). Each set contains, for example, 64 bit numbers. As shown in FIG. 4, the bit numbers indicating bit positions in the binary feature vector are registered for each hash value in descending order of information content. It is sufficient that a set of bit numbers with high information content is registered; the bit numbers need not be in descending order of information content.

[Bit string generation unit 13]
The bit string generation unit 13 refers to the bit number set C_s of the hash value s in the bit number codebook storage unit 102 and selects a partial bit string (for example, 64 bits) from the binary feature vector B. This bit string is formed by extracting from the binary feature vector B only the bits at the positions listed in the bit number set C_s and arranging them in order. As a result, a bit string shorter than the input binary feature vector is generated. In other words, only the bits that contribute to discriminating between binary feature vectors are kept, and the remaining bits are discarded. By storing only this bit string in the database, the capacity of the database itself can be reduced, and the amount of computation at the time of image search can also be reduced.
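The bit selection described above can be sketched as follows. This is only an illustration under stated assumptions: the vector is packed into an integer, bit number 0 is taken to be the least significant bit, and the function name `select_bits` and the example values are hypothetical.

```python
from typing import Sequence


def select_bits(b: int, bit_numbers: Sequence[int]) -> int:
    """Pack the bits of b at the listed positions into a shorter bit string.

    Assumption: bit number 0 is the least significant bit of b, and the
    k-th listed bit number becomes bit k of the result.
    """
    rc = 0
    for k, pos in enumerate(bit_numbers):
        rc |= ((b >> pos) & 1) << k
    return rc


b = 0b10110010          # toy 8-bit binary feature vector B
c_s = [1, 4, 7, 0]      # toy bit number set C_s for one hash value
rc = select_bits(b, c_s)  # bits 1, 4, 7 are set and bit 0 is clear -> 0b0111
```

With a 256-bit descriptor and a 64-entry bit number set, the stored bit string is one quarter of the original length.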

From the binary feature vector B, a pair of a hash value LID and a bit string RC is obtained.
(LID, RC)
LID: log₂ S bits (S is the number of codebooks in the hash codebook storage unit)
RC: L·log₂ F bits (F is the number of codebooks in the bit number codebook storage unit)

The bit string generation unit 13 then outputs in the following two directions.
(1) For the binary feature vectors of the query image, a plurality of pairs with the bit strings RC are output to the search unit 14 for each hash value LID_n.
(2) For the binary feature vectors of the reference image, a plurality of pairs with the bit strings RC are output to the inverted index storage unit 103 for each hash value LID_n.
Here, n in LID_n denotes the n-th (n = 1 to N) binary feature vector of the input image.

[Bit number codebook generation unit 16]
The bit number codebook generation unit 16 generates a bit number sequence for each hash value and stores it in the bit number codebook storage unit 102. Because the distribution of bit strings differs from one hash value to another, a codebook of bit number sequences suited to each distribution is created. The "bit number sequence" is an arrangement of the bit numbers of bits that, over the many binary feature vectors included in one cluster of the same hash value, have a high information amount and a small correlation with the other bits.

FIG. 5 is an explanatory diagram showing generation of a bit string in the present invention.

For each hash value, a bit number sequence is generated by the following steps.
(S51) Correlation coefficients between bits are calculated for every pair of bits of the binary feature vector. For example, in the case of ORB, correlation coefficients are calculated for all pairs of the 256 bits, as shown in FIG. 5 (see, for example, Non-Patent Document 4). The correlation coefficient takes a value from -1 to 1. The following example calculates the correlation coefficient between the i-th bit and the j-th bit using eight vectors.
i-th bits of vectors 1 to 8: (0, 0, 0, 0, 1, 1, 1, 1)
j-th bits of vectors 1 to 8: (1, 1, 1, 1, 0, 0, 0, 0)
In this case, the correlation coefficient is calculated as follows.
(0, 0, 0, 0, 1, 1, 1, 1) <-> (1, 1, 1, 1, 0, 0, 0, 0): correlation coefficient = -1
Similarly, in other examples, the correlation coefficient is calculated as follows.
(0, 0, 0, 0, 1, 1, 1, 1) <-> (0, 0, 0, 1, 1, 1, 1, 0): correlation coefficient = 0.5
(0, 0, 0, 0, 1, 1, 1, 1) <-> (0, 0, 1, 1, 1, 1, 0, 0): correlation coefficient = 0
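The three example values above can be reproduced with the ordinary (Pearson) correlation coefficient. The sketch below assumes that this is the coefficient intended; the function name `bit_correlation` and the convention of returning 0 for a constant bit are assumptions made for illustration.

```python
import math
from typing import Sequence


def bit_correlation(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation between two bit positions observed over many vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant bit has no defined correlation; 0 is an assumed convention.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


bit_i = (0, 0, 0, 0, 1, 1, 1, 1)
r_opposite = bit_correlation(bit_i, (1, 1, 1, 1, 0, 0, 0, 0))  # -1
r_shifted = bit_correlation(bit_i, (0, 0, 0, 1, 1, 1, 1, 0))   # 0.5
r_none = bit_correlation(bit_i, (0, 0, 1, 1, 1, 1, 0, 0))      # 0
```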

(S52) For the many binary feature vectors included in one cluster of the same hash value, the average value of each bit number is calculated. The following relationships are assumed from the average value.
The closer the average value is to 0, the higher the probability that both bits are "0".
The closer the average value is to 0.5, the higher the probability that the two bits are inverted relative to each other.
(When one bit is "1", the other bit tends to be "0".)
The closer the average value is to 1, the higher the probability that both bits are "1".

(S53) A "temporary bit number sequence" is generated in which the bits are arranged in order starting from the bit whose average value (of 0/1) is closest to 0.5 (the bit with the highest information amount).

(S54) The bits of the temporary bit number sequence are examined in order from the top. If the absolute value of the correlation coefficient between a bit and every other bit already stored in the bit number sequence is less than or equal to a predetermined threshold, the bit number of that bit is added to the bit number sequence, one bit at a time. This prevents bits that are highly correlated with each other from being selected together. Bits are added until the bit number sequence reaches a predetermined length (for example, 64 bits).
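Steps S51 to S54 can be sketched as a greedy selection. This is an illustrative sketch only: vectors are given as lists of 0/1 values, the Pearson coefficient is assumed for the correlation, a constant bit is treated as having correlation 0, and the names `pearson` and `build_bit_number_sequence` are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation; returns 0.0 for a constant column (assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def build_bit_number_sequence(
    vectors: Sequence[Sequence[int]], length: int, threshold: float
) -> List[int]:
    """Greedily pick informative, weakly correlated bit numbers for one cluster."""
    n_bits = len(vectors[0])
    columns = [[v[i] for v in vectors] for i in range(n_bits)]
    means = [sum(col) / len(col) for col in columns]                     # S52
    temporary = sorted(range(n_bits), key=lambda i: abs(means[i] - 0.5))  # S53
    selected: List[int] = []
    for i in temporary:                                                   # S54
        if all(abs(pearson(columns[i], columns[j])) <= threshold for j in selected):
            selected.append(i)
            if len(selected) == length:
                break
    return selected


# Toy cluster of eight 4-bit vectors: bit 1 duplicates bit 0, bit 3 is constant.
cluster = [
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0],
    [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
]
sequence = build_bit_number_sequence(cluster, length=2, threshold=0.3)
```

In this toy cluster, bit 1 is skipped because it is perfectly correlated with the already selected bit 0, so the sequence becomes bits 0 and 2. Correlations are computed on demand here rather than for all pairs up front as in S51, which gives the same result.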

[Inverted index storage unit 103]
The inverted index storage unit 103 registers, for each hash value LID_n, the pairs (RID, RC_n) of an image identifier RID and a bit string RC as a list.

FIG. 6 is an explanatory diagram showing the data structure of the inverted index storage unit.

A reference image RID is associated with a plurality of pairs of a hash value LID_n and a bit string RC_n.
RID -> (LID_1, RC_1) (LID_2, RC_2) ... (LID_n, RC_n) ... (LID_N, RC_N)
For each pair (LID, RC), the pair (RID, RC) of the image identifier RID and the bit string RC is appended to the list for that hash value LID in the inverted index.
LID_1 -> (RID, RC) (RID, RC) (RID, RC) ...
LID_2 -> (RID, RC) (RID, RC) (RID, RC) ...
...
LID_n -> (RID, RC) (RID, RC) (RID, RC) ...
...
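The list-per-hash-value structure can be sketched with a dictionary of lists. This is a minimal illustration, assuming string image identifiers and integer-packed bit strings; the function name `register` and the example identifiers are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

# Maps a hash value LID to the list of (RID, RC) pairs registered under it.
InvertedIndex = DefaultDict[int, List[Tuple[str, int]]]


def register(index: InvertedIndex, rid: str, codes: Sequence[Tuple[int, int]]) -> None:
    """Append (RID, RC) to the list of each hash value LID of one reference image."""
    for lid, rc in codes:
        index[lid].append((rid, rc))


index: InvertedIndex = defaultdict(list)
register(index, "img_A", [(1, 0b1010), (2, 0b0111)])
register(index, "img_B", [(1, 0b1000)])
```

After these two registrations, the list for hash value 1 holds entries from both images, so a query feature that hashes to 1 is compared only against those two bit strings.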

[Search unit 14]
The search unit 14 uses the inverted index storage unit 103 to search for a reference image from the bit strings of the query image. The input query image consists of a plurality of pairs of a hash value LID_n and a bit string QC_n.
Query code -> (LID_1, QC_1) (LID_2, QC_2) ... (LID_n, QC_n) ... (LID_N, QC_N)

FIG. 7 is an explanatory diagram illustrating processing in the search unit.
FIG. 8 is a flowchart showing the processing of the search unit in the present invention.

The search unit 14 executes the following processing steps for each reference image (S4), and finally outputs the reference image having the highest score value as the search result (S5).
score[] = 0 (S0)
for each i = 1 to N
  get the LID_i-th list of the inverted index (S1)
  for the pairs (RID_1, RC_1) to (RID_M, RC_M) in the list
    calculate the distance D_ij between QC_i of the query image and RC_j, and
    create the pair (D_ij, RID_j) of distance and image identifier (S2)
  sort the pairs by D_ij in ascending order (S3)
  select the top K pairs D_{i'j'}
  for each k = 1 to K
    for the k-th D_{i'j'}, score[RID_{j'}] += S(D_{i'j'}, D)
  end for
end for

(S80) First, as initialization, the variable score[] is set to 0.

The search unit 14 repeats the following processes S1 to S3 for each pair (LID_i, QC_i) of a hash value and a bit string based on the input query image (i = 1 to N).

(S81) For each pair (LID_i, QC_i) of a hash value and a bit string of the query image, the list (RID_j, RC_j) ... of the inverted index corresponding to the hash value LID_i is acquired.

(S82) The following processing is repeated for every entry in the list registered under the hash value LID_i in the inverted index (j = 1 to M).
The distance D_ij between the bit string QC_i of the query image and each bit string RC_j of a reference image in the acquired list is calculated. Then, the pair (D_ij, RID_j) of the distance D_ij and the image identifier RID_j is created.

(S83) Next, the pairs (D_ij, RID_j) of the distance D_ij and the image identifier RID_j are sorted in ascending order of distance.

Then, only a predetermined number (K) of top pairs with the shortest distances are selected. A short distance means a high degree of similarity. The predetermined number (K) may be a fixed number set in advance. Alternatively, K may be the number of distances that are less than or equal to a preset distance threshold TH.

For the top K pairs (D_ij, RID_j), a score value that increases as the distance decreases is calculated, and the score values are cumulatively added. Specifically, using the k-th distance D_{i'j'} and the K-th distance D, the score value is calculated by one of the following.
S(D_{i'j'}, D): voting score value for the image having the k-th distance
(1) The square of the K-th distance minus the square of the k-th distance:
S(D_{i'j'}, D) = D^2 - D_{i'j'}^2
(2) The ratio of the square of the K-th distance to the square of the k-th distance, minus 1:
S(D_{i'j'}, D) = D^2 / D_{i'j'}^2 - 1
(3) The square of the ratio of the K-th distance to the k-th distance, minus 1:
S(D_{i'j'}, D) = (D / D_{i'j'})^2 - 1
(4) The square of the ratio of the K-th distance to the k-th distance minus 1:
S(D_{i'j'}, D) = (D / D_{i'j'} - 1)^2
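The four score formulas can be written out directly, as in the sketch below. The function names are hypothetical, `d_k` stands for the k-th distance D_{i'j'} and `d_top` for the K-th distance D. Variants (2) to (4) divide by the k-th distance, so a distance of 0 would need separate handling that the text does not specify; the sketch leaves that case unhandled.

```python
def score_diff_of_squares(d_k: float, d_top: float) -> float:
    """Variant (1): D^2 - D_k^2."""
    return d_top ** 2 - d_k ** 2


def score_ratio_of_squares(d_k: float, d_top: float) -> float:
    """Variant (2): D^2 / D_k^2 - 1 (undefined for d_k == 0)."""
    return d_top ** 2 / d_k ** 2 - 1


def score_square_of_ratio(d_k: float, d_top: float) -> float:
    """Variant (3): (D / D_k)^2 - 1 (undefined for d_k == 0)."""
    return (d_top / d_k) ** 2 - 1


def score_square_of_ratio_minus_one(d_k: float, d_top: float) -> float:
    """Variant (4): (D / D_k - 1)^2 (undefined for d_k == 0)."""
    return (d_top / d_k - 1) ** 2


# Example: k-th distance 2, K-th distance 4.
s1 = score_diff_of_squares(2, 4)            # 16 - 4 = 12
s2 = score_ratio_of_squares(2, 4)           # 16 / 4 - 1 = 3
s3 = score_square_of_ratio(2, 4)            # 2^2 - 1 = 3
s4 = score_square_of_ratio_minus_one(2, 4)  # (2 - 1)^2 = 1
```

All four variants give 0 to the K-th entry itself and grow as the k-th distance shrinks relative to the K-th; variants (2) and (3) are algebraically identical.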

(S84) The score values are cumulatively added for each reference image. S1 to S3 are then repeated for the next reference image.

(S85) Finally, the reference image with the highest score value is output as the search result.
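Steps S80 to S85 can be sketched end to end as follows. This is an illustration under stated assumptions rather than the patented implementation: the distance is taken to be the Hamming distance between integer-packed bit strings, score variant (1) is used, K is a fixed number, and when a list holds fewer than K entries the last available distance is used as the K-th distance. The names `hamming` and `search` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def hamming(x: int, y: int) -> int:
    """Hamming distance between two integer-packed bit strings."""
    return bin(x ^ y).count("1")


def search(
    query: Sequence[Tuple[int, int]],
    index: Dict[int, List[Tuple[str, int]]],
    top_k: int,
) -> Optional[str]:
    """Vote for reference images and return the identifier with the highest score."""
    score: Dict[str, float] = defaultdict(float)                # S80
    for lid, qc in query:
        entries = index.get(lid, [])                            # S81
        pairs = sorted((hamming(qc, rc), rid) for rid, rc in entries)  # S82, S83
        top = pairs[:top_k]
        if not top:
            continue
        d_top = top[-1][0]  # K-th distance (last one if fewer than K entries)
        for d, rid in top:
            score[rid] += d_top ** 2 - d ** 2                   # S84, variant (1)
    return max(score, key=score.get) if score else None         # S85


index = {
    1: [("img_A", 0b0000), ("img_B", 0b1111), ("img_C", 0b0011)],
    2: [("img_A", 0b1010), ("img_B", 0b0101)],
}
query = [(1, 0b0000), (2, 0b1010)]
best = search(query, index, top_k=2)
```

In this toy example, img_A matches both query bit strings exactly and collects all of the votes, so it is returned as the search result.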

As described in detail above, the image search device, system, program, and method of the present invention make it possible to keep the reference information database as small as possible and also to reduce the amount of computation during image search.

Those skilled in the art can readily make various changes, modifications, and omissions to the above-described embodiments within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Image search device, terminal
101 Hash codebook storage unit
102 Bit number codebook storage unit
103 Inverted index storage unit
11 Local feature extraction unit
12 Hash unit
13 Bit string generation unit
14 Search unit
15 Hash codebook generation unit
16 Bit number codebook generation unit
2 Search information distribution server

Claims

An image search device for searching, from among a large number of reference images, for a reference image similar to a query image, the device comprising:
hash codebook storage means for storing, for each hash value, a hash codebook used for quantization of binary feature vectors;
bit number codebook storage means for storing, for each hash value, a set of bit numbers having a high amount of information (entropy);
local feature extraction means for extracting a set of binary feature vectors from the query image and the reference image;
hash means for outputting, using the hash codebook storage means, the hash value most similar to the binary feature vector;
bit string generation means for selecting a partial bit string from the binary feature vector with reference to the bit number set of the hash value in the bit number codebook storage means;
inverted index storage means for storing, for each hash value, the bit strings of the reference images in association with that hash value; and
search means for searching for a reference image from the bit strings of the query image using the inverted index storage means.

The image search device according to claim 1, further comprising hash codebook generating means for clustering a large number of binary feature vectors from a large number of reference images, and for storing, in the hash codebook storage means, a representative binary feature vector for each cluster together with a hash value serving as its identifier.

The image search device according to claim 2, wherein the hash codebook generating means uses a k-means method or a k-medoids method for clustering.

The image search device according to claim 2, further comprising bit number codebook generating means for generating, for the large number of binary feature vectors included in one cluster of the same hash value, a bit number sequence in which the bit numbers of bits that have a high amount of information and a small correlation with other bits are arranged, and for storing the bit number sequence for each hash value in the bit number codebook storage means.

The image search device according to claim 4, wherein the bit number codebook generating means:
generates, for the large number of binary feature vectors included in one cluster of the same hash value, a temporary bit number sequence in which the bits are arranged in order starting from the bit whose average value (of 0/1) is closest to 0.5 (the bit with the highest amount of information); and
in order from the top of the temporary bit number sequence, stores the bit number of each bit in the bit number sequence, one bit number at a time, if the absolute value of its correlation coefficient with the other bits already stored in the bit number sequence is less than or equal to a predetermined threshold, until the bit number sequence reaches a predetermined number.

The image search device according to any one of claims 1 to 5, wherein the inverted index storage means registers an inverted index in which, for each hash value LIDn of the reference images, a plurality of pairs of a reference image identifier and the bit string are associated in list form.

An image search device wherein the search means:
receives a plurality of pairs of a hash value LIDn and a bit string of the query image;
acquires, for each pair of a hash value and a bit string of the query image, the list of the inverted index corresponding to the hash value;
calculates the dissimilarity between the bit string of the query image and the bit string of each reference image included in the acquired list;
calculates a score value that increases as the dissimilarity decreases;
cumulatively adds the score value for each reference image; and
outputs the reference image having the highest score value as the search result.

The image search device according to claim 7, wherein the search means cumulatively adds, for the reference images, only the score values corresponding to a predetermined number (K) of top entries having the lowest dissimilarity.

The image search device according to claim 7, wherein, in the search means, the predetermined number (K) of top entries is a preset fixed number, or is the number of dissimilarities that are less than or equal to a preset threshold TH relating to dissimilarity.

The image search device according to claim 7, wherein, in the search means, the added score corresponding to the i-th entry among the predetermined number (K) of top entries is set to a larger value the larger the K-th dissimilarity is relative to the i-th dissimilarity.

The image search device according to claim 7, wherein, in the search means, the added score corresponding to the i-th entry among the predetermined number (K) of top entries is one of:
(1) the square of the K-th dissimilarity minus the square of the i-th dissimilarity;
(2) the ratio of the square of the K-th dissimilarity to the square of the i-th dissimilarity, minus 1;
(3) the square of the ratio of the K-th dissimilarity to the i-th dissimilarity, minus 1; or
(4) the square of the value obtained by subtracting 1 from the ratio of the K-th dissimilarity to the i-th dissimilarity.

The image search device according to claim 1, wherein the local feature extraction means extracts binary feature vectors based on ORB (Oriented FAST and Rotated BRIEF), FREAK (Fast Retina Keypoint), or BRISK (Binary Robust Invariant Scalable Keypoints).

A program for causing a computer mounted on a device that searches, from among a large number of reference images, for a reference image similar to a query image, to function as:
hash codebook storage means for storing, for each hash value, a hash codebook used for quantization of binary feature vectors;
bit number codebook storage means for storing, for each hash value, a set of bit numbers having a high amount of information (entropy);
local feature extraction means for extracting a set of binary feature vectors from the query image and the reference image;
hash means for outputting, using the hash codebook storage means, the hash value most similar to the binary feature vector;
bit string generation means for selecting a partial bit string from the binary feature vector with reference to the bit number set of the hash value in the bit number codebook storage means;
inverted index storage means for storing, for each hash value, the bit strings of the reference images in association with that hash value; and
search means for searching for a reference image from the bit strings of the query image using the inverted index storage means.

A method of searching, using a device, for a reference image similar to a query image from among a large number of reference images, wherein the device comprises:
a hash codebook storage unit that stores, for each hash value, a hash codebook used for quantization of binary feature vectors;
a bit number codebook storage unit that stores, for each hash value, a set of bit numbers having a high amount of information (entropy); and
an inverted index storage unit that stores, for each hash value, the bit strings of the reference images in association with that hash value;
the method comprising:
a first step of extracting a set of binary feature vectors from the query image and the reference image;
a second step of outputting, using the hash codebook storage unit, the hash value most similar to the binary feature vector;
a third step of selecting a partial bit string from the binary feature vector with reference to the bit number set of the hash value in the bit number codebook storage unit; and
a fourth step of searching for a reference image from the bit strings of the query image using the inverted index storage unit.