JP2013012076A

JP2013012076A - Retrieval device and program for retrieving high dimensional feature vector with high accuracy

Info

Publication number: JP2013012076A
Application number: JP2011144832A
Authority: JP
Inventors: Yusuke Uchida; 祐介内田; Shigeyuki Sakasawa; 茂之酒澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-06-29
Filing date: 2011-06-29
Publication date: 2013-01-17
Anticipated expiration: 2031-06-29
Also published as: JP5598925B2

Abstract

[Problem] To provide a search device and the like that, when searching high-dimensional feature vectors, take into account the density of feature vectors for each quantization ID and thereby reduce unfairness across IDs in determining similarity.
[Solution] The device includes: reference information storage means that stores, for each reference hash ID, a plurality of combinations of a content ID and a feature vector; hashing means that derives a hash ID for each feature vector; distance calculation means that calculates the distance between the query feature vector and each reference feature vector corresponding to the query hash ID; distance sorting means that sorts the reference content IDs in ascending order of distance; reference content ID extraction means that extracts only the reference content IDs falling within the top threshold percentage (%); and similarity search means that searches for the most similar reference content among the reference contents corresponding to the extracted reference content IDs.
[Selected drawing] FIG. 2

Description

The present invention relates to a technique for accurately retrieving, from a set of reference contents each represented by a set of feature vectors, reference content similar to query content (a search key) that is likewise represented by a set of feature vectors. It is particularly suited to searching multimedia content (for example, images) represented by sets of high-dimensional feature vectors.

In recent years, growing storage capacity has made it possible to accumulate large amounts of content, both online and offline. With the spread of information terminals such as mobile phones and smartphones, digital content such as photographs taken by users themselves can also be stored easily and in large quantities in databases. Offline databases include storage devices such as HDDs (Hard Disk Drives), DVDs (Digital Versatile Disks), and Blu-ray discs. Online databases include social network services such as Flickr (registered trademark) and MySpace (registered trademark). With such storage devices and services, techniques for searching the large and varied personal multimedia content accumulated in a database become important.

One technique for searching multimedia content extracts a large number of feature vectors from each content item and outputs, as search results, the content whose sets of feature vectors are highly similar. In this technique, the feature vectors of the multimedia content are quantized, and a histogram is built from the frequencies of the quantized feature vectors. The similarity (distance) between two contents is then calculated as the L1-norm or L2-norm distance between their histograms. Here a norm measures the distance between two points: the L1 norm is the sum of the absolute values of the per-dimension differences between the two points, and the L2 norm is the square root of the sum of the squares of those differences.
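The histogram comparison described above can be sketched as follows. This is a minimal illustration, not the method of any cited document; the function names and the sample quantization IDs are assumptions introduced here.

```python
import math
from collections import Counter


def histogram(quantized_ids, num_bins):
    """Count how often each quantization ID (0 .. num_bins - 1) occurs."""
    counts = Counter(quantized_ids)
    return [counts.get(i, 0) for i in range(num_bins)]


def l1_distance(h1, h2):
    """Sum of absolute per-bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Square root of the sum of squared per-bin differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


# Hypothetical quantization IDs for the feature vectors of two contents.
query_hist = histogram([0, 0, 2, 3], num_bins=4)      # [2, 0, 1, 1]
reference_hist = histogram([0, 1, 2, 2], num_bins=4)  # [1, 1, 2, 0]

print(l1_distance(query_hist, reference_hist))  # 4
print(l2_distance(query_hist, reference_hist))  # 2.0
```

A smaller distance means the two contents are more similar under this measure.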

There is also a technique that extracts a large number of local feature vectors from image content, vector-quantizes them, and calculates the similarity from the number of local feature vectors quantized to the same representative vector (see, for example, Non-Patent Document 1).

Furthermore, there is a technique that extracts a plurality of local invariant features from an image, builds a histogram of feature vector frequencies, and calculates the similarity between the image and a category from the overlap ratio of the histograms (see, for example, Patent Document 1). With this technique, features that are unnecessary for recognizing the subject's pattern (for example, background features) can be removed on the basis of the histogram. As a result, the features of an object can be extracted without first separating the object from the rest of the image.

With the conventional techniques described above, however, quantization discards information in the original feature vectors. Depending on how coarse the quantization is, the following problems arise.
(False detection) With coarse quantization, different feature vectors are more likely to be quantized to representative vectors with the same identification value (ID, IDentifier). This increases false detections, in which different contents are judged to be similar.
(Missed detection) Conversely, with fine quantization, similar feature vectors are more likely to be quantized to representative vectors with different IDs. This increases missed detections, in which similar contents are not judged to be similar.

To address these problems, there are techniques that improve the trade-off between the two kinds of error, false detection and missed detection (for example, Non-Patent Documents 2 and 3). In these techniques, each feature vector is encoded in advance into a compact code. The distance between feature vectors that received the same ID under vector quantization is then approximated by the distance between their codes. Only pairs of feature vectors whose distance is at or below a predetermined threshold contribute to the similarity.

Patent Document 1: JP 2010-282581 A
Patent Document 2: JP 2009-020769 A

Non-Patent Document 1: J. Sivic et al., "Video Google: A Text Retrieval Approach to Object Matching in Videos," in Proc. ICCV, 2003.
Non-Patent Document 2: H. Jegou, M. Douze, and C. Schmid, "Improving bag-of-features for large scale image search," in IJCV, vol. 87, no. 3, pp. 316-336, 2010.
Non-Patent Document 3: Y. Uchida, M. Agrawal, and S. Sakazawa, "Accurate Content-Based Video Copy Detection with Efficient Feature Indexing," in Proc. of ICMR, 2011.
Non-Patent Document 4: D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
Non-Patent Document 5: H. Jegou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," in IEEE Trans. on PAMI, vol. 33, no. 1, pp. 117-128, 2011.

However, the techniques described in Non-Patent Documents 2 and 3 use the same predetermined threshold for every quantization ID when deciding whether a feature vector contributes to the similarity, so the density of feature vectors within each quantization ID is not considered at all. As a result, for some quantization IDs an excessive number of feature vectors contribute to the similarity, while for others too few do.

An object of the present invention is therefore to provide a search device and program that, when searching high-dimensional feature vectors, take into account the density of feature vectors for each quantization ID and can thereby reduce unfairness across quantization IDs in determining similarity.

According to the present invention, a search device that searches a set of reference contents, each represented by a set of feature vectors, for reference content similar to query content represented by a set of feature vectors comprises:
reference information storage means that stores, for each reference hash identification value, a plurality of combinations of a reference content identification value and a reference feature vector;
hashing means that, for the query content, derives one or more unique query hash identification values for each query feature vector using one or more hash functions;
distance calculation means that uses the reference information storage means to calculate the distance between the query feature vector and each reference feature vector corresponding to the query hash identification value;
distance sorting means that sorts all reference content identification values corresponding to the query hash identification value in ascending order of distance;
reference content identification value extraction means that extracts, from the sorted reference content identification values, only those falling within the top threshold percentage (%); and
similarity search means that searches for the most similar reference content among the plurality of reference contents that include reference feature vectors corresponding to the extracted reference content identification values.

According to another embodiment of the search device of the present invention, it is also preferable that:
the reference information storage means stores, in place of each reference feature vector, a reference code for that reference feature vector;
the hashing means further derives one or more unique reference hash identification values for each of the many reference feature vectors using one or more hash functions;
the device further comprises encoding means that encodes each reference feature vector into a reference code and stores it in the reference information storage means; and
the distance calculation means uses the reference information storage means to calculate the distance between the query feature vector and each reference code corresponding to the query hash identification value.

According to another embodiment of the search device of the present invention, it is also preferable that the similarity search means casts a vote for each of the reference content identification values derived for each of the query feature vectors included in the query content, and extracts reference content identification values according to the number of votes.

According to another embodiment of the search device of the present invention, it is also preferable that:
the reference information storage means stores, for each reference content, the Lp norm of the frequencies of its reference hash identification values; and
the similarity search means calculates, as the similarity, the number of votes obtained by each reference content normalized by the Lp norm of that reference content.
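The voting and Lp-norm normalization can be sketched as follows. This is an illustration under assumptions: the function names, the sample content IDs, and the choice p = 2 are hypothetical and are not specified in the text.

```python
from collections import Counter


def lp_norm(frequencies, p=2):
    """Lp norm of a reference content's hash ID frequency histogram."""
    return sum(abs(f) ** p for f in frequencies) ** (1.0 / p)


def similarity_by_votes(extracted_ids_per_query_vector, hash_id_frequencies, p=2):
    """Vote per reference content ID, then normalize by that content's Lp norm."""
    votes = Counter()
    for content_ids in extracted_ids_per_query_vector:
        votes.update(content_ids)
    return {cid: count / lp_norm(hash_id_frequencies[cid], p)
            for cid, count in votes.items()}


# Hypothetical data: reference content IDs extracted for three query feature vectors.
extracted = [["A", "B"], ["A"], ["A", "B"]]
# Hypothetical per-content hash ID frequencies, stored in advance.
frequencies = {"A": [3, 4], "B": [1, 0]}

scores = similarity_by_votes(extracted, frequencies)
best = max(scores, key=scores.get)
print(scores)  # A: 3 votes / 5.0, B: 2 votes / 1.0
print(best)    # B
```

In this toy case content A collects more raw votes, but B ranks higher after normalization because A has many more feature vectors overall.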

According to another embodiment of the search device of the present invention, it is also preferable that:
the encoding means applies a projection to the reference feature vector and the query feature vector and binarizes the value of each dimension after the projection with a threshold, thereby generating a reference code and a query code as binary strings; and
the distance calculation means calculates the Hamming distance between the reference code and the query code.
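The binary encoding and Hamming distance can be sketched as follows. The projection matrix and thresholds below are hypothetical values chosen for illustration; the text does not specify how they are obtained.

```python
def encode(vector, projection, thresholds):
    """Project the vector, then binarize each projected dimension into one bit."""
    code = 0
    for row, threshold in zip(projection, thresholds):
        value = sum(w * x for w, x in zip(row, vector))
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which the two codes differ."""
    return bin(code_a ^ code_b).count("1")


# Hypothetical 3-bit projection of 2-dimensional vectors.
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thresholds = [0.5, 0.5, 1.5]

reference_code = encode([1.0, 1.0], projection, thresholds)  # 0b111
query_code = encode([1.0, 0.0], projection, thresholds)      # 0b100
print(hamming_distance(reference_code, query_code))          # 2
```

Packing the bits into an integer lets the distance be computed with a single XOR followed by a bit count.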

According to another embodiment of the search device of the present invention, it is also preferable that the encoding means:
stores a representative vector corresponding to each hash identification value; and
calculates a reference difference vector between a reference feature vector and the representative vector corresponding to that reference feature vector's reference hash identification value, and encodes the reference difference vector into a reference code.

According to another embodiment of the search device of the present invention, it is also preferable that the representative vector is calculated as the mean vector or the median vector of the many feature vectors having that hash identification value.

According to another embodiment of the search device of the present invention, it is also preferable that:
the encoding means further has a codebook for product quantization, encodes the reference difference vector into a reference code by product quantization using the codebook, and calculates a query difference vector between the query feature vector and the representative vector corresponding to the query hash identification value; and
the distance calculation means calculates the Lp distance between the reference code and the query difference vector.
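A minimal sketch of this embodiment follows, assuming two sub-vectors, tiny hand-written sub-codebooks, and a squared L2 distance (p = 2 without the square root). All names and values are hypothetical; a real product quantization codebook would be trained on difference vectors.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def difference(vector, representative):
    """Difference vector relative to the cell's representative vector."""
    return [x - y for x, y in zip(vector, representative)]


def split(vector, num_parts):
    size = len(vector) // num_parts
    return [vector[i * size:(i + 1) * size] for i in range(num_parts)]


def pq_encode(diff_vector, codebooks):
    """Reference code: index of the nearest centroid for each sub-vector."""
    parts = split(diff_vector, len(codebooks))
    return [min(range(len(cb)), key=lambda k: squared_l2(part, cb[k]))
            for part, cb in zip(parts, codebooks)]


def asymmetric_distance(query_diff, code, codebooks):
    """Squared L2 distance between an uncoded query difference vector and a code."""
    parts = split(query_diff, len(codebooks))
    return sum(squared_l2(part, cb[k])
               for part, cb, k in zip(parts, codebooks, code))


# Hypothetical representative vector and per-sub-vector codebooks.
representative = [1.0, 1.0, 1.0, 1.0]
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [-1.0, -1.0]],
]

reference_code = pq_encode(difference([2.0, 2.0, 0.0, 0.0], representative), codebooks)
query_diff = difference([1.0, 1.0, 1.0, 1.0], representative)
print(reference_code)                                              # [1, 1]
print(asymmetric_distance(query_diff, reference_code, codebooks))  # 4.0
```

Only the reference side is quantized here; the query difference vector is compared in its original form, which matches the text's distance between a reference code and a query difference vector.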

According to another embodiment of the search device of the present invention, it is also preferable that the hashing means further has a codebook for vector quantization, and that the hash function calculates the reference hash identification value and the query hash identification value by performing vector quantization.

According to another embodiment of the search device of the present invention, it is also preferable that the hashing means calculates the reference hash identification value and the query hash identification value by discretizing, with a predetermined threshold, the value of the inner product between each reference or query feature vector and a predefined vector of the same dimension.

According to another embodiment of the search device of the present invention, it is also preferable that the feature vector is a local feature vector extracted from a local feature region of an image.

According to the present invention, a program that causes a computer to search a set of reference contents, each represented by a set of feature vectors, for reference content similar to query content represented by a set of feature vectors causes the computer to function as:
reference information storage means that stores, for each reference hash identification value, a plurality of combinations of a reference content identification value and a reference feature vector;
hashing means that, for the query content, derives one or more unique query hash identification values for each query feature vector using one or more hash functions;
distance calculation means that uses the reference information storage means to calculate the distance between the query feature vector and each reference feature vector corresponding to the query hash identification value;
distance sorting means that sorts all reference content identification values corresponding to the query hash identification value in ascending order of distance;
reference content identification value extraction means that extracts, from the sorted reference content identification values, only those falling within the top threshold percentage (%); and
similarity search means that searches for the most similar reference content among the plurality of reference contents that include reference feature vectors corresponding to the extracted reference content identification values.

According to the search device and program of the present invention, when searching high-dimensional feature vectors, taking into account the density of feature vectors for each quantization ID makes it possible to reduce unfairness across quantization IDs in determining similarity.

FIG. 1 is an explanatory diagram showing clustering of a set of reference feature vectors.
FIG. 2 is a first functional configuration diagram of the search device according to the present invention.
FIG. 3 is a first explanatory diagram showing a data structure in the present invention.
FIG. 4 is an explanatory diagram showing the functional configuration of the hashing unit.
FIG. 5 is a graph showing voting for reference content IDs.
FIG. 6 is a second functional configuration diagram of the search device according to the present invention.
FIG. 7 is a second explanatory diagram showing a data structure in the present invention.
FIG. 8 is a functional configuration diagram of the encoding unit according to the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

The search device and program of the present invention search a large number of reference contents (the contents to be searched) for the reference content most similar to the query content (the content serving as the search key). For the search, a set of feature vectors is extracted from each multimedia content item, so that one set of feature vectors corresponds to one multimedia content item. Specifically, each reference content consists of a set of reference feature vectors, and the query content likewise consists of a set of query feature vectors. The reference feature vectors and the query feature vectors have the same number of dimensions.

When the multimedia content is an image, its feature vectors are local feature vectors extracted from local feature regions of the image.

FIG. 1 is an explanatory diagram showing clustering of a set of reference feature vectors.

In FIG. 1(a), a large number of reference feature vectors are clustered into, for example, N clusters Fn (n = 1 to N). An algorithm such as k-means or k-means++ is used for the clustering. Each cluster contains many reference feature vectors. When a query feature vector falls within a cluster, the reference feature vectors lying within a predetermined threshold distance of that query feature vector are derived as candidates. The symbols in FIG. 1 are as follows.
×: reference feature vector
+: representative vector of a cluster (mean vector or median vector)
Black circle (●): query feature vector
Wavy-line circle (○): range within the predetermined threshold distance of the query feature vector

For example, suppose that cluster F1 contains query feature vector a and cluster F3 contains query feature vector b. In the conventional technique, the similarity is calculated from the number of reference feature vectors lying within the predetermined threshold distance of the query feature vector.

However, the outcome differs with the cell size of the cluster, as follows.
・When the cluster's cell is small, the feature vectors are densely distributed. Many reference feature vectors then fall within the predetermined threshold range of the query feature vector, and the similarity is judged to be high.
・When the cluster's cell is large, the feature vectors are sparsely distributed. Few reference feature vectors then fall within the predetermined threshold range of the query feature vector, and the similarity is judged to be low.

In contrast, according to the present invention as shown in FIG. 1(b), only the reference feature vectors that fall within the top threshold percentage P (%) of the reference feature vectors in the cluster, ranked by ascending distance from the query feature vector, are extracted. Only for the extracted reference feature vectors, the more of them lie within the predetermined threshold range of the query feature vector, the higher the similarity is judged to be. In other words, in the present invention the reference feature vectors subject to the judgment vary with the cell size of the cluster (the distribution of the reference feature vectors).
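The contrast between a fixed distance threshold and the top-P% extraction can be sketched as follows. The bucket contents, the radius, and the rounding rule (`math.ceil`) are assumptions made for illustration; the text does not state how the percentage is rounded.

```python
import math


def extract_within_radius(query, bucket, radius):
    """Conventional rule: keep every reference vector within a fixed distance."""
    return [cid for cid, vec in bucket if math.dist(query, vec) <= radius]


def extract_top_percent(query, bucket, percent):
    """Sort the bucket by ascending distance and keep only the top `percent` (%)."""
    ranked = sorted(bucket, key=lambda item: math.dist(query, item[1]))
    keep = math.ceil(len(ranked) * percent / 100.0)
    return [cid for cid, _ in ranked[:keep]]


query = [0.0, 0.0]
# Hypothetical buckets of (reference content ID, reference feature vector)
# that share the query's hash ID: one dense cell and one sparse cell.
dense = [("A", [0.0, 0.1]), ("B", [0.2, 0.0]), ("C", [0.2, 0.2]), ("D", [0.3, 0.3])]
sparse = [("E", [1.0, 0.0]), ("F", [0.0, 2.0]), ("G", [3.0, 0.0]), ("H", [0.0, 4.0])]

print(extract_within_radius(query, dense, 0.5))   # ['A', 'B', 'C', 'D']
print(extract_within_radius(query, sparse, 0.5))  # []
print(extract_top_percent(query, dense, 50))      # ['A', 'B']
print(extract_top_percent(query, sparse, 50))     # ['E', 'F']
```

With the fixed radius the dense cell contributes four candidates and the sparse cell none, whereas the percentage rule yields the same number from both cells.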

The following describes two embodiments that differ in what is compared between the query content and the reference content for the similarity search.
[First Embodiment] The query feature vectors of the query content are compared with the reference feature vectors of the reference content to determine the similarity.
[Second Embodiment] The query codes (or query difference vectors) of the query feature vectors of the query content are compared with the reference codes of the reference feature vectors of the reference content to determine the similarity.

[First Embodiment]
FIG. 2 is a first functional configuration diagram of the search device according to the present invention.
FIG. 3 is a first explanatory diagram showing a data structure in the present invention.

As shown in FIG. 2, the search device 1 includes a reference information storage unit 100, a feature vector set extraction unit 111, a hashing unit 112, a distance calculation unit 113, a distance sorting unit 114, a reference content ID extraction unit 115, and a similarity search unit 116. These functional components are realized by executing a program that causes a computer mounted in the device to function.

The search device 1 receives a large number of reference contents in advance and stores information about them in the reference information storage unit 100. At search time, the search device 1 receives query content serving as the search key and uses the reference information storage unit 100 to search for the reference content most similar to the query content.

The feature vector set extraction unit 111 extracts a set of feature vectors from multimedia content. Here, a set of reference feature vectors is extracted from each reference content, and likewise a set of query feature vectors is extracted from the query content (see 111 in FIG. 3). For example, with the SIFT (Scale-Invariant Feature Transform) algorithm, a set of 128-dimensional (D = 128) feature vectors is extracted from a single image (see, for example, Non-Patent Document 4). SIFT is a technique that analyzes characteristic local regions using scale space and describes feature vectors that are invariant to scale change and rotation.
Feature vector f: $f = (f_1, f_2, \ldots, f_D)$
The extracted reference feature vectors are output to the hashing unit 112 and the reference information storage unit 100. The extracted query feature vectors are output to the hashing unit 112 and the distance calculation unit 113.

For the reference content, the hashing unit 112 derives one or more unique reference hash IDs (identification values) for each reference feature vector using one or more hash functions. Likewise, for the query content, it derives one or more unique query hash IDs for each query feature vector using one or more hash functions (see 112 in FIG. 3).

FIG. 4 is an explanatory diagram showing the functional configuration of the hashing unit.

The hashing unit 112 uses one of the following two processes. The same hashing process is applied to both the reference feature vectors and the query feature vectors.

(First hash process) The hashing unit 112 has a codebook for vector quantization and performs vector quantization on each feature vector, thereby calculating the reference hash ID and the query hash ID.

Here, the codebook associates each hash ID n (n = 1 to N) with a representative vector $f_n$ used to quantize a feature vector f.
Hash ID n: representative vector $f_n$
1: $f_1 = (f_{11}, f_{12}, \ldots, f_{1D})$
2: $f_2 = (f_{21}, f_{22}, \ldots, f_{2D})$
3: $f_3 = (f_{31}, f_{32}, \ldots, f_{3D})$
...
N: $f_N = (f_{N1}, f_{N2}, \ldots, f_{ND})$

The codebook is generated by the following steps (see, for example, FIG. 1).
(S1) The set of reference feature vectors is clustered into N clusters.
(S2) A representative vector (mean vector or median vector) is then derived for each cluster.
(S3) A codebook is generated in which a unique hash ID n (n = 1 to N) is assigned to each representative vector.
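Steps S1 to S3 can be sketched with a plain k-means loop. This is a simplified illustration under assumptions: the first N vectors seed the clusters (not the k-means++ seeding mentioned earlier), the iteration count is fixed, the mean vector is used as the representative, and all names are hypothetical.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_codebook(reference_vectors, num_clusters, iterations=10):
    # Assumed initialization: take the first N vectors as initial centers.
    centers = [list(v) for v in reference_vectors[:num_clusters]]
    for _ in range(iterations):
        # S1: assign every reference feature vector to its nearest center.
        clusters = [[] for _ in centers]
        for v in reference_vectors:
            nearest = min(range(len(centers)), key=lambda n: squared_l2(v, centers[n]))
            clusters[nearest].append(v)
        # S2: the representative vector of each cluster is its mean vector
        # (an empty cluster keeps its previous center).
        centers = [mean_vector(c) if c else centers[i] for i, c in enumerate(clusters)]
    # S3: assign a unique hash ID n (1 .. N) to each representative vector.
    return {n + 1: center for n, center in enumerate(centers)}


vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
codebook = build_codebook(vectors, num_clusters=2)
print(codebook)  # {1: [0.0, 1.0], 2: [10.0, 11.0]}
```

The resulting dictionary maps each hash ID to its representative vector, mirroring the codebook table above.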

In the first hash process, the representative vector $f_{n'}$ closest to the input feature vector f is found, and the identification value $\mathrm{ID}_{n'}$ is assigned to f.
$n' = q(f) = \arg\min_{n} \| f - f_n \|^2$
(for an input vector f, the quantization function q(f) returns the n that minimizes $\| f - f_n \|^2$)
$q : \mathbb{R}^D \to \{1, \ldots, N\}$ (the quantization mapping)
$f_n$: representative vector

At this time, instead of assigning only the single ID $n'$ with the smallest distance, the IDs of the several representative vectors with the smallest distances may be assigned to the feature vector. Alternatively, several different codebooks may be held, and an ID may be assigned separately by each codebook. Through these processes, one or more hash IDs are assigned to each of the D-dimensional feature vectors $f_1$ to $f_K$.
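The assignment of one or several hash IDs can be illustrated with the short sketch below. It assumes a codebook held as a dictionary from hash ID to representative vector; the function name `assign_hash_ids`, the parameter `num_ids`, and the sample codebook are hypothetical and chosen only for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_hash_ids(feature, codebook, num_ids=1):
    """Return the hash IDs of the num_ids nearest representative vectors.

    num_ids=1 corresponds to n' = argmin_n ||f - f_n||^2; larger values give
    the multiple-assignment variant.
    """
    ranked = sorted(codebook,
                    key=lambda n: squared_distance(feature, codebook[n]))
    return ranked[:num_ids]

# Hypothetical codebook: hash ID -> representative vector.
codebook = {1: [0.0, 0.0], 2: [5.0, 5.0], 3: [0.0, 4.0]}

single = assign_hash_ids([0.5, 0.5], codebook)               # [1]
multiple = assign_hash_ids([0.5, 2.5], codebook, num_ids=2)  # [3, 1]
```

Holding several codebooks would amount to calling `assign_hash_ids` once per codebook and keeping every returned ID.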

(Second hashing process) The hashing unit 112 calculates the reference hash IDs and the query hash IDs by discretizing, with a predetermined threshold, the inner product of each reference feature vector or query feature vector with a predetermined vector of the same dimension.

The hashing unit 112 may also use a hash function of the kind used in LSH (Locality-Sensitive Hashing), one of the approximate nearest neighbor search algorithms. LSH outputs hash IDs such that similar data are likely to receive the same hash ID and dissimilar data are likely to receive different hash IDs.
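A hash of this inner-product type might look like the sketch below. The description mentions a single predetermined vector and threshold; as an assumption made here for illustration, several predetermined vectors are used, each contributing one bit, and the bits are concatenated into an integer hash ID. The names `projection_hash_id` and `projection_vectors` are hypothetical.

```python
def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def projection_hash_id(feature, projection_vectors, threshold=0.0):
    """Discretize each inner product with a threshold and pack the bits."""
    hash_id = 0
    for p in projection_vectors:
        bit = 1 if inner_product(feature, p) >= threshold else 0
        hash_id = (hash_id << 1) | bit
    return hash_id

# Hypothetical predetermined vectors of the same dimension as the features.
projection_vectors = [[1.0, 0.0], [0.0, 1.0]]

id_a = projection_hash_id([0.9, 0.8], projection_vectors)   # bits 11 -> 3
id_b = projection_hash_id([1.0, 0.7], projection_vectors)   # bits 11 -> 3
id_c = projection_hash_id([-1.0, 0.5], projection_vectors)  # bits 01 -> 1
```

The two nearby vectors share a hash ID while the distant one does not, which is the behavior the LSH remark describes.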

The reference hash ID obtained by hashing a reference feature vector is output to the reference information storage unit 100. The query hash ID obtained by hashing a query feature vector is output to the distance calculation unit 113.

For each reference hash ID, the reference information storage unit 100 stores a plurality of combinations of a reference content ID and a reference feature vector (see 100 in FIG. 3). That is, the reference content IDs and reference feature vectors are held as a list for each reference hash ID.

Returning to FIG. 2, when the reference information storage unit 100 stores reference feature vectors, the distance calculation unit 113 refers to the list of reference feature vectors corresponding to the query hash ID $i_k$ of a query feature vector. It then calculates the distance between each reference feature vector and the query feature vector (see 113 in FIG. 3). The calculated distances are output to the distance sorting unit 114.

The distance sorting unit 114 sorts all reference content IDs corresponding to the query hash ID in ascending order of distance. The sort is performed on pairs of reference content ID and distance.

The reference content ID extraction unit 115 extracts, from the whole of the sorted reference content IDs, only the group of reference content IDs that fall within the top threshold percentage P (%) (see 114 in FIG. 3).
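The flow through the storage unit 100, the distance calculation unit 113, the distance sorting unit 114, and the extraction unit 115 can be sketched as below. This is an illustration under assumptions: the inverted index is modeled as a dictionary from reference hash ID to a list of (reference content ID, reference feature vector) pairs, squared Euclidean distance is used, the sort is done per query hash ID, and the number of kept entries is rounded up. The names `extract_candidates`, `inverted_index`, and `top_percent` are hypothetical.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def extract_candidates(query_vector, query_hash_id, inverted_index, top_percent):
    """Return the reference content IDs in the top `top_percent` by distance."""
    # Look up the list stored under the query hash ID (unit 100).
    entries = inverted_index.get(query_hash_id, [])
    # Distance between each reference vector and the query vector (unit 113).
    scored = [(squared_distance(query_vector, ref_vector), content_id)
              for content_id, ref_vector in entries]
    # Ascending sort of (distance, content ID) pairs (unit 114).
    scored.sort(key=lambda pair: pair[0])
    # Keep only the top P percent (unit 115); rounding up is an assumption.
    keep = math.ceil(len(scored) * top_percent / 100.0)
    return [content_id for _, content_id in scored[:keep]]

# Hypothetical stored data: hash ID -> [(content ID, feature vector), ...].
inverted_index = {
    1: [(5, [0.1, 0.0]), (6, [0.0, 0.3]), (2, [2.0, 2.0]), (3, [3.0, 3.0])],
    2: [(1, [5.0, 5.0])],
}

candidates = extract_candidates([0.0, 0.0], 1, inverted_index, top_percent=50)
# candidates == [5, 6]
```

Because the cut is a percentage rather than a fixed distance, a densely populated hash ID and a sparse one both pass on the same share of their entries.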

The similarity search unit 116 determines the similarity between the query content (a set of query feature vectors) and each reference content (a set of reference feature vectors). It searches for the most similar reference content among the reference contents that contain a reference feature vector corresponding to a reference content ID extracted by the reference content ID extraction unit 115.

Here, the similarity search unit 116 casts a vote for each of the reference content IDs derived for each of the query feature vectors included in the query content, and extracts a reference content ID according to the number of votes.

FIG. 5 is a graph showing the voting for reference content IDs.

First, for the group of reference content IDs extracted by the reference content ID extraction unit 115, the number of occurrences of each reference content ID is counted for each query hash ID of the query feature vectors (see FIG. 5(a)). For example, assume that query hash IDs 1 to 3 are derived from the query feature vectors of one query content. Reference content IDs are extracted for each query hash ID.
Query hash ID -> Reference content IDs
1 -> 5, 6
2 -> 1, 4, 5
3 -> 1, 5
The appearance frequency of each reference content ID is then as follows.
Reference content ID -> Appearance frequency
1 -> 2
2 -> 0
3 -> 0
4 -> 1
5 -> 3 (local maximum of the number of votes)
6 -> 1
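The vote count in this example can be reproduced with a few lines. The sketch uses the numbers given above; the variable names (`extracted`, `votes`, `frequency`) are hypothetical.

```python
from collections import Counter

# Reference content IDs extracted for each query hash ID (from the example).
extracted = {1: [5, 6], 2: [1, 4, 5], 3: [1, 5]}

# Each extracted reference content ID receives one vote.
votes = Counter()
for content_ids in extracted.values():
    votes.update(content_ids)

# Appearance frequency for reference content IDs 1 to 6 (0 if never extracted).
frequency = {content_id: votes[content_id] for content_id in range(1, 7)}
# frequency == {1: 2, 2: 0, 3: 0, 4: 1, 5: 3, 6: 1}
```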

Next, the total count for each reference content ID is normalized by a normalization term. The normalization term is calculated as follows.

When the hash ID takes the values 1 to 4, the reference information storage unit 100 stores, for each reference content, the appearance frequency of each hash ID (see FIG. 5(b)). For example, the appearance frequencies of the reference hash IDs in reference content ID 1 are as follows.
[For reference content ID 1]
Reference hash ID -> Appearance frequency
1 -> 1
2 -> 3
3 -> 0
4 -> 1

Then, two kinds of normalization terms are calculated from the appearance frequencies of the reference hash IDs, as follows. Only reference content ID 1 is described below.
L1 norm for reference content ID 1:
$1 + 3 + 0 + 1 = 5$
L2 norm for reference content ID 1:
$\sqrt{1^2 + 3^2 + 0^2 + 1^2} = \sqrt{1 + 9 + 0 + 1} = \sqrt{11}$

Finally, the similarity of each reference content to the query content is expressed as follows (see FIG. 5(c)).
Reference content ID -> L1-norm similarity, L2-norm similarity
1 -> 2/5, 2/√11
2 -> 0/9, 0/√9
3 -> 0/11, 0/√11
4 -> 1/8, 1/√8
5 -> 3/9, 3/√9
6 -> 1/8, 1/√8

In this way, the similarity search unit 116 calculates, as the similarity, the number of votes obtained by each reference content normalized by the Lp norm of that reference content. The reference content ID with the highest similarity is then output as the search result.
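The normalization can be checked numerically for reference content ID 1, the only reference content whose hash ID frequencies are given in the text. The sketch below is illustrative; the helper `lp_norm` and the variable names are hypothetical.

```python
def lp_norm(frequencies, p):
    """Lp norm of a list of hash ID appearance frequencies."""
    return sum(abs(x) ** p for x in frequencies) ** (1.0 / p)

# Appearance frequencies of hash IDs 1 to 4 in reference content ID 1.
hash_id_frequency = [1, 3, 0, 1]
# Number of votes obtained by reference content ID 1 in the example.
votes = 2

l1 = lp_norm(hash_id_frequency, 1)  # 1 + 3 + 0 + 1 = 5
l2 = lp_norm(hash_id_frequency, 2)  # sqrt(11)

similarity_l1 = votes / l1          # 2/5
similarity_l2 = votes / l2          # 2/sqrt(11)
```

Dividing by the norm keeps a reference content with many stored feature vectors from winning simply because it has more opportunities to collect votes.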

[Second Embodiment]
FIG. 6 is a second functional configuration diagram of the search device according to the present invention.
FIG. 7 is a second explanatory diagram showing the data structure in the present invention.

In the first embodiment described above, the reference information storage unit 100 stores, for each reference hash ID, a plurality of combinations of a reference content ID and a reference feature vector. In the second embodiment, by contrast, the reference information storage unit 100 stores, for each reference hash ID, a plurality of combinations of a reference content ID and a reference code for the corresponding reference feature vector. That is, the second stored information holds reference codes in place of the reference feature vectors of the first stored information.

Compared with FIG. 2, the configuration of FIG. 6 further includes an encoding unit 117 that encodes the reference feature vectors and the query feature vectors into reference codes and query codes (or query difference vectors).

According to FIG. 7, the reference information storage unit 100 stores a reference code for each reference content ID (see 100 in FIG. 7). The distance calculation unit 113 calculates the distance between a reference code and a query code (or query difference vector) (see 113 in FIG. 7). In all other respects, FIG. 7 is the same as FIG. 3 described above.

FIG. 8 is a functional configuration diagram of the encoding unit in the present invention.

The encoding unit 117 executes one of two broad types of encoding process. The same encoding process is applied to both the reference feature vectors and the query feature vectors.

(First encoding process)
The encoding unit 117 receives the reference feature vectors and the query feature vectors output from the feature vector set extraction unit 111. For each feature vector $f_k$, it calculates the difference vector $r_k$ from the representative vector $g_{i_k}$ corresponding to the assigned hash ID $i_k$ ($r_k = f_k - g_{i_k}$). It then applies a projective transformation to the difference vector $r_k$ by random orthogonal projection. Next, the value of each dimension after the transformation is binarized with a threshold, producing binary-string reference codes and query codes. In the case of product quantization, the difference vector $r_k$ is divided into several sub-vectors, each sub-vector is quantized with its own codebook, and the resulting sequence of quantization IDs is used as the code $c_k$. The reference codes are output to the reference information storage unit 100, and the query codes are output to the distance calculation unit 113.

For this encoding process, the distance calculation unit 113 calculates the Hamming distance between the reference code and the query code (Hamming Embedding; see, for example, Non-Patent Document 2).
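The binary encoding and the Hamming distance can be sketched as follows. This is an illustration under assumptions: a fixed 2 x 2 orthogonal matrix stands in for the matrix that random orthogonal projection would produce, the thresholds are set to zero, and the feature vectors and representative vector are made-up values. The names `encode_binary` and `hamming_distance` are hypothetical.

```python
def encode_binary(feature, representative, projection, thresholds):
    """Residual -> orthogonal projection -> per-dimension thresholding."""
    # Difference vector r_k = f_k - g_ik.
    residual = [f - g for f, g in zip(feature, representative)]
    # Project the residual with each row of the projection matrix.
    projected = [sum(row[d] * residual[d] for d in range(len(residual)))
                 for row in projection]
    # Binarize each projected dimension with its threshold.
    return [1 if value > t else 0 for value, t in zip(projected, thresholds)]

def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

# Assumption: a fixed orthogonal matrix (rows are orthonormal) used in place
# of a randomly generated one, so that the sketch is deterministic.
projection = [[0.6, 0.8], [-0.8, 0.6]]
thresholds = [0.0, 0.0]
representative = [1.0, 1.0]  # hypothetical g_ik shared by both vectors

reference_code = encode_binary([2.0, 1.0], representative, projection, thresholds)
query_code = encode_binary([1.0, 2.0], representative, projection, thresholds)
distance = hamming_distance(reference_code, query_code)
# reference_code == [1, 0], query_code == [1, 1], distance == 1
```

Storing the short binary code instead of the full feature vector reduces the memory held per entry, and the distance reduces to counting differing bits.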

(Second encoding process)
The encoding unit 117 stores the representative vector corresponding to each hash ID. The representative vector may be calculated as the mean vector or the median vector of the many feature vectors that have that hash ID. The encoding unit 117 calculates a reference difference vector between a reference feature vector and the representative vector corresponding to the reference hash ID of that reference feature vector, and encodes the reference difference vector into a reference code.

The encoding unit 117 further holds a codebook for product quantization. Using this codebook, it encodes the reference difference vector into a reference code by product quantization (see, for example, Non-Patent Document 3). The encoding unit 117 also calculates a query difference vector between a query feature vector and the representative vector corresponding to its query hash ID. The reference codes are output to the reference information storage unit 100, and the query difference vectors are output to the distance calculation unit 113.

For this encoding process, the distance calculation unit 113 calculates the Lp distance between the reference code and the query difference vector (see, for example, Non-Patent Documents 4 and 5).
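The second encoding process can be sketched as below: the reference difference vector is product-quantized, while the query difference vector is left unquantized and compared against the decoded sub-vectors. This is an illustration under assumptions: p = 2 is chosen for the Lp distance, the sub-codebooks are tiny hand-written tables, and the names `pq_encode` and `asymmetric_distance` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(residual, sub_codebooks):
    """Quantize each sub-vector with its own codebook; return the index list."""
    sub_dim = len(residual) // len(sub_codebooks)
    code = []
    for m, codebook in enumerate(sub_codebooks):
        part = residual[m * sub_dim:(m + 1) * sub_dim]
        code.append(min(range(len(codebook)),
                        key=lambda j: squared_distance(part, codebook[j])))
    return code

def asymmetric_distance(query_residual, code, sub_codebooks):
    """L2 distance between an unquantized query residual and a PQ code."""
    sub_dim = len(query_residual) // len(sub_codebooks)
    total = 0.0
    for m, (codebook, j) in enumerate(zip(sub_codebooks, code)):
        part = query_residual[m * sub_dim:(m + 1) * sub_dim]
        total += squared_distance(part, codebook[j])
    return total ** 0.5

# Hypothetical sub-codebooks: two sub-spaces of dimension 2, two entries each.
sub_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 0.0]],
]

reference_residual = [0.9, 1.1, 1.8, 0.1]   # reference difference vector
reference_code = pq_encode(reference_residual, sub_codebooks)  # [1, 1]

query_residual = [1.0, 1.0, 2.0, 3.0]       # query difference vector
distance = asymmetric_distance(query_residual, reference_code, sub_codebooks)
# distance == 3.0
```

Only the reference side is quantized, which matches the text: the reference code goes to the storage unit 100 and the query difference vector goes to the distance calculation unit 113 as is.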

In both the first and second encoding processes, the mean vector of the reference feature vectors to which the hashing unit 112 assigns each hash ID is calculated in advance from a large number of reference feature vectors.

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments of the present invention described above within the scope of its technical idea and viewpoint. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Search device
100 Reference information storage unit
111 Feature vector set extraction unit
112 Hashing unit
113 Distance calculation unit
114 Distance sorting unit
115 Reference content ID extraction unit
116 Similarity search unit
117 Encoding unit

Claims

A search device for searching for reference content similar to query content represented by a set of feature vectors from a set of reference content represented by a set of feature vectors,
Reference information storage means for storing a plurality of combinations of reference content identification values and reference feature vectors for each reference hash identification value;
Hashing means for deriving one or more unique query hash identification values for each query feature vector using one or more hash functions for the query content;
Distance calculation means for calculating a distance between each reference feature vector corresponding to the query hash identification value and the query feature vector using the reference information storage means;
Distance sorting means for sorting all the reference content identification values corresponding to the query hash identification values in ascending order of the distance;
Reference content identification value extraction means for extracting only reference content identification values included in the upper threshold ratio (%) among the sorted reference content identification values;
A search apparatus comprising: similar search means for searching for the most similar reference content from among a plurality of reference contents including a reference feature vector corresponding to the extracted reference content identification value.

The reference information storage means stores a reference code for the reference feature vector instead of the reference feature vector,
The hashing means further derives one or more unique reference hash identification values using one or more hash functions for each of the plurality of reference feature vectors;
Encoding means for encoding the reference feature vector into a reference code and storing it in the reference information storage means;
The distance calculation unit calculates a distance between each reference code corresponding to the query hash identification value and the query feature vector using the reference information storage unit. Search device.

The similarity search means votes a plurality of reference content identification values derived for each of a plurality of query feature vectors included in the query content for each reference content identification value, and a reference content identification value corresponding to the number of votes The search device according to claim 1, wherein the search device is extracted.

The reference information storage means stores an Lp norm of the frequency of the reference hash identification value for each reference content,
4. The search apparatus according to claim 3, wherein the similarity search means calculates a value obtained by normalizing the number of votes obtained for each reference content by the Lp norm of the corresponding reference content as the similarity.

The encoding means performs a projective transformation on the reference feature vector and the query feature vector, and generates a binary sequence reference code and query code by binarizing the values of each dimension after conversion with a threshold value. And
The search apparatus according to claim 4, wherein the distance calculation unit calculates a Hamming distance between the reference code and the query code.

The encoding means includes
Storing representative vectors corresponding to the hash identification values;
5. A reference difference vector between the reference feature vector and a representative vector corresponding to a reference hash identification value of the reference feature vector is calculated, and the reference difference vector is encoded into a reference code. The search device described in 1.

The search apparatus according to claim 6, wherein the representative vector is calculated from an average vector or a median vector of a number of feature vectors having the hash identification value.

The encoding means includes
A codebook for product quantization,
The reference difference vector is encoded into a reference code by direct product quantization using the codebook,
Calculating a query difference vector between the query feature vector and a representative vector corresponding to the query hash identification value;
The search apparatus according to claim 7, wherein the distance calculating unit calculates an Lp distance between the reference code and a query difference vector.

The hashing unit further includes a codebook for vector quantization, and the hash function calculates a reference hash identification value and a query hash identification value by performing vector quantization. 9. The search device according to 8.

The hashing means calculates a reference hash identification value and a query hash identification value by discretizing values of inner products of the reference feature vector and the query feature vector and a predetermined vector of the same dimension with a predetermined threshold. 9. The search device according to claim 5 or 8, wherein:

The search device according to claim 9 or 10, wherein the vector is a local feature vector extracted from a local feature region of an image.

A program that causes a computer to function so as to search reference content similar to query content represented by a set of feature vectors from a set of reference content represented by a set of feature vectors,
Reference information storage means for storing a plurality of combinations of reference content identification values and reference feature vectors for each reference hash identification value;
Hashing means for deriving one or more unique query hash identification values for each query feature vector using one or more hash functions for the query content;
Distance calculation means for calculating a distance between each reference feature vector corresponding to the query hash identification value and the query feature vector using the reference information storage means;
Distance sorting means for sorting all the reference content identification values corresponding to the query hash identification values in ascending order of the distance;
Reference content identification value extraction means for extracting only reference content identification values included in the upper threshold ratio (%) among the sorted reference content identification values;
A search program characterized by causing a computer to function as a similarity search means for searching for the most similar reference content from among a plurality of reference contents including a reference feature vector corresponding to the extracted reference content identification value.