JP5833499B2

JP5833499B2 - Retrieval device and program for retrieving content expressed by high-dimensional feature vector set with high accuracy

Info

Publication number: JP5833499B2
Application number: JP2012121454A
Authority: JP
Inventors: Yusuke Uchida; Shigeyuki Sakazawa
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-05-29
Filing date: 2012-05-29
Publication date: 2015-12-16
Anticipated expiration: 2032-05-29
Also published as: JP2013246739A

Description

The present invention relates to a technique for accurately retrieving, from a set of reference contents each represented by a set of feature vectors, reference content similar to query content (a search key) that is likewise represented by a set of feature vectors. It is particularly suitable for searching multimedia content (for example, images) represented by sets of high-dimensional feature vectors.

In recent years, both online and offline, growing storage capacity has made it possible to accumulate large amounts of content. In addition, with the spread of information terminals such as mobile phones and smartphones, digital content such as photographs taken by users themselves can easily be stored in large quantities in databases. Offline databases include storage media such as HDDs (Hard Disk Drives), DVDs (Digital Versatile Discs), and Blu-ray Discs; online databases include social network services such as Flickr (registered trademark) and MySpace (registered trademark). With these storage media and services, techniques for searching the large and varied collections of personal multimedia content stored in databases have become important.

To search multimedia content, one technique extracts a large number of feature vectors from each content item and outputs, as search results, content whose set of feature vectors is highly similar to that of the query. In this technique, the feature vectors of the multimedia content are quantized, and a histogram is built from the frequencies of the quantized feature vectors. The similarity (distance) is then calculated as the L1 or L2 distance between the histograms. A norm here measures the distance between two points: the L1 distance is the sum over all dimensions of the absolute differences between the two points, and the L2 distance is the square root of the sum over all dimensions of the squared differences.
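The histogram comparison described above can be sketched as follows. This is a minimal NumPy illustration, assuming a codebook (a set of representative vectors) is already available and that "quantization" means assigning each feature vector to its nearest codeword; the function names are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Assign each feature vector (row) to the index of its nearest codeword."""
    # Squared Euclidean distance between every feature and every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bovw_histogram(features, codebook):
    """Frequency histogram of the quantized feature vectors (one bin per codeword)."""
    codes = quantize(features, codebook)
    return np.bincount(codes, minlength=len(codebook)).astype(float)

def l1_distance(h1, h2):
    """Sum over dimensions of absolute differences."""
    return np.abs(h1 - h2).sum()

def l2_distance(h1, h2):
    """Square root of the sum over dimensions of squared differences."""
    return np.sqrt(((h1 - h2) ** 2).sum())
```

For example, with codebook [[0, 0], [10, 10]], a content whose three vectors are [0, 1], [1, 0], [9, 10] yields the histogram [2, 1]; smaller distances between such histograms mean higher similarity.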

Another technique extracts a large number of local feature vectors from image content, vector-quantizes them, and calculates similarity from the number of local feature vectors quantized to the same representative vector (see, for example, Non-Patent Document 1).

A further technique extracts multiple local invariant features from an image, builds a histogram of feature vector frequencies, and calculates the similarity between the image and a category from the overlap ratio of the histograms (see, for example, Patent Document 1). With this technique, features unnecessary for recognizing the subject (for example, background features) can be removed on the basis of the histogram, so the features of an object can be extracted without first separating the object from the rest of the image.

Conventionally, the framework for similar-image search using local features is called "Bag-of-Visual Words" (also Bag-of-Features or Bag-of-Keypoints) (see, for example, Non-Patent Document 1). It applies the text retrieval method based on the Bag-of-Words model and an inverted index to the retrieval of similar images. Bag-of-Words represents a document as a single feature vector defined by word frequencies and derives the similarity between documents using, as word weights, the IDF (Inverse Document Frequency) computed in advance from the document collection. Bag-of-Visual Words, in turn, quantizes the local features of an image, treats each quantized local feature as a word, likewise represents the image as a single frequency-based feature vector, and applies the same similarity computation with IDF weighting.
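A minimal sketch of the Bag-of-(Visual-)Words scoring just described, assuming visual words are already integer codeword IDs, IDF(w) = ln(N / df(w)), and an unnormalized tf-idf dot product as the similarity (practical systems usually also normalize the vectors). Only the postings lists of words present in the query are visited, which is what the inverted index buys. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: list of visual-word ids}. Returns word -> {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, words in docs.items():
        for w, c in Counter(words).items():
            index[w][doc_id] = c
    return index

def idf_weights(index, n_docs):
    """IDF(w) = ln(N / df(w)), where df(w) is the number of documents containing w."""
    return {w: math.log(n_docs / len(postings)) for w, postings in index.items()}

def tfidf_scores(query_words, index, idf):
    """Accumulate tf-idf dot products, touching only the postings of the query's words."""
    scores = defaultdict(float)
    for w, qc in Counter(query_words).items():
        for doc_id, dc in index.get(w, {}).items():
            scores[doc_id] += (qc * idf[w]) * (dc * idf[w])
    return dict(scores)
```

Documents that share no visual word with the query never appear in the result, which is also why IDF-style scoring assumes that content is not retrieved by words it does not contain.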

Patent Document 1: JP 2010-282581 A
Patent Document 2: JP 2009-020769 A

Non-Patent Document 1: J. Sivic et al., "Video Google: A Text Retrieval Approach to Object Matching in Videos," in Proc. ICCV, 2003.
Non-Patent Document 2: H. Jegou, M. Douze, and C. Schmid, "Improving bag-of-features for large scale image search," IJCV, vol. 87, no. 3, pp. 316-336, 2010.
Non-Patent Document 3: Y. Uchida, M. Agrawal, and S. Sakazawa, "Accurate Content-Based Video Copy Detection with Efficient Feature Indexing," in Proc. ICMR, 2011.
Non-Patent Document 4: D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
Non-Patent Document 5: H. Jegou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Trans. on PAMI, vol. 33, no. 1, pp. 117-128, 2011.
Non-Patent Document 6: O. Boiman, E. Shechtman, and M. Irani, "In defense of nearest-neighbor based image classification," in Proc. CVPR, 2008.

However, the existing Bag-of-Visual Words technique uses the IDF from text retrieval when scoring the similarity between contents based on feature vectors. In text mining, IDF is an index of how characteristic a particular word appearing in a document is. IDF is designed on the premise that each document is retrieved by a small number of words it contains, such as proper nouns; in other words, that a document is not retrieved by words it does not contain. Specifically, a word's characteristic degree in the corpus is calculated from the number of times the word appears in the document and a natural-logarithm term based on the number of documents in the corpus that contain the word (typically IDF = ln(N / df), where N is the number of documents and df is the number of documents containing the word).

In image search, on the other hand, high-dimensional feature vectors must be extracted from locally invariant feature regions. For example, SIFT (Scale-Invariant Feature Transform), a representative method for extracting feature vectors for object recognition, divides a feature region into multiple blocks and extracts the luminance-gradient directions of each block as a weighted histogram.
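The block-and-histogram structure of SIFT can be illustrated with the simplified sketch below. It is not Lowe's full algorithm (Non-Patent Document 4): keypoint detection in scale space, Gaussian weighting, interpolation between bins, and orientation normalization are all omitted. It assumes a 16 × 16 grayscale patch split into 4 × 4 blocks with 8 orientation bins each, which gives the usual 128 dimensions. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, cells=4, bins=8):
    """Simplified SIFT-style descriptor of a square grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                    # luminance gradients (rows, columns)
    mag = np.hypot(gx, gy)                         # gradient magnitude = histogram weight
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)    # gradient direction in [0, 2*pi)
    ch, cw = patch.shape[0] // cells, patch.shape[1] // cells
    hists = []
    for r in range(cells):
        for c in range(cells):
            block = (slice(r * ch, (r + 1) * ch), slice(c * cw, (c + 1) * cw))
            # Direction histogram of this block, weighted by gradient magnitude.
            h, _ = np.histogram(ang[block], bins=bins, range=(0, 2 * np.pi),
                                weights=mag[block])
            hists.append(h)
    v = np.concatenate(hists)                      # cells * cells * bins dimensions
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v             # unit length unless the patch is flat
```

A 16 × 16 patch yields a 128-dimensional unit vector; a perfectly flat patch has no gradients and yields the zero vector.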

Compared with the reference content (content to be searched), however, the query content (search-key content) may contain many feature vectors unrelated to the target object. A typical case is when an image of the object taken with a camera is used as the query. In the reference content, for example, only the target object appears against a white background, whereas the query content shows not only the target object but also various other objects in the background. Various feature vectors unrelated to the target object are therefore detected in the background of the query content, and this degrades search accuracy.

Accordingly, an object of the present invention is to provide a search device and program that, when searching sets of high-dimensional feature vectors, score reference content while taking into account that the feature vectors of the query content include irrelevant feature vectors.

According to the present invention, there is provided a search device that searches a set of reference contents, each represented by a set of feature vectors, for reference content similar to query content likewise represented by a set of feature vectors, the device comprising:
reference information storage means for storing a reference content identifier in association with each feature vector extracted from a plurality of reference contents R_j;
similar vector search means for using the reference information storage means to search, for each feature vector q_i of the query content, for at least one set D of reference content feature vectors similar to q_i; and
voting means for adding a score to each reference content R_j based on the ratio between the probability λ·p(q_i | R_j) that the query feature vector q_i was generated from that searched reference content and the probability (1 - λ)·p(q_i) that it was generated from a background model unrelated to the reference content, λ being a mixing parameter, for performing this for every feature vector q_i of the query content, and for finally outputting, as the search result, the reference contents R_j whose scores rank highest at or above a predetermined threshold.

According to another embodiment of the search device of the present invention, it is also preferable that the voting means calculates the probability ratio based on the ratio n_j / D_s, where n_j is the number of feature vectors of the reference content R_j and D_s is the number of feature vectors of all reference contents, both counted within the set D of searched reference content feature vectors.

According to another embodiment of the search device of the present invention, it is also preferable that the voting means calculates the probability ratio further based on the ratio

$$ \frac{n_j \cdot |R_{all}|}{D_s \cdot |R_j|} $$

where n_j is the number of feature vectors of the reference content R_j included in the set D, |R_all| is the number of feature vectors of all reference contents, D_s is the number of feature vectors of all reference contents included in the set D, and |R_j| is the number of feature vectors of the reference content R_j.
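One way to read this ratio, offered as an interpretation rather than as the patent's own derivation: if both densities are estimated by counting feature vectors that fall in the same neighborhood D of q_i, then n_j / |R_j| approximates p(q_i | R_j) and D_s / |R_all| approximates the background density p(q_i), each up to the same neighborhood volume, which cancels in the ratio:

```latex
\frac{p(q_i \mid R_j)}{p(q_i)}
  \approx \frac{n_j / |R_j|}{D_s / |R_{all}|}
  = \frac{n_j \cdot |R_{all}|}{D_s \cdot |R_j|}
```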

According to another embodiment of the search device of the present invention, it is also preferable that:
the similar vector search means divides the set of searched reference content feature vectors into one or more subsets (clusters) D according to their average similarity to the query feature vector q_i, and orders the subsets D from the top down to the m-th (m ≥ 1, the number of neighbors); and
the voting means calculates a score for each reference content in each subset D_t (1 ≤ t ≤ m) and uses for voting the score of the subset D_t' at which that score is maximized.

According to another embodiment of the search device of the present invention, it is also preferable that the voting means calculates, over the subsets D_1 to D_t from the top down to the t-th, the score s_j of reference content j for the query feature vector q_i by the following equation:

$$ s_j = \max_t \log\left( \frac{\lambda}{1-\lambda} \cdot \frac{n_{tj} \cdot |R_{all}|}{\left( \sum_{s=1}^{t} |D_s(q_i)| \right) \cdot |R_j|} + 1 \right) $$

where
n_tj: number of feature vectors of reference content j included in the subsets D_1 to D_t
Σ_{s=1}^{t} |D_s(q_i)|: number of feature vectors of all reference contents included in the subsets D_1 to D_t
|R_all|: total number of feature vectors over all reference contents
|R_j|: total number of feature vectors of reference content j
λ, 1 - λ: mixing parameters

According to another embodiment of the search device of the present invention, it is also preferable that:
the reference information storage means further stores, for each subset (cluster) D, an average representative vector of its feature vectors; and
the similar vector search means searches for subsets D by comparing each feature vector q_i of the query content with the representative vector of each subset D.

According to another embodiment of the search device of the present invention, it is also preferable that the similar vector search means includes:
first means for searching for the k subsets of reference content feature vectors that are similar to each query feature vector q_i;
second means for counting the number L of reference content feature vectors included in those k nearest subsets; and
third means for searching, among the L reference content feature vectors included in the k nearest subsets, for the top m (m ≤ L) feature vectors most similar to q_i,
wherein the number m of feature vectors in the third means is updated according to the number L counted by the second means, and each D_t(q_i) (1 ≤ t ≤ m) consists of the t-th feature vector only.

According to another embodiment of the search device of the present invention, it is also preferable that the number m of feature vectors in the third means is determined as αL (α ≤ 1) from the number L of feature vectors obtained by the second means.

According to another embodiment of the search device of the present invention, it is also preferable that the number m of feature vectors in the third means is determined as L^α (α ≤ 1) from the number L of feature vectors obtained by the second means.
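A sketch of the three-step search (k nearest subsets, count L, keep the top m), under stated assumptions: reference vectors have already been assigned to codewords (for example by a codebook such as the one produced by the reference information generation unit), distances are squared Euclidean, and m is rounded down and clamped to 1 ≤ m ≤ L. Because each D_t holds a single vector, Σ_{s=1}^{t} |D_s(q_i)| in the score equation reduces to t. All names are hypothetical.

```python
import numpy as np

def adaptive_candidates(q, codebook, ref_vectors, ref_ids, assignments,
                        k=2, alpha=0.5, use_power=False):
    """Return [D_1, ..., D_m], where D_t holds only the id of the t-th nearest vector."""
    # First means: the k codewords (subsets) nearest to q.
    nearest_codes = np.argsort(((codebook - q) ** 2).sum(axis=1))[:k]
    # Second means: the L reference vectors assigned to those k subsets.
    cand = np.flatnonzero(np.isin(assignments, nearest_codes))
    L = len(cand)
    if L == 0:
        return []
    # m depends on L: floor(alpha * L) or floor(L ** alpha), clamped to [1, L].
    m = int(L ** alpha) if use_power else int(alpha * L)
    m = min(max(m, 1), L)
    # Third means: re-rank the L candidates by exact distance and keep the top m.
    d_vec = ((ref_vectors[cand] - q) ** 2).sum(axis=1)
    top = cand[np.argsort(d_vec)[:m]]
    return [[ref_ids[idx]] for idx in top]
```

The output has the list-of-subsets shape expected by a per-vector scoring routine, so the adaptively chosen m directly controls how many ranks t are considered.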

According to another embodiment of the search device of the present invention, it is also preferable that:
the device further comprises feature vector set extraction means for extracting feature vectors from the reference content and the query content;
the feature vector set extraction means can output multiple feature vectors for each of several different types of extraction algorithm; and
the voting means calculates a score s for each type of feature vector for the query content and the reference content, and takes as the final score of each reference content the weighted sum of its scores s over the different types of feature vectors.

According to another embodiment of the search device of the present invention, it is also preferable that:
the query content and the reference content are images; and
each image serving as reference content shows at least one instance (target object) of the same object or of the same category.

According to the present invention, there is also provided a search program that causes a computer installed in a device for searching a set of reference contents, each represented by a set of feature vectors, for reference content similar to query content likewise represented by a set of feature vectors, to function as:
reference information storage means for storing a reference content identifier in association with each feature vector extracted from a plurality of reference contents R_j;
similar vector search means for using the reference information storage means to search, for each feature vector q_i of the query content, for at least one set D of reference content feature vectors similar to q_i; and
voting means for adding a score to each reference content R_j based on the ratio between the probability λ·p(q_i | R_j) that the query feature vector q_i was generated from that searched reference content and the probability (1 - λ)·p(q_i) that it was generated from a background model unrelated to the reference content, λ being a mixing parameter, for performing this for every feature vector q_i of the query content, and for finally outputting, as the search result, the reference contents R_j whose scores rank highest at or above a predetermined threshold.

With the search device and program of the present invention, reference content can be scored, when searching sets of high-dimensional feature vectors, in a way that takes into account that the feature vectors of the query content include irrelevant feature vectors.

FIG. 1 is a functional block diagram of the search device according to the present invention.
FIG. 2 is an explanatory diagram showing the processing of the reference information generation unit.
FIG. 3 is an explanatory diagram of voting from one set D of reference content feature vectors.
FIG. 4 is an explanatory diagram of voting from multiple sets D of reference content feature vectors.
FIG. 5 is an explanatory diagram showing a hierarchical codebook.
FIG. 6 is an explanatory diagram of voting from multiple feature vectors of multiple reference contents.

Embodiments of the present invention are described in detail below with reference to the drawings.

The search device and program of the present invention search a large number of reference contents (content to be searched) for the reference content most similar to the query content (search-key content).

FIG. 1 is a functional block diagram of the search device according to the present invention.

As shown in FIG. 1, the search device 1 includes a reference information storage unit 10, a feature vector set extraction unit 11, a reference information generation unit 12, a similar vector search unit 13, and a voting unit 14. These functional units are implemented by executing a program that causes a computer installed in the device to function as them.

The search device 1 receives a large number of reference contents in advance and stores information about them in the reference information storage unit 10. At search time, the search device 1 receives query content serving as the search key and uses the reference information storage unit 10 to find the reference content most similar to it.

The query content and the reference content are, for example, images. In that case, each image serving as reference content shows at least one instance (target object) of the same object or of the same category.

[Feature vector set extraction unit 11]
The feature vector set extraction unit 11 extracts a set of feature vectors from a single piece of multimedia content. For example, when the content is an image, the feature vectors are local feature vectors extracted from local feature regions of the image.

Specifically, the feature vector set extraction unit 11 extracts a set of feature vectors for each reference content R_j and outputs it to the reference information storage unit 10. It also extracts a set of feature vectors Q (= {q_i}) from the query content and outputs it to the similar vector search unit 13. The feature vectors of the reference content and those of the query content have the same number of dimensions.

Feature extraction algorithms used for object recognition include, for example, SIFT and SURF (Speeded Up Robust Features). With SIFT, a set of 128-dimensional feature vectors is extracted from one image (see, for example, Non-Patent Document 4). SIFT analyzes characteristic local regions using a scale space and describes them by feature vectors that are invariant to scale changes and rotation. SURF, by contrast, is faster than SIFT and extracts a set of 64-dimensional feature vectors from one image.

[Reference information generation unit 12]
The reference information generation unit 12 performs the following processing on the sets of reference content feature vectors R_j and outputs a codebook to the reference information storage unit 10.

FIG. 2 is an explanatory diagram showing the processing performed by the reference information generation unit.

(S21) The set of reference content feature vectors is clustered into k clusters, for example with k-means.
(S22) A representative vector (mean vector or median vector) is derived for each cluster. These representative vectors are also called "Visual Words".
(S23) A codebook is generated in which a unique ID n (= 1 to N) is assigned to each representative vector.

For example, for an input feature vector f, the representative vector f_n closest to it is found:

$$ n^{*} = \arg\min_n \lVert f - f_n \rVert^2 $$

Here, the codebook associates each representative vector f_n with the IDs (identifiers) of the one or more reference contents whose feature vectors belong to that cluster.
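Steps S21 to S23 and the nearest-codeword assignment can be sketched as follows. The sketch assumes plain Lloyd's k-means with mean vectors as representatives (the text also allows medians) and stores, for each codeword ID, the set of reference content IDs whose vectors fell into that cluster; it is an illustration with hypothetical names, not the patent's implementation.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((x[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members):                  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    labels = ((x[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
    return centroids, labels

def build_codebook(ref_features, k, seed=0):
    """ref_features: {content_id: (n_j, dim) array}.
    Returns (representative vectors f_n, {codeword id n: set of content ids})."""
    ids = [cid for cid, f in ref_features.items() for _ in range(len(f))]
    x = np.vstack(list(ref_features.values()))   # same order as ids
    centroids, labels = kmeans(x, k, seed=seed)
    postings = {n: set() for n in range(k)}
    for cid, n in zip(ids, labels):
        postings[int(n)].add(cid)
    return centroids, postings

def nearest_codeword(f, centroids):
    """n* = argmin_n ||f - f_n||^2"""
    return int(((centroids - f) ** 2).sum(axis=1).argmin())
```

Looking up postings[nearest_codeword(f, centroids)] then gives the candidate reference contents for a feature vector f.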

[Reference information storage unit 10]
The reference information storage unit 10 stores a reference content identifier in association with each feature vector extracted from the plurality of reference contents R_j. The reference information storage unit 10 may store the codebook (inverted index) output from the reference information generation unit 12. The codebook partitions the feature vectors extracted from the reference contents R_j into multiple sets (clusters) D according to similarity and assigns to each set the reference content identifiers and the average representative vector of its feature vectors.

The following embodiment describes in detail the case where a set of feature vectors R_j is extracted from each of a plurality of reference contents j. Alternatively, as in Non-Patent Document 6, each set R_j may be a set of feature vectors belonging to a specific category, so that the query content can be classified into categories. In that case, as described later, a score for each category is calculated for the query content, and the query is either classified into the few highest-scoring categories or tagged with every category whose score is at least a certain level.

[Similar vector search unit 13]
Using the reference information storage unit 10, the similar vector search unit 13 searches, for each feature vector q_i of the query content, for at least one set D of reference content feature vectors similar to q_i. The shorter the distance between q_i and a reference content feature vector, the higher their similarity. Specifically, it is preferable to use an approximate nearest neighbor search method such as product quantization (see, for example, Non-Patent Document 5), Hamming Embedding (see, for example, Non-Patent Document 2), or LSH (Locality-Sensitive Hashing). The reference content IDs associated with the one or more sets D found are output to the voting unit 14.
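As an illustration of the kind of approximate nearest neighbor search named above, here is a heavily simplified product quantization sketch with asymmetric distance computation, in the spirit of Non-Patent Document 5 but not its reference implementation: each vector is split into n_sub subvectors, each subspace gets its own small k-means codebook, and query-to-database distances are summed from per-subspace lookup tables. It assumes the dimension is divisible by n_sub and that k_sub does not exceed the number of training vectors; names are hypothetical.

```python
import numpy as np

def train_pq(x, n_sub, k_sub, iters=10, seed=0):
    """Learn one k_sub-word codebook per subspace (x.shape[1] divisible by n_sub)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1] // n_sub
    books = []
    for s in range(n_sub):
        xs = x[:, s * d:(s + 1) * d]
        c = xs[rng.choice(len(xs), k_sub, replace=False)].astype(float)
        for _ in range(iters):                       # Lloyd's k-means in this subspace
            lab = ((xs[:, None] - c[None]) ** 2).sum(-1).argmin(1)
            for j in range(k_sub):
                if np.any(lab == j):
                    c[j] = xs[lab == j].mean(0)
        books.append(c)
    return books

def encode(x, books):
    """Replace each subvector by the index of its nearest sub-codeword."""
    d = books[0].shape[1]
    return np.stack([((x[:, s * d:(s + 1) * d][:, None] - b[None]) ** 2).sum(-1).argmin(1)
                     for s, b in enumerate(books)], axis=1)

def adc_search(q, codes, books, top=5):
    """Asymmetric distance: exact query subvectors against quantized database vectors."""
    d = books[0].shape[1]
    tables = [((b - q[s * d:(s + 1) * d]) ** 2).sum(1) for s, b in enumerate(books)]
    dist = sum(tables[s][codes[:, s]] for s in range(len(books)))
    return np.argsort(dist)[:top]                    # indices of the nearest candidates
```

Only the compact codes are stored per reference feature vector, and each query needs just n_sub small distance tables, which is what makes this family of methods attractive for large reference collections.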

[Voting unit 14]
The voting unit 14 adds a score to each reference content R_j for every feature vector q_i of the query content and finally outputs, as the search result, the reference contents R_j whose scores rank highest at or above a predetermined threshold. Whereas the prior art votes using IDF, the voting unit 14 of the present invention votes using the formulas detailed below.
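The accumulate-then-threshold step of the voting unit can be sketched as below, assuming the per-query-vector scores have already been computed (for example with the score equation of the embodiments) as one {reference id: score} mapping per q_i. The function name is hypothetical.

```python
from collections import defaultdict

def vote(per_vector_scores, threshold):
    """Sum the scores over all query feature vectors, keep reference contents whose
    total is at least `threshold`, and return (id, total) pairs best first."""
    totals = defaultdict(float)
    for scores in per_vector_scores:          # one {j: s_j} mapping per query vector q_i
        for j, s in scores.items():
            totals[j] += s
    kept = [(j, s) for j, s in totals.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, per-vector scores {"a": 1.0, "b": 0.5} and {"a": 0.5, "c": 2.0} with threshold 1.0 yield [("c", 2.0), ("a", 1.5)].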

The present invention derives the reference content j' most likely to have generated the query content. The following expression uses the posterior probability that the query content was generated from the j-th reference content:

$$ j' = \arg\max_j p(R_j \mid Q) $$

Q: set of feature vectors of the query content
R_j: set of feature vectors of the j-th reference content
p(R_j | Q): posterior probability of the reference feature vector set R_j given the query feature vector set Q
argmax_j: the j that maximizes the posterior probability

By Bayes' theorem, this posterior is proportional to the likelihood multiplied by the prior. The evidence p(Q) is the same for every j and can therefore be dropped from the argmax:
$$j' = \arg\max_j p(R_j \mid Q) = \arg\max_j p(Q \mid R_j)\, p(R_j)$$
p(Q | R_j): likelihood that the query feature set Q is generated from the reference feature set R_j
p(R_j): prior probability that the reference feature set R_j is retrieved (a higher p(R_j) means the content is more likely to be retrieved)

Here, retrieval is assumed not to be biased toward any reference content, so p(R_j) is the same for every j. The prior can then be dropped, leaving simply:
$$j' = \arg\max_j p(Q \mid R_j)$$

Next, the feature vectors in the query set Q are assumed to be generated independently. "Independently generated" means that the appearance of one feature never forces a particular feature to appear next, so each outcome is unaffected by earlier ones. The likelihood then becomes the product of the probabilities of the individual feature vectors q_1, q_2, q_3, ..., q_n:
$$j' = \arg\max_j \prod_{i=1}^{n} p(q_i \mid R_j)$$

Furthermore, the product of probabilities can be replaced by a sum of logarithms. Because the logarithm is a monotonically increasing function, the ordering of the candidates is preserved:
$$j' = \arg\max_j \prod_{i=1}^{n} p(q_i \mid R_j) = \arg\max_j \sum_{i=1}^{n} \log p(q_i \mid R_j)$$

Here, each query feature vector is modeled as a linear combination of two probabilities: that it was generated from the feature vector set of the reference content, and that it was generated from a background model unrelated to any reference content:
$$
\begin{aligned}
j' &= \arg\max_j \sum_{i=1}^{n} \log p(q_i \mid R_j) \\
&= \arg\max_j \sum_{i=1}^{n} \log\bigl(\lambda\, p(q_i \mid R_j) + (1-\lambda)\, p(q_i)\bigr) \\
&= \arg\max_j \sum_{i=1}^{n} \Bigl[\log\bigl(\lambda\, p(q_i \mid R_j) + (1-\lambda)\, p(q_i)\bigr) - \log\bigl((1-\lambda)\, p(q_i)\bigr)\Bigr] \\
&= \arg\max_j \sum_{i=1}^{n} \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{p(q_i \mid R_j)}{p(q_i)} + 1\Bigr)
\end{aligned}
$$
i: index of a feature vector of the query content
λ: mixing parameter of the linear combination
p(q_i): probability that q_i is generated from the background model unrelated to the reference content (based on background imagery unrelated to the object in the query content)
λp(q_i|R_j) + (1-λ)·p(q_i): the total probability, i.e., the reference-model probability weighted by λ plus the background-model probability weighted by 1-λ (the second line substitutes this mixture for the likelihood)
-log((1-λ)·p(q_i)): a penalty that does not depend on j. Subtracting it for every candidate leaves the ranking unchanged, and it enables the transformation in the last line.
λ/(1-λ)·p(q_i|R_j)/p(q_i) + 1: obtained from the identity log a - log b = log(a/b)

According to the present invention, the score is based on a probability ratio that uses the mixing parameter λ. The ratio compares the probability λ·p(q_i|R_j) that each query feature vector q_i was generated from a retrieved reference content with the probability (1-λ)·p(q_i) that it was generated from the background model unrelated to that reference content.
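The last step of the derivation can be checked numerically. The Python sketch below uses made-up probabilities for three query vectors and three hypothetical references, not data from this document. It computes both the mixture log-likelihood and the log-ratio score, and checks that the two differ only by the j-independent term Σ_i log((1-λ)p(q_i)), so they rank the references identically.

```python
import math

def mixture_loglik(p_ref, p_bg, lam):
    # sum_i log(lam * p(q_i|R_j) + (1 - lam) * p(q_i))
    return sum(math.log(lam * pr + (1.0 - lam) * pb) for pr, pb in zip(p_ref, p_bg))

def log_ratio_score(p_ref, p_bg, lam):
    # sum_i log(lam / (1 - lam) * p(q_i|R_j) / p(q_i) + 1)
    return sum(math.log(lam / (1.0 - lam) * pr / pb + 1.0) for pr, pb in zip(p_ref, p_bg))

# Hypothetical probabilities for three query vectors q_1..q_3.
P_BG = [0.02, 0.05, 0.01]            # background model p(q_i), shared by every j
P_REF = {                            # p(q_i | R_j) for three hypothetical references
    "A": [0.10, 0.00, 0.03],
    "B": [0.00, 0.20, 0.00],
    "C": [0.04, 0.04, 0.04],
}
LAM = 0.3

mix = {j: mixture_loglik(p, P_BG, LAM) for j, p in P_REF.items()}
ratio = {j: log_ratio_score(p, P_BG, LAM) for j, p in P_REF.items()}

# The scores differ by sum_i log((1 - lam) * p(q_i)), a constant independent of j.
offset = sum(math.log((1.0 - LAM) * pb) for pb in P_BG)
rank_mix = sorted(mix, key=mix.get, reverse=True)
rank_ratio = sorted(ratio, key=ratio.get, reverse=True)
print(rank_mix, rank_ratio)
```

Working with the log-ratio form means a reference whose p(q_i|R_j) is zero contributes exactly log(1) = 0. This is what makes the sparse voting described below possible.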

Now define:
$$s_{ij} = \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{p(q_i \mid R_j)}{p(q_i)} + 1\Bigr)$$
i: index of a feature vector of the query content
j: ID of a reference content
q_i: feature vector of the query content
R_j: reference content
s_ij is the score that reference content j receives when the query feature vector q_i is observed. In other words, it expresses how plausible it is, once q_i has been observed, that q_i was generated from reference content j.

s_ij is then calculated for every query feature vector i and every reference content j. The reference content R_j with the largest total score Σ_{i=1}^{n} s_ij is selected as the search result.

However, s_ij must be calculated for every j for each i, so the amount of computation becomes enormous for a large-scale database.

Therefore, the present invention applies an approximation. For each query feature vector q_i, a set D(q_i) of feature vectors similar to q_i is extracted from the feature vector set of the reference contents. The computation of s_ij is then approximated as follows:
s_ij is computed only for the reference contents R_j that contain a feature vector in D(q_i).
For every other R_j, p(q_i|R_j) = 0 is assumed.
In the latter case s_ij = log(1) = 0, so the score of a reference content with no feature vector in D(q_i) is neither increased nor decreased.

Here, D(q_i) is further assumed to consist of m mutually disjoint subsets:
$$D(q_i) = D_1(q_i) \cup D_2(q_i) \cup \cdots \cup D_m(q_i)$$
In the reference information storage unit 10, a codebook registers many representative vectors derived from the reference contents, and each representative vector is associated with reference content IDs. The sets of reference-content feature vectors associated with the m representative vectors do not overlap one another. That is, they are mutually disjoint.

Furthermore, the following inequality holds for any subsets D_t(q_i) and D_s(q_i) with t < s:
$$p(q_i \mid D_t(q_i)) > p(q_i \mid D_s(q_i))$$
That is, when t < s, q_i is more likely to have been generated from D_t(q_i) than from D_s(q_i). For each of D_1(q_i), ..., D_m(q_i), s_ij is then calculated as follows.

For each D_t(q_i), p(q_i|R_j) and p(q_i) are estimated by k-nearest-neighbor density estimation:
$$p(q_i \mid R_j) = \frac{n_{tj}}{|R_j|\, V_t}, \qquad p(q_i) = \frac{\sum_{s=1}^{t} |D_s(q_i)|}{|R_{\mathrm{all}}|\, V_t}$$
n_tj: number of feature vectors of R_j that appear in D_1(q_i), ..., D_t(q_i)
R_all: set of feature vectors of all reference contents
V_t: volume of the hypersphere (three or more dimensions) centered on q_i whose radius is the distance between q_i and r′_t (in two dimensions this is an area; in one dimension, a length)
r′_t: the feature vector in D_t(q_i) farthest from q_i
Here, p(q_i) is the fraction of the |R_all| reference feature vectors (N in k-NN density estimation) that fall within the hypersphere, namely Σ_{s=1}^{t} |D_s(q_i)| of them (k in k-NN density estimation), divided by the volume V_t.
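The next step relies on V_t cancelling out, which the sketch below checks numerically. It uses hypothetical counts and a 128-dimensional space (an assumption chosen to resemble SIFT descriptors). Both k-NN density estimates are evaluated with the standard d-ball volume π^{d/2} r^d / Γ(d/2 + 1), and their ratio comes out the same for two different radii.

```python
import math

def ball_volume(radius, dim):
    # Volume of a dim-dimensional ball: pi^(d/2) * r^d / Gamma(d/2 + 1)
    return math.pi ** (dim / 2) * radius ** dim / math.gamma(dim / 2 + 1)

def density_ratio(n_tj, size_rj, cum_d, size_all, radius, dim):
    """p(q_i|R_j) / p(q_i) from the two k-NN density estimates."""
    v_t = ball_volume(radius, dim)
    p_q_given_rj = n_tj / (size_rj * v_t)   # n_tj / (|R_j| * V_t)
    p_q = cum_d / (size_all * v_t)          # sum_s |D_s(q_i)| / (|R_all| * V_t)
    return p_q_given_rj / p_q

# Hypothetical: 3 of R_j's 400 vectors among the 8 retrieved, 100,000 vectors in total.
args = dict(n_tj=3, size_rj=400, cum_d=8, size_all=100_000, dim=128)
small = density_ratio(radius=0.5, **args)
large = density_ratio(radius=2.0, **args)
closed_form = 3 * 100_000 / (8 * 400)       # n_tj * |R_all| / (sum_s |D_s| * |R_j|)
print(small, large, closed_form)
```

Because both estimates share the same V_t, the hypersphere volume never has to be computed in practice.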

Computing V_t directly would require a great deal of computation. The present invention therefore substitutes both estimates into the expression for s_ij, so that V_t cancels, and calculates the score as:
$$s_{ij} = \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{n_{tj}\, |R_{\mathrm{all}}|}{\bigl(\sum_{s=1}^{t} |D_s(q_i)|\bigr)\, |R_j|} + 1\Bigr)$$
For each reference image, only the largest of these scores (over t) is added.

That is, the voting unit 14 of the present invention uses the subsets D_1, ..., D_t of feature vectors from the top down to the t-th. With them, it calculates the score s_j of reference content j for the query feature vector q_i as:
$$s_j = \max_{1 \le t \le m} \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{n_{tj}\, |R_{\mathrm{all}}|}{\bigl(\sum_{s=1}^{t} |D_s(q_i)|\bigr)\, |R_j|} + 1\Bigr)$$
n_tj: number of feature vectors of reference content j contained in subsets D_1 through D_t
Σ_{s=1}^{t} |D_s(q_i)|: number of feature vectors of all reference contents contained in subsets D_1 through D_t
|R_all|: total number of feature vectors over all reference contents
|R_j|: total number of feature vectors of reference content j
λ, 1-λ: mixing parameters

In the above expression, the ratio n_tj / Σ_{s=1}^{t} |D_s(q_i)| is the appearance frequency of reference content R_j. It is the number of occurrences of R_j's feature vectors divided by the number of occurrences of feature vectors of all reference contents in the retrieved set D.
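The scoring rule of the voting unit 14 can be written as a small function. This is a minimal sketch that assumes the cumulative counts n_tj and Σ_{s=1}^{t} |D_s(q_i)| have already been collected for t = 1, ..., m. The function names and argument layout are illustrative, not taken from the patent.

```python
import math

def subset_score(n_tj, cum_d, size_rj, size_all, lam):
    """log(lam/(1-lam) * n_tj*|R_all| / (sum_s |D_s(q_i)| * |R_j|) + 1).

    n_tj     -- feature vectors of reference j inside D_1..D_t
    cum_d    -- sum_{s=1..t} |D_s(q_i)|, vectors of all references inside D_1..D_t
    size_rj  -- |R_j|, all feature vectors of reference j
    size_all -- |R_all|, all feature vectors of all references
    lam      -- mixing parameter, 0 < lam < 1
    """
    if n_tj == 0:
        return 0.0  # p(q_i|R_j) is taken as 0, so the score is log(1) = 0
    freq = n_tj / cum_d  # appearance frequency of R_j among the retrieved vectors
    return math.log(lam / (1.0 - lam) * freq * size_all / size_rj + 1.0)

def reference_score(cum_counts, cum_sizes, size_rj, size_all, lam):
    """s_j for one query vector: the largest subset score over t = 1..m."""
    return max(subset_score(n, c, size_rj, size_all, lam)
               for n, c in zip(cum_counts, cum_sizes))

# Reference 1 in the FIG. 4 walk-through: cumulative counts 2, 3, 3 over prefix sizes 3, 6, 8.
s1 = reference_score([2, 3, 3], [3, 6, 8], size_rj=500, size_all=100_000, lam=0.5)
```

In this example the maximum is reached at t = 1, where the appearance frequency 2/3 is highest.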

[Specific Processing in the Similar Vector Search Unit 13 and the Voting Unit 14]
The specific processing performed by the similar vector search unit 13 and the voting unit 14 of the present invention is described in detail below.

During a search, a set Q of feature vectors is extracted from the query content. For each feature vector q_i, vector quantization retrieves one or more sets D of reference-content feature vectors associated with representative vectors, and a vote is cast for each corresponding reference content ID. After votes have been cast for all feature vectors q_i of the query content, the reference content IDs with the highest scores are taken as the search result.
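The overall flow just described can be sketched end to end for the single-set case (m = 1). The sketch makes several assumptions, none of which come from the document:
- a tiny two-dimensional codebook;
- exhaustive nearest-codeword search in place of a real vector quantizer;
- a hypothetical inverted-list layout (codeword index → list of reference IDs);
- λ = 0.5 and a hypothetical score threshold.

```python
import math
from collections import Counter, defaultdict

def nearest_codeword(q, codebook):
    # Index of the representative vector closest to q (squared Euclidean distance)
    return min(range(len(codebook)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(q, codebook[c])))

def search(query_vectors, codebook, inverted_lists, ref_sizes, lam=0.5, threshold=0.0):
    """Vote from the set D(q_i) linked to each query vector's nearest codeword (m = 1)."""
    size_all = sum(ref_sizes.values())               # |R_all|
    scores = defaultdict(float)
    for q in query_vectors:
        ids = inverted_lists[nearest_codeword(q, codebook)]   # unordered reference IDs in D(q_i)
        for j, n_j in Counter(ids).items():
            ratio = n_j * size_all / (len(ids) * ref_sizes[j])
            scores[j] += math.log(lam / (1.0 - lam) * ratio + 1.0)
        # references absent from D(q_i) would add log(1) = 0, so they are skipped
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(j, s) for j, s in ranked if s >= threshold]

# Toy data: 2 codewords; reference 1 has 2 vectors, reference 2 has 2, reference 3 has 1.
CODEBOOK = [(0.0, 0.0), (1.0, 1.0)]
INVERTED = [[1, 1, 2], [2, 3]]
SIZES = {1: 2, 2: 2, 3: 1}
QUERY = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.1)]

result = search(QUERY, CODEBOOK, INVERTED, SIZES, threshold=1.5)
print([j for j, _ in result])
```

With this toy data, reference 2 narrowly outranks reference 1. Reference 1 dominates the first codeword, but reference 2 is matched by all three query vectors. Reference 3 falls below the threshold.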

FIG. 3 is an explanatory diagram of voting from a single set D of reference-content feature vectors.

The similar vector search unit 13 may also retrieve, for each query feature vector q_i, the top m sets D in order of similarity. In FIG. 3, only the top set is retrieved (m = 1, D(q_i) = D_1(q_i)), and eight reference content IDs are registered in it. Note that these eight IDs are not ordered. Among them there are four unique IDs (1, 4, 5, 6).

Reference content ID = 1: n_11 = 3 (3 of the 8 entries have ID = 1)
$$\mathrm{score}_1 \leftarrow \mathrm{score}_1 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{3\,|R_{\mathrm{all}}|}{8\,|R_1|} + 1\Bigr)$$
Reference content ID = 4: n_14 = 2 (2 of the 8 entries have ID = 4)
$$\mathrm{score}_4 \leftarrow \mathrm{score}_4 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{8\,|R_4|} + 1\Bigr)$$
Reference content ID = 5: n_15 = 1 (1 of the 8 entries has ID = 5)
$$\mathrm{score}_5 \leftarrow \mathrm{score}_5 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{8\,|R_5|} + 1\Bigr)$$
Reference content ID = 6: n_16 = 2 (2 of the 8 entries have ID = 6)
$$\mathrm{score}_6 \leftarrow \mathrm{score}_6 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{8\,|R_6|} + 1\Bigr)$$
According to the present invention, the voting unit 14 calculates a score for each reference content in each subset D_t (1 ≤ t ≤ m). For voting, it uses the score from the subset D_t′ in which that score is largest.
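The FIG. 3 arithmetic can be reproduced with a few lines of Python. The eight IDs are listed in an arbitrary order, which does not matter because the set is unordered. The values |R_j| = 500 for every reference, |R_all| = 100,000 and λ = 0.5 are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def single_set_increments(ids_in_d, ref_sizes, size_all, lam):
    """Score increment for each reference ID found in one unordered set D_1(q_i)."""
    total = len(ids_in_d)                       # |D_1(q_i)| = 8 in FIG. 3
    return {
        j: math.log(lam / (1.0 - lam) * n_j * size_all / (total * ref_sizes[j]) + 1.0)
        for j, n_j in Counter(ids_in_d).items()
    }

D1 = [1, 4, 1, 6, 5, 1, 4, 6]                  # three 1s, two 4s, one 5, two 6s
SIZES = {1: 500, 4: 500, 5: 500, 6: 500}       # hypothetical |R_j|
inc = single_set_increments(D1, SIZES, size_all=100_000, lam=0.5)
for j in sorted(inc):
    print(j, round(inc[j], 3))
```

With equal |R_j|, the increment grows with each ID's share of the set. ID 1 gains the most, and IDs 4 and 6 tie.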

FIG. 4 is an explanatory diagram of voting from multiple sets D of reference-content feature vectors.

FIG. 3 votes from the set D associated with the nearest representative vector, whereas FIG. 4 votes from the sets D associated with the k nearest representative vectors. The similar vector search unit 13 may divide the retrieved reference-content feature vectors into one or more subsets D according to their average similarity to the query feature vector q_i. It then orders the subsets D from the top down to the m-th (m ≥ 1) nearest.

In FIG. 4, for one query feature vector q_i, the similar vector search unit uses a k-nearest-neighbor search to retrieve three sets of reference-content feature vectors (m = 3, D(q_i) = D_1(q_i) ∪ D_2(q_i) ∪ D_3(q_i)). Three reference content IDs are registered in D_1(q_i), three in D_2(q_i), and two in D_3(q_i). The subsets are ordered D_1 → D_2 → D_3.

(First subset, t = 1: |D_1(q_i)| = 3, Σ_{s=1}^{1} |D_s(q_i)| = 3)
Reference content ID = 1: n_11 = 2
$$\mathrm{score}_1 \leftarrow \mathrm{score}_1 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{3\,|R_1|} + 1\Bigr)$$
Reference content ID = 4: n_14 = 1
$$\mathrm{score}_4 \leftarrow \mathrm{score}_4 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{3\,|R_4|} + 1\Bigr)$$
(Second subset, t = 2: |D_2(q_i)| = 3, Σ_{s=1}^{2} |D_s(q_i)| = 6)
Reference content ID = 5: n_25 = 1
$$\mathrm{score}_5 \leftarrow \mathrm{score}_5 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{6\,|R_5|} + 1\Bigr)$$
Reference content ID = 4: n_24 = 2 (one vector in D_1 and one in D_2)
$$\mathrm{score}_4 \leftarrow \mathrm{score}_4 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{6\,|R_4|} + 1\Bigr)$$
Not adopted: the ratio 2/6 equals the 1/3 already obtained from D_1, so the score does not increase.
Reference content ID = 1: n_21 = 3 (two vectors in D_1 and one in D_2)
$$\mathrm{score}_1 \leftarrow \mathrm{score}_1 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{3\,|R_{\mathrm{all}}|}{6\,|R_1|} + 1\Bigr)$$
Not adopted: this is smaller than score_1 obtained from D_1 (2/3 > 3/6).
(Third subset, t = 3: |D_3(q_i)| = 2, Σ_{s=1}^{3} |D_s(q_i)| = 8)
Reference content ID = 6: n_36 = 2
$$\mathrm{score}_6 \leftarrow \mathrm{score}_6 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{8\,|R_6|} + 1\Bigr)$$
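The prefix-and-maximum rule of FIG. 4 can be sketched as follows. The ID lists for D_1, D_2 and D_3 follow the walk-through above. Equal |R_j| = 500, |R_all| = 100,000 and λ = 0.5 are hypothetical. As in the walk-through, a tie keeps the earlier subset.

```python
import math
from collections import Counter

def best_over_prefixes(ordered_subsets, ref_sizes, size_all, lam):
    """For one q_i: score each reference on every prefix D_1..D_t, keep the maximum.

    Returns {reference_id: (adopted_t, score)}.
    """
    cum_counts = Counter()   # n_tj for the current prefix
    cum_size = 0             # sum_{s=1..t} |D_s(q_i)|
    best = {}
    for t, subset in enumerate(ordered_subsets, start=1):
        cum_counts.update(subset)
        cum_size += len(subset)
        # A reference absent from D_t keeps its count while cum_size grows, so it cannot improve.
        for j in set(subset):
            ratio = cum_counts[j] * size_all / (cum_size * ref_sizes[j])
            s = math.log(lam / (1.0 - lam) * ratio + 1.0)
            if j not in best or s > best[j][1]:   # a tie keeps the earlier subset
                best[j] = (t, s)
    return best

SUBSETS = [[1, 1, 4], [5, 4, 1], [6, 6]]           # D_1, D_2, D_3 of FIG. 4
SIZES = {1: 500, 4: 500, 5: 500, 6: 500}           # hypothetical |R_j|
best = best_over_prefixes(SUBSETS, SIZES, size_all=100_000, lam=0.5)
print({j: t for j, (t, _) in sorted(best.items())})  # prints {1: 1, 4: 1, 5: 2, 6: 3}
```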

FIG. 5 is an explanatory diagram showing a hierarchical codebook.

In FIG. 5, unlike FIGS. 3 and 4, the codebook is organized hierarchically (see, for example, Non-Patent Document 7). Even in this case, scores can be voted for each set of reference-content feature vectors in the same way as in FIGS. 3 and 4.

FIG. 6 is an explanatory diagram of voting from individual feature vectors of multiple reference contents.

In FIG. 6, unlike FIGS. 3 and 4, the feature vectors of the reference contents are not grouped into sets D but are handled individually. For each query feature vector, the neighboring reference-content feature vectors are retrieved simply by an m-nearest-neighbor search. Suitable methods include one based on product quantization, one based on Hamming Embedding (see, for example, Non-Patent Document 2), or an algorithm such as LSH. Each subset D_t(q_i) therefore contains exactly one feature vector, the t-th nearest neighbor.

(First neighbor, t = 1: |D_1(q_i)| = 1, Σ_{s=1}^{1} |D_s(q_i)| = 1)
Reference content ID = 1: n_11 = 1
$$\mathrm{score}_1 \leftarrow \mathrm{score}_1 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{1\,|R_1|} + 1\Bigr)$$
(Second neighbor, t = 2: |D_2(q_i)| = 1, Σ_{s=1}^{2} |D_s(q_i)| = 2)
Reference content ID = 4: n_24 = 1
$$\mathrm{score}_4 \leftarrow \mathrm{score}_4 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{2\,|R_4|} + 1\Bigr)$$
(Third neighbor, t = 3: |D_3(q_i)| = 1, Σ_{s=1}^{3} |D_s(q_i)| = 3)
Reference content ID = 1: n_31 = 2
$$\mathrm{score}_1 \leftarrow \mathrm{score}_1 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{3\,|R_1|} + 1\Bigr)$$
Not adopted: this is smaller than score_1 obtained from D_1 (1/1 > 2/3).
(Fourth neighbor, t = 4: |D_4(q_i)| = 1, Σ_{s=1}^{4} |D_s(q_i)| = 4)
Reference content ID = 5: n_45 = 1
$$\mathrm{score}_5 \leftarrow \mathrm{score}_5 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{4\,|R_5|} + 1\Bigr)$$
(Fifth neighbor, t = 5: |D_5(q_i)| = 1, Σ_{s=1}^{5} |D_s(q_i)| = 5)
Reference content ID = 4: n_54 = 2
$$\mathrm{score}_4 \leftarrow \mathrm{score}_4 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{5\,|R_4|} + 1\Bigr)$$
Not adopted: this is smaller than score_4 obtained from D_2 (1/2 > 2/5).
(Sixth neighbor, t = 6: |D_6(q_i)| = 1, Σ_{s=1}^{6} |D_s(q_i)| = 6)
Reference content ID = 1: n_61 = 3
$$\mathrm{score}_1 \leftarrow \mathrm{score}_1 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{3\,|R_{\mathrm{all}}|}{6\,|R_1|} + 1\Bigr)$$
Not adopted: this is smaller than score_1 obtained from D_1 (1/1 > 3/6).
(Seventh neighbor, t = 7: |D_7(q_i)| = 1, Σ_{s=1}^{7} |D_s(q_i)| = 7)
Reference content ID = 6: n_76 = 1
$$\mathrm{score}_6 \leftarrow \mathrm{score}_6 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{1\,|R_{\mathrm{all}}|}{7\,|R_6|} + 1\Bigr)$$
(Eighth neighbor, t = 8: |D_8(q_i)| = 1, Σ_{s=1}^{8} |D_s(q_i)| = 8)
Reference content ID = 6: n_86 = 2
$$\mathrm{score}_6 \leftarrow \mathrm{score}_6 + \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{2\,|R_{\mathrm{all}}|}{8\,|R_6|} + 1\Bigr)$$
Adopted: this is larger than score_6 obtained from D_7 (1/7 < 2/8), so it replaces the score from D_7.
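In the FIG. 6 case every D_t(q_i) holds a single vector, so Σ_{s=1}^{t} |D_s(q_i)| is simply t. The sketch below applies the same keep-the-maximum rule to the ordered neighbor IDs of the walk-through (1, 4, 1, 5, 4, 1, 6, 6). Equal |R_j|, |R_all| and λ are again hypothetical.

```python
import math

def best_over_neighbors(neighbor_ids, ref_sizes, size_all, lam):
    """FIG. 6 case: D_t(q_i) is only the t-th nearest neighbor, so sum_s |D_s(q_i)| = t."""
    counts = {}   # n_tj: occurrences of reference j among the first t neighbors
    best = {}     # reference_id -> (adopted_t, score)
    for t, j in enumerate(neighbor_ids, start=1):
        counts[j] = counts.get(j, 0) + 1
        ratio = counts[j] * size_all / (t * ref_sizes[j])
        s = math.log(lam / (1.0 - lam) * ratio + 1.0)
        if j not in best or s > best[j][1]:
            best[j] = (t, s)
    return best

NEIGHBORS = [1, 4, 1, 5, 4, 1, 6, 6]               # IDs of the 8 nearest neighbors in FIG. 6
SIZES = {1: 500, 4: 500, 5: 500, 6: 500}           # hypothetical |R_j|
best = best_over_neighbors(NEIGHBORS, SIZES, size_all=100_000, lam=0.5)
print({j: t for j, (t, _) in sorted(best.items())})  # prints {1: 1, 4: 2, 5: 4, 6: 8}
```

Reference 6 shows why the maximum is taken over t rather than fixed at the first hit. Its second occurrence raises its appearance frequency from 1/7 to 2/8.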

[Method for Determining the Number m of Neighboring Feature Vectors in the Similar Vector Search Unit 13]
(When a two-stage m-nearest-neighbor search is used, as with product quantization, Hamming Embedding, or LSH)
The similar vector search unit 13 may also update (vary) the number m of neighboring feature vectors instead of fixing it. In this case, the similar vector search unit 13 performs the following steps (see, for example, Non-Patent Documents 2 and 5).

(S1) Search for k subsets of reference-content feature vectors similar to each query feature vector q_i. That is, S1 narrows down the candidates by obtaining a coarse neighborhood set. The coarse neighborhood set may be derived, for example, by vector quantization (see, for example, Non-Patent Documents 2 and 5).

(S2) Next, count the number L of reference-content feature vectors contained in the k neighboring subsets. L reflects the density of feature vectors around the query feature vector. The distance calculation in the subsequent refinement (S3) may use, for example, binary codes of the feature vectors (see, for example, Non-Patent Document 2) or codes obtained by product quantization of the feature vectors (see, for example, Non-Patent Document 3).

(S3) Next, among the L reference-content feature vectors contained in the k neighboring subsets, search for the top m (m ≤ L) feature vectors most similar to q_i. That is, the (approximate) distances between q_i and the reference-content feature vectors are computed to derive a stricter neighborhood set. The number m of neighboring feature vectors used in S3 is updated according to the number L counted in S2. For example, if L is at or above a predetermined threshold, m can be increased. Note that the t-th set D_t(q_i) (1 ≤ t ≤ m) consists only of the t-th feature vector.

Alternatively, the number m of neighboring feature vectors in S3 may be determined from the number L of feature vectors obtained in S2, either as αL (α ≤ 1) or as L^α (α ≤ 1).
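The rules for choosing m can be captured in small helpers. The text specifies only m = αL, m = L^α (α ≤ 1), or increasing m when L reaches a threshold. The function names, the floor rounding, the requirement α > 0, and the clipping of m to 1..L are assumptions added for illustration.

```python
def choose_m(L, alpha=0.5, rule="linear"):
    """Hypothetical helper: m = floor(alpha * L) or floor(L ** alpha), kept in 1..L."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    if L <= 0:
        return 0                       # no candidates were found in S1
    if rule == "linear":
        m = int(alpha * L)
    elif rule == "power":
        m = int(L ** alpha)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(1, min(m, L))           # at least one neighbor, never more than L (m <= L)

def choose_m_by_threshold(L, threshold, m_small, m_large):
    """Hypothetical helper: use a larger m when the neighborhood is dense (L >= threshold)."""
    m = m_large if L >= threshold else m_small
    return min(m, L)

print(choose_m(100), choose_m(100, rule="power"), choose_m_by_threshold(500, 200, 10, 50))
```

The power rule grows much more slowly than the linear rule. With α = 0.5, L = 100 gives m = 10 instead of 50, which keeps the S3 refinement cheap in very dense regions.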

In another embodiment, the feature vector set extraction unit 11 may output multiple sets of feature vectors, one for each of several different types of algorithm, for example both SIFT and SURF. In this case, the voting unit 14 calculates a score s for each type of feature vector from the query content and the reference contents. It uses the weighted sum of these scores s over the feature types as the final score for each reference content.
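Here is a minimal sketch of the weighted combination. It assumes each feature type (for example SIFT and SURF) has already produced a dictionary of per-reference scores. The weights and scores shown are hypothetical, and a reference missing from one feature type simply contributes 0 for that type.

```python
def combine_scores(scores_by_type, weights):
    """Final score per reference: sum over feature types of weight[type] * score."""
    final = {}
    for ftype, scores in scores_by_type.items():
        w = weights[ftype]
        for ref_id, s in scores.items():
            final[ref_id] = final.get(ref_id, 0.0) + w * s
    return final

# Hypothetical per-type scores (e.g. from separate SIFT and SURF voting runs) and weights.
SCORES = {
    "SIFT": {1: 2.0, 2: 0.5},
    "SURF": {1: 0.25, 3: 1.0},
}
WEIGHTS = {"SIFT": 0.5, "SURF": 0.5}
final = combine_scores(SCORES, WEIGHTS)
print(final)   # prints {1: 1.125, 2: 0.25, 3: 0.5}
```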

As described in detail above, the search device and program of the present invention score reference contents while taking into account that the query's feature vectors include irrelevant ones. This applies when searching sets of high-dimensional feature vectors.

Those skilled in the art can easily make various changes, modifications, and omissions to the embodiments described above within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only by the claims and their equivalents.

Description of Symbols
1 Search device
10 Reference information storage unit
11 Feature vector set extraction unit
12 Reference information generation unit
13 Similar vector search unit
14 Voting unit

Claims

A search device for searching a set of reference contents, each represented by a set of feature vectors, for reference content similar to query content represented by a set of feature vectors, the search device comprising:
reference information storage means for storing a reference content identifier in association with each feature vector extracted from a plurality of reference contents R_j;
similar vector search means for searching, using the reference information storage means, for at least one set D of reference-content feature vectors similar to each feature vector q_i of the query content; and
voting means for adding, for every feature vector q_i of the query content, a score to each reference content R_j based on a probability ratio, and for finally outputting, as a search result, the reference contents R_j with the highest scores at or above a predetermined threshold, the probability ratio being taken, with a mixing parameter λ, between the probability λ·p(q_i|R_j) that q_i is generated from each retrieved reference content and the probability (1-λ)·p(q_i) that q_i is generated from a background model unrelated to the reference content.

The search device according to claim 1, wherein the voting means calculates the probability ratio based on the ratio (n_j / D_s) of the number n_j of appearances of feature vectors of the reference content R_j to the number D_s of appearances of feature vectors of all reference contents included in the set D of retrieved reference-content feature vectors.

The search device according to claim 2, wherein the voting means further calculates the probability ratio based on the following ratio:
$$\frac{n_j \cdot |R_{\mathrm{all}}|}{D_s \cdot |R_j|}$$
n_j: number of feature vectors of the reference content R_j included in the set D
|R_all|: number of feature vectors of all reference contents
D_s: number of feature vectors of all reference contents included in the set D
|R_j|: number of feature vectors of the reference content R_j

The search device according to claim 2 or 3, wherein:
the similar vector search means divides the set of retrieved reference-content feature vectors into one or more subsets (clusters) D according to their average similarity to the query feature vector q_i, and orders the subsets D from the top down to the m-th (m ≥ 1) nearest; and
the voting means calculates a score for each reference content in each subset D_t (1 ≤ t ≤ m) and uses, for voting, the score from the subset D_t′ in which that score is largest.

The search device according to claim 4, wherein the voting means uses the subsets D_1 through D_t of feature vectors, from the top down to the t-th, to calculate the score s_j of reference content j for the query feature vector q_i by the following formula:
$$s_j = \max_{1 \le t \le m} \log\Bigl(\frac{\lambda}{1-\lambda}\cdot\frac{n_{tj}\, |R_{\mathrm{all}}|}{\bigl(\sum_{s=1}^{t} |D_s(q_i)|\bigr)\, |R_j|} + 1\Bigr)$$
n_tj: number of feature vectors of reference content j included in subsets D_1 through D_t
Σ_{s=1}^{t} |D_s(q_i)|: number of feature vectors of all reference contents included in subsets D_1 through D_t
|R_all|: total number of feature vectors in all reference contents
|R_j|: total number of feature vectors in reference content j
λ, 1-λ: mixing parameters

The search device according to claim 4 or 5, wherein:
the reference information storage means further stores, in association with each subset (cluster) D, a representative vector that is the average of its feature vectors; and
the similar vector search means retrieves the subsets D by comparing each query feature vector q_i with the representative vector of each subset D.

The search device according to claim 4, wherein the similar vector search means comprises:
first means for searching for k subsets of reference-content feature vectors similar to each query feature vector q_i;
second means for counting the number L of reference-content feature vectors included in the k neighboring subsets; and
third means for searching, among the L reference-content feature vectors included in the k neighboring subsets, for the top m (m ≤ L) feature vectors similar to q_i;
wherein the number m of feature vectors in the third means is updated according to the number L counted by the second means, and each D_t(q_i) (1 ≤ t ≤ m) consists only of the t-th feature vector.

The search device according to claim 7, wherein the number m of feature vectors in the third means is determined as αL (α ≤ 1) from the number L of feature vectors obtained by the second means.

The search device according to claim 7, wherein the number $m$ of feature vectors in the third means is determined as $L^{\alpha}$ ($\alpha \le 1$) from the number $L$ of feature vectors obtained by the second means.
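The two alternative rules for $m$ in claims 8 and 9 differ mainly in how fast $m$ grows with $L$. The sketch below is hypothetical: rounding down to an integer, forcing $m \ge 1$ whenever $L > 0$, and capping at $L$ are assumptions added so that the result is a usable list length; the claims themselves only state $\alpha L$ and $L^{\alpha}$.

```python
import math

def m_linear(L, alpha):
    """m = alpha * L (alpha <= 1); floor, at least 1, at most L (assumed)."""
    return min(L, max(1, math.floor(alpha * L)))

def m_power(L, alpha):
    """m = L ** alpha (alpha <= 1); same rounding convention (assumed)."""
    return min(L, max(1, math.floor(L ** alpha)))
```

For $L = 1000$ and $\alpha = 0.5$, the linear rule gives 500 while the power rule gives 31. The power rule therefore grows sublinearly and keeps the number of re-ranked candidates small when the $k$ nearest subsets happen to be large.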

The device further comprises feature vector set extraction means for extracting feature vectors from the reference content and the query content;
the feature vector set extraction means can output a plurality of feature vectors for each of different types of algorithms;
The search device according to any one of claims 1 to 9, wherein the voting means calculates a score $s$ for each type of feature vector for each of the query content and the reference content, and uses, as the final score of each reference content, the weighted sum of the scores $s$ of the different types of feature vectors.
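The weighted fusion of claim 10 reduces to a weighted sum per reference content. In this minimal sketch the feature-type names (`"sift"`, `"color"`) and the weights are hypothetical, and a reference content that receives no score for a given feature type is assumed to contribute 0 for that type.

```python
def fuse_scores(scores_by_type, weights):
    """Final score per reference content = weighted sum over feature types.

    scores_by_type: {feature_type: {reference_id: score s}} (assumed layout).
    weights: {feature_type: weight}; missing references contribute 0.
    """
    final = {}
    for ftype, scores in scores_by_type.items():
        w = weights[ftype]
        for ref_id, s in scores.items():
            final[ref_id] = final.get(ref_id, 0.0) + w * s
    return final
```

For instance, with weights 0.75 for `"sift"` and 0.25 for `"color"`, a reference scoring 2.0 and 0.5 under the two types ends with a final score of 1.625.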

The query content and the reference content are images;
The search device according to any one of claims 1 to 10, wherein the images serving as the reference content include at least one instance (target object) belonging to the same object or the same category.

A search program for causing a computer mounted on a device that searches a set of reference contents, each represented by a set of feature vectors, for reference content similar to query content represented by a set of feature vectors, to function as:
reference information storage means for storing a reference content identifier in association with each feature vector extracted from a plurality of reference contents $R_j$;
similar vector search means for searching, using the reference information storage means, for each feature vector $q_i$ of the query content, for at least one subset $D$ of reference-content feature vectors containing similar feature vectors; and
voting means for adding, using the mixing parameter $\lambda$, a score to each reference content $R_j$ based on the ratio of the probability $\lambda \cdot p(q_i \mid R_j)$ that the feature vector $q_i$ of the query content is generated from that retrieved reference content to the probability $(1-\lambda) \cdot p(q_i)$ that it is generated from a background model unrelated to the reference content, performing this addition for all feature vectors $q_i$ of the query content, and finally outputting, as the search result, the reference contents $R_j$ whose scores exceed a predetermined threshold.
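The end-to-end voting loop of claim 12 can be sketched as follows. This is a simplified illustration, not the claimed program: a brute-force Euclidean ranking over all stored vectors stands in for the similar vector search means, $m$ is fixed instead of being derived from $L$, and the per-vector score uses the claim-5 form of the likelihood ratio with one vector per $D_t$. The function name `search` and its defaults (`lam=0.5`, `m=10`, `threshold=0.0`) are hypothetical.

```python
import math

def search(query_vectors, reference_db, lam=0.5, m=10, threshold=0.0):
    """Hypothetical voting loop: sum per-vector scores, then threshold.

    reference_db: list of (reference_id, feature_vector) pairs, i.e. one
        stored identifier per feature vector (reference information storage).
    """
    ref_sizes = {}
    for ref_id, _ in reference_db:
        ref_sizes[ref_id] = ref_sizes.get(ref_id, 0) + 1   # |R_j|
    r_all = len(reference_db)                                # |R_all|
    ratio = lam / (1.0 - lam)

    totals = {}
    for q in query_vectors:
        # Stand-in for the similar vector search: exact top-m by distance.
        ranked = sorted(reference_db, key=lambda rv: math.dist(q, rv[1]))[:m]
        counts, best = {}, {}
        for t, (j, _) in enumerate(ranked, start=1):
            counts[j] = counts.get(j, 0) + 1
            value = math.log(ratio * counts[j] * r_all / (t * ref_sizes[j]) + 1.0)
            best[j] = max(best.get(j, 0.0), value)
        # Vote: add this query vector's score to each reference's total.
        for j, s in best.items():
            totals[j] = totals.get(j, 0.0) + s

    # Output reference contents scoring above the threshold, best first.
    hits = [(j, s) for j, s in totals.items() if s > threshold]
    return sorted(hits, key=lambda js: js[1], reverse=True)
```

With two query vectors that each retrieve one vector of a reference `cat` (two stored vectors out of four in total, $\lambda = 0.5$, $m = 1$), `cat` accumulates $2 \log 3$ and is the only reference returned for a threshold of 0.5. Normalizing by $|R_j|$ in the score keeps references with many stored vectors from winning merely because they receive more votes.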