JP5959446B2

JP5959446B2 - Retrieval device, program, and method for high-speed retrieval by expressing contents as a set of binary feature vectors

Info

Publication number: JP5959446B2
Application number: JP2013014891A
Authority: JP
Inventors: 祐介内田; 茂之酒澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-01-30
Filing date: 2013-01-30
Publication date: 2016-08-02
Anticipated expiration: 2033-01-30
Also published as: JP2014146207A

Description

The present invention relates to a technique for searching a set of reference contents (the contents to be searched), each represented by a set of feature vectors, for reference content similar to a query content (the content serving as the search key) that is likewise represented by a set of feature vectors. It is particularly suited to searching multimedia content (for example, images) represented by sets of feature vectors.

In recent years, growth in storage capacity has made it possible to accumulate large amounts of content, both online and offline. In addition, with the spread of information terminal devices such as mobile phones and smartphones, digital content such as photographs taken by users themselves can easily be stored in databases in large quantities. Offline databases include storage devices such as HDDs (Hard Disk Drives), DVDs (Digital Versatile Disks), and Blu-ray discs. Online databases include social network services such as Flickr (registered trademark) and MySpace (registered trademark). With these storage devices and services, techniques for searching the large and diverse personal multimedia content stored in a database become important.

One technique for searching multimedia content extracts a large number of feature vectors from each content and outputs, as the search result, the content whose set of feature vectors is most similar. In this technique, the feature vectors of the multimedia content are quantized and a histogram is built from the frequencies of the quantized feature vectors. The similarity (distance) between contents is then computed as the L1-norm or L2-norm distance between their histograms. Here a norm expresses the distance between two points: the L1 distance is the sum over all dimensions of the absolute differences between the two points, and the L2 distance is based on the sum over all dimensions of the squared differences (its square root is the Euclidean distance).
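As a minimal sketch of the histogram comparison described above, the following Python builds frequency histograms from quantized feature codes and compares them with L1 and L2 distances. The function names and the toy code lists are illustrative assumptions and do not come from the patent.

```python
import math


def histogram(codes, num_bins):
    """Count how often each quantized feature code (0..num_bins-1) occurs."""
    counts = [0] * num_bins
    for code in codes:
        counts[code] += 1
    return counts


def l1_distance(h1, h2):
    """Sum of absolute per-bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Square root of the sum of squared per-bin differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


# Toy data: quantized feature codes of a query and a reference content.
query_hist = histogram([0, 0, 1, 3], num_bins=4)
reference_hist = histogram([0, 1, 1, 2], num_bins=4)
print(l1_distance(query_hist, reference_hist))
print(l2_distance(query_hist, reference_hist))
```

A smaller distance indicates a more similar content; in practice the histograms are often normalized by the number of features before comparison, which this sketch omits.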

Another technique extracts a large number of local feature vectors from image content, vector-quantizes them, and computes the similarity from the number of local feature vectors quantized to the same representative vector (see, for example, Non-Patent Document 1).

A further technique extracts a plurality of local invariant features from an image, builds a histogram of feature vector frequencies, and computes the similarity between the image and a category from the overlap ratio of the histograms (see, for example, Patent Document 1). With this technique, features that are unnecessary for pattern recognition of the subject (for example, background features) can be removed on the basis of the histogram. As a result, the features of an object can be extracted without first separating the object from the rest of the image.

Conventionally, the framework for similar-image search using local features is called "Bag-of-Visual Words" (or Bag-of-Features, Bag-of-Keypoints) (see, for example, Non-Patent Document 1). This technique applies the document retrieval method based on the Bag-of-Words model and an inverted index to the retrieval of similar images. Bag-of-Words represents a document as a single feature vector defined by word frequencies, and derives the similarity between documents using IDF (Inverse Document Frequency) values, derived in advance from a document collection, as word weights. Bag-of-Visual Words, in turn, quantizes the local features of an image, treats each quantized local feature as a word, represents the image as a single feature vector likewise defined by frequencies, and can therefore apply the same similarity computation with IDF weighting.
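The following Python is a rough sketch of the Bag-of-Visual Words scheme described above: visual word IDs per content, IDF weights derived from the reference collection, and an inverted index used for scoring. The scoring (an unnormalized dot product of TF-IDF weights), the function names, and the toy data are assumptions made for illustration; the cited documents may use a different weighting or normalization.

```python
import math
from collections import Counter, defaultdict


def build_index(reference_words):
    """reference_words maps a content ID to its list of visual word IDs."""
    n_docs = len(reference_words)
    doc_freq = Counter()
    for words in reference_words.values():
        doc_freq.update(set(words))
    # IDF: words that occur in few contents receive a larger weight.
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}
    # Inverted index: visual word -> [(content ID, term frequency), ...]
    inverted = defaultdict(list)
    for content_id, words in reference_words.items():
        for word, tf in Counter(words).items():
            inverted[word].append((content_id, tf))
    return idf, inverted


def search(query_words, idf, inverted):
    """Rank reference contents by the dot product of TF-IDF weights."""
    scores = defaultdict(float)
    for word, query_tf in Counter(query_words).items():
        for content_id, tf in inverted.get(word, []):
            scores[content_id] += (query_tf * idf[word]) * (tf * idf[word])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


references = {"A": [1, 1, 2], "B": [2, 3], "C": [3, 3, 4]}
idf, inverted = build_index(references)
ranking = search([1, 2], idf, inverted)
print(ranking)
```

Only contents that share at least one visual word with the query are touched, which is the reason an inverted index keeps the search fast on large collections.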

Furthermore, in recent years, techniques based on Fisher vectors, which extend the "Bag-of-Visual Words" framework, have attracted attention (see, for example, Non-Patent Document 3). In the Fisher vector approach, feature vectors are modeled with a Gaussian mixture distribution, and the Fisher kernel with respect to the parameters of that mixture is explicitly mapped to a feature vector, which is then used as the feature vector representing the image. With this technique, a set of feature vectors can be described by a single fixed-length feature vector, and the Euclidean distance can be used as the distance measure between feature vectors.

FIG. 1 is a functional configuration diagram of a content search device according to the prior art.

The search device 1 of FIG. 1 receives a large number of training contents in advance in order to generate model parameters, and stores those model parameters. The search device 1 also receives a large number of reference contents (the contents to be searched) in advance, and stores reference feature vectors that have been normalized using the model parameters. For a query content (the content serving as the search key), the search device 1 then normalizes the query feature vector using the model parameters, searches for the reference feature vector most similar to that query feature vector, and identifies the corresponding reference content.

As shown in FIG. 1, the search device 1 includes a feature vector extraction unit 11, a model estimation unit 12, a model parameter storage unit 13, a feature vector conversion unit 14, a reference information storage unit 15, and a feature vector search unit 16. These functional components are realized by executing a program that causes a computer mounted on the device to function.

The feature vector extraction unit 11 extracts a set of feature vectors from each multimedia content. For example, when the multimedia content is an image, each feature vector is a local feature vector extracted from a local feature region of the image. The training content is converted into sets of feature vectors, which are output to the model estimation unit 12. The reference content and the query content are likewise converted into sets of feature vectors, which are output to the feature vector conversion unit 14. Feature vectors of the same dimensionality (D dimensions) are extracted from all of these contents.

Algorithms that are robust to rotation and scale changes, such as SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features), are used to extract the feature vectors for object recognition. With SIFT, for example, a set of 128-dimensional feature vectors is extracted from one image. SIFT analyzes characteristic local regions using scale space and describes feature vectors that are invariant to scale change and rotation. SURF, by contrast, runs faster than SIFT and extracts a set of 64-dimensional feature vectors from one image. Whereas SIFT has a high processing cost and makes real-time matching difficult, SURF speeds up processing by using integral images.

The model estimation unit 12 estimates the model parameters of a Gaussian mixture model using the sets of feature vectors of the training content output from the feature vector extraction unit 11, and outputs those model parameters. The discrimination performance of Bag-of-Features depends on how accurately the probability density distribution is modeled. A Gaussian mixture is used because it can represent an arbitrary continuous density function when the number of mixture components and the parameters are adjusted.

The model parameter storage unit 13 stores the model parameters output from the model estimation unit 12.

The feature vector conversion unit 14 explicitly maps the set of feature vectors of each reference content and of the query content to a single fixed-length feature vector. This mapping uses the Fisher kernel based on the model parameters held in the model parameter storage unit 13. Specifically, the gradient vector of the model's log-likelihood function is computed from the set of feature vectors and normalized by the Fisher information matrix for the model parameters, which yields the mapped feature vector. In the technique described in Non-Patent Document 3, the Fisher information matrix is assumed to be diagonal. The resulting single feature vector is called a Fisher vector. The feature vector conversion unit 14 outputs the Fisher vector mapped from the set of feature vectors of a reference content to the reference information storage unit 15, and outputs the Fisher vector mapped from the set of feature vectors of the query content to the feature vector search unit 16.

The reference information storage unit 15 stores the normalized Fisher vectors of the reference contents output from the feature vector conversion unit 14.

The feature vector search unit 16 uses the reference information storage unit 15 to search for the Fisher vector of the reference content most similar to the Fisher vector of the query content. The Euclidean distance can be used here: the Fisher vector of the reference content at the shortest distance from the Fisher vector of the query content is retrieved, and the corresponding reference content is identified.

Patent Document 1: JP 2010-282581 A

Non-Patent Document 1: J. Sivic et al., "Video Google: A Text Retrieval Approach to Object Matching in Videos," in Proc. ICCV, 2003.
Non-Patent Document 2: D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
Non-Patent Document 3: F. Perronnin, J. Sanchez, and T. Mensink, "Improving the Fisher Kernel for Large-Scale Image Classification," in Proc. ECCV, 2010.
Non-Patent Document 4: E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in Proc. ICCV, 2011.
Non-Patent Document 5: A. Alahi, R. Ortiz, and P. Vandergheynst, "FREAK: Fast Retina Keypoint," in Proc. CVPR, 2012.
Non-Patent Document 6: H. Jegou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," in IEEE Trans. on PAMI, vol. 33, no. 1, pp. 117-128, 2011.
Non-Patent Document 7: Yohei Mishina (三品陽平), "CVReadiing, ORB: an efficient alternative to SIFT or SURF", [online], [retrieved December 5, 2012], Internet <URL:http://www.vision.cs.chubu.ac.jp/CV-R/jpdf/Rublee_iccv2011.pdf>
Non-Patent Document 8: Tatsuya Harada (原田達也), "Trends and Theory of Generic Object and Scene Recognition Using Large-Scale Data", [online], [retrieved December 5, 2012], Internet <URL:https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&item_id=81096&item_no=1&page_id=13&block_id=8>

However, as mobile terminals such as smartphones and tablets become widespread, content search processing is required to use even less memory and to match even faster. In particular, in the technical field of image recognition for augmented reality (AR) applications, content must be searched even faster than is possible with SIFT or SURF in order to achieve real-time processing.

Accordingly, an object of the present invention is to provide a search device, a program, and a method capable of searching content even faster than is possible with SIFT or SURF.

According to the present invention, there is provided a search program that causes a computer mounted on a device to function so as to search a set of reference contents for reference content similar to a query content, using model parameters extracted from training content, the program causing the computer to function as:
feature vector extraction means for extracting a set x_1 to x_T of D-dimensional binary feature vectors from each of the training content, the reference content, and the query content;
model estimation means for calculating, from the sets of binary feature vectors of the training content, the mixture ratio w_i of the i-th (1 ≤ i ≤ N) multivariate Bernoulli distribution, the d-th (1 ≤ d ≤ D) parameter μ_id of the i-th multivariate Bernoulli distribution, and the Fisher information f_id for the parameter μ_id;
model parameter storage means for storing the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id;
feature vector conversion means for calculating, from the set of binary feature vectors of a reference content or the query content, one Fisher vector corresponding to that reference content or query content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored in the model parameter storage means; and
feature vector search means for searching for the Fisher vector of the reference content most similar to the Fisher vector of the query content.

According to another embodiment of the search program of the present invention,
it is also preferable that the feature vector extraction means causes the computer to function so as to extract the set of binary feature vectors using ORB (Oriented FAST and Rotated BRIEF) or FREAK (Fast Retina Keypoint).

According to another embodiment of the search program of the present invention,
it is also preferable that the model estimation means causes the computer to function so as to, from the set x_1 to x_T of binary feature vectors of the training content,
in the E (Expectation) step, estimate the expected value γ_t(i) of the latent variable i for each binary feature vector x_t,
in the M (Maximization) step, update the mixture ratio w_i and the parameter μ_i using the expected values γ_t(i), and
repeat the E step and the M step until convergence, thereby calculating the parameter group λ consisting of the mixture ratios w_i and the parameters μ_i:
λ = (w_1, ..., w_N, μ_11, ..., μ_ND).
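The E step and M step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the random initialization, the fixed iteration count `n_iter` used in place of a convergence test, and the clamping constant `eps` are all choices made for this example.

```python
import math
import random


def responsibilities(x, w, mu):
    """E step for one vector: posterior gamma(i) of each mixture component."""
    logs = []
    for w_i, mu_i in zip(w, mu):
        log_p = math.log(w_i)
        for bit, m in zip(x, mu_i):
            log_p += math.log(m if bit else 1.0 - m)
        logs.append(log_p)
    top = max(logs)  # subtract the maximum for numerical stability
    probs = [math.exp(v - top) for v in logs]
    total = sum(probs)
    return [p / total for p in probs]


def fit_bernoulli_mixture(X, n_components, n_iter=50, eps=1e-3, seed=0):
    """Estimate mixture ratios w_i and Bernoulli parameters mu_id by EM."""
    rng = random.Random(seed)
    dim = len(X[0])
    w = [1.0 / n_components] * n_components
    mu = [[rng.uniform(0.25, 0.75) for _ in range(dim)]
          for _ in range(n_components)]
    for _ in range(n_iter):
        gamma = [responsibilities(x, w, mu) for x in X]       # E step
        for i in range(n_components):                         # M step
            n_i = max(sum(g[i] for g in gamma), eps)
            w[i] = n_i / len(X)
            for d in range(dim):
                m = sum(g[i] * x[d] for g, x in zip(gamma, X)) / n_i
                mu[i][d] = min(max(m, eps), 1.0 - eps)  # keep mu inside (0, 1)
        total = sum(w)
        w = [v / total for v in w]
    return w, mu


# Toy training set: two clearly separated groups of 4-bit vectors.
X = [[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10
w, mu = fit_bernoulli_mixture(X, n_components=2)
print(w, mu)
```

A production version would monitor the log-likelihood and stop once its change falls below a tolerance, which corresponds to the "until convergence" condition in the text.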

According to another embodiment of the search program of the present invention,
it is also preferable that the model estimation means causes the computer to function so as to
calculate a Fisher score s_id, defined as the partial derivative of the log-likelihood function with respect to the parameter μ_id, and
calculate the Fisher information f_id as the variance of the Fisher score s_id.
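As a sketch of this step, differentiating the log-likelihood of a mixture of multivariate Bernoulli distributions with respect to μ_id gives s_id = γ(i)·(x_d − μ_id) / (μ_id·(1 − μ_id)), where γ(i) is the posterior of component i. The Python below computes that score and estimates f_id as its empirical variance over a set of training vectors. The closed form is derived here by ordinary calculus and the variance estimator is an assumption; the patent may use a different (for example approximate analytic) expression.

```python
import math


def responsibilities(x, w, mu):
    """Posterior gamma(i) of each mixture component for one binary vector."""
    joint = []
    for w_i, mu_i in zip(w, mu):
        p = w_i
        for bit, m in zip(x, mu_i):
            p *= m if bit else 1.0 - m
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]


def fisher_scores(x, w, mu):
    """s_id: partial derivative of log p(x) with respect to mu_id."""
    gamma = responsibilities(x, w, mu)
    return [[g * (bit - m) / (m * (1.0 - m)) for bit, m in zip(x, mu_i)]
            for g, mu_i in zip(gamma, mu)]


def fisher_information(X, w, mu):
    """f_id: empirical variance of s_id over the training vectors X."""
    n, dim = len(mu), len(mu[0])
    mean = [[0.0] * dim for _ in range(n)]
    mean_sq = [[0.0] * dim for _ in range(n)]
    for x in X:
        s = fisher_scores(x, w, mu)
        for i in range(n):
            for d in range(dim):
                mean[i][d] += s[i][d] / len(X)
                mean_sq[i][d] += s[i][d] ** 2 / len(X)
    return [[mean_sq[i][d] - mean[i][d] ** 2 for d in range(dim)]
            for i in range(n)]


w = [0.5, 0.5]
mu = [[0.8, 0.2], [0.2, 0.8]]
scores = fisher_scores([1, 0], w, mu)
f = fisher_information([[1, 0], [0, 1], [1, 1], [1, 0]], w, mu)
print(scores)
print(f)
```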

According to another embodiment of the search program of the present invention,
it is also preferable that the feature vector conversion means causes the computer to function so as to
calculate, for each set of binary feature vectors, the Fisher scores s_id using the parameter μ_id, and accumulate them for each index id to obtain accumulated Fisher scores s'_id, and
calculate a Fisher vector v_id by dividing each accumulated Fisher score s'_id by the square root √f_id of the corresponding Fisher information f_id.
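The conversion described above can be sketched as follows: the Fisher scores of all binary vectors in one content are accumulated per index (i, d), and each accumulated score is divided by √f_id. The score formula γ(i)·(x_d − μ_id) / (μ_id·(1 − μ_id)) is the derivative of the Bernoulli mixture log-likelihood; the function names, the flattened output layout, and the toy parameters are assumptions for illustration.

```python
import math


def responsibilities(x, w, mu):
    """Posterior gamma(i) of each mixture component for one binary vector."""
    joint = []
    for w_i, mu_i in zip(w, mu):
        p = w_i
        for bit, m in zip(x, mu_i):
            p *= m if bit else 1.0 - m
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]


def fisher_vector(X, w, mu, f):
    """Map a set X of binary vectors to one vector of length N * D."""
    n, dim = len(mu), len(mu[0])
    acc = [[0.0] * dim for _ in range(n)]  # accumulated scores s'_id
    for x in X:
        gamma = responsibilities(x, w, mu)
        for i in range(n):
            for d in range(dim):
                m = mu[i][d]
                acc[i][d] += gamma[i] * (x[d] - m) / (m * (1.0 - m))
    # Normalize each accumulated score by the square root of f_id.
    return [acc[i][d] / math.sqrt(f[i][d])
            for i in range(n) for d in range(dim)]


w = [0.5, 0.5]
mu = [[0.8, 0.2], [0.2, 0.8]]
f = [[4.0, 4.0], [4.0, 4.0]]  # placeholder Fisher information values
v_one = fisher_vector([[1, 0]], w, mu, f)
v_two = fisher_vector([[1, 0], [1, 0]], w, mu, f)
print(v_one)
```

Because every content maps to a vector of the same length N × D regardless of how many binary vectors it contains, two contents can then be compared directly with the Euclidean distance.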

According to another embodiment of the search program of the present invention,
it is also preferable that the computer is caused to function such that the model estimation means
calculates a Fisher score s_id, defined as the partial derivative of the log-likelihood function with respect to the parameter μ_id,
performs principal component analysis on the vector (s_i1 to s_iD) of Fisher scores s_id for each mixture component i,
outputs, as the Fisher information f_id, the K largest eigenvalues obtained by the principal component analysis, and
further outputs the K eigenvectors g_iK corresponding to those eigenvalues, and
the model parameter storage means further stores the eigenvectors g_iK.

According to another embodiment of the search program of the present invention,
it is also preferable that the feature vector conversion means causes the computer to function so as to
calculate, for each set of binary feature vectors, the Fisher scores s_id using the parameter μ_id, and accumulate them for each index id to obtain accumulated Fisher scores s'_id,
calculate, for each mixture component i, a normalized vector v'_ik by normalizing (projecting) the vector (s'_i1 to s'_iD) of accumulated Fisher scores s'_id with the corresponding eigenvectors g_ik (g_i1 to g_iK), and
calculate a Fisher vector v_ik by dividing the normalized vector v'_ik by the square root √g_ik of the corresponding eigenvector g_ik.

According to the present invention, there is provided a search device that searches a set of reference contents for reference content similar to a query content, using model parameters extracted from training content, the device comprising:
feature vector extraction means for extracting a set x_1 to x_T of D-dimensional binary feature vectors from each of the training content, the reference content, and the query content;
model estimation means for calculating, from the sets of binary feature vectors of the training content, the mixture ratio w_i of the i-th (1 ≤ i ≤ N) multivariate Bernoulli distribution, the d-th (1 ≤ d ≤ D) parameter μ_id of the i-th multivariate Bernoulli distribution, and the Fisher information f_id for the parameter μ_id;
model parameter storage means for storing the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id;
feature vector conversion means for calculating, from the set of binary feature vectors of a reference content or the query content, one Fisher vector corresponding to that reference content or query content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id held in the model parameter storage means; and
feature vector search means for searching for the Fisher vector of the reference content most similar to the Fisher vector of the query content.

According to the present invention, there is provided a search method that uses a device to search a set of reference contents for reference content similar to a query content, using model parameters extracted from training content, the method comprising:
as a first step of storing model parameters,
extracting a set x_1 to x_T of D-dimensional binary feature vectors from each training content,
calculating, from the sets of binary feature vectors of the training content, the mixture ratio w_i of the i-th (1 ≤ i ≤ N) multivariate Bernoulli distribution, the d-th (1 ≤ d ≤ D) parameter μ_id of the i-th multivariate Bernoulli distribution, and the Fisher information f_id for the parameter μ_id, and
storing the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id;
as a second step of storing reference information,
extracting a set of D-dimensional binary feature vectors from each reference content,
calculating one Fisher vector from the set of binary feature vectors of each reference content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored as model parameters, and
storing the Fisher vectors; and
as a third step of searching for reference content from a query content,
calculating one Fisher vector from the set of binary feature vectors of the query content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored as model parameters, and
searching for the Fisher vector of the reference content most similar to the Fisher vector of the query content.

According to the search device, program, and method of the present invention, content can be searched even faster than is possible with SIFT or SURF.

FIG. 1 is a functional configuration diagram of a content search device according to the prior art.
FIG. 2 is a functional configuration diagram of the content search device according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

The search device, program, and method of the present invention search a set of reference contents for reference content similar to a query content, using model parameters extracted from training content. The first feature of the present invention is that the feature vectors extracted from the content are binary feature vectors. The second feature is that these binary feature vectors are modeled with a mixture of multivariate Bernoulli distributions, and a Fisher vector is derived from the parameters of that model.

FIG. 2 is a functional configuration diagram of the content search device according to the present invention.

The functional configuration of the search device in FIG. 2 is the same as that in FIG. 1, but the processing performed by each functional component differs. The feature vector extraction unit 11, the model estimation unit 12, the model parameter storage unit 13, the feature vector conversion unit 14, the reference information storage unit 15, and the feature vector search unit 16 of the search device 1 are described below in that order.

[Feature vector extraction unit 11]
The feature vector extraction unit 11 of the present invention extracts a set X = {x_1 to x_T} of D-dimensional binary feature vectors from each of the training content, the reference content, and the query content. For example, when the multimedia content is an image, each feature vector is a local binary feature vector extracted from a local feature region of the image. The sets of binary feature vectors extracted from the training content are output to the model estimation unit 12. The sets of binary feature vectors extracted from the reference content and the query content are each output to the feature vector conversion unit 14.

In the present invention, ORB (Oriented FAST and Rotated BRIEF) (see, for example, Non-Patent Documents 4 and 7) or FREAK (Fast Retina Keypoint) (see, for example, Non-Patent Document 5) is used as the algorithm for extracting binary feature vectors. With ORB, a set of 256-bit binary feature vectors is extracted from one content. BRIEF (Binary Robust Independent Elementary Features) is a binary-code feature descriptor designed for high-speed matching. The present invention uses ORB, which adds rotation invariance to the BRIEF descriptor. In particular, ORB can maintain accuracy equal to or better than that of SIFT or SURF while achieving a speedup of several hundred times.

<About ORB>
ORB consists of two steps: feature point detection processing and feature vector description processing.

(Feature point detection processing)
In its feature point detection processing, ORB uses FAST (Features from Accelerated Segment Test) to detect keypoints at high speed. Because FAST is not robust to scale changes, the image is resized to several scales and feature points are extracted from the image at each scale.

Moreover, the original FAST has no algorithm for calculating keypoint orientation, which is needed for rotation invariance. ORB therefore adopts Oriented FAST. Describing features relative to this orientation means that the same keypoint yields the same feature even when the input image is rotated. The orientation is given by the direction vector from the keypoint center to the intensity centroid of the patch.
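The orientation step can be illustrated with a minimal sketch. It assumes a square grayscale patch given as a list of rows and uses the first-order image moments about the patch center; the function name `patch_orientation` is hypothetical, and a real ORB implementation restricts the moments to a circular region, which is omitted here.

```python
import math


def patch_orientation(patch):
    """Angle (radians) of the vector from the patch center to its intensity centroid."""
    height = len(patch)
    width = len(patch[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    m10 = 0.0  # first-order moment along x, relative to the center
    m01 = 0.0  # first-order moment along y, relative to the center
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - cx) * value
            m01 += (y - cy) * value
    return math.atan2(m01, m10)


# A patch that is brighter on its right side points along the +x axis.
right_bright = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
angle = patch_orientation(right_bright)
```

Rotating the sampling pattern by this angle before describing the patch is what makes the descriptor comparable across rotated views.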

(Feature vector description processing)
Next, in ORB's feature vector description processing, a binary feature vector is extracted for each detected feature point using the BRIEF descriptor. Each bit is obtained from the relative brightness of two pixels near the feature point.

BRIEF describes keypoint features as binary codes. SIFT and SURF use high-dimensional real-valued vectors for feature description, which increases both memory consumption and the cost of similarity computation. With the BRIEF descriptor used in ORB, describing features as binary codes saves memory, and using the Hamming distance for similarity computation reduces processing cost.
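To make the cost argument concrete, here is a small sketch of the Hamming distance between two binary descriptors. It assumes each descriptor is packed into a `bytes` object (32 bytes for a 256-bit ORB descriptor); the function name is illustrative only.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 exactly where the two descriptors disagree.
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(diff).count("1")


# Two 256-bit descriptors that disagree in every bit.
distance = hamming_distance(bytes(32), b"\xff" * 32)
```

The whole comparison reduces to an XOR and a bit count, with no floating-point arithmetic.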

BRIEF generates a binary code from the signs of the intensity differences between pairs of points selected at random within the patch. The pixels are drawn at random from a Gaussian distribution centered on the keypoint position. ORB, in contrast, selects the pixel pairs by learning in order to match more accurately. Pixel pairs are adopted for feature description when the bit variance of each pair is large and the correlation among the N pairs is low, since such pairs yield binary codes with high descriptive power. The N pairs are narrowed down using a greedy algorithm.
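The random-pair variant (plain BRIEF, not ORB's learned pairs) can be sketched as follows. The Gaussian spread of one fifth of the patch size, the clamping to the patch border, and the names `sample_pairs` and `brief_descriptor` are assumptions made for illustration.

```python
import random


def sample_pairs(n_bits, patch_size, seed=0):
    """Draw n_bits pixel pairs from a Gaussian centered on the patch center."""
    rng = random.Random(seed)
    center = (patch_size - 1) / 2.0
    sigma = patch_size / 5.0  # assumed spread, not taken from the source text

    def draw():
        value = int(round(rng.gauss(center, sigma)))
        return min(max(value, 0), patch_size - 1)  # clamp to the patch

    return [((draw(), draw()), (draw(), draw())) for _ in range(n_bits)]


def brief_descriptor(patch, pairs):
    """One bit per pair: 1 if the first pixel is darker than the second."""
    return [1 if patch[y1][x1] < patch[y2][x2] else 0
            for (y1, x1), (y2, x2) in pairs]


patch = [[1, 2], [3, 4]]
bits = brief_descriptor(patch, [((0, 0), (1, 1)), ((1, 1), (0, 0))])
```

The same pair list must be reused for every keypoint so that bit positions are comparable between descriptors.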

[Model estimation unit 12]
The model estimation unit 12 calculates, from the set of binary feature vectors of the training content, the mixture ratio w_i of the i-th multivariate Bernoulli distribution and the d-th parameter μ_id of the i-th multivariate Bernoulli distribution. These are calculated as the model parameters λ.
λ (w_1, ..., w_N and μ_11, ..., μ_ND)
In addition, the model estimation unit 12 of the present invention also calculates the Fisher information f_id for each parameter μ_id.
f_11, ..., f_ND (N × D values): Fisher information

<Calculation of parameters w_i and μ_id based on a multivariate Bernoulli mixture distribution>
According to the present invention, the model parameters λ are estimated by modeling the set of binary feature vectors with a multivariate Bernoulli mixture distribution. A Bernoulli distribution is a discrete probability distribution that takes the value 1 with probability p and the value 0 with probability q = 1 - p. If X is a random variable following a Bernoulli distribution, its mean is p and its variance is pq = p(1 - p). The multivariate Bernoulli mixture distribution expresses the probability p(x_t|λ) that a binary feature vector x_t is generated.

N: number of mixture components
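The equation for p(x_t|λ) is not reproduced in this text. A form consistent with the surrounding definitions, offered as a reconstruction rather than the patent's own equation, is the usual weighted sum over the N components:

```latex
p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i \, p_i(x_t \mid \lambda),
\qquad \sum_{i=1}^{N} w_i = 1
```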

Because this is a mixture distribution, the different multivariate Bernoulli distributions p_1 to p_N are each selected with mixture ratio w_i, and x_t is generated. The probability that the binary feature vector x_t is generated from the i-th multivariate Bernoulli distribution is expressed by the following equation.

μ_id: d-th parameter of the i-th multivariate Bernoulli distribution
x_t,d: d-th bit of the binary feature vector x_t
D: bit length of the binary feature vector
p_i(x_t|λ): probability that the d-th bit is 1 when the binary feature vector x_t is generated from the i-th multivariate Bernoulli distribution
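The component equation is likewise missing here. Assuming the bits are independent within a component and that μ_id is the probability that bit d equals 1 under component i (the standard reading of a multivariate Bernoulli parameter), the component probability would take the form:

```latex
p_i(x_t \mid \lambda) = \prod_{d=1}^{D} \mu_{id}^{\,x_{t,d}} \, (1 - \mu_{id})^{\,1 - x_{t,d}}
```

Each factor reduces to μ_id when x_t,d = 1 and to 1 - μ_id when x_t,d = 0.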

Specifically, these parameters are estimated from the set of binary feature vectors x_1 to x_T of the training content by iterating the EM (Expectation-Maximization) algorithm. The EM algorithm is a statistical method for estimating the parameters of a probabilistic model by maximum likelihood, and is used when the model depends on unobservable latent variables.

In the E (Expectation) step, the expected value γ_t(i) of the model likelihood is estimated for each binary feature vector x_t, based on the distribution of the latent variable z_ti.

γ_t(i): probability that the t-th training vector was generated from the i-th multivariate Bernoulli distribution
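The E-step equation is not reproduced in this text. Given the definition of γ_t(i) as the probability that the t-th vector came from component i, Bayes' rule gives the following form, stated here as a reconstruction:

```latex
\gamma_t(i) = \frac{w_i \, p_i(x_t \mid \lambda)}{\sum_{j=1}^{N} w_j \, p_j(x_t \mid \lambda)}
```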

In the M (Maximization) step, the mixture ratios w_i and the parameters μ_id are updated so as to maximize the expected likelihood computed from the γ_t(i) obtained in the E step. The parameters calculated in the M step are used to determine the distribution of the latent variables in the next E step.
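The update equations are not reproduced in this text. For a Bernoulli mixture, the standard closed-form M-step updates, given here as an assumption about what the omitted equations contain, are:

```latex
w_i = \frac{1}{T} \sum_{t=1}^{T} \gamma_t(i),
\qquad
\mu_{id} = \frac{\sum_{t=1}^{T} \gamma_t(i) \, x_{t,d}}{\sum_{t=1}^{T} \gamma_t(i)}
```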

By repeating the E step and the M step until convergence, the parameter group λ, consisting of the mixture ratios w_i and the parameters μ_id that maximize the log-likelihood, is calculated.
λ (w_1, ..., w_N and μ_11, ..., μ_ND)
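The E/M iteration can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions, not the patent's: it uses the standard Bernoulli-mixture updates, clamps μ away from 0 and 1 with a hypothetical `eps`, initializes randomly, and multiplies probabilities directly, which is only safe for short toy vectors (256-bit descriptors would need log-space arithmetic).

```python
import math
import random


def component_prob(x, mu_i):
    """p_i(x): product over bits of mu if the bit is 1, else 1 - mu."""
    p = 1.0
    for bit, m in zip(x, mu_i):
        p *= m if bit else (1.0 - m)
    return p


def e_step(X, w, mu):
    """gamma[t][i]: posterior probability that x_t came from component i."""
    gammas = []
    for x in X:
        joint = [w_i * component_prob(x, mu_i) for w_i, mu_i in zip(w, mu)]
        total = sum(joint)
        gammas.append([j / total for j in joint])
    return gammas


def m_step(X, gammas, n_components, eps=1e-3):
    """Re-estimate mixture ratios and Bernoulli parameters from the posteriors."""
    T, D = len(X), len(X[0])
    w, mu = [], []
    for i in range(n_components):
        n_i = max(sum(g[i] for g in gammas), 1e-12)
        w.append(n_i / T)
        row = []
        for d in range(D):
            m = sum(g[i] * x[d] for g, x in zip(gammas, X)) / n_i
            row.append(min(max(m, eps), 1.0 - eps))  # keep mu strictly in (0, 1)
        mu.append(row)
    return w, mu


def log_likelihood(X, w, mu):
    return sum(
        math.log(sum(w_i * component_prob(x, mu_i) for w_i, mu_i in zip(w, mu)))
        for x in X
    )


def fit_bernoulli_mixture(X, n_components, n_iter=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    D = len(X[0])
    w = [1.0 / n_components] * n_components
    mu = [[rng.uniform(0.25, 0.75) for _ in range(D)] for _ in range(n_components)]
    prev = -math.inf
    for _ in range(n_iter):
        gammas = e_step(X, w, mu)
        w, mu = m_step(X, gammas, n_components)
        cur = log_likelihood(X, w, mu)
        if cur - prev < tol:  # stop once the log-likelihood no longer improves
            break
        prev = cur
    return w, mu


# Toy data: two clearly separated bit patterns.
X = [[1, 1, 0, 0]] * 20 + [[0, 0, 1, 1]] * 20
w, mu = fit_bernoulli_mixture(X, n_components=2)
```

On this toy input each component should settle on one of the two patterns, with the mixture ratios close to one half each.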

<Calculation of Fisher information f_id>
The model estimation unit 12 also calculates the Fisher information f_id for each parameter μ_id of the multivariate Bernoulli mixture distribution. The Fisher kernel is a framework that combines the generative approach with the discriminative approach (see, for example, Non-Patent Document 8). In the Fisher kernel, a gradient vector is first derived from the probability density distribution that generates the local descriptors, and this gradient vector, normalized by the Fisher information matrix, is used as a single feature vector representing the image. When the Fisher information matrix is assumed to be diagonal, this normalization is equivalent to normalizing the gradient for each parameter by its Fisher information. Compared with Bag of Features, the Fisher kernel yields a feature vector with more elements for a codebook of the same size. Because the feature vector carries more information, there is no need to project into a high-dimensional space with a computationally expensive kernel method, and sufficient performance can be obtained even with linear classification.

(First embodiment, in which the Fisher information matrix is assumed to be diagonal)
The model estimation unit 12
(S11) calculates the Fisher score s_id, defined as the partial derivative of the log-likelihood function with respect to the parameter μ_id, and
(S12) calculates the Fisher information f_id as the variance of the Fisher score s_id.

The Fisher information is defined as the second moment of the Fisher score. The Fisher score for μ_id is defined as the partial derivative, with respect to μ_id, of the log-likelihood function L(λ|X) = log P(X|λ) obtained when the binary feature vector set X = {x_1, ..., x_T} is observed under the multivariate Bernoulli mixture distribution.

The Fisher score s_id for μ_id is defined by the following equation.

Using the γ_t(i) described above, this becomes the following equation.
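The Fisher score equations are not reproduced in this text. Assuming the x_t are independent and that each component is a product of Bernoulli factors in μ_id, differentiating the log-likelihood gives the chain below; it is a derivation under those assumptions, not a copy of the patent's equations.

```latex
s_{id}
= \frac{\partial L(\lambda \mid X)}{\partial \mu_{id}}
= \sum_{t=1}^{T} \frac{\partial}{\partial \mu_{id}} \log p(x_t \mid \lambda)
= \sum_{t=1}^{T} \frac{w_i}{p(x_t \mid \lambda)} \, \frac{\partial p_i(x_t \mid \lambda)}{\partial \mu_{id}}
= \sum_{t=1}^{T} \gamma_t(i) \, \frac{x_{t,d} - \mu_{id}}{\mu_{id} \, (1 - \mu_{id})}
```

The last step uses the identity that the derivative of p_i with respect to μ_id equals p_i multiplied by (x_t,d - μ_id) / (μ_id (1 - μ_id)), together with γ_t(i) = w_i p_i(x_t|λ) / p(x_t|λ).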

The Fisher information f_id for μ_id is defined by the following equation.

Conventionally (Non-Patent Document 3), the Fisher information is calculated approximately from the parameters λ. In the present invention, the Fisher information is instead calculated directly from the samples as shown below, which yields an exact rather than approximate value and secures accuracy.
Applying the independence of the x_t and

to Equation 7, the Fisher information f_id becomes as follows.
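As an illustration of computing the Fisher information directly from samples, the sketch below takes f_id as the mean of the squared per-vector Fisher score over the training vectors. The score expression γ_t(i)(x_t,d - μ_id) / (μ_id(1 - μ_id)), the averaging over vectors (rather than, say, summing over T), and all function names are assumptions; the patent's own equations are not reproduced in this text.

```python
def posteriors(x, w, mu):
    """gamma(i) for one binary vector x under the Bernoulli mixture (w, mu)."""
    joint = []
    for w_i, mu_i in zip(w, mu):
        p = w_i
        for bit, m in zip(x, mu_i):
            p *= m if bit else (1.0 - m)
        joint.append(p)
    total = sum(joint)
    return [j / total for j in joint]


def per_vector_score(x, w, mu):
    """Fisher score contribution of one vector, as an N x D nested list."""
    gamma = posteriors(x, w, mu)
    return [
        [g * (bit - m) / (m * (1.0 - m)) for bit, m in zip(x, mu_i)]
        for g, mu_i in zip(gamma, mu)
    ]


def empirical_fisher_information(X, w, mu):
    """Mean squared per-vector score for every (i, d) pair."""
    n_components, n_bits = len(mu), len(mu[0])
    f = [[0.0] * n_bits for _ in range(n_components)]
    for x in X:
        s = per_vector_score(x, w, mu)
        for i in range(n_components):
            for d in range(n_bits):
                f[i][d] += s[i][d] ** 2
    return [[value / len(X) for value in row] for row in f]


# Single Bernoulli with mu = 0.5: the analytic Fisher information is 1 / (mu * (1 - mu)) = 4.
f = empirical_fisher_information([[1], [0]], w=[1.0], mu=[[0.5]])
```

For a single component the estimate matches the textbook value 1 / (μ(1 - μ)) whenever the sample frequencies match μ, which is a useful sanity check before moving to mixtures.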

(Second embodiment, using principal component analysis)
The model estimation unit 12
(S21) calculates the Fisher score s_id, defined as the partial derivative of the log-likelihood function with respect to the parameter μ_id,
(S22) performs principal component analysis on the Fisher scores s_id,
(S23) outputs the K largest eigenvalues resulting from the principal component analysis as normalization parameters f_id, and
(S24) further outputs the K eigenvectors g_iK corresponding to those eigenvalues.

Particularly for images, the bits of a binary feature vector are correlated, so the diagonal-matrix assumption of the first embodiment does not necessarily hold. The second embodiment therefore performs decorrelation and normalization using principal component analysis, as follows. Principal component analysis uses an orthogonal rotation to transform the original observations, whose variables are correlated, into uncorrelated values called principal components.

For the i-th multivariate Bernoulli distribution of the mixture, the Fisher scores s_i1, ..., s_iD are calculated for the binary feature vector set x_1, ..., x_T, and principal component analysis is performed on them. The K largest eigenvalues are taken as f_i1, ..., f_iK and the corresponding eigenvectors as g_i1, ..., g_iK, and these are output to the model parameter storage unit 13 as model parameters.
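In matrix terms, one way to read this step is as an eigendecomposition of the covariance of the per-vector scores for component i. The notation below is an assumption introduced for illustration: s_i(x_t) denotes the D-dimensional score vector of x_t for component i, and centering by the mean score follows conventional PCA practice rather than anything stated in the text.

```latex
C_i = \frac{1}{T} \sum_{t=1}^{T} \bigl(s_i(x_t) - \bar{s}_i\bigr)\bigl(s_i(x_t) - \bar{s}_i\bigr)^{\top},
\qquad
C_i \, g_{ik} = f_{ik} \, g_{ik},
\qquad
f_{i1} \ge f_{i2} \ge \dots \ge f_{iK}
```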

[Model parameter storage unit 13]
The model parameter storage unit 13 stores, as the model parameters output from the model estimation unit 12, the mixture ratios w_i (i = 1 to N), the parameters μ_id (i = 1 to N, d = 1 to D), and the Fisher information f_id (i = 1 to N, d = 1 to D). In the second embodiment, the model parameter storage unit 13 additionally stores the eigenvectors g_ik (i = 1 to N, k = 1 to K).

[Feature vector conversion unit 14]
The feature vector conversion unit 14 calculates one Fisher vector v corresponding to the reference content or the query content from that content's set of binary feature vectors x_1 to x_T, using the mixture ratios w_i, the parameters μ_id, and the Fisher information f_id held in the model parameter storage unit 13.

(For the first embodiment of the model estimation unit 12)
The feature vector conversion unit 14
(S13) for each set of binary feature vectors x_1 to x_T, calculates the Fisher scores s_id (s_11 to s_ND) using the parameters w_i and μ_id and accumulates them for each index id to obtain the cumulative Fisher scores s'_id (s'_11 to s'_ND), and
(S14) calculates the Fisher vector elements v_id by dividing each cumulative Fisher score s'_id by the square root √f_id of the corresponding Fisher information f_id.
v_id = s'_id / √f_id
f_11, ..., f_ND (N × D values): Fisher information
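Steps S13 and S14 can be sketched as below. The per-vector score expression γ_t(i)(x_t,d - μ_id) / (μ_id(1 - μ_id)) is the standard gradient for a Bernoulli mixture and is assumed here because the patent's equations are not reproduced in this text; the function names and the flattened N × D output order are also illustrative choices.

```python
import math


def posteriors(x, w, mu):
    """gamma(i) for one binary vector x under the Bernoulli mixture (w, mu)."""
    joint = []
    for w_i, mu_i in zip(w, mu):
        p = w_i
        for bit, m in zip(x, mu_i):
            p *= m if bit else (1.0 - m)
        joint.append(p)
    total = sum(joint)
    return [j / total for j in joint]


def fisher_vector(X, w, mu, f):
    """Accumulate scores over all vectors of one content item, then normalize."""
    n_components, n_bits = len(mu), len(mu[0])
    acc = [[0.0] * n_bits for _ in range(n_components)]  # cumulative scores s'_id
    for x in X:
        gamma = posteriors(x, w, mu)
        for i in range(n_components):
            for d in range(n_bits):
                m = mu[i][d]
                acc[i][d] += gamma[i] * (x[d] - m) / (m * (1.0 - m))
    # S14: divide each cumulative score by the square root of its Fisher information.
    return [acc[i][d] / math.sqrt(f[i][d])
            for i in range(n_components) for d in range(n_bits)]


# One component, two bits, two descriptors from the same content item.
v = fisher_vector([[1, 0], [1, 1]], w=[1.0], mu=[[0.5, 0.5]], f=[[4.0, 4.0]])
```

However many binary vectors the content item has, the result is a single vector of fixed length N × D, which is what lets the search stage compare one vector per content item.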

(For the second embodiment of the model estimation unit 12)
The feature vector conversion unit 14
(S25) for each set of binary feature vectors, calculates the Fisher scores s_id (s_11 to s_ND) using the parameters w_i and μ_id and accumulates them for each index id to obtain the cumulative Fisher scores s'_id (s'_11 to s'_ND),
(S26) for each mixture component i, calculates the normalized vector v'_id by normalizing (projecting) the cumulative Fisher scores s'_id (s'_i1 to s'_iD) with the corresponding eigenvectors g_iK, and
(S27) calculates the Fisher vector elements v_id by dividing the normalized vector v'_id by the square root √g_id of the corresponding eigenvector g_id.
v_id = v'_id / √g_id
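Written per principal component k, the projection and scaling might look as follows. This is a hedged reading, not the patent's equation: the text as written divides by the square root of the eigenvector g, whereas conventional PCA whitening divides by the square root of the corresponding eigenvalue, which is what is assumed here (f_ik, the k-th largest eigenvalue for component i). The vector s'_i collects the cumulative scores s'_i1 to s'_iD.

```latex
v'_{ik} = g_{ik}^{\top} \, s'_i,
\qquad
v_{ik} = \frac{v'_{ik}}{\sqrt{f_{ik}}},
\qquad k = 1, \dots, K
```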

The feature vector conversion unit 14 outputs the Fisher vector converted from the reference content to the reference information storage unit 15, and outputs the Fisher vector converted from the query content to the feature vector search unit 16.

[Feature vector search unit 16]
As in the prior art of FIG. 1, the feature vector search unit 16 uses the reference information storage unit 15 to search for the reference content Fisher vector v_R most similar to the query content Fisher vector v_Q. The Euclidean distance can be used here: the shorter the distance between v_Q and v_R, the higher the similarity of that reference content to the query content. Specifically, it is also preferable to use approximate nearest neighbor search algorithms, such as a method based on product quantization (see, for example, Non-Patent Document 6) or LSH (Locality-Sensitive Hashing).
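A brute-force version of this search is easy to sketch. It ranks every stored reference vector by Euclidean distance to the query, so it is exact and exhaustive, unlike the product quantization or LSH methods mentioned above; the dictionary layout and the names `euclidean` and `search` are assumptions for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def search(query_vec, reference_vecs, top_k=1):
    """Return the top_k (name, distance) pairs, closest reference first."""
    scored = [(name, euclidean(query_vec, vec)) for name, vec in reference_vecs.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


references = {"a": [0.0, 0.0], "b": [3.0, 4.0]}
best = search([3.0, 3.0], references)
```

Exhaustive search costs one distance computation per reference item, which is why approximate indexes become attractive as the reference set grows.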

As described in detail above, the search device, program, and method of the present invention can search content even faster than with SIFT or SURF.

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments described above within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Search device
11 Feature vector extraction unit
12 Model estimation unit
13 Model parameter storage unit
14 Feature vector conversion unit
15 Reference information storage unit
16 Feature vector search unit

Claims

A search program that causes a computer mounted on a device to function so as to search a set of reference content for reference content similar to query content, using model parameters extracted from training content, the program causing the computer to function as:
feature vector extraction means for extracting a set of D-dimensional binary feature vectors x_1 to x_T for each of the training content, the reference content, and the query content;
model estimation means for calculating, from the set of binary feature vectors of the training content, the mixture ratio w_i of the i-th (1 ≦ i ≦ N) multivariate Bernoulli distribution, the d-th (1 ≦ d ≦ D) parameter μ_id of the i-th multivariate Bernoulli distribution, and the Fisher information f_id for the parameter μ_id;
model parameter storage means for storing the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id;
feature vector conversion means for calculating, from the set of binary feature vectors of the reference content or the query content, one Fisher vector corresponding to that reference content or query content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored in the model parameter storage means; and
feature vector search means for searching for the Fisher vector of the reference content most similar to the Fisher vector of the query content.

The search program, wherein the feature vector extraction means causes the computer to function so as to extract the set of binary feature vectors using ORB (Oriented FAST and Rotated BRIEF) or FREAK (Fast Retina Keypoint).

The search program according to claim 1 or 2, wherein the model estimation means causes the computer to function so as to, for the set of binary feature vectors x_1 to x_T of the training content,
in the E (Expectation) step, estimate the expected value γ_t(i) of the latent variable i for each binary feature vector x_t,
in the M (Maximization) step, update the mixture ratio w_i and the parameters μ_id using the expected value γ_t(i), and
repeat the E step and the M step until convergence, thereby calculating the parameter group λ of the mixture ratios w_i and the parameters μ_id:
λ (w_1, ..., w_N and μ_11, ..., μ_ND).

The search program according to any one of claims 1 to 3, wherein the model estimation means causes the computer to function so as to
calculate the Fisher score s_id, defined as the partial derivative of the log-likelihood function with respect to the parameter μ_id, and
calculate the Fisher information f_id as the variance of the Fisher score s_id.

The search program according to claim 4, wherein the feature vector conversion means causes the computer to function so as to
calculate, for each set of binary feature vectors, the Fisher scores s_id using the parameter μ_id and accumulate them for each index id to obtain the cumulative Fisher scores s'_id, and
calculate the Fisher vector elements v_id by dividing each cumulative Fisher score s'_id by the square root √f_id of the corresponding Fisher information f_id.

The search program according to any one of claims 1 to 3, wherein the model estimation means causes the computer to function so as to
calculate the Fisher score s_id, defined as the partial derivative of the log-likelihood function with respect to the parameter μ_id,
perform principal component analysis on the vector (s_i1 to s_iD) of Fisher scores s_id for each mixture component i,
output the K largest eigenvalues resulting from the principal component analysis as the Fisher information f_id, and
further output the K eigenvectors g_iK corresponding to those eigenvalues,
and wherein the model parameter storage means causes the computer to function so as to further store the eigenvectors g_iK.

The search program according to claim 6, wherein the feature vector conversion means causes the computer to function so as to
calculate, for each set of binary feature vectors, the Fisher scores s_id using the parameter μ_id and accumulate them for each index id to obtain the cumulative Fisher scores s'_id,
calculate, for each mixture component i, the normalized vector v'_ik by normalizing (projecting) the vector (s'_i1 to s'_iD) of cumulative Fisher scores s'_id with the corresponding eigenvectors g_ik (g_i1 to g_iK), and
calculate the Fisher vector elements v_ik by dividing the normalized vector v'_ik by the square root √g_ik of the corresponding eigenvector g_ik.

A search device for searching a set of reference content for reference content similar to query content, using model parameters extracted from training content, the device comprising:
feature vector extraction means for extracting a set of D-dimensional binary feature vectors x_1 to x_T for each of the training content, the reference content, and the query content;
model estimation means for calculating, from the set of binary feature vectors of the training content, the mixture ratio w_i of the i-th (1 ≦ i ≦ N) multivariate Bernoulli distribution, the d-th (1 ≦ d ≦ D) parameter μ_id of the i-th multivariate Bernoulli distribution, and the Fisher information f_id for the parameter μ_id;
model parameter storage means for storing the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id;
feature vector conversion means for calculating, from the set of binary feature vectors of the reference content or the query content, one Fisher vector corresponding to that reference content or query content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored in the model parameter storage means; and
feature vector search means for searching for the Fisher vector of the reference content most similar to the Fisher vector of the query content.

A search method, performed using a device, for searching a set of reference content for reference content similar to query content, using model parameters extracted from training content, the method comprising:
as a first step of storing model parameters,
extracting a set of D-dimensional binary feature vectors x_1 to x_T for each training content,
calculating, from the set of binary feature vectors of the training content, the mixture ratio w_i of the i-th (1 ≦ i ≦ N) multivariate Bernoulli distribution, the d-th (1 ≦ d ≦ D) parameter μ_id of the i-th multivariate Bernoulli distribution, and the Fisher information f_id for the parameter μ_id, and
storing the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id;
as a second step of storing reference information,
extracting a set of D-dimensional binary feature vectors for each reference content,
calculating one Fisher vector from the set of binary feature vectors of each reference content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored as model parameters, and
storing the Fisher vector; and
as a third step of searching for reference content from query content,
calculating one Fisher vector from the set of binary feature vectors of each query content, using the mixture ratio w_i, the parameter μ_id, and the Fisher information f_id stored as model parameters, and
searching for the Fisher vector of the reference content most similar to the Fisher vector of the query content.