JP2016162127A

JP2016162127A - Image search apparatus, method and program

Info

Publication number: JP2016162127A
Application number: JP2015039480A
Authority: JP
Inventors: 眞哉村田; Shinya Murata; 秀尚永野; Hidenao Nagano; 薫平松; Kaoru Hiramatsu; 隆仁川西; Takahito Kawanishi; 邦夫柏野; Kunio Kashino
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-02-27
Filing date: 2015-02-27
Publication date: 2016-09-05
Anticipated expiration: 2035-02-27
Also published as: JP6062981B2

Abstract

PROBLEM TO BE SOLVED: To provide a video search apparatus capable of retrieving images that show an instance with high accuracy. SOLUTION: A matching probability calculation unit 30 calculates a first video matching probability based on the result of matching against first feature vectors of feature points inside a region of interest; calculates, for each of the top K videos by first video matching probability, a second video matching probability based on the result of matching against second feature vectors of feature points outside the region of interest; and calculates a weighted linear sum of the first and second video matching probabilities as a ranking score. SELECTED DRAWING: Figure 1

Description

The present invention relates to a video search apparatus, method, and program.

Conventionally, an instance search system has been proposed that is based on a probabilistic information retrieval method (BM25) and uses a measure of the search discriminability (importance, weight) of feature points called exponential IDF (eIDF) (see, for example, Non-Patent Document 1).

Non-Patent Document 1: Masaya Murata et al., "NTT Communication Science Laboratories and National Institute of Informatics at TRECVID 2013 Instance Search Task," In Proc. of TRECVID 2013 Workshop, 2013.

An object of the present invention is to provide a video search apparatus, method, and program that, by an approach different from the prior art, can retrieve images showing an instance with higher accuracy.

To achieve the above object, the video search apparatus of the present invention includes: a query feature extraction unit that extracts, from an instance image containing an instance serving as a query, first feature vectors of feature points in a region of interest indicating the instance within the instance image, and second feature vectors of feature points outside the region of interest; and a matching probability calculation unit that, for each of a plurality of videos, calculates a first video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors extracted by the query feature extraction unit; for each of the top K videos by the calculated first video matching probability, calculates a second video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the second feature vectors extracted by the query feature extraction unit; and calculates, as a ranking score, a weighted linear sum of the calculated first video matching probability and the calculated second video matching probability.

In the video search apparatus of the present invention, the query feature extraction unit extracts, from an instance image containing an instance serving as a query, first feature vectors of feature points in a region of interest indicating the instance within the instance image, and second feature vectors of feature points outside the region of interest. The matching probability calculation unit then, for each of a plurality of videos, calculates a first video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors; for each of the top K videos by first video matching probability, calculates a second video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the second feature vectors; and calculates a weighted linear sum of the first and second video matching probabilities as a ranking score.

In this way, videos showing the instance can be retrieved with higher accuracy by calculating the first video matching probability from matches against the first feature vectors of feature points in the region of interest, calculating the second video matching probability for each of the top K videos from matches against the second feature vectors of feature points outside the region of interest, and using the weighted linear sum of the two probabilities as the ranking score.

The video search apparatus of the present invention may further include a video information calculation unit. After duplicate feature vectors have been removed from the first and second feature vectors extracted by the query feature extraction unit, this unit, for each of the plurality of videos, counts the appearance frequency of the first feature vectors based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors, counts the appearance frequency of the second feature vectors based on the result of matching against the second feature vectors, and calculates a video length based on the counted appearance frequencies of the first and second feature vectors. It also calculates the weight of each first feature vector based on the number of videos containing that first feature vector, and the weight of each second feature vector based on the number of videos containing that second feature vector. The matching probability calculation unit may then, for each of the plurality of videos, calculate the first video matching probability of the video based on the appearance frequency of the first feature vectors, the video length, and the weights of the first feature vectors calculated by the video information calculation unit, and calculate the second video matching probability of the video based on the appearance frequency of the second feature vectors, the video length, and the weights of the second feature vectors.

The video information calculation unit of the present invention may calculate a weight for each first feature vector that increases as the number of videos containing that first feature vector decreases, and likewise a weight for each second feature vector that increases as the number of videos containing that second feature vector decreases.

The matching probability calculation unit may calculate the ranking score as the weighted linear sum of the first and second video matching probabilities with a larger weight on the first video matching probability than on the second.

The video search method of the present invention is a video search method in a video search apparatus including a query feature extraction unit and a matching probability calculation unit. In this method, the query feature extraction unit extracts, from an instance image containing an instance serving as a query, first feature vectors of feature points in a region of interest indicating the instance within the instance image, and second feature vectors of feature points outside the region of interest. The matching probability calculation unit, for each of a plurality of videos, calculates a first video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors extracted by the query feature extraction unit; for each of the top K videos by the calculated first video matching probability, calculates a second video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the second feature vectors extracted by the query feature extraction unit; and calculates a weighted linear sum of the calculated first and second video matching probabilities as a ranking score.

The program of the present invention causes a computer to function as each unit of the video search apparatus described above.

As described above, the video search apparatus, method, and program of the present invention calculate a first video matching probability from matches against the first feature vectors of feature points in the region of interest, calculate a second video matching probability for each of the top K videos by first video matching probability from matches against the second feature vectors of feature points outside the region of interest, and use the weighted linear sum of the two probabilities as the ranking score. This makes it possible to retrieve videos showing an instance with higher accuracy.

FIG. 1 is a schematic diagram showing the configuration of a video search apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the data structure of a query.
FIG. 3 is a diagram showing an example of the data structure of the query feature DB.
FIG. 4 is a diagram showing an example of the data structure of a feature matching result.
FIG. 5 is a diagram showing an example of the data structure of the out-of-ROI feature appearance frequency and the in-ROI feature appearance frequency.
FIG. 6 is a diagram showing an example of the data structure of the video length.
FIG. 7 is a diagram showing an example of the data structure of the feature point weights.
FIG. 8 is a diagram showing an example of the data structure of the intermediate search result ranking.
FIG. 9 is a diagram showing an example of the data structure of the search result ranking.
FIG. 10 is a flowchart showing the video search processing routine in the embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Outline of the Present Embodiment>
The present embodiment relates to a video search apparatus that searches a large-scale video database for videos showing a specific person, object, or place, called an instance, ranks them in order of relevance, and presents them to the user. The input to the video search apparatus is a pair consisting of an image showing the instance (instance image) and a region-of-interest image (ROI image) indicating the region of the instance within that image. Using only this image information, the system outputs as the search result a ranking of the large number of videos stored in the database. The present embodiment achieves a highly accurate instance search system by performing a two-stage search process.

Specifically, the present embodiment proposes a method that first runs BM25 using only the image features acquired from the region of interest in the query instance image, and then re-ranks the top K videos of that search result ranking while also taking into account a BM25 score computed using only the image features acquired from outside the region of interest.

In addition, the eIDF described in Non-Patent Document 1 has a parameter that is difficult to set. The present embodiment therefore also proposes square root IDF (srIDF), which has no such parameter.

<Configuration of the Video Search Apparatus According to the Embodiment of the Present Invention>
The video search apparatus 10 according to the present embodiment is implemented by a computer including a CPU, a RAM, and a ROM storing a program for executing the video search processing routine described later. Functionally, as shown in FIG. 1, the computer includes an input unit 12, a calculation unit 14, and an output unit 16.

The input unit 12 accepts at least one query, each query being a pair consisting of an image showing an instance (instance image) and a region-of-interest image (ROI image) that shows the region of the instance within that image in black and white. An example of a query is shown in FIG. 2: the left image is the instance image and the right image is the ROI image, and the query represents a large vase holding artificial flowers. A plurality of pairs of instance images and ROI images may also be input as queries.

Using the query as a clue, the calculation unit 14 searches a large number of videos for those in which this instance (the large vase holding artificial flowers) actually appears, and outputs them, ranked in order of relevance, as the search result via the output unit 16.

The calculation unit 14 includes a query feature extraction unit 20, a query feature database (DB) 22, a video feature database (DB) 24, a feature matching unit 26, a video information calculation unit 28, a matching probability calculation unit 30, and a search ranking unit 32. The calculation unit 14 also has storage areas for an out-of-ROI feature set 40, an in-ROI feature set 42, a feature matching result 44, an out-of-ROI feature appearance frequency 46, an in-ROI feature appearance frequency 48, a feature point weight 50, and a video length 52.

The query feature extraction unit 20 detects feature points in the instance image of the query received by the input unit 12 and extracts a feature vector describing the local features of each detected feature point. For example, the query feature extraction unit 20 detects, as feature points, locations in the instance image where the luminance changes sharply, using the Harris-Laplace method (see C. Harris et al., "A combined corner and edge detector," 4th Alvey Vision Conf., 1988). It then describes the features of each detected point using SIFT or Compact Color SIFT (see K. Mikolajczyk et al., "Scale and affine invariant interest point detectors," IJCV, 2004). SIFT is a 128-dimensional vector; Compact Color SIFT is a 192-dimensional vector formed by appending a 64-dimensional chromaticity vector to the 128-dimensional luminance SIFT descriptor.

Based on the ROI image of the query received by the input unit, the query feature extraction unit 20 divides the feature vectors into first feature vectors inside the region of interest and second feature vectors outside it, and stores them in the in-ROI feature set 42 and the out-of-ROI feature set 40, respectively. It then combines the first and second feature vectors, removes duplicate feature vectors, and stores the resulting set in the query feature DB 22 (see FIG. 3). When a plurality of queries are input, the first and second feature vectors extracted for all the queries are combined, duplicates are removed, and the resulting set is stored in the query feature DB 22.
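A minimal sketch of the ROI split and duplicate removal described above. It assumes (none of this is stated in the source) that feature points arrive as integer (x, y) pixel coordinates, that nonzero (white) pixels of the ROI image mark the region of interest, and that "duplicate" means an exactly identical descriptor. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def split_by_roi(
    keypoints: Sequence[Tuple[int, int]],
    descriptors: Sequence[Vector],
    roi_mask: Sequence[Sequence[int]],
) -> Tuple[List[Vector], List[Vector]]:
    """Split descriptors into first (in-ROI) and second (out-of-ROI) vectors.

    Assumption: roi_mask[y][x] is nonzero inside the region of interest.
    """
    first: List[Vector] = []
    second: List[Vector] = []
    for (x, y), desc in zip(keypoints, descriptors):
        if roi_mask[y][x]:
            first.append(desc)
        else:
            second.append(desc)
    return first, second


def build_query_feature_db(first: Sequence[Vector], second: Sequence[Vector]) -> List[Vector]:
    """Union of first and second vectors with exact duplicates removed (order kept)."""
    seen = set()
    merged: List[Vector] = []
    for desc in list(first) + list(second):
        if desc not in seen:
            seen.add(desc)
            merged.append(desc)
    return merged
```

The two per-region lists are kept as they are (they correspond to feature sets 42 and 40), while only the combined query DB is deduplicated.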

The video feature database 24 stores the feature vectors of the feature points extracted from each of the many videos, each roughly 5 to 20 seconds long, stored in a video database (not shown). To extract features from a video, the video is divided into frame images (for example, 1 frame per second), feature vectors of the feature points in each frame image are extracted in the same manner as by the query feature extraction unit 20, and the feature vectors extracted from all frame images of the video are pooled to form the feature vectors of that video.
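The per-video pooling can be sketched as follows. Only the 1 frame/second sampling and the pooling of all frame descriptors come from the text; the frame container, the `extract` callback (a stand-in for the Harris-Laplace plus SIFT extractor), and the function names are hypothetical.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sample_frame_indices(num_frames: int, fps: float, rate: float = 1.0) -> List[int]:
    """Indices of frames sampled at `rate` frames per second of video."""
    step = fps / rate  # number of source frames between consecutive samples
    count = math.ceil(num_frames / step)
    return [int(i * step) for i in range(count)]


def extract_video_features(
    frames: Sequence[Any],
    fps: float,
    extract: Callable[[Any], List[Vector]],
    rate: float = 1.0,
) -> List[Vector]:
    """Pool the descriptors of all sampled frames into one list for the video."""
    pooled: List[Vector] = []
    for idx in sample_frame_indices(len(frames), fps, rate):
        pooled.extend(extract(frames[idx]))
    return pooled
```

A 3-second clip at 30 fps yields samples at frames 0, 30, and 60.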

The feature matching unit 26 matches the feature vectors of each video stored in the video feature database 24 against the first and second feature vectors stored in the query feature database 22. For each video, it searches, for each feature vector of the video, for the first or second feature vector whose value is closest, and accepts that vector as the match subject to the condition that the cosine similarity between the two vectors is at least a predetermined value (for example, 0.9). With the Compact Color SIFT descriptor described above, for example, the cosine similarity between 192-dimensional feature vectors (which ranges from 0 to 1 and equals 1 for identical vectors) is used, and a video feature vector is judged to match a first or second feature vector when their cosine similarity is at least a predetermined value (for example, 0.95). The matching result is stored in the feature matching result 44 (see FIG. 4).
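A brute-force sketch of the matching rule: each video vector is assigned its most similar query vector, and the match is kept only if the cosine similarity reaches the threshold. The exhaustive search is used purely for clarity; the source does not say how the nearest neighbour is found at scale. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def match_video_features(
    video_vecs: Sequence[Vector],
    query_vecs: Sequence[Vector],
    threshold: float = 0.9,
) -> List[Optional[int]]:
    """For each video vector, the index of its nearest query vector, or None.

    None means the best cosine similarity fell below `threshold`.
    """
    result: List[Optional[int]] = []
    for v in video_vecs:
        best_idx, best_sim = None, -1.0
        for j, q in enumerate(query_vecs):
            s = cosine(v, q)
            if s > best_sim:
                best_idx, best_sim = j, s
        result.append(best_idx if best_idx is not None and best_sim >= threshold else None)
    return result
```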

Based on the matching result from the feature matching unit 26, the video information calculation unit 28 counts, for each of the plurality of videos, the appearance frequency of each first feature vector and of each second feature vector, and stores them in the in-ROI feature appearance frequency 48 and the out-of-ROI feature appearance frequency 46, respectively (see FIG. 5). It also calculates the sum of the counted appearance frequencies of the first and second feature vectors as the video length of that video and stores it in the video length 52 (see FIG. 6).
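The counting step can be sketched as below. It assumes that matches are indices into the deduplicated query feature DB (None for unmatched vectors) and that the indices of first and second feature vectors are supplied as sets; both conventions and the function name are hypothetical. Note that the "video length" here is a count of matched feature occurrences, not a duration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def count_frequencies(
    matches: Iterable[Optional[int]],
    first_ids: Set[int],
    second_ids: Set[int],
) -> Tuple[Dict[int, int], Dict[int, int], int]:
    """Per-video appearance frequencies and the video length.

    Returns (in-ROI frequencies, out-of-ROI frequencies, video length), where
    the video length is the sum of both frequency tables.
    """
    counts = Counter(j for j in matches if j is not None)
    tf_first = {j: counts[j] for j in first_ids if counts[j] > 0}
    tf_second = {j: counts[j] for j in second_ids if counts[j] > 0}
    video_length = sum(tf_first.values()) + sum(tf_second.values())
    return tf_first, tf_second, video_length
```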

In addition, based on the appearance frequencies of the first and second feature vectors counted for each of the plurality of videos, the video information calculation unit 28 calculates, according to Equation (1) below, a weight for each first feature vector based on the number of videos containing that first feature vector and a weight for each second feature vector based on the number of videos containing that second feature vector, and stores these in the feature point weight 50 (see FIG. 7).

Here, N is the total number of videos, and n_i is the number of videos containing feature vector i. Because of the form of the equation, this weight is called square root IDF (srIDF); its value decreases rapidly as n_i increases. In other words, it quantifies how commonly each feature vector appears: feature vectors that appear in many videos are considered of little use for search (discriminating between videos), so they receive small weights. Unlike the eIDF of Non-Patent Document 1, srIDF has no parameters to set, which makes it easy to implement.
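Equation (1) is not reproduced in this text, so the exact srIDF formula used below is an assumption: srIDF_i = sqrt(N / n_i), chosen only because it fits the name and the described behaviour (no tunable parameter, smaller for more common features). Check it against the original formula before relying on it. The function names are hypothetical.

```python
import math
from typing import Dict, Iterable


def document_frequencies(per_video_tf: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """n_i: the number of videos in which feature vector i appears at least once."""
    df: Dict[int, int] = {}
    for tf in per_video_tf:
        for j, count in tf.items():
            if count > 0:
                df[j] = df.get(j, 0) + 1
    return df


def sr_idf(n_videos: int, df: Dict[int, int]) -> Dict[int, float]:
    """ASSUMED srIDF form: sqrt(N / n_i). Parameter-free and decreasing in n_i."""
    return {j: math.sqrt(n_videos / n) for j, n in df.items() if n > 0}
```

With four videos, a feature found in two of them gets weight sqrt(2) and a feature found in one gets 2.0, so the rarer feature is weighted more heavily, as the text requires.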

For each of the plurality of videos, the matching probability calculation unit 30 calculates the first video matching probability BM25 of the video according to Equation (2) below, based on the appearance frequency of each first feature vector, the video length, and the weight of each first feature vector calculated for the video by the video information calculation unit 28.

Here, q_j^ROI is the j-th first feature vector, and q_j^ROI > 0 means that this j-th first feature vector occurs at least once in video v; the right-hand side of Equation (2) is the sum over these vectors. v_j is the actual number of occurrences, stored in the in-ROI feature appearance frequency 48. k₁ and b₁ are parameters; k₁ = 2 and b₁ = 0.75 are commonly used. vl and avvl are the video length of video v and the average video shot length, respectively, stored in the video length 52; the average video shot length is the mean of the video lengths stored there. The eIDF_j of Non-Patent Document 1 may be used in place of srIDF_j.
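Equation (2) is likewise not reproduced here. The sketch assumes it follows the standard Okapi BM25 term weighting with srIDF_j in place of the usual IDF, using the symbols defined above (v_j, vl, avvl, k₁, b₁); the (k₁ + 1) factor in the numerator is part of that assumed standard form, and the function name is hypothetical.

```python
from typing import Dict


def bm25(
    tf: Dict[int, int],
    weights: Dict[int, float],
    vl: float,
    avvl: float,
    k1: float = 2.0,
    b1: float = 0.75,
) -> float:
    """Assumed Okapi-style BM25 score of one video.

    tf      -- v_j: occurrences of query feature j in the video
    weights -- srIDF_j (or eIDF_j) per query feature
    vl      -- video length; avvl -- average video length
    """
    norm = k1 * (1.0 - b1 + b1 * vl / avvl)  # length normalisation term
    score = 0.0
    for j, v_j in tf.items():
        if v_j > 0:  # sum only over features present in the video
            score += weights.get(j, 0.0) * v_j * (k1 + 1.0) / (v_j + norm)
    return score
```

For a single feature with weight 1, v_j = 2, and vl = avvl, the normalisation term equals k₁ = 2 and the score is 2 × 3 / 4 = 1.5; doubling vl lowers the score, which is the intended length penalty.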

The matching probability calculation unit 30 ranks the videos in descending order of the first video matching probability BM25 calculated above (the intermediate search result ranking; see FIG. 8). Then, for each of the top K videos in the intermediate search result ranking, it calculates the second video matching probability BM25 of the video according to Equation (3) below, based on the appearance frequency of the second feature vectors, the video length, and the weights of the second feature vectors calculated for the video by the video information calculation unit 28.

Here, q_j' is the j-th second feature vector, and q_j' > 0 means that this j-th second feature vector occurs at least once in video v; the right-hand side of Equation (3) is the sum over these vectors. v_j' is the actual number of occurrences, stored in the out-of-ROI feature appearance frequency 46. k₁, b₁, vl, and avvl are the same as in Equation (2).
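A sketch of restricting the second stage to the top K candidates, so that the out-of-ROI score is evaluated only for those videos. The `score_fn` callback stands in for an Equation (3) scorer; it, the tie-breaking by video ID, and the function names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def top_k(first_scores: Dict[str, float], k: int) -> List[str]:
    """Video IDs with the K highest first-stage scores (ties broken by ID)."""
    ranked = sorted(first_scores, key=lambda vid: (-first_scores[vid], vid))
    return ranked[:k]


def second_stage_scores(
    candidates: Iterable[str],
    score_fn: Callable[[str], float],
) -> Dict[str, float]:
    """Evaluate the out-of-ROI score only for the candidate videos."""
    return {vid: score_fn(vid) for vid in candidates}
```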

For each of the top K videos in the intermediate search result ranking, the matching probability calculation unit 30 further calculates, according to Equation (4) below, a weighted linear sum of the first and second video matching probabilities of that video as the ranking score.

Here, score(v, q) is the final ranking score and α is a weight. The score of Equation (2) is normally weighted more heavily; for example, 1/10 of the score of Equation (3) is added to the score of Equation (2) (α = 1/10). This two-stage re-ranking reduces (controls) the contribution of background information (the second feature vectors) to the search result ranking, so that videos containing much instance information (the first feature vectors) are placed higher in the ranking. K = 20 to 30 is the typical setting. The weight of Equation (3) can also be set to 0, which corresponds to performing no re-ranking with the second feature vectors. This setting is used when the instance shown in the query instance image is clearly unrelated to its background: in that case re-ranking cannot be expected to improve search accuracy and may even degrade it.
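A sketch of the final re-ranking, assuming Equation (4) has the form score(v, q) = (first-stage score) + α × (second-stage score), consistent with the α = 1/10 example above. How videos outside the top K are placed is not stated in this text; here they are assumed to keep their first-stage order below the re-ranked head. Because α ≥ 0 and BM25 scores are non-negative, every head score stays at or above every tail score, so the combined list remains in descending order. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def rerank(
    first_scores: Dict[str, float],
    second_scores: Dict[str, float],
    k: int = 20,
    alpha: float = 0.1,
) -> List[Tuple[str, float]]:
    """Two-stage ranking: re-score the top K with the out-of-ROI score.

    alpha = 0 reproduces the first-stage ranking (no re-ranking).
    """
    ranked = sorted(first_scores, key=lambda vid: (-first_scores[vid], vid))
    head, tail = ranked[:k], ranked[k:]
    final = {vid: first_scores[vid] + alpha * second_scores.get(vid, 0.0) for vid in head}
    head_sorted = sorted(head, key=lambda vid: (-final[vid], vid))
    return [(vid, final[vid]) for vid in head_sorted] + [
        (vid, first_scores[vid]) for vid in tail
    ]
```

In the example below, video "b" overtakes "a" only because of its background match, which is the behaviour a small α is meant to limit.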

The search ranking unit 32 creates a search result ranked in descending order of the ranking score calculated by Equation (4) and outputs it via the output unit 16. FIG. 9 shows an example of the data structure of the search result, in which the videos, arranged in descending order of ranking score, are associated with their ranking scores.

The search result is not limited to the ranking format described above. Only the video with the highest ranking score may be returned as the search result, or the search result may consist of the videos whose ranking score is at least a predetermined value, arranged in random order.
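The two alternative output formats can be sketched as below. The function names, the optional seeded random generator (added only to make results reproducible), and the handling of ties in `best_only` (the first maximum in dictionary order wins) are illustrative assumptions.

```python
import random
from typing import Dict, List, Optional


def best_only(scores: Dict[str, float]) -> List[str]:
    """Return only the video with the highest ranking score."""
    return [max(scores, key=scores.get)] if scores else []


def above_threshold_shuffled(
    scores: Dict[str, float],
    threshold: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Videos whose score is at least `threshold`, in random order."""
    rng = rng or random.Random()
    chosen = [vid for vid, s in scores.items() if s >= threshold]
    rng.shuffle(chosen)
    return chosen
```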

<Operation of the Video Search Apparatus According to the Embodiment of the Present Invention>
Next, the operation of the video search apparatus 10 according to the embodiment of the present invention is described. When at least one query is input to the video search apparatus 10, the video search apparatus 10 executes the video search processing routine shown in FIG. 10.

In step S100, the query feature extraction unit 20 detects feature points in the instance image of the input query, extracts first feature vectors describing the features of the feature points detected inside the region of interest and second feature vectors describing those detected outside it, removes duplicate feature vectors from the combined first and second feature vectors, and stores the result in the query feature database 22.

Next, in step S102, the feature matching unit 26 matches, for each of the plurality of videos, the feature vectors of that video stored in the video feature database 24 against the first and second feature vectors stored in the query feature database 22.
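The matching method itself is not specified in this passage. As one hypothetical possibility, step S102 could assign each descriptor of a video to its nearest query descriptor by Euclidean distance and keep the assignment only when that distance is within an assumed threshold `max_dist`:

```python
import math
from collections.abc import Sequence

Vector = Sequence[float]


def match_video(video_descs: Sequence[Vector], query_descs: Sequence[Vector],
                max_dist: float) -> list[int]:
    """For each video descriptor, return the index of its nearest query descriptor,
    skipping descriptors whose nearest match is farther than max_dist."""
    if not query_descs:
        return []
    matched = []
    for desc in video_descs:
        # nearest query descriptor by Euclidean distance
        best_idx, best_dist = min(
            ((i, math.dist(desc, q)) for i, q in enumerate(query_descs)),
            key=lambda pair: pair[1],
        )
        if best_dist <= max_dist:
            matched.append(best_idx)
    return matched
```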

Next, in step S104, the video information calculation unit 28 tallies, for each of the plurality of videos, the appearance frequency of the first feature vectors and the appearance frequency of the second feature vectors in that video, based on the matching result of step S102.
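Step S104 amounts to counting, per video, how often each query feature was matched. The sketch below assumes the matching result is a list of matched query feature IDs (a hypothetical representation) and splits the counts according to whether an ID belongs to the first (inside) or second (outside) set.

```python
from collections import Counter
from collections.abc import Iterable


def count_frequencies(matched_ids: Iterable[int], first_ids: set[int],
                      second_ids: set[int]) -> tuple[Counter, Counter]:
    """Tally appearance frequencies of first and second query feature vectors in one video."""
    counts = Counter(matched_ids)
    tf_first = Counter({fid: n for fid, n in counts.items() if fid in first_ids})
    tf_second = Counter({fid: n for fid, n in counts.items() if fid in second_ids})
    return tf_first, tf_second
```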

In step S106, the video information calculation unit 28 calculates, according to Equation (1) above and based on the tally of step S104, the weight of each first feature vector and the weight of each second feature vector.
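Equation (1) is defined earlier in the specification and is not reproduced here; the cited prior work uses an exponential IDF (eIDF). Purely as a stand-in with the property required by claim 3 (the weight grows as fewer videos contain the feature), the sketch below uses the smoothed BM25 IDF, which is a different formula from eIDF. The per-video frequency `dict` is an assumed representation.

```python
import math
from collections.abc import Iterable, Mapping


def document_frequencies(video_tfs: Iterable[Mapping[int, int]]) -> dict[int, int]:
    """Number of videos in which each feature ID appears at least once."""
    df: dict[int, int] = {}
    for tf in video_tfs:
        for fid, n in tf.items():
            if n > 0:
                df[fid] = df.get(fid, 0) + 1
    return df


def feature_weight(df: int, num_videos: int) -> float:
    """Smoothed BM25 IDF (stand-in for Equation (1)): larger when df is smaller."""
    return math.log(1.0 + (num_videos - df + 0.5) / (df + 0.5))
```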

In step S108, the video information calculation unit 28 calculates, for each of the plurality of videos, the video length of that video based on the appearance frequencies of its first and second feature vectors tallied in step S104.
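The formula for the video length is not given in this passage. A natural assumption, analogous to document length in BM25, is the total number of matched feature occurrences (first plus second) in the video; BM25-style scoring typically also needs the average length over all videos. Both are sketched below under that assumption.

```python
from collections.abc import Mapping, Sequence


def video_length(tf_first: Mapping[int, int], tf_second: Mapping[int, int]) -> int:
    """Assumed video length: total appearance frequency of first and second vectors."""
    return sum(tf_first.values()) + sum(tf_second.values())


def average_length(lengths: Sequence[int]) -> float:
    """Mean video length over the collection (0.0 for an empty collection)."""
    return sum(lengths) / len(lengths) if lengths else 0.0
```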

Next, in step S110, the matching probability calculation unit 30 calculates, for each of the plurality of videos, the first and second video matching probabilities of that video according to Equations (2) and (3) above. These are based on the appearance frequencies of the video's first and second feature vectors tallied in step S104, the weights of the first and second feature vectors calculated in step S106, and the video length calculated in step S108. It then calculates the weighted sum of the first and second video matching probabilities as the ranking score according to Equation (4) above.
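Equations (2) to (4) are defined earlier and are not reproduced here. Assuming Equations (2) and (3) take the standard BM25 form, with `k1` and `b` as the usual BM25 parameters (the default values below are common choices, not taken from the patent), step S110 can be sketched as follows. The weight `lam` in the ranking score is hypothetical; following claim 4, it should exceed 0.5 so that the first probability carries more weight.

```python
from collections.abc import Mapping


def bm25_score(tf: Mapping[int, int], weights: Mapping[int, float],
               length: float, avg_length: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """BM25-style matching score of one video for one set of query feature vectors.

    Assumes avg_length > 0.
    """
    # length normalization: longer videos have their frequencies damped more
    norm = k1 * (1.0 - b + b * length / avg_length)
    return sum(weights.get(fid, 0.0) * f * (k1 + 1.0) / (f + norm)
               for fid, f in tf.items() if f > 0)


def ranking_score(p_first: float, p_second: float, lam: float = 0.7) -> float:
    """Weighted linear sum of first and second matching probabilities (lam > 0.5)."""
    return lam * p_first + (1.0 - lam) * p_second
```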

In step S112, the search ranking unit 32 creates a search result in which the videos are ranked in descending order of the ranking score calculated in step S110, outputs it via the output unit 16, and ends the video search processing routine.

As described above, the video search device according to the present embodiment calculates a first video matching probability based on the result of matching against the first feature vectors of the feature points in the attention area of the instance image. For each of the top K videos by the first video matching probability, it calculates a second video matching probability based on the result of matching against the second feature vectors of the feature points outside the attention area. It then calculates a weighted linear sum of the first and second video matching probabilities as the ranking score. This makes it possible to retrieve videos showing the instance with higher accuracy.
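The two-stage procedure summarized above (score all videos by the first probability, then re-rank only the top K by the weighted sum) can be sketched as follows. The callable `second_prob` stands in for whatever computes the second video matching probability, and `lam` is a hypothetical weight. The sketch returns only the re-ranked top K; how the remaining videos are presented is not specified here.

```python
from collections.abc import Callable, Mapping


def rerank_top_k(p_first: Mapping[str, float], second_prob: Callable[[str], float],
                 k: int, lam: float = 0.7) -> list[tuple[str, float]]:
    """Re-rank the top-k videos by lam * p_first + (1 - lam) * p_second."""
    # stage 1: shortlist by the first video matching probability alone
    top = sorted(p_first, key=lambda v: p_first[v], reverse=True)[:k]
    # stage 2: compute the second probability only for the shortlisted videos
    scored = [(v, lam * p_first[v] + (1.0 - lam) * second_prob(v)) for v in top]
    return sorted(scored, key=lambda item: item[1], reverse=True)


p1 = {"a": 3.0, "b": 2.5, "c": 1.0, "d": 0.5}
p2 = {"a": 0.0, "b": 5.0, "c": 9.0, "d": 9.0}
print([v for v, _ in rerank_top_k(p1, p2.__getitem__, k=3)])  # ['c', 'b', 'a']
```

Video `d` is never re-scored despite its high second probability, because it falls outside the top K by the first probability.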

In addition, this embodiment is a general framework for a specific-object search system based on an image query, and it can accommodate feature point detectors, feature description methods, and ranking methods other than those used here. The proposed re-ranking method places at the top of the search results those videos that have a high probability of containing the instance itself and also a high probability of matching the query's entire instance image, so a highly accurate instance search system can be expected.

Once the technology of this embodiment reaches a practical level, its range of applications is very wide. If it is installed in a tablet or wearable device, the user can access various information on the WWW linked to the database videos simply by photographing a real-world object.

The present invention is not limited to the embodiment described above; various modifications and applications are possible without departing from the gist of the invention.

For example, although this specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
10 Video search device
12 Input unit
14 Computation unit
16 Output unit
20 Query feature extraction unit
22 Query feature database
24 Video feature database
26 Feature matching unit
28 Video information calculation unit
30 Matching probability calculation unit
32 Search ranking unit
40 Out-of-attention-area feature set
42 In-attention-area feature set
44 Feature matching result
46 Out-of-attention-area feature appearance frequency
48 In-attention-area feature appearance frequency
52 Video length

Claims

A video search device comprising:
a query feature extraction unit that extracts, from an instance image containing an instance serving as a query, a first feature vector of each feature point in an attention area indicating the instance in the instance image, and a second feature vector of each feature point outside the attention area; and
a matching probability calculation unit that, for each of a plurality of videos, calculates a first video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors extracted by the query feature extraction unit; that, for each of the top K videos by the calculated first video matching probability, calculates a second video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the second feature vectors extracted by the query feature extraction unit; and that calculates a weighted linear sum of the calculated first video matching probability and the calculated second video matching probability as a ranking score.

The video search device according to claim 1, further comprising a video information calculation unit that, for each of the plurality of videos, tallies the appearance frequency of the first feature vectors based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors, duplicated feature vectors having been excluded from the first and second feature vectors extracted by the query feature extraction unit; tallies the appearance frequency of the second feature vectors based on the result of matching the feature vectors of feature points extracted from the video against the second feature vectors; calculates a video length based on the tallied appearance frequencies of the first and second feature vectors; calculates the weight of each first feature vector based on the number of videos having that first feature vector; and calculates the weight of each second feature vector based on the number of videos having that second feature vector,
wherein the matching probability calculation unit, for each of the plurality of videos, calculates the first video matching probability of the video based on the appearance frequency of the first feature vectors, the video length, and the weights of the first feature vectors calculated for the video by the video information calculation unit, and
calculates the second video matching probability of the video based on the appearance frequency of the second feature vectors, the video length, and the weights of the second feature vectors calculated for the video by the video information calculation unit.

The video search device according to claim 2, wherein the video information calculation unit calculates the weight of the first feature vector such that it increases as the number of videos having the first feature vector decreases, and calculates the weight of the second feature vector such that it increases as the number of videos having the second feature vector decreases.

The video search device according to claim 1, wherein the matching probability calculation unit sets the weight for the first video matching probability to be greater than the weight for the second video matching probability, and calculates the weighted linear sum of the calculated first video matching probability and the calculated second video matching probability as the ranking score.

A video search method in a video search device including a query feature extraction unit and a matching probability calculation unit, the method comprising:
the query feature extraction unit extracting, from an instance image containing an instance serving as a query, a first feature vector of each feature point in an attention area indicating the instance in the instance image, and a second feature vector of each feature point outside the attention area;
the matching probability calculation unit calculating, for each of a plurality of videos, a first video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the first feature vectors extracted by the query feature extraction unit; and
the matching probability calculation unit calculating, for each of the top K videos by the calculated first video matching probability, a second video matching probability of the video based on the result of matching the feature vectors of feature points extracted from the video against the second feature vectors extracted by the query feature extraction unit, and calculating a weighted linear sum of the calculated first video matching probability and the calculated second video matching probability as a ranking score.

A program for causing a computer to function as each unit of the video search device according to any one of claims 1 to 4.