JP5121917B2

JP5121917B2 - Image search apparatus, image search method and program

Info

Publication number: JP5121917B2
Application number: JP2010284156A
Authority: JP
Inventors: ゾランステイチ (Zoran Stejić)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2010-12-21
Filing date: 2010-12-21
Publication date: 2013-01-16
Anticipated expiration: 2030-12-21
Also published as: JP2012133516A

Description

The present invention relates to an image search apparatus, an image search method, and a program.

A known technique takes an image as a search key and searches for images similar to that key image (hereinafter the “query image”) by comparing image feature amounts, i.e., numerical representations of image features such as color scheme, texture, and shape. When a user inputs a query image, feature amounts are extracted from it, and similar images are found by calculating the similarity between those feature amounts and the feature amounts of each search target image (see, for example, Patent Document 1).

As a similar image search technique that focuses on partial regions within an image, a technique called visual keywords is also known. It was devised on the view that a single image consists of multiple partial images, and it generates a feature vector through the following processing.

That is, multiple partial images are extracted from an image, and each partial image is mapped (classified), based on its feature amount, to one of the clusters formed in advance by clustering images (hereinafter “reference clusters” where appropriate). A feature vector is then generated from the number of partial images belonging to each reference cluster. In this way, visual keywords enable accurate image search using feature amounts that capture the image as fine regions, rather than feature amounts extracted from the image as a whole.

Images similar to the query image are output as search results based on the similarity calculated as described above. When results are ranked by similarity and displayed, it is common to select the images whose similarity is equal to or greater than a predetermined threshold and output them as search results (see, for example, Patent Document 2).
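For contrast, the conventional fixed-threshold selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of Patent Document 2; the function name, the dictionary of scores, and the example values are hypothetical.

```python
from typing import Dict, List


def select_by_fixed_threshold(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Return image IDs whose similarity is >= threshold, most similar first.

    `similarities` maps a (hypothetical) image ID to its similarity to the query.
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [image_id for image_id, score in ranked if score >= threshold]


scores = {"img_a": 0.92, "img_b": 0.55, "img_c": 0.71}
print(select_by_fixed_threshold(scores, 0.7))  # ['img_a', 'img_c']
```

Because `threshold` is fixed in advance and identical for every query, the cut-off cannot adapt to the content of the particular query image.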

Patent Document 1: JP 2001-52175 A
Patent Document 2: JP 2005-148801 A

When performing an image search, the query image input by the user contains an object (target) that the user wants to find, and it is desirable that images containing an object similar to it be output as search results.

However, the feature amounts and feature vectors extracted from an image are mechanical quantifications of its color scheme, texture, shape, and so on, so they do not take into account the objects contained in the query image. Consequently, an image could be selected even though it contains no object similar to the one in the query image, as long as its similarity is at or above the preset threshold. Conversely, setting the threshold higher made it impossible to select images that do contain a similar object. Thus, fixing the threshold in advance lowers the accuracy of the selected search results.

The present invention has been made in view of the above problems, and its object is to improve the accuracy with which images are selected for output as the result of a similar image search.

To achieve the above object, a first invention is an image search apparatus that searches a plurality of search target images for images similar to a query image and outputs them, comprising: feature vector generation means that classifies each of a plurality of partial images extracted from the query image and from each search target image into one of a plurality of clusters, thereby generating for each image a feature vector based on the number of its partial images classified into each cluster; ranking means that calculates the similarity between the feature vector of the query image and the feature vectors of the plurality of search target images and ranks the search target images based on that similarity; common classification degree calculation means that, taking the clusters into which the partial images of the query image were classified as reference clusters, calculates the ratio of the number of reference clusters into which partial images of a search target image were also classified to the total number of reference clusters; and selection means that selects, starting from the top of the ranked search target images, the search target images whose ratio is equal to or greater than a predetermined value as output targets.

According to the first invention, the ratio of the number of reference clusters (the clusters into which the partial images of the query image were classified) into which partial images of a search target image were also classified is calculated, and search target images whose ratio is equal to or greater than a predetermined value are selected as output targets from the top of the ranking. That is, the selection condition for output is a ratio of shared reference clusters computed relative to the query image, so the condition changes dynamically with the query image. As a result, the search target images selected are those that are highly similar to the query image and also share reference clusters with it. This improves the accuracy of selecting images to output as the result of a similar image search.

A clustering apparatus according to a second invention comprises the image search apparatus according to claim 1 and a clustering unit that selects query images one at a time from among the search target images and inputs each to the feature vector generation means, obtains the search target images selected by the selection means, and creates one set grouping those search target images together with the query image.

According to the second invention, one set is created by grouping the search results selected for each query image chosen from the search target images. Each group is therefore formed of images containing objects similar to the query image, which enables highly accurate image clustering.
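The clustering unit of the second invention can be sketched as follows, assuming the image search apparatus is wrapped in a callable that returns the IDs selected for a given query image. The names `cluster_by_search` and `search` are hypothetical, and since this section does not say how overlapping sets are later merged, the sketch simply returns one set per query.

```python
from typing import Callable, List, Set


def cluster_by_search(image_ids: List[str],
                      search: Callable[[str], List[str]]) -> List[Set[str]]:
    """Create one set per query: the query image plus the images selected for it.

    `search` is a hypothetical stand-in for the image search apparatus: given the
    ID of a query image drawn from the search target images, it returns the IDs
    of the search target images chosen by the selection step.
    """
    groups: List[Set[str]] = []
    for query_id in image_ids:        # select query images one at a time
        selected = search(query_id)   # ranked and filtered search results
        groups.append({query_id, *selected})
    return groups
```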

According to the present invention, the accuracy of selecting images to be output as the result of a similar image search can be improved.

FIG. 1 is a block diagram showing the functional configuration of the image search apparatus.
FIG. 2 is a flowchart of the feature vector generation process.
FIG. 3 is a flowchart of the search process.
FIG. 4 is a diagram illustrating an example of calculating the common classification degree.
FIG. 5 is a diagram illustrating an example of selecting the images to be output.
FIG. 6 is a diagram showing experimental results.
FIG. 7 is a flowchart of the clustering process.

[Configuration of Image Search Apparatus]
An embodiment in which the image search apparatus of the present invention is applied to the image search apparatus 1 shown in FIG. 1 is described below with reference to the drawings.

FIG. 1 is a functional block diagram of the image search apparatus 1. The image search apparatus 1 is connected to the Internet via a communication network and can collect image data from the web. The collected data is accumulated in a database (DB) to build the set of search target images.

The image search apparatus 1 receives, as a search request, a query image transmitted from a client terminal, such as a personal computer or mobile terminal, connected via the communication network. It then performs a similar image search for that request and outputs the search results, ranked in order of similarity, to the client terminal.

The functions of the image search apparatus 1 may also be used as a module; for example, it may accept a query image from another module and return the resulting images to that module.

The image search apparatus 1 of this embodiment generates and indexes image feature vectors using the visual keyword technique. Visual keyword image search represents one image as a set of image regions and generates an image index (feature vector) from the feature amounts of the regions that make up each image (hereinafter “partial images” where appropriate). It can be regarded as an application of text retrieval, which derives the features of a document from the keywords in its text.

Therefore, by treating the regions within an image as keywords, visual keyword image search can apply text retrieval techniques (inverted indexes, the vector space model, term frequencies, and so on) to image region search, achieving both large scale and high speed.
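As one illustration of the text-retrieval analogy, an inverted index can map each visual keyword ID (VKID) to the images containing it, just as a text index maps terms to documents. This is a minimal sketch under assumed data layouts; the text does not specify how the index DB 75 is organized, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(vkids_per_image: Dict[str, List[int]]) -> Dict[int, Set[str]]:
    """Map each VKID to the set of image IDs having a partial image mapped to it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for image_id, vkids in vkids_per_image.items():
        for vkid in vkids:
            index[vkid].add(image_id)
    return dict(index)


def candidate_images(index: Dict[int, Set[str]], query_vkids: Iterable[int]) -> Set[str]:
    """Images sharing at least one visual keyword with the query image."""
    result: Set[str] = set()
    for vkid in set(query_vkids):
        result |= index.get(vkid, set())
    return result
```

Only the posting lists of the query's own visual keywords are touched, which is what lets this structure scale to large collections.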

Reference literature on visual keyword image search includes:
・Sivic and Zisserman: “Efficient visual search for objects in videos”, Proceedings of the IEEE, Vol. 96, No. 4, pp. 548-566, Apr. 2008.
・Yang and Hauptmann: “A text categorization approach to video scene classification using keypoint features”, Carnegie Mellon University Technical Report, pp. 25, Oct. 2006.
・Jiang and Ngo: “Bag-of-visual-words expansion using visual relatedness for video indexing”, Proc. 31st ACM SIGIR Conf., pp. 769-770, Jul. 2008.
・Jiang, Ngo, and Yang: “Towards optimal bag-of-features for object categorization and semantic video retrieval”, Proc. 6th ACM CIVR Conf., pp. 494-501, Jul. 2007.
・Yang, Jiang, Hauptmann, and Ngo: “Evaluating bag-of-visual-words representations in scene classification”, Proc. 15th ACM MM Conf., Workshop on MMIR, pp. 197-206, Sep. 2007.

In addition, because a single image is represented as a set of partial images, it becomes possible, unlike ordinary similar image search, to search with a query image cut out from part of an image at any size and position. The user can therefore express a query more directly and precisely, for example by designating part of an image, to obtain the desired search results.

As shown in FIG. 1, the image search apparatus 1 comprises a query image receiving unit 10, a feature vector generation unit 20, a ranking unit 30, a common classification degree calculation unit 40, a search result selection unit 50, a visual keyword generation unit 60, a visual keyword DB 65, an indexing unit 70, an index DB 75, a region management DB 80, and a search target image DB 90.

These functional units are implemented by a so-called computer, through the cooperation of a CPU (Central Processing Unit) serving as the arithmetic/control device, RAM (Random Access Memory) and ROM (Read Only Memory) serving as storage media, a communication interface, and so on.

The query image receiving unit 10 receives a query image, which serves as the search key for the similar image search, from a client terminal or module. The query image may be an image stored in the search target image DB 90, an image cut out by an operation designating a partial region of such image data, or a newly received image. It may be a single image or a combination of multiple images.

The feature vector generation unit 20 generates a feature vector from the query image by performing the feature vector generation process (see FIG. 2), which extracts partial images from the query image and generates a feature vector based on their feature amounts. As shown in FIG. 1, the feature vector generation unit 20 includes a partial image extraction unit 22 and a mapping unit 24, which cooperate to carry out the feature vector generation process described later.

The ranking unit 30 calculates the similarity between the feature vector of each search target image stored in the index DB 75 and the feature vector generated from the query image, and ranks the search target images based on this similarity. Known measures such as the cosine distance or the Bhattacharyya distance are used to calculate the similarity.
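The two measures named above can be written directly on visual-keyword frequency vectors. The sketch below is illustrative only: the text does not say which variant of the Bhattacharyya distance is used, so the common form -ln(BC) on histograms normalized to sum 1 is assumed here (other variants, such as sqrt(1 - BC), also exist).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def bhattacharyya_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """-ln(BC), where BC is the Bhattacharyya coefficient of the normalized histograms."""
    su, sv = sum(u), sum(v)
    if su == 0 or sv == 0:
        return math.inf
    bc = sum(math.sqrt((a / su) * (b / sv)) for a, b in zip(u, v))
    return math.inf if bc == 0 else -math.log(bc)
```

Cosine similarity grows with similarity, whereas the Bhattacharyya distance shrinks, so a ranking built on the latter sorts in ascending rather than descending order.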

The common classification degree calculation unit 40 calculates, as the common classification degree, the proportion of the visual keywords (clusters) to which the query image's partial images were mapped that also have partial images of the search target image mapped to them. In other words, the common classification degree measures how far a search target image shares visual keywords with the query image; the more visual keywords are shared, the more likely the two images are to share a common region.

The search result selection unit 50 selects, starting from the top of the search target images ranked by the ranking unit 30, those whose common classification degree is equal to or greater than a predetermined value, and outputs them. The data output by the search result selection unit 50 is, for example, the image IDs of the search target images sorted by similarity. An address (URL) for accessing the search target image DB 90 may be attached to each image ID.
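The selection step combines the two signals: the similarity decides the order, and the common classification degree decides which ranked images survive. A minimal sketch, assuming the ranking is available as (image ID, common classification degree) pairs already sorted by similarity; the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_results(ranked: Sequence[Tuple[str, float]], min_degree: float) -> List[str]:
    """Keep, in ranking order, the images whose common classification degree >= min_degree.

    `ranked` holds (image_id, common_degree) pairs sorted by similarity,
    most similar first.
    """
    return [image_id for image_id, degree in ranked if degree >= min_degree]
```

For example, an image ranked second by similarity but sharing only 30% of the query's visual keywords is dropped when `min_degree` is 0.5, while lower-ranked images that share more keywords are kept.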

The visual keyword generation unit 60 generates the classification destinations (reference clusters) to which the partial images within an image are mapped when the feature vectors of image data are generated. It extracts multiple partial images from the images used for image search or from image data prepared in advance for training, and clusters them based on their feature amounts. Standard clustering methods such as k-means or Hierarchical Agglomerative Clustering (HAC) are used.
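A plain Lloyd's k-means is enough to show how visual keywords could be formed from partial-image feature points. This is a sketch under stated assumptions: the text names k-means but does not specify the initialization, distance, or iteration count, so random initial centers, Euclidean distance, and a fixed iteration cap are assumed here. HAC is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points: List[Point], k: int, iterations: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; each returned center is one visual keyword."""
    rng = random.Random(seed)
    centers: List[Point] = [tuple(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:                        # assignment step
            clusters[_nearest(p, centers)].append(p)
        new_centers: List[Point] = []
        for i, members in enumerate(clusters):  # update step: mean of each cluster
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster where it was
        if new_centers == centers:              # converged
            break
        centers = new_centers
    return centers
```

In practice the feature points would be high-dimensional local descriptors and k would be large; the tiny two-dimensional case only illustrates the mechanics.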

The feature vector generation unit 20 generates a feature vector by mapping (classifying) the partial images detected in an image to the clusters formed by the clustering of the visual keyword generation unit 60. These clusters are called “visual keywords”, because they form the feature space used to express an image as a collection of visual keywords. Note that “classification” and “mapping” in the present invention mean associating a partial image with a reference cluster (visual keyword); the partial image is not added to the member set of the reference cluster.

The visual keyword DB 65 is a database that stores, in association with one another, a visual keyword ID (VKID) identifying each cluster formed by the clustering of the visual keyword generation unit 60, the center coordinates of that cluster (the coordinates of its center point in the multidimensional feature space), and a radius indicating the extent of the cluster.

The center coordinates represent the average of the feature amounts of the images belonging to each cluster, expressed as multidimensional coordinates in the feature space. The radius is obtained, for example, as the distance from the center coordinates to the farthest image belonging to the cluster.
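The center and radius stored for each visual keyword can be computed from a cluster's member feature points as follows. This is a minimal sketch assuming Euclidean distance in the feature space; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def center_and_radius(members: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Mean of the members' feature points and the distance to the farthest member."""
    n = len(members)
    center = [sum(coords) / n for coords in zip(*members)]
    radius = max(math.dist(center, m) for m in members)
    return center, radius
```

A record in the visual keyword DB 65 could then hold the VKID together with the returned `center` and `radius`.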

The indexing unit 70 generates feature vectors for the image data stored in the search target image DB 90 using the feature vector generation process of FIG. 2, and stores each generated feature vector in the index DB 75 as the index of that image data. The indexing unit 70 includes a partial image extraction unit 72 and a mapping unit 74, which cooperate to carry out the feature vector generation process described later.

The indexing unit 70 also assigns a region ID to each partial image detected in the image data and stores, in the region management DB 80, the VKID of the visual keyword to which that partial image was mapped, in association with the image ID and region ID. The region ID may be XY coordinates within the image, or the row and column numbers from the region division.

The index DB 75 is a database that stores, in association with each other, the image ID of each piece of image data stored in the search target image DB 90 and the feature vector generated from that image data (the frequency of partial images for each visual keyword).

The region management DB 80 is a database that manages which visual keyword each partial image of a search target image was mapped to; as shown in FIG. 1, it stores the image ID, region ID, and VKID of each search target image in association with one another.

The search target image DB 90 is a database that accumulates image data collected from the Internet as targets of similar image search (referred to as “search target images”); as shown in FIG. 1, it stores image IDs and image data in association with each other. An image ID is identification information that uniquely identifies each piece of image data and is assigned when keywords and image data are stored.

[Feature Vector Generation Process]
Next, the feature vector generation process is described with reference to the flowchart of FIG. 2 and the conceptual diagram of FIG. 4. The feature vector generation process is performed by the feature vector generation unit 20 on the query image and by the indexing unit 70 on the search target images; the following description takes the case of the feature vector generation unit 20.

First, the partial image extraction unit 22 extracts multiple partial images from the image data of the query image (step S11). Partial images can be extracted either by detecting characteristic regions (feature regions) in the image or by dividing the image into predetermined regions.

Techniques for detecting feature regions include:
・Harris-affine
・Hessian-affine
・Maximally stable extremal regions (MSER)
・Difference of Gaussians (DoG)
・Laplacian of Gaussian (LoG)
・Determinant of Hessian (DoH)

Feature region detection techniques are described in “Local Invariant Feature Detectors: A Survey” (Foundations and Trends in Computer Graphics and Vision, Vol. 3, No. 3, pp. 177-280, 2007), among others, and known techniques may be adopted as appropriate.

Methods that extract partial images by dividing the image into predetermined regions include, for example, dividing the image into a predetermined grid of M × N blocks, or dividing it so that each block has a predetermined size of m × n pixels. For example, when a 640 × 480 pixel image is divided into 10 × 10 blocks, each block is 64 × 48 pixels.
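The M × N grid division can be sketched as a generator of block rectangles. This is a minimal illustration; the function name is hypothetical, and it assumes the image width and height are divisible by the number of columns and rows (leftover pixels are simply ignored here).

```python
from typing import Iterator, Tuple


def grid_blocks(width: int, height: int,
                cols: int, rows: int) -> Iterator[Tuple[int, int, int, int, int, int]]:
    """Yield (row, col, left, top, block_w, block_h) for each block of a cols x rows grid."""
    block_w, block_h = width // cols, height // rows  # assumes exact divisibility
    for row in range(rows):
        for col in range(cols):
            yield row, col, col * block_w, row * block_h, block_w, block_h
```

For the 640 × 480 example, every block is 64 × 48 pixels, and the last block starts at pixel (576, 432). The (row, col) pair could also serve as the region ID mentioned for the region management DB 80.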

Next, the feature amounts of the extracted partial images are calculated (step S12). When feature regions have been extracted, local feature amounts that are robust to affine transformations such as scale changes, rotation, and viewpoint angle changes are extracted. Examples of local feature amounts include the following.

・SIFT
・Gradient location and orientation histogram (GLOH)
・Shape context
・PCA-SIFT
・Spin images
・Steerable filters
・Differential invariants
・Complex filters
・Moment invariants

The extraction of local feature amounts is described in “A performance evaluation of local descriptors” (IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1615-1630, 2005), among others, and known techniques may be adopted as appropriate.

Because a feature vector generated from the feature amounts of feature regions is derived from regions where an object is likely to be present, it is effective as an index representing the features of the objects in the image.

When partial images have been extracted by region division, image feature amounts that quantify the features of each image, such as its color scheme, texture, and shape, are used instead. Because a feature vector generated from the feature amounts of the divided regions is derived from every part of the image, it is effective as an index representing the overall composition of the image.
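As one example of a “color scheme” feature for a divided block, a coarse joint RGB histogram can be computed per block. This is only an illustrative choice: the text does not specify which color, texture, or shape features are used, texture and shape are not shown, and the pixel format (a list of 8-bit RGB tuples) and function name are assumptions.

```python
from typing import List, Sequence, Tuple


def color_histogram(pixels: Sequence[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram of one block, with `bins` levels per channel.

    Each 0-255 channel value c is quantized to c * bins // 256, so the result
    has bins**3 components that sum to 1 (or are all 0 for an empty block).
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + g * bins // 256) * bins + b * bins // 256
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist
```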

Next, the mapping unit 24 selects one of the partial images extracted from the image (step S13) and calculates the distance between that partial image and each visual keyword (a cluster formed in advance by the visual keyword generation unit 60) (step S14). This distance is the distance in the feature space between the coordinates of the partial image and the center coordinates of the visual keyword. For the center coordinates of a cluster, besides the centroid (the mean of the data, i.e., feature amounts, belonging to the cluster), one may use the medoid (the datum closest to the center) or the median (the datum in the middle when the data are sorted in ascending order).
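The three choices of cluster center can be sketched as follows, assuming Euclidean distance. Two readings are assumptions: the medoid follows the text's “closest to the center” (elsewhere the medoid is often defined as the member minimizing the total distance to the other members), and for multidimensional data the median is taken coordinate by coordinate, which is only one possible interpretation. Function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def centroid(members: Sequence[Sequence[float]]) -> List[float]:
    """Mean of the cluster members, coordinate by coordinate."""
    return [sum(coords) / len(members) for coords in zip(*members)]


def medoid(members: Sequence[Sequence[float]]) -> List[float]:
    """The member closest to the centroid, following the description above."""
    c = centroid(members)
    return list(min(members, key=lambda m: math.dist(m, c)))


def coordinatewise_median(members: Sequence[Sequence[float]]) -> List[float]:
    """Median of each coordinate taken separately (one multidimensional reading)."""
    return [statistics.median(coords) for coords in zip(*members)]
```

With the members (0, 0), (2, 0), (0, 1), (10, 10), the centroid is pulled to (3.0, 2.75) by the outlier, whereas the medoid is the actual member (2, 0) and the coordinate-wise median is (1.0, 0.5). The distance of step S14 is then `math.dist(feature, center)` for whichever center is chosen.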

The selected partial image is then mapped (classified) to the visual keyword at the shortest calculated distance (step S15). At this time, the mapping unit 24 increases the scalar value of that visual keyword's component of the feature vector in the index DB 75 by the number of partial images mapped to it. It also assigns the VKID of the visual keyword to which the partial image was mapped to the region ID of that partial image in the region management DB 80, and stores the two in association with each other.

For example, in FIG. 4, suppose that the distances between the partial image G11 and the visual keywords VK1 to VKN are calculated and VK1 is determined to be the closest. In this case, partial image G11 is mapped to visual keyword VK1, and 1 is added to the VK1 component of the feature vector V.
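The mapping loop of steps S13 to S16 can be sketched as a function that assigns each partial image to its nearest visual keyword and emits rows shaped like those of the region management DB 80. This is a sketch under assumptions: Euclidean distance is used, the region IDs are hypothetical (row, col) pairs, and the function name is illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

RegionId = Tuple[int, int]


def map_partial_images(image_id: str,
                       regions: Dict[RegionId, Sequence[float]],
                       centers: Dict[int, Sequence[float]]) -> List[Tuple[str, RegionId, int]]:
    """Map every partial image to its nearest visual keyword (steps S13 to S16).

    `regions` maps a region ID to the partial image's feature point;
    `centers` maps a VKID to the visual keyword's center coordinates.
    Returns (image ID, region ID, VKID) rows.
    """
    rows = []
    for region_id, feature in regions.items():  # S13/S16: visit each partial image once
        # S14/S15: distance to every visual keyword, keep the closest
        vkid = min(centers, key=lambda k: math.dist(feature, centers[k]))
        rows.append((image_id, region_id, vkid))
    return rows
```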

Next, the mapping unit 24 determines whether all partial images have been selected in step S13 (step S16). If any partial image remains unselected (step S16; No), the process returns to step S13 and repeats the processing through step S15.

If it is determined that all partial images have been selected (step S16; Yes), a feature vector is generated based on the mapped visual keywords (step S17). That is, a multidimensional feature vector is generated from the frequency of partial images for each visual keyword resulting from the mapping. In this way, the feature vector V of the query image G1 is generated, as shown in FIG. 4.
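Step S17 reduces to counting how many partial images were mapped to each visual keyword. A minimal sketch, assuming VKIDs are the integers 0 to num_keywords - 1 (an assumption made here so the counts can be laid out as a fixed-length vector); the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def frequency_vector(vkids: Sequence[int], num_keywords: int) -> List[int]:
    """Feature vector = number of partial images mapped to each visual keyword (step S17).

    `vkids` holds the VKID assigned to each partial image of one image.
    """
    counts = Counter(vkids)
    return [counts.get(k, 0) for k in range(num_keywords)]
```

Most components of such a vector are zero when the vocabulary is large, which is why the inverted-index storage borrowed from text retrieval suits it.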

The feature vector generation unit 20 outputs the feature vector of the query image generated by the above processing to the ranking unit 30 and the common classification degree calculation unit 40. When the indexing unit 70 generates the feature vector of each search target image through the feature vector generation process, it stores that vector in the index DB 75 in association with the image ID of the search target image.

As described above, the feature vector generation process produces a feature vector represented by the distribution of an image's local feature amounts over the visual keywords.

[Search processing]
Next, the search processing performed by the ranking unit 30, the common classification degree calculation unit 40, and the search result selection unit 50 will be described with reference to the flowchart of FIG. 3 and the conceptual diagrams of FIGS. 4 and 5.

First, upon obtaining the feature vector of the query image output from the feature vector generation unit 20, the ranking unit 30 calculates the similarity between this feature vector and the feature vectors of the plurality of search target images stored in the index DB 75 (step S31).

The ranking unit 30 then ranks the search target images based on the calculated similarity, sorting them in descending order of similarity (step S32). As a result, as shown in FIG. 5, search target images similar to the query image G1 are obtained in order of similarity.
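Steps S31 and S32 can be sketched as follows. The text does not say which similarity measure is used, so cosine similarity is assumed here purely for illustration. The dictionary `index` stands in for the index DB 75 (image ID to feature vector) and is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec: Sequence[float],
         index: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every indexed image (S31) and sort by descending similarity (S32)."""
    scored = [(image_id, cosine(query_vec, vec)) for image_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Any other vector similarity (for example histogram intersection) could be substituted in `cosine` without changing the ranking step.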

The common classification degree calculation unit 40 treats the visual keywords (VK) to which the partial images of the query image belong as reference clusters. It then calculates, as the common classification degree, the proportion of those reference clusters into which partial images of a search target image are also classified (step S33).

Specifically, the common classification degree is obtained by taking the visual keywords to which partial images of the query image are mapped, counting how many of them also have partial images of the search target image mapped to them, and dividing that count by the number of visual keywords to which partial images of the query image are mapped.

For example, in FIG. 4, the partial images of the query image G1 are mapped to three visual keywords, VK1, VK2, and VK3. Of these, partial images of the search target image are mapped to two, VK1 and VK3, so the common classification degree is 2/3.
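A minimal sketch of step S33, assuming both images are represented by the histogram-style feature vectors described above (one count per visual keyword). The counts in the usage example are made up; only the pattern of nonzero entries matches the FIG. 4 example.

```python
from typing import Sequence


def common_classification_degree(query_vec: Sequence[float],
                                 target_vec: Sequence[float]) -> float:
    """Share of the query's reference clusters that the target image also occupies."""
    # Reference clusters: visual keywords that received at least one query partial image.
    reference = [k for k, count in enumerate(query_vec) if count > 0]
    if not reference:
        return 0.0
    # Reference clusters into which the target image's partial images also fall.
    shared = sum(1 for k in reference if target_vec[k] > 0)
    return shared / len(reference)
```

With hypothetical vectors `query = [5, 2, 1, 0]` (VK1 to VK3 occupied) and `target = [3, 0, 4, 2]` (VK1, VK3, VK4 occupied), the shared reference clusters are VK1 and VK3, so the function returns 2/3. VK4 does not count because it is not a reference cluster.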

Working down from the top of the search target images ranked by the ranking unit 30, the search result selection unit 50 selects those whose common classification degree is equal to or greater than a predetermined threshold (step S34). For example, if the threshold is set to 10%, search target images G51 to G56 are selected. These search target images contain a high proportion of partial images that belong to the same visual keywords as those of the query image.

The search result selection unit 50 takes the search target images selected on the basis of the common classification degree, in descending order of similarity, as the search result, and outputs it to the client terminal or module from which the query image was received (step S35).
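Steps S34 and S35 reduce to a filter over the ranked list that keeps the similarity order. In this sketch, `ranked` is a list of (image ID, similarity) pairs already sorted in descending order, and `degree_of` is a hypothetical callable that returns an image's common classification degree. The 10% default mirrors the example threshold in the text.

```python
from typing import Callable, List, Tuple


def select_results(ranked: List[Tuple[str, float]],
                   degree_of: Callable[[str], float],
                   threshold: float = 0.10) -> List[str]:
    """Keep ranked images whose common classification degree meets the threshold (S34).

    The input order (descending similarity) is preserved in the output (S35).
    """
    return [image_id for image_id, _sim in ranked if degree_of(image_id) >= threshold]
```

Because the filter preserves the input order, the output remains sorted by similarity. The threshold acts only as a gate, not as a re-ranking key.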

As described above, when a search result is output for a query image, similar images are first retrieved on the basis of local feature amounts, using the similarity between the feature vectors of the query image and the search target images. Search target images that share common regions with the query image are then selected on the basis of the common classification degree, which is a value relative to the query image.

Search target images are selected by comparing the common classification degree with a predetermined threshold, and this threshold is a ratio relative to the query image. FIG. 6 shows example results of an experiment in which the common classification degree threshold was set to 10% and different query images were searched against 10,200 search target images. As FIG. 6 shows, even with the threshold fixed at 10%, the lowest similarity among the selected search target images varies depending on the query image used as the reference.

Therefore, unlike the conventional approach of setting a fixed (absolute) threshold on the similarity, the similarity of the search target images actually selected varies dynamically with the query image, while only search target images that share visual keywords with the query image are selected. Basing the selection on the visual keywords to which the query image is mapped thus improves the accuracy with which images are chosen for output as the result of a similar-image search.

[Modification]
The embodiment described above is merely one example to which the present invention is applied, and the scope of application may be changed as appropriate.

For example, the common classification degree may be weighted. In the embodiment above, the common classification degree (for example, 2/3 = (1 + 1) / (1 + 1 + 1)) is calculated from the number of visual keywords to which the query image is mapped and the number of those visual keywords to which the search target image is also mapped. In this modification, these counts are weighted using the distance between each partial image and its visual keyword.

In FIG. 4, the partial images G11 to G15 of the query image are mapped to the visual keyword VK1, and the average D1 of the distances between each of these mapped partial images and VK1 is obtained. Likewise, for the visual keyword VK2, the average D2 of the distances between the mapped partial images and VK2 is obtained, and the average D3 is obtained in the same way for VK3.

Weighting the number of visual keywords to which the query image is mapped by these distances gives 1 × D1 + 1 × D2 + 1 × D3.

Similarly, for the search target image, the average distance between each visual keyword and the partial images mapped to it is calculated per visual keyword. These distances are used to weight the number of visual keywords to which both the query image and the search target image are mapped.

For example, in FIG. 4, let DA be the average distance between the visual keyword VK1 and the partial images of the search target image mapped to it, and let DC be the corresponding average for VK3. The weighted number of visual keywords shared with the query image is then 1 × DA + 1 × DC.

The common classification degree computed from these weighted counts is therefore (DA + DC) / (D1 + D2 + D3). Weighting by the distance between partial images and visual keywords accounts for variation in those distances, which can improve the accuracy of the common classification degree.
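The weighted ratio (DA + DC) / (D1 + D2 + D3) can be sketched as follows. The sketch assumes that, for each image, the raw distances of its partial images have already been grouped by visual keyword (keyword index to list of distances). These dictionaries and the function name are illustrative, not taken from the source.

```python
from statistics import mean
from typing import Dict, List


def weighted_degree(query_dists: Dict[int, List[float]],
                    target_dists: Dict[int, List[float]]) -> float:
    """Distance-weighted common classification degree.

    query_dists[k] / target_dists[k]: distances from each partial image mapped to
    visual keyword k to that keyword, for the query and the target image.
    """
    # Denominator: sum of the query's per-keyword average distances (D1 + D2 + D3).
    reference = {k: ds for k, ds in query_dists.items() if ds}
    denominator = sum(mean(ds) for ds in reference.values())
    # Numerator: target's average distances, only for keywords shared with the query (DA + DC).
    numerator = sum(mean(ds) for k, ds in target_dists.items() if ds and k in reference)
    return numerator / denominator if denominator else 0.0
```

With hypothetical distances where D1 = 2, D2 = 4, D3 = 2 and the target has DA = 1 (VK1), DC = 4 (VK3), plus an extra keyword the query does not use, the result is 5 / 8 = 0.625.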

Also, for example, although the embodiment above has been described only for the configuration that outputs search results for a query image, the same mechanism may be used to cluster the search target images. FIG. 7 is a flowchart for explaining this clustering process.

Although not illustrated, the image search apparatus 1 of FIG. 1 is provided with a clustering unit, which selects one search target image from the search target image DB 90 (step S51). The selected search target image is then input to the query image reception unit 10 as a query image, causing the search process described above to be executed (step S52).

The clustering unit receives the search result output from the search result selection unit 50, forms a set (cluster) from the returned search target images and the query image, and stores each set in a cluster DB (not shown) (step S53).

The registered query image and search target images are then deleted from the search target image DB 90 (step S54). The data need not actually be deleted; for example, a deletion flag may be set for each image instead.

The clustering unit determines whether any unprocessed image remains in the search target image DB 90 (step S55). If so (step S55; Yes), the process returns to step S51 and the search process is performed again.

By running the above search process on the images stored in the search target image DB 90 and successively building sets of images in this way, the images can be grouped (clustered) so that each group contains objects highly similar to those in its query image.
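The loop of FIG. 7 (steps S51 to S55) can be sketched as a greedy clustering pass. Here `search` is a hypothetical callable that runs steps S31 to S35 for a given image ID and returns the selected image IDs. Deletion is modeled with a set of flags, as the text permits. Filtering hits against the flags is a simplification of removing those images from the search target image DB 90 before the next search.

```python
from typing import Callable, List, Set


def cluster_by_search(image_ids: List[str],
                      search: Callable[[str], List[str]]) -> List[Set[str]]:
    """Greedy clustering: each unprocessed image becomes a query, and its hits join its set."""
    deleted: Set[str] = set()          # deletion flags instead of physical deletion (S54)
    clusters: List[Set[str]] = []
    for query_id in image_ids:         # S51: pick the next unprocessed image
        if query_id in deleted:
            continue
        hits = [h for h in search(query_id) if h not in deleted]   # S52: run the search
        cluster = {query_id, *hits}    # S53: query image plus selected images form one set
        clusters.append(cluster)
        deleted |= cluster             # S54: flag every member as processed
    return clusters                    # S55: loop ends when no unprocessed image remains
```

Because processed images are flagged, each image ends up in exactly one set, and the result depends on the order in which query images are chosen.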

The sets created by the clustering process may be merged further based on the distance between sets (for example, the distance between the mean feature vectors of the images in each set). Standard methods for this kind of clustering include k-means and hierarchical agglomerative clustering (HAC). That is, bottom-up hierarchical clustering that successively merges clusters can be used to build clusters containing similar objects.
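As one illustration of the bottom-up merging described above, the sketch below implements centroid-linkage agglomeration with a naive pairwise search. Each set is a list of feature vectors, and set-to-set distance is the Euclidean distance between mean feature vectors (the example the text gives). The stopping distance `max_dist` is an assumption; the text does not specify a stopping rule.

```python
import math
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Mean feature vector of a set."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def merge_sets(sets: List[List[Sequence[float]]], max_dist: float) -> List[List[Sequence[float]]]:
    """Repeatedly merge the two sets with the closest centroids until all pairs exceed max_dist."""
    clusters = [list(s) for s in sets]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:                       # closest pair too far apart: stop merging
            break
        clusters[i].extend(clusters.pop(j))    # j > i, so popping j leaves index i valid
    return clusters
```

This pairwise search is quadratic per merge and is meant only to show the idea. A library HAC implementation would be preferable at scale.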

The feature vectors may also be weighted by TF/IDF (term frequency-inverse document frequency), a word-weighting method used in text retrieval.

A known reference on TF/IDF is:
C. D. Manning, P. Raghavan and H. Schütze, "Introduction to Information Retrieval", Cambridge University Press, 2008.

TF/IDF is an algorithm for extracting the characteristic words in a document. It is calculated from two indices: TF, the frequency with which a word appears, and IDF, the inverse document frequency. Specifically, it is obtained by the following equations:
TF/IDF(i, j) = TF(i, j) / T(i) × IDF(j)
IDF(j) = log(N / DF(j))

where
TF(i, j) is the number of occurrences of keyword j in document i, the document from which keywords are to be extracted,
T(i) is the total number of words in document i,
N is the total number of documents, and
DF(j) is the number of documents containing keyword j.

To apply this to images, a document is treated as an image, and a word is treated as the partial images belonging to the same visual keyword. A TF/IDF value is obtained for each visual keyword of each image, and the feature vector is generated by adding this TF/IDF value to the visual keyword to which the partial image is mapped.

Here, with i denoting the image ID and k each visual keyword, the weight of each visual keyword, TF/IDF(i, k), is calculated by the following equations:

TF/IDF(i, k) = TF(i, k) / T(i) × IDF(k)
IDF(k) = log(N / DF(k))

TF(i, k) is the number of partial images extracted from image i that appear in visual keyword k, weighted by the weight value (0 to 1) described above. This weight is based on the distance between each partial image belonging to (appearing in) visual keyword k and the center point of visual keyword k.

T(i) is the total number of partial images extracted from image i, weighted by distance to the visual keywords. That is, it is the sum of the weight values of all partial images extracted from image i, each based on the distance to the cluster to which that partial image belongs.

DF(k) is the number of partial images classified into visual keyword k that appear in that visual keyword, weighted by their distance from the visual keyword. N is the total number of images in the search target image DB 90.
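A sketch of the distance-weighted visual-keyword TF/IDF under stated assumptions. Each image is given as a list of (visual keyword index, distance to that keyword's center) pairs, one per partial image. The distance-to-weight mapping 1 / (1 + d), which lies in (0, 1], is an assumption; the text only says the weight is between 0 and 1. The text also describes a distance-weighted DF(k) without fully specifying it, so this sketch uses the conventional unweighted count of images that have at least one partial image in k. The natural logarithm is assumed.

```python
import math
from typing import Dict, List, Tuple


def weight(distance: float) -> float:
    """Assumed distance-to-weight mapping in (0, 1]; closer partial images count more."""
    return 1.0 / (1.0 + distance)


def visual_tf_idf(images: Dict[str, List[Tuple[int, float]]],
                  num_keywords: int) -> Dict[str, List[float]]:
    """Weighted TF/IDF feature vector per image.

    images[i] lists (visual keyword k, distance to k's center) for each partial image of image i.
    """
    n = len(images)                                    # N: images in the search target DB
    df = [0] * num_keywords
    for parts in images.values():
        for k in {k for k, _ in parts}:
            df[k] += 1                                 # DF(k): unweighted image count (assumption)
    vectors: Dict[str, List[float]] = {}
    for image_id, parts in images.items():
        tf = [0.0] * num_keywords
        for k, d in parts:
            tf[k] += weight(d)                         # TF(i, k): weighted occurrences in k
        t = sum(tf)                                    # T(i): sum of all weights in image i
        vectors[image_id] = [
            tf[k] / t * math.log(n / df[k]) if tf[k] > 0 else 0.0
            for k in range(num_keywords)
        ]
    return vectors
```

A keyword that occurs in every image gets IDF(k) = 0, so it contributes nothing to any feature vector. This is the down-weighting effect described below.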

Thus, by treating a document in TF/IDF as an image, and the words in the document as partial images belonging to the same visual keyword, the scalar values of the feature vector can be weighted in a particular way. Partial images that appear in many images are given lower importance, and characteristic partial images that stand out in a particular image are given higher importance.

Using this TF/IDF weighting, a similarity score may also be computed between the visual keywords to which the partial images of the query image belong and those to which the partial images of a search target image belong.

Further, the common classification degree calculation unit 40 has been described as calculating the degree for all search target images. It may instead calculate it only for a predetermined number of images from the top of the similarity ranking (for example, the 20 images ranked 1 to 20). If the common classification degrees of those images are all equal to or greater than the threshold, the calculation may be extended successively to lower-ranked images (for example, the 20 images ranked 21 to 40). Because the threshold comparison then needs to be made only for a predetermined number of images at a time, processing can be sped up.
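The windowed evaluation can be sketched as follows. Common classification degrees are computed one window (for example 20 images) at a time, moving down the similarity ranking. The next window is opened only if every image in the current one met the threshold. `degree_of` is a hypothetical callable standing in for the common classification degree calculation unit 40.

```python
from typing import Callable, List, Tuple


def select_in_batches(ranked: List[Tuple[str, float]],
                      degree_of: Callable[[str], float],
                      threshold: float = 0.10,
                      batch_size: int = 20) -> List[str]:
    """Evaluate the common classification degree window by window, down the ranking."""
    selected: List[str] = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]           # e.g. ranks 1-20, then 21-40, ...
        passed = [image_id for image_id, _ in batch if degree_of(image_id) >= threshold]
        selected.extend(passed)
        if len(passed) < len(batch):                       # not all passed: stop widening
            break
    return selected
```

Images below the last evaluated window never have their degree computed, which is where the speed-up comes from.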

The embodiment disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined not by the above description but by the claims, and it is intended to include all modifications within the meaning and scope equivalent to the claims.

DESCRIPTION OF SYMBOLS
1 Image search apparatus
10 Query image reception unit
20 Feature vector generation unit
22 Partial image extraction unit
24 Mapping unit
30 Ranking unit
40 Common classification degree calculation unit
50 Search result selection unit
60 Visual keyword generation unit
70 Indexing unit
72 Partial image extraction unit
74 Mapping unit
65 Visual keyword DB
75 Index DB
80 Area management DB
90 Search target image DB
VK1 to VKN Visual keywords

Claims

An image search device that searches a plurality of search target images for images similar to a query image and outputs them, the device comprising:
feature vector generation means for classifying each of the partial images extracted from the query image and from the search target images into one of a plurality of clusters, and generating, for each image, a feature vector based on the number of its partial images classified into each cluster;
ranking means for calculating a similarity between the feature vector of the query image and the feature vectors of the plurality of search target images, and ranking the search target images based on the similarity;
common classification degree calculation means for taking the clusters into which the partial images of the query image are classified as reference clusters, and calculating the ratio of the number of reference clusters into which partial images of a search target image are also classified to the number of all the reference clusters; and
selection means for selecting, as output targets, search target images whose ratio is equal to or greater than a predetermined value, starting from the top of the ranked search target images.

A clustering device comprising:
the image search device according to claim 1; and
clustering means for selecting query images one by one from the search target images, inputting each selected query image to the feature vector generation means to obtain the search target images selected by the selection means, and creating a set consisting of those search target images and the query image.

An image search method in which a computer searches a plurality of search target images for images similar to a query image and outputs them, the method comprising the following steps executed by the computer:
a feature vector generation step of classifying each of the plurality of partial images extracted from the query image and from the search target images into one of a plurality of clusters, and generating, for each image, a feature vector based on the number of its partial images classified into each cluster;
a ranking step of calculating a similarity between the feature vector of the query image and the feature vectors of the plurality of search target images, and ranking the search target images based on the similarity;
a common classification degree calculation step of taking the clusters into which the partial images of the query image are classified as reference clusters, and calculating the ratio of the number of reference clusters into which partial images of a search target image are also classified to the number of all the reference clusters; and
a selection step of selecting, as output targets, search target images whose ratio is equal to or greater than a predetermined value, starting from the top of the ranked search target images.

A program for causing a computer to execute the image search method according to claim 3.