JP5569728B2 - Image ranking method, program, storage medium, and image display system

Info

Publication number: JP5569728B2
Application number: JP2010109454A
Authority: JP
Inventors: Hidetoshi Kawakubo (川久保秀敏); Keiji Yanai (柳井啓司)
Original assignee: The University of Electro-Communications
Current assignee: The University of Electro-Communications
Priority date: 2010-05-11
Filing date: 2010-05-11
Publication date: 2014-08-13
Anticipated expiration: 2030-05-11
Also published as: JP2011238057A

Description

The present invention relates to a method for creating an image ranking, to a program and a storage medium, and to an image display system that uses the method.

In recent years an enormous amount of information has become available on the Internet, and how to retrieve only the useful information efficiently has become a major issue. For text search, the technique of Non-Patent Document 1 is well known; it provides a system that displays useful information at the top of the results drawn from a huge volume of text.

The Internet, however, holds not only text. With the spread of web album services and GPS-equipped cameras, there is also a huge number of images to which the latitude and longitude of the shooting location are attached as metadata, yet research on searching such images has not advanced as far as text search.

Non-Patent Documents 2, 3, and 4 are prior art for obtaining, from such a large body of image information, a ranking of images useful to the user.

Non-Patent Document 2 discloses a technique for analyzing an image database in which the neighborhood relationships among images are first represented as a graph, the transition probabilities between images are obtained with a Markov model, and representative images are determined using an eigenvector of the transition probability matrix.

Non-Patent Document 3 discloses a technique (VisualRank) that applies the algorithm of Non-Patent Document 1 to images: instead of a matrix representing the link structure between Web pages, it uses a matrix representing the similarity between images to rank them. This technique uses VisualRank to refine text-based image search results, and takes the number of matching SIFT (Scale Invariant Feature Transform) keypoints as the similarity between images. Similarity based on the number of matching SIFT points is effective when the same object appears in the images and matches are easy to find, as with product or landmark images. On the other hand, images of the same object tend to line up at the top of the ranking, so the results lack diversity.

Non-Patent Document 4 addresses this problem of Non-Patent Document 3. It discloses a technique that first clusters the images, applies VisualRank to each cluster, and presents the results for the clusters side by side, thereby ensuring diversity in the results.

Non-Patent Document 1: S. Brin and L. Page, “The anatomy of a large-scale hyper-textual Web search engine,” Computer networks and ISDN systems, vol.30, no.1-7, pp.107-117, 1998.
Non-Patent Document 2: X. He, W.Y. Ma, and H. Zhang, “ImageRank: spectral techniques for structural analysis of image database,” IEEE ICME, 2003.
Non-Patent Document 3: Y. Jing and S. Baluja, “VisualRank: Applying pagerank to large-scale image search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, no.11, 1870-1890, 2008.
Non-Patent Document 4: M. Ambai and Y. Yoshida (安倍満，吉田悠一), “Extension of Visualrank to Multiple Classes: Automatic Classification and Ranking Method for Similar Images Using Image Features” (in Japanese), PRMU2008-178, pp.183-188.

As a result of intensive research into techniques for displaying, at the top of the results, those images in a large body of image information that are likely to interest the user, the inventors of the present application found the following. Position information is not considered at all in the prior art documents, and applying the VisualRank technique to images that carry position information, so as to create a ranking based on both the image feature values and the position information, is useful for image retrieval.

The present invention has been made on the basis of this finding. Its object is to provide a ranking method that takes both position information and image features into account, and a system that displays images using this ranking method.

To solve the above problem, according to one aspect of the present invention, there is provided a method for searching for images having metadata and creating a ranking, the method including: a step of receiving an input that includes location information; a step of searching for the images based on the input; a step of extracting feature vectors of the retrieved images and creating a similarity matrix; a step of creating a bias vector based on the distance between the position information in the metadata of each retrieved image and the position indicated by the input; and a step of creating a ranking by iteratively calculating Equation (1) below using the similarity matrix and the bias vector, wherein, in the step of creating the bias vector, a larger bias is given as the distance becomes smaller. Here, R is the ranking value, S is the normalized similarity matrix, P is the normalized bias vector, and α is a parameter that adjusts the strength of the bias (0 <= α <= 1).
R = α (S × R) + (1 − α) P   ... (1)
According to this configuration, it is possible to provide a ranking method for displaying, at the top, images from a large body of image information that are likely to interest the user.

The feature vector may be extracted based on Equation (2) below. Here, Sv is the visual feature vector, St is the metadata feature vector, and β is a parameter that adjusts the weights of the visual feature vector and the metadata feature vector (0 <= β <= 1).
S = β × Sv + (1 − β) × St   ... (2)
According to this configuration, it is possible to provide a ranking method based on both the visual features and the metadata features of the images.

The visual feature vector may be obtained from a plurality of images.
According to this configuration, it is possible to provide a ranking method based on the visual features of a plurality of images.

In the step of creating the bias vector, the bias vector may be created based on a plurality of pieces of the location information.
According to this configuration, it is possible to provide a ranking method based on a plurality of positions.

In the step of creating the bias vector, a larger bias may instead be given as the distance becomes larger.
According to this configuration, it is possible to provide a method that ranks highly those images whose position information is far from the position the user is interested in.

The step of receiving the input may further receive an input of the value of α.
According to this configuration, changing the relative weight of the image features and the position information makes it possible to adjust, according to the user's preference, which of the two is given priority in the ranking.

According to another aspect of the present invention, there is provided an image display system that searches for images having metadata and displays the images based on a ranking, the system including: an input receiving unit that receives an input including location information; a search unit that searches for the images based on the input; a matrix creation unit that extracts feature vectors of the retrieved images and creates a similarity matrix; a bias creation unit that creates a bias vector based on the distance between the position information in the metadata of each retrieved image and the position indicated by the input; a ranking calculation unit that creates a ranking by iteratively calculating Equation (1) below using the similarity matrix and the bias vector; and a display unit that displays the images based on the ranking, wherein the bias creation unit gives a larger bias as the distance becomes smaller.
Here, R is the ranking value, S is the normalized similarity matrix, P is the normalized bias vector, and α is a parameter that adjusts the strength of the bias (0 <= α <= 1).
R = α (S × R) + (1 − α) P   ... (1)
According to this configuration, it is possible to provide a system that searches a large body of image information and displays at the top the images that are likely to interest the user.

As described above, according to the present invention, images that are likely to interest the user can be displayed at the top from among a large body of image information.

A flowchart of the method for creating a ranking in the first embodiment of the present invention.
The user interface used in Experimental Example 1-1.
The list of 250 nouns used in Experimental Example 1-1.
The list of 100 adjectives used in Experimental Example 1-1.
The list of city names and latitudes/longitudes of the points of interest used in Experimental Example 1-1.
Images showing the result of Experimental Example 1-1. The query is “pyramid” and the points of interest are (a) Cairo, (b) Paris, (c) New York, and (d) Sydney.
Images showing the result of Experimental Example 1-1. The query is “traditional” and the points of interest are (a) Tokyo, (b) Sydney, (c) Rio de Janeiro, and (d) Delhi.
Images showing the result of Experimental Example 1-2, with “house” as the query and α = 1.0.
Images showing the result of Experimental Example 1-2, with “house” as the query and α = 0.9.
Images showing the result of Experimental Example 1-2, with “house” as the query and α = 0.8.
Images showing the result of Experimental Example 1-2, with “house” as the query and α = 0.5.
Images showing the result of Experimental Example 1-2, with “house” as the query and α = 0.0.
Images showing the result of Experimental Example 2 in the second embodiment when similarity based on metadata features is used.
Images showing the result of Experimental Example 2 in the second embodiment when similarity based on visual features is used.
Images showing the result of Experimental Example 3-1. The query is “phone” and the point of interest is Paris. (a) Similarity based on tag features. (b) Similarity based on visual features. (c) Similarity based on tag features combined with similarity based on visual features.
Images showing the result of Experimental Example 3-2. The query is “cat” and the point of interest is Tokyo. (a) Similarity based on tag features. (b) Similarity based on visual features. (c) Similarity based on tag features combined with similarity based on visual features.
Images showing the result of Experimental Example 4. The query is “insect” and the points of interest are the three locations Sydney, Delhi, and Cape Town.
Images showing the result of Experimental Example 5-1. The query is “castle”, the point of interest is Tokyo, and a positive bias is used.
Images showing the result of Experimental Example 5-1. The query is “castle”, the point of interest is Tokyo, and a negative bias is used.
Images showing the result of Experimental Example 5-2. The query is “arc de triomphe”, the point of interest is Paris, and a negative bias is used.
A block diagram of the image display system in the sixth embodiment.

<First Embodiment>
The method, apparatus, and the like according to each embodiment of the present invention are described below with reference to the drawings.
FIG. 1 is a flowchart of the method for creating a ranking in the present embodiment. In the flowchart, “S” denotes a processing step.

First, a query for the images to be searched and location information are received from the user (S100).
The image query typically consists of words and may contain any text, such as nouns, adjectives, or sentences, depending on the capabilities of the image search described later. An image may also be used as the input by means of image matching techniques.

The images to be searched carry additional information about the image, that is, metadata. The metadata includes GPS (Global Positioning System) information on the point where the image was taken, the date and time, the direction, comments, and so on. Typically it refers to the various kinds of information attached to an image as defined in Exif (Exchangeable Image File Format), the image file standard for digital cameras.

The location information to be input is information that represents the geographical position of a place such as a region, district, country, or city, and can be expressed as words, map information, or position information. Position information means physical quantities in a two-dimensional plane or three-dimensional space expressed in a coordinate system; typically it is GPS information consisting of latitude, longitude, height, and the like.

Images with metadata that match the received query are then retrieved (S110). This step collects the images that form the population for the subsequent steps, and the image search method itself is not restricted. Typically, a Web service using the search function of an API (Application Programming Interface) provided by Flickr (registered trademark), Google (registered trademark), or Yahoo (registered trademark) may be used. Of course, the system may instead hold its own image data, search it with its own mechanism for images that match the received query, and use the resulting set of images as the population for the subsequent steps.

Next, visual features are extracted from the retrieved images and a similarity matrix is created (S120).
The visual features of the retrieved images are represented by a color histogram and by the Bag of Features method with SIFT descriptors. The representation is not limited to these; an edge histogram or a Gabor feature histogram, for example, can also be used. A color histogram shows which colors appear in an image and in what proportion. It is typically built by dividing the RGB color space into 64 equal parts, so that the image is represented by a 64-dimensional vector.
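The 64-bin color histogram described above could be sketched as follows. This is only an illustration: the text does not say how the RGB space is divided into 64 parts, so the sketch assumes 4 levels per channel (4 × 4 × 4 = 64), and the function name `color_histogram` is hypothetical.

```python
def color_histogram(pixels, levels=4):
    """Return a normalized RGB histogram with levels**3 bins.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    levels=4 gives 4*4*4 = 64 bins; this split is an assumption, since
    the source only says the RGB space is divided into 64 equal parts.
    """
    step = 256 / levels
    hist = [0.0] * (levels ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantize each channel and clamp to the last level.
        ri, gi, bi = (min(int(c / step), levels - 1) for c in (r, g, b))
        hist[(ri * levels + gi) * levels + bi] += 1.0
        count += 1
    # Normalize so the histogram describes proportions of colors.
    return [v / count for v in hist] if count else hist
```

Normalizing by the pixel count makes histograms of differently sized images comparable, which matters when they are later compared pairwise.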

The Bag of Features method extracts local features from an image and represents the image by a histogram of their appearance frequencies. A local feature describes the characteristics of one part of an image and is extracted from multiple locations in the image. The SIFT method is an algorithm that detects keypoints and describes features. For each detected keypoint it describes a feature that is invariant to image changes caused by rotation, scale change, and illumination change, so it is effective not only for recognizing specific objects but also as a feature for image classification.

The procedure is as follows. First, for each collected image, the locations from which local features are extracted are determined. Various methods can be used, such as Difference of Gaussian (DoG), random sampling, and grid sampling; the experimental examples described later use random sampling. Local features are then extracted at the determined locations using the SIFT feature description.

Next, a codebook is created from the extracted feature descriptors. Specifically, the descriptors are clustered by the k-means method, and the center of each resulting cluster is used as a codebook element for the Bag of Features method.
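As a rough illustration of the codebook step, the following is a plain k-means sketch over descriptor vectors. It is not the implementation used in the experiments: the function name `kmeans`, the fixed iteration count, and seeding the centers with the first k descriptors are all simplifying assumptions made here.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; returns k cluster centers (the codebook elements).

    points: list of equal-length numeric tuples (e.g. SIFT descriptors).
    The first k points seed the centers, which is a simplification.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

With real 128-dimensional SIFT descriptors and a 500-element codebook, a library implementation would normally be preferred for speed; the sketch only shows the idea.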

Each feature descriptor extracted from an image is then assigned to the nearest codebook element, and a histogram is created. The appearance-frequency histogram obtained in this way is the image representation vector of the Bag of Features method: the image is represented by a histogram showing in what proportion features close to each codebook element appear.
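The assignment step can be sketched as below, assuming a codebook has already been built (for example by k-means). The function names are hypothetical, and normalizing the histogram to sum to 1 is a choice made for this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_features(descriptors, codebook):
    """Histogram of nearest-codebook-element assignments for one image.

    descriptors: list of local feature vectors from the image.
    codebook: list of cluster centers.
    Returns proportions that sum to 1 (all zeros if there are no descriptors).
    """
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        hist[nearest] += 1.0
    total = len(descriptors)
    return [v / total for v in hist] if total else hist
```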

Next, the similarity between each pair of images is obtained by histogram intersection, based on the color histogram and the feature appearance-frequency histogram obtained above. Histogram intersection takes, for each element of the histograms of the two images being compared, the smaller of the two values and sums these minima over all elements; the larger the sum, the higher the similarity. Specifically, as in Equation 10, a similarity matrix is computed separately for the color histogram and for the feature appearance-frequency histogram, and the two are combined by a linear sum to give a similarity matrix that mixes both features. In Equation 10, Scombine is the combined similarity matrix, Scolor is the similarity matrix from the color histogram, and Sbof is the similarity matrix from the feature appearance-frequency histogram.
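Histogram intersection and the linear combination can be sketched as follows. Equation 10 is not reproduced in this text, so the mixing weight `w` and its default of 0.5 are assumptions for illustration only, as are the function names.

```python
def histogram_intersection(h1, h2):
    """Sum of element-wise minima; larger means more similar."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def similarity_matrix(histograms):
    """Pairwise histogram-intersection similarities between all images."""
    n = len(histograms)
    return [[histogram_intersection(histograms[i], histograms[j])
             for j in range(n)] for i in range(n)]


def combine(s_color, s_bof, w=0.5):
    """Linear sum of two similarity matrices.

    The weight w is hypothetical: the coefficients of Equation 10 are
    not given in the text.
    """
    return [[w * a + (1 - w) * b for a, b in zip(row_c, row_b)]
            for row_c, row_b in zip(s_color, s_bof)]
```

For normalized histograms the intersection lies between 0 and 1, and an image compared with itself scores 1.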

Next, in the present embodiment, a bias vector for correcting the feature values is created from the position information contained in the metadata of the retrieved images and the position of the input location (S130). The procedure is as follows.

The position information contained in the metadata of an image is typically GPS information, in which the position where the image was taken is expressed by latitude and longitude. The position of the input location is information representing the geographical position of the place the user is interested in (the point of interest). At the time of input it is expressed as words, map information, or position information, but it is ultimately converted to latitude and longitude so that the distance on the earth between the two points can be obtained. Needless to say, if the distance between the two points is already available as data, that distance may be used directly without the calculation below.

When two points are expressed by latitude and longitude, the distance between them on the earth can be calculated by spherical trigonometry as in Equation 11, assuming that the earth is a perfect sphere of radius 1.
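Equation 11 itself is not reproduced in this text. A common way to compute such a distance is the spherical law of cosines, sketched below for a unit sphere; this is an assumption about the form of the formula, and the function name is hypothetical.

```python
import math


def unit_sphere_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius 1 (result in radians).

    Inputs are latitudes and longitudes in degrees. Uses the spherical
    law of cosines, assumed here to correspond to Equation 11.
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp against rounding error before taking the arc cosine.
    return math.acos(max(-1.0, min(1.0, c)))
```

Multiplying the result by the earth's radius would give a distance in length units, but only relative distances are needed for a bias.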

Based on this distance between the two points, the bias vector is created by Equation 12.

The element of this bias vector corresponding to image i becomes larger the closer image i is to the point of interest A, that is, the smaller the distance between the two. The bias vector is preferably normalized before use.
Even when two places are geographically close, the distance may be treated as larger (the weight made smaller) if they differ culturally. This can be realized by any suitable known method, for example by separately providing a weight table for cultural differences.
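Equation 12 is not reproduced in this text, so the exact mapping from distance to bias is unknown. The sketch below only illustrates the stated property (smaller distance, larger bias, then normalization) using an exponential decay chosen for this example; the function name, the `scale` parameter, and the `prefer_near` flag are all hypothetical.

```python
import math


def bias_vector(distances, scale=0.1, prefer_near=True):
    """Turn per-image distances into a normalized bias vector.

    distances: distance from each image to the point of interest
               (for example radians on a unit sphere).
    scale: decay length of the assumed exponential; NOT from the source.
    prefer_near: True gives larger bias to nearer images; False gives
                 larger bias to farther images (the variant the summary
                 describes for ranking distant images highly).
    """
    sign = -1.0 if prefer_near else 1.0
    raw = [math.exp(sign * d / scale) for d in distances]
    total = sum(raw)
    # Normalize so the elements sum to 1.
    return [v / total for v in raw]
```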

Next, VisualRank is executed using the similarity matrix and the bias vector created above to obtain the image ranking (S140). Concretely, executing VisualRank to obtain the ranking means iterating Equation 1 until the column vector R converges.

The ranking values are updated by multiplying the similarity matrix S by the vector of ranking values R (the VisualRank values). This update is repeated until R converges, and the images with large ranking values become the top-ranked images.

R is the VisualRank vector, which holds the VisualRank value of each image. Its initial value is the same for all images and may be, for example, 1.0. S is the image similarity matrix obtained above with each column normalized. S is normalized column by column so that the total of the VisualRank values does not change when they are updated.

If a uniform vector is given as the bias vector P, the correction acts to equalize the VisualRank values of the images. If a non-uniform vector is given, the correction emphasizes some of the images.
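The iteration of Equation 1, R = α(S × R) + (1 − α)P, can be sketched as follows. The stopping tolerance, the iteration cap, and the default α = 0.85 are assumptions for illustration. The sketch starts R at 1/n for each image rather than 1.0 so that R and the normalized P both sum to 1; that choice is made here and is not taken from the source.

```python
def normalize_columns(matrix):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def visual_rank(similarity, bias, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate R = alpha * (S x R) + (1 - alpha) * P until R stops changing.

    similarity: square similarity matrix (normalized per column here).
    bias: bias vector (normalized here to sum to 1).
    alpha, tol and max_iter are illustrative defaults, not source values.
    """
    n = len(similarity)
    s = normalize_columns(similarity)
    total = sum(bias)
    p = [b / total for b in bias]
    r = [1.0 / n] * n  # equal initial value for every image
    for _ in range(max_iter):
        new_r = [alpha * sum(s[i][j] * r[j] for j in range(n))
                 + (1 - alpha) * p[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r
```

Sorting image indices by descending value of the returned vector gives the ranking. With α = 0 the result is simply P, and with α = 1 the bias has no effect, which matches the role of α as the bias-strength parameter.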

<Experimental Example 1-1>
FIG. 2 shows the user interface that receives the image query and the location information from the user. Because the image population was created by the method described below, a query is composed by selecting specific nouns and adjectives. The place of interest is likewise selected from a menu. A field in which the user can type freely may of course be provided instead.

Using the search API of the Flickr online album service, searches were run for the 250 nouns and 100 adjectives shown in FIGS. 3 and 4, 350 words in total, and 2,000 images with position information were collected for each word. Because some Flickr users post large numbers of similar images, the number of images from any one user was limited. As shown in FIG. 5, ten cities of interest were selected, and their latitudes and longitudes were used as the position information of the points of interest.

After the images were collected, the VisualRank values were calculated for each word. The color histogram had 64 dimensions in RGB space, and the Bag of Features histogram built from SIFT descriptors had 500 dimensions.

FIGS. 6a to 6d show the results for the query “pyramid”. FIG. 6a is the result with Cairo as the point of interest; images of the Egyptian pyramids rank highest. In FIG. 6b the point of interest is Paris, and the pyramid-shaped structure in front of the Louvre Museum appears at the top. In FIG. 6c the point of interest is New York, and images of buildings taken in the eastern United States appear at the top. In FIG. 6d the point of interest is Sydney, and images related to “pyramid” taken in Australia appear at the top.

FIGS. 7a to 7d show the results for the query “traditional”. FIG. 7a is for Tokyo, FIG. 7b for Sydney, FIG. 7c for Rio de Janeiro, and FIG. 7d for Delhi. Images of people wearing the traditional costumes of each region were obtained. For “traditional” and Tokyo, for example, images of people wearing kimono and the like are ranked highest.
In this way, it is possible to provide a ranking method that searches a large body of image information and displays at the top the images that are likely to interest the user. Using GPS information also makes it possible to exclude images of the same object.

<Experimental Example 1-2>
Changing α in Equation 1 changes the relative weight of the image features and the location information, and with it the images displayed at the top, as shown below. The user can therefore adjust which kind of information is given priority in the ranking.

FIGS. 8a to 8e show the results for the query “house” with α set to (a) 1.0, (b) 0.9, (c) 0.8, (d) 0.5, and (e) 0.0. In (a), α = 1, so the location “Sydney” carries no weight at all, and images of Western-style houses appear at the top. In (b), α = 0.9, so the location carries a weight of 10%, and images of Australian houses rank highest. In (c), α = 0.8, the weight of the location is larger still, and images taken near Sydney, even within Australia, rank highest. In (d), α = 0.5, so image features and location are weighted 50:50, and the top results include images that were merely taken near Sydney and do not look much like a “house”. In (e), α = 0, so the image features carry no weight and only the location is used; the images are displayed in order of how close to Sydney they were taken, regardless of “house”.

<Second Embodiment>
The present embodiment uses the similarity of tags and descriptive text in the metadata attached to an image. Not only text attached directly to the image but also the text of a Web page containing the image may be used. In this case, the similarity matrix S is obtained by the following Equation 2, where Sv is the visual feature vector, St is the metadata feature vector, and β is a parameter that adjusts the weighting between the visual feature vector and the metadata feature vector.

Specifically, the text attached to the images is tallied for each query, and the most frequently attached texts, for example the top 500 kinds, are used as a codebook (the query itself is excluded from the codebook). A binary vector over this codebook serves as the feature vector of an image, so a 500-dimensional vector is created for each image. In each image's vector, an element is 1 if the corresponding text is attached to the image and 0 otherwise. The similarity between images is the cosine similarity of these binary vectors. Because each element is either 1 or 0, the similarity between images X and Y is given by the following Equation 13.
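The codebook and binary-vector steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample tags, and the choice to count each tag once per image are assumptions made for the example. For binary vectors, cosine similarity reduces to the number of shared tags divided by the square root of the product of the two tag counts.

```python
import math
from collections import Counter

def build_codebook(tag_lists, query, size=500):
    """Pick the `size` most frequent tags across all images, excluding the query."""
    # Assumption: a tag is counted once per image, however often it is repeated.
    counts = Counter(tag for tags in tag_lists for tag in set(tags) if tag != query)
    return [tag for tag, _ in counts.most_common(size)]

def binary_vector(tags, codebook):
    """1 where the image carries the codebook tag, 0 otherwise."""
    tag_set = set(tags)
    return [1 if tag in tag_set else 0 for tag in codebook]

def cosine_similarity(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Hypothetical tags for three images returned for the query "phone".
images = [
    ["phone", "london", "red", "phonebooth"],
    ["phone", "london", "red"],
    ["phone", "paris", "street"],
]
codebook = build_codebook(images, query="phone")
vectors = [binary_vector(tags, codebook) for tags in images]
sim_01 = cosine_similarity(vectors[0], vectors[1])  # two shared tags
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no shared tags
```

Excluding the query from the codebook matters because every retrieved image carries it, so it would raise all pairwise similarities without distinguishing any images.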

<Experimental Example 2>
FIG. 9 shows the results obtained using text-based similarity with “napoleon” as the query and “Sydney” as the location information. Most of the top-ranked images are related to the Napoleon fish. By contrast, when only the visual feature amount is used, unrelated images are also included, as shown in FIG. 10. This is thought to be because the text is assigned by people and is therefore in many cases directly related to the content of the image.

<Third Embodiment>
The present embodiment mixes the visual feature amounts and the metadata feature amounts (tag feature amounts) of a plurality of images. For example, a similarity matrix that mixes the visual feature amounts of two images with the metadata feature amount can be calculated as in Equation 14, where Sv1 and Sv2 are the similarity matrices of the respective visual feature amounts and St is the similarity matrix of the metadata feature amount.
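Equation 14 is not reproduced in this text, so the following is only a sketch under an assumed form: a weighted element-wise sum of the similarity matrices, with weights that sum to 1. The function name `mix_similarity`, the weights, and the sample matrices are all hypothetical and are not taken from the patent.

```python
def mix_similarity(matrices, weights):
    """Weighted element-wise sum of equally sized square similarity matrices."""
    if len(matrices) != len(weights):
        raise ValueError("one weight per matrix is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 2x2 similarity matrices: two visual features and one tag feature.
S_v1 = [[1.0, 0.2], [0.2, 1.0]]
S_v2 = [[1.0, 0.6], [0.6, 1.0]]
S_t = [[1.0, 0.9], [0.9, 1.0]]

# Assumed weighting: half tag-based, a quarter for each visual feature.
S = mix_similarity([S_v1, S_v2, S_t], [0.25, 0.25, 0.5])
```

With only one visual matrix and weights β and 1 − β, the same function would cover a two-way mix of visual and metadata similarity like the one described for the second embodiment, again assuming a convex combination.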

<Experimental Example 3-1>
FIGS. 11(a) to 11(c) show the results with “phone” as the query and Paris as the location information.
In (a), the similarity is obtained from the tag feature amount, and images of red British telephone boxes rank at the top. These images shared tags such as “london”, “red”, and “phonebooth”.
In (b), the similarity is obtained from the visual feature amount, and dark images of city streets rank at the top.
In (c), the similarities obtained separately from the tag feature amount and the visual feature amount are combined, and images showing an entire red telephone box are the main top-ranked images.
It is thought that the tag feature amount raised the rank of red telephone box images, while the visual feature amount favored images showing the whole telephone box.

<Experimental Example 3-2>
FIGS. 12(a) to 12(c) show the results with “cat” as the query and Tokyo as the location information.
In (a), the similarity is obtained from the tag feature amount, and images of cats taken indoors are the main top-ranked images. These images shared tags such as “pets” and “cute”.
In (b), the similarity is obtained from the visual feature amount, and images of cats taken outdoors are the main top-ranked images. This is thought to be due to the similarity of the background regions.
In (c), the similarities obtained separately from the tag feature amount and the visual feature amount are combined. Indoor and outdoor images are better balanced, and the diversity of the top-ranked images is improved.
These results suggest that combining the visual feature amount and the tag feature amount may yield top-ranked images that are plausible from more varied viewpoints, and may improve the diversity of the top-ranked images.

<Fourth Embodiment>
In the present embodiment, the bias vector is obtained from a plurality of pieces of location information. It can be obtained, for example, as the average, maximum, or minimum of the bias vectors of the individual pieces of location information.
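Combining per-location bias vectors as described above can be sketched as follows. The function name `combine_bias` and the sample vectors are hypothetical, and rescaling the result so that it sums to 1 is an assumption of this sketch rather than something the text specifies.

```python
def combine_bias(bias_vectors, mode="mean"):
    """Combine per-location bias vectors element-wise, then rescale to sum to 1."""
    reducers = {
        "mean": lambda vals: sum(vals) / len(vals),
        "max": max,
        "min": min,
    }
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode}")
    # zip(*...) groups the bias values that belong to the same image.
    combined = [reducers[mode](vals) for vals in zip(*bias_vectors)]
    total = sum(combined)
    # Assumption: the combined vector is renormalized so it still sums to 1.
    return [v / total for v in combined] if total else combined

# Hypothetical bias vectors over four images, one vector per input location.
p_sydney = [0.7, 0.1, 0.1, 0.1]
p_delhi = [0.1, 0.7, 0.1, 0.1]
p_cape_town = [0.1, 0.1, 0.7, 0.1]

p_mean = combine_bias([p_sydney, p_delhi, p_cape_town], mode="mean")
p_max = combine_bias([p_sydney, p_delhi, p_cape_town], mode="max")
p_min = combine_bias([p_sydney, p_delhi, p_cape_town], mode="min")
```

In this toy data the mean and the maximum both favor an image that is close to any one of the locations, whereas the minimum only favors images that are reasonably close to all of them, so here it gives every image the same weight.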

<Experimental Example 4>
FIG. 13 shows the results with “insect” as the query and Sydney, Delhi, and Cape Town as the location information serving as points of interest. A bias vector is created for each of the three pieces of location information, and their average vector is used. By using three cities surrounding the Indian Ocean, images of insects taken across the wide region around the Indian Ocean are obtained as the top-ranked images.

<Fifth Embodiment>
In the present embodiment, a larger value is given to an image whose position information indicates a position farther from the position given by the location information; that is, a negative bias vector is obtained. It can be obtained by Equation 15. For example, to search for images of triumphal arches other than the one in Paris, “Paris” can be given as the input value of the location information.
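Equation 15 is not reproduced in this text, so the sketch below only illustrates the idea of a distance-based bias under assumed forms: a weight that decays as 1 / (1 + d) for the ordinary (near-is-better) bias and a weight proportional to the distance d for the negative bias. The function names, both weighting forms, the normalization, and the sample coordinates are assumptions, not the patent's formula. The great-circle distance uses the standard haversine formula.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def bias_vector(image_coords, target, negative=False):
    """Bias per image: near images score high, or far images when negative=True."""
    dists = [haversine_km(lat, lon, *target) for lat, lon in image_coords]
    if negative:
        raw = dists                             # assumed: weight grows with distance
    else:
        raw = [1.0 / (1.0 + d) for d in dists]  # assumed: weight decays with distance
    total = sum(raw)
    return [v / total for v in raw] if total else raw

# Approximate, hypothetical coordinates: the target is Paris; the first image
# was taken near the target and the other two far away from it.
paris = (48.87, 2.30)
coords = [(48.87, 2.29), (13.75, 100.50), (22.54, 114.06)]

positive = bias_vector(coords, paris)
negative = bias_vector(coords, paris, negative=True)
```

With the negative bias, the image taken near Paris receives almost no weight, which matches the stated goal of retrieving images taken elsewhere.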
<Experimental Example 5-1>
FIG. 14(a) shows the results with “castle” as the query and Tokyo as the location information.
Because a large bias value was given to images taken close to Tokyo, images of Japanese castles ranked at the top.
FIG. 14(b) shows, conversely, the results when a large bias value was given to images taken far from Tokyo. Images of buildings, mainly castles, taken in various places far from Japan rank at the top.
<Experimental Example 5-2>
FIG. 15 shows the results with “arc de triomphe” as the query and a negative bias vector for Paris. Images of a triumphal arch in Thailand (Patuxai) and of an imitation of the French Arc de Triomphe in Shenzhen, China, rank at the top, so “images of triumphal arches other than the one in Paris” are obtained.

The ranking creation method described above can be realized as a program that causes a computer to execute it, or can be provided as a storage medium that stores such a program in a form readable and executable by a computer.

<Sixth Embodiment>
FIG. 16 is a block diagram of an image display system that implements the ranking creation method described above.

The input receiving unit 10 of the image display system 1 in the present embodiment receives, as input to the image display system 1, a query for the images that the user 99 is interested in searching for, together with location information. The details are as described above with reference to FIG. 2.

The search unit 20 searches the image data 100 in the database and the image data 200 on the Internet based on the image query received by the input receiving unit 10. The matrix creation unit 30 extracts feature amounts from the images that the search unit 20 found to match the search conditions, and creates a similarity matrix. The bias creation unit 40 creates a bias vector for correcting the feature amounts from the position information included in the metadata of the images found by the search unit 20 and the position of the location information input to the input receiving unit 10. The ranking calculation unit 50 executes VisualRank using the similarity matrix created by the matrix creation unit 30 and the bias vector created by the bias creation unit 40, and obtains a ranking of the images. The display unit 60 displays the images to the user 99 in a display order based on the ranking calculated by the ranking calculation unit 50.
This makes it possible to provide a system that searches a large amount of image information and displays at the top the images considered to be of interest to the user.
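The calculation performed by the ranking calculation unit 50 can be sketched as a fixed-point iteration of R = α(S × R) + (1 − α)P, the equation stated in the claims. This is a minimal sketch under stated assumptions: column-normalizing S before iterating, the uniform starting vector, the convergence tolerance, and the default α are choices made for the example, and the sample matrix and bias vector are hypothetical.

```python
def column_normalize(S):
    """Scale each column of S to sum to 1 so the iteration preserves total rank mass."""
    n = len(S)
    col_sums = [sum(S[i][j] for i in range(n)) for j in range(n)]
    return [[S[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def visual_rank(S, P, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate R = alpha * (S R) + (1 - alpha) * P until R stops changing."""
    n = len(P)
    S = column_normalize(S)  # assumption: S is normalized before iterating
    R = [1.0 / n] * n        # assumption: start from a uniform ranking vector
    for _ in range(max_iter):
        new_R = [alpha * sum(S[i][j] * R[j] for j in range(n)) + (1 - alpha) * P[i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_R, R)) < tol:
            return new_R
        R = new_R
    return R

# Hypothetical data: images 0 and 1 look alike; image 2 was taken nearest the target.
S = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
P = [0.1, 0.1, 0.8]

rank_visual_only = visual_rank(S, P, alpha=1.0)    # location ignored
rank_mixed = visual_rank(S, P, alpha=0.5)          # similarity and location mixed
rank_location_only = visual_rank(S, P, alpha=0.0)  # equals the bias vector P
```

Setting α to 1 or 0 reproduces the two extremes described for Experimental Example 1-2: ranking by image similarity alone, or by proximity to the input location alone.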

DESCRIPTION OF SYMBOLS
1 Image display system
10 Input receiving unit
20 Search unit
30 Matrix creation unit
40 Bias creation unit
50 Ranking calculation unit
60 Display unit
99 User
100 Image data in the database
200 Image data on the Internet

Claims

1. A method performed by an apparatus that searches for images to which at least position information is attached as metadata and creates a ranking of the retrieved images, the method comprising:
receiving an input including a query for the images to be searched and location information;
searching for the images based on the query;
extracting feature amounts of the retrieved images and obtaining a similarity between the retrieved images based on the extracted feature amounts;
calculating a distance between the position indicated by the position information in the metadata of each retrieved image and the position indicated by the input location information; and
creating the ranking using the similarity and the distance.

2. A system that searches for images to which at least position information is attached as metadata and creates a ranking of the retrieved images, the system comprising:
an input receiving unit that receives an input including a query for the images to be searched and location information;
a search unit that searches for the images based on the query;
a similarity creation unit that extracts feature amounts of the retrieved images and obtains a similarity between the retrieved images based on the extracted feature amounts;
a distance calculation unit that calculates a distance between the position indicated by the position information in the metadata of each retrieved image and the position indicated by the input location information; and
a ranking calculation unit that creates the ranking using the similarity and the distance.

3. The ranking creation system according to claim 2, wherein the input receiving unit further receives an input of a weighting between the similarity and the distance to be used when the ranking is created.

4. The ranking creation system according to claim 2, wherein
the input receiving unit receives an input of a plurality of pieces of the location information,
the distance calculation unit calculates the distance for each of the plurality of pieces of location information, and
the ranking calculation unit creates the ranking using the similarity and the plurality of calculated distances.

5. The ranking creation system according to claim 2, wherein a larger bias is applied to the ranking as the distance is smaller.

6. The ranking creation system according to claim 2, wherein a larger bias is applied to the ranking as the distance is larger.

7. The ranking creation system according to any one of claims 2 to 6, wherein the similarity creation unit extracts a visual feature amount of each retrieved image, extracts a text feature amount vector from text information in the metadata of each retrieved image, and obtains the similarity based on the extracted visual feature amount and the text feature amount.

8. The ranking creation system according to any one of claims 2 to 7, wherein the similarity creation unit obtains the similarity from a plurality of images.

9. The ranking creation system according to claim 2, further comprising a bias creation unit that creates a bias vector based on the distance calculated by the distance calculation unit, wherein
the similarity creation unit creates a similarity matrix, and
the ranking calculation unit creates the ranking by repeatedly calculating the following Equation (1) using the similarity matrix and the bias vector:
R = α (S × R) + (1 − α) P (1)
where
R is the ranking value,
S is the similarity matrix,
P is the bias vector, and
α is a parameter (0 <= α <= 1) for adjusting the strength of the bias.

10. A program for causing a computer to function as the ranking creation system according to any one of claims 2 to 9.