JP2013054458A - Image identification information assigning program and image identification information assigning device

Info

Publication number: JP2013054458A
Application number: JP2011190967A
Authority: JP
Inventors: Bunen Seki; 文渊戚; Noriji Kato; 典司加藤
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2011-09-01
Filing date: 2011-09-01
Publication date: 2013-03-21
Anticipated expiration: 2031-09-01
Also published as: JP5754306B2

Abstract

PROBLEM TO BE SOLVED: To provide an image identification information assigning program and an image identification information assigning device which assign a plurality of pieces of identification information to an image by using correlation information concerning images. SOLUTION: An image identification information assigning device 1 comprises: a feature extraction unit 3 for extracting respective feature amounts from a plurality of query images 120; an annotation score quantization unit 5 for calculating respective quantization values for a plurality of labels to be assigned to a query image 120 by using a learning model 130 on the basis of the feature amounts extracted by the feature extraction unit 3; and a label estimation unit 7 which has as many random field models as there are labels, and inputs the quantization value of each label into the plurality of random field models to output scores of the plurality of labels for each query image 120.

Description

The present invention relates to an image identification information assigning program and an image identification information assigning device.

In recent years, image annotation has become an important technology for image retrieval systems, image recognition systems, and the like in image database management. With image annotation, a user can, for example, search for images that are semantically close to a desired image.

Image annotation techniques include, for example, those disclosed in Patent Documents 1 to 4. These techniques assign semantic labels to an unknown image by extracting image feature amounts, retrieving similar images with a nearest neighbor (NN) algorithm, and assigning labels to the target image using the labels attached to the retrieved similar images. However, assigning labels only from the images retrieved by the nearest neighbor algorithm has the problem that annotation accuracy is not high.

To mitigate this problem, the techniques of Patent Documents 5 and 6 have been proposed. They estimate the probability of each label using a classifier trained on the frequency with which labels appear for image features.

In addition, to improve existing classification methods, a model has been proposed that models the correlation between labels and feature amounts by canonical correlation analysis (CCA) and thereby bridges the gap between image feature amounts and semantic labels (see, for example, Non-Patent Document 1).

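Patent Document 1: 特開２００５−３５２７８２号公報 (JP 2005-352882 A)
Patent Document 2: 特開２００７−１０９０６７号公報 (JP 2007-109067 A)
Patent Document 3: 特開２００９−１８８９５１号公報 (JP 2009-188951 A)
Patent Document 4: 特開２０１０−２７１７６９号公報 (JP 2010-271769 A)
Patent Document 5: 特開２０００−３５３１７３号公報 (JP 2000-353173 A)
Patent Document 6: 特開２００９−４８３３４号公報 (JP 2009-48334 A)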

Non-Patent Document 1: T. Bailloeul, C. Zhu and Y. Xu, “Automatic image tagging as a random walk with priors on the canonical correlation subspace”, MIR 2008

However, in the methods disclosed in Patent Documents 5 and 6, a classifier is constructed for each object class and the posterior probability of each label is calculated independently, so correlations between classes cannot be used. The method disclosed in Non-Patent Document 1 estimates labels from the feature amounts of a target image by a random walk on a graph model constructed by CCA; it may fall into a local minimum and is also computationally time-consuming.

An object of the present invention is to provide an image identification information assigning program and an image identification information assigning device that assign a plurality of pieces of identification information to an image using correlation information concerning images.

[1] An image identification information assigning program for causing a computer to function as: extraction means for extracting a feature amount from each of a plurality of images; calculation means for calculating, from the feature amounts extracted by the extraction means and using a learning model, a first evaluation value for each of a plurality of pieces of identification information to be assigned to the images; and output means that has a number of random field models corresponding to the number of pieces of identification information, inputs the first evaluation values calculated by the calculation means for each piece of identification information for the plurality of images into the plurality of random field models, and outputs, for each image, second evaluation values for the plurality of pieces of identification information.

[2] The image identification information assigning program according to [1], further comprising optimization means for optimizing the random field models of the output means based on correlation information between the plurality of images.

[3] The image identification information assigning program according to [1], further comprising optimization means for optimizing the random field models of the output means based on correlation information between the plurality of pieces of identification information.

[4] An image identification information assigning device comprising: extraction means for extracting a feature amount from each of a plurality of images; calculation means for calculating, from the feature amounts extracted by the extraction means and using a learning model, a first evaluation value for each of a plurality of pieces of identification information to be assigned to the images; and output means that has a number of MRF models corresponding to the number of pieces of identification information, inputs the first evaluation values calculated by the calculation means for each piece of identification information for the plurality of images into the plurality of random field models, and outputs, for each image, second evaluation values for the plurality of pieces of identification information.

According to the invention described in claim 1 or 4, a plurality of pieces of identification information can be assigned to an image using correlation information concerning images.

According to the invention described in claim 2, a plurality of pieces of identification information optimized based on correlation information between the plurality of images can be assigned to an image.

According to the invention described in claim 3, a plurality of pieces of identification information optimized based on correlation information between the plurality of pieces of identification information can be assigned to an image.

FIG. 1 is a block diagram showing a configuration example of an image identification information assigning device according to a first embodiment of the present invention.
FIG. 2 shows a schematic configuration example of the label estimation unit, where (a) is a plan view and (b) is a side view.
FIG. 3 is a flowchart showing an operation example of the first embodiment.
FIG. 4 is a block diagram showing a configuration example of an image identification information assigning device according to a second embodiment of the present invention.

Embodiments of the present invention will be described below with reference to the drawings. In the drawings, components having substantially the same function are denoted by the same reference numerals, and redundant description is omitted.

[First Embodiment]
FIG. 1 is a block diagram showing a configuration example of an image identification information assigning device according to the first embodiment of the present invention. The image identification information assigning device 1 is roughly composed of an image receiving unit 2, a feature extraction unit 3, a label posterior probability calculation unit 4, a quantization unit 5, a node junction unit 6, a label estimation unit 7, a label assigning unit 8, an annotation information output unit 9, and a storage unit 10.

A conventional annotation means extracts feature amounts from the learning images in a learning corpus (pairs of a learning image and the labels assigned to it) by a known feature extraction method, and learns the relationship between feature amounts and labels as a classification model. The learned classification model, that is, the learning model 130, is stored in a database. To assign a label to a query image (also called an input image or unknown image) 120, the posterior probability of each label is calculated for the query image 120 using the learning model 130, and the label with the highest value is taken as the estimation result.

In this specification, “annotation” means assigning labels to an entire image. A “label” is an example of identification information; it is identification information, for example a word, that represents the content of the whole image or of a partial region.

In the present embodiment, after the label posterior probability calculation unit 4 calculates the posterior probabilities of the labels, the ranking of the labels is adjusted by MRF models or CRF models based on correlation information between the query images 120, and labels are assigned to the query images 120. Here, an “MRF model” is a Markov random field (MRF) model, and a “CRF model” is a conditional random field (CRF) model. The Markov random field model and the conditional random field model are examples of random field models.

The following description focuses on the characteristic parts of the present embodiment, namely the quantization unit 5, the node junction unit 6, and the label estimation unit 7.

The image receiving unit 2 receives the query images 120, which are the target images to be labeled.

The feature extraction unit 3 is an example of extraction means and extracts feature amounts from the query images 120. A feature amount is, for example, a sequence of image features such as colors (for example R, G, and B) and texture.

The label posterior probability calculation unit 4 calculates the posterior probability P(c|f) of each label c from the feature amount f and outputs it as an annotation score (analog value) for each label.

The storage unit 10 stores various programs such as an image identification information assigning program 110, and various data such as the query images 120, the learning model 130, a label dictionary 140, and link information 150. The storage unit 10 is composed of, for example, a ROM, a RAM, and an HDD.

The annotation information output unit 9 outputs the annotation information (labels and scores) assigned by the label assigning unit 8 to the outside; for example, a display unit such as a liquid crystal display or a printing unit such as a printer can be used.

(Quantization unit)
In conventional image annotation techniques using MRFs or CRFs, as disclosed for example in the non-patent document H. J. Escalante, M. Montes and L. E. Sucar, “Word co-occurrence and Markov Random Field for Improving Automatic Image Annotation”, BMVC, 2007, an MRF model is constructed using label co-occurrence, label probabilities are input as observed values, and labels are estimated for the input image. In this prior art, the hidden node that estimates the label of an image selects one label from among a plurality of labels. Consequently, only one label can be assigned to an entire image or image region, and the technique cannot be applied to annotation that assigns a plurality of labels to an entire image.

To solve this, the present embodiment has one MRF or CRF model for each label, and the hidden nodes of each model hold quantized label probabilities. The ranking of the labels is determined by the quantized values estimated by the MRF or CRF models, and a plurality of high-scoring labels are assigned to one image.

The quantization unit 5 of the present embodiment quantizes the annotation scores, which are analog values calculated for each label by the label posterior probability calculation unit 4. A quantized value is a discretized value. So that the annotation scores are spread evenly over the quantization levels, the levels are determined by histogram equalization. The quantized values are used as the initial states of the hidden variables of the posterior probability calculation nodes 72 of the MRF models 70_1 to 70_N described later. Table 1 shows an example of annotation scores (analog values) and the corresponding quantized values. In Table 1, M is the number of images and N is the number of labels. Here, the label posterior probability calculation unit 4 and the quantization unit 5 are an example of calculation means, and the annotation scores (analog values) calculated by the label posterior probability calculation unit 4 and the quantized values output by the quantization unit 5 are examples of the first evaluation value.
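As a purely illustrative sketch, and not part of the disclosed embodiment, quantization by histogram equalization could be written as follows in Python. The function names, the choice of four levels, and the sample scores are assumptions made for the example; the specification does not state the number of levels.

```python
from bisect import bisect_right


def equalized_thresholds(scores, levels):
    """Return levels-1 cut points that split the scores into roughly equal-sized bins."""
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[(k * n) // levels] for k in range(1, levels)]


def quantize(score, thresholds):
    """Map an analog annotation score to a discrete level in 0..len(thresholds)."""
    return bisect_right(thresholds, score)


# Hypothetical annotation scores of one label over eight images.
scores = [0.02, 0.05, 0.10, 0.15, 0.30, 0.45, 0.70, 0.95]
cuts = equalized_thresholds(scores, levels=4)
quantized = [quantize(s, cuts) for s in scores]
print(cuts)       # [0.1, 0.3, 0.7]
print(quantized)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Because the cut points are taken from the empirical distribution of the scores rather than from equal-width intervals, each level receives about the same number of images even when most scores are close to zero.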

(Node junction unit)
The node junction unit 6 of the present embodiment generates, based on correlation information between images, position information (link information) 150 of the junction links 76 that join posterior probability calculation nodes 72 and estimated posterior probability calculation nodes 73, and stores the generated link information 150 in the storage unit 10. The input-side links 75 and the output-side links 77 are provided in advance. As the correlation information between images, for example the image capture time or the similarity of image feature amounts can be used. An example that uses the correlation between annotations (labels) as the correlation information is described later.

The node junction unit 6 receives a series of query images (a query image set) 120, calculates the correlation information between the images, and then determines how the posterior probability calculation nodes 72 and the estimated posterior probability calculation nodes 73 of the MRF models 70_1 to 70_N are joined. As one example of a joining method, when the similarity of feature amounts between images is at or above a threshold, a junction link 76 is provided between the posterior probability calculation node 72 and the estimated posterior probability calculation node 73 corresponding to those images; when the similarity is below the threshold, no junction link 76 is provided between the corresponding nodes 72 and 73. A junction link 76 may also be provided for images whose capture times are close to each other. Table 2 shows an example of the link information 150. The number of images equals the number of posterior probability calculation nodes 72 and the number of estimated posterior probability calculation nodes 73. In Table 2, “1” indicates that there is a junction link 76 between the nodes 72 and 73, and “0” indicates that there is none.

The link information 150 can be generated in advance from the correlation information of the images, but it may also be generated dynamically. That is, depending on the states of the hidden variables of the posterior probability calculation nodes 72, a junction link 76 between nodes 72 and 73 may be added automatically when the distance between hidden variables, or the difference between their quantized values, is at or below a threshold, and removed automatically when it exceeds the threshold.

(Label estimation unit)
FIG. 2 is a diagram showing a schematic configuration example of the label estimation unit 7. The label estimation unit 7 has MRF models 70_1 to 70_N, one for each label. It receives the initial states of the hidden variables of the nodes of the corresponding MRF models 70_1 to 70_N and the link information 150 that connects the nodes, and optimizes the quantization states of the labels by a graph-cut message-passing method (Yuri Boykov, O. Veksler, R. Zabih, “Fast Approximate Energy Minimization via Graph Cuts”, PAMI 2001).
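The cited graph-cut method is too long for a short example. The following Python sketch therefore substitutes iterated conditional modes (ICM), a simpler local energy minimizer, only to illustrate how linked nodes in the model of one label can pull each other's quantized states together. The energy function (absolute deviation from the initial state plus absolute difference between linked nodes), the weight, and all names are assumptions and are not taken from the specification.

```python
def icm_smooth(initial, links, levels, smooth_weight=1.0, max_sweeps=10):
    """Minimise sum_i |x_i - initial_i| + w * sum over linked pairs |x_i - x_j| by ICM.

    initial: quantized score of one label for each image (initial hidden states).
    links:   symmetric 0/1 matrix, 1 where two images are joined by a link.
    """
    state = list(initial)
    m = len(state)

    def local_energy(i, value):
        data_term = abs(value - initial[i])
        pair_term = sum(abs(value - state[j]) for j in range(m) if links[i][j])
        return data_term + smooth_weight * pair_term

    for _ in range(max_sweeps):
        changed = False
        for i in range(m):
            best = min(range(levels), key=lambda v: local_energy(i, v))
            if local_energy(i, best) < local_energy(i, state[i]):
                state[i] = best
                changed = True
        if not changed:
            break
    return state


# Images 0, 1 and 2 are mutually linked; image 3 has no links.
links = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
final = icm_smooth([3, 3, 0, 0], links, levels=4)
print(final)  # [3, 3, 3, 0]
```

In this toy run the score of image 2 is raised to match its two linked neighbours, while the unlinked image 3 keeps its initial state. ICM can stop in a local minimum, which is one reason a stronger optimizer such as the cited graph-cut approach may be preferred.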

Since the MRF models 70_1 to 70_N have the same structure, the MRF model 70_1 is described as a representative. As shown in FIG. 2(a), the MRF model 70_1 is roughly composed of input nodes 71_1 to 71_M to which quantized values Q are input; posterior probability calculation nodes 72_1 to 72_M that hold the posterior probabilities output from the annotation score quantization unit 5; estimated posterior probability calculation nodes 73_1 to 73_M that calculate the estimated posterior probabilities; output nodes 74_1 to 74_M that output label scores; input-side links 75_1 to 75_M that join the input nodes 71_1 to 71_M to the posterior probability calculation nodes 72_1 to 72_M; junction links 76 that join the posterior probability calculation nodes 72_1 to 72_M to the estimated posterior probability calculation nodes 73_1 to 73_M; and output-side links 77_1 to 77_M that join the estimated posterior probability calculation nodes 73_1 to 73_M to the output nodes 74_1 to 74_M. Because the images correspond one-to-one to the input nodes 71_1 to 71_M and to the output nodes 74_1 to 74_M, the number of input nodes 71_1 to 71_M and the number of output nodes 74_1 to 74_M in each of the MRF models 70_1 to 70_N equal the number of images M.

For example, the quantized values Q_11 to Q_1N of the first image (Image1) are input to the input nodes 71_1 of the MRF models 70_1 to 70_N, the quantized values Q_21 to Q_2N of the next image (Image2) are input to the input nodes 71_2 of the MRF models 70_1 to 70_N, and likewise the quantized values Q_M1 to Q_MN of the M-th image (ImageM) are input to the input nodes 71_M of the MRF models 70_1 to 70_N. The scores of the labels L_1 to L_N for images 1 to M are then output from the output nodes 74_1 to 74_M of the MRF models 70_1 to 70_N.

The input-side links 75_1 to 75_M and the output-side links 77_1 to 77_M are provided in advance. The junction links 76 are provided by the node junction unit 6 based on the link information 150. The junction links 76 not only join the posterior probability calculation nodes 72_1 to 72_M and the estimated posterior probability calculation nodes 73_1 to 73_M within one MRF model 70, but also join nodes across the MRF models 70_1 to 70_N.

With the above configuration, the states of the corresponding nodes of all the MRF models 70 are compared and labels are assigned to each image. That is, for image M, the values of the output nodes 74_M of the MRF models 70_1 to 70_N are compared, and the top-ranked labels are assigned to that image. Here, the label estimation unit 7 is an example of output means, and the scores of the labels L_1 to L_N output from the output node 74_1 are an example of the second evaluation value.
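The comparison of output nodes across the per-label models can be sketched as below. This is a hedged illustration only: the dictionary layout, the label names, and the number of labels kept per image (k) are assumptions, since the specification says only that the top-ranked labels are assigned.

```python
def top_labels(scores_by_label, image_index, k):
    """Return the k labels whose model outputs the highest score for one image.

    scores_by_label maps each label to the list of output-node scores of its
    model, one score per image.
    """
    ranked = sorted(
        scores_by_label,
        key=lambda label: scores_by_label[label][image_index],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical output-node scores of four label models for two images.
scores_by_label = {
    "sky": [3, 0],
    "sea": [2, 1],
    "car": [0, 3],
    "tree": [1, 2],
}
print(top_labels(scores_by_label, image_index=0, k=2))  # ['sky', 'sea']
print(top_labels(scores_by_label, image_index=1, k=2))  # ['car', 'tree']
```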

(Operation of the first embodiment)
FIG. 3 is a flowchart showing an operation example of the first embodiment. The present embodiment is characterized in that junction links 76 are added to the MRF models 70_1 to 70_N based on the correlation information of the images.

When the image receiving unit 2 receives the query images 120, the feature extraction unit 3 extracts feature amounts from the query images 120.

The label posterior probability calculation unit 4 calculates the posterior probability of each label for the query image 120 using the learning model 130 stored by a known classifier (S1), and outputs the posterior probabilities as annotation scores.

The quantization unit 5 quantizes the annotation scores output by the label posterior probability calculation unit 4 according to predetermined thresholds (S4). The quantized values are set as the initial values of the hidden nodes, and the graph-cut message-passing method then leaves the estimated final states of the hidden variables in the estimated posterior probability calculation nodes 73.

Next, after all the estimated posterior probability calculation nodes 73 have been processed, the link information 150 that connects the nodes 72 and 73 is acquired. The node junction unit 6 adds junction links 76 based on the correlation information of the images (S5). When the correlation information is time, a junction link 76 is added between the nodes 72 and 73 of an image pair if the difference between the capture times of the pair is at most a predetermined time (for example, 5 hours), and no junction link 76 is added if the difference exceeds that time.
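The time-based rule can be sketched as follows. This is an illustration under assumptions: the 0/1 matrix mirrors the “1”/“0” convention described for the link information 150, while the function name and the sample capture times are invented for the example.

```python
from datetime import datetime, timedelta


def time_links(timestamps, max_gap=timedelta(hours=5)):
    """Return a symmetric 0/1 matrix with 1 where two capture times differ by at most max_gap."""
    m = len(timestamps)
    links = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if abs(timestamps[i] - timestamps[j]) <= max_gap:
                links[i][j] = links[j][i] = 1
    return links


# Hypothetical capture times of three query images.
shots = [
    datetime(2011, 9, 1, 9, 0),
    datetime(2011, 9, 1, 13, 30),
    datetime(2011, 9, 1, 20, 0),
]
links = time_links(shots)
print(links)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Only the first two images, taken 4.5 hours apart, are linked; the third lies more than 5 hours from both.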

When the correlation information is image similarity, various feature amounts are extracted from the images. Examples include RGB, normalized-RG, HSV (color space), LAB, robustHue features (see van de Weijer, C. Schmid, “Coloring Local Feature Extraction”, ECCV 2006), Gabor features, DCT (Direction Curve Tangent) features, SIFT (Scale Invariant Feature Transform) features, and GIST (Generalized Search Tree) features; any feature may be used. The similarity between images is taken to be the distance between their feature amounts. When the normalized distance is 0.5 or less, a junction link 76 is added between the pair of nodes 72 and 73 corresponding to the image pair; when it is greater than 0.5, no junction link 76 is added.
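A minimal sketch of the distance-based rule follows. The specification does not say how the distance is normalized; dividing Euclidean distances by the largest pairwise distance is an assumption made here, as are the function names and the two-dimensional sample features.

```python
import math


def normalized_distances(features):
    """Pairwise Euclidean distances divided by the largest one, so values lie in [0, 1]."""
    m = len(features)
    dist = [[math.dist(features[i], features[j]) for j in range(m)] for i in range(m)]
    largest = max(max(row) for row in dist) or 1.0  # avoid dividing by zero
    return [[d / largest for d in row] for row in dist]


def similarity_links(features, threshold=0.5):
    """Return a 0/1 matrix with 1 where the normalized distance is at most threshold."""
    dist = normalized_distances(features)
    m = len(features)
    return [
        [1 if i != j and dist[i][j] <= threshold else 0 for j in range(m)]
        for i in range(m)
    ]


# Hypothetical feature vectors of three images.
features = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
links = similarity_links(features)
print(links)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```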

The MRF model 70 corresponding to one label is constructed as described above. In the next step, the MRF model 70 is optimized (S6). That is, the hidden variable states calculated in step S4 are input to the posterior probability calculation nodes 72 of the MRF models 70_1 to 70_N corresponding to the labels, the link information 150 that joins the nodes 72 and 73 is input, and junction links 76 are added between the nodes 72 and 73. Steps S4, S5, and S6 are performed for all labels and nodes (S2, S3).

Finally, once the MRF models 70_1 to 70_N corresponding to all labels have been optimized, the final states of the hidden variables of the estimated posterior probability calculation nodes 73_1 to 73_M of all the MRF models 70_1 to 70_N corresponding to one image are integrated; as a result, all annotation scores for the image have been adjusted. The adjusted annotation scores are then ranked, and labels are assigned to the query image in descending order of score (S7). For example, when the quantized values Q_11 to Q_1N of one image (Image1) are input to the input nodes 71_1 of the MRF models 70_1 to 70_N, the hidden variables of the estimated posterior probability calculation nodes 73_1 of all the MRF models 70_1 to 70_N are output from the output nodes 74_1 as the scores of the labels L_1 to L_N.

(Effects of the first embodiment)
According to the first embodiment, the MRF models are optimized based on correlation information between a plurality of images, so a plurality of labels can be assigned to an image with higher accuracy than when this configuration is not adopted.

[Second Embodiment]
FIG. 4 is a block diagram showing a configuration example of an image identification information assigning device according to the second embodiment of the present invention. Like the first embodiment, the image identification information assigning device 1 of this embodiment is roughly composed of an image receiving unit 2, a feature extraction unit 3, a label posterior probability calculation unit 4, a quantization unit 5, a node junction unit 6, a label estimation unit 7, a label assigning unit 8, an annotation information output unit 9, and a storage unit 10. This embodiment differs from the first embodiment in the node junction unit 6; the rest is configured in the same way as the first embodiment and operates in the same way, so its description is omitted.

The node junction unit 6 of the present embodiment generates junction links 76 between the posterior probability calculation nodes 72 and the estimated posterior probability calculation nodes 73 of the MRF model 70 based on label correlation information, and stores link information, which is the position information of the generated junction links 76, in the storage unit 10. As the label correlation information, for example, the top five labels by quantized annotation score are listed for each image of an image pair, and the number of labels the two images have in common is counted regardless of rank. If the number of common labels is one or more, a junction link 76 is attached between the corresponding nodes 72 and 73; if the number of common labels is zero, no junction link 76 is attached between the corresponding nodes 72 and 73.
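The label-overlap rule above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the symmetric image-by-image 0/1 matrix layout, and the zero diagonal are choices made here for illustration, and the example scores are made up. Only the rule itself (link when the top-k label sets share at least one label, k = 5 in the text) comes from the description.

```python
from typing import Dict, List, Set


def top_labels(scores: Dict[str, int], k: int = 5) -> Set[str]:
    """Labels with the k highest quantized annotation scores, rank ignored."""
    return set(sorted(scores, key=lambda label: scores[label], reverse=True)[:k])


def label_overlap_links(images: List[Dict[str, int]], k: int = 5) -> List[List[int]]:
    """Build a 0/1 matrix: 1 where two images share at least one top-k label."""
    tops = [top_labels(s, k) for s in images]
    m = len(images)
    links = [[0] * m for _ in range(m)]  # diagonal left at 0 (assumption)
    for i in range(m):
        for j in range(i + 1, m):
            if tops[i] & tops[j]:  # one or more common labels
                links[i][j] = links[j][i] = 1
    return links


# Illustrative data with k=2 to keep the example small.
images = [
    {"hand": 9, "face": 8, "foot": 1},
    {"hand": 7, "hug": 6, "face": 2},
    {"foot": 9, "sky": 8, "hand": 1},
]
links = label_overlap_links(images, k=2)
```

In this example only the first two images share a top-2 label ("hand"), so only that pair is linked.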

(Effects of the second embodiment)
According to the second embodiment, the MRF models are optimized based on correlation information between a plurality of labels, so a plurality of labels can be assigned to an image with higher accuracy than when this configuration is not adopted.

Next, an example of the present invention will be described for the case where the number of images M is 100 and the quantized values range from 1 to 2000. The analog-valued annotation scores calculated by the label posterior probability calculation unit 4 are converted into discrete values by the quantization unit 5. Table 3 shows a specific example of annotation scores (analog values) and quantized values (discrete values).

In Table 3, the parentheses below each image ID indicate the correct label to be assigned to that image. Table 3 shows that when the top-ranked label is assigned using only the annotation score (label posterior probability) output by the quantization unit 5, the results for Image1 and ImageM are incorrect, so the accuracy is not high.

Table 4 shows an example of the link information (node junction matrix) 150 created by the node junction unit 6. In Table 4, "1" indicates that the images are temporally correlated, so a junction link 76 exists between the nodes 72 and 73; "0" indicates that the images are not temporally correlated, so no junction link 76 exists between the nodes 72 and 73.

Table 5 shows the quantized annotation scores (quantized values) before adjustment (the input values of the MRF models) and the annotation scores (quantized values) after adjustment (the output values of the MRF models).

Here, the number of images is 100 and the quantized values range from 1 to 2000. The quantized values in Table 5 are analog-valued annotation scores converted into discrete values by the histogram equalization method. The parentheses below each image ID in Table 5 indicate the correct label to be assigned to that image. The second and third columns of Table 5 show the state before the MRF models 70 are optimized. The labels before optimization are listed in descending order of quantized value, and these quantized values are the same as in Table 3. The fourth and fifth columns of Table 5 show the state after the MRF models 70 are optimized. The labels after optimization are listed in descending order of quantized value (adjusted annotation score). These results show that for Image1, the label "hug" was ranked first before optimization, whereas the label "hand" is ranked first after optimization, which is the correct answer. For Image100, the first-ranked label "hand" is the same before and after optimization. However, the second-ranked label changes from "face" before optimization to "foot" after optimization, and its quantized value rises from 117 to 148, bringing the result closer to the correct answer. This shows that accuracy improves.
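The quantization step can be sketched as follows. The text names only the histogram equalization (flattening) method; the rank-based empirical-CDF mapping below, the function name `equalize_quantize`, and the sample scores are assumptions chosen for illustration, not the exact procedure of the quantization unit 5.

```python
import math
from bisect import bisect_right
from typing import List


def equalize_quantize(scores: List[float], levels: int = 2000) -> List[int]:
    """Map analog scores to integers in 1..levels via their empirical CDF.

    Each score is replaced by ceil(cdf * levels), where cdf is the fraction
    of all scores less than or equal to it. This spreads the quantized
    values roughly uniformly over the range, which is the idea behind
    histogram equalization.
    """
    ordered = sorted(scores)
    n = len(ordered)
    quantized = []
    for s in scores:
        cdf = bisect_right(ordered, s) / n  # in (0, 1]
        quantized.append(max(1, math.ceil(cdf * levels)))
    return quantized


# Illustrative analog scores; 4 levels keep the example readable.
example = equalize_quantize([0.10, 0.40, 0.20, 0.90], levels=4)
```

With the default `levels=2000`, the outputs fall in the range 1 to 2000 used in this example of the invention.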

In this example, the F-measure for the query image set, a well-known evaluation metric in information retrieval, improved from 0.536 to 0.549. This example uses correlation between images, but a similar effect can be expected when correlation between labels is used.
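For reference, the F-measure can be computed as sketched below. The text does not state how precision and recall were counted or which weighting was used, so this sketch assumes the balanced F1 form, F = 2PR / (P + R); the function name and the counts are hypothetical and do not reproduce the reported values.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive,
    and false-negative label counts. Returns 0.0 when undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of assigned labels that are correct
    recall = tp / (tp + fn)     # fraction of correct labels that were assigned
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 correct labels assigned, 4 wrong, 6 missed.
score = f_measure(tp=6, fp=4, fn=6)
```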

[Other Embodiments]
The present invention is not limited to the above embodiments, and various modifications are possible without departing from the gist of the invention. For example, the functions of the image reception unit 2, the feature extraction unit 3, the label posterior probability calculation unit 4, the quantization unit 5, the node junction unit 6, the label estimation unit 7, the label assignment unit 8, and the annotation information output unit 9 may be realized by a CPU operating in accordance with the computer-readable image identification information providing program 110. Alternatively, all or some of the image reception unit 2, the feature extraction unit 3, the label posterior probability calculation unit 4, the quantization unit 5, the node junction unit 6, the label estimation unit 7, the label assignment unit 8, and the annotation information output unit 9 of the above embodiments may be realized by hardware such as an ASIC.

The program used in the above embodiments can also be provided stored on a recording medium such as a CD-ROM. The steps described in the above embodiments may be reordered, deleted, or supplemented without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 ... image identification information providing apparatus, 2 ... image reception unit, 3 ... feature extraction unit, 4 ... label posterior probability calculation unit, 5 ... quantization unit, 6 ... node junction unit, 7 ... label estimation unit, 8 ... label assignment unit, 9 ... annotation information output unit, 10 ... storage unit, 70_1 to 70_N ... MRF models, 71_1 to 71_M ... input nodes, 72_1 to 72_M ... posterior probability calculation nodes, 73_1 to 73_M ... estimated posterior probability calculation nodes, 74_1 to 74_M ... output nodes, 75_1 to 75_M ... input-side links, 76 ... junction link, 77_1 to 77_M ... output-side links, 110 ... image identification information providing program, 120 ... query image, 130 ... learning model, 140 ... label dictionary, 150 ... link information

Claims

An image identification information providing program for causing a computer to function as:
extraction means for extracting a feature amount from each of a plurality of images;
calculation means for calculating, from the feature amounts extracted by the extraction means and using a learning model, a first evaluation value for each of a plurality of pieces of identification information to be assigned to the images; and
output means that has a number of random field models corresponding to the number of pieces of identification information, receives as input to the plurality of random field models the first evaluation values calculated by the calculation means for each piece of identification information for the plurality of images, and outputs, for each image, second evaluation values for the plurality of pieces of identification information.

The image identification information providing program according to claim 1, further comprising optimization means for optimizing the random field models of the output means based on correlation information between the plurality of images.

The image identification information providing program according to claim 1, further comprising optimization means for optimizing the random field models of the output means based on correlation information between the plurality of pieces of identification information.

An image identification information providing apparatus comprising:
extraction means for extracting a feature amount from each of a plurality of images;
calculation means for calculating, from the feature amounts extracted by the extraction means and using a learning model, a first evaluation value for each of a plurality of pieces of identification information to be assigned to the images; and
output means that has a number of random field models corresponding to the number of pieces of identification information, receives as input to the plurality of random field models the first evaluation values calculated by the calculation means for each piece of identification information for the plurality of images, and outputs, for each image, second evaluation values for the plurality of pieces of identification information.