JP2015109024A

JP2015109024A - Image dictionary generation device, image dictionary generation method and computer program

Info

Publication number: JP2015109024A
Application number: JP2013252201A
Authority: JP
Inventors: Yongqing Sun; Kyoko Sudo; Yukinobu Taniguchi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-12-05
Filing date: 2013-12-05
Publication date: 2015-06-11

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of an image dictionary. SOLUTION: Features are extracted from a plurality of learning images associated with a given semantic label. Global representative information, which characterizes the whole set of learning images for that semantic label, is extracted from these features, and the features are clustered. For each cluster, local representative information is extracted from the features contained in that cluster. The learning images are then associated with the clusters on the basis of the global and local representative information. A classifier is generated for each cluster from the association between the clusters and the learning images, and the combination of these classifiers is obtained as the image dictionary of the semantic label.

Description

The present invention relates to a technique for generating an image dictionary.

A conventional image dictionary generation method works as follows.
First, a plurality of topic models are obtained from the learning images for all semantic labels covered by the image dictionary. Next, the similarity between each learning image and each topic model is calculated, and the learning images are associated with multiple topics in descending order of similarity. When the image dictionary for a particular semantic label is built, the learning images carrying that label are collected from the learning images belonging to each topic, and a classification model is trained for each topic using a machine learning technique. Finally, the per-topic classification models are combined to produce the image dictionary of that semantic label (see Non-Patent Document 1).
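The similarity-based association step of this conventional method can be sketched as follows. This is an illustrative sketch only: it assumes each topic model is a single mean feature vector (the representation criticized below) and uses cosine similarity, which Non-Patent Document 1 may not use; the function name `assign_by_similarity` and the parameter `top_t` are hypothetical.

```python
import numpy as np

def assign_by_similarity(features: np.ndarray, topic_means: np.ndarray, top_t: int) -> list:
    """Return, for each learning image, the indices of its top_t most similar topics.

    features:    (num_images, dim) array, one feature vector per learning image.
    topic_means: (num_topics, dim) array, one mean vector per topic model (assumption).
    """
    # Normalize rows so that a dot product equals cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    t = topic_means / np.linalg.norm(topic_means, axis=1, keepdims=True)
    sim = f @ t.T  # (num_images, num_topics)
    # Sort topics by descending similarity and keep the first top_t per image.
    return np.argsort(-sim, axis=1)[:, :top_t].tolist()

feats = np.array([[1.0, 0.1], [0.1, 1.0]])
means = np.array([[1.0, 0.0], [0.0, 1.0]])
print(assign_by_similarity(feats, means, top_t=1))  # [[0], [1]]
```

Each image ends up attached to several topics once `top_t` is greater than 1; the per-topic classifiers are then trained on those memberships.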

Non-Patent Document 1: Sun, K. Sudo, Y. Taniguchi (NTT Media Intelligence Laboratories, Japan); H. Li, L. Yi, Y. Guan (School of Software, Dalian University of Technology, China), “TRECVID 2012 Semantic Video Concept Detection by NTT-MD-DUTY”, http://www-nlpir.nist.gov/projects/tvpubs/tv12.papers/ntt.pdf

However, the image dictionary generation method of Non-Patent Document 1 has the following problems.
A semantic label may be composed of a combination of several elements (topics) in an image. For example, the semantic label “beach” is composed of topics such as “sea”, “sun”, “sand”, and “people”, so the image content must be reconstructed from multiple topics. When there are many topics and only some of them are used, the information in the learning image group (the global information) is not fully exploited, which lowers the accuracy of the image dictionary.

In addition, in the conventional technique that represents a topic by the average feature vector of its images, the image dictionary lacks expressive power, which in turn lowers the accuracy of semantic labeling.

Furthermore, when the meaning of an image is composed of several topics, the topics may contribute to different degrees. For example, the meaning “beach” is expressed by a combination of the topics “sea”, “sun”, “sand”, and “people”. In a single “beach” image in which the sea appears large and the people appear small, however, the topics “sea” and “people” contribute differently. Because the conventional processing does not reflect these per-topic contributions, the accuracy of semantic labeling decreases.

In view of the above circumstances, an object of the present invention is to provide a technique that can improve the accuracy of an image dictionary.

One aspect of the present invention is an image dictionary generation device comprising: a feature extraction unit that extracts features from a plurality of learning images associated with a semantic label; a global representative information extraction unit that, on the basis of the features, extracts global representative information indicating the characteristics of the plurality of learning images associated with the semantic label; a local representative information extraction unit that clusters the features and, for each cluster, extracts local representative information on the basis of the features contained in that cluster; a learning image association unit that associates the learning images with the clusters on the basis of the global and local representative information; and an image dictionary generation unit that generates a classifier for each cluster on the basis of the association between the clusters and the learning images, and obtains the combination of the classifiers as the image dictionary of the semantic label.

In one aspect of the present invention, in the image dictionary generation device described above, the learning image association unit calculates a reconstruction error between each learning image and each cluster on the basis of the global and local representative information, and associates the learning images with the clusters in ascending order of reconstruction error.

One aspect of the present invention is an image dictionary generation method including: a feature extraction step of extracting features from a plurality of learning images associated with a semantic label; a global representative information extraction step of extracting, on the basis of the features, global representative information indicating the characteristics of the plurality of learning images associated with the semantic label; a local representative information extraction step of clustering the features and, for each cluster, extracting local representative information on the basis of the features contained in that cluster; a learning image association step of associating the learning images with the clusters on the basis of the global and local representative information; and an image dictionary generation step of generating a classifier for each cluster on the basis of the association between the clusters and the learning images, and obtaining the combination of the classifiers as the image dictionary of the semantic label.

One aspect of the present invention is a computer program for causing a computer to operate as the image dictionary generation device described above.

The present invention makes it possible to improve the accuracy of the image dictionary.

FIG. 1 is a schematic block diagram showing the configuration of an image dictionary generation device.
FIG. 2 is a flowchart showing a specific example of the processing of the image dictionary generation device.
FIG. 3 is a flowchart showing a specific example of the processing in which a learning image association unit associates learning images with topics.
FIG. 4 is a flowchart showing a specific example of the processing in which an image dictionary generation unit generates an image dictionary.
FIG. 5 is a schematic block diagram showing the configuration of an image semantic labeling device.
FIG. 6 is a flowchart showing a specific example of the processing of assigning semantic labels to images or videos using the image dictionary generated by the image dictionary generation device.
FIG. 7 is a schematic diagram outlining the processing of assigning semantic labels to images or videos using an image dictionary.

An embodiment of the image dictionary generation device according to the present invention will now be described in detail.
FIG. 1 is a schematic block diagram showing the configuration of the image dictionary generation device 10. The image dictionary generation device 10 includes a CPU (Central Processing Unit), a memory, an auxiliary storage device, and the like connected by a bus, and executes an image dictionary generation program. By executing this program, the image dictionary generation device 10 functions as a device comprising a storage unit 1, a learning image group collection unit 2, a feature extraction unit 3, a representative information extraction unit 4, a learning image association unit 5, and an image dictionary generation unit 6. All or some of the functions of the image dictionary generation device 10 may instead be implemented in hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The image dictionary generation program may be recorded on a computer-readable recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system.

The storage unit 1 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 1 stores learning images, prepared in advance, for each semantic label. Learning images may be associated with semantic labels by any process: for example, manually, or by acquiring images and their tags from the Web as associated learning images and semantic labels. When the storage unit 1 receives the designation of a semantic label from the learning image group collection unit 2, it outputs the learning images associated with that semantic label to the learning image group collection unit 2.

The learning image group collection unit 2 outputs the designation of a semantic label to the storage unit 1 and receives the learning images associated with that semantic label. It outputs the received learning images (hereinafter, the “learning image group”) to the feature extraction unit 3. The learning image group also includes information on the semantic label of each learning image (hereinafter, “semantic label information”).

The feature extraction unit 3 receives the learning image group for a semantic label from the learning image group collection unit 2 and extracts features from each learning image in the group. For example, the feature extraction unit 3 may extract physical features such as color histograms or pattern (texture) histograms. It may also divide a learning image into regions and extract features from each region, or detect SIFT feature points in a learning image and extract local features such as SIFT descriptors. The feature extraction unit 3 outputs the extracted features to the representative information extraction unit 4 (the global representative information extraction unit 41 and the local representative information extraction unit 42).
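As an illustration of the simplest option named above, a color histogram feature, optionally computed per region, could look like the sketch below. The quantization into `bins_per_channel` levels per RGB channel, the regular `grid` split, and both function names are assumptions chosen for brevity; the source does not specify bin counts or how images are divided.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Joint RGB color histogram of an (H, W, 3) uint8 image, L1-normalized."""
    # Quantize each channel value 0..255 into bins_per_channel levels.
    q = (image.astype(np.int64) * bins_per_channel) // 256
    # Combine the three per-channel indices into a single bin index.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

def grid_color_histograms(image: np.ndarray, grid: int = 2, bins_per_channel: int = 4) -> np.ndarray:
    """Split the image into grid x grid regions and concatenate their histograms."""
    h, w = image.shape[:2]
    parts = []
    for r in range(grid):
        for c in range(grid):
            block = image[r * h // grid:(r + 1) * h // grid, c * w // grid:(c + 1) * w // grid]
            parts.append(color_histogram(block, bins_per_channel))
    return np.concatenate(parts)
```

The region variant keeps coarse layout information (for example, sky at the top and sand at the bottom of a “beach” image) that a single global histogram discards.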

The representative information extraction unit 4 extracts representative information. It includes a global representative information extraction unit 41 and a local representative information extraction unit 42.

The global representative information extraction unit 41 receives the features of the learning image group extracted by the feature extraction unit 3 and extracts global representative information from them. Global representative information concisely and accurately represents the information of the entire learning image group associated with one semantic label. For example, the global representative information extraction unit 41 may extract it from the local features at all SIFT feature points in the images of the learning image group. A sparse representation technique may be applied to this processing, such as the technique disclosed in Reference 1 below.
Reference 1: http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf

The basis $D^S$ obtained by this technique (an $N \times M$ matrix composed of multiple feature vectors, where $N$ is the feature dimensionality and $M > N$) is the global representative information. The global representative information extraction unit 41 stores the extracted global representative information in the storage unit 1 and also outputs it to the learning image association unit 5.
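A basis like $D^S$ can be learned by alternating sparse coding with a basis update. The sketch below is an assumption-laden stand-in, not the method of Reference 1 (which describes a sparse autoencoder): it uses ISTA for the L1-regularized codes and a method-of-optimal-directions (MOD) least-squares update for the basis. Feature vectors are the columns of `X` (N rows), and `n_atoms` plays the role of M > N; `lam`, `n_outer`, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(X: np.ndarray, D: np.ndarray, lam: float = 0.1, n_iter: int = 200) -> np.ndarray:
    """ISTA: approximately minimize 0.5 * ||X - D G||_F^2 + lam * ||G||_1 over G."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    G = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ G - X)
        G = soft_threshold(G - step * grad, step * lam)
    return G

def learn_dictionary(X: np.ndarray, n_atoms: int, lam: float = 0.1,
                     n_outer: int = 20, seed: int = 0) -> np.ndarray:
    """Alternate sparse coding and a MOD basis update.

    X: (N, S) matrix whose columns are feature vectors (e.g. SIFT descriptors).
    Returns D: (N, n_atoms) basis whose columns have unit norm (unused atoms stay zero).
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        G = sparse_code(X, D, lam)
        # MOD update D = X G^T (G G^T)^-1, with a small ridge term for stability.
        D = X @ G.T @ np.linalg.inv(G @ G.T + 1e-6 * np.eye(n_atoms))
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.where(norms > 0, norms, 1.0)
    return D
```

Under the same assumptions, running `learn_dictionary` on the features of one cluster instead of the whole group would give a local basis in the role of $D^I_c$.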

The local representative information extraction unit 42 receives the features extracted by the feature extraction unit 3 and clusters them to generate a plurality of clusters, each of which corresponds to one topic. The clustering may be performed using, for example, the k-means method or LDA (Latent Dirichlet Allocation).

In the clustering, a group of images whose feature-based similarities are close is output as one cluster (topic). Local representative information concisely and accurately represents the information of the learning images contained in each cluster. For example, the local representative information extraction unit 42 may extract it from the local features at the SIFT feature points of the learning images in the cluster, again applying a sparse representation technique. Hereinafter, the local representative information of cluster $c$ is denoted $D^I_c$, where $c = 1, 2, \ldots, K$ and $K$ is the number of topics.
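The clustering step can be sketched with plain k-means, one of the options named above. This is a minimal illustration: `kmeans` and `group_by_topic` are hypothetical helper names, initialization picks random distinct rows, clusters are indexed from 0 rather than 1, and the per-cluster sparse coding that would turn each group into $D^I_c$ is left out (each group's features would be passed to a dictionary learner in the same way as for $D^S$).

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain k-means on the rows of X. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def group_by_topic(X: np.ndarray, labels: np.ndarray, k: int) -> list:
    """Feature rows belonging to each cluster (topic) c = 0..k-1."""
    return [X[labels == c] for c in range(k)]
```

LDA would instead give soft topic memberships; k-means is shown only because it fits in a few lines.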

The local representative information extraction unit 42 stores the extracted local representative information in the storage unit 1 and also outputs it to the learning image association unit 5.

The learning image association unit 5 receives the global representative information from the global representative information extraction unit 41 and the local representative information from the local representative information extraction unit 42. In the following description, the global and local representative information are collectively referred to as “representative information”. Using the representative information, the learning image association unit 5 obtains the reconstruction error of each image in the learning image group and associates each learning image with a plurality of topics in ascending order of reconstruction error. It then outputs the correspondence between topics and learning images to the image dictionary generation unit 6.

The image dictionary generation unit 6 receives the learning images for each topic from the learning image association unit 5. It collects the learning images assigned to the semantic label for which an image dictionary is being generated (hereinafter, the “target semantic label”) and builds a classification model for each topic using a machine learning technique. It then combines the per-topic classification models into the image dictionary of the target semantic label and stores the generated image dictionary in the storage unit 1.

Next, the global representative information and the local representative information are described.
When sparse representation is used, the representative information and its coefficients are expressed by Equations 1 and 2, respectively:

$$D_c = \left[\, D^S,\; D^I_c \,\right] \qquad (1)$$

$$G_c = \begin{bmatrix} G^S_c \\ G^I_c \end{bmatrix} \qquad (2)$$

In Equation 1, $D_c$ is the basis representing the representative information composed of the global and local representative information, $D^S$ is the basis representing the global representative information, and $D^I_c$ is the basis representing the local representative information.

In Equation 2, $G_c$ is the coefficient matrix corresponding to $D_c$. $G^S_c$ is the coefficient matrix corresponding to $D^S$ when an image is reconstructed with $D^S$, and $G^I_c$ is the coefficient matrix corresponding to $D^I_c$ when the image is reconstructed with a given $D^I_c$ ($c = 1, 2, \ldots$).

$D_c$ joins the sparse-coding basis of the whole learning image group with the basis of an individual topic: $[D^S, D^I_c]$ places the two matrices side by side, and $G_c$ holds the coefficients corresponding to this combined basis.
The topic model in Reference 1 is represented by a single set of parameters (for example, a mean feature vector obtained by k-means). In contrast, this embodiment represents each topic model using both global and local representative information. Because both whole-group and local information are used, the content of a topic can be expressed accurately with little loss of information.

FIG. 2 is a flowchart showing a specific example of the processing of the image dictionary generation device 10. A processing example of the image dictionary generation device 10 is described below with reference to FIG. 2.

First, the learning image group collection unit 2 reads all the learning images (the learning image group) associated with the semantic label from the storage unit 1 (step S201). Next, the feature extraction unit 3 extracts features from each learning image in the read learning image group (step S202). The global representative information extraction unit 41 then extracts the global representative information (step S203). Next, the local representative information extraction unit 42 clusters the features extracted by the feature extraction unit 3 and extracts the local representative information of each cluster (step S204).

Next, the learning image association unit 5 associates each learning image in the learning image group with topics on the basis of the representative information (step S205).

FIG. 3 is a flowchart showing a specific example of the processing in which the learning image association unit 5 associates learning images with topics. First, the learning image association unit 5 reads the learning image group for all semantic labels (N learning images, where N is an integer of 1 or more) (step S301). Next, it reads the representative information extracted in steps S203 and S204 (step S302).

Next, the learning image association unit 5 sets the variable $n$ to 1 (step S303). Using the $K$ topic models, it calculates the reconstruction errors of the $n$-th learning image, whose feature is denoted $x_i$ (step S304).
When sparse coding is used, the reconstruction of $x_i$ with the $c$-th topic model is expressed as follows:

$$x_i \approx D_c\, g_{ci} = \left[\, D^S,\; D^I_c \,\right] g_{ci}$$

Here, $g_{ci}$ is the $i$-th column of the coefficient matrix $G_c$. The reconstruction error of the $n$-th learning image with respect to the $c$-th topic model is then expressed as:

$$\varepsilon_{ic} = \left\lVert x_i - D_c\, g_{ci} \right\rVert_2^2$$
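A minimal sketch of step S304 for one learning image follows. It assumes the sparse code $g_{ci}$ is found by ISTA on an L1-penalized least-squares objective and that the error is the squared L2 residual; the penalty `lam`, the iteration count, and the function names are hypothetical choices, not taken from the source.

```python
import numpy as np

def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reconstruction_errors(x: np.ndarray, D_global: np.ndarray, D_locals: list,
                          lam: float = 0.05, n_iter: int = 300) -> list:
    """Reconstruction error eps_ic of one feature vector x for every topic c."""
    errors = []
    for D_local in D_locals:
        D_c = np.hstack([D_global, D_local])  # combined basis [D^S, D^I_c]
        step = 1.0 / (np.linalg.norm(D_c, 2) ** 2)
        g = np.zeros(D_c.shape[1])
        # ISTA on 0.5 * ||x - D_c g||^2 + lam * ||g||_1; g plays the role of g_ci.
        for _ in range(n_iter):
            g = soft_threshold(g - step * (D_c.T @ (D_c @ g - x)), step * lam)
        errors.append(float(np.sum((x - D_c @ g) ** 2)))  # squared L2 residual
    return errors
```

Because every topic shares $D^S$, differences between the errors come from how well each local basis $D^I_c$ covers what the global basis leaves unexplained.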

Next, the learning image association unit 5 determines whether reconstruction errors have been calculated for all learning images in the learning image group (step S305); specifically, it determines whether the variable $n$ is smaller than the number of learning images $N$. If $n$ is smaller than $N$ (step S305: YES), it increments $n$ (step S307) and returns to step S304. If $n$ is $N$ or more (step S305: NO), then for each topic it associates the top P learning images (P is an integer of 1 or more), taken in ascending order of reconstruction error, with that topic (step S306).
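Step S306 can be sketched as follows, assuming the errors from step S304 are collected in an N×K table `errors[i][c]`. The function name and the fixed `p` are hypothetical; P may instead be chosen automatically, as described next.

```python
def assign_images_to_topics(errors: list, p: int) -> dict:
    """For each topic c, return the indices of the p learning images with the
    smallest reconstruction error errors[i][c] (step S306)."""
    n_topics = len(errors[0])
    assignment = {}
    for c in range(n_topics):
        # Rank image indices by their error under topic c, smallest first.
        ranked = sorted(range(len(errors)), key=lambda i: errors[i][c])
        assignment[c] = ranked[:p]
    return assignment

print(assign_images_to_topics([[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]], p=2))  # {0: [0, 2], 1: [1, 2]}
```

An image can appear under several topics (image 2 above), which is how one learning image comes to be associated with multiple topics.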

The value of P may be set, for example, as follows. First, the reconstruction errors $\varepsilon_{ic}$ are sorted in ascending order, for example (0.58, 0.60, 0.62, 0.95, 0.96, 0.98). Next, each reconstruction error is divided by the one immediately before it. The resulting ratios are compared in sequence, and the number of topics processed up to the point where the ratio rises sharply is adopted as P. In this example, the topics preceding the one with error 0.95 are adopted (P = 3). This processing allows the top P to be determined automatically.
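The automatic choice of P can be sketched as below. We read “rises sharply” as the position of the largest ratio between consecutive sorted errors; a fixed ratio threshold would be an equally plausible reading. The function name `choose_p` and the small floor guarding against division by zero are assumptions.

```python
def choose_p(errors_ascending: list) -> int:
    """Return P, the number of sorted reconstruction errors kept before the
    largest jump in the ratio eps[k] / eps[k - 1]."""
    if len(errors_ascending) < 2:
        return len(errors_ascending)
    # ratios[j] compares entry j + 1 with entry j (the floor avoids division by zero).
    ratios = [errors_ascending[k] / max(errors_ascending[k - 1], 1e-12)
              for k in range(1, len(errors_ascending))]
    jump = max(range(len(ratios)), key=lambda j: ratios[j])
    # A jump at ratios[jump] keeps entries 0..jump, i.e. jump + 1 of them.
    return jump + 1

print(choose_p([0.58, 0.60, 0.62, 0.95, 0.96, 0.98]))  # 3
```

For the example sequence the ratios are about 1.03, 1.03, 1.53, 1.01, and 1.02, so the jump occurs at 0.95 and the first three entries are kept.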

Through the above processing, each learning image is associated with a plurality of topics. For example, when SIFT points or region features are used, a learning image associated with the semantic label “beach” is associated with topics such as “sun”, “people”, and “sea”.

Returning to FIG. 2: after step S205, the image dictionary generation unit 6 generates an image dictionary for the semantic label on the basis of the learning images associated with each topic (step S206).

FIG. 4 is a flowchart showing a specific example of the processing in which the image dictionary generation unit 6 generates an image dictionary. First, for each of the $M$ topic models ($M$ here is the number of topics, denoted $K$ above), the image dictionary generation unit 6 reads the learning images associated with that topic model (step S401). Next, it sets the variable $c$ to 1 (step S402) and the variable $m$ to 1 (step S403).

Next, from the learning images associated with the $m$-th topic model, the image dictionary generation unit 6 acquires all learning images assigned to the $c$-th semantic label (step S404). It then determines whether the number of acquired learning images is equal to or greater than a predetermined threshold (step S405). If so (step S405: YES), it uses a machine learning technique to generate a classifier for the $m$-th topic of the $c$-th semantic label (step S406). An example of such a technique is the SVM (support vector machine). As for the features used to train the classifier: if the features extracted in step S202 are physical features (color, pattern, and so on), they may be used as they are. If they are SIFT feature points or region features, the feature points or region features of each learning image whose similarity to the $m$-th topic model is at least a certain value may be used.
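A minimal sketch of steps S405 and S406 for one (label, topic) pair follows. The source names SVM but not a solver; here a linear SVM is trained with the Pegasos stochastic sub-gradient method, with a constant feature standing in for the bias. Using images of other labels as negative examples, the threshold `min_images`, and all function names are assumptions for illustration.

```python
import numpy as np

def train_linear_svm(X: np.ndarray, y: np.ndarray, lam: float = 0.01,
                     n_epochs: int = 50, seed: int = 0) -> np.ndarray:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X: (n, d) feature matrix; y: labels in {-1, +1}. Returns the weight vector w.
    """
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (X[i] @ w) < 1  # margin check with the current w
            w *= 1.0 - eta * lam              # shrink step from the regularizer
            if violates:
                w += eta * y[i] * X[i]        # hinge-loss sub-gradient step
    return w

def build_topic_classifier(pos: np.ndarray, neg: np.ndarray, min_images: int = 5):
    """Linear classifier for one topic of one label, or None when too few positives (S405: NO)."""
    if len(pos) < min_images:
        return None
    X = np.vstack([pos, neg])
    X = np.hstack([X, np.ones((len(X), 1))])  # constant column acts as a bias term
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return train_linear_svm(X, y)
```

A library SVM with a kernel would normally replace this hand-written solver; the sketch only shows where the threshold check and the training call sit.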

If the number of acquired learning images is less than the predetermined threshold (step S405-NO), or after step S406, the image dictionary generation unit 6 checks whether any topic remains for which steps S404 to S406 have not yet been performed for the c-th semantic label (step S407). If such a topic remains, that is, if the variable m is smaller than M (step S407-YES), the image dictionary generation unit 6 increments m (step S410) and returns to step S404.

On the other hand, if steps S404 to S406 have been completed for all topics (step S407-NO), that is, if the variable m is M or more, the image dictionary generation unit 6 stores the classifiers of the topics in the storage unit 1 as the image dictionary of the c-th semantic label (step S408).

Next, the image dictionary generation unit 6 checks whether any semantic label remains for which step S408 has not yet been performed (step S409). If such a semantic label remains (step S409-YES), that is, if the variable c is smaller than C, the image dictionary generation unit 6 increments c and returns to step S403. On the other hand, if step S408 has been completed for all semantic labels (step S409-NO), that is, if the variable c is C or more, the image dictionary generation unit 6 ends the process.
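The loop of steps S401 to S410 can be sketched as follows. The classifier training of step S406 is left as a caller-supplied function `train`; in practice this could be, for example, an SVM trainer. The function `build_image_dictionaries`, its parameters and the data layout are hypothetical. Topics are indexed from 0 here, whereas the flowchart counts from 1.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, TypeVar

Classifier = TypeVar("Classifier")


def build_image_dictionaries(
    topic_images: Sequence[Sequence[Hashable]],
    image_labels: Dict[Hashable, Set[str]],
    labels: Sequence[str],
    min_images: int,
    train: Callable[[List[Hashable]], Classifier],
) -> Dict[str, Dict[int, Classifier]]:
    """Return {semantic label: {topic index: classifier}} following steps S401-S410.

    topic_images[m] holds the learning images associated with topic m (S401).
    """
    dictionaries: Dict[str, Dict[int, Classifier]] = {}
    for label in labels:  # outer loop over the C semantic labels (S402, S409)
        per_topic: Dict[int, Classifier] = {}
        for m, images in enumerate(topic_images):  # inner loop over the M topics (S403, S407, S410)
            # S404: learning images of topic m that carry this semantic label
            selected = [img for img in images if label in image_labels.get(img, set())]
            # S405: skip topics with too few examples; S406: train otherwise
            if len(selected) >= min_images:
                per_topic[m] = train(selected)
        dictionaries[label] = per_topic  # S408: the label's image dictionary
    return dictionaries


# Usage with a stand-in "trainer" that just counts its training images.
result = build_image_dictionaries(
    topic_images=[["a", "b", "c"], ["c", "d"]],
    image_labels={"a": {"beach"}, "b": {"beach"}, "c": {"beach", "person"}, "d": {"person"}},
    labels=["beach", "person"],
    min_images=2,
    train=len,
)
print(result)  # {'beach': {0: 3}, 'person': {1: 2}}
```

Because topic models are shared, the same `topic_images` grouping is reused for every semantic label. Only the per-topic filtering by label differs.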

FIG. 5 is a schematic block diagram showing the configuration of the image semantic label assigning device 20. The image semantic label assigning device 20 includes a CPU, a memory, an auxiliary storage device, and so on, connected by a bus, and executes an image semantic label assigning program. By executing this program, the image semantic label assigning device 20 functions as a device that includes a storage unit 21, an image collection unit 22, a feature amount extraction unit 23, an association unit 24, a similarity integration unit 25, and an image semantic label assigning unit 26. All or some of the functions of the image semantic label assigning device 20 may be implemented using hardware such as an ASIC, a PLD, or an FPGA. The image semantic label assigning program may be recorded on a computer-readable recording medium. Examples include portable media such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, and storage devices such as a hard disk built into a computer system.

The storage unit 21 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 21 stores data such as the images to be classified, the representative information, and the image dictionaries.

The image collection unit 22 outputs a classification designation to the storage unit 21 and receives the analysis target images stored in the storage unit 21. The image collection unit 22 outputs the received analysis target images to the feature amount extraction unit 23.

The feature amount extraction unit 23 receives the analysis target image from the image collection unit 22 and extracts feature amounts from it. For example, the feature amount extraction unit 23 may extract physical feature amounts such as a color histogram or a pattern histogram. Alternatively, it may divide the image into regions and extract a feature amount from each region. It may also detect SIFT feature points in the image and extract local feature amounts such as SIFT descriptors. The feature amount extraction unit 23 preferably uses the same algorithm as the feature amount extraction unit 3 of the image dictionary generation device 10. The feature amount extraction unit 23 outputs the extracted feature amounts to the association unit 24.
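The following computes a normalized color histogram from RGB pixels, as an illustration of a physical feature amount. It is a sketch under the following assumptions:
- Pixels are 8-bit (r, g, b) tuples.
- Each channel is quantized into `bins` equal-width levels.

The document does not specify a quantization. The function name and bin count are hypothetical.

```python
from typing import Iterable, List, Tuple


def color_histogram(pixels: Iterable[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel (bins**3 cells)."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to one of `bins` equal-width levels.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
        count += 1
    # Normalize so that images of different sizes are comparable.
    return [h / count for h in hist] if count else hist


print(color_histogram([(0, 0, 0), (255, 255, 255)], bins=2))
# [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
```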

The association unit 24 calculates the reconstruction error of the analysis target image based on the feature amounts received from the feature amount extraction unit 23 and the representative information received from the storage unit 21. The association unit 24 associates the analysis target image with a plurality of topics in ascending order of reconstruction error. It then outputs information indicating the result of the association (association information) to the similarity integration unit 25.
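This section does not restate how the reconstruction error is computed from the representative information. The following minimal sketch therefore rests on an explicit assumption. The all-station and local representative information are treated as sets of basis vectors (atoms). The reconstruction error of a feature vector for topic m is then the squared residual after orthogonal projection onto the span of the all-station atoms together with the atoms of topic m. Topics are then ranked in ascending order of this error. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(atoms: Sequence[Vector]) -> List[List[float]]:
    """Gram-Schmidt; atoms (numerically) dependent on earlier ones are dropped."""
    basis: List[List[float]] = []
    for atom in atoms:
        v = list(atom)
        for q in basis:
            coef = _dot(v, q)
            v = [x - coef * y for x, y in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-12:
            basis.append([x / norm for x in v])
    return basis


def reconstruction_error(x: Vector, global_atoms: Sequence[Vector],
                         topic_atoms: Sequence[Vector]) -> float:
    """Squared residual of x after projecting onto span(global_atoms + topic_atoms)."""
    residual = list(x)
    for q in orthonormal_basis(list(global_atoms) + list(topic_atoms)):
        coef = _dot(residual, q)
        residual = [r - coef * y for r, y in zip(residual, q)]
    return _dot(residual, residual)


def rank_topics(x: Vector, global_atoms: Sequence[Vector],
                topics: Sequence[Sequence[Vector]]) -> Tuple[List[int], List[float]]:
    """Topic indices sorted by ascending reconstruction error, plus the errors."""
    errors = [reconstruction_error(x, global_atoms, t) for t in topics]
    return sorted(range(len(errors)), key=lambda m: errors[m]), errors


order, errors = rank_topics([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0]],
                            [[[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]], [[0.0, 1.0, 1.0]]])
print(order)  # [2, 1, 0]
```

Because the all-station atoms appear in every topic's span, they absorb what all images share. The ranking is then driven by how well each topic's local atoms explain the remainder.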

The similarity integration unit 25 receives the association information from the association unit 24 and the image dictionary from the storage unit 21. The similarity integration unit 25 calculates the similarity between the analysis target image and the classifier of each topic. It integrates these similarities using the per-topic reconstruction errors received from the association unit 24. The similarity integration unit 25 outputs information indicating the integrated result (integrated information) to the image semantic label assigning unit 26.

The image semantic label assigning unit 26 assigns semantic labels to the analysis target image using the integrated information received from the similarity integration unit 25. It stores the labeled image in the storage unit 21.

FIG. 6 is a flowchart showing a specific example of the process by which the image semantic label assigning device 20 assigns semantic labels to images or video using the image dictionary generated by the image dictionary generation device 10. FIG. 7 is a schematic diagram outlining this process. A specific example of the process is described below.

First, the semantic label assigning device 20 acquires the all-station representative information and the local representative information from the storage unit 1 (step S501). Next, the semantic label assigning device 20 reads the images to which semantic labels are to be assigned, for example J images (step S502). Next, the semantic label assigning device 20 sets the variable j to 1 (step S503). Next, the semantic label assigning device 20 calculates the reconstruction error ε_jc using the j-th image and each topic model Dc (c = 1, ..., K). It associates the j-th image with every topic model Dc for which the calculated reconstruction error is equal to or less than a predetermined threshold (step S504). For example, in FIG. 7, image j is associated with the topic models marked with a cross "×" (T1 and Tm−1).
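Step S504, the threshold-based association for each image, can be sketched as follows. The sketch assumes the reconstruction errors have already been computed into a J × K table. The function `associate_images` and the table layout are hypothetical. Under this rule, an image whose errors all exceed the threshold is associated with no topic.

```python
from typing import Dict, List, Sequence


def associate_images(errors: Sequence[Sequence[float]], threshold: float) -> List[Dict[int, float]]:
    """errors[j][c] is the reconstruction error of image j for topic model D_c.

    Returns, for each image j, the associated topics mapped to their errors,
    i.e. the topics whose error is at or below `threshold` (step S504).
    """
    return [{c: e for c, e in enumerate(row) if e <= threshold} for row in errors]


print(associate_images([[0.2, 0.9, 0.1], [0.8, 0.7, 0.6]], threshold=0.5))
# [{0: 0.2, 2: 0.1}, {}]
```

The errors are kept alongside the topic indices so that they can be reused as weights when the per-topic similarities are integrated.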

Next, for each column of the table in FIG. 7, the semantic label assigning device 20 calculates the similarity sc between image j's feature amounts for the associated topic model and each individual classifier in that column. The semantic label assigning device 20 then integrates the calculated similarities sc for each semantic label Ci, using the reconstruction errors as weights. Here i = 1, ..., L, where i and L are integers of 1 or more. The result is the similarity between image j and the semantic label Ci (step S505). Denoting the reconstruction error of an associated topic by ε_jc′, the weight wjc is calculated by Equation (5).

Q represents the number of topics with which image j is associated in each row. The similarity between image j and a given semantic label is calculated by Equation (6).
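Equations (5) and (6) are not reproduced in this text. The sketch below uses one form consistent with the later statement that each topic's contribution is given by the reciprocal of its reconstruction error. It is an assumption, not the patent's equations:
- The weight of topic c is w_jc = (1/ε_jc′) / Σ_q (1/ε_jq′), where q runs over the Q topics associated with image j.
- The similarity to a semantic label is Σ_c w_jc · s_c.

The function names are hypothetical.

```python
from typing import Dict


def topic_weights(errors: Dict[int, float], eps: float = 1e-12) -> Dict[int, float]:
    """Assumed normalized reciprocal-error weights over the Q topics associated
    with image j; `eps` guards against division by a zero error."""
    inverse = {c: 1.0 / max(e, eps) for c, e in errors.items()}
    total = sum(inverse.values())
    return {c: v / total for c, v in inverse.items()}


def label_similarity(errors: Dict[int, float], scores: Dict[int, float]) -> float:
    """Weighted sum of per-topic classifier scores s_c for one semantic label.
    Topics with no classifier for this label contribute nothing (assumption)."""
    weights = topic_weights(errors)
    return sum(w * scores[c] for c, w in weights.items() if c in scores)


# Topic 0 has a three times smaller error than topic 2, so it gets weight 0.75
# and topic 2 gets 0.25; the integrated similarity is 0.75*0.8 + 0.25*0.4.
print(label_similarity({0: 1.0, 2: 3.0}, {0: 0.8, 2: 0.4}))
```

Normalizing the weights keeps the integrated similarity on the same scale as the individual classifier scores. This makes labels comparable in the top-R selection that follows.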

Next, the semantic label assigning device 20 assigns up to R semantic labels to image j, in descending order of the similarity obtained in step S505 (step S506). Next, the semantic label assigning device 20 checks whether any image remains to which semantic labels have not been assigned (step S507). If such an image remains (step S507-YES), that is, if the variable j is smaller than J, the semantic label assigning device 20 increments j and returns to step S504. On the other hand, if no such image remains (step S507-NO), that is, if the variable j is J or more, the semantic label assigning device 20 ends the process.
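Step S506 can be sketched as a simple top-R selection over the integrated similarities. The function `top_r_labels` is a hypothetical name. Tied labels keep their input order, because Python's `sorted` remains stable even with `reverse=True`.

```python
from typing import Dict, List


def top_r_labels(similarities: Dict[str, float], r: int) -> List[str]:
    """Up to R semantic labels in descending order of integrated similarity."""
    return sorted(similarities, key=lambda label: similarities[label], reverse=True)[:r]


print(top_r_labels({"beach": 0.7, "person": 0.9, "meeting": 0.2}, r=2))
# ['person', 'beach']
```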

In the image dictionary generation device 10, each image is associated with a plurality of topics using two kinds of information. The all-station representative information represents the image group as a whole. The local representative information represents each of the topics into which the image group is divided. The contribution of each topic to an image is given by the reciprocal of the corresponding reconstruction error, and these contributions are used to integrate the per-topic identification results. Through these processes, an image dictionary can be generated with high accuracy.

The image dictionary generation device 10 also builds topic models shared by all semantic labels, learned from the learning image groups relating to all semantic labels (image groups with diverse meanings). This makes it possible to generate topic models that cover a wider range of variation. For example, compare two ways of generating a "person" topic model. The first uses only images relating to the semantic label "beach". The second uses images relating to a rich set of semantic labels such as "beach", "festival", "meeting", "wild", and "drinking party". The second approach yields a topic model that is more accurate and more broadly applicable.

Because a machine learning technique is used to generate the image dictionary, conventional methods also had the problem that processing time grows as the amount of learning data increases. Non-Patent Document 1 is an example: it computes a single identification model for each semantic label from all of the learning data. Such a method requires an enormous processing time as the number of learning images grows. To address this problem, the image dictionary generation device 10 generates topic models shared by all semantic labels. This suppresses the increase in processing time when the number of learning images increases.
In addition, semantic labels can be assigned more efficiently when there are many semantic labels, or many images to be labeled.

<Modification>
In step S306 of the flowchart in FIG. 3, the learning image association unit 5 may associate topics with learning images according to a different criterion based on the reconstruction error. For example, for each topic, the learning image association unit 5 may associate with that topic all images whose reconstruction error is equal to or less than a predetermined threshold.

The embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment. It also includes design changes and the like that do not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS 10 ... image dictionary generation device, 1 ... storage unit, 2 ... learning image group collection unit, 3 ... feature amount extraction unit, 4 ... representative information extraction unit, 41 ... all-station representative information extraction unit, 42 ... local representative information extraction unit, 5 ... learning image association unit, 6 ... image dictionary generation unit, 20 ... image semantic label assigning device, 21 ... storage unit, 22 ... image collection unit, 23 ... feature amount extraction unit, 24 ... association unit, 25 ... similarity integration unit, 26 ... image semantic label assigning unit

Claims

An image dictionary generation device comprising:
a feature amount extraction unit that extracts feature amounts of a plurality of learning images associated with a given semantic label;
an all-station representative information extraction unit that extracts, based on the feature amounts, all-station representative information indicating features of the plurality of learning images associated with the semantic label;
a local representative information extraction unit that clusters the feature amounts and, for each cluster, extracts local representative information based on the feature amounts included in the cluster;
a learning image association unit that associates the learning images with the clusters based on the all-station representative information and the local representative information; and
an image dictionary generation unit that generates a classifier for each cluster based on the association between the clusters and the plurality of learning images, and acquires a combination of the classifiers as an image dictionary of the semantic label.

The image dictionary generation device according to claim 1, wherein the learning image association unit calculates a reconstruction error between each learning image and each cluster based on the all-station representative information and the local representative information, and associates the learning images with the clusters in ascending order of the reconstruction error.

An image dictionary generation method comprising:
a feature amount extraction step of extracting feature amounts of a plurality of learning images associated with a given semantic label;
an all-station representative information extraction step of extracting, based on the feature amounts, all-station representative information indicating features of the plurality of learning images associated with the semantic label;
a local representative information extraction step of clustering the feature amounts and, for each cluster, extracting local representative information based on the feature amounts included in the cluster;
a learning image association step of associating the learning images with the clusters based on the all-station representative information and the local representative information; and
an image dictionary generation step of generating a classifier for each cluster based on the association between the clusters and the plurality of learning images, and acquiring a combination of the classifiers as an image dictionary of the semantic label.

A computer program for causing a computer to operate as the image dictionary generation device according to claim 1.