JP5650628B2

JP5650628B2 - Image dictionary generation device, image dictionary generation method, and image dictionary generation program

Info

Publication number: JP5650628B2
Application number: JP2011251525A
Authority: JP
Inventors: 泳青孫; 豪入江; 佐藤　隆; 隆佐藤; 明小島; 森本　正志; 正志森本; 数藤　恭子; 恭子数藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-11-17
Filing date: 2011-11-17
Publication date: 2015-01-07
Anticipated expiration: 2031-11-17
Also published as: JP2013109389A

Description

The present invention relates to an image dictionary generation device, an image dictionary generation method, and an image dictionary generation program that generate the image dictionary needed to recognize which concept the content of an image or video represents.

Building image databases and realizing image and video similarity search require generating an image dictionary for each semantic concept, that is, each word that represents a concept. In many cases, however, a sufficient amount of learning image data matching each semantic concept cannot be obtained. For example, when learning image data is collected manually, visual screening takes a great deal of cost and time, so gathering a sufficient amount is difficult. One way to make up for this shortage is to use image data from a domain (hereinafter, the original domain) that differs from the domain to which the learning data belongs (hereinafter, the target domain). For example, when generating an image dictionary for Japanese broadcast video, if the learning data for a semantic concept (target domain data) is insufficient, it can be supplemented with data that carries the same semantic concept in other domains (original domain data), such as American broadcast video or web images.

As conventional technology, a method is known for generating an image dictionary for the video provided by TRECVID (an international joint research project sponsored by the U.S. National Institute of Standards and Technology (NIST) and the Disruptive Technology Office (DTO) that aims to advance video retrieval and the video analysis technology behind it; http://www-nlpir.nist.gov/projects/tv2011/tv2011.html#data). In this method, the TRECVID learning keyframe images for a semantic concept (target domain data) are simply mixed with web images (original domain data) collected from the web using that semantic concept as the query key, and the mixture is used as learning data (see, for example, Non-Patent Document 1).

Another known method for generating an image dictionary for TRECVID video first analyzes, in the feature space, the feature distributions of web images collected in the same way as in Non-Patent Document 1 and of the TRECVID learning keyframe images for the semantic concept. It then assigns a weight to each web image and combines the weighted web images with the learning keyframe images as learning data (see, for example, Non-Patent Document 2).

Non-Patent Document 1: IVA-NLPR-IA-CAS TRECVID 2009: High Level Features Extraction, www-nlpir.nist.gov/projects/tvpubs/tv9.papers/iva-nlpr-ia-cas.pdf
Non-Patent Document 2: Columbia University/VIREO-CityU/IRIT TRECVID2008 High-Level Feature Extraction and Interactive Video Search, http://www-nlpir.nist.gov/projects/tvpubs/tv8.papers/columbia.pdf

However, even when images from different domains express the same semantic concept, they often have different distributions in the feature space. For example, a keyframe image from a video about a "telephone" typically shows a small telephone sitting on an office desk, whereas a web image shows the telephone large and sharp near the center of the frame. In other words, when "telephone" images are represented in a feature space of color and size, the feature distributions of the video keyframe images and the web images differ.

As described above, Non-Patent Document 1 simply combines image data having different feature distributions and uses the result as learning data, so the accuracy of the generated image dictionary is extremely low. Non-Patent Document 2 takes the different feature distributions of image data from different domains into account and weights each web image, attempting to fit the feature distribution of the web images (original domain data) to that of the video keyframe images (target domain data) in the feature space. However, it does not consider which specific parts of a web image differ in feature distribution, so the image dictionary generated from learning data that combines the web images and the video keyframe images still has low accuracy.

For example, as shown in FIG. 4, the semantic concept "telephone" is composed of parts such as "telephone", "hand", and "desk". When the semantic concept is represented in a feature space of color and size, the "telephone" part may have different feature distributions in the TRECVID domain and the web image domain, while parts such as "desk" have similar feature distributions. In that case, assigning a single uniform weight to each whole image cannot fit the image distribution of the original domain to that of the target domain.

The present invention has been made in view of these circumstances, and its object is to provide an image dictionary generation device, an image dictionary generation method, and an image dictionary generation program capable of accurately generating the image dictionary needed to recognize which concept the content of an image or video represents.

The present invention is an image dictionary generation device that generates an image dictionary used when recognizing the content of an image, the device comprising: image collection means for collecting images of a target domain, which is the domain to which learning data for a semantic concept (a word representing a concept) belongs, and images of an original domain, which is a domain to be referenced that differs from the target domain; topic calculation means for calculating the topics in the images of the target domain and of the original domain, and outputting information on the calculated topics and the image information belonging to each topic; topic association means for associating the topics of the original domain output by the topic calculation means with the topics of the target domain, and outputting information on the associated topics and the image information belonging to those topics; image information synthesis means for, topic by topic, fitting the image information of the original domain topic to the feature distribution of the images of the target domain topic and combining them, and outputting the topic information of the target domain and the original domain and the image information for each topic; and dictionary generation means for generating an image dictionary representing the semantic concept from the topic information of the target domain and the original domain and the image information for each topic.

The present invention is also an image dictionary generation method for generating an image dictionary used when recognizing the content of an image, the method comprising: an image collection step of collecting images of a target domain, which is the domain to which learning data for a semantic concept (a word representing a concept) belongs, and images of an original domain, which is a domain to be referenced that differs from the target domain; a topic calculation step of calculating the topics in the images of the target domain and of the original domain, and outputting information on the calculated topics and the image information belonging to each topic; a topic association step of associating the topics of the original domain output by the topic calculation step with the topics of the target domain, and outputting information on the associated topics and the image information belonging to those topics; an image information synthesis step of, topic by topic, fitting the image information of the original domain topic to the feature distribution of the images of the target domain topic and combining them, and outputting the topic information of the target domain and the original domain and the image information for each topic; and a dictionary generation step of generating an image dictionary representing the semantic concept from the topic information of the target domain and the original domain and the image information for each topic.

The present invention is also an image dictionary generation program that causes a computer to function as the image dictionary generation device.

According to the present invention, the image dictionary needed to recognize which concept the content of an image or video represents can be generated with high accuracy.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the device shown in FIG. 1.
FIG. 3 is an explanatory diagram showing an example of a distance matrix.
FIG. 4 is an explanatory diagram showing an example of a semantic concept.

An image dictionary generation device according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of this embodiment. As shown in the figure, the image dictionary generation device comprises an image storage unit 1, an image collection unit 2, a topic calculation unit 3, an association unit 4, an image information synthesis unit 5, a dictionary generation unit 6, and a dictionary storage unit 7.

The image storage unit 1 stores and manages images prepared in advance, organized by domain and by semantic concept. On receiving a semantic concept and domain designation from the image collection unit 2, the image storage unit 1 outputs the target domain and original domain images for that semantic concept to the image collection unit 2. The image collection unit 2 sends the semantic concept and domain designation to the image storage unit 1, receives the stored target domain and original domain images for that semantic concept, and outputs them to the topic calculation unit 3.

The topic calculation unit 3 receives the target domain images and the original domain images from the image collection unit 2 and calculates the topics of each set of images separately. It then outputs the topic information for the target domain images and the original domain images, together with the image information belonging to each topic, to the association unit 4. The association unit 4 receives the topic information of each domain from the topic calculation unit 3 and associates the topics of the original domain with the topics of the target domain. It then outputs the associated topic information and the image information belonging to the topics to the image information synthesis unit 5.

The image information synthesis unit 5 receives the associated topic information of both domains and the image information for each topic from the association unit 4. Using the associated topic information, it fits, topic by topic, the feature distribution of the original domain images to the feature distribution of the target domain images and combines the image information of both domains. It then outputs the combined topic information and the image information for each topic to the dictionary generation unit 6.

The dictionary generation unit 6 receives the combined topic information of both domains and the image information for each topic from the image information synthesis unit 5 and generates topic models using a learning method. It then combines these topic models into an image dictionary representing the semantic concept and stores the topic information and the image dictionary in the dictionary storage unit 7.

The basic unit that expresses the meaning of an image can be regarded as a topic of the image. Here, a topic corresponds to a region or a set of pixels obtained by a technique such as image region segmentation or clustering. With a sufficiently accurate region segmentation technique, image regions can be considered to correspond to real-world objects (for example, sky, a wheel, a human face, or a torso).

Next, the processing operation of the image dictionary generation device shown in FIG. 1 is described with reference to FIG. 2, which is a flowchart of that operation. First, the image collection unit 2 reads the target domain and original domain images for the semantic concept from the image storage unit 1 and outputs them to the topic calculation unit 3 (step S1). The topic calculation unit 3 then extracts topics from the target domain images and from the original domain images and outputs the extracted topic information to the association unit 4 (step S2).

Topic extraction is described here using the original domain images as an example. First, features are extracted from each image of the original domain; for example, the SIFT feature points of the image may be extracted as features. Next, all SIFT feature points from the original domain images are clustered. Finally, each cluster containing at least a certain number of feature points is taken as a topic of the original domain. Alternatively, an image region segmentation technique may be applied to each original domain image, the resulting regions clustered, and each cluster containing at least a certain number of regions taken as a topic of the original domain. The same processing extracts the topics of the target domain. The topic information for a domain's images is the center of each resulting cluster, that is, the mean of all features in the cluster. The image information belonging to a topic is the cluster itself.
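The topic extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the description does not name a clustering algorithm, so plain k-means seeded with the first k feature vectors is assumed here, and the names `extract_topics`, `kmeans`, `k`, and `min_size` are hypothetical. Features are plain numeric vectors standing in for SIFT descriptors.

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def kmeans(points, k, iterations=20):
    """Plain k-means (ASSUMPTION: the source does not specify the clustering method).
    The first k points seed the centers, so len(points) must be at least k."""
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean_vector(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

def extract_topics(features, k, min_size):
    """Return (center, cluster) pairs for clusters with at least min_size members.
    The center is the topic information; the cluster is the image information."""
    topics = []
    for cluster in kmeans(features, k):
        if len(cluster) >= min_size:
            topics.append((mean_vector(cluster), cluster))
    return topics
```

Running the same function once on the original domain features and once on the target domain features would yield the two topic sets used in the association step.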

Next, the association unit 4 uses the topic information of both domains output from the topic calculation unit 3 to associate the topics of the original domain with the topics of the target domain (step S3). The association is performed by steps S31 to S36. The association unit 4 reads the topic information of the original domain (M topics) and of the target domain (N topics) (step S31). It then calculates the distances d_mn (m = 1, ..., M; n = 1, ..., N) between the topics of the two domains (step S32). Here, d_mn may be calculated as the Euclidean distance between the m-th topic of the original domain and the n-th topic of the target domain. The inter-topic distances can be represented by the distance matrix shown in FIG. 3.
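As a minimal sketch of step S32, the distance matrix can be built from the topic centers of the two domains. The function name `distance_matrix` and the list-of-lists layout are assumptions made for illustration; rows correspond to the M original domain topics and columns to the N target domain topics.

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(original_centers, target_centers):
    """Row m, column n holds the distance between original topic m
    and target topic n (indices are zero-based here)."""
    return [[euclidean(s, t) for t in target_centers]
            for s in original_centers]

# Two original domain topics against three target domain topics.
matrix = distance_matrix([(0, 0), (3, 4)], [(0, 0), (6, 8), (3, 4)])
```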

Next, the association unit 4 sets the variable m to 1 (step S33). It then finds the minimum distance d_mj in the m-th row and determines whether d_mj is smaller than a fixed value D1 (step S34). If d_mj is not smaller than D1, it adds 1 to m and repeats the check on the next row until a row whose minimum value d_mj is smaller than D1 is found.

When the minimum value d_mj is smaller than D1, the association unit 4 associates the m-th topic of the original domain with the j-th topic of the target domain (step S35). It then determines whether m = M (step S36); if not, it adds 1 to m, returns to step S34, and repeats the processing. When m = M, the association unit 4 outputs the associated topic information of both domains and the image information belonging to the topics to the image information synthesis unit 5.

Through steps S31 to S36, original domain topics that are far from every target domain topic, that is, unrelated topics, are filtered out, and only original domain topics in the neighborhood of a target domain topic are selected and associated. Good-quality learning data that matches the semantic concept of the target domain can therefore be selected from the original domain. Using these topics and their image information improves the accuracy of the image dictionary for the target domain.
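The row-by-row association of steps S33 to S36 might be sketched as below. The function name `associate_topics` and the dictionary return type are assumptions; the sketch simply visits every row of the distance matrix once, which is one reading of the loop in the flowchart, and uses zero-based indices.

```python
def associate_topics(distances, d1):
    """Map each original topic index m to its nearest target topic index j,
    but only when the row minimum d_mj is smaller than the threshold d1.
    Original topics whose nearest target topic is too far are filtered out."""
    mapping = {}
    for m, row in enumerate(distances):
        j = min(range(len(row)), key=row.__getitem__)  # column of the row minimum
        if row[j] < d1:
            mapping[m] = j
    return mapping

# Three original topics, two target topics; the second original topic is
# far from both target topics and is therefore dropped.
distances = [[0.5, 3.0], [4.0, 6.0], [2.0, 0.2]]
mapping = associate_topics(distances, d1=1.0)
```

Several original topics may map to the same target topic, which is the situation handled by case (3) of the synthesis step.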

Next, the image information synthesis unit 5 uses the associated topic information of both domains and the image information belonging to the topics, output from the association unit 4, to fit and combine the feature distributions of the image information of both domains topic by topic, and outputs the result to the dictionary generation unit 6 (step S4). The combination is performed in one of the following ways (1) to (3), depending on the case.

(1) When no original domain topic is associated with topic i of the target domain, the learning data for the i-th topic is the cluster belonging to the i-th topic. For example, when SIFT feature points are the image features, the learning data for the i-th topic is the cluster of SIFT feature points belonging to the i-th topic.

(2) When exactly one original domain topic (call it j) is associated with topic i of the target domain, it is first determined whether d_ij is at or below a fixed value. If it is, the cluster belonging to the i-th topic of the target domain and the cluster belonging to the j-th topic of the original domain are combined to form the learning data for the i-th topic. If d_ij exceeds the fixed value, the probability distributions of the image information belonging to the two topics are analyzed, the cluster belonging to the j-th topic of the original domain is fitted to the cluster belonging to the i-th topic of the target domain, and the fitted cluster is then combined with the cluster of the i-th topic to form the learning data for the i-th topic. For example, the Kullback-Leibler divergence (KL divergence), which expresses the difference between the probability distributions of the two topics, may be computed, after which the cluster of the j-th original domain topic is fitted to the cluster of the i-th target domain topic and the two are combined into the learning data for the i-th topic.

(3) When multiple original domain topics (call them j and l) are associated with topic i of the target domain, it is first determined whether d_ij and d_il are all at or below the fixed value. If they are, the cluster belonging to the i-th topic of the target domain is combined with the clusters belonging to the j-th and l-th topics of the original domain to form the learning data for the i-th topic. Otherwise, as in the "No" branch of (2), the clusters are processed in order: the cluster of the i-th topic is combined with the cluster of the j-th topic, and the resulting new cluster of the i-th topic is then combined with the cluster of the l-th topic.
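Cases (1) to (3) can be sketched in one function. The description does not state how the fitting itself is carried out, so this sketch assumes a simple stand-in: the original domain cluster is shifted and scaled so that its per-dimension mean and standard deviation match those of the target cluster. That choice, the names `fit_to_target` and `synthesize_topic`, and the `threshold` parameter are all hypothetical, and the KL divergence mentioned in (2) is not computed here.

```python
from math import sqrt

def mean_std(points):
    """Per-dimension mean and population standard deviation."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = [sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n)
            for d in range(dims)]
    return means, stds

def fit_to_target(original, target):
    """ASSUMPTION: fitting means matching per-dimension mean and std.
    The source only says the distributions are analyzed and fitted."""
    o_mean, o_std = mean_std(original)
    t_mean, t_std = mean_std(target)
    scale = [t_std[d] / o_std[d] if o_std[d] > 0 else 0.0
             for d in range(len(o_mean))]
    return [[t_mean[d] + (p[d] - o_mean[d]) * scale[d] for d in range(len(p))]
            for p in original]

def synthesize_topic(target_cluster, associated, threshold):
    """Build the learning data for one target topic.
    associated: list of (distance, original_cluster) pairs; it is empty in
    case (1), has one entry in case (2), and several entries in case (3)."""
    merged = [list(p) for p in target_cluster]
    close_enough = all(d <= threshold for d, _ in associated)
    for _, original_cluster in associated:
        if close_enough:
            merged += [list(p) for p in original_cluster]
        else:
            # Fit to the current (possibly already extended) target cluster.
            merged += fit_to_target(original_cluster, merged)
    return merged
```

With an empty `associated` list the target cluster is returned unchanged, which corresponds to case (1).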

Next, the dictionary generation unit 6 applies a learning method to the target domain topic information and the per-topic clusters output from the image information synthesis unit 5 to generate N topic models. Any known learning method can be used, for example a support vector machine (SVM). The dictionary generation unit 6 takes these N topic models as the image dictionary for the semantic concept model (step S5) and stores the image dictionary and the N pieces of topic information in the dictionary storage unit 7. To identify the semantic concept of an unknown image, N topics are first generated for the image by matching the similarity between the image's features and the topic information. An individual identification result is then obtained for each generated topic using the image dictionary. Finally, the N identification results are combined to produce the final identification of the image; for example, the average of the N identification results may be used as the final result.
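The identification procedure for an unknown image could look roughly like the following sketch. The trained topic models (SVMs in the example given in the description) are replaced here by hypothetical callables that map a list of features to a score, and nearest-center assignment is assumed as the similarity matching; how a topic with no matching features should be scored is not stated in the description and is left to the model callables.

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_to_topics(features, topic_centers):
    """Group image features by their nearest topic center
    (ASSUMPTION: nearest center stands in for similarity matching)."""
    groups = [[] for _ in topic_centers]
    for f in features:
        n = min(range(len(topic_centers)),
                key=lambda i: euclidean(f, topic_centers[i]))
        groups[n].append(f)
    return groups

def classify_image(features, topic_centers, topic_models):
    """Average the N per-topic identification results.
    topic_models[n] is a hypothetical callable: feature list -> score."""
    groups = assign_to_topics(features, topic_centers)
    scores = [model(group) for model, group in zip(topic_models, groups)]
    return sum(scores) / len(scores)

# Toy models that score 1.0 when the topic has any features, else 0.0.
centers = [(0, 0), (10, 10)]
models = [lambda g: 1.0 if g else 0.0, lambda g: 1.0 if g else 0.0]
result = classify_image([(0, 1), (1, 0)], centers, models)
```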

As described above, the feature distributions of image data from different domains are analyzed topic by topic, the topics of the original domain image data are associated with the topics of the target domain image data, and, for each topic, the feature distribution of the original domain image information is fitted to that of the target domain image information before the two are combined. This makes it possible to collect good-quality learning data for the target domain and to generate an accurate image dictionary from that learning data.

Note that the image dictionary generation processing may also be performed by recording a program that realizes the functions of the processing units in FIG. 1 on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. It also includes a WWW system equipped with a homepage providing environment (or display environment). A "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. It further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which realizes those functions in combination with a program already recorded in the computer system.

Although an embodiment of the present invention has been described above with reference to the drawings, the embodiment is merely an illustration of the present invention, and the present invention is clearly not limited to it. Components may therefore be added, omitted, replaced, or otherwise modified without departing from the spirit and scope of the present invention.

The present invention is applicable to uses in which it is essential to generate, with high accuracy, the image dictionary needed to recognize which concept the content of an image or video represents.

Description of reference numerals: 1: image storage unit; 2: image collection unit; 3: topic calculation unit; 4: association unit; 5: image information synthesis unit; 6: dictionary generation unit; 7: dictionary storage unit

Claims

An image dictionary generation device that generates an image dictionary used when recognizing the content of an image, comprising:
image collection means for collecting images of a target domain, which is the domain to which learning data for a semantic concept (a word representing a concept) belongs, and images of an original domain, which is a domain to be referenced that differs from the target domain;
topic calculation means for calculating the topics in the images of the target domain and of the original domain, and outputting information on the calculated topics and the image information belonging to each topic;
topic association means for associating the topics of the original domain output by the topic calculation means with the topics of the target domain, and outputting information on the associated topics and the image information belonging to those topics;
image information synthesis means for, topic by topic, fitting the image information of the original domain topic to the feature distribution of the images of the target domain topic and combining them, and outputting the topic information of the target domain and the original domain and the image information for each topic; and
dictionary generation means for generating an image dictionary representing the semantic concept from the topic information of the target domain and the original domain and the image information for each topic.

An image dictionary generation method for generating an image dictionary used when recognizing the content of an image, comprising:
an image collection step of collecting images of a target domain, which is the domain to which learning data for a semantic concept (a word representing a concept) belongs, and images of an original domain, which is a domain to be referenced that differs from the target domain;
a topic calculation step of calculating the topics in the images of the target domain and of the original domain, and outputting information on the calculated topics and the image information belonging to each topic;
a topic association step of associating the topics of the original domain output by the topic calculation step with the topics of the target domain, and outputting information on the associated topics and the image information belonging to those topics;
an image information synthesis step of, topic by topic, fitting the image information of the original domain topic to the feature distribution of the images of the target domain topic and combining them, and outputting the topic information of the target domain and the original domain and the image information for each topic; and
a dictionary generation step of generating an image dictionary representing the semantic concept from the topic information of the target domain and the original domain and the image information for each topic.

An image dictionary generation program for causing a computer to function as the image dictionary generation device according to claim 1.