JP2008204007A

JP2008204007A - Image dictionary generation method, device and program

Info

Publication number: JP2008204007A
Application number: JP2007036995A
Authority: JP
Inventors: Yongqing Sun; 泳青孫; Satoshi Shimada; 聡嶌田; Masashi Morimoto; 正志森本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-02-16
Filing date: 2007-02-16
Publication date: 2008-09-04
Anticipated expiration: 2027-02-16
Also published as: JP4755122B2

Abstract

PROBLEM TO BE SOLVED: To generate an image dictionary accurately by extracting semantic labels appropriate to the content, without selecting in advance the semantic labels to be assigned.

SOLUTION: Content-related information, which is text information describing a video content, is acquired from a website. Text information related to the content is collected from websites based on a word extracted from the content-related information and on the content release date or content creation date. A plurality of words expressing a topic are extracted from the text information as a word set, web images related to the word set are collected, a visual pattern related to the topic is generated using the collected web images as learning data, and the word set and the visual pattern are stored in an image dictionary storage means as the image dictionary.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an image dictionary generation method, apparatus, and program, and more particularly to an image dictionary generation method, apparatus, and program for creating an image dictionary using Web information that includes text information and image information.

A conventional image dictionary generation method works as follows.

First, the semantic labels to be assigned to video are selected. Next, images that represent those semantic labels are collected as learning data. Finally, the visual pattern to be associated with each semantic label is obtained from the learning data. This processing yields an image dictionary composed of visual patterns associated with semantic labels for video (see, for example, Non-Patent Document 1).

As a method of collecting learning data efficiently, there is also a method of collecting images from websites using words that represent semantic labels prepared in advance (see, for example, Non-Patent Document 2).

Non-Patent Document 1: Y. Wu, B. L. Tseng and J. R. Smith, "Ontology-based Multi-Classification Learning for Video Concept Detection," IEEE International Conference on Multimedia and Expo (ICME), June 2004.
Non-Patent Document 2: Yongqing Sun, Satoshi Shimada, Masashi Morimoto, "Visual pattern discovery using web images," ACM MIR workshop, 2006.

However, in an image dictionary generation method such as that of Non-Patent Document 1, the accuracy of the image dictionary depends on the learning data, so selecting learning data that reflects actual video well requires an enormous amount of time and effort.

In the learning data collection method of Non-Patent Document 2, the collected Web images include a wide variety of images owing to the nature of Web information. The learning data therefore contains noise, and the accuracy of the generated image dictionary is low.

Furthermore, both conventional methods require selecting in advance which semantic labels to assign to the video, and selecting appropriate semantic labels that reflect the video content takes time and effort.

The present invention has been made in view of the above points. Its object is to provide an image dictionary generation method, apparatus, and program that can extract appropriate semantic labels according to the content, without selecting in advance the semantic labels to be assigned, and can thereby generate an image dictionary accurately.

In recent years, as information services have become linked with communications and broadcasting, Web information and TV video have become closely related, so using Web information as in Non-Patent Document 2 is an effective way to collect images related to the video to be labeled. To solve the prior-art problem of low image dictionary accuracy, the present invention improves accuracy by providing means for automatically extracting appropriate semantic labels according to the content. Specifically, the following means are used.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (claim 1) is an image dictionary generation method for generating an image dictionary that defines the relationship between semantic labels and visual patterns in order to assign semantic labels to video, the method performing:
a video content related information acquisition step (step 1) in which video content related information acquisition means acquires, from a website, content-related information that is text information describing the video content;
a text information collection step (step 2) in which text information collection means collects text information about the content from websites, based on a word extracted from the content-related information and on the content creation date or content release date, and stores it in storage means;
a topic extraction step (step 3) in which topic extraction means extracts, as a word set, a plurality of words representing a topic from the text information stored in the storage means; and
an image collection and visual pattern generation step in which image collection and visual pattern generation means collects web images related to the word set (step 4), generates a visual pattern related to the topic using the collected web images as learning data (step 5), and stores the word set and the visual pattern in image dictionary storage means as the image dictionary (step 6).

In the present invention (claim 2), in the topic extraction step (step 3):
a video attribute table that defines the types of topics to be extracted according to the video genre extracted from the content-related information is stored in storage means in advance, and, for each attribute n (n = 1, 2, ..., N) of the attribute table, the text information stored in the storage means by the text information collection step is read out and classified; and
a word set representing a topic related to attribute n is extracted from the classified text information belonging to the same group.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention (claim 3) is an image dictionary generation device for generating an image dictionary that defines the relationship between semantic labels and visual patterns in order to assign semantic labels to video, the device comprising:
video content related information acquisition means 100 for acquiring, from a website, content-related information that is text information describing the video content;
text information collection means 101 for collecting text information about the content from the website 3, based on a word extracted from the content-related information and on the content creation date or content release date, and storing it in storage means;
topic extraction means 104 for extracting, as a word set, a plurality of words representing a topic from the text information stored in the storage means; and
image collection and visual pattern generation means 105 for collecting web images related to the word set, generating a visual pattern related to the topic using the collected web images as learning data, and storing the word set and the visual pattern in image dictionary storage means 2 as the image dictionary.

The present invention (claim 4) further comprises video attribute storage means storing a video attribute table that defines the types of topics to be extracted according to the video genre extracted from the content-related information, wherein the topic extraction means 104 includes:
means for reading out and classifying, for each attribute n (n = 1, 2, ..., N) in the attribute table of the video attribute storage means, the text information stored in the storage means by the text information collection means 101; and
means for extracting a word set representing a topic related to attribute n from the classified text information belonging to the same group.

The present invention (claim 5) is a program for causing a computer to realize the functions according to claim 3 or 4.

As described above, according to the present invention, Web information related to the video (content-related information) is collected and classified to extract topics. Appropriate semantic labels are thus extracted automatically according to the content, without selecting in advance the semantic labels to be assigned, so an image dictionary can be generated accurately without time and effort.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 3 shows a system configuration in an embodiment of the present invention.

The system shown in FIG. 3 comprises an image dictionary generation device 1, an image dictionary database 2, and a website 3 connected to the Internet.

The image dictionary generation device 1 acquires web information related to a video from the website 3 and uses the acquired web information to generate an image dictionary for the video.

The image dictionary database 2 receives and stores the image dictionary generated by the image dictionary generation device 1.

The website 3 stands for the many websites published on the Internet. The published information consists of web images accompanied by surrounding text and of web text information. In response to a request from the image dictionary generation device 1, the website 3 outputs these web images and this web text information to the image dictionary generation device 1.

The image dictionary generation device 1 is described in detail below.

FIG. 4 shows the configuration of the image dictionary generation device according to an embodiment of the present invention.

The image dictionary generation device 1 shown in FIG. 4 comprises a video content related information acquisition unit 100, a text information collection unit 101, a text information classification unit 102, a video attribute table management unit 103, a video topic extraction unit 104, and a web image collection and visual pattern generation unit 105.

The processing in each of these units is described below using TV program video as an example.

The video content related information acquisition unit 100 acquires text information describing a given program video from the website 3 and stores it in storage means (not shown) such as a memory. For example, the EPG information of a television video may be acquired from the home page of the TV broadcasting station. An example of EPG information is shown in FIG. 5. The EPG information shown there consists of the broadcast date and time, title, genre, summary, performers, and so on. The video content related information acquisition unit 100 extracts the video title and the broadcast date and time from the EPG information and outputs them to the text information collection unit 101, and extracts the video genre information from the EPG information and outputs it to the text information classification unit 102.
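The routing performed by unit 100 can be pictured with a minimal Python sketch. The `EpgInfo` record and the `split_epg` helper are hypothetical names introduced only for illustration; the field list follows the EPG items named above, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class EpgInfo:
    """Hypothetical record holding the EPG items listed in the description."""
    broadcast_at: datetime
    title: str
    genre: str
    summary: str = ""
    performers: list = field(default_factory=list)


def split_epg(epg):
    """Return what each downstream unit receives.

    The (title, broadcast date and time) pair goes to the text information
    collection unit 101; the genre goes to the text information
    classification unit 102.
    """
    return (epg.title, epg.broadcast_at), epg.genre


# Invented sample data, for illustration only.
epg = EpgInfo(datetime(2007, 2, 16, 21, 0), "Sample Drama", "drama")
search_key, genre = split_epg(epg)
```

How the EPG text is actually fetched and parsed is not specified in the description, so that part is left out of the sketch.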

On receiving the video title and the broadcast date and time from the video content related information acquisition unit 100, the text information collection unit 101 uses the video title as a search condition to collect, from the website 3, web text information from a limited period around the broadcast date and time, and stores it in storage means (not shown) such as a memory. It outputs the collected web text information to the text information classification unit 102. For example, recent web text information may be collected for a period of about one month, spanning one week before and three weeks after the video broadcast date and time.
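The limited collection period can be computed as in the following Python sketch. The function names and the idea that each web text carries a publication timestamp are assumptions made for illustration; the one-week and three-week defaults are the example values given above.

```python
from datetime import datetime, timedelta


def collection_window(broadcast_at, weeks_before=1, weeks_after=3):
    """Return (start, end) of the period around the broadcast date and time."""
    start = broadcast_at - timedelta(weeks=weeks_before)
    end = broadcast_at + timedelta(weeks=weeks_after)
    return start, end


def in_window(published_at, window):
    """True if a web text's (assumed) publication time falls in the window."""
    start, end = window
    return start <= published_at <= end


window = collection_window(datetime(2007, 2, 16, 21, 0))
```

A collector would then keep only those search hits for the title whose timestamp satisfies `in_window`.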

On receiving the video genre information from the video content related information acquisition unit 100, the text information classification unit 102 outputs it to the video attribute table management unit 103. It then classifies the web text information received from the text information collection unit 101 according to the video attribute table information received from the video attribute table management unit 103, and stores the result in storage means (not shown) such as a memory. It outputs the classified text information to the video topic extraction unit 104.

The video attribute table management unit 103 holds and manages, in storage means, a video attribute table that defines the types of topics to be extracted according to the video genre. The video attribute table management unit 103 has a video attribute table such as that shown in FIG. 6. The video attribute table holds video genres and, for each video genre, a plurality of attributes. The video attribute table management unit 103 receives the video genre information from the text information classification unit 102 and outputs the information of the video attribute table corresponding to the video to the text information classification unit 102.
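One simple way to hold such a table is a mapping from genre to attribute list, as in this Python sketch. The "drama" row uses the attributes that the description gives for the FIG. 6 example (person, place, incident, emotion, social activity); the dictionary layout and the `attributes_for` helper are assumptions, not the patent's stated data structure.

```python
# Genre -> topic types (attributes) to extract. Only the "drama" row is
# taken from the description; other genres would be added the same way.
VIDEO_ATTRIBUTE_TABLE = {
    "drama": ["person", "place", "incident", "emotion", "social activity"],
}


def attributes_for(genre):
    """Return the attribute list for a genre, or an empty list if unknown."""
    return VIDEO_ATTRIBUTE_TABLE.get(genre, [])


drama_attributes = attributes_for("drama")
```

Returning an empty list for an unknown genre is a design choice of this sketch; the description does not say how unlisted genres are handled.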

The video topic extraction unit 104 reads out the text information stored in memory by the text information classification unit 102 and, using a statistical method, extracts the top-ranked words in the word frequency distribution of the text information as topics. It outputs the words for the extracted topics to the web image collection and visual pattern generation unit 105.

The web image collection and visual pattern generation unit 105 collects related web images from the website 3, using the topic words received from the video topic extraction unit 104 as a search condition. It generates a visual pattern for the topic by applying a learning method to the collected web images, and stores the image dictionary, composed of the topic words and the visual patterns, in the image dictionary database 2.

With the above configuration, an image dictionary for video is generated using web information that includes text and images.

Next, the basic operation of the image dictionary generation device 1 is described.

FIG. 7 is a flowchart of the basic operation of the image dictionary generation device according to an embodiment of the present invention.

Step 201) The video content related information acquisition unit 100 acquires the content-related information of a given video from the website 3 and stores it in storage means (not shown). For example, the EPG information of the video, as shown in FIG. 5, may be acquired from the home page of the TV broadcasting station.

Step 202) The text information collection unit 101 uses the title in the video content-related information (EPG) as a search condition to collect, from the website 3, web text information from a limited period around the broadcast date and time. For example, recent web text information for about one month, spanning one week before and three weeks after the video broadcast date and time, is collected and stored in memory (not shown).

Step 203) The text information classification unit 102 reads the corresponding video attribute table from the video attribute table management unit 103, according to the video genre that the video content related information acquisition unit 100 extracted from the EPG information in step 201. If the video genre is "drama", then in the example of FIG. 6 the attribute table for drama video held by the video attribute table management unit 103 (person, place, incident, emotion, social activity) is read.

Step 204) The text information classification unit 102 searches again the web text information collected in step 202 and stored in memory (not shown), using a search condition to which the words corresponding to attribute n in the video attribute table have been added, collects the text information related to attribute n, and stores it in memory (not shown). To collect the text information for attribute n, the web texts are matched against the words corresponding to attribute n, and web texts are taken in descending order of similarity. If the video attribute table has N attributes, the web texts collected in step 202 are thus classified into N text groups, one per attribute, and stored in memory (not shown).
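The grouping in step 204 can be sketched in Python as follows. The description does not define the similarity measure, so this sketch assumes a deliberately simple one (the number of occurrences of the attribute's words in the text); `similarity`, `classify`, and the sample texts are all hypothetical.

```python
def similarity(text, words):
    """Assumed measure: total occurrences of the attribute's words in text."""
    return sum(text.count(word) for word in words)


def classify(texts, attribute_words, top_k=None):
    """Group texts per attribute, ranked by descending similarity.

    attribute_words maps each attribute to the words that represent it.
    Texts with no match are left out of that attribute's group; top_k
    optionally caps the group size (None keeps every matching text).
    """
    groups = {}
    for attribute, words in attribute_words.items():
        scored = [(similarity(text, words), text) for text in texts]
        matched = [pair for pair in scored if pair[0] > 0]
        ranked = sorted(matched, key=lambda pair: -pair[0])
        groups[attribute] = [text for _, text in ranked[:top_k]]
    return groups


# Invented sample data, for illustration only.
texts = [
    "AAA met her daughter",
    "filmed in Kyoto, near Kyoto station",
    "AAA and daughter and AAA",
]
groups = classify(texts, {"person": ["AAA", "daughter"], "place": ["Kyoto"]})
```

A text can land in more than one group under this scheme, which seems consistent with re-searching the same collected texts once per attribute.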

Step 205) For the text groups classified by the text information classification unit 102 and held in memory (not shown), the video topic extraction unit 104 sets n = 1, reads the first text group, and performs the following processing.

Step 206) The video topic extraction unit 104 extracts topics from the text group for attribute n. The processing is described with reference to the flowchart of FIG. 8.

Step 301) The video topic extraction unit 104 reads the web text information for attribute n that was stored in memory (not shown) in step 204.

Step 302) Stop words, such as the Japanese particles "を", "は", and "が", are deleted from the web text information that was read.

Step 303) Similar words are merged using the text information processed in step 302. As an example for the attribute n "person" of a drama video, suppose the words appearing in the text information are "AAA", "T", "daughter", "friend", and "C". In this case, the person relationship information of the drama video stored in advance in storage means (not shown) is consulted, similar words are merged into word sets, and the word sets are stored in memory (not shown). FIG. 9 is an example of the person relationship information of a drama video stored in advance in storage means (not shown). The person relationship information of the drama may be read in advance from the home page of the TV broadcasting station and stored in the storage means. The word sets generated according to FIG. 9 are {AAA, daughter}, {friend, T}, and {C}.
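The merging in step 303 can be sketched in Python as below. FIG. 9 is not reproduced in the text, so the encoding of the person relationship information as an alias-to-name mapping is an assumption of this sketch, chosen so that the example words yield the word sets stated above; `build_word_sets` is a hypothetical helper.

```python
def build_word_sets(words, same_person):
    """Merge words that refer to the same person into one word set.

    same_person maps an alias (e.g. a role such as "daughter") to the
    name it refers to; words absent from the mapping stand for themselves.
    """
    sets_by_name = {}
    for word in words:
        name = same_person.get(word, word)
        sets_by_name.setdefault(name, set()).add(word)
    return [frozenset(members) for members in sets_by_name.values()]


# Assumed encoding of the FIG. 9 relationships for the example in step 303.
relations = {"daughter": "AAA", "friend": "T"}
word_sets = build_word_sets(["AAA", "T", "daughter", "friend", "C"], relations)
```

Returning `frozenset` values keeps each word set hashable, which is convenient if the sets are later used as dictionary keys when counting frequencies.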

Step 304) Each word set processed in step 303 is read from memory (not shown), and the frequency of each word set is calculated.

Step 305) Based on the frequencies calculated in step 304, the M most frequent word sets are extracted as the topics for attribute n and stored in memory (not shown).
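Steps 304 and 305 amount to a frequency count followed by a top-M selection, as in this Python sketch. It assumes the text has already been tokenized with stop words removed, and that the frequency of a word set is the sum of the counts of its member words; neither detail is fixed by the description, and `top_topics` is a hypothetical name.

```python
from collections import Counter


def top_topics(tokens, word_sets, m):
    """Return the m word sets with the highest frequency.

    tokens: words of the attribute's text group (stop words removed).
    word_sets: hashable word sets (e.g. frozensets) from step 303.
    """
    counts = Counter(tokens)
    # Assumed definition: a word set's frequency is the total count of
    # all of its member words.
    frequency = {ws: sum(counts[w] for w in ws) for ws in word_sets}
    return sorted(frequency, key=lambda ws: -frequency[ws])[:m]


# Invented sample data, for illustration only.
tokens = ["AAA", "daughter", "AAA", "T", "friend", "C", "AAA"]
candidate_sets = [
    frozenset({"AAA", "daughter"}),
    frozenset({"friend", "T"}),
    frozenset({"C"}),
]
topics = top_topics(tokens, candidate_sets, m=2)
```

Because Python's sort is stable, word sets with equal frequency keep their input order.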

Step 207) The web image collection and visual pattern generation unit 105 sets m = 1 and processes the first topic stored in memory (not shown) in step 206.

Step 208) The web image collection and visual pattern generation unit 105 collects web images from the website 3, using the word set read from memory (not shown) as a search condition. It extracts a visual module for that word set from the collected images, and stores the word set and the visual module as a pair in memory (not shown). As one example of extracting a common visual module, the method of Non-Patent Document 2 may be used: with the words as the collection condition, an appropriate recognition function is obtained from the images acquired from websites, thereby generating the visual module corresponding to the word set.

Step 209) It is determined whether the processing of step 208 has been performed for all word sets. If not, m = m + 1 is set and the processing returns to step 208. Otherwise, the processing proceeds to step 210.

Step 210) It is determined whether the processing of steps 206 to 209 has been performed for the text information of all attributes. If not, n = n + 1 is set and the processing returns to step 206. Otherwise, the processing proceeds to step 211.

Step 211) The pairs of word set and corresponding visual pattern that were generated for the video in step 208 and stored in memory (not shown) are read out and stored in the image dictionary database 2 as the image dictionary.
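The control flow of steps 205 to 211 is a pair of nested loops, over attributes n and over topics m. The Python sketch below shows only that flow; topic extraction, web image collection, and pattern learning are passed in as callables because the description delegates them to other steps or to Non-Patent Document 2. All function and parameter names are hypothetical.

```python
def build_image_dictionary(text_groups, extract_topics, collect_images,
                           learn_pattern):
    """Return a list of (word_set, visual_pattern) pairs.

    text_groups: attribute -> text group (result of step 204).
    extract_topics(texts) -> list of word sets (step 206).
    collect_images(word_set) -> images for the word set (step 208).
    learn_pattern(images) -> visual pattern (step 208).
    """
    dictionary = []
    for attribute, texts in text_groups.items():   # steps 205 and 210
        for word_set in extract_topics(texts):     # steps 207 and 209
            images = collect_images(word_set)      # step 208
            pattern = learn_pattern(images)
            dictionary.append((word_set, pattern))
    return dictionary                              # stored in step 211


# Stub callables, for illustration only.
entries = build_image_dictionary(
    {"person": ["text a"], "place": ["text b"]},
    extract_topics=lambda texts: [frozenset(texts)],
    collect_images=lambda word_set: ["image for " + w for w in sorted(word_set)],
    learn_pattern=lambda images: len(images),
)
```

Injecting the three operations keeps the loop independent of any particular search service or learning method.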

The operation of the image dictionary generation device in the above embodiment can be implemented as a program, which can be installed and executed on a computer used as the image dictionary generation device, or distributed via a network.

The program can also be stored on a computer-readable recording medium, from which it can be installed on a computer or distributed.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

The present invention can be used for image processing, including image database systems and video database systems.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a system configuration diagram in an embodiment of the present invention.
FIG. 4 is a configuration diagram of the image dictionary generation device in an embodiment of the present invention.
FIG. 5 shows EPG information of a video in an embodiment of the present invention.
FIG. 6 is an example of the video attribute table for each video genre held by the video attribute table management unit in an embodiment of the present invention.
FIG. 7 is a flowchart of the basic operation of the image dictionary generation device in an embodiment of the present invention.
FIG. 8 is a flowchart of the video topic extraction processing for an attribute in an embodiment of the present invention.
FIG. 9 is an example of a person relationship diagram in an embodiment of the present invention.

Explanation of symbols

1 Image dictionary generation device
2 Image dictionary database
3 Website
100 Video content related information acquisition means; video content related information acquisition unit
101 Text information collection means; text information collection unit
102 Text information classification unit
103 Video attribute table management unit
104 Topic extraction means; video topic extraction unit
105 Image collection and visual pattern generation means; web image collection and visual pattern generation unit

Claims

An image dictionary generation method for generating an image dictionary that defines the relationship between semantic labels and visual patterns in order to assign semantic labels to video, the method characterized by performing:
a video content related information acquisition step in which video content related information acquisition means acquires, from a website, content-related information that is text information describing the video content;
a text information collection step in which text information collection means collects text information about the content from websites, based on a word extracted from the content-related information and on the content creation date or content release date, and stores the text information in storage means;
a topic extraction step in which topic extraction means extracts, as a word set, a plurality of words representing a topic from the text information stored in the storage means; and
an image collection and visual pattern generation step in which image collection and visual pattern generation means collects web images related to the word set, generates a visual pattern related to the topic using the collected web images as learning data, and stores the word set and the visual pattern in image dictionary storage means as the image dictionary.

The image dictionary generation method according to claim 1, wherein, in the topic extraction step,
a video attribute table that defines the types of topics to be extracted according to the video genre extracted from the content-related information is stored in storage means in advance, and, for each attribute n (n = 1, 2, ..., N) of the attribute table, the text information stored in the storage means by the text information collection step is read out and classified, and
a word set representing a topic related to attribute n is extracted from the classified text information belonging to the same group.

An image dictionary generation device for generating an image dictionary that defines the relationship between semantic labels and visual patterns in order to assign semantic labels to video, the device characterized by comprising:
video content related information acquisition means for acquiring, from a website, content-related information that is text information describing the video content;
text information collection means for collecting text information about the content from websites, based on a word extracted from the content-related information and on the content creation date or content release date, and storing the text information in storage means;
topic extraction means for extracting, as a word set, a plurality of words representing a topic from the text information stored in the storage means; and
image collection and visual pattern generation means for collecting web images related to the word set, generating a visual pattern related to the topic using the collected web images as learning data, and storing the word set and the visual pattern in image dictionary storage means as the image dictionary.

The image dictionary generation device according to claim 3, further comprising video attribute storage means storing a video attribute table that defines the types of topics to be extracted according to the video genre extracted from the content-related information,
wherein the topic extraction means includes:
means for reading out and classifying, for each attribute n (n = 1, 2, ..., N) in the attribute table of the video attribute storage means, the text information stored in the storage means by the text information collection means; and
means for extracting a word set representing a topic related to attribute n from the classified text information belonging to the same group.

A program for causing a computer to realize the functions according to claim 3 or 4.