JP2007006116A

JP2007006116A - Picture indexer

Info

Publication number: JP2007006116A
Application number: JP2005183569A
Authority: JP
Inventors: Kikuka Miura; 菊佳三浦; Hideki Sumiyoshi; 英樹住吉; Ichiro Yamada; 一郎山田
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2005-06-23
Filing date: 2005-06-23
Publication date: 2007-01-11

Abstract

PROBLEM TO BE SOLVED: To provide a picture indexer that can identify a wide variety of concrete objects by simple means when a concrete object in a picture is identified and the picture is stored as an indexed picture in a storing means.

SOLUTION: Closed captions are analyzed to identify a concrete object in a picture from features of sentences that are often attached to scenes in which a concrete object appears. The features are, first, that when a sentence ends with a noun or with a "noun + assertive auxiliary verb", the concrete object named by that noun often appears in the picture at that point; and second, that when a sentence includes a demonstrative pronoun, the concrete object named by the noun in the nominative ("ga") case of the sentence often appears in the picture at that point. A feature extracting means 6 extracts sentences having these linguistic features using language processing techniques, pairs the concrete object name with the closed caption and its time information, pairs the picture with its time information, and stores them in a picture indexing storing means 7.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to information extraction using natural language processing technology, and more particularly to a technique for identifying concrete objects in video using closed captions.

Conventionally, a system called "Name-It" has been proposed as a technique for identifying concrete objects in video (see Non-Patent Document 1). This system uses closed captions, face image recognition, transcripts, and video captions (open captions) to identify the names and faces of people appearing in news program video. Specifically, it extracts a person's face from the video information, extracts words relating to the person from the audio information, evaluates the association between names and faces, and then identifies the person's name and face.

Information identified in this way is stored in a storage device as indexed video, in which the index is associated with the video. The video is then put to secondary use, for example by video producers.

Here, taking a television program as an example, closed captions are captions that can be viewed only on a television set that supports caption broadcasting. In contrast, captions superimposed on the program video itself (telops) are called open captions.

Shin'ichi Satoh, Yuichi Nakamura, and Takeo Kanade, "Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing", International Joint Conference on Artificial Intelligence, Inc., Proceedings of IJCAI-97, pp. 1488-1493, 1997

However, the "Name-It" system described above can identify only the names and faces of people appearing in news program video, so its uses are limited. From a video producer's standpoint, for example, it is desirable to be able to use indexed video covering a wide variety of concrete objects, not only people in news programs.

Moreover, the above system requires a large amount of information, such as closed captions, face image recognition, and transcripts, to identify the name and face of a person in the video, so the amount of computation required is also large.

The present invention has been made to solve these problems. Its object is to provide a video indexing device that, when identifying a concrete object in video and storing the video as indexed video in a storage means, can identify a wide variety of concrete objects by simple means.

To solve the above problems, the present invention is characterized in that it analyzes closed captions and identifies concrete objects in video from the features of sentences that are often attached to scenes in which a concrete object appears. Here, video includes both still images and moving images, with or without audio.

That is, the video indexing device according to the present invention comprises feature extraction means that extracts, from the closed captions included in original video information, sentences that end with a noun or sentences whose last two parts of speech are a noun followed by an assertive auxiliary verb, and extracts the name of a concrete object from the last noun appearing in each extracted sentence. It is also characterized by comprising feature extraction means that extracts sentences containing a demonstrative pronoun from the closed captions and extracts the name of a concrete object from the noun in the "ga" case (the noun marked by the particle が) in each extracted sentence.

The video indexing device according to the present invention further comprises: dividing means that separates the closed captions and the video from the original video information and attaches the same time information to each; and video indexing storage means that stores an index, which includes the name of the concrete object extracted by the feature extraction means and the time-stamped closed caption produced by the dividing means, together with the time-stamped video produced by the dividing means.

The video indexing device according to the present invention further comprises search means that retrieves indexes and time-stamped video from the video indexing storage means by the name of a concrete object.

The video indexing device according to the present invention further comprises concrete object dictionary storage means in which names of concrete objects are stored, and the feature extraction means searches the concrete object dictionary storage means and extracts the name of the concrete object when the last noun of claim 1 or the "ga"-case noun of claim 2 is stored there.

The video indexing device according to the present invention is also characterized in that the original video information, and the indexes and time-stamped video stored in the video indexing storage means, are distinguished for each program.

The video indexing device according to the present invention is also characterized in that audio is used in place of the closed captions.

According to the present invention, concrete objects are identified by analyzing closed captions, so a wide variety of concrete objects, such as animals as well as people, can be targeted. In addition, because only closed captions are used, concrete objects can be identified by simple means; there is no need to use a large amount of information as in the prior art, and the amount of computation required for indexing is small.

Embodiments of the present invention are described in detail below with reference to the drawings. Embodiments 1 to 3 described below each analyze closed captions and identify concrete objects in video from the features of sentences that are often attached to scenes in which a concrete object appears. Embodiment 4 analyzes audio information and identifies concrete objects in video from the features of sentences that are often attached to such scenes. The first feature of sentences in closed captions is that when a sentence ends with a noun, or ends with "noun + assertive auxiliary verb", a concrete object is likely to be shown at that point. For example, for the closed captions 「突然現れたのは、ライオン。」 ("What suddenly appeared was a lion.") and 「本当におとなしいライオンです。」 ("It is a really docile lion."), the concrete object named by the last noun of the sentence, 「ライオン」 ("lion"), is often shown in the video.

Second, when a sentence contains a demonstrative pronoun, the concrete object named by the noun in the "ga" case of that sentence (the noun marked by the particle が) is likely to be shown in the video at that point. For example, for the closed caption 「これは、ライオンが餌を食べているところです。」 ("This is a lion eating its food."), the "ga"-case noun 「ライオン」 ("lion") is often shown in the video.

Sentences with these linguistic features are extracted using language processing techniques. The name of the target concrete object is paired with the time information at which the sentence is presented, the video is likewise paired with time information, and each pair is stored in the storage means. Here, video includes both still images and moving images, with or without audio.

First, the video indexing device of Embodiment 1 is described. FIG. 1 is an overall configuration diagram showing the video indexing device of Embodiment 1 according to the present invention. This video indexing device 101 comprises program information storage means 1, closed caption/video dividing means 2, closed caption storage means 5, feature extraction means 6, video indexing storage means 7, search means 12, and a video search interface 14.

Television program information is stored in the program information storage means 1. Here, the television program information includes time information and consists of information in which closed captions are broadcast together with the video. The closed caption/video dividing means 2 reads the television program information from the program information storage means 1 and separates the closed captions and the video from it. It then attaches the same time information to both, outputs time-stamped closed captions 3 and stores them in the closed caption storage means 5, and outputs time-stamped video 4 and stores it in the video indexing storage means 7. The closed caption storage means 5 thus holds the time-stamped closed captions.
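The division step can be pictured with a small sketch. It assumes, purely for illustration, that a program is a list of `(time, frame, caption)` records in which `caption` is `None` when no caption is shown. The record layout and the names `split_program`, `TimedCaption`, and `TimedFrame` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TimedCaption:
    time: float  # time information shared with the video (unit assumed: seconds)
    text: str


@dataclass(frozen=True)
class TimedFrame:
    time: float  # same time base as TimedCaption.time
    frame: str   # stand-in for real image data


def split_program(
    records: List[Tuple[float, str, Optional[str]]]
) -> Tuple[List[TimedCaption], List[TimedFrame]]:
    """Separate captions and video, attaching the same time value to both."""
    captions: List[TimedCaption] = []
    frames: List[TimedFrame] = []
    for time, frame, caption in records:
        frames.append(TimedFrame(time, frame))
        if caption is not None:
            captions.append(TimedCaption(time, caption))
    return captions, frames


# Illustrative input only; the frame names are placeholders.
program = [
    (0.0, "frame-a", None),
    (1.0, "frame-b", "これはライオン。"),
]
captions, frames = split_program(program)
```

Because both outputs carry the same time value, a caption found later by text analysis can be mapped back to the matching stretch of video.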

The feature extraction means 6 reads the time-stamped closed captions from the closed caption storage means 5 and extracts from them the sentences that describe scenes in which a concrete object appears as the subject of the shot. It then generates an index for search (concrete object name, time information, and closed caption), outputs it, and stores it in the video indexing storage means 7. Details of the feature extraction means 6 are described later.

The video indexing storage means 7 comprises index storage means 8 and time-stamped video storage means 9. The index storage means 8 stores indexes consisting of a concrete object name, time information, and a closed caption, and the time-stamped video storage means 9 stores the time-stamped video.

The search means 12 receives from the video search interface 14 a search term specifying a concrete object, searches the index storage means 8 of the video indexing storage means 7 using that term, and reads out a time-stamped index 10 (concrete object name, time information, and closed caption) from the index storage means 8. Then, for the time information of the time-stamped index 10, it retrieves the video in a fixed interval before and after that time from the time-stamped video storage means 9 of the video indexing storage means 7. The width of this interval is assumed to be set in advance.

Specifically, the search means 12 searches the time-stamped video storage means 9 based on the time information of the time-stamped index 10 read from the index storage means 8, and reads out the time-stamped video 11 for the fixed interval before and after that time. The search means 12 then outputs a search result 13 consisting of the time information, the video in which the target concrete object is considered to appear, and the closed caption.
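The lookup performed by the search means can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `IndexEntry` and `search`, the use of seconds as the time unit, exact string matching on the object name, and the sample data (including the 5-second window and the "giraffe" entry) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IndexEntry:
    object_name: str  # concrete object name
    time: float       # time information (unit assumed: seconds)
    caption: str      # closed caption text


def search(
    index: List[IndexEntry],
    frames: List[Tuple[float, str]],
    query: str,
    window: float,
) -> List[Tuple[float, List[str], str]]:
    """Return (time, frames within +/- window of that time, caption) per hit."""
    results = []
    for entry in index:
        if entry.object_name != query:
            continue
        # Preset interval before and after the caption's time information.
        clip = [f for t, f in frames
                if entry.time - window <= t <= entry.time + window]
        results.append((entry.time, clip, entry.caption))
    return results


# Illustrative data only.
index = [
    IndexEntry("lion", 12.0, "本当におとなしいライオンです。"),
    IndexEntry("giraffe", 40.0, "(hypothetical caption)"),
]
frames = [(t, f"frame{int(t)}") for t in (5.0, 10.0, 15.0, 20.0, 40.0)]
hits = search(index, frames, "lion", window=5.0)
```

Here the single hit for "lion" at time 12.0 collects the frames stamped 10.0 and 15.0, which fall inside the 7.0 to 17.0 interval.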

The video search interface 14 receives a search term specifying a concrete object, either through an operator's operation of the video indexing device 101 or through external communication, and outputs the search term to the search means 12. It also receives the search result 13 from the search means 12 and displays it on a screen or transmits it externally.

FIG. 2 shows an example of a search result list displayed on the screen by the video search interface 14. As shown in FIG. 2, the video search interface 14 lists the time information, thumbnail images, and closed captions on the screen. In this case, the video search interface 14 displays each video as a thumbnail image. When the operator clicks the thumbnail image of a displayed search result 13, the video search interface 14 plays back that video.

Next, the operation of the feature extraction means 6 shown in FIG. 1 is described in detail. FIG. 3-1 is a flowchart showing a first example of the processing of the feature extraction means 6 shown in FIG. 1. The first example is described below with reference to FIG. 3-1. The feature extraction means 6 reads the time-stamped closed captions from the closed caption storage means 5 and takes in the closed caption text data (step S31). It then extracts sentences from the closed caption text data one at a time (step S32) and performs morphological analysis (step S33).

FIG. 4 illustrates the morphological analysis in step S33 shown in FIG. 3. Morphological analysis is a process that splits a sentence into words and also outputs the part of speech of each word. Referring to FIG. 4, when a sentence extracted from the closed caption text data is 「これはライオン。」 ("This is a lion."), morphological analysis outputs 「これ」, 「は」, 「ライオン」, and 「。」. Likewise, when the input closed caption text data is 「本当におとなしいライオンです。」 ("It is a really docile lion."), morphological analysis outputs 「本当に」, 「おとなしい」, 「ライオン」, 「です」, and 「。」.

Based on the result of the morphological analysis in step S33, the feature extraction means 6 determines whether the sentence extracted from the closed caption text data (1) ends with a noun, or (2) ends with "noun + assertive auxiliary verb" (step S34). Specifically, (1) corresponds to the case where the last part of speech in the sentence is a noun, and (2) corresponds to the case where the last two parts of speech in the sentence are a noun and an assertive auxiliary verb. In other words, sentences satisfying (1) or (2) are extracted. For (1), sentences matching a "noun + adverbial particle" pattern, such as noun + 「だけ」 ("only") or noun + 「ほど」 ("about"), may also be extracted.

For each sentence extracted in step S34, the feature extraction means 6 takes the final noun (the last noun in the sentence) as the concrete object name, generates index-format information (an index) consisting of the concrete object name, time information, and closed caption (step S35), and stores the index in the index storage means 8 (step S36).
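The step S34 test and the final-noun extraction can be sketched as below. The sketch assumes that morphological analysis has already been done and that each token arrives as a `(surface, part-of-speech)` pair; the tag names (`noun`, `assertive_aux`, `punct`, and so on), the function name, and the decision to ignore punctuation tokens such as 「。」 when looking at the sentence ending are illustrative assumptions, not details given in the patent. The optional "noun + adverbial particle" variant is not covered.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); tag names are assumed

NOUN = "noun"
ASSERTIVE_AUX = "assertive_aux"
PUNCT = "punct"


def final_noun_of_nominal_sentence(tokens: List[Token]) -> Optional[str]:
    """Return the sentence-final noun if the sentence passes the step S34 test."""
    # Assumption: punctuation such as "。" is ignored when checking the ending.
    body = [tok for tok in tokens if tok[1] != PUNCT]
    if not body:
        return None
    # (1) The last part of speech is a noun.
    if body[-1][1] == NOUN:
        return body[-1][0]
    # (2) The last two parts of speech are a noun and an assertive auxiliary verb.
    if len(body) >= 2 and body[-2][1] == NOUN and body[-1][1] == ASSERTIVE_AUX:
        return body[-2][0]
    return None


# Token lists for the two example sentences from the description.
s1 = [("これ", "pronoun"), ("は", "particle"), ("ライオン", NOUN), ("。", PUNCT)]
s2 = [("本当に", "adverb"), ("おとなしい", "adjective"), ("ライオン", NOUN),
      ("です", ASSERTIVE_AUX), ("。", PUNCT)]
# A made-up sentence ending in a verb, which should be rejected.
s3 = [("ライオン", NOUN), ("が", "particle"), ("走る", "verb"), ("。", PUNCT)]
```

In practice the token lists would come from a Japanese morphological analyzer; hand-written lists are used here so that the sketch has no external dependencies.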

FIG. 3-2 is a flowchart showing a second example of the processing of the feature extraction means 6 shown in FIG. 1. The second example is described below with reference to FIG. 3-2. The feature extraction means 6 reads the time-stamped closed captions from the program information storage means 1 and takes in the closed caption text data (step S31). It then extracts sentences from the closed caption text data one at a time (step S32) and performs morphological analysis (step S33). Steps S31 to S33 are the same as steps S31 to S33 shown in FIG. 3-1.

Based on the result of the morphological analysis in step S33, the feature extraction means 6 determines whether the sentence extracted from the closed caption text data contains a demonstrative pronoun (step S37), and extracts the sentences that do. For each sentence extracted in step S37, it takes the noun in the "ga" case as the concrete object name, generates index-format information (an index) consisting of the concrete object name, time information, and closed caption (step S38), and stores the index in the index storage means 8 (step S36). Step S36 is the same as step S36 shown in FIG. 3-1.
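The second rule can be sketched in the same style. The sketch assumes pre-tokenized input as `(surface, part-of-speech)` pairs with hypothetical tag names, and it approximates "the noun in the ga case" as the first noun immediately followed by the particle 「が」. That approximation, the tag names, and the function name are assumptions made for illustration; a real analyzer would also need to separate the case particle 「が」 from the conjunctive 「が」.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); tag names are assumed


def ga_case_noun(tokens: List[Token]) -> Optional[str]:
    """Return the ga-marked noun if the sentence contains a demonstrative pronoun."""
    # Step S37: the sentence must contain a demonstrative pronoun.
    if not any(pos == "demonstrative" for _, pos in tokens):
        return None
    # Step S38 (simplified): take the first noun directly followed by the particle "が".
    for (surface, pos), (next_surface, next_pos) in zip(tokens, tokens[1:]):
        if pos == "noun" and next_pos == "particle" and next_surface == "が":
            return surface
    return None


# Tokens for the example sentence 「これは、ライオンが餌を食べているところです。」
with_demonstrative = [
    ("これ", "demonstrative"), ("は", "particle"), ("、", "punct"),
    ("ライオン", "noun"), ("が", "particle"), ("餌", "noun"), ("を", "particle"),
    ("食べ", "verb"), ("て", "particle"), ("いる", "verb"),
    ("ところ", "noun"), ("です", "assertive_aux"), ("。", "punct"),
]
# A made-up sentence without a demonstrative pronoun, which should be skipped.
without_demonstrative = [
    ("ライオン", "noun"), ("が", "particle"), ("餌", "noun"), ("を", "particle"),
    ("食べる", "verb"), ("。", "punct"),
]
```

For the example sentence this returns 「ライオン」 rather than 「餌」 or 「ところ」, because only 「ライオン」 is followed by 「が」.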

FIG. 5 shows the index-format information generated by the feature extraction means 6 in step S35 of FIG. 3-1 and step S38 of FIG. 3-2. This index-format information consists of a concrete object name 51, time information 52, and a closed caption 53. The concrete object name 51 is the last noun, or the "ga"-case noun, in the sentence extracted in step S34 or S37, and it is this name that search terms are matched against at search time. The time information 52 is the time information that the closed caption/video dividing means 2 attached to the closed caption and the video.

As described above, in the video indexing device 101 of Embodiment 1, the closed caption/video dividing means 2 separates the closed captions and the video from the television program information, attaches the same time information to both, and stores the time-stamped closed captions 3 in the video indexing storage means 7. In addition, the feature extraction means 6 uses predetermined language processing techniques to extract, from the time-stamped closed captions, sentences that describe scenes in which a concrete object appears as the subject of the shot, extracts the concrete object name, and stores an index consisting of the concrete object name, time information, and closed caption in the video indexing storage means 7. This makes it possible to store indexed video covering a wide variety of concrete objects, such as animals as well as people. Because the concrete object name is extracted from closed captions, indexed video can be stored by simple means. And because indexed video can be searched using a concrete object name as the search term, secondary use of the video becomes easier.

Furthermore, when producers at a program production site put video to secondary use, they usually search for "video showing ○○". The video indexing device 101 of Embodiment 1 can support the kind of search normally performed at program production sites. Closed captions are also already in general use and are broadcast with many programs, so there is no need to enter new information to construct the television program information. Finally, as described above, concrete objects are identified using natural language processing techniques, so they can be identified with higher accuracy than with a simple keyword matching method.

Next, the video indexing device of Embodiment 2 is described. Embodiment 2 further improves the accuracy of identifying concrete objects compared with Embodiment 1. FIG. 6 is an overall configuration diagram showing the video indexing device of Embodiment 2 according to the present invention. This video indexing device 102 comprises program information storage means 1, closed caption/video dividing means 2, closed caption storage means 5, feature extraction means 60, concrete object dictionary storage means 61, video indexing storage means 7, search means 12, and a video search interface 14. Compared with the video indexing device 101 of Embodiment 1 shown in FIG. 1, the video indexing device 102 differs in that it has feature extraction means 60 in place of the feature extraction means 6 and additionally has the concrete object dictionary storage means 61. Parts of the video indexing device 102 in FIG. 6 that are common to the video indexing device 101 in FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

The feature extraction means 60 reads the time-stamped closed captions from the closed caption storage means 5, extracts from them the sentences that describe scenes in which a concrete object appears as the subject of the shot, extracts the concrete object name, and determines whether that name is stored in the concrete object dictionary storage means 61. If it is, the feature extraction means 60 generates an index (concrete object name, time information, and closed caption) and stores it in the video indexing storage means 7. The concrete object dictionary storage means 61 stores nouns that denote concrete objects (concrete object names).

Next, the operation of the feature extraction means 60 of the video indexing device 102 is described in detail. FIG. 7 is a flowchart showing an example of the processing of the feature extraction means 60 shown in FIG. 6. Steps S31 to S34, S35, and S36 in FIG. 7 are the same as steps S31 to S34, S35, and S36 shown in FIG. 3-1. The feature extraction means 60 determines whether the final noun in the sentence (the last noun in the sentence) is stored in the concrete object dictionary storage means 61 (step S71). If it is, the feature extraction means 60 generates index-format information consisting of the concrete object name, time information, and closed caption (step S35) and stores that information in the index storage means 8 (step S36). If it is not stored in the concrete object dictionary storage means 61, no index-format information is generated or stored in the index storage means 8, and processing moves to step S32.
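The dictionary check of step S71 can be sketched as a filter placed in front of index generation. The sketch assumes the candidate noun for each sentence has already been extracted, and it models the concrete object dictionary as a plain Python set; `IndexEntry`, `build_index`, and the sample data are hypothetical. The second sample row illustrates a case where the sentence-final noun 「ところ」 is not a concrete object, so a dictionary that lacks it would reject the sentence.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class IndexEntry(NamedTuple):
    object_name: str  # concrete object name
    time: float       # time information (unit assumed: seconds)
    caption: str      # closed caption text


def build_index(
    candidates: Iterable[Tuple[float, str, Optional[str]]],
    dictionary: Set[str],
) -> List[IndexEntry]:
    """Keep only sentences whose candidate noun is listed in the dictionary."""
    index: List[IndexEntry] = []
    for time, caption, noun in candidates:
        # Step S71: is the candidate noun stored in the concrete object dictionary?
        if noun is None or noun not in dictionary:
            continue  # no index is generated; go on to the next sentence (step S32)
        # Steps S35 and S36: generate the index entry and store it.
        index.append(IndexEntry(noun, time, caption))
    return index


# Illustrative data only; times are made up.
dictionary = {"ライオン"}
candidates = [
    (12.0, "本当におとなしいライオンです。", "ライオン"),
    (30.0, "これは、ライオンが餌を食べているところです。", "ところ"),
]
index = build_index(candidates, dictionary)
```

A set gives constant-time membership tests, which keeps the extra check cheap even for a large dictionary.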

The flowchart shown in FIG. 7 corresponds to FIG. 3-1, but the same applies to the example corresponding to FIG. 3-2. That is, the feature extraction means 60 determines whether the noun in the "ga" case is stored in the concrete object dictionary storage means 61 and, if it is, generates index-format information consisting of the concrete object name, time information, and closed caption and stores that information in the index storage means 8.

As described above, in the video indexing device 102 of Embodiment 2, when the feature extraction means 60 extracts a concrete object name from a sentence of the time-stamped closed captions, it determines whether the final noun or the "ga"-case noun in the sentence is stored in the concrete object dictionary storage means 61 and, only if it is, generates an index and stores it in the index storage means 8. In addition to the effects of Embodiment 1, this further improves the accuracy of identifying concrete objects.

Next, the video indexing device of Embodiment 3 is described. Compared with Embodiments 1 and 2, Embodiment 3 is designed to handle large amounts of program information efficiently. FIG. 8 is a configuration diagram showing the program information storage means 15 and the video indexing storage means 70 in the video indexing device of Embodiment 3 according to the present invention. The video indexing device of Embodiment 3 is configured in the same way as the video indexing device 101 shown in FIG. 1 and the video indexing device 102 shown in FIG. 6, but differs from Embodiments 1 and 2 in that it has the program information storage means 15 in place of the program information storage means 1 and the video indexing storage means 70 in place of the video indexing storage means 7.

The program information storage means 15 stores, for each program, television program information 81 that includes an identifier (ID number) uniquely indicating the program. The video indexing storage means 70 includes index storage means 80 and video storage means with time information 90. The index storage means 80 stores, for each program, an index consisting of the identifier uniquely indicating the program, the concrete object name, the time information, and the closed caption. The video storage means with time information 90 stores, for each program, the identifier uniquely indicating the program and the video with time information. This identifier is used to classify the stored information.

FIG. 9 is a diagram showing the index-format information generated by the feature extraction means 6 of the third embodiment. This index-format information consists, for each program, of the concrete object name, the time information, and the closed caption. As shown in FIG. 9, the programs are distinguished by ID: 1 to 4. The concrete object name, time information, and closed caption are the same as the concrete object name 51, time information 52, and closed caption 53 shown in FIG. 5.

FIG. 10 is a diagram showing an example of the search result list that the video search interface 14 of the third embodiment displays on the screen. As shown in FIG. 10, the video search interface 14 of the third embodiment lists on the screen the identifier uniquely indicating the program, the time information, a thumbnail image, and the closed caption.
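To make the per-program index and the search result list concrete, the following sketch models them as simple records. It is only an illustration under stated assumptions: the class names `IndexEntry`, `SearchHit`, and `IndexStore`, the string time format, and the thumbnail path scheme are hypothetical and do not come from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    program_id: int   # identifier that uniquely indicates the program
    object_name: str  # concrete object name
    time_info: str    # time information shared with the video (format assumed)
    caption: str      # closed caption sentence


@dataclass(frozen=True)
class SearchHit:
    program_id: int
    time_info: str
    thumbnail: str    # hypothetical reference to a frame of the stored video
    caption: str


class IndexStore:
    """Illustrative stand-in for an index store grouped by program."""

    def __init__(self) -> None:
        self._by_program: dict[int, list[IndexEntry]] = defaultdict(list)

    def add(self, entry: IndexEntry) -> None:
        self._by_program[entry.program_id].append(entry)

    def search(self, object_name: str) -> list[SearchHit]:
        """Return one row per matching entry, ordered by program identifier."""
        hits: list[SearchHit] = []
        for program_id in sorted(self._by_program):
            for entry in self._by_program[program_id]:
                if entry.object_name == object_name:
                    # The path scheme below is an assumption for the example.
                    thumb = f"thumb/{program_id}/{entry.time_info}.jpg"
                    hits.append(SearchHit(program_id, entry.time_info,
                                          thumb, entry.caption))
        return hits
```

Keying the store by program identifier keeps each program's entries together, which mirrors the role the identifier plays in classifying stored information, and each `SearchHit` carries the four columns of the listed search result.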

As described above, in the video indexing device of the third embodiment, the program information storage means 15 stores program information for each program, the video indexing storage means 70 likewise stores the index and the video with time information for each program, and the video search interface 14 outputs search results that include the identifier uniquely indicating the program. Because the various kinds of information are handled program by program, a large amount of program information can be handled efficiently, in addition to the effects of the first or second embodiment. Furthermore, because the video indexing storage means 70 stores the index and video for each program, it can be used as a multimedia encyclopedia that draws on valuable program footage.

Next, a video indexing device according to the fourth embodiment will be described. Unlike the first to third embodiments, the fourth embodiment identifies a concrete object in the video using audio instead of the closed caption. FIG. 11 is an overall configuration diagram showing the video indexing device of the fourth embodiment of the present invention. This video indexing device 104 includes program information storage means 1, audio extraction means 111, speech recognition means 113, speech recognition result storage means with time information 114, feature extraction means 6, video indexing storage means 7, search means 12, and a video search interface 14. Compared with the video indexing device 101 of the first embodiment shown in FIG. 1, the video indexing device 104 differs in that it includes the audio extraction means 111, the speech recognition means 113, and the speech recognition result storage means with time information 114 instead of the closed caption/video dividing means 2 and the closed caption storage means 5. In the video indexing device 104 of FIG. 11, parts in common with the video indexing device 101 of FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

The audio extraction means 111 reads television program information from the program information storage means 1, extracts the audio from the television program information, and separates the audio from the video. It then adds the same time information to both the audio and the video, and outputs the audio with time information 112 to the speech recognition means 113. It also outputs the video with time information 4 and stores it in the video indexing storage means 7.

The speech recognition means 113 receives the audio with time information 112 from the audio extraction means 111, recognizes the speech, outputs a speech recognition result with time information, and stores it in the speech recognition result storage means with time information 114. As the speech recognition means 113, for example, a device called Via Voice from IBM (registered trademark) or the present applicant's speech recognition device (see Japanese Patent Application No. H11-60640) may be used. The speech recognition result storage means with time information 114 stores the speech recognition results with time information.

The feature extraction means 6 reads the speech recognition result with time information from the speech recognition result storage means with time information 114. From that result, it extracts sentences that describe a scene in which a concrete object appears as the subject of the video, extracts the concrete object name, generates and outputs an index for retrieval (concrete object name, time information, and speech recognition result), and stores the index in the video indexing storage means 7. The operation of the feature extraction means 6 is the same as the operation shown in FIGS. 3-1 and 3-2.
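The data flow of this embodiment (audio with time information, then recognition results with time information, then index entries) can be sketched as below. This is a rough illustration only: the recognizer and the name extractor are passed in as callables, and `fake_recognizer` and `toy_extract` are hypothetical stand-ins, not a real speech recognizer or the sentence-feature rules of FIGS. 3-1 and 3-2.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TimedAudio:
    time_info: str   # time information shared with the video (format assumed)
    samples: bytes   # audio payload for one segment


@dataclass(frozen=True)
class TimedText:
    time_info: str
    text: str        # speech recognition result


@dataclass(frozen=True)
class IndexEntry:
    object_name: str
    time_info: str
    recognition_result: str


def recognize(segments: Iterable[TimedAudio],
              recognizer: Callable[[bytes], str]) -> list[TimedText]:
    """Run the recognizer on each segment, carrying the time information over."""
    return [TimedText(s.time_info, recognizer(s.samples)) for s in segments]


def build_index(results: Iterable[TimedText],
                extract_name: Callable[[str], Optional[str]]) -> list[IndexEntry]:
    """Keep only the sentences from which a concrete object name is extracted."""
    entries: list[IndexEntry] = []
    for r in results:
        name = extract_name(r.text)
        if name is not None:
            entries.append(IndexEntry(name, r.time_info, r.text))
    return entries


def fake_recognizer(samples: bytes) -> str:
    """Stand-in recognizer: treats the payload as UTF-8 text."""
    return samples.decode("utf-8")


def toy_extract(text: str) -> Optional[str]:
    """Stand-in extractor: takes the last word of 'This is ...' sentences."""
    words = text.split()
    if len(words) >= 3 and words[:2] == ["This", "is"]:
        return words[-1]
    return None
```

Because the time information travels with each segment through both stages, every index entry can be matched back to the video with time information that was stored separately.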

Instead of the feature extraction means 6 shown in FIG. 11, the device may include the feature extraction means 60 and the concrete object dictionary storage means 61 of the second embodiment shown in FIG. 6. Likewise, instead of the program information storage means 1 and the video indexing storage means 7, the device may include the program information storage means 15 and the video indexing storage means 70 of the third embodiment shown in FIG. 8.

As described above, in the video indexing device 104 of the fourth embodiment, the feature extraction means 6 analyzes the speech recognition result with time information obtained through the audio extraction means 111, the speech recognition means 113, and the speech recognition result storage means with time information 114, generates an index (concrete object name, time information, and speech recognition result), and stores the index in the video indexing storage means 7. In addition to the effects of the first to third embodiments, this makes it possible to identify a concrete object in the video using audio.

The present invention has been described above with reference to embodiments, but the present invention is not limited to these embodiments and can be modified in various ways without departing from its technical idea. For example, in the first to fourth embodiments, various kinds of information are stored in the program information storage means 1 and 15, the closed caption storage means 5, the video indexing storage means 7 and 70, the index storage means 8, the video storage means with time information 9, the concrete object dictionary storage means 61, and the speech recognition result storage means with time information 114. These storage means may be implemented as a storage device such as a hard disk, a database system, or a server. Similarly, the closed caption/video dividing means 2, the feature extraction means 6 and 60, the search means 12, the audio extraction means 111, and the speech recognition means 113 may each be implemented as a device, card, or module having the corresponding function.

FIG. 1 is an overall configuration diagram showing the video indexing device of the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of a search result list.
FIG. 3-1 is flowchart (1) of the feature extraction means.
FIG. 3-2 is flowchart (2) of the feature extraction means.
FIG. 4 is a diagram explaining morphological analysis.
FIG. 5 is a diagram showing the index format.
FIG. 6 is an overall configuration diagram showing the video indexing device of the second embodiment of the present invention.
FIG. 7 is a flowchart of the feature extraction means.
FIG. 8 is a configuration diagram showing the video indexing storage device in the video indexing device of the third embodiment of the present invention.
FIG. 9 is a diagram showing the index format.
FIG. 10 is a diagram showing an example of a search result list.
FIG. 11 is an overall configuration diagram showing the video indexing device of the fourth embodiment of the present invention.

Explanation of symbols

1, 15 Program information storage means
2 Closed caption/video dividing means
3 Closed caption with time information
4, 11, 63 Video with time information
5 Closed caption storage means
6, 60 Feature extraction means
7, 70 Video indexing storage means
8, 80 Index storage means
9, 90 Video storage means with time information
10 Index with time information
12 Search means
13 Search result
14 Video search interface
51 Concrete object name
52 Time information
53 Closed caption
61 Concrete object dictionary storage means
62 Index
81 Television program information
101, 102, 104 Video indexing device
111 Audio extraction means
112 Audio with time information
113 Speech recognition means
114 Speech recognition result storage means with time information

Claims

A video indexing device that identifies a concrete object in a video from original video information and stores the video of the concrete object,
the video indexing device comprising feature extraction means that extracts, from a closed caption included in the original video information, a sentence that ends with a noun or whose last two parts of speech are a noun followed by an assertive auxiliary verb, and that extracts the name of the concrete object from the last noun appearing in the extracted sentence.

A video indexing device that identifies a concrete object in a video from original video information and stores the video of the concrete object,
the video indexing device comprising feature extraction means that extracts a sentence including a pronoun from a closed caption included in the original video information, and that extracts the name of the concrete object from a noun included in the extracted sentence.

The video indexing device according to claim 1 or 2,
further comprising dividing means that divides the original video information into the closed caption and the video and adds the same time information to each, and
video indexing storage means that stores an index, which includes the name of the concrete object extracted by the feature extraction means and the closed caption with time information produced by the dividing means, together with the video with time information produced by the dividing means.

The video indexing device according to claim 3,
further comprising search means that searches the video indexing storage means, by the name of the concrete object, for the index and the video with time information.

The video indexing device according to any one of claims 1 to 4,
further comprising concrete object dictionary storage means that stores names of concrete objects,
wherein the feature extraction means searches the concrete object dictionary storage means and extracts the name of the concrete object when the final noun of claim 1 or the noun of claim 2 is stored therein.

The video indexing device according to any one of claims 3 to 5,
wherein the original video information, and the index and the video with time information stored in the video indexing storage means of claim 3, are distinguished for each program.

The video indexing device according to any one of claims 1 to 6,
wherein audio is used instead of the closed caption.