JP2008301426A

JP2008301426A - Featured value generating device, summary video detecting device, and program

Info

Publication number: JP2008301426A
Application number: JP2007148389A
Authority: JP
Inventors: Yoshihiko Kawai; 吉彦河合; Nobuyuki Yagi; 伸行八木
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2007-06-04
Filing date: 2007-06-04
Publication date: 2008-12-11
Anticipated expiration: 2027-06-04
Also published as: JP4731522B2

Abstract

PROBLEM TO BE SOLVED: To achieve summary video detection that has high detection accuracy and is applicable to a wide variety of videos.

SOLUTION: A feature vector generation device 3 comprises a program video feature vector generation unit 30 and a stored video feature vector generation unit 32, each of which extracts one or more index words from at least one of character data and an audio signal corresponding to a video, and generates a feature value for the video based on the number of appearances of each index word in the video.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a feature value generation device, a summary video detection device, and a program.

In recent years, services that store broadcast television video as program video, program by program, so that anyone can view it at any time have been gaining popularity. Because the amount of stored program video in such a service becomes enormous, it is desirable to prepare a summary video for each program video so that users can efficiently select the program video they want to watch.

One option is to create a new summary video when the program video is stored. For many programs, however, spot videos and announcement videos are in fact created before broadcast, and using these makes it possible to obtain summary videos efficiently.

However, such a summary video is rarely available for each program at the storage stage, and it usually has to be found within the video that was broadcast on television. There is therefore a demand for a technique that detects, from television broadcast video (hereinafter, stored video), a portion suitable as a summary video (hereinafter, summary portion).

In this regard, Non-Patent Documents 1 and 2 disclose techniques that can be used for this purpose.
According to the technique disclosed in Non-Patent Document 1, a feature vector based on a color histogram is obtained for each section of the stored video, a similar feature vector is obtained for the program video, and the summary portion is detected from the similarity of the feature vectors.

According to the technique disclosed in Non-Patent Document 2, the emission pattern of camera flashes is obtained for each section of the stored video, a similar emission pattern is obtained for the program video, and the summary portion is detected from the similarity of the emission patterns.
Non-Patent Document 1: Kunio Kashino et al., "Quick Search Method for Audio and Video Signals Based on Histogram Pruning", IEEE Transactions on Multimedia, Vol. 5, No. 3, September 2003, pp. 348-357
Non-Patent Document 2: Masao Takimoto et al., "Discovery of the same flash scene video from large-capacity broadcast video archives", IEICE Transactions (D), Vol. J89-D, No. 12, December 2006, pp. 2699-2709

However, the technique disclosed in Non-Patent Document 1 finds the summary portion based on color similarity alone, so its detection accuracy is limited. The technique disclosed in Non-Patent Document 2 can be applied only to video in which camera flashes are fired (such as on-site footage inserted into news programs).

Accordingly, one object of the present invention is to provide a feature value generation device, a summary video detection device, and a program that realize summary video detection with high detection accuracy that is applicable to a wide range of videos.

To solve the above problem, a feature value generation device according to the present invention includes: extraction means for extracting one or more index words from at least one of character data and an audio signal corresponding to a video; and feature value generation means for generating a feature value for the video based on the number of appearances of each index word in the video.
Because this feature value reflects the semantic content of the video, summary video detection performed with it is applicable to a wide range of videos and has high detection accuracy. The character data may include closed captions contained in the video and the character data of the portion of an electronic program guide that relates to the video. In this way, character data can be obtained from closed captions and the electronic program guide.

In each of the above feature value generation devices, the feature value generation means may generate the feature value for the video further based on the rarity of each index word, the rarity being determined from the number of appearances of the index word in stored video.
This further improves the accuracy of summary video detection based on the feature value.

A feature value generation device according to one aspect of the present invention includes: extraction means for extracting, for each section of a video, one or more index words from at least one of character data and an audio signal corresponding to the section video; section video feature value generation means for generating a feature value for the section video based on the number of appearances of each index word in the section video; and continuous section video feature value generation means for generating, based on the feature values of the section videos generated by the section video feature value generation means, a feature value for a continuous section video made up of a plurality of adjacent section videos.
This makes it possible to create a feature value for each section of the video and to reduce the processing load of feature value calculation.

A summary video detection device according to the present invention includes: stored video feature value acquisition means for acquiring, for each section of a stored video, a feature value generated by the feature value generation device according to claim 1; program video feature value acquisition means for acquiring, for a program video, a feature value generated by the feature value generation device according to claim 1; similarity calculation means for calculating, for each section of the stored video, the similarity between the feature value for that section and the feature value for the program video; and summary video detection means for detecting a summary video of the program video from among the sections of the stored video based on the calculation results of the similarity calculation means.
This realizes summary video detection with high detection accuracy that is applicable to a wide range of videos.

A summary video detection device according to one aspect of the present invention includes: stored video feature value acquisition means for acquiring, for each section of a stored video and for each continuous section made up of a plurality of adjacent sections, a feature value generated by the feature value generation device according to claim 3; program video feature value acquisition means for acquiring, for a program video, a feature value generated by the feature value generation device according to claim 1; similarity calculation means for calculating, for each section and each continuous section of the stored video, the similarity between the feature value for that section or continuous section and the feature value for the program video; and summary video detection means for detecting a summary video of the program video from among the sections and continuous sections of the stored video based on the calculation results of the similarity calculation means.
This also realizes summary video detection with high detection accuracy that is applicable to a wide range of videos. In addition, video sections of various lengths can be handled as summary video candidates.

A program according to the present invention causes a computer to function as: extraction means for extracting one or more index words from at least one of character data and an audio signal corresponding to a video; and feature value generation means for generating a feature value for the video based on the number of appearances of each index word in the video.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing the system configuration of a summary video detection system 1 according to the present embodiment. As shown in the figure, the summary video detection system 1 includes a video database 2, a feature vector generation device 3, a feature vector database 4, and a summary video detection device 5.

FIG. 1 also shows the functional blocks of each device. The video database 2 functionally includes a stored video storage unit 20, which in turn includes a program video storage unit 21. The feature vector generation device 3 functionally includes a program video feature vector generation unit 30, a rarity information calculation unit 31, and a stored video feature vector generation unit 32. The feature vector database 4 functionally includes a program video feature vector storage unit 40 and a stored video feature vector storage unit 41. The summary video detection device 5 functionally includes a program designation receiving unit 50, a program video feature vector acquisition unit 51, a similarity calculation unit 52, a stored video feature vector acquisition unit 53, a similarity storage unit 54, and a summary video detection unit 55. Each of these units is described in detail below.

First, the video database 2 is described. The stored video storage unit 20 stores, in a predetermined video format (for example, MPEG), video that was broadcast on television in the past (hereinafter, stored video). Stored video includes program video, that is, video edited and broadcast as a program, as well as spot videos, announcement videos, and the like. The stored video storage unit 20 also stores character data corresponding to the video (such as closed captions (CC) and the electronic program guide (EPG) for each program) and audio signals, in synchronization with the video.

Next, the feature vector generation device 3 and the feature vector database 4 are described, starting with the generation of feature vectors for program videos. The program video feature vector generation unit 30 generates a feature vector for each program video stored in the stored video storage unit 20, as described below.

FIG. 2 is a diagram showing the internal configuration of the program video feature vector generation unit 30. As shown in the figure, the program video feature vector generation unit 30 includes an extraction unit 300 and a feature vector generation unit 301.

The extraction unit 300 extracts one or more index words from at least one of the character data and the audio signal attached to the program video (extraction means).
The processing of the extraction unit 300 is described with reference to a drawing of its internal configuration. FIG. 3 is a diagram showing the internal configuration of the extraction unit 300. As shown in the figure, the extraction unit 300 includes a stream separation unit 3000, an index word extraction unit 3002, a speech recognition unit 3003, an index word extraction unit 3004, and an integration unit 3005.

The stream separation unit 3000 separates the attached character data (closed captions and electronic program guide information) and the audio signal from the video stored in the stored video storage unit 20. It outputs the character data to the index word extraction unit 3002 and the audio signal to the speech recognition unit 3003.

The speech recognition unit 3003 applies predetermined speech recognition processing to the audio signal input from the stream separation unit 3000, thereby converting the audio signal into character data. It outputs the resulting character data to the index word extraction unit 3004.

The index word extraction units 3002 and 3004 analyze the character data input from the stream separation unit 3000 and the speech recognition unit 3003, respectively, and extract index words from it. Morphological analysis is preferably used for this analysis, in which case an index word is a combination of a morpheme and its part of speech. That is, identical morphemes with different parts of speech are different index words. The index word extraction units 3002 and 3004 may extract every index word in the character data, or may extract only nouns, for example.
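The morpheme-plus-part-of-speech rule can be illustrated with a minimal Python sketch. It assumes a morphological analyzer has already produced `(morpheme, part_of_speech)` pairs; the function name `extract_index_words`, the `nouns_only` flag, and the tag string `"noun"` are hypothetical and are not taken from the specification.

```python
from typing import Iterable, List, Tuple

# An index word is the pair (morpheme, part of speech), so the same
# morpheme with two different parts of speech yields two index words.
IndexWord = Tuple[str, str]


def extract_index_words(
    analyzed: Iterable[IndexWord], nouns_only: bool = False
) -> List[IndexWord]:
    """Return index words from already-analyzed (morpheme, pos) pairs.

    The tag value "noun" is an assumed label; a real analyzer uses its
    own tag set.
    """
    if nouns_only:
        return [(m, pos) for m, pos in analyzed if pos == "noun"]
    return list(analyzed)


# Illustrative input only: "play" occurs as a noun and as a verb.
analyzed = [("play", "noun"), ("play", "verb"), ("quickly", "adverb")]
all_words = extract_index_words(analyzed)
noun_words = extract_index_words(analyzed, nouns_only=True)
```

Representing an index word as a tuple keeps the two readings of "play" distinct when they are later counted.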

The integration unit 3005 integrates the extraction results of the index word extraction units 3002 and 3004 and outputs a single extraction result. Specifically, when one of the two units obtains no extraction result (for example, when the program video has no corresponding character data or no audio signal), the integration unit outputs the result obtained by the other. When both units obtain extraction results, it outputs the result of the index word extraction unit 3002, which is more reliable because it does not pass through speech recognition. In closed captions, however, information displayed on screen as superimposed text is sometimes omitted. When such an omission occurs, the closed caption contains a symbol marking the omitted location. The integration unit 3005 detects the omitted portion by detecting this symbol and replaces it with the extraction result of the index word extraction unit 3004.
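The integration rule can be sketched in Python as follows. The specification does not say how the omitted caption portion is aligned with the speech recognition result, so this sketch assumes that both results are split into the same time-aligned segments and that an omitted caption segment is represented by `None`. The function name `integrate` and this data layout are hypothetical.

```python
from typing import List, Optional, Sequence

OMITTED = None  # assumed stand-in for the caption's omission symbol


def integrate(
    cc_segments: Optional[Sequence[Optional[List[str]]]],
    asr_segments: Optional[Sequence[List[str]]],
) -> List[str]:
    """Merge caption-based and speech-based index words into one result."""
    if not cc_segments:
        # No caption result: fall back to the speech recognition result.
        return [w for seg in (asr_segments or []) for w in seg]
    if not asr_segments:
        # No speech result: use captions, skipping omitted segments.
        return [w for seg in cc_segments if seg is not OMITTED for w in seg]
    merged: List[str] = []
    for i, seg in enumerate(cc_segments):
        if seg is OMITTED:
            # Fill an omitted caption segment from the aligned speech segment.
            if i < len(asr_segments):
                merged.extend(asr_segments[i])
        else:
            # Captions are preferred because they bypass speech recognition.
            merged.extend(seg)
    return merged


# Illustrative data: the second caption segment is marked as omitted.
cc = [["news", "tokyo"], OMITTED, ["weather"]]
asr = [["news", "tokio"], ["election", "result"], ["whether"]]
merged = integrate(cc, asr)
```

In this example the misrecognized words "tokio" and "whether" are discarded in favor of the caption text, and only the omitted segment is taken from speech recognition.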

Returning to FIG. 2, the feature vector generation unit 301 uses the extraction result output by the integration unit 3005 to obtain the number of appearances of each index word in the program video, and generates a feature vector for the program video based on these counts (feature value generation means). In doing so, the feature vector generation unit 301 also takes into account the rarity of each index word, which is determined from the number of appearances in the stored video.

Specifically, the feature vector generation unit 301 uses equation (1) to generate the feature value that forms the element of the feature vector for each index word. Here, tf(t_k, Pi) is the number of appearances of index word t_k in program Pi, S(t_k) is rarity information representing the rarity of index word t_k (described later), and v_k^{Pi} is the feature value of program Pi for index word t_k.

The feature vector generation unit 301 evaluates equation (1) for every index word included in the extraction result output by the integration unit 3005. Using the results, it generates, by equation (2), the feature vector V_{Pi}, whose elements are the feature values for the individual index words t_k. Here, D is the number of index words stored in the program video feature vector storage unit 40, described later. For an index word that is not included in the extraction result output by the integration unit 3005, the feature vector generation unit 301 sets v_k^{Pi} to zero when generating the feature vector V_{Pi}.
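Equations (1) and (2) are not reproduced in this text, so the following Python sketch rests on an assumption: the feature value is taken to be the product v_k^{Pi} = tf(t_k, Pi) * S(t_k), in the style of TF-IDF weighting, and the feature vector is the D-element list of these values. The product form, the function name `feature_vector`, and the sample rarity values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def feature_vector(
    extracted: Sequence[str],
    vocabulary: Sequence[str],
    rarity: Dict[str, float],
) -> List[float]:
    """Build V_Pi = (v_1, ..., v_D) over a vocabulary of D index words.

    ASSUMPTION: v_k = tf(t_k, Pi) * S(t_k). The product form is a guess
    in the style of TF-IDF weighting, not a quotation of equation (1).
    """
    tf = Counter(extracted)  # number of appearances of each index word
    # Words absent from the extraction result have tf == 0, so their
    # element is zero, as the description requires.
    return [tf[t] * rarity.get(t, 0.0) for t in vocabulary]


# Illustrative values only.
vocabulary = ["election", "weather", "baseball"]
rarity = {"election": 2.0, "weather": 0.5, "baseball": 1.5}
v = feature_vector(["election", "weather", "election"], vocabulary, rarity)
```

Here "election" appears twice and "baseball" not at all, so the vector is `[4.0, 0.5, 0.0]` under the assumed product form.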

The feature vector generation unit 301 stores the feature vector V_{Pi} in the program video feature vector storage unit 40.
FIG. 4 is a diagram showing a specific example of the contents of the program video feature vector storage unit 40. As shown in the figure, the program video feature vector storage unit 40 stores, for each index word, the number of appearances and the feature value for each television program. The index words stored in the program video feature vector storage unit 40 are all the index words extracted so far from the stored video (including program video).

When storing the feature vector V_{Pi} in the program video feature vector storage unit 40, the feature vector generation unit 301 handles each index word as follows. For an index word that is already stored, it writes the obtained number of appearances tf(t_k, Pi) and the calculated feature value v_k^{Pi} into that index word's row. For an index word that is not yet stored, it adds a row for the index word and writes tf(t_k, Pi) and v_k^{Pi} into the new row. For all other rows, it stores zero for both the number of appearances and the feature value.
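The row-update rule can be sketched with an in-memory table in Python. The nested-dictionary layout, the function name `store_program`, and the zero-filling of a newly added row for earlier programs are assumptions made for illustration; the specification only describes the table of FIG. 4 in general terms.

```python
from typing import Dict, List, Tuple

# table[index_word][program_id] = (appearance count, feature value)
Table = Dict[str, Dict[str, Tuple[int, float]]]


def store_program(
    table: Table,
    program_ids: List[str],
    program_id: str,
    entries: Dict[str, Tuple[int, float]],
) -> None:
    """Add one program's counts and feature values as a new column."""
    program_ids.append(program_id)
    for word in entries:
        if word not in table:
            # New index word: add a row, zero-filled for every program
            # (assumed behavior for programs stored earlier).
            table[word] = {p: (0, 0.0) for p in program_ids}
    for word, row in table.items():
        # Rows for words that do not occur in this program get zeros.
        row[program_id] = entries.get(word, (0, 0.0))


table: Table = {}
ids: List[str] = []
store_program(table, ids, "P1", {"election": (2, 4.0)})
store_program(table, ids, "P2", {"weather": (1, 0.5)})
```

After the two calls, every row has an entry for both programs, so each column can be read back as a complete feature vector.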

The rarity information S(t_k) is now described. The rarity information calculation unit 31 calculates rarity information S(t_k) for each index word t_k based on the contents of the program video feature vector storage unit 40, using equation (3) or equation (4). Here, pf(t_k) is the number of program videos in which index word t_k appears at least once; the rarity information calculation unit 31 calculates pf(t_k) from the contents of the program video feature vector storage unit 40. N is the total number of past program videos.

Equation (3) is the IDF (inverse document frequency) value, and equation (4) is a value based on entropy. With either equation, the rarity information S(t_k) of an index word that appears only in particular program videos is higher than that of an index word that appears in a wide variety of program videos.
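Equation (3) is not reproduced in this text. The Python sketch below uses the textbook IDF form, S(t_k) = log(N / pf(t_k)), as an assumption; the actual equation may use a different base, smoothing, or offset. The entropy-based equation (4) is not sketched because its form cannot be inferred from the description. The function name `idf_rarity` and the sample programs are hypothetical.

```python
import math
from typing import Dict, Iterable, Set


def idf_rarity(programs: Iterable[Set[str]]) -> Dict[str, float]:
    """Compute S(t_k) = log(N / pf(t_k)) for every index word.

    ASSUMPTION: this is the textbook IDF form, not necessarily the
    exact form of equation (3).
    """
    programs = list(programs)
    n = len(programs)  # N: total number of past program videos
    pf: Dict[str, int] = {}
    for words in programs:
        for t in words:  # each program counts once per index word
            pf[t] = pf.get(t, 0) + 1
    return {t: math.log(n / count) for t, count in pf.items()}


# Illustrative data: each set holds the index words of one program.
programs = [
    {"news", "election"},
    {"news", "weather"},
    {"news", "baseball"},
    {"news", "election"},
]
s = idf_rarity(programs)
```

With this data, "news" appears in every program and receives rarity 0, while "weather", which appears in a single program, receives the highest value.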

Next, the generation of feature vectors for stored video (including program video) is described. The stored video feature vector generation unit 32 generates feature vectors for the stored video held in the stored video storage unit 20, as described below.

FIG. 5 is a diagram showing the internal configuration of the stored video feature vector generation unit 32. As shown in the figure, the stored video feature vector generation unit 32 includes an extraction unit 302 and a feature vector generation unit 303.

The processing of the extraction unit 302 is largely the same as that of the extraction unit 300 in the program video feature vector generation unit 30. It differs in that, for each section of the stored video, it extracts one or more index words from at least one of the character data and the audio signal attached to the section video. Each section preferably has a predetermined time length, and in particular a length equal to the greatest common divisor of the time lengths of commercial videos and spot videos.
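As a small worked example of the greatest-common-divisor rule, the Python sketch below computes a section length from a list of durations. The durations of 15, 30, and 60 seconds are illustrative values, not figures from the specification, and the function name `section_length` is hypothetical.

```python
import math
from functools import reduce
from typing import Sequence


def section_length(durations_sec: Sequence[int]) -> int:
    """Greatest common divisor of the given durations, in seconds."""
    return reduce(math.gcd, durations_sec)


# Illustrative commercial and spot video lengths, in seconds.
length = section_length([15, 30, 60])
```

Choosing the greatest common divisor means every commercial or spot video of these lengths spans a whole number of sections.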

The feature vector generation unit 303 includes a section video feature vector generation unit 3030 and a continuous section video feature vector generation unit 3031.
The section video feature vector generation unit 3030 uses the extraction result output by the extraction unit 302 to obtain the number of appearances of each index word in the section video, and generates a feature vector for the section video based on these counts (section video feature value generation means). The specific method of generating the feature vector is largely the same as that of the feature vector generation unit 301.

The difference is that the feature vector generation unit 301 generates a feature vector for each program video, whereas the section video feature vector generation unit 3030 generates one for each section video. Different symbols are therefore used for these feature vectors, and they are defined here. A section video that starts at time Tx and ends at time Ty is written Tx~Ty. Equation (1) is then rewritten as equation (5), where v_k^{Tx~Ty} is the feature value of section video Tx~Ty for index word t_k.

The feature vector for section video Tx~Ty is written V_{Tx~Ty}, as given by equation (6).

The continuous section video feature vector generation unit 3031 generates, based on the feature vectors of the section videos generated by the section video feature vector generation unit 3030, a feature vector for a continuous section video made up of a plurality of adjacent section videos (continuous section video feature vector generation means). Specifically, it adds the feature vectors of the adjacent section videos that make up the continuous section video, element by element.

FIG. 6 is a diagram showing an example of feature vectors for continuous section videos. In this example, the feature vectors V_{T1~T2}, V_{T2~T3}, V_{T3~T4}, and V_{T4~T5} are first generated for the individual section videos. Next, feature vectors spanning two adjacent sections are generated: V_{T1~T3} = V_{T1~T2} + V_{T2~T3} and V_{T3~T5} = V_{T3~T4} + V_{T4~T5}. A feature vector spanning four adjacent sections, V_{T1~T5} = V_{T1~T3} + V_{T3~T5}, is then generated, and the process is repeated in the same way. As a result, the feature vectors in this example are generated hierarchically, with no overlap within a layer.
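The non-overlapping hierarchy of FIG. 6 can be sketched in Python as a bottom-up pairing of adjacent blocks. The function names `add` and `build_hierarchy`, the 0-based boundary indices, and the sample vectors are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two feature vectors."""
    return [x + y for x, y in zip(a, b)]


def build_hierarchy(base: List[Vector]) -> Dict[Tuple[int, int], Vector]:
    """Map (start, end) boundary indices to summed feature vectors.

    base[i] is the vector of the section between boundaries i and i+1
    (0-based here, whereas the figure labels boundaries T1, T2, ...).
    """
    result = {(i, i + 1): v for i, v in enumerate(base)}
    span = 1
    while 2 * span <= len(base):
        # Pair adjacent, non-overlapping blocks of the current span.
        for start in range(0, len(base) - 2 * span + 1, 2 * span):
            mid, end = start + span, start + 2 * span
            result[(start, end)] = add(result[(start, mid)], result[(mid, end)])
        span *= 2
    return result


# Four illustrative section vectors, corresponding to T1~T2 ... T4~T5.
base = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
h = build_hierarchy(base)
```

Each higher layer reuses the sums of the layer below, so every vector costs a single element-wise addition rather than a fresh pass over the index words.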

FIG. 7 is a diagram showing another example of feature vectors for continuous section videos. It corresponds to the example of FIG. 6 with overlap permitted within a layer. That is, when feature vectors spanning two adjacent sections are generated, V_{T2~T4} = V_{T2~T3} + V_{T3~T4} is generated in addition to V_{T1~T3} = V_{T1~T2} + V_{T2~T3} and V_{T3~T5} = V_{T3~T4} + V_{T4~T5}. V_{T2~T4} overlaps both V_{T1~T3} and V_{T3~T5}. Feature vectors in the higher layers are likewise generated with overlap permitted.
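The overlapping variant of FIG. 7 can be sketched in Python by sliding the window one section at a time within each layer. The function names `add` and `build_overlapping_hierarchy`, the 0-based boundary indices, and the sample vectors are hypothetical, and the doubling of the span per layer is carried over from FIG. 6 as an assumption.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two feature vectors."""
    return [x + y for x, y in zip(a, b)]


def build_overlapping_hierarchy(
    base: List[Vector],
) -> Dict[Tuple[int, int], Vector]:
    """Like a pairwise hierarchy, but windows in a layer may overlap."""
    result = {(i, i + 1): v for i, v in enumerate(base)}
    span = 1
    while 2 * span <= len(base):
        # Stride 1: every start position yields a window of 2 * span.
        for start in range(0, len(base) - 2 * span + 1):
            mid, end = start + span, start + 2 * span
            result[(start, end)] = add(result[(start, mid)], result[(mid, end)])
        span *= 2
    return result


# Four illustrative section vectors, corresponding to T1~T2 ... T4~T5.
base = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
h = build_overlapping_hierarchy(base)
```

Compared with the non-overlapping case, the two-section layer now also contains the window that corresponds to V_{T2~T4}, at the cost of more vectors to store and compare.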

When feature vectors are calculated hierarchically in this way, the depth of the hierarchy is chosen so that the section length of the top layer, which follows from the section length of the bottom layer, equals the time length of the video to be detected as a summary video. Taking the example of FIG. 6, if the section length of the bottom layer is about 10 seconds and the video to be detected as a summary video is a few minutes long, the hierarchy has 5 to 6 layers.
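The figure of 5 to 6 layers can be checked with a short Python sketch, assuming that each layer doubles the section length as in FIG. 6. The function name `hierarchy_depth` and the target lengths of 160 and 320 seconds are illustrative assumptions standing in for "a few minutes".

```python
def hierarchy_depth(base_len_sec: int, target_len_sec: int) -> int:
    """Number of layers needed when each layer doubles the section length."""
    layers, length = 1, base_len_sec
    while length < target_len_sec:
        length *= 2
        layers += 1
    return layers


# 10 s sections: 10, 20, 40, 80, 160 s gives 5 layers; 320 s gives 6.
depth_short = hierarchy_depth(10, 160)
depth_long = hierarchy_depth(10, 320)
```

A 10-second bottom layer therefore reaches roughly 2.7 minutes with 5 layers and roughly 5.3 minutes with 6 layers, which matches the range given in the description.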

The section video feature vector generation unit 3030 and the continuous section video feature vector generation unit 3031 store the feature vectors generated as described above in the stored video feature vector storage unit 41.

FIG. 8 is a diagram showing a specific example of the contents of the stored video feature vector storage unit 41. As shown in the figure, the stored video feature vector storage unit 41 stores, for each index word, the above feature quantity for each section. The index words held in the stored video feature vector storage unit 41 are all of the index words extracted so far from the stored video (including program videos).

When the section video feature vector generation unit 3030 and the continuous section video feature vector generation unit 3031 store a generated feature vector in the stored video feature vector storage unit 41, they handle each index word as follows. For an index word that is already stored, they store the calculated feature quantity v_k^{Tx~Ty} in the row for that index word. For an index word that is not yet stored, they add a row for that index word and store the feature quantity v_k^{Tx~Ty} in the added row. Zero is stored as the feature quantity in the other rows.
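A minimal sketch of this storage rule is given below, assuming a table with one row per index word and one column per section. The class name `FeatureTable`, its attributes, and the sample words are hypothetical and are not taken from the storage unit 41 itself.

```python
class FeatureTable:
    """Rows are index words; columns are sections, in the order stored."""

    def __init__(self):
        self.sections = []  # column labels, e.g. ("T1", "T2")
        self.rows = {}      # index word -> list of feature quantities

    def store(self, section, vector):
        """Add one column holding the feature vector of the given section."""
        self.sections.append(section)
        width = len(self.sections)
        for word in vector:
            if word not in self.rows:
                # New index word: add a row, zero for all earlier sections.
                self.rows[word] = [0.0] * (width - 1)
        for word, row in self.rows.items():
            # Words absent from this section get zero in the new column.
            row.append(vector.get(word, 0.0))


table = FeatureTable()
table.store(("T1", "T2"), {"news": 2.0})
table.store(("T2", "T3"), {"weather": 1.5})
```

After the two calls, the row for "news" is `[2.0, 0.0]` and the row for "weather" is `[0.0, 1.5]`, so every row has one entry per stored section.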

Next, returning to FIG. 1, the summary video detection device 5 will be described. The summary video detection device 5 includes display means such as a display and input means such as a keyboard and mouse. Using these, the program designation receiving unit 50 accepts a user's designation of one of the program videos held in the program video feature vector storage unit 40.

For the program video designated by the user, the program video feature vector acquisition unit 51 acquires the feature vector generated by the feature vector generation device 3 and stored in the program video feature vector storage unit 40 (program video feature vector acquisition means). It then outputs the acquired feature vector to the similarity calculation unit 52.

On receiving the feature vector from the program video feature vector acquisition unit 51, the similarity calculation unit 52 uses the stored video feature vector acquisition unit 53 (stored video feature vector acquisition means) to acquire, for each section of the stored video, the feature vector generated by the feature vector generation device 3 and stored in the stored video feature vector storage unit 41. Because spot videos and announcement videos for a program start to be broadcast several weeks before the program itself, the sections acquired here are preferably those from roughly the several weeks before the date on which the designated program video was broadcast.

Using the feature vectors acquired by the stored video feature vector acquisition unit 53, the similarity calculation unit 52 calculates, for each section (including continuous sections) of the stored video, the similarity between the feature vector for that section and the feature vector for the program video (similarity calculation means).

The similarity calculation unit 52 preferably calculates the similarity according to equation (7). Here, sim(V_{Pi}, V_{Tx~Ty}) is the similarity between the feature vector V_{Pi} for program video Pi and the feature vector V_{Tx~Ty} for section video Tx~Ty. I(t_k) is a weighting coefficient representing the importance of index word t_k in program video Pi. For example, I(t_k) is given a larger value for an index word that appears in the electronic program guide as the title, subtitle, or a performer of program video Pi. I(t_k) is also given a larger value for an index word that appears more often in program video Pi and has a higher rarity as described above. The conditions for determining the value of I(t_k) may be set as appropriate, and suitable conditions may also be determined by machine learning from past data.

According to equation (7), the similarity is the cosine of the angle between the feature vectors weighted by the weighting coefficient I(t_k), so the magnitude of a feature vector does not affect the similarity. The longer a video is, the larger the magnitude of its feature vector becomes, but equation (7) yields a similarity that does not depend on the length of the video.
The similarity calculation unit 52 stores the similarity calculated for each section video in the similarity storage unit 54.
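Equation (7) itself is not reproduced in this passage, so the following is only a sketch of one form that matches the description: each component is scaled by I(t_k), and the cosine of the angle between the two scaled vectors is returned. The function name `weighted_cosine`, the default weight of 1.0 for words without a stated importance, and the sample values are assumptions.

```python
import math


def weighted_cosine(program_vec, section_vec, importance):
    """Cosine of the angle between two importance-weighted feature vectors.

    program_vec, section_vec: dict of index word -> feature quantity.
    importance: dict of index word -> I(t_k); missing words default to 1.0
    (an assumption made for this sketch).
    """
    words = set(program_vec) | set(section_vec)
    a = {w: importance.get(w, 1.0) * program_vec.get(w, 0.0) for w in words}
    b = {w: importance.get(w, 1.0) * section_vec.get(w, 0.0) for w in words}
    dot = sum(a[w] * b[w] for w in words)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no shared direction can be defined for a zero vector
    return dot / (norm_a * norm_b)


program = {"goal": 4.0, "match": 2.0}
short_clip = {"goal": 2.0, "match": 1.0, "rain": 1.0}
long_clip = {w: 3.0 * v for w, v in short_clip.items()}  # same content, 3x longer
weights = {"goal": 2.0}

sim_short = weighted_cosine(program, short_clip, weights)
sim_long = weighted_cosine(program, long_clip, weights)
```

Here `sim_short` and `sim_long` agree up to floating-point rounding, which illustrates why the magnitude of a feature vector, and hence the length of the video, does not affect the similarity.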

The order in which the stored video feature vector acquisition unit 53 acquires feature vectors is described next. In the first method, the stored video feature vector acquisition unit 53 first fixes a start point and acquires feature vectors for section videos of successively greater section length from that start point. When the section length reaches a predetermined maximum length, the maximum length is added to the start point to obtain a new start point. The same processing is then repeated.

In the second method, the stored video feature vector acquisition unit 53 first fixes a start point and acquires the feature vector for a section video of a predetermined section length from that start point. Next, a predetermined time length (predetermined time length > predetermined section length) is added to the start point to obtain a new start point. The same processing is then repeated.

When the second method is used, the similarity calculation unit 52 preferably calculates the similarity each time the stored video feature vector acquisition unit 53 acquires a feature vector. If the similarity for the section of the predetermined section length from a given start point is at or below a predetermined value (little or no similarity), or at or above a predetermined value (very close similarity), the value added to the start point is preferably the predetermined section length rather than the predetermined time length.
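The rule in the preceding paragraph can be sketched as below, following its wording literally: the start point normally advances by the predetermined time length, and advances by the predetermined section length when the similarity is at or below a low threshold or at or above a high threshold. All names, the thresholds, and the values in seconds are hypothetical, and `similarity_at` stands in for the combined work of the acquisition unit 53 and the similarity calculation unit 52.

```python
def scan_second_method(similarity_at, start, end, section_length,
                       time_step, low, high):
    """Return (section_start, section_end, similarity) for each section visited."""
    results = []
    t = start
    while t + section_length <= end:
        sim = similarity_at(t, t + section_length)
        results.append((t, t + section_length, sim))
        if sim <= low or sim >= high:
            t += section_length  # clearly dissimilar or clearly similar: jump
        else:
            t += time_step       # otherwise move on by the usual time length
    return results


# Hypothetical similarity: the section starting at 20 s is clearly dissimilar.
def fake_similarity(section_start, section_end):
    return 0.0 if section_start == 20 else 0.5


visited = scan_second_method(fake_similarity, start=0, end=120,
                             section_length=40, time_step=10,
                             low=0.1, high=0.9)
starts = [section_start for section_start, _, _ in visited]
```

In this run the sections starting at 30, 40, and 50 seconds are never evaluated, which is where the saving in similarity calculations comes from.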

FIG. 9 is a diagram showing an example of the feature vectors acquired by the stored video feature vector acquisition unit 53 when the second method is used. In this example, the predetermined time length is one section and the predetermined section length is four sections. The figure shows the case where the similarity for feature vector V_{T3~T7} was at or below the predetermined value. In this case, the stored video feature vector acquisition unit 53 skips the acquisition of V_{T4~T8}, V_{T5~T9}, V_{T6~T10}, and V_{T7~T11}, and acquires V_{T8~T12} after V_{T3~T7}. This reduces the time spent on similarity calculation and speeds up the processing.

The summary video detection unit 55 detects the summary video of the program video from among the sections and continuous sections of the stored video, based on the results calculated by the similarity calculation unit 52 and stored in the similarity storage unit 54 (summary video detection means). Specifically, the section video with the highest similarity may be detected as the summary video, or several section videos may be detected as summary videos in descending order of similarity.

The summary video detection unit 55 presents the detected summary video to the user using the display means of the summary video detection device 5. When several section videos are presented, they are preferably displayed in order of similarity.

Finally, the processing of the summary video detection device 5 described above is explained again in more detail with reference to the processing flow.
FIG. 10 is a flowchart showing the processing flow of the summary video detection device 5. As shown in the figure, the summary video detection device 5 first acquires the feature vector V_{Pi} for program video Pi (step S1). Next, it sets the search range in the stored video (step S2) and initializes the maximum section length T_LMAX, the section length increment T_I, the start point T_S in the stored video, and the video section length T_L (steps S3 to S6).

Next, the summary video detection device 5 determines whether the video section length T_L is less than or equal to the maximum section length T_LMAX (step S7). If it is not, the device adds T_LMAX to T_S and returns to step S6 (step S8).

If, on the other hand, step S7 finds that the video section length T_L is less than or equal to the maximum section length T_LMAX, the summary video detection device 5 next determines whether the section of length T_L starting from the start point T_S lies within the search range set in step S2 (step S8). If the section lies outside the search range, processing moves to step S14. If it lies within the search range, the device acquires the feature vector V_{Tx~Ty} for the section of length T_L from the start point T_S (denoted Tx~Ty). It then calculates the similarity between the feature vector V_{Pi} and the feature vector V_{Tx~Ty} (step S12) and stores it in the similarity storage unit 54 (step S13). Next, the device adds T_I to T_L and returns to step S7.

In step S14, the summary video detection device 5 detects summary videos based on the similarities stored in the similarity storage unit 54, sorts them in descending order of similarity, and presents them to the user.
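The loop of FIG. 10 can be sketched as follows, with the step numbers from the text given in comments where the text states them. The function and parameter names are hypothetical, `similarity_at` stands in for acquiring V_{Tx~Ty} and comparing it with V_{Pi}, and the input check at the top is an addition of this sketch to guarantee termination.

```python
def detect_summaries(similarity_at, search_start, search_end,
                     t_lmax, t_i, t_l_init):
    """Return [((start, end), similarity), ...] sorted by descending similarity."""
    if t_i <= 0 or t_lmax <= 0 or not 0 < t_l_init <= t_lmax:
        raise ValueError("lengths must be positive and t_l_init <= t_lmax")
    stored = []              # stands in for the similarity storage unit 54
    t_s = search_start       # start point T_S
    while True:
        t_l = t_l_init       # S6: (re)initialize the video section length T_L
        out_of_range = False
        while t_l <= t_lmax:                 # S7
            if t_s + t_l > search_end:       # search range check
                out_of_range = True
                break
            sim = similarity_at(t_s, t_s + t_l)      # S12
            stored.append(((t_s, t_s + t_l), sim))   # S13
            t_l += t_i                       # lengthen the section, back to S7
        if out_of_range:
            break                            # go to S14
        t_s += t_lmax                        # S8: move the start point
    return sorted(stored, key=lambda item: item[1], reverse=True)  # S14


# Hypothetical similarity that peaks for the section from 20 s to 40 s.
def fake_similarity(section_start, section_end):
    return 1.0 / (1 + abs(section_start - 20) + abs(section_end - 40))


ranked = detect_summaries(fake_similarity, search_start=0, search_end=60,
                          t_lmax=20, t_i=10, t_l_init=10)
best_section, best_similarity = ranked[0]
```

The first element of `ranked` corresponds to the section that would be presented first to the user.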

As described above, in the summary video detection system 1 the feature vectors reflect the semantic content of the video, so the summary video detection performed by the summary video detection device 5 is applicable to a wide range of videos and has high detection accuracy.
In addition, the feature vector generation device 3 can obtain the character data on which feature vector generation is based from closed captions or an electronic program guide.

The feature vector generation device 3 can also create a feature vector for each section of a video. Because feature vectors are generated per section, feature vectors can be calculated hierarchically, which reduces the processing load of feature vector calculation.
Furthermore, the summary video detection system 1 can handle video sections of various lengths as summary video candidates.

An embodiment of the present invention has been described above, but the present invention is in no way limited to this embodiment and can of course be carried out in various forms without departing from its gist.

For example, in the above embodiment the feature vectors for the stored video are calculated and stored in advance, but they may instead be calculated when a program video is designated through the program designation receiving unit 50.

Also, in the above embodiment the feature vector for each continuous section is obtained by adding the feature vectors of lower-level sections, but it may instead be obtained by subtracting a lower-level feature vector from a higher-level one. For example, the feature vector V_{T1~T9} for T1~T9 can be obtained as V_{T1~T10} − V_{T9~T10}.
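The subtraction variant can be illustrated with a small sketch. The feature quantities here are plain appearance counts, and the function name `subtract` and the sample words and values are hypothetical.

```python
def subtract(upper, lower):
    """Return upper - lower per index word, dropping words that reach zero."""
    result = {}
    for word, value in upper.items():
        remaining = value - lower.get(word, 0)
        if remaining != 0:
            result[word] = remaining
    return result


v_t1_t10 = {"goal": 5, "match": 3, "rain": 1}  # hypothetical V_{T1~T10}
v_t9_t10 = {"goal": 1, "rain": 1}              # hypothetical V_{T9~T10}
v_t1_t9 = subtract(v_t1_t10, v_t9_t10)         # V_{T1~T9}
```

A single subtraction replaces the several additions that would otherwise be needed to assemble a section such as T1~T9, whose length does not match any one level of the hierarchy.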

A program for realizing the functions of the feature vector generation device 3 and the summary video detection device 5 may be recorded on a computer-readable recording medium, and the processes described above may be performed by having a computer system read and execute the program recorded on the recording medium.
The "computer system" here may include an OS and hardware such as peripheral devices. If a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).
A "computer-readable recording medium" means a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.
The "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as the volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.
The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program means a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line.
Furthermore, the program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

FIG. 1 is a diagram showing the system configuration of the summary video detection system according to the embodiment of the present invention.
FIG. 2 is a diagram showing the internal configuration of the program video feature vector generation unit according to the embodiment of the present invention.
FIG. 3 is a diagram showing the internal configuration of the extraction unit according to the embodiment of the present invention.
FIG. 4 is a diagram showing a specific example of the contents of the program video feature vector storage unit according to the embodiment of the present invention.
FIG. 5 is a diagram showing the internal configuration of the stored video feature vector generation unit according to the embodiment of the present invention.
FIG. 6 is a diagram showing an example of feature vectors for continuous section videos according to the embodiment of the present invention.
FIG. 7 is a diagram showing an example of feature vectors for continuous section videos according to the embodiment of the present invention.
FIG. 8 is a diagram showing a specific example of the contents of the stored video feature vector storage unit according to the embodiment of the present invention.
FIG. 9 is a diagram showing an example of the feature vectors acquired by the stored video feature vector acquisition unit according to the embodiment of the present invention.
FIG. 10 is a flowchart showing the processing flow of the summary video detection device according to the embodiment of the present invention.

Explanation of symbols

1 summary video detection system,
2 video database,
3 feature vector generation device,
4 feature vector database,
5 summary video detection device,
20 stored video storage unit,
30 program video feature vector generation unit,
31 rarity information calculation unit,
32 stored video feature vector generation unit,
40 program video feature vector storage unit,
41 stored video feature vector storage unit,
50 program designation receiving unit,
51 program video feature vector acquisition unit,
52 similarity calculation unit,
53 stored video feature vector acquisition unit,
54 similarity storage unit,
55 summary video detection unit,
300, 302 extraction unit,
301, 303 feature vector generation unit,
3000 stream separation unit,
3002 index word extraction unit,
3003 speech recognition unit,
3004 index word extraction unit,
3005 integration unit,
3030 section video feature vector generation unit,
3031 continuous section video feature vector generation unit.

Claims

A feature quantity generation device comprising:
extraction means for extracting one or more index words from at least one of character data or an audio signal corresponding to a video; and
feature quantity generation means for generating a feature quantity for the video based on the number of appearances of each index word in the video.

The feature quantity generation device according to claim 1, wherein
the feature quantity generation means generates the feature quantity for the video based on the rarity of each index word, the rarity being determined from the number of appearances of the index word in the stored video.

A feature quantity generation device comprising:
extraction means for extracting, for each section of a video, one or more index words from at least one of character data or an audio signal corresponding to the section video;
section video feature quantity generation means for generating a feature quantity for the section video based on the number of appearances of each index word in the section video; and
continuous section video feature quantity generation means for generating a feature quantity for a continuous section video composed of a plurality of adjacent section videos, based on the feature quantities for the individual section videos generated by the section video feature quantity generation means.

A summary video detection device comprising:
stored video feature quantity acquisition means for acquiring, for each section of a stored video, the feature quantity generated by the feature quantity generation device according to claim 1;
program video feature quantity acquisition means for acquiring, for a program video, the feature quantity generated by the feature quantity generation device according to claim 1;
similarity calculation means for calculating, for each section of the stored video, the similarity between the feature quantity for that section and the feature quantity for the program video; and
summary video detection means for detecting a summary video of the program video from among the sections of the stored video based on the calculation result of the similarity calculation means.

A summary video detection device comprising:
stored video feature quantity acquisition means for acquiring, for each section of a stored video and for each continuous section composed of a plurality of adjacent sections, the feature quantity generated by the feature quantity generation device according to claim 3;
program video feature quantity acquisition means for acquiring, for a program video, the feature quantity generated by the feature quantity generation device according to claim 1;
similarity calculation means for calculating, for each section and each continuous section of the stored video, the similarity between the feature quantity for that section or continuous section and the feature quantity for the program video; and
summary video detection means for detecting a summary video of the program video from among the sections and the continuous sections of the stored video based on the calculation result of the similarity calculation means.

A program for causing a computer to function as:
extraction means for extracting one or more index words from at least one of character data or an audio signal corresponding to a video; and
feature quantity generation means for generating a feature quantity for the video based on the number of appearances of each index word in the video.