JP2007060222A

JP2007060222A - Apparatus, method and program for processing video

Info

Publication number: JP2007060222A
Application number: JP2005242344A
Authority: JP
Inventors: Osamu Isaka; 治井坂; Haruo Kochi; 晴雄東風; Mitsuru Takahashi; 充高橋
Original assignee: Daikin Industries Ltd
Current assignee: Daikin Industries Ltd
Priority date: 2005-08-24
Filing date: 2005-08-24
Publication date: 2007-03-08

Abstract

PROBLEM TO BE SOLVED: To efficiently grasp the outline of video content and efficiently extract a required scene by using accurate representative images.

SOLUTION: An analysis processing part 211 detects whether a separated display selection caption has changed. When the display selection caption has changed, a representative image is registered on the basis of that caption. When there is no display selection caption, a silent state is determined, and the analysis processing part 211 checks whether the silent state has lasted for a prescribed reference time or longer. If it has, the part shifts to a silent mode and calculates the difference between successive video data. When the difference reaches a reference value or more, the analysis processing part 211 registers the later image data as a representative image.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a video processing apparatus, a video processing method, and a video processing program for viewing video in a short time.

Today, a wide variety of content is provided through broadcasting and networks, including video content showing moving images. Viewers sometimes want to check such video content in a short time. In that case they may record it and check it by fast-forwarding, or use the digest function of a recording device. When fast-forwarding a recorded tape, however, the search relies on watching video displayed at high speed, so an essential scene may be overlooked. The digest function of a recording device uses a digest generated mechanically from on-screen color, shape, or motion, and for some videos it is difficult to grasp the overall flow from such a digest.

In some cases, closed captions (subtitle broadcasting) transmitted in synchronization with the audio of a television broadcast are used. Closed captioning inserts character-coded versions of the audio and other information related to the picture into the 21st horizontal scanning line of the television signal. The closed caption data can be separated from the television signal by a dedicated decoder.

A technique has been disclosed for a video search apparatus that searches video based on such closed caption data (see, for example, Patent Document 1). In the video search apparatus described in Patent Document 1, when a "search preparation" instruction is received from the input unit, the video playback unit starts playing the medium, and the decoder decodes the television signal, obtains the closed caption data, and stores it in memory. When a "search request setting" is received from the input unit, the text search unit searches the closed caption data in the memory and saves the search result in the memory. The search result recorded in the memory is then read out and sent to the decoder, which converts it into a television signal for display on the video display unit. In this way, the content of the video can be searched easily using the closed caption text corresponding to the video.
Japanese Patent Laid-Open No. 7-212708 (page 1)

However, closed caption data is not necessarily recorded for every important scene in a video. Important information may be conveyed through the picture even in a silent scene with no speech. In such a case, a digest created from closed caption data alone may not be appropriate, and the content may not be understood from it.

The present invention has been made to solve the above problem. Its object is to provide a video processing apparatus, a video processing method, and a video processing program that, by using more accurate representative images, allow video to be viewed efficiently in a short time and the outline of the content to be grasped.

To solve the above problem, the invention according to claim 1 is a video processing apparatus comprising: signal acquisition means for acquiring a broadcast signal including caption data; separation means for separating at least caption data and video information from the broadcast signal; analysis means for analyzing the caption data; and pointer information storage means for recording pointer information used to extract a representative image from recorded video information and to display the video information of that representative image. The gist is that the analysis means detects the display timing of a caption and executes a first registration process that records pointer information of the video information in conjunction with that display timing, and, when no caption data is detected, executes a second registration process that records pointer information of the video information according to a predetermined registration rule.

The invention according to claim 2 is the video processing apparatus according to claim 1, further comprising transmission means for transmitting the pointer information recorded in the pointer information storage means to a user terminal via a network.

The invention according to claim 3 is the video processing apparatus according to claim 1 or 2, further comprising video information storage means for recording the video information separated from the broadcast signal, and extraction means for referring to the pointer information and extracting a representative image from the video information recorded in the video information storage means.

The invention according to claim 4 is the video processing apparatus according to any one of claims 1 to 3, wherein the registration rule calculates the difference between successive pieces of video information and records pointer information for the video information when the difference is equal to or greater than a reference value.

The invention according to claim 5 is the video processing apparatus according to any one of claims 1 to 4, wherein the analysis means records the caption data separated by the separation means in association with the pointer information.

The invention according to claim 6 is the video processing apparatus according to any one of claims 1 to 5, wherein the analysis means records the representative image data specified by the pointer information in association with the pointer information.

The invention according to claim 7 is the video processing apparatus according to any one of claims 1 to 6, further comprising correction means for the pointer information.

The invention according to claim 8 is a method of registering video information using a video processing apparatus comprising: signal acquisition means for acquiring a broadcast signal including caption data; separation means for separating at least caption data and video information from the broadcast signal; analysis means for analyzing the caption data; and pointer information storage means for recording pointer information used to extract a representative image from recorded video information and to display the video information of that representative image. The gist is that the analysis means detects the display timing of a caption and executes a first registration process that records pointer information of the video information in conjunction with that display timing, and, when no caption data is detected, executes a second registration process that records pointer information of the video information according to a predetermined registration rule.

The invention according to claim 9 is a program for registering video information using a video processing apparatus comprising: signal acquisition means for acquiring a broadcast signal including caption data; separation means for separating at least caption data and video information from the broadcast signal; analysis means for analyzing the caption data; and pointer information storage means for recording pointer information used to extract a representative image from recorded video information and to display the video information of that representative image. The gist is that the program causes the analysis means to function as means that detects the display timing of a caption, executes a first registration process that records pointer information of the video information in conjunction with that display timing, and, when no caption data is detected, executes a second registration process that records pointer information of the video information according to a predetermined registration rule.

(Function)
According to the inventions of claims 1, 8, and 9, the analysis means detects the display timing of a caption and executes the first registration process, which records pointer information of the video information in conjunction with that display timing. When no caption data is detected, it executes the second registration process, which records pointer information of the video information according to a predetermined registration rule. Pointer information can therefore be recorded even in a silent state in which no caption data is detected. As the pointer information, time data from clock means provided in the video processing apparatus, or frame count data from a frame counter provided in the video processing apparatus, can be used. Because representative images can be extracted using this pointer information, the user can grasp the video content in a short time.

According to the invention of claim 2, the pointer information recorded in the pointer information storage means is transmitted to a user terminal via a network. This reduces the load of generating pointer information at the user terminal.

According to the invention of claim 3, a representative image is extracted using the pointer information, so the content can be grasped by checking that representative image.

According to the invention of claim 4, the difference between successive pieces of video information is calculated, and pointer information for the video information is recorded when the difference is equal to or greater than a reference value. A representative image can thus be identified from a change in the video.

According to the invention of claim 5, the caption data is recorded in association with the pointer information, so the content can be grasped while reading the captions.

According to the invention of claim 6, the representative image is recorded in association with the pointer information, so the representative image can be displayed efficiently.

According to the invention of claim 7, the video processing apparatus includes correction means for the pointer information, so the pointer information can be corrected to an appropriate position.

According to the present invention, using more accurate representative images allows video to be viewed efficiently in a short time and the outline of the content to be grasped. Required scenes can also be extracted efficiently.

(First embodiment)
A first embodiment of the present invention is described below with reference to FIGS. 1 to 6. FIG. 1 is an explanatory diagram of the configuration of a video processing apparatus to which the present invention is applied. In this embodiment, as shown in FIG. 1, a broadcast signal from a broadcasting station is received using a television receiver 30. A recording device 20 is connected to the television receiver 30. The recording device 20 functions as the video processing apparatus; a hard disk recorder, for example, can be used.

The broadcasting station 10 is a facility that broadcasts programs using terrestrial or satellite waves. The broadcast signal of a program contains video data and audio data. The video data is moving image data, and the audio data is data for the sound that is played back in synchronization with the video data.

The broadcast signal also contains captions that are always displayed as part of the picture and captions that are displayed only when selected. The former include program titles, cast introductions, and Japanese subtitles for overseas works. The latter, caption data whose display can be selected (so-called closed captions), may contain text data corresponding to the performers' dialogue as well as descriptions of the broadcast content, such as background music and sound effects. Caption data that can be switched between displayed and hidden in this way is referred to as "display selection caption data".

Next, the display selection caption data is described. In NTSC analog terrestrial broadcasting, for example, the video signal uses 525 scanning lines. Of these 525 lines, roughly the first 21 lines of each field (two fields make up one frame) are called the VBI (Vertical Blanking Interval) and are allocated as an interval for starting the scan. Closed captions are transmitted by multiplexing 7-bit character codes onto the 21st line of the VBI of each field. Using each field, two kinds of character sets can be transmitted at about 60 characters per second. The display selection caption data is decoded from the video data at playback time and can be displayed together with the video.
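The figure of about 60 characters per second can be checked with a short calculation. The following is a minimal sketch under assumptions that the text above does not state: that line 21 of each field carries two bytes per frame, that each byte is a 7-bit code plus an odd-parity bit (the usual line-21 convention), and that the NTSC frame rate is 30000/1001 Hz. The function names are hypothetical.

```python
from typing import Optional

FRAME_RATE_HZ = 30000 / 1001   # NTSC frame rate, about 29.97 frames/s
BYTES_PER_FIELD_LINE = 2       # assumption: two caption bytes on line 21 of a field


def chars_per_second() -> float:
    """Character rate of one field's line-21 channel."""
    # Each field occurs once per frame, so its line 21 is sent FRAME_RATE_HZ times/s.
    return FRAME_RATE_HZ * BYTES_PER_FIELD_LINE


def strip_parity(byte: int) -> Optional[int]:
    """Return the 7-bit character code, or None if the odd-parity check fails."""
    if bin(byte & 0xFF).count("1") % 2 != 1:
        return None
    return byte & 0x7F
```

With these assumptions, each of the two fields provides its own channel of roughly 59.94 characters per second, which matches the "about 60 characters per second" stated for each field.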

The user uses the television receiver 30 to receive a broadcast signal made up of video data, which includes the display selection caption data, and audio data. The television receiver 30 includes a tuner 31, a signal processing unit 32, and an output unit 33 consisting of a display and a speaker. When the display selection caption data is not to be displayed, the broadcast signal selected by the tuner 31 is demodulated in the signal processing unit 32, and the output unit 33 outputs the video signal to the display and the audio signal to the speaker so that the program can be viewed.

The recording device 20 acquires the broadcast signal selected by the tuner 31 of the television receiver 30 and decodes the video signal and the audio signal. Based on the user's operation input, it then extracts and decodes the display selection caption data.

Furthermore, the recording device 20 generates metadata and uses it to create a digest of the recorded data. The digest is data for displaying a list of representative images that characterize the video. As shown in FIG. 2, the metadata contains, for each program ID, a time code serving as pointer information that identifies a representative image within the video. When a display selection caption is associated with the image, the text data corresponding to that caption is recorded in association with the time code.
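The metadata layout described above (a program ID with time codes, each optionally tied to caption text) could be modeled as follows. This is only an illustrative sketch: the class and field names are hypothetical, and representing the time code as seconds is an assumption, since the patent does not fix a format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepresentativeEntry:
    time_code: float   # pointer information; seconds is an assumed unit
    text: str = ""     # caption text tied to the time code, empty if none


@dataclass
class ProgramMetadata:
    program_id: str
    entries: List[RepresentativeEntry] = field(default_factory=list)

    def register(self, time_code: float, text: str = "") -> None:
        """Record one representative-image pointer for this program."""
        self.entries.append(RepresentativeEntry(time_code, text))


# Usage: one caption-triggered entry and one entry without caption text.
meta = ProgramMetadata("P001")
meta.register(12.0, "Good evening.")
meta.register(95.5)
```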

The recording device 20 includes control means consisting of a CPU (Central Processing Unit) and data storage means such as ROM (Read Only Memory), RAM (Random Access Memory), and an HDD (Hard Disk Drive). FIG. 3 shows the block configuration of the recording device 20. It is a functional block diagram for explaining the functions performed by the recording device 20, including the functions carried out by the video processing program that the CPU executes.

Of the broadcast signal selected by the tuner 31 of the television receiver 30, the video signal, which includes the display selection caption data, and the audio signal are input from their respective input terminals to the video signal decoder 201 and the audio signal decoder 202.

The video signal decoder 201 functions as the signal acquisition means. It decodes the supplied video signal and supplies the decoded video data to the memory 203. The memory 203 is a frame memory that temporarily holds the supplied video data. The audio signal decoder 202 decodes the supplied audio signal and outputs the decoded audio data.

When the received broadcast signal is output in real time, the display selection caption data decoder 205 acquires the video data from the memory 203. When display of the display selection caption data has been instructed, the display selection caption data decoder 205 functions as the separation means: it decodes the display selection caption data contained in the acquired video data and separates it from the video information. It then supplies the corresponding text data to the OSD (On Screen Display) 206 and supplies the video data to the composition processing unit 207.

The OSD 206 converts the supplied text data into OSD data, which is image data for superimposed display on the screen, and supplies it to the composition processing unit 207. The composition processing unit 207 superimposes the supplied OSD data on the supplied video data and outputs the result from the output terminal to the display of the output unit 33 of the television receiver 30. The audio processing unit 208 acquires the audio data decoded by the audio signal decoder 202 and outputs it to the speaker of the output unit 33 of the television receiver 30.

The program ID extraction unit 210 extracts the program ID contained in the program management data from the video data held in the memory 203.

The recording management unit 209 functions as the extraction means that extracts representative images from the video information. The recording management unit 209 refers to a built-in timer to obtain the time at which recording started (absolute time), and adds the obtained time information to at least one of the video data supplied from the memory 203 and the audio data supplied from the audio signal decoder 202. It also adds the program ID supplied from the program ID extraction unit 210 to the video data and audio data, and supplies them to the recorded data storage unit 22, which serves as the video information storage means. The time information added here is used as a time stamp in the processing described later. The recorded data storage unit 22 stores the video data and audio data to which the program ID and time stamp have been added. The recorded data storage unit 22 consists of a large-capacity recording medium such as a hard disk, but a removable recording medium such as a DVD (Digital Versatile Disc) or video tape can also be used.

The display selection caption data decoder 205 decodes the display selection caption data contained in the acquired video data and supplies the corresponding text data to the analysis processing unit 211.

The analysis processing unit 211 functions as the analysis means and executes caption mode processing (the first registration process) and silent mode processing (the second registration process). These two processes are described with reference to FIGS. 4 and 5. The registration rule used here calculates the difference between successive video data and records pointer information for the video information when the difference is equal to or greater than a reference value.

First, caption mode processing is described with reference to FIG. 4. In caption mode processing, it is detected whether the video data contains display selection caption data (step S1-1). If display selection caption data could be separated ("YES" in step S1-1), the analysis processing unit 211 detects the display timing of the display selection caption and registers a representative image in conjunction with that display timing (step S1-2). Specifically, it divides the text data corresponding to the display selection caption data supplied from the display selection caption data decoder 205 into text groups of appropriate length and supplies the divided text groups to the time code addition processing unit 213.

If, on the other hand, no display selection caption data is contained and none can be separated ("NO" in step S1-1), a silent state is determined. In this case, the analysis processing unit 211 starts a timer and measures how long the silent state lasts. The analysis processing unit 211 then obtains a reference time recorded in a predetermined memory and checks whether the silent state has lasted longer than this reference time (step S1-3). If the duration of the silent state exceeds the reference time ("YES" in step S1-3), silent mode processing (FIG. 5) is executed. If the duration of the silent state has not reached the reference time ("NO" in step S1-3), processing returns to step S1-1.
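The flow of steps S1-1 to S1-3 can be sketched as a small loop. This is an illustrative sketch, not the patented implementation: the function name, the sampled `(time, caption)` input format, and the choice to register only when the caption text changes (as the abstract describes) are assumptions, and the comparison uses "reference time or longer" as in the abstract.

```python
def caption_mode(samples, reference_time):
    """Process (time, caption_or_None) samples in caption mode.

    Returns (registrations, silent_mode_start), where silent_mode_start is
    the time at which silent mode would be entered, or None if never.
    """
    registrations = []
    last_caption = None
    silent_since = None
    for t, caption in samples:
        if caption is not None:                      # S1-1: YES
            silent_since = None                      # silent timer is reset
            if caption != last_caption:              # caption has changed
                registrations.append((t, caption))   # S1-2: register
                last_caption = caption
        else:                                        # S1-1: NO, silent state
            last_caption = None
            if silent_since is None:
                silent_since = t                     # start the timer
            if t - silent_since >= reference_time:   # S1-3: YES
                return registrations, t
    return registrations, None


# Usage: two captions, then silence lasting at least 2 time units.
samples = [(0, "a"), (1, "a"), (2, "b"), (3, None), (4, None), (5, None)]
result = caption_mode(samples, reference_time=2)
```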

Next, silent mode processing is described with reference to FIG. 5. In silent mode processing, the analysis processing unit 211 calculates the difference between successive video data (step S2-1). Specifically, the analysis processing unit 211 calculates, as a numerical value, the difference between the current image and the next image (the image after a predetermined time has elapsed). The total change in the color signal or intensity signal of each pixel making up the image, or the magnitude of a motion vector, can be used as this difference.
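One of the difference measures mentioned above, the total change in per-pixel intensity, can be sketched as a sum of absolute differences. This is a minimal sketch assuming each image is a list of rows of numeric intensity values with identical dimensions; the function name is hypothetical.

```python
def frame_difference(current, following):
    """Total absolute change in pixel intensity between two equal-sized images."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(current, following)
        for a, b in zip(row_a, row_b)
    )


# Usage: two 2x2 intensity images.
diff = frame_difference([[0, 10], [20, 30]], [[0, 15], [10, 30]])
```

A color image could be handled the same way by summing over each color channel; a motion-vector measure would need a different computation and is not shown here.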

The analysis processing unit 211 then compares this difference with a predetermined reference value (step S2-2). If the difference is equal to or greater than the reference value ("YES" in step S2-2), it determines whether display selection caption data has been separated from the images for which the difference was calculated (step S2-3). If display selection caption data has been separated ("YES" in step S2-3), processing returns to caption mode processing (step S2-4).

If, on the other hand, the difference is equal to or greater than the reference value and display selection caption data has still not been separated at this stage ("NO" in step S2-3), the analysis processing unit 211 registers the later image data as a representative image (step S2-5). Because there is no text corresponding to a display selection caption in this case, the analysis processing unit 211 supplies text data consisting of the word "silent" to the time code addition processing unit 213 as the instruction to register a representative image.
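Steps S2-1 to S2-5 can be sketched in the same style. This is a sketch under stated assumptions: the function name and the `(time, image, has_caption)` input format are hypothetical, the difference measure is passed in as a parameter, and the loop continues after a registration, which the flowchart description does not spell out.

```python
def silent_mode(frames, reference_value, difference):
    """Process (time, image, has_caption) frames in silent mode.

    Returns (registered, caption_mode_time): the silent registrations and
    the time at which processing returns to caption mode (None if never).
    """
    registered = []
    for (_, img0, _), (t1, img1, has_caption) in zip(frames, frames[1:]):
        if difference(img0, img1) < reference_value:   # S2-1, S2-2: NO
            continue
        if has_caption:                                # S2-3: YES
            return registered, t1                      # S2-4: back to caption mode
        registered.append((t1, "silent"))              # S2-5: register later image
    return registered, None


# Usage with scalar "images" and an absolute-difference measure.
frames = [(0, 0, False), (1, 1, False), (2, 10, False), (3, 11, False), (4, 30, True)]
outcome = silent_mode(frames, reference_value=5, difference=lambda a, b: abs(a - b))
```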

As shown in FIG. 6, "caption 1" at times 1 to 2 and "caption 2" at times 3 to 4 are used as triggers for registering the pointer information of representative images. Silent mode is entered at time 6, when the predetermined time has elapsed since time 5, and from time 6 onward the change value of the image is calculated. Time 7, at which the reference value is exceeded, is used as a trigger for registering a representative image. At time 8, processing returns to caption mode.

The function of the time code addition processing unit 213, which receives the representative image registration instruction from the analysis processing unit 211 in this way, is described returning to FIG. 2. The time code addition processing unit 213 uses the timer 212 to add, as a time code, the time at which the representative image registration instruction was received. For text corresponding to a display selection caption, for example, the time code corresponding to the start time of that caption is added. When the broadcast signal is acquired in real time with respect to the broadcast, the time code addition processing unit 213 adds the time code to the text data based on the current time indicated by the timer 212. When the time at which the time code is added lags behind the program broadcast time, the time code addition processing unit 213 calculates the time code corresponding to the broadcast time of the program from this delay and the current time indicated by the timer 212, and adds it to the text data. The time code addition processing unit 213 supplies the text data with the time code added to the metadata generation unit 214.
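The delay compensation described above amounts to subtracting the known delay from the timer reading. A minimal sketch, assuming the delay is known as a fixed duration and the timer yields a `datetime`; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def broadcast_time_code(timer_now: datetime,
                        delay: timedelta = timedelta(0)) -> datetime:
    """Time code for the broadcast instant, given the timer reading and the delay.

    With no delay (real-time acquisition) the timer reading is used as is.
    """
    return timer_now - delay


# Usage: the instruction arrives 5 seconds after the broadcast instant.
code = broadcast_time_code(datetime(2005, 8, 24, 21, 0, 5), timedelta(seconds=5))
```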

メタデータ生成部２１４は、このタイムコードが付加されたテキストデータに、番組ＩＤ抽出部２１０から供給された番組ＩＤ情報を付加してメタデータを生成する。そして、メタデータを、ポインタ情報記憶手段としてのダイジェストデータ記憶部２３に記録する。 The metadata generation unit 214 generates the metadata by adding the program ID information supplied from the program ID extraction unit 210 to the text data to which the time code is added. Then, the metadata is recorded in the digest data storage unit 23 as pointer information storage means.
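The time code correction and the metadata generation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout `Metadata`, the function name `make_metadata`, and the reading of the correction as a subtraction of the delay from the timer's current time are hypothetical and are not specified in the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    program_id: str       # program ID information from the program ID extraction unit
    time_code: datetime   # pointer information for the representative image
    text: Optional[str]   # caption text; None when the trigger was an image change


def make_metadata(program_id: str,
                  text: Optional[str],
                  current_time: datetime,
                  delay: timedelta = timedelta(0)) -> Metadata:
    """Attach a time code and a program ID to the text data.

    With real-time acquisition the delay is zero and the timer value is used
    as is; otherwise the known delay is subtracted (assumed correction) so
    that the time code corresponds to the broadcast time of the program.
    """
    return Metadata(program_id, current_time - delay, text)
```

For example, a registration instruction received at 20:00:05 with a known delay of five seconds would be recorded with a time code of 20:00:00.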

次に、このメタデータを利用して短時間に番組を視聴する場合の処理を説明する。この場合、録画装置の操作入力部２００を用いる。この操作入力部２００や表示部は、例えば、ボタン、キー、タッチパネル、タッチパッド、レバーなどの入力デバイスで構成され、ユーザの操作入力を受ける。表示部は、例えば、ＬＣＤ（Liquid Crystal Display）またはＣＲＴ（Cathode Ray Tube）などで構成され、各種情報を表示する。操作入力部２００に、視聴対象の番組ＩＤが入力された場合、録画管理部２０９は、ダイジェストデータ記憶部２３から、この番組ＩＤの付加されたメタデータを抽出する。そして、録画管理部２０９は、このメタデータに含まれるタイムコードと番組ＩＤと基づいて、録画データ記憶部２２を検索して、この番組の録画データを読み出す。 Next, processing when viewing a program in a short time using this metadata will be described. In this case, the operation input unit 200 of the recording apparatus is used. The operation input unit 200 and the display unit are configured by input devices such as buttons, keys, a touch panel, a touch pad, and a lever, and receive user operation inputs. The display unit is composed of, for example, an LCD (Liquid Crystal Display) or a CRT (Cathode Ray Tube), and displays various types of information. When the viewing target program ID is input to the operation input unit 200, the recording management unit 209 extracts the metadata to which the program ID is added from the digest data storage unit 23. Then, the recording management unit 209 searches the recording data storage unit 22 based on the time code and the program ID included in the metadata, and reads the recording data of this program.

そして、録画管理部２０９は、タイムコードに対応する画面を特定し、タイムコードの早い時刻から順番に画像を列挙した一覧画面データを生成する。各画像には、メタデータに含まれるテキストデータが表示される。更に、録画管理部２０９は、各画像に対してタイムコードを付加する。この一覧画面データを表示部に出力させる。これにより、ユーザは映像の概要を把握することができる。更に、この一覧画面の中の所定の代表画面を選択することにより、ユーザが所望する位置から、録画の再生を開始する。この場合も、録画管理部２０９は、代表画像に関連付けられたタイムコードに基づいて、録画データ記憶部２２から映像を再生する。 Then, the recording management unit 209 identifies the screen corresponding to each time code, and generates list screen data in which the images are listed in order from the earliest time code. Each image is displayed together with the text data included in the metadata. Furthermore, the recording management unit 209 adds a time code to each image. The list screen data is output to the display unit. Thereby, the user can grasp the outline of the video. Further, by selecting a given representative screen in the list screen, playback of the recording is started from the position desired by the user. In this case as well, the recording management unit 209 plays back the video from the recording data storage unit 22 based on the time code associated with the representative image.
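The generation of the list screen data can be pictured with the following Python sketch. The dictionary keys, the function names `build_list_screen` and `playback_start`, and the callback `frame_at` (standing in for reading an image from the recording data storage unit 22) are assumptions introduced for illustration only.

```python
from typing import Any, Callable, Dict, List


def build_list_screen(metadata: List[Dict[str, Any]],
                      program_id: str,
                      frame_at: Callable[[float], Any]) -> List[Dict[str, Any]]:
    """List the representative images of one program, earliest time code first."""
    entries = [m for m in metadata if m["program_id"] == program_id]
    entries.sort(key=lambda m: m["time_code"])
    return [
        {
            "time_code": m["time_code"],          # kept so playback can start here
            "text": m["text"],                    # caption text shown with the image
            "image": frame_at(m["time_code"]),    # image looked up by time code
        }
        for m in entries
    ]


def playback_start(list_screen: List[Dict[str, Any]], selected_index: int) -> float:
    """Return the time code from which playback starts for the selected image."""
    return list_screen[selected_index]["time_code"]
```

Keeping the time code in every list entry is what allows the selection of a representative image to be turned directly into a playback start position.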

以上、本実施形態によれば、以下に示す効果を得ることができる。
○ 上記実施形態では、表示選択字幕を分離できた場合、解析処理部２１１は、表示選択字幕に基づいて代表画像の登録を行なう（ステップＳ１−２）。一方、表示選択字幕が分離できない場合、無声状態と判断されて、解析処理部２１１は、無声状態の保持時間が所定の基準時間以上になっている場合、無声モードに移行する。この無声モードでは、解析処理部２１１は各映像データ間の差分を算出する（ステップＳ２−１）。そして、差分が基準値以上の場合には、代表画像の登録を行なう。これにより、字幕がある場合には、この字幕により代表画像が登録される。一方、字幕もない場合には、この間の画像の変化により代表画像が登録される。従って、無声状態であっても、適切に代表画像を一覧表示させたダイジェストを生成することができる。このダイジェストを用いることにより、ユーザは、効率的に映像コンテンツの概要を把握することができる。 As described above, according to the present embodiment, the following effects can be obtained.
In the above embodiment, when the display selection subtitle can be separated, the analysis processing unit 211 registers the representative image based on the display selection subtitle (step S1-2). On the other hand, if the display selection subtitle cannot be separated, the state is determined to be silent, and the analysis processing unit 211 shifts to the silent mode when the holding time of the silent state is equal to or longer than a predetermined reference time. In the silent mode, the analysis processing unit 211 calculates a difference between the pieces of video data (step S2-1). If the difference is greater than or equal to the reference value, the representative image is registered. Thus, when there is a subtitle, the representative image is registered based on the subtitle; when there is no subtitle, the representative image is registered based on the change in the image during that period. Therefore, a digest that appropriately lists representative images can be generated even in a silent state. By using this digest, the user can efficiently grasp the outline of the video content.

○ 上記実施形態では、解析処理部２１１は、無声状態の保持時間が所定の基準時間以上になっているかどうかを比較する（ステップＳ１−３）。無声状態の保持時間が基準時間以上になっている場合、無声モードに移行する。すなわち、各映像データ間の差分を算出し、これに基づいて代表画像の登録を行なう。画像間の差分の計算にはＣＰＵに負荷がかかるが、無声モードに移行するまでのタイムラグを設けることにより、この負荷を軽減することができる。 In the above embodiment, the analysis processing unit 211 checks whether the holding time of the silent state is equal to or longer than a predetermined reference time (step S1-3). When the holding time of the silent state is equal to or longer than the reference time, the silent mode is entered; that is, the difference between the pieces of video data is calculated, and the representative image is registered based on the difference. Calculating the difference between images places a load on the CPU, but this load can be reduced by providing a time lag before the transition to the silent mode.

○ 上記実施形態では、録画管理部２０９は、タイムコードに対応する画面を特定し、タイムコードの早い時刻から順番に画像を列挙した一覧画面データを生成する。各画像には、メタデータに含まれるテキストデータが表示される。更に、録画管理部２０９は、各画像に対してタイムコードを付加する。このテキストデータにより、代表画面の内容を、より正確に把握することができる。更に、タイムコードを用いることにより、この部分から映像の再生を行なうこともできる。 In the above embodiment, the recording management unit 209 identifies the screen corresponding to each time code, and generates list screen data in which the images are listed in order from the earliest time code. Each image is displayed together with the text data included in the metadata. Furthermore, the recording management unit 209 adds a time code to each image. With this text data, the contents of the representative screen can be grasped more accurately. Furthermore, by using the time code, the video can also be played back from that point.

○ 上記実施形態では、このタイムコードに関する画像を録画データ記憶部２２から検索して読み出し、メタデータとともに、タイムコードにおいて早い時刻から順番に画像を表示部に表示させる。これにより、画像とメタデータ（ここでは、字幕）を用いるため、ユーザは効率的に概要を把握することができる。 In the above embodiment, an image related to this time code is retrieved from the recorded data storage unit 22 and read out, and the image is displayed on the display unit together with the metadata in order from the earliest time in the time code. Thereby, since an image and metadata (here, subtitles) are used, the user can efficiently grasp the outline.

（第２の実施形態）
次に、本発明を具体化した第２の実施形態を、図７〜図１１を用いて説明する。なお、第２の実施形態は、第１の実施形態のメタデータの生成を、外部のサーバで実行させる構成であるため、同様の部分についてはその詳細な説明を省略する。図７は、本発明を適用した情報処理システムの構成について説明するためのシステム図である。本実施形態では、図７に示すように、放送局からの放送を、テレビジョン受像機３０を用いて受信する。このテレビジョン受像機３０には表示選択字幕デコーダ４０、ホームサーバ５０が接続される。また、ホームサーバ５０は、ネットワークとしてのインターネットＩを介してメタデータ作成サーバ７０に接続される。 (Second Embodiment)
Next, a second embodiment of the present invention will be described with reference to FIGS. 7 to 11. In the second embodiment, the metadata generation of the first embodiment is executed by an external server, so detailed description of the parts in common is omitted. FIG. 7 is a system diagram for explaining a configuration of an information processing system to which the present invention is applied. In the present embodiment, as shown in FIG. 7, a broadcast from a broadcasting station is received using a television receiver 30. A display selection subtitle decoder 40 and a home server 50 are connected to the television receiver 30. The home server 50 is connected to the metadata creation server 70 via the Internet I as a network.

放送局１０は、第１の実施形態と同様に地上波や衛星波などを用いて番組を放送する施設である。この放送信号の中には、映像として常に表示される字幕と、選択により表示される字幕とが含まれる。ユーザは、テレビジョン受像機３０を用いて、表示選択字幕データを含む映像データ及ぶ音声データで構成される放送信号を受信する。 The broadcasting station 10 is a facility that broadcasts programs using terrestrial waves, satellite waves, and the like, as in the first embodiment. This broadcast signal includes subtitles that are always displayed as video and subtitles that are displayed by selection. The user uses the television receiver 30 to receive a broadcast signal composed of video data including display selection subtitle data and audio data.

更に、本実施形態では、表示選択字幕デコーダ４０は、テレビジョン受像機３０のチューナ３１によって選局された放送信号を取得し、映像信号および音声信号をデコードする。そして、ユーザの操作入力に基づいて、表示選択字幕データを抽出、デコードし、出力部３３への出力を行なう。また、表示選択字幕デコーダ４０は、ホームサーバ５０がインターネットＩを介してメタデータ作成サーバ７０から取得したメタデータを用いて検索や要約処理を実行することが可能なタイムコード付き録画データを生成する。 Further, in the present embodiment, the display selection subtitle decoder 40 acquires the broadcast signal selected by the tuner 31 of the television receiver 30 and decodes the video signal and the audio signal. Then, based on the user's operation input, it extracts and decodes the display selection subtitle data and outputs the result to the output unit 33. The display selection subtitle decoder 40 also generates time-coded recording data on which the home server 50 can execute search and summary processing using the metadata acquired from the metadata creation server 70 via the Internet I.

ホームサーバ５０は、ユーザの操作入力に基づいて、表示選択字幕デコーダ４０により生成されたタイムスタンプつき録画データの供給を受けて、内部のデータ記憶手段に記録する。また、ホームサーバ５０は、インターネットＩを介して、メタデータ作成サーバ７０から、表示選択字幕データに対応するテキストデータと、テキストデータに対応するタイムコードにより構成される暗号化メタデータの供給を受ける。 The home server 50 receives the recording data with time stamp generated by the display selection subtitle decoder 40 based on the user's operation input, and records it in the internal data storage means. Further, the home server 50 receives supply of encrypted metadata including text data corresponding to display selection subtitle data and a time code corresponding to the text data from the metadata creation server 70 via the Internet I.

更に、暗号化メタデータを、鍵データを利用して復号化し、このメタデータを用いてユーザが入力したテキストとメタデータとのマッチング処理を実行することができる。そして、マッチング処理の結果、ユーザが入力したテキストに対応するタイムコードを検出した場合には、タイムコードに基づいて録画データを検索し、表示選択字幕デコーダ４０に供給する。 Furthermore, it is possible to decrypt the encrypted metadata using the key data, and execute matching processing between the text input by the user and the metadata using the metadata. If the time code corresponding to the text input by the user is detected as a result of the matching process, the recorded data is searched based on the time code and supplied to the display selection subtitle decoder 40.

メタデータ作成サーバ７０は、各種ネットワークや電波を介して受信することにより、放送局１０が作成した表示選択字幕データ付きの放送信号を取得する。そして、この放送信号を用いてメタデータを作成して、作成したメタデータを暗号化する。また、メタデータ作成サーバ７０は、暗号化されたメタデータを、インターネットＩを介してユーザに配布する。このメタデータ作成サーバ７０は、ＣＰＵ（Central Processing Unit ）からなる制御手段、ＲＯＭ（Read Only Memory）やＲＡＭ（Random Access Memory )、ＨＤＤ（Hard Disk Drive ）等のデータ記憶手段を備える。メタデータ作成サーバ７０のブロック構成を図８に示す。メタデータ作成サーバ７０が実行する機能（ＣＰＵが実行するソフトウェアによって実行される機能を含む）について説明するための機能ブロック図である。 The metadata creation server 70 acquires a broadcast signal with display-selected caption data created by the broadcast station 10 by receiving it via various networks and radio waves. Then, metadata is created using the broadcast signal, and the created metadata is encrypted. Further, the metadata creation server 70 distributes the encrypted metadata to users via the Internet I. The metadata creation server 70 includes control means including a CPU (Central Processing Unit), and data storage means such as a ROM (Read Only Memory), a RAM (Random Access Memory), and an HDD (Hard Disk Drive). FIG. 8 shows a block configuration of the metadata creation server 70. It is a functional block diagram for demonstrating the function (including the function performed by the software which CPU performs) which the metadata production server 70 performs.

信号取得手段としての放送信号取得部７０１は、ネットワークや放送電波を介して放送信号を取得してデコーダ７０２に供給する。デコーダ７０２は、放送信号取得部７０１から供給された放送信号をデコードする。ここで、デコーダ７０２は、放送信号のうち、メタデータの作成に必要となる番組ＩＤ情報を含む番組管理情報や表示選択字幕データが含まれている映像信号のみをデコードする。 A broadcast signal acquisition unit 701 as a signal acquisition unit acquires a broadcast signal via a network or a broadcast radio wave and supplies the broadcast signal to the decoder 702. The decoder 702 decodes the broadcast signal supplied from the broadcast signal acquisition unit 701. Here, the decoder 702 decodes only the video signal including the program management information including the program ID information and the display selection subtitle data necessary for creating the metadata among the broadcast signals.

番組ＩＤ情報抽出部７０３は、デコーダ７０２によりデコードされた映像データに含まれる番組管理データから、放送番組を特定することができる番組ＩＤ情報を抽出し、メタデータ生成部７０８に供給する。更に、番組ＩＤ情報抽出部７０３は、この映像データを、分離手段としての表示選択字幕データデコーダ７０４に供給する。 The program ID information extraction unit 703 extracts program ID information that can specify a broadcast program from the program management data included in the video data decoded by the decoder 702 and supplies the extracted program ID information to the metadata generation unit 708. Further, the program ID information extraction unit 703 supplies the video data to the display selection subtitle data decoder 704 as a separating unit.

表示選択字幕データデコーダ７０４は、取得した映像データに含まれる表示選択字幕データをデコードして、対応するテキストデータを、解析手段としての解析処理部７０５に供給する。 The display selection subtitle data decoder 704 decodes the display selection subtitle data included in the acquired video data, and supplies the corresponding text data to the analysis processing unit 705 as analysis means.

この解析処理部７０５は、上記第１の実施形態の解析処理部２１１の代わりに字幕モード処理と無声モード処理とを実行する。この２つの処理については、図３、図４と同じである。 The analysis processing unit 705 performs the subtitle mode processing and the silent mode processing in place of the analysis processing unit 211 of the first embodiment. These two processes are the same as those in FIGS. 3 and 4.

このように解析処理部７０５から代表画像の登録指示を受けるタイムコード付加処理部７０７の機能を、図８に戻って説明する。タイムコード付加処理部７０７は、タイマ７０６を用いて、代表画像の登録指示を受けた時刻をタイムコードとして付加する。例えば、表示選択字幕に対応するテキストの場合には、表示選択字幕の開始時刻に対応するタイムコードが付加される。放送信号取得部７０１が、放送に対してリアルタイムで放送信号を取得した場合、タイムコード付加処理部７０７はタイマ７０６が示す現在時刻に基づいて、タイムコードをテキストデータに付加するものとする。また、番組放送時刻に対してタイムコード付加時刻に遅れがある場合には、タイムコード付加処理部７０７は、この遅延時間とタイマ７０６が示す現在時刻とに基づいて、番組の放送時刻に対応するタイムコードを算出し、テキストデータに付加する。 Returning to FIG. 8, the function of the time code addition processing unit 707, which receives the representative image registration instruction from the analysis processing unit 705, will be described. The time code addition processing unit 707 uses the timer 706 to add, as a time code, the time at which the representative image registration instruction was received. For example, in the case of text corresponding to a display selection subtitle, a time code corresponding to the start time of the display selection subtitle is added. When the broadcast signal acquisition unit 701 acquires the broadcast signal in real time with respect to the broadcast, the time code addition processing unit 707 adds the time code to the text data based on the current time indicated by the timer 706. If the time code addition time lags behind the program broadcast time, the time code addition processing unit 707 calculates a time code corresponding to the broadcast time of the program based on this delay time and the current time indicated by the timer 706, and adds it to the text data.

メタデータ生成部７０８は、タイムコード付加処理部７０７から供給されたタイムコードが付加されたテキストデータに、番組ＩＤ情報抽出部７０３から供給された番組ＩＤ情報を付加してメタデータを生成する。このメタデータは、図９に示すように、テキストデータに対して、テキスト群の開始時刻と終了時刻が記載されたタイムコードが付加される。そして、番組ＩＤ情報抽出部７０３から供給された番組ＩＤ情報が付加される。そして、このメタデータは暗号化処理部７０９に供給される。 The metadata generation unit 708 generates metadata by adding the program ID information supplied from the program ID information extraction unit 703 to the text data to which the time code supplied from the time code addition processing unit 707 is added. As shown in FIG. 9, the metadata includes a time code in which the start time and end time of the text group are written to the text data. Then, the program ID information supplied from the program ID information extraction unit 703 is added. This metadata is supplied to the encryption processing unit 709.

暗号化処理部７０９は、メタデータ生成部７０８から供給されたメタデータを、鍵データ記憶部７２に予め記憶されている暗号化鍵で暗号化する。図１０に示すように、鍵データ記憶部７２には、番組ＩＤ毎に暗号化鍵が記録されている。そして、暗号化処理部７０９は、メタデータに含まれる番組ＩＤに基づいて、鍵データ記憶部７２から暗号化鍵を抽出し、この暗号化鍵を用いてメタデータの暗号化を行ない、暗号化メタデータ記憶部７３に記録する。この暗号化メタデータ記憶部７３は、代表画像の映像情報を表示させるためのポインタ情報を記録するポインタ情報記憶手段として機能する。この暗号化メタデータは、ユーザからの要求に応じて、番組毎に、送信手段としての通信部７１０からインターネットＩを介してホームサーバ５０に提供される。 The encryption processing unit 709 encrypts the metadata supplied from the metadata generation unit 708 with an encryption key stored in advance in the key data storage unit 72. As shown in FIG. 10, the key data storage unit 72 stores an encryption key for each program ID. Then, the encryption processing unit 709 extracts the encryption key from the key data storage unit 72 based on the program ID included in the metadata, encrypts the metadata using this encryption key, and performs encryption. Records in the metadata storage unit 73. The encrypted metadata storage unit 73 functions as pointer information storage means for recording pointer information for displaying video information of the representative image. The encrypted metadata is provided to the home server 50 via the Internet I from the communication unit 710 as a transmission unit for each program in response to a request from the user.

次に、表示選択字幕デコーダ４０と、ホームサーバ５０との構成を説明する。先ず、図１１に表示選択字幕デコーダ４０のブロック構成図を示す。
テレビジョン受像機３０のチューナ３１により選局された放送信号のうち、表示選択字幕データを含む映像信号、音声信号は、それぞれ所定の入力端子より、映像信号デコーダ４０１、音声信号デコーダ４０２に入力される。 Next, the configuration of the display selection subtitle decoder 40 and the home server 50 will be described. First, a block diagram of the display selection subtitle decoder 40 is shown in FIG.
Of the broadcast signals selected by the tuner 31 of the television receiver 30, the video signal including display selection subtitle data and the audio signal are input from predetermined input terminals to the video signal decoder 401 and the audio signal decoder 402, respectively.

映像信号デコーダ４０１は、供給された映像信号をデコードし、デコードされた映像データをメモリ４０３に供給する。メモリ４０３は、供給された映像信号を一時保持するフレームメモリである。音声信号デコーダ４０２は、供給された音声信号をデコードし、デコードされた音声データを出力する。 The video signal decoder 401 decodes the supplied video signal and supplies the decoded video data to the memory 403. The memory 403 is a frame memory that temporarily holds the supplied video signal. The audio signal decoder 402 decodes the supplied audio signal and outputs the decoded audio data.

受信された放送信号をリアルタイムに出力する場合、表示選択字幕データデコーダ４０５は、メモリ４０３から映像データを取得する。そして、表示選択字幕データの表示が指示された場合、表示選択字幕データデコーダ４０５は、取得した映像データに含まれる表示選択字幕データをデコードして、対応するテキストデータをＯＳＤ（On Screen Display ）４０６に供給するとともに、映像データを合成処理部４０７に供給する。 When the received broadcast signal is output in real time, the display selection subtitle data decoder 405 acquires video data from the memory 403. When the display selection subtitle data is instructed to be displayed, the display selection subtitle data decoder 405 decodes the display selection subtitle data included in the acquired video data and converts the corresponding text data into an OSD (On Screen Display) 406. And the video data to the composition processing unit 407.

ＯＳＤ４０６は、供給されたテキストデータを、表示画面に重畳して表示させるための画像データであるＯＳＤデータに変換して、合成処理部４０７に供給する。合成処理部４０７は、供給された映像データに、供給されたＯＳＤデータを重畳して、出力端子からテレビジョン受像機３０の出力部３３のディスプレイに出力する。また、音声処理部４０８は、音声信号デコーダ４０２によりデコードされた音声データを取得して、テレビジョン受像機３０の出力部３３のスピーカに出力する。 The OSD 406 converts the supplied text data into OSD data that is image data to be displayed superimposed on the display screen, and supplies the OSD data to the composition processing unit 407. The composition processing unit 407 superimposes the supplied OSD data on the supplied video data and outputs it from the output terminal to the display of the output unit 33 of the television receiver 30. Also, the audio processing unit 408 acquires the audio data decoded by the audio signal decoder 402 and outputs it to the speaker of the output unit 33 of the television receiver 30.

次に、映像信号および音声信号をホームサーバ５０に出力して録画させる場合、出力信号生成部４０９は、番組ＩＤ抽出部４１０に、メモリ４０３に保持されている映像データから番組管理データに含まれる番組ＩＤを抽出させる。 Next, when the video signal and the audio signal are output to the home server 50 for recording, the output signal generation unit 409 causes the program ID extraction unit 410 to extract the program ID included in the program management data from the video data held in the memory 403.

出力信号生成部４０９は、タイマ４１１を参照して、録画が開始された時刻（絶対時刻）を取得し、メモリ４０３から供給される映像データ、又は音声信号デコーダ４０２から供給される音声データのうちの少なくともいずれか一方に、取得した時刻情報を付加する。更に、映像データおよび音声データに対して、番組ＩＤ抽出部４１０から供給された番組ＩＤを付加して、録画のための出力信号を生成してホームサーバ５０に供給する。ここで付加された時刻情報は、タイムスタンプとして、後述する検索や抽出処理において用いられる。 The output signal generation unit 409 refers to the timer 411 to acquire the time (absolute time) at which recording was started, and adds the acquired time information to at least one of the video data supplied from the memory 403 and the audio data supplied from the audio signal decoder 402. Further, it adds the program ID supplied from the program ID extraction unit 410 to the video data and the audio data, generates an output signal for recording, and supplies it to the home server 50. The time information added here is used as a time stamp in the search and extraction processing described later.

このような処理により、表示選択字幕デコーダ４０は、ユーザの操作入力に基づいて、表示選択字幕をデコードして映像に重畳させて表示させたり、録画データを生成し、ホームサーバ５０に供給して録画させたりすることが可能となる。 Through such processing, the display selection subtitle decoder 40 decodes the display selection subtitle based on the user's operation input and displays it on the video, generates recording data, and supplies it to the home server 50. It is possible to record.

次に、図１２に示すホームサーバ５０のブロック構成図を説明する。
操作入力部５０１は、例えば、ボタン、キー、タッチパネル、タッチパッド、レバーなどの入力デバイスで構成され、ユーザの操作入力を受ける。録画データ記憶部５２には、表示選択字幕デコーダ４０から供給される番組ＩＤ及びタイムスタンプが付加された映像データおよび音声データを記憶する。録画データ記憶部５２は、例えば、ハードディスクなどの大容量記録媒体により構成されるようにしても、ＤＶＤ（Digital Versatile Disk）やビデオテープなどのリムーバブルな記録媒体を用いることも可能である。 Next, a block configuration diagram of the home server 50 illustrated in FIG. 12 will be described.
The operation input unit 501 includes, for example, input devices such as buttons, keys, a touch panel, a touch pad, and a lever, and receives user operation inputs. The recorded data storage unit 52 stores video data and audio data to which a program ID and a time stamp supplied from the display selection subtitle decoder 40 are added. For example, the recorded data storage unit 52 may be configured by a large-capacity recording medium such as a hard disk or a removable recording medium such as a DVD (Digital Versatile Disk) or a video tape.

録画制御部５０２は、操作入力部５０１から放送番組の録画指示が入力された場合、表示選択字幕デコーダ４０の出力信号生成部４０９に対して映像信号や音声信号の出力を指示する。 When a broadcast program recording instruction is input from the operation input unit 501, the recording control unit 502 instructs the output signal generation unit 409 of the display selection subtitle decoder 40 to output a video signal or an audio signal.

ネットワークＩＦ部５０３は、インターネットＩを介して通信を行なうインターフェースである。ここでは、メタデータ作成サーバ７０との間でデータの送受信を行なう。
表示部５０４は、例えば、ＬＣＤ（Liquid Crystal Display）またはＣＲＴ（Cathode Ray Tube）などで構成され、各種情報を表示する。 The network IF unit 503 is an interface that performs communication via the Internet I. Here, data is transmitted to and received from the metadata creation server 70.
The display unit 504 is composed of, for example, an LCD (Liquid Crystal Display) or a CRT (Cathode Ray Tube), and displays various types of information.

暗号化メタデータ記憶部５３は、メタデータ作成サーバ７０から取得した暗号化メタデータを記憶する。
鍵データ記憶部５４は、暗号化メタデータを復号化するために、予め復号化鍵を記憶する。この復号化鍵は番組ＩＤ毎に提供され、記録される。 The encrypted metadata storage unit 53 stores the encrypted metadata acquired from the metadata creation server 70.
The key data storage unit 54 stores a decryption key in advance in order to decrypt the encrypted metadata. This decryption key is provided and recorded for each program ID.

復号処理部５０５は、暗号化メタデータ記憶部５３に記録されている暗号化メタデータを、鍵データ記憶部５４に記憶されている復号化鍵を用いて復号し、映像情報記憶手段としてのメタデータ記憶部５５に記録する。 The decryption processing unit 505 decrypts the encrypted metadata recorded in the encrypted metadata storage unit 53 using the decryption key stored in the key data storage unit 54, and performs meta data as a video information storage unit. It is recorded in the data storage unit 55.

操作入力部５０１に、検索対象の番組ＩＤと、検索キーとなるテキストが入力された場合、マッチング処理部５０６は、メタデータ記憶部５５に記録されているメタデータを参照して、マッチング処理を実行する。そして、検索キーを含むテキストを特定した場合、このテキストに関連付けられたタイムコードを録画データ検索処理部５０７に供給する。 When a search target program ID and text serving as a search key are input to the operation input unit 501, the matching processing unit 506 refers to the metadata recorded in the metadata storage unit 55 and executes the matching processing. When text containing the search key is identified, the time code associated with this text is supplied to the recorded data search processing unit 507.

ここで、操作入力部５０１において番組ＩＤのみが指定されている場合、マッチング処理部５０６は、この番組ＩＤに関連付けられて記録されたメタデータをメタデータ記憶部５５から抽出する。そして、このメタデータに含まれるタイムコードと番組ＩＤと基づいて録画データ記憶部２２を検索する。そして、抽出したタイムコードを録画データ検索処理部５０７に供給する。 Here, when only the program ID is specified in the operation input unit 501, the matching processing unit 506 extracts the metadata recorded in association with the program ID from the metadata storage unit 55. Then, the recorded data storage unit 22 is searched based on the time code and the program ID included in the metadata. Then, the extracted time code is supplied to the recorded data search processing unit 507.
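The two cases handled by the matching processing unit 506 (a program ID together with a search key, or a program ID alone) can be sketched in Python as follows. The record type `Record`, the function name `match_time_codes`, and the use of simple substring matching are assumptions; the description does not specify how the matching itself is performed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    program_id: str
    time_code: float  # seconds from the start of the program (assumed unit)
    text: str         # text data decoded from the display selection subtitle


def match_time_codes(records: List[Record],
                     program_id: str,
                     search_key: Optional[str] = None) -> List[float]:
    """Return the time codes to hand to the recorded data search.

    With a search key, only records whose text contains the key are used.
    With the program ID alone, every record of that program is used.
    """
    hits = [
        r.time_code
        for r in records
        if r.program_id == program_id
        and (search_key is None or search_key in r.text)
    ]
    return sorted(hits)
```

Returning the time codes in ascending order matches the later step in which the images are listed from the earliest time code.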

録画データ検索処理部５０７は、マッチング処理部５０６から供給されたマッチング結果（番組ＩＤとタイムコード）に基づいて録画データ記憶部５２を検索する。ここでは、マッチング処理部５０６と録画データ検索処理部５０７とが、映像情報から代表画像を抽出する抽出手段として機能する。そして、この番組ＩＤの付与された録画において、タイムコードにより特定された画像を表示選択字幕デコーダ４０に供給する。 The recorded data search processing unit 507 searches the recorded data storage unit 52 based on the matching result (program ID and time code) supplied from the matching processing unit 506. Here, the matching processing unit 506 and the recorded data search processing unit 507 function as extraction means for extracting a representative image from the video information. Then, in recording with the program ID, the image specified by the time code is supplied to the display selection subtitle decoder 40.

そして、タイムコードに対応する画面を特定し、タイムコードの早い時刻から順番に画像を列挙した一覧画面データ（ダイジェスト）を生成する。ダイジェストに含まれる各画像には、メタデータに含まれるテキストデータが同時に表示される。更に、各画像には、この画面を再生開始位置として特定するためのタイムコードが付加される。このダイジェストを用いることにより、ユーザは映像コンテンツの概要を把握することができる。このダイジェストの中の任意の画像が選択された場合、この画像に関連付けられたタイムコードにより再生開始位置を特定し、このタイムコードに基づいて録画データを再生する。 Then, the screen corresponding to the time code is specified, and list screen data (digest) in which images are listed in order from the time with the earliest time code is generated. In each image included in the digest, text data included in the metadata is displayed at the same time. Furthermore, a time code for specifying this screen as a reproduction start position is added to each image. By using this digest, the user can grasp the outline of the video content. When an arbitrary image in the digest is selected, the reproduction start position is specified by the time code associated with the image, and the recorded data is reproduced based on the time code.

以上、本実施形態によれば、第１の実施形態の効果に加え、以下に示す効果を得ることができる。
○ 上記実施形態では、メタデータ作成サーバ７０が、メタデータを作成し、各ユーザのホームサーバ５０に提供する。このため、各ユーザの録画装置において、メタデータを作成する必要がなく、ホームサーバ５０の負荷を軽減することができる。 As described above, according to the present embodiment, the following effects can be obtained in addition to the effects of the first embodiment.
In the above embodiment, the metadata creation server 70 creates metadata and provides it to the home server 50 of each user. For this reason, it is not necessary to create metadata in the recording device of each user, and the load on the home server 50 can be reduced.

なお、上記実施形態は、以下の態様に変更してもよい。
・上記実施形態では、メタデータに基づいて代表画像を特定するポインタ情報としてタイムコードを用いたが、これに限られるものはではなく、画面毎に割り振られたフレームデータを用いることも可能である。この場合、映像処理装置はフレームカウンタを備え、メタデータを記録する場合には、ポインタ情報としてフレームデータを用いる。そして、ダイジェストを作成したり、録画を再生したりする場合には、このフレームデータを用いて、画像や再生開始位置を特定することができる。この場合には、タイムコードにおける時刻のずれの影響をなくすことができる。 In addition, you may change the said embodiment into the following aspects.
In the above embodiment, the time code is used as the pointer information for specifying the representative image based on the metadata. However, the pointer information is not limited to this, and frame data allocated to each screen may be used instead. In this case, the video processing apparatus includes a frame counter and uses frame data as pointer information when recording metadata. When creating a digest or playing back a recording, this frame data can then be used to specify an image and a playback start position. This eliminates the influence of time shifts in the time code.

・上記実施形態では、解析処理部（２１１、７０５）は、無声状態の保持時間が所定の基準時間を越えているかどうかを比較する。そして、無声状態の保持時間が基準時間以上になっている場合、無声モードに移行する。これに代えて、無声状態になった場合、速やかに無声モードにしてもよい。これにより、画像の変化を確実に抽出することができる。 In the above embodiment, the analysis processing units (211 and 705) check whether the holding time of the silent state exceeds a predetermined reference time, and shift to the silent mode when the holding time of the silent state is equal to or longer than the reference time. Instead, the silent mode may be entered immediately when the silent state begins. Thereby, changes in the image can be extracted reliably.

・上記実施形態では、解析処理部（２１１、７０５）は、無声状態の保持時間が所定の基準時間以上になっているかを比較し、無声状態の保持時間が基準時間以上になっている場合、無声モードに移行する。これに代えて、この基準時間は映像の種類に応じて変更してもよい。例えば、各字幕の表示時間について、所定の統計値（例えば、平均値）を算出する。そして、この統計値に基づいて基準時間を変更する。放送のコンテンツにより字幕の出し方が異なる場合があるが、状況に応じて適切に無声状態を把握することができる。 In the above embodiment, the analysis processing units (211 and 705) check whether the holding time of the silent state is equal to or longer than a predetermined reference time, and shift to the silent mode when it is. Alternatively, the reference time may be changed according to the type of video. For example, a predetermined statistical value (for example, an average value) is calculated for the display time of each caption, and the reference time is changed based on this statistical value. Although the way captions are presented may differ depending on the broadcast content, the silent state can then be identified appropriately for each situation.

・上記実施形態では、メタデータには代表画像のポインタ情報（タイムコード）に加えて表示選択字幕データを記録するが、ポインタ情報のみであってもよい。代表画像を特定することができれば、この代表画像を一覧表示させることにより、短時間にコンテンツを視聴することができる。 In the above embodiment, the display selection subtitle data is recorded in the metadata in addition to the pointer information (time code) of the representative image, but only the pointer information may be recorded. If the representative image can be specified, the content can be viewed in a short time by displaying a list of the representative images.

・上記実施形態では、メタデータには代表画像のポインタ情報（タイムコード）に加えて表示選択字幕データを記録するが、ポインタ情報（タイムコード）に代表画像そのものを関連付けて記録してもよい。この場合、解析処理部（２１１、７０５）が字幕モードや無声モードにおいて代表画像データを合わせて記録する。この場合、一覧画面を作成する場合、録画データ記憶部（２２、５２）にアクセスする必要がなく、装置に負荷をかけずにコンテンツの概略を把握することができる。 In the above embodiment, the display selection caption data is recorded in the metadata in addition to the pointer information (time code) of the representative image. However, the representative image itself may be recorded in association with the pointer information (time code). In this case, the analysis processing units (211 and 705) record the representative image data together in the caption mode or the silent mode. In this case, when creating the list screen, it is not necessary to access the recorded data storage unit (22, 52), and the outline of the content can be grasped without imposing a load on the apparatus.

・上記実施形態では、解析処理部（２１１、７０５）がポインタ情報を記録するが、操作入力部を修正手段として機能させて、ポインタ情報を変更できるようにしてもよい。この場合には、画像を見ながらタイムコードやフレームデータを変更することができる。 In the above embodiment, the analysis processing units (211 and 705) record the pointer information. However, the operation input unit may function as a correction unit so that the pointer information can be changed. In this case, the time code and the frame data can be changed while viewing the image.
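The modification listed above in which the reference time is changed based on a statistic of the caption display times can be illustrated with a small Python sketch. The function name `adaptive_reference_time`, the fallback `default`, and the scaling parameter `factor` are hypothetical; the description only states that a statistic such as the average is calculated and that the reference time is changed based on it.

```python
from statistics import mean
from typing import Sequence


def adaptive_reference_time(caption_durations: Sequence[float],
                            default: float,
                            factor: float = 1.0) -> float:
    """Derive the silent-mode reference time from caption display times.

    caption_durations: display time of each caption observed so far (seconds).
    default: reference time used while no caption has been observed yet.
    factor: assumed multiplier applied to the average display time.
    """
    if not caption_durations:
        return default
    return factor * mean(caption_durations)
```

With display times of 2 and 4 seconds and a factor of 1.0, the reference time becomes 3 seconds, so content with short, frequent captions enters the silent mode sooner than content with long captions.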

本発明の一実施形態のシステム概略図。 Schematic diagram of a system according to an embodiment of the present invention.
本発明の一実施形態のメタデータの説明図。 Explanatory diagram of metadata according to an embodiment of the present invention.
本発明の一実施形態の録画装置のブロック構成の説明図。 Explanatory diagram of the block configuration of a recording apparatus according to an embodiment of the present invention.
本発明の一実施形態の処理手順の説明図。 Explanatory diagram of a processing procedure according to an embodiment of the present invention.
本発明の一実施形態の処理手順の説明図。 Explanatory diagram of a processing procedure according to an embodiment of the present invention.
本発明の一実施形態の処理手順の説明図。 Explanatory diagram of a processing procedure according to an embodiment of the present invention.
本発明の他の実施形態のシステムの概略図。 Schematic diagram of a system according to another embodiment of the present invention.
本発明の一実施形態のメタデータ作成サーバのブロック構成の説明図。 Explanatory diagram of the block configuration of a metadata creation server according to an embodiment of the present invention.
本発明の一実施形態のメタデータの説明図。 Explanatory diagram of metadata according to an embodiment of the present invention.
本発明の一実施形態の鍵データ記憶部に記録されたデータの説明図。 Explanatory diagram of data recorded in a key data storage unit according to an embodiment of the present invention.
本発明の一実施形態の表示選択字幕デコーダの説明図。 Explanatory diagram of a display selection subtitle decoder according to an embodiment of the present invention.
本発明の一実施形態のホームサーバの説明図。 Explanatory diagram of a home server according to an embodiment of the present invention.

Explanation of symbols

10 ... broadcasting station, 20 ... recording apparatus as video processing apparatus, 205 ... display selection caption data decoder as separation means, 209 ... recording management unit as extraction means, 211 ... analysis processing unit as analysis means, 30 ... television receiver, 22 ... recorded data storage unit as video information storage means, 23 ... digest data storage unit as pointer information storage means, 40 ... display caption decoder, 50 ... home server, 70 ... metadata creation server as video processing apparatus, 701 ... broadcast signal acquisition unit as signal acquisition means, 704 ... display selection caption data decoder as separation means, 55 ... metadata storage unit, 705 ... analysis processing unit as analysis means, I ... Internet as network.

Claims

A video processing apparatus comprising:
signal acquisition means for acquiring a broadcast signal including caption data;
separation means for separating at least the caption data and video information from the broadcast signal;
analysis means for analyzing the caption data; and
pointer information storage means that records pointer information for extracting a representative image from recorded video information and displaying the video information of the representative image,
wherein the analysis means detects a caption display timing and executes a first registration process that records pointer information of the video information in conjunction with the display timing, and
executes a second registration process that records pointer information of the video information according to a predetermined registration rule when no caption data is detected.

The video processing apparatus according to claim 1, further comprising transmission means for transmitting the pointer information recorded in the pointer information storage means to a user terminal via a network.

The video processing apparatus according to claim 1, further comprising:
video information storage means for recording the video information separated from the broadcast signal; and
extraction means that refers to the pointer information and extracts a representative image from the video information recorded in the video information storage means.

The video processing apparatus according to claim 1, wherein the registration rule calculates a difference between successive items of video information and records pointer information for the video information when the difference is greater than or equal to a reference value.

The video processing apparatus according to claim 1, wherein the analysis means records the caption data separated by the separation means in association with the pointer information.

The video processing apparatus according to claim 1, wherein the analysis means records the representative image data specified by the pointer information in association with the pointer information.

The video processing apparatus according to claim 1, further comprising correction means for correcting the pointer information.

A video processing method for registering video information using a video processing apparatus that comprises:
signal acquisition means for acquiring a broadcast signal including caption data;
separation means for separating at least the caption data and video information from the broadcast signal;
analysis means for analyzing the caption data; and
pointer information storage means that records pointer information for extracting a representative image from recorded video information and displaying the video information of the representative image,
wherein the analysis means detects a caption display timing and executes a first registration process that records pointer information of the video information in conjunction with the display timing, and
executes a second registration process that records pointer information of the video information according to a predetermined registration rule when no caption data is detected.

A video processing program for registering video information using a video processing apparatus that comprises:
signal acquisition means for acquiring a broadcast signal including caption data;
separation means for separating at least the caption data and video information from the broadcast signal;
analysis means for analyzing the caption data; and
pointer information storage means that records pointer information for extracting a representative image from recorded video information and displaying the video information of the representative image,
the program causing the analysis means to function as
means for detecting a caption display timing and executing a first registration process that records pointer information of the video information in conjunction with the display timing, and
means for executing a second registration process that records pointer information of the video information according to a predetermined registration rule when no caption data is detected.
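
The first and second registration processes recited above, together with the difference-based registration rule, can be read as the control flow in the following Python sketch. It is one possible interpretation and not the claimed implementation. The frame tuple layout, the mean absolute pixel difference used as the difference measure, resetting the remembered caption when captions disappear, and all function and parameter names are assumptions. The silent-state holding time follows the embodiment summarized in the abstract and does not appear in the claims.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# (time code in seconds, pixel values, caption text or None): a hypothetical layout
Frame = Tuple[float, Sequence[int], Optional[str]]


def frame_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute pixel difference (an assumed measure; the source names none)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)


def register_pointers(
    frames: Iterable[Frame],
    silent_reference_time: float,
    difference_reference: float,
) -> List[float]:
    """Return the time codes registered as pointer information."""
    pointers: List[float] = []
    last_caption: Optional[str] = None
    silent_since: Optional[float] = None
    previous_pixels: Optional[Sequence[int]] = None

    for time_code, pixels, caption in frames:
        if caption is not None:
            # First registration process: register when the caption changes.
            silent_since = None
            if caption != last_caption:
                pointers.append(time_code)
            last_caption = caption
        else:
            # No caption data: silent state. Enter silent mode once the state
            # has been held for the reference time or longer.
            last_caption = None
            if silent_since is None:
                silent_since = time_code
            in_silent_mode = time_code - silent_since >= silent_reference_time
            # Second registration process: register the latter frame when the
            # difference from the preceding frame reaches the reference value.
            if (
                in_silent_mode
                and previous_pixels is not None
                and frame_difference(previous_pixels, pixels) >= difference_reference
            ):
                pointers.append(time_code)
        previous_pixels = pixels

    return pointers
```

In this sketch a large scene change during the first moments of silence is not registered, because silent mode has not yet been entered; whether that matches the intended behavior depends on how the reference time is chosen.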