JP2003018505A

JP2003018505A - Information reproducing device and conversation scene detection method

Info

Publication number: JP2003018505A
Application number: JP2001198328A
Authority: JP
Inventors: Shunji Ui; 俊司宇井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-06-29
Filing date: 2001-06-29
Publication date: 2003-01-17

Abstract

PROBLEM TO BE SOLVED: To provide an information reproducing device which, when reproducing information from an optical disc on which video data, audio data, and auxiliary image data such as subtitles are recorded, can skip a scene in which no conversation takes place for a long time and start reproduction from the next conversation scene. SOLUTION: A conversation detecting means 110 of an information reproducing device 100 decides whether information read out of an optical disc 101 by a reading means 102 is the leading pack of an auxiliary image data unit and, if it decides that it is, outputs to a system control unit 111 the address information of the navigation pack that heads the unit containing the leading pack. When the system control unit 111 receives the address information from the conversation detecting means 110, it controls the reading means 102 to reproduce a conversation scene from the position on the optical disc 101 indicated by the address information.

Description

Detailed Description of the Invention

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to an information reproducing apparatus that reproduces a video signal, an audio signal, and a sub-picture signal from an information recording medium on which a sub-picture signal such as captions is recorded together with the video signal and the audio signal, and to a conversation scene detection method.

【０００２】[0002]

[Prior Art] The DVD-Video standard is one method for digitally recording video data such as movies on an optical disc and reproducing it. Under the DVD-Video standard, a video (data) stream compression-coded with the MPEG system, up to eight audio (data) streams compression-coded with DOLBY-AC3 or the MPEG system, and up to 32 sub-picture (data) streams in which sub-pictures for displaying captions and the like are run-length coded can be multiplexed as an MPEG program stream (data stream) and recorded on the disc. When a DVD-Video disc conforming to the DVD-Video standard is played on an information reproducing apparatus conforming to the same standard, a viewer can, for example, listen to a foreign movie in its original language while reading Japanese subtitles.

[0003] A technique for making effective use of the audio and sub-picture data in an information reproducing apparatus is disclosed in Japanese Patent Application Laid-Open No. 8-205044. In that publication, audio data such as reproduced conversation is subjected to speech recognition, a character string code is generated from the recognized audio data, and index information is created by associating the code with position data of the audio data. By referring to this index information, memorable scenes of a movie can be searched, scene sequences can be searched, and story summaries can be created.

【０００４】[0004]

[Problems to Be Solved by the Invention] With DVD-Video discs containing movies and the like, users want to skip scenes in which no conversation takes place for a long time and start playback from the next scene that contains conversation, for example when practicing listening comprehension in a foreign language. A typical case is foreign-language study in which the learner plays a foreign movie with subtitles in the original language displayed, listens to the original-language audio, and checks whether what was heard is correct.

[0005] However, in a conventional information reproducing apparatus that reproduces information from an optical disc conforming to the DVD-Video standard, the access points available for random access are limited to predetermined points, such as the beginning of each chapter in the case of a movie. There was therefore a problem that scenes without conversation, as described above, could not be skipped.

[0006] Japanese Patent Application Laid-Open No. 8-205044 does disclose a scene sequence search, but it requires a speech recognition function, which makes the apparatus complicated.

[0007] An object of the present invention is to provide an information reproducing apparatus and a conversation scene detection method that, in response to a user's instruction, can skip scenes without conversation and easily access the preceding or following conversation scene.

[0008] Another object of the present invention is to provide an information reproducing apparatus and a conversation scene detection method that can easily access an arbitrary conversation scene in a data stream in response to a user's instruction.

[0009] A further object of the present invention is to provide an information reproducing apparatus and a conversation scene detection method that can automatically skip the portions of a data stream that contain no conversation and play back only the conversation scenes, joined together.

【００１０】[0010]

[Means for Solving the Problems] An information reproducing apparatus of the present invention comprises: means for reading a data stream; detecting means for detecting the head position of a caption from the read data stream; and control means for setting, when the detecting means detects the head position of a caption, the reproduction position of the data stream based on reproduction start position information corresponding to that head position.

[0011] An information reproducing apparatus of the present invention also comprises: means for reading a data stream; detecting means for detecting the head position of a caption from the data stream read by the reading means; storage means in which, when the detecting means detects the head position of a caption, reproduction start position information corresponding to that head position is stored; and control means for controlling the reproduction position of the data stream based on the reproduction start position information stored in the storage means.

[0012] An information reproducing apparatus of the present invention also comprises: means for reading a data stream; detecting means for detecting the head positions of all captions from the data stream read by the reading means; storage means in which, each time the detecting means detects the head position of a caption, reproduction start position information corresponding to that head position is stored; and control means for controlling the reproduction position of the data stream based on each item of reproduction start position information stored in the storage means.

[0013] A conversation scene detection method of the present invention reads a data stream, detects the head position of a caption from the read data stream, and, when the head position of a caption is detected, changes the reproduction position of the data stream based on reproduction start position information corresponding to that head position.

[0014] A conversation scene detection method characterized by the above.

[0015] A conversation scene detection method of the present invention also reads a data stream, detects the head position of a caption from the read data stream, stores, when the head position of a caption is detected, reproduction start position information corresponding to that head position in storage means, and controls the reproduction position of the data stream based on the reproduction start position information stored in the storage means.

[0016] A conversation scene detection method of the present invention also reads a data stream, detects the head positions of all captions from the read data stream, stores, each time the head position of a caption is detected, reproduction start position information corresponding to that head position in storage means, and controls the reproduction position of the data stream based on each item of reproduction start position information stored in the storage means.
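As an illustration only, the storage-based variants above can be pictured as a sorted list of reproduction start positions, one per detected caption head, which is consulted when the next or previous conversation scene is requested. The following Python sketch is not taken from the specification: the class name `ConversationIndex`, its method names, and the use of plain integers as positions are all assumptions made for the example.

```python
import bisect


class ConversationIndex:
    """Hypothetical storage means: one reproduction start position per caption head."""

    def __init__(self):
        self._starts = []  # sorted, duplicate-free start positions (assumed integers)

    def record(self, start_position):
        """Store the start position that corresponds to a newly detected caption head."""
        i = bisect.bisect_left(self._starts, start_position)
        if i == len(self._starts) or self._starts[i] != start_position:
            self._starts.insert(i, start_position)

    def next_after(self, current):
        """Return the first stored position strictly after `current`, or None."""
        i = bisect.bisect_right(self._starts, current)
        return self._starts[i] if i < len(self._starts) else None

    def previous_before(self, current):
        """Return the last stored position strictly before `current`, or None."""
        i = bisect.bisect_left(self._starts, current)
        return self._starts[i - 1] if i > 0 else None
```

Keeping the list sorted lets both lookups use binary search, so a stream with many captions can still be navigated quickly once the positions have been collected.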

【００１７】[0017]

[Embodiments of the Invention] The present invention will be described below with reference to the drawings.

[0018] First, the data structure of a DVD-Video disc will be described. FIG. 1 is a diagram for explaining the data structure of a DVD-Video disc.

[0019] As shown in FIG. 1(a), a DVD-Video disc consists of a volume and file structure (hereinafter referred to as BFS) section 11a, which is the file system section of the UDF bridge format; a DVD-Video zone 11b, in which the files for DVD-Video are recorded; and a DVD others zone 11c, in which files other than DVD-Video data are recorded. The DVD-Video zone is divided into a plurality of files #1, ..., #i, ..., #p, ... as shown in FIG. 1(b) and, as shown in FIG. 1(c), consists of one volume management section called the video manager (hereinafter referred to as VMG) 13a and one or more video title sets (hereinafter referred to as VTS) #1, #2, .... The VMG 13a records information relating to the disc, such as search information for all titles on the disc, and can also record a menu relating to the entire disc. Each of VTS #1, #2, ... records one or more titles that share the same attributes, and a menu for each VTS can be recorded in that VTS.

[0020] As shown in FIG. 1(d), the VMG 13a consists of control information (hereinafter referred to as VMGI) 15a for reproducing a menu or title and a video object set (hereinafter referred to as VMGM_VOBS) 15b, which is a collection of video data for menu reproduction. Each of VTS #1, #2, ... consists of control information (hereinafter referred to as VTSI) 15c for reproducing a menu or title, a video object set (hereinafter referred to as VTSM_VOBS) 15d, which is a collection of video data for menu reproduction, and a video object set (hereinafter referred to as VTSTT_VOBS) 15e, which is a collection of video data for title reproduction.

[0021] As shown in FIG. 1(e), a VOBS consists of one or more video objects (hereinafter referred to as VOB #1, #2, ..., #i), each of which is a multiplexed stream of video, audio, and sub-picture (data) streams. VOB #1, #2, ..., #i use the program (data) stream format defined in the MPEG-2 Systems standard as the multiplexing method. Furthermore, as shown in FIG. 1(f), VOB #1, #2, ..., #i can be divided into reproduction units called cells #1, #2, ..., one for each scene in a title or each page of a menu. In this embodiment, the program stream is referred to as a data stream.

[0022] The VMGI 15a and VTSI 15c contain program chains (hereinafter referred to as PGC) that define the reproduction order in units of cells #1, #2, ..., #i, and titles and menus can be constructed from PGCs. A PGC defines pre-commands executed before the PGC is reproduced, post-commands executed after the PGC is reproduced, and cell commands executed after each cell is reproduced. These definitions make it possible to set up links between titles and menus and various other kinds of navigation.

[0023] FIG. 1(g) illustrates the multiplexed stream of each cell in the MPEG-2 program (data) stream format. Under the DVD-Video standard, an MPEG-2 video (data) stream serving as the main picture, up to eight audio (data) streams, and up to 32 sub-picture (data) streams can be multiplexed. Each of these streams is divided into packs of 2048 bytes, and the packs are time-division multiplexed onto the program stream. In addition, reproduction control information called a navigation pack (hereinafter referred to as Navi pack) is inserted into a DVD-Video program stream approximately every 0.5 seconds. Under the DVD-Video standard, the span from one Navi pack to the next Navi pack is called a video object unit (hereinafter referred to as VOBU) and constitutes the minimum reproducible unit of a VOB, that is, the access unit. The VOBU 17 consists of a Navi pack 17a, video packs 17b, audio packs 17c, and sub-picture packs 17d. Each pack has a pack header (not shown), in which the pack type information, address information, and the like are recorded.
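The relationship between packs and VOBUs described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the standard's data layout: the `Pack` class, the string labels for pack types, and the convention that an address counts packs from the start of the stream are all hypothetical.

```python
from dataclasses import dataclass

PACK_SIZE = 2048  # bytes per pack, as stated in the text


@dataclass
class Pack:
    kind: str     # "navi", "video", "audio" or "subpicture" (labels are assumptions)
    address: int  # assumed to count packs from the start of the stream


def byte_offset(pack):
    """Byte position of a pack, given that every pack is PACK_SIZE bytes long."""
    return pack.address * PACK_SIZE


def split_into_vobus(packs):
    """Group packs into VOBUs: each VOBU starts at a Navi pack and runs
    up to, but not including, the next Navi pack."""
    vobus = []
    for pack in packs:
        if pack.kind == "navi":
            vobus.append([pack])      # a Navi pack opens a new VOBU
        elif vobus:
            vobus[-1].append(pack)    # other packs join the current VOBU
        # Packs seen before the first Navi pack belong to no VOBU and are skipped.
    return vobus
```

Because a VOBU is the access unit, the address of its Navi pack is the natural value to hand to a reader when jumping to a new reproduction position.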

[0024] Under the DVD-Video standard, the reproduction order is defined by the PGC in units of cells, but any cell can be registered as a part of title (hereinafter referred to as PTT), and the head of that PTT can then be accessed instantly. For a movie title, for example, registering each chapter as a PTT lets the user jump instantly to the beginning of a desired chapter and watch from there.

[0025] Next, the sub-picture format in the DVD-Video standard will be described.

[0026] FIG. 2 is a diagram for explaining the data structure of a sub-picture unit, which is the coded sub-picture data. The sub-picture unit 20 is the coded form of one sub-picture image, that is, one caption image, and consists of a sub-picture unit header (hereinafter referred to as SPUH) 21, pixel data (hereinafter referred to as PXD) 23, and a display control table (hereinafter referred to as SP_DCSQT) 25. The SPUH 21 is header data describing the size of the unit data and the start address of the SP_DCSQT 25. The PXD 23 is run-length coded pixel data of a caption signal or the like. Each pixel takes one of four values: background, pattern pixel, and two kinds of emphasis pixel. The SP_DCSQT 25 is a table of control signals used when the pixel data is displayed. Each control signal consists of the time at which the control is applied and the control to be performed at that time. The kinds of control include display start, display end, color setting, contrast setting, display area change, and color/contrast change.

[0027] The sub-picture unit 20 is divided into pieces of approximately 2 Kbytes, and a packet header is added to each piece to create a sub-picture packet. A pack header is then added to each sub-picture packet to create a 2048-byte sub-picture pack. The multiplexing into the VOB described with reference to FIG. 1 is performed in units of these sub-picture packs.

[0028] FIG. 3 is a diagram for explaining how a sub-picture unit is packetized and packed.

[0029] FIG. 3 shows a case in which three sub-picture packs 31, 33, and 35 are created from the data of one sub-picture unit. The sub-picture packs 31, 33, and 35 consist of pack headers 31a, 33a, and 35a and sub-picture packets 31b, 33b, and 35b, respectively. In the third sub-picture pack 35, an alignment packet called padding packet 37 is inserted to bring the pack size to 2048 bytes. The sub-picture packets 31b, 33b, and 35b consist of packet headers 31c, 33c, and 35c and the divided sub-picture unit data 31d, 33d, and 35d, respectively.
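The division of one sub-picture unit into fixed-size packs can be illustrated with the following Python sketch. Only the 2048-byte pack size comes from the text; the header sizes `PACK_HEADER_SIZE` and `PACKET_HEADER_SIZE` are placeholder values assumed purely for the example, and the `padding` figure simply counts the bytes that a padding packet (including its own header) would have to occupy.

```python
PACK_SIZE = 2048         # from the text
PACK_HEADER_SIZE = 14    # assumption, for illustration only
PACKET_HEADER_SIZE = 10  # assumption, for illustration only
PAYLOAD_SIZE = PACK_SIZE - PACK_HEADER_SIZE - PACKET_HEADER_SIZE


def packetize(unit_data: bytes):
    """Split one sub-picture unit into pack descriptions.

    Each description holds the slice of unit data carried by the pack and the
    number of bytes that padding must fill so the pack reaches PACK_SIZE."""
    packs = []
    for offset in range(0, len(unit_data), PAYLOAD_SIZE):
        payload = unit_data[offset:offset + PAYLOAD_SIZE]
        packs.append({
            "is_leading": offset == 0,  # the pack holding the head of the unit
            "payload": payload,
            "padding": PAYLOAD_SIZE - len(payload),
        })
    return packs
```

With these assumed header sizes, a 5000-byte unit yields three packs, and only the last one needs padding, which matches the situation drawn in FIG. 3.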

[0030] Each of the pack headers 31a, 33a, and 35a records time information called a system clock reference (hereinafter referred to as SCR). The packet header 31c of the leading sub-picture packet 31b, the first of the packets into which the sub-picture unit was divided, describes time information called a presentation time stamp (hereinafter referred to as PTS). The SCR indicates the system time at which the pack is transferred from the recording medium to the reproduction system. The PTS indicates the system time at which the data contained in the packet, that is, the sub-picture unit data, is reproduced.

[0031] The SP_DCSQT of the sub-picture unit data in FIG. 2 describes the display control commands for the sub-picture unit together with their control times. Each control time is expressed as a value relative to the PTS in the packet header 31c of the sub-picture packet 31b that contains the head data of the sub-picture unit, with that PTS taken as zero. The display control commands in the SP_DCSQT include a display start command and a display end command. Because the display start command is required to be issued at relative time zero, the PTS of the sub-picture packet 31b is in practice the display start time of the sub-picture unit. The display end time of the sub-picture unit is obtained from the issue time of the display end command described in the SP_DCSQT and the PTS in the packet header 35c of the sub-picture packet 35b. The display end command is not always described in the SP_DCSQT. When it is absent, the display end time of the sub-picture unit is the display start time of the next sub-picture unit or the end time of the cell.
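The timing rules above can be summarized in a small Python sketch. It is an illustration under assumptions rather than a reading of the standard: it takes the end time to be the unit's PTS plus the relative issue time of the display end command, it picks the earlier of the two fallbacks when both are known, and all times are assumed to be integers in the same clock unit. The function and parameter names are hypothetical.

```python
def display_interval(unit_pts, end_offset, next_unit_pts=None, cell_end=None):
    """Return (start, end) display times for one sub-picture unit.

    unit_pts      -- PTS from the leading sub-picture packet of the unit
    end_offset    -- relative issue time of the display end command,
                     or None when the command is absent
    next_unit_pts -- display start time of the following unit, if known
    cell_end      -- end time of the cell, if known
    """
    start = unit_pts  # the display start command is issued at relative time zero
    if end_offset is not None:
        end = unit_pts + end_offset
    else:
        # Assumption: when both fallbacks are known, the earlier one applies.
        candidates = [t for t in (next_unit_pts, cell_end) if t is not None]
        end = min(candidates) if candidates else None
    return start, end
```

For example, `display_interval(1000, 250)` gives `(1000, 1250)`, while a unit without an end command stays on screen until the next unit starts or the cell ends.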

[0032] Next, the relationship among the video signal, the audio signal, and the caption (sub-picture) signal will be described.

[0033] FIG. 4 is a diagram for explaining how the video, audio, and sub-picture signals appear during reproduction.

[0034] In a typical movie, as shown in FIG. 4, the video signal and the audio signal are present continuously without interruption. In contrast, most caption (sub-picture) signals are present only in scenes where people are talking.

[0035] FIG. 5 is a diagram for explaining the coded data produced by coding the video, audio, and sub-picture signals shown in FIG. 4, together with an example of the data stream (VOB) obtained by multiplexing the resulting coded video data, coded audio data, and coded sub-picture data. In FIG. 5, the coded video data and coded audio data are present almost uniformly and continuously from the beginning to the end of the VOB, whereas the coded sub-picture data is concentrated only in the portions where a caption signal exists. This embodiment exploits this characteristic and detects conversation scenes using the sub-picture (caption) signal as the key.

[0036] Next, a digital video reproducing apparatus according to an embodiment of the present invention will be described.

[0037] FIG. 6 is a block diagram showing the configuration of the digital video reproducing apparatus of this embodiment.

[0038] The digital video reproducing apparatus 100 reproduces a digital video signal in DVD-Video format recorded on an optical disc 101, which is an information recording medium. On the optical disc 101, a video (data) stream, audio (data) streams in a plurality of languages, and sub-picture (data) streams in a plurality of languages are multiplexed and recorded. The reading means 102 reads a VOB from the optical disc 101. The VOB read by the reading means 102 is supplied to the demultiplexer 103. The demultiplexer 103 separates the video (data) stream, audio (data) stream, and sub-picture (data) stream that are packet-multiplexed in the VOB. Of the streams separated by the demultiplexer 103, the video (data) stream is supplied to the video decoding means 104, the sub-picture (data) stream to the sub-picture decoding means 105, and the audio (data) stream to the audio decoding means 106. The video decoding means 104 decodes a video signal from the video (data) stream. The sub-picture decoding means 105 decodes a sub-picture (caption) signal from the sub-picture (data) stream. The audio decoding means 106 decodes an audio signal from the audio (data) stream. The video signal decoded by the video decoding means 104 and the sub-picture signal decoded by the sub-picture decoding means 105 are supplied to the video mixer 107. The video mixer 107 mixes the video signal and the sub-picture signal and outputs the result to the output terminal 108. The audio signal decoded by the audio decoding means 106 is output to the output terminal 109. The conversation detecting means 110 detects a conversation scene from the VOB output by the reading means 102. The conversation detecting means 110 has an address holding unit 110a, which holds the address information of the Navi pack used when detecting a conversation scene, and a flag unit 110b, which holds information indicating whether the leading pack of sub-picture unit data has been detected.

[0039] The system control means 111 receives user instructions from the user interface means 112, such as a remote controller or buttons on the front panel of the reproducing apparatus 100. In accordance with user instructions from the user interface means 112, the system control means 111 performs system control such as controlling the signal reading position of the reading means 102 and controlling the switching of audio and sub-picture languages in the demultiplexer 103.

[0040] When the system control means 111 receives from the user, via the user interface means 112, an instruction to skip to the next (or previous) conversation scene, it stops the reproduction operation and instructs the conversation detecting means 110 to detect the next (or previous) conversation scene in the forward (or reverse) direction from the current reproduction position. In response to this instruction, the conversation detecting means 110 reads the VOB from the optical disc 101 via the reading means 102 and detects the next (or previous) conversation scene.

[0041] The remote controller, which is one of the user interface means 112, will now be described.

[0042] FIG. 7 is an external view of the remote controller. The remote controller 70 has a power button 71, a menu call button 72, cursor buttons 73a, 73b, 73c, and 73d, an enter button 74, an audio language switching button 75a, a subtitle language switching button 75b, an angle switching button 75c, a rewind button 76a, a fast-forward button 76b, a pause button 76c, a play button 76d, a stop button 76e, a forward skip button 76f, a reverse skip button 76g, a previous conversation scene button 77a, and a next conversation scene button 77b. When the user presses the previous conversation scene button 77a or the next conversation scene button 77b, the conversation detecting means 110 detects the previous or next conversation scene, respectively.

[0043] Next, the conversation scene detection operation of this embodiment will be described in detail.

[0044] As a premise of the conversation scene detection operation, when the user presses the next (previous) conversation scene button 77b (77a) on the remote controller 70 of FIG. 7, the system control unit 111 instructs the reading means 102 to perform a skip operation. On receiving this instruction, the reading means 102 performs the skip operation while reading only the pack header portion of each pack of the VOB, one after another, in the forward (reverse) direction.

[0045] FIG. 8 is a flowchart for explaining the operation of the conversation detecting means 110 when it detects the next conversation scene in the forward direction from the current position.

[0046] When the user presses the next conversation scene button 77b on the remote controller 70 of FIG. 7, the system control unit 111 instructs the conversation detecting means 110 to start operating. The conversation detecting means 110 acquires the pack header data read by the reading means 102 (step S801) and determines whether a pack could be read (step S803). If it determines that there is no pack, the conversation detecting means 110 concludes that the end of the data stream has been reached, aborts the subsequent processing, and ends the conversation scene detection operation. If a pack was read, the conversation detecting means 110 uses the pack header data to determine whether the pack is a Navi pack at the head of a VOBU, the access unit of the VOB (step S805). If the pack is determined to be a Navi pack, the conversation detecting means 110 records the address information of that Navi pack in the address holding unit 110a (step S807) and returns to step S801 to acquire the next pack.

[0047] If the pack is not a Navi pack, the conversation detecting means 110 determines from the pack header data whether the pack just read is the leading pack of a sub-picture (step S809). If it is not the leading pack, the conversation detecting means 110 returns to step S801 to acquire the next pack. If it is the leading pack of a sub-picture, the conversation detecting means 110 determines whether Navi pack address information has been stored in the address holding unit 110a by step S807 (step S811). If no address information is held, the conversation detecting means 110 returns to step S801 to acquire the next pack. If Navi pack address information is held, the conversation detecting means 110 outputs the address information held in the address holding unit 110a to the system control unit 111 as the access point to the next conversation scene (step S813) and ends the conversation scene detection operation.
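The forward search of FIG. 8 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `Pack` type, its field names, and the function name `find_next_scene` are hypothetical, and the stream is modelled as a plain Python iterable of already-classified pack headers.

```python
from typing import Iterable, NamedTuple, Optional


class Pack(NamedTuple):
    """Hypothetical stand-in for the information taken from a pack header."""
    address: int                  # position of the pack in the stream
    kind: str                     # "navi", "video", "audio" or "subpicture"
    first_of_unit: bool = False   # True for the leading pack of a sub-picture unit


def find_next_scene(packs: Iterable[Pack]) -> Optional[int]:
    """Return the Navi pack address to jump to, or None at end of stream."""
    held_navi_address: Optional[int] = None        # address holding unit 110a
    for pack in packs:                             # S801/S803: stop when no pack is left
        if pack.kind == "navi":                    # S805
            held_navi_address = pack.address       # S807
            continue
        if pack.kind == "subpicture" and pack.first_of_unit:   # S809
            if held_navi_address is not None:      # S811
                return held_navi_address           # S813
    return None
```

Returning the most recent Navi pack rather than the sub-picture pack itself reflects the text: playback has to start at the head of a VOBU, so the address held in unit 110a is what gets reported.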

[0048] FIG. 9 is a flowchart explaining the operation of the conversation detecting means 110 when detecting a conversation scene before the current position.

[0049] When detecting a conversation scene before the current position, the leading pack of the sub-picture unit data is encountered before the Navi pack, so the flag section 110b of FIG. 6 is used to track whether the leading pack of the sub-picture unit data has been detected.

[0050] When the user presses the previous conversation scene button 77a on the remote controller 70 of FIG. 7, the system control unit 111 instructs the conversation detecting means 110 to start operating. The conversation detecting means 110 initializes the data in the flag section 110b to "0" (step S901) and acquires the pack header data read by the reading means 102 (step S903). Here, the value "0" in the flag section 110b indicates that the leading pack of the sub-picture unit data has not yet been detected by the conversation detecting means 110. The conversation detecting means 110 determines whether a pack could be read in the acquisition operation of step S903 (step S905). If no pack could be read, the conversation detecting means 110 determines that the end of the data stream has been reached, aborts the remaining processing, and ends the conversation scene detection operation.

[0051] If a pack was read, the conversation detecting means 110 determines whether the data in the flag section 110b is "1" (step S907). Here, the value "1" indicates that the leading pack of the sub-picture unit data has already been detected by the conversation detecting means 110. If the flag is not "1", the conversation detecting means 110 determines from the pack header data whether the pack just read is the leading pack of the sub-picture unit data (step S909). If it is the leading pack, the conversation detecting means 110 sets the data in the flag section 110b to "1" (step S911) and returns to step S903 to acquire the next pack. If it is not the leading pack, the conversation detecting means 110 returns to step S903 to acquire the next pack.

[0052] If the data in the flag section 110b is determined to be "1" in step S907, the conversation detecting means 110 determines from the pack header data whether the pack just read is a Navi pack (step S913). If it is a Navi pack, the conversation detecting means 110 records the address information of that Navi pack in the address holding unit 110a (step S915). Because this is the Navi pack preceding the detected leading pack of the sub-picture unit data, the conversation detecting means 110 outputs the address information recorded in the address holding unit 110a to the system control unit 111 as the access point (playback start position) of the conversation scene (step S917) and ends the conversation scene detection process. If the pack is determined in step S913 not to be a Navi pack, the conversation detecting means 110 returns to step S903 to acquire the next pack.
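The reverse search of FIG. 9 can be sketched in the same style. Again this is only an illustration under assumptions: the `Pack` type and the name `find_previous_scene` are hypothetical, and the caller is assumed to supply the packs already in reverse order, as the reading means would during a backward skip.

```python
from typing import Iterable, NamedTuple, Optional


class Pack(NamedTuple):
    """Hypothetical stand-in for the information taken from a pack header."""
    address: int
    kind: str                     # "navi", "video", "audio" or "subpicture"
    first_of_unit: bool = False   # True for the leading pack of a sub-picture unit


def find_previous_scene(packs_in_reverse: Iterable[Pack]) -> Optional[int]:
    """Return the address of the Navi pack preceding the nearest earlier caption."""
    leading_pack_seen = False                      # flag section 110b, "0" (S901)
    for pack in packs_in_reverse:                  # S903/S905
        if not leading_pack_seen:                  # S907: flag is still "0"
            if pack.kind == "subpicture" and pack.first_of_unit:   # S909
                leading_pack_seen = True           # S911: flag becomes "1"
            continue
        if pack.kind == "navi":                    # S913
            return pack.address                    # S915 and S917
    return None
```

The flag is what distinguishes this from the forward case: Navi packs met before the leading sub-picture pack lie after the caption in stream order, so they are ignored until the flag has been set.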

[0053] Based on the Navi pack address information output by the conversation detecting means 110 in step S813 or step S917, the system control unit 111 controls the digital video reproducing apparatus so that it jumps to the next or previous conversation scene. In the example of FIG. 4, this allows the conversation scenes to be stepped through in the order subtitle 1 → subtitle 2 → subtitle 3 → subtitle 4, or in the reverse order.

[0054] Rather than having the conversation begin immediately after the jump, it may be preferable to use a point a fixed time (several seconds) before the subtitle start time as the access point of the conversation scene. To achieve this when detecting the next conversation scene, a predetermined number of Navi pack addresses are held, and in step S813 the address a predetermined number of entries back is output from among the recorded addresses. When detecting a conversation scene before the current position, step S917 does not output the address of the Navi pack immediately preceding the detected leading pack of the sub-picture unit data; instead, the Navi pack a predetermined number of packs before the detected leading pack is located and its address information is output.
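For the forward case, holding a predetermined number of Navi pack addresses could be sketched with a bounded queue. This is an assumption-laden illustration: the tuple layout `(kind, address, first_of_unit)`, the parameter name `lead_in_vobus`, and the fallback to the oldest held address when fewer Navi packs have been seen are all choices made here, not details given in the text.

```python
from collections import deque
from typing import Iterable, Optional, Tuple

PackInfo = Tuple[str, int, bool]   # (kind, address, first_of_unit), hypothetical layout


def find_next_scene_with_lead_in(packs: Iterable[PackInfo],
                                 lead_in_vobus: int) -> Optional[int]:
    """Return the Navi address `lead_in_vobus` entries before the caption's VOBU."""
    # Keep only the most recent lead_in_vobus + 1 Navi addresses.
    recent_navi = deque(maxlen=lead_in_vobus + 1)
    for kind, address, first_of_unit in packs:
        if kind == "navi":
            recent_navi.append(address)
        elif kind == "subpicture" and first_of_unit and recent_navi:
            # Oldest held address: the requested number of entries back,
            # or the earliest one available if fewer have been seen.
            return recent_navi[0]
    return None
```

With `lead_in_vobus=0` the function behaves like the basic forward search and returns the Navi pack of the VOBU that contains the caption.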

[0055] Next, a second embodiment of the present invention will be described.

[0056] FIG. 10 is a block diagram showing the configuration of a digital video reproducing apparatus according to the second embodiment. In addition to the configuration of the digital video reproducing apparatus 100 of FIG. 6, the digital video reproducing apparatus 200 of FIG. 10 has storage means 201 for storing conversation scene information as a list and menu generating means 203 for generating a conversation scene selection menu. Components in FIG. 10 that are the same as in FIG. 6 are given the same reference numerals, and their description is omitted.

[0057] In the digital video reproducing apparatus 100 of the first embodiment, the next or previous conversation scene is detected each time the user issues an instruction. In the digital video reproducing apparatus 200 of the second embodiment, the conversation detecting means 110 detects the conversation scene access points of the entire VOB in advance, in response to an instruction from the user, and records them in the storage means 201 as a conversation scene information list. This can be achieved by repeating the operation of the flowchart of FIG. 9 from the beginning of the VOB.

[0058] FIG. 11 is a diagram showing an example of the structure of the conversation scene information list stored in the storage means 201. In the example of FIG. 11, the storage means 201 consists of an area 151 that records the number of conversation scenes stored and an area 152 in which the address information of the Navi pack serving as the access point of each conversation scene is recorded in order. When the user instructs, via the remote controller 70, a jump to the previous or next conversation scene, the system control unit 111 obtains the address information of the indicated conversation scene from the storage means 201 and controls the digital video reproducing apparatus 200 to play back from that address. With this embodiment there is no need to seek through the VOB every time a conversation scene is skipped, so the response to the user's instruction is faster.
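A list of this shape, and the lookup the system control unit might perform on it, could be sketched as below. The class name `SceneList`, its methods, and the use of the current playback address with a binary search are assumptions for illustration; the text only states that the stored address of the indicated scene is retrieved. The addresses are assumed to be stored in ascending order.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneList:
    """Hypothetical model of the list held in the storage means 201."""
    addresses: List[int] = field(default_factory=list)   # area 152, ascending

    @property
    def count(self) -> int:
        """Number of recorded conversation scenes (area 151)."""
        return len(self.addresses)

    def next_scene(self, current_address: int) -> Optional[int]:
        """First stored access point strictly after the current address."""
        i = bisect_right(self.addresses, current_address)
        return self.addresses[i] if i < self.count else None

    def previous_scene(self, current_address: int) -> Optional[int]:
        """Last stored access point strictly before the current address."""
        i = bisect_left(self.addresses, current_address)
        return self.addresses[i - 1] if i > 0 else None
```

Because the lookup touches only the stored list, no pack headers need to be read at jump time, which is the responsiveness benefit the paragraph describes.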

[0059] The menu generating means 203 causes the list information recorded in the storage means 201 to be displayed on screen as a menu. When the user issues a menu display instruction via the remote controller 70, the system control unit 111 reads the list information from the storage means 201 and supplies it to the menu generating means 203. The menu generating means 203 creates a menu screen from the supplied list information. The menu screen created by the menu generating means 203 is supplied to the video mixer 107 and displayed on screen via the output terminal 108. FIG. 12 is a diagram showing an example of the menu screen displayed on screen. The user operates the cursor buttons 73a, 73b, 73c, and 73d and the enter button 74 of the remote controller of FIG. 7 to select the desired conversation scene from the menu screen of FIG. 12. The system control unit 111 obtains the access point of the conversation scene selected on the menu screen from the storage means 201 and controls playback from that access point. This makes it possible to access (jump to) any conversation scene, not only the one immediately before or after.

[0060] Next, a third embodiment of the present invention will be described.

[0061] In the first and second embodiments, the conversation detecting means 110 detects all sub-picture data, that is, every subtitle change point, as the start of a conversation scene. In the third embodiment, conversations exchanged continuously within a predetermined time are treated as a single conversation scene.

[0062] FIG. 13 is a block diagram showing the configuration of a digital video reproducing apparatus according to the third embodiment. In the digital video reproducing apparatus 300 of FIG. 13, the conversation detecting means 110 has a start time storage unit 301 that stores the start time of a sub-picture unit, an end time storage unit 303 that stores the end time of a sub-picture unit, and a temporary storage unit 305 that temporarily stores sub-picture unit data. Components in FIG. 13 that are the same as in FIG. 10 are given the same reference numerals, and their description is omitted.

[0063] FIG. 14 is a flowchart explaining the operation of the conversation detecting means 110 according to the third embodiment.

[0064] When detection of the conversation scene access points for the entire VOB is started by an instruction from the user, the conversation detecting means 110 sets (initializes) the value of the start time storage unit 301, which stores the start time of a sub-picture unit, to "0" and the value of the end time storage unit 303, which stores the end time of a sub-picture unit, to "-1" (step S1401), and acquires the pack header data read by the reading means 102 (step S1403). The conversation detecting means 110 determines whether a pack could be read by the acquisition operation (step S1405). If no pack is found, it determines that the end of the data stream has been reached, aborts the remaining processing, and ends the conversation scene detection operation. If a pack was read, the conversation detecting means 110 determines from the pack header data whether the pack is the Navi pack at the head of a VOBU, the access unit of the VOB (step S1407). If it is a Navi pack, the conversation detecting means 110 records the address information of that Navi pack in the address holding unit 110a (step S1409) and returns to step S1403 to acquire the next pack.

[0065] If the pack is not a Navi pack, the conversation detecting means 110 determines from the pack header data whether the pack just read is a sub-picture pack (step S1411). If it is not a sub-picture pack, the conversation detecting means 110 returns to step S1403 to acquire the next pack. If it is a sub-picture pack, the conversation detecting means 110 determines from the pack header data whether it is the leading pack of a sub-picture (step S1413). If it is the leading pack, the conversation detecting means 110 reads, from the pack loaded into the temporary storage unit 305, the PTS contained in the sub-picture pack that indicates the playback start time of the sub-picture unit, and stores it in the start time storage unit 301 (step S1415). If the pack is not the leading pack of a sub-picture, the conversation detecting means 110 proceeds to step S1423, described later.

[0066] After storing the playback start time in the start time storage unit 301, the conversation detecting means 110 calculates the interval between the playback start time of the sub-picture unit detected this time, stored in the start time storage unit 301, and the playback end time of the previously detected sub-picture unit, stored in the end time storage unit 303, and determines whether the calculated value exceeds a predetermined value (hereinafter, the MAX value) (step S1417). If the interval does not exceed the MAX value, the conversation detecting means 110 proceeds to step S1423, described later.

[0067] If the interval exceeds the MAX value, the conversation detecting means 110 determines whether the value stored in the end time storage unit 303 is "-1" (step S1419). A value of "-1" in the end time storage unit 303 indicates that no display end command was described in the previously detected sub-picture unit and that the previous unit therefore continues until the display start time of the current unit, that is, there is no interval. If the value of the end time storage unit 303 is "-1", the conversation detecting means 110 proceeds to step S1423, described later.

[0068] If the value of the end time storage unit 303 is not "-1", the conversation detecting means 110 records the Navi pack address information held in the address holding unit 110a in the storage means 201 as conversation scene information, to serve as the access point to the next conversation scene (step S1421), and returns to step S1403.

[0069] If the pack read in step S1413 is not the leading pack of a sub-picture unit, or the interval does not exceed the MAX value, or the value of the end time storage unit 303 is "-1", the conversation detecting means 110 temporarily stores the pack just read in the temporary storage unit 305 (step S1423) and determines whether this sub-picture pack is the final pack of the sub-picture unit data (step S1425). If it is not the final pack, the conversation detecting means 110 returns to step S1403 to acquire the next pack. If it is the final pack, the conversation detecting means 110 obtains, from the packs loaded into the temporary storage unit 305, the relative time at which the playback end command is issued, adds that relative time to the playback start time stored in the start time storage unit 301 to calculate the playback end time, stores the result in the end time storage unit 303 (step S1427), and returns to step S1403. If no display end command is described in the sub-picture unit data, the conversation detecting means 110 sets "-1" in the end time storage unit 303 in step S1427.

[0070] With this processing, a sub-picture unit whose display interval is at or below the predetermined value is no longer detected as a conversation scene access point. In the example of FIG. 4, for instance, the conversations of subtitle 3 and subtitle 4 are treated as a single scene.
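The interval test of steps S1417 to S1427 can be sketched at the level of whole sub-picture units rather than individual packs. This is a simplification under stated assumptions: the tuple layout `(navi_address, start_time, duration)`, the function name `detect_scene_starts`, and the use of `None` for a unit without a display end command are hypothetical, and times are treated as plain numbers in the same unit as `max_gap`.

```python
from typing import Iterable, List, Optional, Tuple

NO_END = -1   # initial value of the end time storage unit 303 (S1401)

# (address of the preceding Navi pack, start time, duration or None)
Unit = Tuple[int, int, Optional[int]]


def detect_scene_starts(units: Iterable[Unit], max_gap: int) -> List[int]:
    """Return Navi addresses of units that begin a new conversation scene."""
    scene_addresses: List[int] = []
    previous_end = NO_END                          # end time storage unit 303
    for navi_address, start_time, duration in units:
        # S1417 and S1419: record an access point only when the gap to the
        # previous unit exceeds max_gap and the previous end time is known.
        if previous_end != NO_END and start_time - previous_end > max_gap:
            scene_addresses.append(navi_address)   # S1421
        # S1427: end time = start time + relative time of the end command,
        # or -1 when the unit carries no display end command.
        previous_end = NO_END if duration is None else start_time + duration
    return scene_addresses
```

Note that, following the steps as written, the very first unit is not registered, because the end time store still holds its initial "-1" when that unit is examined; the sketch reproduces that behaviour rather than correcting it.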

[0071] FIG. 15 is a diagram showing another example of the structure of the conversation scene information list stored in the storage means 201. In FIG. 11 only the access point is stored as the information for each conversation scene, whereas in FIG. 15 the playback start time, playback end time, and subtitle data of each conversation scene are recorded in addition to the access point. The menu generating means 203 creates a menu screen based on the information in the storage means 201 of FIG. 15. FIG. 16 is a diagram showing an example of a menu screen created by the menu generating means 203 from that information. The menu screen of FIG. 16 displays the playback start time and the subtitle for each conversation scene. By looking at this menu screen, the user can tell when each conversation scene begins and what the conversation is about, and can select and access the desired conversation scene with confidence.
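A list entry of the kind shown in FIG. 15, and the way a menu line might be derived from it, could look like the sketch below. Everything here is illustrative: the class `SceneEntry`, its field names, the `hh:mm:ss` time format, and the representation of the subtitle as a plain string are assumptions, since the text does not specify how the subtitle data or times are encoded.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SceneEntry:
    """Hypothetical record for one conversation scene (FIG. 15 style)."""
    access_point: int      # Navi pack address
    start_seconds: int     # playback start time
    end_seconds: int       # playback end time
    caption: str           # stand-in for the recorded subtitle data


def format_time(seconds: int) -> str:
    """Render a time in seconds as hh:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def menu_lines(entries: Iterable[SceneEntry]) -> List[str]:
    """One numbered menu line per scene: start time followed by the caption."""
    return [f"{number}. {format_time(entry.start_seconds)}  {entry.caption}"
            for number, entry in enumerate(entries, start=1)]
```

For example, an entry starting at 75 seconds with the caption "Hello" would be rendered as `1. 00:01:15  Hello`.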

[0072] The embodiments above describe jumping to a specific conversation scene in response to an instruction from the user, but it is also possible to provide a mode that automatically skips the portions without conversation and plays back only the conversation scenes, joined together. In this case, the storage means 201 stores the end time information of each conversation scene, as shown in FIG. 15. FIG. 17 is a diagram showing an example of a remote controller 170 used with the conversation scene automatic playback mode. In addition to the buttons of the remote controller 70 of FIG. 7, the remote controller 170 has an automatic playback button 171 for automatically playing back conversation scenes. When the user presses the automatic playback button 171, the system control unit 111 first obtains the access point of the first conversation scene and starts playback from that position. While a conversation scene is playing, the system control unit 111 monitors the playback time and the end time of the scene being played, and when the playback time reaches the end time of the conversation scene, it obtains the access point of the next conversation scene and controls the digital video reproducing apparatus to move the playback position there. In this way, only the conversation scenes are played back automatically.
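The monitoring loop of the automatic playback mode can be sketched as a small state machine. This is a hedged illustration only: the class `AutoPlayer`, its methods, and the `(access_point, end_time)` tuple layout are hypothetical, and the caller is assumed to report the current playback time periodically and to perform the actual seek.

```python
from typing import Iterable, List, Optional, Tuple

Scene = Tuple[int, int]   # (access_point, end_time), hypothetical layout


class AutoPlayer:
    """Decides when to jump so that only conversation scenes are played."""

    def __init__(self, scenes: Iterable[Scene]) -> None:
        self.scenes: List[Scene] = list(scenes)
        self.index = 0                     # scene currently being played

    @property
    def finished(self) -> bool:
        return self.index >= len(self.scenes)

    def start(self) -> Optional[int]:
        """Access point of the first scene, or None if the list is empty."""
        self.index = 0
        return None if self.finished else self.scenes[0][0]

    def on_time(self, playback_time: int) -> Optional[int]:
        """Return the next access point once the current scene has ended."""
        if self.finished:
            return None
        _, end_time = self.scenes[self.index]
        if playback_time < end_time:
            return None                    # still inside the current scene
        self.index += 1
        return None if self.finished else self.scenes[self.index][0]
```

A caller would seek to the address returned by `start()`, then call `on_time()` as playback advances and seek again whenever it returns an address; `finished` becomes true after the last scene's end time has passed.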

[0073] The present invention is applicable not only to information reproduction from optical disc media but also to apparatus that reproduce information from magnetic disks, magnetic tape, and other information recording media.

[0074]

[Effects of the Invention] According to the present invention, in an information reproducing apparatus and a conversation scene detection method for reproducing a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding a caption signal or the like are multiplexed, conversation scenes are detected from the caption signal, so that scenes in which no conversation occurs for a long time can be skipped and the detected conversation scenes can be accessed easily. Further, according to the present invention, any conversation scene in the data stream can be accessed easily based on the user's instruction. Furthermore, according to the present invention, the portions of the data stream without conversation can be skipped automatically so that only the conversation scenes are joined and played back.

[Brief description of drawings]

FIG. 1 is a diagram for explaining the data structure of a DVD-Video disc.

FIG. 2 is a diagram for explaining the data structure of a sub-picture unit.

FIG. 3 is a diagram for explaining how a sub-picture unit is packetized and packed.

FIG. 4 is a diagram for explaining how video, audio, and sub-picture signals are reproduced.

FIG. 5 is a diagram for explaining an example of a data stream in which coded video data, coded audio data, and coded sub-picture data are multiplexed.

FIG. 6 is a block diagram showing the configuration of a digital video reproducing apparatus.

FIG. 7 is an external view of a remote controller serving as user interface means.

FIG. 8 is a flowchart explaining the operation of the conversation detecting means when detecting the next conversation scene.

FIG. 9 is a flowchart explaining the operation of the conversation detecting means when detecting a conversation scene before the current position.

FIG. 10 is a block diagram showing the configuration of a digital video reproducing apparatus according to another embodiment.

FIG. 11 is a diagram showing an example of the structure of the conversation scene information list stored in the storage means.

FIG. 12 is a diagram showing an example of a menu screen displayed on screen.

FIG. 13 is a block diagram showing the configuration of a digital video reproducing apparatus according to the third embodiment.

FIG. 14 is a flowchart explaining the operation of the conversation detecting means according to the third embodiment.

FIG. 15 is a diagram showing another example of the structure of the conversation scene information list stored in the storage means.

FIG. 16 is a diagram showing another example of the menu screen.

FIG. 17 is a diagram showing another example of the remote controller.

[Explanation of symbols]

100, 200, 300 Digital video reproducing apparatus
101 Optical disc
102 Reading means
103 Demultiplexer
104 Video decoding means
105 Sub-picture decoding means
106 Audio decoding means
107 Video mixer
108, 109 Output terminals
110 Conversation detecting means
110a Address holding unit
110b Flag section
111 System control unit
112 User interface
201 Storage means
203 Menu generating means
301 Start time storage unit
303 End time storage unit
305 Temporary storage unit

Continuation of front page. F-terms (reference): 5C052 AA02 AB03 AB04 AC08 CC06 CC11 5C053 FA24 GB06 GB11 GB12 GB37 HA29 JA01 JA16 JA24 KA05 LA04 LA06 5D044 AB05 AB07 BC03 CC04 DE18 DE57 GK12

Claims

[Claims]

1. An information reproducing apparatus for reproducing a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding a caption signal are multiplexed, the apparatus comprising: means for reading the data stream; detecting means for detecting the head position of a caption from the data stream read by the reading means; and control means for controlling, when the detecting means detects the head position of a caption, the reproduction position of the data stream based on reproduction start position information corresponding to that head position.

2. An information reproducing apparatus for reproducing a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding a caption signal are multiplexed, the apparatus comprising: means for reading the data stream; detecting means for detecting the head position of a caption from the data stream read by the reading means; storage means for storing, when the detecting means detects the head position of a caption, reproduction start position information corresponding to that head position; and control means for controlling the reproduction position of the data stream based on the reproduction start position information stored in the storage means.

3. An information reproducing apparatus for reproducing a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding caption signals for a plurality of captions are multiplexed, the apparatus comprising: reading means for reading the data stream; detecting means for detecting the head positions of all captions from the data stream read by the reading means; storage means for storing, each time the detecting means detects the head position of a caption, reproduction start position information corresponding to that head position; and control means for controlling the reproduction position of the data stream based on each item of reproduction start position information stored in the storage means.

4. A conversation scene detection method for an information reproducing apparatus that reproduces a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding a caption signal are multiplexed, the method comprising: reading the data stream; detecting the head position of a caption from the read data stream; and, when the head position of the caption is detected, setting the reproduction position of the data stream based on reproduction start position information corresponding to that head position.

5. A conversation scene detection method for an information reproducing apparatus that reproduces a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding a caption signal are multiplexed, the method comprising: reading the data stream; detecting the head position of a caption from the read data stream; when the head position of the caption is detected, storing reproduction start position information corresponding to that head position in storage means; and controlling the reproduction position of the data stream based on the reproduction start position information stored in the storage means.

6. A conversation scene detection method for an information reproducing apparatus that reproduces a data stream in which video data obtained by encoding a main video signal, audio data obtained by encoding an audio signal, and sub-picture data obtained by encoding caption signals for a plurality of captions are multiplexed, the method comprising: reading the data stream; detecting the head positions of all captions from the read data stream; each time the head position of a caption is detected, storing reproduction start position information corresponding to that head position in storage means; and controlling the reproduction position of the data stream based on each item of reproduction start position information stored in the storage means.
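
The detection and storage steps recited in claims 4 to 6 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this publication. It assumes a simplified stream model in which every pack carries a logical address, a kind, and a flag marking the leading pack of a sub-picture (caption) unit, and it takes the address of the most recent navigation pack as the reproduction start position, as the abstract describes. The names `Pack`, `caption_start_addresses` and `next_conversation_start` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Pack:
    """Hypothetical, simplified model of one pack in the multiplexed stream."""
    address: int             # assumed logical address of the pack
    kind: str                # "nav", "video", "audio" or "subpicture"
    unit_head: bool = False  # True for the leading pack of a caption unit


def caption_start_addresses(packs: Iterable[Pack]) -> List[int]:
    """Return one reproduction start address per detected caption head.

    The address recorded is that of the most recent navigation pack,
    which plays the role of the address holding unit in this sketch.
    """
    starts: List[int] = []
    held_nav: Optional[int] = None
    for pack in packs:
        if pack.kind == "nav":
            held_nav = pack.address
        elif pack.kind == "subpicture" and pack.unit_head and held_nav is not None:
            # Record each navigation pack once, even if it holds several heads.
            if not starts or starts[-1] != held_nav:
                starts.append(held_nav)
    return starts


def next_conversation_start(starts: List[int], current: int) -> Optional[int]:
    """Return the first stored start address after `current`, or None."""
    for address in starts:
        if address > current:
            return address
    return None
```

Under these assumptions, a player would call `caption_start_addresses` once over the packs it has read, keep the result as the stored reproduction start position information, and call `next_conversation_start` with the current position when the user asks to skip to the next conversation scene. A return value of `None` means that no later caption was detected.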