JPH08205044A - Information service controller and information service control method

Info

Publication number: JPH08205044A
Application number: JP7009221A
Authority: JP
Inventors: Kenji Suzuki; 謙二鈴木
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-01-24
Filing date: 1995-01-24
Publication date: 1996-08-09

Abstract

PURPOSE: In a controller having a reproduction function for a recording medium that can record service information comprising a main video image, a sub-video image synchronized with the main video image, and audio signals of plural channels, to recognize the language of the reproduced audio by voice recognition when the service information is reproduced, and to control the reproduction output of the sub-video image according to the recognition result. CONSTITUTION: A voice recognition section (RECO) 17 receives the audio reproduced and output from an audio signal output section 14e, recognizes the language type from, for example, extracted pronunciation features of the human voice, and outputs the type recognition result to a discrimination section (JUDGE) 18. On receiving the language type recognition result from the voice recognition section (RECO) 17, the discrimination section (JUDGE) 18 controls the sub-picture decoder (SP-DEC) 14b of an MPEG decoder section (MPEG2-DEC) 14 to control the reproduction output of a caption.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information provision control device and an information provision control method applied to a playback apparatus that drives a single disc or a disc pack in which various kinds of provided information, such as movie information, educational information, and photo collections consisting of code-compressed video, sub-video, multi-channel audio, and the like, can be recorded on both sides of a disc-shaped rotating recording medium.

[0002]

[Prior Art] In recent years, as part of multimedia technology, various systems have become widespread in which a large-capacity storage medium such as a CD-ROM is connected to a personal computer through an interface and information such as video and audio is read from the storage medium.

[0003] Furthermore, high-density, large-capacity recording media with several times to a dozen or so times the capacity of a CD-ROM have recently been developed. As one example, technology has been established for a large-capacity rotating recording medium that has a capacity close to 10 GB (gigabytes) using double-sided recording and a maximum transfer rate close to 10 Mbps (megabits per second). With this type of recording medium, moving picture information of two hours or more per side, including sub-pictures and multiple audio channels, can be recorded and reproduced under, for example, the MPEG2 system specification. This type of large-capacity rotating recording medium is referred to here as an SD disc.
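The figures in paragraph [0003] can be cross-checked with simple arithmetic. The sketch below is only a plausibility check under stated assumptions that are not in the source: decimal gigabytes, capacity split evenly between the two sides, and one side filled by exactly two hours of material.

```python
# Rough check of the SD-disc figures quoted in [0003] (assumptions noted inline).
TOTAL_CAPACITY_BYTES = 10e9   # "close to 10 GB", assuming decimal gigabytes
SIDES = 2                     # double-sided recording, assumed evenly split
PLAY_TIME_S = 2 * 60 * 60     # two hours per side
PEAK_RATE_BPS = 10e6          # "close to 10 Mbps" maximum transfer rate


def average_bitrate_bps(capacity_bytes: float, sides: int, seconds: float) -> float:
    """Average bit rate that one side can sustain for the given play time."""
    per_side_bits = capacity_bytes / sides * 8
    return per_side_bits / seconds


avg = average_bitrate_bps(TOTAL_CAPACITY_BYTES, SIDES, PLAY_TIME_S)
print(f"average budget: {avg / 1e6:.2f} Mbit/s")
print("fits under peak rate:", avg < PEAK_RATE_BPS)
```

Under these assumptions the average budget is well below the quoted peak rate, which is consistent with variable-rate coding that exceeds the average only for short, demanding passages.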

[0004]

[Problems to Be Solved by the Invention] When realizing a system in which one or more kinds of provided information, consisting for example of a video, a sub-video synchronized with that video, and audio of plural channels, are recorded on such a large-capacity rotating recording medium (SD disc) and any of the provided information is reproduced and output by a playback player, how to make effective use of the recording functions for sub-video, multi-channel audio, and the like is a major issue for the extensibility of system functions.

[0005] The present invention has been made in view of the above circumstances. Its object is to provide an information provision control device and an information provision control method that, in a system having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, make effective use of the recording functions for the sub-video, multi-channel audio, and the like synchronized with the main video, thereby extending system functions and broadening the range of uses.

[0006] Specifically, an object is to provide an information provision control device and an information provision control method having a function that, when provided information recorded on a recording medium is reproduced, recognizes the language of the reproduced audio by voice recognition and reflects the recognition result, as control information, in the reproduction output control of video, audio, and the like.

[0007] A further object is to provide an information provision control device and an information provision control method having a function that, when provided information recorded on a recording medium is reproduced, recognizes the language of the reproduced audio by voice recognition, generates coded character string information, and uses that character string information to create and store index information.

[0008]

[Means for Solving the Problems] The present invention is characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for recognizing the language of the reproduced audio and means for controlling the reproduction output of the sub-video according to the recognition result.

[0009] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for recognizing the language of the reproduced audio and means for controlling the audio that is reproduced and output according to the recognition result.

[0010] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for distinguishing reproduced audio in a specific language from audio in other languages, and means for reproducing and outputting subtitles in the specific language as a sub-video when reproduced audio in another language is identified.

[0011] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for distinguishing reproduced audio in a specific language from audio in other languages, and means for reproducing and outputting a plurality of sub-videos, including subtitles in the specific language, when reproduced audio in another language is identified.

[0012] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for distinguishing reproduced audio in a specific language from audio in other languages, and means for reproducing and outputting audio in the specific language recorded in another audio recording section when reproduced audio in another language is identified.

[0013] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for recognizing a specific sound in the reproduced audio and means for switching the playback scene when the specific sound is recognized.

[0014] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for recognizing a specific sound in the reproduced audio and means for performing multi-scene playback when the specific sound is recognized.

[0015] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, the language of the reproduced audio is recognized and the reproduction output of the sub-video is controlled according to the voice recognition result.

[0016] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, the language of the reproduced audio is recognized and the audio that is reproduced and output is controlled according to the voice recognition result.

[0017] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, reproduced audio in a specific language is distinguished from audio in other languages, and subtitles in the specific language are reproduced and output as a sub-video when reproduced audio in another language is identified.

[0018] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, reproduced audio in a specific language is distinguished from audio in other languages, and a plurality of sub-videos, including subtitles in the specific language, are reproduced and output when reproduced audio in another language is identified.

[0019] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, reproduced audio in a specific language is distinguished from audio in other languages, and audio in the specific language recorded in another audio recording section is reproduced and output when reproduced audio in another language is identified.

[0020] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, a specific sound is recognized in the reproduced audio and the playback scene is switched when the specific sound is recognized.

[0021] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, a specific sound is recognized in the reproduced audio and multi-scene playback is performed when the specific sound is recognized.

[0022] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises means for recognizing the language of the reproduced audio by voice recognition and generating coded character string information, and means for sampling the coded character string information at fixed time intervals and storing it as index information.
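The fixed-time-unit sampling of paragraph [0022] can be pictured as grouping recognized text into time slots. The following is a minimal sketch, not the patent's implementation: the `RecognizedText` record, the `build_time_index` function, and the choice of seconds as the time unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class RecognizedText:
    time_s: float  # playback time at which the text was recognized (assumed unit)
    text: str      # coded character string produced by voice recognition


def build_time_index(results: Iterable[RecognizedText],
                     interval_s: float) -> Dict[int, str]:
    """Group recognized text into fixed-length time slots (slot number -> text)."""
    slots: Dict[int, List[str]] = {}
    for r in sorted(results, key=lambda r: r.time_s):
        slot = int(r.time_s // interval_s)
        slots.setdefault(slot, []).append(r.text)
    return {slot: " ".join(parts) for slot, parts in sorted(slots.items())}


sample = [
    RecognizedText(1.0, "hello"),
    RecognizedText(4.0, "world"),
    RecognizedText(12.5, "again"),
]
time_index = build_time_index(sample, interval_s=10.0)
print(time_index)
```

Keying each slot by an integer keeps the stored index compact; a real device would have to choose the interval to balance index size against search granularity.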

[0023] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises: means for acquiring the frame number of the provided information being reproduced when provided information recorded on the recording medium is reproduced; means for recognizing the language of the reproduced audio by voice recognition and generating coded character string information; means for generating index information by associating the coded character string information with the acquired frame number; and means for storing the generated index information.

[0024] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises: means for capturing the provided information of the frame being reproduced when provided information recorded on the recording medium is reproduced; means for recognizing the language of the reproduced audio by voice recognition and generating coded character string information; means for generating index information by associating the coded character string information with the captured provided information of the corresponding frame; and means for storing the generated index information.

[0025] The present invention is also characterized in that an apparatus having a playback function for a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels comprises: means for acquiring the frame number of the provided information being reproduced and capturing the provided information of that frame when provided information recorded on the recording medium is reproduced; means for recognizing the language of the reproduced audio by voice recognition and generating coded character string information; means for generating index information by associating the coded character string information, the acquired frame number, and the captured provided information with one another; and means for storing the generated index information.
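Paragraphs [0023] to [0025] describe three variants of an index record: text with a frame number, text with captured frame content, and all three together. One way to picture this is a single record type with an optional captured-content field. The sketch below is illustrative only; `IndexEntry`, `IndexTable`, and their fields are hypothetical names, and an in-memory list stands in for whatever storage a real device would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    frame_number: int                   # frame number acquired during playback
    text: str                           # coded character string from recognition
    frame_data: Optional[bytes] = None  # captured content of that frame, if any


class IndexTable:
    """In-memory stand-in for the stored index information."""

    def __init__(self) -> None:
        self._entries: List[IndexEntry] = []

    def add(self, frame_number: int, text: str,
            frame_data: Optional[bytes] = None) -> IndexEntry:
        entry = IndexEntry(frame_number, text, frame_data)
        self._entries.append(entry)
        return entry

    def listing(self) -> List[IndexEntry]:
        """Entries ordered by frame number, e.g. for a list display."""
        return sorted(self._entries, key=lambda e: e.frame_number)


table = IndexTable()
table.add(1200, "Where are you going?")
table.add(300, "Good morning.", frame_data=b"\x00\x01")
for e in table.listing():
    print(e.frame_number, e.text)
```

Making `frame_data` optional lets the same record cover the frame-number-only variant and the combined variant without separate types.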

[0026] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, the language of the reproduced audio is recognized by voice recognition to generate coded character string information, and the character string information is sampled at fixed time intervals and stored as index information.

[0027] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, the frame number of the provided information is acquired, the language of the reproduced audio is recognized by voice recognition to generate coded character string information, and index information is generated by associating the character string information with the frame number and is then stored.

[0028] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, the language of the reproduced audio is recognized by voice recognition to generate coded character string information, the provided information of the corresponding frame is captured, and index information is generated by associating the coded character string information with the provided information of the corresponding frame and is then stored.

[0029] The present invention is also characterized by an information provision control method in which, during playback of a recording medium capable of recording provided information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, the frame number of the provided information is acquired, the language of the reproduced audio is recognized by voice recognition to generate coded character string information, the provided information of the frame is captured, and index information is generated by associating the coded character string information, the acquired frame number, and the captured provided information with one another and is then stored.

[0030]

[Operation] By having a function that, when provided information recorded on the recording medium is reproduced, recognizes the language of the reproduced audio by voice recognition and reflects the recognition result, as control information, in the reproduction output control of the main video, sub-video, audio, and the like, the content of the reproduced information can, for example, be conveyed reliably and accurately, and the conveying function and range of uses can be expanded. In particular, the information-conveying function is improved and the range of uses is broadened when reproducing provided information that involves multiple languages.

[0031] Also, by having a function that, when provided information recorded on the recording medium is reproduced, recognizes the language of the reproduced audio by voice recognition, generates coded character string information, and uses that character string information to create and store index information, index information can be created and stored from the subtitle information of the sub-video. By referring to this index information, tasks such as searching for memorable scenes, scene sequence search, and story summarization in provided information such as movies can be carried out easily, and the subtitle information can be used effectively.
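The scene search mentioned in paragraph [0031] amounts to looking up text in the stored index and returning where it occurs. A minimal sketch follows, assuming (hypothetically) that the index is held as a mapping from frame number to recognized text; the function name `search_index` and the case-insensitive substring match are illustrative choices, not details from the source.

```python
from typing import Dict, List


def search_index(index: Dict[int, str], keyword: str) -> List[int]:
    """Return, in ascending order, the frame numbers whose text contains keyword."""
    needle = keyword.casefold()
    return [frame for frame, text in sorted(index.items())
            if needle in text.casefold()]


stored_index = {
    300: "Good morning.",
    1200: "Where are you going?",
    4500: "Good night.",
}
print(search_index(stored_index, "good"))
```

The returned frame numbers could then be entered through the user interface to jump to the matching scene.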

[0032]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the hardware configuration of the system in the first embodiment of the present invention.

[0033] In FIG. 1, reference numeral 10 denotes a large-capacity rotating recording medium (referred to as an SD disc) on both sides of which (side A 10a and side B 10b) long-duration provided information such as movie information is recorded at high density. Here, the long-duration video information is recorded with variable-rate coding. Examples of this long-duration video information include movie information with subtitles and multi-channel audio, educational information, and photo collections, each containing a main video, a sub-video (sub-picture) synchronized with the main video, and audio of plural channels.

[0034] Reference numeral 11 denotes a CPU that controls the entire system. Here, in accordance with the information provision control program (PCP) stored in the main memory (MEM) 15, it executes the subtitle control processing by voice recognition shown in FIG. 2 (or the index creation processing by voice recognition shown in FIG. 4).

[0035] Reference numeral 12 denotes a media drive controller (MDC) that controls driving of the SD disc 10; under the control of the CPU 11, it drive-controls the media drive unit (MDU) 13. The media drive controller (MDC) 12 is provided with a buffer (BUF) that successively stores a predetermined amount of the data read (picked up) from the media drive unit (MDU) 13 (for example, movie information consisting of a main video, a sub-video synchronized with the main video, and audio of plural channels, whose data transfer rate is made variable by MPEG2 variable-rate coding).

[0036] Reference numeral 13 denotes a media drive unit (MDU) that drives the SD disc 10. Under the control of the media drive controller (MDC) 12, it rotationally drives the SD disc 10 placed on the turntable 13a, reads the information recorded on the SD disc 10 with the pickup 13b, and outputs it as a digital data stream in a predetermined format (an MPEG2 stream) at a high transfer rate to the MPEG decoder section (MPEG2-DEC) 14 via the buffer (BUF) of the media drive controller (MDC) 12.

[0037] Reference numeral 14 denotes an MPEG decoder section (MPEG2-DEC) that, under the control of the CPU 11, receives the MPEG2 stream data read from the media drive unit (MDU) 13 via the buffer (BUF) in the media drive controller (MDC) 12, separates it into the main video, the sub-video (for example, subtitles) synchronized with the main video, and the audio of plural channels, decompresses each of them, synchronizes them, and reproduces and outputs them. It comprises a video decoder (VD-DEC) 14a that decodes the main video; a sub-picture decoder (SP-DEC) 14b that decodes the sub-video; an audio decoder (AU-DEC) 14c that decodes the audio information; a video signal output section (MUX) 14d that combines the sub-video separated and decompressed by the sub-picture decoder (SP-DEC) 14b with the main video separated and decompressed by the video decoder (VD-DEC) 14a and outputs the result as a video signal (for example, an NTSC signal); and an audio signal output control section 14e that outputs the audio information decompressed by the audio decoder (AU-DEC) 14c as an analog audio signal.
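The separation step performed by the MPEG decoder section in paragraph [0037] can be sketched as routing packets of a multiplexed stream to per-stream queues. This is a deliberately simplified model, not MPEG-2 Systems parsing: the `Packet` type, its stream labels, and the integer timestamp are hypothetical stand-ins for the real packet headers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    stream: str     # "video", "subpicture", or "audio" (hypothetical labels)
    channel: int    # e.g. audio channel or subtitle track number
    pts: int        # presentation timestamp in arbitrary units
    payload: bytes


def demultiplex(packets: Iterable[Packet]) -> Dict[Tuple[str, int], List[Packet]]:
    """Split one multiplexed stream into per-(stream, channel) queues in pts order."""
    queues: Dict[Tuple[str, int], List[Packet]] = defaultdict(list)
    for p in packets:
        queues[(p.stream, p.channel)].append(p)
    for q in queues.values():
        q.sort(key=lambda p: p.pts)
    return dict(queues)


mux = [
    Packet("video", 0, 0, b"I-frame"),
    Packet("audio", 1, 0, b"en-audio"),
    Packet("audio", 0, 0, b"ja-audio"),
    Packet("subpicture", 0, 0, b"ja-subtitle"),
    Packet("video", 0, 1, b"P-frame"),
]
streams = demultiplex(mux)
print(sorted(streams))
```

Each queue would then feed its own decoder (video, sub-picture, or audio), with the shared timestamps used to keep the outputs synchronized.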

[0038] Reference numeral 15 denotes a main memory (MEM) provided with a control program storage area, an application program storage area, a work area, and the like. Here it stores the information provision control program (PCP), which has the subtitle control processing routine by voice recognition shown in FIG. 2, the index creation processing routine by voice recognition shown in FIG. 4, and an index screen creation processing routine for displaying the stored index information in list form. In addition, a battery-backed non-volatile storage area is provided with an index table storage section (INDEX-TBL) 15a that stores and manages the index information created by executing the index creation processing routine; the index information is written to it under the control of the CPU 11.

[0039] Reference numeral 16 denotes a user interface section (User I/F) having a key operation section (KB) and a display section. Here it is used for entering frame numbers, instructing display output of the index table, and so on.

[0040] Reference numeral 17 denotes a voice recognition section (RECO) that recognizes the language of the reproduced audio and generates coded character string information. It receives the audio information reproduced and output from the audio signal output section 14e of the MPEG decoder section (MPEG2-DEC) 14 and recognizes the language type (for example, Japanese, English, or French) from, for example, extracted pronunciation features of the human voice. When the language is the specific language, it recognizes the words of conversation and the like by, for example, extracting characteristic sounds of that language, and generates character string codes that follow the order in which the words are pronounced. The first embodiment uses the language type recognition result, and the second embodiment uses the character string codes that follow the pronunciation order of the words.
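The two-stage behaviour of the voice recognition section in paragraph [0040], language type first and word recognition only for the specific language, can be sketched as below. The recognizers themselves are outside the scope of this sketch: `identify_language` and `transcribe` are hypothetical callables supplied by the caller, and the stub implementations exist only to make the example runnable.

```python
from typing import Callable, Optional, Tuple


def recognize(audio: bytes,
              identify_language: Callable[[bytes], str],
              transcribe: Callable[[bytes], str],
              target_language: str = "ja") -> Tuple[str, Optional[str]]:
    """Stage 1: classify the language. Stage 2: transcribe only the target language."""
    language = identify_language(audio)
    text = transcribe(audio) if language == target_language else None
    return language, text


# Stub recognizers (assumptions for illustration only).
def stub_identify(audio: bytes) -> str:
    return "ja" if audio.startswith(b"JA:") else "en"


def stub_transcribe(audio: bytes) -> str:
    return audio[3:].decode("utf-8")


print(recognize(b"JA:ohayou", stub_identify, stub_transcribe))
print(recognize(b"EN:hello", stub_identify, stub_transcribe))
```

Returning both values from one call mirrors the text: the language type alone is enough for subtitle control, while the transcribed string is what index creation consumes.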

[0041] Reference numeral 18 denotes a discrimination section (JUDGE) that receives the language type recognition result obtained from the voice recognition section (RECO) 17 and, according to its content, controls the sub-picture decoder (SP-DEC) 14b of the MPEG decoder section (MPEG2-DEC) 14. Here, it detects from the type recognition result output by the voice recognition section (RECO) 17 that speech in a language other than Japanese has been reproduced and, on detecting this, triggers the sub-picture decoder (SP-DEC) 14b so that the sub-video carrying Japanese subtitles is reproduced and output.
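The decision made by the discrimination section in paragraph [0041] reduces to a comparison between the recognized language and the viewer's language. A minimal sketch follows; the `SubPictureDecoder` class and `judge` function are hypothetical, and turning subtitles off again when Japanese speech is detected is an assumption of this sketch, since the text only describes the trigger that turns them on.

```python
from typing import Optional


class SubPictureDecoder:
    """Hypothetical stand-in for the sub-picture decoder's subtitle switch."""

    def __init__(self) -> None:
        self.subtitle_language: Optional[str] = None  # None means subtitles off

    def show_subtitles(self, language: str) -> None:
        self.subtitle_language = language

    def hide_subtitles(self) -> None:
        self.subtitle_language = None


def judge(recognized_language: Optional[str],
          decoder: SubPictureDecoder,
          viewer_language: str = "ja") -> None:
    """Enable viewer-language subtitles when speech in another language is detected."""
    if recognized_language is not None and recognized_language != viewer_language:
        decoder.show_subtitles(viewer_language)
    else:
        # Assumption: subtitles are not needed for the viewer's own language
        # or when no speech was recognized.
        decoder.hide_subtitles()


sp_dec = SubPictureDecoder()
judge("en", sp_dec)
print(sp_dec.subtitle_language)
judge("ja", sp_dec)
print(sp_dec.subtitle_language)
```

Keeping the decision in a separate function from the decoder reflects the block diagram, where the discrimination section only issues a trigger and the decoder does the actual subtitle output.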

[0042] Reference numeral 21 denotes a display device (DISP) that receives the NTSC signal output from the MPEG decoder section (MPEG2-DEC) 14 and displays the reproduced video. Reference numeral 22 denotes an audio output section that likewise receives the analog audio signal output from the MPEG decoder section (MPEG2-DEC) 14 and outputs audio (audible sound) synchronized with the reproduced video.

[0043] FIG. 2 is a flowchart showing the subtitle control processing routine based on voice recognition in the first embodiment of the present invention. FIG. 3 is a block diagram showing the hardware configuration of the system in the second embodiment of the present invention. Parts identical to those of the first embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted.

[0044] The second embodiment is characterized in that an index is created and stored using the character string code obtained from the voice recognition unit (RECO) 17, and in that the index can be displayed.

[0045] In FIG. 3, reference numeral 19 denotes an I/O register (FRAME) that temporarily latches the frame number currently being reproduced, under the control of the CPU 11. Reference numeral 20 denotes an index creation unit (INDEX-GEN) that, based on the character string information generated by the voice recognition unit (RECO) 17 and the frame number latched in the I/O register (FRAME) 19, creates index information in a fixed form by associating the subtitle character code string with the frame number. The index information created by the index creation unit (INDEX-GEN) 20 is written to the index table storage unit (INDEX-TBL) 15a under the control of the CPU 11.

[0046] Reference numeral 23 denotes a display memory (VM) into which screen data for display is expanded under the control of the CPU 11. Here, index information based on the contents of the index table storage unit (INDEX-TBL) 15a is expanded in a fixed list format.

[0047] Reference numeral 24 denotes a display control unit (DISP-CONT) that, under the control of the CPU 11, selects either the NTSC signal output from the MPEG decoder unit (MPEG2-DEC) 14 or the display data in the display memory (VM) 23 and controls its display on the display device (DISP) 21.

[0048] FIG. 4 is a flowchart showing the index creation processing routine based on voice recognition in the second embodiment of the present invention. FIG. 5 shows an example of the recording format of movie information recorded on the SD disc 10, in which the data transfer rate is made variable by MPEG2 variable-rate coding. For moving image data, that is, MPEG2 video, the amount of information recorded or reproduced per unit time can therefore be varied, so raising the transfer rate for scenes with more motion enables high-quality moving image reproduction.

[0049] As shown in FIG. 5, one piece of movie information consists of a file management information section and a data section, and the data section contains a large number of data blocks (blocks #0 to #n). Each data block begins with a DSI (Disk Search Information) pack, and the span from one DSI pack to the next DSI pack forms one data block. The storage position of each DSI pack is managed by the disc search map information in the file management information section.

[0050] One data block holds the information that must be reproduced in a certain period (for example, about 1/2 second) and contains a large number of video packs (VIDEO packs), sub-picture packs (S.P packs), and audio packs (AUDIO packs). The data size of each pack is fixed, but the number of packs that one data block can contain is variable. A data block corresponding to a scene with more motion therefore contains more video packs.
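
As a rough worked example of the relation described above, the following sketch computes the average bit rate implied by one data block. The 2048-byte pack size and the function name are assumptions made for illustration; the specification states only that the pack size is fixed and that a block covers about 1/2 second.

```python
def block_bit_rate(pack_count: int,
                   pack_size_bytes: int = 2048,   # assumed; not given in the specification
                   block_seconds: float = 0.5) -> float:
    """Return the average bit rate (bits per second) needed for one data block."""
    return pack_count * pack_size_bytes * 8 / block_seconds


# A block for a scene with more motion holds more packs, so its rate is higher.
quiet_scene = block_bit_rate(pack_count=100)
busy_scene = block_bit_rate(pack_count=250)
print(quiet_scene, busy_scene)
```

Because the block duration is constant, the bit rate in this sketch scales linearly with the number of packs in the block.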

[0051] The video pack, sub-picture pack, and audio pack each consist of a header section and a packet section (video packet, sub-picture packet, or audio packet). The packet section is the encoded data itself.

[0052] The header section consists of a pack header, a system header, and a packet header. The packet header holds a stream ID indicating whether the corresponding packet is a video packet, a sub-picture packet, or an audio packet.
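
The data block and stream ID structure described above can be sketched as follows. This is an illustration under stated assumptions: the enum values are hypothetical, since the specification gives no concrete stream ID codes, and treating the DSI pack as one more pack type in the same sequence is also an assumption.

```python
from enum import Enum
from typing import Dict, Iterable, List


class PackType(Enum):
    """Hypothetical pack kinds; real stream ID values are not given in the text."""
    DSI = "dsi"
    VIDEO = "video"
    SUB_PICTURE = "sub_picture"
    AUDIO = "audio"


def split_into_blocks(packs: Iterable[PackType]) -> List[List[PackType]]:
    """Group packs into data blocks; each block runs from one DSI pack to the next."""
    blocks: List[List[PackType]] = []
    for pack in packs:
        if pack is PackType.DSI:
            blocks.append([pack])        # a DSI pack opens a new data block
        elif blocks:
            blocks[-1].append(pack)      # packs before the first DSI pack are skipped
    return blocks


def count_packs(block: Iterable[PackType]) -> Dict[PackType, int]:
    """Count the packs of each kind in one data block."""
    counts = {kind: 0 for kind in PackType}
    for pack in block:
        counts[pack] += 1
    return counts


stream = [PackType.DSI, PackType.VIDEO, PackType.VIDEO, PackType.AUDIO,
          PackType.DSI, PackType.VIDEO, PackType.SUB_PICTURE]
blocks = split_into_blocks(stream)
print(len(blocks), count_packs(blocks[0])[PackType.VIDEO])
```

A decoder built along these lines would route each pack to the video, sub-picture, or audio decoder according to its kind.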

[0053] The operation of the embodiments of the present invention will now be described with reference to the drawings. First, the operation of the first embodiment is described with reference to FIGS. 1 and 2. An SD disc 10, on which plural kinds of provided information (for example, movie information with subtitles) are recorded in the format shown in FIG. 5, is set on the turntable 13a of the media drive unit (MDU) 13. When movie information is selected and reproduction is started by key operation of the user interface unit (User I/F) 16, reproduction of the selected movie information from the SD disc 10 starts under the control of the CPU 11. During this reproduction, audio information is decoded and output from the audio signal output unit 14e of the MPEG decoder unit (MPEG2-DEC) 14 and reproduced by the audio output unit 22 (step S1 in FIG. 2).

[0054] At this time, under the control of the CPU 11, the MPEG2 stream data sequence for selection menu display read from the SD disc 10 is input to the MPEG decoder unit (MPEG2-DEC) 14 via the buffer (BUF) in the media drive controller (MDC) 12. There it is separated into the main video, the sub-picture (for example, subtitles) synchronized with the main video, and audio of a plurality of channels; each is decompressed, and they are output as a synchronized NTSC signal and audio signal for menu display. That is, the main video is decoded by the video decoder (VD-DEC) 14a, the sub-picture (subtitles) by the sub-picture decoder (SP-DEC) 14b, and the audio information by the audio decoder (AU-DEC) 14c. The decoded outputs of the video decoder (VD-DEC) 14a and the sub-picture decoder (SP-DEC) 14b are each supplied to the video signal output unit (MUX) 14d, where the sub-picture separated and decompressed by the sub-picture decoder (SP-DEC) 14b is combined with the main video separated and decompressed by the video decoder (VD-DEC) 14a, and the result is output as an analog video signal (NTSC signal) for display. The audio information decompressed by the audio decoder (AU-DEC) 14c is supplied to the audio signal output unit 14e and output as an analog audio signal. In this first embodiment, however, the reproduction output of the subtitle information of the sub-picture decoded by the sub-picture decoder (SP-DEC) 14b is controlled according to the recognition result of the voice recognition unit (RECO) 17.

[0055] The NTSC signal output from the MPEG decoder unit (MPEG2-DEC) 14 is supplied to the display device (DISP) 21 and the analog audio signal is supplied to the audio output unit 22, so that the movie information is presented with audio and subtitles.

[0056] At this time, the audio information decoded by the audio decoder (AU-DEC) 14c is supplied to the voice recognition unit (RECO) 17 via the audio signal output unit 14e.

[0057] The voice recognition unit (RECO) 17 recognizes the language type and the words from the audio received from the audio decoder (AU-DEC) 14c via the audio signal output unit 14e, and generates language type determination output information and coded character string information (step S2 in FIG. 2). That is, the voice recognition unit (RECO) 17 receives the audio information reproduced from the audio signal output unit 14e of the MPEG decoder unit (MPEG2-DEC) 14 and recognizes the language type (for example, Japanese, English, or French) by extracting pronunciation features of the human voice. When the language is a specific language, it recognizes spoken words such as conversation by extracting the characteristic sounds of that language and generates a character string code that follows the pronunciation order of the words. The first embodiment uses the language type recognition result (the second embodiment uses the character string code that follows the pronunciation order of the words).

[0058] The language type recognition result obtained from the voice recognition unit (RECO) 17 is supplied to the determination unit (JUDGE) 18. On receiving it, the determination unit (JUDGE) 18 controls the sub-picture decoder (SP-DEC) 14b of the MPEG decoder unit (MPEG2-DEC) 14 according to its content, thereby controlling the reproduction output of the subtitles (steps S3 and S4 in FIG. 2).

[0059] That is, when the determination unit (JUDGE) 18 detects from the type recognition result output by the voice recognition unit (RECO) 17 that words in a language other than Japanese have been reproduced, it triggers the sub-picture decoder (SP-DEC) 14b to control the reproduction output of the Japanese subtitle sub-picture.
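
As an illustration only, the determination of steps S3 and S4 can be sketched as follows. The class and function names, the language codes, and the choice to switch the subtitles off again when Japanese is recognized are assumptions made for this sketch; they do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class SubPictureDecoder:
    """Hypothetical stand-in for the sub-picture decoder (SP-DEC) 14b."""
    subtitles_on: bool = False

    def trigger(self, enable: bool) -> None:
        # Enable or disable output of the Japanese subtitle sub-picture.
        self.subtitles_on = enable


def judge(recognized_language: str,
          decoder: SubPictureDecoder,
          native_language: str = "ja") -> bool:
    """Stand-in for JUDGE 18: show subtitles when the speech is not in the native language."""
    show = recognized_language != native_language
    decoder.trigger(show)
    return show


decoder = SubPictureDecoder()
judge("en", decoder)   # English dialogue detected: subtitles are switched on
print(decoder.subtitles_on)
```

Keeping the decision in a separate function mirrors the separation between the recognition unit and the determination unit in FIG. 1, so the recognizer can be replaced without touching the subtitle control.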

[0060] As a result, when words in a language other than Japanese, such as English conversation, occur in the reproduced movie information, the foreign-language speech (words) is presented with Japanese subtitles. Although the configuration described here selectively controls the reproduction output of the subtitle for the sub-audio according to the recognition result of the voice recognition unit (RECO) 17, other configurations are possible. For example, the subtitle for one sub-audio may be reproduced at all times while the subtitle for another sub-audio is selectively reproduced according to the recognition result of the voice recognition unit (RECO) 17, so that two kinds of subtitles, for example in different colors, are superimposed; or the subtitle may be replaced. Alternatively, Japanese audio may be synthesized and output to match the Japanese subtitles according to the recognition result of the voice recognition unit (RECO) 17. A configuration that switches the reproduced scene when a specific voice is recognized, or one that performs multi-scene reproduction, can also be realized according to the recognition result of the voice recognition unit (RECO) 17.

[0061] Next, the operation of the second embodiment of the present invention is described with reference to FIGS. 3 and 4. The operation of the second embodiment differs from that of the first embodiment chiefly in that an index of the provided information (for example, movie information) is created from the recognized character string information, according to the recognition result of the voice recognition unit (RECO) 17.

[0062] That is, when the voice recognition unit (RECO) 17 receives the audio information reproduced from the audio signal output unit 14e of the MPEG decoder unit (MPEG2-DEC) 14, it recognizes the language type (for example, Japanese, English, or French) by extracting pronunciation features of the human voice contained in that audio information. When the language is a specific language, it recognizes spoken words such as conversation by extracting the characteristic sounds of that language and generates a character string code that follows the pronunciation order of the words (steps S5 and S6 in FIG. 4).

[0063] Whereas the first embodiment uses the language type recognition result, the second embodiment uses the character string code that follows the pronunciation order of the words. When the voice recognition unit (RECO) 17 generates this character string code (character string information), the character string information is supplied to the index creation unit (INDEX-GEN) 20.

[0064] Meanwhile, each time the frame is updated during reproduction of the movie information, the CPU 11 latches the frame number information in the I/O register (FRAME) 19. The frame number information temporarily stored in the I/O register (FRAME) 19 is input to the index creation unit (INDEX-GEN) 20.

[0065] On receiving the coded character string information, which follows the pronunciation order of the words, from the voice recognition unit (RECO) 17, the index creation unit (INDEX-GEN) 20 uses that character string information and the frame number latched in the I/O register (FRAME) 19 to create index information in a fixed form, associating a fixed-length character string with the frame number, and notifies the CPU 11 (step S7 in FIG. 4).

[0066] The CPU 11 writes the index information created by the index creation unit (INDEX-GEN) 20 to the index table storage unit (INDEX-TBL) 15a in the main memory (MEM) 15 (step S8 in FIG. 4).

[0067] In this way, an index table that associates subtitle character code strings with frame numbers in a predetermined format is created in the index table storage unit (INDEX-TBL) 15a in the main memory (MEM) 15.
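
Steps S7 and S8 can be sketched as follows. This is a minimal illustration, not the specified implementation: the class names, the 16-character width, and the truncate-or-pad rule for producing a fixed-length string are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class IndexEntry:
    frame_number: int   # value latched in the I/O register (FRAME) 19
    text: str           # fixed-length character string derived from RECO 17 output


@dataclass
class IndexTable:
    """Hypothetical stand-in for the index table storage unit (INDEX-TBL) 15a."""
    entries: List[IndexEntry] = field(default_factory=list)

    def write(self, entry: IndexEntry) -> None:
        self.entries.append(entry)


def make_index_entry(text: str, frame_number: int, width: int = 16) -> IndexEntry:
    """Step S7: pair a fixed-length string with the latched frame number."""
    fixed = text[:width].ljust(width)   # truncate or pad to the assumed width
    return IndexEntry(frame_number=frame_number, text=fixed)


table = IndexTable()
table.write(make_index_entry("Where are you going?", frame_number=1200))  # step S8
print(table.entries[0])
```

Storing the frame number with each string is what later allows a selected index line to be mapped back to a playback position.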

[0068] In response to an instruction given by key input on the user interface unit (User I/F) 16, the index table stored in the index table storage unit (INDEX-TBL) 15a can be expanded as display data in the display memory (VM) 23 under the control of the CPU 11 and displayed on the display device (DISP) 21 via the display control unit (DISP-CONT) 24.
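
The expansion of the index table into a fixed list format might look like the following sketch. The column layout and the function name are assumptions; the specification says only that the index information is expanded in a fixed list format.

```python
from typing import Iterable, List, Tuple


def render_index_list(entries: Iterable[Tuple[int, str]]) -> List[str]:
    """Format (frame_number, text) pairs as text lines with a fixed column layout."""
    lines = [f"{'FRAME':>8}  TEXT"]                  # header line
    for frame_number, text in entries:
        lines.append(f"{frame_number:>8}  {text}")   # frame number right-aligned in 8 columns
    return lines


for line in render_index_list([(1200, "Where are you go"), (1530, "To the station.")]):
    print(line)
```

In the apparatus these lines would be written to the display memory (VM) 23 rather than printed.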

[0069] As described above, when the provided information (movie information) recorded on the SD disc 10 is reproduced, the apparatus has a function of expanding the subtitle information of the sub-picture in memory, recognizing the characters and coding them, and creating index information from the character string information obtained by coding the subtitles. By referring to this index information, tasks such as searching for memorable scenes, scene sequence search, and summarizing the story of provided information such as a movie can be performed easily, and the subtitle information can be used effectively.

[0070] In the embodiment described above, the index table is created by associating frame numbers when index information is generated from the coded character string information that follows the pronunciation order of the words from the voice recognition unit (RECO) 17. The invention is not limited to this. The following can also be readily realized from the above embodiment: means for creating the index table by collecting index information at fixed time units; means for collecting the data of the relevant frame from the buffer (BUF) in the media drive controller (MDC) 12 and creating the index table by associating that frame data; or means for creating the index table by associating the coded character string information obtained by the voice recognition unit (RECO) 17, the frame number acquired from the I/O register (FRAME) 19, and the provided information (one scene of the movie) collected from the buffer (BUF) in the media drive controller (MDC) 12. Also, although this embodiment shows a configuration that displays the index information, the output is not limited to this and may be, for example, a printout or a display in an opened window.
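
The first variation above, collecting index information at fixed time units, can be sketched as follows. The event format (timestamp in seconds plus recognized text), the 10-second unit, and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def index_by_time_unit(events: Iterable[Tuple[float, str]],
                       unit_seconds: float = 10.0) -> Dict[int, str]:
    """Group recognized strings into buckets of unit_seconds, keyed by bucket number."""
    buckets: Dict[int, List[str]] = {}
    for timestamp, text in events:
        key = int(timestamp // unit_seconds)      # which time unit the event falls in
        buckets.setdefault(key, []).append(text)
    # Join the strings of each unit in arrival order, with units sorted by time.
    return {key: " ".join(texts) for key, texts in sorted(buckets.items())}


events = [(1.0, "hello"), (4.5, "there"), (12.0, "goodbye")]
print(index_by_time_unit(events))
```

Compared with per-frame indexing, this trades positional precision for a smaller table.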

[0071]

[Effects of the Invention] As described in detail above, according to the present invention, when provided information recorded on a recording medium is reproduced, the language of the reproduced audio is recognized and the recognition result is reflected, as control information, in the reproduction output control of the main video, sub-picture, audio, and the like. This allows, for example, the content of the reproduced provided information to be conveyed reliably and accurately and extends the conveying function and its applications. In particular, it improves the information conveying function and broadens the applications when reproducing provided information that handles a plurality of languages.

[0072] Further, when provided information recorded on a recording medium is reproduced, the language of the reproduced audio is recognized, coded character string information is generated, and index information is created from that character string information and stored, so that index information can be created and stored from the subtitle information of the sub-picture. By referring to this index information, tasks such as searching for memorable scenes, scene sequence search, and summarizing the story of provided information such as a movie can be performed easily, and the subtitle information can be used effectively.

[Brief description of drawings]

FIG. 1 is a block diagram showing the hardware configuration of the system in the first embodiment of the present invention.

FIG. 2 is a flowchart showing the subtitle control processing routine based on voice recognition in the first embodiment of the present invention.

FIG. 3 is a block diagram showing the hardware configuration of the system in the second embodiment of the present invention.

FIG. 4 is a flowchart showing the index creation processing routine based on voice recognition in the second embodiment of the present invention.

FIG. 5 is a diagram showing an example of the recording format of the SD disc 10 in each of the above embodiments.

[Explanation of symbols]

10 ... SD disc, 10a, 10b ... recording surface, 11 ... CPU, 12 ... media drive controller (MDC), 13 ... media drive unit (MDU), 13a ... turntable, 13b ... pickup, 14 ... MPEG decoder unit (MPEG2-DEC), 14a ... video decoder (VD-DEC), 14b ... sub-picture decoder (SP-DEC), 14c ... audio decoder (AU-DEC), 14d ... video signal output unit (MUX), 14e ... audio signal output unit, 15 ... main memory (MEM), 16 ... user interface unit (User I/F), 17 ... voice recognition unit (RECO), 18 ... determination unit (JUDGE), 19 ... I/O register (FRAME), 20 ... index creation unit (INDEX-GEN), 21 ... display device (DISP), 22 ... audio output unit (SP), 23 ... display memory (VM), 24 ... display control unit (DISP-CONT).

Continuation of the front page: (51) Int. Cl.⁶, identification code, office reference number, FI, technical indication location: H04N 5/93

Claims

[Claims]

1. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for recognizing the language of the reproduced audio; and means for controlling the reproduction output of the sub-picture according to the recognition result.

2. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for recognizing the language of the reproduced audio; and means for controlling the reproduced audio according to the recognition result.

3. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for discriminating between reproduced audio in a specific language and audio in another language; and means for reproducing subtitles in the specific language by the sub-picture when reproduced audio in another language is identified.

4. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for discriminating between reproduced audio in a specific language and audio in another language; and means for reproducing a plurality of sub-pictures including subtitles in the specific language when reproduced audio in another language is identified.

5. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for discriminating between reproduced audio in a specific language and audio in another language; and means for reproducing audio in the specific language recorded in another audio recording section when reproduced audio in another language is identified.

6. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for recognizing a specific voice from the reproduced audio; and means for switching the reproduced scene when the specific voice is recognized.

7. An information provision control device for an apparatus having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-picture synchronized with the main video, and audio of a plurality of channels, the device comprising: means for recognizing a specific voice from the reproduced audio; and means for performing multi-scene reproduction when the specific voice is recognized.

8. The language of the reproduced and output audio is recognized when the recording medium capable of recording the main video and the sub-video synchronized with the same video and the information provided by a plurality of channels of audio is recognized, and the audio recognition result is obtained. An information provision control method characterized by controlling the reproduction output of the sub-picture according to the above.

9. A speech recognition result obtained by recognizing a language of a reproduced and outputted sound when reproducing a recording medium capable of recording provided information by a sub-picture synchronized with the main picture and the same picture and a plurality of channels of sound. A method for controlling information provision, characterized in that the voice reproduced and output according to the method is controlled.

10. When reproducing a recording medium capable of recording the main image, a sub-image synchronized with the same image, and information provided by a plurality of channels of audio, the audio of a specific language reproduced and output and the audio of another language are output. A method for controlling information provision, characterized in that a subtitle of a specific language is reproduced and output by a sub-picture when the output is identified and the reproduction output sound of another language is identified.

11. When reproducing a main image, a sub-image synchronized with the main image, and a recording medium capable of recording provided information by a plurality of channels of audio, the audio of a specific language reproduced and output and the audio of another language are output. A method for controlling provision of information, characterized in that a plurality of sub-pictures including subtitles of a specific language are reproduced and output when they are identified and reproduction output sound of another language is identified.

12. When reproducing a main image, a sub-image synchronized with the same image, and a recording medium capable of recording provided information by a plurality of channels of audio, the audio of a specific language reproduced and output and the audio of another language are output. A method for controlling provision of information, characterized in that when a discriminating and reproducing output voice of another language is discriminated, a voice of a specific language recorded in another voice recording section is reproduced and outputted.

13. When reproducing a recording medium capable of recording information provided by a sub-video synchronized with the same video as the main video and audio of a plurality of channels, a specific audio is recognized from the reproduced and output audio, and a specific audio is recognized. An information provision control method, characterized in that a reproduction scene is switched when a voice is recognized.

14. An information provision control method characterized in that, when reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, a specific voice is recognized from the reproduced and output audio, and multi-scene reproduction is performed when the specific voice is recognized.
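
As a rough illustration of the scene switching in claim 13, the sketch below advances to the next scene each time a trigger word appears in the recognizer output. The function name `scenes_after_recognition`, the representation of recognizer output as a sequence of words, and the wrap-around to the first scene are assumptions made for this example only.

```python
from typing import Iterable, List


def scenes_after_recognition(
    recognized_words: Iterable[str],
    trigger: str,
    scenes: List[str],
) -> List[str]:
    """Return the scene being reproduced after each recognized word.

    Assumptions: `scenes` is non-empty, and switching past the last scene
    wraps around to the first one.
    """
    current = 0
    shown = []
    for word in recognized_words:
        # Switch the reproduction scene when the specific voice is recognized.
        if word == trigger:
            current = (current + 1) % len(scenes)
        shown.append(scenes[current])
    return shown
```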

15. An information provision control device having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the device comprising: means for recognizing the language of the reproduced and output audio and generating coded character string information; and means for collecting the coded character string information at fixed time units and storing it as index information.

16. An information provision control device having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the device comprising: means for acquiring, when the provided information recorded on the recording medium is reproduced, the frame number of the provided information being reproduced; means for recognizing the language of the reproduced and output audio and generating coded character string information; means for generating index information by associating the coded character string information with the acquired frame number; and means for storing the generated index information.

17. An information provision control device having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the device comprising: means for collecting, when the provided information recorded on the recording medium is reproduced, the provided information of the frame being reproduced; means for recognizing the language of the reproduced and output audio and generating coded character string information; means for generating index information by associating the coded character string information with the provided information of the corresponding frame; and means for storing the generated index information.

18. An information provision control device having a function of reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the device comprising: means for acquiring, when the provided information recorded on the recording medium is reproduced, the frame number of the provided information being reproduced and collecting the provided information of that frame; means for recognizing the language of the reproduced and output audio and generating coded character string information; means for generating index information by associating the coded character string information with the acquired frame number and the collected provided information; and means for storing the generated index information.
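
The association described in claims 16 to 18 can be pictured as building one record per recognized utterance. The following is a sketch under stated assumptions, not the claimed structure: `IndexEntry`, `build_index`, and the use of raw bytes for the collected provided information are hypothetical, and entries with empty text are skipped purely as a convenience of this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: frame number, recognized text, collected data."""
    frame_number: int
    text: str
    snapshot: bytes


def build_index(recognized: Iterable[Tuple[int, str, bytes]]) -> List[IndexEntry]:
    """Associate coded character strings with frame numbers and collected data.

    Each input tuple is (frame number, recognized text, collected frame data).
    Assumption: frames with no recognized text produce no index entry.
    """
    index = []
    for frame_number, text, snapshot in recognized:
        if text:
            index.append(IndexEntry(frame_number, text, snapshot))
    return index
```

Dropping `snapshot` from the record gives the frame-number-only association of claim 16, and dropping `frame_number` gives the association with collected information of claim 17.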

19. The information provision control device according to claim 15, 16, 17 or 18, further comprising means for visually outputting the stored index information.

20. The information provision control device according to claim 17, wherein at least one type of provided information among the main video, the sub-video, and the audio is collected.

21. The information provision control device according to claim 17, wherein at least one type of provided information among the main video, the sub-video, and the audio is collected over a plurality of frames.

22. The information provision control device according to claim 17, further comprising means for reproducing a still image by referring to index information.

23. The information provision control device according to claim 17, further comprising means for reproducing a moving image by referring to index information.

24. The information provision control device according to claim 17, further comprising means for reproducing audio by referring to index information.
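
Claims 22 to 24 reproduce a still image, a moving image, or audio by referring to index information. One way such a lookup might work is a keyword search over the stored text that returns the frame numbers to reproduce from. This is only an illustrative sketch: the function name `find_frames`, the dictionary layout, and substring matching are assumptions, not details taken from the claims.

```python
from typing import Dict, List


def find_frames(index: Dict[int, str], keyword: str) -> List[int]:
    """Return, in ascending order, the frame numbers whose text contains the keyword.

    Assumption: the index maps a frame number to its recognized character string.
    """
    return sorted(frame for frame, text in index.items() if keyword in text)
```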

25. An information provision control method characterized in that, when reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the language of the reproduced and output audio is recognized to generate coded character string information, and the character string information is collected at fixed time intervals and stored as index information.

26. An information provision control method characterized in that, when reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the frame number of the provided information is acquired, the language of the reproduced and output audio is recognized to generate coded character string information, and index information is generated by associating the character string information with the frame number and is stored.

27. An information provision control method characterized in that, when reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the language of the reproduced and output audio is recognized to generate coded character string information, the provided information of the corresponding frame is collected, and index information is generated by associating the coded character string information with the provided information of the corresponding frame and is stored.

28. An information provision control method characterized in that, when reproducing a recording medium capable of recording provided information comprising a main video, a sub-video synchronized with the main video, and audio of a plurality of channels, the frame number of the provided information is acquired, the language of the reproduced and output audio is recognized to generate coded character string information, the provided information of the frame is collected, and index information is generated by associating the coded character string information with the acquired frame number and the collected provided information and is stored.