JPH10149193A

JPH10149193A - Device and method for processing information

Info

Publication number: JPH10149193A
Application number: JP8310246A
Authority: JP
Inventors: Nobuyuki Sadanaka; 信行定仲
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1996-11-21
Filing date: 1996-11-21
Publication date: 1998-06-02

Abstract

PROBLEM TO BE SOLVED: To accurately generate text data corresponding to dialogue from audio signals. SOLUTION: A speech recognition unit 120 performs speech recognition on the center-channel audio signal, which carries the dialogue, among the 5.1-channel audio data reproduced from a DVD 111, and generates text data. An address generation unit 121 generates an address corresponding to the text data. Bitmap data for the text is then output from a bitmap data ROM 122 and stored in a frame memory 114 as subtitle characters superimposed on the image of the video data output from an MPEG2 decoder 113. The video data written in the frame memory 114 is encoded by a video encoder 115, converted into an NTSC video signal, and output to a display 133 for display.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information processing apparatus and method. More particularly, it relates to an information processing apparatus and method capable of generating text data from an audio signal corresponding to a video signal.

[0002]

[Description of the Related Art] The DVD (Digital Versatile Disc) has recently been developed and is becoming widespread. In addition to video data and the corresponding audio data, a DVD can record sub-picture data. If subtitle data is recorded as the sub-picture data, subtitles can be superimposed on the original image and displayed as needed.

[0003]

[Problem to be Solved by the Invention] In many cases, however, the subtitles are not in the spoken language. For example, when the dialogue is in English, the subtitles are generally in another language, such as Japanese or French, for viewers who do not understand English. As a result, hearing-impaired viewers who can read English but have difficulty hearing the audio cannot enjoy the program.

[0004] In the United States, displaying English subtitles for the hearing impaired is mandatory. However, adding English to the sub-picture data reduces the room left for other languages in the sub-picture by the same amount. This makes it difficult to distribute the same DVD in many countries.

[0005] One conceivable approach, therefore, is to perform speech recognition on the audio signal accompanying a video signal, generate text data, and display it as subtitles. In general, however, the audio signal mixes music, sound effects, and the like with the dialogue. It is therefore difficult to recognize only the dialogue correctly from this audio signal.

[0006] The present invention has been made in view of this situation. It makes it possible to generate text data corresponding to dialogue simply and reliably.

[0007]

[Means for Solving the Problem] The information processing apparatus according to claim 1 comprises three means:
- separating means for separating a dialogue signal from a multiplexed signal in which a video signal and a dialogue signal corresponding to the video signal are multiplexed;
- speech recognition means for performing speech recognition on the separated dialogue signal; and
- generating means for generating text data corresponding to the speech recognition result.

[0008] The information processing method according to claim 6 comprises three steps:
- a separating step of separating a dialogue signal from a multiplexed signal in which a video signal and a dialogue signal corresponding to the video signal are multiplexed;
- a speech recognition step of performing speech recognition on the separated dialogue signal; and
- a generating step of generating text data corresponding to the speech recognition result.

[0009] In the apparatus according to claim 1 and the method according to claim 6, a dialogue signal is first separated from a multiplexed signal in which a video signal and the corresponding dialogue signal are multiplexed. The separated dialogue signal then undergoes speech recognition, and text data is generated from the recognition result. Because recognition runs on the isolated dialogue rather than the full audio mix, it can be accurate, and accurate text data can be obtained.
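The three claimed stages can be illustrated with a minimal Python sketch. It assumes that the multiplexed signal has already been split into named audio channels and that dialogue lives in a channel labelled "C". The recognizer is a pluggable callable because the text does not specify a recognition engine. `MultiplexedSignal`, `separate`, `generate_text`, and `dialogue_to_text` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class MultiplexedSignal:
    """Hypothetical stand-in for one demultiplexed unit of video plus audio."""
    video: bytes
    audio: Dict[str, Sequence[float]] = field(default_factory=dict)


def separate(signal: MultiplexedSignal, dialogue_channel: str = "C") -> Sequence[float]:
    """Separating step: return the channel assumed to carry only dialogue."""
    return signal.audio[dialogue_channel]


def generate_text(words: List[str]) -> str:
    """Generating step: join recognized words into one line of text data."""
    return " ".join(words)


def dialogue_to_text(signal: MultiplexedSignal,
                     recognize: Callable[[Sequence[float]], List[str]]) -> str:
    dialogue = separate(signal)    # separating step
    words = recognize(dialogue)    # speech recognition step (engine is pluggable)
    return generate_text(words)    # generating step


def fake_recognizer(samples: Sequence[float]) -> List[str]:
    """Dummy recognizer used only to exercise the pipeline."""
    return ["hello", "world"] if samples else []


sig = MultiplexedSignal(video=b"", audio={"L": [0.0], "C": [0.1, 0.2], "R": [0.0]})
print(dialogue_to_text(sig, fake_recognizer))  # hello world
```

Keeping the recognizer as a parameter mirrors the claim structure: the separating and generating stages are fixed, while the recognition means can be any engine.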

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described below. To clarify how each means recited in the claims corresponds to the following embodiments, the features of the invention are restated with one example of the corresponding embodiment given in parentheses after each means. Of course, this is not intended to limit each means to the example given.

[0011] The information processing apparatus according to claim 1 comprises:
- separating means (for example, the demultiplexer 112 in FIG. 1) for separating a dialogue signal from a multiplexed signal in which a video signal and a dialogue signal corresponding to the video signal are multiplexed;
- speech recognition means (for example, the speech recognition unit 120 in FIG. 1) for performing speech recognition on the separated dialogue signal; and
- generating means (for example, the speech recognition unit 120 in FIG. 1) for generating text data corresponding to the speech recognition result.

[0012] The information processing apparatus according to claim 2 further comprises:
- decoding means (for example, the MPEG2 decoder 113 in FIG. 1) for decoding the video signal; and
- synthesizing means (for example, the frame memory 114 in FIG. 1) for combining text corresponding to the text data with the decoded image.

[0013] The information processing apparatus according to claim 3 further comprises storage means (for example, the HDD 80 in FIG. 6) for storing the text data.

[0014] FIG. 1 shows a configuration example of a DVD player to which the information processing apparatus of the present invention is applied. In this DVD player 100, the signal reproduced from the DVD 111 is supplied to the demultiplexer 112, which separates it into video data, audio data, and sub-picture data. The MPEG (Moving Picture Experts Group) 2 decoder 113 decodes the input video data and outputs it to the frame memory 114.

[0015] The sub-picture decoder 118 decodes the sub-picture data supplied from the demultiplexer 112. It supplies the decoded data to the frame memory 114 via contact A of the switch 119. Data read from the frame memory 114 is input to the video encoder 115, converted into a video signal of the NTSC, PAL, or a similar system, and supplied to the display 133.

[0016] The audio data output from the demultiplexer 112 is encoded in the Dolby AC-3 (trademark) format. The AC-3 decoder 116 decodes it and outputs it as 5.1-channel multitrack audio data. The D/A converter 117 converts these 5.1 channels of audio data from digital to analog and outputs them to the multichannel amplifier 131. The multichannel amplifier 131 amplifies the 5.1 channels of audio signals and outputs them to the speaker system 132.

The speaker system 132 consists of five channel speakers:
- a front left speaker 141;
- a front right speaker 142;
- a rear left speaker 143;
- a rear right speaker 144; and
- a front center speaker 145.

A subwoofer 146 for deep bass serves as the 0.1 channel.

[0017] Of the 5.1-channel data output from the AC-3 decoder 116, the speech recognition unit 120 performs speech recognition on the center-channel audio data, that is, the audio data supplied to the speaker 145 of the speaker system 132. It generates text data corresponding to the recognition result and outputs it to the address generation unit 121. The address generation unit 121 generates an address corresponding to the text data and outputs it to the bitmap data ROM 122. The bitmap data ROM 122 stores bitmap data corresponding to text data. This bitmap data is supplied to the frame memory 114 via contact B of the switch 119.

[0018] The control circuit 124 is composed of a microcomputer or the like. It switches the switch 119 in response to input from the operation unit 123 and also controls the other units.

[0019] The operation is described next. When the user operates the operation unit 123 to instruct playback of the DVD 111, the control circuit 124 controls each unit to start playback. The signal reproduced from the DVD 111 is input to the demultiplexer 112. The demultiplexer 112 separates a video signal, an audio signal, and a sub-picture signal from this playback signal and sends each to its decoder:
- the video signal to the MPEG2 decoder 113;
- the audio signal to the AC-3 decoder 116; and
- the sub-picture signal to the sub-picture decoder 118.

[0020] The MPEG2 decoder 113 decodes the input video data according to the MPEG2 standard into 4:2:2 digital video data. It outputs this data to the frame memory 114 for storage.
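The storage that one decoded 4:2:2 frame occupies in the frame memory can be estimated from the chroma subsampling ratio. The 720 x 480 frame size and 8-bit samples below are assumptions for illustration; the text gives neither. In 4:2:2, each of the two chroma planes carries half as many samples as the luma plane.

```python
# Chroma samples per plane, as a fraction (num, den) of the luma sample count.
CHROMA_FRACTION = {"4:4:4": (1, 1), "4:2:2": (1, 2), "4:2:0": (1, 4)}


def frame_bytes(width: int, height: int, sampling: str = "4:2:2", bits: int = 8) -> int:
    """Bytes for one uncompressed Y'CbCr frame: one luma plane plus two chroma planes."""
    luma = width * height
    num, den = CHROMA_FRACTION[sampling]
    chroma_per_plane = luma * num // den
    total_samples = luma + 2 * chroma_per_plane
    return total_samples * bits // 8


print(frame_bytes(720, 480))           # 691200
print(frame_bytes(720, 480, "4:2:0"))  # 518400
```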

[0021] Meanwhile, the sub-picture decoder 118 decodes the input sub-picture data. It outputs the result via contact A of the switch 119 to the frame memory 114 for storage. As a result, the frame memory 114 holds an image in which subtitles are superimposed on the original image. This image is input to the video encoder 115, converted into, for example, an NTSC video signal, and supplied to the display 133 for display.

[0022] The AC-3 decoder 116 decodes the input 5.1 channels of audio data and outputs them to the D/A converter 117. The D/A converter 117 converts the audio data from digital to analog and outputs it to the multichannel amplifier 131. The multichannel amplifier 131 amplifies the 5.1 channels of audio signals and outputs each one to its corresponding speaker among the speakers 141 to 146.

[0023] Normal playback proceeds as described above. If the user does not need sub-picture subtitles, the user can operate the operation unit 123 to have the control circuit 124 set the switch 119 to contact C. No subtitles are then displayed. Alternatively, the sub-picture data may provide subtitles in several independent languages. In that case, the user can operate the operation unit 123 to select one of them, have the sub-picture decoder 118 decode it, and display it on the display 133.
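The routing performed by the switch 119 can be modelled as a three-way selector. This is a minimal sketch only: `Contact` and `overlay_source` are hypothetical names, and in the device the selection is made in hardware by the control circuit 124.

```python
from enum import Enum
from typing import Optional, TypeVar

T = TypeVar("T")


class Contact(Enum):
    A = "sub-picture decoder 118"
    B = "bitmap data ROM 122 (recognized dialogue)"
    C = "no subtitle"


def overlay_source(contact: Contact, subpicture: T, recognized: T) -> Optional[T]:
    """Return the overlay routed to frame memory 114, or None when contact C is selected."""
    if contact is Contact.A:
        return subpicture
    if contact is Contact.B:
        return recognized
    return None


print(overlay_source(Contact.B, "Bonjour", "Hello"))  # Hello
```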

[0024] The user may instead want subtitles in the same language as the audio output from the speaker system 132. In that case, the user operates the operation unit 123 to have the control circuit 124 set the switch 119 to contact B. The speech recognition unit 120 then performs speech recognition on the center-channel audio data among the 5.1 channels output from the AC-3 decoder 116.

Of the audio components that accompany the image on the display 133, the center channel normally carries only the dialogue. Music, sound effects, and the like are carried only in the channels for the other speakers 141 to 144 and the speaker 146, not in the center channel. The speech recognition unit 120 can therefore recognize the dialogue accurately, unaffected by music or sound effects.
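Picking the center channel out of decoded 5.1 audio amounts to taking every sixth sample when the channels are frame-interleaved. This is a minimal sketch that assumes interleaved integer PCM. Channel order differs between decoders and file formats, so the `CENTER = 2` index (which assumes an L, R, C, LFE, Ls, Rs order) is an assumption, not something the text specifies.

```python
from typing import List, Sequence

CHANNELS = 6  # five full-range channels plus the LFE ("0.1") channel


def extract_channel(interleaved: Sequence[int], index: int, channels: int = CHANNELS) -> List[int]:
    """Return every sample of one channel from frame-interleaved multichannel PCM."""
    if len(interleaved) % channels != 0:
        raise ValueError("sample count is not a whole number of frames")
    if not 0 <= index < channels:
        raise ValueError("channel index out of range")
    return list(interleaved[index::channels])


# Assumed output order L, R, C, LFE, Ls, Rs, so the center channel is index 2.
CENTER = 2
pcm = [1, 2, 30, 4, 5, 6,
       11, 12, 31, 14, 15, 16]
print(extract_channel(pcm, CENTER))  # [30, 31]
```

The extracted list is what would be handed to a recognizer in place of the full mix.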

[0025] The speech recognition unit 120 generates text data corresponding to the recognition result and supplies it to the address generation unit 121. The address generation unit 121 generates an address corresponding to the input text data and outputs it to the bitmap data ROM 122. The bitmap data ROM 122 reads out and outputs the bitmap data stored at that address. For example, when the address generation unit 121 outputs the address for the letter A, the bitmap data for the letter A is read out and output.
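The address generation step can be pictured as a fixed-stride index into a font ROM. This is a sketch under a hypothetical layout that the text does not describe: 8 x 8 one-bit glyphs, one byte per row, covering printable ASCII from the space character upward, starting at address 0.

```python
GLYPH_ROWS = 8       # assumed 8x8 font: 8 rows per character
BYTES_PER_ROW = 1    # 8 pixels per row fit in one byte
FIRST_CODE = 0x20    # hypothetical ROM begins at the ASCII space character
LAST_CODE = 0x7E
ROM_BASE = 0x0000


def glyph_address(ch: str) -> int:
    """Start address of one character's bitmap in the hypothetical glyph ROM."""
    code = ord(ch)
    if not FIRST_CODE <= code <= LAST_CODE:
        raise ValueError(f"no glyph for {ch!r}")
    return ROM_BASE + (code - FIRST_CODE) * GLYPH_ROWS * BYTES_PER_ROW


def text_to_addresses(text: str) -> list:
    """Addresses for each character of recognized text, in display order."""
    return [glyph_address(ch) for ch in text]


print(hex(glyph_address("A")))  # 0x108
```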

[0026] The bitmap data output from the bitmap data ROM 122 is supplied via contact B of the switch 119 to the frame memory 114. There it is superimposed as subtitles on the image data supplied from the MPEG2 decoder 113. The video encoder 115 then converts the data in the frame memory 114 into NTSC or PAL video data, which is output to the display 133 for display.
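Superimposing a glyph bitmap on the decoded image can be sketched as writing a fixed luma value wherever the glyph has a set bit. The sketch assumes 8-pixel-wide glyph rows with the most significant bit leftmost. It also uses a frame stored as rows of 8-bit luma values and paints text in studio-range white (Y = 235). These layout details are assumptions, not taken from the text.

```python
from typing import List

WHITE_Y = 235  # nominal peak white in 8-bit studio-range luma


def overlay_glyph(frame: List[List[int]], glyph_rows: List[int],
                  x: int, y: int, value: int = WHITE_Y) -> None:
    """Set frame pixels wherever the 8-pixel-wide glyph has a 1 bit (MSB = leftmost)."""
    height, width = len(frame), len(frame[0])
    for dy, bits in enumerate(glyph_rows):
        for dx in range(8):
            if bits & (0x80 >> dx):
                px, py = x + dx, y + dy
                if 0 <= px < width and 0 <= py < height:  # clip at frame edges
                    frame[py][px] = value


frame = [[16] * 10 for _ in range(2)]
overlay_glyph(frame, [0b10000001, 0b00011000], x=1, y=0)
print(frame[0])  # [16, 235, 16, 16, 16, 16, 16, 16, 235, 16]
```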

[0027] As described above, even a hearing-impaired viewer can enjoy the program by operating the operation unit 123. The viewer can then watch subtitles in the same language as the audio output from the speaker system 132.

[0028] FIG. 2 shows a configuration example of an AV system to which the information processing apparatus of the present invention is applied. In this embodiment, a personal computer 1 is connected to a television receiver 3, together with AV equipment 2 such as a tuner, an amplifier, and a video disc player. The television receiver 3 has a CRT 4 for displaying images and a speaker 5 for outputting audio signals.

[0029] The keyboard 11 has a plurality of keys 12 and a touch pad 13. When they are operated, the infrared transmitter 14 emits corresponding infrared signals to the personal computer 1.

[0030] FIG. 3 shows the external appearance of the personal computer 1. The personal computer 1 is 225 mm wide, 94 mm high, and 350 mm deep. An openable door 21 is provided on its front, with surfaces 22 to the left and right of the door 21. In the figure, the left surface 22 carries a power switch 23 and an infrared receiver 24. The power switch 23 is operated to turn the power on or off. The infrared receiver 24 receives the infrared light emitted from the infrared transmitter 14 of the keyboard 11.

[0031] Recesses 25 are formed in the top surface of the personal computer 1 at positions corresponding to the feet of peripheral devices. When a peripheral device connected to the personal computer 1 is placed on top, its feet therefore rest stably on the top surface.

[0032] FIG. 4 shows the personal computer 1 with the door 21 open. As shown in the figure, opening the door 21 exposes a DVD (Digital Versatile Disc) drive 33. Below the DVD drive 33 are two terminals:
- a USB terminal 31, serving as a serial interface; and
- a 1394 terminal 32, conforming to the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard.

[0033] FIG. 5 shows the personal computer 1 with the rear door 41 open. As shown in the figure, opening the door 41 exposes a PC card slot 42. Below the PC card slot 42 are the following terminals:
- a USB terminal 43;
- a 1394 terminal 44;
- a printer terminal 45 for connecting a printer; and
- a VGA terminal 46 for outputting computer graphics data.

[0034] FIG. 6 shows an example of the internal configuration of the personal computer 1.
- The CPU (Central Processing Unit) 71 is, for example, an Intel Pentium (trademark). It operates with an internal clock of 166 MHz or an external clock of 66 MHz.
- The RAM 72 is a 16 MB main memory that stores data, programs, and the like processed by the CPU 71 as appropriate.
- The ROM 73 stores the programs the CPU 71 needs to execute various processes.
- The EEPROM (Electrically Erasable Programmable Read Only Memory) 74 stores data that must be retained after the personal computer 1 is powered off.

[0035] The graphics processing unit 75 performs video processing. This includes color space conversion from YUV signals, the display format of video data, to RGB signals, the graphics data format. It also includes scaling (enlargement or reduction) for display at a desired screen size. The unit also performs three-dimensional graphics processing:
- rasterization, for projecting three-dimensional objects onto a two-dimensional screen;
- Gouraud shading, for making object surfaces appear smooth; and
- alpha blending, for representing translucent objects.

It writes the processing results to the display memory 76 and outputs them to the synthesizing circuit 85. The graphics processing unit 75 also generates bitmap data corresponding to text data.
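The YUV-to-RGB color space conversion can be illustrated per pixel. This sketch uses the commonly cited ITU-R BT.601 studio-range coefficients, with Y in 16-235 and Cb/Cr centered on 128. The text does not state which matrix the graphics processing unit 75 uses, so BT.601 is an assumption, chosen because it is the usual choice for standard-definition video.

```python
def clamp(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y: int, u: int, v: int) -> tuple:
    """BT.601 studio-range Y'CbCr (u = Cb, v = Cr) to full-range 8-bit RGB."""
    c = y - 16
    d = u - 128
    e = v - 128
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.392 * d - 0.813 * e
    b = 1.164 * c + 2.017 * d
    return clamp(r), clamp(g), clamp(b)


print(yuv_to_rgb(235, 128, 128))  # (255, 255, 255)
print(yuv_to_rgb(16, 128, 128))   # (0, 0, 0)
```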

[0036] The MPEG2 video decoder 77 decodes video data that the DVD drive 33 reproduces from a DVD and outputs it to the synthesizing circuit 85. The sub-picture decoder 88 decodes sub-picture data reproduced from the DVD and also outputs it to the synthesizing circuit 85.

The digital sound processing unit 81 performs the following processes:
- decompression of ADPCM (Adaptive Differential Pulse Code Modulation) sound sources;
- decompression of MPEG audio data;
- FM (Frequency Modulation) sound synthesis for generating effects such as reverberation and surround, that is, generating an audio signal by combining multiple sine waves of different frequencies and amplitudes;
- MIDI (Musical Instrument Digital Interface) wavetable synthesis; and
- AC-3 decoding, among others.

In MIDI wavetable synthesis, a built-in synthesizer plays back MIDI data using a wavetable that stores digital data for the basic sound elements of musical instruments. A built-in audio mixer mixes the processed audio signals. The mix is converted into an analog audio signal and output to the speaker 5 of the television receiver 3. The speech recognition circuit 87 performs speech recognition processing.
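The parenthetical above describes FM synthesis loosely, as combining sine waves. In the usual two-operator form, a modulator sine varies the phase of a carrier sine. This produces a spectrum of sinusoids at the carrier frequency plus or minus multiples of the modulator frequency. The sketch below shows that standard form; the sample rate, frequencies, and modulation index are illustrative choices, not values from the text.

```python
import math
from typing import List


def fm_tone(fc: float, fm: float, index: float, amp: float = 1.0,
            fs: int = 48_000, n: int = 480) -> List[float]:
    """Two-operator FM: a modulator sine at fm varies the phase of a carrier sine at fc."""
    samples = []
    for k in range(n):
        t = k / fs
        modulator = index * math.sin(2 * math.pi * fm * t)
        samples.append(amp * math.sin(2 * math.pi * fc * t + modulator))
    return samples


tone = fm_tone(440.0, 220.0, index=2.0)  # 10 ms of a 440 Hz carrier with 220 Hz modulation
```

With `index=0` the output reduces to a plain sine at `fc`. Raising the index spreads energy into more sidebands, which is how FM synthesis cheaply produces rich timbres.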

[0037] The Intercast (trademark) board 78 receives Intercast broadcasts via the antenna 91 and demodulates them. In Intercast, HTML (HyperText Markup Language) data for World Wide Web (WWW) pages is inserted into the vertical blanking interval of the video signal and transmitted. The received data is stored on a hard disk driven by the hard disk drive (HDD) 80. By navigating the HTML data on the hard disk drive 80, the user gets a simulated interactive environment.

[0038] In a sports program, for example, Intercast carries the score, still images of decisive moments, video clips, and the like, in step with the program content. These still images and video clips are linked to related information. The user can obtain that information by accessing the link destination, for example over an analog telephone line. Intercast was developed by Intel.

[0039] The DSVD (Digital Simultaneous Voice and Data) modem 79 time-division multiplexes voice and data using the DSVD system developed by Intel. It outputs the multiplexed signal to the telephone line via the modular jack 92. It also demodulates DSVD signals received over the telephone line and separates them into audio and data.

In this system, the digitally compressed audio signal and ordinary data are multiplexed using V.42 protocol headers. The rates are as follows:
- with no audio signal present, the maximum data transfer rate is 28.8 kbit/s;
- with an audio signal present, the data rate is 19.2 kbit/s; and
- the audio signal itself is transmitted at 9.6 kbit/s.

Audio compression schemes such as Rockwell's DigiTalk (trademark) and DSP Group's TrueSpeech (trademark) are used.
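The DSVD rates quoted above are internally consistent: 19.2 kbit/s of data plus 9.6 kbit/s of voice equals the 28.8 kbit/s line rate. Voice therefore takes a fixed slice of the line. The sketch below works through that arithmetic and the resulting idealized transfer times. It ignores framing, V.42 headers, and retransmissions, and the 36,000-byte payload is just an example.

```python
LINE_RATE = 28_800        # bit/s for data when no voice is present
DATA_WITH_VOICE = 19_200  # bit/s left for data while voice is active
VOICE_RATE = 9_600        # bit/s used by compressed voice

assert DATA_WITH_VOICE + VOICE_RATE == LINE_RATE  # voice takes a fixed slice of the line


def transfer_seconds(n_bytes: int, voice_active: bool) -> float:
    """Idealized transfer time, ignoring framing, V.42 headers and retransmissions."""
    rate = DATA_WITH_VOICE if voice_active else LINE_RATE
    return n_bytes * 8 / rate


print(transfer_seconds(36_000, voice_active=False))  # 10.0
print(transfer_seconds(36_000, voice_active=True))   # 15.0
```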

[0040] The keyboard controller 84 receives signals from the infrared receiver 24 and outputs corresponding signals to the CPU 71.

[0041] The synthesizing circuit 85 combines, as needed, the outputs of the graphics processing unit 75, the MPEG2 video decoder 77, and the sub-picture decoder 88. It outputs the result to the NTSC encoder 86. The NTSC encoder 86 converts the video data from the synthesizing circuit 85 into an NTSC analog video signal and outputs it to the television receiver 3.

[0042] Only one bus is shown for convenience, but there are actually three:
- a local bus connecting the CPU 71 and the RAM 72;
- an ISA (Industry Standard Architecture) bus connected to the keyboard controller 84; and
- a PCI (Peripheral Component Interconnect) bus to which the other components, from the ROM 73 through the HDD 80, are connected.

The ISA bus is 8 or 16 bits wide, and the PCI bus is 32 or 64 bits wide. The PCI bus operates at between 25 MHz and 66 MHz and achieves a maximum throughput of 528 MB/s, more than 42 times that of the ISA bus.
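The peak PCI figure follows from bus width times clock rate. A 64-bit bus moves 8 bytes per transfer, and at 66 MHz with one transfer per clock that gives 528 megabytes per second. The sketch assumes one transfer per clock with no wait states, and 1 MB = 10^6 bytes.

```python
def peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bus throughput in MB/s (10**6 bytes), assuming one transfer per clock."""
    return width_bits / 8 * clock_mhz


print(peak_mb_per_s(64, 66))  # 528.0
print(peak_mb_per_s(32, 33))  # 132.0
```

Sustained throughput is lower in practice because of arbitration, address phases, and wait states.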

[0043] The expansion slot 82 is for the PCI bus, and the expansion slot 83 is for the ISA bus. A desired function can be added by connecting a peripheral circuit that provides it (for example, a SCSI board) to one of these expansion slots.

[0044] Dedicated bus bridge circuits (not shown) are provided between the local bus and the PCI bus, and between the PCI bus and the ISA bus.

[0045] FIG. 7 shows an example of the internal configuration of the keyboard 11. The detection circuit 141 detects which of the keys 12 has been operated. It also detects the coordinate data (X, Y) of the operated point P on the touch pad 13, and outputs the detection results to the transmission module 142. The transmission module 142 converts the signal from the detection circuit 141 into a transmission signal. It outputs this signal to the infrared transmitter 14, which emits it as an infrared signal.

[0046] The battery 143 supplies power to the power supply circuit 144. The power supply circuit 144 in turn supplies the necessary power to the detection circuit 141 and the transmission module 142. The power switch 145 is operated to start or stop using the keyboard 11.

[0047] Next, the operation of the personal computer 1 when DVD playback is instructed is described. When the user operates the key among the keys 12 of the keyboard 11 that enters a DVD playback command, the detection circuit 141 detects the operation. The transmission module 142 outputs a signal corresponding to the detected key to the infrared transmitter 14, which emits it as an infrared signal.

[0048] This infrared signal is received by the infrared receiving unit 24 of the personal computer 1. On receiving the detection signal from the infrared receiving unit 24, the keyboard controller 84 outputs the detection result to the CPU 71 via the bus. On receiving the DVD playback command in this way, the CPU 71 controls the DVD drive 33 to play back the DVD loaded in it.

[0049] Of the playback signal read from the DVD by the DVD drive 33, the video data is supplied to the MPEG2 video decoder 77 and decoded. The sub-picture data read from the DVD is input to the sub-picture decoder 88 and decoded. The synthesizing circuit 85 combines the video data output from the MPEG2 video decoder 77 with the subtitle data output from the sub-picture decoder 88 and outputs the combined data to the NTSC encoder 86. The NTSC encoder 86 converts the input data into NTSC video data and outputs it to the CRT 4 of the television receiver 3 for display.
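The combining step performed by the synthesizing circuit 85 amounts to an overlay: where the subtitle bitmap has a visible pixel it replaces the video pixel, and elsewhere the video shows through. The sketch below assumes a hypothetical frame representation (row-major lists of pixel values, with `None` marking a transparent subtitle pixel). It ignores the palette and contrast information that real DVD sub-pictures carry. `composite` is an illustrative name, not taken from the patent.

```python
from typing import List, Optional

Frame = List[List[int]]              # pixel values, row-major (hypothetical)
Overlay = List[List[Optional[int]]]  # None marks a transparent subtitle pixel


def composite(video: Frame, subtitle: Overlay, top: int, left: int) -> Frame:
    """Superimpose a subtitle bitmap on a decoded video frame.

    Opaque subtitle pixels replace the underlying video pixel; transparent
    ones (None) let the video show through. Subtitle pixels that fall
    outside the frame are clipped. The input frame is left unmodified.
    """
    out = [row[:] for row in video]
    for dy, sub_row in enumerate(subtitle):
        y = top + dy
        if not 0 <= y < len(out):
            continue
        for dx, pixel in enumerate(sub_row):
            x = left + dx
            if pixel is not None and 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out
```

The same overlay operation serves both the subtitles decoded from the sub-picture data and the recognized-speech subtitles described in paragraph [0053], since both reach the synthesizing circuit 85 as bitmap data.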

[0050] The audio data read from the DVD is input to the digital sound processing unit 81 and decoded using the AC-3 scheme. The decoded audio signal is converted into an analog signal and then output to the speaker 5 of the television receiver 3, which reproduces it as sound.

[0051] In this case, however, the digital sound processing unit 81 outputs only the audio signals of the two front channels, front left and front right, to the speaker 5.
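Passing only the front left and front right channels on to the speaker 5 is a channel-selection step on each decoded 5.1 sample frame. A minimal sketch, assuming interleaved frames and a hypothetical channel order (the order a given AC-3 decoder actually emits may differ), is shown below. `LAYOUT_5_1` and `select_channels` are illustrative names.

```python
from typing import List, Sequence, Tuple

# Assumed channel order of one decoded 5.1 sample frame. This layout is an
# illustration only; real decoders may order channels differently.
LAYOUT_5_1 = ("L", "R", "C", "LFE", "Ls", "Rs")


def select_channels(frames: Sequence[Sequence[float]],
                    wanted: Sequence[str],
                    layout: Sequence[str] = LAYOUT_5_1) -> List[Tuple[float, ...]]:
    """Keep only the named channels from each interleaved sample frame.

    No downmixing is done: channels not listed in `wanted` are dropped.
    """
    idx = [layout.index(name) for name in wanted]
    return [tuple(frame[i] for i in idx) for frame in frames]
```

Calling `select_channels(frames, ("L", "R"))` yields the two-channel signal described above. Calling `select_channels(frames, ("C",))` isolates the center channel, which is the signal routed to speech recognition in the next mode of operation.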

[0052] In this way, the user can enjoy the images and sound played back from the DVD and, if desired, can also view the subtitles recorded on the DVD in advance.

[0053] On the other hand, when the user operates the keyboard 11 to command display of subtitles corresponding to the dialogue in the center channel, the CPU 71 controls the digital sound processing unit 81 to supply the center-channel audio data, out of the 5.1-channel audio data input to it, to the speech recognition circuit 87. The speech recognition circuit 87 performs speech recognition on the input center-channel dialogue signal, generates text data, and outputs the text data to the graphics processing unit 75. The graphics processing unit 75 generates bitmap data corresponding to the input text data and outputs it to the synthesizing circuit 85, which superimposes it as subtitles on the image output from the MPEG2 video decoder 77. As a result, an image with the center-channel dialogue superimposed as subtitles is displayed on the CRT 4 of the television receiver 3.
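In the DVD-player embodiment summarized in the abstract, the text is turned into bitmap data by generating a ROM address for each character (address generating part 121) and reading the glyph from the bitmap data ROM 122. The sketch below illustrates that idea for the path just described, under several assumptions that are not from the patent: an 8x8, one-byte-per-row font stored consecutively by character code, an ASCII-only toy ROM, and a `recognize` callable that stands in for the speech recognition circuit 87. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

GLYPH_ROWS = 8  # hypothetical 8x8 font: one byte per glyph row
GLYPH_COLS = 8


def glyph_address(char_code: int) -> int:
    """Address generation: start offset of a character's glyph in the ROM."""
    return char_code * GLYPH_ROWS


def build_rom(glyphs: Dict[str, List[int]], size_chars: int = 128) -> bytes:
    """Build a toy bitmap ROM; unmapped characters stay blank (all zeros)."""
    rom = bytearray(size_chars * GLYPH_ROWS)
    for ch, rows in glyphs.items():
        if len(rows) != GLYPH_ROWS:
            raise ValueError(f"glyph {ch!r} must have {GLYPH_ROWS} rows")
        addr = glyph_address(ord(ch))
        rom[addr:addr + GLYPH_ROWS] = bytes(rows)
    return bytes(rom)


def render_text(text: str, rom: bytes) -> List[List[int]]:
    """Read each character's glyph from the ROM and lay glyphs out left to right.

    Returns a GLYPH_ROWS x (GLYPH_COLS * len(text)) bitmap of 0/1 pixels.
    Characters outside the ROM's range raise IndexError.
    """
    bitmap: List[List[int]] = [[] for _ in range(GLYPH_ROWS)]
    for ch in text:
        addr = glyph_address(ord(ch))
        for r in range(GLYPH_ROWS):
            row_byte = rom[addr + r]
            # The most significant bit is the leftmost pixel.
            bitmap[r].extend((row_byte >> (GLYPH_COLS - 1 - c)) & 1
                             for c in range(GLYPH_COLS))
    return bitmap


def dialogue_to_subtitle(center_samples: Sequence[float],
                         recognize: Callable[[Sequence[float]], str],
                         rom: bytes) -> List[List[int]]:
    """Center-channel audio -> recognized text -> subtitle bitmap."""
    return render_text(recognize(center_samples), rom)
```

The resulting 0/1 bitmap could be mapped to opaque and transparent pixels and handed to an overlay step like the one performed by the synthesizing circuit 85. Because only the center channel is passed to `recognize`, the recognizer never sees the music and effects carried in the other channels.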

[0054] Further, when the keyboard 11 is operated to command recording of the text data obtained by speech recognition, the CPU 71 supplies the text data output from the speech recognition circuit 87 to the HDD 80 and has it recorded on the hard disk.

[0055] The CPU 71 then makes use of the text data recorded on the hard disk in this way: for example, it has the text translated by translation application software, outputs it to a printer (not shown) for printing, or transmits it over a network via the DSVD modem 79.

[0056]

[Effects of the Invention] As described above, with the information processing apparatus of claim 1 and the information processing method of claim 6, speech recognition is performed on the dialogue signal separated from the multiplexed signal, and text data corresponding to the recognition result is generated. The dialogue can therefore be converted into text data accurately, without being affected by music, sound effects, and the like. The resulting text can then be displayed as subtitles, translated into other languages, or otherwise reused.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of a DVD player to which the information processing apparatus of the present invention is applied.

FIG. 2 is a perspective view showing a configuration example of an AV system to which the information processing apparatus of the present invention is applied.

FIG. 3 is a perspective view showing the external configuration of the personal computer of FIG. 2 as viewed from the front.

FIG. 4 is a perspective view showing the personal computer of FIG. 3 with its door opened.

FIG. 5 is a perspective view showing the personal computer of FIG. 2 with the door on its rear surface opened.

FIG. 6 is a block diagram showing an example of the internal configuration of the personal computer of FIG. 2.

FIG. 7 is a block diagram showing an example of the internal configuration of the keyboard of FIG. 2.

[Explanation of symbols]

1 personal computer, 3 television receiver, 4 CRT, 5 speaker, 11 keyboard, 12 keys, 13 touch pad, 14 infrared transmitting unit, 24 infrared receiving unit, 33 DVD drive, 75 graphics processing unit, 77 MPEG2 video decoder, 85 synthesizing circuit, 86 NTSC encoder, 111 DVD, 112 demultiplexer, 113 MPEG2 decoder, 114 frame memory, 115 video encoder, 116 AC-3 decoder, 118 sub-picture decoder, 120 speech recognition unit, 122 bitmap data ROM

Claims

1. An information processing apparatus comprising: separating means for separating a speech signal from a multiplexed signal in which a video signal and a speech signal corresponding to the video signal are multiplexed; speech recognition means for performing speech recognition on the separated speech signal; and generating means for generating text data in accordance with the speech recognition result.

2. The information processing apparatus according to claim 1, further comprising: decoding means for decoding the video signal; and synthesizing means for combining text corresponding to the text data with an image corresponding to the decoded video signal.

3. The information processing apparatus according to claim 1, further comprising a storage unit that stores the text data.

4. The information processing apparatus according to claim 1, wherein the multiplexed signal is a playback signal from a DVD.

5. The information processing apparatus according to claim 1, wherein the speech signal is the center-channel signal among audio signals encoded with Dolby AC-3.

6. An information processing method comprising: a separating step of separating a speech signal from a multiplexed signal in which a video signal and a speech signal corresponding to the video signal are multiplexed; a speech recognition step of performing speech recognition on the separated speech signal; and a generating step of generating text data in accordance with the speech recognition result.