JP3594409B2

JP3594409B2 - MPEG audio playback device and MPEG playback device

Info

Publication number: JP3594409B2
Application number: JP16945496A
Authority: JP
Inventors: 英樹山内; 茂之岡田; 正幸飯田; 浩司田中
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1995-06-30
Filing date: 1996-06-28
Publication date: 2004-12-02
Anticipated expiration: 2016-06-28
Also published as: JPH0973299A

Abstract

PROBLEM TO BE SOLVED: To provide an MPEG audio reproducing device that reproduces easily understood audio during variable-speed reproduction. SOLUTION: An MPEG audio reproducing device 1 consists of a reproducing speed detecting circuit 2, an MPEG audio decoder 3, a speech speed conversion processing circuit 4, a D/A converter 5 and an audio amplifier 6. An MPEG reproducing device is provided with an audio-video parser (AV parser) 11 and an MPEG video decoder 12 in addition to the device 1. The circuit 4 consists of a DSP 31, a ring memory 32 and an up-down counter 33. During high-speed reproduction, the circuit 4 expands the time length of each input voice segment and reduces the time length of each silence interval. During low-speed reproduction, either the time length of each voice segment is expanded and the time length of each silence interval is reduced, or each silence interval is deleted, the voice segments are joined together, and a silence interval is then inserted.

Description

【０００１】
【発明の属する技術分野】
本発明はＭＰＥＧ（Ｍｏｖｉｎｇ Ｐｉｃｔｕｒｅ Ｅｘｐｅｒｔｓ Ｇｒｏｕｐ）オーディオ再生装置およびＭＰＥＧ再生装置に係り、詳しくは、話速変換機能を備えたＭＰＥＧオーディオ再生装置およびＭＰＥＧ再生装置に関するものである。
【０００２】
【従来の技術】
マルチメディアで扱われる情報は、膨大な量で且つ多種多様であり、これらの情報を高速に処理することがマルチメディアの実用化を図る上で必要となってくる。情報を高速に処理するためには、データの圧縮・伸長技術が不可欠となる。そのようなデータの圧縮・伸長技術として「ＭＰＥＧ」方式が挙げられる。このＭＰＥＧ方式は、ＩＳＯ（Ｉｎｔｅｒｎａｔｉｏｎａｌ Ｏｒｇａｎｉｚａｔｉｏｎ ｆｏｒ Ｓｔａｎｄａｒｄｉｚａｔｉｏｎ）／ＩＥＣ（Ｉｎｔｅｒｎａｔｉｏｎａｌ Ｅｌｅｃｔｒｏｔｅｃｈｎｉｃａｌ Ｃｏｍｍｉｓｓｉｏｎ）傘下のＭＰＥＧ委員会（ＩＳＯ／ＩＥＣ ＪＴＣ１／ＳＣ２９／ＷＧ１１）によって標準化されつつある。
【０００３】
ＭＰＥＧは３つのパートから構成されている。パート１の「ＭＰＥＧシステムパート」（ＩＳＯ／ＩＥＣＩＳ１１１７２Ｐａｒｔ１：Ｓｙｓｔｅｍｓ）では、ビデオデータとオーディオデータの多重化構造（マルチプレクス・ストラクチャ）および同期方式が規定される。パート２の「ＭＰＥＧビデオパート」（ＩＳＯ／ＩＥＣＩＳ１１１７２Ｐａｒｔ２：Ｖｉｄｅｏ）では、ビデオデータの高能率符号化方式およびビデオデータのフォーマットが規定される。パート３の「ＭＰＥＧオーディオパート」（ＩＳＯ／ＩＥＣＩＳ１１１７２Ｐａｒｔ３：Ａｕｄｉｏ）では、オーディオデータの高能率符号化方式およびオーディオデータのフォーマットが規定される。
【０００４】
ＭＰＥＧビデオパートで取り扱われるビデオデータは動画に関するものであり、その動画は１秒間に数十個（例えば、３０個）のフレーム（静止画、コマ）によって構成されている。ビデオデータは、シーケンス（Ｓｅｑｕｅｎｃｅ）、ＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅｓ）、ピクチャ、スライス（Ｓｌｉｃｅ）、マクロブロック（Ｍａｃｒｏｂｌｏｃｋ）、ブロックの順に６層の階層構造から成る。
【０００５】
また、ＭＰＥＧには主にエンコードレートの違いにより、現在のところ、ＭＰＥＧ−１，ＭＰＥＧ−２の２つの方式がある。ＭＰＥＧ−１においてフレームはピクチャに対応している。ＭＰＥＧ−２においては、フレームまたはフィールドをピクチャに対応させることもできる。フィールドは、２個で１つのフレームを構成している。ピクチャにフレームが対応している構造はフレーム構造と呼ばれ、ピクチャにフィールドが対応している構造はフィールド構造と呼ばれる。
【０００６】
ＭＰＥＧでは、フレーム間予測と呼ばれる圧縮技術を用いる。フレーム間予測は、フレーム間のデータを時間的な相関に基づいて圧縮する。フレーム間予測では双方向予測が行われる。双方向予測とは、過去の再生画像（または、ピクチャ）から現在の再生画像を予測する順方向予測と、未来の再生画像から現在の再生画像を予測する逆方向予測とを併用することである。
【０００７】
この双方向予測は、Ｉピクチャ（Ｉｎｔｒａ−Ｐｉｃｔｕｒｅ），Ｐピクチャ（Ｐｒｅｄｉｃｔｉｖｅ−Ｐｉｃｔｕｒｅ），Ｂピクチャ（Ｂｉｄｉｒｅｃｔｉｏｎａｌｌｙｐｒｅｄｉｃｔｉｖｅ−Ｐｉｃｔｕｒｅ）と呼ばれる３つのタイプのピクチャを規定している。Ｉピクチャは、過去や未来の再生画像とは無関係に、独立して生成される。Ｐピクチャは順方向予測（過去のＩピクチャまたはＰピクチャからの予測）により生成される。Ｂピクチャは双方向予測により生成される。双方向予測においてＢピクチャは、以下に示す３つの予測のうちいずれか１つにより生成される。▲１▼順方向予測；過去のＩピクチャまたはＰピクチャからの予測、▲２▼逆方向予測；未来のＩピクチャまたはＰピクチャからの予測、▲３▼双方向予測；過去および未来のＩピクチャまたはＰピクチャからの予測。そして、これらＩ，Ｐ，Ｂピクチャがそれぞれエンコードされる。つまり、Ｉピクチャは過去や未来のピクチャが無くても生成される。これに対し、Ｐピクチャは過去のピクチャが無いと生成されず、Ｂピクチャは過去または未来のピクチャが無いと生成されない。
【０００８】
フレーム間予測では、まず、Ｉピクチャが周期的に生成される。次に、Ｉピクチャよりも数フレーム先のフレームがＰピクチャとして生成される。このＰピクチャは、過去から現在への一方向（順方向）の予測により生成される。続いて、Ｉピクチャの後、Ｐピクチャの前に位置するフレームがＢピクチャとして生成される。このＢピクチャを生成するとき、順方向予測，逆方向予測，双方向予測の３つの中から最適な予測方法が選択される。一般的に連続した動画では、現在の画像とその前後の画像とは良く似ており、異なっているのはその一部分に過ぎない。そこで、前のフレーム（例えば、Ｉピクチャ）と次のフレーム（例えば、Ｐピクチャ）とは同じであると仮定し、両フレーム間に変化があればその差分（Ｂピクチャ）のみを抽出して圧縮する。これにより、フレーム間のデータを時間的な相関に基づいて圧縮することができる。
【０００９】
ＭＰＥＧビデオパートに準拠してエンコードされたビデオデータのデータ列（ビットストリーム）は、ＭＰＥＧビデオストリーム（以下、ビデオストリームと略す）と呼ばれる。また、ＭＰＥＧオーディオパートに準拠してエンコードされたオーディオデータのデータ列は、ＭＰＥＧオーディオストリーム（以下、オーディオストリームと略す）と呼ばれる。そして、ビデオストリームとオーディオストリームは、ＭＰＥＧシステムパートに準拠して時分割多重化され、１本のデータ列としてのＭＰＥＧシステムストリーム（以下、システムストリームと略す）となる。システムストリームはマルチプレックスストリームとも呼ばれる。
【００１０】
ＭＰＥＧパートにおけるエンコードからデコードまでの流れは、以下のようになっている。ＭＰＥＧシステムエンコーダ（以下、システムエンコーダと略す）は、ビデオデータとオーディオデータのそれぞれを連係を保ちながら別個にエンコードを行い、ビデオストリームとオーディオストリームを生成する。次に、ＭＰＥＧシステムエンコーダに装備されたマルチプレクサ（ＭＵＸ；Ｍｕｌｔｉｐｌｅｘｅｒ）は、伝送媒体または記録媒体のフォーマットに適合するように、ビデオストリームとオーディオストリームの多重化を行い、システムストリームを生成する。そのシステムストリームは、伝送媒体を介してＭＵＸから伝送されるか、または記録媒体に記録される。
【００１１】
ＭＰＥＧシステムデコーダ（以下、システムデコーダと略す）に装備されたデマルチプレクサ（ＤＭＵＸ；ＤｅＭＵｌｔｉｐｌｅＸｅｒ）は、システムストリームをビデオストリームとオーディオストリームに分離する。次に、システムデコーダは各ストリームを個別にデコードして、ビデオのデコード出力（以下、ビデオ出力という）とオーディオのデコード出力（以下、オーディオ出力という）を生成する。ビデオ出力はディスプレイへ出力され、ディスプレイで動画が再生される。オーディオ出力はＤ／Ａ（Ｄｉｇｉｔａｌ／Ａｎａｌｏｇ）コンバータおよびオーディオアンプを介してスピーカへ出力され、スピーカから音声が再生される。
【００１２】
ところで、ＭＰＥＧ−１は主にビデオＣＤ（ＣｏｍｐａｃｔＤｉｓｃ），ＣＤ−ＲＯＭ（ＣＤ−ＲｅａｄＯｎｌｙＭｅｍｏｒｙ），ＤＶＤ（ＤｉｇｉｔａｌＶｉｄｅｏＤｉｓｃ）などの記録媒体を用いた蓄積メディアに対応しており、ＭＰＥＧ−２はＭＰＥＧ−１をも含む幅広い範囲のアプリケーションに対応している。
【００１３】
蓄積メディアにおいては、以下に示す２つの可変速再生が要求される。▲１▼動画を通常（標準）の再生速度より高速で再生（以下、高速再生という）する機能。▲２▼動画を通常の再生速度より低速で再生（以下、低速再生という）する機能。高速再生機能は、例えば、ユーザが短時間に動画を見るために早送り再生を行う際や、見たい動画を探索するために早送り再生または早送り逆転再生を行う際に用いられる。また、低速再生機能は、例えば、ユーザが動画を注意深く見る際などに用いられる。
【００１４】
記録媒体から読み出されたシステムストリームのビットレートは、読み出し速度に対応したものになる。従って、高速再生を行うには記録媒体からシステムストリームを高速で読み出し、低速再生を行うには記録媒体からシステムストリームを低速で読み出す。例えば、記録媒体としてビデオＣＤやＤＶＤを用いた場合には、ビデオＣＤやＤＶＤの回転速度を通常の再生時（標準再生時）よりも速くしたり遅くしたりすることで、システムストリームを所望の速度で読み出すようにする。
【００１５】
【発明が解決しようとする課題】
従来、ＭＰＥＧにおいては、前記したような動画の可変速再生については検討されていたものの、音声の可変速再生については何らの検討もなされていなかった。
【００１６】
オーディオストリームのビットレートはシステムストリームのそれと同一である。そのため、動画の高速再生時には、オーディオストリームのビットレートも大きくなり、再生される音声の音程（ピッチ）が上がるのに加えて、発声速度（話速）が速くなる。また、動画の低速再生時には、オーディオストリームのビットレートも小さくなり、再生される音声のピッチは変化しないものの、音声が途切れ途切れになる。このように、動画の可変速再生時には、音声が聞き苦しいものになるという問題があった。
【００１７】
ところで、近年、ピッチを変化させることなく話速を任意に制御する話速変換技術の開発が進められており、本出願人もＶＴＲやテープレコーダに利用可能な話速変換処理ＬＳＩを既に開発している（特開平７−１９２３９２号公報（Ｇ１１Ｂ２０／０２）、日経エレクトロニクス１９９４年１１月２１日号（Ｎｏ．６２２）Ｐ．９３〜９８．参照）。しかし、話速変換技術をＭＰＥＧに利用する試みはなされていない。
【００１８】
また、音声と動画（映像）の同期生成においては、「リップシンク」を考慮する必要がある。リップシンクとは、ディスプレイに映し出される人物の口の動きと、スピーカから発声される音声との同期がとれていることをいう。口の動きより音声の方が早くなったり、逆に遅くなったりする状態をリップシンクにずれがあるという。リップシンクのずれが人間の聴覚の許容範囲を外れると、視聴者は違和感を覚える。一般に、音声が動画より遅れることによって生じるリップシンクのずれとして許容できる時間は、約５０〜２５０ｍｓであるといわれている。
【００１９】
本発明は上記要求を満足するためになされたものであって、以下の目的を有するものである。
〔１〕可変速再生時においても自然で聞き易い音声を再生することが可能なＭＰＥＧオーディオ再生装置を提供する。
【００２０】
〔２〕上記〔１〕のＭＰＥＧオーディオ再生装置とＭＰＥＧビデオデコーダとを備えたＭＰＥＧ再生装置を提供する。
〔３〕上記〔１〕のＭＰＥＧオーディオ再生装置とＭＰＥＧビデオデコーダとを備え、音声と動画との時間ずれを低減することが可能なＭＰＥＧ再生装置を提供する。
【００２４】
【課題を解決するための手段】
請求項１に記載の発明は、記録媒体（２１）から読み出されたＭＰＥＧオーディオストリームをＭＰＥＧオーディオパートに準拠してデコードし、オーディオ信号を生成するＭＰＥＧオーディオデコーダ（３）と、オーディオ信号に対して話速変換処理を行う話速変換処理手段（２，４）とを備え、話速変換処理手段は、オーディオストリームのビットレートが通常時よりも大きい場合には、再生される各音声区間の時間長さを長くすると共に各無音区間の時間長さを短くするようにして話速変換処理を行い、オーディオストリームのビットレートが通常時よりも小さい場合には、再生される各音声区間の時間長さを長くすると共に各無音区間の時間長さを短くするか、または、各無音区間を削除して各音声区間をつなぎ合わせた後に無音区間を挿入するようにして話速変換処理を行うことをその要旨とする。
【００２５】
請求項２に記載の発明は、請求項１に記載のＭＰＥＧオーディオ再生装置において、話速変換処理手段（２，４）は、オーディオ信号を蓄積するリングメモリ（３２）と、リングメモリの蓄積量を検出する検出手段（３３）とを備え、リングメモリの蓄積量に応じて音声区間の時間長さの圧縮伸長率を調整することをその要旨とする。
【００２６】
請求項３に記載の発明は、請求項２に記載のＭＰＥＧオーディオ再生装置において、話速変換処理手段（２，４）は、オーディオ信号の音声区間と無音区間とを判別する音声判別部（４１）と、無音区間の削除処理または挿入処理を行う無音削除挿入部（４２）と、リングメモリ（３２）の蓄積量に基づいて音声区間の圧縮伸長処理を行うことで圧縮伸長率を調整する時間軸圧縮伸長部（４３）とを備えたことをその要旨とする。
【００２７】
請求項４に記載の発明は、請求項１〜３のいずれか１項に記載のＭＰＥＧオーディオ再生装置（１）と、記録媒体（２１）から読み出されたＭＰＥＧビデオストリームをＭＰＥＧビデオパートに準拠してデコードし、ビデオ信号を生成するＭＰＥＧビデオデコーダ（１２）とを備えたことをその要旨とする。
【００２８】
請求項５に記載の発明は、請求項２または請求項３に記載のＭＰＥＧオーディオ再生装置（１）と、記録媒体（２１）から読み出されたＭＰＥＧビデオストリームをＭＰＥＧビデオパートに準拠してデコードし、ビデオ信号を生成するＭＰＥＧビデオデコーダ（１２）と、リングメモリ（３２）に書き込まれる以前のオーディオ信号に、時刻に関する情報としてのインデックス信号を付加するインデックス付加回路（５１）と、リングメモリ（３２）から読み出されたオーディオ信号に付加されているインデックス信号を検出し、そのインデックス信号から得られる時刻情報と現在の時刻情報とから、話速変換処理手段（２，４）における信号遅延時間を検出し、その検出された遅延時間を示す信号をＭＰＥＧビデオデコーダ（１２）へ供給するインデックス検出回路（５２）とを備え、ＭＰＥＧビデオデコーダ（１２）は、前記遅延時間を示す信号に基づいて自己の動作のタイミングを制御することをその要旨とする。
【００２９】
請求項６に記載の発明は、請求項３に記載のＭＰＥＧオーディオ再生装置（１）と、記録媒体（２１）から読み出されたＭＰＥＧビデオストリームをＭＰＥＧビデオパートに準拠してデコードし、ビデオ信号を生成するＭＰＥＧビデオデコーダ（１２）と、音声判別部（４１）の処理結果と、オーディオストリームのビットレートとに基づいて、話速変換処理手段（２，４）における信号遅延時間を検出し、その検出された遅延時間を示す信号をＭＰＥＧビデオデコーダ（１２）へ供給する遅延時間検出回路（５３）とを備え、ＭＰＥＧビデオデコーダ（１２）は、前記遅延時間を示す信号に基づいて自己の動作のタイミングを制御することをその要旨とする。
【００３０】
請求項７に記載の発明は、請求項３に記載のＭＰＥＧオーディオ再生装置（１）と、記録媒体（２１）から読み出されたＭＰＥＧビデオストリームをＭＰＥＧビデオパートに準拠してデコードし、ビデオ信号を生成するＭＰＥＧビデオデコーダ（１２）と、リングメモリ（３２）の蓄積量に基づいて、話速変換処理済みのオーディオ信号とビデオ信号との同期を得るための制御信号を生成し、その制御信号をＭＰＥＧビデオデコーダ（１２）へ供給する制御回路（５４）とを備え、ＭＰＥＧビデオデコーダ（１２）は、前記制御信号に基づいて自己の動作のタイミングを制御することをその要旨とする。
【００３１】
請求項８に記載の発明は、請求項３に記載のＭＰＥＧオーディオ再生装置（１）と、記録媒体（２１）から読み出されたＭＰＥＧビデオストリームをＭＰＥＧビデオパートに準拠してデコードし、ビデオ信号を生成するＭＰＥＧビデオデコーダ（１２）と、音声判別部（４１）および時間軸圧縮伸長部（４３）の処理結果に基づいて、話速変換処理手段（２，４）における信号遅延時間を検出し、その検出された遅延時間を示す信号をＭＰＥＧビデオデコーダ（１２）へ供給する遅延時間検出回路（５５）とを備え、ＭＰＥＧビデオデコーダ（１２）は、前記遅延時間を示す信号に基づいて自己の動作のタイミングを制御することをその要旨とする。
【００３２】
【発明の実施の形態】
（第１実施形態）
以下、本発明を具体化した第１実施形態を図面に従って説明する。
【００３３】
図１に、本実施形態のブロック回路図を示す。
本実施形態のＭＰＥＧオーディオ再生装置１は、再生速度検出回路２、ＭＰＥＧオーディオデコーダ３、話速変換処理回路４、Ｄ／Ａコンバータ５、オーディオアンプ６から構成されている。尚、各回路２〜６は１チップのＬＳＩに搭載することもできる。
【００３４】
また、本実施形態のＭＰＥＧ再生装置２３は、ＭＰＥＧオーディオ再生装置１に加え、オーディオビデオパーサ（ＡＶパーサ）１１、ＭＰＥＧビデオデコーダ１２を備えている。
【００３５】
話速変換処理回路４は、例えば、ＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）３１、リングメモリ３２、アップダウンカウンタ３３、読み出しクロック生成回路３６を備えている。尚、話速変換処理回路４の動作については、前記文献（日経エレクトロニクス１９９４年１１月２１日号（Ｎｏ．６２２）Ｐ．９３〜９８．）に詳述されている。
【００３６】
再生速度検出回路２は、ビデオＣＤやＤＶＤなどの記録媒体２１から読み出されたＭＰＥＧシステムストリームのビットレートに対応したデコードクロックを生成する。そのデコードクロックは各回路１２，３，４へ出力される。
【００３７】
ＡＶパーサ１１は、デマルチプレクサ（ＤＭＵＸ）１３を備えており、記録媒体２１から読み出されたＭＰＥＧシステムストリームを入力する。ＤＭＵＸ１３は、システムストリームをＭＰＥＧビデオストリームとＭＰＥＧオーディオストリームに分離する。ビデオストリームはビデオデコーダ１２へ出力され、オーディオストリームはオーディオデコーダ３へ出力される。
【００３８】
ビデオデコーダ１２は、ＭＰＥＧビデオパートに準拠してビデオストリームをデコードし、ビデオ出力（以下、ビデオ信号という）を生成する。そのビデオ信号はディスプレイ２２へ出力され、ディスプレイ２２で動画が再生される。
【００３９】
オーディオデコーダ３は、ＭＰＥＧオーディオパートに準拠してオーディオストリームをデコードし、ディジタル信号のオーディオ出力（以下、オーディオ信号という）を生成する。そのオーディオ信号は話速変換処理回路４へ出力される。話速変換処理回路４において信号処理されたオーディオ信号はＤ／Ａコンバータ５によってＤ／Ａ変換された後、オーディオアンプ６で増幅されてスピーカ２３へ送られる。そして、スピーカ２３から音声が再生される。
【００４０】
記録媒体２１から読み出されたシステムストリームのビットレートは、読み出し速度に対応したものになる。また、各回路３，４，１２の動作はデコードクロックによって規定される。
【００４１】
従って、ビデオデコーダ１２は、システムストリームのビットレートに対応したビデオ信号を生成する。すなわち、システムストリームのビットレートが、通常の再生時（標準再生時）よりも大きければディスプレイ２２では動画が高速再生され、通常の再生時よりも小さければディスプレイ２２では動画が低速再生される。
【００４２】
また、オーディオデコーダ３は、システムストリームのビットレートに対応したオーディオ信号を生成する。すなわち、システムストリームのビットレートが、通常の再生時よりも大きければオーディオ信号のビットレートも大きくなり、通常の再生時より小さければオーディオ信号のビットレートも小さくなる。
【００４３】
ところで、ビデオ信号とオーディオ信号とは、通常の再生時において同期生成されるようになっている。
ＤＳＰ３１は、フレームメモリ３４および話速変換部３５から構成されている。フレームメモリ３４は、適宜なフレーム数分（例えば、２フレーム分）のオーディオ信号を記憶する。話速変換部３５は、フレームメモリ３４に記憶されたオーディオ信号に対してフレーム単位で話速変換処理を行い、話速変換処理済みのオーディオ信号（以下、データという）を生成する。尚、１フレームは、適宜な数（例えば、２００個）のサンプリングデータから構成される。
【００４４】
フレームメモリ３４の内部は、２つの領域（以下、Ａ領域、Ｂ領域と記載して区別する）に分けられている。オーディオデコーダ３から出力されたオーディオ信号がＢ領域に書き込まれるのと同時に、Ａ領域に蓄積されている１フレーム分のオーディオ信号が読み出されて話速変換部３５へ転送される。そして、Ｂ領域に１フレーム分のオーディオ信号が蓄積されると、今度は、Ｂ領域に蓄積された１フレーム分のオーディオ信号が読み出されて話速変換部３５へ転送され、それと同時に、オーディオデコーダ３から出力されたオーディオ信号がＡ領域に書き込まれる。
【００４５】
話速変換部３５の生成したデータは、話速変換部３５が生成した書き込みクロックに従ってリングメモリ３２に書き込まれる。リングメモリ３２は、例えば、ＦＩＦＯ（Ｆｉｒｓｔ−Ｉｎ−Ｆｉｒｓｔ−Ｏｕｔ）構成のＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）から成る。
【００４６】
読み出しクロック生成回路３６は、デコードクロックに従って読み出しクロックを生成する。
リングメモリ３２に蓄積されたデータは、読み出しクロックに従って読み出され、その読み出されたデータはＤ／Ａコンバータ５へ出力される。Ｄ／Ａコンバータ５は、読み出しクロックをサンプリング周波数として用いる。
【００４７】
書き込みクロックはアップダウンカウンタ３３のアップカウント入力端子ＵＰに入力され、読み出しクロックはアップダウンカウンタ３３のダウンカウント入力端子ＤＯＷＮに入力される。アップダウンカウンタ３３は、書き込みクロックの総数と読み出しクロックの総数との差をカウントする。そのカウント値は、リングメモリ３２の蓄積量に対応する。つまり、アップダウンカウンタ３３は、書き込みクロックと読み出しクロックとに基づいて、リングメモリ３２の蓄積量を検出する。そのリングメモリ３２の蓄積量は話速変換部３５へ出力される。
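The occupancy tracking described above can be sketched in a few lines. This is a minimal software model, not the circuit itself: the class name, the `capacity` parameter, and the overflow/underflow exceptions are assumptions added for illustration.

```python
class UpDownCounter:
    """Tracks ring-memory occupancy as (write clocks so far) - (read clocks so far)."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical ring-memory size, in samples
        self.count = 0

    def write_clock(self):
        # One write-clock pulse on the UP input: a sample entered the ring memory.
        if self.count >= self.capacity:
            raise OverflowError("ring memory overflow")
        self.count += 1

    def read_clock(self):
        # One read-clock pulse on the DOWN input: a sample left the ring memory.
        if self.count == 0:
            raise RuntimeError("ring memory underflow")
        self.count -= 1


counter = UpDownCounter(capacity=8)
for _ in range(5):
    counter.write_clock()
for _ in range(2):
    counter.read_clock()
occupancy = counter.count  # value reported to the speech-rate conversion unit
```

The counter never touches the stored data; it only needs the two clocks, which is why the occupancy can be reported to the conversion unit cheaply.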
【００４８】
図２に、話速変換部３５の内部構成を示す。
話速変換部３５は、音声判別部４１、無音削除挿入部４２、時間軸圧縮伸長部４３から構成されている。
【００４９】
音声判別部４１は、フレームメモリ３４から読み出されたオーディオ信号が、音声区間（音声が存在している区間）か、または、無音区間（音声が存在していない区間）かを判別する。尚、人間が発声する音声以外の背景雑音は無音区間として取り扱う。
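The text does not say how the discrimination unit 41 separates speech from silence. One common approach, shown here purely as an assumption, is a short-time energy threshold; the function name and the threshold value are hypothetical.

```python
def is_speech_frame(frame, threshold):
    """Return True when the mean energy of the frame exceeds the threshold.

    Frames at or below the threshold (including low-level background noise)
    are treated as silence, in line with the text above.
    """
    energy = sum(sample * sample for sample in frame) / len(frame)
    return energy > threshold


loud = [1000, -900, 1100, -1050]   # illustrative voiced samples
quiet = [3, -2, 4, -1]             # illustrative background noise
labels = [is_speech_frame(f, threshold=10_000.0) for f in (loud, quiet)]
```

A real detector would likely adapt the threshold to the measured noise level rather than use a fixed constant.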
【００５０】
無音削除挿入部４２は、音声判別部４１の判別した無音区間に対して、その無音区間の削除処理、または、新たな無音区間の挿入処理を行う。
時間軸圧縮伸長部４３は、音声判別部４１の判別した音声区間に対して、リングメモリ３２の蓄積量に基づいて圧縮処理または伸長処理を行う。
【００５１】
また、各部４２，４３は、その処理内容に対応した書き込みクロックを生成する。
次に、高速再生時における話速変換部３５の動作について説明する。
【００５２】
オーディオデコーダ３から出力されるオーディオ信号のビットレートは、オーディオストリームのそれと同一になる。従って、高速再生時には、通常の再生時に比べて、オーディオ信号のビットレートが大きくなる。通常の再生時よりもビットレートの大きなオーディオ信号をそのままＤ／Ａコンバータ５へ送った場合、通常の再生時に比べて、スピーカ２３から再生される音声のピッチは上がり話速は速くなる。
【００５３】
そこで、話速変換部３５において、スピーカ２３から再生される音声のピッチを通常の再生時とほぼ同一にし、且つ、スピーカ２３から再生される話速を通常の再生時に近づけるように話速変換処理を行う。
【００５４】
すなわち、無音削除挿入部４２は、音声判別部４１の判別した無音区間の継続長を算出し、その継続長が所定長以上の場合は無音区間を削除する。
また、時間軸圧縮伸長部４３は、音声判別部４１の判別した音声区間に対して、例えば、自己相関法を用いてピッチ抽出を行い、抽出したピッチ波形に対して圧縮処理を行う。その結果、高速再生時において、オーディオ信号のビットレートが大きくなった場合に、スピーカ２３から再生される音声区間の時間長さは伸長される。
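The silence-deletion rule for high-speed playback can be sketched as a filter over labelled segments. The `(kind, length)` tuple representation and the threshold value are assumptions for illustration; the text only states that silent intervals of at least a predetermined length are deleted.

```python
def delete_long_silences(segments, min_silence_len):
    """Drop silent segments whose duration is at least min_silence_len samples."""
    return [
        (kind, length)
        for kind, length in segments
        if not (kind == "silence" and length >= min_silence_len)
    ]


segments = [("speech", 400), ("silence", 50), ("speech", 300), ("silence", 900)]
kept = delete_long_silences(segments, min_silence_len=200)
```

Short pauses survive, so the rhythm inside a phrase is preserved while long gaps are removed.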
【００５５】
尚、時間軸圧縮伸長部４３における圧縮処理に際しては、無音区間の状態とリングメモリ３２の蓄積量とに応じて動的に圧縮率を変化させる。
例えば、同一のピッチ周期をもつ３周期波形を２周期波形に圧縮することで、２／３倍の圧縮（圧縮率；２／３）を得る。具体的には、３周期波形から、時間軸方向で前にある２周期波形と、後ろにある２周期波形とをそれぞれ切り出す。そして、前の２周期波形に単調減少する三角窓関数を、後ろの２周期波形に単調増加する三角窓関数をそれぞれ乗じる。この二つの波形を加算することで出力波形を得る。
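The three-to-two compression described above can be written out as a short overlap-add routine. This is a sketch under the assumption that the waveform is a plain list of samples and that the pitch period (in samples) is already known; the function name is hypothetical.

```python
def compress_three_to_two(wave, period):
    """Compress three pitch periods into two by windowed overlap-add."""
    assert len(wave) == 3 * period
    n = 2 * period
    front = wave[:n]        # the two periods at the front of the time axis
    rear = wave[period:]    # the two periods at the rear of the time axis
    out = []
    for i in range(n):
        rise = i / (n - 1)  # monotonically increasing triangular window
        fall = 1.0 - rise   # monotonically decreasing triangular window
        out.append(front[i] * fall + rear[i] * rise)
    return out


periodic = [0.0, 1.0, 0.0, -1.0] * 3
shorter = compress_three_to_two(periodic, period=4)
```

Because the two windows sum to one at every sample, a perfectly periodic input comes out unchanged apart from being one period shorter, which is why the pitch is preserved.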
【００５６】
また、０．９倍の圧縮（圧縮率；０．９）を得るには、例えば、１０周期波形から９周期波形に圧縮する。この場合は、先頭の３周期波形に対して同様の処理を施す。つまり、入力の１０周期波形のうち、先頭の３周期波形を除いた７周期波形は処理に使わない。
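The arithmetic behind these ratios is easy to check: if only the first three of M periods are compressed to two and the remaining M - 3 pass through untouched, the output has M - 1 periods. The helper below is a hypothetical illustration of that special case (N = M - 1), not a general M-to-N scheme.

```python
from fractions import Fraction


def compression_ratio(m):
    """Ratio obtained when the first 3 of m periods become 2 and the rest pass through."""
    if m < 3:
        raise ValueError("need at least 3 pitch periods")
    return Fraction(m - 1, m)


table = {m: compression_ratio(m) for m in (3, 4, 5, 10)}
```

Here `table[3]` is 2/3 and `table[10]` is 9/10, matching the two cases in the text.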
【００５７】
このＭ周期波形からＮ周期波形に圧縮する組み合わせを色々と用意しておくことで、多種類の圧縮率を得る。ところで、無音区間が短い場合、圧縮率が低い（圧縮の度合いが大きい）とリングメモリ３２がオーバーフローする恐れがある。これを防ぐためには、リングメモリ３２の蓄積量に応じて、時間軸圧縮伸長部４３における圧縮率を動的に変化させればよい。また、背景雑音が存在する場合、音声区間やピッチの抽出誤りが生じる。これを防ぐためには、音声判別部４１における音声区間の検出レベルを雑音信号に応じて変化させればよい。
【００５８】
次に、低速再生時における話速変換部３５の動作について、図３および図４に従って説明する。
図３に、通常の再生時および０．５倍速再生時において再生される音声の例を示す。
【００５９】
低速再生時には、通常の再生時に比べて、オーディオ信号のビットレートが小さくなる。そのため、方法１に示すように、通常の再生時よりもビットレートの小さなオーディオ信号をそのままＤ／Ａコンバータ５へ送った場合、通常の再生時に比べて、スピーカ２３から再生される音声のピッチは変化しないものの、音声が途切れ途切れになる。つまり、各音声区間（「あ」「い」「う」「え」）の時間長さは通常の再生時のそれと変わらず、全く音の存在していない無音区間が各音声区間の間に挿入されるため、音声が途切れ途切れになり、ユーザは聴感上違和感を覚える。
【００６０】
そこで、話速変換部３５において、方法２または方法３に示すように話速変換処理を行う。尚、ＭＰＥＧオーディオでは、低速再生時に音声のピッチが変化しないため、高速再生時のように時間軸圧縮伸長部４３においてピッチを変える処理を行う必要はない。
【００６１】
（方法２）
方法２では、時間軸圧縮伸長部４３において各音声区間の長さを伸長させ、それと共に、無音削除挿入部４２において各無音区間の長さを短くすることで、音声の途切れを目立たなくする。
【００６２】
尚、時間軸圧縮伸長部４３において音声区間の長さを伸長させるには、音声判別部４１の判別した音声区間に対して、例えば、自己相関法を用いてピッチ抽出を行い、抽出したピッチ波形に対して伸長処理を行う。例えば、同一のピッチ周期をもつ２周期波形を３周期波形に伸長することで、３／２倍の伸長（伸長率；３／２）を得る。また、同一のピッチ周期をもつ３周期波形を４周期波形に伸長することで、４／３倍の伸長（伸長率；４／３）を得る。その結果、低速再生時において、オーディオ信号のビットレートが小さくなった場合に、スピーカ２３から再生される音声区間の時間長さは伸長される。
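The text gives the expansion ratios but not the waveform operation. One plausible way to stretch two periods into three, shown here as an assumption, is to insert a cross-faded period between them; the function name and the linear fade are hypothetical.

```python
def expand_two_to_three(wave, period):
    """Stretch two pitch periods into three by inserting a cross-faded period."""
    assert len(wave) == 2 * period
    first, second = wave[:period], wave[period:]
    middle = []
    for i in range(period):
        rise = i / (period - 1) if period > 1 else 1.0
        # Start like `second` (natural continuation of `first`) and end like
        # `first` (natural lead-in to `second`), so both joins stay smooth.
        middle.append(second[i] * (1.0 - rise) + first[i] * rise)
    return first + middle + second


periodic = [0.0, 1.0, 0.0, -1.0] * 2
longer = expand_two_to_three(periodic, period=4)
```

As with compression, a perfectly periodic input keeps its period, so only the duration changes.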
【００６３】
このとき、音声区間を伸長し過ぎると、音声区間が間延びして聞こえるため、音声の途切れは目立たなくなるものの、やはり不自然になる。これを防止するには、通常の再生時における音声区間の長さＬ１に対して、低速再生時における音声区間の長さＬ２を、例えば、以下の式に示すように設定する。
【００６４】
Ｌ２／Ｌ１≦１．４
尚、上記式は０．５倍速再生時だけでなく、あらゆる倍率の低速再生時に適用できる。ここで、時間軸圧縮伸長部４３における音声区間の伸長率は一定値にしてもよく、以下の▲１▼▲２▼に示すように可変にしてもよい。
【００６５】
▲１▼リングメモリ３２の蓄積量に対応して音声区間の伸長率を動的に変化させる。無音区間が短い場合、音声区間の伸長率が大きい（伸長の度合いが大きい）とリングメモリ３２がオーバーフローする恐れがある。これを防ぐためには、音声区間の伸長率を小さくすればよい。
【００６６】
▲２▼音声のピッチ変化に対応して音声区間の伸長率を動的に変化させる。つまり、図４に示すように、音声のピッチ変化に対応して音声区間の伸長率を変化させることで、話速を変化させる。この場合、音声の聞き易さをさらに向上させることができる。尚、音声のピッチ変化に対応して音声区間の伸長率を変化させることで話速を変化させる技術は公知である（信学技報ＳＰ９２−５６，ＨＣ９２−３３（１９９２−０９），Ｐ．４９〜５６．参照）。
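A minimal sketch of how the cap L2/L1 ≤ 1.4 and the ring-memory occupancy might be combined when choosing a stretch ratio. The policy itself (falling back to no stretching above a high-water mark) and the `high_water` parameter are assumptions; the text only says the ratio should be reduced when overflow threatens.

```python
MAX_STRETCH = 1.4  # upper bound on L2/L1 given in the text


def choose_stretch(requested, occupancy, capacity, high_water=0.8):
    """Pick a speech-segment stretch ratio that respects the cap and buffer headroom."""
    ratio = min(requested, MAX_STRETCH)
    if occupancy / capacity >= high_water:
        # Ring memory is nearly full: stop stretching to avoid overflow.
        ratio = 1.0
    return ratio


capped = choose_stretch(2.0, occupancy=100, capacity=1000)
backed_off = choose_stretch(1.2, occupancy=900, capacity=1000)
```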
【００６７】
（方法３）
方法３では、無音削除挿入部４２において、各無音区間を削除して各音声区間をつなぎ合わせた後で、音声区間に続いて新たに無音区間を挿入することで、音声の途切れを目立たなくする。尚、挿入する無音区間は、以下の▲１▼〜▲３▼のいずれであってもよい。
【００６８】
▲１▼全く音の存在しない無音区間。
▲２▼視聴者が違和感を覚えないような白色雑音を含む無音区間。尚、そのような白色雑音は、予め作成して別メモリ（図示略）に記憶しておく。
【００６９】
▲３▼音声判別部４１において無音区間と判別したオーディオ信号を別メモリ（図示略）に保持しておき、それを無音区間として挿入する。
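Method 3 can be sketched on labelled segments: remove every silent interval, join the speech, then append one new silent interval. Making the appended silence as long as the total removed silence is an assumption chosen here so that the overall duration is unchanged; the text does not fix its length.

```python
def regroup_silence(segments):
    """Join all speech segments and append a single silence of the removed length."""
    speech_total = sum(length for kind, length in segments if kind == "speech")
    removed = sum(length for kind, length in segments if kind == "silence")
    joined = [("speech", speech_total)] if speech_total else []
    if removed:
        joined.append(("silence", removed))
    return joined


choppy = [("speech", 100), ("silence", 100), ("speech", 100), ("silence", 100)]
smooth = regroup_silence(choppy)
```

The inserted silence could equally be white noise or stored background audio, as the three options above describe.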
このように、本実施形態によれば、以下の作用および効果を得ることができる。
【００７０】
（１）話速変換処理回路４を設けることで、高速再生時において、スピーカ２３から再生される音声のピッチを通常の再生時とほぼ同一にし、且つ、スピーカ２３から再生される話速を通常の再生時に近づけることが可能になり、自然で聞き易い音声を再生することができる。
【００７１】
ところで、ｍ倍速再生時（ｍ＞１）には、オーディオストリームおよびデコードクロックのビットレートは通常の再生時のｍ倍になる。このとき、話速変換部３５から出力されるデータのビットレートを通常の再生時とほぼ同一になるようにすれば、再生される音声のピッチを通常の再生時とほぼ同一にすることができる。すなわち、話速変換部３５においてビットレートをｍ→１に変換すれば、再生される音声のピッチは通常の再生時とほぼ同一になる。
【００７２】
（２）話速変換処理回路４を設けることで、低速再生時において再生される音声の途切れを目立たなくすることが可能になり、自然で聞き易い音声を再生することができる。
【００７３】
ところで、上記方法２と方法３とを、以下の（１）（２）に示すように併用してもよい。
（１）ＭＰＥＧオーディオ再生装置１のユーザが、方法２と方法３とを任意に切り換え選択できるようにする。このようにすれば、個々のユーザの聴覚特性に合わせることが可能になり、ユーザにとって聞き易い音声を再生することができる。
（２）低速再生の倍率に対応して方法２と方法３とが自動的に切り換え選択されるようにする。例えば、１〜０．５倍速再生時には方法３が選択され、０．５倍速以下の再生時には方法２が選択されるようにする。このようにすれば、再生速度に応じて、自然な音声を再生することができる。
【００７４】
（３）各回路２〜６を１チップのＬＳＩに搭載した場合には、ＭＰＥＧオーディオ再生装置１を小型化することができる。
（第２実施形態）
以下、本発明を具体化した第２実施形態を図面に従って説明する。尚、本実施形態において、第１実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【００７５】
図５に、本実施形態の要部ブロック回路図を示す。本実施形態において、第１実施形態と異なるのは、インデックス付加回路５１およびインデックス検出回路５２が設けられている点だけである。
【００７６】
インデックス付加回路５１は、フレームメモリ３４の前段（すなわち、ＭＰＥＧオーディオデコーダ３と話速変換処理回路４の間）に設けられている。インデックス付加回路５１は、デコードクロックに従って、オーディオデコーダ３の生成したオーディオ信号に一定周期でインデックス信号を付加する。そのインデックス信号が付加されたオーディオ信号は、フレームメモリ３４へ出力される。
【００７７】
インデックス検出回路５２は、リングメモリ３２から読み出されたデータに付加されているインデックス信号を検出し、そのインデックス信号から得られる時刻情報と現在時刻とから、話速変換処理回路４が信号処理に要する時間Δｔを算出し、その時間Δｔに関する検出信号をビデオデコーダ１２へ供給する。ビデオデコーダ１２は、その時間Δｔに関する検出信号に従って、自己の動作のタイミングを制御する。
【００７８】
このように、本実施形態によれば、第１実施形態の作用および効果に加えて、以下の作用および効果を得ることができる。
（１）前記したように、ビデオデコーダ１２の生成するビデオ信号と、オーディオデコーダ３の生成するオーディオ信号とは、通常の再生時において同期生成されるようになっている。そのため、オーディオデコーダ３とＤ／Ａコンバータ５の間に話速変換処理回路４を設けると、話速変換処理回路４における信号処理に要する時間分（すなわち、話速変換処理回路４における遅延時間分）だけ、オーディオ信号が遅延することになる。
【００７９】
そこで、インデックス付加回路５１を用いて、フレームメモリ３４へ入力されるオーディオ信号に予め一定周期でインデックス信号を付加する。
インデックス検出回路５２は、リングメモリ３２から読み出されたデータに付加されているインデックス信号を検出し、話速変換処理回路４が信号処理に要する時間Δｔを算出し、その時間Δｔに関する検出信号をビデオデコーダ１２へ供給する。ビデオデコーダ１２は、その時間Δｔに関する検出信号に従って、自己の動作のタイミングを制御する。また、インデックス検出回路５２が次にインデックス信号を検出したとき、ビデオデコーダ１２は、そのときに算出された時間と前回算出された時間との差だけ、自己の動作のタイミングを遅らせたり早めたりする。
【００８０】
その結果、話速変換処理回路４における遅延時間に関係なく、リングメモリ３２から読み出されたデータ（すなわち、話速変換処理済みのオーディオ信号）とビデオ信号との同期をとることができる。
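The index-signal mechanism amounts to timestamping samples before the conversion circuit and comparing against the current time at read-out. The sketch below assumes times in milliseconds and hypothetical names; it returns, for each detected index, how far the video decoder timing would be shifted.

```python
def delay_adjustments(index_times_ms, readout_times_ms):
    """Return the video timing correction at each detected index signal."""
    adjustments = []
    previous_delay = 0
    for stamped, seen in zip(index_times_ms, readout_times_ms):
        delay = seen - stamped                      # Δt through the conversion circuit
        adjustments.append(delay - previous_delay)  # shift by the change since last time
        previous_delay = delay
    return adjustments


steps = delay_adjustments([0, 100, 200], [50, 180, 260])
```

A positive step delays the video decoder and a negative step advances it, so the video follows the audio delay as it drifts.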
【００８１】
（２）上記（１）より、スピーカ２３で再生される音声と、ディスプレイ２２で再生される動画との時間ずれを低減することが可能になり、リップシンクのずれを人間の聴覚の許容範囲内にすることができる。
【００８２】
（３）オーディオ信号に付加されたインデックス信号は、無音削除挿入部４２によって削除されることがある。しかし、インデックス信号を付加する周期を短くして、オーディオ信号に十分な数のインデックス信号を付加しておけば、そのインデックス信号の内のいくつかが無音削除挿入部４２によって削除されたとしても、リングメモリ３２から読み出されたデータには一定数以上のインデックス信号が残ることになる。その残ったインデックス信号により、上記（１）の作用および効果を得ることができる。
【００８３】
（第３実施形態）
以下、本発明を具体化した第３実施形態を図面に従って説明する。尚、本実施形態において、第２実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【００８４】
図６に、本実施形態の要部ブロック回路図を示す。本実施形態において、第２実施形態と異なるのは、インデックス付加回路５１が、フレームメモリ３４と音声判別部４１の間に設けられている点だけである。インデックス付加回路５１は、デコードクロックに従って、フレームメモリ３４から読み出されたオーディオ信号に一定周期でインデックス信号を付加する。そのインデックス信号が付加されたオーディオ信号は、音声判別部４１へ出力される。
【００８５】
前記したように、フレームメモリ３４が２フレーム分のオーディオ信号を蓄積する場合、フレームメモリ３４の記憶容量は、例えば、０．８Ｋバイト程度あれば十分である。このように、フレームメモリ３４の記憶容量が小さい場合には、話速変換処理回路４における遅延時間に比べて、フレームメモリ３４における書き込み動作および読み出し動作に要する時間（すなわち、フレームメモリ３４における遅延時間）は僅かであり、無視しても差し支えない。
【００８６】
従って、本実施形態によれば、第２実施形態と同様の作用および効果を得ることができる。
（第４実施形態）
以下、本発明を具体化した第４実施形態を図面に従って説明する。尚、本実施形態において、第２実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【００８７】
図７に、本実施形態の要部ブロック回路図を示す。本実施形態において、第２実施形態と異なるのは、インデックス付加回路５１が、音声判別部４１と無音削除挿入部４２および時間軸圧縮伸長部４３との間にそれぞれ設けられている点だけである。インデックス付加回路５１は、デコードクロックに従って、音声判別部４１における信号処理が済んだオーディオ信号に一定周期でインデックス信号を付加する。そのインデックス信号が付加されたオーディオ信号は、無音削除挿入部４２および時間軸圧縮伸長部４３へ出力される。
【００８８】
前記したように、フレームメモリ３４の記憶容量が小さい場合には、話速変換処理回路４における遅延時間に比べて、フレームメモリ３４における遅延時間は僅かであり、無視しても差し支えない。
【００８９】
また、音声判別部４１における信号処理に要する時間（すなわち、音声判別部４１における遅延時間）は、話速変換処理回路４における遅延時間に比べて僅かであり、無視しても差し支えない。
【００９０】
従って、本実施形態によれば、第２実施形態と同様の作用および効果を得ることができる。
（第５実施形態）
以下、本発明を具体化した第５実施形態を図面に従って説明する。尚、本実施形態において、第２実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【００９１】
図８に、本実施形態の要部ブロック回路図を示す。本実施形態において、第２実施形態と異なるのは、インデックス付加回路５１が、無音削除挿入部４２および時間軸圧縮伸長部４３とリングメモリ３２との間に設けられている点だけである。インデックス付加回路５１は、デコードクロックに従って、各部４２，４３における信号処理が済んだオーディオ信号に一定周期でインデックス信号を付加する。そのインデックス信号が付加されたオーディオ信号は、リングメモリ３２へ出力される。
【００９２】
前記したように、フレームメモリ３４の記憶容量が小さい場合には、話速変換処理回路４における遅延時間に比べて、フレームメモリ３４における遅延時間は僅かであり、無視しても差し支えない。
【００９３】
また、各部４１〜４３における信号処理に要する時間（すなわち、各部４１〜４３における遅延時間）は、話速変換処理回路４における遅延時間に比べて僅かであり、無視しても差し支えない。
【００９４】
つまり、話速変換処理回路４における遅延時間は、主に、リングメモリ３２における書き込み動作および読み出し動作に要する時間（すなわち、リングメモリ３２における遅延時間）によって決定される。
【００９５】
従って、本実施形態によれば、第２実施形態と同様の作用および効果を得ることができる。また、本実施形態によれば、第２実施形態のようにオーディオ信号に付加されたインデックス信号が無音削除挿入部４２によって削除されることがない。そのため、付加したインデックス信号が全て活用され、インデックス信号の数を減らすことが可能になることから、インデックス付加回路５１の回路規模を小さくすることができる。
【００９６】
（第６実施形態）
以下、本発明を具体化した第６実施形態を図面に従って説明する。尚、本実施形態において、第１実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【００９７】
図９に、本実施形態の要部ブロック回路図を示す。本実施形態において、第１実施形態と異なるのは、遅延時間検出回路５３が設けられている点だけである。
前記したように、音声判別部４１は、フレームメモリ３４から読み出されたオーディオ信号が、音声区間か又は無音区間かを判別する。つまり、音声判別部４１の処理結果には、オーディオ信号に音声が含まれているか否かという情報が含まれている。
【００９８】
また、デコードクロックは、システムストリームのビットレートに対応している。つまり、デコードクロックには、予めオーディオ信号の圧縮伸長率の情報が含まれている。
【００９９】
そこで、遅延時間検出回路５３は、オーディオ信号に音声が含まれているか否かという情報と圧縮伸長率の情報とに基づいて、話速変換処理回路４における遅延時間を検出し、その検出信号をビデオデコーダ１２へ供給する。ビデオデコーダ１２は、遅延時間検出回路５３の検出信号に基づいて、自己の動作のタイミングを制御する。その結果、話速変換処理回路４における遅延時間に関係なく、リングメモリ３２から読み出されたデータ（すなわち、話速変換処理済みのオーディオ信号）とビデオ信号との同期をとることができる。
【０１００】
このように、本実施形態によれば、第２実施形態と同様の効果を得ることができる。
（第７実施形態）
以下、本発明を具体化した第７実施形態を図面に従って説明する。尚、本実施形態において、第１実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【０１０１】
図１０に、本実施形態の要部ブロック回路図を示す。本実施形態において、第１実施形態と異なるのは、制御回路５４が設けられている点だけである。
制御回路５４は、アップダウンカウンタ３３の検出したリングメモリ３２の蓄積量に基づいて、ビデオデコーダ１２の動作速度を制御するための制御信号を生成し、その制御信号をビデオデコーダ１２へ供給する。ビデオデコーダ１２は、制御回路５４の制御信号に基づいて、自己の動作のタイミングを制御する。その結果、リングメモリ３２から読み出されたデータと、ビデオデコーダ１２の生成するビデオ信号との同期をとることができる。
【０１０２】
前記したように、話速変換処理回路４における遅延時間は、主にリングメモリ３２における遅延時間によって決定される。リングメモリ３２における遅延時間は、その蓄積量と相関関係があり、蓄積量が大きくなるほど遅延時間も大きくなる。従って、リングメモリ３２の蓄積量に基づいてビデオデコーダ１２の動作速度を制御すれば、リングメモリ３２から読み出されたデータ（すなわち、話速変換処理済みのオーディオ信号）とビデオ信号との同期をとることができる。
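The relation between occupancy and delay can be made concrete: a sample written now waits behind everything already stored, so the delay is roughly the occupancy divided by the read rate. The 44.1 kHz read rate below is an assumed example value, not a figure from the text.

```python
def ring_delay_ms(occupancy_samples, read_rate_hz):
    """Approximate delay through the ring memory, in milliseconds."""
    return 1000.0 * occupancy_samples / read_rate_hz


delay = ring_delay_ms(occupancy_samples=4410, read_rate_hz=44100)
```

With these example numbers the delay is 100 ms, so a control circuit could hold the video back by about that amount.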
【０１０３】
このように、本実施形態によれば、第２実施形態と同様の効果を得ることができる。
（第８実施形態）
以下、本発明を具体化した第８実施形態を図面に従って説明する。尚、本実施形態において、第１実施形態と同じ構成部材については符号を等しくしてその詳細な説明を省略する。
【０１０４】
図１１に、本実施形態の要部ブロック回路図を示す。本実施形態において、第１実施形態と異なるのは、遅延時間検出回路５５が設けられている点だけである。
【０１０５】
前記したように、音声判別部４１の処理結果には、オーディオ信号に音声が含まれているか否かという情報が含まれている。
また、時間軸圧縮伸長部４３の処理結果には、オーディオ信号の圧縮伸長率の情報が含まれている。
【０１０６】
そこで、遅延時間検出回路５５は、オーディオ信号に音声が含まれているか否かという情報と圧縮伸長率の情報とに基づいて、話速変換処理回路４における遅延時間を検出し、その検出信号をビデオデコーダ１２へ供給する。ビデオデコーダ１２は、遅延時間検出回路５５の検出信号に基づいて、自己の動作のタイミングを制御する。その結果、話速変換処理回路４における遅延時間に関係なく、リングメモリ３２から読み出されたデータ（すなわち、話速変換処理済みのオーディオ信号）とビデオ信号との同期をとることができる。
【０１０７】
このように、本実施形態によれば、第２実施形態と同様の効果を得ることができる。
図１２に、可変速再生機能を備えたＭＰＥＧビデオデコーダ１２の要部ブロック回路を示す。
【０１０８】
ＭＰＥＧビデオデコーダ１２は、ビットバッファ２０２、ピクチャヘッダ検出回路２０３、ＭＰＥＧビデオデコードコア回路（以下、デコードコア回路と略す）２０４、可変閾値オーバーフロー判定回路（以下、判定回路と略す）２０５、ピクチャスキップ回路２０６、制御コア回路２０７から構成されている。尚、各回路２０３〜２０７は１チップのＬＳＩに搭載することもできる。
【０１０９】
制御コア回路２０７は各回路２０２〜２０６を制御する。
ＡＶパーサ１１から転送されてきたＭＰＥＧビデオストリームはビットバッファ２０２へ入力される。
【０１１０】
ビットバッファ２０２はＦＩＦＯ構成のＲＡＭから成るリングメモリによって構成され、転送されてくるビデオストリームをそのまま順次蓄積する。
ピクチャヘッダ検出回路２０３は、ビットバッファ２０２に蓄積されたビデオストリームの各ピクチャの先頭に付くピクチャヘッダを検出し、その各ピクチャヘッダに規定されているピクチャのタイプ（Ｉ，Ｐ，Ｂ）を検出する。
【０１１１】
制御コア回路２０７は、ピクチャヘッダ検出回路２０３の検出結果と後記する判定回路２０５の判定結果とに基づいて、ビットバッファ２０２から１フレーム期間毎に適宜なピクチャ分のビデオストリームを読み出す。尚、ビットバッファ２０２から読み出されたビデオストリームは、読み出された後もビットバッファ２０２にそのまま残される。
【０１１２】
ビットバッファ２０２から読み出された各ピクチャは、ピクチャスキップ回路２０６を介してデコードコア回路２０４へ転送される。
デコードコア回路２０４は、各ピクチャをＭＰＥＧビデオパートに準拠してデコードし、各ピクチャ毎のビデオ信号を生成する。
【０１１３】
ピクチャスキップ回路２０６は、制御コア回路２０７の制御に従って各ノード２０６ａ，２０６ｂ側への接続が切り換えられる。そして、ピクチャスキップ回路２０６がノード２０６ａ側に接続されると、ビットバッファ２０２から読み出されたピクチャはそのままデコードコア回路２０４へ転送される。また、ノード２０６ｂ側に接続されると、ビットバッファ２０２から読み出されたピクチャはデコードコア回路２０４へ転送されずにスキップされる。その結果、デコードコア回路２０４へ転送されるピクチャは、ピクチャスキップ回路２０６によってスキップされた分だけピクチャ単位で間引かれる。
【０１１４】
判定回路２０５は、再生速度検出回路２の生成したデコードクロックに基づいてビットバッファ２０２の占有量Ｂｍの閾値Ｂｔｈｎを設定し、ビットバッファ２０２の占有量Ｂｍと閾値Ｂｔｈｎとを比較する。尚、判定回路２０５では、再生速度検出回路２の生成した実際のデコードクロックの周波数と、通常の再生時のデコードクロックの周波数との比を求め、その比を再生速度の倍率ｎとする。従って、２倍速再生時には倍率ｎ＝２となり、閾値Ｂｔｈｎ＝Ｂｔｈ２となる。また、通常の再生時には倍率ｎ＝１となり、閾値Ｂｔｈｎ＝Ｂｔｈ１となる。
【０１１５】
そして、判定回路２０５は、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈｎを越えない場合には、ビットバッファ２０２がオーバーフローする恐れがなく正常であると判定する。この場合、制御コア回路２０７は、ビットバッファ２０２から１ピクチャ分のビデオストリームを読み出す。そして、制御コア回路２０７は、ピクチャスキップ回路２０６をノード２０６ａ側に接続し、そのビットバッファ２０２から読み出されたピクチャをデコードコア回路２０４へ転送させる。
【０１１６】
また、判定回路２０５は、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈｎを越えた場合には、ビットバッファ２０２がオーバーフローする恐れがあると判定する。この場合、制御コア回路２０７は、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈｎを下回るまで、ビットバッファ２０２から適宜なピクチャ分のビデオストリームを読み出す。そして、制御コア回路２０７は、ピクチャスキップ回路２０６をノード２０６ｂ側に接続し、そのビットバッファ２０２から読み出された適宜なピクチャ分のビデオストリームを全てスキップさせる。
【０１１７】
図１３に、ビットバッファ２０２の占有量Ｂｍの変化を示す。
ビットバッファ２０２の占有量ＢｍはビットレートＲＢをグラフの傾きとして上昇する。ビットレートＲＢは、シーケンスの先頭に付くシーケンスヘッダのＢＲ（ＢｉｔＲａｔｅ）に従って式（１）に示すように規定される。また、ＡＶパーサ１１から転送されてくるビデオストリームのピクチャレートＲＰはシーケンスヘッダのＰＲ（ＰｉｃｔｕｒｅＲａｔｅ）によって規定される。そして、ビットバッファ２０２の容量Ｂは、シーケンスヘッダのＶＢＶ（Ｖｂｖ［ＶｉｄｅｏＢｕｆｆｅｒｒｉｎｇＶｅｒｉｆｉｅｒ］ＢｕｆｆｅｒＳｉｚｅ）に従って式（２）に示すように規定される。そして、１フレーム期間毎に、デコードコア回路２０４がそのときデコードしようとする１ピクチャ分のビデオストリームが、ビットバッファ２０２から一気に読み出される。ここで、１フレーム期間にビットバッファ２０２に入力されるビデオストリームのデータ量Ｘは、ビットレートＲＢおよびピクチャレートＲＰに従って式（３）に示すように規定される。従って、ビットバッファ２０２から１ピクチャ分のビデオストリームが一気に読み出された直後のビットバッファ２０２の占有量Ｂｍ（＝Ｂ０〜Ｂ６）は、データ量Ｘとビットバッファ２０２の容量Ｂとに基づいて、式（４）に示す条件を満たすように規定される。
【０１１８】
ＲＢ＝４００×ＢＲ ………（１）
Ｂ＝１６×１０２４×ＶＢＶ ………（２）
Ｘ＝ＲＢ／ＲＰ ………（３）
０＜Ｂｍ＜Ｂ−Ｘ＝Ｂ−（ＲＢ／ＲＰ） ………（４）
式（４）に示す条件を満たすようにビットバッファ２０２の占有量Ｂｍが規定されていれば、ビットバッファ２０２がオーバーフローしたりアンダーフローしたりすることはない。逆に言えば、ビットバッファ２０２の占有量Ｂｍが閾値（Ｂ−Ｘ）を越えると、次の１フレーム期間にビットバッファ２０２に入力されるビデオストリームによってビットバッファ２０２がオーバーフローする可能性が極めて高くなる。
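Equations (1) to (4) can be evaluated directly from the sequence-header fields. The sketch below uses hypothetical function names and example header values (BR = 2875, VBV = 20, 30 pictures/s) that are assumptions, not figures from the text; quantities are taken to be in bits, following the MPEG-1 definitions of these fields.

```python
def vbv_parameters(br, vbv, picture_rate):
    """Return (RB, B, X) from the sequence-header values BR, VBV and PR."""
    rb = 400 * br            # eq. (1): bit rate
    b = 16 * 1024 * vbv      # eq. (2): bit-buffer capacity
    x = rb / picture_rate    # eq. (3): data arriving in one frame period
    return rb, b, x


def occupancy_ok(bm, b, x):
    """Eq. (4): occupancy right after a picture is read out must stay in (0, B - X)."""
    return 0 < bm < b - x


rb, b, x = vbv_parameters(br=2875, vbv=20, picture_rate=30)
safe = occupancy_ok(200_000, b, x)
risky = occupancy_ok(300_000, b, x)
```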
【０１１９】
ビデオデコーダ１２では、通常の再生時において、式（４）が満たされるように、ビットレートＲＢ、ピクチャレートＲＰ、容量Ｂの各値が規定されている。つまり、式（２）に示すようにビットバッファ２０２の容量Ｂを設定しておけば、ピクチャスキップ回路２０６の接続をノード２０６ａ側に固定しておいたとしても、理想的な状態ではビットバッファ２０２がオーバーフローしたりアンダーフローしたりすることはない。
【０１２０】
従って、通常の再生時において、ビットバッファ２０２から１ピクチャ分のデータが一気に読み出された直後の占有量Ｂｍ（＝Ｂ０〜Ｂ４）は、閾値Ｂｔｈ１に基づいて、式（５）に示す条件を満たすように規定される。尚、閾値Ｂｔｈ１は、式（４）に基づいて、式（６）に示すように設定される。
【０１２１】
０＜Ｂｍ＜Ｂｔｈ１＜Ｂ ………（５）
Ｂｔｈ１＝Ｂ−Ｘ＝Ｂ−（ＲＢ／ＲＰ） ………（６）
ところで、実際の状態では、式（２）に示すようにビットバッファ２０２の容量Ｂを設定しておいても、ピクチャスキップ回路２０６の接続をノード２０６ａ側に固定しておくと、ビットバッファ２０２がオーバーフローする恐れがある。
【０１２２】
しかし、ビデオデコーダ１２では、通常の再生時において、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈ１を越えた場合、ビットバッファ２０２がオーバーフローする恐れがあると判定される。すると、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈ１を下回るまで、ビットバッファ２０２から適宜なピクチャ分のビデオストリームが読み出される。そして、ピクチャスキップ回路２０６はノード２０６ｂ側に接続され、そのビットバッファ２０２から読み出された適宜なピクチャ分のビデオストリームは全てスキップされる。従って、ビデオデコーダ１２によれば、通常の再生時において、ビットバッファ２０２がオーバーフローすることはない。
【０１２３】
高速再生時におけるビットバッファ２０２の占有量Ｂｍはビットレートｎ×ＲＢをグラフの傾きとして上昇する。例えば、２倍速再生時におけるビットバッファ２０２の占有量Ｂｍはビットレート２×ＲＢをグラフの傾きとして上昇する。
【０１２４】
従って、高速再生時において、ビットバッファ２０２から１ピクチャ分のデータが一気に読み出された直後の占有量Ｂｍ（＝Ｂ０〜Ｂ４）は、閾値Ｂｔｈｎに基づいて、式（７）に示す条件を満たすように規定される。尚、閾値Ｂｔｈｎは式（８）に示すように設定される。
【０１２５】
０＜Ｂｍ＜Ｂｔｈｎ ………（７）
Ｂｔｈｎ＝Ｂ−ｎ×Ｘ＝Ｂ−（ｎ×ＲＢ／ＲＰ） ………（８）
高速再生時においては、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈｎを越えた場合、ビットバッファ２０２がオーバーフローする恐れがあると判定される。例えば、２倍速再生時には占有量Ｂｍが閾値Ｂｔｈ２（＝Ｂ−（２×ＲＢ／ＲＰ））を越えた場合、３倍速再生時には占有量Ｂｍが閾値Ｂｔｈ３（＝Ｂ−（３×ＲＢ／ＲＰ））を越えた場合に、ビットバッファ２０２がオーバーフローする恐れがあると判定される。すると、ビットバッファ２０２の占有量Ｂｍが閾値Ｂｔｈｎを下回るまでビットバッファ２０２から適宜なピクチャ分のビデオストリームが読み出され、そのビデオストリームは全てスキップされる。従って、ビデオデコーダ１２によれば、高速再生時において、ビットバッファ２０２がオーバーフローすることはない。
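The speed-dependent threshold of equation (8) and the resulting skip decision can be sketched as follows. Function names, the clock frequencies, and the example buffer values are assumptions for illustration.

```python
def speed_factor(actual_clock_hz, normal_clock_hz):
    """Playback speed multiple n, taken as the ratio of decode-clock frequencies."""
    return actual_clock_hz / normal_clock_hz


def threshold(b, rb, rp, n):
    """Eq. (8): Bthn = B - n * RB / RP."""
    return b - n * rb / rp


def should_skip(bm, b, rb, rp, n):
    """True when the occupancy exceeds Bthn, i.e. overflow is feared."""
    return bm > threshold(b, rb, rp, n)


B, RB, RP = 327_680, 1_150_000, 30   # example values only
n = speed_factor(2 * 90_000, 90_000)
normal_speed = should_skip(260_000, B, RB, RP, 1)
double_speed = should_skip(260_000, B, RB, RP, n)
```

The same occupancy that is safe at normal speed triggers skipping at double speed, because twice as much data arrives in each frame period.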
【０１２６】
デコードコア回路２０４において任意のピクチャをデコードしている途中でビットバッファ２０２がオーバーフローすると、デコード処理中のピクチャのビットバッファ２０２に残っている部分に対して、新たに入力されたビデオストリームが上書きされる。その結果、デコード処理中のピクチャのビットバッファ２０２に残っている部分が破壊されて失われる。すると、デコードコア回路２０４では、そのピクチャのデコードを完了することが不可能になり、そのピクチャのビデオ信号を生成することができなくなる。従って、デコードコア回路２０４において任意のピクチャをデコードしている途中でビットバッファ２０２がオーバーフローすることは絶対に避けなければならない。
【０１２７】
そのため、ビットバッファ２０２がオーバーフローする恐れがあるかどうかの判定は、デコードコア回路２０４において任意のピクチャのデコードを開始する前に行う必要がある。より正確には、ピクチャヘッダ検出回路２０３がピクチャヘッダを検出した時点で、ビットバッファ２０２がオーバーフローする恐れがあるかどうかを判定し、そのピクチャをピクチャスキップ回路２０６を介してスキップするかどうかを決定する必要がある。
【０１２８】
ところで、１つのピクチャのデータ量は０〜４０Ｋバイトであるが、そのデータ量はデコードコア回路２０４においてデコードが終了した時点でないとわからない。また、１つのピクチャのデコード処理時間は、そのピクチャのデータ量やデコードコア回路２０４の動作速度によって異なるが、通常、１フレーム期間の１／３〜３／４程度である。
【０１２９】
ビットバッファ２０２から読み出されたピクチャのデータ量が０バイトの場合、そのピクチャの読み出し前後でビットバッファ２０２の占有量Ｂｍは変化しないため、そのピクチャをスキップしたとしてもオーバーフローを回避することはできない。逆に言えば、ビットバッファ２０２から読み出されたピクチャのデータ量が０バイトの場合でも、ビットバッファ２０２に十分な空き容量があればオーバーフローすることはない。
【０１３０】
そこで、１フレーム期間にビットバッファ２０２に入力されるビデオストリームのデータ量分の空き容量を、ビットバッファ２０２に確保しておく。そうすれば、ビットバッファ２０２から読み出されたピクチャのデータ量が０バイトの場合でもオーバーフローすることはない。
【０１３１】
１フレーム期間にビットバッファ２０２に入力されるビデオストリームのデータ量は、（ｎ×Ｘ＝ｎ×ＲＢ／ＲＰ）になる。ビットバッファ２０２の空き容量がこのデータ量以上であればオーバーフローすることはない。従って、式（８）に示すように閾値Ｂｔｈｎを設定しておけば、ビットバッファ２０２のオーバーフローを確実に回避することができる。
【０１３２】
すなわち、判定回路２０５は、ピクチャヘッダ検出回路２０３がピクチャヘッダを検出した時点でビットバッファ２０２の空き容量をチェックし、十分な空き容量（ｎ×Ｘ＝ｎ×ＲＢ／ＲＰ）が確保されているかどうかを判定する。十分な空き容量が確保されていなければ、そのピクチャヘッダに基づいて制御コア回路２０７がビットバッファ２０２から読み出したピクチャを、ピクチャスキップ回路２０６を介してスキップする。続いて、判定回路２０５は、ピクチャヘッダ検出回路２０３が次のピクチャヘッダを検出した時点で、再びビットバッファ２０２の空き容量をチェックする。これらの処理に要する時間は、デコードコア回路２０４のデコード処理時間に比べてはるかに短いため、ビットバッファ２０２に十分な空き容量が確保できてからデコードコア回路２０４のデコード処理を開始しても十分に間に合う。
【０１３３】
ところで、ピクチャヘッダ検出回路２０３がピクチャヘッダを検出した時点や、デコードコア回路２０４がデコードを開始した後に、ビットバッファ２０２がアンダーフローすることがある。この場合は、ビデオストリームがビットバッファ２０２に入力され次第、ビットバッファ２０２から１ピクチャ分のビデオストリームを逐次読み出せばよいため、特に問題とはならない。
【０１３４】
以上詳述したように、ビデオデコーダ１２によれば、以下に示す効果を得ることができる。
▲１▼通常の再生時において、ビットバッファ２０２のオーバーフローを回避することができる。
【０１３５】
▲２▼高速再生時において、ビットバッファ２０２のオーバーフローを回避することができる。
▲３▼判定回路２０５およびピクチャスキップ回路２０６を設けることにより、ビットバッファ２０２のオーバーフローを回避することができる。上記したように判定回路２０５およびピクチャスキップ回路２０６の制御は簡単であるため、制御コア回路２０７はマイクロコンピュータを用いて構成する必要がない。そして、各回路２０３〜２０７を１チップのＬＳＩに搭載した場合には、ビデオデコーダ１２を小型化することができる。
【０１３６】
▲４▼ピクチャスキップ回路２０６のノード２０６ｂ側からスキップされるビデオストリームは、ピクチャ単位となる。そのため、デコードコア回路２０４へ転送されるピクチャの途中でデータが途切れることはない。従って、デコードコア回路２０４では、ＩピクチャだけでなくＰピクチャやＢピクチャについてもデコード可能になる。その結果、ディスプレイ２２で再生される動画に生じるコマ落ちが少なくなる。そのため、２〜４倍という比較的遅い高速再生時において、数コマ／秒の表示が可能になる。従って、高速再生時における動画の動きを滑らかにして画質を大幅に向上させることができる。
【０１３７】
ところで、上記したビデオデコーダ１２において、式（９）に示す規定を満たすように、２つの閾値Ｂ２ｔｈｎ，Ｂ３ｔｈｎを設定してもよい。尚、各閾値Ｂ２ｔｈｎ，Ｂ３ｔｈｎの値は、上記のように再生速度に応じて設定されると共に、ディスプレイ２２で再生される動画の画質を実際に検討して適宜に設定すればよい。
【０１３８】
０＜Ｂ３ｔｈｎ＜Ｂ２ｔｈｎ＜Ｂ ………（９）
判定回路２０５は、ビットバッファ２０２の占有量Ｂｍと各閾値Ｂ２ｔｈｎ，Ｂ３ｔｈｎとを比較し、占有量Ｂｍが式（１０）〜（１２）に示すどの領域に含まれるかを判定する。
【０１３９】
Ｂｍ＜Ｂ３ｔｈｎ ………（１０）
Ｂ３ｔｈｎ＜Ｂｍ＜Ｂ２ｔｈｎ ………（１１）
Ｂ２ｔｈｎ＜Ｂｍ ………（１２）
判定回路２０５は、式（１０）に示すように、ビットバッファ２０２の占有量Ｂｍが閾値Ｂ３ｔｈｎを越えない場合には、ビットバッファ２０２がオーバーフローする恐れがなく正常であると判定する。この場合、制御コア回路２０７は、ビットバッファ２０２から１ピクチャ分のビデオストリームを読み出す。そして、制御コア回路２０７は、ピクチャスキップ回路２０６をノード２０６ａ側に接続し、そのビットバッファ２０２から読み出されたピクチャをデコードコア回路２０４へ転送させる。
【０１４０】
判定回路２０５は、式（１２）に示すように、ビットバッファ２０２の占有量Ｂｍが閾値Ｂ２ｔｈｎを越え且つ閾値Ｂｔｈｎを越えない場合に、ビットバッファ２０２から読み出されたピクチャがＩピクチャまたはＰピクチャならば、第１のフラグを立てる。また、式（１１）に示すように、ビットバッファ２０２の占有量Ｂｍが閾値Ｂ３ｔｈｎを越え且つ閾値Ｂ２ｔｈｎを越えない場合に、ビットバッファ２０２から読み出されたピクチャがＰピクチャならば、第２のフラグを立てる。第１または第２のフラグが立っている場合、式（１０）に示す場合でも、制御コア回路２０７は、ビットバッファ２０２から読み出されたピクチャがＢピクチャならば、ピクチャスキップ回路２０６をノード２０６ｂ側に接続し、そのピクチャをスキップさせる。
【０１４１】
図１４に、２つの閾値Ｂ２ｔｈｎ，Ｂ３ｔｈｎを設定した場合におけるビットバッファ２０２の占有量Ｂｍの変化を示す。
占有量Ｂｍが閾値Ｂ３ｔｈｎを越えた場合、ビットバッファ２０２から読み出されたピクチャがＢピクチャであればデコードせずにスキップする（図示※１）。ここで、Ｂピクチャのスキップ後に占有量Ｂｍがまだ閾値Ｂ３ｔｈｎを越えていても、ビットバッファ２０２から次に読み出されたピクチャがＩピクチャまたはＰピクチャであればデコードする（図示※２）。
【０１４２】
占有量Ｂｍが閾値Ｂ３ｔｈｎを越えた場合でも、ビットバッファ２０２から読み出されたピクチャがＩピクチャまたはＰピクチャであればデコードする（図示※３）。ここで、ＩピクチャまたはＰピクチャのデコード後に占有量Ｂｍがまだ閾値Ｂ３ｔｈｎを越えている場合、ビットバッファ２０２から次に読み出されたピクチャがＢピクチャであればデコードせずにスキップする（図示※４）。このＢピクチャのスキップは、占有量Ｂｍが閾値Ｂ３ｔｈｎを下回るまで繰り返し行う（図示※５）。
【０１４３】
占有量Ｂｍが閾値Ｂ２ｔｈｎを越えた場合、ビットバッファ２０２から読み出されたピクチャがＩピクチャまたはＰピクチャであれば、判定回路２０５は第１のフラグを立てる（図示※６）。第１のフラグが立っている場合、ビットバッファ２０２から次に読み出されたピクチャがＢピクチャであれば、占有量Ｂｍが閾値Ｂ３ｔｈｎを下回っていても、そのＢピクチャをスキップする（図示※７）。
【０１４４】
占有量Ｂｍが閾値Ｂ３ｔｈｎを越え且つ閾値Ｂ２ｔｈｎを越えない場合、ビットバッファ２０２から読み出されたピクチャがＰピクチャであれば、判定回路２０５は第２のフラグを立てる（図示※８）。第２のフラグが立っている場合、ビットバッファ２０２から次に読み出されたピクチャがＢピクチャであれば、占有量Ｂｍが閾値Ｂ３ｔｈｎを下回っていても、そのＢピクチャをスキップする（図示※９）。
【０１４５】
占有量Ｂｍが閾値Ｂ３ｔｈｎを越え且つ閾値Ｂ２ｔｈｎを越えない場合、ビットバッファ２０２から読み出されたピクチャがＩピクチャのときには、判定回路２０５は第２のフラグを立てない（図示※１０）。第２のフラグが立っていない場合、占有量Ｂｍが閾値Ｂ３ｔｈｎを下回っていれば、ビットバッファ２０２から次に読み出されたピクチャがＢピクチャであってもデコードする。
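The two-threshold rules can be collected into one small decision routine. This is a sketch under stated assumptions: each picture is given as `(type, occupancy at header detection)`, both flags are re-evaluated whenever an I or P picture is processed (the text does not say when they are cleared), and any picture is skipped once the occupancy exceeds Bthn.

```python
def decide_pictures(pictures, b3thn, b2thn, bthn):
    """Return 'decode' or 'skip' for each (picture_type, occupancy) pair."""
    flag1 = flag2 = False
    decisions = []
    for ptype, bm in pictures:
        if bm > bthn:
            decisions.append("skip")  # overflow feared: skip regardless of type
            continue
        if ptype == "B":
            # B pictures are skipped above B3thn, or pre-emptively when a flag is set.
            skip = flag1 or flag2 or bm > b3thn
            decisions.append("skip" if skip else "decode")
            continue
        # I and P pictures are decoded whenever possible; update the flags.
        flag1 = bm > b2thn                           # first flag: I or P above B2thn
        flag2 = ptype == "P" and b3thn < bm <= b2thn  # second flag: P only
        decisions.append("decode")
    return decisions


stream = [("I", 50), ("B", 40), ("P", 75), ("B", 40), ("I", 75),
          ("B", 40), ("P", 95), ("B", 40), ("B", 120)]
result = decide_pictures(stream, b3thn=60, b2thn=90, bthn=110)
```

In this example the B picture after the P at occupancy 75 is skipped even though the occupancy has dropped, while the B picture after the I at the same occupancy is decoded, which is the asymmetry between I and P pictures described above.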
【０１４６】
以上のように、２つの閾値Ｂ２ｔｈｎ，Ｂ３ｔｈｎを設定した場合には、上記したビデオデコーダ１２の効果▲１▼〜▲３▼に加えて、以下の効果を得ることができる。
▲４▼ビットバッファ２０２の占有量Ｂｍが閾値Ｂ３ｔｈｎを越え且つ閾値Ｂｔｈｎを越えない場合、ＩピクチャおよびＰピクチャを可能な限りデコードすると共に、Ｂピクチャを優先してスキップする。
【０１４７】
Ｂピクチャは双方向予測によって生成されるため、その重要度はＩピクチャやＰピクチャに比べて低い。従って、重要度の低いＢピクチャを優先してスキップすることにより、ディスプレイ２２で再生される動画に生じるコマ落ちをさらに少なくすることができる。その結果、高速再生時における動画の動きをさらに滑らかにして画質をより向上させることができる。
【０１４８】
▲５▼第１のフラグを設定することで、ＩピクチャまたはＰピクチャのデコード後にビットバッファ２０２の占有量Ｂｍが閾値Ｂ３ｔｈｎを下回っても、余裕をみて次にビットバッファ２０２から読み出されるＢピクチャを予めスキップすることができる。また、第２のフラグを設定することで、Ｐピクチャのデコード後にビットバッファ２０２の占有量Ｂｍが閾値Ｂ３ｔｈｎを下回っても、余裕をみて次にビットバッファ２０２から読み出されるＢピクチャを予めスキップすることができる。
【０１４９】
このように、Ｂピクチャを予めスキップすることは、ビットバッファ２０２の次回のオーバーフローに対して予防措置を講ずることに他ならない。従って、ビットバッファ２０２のオーバーフローをより確実に回避することができる。
【０１５０】
▲６▼Ｉピクチャのデータ量はＰピクチャのそれの２〜３倍と多い。そのため、Ｐピクチャが読み出された場合に比べて、Ｉピクチャが読み出された場合の方がビットバッファ２０２の占有量Ｂｍの減少の度合いが大きい。従って、Ｐピクチャが読み出された後よりも、Ｉピクチャが読み出された後の方がビットバッファ２０２がオーバーフローする可能性が小さくなる。そこで、第１および第２のフラグを設定することにより、ＩピクチャとＰピクチャとで前記予防措置に差をつける。すなわち、Ｉピクチャに対する予防措置の閾値Ｂ２ｔｈｎを、Ｐピクチャに対する予防措置の閾値Ｂ３ｔｈｎよりも高い値に設定することで、Ｉピクチャに対する予防措置をＰピクチャのそれに比べて緩くすることが可能になる。その結果、Ｂピクチャの無駄なスキップを少なくすることができる。
【０１５１】
▲７▼以下のａ）ｂ）に示すＧＯＰ構成（ピクチャのタイプの並び）のビデオストリームがＡＶパーサ１１から転送されてきた場合についてシミュレーションしたところ、以下に示す結果が得られた。
【０１５２】
ａ）ＩＢＰＢＰＢＰＢＰ・・・
ｂ）ＩＢＢＰＢＢＰＢＢＰＢＢＰＢＢＩＢＰ・・・
［１］２倍速再生時；ａ）の場合、ＩピクチャおよびＰピクチャの全てがデコード可能であり、その結果、３０コマ／秒のフルレートで表示できる。ｂ）の場合、ＩピクチャおよびＰピクチャの全てとＢピクチャの一部がデコード可能であり、その結果、２５コマ／秒以上で表示できる。
【０１５３】
［２］４倍速再生時；ａ）ｂ）共に、Ｉピクチャおよびそれに続く３〜４枚のＰピクチャがデコード可能であり、その結果、１５コマ／秒以上で表示できる。
ところで、第２〜第８実施形態において、ビデオデコーダ１２の動作速度を制御するには、デコードコア回路２０４におけるデコード処理の速度を制御すればよい。
【０１５４】
尚、上記各実施形態は以下のように変更してもよく、その場合でも同様の作用および効果を得ることができる。
（１）リングメモリ３２を、ＤＳＰ３１の後段ではなく、ＤＳＰ３１の前段（すなわち、ＭＰＥＧオーディオデコーダ３とＤＳＰ３１の間）に設ける。
【０１５５】
（２）ＭＰＥＧ再生装置２３を構成する各回路１，１１，１２を１チップのＬＳＩに搭載する。このようにすれば、ＭＰＥＧ再生装置２３を小型化することができる。
【０１５６】
（３）第２〜第８実施形態において、ビデオデコーダ１２の動作速度を制御するのではなく、ビデオデコーダ１２とディスプレイ２２の間に遅延回路を挿入し、その遅延回路の遅延時間を制御する。
【０１５７】
（４）第２〜第８実施形態の内いずれか２つ以上の実施形態を適宜に組み合わせて実施する。このようにすれば、組み合わせた各実施形態の相乗作用によりさらに優れた効果を得ることができる。
【０１５８】
（５）第１〜第８実施形態をＣＰＵを用いたソフトウェア的な処理に置き代える。すなわち、各回路（１〜５５）における信号処理をＣＰＵを用いたソフトウェア的な信号処理に置き代える。
【０１５９】
（６）図１２に示したＭＰＥＧビデオデコーダ１２においては、説明を分かり易くするため、ピクチャスキップ回路２０６が各ノード２０６ａ，２０６ｂを有し、制御コア回路２０７の制御に従って各ノード２０６ａ，２０６ｂの接続が切り換えられる構成としたが、この構成に代えて、ピクチャスキップ回路２０６を、制御コア回路２０７の制御に従って、デコードコア回路２０４でデコードされるべきピクチャだけを通過させる論理回路によって構成してもよい。
【０１６０】
以上、本発明を具体化した各実施形態について説明したが、上記実施形態から把握できる請求項以外の技術的思想について、以下にそれらの効果と共に記載する。
（イ）請求項１〜３のいずれか１項に記載のＭＰＥＧオーディオ再生装置において、オーディオ信号をＤ／Ａ変換するＤ／Ａコンバータ（５）と、Ｄ／Ａコンバータの出力を増幅するオーディオアンプ（６）とを備えたＭＰＥＧオーディオ再生装置。
【０１６１】
このようにすれば、ディジタルのオーディオ信号からスピーカを駆動するためのアナログ信号を生成することができる。
（ロ）請求項４〜８のいずれか１項に記載のＭＰＥＧ再生装置において、記録媒体（２１）から読み出されたＭＰＥＧシステムストリームを、ＭＰＥＧオーディオストリームとＭＰＥＧビデオストリームとに分離するデマルチプレクサ（１３）を備えたＭＰＥＧ再生装置。
【０１６２】
このようにすれば、オーディオデコーダへオーディオストリームを、ビデオデコーダへビデオストリームをそれぞれ転送することができる。
【０１６３】
【発明の効果】
請求項１〜３のいずれか１項に記載の発明によれば、可変速再生時においても自然で聞き易い音声を再生することが可能なＭＰＥＧオーディオ再生装置を提供することができる。
【０１６４】
請求項４に記載の発明によれば、可変速再生時においても自然で聞き易い音声を再生することが可能なＭＰＥＧオーディオ再生装置とＭＰＥＧビデオデコーダとを備えたＭＰＥＧ再生装置を提供することができる。
【０１６５】
請求項５〜８のいずれか１項に記載の発明によれば、可変速再生時においても自然で聞き易い音声を再生することが可能なＭＰＥＧオーディオ再生装置とＭＰＥＧビデオデコーダとを備え、音声と動画との時間ずれを低減することが可能なＭＰＥＧ再生装置を提供することができる。
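[Brief Description of the Drawings]
FIG. 1 is a block circuit diagram of the first embodiment.
FIG. 2 is a block circuit diagram of the main part of the first embodiment.
FIG. 3 is a schematic diagram for explaining the operation of the first embodiment.
FIG. 4 is a schematic diagram for explaining the operation of the first embodiment.
FIG. 5 is a block circuit diagram of the main part of the second embodiment.
FIG. 6 is a block circuit diagram of the main part of the third embodiment.
FIG. 7 is a block circuit diagram of the main part of the fourth embodiment.
FIG. 8 is a block circuit diagram of the main part of the fifth embodiment.
FIG. 9 is a block circuit diagram of the main part of the sixth embodiment.
FIG. 10 is a block circuit diagram of the main part of the seventh embodiment.
FIG. 11 is a block circuit diagram of the main part of the eighth embodiment.
FIG. 12 is a block circuit diagram of the main part of the MPEG video decoder.
FIG. 13 is a graph for explaining the operation of the MPEG video decoder.
FIG. 14 is a graph for explaining the operation of the MPEG video decoder.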
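[Description of Reference Numerals]
1 ... MPEG audio playback device
2 ... Playback speed detection circuit serving as speech speed conversion means
3 ... MPEG audio decoder
4 ... Speech speed conversion processing circuit serving as speech speed conversion means
12 ... MPEG video decoder
21 ... Recording medium
32 ... Ring memory
33 ... Up/down counter serving as detection means
41 ... Voice discrimination unit
42 ... Silence deletion/insertion unit
43 ... Time axis compression/expansion unit
51 ... Index adding circuit
52 ... Index detection circuit
53, 55 ... Delay time detection circuits
54 ... Control circuit
[0001]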
[Technical field of the invention]
The present invention relates to an MPEG (Moving Picture Experts Group) audio playback apparatus and an MPEG playback apparatus, and more particularly, to an MPEG audio playback apparatus and an MPEG playback apparatus having a speech speed conversion function.
[0002]
[Prior art]
The information handled in multimedia is enormous and diverse, and it is necessary to process such information at high speed in order to put multimedia into practical use. In order to process information at high speed, data compression / decompression technology is indispensable. As such a data compression / decompression technique, an “MPEG” method can be cited. This MPEG system is being standardized by an MPEG committee (ISO / IEC JTC1 / SC29 / WG11) under the umbrella of ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission).
[0003]
MPEG is composed of three parts. In the “MPEG system part” of Part 1 (ISO / IEC IS 11172 Part 1: Systems), a multiplexing structure (multiplex structure) of video data and audio data and a synchronization method are defined. In Part 2, “MPEG Video Part” (ISO / IEC IS 11172 Part 2: Video), a high-efficiency encoding method of video data and a format of the video data are specified. Part 3 “MPEG Audio Part” (ISO / IEC IS 11172 Part 3: Audio) specifies a high-efficiency encoding method of audio data and a format of the audio data.
[0004]
The video data handled by the MPEG video part relates to moving images, and a moving image is composed of several tens (e.g., 30) of frames (still images) per second. The video data has a six-layer hierarchical structure consisting of, in order, the sequence, GOP (Group Of Pictures), picture, slice, macroblock, and block.
[0005]
At present, there are two MPEG systems, MPEG-1 and MPEG-2, mainly due to differences in encoding rates. In MPEG-1, a frame corresponds to a picture. In MPEG-2, frames or fields can correspond to pictures. Two fields constitute one frame. The structure in which a frame corresponds to a picture is called a frame structure, and the structure in which a field corresponds to a picture is called a field structure.
[0006]
MPEG uses a compression technique called inter-frame prediction. Inter-frame prediction compresses data between frames based on temporal correlation. In inter-frame prediction, bidirectional prediction is performed. Bidirectional prediction uses both forward prediction, which predicts the current playback image from a past playback image (or picture), and backward prediction, which predicts the current playback image from a future playback image.
[0007]
Bidirectional prediction defines three types of pictures: I pictures (Intra-Picture), P pictures (Predictive-Picture), and B pictures (Bidirectionally predictive-Picture). An I picture is generated independently of past or future reproduced images. A P picture is generated by forward prediction (prediction from a past I picture or P picture). A B picture is generated by bidirectional prediction, using any one of the following three predictions: (1) forward prediction, that is, prediction from a past I or P picture; (2) backward prediction, that is, prediction from a future I or P picture; (3) bidirectional prediction, that is, prediction from past and future I or P pictures. The I, P, and B pictures are then each encoded. In other words, an I picture can be generated without any past or future picture, whereas a P picture cannot be generated without a past picture, and a B picture cannot be generated without a past or future picture.
[0008]
In the inter-frame prediction, first, an I picture is periodically generated. Next, a frame several frames ahead of the I picture is generated as a P picture. This P picture is generated by one-way (forward) prediction from the past to the present. Subsequently, a frame located before the I picture and after the P picture is generated as a B picture. When generating this B picture, an optimal prediction method is selected from forward prediction, backward prediction, and bidirectional prediction. In general, in a continuous moving image, the current image and the images before and after it are very similar, and only a part differs. Therefore, the previous frame (for example, an I picture) and the next frame (for example, a P picture) are assumed to be the same, and if there is a change between the two frames, only the difference (B picture) is extracted and compressed. In this way, data between frames can be compressed based on temporal correlation.
[0009]
A data sequence (bit stream) of video data encoded according to the MPEG video part is called an MPEG video stream (hereinafter, abbreviated as a video stream). A data string of audio data encoded in accordance with the MPEG audio part is called an MPEG audio stream (hereinafter, abbreviated as audio stream). Then, the video stream and the audio stream are time-division multiplexed in accordance with the MPEG system part, and become an MPEG system stream (hereinafter abbreviated as a system stream) as one data string. System streams are also called multiplex streams.
[0010]
The flow from the encoding to the decoding in the MPEG part is as follows. An MPEG system encoder (hereinafter, abbreviated as a system encoder) separately encodes video data and audio data while maintaining coordination, and generates a video stream and an audio stream. Next, a multiplexer (MUX) provided in the MPEG system encoder multiplexes a video stream and an audio stream so as to conform to a format of a transmission medium or a recording medium, and generates a system stream. The system stream is transmitted from the MUX via a transmission medium or recorded on a recording medium.
[0011]
A demultiplexer (DMUX; DeMultiplexer) provided in an MPEG system decoder (hereinafter abbreviated as a system decoder) separates a system stream into a video stream and an audio stream. Next, the system decoder individually decodes each stream to generate a video decoded output (hereinafter, referred to as a video output) and an audio decoded output (hereinafter, referred to as an audio output). The video output is output to a display, and a moving image is reproduced on the display. The audio output is output to a speaker via a D / A (Digital / Analog) converter and an audio amplifier, and sound is reproduced from the speaker.
[0012]
MPEG-1 is mainly intended for storage media that use a recording medium such as a video CD (Compact Disc), a CD-ROM (CD-Read Only Memory), or a DVD (Digital Video Disc). MPEG-2 supports a wide range of applications, including those of MPEG-1.
[0013]
Storage media require the following two kinds of variable speed reproduction. (1) A function for reproducing a moving image at a speed higher than the normal (standard) reproduction speed (hereinafter referred to as high-speed reproduction). (2) A function for reproducing a moving image at a speed lower than the normal reproduction speed (hereinafter referred to as low-speed reproduction). The high-speed reproduction function is used, for example, when a user fast-forwards to view a moving image in a short time, or fast-forwards or fast-reverses to search for a desired moving image. The low-speed reproduction function is used, for example, when a user wants to watch a moving image carefully.
[0014]
The bit rate of the system stream read from the recording medium corresponds to the reading speed. Therefore, to perform high-speed reproduction, the system stream is read from the recording medium at high speed, and to perform low-speed reproduction, the system stream is read from the recording medium at low speed. For example, when a video CD or DVD is used as the recording medium, the rotation speed of the video CD or DVD is made faster or slower than during normal reproduction (standard reproduction), so that the system stream is read at the desired speed.
[0015]
[Problems to be solved by the invention]
Conventionally, in MPEG, variable speed playback of moving images as described above has been studied, but no consideration has been given to variable speed playback of audio.
[0016]
The bit rate of the audio stream is the same as that of the system stream. Therefore, during high-speed reproduction of a moving image, the bit rate of the audio stream also increases, so the pitch of the reproduced sound rises and the utterance speed (speech speed) increases. During low-speed reproduction of a moving image, the bit rate of the audio stream decreases; the pitch of the reproduced audio does not change, but the audio is interrupted. Thus, there has been a problem that the sound becomes hard to hear when a moving image is reproduced at variable speed.
[0017]
Incidentally, speech speed conversion technology, which controls the speech speed arbitrarily without changing the pitch, has been under development in recent years, and the present applicant has already developed a speech speed conversion processing LSI that can be used in a VTR or a tape recorder (see Japanese Patent Application Laid-Open No. 7-192392 (G11B 20/02) and Nikkei Electronics, November 21, 1994, No. 622, pages 93 to 98). However, no attempt has been made to apply speech speed conversion technology to MPEG.
[0018]
Further, in the synchronous generation of audio and a moving image (video), it is necessary to consider “lip sync”. Lip sync means that the movement of the mouth of the person shown on the display is synchronized with the sound uttered from the speaker. If the sound is faster or slower than the mouth movement, the lip sync is said to be out of sync. If the deviation of the lip sync is out of the permissible range of human hearing, the viewer will feel uncomfortable. In general, it is said that a permissible time as a shift of the lip sync caused by the delay of the sound from the moving image is about 50 to 250 ms.
[0019]
The present invention has been made to satisfy the above-mentioned requirements, and has the following objects.
[1] To provide an MPEG audio reproducing apparatus capable of reproducing natural and easy-to-hear sound even during variable speed reproduction.
[0020]
[2] To provide an MPEG playback device including the MPEG audio playback device of [1] and an MPEG video decoder.
[3] To provide an MPEG playback device that includes the MPEG audio playback device of [1] and an MPEG video decoder and that can reduce the time lag between audio and moving images.
[0024]
[Means for Solving the Problems]
The invention described in claim 1 comprises an MPEG audio decoder (3) that decodes an MPEG audio stream read from a recording medium (21) in accordance with the MPEG audio part and generates an audio signal, and speech speed conversion processing means (2, 4) that performs speech speed conversion processing on the audio signal. The gist is that, when the bit rate of the audio stream is higher than normal, the speech speed conversion processing means performs the speech speed conversion processing by lengthening the time length of each reproduced voice section and shortening the time length of each silent section, and, when the bit rate of the audio stream is lower than normal, it performs the speech speed conversion processing either by lengthening the time length of each reproduced voice section and shortening the time length of each silent section, or by deleting each silent section, connecting the voice sections, and then inserting a silent section.
[0025]
The invention described in claim 2 is the MPEG audio reproducing apparatus described in claim 1, in which the speech speed conversion processing means (2, 4) includes a ring memory (32) for storing the audio signal and detection means (33) for detecting the storage amount of the ring memory. The gist is that the compression / expansion rate of the time length of the voice section is adjusted according to the storage amount of the ring memory.
[0026]
The invention described in claim 3 is the MPEG audio reproducing apparatus described in claim 2, in which the speech speed conversion processing means (2, 4) includes a voice discrimination unit (41) that discriminates between voice sections and silent sections of the audio signal, a silence deletion / insertion unit (42) that deletes or inserts silent sections, and a time axis compression / expansion unit (43) that compresses or expands voice sections and adjusts the compression / expansion rate based on the storage amount of the ring memory (32).
[0027]
The invention described in claim 4 comprises the MPEG audio reproducing apparatus (1) described in any one of claims 1 to 3, and an MPEG video decoder (12) that decodes an MPEG video stream read from a recording medium (21) in accordance with the MPEG video part and generates a video signal.
[0028]
The invention described in claim 5 comprises the MPEG audio reproducing apparatus (1) described in claim 2 or claim 3; an MPEG video decoder (12) that decodes an MPEG video stream read from a recording medium (21) in accordance with the MPEG video part and generates a video signal; an index adding circuit (51) that adds an index signal as time information to the audio signal before it is written to the ring memory (32); and an index detection circuit (52) that detects the index signal added to the audio signal read from the ring memory (32), detects the signal delay time in the speech speed conversion processing means (2, 4) from the time information obtained from the index signal and the current time information, and supplies a signal indicating the detected delay time to the MPEG video decoder (12). The gist is that the MPEG video decoder (12) controls the timing of its own operation based on the signal indicating the delay time.
[0029]
The invention described in claim 6 comprises the MPEG audio reproducing apparatus (1) described in claim 3; an MPEG video decoder (12) that decodes an MPEG video stream read from the recording medium (21) in accordance with the MPEG video part and generates a video signal; and a delay time detection circuit (53) that detects the signal delay time in the speech speed conversion processing means (2, 4) based on the processing result of the voice discrimination unit (41) and the bit rate of the audio stream, and supplies a signal indicating the detected delay time to the MPEG video decoder (12). The gist is that the MPEG video decoder (12) controls the timing of its own operation based on the signal indicating the delay time.
[0030]
The invention described in claim 7 comprises the MPEG audio reproducing apparatus (1) described in claim 3; an MPEG video decoder (12) that decodes an MPEG video stream read from a recording medium (21) in accordance with the MPEG video part and generates a video signal; and a control circuit (54) that generates, based on the storage amount of the ring memory (32), a control signal for synchronizing the video signal with the audio signal after the speech speed conversion processing, and supplies the control signal to the MPEG video decoder (12). The gist is that the MPEG video decoder (12) controls the timing of its own operation based on the control signal.
[0031]
The invention described in claim 8 comprises the MPEG audio reproducing apparatus (1) described in claim 3; an MPEG video decoder (12) that decodes an MPEG video stream read from the recording medium (21) in accordance with the MPEG video part and generates a video signal; and a delay time detection circuit (55) that detects the signal delay time in the speech speed conversion processing means (2, 4) based on the processing results of the voice discrimination unit (41) and the time axis compression / expansion unit (43), and supplies a signal indicating the detected delay time to the MPEG video decoder (12). The gist is that the MPEG video decoder (12) controls the timing of its own operation based on the signal indicating the delay time.
[0032]
[Best mode for carrying out the invention]
(First Embodiment)
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings.
[0033]
FIG. 1 shows a block circuit diagram of the present embodiment.
The MPEG audio reproducing apparatus 1 of this embodiment includes a reproduction speed detection circuit 2, an MPEG audio decoder 3, a speech speed conversion processing circuit 4, a D / A converter 5, and an audio amplifier 6. Each of the circuits 2 to 6 can be mounted on a one-chip LSI.
[0034]
The MPEG playback device 23 of the present embodiment includes an audio video parser (AV parser) 11 and an MPEG video decoder 12 in addition to the MPEG audio playback device 1.
[0035]
The speech speed conversion processing circuit 4 includes, for example, a DSP (Digital Signal Processor) 31, a ring memory 32, an up / down counter 33, and a read clock generation circuit 36. The operation of the speech speed conversion processing circuit 4 is described in detail in the aforementioned document (Nikkei Electronics, November 21, 1994, No. 622, pp. 93-98).
[0036]
The reproduction speed detection circuit 2 generates a decode clock corresponding to the bit rate of the MPEG system stream read from the recording medium 21 such as a video CD or DVD. The decode clock is output to each of the circuits 12, 3, and 4.
[0037]
The AV parser 11 includes a demultiplexer (DMUX) 13 and inputs the MPEG system stream read from the recording medium 21. The DMUX 13 separates the system stream into an MPEG video stream and an MPEG audio stream. The video stream is output to the video decoder 12, and the audio stream is output to the audio decoder 3.
[0038]
The video decoder 12 decodes a video stream according to the MPEG video part and generates a video output (hereinafter, referred to as a video signal). The video signal is output to the display 22, and the moving image is reproduced on the display 22.
[0039]
The audio decoder 3 decodes the audio stream in accordance with the MPEG audio part and generates a digital audio output (hereinafter referred to as an audio signal). The audio signal is output to the speech speed conversion processing circuit 4. The audio signal processed by the speech speed conversion processing circuit 4 is D / A converted by the D / A converter 5, amplified by the audio amplifier 6, and sent to the speaker 23. The sound is then reproduced from the speaker 23.
[0040]
The bit rate of the system stream read from the recording medium 21 corresponds to the reading speed. The operation of each of the circuits 3, 4, and 12 is defined by a decode clock.
[0041]
Therefore, the video decoder 12 generates a video signal corresponding to the bit rate of the system stream. In other words, if the bit rate of the system stream is higher than during normal playback (during standard playback), the moving image is played at high speed on the display 22, and if it is smaller than during normal playback, the moving image is played at low speed.
[0042]
The audio decoder 3 generates an audio signal corresponding to the bit rate of the system stream. That is, if the bit rate of the system stream is higher than that during normal reproduction, the bit rate of the audio signal is higher, and if it is lower than that during normal reproduction, the bit rate of the audio signal is lower.
[0043]
By the way, a video signal and an audio signal are generated synchronously during normal reproduction.
The DSP 31 includes a frame memory 34 and a speech speed conversion unit 35. The frame memory 34 stores audio signals for an appropriate number of frames (for example, two frames). The speech speed conversion unit 35 performs speech speed conversion processing, frame by frame, on the audio signal stored in the frame memory 34, and generates the audio signal after speech speed conversion processing (hereinafter referred to as data). One frame is composed of an appropriate number (for example, 200) of samples.
[0044]
The inside of the frame memory 34 is divided into two areas (hereinafter referred to as area A and area B). While the audio signal output from the audio decoder 3 is being written to area B, the one frame of audio signal stored in area A is read and transferred to the speech speed conversion unit 35. Once one frame of the audio signal has been stored in area B, that frame is read out and transferred to the speech speed conversion unit 35, and at the same time the audio signal output from the audio decoder 3 is written to area A.
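The alternating use of the two areas can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `FrameMemory`, the method `write`, and the idea of returning a full frame at the moment the areas swap are hypothetical simplifications of the hardware, in which reading and writing proceed concurrently.

```python
class FrameMemory:
    """Two-area (A/B) frame buffer: one area fills while the other is handed to the reader."""

    def __init__(self, frame_len=200):
        self.frame_len = frame_len        # samples per frame (200 is the example value in the text)
        self.areas = {"A": [], "B": []}
        self.write_area = "A"             # area currently receiving decoder output

    def write(self, sample):
        """Store one decoded sample; return a complete frame when the areas swap, else None."""
        buf = self.areas[self.write_area]
        buf.append(sample)
        if len(buf) < self.frame_len:
            return None
        # The write area is full: hand its frame to the reader and switch to the other area.
        frame = list(buf)
        buf.clear()
        self.write_area = "B" if self.write_area == "A" else "A"
        return frame


mem = FrameMemory(frame_len=4)            # short frames, only to keep the demonstration small
frames = []
for s in range(10):
    frame = mem.write(s)
    if frame is not None:
        frames.append(frame)
print(frames)                             # two complete frames; samples 8 and 9 are still waiting in area A
```

Each returned frame corresponds to the one-frame block that the text describes as being transferred to the speech speed conversion unit 35.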
[0045]
The data generated by the speech speed conversion unit 35 is written to the ring memory 32 according to the write clock generated by the speech speed conversion unit 35. The ring memory 32 is composed of, for example, a random access memory (RAM) having a first-in-first-out (FIFO) configuration.
[0046]
The read clock generation circuit 36 generates a read clock according to the decode clock.
The data stored in the ring memory 32 is read according to a read clock, and the read data is output to the D / A converter 5. The D / A converter 5 uses a read clock as a sampling frequency.
[0047]
The write clock is input to an up-count input terminal UP of the up-down counter 33, and the read clock is input to a down-count input terminal DOWN of the up-down counter 33. The up / down counter 33 counts the difference between the total number of write clocks and the total number of read clocks. The count value corresponds to the storage amount of the ring memory 32. That is, the up / down counter 33 detects the accumulated amount of the ring memory 32 based on the write clock and the read clock. The amount stored in the ring memory 32 is output to the speech speed conversion unit 35.
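The counting scheme described above amounts to keeping a running difference between write pulses and read pulses. The following is a minimal sketch, assuming a software model in which each clock pulse is a method call; the class name, method names, and the capacity of 1024 samples are hypothetical.

```python
class UpDownCounter:
    """Tracks ring-memory occupancy from write (UP) and read (DOWN) clock pulses."""

    def __init__(self, capacity):
        self.capacity = capacity   # ring memory size in samples (hypothetical value below)
        self.count = 0             # writes minus reads = samples currently stored

    def up(self):
        """One write clock pulse: a sample was written to the ring memory."""
        self.count += 1

    def down(self):
        """One read clock pulse: a sample was read from the ring memory."""
        self.count -= 1

    def fill_ratio(self):
        """Stored amount as a fraction of the capacity."""
        return self.count / self.capacity


counter = UpDownCounter(capacity=1024)
for _ in range(300):
    counter.up()
for _ in range(120):
    counter.down()
print(counter.count)               # 180 samples currently stored
```

The value of `count` plays the role of the storage amount that is fed back to the speech speed conversion unit 35.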
[0048]
FIG. 2 shows an internal configuration of the speech speed conversion unit 35.
The speech speed conversion unit 35 includes a voice discrimination unit 41, a silence deletion / insertion unit 42, and a time axis compression / expansion unit 43.
[0049]
The voice discriminating unit 41 determines whether the audio signal read from the frame memory 34 is a voice section (a section where voice exists) or a silent section (a section where no voice exists). Note that background noise other than voice uttered by humans is handled as a silent section.
[0050]
The silence deletion / insertion unit 42 deletes the silence section or inserts a new silence section into the silence section determined by the speech determination unit 41.
The time axis compression / expansion unit 43 performs a compression process or an expansion process on the voice section determined by the voice determination unit 41 based on the storage amount of the ring memory 32.
[0051]
Each of the units 42 and 43 generates a write clock corresponding to the processing content.
Next, the operation of the speech speed conversion unit 35 during high-speed playback will be described.
[0052]
The bit rate of the audio signal output from the audio decoder 3 is the same as that of the audio stream. Therefore, at the time of high-speed reproduction, the bit rate of the audio signal is higher than at the time of normal reproduction. When an audio signal having a higher bit rate than during normal reproduction is sent to the D / A converter 5 as it is, the pitch of the sound reproduced from the speaker 23 is increased and the speech speed is faster than during normal reproduction.
[0053]
Therefore, the speech speed conversion unit 35 performs speech speed conversion processing so that the pitch of the sound reproduced from the speaker 23 is substantially the same as during normal reproduction and the speech speed of the sound reproduced from the speaker 23 is close to that during normal reproduction.
[0054]
That is, the silence deletion / insertion unit 42 calculates the continuation length of the silence section determined by the speech determination unit 41, and deletes the silence section if the continuation length is equal to or longer than the predetermined length.
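The run-length rule for silence deletion can be sketched as follows. The sketch assumes that the voice discrimination result is available as one boolean per frame (`True` for voice, `False` for silence); the function name `delete_long_silences` and the threshold `min_run` are hypothetical, since the text only says "a predetermined length".

```python
from itertools import groupby


def delete_long_silences(labels, min_run):
    """Return the indices of the frames that are kept.

    Voice frames are always kept. A run of consecutive silent frames is
    deleted only when its length is min_run frames or more.
    """
    kept = []
    index = 0
    for is_voice, run in groupby(labels):
        run_len = len(list(run))
        if is_voice or run_len < min_run:
            kept.extend(range(index, index + run_len))
        index += run_len
    return kept


# voice, 3 silent frames, voice, 1 silent frame, voice
labels = [True, False, False, False, True, False, True]
kept = delete_long_silences(labels, min_run=2)
print(kept)   # the three-frame silence is removed, the single silent frame survives
```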
In addition, the time axis compression / expansion unit 43 performs pitch extraction on the voice section determined by the voice determination unit 41 using, for example, the autocorrelation method, and performs compression processing on the extracted pitch waveform. As a result, when the bit rate of the audio signal is increased at the time of high-speed reproduction, the time length of the audio section reproduced from the speaker 23 is extended.
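Pitch extraction by the autocorrelation method can be illustrated as a search for the lag at which a frame best matches a shifted copy of itself. This is a minimal sketch using an unnormalized autocorrelation; the function name `pitch_period` and the lag search range are assumptions, and a practical detector would add normalization and voicing checks.

```python
import math


def pitch_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation value."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic "voiced" frame of 200 samples with a pitch period of 25 samples.
frame = [math.sin(2 * math.pi * n / 25) for n in range(200)]
period = pitch_period(frame, min_lag=10, max_lag=60)
print(period)
```

The detected period is what the later compression and expansion steps use as the length of one pitch waveform.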
[0055]
In the compression process performed by the time axis compression / expansion unit 43, the compression ratio is dynamically changed according to the state of the silent section and the storage amount of the ring memory 32.
For example, by compressing a three-period waveform having the same pitch period into a two-period waveform, 2/3 times compression (compression ratio: 2/3) is obtained. Specifically, a two-period waveform at the front and a two-period waveform at the rear in the time axis direction are cut out from the three-period waveform. Then, the preceding two-period waveform is multiplied by a monotonically decreasing triangular window function, and the following two-period waveform is multiplied by a monotonically increasing triangular window function. An output waveform is obtained by adding these two waveforms.
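The 3-to-2 compression step described above can be written out directly. The sketch below assumes the pitch period is already known and that the input holds exactly three periods; the function name `compress_3_to_2` is hypothetical.

```python
import math


def compress_3_to_2(wave, period):
    """Cross-fade three pitch periods into two (compression ratio 2/3)."""
    n = 2 * period
    front = wave[:n]                  # first two periods
    rear = wave[period:period + n]    # last two periods
    out = []
    for i in range(n):
        w = i / (n - 1)               # rises linearly from 0 to 1
        # Decreasing triangular window on the front part, increasing one on the rear part.
        out.append((1.0 - w) * front[i] + w * rear[i])
    return out


period = 20
wave = [math.sin(2 * math.pi * n / period) for n in range(3 * period)]
out = compress_3_to_2(wave, period)
print(len(wave), "->", len(out))      # 60 -> 40
```

Because the two window functions sum to one at every sample, a perfectly periodic input passes through unchanged in shape, only one period shorter.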
[0056]
To obtain 0.9-fold compression (compression ratio: 0.9), for example, a 10-period waveform is compressed into a 9-period waveform. In this case, the processing described above is applied only to the first three periods, which become two periods. The remaining seven of the ten input periods are not processed and are output as they are, giving 2 + 7 = 9 periods.
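The 10-to-9 case can be sketched by applying the cross-fade to the first three periods and copying the remaining seven. The function names are hypothetical, and passing the seven unprocessed periods straight to the output is an interpretation of the text rather than a detail it states explicitly.

```python
import math


def drop_one_period(wave, period):
    """Cross-fade three pitch periods into two (the 2/3 compression step)."""
    n = 2 * period
    front, rear = wave[:n], wave[period:period + n]
    return [(1 - i / (n - 1)) * front[i] + (i / (n - 1)) * rear[i] for i in range(n)]


def compress_10_to_9(wave, period):
    """Compress ten pitch periods into nine: only the first three are processed."""
    head = drop_one_period(wave[:3 * period], period)   # 3 periods -> 2 periods
    tail = list(wave[3 * period:10 * period])           # 7 periods copied unchanged
    return head + tail


period = 16
wave = [math.sin(2 * math.pi * n / period) for n in range(10 * period)]
out = compress_10_to_9(wave, period)
ratio = len(out) / len(wave)
print(len(wave), "->", len(out), "ratio", ratio)
```

Changing how many periods are copied after each cross-fade gives other M-to-N combinations and therefore other compression ratios.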
[0057]
By preparing various combinations for compressing the M-period waveform into the N-period waveform, various types of compression ratios can be obtained. By the way, when the silent section is short, if the compression ratio is low (the degree of compression is high), the ring memory 32 may overflow. In order to prevent this, the compression ratio in the time axis compression / expansion unit 43 may be dynamically changed according to the storage amount of the ring memory 32. In addition, when background noise is present, an error in the extraction of a voice section or a pitch occurs. To prevent this, the detection level of the voice section in the voice determination unit 41 may be changed according to the noise signal.
[0058]
Next, the operation of the speech speed conversion unit 35 during low-speed reproduction will be described with reference to FIGS.
FIG. 3 shows an example of sound reproduced at the time of normal reproduction and at the time of 0.5 × speed reproduction.
[0059]
During low-speed reproduction, the bit rate of the audio signal is lower than during normal reproduction. Therefore, as shown in method 1, when an audio signal with a lower bit rate than during normal reproduction is sent directly to the D / A converter 5, the pitch of the sound reproduced from the speaker 23 does not change from that during normal reproduction, but the sound is interrupted. In other words, the time length of each voice section ("A", "I", "U", "E") is the same as during normal playback, and a silent section with no sound is inserted between the voice sections. The sound is therefore choppy, and the user finds it uncomfortable to listen to.
[0060]
Therefore, the speech speed conversion unit 35 performs the speech speed conversion processing as shown in the method 2 or the method 3. In the case of MPEG audio, since the pitch of the audio does not change during low-speed reproduction, there is no need to perform the process of changing the pitch in the time axis compression / expansion unit 43 as in high-speed reproduction.
[0061]
(Method 2)
In the method 2, the time axis compression / decompression unit 43 extends the length of each voice section, and the silence deletion / insertion unit 42 shortens the length of each silent section, thereby making the discontinuity of voice inconspicuous.
[0062]
To extend the length of a voice section, the time axis compression / expansion unit 43 performs pitch extraction on the voice section determined by the voice determination unit 41, using, for example, the autocorrelation method, and applies expansion processing to the extracted pitch waveform. For example, expanding a two-period waveform with the same pitch period into a three-period waveform gives 3/2-fold expansion (expansion ratio: 3/2). Likewise, expanding a three-period waveform with the same pitch period into a four-period waveform gives 4/3-fold expansion (expansion ratio: 4/3). As a result, when the bit rate of the audio signal decreases during low-speed reproduction, the time length of the voice sections reproduced from the speaker 23 is extended.
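The text does not spell out the windowing used for expansion, so the following sketch assumes a mirror image of the compression procedure: an extra period is synthesized by cross-fading from the second period back into the first and is inserted between them. The function name `expand_2_to_3` is hypothetical.

```python
import math


def expand_2_to_3(wave, period):
    """Expand two pitch periods into three (expansion ratio 3/2).

    Assumed procedure: insert a cross-faded period between the two input
    periods so that the waveform stays continuous at both joins.
    """
    first, second = wave[:period], wave[period:2 * period]
    middle = []
    for i in range(period):
        w = i / (period - 1)          # rises linearly from 0 to 1
        # Starts as a continuation of `first` (the shape of `second`) and ends as `first`,
        # so that `second` can follow it smoothly.
        middle.append((1.0 - w) * second[i] + w * first[i])
    return list(first) + middle + list(second)


period = 20
wave = [math.sin(2 * math.pi * n / period) for n in range(2 * period)]
out = expand_2_to_3(wave, period)
print(len(wave), "->", len(out))      # 40 -> 60
```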
[0063]
If the voice section is extended too much, however, the voice sounds drawn out, so the discontinuity becomes inconspicuous but the sound is still unnatural. To prevent this, the length L2 of a voice section during low-speed playback is limited relative to the length L1 of the voice section during normal playback, for example as shown in the following expression.
[0064]
L2 / L1 ≦ 1.4
Note that the above equation can be applied not only at the time of 0.5 × speed reproduction, but also at the time of low speed reproduction at any magnification. Here, the expansion rate of the voice section in the time axis compression / expansion unit 43 may be a constant value, or may be variable as shown in the following (1) and (2).
[0065]
(1) The expansion rate of the voice section is dynamically changed according to the storage amount of the ring memory 32. When the silent sections are short, the ring memory 32 may overflow if the expansion rate of the voice section is large (the degree of expansion is large). To prevent this, the expansion rate of the voice section may be reduced.
[0066]
(2) The expansion rate of the voice section is dynamically changed according to the pitch change of the voice. That is, as shown in FIG. 4, the speech speed is changed by changing the expansion rate of the voice section in accordance with the change in the pitch of the voice. In this case, the audibility of the voice can be further improved. Note that a technique for changing the speech speed by changing the expansion rate of a voice section in response to a change in voice pitch is known (IEICE Technical Report SP92-56, HC92-33 (1992-09), p. 49-56).
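The first of the two options, lowering the expansion rate as the ring memory fills, can be sketched as a simple control rule. Only the upper bound L2 / L1 ≦ 1.4 comes from the text; the function name, the `high_water` threshold of 75 percent, and the linear ramp down to 1.0 are hypothetical choices made for illustration.

```python
MAX_EXPANSION = 1.4   # upper bound on L2 / L1 given in the text


def expansion_rate(stored, capacity, target=MAX_EXPANSION, high_water=0.75):
    """Choose the expansion rate for the next voice section from the ring-memory fill level."""
    rate = min(target, MAX_EXPANSION)
    level = stored / capacity
    if level <= high_water:
        return rate
    # Above the high-water mark, ramp linearly down to 1.0 (no expansion) at a full memory.
    headroom = max((1.0 - level) / (1.0 - high_water), 0.0)
    return 1.0 + (rate - 1.0) * headroom


for stored in (0, 750, 875, 1000):
    print(stored, round(expansion_rate(stored, capacity=1000), 3))
```

With these assumed settings the rate stays at 1.4 until the memory is three-quarters full and then falls off, so the write rate into the ring memory drops before an overflow can occur.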
[0067]
(Method 3)
In method 3, the silence deletion / insertion unit 42 deletes each silent section and connects the voice sections, and then inserts a new silent section following the connected voice sections, making the discontinuity of the voice inconspicuous. The silent section to be inserted may be any of the following (1) to (3).
[0068]
(1) A silent section in which no sound exists.
(2) A silent section containing white noise that does not make the viewer feel uncomfortable. Such white noise is created in advance and stored in another memory (not shown).
[0069]
(3) The audio signal determined to be a silent section by the voice determination unit 41 is stored in a separate memory (not shown) and is inserted as a silent section.
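Method 3 can be sketched on a list of labelled segments. The representation of the signal as `(kind, samples)` pairs and the function name `method3` are hypothetical, and the sketch assumes that the inserted silence is as long as the total silence that was removed, which the text does not state. The `fill` value of 0.0 corresponds to option (1), a silent section with no sound.

```python
def method3(segments, fill=0.0):
    """Delete silences, join the voice sections, then append one new silent section.

    segments: list of (kind, samples) pairs, where kind is "voice" or "silence".
    """
    voice = []
    removed = 0
    for kind, samples in segments:
        if kind == "voice":
            voice.extend(samples)     # voice sections are connected back to back
        else:
            removed += len(samples)   # silent sections are dropped but their length is counted
    return voice + [fill] * removed   # one silent section follows the joined voice


segments = [
    ("voice", [1, 2]),
    ("silence", [0, 0, 0]),
    ("voice", [3]),
    ("silence", [0]),
]
result = method3(segments)
print(result)   # joined voice samples followed by four inserted silent samples
```

Options (2) and (3) would replace the constant `fill` with stored white noise or with the stored background signal.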
As described above, according to the present embodiment, the following operations and effects can be obtained.
[0070]
(1) By providing the speech speed conversion processing circuit 4, the pitch of the sound reproduced from the speaker 23 during high-speed reproduction is made substantially the same as that during normal reproduction, and the speech speed of the sound reproduced from the speaker 23 can be brought closer to that of normal reproduction, so that a natural and easy-to-hear sound can be reproduced.
[0071]
Incidentally, at the time of m-times speed reproduction (m > 1), the bit rates of the audio stream and the decode clock are m times those of normal reproduction. If the bit rate of the data output from the speech speed conversion unit 35 is then made substantially the same as during normal reproduction, the pitch of the reproduced sound can be made substantially the same as during normal reproduction. That is, if the speech speed conversion unit 35 converts the bit rate in a ratio of m to 1, the pitch of the reproduced sound is substantially the same as that in normal reproduction.
[0072]
(2) The provision of the speech speed conversion processing circuit 4 makes it possible to make discontinuity in the sound reproduced during low-speed reproduction inconspicuous, and reproduce sound that is natural and easy to hear.
[0073]
By the way, the above methods 2 and 3 may be used in combination as shown in the following (1) and (2).
(1) The user of the MPEG audio reproducing apparatus 1 can arbitrarily switch and select between the method 2 and the method 3. By doing so, it is possible to match the auditory characteristics of each user, and it is possible to reproduce a sound that is easy for the user to hear.
(2) Method 2 and method 3 are automatically switched and selected in accordance with the low-speed reproduction magnification. For example, method 3 is selected during 1- to 0.5-times speed reproduction, and method 2 is selected during reproduction at 0.5-times or lower speed. In this way, natural sound can be reproduced according to the reproduction speed.
[0074]
(3) When the circuits 2 to 6 are mounted on a one-chip LSI, the size of the MPEG audio reproducing apparatus 1 can be reduced.
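The switching between method 2 and method 3 described in the combined-use items above can be sketched as a small selector. The function name and the handling of the boundary at exactly 0.5-times speed (assigned here to method 2) are assumptions; the description names 0.5 in both ranges.

```python
def select_method(speed, user_choice=None):
    """Pick method 2 or method 3 for low-speed reproduction.

    speed: reproduction magnification, 0 < speed < 1 for low-speed reproduction.
    user_choice: 2 or 3 when the user selects a method manually, else None.
    """
    if user_choice in (2, 3):
        return user_choice          # manual selection by the user
    if not 0 < speed < 1:
        raise ValueError("low-speed reproduction requires 0 < speed < 1")
    # Assumed boundary: exactly 0.5x is treated as "0.5x or lower".
    return 2 if speed <= 0.5 else 3
```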
(Second embodiment)
Hereinafter, a second embodiment of the present invention will be described with reference to the drawings. In the present embodiment, the same components as those in the first embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0075]
FIG. 5 shows a block diagram of a main part of the present embodiment. The present embodiment is different from the first embodiment only in that an index adding circuit 51 and an index detecting circuit 52 are provided.
[0076]
The index adding circuit 51 is provided in a stage preceding the frame memory 34 (that is, between the MPEG audio decoder 3 and the speech speed conversion processing circuit 4). The index adding circuit 51 adds an index signal to the audio signal generated by the audio decoder 3 at a constant period according to the decode clock. The audio signal to which the index signal is added is output to the frame memory 34.
[0077]
The index detection circuit 52 detects an index signal added to the data read from the ring memory 32, calculates the time Δt required for signal processing in the speech speed conversion processing circuit 4 based on the time information obtained from the index signal and the current time, and supplies a detection signal relating to the time Δt to the video decoder 12. The video decoder 12 controls its own operation timing in accordance with the detection signal relating to the time Δt.
[0078]
As described above, according to the present embodiment, the following operations and effects can be obtained in addition to the operations and effects of the first embodiment.
(1) As described above, the video signal generated by the video decoder 12 and the audio signal generated by the audio decoder 3 are generated synchronously during normal reproduction. Therefore, when the speech speed conversion processing circuit 4 is provided between the audio decoder 3 and the D/A converter 5, the audio signal is delayed by the time required for signal processing in the speech speed conversion processing circuit 4 (that is, the delay time in the speech speed conversion processing circuit 4).
[0079]
Therefore, an index signal is added to the audio signal input to the frame memory 34 in a predetermined cycle in advance by using the index adding circuit 51.
The index detection circuit 52 detects an index signal added to the data read from the ring memory 32, calculates the time Δt required for signal processing in the speech speed conversion processing circuit 4, and supplies a detection signal relating to the time Δt to the video decoder 12. The video decoder 12 controls its own operation timing in accordance with the detection signal relating to the time Δt. When the index detection circuit 52 next detects an index signal, the video decoder 12 delays or advances its own operation timing by the difference between the time calculated at that time and the time calculated the previous time.
[0080]
As a result, regardless of the delay time in the speech speed conversion processing circuit 4, the data read from the ring memory 32 (that is, the audio signal subjected to the speech speed conversion process) can be synchronized with the video signal.
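The index-based synchronization described above can be sketched as follows. The class and method names are hypothetical, and it is assumed that each index signal carries the time at which it was added, so that Δt is the current time minus that time.

```python
class IndexDetector:
    """Tracks the delay Δt measured from successive index signals."""

    def __init__(self):
        self.last_delay = None  # Δt computed at the previous index signal

    def on_index(self, index_time, now):
        """Return (Δt, adjustment) for one detected index signal.

        index_time: time carried by the index signal (when it was added).
        now: current time when the index is read out of the ring memory.
        adjustment: amount by which the video decoder would delay (positive)
        or advance (negative) its timing; the first index yields the full Δt.
        """
        delay = now - index_time
        if self.last_delay is None:
            adjust = delay
        else:
            adjust = delay - self.last_delay
        self.last_delay = delay
        return delay, adjust
```

After the first index signal, only the change in Δt between successive index signals is passed on, which matches the statement that the video decoder 12 shifts its timing by the difference between the current and the previous calculated times.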
[0081]
(2) From (1) above, the time lag between the sound reproduced by the speaker 23 and the moving image reproduced on the display 22 can be reduced, and the lip-sync deviation can be kept within the range tolerated by human hearing.
[0082]
(3) The index signal added to the audio signal may be deleted by the silence deletion / insertion unit 42. However, if the cycle of adding the index signals is shortened so that a sufficient number of index signals are added to the audio signal, a certain number or more of index signals remain in the data read from the ring memory 32 even if some of them are deleted by the silence deletion / insertion unit 42. With the remaining index signals, the operation and effect of (1) can be obtained.
[0083]
(Third embodiment)
Hereinafter, a third embodiment of the present invention will be described with reference to the drawings. In the present embodiment, the same components as those in the second embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0084]
FIG. 6 shows a block diagram of a main part of the present embodiment. The present embodiment is different from the second embodiment only in that an index adding circuit 51 is provided between the frame memory 34 and the audio discriminating unit 41. The index adding circuit 51 adds an index signal to the audio signal read from the frame memory 34 at a constant period according to the decode clock. The audio signal to which the index signal has been added is output to the audio discrimination unit 41.
[0085]
As described above, when the frame memory 34 stores audio signals for two frames, a storage capacity of, for example, about 0.8 Kbytes is sufficient for the frame memory 34. When the storage capacity of the frame memory 34 is this small, the time required for the write operation and the read operation in the frame memory 34 (that is, the delay time in the frame memory 34) is slight compared with the delay time in the speech speed conversion processing circuit 4 and can be ignored.
[0086]
Therefore, according to the present embodiment, the same operation and effect as those of the second embodiment can be obtained.
(Fourth embodiment)
Hereinafter, a fourth embodiment of the present invention will be described with reference to the drawings. In the present embodiment, the same components as those in the second embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0087]
FIG. 7 shows a block diagram of a main part of the present embodiment. The present embodiment is different from the second embodiment only in that the index adding circuit 51 is provided between the audio discriminating unit 41 on one side and the silence deletion / insertion unit 42 and the time axis compression / expansion unit 43 on the other. The index adding circuit 51 adds an index signal to the audio signal, which has been subjected to the signal processing in the audio discriminating unit 41, at a constant period in accordance with the decode clock. The audio signal to which the index signal is added is output to the silence deletion / insertion unit 42 and the time axis compression / expansion unit 43.
[0088]
As described above, when the storage capacity of the frame memory 34 is small, the delay time in the frame memory 34 is small compared to the delay time in the speech speed conversion processing circuit 4 and can be ignored.
[0089]
In addition, the time required for signal processing in the voice discriminating unit 41 (that is, the delay time in the voice discriminating unit 41) is shorter than the delay time in the speech speed conversion processing circuit 4 and can be ignored.
[0090]
Therefore, according to the present embodiment, the same operation and effect as those of the second embodiment can be obtained.
(Fifth embodiment)
Hereinafter, a fifth embodiment of the present invention will be described with reference to the drawings. In the present embodiment, the same components as those in the second embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0091]
FIG. 8 shows a block diagram of a main part of the present embodiment. The present embodiment is different from the second embodiment only in that the index adding circuit 51 is provided between the silence deletion / insertion unit 42 and the time axis compression / expansion unit 43 on one side and the ring memory 32 on the other. The index adding circuit 51 adds an index signal to the audio signal that has been subjected to the signal processing in each of the units 42 and 43 at a constant period in accordance with the decode clock. The audio signal to which the index signal is added is output to the ring memory 32.
[0092]
As described above, when the storage capacity of the frame memory 34 is small, the delay time in the frame memory 34 is small compared to the delay time in the speech speed conversion processing circuit 4 and can be ignored.
[0093]
The time required for signal processing in each of the units 41 to 43 (that is, the delay time in each of the units 41 to 43) is shorter than the delay time in the speech speed conversion processing circuit 4 and can be ignored.
[0094]
That is, the delay time in the speech speed conversion processing circuit 4 is mainly determined by the time required for the write operation and the read operation in the ring memory 32 (that is, the delay time in the ring memory 32).
[0095]
Therefore, according to the present embodiment, the same operation and effect as those of the second embodiment can be obtained. Further, in the present embodiment, unlike in the second embodiment, the index signal added to the audio signal is never deleted by the silence deletion / insertion unit 42. All of the added index signals are therefore used, so the number of index signals can be reduced and the circuit size of the index adding circuit 51 can be reduced.
[0096]
(Sixth embodiment)
Hereinafter, a sixth embodiment of the present invention will be described with reference to the drawings. In the present embodiment, the same components as those in the first embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0097]
FIG. 9 shows a block diagram of a main part of the present embodiment. The present embodiment differs from the first embodiment only in that a delay time detection circuit 53 is provided.
As described above, the audio determination unit 41 determines whether the audio signal read from the frame memory 34 is an audio section or a silent section. That is, the processing result of the audio discriminating unit 41 includes information indicating whether or not audio is included in the audio signal.
[0098]
The decode clock corresponds to the bit rate of the system stream. That is, the decode clock contains information on the compression / expansion rate of the audio signal in advance.
[0099]
Therefore, the delay time detection circuit 53 detects the delay time in the speech speed conversion processing circuit 4 based on the information on whether or not the audio signal contains voice and the information on the compression / expansion rate, and supplies the detection signal to the video decoder 12. The video decoder 12 controls its own operation timing based on the detection signal of the delay time detection circuit 53. As a result, the data read from the ring memory 32 (that is, the audio signal subjected to the speech speed conversion processing) and the video signal can be synchronized regardless of the delay time in the speech speed conversion processing circuit 4.
[0100]
As described above, according to the present embodiment, the same effect as that of the second embodiment can be obtained.
(Seventh embodiment)
Hereinafter, a seventh embodiment of the present invention will be described with reference to the drawings. In the present embodiment, the same components as those in the first embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0101]
FIG. 10 shows a block diagram of a main part of the present embodiment. The present embodiment differs from the first embodiment only in that a control circuit 54 is provided.
The control circuit 54 generates a control signal for controlling the operation speed of the video decoder 12 based on the storage amount of the ring memory 32 detected by the up / down counter 33, and supplies the control signal to the video decoder 12. The video decoder 12 controls its own operation timing based on the control signal of the control circuit 54. As a result, the data read from the ring memory 32 and the video signal generated by the video decoder 12 can be synchronized.
[0102]
As described above, the delay time in the speech speed conversion processing circuit 4 is mainly determined by the delay time in the ring memory 32. The delay time in the ring memory 32 correlates with the storage amount: the larger the storage amount, the longer the delay time. Therefore, if the operation speed of the video decoder 12 is controlled based on the storage amount of the ring memory 32, the data read from the ring memory 32 (that is, the audio signal subjected to the speech speed conversion processing) can be synchronized with the video signal.
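The relation between the storage amount and the delay time can be sketched as follows. The class below is a hypothetical stand-in for the up / down counter 33, and it is assumed that the ring memory 32 is read out at a constant sample rate, so that the delay equals the stored sample count divided by that rate.

```python
class UpDownCounter:
    """Counts up on writes and down on reads to track the storage amount."""

    def __init__(self):
        self.count = 0

    def write(self, n=1):
        self.count += n

    def read(self, n=1):
        if n > self.count:
            raise ValueError("cannot read more samples than are stored")
        self.count -= n


def ring_delay_seconds(stored_samples, sample_rate_hz):
    """Delay of the ring memory, assuming readout at a constant sample rate."""
    return stored_samples / sample_rate_hz
```

For example, with 22050 samples stored and an assumed readout rate of 44100 samples per second, the estimated delay is 0.5 seconds, and a control circuit such as the control circuit 54 could shift the video timing by that amount.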
[0103]
As described above, according to the present embodiment, the same effect as that of the second embodiment can be obtained.
(Eighth embodiment)
Hereinafter, an eighth embodiment of the invention will be described with reference to the drawings. In the present embodiment, the same components as those in the first embodiment have the same reference numerals, and a detailed description thereof will be omitted.
[0104]
FIG. 11 shows a block diagram of a main part of the present embodiment. This embodiment differs from the first embodiment only in that a delay time detection circuit 55 is provided.
[0105]
As described above, the processing result of the audio discriminating unit 41 includes information indicating whether or not audio is included in the audio signal.
The processing result of the time axis compression / expansion unit 43 includes information on the compression / expansion rate of the audio signal.
[0106]
Therefore, the delay time detection circuit 55 detects the delay time in the speech speed conversion processing circuit 4 based on the information on whether or not the audio signal contains voice and the information on the compression / expansion rate, and supplies the detection signal to the video decoder 12. The video decoder 12 controls its own operation timing based on the detection signal of the delay time detection circuit 55. As a result, regardless of the delay time in the speech speed conversion processing circuit 4, the data read from the ring memory 32 (that is, the audio signal subjected to the speech speed conversion processing) can be synchronized with the video signal.
[0107]
As described above, according to the present embodiment, the same effect as that of the second embodiment can be obtained.
FIG. 12 shows a main block circuit of the MPEG video decoder 12 having the variable speed reproduction function.
[0108]
The MPEG video decoder 12 includes a bit buffer 202, a picture header detection circuit 203, an MPEG video decode core circuit (hereinafter abbreviated as a decode core circuit) 204, a variable threshold overflow determination circuit (hereinafter abbreviated as a determination circuit) 205, a picture skip circuit 206, and a control core circuit 207. Each of the circuits 203 to 207 can be mounted on a one-chip LSI.
[0109]
The control core circuit 207 controls each of the circuits 202 to 206.
The MPEG video stream transferred from the AV parser 11 is input to the bit buffer 202.
[0110]
The bit buffer 202 is constituted by a ring memory composed of a RAM having a FIFO structure, and sequentially accumulates the transferred video stream as it is.
The picture header detection circuit 203 detects the picture header attached to the head of each picture of the video stream stored in the bit buffer 202, and detects the picture type (I, P, B) defined in each picture header.
[0111]
The control core circuit 207 reads a video stream for an appropriate picture from the bit buffer 202 every frame period based on the detection result of the picture header detection circuit 203 and the determination result of a determination circuit 205 described later. The video stream read from the bit buffer 202 remains in the bit buffer 202 even after being read.
[0112]
Each picture read from the bit buffer 202 is transferred to the decode core circuit 204 via the picture skip circuit 206.
The decode core circuit 204 decodes each picture according to the MPEG video part and generates a video signal for each picture.
[0113]
The picture skip circuit 206 switches its connection between the nodes 206a and 206b under the control of the control core circuit 207. When the picture skip circuit 206 is connected to the node 206a, the picture read from the bit buffer 202 is transferred to the decode core circuit 204 as it is. When it is connected to the node 206b, the picture read from the bit buffer 202 is skipped without being transferred to the decode core circuit 204. As a result, the pictures transferred to the decode core circuit 204 are thinned out in picture units by the amount skipped by the picture skip circuit 206.
[0114]
The determination circuit 205 sets a threshold Bthn of the occupancy Bm of the bit buffer 202 based on the decode clock generated by the reproduction speed detection circuit 2 and compares the occupancy Bm of the bit buffer 202 with the threshold Bthn. The determination circuit 205 calculates the ratio between the frequency of the actual decode clock generated by the reproduction speed detection circuit 2 and the frequency of the decode clock during normal reproduction, and sets the ratio as the reproduction speed magnification n. Therefore, at the time of double speed reproduction, the magnification n = 2 and the threshold value Bthn = Bth2. Also, during normal reproduction, the magnification n = 1, and the threshold Bthn = Bth1.
[0115]
When the occupation amount Bm of the bit buffer 202 does not exceed the threshold value Bthn, the determination circuit 205 determines that the bit buffer 202 is normal without a risk of overflow. In this case, the control core circuit 207 reads a video stream for one picture from the bit buffer 202. Then, the control core circuit 207 connects the picture skip circuit 206 to the node 206a, and transfers the picture read from the bit buffer 202 to the decode core circuit 204.
[0116]
When the occupation amount Bm of the bit buffer 202 exceeds the threshold value Bthn, the determination circuit 205 determines that the bit buffer 202 may overflow. In this case, the control core circuit 207 reads a video stream of an appropriate picture from the bit buffer 202 until the occupation amount Bm of the bit buffer 202 falls below the threshold Bthn. Then, the control core circuit 207 connects the picture skip circuit 206 to the node 206b side, and skips all video streams for appropriate pictures read from the bit buffer 202.
[0117]
FIG. 13 shows a change in the occupation amount Bm of the bit buffer 202.
The occupancy Bm of the bit buffer 202 rises with the bit rate RB as the slope of the graph. The bit rate RB is defined as shown in Expression (1) according to BR (Bit Rate) in the sequence header at the beginning of the sequence. The picture rate RP of the video stream transferred from the AV parser 11 is defined by PR (Picture Rate) in the sequence header. The capacity B of the bit buffer 202 is defined as shown in Expression (2) according to VBV (VBV [Video Buffering Verifier] Buffer Size) in the sequence header. In each frame period, the video stream for the one picture that the decode core circuit 204 is to decode at that time is read from the bit buffer 202 all at once. Here, the data amount X of the video stream input to the bit buffer 202 during one frame period is defined as shown in Expression (3) according to the bit rate RB and the picture rate RP. Accordingly, the occupation amount Bm (= B0 to B6) of the bit buffer 202 immediately after the video stream for one picture is read all at once from the bit buffer 202 is specified, based on the data amount X and the capacity B of the bit buffer 202, so as to satisfy the condition shown in Expression (4).
[0118]
RB = 400 × BR ... (1)
B = 16 × 1024 × VBV ... (2)
X = RB / RP ... (3)
0 < Bm < B − X = B − (RB / RP) ... (4)
If the occupation amount Bm of the bit buffer 202 is specified so as to satisfy the condition shown in Expression (4), the bit buffer 202 neither overflows nor underflows. Conversely, when the occupation amount Bm of the bit buffer 202 exceeds the threshold value (B − X), it becomes extremely likely that the bit buffer 202 will overflow due to the video stream input to the bit buffer 202 in the next frame period.
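Expressions (1) to (4) can be evaluated with a short sketch. The function name and the sample header values are assumptions for illustration; the units are taken to be bits and bits per second, which is how the MPEG-1 sequence header fields are usually interpreted, although the description itself does not state the units.

```python
def buffer_params(br, vbv, rp):
    """Evaluate Expressions (1) to (4) from sequence-header values.

    br:  BR (Bit Rate) field of the sequence header.
    vbv: VBV buffer size field of the sequence header.
    rp:  picture rate RP in pictures per second.
    Returns (RB, B, X, B - X).
    """
    rb = 400 * br            # Expression (1): bit rate RB
    b = 16 * 1024 * vbv      # Expression (2): capacity B of the bit buffer
    x = rb / rp              # Expression (3): data input per frame period
    return rb, b, x, b - x   # B - X is the upper bound in Expression (4)


# Hypothetical header values: BR = 2875, VBV = 20, RP = 25 pictures/s.
rb, b, x, upper = buffer_params(2875, 20, 25)
print(rb, b, x, upper)
```

With these assumed values, RB is 1150000, B is 327680, X is 46000.0, and the occupation amount Bm must stay below 281680.0 to satisfy Expression (4).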
[0119]
In the video decoder 12, during normal reproduction, the values of the bit rate RB, the picture rate RP, and the capacity B are specified so as to satisfy Expression (4). That is, if the capacity B of the bit buffer 202 is set as shown in Expression (2), the bit buffer 202 in an ideal state neither overflows nor underflows even if the connection of the picture skip circuit 206 is fixed to the node 206a side.
[0120]
Therefore, at the time of normal reproduction, the occupation amount Bm (= B0 to B4) immediately after the data for one picture is read all at once from the bit buffer 202 is specified, based on the threshold value Bth1, so as to satisfy the condition shown in Expression (5). The threshold value Bth1 is set as shown in Expression (6) on the basis of Expression (4).
[0121]
0 < Bm < Bth1 < B ... (5)
Bth1 = B − X = B − (RB / RP) ... (6)
In an actual state, however, even if the capacity B of the bit buffer 202 is set as shown in Expression (2), the bit buffer 202 may overflow if the connection of the picture skip circuit 206 is fixed to the node 206a side.
[0122]
However, the video decoder 12 determines that the bit buffer 202 may overflow when the occupation amount Bm of the bit buffer 202 exceeds the threshold value Bth1 during normal reproduction. Then, a video stream for an appropriate picture is read from the bit buffer 202 until the occupation amount Bm of the bit buffer 202 falls below the threshold value Bth1. The picture skip circuit 206 is connected to the node 206b, and skips all video streams of appropriate pictures read from the bit buffer 202. Therefore, according to the video decoder 12, the bit buffer 202 does not overflow during normal reproduction.
[0123]
The occupancy Bm of the bit buffer 202 at the time of high-speed reproduction increases with the bit rate n × RB as the gradient of the graph. For example, the occupancy Bm of the bit buffer 202 at the time of 2 × speed reproduction increases with the bit rate 2 × RB as the slope of the graph.
[0124]
Therefore, at the time of high-speed reproduction, the occupation amount Bm (= B0 to B4) immediately after the data for one picture is read all at once from the bit buffer 202 is specified, based on the threshold value Bthn, so as to satisfy the condition shown in Expression (7). Note that the threshold value Bthn is set as shown in Expression (8).
[0125]
0 < Bm < Bthn ... (7)
Bthn = B − n × X = B − (n × RB / RP) ... (8)
At the time of high-speed reproduction, when the occupation amount Bm of the bit buffer 202 exceeds the threshold value Bthn, it is determined that the bit buffer 202 may overflow. For example, it is determined that the bit buffer 202 may overflow when the occupation amount Bm exceeds the threshold value Bth2 (= B − (2 × RB / RP)) at the time of double speed reproduction, or when it exceeds the threshold value Bth3 (= B − (3 × RB / RP)) at the time of triple speed reproduction. Then, a video stream for an appropriate picture is read from the bit buffer 202 until the occupation amount Bm of the bit buffer 202 falls below the threshold value Bthn, and all of the video streams read in this way are skipped. Therefore, according to the video decoder 12, the bit buffer 202 does not overflow during high-speed reproduction.
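The variable threshold and the skip operation described above can be sketched as follows. The function names and the list of queued picture sizes are hypothetical; in particular, the hardware does not know picture sizes in advance, so the list only stands in for the pictures that happen to be read from the bit buffer 202.

```python
def playback_speed(actual_clock_hz, normal_clock_hz):
    """Magnification n as the ratio of the actual to the normal decode clock."""
    return actual_clock_hz / normal_clock_hz


def threshold(b, x, n):
    """Expression (8): Bthn = B - n * X."""
    return b - n * x


def skip_until_safe(bm, picture_sizes, bthn):
    """Skip pictures while the occupation amount exceeds the threshold.

    Returns (number of pictures skipped, resulting occupation amount).
    Data arriving while the pictures are skipped is ignored in this sketch.
    """
    skipped = 0
    for size in picture_sizes:
        if bm <= bthn:          # no risk of overflow: stop skipping
            break
        bm -= size              # reading a picture frees its space
        skipped += 1
    return skipped, bm
```

For instance, with B = 327680, X = 46000 and n = 2, the threshold is 235680; starting from an occupation amount of 300000 with queued pictures of 40000, 30000 and 20000, two pictures are skipped and the occupation amount falls to 230000.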
[0126]
If the bit buffer 202 overflows while the decode core circuit 204 is decoding an arbitrary picture, the newly input video stream overwrites the portion of the picture being decoded that still remains in the bit buffer 202. As a result, that portion is destroyed and lost. The decode core circuit 204 then cannot complete the decoding of the picture and cannot generate a video signal for it. Therefore, it is absolutely necessary to prevent the bit buffer 202 from overflowing while the decode core circuit 204 is decoding an arbitrary picture.
[0127]
Therefore, it is necessary to determine whether or not the bit buffer 202 may overflow before the decode core circuit 204 starts decoding an arbitrary picture. More precisely, at the time the picture header detection circuit 203 detects a picture header, it is necessary to determine whether or not the bit buffer 202 may overflow and to decide whether or not to skip the picture via the picture skip circuit 206.
[0128]
By the way, the data amount of one picture is 0 to 40 bytes, but the data amount cannot be known until the decoding in the decode core circuit 204 is completed. The decoding processing time of one picture depends on the data amount of the picture and the operation speed of the decoding core circuit 204, but is usually about 1/3 to 3/4 of one frame period.
[0129]
When the data amount of a picture read from the bit buffer 202 is 0 bytes, the occupation amount Bm of the bit buffer 202 does not change before and after the reading of the picture, so overflow cannot be avoided even if the picture is skipped. Conversely, even when the data amount of the picture read from the bit buffer 202 is 0 bytes, no overflow occurs if the bit buffer 202 has sufficient free space.
[0130]
Therefore, a free space for the data amount of the video stream input to the bit buffer 202 in one frame period is secured in the bit buffer 202. Then, even if the data amount of the picture read from the bit buffer 202 is 0 bytes, no overflow occurs.
[0131]
The data amount of the video stream input to the bit buffer 202 during one frame period is (n × X = n × RB / RP). If the free space of the bit buffer 202 is equal to or larger than this data amount, no overflow occurs. Therefore, if the threshold value Bthn is set as shown in Expression (8), the overflow of the bit buffer 202 can be reliably avoided.
[0132]
That is, the determination circuit 205 checks the free space of the bit buffer 202 when the picture header detection circuit 203 detects a picture header, and determines whether sufficient free space (n × X = n × RB / RP) is secured. If sufficient free space is not secured, the control core circuit 207 skips the picture read from the bit buffer 202 based on that picture header via the picture skip circuit 206. Subsequently, when the picture header detection circuit 203 detects the next picture header, the determination circuit 205 checks the free space of the bit buffer 202 again. Since the time required for these processes is much shorter than the decoding processing time of the decode core circuit 204, the decoding can still be completed in time even if the decode core circuit 204 starts decoding only after sufficient free space has been secured in the bit buffer 202.
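The free-space check above is equivalent to comparing the occupation amount Bm with the threshold Bthn of Expression (8), which the following sketch illustrates for the worst case of a 0-byte picture. The function names are hypothetical.

```python
def free_space_ok(b, bm, n, x):
    """True if the free space B - Bm can hold one frame period of input (n * X)."""
    return (b - bm) >= n * x


def occupancy_after_frame(bm, picture_size, n, x):
    """Occupation amount after reading one picture and receiving n * X of input."""
    return bm - picture_size + n * x


B, X, N = 327680, 46000, 2       # hypothetical values, double speed
for bm in (235680, 240000):      # exactly at Bth2, and above it
    ok = free_space_ok(B, bm, N, X)
    worst = occupancy_after_frame(bm, 0, N, X)   # worst case: 0-byte picture
    print(bm, ok, worst <= B)
```

In this sketch the check passes exactly when the worst-case occupation amount after one frame period stays within the capacity B, that is, when Bm does not exceed Bthn.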
[0133]
By the way, the bit buffer 202 may underflow when the picture header detection circuit 203 detects the picture header or after the decoding core circuit 204 starts decoding. In this case, as soon as the video stream is input to the bit buffer 202, the video stream for one picture may be sequentially read from the bit buffer 202, so that there is no particular problem.
[0134]
As described in detail above, according to the video decoder 12, the following effects can be obtained.
(1) At the time of normal reproduction, overflow of the bit buffer 202 can be avoided.
[0135]
(2) At the time of high-speed reproduction, overflow of the bit buffer 202 can be avoided.
(3) By providing the determination circuit 205 and the picture skip circuit 206, overflow of the bit buffer 202 can be avoided. As described above, since the control of the determination circuit 205 and the picture skip circuit 206 is simple, the control core circuit 207 does not need to be configured using a microcomputer. When the circuits 203 to 207 are mounted on a one-chip LSI, the size of the video decoder 12 can be reduced.
[0136]
(4) The video stream skipped from the node 206b side of the picture skip circuit 206 is a picture unit. Therefore, data is not interrupted in the middle of the picture transferred to the decode core circuit 204. Therefore, the decode core circuit 204 can decode not only an I picture but also a P picture and a B picture. As a result, dropped frames occurring in the moving image reproduced on the display 22 are reduced. Therefore, it is possible to display several frames per second at the time of relatively slow high-speed reproduction of 2 to 4 times. Therefore, it is possible to smooth the motion of the moving image at the time of high-speed reproduction and to greatly improve the image quality.
[0137]
By the way, in the video decoder 12, the two thresholds B2thn and B3thn may be set so as to satisfy the rule shown in Expression (9). The values of the thresholds B2thn and B3thn are set according to the reproduction speed as described above, and may be set as appropriate by actually considering the image quality of the moving image reproduced on the display 22.
[0138]
0 < B3thn < B2thn < B ... (9)
The determination circuit 205 compares the occupation amount Bm of the bit buffer 202 with each of the thresholds B2thn and B3thn, and determines which of the regions given by Expressions (10) to (12) contains the occupation amount Bm.
[0139]
Bm < B3thn ... (10)
B3thn < Bm < B2thn ... (11)
B2thn < Bm ... (12)
When the occupation amount Bm of the bit buffer 202 does not exceed the threshold value B3thn as shown in Expression (10), the determination circuit 205 determines that the bit buffer 202 is normal without a risk of overflow. In this case, the control core circuit 207 reads a video stream for one picture from the bit buffer 202. Then, the control core circuit 207 connects the picture skip circuit 206 to the node 206a, and transfers the picture read from the bit buffer 202 to the decode core circuit 204.
[0140]
When the occupation amount Bm of the bit buffer 202 exceeds the threshold value B2thn and does not exceed the threshold value Bthn, as shown in Expression (12), the determination circuit 205 sets a first flag if the picture read from the bit buffer 202 is an I picture or a P picture. Further, when the occupation amount Bm of the bit buffer 202 exceeds the threshold value B3thn and does not exceed the threshold value B2thn, as shown in Expression (11), the determination circuit 205 sets a second flag if the picture read from the bit buffer 202 is a P picture. When the first or second flag is set and the picture read from the bit buffer 202 is a B picture, the control core circuit 207 connects the picture skip circuit 206 to the node 206b and skips that picture, even in the case of Expression (10).
[0141]
FIG. 13 shows a change in the occupation amount Bm of the bit buffer 202 when two threshold values B2thn and B3thn are set.
When the occupation amount Bm exceeds the threshold value B3thn and the picture read from the bit buffer 202 is a B picture, the picture is skipped without decoding (*1 in the figure). Even if the occupation amount Bm still exceeds the threshold value B3thn after the B picture is skipped, decoding is performed if the picture read next from the bit buffer 202 is an I picture or a P picture (*2 in the figure).
[0142]
Even when the occupation amount Bm exceeds the threshold value B3thn, decoding is performed if the picture read from the bit buffer 202 is an I picture or a P picture (*3 in the figure). If the occupation amount Bm still exceeds the threshold value B3thn after the I picture or the P picture is decoded and the picture read next from the bit buffer 202 is a B picture, that picture is skipped without decoding (*4 in the figure). This skipping of B pictures is repeated until the occupation amount Bm falls below the threshold value B3thn (*5 in the figure).
[0143]
When the occupation amount Bm exceeds the threshold value B2thn and the picture read from the bit buffer 202 is an I picture or a P picture, the determination circuit 205 sets the first flag (*6 in the figure). When the first flag is set and the next picture read from the bit buffer 202 is a B picture, the B picture is skipped even if the occupation amount Bm is below the threshold value B3thn (*7 in the figure).
[0144]
When the occupation amount Bm exceeds the threshold value B3thn and does not exceed the threshold value B2thn, and the picture read from the bit buffer 202 is a P picture, the determination circuit 205 sets the second flag (*8 in the figure). When the second flag is set and the next picture read from the bit buffer 202 is a B picture, the B picture is skipped even if the occupation amount Bm is below the threshold value B3thn (*9 in the figure).
[0145]
When the occupation amount Bm exceeds the threshold value B3thn and does not exceed the threshold value B2thn, and the picture read from the bit buffer 202 is an I picture, the determination circuit 205 does not set the second flag (*10 in the figure). When the second flag is not set and the occupation amount Bm is below the threshold value B3thn, decoding is performed even if the next picture read from the bit buffer 202 is a B picture.
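The decode-or-skip behavior with the first and second flags, as walked through above, can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the circuit itself: the names `SkipState` and `decide_picture` are hypothetical, the flags are assumed to be re-evaluated (set or cleared) each time an I picture or a P picture is read, since the text does not say when they are cleared, and any behavior above the threshold Bthn is left out.

```python
from dataclasses import dataclass


@dataclass
class SkipState:
    """Flags held between pictures (hypothetical container)."""
    first_flag: bool = False
    second_flag: bool = False


def decide_picture(picture_type, bm, b3thn, b2thn, state):
    """Return True to decode the picture, False to skip it.

    picture_type : "I", "P" or "B"
    bm           : occupation amount Bm when the picture is read
    """
    if picture_type in ("I", "P"):
        # Assumption: both flags are refreshed at every I or P picture.
        # First flag: Bm above B2thn, picture is I or P.
        state.first_flag = bm > b2thn
        # Second flag: Bm between B3thn and B2thn, picture is P only.
        state.second_flag = (b3thn < bm <= b2thn) and picture_type == "P"
        return True  # I and P pictures are decoded in this walk-through
    if picture_type == "B":
        # Skip when Bm is above B3thn, or pre-emptively when a flag is set.
        if bm > b3thn or state.first_flag or state.second_flag:
            return False
        return True
    raise ValueError("picture_type must be 'I', 'P' or 'B'")


if __name__ == "__main__":
    state = SkipState()
    # (picture type, occupation amount) pairs with arbitrary values.
    stream = [("I", 50), ("B", 30), ("P", 50), ("B", 30), ("I", 80), ("B", 30)]
    for ptype, bm in stream:
        action = "decode" if decide_picture(ptype, bm, 40, 70, state) else "skip"
        print(ptype, bm, action)
```

In this model an I picture read in the middle region leaves both flags clear, so a following B picture is decoded once Bm is below B3thn, whereas a P picture read in the same region causes the following B picture to be skipped.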
[0146]
As described above, when the two thresholds B2thn and B3thn are set, the following effects can be obtained in addition to the effects (1) to (3) of the video decoder 12 described above.
(4) When the occupation amount Bm of the bit buffer 202 exceeds the threshold value B3thn and does not exceed the threshold value Bthn, the I pictures and the P pictures are decoded as far as possible and the B pictures are skipped preferentially.
[0147]
Since the B picture is generated by bidirectional prediction, its importance is lower than that of an I picture or a P picture. Therefore, by skipping the B-picture of low importance with priority, it is possible to further reduce the number of dropped frames that occur in the moving image reproduced on the display 22. As a result, the motion of the moving image during high-speed reproduction can be further smoothed, and the image quality can be further improved.
[0148]
(5) By setting the first flag, even if the occupation amount Bm of the bit buffer 202 falls below the threshold value B3thn after an I picture or a P picture is decoded, the B picture read next from the bit buffer 202 can be skipped in advance, leaving a margin. Likewise, by setting the second flag, even if the occupation amount Bm of the bit buffer 202 falls below the threshold value B3thn after a P picture is decoded, the B picture read next from the bit buffer 202 can be skipped in advance, leaving a margin.
[0149]
In this way, skipping the B picture in advance is nothing but taking a preventive measure against the next overflow of the bit buffer 202. Therefore, overflow of the bit buffer 202 can be avoided more reliably.
[0150]
(6) The data amount of an I picture is two to three times that of a P picture. The occupation amount Bm of the bit buffer 202 therefore decreases more when an I picture is read than when a P picture is read, and the bit buffer 202 is less likely to overflow after an I picture is read than after a P picture is read. For this reason, the first and second flags are used to differentiate the preventive measures for the I picture and the P picture. That is, by setting the threshold value B2thn of the preventive measure for the I picture higher than the threshold value B3thn of the preventive measure for the P picture, the preventive measure for the I picture is made less strict than that for the P picture. As a result, unnecessary skips of B pictures can be reduced.
[0151]
(7) A simulation was performed for the case where a video stream having each of the GOP configurations (arrangements of picture types) shown in a) and b) below was transferred from the AV parser 11, and the following results were obtained.
[0152]
a) IBPBPBPBP ...
b) IBBPBBPBBPBBPBBIBP ...
[1] 2x-speed reproduction: in the case of a), all of the I pictures and P pictures can be decoded, and as a result the video can be displayed at the full rate of 30 frames/second. In the case of b), all of the I and P pictures and some of the B pictures can be decoded, and as a result the video can be displayed at 25 frames/second or more.
[0153]
[2] 4x-speed reproduction: in both a) and b), an I picture and the 3 to 4 P pictures that follow it can be decoded, and as a result the video can be displayed at 15 frames/second or more.
Incidentally, in the second and third embodiments, the operation speed of the video decoder 12 may be controlled by controlling the speed of the decoding process in the decode core circuit 204.
[0154]
The above embodiments may be modified as follows, and the same operation and effect can be obtained in such a case.
(1) The ring memory 32 is provided not at the stage after the DSP 31 but at the stage before the DSP 31 (that is, between the MPEG audio decoder 3 and the DSP 31).
[0155]
(2) Each of the circuits 1, 11, and 12 constituting the MPEG reproducing apparatus 23 is mounted on a one-chip LSI. By doing so, the size of the MPEG playback device 23 can be reduced.
[0156]
(3) In the second to eighth embodiments, instead of controlling the operation speed of the video decoder 12, a delay circuit is inserted between the video decoder 12 and the display 22, and the delay time of the delay circuit is controlled.
[0157]
(4) Any two or more of the second to eighth embodiments are appropriately combined and implemented. In this case, a more excellent effect can be obtained by the synergistic action of the combined embodiments.
[0158]
(5) The first to eighth embodiments are replaced with software processing using a CPU. That is, the signal processing in each of the circuits (1 to 55) is replaced with software signal processing using a CPU.
[0159]
(6) In the MPEG video decoder 12 shown in FIG. 12, the picture skip circuit 206 is drawn with the nodes 206a and 206b for ease of understanding, and the connection to each of the nodes 206a and 206b is controlled by the control core circuit 207. Instead of this configuration, the picture skip circuit 206 may be configured as a logic circuit that, under the control of the control core circuit 207, passes only the pictures to be decoded by the decode core circuit 204.
[0160]
As described above, each embodiment embodying the present invention has been described. However, technical ideas other than the claims that can be grasped from the above embodiment will be described below together with their effects.
(a) The MPEG audio reproducing apparatus according to any one of claims 1 to 3, further comprising: a D/A converter (5) for D/A converting the audio signal; and an audio amplifier (6) for amplifying an output of the D/A converter.
[0161]
In this way, an analog signal for driving a speaker can be generated from a digital audio signal.
(b) The MPEG reproducing apparatus according to any one of claims 4 to 8, further comprising a demultiplexer (13) for separating the MPEG system stream read from the recording medium (21) into an MPEG audio stream and an MPEG video stream.
[0162]
This makes it possible to transfer the audio stream to the audio decoder and the video stream to the video decoder.
[0163]
[Effects of the invention]
According to the invention described in any one of claims 1 to 3, it is possible to provide an MPEG audio reproducing apparatus capable of reproducing natural and easy-to-listen sound even during variable speed reproduction.
[0164]
According to the invention described in claim 4, it is possible to provide an MPEG reproducing apparatus including an MPEG video decoder and an MPEG audio reproducing apparatus that can reproduce natural and easy-to-listen sound even during variable speed reproduction.
[0165]
According to the invention described in any one of claims 5 to 8, it is possible to provide an MPEG reproducing apparatus that includes an MPEG video decoder and an MPEG audio reproducing apparatus capable of reproducing natural and easy-to-listen sound even during variable speed reproduction, and that can reduce the time lag between the sound and the moving image.
[Brief description of the drawings]
FIG. 1 is a block circuit diagram of a first embodiment.
FIG. 2 is a main part block circuit diagram of the first embodiment.
FIG. 3 is a schematic diagram for explaining the operation of the first embodiment.
FIG. 4 is a schematic diagram for explaining the operation of the first embodiment.
FIG. 5 is a main part block circuit diagram of a second embodiment.
FIG. 6 is a main part block circuit diagram of a third embodiment.
FIG. 7 is a main part block circuit diagram of a fourth embodiment.
FIG. 8 is a main part block circuit diagram of a fifth embodiment.
FIG. 9 is a main part block circuit diagram of a sixth embodiment.
FIG. 10 is a main part block circuit diagram of a seventh embodiment.
FIG. 11 is a main part block circuit diagram of an eighth embodiment.
FIG. 12 is a main part block circuit diagram of an MPEG video decoder.
FIG. 13 is a graph for explaining the operation of the MPEG video decoder.
FIG. 14 is a graph for explaining the operation of the MPEG video decoder.
[Explanation of symbols]
1 ... MPEG audio playback device
2 ... Reproduction speed detection circuit as speech speed conversion means
3 ... MPEG audio decoder
4 ... Speech speed conversion processing circuit as speech speed conversion means
12 ... MPEG video decoder
21 ... Recording medium
32 ... Ring memory
33 ... Up/down counter as detection means
41 ... Voice discrimination unit
42 ... Silence deletion/insertion unit
43 ... Time axis compression/expansion unit
51 ... Index addition circuit
52 ... Index detection circuit
53, 55 ... Delay time detection circuit
54 ... Control circuit

Claims

An MPEG audio decoder that decodes the MPEG audio stream read from the recording medium in accordance with the MPEG audio part and generates an audio signal;
Speech speed conversion processing means for performing a speech speed conversion process on the audio signal,
An MPEG audio playback apparatus in which the speech speed conversion processing means performs the speech speed conversion process as follows: when the bit rate of the audio stream is higher than normal, it lengthens the time length of each voice section to be reproduced and shortens the time length of each silent section; when the bit rate of the audio stream is lower than normal, it lengthens the time length of each voice section to be reproduced and shortens the time length of each silent section, or deletes the silent sections, joins the voice sections together, and then inserts a silent section.

The MPEG audio playback device according to claim 1,
wherein the speech speed conversion processing means comprises:
a ring memory for storing audio signals; and
detecting means for detecting the accumulated amount of the ring memory,
and adjusts the compression/expansion rate of the time length of a voice section according to the accumulated amount of the ring memory.

The MPEG audio playback device according to claim 2,
wherein the speech speed conversion processing means comprises:
a voice discrimination unit that discriminates between a voice section and a silent section of the audio signal;
a silence deletion/insertion unit that deletes or inserts silent sections; and
a time axis compression/expansion unit that compresses or expands voice sections, adjusting the compression/expansion rate based on the accumulated amount of the ring memory.

An MPEG audio playback device according to any one of claims 1 to 3,
An MPEG reproducing apparatus comprising: an MPEG video decoder that decodes an MPEG video stream read from a recording medium in accordance with an MPEG video part and generates a video signal.

An MPEG audio playback device according to claim 2 or claim 3,
An MPEG video decoder that decodes an MPEG video stream read from a recording medium in accordance with an MPEG video part and generates a video signal;
An index addition circuit that adds an index signal, as time information, to the audio signal before it is written to the ring memory; and an index detection circuit that detects the index signal added to the audio signal read from the ring memory, detects the signal delay time in the speech speed conversion processing means from the time information obtained from the index signal and the current time information, and supplies a signal indicating the detected delay time to the MPEG video decoder;
An MPEG reproducing apparatus, wherein the MPEG video decoder controls its own operation timing based on the signal indicating the delay time.

An MPEG audio playback device according to claim 3,
An MPEG video decoder that decodes an MPEG video stream read from a recording medium in accordance with an MPEG video part and generates a video signal;
A delay time detection circuit that detects the signal delay time in the speech speed conversion processing means based on the processing result of the voice discrimination unit and the bit rate of the audio stream, and supplies a signal indicating the detected delay time to the MPEG video decoder;
An MPEG reproducing apparatus, wherein the MPEG video decoder controls its own operation timing based on the signal indicating the delay time.

An MPEG audio playback device according to claim 3,
An MPEG video decoder that decodes an MPEG video stream read from a recording medium in accordance with an MPEG video part and generates a video signal;
A control circuit that generates, based on the accumulated amount of the ring memory, a control signal for synchronizing the video signal with the audio signal subjected to the speech speed conversion processing, and supplies the control signal to the MPEG video decoder;
An MPEG reproducing apparatus, wherein the MPEG video decoder controls the timing of its own operation based on the control signal.

An MPEG audio playback device according to claim 3,
An MPEG video decoder that decodes an MPEG video stream read from a recording medium in accordance with an MPEG video part and generates a video signal;
A delay time detection circuit that detects the signal delay time in the speech speed conversion processing means based on the processing results of the voice discrimination unit and the time axis compression/expansion unit, and supplies a signal indicating the detected delay time to the MPEG video decoder;
An MPEG reproducing apparatus, wherein the MPEG video decoder controls its own operation timing based on the signal indicating the delay time.