JP3966814B2

JP3966814B2 - Simple playback method and simple playback device, decoding method and decoding device usable in this method

Info

Publication number: JP3966814B2
Application number: JP2002373284A
Authority: JP
Inventors: 昌弘吉田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2002-12-24
Filing date: 2002-12-24
Publication date: 2007-08-29
Anticipated expiration: 2022-12-24
Also published as: JP2004206771A

Description

【０００１】
【発明の属する技術分野】
本発明は、簡易再生方法、簡易再生装置、復号方法、および復号装置に関する。本発明は特に音声データの再生時間を短縮して高速に再生する技術に関する。
【０００２】
【従来の技術】
近年、ＢＳデジタル放送やＣＳデジタル放送の普及が進み、また地上波放送もアナログからデジタルへの移行が目前に迫っている。デジタル放送は、単に画質や音質が高精細であるだけでなく、多チャンネル化やインタラクティブ化など、放送形態に多様性をもたらす。
【０００３】
放送のデジタル化は記録メディアにも変革をもたらし、ハードディスクやＤＶＤへのデジタル録画はビデオテープへのアナログ録画に取って代わろうとしている。デジタル録画したデータはランダムアクセスが可能であることから、録画を継続しながら少し前に録画した部分を同時に再生できるなど、アナログ録画にはない多彩な機能を実現できる。また、画質や音質を維持しながら再生時間を短縮した高速再生を実現する時短再生もそうした機能の一つである（例えば、特許文献１参照）。
【０００４】
【特許文献１】
特開平７−１９１６９５号公報（全文）
【０００５】
【発明が解決しようとする課題】
時短再生の機能は、従来からカセットテープ式のビデオ再生装置や音声再生装置、留守番電話機能付き電話機などに搭載されている。しかしこれらは再生スピードが速くなるだけでなく、音声のピッチも上がって聞き取りにくくなってしまう。デジタルデータの場合、映像と音声を同期させたまま高速化する方式と、映像と音声を同期させずに別々に高速化して音質変化を抑える方式とがある。音声に関して聞き取りやすさを重視するならば後者が有利である。しかし、内部処理に必要なバッファメモリの量が増大してしまい、特に音声データの量が再生速度に比例しないため予測が難しく、データの過不足によるバッファの破綻を防止するためにバッファメモリの容量を必要以上に大きめに設定せざるを得ない場合もある。メモリは他の部品と比べても高価であり、特にコストアップにつながりやすく、そうしたコストアップは厳しいコスト管理を要求される開発現場において切実な問題である。
【０００６】
本発明はこうした状況に鑑みなされたものであり、その目的はより簡易な構成にて時短再生を実現する点にある。別の目的は、より低容量のメモリ構成にて時短再生を実現する点にある。さらに別の目的は、符号化データを所望の速度で復号する技術を提供する点にある。
【０００７】
【課題を解決するための手段】
本発明のある態様は簡易再生方法である。この方法は、符号化されたオーディオストリームを入力する過程と、そのオーディオストリームを復号する過程と、復号により生成された音声データから無音部分を除去した時短データを生成する過程と、その時短データを出力する過程と、生成されたまま未出力の時短データの量が設定上限量を超えたときに前記復号を停止する過程と、を有する。
【０００８】
ここでいう簡易再生は、いわゆる時短再生とも呼ばれる技術であり、無音部分を除去して再生時間を短縮することにより高速再生を実現する。「符号化されたオーディオストリーム」は、例えばパケット化されたデータストリームであるＰＥＳ（Packetized Elementary Stream）信号であってもよく、ビデオストリームとともにシステムストリームを形成してもよい。「オーディオストリーム」は、ハードディスクや光ディスクなどの記録媒体に格納された状態から読み出されることを主に想定する。
【０００９】
「無音部分」は、完全な無音部分とほぼ無音の部分の双方を含んでもよい。「音声データから無音部分を除去した時短データ」は、有音部分だけを切り出して時間的に連続させることにより再生時間を短縮した音声データであってもよい。有音部分が非周期的であることから、時短処理の出力周期もまた非周期的である。したがって従来は、再生前の一時格納先であるバッファの容量は通常再生用のバッファ容量に比べて大きくする必要があった。
【００１０】
しかしながら上記の態様によれば、バッファに格納される時短データの量が所定量を超えるときにオーディオストリームの復号を停止するので、バッファへの格納も停止される。これにより、バッファオーバーを防止できるだけでなく、容量が比較的小さいバッファを利用でき、コストを抑制できる。特に、バッファには読出と書込が高速である高価なメモリの利用が要求されるため、容量を抑えることによるコスト低減効果は大きい。
【００１１】
本発明の別の態様は簡易再生装置である。この装置は、符号化されたオーディオストリームを復号するストリームデコード部と、その復号により生成された音声データから無音部分を除去した時短データを生成する時短処理部と、その時短データを、出力されるまで一時的に記憶する出力バッファと、出力バッファの記憶量が設定上限量を超えたときにストリームデコード部に対して復号の停止要求を通知する出力監視部と、を有する。ストリームデコード部は、停止要求に応じて復号を一時停止する。
【００１２】
また本装置は、符号化されたオーディオストリームを、復号されるまで一時的に記憶する入力バッファと、入力バッファの記憶量に基づく入力状況をストリームデコード部へ通知する入力監視部と、をさらに有してもよい。その場合、ストリームデコード部は、停止要求を受け取ったときに入力バッファの記憶量が設定上限量以内であることを条件に復号を一時停止してもよい。また、ストリームデコード部は、復号の再開後、一時停止に起因する復号の遅れを解消する出力周期にてオーディオストリームを復号してもよい。この復号は、通常時よりも出力周期の短い高速復号であってもよい。
【００１３】
この装置は、簡易再生機能を有する音声再生装置や映像再生装置として実現されてもよい。ストリームデコード部は、この装置と一体に構成してもよいし、独立した部品または装置の形で構成してもよい。以上の構成により、出力バッファの記憶量は一定量以内に保つことができるので、バッファの破綻を防止できるとともに比較的低容量のメモリを利用でき、コストを低減できる。一方、入力バッファに関しては、復号が一時的に停止される間にもオーディオストリームが入力され続けるため、通常再生用の入力バッファと比べて容量を大きくする必要がある。しかしながら、オーディオストリームは圧縮されているので、そのデータサイズは音声データのサイズよりも小さい。したがって、入力バッファの増量分よりも出力バッファの減量分の方が大きく、総合的なバッファ容量は通常再生用よりも小さいメモリで構成でき、十分に低コストを実現できる。
【００１４】
本発明のさらに別の態様は、復号方法である。この方法は、符号化されたオーディオストリームを入力する過程と、あらかじめ指定された出力周期に基づいて復号すべき周波数帯域を決定する過程と、オーディオストリームを前記決定した周波数帯域に限定して音声データに復号する過程と、その音声データを出力する工程と、を有する。
【００１５】
「あらかじめ指定された出力周期」は、上記の別態様においてオーディオストリームの復号を停止した時間に応じて定めてもよい。特に、通常時よりも短い周期を指定でき、上記の別態様における復号再開後の高速復号に適している。「復号すべき周波数帯域」として、上限の周波数または下限の周波数のいずれかを決定してもよい。
【００１６】
以上の方法により、符号化されたオーディオストリームは限定された周波数帯域だけが復号される。したがって、高速復号を実現できるだけでなく、周波数帯域の指定により所望の速度での復号を実現できる。
【００１７】
本発明のさらに別の態様は、復号装置である。この装置は、あらかじめ指定された出力周期に基づいて復号すべき周波数帯域を決定する帯域決定部と、符号化されたオーディオストリームを前記決定した周波数帯域に限定して復号する復号処理部と、を有する。
【００１８】
復号処理部は、オーディオストリームから音声チャンネルごとに前記決定した周波数帯域に限定して変形離散コサイン変換（Modified Discrete Cosine Transform。以下「ＭＤＣＴ」という。）係数を取得する前処理部を含んでもよい。前処理部によって実行される処理は、周波数軸上で演算する処理であってもよく、処理すべき周波数帯域を限定することにより処理に要する時間が低減される。
【００１９】
また、復号処理部は、取得したＭＤＣＴ係数に基づいて、音声チャンネルごとの窓処理に用いる窓関数のブロックタイプを判別し、そのブロックタイプ別に後続の処理を振り分けるスイッチ部と、決定した周波数帯域に限定してＭＤＣＴ係数から音声データを取得する後処理部と、を含んでもよい。したがって、通常再生時には音声チャンネルごと、例えば５個のチャンネルについて個別処理するところ、上記の態様では窓関数のブロックタイプ、すなわちロングタイプとショートタイプの２通りについて個別処理すればよい。これにより、処理に要する時間が低減される。
【００２０】
なお、以上の構成要素の任意の組合せや、本発明の構成要素や表現を方法、装置、システム、コンピュータプログラム、プログラムを格納した記録媒体、データ構造などの間で相互に置換したものもまた、本発明の態様として有効である。
【００２１】
【発明の実施の形態】
（第１実施形態）
本実施形態は、ＢＳデジタル放送を録画データを再生する際に、２倍速の早送りとなる時短再生を実現する。時短処理時は、音声はモノラルであり、映像と音声が同期しない方式を採る。従来の時短再生機能付き再生装置においては、ＰＥＳ信号を復号まで格納する入力バッファとして約１０ＫＢの容量が必要であり、時短データを再生まで格納する出力バッファとして約１９２ＫＢの容量が必要であった。ＰＥＳ信号を復号した音声データを時短処理まで格納する時短バッファの約２ＫＢを加えると、全体として約２０４ＫＢの容量が必要であった。
【００２２】
本実施形態では、出力バッファを半分の９６ＫＢに減らすとともに、バッファオーバーが生じないようにＰＥＳ信号の復号を適宜停止する。その一方でＰＥＳ信号を通常よりも多く格納するために入力バッファを２倍の２０ＫＢに増やす。このように入力バッファの増加を伴うものの、ＰＥＳ信号は時短データよりもデータサイズが小さいので、バッファの増分も小さくて済む。全体として１１８ＫＢの容量で足り、従来より約８６ＫＢのメモリ容量低減を実現できる。
【００２３】
また、ＰＥＳ信号の復号を一時的に停止する分、復号再開後に従来よりも高速に復号する必要がある。しかし、時短再生では通常再生と比べて音質に対する要求が低いことから、復号する周波数帯域を限定するとともにモノラル信号に変換することによって復号に要する時間を短縮できる。周波数を限定する幅を調整することによって所望の速度で復号することもできる。このように、復号処理および時短処理を状況に応じて制御して一時的な記憶量も制御することにより、バッファメモリの低容量化を実現する。
【００２４】
図１は、本実施形態における再生装置１０の構成を示す機能ブロック図である。再生装置１０は、ストリーム保持部１２、時間測定部１４、再生ユニット２０、メモリユニット３０、および制御ユニット４０を有する。ストリーム保持部１２は、デジタル放送などのＰＥＳ信号を録画または録音するための記録媒体である。
【００２５】
再生ユニット２０は、ＰＥＳ信号に含まれる音声部分を復号するストリームデコード部２２と、音声データに時短処理を施す時短処理部２４と、を含む。ＢＳデジタル放送の場合、ＡＡＣ方式のデータを復号すると１チャンネルあたり１０２４サンプルのデータが得られる。メモリユニット３０は、ＰＥＳ信号を復号されるまで一時的に記憶する入力バッファ３２と、ＰＥＳ信号の復号により生成される音声データを時短処理されるまで一時的に記憶する時短バッファ３４と、時短データを再生されるまで一時的に記憶する出力バッファ３６と、を含む。
【００２６】
ＰＥＳ信号に含まれるＰＴＳ（Presentation Time Stamp）信号とＰＣＲ（Program Clock Reference）信号によって同期をとっていれば、ＰＥＳ信号の復号を続けている限り入力バッファ３２は破綻しない。一方、出力バッファ３６からの時短データの出力は周期的であるが、出力バッファ３６への時短データの入力は非周期的である。無音部分の長さによっては出力バッファ３６が破綻するおそれがあり、記憶量の調整が必要となる。
【００２７】
制御ユニット４０は、ストリーム保持部１２からＰＥＳ信号を読み出して入力バッファ３２へ転送する入力制御部４６と、出力バッファ３６から時短データを読み出して外部の表示装置６０へ出力する出力制御部４８と、入力バッファ３２の記憶量を監視する入力監視部４２と、出力バッファ３６の記憶量を監視する出力監視部４４と、通常再生と時短再生を切り替える切替制御部５０と、その切替指示をユーザから受け取る指示受付部５２と、を含む。時間測定部１４は、ストリームデコード部２２における復号停止時間を計測する。
【００２８】
ユーザが通常再生または時短再生を指示すると、切替制御部５０は再生ユニット２０および制御ユニット４０の動作を通常再生と時短再生の間で切り替える。通常再生の場合、ＰＥＳ信号は入力バッファ３２に一旦格納された後、ストリームデコード部２２により復号されて出力バッファ３６に格納される。時短再生の場合、ＰＥＳ信号は入力バッファ３２に一旦格納された後、ストリームデコード部２２により復号されて時短バッファ３４に格納される。その音声データは時短処理部２４により時短処理されて出力バッファ３６に格納される。
【００２９】
ここで、出力バッファ３６の容量は従来の容量の約１／２である。例えば従来１９２ＫＢであったところ、本実施形態では９６ＫＢで構成する。したがって、音声データに含まれる無音部分の長さにもよるが、連続して時短処理すれば出力バッファ３６の記憶量はすぐに一杯になるおそれがある。そこで、出力バッファ３６の設定上限量として、例えば容量９６ＫＢの約８割である７６ＫＢを設定する。
【００３０】
出力監視部４４は、出力バッファ３６の記憶量がその設定上限量である７６ＫＢ以内に収まっているかどうかを監視する。記憶量が７６ＫＢを超過したとき、出力監視部４４は復号の停止要求をストリームデコード部２２へ通知するとともに、超過した旨を入力監視部４２に通知する。入力監視部４２は、入力バッファ３２の記憶量が入力バッファ３２の設定上限量以内に収まっているかどうかを検出し、収まっていれば空き容量が十分であるとしてその旨をストリームデコード部２２へ通知する。ストリームデコード部２２は、出力監視部４４から停止要求を受け取るとともに、入力監視部４２から空き容量が十分である旨の通知を受け取ったときに、ＰＥＳ信号の復号を一時停止する。入力バッファ３２の設定上限量は、初期的には入力バッファ３２の容量の約１／２である１０ＫＢである。その後は一時停止を開始したときの入力バッファ３２の記憶量をαＫＢとした場合にα＋１０ＫＢを入力バッファ３２の設定上限量とする。
【００３１】
出力制御部４８は、出力バッファ３６の記憶量が設定下限量、例えば容量９６ＫＢの約５割である４８ＫＢを下回ったとき、ストリームデコード部２２に対して一時停止の解除要求を通知する。また、入力監視部４２は、入力バッファ３２の記憶量がその設定上限量を超えたときにストリームデコード部２２に対して一時停止の解除要求を通知する。ストリームデコード部２２は、出力制御部４８または入力監視部４２から受け取った解除要求に応じてＰＥＳ信号の復号を再開する。
【００３２】
一方、出力バッファ３６の記憶量がその設定上限量を超過するとともに、入力バッファ３２の記憶量もまたその設定上限量を超過している場合、ストリームデコード部２２による復号を停止せず、時短処理部２４による処理を調整する。例えば、音声データから無音部分を検出するときの閾値を調整して無音部分を長くとるなど短縮度合を調整する。これによって時短処理部２４の出力を低減させるとともに、入力バッファ３２の記憶量増加を抑える。またストリームデコード部２２による復号を高速化して入力バッファ３２の記憶量を低減させる。高速化の方法は後述する。
【００３３】
時間測定部１４は、ストリームデコード部２２における復号停止時間を計測する。ストリームデコード部２２は、復号の再開後、一時停止に起因する復号の遅れを解消する出力周期にてＰＥＳ信号を高速復号する。その出力周期は、時間測定部１４によって計測された停止時間と、ストリームデコード部２２の処理能力に基づいて決定される。例えば、通常時の復号で出力周期が約10.7[msec]である場合に、停止時間をβ[msec]とし、復号再開後の１００フレームで復号の遅れを解消すると想定する。その場合、１００フレーム間（1070[msec]）の出力周期は、10.7−Ｂ／100[msec]となる。
【００３４】
以上のように、入力バッファ３２を、ＰＥＳ信号を連続して復号するときに必要な容量より大きく構成する。同時に出力バッファ３６を、時短データを連続して生成するときに必要な容量より小さく構成する。したがって、入力バッファ３２の増加量は出力バッファ３６の低減量で十分に吸収でき、全体としてメモリ容量を削減できる。
【００３５】
図２は、本実施形態におけるストリームデコード部の詳細を示す機能ブロック図である。ストリームデコード部２２は、入力切替部７１、高速復号ユニット７０、および通常復号ユニット９０を有する。通常復号ユニット９０は、通常復号時に機能する部分であり、高速復号ユニット７０は高速復号時に機能する部分である。入力切替部７１は、入力バッファ３２から読み出したＰＥＳ信号を高速復号ユニット７０または通常復号ユニット９０のいずれかに送ることにより、通常復号と高速復号を切り替える。
【００３６】
通常復号ユニット９０において、通常復号前処理部９２は、ＰＥＳ信号に対して、ハフマンデコード処理、逆量子化処理、スケーリング処理、ＭＳステレオ処理、インテンシティステレオ処理を施すことにより、音声チャンネルごとにＭＤＣＴ係数を得る。通常復号前処理部９２が出力するＣチャンネル、Ｌチャンネル、Ｒチャンネル、ＳＬチャンネル、ＳＲチャンネルの各ＭＤＣＴ係数は、それぞれＣチャンネルＩＭＤＣＴ部９４、ＬチャンネルＩＭＤＣＴ部９６、ＲチャンネルＩＭＤＣＴ部９８、ＳＬチャンネルＩＭＤＣＴ部１００、ＳＲチャンネルＩＭＤＣＴ部１０２によりＩＭＤＣＴ（Inverse Modified Discrete Cosine Transform、逆変形離散コサイン変換）処理が施された後、それぞれＣチャンネル窓処理部１０４、Ｌチャンネル窓処理部１０６、Ｒチャンネル窓処理部１０８、ＳＬチャンネル窓処理部１１０、ＳＲチャンネル窓処理部１１２により窓処理が施される。
【００３７】
Ｃチャンネル窓処理部１０４、Ｌチャンネル窓処理部１０６、Ｒチャンネル窓処理部１０８、ＳＬチャンネル窓処理部１１０、ＳＲチャンネル窓処理部１１２がそれぞれ出力したＣ、Ｌ、Ｒ、ＳＬ、ＳＲの各チャンネルの音声データは、通常復号ダウンミックス部１１４により希望の出力チャンネル数にダウンミックスされ、出力バッファ３６へ格納される。
【００３８】
高速復号ユニット７０において、帯域決定部７２は、時間測定部１４により指定された出力周期に基づいて、復号すべき周波数帯域を決定する。例えば、高周波帯域を除去する場合、その上限周波数を決定する。
【００３９】
高速復号前処理部７４は、帯域決定部７２により決定された周波数帯域に限定してＰＥＳ信号に各種処理を施すことにより、音声チャンネルごとにＭＤＣＴ係数を取得する。高速復号前処理部７４により施される処理は、通常復号前処理部９２と同様にハフマンデコード処理、逆量子化処理、スケーリング処理、ＭＳステレオ処理、インテンシティステレオ処理である。
【００４０】
スイッチ部７６は、高速復号前処理部７４から取得した各音声チャンネルのＭＤＣＴ係数から、音声チャンネルごとの窓処理に用いる窓関数のブロックタイプを判別する。スイッチ部７６は、判別したブロックタイプ別に後続の処理を振り分ける。すなわち、窓関数がロングタイプである音声チャンネルはロングタイプダウンミックス部７８へ振り分け、窓関数がショートタイプである音声チャンネルはショートタイプダウンミックス部８０へ振り分ける。振り分けられる音声チャンネル数の比率は、例えば２対３の場合もあれば、０対５の場合もある。
【００４１】
ロングタイプダウンミックス部７８およびショートタイプダウンミックス部８０は、それぞれ入力された音声チャンネルを単一のチャンネルへダウンミックスする。このとき、帯域決定部７２により決定された周波数帯域に限定して複数の音声チャンネルを加算する。ロングタイプの場合、次式（１）に基づいて加算される。
【数１】

ここで、ｋはロングタイプの音声チャンネル数を示し、０〜５の範囲である。ｉはロングタイプのサンプル数であり、０〜１０２３の範囲である。MDCT_ch[ch][i]はロングタイプのＭＤＣＴ係数を示す。ただし、上限周波数のサンプル番号をUpL[ch]とした場合、MDCT_ch[ch][m+1]＝０（UpL[ch]≦ｍ≦1023）となる。
【００４２】
ショートタイプの場合、次式（２）に基づいて加算される。
【数２】

ここで、ｋはショートタイプの音声チャンネル数を示し、０〜５の範囲である。ｉはショートタイプのサンプル数であり、０〜１２７の範囲である。MDCT_ch[ch][128×B+i]はショートタイプのＢ番目の窓のＭＤＣＴ係数を示す。Ｂは１フレーム中の窓の数を示し、０〜７の範囲である。ただし、上限周波数のサンプル番号をUpS[ch]とした場合、MDCT_ch[ch][128×B+m+1]＝０（UpS[ch]≦ｍ≦127）となる。
【００４３】
ロングタイプダウンミックス部７８の出力は、ロングタイプＩＭＤＣＴ部８２によるＩＭＤＣＴ処理とロングタイプ窓処理部８６による窓処理が施される。ショートタイプダウンミックス部８０の出力は、ショートタイプＩＭＤＣＴ部８４によるＩＭＤＣＴ処理とショートタイプ窓処理部８８による窓処理が施される。ロングタイプ窓処理部８６およびショートタイプ窓処理部８８の出力は加算されてモノラル信号の音声データとして時短バッファ３４に格納される。以上の構成により、所望の復号速度、すなわち復号の停止による遅れを解消できる速度でＰＥＳ信号を復号できる。
【００４４】
図３は、音声チャンネルごとに指定する上限周波数のサンプル番号の算出例を示す。例えば、ロングタイプのサンプル番号UpL[ch]は、Ｃチャンネル、Ｌチャンネル、Ｒチャンネルでは1023−(Ａ−β)×512×αとし、ＳＬチャンネル、ＳＲチャンネルでは1023−(Ａ−γ)×512×αとする。ショートタイプのサンプル番号UpS[ch]は、Ｃチャンネル、Ｌチャンネル、Ｒチャンネルでは127−(Ａ−β)×64×αとし、ＳＬチャンネル、ＳＲチャンネルでは127−(Ａ−γ)×64×αとする。ここで、α＝0.8、β＝1.5、γ＝0.5とする。Ａは復号処理の速度であり、１．５から２の範囲とする。なお、算出結果が負の値になった場合は、上限周波数をゼロとする。
【００４５】
図４は、再生装置１０による時短処理の手順を示すフローチャートである。ユーザから時短再生が指示されると、ストリームデコード部２２による復号が開始され（Ｓ１０）、得られた音声データに対する時短処理が開始される（Ｓ１２）。出力バッファ３６の記憶量が上限量以内にあるときはそのまま処理を継続し（Ｓ１４Ｎ）、上限量を超えて一杯に近づいたとき（Ｓ１４Ｙ）、ストリームデコード部２２へ停止要求を通知する（Ｓ１６）。そのとき入力バッファ３２の記憶量が上限量を超えずに空き容量が十分であれば（Ｓ１８Ｙ）、ストリームデコード部２２は復号を停止し（Ｓ２０）、時間測定部１４が停止時間の計測を開始する（Ｓ２２）。
【００４６】
その後、入力バッファ３２の記憶量が上限量を超えて一杯に近づくと（Ｓ２４Ｙ）、時間測定部１４は停止時間の計測を終了して（Ｓ２８）、ストリームデコード部２２は停止を解除し、復号を再開する（Ｓ３０）。また、入力バッファ３２の記憶量が上限量を超えていない場合であっても（Ｓ２４Ｎ）、出力バッファ３６の記憶量が下限量を下回って空に近づいたときもまた（Ｓ２６Ｙ）、停止時間の計測を停止して（Ｓ２８）、復号を再開する（Ｓ３０）。このとき、停止時間に応じた速度で高速復号を実行する（Ｓ３２）。出力バッファ３６の記憶量が下限量を下回らない限り（Ｓ２６Ｎ）、復号は停止されたままである。
【００４７】
一方、Ｓ１６で停止要求を通知したときに、入力バッファ３２の記憶量が上限量を超えて空き容量が十分でない場合（Ｓ１８Ｎ）、Ｓ２０からＳ３２の処理をスキップするとともに、時短処理部２４は時短処理による出力量を調整し（Ｓ３６）、ストリームデコード部２２は入力バッファ３２の記憶量が上限量以内に収まるまで所定速度にて高速復号する（Ｓ４０）。以上のＳ１４からＳ３２までの処理は、時短再生が継続される間、繰り返される（Ｓ３４Ｙ）。
【００４８】
図５は、図４のＳ３２における高速復号処理の詳細を示すフローチャートである。停止時間が０の場合は（Ｓ５０Ｎ）、Ｓ５２〜Ｓ６４の処理をスキップして、高速復号は実行しない。停止時間が０より多ければ（Ｓ５０Ｙ）、その停止時間に応じて復号すべき周波数帯域を決定し（Ｓ５２）、その周波数帯域に限定した前処理を実行する（Ｓ５４）。音声チャンネルごとに窓関数のブロックタイプを判定し（Ｓ５６）、ブロックタイプ別に後続の処理を振り分ける（Ｓ５８）。ブロックタイプ別に周波数帯域を限定してダウンミックスし（Ｓ６０）、ブロックタイプ別にＩＭＤＣＴ処理を施し（Ｓ６２）、ブロックタイプ別に窓処理を施す（Ｓ６４）。
【００４９】
（第２実施形態）
本実施形態は、ストリームデコード部２２の構成をより簡素化している。図６は、本実施形態におけるストリームデコード部の詳細を示す機能ブロック図である。本実施形態のストリームデコード部２２は、通常復号用の機能ブロックと高速復号用の機能ブロックを共通化している点で第１実施形態と異なる。例えば、第１実施形態では前処理のための機能ブロックとして通常復号前処理部９２および高速復号前処理部７４を設けていたが、本実施形態では前処理部１２０として共通化されている。前処理部１２０は、帯域決定部７２が決定した周波数帯域に限定して前処理を実行するが、通常復号時は周波数帯域の限定をなくして前処理を実行すればよい。
【００５０】
第１〜５ＩＭＤＣＴ部１２２、１２４、１２６、１２８、１３０は、通常復号時にはそれぞれ第１実施形態のＣチャンネルＩＭＤＣＴ部９４、ＬチャンネルＩＭＤＣＴ部９６、ＲチャンネルＩＭＤＣＴ部９８、ＳＬチャンネルＩＭＤＣＴ部１００、ＳＲチャンネルＩＭＤＣＴ部１０２として機能する。また、第１ＩＭＤＣＴ部１２２および第２ＩＭＤＣＴ部１２４は、高速復号時にはそれぞれ第１実施形態のロングタイプＩＭＤＣＴ部８２またはショートタイプＩＭＤＣＴ部８４として機能する。
【００５１】
第１〜５窓処理部１３２、１３４、１３６、１３８、１４０は、通常復号時にはそれぞれ第１実施形態のＣチャンネル窓処理部１０４、１０６、１０８、１１０、１１２として機能する。また、第１窓処理部１３２および第２窓処理部１３４は、高速復号時にはそれぞれ第１実施形態のロングタイプ窓処理部８６またはショートタイプ窓処理部８８として機能する。
【００５２】
第１ダウンミックス部１４２は、通常復号時は第１実施形態の通常復号ダウンミックス部１１４として機能し、高速復号時は第１実施形態のロングタイプダウンミックス部７８として機能する。第２ダウンミックス部１４４は、高速復号時にショートタイプダウンミックス部８０として機能する。スイッチ部１４６は、高速復号時に第１実施形態のスイッチ部７６として機能する。
【００５３】
以上の構成において、高速復号時は前処理部１２０による前処理の出力をスイッチ部１４６が窓関数のブロックタイプ別に第１ダウンミックス部１４２と第２ダウンミックス部１４４に振り分る。それぞれの出力に第１ＩＭＤＣＴ部１２２および第２ＩＭＤＣＴ部１２４がＩＭＤＣＴ処理を施し、さらにそれぞれの出力に第１窓処理部１３２および第２窓処理部１３４が窓処理を施す。これらの出力を加算して時短バッファ３４に格納する。これにより、第１実施形態のストリームデコード部２２と同じ動作を、より簡素な構成で実現できる。
【００５４】
以上、本発明を実施の形態をもとに説明した。この実施の形態は例示であり、その各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。以下、変形例を挙げる。
【００５５】
実施の形態においては、通常再生と時短再生を切り替える再生装置として本発明を実現した。変形例においては、時短再生専用の装置として実現してもよい。また、ストリームデコード部２２を独立の復号装置として実現してもよいし、さらに高速復号ユニット７０の部分を独立の高速復号装置として実現してもよい。
【００５６】
【発明の効果】
本発明によれば、より簡素な構成で時短再生を実現できる。
【図面の簡単な説明】
【図１】第１実施形態における再生装置の構成を示す機能ブロック図である。
【図２】第１実施形態におけるストリームデコード部の詳細を示す機能ブロック図である。
【図３】音声チャンネルごとに指定する上限周波数のサンプル番号の算出例を示す図である。
【図４】再生装置による時短処理の手順を示すフローチャートである。
【図５】高速復号処理の詳細を示すフローチャートである。
【図６】第２実施形態におけるストリームデコード部の詳細を示す機能ブロック図である。
【符号の説明】
２２ストリームデコード部、２４時短処理部、３２入力バッファ、３６出力バッファ、４２入力監視部、４４出力監視部、７２帯域決定部、７４高速復号前処理部、７６スイッチ部、７８ロングタイプダウンミックス部、８０ショートタイプダウンミックス部、８２ロングタイプＩＭＤＣＴ部、８４ショートタイプＩＭＤＣＴ部、８６ロングタイプ窓処理部、８８ショートタイプ窓処理部。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a simple reproduction method, a simple reproduction device, a decoding method, and a decoding device. The present invention particularly relates to a technique for shortening the reproduction time of audio data and reproducing it at high speed.
[0002]
[Prior art]
In recent years, BS digital broadcasting and CS digital broadcasting have become widespread, and the transition of terrestrial broadcasting from analog to digital is imminent. Digital broadcasting not only offers high-definition picture and sound quality, but also diversifies the forms of broadcasting, for example through multi-channel and interactive services.
[0003]
The digitalization of broadcasting has also transformed recording media, and digital recording on hard disks and DVDs is replacing analog recording on videotape. Because digitally recorded data can be accessed at random, a variety of functions unavailable with analog recording can be realized, such as playing back a portion recorded a little earlier while recording continues. Time-shortened playback, which realizes high-speed playback with a reduced playback time while maintaining picture and sound quality, is one such function (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
JP 7-191695 A (full text)
[0005]
[Problems to be solved by the invention]
Time-shortened playback functions have long been provided in cassette-tape video playback devices, audio playback devices, telephones with an answering machine function, and the like. In these devices, however, not only does the playback speed increase, but the pitch of the audio also rises, making it hard to understand. For digital data, there are two approaches: speeding up video and audio while keeping them synchronized, and speeding them up separately, without synchronization, to suppress changes in sound quality. The latter is advantageous when intelligibility of the audio is the priority. However, it increases the amount of buffer memory required for internal processing. In particular, the amount of audio data is not proportional to the playback speed and is therefore hard to predict, so the buffer memory capacity may have to be set larger than necessary to prevent buffer failure caused by a surplus or shortage of data. Memory is expensive compared with other components and readily drives up cost, which is a serious problem in development settings that demand strict cost control.
[0006]
The present invention has been made in view of these circumstances, and an object thereof is to realize time-shortened playback with a simpler configuration. Another object is to realize time-shortened playback with a lower-capacity memory configuration. Yet another object is to provide a technique for decoding encoded data at a desired speed.
[0007]
[Means for Solving the Problems]
One aspect of the present invention is a simple playback method. This method includes a step of inputting an encoded audio stream, a step of decoding the audio stream, a step of generating time-shortened data by removing silent portions from the audio data generated by the decoding, a step of outputting the time-shortened data, and a step of stopping the decoding when the amount of time-shortened data that has been generated but not yet output exceeds a set upper limit.
[0008]
The simple playback referred to here is a technique also known as time-shortened playback, which realizes high-speed playback by removing silent portions to shorten the playback time. The “encoded audio stream” may be, for example, a PES (Packetized Elementary Stream) signal, which is a packetized data stream, and may form a system stream together with a video stream. The “audio stream” is mainly assumed to be read out from a recording medium such as a hard disk or an optical disc on which it is stored.
[0009]
The “silent portion” may include both completely silent portions and nearly silent portions. The “time-shortened data obtained by removing silent portions from the audio data” may be audio data whose playback time is shortened by cutting out only the sounded portions and joining them contiguously in time. Because the sounded portions occur aperiodically, the output of the time-shortening processing is also aperiodic. Conventionally, therefore, the buffer serving as temporary storage before playback had to have a larger capacity than a buffer for normal playback.
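As a rough illustration of removing silent portions, the following Python sketch drops every frame whose samples all stay at or below an amplitude threshold and joins the remaining frames back to back. The function name `remove_silence`, the threshold of 500, and the 4-sample frame length are hypothetical values chosen for the example; the description does not specify how silence is detected.

```python
def remove_silence(samples, threshold=500, frame_len=4):
    """Return the samples with silent frames dropped.

    A frame counts as silent when every sample magnitude is at or below
    `threshold`. Both `threshold` and `frame_len` are illustrative
    assumptions, not values taken from the description.
    """
    shortened = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if max(abs(s) for s in frame) > threshold:
            # Keep sounded frames and place them contiguously in time.
            shortened.extend(frame)
    return shortened


pcm = [0, 3, -2, 1, 900, -1200, 800, -700, 2, 0, -1, 1, 1500, -1400, 1300, 10]
print(remove_silence(pcm))
# [900, -1200, 800, -700, 1500, -1400, 1300, 10]
```

Because the number of frames kept depends on the content, the amount of output produced per input frame varies, which is the aperiodic behaviour the paragraph above describes.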
[0010]
According to the above aspect, however, decoding of the audio stream is stopped when the amount of time-shortened data stored in the buffer exceeds a predetermined amount, so storage into the buffer also stops. This not only prevents buffer overflow but also allows a buffer of relatively small capacity to be used, which keeps cost down. Because the buffer requires expensive memory with fast read and write access, the cost reduction gained by limiting its capacity is particularly large.
[0011]
Another aspect of the present invention is a simple playback device. This device includes a stream decoding unit that decodes an encoded audio stream, a time-shortening processing unit that generates time-shortened data by removing silent portions from the audio data generated by the decoding, an output buffer that temporarily stores the time-shortened data until it is output, and an output monitoring unit that notifies the stream decoding unit of a request to stop decoding when the storage amount of the output buffer exceeds a set upper limit. The stream decoding unit pauses decoding in response to the stop request.
[0012]
The device may further include an input buffer that temporarily stores the encoded audio stream until it is decoded, and an input monitoring unit that notifies the stream decoding unit of an input status based on the storage amount of the input buffer. In that case, the stream decoding unit may pause decoding on the condition that the storage amount of the input buffer is within a set upper limit when the stop request is received. After decoding resumes, the stream decoding unit may decode the audio stream at an output period that eliminates the decoding delay caused by the pause. This decoding may be high-speed decoding with an output period shorter than normal.
[0013]
This device may be realized as an audio playback device or a video playback device having a simple playback function. The stream decoding unit may be configured integrally with this device, or may be configured as an independent component or device. With the above configuration, the storage amount of the output buffer can be kept within a certain amount, so that the failure of the buffer can be prevented, a relatively low-capacity memory can be used, and the cost can be reduced. On the other hand, regarding the input buffer, since the audio stream is continuously input even while decoding is temporarily stopped, it is necessary to increase the capacity compared to the input buffer for normal reproduction. However, since the audio stream is compressed, its data size is smaller than the size of the audio data. Therefore, the amount of decrease of the output buffer is larger than the amount of increase of the input buffer, and the total buffer capacity can be configured with a memory smaller than that for normal reproduction, and a sufficiently low cost can be realized.
[0014]
Yet another aspect of the present invention is a decoding method. This method includes a step of inputting an encoded audio stream, a step of determining a frequency band to be decoded based on an output period specified in advance, a step of decoding the audio stream into audio data while limiting it to the determined frequency band, and a step of outputting the audio data.
[0015]
The “output period specified in advance” may be determined according to the length of time for which decoding of the audio stream was stopped in the other aspect described above. In particular, a period shorter than normal can be specified, which suits the high-speed decoding performed after decoding resumes in that aspect. As the “frequency band to be decoded”, either an upper limit frequency or a lower limit frequency may be determined.
[0016]
With the above method, only a limited frequency band of the encoded audio stream is decoded. Therefore, not only high-speed decoding can be realized, but also decoding at a desired speed can be realized by specifying a frequency band.
[0017]
Yet another aspect of the present invention is a decoding device. This device includes a band determination unit that determines a frequency band to be decoded based on an output period specified in advance, and a decoding processing unit that decodes an encoded audio stream while limiting it to the determined frequency band.
[0018]
The decoding processing unit may include a preprocessing unit that obtains a modified discrete cosine transform (hereinafter referred to as “MDCT”) coefficient limited to the determined frequency band for each audio channel from the audio stream. The processing executed by the preprocessing unit may be processing performed on the frequency axis, and the time required for processing is reduced by limiting the frequency band to be processed.
[0019]
The decoding processing unit may also include a switch unit that determines, based on the acquired MDCT coefficients, the block type of the window function used for the window processing of each audio channel and routes subsequent processing according to that block type, and a post-processing unit that obtains audio data from the MDCT coefficients while limiting them to the determined frequency band. Whereas normal playback processes each audio channel individually, for example five channels, the above aspect need only process the two window-function block types, namely the long type and the short type, individually. This reduces the time required for processing.
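The routing by block type can be pictured with the short Python sketch below. It groups channels by block type and adds the MDCT coefficients of each group into a single signal, zeroing everything above an upper-limit index. The description states that channels of the same block type are added within the limited band, but the exact downmix formula is not reproduced in this text, so the unscaled sum, the single shared `upper_index`, the 4-coefficient toy frames, and the function name `route_and_downmix` are all assumptions made for illustration.

```python
def route_and_downmix(channels, upper_index):
    """Group channels by window block type and add each group together.

    `channels` is a list of (block_type, coefficients) pairs, where
    block_type is "long" or "short". Coefficients above `upper_index`
    are set to zero to mimic the band limit. The plain sum without
    scaling is an assumption.
    """
    groups = {"long": [], "short": []}
    for block_type, coeffs in channels:
        groups[block_type].append(coeffs)

    mixed = {}
    for block_type, members in groups.items():
        if not members:
            continue  # no channel uses this block type in this frame
        length = len(members[0])
        mixed[block_type] = [
            sum(c[i] for c in members) if i <= upper_index else 0.0
            for i in range(length)
        ]
    return mixed


frame = [
    ("long", [1.0, 2.0, 3.0, 4.0]),
    ("long", [0.5, 0.5, 0.5, 0.5]),
    ("short", [2.0, 2.0, 2.0, 2.0]),
]
print(route_and_downmix(frame, upper_index=1))
# {'long': [1.5, 2.5, 0.0, 0.0], 'short': [2.0, 2.0, 0.0, 0.0]}
```

However many channels arrive, at most two signals leave this stage, so the inverse transform and window processing that follow run at most twice per frame instead of once per channel.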
[0020]
Any combination of the above components, and any conversion of the components or expressions of the present invention between a method, a device, a system, a computer program, a recording medium storing the program, a data structure, and the like, are also valid as aspects of the present invention.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
The present embodiment realizes time-shortened playback, equivalent to fast-forwarding at double speed, when playing back recorded BS digital broadcast data. During time-shortening processing, the audio is monaural and video and audio are not synchronized. A conventional playback device with a time-shortened playback function required about 10 KB for the input buffer that holds the PES signal until decoding and about 192 KB for the output buffer that holds the time-shortened data until playback. Adding about 2 KB for the time-shortening buffer, which holds the audio data decoded from the PES signal until time-shortening processing, brought the total to about 204 KB.
[0022]
In the present embodiment, the output buffer is halved to 96 KB, and decoding of the PES signal is stopped as needed so that the buffer does not overflow. Meanwhile, the input buffer is doubled to 20 KB so that it can hold more of the PES signal than usual. Although the input buffer grows, the PES signal is smaller in data size than the time-shortened data, so the increase is small. A total of 118 KB suffices, a reduction of about 86 KB in memory capacity compared with the conventional configuration.
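The totals quoted above can be checked with a few lines of Python. The sizes are the figures given in the text; the dictionary keys are only labels chosen for this example.

```python
# Buffer sizes in KB, as quoted in the description.
conventional = {"input": 10, "time_shortening": 2, "output": 192}
embodiment = {"input": 20, "time_shortening": 2, "output": 96}

conventional_total = sum(conventional.values())
embodiment_total = sum(embodiment.values())
saving = conventional_total - embodiment_total

print(conventional_total, embodiment_total, saving)
# 204 118 86
```

The input buffer grows by 10 KB while the output buffer shrinks by 96 KB, which is where the net saving of 86 KB comes from.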
[0023]
Because decoding of the PES signal is paused temporarily, decoding after it resumes must be faster than in the conventional case. However, since time-shortened playback places lower demands on sound quality than normal playback, the time required for decoding can be shortened by limiting the frequency band to be decoded and converting to a monaural signal. Decoding at a desired speed is also possible by adjusting how far the frequency band is limited. In this way, the decoding processing and time-shortening processing are controlled according to the situation, and the amount of temporarily stored data is controlled with them, which allows the buffer memory capacity to be reduced.
[0024]
FIG. 1 is a functional block diagram showing the configuration of the playback apparatus 10 in the present embodiment. The playback apparatus 10 includes a stream holding unit 12, a time measuring unit 14, a playback unit 20, a memory unit 30, and a control unit 40. The stream holding unit 12 is a recording medium for recording the video or audio of a PES signal from digital broadcasting or the like.
[0025]
The playback unit 20 includes a stream decoding unit 22 that decodes the audio portion of the PES signal and a time-shortening processing unit 24 that applies time-shortening processing to the audio data. For BS digital broadcasting, decoding AAC data yields 1024 samples per channel. The memory unit 30 includes an input buffer 32 that temporarily stores the PES signal until it is decoded, a time-shortening buffer 34 that temporarily stores the audio data generated by decoding the PES signal until it undergoes time-shortening processing, and an output buffer 36 that temporarily stores the time-shortened data until it is played back.
[0026]
If synchronization is maintained using the PTS (Presentation Time Stamp) signal and the PCR (Program Clock Reference) signal included in the PES signal, the input buffer 32 does not fail as long as decoding of the PES signal continues. In contrast, time-shortened data is output from the output buffer 36 periodically but is input to the output buffer 36 aperiodically. Depending on the length of the silent portions, the output buffer 36 may fail, so its storage amount must be regulated.
[0027]
The control unit 40 includes an input control unit 46 that reads the PES signal from the stream holding unit 12 and transfers it to the input buffer 32, an output control unit 48 that reads the time-shortened data from the output buffer 36 and outputs it to an external display device 60, an input monitoring unit 42 that monitors the storage amount of the input buffer 32, an output monitoring unit 44 that monitors the storage amount of the output buffer 36, a switching control unit 50 that switches between normal playback and time-shortened playback, and an instruction receiving unit 52 that receives the switching instruction from the user. The time measuring unit 14 measures the time for which decoding is stopped in the stream decoding unit 22.
[0028]
When the user instructs normal playback or time-shortened playback, the switching control unit 50 switches the operation of the playback unit 20 and the control unit 40 between the two modes. In normal playback, the PES signal is stored temporarily in the input buffer 32, decoded by the stream decoding unit 22, and stored in the output buffer 36. In time-shortened playback, the PES signal is stored temporarily in the input buffer 32, decoded by the stream decoding unit 22, and stored in the time-shortening buffer 34. The audio data is then processed by the time-shortening processing unit 24 and stored in the output buffer 36.
[0029]
Here, the capacity of the output buffer 36 is about half the conventional capacity: 96 KB in the present embodiment, for example, where 192 KB was used conventionally. Consequently, although it depends on the length of the silent portions in the audio data, the output buffer 36 may fill up quickly if time-shortening processing runs continuously. The set upper limit of the output buffer 36 is therefore set to, for example, 76 KB, which is about 80% of the 96 KB capacity.
[0030]
The output monitoring unit 44 monitors whether the storage amount of the output buffer 36 stays within the set upper limit of 76 KB. When the storage amount exceeds 76 KB, the output monitoring unit 44 notifies the stream decoding unit 22 of a request to stop decoding and notifies the input monitoring unit 42 that the limit has been exceeded. The input monitoring unit 42 checks whether the storage amount of the input buffer 32 is within the set upper limit of the input buffer 32 and, if so, notifies the stream decoding unit 22 that there is sufficient free space. The stream decoding unit 22 pauses decoding of the PES signal when it has received both the stop request from the output monitoring unit 44 and the notification of sufficient free space from the input monitoring unit 42. The set upper limit of the input buffer 32 is initially 10 KB, about half the capacity of the input buffer 32. Thereafter, letting α KB be the storage amount of the input buffer 32 at the moment a pause begins, the set upper limit of the input buffer 32 is α + 10 KB.
[0031]
When the storage amount of the output buffer 36 falls below a set lower limit, for example 48 KB, which is about 50% of the 96 KB capacity, the output control unit 48 notifies the stream decoding unit 22 of a request to release the pause. The input monitoring unit 42 likewise notifies the stream decoding unit 22 of a request to release the pause when the storage amount of the input buffer 32 exceeds its set upper limit. The stream decoding unit 22 resumes decoding of the PES signal in response to a release request from either the output control unit 48 or the input monitoring unit 42.
[0032]
On the other hand, when the storage amount of the output buffer 36 exceeds its set upper limit and the storage amount of the input buffer 32 also exceeds its set upper limit, decoding by the stream decoding unit 22 is not stopped; instead, the processing by the time-shortening processing unit 24 is adjusted. For example, the degree of shortening is adjusted by changing the threshold used to detect silent portions in the audio data so that longer stretches are treated as silent. This reduces the output of the time-shortening processing unit 24 and suppresses growth in the storage amount of the input buffer 32. In addition, decoding by the stream decoding unit 22 is sped up to reduce the storage amount of the input buffer 32. The method of speeding up is described later.
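The pause, release, and adjustment rules of the last few paragraphs can be summarized as a small state machine. The Python sketch below is one possible reading, not the patented implementation: the class name `DecodeGate`, the action labels, keeping the α + 10 KB limit in force after a release, and capping that limit at the 20 KB input buffer capacity are all assumptions. The thresholds of 76 KB, 48 KB, and 10 KB are the example values from the text.

```python
OUTPUT_UPPER_KB = 76    # about 80% of the 96 KB output buffer
OUTPUT_LOWER_KB = 48    # about 50% of the 96 KB output buffer
INPUT_CAPACITY_KB = 20  # input buffer capacity in the embodiment
INPUT_MARGIN_KB = 10    # the "+ 10 KB" added to alpha


class DecodeGate:
    """Decides per step whether decoding runs, pauses, or is adjusted."""

    def __init__(self):
        self.paused = False
        self.input_limit_kb = 10  # initial set upper limit of the input buffer

    def step(self, input_kb, output_kb):
        if self.paused:
            # Release when the output runs low or the input fills up.
            if output_kb < OUTPUT_LOWER_KB or input_kb > self.input_limit_kb:
                self.paused = False
                return "resume"
            return "hold"
        if output_kb <= OUTPUT_UPPER_KB:
            return "decode"
        if input_kb <= self.input_limit_kb:
            self.paused = True
            # alpha + 10 KB; capping at the capacity is an assumption.
            self.input_limit_kb = min(input_kb + INPUT_MARGIN_KB, INPUT_CAPACITY_KB)
            return "pause"
        # Both buffers are over their limits: keep decoding, but adjust the
        # time-shortening output and speed up decoding instead of pausing.
        return "adjust"


gate = DecodeGate()
levels = [(4, 60), (5, 80), (9, 70), (16, 60), (17, 80), (12, 40)]
trace = [gate.step(input_kb, output_kb) for input_kb, output_kb in levels]
print(trace)
# ['decode', 'pause', 'hold', 'resume', 'adjust', 'decode']
```

In the trace, the pause at an input level of 5 KB raises the input limit to 15 KB, the release is triggered when the input level reaches 16 KB, and the later sample with both buffers over their limits yields `adjust` rather than a second pause.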
[0033]
The time measuring unit 14 measures the time for which decoding is stopped in the stream decoding unit 22. After resuming decoding, the stream decoding unit 22 decodes the PES signal at high speed, with an output period that eliminates the decoding delay caused by the temporary stop. The output period is determined based on the stop time measured by the time measuring unit 14 and the processing capability of the stream decoding unit 22. For example, assume that the output period in normal decoding is about 10.7 [msec], the stop time is β [msec], and the decoding delay is to be eliminated within 100 frames after decoding is resumed. In this case, the output period over those 100 frames (1070 [msec] in normal decoding) is 10.7 − β/100 [msec].
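The arithmetic of this example can be checked with a short sketch. The function name and the error handling are hypothetical; the processing capability of the stream decoding unit 22, which the text says also affects the period, is not modeled here.

```python
NORMAL_PERIOD_MS = 10.7  # output period in normal decoding (example value from the text)


def catch_up_period_ms(stop_ms: float, frames: int = 100,
                       normal_ms: float = NORMAL_PERIOD_MS) -> float:
    """Output period that absorbs a stop of stop_ms over the given number of frames."""
    period = normal_ms - stop_ms / frames
    if period <= 0:
        # Assumption: a non-positive period means the delay cannot be
        # absorbed within this many frames.
        raise ValueError("stop time too long to absorb in the given frames")
    return period


# A 107 msec stop spread over 100 frames shortens each period by 1.07 msec.
print(catch_up_period_ms(107.0))
```

With this period, the 100 frames take 1070 − β [msec] in total, so the β [msec] delay is recovered by the end of the 100th frame.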
[0034]
As described above, the input buffer 32 is configured to have a capacity larger than that required when the PES signal is continuously decoded. At the same time, the output buffer 36 is configured to have a smaller capacity than that required when continuously generating time-shortening data. Therefore, the increase amount of the input buffer 32 can be sufficiently absorbed by the decrease amount of the output buffer 36, and the memory capacity can be reduced as a whole.
[0035]
FIG. 2 is a functional block diagram showing details of the stream decoding unit in the present embodiment. The stream decoding unit 22 includes an input switching unit 71, a high-speed decoding unit 70, and a normal decoding unit 90. The normal decoding unit 90 is a part that functions during normal decoding, and the high-speed decoding unit 70 is a part that functions during high-speed decoding. The input switching unit 71 switches between normal decoding and high speed decoding by sending the PES signal read from the input buffer 32 to either the high speed decoding unit 70 or the normal decoding unit 90.
[0036]
In the normal decoding unit 90, the normal decoding preprocessing unit 92 performs Huffman decoding processing, inverse quantization processing, scaling processing, MS stereo processing, and intensity stereo processing on the PES signal, thereby obtaining the MDCT coefficients for each audio channel. The MDCT coefficients of the C channel, L channel, R channel, SL channel, and SR channel output from the normal decoding preprocessing unit 92 are subjected to IMDCT (Inverse Modified Discrete Cosine Transform) processing by the C channel IMDCT unit 94, the L channel IMDCT unit 96, the R channel IMDCT unit 98, the SL channel IMDCT unit 100, and the SR channel IMDCT unit 102, respectively, and then to window processing by the C channel window processing unit 104, the L channel window processing unit 106, the R channel window processing unit 108, the SL channel window processing unit 110, and the SR channel window processing unit 112, respectively.
[0037]
The audio data of the C, L, R, SL, and SR channels output from the C channel window processing unit 104, the L channel window processing unit 106, the R channel window processing unit 108, the SL channel window processing unit 110, and the SR channel window processing unit 112, respectively, is downmixed to the desired number of output channels by the normal decoding downmix unit 114 and stored in the output buffer 36.
[0038]
In the high-speed decoding unit 70, the band determination unit 72 determines a frequency band to be decoded based on the output period specified by the time measurement unit 14. For example, when removing a high frequency band, the upper limit frequency is determined.
[0039]
The high-speed decoding preprocessing unit 74 obtains MDCT coefficients for each audio channel by performing various processes on the PES signal limited to the frequency band determined by the band determination unit 72. The processing performed by the high-speed decoding preprocessing unit 74 is Huffman decoding processing, inverse quantization processing, scaling processing, MS stereo processing, and intensity stereo processing as in the normal decoding preprocessing unit 92.
[0040]
The switch unit 76 determines, for each audio channel, the block type of the window function used for the window processing from the MDCT coefficients of that audio channel acquired from the high-speed decoding preprocessing unit 74. The switch unit 76 distributes the subsequent processing according to the determined block type. That is, an audio channel whose window function is the long type is allocated to the long type downmix unit 78, and an audio channel whose window function is the short type is allocated to the short type downmix unit 80. The ratio of long type to short type channels after distribution may be, for example, 2 to 3 or 0 to 5.
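The distribution step can be illustrated as follows. This is a sketch only: the function name is hypothetical, and the block type of each channel is taken as an already-determined input, since the text does not detail how the switch unit 76 derives it.

```python
def distribute_channels(block_types: dict) -> tuple:
    """Split channel names into (long type, short type) lists by block type."""
    long_channels = []
    short_channels = []
    for channel, block_type in block_types.items():
        if block_type == "long":
            long_channels.append(channel)    # goes to the long type downmix
        elif block_type == "short":
            short_channels.append(channel)   # goes to the short type downmix
        else:
            raise ValueError(f"unknown block type: {block_type!r}")
    return long_channels, short_channels


# A 2-to-3 split, as in the example ratio given in the text.
print(distribute_channels(
    {"C": "long", "L": "long", "R": "short", "SL": "short", "SR": "short"}))
```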
[0041]
The long type downmix unit 78 and the short type downmix unit 80 downmix each input audio channel to a single channel. At this time, a plurality of audio channels are added limited to the frequency band determined by the band determining unit 72. In the case of a long type, it adds based on following Formula (1).
[Expression 1]

Here, k is the number of long type audio channels and is in the range of 0 to 5. i is the long type sample number and is in the range of 0 to 1023. MDCT_ch[ch][i] denotes a long type MDCT coefficient. However, when the sample number of the upper limit frequency is UpL[ch], MDCT_ch[ch][m + 1] = 0 (UpL[ch] ≦ m ≦ 1023).
[0042]
In the case of a short type, it adds based on following Formula (2).
[Expression 2]

Here, k is the number of short type audio channels and is in the range of 0 to 5. i is the short type sample number and is in the range of 0 to 127. MDCT_ch[ch][128 × B + i] denotes the MDCT coefficient of the B-th short type window. B is the window number within one frame and is in the range of 0 to 7. However, when the sample number of the upper limit frequency is UpS[ch], MDCT_ch[ch][128 × B + m + 1] = 0 (UpS[ch] ≦ m ≦ 127).
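The band-limited addition described for the long type and the short type can be sketched as below. The expressions themselves are not reproduced in this text, so the sketch assumes a plain, unweighted sum over the channels of one block type, with every coefficient above the per-channel upper limit sample number treated as zero; the function names and the list-based data layout are hypothetical.

```python
def downmix_long(mdct: list, up_l: list) -> list:
    """Sum long type coefficients (1024 per channel) up to each channel's UpL."""
    out = [0.0] * 1024
    for ch, coeffs in enumerate(mdct):
        limit = min(up_l[ch], 1023)
        for i in range(limit + 1):      # samples above the limit count as zero
            out[i] += coeffs[i]
    return out


def downmix_short(mdct: list, up_s: list) -> list:
    """Sum short type coefficients (8 windows x 128 per channel) up to UpS."""
    out = [0.0] * 1024
    for ch, coeffs in enumerate(mdct):
        limit = min(up_s[ch], 127)
        for b in range(8):              # window number B within the frame
            base = 128 * b
            for i in range(limit + 1):
                out[base + i] += coeffs[base + i]
    return out


# Two long type channels of ones; the second is limited to samples 0..3.
mixed = downmix_long([[1.0] * 1024, [1.0] * 1024], [1023, 3])
print(mixed[3], mixed[4])
```

Adding in the MDCT domain before the inverse transform means that only one IMDCT and one window operation per block type are needed, rather than one per channel.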
[0043]
The output of the long type downmix unit 78 is subjected to IMDCT processing by the long type IMDCT unit 82 and window processing by the long type window processing unit 86. The output of the short type downmix unit 80 is subjected to IMDCT processing by the short type IMDCT unit 84 and window processing by the short type window processing unit 88. The outputs of the long type window processing unit 86 and the short type window processing unit 88 are added and stored in the time buffer 34 as audio data of a monaural signal. With the above configuration, the PES signal can be decoded at a desired decoding speed, that is, a speed that can eliminate the delay caused by the stop of decoding.
[0044]
FIG. 3 shows an example of calculating the sample number of the upper limit frequency designated for each audio channel. For example, the long type sample number UpL[ch] is 1023 − (A − β) × 512 × α for the C channel, L channel, and R channel, and 1023 − (A − γ) × 512 × α for the SL channel and SR channel. The short type sample number UpS[ch] is 127 − (A − β) × 64 × α for the C channel, L channel, and R channel, and 127 − (A − γ) × 64 × α for the SL channel and SR channel. Here, α = 0.8, β = 1.5, and γ = 0.5. A is the speed of the decoding process and is in the range of 1.5 to 2. When the calculation result is a negative value, the upper limit frequency is set to zero.
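The calculation in this example can be written out as a short sketch. The function name and channel labels are hypothetical, and rounding the result down to an integer sample number is an assumption; the text only states that negative results are set to zero.

```python
import math

ALPHA, BETA, GAMMA = 0.8, 1.5, 0.5   # constants from the example in the text
FRONT = {"C", "L", "R"}              # channels that use beta
SURROUND = {"SL", "SR"}              # channels that use gamma


def upper_limit_sample(channel: str, speed: float, block: str = "long") -> int:
    """Upper limit sample number UpL (long) or UpS (short) for decoding speed A."""
    if channel in FRONT:
        offset = BETA
    elif channel in SURROUND:
        offset = GAMMA
    else:
        raise ValueError(f"unknown channel: {channel!r}")
    top, half = (1023, 512) if block == "long" else (127, 64)
    value = top - (speed - offset) * half * ALPHA
    # Negative results are set to zero; flooring is an assumption.
    return max(0, math.floor(value))


for ch in ("C", "SL"):
    print(ch, upper_limit_sample(ch, 2.0, "long"), upper_limit_sample(ch, 2.0, "short"))
```

Because γ is smaller than β, the surround channels are cut to a lower upper limit frequency than the front channels at the same decoding speed.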
[0045]
FIG. 4 is a flowchart showing the procedure of time reduction processing by the playback device 10. When the user instructs time reduction playback, decoding by the stream decoding unit 22 is started (S10), and time reduction processing of the resulting audio data is started (S12). While the storage amount of the output buffer 36 is within the upper limit amount, processing continues as it is (S14N); when the upper limit amount is exceeded (S14Y), a stop request is sent to the stream decoding unit 22 (S16). At that time, if the storage amount of the input buffer 32 does not exceed its upper limit amount and the free space is sufficient (S18Y), the stream decoding unit 22 stops decoding (S20), and the time measuring unit 14 starts measuring the stop time (S22).
[0046]
Thereafter, when the storage amount of the input buffer 32 exceeds the upper limit amount and the buffer becomes full (S24Y), the time measuring unit 14 ends the measurement of the stop time (S28), and the stream decoding unit 22 releases the stop and resumes decoding (S30). Even when the storage amount of the input buffer 32 does not exceed the upper limit amount (S24N), if the storage amount of the output buffer 36 falls below the lower limit amount and the buffer becomes empty (S26Y), the measurement of the stop time is likewise ended (S28) and decoding is resumed (S30). At this time, high-speed decoding is executed at a speed corresponding to the stop time (S32). As long as the storage amount of the output buffer 36 does not fall below the lower limit amount (S26N), decoding remains stopped.
[0047]
On the other hand, when the stop request is sent in S16, if the storage amount of the input buffer 32 exceeds the upper limit amount and the free space is not sufficient (S18N), the processing from S20 to S32 is skipped; the time reduction processing unit 24 adjusts the output amount of the time reduction processing (S36), and the stream decoding unit 22 performs high-speed decoding at a predetermined speed until the storage amount of the input buffer 32 falls within the upper limit amount (S40). The processing from S14 to S32 is repeated while time reduction playback continues (S34Y).
[0048]
FIG. 5 is a flowchart showing details of the high-speed decoding process in S32 of FIG. When the stop time is 0 (S50N), the processing of S52 to S64 is skipped and high speed decoding is not executed. If the stop time is greater than 0 (S50Y), the frequency band to be decoded is determined according to the stop time (S52), and pre-processing limited to that frequency band is executed (S54). The block type of the window function is determined for each audio channel (S56), and subsequent processing is assigned to each block type (S58). The frequency band is limited for each block type and downmixed (S60), IMDCT processing is performed for each block type (S62), and window processing is performed for each block type (S64).
[0049]
(Second Embodiment)
In the present embodiment, the configuration of the stream decoding unit 22 is further simplified. FIG. 6 is a functional block diagram showing details of the stream decoding unit in the present embodiment. The stream decoding unit 22 of this embodiment is different from that of the first embodiment in that the normal decoding functional block and the high-speed decoding functional block are shared. For example, in the first embodiment, the normal decoding pre-processing unit 92 and the high-speed decoding pre-processing unit 74 are provided as functional blocks for pre-processing, but in the present embodiment, they are shared as the pre-processing unit 120. The preprocessing unit 120 performs the preprocessing by limiting to the frequency band determined by the band determination unit 72, but the normal processing may be performed without the limitation of the frequency band during normal decoding.
[0050]
The first to fifth IMDCT units 122, 124, 126, 128, and 130 function, during normal decoding, as the C channel IMDCT unit 94, the L channel IMDCT unit 96, the R channel IMDCT unit 98, the SL channel IMDCT unit 100, and the SR channel IMDCT unit 102 of the first embodiment, respectively. During high-speed decoding, the first IMDCT unit 122 and the second IMDCT unit 124 function as the long type IMDCT unit 82 and the short type IMDCT unit 84 of the first embodiment, respectively.
[0051]
The first to fifth window processing units 132, 134, 136, 138, and 140 function, during normal decoding, as the C channel window processing unit 104, the L channel window processing unit 106, the R channel window processing unit 108, the SL channel window processing unit 110, and the SR channel window processing unit 112 of the first embodiment, respectively. During high-speed decoding, the first window processing unit 132 and the second window processing unit 134 function as the long type window processing unit 86 and the short type window processing unit 88 of the first embodiment, respectively.
[0052]
The first downmix unit 142 functions as the normal decoding downmix unit 114 of the first embodiment during normal decoding, and as the long type downmix unit 78 of the first embodiment during high-speed decoding. The second downmix unit 144 functions as the short type downmix unit 80 of the first embodiment during high-speed decoding. The switch unit 146 functions as the switch unit 76 of the first embodiment during high-speed decoding.
[0053]
In the above configuration, at the time of high-speed decoding, the switch unit 146 distributes the output of the preprocessing by the preprocessing unit 120 to the first downmix unit 142 and the second downmix unit 144 according to the window function block type. The first IMDCT unit 122 and the second IMDCT unit 124 perform IMDCT processing on the respective outputs, and the first window processing unit 132 and the second window processing unit 134 perform window processing on the respective outputs. These outputs are added and stored in the time buffer 34. Thereby, the same operation as that of the stream decoding unit 22 of the first embodiment can be realized with a simpler configuration.
[0054]
The present invention has been described above based on the embodiments. These embodiments are illustrative, and those skilled in the art will understand that various modifications are possible in the combination of the components and processes, and that such modifications also fall within the scope of the present invention. Modifications are described below.
[0055]
In the embodiment, the present invention is realized as a playback device that switches between normal playback and short-time playback. In a modification, it may be realized as a device dedicated to short-time playback. Further, the stream decoding unit 22 may be realized as an independent decoding device, and the portion of the high speed decoding unit 70 may be realized as an independent high speed decoding device.
[0056]
[Effect of the Invention]
According to the present invention, time reduction playback can be realized with a simpler configuration.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing a configuration of a playback device in a first embodiment.
FIG. 2 is a functional block diagram showing details of a stream decoding unit in the first embodiment.
FIG. 3 is a diagram illustrating a calculation example of a sample number of an upper limit frequency designated for each audio channel.
FIG. 4 is a flowchart showing a procedure of time reduction processing by the playback device.
FIG. 5 is a flowchart showing details of high-speed decoding processing.
FIG. 6 is a functional block diagram showing details of a stream decoding unit in the second embodiment.
[Explanation of symbols]
22 stream decoding unit, 24 time reduction processing unit, 32 input buffer, 36 output buffer, 42 input monitoring unit, 44 output monitoring unit, 72 band determination unit, 74 high-speed decoding preprocessing unit, 76 switch unit, 78 long type downmix unit, 80 short type downmix unit, 82 long type IMDCT unit, 84 short type IMDCT unit, 86 long type window processing unit, 88 short type window processing unit.

Claims

A simple playback device comprising:
a stream decoding unit for decoding an encoded audio stream;
a time reduction processing unit for generating time-shortened data by removing silent portions from the audio data generated by the decoding; and
an output buffer for temporarily storing the time-shortened data until it is output,
wherein the stream decoding unit pauses the decoding when the storage amount of the output buffer exceeds a set upper limit amount, and,
when releasing the pause and resuming the decoding, determines a frequency band to be decoded based on the pause time and decodes the audio stream into audio data limited to the determined frequency band.