JP4441989B2 - Encoding apparatus and encoding method

Info

Publication number: JP4441989B2
Application number: JP2000159931A
Authority: JP
Inventors: 智弘小谷田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-05-30
Filing date: 2000-05-30
Publication date: 2010-03-31
Anticipated expiration: 2020-05-30
Also published as: JP2001337699A

Abstract

PROBLEM TO BE SOLVED: To make it easy to judge whether continuity between files should be taken into account when several files are encoded or decoded. SOLUTION: A search condition is set in step S51 and a variable is initialized in step S52. When step S53 determines that there is a file to be searched, the file is opened in step S55 and its leading-part data is read in step S56. In step S57, the read data is compared with the set threshold condition. If the read data exceeds the threshold, a warning is issued in step S58. In steps S59, S60 and S61, the same processing is applied to the trailing-part data. The file is closed in step S62, i is incremented in step S63, and processing returns to step S53. Because only the files that received a warning need to be auditioned, it is easy to judge whether continuity should be taken into account.

Description

【０００１】
【発明の属する技術分野】
この発明は、オーディオデータ等のディジタル信号に係る符号化装置および符号化方法に関する。
【０００２】
【従来の技術】
オーディオ信号の高能率符号化に係る従来技術として、例えば、時間領域のオーディオ信号を単位時間毎にブロック化し、ブロック毎の時間軸上の信号を周波数軸上の信号に変換（直交変換）して複数の周波数帯域に分割し、各帯域毎に符号化するブロック化周波数帯域分割方式の一つである変換符号化方法が知られている。また、時間領域のオーディオ信号を単位時間毎にブロック化せずに、複数の周波数帯域に分割して符号化する非ブロック化周波数帯域分割方法の一つである帯域分割符号化（サブ・バンド・コーディング（ＳＢＣ：Sub Band Coding ））方法が知られている。
【０００３】
さらに、上述の帯域分割符号化と変換符号化とを組み合わせてなる高能率符号化方法も知られている。この方法では、例えば、帯域分割符号化方式によって分割した各帯域毎の信号を、変換符号化方式によって周波数領域の信号に直交変換し、直交変換された各帯域毎に符号化が施される。
【０００４】
ここで、上述した帯域分割符号化方式に使用される帯域分割用フィルタとしては、例えばＱＭＦ(Quadrature Mirror filter)等のフィルタがある。ＱＭＦについては、例えば、 R.E.Crochiere Digital coding of speech in subbands Bell Syst.Tech. J. Vol.55, No.8(1976)に述べられている。また、ICASSP 83, BOSTON Polyphase Quadrature filters-A new subband coding technique JosephH. Rothweiler には、ポリフェーズクワドラチャフィルタ(Polyphase Quadrature filter) などの等バンド幅のフィルタ分割手法および装置が述べられている。
【０００５】
また、直交変換としては、例えば、入力オーディオ信号を所定単位時間（フレーム）でブロック化し、該ブロック毎に高速フーリエ変換（ＦＦＴ）やコサイン変換（ＤＣＴ）、モディファイドＤＣＴ変換（ＭＤＣＴ）等を行うことで時間軸を周波数軸に変換するような方法が知られている。ＭＤＣＴについては、例えば、ICASSP 1987 Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation J.P.Princen A.B.Bradley Univ. of Surrey Royal Melbourne Inst.of Tech. に述べられている。
【０００６】
一方、周波数帯域分割された各周波数成分を量子化する際に、人間の聴覚特性を考慮した周波数分割幅を用いる符号化方法が知られている。すなわち、臨界帯域（クリティカルバンド）と呼ばれる、帯域幅が高域程広くなるような帯域幅が広く用いられている。このような臨界帯域を用いてオーディオ信号を複数バンド（例えば２５バンド）の帯域に分割することがある。このような帯域分割方法によれば、各帯域毎のデータを符号化する際に、各帯域毎に所定のビット配分、或いは各帯域毎に適応的なビット配分による符号化が行われる。例えば、ＭＤＣＴ処理によって生成されるＭＤＣＴ係数データを上述したようなビット配分によって符号化する場合には、各ブロック毎に対応して生成される各帯域毎のＭＤＣＴ係数データに対して適応的なビット数が配分され、そのようなビット数配分の下で符号化が行われる。
【０００７】
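Known publications on such bit allocation methods and on apparatus implementing them include, for example, the following. First, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977 describes a method of allocating bits on the basis of the magnitude of the signal in each band. In addition, ICASSP 1980, "The critical band coder -- digital encoding of the perceptual requirements of the auditory system", M. A. Krasner, MIT, describes a method that uses auditory masking to obtain the signal-to-noise ratio required in each band and performs fixed bit allocation.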
【０００８】
また、各帯域毎の符号化に際しては、各帯域毎に正規化を行って量子化を行うことにより、より効率的な符号化を実現するいわゆるブロックフローティング処理が行われている。例えば、ＭＤＣＴ処理によって生成されるＭＤＣＴ係数データを符号化する際には、各帯域毎に上述のＭＤＣＴ係数の絶対値の最大値等に対応した正規化を行った上で量子化を行うことにより、より効率的な符号化が行われる。正規化処理は例えば以下のように行われる。すなわち、予め番号付けされた複数種類の値を用意し、それら複数種類の値の内で各ブロックについての正規化に係るものを所定の演算処理によって決定し、決定した値に付されている番号を正規化情報として使用する。複数種類の値に対応する番号付けは、例えば、番号の１の増減に、オーディオレベルの２ｄＢの増減が対応する等の一定の関係の下で行われる。
【０００９】
上述したような方法で生成される高能率符号化データは、次のようにして復号化される。まず、各帯域毎のビット配分情報、正規化情報等を参照して、符号化データに基づいてＭＤＣＴ係数データを生成する処理がなされる。このＭＤＣＴ係数データに基づいていわゆる逆直交変換（ＩＭＤＣＴ）が行われることにより、時間領域のデータが生成される。高能率符号化の過程で帯域分割用フィルタによる帯域分割が行なわれていた場合は、帯域合成フィルタを用いて時間領域のデータを合成する処理がさらになされる。
【００１０】
上述した符号化に用いられている直交変換のＭＤＣＴ処理、並びに復号化に用いられている、逆直交変換のＩＭＤＣＴ処理では、処理を行うフレーム間の不連続性を防止するために、いわゆるオーバーラップ処理が利用されている。ある楽曲を符号化し、また、復号化する時には、当該楽曲の始点および終点については、このオーバーラップおよび変換サイズを考慮した適合処理が行われる。
【００１１】
上述した方法での高能率符号化は、基本的には楽曲単位で行われるが、大量の楽曲を高能率符号化処理するような場合、各楽曲の処理の終了毎に、ユーザが次の楽曲の処理の開始を促すのは非効率的であるため、通常、あらかじめ所望の楽曲を選択して、自動的に選択された楽曲が高能率符号化されるような処理が行われる。より具体的には、電子音楽配信の配信用サーバでは、ハードディスクに大量のＰＣＭファイルを格納し、コンピュータソフトウェア処理によって高速に高能率符号化の処理がなされる。
【００１２】
配信用サーバのように、大量の楽曲を自動的に高能率符号化処理する場合、楽曲単位で高能率符号化が行われるので、各楽曲に対して、始点および終点における、直交変換におけるオーバーラップおよび変換サイズを考慮した適合処理を行うことになる。楽曲によっては、他の楽曲との相関関係がある場合、例えば当該楽曲の始点が他の楽曲の終点との連続性を保つような場合がある。具体例としては、ライブ版、リミックス、ダンス系等の音楽では、楽曲同士が無音期間を介することなくつながっていることがある。このような場合でも、上述したような始点および終点における適合処理を楽曲毎に独立して行うと、高能率符号化処理後のデータは、楽曲間の連続性を失ってしまう問題がある。復号化においても同様の問題が発生する。楽曲間に連続性があるものを処理する場合には、始点および終点における適合処理を行わずに、楽曲間データを連続的に処理することが望ましい。
【００１３】
【発明が解決しようとする課題】
ところで、上述したような連続的な符号化を行うか否かの判断は、基本的には、実際に処理を行いたい各楽曲の先頭部分と最終部分について試聴確認することによって行われることになる。しかしながら、大量の楽曲を処理しなければならないような場合には、それらの全ての楽曲についての試聴確認を行う必要が生じ、多大な時間を費やすこととなる。また、長時間の確認作業に伴い、判断能力が低下し、判断の正確性が低下する可能性もある。
【００１４】
したがって、この発明の目的は、処理を行うべく選択された楽曲の、先頭部分と最終部分のデータを自動的に分析し、符号化または復号化に伴う連続性を考慮するか否かの判断を、容易、且つ正確に行い、作業効率を向上させることができる符号化装置および符号化方法を提供することにある。
【００１５】
【課題を解決するための手段】
請求項１の発明は、複数のディジタルオーディオファイルに対して所定長毎にブロック化を施し、ブロック処理されたディジタルオーディオファイルに対して圧縮処理を施す符号化装置であって、
複数のディジタルオーディオファイルの中から圧縮処理を施す２以上のディジタルオーディオファイルを、圧縮処理を施す順序で選択する第１の選択手段と、
第１の選択手段にて選択された２以上のディジタルオーディオファイルのうち、圧縮処理を施す順序で隣接するディジタルオーディオファイルの前方に位置するディジタルオーディオファイルの終端部近傍のブロックと、第１の選択手段にて選択された隣接するディジタルオーディオファイルの後方に位置するディジタルオーディオファイルの始端部近傍のブロックと、隣接する２つのディジタルオーディオファイルに跨がっているブロックとに基づいて符号化処理を施す第１の符号化手段と、
第１の選択手段にて選択された２以上のディジタルオーディオファイルのうち、圧縮処理を施す順序で隣接するディジタルオーディオファイルの前方に位置するディジタルオーディオファイルの終端部近傍のブロックまたは隣接するディジタルオーディオファイルの後方に位置するディジタルオーディオファイルの始端部近傍のブロックと、隣接する２つのディジタルオーディオファイルに跨がっているブロックとに基づいて符号化処理を施す第２の符号化手段と、
第１の選択手段にて選択された２以上のディジタルオーディオファイルの始点、終点のデータを分析する分析手段と、
分析手段の分析結果を参照して、第１の符号化手段における符号化処理と第２の符号化手段における符号化処理との一方を選択する第２の選択手段とを備える符号化装置である。
【００１７】
請求項５の発明は、複数のディジタルオーディオファイルに対して所定長毎にブロック化を施し、ブロック処理されたディジタルオーディオファイルに対して圧縮処理を施す符号化方法であって、
複数のディジタルオーディオファイルの中から圧縮処理を施す２以上のディジタルオーディオファイルを、圧縮処理を施す順序で選択する第１の選択ステップと、
第１の選択ステップにて選択された２以上のディジタルオーディオファイルのうち、圧縮処理を施す順序で隣接するディジタルオーディオファイルの前方に位置するディジタルオーディオファイルの終端部近傍のブロックと、第１の選択ステップにて選択された隣接するディジタルオーディオファイルの後方に位置するディジタルオーディオファイルの始端部近傍のブロックと、隣接する２つのディジタルオーディオファイルに跨がっているブロックとに基づいて符号化処理を施す第１の符号化ステップと、
第１の選択ステップにて選択された２以上のディジタルオーディオファイルのうち、圧縮処理を施す順序で隣接するディジタルオーディオファイルの前方に位置するディジタルオーディオファイルの終端部近傍のブロックまたは隣接するディジタルオーディオファイルの後方に位置するディジタルオーディオファイルの始端部近傍のブロックと、隣接する２つのディジタルオーディオファイルに跨がっているブロックとに基づいて符号化処理を施す第２の符号化ステップと、
第１の選択ステップにて選択された２以上のディジタルオーディオファイルの始点、終点のデータを分析する分析ステップと、
分析ステップの分析結果を参照して、第１の符号化ステップにおける符号化処理と第２の符号化ステップにおける符号化処理との一方を選択する第２の選択ステップとを備える符号化方法である。
【００２０】
以上のような発明によれば、処理対象の複数ファイルの始点、終点のデータを分析することによって、複数ファイルの連続性に関する分析を行うことができる。分析結果を参照して、試聴を行うファイルを限定することが可能となる。それによって、処理の効率を向上できる。
【００２１】
【発明の実施の形態】
この発明の一実施形態について、以下、図面を参照して説明する。一実施形態では、オーディオＰＣＭ信号等の入力ディジタル信号を、帯域分割符号化（ＳＢＣ）、適応変換符号化（ＡＴＣ）および適応ビット割当の技術を用いて高能率符号化する。この高能率符号化技術について、図１を参照して説明する。
【００２２】
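In the high-efficiency encoding apparatus shown in FIG. 1, the input digital signal is divided into a plurality of frequency bands, an orthogonal transform is performed on each band, and the resulting frequency-axis spectral data is encoded with adaptive bit allocation. In the low range the allocation is made per so-called critical band, which reflects the human auditory characteristics described later; in the middle and high ranges it is made per band obtained by subdividing the critical bands in view of block floating efficiency. Normally these blocks are the blocks in which quantization noise is generated. Furthermore, in this embodiment, the block size (block length) is changed adaptively according to the input signal before the orthogonal transform.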
【００２３】
例えばサンプリング周波数が４４．１ｋＨｚの場合、入力端子１００を介して０〜２２ｋＨｚのオーディオＰＣＭ信号がＱＭＦフィルタ等の帯域分割フィルタ１０１に供給される。帯域分割フィルタ１０１は、供給される信号を０〜１１ｋＨｚ帯域と１１ｋＨｚ〜２２ｋＨｚ帯域とに分割する。１１〜２２ｋＨｚ帯域の信号はＭＤＣＴ(Modified Discrete Cosine Transform)回路１０３およびブロック決定回路１０９、１１０、１１１に供給される。
【００２４】
また、０ｋＨｚ〜１１ｋＨｚ帯域の信号は帯域分割フィルタ１０２に供給される。帯域分割フィルタ１０２は、供給される信号を５. ５ｋＨｚ〜１１ｋＨｚ帯域と０〜５. ５ｋＨｚ帯域とに分割する。５．５〜１１ｋＨｚ帯域の信号はＭＤＣＴ回路１０４およびブロック決定回路１０９、１１０、１１１に供給される。また、０〜５. ５ｋＨｚ帯域の信号は、ＭＤＣＴ回路１０５およびブロック決定回路１０９、１１０、１１１に供給される。帯域分割フィルタ１０１、１０２は、例えばＱＭＦフィルタ等を用いて構成することができる。ブロック決定回路１０９は、供給される信号に基づいてブロックサイズを決定し、決定したブロックサイズを示す情報をＭＤＣＴ回路１０３および出力端子１１３に供給する。
【００２５】
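The block determination circuit 110 determines a block size on the basis of the supplied signal, and supplies information indicating the determined block size to the MDCT circuit 104 and the output terminal 115. The block determination circuit 111 determines a block size on the basis of the supplied signal, and supplies information indicating the determined block size to the MDCT circuit 105 and the output terminal 117. The block determination circuits 109, 110, and 111 set the block size (block length) adaptively according to the temporal characteristics and frequency distribution of the supplied signal.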
【００２６】
ＭＤＣＴ回路１０３、１０４、１０５は、供給される信号に基づいてＭＤＣＴ処理を行い、ＭＤＣＴ係数データまたは周波数軸上のスペクトルデータを生成する。ＭＤＣＴ回路１０３が生成する高域のＭＤＣＴ係数データまたは周波数軸上のスペクトルデータは、ブロックフローティングの有効性を考慮して臨界帯域幅を細分化する処理を施された後に適応ビット割当符号化回路１０６およびビット割当算出回路１１８に供給される。ＭＤＣＴ回路１０４が生成する中域のＭＤＣＴ係数データまたは周波数軸上のスペクトルデータは、ブロックフローティングの有効性を考慮して臨界帯域幅を細分化する処理を施された後に適応ビット割当符号化回路１０７およびビット割当算出回路１１８に供給される。
【００２７】
ＭＤＣＴ回路１０５が生成する低域のＭＤＣＴ係数データまたは周波数軸上のスペクトルデータは、臨界帯域（クリティカルバンド）毎にまとめる処理を施された後に適応ビット割当符号化回路１０８およびビット割当算出回路１１８に供給される。ここで、臨界帯域とは、人間の聴覚特性を考慮して分割された周波数帯域であり、ある純音の周波数近傍の同じ強さの狭帯域バンドノイズによって当該純音がマスクされる時に、当該狭帯域バンドノイズの帯域のことである。臨界帯域は、高域ほど帯域幅が広くなるという性質がある。０〜２２ｋＨｚの全周波数帯域は、例えば２５のクリティカルバンドに分割されている。
【００２８】
ビット割当算出回路１１８は、供給されるＭＤＣＴ係数データまたは周波数軸上のスペクトルデータ、およびブロックサイズ情報に基づいて、後述するようなマスキング効果等を考慮して上述の臨界帯域およびブロックフローティングを考慮した各分割帯域毎のマスキング量、エネルギーおよび或いはピーク値等を計算し、計算結果に基づいて各帯域毎にブロックフロ−ティングの状態を示すスケ−ルファクタ、および割当てビット数を計算する。計算された割当てビット数は、適応ビット割当符号化回路１０６、１０７、１０８に供給される。以下の説明において、ビット割当の単位とされる各分割帯域を単位ブロックと表記する。
【００２９】
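The adaptive bit allocation encoding circuit 106 requantizes (normalizes and quantizes) the spectral data or MDCT coefficient data supplied from the MDCT circuit 103, according to the block size information supplied from the block determination circuit 109 and the allocated bit number and scale factor information (the normalization information) supplied from the bit allocation calculation circuit 118. This processing generates high-efficiency encoded data, which is supplied to the arithmetic unit 120. The adaptive bit allocation encoding circuit 107 requantizes the spectral data or MDCT coefficient data supplied from the MDCT circuit 104, according to the block size information supplied from the block determination circuit 110 and the allocated bit number and scale factor information supplied from the bit allocation calculation circuit 118. This processing generates high-efficiency encoded data, which is supplied to the arithmetic unit 121.
【００３０】
The adaptive bit allocation encoding circuit 108 requantizes the spectral data or MDCT coefficient data supplied from the MDCT circuit 105, according to the block size information supplied from the block determination circuit 111 and the allocated bit number and scale factor information supplied from the bit allocation calculation circuit 118. This processing generates high-efficiency encoded data, which is supplied to the arithmetic unit 122. The normalization information changing circuit 119 and the arithmetic units 120, 121, and 122 are described later.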
【００３１】
図２に、ＭＤＣＴ回路１０３，１０４，１０５に供給される、各帯域毎のデータの例を示す。ブロック決定回路１０９，１１０，１１１の動作により、帯域分割フィルタ１０１、１０２から出力される計３個のデータについて、各帯域毎について独立に直交変換ブロックサイズを設定することができると共に、信号の時間特性、周波数分布等により時間分解能を切り換えることが可能とされている。すなわち、信号が時間的に準定常的である場合には、図２Ａに示すような、直交変換ブロックサイズを例えば１１．６ｍｓと大きくするＬｏｎｇＭｏｄｅが用いられる。
【００３２】
一方、信号が非定常的である場合には、直交変換ブロックサイズをＬｏｎｇＭｏｄｅ時に比べて２分割または４分割とするモードが用いられる。より具体的には、全てを４分割して例えば２．９ｍｓとするＳｈｏｒｔＭｏｄｅ（図２Ｂ参照）、或いは、一部を２分割して例えば５．８ｍｓとし、他の一部を４分割して例えば２．９ｍｓとするＭｉｄｄｌｅＭｏｄｅ−ａ（図２Ｃ参照）または、ＭｉｄｄｌｅＭｏｄｅ−ｂ（図２Ｄ参照）が用いられる。このように時間分解能を様々に設定することにより、実際の複雑な入力信号に適応できるようになされる。
【００３３】
回路規模等に係る制約が小さい場合には、直交変換ブロックサイズの分割をさらに複雑なものとすることにより、実際の入力信号をより適切に処理できることは明白である。上述したようなブロックサイズは、ブロック決定回路１０９，１１０，１１１によって決定され、決定されたブロックサイズの情報はＭＤＣＴ回路１０３，１０４，１０５およびビット割り当て算出回路１１８に供給されると共に、出力端子１１３、１１５、１１７を介して出力される。
【００３４】
次に、図３を参照して、ビット割当て算出回路１１８について詳細に説明する。入力端子３０１を介して、ＭＤＣＴ回路１０３、１０４、１０５からの周波数軸上のスペクトルデータまたはＭＤＣＴ係数、およびブロック決定回路１０９、１１０、１１１からのブロックサイズ情報がエネルギー算出回路３０２に供給される。エネルギー算出回路３０２は、例えば当該単位ブロック内での各振幅値の総和を計算する等の方法で単位ブロック毎のエネルギーを計算する。なお、エネルギー算出回路３０２の代わりに振幅値のピーク値、平均値等を計算する構成を設け、振幅値のピーク値、平均値等の計算値に基づいてビット割当て処理を行うようしても良い。
【００３５】
エネルギー算出回路３０２の出力の一例を図４に示す。図４では、各バンド毎の総和値のスペクトルＳＢを、先端に丸を付した縦方向の線分によって示す。ここで、横軸が周波数、縦軸が信号強度をそれぞれ示す。なお、図示が煩雑となるのを避けるため、図４では、単位ブロックによる分割数を１２ブロック（Ｂ１〜Ｂ１２）とし、Ｂ１２のスペクトルのみに符号「ＳＢ」を付した。
【００３６】
また、エネルギー算出回路３０２は、単位ブロックのブロックフローティングの状態を示す正規化情報であるスケールファクタ値を決定する処理を行う。具体的には、例えばあらかじめスケールファクタ値の候補として幾つかの正の値を用意し、それらの内、単位ブロック内のスペクトルデータ又はＭＤＣＴ係数の絶対値の最大値以上の値をとるものの中で最小のものを当該単位ブロックのスケールファクタ値として採用する。スケールファクタ値の候補は、実際の値と対応した形で、例えば数ビットを用いて番号付けを行ない、その番号を図示しないＲＯＭ（Read Only Memory) 等に記憶させておけば良い。この際に、スケールファクタ値の候補は、番号順に例えば２ｄＢの間隔での値を持つように規定しておく。ある単位ブロックについて採用されたスケールファクタ値に付される番号がサブ情報として用いられ、当該単位ブロックについてのスケールファクタ情報とされる。
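The selection rule in this paragraph (take the smallest candidate that is at least the largest absolute coefficient in the unit block, and record its number) can be sketched as follows. This is only an illustrative sketch: the candidate table, its size of 64 entries, and the top value of 1.0 are assumptions, since the text says only that the candidates are positive, numbered, and spaced by about 2 dB.

```python
# Hypothetical candidate table: 64 positive values spaced 2 dB apart, ascending.
# The real table is fixed by the coding format and is not given in the text.
STEP_DB = 2.0
NUM_CANDIDATES = 64
TOP_VALUE = 1.0  # assumed largest candidate

CANDIDATES = [
    TOP_VALUE * 10 ** (-STEP_DB * (NUM_CANDIDATES - 1 - k) / 20.0)
    for k in range(NUM_CANDIDATES)
]

def select_scale_factor(coeffs):
    """Return (number, value) of the smallest candidate >= max |coefficient|."""
    peak = max(abs(c) for c in coeffs)
    for number, value in enumerate(CANDIDATES):
        if value >= peak:
            return number, value
    # Peak exceeds every candidate: fall back to the largest one (assumption).
    return NUM_CANDIDATES - 1, CANDIDATES[-1]
```

The returned number plays the role of the scale factor information stored as sub information; the value itself is never stored.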
【００３７】
エネルギー算出回路３０２の出力、すなわち、スペクトルＳＢの各値は、畳込みフイルタ回路３０３に送られる。畳込みフイルタ回路３０３は、例えば、入力データを順次遅延させる複数の遅延素子と、これら遅延素子からの出力にフイルタ係数（重み付け関数）を乗算する複数の乗算器と、各乗算器出力の総和をとる総和加算器とから構成することができる。畳込みフイルタ回路３０３は、スペクトルＳＢのマスキングにおける影響を考慮するための、スペクトルＳＢに所定の重み付け関数を掛けて加算するような畳込み（コンボリユーション）処理を施す。この畳込み処理により、図４中で点線で示す部分の総和が計算される。
【００３８】
図３に戻り、畳込みフイルタ回路３０３の出力は演算器３０４に供給される。演算器３０４には、さらに、許容関数（マスキングレベルを表現する関数）が（ｎ−ａｉ）関数発生回路３０５から供給される。演算器３０４は、許容関数に従って、畳込みフイルタ回路３０３によって畳み込まれた領域における、許容可能なノイズレベルに対応するレベルαを計算する。ここで、許容可能なノイズレベル（許容ノイズレベル）に対応するレベルαとは、後述するように、逆コンボリユーション処理を行うことによって、クリテイカルバンドの各バンド毎の許容ノイズレベルとなるようなレベルである。レベルαの算出値は、許容関数を増減させることによって制御される。
【００３９】
すなわち、許容ノイズレベルに対応するレベルαは、クリテイカルバンドのバンドの低域から順に与えられる番号をｉとすると、次の式（１）で求めることができる。
【００４０】
$$\alpha = S - (n - a\,i) \qquad (1)$$
【００４１】
式（１）において、ｎ，ａは定数でａ＞０、Ｓは畳込み処理されたスペクトルの強度であり、式（１）中（ｎ−ａｉ）が許容関数となる。一例としてｎ＝３８，ａ＝１とすることができる。
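As a small worked illustration of equation (1), the sketch below evaluates the level α for a given convolved spectrum intensity S and band number i, using the example constants n = 38 and a = 1 from the text. The function name and the choice to count bands from 0 are assumptions; the text does not say whether numbering starts at 0 or 1.

```python
def allowed_noise_level(s, i, n=38.0, a=1.0):
    """Level alpha = S - (n - a*i) for band number i, counted from the low end."""
    if a <= 0:
        raise ValueError("a must be positive")
    return s - (n - a * i)
```

With these constants the allowance function (n − a·i) shrinks as the band number rises, so α moves closer to S in higher bands.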
【００４２】
演算器３０４によって計算されるレベルαが割算器３０６に伝送される。割算器３０６は、レベルαを逆コンボリユーションする処理を行い、その結果としてレベルαからマスキングスペクトルを生成する。このマスキングスペクトルが許容ノイズスペクトルとなる。なお、逆コンボリユーション処理を行う場合、一般的には複雑な演算が行われる必要があるが、この発明の一実施形態では、簡略化した割算器３０６を用いて逆コンボリユーションを行っている。マスキングスペクトルは、合成回路３０７に供給される。合成回路３０７には、さらに、後述するような最小可聴カーブＲＣを示すデータが最小可聴カーブ発生回路３１２から供給される。
【００４３】
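The synthesis circuit 307 combines the masking spectrum output by the divider 306 with the data of the minimum audible curve RC to generate a masking spectrum. The generated masking spectrum is supplied to the subtractor 308. The subtractor 308 is also supplied with the output of the energy calculation circuit 302, that is, the spectrum SB of each band, after its timing has been adjusted by the delay circuit 309. The subtractor 308 performs subtraction based on the masking spectrum and the spectrum SB.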
【００４４】
かかる処理の結果として、ブロック毎のスペクトルＳＢの、マスキングスペクトルのレベル以下の部分がマスキングされる。図５に、マスキングの一例を示す。スペクトルＳＢにおける、マスキングスペクトルのレベル（ＭＳと表記する）以下の部分がマスキングされていることがわかる。なお、図示が煩雑となるのを避けるため、図５中ではＢ１２においてのみ、スペクトルに符号「ＳＢ」を付すと共にマスキングスペクトルのレベルに符号「ＭＳ」を付した。
【００４５】
雑音絶対レベルが最小可聴カーブＲＣ以下ならばその雑音は人間には聞こえないことになる。最小可聴カーブは、コーデイングが同じであっても例えば再生時の再生ボリユームの違いによって異なる。但し、実際のデジタルシステムでは、例えば１６ビットダイナミックレンジへの音楽データの入り方にはさほど違いがないので、例えば４ｋＨｚ付近の最も耳に聞こえやすい周波数帯域の量子化雑音が聞こえないとすれば、他の周波数帯域ではこの最小可聴カーブのレベル以下の量子化雑音は聞こえないと考えられる。
【００４６】
従って、例えばシステムの持つワードレングスの４ｋＨｚ付近の雑音が聞こえないような使い方をする場合、最小可聴カーブＲＣとマスキングスペクトルＭＳとを合成することによって許容ノイズレベルを得るようにすれば、この場合の許容ノイズレベルは図６中の斜線で示す部分となる。なお、ここでは、最小可聴カーブの４ｋＨｚのレベルを例えば２０ビット相当の最低レベルに合わせている。図６では、各ブロック内の水平方向の実線としてＳＢ、各ブロック内の水平方向の点線としてＭＳをそれぞれ示した。但し、図示が煩雑となるのを避けるため、図６ではＢ１２のスペクトルのみについて符号「ＳＢ」、「ＭＳ」を付した。また、図６では、信号スペクトルＳＳを一点鎖線で示した。
【００４７】
図３に戻り、減算器３０８の出力は許容雑音補正回路３１０に供給される。許容雑音補正回路３１０は、例えば等ラウドネスカーブのデータ等に基づいて、減算器３０８の出力における許容雑音レベルを補正する。すなわち、許容雑音補正回路３１０は、上述したマスキング、聴覚特性等の様々なパラメータに基いて、各単位ブロックに対する割当ビットを算出する。許容雑音補正回路３１０の出力は、出力端子３１１を介して、ビット割当算出回路１１８の最終的な出力データとして出力される。ここで、等ラウドネスカーブとは、人間の聴覚特性に関する特性曲線であり、例えば１ｋＨｚの純音と同じ大きさに聞こえる各周波数での音の音圧を求めて曲線で結んだもので、ラウドネスの等感度曲線とも呼ばれる。
【００４８】
また、この等ラウドネスカーブは、図６に示した最小可聴カーブＲＣと同じ曲線を描く。この等ラウドネスカーブにおいては、例えば４ｋＨｚ付近では１ｋＨｚのところより音圧が８〜１０ｄＢ下がっても１ｋＨｚと同じ大きさに聞こえ、逆に、５０Ｈｚ付近では１ｋＨｚでの音圧よりも約１５ｄＢ高くないと同じ大きさに聞こえない。このため、最小可聴カーブＲＣのレベルを越える雑音（許容ノイズレベル）が等ラウドネスカーブに沿った周波数特性を持つようにすれば、その雑音が人間に聞こえないようにすることができる。
【００４９】
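Correcting the allowable noise level with the equal-loudness curve taken into account is thus consistent with human auditory characteristics. As described above, the bit allocation calculation circuit 118 yields, as main information, data obtained by processing the orthogonal-transform output spectrum with the sub information and, as sub information, the scale factor indicating the block floating state and the word length. On the basis of this information, the adaptive bit allocation encoding circuits 106, 107, and 108 in FIG. 1 perform requantization and generate high-efficiency encoded data conforming to the encoding format.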
【００５０】
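Returning to FIG. 1, the normalization information changing circuit 119 is described. As noted above, manipulating the scale factor information determined by the energy calculation circuit 302 allows the level to be adjusted in steps of, for example, 2 dB. The normalization information changing circuit 119 generates values for changing the scale factor information and supplies them to the arithmetic units 120, 121, and 122. The arithmetic units 120, 121, and 122 add the value supplied from the normalization information changing circuit 119 to the scale factor information in the encoded data supplied from the adaptive bit allocation encoding circuits 106, 107, and 108, respectively. When the value output from the normalization information changing circuit 119 is negative, the arithmetic units 120, 121, and 122 act as subtractors. The result of the addition is limited so that it stays within the range of scale factor values defined by the format.
【００５１】
When the normalization information changing circuit 119 outputs the same value for all unit blocks as the value to be added to the scale factor information, level adjustment is performed; if it instead outputs a different value for each unit block, filtering, for example, can be realized. For filtering and similar processing, the normalization information changing circuit 119 outputs pairs consisting of a value to be added to the scale factor information and the number of the unit block whose scale factor information the value is to be added to. This normalization information adjustment can also be performed at decoding time, as described later.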
【００５２】
次に、高能率符号化データの符号化フォーマットについて、図７を参照して説明する。左側に示した数値０，１，２，‥‥，２１１はバイト数を表しており、この一例では２１２バイトを１フレームの単位としている。先頭の０バイト目の位置には、図１中のブロック決定回路１０９、１１０、１１１において決定された、各帯域のブロックサイズ情報を記録する。次の１バイト目の位置には、記録する単位ブロックの個数の情報を記録する。例えば高域側になる程、ビット割当算出回路１１８によってビット割当が０とされて記録が不必要となる場合が多いため、このような状況に対応するように単位ブロックの個数を設定することにより、聴感上の影響が大きい中低域に多くのビットを配分するようになされている。それと共に、かかる１バイト目の位置にはビット割当情報の２重書きを行なっている単位ブロックの個数、及びスケールファクタ情報の２重書きを行なっている単位ブロックの個数が記録される。
【００５３】
２重書きとは、エラー訂正用に、あるバイト位置に記録されたデータと同一のデータを他の場所に記録する方法である。２重書きされるデータの量を多くする程、エラーに対する強度が向上するが、２重書きされるデータの量を少なくする程、スペクトラムデータに使用できるデータ容量が多くなる。この符号化フォーマットの一例では、ビット割当情報、スケールファクタ情報のそれぞれについて独立に２重書きを行なう単位ブロックの個数を設定することにより、エラーに対する強度と、スペクトラムデータを記録するために使用されるビット数とを適切なものとするようにしている。なお、それぞれの情報について、規定されたビット内でのコードと単位ブロックとの個数の対応は、あらかじめフォーマットとして定めている。
【００５４】
１バイト目の位置の８ビットにおける記録内容の一例を図８に示す。ここでは、最初の３ビットを実際に記録される単位ブロックの個数の情報とし、後続の２ビットをビット割当情報の２重書きを行なっている単位ブロックの個数の情報とし、最後の３ビットをスケールファクタ情報の２重書きを行なっている単位ブロックの個数の情報とする。
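The 3-bit / 2-bit / 3-bit layout of byte 1 can be illustrated with a small pack/unpack sketch. Two points are assumptions rather than statements of the text: that the "first" bits are the most significant ones, and that the fields hold codes whose mapping to actual block counts is defined by a format table that is not reproduced here. The function names are hypothetical.

```python
def pack_byte1(block_code, alloc_dup_code, sf_dup_code):
    """Pack the three codes into one byte: 3 bits, 2 bits, 3 bits (MSB first, assumed)."""
    if not (0 <= block_code < 8 and 0 <= alloc_dup_code < 4 and 0 <= sf_dup_code < 8):
        raise ValueError("code out of range for its bit field")
    return (block_code << 5) | (alloc_dup_code << 3) | sf_dup_code

def unpack_byte1(value):
    """Split byte 1 back into (block_code, alloc_dup_code, sf_dup_code)."""
    return (value >> 5) & 0x7, (value >> 3) & 0x3, value & 0x7
```

Because each field is only a few bits wide, a decoder would still need the format's code-to-count table to recover the real number of unit blocks.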
【００５５】
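In FIG. 7, the bit allocation information of the unit blocks is recorded from the byte 2 position onward. For example, 4 bits per unit block are used to record the bit allocation information. Bit allocation information is thus recorded, starting from unit block 0, for as many unit blocks as are recorded. The scale factor information of each unit block follows the bit allocation data. For example, 6 bits per unit block are used to record the scale factor information. Scale factor information is thus recorded, starting from unit block 0, for as many unit blocks as are recorded.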
【００５６】
スケールファクタ情報の後に、単位ブロック内のスペクトラムデータが記録される。スペクトラムデータは、０番目の単位ブロックより順番に、実際に記録させる単位ブロックの個数分記録される。各単位ブロック毎に何本のスペクトラムデータが存在するかは、あらかじめフォーマットで定められているので、上述したビット割当情報によりデータの対応をとることが可能となる。なお、ビット割当が０の単位ブロックについては記録を行なわない。
【００５７】
このスペクトラム情報の後に、上述したスケールファクタ情報の２重書き、およびビット割当情報の２重書きを行なう。この２重書きの記録方法は、個数の対応を図８に示した２重書きの情報に対応させるだけで、その他の点については上述のスケールファクタ情報、およびビット割当情報の記録と同様である。最後のバイトすなわち２１１バイト目、およびその１バイト前の位置すなわち２１０バイト目には、それぞれ、０バイト目と１バイト目の情報が２重書きされる。これら２バイト分の２重書きはフォーマットとして定められており、スケールファクタ情報の２重書きやビット割当情報の２重書きのように、２重書き記録量の可変の設定はできない。
【００５８】
次に、高能率符号化データを復号化する復号化処理について説明する。復号化処理系の構成の一例を図９に示す。高能率符号化データは、入力端子７０７を介して演算器７１０に供給される。また、符号化処理において使用されたブロックサイズ情報、すなわち図１中の出力端子１１３、１１５、１１７の出力信号と等価のデータが入力端子７０８に供給される。また、正規化情報変更回路７０９は、各単位ブロックのスケールファクタ情報に加算または減算すべき値を生成する。
【００５９】
演算器７１０は、さらに、正規化情報変更回路７０９から数値データを供給される。演算器７１０は、供給される高能率符号化データ中のスケールファクタ情報に対して、正規化情報変更回路７０９から供給される数値データを加算する。但し、正規化情報変更回路７０９から供給される数値データが負の数の場合は、演算器７１０は減算器として作用するものとする。演算器７１０の出力は、適応ビット割当復号化回路７０６、および出力端子７１１に供給される。
【００６０】
適応ビット割当復号化回路７０６は、適応ビット割当情報を参照してビット割当てを解除する処理を、高域、中域、低域の各帯域について行う。高域、中域、低域のそれぞれに対する適応ビット割当て復号化回路７０６の出力は、逆直交変換回路７０３、７０４、７０５に供給される。逆直交変換回路７０３、７０４、７０５は、供給されるデータを逆直交変換処理する。これにより、周波数軸上の信号が時間軸上の信号に変換される。逆直交変換回路７０３、７０４、７０５の出力である、部分帯域の時間軸上信号は、帯域合成フィルタ７０１、７０２によって合成され、全帯域信号に復号化される。帯域合成フィルタ７０１、７０２としては、例えばＩＱＭＦ(Inverse Quadrature Mirror filter)等を使用することができる。
【００６１】
演算器７１０による加算または減算によってスケールファクタ情報を操作することにより、再生データについて例えば２ｄＢ毎のレベル調整を行うことができる。例えば、正規化情報変更回路７０９から全て同じ数値を出力し、その数値を全単位ブロックのスケールファクタ情報に一律に加算または減算する処理により、全単位ブロックに対して２ｄＢを単位とするレベル調整を行うことが可能とされる。
【００６２】
また、例えば、正規化情報変更回路７０９から単位ブロック毎に独立な数値を出力し、それらの数値を各単位ブロックのスケールファクタ情報に加算または減算する処理によって単位ブロック毎のレベル調整を行うことができ、その結果としてフィルタ機能を実現することができる。より具体的には、正規化情報変更回路７０９が単位ブロックの番号と、当該単位ブロックのスケールファクタ情報に加算または減算すべき値との組を出力させる等の方法で、単位ブロックと当該単位ブロックのスケールファクタ情報に加算または減算すべき値とが対応付けられるようにする。なお、演算器７１０による加算または減算の結果として生成されるスケールファクタ情報は、対応するスケールファクタ値が高能率符号化データのフォーマットで定められた範囲に収まるように制限される。
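The uniform level adjustment and the per-block adjustment described above both amount to adding an offset to each scale factor number and clamping the result. A minimal sketch, assuming a 6-bit scale factor field (numbers 0 to 63) and about 2 dB per step; the function name and the dict-based interface are hypothetical.

```python
SF_MIN, SF_MAX = 0, 63   # assumed range of a 6-bit scale-factor number
DB_PER_STEP = 2          # one step corresponds to about 2 dB in the text

def adjust_scale_factors(sf_numbers, offsets):
    """Add offsets (in steps) to scale-factor numbers and clamp to the allowed range.

    offsets is either one int (same change for every unit block, i.e. level
    adjustment) or a dict {unit_block_number: steps} (per-block change,
    which acts as a coarse filter).
    """
    adjusted = []
    for block, sf in enumerate(sf_numbers):
        step = offsets if isinstance(offsets, int) else offsets.get(block, 0)
        adjusted.append(min(SF_MAX, max(SF_MIN, sf + step)))
    return adjusted
```

For example, `adjust_scale_factors(sf, -3)` would lower every unit block by roughly 6 dB, while `adjust_scale_factors(sf, {20: -5, 21: -5})` would attenuate only two blocks.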
【００６３】
演算器７１０によって単位ブロックのレベル調整が行われたスケールファクタ値については、適応ビット割当復号化回路７０６の復号化の行程に使用されることにより、復号化信号のレベル調整を行うのみに利用することが可能であると共に、例えば符号化情報が記録された記録媒体よりスケールファクタ値を読み込み、調整が行われたスケールファクタ値を出力端子７１１に出力させ、記録媒体に記録されたスケールファクタ値を調整された値に変更することも可能である。記録媒体の情報の変更については、必要に応じて行えるものとする。これによって、非常に簡単なシステムで、記録媒体のレベル情報を変更することが可能となる。
【００６４】
上述の説明では、符号化回路、復号化回路の双方においてスケールファクタ情報の変更処理を行うものとした。これに対して、復号化回路のみにおいてスケールファクタ情報の変更処理を行うようにした場合にも、変更処理の結果として、レベル調整、フィルタ処理等の機能を充分に得ることができる。
【００６５】
次に、上述した高能率符号化における処理を行う時間単位について説明する。図１における入力端子１００には、オーディオのＰＣＭサンプルが供給されるが、入力後に行われるＭＤＣＴ回路１０３，１０４，１０５によるＭＤＣＴ処理においては、いわゆる直交変換処理を行うためのサンプル数が規定され、それが一つの単位となり、繰り返し処理がなされる。
【００６６】
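Here, 1024 PCM samples input from the input terminal 100 are output from the MDCT circuits 103, 104, and 105 as 512 MDCT coefficients, or spectral data. Specifically, the 1024 PCM samples input from the input terminal 100 are split by the band division filter 101 into 512 high-band samples and 512 lower-band samples, and the latter are split by the band division filter 102 into 256 low-band samples and 256 mid-band samples. The 256 low-band samples from the band division filter 102 then become 128 low-band spectral data in the MDCT circuit 105, the 256 mid-band samples from the band division filter 102 become 128 mid-band spectral data in the MDCT circuit 104, and the 512 high-band samples from the band division filter 101 become 256 high-band spectral data in the MDCT circuit 103. In this way, a total of 512 spectral data are produced from 1024 PCM samples. These 1024 PCM samples are the time unit for one pass of the high-efficiency encoding described above and yield the 212 bytes of high-efficiency encoded data shown in FIG. 7, that is, one frame.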
【００６７】
上述したように、１フレームは、例えば１０２４個のＰＣＭサンプルからなるが、図１中のＭＤＣＴ回路１０３，１０４，１０５によるＭＤＣＴ処理においては、通常、順次処理されていく各フレームにおいてオーバーラップ部分が生じる。ＰＣＭサンプルとフレームの関係を図１０を用いて説明する。図１０に示すように、例えば、ｎ番目からｎ＋１０２３番目までの１０２４個のＰＣＭサンプルがＮ番目のフレームで処理される場合に、Ｎ＋１番目のフレームでは、ｎ＋５１２番目からｎ＋１５３５番目までの１０２４個のＰＣＭサンプルが処理され、Ｎ＋２番目のフレームでは、ｎ＋１０２４番目からｎ＋２０４７番目までの１０２４個のＰＣＭサンプルが処理される。このように、一つのフレームは、隣接するサウンドフレームと、５１２個のＰＣＭサンプルのオーバーラップを持つ形となる。つまり、このような形で処理を行うと、高能率符号化情報の１フレームは、１０２４個のＰＣＭサンプルを処理したものであるが、隣接フレームとのオーバーラップを考慮すると、５１２個のＰＣＭサンプル相当ということになる。
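The frame-to-sample relationship in this paragraph is plain arithmetic: each frame covers 1024 samples and starts 512 samples after the previous one. A short sketch (the function name is hypothetical) that reproduces the ranges quoted in the text:

```python
FRAME_SIZE = 1024
HOP = FRAME_SIZE // 2   # 512 new samples per frame; the other 512 overlap

def frame_range(n, k):
    """Inclusive PCM sample range of frame N+k, when frame N starts at sample n."""
    start = n + k * HOP
    return start, start + FRAME_SIZE - 1
```

With n = 0 this gives 0..1023, 512..1535, and 1024..2047 for frames N, N+1, and N+2, matching the example above.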
【００６８】
図１０は、ＰＣＭサンプルの途中でのフレームとの対応を示しているが、ＰＣＭサンプルの始点については、例えば始点より以前の段階に５１２個の０データのＰＣＭサンプルを想定して、これらの５１２個の０データのＰＣＭサンプルを、最初のフレーム以前の仮想的なフレームとオーバーラップして処理するものとする。また、最後のフレームでは、サンプル列終了時点以後に５１２個の０データのＰＣＭサンプルを想定して、それら５１２個の０データのＰＣＭサンプルを、最後のフレーム以後の仮想的なフレームとオーバーラップして処理するものとする。
【００６９】
次に、上述した符号化または復号化方法について、いわゆるパソコン上のソフトウエアとして処理する方法について説明する。パソコン上での処理としては、主にハードディスク上のＰＣＭのデータファイルを高能率符号化することにより、ハードディスク上に高能率符号化データファイルを作成する、またはハードディスク上の高能率符号化データファイルを復号化処理することによりハードディスク上にＰＣＭのデータファイルを作成することが考えられる。この時、通常一つの楽曲が一つのファイルに対応される。
【００７０】
具体例として、いわゆるパソコンにおける、ＧＵＩ(Graphical User Interface)を利用したソフトウエアでの画面表示、操作方法、処理行程等について、図１１を用いて説明する。図１１は、符号化および復号化のソフトウエアのパソコン上での画面表示の一例を示すものである。このソフトウエアは、まずＰＣＭデータと高能率符号化データのためのディレクトリを選択する。８０１は、ＰＣＭデータファイルのディレクトリパスの表示部であり、現在この例ではＣドライブのＰＣＭＤＡＴＡという名のディレクトリが選択されていることが示されている。８０３は、表示部８０１にて示されたディレクトリ内のファイル構成を表示すると共に、ディレクトリ移動、ドライブ移動、ファイル選択等を行える表示操作部である。この例では、現在の表示部８０１で示されたディレクトリの下には更にｔｍｐという名称のディレクトリが存在していることが分かる。
【００７１】
また、「・・」の表示は、一つ上の階層のディレクトリを示しているものとする。また、ｔｍｐ以下６つのファイルはＰＣＭデータファイルを示している。また、その下の［−ｃ−］［−ｄ−］は、移動可能なドライブを示している。表示されているものが、ディレクトリか、ドライブか、ＰＣＭデータかの判断は、表示されている文字列や、文字列の横に付加されている、いわゆるアイコンにより、判断することが可能である。
【００７２】
ディレクトリとドライブの表示部は、その文字列位置にマウスポインタを対応させ、ダブルクリックすることで、現行ディレクトリ位置を、ダブルクリックした場所に移動させることが可能である。この例では、例えばｔｍｐの場所でダブルクリックを行うと、表示部８０１の表示は、Ｃ：￥ＰＣＭＤＡＴＡ￥ｔｍｐとなり、表示操作部８０３では、ｔｍｐの下のファイルの状態、および移動可能ドライブが示されるようになる。このように、ドライブ名やディレクトリ名をダブルクリックを繰り返すことにより、ＰＣＭデータファイル用の所望のディレクトリ位置に移動することができる。
【００７３】
８０２は、高能率符号化データ用のディレクトリ位置を表示する表示部であり、図示の例では、ＣドライブのＥＮＣＯＤＥＤＡＴＡという名のディレクトリが選択されていることが示されている。８０４は、表示部８０２にて示されたディレクトリ内のファイル構成を表示すると共に、ディレクトリ移動、ドライブ移動、ファイル選択等を行える表示操作部である。この例では、表示部８０２で示された現在のディレクトリの下には、ファイル、ディレクトリが共に存在していないことが示されている。表示操作部８０４における操作、および表示部８０２との対応については、表示操作部８０３、表示部８０１におけるものと同様であり、表示操作部８０４にて高能率符号化データ用のディレクトリを選択することができる。
【００７４】
８０５は、高能率符号化を実行するボタンであり、ここをクリックすることで、表示操作部８０３にて選択されたＰＣＭデータファイルが順に高能率符号化され、表示部８０２で示されたディレクトリの下に高能率符号化ファイルが作成される。この実際の処理の流れについて図１２を用いて説明する。
【００７５】
図１２Ａに示す状態では、図１１における表示操作部８０３にて、ｄａｔａ２．ｐｃｍ、ｄａｔａＡ．ｐｃｍ、ｄａｔａＢ．ｐｃｍの３つのＰＣＭファイルが選択され、反転表示されている。ここで図１１におけるボタン８０５をクリックすることにより、これらの３つのファイルがそれぞれ順に高能率符号化される。通常の高能率符号化処理の場合、処理を行うファイルの順序は特に問題とならない。
【００７６】
図１２Ｂに示す状態では、高能率符号化処理実行中の表示画面を示すものであり、符号化処理行程の進行状況が、棒グラフのような形で認識できるようになっている。ここでは図示していないが、ボタンの形で処理を途中で中止するような手段を設けても良い。図１２Ｃは、選択された全てのファイルの高能率符号化処理が終了した状態を示すものである。図１１における操作表示部８０４には、処理により作成された３つの高能率符号化データファイル、ｄａｔａ２ｅｎｃ．ｄａｔ、ｄａｔａＡｅｎｃ．ｄａｔ、ｄａｔａＢｅｎｃ．ｄａｔが表示されている。処理後の、高能率符号化データファイルのファイル名については任意性があるが、ここでは処理を行うＰＣＭファイル名の、いわゆる拡張子部分となる．ｐｃｍを取り除いた部分の名称にｅｎｃ．ｄａｔが自動的に付加されたファイル名を採用するようにしている。
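The output naming convention described here (drop the ".pcm" extension, then append "enc.dat") is simple enough to state as code. A minimal sketch; the function name is hypothetical, and leaving names without a ".pcm" extension otherwise unchanged is an assumption.

```python
def encoded_name(pcm_name):
    """Derive the encoded file name: strip a trailing '.pcm' and append 'enc.dat'."""
    stem = pcm_name[:-4] if pcm_name.lower().endswith(".pcm") else pcm_name
    return stem + "enc.dat"
```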
【００７７】
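Next, the button 807 is described. This button causes the high-efficiency encoding of a plurality of files to be handled continuously, as a single data sequence. As described with reference to FIG. 12B, when files are processed one after another, each file is processed individually by the steps of FIG. 1 and with the data relationship shown in FIG. 10. Consequently, every file is processed, as described above, on the assumption of 512 zero-valued PCM samples before its start point and zero-valued PCM samples after its end point. Normally this causes no problem when each file holds an independent piece of music, but when the pieces are separate yet the PCM data is continuous across files, the high-efficiency encoding destroys that continuity.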
【００７８】
この例を、先に示した図１２におけるｄａｔａＡ．ｐｃｍ、ｄａｔａＢ．ｐｃｍが連続したＰＣＭデータである場合を想定し、図１３Ａ、図１３Ｂ、および図１３Ｃを用いて説明する。図１３Ａでは、分割点を境にして、ｄａｔａＡ．ｐｃｍの終点のＰＣＭデータとｄａｔａＢ．ｐｃｍの始点のＰＣＭデータが連続しているものである様子を示している。
【００７９】
また、先に図１０等を用いて説明した高能率符号化処理を行うフレーム割りの最終部分については、図１３ＡにおけるＮとＮ＋１のような状態となったものとする。この時、ｄａｔａＡ．ｐｃｍの最終部の処理を示したものが図１３Ｂである。すなわち、Ｎ＋１番目のフレームが最終フレームとなるが、図１３Ａにおける分割点以降のデータについては別ファイルのデータであるので、分割点以降のデータを使用せず端数分となった部分については０データを詰め込んで処理を行う。
【００８０】
これに対して、ｄａｔａＢ．ｐｃｍの始点のデータについては、図１３Ｃに示した形の処理を行う。すなわち、図１３Ａにおける分割点以前のデータについては別ファイルのデータであるので、分割点以前のデータを使用せず、先頭フレームの１０２４個のＰＣＭデータは、５１２個のゼロデータと５１２個のｄａｔａＢ．ｐｃｍの始点のデータから構成される。
【００８１】
この時、図１３Ｂで示したｄａｔａＡ．ｐｃｍを処理するフレーム割りと、図１３Ｃで示したｄａｔａＢ．ｐｃｍを処理するフレーム割りが異なったものとなる。また、それぞれが端数分としてゼロデータを挿入しているため、連続性も失われた状態となっている。すなわち、ｄａｔａＡ．ｐｃｍとｄａｔａＢ．ｐｃｍを連続再生した場合は、連続した音となるが、ｄａｔａＡｅｎｃ．ｄａｔとｄａｔａＢｅｎｃ．ｄａｔを復号化して連続再生した場合は、音切れのような形となってしまう。
【００８２】
これに対して、図１１におけるボタン８０７をクリックして、データ列を連続した形で処理する場合の例を図１４Ａ、図１４Ｂおよび図１４Ｃを用いて説明する。図１４Ａに示すように、ファイルの分割点、およびｄａｔａＡ．ｐｃｍの処理フレーム割り等は、図１３Ａと同様の状態となっている。図１４Ｂは、ｄａｔａＡ．ｐｃｍの最終フレームの様子を示すものであるが、図１３Ｂとは異なり、分割点より外側のデータに０データを埋めるのではなく、ｄａｔａＢ．ｐｃｍのデータを採用している。
【００８３】
また、図１４Ｃは、ｄａｔａＢ．ｐｃｍの先頭のフレーム割りを示しているが、図１３Ｃのように、ファイルの始点にフレームをあわせて０データを埋めるのではなく、ｄａｔａＡ．ｐｃｍのフレーム割りと連続性を保つようなフレーム割り処理として、ｄａｔａＢ．ｐｃｍの始点より外側のデータについては、ｄａｔａＡ．ｐｃｍのデータを採用するようにしている。つまり図１４Ａでのフレーム割りで考えた場合の、Ｎ＋２というのがｄａｔａＢ．ｐｃｍの先頭フレームということになる。このように処理することにより、高能率符号化処理データにおいても、二つのファイル間で連続性が保たれることとなり、ｄａｔａＡｅｎｃ．ｄａｔとｄａｔａＢｅｎｃ．ｄａｔを復号化して連続再生した場合の音切れが起こらないこととなる。
【００８４】
上述した図１３Ａ、図１３Ｂ、図１３Ｃに示したように、符号化処理を行う場合の処理を図１５のフローチャートに示し、図１４Ａ、図１４Ｂ、図１４Ｃに示したように、符号化処理を行う場合の処理を図１６のフローチャートに示す。
【００８５】
図１５の最初のステップＳ１では、１０２４ポイント分の読み込みバッファを用意する。次に、処理の対象のファイルの番号ｉを０に設定する（ステップＳ２）。ステップＳ３では、処理すべきｉ番目のファイルがあるかどうかが決定される。ファイルがなければ、処理は、終了する（ステップＳ４）。
【００８６】
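If the i-th file exists, zero data is written into the first 512 points of the read buffer in step S5. Next, the i-th input file (PCM file) is opened (step S6), and the i-th output file (encoded file) is opened (step S7). Data is then read from the input file into the latter 512 points of the buffer (step S8).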
【００８７】
ステップＳ９では、読み込みデータ量が取得され、読み込み位置が更新される。ステップＳ１０では、読み込みデータ量が５１２ポイントに満たないかどうかが決定される。読み込みデータ量が５１２ポイントに満たない場合には、ステップＳ１１において、読み込みバッファの５１２ポイントと、読み込みデータ量の差分量のデータとしてゼロデータが詰められる。
【００８８】
ステップＳ１０で読み込みデータ量が５１２ポイントある場合、またはステップＳ１１（ゼロデータの詰め込み）に続いて、ステップＳ１２において、１フレーム分の符号化処理がなされる。ステップＳ１３では、符号化データを書き込みファイルに書き込む。
【００８９】
ステップＳ１０の決定の結果が肯定の場合（読み込みデータ量が５１２ポイントに満たない場合）では、ステップＳ１４で、ｉ番目の読み込みファイルをクローズし、ステップＳ１５でｉ番目の書き込みファイルをクローズし、ステップＳ１６でｉのインクリメント処理がなされる。そして、処理がステップＳ３（ｉ番目のファイルの有無の決定）に戻る。
【００９０】
ステップＳ１０の決定の結果が否定の場合（読み込みデータ量が５１２ポイントある場合）では、ステップＳ１３に続いてステップＳ１７の処理がなされる。ステップＳ１７では、読み込みバッファの後半５１２ポイント分のデータをその前半５１２ポイントにシフトする。そして、処理がステップＳ９（読み込みデータ量の取得、および読み込み位置の更新）に戻る。
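The per-file loop of FIG. 15 (steps S5 to S17) can be sketched over an in-memory list of samples instead of real files. This is a sketch under assumptions: the function name is hypothetical, the actual encoding and file writing of steps S12 and S13 are omitted so that the function only yields the 1024-point buffers that would be encoded, and the loop is assumed to read the next 512 points after the shift in step S17.

```python
FRAME = 1024
HALF = FRAME // 2

def frames_independent(samples):
    """Yield the 1024-point buffers for one file, zero-padded at both ends."""
    buf = [0] * HALF                    # S5: first half starts as zero data
    pos = 0
    while True:
        chunk = samples[pos:pos + HALF]             # S8: read up to 512 points
        pos += len(chunk)                           # S9: update read position
        short = len(chunk) < HALF                   # S10: fewer than 512 points?
        if short:
            chunk = chunk + [0] * (HALF - len(chunk))   # S11: pad with zero data
        yield buf + chunk               # S12/S13 would encode and write this frame
        if short:
            return                      # S14-S16: close files, go to the next file
        buf = chunk                     # S17: shift second half into first half
```

Calling this once per selected file gives the independent processing of FIG. 13, in which every file begins and ends with zero data.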
【００９１】
このようにして、図１３に示すように、楽曲がファイル毎に独立している場合に適用される処理がなされる。また、楽曲としては別であるが、ＰＣＭデータとして連続となっているような場合に適用される処理（図１４）を図１６のフローチャートを参照して説明する。
【００９２】
最初のステップＳ２１で、１０２４ポイント分の読み込みバッファが用意される。ステップＳ２２では、ｉが０に初期化される。ステップＳ２３では、最初のファイル（ｉ==０）であるか否かが決定される。最初のファイルの場合には、ステップＳ２４において、読み込みバッファの前半５１２ポイント分のデータとしてゼロデータが詰められる。そして、ｉ番目の読み込み（ＰＣＭ）ファイルをオープンし（ステップＳ２５）、ｉ番目の書き込み（符号化）ファイルをオープンする（ステップＳ２６）。ステップＳ２７では、読み込みファイルからバッファの後半の５１２ポイントにデータを読み込む。
【００９３】
ステップＳ２８では、読み込みデータ量が取得され、読み込み位置が更新される。ステップＳ２９では、読み込みデータ量が５１２ポイントに満たないかどうかが決定される。読み込みデータ量が５１２ポイントに満たない場合には、ステップＳ３０において、処理すべきｉ＋１番目のファイルがあるかどうかが決定される。
【００９４】
ステップＳ３０において、処理すべきｉ＋１番目のファイルがないと決定されると、ステップＳ３１では、読み込みバッファの５１２ポイントと、読み込みデータ量の差分量のデータとしてゼロデータが詰められる。
【００９５】
ステップＳ３０において、処理すべきｉ＋１番目のファイルがあると決定されると、ステップＳ３２において、ｉ＋１番目の読み込みファイルのオープンがなされる。そして、ステップＳ３３では、読み込みバッファの５１２ポイントと、読み込みデータ量の差分量のデータがｉ＋１番目のファイルから読み込まれ、読み込み位置が更新される。
【００９６】
ステップＳ２９で読み込みデータ量が５１２ポイントある場合、ステップＳ３１（ゼロデータの詰め込み）、またはステップＳ３３（ｉ＋１番目のファイルからのデータの読み込みと、読み込み位置の更新）に続いて、ステップＳ３４において、１フレーム分の符号化処理がなされる。ステップＳ３５では、符号化データを書き込みファイルに書き込む。
【００９７】
ステップＳ２９の決定の結果が否定の場合（読み込みデータ量が５１２ポイントある場合）では、ステップＳ３５に続いてステップＳ３６の処理がなされる。ステップＳ３６では、読み込みバッファの後半５１２ポイント分のデータをその前半５１２ポイントにシフトする。そして、処理がステップＳ２７（読み込みファイルからバッファの後半５１２ポイントにデータを読み込む）に戻る。
【００９８】
ステップＳ２９の決定の結果が肯定の場合（読み込みデータ量が５１２ポイントに満たない場合）では、ステップＳ３７で、ｉ番目の読み込みファイルをクローズし、ステップＳ３８でｉ番目の書き込みファイルをクローズする。そして、ステップＳ３０の決定の結果が否定（すなわち、ｉ＋１番目のファイルがない）場合に、処理が終了する（ステップＳ４０）。一方、ステップＳ３０の決定の結果が肯定（すなわち、ｉ＋１番目のファイルがある）場合に、ステップＳ３９でｉのインクリメント処理がなされ、ステップＳ３６の処理がなされる。そして、処理がステップＳ２３（最初のファイルか否かの決定）に戻る。
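The continuous processing of FIG. 16 differs from FIG. 15 in two places: zero data is placed only before the first file, and a short final read is completed with data from the next file rather than with zeros. The sketch below works on in-memory lists and returns, per input file, the buffers that would be encoded into that file's output. The function name is hypothetical, encoding and file I/O are omitted, and padding with zeros when the next file is itself too short is an assumption not covered by the text.

```python
FRAME = 1024
HALF = FRAME // 2

def frames_continuous(files):
    """Return, for each input file, its list of 1024-point buffers (FIG. 16 style)."""
    out = [[] for _ in files]
    buf = [0] * HALF                 # S24: zero data only before the first file
    pos = 0                          # read position in the current file
    i = 0
    while i < len(files):
        chunk = files[i][pos:pos + HALF]            # S27: read up to 512 points
        pos += len(chunk)                           # S28
        short = len(chunk) < HALF                   # S29
        next_pos = 0
        if short:
            need = HALF - len(chunk)
            if i + 1 < len(files):                  # S30: is there a next file?
                borrowed = files[i + 1][:need]      # S32/S33: take data from it
                next_pos = len(borrowed)
                chunk = chunk + borrowed
            chunk = chunk + [0] * (HALF - len(chunk))   # S31: zero data if needed
        out[i].append(buf + chunk)                  # S34/S35: encode and write
        buf = chunk                                 # S36: shift for the next frame
        if short:
            i += 1                                  # S37-S39: move to next file
            pos = next_pos
    return out
```

The first buffer of each later file starts with the second half of the previous file's last buffer, so the frame grid runs unbroken across the split point.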
【００９９】
次に図１１におけるボタン８０９について説明する。ボタン８０９は、ファイルのデータを解析し、上述した連続性を考慮した符号化処理を行うべきか否かの判断を行うためのものである。連続性の考慮については、通常、考慮するべきか否かの判断は、実際に楽曲ファイルの最終部、先頭部を試聴して判断されている。しかしながら、大量の楽曲ファイルが存在する場合、その全てについて試聴を行うことは、時間を浪費し、効率的ではない。
【０１００】
一般的にある楽曲ファイルがその一つのファイルで完結して、他のファイルとの連続性が無いような場合、ファイルの先頭部付近、あるいは最終部付近のデータが無音となり、ゼロかゼロに近い値となる傾向にある。一方、他のファイルとの連続性があるようなファイルの場合、ファイルの先頭部付近、あるいは最終部付近のデータはゼロ以外で、ある程度の大きさ、すなわちある程度の音量レベルとなっている可能性が高い。ボタン８０９が押されると、この特徴に基づいて、符号化のために選択されたＰＣＭファイルの先頭部付近、あるいは最終部付近のデータの値を読み込み、それを分析することで該ファイルが他のファイルとの連続性を持つ可能性が高いか否かの判断がなされる。
【０１０１】
分析の方法としては様々なものが考えられるが、最も単純な方法としては例えば、ファイルの先頭データと最終データを１ポイントづつ読み込み、それがゼロであるか否かの判断を行う。若し、先頭データがゼロ以外の値であれば、そのファイルは、他のファイルの最終部と連続性がある可能性があるものとする。若し、最終データがゼロ以外の値であれば、そのファイルは他のファイルの先頭部と連続性がある可能性があるものとする。これらの連続性がある可能性がある旨を使用者に警告するようにする。使用者は、警告に従い、該ファイルを試聴することで、連続性を確認することで、確実な判断を行うことができる。
【０１０２】
また、例えば最終データがゼロ以外の値となるようなファイルが符号化処理の対象として選択されていた場合に、上述した警告を行わずに、符号化処理の際に、該ファイルの次に符号化処理されるように選択されたファイルと自動的に連続処理を行うようにしても良い。同様に、例えば先頭データがゼロ以外の値となるようなファイルが符号化処理の対象として選択されている場合に、上述した警告を行わずに、符号化処理の際に、該ファイルの前に符号化処理されるように選択されたファイルと自動的に連続処理を行うようにしても良い。
【０１０３】
上述の説明では、先頭データと最終データを１ポイントづつで、しきい値をゼロとした場合の例について説明した。しかしながら、実際には音として聞こえなくても、アナログ的なノイズや、ディザ等の影響で、他のファイルと連続性が無いのにも関わらず、先頭データまたは最終データがある程度の大きさの値をもったデータとなっている場合も少なくない。
【０１０４】
この問題に対処するために、使用者が自由にしきい値の設定等を行えるようにする。また、しきい値の設定は、レベルだけでなく、読み込みポイント数等でも行えるものとする。例えば、先頭データと最終データを１ポイントづつではなく１０ポイントづつとし、その１０ポイントの値の総和、平均値等をしきい値として用いる方法が可能である。あるいは、ファイルの最終値と他のファイルの先頭値との差分量を検索する方法等も可能である。このように、様々なしきい値設定を可能とすることで、より適切な判断が可能となる。
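The configurable threshold test described here can be sketched as a function that summarizes the first or last few samples of a file and compares the result with a user-set level. Function and parameter names are hypothetical, and taking absolute values before averaging is an assumption (the text speaks only of the sum or average of the values).

```python
def boundary_level(samples, points=10, method="mean", where="head"):
    """Summarize the first ('head') or last ('tail') `points` samples of a file."""
    if points < 1:
        raise ValueError("points must be at least 1")
    part = samples[:points] if where == "head" else samples[-points:]
    mags = [abs(s) for s in part]       # assumption: compare magnitudes
    if not mags:
        return 0
    if method == "mean":
        return sum(mags) / len(mags)
    if method == "sum":
        return sum(mags)
    if method == "max":
        return max(mags)
    raise ValueError("unknown method: " + method)

def may_be_continuous(samples, threshold=5, **kwargs):
    """True when the boundary level exceeds the threshold (candidate for a warning)."""
    return boundary_level(samples, **kwargs) > threshold
```

A nonzero threshold is what lets files with low-level noise or dither at their edges pass without a warning.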
【０１０５】
上述したボタン８０９がクリックされた場合の処理例の行程と、画面表示について図１７および図１８を参照して説明する。まず、ボタン８０９をクリックすると、ステップＳ５１において検索条件の設定を行う。これはしきい値の設定を行うものである。例えば図１８Ａに示すような設定画面を通じて、先頭データと最終データについて読み込むデータのポイント数と、レベルを入力すると共に、しきい値の算出方法を入力するようにする。図１８Ａの例では、先頭データ、最終データ共に、１０ポイントの平均値が５を上回るかどうかで判断する設定となっている。なお、算出方法については、平均値以外に総和、最大値等が選択可能とされている。
【０１０６】
次に、ステップＳ５２において、符号化処理選択ファイルを順に処理していくための変数の初期化が行われる。ここではこの変数をｉとして、ゼロを設定している。ステップＳ５３では、符号化処理選択ファイル数と、変数ｉを比較することで、検索すべきファイルがあるか否かが決定される。ステップＳ５３で、処理選択ファイル数を変数ｉが上回れば処理が終了となる（ステップＳ５４）。
【０１０７】
ステップＳ５３において処理すべきファイルがあると決定されると、ステップＳ５５では、該ファイルを読み込むためのオープン処理を行う。この後、ステップＳ５６で、先頭部データを読み込む。ここで読み込むデータ数は、ステップＳ５１で設定したポイント数となる。ステップＳ５７では、ステップＳ５１で設定した、しきい値の条件と読み込んだデータが比較される。
【０１０８】
ステップＳ５７で、読み込んだデータがしきい値を上回るようであれば、ステップＳ５８の警告処理を行う。この警告処理では、例えば、図１８Ｂに示すような警告メッセージを表示する。表示中のＯＫのボタンを使用者がクリックしてから次のステップに進む。ステップＳ５９、Ｓ６０、Ｓ６１は、それぞれ最終部データについて、先頭データにおけるステップＳ５６、Ｓ５７、Ｓ５８に対応した処理を行うものである。同様に、図１８Ｃの警告メッセージの表示は、図１８Ｂの先頭データに関する警告表示と対応するものである。
【０１０９】
ステップＳ６０またはＳ６１までの処理を終えた後、ステップＳ６２において、ｉ番目のファイルをクローズ処理する。そして、ステップＳ６３において、変数ｉのインクリメント処理を行い、ステップＳ５３に戻る。この後に、警告のあったファイルについてのみ試聴を行うことで、使用者は符号化処理において連続性を考慮すべきか否かを容易に判断することが可能となる。また、ここでは符号化処理の前に、ＰＣＭファイルについて検索する例を述べたが、復号化の際にも同様に、符号化ファイルの正規化情報や、量子化状態などについてしきい値を持たせることで、同様の処理を行うことが可能である。
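The search loop of FIG. 17 (steps S52 to S63) can be sketched as a scan that collects warnings instead of showing dialog boxes. This is a sketch under assumptions: files are given as in-memory (name, samples) pairs, the mean of absolute values is used as the calculation method, and the function name and return format are hypothetical.

```python
def scan_files(files, points=10, threshold=5):
    """Return a list of (file_name, 'head' or 'tail') warnings for the given files.

    files is a list of (name, samples) pairs in the selected processing order.
    """
    warnings = []
    for name, samples in files:                 # S53 / S63: loop over files
        head = samples[:points]                 # S56: read leading-part data
        tail = samples[-points:]                # S59: read trailing-part data
        for label, part in (("head", head), ("tail", tail)):
            if not part:
                continue
            level = sum(abs(s) for s in part) / len(part)
            if level > threshold:               # S57 / S60: compare with threshold
                warnings.append((name, label))  # S58 / S61: warning
    return warnings
```

Only the files that appear in the returned list would then need to be auditioned before choosing between independent and continuous encoding.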
【０１１０】
次に、検索結果をうけて、実際の符号化処理を行う。図１１中のボタン８０７によって、図１３に示した形で処理を行うか、図１４に示した形で処理を行うかが選択される。なお、連続させるファイルの数が二つ以上の場合も同様である。連続処理させるファイルの選択については、図１７で示した方法で割り出される。
【０１１１】
図１９は、連続させるファイルを実際に設定する方法の一例を示す。図１９は、図１１にてボタン８０７をクリックした場合に現れる操作表示画面であり、操作表示画面上で連続処理させるファイルが選択される。９０１で示す表示部には、連続処理を行うファイルを表示している。ここではｄａｔａ２．ｐｃｍと、ｄａｔａ３．ｐｃｍを連続処理する例が示されている。表示部９０１には、直接ファイル名を入力することが可能であるが、ボタン９０５を使って、いわゆるファイル構造をグラフィカルに検索し、ファイルを選択することも可能である。このとき、ファイルを選択した順序が、連続処理に反映されることとなるが、表示部９０１内で順序を変更することも可能である。
【０１１２】
また、９０２を使用することで、複数のファイルの連続処理に対応することも可能である。表示部９０３は、表示部９０１と同様に、その他の組みで連続処理をさせるファイルについて設定するものである。この例ではｄａｔａＡ．ｐｃｍ、ｄａｔａＢ．ｐｃｍ、ｄａｔａＣ．ｐｃｍを連続処理させる設定が示されている。ここではｄａｔａ２．ｐｃｍと、ｄａｔａ３．ｐｃｍの連続処理を一組目、ｄａｔａＡ．ｐｃｍ、ｄａｔａＢ．ｐｃｍ、ｄａｔａＣ．ｐｃｍを二組目としているが、とくにこの組の数値については、直接処理結果には関わらない。９０４については、一組目の９０２に相当するものである。また、ここでは二組を表示しているが、９０６を使用することで、このような組を、更に設定することも可能である。最後にＯＫボタン９０７をクリックすることで設定が完了する。
【０１１３】
再び図１１について説明する。８０６は、表示操作部８０４にて選択された高能率符号化データファイルを復号化する時に押されるボタンである。その処理方法、表示内容の対応等については、高能率符号化時のボタン８０５によるものと同様である。また復号化時においても、上述した高能率符号化の連続処理の場合と同様に、ボタン８０７を使用することで、連続復号化処理を設定することが可能である。復号化の連続処理の場合は、ある高能率符号化データファイルの最終フレームと、他の高能率符号化データファイルの先頭フレームを連続フレームとして復号化処理する形に設定を行うようにすればよい。８０８は、プログラムを終了させるためのボタンである。
【０１１４】
上述した方法で、複数ファイルの符号化、復号化の際に、各ファイル独立に処理を行うか、または、異なるファイル間にまたがった連続性を考慮した処理を行うかを選択して、所望の形で処理ファイルを作成することが可能となる。また、連続性を考慮するか否かを、容易に判断することが可能となり、適応した処理をより迅速に行うことが可能となる。
【０１１５】
【発明の効果】
上述したこの発明によるディジタル信号処理方法は、所望の複数ファイルの符号化処理を行う時に、異なるファイル間の始点、終点の連続性を考慮した符号化と、考慮しない符号化を選択することが可能である。また、この発明では、所望の複数ファイルの始点、終点のデータを分析し、分析結果をもとに連続性を考慮した符号化と、考慮しない符号化を選択することにより、より容易、かつ正確に、連続性を考慮した処理を行うか否かの判断ができるようになり、大幅な作業効率の向上が可能となる。
【図面の簡単な説明】
【図１】高能率符号化データの生成に係る構成の一例を示すブロック図である。
【図２】各帯域毎の直交変換ブロックサイズについて説明するための略線図である。
【図３】図１中の一部の構成について詳細に示すブロック図である。
【図４】臨界帯域、ブロックフローティング等を考慮して分割された帯域のスペクトルの一例を示す略線図である。
【図５】マスキングスペクトルの一例を示す略線図である。
【図６】最小可聴カーブ、マスキングスペクトルの合成について説明するための略線図である。
【図７】この発明の一実施形態における符号化データフォーマットの一例を示す略線図である。
【図８】図７中の１バイト目のデータの詳細を示した略線図である。
【図９】ディジタル信号復号化処理に係る構成の一例を示すブロック図である。
【図１０】符号化データ内の各フレームにおけるオーバーラップについて説明するための略線図である。
【図１１】パソコン上で高能率符号化処理、および復号化処理を行うシステムの操作表示画面の一具体例を示す略線図である。
【図１２】図１１のシステムにより複数のファイルについて高能率符号化をおこなう処理を示す略線図である。
【図１３】二つのファイルの連続性を考慮せずに高能率符号化を行う場合のフレーム対応を示す略線図である。
【図１４】二つのファイルの連続性を考慮して高能率符号化を行う場合のフレーム対応を示す略線図である。
【図１５】二つのファイルの連続性を考慮せずに高能率符号化を行う場合の処理工程を示すフローチャートである。
【図１６】二つのファイルの連続性を考慮して高能率符号化を行う場合の処理工程を示すフローチャートである。
【図１７】先頭データと最終データを分析する処理工程を示すフローチャートである。
【図１８】先頭データと最終データを分析する処理工程において、検索しきい値の条件の入力画面、および警告メッセージの画面を示す略線図である。
【図１９】連続性を考慮した処理を行うファイルの組合せを選択するための操作表示画面の一具体例を示す略線図である。
【符号の説明】
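101, 102 ... band division filters; 103, 104, 105 ... orthogonal transform circuits (MDCT); 109, 110, 111 ... block determination circuits; 118 ... bit allocation calculation circuit; 106, 107, 108 ... adaptive bit allocation encoding circuits; 119 ... normalization information changing circuit; 120, 121, 122 ... adders; 302 ... per-band energy calculator; 303 ... convolution filter; 304 ... adder; 305 ... function generator; 306 ... divider; 307 ... synthesizer; 308 ... subtractor; 309 ... delay circuit; 310 ... allowable noise corrector; 701, 702 ... band synthesis filters (IQMF); 703, 704, 705 ... inverse orthogonal transform circuits (IMDCT); 706 ... adaptive bit allocation decoding circuit; 709 ... normalization information changing circuit; 710 ... adder; 803 ... display and operation section for PCM data files; 804 ... display and operation section for encoded data files; 807 ... button for selecting the processing used when multiple files are high-efficiency encoded; 809 ... button for analyzing file continuity
[0001]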
BACKGROUND OF THE INVENTION
The present invention relates to an encoding apparatus and an encoding method for digital signals such as audio data.
[0002]
[Prior art]
As a conventional technique for high-efficiency encoding of audio signals, transform coding is known. It is one of the blocked frequency-band division schemes: the time-domain audio signal is divided into blocks of a unit time, the time-axis signal of each block is converted into a frequency-axis signal (orthogonal transform) and divided into a plurality of frequency bands, and each band is encoded. Also known is sub-band coding (SBC), one of the non-blocked frequency-band division schemes, in which the time-domain audio signal is divided into a plurality of frequency bands and encoded without being divided into blocks of a unit time.
[0003]
Furthermore, a high-efficiency encoding method that combines the above-described band division encoding and transform encoding is also known. In this method, for example, a signal for each band divided by the band division coding method is orthogonally transformed to a frequency domain signal by the transform coding method, and coding is performed for each band subjected to the orthogonal transformation.
[0004]
Here, examples of the band division filter used in the above-described band division encoding method include filters such as the QMF (Quadrature Mirror Filter). The QMF is described in, for example, R. E. Crochiere, "Digital coding of speech in subbands," Bell Syst. Tech. J., Vol. 55, No. 8 (1976). Also, Joseph H. Rothweiler, "Polyphase Quadrature filters - A new subband coding technique," ICASSP 83, Boston, describes equal-bandwidth filter division techniques and apparatus such as the polyphase quadrature filter.
[0005]
As the orthogonal transform, a method is known in which, for example, an input audio signal is divided into blocks of a predetermined unit time (frame), and a fast Fourier transform (FFT), discrete cosine transform (DCT), modified DCT (MDCT), or the like is applied to each block to convert the time axis to the frequency axis. The MDCT is described in, for example, J. P. Princen and A. B. Bradley (Univ. of Surrey; Royal Melbourne Inst. of Tech.), "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," ICASSP 1987.
[0006]
Meanwhile, encoding methods are known that, when quantizing the frequency components obtained by frequency band division, use division widths that take human auditory characteristics into account. That is, bandwidths called critical bands, which become wider as the frequency increases, are widely used, and an audio signal may be divided into a plurality of bands (for example, 25 bands) along them. With such band division, the data of each band is encoded with either a predetermined bit allocation per band or an adaptive bit allocation per band. For example, when the MDCT coefficient data generated by MDCT processing is encoded with such bit allocation, an adaptively determined number of bits is allocated to the MDCT coefficient data of each band generated for each block, and encoding is performed under that distribution of bit numbers.
[0007]
Known literature on such bit allocation methods and on apparatus for realizing them includes the following. First, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, No. 4, August (1977) describes a method of allocating bits based on the signal magnitude of each band. Also, M. A. Kransner (MIT), "The critical band coder -- digital encoding of the perceptual requirements of the auditory system," ICASSP 1980, describes a method that uses auditory masking to obtain the required signal-to-noise ratio for each band and performs fixed bit allocation.
[0008]
Also, when encoding each band, so-called block floating processing, in which normalization and quantization are performed for each band, is used to achieve more efficient encoding. For example, when encoding MDCT coefficient data generated by MDCT processing, each band is normalized according to the maximum absolute value of its MDCT coefficients and then quantized. The normalization is performed, for example, as follows. A plurality of numbered values are prepared in advance; the value to be used for normalizing each block is determined from among them by a predetermined calculation; and the number assigned to the determined value is used as normalization information. The numbering follows a fixed relationship, for example such that an increase or decrease of 1 in the number corresponds to an increase or decrease of 2 dB in audio level.
[0009]
The highly efficient encoded data generated by the method as described above is decoded as follows. First, processing for generating MDCT coefficient data based on the encoded data is performed with reference to bit allocation information, normalization information, and the like for each band. Time domain data is generated by performing so-called inverse orthogonal transform (IMDCT) based on the MDCT coefficient data. If band division by the band division filter has been performed in the process of high-efficiency encoding, processing for synthesizing time domain data using a band synthesis filter is further performed.
[0010]
In the MDCT processing used as the orthogonal transform in encoding and the IMDCT processing used as the inverse orthogonal transform in decoding, so-called overlap processing is used to prevent discontinuity between the frames being processed. When a piece of music is encoded and decoded, matching processing that takes the overlap and the transform size into account is performed at the start point and end point of the piece.
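The role of the overlap can be illustrated with a minimal sketch. The Python below is not taken from the patent: it assumes a textbook MDCT with 50% overlap and a sine window, and the names `sine_window`, `mdct`, `imdct`, and `overlap_add_roundtrip` are hypothetical. It suggests why the start and end points need special handling: samples covered by two overlapping frames are reconstructed, whereas the first and last half-frames, covered by only one frame, are not.

```python
import math

def sine_window(length):
    # Sine window; satisfies w[t]^2 + w[t + length/2]^2 = 1 (Princen-Bradley).
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]

def mdct(frame):
    # 2N time samples -> N coefficients.
    n = len(frame) // 2
    return [sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N time samples that still contain time-domain aliasing.
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def overlap_add_roundtrip(signal, n):
    # Window, transform, inverse-transform, window again, then overlap-add
    # frames that advance by N samples (50% overlap).
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = [signal[start + t] * window[t] for t in range(2 * n)]
        rec = imdct(mdct(frame))
        for t in range(2 * n):
            out[start + t] += rec[t] * window[t]
    return out

N = 4
signal = [float(i + 1) for i in range(4 * N)]
restored = overlap_add_roundtrip(signal, N)
# Samples N .. 3N-1 are covered by two frames; samples 0 .. N-1 by only one.
interior_error = max(abs(restored[i] - signal[i]) for i in range(N, 3 * N))
edge_error = max(abs(restored[i] - signal[i]) for i in range(N))
print(interior_error, edge_error)
```

In this sketch the interior error is at floating-point rounding level while the edge error is large, which is the situation the matching processing at the start and end points is meant to address.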
[0011]
High-efficiency encoding by the above-described method is basically performed in units of musical pieces. However, when a large number of pieces are to be encoded, it is inefficient to have the user start processing of the next piece each time processing of one piece is completed. Usually, therefore, the desired pieces are selected in advance and the selected pieces are encoded automatically. More specifically, in a distribution server for electronic music distribution, a large number of PCM files are stored on a hard disk, and high-efficiency encoding is performed at high speed by computer software.
[0012]
When high-efficiency encoding is performed automatically on a large number of pieces, as in a distribution server, the encoding is performed in units of pieces, so the matching processing that takes into account the overlap in the orthogonal transform and the transform size is performed at the start point and end point of every piece. Some pieces, however, are correlated with other pieces; for example, the start point of a piece may be continuous with the end point of another piece. As a specific example, in live recordings, remixes, dance music, and the like, pieces may be connected without a silent period. Even in such a case, if the matching processing at the start and end points is performed independently for each piece, the data after high-efficiency encoding loses the continuity between pieces. A similar problem occurs in decoding. When processing pieces that are continuous with one another, it is desirable to process the data across pieces continuously, without performing the matching processing at the start and end points.
[0013]
[Problems to be solved by the invention]
Whether or not to perform such continuous encoding is basically determined by actually listening to the beginning and end of each piece to be processed. However, when a large number of pieces must be processed, all of them have to be checked by listening, which consumes a great deal of time. In addition, the operator's judgment may deteriorate over a long checking session, reducing the accuracy of the determination.
[0014]
  Therefore, the object of the present invention is to provide an encoding apparatus and an encoding method that automatically analyze the data of the first part and the last part of the pieces selected for processing, so that whether to consider continuity in encoding or decoding can be determined easily and accurately and work efficiency can be improved.
[0015]
[Means for Solving the Problems]
  The invention of claim 1 is an encoding device that blocks a plurality of digital audio files at every predetermined length and performs compression processing on the blocked digital audio files, comprising:
  first selection means for selecting, from the plurality of digital audio files, two or more digital audio files to be compressed, in the order in which the compression processing is performed;
  first encoding means for performing encoding processing based on a block near the end point of the digital audio file positioned in front of an adjacent digital audio file, in the order in which the compression processing is performed, among the two or more digital audio files selected by the first selection means, a block near the start point of the digital audio file positioned behind the adjacent digital audio file selected by the first selection means, and a block straddling the two adjacent digital audio files;
  second encoding means for performing encoding processing based on a block near the end point of the digital audio file positioned in front of an adjacent digital audio file, in the order in which the compression processing is performed, among the two or more digital audio files selected by the first selection means, or a block near the start point of the digital audio file positioned behind an adjacent digital audio file, and a block straddling the two adjacent digital audio files;
  analysis means for analyzing the data at the start point and end point of the two or more digital audio files selected by the first selection means; and
  second selection means for selecting one of the encoding processing in the first encoding means and the encoding processing in the second encoding means with reference to the analysis result of the analysis means.
[0017]
  The invention of claim 5 is an encoding method in which a plurality of digital audio files are blocked at every predetermined length and compression processing is performed on the blocked digital audio files, comprising:
  a first selection step of selecting, from the plurality of digital audio files, two or more digital audio files to be compressed, in the order in which the compression processing is performed;
  a first encoding step of performing encoding processing based on a block near the end point of the digital audio file positioned in front of an adjacent digital audio file, in the order in which the compression processing is performed, among the two or more digital audio files selected in the first selection step, a block near the start point of the digital audio file positioned behind the adjacent digital audio file selected in the first selection step, and a block straddling the two adjacent digital audio files;
  a second encoding step of performing encoding processing based on a block near the end point of the digital audio file positioned in front of an adjacent digital audio file, in the order in which the compression processing is performed, among the two or more digital audio files selected in the first selection step, or a block near the start point of the digital audio file positioned behind an adjacent digital audio file, and a block straddling the two adjacent digital audio files;
  an analysis step of analyzing the data at the start point and end point of the two or more digital audio files selected in the first selection step; and
  a second selection step of selecting one of the encoding processing in the first encoding step and the encoding processing in the second encoding step with reference to the analysis result of the analysis step.
[0020]
According to the invention as described above, it is possible to analyze the continuity of a plurality of files by analyzing the data of the start and end points of the plurality of files to be processed. It is possible to limit the files to be auditioned with reference to the analysis result. Thereby, the processing efficiency can be improved.
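As a rough illustration of this idea, the following Python sketch flags files whose first or last samples exceed a threshold, so that only those files need to be auditioned. It is a sketch under assumptions, not the patent's procedure: the function names, the number of edge samples examined (`n_edge`), the threshold value, and the sample data are all hypothetical.

```python
def edge_peaks(samples, n_edge):
    # Peak absolute amplitude of the first and of the last n_edge samples.
    head = samples[:n_edge]
    tail = samples[-n_edge:]
    return max(abs(x) for x in head), max(abs(x) for x in tail)

def files_to_audition(files, n_edge=2, threshold=100):
    # files: mapping of file name -> list of PCM sample values.
    # A file is flagged when its head or tail is louder than the threshold,
    # i.e. when it may continue from, or into, a neighbouring file.
    flagged = []
    for name, samples in files.items():
        head_peak, tail_peak = edge_peaks(samples, n_edge)
        if head_peak > threshold or tail_peak > threshold:
            flagged.append(name)
    return flagged

tracks = {
    "track1.pcm": [0, 0, 3, 500, 900, 400, 2, 0],        # quiet at both ends
    "track2.pcm": [0, 1, 200, 800, 700, 650, 720, 810],  # loud at the end
    "track3.pcm": [790, 640, 500, 300, 40, 3, 1, 0],     # loud at the start
}
flagged = files_to_audition(tracks)
print(flagged)
```

Here only the second and third files are flagged, so the listening check can be limited to them.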
[0021]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below with reference to the drawings. In one embodiment, an input digital signal such as an audio PCM signal is high-efficiency encoded using band division coding (SBC), adaptive transform coding (ATC), and adaptive bit allocation. This high-efficiency encoding technique will be described with reference to FIG. 1.
[0022]
In the high-efficiency encoding device shown in FIG. 1, the input digital signal is divided into a plurality of frequency bands, and an orthogonal transform is performed for each frequency band. The resulting spectrum data on the frequency axis is encoded with bits allocated adaptively: in the low range, for each so-called critical band, which takes human auditory characteristics into account, and in the mid-to-high range, for each sub-band obtained by subdividing the critical bands in consideration of block floating efficiency. Normally, this block is the block in which quantization noise is generated. Furthermore, in one embodiment, the block size (block length) is adaptively changed according to the input signal before the orthogonal transform.
[0023]
For example, when the sampling frequency is 44.1 kHz, an audio PCM signal of 0 to 22 kHz is supplied to the band division filter 101 such as a QMF filter via the input terminal 100. The band dividing filter 101 divides the supplied signal into a 0 to 11 kHz band and an 11 kHz to 22 kHz band. The signals in the 11 to 22 kHz band are supplied to an MDCT (Modified Discrete Cosine Transform) circuit 103 and block determination circuits 109, 110, and 111.
[0024]
A signal in the 0 kHz to 11 kHz band is supplied to the band division filter 102. The band division filter 102 divides the supplied signal into a 5.5 kHz to 11 kHz band and a 0 to 5.5 kHz band. The signal in the 5.5 to 11 kHz band is supplied to the MDCT circuit 104 and the block determination circuits 109, 110, and 111. A signal in the 0 to 5.5 kHz band is supplied to the MDCT circuit 105 and the block determination circuits 109, 110, and 111. The band division filters 101 and 102 can be configured using, for example, a QMF filter. The block determination circuit 109 determines the block size based on the supplied signal, and supplies information indicating the determined block size to the MDCT circuit 103 and the output terminal 113.
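The two-stage split described above can be summarized with a small sketch of the band edges only. This Python fragment does not implement a QMF; `split_band` is a hypothetical helper that halves a band, and the nominal 0 to 22 kHz range is the one given in the text.

```python
def split_band(band):
    # Halve a (low, high) band given in kHz into a lower and an upper half.
    low, high = band
    mid = (low + high) / 2
    return (low, mid), (mid, high)

full_band = (0.0, 22.0)                        # nominal band at 44.1 kHz sampling
lower_half, high_band = split_band(full_band)  # role of band division filter 101
low_band, mid_band = split_band(lower_half)    # role of band division filter 102

print(low_band, mid_band, high_band)
```

The three resulting bands correspond to the inputs of the MDCT circuits 105, 104, and 103, respectively.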
[0025]
The block determination circuit 110 determines the block size based on the supplied signal, and supplies information indicating the determined block size to the MDCT circuit 104 and the output terminal 115. The block determination circuit 111 determines the block size based on the supplied signal, and supplies information indicating the determined block size to the MDCT circuit 105 and the output terminal 117. The block determination circuits 109, 110, and 111 adaptively set the block size (block length) according to the time characteristics and frequency distribution of the supplied signal.
[0026]
The MDCT circuits 103, 104, and 105 perform MDCT processing on the supplied signals and generate MDCT coefficient data, that is, spectrum data on the frequency axis. The high-range MDCT coefficient data or spectrum data generated by the MDCT circuit 103 is subjected to processing that subdivides the critical bands in consideration of the effectiveness of block floating, and is then supplied to the adaptive bit allocation encoding circuit 106 and the bit allocation calculation circuit 118. The mid-range MDCT coefficient data or spectrum data generated by the MDCT circuit 104 is likewise subjected to processing that subdivides the critical bands in consideration of the effectiveness of block floating, and is then supplied to the adaptive bit allocation encoding circuit 107 and the bit allocation calculation circuit 118.
[0027]
The low-range MDCT coefficient data or spectrum data generated by the MDCT circuit 105 is grouped by critical band and then supplied to the adaptive bit allocation encoding circuit 108 and the bit allocation calculation circuit 118. Here, a critical band is a frequency band divided in consideration of human auditory characteristics: it is the band occupied by narrow-band noise when a pure tone is masked by noise of the same intensity in the vicinity of that tone's frequency. Critical bands have the property that the higher the band, the wider the bandwidth. The entire frequency band of 0 to 22 kHz is divided into, for example, 25 critical bands.
[0028]
Based on the supplied MDCT coefficient data or spectrum data on the frequency axis and on the block size information, the bit allocation calculation circuit 118 calculates the masking amount, energy, peak value, and the like for each band divided in consideration of the above-described critical bands and block floating, taking the masking effect and other factors into account as described later. From these results it calculates, for each band, a scale factor indicating the block floating state and the number of allocated bits. The calculated number of allocated bits is supplied to the adaptive bit allocation encoding circuits 106, 107, and 108. In the following description, each divided band that serves as a unit of bit allocation is referred to as a unit block.
[0029]
The adaptive bit allocation encoding circuit 106 requantizes (normalizes and quantizes) the spectrum data or MDCT coefficient data supplied from the MDCT circuit 103 according to the block size information supplied from the block determination circuit 109, the number of allocated bits supplied from the bit allocation calculation circuit 118, and the scale factor information serving as normalization information. As a result of this processing, high-efficiency encoded data is generated. This high-efficiency encoded data is supplied to the adder 120. The adaptive bit allocation encoding circuit 107 requantizes the spectrum data or MDCT coefficient data supplied from the MDCT circuit 104 according to the block size information supplied from the block determination circuit 110, the number of allocated bits supplied from the bit allocation calculation circuit 118, and the scale factor information. As a result of this processing, high-efficiency encoded data is generated. This high-efficiency encoded data is supplied to the adder 121.
[0030]
The adaptive bit allocation encoding circuit 108 requantizes the spectrum data or MDCT coefficient data supplied from the MDCT circuit 105 according to the block size information supplied from the block determination circuit 111, the number of allocated bits supplied from the bit allocation calculation circuit 118, and the scale factor information. As a result of this processing, high-efficiency encoded data is generated. This high-efficiency encoded data is supplied to the adder 122. The normalization information change circuit 119 and the adders 120, 121, and 122 will be described later.
[0031]
FIG. 2 shows an example of the data for each band supplied to the MDCT circuits 103, 104, and 105. Through the operation of the block determination circuits 109, 110, and 111, the orthogonal transform block size can be set independently for each of the three data streams output from the band division filters 101 and 102, so that the time resolution can be switched according to the time characteristics, frequency distribution, and the like of the signal. That is, when the signal is quasi-stationary in time, Long Mode, which uses a large orthogonal transform block size of, for example, 11.6 ms as shown in FIG. 2A, is used.
[0032]
On the other hand, when the signal is non-stationary, a mode in which the orthogonal transform block size is divided into two or four relative to Long Mode is used. More specifically, either Short Mode (see FIG. 2B), in which the whole is divided into four parts of, for example, 2.9 ms each, or Middle Mode-a (see FIG. 2C) or Middle Mode-b (see FIG. 2D), in which one part is divided in two, for example 5.8 ms, and the other part is divided in four, for example 2.9 ms, is used. By providing various time resolutions in this way, the device can adapt to the complexity of actual input signals.
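The quoted durations are consistent with simple arithmetic at a 44.1 kHz sampling frequency. The sketch below assumes a Long Mode block of 512 samples, which is not stated in the text but reproduces the 11.6 ms figure; the ordering of the sub-blocks in the two Middle Modes is likewise only an assumption for illustration.

```python
SAMPLE_RATE_HZ = 44_100
LONG_BLOCK_SAMPLES = 512  # assumed; gives roughly 11.6 ms at 44.1 kHz

def block_ms(samples, rate=SAMPLE_RATE_HZ):
    # Duration of a block in milliseconds.
    return 1000.0 * samples / rate

half = LONG_BLOCK_SAMPLES // 2     # one of two parts (about 5.8 ms)
quarter = LONG_BLOCK_SAMPLES // 4  # one of four parts (about 2.9 ms)

# Sub-block layouts per mode; the Middle Mode orderings are hypothetical.
MODES = {
    "Long Mode": [LONG_BLOCK_SAMPLES],
    "Short Mode": [quarter] * 4,
    "Middle Mode-a": [half, quarter, quarter],
    "Middle Mode-b": [quarter, quarter, half],
}

for name, layout in MODES.items():
    print(name, [round(block_ms(s), 1) for s in layout])
```

Every layout covers the same total span as one Long Mode block, which is what allows the time resolution to be switched per block.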
[0033]
Obviously, when constraints such as circuit scale are loose, actual input signals can be handled even more appropriately by making the division of the orthogonal transform block size more elaborate. The block size described above is determined by the block determination circuits 109, 110, and 111. Information on the determined block size is supplied to the MDCT circuits 103, 104, and 105 and the bit allocation calculation circuit 118, and is also output from the output terminals 113, 115, and 117.
[0034]
Next, the bit allocation calculation circuit 118 will be described in detail with reference to FIG. 3. The spectrum data or MDCT coefficients on the frequency axis from the MDCT circuits 103, 104, and 105 and the block size information from the block determination circuits 109, 110, and 111 are supplied to the energy calculation circuit 302 via the input terminal 301. The energy calculation circuit 302 calculates the energy of each unit block, for example by calculating the sum of the amplitude values in the unit block. Instead of the energy calculation circuit 302, a configuration that calculates the peak value, average value, or the like of the amplitude values may be provided, and the bit allocation processing may be performed based on such calculated values.
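A minimal sketch of these per-unit-block statistics follows. The function name, the way unit blocks are delimited (`block_edges`), and the sample coefficients are hypothetical; the sum of amplitudes is used as the energy measure, as in the text, with the peak and mean as the alternatives it mentions.

```python
def unit_block_stats(coeffs, block_edges):
    # block_edges[i] .. block_edges[i + 1] delimit unit block i.
    stats = []
    for start, end in zip(block_edges, block_edges[1:]):
        magnitudes = [abs(c) for c in coeffs[start:end]]
        stats.append({
            "sum": sum(magnitudes),              # energy measure used in the text
            "peak": max(magnitudes),             # alternative: peak value
            "mean": sum(magnitudes) / len(magnitudes),  # alternative: average
        })
    return stats

coeffs = [1.0, -2.0, 0.5, -0.5, 4.0, 0.0]
stats = unit_block_stats(coeffs, [0, 2, 4, 6])
print([s["sum"] for s in stats])
```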
[0035]
An example of the output of the energy calculation circuit 302 is shown in FIG. 4. In FIG. 4, the spectrum SB of the sum for each band is indicated by a vertical line segment with a circle at its tip. The horizontal axis represents frequency and the vertical axis represents signal intensity. FIG. 4 shows a division into 12 unit blocks (B1 to B12), and, to avoid cluttering the illustration, only the spectrum of B12 is labeled "SB".
[0036]
The energy calculation circuit 302 also determines the scale factor value, which is the normalization information indicating the block floating state of each unit block. Specifically, for example, several positive values are prepared in advance as candidates for the scale factor value, and, among the candidates that are greater than the maximum absolute value of the spectrum data or MDCT coefficients in the unit block, the smallest one is adopted as the scale factor value of that unit block. The scale factor value candidates may be numbered, for example using several bits, in correspondence with their actual values, and the numbers may be stored in a ROM (Read Only Memory) or the like (not shown). The candidates are defined so that, in numerical order, they are spaced at intervals of, for example, 2 dB. The number assigned to the scale factor value adopted for a unit block is used as sub information, namely the scale factor information for that unit block.
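The selection rule can be sketched as follows. The table size (64 entries) and its lowest level (-120 dB) are assumptions made only for this example; the 2 dB spacing and the rule "smallest candidate greater than the maximum absolute value" come from the text. The function name `choose_scale_factor` is hypothetical.

```python
STEP_DB = 2.0          # spacing between candidates, as in the text
NUM_CANDIDATES = 64    # assumed table size
LOWEST_DB = -120.0     # assumed level of candidate number 0

# Candidate number k corresponds to a level of LOWEST_DB + STEP_DB * k.
SCALE_TABLE = [10 ** ((LOWEST_DB + STEP_DB * k) / 20) for k in range(NUM_CANDIDATES)]

def choose_scale_factor(coeffs):
    # Return the number of the smallest candidate above the peak magnitude.
    peak = max(abs(c) for c in coeffs)
    for number, value in enumerate(SCALE_TABLE):
        if value > peak:
            return number
    return NUM_CANDIDATES - 1  # peak exceeds the table: use the largest candidate

block = [0.1, -0.5, 0.25]
number = choose_scale_factor(block)
normalized = [c / SCALE_TABLE[number] for c in block]
print(number, normalized)
```

After normalization every coefficient in the unit block has magnitude below 1, and only the candidate number needs to be recorded as scale factor information.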
[0037]
The output of the energy calculation circuit 302, that is, each value of the spectrum SB, is sent to the convolution filter circuit 303. The convolution filter circuit 303 comprises, for example, a plurality of delay elements that sequentially delay the input data, a plurality of multipliers that multiply the outputs of these delay elements by filter coefficients (a weighting function), and a summing adder that takes the sum of the multiplier outputs. To take the influence of the spectrum SB on masking into account, the convolution filter circuit 303 performs convolution processing in which the spectrum SB is multiplied by a predetermined weighting function and the products are added. This convolution processing yields the sum of the portions indicated by dotted lines in FIG. 4.
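A minimal sketch of this weighted summation is given below. The weighting function is invented for illustration (the patent does not give its coefficients), and `convolve_spectrum` is a hypothetical name; the example only shows how the energy of one band spreads into its neighbours.

```python
def convolve_spectrum(sb, weights):
    # weights has odd length; weights[half] applies to the band itself,
    # entries after it to higher neighbours, entries before it to lower ones.
    half = len(weights) // 2
    out = []
    for i in range(len(sb)):
        total = 0.0
        for j, value in enumerate(sb):
            offset = i - j  # how far band i lies above band j
            if -half <= offset <= half:
                total += value * weights[offset + half]
        out.append(total)
    return out

# Hypothetical weighting: a band spreads more strongly to higher bands.
weights = [0.1, 1.0, 0.5]
spread = convolve_spectrum([0.0, 0.0, 1.0, 0.0, 0.0], weights)
print(spread)
```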
[0038]
Returning to FIG. 3, the output of the convolution filter circuit 303 is supplied to the arithmetic unit 304. The arithmetic unit 304 is also supplied with an allowable function (a function expressing the masking level) from the (n − ai) function generation circuit 305. According to the allowable function, the arithmetic unit 304 calculates a level α corresponding to the allowable noise level in the region convolved by the convolution filter circuit 303. Here, the level α corresponding to the allowable noise level is the level that becomes the allowable noise level for each critical band after the deconvolution processing described later. The calculated value of the level α is controlled by increasing or decreasing the allowable function.
[0039]
That is, the level α corresponding to the allowable noise level can be obtained by the following equation (1), where i is a number given sequentially from the lowest band of the critical band.
[0040]
α = S− (n−ai) (1)
[0041]
In equation (1), n and a are constants with a > 0, S is the intensity of the convolved spectrum, and (n − ai) is the allowable function. As an example, n = 38 and a = 1 can be used.
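Equation (1) can be evaluated directly, as in the sketch below. The example constants n = 38 and a = 1 are those given above; treating the values as decibel levels and numbering the lowest band as i = 1 are assumptions, and `allowable_level` is a hypothetical name.

```python
def allowable_level(s, n=38.0, a=1.0, first_index=1):
    # alpha_i = S_i - (n - a * i), evaluated for each band i in turn.
    # first_index (the number given to the lowest band) is an assumption.
    return [s_i - (n - a * i) for i, s_i in enumerate(s, start=first_index)]

# With a flat convolved spectrum, alpha rises by a per band.
alpha = allowable_level([60.0, 60.0, 60.0])
print(alpha)
```

Because a > 0, the offset (n − ai) shrinks as the band number rises, so for equal S the level α is higher in higher bands.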
[0042]
The level α calculated by the arithmetic unit 304 is transmitted to the divider 306. The divider 306 deconvolves the level α and thereby generates a masking spectrum from it. This masking spectrum becomes the allowable noise spectrum. Deconvolution generally requires complicated computation, but in one embodiment of the present invention it is performed using the simplified divider 306. The masking spectrum is supplied to the synthesis circuit 307. Data indicating the minimum audible curve RC, described later, is also supplied to the synthesis circuit 307 from the minimum audible curve generation circuit 312.
[0043]
The synthesis circuit 307 generates a masking spectrum by combining the masking spectrum output from the divider 306 with the data of the minimum audible curve RC. The generated masking spectrum is supplied to the subtracter 308. The output of the energy calculation circuit 302, that is, the spectrum SB for each band, is also supplied to the subtracter 308 after its timing has been adjusted by the delay circuit 309. The subtracter 308 performs subtraction based on the masking spectrum and the spectrum SB.
[0044]
As a result of this processing, the portion of the spectrum SB of each block that lies below the masking spectrum level is masked. FIG. 5 shows an example of masking. It can be seen that the portion of the spectrum SB below the level of the masking spectrum (denoted MS) is masked. To avoid cluttering the illustration, in FIG. 5 the labels "SB" for the spectrum and "MS" for the masking spectrum level are given only for B12.
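The combination and subtraction can be sketched as follows. The patent does not specify how the synthesis circuit combines the two curves; taking the larger of the masking spectrum and the minimum audible curve in each band is an assumption of this example, as are the function names and the decibel values.

```python
def allowable_noise(masking, min_audible):
    # Assumed combination rule: noise is tolerated up to the higher of the
    # masking level and the minimum audible level in each band.
    return [max(m, r) for m, r in zip(masking, min_audible)]

def level_above_allowable(sb, allowed):
    # Subtraction step: how far each band's spectrum exceeds the allowable
    # noise level; 0 means the band lies entirely below it (masked).
    return [max(0.0, s - t) for s, t in zip(sb, allowed)]

sb = [50.0, 20.0, 10.0]   # spectrum SB per unit block (hypothetical dB values)
ms = [30.0, 25.0, 5.0]    # masking spectrum MS
rc = [10.0, 10.0, 15.0]   # minimum audible curve RC

allowed = allowable_noise(ms, rc)
margin = level_above_allowable(sb, allowed)
print(allowed, margin)
```

In this example only the first band rises above the allowable noise level, so only it would need bits allocated.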
[0045]
If the absolute noise level is below the minimum audible curve RC, the noise cannot be heard by humans. Even for the same encoding, the minimum audible curve differs depending on, for example, the playback volume. In practical digital systems, however, there is little difference in how music data fits into, for example, a 16-bit dynamic range. Therefore, if quantization noise is inaudible in the frequency band to which the ear is most sensitive, around 4 kHz, quantization noise below the level of the minimum audible curve can be considered inaudible in the other frequency bands as well.
[0046]
Therefore, assuming for example that the system is used in such a way that noise at the level of the system's word length is inaudible around 4 kHz, the allowable noise level can be obtained by synthesizing the minimum audible curve RC and the masking spectrum MS. The allowable noise level is the portion indicated by hatching in FIG. 6. Here, the level of the minimum audible curve at 4 kHz is matched to the lowest level corresponding to, for example, 20 bits. In FIG. 6, SB is shown as a horizontal solid line in each block, and MS is shown as a horizontal dotted line in each block. To avoid cluttering the illustration, however, only the spectrum of B12 is labeled "SB" and "MS" in FIG. 6. The signal spectrum SS is indicated in FIG. 6 by a one-dot chain line.
[0047]
Returning to FIG. 3, the output of the subtracter 308 is supplied to the allowable noise correction circuit 310. The allowable noise correction circuit 310 corrects the allowable noise level in the output of the subtracter 308 based on, for example, equal loudness curve data. That is, the allowable noise correction circuit 310 calculates the allocated bits for each unit block based on the various parameters described above, such as masking and auditory characteristics. The output of the allowable noise correction circuit 310 is output via the output terminal 311 as the final output data of the bit allocation calculation circuit 118. Here, the equal loudness curve is a characteristic curve relating to human auditory characteristics; it is obtained, for example, by finding the sound pressure at each frequency that is heard as being as loud as a 1 kHz pure tone and connecting these values with a curve. It is also called a sensitivity curve.
[0048]
Further, this equal loudness curve follows the same curve as the minimum audible curve RC shown in the drawings. On the equal loudness curve, for example, a sound near 4 kHz is heard as being as loud as a 1 kHz sound even if its sound pressure is 8 to 10 dB lower, whereas at other frequencies the same sound pressure is not heard at the same loudness. Therefore, if noise exceeding the level of the minimum audible curve RC (the allowable noise level) is given frequency characteristics that follow the equal loudness curve, the noise can be prevented from being heard by humans.
[0049]
It can thus be seen that correcting the allowable noise level in consideration of the equal loudness curve suits human auditory characteristics. As described above, the bit allocation calculation circuit 118 yields, as main information, data obtained by processing the orthogonal transform output spectrum with the sub information, and, as sub information, the scale factor indicating the block floating state and the word length. Based on these pieces of information, the adaptive bit allocation encoding circuits 106, 107, and 108 in FIG. 1 perform requantization and generate high-efficiency encoded data in accordance with the encoding format.
[0050]
Returning to FIG. 1, the normalization information change circuit 119 will be described. As described above, manipulating the scale factor information determined by the energy calculation circuit 302 makes it possible to adjust the level in steps of, for example, 2 dB. The normalization information change circuit 119 generates values for changing the scale factor information and supplies them to the adders 120, 121, and 122. The adders 120, 121, and 122 add the values supplied from the normalization information change circuit 119 to the scale factor information in the encoded data supplied from the adaptive bit allocation encoding circuits 106, 107, and 108, respectively. When the value output from the normalization information change circuit 119 is negative, the adders 120, 121, and 122 act as subtracters. The result of the addition is limited so as to remain within the range of scale factor values defined in the format.
[0051]
Note that when the normalization information change circuit 119 outputs the same value for all unit blocks as the value to be added to the scale factor information, level adjustment is performed, whereas when it outputs a different value for each unit block, processing such as filtering can be realized. When performing filtering or the like, the normalization information change circuit 119 outputs pairs consisting of a value to be added to the scale factor information and the number of the unit block whose scale factor information the value is to be added to. The normalization information adjustment described above can also be realized in the decoding described later.
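Both uses of the normalization information change can be sketched in a few lines. The valid range of scale factor numbers (0 to 63) is an assumption for this example, and the function names are hypothetical; the 2 dB step per number and the clamping to the format's range follow the text.

```python
SF_MIN, SF_MAX = 0, 63  # assumed range of scale factor numbers in the format

def clamp(number):
    # Keep the result within the range defined by the format.
    return min(SF_MAX, max(SF_MIN, number))

def adjust_level(scale_factors, delta):
    # Same delta for every unit block: overall level change of 2 dB per step.
    return [clamp(sf + delta) for sf in scale_factors]

def adjust_per_block(scale_factors, deltas):
    # deltas maps unit block number -> value to add: filter-like shaping.
    return [clamp(sf + deltas.get(i, 0)) for i, sf in enumerate(scale_factors)]

original = [10, 0, 62]
quieter = adjust_level(original, -3)           # about 6 dB down, clamped at 0
louder = adjust_level(original, 3)             # about 6 dB up, clamped at 63
shaped = adjust_per_block(original, {2: -5})   # attenuate only unit block 2
print(quieter, louder, shaped)
```

Since only the scale factor numbers change, the adjustment does not require decoding and re-encoding the spectrum data.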
[0052]
Next, the encoding format of the highly efficient encoded data will be described with reference to FIG. The numerical values 0, 1, 2, ..., 211 shown on the left side are byte positions; in this example, one frame consists of 212 bytes. The block size information of each band, as determined by the block determination circuits 109, 110, and 111 in FIG., is recorded at the leading byte position (the 0th byte). Information on the number of unit blocks to be recorded is recorded at the next byte position (the 1st byte). In the high-frequency range, for example, the bit allocation calculation circuit 118 often sets the bit allocation to 0, so that recording is unnecessary. Setting the number of unit blocks to reflect this allows more bits to be distributed to the middle and low ranges, which have a greater influence on hearing. The number of unit blocks whose bit allocation information is double-written and the number of unit blocks whose scale factor information is double-written are also recorded at the 1st byte position.
[0053]
Double writing is a method of recording, for error correction, the same data as that recorded at a certain byte position at another location. Increasing the amount of double-written data increases robustness against errors, whereas reducing it leaves more capacity for spectrum data. In this example of the encoding format, the number of unit blocks to be double-written is set independently for the bit allocation information and for the scale factor information, so that robustness against errors and the number of bits used for recording spectrum data are balanced appropriately. For each piece of information, the correspondence between the code within the prescribed bits and the number of unit blocks is determined in advance as part of the format.
[0054]
An example of the 8-bit contents recorded at the 1st byte position is shown in FIG. Here, the first 3 bits carry the number of unit blocks actually recorded, the next 2 bits carry the number of unit blocks whose bit allocation information is double-written, and the last 3 bits carry the number of unit blocks whose scale factor information is double-written.
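The 3 + 2 + 3 bit split of this byte can be illustrated with a short sketch. It assumes that the "first" bits are the most significant ones, and it handles only the raw codes: the table that maps each code to an actual number of unit blocks is fixed by the format and is not given here. The function names are hypothetical.

```python
def pack_counts_byte(recorded_code, dup_alloc_code, dup_sf_code):
    """Pack three codes into one byte as 3 + 2 + 3 bits (MSB first, assumed)."""
    if not (0 <= recorded_code < 8 and 0 <= dup_alloc_code < 4 and 0 <= dup_sf_code < 8):
        raise ValueError("code does not fit in its bit field")
    return (recorded_code << 5) | (dup_alloc_code << 3) | dup_sf_code


def unpack_counts_byte(byte):
    """Split one byte back into the (3-bit, 2-bit, 3-bit) codes."""
    return (byte >> 5) & 0x7, (byte >> 3) & 0x3, byte & 0x7


packed = pack_counts_byte(5, 2, 3)
fields = unpack_counts_byte(packed)
```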
[0055]
In FIG. 8, the bit allocation information of the unit blocks is recorded starting at the second byte. For example, 4 bits per unit block are used for recording bit allocation information, and the bit allocation information is recorded in order from the 0th unit block for the number of unit blocks to be recorded. The scale factor information of each unit block follows the bit allocation information. For example, 6 bits per unit block are used for recording scale factor information, and the scale factor information is likewise recorded in order from the 0th unit block for the number of unit blocks to be recorded.
[0056]
After the scale factor information, the spectrum data of the unit blocks is recorded, in order from the 0th unit block, for the number of unit blocks actually recorded. Because the number of spectrum data items in each unit block is determined in advance by the format, the data can be matched to unit blocks using the bit allocation information described above. Note that nothing is recorded for a unit block whose bit allocation is 0.
[0057]
After the spectrum data, the double-written scale factor information and the double-written bit allocation information are recorded. The recording method is the same as for the scale factor information and bit allocation information described above, except that the number of unit blocks follows the double-write information shown in FIG. In the last byte (the 211th byte) and the byte before it (the 210th byte), the information of the 0th byte and the 1st byte is double-written. The double writing of these two bytes is fixed by the format; unlike the double writing of the scale factor information and the bit allocation information, its amount cannot be varied.
[0058]
Next, the decoding process for high-efficiency encoded data will be described. An example of the configuration of the decoding processing system is shown in FIG. The highly efficient encoded data is supplied to the arithmetic unit 710 via the input terminal 707. The block size information used in the encoding process, that is, data equivalent to the output signals of the output terminals 113, 115, and 117 in FIG. 1, is supplied to the input terminal 708. The normalization information changing circuit 709 generates the value to be added to or subtracted from the scale factor information of each unit block.
[0059]
The arithmetic unit 710 is also supplied with numerical data from the normalization information changing circuit 709, and adds this numerical data to the scale factor information in the supplied high-efficiency encoded data. When the numerical data supplied from the normalization information changing circuit 709 is negative, the arithmetic unit 710 acts as a subtractor. The output of the arithmetic unit 710 is supplied to the adaptive bit allocation decoding circuit 706 and to the output terminal 711.
[0060]
The adaptive bit allocation decoding circuit 706 refers to the adaptive bit allocation information and undoes the bit allocation for each of the high, middle, and low frequency bands. Its outputs for the high, middle, and low bands are supplied to the inverse orthogonal transform circuits 703, 704, and 705, which apply inverse orthogonal transform processing to the supplied data and thereby convert the signals on the frequency axis into signals on the time axis. The partial-band signals on the time axis output by the inverse orthogonal transform circuits 703, 704, and 705 are synthesized by the band synthesis filters 701 and 702 and decoded into a full-band signal. As the band synthesis filters 701 and 702, for example, an IQMF (Inverse Quadrature Mirror Filter) can be used.
[0061]
Manipulating the scale factor information by addition or subtraction in the arithmetic unit 710 makes it possible to adjust the level of the reproduced data, for example in 2 dB steps. For example, when the normalization information changing circuit 709 outputs the same numerical value for every unit block and that value is uniformly added to or subtracted from the scale factor information of all unit blocks, the level of all unit blocks is adjusted in units of 2 dB.
[0062]
Alternatively, when the normalization information changing circuit 709 outputs an independent numerical value for each unit block and each value is added to or subtracted from the scale factor information of the corresponding unit block, the level is adjusted per unit block, and as a result a filter function can be realized. More specifically, the normalization information changing circuit 709 associates each unit block with the value to be added to or subtracted from its scale factor information, for example by outputting pairs of a unit block number and the value for that unit block. Note that the scale factor information resulting from the addition or subtraction in the arithmetic unit 710 is limited so that the scale factor value stays within the range defined by the format of the highly efficient encoded data.
[0063]
When the scale factor value whose level has been adjusted per unit block by the arithmetic unit 710 is used only in the decoding process of the adaptive bit allocation decoding circuit 706, only the level of the decoded signal is adjusted. It is also possible, for example, to read the scale factor value from the recording medium on which the encoded information is recorded, output the adjusted scale factor value to the output terminal 711, and replace the scale factor value recorded on the recording medium with the adjusted value. The information on the recording medium can thus be changed as necessary, so that the level information on the recording medium can be changed with a very simple system.
[0064]
In the above description, the scale factor information changing process is performed in both the encoding circuit and the decoding circuit. On the other hand, even when the scale factor information changing process is performed only in the decoding circuit, functions such as level adjustment and filtering can be sufficiently obtained as a result of the changing process.
[0065]
Next, the time unit in which the above-described high-efficiency encoding is performed will be described. Audio PCM samples are supplied to the input terminal 100 in FIG. 1. In the MDCT processing performed by the MDCT circuits 103, 104, and 105 after input, the number of samples subjected to one so-called orthogonal transform is fixed; this number serves as the unit of processing, and the processing is repeated unit by unit.
[0066]
Here, 1024 PCM samples input from the input terminal 100 are output from the MDCT circuits 103, 104, and 105 as 512 MDCT coefficients, that is, spectrum data. Specifically, the band division filter 101 converts the 1024 PCM samples input from the input terminal 100 into 512 high-frequency samples and 512 low-frequency samples, and the band division filter 102 further converts the 512 low-frequency samples into 256 low-frequency samples and 256 mid-frequency samples. The MDCT circuit 105 then converts the 256 low-frequency samples from the band division filter 102 into 128 low-frequency spectrum data, the MDCT circuit 104 converts the 256 mid-frequency samples from the band division filter 102 into 128 mid-frequency spectrum data, and the MDCT circuit 103 converts the 512 high-frequency samples from the band division filter 101 into 256 high-frequency spectrum data. In this way, a total of 512 spectrum data are created from 1024 PCM samples. These 1024 PCM samples are the time unit for performing the above-described high-efficiency encoding once, and correspond to the 212-byte high-efficiency encoded data shown in FIG. 7, that is, one frame.
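As a check on the sample counts in this paragraph, the arithmetic can be written out in a few lines. The sketch simply assumes that each band division filter halves the number of samples per band and that each MDCT circuit outputs half as many spectrum data as it receives samples, as the figures in the text imply; the function name is hypothetical.

```python
def spectrum_counts(pcm_samples=1024):
    """Return the number of spectrum data per band for one frame."""
    high = pcm_samples // 2        # band division filter 101: high band
    low_side = pcm_samples // 2    # band division filter 101: low side
    mid = low_side // 2            # band division filter 102: mid band
    low = low_side // 2            # band division filter 102: low band
    # MDCT circuits 103 (high), 104 (mid), 105 (low) halve the counts.
    return {"high": high // 2, "mid": mid // 2, "low": low // 2}


counts = spectrum_counts()
total = sum(counts.values())
```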
[0067]
As described above, one frame is composed of, for example, 1024 PCM samples, but in the MDCT processing by the MDCT circuits 103, 104, and 105 in FIG., an overlap with the adjacent frames arises. The relationship between the PCM samples and the frames will be described with reference to FIG. As shown in FIG. 10, when, for example, the 1024 PCM samples from the nth to the (n + 1023)th are processed in the Nth frame, the 1024 PCM samples from the (n + 512)th to the (n + 1535)th are processed in the (N + 1)th frame, and the 1024 PCM samples from the (n + 1024)th to the (n + 2047)th are processed in the (N + 2)th frame. Thus each frame overlaps each adjacent frame by 512 PCM samples. In other words, although one frame of high-efficiency encoded information is obtained by processing 1024 PCM samples, one frame effectively corresponds to 512 PCM samples once the overlap with adjacent frames is taken into account.
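The index relationship between frames and PCM samples can be expressed compactly. In this sketch the names `FRAME`, `HOP`, and `frame_range` are illustrative only; the values 1024 and 512 are those given in the text.

```python
FRAME = 1024  # PCM samples processed per frame
HOP = 512     # advance between consecutive frames


def frame_range(n, k):
    """Return (first, last) PCM sample index of frame N + k,
    given that frame N starts at sample n."""
    first = n + k * HOP
    return first, first + FRAME - 1


ranges = [frame_range(0, k) for k in range(3)]
overlap = FRAME - HOP  # samples shared by two adjacent frames
```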
[0068]
FIG. 10 shows the correspondence with frames in the middle of the PCM sample sequence. At the start of the sequence, for example, 512 zero-valued PCM samples are assumed to precede the start point, and these zero-valued samples are treated as the overlap with a virtual frame before the first frame. Likewise, at the last frame, 512 zero-valued PCM samples are assumed to follow the end of the sample sequence, and these are treated as the overlap with a virtual frame after the last frame.
[0069]
Next, a method for performing the above-described encoding or decoding as software on a so-called personal computer will be described. Typical processing on a personal computer is to create a high-efficiency encoded data file on the hard disk by applying high-efficiency encoding to a PCM data file on the hard disk, or to create a PCM data file on the hard disk by decoding a high-efficiency encoded data file on the hard disk. In this case, one piece of music usually corresponds to one file.
[0070]
As a specific example, the screen display, operation method, and processing steps of software using a GUI (Graphical User Interface) on a so-called personal computer will be described with reference to FIG. FIG. 11 shows an example of the screen display of the encoding and decoding software on the personal computer. In this software, the directories for PCM data and for high-efficiency encoded data are selected first. Reference numeral 801 denotes a display unit for the directory path of PCM data files; in this example, a directory named PCMDATA on the C drive is currently selected. Reference numeral 803 denotes a display operation unit that displays the file structure in the directory indicated by the display unit 801 and allows directory movement, drive movement, file selection, and the like. In this example, a directory named tmp exists under the directory currently indicated by the display unit 801.
[0071]
In addition, it is assumed that the display “..” indicates a directory one level above. Further, the six files below tmp indicate PCM data files. Also, [-c-] and [-d-] below it indicate a movable drive. Whether the displayed item is a directory, a drive, or PCM data can be determined by a displayed character string or a so-called icon added to the side of the character string.
[0072]
In the directory and drive display, placing the mouse pointer on a character string and double-clicking moves the current directory to the double-clicked item. In this example, when tmp is double-clicked, the display unit 801 shows C:¥PCMDATA¥tmp, and the display operation unit 803 shows the files under tmp and the movable drives. By repeatedly double-clicking drive names or directory names in this way, the user can move to the desired directory for PCM data files.
[0073]
Reference numeral 802 denotes a display unit that displays the directory for high-efficiency encoded data. In the illustrated example, a directory named ENCODEDATA on the C drive is selected. Reference numeral 804 denotes a display operation unit that displays the file structure in the directory indicated by the display unit 802 and allows directory movement, drive movement, file selection, and the like. In this example, neither a file nor a directory exists under the current directory shown on the display unit 802. The operations in the display operation unit 804 and its correspondence with the display unit 802 are the same as for the display operation unit 803 and the display unit 801, so the directory for high-efficiency encoded data can be selected with the display operation unit 804.
[0074]
Reference numeral 805 denotes a button for executing high-efficiency encoding. Clicking it applies high-efficiency encoding in sequence to the PCM data files selected in the display operation unit 803 and creates the high-efficiency encoded files under the directory indicated by the display unit 802. The actual processing flow will be described with reference to FIG.
[0075]
In the state shown in FIG. 12A, three PCM files, data2.pcm, dataA.pcm, and dataB.pcm, are selected and highlighted in the display operation unit 803 in FIG. When the button 805 in FIG. 11 is clicked in this state, these three files are encoded with high efficiency in sequence. In normal high-efficiency encoding, the order in which the files are processed does not matter.
[0076]
FIG. 12B shows the display screen during execution of the high-efficiency encoding process; the progress of the encoding can be seen as a bar graph. Although not shown here, a means for canceling the process, such as a button, may be provided. FIG. 12C shows the state in which the high-efficiency encoding process has been completed for all the selected files. The display operation unit 804 in FIG. 11 now shows three high-efficiency encoded data files, data2enc.dat, dataAenc.dat, and dataBenc.dat. The file name of the high-efficiency encoded data file is arbitrary, but here a name is generated automatically by removing the so-called extension pcm from the name of the processed PCM file and appending enc.dat.
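The automatic naming rule used in this example can be sketched as follows. The function name `encoded_name` and the decision to reject names without the pcm extension are assumptions made for the illustration; the text itself notes that the output name is arbitrary.

```python
def encoded_name(pcm_name):
    """Derive the encoded file name: drop the extension pcm, append enc.dat."""
    stem, dot, ext = pcm_name.rpartition(".")
    if not dot or ext.lower() != "pcm":
        raise ValueError("expected a file name ending in .pcm: " + pcm_name)
    return stem + "enc.dat"


names = [encoded_name(n) for n in ("data2.pcm", "dataA.pcm", "dataB.pcm")]
```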
[0077]
Next, the button 807 will be described. The button 807 is used to treat the high-efficiency encoding of a plurality of files as one continuous data string. As described with reference to FIG. 12B, when files are processed one after another, the processing of FIG. 1 and the data relationship shown in FIG. 10 are applied to each file separately. Consequently, as described above, every file is processed on the assumption of 512 zero-valued PCM samples before its start point and zero-valued PCM samples after its end point. This normally poses no problem when each file is an independent piece of music, but when files are separate pieces of music yet continuous as PCM data, the high-efficiency encoding causes the continuity to be lost.
[0078]
An example of this is shown in FIG. The case where dataA.pcm and dataB.pcm are continuous PCM data will be described with reference to FIGS. 13A, 13B, and 13C. FIG. 13A shows that the PCM data at the end point of dataA.pcm and the PCM data at the start point of dataB.pcm are continuous.
[0079]
Further, assume that the final part of the frame division for the high-efficiency encoding described with reference to FIG. 10 and elsewhere is in the state shown by N and N + 1 in FIG. 13A. FIG. 13B shows how the final part of dataA.pcm is processed in this case. The (N + 1)th frame is the final frame, but because the data after the dividing point in FIG. 13A belongs to another file, that data is not used, and the remainder of the frame is filled with zero data for processing.
[0080]
For the data at the start point of dataB.pcm, in contrast, the processing shown in FIG. 13C is performed. Because the data before the dividing point in FIG. 13A belongs to another file, that data is not used, and the 1024 PCM data of the first frame consist of 512 zero data followed by the first 512 data of dataB.pcm.
[0081]
At this time, the frame allocation for processing dataA.pcm differs from the frame allocation for processing dataB.pcm. In addition, because zero data is inserted to fill the remainder, continuity is lost. That is, when dataA.pcm and dataB.pcm are reproduced in succession the sound is continuous, but when dataAenc.dat and dataBenc.dat are decoded and reproduced in succession the sound is interrupted.
[0082]
On the other hand, an example in which the files are processed as a continuous data string by clicking the button 807 in FIG. 11 will be described with reference to FIGS. 14A, 14B, and 14C. As shown in FIG. 14A, the file dividing point, the frame allocation for processing dataA.pcm, and so on are the same as in FIG. 13A. FIG. 14B shows the final frame of dataA.pcm; unlike FIG. 13B, the portion beyond the dividing point is not filled with zero data, and the data of dataB.pcm is used instead.
[0083]
FIG. 14C shows the frame allocation at the head of dataB.pcm. Instead of aligning the frame with the start point of the file and filling in zero data as in FIG. 13C, the frame division continues that of dataA.pcm, and the data of dataA.pcm is used for the portion before the start point of dataB.pcm. That is, frame N + 2 in the frame division of FIG. is the first frame of dataB.pcm. With this processing, continuity between the two files is maintained even in the high-efficiency encoded data, and no sound interruption occurs when dataAenc.dat and dataBenc.dat are decoded and reproduced in succession.
[0084]
The processing for encoding as shown in FIGS. 13A, 13B, and 13C is shown in the flowchart of FIG. 15, and the processing for encoding as shown in FIGS. 14A, 14B, and 14C is shown in the flowchart of FIG.
[0085]
In the first step S1 of FIG. 15, a reading buffer for 1024 points is prepared. Next, the number i of the file to be processed is set to 0 (step S2). In step S3, it is determined whether there is an i-th file to be processed. If there is no file, the process ends (step S4).
[0086]
If the i-th file exists, the first-half 512 points of the read buffer are filled with zero data in step S5. Next, the i-th read file (PCM file) is opened (step S6), and the i-th write file (encoded file) is opened (step S7). Data is then read from the read file into the second-half 512 points of the buffer (step S8).
[0087]
In step S9, the amount of data read is acquired and the read position is updated. In step S10, it is determined whether the amount of data read is less than 512 points. If it is, in step S11 the remainder of the 512-point half of the read buffer, that is, the difference between 512 points and the amount of data read, is filled with zero data.
[0088]
When the read data amount is 512 points in step S10, or following step S11 (zero data filling), encoding processing for one frame is performed in step S12. In step S13, the encoded data is written into the write file.
[0089]
If the result of the determination in step S10 is affirmative (the read data amount is less than 512 points), the i-th read file is closed in step S14, the i-th write file is closed in step S15, and i is incremented in step S16. The process then returns to step S3 (determination of whether the i-th file exists).
[0090]
If the result of the determination in step S10 is negative (if the read data amount is 512 points), the process of step S17 is performed following step S13. In step S17, the data for the latter half 512 points of the read buffer is shifted to the first half 512 points. Then, the process returns to step S9 (acquisition of read data amount and update of read position).
[0091]
In this way, the processing shown in FIG. 13, which applies when each file is an independent piece of music, is performed. Next, the processing of FIG. 14, which applies when the files are separate pieces of music but continuous as PCM data, will be described with reference to the flowchart of FIG.
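The per-file procedure of steps S1 to S17 can be sketched as below. This is a simplified illustration under stated assumptions: a file is modeled as an in-memory list of samples, the encoding and writing of steps S12 and S13 are replaced by collecting the 1024-point frames, and the name `frames_independent` is hypothetical.

```python
HALF = 512  # half of the 1024-point read buffer


def frames_independent(pcm):
    """Return the 1024-point frames for one file treated on its own."""
    frames = []
    buf = [0] * (2 * HALF)              # S1, S5: first half is zero data
    pos = 0
    while True:
        chunk = pcm[pos:pos + HALF]     # S8: read into the second half
        pos += len(chunk)               # S9: update the read position
        short = len(chunk) < HALF       # S10
        buf[HALF:] = chunk + [0] * (HALF - len(chunk))  # S11: zero fill
        frames.append(list(buf))        # stands in for S12, S13
        if short:
            return frames               # S14 to S16: file finished
        buf[:HALF] = buf[HALF:]         # S17: shift second half to first


pcm = list(range(1, 1025))              # 1024 nonzero samples
frames = frames_independent(pcm)
```

For a file whose length is an exact multiple of 512 points, the final read returns no data, so the last frame carries the last 512 samples followed by 512 zero data, which matches the virtual zero samples assumed after the end of the sequence.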
[0092]
In the first step S21, a reading buffer for 1024 points is prepared. In step S22, i is initialized to 0. In step S23, it is determined whether or not it is the first file (i == 0). In the case of the first file, in step S24, zero data is packed as data for the first half 512 points of the read buffer. Then, the i-th read (PCM) file is opened (step S25), and the i-th write (encoded) file is opened (step S26). In step S27, data is read from the read file to 512 points in the latter half of the buffer.
[0093]
In step S28, the read data amount is acquired, and the read position is updated. In step S29, it is determined whether the amount of read data is less than 512 points. If the read data amount is less than 512 points, it is determined in step S30 whether there is an i + 1 th file to be processed.
[0094]
If it is determined in step S30 that there is no (i + 1)th file to be processed, in step S31 the remainder of the 512-point half of the read buffer, that is, the difference between 512 points and the amount of data read, is filled with zero data.
[0095]
If it is determined in step S30 that there is an (i + 1)th file to be processed, the (i + 1)th read file is opened in step S32. In step S33, data amounting to the difference between the 512 points of the read buffer and the amount of data read is read from the (i + 1)th file, and the read position is updated.
[0096]
If the read data amount is 512 points in step S29, or following step S31 (zero data filling) or step S33 (reading data from the (i + 1)th file and updating the read position), encoding processing for one frame is performed in step S34. In step S35, the encoded data is written into the write file.
[0097]
If the determination result in step S29 is negative (if the read data amount is 512 points), the process of step S36 is performed subsequent to step S35. In step S36, the data for the second half 512 points of the read buffer is shifted to the first half 512 points. Then, the process returns to step S27 (reading data from the read file to the second half 512 points of the buffer).
[0098]
If the determination result in step S29 is affirmative (if the read data amount is less than 512 points), the i-th read file is closed in step S37, and the i-th write file is closed in step S38. Then, when the result of the determination in step S30 is negative (ie, there is no i + 1-th file), the process ends (step S40). On the other hand, if the result of the determination in step S30 is affirmative (that is, there is an i + 1-th file), i is incremented in step S39, and the process in step S36 is performed. Then, the process returns to step S23 (determination of whether or not it is the first file).
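The continuous procedure of steps S21 to S40 can be sketched in the same style. As before, files are modeled as in-memory sample lists, encoding and writing are replaced by collecting frames per output file, and the name `frames_continuous` is hypothetical. A following file that is shorter than the shortfall is simply zero padded here, a case the text does not discuss.

```python
HALF = 512  # half of the 1024-point read buffer


def frames_continuous(files):
    """Frame several PCM files as one continuous data string.

    files is a list of sample lists; one list of frames is returned per file.
    """
    out = [[] for _ in files]
    buf = [0] * (2 * HALF)   # S24: zero data only before the first file
    pos = 0                  # read position in the current file
    for i, pcm in enumerate(files):
        while True:
            chunk = pcm[pos:pos + HALF]          # S27
            pos += len(chunk)                    # S28
            short = len(chunk) < HALF            # S29
            next_pos = 0
            if short and i + 1 < len(files):     # S30
                # S32, S33: take the shortfall from the head of the next file.
                extra = files[i + 1][:HALF - len(chunk)]
                next_pos = len(extra)
                chunk = chunk + extra
            # S31: zero data only where no further data is available.
            chunk = chunk + [0] * (HALF - len(chunk))
            buf[HALF:] = chunk
            out[i].append(list(buf))             # stands in for S34, S35
            buf[:HALF] = buf[HALF:]              # S36: shift
            if short:
                pos = next_pos                   # resume after borrowed samples
                break
    return out


file_a = list(range(1, 701))        # 700 samples
file_b = list(range(1001, 1901))    # 900 samples
result = frames_continuous([file_a, file_b])
```

In this sketch the frame that straddles the file boundary is attributed to the earlier file, and the first frame of the later file begins with the shifted second half of that frame, so no zero data is inserted at the boundary.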
[0099]
Next, the button 809 in FIG. 11 will be described. A button 809 is used to analyze file data and determine whether or not to perform the above-described encoding processing in consideration of continuity. Regarding the consideration of continuity, the judgment as to whether or not to consider is usually made by actually listening to the last part and the beginning part of the music file. However, when there are a large number of music files, it is time consuming and inefficient to audition all of them.
[0100]
In general, when a piece of music is complete within one file and has no continuity with other files, the data near the beginning and the end of the file is silent and tends to be zero or close to zero. In a file that has continuity with other files, on the other hand, the data near the beginning or the end of the file is likely to be nonzero and to have a certain magnitude, that is, a certain volume level. When the button 809 is pressed, the values of the data near the beginning and the end of each PCM file selected for encoding are read and analyzed on the basis of this characteristic, and it is determined whether the file is likely to have continuity with another file.
[0101]
Various analysis methods are conceivable. As the simplest method, for example, the first data and the last data of a file are read one point at a time, and it is determined whether or not they are zero. If the leading data is a value other than zero, the file may have continuity with the last part of another file. If the final data is a value other than zero, the file may have continuity with the head of another file. The user is warned that there may be such continuity. The user can make a reliable judgment by checking the continuity by listening to the file according to the warning.
[0102]
Alternatively, when a file whose final data is nonzero is selected for encoding, continuous processing with the file selected to be encoded after it may be performed automatically, without issuing the above-described warning. Similarly, when a file whose head data is nonzero is selected for encoding, continuous processing with the file selected to be encoded before it may be performed automatically, without issuing the warning.
[0103]
The above description covers an example in which one point each of head data and final data is examined and the threshold is zero. In practice, however, there are many cases in which, because of analog noise, dithering, or the like, the head data or final data has a value of a certain magnitude that is not audible as sound, even though the file has no continuity with other files.
[0104]
To cope with this problem, the user may be allowed to set the threshold and related parameters freely. The threshold can be specified not only as a level but also in terms of the number of points read. For example, 10 points rather than 1 point of head data and final data may be read, and the sum, average value, or the like of the 10 points compared against the threshold. A method that examines the difference between the final value of one file and the start value of another file is also possible. Allowing various threshold settings in this way enables more appropriate determination.
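The difference-based method mentioned at the end of this paragraph is not specified further, so the following is only one possible reading: compute the absolute difference between the last sample of one file and the first sample of the next, and compare it with a user-set limit. The function names and the parameter `max_gap` are assumptions.

```python
def boundary_gap(prev_pcm, next_pcm):
    """Absolute difference between the final value of one file
    and the start value of another."""
    return abs(prev_pcm[-1] - next_pcm[0])


def gap_within_limit(prev_pcm, next_pcm, max_gap):
    """True when the two boundary values are close to each other."""
    return boundary_gap(prev_pcm, next_pcm) <= max_gap


gap = boundary_gap([0, 120, 300], [310, 280, 0])
close = gap_within_limit([0, 120, 300], [310, 280, 0], max_gap=20)
```

Two files that both end and begin in silence also produce a small gap, so a check of this kind would presumably be used together with the level check on the head and final data rather than on its own.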
[0105]
An example of the processing and screen display when the above-described button 809 is clicked will be described with reference to FIGS. 17 and 18. First, when the button 809 is clicked, the search conditions are set in step S51; this sets the threshold. For example, through a setting screen such as that shown in FIG. 18A, the number of points to read and the level are entered for the head data and for the final data, together with the calculation method used for the comparison with the threshold. In the example of FIG. 18A, both the head data and the final data are judged by whether the average value of 10 points exceeds 5. In addition to the average value, a sum, a maximum value, or the like can be selected as the calculation method.
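The search condition of step S51 (number of points, level, and calculation method) can be sketched as a small function. The use of absolute sample values is an assumption made here because PCM samples are signed; the text does not say how the sign is handled. The names `boundary_level` and `exceeds_threshold` are hypothetical.

```python
def boundary_level(samples, n_points, method="average", at_end=False):
    """Reduce the first (or last) n_points samples to a single level."""
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    points = samples[-n_points:] if at_end else samples[:n_points]
    mags = [abs(s) for s in points]     # assumption: judge by magnitude
    if not mags:
        return 0
    if method == "average":
        return sum(mags) / len(mags)
    if method == "sum":
        return sum(mags)
    if method == "max":
        return max(mags)
    raise ValueError("unknown method: " + method)


def exceeds_threshold(samples, n_points=10, level=5, method="average", at_end=False):
    """True when the computed level is strictly above the threshold."""
    return boundary_level(samples, n_points, method, at_end) > level


pcm = [0] * 10 + [100] * 10
head_warning = exceeds_threshold(pcm)               # first 10 points
tail_warning = exceeds_threshold(pcm, at_end=True)  # last 10 points
```

The defaults mirror the example of FIG. 18A, in which the average of 10 points is compared with 5.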
[0106]
Next, in step S52, initialization of variables for sequentially processing the encoding process selection file is performed. Here, this variable is set to i and zero is set. In step S53, whether or not there is a file to be searched is determined by comparing the number of encoding process selection files with the variable i. If the variable i exceeds the number of process selection files in step S53, the process ends (step S54).
[0107]
If it is determined in step S53 that there is a file to be processed, an open process for reading the file is performed in step S55. Thereafter, the head data is read in step S56. The number of data read here is the number of points set in step S51. In step S57, the threshold value set in step S51 is compared with the read data.
[0108]
If the read data exceeds the threshold in step S57, the warning process of step S58 is performed. In this warning process, for example, a warning message such as that shown in FIG. 18B is displayed, and the process proceeds to the next step after the user clicks the displayed OK button. Steps S59, S60, and S61 perform, for the final data, the processing corresponding to steps S56, S57, and S58 for the head data. The warning message in FIG. 18C likewise corresponds, for the final data, to the warning about the head data in FIG. 18B.
[0109]
After the processing of step S60 or S61 is completed, the i-th file is closed in step S62. In step S63, the variable i is incremented, and the process returns to step S53. Afterwards, the user can easily determine whether continuity should be considered in the encoding process by listening only to the files for which a warning was issued. In this example, the PCM files are searched before the encoding process. For files to be decoded, the same processing can be performed by applying a threshold to the normalization information, the quantization state, or the like of the encoded file.
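The loop of steps S52 to S63 can be sketched as follows. File access is replaced by in-memory (name, samples) pairs, the warning dialog is replaced by collecting (name, position) entries, and the threshold test is passed in as a function; all of these, along with the names `scan_files` and `average_above_5`, are assumptions for illustration.

```python
def scan_files(files, n_points, exceeds):
    """Check the head and final data of each file and collect warnings.

    files   : list of (name, samples) pairs selected for encoding
    exceeds : function taking a list of points, returning True to warn
    """
    warnings = []
    i = 0                                       # S52: initialize i
    while i < len(files):                       # S53: is there an i-th file?
        name, samples = files[i]                # S55: stands in for opening
        if exceeds(samples[:n_points]):         # S56, S57: head data
            warnings.append((name, "head"))     # S58: warning
        if exceeds(samples[-n_points:]):        # S59, S60: final data
            warnings.append((name, "tail"))     # S61: warning
        i += 1                                  # S62, S63: close, increment
    return warnings


def average_above_5(points):
    """Example condition: average magnitude of the points exceeds 5."""
    return sum(abs(p) for p in points) / max(len(points), 1) > 5


selected = [
    ("a.pcm", [0] * 20),
    ("b.pcm", [0] * 10 + [100] * 10),
    ("c.pcm", [50] * 20),
]
found = scan_files(selected, 10, average_above_5)
```

Only the files that appear in the result would then need to be checked by listening.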
[0110]
Next, the actual encoding process is performed on the basis of the search result. A button 807 in FIG. 11 selects whether processing is performed in the form shown in FIG. 13 or processing is performed in the form shown in FIG. The same applies when the number of continuous files is two or more. The selection of files to be continuously processed is determined by the method shown in FIG.
[0111]
FIG. 19 shows an example of how the files to be processed continuously are actually set. FIG. 19 is the operation display screen that appears when the button 807 in FIG. 11 is clicked, and the files to be processed continuously are selected on this screen. A display unit 901 displays the files to be processed continuously. Here, an example in which data2.pcm and data3.pcm are processed continuously is shown. A file name can be input directly into the display unit 901, or a file can be selected with a button 905 by browsing the file structure graphically. The order in which the files are selected is reflected in the continuous processing, but the order can be changed in the display unit 901.
[0112]
Further, by using 902, continuous processing of a plurality of files can be supported. Like the display unit 901, a display unit 903 sets files to be processed continuously in another combination. In this example, a setting for continuously processing dataA.pcm, dataB.pcm, and dataC.pcm is shown. Here, the continuous processing of data2.pcm and data3.pcm is the first set and that of dataA.pcm, dataB.pcm, and dataC.pcm is the second set, but the set numbers have no direct bearing on the processing result. Reference numeral 904 corresponds to 902 of the first set. Although two sets are displayed here, further sets can be added by using 906. Finally, clicking the OK button 907 completes the setting.
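The sets configured on the screen of FIG. 19 can be pictured as ordered lists of file names. The sketch below is illustrative only; the function name `adjacent_pairs` and the list-of-lists representation are assumptions. It derives the file boundaries at which continuity would be considered.

```python
def adjacent_pairs(file_sets):
    """Return (front, rear) pairs of adjacent files within each set.

    The order of the files inside a set is the processing order; the order
    of the sets themselves does not affect the result for any single set.
    """
    pairs = []
    for files in file_sets:
        pairs.extend(zip(files, files[1:]))
    return pairs


file_sets = [
    ["data2.pcm", "data3.pcm"],               # first set (display unit 901)
    ["dataA.pcm", "dataB.pcm", "dataC.pcm"],  # second set (display unit 903)
]

for front, rear in adjacent_pairs(file_sets):
    print(f"{front} -> {rear}")
# data2.pcm -> data3.pcm
# dataA.pcm -> dataB.pcm
# dataB.pcm -> dataC.pcm
```

A set of three files therefore yields two boundaries, which matches the statement that the same handling applies when more than two files are continuous.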
[0113]
FIG. 11 will be described again. A button 806 is pressed to decode the high-efficiency encoded data file selected in the display operation unit 804. The processing method, the correspondence of the displayed contents, and the like are the same as for the button 805 used for high-efficiency encoding. At the time of decoding as well, continuous decoding can be set with the button 807, as in the continuous high-efficiency encoding described above. In continuous decoding, the final frame of one high-efficiency encoded data file and the first frame of another high-efficiency encoded data file may be set to be decoded continuously. Reference numeral 808 denotes a button for ending the program.
[0114]
With the method described above, when a plurality of files are encoded or decoded, it is possible to select whether each file is processed independently or processed in consideration of continuity between different files, so that processed files can be created in the desired form. In addition, whether continuity should be considered can be determined easily, so that processing suited to the files can be carried out more quickly.
[0115]
[Effect of the invention]
With the digital signal processing method according to the present invention described above, when a desired plurality of files are encoded, it is possible to select between encoding that takes into account the continuity of the start and end points between different files and encoding that does not. In addition, according to the present invention, by analyzing the data at the start and end points of the desired plurality of files and selecting, based on the analysis result, between encoding that considers continuity and encoding that does not, whether processing should take continuity into account can be determined more easily and more accurately, and work efficiency can be greatly improved.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an example of a configuration relating to generation of highly efficient encoded data.
FIG. 2 is a schematic diagram for explaining an orthogonal transform block size for each band.
FIG. 3 is a block diagram showing in detail a part of the configuration in FIG. 1.
FIG. 4 is a schematic diagram illustrating an example of a spectrum of a band divided in consideration of a critical band, block floating, and the like.
FIG. 5 is a schematic diagram illustrating an example of a masking spectrum.
FIG. 6 is a schematic diagram for explaining synthesis of a minimum audible curve and a masking spectrum.
FIG. 7 is a schematic diagram illustrating an example of an encoded data format according to an embodiment of the present invention.
FIG. 8 is a schematic diagram showing details of the data of the first byte in FIG. 7.
FIG. 9 is a block diagram illustrating an example of a configuration related to a digital signal decoding process.
FIG. 10 is a schematic diagram for explaining overlap in each frame in encoded data.
FIG. 11 is a schematic diagram illustrating a specific example of an operation display screen of a system that performs high-efficiency encoding processing and decoding processing on a personal computer.
FIG. 12 is a schematic diagram showing processing for performing high-efficiency encoding on a plurality of files by the system of FIG.
FIG. 13 is a schematic diagram showing frame correspondence when performing high-efficiency encoding without considering the continuity of two files.
FIG. 14 is a schematic diagram showing frame correspondence when performing high-efficiency encoding in consideration of the continuity of two files.
FIG. 15 is a flowchart showing processing steps when high-efficiency encoding is performed without considering the continuity of two files.
FIG. 16 is a flowchart showing processing steps when high-efficiency encoding is performed in consideration of the continuity of two files.
FIG. 17 is a flowchart showing processing steps for analyzing the top data and the final data.
FIG. 18 is a schematic diagram illustrating a search threshold value input screen and a warning message screen in a processing step of analyzing the top data and the final data.
FIG. 19 is a schematic diagram illustrating a specific example of an operation display screen for selecting a combination of files to be processed in consideration of continuity.
[Explanation of symbols]
101, 102 ... Band division filter, 103, 104, 105 ... Orthogonal transform circuit (MDCT), 109, 110, 111 ... Block decision circuit, 118 ... Bit allocation calculation circuit, 106, 107, 108 ... Adaptive bit allocation encoding circuit, 119 ... Normalization information changing circuit, 120, 121, 122 ... Adder, 302 ... Energy calculator for each band, 303 ... Convolution filter, 304 ... Adder, 305 ... Function generator, 306 ... Divider, 307 ... Synthesizer, 308 ... Subtractor, 309 ... Delay circuit, 310 ... Allowable noise correction, 701, 702 ... Band synthesis filter (IQMF), 703, 704, 705 ... Inverse orthogonal transform circuit (IMDCT), 706 ... Adaptive bit allocation decoding circuit, 70 ... Normalization information changing circuit, 710 ... Adder, 803 ... Display operation unit for PCM data files, 804 ... Display operation unit for encoded data files, 807 ... Button for selecting the processing at the time of high-efficiency encoding of multiple files, 809 ... Button for analyzing the continuity of files

Claims

An encoding apparatus that divides a plurality of digital audio files into blocks of a predetermined length and compresses the block-processed digital audio files, the encoding apparatus comprising:
first selection means for selecting, from the plurality of digital audio files, two or more digital audio files to be compressed, in the order in which the compression processing is performed;
first encoding means for performing encoding processing, for the two or more digital audio files selected by the first selection means, based on a block near the end of the digital audio file located in front among adjacent digital audio files in the order in which the compression processing is performed, a block near the beginning of the digital audio file located behind among the adjacent digital audio files, and a block straddling the two adjacent digital audio files;
second encoding means for performing encoding processing, for the two or more digital audio files selected by the first selection means, based on a block near the end of the digital audio file located in front among adjacent digital audio files in the order in which the compression processing is performed, or a block near the beginning of the digital audio file located behind among the adjacent digital audio files, and a block straddling the two adjacent digital audio files;
analyzing means for analyzing data at the start and end points of the two or more digital audio files selected by the first selection means; and
second selection means for selecting, with reference to the analysis result of the analyzing means, one of the encoding processing in the first encoding means and the encoding processing in the second encoding means.

The encoding apparatus according to claim 1, wherein the analyzing means compares the volume level of the start point data of the selected digital audio file with a preset threshold and, when the level of the start point data is greater than the threshold, determines that there is continuity with the end portion of another digital audio file.

The encoding apparatus according to claim 1, wherein the analyzing means compares the volume level of the end point data of the selected digital audio file with a preset threshold and, when the level of the end point data is greater than the threshold, determines that there is continuity with the head portion of another digital audio file.

The encoding apparatus according to claim 2 or 3, wherein the second selection means selects the encoding processing in the first encoding means when the analyzing means determines that there is continuity with the other digital audio file.

An encoding method that divides a plurality of digital audio files into blocks of a predetermined length and compresses the block-processed digital audio files, the encoding method comprising:
a first selection step of selecting, from the plurality of digital audio files, two or more digital audio files to be compressed, in the order in which the compression processing is performed;
a first encoding step of performing encoding processing, for the two or more digital audio files selected in the first selection step, based on a block near the end of the digital audio file located in front among adjacent digital audio files in the order in which the compression processing is performed, a block near the beginning of the digital audio file located behind among the adjacent digital audio files, and a block straddling the two adjacent digital audio files;
a second encoding step of performing encoding processing, for the two or more digital audio files selected in the first selection step, based on a block near the end of the digital audio file located in front among adjacent digital audio files in the order in which the compression processing is performed, or a block near the beginning of the digital audio file located behind among the adjacent digital audio files, and a block straddling the two adjacent digital audio files;
an analysis step of analyzing data at the start and end points of the two or more digital audio files selected in the first selection step; and
a second selection step of selecting, with reference to the analysis result of the analysis step, one of the encoding processing in the first encoding step and the encoding processing in the second encoding step.